Saturday, September 5, 2009

Sep 5 - Capra, Usability Problem Description and the Evaluator Effect in Usability Testing

Usability Problem Description and the Evaluator Effect in Usability Testing.
Miranda G. Capra
Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Industrial and Systems Engineering

March 13, 2006
Blacksburg, Virginia


ABSTRACT
Previous usability evaluation method (UEM) comparison studies have noted an evaluator effect on problem detection in heuristic evaluation, with evaluators differing in problems found and problem severity judgments. There have been few studies of the evaluator effect in usability testing (UT), task-based testing with end-users. UEM comparison studies focus on counting usability problems detected, but we also need to assess the content of usability problem descriptions (UPDs) to more fully measure evaluation effectiveness. The goals of this research were to develop UPD guidelines, explore the evaluator effect in UT, and evaluate the usefulness of the guidelines for grading UPD content.
Ten guidelines for writing UPDs were developed by consulting usability practitioners through two questionnaires and a card sort. These guidelines are (briefly): be clear and avoid jargon, describe problem severity, provide backing data, describe problem causes, describe user actions, provide a solution, consider politics and diplomacy, be professional and scientific, describe your methodology, and help the reader sympathize with the user. A fourth study compared usability reports collected from 44 evaluators, both practitioners and graduate students, watching the same 10-minute UT session recording. Three judges measured problem detection for each evaluator and graded the reports for following 6 of the UPD guidelines.
There was support for the existence of an evaluator effect, even when evaluators watched prerecorded sessions, with low to moderate individual thoroughness of problem detection across all/severe problems (22%/34%), reliability of problem detection (37%/50%), and reliability of severity judgments (57% for severe ratings). Practitioners received higher grades averaged across the 6 guidelines than students did, suggesting that the guidelines may be useful for grading reports. The grades for the guidelines were not correlated with thoroughness, suggesting that the guideline grades complement measures of problem detection.
A simulation of evaluators working in groups found that adding a second evaluator increased the number of severe problems found by 34%. The simulation also found that the thoroughness of individual evaluators would have been overestimated if the study had included only a small number of evaluators. The final recommendations are to use multiple evaluators in UT, and to assess both problem detection and description when measuring evaluation effectiveness.
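My Comments: To make the group-simulation idea concrete for myself, here is a minimal Python sketch (my own illustration, not Capra's code or data) of how aggregate thoroughness can be estimated for groups of evaluators, assuming a binary evaluator-by-problem detection matrix and treating a group's problem set as the union of its members' individual sets.

from itertools import combinations
from statistics import mean

# Hypothetical detection matrix (illustrative only, not the dissertation's data):
# rows = evaluators, columns = usability problems, 1 = the evaluator reported it.
DETECTIONS = [
    [1, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 1],
]
N_PROBLEMS = len(DETECTIONS[0])

def group_thoroughness(group):
    """Fraction of the known problems found by at least one evaluator in the group."""
    found = set()
    for evaluator in group:
        found.update(p for p, hit in enumerate(DETECTIONS[evaluator]) if hit)
    return len(found) / N_PROBLEMS

if __name__ == "__main__":
    evaluators = range(len(DETECTIONS))
    for size in range(1, len(DETECTIONS) + 1):
        avg = mean(group_thoroughness(g) for g in combinations(evaluators, size))
        print(f"average thoroughness for groups of {size}: {avg:.2f}")

Averaging over all possible groups of each size shows the diminishing return of adding evaluators, which is the kind of curve the dissertation's simulation reports for severe problems.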

INTRODUCTION

Formative usability evaluations are an important part of the usability life cycle, identifying usability problems present in an interface that designers should fix in the next design iteration.
Figure 1.1 provides an overview of a formative usability evaluation and the usability life cycle.
The output of a formative usability evaluation is a set of usability problems (Hartson, Andre, & Williges, 2003), but there are many different usability evaluation methods (UEMs).
Empirical evaluations involve end-users. The most common empirical method is usability testing, or think-aloud testing, which is generally a task-based session in a usability laboratory.
Analytical evaluations involve expert review of an interface, such as Heuristic Evaluation (Nielsen, 1994b; Nielsen & Molich, 1990).

Formative usability evaluation is not a reliable process.
Evaluators discover different sets of usability problems depending on the usability evaluation method (UEM) used or the individual evaluator that performs an analytical evaluation (Hertzum & Jacobsen, 2003).
Other factors that can affect problem detection are the number and type of users involved in usability testing (Law & Hvannberg, 2004a; Nielsen, 1994a; Spool & Schroeder, 2001; Virzi, 1990, 1992) and the number of evaluators involved in an expert review (Dumas & Sorce, 1995; Dutt, Johnson, & Johnson, 1994).
Evaluators also differ in their judgment of the severity of usability problems (Hassenzahl, 2000; Hertzum & Jacobsen, 2003).
Hertzum and Jacobsen coined the term evaluator effect to refer to differences in problem detection and severity judgments by evaluators using the same UEM (Hertzum & Jacobsen, 2003; Jacobsen, Hertzum, & John, 1998).
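My Comments: The best-known quantitative treatment of the number-of-users factor is the problem-discovery model of Nielsen and Landauer (1993), listed in the references below: the expected number of problems found with n users is N(1 - (1 - p)^n), where N is the total number of problems and p is the average per-user detection rate. A tiny sketch with illustrative numbers of my own choosing:

def expected_problems_found(total_problems, detection_rate, n_users):
    """Nielsen & Landauer (1993): expected problems found = N * (1 - (1 - p)^n)."""
    return total_problems * (1 - (1 - detection_rate) ** n_users)

if __name__ == "__main__":
    # Illustrative values only: 40 problems, average per-user detection rate of 0.31.
    for n in (1, 3, 5, 10):
        print(f"{n} user(s): about {expected_problems_found(40, 0.31, n):.0f} problems found")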

Most studies comparing UEM effectiveness or measuring the evaluator effect rely on measures that involve counting usability problems identified by each evaluator, such as the any-two agreement measure of reliability (Hertzum & Jacobsen, 2003) or the thoroughness and validity of problem sets (Hartson et al., 2003). These measures are useful, but they do not give a complete measure of the effectiveness of an evaluation.
Counting usability problems detected is part of the equation, but a better question to ask is how an evaluation can help to efficiently improve a product (Wixon, 2003).
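My Comments: A minimal sketch (my own, with hypothetical problem sets) of how these counting measures work: thoroughness and validity in the sense of Hartson et al. (2003), and any-two agreement in the sense of Hertzum and Jacobsen (2003), which averages the size of the intersection of two evaluators' problem sets divided by the size of their union, over all pairs of evaluators.

from itertools import combinations
from statistics import mean

# Hypothetical problem sets (labels are illustrative only, not real study data).
REAL_PROBLEMS = {"P1", "P2", "P3", "P4", "P5", "P6"}
EVALUATOR_SETS = {
    "evaluator_a": {"P1", "P2", "P5"},
    "evaluator_b": {"P2", "P3", "P5", "P7"},  # P7 is a false alarm
    "evaluator_c": {"P1", "P3", "P4"},
}

def thoroughness(found, real=REAL_PROBLEMS):
    """Share of the real problems that this evaluator found (Hartson et al., 2003)."""
    return len(found & real) / len(real)

def validity(found, real=REAL_PROBLEMS):
    """Share of the evaluator's reported problems that are real problems."""
    return len(found & real) / len(found)

def any_two_agreement(problem_sets):
    """Average intersection-over-union across all evaluator pairs (Hertzum & Jacobsen, 2003)."""
    return mean(len(a & b) / len(a | b) for a, b in combinations(problem_sets, 2))

if __name__ == "__main__":
    for name, found in EVALUATOR_SETS.items():
        print(f"{name}: thoroughness={thoroughness(found):.2f}, validity={validity(found):.2f}")
    print(f"any-two agreement: {any_two_agreement(list(EVALUATOR_SETS.values())):.2f}")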

Communicating the results of the test (either through a written report or verbally) is an essential part of the usability testing process (Rubin, 1994).
Andre, Hartson, Belz, and McCreary (2001) argue that poor documentation and communication of identified usability problems diminish the effectiveness of a usability evaluation and can reduce the return on the effort invested in conducting it.
Dumas, Molich, and Jeffries (2004) suggest that communication style and attitude in report writing can affect recipients' acceptance of suggestions and the number of problems recipients choose to fix.
Jeffries (1994) suggests that developers may interpret poorly described problems as false alarms, causing them to ignore those problems and increasing the likelihood that they will treat future problems as opinion or false alarms.
A more complete measure of UEM output would assess not only the quantity of problem descriptions generated but also the quality.

There has been little formal research into usability problem description.
Many authors have developed structured problem reporting forms and problem classification schemes (Andre et al., 2001; Cockton, Woolrych, & Hindmarch, 2004; Hvannberg & Law, 2003; Lavery, Cockton, & Atkinson, 1997; Sutcliffe, 2000) to increase the utility of problem descriptions and thoroughness of problem diagnosis.
However, these studies have not provided formal documentation of poor problem descriptions, nor have they provided measures of problem description quality to demonstrate the effectiveness of these tools.

1.1 Problem Statement

Formative usability evaluation is not a reliable process. There is evidence of an evaluator effect on problem detection in analytical methods and a user effect in usability testing. Usability testing had been considered the gold standard of usability evaluation, but there have been few studies of whether it is also subject to an evaluator effect.
Previous studies of the evaluator effect on problem detection in usability testing have allowed different tasks and users, used a small sample size, or used student teams; we need larger studies of the evaluator effect that focus specifically on study analysis, rather than design and execution.
In addition, previous studies of usability testing have focused primarily on comparing the number of problems identified by each evaluation. Counting usability problems identified by an evaluation is a necessary but not sufficient measure of evaluation effectiveness.
We need measures of usability problem description (UPD) content to be able to document shortcomings in UPDs and more fully measure evaluation effectiveness.

1.2 Goals

This research had three goals:
(1) develop guidelines for describing usability problems,
(2) explore the evaluator effect in usability testing, and
(3) evaluate the usefulness of the guidelines for judging the content of usability problem descriptions.

This research should lead to a better understanding of usability problem descriptions and the evaluator effect in usability testing, and provide a basis for future studies to formally develop metrics of usability problem description quality.

Research Question 1: How do practitioners describe usability problems?
Research Question 2: Is there an evaluator effect in usability testing?
Research Question 3: How can we assess the content of UPDs?


1.3 Approach

The first phase of this study focused on developing guidelines for describing usability problems.
There is little discussion of UPDs in usability articles or textbooks to form the basis of guidelines. Instead, this research used an exploratory approach to develop the guidelines, consulting usability practitioners about the important qualities of UPDs through a series of questionnaires.

The second phase focused on the evaluator effect in usability testing.
There have been 10 previous studies of the evaluator effect in usability testing. .....
The second phase also served as a preliminary assessment of using the guidelines to assess the content of usability reports. The guidelines from Phase I were used to evaluate the usability reports collected in Phase II. Assessment measures included the extent to which practitioners followed the guidelines, whether following the guidelines was an indicator of practical usability experience, and how opinions about the importance of the guidelines compared with actual behavior in following them when writing the usability reports.

Table 1.1 summarizes the goals and approach of this research, and Table 1.2 summarizes the phases and outputs.

My Comments: Capra used these two tables to systematically lay out the problems, goals, approach, chapters, phases, studies, research questions (RQs) and outputs (deliverables).
Table 1.1 Research problems, goals and approach
Table 1.2 Summary of Chapters, Phases and Studies



2.4 Discussion
Table 2.12 Ten Guidelines for Describing Usability Problems

1 Be clear and precise while avoiding wordiness and jargon.
Define terms that you use. Be concrete, not vague. Be practical, not theoretical. Use descriptions that non-HCI people will appreciate. Avoid so much detail that no one will want to read the description.

2 Describe the impact and severity of the problem,
including business effects (support costs, time loss, etc.), impact on the user's task and importance of the task. Describe how often the problem will occur, and system components that are affected or involved.

3 Support your findings with data
such as: how many users experienced the problem and how often; task attempts, time and success/failure; critical incident descriptions; and other objective data, both quantitative and qualitative. Provide traceability of the problem to observed data.

4 Describe the cause of the problem,
including context such as the interaction architecture and the user's task. Describe the main usability issue involved in the problem. Avoid guessing about the problem cause or user's thoughts.

5 Describe observed user actions,
including specific examples from the study, such as the user's navigation flow through the system, the user's subjective reactions, screen shots and task success/failure. Mention whether the problem was user-reported or experimenter-observed.

6 Consider politics and diplomacy
when writing your description. Avoid judging the system, criticizing decisions made by other team members, pointing fingers or assigning blame. Point out good design elements and successful user interactions. Be practical, avoiding theory and jargon.

7 Be professional and scientific in your description.
Use only facts from the study, rather than opinions or guesses. Back your findings with sources beyond the current study, such as external classification schemes, proven usability design principles, and previous research.

8 Describe a solution to the problem,
providing alternatives and tradeoffs. Be specific enough to be helpful without dictating a solution, guessing, or jumping to conclusions. Supplement with pictures, screen capture, usability design principles and/or previous research.

9 Describe your methodology and background.
Describe how you found this problem (field study, lab study, expert evaluation, etc.). Describe the limitations of your domain knowledge. Describe the user groups that were affected and the breadth of system components involved.

10 Help the reader sympathize with the user's problem
by using descriptions that are evocative and anecdotal. Make sure the description is readable and understandable. Use user-centric language rather than system-centric. Be complete while avoiding excessive detail.

2.4.2 Applications

These guidelines will be useful for usability practitioners, instructors and researchers.

· Usability practitioners can use this list to create usability problem reporting forms, to create checklists of items to address in writing problem descriptions (a minimal checklist sketch of my own follows this list), or to evaluate usability reports generated in usability studies. Usability groups could evaluate their work products to ensure that they are writing effective problem descriptions in usability reports.

· Usability instructors can use them to explain what should be in a usability problem description and to grade usability evaluations conducted by students. Following as many of the guidelines as possible would be an appropriate exercise for students, where thoroughness of the report is more important than the time spent on it. Usability students need to practice writing complete descriptions so that they will learn what information they need to take note of during a usability evaluation. Training of new practitioners could include more practice and review opportunities for the guidelines rated as more difficult.

· Usability researchers can use the guidelines to assess problem descriptions generated by different usability practitioners or usability evaluation methods. Current research in UEM effectiveness focuses on counting usability problems identified in evaluations, but more effort should be focused on the quality of descriptions as well as the quantity. This was explored further in Phase II.
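My Comments: Here is the minimal checklist sketch mentioned in the first bullet above (my own, with hypothetical short labels and scores, not something from the dissertation): each guideline is scored 0 (not addressed), 1 (partly addressed) or 2 (fully addressed) for a problem description, and the scores are averaged.

# Short labels for the ten guidelines in Table 2.12 (my own paraphrases).
GUIDELINES = [
    "clear and precise, no jargon",
    "impact and severity described",
    "findings supported with data",
    "cause of the problem described",
    "observed user actions described",
    "politics and diplomacy considered",
    "professional and scientific tone",
    "solution described",
    "methodology and background described",
    "helps the reader sympathize with the user",
]

def grade_description(scores):
    """Return per-guideline scores and an overall average for one problem description."""
    if len(scores) != len(GUIDELINES):
        raise ValueError("expected one score per guideline")
    graded = dict(zip(GUIDELINES, scores))
    graded["average"] = sum(scores) / len(scores)
    return graded

if __name__ == "__main__":
    # Hypothetical example: clear and data-backed, but no solution offered.
    for item, score in grade_description([2, 1, 2, 1, 2, 2, 2, 0, 1, 1]).items():
        print(f"{item}: {score}")

In practice, as the dissertation advises below, a team would keep only the subset of guidelines that fits its project and organization.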

The most important suggestion for using the guidelines is to select the guidelines that fit your project and organization.

My Comments: I did a literature review of Miranda Capra's PhD dissertation when I was working on my Master's project dissertation.

2.6 Conclusion

The output of Phase I was a set of 10 guidelines for describing usability problems.
The 10 guidelines presented in this section are suggestions, not rules. It would be difficult and time-consuming to follow every single guideline for every single problem description. Different guidelines may be more or less important for different projects, clients, and organizations.

References that I may want to read further in future:
ANSI. (2001). Common Industry Format for Usability Test Reports (ANSI NCITS 354-2001). New York: author.
Bastien, J. M. C., Scapin, D. L., & Leulier, C. (1996). Looking for usability problems with the ergonomic criteria and with the ISO 9241-10 dialogue principles. In M. J. Tauber & V. Bellotti & R. Jeffries & J. D. Mackinlay & J. Nielsen (Eds.), Proceedings of the Conference on Human Factors in Computing Systems (CHI '96) (pp. 77-78). New York: ACM.
Chin, J. P., Diehl, V. A., & Norman, L. K. (1988). Development of an instrument measuring user satisfaction of the human-computer interface. In E. Soloway & D. Frye & S. B. Sheppart (Eds.), Proceedings of the Conference on Human Factors in Computing Systems (CHI '88) (pp. 213-218). New York: ACM.
Connell, I. W., & Hammond, N. V. (1999). Comparing Usability Evaluation Principles with Heuristics: Problem Instances vs. Problem Types. In M. A. Sasse & C. Johnson (Eds.), Proceedings of the Human Computer Interaction - INTERACT '99 (pp. 621-629): IOS.
De Angeli, A., Matera, M., Costabile, M. F., Garzotto, F., & Paolini, P. (2000). Validating the SUE Inspection Technique. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI 2000) (pp. 143-150). New York: ACM.
Dumas, J. S., Molich, R., & Jeffries, R. (2004). Describing usability problems: are we sending the right message? interactions, 11(4), 24-29.
Hassenzahl, M. (2000). Prioritizing usability problems: data-driven and judgement-driven severity estimates. Behaviour & Information Technology, 19(1), 29-42.
Hornbæk, K., & Frøkjær, E. (2005). Comparing Usability Problems and Redesign Proposals as Input to Practical Systems Development. In Proceedings of the Conference on Human factors in computing systems (CHI 2005) (pp. 391-400). New York: ACM.
Hvannberg, E. T., & Law, L.-C. (2003). Classification of Usability Problems (CUP) Scheme. In M. Rauterberg (Ed.), Proceedings of the Human-Computer Interaction - INTERACT'03 (pp. 655-662): IOS.
ISO. (1998). Ergonomic requirements for office work with visual display terminals (VDTs) -- Part 11: Guidance on usability (ISO 9241-11:1998(E)). Geneva: author.
ISO. (1999a). Ergonomic requirements for office work with visual display terminals (VDTs) -- Part 10: Dialogue principles (ISO 9241-10:1998(E)). Geneva: author.
ISO. (1999b). Human-centred design processes for interactive systems (ISO 13407:1999(E)). Geneva: author.
Jacobsen, N. E., Hertzum, M., & John, B. E. (1998). The Evaluator Effect in Usability Tests. In C.-M. Karat & A. Lund & J. Coutaz & J. Karat (Eds.), Proceedings of the Conference on Human Factors in Computing Systems (CHI 98) (pp. 255-256). New York: ACM.
Jeffries, R. (1994). Usability Problem Reports: Helping Evaluators Communicate Effectively with Developers. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods (pp. 273-294). New York: John Wiley.
Keenan, S. L., Hartson, H. R., Kafura, D. G., & Schulman, R. S. (1999). The usability problem taxonomy: A framework for classification and analysis. Empirical Software Engineering, 4(1), 71-104.
Lavery, D., Cockton, G., & Atkinson, M. P. (1997). Comparison of evaluation methods using structured usability problem reports. Behaviour & Information Technology, 16(4/5), 246-266.
Mack, R., & Montaniz, F. (1994). Observing, Predicting, and Analyzing Usability Problems. In R. L. Mack & J. Nielsen (Eds.), Usability Inspection Methods (pp. 295-340). New York: John Wiley & Sons.
Nielsen, J. (1992). Finding Usability Problems Through Heuristic Evaluation. In P. Bauersfeld & J. Bennett & G. Lynch (Eds.), Proceedings of the Conference on Human Factors in Computing Systems (CHI 92) (pp. 373-380). New York: ACM.
Nielsen, J., & Landauer, T. K. (1993). A Mathematical model of the finding of usability problems. In S. Ashlund & K. Mullet & A. Henderson & E. Hollnagel & T. White (Eds.), Proceedings of the INTERCHI 93: Conference on Human Factors in Computing Systems (INTERACT 93 and CHI 93) (pp. 206-213). New York: ACM.
Rourke, C. (2003). CUE-4: Lessons in Best Practice for Usability Testing and Expert Evaluation. User Vision. Retrieved December 14, 2004 from http://www.uservision.co.uk/usability_articles/usability_CUE.asp.
Skov, M. B., & Stage, J. (2005). Supporting problem identification in usability evaluations. In Proceedings of the 19th conference of the computer-human interaction special interest group (CHISIG) of Australia on Computer-human interaction: citizens online: considerations for today and the future (OZCHI 2005). Narrabundah, Australia: Computer-Human Interaction Special Interest Group (CHISIG) of Australia.
Theofanos, M., & Quesenbery, W. (2005). Towards the Design of Effective Formative Test Reports. Journal of Usability Studies, 1(1), 27-45.
Theofanos, M., Quesenbery, W., Snyder, C., Dayton, D., & Lewis, J. (2005). Reporting on Formative Testing. A UPA 2005 Workshop Report. Bloomingdale, IL: UPA. Retrieved on October 3, 2005 from the UPA website http://www.upassoc.org/usability_resources/conference/2005/formative%20reporting-upa2005.pdf.
