Wednesday, November 4, 2009

Nov 4 - Somervell, Heuristics Creation (PhD dissertation)


Chapter 4
Heuristics Creation


4.1 Introduction

Ensuring usability is an ongoing challenge for software developers. Myriad testing techniques exist, leading to a trade-off between implementation cost and results effectiveness.
Usability testing techniques are broken down into analytical and empirical types.
Analytical methods involve inspection of the system, typically by experts in the application field, who identify problems in a walkthrough process.
Empirical methods leverage people who could be real users of the application in controlled tests of specific aspects of the system, often to determine efficiency in performing tasks with the system.
Using either type has advantages and disadvantages, but practitioners typically have limited budgets for usability testing. Thus, they need to use techniques that give useful results while not requiring significant funds. Analytic methods fit this requirement more readily for formative evaluation stages.
With the advent of new technologies and non-traditional interfaces, analytic techniques like heuristics hold the key to early and effective interface evaluation.
There are problems with using analytical methods (like heuristics) that can decrease the validity of results [21]. These problems come from applying a small set of guidelines to a wide range of systems, necessitating interpretation of evaluation results. This illustrates how generic guidelines are not readily applicable to all systems [40], and more specific heuristics are necessary.
Our goal was to create a more specific set, tailored to this system class, yet still generic enough to apply to all systems in the class.

LSIEs focus on very specific user goals based on the critical parameters of interruption, reaction, and comprehension. Differing levels of each parameter (high, medium, or low) define different system classes [62]. We focus on LSIEs which require medium interruption, low to high reaction, and high comprehension.
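To make the critical-parameter classification concrete, the following is a minimal sketch (Python, with assumed names and data shapes; none of this appears in the dissertation) of an IRC profile for the LSIE class, using the levels stated above.

```python
# Sketch of a critical-parameter (IRC) profile for a notification system class.
# Levels are "low", "medium", or "high"; a parameter may span a range of levels.
LSIE_PROFILE = {
    "interruption": ("medium",),             # medium interruption
    "reaction": ("low", "medium", "high"),   # low to high reaction
    "comprehension": ("high",),              # high comprehension
}

def in_class(system_levels, profile=LSIE_PROFILE):
    """Check whether a system's IRC levels fall within a class profile."""
    return all(system_levels[p] in allowed for p, allowed in profile.items())

# A hypothetical LSIE-style system: medium interruption, high reaction, high comprehension.
print(in_class({"interruption": "medium", "reaction": "high", "comprehension": "high"}))  # True
```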

4.2 Motivation

Tremendous effort has been devoted to the study of usability evaluation, specifically in comparing analytic to empirical methods.
Nielsen’s heuristics are probably the most notable analytical technique, developed to facilitate formative usability testing [71, 70]. The method has come under fire for the claim that heuristic evaluation is comparable to user testing while requiring fewer test subjects. Comparisons of user testing to heuristic evaluation are numerous [48, 50, 90].

Some have worked to develop targeted heuristics for specific application types.
Baker et al. report on adapting heuristic evaluation to groupware systems [5]. They show that applying heuristic evaluation methods to groupware systems is effective and efficient for formative usability evaluation.
Mankoff et al. compare an adapted set of heuristics to Nielsen’s original set [56]. They studied ambient displays with both sets of heuristics and determined that their adapted set is better suited to ambient displays.

4.3 Processes Involved

How does one create a set of heuristics anyway? We could follow the steps of previous researchers and just use pre-existing heuristics, then reason about the target system class, hopefully coming up with a list of new heuristics that prove useful.
Nielsen and Molich explicitly state that their heuristics come from years of experience and reflection. This is not surprising, as the heuristics emerged some 30 years after graphical interfaces became mainstream. Nielsen and Mack at least validated their method by using it in the analysis of several systems, after they had created their set.
The two studies mentioned above relied upon vague descriptions of theoretical underpinnings [5] or simple tweaking of existing heuristics [56].
Our approach to this lack of structure in creating heuristics is to take a logical look at how one might uncover or discover heuristics for a particular type of system. To gain insight about a certain type of system, one can analyze several example applications in that system class based on the class's critical parameters, and then use the results of that analysis to categorize and group the issues discovered into reusable design guidelines or heuristics.
These stages involve:
• selection of target systems.
• inspection of these systems. An approach like claims analysis [15] provides the necessary structure for knowledge extraction and a consistent representation.
• classifying design implications. Leveraging the underlying critical parameters can help organize the claims found in terms of impacts to those parameters.
• categorizing design implications. Scenario Based Design [77] provides a mechanism for categorizing design knowledge into manageable parts.
• extracting high level design guidance. Based on the groupings developed in the previous step, high level design guidelines can be formulated in terms of design issues.
• synthesizing potential heuristics. By matching and relating similar issues, heuristics can be synthesized.
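As a rough illustration of how the stages above chain together, here is a minimal sketch in Python; every function name and data shape is an assumption made for illustration only, since the actual stages are manual, expert-driven activities described in the remainder of this chapter.

```python
# Illustrative pipeline for the heuristic-creation stages listed above.
# Each function is a stub standing in for a manual, expert-driven activity.

def inspect_system(system):
    """Claims analysis of one system (Section 4.5); returns claim records."""
    return []  # placeholder: claims are authored by human analysts

def classify_claim(claim):
    """Assign interruption/reaction/comprehension impacts (Sections 4.6.1-4.6.2)."""
    return dict(claim, irc_impact={})

def categorize_claim(claim):
    """Place the claim in a Scenario Based Design category (Section 4.6.3)."""
    return dict(claim, sbd_category=None)

def extract_issues(claims):
    """Group related claims into higher-level design issues (Section 4.7.2)."""
    return []

def synthesize_heuristics(issues):
    """Combine similar issues into a short list of heuristics (Section 4.7.3)."""
    return []

def create_heuristics(systems):
    claims = [c for s in systems for c in inspect_system(s)]
    claims = [categorize_claim(classify_claim(c)) for c in claims]
    return synthesize_heuristics(extract_issues(claims))

print(create_heuristics(["GAWK", "Photo News Board"]))  # [] with these stub stages
```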

4.4 Selecting Systems

The first step in the creation process requires careful selection of example systems to inspect and analyze for uncovering existing problems in the systems. The idea is to uncover typical issues inherent in that specific type of system.
Our goal was to use a representative set of systems from the LSIE class. We wanted systems that had been in use for a while, with reports on usage or studies on usability to help validate the analysis we would perform on the systems. We chose the following five LSIE systems, including some from our own work and some from other well-documented design efforts, to investigate further in the creation process.

• GAWK [31] This system provides teachers and students an overview and history of current project work by group and time, on a public display in the classroom.
• Photo News Board [85] This system provides photos of news stories in four categories, shown on a large display in a break room or lab.
• Notification Collage [36] This system provides users with communication information and various data from others in the shared space on a large screen.
• What’s Happening? [94, 95] This system shows relevant information (news, traffic, weather) to members of a local group on a large, wall display.
• BlueBoard [78] This system allows members in a local setting to view information pages about what is occurring in their location (research projects, meetings, events).

These five systems were chosen as a representative set of large screen information exhibits. The GAWK and Photo News Board were created in local labs and thus we have access to the developers and potential user classes. The other three are some of the more famous and familiar ones found in recent literature.

4.5 Analyzing Systems

Now that we have selected our target systems, we must determine the typical usability issues and problems inherent in these systems. Performing usability analysis or testing of these systems reveals the issues and problems each system holds. To find usability problems we can conduct analytic or empirical investigations, recording the issues we find.
We chose to use an analytic evaluation approach to the five aforementioned LSIEs, based on
arguments from Section 3.3. We wanted to uncover as many usability concerns as possible, so
we chose claims analysis [15, 77] as the analytic vehicle with which we investigated our systems.

4.5.1 Claims Analysis

Claims analysis is a method for determining the impacts design decisions have on user goals for
a specific piece of software [15, 77]. Claims are statements about a design element reflecting a
positive or negative effect resulting from using the design element in a system [15].
Claims analysis involves inspection and reflection on the wordings of specific claims to determine the psychological impacts a design artifact may have on a user [15]. The wordings are the actual words used to describe positive and negative effects of the claims. The impacts are the overall psychological effect on the user.

4.5.2 System Claims

Claims were made for each of the five systems that were inspected. These claims focused on design artifacts and overall goals of the systems. These claims are based on typical usage, as exemplified by the scenarios shown for each system. On average, there were over 50 claims made per system.
Table 4.2 shows the breakdown of the numbers of claims found for each system.
Each claim dealt with some design element in the interface, showing upsides or downsides resulting from a particular design choice.
These claims can be thought of as problem indicators, unveiling potential problems with the system being able to support the user goals. These problem indicators include positive aspects of design choices as well. By including the good with the bad, we gain fuller understanding of the underlying design issues.

4.5.3 Validating Claims

How do we know that the claims we found through our analysis represent the “real” design challenges in the systems? This is a fair question and one that must be addressed. We need to verify that the claims we are using to extract design guidance for LSIE systems are actually representative of real user problems encountered during use of those systems. We tackled this problem through several different techniques.

For the GAWK and Photo News Board, we relied upon existing empirical studies [85] to validate the claims we found for those systems.
For the Notification Collage we relied upon discussion and feedback from the system developers. We sent the list of claims and scenarios to Saul Greenberg and Michael Rounding and asked them to verify that the claims we made for the Notification Collage were typical of what they observed users actually doing with the system. Michael Rounding provided a thorough response that indicated most of the claims were indeed correct and experienced by real users of the system.
A similar effort was attempted with both the What’s Happening? and Blue Board systems. The developers of these systems were contacted but no specific feedback was provided on our claims. However, John Stasko, co-developer of the What’s Happening? system, provided interview feedback on the system and pointed us to a publication [95] that served as validation material for the claims. This report details user experiments conducted with the What’s Happening? system. Using this report, we were able to verify that most of the claims we made for the system were experienced in those experiments.
Unfortunately, none of the developers of the Blue Board system responded to our request. We were able to use existing literature on the system to verify some of the claims but the reports on user behavior in [78] did not provide enough material to validate all of the claims we found for that system.

4.6 Categorizing Claims

Now that we have analyzed several systems in the LSIE class, and we have over 250 claims about design decisions for those systems, how do we make sense of it all and glean reusable design guidance in the form of heuristics? To make sense of the claims we have, we need to group and categorize similar claims.
This requires a framework to ensure consistent classification and facilitate final heuristic synthesis from the classification. This is where the idea of critical parameters plays an important role.


4.6.1 Classifying Claims Using the IRC Framework


Recall that notification systems can be classified by their level of impact on interruption, reaction, and comprehension [62]. This classification scheme can be simplified to reflect a high, medium, or low impact to each of interruption, reaction, and comprehension.

In other words, we can take a single claim and classify it according to the impact it would have on the user goals associated with the system.
For example, we have a claim about the collage metaphor from the Notification Collage system that suggests that the lack of organization can hinder efforts to find information. This claim would be classified as “high” interruption because it increases the time required to find a piece of information. It could also be classified as “low” comprehension because it reduces a person’s ability to understand the information quickly and accurately. It is perfectly acceptable to have the claim fit into both classifications.
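As a concrete illustration of such a classification record, here is a minimal sketch (Python, with assumed field names; not from the dissertation) that captures the Notification Collage claim just described and its dual classification.

```python
# Minimal sketch of recording an IRC classification for the Notification Collage
# claim described above. Levels are "high", "medium", or "low"; a claim may
# legitimately impact more than one parameter at once.
claim = {
    "system": "Notification Collage",
    "artifact": "collage metaphor",
    "downside": "lack of organization can hinder efforts to find information",
    "irc_impact": {
        "interruption": "high",   # increases the time required to find information
        "comprehension": "low",   # reduces ability to understand quickly and accurately
        # "reaction" is not impacted by this particular claim
    },
}

def impacts(claim, parameter):
    """Return the recorded impact level for a critical parameter, if any."""
    return claim["irc_impact"].get(parameter)

print(impacts(claim, "interruption"))  # -> "high"
```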

4.6.2 Assessing Goal Impact

Determining the impact a claim has on the user goals was done through inspection and reflection. Each claim was read and considered in the context of the scenarios for the system, to identify whether the claim had an impact on the user goals. A claim impacted a user goal if the wording of the claim indicated that one of interruption, reaction, or comprehension was modified by the design element.
To assign user goal impacts to the claims, a team of experts should assess each claim.
These experts should have extensive knowledge of the system class, and the critical parameters that define that class. Knowledge of claims analysis techniques and/or usability evaluation are highly recommended.
We used a two-person team of experts. Each expert classified every claim independently; differences occurred when their classifications were not compatible, and these were resolved through discussion.
Agreement was measured as the number of claims with the same classification divided by the total number of claims. We found that initial agreement on the claims was near 94%, and after discussion it was 100% for all claims.
This calculation comes from the fact that out of 253 individual claims, 237 were classified by the inspectors as impacting the user goals in the same way, i.e. both inspectors agreed on the same classification.
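The agreement figure is simple percent agreement; a minimal sketch of the arithmetic follows (no more sophisticated inter-rater statistic is implied by the text).

```python
def percent_agreement(num_same, num_total):
    """Percent agreement: claims given identical classifications / all claims."""
    return num_same / num_total

# Figures reported above: 237 of 253 claims received the same classification
# from both inspectors on the first pass.
print(f"initial agreement: {percent_agreement(237, 253):.1%}")   # ~93.7%, i.e. near 94%
print(f"after discussion:  {percent_agreement(253, 253):.0%}")   # 100%
```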

4.6.3 Categorization Through Scenario Based Design

Categorization is needed to separate the claims into manageable groups. By focusing on related claims, similar design tradeoffs can be considered together. An interface design methodology is useful because these approaches often provide a built-in structure that facilitates claims categorization.
Possible design methodologies include Scenario Based Design [77], User Centered Design [73], and Norman’s Stages of Action [72].
Scenario based design (SBD) [77] is an interface design methodology that relies on scenarios about typical usage of a target system.

Activity Design
Activity design involves what users can and cannot accomplish with the system, at a high level [77]. These are the tasks that the interface supports, ones that the users would otherwise not be able to accomplish.
Activity design encompasses metaphors and supported/unsupported activities [77].

Information Design
Information design deals with how information is shown and how the interface looks [77]. Design decisions for information presentation directly impact comprehension, as well as interruption. Identifying the impacts of information design decisions on user goals can lead to effective design guidelines.
We chose to use the following sub-categories for refining the information design category: use of screen space, foreground and background colors, use of fonts, use of audio, use of animation, and layout. These sub-categories were chosen because they cover almost all of the design issues relevant to information design [77].

Interaction Design
Interaction design focuses on how a user would interact with a system (clicking, typing, etc) [77]. This includes recognizing affordances, understanding the behavior of interface controls, knowing the expected transitions of states in the interface, support for error recovery and undo operations, feedback about task goals, and configurability options for different user classes [77].

Categorization
Armed with the above categories, we are now able to group individual claims into an organized structure, thereby facilitating further analysis and reuse. So how do we know in which area a particular claim should go? This again is done through group analysis and discussion regarding the wording of the claim. The claim wordings typically indicate which category of SBD applies, and any disagreements can be handled through discussion and mitigation.
Similar to the classification effort, this categorization process relied upon the claim wordings for correct placement within the SBD categories. The sub-categories for each of activity, information, and interaction provide 14 areas in which claims may be placed.
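For reference, a sketch of the category structure is shown below. The six information design sub-categories are quoted from the text; the activity and interaction sub-categories are reconstructed from the descriptions earlier in this section and are therefore an assumption, arranged so that the total comes to the 14 areas mentioned above.

```python
# Sketch of the SBD category structure used for grouping claims. Information
# design sub-categories are taken from the text; activity and interaction
# sub-categories are reconstructed from the prose above (an assumption).
SBD_CATEGORIES = {
    "activity": [
        "metaphors",
        "supported/unsupported activities",
    ],
    "information": [
        "use of screen space",
        "foreground and background colors",
        "use of fonts",
        "use of audio",
        "use of animation",
        "layout",
    ],
    "interaction": [
        "affordances",
        "behavior of interface controls",
        "state transitions",
        "error recovery and undo",
        "feedback about task goals",
        "configurability",
    ],
}

print(sum(len(subs) for subs in SBD_CATEGORIES.values()))  # 14 areas
```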

Unclassified Claims

Some of the claims were deemed unclassified because they did not impact interruption, reaction, or comprehension. While it is possible to situate these claims within the SBD categories, any claim that did not impact one of the three user goals was left as unclassified.

4.7 Synthesis Into Heuristics

After classifying the problems within the framework, we then needed to extract usable design recommendations from those problems.
This required re-inspection of the claim groupings to determine the underlying causes of these issues.
Since the problems come from different systems, we get a broad look at potential design flaws. Identifying and recognizing these flaws in these representative systems can help other designers avoid making those same mistakes in their work.

4.7.1 Visualizing the Problem Tree

To better understand how claims impacted the user goals of each of the systems, a problem tree was created to aid in the visualization of the dispersion of the claims within different areas of the SBD categories.
A problem tree is a collection of claims for a system class, organized by categories, sub-categories, and critical parameter. It serves as a representation of the design knowledge that is collected from the claims analysis process.
A node in the problem tree refers to a collection of claims that fits within a single category (from SBD) with a single classification (from the critical parameters). A leaf in the tree refers to a single claim, and is attached to some node in the tree.
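A minimal sketch of this structure follows (Python, with hypothetical claim records and helper names; not taken from the dissertation): nodes are keyed on an SBD sub-category plus a critical-parameter classification, and individual claims attach as leaves.

```python
from collections import defaultdict

# A problem-tree node is identified by an (SBD sub-category, parameter, level)
# triple; its leaves are the individual claims carrying that classification.
def build_problem_tree(claims):
    tree = defaultdict(list)
    for claim in claims:
        for parameter, level in claim["irc_impact"].items():
            node = (claim["sbd_category"], parameter, level)
            tree[node].append(claim)  # attach the claim as a leaf
    return tree

# Hypothetical claim records, extending the Notification Collage example
# shown earlier with an SBD sub-category.
claims = [
    {"sbd_category": "information/layout",
     "irc_impact": {"interruption": "high", "comprehension": "low"},
     "wording": "lack of organization can hinder efforts to find information"},
    {"sbd_category": "information/use of animation",
     "irc_impact": {"interruption": "high"},
     "wording": "multiple simultaneous animations compete for attention"},
]

tree = build_problem_tree(claims)
for node, leaves in tree.items():
    print(node, "->", len(leaves), "claim(s)")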

4.7.2 Identifying Issues

To glean reusable design guidance from the individual claims, team discussion was used. A team of experts who are familiar with the claims analysis process and the problem tree considers each node in the tree with the aim of identifying one or more issues that capture the claims within said node.
Issues are design statements, more general than individual claims.
This effort produced 22 issues that covered the 333 claims.

4.7.3 Issues to Heuristics

Armed with the 22 high level issues, we now needed to extract a subset of high level design heuristics from these issues. Twenty-two is unmanageable for formative heuristic evaluation [66] and in many cases the issues were similar or related, suggesting opportunities for concatenation and grouping. This similarity allowed us to create higher level, more generic heuristics to capture the issues.
We created eight final heuristics, capturing the 22 issues discovered in the earlier process.
Table 4.7 provides an example of how we moved from the issues to the heuristics. In most instances, two or three issues could be combined into a single heuristic. However some of the issues were already at a high level and were taken directly into the heuristic list.
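The mapping itself lives in Table 4.7, which is not reproduced in this excerpt; the sketch below only illustrates the many-to-one grouping step, and the issue statements in it are invented placeholders.

```python
# Purely illustrative: hypothetical issue statements grouped under one of the
# final heuristics. Only the many-to-one structure (two or three issues per
# heuristic) reflects the text; the issue wordings are placeholders.
issues_to_heuristic = {
    "heuristic": "Judicious use of animation is necessary for effective design.",
    "issues": [
        "multiple simultaneous animations increase interruption",  # hypothetical
        "unannounced item movement hurts comprehension",           # hypothetical
    ],
}

print(f"{len(issues_to_heuristic['issues'])} issues -> 1 heuristic")
```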


4.7.4 Heuristics

Here is the list of heuristics that can be used to guide evaluation of large screen information exhibits.
Explanatory text follows each heuristic, to clarify and illustrate how the heuristics could impact evaluation. Each is general enough to be applied to many systems in this application class, yet they all address the unique user goals of large screen information exhibits.

Appropriate color schemes should be used for supporting information understanding.
Try using cool colors such as blue or green for background or borders. Use warm colors like red and yellow for highlighting or emphasis.

Layout should reflect the information according to its intended use.
Time-based information should use a sequential layout; topical information should use categorical, hierarchical, or grid layouts. Screen space should be allocated according to information importance.

Judicious use of animation is necessary for effective design.
Multiple, separate animations should be avoided. Indicate current and target locations if items are to be automatically moved around the display. Introduce new items with slower, smooth transitions. Highlighting related information is an effective technique for showing relationships among data.

Use text banners only when necessary.
Reading text on a large screen takes time and effort. Try to keep it at the top or bottom of the screen if necessary. Use sans serif fonts to facilitate reading, and make sure the font sizes are big enough.

Show the presence of information, but not the details.
Use icons to represent larger information structures, or to provide an overview of the information space, but not the detailed information; viewing information details is better suited to desktop interfaces. The magnitude or density of the information dictates the representation mechanism (text vs. icons, for example).

Using cyclic displays can be useful, but care must be taken in implementation.
Indicate “where” the display is in the cycle (i.e. 1 of 5 items, or progress bar). Timings (both for
single item presence and total cycle time) on cycles should be appropriate and allow users to
understand content without being distracted.
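As one concrete illustration of this cyclic-display heuristic, here is a minimal sketch (hypothetical function and parameter names; not from the dissertation) that exposes the cycle position as “i of N” and uses a fixed per-item dwell time.

```python
import time

def run_cycle(items, dwell_seconds=10.0, rounds=1):
    """Rotate through display items, showing the cycle position ("i of N")
    and holding each item for a fixed dwell time, per the heuristic above."""
    n = len(items)
    for _ in range(rounds):
        for i, item in enumerate(items, start=1):
            # A real exhibit would render this on the large screen; here we print.
            print(f"[{i} of {n}] {item}")
            time.sleep(dwell_seconds)  # long enough to comprehend, short enough not to distract

# Example content categories borrowed from the What's Happening? description.
run_cycle(["news", "traffic", "weather"], dwell_seconds=0.1)
```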

Avoid the use of audio.
Audio is distracting, and on a large public display, could be detrimental to others in the setting. Furthermore, lack of audio can reinforce the idea of relying on the visual system for information exchange.

Eliminate or hide configurability controls.
Large public displays should be configured one time by an administrator. Allowing multiple users to change settings can increase confusion and distraction caused by the display. Changing the interface too often prevents users from learning the interface.

Source: Somervell, Jacob. Developing Heuristic Evaluation Methods for Large Screen Information Exhibits Based on Critical Parameters. [Dissertation, PhD in Computer Science and Applications] Virginia Polytechnic Institute and State University. June 22, 2004.
