Saturday, October 31, 2009

Nov 1 – Somervell’s dissertation

Chapter 2
Literature Review


developing new heuristics for the LSIE system class, based on critical parameters.

Critical Parameters
William Newman put forth the idea of critical parameters for guiding design and strengthening evaluation in [68], as a response to the growing gap between interactive system design and its separate evaluation.
For example, consider airport terminals, where the critical parameter would be flight capacity per hour per day [68]. All airport terminals can be assessed in terms of this capacity, and improving that capacity would invariably mean we have a better airport. Newman argues that by establishing parameters for application classes, researchers can begin establishing evaluation criteria, thereby providing continuity in evaluation that allows us “to tell whether progress is being made” [68].
In addition, Newman argues that critical parameters can actually provide support for developing design methodologies, based on the most important aspects of a design space. This ability separates critical parameters from traditional usability metrics. Most usability metrics, like “learnability” or “ease of use”, probe only the user’s interaction with an interface, focusing not on the intended purpose of the system but on what the user can do with the system.
Critical parameters focus on supporting the underlying system functions that allow one to determine whether the system performs its intended tasks.
Indeed, the connection between critical parameters and traditional usability metrics can be described as the input and output of a “usability” function: critical parameters are used to derive the appropriate usability metrics for a given system, and those metrics are related back to the underlying system goals through the critical parameters.
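As a minimal sketch of this derivation idea (not taken from the dissertation), the snippet below maps critical parameters to candidate usability metrics; every metric name here is a hypothetical example.

    # Illustrative sketch only: critical parameters act as the "input" from which
    # concrete usability metrics are derived; results on those metrics are then
    # interpreted against the parameters rather than against generic usability.
    # All metric names below are hypothetical examples.
    CANDIDATE_METRICS = {
        "interruption": ["primary-task error rate", "task resumption lag"],
        "reaction": ["time to respond to a notification", "portion of notifications acted on"],
        "comprehension": ["recall accuracy of displayed items", "gist-understanding score"],
    }

    def derive_metrics(parameters):
        """Return candidate usability metrics for the given critical parameters."""
        return {p: CANDIDATE_METRICS[p] for p in parameters}

    print(derive_metrics(["interruption", "comprehension"]))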

Critical Parameters for Notification Systems
In [62], we embraced Newman’s view of critical parameters and established three parameters that define the notification systems design space.
Interruption, reaction, and comprehension are three attributes of all notification systems that allow one to assess whether the system serves its intended use.
Furthermore, these parameters allow us to assess the user models and system designs associated with notification systems in terms of how well each design supports them.
High and low values of each parameter capture the intent of the system and allow one to measure whether the system meets that intent.
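As a rough illustration (not from the dissertation), a system’s intended use can be recorded as a high/low profile over the three parameters and compared against what evaluation measures; the target profile below is hypothetical, not the dissertation’s classification of any particular system class.

    # Illustrative sketch only: the IRC parameters as a high/low design-intent
    # profile, compared against the measured levels of support.
    from dataclasses import dataclass

    HIGH, LOW = "high", "low"

    @dataclass
    class IRCProfile:
        interruption: str
        reaction: str
        comprehension: str

    def supports_intent(intended: IRCProfile, measured: IRCProfile) -> bool:
        """A system serves its intended use when measured interruption, reaction,
        and comprehension match the intended levels."""
        return intended == measured

    # Hypothetical example: a display meant to inform without interrupting.
    intended = IRCProfile(interruption=LOW, reaction=LOW, comprehension=HIGH)
    measured = IRCProfile(interruption=LOW, reaction=LOW, comprehension=HIGH)
    print(supports_intent(intended, measured))  # True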

2.2.1 Analytical Methods
Analytical methods show great promise for ensuring that formative evaluation is actually completed, rather than merely acknowledged, in the software life cycle. These methods provide efficient and effective usability results [70].
The alternative usually involves costly user studies, which are difficult to perform and lengthen the design phase of most interface development projects. It is for these reasons that we focus on analytical methods, specifically heuristics.
Heuristic methods were chosen for this research for two reasons.
First, these methods are considered “discount” methods because they require minimal resources for the usability problems they uncover [70].
Second, these methods require only system mock-ups or screen shots for evaluation, which makes them well suited to formative evaluation. These are strong arguments for extending the approach to additional application areas.

2.2.2 Heuristic Evaluation
A popular evaluation method, in both academia and industry, is heuristic evaluation.
Heuristics are simple, fast approaches to assessing usability [70]. Expert evaluators visually inspect an interface to identify problems related to a set of guidelines (heuristics): wherever the interface fails to adhere to a given heuristic, there is typically a usability problem. Studies of heuristics have shown them to be effective (in terms of numbers of problems found) and efficient (in terms of cost to perform) [48, 50].
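One simple way to picture the output of such an inspection (purely an illustration; the heuristic names below are shortened placeholders, not any specific published set) is a list of problem reports tied to the heuristic each one violates:

    # Illustrative sketch only: recording heuristic-evaluation findings as
    # (evaluator, violated heuristic, problem description) tuples and tallying them.
    from collections import Counter

    reports = [
        ("E1", "visibility of system status", "no feedback after selecting an item"),
        ("E2", "visibility of system status", "no feedback after selecting an item"),
        ("E2", "consistency and standards", "two different icons trigger the same action"),
        ("E3", "error prevention", "content can be dismissed accidentally"),
    ]

    problems_per_heuristic = Counter(heuristic for _, heuristic, _ in reports)
    unique_problems = {(heuristic, desc) for _, heuristic, desc in reports}
    print(problems_per_heuristic)
    print(f"{len(unique_problems)} unique problems from {len(reports)} reports")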
Some researchers have illustrated difficulties with heuristic evaluation. Cockton & Woolrych suggest that heuristics should be used less in evaluation, in favor of empirical evaluations involving users [21]. Their arguments revolve around discrepancies among different evaluators and the low number of major problems that are found through the technique. Gray & Salzman also point out this weakness in [32].
Despite these objections, heuristic evaluation methods, particularly Nielsen’s, remain popular for their “discount” [70] approach to usability evaluation. Several recent efforts adapt heuristic approaches to specific areas.
Baker et al. report on adapting heuristic evaluation to groupware systems [5]. They show that applying heuristic evaluation methods to groupware systems is effective and efficient for formative usability evaluation.
Mankoff et al. actually compare an adapted set of heuristics to Nielsen’s original set [56]. They studied ambient displays (which roughly correspond to the systems classified as ambient displays in the IRC framework) with both sets of heuristics and determined that their adapted set is better suited to ambient displays.
The heuristic usability evaluation method will be investigated in this research, but with different forms of heuristics: some adapted specifically to large screen information exhibits, others geared towards more general interface types (like generic notification systems or interfaces in general).
The focus of our work is to create a new set of heuristics, grounded in critical parameters, that is tailored to the LSIE system class.

2.2.3 Comparing UEMs
Recent examples of work striving to compare heuristic approaches to other UEMs (like lab-based user testing) were presented at the 46th Annual Meeting of the Human Factors and Ergonomics Society.
Chattratichart and Brodie report on a comparison study of heuristic methods [16]. They extended heuristic evaluation (based on Nielsen’s) with a small set of content areas that served to focus the evaluation, thus producing more reliable results. It should also be noted that the evaluators’ subjective opinions favored the original approach over the extended one; the added complexity of grouping problems into content areas is the speculated cause of this finding [16].
Tan and Bishu compared heuristic evaluation to user testing [90]. They focused their work on web page evaluation and found that heuristic evaluation found more problems, but that the two techniques found different classes of problems. This means that these two methods are difficult to compare since the resulting problem lists are so different (like comparing apples to oranges). This difficulty in comparing analytical to empirical methods has been debated (see Human Computer Interaction 13(4) for a great summary of this debate) before and this particular work brings it to light in a more current example.

There has been some work on the best ways to compare UEMs. These studies are often limited to a specific area within HCI.
For example, Lavery et al. compared heuristics and task analysis in the domain of software visualization [52]. Their work resulted in development of problem reports that facilitate comparison of problems found with different methods. Their comparisons relied on effectiveness, efficiency, and validity measures for each method.
Others have also pointed out that effectiveness, efficiency, and validity are desirable measures for comparing UEMs (beyond simple numbers of usability problems obtained through the method) [40, 21].
Hartson et al. further put forth thoroughness, validity, reliability, and downstream utility as measures for comparing usability evaluation methods [40].
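As a rough sketch of how such measures can be operationalized (following the definitions commonly attributed to Hartson et al. [40]; the counts below are hypothetical, and the set of “real” problems must come from some independent standard):

    # Sketch of common UEM comparison measures (hypothetical numbers):
    # thoroughness  = real problems found / real problems that exist
    # validity      = real problems found / total problems reported
    # effectiveness = thoroughness * validity
    def thoroughness(real_found: int, real_existing: int) -> float:
        return real_found / real_existing

    def validity(real_found: int, total_reported: int) -> float:
        return real_found / total_reported

    def effectiveness(real_found: int, real_existing: int, total_reported: int) -> float:
        return thoroughness(real_found, real_existing) * validity(real_found, total_reported)

    # A method reports 20 problems, 12 of which are real, out of 30 known real problems.
    print(thoroughness(12, 30), validity(12, 20), effectiveness(12, 30, 20))  # 0.4 0.6 0.24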

Chapter 3
Background and Motivation

However, there are many different types of usability evaluation methods one could employ to test a design, and it is unclear which ones serve best for this system class (large screen information exhibits).
One important variation in methods is whether to use an interface-specific tool or a generic tool that applies to a broad class of systems.
This preliminary study investigates the tradeoffs between these two approaches (generic and specific) for evaluating LSIEs, by applying both types of evaluation to example LSIE systems.
This work provides the motivation and direction for the creation, testing, and use of a new set of heuristics tailored to the LSIE system class.

3.2 Assessing Evaluation Methods
Specific evaluation tools are developed for a single application, and apply solely to the system being tested (we refer to this as a per-study basis).
Many researchers use this approach, creating evaluation metrics, heuristics, or questionnaires tailored to the system in question (for example, see [5, 56]). These tools seem advantageous because they provide fine-grained insight into the target system, yielding detailed redesign solutions. However, filling immediate needs is costly: for each system to be tested, a new evaluation method must be designed (by designers or evaluators), implemented, and used in the evaluation phase of software development.

In contrast, system-class evaluation tools are not tailored to a specific system and tend to focus on higher level, critical problem areas that might occur in systems within a common class.
These methods are created once (by usability experts) and used many times in separate evaluations. They are desirable for allowing ready application, promoting comparison between different systems, benchmarking system performance measures, and recognizing long-term, multi-project development progress.
However, using a system-class tool often means evaluators sacrifice focus on important interface details, since not all of the system aspects may be addressed by a generic tool. The appeal of system-class methods emerges over the long term, namely through low cost and high benefit.

We conducted an experiment to determine the benefits of each approach in supporting a claims analysis, a key process within the scenario-based design approach [15, 77]. In a claims analysis, an evaluator makes claims about how important interface features will impact users.
Claims can be expressed as tradeoffs, conveying upsides or downsides of interface aspects like supported or unsupported activities, use of metaphors, information design choices (use of color, audio, icons, etc.), or interaction design techniques (affordances, feedback, configuration options, etc.). These claims capture the psychological impacts specific design decisions may have on users.
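As an illustration only (the feature and effects below are hypothetical, not taken from the dissertation), a claim can be recorded as a design feature together with its hypothesized upsides and downsides:

    # Illustrative sketch only: a claim as a feature with hypothesized
    # upsides (+) and downsides (-) for users.
    from dataclasses import dataclass, field

    @dataclass
    class Claim:
        feature: str
        upsides: list = field(default_factory=list)
        downsides: list = field(default_factory=list)

    ticker_claim = Claim(
        feature="scrolling news ticker on a large public display",
        upsides=["+ keeps passers-by aware of new items without demanding attention"],
        downsides=["- motion may distract users engaged in a primary task",
                   "- a slow scroll rate delays access to wanted items"],
    )
    print(ticker_claim.feature, len(ticker_claim.upsides), len(ticker_claim.downsides))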

3.3 Motivation from Prior Work
UEM research efforts have developed high level, generic evaluation procedures, a notable example being Nielsen’s heuristics [70].
Heuristic evaluation has been embraced by practitioners because of its discount approach to assessing usability. With this approach (which involves identifying usability problems that fall into nine general, “most common” problem areas), three to five expert evaluators can uncover roughly 70% of an interface’s usability problems.
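That figure is usually traced to a simple problem-discovery model (an assumption on my part; the chapter does not state the model here), in which each evaluator independently finds a fixed proportion of the problems:

    # Sketch of the commonly cited problem-discovery model (an assumption, not
    # stated in this chapter): with a per-evaluator discovery rate lam, i
    # evaluators are expected to find a proportion 1 - (1 - lam)**i of problems.
    def proportion_found(i: int, lam: float = 0.31) -> float:
        return 1 - (1 - lam) ** i

    for i in (1, 3, 5):
        print(i, round(proportion_found(i), 2))
    # With lam around 0.31, three evaluators find roughly 67% of the problems
    # and five find roughly 84%, hence the usual 3-5 evaluator recommendation.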
However, the drawbacks to this approach (and most generic approaches) are evident in the need to develop more specific versions of heuristics for particular classes of systems.
For example, Mankoff et al. created a modified set of heuristics for ambient displays [56]. These displays differ from regular interfaces in that they often reside off the desktop, incorporating parts of the physical space in their design, hence necessitating a more specific approach to evaluation. They arrived at the new set by eliminating some heuristics from Nielsen’s original set, modifying the remaining ones to reflect ambient wording, and adding five new heuristics [56]. However, they do not report the criteria used in eliminating the original heuristics, the reasons for the new wordings, or how they arrived at the five new heuristics. They proceeded to compare this new set of heuristics to Nielsen’s original set and found that the more specific heuristics provided better usability results.

Similar UEM work dealt with creating modified heuristics for groupware systems [5]. In this work, Baker et al. modified Nielsen’s original set to more closely match the user goals and needs associated with groupware systems, drawing on prior groupware system models for guidance: the Locales Framework [35] and the mechanics of collaboration [38]. However, they do not describe how these models informed the new heuristics or how they were applied. From the comparison, they found that the application-class-specific set of heuristics produced better results than the general set (Nielsen’s).

Both of these studies suggest that system-class-specific heuristics are more desirable for formative evaluation. However, neither adequately describes the creation process; it appears that the researchers simply modified Nielsen’s heuristics to obtain the new sets.
Unfortunately, it is not clear how this modification occurred. Did the researchers base the changes on important user goals for the system, as determined through critical parameters for the system class? Or was the modification based on guesswork or simple “this seems important for this type of system” style logic?


Source: Somervell, Jacob. Developing Heuristic Evaluation Methods for Large Screen Information Exhibits Based on Critical Parameters. [Dissertation, PhD in Computer Science and Applications] Virginia Polytechnic Institute and State University. June 22, 2004.
