Thursday, November 5, 2009

Nov 5,6 - Baker, Evaluation Methodology (MSc thesis)

Heuristic Evaluation of Shared Workspace Groupware
Chapter 4
Evaluation Methodology


Having formulated two sets of groupware heuristics from two inter-related frameworks in Chapter 3, the next logical task in my research is to validate these heuristics. To that end, this chapter describes the two-step methodology used to carry out this objective.

The first step was a pilot study whereby the groupware heuristics were reviewed and subsequently modified prior to conducting the main research study, the second step. The main research study was set up to mirror the methodology and terminology employed by Nielsen to validate his heuristics (Nielsen and Molich 1990, Nielsen 1992).

For our purposes, two groups of inspectors with varying degrees of expertise in HCI and CSCW evaluated two groupware systems: a toy groupware editor called GroupDraw (Roseman and Greenberg 1994), and a very substantial commercial groupware system called Groove (www.groove.net). Their goal was to record as many usability problems as possible, identifying each as a violation of one or more of the groupware heuristics.

4.1 Objective

As per my research goals (see Section 1.4), the objective of validating the heuristics is to:
“Demonstrate that the adapted heuristic evaluation for groupware remains a ‘discount’ usability technique by analyzing the ability of inspectors to identify problems in collaborative applications”.
As a means to execute the stated objective, I revisit Nielsen’s motivations for traditional heuristic evaluation, which he designed as a usability engineering methodology that can be done cheaply, quickly, and produce useful results (Nielsen and Molich 1990, Mack and Nielsen 1994).

To briefly elaborate the three key terms:
1. Cheaply.
Nielsen’s heuristic evaluation places a low demand on resources. Nonexperts can carry out inspections; therefore, it is not confined to using more costly experts (the caveat: experts will produce better results) (Nielsen 1992).
This is practical because heuristic evaluation is, in practice, relatively easy to learn and to apply (Mack and Nielsen 1994). Consequently, extensive resources are not required for training. Finally, heuristic evaluation requires only the evaluator, the interface, paper, and pencil. No special equipment or facilities are necessary.

2. Quickly.
Heuristic evaluations do not require advance planning nor do they require a large amount of time to conduct (Nielsen and Molich 1990). Typically, the evaluation of an interface can be done within an hour or two for a simple interface and somewhat longer for more complex interfaces.

3. Useful results.
Despite performing heuristic evaluations with less control and formality than would be entailed by formal user testing, this technique has been shown to produce useful results (Nielsen and Molich 1990, Mack and Nielsen 1994). As discussed in Chapter 2, only a few evaluators (3-5) can discover a majority (~75%) of an interface’s usability problems, of varying severity. Fixing these problems can in turn improve the usability of products.
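As an illustrative aside (not part of the original studies), this rule of thumb can be sketched with the simple problem-discovery model of Nielsen and Landauer (1993), in which each additional evaluator independently finds a fixed proportion of an interface’s problems. The Python sketch below assumes their reported average single-evaluator detection rate of roughly 31%; the actual rate varies considerably from project to project.

# Sketch of Nielsen and Landauer's (1993) problem-discovery model:
#   found(i) = N * (1 - (1 - p) ** i)
# where N is the total number of usability problems in the interface,
# p is the probability that a single evaluator finds any given problem
# (their reported average is roughly 0.31), and i is the number of evaluators.

def proportion_found(evaluators: int, p: float = 0.31) -> float:
    """Expected fraction of all problems found by a group of evaluators."""
    return 1.0 - (1.0 - p) ** evaluators

if __name__ == "__main__":
    for i in (1, 3, 5, 10):
        print(f"{i:2d} evaluators -> {proportion_found(i):.0%} of problems found")
    # Roughly 67% at 3 evaluators and 84% at 5, which is consistent with the
    # claim that a few evaluators uncover the majority of usability problems.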

Within my main research study, I look to validate heuristics applied to groupware evaluation to see if it remains a discount usability technique that is quick and cheap to perform while producing useful results.

To gauge this cost-effectiveness, I conducted the evaluation in the following manner:
• I chose as inspectors people with knowledge of human computer interaction, but limited knowledge of Computer Supported Cooperative Work.
• Prior to performing their evaluations, inspectors received a basic one-hour training lecture and a packet of written materials (see Appendix A) explaining the heuristics.
• Inspectors self-selected the length of time and amount of effort they would put into completing the evaluation.

4.2 Pilot study

The groupware heuristics used by the inspectors to perform their evaluation of the two systems are presented in Appendix A.5 and A.6. A quick glance over the mechanics of collaboration heuristics reveals that their explanation and format differ from the same heuristics presented in the previous chapter.
Due to time constraints and my secondary focus on the Locales Framework heuristics, the pilot study was conducted with only the mechanics of collaboration heuristics.

4.2.1 Participants

Three professional HCI and groupware practitioners with 5 to 17 years of relevant experience were recruited to review the mechanics of collaboration heuristics. All three were also currently engaged in a project that involved re-designing the user interface for a real-time, shared workspace collaborative application.

4.2.2 Method

Each participant received a copy of the mechanics of collaboration heuristics similar to what is found in Chapter 3.
Each was asked to address the following two questions:
1. Do I understand the principles of the heuristic?
2. Would I be able to apply this heuristic as part of a heuristic evaluation?

The objective was to gain informal feedback regarding the comprehensibility of each heuristic and its suitability to be applied in the context of an evaluation. Feedback was gathered in the form of written comments on the original handouts and verbal comments recorded from interviews with each individual after their review.

4.2.3 Results

Overall, the reviewers’ feedback was positive regarding the content of each mechanics of collaboration heuristic. They expressed that beginning a heuristic with its supporting theory—derived from studies of face-to-face interactions—helped to establish its motivation. In addition, the reviewers considered the heuristics to be practical since they included examples of techniques employed by groupware systems to comply with each one.
Despite the positive feedback, two main areas of concerns surfaced in response to the reviewers answering the two aforementioned questions.
The first area of concern surrounds the heuristics’ comprehension. Each reviewer had to re-read each heuristic (in some cases, several times) in order to comfortably understand the concepts.
The second area of concern centered on the ability of the reviewers to apply the heuristics as part of a groupware evaluation: the reviewers concluded that it would be awkward to use the heuristics in their current format to effectively evaluate groupware systems.

4.2.4 Discussion

Prior to conducting the main research study, the mechanics of collaboration heuristics (and subsequently the Locales Framework heuristics) had to be revised to address the reviewers’ concerns. I did not want the bottleneck to ‘good’ results to be the inability of the inspectors to understand and apply the heuristics.

To address the first concern, all of the heuristics were re-written with the intent of making them an ‘easier read’. Domain-specific terms were replaced with more common terms. Sentences were shortened. In addition, all new concepts introduced by the heuristics were clearly defined and spelled out.

The second concern regarding the practicality of the heuristics in their current format raised an interesting issue, one that I had not considered up until this point. It is one thing to have all the pertinent theory encapsulated in a set of heuristics, but it is another to ensure that the heuristics are packaged as a practitioner’s training tool to facilitate conducting a heuristic evaluation.
The naïve approach is to structure these heuristics as a checklist whereby a practitioner is presented with a series of items that can be systematically checked off during an inspection.

In response to the reviewers’ comments, the mechanics of collaboration heuristics were re-structured in an attempt to facilitate their role as a tool.
The first section of each heuristic was divided into two new sections “Theory” and “What this means for groupware”.
“Theory” provides the underlying principles behind each heuristic and is not critical to performing an evaluation.
The intent of the next section, “What this means for groupware”, is to provide all the pertinent information that an inspector should consult when performing a heuristic evaluation.
The final section, “Techniques used in groupware”, remained essentially the same as the “Typical groupware support” section from the earlier version of the heuristics.

4.2.5 Sanity check

To ensure that all comments had been adequately addressed, the same three professionals reviewed the revised mechanics of collaboration heuristics. All were comfortable that their comments had been addressed. They viewed the new heuristics as easier to read and understand. This set of heuristics and its training material is presented in Appendix A.5.

4.2.6 Locales Framework heuristics

Although the pilot study was conducted with the mechanics of collaboration heuristics, the findings were transferable to the Locales Framework heuristics. Consequently, the latter were re-written to address the issues in a similar manner to the mechanics of collaboration heuristics.
The only difference between the two sets of heuristics is that the locales heuristics lack a “Techniques used in groupware” section.

4.3 Main research study

Subsequent to revising both sets of heuristics in response to the pilot study findings, my next step was to see if these adapted heuristics could be used for “heuristic evaluation of groupware and remain a ‘discount’ usability technique”.
To do this, I analyze the ability of inspectors to identify problems in two collaborative applications.

4.3.1 Participants

To assess the practicality of the groupware heuristics, I looked at the ability of individuals with minimal training in HCI but with varying levels of knowledge in CSCW to apply them. We recruited several groups fitting these criteria and validated our demographics through a questionnaire.

Participants were categorized as two evaluator types: novice and regular.
• Novice evaluators were 16 students in their 3rd or 4th year of a University computer science program. All had completed one full course in HCI and were currently enrolled in a second senior-level advanced undergraduate HCI course. When asked, the majority indicated some experience with designing and evaluating graphical user interfaces. However, few had any substantive knowledge regarding CSCW interface design principles. Consequently, the group consisted of “novices” with respect to CSCW usability but not with respect to computers and HCI.
• Regular specialists were 2 professors and 9 students working on their graduate degrees in computer science. All had a history of research, applied work, and/or class work in groupware and CSCW, as well as conventional user interface design and evaluation. These individuals were labeled ‘regular specialists’ since, in contrast to the former group, they were knowledgeable of groupware fundamentals.

Except for the professors, all participants were students. Due to their limited availability, professional HCI/CSCW practitioners from industry were not employed as part of our research.

4.3.2 Materials

To help inspectors conduct their heuristic evaluation, we gave them a training packet, workstations, and the two groupware systems.

Training packet.
The training packet (Appendix A) consisted of two sets of groupware heuristics; one based on the mechanics of collaboration (A.5) and the other on the Locales Framework (A.6). These were revised versions in accordance with the pilot study findings.
The packet also contained the following forms:
• a consent form outlining the inspectors’ participation in the research study (A.1);
• a background information questionnaire to help ascertain each inspector’s knowledge and experience in the areas of HCI and CSCW (A.2);
• pre- and post-training feedback forms containing questions assessing the ease with which the inspectors were able to comprehend the heuristics before and after their training session (A.4); and
• problem reports designed in accordance with Nielsen (1994a) and Cox (1998) and used by the inspectors to capture their usability problems (A.7).

Workstations.
The novice evaluators conducted the heuristic evaluations of the two groupware systems on four PC workstations located in a single row in the undergraduate computer lab.
Installation instructions (Appendix A.3) were provided on how to set up the software.

Groupware systems.
As part of the study, the participants evaluated two quite different shared visual workspaces contained in two real-time groupware systems: GroupDraw and Groove.
GroupDraw is an object-oriented ‘toy’ drawing program built to show people how to program the GroupKit groupware toolkit (Roseman and Greenberg 1996).
Groove is a virtual space for real-time, small group interactions. Users create “shared spaces” to communicate and collaborate with one another. Changes made to a shared space by one participant are automatically synchronized with all other computers.
Its functionality includes:
1. Communication tools – live voice over the Internet, instant messaging, text-based chat, and threaded discussion.
2. Content sharing tools – shared files, pictures, and contacts.
3. Joint activity tools – co-Web browsing, multiple-user drawing and editing, group calendar.

4.3.3 Method

The heuristic evaluation of the GroupDraw and Groove interfaces followed Nielsen’s standard recommendations (refer to chapter 2 for details).
This involved administering an orientation session to each group of inspectors prior to the evaluation process.
However, I did not conduct a debriefing session due to the geographical separation between the inspectors and the researcher (myself).

Orientation session.
Prior to the session, each participant was asked to:
• sign the consent form (A.1);
• fill-out the background information questionnaire (A.2);
• read the detailed written description of the groupware heuristics (A.5 and A.6); and
• complete the pre-training feedback form (A.4).
Given the number of inspectors (27 in total) and their locations (22 in Calgary and 5 in Saskatoon), I conducted three separate 90-minute orientation sessions, one for each audience.

During each orientation session, I:
• collected the signed consent form, the completed background questionnaire, and the pre-training feedback form from each inspector;
• handed each inspector a blank post-training feedback form (identical to the pre-training feedback form) and an ample supply of blank problem reports for their heuristic evaluation of the systems;
• conducted a one-hour training session on the proposed groupware heuristics, including a review of the theory supporting each heuristic, how to apply the heuristics during an evaluation, and real-time groupware examples illustrating compliance and non-compliance with each heuristic;
• had participants fill out the post-training feedback form so that I could gauge how well they comprehended the groupware heuristics after receiving the training;
• provided the inspectors with an overview of the two groupware systems under test, GroupDraw and Groove;
• asked the inspectors to evaluate, for GroupDraw, only the shared workspace (Figure 4.1 bottom) and the Notes functionality (Figure 4.2) as the means for communicating with one another, and, for Groove, the Outliner tool (Figure 4.3) as well as the text chat and audio link; and
• gave general instructions on the process for conducting the heuristic evaluation: the novices were to inspect both systems with only the mechanics of collaboration heuristics, while the regular specialists were to assess the groupware interfaces with both the mechanics of collaboration and Locales Framework heuristics.

Evaluation process.
Each inspector dictated when, where, and how to perform the evaluation. As with traditional heuristic evaluation, they could use the heuristics to systematically review all the functionality. Alternatively, they could walk through an imaginary task of their own making and verify how each step of the task complied with the heuristics.
Inspectors had complete control over the length of time and amount of effort they put into completing the evaluation.
In some instances, the evaluation was performed in pairs; at other times it involved four or five inspectors working concurrently.

4.3.4 Data collection

For each problem uncovered, the inspectors completed a separate problem report by recording a description of the problem, the violated heuristic, a severity rating, and an (optional) solution to the problem. They judged a ‘major’ severity rating as one that represented a significant obstacle to effective collaboration, while a ‘minor’ rating was one that could be worked around by the participant. A blank problem report is found in Appendix A.7.
With respect to formulating severity ratings for each usability problem, Nielsen (1994a) states that evaluators have difficulty performing this step during the evaluation process since they are more focused on finding new usability problems. In addition, each evaluator will not find all the usability problems in the system; therefore, the severity ratings will be incomplete since they only reflect those problems found by the evaluator.
These original problem reports form the raw data for my analysis in the next chapter (see Appendix B for all problem reports).
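To make the structure of this raw data concrete, the sketch below shows one way a problem report could be represented for analysis. This is a hypothetical illustration only: the class name, field names, and sample entries are my own inventions and are not taken from the actual study materials or findings.

from dataclasses import dataclass
from typing import Optional

# Hypothetical record mirroring the fields captured on each paper problem
# report: a free-text description, the heuristic judged to be violated,
# a severity rating ('major' = significant obstacle to collaboration,
# 'minor' = can be worked around), and an optional proposed solution.
@dataclass
class ProblemReport:
    inspector_id: str
    system: str                        # "GroupDraw" or "Groove"
    description: str
    violated_heuristic: str
    severity: str                      # "major" or "minor"
    proposed_solution: Optional[str] = None

# Invented example entries, for illustration only.
reports = [
    ProblemReport("N01", "GroupDraw", "Hard to tell which object another user is editing",
                  "<heuristic name>", "major"),
    ProblemReport("R03", "Groove", "Chat pane obscures the shared outline while typing",
                  "<heuristic name>", "minor",
                  proposed_solution="Dock the chat pane beside the Outliner"),
]

major_count = sum(1 for r in reports if r.severity == "major")
print(f"{major_count} of {len(reports)} reported problems were rated 'major'")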

4.4 Conclusion

This chapter reviewed the methodology for the pilot study and the subsequent changes to the groupware heuristics in preparation for the main study. Next, the main research study was introduced via a detailed description of its methodology. Of primary importance is that we used people that we felt were reasonable approximations of the actual practitioners we would expect to do groupware evaluations, and that our methodology echoed Nielsen’s traditional heuristic evaluation methodology.


Source: Baker, Kevin F. Heuristic Evaluation of Shared Workspace Groupware based on the Mechanics of Collaboration. [Thesis, M.Sc.] University of Calgary, Calgary, Alberta, Canada. May 2002.
