Saturday, October 31, 2009

Nov 1 – Somervell’s dissertation

Chapter 2
Literature Review


developing new heuristics for the LSIE system class, based on critical parameters.

Critical Parameters
William Newman put forth the idea of critical parameters for guiding design and strengthening evaluation in [68] as a solution to the growing disparity between interactive system design and separate evaluation.
For example, consider airport terminals, where the critical parameter would be flight capacity per hour per day [68]. All airport terminals can be assessed in terms of this capacity, and improving that capacity would invariably mean we have a better airport. Newman argues that by establishing parameters for application classes, researchers can begin establishing evaluation criteria, thereby providing continuity in evaluation that allows us “to tell whether progress is being made” [68].
In addition, Newman argues that critical parameters can actually provide support for developing design methodologies, based on the most important aspects of a design space. This ability separates critical parameters from traditional usability metrics. Most usability metrics, like “learnability” or “ease of use”, only probe the interaction of the user with some interface, focusing not on the intended purpose of the system but on what the user can do with the system.
Critical parameters focus on supporting the underlying system functions that allow one to determine whether the system performs its intended tasks.
Indeed, the connection between critical parameters and traditional usability metrics can be described as input and output of a “usability” function. Critical parameters are used to derive the appropriate usability metrics for a given system, and these metrics are related to the underlying system goals through the critical parameters.

Critical Parameters for Notification Systems
In [62], we embraced Newman’s view of critical parameters and established three parameters that define the notification systems design space.
Interruption, reaction, and comprehension are three attributes of all notification systems that allow one to assess whether the system serves its intended use.
Furthermore, these parameters allow us to assess the user models and system designs associated with notification systems in terms of how well a system supports these three parameters.
High and low values of each parameter capture the intent of the system, and allow one to measure whether the system supports these intents.
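Comment: a minimal sketch of my own (the target profile and measurements are purely hypothetical, not from the dissertation) of how a target IRC profile could be checked against measured parameter values:

```python
# My own illustrative sketch of the IRC (interruption, reaction, comprehension) idea.
# The target profile and the measured numbers are hypothetical, for illustration only.

TARGET_PROFILE = {"interruption": "low", "reaction": "low", "comprehension": "high"}

def classify(value, threshold=0.5):
    """Map a 0..1 measurement onto a coarse 'low'/'high' label."""
    return "high" if value >= threshold else "low"

def supports_intent(measured, target=TARGET_PROFILE):
    """Check, per parameter, whether the measured value matches the intended profile."""
    return {p: classify(v) == target[p] for p, v in measured.items()}

# Hypothetical measurements from a user study of one display:
measured = {"interruption": 0.2, "reaction": 0.3, "comprehension": 0.8}
print(supports_intent(measured))
# {'interruption': True, 'reaction': True, 'comprehension': True}
```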

2.2.1 Analytical Methods
Analytical methods show great promise for ensuring formative evaluation is completed, and not just acknowledged in the software life cycle. These methods provide efficient and effective usability results [70].
The alternative usually involves costly user studies, which are difficult to perform and lengthen the design phase of most interface development projects. It is for these reasons that we focus on analytical methods, specifically heuristics.
Heuristic methods are chosen in this research for two reasons.
One, these methods are considered “discount” methods because they require minimal resources for the usability problems they uncover [70].
Two, these methods only require system mock-ups or screen shots for evaluation, which makes them desirable for formative evaluation. These are strong arguments for developing this method for application in multiple areas.

2.2.2 Heuristic Evaluation
A popular evaluation method, both in academia and industry, is heuristic evaluation.
Heuristics are simple, fast approaches to assessing usability [70]. Expert evaluators visually inspect an interface to determine problems related to a set of guidelines (heuristics). These experts identify problems based on whether or not the interface fails to adhere to a given heuristic. When there is a failure, there is typically a usability problem. Studies of heuristics have shown them to be effective (in terms of numbers of problems found) and efficient (in terms of cost to perform) [48, 50].
Some researchers have illustrated difficulties with heuristic evaluation. Cockton & Woolrych suggest that heuristics should be used less in evaluation, in favor of empirical evaluations involving users [21]. Their arguments revolve around discrepancies among different evaluators and the low number of major problems that are found through the technique. Gray & Salzman also point out this weakness in [32].
Despite these objections, heuristic evaluation methods, particularly Nielsen’s, are still popular for their “discount” [70] approach to usability evaluation. Several recent works deal with adapting heuristic approaches to specified areas.
Baker et al. report on adapting heuristic evaluation to groupware systems [5]. They show that applying heuristic evaluation methods to groupware systems is effective and efficient for formative usability evaluation.
Mankoff et al. actually compare an adapted set of heuristics to Nielsen’s original set [56]. They studied ambient displays (which are similar to the systems that would be classified as ambient displays in the IRC framework) with both sets of heuristics and determined that their adapted set is better suited to ambient displays.
The heuristic usability evaluation method will be investigated in this research, but with different forms of heuristics, some adapted specifically to large screen information exhibits, others geared towards more general interface types (like generic notification systems or simply interfaces).
The focus of our work is to create a new set of heuristics, by reliance on critical parameters, that is tailored to the LSIE system class.

2.2.3 Comparing UEMs
Recent examples of work that strives to compare heuristic approaches to other UEMs (like lab-based user testing) include work shown at the 46th Annual Meeting of the Human Factors and Ergonomics Society.
Chattratichart and Brodie report on a comparison study of heuristic methods [16]. They extended heuristic evaluation (based on Nielsen’s) with a small set of content areas. These content areas served to focus the evaluation, thus producing more reliable results. It should also be noted that subjective opinions about the new method favored the original approach over the new approach. The added complexity of grouping problems into the content areas is the speculated cause of this finding [16].
Tan and Bishu compared heuristic evaluation to user testing [90]. They focused their work on web page evaluation and found that heuristic evaluation found more problems, but that the two techniques found different classes of problems. This means that these two methods are difficult to compare since the resulting problem lists are so different (like comparing apples to oranges). This difficulty in comparing analytical to empirical methods has been debated before (see Human Computer Interaction 13(4) for a great summary of this debate), and this particular work brings it to light in a more current example.

There has been some work on the best ways to compare UEMs. These studies are often limited to a specific area within HCI.
For example, Lavery et al. compared heuristics and task analysis in the domain of software visualization [52]. Their work resulted in development of problem reports that facilitate comparison of problems found with different methods. Their comparisons relied on effectiveness, efficiency, and validity measures for each method.
Others have also pointed out that effectiveness, efficiency, and validity are desirable measures for comparing UEMs (beyond simple numbers of usability problems obtained through the method) [40, 21].
Hartson et al. further put forth thoroughness, validity, reliability, and downstream utility as measures for comparing usability evaluation methods [40].
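Comment: a rough sketch of my own of how such comparison measures can be computed, assuming the standard definitions of thoroughness and validity from Hartson et al. [40]; all the counts below are invented example numbers:

```python
# Sketch of UEM comparison metrics (definitions as in Hartson et al. [40]; counts invented).

def thoroughness(real_found, real_existing):
    """Proportion of the real usability problems in the interface that the UEM found."""
    return real_found / real_existing

def validity(real_found, total_reported):
    """Proportion of the UEM's reported problems that are real problems."""
    return real_found / total_reported

def effectiveness(real_found, real_existing, total_reported):
    """Combined figure of merit: thoroughness multiplied by validity."""
    return thoroughness(real_found, real_existing) * validity(real_found, total_reported)

# Example: a heuristic evaluation reports 20 problems, 12 of which are real,
# against 30 real problems known to exist in the interface.
print(thoroughness(12, 30))       # 0.4
print(validity(12, 20))           # 0.6
print(effectiveness(12, 30, 20))  # 0.24 (0.4 * 0.6)
```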

Chapter 3
Background and Motivation

However, there are many different types of usability evaluation methods one could employ to test design, and it is unclear which ones would serve as the best for this system class (large screen information exhibits).
One important variation in methods is whether to use an interface-specific tool or a generic tool that applies to a broad class of systems.
This preliminary study investigates tradeoffs of these two approaches (generic or specific) for evaluating LSIEs, by applying two types of evaluation to example LSIE systems.
This work provides the motivation and direction for the creation, testing, and use of a new set of heuristics tailored to the LSIE system class.

3.2 Assessing Evaluation Methods
Specific evaluation tools are developed for a single application, and apply solely to the system being tested (we refer to this as a per-study basis).
Many researchers use this approach, creating evaluation metrics, heuristics, or questionnaires tailored to the system in question (for example see [5, 56]). These tools seem advantageous because they provide fine grained insight into the target system, yielding detailed redesign solutions. However, filling immediate needs is costly—for each system to be tested a new evaluation method needs to be designed (by designers or evaluators), implemented, and used in the evaluation phase of software development.

In contrast, system-class evaluation tools are not tailored to a specific system and tend to focus on higher level, critical problem areas that might occur in systems within a common class.
These methods are created once (by usability experts) and used many times in separate evaluations. They are desirable for allowing ready application, promoting comparison between different systems, benchmarking system performance measures, and recognizing long-term, multi-project development progress.
However, using a system-class tool often means evaluators sacrifice focus on important interface details, since not all of the system aspects may be addressed by a generic tool. The appeal of system-class methods is apparent over a long-term period, namely through low cost and high benefit.

We conducted an experiment to determine the benefits of each approach in supporting a claims analysis, a key process within the scenario-based design approach [15, 77]. In a claims analysis, an evaluator makes claims about how important interface features will impact users.
Claims can be expressed as tradeoffs, conveying upsides or downsides of interface aspects like supported or unsupported activities, use of metaphors, information design choices (use of color, audio, icons, etc.), or interaction design techniques (affordances, feedback, configuration options, etc.). These claims capture the psychological impacts specific design decisions may have on users.

3.3 Motivation from Prior Work
UEM research efforts have developed high level, generic evaluation procedures, a notable example being Nielsen’s heuristics [70].
Heuristic evaluation has been embraced by practitioners because of its discount approach to assessing usability. With this approach (which involves identification of usability problems that fall into nine general and “most common problem areas”), 3-5 expert evaluators can uncover 70% of an interface’s usability problems.
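Comment: the “3-5 evaluators” figure is usually justified with the Nielsen and Landauer problem-discovery model; that model is my own aside here (not quoted in the dissertation), with the commonly cited lambda of about 0.31:

```python
# Nielsen & Landauer problem-discovery curve (my own aside, not from Somervell's text).
# lam is the average probability that one evaluator finds any given problem.

def proportion_found(n_evaluators, lam=0.31):
    """Expected proportion of usability problems found by n independent evaluators."""
    return 1 - (1 - lam) ** n_evaluators

for n in (1, 3, 5):
    print(n, round(proportion_found(n), 2))
# 1 0.31
# 3 0.67   (roughly the 70% figure cited above)
# 5 0.84
```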
However, the drawbacks to this approach (and most generic approaches) are evident in the need to develop more specific versions of heuristics for particular classes of systems.
For example, Mankoff et al. created a modified set of heuristics for ambient displays [56]. These displays differ from regular interfaces in that they often reside off the desktop, incorporating parts of the physical space in their design, hence necessitating a more specific approach to evaluation. They came up with the new set of heuristics by eliminating some from Nielsen’s original set, modifying the remaining heuristics to reflect ambient wording, and then adding five new heuristics [56]. However, they do not report the criteria used in eliminating the original heuristics, the reasons for using the new wordings, or how they came up with the five new heuristics. They proceeded to compare this new set of heuristics to Nielsen’s original set and found the more specific heuristics provided better usability results.

Similar UEM work dealt with creating modified heuristics for groupware systems [5]. In this work, Baker et al. modified Nielsen’s original set to more closely match the user goals and needs associated with groupware systems. They based their modification on prior groupware system models to provide guidance in modifying Nielsen’s heuristics. The Locales Framework [35] and the mechanics of collaboration [38] helped Baker et al. in formulating their new heuristics. However, they do not describe how these models helped them in their creation, nor how they were used. From the comparison, they found the more application class-specific set of heuristics produced better results compared to the general set (Nielsen’s).

Both of these studies suggest that system-class specific heuristics are more desirable for formative evaluation. However, the creation processes used in both are not adequately described. It seems that to obtain the new set of heuristics, all the researchers did was modify Nielsen’s heuristics.
Unfortunately, it is not clear how this modification occurred. Did the researchers base the changes on important user goals for the system, as determined through critical parameters for the system class? Or was the modification based on guesswork or simple “this seems important for this type of system” style logic?


Source: Somervell, Jacob. Developing Heuristic Evaluation Methods for Large Screen Information Exhibits Based on Critical Parameters. [Dissertation, PhD in Computer Science and Applications] Virginia Polytechnic Institute and State University. June 22, 2004.

Oct 31 - Somervell, Developing Heuristic Evaluation Methods for Large Screen Information Exhibits Based on Critical Parameters (PhD dissertation)

ABSTRACT
Evaluation is the key to effective interface design. It becomes even more important when the interfaces are for cutting edge technology, in application areas that are new and with little prior design knowledge. Knowing how to evaluate new interfaces can decrease development effort and increase the returns on resources spent on formative evaluation. The problem is that there are few, if any, readily available evaluation tools for these new interfaces.
This work focuses on the creation and testing of a new set of heuristics that are tailored to the large screen information exhibit (LSIE) system class. This new set is created through a structured process that relies upon critical parameters associated with the notification systems design space. By inspecting example systems, performing claims analysis, categorizing claims, extracting design knowledge, and finally synthesizing heuristics; we have created a usable set of heuristics that is better equipped for supporting formative evaluation.
Contributions of this work include: a structured heuristic creation process based on critical parameters, a new set of heuristics tailored to the LSIE system class, reusable design knowledge in the form of claims and high level design issues, and a new usability evaluation method comparison test. These contributions result from the creation of the heuristics and two studies that illustrate the usability and utility of the new heuristics.


1 Introduction

2 Literature Review

3 Background and Motivation

7 Discussion

8 Conclusion


But how would we go about creating an evaluation tool that applies to this type of system? Would we want to create a tool dedicated to this single system or would a more generic, system class level tool be a better investment of our time? Evidence from preliminary work suggests that system-class level evaluation tools hold the most promise for long-term performance benchmarking and system comparison, over more generic tools or even tools tailored for an individual system [85, 56, 5]. A system class level tool is situated more towards the specific side of the generality/specificity scale; yet, it is still generic enough to apply to many different systems within a class. So, again, how would we go about creating a new tool for this type of system? The key to successful evaluation tool creation is focusing on the user goals associated with the target system class. This requires an understanding of the system class, in terms of these critical user goals.

What is UEM?
UEMs are tools or techniques used by usability engineers to discover problems in the design of software systems, typically measuring performance against some usability metric (ease of use, learnability, etc).

What is Heuristic Evaluation?
Heuristic evaluation is a specific type of UEM in which expert usability professionals inspect a system according to a set of guidelines. This method is analytic in nature because the experts review a system (through prototypes or screen-shots) and try to discover usability issues from inspection and reflection upon the guidelines.
We need a specific tool, like heuristics, that can support formative evaluation of these displays.

Heuristics have been used throughout the HCI community for quick, efficient usability evaluation [66, 70, 69, 48, 40, 32, 56, 21].

Contributions of this work include:
• Critical parameter based creation of system class heuristics
We develop and use a new heuristic creation process that leverages critical parameters from the target system class. Researchers can now focus UEM development effort on a structured process that yields usable heuristics.
• Heuristics tailored to the LSIE system class
LSIE researchers and developers now have a new tool in their arsenal of evaluation methods. These heuristics focus on the unique user goals associated with the LSIE system class.
• LSIE system design guidance
In addition to the heuristics, we produced significant numbers of design tradeoffs from system inspection. These claims are useful to other system developers because the claims can be reused in disparate projects.
• UEM comparison tool
Through our efforts to compare the new heuristics to other existing alternatives, we developed a new comparison technique that relies upon expert inspection to provide a simplified method for calculating UEM comparison metrics.
• Deeper understanding of the generality vs. specificity tradeoff
Finally, we also provide more insight into the question of the level of specificity a UEM should have for a given system. We also find support for system-class specific UEMs, as other work has indicated.

The remainder of this document is organized as follows:
• Chapter 2 discusses appropriate literature and related work, situating our critical parameter based approach and providing motivation;
• Chapter 3 provides details on early studies that illustrate the need for an effective UEM creation method; it also illustrates the utility of claims analysis for uncovering problem sets;
• Chapter 4 describes the UEM creation process, including descriptions of the five LSIE systems (phase 1);
• Chapter 5 describes the comparison experiment, including discussion (phase 2);
• Chapter 6 describes three efforts to show the heuristic set produced in Chapter 4 is indeed useful and usable (phase 3);
• Chapter 7 provides a discussion of the implications of this work;
• and Chapter 8 provides detailed descriptions of the contributions and information on future work directions.


Source:
Somervell, Jacob. Developing Heuristic Evaluation Methods for Large Screen Information Exhibits Based on Critical Parameters. [Dissertation, PhD in Computer Science and Applications] Virginia Polytechnic Institute and State University. June 22, 2004.

Oct 31 - Ling, Advances in Heuristic Usability Evaluation Method (PhD dissertation)

Oct 31 – Ling's PhD dissertation

Heuristic evaluation is one of the most popularly used usability evaluation methods in both industry and academia. (Ling, 2005; Rosenbaum, 2000)
In a major survey conducted by Rosenbaum, Rohn and Hamburg (2000), heuristic evaluation was noted as the most used usability evaluation method.

Using the newly developed E-Commerce heuristic set (of usability criteria) resulted in finding a larger number of real usability problems than using Nielsen’s (1994d) heuristic set. (Ling 2005)

Sweeney, Maguire and Shackel (1993) used the concept of “approach to evaluation” to classify usability methods: user-based, theory-based and expert-based. Their framework reflects different data sources that form the basis of evaluation.

Mack and Nielsen (1994) stated four ways to evaluate user interfaces:
Automatically
Empirically
Formally
Informally.
The informal evaluation methods are also referred to as usability inspection methods.

Ling (2005) categorized usability evaluation methods into seven types.
1) Analytic theory-based methods – include: (a) cognitive task analysis; (b) goals, operators, methods and selection rules (GOMS) models; (c) user action notation (UAN).
GOMS is used to estimate expert task performance time (see the keystroke-level sketch after this list).
2) Expert evaluation methods – include formal usability inspections, cognitive walkthroughs, pluralistic walkthroughs, guideline reviews, heuristic evaluation, claim analysis.
Guideline methods apply a set of established guidelines and rules to the target system; they differ from heuristic evaluation because they use a larger number of guidelines than the number of heuristics used in heuristic evaluations. Hence, guideline methods require a lower level of expertise from the evaluators.
Comments: Maybe I could combine the concepts of heuristic evaluation and the guidelines review method to form a new method whereby the evaluators are guided by a large set of guidelines for each heuristic.
3) Observational evaluation methods – include direct observations, videos, computer logging, verbal protocols, cooperative observations, critical incident reports and ethnographic studies.
Collect data on what users do when they interact with an interface.
4) Survey evaluation methods – include questionnaires, interviews, focus group and user feedback.
Evaluators to ask users for their subjective views of the user interface of a system.
5) Experimental evaluation methods – include beta testing, think aloud method, constructive interactions, retrospective testing, coaching methods and performance measures.
Involve laboratory experiments to analyze user’s interaction with the system.
Think aloud method asks users to verbalize their thinking throughout the test.
6) Psycho-physiological measures of satisfaction or workload – collect physiological data, e.g. electrical activity in the brain, heart rate, blood pressure, pupil dilation, skin conductivity, level of adrenaline in blood, and the mental state of users in terms of satisfaction and mental workload.
Used for predicting the user’s workload and satisfaction level.
7) Automatic testing – use programs to capture automatically the critical measures of an interface, e.g. response time, HTML broken links.
Generate a list of low-level interface problems.
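Comment: a keystroke-level (KLM) sketch of my own of how GOMS-style time estimation works (see item 1 above); the operator times are the commonly quoted Card, Moran and Newell estimates, not values from Ling’s text:

```python
# Keystroke-level model sketch (my own aside; operator times are the usual
# Card, Moran & Newell estimates, not taken from Ling 2005).
OPERATOR_TIME = {
    "K": 0.2,   # press a key or button (average skilled typist)
    "P": 1.1,   # point with the mouse to a target
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_time(operators):
    """Estimate expert task time (seconds) as the sum of the operator times."""
    return sum(OPERATOR_TIME[op] for op in operators)

# Example: mentally prepare, point at a field, home to keyboard, type 5 characters.
print(predict_time(["M", "P", "H", "K", "K", "K", "K", "K"]))  # 3.85 seconds
```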

Ling (2005) described Nielsen’s heuristics and the heuristic evaluation method thoroughly.

Muller et al (1995, 1998) added four more “participatory” heuristics (in addition to Nielsen’s ten heuristics) and validated the new set, which aims to assess how well the interactive system meets the needs of its users and their work environment (Muller et al, 1995; Muller et al, 1998). They also added users as inspectors, alongside the expert inspectors traditionally used in heuristic evaluation. The four additional “participatory” heuristics are:
a) Respect the user and his/her skills
b) Promote a pleasurable experience with the system
c) Support quality work
d) Protect privacy.

Ling (2005) explained four drawbacks of the heuristic evaluation method.

Ling (2005) explained the factors affecting the results of heuristic evaluation.
a) Individual differences
b) Expertise level
c) Task scenario
d) Observation during evaluation
e) Heuristic set
f) Individual or group evaluations.

Domain Specific Heuristic Set:
Heuristic evaluation method was developed and applied mainly for single user, productivity-oriented desktop programs, which were the major computer applications in the early 1990s.
But with computer technologies getting more integrated into everyday life and new types of HCI emerging, Nielsen’s ten heuristics may not be able to cover usability issues in new computing systems.
For example, mobile systems need to address issues of changing context of use (Vetere et al, 2003).
Because domain-specific heuristics can be developed to supplement existing heuristics (Molich & Nielsen, 1990; Nielsen, 1993; Nielsen & Mack, 1994), researchers have derived many adapted heuristic sets to address the typical requirements and problems in different kinds of application domain.

Ling (2005) discussed how other researchers (including Baker et al, 2001; Baker et al, 2002) came up with their modified/customised heuristic sets.

Ling (2005) defined usability problems, the severity of usability problems, and the usability problem report.

Ling (2005) seemingly focused on the usability of e-commerce websites.

Questionnaires are the most widely used method to identify an individual's attitudes and feelings toward a software system (Kirakowski & Corbett, 1990).

What have I read:
Chapter 1 Objective and Significance
Chapter 2 Background Literature
Chapter 3 Conceptual Model and Hypotheses
Chapter 9 Conclusions and Recommendations

Ling (2005) concluded:
Based on the experimental results, the following guidelines can be derived on how to perform heuristic evaluation.
* Find evaluators with a field-independent cognitive style
* Use a domain-specific heuristic set to guide the evaluation
* Have evaluators conduct the evaluation in pairs to reduce variability in results.


Source:
Ling, Chen. Advances in Heuristic Usability Evaluation Method. [Dissertation, PhD] Purdue University, West Lafayette, Indiana. Dec 2005.

Friday, October 30, 2009

Oct 23 - "E-Learning" related Dissertations

Searched and downloaded dissertations for keyword of "E-Learning."

Chang, Chinhong Lim. Faculty Perceptions and Utilization of a Learning Management System in Higher Education. [Dissertation, PhD] Ohio University. June 2008.

Craig, Janet C. E2: Efficient and Effective E-Learning. [Dissertation, PhD] Capella University. Oct 2007.

Devey, Patrick L. The e-Volving Practitioner: A Heuristic Formative Evaluation of an Online course Based on an Action Research Methodology. [Thesis, M.A.] Concordia University, Montreal, Quebec, Canada. Aug 2002.

Dhaliwal, Baljeet Singh. Assemble To Order Learning Management System. [Thesis, M.Sc.] Simon Fraser University. Summer 2006.

Filimban, Ghadeer Zainuddin. Factors that Contribute to the Effectiveness of Online Learning Technology at Oregon State University. [Dissertation, PhD in Education] Oregon State University. June 2008.

Gallaher, James William Jr. The Adoption of E-Learning Across Professional Groups in a Fortune 500 Company. [Thesis, PhD in Education] University of Illinois at Urbana-Champaign. 2002.

Jessup, Stephanie A. PROCESSES USED BY INSTRUCTIONAL DESIGNERS TO CREATE ELEARNING AND LEARNING OBJECTS. [Dissertation, PhD] Capella University. June 2007.

Jobert-Egou, Cecile. Learning Management Systems: A Case Study of the Implementation of a Web-based Competency and Training Management Program at Bell Canada. [Thesis, M.A.] Concordia University, Montreal, Quebec, Canada. Nov 2002.

Kramer, Heidi. Measuring the Effect of E-Learning on Job Performance. [Dissertation, PhD in Computing Technology in Education] Nova Southeastern University. 2007.

Malcolm, Marci. The Relationship between Learning Styles and Success in Online Learning. [Dissertation, PhD in Education] Northcentral University, Prescott Valley, Arizona. Feb 2009.

Parsons, Ann M. A Delphi Study of Best Practices of Online Instructional Design Practices in Malaysia. [Dissertation, PhD] Capella University. Sep 2008.

Richter, Gina A. BEYOND ENGAGEMENT: INSTRUCTIONAL STRATEGIES THAT FACILITATE LEARNERS’ COGNITIVE PROCESSES AND PRODUCE EFFECTIVE AND INSTRUCTIONALLY EFFICIENT E-LEARNING. [Dissertation, PhD] Capella University. April 2008.

Tai, Luther. Corporate E-Learning: How E-Learning is created in three large corporations. [Dissertation, Doctor of Education] University of Pennsylvania. 2005.

Yaw, Dorothy Carole. An Evaluation of E-Learning in Industry at Level Three Based Upon the Kirkpatrick Model. [Dissertation, PhD] Indiana State University, Terre Haute, Indiana. Dec 2005.

Yin, Zheng. Study of Metadata for Learning Objects. [Thesis, M.Sc. (System Science)] University of Ottawa, Canada. 2004.

Zhu, Mingwei. An Open Source Based Learning Management System. [Thesis, Master of Computer Science] The University of New Brunswick. Nov 2005.

Oct 22 (part 2) - Dissertations downloaded

Reifschneider, Marina B. Factors Affecting Perceptions of Online Education Quality and Effectiveness in Brazil. [Dissertation, PhD in Educational Leadership] TUI University. Sep 2009.

Shih, Yuhsun E. Dynamic Language Learning: Comparing Mobile Language Learning with Online Language Learning. [Dissertation, PhD] Capella University. June 2007.

Smith, Terry J. Seniors Go Online. An Assessment of the Value of Usability: Is it Perceived Usefulness or Perceived Ease of use? [Dissertation, PhD in Information Systems] Nova Southeastern University. 2007.

Williams, Paul W. Assessing Mobile Learning Effectiveness and Acceptance. [Dissertation, PhD] George Washington University. Jan 31, 2009.

Womble, Joy Chastity. E-Learning: The Relationship among Learner Satisfaction, Self-Efficacy, and Usefulness. [Dissertation, PhD in Industrial/Organizational Psychology] San Diego Alliant International University. 2007.

Keywords searched:
1 Mobile Learning
2 E-Learning Effectiveness
3 Learner Satisfaction
4 Online Learning Usability
5 Mobile Application Usability
6 E-Learning (to be continued...)

Oct 22 - Dissertations...Mobile Learning, E-Learning Effectiveness, Learner Satisfaction, Online Learning Usability, Mobile Application Usability

Keywords searched:
1 Mobile Learning
2 E-Learning Effectiveness
3 Learner Satisfaction
4 Online Learning Usability
5 Mobile Application Usability
6 E-Learning (to be continued...)

Dissertations were searched and downloaded using ProQuest.

Abitt, Jason T. The Development of an Evaluation Framework for a Web-Based Course Management System in Higher Education. [Dissertation, PhD] University of Idaho. May 2005.

Colbry, Kathleen Tamara Luchini. Design Guidelines for Developing Scaffolded, Handheld Software to Support Learners during Science Inquiry. [Dissertation, PhD (Computer Science & Engineering)] The University of Michigan. 2005.

Elshair, Hanan M. The Strategies Used By Students To Read Educational Websites and Their Relation to Website Usability and Text Design. [dissertation, Doctor of Education] University of Pittsburgh. 2002.

Esch, Thomas J. E-Learning Effectiveness: An Examination of Online Training Methods for Training End-Users of New Technology Systems. [Dissertation, PhD in Business Administration] Touro University International. April 2003.

Ezzedine, Shadi Najib. Design Guidelines for Wireless Distributed Learning at Royal Roads University. [Thesis, M.A. in Distributed Learning] Royal Roads University. Nov 18, 2003.

Friend, Jean Rose. Website Usefulness for Third Agers: A Case Study of Older Adults and Senior Related Websites. [Dissertation, PhD] University of Virginia. Jan 2001.

Gamble, Angela L. The Effectiveness of E-Learning in a Multicultural Environment. [Dissertation, PhD] Capella University. Jan 2009.

Goode, Christina M. Evaluating the Quality, Usability, and Potential Effectiveness of Online Learning Modules: A Case Study of Teaching with Technology Grant Recipients at the University of Tennessee, Knoxville. [Dissertation, Doctor of Education] The University of Tennessee, Knoxville. Dec 2003.

Harmons, Eric M. A Usability Study of A Post-Conference Online Self-Assessment Program for Healthcare Professionals. [Thesis, M.A.] Texas Woman's University, Denton, Texas. Aug 2005.

Karlson, Amy Kathleen. Interface and Interaction Design for One-Handed Mobile Computing. [Dissertation, PhD] University of Maryland, College Park. 2007.

Lai, Horng-Ji. Evaluation of WWW On-Line Courseware Usability. [Dissertation, PhD] University of Idaho. May 2004.

Lavoie, Marie-Claude. Enabling Contextual MLearning: Design Recommendations for a Context-Appropriate User Interface Enabling Mobile Learning. [Thesis, M.A. (Educational Technology)] Concordia University, Montreal, Quebec, Canada. Jan 2007.

Lee, Kwang Bok. The Design and Development of User Interfaces for Small Screen Computers. [Thesis, PhD] Rensselaer Polytechnic Institute, Troy, New York. June 2003.

Mendoza, Valerie Nicole Duran. Usability of Technology: The Causes and Levels of Frustration Over Time. [Thesis, M.Sc.] The University of Texas at El Paso. July 2005.

Morales-Morell, Anibal. Usability Aspects of A Location-Aware ToDo List Application. [Thesis, M.Sc. in Computer Engineering] University of Puerto Rico, Mayaguez. 2000.

O'Dell, Toni. Generational Differences in Satisfaction with E-Learning in A Corporate-Learning Environment. [Dissertation, Doctor of Education] University of Houston. May 2009.

Pestina, Simona. Development Frameworks for Mobile/Wireless User Interface: A Comparative Study. [Thesis, Master of Computer Science] Concordia University, Montreal, Quebec, Canada. March 2002.

Platt, Jeffrey Lynn. The Efficacy of An Electronic Performance Support System as a Training Tool for Online Faculty. [Dissertation, PhD] Iowa State University, Ames, Iowa. 2008.

Oct 20 (part 2): "Usability Evaluation" related Dissertations

Khartabil, Rana. User-Centered Design and Evaluation of a Dynamic Biochemical Pathway Visualization Tool. [Thesis, Master in Computer Science] The University of Ottawa, Ottawa, Ontario, Canada. 2005.

Lee, Minseok. Usability of Collaboration Technologies. [Dissertation, PhD] Purdue University, West Lafayette, Indiana. Aug 2007.

Li, Qian. Integrating Usability into Use Cases: A Methodology for User Interface Design. [Dissertation, PhD] The University of Connecticut. 2003.

Ling, Chen. Advances in Heuristic Usability Evaluation Method. [Dissertation, PhD] Purdue University, West Lafayette, Indiana. Dec 2005.

Loser, Marilyn Maguire. Computer Conference Message Navigation, Structure and Organization: Usability Evaluation of A Spatial Interface. [Dissertation, PhD in Organizational Learning & Instructional Technology] The University of New Mexico, Albuquerque, New Mexico. July 2001.

Rihal, Saravjit Singh. Relationship between Certain Objective Performance Measures and Subjective Usability. [Dissertation, PhD] Texas A&M University. May 2001.

Roberts, Vera Louise. Methods for Inclusion: Employing Think Aloud Protocols in Software Usability Studies with Individuals who are Deaf. [Thesis, PhD] University of Toronto. 2004.

Somervell, Jacob. Developing Heuristic Evaluation Methods for Large Screen Information Exhibits Based on Critical Parameters. [Dissertation, PhD in Computer Science and Applications] Virginia Polytechnic Institute and State University. June 22, 2004.

Takeshita, Harumi. Usability Evaluation for Handheld Devices: Presenting Clinical Evidence at the Point of Care. [Thesis, Master of Applied Science] University of Toronto. 2003.

Yang, Lan. Pilot Usability Study of UI Prototype for Collaborative Adaptative Decision Support in Neonatal Intensive Care Unit. [Thesis, M.A.Sc. (Electrical Engineering)] University of Ottawa, Ottawa, Ontario, Canada. 2005.

Oct 20....downloaded dissertations (part 2)

Oct 20 - ProQuest dissertation search "Usability Evaluation"

Oct 20 - Keywords were "Usability Evaluation." Continued to search and download, as continuation from Oct 15.

Al-Nuaim, Hana Abdullah. Development and Validation of a Multimedia User Interface Usability Evaluation Tool in the context of Educational Web Sites. [Dissertation, Doctor of Science] George Washington University. Jan 31, 2000.

Andre, Terence S. Determining the Effectiveness of the Usability Problem Inspector: A Theory-Based Model and Tool for Finding Usability Problems. [Dissertation, PhD in Industrial & Systems Engineering] Virginia Polytechnic Institute and State University, Blacksburg, Virginia. April 3, 2000.

Baker, Kevin F. Heuristic Evaluation of Shared Workspace Groupware based on the Mechanics of Collaboration. [Thesis, M.Sc.] University of Calgary, Calgary, Alberta, Canada. May 2002.

Capra, Miranda G. Usability Problem Description and the Evaluator Effect in Usability Testing. [Dissertation, PhD in Industrial & Systems Engineering] Virginia Polytechnic Institute and State University, Blacksburg, Virginia. March 13, 2006.

Chang, Yaowen. A Theory-based Usability Study of the Mouseover Abstract Interface. [Dissertation, Doctor of Education] Columbia University. 2005.

Clapsaddle, Donna J. Measuring Usability: Categorically Modeling Successful Websites Using Established Metrics. [dissertation, Doctor of Professional Studies in Computing] Pace University. May 2004.

Dykstra, Dean Julian. A Comparison of Heuristic Evaluation and Usability Testing: The Efficacy of a Domain-specific Heuristic Checklist. [dissertation, PhD] Texas A&M University. Dec 1993.

Elgin, Peter D. VALIDATING THE USER-CENTERED HYBRID ASSESSMENT TOOL (USER-CHAT): A COMPARATIVE USABILITY EVALUATION. [Dissertation, PhD] Kansas State University, Manhattan, Kansas. 2007.

Faulkner, Laura Lynn. Structured Software Usability Evaluation: An Experiment in Evaluation Design. [Dissertation, PhD] University of Texas at Austin. May 2006.

Govindaraju, Majorkumar. Development of Generic Design Guidelines to Manufacture Usable Consumer Products. [Dissertation, PhD] University of Cincinnati. 1999.

Hebb, Christopher Louis. Website Usability Evaluation using Sequential Analysis. [dissertation, PhD] Indiana University, Bloomington. May 2005.

Ho, Janet Chingyun. Evaluation of A Virtual Campus: Bell University Labs. [Thesis, Master of Applied Science] University of Toronto. 2000.

Hu, Xiangqun. Development and Evaluation of a Web-based Architectural Design Tool. [Thesis, Master of Computer Science] Technical University of Nova Scotia, Halifax, Nova Scotia. 1997.

Ivory, Melody Yvette. An Empirical Foundation for Automated Web Interface Evaluation. [Dissertation, PhD in Computer Science] University of California at Berkeley. Fall 2001.

Jenkins, Lillie Ruth. DESIGNING SYSTEMS THAT MAKE SENSE: WHAT DESIGNERS SAY ABOUT THEIR COMMUNICATION WITH USERS DURING THE USABILITY TESTING CYCLE. [Dissertation, PhD] the Ohio State University. 2004.

Jobrack-McDaniel, Naomi Ruth. Remote versus Laboratory Usability Evaluations with Static versus Interactive Graphical User Interfaces. [Thesis, M.A.] San Jose State University. May 1999.

Oct 15 - "Usability Questionnaire" & "Usability Evaluation" -ProQuest Dissertation Search

Oct 15 - Search dissertations with keywords "Usability Questionnaire" and "Usability Evaluation" at ProQuest.

Eden, Joel Uzi. The Distributed Cognitive Walkthrough: The Impact of Differences in Cognitive Theory on Usability Evaluation. [Thesis, PhD] Drexel University. May 2008.

Fu, Limin. Usability Evaluation of Web Page Design. [Thesis, PhD] Purdue University. May 1999.

Howarth, Jonathan Randall. Supporting Novice Usability Practitioners with Usability Engineering Tools. [Dissertation, PhD in Computer Science & Applications] Virginia Polytechnic Institute and State University, Blacksburg, Virginia. April 13, 2007.

Olarte Enciso, Nadia Elizabeth. Development and Usability Heuristic Evaluation of an Application in PDA for Supporting Physicians Tasks at the Point of Care. [Project, M.E. in Computer Engineering] University of Puerto Rico, Mayaguez. 2007.

Rawls, Charles L. Jr. PERFORMANCE SUPPORT AND USABILITY: AN EXPERIMENTAL STUDY OF ELECTRONIC PERFORMANCE SUPPORT INTERFACES. [Dissertation, Doctor of Education] University of Central Florida, Orlando, Florida. 2005.

Ryu, Young Sam. Development of Usability Questionnaires for Electronic Mobile Products and Decision Making Methods. [Dissertation, PhD in Industrial & Systems Engineering] Virginia Polytechnic Institute and State University, Blacksburg, Virginia. July 2005.

Schnitman, Ivana. The Dynamics Involved in Web-based Learning Environment (WLE) Interface Design and Human-Computer Interactions (HCI): Connections with Learning Performance. [Dissertation, Doctor of Education in Technology Education] West Virginia University, Morgantown, West Virginia. 2007.

Young, Deborah Elspeth. Evaluation of Dust Control Technologies for Drywall Finishing Operations: Industry Implementation Trends, Worker Perceptions, Effectiveness and Usability. [Dissertation, PhD in Industrial & Systems Engineering] Virginia Polytechnic Institute and State University, Blacksburg, Virginia. August 3, 2007.

Zhikute, Lina. Design and Evaluation of a Siebel Basic Navigation Course. [Thesis, M.A. (Educational Technology)] Concordia University, Montreal, Canada. April 2006.

Oct 30 - Second F2F Meeting with Supervisor & Co-Supervisor

Second F2F Meeting with Supervisor and Co-Supervisor.
30 October 2009 (Friday)
10.00 am
At K-Space.

Oct 23 - Borrowed 12 books from MMU Library

Richard MANDER, and Bud Smith. Web Usability for Dummies. Hungry Minds, New York, 2002.
TK 5105.888 .M36 2002

Jakob NIELSEN, and Hoa Loranger. Prioritizing Web Usability. New Riders, Berkeley, California, 2006.
TK 5105.888 .N54 2006

Jakob NIELSEN. Designing Web Usability. New Riders, Indianapolis, Indiana, 2000.
TK 5105.888 .N54 2000

Christian LINDHOLM, Turkka Keinonen, Harri Kiljander. Mobile Usability: How Nokia Changed the Face of the Mobile Phone. McGraw-Hill, New York, 2003.
TK 6570 .M6 M63 2003

Mark PEARROW. The Wireless Web Usability Handbook. Charles River Media, Hingham, Massachusetts, 2002.
TK 5105.888 .P43 2002

Jeffrey RUBIN. Handbook of Usability Testing. John Wiley & Sons, New York, 1994.
QA 76.9 .U83 R83 1994

Matt JONES, and Gary Marsden. Mobile Interaction Design. John Wiley & Sons, West Sussex, England, 2006.
TK 6570 .M6 J66 2006

Tom BRINCK, Darren Gergle, Scott D. Wood. Usability For The Web: Designing Web Sites That Work. Morgan Kaufmann Publishers, San Francisco, California, 2002.
TK 5105.888 .B75 2002

Ameeta D. JADAV. Designing Usable Web Interfaces. Prentice Hall, Upper Saddle River, New Jersey, 2003.
TK 5105.888 .J33 2003

Deborah J. MAYHEW. The Usability Engineering Lifecycle: A Practitioner's Handbook for User Interface Design. Morgan Kaufmann Publishers, San Francisco, 1999.
QA 76.9 .U83 M39 1999

Randolf G. BIAS, and Deborah J. Mayhew (editors). Cost-Justifying Usability: An Update for an Internet Age, Second Edition. Morgan Kaufmann Publishers, San Francisco, California, 2005.
QA 76.9 .U83 C67 2005

John EGAN. Relationship Marketing: Exploring relational strategies in marketing, Third Edition. Pearson Education Limited, Essex, England, 2008.
HF 5415.55 .E33 2008

Monday, October 19, 2009

Plan for week Oct 19-24: Review Chapter 1,2; Download dissertations

Plan for week of Oct 19-24, 2009:
1) Review Chapter 1 and 2 again.....as part of the preparation for the Proposal Defense.
2) Download dissertations related to usability evaluation, mobile learning.

Plan to use ProQuest to search for the relevant dissertations. The downloaded dissertations will be used as references and benchmarks for planning my Dissertation.

Last week, I did not manage to do what I planned to. Last Thursday & Friday, my notebook computer was down, right after I went to K-Space. I think my computer was attacked when I was using the K-Space wifi.
What can I do to protect my computer?

Hope...to be able to try using some Usability Questionnaire this week....to cover what I missed last week.

Thursday, October 15, 2009

Oct 15, 2009 - I am now REGISTERed as PhD Candidate

Today, October 15, 2009 (Thursday), I was registered into the Ph.D. (CM) program at Institute of Postgraduate Studies (IPS) at Multimedia University (MMU).

Heee....I am SO HapPY!!!

I have visited the IPS Library.
I have used ProQuest for the first time to search and download dissertations. Due to time limitation, I only managed to download a few. Would download more next week when I go to MMU.

Bumped into Co-Supervisor at K-Space and chatted a while. He emphasised that a PhD candidate has to publish research papers in journals. He also talked about the benefit of attending conferences to do some social networking and getting to know potential external examiners.
Met up with Supervisor, and told her about my new understanding of Usability after several weeks of exploratory reading. Supervisor suggested meeting up together with her friend, Dr F, and Co-Supervisor to discuss/share on Usability matters and my research....over some tea.
Sounds good to me.

Saturday, October 10, 2009

Plan for Oct 12-17: Evaluate usability using Usability Questionnaires

Plan for week of Oct 12-17: experience using the Usability Questionnaires to evaluate usability of mobile websites/WAPsites.

Progress report for week of Oct 5-7:
I managed to read about many Usability Questionnaires. One particular questionnaire I could not find was CUSI.
I read about ASQ, PSSUQ, CSUQ, SUMI, QUIS, SUS,.....just to recall off-hand.
I did a total of 34 blog writings.

I am satisfied with this week's progress. Last week, I was too busy from Monday till Thursday, whereas on Friday, I was really mentally exhausted. Last Friday afternoon till last Saturday, I rested a lot (by taking supplementary naps and watching DVDs to relax my mind).

Tue, Oct 6 - 2 blogs - @Cyberjaya
Thu, Oct 8 - 11 blogs - @Jakarta
Fri, Oct 9 - 14 blogs - @Jakarta
Sat, Oct 10 - 7 blogs - @Jakarta.

Hooray! Yippee! Yaahoo! I am registering for my PhD (CM) on next Thursday, Oct 15, 2009.

Oct 10 - Microsoft Desirability Tool - Travis

Microsoft Desirability Tool
as per Travis

This is a type of Usability Questionnaire.

Filename: wordchoice.xls

Accessible
Advanced
Ambiguous
Annoying
Appealing
Approachable
Attractive
Awkward
Boring
Bright
Business-like
Busy
Clean
Clear
Cluttered
Compelling
Complex
Comprehensive
Confusing
Consistent
Contradictory
Controllable
Convenient
Counter-intuitive
Creative
Credible
Cutting edge
Dated
Desirable
Difficult
Distracting
Dull
Easy to use
Effective
Efficient
Effortless
Empowering
Energetic
Engaging
Entertaining
Exciting
Expected
Familiar
Fast
Faulty
Flexible
Fresh
Friendly
Frustrating
Fun
Hard to Use
High quality
Illogical
Impressive
Inadequate
Incomprehensible
Inconsistent
Ineffective
Innovative
Insecure
Intimidating
Intuitive
Irrelevant
Meaningful
Misleading
Motivating
New
Non-standard
Obscure
Old
Ordinary
Organised
Overwhelming
Patronising
Poor quality
Powerful
Predictable
Professional
Relevant
Reliable
Responsive
Rigid
Satisfying
Secure
Simple
Simplistic
Slow
Sophisticated
Stable
Stimulating
Straightforward
Stressful
System-oriented
Time-consuming
Time-saving
Too technical
Trustworthy
Unattractive
Unconventional
Understandable
Unpredictable
Unrefined
Usable
Useful
Vague

Source: http://www.userfocus.co.uk/articles/satisfaction.html

Friday, October 9, 2009

Oct 10 - Travis, Measuring satisfaction: Beyond the usability questionnaire

Measuring satisfaction: Beyond the usability questionnaire

Most usability tests culminate with a short questionnaire that asks the participant to rate, usually on a 5- or 7-point scale, various characteristics of the system. Experience shows that participants are reluctant to be critical of a system, no matter how difficult they found the tasks.

This article describes a guided interview technique that overcomes this problem based on a word list of over 100 adjectives. — David Travis, March 3, 2008, updated 22 July 2009.

Measuring user satisfaction

A common mistake made by novice usability test moderators is to think that the aim of a usability test is to elicit a participant's reactions to a user interface. Experienced test moderators realise that a participant's reaction is just one measure of usability.
To get the complete usability picture, we also need to consider effectiveness (can people complete their tasks?) and efficiency (how long do people take?).

These dimensions of usability come from the International Standard, ISO 9241-11, which defines usability as:
"Extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use."

The ISO definition of usability makes it clear that user satisfaction is just one important dimension of usability.
People may be well disposed to a system but fail to complete business-critical tasks with it, or do so in a roundabout way.
The three measures of usability — effectiveness, efficiency and satisfaction — are independent, and you need to measure all three to get a rounded measure of usability.

Importance of collecting satisfaction measures

A second mistake made by people new to the field of usability is to measure satisfaction by using a questionnaire only (either at the end of the session or on completion of each task). There are many issues to consider when designing a good questionnaire, and few usability questionnaires are up to scratch.
For example, we've known for over 60 years that you need to avoid the "acquiescence bias": the fact that people are more likely to agree with a statement than disagree with it (Cronbach, 1946). This means that you need to balance positively-phrased statements (such as "I found this interface easy to use") with negative ones (such as "I found this interface difficult to navigate").
So it's surprising that two commonly used questionnaires in the field of usability — the Usefulness, Satisfaction, and Ease of use (USE) questionnaire and the Computer System Usability Questionnaire (CSUQ) — suffer from just this problem: every question in both of these questionnaires is positively phrased, which means the results from them are biased towards positive responding.

Questionnaires that avoid this source of bias often suffer from other sources of bias.
For example, few undergo tests of reliability. This means that the same questionnaire may yield different results at different times (this can be checked by measuring the questionnaire's test-retest reliability).
Even fewer usability questionnaires are assessed for validity. This means that there is no guarantee that the questionnaire actually measures user satisfaction.

Problems with measuring satisfaction

In our studies, we notice that participants tend to rate an interface highly on a post-test questionnaire even when they fail to complete many of the tasks.

"In studies such as this one, we have found subjects reluctant to be critical of designs when they are asked to assign a rating to the design. In our usability tests, we see the same phenomenon even when we encourage subjects to be critical. We speculate that the test subjects feel that giving a low rating to a product gives the impression that they are "negative" people, that the ratings reflect negatively on their ability to use computer-based technology, that some of the blame for a product's poor performance falls on them, or that they don't want to hurt the feelings of the person conducting the test." - Wiklund et al (1992).

Once you ask participants to assign a number to their experience, their experience suddenly becomes better than it actually was. We need some way of controlling this tendency.

The Microsoft Desirability Toolkit

There are alternatives to measuring satisfaction with a questionnaire.
A few years back, researchers at Microsoft developed the "Desirability Toolkit". This comprised a series of 118 "product reaction cards", containing words like "Consistent", "Sophisticated" and "Useful".

On completion of a usability test, participants were asked to sort through the cards and select the five cards that most closely matched their personal reactions to the system they had just used.
The five selected cards then became the basis of a post-test guided interview.
For example, the interviewer would pick one of the cards chosen by the participant and say, "I see that one of the cards you selected was 'Consistent'. Tell me what was behind your choice of that word".
I've used this approach in several usability studies and what has struck me is the fact that it helps elicit negative comments from participants.
This methodology seems to give participants "permission" to be critical of the system. Not only do participants choose negative as well as positive adjectives, they may also place a "negative" spin on an otherwise "positive" adjective.
For example, "Sophisticated" at first sounds positive but I have had participants choose this item to mean, "It's a bit too sophisticated for my tastes".

An alternative implementation

In our studies, we now use a simple paper checklist of adjectives. We first ask people to read through the words and select as many as they like that they think apply to the interface.
We then ask the participant to circle just 5 adjectives from those chosen, and these adjectives become the basis of the post-test guided interview.

Customising the word list

This is a technique to help participants categorise their reactions to an interface that you then explore in more depth in the post-test guided interview. This means that, for a particular study, you should replace some of the words with others that may be more relevant.
For example, if we were usability testing a web site for a client whose brand values are "Fun, Value for Money, Quality and Innovation", we would replace four of the existing adjectives with those.
(This makes for an interesting discussion with the client when participants don't select those terms. It gets even more interesting if participants choose antonyms to the brand values, such as "Boring", "Expensive", "Inferior" and "Traditional").

How to analyse the data

The real benefit of this approach is in the way it uncovers participant reactions and attitudes. You get a depth of understanding and an authenticity in participants' reactions that just can't be achieved with traditional questionnaires and surveys. So this approach is ideal as a qualitative approach to guide an interview.
But you can also derive metrics from these data. Here's how.

Word cloud
The simplest measure is to count up the number of times a word was chosen by participants.
In our studies, we find that we get a fair amount of consistency in the words chosen. For example, Figure 1 shows a word cloud from the results we obtained from a recent 12-participant usability test. Participants could choose from a corpus of 103 words but some words were selected more often (such as "Easy to use", which was selected by half the participants).
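Comment: a minimal sketch of this counting step (the selections are invented; the cloud itself would still be drawn with a tool such as Wordle, as described below):

```python
# Count how often each adjective was selected across participants (invented data).
from collections import Counter

selections = [
    ["Easy to use", "Clean", "Slow"],
    ["Easy to use", "Confusing", "Useful"],
    ["Useful", "Easy to use", "Attractive"],
]

counts = Counter(word for participant in selections for word in participant)
for word, n in counts.most_common():
    print(f"{word}: {n}")
# Easy to use: 3
# Useful: 2
# ... remaining words chosen once
```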

Verbal protocol analysis
A more robust statistic can be derived from carrying out a verbal protocol analysis of the guided interview where the participant discusses the reasons for his or her choice of words. This simply means listening to the post-test interview and coding each participant's comments.
The simplest way to do this is to divide a piece of paper into two columns and write "Positive" at the top of one column and "Negative" at the top of the other column. Listen to the interview (either live or recorded) and every time you hear the participant make a positive comment about the interface, place a mark in the "Positive" column. Every time you hear the participant make a negative comment about the interface, place a mark in the "Negative" column. At the end of the interview, you add up the positive and negative totals and compute the percentage of positive comments.
So for example, if there are 5 positive comments and 5 negative comments, the percentage of positive comments is 50% (5 divided by 10). Similarly, if there are 9 positive comments and 3 negative comments, the percentage of positive comments is 75% (9 divided by 12).
This could be used as a satisfaction metric to compare interfaces.
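Comment: the arithmetic above is easy to script; a minimal sketch:

```python
# Percentage of positive comments from a coded post-test interview.
def percent_positive(positive, negative):
    """Return positive comments as a percentage of all coded comments."""
    return 100 * positive / (positive + negative)

print(percent_positive(5, 5))  # 50.0
print(percent_positive(9, 3))  # 75.0
```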

Now you try

If you would like to try out this method in one of your own studies, we've developed an Excel spreadsheet that you can use to generate and randomise the word list. (Randomisation of the list prevents order effects).
The spreadsheet also contains a worksheet that lets you analyse the data and generate a word cloud. We do this by using an advanced feature in Wordle. (It bothers us that Wordle applies colours randomly. We want the colour to convey information like the text size does, as in Figure 1 above. So we used some Excel tomfoolery to generate colour information for Wordle. This way, the most popular adjectives are also the darkest and the less popular comments fade into the distance). The Excel file contains macros; you can disable the macros if you want and still print the word list, but you'll lose the randomisation and analysis functionality. I hope you find it useful to start collecting more in-depth measures of user satisfaction.
Download the spreadsheet (version 1.0) to generate and randomise the word list.
By the way, if you find this spreadsheet useful then you'll love our Usability Test Plan Toolkit. The word list is just one of 6 appendices in the Test Plan Toolkit, which has everything you need to conduct your next usability test.
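If you don't use Excel, the randomisation step is easy to reproduce. A minimal sketch (mine, not the spreadsheet's actual logic), which shuffles the word list into a fresh order for each participant:

import random

# Hypothetical subset of the 103-word corpus
words = ["Easy to use", "Busy", "Clean", "Useful", "Boring"]

def randomised_word_list(word_list, seed=None):
    """Return a shuffled copy of the word list so that order effects are avoided."""
    shuffled = list(word_list)
    random.Random(seed).shuffle(shuffled)
    return shuffled

print(randomised_word_list(words))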

Literature cited

Benedek, J. and Miner, T. (2002). "Measuring Desirability: New Methods for Evaluating Desirability in a Usability Lab Setting." Redmond, WA: Microsoft Corporation.
Cronbach, L.J. (1946). "Response sets and test validity." Educational and Psychological Measurement, 6, 475-494.
Wiklund, M., Thurrott, C. and Dumas, J. (1992). "Does the Fidelity of Software Prototypes Affect the Perception of Usability?" Proc. Human Factors Society 36th Annual Meeting, 399-403.

Source: http://www.userfocus.co.uk/articles/satisfaction.html

Oct 10 - Tullis & Stetson, A Comparison of Questionnaires for Assessing Website Usability

A Comparison of Questionnaires for Assessing Website Usability
Thomas S. Tullis, Fidelity Investments
Jacqueline N. Stetson, Fidelity Investments and Bentley College

 
Various questionnaires have been reported in the literature for assessing the perceived usability of an interactive system, e.g.:
–Questionnaire for User Interface Satisfaction (QUIS) (1988)
–Computer System Usability Questionnaire (CSUQ) (1995)
–System Usability Scale (SUS) (1996)
 
A slightly different approach was taken by Microsoft with their "Product Reaction Cards" (2002)
And we have been using our own questionnaire for several years in our Usability Lab at Fidelity Investments
 
Problem
How well do these questionnaires apply to the assessment of Websites?
Do any of these questionnaires work well, as an adjunct to a usability test, with relatively small numbers of users?


Our Study
Limited ourselves to questionnaires in the published literature
–Did not include commercial services for evaluating website usability (e.g., WAMMI, RelevantView, NetRaker, Vividence).
We studied five questionnaires:
–SUS
–QUIS
–CSUQ
–Microsoft’s "Words"
–Our own questionnaire
 
Questionnaire #1: SUS
•Developed at Digital Equipment Corp.
•Consists of ten items.
•Adapted by replacing "system" with "website".
•Each item is a statement (positive or negative) and a rating on a five-point scale of "Strongly Disagree" to "Strongly Agree".
 
Questionnaire #2: QUIS
•Developed at the University of Maryland.
•Original questionnaire had 27 questions.
–We dropped 3 that did not seem relevant to Websites (e.g., "Remembering names and use of commands").
•"System"was replaced by "website"and term "screen"was replaced by "web page".
•Each question is a rating on a ten-point scale with appropriate anchors.
 
Questionnaire #3: CSUQ
•Developed at IBM.
•Composed of 19 questions.
•"System"or "computer system"was replaced by "website".
•Each question is a statement and a rating on a seven-point scale of "Strongly Disagree"to "Strongly Agree".
 
Questionnaire #4: Words
•Based on the 118 words used by Microsoft on their Product Reaction Cards.
–Some positive (e.g., "Convenient")
–Some negative (e.g., "Unattractive")
•Each word was presented with a check-box
–Users were asked to choose the words that best describe their interaction with the website.
–Could choose as many or as few words as they wished.

Questionnaire #5: Ours
•Developed ourselves and have been using for several years in our usability tests of websites.
•Composed of nine statements (e.g., "This website is visually appealing") to which the user responds on a seven-point scale from "Strongly Disagree" to "Strongly Agree".
•Points of the scale are numbered -3, -2, -1, 0, 1, 2, 3.
–Obvious neutral point at 0.
 
A Live Experiment!
•We’re going to compare two sites:
–CircuitCity.com
–Outpost.com
•Task 1: Your digital camera uses SmartMedia cards. Find the least expensive external reader (USB) for your PC that will read them.
•Task 2: You do lots of hiking. Find the least expensive personal GPS with map capability and at least 8 MB of memory.

 
Method of Our Study
•Conducted entirely on our company Intranet.
•123 of our employees participated.
•Each participant was randomly assigned to one of the five questionnaire conditions.
•Each was asked to perform two tasks on each of two well-known personal financial information sites.
•Sites studied:
–Finance.Yahoo.com
–Kiplinger.com
–Hereafter referred to only as "Site 1" and "Site 2". Don't assume which is which.
•Tasks:
–Find the highest price in the past year for a share of .
–Find the mutual fund with the highest 3-year return.
•Order of presentation of the two sites was randomized.
•After completing (or at least attempting) the two tasks on a site, the user was presented with the questionnaire for their randomly selected condition.
•Each user completed the same questionnaire for both sites.

 
Data Analysis
•For each participant, an overall score was calculated for each website by averaging all of the ratings on the questionnaire that was used.
–All scales had been coded internally so that the "better" end corresponded to higher numbers.
–These were converted to percentages by dividing each score by the maximum score possible on that scale.
–For example, a rating of 3 on SUS was converted to a percentage by dividing that by 5 (the maximum score for SUS), giving a percentage of 60%.
•Special treatment for the "Words" condition since it did not involve rating scales:
–Before the study, we classified each of the words as being "Positive" or "Negative".
–Not grouped or identified as such to the participants.
–For each participant, an overall score was calculated by counting the total number of words that person selected and dividing the number of "Positive" words chosen by that total.
–If someone selected 8 positive words and 10 words total, that yielded a score of 80%.
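A minimal sketch of that scoring scheme (my own reconstruction from the description above, not the authors' code):

def rating_score_pct(ratings, scale_max):
    """Average the questionnaire ratings and express them as a percentage of the scale maximum."""
    return 100.0 * (sum(ratings) / len(ratings)) / scale_max

def words_score_pct(chosen_words, positive_words):
    """Percentage of the chosen words that were pre-classified as positive."""
    positives = sum(1 for word in chosen_words if word in positive_words)
    return 100.0 * positives / len(chosen_words)

print(rating_score_pct([3], scale_max=5))  # 60.0, the SUS example above
# Choosing 8 positive words out of 10 in total gives a words_score_pct of 80.0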
 
Results
•Calculated frequency distributions for the ratings, converted to percentages, for:
–Each questionnaire
–Both websites

Bar charts are used to compare the results; see http://www.upassoc.org/usability_resources/conference/2004/UPA-2004-TullisStetson.pdf

My Comments: This study will be a good benchmark and reference for my research.

 
Results: Summary
•All five questionnaires showed that Site 1 was significantly preferred over Site 2 (p<.01).
•The largest mean difference (74% vs. 38%) was found using the Words questionnaire, but this was also the questionnaire that yielded the greatest variability.

Analysis of Sub-samples
•Next we analyzed randomly selected sub-samples of the data at sizes 6, 8, 10, 12, and 14.
–20 random samples for each size.
•For each sample, a t-test was conducted to determine whether the results showed that Site 1 was significantly better than Site 2 (the conclusion from the full dataset).
 
•Accuracy of the results increases as the sample size gets larger.
•With a sample size of only 6, all of the questionnaires yield accuracy of only 30-40%
–60-70% of the time, at that sample size, you would fail to find a significant difference between the two sites.
•Accuracy of some of the questionnaires increases more quickly than others.
–SUS jumps up to about 75% accuracy at a size of 8.
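A sketch of that sub-sampling procedure (hypothetical code; it assumes paired per-participant percentage scores for the two sites, uses SciPy's paired t-test, and treats the 0.05 significance level as an assumption since the slides do not state one):

import random
from scipy import stats

def subsample_accuracy(site1_scores, site2_scores, sample_size, n_samples=20, alpha=0.05):
    """Fraction of random sub-samples whose paired t-test finds Site 1 significantly better than Site 2."""
    hits = 0
    for _ in range(n_samples):
        picks = random.sample(range(len(site1_scores)), sample_size)
        result = stats.ttest_rel([site1_scores[i] for i in picks],
                                 [site2_scores[i] for i in picks])
        if result.statistic > 0 and result.pvalue < alpha:
            hits += 1
    return hits / n_samples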
 
Caveats
•Results were undoubtedly influenced by:
–The sites studied.
–The tasks used.
•We have only addressed the question of whether a given questionnaire was able to reliably distinguish between the ratings of one site vs. the other.
–Often you care more about how well the results help guide a redesign.
 

Conclusions
•One of the simplest questionnaires studied, SUS (with only 10 rating scales), yielded among the most reliable results across sample sizes.
–Also the only one whose questions all address different aspects of the user’s reaction to the website as a whole.
•For the conditions of this study, sample sizes of at least 12-14 participants are needed to get reasonably reliable results.

Source:
http://www.upassoc.org/usability_resources/conference/2004/UPA-2004-TullisStetson.pdf

Oct 10 - Borysowich, Sample Website Usability Questionnaire

Sample Website Usability Questionnaire

Craig Borysowich (Chief Technology Tactician) posted 7/24/2007
Please provide the following information so we can further develop this web site to be more usable.

1. How easy was it to understand each of the links on the home page?

— Very Easy — Easy — Average — Difficult — Very Difficult

2. On the home page, how easy was it to find the appropriate link for information you wanted?

— Very Easy — Easy — Average — Difficult — Very Difficult


3. How easy was it to understand the titles on each page you accessed?

— Very Easy — Easy — Average — Difficult — Very Difficult


4. How easy was it to scan the titles in text to find the information you wanted?

— Very Easy — Easy — Average — Difficult — Very Difficult


5. How easy was it to understand links to other web sites?

— Very Easy — Easy — Average — Difficult — Very Difficult

Source: http://it.toolbox.com/blogs/enterprise-solutions/sample-website-usability-questionnaire-17825

Oct 10 - Lund, Measuring Usability with the USE Questionnaire

Over the years I have worked with colleagues at Ameritech (where the work began), U.S. WEST Advanced Technologies, and most recently Sapient to create a tool that has helped in dealing with some of these questions.

The tool that we developed is called the USE Questionnaire. USE stands for Usefulness, Satisfaction, and Ease of use. These are the three dimensions that emerged most strongly in the early development of the USE Questionnaire. For many applications, Usability appears to consist of Usefulness and Ease of Use, and Usefulness and Ease of Use are correlated.

Each factor in turn drives user satisfaction and frequency of use. Users appear to have a good sense of what is usable and what is not, and can apply their internal metrics across domains.

USE Questionnaire

Usefulness
It helps me be more effective.
It helps me be more productive.
It is useful.
It gives me more control over the activities in my life.
It makes the things I want to accomplish easier to get done.
It saves me time when I use it.
It meets my needs.
It does everything I would expect it to do.


Ease of Use
It is easy to use.
It is simple to use.
It is user friendly.
It requires the fewest steps possible to accomplish what I want to do with it.
It is flexible.
Using it is effortless.
I can use it without written instructions.
I don't notice any inconsistencies as I use it.
Both occasional and regular users would like it.
I can recover from mistakes quickly and easily.
I can use it successfully every time.


Ease of Learning
I learned to use it quickly.
I easily remember how to use it.
It is easy to learn to use it.
I quickly became skillful with it.


Satisfaction
I am satisfied with it.
I would recommend it to a friend.
It is fun to use.
It works the way I want it to work.
It is wonderful.
I feel I need to have it.
It is pleasant to use.

Work to refine the items and the scales continues. There is some evidence that for web sites and certain consumer products there is an additional dimension of fun or aesthetics associated with making a product compelling.

General Background

Subjective reactions to the usability of a product or application tend to be neglected in favor of performance measures, and yet these subjective metrics often measure the aspects of the user experience that are most closely tied to user behavior and purchase decisions.

While some tools exist for assessing software usability, they typically are proprietary (and may only be available for a fee). More importantly, they do not do a good job of assessing usability across domains.
When re-engineering began at Ameritech, it became important to be able to set benchmarks for product usability and to be able to measure progress against those benchmarks.
It also was critical to ensure resources were being used as efficiently as possible, and so tools to help select the most cost-effective methodology and the ability to prioritize design problems to be fixed by developers became important.
Finally, it became clear that we could eliminate all the design problems and still end up with a product that would fail in the marketplace.

It was with this environment as a background that a series of studies began at Ameritech. The first one was headed by Amy Schwartz, and was a collaboration of human factors, market research in our largest marketing organization, and a researcher from the University of Michigan business school.

Building on that research, I decided to develop a short questionnaire that could be used to measure the most important dimensions of usability for users, and to measure those dimensions across domains. Ideally it should work for software, hardware, services, and user support materials. It should allow meaningful comparisons of products in different domains, even though testing of the products happened at different times and perhaps under different circumstances. In the best of all worlds, the items would have a certain amount of face validity for both users and practitioners, and it would be possible to imagine the aspects of the design that might influence ratings of the items.
It would not be intended to be a diagnostic tool, but rather would treat the dimensions of usability as dependent variables.

Subsequent research would assess how various aspects of a given category of design would impact usability ratings.
The early studies at Ameritech suggested that a viable questionnaire could be created. Interestingly, the results of those early studies were consistent with studies conducted in the MIS and technology diffusion areas, which also had identified the importance of and the relationship between Usefulness, Satisfaction, and Ease of Use.

How It Developed

The first step in identifying potential items for the questionnaire was to collect a large pool of items to test. The items were collected from previous internal studies, from the literature, and from brainstorming. The list was then massaged to eliminate or reword items that could not be applied across the hardware, software, documentation, and service domains. One goal was to make the items as simply worded as possible, and as general as possible.
As rounds of testing progressed, standard psychometric techniques were used to weed out additional items that appeared to be too idiosyncratic, or to improve items through ongoing tweaking of the wording. In general, the items contributing to each scale were of approximately equal weight, the Cronbach's alphas were very high, and for the most part the items appeared to tap slightly different aspects of the dimensions being measured.
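For reference, Cronbach's alpha for a scale can be computed from the item variances and the variance of the summed scores. A minimal sketch (my own illustration, not code from Lund's article):

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k / (k - 1) * (1 - sum(item variances) / variance(total scores)).

    item_scores is a list of items, each a list of the same respondents' ratings.
    """
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    k = len(item_scores)
    n_respondents = len(item_scores[0])
    totals = [sum(item[r] for item in item_scores) for r in range(n_respondents)]
    return (k / (k - 1)) * (1 - sum(variance(item) for item in item_scores) / variance(totals))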

The questionnaires were constructed as seven-point Likert rating scales. Users were asked to rate agreement with the statements, ranging from strongly disagree to strongly agree.
Various forms of the questionnaires were used to evaluate user attitudes towards a variety of consumer products.
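As a hypothetical example of how such ratings might be summarised (the article does not prescribe a scoring procedure here), one respondent's seven-point ratings can simply be averaged within each USE dimension:

def use_dimension_scores(responses):
    """Mean seven-point rating per USE dimension for one respondent.

    responses maps dimension names to lists of item ratings (1-7), e.g.
    {"Usefulness": [6, 7, 5], "Ease of Use": [5, 6], "Satisfaction": [7, 6]}.
    """
    return {dimension: sum(ratings) / len(ratings)
            for dimension, ratings in responses.items()}

print(use_dimension_scores({"Usefulness": [6, 7, 5], "Ease of Use": [5, 6], "Satisfaction": [7, 6]}))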

Factor analyses following each study suggested that users were evaluating the products primarily using three dimensions, Usefulness, Satisfaction, and Ease of Use. Evidence of other dimensions was found, but these three served to most effectively discriminate between interfaces.
Partial correlations calculated using scales derived for these dimensions suggested that Ease of Use and Usefulness influence one another, such that improvements in Ease of Use improve ratings of Usefulness and vice versa.
While both drive Satisfaction, Usefulness is relatively less important when the systems are internal systems that users are required to use. Users are more variable in their Usefulness ratings when they have had only limited exposure to a product.
As expected from the literature, Satisfaction was strongly related to the usage (actual or predicted).
For internal systems, the items contributing to Ease of Use for other products actually could be separated into two factors, Ease of Learning and Ease of Use (which were obviously highly correlated).

Conclusion

While the questionnaire has been used successfully by many companies around the world, and as part of several dissertation projects, the development of the questionnaire is still not over. For the reasons cited, this is an excellent starting place. The norms I have developed over the years have been useful in determining when I have achieved sufficient usability to enable success in the market. To truly develop a standardized instrument, however, the items should be taken through a complete psychometric instrument development process.
A study I have been hoping to run is one that simultaneously uses the USE Questionnaire and other questionnaires like SUMI or QUIS to evaluate applications. Once a publicly available (i.e., free) standardized questionnaire is available that applies across domains, a variety of interesting lines of research are possible. The USE Questionnaire should continue to be useful as it stands, but I hope the best is yet to come.


Measuring Usability with the USE Questionnaire
By Arnold M. Lund

Source: http://www.stcsig.org/usability/newsletter/0110_measuring_with_use.html

Oct 10 - HCIL, QUIS: The Questionnaire for User Interaction Satisfaction

QUIS: The Questionnaire for User Interaction Satisfaction

Subjective evaluation is an important component in the evaluation of workstation usability.
We have developed and standardized a general user evaluation instrument for interactive computer systems. The methods of psychological test construction were applied in order to ensure proper construct and empirical validity of the items and to assess their reliability. A hierarchical approach was taken in which overall usability was divided into subcomponents which constituted independent psychometric scales. For example, subcomponents include character readability, usefulness of online help, and meaningfulness of error messages.

Evaluation on these scales is assessed by user ratings of specific system attributes such as character definition, contrast, font, and spacing for the scale of character readability.

The purpose of the questionnaire is to:
1. guide in the design or redesign of systems,
2. give managers a tool for assessing potential areas of system improvement,
3. provide researchers with a validated instrument for conducting comparative evaluations, and
4. serve as a test instrument in usability labs.

Validation studies continue to be run. It was recently shown that mean ratings are virtually the same for paper versus computer versions of the QUIS, but the computer version elicits more and longer open-ended comments.

The QUIS is licensed through the Office of Technology Liaison. Short and long paper versions are available as well as online versions that run in Windows and Macintosh environments, and now in HTML. The QUIS is currently licensed to dozens of usability labs and research centers around the world.

Related Papers:
Slaughter, L., Norman, K.L., Shneiderman, B. (March 1995) Assessing users' subjective satisfaction with the Information System for Youth Services (ISYS), VA Tech Proc. of Third Annual Mid-Atlantic Human Factors Conference (Blacksburg, VA, March 26-28, 1995) 164-170. CS-TR-3463, CAR-TR-768
Chin, J. P., Diehl, V. A., Norman, K. (Sept. 1987) Development of an instrument measuring user satisfaction of the human-computer interface, Proc. ACM CHI '88 (Washington, DC) 213-218. CS-TR-1926, CAR-TR-328

Participants:
Kent Norman, Department of Psychology
Ben Shneiderman, Computer Science
Ben Harper, Department of Psychology

Source: http://www.cs.umd.edu/hcil/quis/

Oct 10 - Usabilitynet, Usability Questionnaires

My comments: This is a good summary/intro of several Usability Questionnaires.

It has been observed that questionnaires are the most frequently used tools for usability evaluation. This page is a list of usability questionnaire resources, extending the information presented on the questionnaires page of Usabilitynet.

SUMI
This is a mature questionnaire whose standardisation base and manual have been regularly updated. It is good for desktop products, but has also been used to evaluate command-and-control applications. It is a commercial product which comes complete with scoring and report generation software. It is designed and sold by the Human Factors Research Group at University College Cork.

WAMMI
This is a new questionnaire, designed to evaluate the quality of use of web sites. It is backed up by an extensive standardisation database, and it is purchased on a per report basis. It is the result of a joint development project by Jurek Kirakowski and Nigel Claridge.

SUS
This is a mature questionnaire, developed by John Brooke in 1986 but not published until years later. It is very robust and has been extensively used and adapted. It is in the public domain, although no standardisation data has been published for it. Of all the public domain questionnaires, this is the most strongly recommended.

QUIS
This is a questionnaire developed by Kent Norman that has been modified many times to keep it current since its first appearance. It is commercially available and is championed by Ben Shneiderman in his book Designing the User Interface (Reading, MA: Addison-Wesley, 1998). Despite the lack of standardisation and validation data, it has many adherents.

USE (see http://www.mindspring.com/~alund/USE/IntroductionToUse.html)
This questionnaire is still in development by Arnie Lund (last updated 11/11/98). It attempts to create a three-factor model of usability that can be applied to many situations. However, no reliability or validation data are presented. Public domain use is encouraged.

CSUQ
This is a well-designed questionnaire developed by Jim Lewis and it is public domain. It has excellent psychometric reliability properties but no standardisation base.

IsoNorm (in German only)
This questionnaire is designed to test the usability quality of software following the ISO 9241 part 10 principles. It was created by a team led by Jochen Prümper. Strong reliabilities are claimed for the sub-scales, although it appears there may be a strong inter-correlation between them as well. Downloads and an on-line version are available, as well as articles about it (all in German).

IsoMetrics
This questionnaire is produced by Guenter Gediga and his team. It is another attempt to produce a way of measuring ISO 9241 part 10, with reference to specific software features that may give rise to low usability data. It is therefore good both for summative and formative assessments. The questionnaire is well researched and detailed statistical information is given. Downloads of English and German versions are available. There is no standardisation base for it but it is public domain.


Questionnaire resources
Source: http://www.usabilitynet.org/tools/r_questionnaire.htm

Oct 9 - Usability Questionnaire: Nielsen's Heuristic Evaluation

Nielsen's Heuristic Evaluation
Based on: Nielsen, J. (1993) Usability Engineering. Academic Press. Chapter 5, p. 115.

Please evaluate the system according to Nielsen's usability heuristics.
Try to respond to all the items.
For items that are not applicable, use: NA

Likert scale:
1 = bad
2
3
4
5
6
7 = good
NA

1. Simple and Natural Dialogue

2. Speak the Users' Language

3. Minimize User Memory Load

4. Consistency

5. Feedback

6. Clearly Marked Exits

7. Shortcuts

8. Good Error Messages

9. Prevent Errors

10. Help and Documentation


Source: http://hcibib.org/perlman/question.cgi?form=NHE

Oct 9 - Usability Questionnaire: Nielsen's Attributes of Usability

Nielsen's Attributes of Usability
Based on: Nielsen, J. (1993) Usability Engineering. Academic Press. Chapter 2.2, p. 26.
Please rate the system according to Nielsen's attributes of usability.
Try to respond to all the items.
For items that are not applicable, use: NA
Likert scale:
1
2
3
4
5
6
7
NA


1. Learnability

2. Efficiency

3. Memorability

4. Errors (Accuracy)

5. Subjective Satisfaction


Source: http://hcibib.org/perlman/question.cgi?form=NAU