Friday, October 9, 2009

Oct 9 - ASQ: After-Scenario Questionnaire - Lewis of IBM

The After-Scenario Questionnaire (ASQ)

In a scenario-based usability study, participants use a product, such as a computer application, to do a series of realistic tasks.
The After-Scenario Questionnaire (ASQ) is a three-item questionnaire that IBM usability evaluators have used to assess participant satisfaction after the completion of each scenario. (See the appendix for a copy of the questionnaire.)
The items address three important components of user satisfaction with system usability: ease of task completion, time to complete a task, and adequacy of support information (on-line help, messages, and documentation). Because the questionnaire is very short, it takes very little time for participants to complete – an important practical consideration for usability studies.
Usability professionals have used these items (or very similar items) in usability studies at IBM for many years, but a recent series of studies has provided a database of sufficient size to allow a preliminary psychometric evaluation of the ASQ.
The ASQ items are the constituent items for a summative, or Likert, scale (McIver & Carmines, 1981; Nunnally, 1978). In developing summative scales, it is important to consider item construction, item selection and psychometric evaluation.

Item Construction
The items are 7-point graphic scales, anchored at the end points with the terms "Strongly agree" for 1 and "Strongly disagree" for 7, and a Not Applicable (N/A) point outside the scale, as shown in the appendix.
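For concreteness, a single ASQ response for one scenario can be coded as three values, each an integer from 1 (strongly agree) to 7 (strongly disagree), with None standing in for the N/A point. A minimal Python sketch (the item labels below merely paraphrase the three satisfaction components; they are not the exact ASQ wording):

from typing import Optional

# Paraphrased labels for the three ASQ components (not the exact item wording).
ASQ_ITEMS = ("ease_of_completion", "time_to_complete", "support_information")

def validate_response(value: Optional[int]) -> Optional[int]:
    """Accept an integer from 1 (strongly agree) to 7 (strongly disagree),
    or None when the participant marked the N/A point outside the scale."""
    if value is None:
        return None
    if not 1 <= value <= 7:
        raise ValueError(f"ASQ responses must be 1-7 or N/A, got {value!r}")
    return value

# Example: one participant's ratings for one scenario (third item marked N/A).
response = {item: validate_response(v)
            for item, v in zip(ASQ_ITEMS, (2, 3, None))}
print(response)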

Item Selection
The content of the items reflects components of usability that usability professionals at IBM have generally considered important.

Psychometric Evaluation
The office-applications studies. Scenario-based usability studies of three office application systems (Lewis, Henry, & Mack, 1990) provided the data for a psychometric evaluation of the ASQ. Forty-eight employees of temporary help agencies participated in the studies, with 15 hired in Hawthorne, New York; 15 hired in Boca Raton, Florida; and 18 hired in Southbury, Connecticut. Each set of participants consisted of one-third with clerical/secretarial work experience and no mouse experience (SECNO), one-third business professionals with no mouse experience (BPNO), and one-third business professionals with at least three months of mouse experience (BPMS). All participants had at least three months of experience using some type of computer system. They had no programming training or experience, and no (or very limited) knowledge of operating systems.
Popular word-processing, mail, calendar, and spreadsheet applications installed in three different operating environments comprised the three office systems (hereafter referred to as System I, System II, and System III). All three environments allowed windowing, used a mouse as a pointing device, and allowed a certain amount of integration among the applications. The systems differed in details of implementation, but were generally similar. The three word-processing and spreadsheet applications were similar, but the mail and calendar applications differed considerably. The studies contained eight scenarios in common.
Participants began the study with a brief lab tour, read a description of the study's purpose and the day's agenda, and completed a background questionnaire. Participants using System I completed an interactive tutorial shipped with the system. Tutors provided the other participants with a brief demonstration about how to move, point and select with a mouse; how to open the icons for each product; and how to maximize and minimize windows.
After this system exploration period (usually about 1 hour), participants performed the scenarios, completing the ASQ as they finished each scenario. While the participant performed each scenario, an observer logged the participant's activities. A scenario counted as successfully completed if the participant finished it without assistance and produced the correct output. Either after completing all scenarios or at the end of the workday (with some scenarios never attempted), participants provided an overall system rating with the Post-Study System Usability Questionnaire (PSSUQ) (Lewis, 1992b; Lewis, Henry, & Mack, 1990).
Participants usually needed a full work day (8 hours) to complete the study.
At the end of the three studies, the researchers entered the responses to the ASQ, PSSUQ, and the scenario completion data into a database.
From this database, it was possible to conduct an exploratory factor analysis, reliability analyses, validity analyses, and a sensitivity analysis.

Factor analysis.
Due to the design of this study (eight scenarios and a 3-item questionnaire), either an 8-factor or 3-factor solution would have been reasonable. An 8-factor solution could indicate grouping by scenario, and a 3-factor solution could indicate grouping by item type. Figure 1 shows the scree plot for the eigenvalues.
The scree plot for this analysis did not support a 3-factor solution, but did support an 8-factor solution. The rotated factor pattern is in Table 2. Using a selection criterion of .5 for the factor loadings (indicated with bold type), a clear relationship existed between the factors and the scenarios. The eight factors accounted for almost all (94%) of the variance in the data.
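The eigenvalue step behind a scree plot is easy to reproduce in outline. The sketch below is not the original analysis: it assumes a hypothetical table asq with one row per participant and 24 columns (8 scenarios x 3 items) and extracts the eigenvalues of the correlation matrix, which are the values plotted in a scree plot:

import numpy as np

# Hypothetical data: rows = participants, columns = 24 ASQ responses
# (8 scenarios x 3 items). Real responses would be integers from 1 to 7.
rng = np.random.default_rng(0)
asq = rng.integers(1, 8, size=(48, 24)).astype(float)

corr = np.corrcoef(asq, rowvar=False)                   # 24 x 24 correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending order

# A scree plot charts these eigenvalues against their rank; the "elbow"
# suggests how many factors to retain (eight in the reported analysis).
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
for rank, (ev, cum) in enumerate(zip(eigenvalues, cumulative), start=1):
    print(f"factor {rank:2d}: eigenvalue {ev:5.2f}, cumulative variance {cum:6.1%}")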

Reliability.
For the eight summative scales derived from the eight factors, all the coefficient alphas exceeded .90. Coefficient alphas this large were surprising because each scale contained only three items, and reliability is largely a function of the number of scale items (Nunnally, 1978).
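Coefficient alpha for a three-item scale is simple to compute directly. A minimal sketch, using made-up ratings for a single scenario (48 participants x 3 items):

import numpy as np

def coefficient_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a matrix with one row per participant
    and one column per item."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical ratings: a shared base rating plus small per-item noise,
# clipped to the 1-7 range, so the three items are correlated.
rng = np.random.default_rng(1)
base = rng.integers(1, 8, size=(48, 1))
items = np.clip(base + rng.integers(-1, 2, size=(48, 3)), 1, 7).astype(float)
print(f"coefficient alpha = {coefficient_alpha(items):.2f}")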

Validity.
The correlation between the ASQ scores and scenario failure or success (coded as 0=failure and 1=success) was -.40 (n=48, p<.01). This result showed that participants who successfully completed a scenario tended to give lower (more favorable) ASQ ratings – evidence of concurrent validity.
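This is a point-biserial correlation (a Pearson correlation between a binary and a continuous variable). A sketch under made-up data, using scipy:

import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical data: scenario outcome (0 = failure, 1 = success) and the
# corresponding ASQ scale score (1 = most favorable, 7 = least favorable).
success = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1])
asq_score = np.array([2.0, 1.7, 5.3, 2.3, 4.7, 3.0, 1.3, 6.0, 2.7, 2.0, 5.0, 3.3])

r, p = pointbiserialr(success, asq_score)
print(f"r = {r:.2f}, p = {p:.3f}")  # a negative r means success goes with
                                    # lower (more favorable) ASQ ratings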

Sensitivity.
Of the 48 participants, 27 completed all of the ASQ items for all of the scenarios. This reduced database was appropriate for an analysis-of-variance (ANOVA) to assess the sensitivity of the ASQ. Specifically, did the ASQ scores discriminate among the different systems, user groups, or scenarios in the three usability studies? The main effect of Scenario was highly significant (F(7,126)=8.92, p<.0001).
The Scenario by System interaction was also significant (F(14,126)=1.75, p=.05). These results suggest that the ASQ scale score is a reasonably sensitive measure.
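The Scenario main effect can be checked with a repeated-measures ANOVA. The sketch below uses statsmodels' AnovaRM on a hypothetical long-format table with one ASQ score per participant per scenario; the reported analysis also involved System and user group as between-subjects factors, which this simplified within-subjects model omits:

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: 27 participants x 8 scenarios, one ASQ
# scale score per cell (mirroring the reduced database described above).
rng = np.random.default_rng(2)
participants = np.repeat(np.arange(27), 8)
scenario_idx = np.tile(np.arange(1, 9), 27)
scores = np.clip(3 + 0.3 * scenario_idx + rng.normal(0, 1, size=27 * 8), 1, 7)

df = pd.DataFrame({"participant": participants,
                   "scenario": [f"S{i}" for i in scenario_idx],
                   "asq": scores})

# Within-subjects ANOVA: does the mean ASQ score differ across scenarios?
result = AnovaRM(df, depvar="asq", subject="participant",
                 within=["scenario"]).fit()
print(result)  # reports F and p for the scenario effect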

Discussion
These findings have limited generalizability because the sample size for the factor analysis was relatively small. The usual recommendation of five participants per item would require 120 participants for this analysis (5 participants per item x 8 scenarios x 3 items per scenario = 120). On the other hand, the resulting factor structure was very clear.
The psychometric evaluation of this questionnaire showed that it is reasonable to condense the three ASQ items into a single scale through summation (or, equivalently, averaging). The available evidence indicates that the ASQ is reliable, valid, and sensitive. This condensation should allow easier interpretation and reporting of results when usability practitioners use the ASQ.
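In practice this condensation is just the average of whichever items the participant answered, with N/A treated as missing. A minimal sketch (continuing the convention of None for N/A):

import numpy as np

def asq_score(ease, time, info):
    """Average of the available ASQ items for one scenario (1 = most
    favorable, 7 = least favorable); items marked N/A are passed as None."""
    ratings = [r for r in (ease, time, info) if r is not None]
    return float(np.mean(ratings)) if ratings else float("nan")

print(asq_score(2, 3, None))  # -> 2.5 (support-information item marked N/A)
print(asq_score(1, 2, 2))     # -> about 1.67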

My Comments: This would be helpful if I eventually choose to develop a usability questionnaire as my usability evaluation tool.

Source: Lewis, J. R. IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. Technical Report 54.786, Human Factors Group, Boca Raton, FL. http://drjim.0catch.com/usabqtr.pdf
