Friday, September 25, 2009

Sep 25 - Nielsen, Card Sorting: How Many Users to Test (Alertbox)

Summary: Testing ever-more users in card sorting has diminishing returns, but you should still use three times more participants than you would in traditional usability tests.

Card Sorting: How Many Users to Test

One of the biggest challenges in website and intranet design is creating the information architecture: what goes where?
A classic mistake is to structure the information space based on how you view the content -- which often results in different subsites for each of your company's departments or information providers.

You enhance usability far more by creating an information architecture that reflects how users view the content.
In each of our intranet studies, we've found that some of the biggest productivity gains occur when companies restructure their intranet to reflect employees' workflow.
And in e-commerce, sales increase when products appear in the categories where users expect to find them.

All very good, but how do you find out the users' view of an information space and where they think each item should go?
For researching this type of mental model, the primary method is card sorting:
1. Write the name (and perhaps a short description) of each of the main items on an index card. Yes, good old paper cards.
2. Shuffle the cards and give the deck to a user. (The standard recommendations for recruiting test participants apply: they must be representative users, etc.)
3. Ask each user to sort the cards into piles, placing items that belong together in the same pile. Users can make as many or as few piles as they want; some piles can be big, others small.
4. Optional extra steps include asking users to arrange the resulting piles into bigger groups, and to name the different groups and piles. The latter step can give you ideas for words and synonyms to use for navigation labels, links, headlines, and search engine optimization.
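Once the physical sorting is done, each participant's piles have to be recorded for analysis. A minimal sketch of that bookkeeping step (the card names and pile labels here are hypothetical, not from the study):

```python
# One participant's sort, recorded as: pile label chosen by the user -> cards in it.
# Card names and labels are made up for illustration.
CARDS = {"Payroll", "Holiday calendar", "Expense reports",
         "Press releases", "Product specs", "Org chart"}

sort_result = {
    "HR stuff": ["Payroll", "Holiday calendar", "Org chart"],
    "Money": ["Expense reports"],
    "Company info": ["Press releases", "Product specs"],
}

def is_complete_sort(piles, cards):
    """True if every card was placed in exactly one pile, with no unknown cards."""
    placed = [c for pile in piles.values() for c in pile]
    return len(placed) == len(set(placed)) and set(placed) == cards

print(is_complete_sort(sort_result, CARDS))  # True
```

A check like this catches the common transcription errors (a card logged twice, or not at all) before the data goes into the similarity analysis.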

Research Study

In their study, Tullis and Wood first tested 168 users, generating very solid results. They then simulated the outcome of running card sorting studies with smaller user groups by analyzing random subsets of the total dataset. For example, to see what a test of twenty users would generate, they selected twenty users randomly from the total set of 168 and analyzed only that subgroup's card sorting data. By selecting many such samples, it was possible to estimate the average findings from testing different numbers of users.
The main quantitative data from a card sorting study is a set of similarity scores that measures the similarity of user ratings for various item pairs. If all users sorted two cards into the same pile, then the two items represented by the cards would have 100% similarity. If half the users placed two cards together and half placed them in separate piles, those two items would have a 50% similarity score.
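The similarity score for a pair of items is just the fraction of participants who put the two cards in the same pile. A short sketch of that computation, using four hypothetical participants' sorts:

```python
# Hypothetical sorts from four participants: each maps pile label -> cards.
sorts = [
    {"a": ["Payroll", "Expenses"], "b": ["Press", "Specs"]},
    {"a": ["Payroll", "Expenses", "Press"], "b": ["Specs"]},
    {"a": ["Payroll"], "b": ["Expenses"], "c": ["Press", "Specs"]},
    {"a": ["Payroll", "Expenses"], "b": ["Press", "Specs"]},
]

def same_pile(sort, x, y):
    """Did this participant place cards x and y in the same pile?"""
    return any(x in pile and y in pile for pile in sort.values())

def similarity(sorts, x, y):
    """Fraction of participants who grouped x and y together."""
    return sum(same_pile(s, x, y) for s in sorts) / len(sorts)

print(similarity(sorts, "Payroll", "Expenses"))  # 0.75 -- 3 of 4 users
```

Computing this for every pair of cards yields the similarity matrix that clustering tools operate on.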
We can assess the outcome of a smaller card sorting study by asking how well its similarity scores correlate with the scores derived from testing a large user group. (A reminder: correlations run from -1 to +1. A correlation of 1 shows that the two datasets are perfectly aligned; 0 indicates no relationship; and negative correlations indicate datasets that are opposites of each other.)

How Many Users?

For most usability studies, I recommend testing five users, since that's enough data to teach you most of what you'll ever learn in a test. For card sorting, however, there's only a 0.75 correlation between the results from five users and the ultimate results. That's not good enough.
You must test fifteen users to reach a correlation of 0.90, which is a more comfortable place to stop. After fifteen users, diminishing returns set in and correlations increase very little: testing thirty people gives a correlation of 0.95 -- certainly better, but usually not worth twice the money. There are hardly any improvements from going beyond thirty users: you have to test sixty people to reach 0.98, and doing so is definitely wasteful.
Tullis and Wood recommend testing twenty to thirty users for card sorting. Based on their data, my recommendation is to test fifteen users.

Why More Users for Card Sorting?

We know that five users are enough for most usability studies, so why do we need three times as many participants to reach the same level of insight with card sorting?
Because the methods differ in two key ways:

User testing is an evaluation method: we already have a design, and we're trying to find out whether or not it's a good match with human nature and user needs. Although people differ substantially in their capabilities (domain knowledge, intelligence, and computer skills), if a certain design element causes difficulties, we'll see so after testing a few users.
A low-end user might experience more severe difficulties than a high-end user, but the magnitude of the difficulties is not at issue unless you are running a measurement study (which requires more users).
All you need to know is that the design element doesn't work for humans and should be changed.

Card sorting is a generative method: we don't yet have a design, and our goal is to find out how people think about certain issues.
There is great variability in different people's mental models and in the vocabulary they use to describe the same concepts.
We must collect data from a fair number of users before we can achieve a stable picture of the users' preferred structure and determine how to accommodate differences among users.

Source:
Jakob Nielsen's Alertbox, July 19, 2004:
Card Sorting: How Many Users to Test
http://www.useit.com/alertbox/20040719.html
