Friday, September 25, 2009
Plan for week Sep 28 - Oct 3: Usability Questionnaire
Usability Questionnaire:
SUMI
QUIS
PSSUQ
SUS
ASQ
CUSI
A usability questionnaire is another method for evaluating usability. In my readings, I have come across this methodology, and I know a usability questionnaire contains a set of questions.
I would like to know more about this. A usability questionnaire is a type of usability evaluation tool, and my research title is "Usability Evaluation Tool for Mobile Learning Applications."
Hence, reading up (a literature review) on the popular types of usability questionnaires would definitely be useful for my knowledge.
Week of Sep 21-26: Progress Report
WIB: 2356 - Fri - 25/9/2009. (WIB = Waktu Indonesia Barat, Western Indonesia Time)
I have completed reading 22 of Jakob Nielsen's short Alertbox articles.
Sep 21 -> 9
Sep 22 -> 3
Sep 23 -> 7
Sep 25 -> 3
I have chosen Alertbox articles for 2006, 2005 and 2004.
I will be flying back to KL tonight, ETA 2330.
Last week I read 21 of Jakob Nielsen's short Alertbox articles.
Sep 25 - Nielsen, Risks of Quantitative Studies (Alertbox)
Risks of Quantitative Studies
There are two main types of user research: quantitative (statistics) and qualitative (insights).
The key benefit of quantitative studies is simple: they boil a complex situation down to a single number that's easy to grasp and discuss. I exploit this communicative clarity myself, for example, in reporting that using websites is 206% more difficult for users with disabilities and 122% more difficult for senior citizens than for mainstream users.
Beware Number Fetishism
When I read reports from other people's research, I usually find that their qualitative study results are more credible and trustworthy than their quantitative results. It's a dangerous mistake to believe that statistical research is somehow more scientific or credible than insight-based observational research. In fact, most statistical research is less credible than qualitative studies.
User interfaces and usability are highly contextual, and their effectiveness depends on a broad understanding of human behavior.
Fixating on numbers rather than qualitative insights has driven many usability studies astray. As the following points illustrate, quantitative approaches are inherently risky in a host of ways.
Random Results
Researchers often perform statistical analysis to determine whether numeric results are "statistically significant." By convention, they deem an outcome significant if there is less than 5% probability that it could have occurred randomly rather than signifying a true phenomenon.
This sounds reasonable, but it implies that one out of twenty "significant" results might be random if researchers rely purely on quantitative methods.
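My note: to make the one-in-twenty figure concrete, here is a small Python simulation (my own illustration, not from the article) that runs many "studies" of two identical designs and counts how often a t-test still reports a significant difference:

```python
# My illustration only: simulate "studies" comparing two identical designs
# and count how often a t-test declares a significant difference anyway.
import math
import random
import statistics

def significant_at_05(a, b):
    """Crude two-sample t-test: True if |t| exceeds ~2.0, roughly the
    two-tailed 5% threshold for the sample sizes used below."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(t) > 2.0

random.seed(1)
runs, false_positives = 1000, 0
for _ in range(runs):
    # Both "designs" draw task times from the same distribution: no real difference.
    group_a = [random.gauss(100, 15) for _ in range(20)]
    group_b = [random.gauss(100, 15) for _ in range(20)]
    if significant_at_05(group_a, group_b):
        false_positives += 1

print(f"{false_positives} of {runs} null studies came out 'significant'")
# Expect roughly 50 out of 1000, i.e. about one in twenty, purely by chance.
```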
Luckily, most good researchers -- especially those in the user-interface field -- use more than a simple quantitative analysis. Thus, they typically have insights beyond simple statistics when they publish a paper, which drives down, but doesn't eliminate, bogus findings.
There's a reverse phenomenon as well: Sometimes a true finding is statistically insignificant because of the experiment's design. Perhaps the study didn't include enough participants to observe a major -- but rare -- finding in sufficient numbers. It would therefore be wrong to dismiss issues as irrelevant just because they don't show up in quantitative study results.
Pulling Correlations Out of a Hat
If you measure enough variables, you will inevitably discover that some seem to correlate. Run all your stats through the software and a few "significant" correlations will surely pop out. (Remember: one out of twenty analyses is "significant," even if there is no underlying true phenomenon.)
Studies that measure seven metrics will generate twenty-one possible correlations between the variables. Thus, on average, such studies will have one bogus correlation that the statistics program deems "significant," even if the issues being measured have no real connection.
In my Web Usability 2004 project, we collected metrics on fifty-three different aspects of user behavior on websites. There are thus 1,378 possible correlations that I could throw into the hopper. Even if we didn't discover anything at all in the study, about sixty-nine correlations would emerge as "statistically significant."
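My note: the arithmetic behind these counts is just "n choose 2" pairs plus the 5% false-positive rate; a quick Python check of the figures:

```python
# Pairwise correlations among n metrics, and how many would be flagged
# "significant" at the 5% level even if no real relationships exist.
def possible_correlations(n_metrics: int) -> int:
    return n_metrics * (n_metrics - 1) // 2  # n choose 2

for n in (7, 53):
    pairs = possible_correlations(n)
    print(f"{n} metrics -> {pairs} correlations, ~{pairs * 0.05:.0f} spurious 'hits'")

# Output: 7 metrics -> 21 correlations, ~1 spurious 'hits'
#         53 metrics -> 1378 correlations, ~69 spurious 'hits'
```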
Overlooking Covariants
Even when a correlation represents a true phenomenon, it can be misleading if the real action concerns a third variable that is related to the two you're studying.
For example, studies show that intelligence declines by birth order. In other words, a person who was a first-born child will on average have a higher IQ than someone who was born second. Third-, fourth-, fifth-born children and so on have progressively lower average IQs. This data seems to present a clear warning to prospective parents: Don't have too many kids, or they'll come out increasingly stupid. Not so.
There's a hidden third variable at play: smarter parents tend to have fewer children. When you want to measure the average IQ of first-born children, you sample the offspring of all parents, regardless of how many kids they have. But when you measure the average IQ of fifth-born children, you're obviously sampling only the offspring of parents who have five or more kids. There will thus be a bigger percentage of low-IQ children in the latter sample, giving us the true -- but misleading -- conclusion that fifth-born children have lower average IQs than first-born children. Any given couple can have as many children as they want, and their younger children are unlikely to be significantly less intelligent than their older ones. When you measure intelligence based on a random sample from the available pool of children, however, you're ignoring the parents, who are the true cause of the observed data.
(Update added 2007: The newest research suggests that there may actually be a tiny advantage in IQ for first-born children after correcting for family size and the parents' economic and educational status. But the point remains that you have to correct for these covariants, and when you do so, the IQ difference is much less than plain averages may lead you to believe.)
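My note: a purely hypothetical simulation (all numbers invented, my own sketch rather than anything from the article) of how such a covariant works. Here, parents' IQ influences both child IQ and family size, so the population average drops with birth order even though birth order itself has no effect:

```python
# Hypothetical simulation (all numbers invented): parents' IQ drives both
# child IQ and family size, so average IQ falls with birth order even though
# birth order itself has no effect on any individual child.
import random
from collections import defaultdict

random.seed(0)
iq_by_birth_order = defaultdict(list)

for _ in range(50_000):
    parent_iq = random.gauss(100, 15)
    # Invented assumption: higher-IQ parents tend to have fewer children.
    family_size = max(1, round(random.gauss(4 - (parent_iq - 100) / 15, 1)))
    for birth_order in range(1, family_size + 1):
        # Child IQ depends on the parents, NOT on birth order.
        child_iq = 0.5 * parent_iq + 50 + random.gauss(0, 10)
        iq_by_birth_order[birth_order].append(child_iq)

for order in range(1, 6):
    kids = iq_by_birth_order[order]
    print(f"birth order {order}: mean IQ {sum(kids) / len(kids):.1f} (n={len(kids)})")
# The means decline with birth order only because later birth orders sample
# larger families, whose parents have lower average IQs in this simulation.
```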
As a Web example, you might observe that longer link texts are positively correlated with user success. This doesn't mean that you should write long links. Website designers are the hidden covariant here: clueless designers tend to use short text links like "more," "click here," and made-up words.
Over-Simplified Analysis
To get good statistics, you must tightly control the experimental conditions -- often so tightly that the findings don't generalize to real problems in the real world.
This is a common problem for university research, where the test subjects tend to be undergraduate students rather than mainstream users. Also, instead of testing real websites with their myriad contextual complexities, many academic studies test scaled-back designs with a small page count and simplified content.
For example, it's easy to run a study that shows breadcrumbs are useless: just give users directed tasks that require them to go in a straight line to the desired destination and stop there. Such users will (rightly) ignore any breadcrumb trail. Breadcrumbs are still recommended for many sites, of course. Not only are they lightweight, and thus unlikely to interfere with direct-movement users, but they're helpful to users who arrive deep within a site via search engines and direct links. Breadcrumbs give these users context and help users who are doing comparisons by offering direct access to higher levels of the information architecture.
Usability-in-the-large is often neglected by narrow research that doesn't consider, for example, revisitation behavior, search engine visibility, and multi-user decision-making.
Distorted Measurements
It's easy to prejudice a usability study by helping the users at the wrong time or by using the wrong tasks. In fact, you can prove virtually anything you want if you design the study accordingly. This is often a factor behind "sponsored" studies that purport to show that one vendor's products are easier to use than a competitor's products.
Even if the experimenters aren't fraudulent, it's easy to get hoodwinked by methodological weaknesses, such as directing the users' attention to specific details on the screen.
The very fact that you're asking about some design elements rather than others makes users notice them more and thus changes their behavior.
Many Web advertising studies are misleading, possibly because most such studies come from advertising agencies.
The most common distortion is the novelty effect: whenever a new advertising format is introduced, it's always accompanied by a study showing that the new type of ad generates more user clicks. Sure, that's because the new format enjoys a temporary advantage: it gathers user attention simply because it's new and users have yet to train themselves to ignore it.
The study might be genuine as far as it goes, but it says nothing about the new advertising format's long-term advantages once the novelty effect wears off.
Publication Bias
Editors follow the "man bites dog" principle to highlight new and interesting stories. While understandable, this preference for new and different findings imposes a significant bias in the results that get exposure.
Usability is a very stable field. User behavior is pretty much the same year after year. I keep finding the same results in study after study, as do many others. Every now and then, a bogus result emerges and publication bias ensures that it gets much more attention than it deserves.
Consider the question of Web page download time. Everyone knows that faster is better. Interaction design theory has documented the importance of response times since 1968, and this importance has been seen empirically in countless Web studies since 1995. E-commerce sites that speed up response times sell more. The day your server is slow, you lose traffic. (This happened to me recently: on January 14, Tog got "slashdotted"; because we share a server, my site lost 10% of its normal pageviews for a Wednesday when AskTog's increased traffic slowed useit.com down.)
If twenty people study download times, nineteen will conclude that faster is better. But again: one of every twenty statistical analyses will give the wrong result, and this one study might be widely discussed simply because it's new. The nineteen correct studies, in contrast, might easily escape mention.
Judging Bizarre Results
Bizarre results are sometimes supported by seemingly convincing numbers. You can use the issues I've raised here as a sanity check: Did the study pull correlations out of a hat? Was it biased or overly narrow? Was it promoted purely because it's different? Or was it just a fluke?
Typically, you'll discover that deviant findings should be ignored.
The broad concepts of human behavior in interactive systems are stable and easy to understand. The exceptions usually turn out to be exactly that: exceptions.
In 1989, for example, I published a paper on discount usability engineering, stating that small, fast user studies are superior to larger studies, and that testing with about five users is typically sufficient.
This was quite contrary to the prevailing wisdom at the time, which was dominated by big-budget testing. During the fifteen years since my original claim, several other researchers reached similar conclusions, and we developed a mathematical model to substantiate the theory behind my empirical observation. Today, almost everyone who does user testing has concluded that they learn most of what they'll ever learn with about five users.
But when four or five studies point in the same direction, they constitute a trend, which greatly enhances the finding's credibility as a general phenomenon.
Quantitative Studies: Intrinsic Risks
All the reasons I've listed for quantitative studies being misleading indicate bad research; it's possible to do good quantitative research and derive valid insights from measurements. But doing so is expensive and difficult.
Quantitative studies must be done exactly right in every detail or the numbers will be deceptive. There are so many pitfalls that you're likely to land in one of them and get into trouble.
If you rely on numbers without insights, you don't have backup when things go wrong. You'll stumble down the wrong path, because that's where the numbers will lead.
Qualitative studies are less brittle and thus less likely to break under the strain of a few methodological weaknesses. Even if your study isn't perfect in every last detail, you'll still get mostly good results from a qualitative method that relies on understanding users and their observed behavior.
Yes, experts get better results than beginners from qualitative studies.
But for quantitative studies, only the best experts get any valid results at all, and only then if they're extremely careful.
Source:
Jakob Nielsen's Alertbox, March 1, 2004:
Risks of Quantitative Studies
http://www.useit.com/alertbox/20040301.html
Sep 25 - Nielsen, Card Sorting: How Many Users to Test (Alertbox)
Card Sorting: How Many Users to Test
One of the biggest challenges in website and intranet design is creating the information architecture: what goes where?
A classic mistake is to structure the information space based on how you view the content -- which often results in different subsites for each of your company's departments or information providers.
You can better enhance usability by creating an information architecture that reflects how users view the content.
In each of our intranet studies, we've found that some of the biggest productivity gains occur when companies restructure their intranet to reflect employees' workflow.
And in e-commerce, sales increase when products appear in the categories where users expect to find them.
All very good, but how do you find out the users' view of an information space and where they think each item should go?
For researching this type of mental model, the primary method is card sorting:
1. Write the name (and perhaps a short description) of each of the main items on an index card. Yes, good old paper cards.
2. Shuffle the cards and give the deck to a user. (The standard recommendations for recruiting test participants apply: they must be representative users, etc.)
3. Ask each user to sort the cards into piles, placing items that belong together in the same pile. Users can make as many or as few piles as they want; some piles can be big, others small.
4. Optional extra steps include asking users to arrange the resulting piles into bigger groups, and to name the different groups and piles. The latter step can give you ideas for words and synonyms to use for navigation labels, links, headlines, and search engine optimization.
Research Study
First, Tullis and Wood tested 168 users, generating very solid results. They then simulated the outcome of running card sorting studies with smaller user groups by analyzing random subsets of the total dataset. For example, to see what a test of twenty users would generate, they selected twenty users randomly from the total set of 168 and analyzed only that subgroup's card sorting data. By selecting many such samples, it was possible to estimate the average findings from testing different numbers of users.
The main quantitative data from a card sorting study is a set of similarity scores that measures the similarity of user ratings for various item pairs. If all users sorted two cards into the same pile, then the two items represented by the cards would have 100% similarity. If half the users placed two cards together and half placed them in separate piles, those two items would have a 50% similarity score.
We can assess the outcome of a smaller card sorting study by asking how well its similarity scores correlate with the scores derived from testing a large user group. (A reminder: correlations run from -1 to +1. A correlation of 1 shows that the two datasets are perfectly aligned; 0 indicates no relationship; and negative correlations indicate datasets that are opposites of each other.)
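My note: the card names and piles below are invented, but this small Python sketch (my own, not from the article) shows how pairwise similarity scores are computed from card sorts and how a small subset of users can be correlated against the full group:

```python
# Toy example (card names and piles invented): compute card-sorting similarity
# scores and correlate a small subset of users against the full group.
import random
from itertools import combinations
from statistics import correlation  # Pearson's r, Python 3.10+

# Each user's sort is a list of piles; each pile is a set of card names.
sorts = [
    [{"shipping", "returns"}, {"laptops", "phones"}, {"contact"}],
    [{"shipping", "returns", "contact"}, {"laptops", "phones"}],
    [{"shipping"}, {"returns", "contact"}, {"laptops", "phones"}],
    [{"shipping", "returns"}, {"laptops"}, {"phones", "contact"}],
    [{"shipping", "returns"}, {"laptops", "phones", "contact"}],
    [{"shipping", "returns"}, {"laptops", "phones"}, {"contact"}],
]
cards = sorted({card for sort in sorts for pile in sort for card in pile})

def similarity_scores(user_sorts):
    """For every card pair, the fraction of users who put both cards in one pile."""
    return {
        (a, b): sum(any({a, b} <= pile for pile in sort) for sort in user_sorts)
                / len(user_sorts)
        for a, b in combinations(cards, 2)
    }

random.seed(7)
full = similarity_scores(sorts)
subset = similarity_scores(random.sample(sorts, 3))  # pretend only 3 users were tested

pairs = list(full)
r = correlation([full[p] for p in pairs], [subset[p] for p in pairs])
print(f"Correlation between the 3-user subset and the full group: {r:.2f}")
```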
How Many Users?
For most usability studies, I recommend testing five users, since that's enough data to teach you most of what you'll ever learn in a test. For card sorting, however, there's only a 0.75 correlation between the results from five users and the ultimate results. That's not good enough.
You must test fifteen users to reach a correlation of 0.90, which is a more comfortable place to stop. After fifteen users, diminishing returns set in and correlations increase very little: testing thirty people gives a correlation of 0.95 -- certainly better, but usually not worth twice the money. There are hardly any improvements from going beyond thirty users: you have to test sixty people to reach 0.98, and doing so is definitely wasteful.
Tullis and Wood recommend testing twenty to thirty users for card sorting. Based on their data, my recommendation is to test fifteen users.
Why More Users for Card Sorting?
We know that five users are enough for most usability studies, so why do we need three times as many participants to reach the same level of insight with card sorting?
Because the methods differ in two key ways:
User testing is an evaluation method: we already have a design, and we're trying to find out whether or not it's a good match with human nature and user needs. Although people differ substantially in their capabilities (domain knowledge, intelligence, and computer skills), if a certain design element causes difficulties, we'll see so after testing a few users.
A low-end user might experience more severe difficulties than a high-end user, but the magnitude of the difficulties is not at issue unless you are running a measurement study (which requires more users).
All you need to know is that the design element doesn't work for humans and should be changed.
Card sorting is a generative method: we don't yet have a design, and our goal is to find out how people think about certain issues.
There is great variability in different people's mental models and in the vocabulary they use to describe the same concepts.
We must collect data from a fair number of users before we can achieve a stable picture of the users' preferred structure and determine how to accommodate differences among users.
Source:
Jakob Nielsen's Alertbox, July 19, 2004:
Card Sorting: How Many Users to Test
http://www.useit.com/alertbox/20040719.html
Sep 25 - Nielsen, The Need for Web Design Standards (Alertbox)
The Need for Web Design Standards
Unfortunately, much of the Web is like an anthill built by ants on LSD: many sites don't fit into the big picture, and are too difficult to use because they deviate from expected norms.
Several design elements are common enough that users expect them to work in a certain way.
Here's my definition of three different standardization levels:
Standard: 80% or more of websites use the same design approach. Users strongly expect standard elements to work a certain way when they visit a new site because that's how things always work.
Convention: 50-79% of websites use the same design approach. With a convention, users expect elements to work a certain way when they visit a new site because that's how things usually work.
Confusion: with these elements, no single design approach dominates, and even the most popular approach is used by at most 49% of websites. For such design elements, users don't know what to expect when they visit a new site.
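My note: the three levels boil down to simple thresholds on the share of sites using an approach. A tiny Python sketch (my own, not Nielsen's) of that classification:

```python
def standardization_level(share_of_sites: float) -> str:
    """Classify a design approach by the share of sites (0.0-1.0) using it,
    following the thresholds defined above."""
    if share_of_sites >= 0.80:
        return "standard"    # users strongly expect this behavior
    if share_of_sites >= 0.50:
        return "convention"  # users expect this behavior by default
    return "confusion"       # no dominant approach; users can't predict it

# Example: if 83% of sites put the logo in the upper left, that placement is a standard.
print(standardization_level(0.83))  # -> standard
```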
How Many Design Elements Are Standardized?
To estimate the extent to which Web design complies with interface standards, I compared two studies: my own study of twenty-four features on fifty corporate homepages, and a University of Washington master's thesis that studied thirty-three features on seventy-five e-commerce sites.
The following shows the extent to which websites have standardized on the fifty-seven design approaches studied:
Standard: 37% of design elements were done the same way by at least four-fifths of the sites. Standard design elements included:
* A logo in the upper left corner of the page
* A search box on the homepage
* An absence of splash pages
* Breadcrumbs listed horizontally (when they were used)
Convention: 40% of design elements were done the same way by at least half the sites (but less than four-fifths of the sites). Conventional design elements included:
* Using the label "site map" for the site map (which is recommended from user research on site map usability)
* Changing the color of visited links (recommended to help navigation)
* Placing the shopping cart link in the upper right corner of page
* Placing links to sibling areas (neighboring topics at the same information architecture level as the current location) in the left-hand column
Confusion: 23% of design elements were done in so many ways that no single approach dominated. Confusion reigned in several areas, including:
* The main navigation schemes, which included left-hand menu, tabs across the top, navbar across the top, Yahoo-style directory in the middle, and so on
* Placement of the search feature, which included upper right, upper left, middle, and elsewhere on the page
* The sign-in process
* Placement of Help
At first glance, it might seem wonderful that only 1/4 of the design issues created confusion.
But look at the design element examples at each standardization level. Unfortunately, the most firmly standardized issues are the simplest and most localized ones, such as where to put the logo or how to display breadcrumb trails.
The confusing design elements are the bigger issues that contribute more strongly to users' ability to master the whole site, as opposed to dealing with individual pages. Navigation is confusing. Search is confusing. Sign-in is confusing. Even Help is confusing, reducing the usability of the user's last resort when all else has failed.
Why Design Standards Help Users
We must eliminate confusing design elements and move as far as possible into the realm of design conventions. Even better, we should establish design standards for every important website task.
Standards ensure that users
* know what features to expect,
* know how these features will look in the interface,
* know where to find these features on the site and on the page,
* know how to operate each feature to achieve their goal,
* don't have to ponder the meaning of unknown design elements,
* don't miss important features because they overlook a non-standard design element, and
* don't get nasty surprises when something doesn't work as expected.
Why Websites Should Comply With Design Standards
One simple reason:
Jakob's Law of the Internet User Experience: users spend most of their time on other websites.
In visiting all these other sites, people become accustomed to the prevailing design standards and conventions. Thus, when users arrive at your site, they assume it will work the same way as other sites.
In my recent research into Web-wide user behavior, users left websites after 1 minute and 49 seconds on average, concluding in that time that the website didn't fulfill their needs.
With so little time to convince prospects that you're worthy of their business, you shouldn't waste even a second making them struggle with a deviant user interface.
Going forward, we must produce and follow widely-used conventions and design patterns for the bigger issues in Web design, including:
* the structure of product pages,
* workflow (beyond simplistic shopping carts),
* the main types of information a corporate site should provide, and
* the information architecture for that information (where to find what).
Intranet Standards
Design standards are one area in which intranets are better off than public websites.
A key distinction between an intranet and the Internet is that the intranet has a single authority in charge.
The intranet team can define a design standard and promote it throughout the corporation. The team can also implement a single publishing system that ensures consistency by placing all content into a single set of well-designed templates.
Source:
Jakob Nielsen's Alertbox, September 13, 2004:
The Need for Web Design Standards
http://www.useit.com/alertbox/20040913.html
First F2F Meeting with Supervisor and Co-Supervisor
Next Friday, 2 Oct 2009.
10.00 am.
@ K-Space, FCM.
:)
#####################
as at 28 Sep 2009
revised to
Monday, 5 Oct 2009
2.00 pm.
@K-Space
Wednesday, September 23, 2009
Sep 23 - Nielsen, Newsletter Usability: Can a Professional Publisher Do Better? (Alertbox)
The Washington Post's email newsletter earns a high usability score. It's particularly good at setting users' expectations before they subscribe, though the unsubscribe interface has some problems.
Newsletter Usability: Can a Professional Publisher Do Better?
My recent review of the Bush and Kerry campaigns' email newsletters concluded that both U.S. presidential candidates published newsletters with good content, but both had severe deficiencies in their content's user interface.
The Washington Post has a dedicated newsletter called the Weekly Campaign Report that covers many of the same issues as the candidates' newsletters. So, I set out to answer a question: Does a professional publisher do better with its email newsletter than the campaign sites do?
Result: A Grand-Slam Winner
I evaluated washingtonpost.com's subscribe and unsubscribe interfaces on September 21, 2004, and evaluated the Weekly Campaign Report newsletter itself during a four-week period from September 6 to October 3, 2004. I scored the website and mailings for compliance with the 127 design guidelines for newsletter usability derived from our recent research with users who were reading and subscribing to a large number of email newsletters. The Washington Post handily outscores both George W. Bush and John Kerry. Not only is its overall rating much better, but the Post also got more points than either candidate in each of the four major newsletter usability areas.
Okay, so 72% is certainly not a perfect score. I prefer to see sites complying with 80-90% of usability guidelines. (A 100% score is not required because any given site often has special circumstances that make it appropriate to deviate from a few guidelines.) Still, 72% is a very respectable level on today's Internet, and it's dramatically higher than the 57-58% achieved by Bush and Kerry.
Subscription Interface
Washingtonpost.com scores much better than the presidential campaign sites in setting users' expectations for what they'll get if they subscribe to the newsletter. Basically, Bush and Kerry say "give us your email address, and we'll send you some [unspecified] stuff." The main reason that the Post's subscription interface scores less than 100% is that it requires user registration, including many nosy questions requesting personal information. This is presumably driven by the myth that user demographics are the way to target ads on the Internet, but registration impedes usability and drives away subscribers. The net result is thus lower advertising revenues.
Amazingly, washingtonpost.com partly violates the basic guideline to promote relevant newsletters in context by failing to link to its newsletters from appropriate articles throughout the site. They do link to the Weekly Campaign Report from the "2004 Election" category page, so I gave them half credit for this guideline.
Washingtonpost.com also scores low on helping users find newsletter information through its user guidance features. I did give it a few points for good implementation of search usability guidelines: when users search "subscribe" or "newsletters," there's a clear best-bets-style link to the newsletters at the top of the results list. But a search for "unsubscribe" provides no results and no advice on what to do -- not even a link to Help or a site map.
Subscription Maintenance and Unsubscribing
User registration comes back to bite the Post in terms of unsubscribe usability. The newsletter doesn't follow the recommended one-click process for unsubscribing. Instead, users must sign in, which requires them to remember their passwords. Nasty.
At least the site offers a decent interface for recovering forgotten passwords, which somewhat alleviates the usability problem of requiring users to sign in before they can get off the mailing list.
Once users sign in, it's easy to unsubscribe from either a single newsletter or from all the newsletters they receive. Good.
In several cases, it's possible to change the frequency of newsletters, going from daily to weekly newsletters for users who feel flooded with email. This is good, though there is no recovery interface that offers a corresponding weekly newsletter as a substitution when users unsubscribe from a daily newsletter.
Newsletter Content
The Post gets high scores for newsletter content, but then so did the presidential candidates. You would expect a leading newspaper to have people who can write and edit, and they do, collecting a tad more points in this area than the campaign newsletters.
Headlines are short (good), but not always specific enough for the online medium (bad). For example, the September 8 newsletter was entitled "Over the Top," which doesn't quite explain the newsletter's main thrust. A subject line like "Campaign Rhetoric Takes a Nasty Turn" would have generated a higher open rate.
Subheads are often straightforward and to the point (good), such as "Poll Analysis" or "Kerry Blasts Bush on Guns." The Washington Post collects only half credit for the common task of newsletter printing, because the printout's right side is cut off if users simply click the Print button in their email program. The cursed frozen layout strikes again.
Advertisements are handled appropriately and don't damage the user experience, except that, paradoxically, some editorial content is too similar to ads and users might therefore overlook it. Looking like an advertisement has long been one of the top Web design mistakes, and should be avoided.
Improving Newsletter Usability
Is it unfair to compare the presidential campaign newsletters with one that is professionally published by a major newspaper? I don't think so: the presidential campaigns have budgets of more than $300 million each, and one of their biggest problems was not a lack of writing skill, but a lack of editorial judgment.
The Bush and Kerry newsletters have improved their usability scores somewhat since I evaluated them. From a positive perspective, this indicates that the campaign managers are capable of learning from experience.
The Washington Post clearly shows a good email newsletter's potential. The Post's Web design team and newsletter editors have done a superlative job and beat both George W. Bush and John Kerry by a mile.
The Post newsletter is also better than the majority of email newsletters that we've evaluated from corporate websites and Internet marketers. Yes, the usability could be even better (I am never satisfied), but my conclusion is: good job, folks.
Source:
Jakob Nielsen's Alertbox, October 11, 2004:
Newsletter Usability: Can a Professional Publisher Do Better?
http://www.useit.com/alertbox/20041011.html
Sep 23 - Nielsen, Acting on User Research (Alertbox)
Acting on User Research
The good news in user research is that we're building up a massive body of knowledge about user behavior in online systems.
Here's one way of quantifying the amount of current usability knowledge: We've published 3,326 pages of usability research reports. Those reports contain 1,217 general design guidelines and 1,961 screenshots, all of which have information about how specific designs helped or hindered real users.
Other researchers publish reports as well, and you may also have an internal usability group or have commissioned consultants to study issues of interest to you.
In total, a huge mass of user research. How should you deal with all these findings?
First Steps
User research is a reality check. It tells you what really happens when people use computers. You can speculate on what customers want, or you can find out.
Research offers an understanding of how users behave online, and is a solid foundation on which to build a design.
I still recommend that you user test your own design: any time you have a new idea, build a paper prototype and test it so that you don't waste money implementing ideas that don't work. But, if you start with design ideas that are based on the actual behavior of real human beings, you'll have considerably fewer usability problems than if you target a design at a hypothetical or idealized user.
It can be overwhelming at first to see a long list of new research findings. Try to process them in small bites.
For example, look at your homepage or a typical content page in light of the new findings. Print out a copy and circle each design element that might violate a design guideline or cause users problems.
You can also use research findings as a checklist: go through your own design one guideline at a time and see whether you comply. Whenever you're in violation of an established usability finding, you can dig deeper into that finding's underlying user research and learn more about it. With your new knowledge, you might decide to fix your design to make it compliant with users' typical behavior.
Handling Disagreements
There are two main reasons people disagree with a research study and its conclusions:
1. their own research shows something different, or
2. their personal opinions and preferences differ from the research recommendations.
If your research findings disagree with published results, you have two options.
First, it's possible that your study had a methodology flaw, so it's worth reviewing the study in light of the new results. There are numerous issues to consider to run a valid user test.
Second, it could be that you're dealing with a special case, and your findings actually are different. Such cases do exist, although exceptions are more rare than people would like to think.
If your own intuition disagrees with published findings, view it as a learning opportunity that can improve your future insights.
Design is a business decision. You should follow the data and do what generates the biggest profits for your company, not what wins design awards.
If you disagree strongly, you can always run a study of your design to determine whether you are one of those rare exceptions. General usability guidelines typically hold true in about 90% of cases. There are many special circumstances that make the remaining 10% sufficiently atypical such that the best solution will be something other than the normal recommendation.
If you run an online business, you're in the user experience business: all the value flows through a user interface.
Conclusion
It's essential to develop the expertise to interpret user research and an understanding of when to run usability studies. You still have to know how to deal with the reports and make the research findings relevant to your business.
Source:
Jakob Nielsen's Alertbox, November 8, 2004:
Acting on User Research
http://www.useit.com/alertbox/20041108.html
Tuesday, September 22, 2009
Sep 23 - Nielsen, Authentic Behavior in User Testing (Alertbox)
Authentic Behavior in User Testing
It's a miracle that user testing works: You bring people into a room where they're surrounded by strangers, monitored by cameras, and asked to perform difficult tasks with miserable websites. Often, there's even an intimidating one-way mirror dominating one wall. Under these conditions, how can anybody accomplish anything?
User Engagement
When test participants are asked to perform tasks, they usually get so engaged in using the interface that the usability lab's distractions recede. Users know that there's a camera, and maybe even a one-way mirror hiding additional observers, but their attention is focused on the screen.
It's a basic human desire to want to perform well on a test. We can say, "we're not testing you, we're testing the system" all we want. People still feel like they're taking a test, and they want to pass. They don't want to be defeated by a computer.
Because they want to be successful, users allocate their mental resources to what's happening on the screen, not what's happening in the room. Of course, this concentration can easily be broken, which is why it's a cardinal rule of user testing to have observers remain absolutely quiet if they're in the room with users.
Generally, as long as observers stay quiet and out of view (behind the user or behind the mirror), participants will remain engaged in their tasks.
One downside of users' tendency to engage strongly is that they sometimes work harder on tasks in a test session than they would at home. If the user says, "I would stop here," you can bet that they'd probably stop a few screens earlier if they weren't being tested.
Suspension of Disbelief
In user testing, we pull people away from their offices or homes and ask them to pretend to perform business or personal tasks with our design. Obviously, an artificial scenario.
As part of the full usability life cycle, there are good reasons to conduct field studies and observe users' behavior in their natural habitats. Unfortunately, field studies are much more expensive than lab studies, and it's typically difficult to get permission to conduct research inside other companies.
The tendency to suspend disbelief is deeply rooted in the human condition, and may have developed to help prehistoric humans bond around the campfire in support of storytelling and magic ceremonies. Consider what happens when you watch a television show such as Star Trek:
* You're not looking at people, you're looking at pictures of people, in the form of glowing dots on a picture tube.
* You're not looking at pictures of real people, you're looking at pictures of actors pretending to be characters, like Mr. Spock and Captain Picard, that don't exist.
* You're not looking at pictures of actors using transporters, shooting phasers, and flying faster-than-light starships. All such activities are simulated with special effects.
You know all of this, and yet you engage in the story when watching the show.
Similarly, in usability studies, participants easily pretend that the scenario is real and that they're really using the design. For this to happen, you obviously need realistic test tasks and to have recruited representative users who might actually perform such tasks in the real world. Assuming both, most usability participants will suspend disbelief and simply attempt the task at hand.
Suspension of disbelief goes so far that users engage strongly with paper prototypes where the user interface is purely a piece of paper. As long as you can move through screens in pursuit of your goal, you will behave as if the system were real and not simulated.
When Engagement and Suspension of Disbelief Fail
User testing typically works, but there are exceptions. Occasionally, test participants are so lazy and difficult to engage that they never suspend disbelief and work on the tasks for real.
For example, if you ask such users to select a product to solve a problem, they'll typically stop at the first remotely related product, even if it's basically unsuitable and wouldn't be bought by anybody who really had the target problem.
In rare cases, such nonrealistic usage is serious enough that you must simply excuse the participant and discard the session's data. If users haven't suspended disbelief and performed conscientiously, you can't trust that anything they've done represents real use.
The easiest and most common approach is to ask the user whether this is what he or she would do at the office (or at home, for a consumer project). This small reminder is often enough to get users engaged. Variants of this technique include:
* Ask users if they have enough information to make a decision (when they stop after finding the first nugget of information about the problem).
* Ask users if they are sure that they've selected the best product (if they stop after finding the first likely purchase, without evaluating alternative options).
If users fail to take the tasks seriously in your first few test sessions, you can usually rescue the study by modifying the test instructions or task descriptions.
For example, it's often enough to add an introductory remark such as, "Pretend that you must not only find the best solution, but justify the choice to your boss."
Including an absent boss in the test scenario encourages suspension of disbelief and usually works wonders. I've also found it effective to ask users to "write five bullet points for your boss, explaining the main pros and cons of this product." (You can also ask for three bullets of pros and three bullets of cons.)
Source
Jakob Nielsen's Alertbox, February 14, 2005:
Authentic Behavior in User Testing
http://www.useit.com/alertbox/20050214.html
Sep 23 - Nielsen, Evangelizing Usability: Change Your Strategy at the Halfway Point (Alertbox)
Summary: The evangelism strategies that help a usability group get established in a company are different from the ones needed to create a full-fledged usability culture.
Introduction
The approach that takes your company from miserable usability to decent design is not the one you'll need to get from good to great.
A company progresses through a series of maturity levels as usability becomes more widely accepted in the organization and more tightly integrated with the development process. If you are the company's leading user advocate or usability manager, one of your main jobs is to prod the company to the next level.
Early Evangelism: From User Advocate to Usability Group
Starting point: One or two people in the company care about usability, but working on usability activities is rarely their main job. As a result, they start small, typically by doing a little user testing on the side.
Desired goal: Establish an official usability group with a manager, a charter, and a budget to perform usability activities.
In the early maturity stage, few resources are available and the company is not truly committed to usability. About halfway through this growth process, the company typically has a few full-time usability specialists, but they won't officially "own" usability because they don't have a recognized spot on the orgchart.
Under these circumstances, it's impossible to support the full user-centered-design (UCD) life cycle, and it would be futile for the few lonely usability specialists to try to do so. A company can't be forced to jump through multiple maturity levels in one push.
At this point, the strategy should be as follows:
Usability methods. Do small, qualitative user tests. Don't bother with advanced methods, such as field studies and benchmark studies, because you won't have the time or money (and might lack the expertise).
Lifecycle stage. You perform usability at whatever point the project manager calls you in. This is typically late in the project, when the manager realizes that the user interface is in trouble. We all know that it's better to start usability activities early -- before design begins -- so that user data can drive the design's direction.
Choice of project. Work with projects that want to work with you. When you don't have the organizational mandate to own usability, you can't impose yourself on projects that might need you, but don't realize it.
Methodology growth. Set up a usability lab, establish procedures for recruiting test participants, and define standardized test tasks for your domain.
The key word for early evangelism is to be opportunistic in allocating your scarce resources. You can't follow the recommended usability process in all its glory because your organization lacks the commitment required. Instead of fighting windmills, go for the easy wins.
Late Evangelism: From Usability Group to Usability Culture
Starting point: There's an official usability group with a manager, a charter, and a budget to perform usability activities.
Desired goals: Establish an entire user experience department, with several specialized groups for different user-centered activities. Generate total company commitment to a formal UCD process -- owned by the user experience department -- for all development projects.
Typically, usability becomes "established" in a company without giving the usability group the power to fully own the total user experience. The usability group is often viewed as a service organization that supplies usability expertise to project teams at their managers' request.
As the company matures, more resources become available for usability, but some prioritization is still needed. Rather than chase easy wins, you must build spectacular wins for usability to convince executives to move the organization to the desired goal state.
At this point, the strategy should be as follows:
Usability methods. Continue to perform simple user tests to clean up bad design. You'll likely do this forever. Now, however, you should spend more resources on deeper studies that generate new insights and thus inspire new features or paradigm shifts in the design. Conduct early studies such as field studies and competitive studies that set the design's direction. Collect usability metrics from benchmark studies. Track this data over time to quantify usability's return on investment.
Lifecycle stage. Press to start usability activities earlier for each new project. For key projects, ensure sufficient resources to conduct both early, pre-design studies (for big-win insights to shape the project) and multiple rounds of iterative testing (to polish the design beyond simply fixing its worst flaws).
Choice of project. Focus on high-impact projects where the benefits of substantial usability improvements will have enormous monetary value and high visibility for senior executives. Better to support the full life cycle of a few important projects than to do lower-impact, last-minute testing for all projects.
Methodology growth. Once you've performed enough comparative, field, and iterative studies, generalize the lessons and write usability guidelines for your organization's particular type of user interface designs. Develop and maintain official UI design standards.
The goal of late-stage evangelism is to fully integrate usability with development so that it becomes second nature to start projects with usability activities, before design begins. The organization needs a usability culture.
All managers should understand the basic steps in the UCD life cycle -- if nothing else, because that life cycle has been mandated as the way projects get done.
For this to happen, executives must have seen several examples of the added value created by full-fledged usability, as opposed to last-minute user testing.
Because user testing is so cheap and so profitable, it's easy to get caught at a mid-level of organizational maturity, where user testing is common but deeper research is avoided.
Source:
Jakob Nielsen's Alertbox, March 28, 2005:
Evangelizing Usability: Change Your Strategy at the Halfway Point
http://www.useit.com/alertbox/20050328.html
Sep 23 - Nielsen, Durability of Usability Guidelines (Alertbox)
Summary: About 90% of usability guidelines from 1986 are still valid, though several guidelines are less important because they relate to design elements that are rarely used today.
From 1984 to 1986, the U.S. Air Force compiled existing usability knowledge into a single, well-organized set of guidelines for its user interface designers. I was one of several people who advised the project (in a small way), and thus received a copy of the final 478-page book in August 1986.
The project identified 944 guidelines. This may seem like a lot, but it pales against the 1,277 guidelines for Web and intranet usability we've identified so far--and we're not done yet.
Twenty-Year-Old Guidelines--Past Their Expiration Date?
The 944 guidelines related to military command and control systems built in the 1970s and early 1980s; most used mainframe technology. You might think that these old findings would be completely irrelevant to today's user interface designers. If so, you'd be wrong.
I decided to use the 1986 report to assess the longevity of usability work. Because reassessing all 944 guidelines would require too much effort, I took a shortcut, reviewing ten guidelines from each of the report's six sections, for a total sample of sixty guidelines. (A sidebar reprints these sixty guidelines, so you can judge them for yourself.)
Of those sixty usability guidelines, fifty-four continue to be valid today. In other words, 90% of the old guidelines are still correct.
What's Changed
Ten percent of the guidelines would have to be retracted or reconsidered for today's world. But even these questionable guidelines are at least partly correct in most cases. In fact, I would deem only two guidelines (3%) completely wrong and harmful to usability if followed.
Guideline 4.2.6 said to provide a unique identification for each display in a consistent location at the top of the display frame. This guideline worked well in the target domain of mainframes: Users typically navigated only a few screens, and having a unique ID let them understand their current location. The IDs also made it easy for manuals and help features to refer to specific screens.
Today, screen identifiers would clutter the screens with irrelevant information. They would not help modern users, who move freely among numerous locations.
Even this invalidated guideline continues to contain a core of truth: it's good for users to know where they are and what they can do on each screen. The current recommendation is to provide a headline or title that concisely summarizes each screen's purpose.
Guideline 3.1.4.13 said to assign a single function key to any continuously available feature. This made sense for mainframe interfaces because they relied extensively on function keys to speed up the interaction. Also, mainframe systems were so heavily moded that very few functions were available across all system areas; the few that were obviously deserved special treatment.
Modern systems attempt to be modeless, so many features have become ubiquitous and accessible from anywhere. Furthermore, function keys are no longer the primary way of operating computers. Given these two changes, it no longer makes sense to assign function keys to constantly available features.
In addition to the invalid guidelines, twenty percent of guidelines are essentially irrelevant today because they relate to rarely used interface technologies.
For example, guideline 1.4.13 discussed how to overtype the field markers (typically underscores) that mainframe systems used to indicate where users could type their input. Today, input fields are almost always denoted by text entry boxes, so knowing how to improve a field marker's usability is largely irrelevant.
What's Still Valid
Of the 944 guidelines from 1986, 70% continue to be both correct and relevant today. There is much good advice, for example, on dealing with entry fields and labels on online forms, which have changed little from the dominant mainframe designs of the 1970s.
The guidelines on using business graphics to display different types of data are also highly relevant today. In our recent studies of how investors and financial analysts use the investor relations area of corporate websites, we found many usability problems related to overly complex charts. Following twenty-year-old guidelines for charting numbers would have improved many IR sites considerably.
The guidelines for error messages, system feedback, and login also hold up. It was interesting to see that guideline 6.2.1 recommended single sign-on. In our intranet usability measurements, we found that login difficulties (mainly due to multiple sign-in requirements) accounted for the second-largest difference in employee productivity between intranets with good usability and those with poor usability. (Search usability constituted the biggest difference between good and bad intranets.)
Why Usability Guidelines Endure
You would be hard-pressed to find any other Air Force technical manual from 1986 that's 70% correct and relevant today.
Usability guidelines endure because they depend on human behavior, which changes very slowly, if at all. What was difficult for users twenty years ago continues to be difficult today.
I recently analyzed my own old guidelines for Web usability, as published on the Alertbox and elsewhere in the early days of the Web. Of those early guidelines, 80% continue to be valid and relevant. Of course, my early guidelines are only ten to eleven years old, so it's hardly surprising that they'd score better than twenty-year-old guidelines.
Usability guidelines mainly become obsolete when they're tightly bound to specific technologies. For example, neither the 1986 field marker guidelines nor my 1995 guideline on making hypertext links blue held up. (More recent guidelines for link colors offer updated recommendations.) However, the corresponding underlying usability principles do hold: ensure that users know what they can do and that they can recognize actionable user interface elements.
Posterity vs. the Present
The more permanent guidelines tend to be those that are the most abstracted from technology.
The lure of the present is especially strong when writing for the Web. In writing a book, I'm highly conscious of people who will be reading my text ten or more years into the future. But when posting to my website, I tend to write for today's readers, even though 80% of the pageviews will occur after an article has passed into the archives. Luckily, most of my old analyses hold up pretty well, and ten-year-old articles continue to be 78% relevant.
However seductive the present might be, writing for the Web is writing for the ages, not just for the moment.
Usability guidelines have proven highly durable, and most hold true over time. Present-day designers should not dismiss old findings because of their age.
Source:
Jakob Nielsen's Alertbox, January 17, 2005:
Durability of Usability Guidelines
http://www.useit.com/alertbox/20050117.html
Sep 23 - Nielsen, Formal Usability Reports vs. Quick Findings (Alertbox)
Summary: Formal reports are the most common way of documenting usability studies, but informal reports are faster to produce and are often a better choice.
Introduction
I recently asked 258 usability practitioners which methods they use to communicate findings from their studies:
42% produce a formal written test report with full details on the methodology
36% write a "quick findings" report
24% circulate an email that lists the study's top findings
15% disseminate a spreadsheet of the findings
14% enter usability findings into a bug-tracking database
21% conduct a meeting in which they offer a formal presentation of the findings
27% conduct an informal meeting or debriefing to discuss the findings
1% show full-length videos from the test sessions
4% show highlights videos from the test
3% create and display posters or other physical exhibits
There's no one best approach to reporting usability study findings. Most people use more than one method, depending on their corporate culture and usability lifecycle approach.
That said, the survey clearly found that formal and brief reports are the two dominant means of disseminating usability findings. Both approaches have their place.
When to Use Quick Findings Reports
You can maximize user interface quality by conducting many rounds of testing as part of an iterative design process. To move rapidly and conduct the most tests within a given time frame and budget, informal reports are the best option.
Preparing a formal slide-based presentation will simply slow you down, as will using videos or statistics. Instead, simply hold a quick debriefing immediately after the test, structured around test observers' informal notes on user behavior. Follow this meeting with a short email to the entire team (the shorter the email, the greater the probability that it will be read).
Some organizations thrive on formal presentations and slide-deck circulation. In my view, this is a poor method for documenting usability findings. Bullet points don't capture the subtleties of user behavior, and it's almost impossible to interpret a slide presentation even a few months after it was created.
Extremely brief write-ups work well for studies aimed at finding an interface's main flaws to drive design iterations. Such studies are largely throwaway; once you've created the design's next version, the study findings are rarely useful.
When to Use Formal Reports
The formal report remains the most common format, but I think it's overused and prefer more rapid reporting and more frequent testing. The formal report definitely has its place, however, as in cases like these:
Benchmark studies or other quantitative tests. Unless you document the measurement methods in detail, you can't judge the numbers. Also, one of the main reasons to measure a benchmark is to measure it again later and compare. To do so, you need to know everything about the original study.
Competitive studies. When you test a broad sample of alternative designs, the resulting lessons are usually so fundamental and interesting that they warrant a complete report, with many screenshots and in-depth analysis.
Field studies. Most organizations rarely conduct studies at customer locations; when they do, the findings deserve an archival report that can be used for years. Also, field studies usually produce important insights that are too complex to be explained in a quick write-up.
Consulting projects. When you hire an expensive consultant to evaluate your user experience, you should milk the resulting insights for a long time to come. The outside perspective is only valuable, however, if the insights stay in-house once the consultant has gone. To ensure this, you need a report that's both insightful and comprehensive.
Conclusion
The best usability reports are learning tools that help form a shared understanding across the team. It's worth investing the effort to produce a few formal reports each year. One way to free up resources and make some reports extra good is to scale down your ambitions for most of your everyday reports, keeping them quick and informal.
Source:
Jakob Nielsen's Alertbox, April 25, 2005:
Formal Usability Reports vs. Quick Findings
http://www.useit.com/alertbox/20050425.html
Formal Usability Report vs. Quick Findings (Jakob Nielsen's Alertbox)
Sep 23 - Nielsen, Usability: Empiricism or Ideology? (Alertbox)
Summary: Usability's job is to research user behavior and find out what works. Usability should also defend users' rights and fight for simplicity. Both aspects have their place, and it's important to recognize the difference.
Introduction
There's a duality to usability. On the one hand, it's a quality assurance methodology that tells you what works and what doesn't work in the field of user experience. On the other hand, usability is a belief system that aims to ensure human mastery of the constructed environment.
Both perspectives are valid.
Usability as Empiricism
The economist Arnold Kling recently summarized the long-term growth in the economy by a somewhat peculiar metric: flour bags. When measured by how many bags of flour you can buy for a day's wages, the average worker today generates 430 times the value of a worker in the year 1500. (Kling uses bags of flour to compare productivity because it's one of the few things that has been produced continuously and provides the same benefit today as it did in past centuries.)
Whether in science or business, the basic point is the same: you propose a solution, then see if it works in the real world. Hypotheses that work become accepted scientific theory; companies that offer the most value to customers become established in business.
Usability is also a reality check. There are two main ways for usability to derive value from reality:
* Before design begins, usability methods such as field studies and competitive studies are used to set the design's direction based on knowledge of the real world. These methods are similar to the scientific method's hypothesis-checking elements: you discover principles that explain observed reality and then use them as a guide to build products that are more likely to work.
* After a design has been created, other usability methods, such as user testing, determine whether humans can understand the proposed user interface. Just as entrepreneurs compete to see which business ideas create the most value for customers, usability specialists show customers alternative interface designs to see which one works best. The main difference is that it's much cheaper to test a paper prototype of a design than it is to start a company.
When something causes problems for many users on many different websites, we issue a guideline warning against it. Similarly, when a design element works well under many different conditions, we issue a guideline recommending it.
Despite these differences, the fundamental approach of usability and harder sciences is the same: conclusions and recommendations are grounded in what is empirically observed in the real world. The job of usability is to be the reality check for a design project and -- given human behavior -- determine what works and what doesn't.
Usability as Ideology
At the same time, usability is also an ideology -- the belief in a certain specialized type of human rights:
* The right of people to be superior to technology. If there's a conflict between technology and people, then technology must change.
* The right of empowerment. Users should understand what's happening and be capable of controlling the outcome.
* The right to simplicity. Users should get their way with computers without excessive hassle.
* The right of people to have their time respected. Awkward user interfaces waste valuable time.
If designers and project managers don't believe in the usability ideology, why would they implement usability's empirical findings? After all, if you don't want to make things easy, knowing how to make them easy is irrelevant.
Respecting users' rights makes people happier and thus makes the world a better place -- nice, but not reason enough for most hard-nosed decision makers. Luckily, the Web provides a very clear-cut reason to support the usability ideology even if you only care about the bottom line: If your site is too difficult, users will simply leave.
On average, websites that try usability double their sales or other desired business metrics.
Balancing the Two Perspectives
As a user advocate, you need both perspectives: usability as empiricism and usability as ideology. Each perspective requires a particular approach.
When taking the empirical approach, you must be unyielding and always report the truth, no matter how unpopular. If something works easily, say so. If something will cause users to leave, say so. The only way to improve quality is to base decisions on the facts, and others on your team should know these facts.
In contrast, when viewing usability as an ideology, you must be willing to compromise. Sometimes decisions must be made that will lower the design's usability quality, either because of limited time and budget or because of trade-offs with other desirable qualities. Of course, project managers can only make good trade-offs when they know the facts about the design elements that help or hurt users.
Source:
Jakob Nielsen's Alertbox, June 27, 2005:
Usability: Empiricism or Ideology?
http://www.useit.com/alertbox/20050627.html
Usability: Empiricism or Ideology? (Jakob Nielsen's Alertbox)
Sep 22 - Putting A/B Testing in Its Place (Alertbox)
Summary: Measuring the live impact of design changes on key business metrics is valuable, but often creates a focus on short-term improvements. This near-term view neglects bigger issues that only qualitative studies can find.
Introduction
In A/B testing, you unleash two different versions of a design on the world and see which performs the best. For decades, this has been a classic method in direct mail, where companies often split their mailing lists and send out different versions of a mailing to different recipients. A/B testing is also becoming popular on the Web, where it's easy to make your site show different page versions to different visitors.
Sometimes, A and B are directly competing designs and each version is served to half the users. Other times, A is the current design and serves as the control condition that most users see. In this scenario, B, which might be more daring or experimental, is served only to a small percentage of users until it has proven itself.
Benefits
Compared with other methods, A/B testing has four huge benefits:
1. It measures the actual behavior of your customers under real-world conditions. You can confidently conclude that if version B sells more than version A, then version B is the design you should show all users in the future.
2. It can measure very small performance differences with high statistical significance because you can throw boatloads of traffic at each design. (The original article's sidebar works through how a 1% difference in sales between two designs can be measured; the sketch after this list illustrates the basic test.)
3. It can resolve trade-offs between conflicting guidelines or qualitative usability findings by determining which one carries the most weight under the circumstances.
For example, when an e-commerce site prominently asks users to enter a discount coupon, user testing shows that people who don't have a coupon complain bitterly because they don't want to pay more than other customers. At the same time, coupons are a good marketing tool, and usability for coupon holders is obviously diminished if there's no easy way to enter the code.
When e-commerce sites have tried A/B testing with and without coupon entry fields, overall sales typically increased by 20-50% when users were not prompted for a coupon on the primary purchase and checkout path. Thus, the general guideline is to avoid prominent coupon fields.
4. It's cheap: once you've created the two design alternatives (or the one innovation to test against your current design), you simply put both of them on the server and employ a tiny bit of software to randomly serve each new user one version or the other. Also, you typically need to cookie users so that they'll see the same version on subsequent visits instead of suffering fluctuating pages, but that's also easy to implement. There's no need for expensive usability specialists to monitor each user's behavior or analyze complicated interaction design questions. You just wait until you've collected enough statistics, then go with the design that has the best numbers.
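The following is a minimal sketch, not from Nielsen's article, of the two mechanical pieces mentioned above: randomly assigning each new visitor to version A or B and remembering the choice in a cookie, plus the simple two-proportion test you would run on the resulting conversion counts. The cookie name and the function names are made up for illustration; a real site would hook this into its own web framework's request and response handling.

import math
import random

COOKIE_NAME = "ab_variant"  # hypothetical cookie key

def assign_variant(request_cookies):
    """Return 'A' or 'B', reusing the visitor's cookie if one exists.
    The caller is expected to write the returned value back into the
    response cookie so the visitor keeps seeing the same version."""
    variant = request_cookies.get(COOKIE_NAME)
    if variant not in ("A", "B"):
        variant = random.choice(["A", "B"])  # 50/50 split for new visitors
    return variant

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Z-score and two-sided p-value for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Example with made-up numbers: 100,000 visitors per version,
# 3.0% vs. 3.3% conversion (a small absolute difference).
z, p = two_proportion_z_test(3000, 100000, 3300, 100000)
print("z = %.2f, p = %.4f" % (z, p))  # p comes out well below 0.05 here

The point of the example is the one Nielsen makes: with enough traffic, even a small absolute difference in conversion rate becomes statistically significant, which is exactly why A/B testing can detect effects that would be invisible in a small qualitative study.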
Limitations
With these clear benefits, why don't we use A/B testing for all projects? Because the downsides usually outweigh the upsides.
First, A/B testing can only be used for projects that have one clear, all-important goal, that is to say, a single key performance indicator (KPI). Furthermore, this goal must be measurable by computer, by counting simple user actions. Examples of measurable actions include:
Sales for an e-commerce site.
Users subscribing to an email newsletter.
Users opening an online banking account.
Users downloading a white paper, asking for a salesperson to call, or otherwise explicitly moving ahead in the sales pipeline.
For many sites, the ultimate goals are not measurable through user actions on the server. Goals like improving brand reputation or supporting the company's public relations efforts can't be measured by whether users click a specific button.
Similarly, while you can easily measure how many users sign up for your email newsletter, you can't assess the equally important issue of how they read your newsletter content without observing subscribers as they open the messages.
A second downside of A/B testing is that it only works for fully implemented designs. It's cheap to test a design once it's up and running, but we all know that implementation can take a long time.
In contrast, paper prototyping lets you try out several different ideas in a single day. Of course, prototype tests give you only qualitative data, but they typically help you reject truly bad ideas quickly and focus your efforts on polishing the good ones.
Short-Term Focus
A/B testing's driving force is the number being measured as the test's outcome. Usually, this is an immediate user action, such as buying something. In theory, there's no reason why the metric couldn't be a long-term outcome, such as total customer value over a five-year period. In practice, however, such long-term tracking rarely occurs. Nobody has the patience to wait years before they know whether A or B is the way to go.
Basing your decisions on short-term numbers, however, can lead you astray. A common example: Should you add a promotion to your homepage or product pages? An A/B test will usually show that the promotion lifts this week's sales, but it won't show the long-term cost of cluttered pages and of users learning to tune out your content.
No Behavioral Insights
The biggest problem with A/B testing is that you don't know why you get the measured results. You're not observing the users or listening in on their thoughts. All you know is that, statistically, more people performed a certain action with design A than with design B. Sure, this supports the launch of design A, but it doesn't help you move ahead with other design decisions.
Say, for example, that you tested two sizes of Buy buttons and discovered that the big button generated 1% more sales than the small button. Does that mean that you would sell even more with an even bigger button? Or maybe an intermediate button size would increase sales by 2%. You don't know, and to find out you have no choice but to try again with another collection of buttons.
Of course, you also have no idea whether other changes might bring even bigger improvements, such as changing the button's color or the wording on its label. Or maybe changing the button's page position or its label's font size, rather than changing the button's size, would create the same or better results.
Worst of all, A/B testing provides data only on the element you're testing. It's not an open-ended method like user testing, where users often reveal stumbling blocks you never would have expected.
Combining Methods
A/B testing has more problems than benefits. You should not make it the first method you choose for improving your site's conversion rates. And it should certainly never be the only method used on a project.
Qualitative observation of user behavior is faster and generates deeper insights. Also, qualitative research is less subject to the many errors and pitfalls that plague quantitative research.
A/B testing does have its own advantages, however, and provides a great supplement to qualitative studies. Once your company's commitment to usability has grown to a level where you're regularly conducting many forms of user research, A/B testing definitely has its place in the toolbox.
Source:
Jakob Nielsen's Alertbox, August 15, 2005:
Putting A/B Testing in Its Place
http://www.useit.com/alertbox/20050815.html
Putting A/B Testing in Its Place (Jakob Nielsen's Alertbox)
Monday, September 21, 2009
Sep 22 - Weblog Usability: The Top Ten Design Mistakes (Alertbox)
Summary: Blogs are often too internally focused and ignore key usability issues, making it hard for new readers to understand the site and trust the author.
Introduction
Weblogs are a form of website. The thousands of normal website usability guidelines therefore apply to them, as do this year's top ten design mistakes. But weblogs are also a special genre of website; they have unique characteristics and thus distinct usability problems.
One of a weblog's great benefits is that it essentially frees you from "Web design." You write a paragraph, click a button, and it's posted on the Internet. No need for visual design, page design, interaction design, information architecture, or any programming or server maintenance.
Blogs make having a simple website much easier, and as a result, the number of people who write for the Web has exploded. This is a striking confirmation of the importance of ease of use.
Weblogs' second benefit is that they're a Web-native content genre: they rely on links, and short postings prevail. You can simply find something interesting on another site and link to it, possibly with commentary or additional examples.
As a third benefit, blogs are part of an ecosystem (often called the Blogosphere) that serves as a positive feedback loop: Whatever good postings exist are promoted through links from other sites. More reader/writers see this good stuff, and the very best then get linked to even more. As a result, link frequency follows a Zipf distribution, with disproportionally more links to the best postings.
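A gloss of my own, not in Nielsen's article: a Zipf distribution means the k-th most-linked posting receives a share of links roughly proportional to 1/k^s, with the exponent s close to 1, i.e. f(k) ∝ 1/k^s. In practice, the most-linked posting attracts on the order of twice as many links as the second-ranked one and three times as many as the third, which is what "disproportionally more links to the best postings" amounts to.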
Some weblogs are really just private diaries intended only for a handful of family members and close friends. Usability guidelines generally don't apply to such sites.
Intranet weblogs are a different matter: even though your readers might know you, usability still matters because they're reading on company time. (As an example, see IBM's use of intranet blogs, one of the ten best intranets of 2006.)
1. No Author Biographies
Unless you're a business blog, you probably don't need a full-fledged "about us" section the way a corporate site does.
The basic rationale for "about us" translates directly into the need for an "about me" page on a weblog: users want to know who they're dealing with.
It's a simple matter of trust. Anonymous writings have less credence than something that's signed.
2. No Author Photo
Even weblogs that provide author bios often omit the author photo. A photo is important for two reasons:
1. It offers a more personable impression of the author. You enhance your credibility by the simple fact that you're not trying to hide. Also, users relate more easily to somebody they've seen.
2. It connects the virtual and physical worlds. People who've met you before will recognize your photo, and people who've read your site will recognize you when you meet in person (say, at a conference — or the company cafeteria if you're an intranet blogger). A huge percentage of the human brain is dedicated to remembering and recognizing faces.
Also, if you run a professional blog and expect to be quoted in the press, you should follow the recommendations for using the Web for PR and include a selection of high-resolution photos that photo editors can download.
3. Nondescript Posting Titles
Sadly, even though weblogs are native to the Web, authors rarely follow the guidelines for writing for the Web in terms of making content scannable. This applies to a posting's body text, but it's even more important with headlines.
Users must be able to grasp the gist of an article by reading its headline. Avoid cute or humorous headlines that make no sense out of context.
Your posting's title is microcontent and you should treat it as a writing project in its own right. On a value-per-word basis, headline writing is the most important writing you do.
Descriptive headlines are especially important for representing your weblog in search engines, newsfeeds (RSS), and other external environments.
In those contexts, users often see only the headline and use it to determine whether to click into the full posting. Even if users see a short abstract along with the headline (as with most search engines), user testing shows that people often read only the headline. In fact, people often read only the first three or four words of a headline when scanning a list of possible places to go.
4. Links Don't Say Where They Go
Many weblog authors seem to think it's cool to write link anchors like: "some people think" or "there's more here and here." Remember one of the basics of the Web: Life is too short to click on an unknown. Tell people where they're going and what they'll find at the other end of the link.
Generally, you should provide predictive information in either the anchor text itself or the immediately surrounding words. You can also use link titles for supplementary information that doesn't fit with your content. (In the original article, mousing over the "link titles" link shows a link title in action.)
5. Classic Hits are Buried
Some of your postings will have lasting value for readers beyond the week you wrote them. Don't relegate such classics to the archives, where people can only find something if they know you posted it, say, in May 2003.
Highlight a few evergreens in your navigation system and link directly to them. For example, my own list of almost 300 Alertbox columns starts by saying, "Read these first: Usability 101 and Top Ten Mistakes of Web Design."
Also, remember to link to your past pieces in newer postings. Don't assume that readers have been with you from the beginning; give them background and context in case they want to read more about your ideas.
6. The Calendar is the Only Navigation
A timeline is rarely the best information architecture, yet it's the default way to navigate weblogs. Most weblog software provides a way to categorize postings so users can easily get a list of all postings on a certain topic.
Do use categorization, but avoid the common mistake of tagging a posting with almost all of your categories. Be selective.
Categories must be sufficiently detailed to lead users to a thoroughly winnowed list of postings. Ten to twenty categories are appropriate for structuring many topics.
On the main page for each category, highlight that category's evergreens as well as a time line of its most recent postings.
7. Irregular Publishing Frequency
Establishing and meeting user expectations is one of the fundamental principles of Web usability.
For a weblog, users must be able to anticipate when and how often updates will occur.
For most weblogs, daily updates are probably best, but weekly or even monthly updates might work as well, depending on your topic. In either case, pick a publication schedule and stick to it.
If you usually post daily but sometimes let months go by without new content, you'll lose many of your loyal — and thus most valuable — readers.
To ensure regular publishing, hold back some ideas and post them when you hit a dry spell.
8. Mixing Topics
If you publish on many different topics, you're less likely to attract a loyal audience of high-value users. Busy people might visit a blog to read an entry about a topic that interests them. They're unlikely to return, however, if their target topic appears only sporadically among a massive range of postings on other topics.
The more focused your content, the more focused your readers. That, again, makes you more influential within your niche.
Specialized sites rule the Web, so aim tightly. This is especially important if you're in the business-to-business (B2B) sector.
If you have the urge to speak out on, say, both American foreign policy and the business strategy of Internet telephony, establish two blogs. You can always interlink them when appropriate.
9. Forgetting That You Write for Your Future Boss
Whenever you post anything to the Internet — whether on a weblog, in a discussion group, or even in an email — think about how it will look to a hiring manager in ten years. Once stuff's out, it's archived, cached, and indexed in many services that you might never be aware of.
Think twice before posting. If you don't want your future boss to read it, don't post.
10. Having a Domain Name Owned by a Weblog Service
Having a weblog address ending in blogspot.com, typepad.com, etc. will soon be the equivalent of having an @aol.com email address or a Geocities website: the mark of a naïve beginner who shouldn't be taken too seriously.
Letting somebody else own your name means that they own your destiny on the Internet. They can degrade the service quality as much as they want. They can increase the price as much as they want. They can pile as many pop-ups, blinking banners, or other user-repelling advertising techniques atop your content as they want. They can promote your competitor's offers on your pages.
The longer you stay at someone else's domain name, the higher the cost of going independent. Yes, it's tempting to start a new weblog on one of the services that offer free accounts. It's easy, it's quick, and it's obviously cheap. But it only costs $8 per year to get your personal domain name and own your own future.
As soon as you realize you're serious about blogging, move it away from a domain name that's controlled by somebody else. The longer you delay, the more pain you'll feel when you finally make the move.
Source:
Jakob Nielsen's Alertbox, October 17, 2005:
Weblog Usability: The Top Ten Design Mistakes
http://www.useit.com/alertbox/weblogs.html
Blog Usability: Top 10 Weblog Design Mistakes (Jakob Nielsen's Alertbox)