Tuesday, September 22, 2009

Sep 22 - Putting A/B Testing in Its Place (Alertbox)

Putting A/B Testing in Its Place

Summary: Measuring the live impact of design changes on key business metrics is valuable, but often creates a focus on short-term improvements. This near-term view neglects bigger issues that only qualitative studies can find.

Introduction
In A/B testing, you unleash two different versions of a design on the world and see which one performs better. For decades, this has been a classic method in direct mail, where companies often split their mailing lists and send different versions of a mailing to different recipients. A/B testing is also becoming popular on the Web, where it's easy to make your site show different page versions to different visitors.
Sometimes, A and B are directly competing designs and each version is served to half the users. Other times, A is the current design and serves as the control condition that most users see. In this scenario, B, which might be more daring or experimental, is served only to a small percentage of users until it has proven itself.
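
The article doesn't prescribe any particular mechanism for splitting traffic, but as a minimal sketch of the unequal-split scenario, the Python snippet below assigns each new visitor to the experimental design B with a small, configurable probability and leaves everyone else on the control design A. The function name and the 10% rollout fraction are assumptions chosen purely for illustration.

```python
import random

def assign_variant(experimental_fraction=0.10):
    """Serve the experimental design 'B' to a small share of new visitors;
    everyone else stays on the control design 'A'."""
    return "B" if random.random() < experimental_fraction else "A"

# Example: simulate 10,000 new visitors and count the resulting split.
counts = {"A": 0, "B": 0}
for _ in range(10_000):
    counts[assign_variant()] += 1
print(counts)  # roughly {'A': 9000, 'B': 1000}
```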

Benefits

Compared with other methods, A/B testing has four huge benefits:

1. It measures the actual behavior of your customers under real-world conditions. You can confidently conclude that if version B sells more than version A, then version B is the design you should show all users in the future.

2. It can measure very small performance differences with high statistical significance because you can throw boatloads of traffic at each design. The sidebar shows how you can measure a 1% difference in sales between two designs.
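
The sidebar mentioned here isn't reproduced in this post, but as a rough sketch of the statistics involved, the standard two-proportion sample-size formula shows how much traffic it takes to detect a small difference. The 2% baseline conversion rate and the 1% relative lift below are assumptions chosen purely for illustration, not figures from the article.

```python
from scipy.stats import norm

def visitors_per_variant(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate number of visitors needed in each variant to detect the
    difference between conversion rates p_a and p_b with a two-sided test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # about 0.84 for 80% power
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return (z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2

# Assumed example: 2.0% baseline conversion vs. a 1% relative lift (2.02%).
# Detecting such a tiny difference takes several million visitors per variant,
# which is why A/B testing needs "boatloads of traffic".
print(round(visitors_per_variant(0.020, 0.0202)))
```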

3. It can resolve trade-offs between conflicting guidelines or qualitative usability findings by determining which one carries the most weight under the circumstances.
For example, when an e-commerce site prominently asks users to enter a discount coupon, user testing shows that people who don't have a coupon complain bitterly because they don't want to pay more than other customers. At the same time, coupons are a good marketing tool, and usability for coupon holders obviously suffers if there's no easy way to enter the code.
When e-commerce sites have tried A/B testing with and without coupon entry fields, overall sales typically increased by 20-50% when users were not prompted for a coupon on the primary purchase and checkout path. Thus, the general guideline is to avoid prominent coupon fields.

4. It's cheap: once you've created the two design alternatives (or the one innovation to test against your current design), you simply put both of them on the server and employ a tiny bit of software to randomly serve each new user one version or the other. Also, you typically need to cookie users so that they'll see the same version on subsequent visits instead of suffering fluctuating pages, but that's also easy to implement. There's no need for expensive usability specialists to monitor each user's behavior or analyze complicated interaction design questions. You just wait until you've collected enough statistics, then go with the design that has the best numbers.
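
A minimal sketch of that "tiny bit of software", assuming a Flask app and a cookie named ab_variant (both choices are mine, not anything the article specifies): new visitors are randomly bucketed, and the cookie keeps returning visitors on the same version so the page doesn't fluctuate between visits.

```python
import random
from flask import Flask, request, make_response

app = Flask(__name__)

def render_variant(variant):
    # Placeholder for serving the 'A' or 'B' version of the page.
    return f"<html><body>Design {variant}</body></html>"

@app.route("/")
def landing_page():
    # Returning visitors keep whatever variant their cookie records.
    variant = request.cookies.get("ab_variant")
    if variant not in ("A", "B"):
        # New visitors are randomly assigned one version or the other.
        variant = random.choice(["A", "B"])
    resp = make_response(render_variant(variant))
    # Remember the assignment so subsequent visits show the same design.
    resp.set_cookie("ab_variant", variant, max_age=60 * 60 * 24 * 90)
    return resp
```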

Limitations

With these clear benefits, why don't we use A/B testing for all projects? Because the downsides usually outweigh the upsides.

First, A/B testing can only be used for projects that have one clear, all-important goal, that is, a single KPI (key performance indicator). Furthermore, this goal must be measurable by computer, by counting simple user actions (see the sketch after the list below). Examples of measurable actions include:
Sales for an e-commerce site.
Users subscribing to an email newsletter.
Users opening an online banking account.
Users downloading a white paper, asking for a salesperson to call, or otherwise explicitly moving ahead in the sales pipeline.
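
As a sketch of what "counting simple user actions" can look like in practice, the snippet below tallies a stream of logged events per variant and reports each variant's conversion rate. The event format and field names are assumptions for illustration only.

```python
from collections import Counter

# Assumed event log: one record per user action, tagged with the design shown.
events = [
    {"variant": "A", "action": "visit"}, {"variant": "A", "action": "visit"},
    {"variant": "A", "action": "purchase"},
    {"variant": "B", "action": "visit"}, {"variant": "B", "action": "visit"},
    {"variant": "B", "action": "visit"}, {"variant": "B", "action": "purchase"},
]

visits = Counter(e["variant"] for e in events if e["action"] == "visit")
purchases = Counter(e["variant"] for e in events if e["action"] == "purchase")

for variant in sorted(visits):
    rate = purchases[variant] / visits[variant]
    print(f"Design {variant}: {purchases[variant]}/{visits[variant]} purchases = {rate:.0%} conversion")
```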

For many sites, the ultimate goals are not measurable through user actions on the server. Goals like improving brand reputation or supporting the company's public relations efforts can't be measured by whether users click a specific button.
Similarly, while you can easily measure how many users sign up for your email newsletter, you can't assess the equally important issue of how they read your newsletter content without observing subscribers as they open the messages.

A second downside of A/B testing is that it only works for fully implemented designs. It's cheap to test a design once it's up and running, but we all know that implementation can take a long time.

In contrast, paper prototyping lets you try out several different ideas in a single day. Of course, prototype tests give you only qualitative data, but they typically help you reject truly bad ideas quickly and focus your efforts on polishing the good ones.

Short-Term Focus

A/B testing's driving force is the number being measured as the test's outcome. Usually, this is an immediate user action, such as buying something. In theory, there's no reason why the metric couldn't be a long-term outcome, such as total customer value over a five-year period. In practice, however, such long-term tracking rarely occurs. Nobody has the patience to wait years before they know whether A or B is the way to go.
Basing your decisions on short-term numbers, however, can lead you astray. A common example: Should you add a promotion to your homepage or product pages? A promotion will typically lift the immediate sales that the test measures, but the added clutter degrades the overall user experience and can erode customers' long-term trust in the site, and that cost never shows up in the test's numbers.

No Behavioral Insights

The biggest problem with A/B testing is that you don't know why you get the measured results. You're not observing the users or listening in on their thoughts. All you know is that, statistically, more people performed a certain action with design A than with design B. Sure, this supports the launch of design A, but it doesn't help you move ahead with other design decisions.

Say, for example, that you tested two sizes of Buy buttons and discovered that the big button generated 1% more sales than the small button. Does that mean that you would sell even more with an even bigger button? Or maybe an intermediate button size would increase sales by 2%. You don't know, and to find out you have no choice but to try again with another collection of buttons.

Of course, you also have no idea whether other changes might bring even bigger improvements, such as changing the button's color or the wording on its label. Or maybe changing the button's page position or its label's font size, rather than changing the button's size, would create the same or better results.

Worst of all, A/B testing provides data only on the element you're testing. It's not an open-ended method like user testing, where users often reveal stumbling blocks you never would have expected.

Combining Methods

A/B testing has more problems than benefits. You should not make it the first method you choose for improving your site's conversion rates. And it should certainly never be the only method used on a project.
Qualitative observation of user behavior is faster and generates deeper insights. Also, qualitative research is less subject to the many errors and pitfalls that plague quantitative research.

A/B testing does have its own advantages, however, and provides a great supplement to qualitative studies. Once your company's commitment to usability has grown to a level where you're regularly conducting many forms of user research, A/B testing definitely has its place in the toolbox.

Source:
Jakob Nielsen's Alertbox, August 15, 2005:
Putting A/B Testing in Its Place
http://www.useit.com/alertbox/20050815.html
