Wednesday, July 29, 2009

Comparing two test groups

In the UX world, we often compare how people perform in Design A vs. Design B on certain measures such as pageviews, bounce rate, clickthrough, or sales. To decide whether Design A or Design B is better, we need to compare the means and variability of the users' scores in these two groups.

Sometimes you can tell which design is better just by eyeballing the data. Suppose we split our site traffic between versions A and B of our web site, where Design A used neon-colored blinking links and Design B used standard links. Looking at our analytics reports, we might see that A visitors spent an average of 1 minute on the site, while B visitors spent 3 minutes, with similarly low variability in both groups. We can probably say that Design B was better than Design A at getting users to stay on the site, because the two groups' scores are so different and no extreme outliers are skewing the results.

It is when the two groups look much more alike that we really need a statistical test. When comparing two groups, we can run an independent samples t-test. (Fun fact: The t-test was invented at the Guinness brewery.)

A t-test works by comparing the means and variability of two groups of interest. It essentially considers how much the scores for the two groups overlap. If they overlap completely, then the two groups are not different from one another. The less they overlap, the more likely it is that the two groups are statistically different.
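To make that concrete, here is a minimal sketch of an independent samples t-test in Python using SciPy (any stats package would do the same job). The two arrays are made-up time-on-site scores for the blinking-links scenario above, not data from a real test:

    from scipy import stats

    # Hypothetical time-on-site scores in minutes for two design groups.
    design_a = [1.1, 0.9, 1.3, 1.0, 0.8, 1.2, 1.1, 0.9]
    design_b = [2.8, 3.2, 3.0, 2.9, 3.1, 3.3, 2.7, 3.0]

    # Independent samples t-test: compares the group means relative to
    # the variability (overlap) of the two sets of scores.
    result = stats.ttest_ind(design_a, design_b)
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

With groups this far apart and this little spread, the p-value comes out tiny, which matches the eyeball judgment from earlier.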

Example. We currently sell our flagship product, The MowBee, on a web page with no pictures. We want to know whether a picture will increase sales, so we split traffic equally between Picture and No Picture versions of the page. After a week, we have funneled 100 people to the Picture page and 100 people to the No Picture page, and each person's purchase (if any) has been recorded. Our data file at the end has 200 cases and three variables: participant id number, condition (pic, no pic), and dollar amount of sales.

If we ran this data through SPSS' independent samples t-test and found statistical significance, the results would include a t-test score (t) and a probability (p): the chance of seeing a difference this large if the two groups were really no different. The means and standard deviation for each group would also be reported so we can see which was higher. We might have a result like this:
We made significantly more money per visitor, t (198) = 2.47, p < 0.05, with the Picture page (M=$85, SD=44.41) than the No Picture page (M=$47.50, SD=18.45).
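For readers who want to try this outside SPSS, here is a rough sketch in Python that simulates 100 visitors per condition with roughly the means and standard deviations reported above and runs the same test. The numbers are simulated for illustration only, not actual MowBee data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

    # Simulate dollars per visitor for each condition, using roughly
    # the reported means and standard deviations.
    picture    = rng.normal(loc=85.0, scale=44.41, size=100)
    no_picture = rng.normal(loc=47.5, scale=18.45, size=100)

    # Independent samples t-test; df = 100 + 100 - 2 = 198.
    result = stats.ttest_ind(picture, no_picture)
    print(f"Picture:    M = {picture.mean():.2f}, SD = {picture.std(ddof=1):.2f}")
    print(f"No Picture: M = {no_picture.mean():.2f}, SD = {no_picture.std(ddof=1):.2f}")
    print(f"t(198) = {result.statistic:.2f}, p = {result.pvalue:.4f}")

One caveat: real per-visitor sales are skewed and bounded at zero, so a normal distribution is only a rough stand-in here; the point of the sketch is the shape of the test, not the shape of the data.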
