Wednesday, July 29, 2009

Your friend, the data file

Before you can use a program like SPSS, you need a spreadsheet of study data. Here are some basics to help you prepare a data file. As a running example, suppose we ran a usability study with 20 participants on a new personal finance web site.

Each row of your spreadsheet represents one case (or participant) in the study. For this file, you would have 20 cases. Each column represents one variable. Your variables are the questions you asked or the data you collected for each person, such as participant ID number, age, computer ownership, time on tasks, satisfaction ratings, and comprehension test scores.
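To make the layout concrete, here is a minimal sketch in Python that writes and reads back a tiny data file with one row per participant and one column per variable. The column names and the three rows of values are invented for illustration; a real file for this study would have 20 rows and whatever variables you actually collected.

```python
import csv
import io

# Hypothetical column names (one per variable); not taken from a real study.
columns = ["participant_id", "age_group", "owns_computer",
           "task_time_sec", "satisfaction"]

# Invented example rows: each row is one case (participant).
rows = [
    [1, 2, 1, 94, 4],
    [2, 1, 0, 131, 3],
    [3, 3, 1, 78, 5],
]

# Write the data file to an in-memory buffer instead of disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)   # header row: variable names
writer.writerows(rows)     # one line per participant
data_file = buffer.getvalue()

# Read it back: each case becomes a dict keyed by variable name.
cases = list(csv.DictReader(io.StringIO(data_file)))
print(len(cases), "cases,", len(columns), "variables")
```

The same text could be saved with a .csv extension and opened in a spreadsheet or imported into a statistics package.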

Since some statistical packages accept only numerical data, a best practice is to code everything as numbers. Suppose you asked participants for their age group: 18-24, 25-34, 35-44, etc. You can recode their responses as 1 (=18-24), 2 (=25-34), 3 (=35-44), etc.
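If your raw responses are text, the recoding can be done with a simple lookup table. This is only a sketch: the function name and the sample responses are made up, and the table lists only the three age groups named above, so you would extend it with the rest of your own groups.

```python
# Lookup table for the three age groups named in the text.
# Extend it with the remaining groups from your own questionnaire.
AGE_GROUP_CODES = {"18-24": 1, "25-34": 2, "35-44": 3}

def code_age_group(response):
    """Return the numeric code for an age group, or None if it is
    blank or not in the table (hypothetical helper for illustration)."""
    return AGE_GROUP_CODES.get(response.strip())

# Invented sample responses.
age_responses = ["25-34", "18-24", "35-44", "18-24"]
age_codes = [code_age_group(r) for r in age_responses]
print(age_codes)
```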

Maintaining a consistent numbering scheme is also a good practice. I like to code No and Yes answers as 0 and 1 and use 0 for any kind of No answer such as "I don't own X" or "I don't use Y."

I like to code Likert scale answers starting at 1, where 1 is the most negative option and n is the most positive. If the question is worded negatively, you may have to recode the answers so that they still run from semantically negative to semantically positive. Using a consistent scale makes it easier to analyze related questions.

Example A: Imagine you asked: What is your level of agreement with these statements? where 1 = I definitely disagree, 3 = I feel neutral, and 5 = I definitely agree.
  1. I balance my checkbook.
  2. I never keep my paycheck stubs.

To make these answers semantically parallel, you would code question 1 exactly as the participants answered it: 1 stays 1, 2 stays 2, etc. You would flip the scale (1 becomes 5, 2 becomes 4, etc.) for question 2 because the statement is negative: when participants say they disagree, they mean they do keep their pay stubs. Thus, if you test for a statistical correlation, you are in essence looking for a positive correlation between balancing a checkbook and keeping pay stubs, which is much simpler to think about than a negative correlation with not keeping pay stubs.
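Flipping a scale that runs from 1 to n amounts to computing (n + 1) minus the answer, so on a 5-point scale the recoded value is 6 minus the answer. The sketch below applies that to Example A and shows the sign of the correlation changing. The answers are invented, and the small Pearson function is written out by hand only to keep the example self-contained; your statistics package would compute it for you.

```python
from math import sqrt

SCALE_MAX = 5  # 5-point agreement scale from Example A

def reverse_code(answer, scale_max=SCALE_MAX):
    """Flip a 1..scale_max answer: 1 becomes 5, 2 becomes 4, etc."""
    return scale_max + 1 - answer

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented answers from five participants.
q1_balance_checkbook = [5, 4, 2, 1, 3]   # "I balance my checkbook."
q2_never_keep_stubs = [1, 2, 4, 5, 2]    # "I never keep my pay check stubs."

# After flipping, a high score means "I do keep my pay stubs".
q2_keeps_stubs = [reverse_code(a) for a in q2_never_keep_stubs]

r_raw = pearson(q1_balance_checkbook, q2_never_keep_stubs)
r_recoded = pearson(q1_balance_checkbook, q2_keeps_stubs)
print(q2_keeps_stubs, r_raw, r_recoded)
```

With these made-up numbers the raw correlation is negative and the recoded one is positive with the same magnitude; only the direction of the scale has changed, not the strength of the relationship.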

If you have missing answers from your participants, keep those spreadsheet cells blank so you do not accidentally run calculations on them. Any answer you want to exclude from calculations should also be left blank.

Example B: Suppose you ask, "How satisfied are you with your bank?" where 1 = Very unsatisfied, 3 = Neutral, and 5 = Very satisfied. You also offer an "I don't know" option. Any "I don't know" answers should be coded as blank. If you ran a statistic on satisfaction level, you would want to exclude people who answered "I don't know" because they cannot speak to this question.
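Here is a sketch of why blanks matter, using Python's None to stand in for a blank cell. The answers and the helper name are invented. The last lines show what would happen if the excluded answers were coded as 0 instead of blank, which is the mistake this practice is meant to avoid.

```python
from statistics import mean

VALID_RATINGS = {"1", "2", "3", "4", "5"}

def code_satisfaction(raw):
    """Return the rating as an int, or None (blank) for a missing
    answer or "I don't know" (hypothetical helper for illustration)."""
    raw = raw.strip()
    return int(raw) if raw in VALID_RATINGS else None

# Invented raw answers: one "I don't know" and one skipped question.
raw_ratings = ["4", "I don't know", "5", "", "3"]
ratings = [code_satisfaction(a) for a in raw_ratings]

# Blanks are left out of the calculation entirely.
answered = [r for r in ratings if r is not None]
mean_satisfaction = mean(answered)

# For contrast only: coding the excluded answers as 0 drags the mean down.
mean_if_zero_coded = mean(0 if r is None else r for r in ratings)
print(ratings, mean_satisfaction, mean_if_zero_coded)
```

With these made-up answers, the mean over the three people who gave a rating is 4, while treating the two excluded answers as zeros would report 2.4.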
