Thursday, October 8, 2009

Oct 8 - Questionnaires in Usability Engineering (part 2)

What is meant by reliability?

The reliability of a questionnaire is its ability to give the same results when filled out by like-minded people in similar circumstances. Reliability is usually expressed on a numerical scale from zero (very unreliable) to one (extremely reliable).
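A common way of estimating this figure for a multi-item questionnaire is Cronbach's alpha, which compares the variability of the individual items with the variability of respondents' total scores. The sketch below is a minimal illustration; the response data (rows = respondents, columns = items on a 1-5 scale) are entirely hypothetical.

```python
# Minimal sketch of Cronbach's alpha, one common estimate of the
# 0-to-1 reliability figure described above. Data are hypothetical.

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per item."""
    k = len(responses[0])                     # number of items
    items = list(zip(*responses))             # column-wise item scores
    totals = [sum(row) for row in responses]  # each respondent's total
    item_var = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(responses), 2))  # 0.94
```

An alpha near one means respondents who score high on one item tend to score high on the others; an alpha near zero means the items do not hang together.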

What is meant by validity?

The validity of a questionnaire is the degree to which the questionnaire is actually measuring or collecting data about what you think it should be measuring or collecting data about. Note that validity problems are not confined to opinion surveys: factual questionnaires may have very serious validity issues if, for instance, respondents interpret the questions in different ways.


Factual-type questionnaires are easy to do, though, aren't they?

A factual, or 'survey', questionnaire is one that asks for relatively straightforward information and does not need personal interpretation to answer. Answers to factual questions can be proven right or wrong. An opinion-based questionnaire is one that asks the respondents what they think of something. An answer to an opinion question cannot be proven right or wrong: it is simply the opinion of the respondent and is inaccessible to independent verification.


What's the difference between a questionnaire which gives you numbers and one that gives you free text comments?

A closed-ended questionnaire is one that leaves no room for individual comments from the respondent. The respondent replies to a set of questions in terms of pre-set responses for each question. These responses can then be coded as numbers.
An open-ended questionnaire asks the respondent to reply to the questions in their own words, perhaps even to suggest topics to which replies may be given. The ultimate open-ended questionnaire is the 'critical incident' type, in which respondents describe several good or bad experiences, the circumstances that led up to them, and what happened afterwards, all in their own words.

Closed-ended questionnaires are good if you are going to be processing massive quantities of data, or if your questionnaire is appropriately scaled to yield meaningful numeric data. If you are using a closed-ended questionnaire, however, encourage the respondents to leave their comments either in a special space provided on the page, or in the margins. You'll be surprised what this gives you.
Open-ended questionnaires are good if you are in an exploratory phase of your research, or if you are looking for some very specific comments or answers that cannot be summarised in a numeric code.
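To make the idea of numeric coding concrete, here is a minimal sketch; the anchor wording and the 1-5 mapping are a common convention rather than a fixed standard.

```python
# Minimal sketch of coding closed-ended responses as numbers.
# The 1-5 agreement mapping is a common convention, not a standard.
CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

raw = ["agree", "strongly agree", "neutral", "disagree"]
coded = [CODES[response] for response in raw]
print(coded)  # [4, 5, 3, 2]
```

Once responses are in this numeric form, they can be totalled, averaged, and fed into reliability calculations.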


Can you mix factual and opinion questions, closed and open ended questions?

It's a good idea to mix some open-ended questions in a closed-ended opinion questionnaire and it's also not a bad thing to have some factual questions at the start of an opinion questionnaire to find out who the respondents are, what they do, and so on.
Some of your factual questions may need to be open-ended, for instance if you are asking respondents for the name of the hardware they are using.
This also means you can construct your own questionnaire booklets by putting together a reliable opinion questionnaire, for instance, and then adding some factual questions at the front and perhaps some open-ended opinion questions at the end.


How do you analyse open-ended questionnaires?

The standard method is called 'content analysis' and is a subject all of its own.
Content analysis usually lets you boil down responses into categories, and then you can count the frequency of occurrence of different categories of response.
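The counting step can be sketched very simply; the response categories below are invented examples, and the hard part of content analysis (coding each free-text response into a category) is assumed to have already been done by hand.

```python
# Minimal sketch of the counting step in content analysis: tally
# how often each (hand-coded) category of response occurs.
# Category names are hypothetical examples.
from collections import Counter

coded_responses = [
    "navigation", "error messages", "navigation", "speed",
    "navigation", "error messages", "speed", "navigation",
]

frequencies = Counter(coded_responses)
for category, count in frequencies.most_common():
    print(f"{category}: {count}")
```

The resulting frequency table is what usually appears in a report: which kinds of comment came up, and how often.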


How many response options should there be in a numeric questionnaire?

There are two sets of issues here. One is whether to have an odd or even number of response options. The general answer is that, if a 'neutral' response to a set of questions is possible, then you should have an odd number of response options, with the central point being the neutral one. On the other hand, if it is a question of whether something is good/bad, male/female (bi-polar), then basically you are looking at two response options. You may wish to assess the strength of the polarity; you are then actually asking two questions in one: firstly, is it good or bad, and secondly, is it very good or very bad? This leads you to an even number of response options.
Some people use even numbers of response options to 'force' the respondents to go one way or another. What happens in practice is that respondents end up giving random responses between the two middle items. Not very useful.


How many anchors should a questionnaire have?

The little verbal comments above the numbers ('strongly agree', etc.) are what we call anchors. In survey work, where the questions are factual, it is considered a good idea to have anchors above all the response options, and this will give you accurate results. In opinion or attitude work, you are asking a respondent to express their position on a scale of feeling from strong agreement to strong disagreement, for instance. Although it would be helpful to indicate the central (neutral) point if it is meaningful to do so, having numerous anchors may not be so important. Indeed, some questionnaires on attitudes have been proposed with a continuous line and two end anchors for each statement. The respondent has to place a mark on the line indicating the amount of agreement or disagreement they wish to express. Such methods are still relatively new.
A related question is whether you should include a 'no answer' option for each item. This depends on what kind of questionnaire you are developing. A factual-style questionnaire should most probably not have a 'no answer' option unless issues of privacy are involved. If, in an opinion questionnaire, many of your respondents complain that items are 'not applicable' to their situation, you should consider carefully whether these items should be changed or re-worded.
In general, I tend to distrust 'not applicable' boxes in questionnaires. If the item is really not applicable, it shouldn't be there in the first place. If it is applicable, then you are simply cutting down on the amount of data you are going to get. But this is a personal opinion.


Should favourable responses always be checked on the left (or right) hand side of the scale?

Usually no. The reason for not constructing a questionnaire in this manner is that response bias can come into play.
A respondent can simply check off all the 'agrees' without having to consider each statement carefully, so you have no guarantee that they've actually responded to your statements -- they could be working on 'auto-pilot'. Of course, such questionnaires will also produce fairly impressive statistical reliabilities, but again, that could be a cheat.
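A common remedy is to word roughly half of the statements negatively and reverse-score them before analysis, so an 'auto-pilot' respondent produces an obviously inconsistent pattern. A minimal sketch, assuming a 1-5 agreement scale (the item layout is hypothetical):

```python
# Minimal sketch of reverse-scoring negatively-worded items before
# analysis. Assumes a 1-5 agreement scale; which items are reversed
# is a hypothetical example.

SCALE_MAX = 5
REVERSED_ITEMS = {1, 3}  # indices of negatively-worded statements

def score_item(index, response):
    """Flip the score of reverse-worded items: 1 <-> 5, 2 <-> 4."""
    if index in REVERSED_ITEMS:
        return SCALE_MAX + 1 - response
    return response

responses = [5, 4, 2, 5]  # a respondent ticking 'agree' throughout
scored = [score_item(i, r) for i, r in enumerate(responses)]
print(scored)  # [5, 2, 2, 1]
```

A respondent who genuinely holds a consistent opinion will still produce a consistent scored pattern; one ticking boxes at random or on auto-pilot will not.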


Is a long questionnaire better than a short one? How short can a questionnaire be?

You have to ensure that you have enough statements to cover the most common shades of opinion about the construct being rated. But this has to be balanced against the need for conciseness: you can produce a long questionnaire that has fantastic reliabilities and validities when tested under controlled conditions with well-motivated respondents, but ordinary respondents may just switch off and respond at random after a while. In general, because of statistical artefacts, long questionnaires will tend to produce good reliabilities with well-motivated respondents, and shorter questionnaires will produce less impressive reliabilities, but short questionnaires may be a better test of overall opinion in practice.

A questionnaire should not be judged by its statistical reliability alone. Because of the nature of statistics, especially the so-called law of large numbers, we will find that what was only a trend with a small sample becomes statistically significant with a large sample. Statistical 'significance' is a technical term with a precise mathematical meaning. Significance in the everyday sense of the word is a much broader concept.
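The point can be illustrated with the standard t statistic for testing a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2): the same modest correlation drifts into statistical 'significance' purely because the sample grows. A rough sketch with hypothetical figures:

```python
# Rough illustration: the t statistic for the SAME modest correlation
# (r = 0.2) grows with sample size alone, so a mere trend in a small
# sample becomes 'significant' in a large one. Figures are hypothetical.
import math

def t_for_correlation(r, n):
    """t statistic for testing correlation r observed on n cases."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

for n in (20, 100, 1000):
    print(n, round(t_for_correlation(0.2, n), 2))
```

With 20 respondents the t value is well below the conventional ~2.0 threshold; with 1000 respondents the very same r = 0.2 is 'highly significant', even though the strength of the relationship has not changed at all.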


So high statistical reliability is not the 'gold standard' to aim for?

If a short (say 8-10 items) questionnaire exhibits high reliabilities (above 0.85, as a rule of thumb), then you should look at the items carefully and examine them for spurious repetitions. Longer questionnaires (12-20 items), if well constructed, should yield reliability values of 0.70 or more.
I stress these are rules of thumb: there is nothing absolute about them.


What's the minimum and maximum figure for reliability?

Theoretically, the minimum is 0.00 and the maximum is 1.0. Suspect a questionnaire whose reliability falls below 0.50 unless it is very short (3-4 items) and there is a sound reason to adopt it.
The problem with questionnaires of low reliability is that you simply don't know whether they are telling you the truth about what you are trying to measure or not. It's the lack of assurance that's the problem.


Where can I find out more about questionnaires?

Please don't take seriously those books which devote a chapter to Likert scaling and then urge you to go out and try doing a questionnaire yourself. These authors are doing everyone a disservice. Here is a minimalist list of reference sources for questionnaire construction that I have found useful as teaching material.

Aiken, Lewis R., 1996, Rating Scales and Checklists. Wiley. ISBN 0-471-12787-6. Good general introduction including discussions of personality and achievement questionnaires.

Czaja, Ronald, and Johnny Blair, 1996, Designing Surveys. Pine Forge Press. ISBN 0-8039-9056-1. A useful resource for factual-style surveys, including material on interviews as well as mail surveys.

DeVellis, Robert F., 1991, Scale Development, Theory and Applications. Sage Publications, Applied Social Research Methods Series vol. 26. ISBN 0-8039-3776-8. Somewhat theoretical, but important information if you want to take questionnaire development seriously.

Ghiselli, Edwin E., John P. Campbell, and Sheldon Zedeck, 1981, Measurement Theory for the Behavioural Sciences. WH Freeman & Co. ISBN 0-7167-1252-0. A useful reference for statistical issues. Considered 'very readable' by some.

Kline, Paul, 1986, A Handbook of Test Construction. Methuen. ISBN 0-416-39430-2. Practically-orientated, with a lot of good, helpful advice for all stages of questionnaire construction and testing. Some people find it tough going but it is a classic.

Stecher, Brian M. and W. Alan Davis, 1987, How to Focus an Evaluation. Sage Publications. ISBN 0-803903127-1. About more than just questionnaires, but it serves to remind the reader that questionnaires are always part of a broader set of concerns when carrying out an evaluation.


Questionnaires in Usability Engineering
A List of Frequently Asked Questions (3rd Ed.)
Compiled by: Jurek Kirakowski,
Human Factors Research Group, Cork, Ireland.
This edition: 2nd June, 2000.

Source: http://www.ucc.ie/hfrg/resources/qfaq1.html
