The logic of statistical (stat) testing is not complex, but it can be difficult to understand because it is the reverse of everyday logic and of what most people expect. Basically, to determine whether two numbers differ significantly, it is assumed that they are the same. The test then determines whether this notion can be rejected, so that we can say the numbers are statistically significantly different at some predetermined confidence level.
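To make that reversed logic concrete, here is a minimal sketch in Python (an illustration, not from the original article) of the pooled two-proportion z-test, which begins by assuming the two percentages come from the same population:

from math import sqrt
from scipy.stats import norm

def two_prop_ztest(x1, n1, x2, n2):
    # Pooled two-proportion z-test: assume both samples come from the
    # same population, then ask whether that assumption can be rejected.
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # one pooled rate under the "they are the same" assumption
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))   # two-sided p-value
    return z, p_value

# "They are the same" is rejected at the 95 percent confidence level
# whenever p_value falls below 0.05.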
While it is not complex
The logic can be subtle. One subtlety leads to a common error, aided and abetted by automatic computer stat testing – overtesting. Suppose there is a group of 200 men and one of 205 women, and they respond to a new product concept on a purchase intent scale. The data might look like that shown in Table A.
Statistical logic assumes that the two percentages to be tested are from the same population – that they do not differ. Therefore, it is assumed that men have the same purchase interest as women.
The rules also assume that the numbers are unrelated, in the sense that the percentages being tested are free to be whatever they might be, from 0 percent to 100 percent. Restricting them in any way changes the probabilities and the dynamics of the statistical test.
The right way to test for
A difference in purchase intent is to pick a key measure to summarize the responses, and test that measure. In Table A, the Top Two Box score was tested – the combined percentage from the top two points on the scale (definitely would buy plus probably would buy).
Within the group of men, this number could have turned out to be anything. It just happened to be 13 percent. Within the group of women, it could have been anything and, as it turns out, was 40 percent. Within each group, the number was free to be anything from 0 percent to 100 percent, so picking this percentage to test follows the statistical rule.
The stat test indicates that the idea that these percentages are from the same place (or are the same) can be rejected, so we can say they are statistically significantly different at the 95 percent confidence level.
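Plugging the Table A figures into the two_prop_ztest sketch above makes the rejection concrete. The raw counts (26 of 200 men, 82 of 205 women) are inferred here from the stated 13 percent and 40 percent, so treat them as illustrative:

z, p = two_prop_ztest(26, 200, 82, 205)
print(f"z = {z:.2f}, p = {p:.2g}")
# The magnitude of z is far beyond the 1.96 cutoff and p is far below
# 0.05, so the "same population" notion is rejected at the 95 percent
# confidence level.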
Something different
Often happens in practice, though. Since the computer programs that generate survey data do not know what summary measure will be important, these programs test everything. When looking at computer-generated data tables, the statistical results will look something like those shown in Table B.
If the Top Two Box score is selected ahead of time, and that is all that is examined (as in Table A), then this automatic testing is very helpful. It does the work and shows that 13 percent differs from 40 percent. The other stat test results are ignored. However, if the data are reported as shown in Table B, there is a problem.
The percentages for the men
If one percentage is picked for testing, it is taken out of the scale, in a sense. The other percentages are no longer free to be whatever they might be. They must add to 100 percent minus the set, fixed percentage that was selected for testing. Percentages for the men can vary from 0 percent to 87 percent, but they can’t be higher, because 13 percent is used up.
Similarly, percentages for the women can vary from 0 percent to 60 percent, because 40 percent is used already. When you look at testing in the other rows, or row by row, you are no longer using the confidence level you think you are using – it becomes something else.
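A short simulation suggests how large that "something else" can be. The Python sketch below (hypothetical: it assumes a five-point scale on which men and women truly share the same response distribution) tests every row of each simulated table, the way automatic software does, and counts how often at least one row falsely comes up "significant":

import numpy as np
from math import sqrt

rng = np.random.default_rng(seed=1)

# Hypothetical shared distribution over the five scale points;
# the "no difference" assumption is true in every row.
probs = [0.10, 0.15, 0.25, 0.25, 0.25]
n_men, n_women = 200, 205
trials = 10_000
z_crit = 1.96  # two-sided cutoff for the 95 percent confidence level

def row_z(x1, n1, x2, n2):
    p_pool = (x1 + x2) / (n1 + n2)
    if p_pool in (0.0, 1.0):  # degenerate row: nothing to test
        return 0.0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

false_alarms = sum(
    any(abs(row_z(m, n_men, w, n_women)) > z_crit
        for m, w in zip(rng.multinomial(n_men, probs),
                        rng.multinomial(n_women, probs)))
    for _ in range(trials)
)
print(f"At least one falsely 'significant' row in "
      f"{false_alarms / trials:.1%} of tables (nominal rate: 5.0%)")

Because the rows must sum to 100 percent, the five tests are not independent, but the chance of at least one false alarm still lands well above the nominal 5 percent (the independent-test ceiling would be 1 - 0.95^5, about 23 percent).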
Statistically, if one said of
Table B that the percentages that definitely would buy and the percentages that definitely/probably would buy both differ at the 95 percent confidence level, it would be wrong. One of them does, but the other difference is at some unknown level of significance, probably much less than 95 percent, given one related significant difference.
Stat tests are very useful
Each one answers a specific question about a numerical relationship. The one most commonly asked about scale responses is whether two numbers differ significantly. If they are the right two numbers, and the proper test is used, the question is easily answered. If they are the wrong two numbers, or the wrong test has been used, the decision maker can be misled.
However, this action has undesirable consequences because some respondents are contacted before an interviewer is available. In most cases, the dialer then places the respondent on hold or disconnects the call. Both actions decrease respondent goodwill.
The FTC has mandated, through legislation governing the activities of telemarketers, a number of metrics to assure responsible dialer use. It would be in the best interest of the research community for those who use dialers to voluntarily adopt these guidelines as a baseline. A comprehensive presentation of these regulations can be found in the FTC’s Telemarketing Sales Rule. A brief summary is available online at www.ftc.gov/bcp/conline/pubs/buspubs/calling.htm.
Survey length
The marginal cost of increasing the length of an Internet questionnaire is negligible, largely confined to some additional programming and hosting time. However, respondents are annoyed when the instrument runs on for an extended period of time.
At the end of the survey, if they finish, these respondents may feel abused and exploited. As a guide, questionnaires with an average completion time longer than 15 minutes should be carefully examined. Many surveys can be condensed through careful editing or, if needed, bifurcated into separate efforts. Another solution involves increasing the incentive beyond the trivial sum too commonly offered by Internet panel providers.
Respondent fatigue
Due primarily to the availability of new technologies such as IVR and the Internet, there has been a proliferation of very low-cost survey activity. Some of this simple work is of value, thereby justifying the respondent’s time. However, many pop surveys are of nominal value: often the research is designed by an uninitiated researcher and the results are of little use.
Consider, for instance, the newscaster-announced toll-free number collecting responses to a quick survey touted to predict who will win the upcoming Presidential election, or yet another Web survey gauging the lifespan of Jennifer Lopez’s latest romance. Too often, these quick-and-dirty efforts diminish respondent goodwill and harm the survey research industry.