INTRODUCTORY STATISTICS: CONCEPTS, MODELS, AND APPLICATIONS
WWW Version 1.0
First Published 7/15/96
Revised 8/5/97
Revised 2/19/98
David W. Stockburger
Southwest Missouri State University
Copyright © 1996 by David W. Stockburger

A MAYORAL FANTASY

Imagine, if you will, that you have just been elected mayor of a medium-sized city. You like your job; people recognize you on the street and show you the proper amount of respect. You are always being asked out to lunch and dinner, etc. You want to keep your job as long as possible.

In addition to electing you mayor, the electorate voted for a new income tax at the last election. In an unprecedented show of support for your administration, the amount of the tax was left unspecified, to be decided by you (this is a fantasy!). You know the people of the city fairly well, however, and they would throw you out of office in a minute if you taxed them too much. If you set the tax rate too low, the effects of this action might not be as immediate, as it takes some time for the city and fire departments to deteriorate, but they would be just as certain.

You have a pretty good idea of the amount of money needed to run the city. You do not, however, have more than a foggy notion of the distribution of income in your city. The IRS, being the IRS, refuses to cooperate. You decide to conduct a survey to find the necessary information.

A FULL-BLOWN FIASCO

Since there are approximately 150,000 people in your city, you hire 150 students to conduct 1000 surveys each. It takes considerable time to hire and train the students to conduct the surveys. You decide to pay them $5.00 a survey, a considerable sum when the person being surveyed is a child with no income, but not much for the richest man in town, who employs an army of CPAs.
The bottom line is that it will cost approximately $750,000 (150,000 surveys at $5.00 each), or close to three-quarters of a million dollars, to conduct this survey.

After a considerable period of time has elapsed (because it takes time to conduct that many surveys), your secretary announces that the task is complete. Boxes and boxes of surveys are placed on your desk.
You begin your task of examining the figures. The first one is $33,967, the next is $13,048, the third is $309,339, etc. Now, the capacity of human short-term memory is approximately five to nine chunks (7 plus or minus 2, [Miller, 1963]). This means that by the time you are examining the tenth income, you have forgotten one of the previous incomes, unless you put the incomes into long-term memory. Placing 150,000 numbers in long-term memory is slightly overwhelming, so you do not attempt that task. By the time you have reached the 100,000th number, you have an attack of nervous exhaustion, are carried away by the men in white, and are never seen again.

ORGANIZING AND DESCRIBING THE DATA - AN ALTERNATIVE ENDING

In an alternative ending to the fantasy, you had at one time in your college career made it through the first half of an introductory statistics course. This part of the course covered the DESCRIPTIVE function of statistics, that is, procedures for organizing and describing sets of data.

Basically, there are two methods of describing data: pictures and numbers. Pictures of data are called frequency distributions and make the task of understanding sets of numbers cognitively palatable. Summary numbers, called statistics, may also be used to describe other numbers. An understanding of what two or three of these summary numbers mean gives you a pretty good idea of what the distribution of numbers looks like. In any case, it is easier to deal with two or three numbers than with 150,000.

After organizing and describing the data, you make a decision about the amount of tax to implement. Everything seems to be going well until an investigative reporter from the local newspaper prints a story about the three-quarters-of-a-million-dollar cost of the survey. The irate citizens immediately start a recall petition.
You resign the office in disgrace before you are thrown out.

SAMPLING - AN ALTERNATIVE APPROACH AND A HAPPY ENDING

If you had only completed the last part of the statistics course in which you were enrolled, you would have understood the basic principles of the INFERENTIAL function of statistics. Using inferential statistics, you can take a sample from the population, describe the numbers of the sample using descriptive statistics, and infer the population distribution. Granted, there is a risk of error involved, but if the risk can be minimized, the savings in time, effort, and money are well worth the risk.

In the preceding fantasy, suppose that rather than surveying the entire population, you randomly selected 1000 people to survey. This procedure is called SAMPLING from the population, and the individuals selected are called a SAMPLE. If each individual in the population is equally likely to be included in the sample, the sample is called a RANDOM SAMPLE.
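To make the descriptive and sampling ideas concrete, here is a minimal Python sketch. It assumes a hypothetical city whose 150,000 incomes are drawn from a lognormal distribution (synthetic numbers, not survey data); the parameters 10.3 and 0.8 and the helper names `make_population`, `describe`, and `frequency_distribution` are illustrative choices. It draws a random sample of 1000 with `random.sample`, in which every individual is equally likely to be chosen, then reduces the sample to a few summary numbers and a crude text frequency distribution.

```python
import random
import statistics

def make_population(size, seed=1):
    """Hypothetical city: incomes drawn from a lognormal distribution (synthetic, not real data)."""
    rng = random.Random(seed)
    return [round(rng.lognormvariate(10.3, 0.8), 2) for _ in range(size)]

def describe(values):
    """A few summary numbers (statistics) that stand in for the whole list."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def frequency_distribution(values, width):
    """Count how many values fall into each income interval of the given width."""
    counts = {}
    for v in values:
        lower = int(v // width) * width  # lower bound of the interval containing v
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

population = make_population(150_000)
rng = random.Random(42)
sample = rng.sample(population, 1000)  # random sample, drawn without replacement

summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f}")

# Crude text histogram: one '*' per full 10 people in each $25,000 interval.
for lower, count in frequency_distribution(sample, 25_000).items():
    print(f"{lower:>9,} to {lower + 24_999:>9,}: {'*' * (count // 10)}")
```

Because these synthetic incomes are skewed to the right, the mean comes out above the median, one reason it helps to look at more than one summary number.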
Now, instead of 150 student surveyors, you only need to hire 10 surveyors, who each survey 100 citizens. The time taken to collect the data is a fraction of that taken to survey the entire population. Equally important, the survey now costs approximately $5000 (1000 surveys at $5.00 each), an amount that the taxpayers are more likely to accept.

At the completion of the data collection, the descriptive function of statistics is used to describe the 1000 numbers, but an additional analysis must be carried out to generalize (infer) from the sample to the population.

Some reflection on your part suggests that the sample might have contained the 1000 richest individuals in your city. If this were the case, the estimate of the amount of income to tax would be too high. Equally possible is the situation where the 1000 poorest individuals were included in the survey (the bums on skid row), in which case the estimate would be too low. These possibilities exist through no fault of yours or of the procedure used. They are said to be due to chance; a distorted sample just happened to be selected.

The beauty of inferential statistics is that the amount of probable error, or the likelihood of either of the above possibilities, may be specified. In this case, the possibility of either of these extreme situations actually occurring is so remote that they may be dismissed. However, the chance that there will be some error in the estimate is pretty good. Inferential statistics will allow you to specify the amount of error with statements like, "I am 95 percent sure that the estimate will be within $200 of the true value." You are willing to trade the risk of error and inexact information because the savings in time, effort, and money are so great.

At the conclusion of the fantasy, a grateful citizenry makes you king (or queen). You receive a large salary increase and are elected to the position for life. You may continue this story any way that you like at this point ...
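The "95 percent sure" statement in the sampling ending can be illustrated with a short simulation. This sketch again assumes a synthetic lognormal population of 150,000 incomes (hypothetical data) and uses the common normal-approximation interval, mean ± z·s/√n, as one standard way to build such a statement; the function name `confidence_interval` is illustrative. It then repeats the whole survey 500 times to show what "95 percent" refers to: roughly 95 percent of the intervals built this way capture the true population mean.

```python
import math
import random
import statistics

def confidence_interval(sample, level=0.95):
    """Normal-approximation interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    margin = z * standard_error
    return mean - margin, mean + margin

rng = random.Random(7)
# Hypothetical population of 150,000 incomes (synthetic lognormal data, not real figures).
population = [rng.lognormvariate(10.3, 0.8) for _ in range(150_000)]
true_mean = statistics.mean(population)

low, high = confidence_interval(rng.sample(population, 1000))
print(f"95% interval for mean income: {low:,.0f} to {high:,.0f}")
print(f"true population mean:         {true_mean:,.0f}")

# Repeat the survey many times; the intervals that miss the true mean
# are the "chance" errors the text describes.
trials = 500
hits = 0
for _ in range(trials):
    lo, hi = confidence_interval(rng.sample(population, 1000))
    if lo <= true_mean <= hi:
        hits += 1
print(f"coverage over {trials} surveys: {hits / trials:.1%}")
```

The width of the interval depends on how spread out the incomes are and on the sample size, so this synthetic population will not reproduce the $200 figure from the story; quadrupling the sample size roughly halves the margin of error.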
DOES CAFFEINE MAKE PEOPLE MORE ALERT?

Does the coffee I drink almost every morning really make me more alert? If all the students drank a cup of coffee before class, would the time spent sleeping in class decrease? These questions may be answered using experimental methodology and hypothesis testing procedures.

The last part of the text is concerned with HYPOTHESIS TESTING, or procedures to make rational decisions about the reality of effects. The purpose of hypothesis testing is perhaps best illustrated by an example.

To test the effect of caffeine on alertness in people, one experimental design would divide the classroom students into two groups, one group receiving coffee with caffeine and the other coffee without caffeine. The second group gets coffee without caffeine rather than nothing to drink because the effect of caffeine is the effect of interest, rather than the effect of ingesting liquids. The number of minutes that students sleep during that class would be recorded.

Suppose the group that got coffee with caffeine sleeps less, on average, than the group that drank coffee without caffeine. On the basis of this evidence, the researcher argues that caffeine had the predicted effect.

A statistician, learning of the study, argues that such a conclusion is not warranted without performing a hypothesis test. The reasoning for this argument goes as follows: Suppose that caffeine really had no effect. Isn't it possible that the difference between the average alertness of the two groups was due to chance? That is, the individuals who belonged to the caffeine group may simply have gotten a better night's sleep, been more interested in the class, etc., than the no-caffeine group. If the class had been divided in a different manner, the difference might disappear.

The purpose of the hypothesis test is to make a rational decision between the hypotheses of real effects and chance explanations. The scientist is never able to totally eliminate the chance explanation, but may decide that the difference between
the two groups is so large that it makes the chance explanation unlikely. If this is the case, the decision would be made that the effects are real. A hypothesis test specifies how large the differences must be in order to make a decision that the effects are real.

At the conclusion of the experiment, then, one of two decisions will be made, depending upon the size of the difference between the caffeine and no-caffeine groups. The decision will be either that caffeine has an effect, making people more alert, or that chance factors (the composition of the groups) could explain the result. The purpose of the hypothesis test is to eliminate false scientific conclusions as much as possible.
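The statistician's question, whether a different division of the class could have produced the same difference, is exactly what a randomization (permutation) test checks. The sketch below uses that test as one possible hypothesis test; it is not necessarily the procedure the text develops later. The minutes-slept data are invented for illustration, the 0.05 cutoff is a conventional assumption, and the names `permutation_test` and `ALPHA` are hypothetical. The test is one-sided because the prediction is that caffeine reduces sleep.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(caffeine, decaf, trials=10_000, seed=0):
    """One-sided randomization test of "caffeine has no effect on minutes slept".

    Returns the observed difference (decaf mean minus caffeine mean) and the
    proportion of random re-divisions of the class whose difference is at
    least that large, i.e. an estimated p-value.
    """
    observed = mean(decaf) - mean(caffeine)
    pooled = list(caffeine) + list(decaf)
    n_caffeine = len(caffeine)
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # divide the class "in a different manner"
        diff = mean(pooled[n_caffeine:]) - mean(pooled[:n_caffeine])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / trials

# Hypothetical minutes slept in class, invented for illustration only.
caffeine = [2, 0, 5, 1, 3, 0, 4, 2]
decaf = [6, 4, 9, 3, 7, 5, 8, 2]

ALPHA = 0.05  # conventional cutoff; an assumption, not fixed by the text
observed, p_value = permutation_test(caffeine, decaf)
print(f"decaf group slept {observed:.3f} more minutes on average")
print(f"estimated p-value: {p_value:.4f}")
if p_value < ALPHA:
    print("Decision: chance is an unlikely explanation; treat the effect as real.")
else:
    print("Decision: chance factors could explain the result.")
```

The cutoff plays the role of "how large the differences must be": a smaller `ALPHA` demands a larger difference before the chance explanation is rejected.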