CHAPTER 11: Data Analysis and Interpretation: Part I. Describing Data, Confidence Intervals, Correlation

Cleaning the Data

We want to begin by examining the general features of the data, editing or "cleaning" the data as necessary (Mosteller & Hoaglin, 1991). We check carefully for errors such as missing or impossible values (e.g., numbers outside the range of a given scale). Errors can arise because participants misuse a scale (e.g., by reversing the order of importance) or because someone entering data into a computer skips a number or transposes a digit. When typing a manuscript, most of us rely on a "spell checker" to catch our many typos and misspellings. Unfortunately, there is no such device for detecting numerical errors that are entered into a computer (however, see Kaschak & Moore, 2000, for suggestions to reduce errors). It is up to the researcher to make sure that data are clean prior to moving ahead.

Of particular importance is the detection of anomalies and errors. As we have seen, an anomaly sometimes signals an error in data recording, as would be the case if the number 8 appears among data based on respondents' use of a 7-point scale, or if an IQ score of 10 was recorded in a sample of college student participants. Other anomalies are outliers. An outlier is an extreme number in an array; it just doesn't seem to "go with" the main body of data even though it may be within the realm of possible values. When doing a reaction-time study, for instance, where we expect most responses to be less than 1,500 msec, we might be surprised to see a reaction time of 4,000 msec. If nearly all of the other values in a large data set are less than 1,500, a value of 4,000 in the same data set certainly could be viewed as an outlier. Yet such values are possible in reaction-time studies when participants sneeze, absentmindedly look away from a display, or mistakenly think that data collection has halted and start to leave. A respondent completing a questionnaire may misread a question and submit a response that is far more extreme than any other response in the data set. Unfortunately, researchers do not rely on a single definition of an outlier, and several "rules of thumb" are used (see, for example, Zechmeister & Posavac, 2003).

When anomalies appear in a data set, we must decide whether they should be excluded from additional analyses. Anomalies that clearly can be judged to be errors should be corrected or dropped from the data set, but, when doing so, a researcher must report their removal from the data analysis and explain, if possible, why the anomaly occurred.

In the first stage of data analysis we also want to look for ways to describe the distribution of scores meaningfully. What is the dispersion (variability) like? Are the data skewed or relatively normally distributed? One goal of this first stage of analysis is to determine whether the data require transformation before proceeding. Transforming data is a process of "re-expression" (Hoaglin, Mosteller, & Tukey, 1983). Examples of relatively simple transformations include those that express inches as feet, degrees Fahrenheit as Celsius, or number correct as percent correct. More sophisticated statistical transformations are also sometimes useful.
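To make the screening steps described above concrete, here is a minimal sketch in Python of the two checks: flagging impossible values (e.g., an 8 among 7-point ratings) and flagging outliers. The data and variable names are hypothetical, and the 1.5 × IQR (interquartile range) cutoff shown is only one of the several "rules of thumb" mentioned above, not a rule the chapter itself prescribes.

```python
# Illustrative sketch only; the data and the 1.5 * IQR cutoff are assumptions.
import statistics

ratings = [4, 6, 2, 8, 5, 3, 7, 1]          # hypothetical 7-point scale responses
rts_msec = [620, 540, 710, 4000, 580, 655]  # hypothetical reaction times (msec)

# Step 1: flag impossible values -- numbers outside the range of the scale.
impossible = [x for x in ratings if not 1 <= x <= 7]
print("Impossible scale values:", impossible)   # -> [8]

# Step 2: flag outliers with one common rule of thumb:
# values more than 1.5 IQRs beyond the first or third quartile.
q1, _, q3 = statistics.quantiles(rts_msec, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in rts_msec if x < low or x > high]
print("Possible outliers:", outliers)           # -> [4000]
```

Note that flagging is separate from deciding: as the text emphasizes, a flagged value is corrected or dropped only with a reported justification.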
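Re-expression itself is often plain arithmetic. A brief sketch of the simple transformations just named, with hypothetical data; the log transformation at the end is offered as one example of the "more sophisticated" kind sometimes applied to skewed data:

```python
import math

inches = [62, 70, 66]
feet = [x / 12 for x in inches]                   # inches -> feet

fahrenheit = [98.6, 72.0]
celsius = [(f - 32) * 5 / 9 for f in fahrenheit]  # Fahrenheit -> Celsius

correct, total = 37, 50
percent_correct = 100 * correct / total           # number correct -> percent correct

rts_msec = [620, 540, 710, 4000]
log_rts = [math.log10(x) for x in rts_msec]       # a common re-expression for skewed data
```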
The best way to get a feel for a set of data is to construct a picture of it. An advantage of computer-aided data analysis is that we can quickly and easily plot the data using various display options (e.g., frequency polygons, histograms) and just as easily incorporate changes of scale (e.g., inches to feet) to see how such changes affect the appearance of the distribution.
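As a sketch of what such computer-aided plotting might look like, the following assumes the matplotlib library and hypothetical reaction-time data; the chapter does not prescribe any particular software. The side-by-side histograms show the same distribution before and after a simple change of scale:

```python
# Illustrative sketch assuming matplotlib; the data are hypothetical.
import matplotlib.pyplot as plt

rts_msec = [620, 540, 710, 580, 655, 630, 690, 610]
rts_sec = [x / 1000 for x in rts_msec]   # a quick change of scale (msec -> sec)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(rts_msec, bins=5)               # histogram on the original scale
ax1.set_xlabel("Reaction time (msec)")
ax2.hist(rts_sec, bins=5)                # same data after re-expression
ax2.set_xlabel("Reaction time (sec)")
plt.show()
```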
