Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Finch | Modern methods for the detection of multivariate outliers

[...] might well yield statistical results that continue to be influenced by the presence of outliers. Thus, other methods described here should be considered as viable options when multivariate outliers are present. In the final analysis, such an approach must be based on the goals of the data analysis and the study as a whole. The removal of outliers, when done, must be carried out thoughtfully and with purpose so that the resulting dataset is both representative of the population of interest and useful with the appropriate statistical tools to address the research questions.

Conflict of Interest Statement: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 20 December 2011; paper pending published: 17 January 2012; accepted: 06 June 2012; published online: 05 July 2012.

Citation: Finch WH (2012) Distribution of variables by method of outlier detection. Front. Psychology 3:211. doi:10.3389/fpsyg.2012.00211

This article was submitted to Frontiers in Quantitative Psychology and Measurement, a specialty of Frontiers in Psychology.

Copyright © 2012 Finch. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

www.frontiersin.org July 2012 | Volume 3 | Article 211 | 69
