Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Kraha et al.                                                                Interpreting multiple regression

be considered the most important variable (RIW = 0.241), followed by X3 (RIW = 0.045) and X2 (RIW = 0.015). The RIWs offer an additional representation of the individual effect of each predictor while simultaneously considering the combination of predictors as well (Johnson, 2000). The sum of the weights (0.241 + 0.045 + 0.015 = 0.301) is equal to R². Predictor X1 can be interpreted as the most important variable relative to other predictors (Johnson, 2001). The interpretation is consistent with a full DA, because both the individual predictor contribution with the outcome variable (r_X1·Y) and the potential multicollinearity (r_X1·X2 and r_X1·X3) with other predictors are accounted for. While the RIWs may differ slightly compared to general dominance weights (e.g., 0.015 and 0.016, respectively, for X2), the conclusions are consistent with those from a full DA. This method rank orders the variables with X1 as the most important, followed by X3 and X2.
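The RIW computation summarized above, in which the weights partition R² among correlated predictors, can be sketched in a few lines of NumPy. This is a minimal illustration of Johnson's (2000) epsilon weights, not the authors' own code; the simulated data and the function name are hypothetical.

```python
import numpy as np

def relative_importance_weights(X, y):
    """Johnson's (2000) relative importance weights (epsilons).

    X: (n, p) predictor matrix; y: (n,) criterion.
    Returns one nonnegative weight per predictor; the weights sum to R².
    """
    # Standardize predictors and criterion.
    Xz = (X - X.mean(0)) / X.std(0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    n = len(yz)
    Rxx = Xz.T @ Xz / (n - 1)            # predictor intercorrelation matrix
    rxy = Xz.T @ yz / (n - 1)            # predictor-criterion correlations

    # Lambda = Rxx^(1/2): maps predictors to their closest orthogonal counterparts.
    evals, evecs = np.linalg.eigh(Rxx)
    Lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

    beta_z = np.linalg.solve(Lam, rxy)   # betas of y on the orthogonalized predictors
    return (Lam ** 2) @ beta_z ** 2      # epsilon_j = sum_k Lam_jk^2 * beta_zk^2

# Illustrative simulated data with correlated predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
X[:, 1] += 0.7 * X[:, 0]                 # induce multicollinearity
y = X @ np.array([0.5, 0.2, 0.1]) + rng.standard_normal(500)

riw = relative_importance_weights(X, y)  # partitions R² among the predictors
```

Because Λ'Λ = Rxx has a unit diagonal, the epsilons sum exactly to r'Rxx⁻¹r = R², mirroring the 0.241 + 0.045 + 0.015 = 0.301 decomposition reported above.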
The suppression role of X2, however, is not identified by this method, which helps explain its rank as third in this process.

DISCUSSION

Predictor variables are more commonly correlated than not in most practical situations, leaving researchers with the necessity of addressing such multicollinearity when they interpret MR results. Historically, views about the impact of multicollinearity on regression results have ranged from challenging to highly problematic. At the extreme, avoidance of multicollinearity is sometimes even considered a prerequisite assumption for conducting the analysis. These perspectives notwithstanding, the current article has presented a set of tools that can be employed to effectively interpret the roles various predictors have in explaining variance in a criterion variable.

To be sure, traditional reliance on standardized or unstandardized weights will often lead to poor or inaccurate interpretations when multicollinearity or suppression is present in the data. If researchers choose to rely solely on the null hypothesis statistical significance test of these weights, then the risk of interpretive error is noteworthy. This is primarily because the weights are heavily affected by multicollinearity, as are their SEs, which directly impact the magnitude of the corresponding p values. It is this reality that has led many to suggest great caution when predictors are correlated.

Advances in the literature and supporting software technology for their application have made the issue of multicollinearity much less critical.
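The inflation of coefficient SEs (and hence p values) under multicollinearity described above is commonly quantified by the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The article does not use VIFs itself; the sketch below, with simulated data, simply illustrates the mechanism.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R_j^2),
    where R_j^2 is from regressing column j on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(1)
z = rng.standard_normal(300)
X_corr = np.column_stack([z + 0.1 * rng.standard_normal(300),
                          z + 0.1 * rng.standard_normal(300),
                          rng.standard_normal(300)])
# The first two predictors are nearly collinear, so the sampling variance
# of their weights is inflated many times over; the third predictor is not.
v = vif(X_corr)
```

Because the SE of a weight scales with the square root of its VIF, even a strong predictor can fail a significance test when it is highly correlated with another predictor.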
Although predictor correlation can certainly complicate interpretation, use of the methods discussed here allows for a much broader and more accurate understanding of the MR results regarding which predictors explain how much variance in the criterion, both uniquely and in unison with other predictors.

In data situations with a small number of predictors or very low levels of multicollinearity, the interpretation method used might not be as important, as results will most often be very similar. However, when the data situation becomes more complicated (as is often the case in real-world data, or when suppression exists as exampled here), more care is needed to fully understand the nature and role of predictors.

CAUSE AND EFFECT, THEORY, AND GENERALIZATION

Although current methods are helpful, it is very important that researchers remain aware that MR is ultimately a correlation-based analysis, as are all analyses in the general linear model. Therefore, variable correlations should not be construed as evidence for cause-and-effect relationships. The ability to claim cause and effect is predominately an issue of research design rather than statistical analysis.

Researchers must also consider the critical role of theory when trying to make sense of their data. Statistics are mere tools to help understand data, and the issue of predictor importance in any given model must invoke consideration of the theoretical expectations about variable relationships.
In different contexts and theories, some relationships may be deemed more or less relevant.

Finally, the pervasive impact of sampling error cannot be ignored in any analytical approach. Sampling error limits the generalizability of our findings and can cause any of the methods described here to be more unique to our particular sample than to future samples or the population of interest. We should not assume too easily that the predicted relationships we observe will necessarily appear in future studies. Replication continues to be a key hallmark of good science.

INTERPRETATION METHODS

The seven approaches discussed here can help researchers better understand their MR models, but each has its own strengths and limitations. In practice, these methods should be used to inform each other to yield a better representation of the data. Below we summarize the key utility provided by each approach.

Pearson r correlation coefficient

Pearson r is commonly employed in research. However, as illustrated in the heuristic example, r does not take into account the multicollinearity between variables and does not allow detection of suppressor effects.

Beta weights and structure coefficients

Interpretations of both β weights and structure coefficients provide a complementary comparison of predictor contribution to the regression equation and the variance explained in the effect.
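The complementary use of β weights and structure coefficients can be sketched as follows. A structure coefficient is the correlation between a predictor and the predicted criterion scores, r_s = r_Xj·Y / R. The simulated suppression scenario below is illustrative only (it is not the article's heuristic data): the suppressor earns a sizeable β while its structure coefficient stays near zero.

```python
import numpy as np

def beta_and_structure(X, y):
    """Standardized beta weights and structure coefficients.

    A predictor with a near-zero structure coefficient but a sizeable
    beta weight is acting as a suppressor.
    """
    Xz = (X - X.mean(0)) / X.std(0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    n = len(yz)
    Rxx = Xz.T @ Xz / (n - 1)
    rxy = Xz.T @ yz / (n - 1)
    beta = np.linalg.solve(Rxx, rxy)   # standardized weights
    R = np.sqrt(rxy @ beta)            # multiple correlation
    return beta, rxy / R               # betas, structure coefficients

# Classic suppression demo: x2 is unrelated to y but shares error
# variance with x1, so it receives a large beta despite r_x2,y ~ 0.
rng = np.random.default_rng(2)
noise = rng.standard_normal(1000)
x1 = rng.standard_normal(1000) + noise     # true score + noise
x2 = noise + 0.1 * rng.standard_normal(1000)
y = (x1 - noise) + 0.3 * rng.standard_normal(1000)
beta, r_s = beta_and_structure(np.column_stack([x1, x2]), y)
```

Examined alone, x2's β would be misread as a direct contribution; the near-zero structure coefficient reveals that it helps only by removing irrelevant variance from x1.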
Beta weights alone should not be utilized to determine the contribution predictor variables make to a model because a variable might be denied predictive credit in the presence of multicollinearity. Courville and Thompson (2001; see also Henson, 2002) advocated for the interpretation of (a) both β weights and structure coefficients or (b) both β weights and correlation coefficients. When taken together, β and structure coefficients can illuminate the impact of multicollinearity, reflect more clearly the ability of predictors to explain variance in the criterion, and identify suppressor effects. However, they do not necessarily provide detailed information about the nature of unique and commonly explained variance, nor about the magnitude of the suppression.

All possible subsets regression

All possible subsets regression is exploratory and comes with increasing interpretive difficulty as predictors are added to

Frontiers in Psychology | Quantitative Psychology and Measurement                      March 2012 | Volume 3 | Article 44 | 83
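The all-possible-subsets approach named above can be sketched by brute-force enumeration: fit every non-empty subset of predictors and record each model's R². The function name and simulated data are hypothetical; the 2^p − 1 model count is exactly the source of the interpretive difficulty the article notes as predictors are added.

```python
import itertools
import numpy as np

def all_subsets_r2(X, y):
    """R^2 for every non-empty subset of predictors (2^p - 1 models)."""
    n, p = X.shape
    results = {}
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, subset]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            results[subset] = 1 - resid.var() / y.var()
    return results

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 3))
y = X @ np.array([0.6, 0.3, 0.0]) + rng.standard_normal(200)
r2 = all_subsets_r2(X, y)   # 7 models for 3 predictors
```

Comparing a predictor's R² across subsets with and without it shows its unique contribution in each context, but with p predictors there are 2^p − 1 models to inspect, which quickly becomes unwieldy.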
