… would be attained if only regression weights were considered. The methods examined here include inspection of zero-order correlation coefficients, β weights, structure coefficients, commonality coefficients, all possible subsets regression, dominance weights, and relative importance weights (RIW). Taken together, these methods highlight the complex relationships between the predictors themselves, as well as between predictors and the dependent variable. Analysis from these different standpoints allows the researcher to investigate regression results fully and lessen the impact of multicollinearity. We also concretely demonstrate each method using data from a heuristic example and provide reference information or direct syntax commands for a variety of statistical software packages to help make the methods accessible to readers.

In some cases multicollinearity may be desirable and part of a well-specified model, such as when a construct is multi-operationalized with several similar instruments. In other cases, particularly with poorly specified models, multicollinearity may be so high that there is unnecessary redundancy among predictors, such as when both subscale and total scale variables are included as predictors in the same regression. When unnecessary redundancy is present, researchers may reasonably consider deleting one or more predictors to reduce collinearity. When predictors are related and theoretically meaningful parts of the analysis, the current methods can help researchers parse the roles that related predictors play in predicting the dependent variable. Ultimately, however, the acceptable degree of collinearity is a judgment call by the researcher, but these methods give researchers a broader picture of its impact.

PREDICTOR INTERPRETATION TOOLS

CORRELATION COEFFICIENTS

One method to evaluate a predictor's contribution to the regression model is the use of correlation coefficients such as Pearson r, the zero-order bivariate linear relationship between an independent and a dependent variable. Correlation coefficients are sometimes used as validity coefficients in the context of construct measurement relationships (Nunnally and Bernstein, 1994). One advantage of r is that it is the fundamental metric common to all types of correlational analyses in the general linear model (Henson, 2002; Thompson, 2006; Zientek and Thompson, 2009). For interpretation purposes, Pearson r is often squared (r²) to calculate a variance-accounted-for effect size.
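As a concrete illustration, here is a minimal sketch, assuming Python with numpy as the computing environment (the article itself points readers to several statistical packages; the data and variable names below are purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data: two predictors (x1, x2) and a criterion (y).
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

# Zero-order Pearson r between each predictor and the criterion,
# and the squared, variance-accounted-for effect size r^2.
for name, x in (("x1", x1), ("x2", x2)):
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: r = {r:.3f}, r^2 = {r * r:.3f}")
```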
Although widely used and reported, r is somewhat limited in its utility for explaining MR relationships in the presence of multicollinearity. Because r is a zero-order bivariate correlation, it does not take into account any of the MR variable relationships except that between a single predictor and the criterion variable. As such, r is an inappropriate statistic for describing regression results, because it does not consider the complicated relationships among the predictors themselves and between the predictors and the criterion (Pedhazur, 1997; Thompson, 2006). In addition, Pearson r is highly sample specific, meaning that r might change across individual studies even when the population-based relationship between the predictor and criterion variables remains constant (Pedhazur, 1997).

Only in the hypothetical (and unrealistic) situation when the predictors are perfectly uncorrelated is r a reasonable representation of a predictor's contribution to the regression effect. This is because the overall R² is then simply the sum of the squared correlations between each predictor (X) and the outcome (Y):

$$R^2 = r_{Y \cdot X_1}^2 + r_{Y \cdot X_2}^2 + \cdots + r_{Y \cdot X_k}^2, \text{ or}$$
$$R^2 = (r_{Y \cdot X_1})(r_{Y \cdot X_1}) + (r_{Y \cdot X_2})(r_{Y \cdot X_2}) + \cdots + (r_{Y \cdot X_k})(r_{Y \cdot X_k}). \tag{1}$$

This equation works only because the predictors explain different and unique portions of the criterion variable variance. When predictors are correlated and explain some of the same variance of the criterion, the sum of the squared correlations overstates the regression effect and exceeds the actual R², because r does not consider this multicollinearity.
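A quick numerical check may make Eq. 1 tangible. The sketch below (again hypothetical data; numpy assumed) shows that with uncorrelated predictors the sum of squared zero-order correlations reproduces the model R², whereas with correlated predictors that sum overshoots R²:

```python
import numpy as np

def r2_and_sum_sq_r(X, y):
    """Return the OLS R^2 and the sum of squared zero-order r's."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)   # OLS fit
    r2 = 1 - (y - X1 @ b).var() / y.var()
    sum_sq_r = sum(np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(X.shape[1]))
    return round(r2, 3), round(sum_sq_r, 3)

rng = np.random.default_rng(seed=2)
n = 100_000  # large n so sample r's approximate population values

# Uncorrelated predictors: sum of r^2 ~= R^2 (Eq. 1 holds).
X = rng.normal(size=(n, 2))
y = 0.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
print("uncorrelated:", r2_and_sum_sq_r(X, y))

# Correlated predictors: sum of r^2 overshoots R^2 (Eq. 1 fails).
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n), shared + rng.normal(size=n)])
y = 0.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
print("correlated:  ", r2_and_sum_sq_r(X, y))
```

With this setup the first line should print two roughly equal values (about 0.33 each), while the second should print an R² near 0.60 against a sum of squared correlations near 0.90, illustrating the overshoot.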
BETA WEIGHTS

One answer to the issue of predictors explaining some of the same variance of the criterion is standardized regression (β) weights. Betas are regression weights that are applied to standardized (z) predictor scores in the linear regression equation, and they are commonly used for interpreting predictor contribution to the regression effect (Courville and Thompson, 2001). Their utility lies squarely in their function in the standardized regression equation, which speaks to how much credit each predictor receives in the equation for predicting the dependent variable while holding all other independent variables constant. As such, a β weight tells us how much change (in standardized metric) in the criterion variable we might expect with a one-unit change (in standardized metric) in the predictor variable, again holding all other predictor variables constant (Pedhazur, 1997). This interpretation of a β weight suggests that its computation must simultaneously take into account the predictor's relationship with the criterion as well as the predictor's relationships with all other predictors.

When predictors are correlated, the sum of the squared bivariate correlations no longer yields the R² effect size. Instead, βs can be used to adjust the level of correlation credit a predictor gets in creating the effect:

$$R^2 = (\beta_1)(r_{Y \cdot X_1}) + (\beta_2)(r_{Y \cdot X_2}) + \cdots + (\beta_k)(r_{Y \cdot X_k}). \tag{2}$$

This equation highlights the fact that β weights are not direct measures of the relationship between predictors and outcomes. Instead, they simply reflect how much credit is being given to predictors in the regression equation in a particular context (Courville and Thompson, 2001). The accuracy of β weights is theoretically dependent on having a perfectly specified model, since adding or removing predictor variables will inevitably change β values. The problem is that the true model is rarely, if ever, known (Pedhazur, 1997).

Sole interpretation of β weights is troublesome for several reasons. To begin, because they must account for all relationships among all of the variables, β weights are heavily affected by the variances and covariances of the variables in question (Thompson, 2006). This sensitivity to covariance (i.e., multicollinear) relationships can result in very sample-specific weights that can dramatically change with slight changes in covariance relationships in future samples, thereby decreasing generalizability. For example, β weights can even change in sign as new variables are added or old variables are deleted (Darlington, 1968).
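To close the section, here is a sketch verifying Eq. 2 under the same assumptions (hypothetical data; numpy): β weights are obtained by fitting the regression on z-scored variables, and the sum of each β times its predictor's zero-order r recovers R².

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 100_000

# Hypothetical correlated predictors and criterion.
shared = rng.normal(size=n)
x1 = shared + rng.normal(size=n)
x2 = shared + rng.normal(size=n)
y = 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

def z(v):
    """Standardize a variable to z scores (mean 0, SD 1)."""
    return (v - v.mean()) / v.std()

# Beta weights are the OLS weights for the standardized variables.
Z = np.column_stack([z(x1), z(x2)])
betas, *_ = np.linalg.lstsq(Z, z(y), rcond=None)

# Zero-order correlations of each predictor with the criterion.
rs = np.array([np.corrcoef(x, y)[0, 1] for x in (x1, x2)])

# Eq. 2: R^2 = beta_1 * r_1 + beta_2 * r_2.
print("betas:", betas.round(3), "r's:", rs.round(3))
print("R^2 via Eq. 2:", round(float(betas @ rs), 3))
```

Note that each β here (roughly 0.45) is smaller than its predictor's zero-order r (roughly 0.67): because the predictors are correlated, the regression equation splits the shared credit between them.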
