11.07.2015 Views

Preface to First Edition - lib


longjump    0.91     0.78 0.74    0.82     1.00    0.07    0.70
javelin     0.01     0.00 0.27    0.33     0.07    1.00   -0.02
run800m     0.78     0.59 0.42    0.62     0.70   -0.02    1.00

Examination of these numerical values confirms that most pairs of events are positively correlated, some moderately (for example, high jump and shot) and others relatively highly (for example, high jump and hurdles). We also see that the correlations involving the javelin event are all close to zero. One possible explanation for the latter finding is that training for the other six events does not help much in the javelin because it is essentially a 'technical' event.

An alternative explanation is found if we examine the scatterplot matrix in Figure 16.1 a little more closely. It is very clear in this diagram that for all events except the javelin there is an outlier, the competitor from Papua New Guinea (PNG), who is much poorer than the other athletes at these six events and who finished last in the competition in terms of points scored. Surprisingly, in the scatterplots involving the javelin it is this competitor who again stands out, but this time because she has the third highest value for the event.

It might be sensible to look again at both the correlation matrix and the scatterplot matrix after removing the competitor from PNG. With her row dropped from heptathlon, the correlation matrix (excluding the score column) is recomputed by

R> round(cor(heptathlon[, -score]), 2)
         hurdles highjump shot run200m longjump javelin run800m
hurdles     1.00     0.58 0.77    0.83     0.89    0.33    0.56
highjump    0.58     1.00 0.46    0.39     0.66    0.35    0.15
shot        0.77     0.46 1.00    0.67     0.78    0.34    0.41
run200m     0.83     0.39 0.67    1.00     0.81    0.47    0.57
longjump    0.89     0.66 0.78    0.81     1.00    0.29    0.52
javelin     0.33     0.35 0.34    0.47     0.29    1.00    0.26
run800m     0.56     0.15 0.41    0.57     0.52    0.26    1.00

The correlations change quite substantially, and the new scatterplot matrix in Figure 16.2 does not point us to any further extreme observations.
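The outlier-removal step described above can be sketched as follows. This is a minimal illustration on a small made-up data frame, not the heptathlon data itself: the values, the athlete labels, the use of grep on the row names, and the way the score index is computed are all assumptions made here for the example.

```r
# Toy stand-in for the heptathlon data; all values are invented for illustration.
toy <- data.frame(
  hurdles  = c(12.8, 13.2, 13.0, 13.6, 16.1),
  highjump = c(1.85, 1.80, 1.77, 1.74, 1.52),
  javelin  = c(44.0, 41.5, 46.2, 39.8, 45.9),
  score    = c(6900, 6600, 6700, 6300, 4600),
  row.names = c("Athlete A (USA)", "Athlete B (GDR)", "Athlete C (GDR)",
                "Athlete D (URS)", "Athlete E (PNG)")
)

# Locate the row whose name contains "PNG" (assumed naming convention).
png_row <- grep("PNG", rownames(toy))
stopifnot(length(png_row) == 1)  # guard: negative indexing with no match drops every row

# Drop that row.
toy_clean <- toy[-png_row, ]

# Column index of the overall score, so it can be left out of the correlations.
score <- which(colnames(toy_clean) == "score")

# Correlation matrices before and after removing the outlier.
cor_all   <- round(cor(toy[, -score]), 2)
cor_clean <- round(cor(toy_clean[, -score]), 2)
print(cor_all)
print(cor_clean)
```

Comparing the two printed matrices shows how much a single extreme observation can move the correlations, which is the point the text makes about the PNG competitor.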
In the remainder of this chapter we analyse the heptathlon data with the observations of the competitor from Papua New Guinea removed.

Because the results for the seven heptathlon events are on different scales, we shall extract the principal components from the correlation matrix. A principal component analysis of the data can be carried out using the prcomp function with the scale argument set to TRUE to ensure the analysis is based on the correlation matrix. The result is a list containing the coefficients defining each component (sometimes referred to as loadings), the principal component scores, etc. The required code is (omitting the score variable)

R> heptathlon_pca <- prcomp(heptathlon[, -score], scale = TRUE)
R> print(heptathlon_pca)

© 2010 by Taylor and Francis Group, LLC
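To make the link between scaling and the correlation matrix concrete, here is a small sketch on simulated data. The variables seconds, metres and points are hypothetical. Note that the formal argument of prcomp is named scale. (with a trailing dot); writing scale = TRUE also works because R matches argument names partially.

```r
set.seed(1)
n <- 24

# Hypothetical variables measured on very different scales.
z <- rnorm(n)
toy <- data.frame(
  seconds = 13 + 0.5 * z + rnorm(n, sd = 0.2),
  metres  = 1.8 - 0.05 * z + rnorm(n, sd = 0.03),
  points  = 40 + 3 * rnorm(n)
)

# PCA on standardised variables, i.e. on the correlation matrix.
pca <- prcomp(toy, scale. = TRUE)

# The same components from an eigen decomposition of the correlation matrix.
eig <- eigen(cor(toy))

print(pca$rotation)     # loadings: coefficients defining each component
print(pca$sdev^2)       # component variances
print(eig$values)       # eigenvalues of the correlation matrix (same numbers)
print(head(pca$x))      # principal component scores for the first rows
```

The component variances agree with the eigenvalues of the correlation matrix and sum to the number of variables, and the loadings agree with the eigenvectors up to sign.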
