12.07.2015 Views

R dummies


Statisticians love it when they can link one variable to another. Sunlight, for example, is detrimental to skirts: The longer the sun shines, the shorter skirts become. We say that the number of hours of sunshine correlates with skirt length. Obviously, there isn’t really a direct causal relationship here; you won’t find short skirts during the summer in polar regions. But, in many cases, the search for causal relationships starts with looking at correlations.

To illustrate this, let’s take a look at the famous iris dataset in R. One of the greatest statisticians of all time, Sir Ronald Fisher, used this dataset to illustrate how multiple measurements can be used to discriminate between different species. This dataset contains five variables, as you can see by using the names() function:

> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length"
[4] "Petal.Width"  "Species"

It contains measurements of flower characteristics for three species of iris, with 50 flowers for each species. Two variables describe the sepals (Sepal.Length and Sepal.Width), two other variables describe the petals (Petal.Length and Petal.Width), and the last variable (Species) is a factor indicating from which species the flower comes.

Looking at relations

Although looks can be deceiving, you want to eyeball your data before digging deeper into it. In Chapter 16, you create scatterplots for two variables. To plot a grid of scatterplots for all combinations of two variables in your dataset, you can simply use the plot() function on your data frame, like this:

> plot(iris[-5])

Because scatterplots are useful only for continuous variables, you can drop all variables that are not continuous. Too many variables in the plot matrix make the plots difficult to see. In the previous code, you drop the variable Species, because that’s a factor.

You can see the result of this simple line of code in Figure 14-4.
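
If you would rather not drop the factor by its column position, the following is a minimal sketch of one way to pick out the continuous variables by type before plotting. The names is_continuous and iris_numeric are made up for this illustration and do not come from the book; the rest uses only base R functions.

```r
# iris ships with base R (package 'datasets'), so nothing needs to be installed.
data(iris)

# Check the layout described in the text: five variables, 50 flowers per species.
names(iris)
table(iris$Species)

# Flag the numeric (continuous) columns; Species is a factor, so it gets FALSE.
# 'is_continuous' and 'iris_numeric' are illustrative names, not from the book.
is_continuous <- vapply(iris, is.numeric, logical(1))
iris_numeric <- iris[is_continuous]

# Draws the same grid of scatterplots as plot(iris[-5]).
plot(iris_numeric)
```

Selecting by type gives the same four columns as iris[-5] for this dataset, and it keeps working if the factor is not the last column of your own data frame.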
The variable names appear in the squares on the diagonal, indicating which variables are plotted along the x-axis and the y-axis. For example, the second plot on the third
