12.07.2015 Views

R dummies



[1] 44 119 62 133 142
> iris[index, ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
44           5.0         3.5          1.6         0.6     setosa
119          7.7         2.6          6.9         2.3  virginica
62           5.9         3.0          4.2         1.5 versicolor
133          6.4         2.8          5.6         2.2  virginica
142          6.9         3.1          5.1         2.3  virginica

Removing duplicate data

A very useful application of subsetting data is to find and remove duplicate values.

R has a useful function, duplicated(), that finds duplicate values and returns a logical vector that tells you whether the specific value is a duplicate of a previous value. This means that for duplicated values, duplicated() returns FALSE for the first occurrence and TRUE for every following occurrence of that value, as in the following example:

> duplicated(c(1,2,1,3,1,4))
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE

If you try this on a data frame, R automatically checks the observations (meaning, it treats every row as a value). So, for example, with the data frame iris:

> duplicated(iris)
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [10] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
....
[136] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE

If you look carefully, you notice that row 143 is a duplicate (because the 143rd element of your result has the value TRUE). You also can tell this by using the which() function:

> which(duplicated(iris))
[1] 143

Now, to remove the duplicate from iris, you need to exclude this row from your data. Remember that there are two ways to exclude data using subsetting:
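A minimal sketch of the whole sequence, assuming the two ways are a negative index and a negated logical vector. The variable names (x, dup_x, dup_rows, iris_by_index, iris_by_logical) are hypothetical and chosen only for illustration:

```r
# duplicated() on a vector: the first occurrence of a value is FALSE,
# every later occurrence is TRUE.
x <- c(1, 2, 1, 3, 1, 4)
dup_x <- duplicated(x)

# On a data frame, duplicated() compares whole rows.
# which() turns the logical vector into row positions.
dup_rows <- which(duplicated(iris))

# Way 1 (assumed): exclude rows with a negative index.
# The length check matters: iris[-integer(0), ] would select zero rows,
# not all rows, if there were no duplicates.
iris_by_index <- if (length(dup_rows) > 0) iris[-dup_rows, ] else iris

# Way 2 (assumed): keep only the rows where duplicated() is FALSE.
iris_by_logical <- iris[!duplicated(iris), ]

# Both results have one row fewer than the original 150.
nrow(iris_by_index)
nrow(iris_by_logical)
```

The logical form needs no special case for data without duplicates, which is one reason it is often the safer choice in scripts.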
