
Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a


> paste( hw, ng, sep = ".", collapse = ":::")
[1] "hello.nasty:::world.government"

7.8.3 Splitting strings

At other times you have the opposite problem to the one in the last section: you have a whole lot of text bundled together into a single string that needs to be pulled apart and stored as several different variables. For instance, the data set that you get sent might include a single variable containing someone’s full name, and you need to separate it into first names and last names. To do this in R you can use the strsplit() function, and for the sake of argument, let’s assume that the string you want to split up is the following string:

> monkey <- "It was the best of times. It was the blurst of times."

> monkey.1 <- strsplit( x = monkey, split = " ", fixed = TRUE )
> monkey.1
[[1]]
[1] "It" "was" "the" "best" "of" "times." "It" "was"
[9] "the" "blurst" "of" "times."

One thing to note in passing is that the output here is a list (you can tell from the [[1]] part of the output), whose first and only element is a character vector. This is useful in a lot of ways, since it means that you can input a character vector for x and then have the strsplit() function split all of them, but it’s kind of annoying when you only have a single input. To that end, it’s useful to know that you can unlist() the output:

> unlist( monkey.1 )
[1] "It" "was" "the" "best" "of" "times." "It" "was"
[9] "the" "blurst" "of" "times."
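To see what the list output looks like when x has more than one element, here is a minimal sketch that returns to the full-name example from the start of this section. The names in full.names are made up for illustration, and the sketch assumes that every name consists of exactly one first name and one last name separated by a single space:

```r
# hypothetical data: one full name per element (not from the original text)
full.names <- c( "Ada Lovelace", "Alan Turing", "Grace Hopper" )

# strsplit() returns a list with one character vector per input string
name.parts <- strsplit( x = full.names, split = " ", fixed = TRUE )

# pull the first and second piece out of each list element
first.names <- sapply( name.parts, function( parts ) parts[1] )
last.names <- sapply( name.parts, function( parts ) parts[2] )

print( first.names )
print( last.names )
```

Here unlist() would not be what you want, because it would flatten all six pieces into a single vector and lose track of which pieces belong to which person. Names with middle names or multi-word surnames would need more careful handling than this sketch provides.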

To understand why it’s important to remember to use the fixed = TRUE argument, suppose we wanted to split this into two separate sentences. That is, we want to use split = "." as our delimiter string. As long as we tell R to remember to treat this as a fixed separator character, then we get the right answer:

> strsplit( x = monkey, split = ".", fixed = TRUE )
[[1]]
[1] "It was the best of times" " It was the blurst of times"
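As a rough illustration of what goes wrong otherwise: with the default fixed = FALSE, strsplit() treats split as a regular expression, and in a regular expression "." matches any single character. The sketch below compares the two behaviours on the same string. The variable names with.fixed, no.fixed and sentences are introduced here for illustration only, and the trimws() call at the end is just one possible way of tidying up the leading space left in the second piece:

```r
monkey <- "It was the best of times. It was the blurst of times."

# fixed = TRUE: "." is a literal full stop, so we get the two sentences
with.fixed <- strsplit( x = monkey, split = ".", fixed = TRUE )[[1]]

# default (fixed = FALSE): "." is a regular expression matching any character,
# so every character counts as a delimiter and only empty strings are left
no.fixed <- strsplit( x = monkey, split = "." )[[1]]

# optional tidy-up: strip the leading space from the second sentence
sentences <- trimws( with.fixed )

print( with.fixed )
print( unique( no.fixed ) )
print( sentences )
```

The point of the comparison is that forgetting fixed = TRUE does not produce an error or a warning. R quietly returns a vector of empty strings, which is why it is worth setting the argument explicitly.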
