11.07.2015 Views

statisticalrethinkin..


HOW TO USE THIS BOOK

[...] than initially. You'll see that the command line makes trivial some things that are prohibitively difficult for point-and-click interfaces, like recoding large data tables or running ten thousand resampled regressions.

How you should work. But I would be cruel if I just told the reader to use a command-line tool without also explaining something about how to do it. You do have to relearn some habits, but it isn't a major change. For readers who have only used menu-driven statistics software before, there will be some significant readjustment. But after a few days, it will seem natural to you. For readers who have used command-driven statistics software like Stata and SAS, there is still some readjustment ahead. I'll explain the overall approach first. Then I'll say why even Stata and SAS users are in for a change.

First, the sane approach to scripting statistical analyses is to work back and forth between two applications: (1) a plain text editor of your choice and (2) the R program itself. A plain text editor is a program that creates and edits simple, formatting-free text files. Common examples include Notepad (in Windows), TextEdit (in Mac OS X), and Emacs (in most *NIX distributions, including Mac OS X). You will use the plain text editor to keep a running log of the commands you feed into the R application for processing. You absolutely do not want to just type out commands directly into R itself. Instead, you want to either copy and paste lines of code from your plain text editor into R, or read entire script files directly into R. Of course, you will always enter some commands directly into R as you explore data, debug, or merely play. But your serious work should be implemented through the plain text editor. There are several practical reasons for this.

First, editing commands in R is awkward. Inevitably, you will make typos. Editing a complex line of code in R to find and fix a single misplaced character is a pain. In contrast, in your plain text editor, you can easily move the cursor to any position to edit, without having to fight with the R command line at the same time.

Second, you can see the whole picture in the text editor, because it separates input from output. The text editor holds the plan of action. You want to plan your strategy and build the commands to implement it there, free from the clutter of intermediate outputs and warning messages and other vomitus of the R application itself. While it is possible to save a log of commands input into R, this hardly helps with planning.

Third, you want a secure and portable record of your work. Once you have solved a scripting problem, you can consult your previous scripts to remember how you did it. And when a colleague asks you how you did an analysis, you can just email them the script. The only field I know of in which such a practice is actually common is economics (I believe AER enforces this policy?), but it should be the norm everywhere. The format should be portable, so that any colleague can open it on any computer system. Plain text files fit this bill. MS Word files do not. Additionally, a complex text processor like MS Word actually gets in the way, because it forces formatting and spell checking that serve no purpose in coding. You want a plain text editor to show every space you type.

You can add comments to your R scripts to help you plan the code and remember later what the code is doing. To make a comment, just begin a line with the # symbol. To help clarify the approach, below I provide a very short complete script for running a linear regression on one of R's built-in sets of data. Even if you don't know what the code does yet, hopefully you will see it as a basic model of clarity of formatting and use of comments.
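The author's own script is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of the kind of commented script described above. It assumes the built-in `cars` data set, and the object name `fit` is a choice made for this illustration, not taken from the text.

```r
# Illustrative script (an assumption, not the author's original).

# Load a built-in data set:
# car speeds (mph) paired with stopping distances (ft).
# See ?cars for details.
data(cars)

# Fit a linear regression of stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)

# Show the estimated intercept and slope.
print(coef(fit))

# Summarize the residuals of the fitted model.
print(summary(resid(fit)))
```

If a script like this were saved as a plain text file, say under the hypothetical name `regression.R`, it could be read into R in one step with `source("regression.R", echo = TRUE)`. The alternative is to copy and paste it from the text editor one line at a time.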
