handout - Daina Chiba

Summer Methods Workshop:

Presenting Statistical Results

Daina Chiba

d.chiba@rice.edu

June 16, 2010

Contents

1 Why you want to learn both Stata and R
2 Stata
  2.1 Producing Regression Tables Efficiently
    2.1.1 DOs and DON’Ts
    2.1.2 Tables with Multiple Columns
    2.1.3 Exporting Stata Results to a LaTeX document
    2.1.4 Exporting Stata Results to Word/Powerpoint documents
  2.2 Additional Tips
    2.2.1 Rescaling
    2.2.2 Putting Stars
  2.3 Exporting Stata Results to R
3 R
  3.1 Text Editor
    3.1.1 How to Install Tinn-R (PC users)
    3.1.2 How to Use Tinn-R (PC users)
    3.1.3 How to install Aquamacs (Mac users)
    3.1.4 How to use Aquamacs (Mac users)
  3.2 Calculating Substantive Effects (a.k.a., CLARIFY “by hand”)
  3.3 Plotting Substantive Effects
  3.4 Plotting Substantive Effects: When X is a Discrete Variable...
  3.5 Descriptive Statistics: Plotting (Im)Balance
  3.6 Plotting Regression Coefficients

1 Why you want to learn both Stata and R 

• The field of Political Science is making a gradual transition from Stata to R. 

• It may be the case that your co-author can use only one of them. 

• Stata’s advantage 

– Fast & easy 

– Some methods are available only in Stata (e.g., selection models) 

• R’s advantage

– Flexible graphic devices

– Able to handle multiple data sets simultaneously.

– Some statistical models are available only in R (e.g., IRT models, Bayesian)

2 Stata

2.1 Producing Regression Tables Efficiently

Although some professional journals encourage authors to use figures instead of tables in
reporting statistical results, some people still prefer old-fashioned tables to figures (and they
can be your co-author!). So, let us first see how to make a publication-quality regression
table efficiently.

The table on the next page is a screenshot of a regression table from Russett & Oneal
(2005, 299). 1 We will see how to automate the process of making a regression table like this
in Stata.

2.1.1 DOs and DON’Ts 

• Don’t copy and paste from Stata’s Results Window (or a log file) to Word or Excel. 

• Automate the process as much as possible to avoid human error and save time.

2.1.2 Tables with Multiple Columns

If you want to produce a regression table with multiple column entries, use the esttab
command from the estout package.

Try running the following chunk of the do-file (in 1_StataPart.do):

1 The paper itself is included in the workshop packet. Find /resources/ONealRussett 2005.pdf 

Stata code 

use "output/smw06162010dataL.dta", clear 

global covariates "smldem lrgdem smldep allies lncaprat dircont lndstab majpower" 

logit mzmid1 $covariates systsize midpy*, cluster(dyadid) 

estimates store allmids 

logit mzfatald1 $covariates systsize fatalpy*, cluster(dyadid) 

estimates store fatalmids 

logit mzcowwar1 $covariates systsize warpy*, cluster(dyadid) 

estimates store wars 

We have now run the three models shown in TABLE 1 and stored each set of results in
memory. Let’s display the three models side by side. To do this, run the following

Stata code

esttab allmids fatalmids wars, ///
b(%10.3f) se scalars("ll Log lik." "chi2 Chi-squared") ///
label mtitles keep($covariates _cons)

The first line tells Stata to show the three models we estimated above (allmids, fatalmids,
wars). The second line requests coefficients rounded to three decimal places, b(%10.3f),
standard errors, se, and some other scalar values of interest (the maximized value of the
log-likelihood function and χ²; N is reported by default). The third line requests variable
labels (label) and model titles (mtitles), and keeps only the listed covariates and the
constant (keep).

You should see a table like this in your Stata window.

Compare the table in the original paper with the one in your Stata window. Can you find a
mistake that Oneal and Russett made?

I suspect that O & R copied and pasted the results from the Stata window into a Word
document and, somewhere in the process, accidentally flipped the sign of one coefficient.
A mistake this easy to find could have been caught by the authors, the publisher, the
editors, or the reviewers (although in this case nobody caught it). The mistakes
you make may be less conspicuous and more consequential. So, again, don’t use copy &
paste!

2.1.3 Exporting Stata Results to a LaTeX document

If you use LaTeX, incorporating Stata results is very easy, and the process can be almost
completely automated. First, run the following

Stata code

esttab allmids fatalmids wars using "handout/ro2005tbl.tex", tex replace ///
b(%10.3f) se scalars("ll Log lik." "chi2 Chi-squared") ///
label mtitles keep($covariates _cons) ///
title("Models of the onset of militarized interstate disputes, fatal disputes, and war, 1885--2001")

Running this chunk of code will create (or replace) a text file ro2005tbl.tex in the handout
folder. To insert the table into your paper, add the following line to the LaTeX source
file:

\input{ro2005tbl.tex}

Then you will have the following table (next page).

Every time you change your statistical model and re-estimate it, rerunning the do-file and
recompiling the LaTeX source will regenerate a publication-quality table for you. The days of
copying and pasting are over.

Table 1: Models of the onset of militarized interstate disputes, fatal disputes, and war, 1885–2001

                            (1)           (2)           (3)
                            allmids       fatalmids     wars
Lower democracy             -0.069***     -0.096***     -0.162***
                            (0.008)       (0.017)       (0.030)
Higher democracy            0.038***      0.038***      0.043
                            (0.007)       (0.011)       (0.024)
Lower trade-to-GDP ratio    -32.262***    -95.833***    -45.763
                            (8.987)       (25.895)      (26.878)
Allies                      0.075         -0.199        -0.562
                            (0.101)       (0.178)       (0.346)
Capability ratio (log)      -0.284***     -0.410***     -0.754***
                            (0.030)       (0.049)       (0.096)
Contiguous                  1.127***      1.151***      0.982**
                            (0.152)       (0.249)       (0.357)
Distance (log)              -0.289***     -0.466***     -0.365*
                            (0.053)       (0.081)       (0.161)
Major power                 1.007***      1.118***      2.240***
                            (0.133)       (0.227)       (0.398)
Constant                    -0.489        -0.643        -3.176*
                            (0.406)       (0.618)       (1.267)

Observations                464953        464692        464953
Log lik.                    -8118.156     -2926.785     -757.367
Chi-squared                 3861.950      1345.308      472.660

Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

2.1.4 Exporting Stata Results to Word/Powerpoint documents

Microsoft Word is NOT good software for scientific documentation, in the sense that you
have to do a significant part of the otherwise automated process “by hand,” which not
only increases the danger of making errors but also requires laborious copying, pasting, and
editing. So, I strongly encourage you to learn how to use LaTeX. A good first step would
be to play with the source file of this handout (handout/handout.tex). You should be able
to create a pdf from the source file.

But, again, you may co-author with a superior who does not want to learn LaTeX and
simply orders you to create a fancy table in Word or PowerPoint. Here is a way to
minimize the copying and pasting and the risk of making errors in the process.

First, run the following

Stata code

estout allmids fatalmids wars using "handout/ro2005tbl.txt", replace ///
cells(b(star fmt(%9.3f)) se(par fmt(%9.3f))) style(fixed) ///
stats(N, fmt(a1 %9.3f %9.3f))

Running this chunk of code will create (or replace) a text file ro2005tbl.txt in the handout
folder.

1. Open Microsoft Excel. 

2. Go to “File” and choose “Open” (or hit Control + O on PC or Command + O on Mac).

3. Find ro2005tbl.txt. You may have to change the “Files of type” option. 

4. Specify the numerical columns as “Text” not “General” (Otherwise, Excel will recognize 

the numbers in parentheses as negative.) 

5. Format as you like, and copy and paste it into a Word or PowerPoint document. 

2.2 Additional Tips 

2.2.1 Rescaling 

If one of your coefficients exceeds 10 in absolute value, consider rescaling the variable. If it
exceeds 1,000, you simply must rescale it. Coefficients greater than 1,000,000 are not only
ugly but also misleading.

There are several ways to rescale a variable, none of which changes the statistical results
substantively.

• Multiply the variable by some number (a natural choice would be 10^x with x in
{−3, −2, −1, 1, 2, 3}, for example).

• Standardize the variable (subtract its mean and then divide by its standard deviation).
After standardization, the variable has mean 0 and unit variance.

You can apply either of the above to some or all of the variables included in your
regression.

Table 2: Models with rescaled Trade variables

                            (1)           (2)           (3)               (4)
                            original      multiply      standardizeOne    standardizeALL
Lower trade-to-GDP ratio    -32.262***    -0.323***     -0.094***         -0.094***
                            (-3.59)       (-3.59)       (-3.59)           (-3.59)
Lower democracy             -0.069***     -0.069***     -0.069***         -0.403***
                            (-8.55)       (-8.55)       (-8.55)           (-8.55)
Higher democracy            0.038***      0.038***      0.038***          0.256***
                            (5.69)        (5.69)        (5.69)            (5.69)
Allies                      0.075         0.075         0.075             0.018
                            (0.75)        (0.75)        (0.75)            (0.75)

Observations                464953        464953        464953            464953
Log lik.                    -8118.156     -8118.156     -8118.156         -8118.156
Chi-squared                 3861.950      3861.950      3861.950          3861.950

t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

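We can see in the above table that the results are substantively identical after (2) multiplying
one variable by a constant, (3) standardizing one variable, or (4) standardizing all variables:
the t statistics, the log likelihood, and the chi-squared statistic do not change; only the scale
of the rescaled coefficients does.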
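The same invariance can be checked in R. The following is only a sketch on simulated data: the variables `trade`, `dem`, and `y` are made up for illustration and are not the workshop data.

```r
# Simulated data (hypothetical variables, not the workshop data)
set.seed(1)
n     <- 2000
trade <- runif(n, 0, 0.05)          # a regressor measured on a very small scale
dem   <- rnorm(n)
y     <- rbinom(n, 1, plogis(-1 - 30 * trade + 0.5 * dem))
d     <- data.frame(y, trade, dem)

# (1) original scale, (2) multiplied by 100, (3) standardized
d$trade100 <- d$trade * 100
d$tradeStd <- (d$trade - mean(d$trade)) / sd(d$trade)

m1 <- glm(y ~ trade    + dem, family = binomial, data = d)
m2 <- glm(y ~ trade100 + dem, family = binomial, data = d)
m3 <- glm(y ~ tradeStd + dem, family = binomial, data = d)

# z statistic of the rescaled variable and log likelihood of each model
z  <- sapply(list(m1, m2, m3), function(m) coef(summary(m))[2, "z value"])
ll <- sapply(list(m1, m2, m3), function(m) as.numeric(logLik(m)))
round(rbind(z = z, logLik = ll), 3)
```

The three columns of the printed matrix should agree, while `coef(m1)`, `coef(m2)`, and `coef(m3)` report the trade coefficient on three different scales.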

2.2.2 Putting Stars 

For the sake of convenience, I think it is good to highlight statistically significant coefficients
in some way. I have no objection to putting stars, but keep in mind that some
people think putting stars is a sin. Some of those who hate stars, however, think
that it is OK to highlight statistically significant coefficients by writing them in bold font
(I honestly don’t know why). If, unfortunately, your reviewer or your TA expresses such a
belief, you’d better listen to them rather than try to point out the stupidity of that person.

FYI, AJPS prohibits us from placing multiple stars to differentiate multiple significance
levels.

2.3 Exporting Stata Results to R 

This will become essential when you need to estimate some fancy statistical models that are 

easier to implement in Stata, but then want to use R to make graphs. 

Run the following chunk of Stata code: 

Stata code 

insheet using "output/smw06162010dataS.txt", clear 

global covariates "smldem lrgdem smldep allies lncaprat dircont lndstab majpower" 

replace smldep = smldep*1000 

logit mzmid1 $covariates 

mat beta = e(b) 

mat vcov = e(V) 

// Export Beta and V-COV matrix 

preserve 

svmat beta, names(matcol) 

outsheet beta* in 1 using "output/sr_beta.txt", replace nolabel 

restore 

preserve 

svmat vcov, names(matcol) 

outsheet vcov* in 1/9 using "output/sr_vcov.txt", replace nolabel 

restore 

Two text files will be created (replaced) within the output folder. 

3 R 

3.1 Text Editor 

Why would you want to use a text editor?

• Syntax highlighting

• Parenthesis matching

• Lets you send part or all of your R code to R

A powerful text editor significantly reduces careless mistakes and saves you time.

3.1.1 How to Install Tinn-R (PC users) 

If you want to install Tinn-R on a Rice computer (for which you don’t have Administrator
privileges):

1. Go to the Tinn-R website and download the last runnable version without installer
(1.15.1.7) (.zip, 2.9 Mb) at the bottom of the page. (Alternatively, you can click here
to download it.)

2. Unzip the downloaded zip file.

3. Copy the Tinn-R folder to “System (C:) Program Files”.

4. Under Program Files/Tinn-R/bin, there is an executable file named “Tinn-R.exe”.
Create a shortcut to this file and place it in some convenient location (e.g., Desktop).

3.1.2 How to Use Tinn-R (PC users) 

1. Open R.

2. Open Tinn-R, then open the accompanying R code (2_RPart.R) from within Tinn-R.

3. Arrange the two windows side by side, so that you can see both simultaneously
(see the top figure on the next page).

4. Assign “Hot keys.”

(a) Click the R menu in Tinn-R and choose “Hotkeys of R”.

(b) A small window will appear (see the bottom figure on the next page).

(c) Choose “Send selection” from the left menu, and then click on “Active” at the
bottom.

(d) Assign a shortcut key of your choosing. I chose “Ctrl + j”.

(e) Click “Add”.

Now, when you select several lines of R code in the right window and then hit “Ctrl
+ j”, the selected text will be sent to R and should appear in the left window.

You may need to set the working directory:

R code 

> setwd("2010_0616") 

3.1.3 How to install Aquamacs (Mac users) 

Download and install Aquamacs from http://aquamacs.org/. Plain and simple. 

3.1.4 How to use Aquamacs (Mac users) 

1. Open the accompanying R code (2_RPart.R) with Aquamacs. 

2. Open a new window by hitting “Command + N” 

3. Rearrange the two windows side by side if necessary. 

4. When the new window is active, hit “Alt + x” and then “Shift + R”. 

5. You should see a message “M-x R” on the bottom of the window. Then hit return 

twice. R will begin. 

6. Shortcut keys are already assigned. Two that I use frequently are:

• “Ctrl + C + C” (hit C twice while holding down the Ctrl key): send a
“paragraph” to R.

• “Ctrl + C + N”: send a line to R.

3.2 Calculating Substantive Effects (a.k.a., CLARIFY “by hand”) 

Suppose you want to calculate the substantive effects of some variable after running some fancy
statistical model, but CLARIFY does not support the model. In this case, you have to
implement the CLARIFY procedure “by hand.” I believe Randy has already covered how to
do it with Stata, so I will show you how to do it with R.

Assume further that the model you want to estimate is not readily available in R. For
example, models such as Heckman’s probit, Sartori’s selection model, von Stein’s selection
model, and the censored duration model by Boehmke et al. can be easily estimated in Stata
with ado files, but estimating these models in R requires a bit of programming. 2

1. Estimate a model in Stata.

2. Save the estimates (coefficient vector and variance-covariance matrix) and export them
as text files.

3. From R, import the estimates.

4. Simulate coefficients (β̃).

5. Set the X of interest at some value while holding the other variables at their mean or
median.

6. Calculate E(Y) = F(Xβ̃), where F(·) is the inverse link function (e.g., the normal CDF
for probit or the logistic CDF for logit).

7. Repeat steps 5 and 6 for each value of X.

As we have already completed steps 1 and 2 in Section 2.3, we will begin with step 3.

A nice thing about using R is that we don’t really need to iterate steps 5 and 6
using a loop. Instead, we will use a matrix to set X at many values and calculate E(Y) in one
pass.
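To see why the matrix shortcut works, here is a minimal sketch with made-up numbers (three profiles, two coefficients, four simulated draws). The object names follow the ones used in this section, but every value is hypothetical; the point is only that the loop and the matrix product give the same answer.

```r
# Toy inputs (hypothetical numbers): 3 profiles, 2 coefficients, 4 simulated draws
set.x <- cbind(x = c(0, 1, 2), const = 1)                  # 3 x 2
simb  <- matrix(c(0.5, 0.6, 0.4, 0.5,                      # draws of the slope
                  -1.0, -1.1, -0.9, -1.0), ncol = 2)       # draws of the constant; 4 x 2

# Loop version: one profile and one draw at a time
exp.y.loop <- matrix(NA, nrow = nrow(set.x), ncol = nrow(simb))
for (i in seq_len(nrow(set.x))) {
  for (s in seq_len(nrow(simb))) {
    exp.y.loop[i, s] <- plogis(sum(set.x[i, ] * simb[s, ]))
  }
}

# Matrix version: all profiles and all draws at once
exp.y.mat <- plogis(set.x %*% t(simb))                     # 3 x 4

all.equal(exp.y.loop, exp.y.mat, check.attributes = FALSE)
```

Each row of the result corresponds to one regressor profile and each column to one simulated coefficient vector.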

Step 3: Importing estimates from Stata 

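Run the following chunk of R code (in 2_RPart.R).

R code

> # Assumes the tab-delimited files written by outsheet in Section 2.3
> beta <- unlist(read.table("output/sr_beta.txt", header = TRUE))
> vcov <- as.matrix(read.table("output/sr_vcov.txt", header = TRUE))
> std.err <- sqrt(diag(vcov))
> z.score <- beta / std.err
> p.value <- 2 * pnorm(-abs(z.score))
> reg.tbl <- cbind(beta, std.err, z.score, p.value)
> colnames(reg.tbl) <- c("Coef.", "Std. Err.", "z", "P>|z|")
> reg.tbl

Compare the table shown in your R window with the one in your Stata window. They should
be identical.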

2 In the future, you might want to learn how to program these models in R, but for now, let’s assume 

that we don’t have time to do it. 

Step 4: Simulate coefficients 

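Then, we resample parameters (beta coefficients) from the sampling distribution characterized
by the estimated coefficient vector and the variance-covariance matrix.

Let’s draw 1,000 sets of simulated betas.

R code

> require(MASS)
> nsims <- 1000
> set.seed(12345)
> # Draw from a multivariate normal with mean beta and variance vcov
> simb <- mvrnorm(nsims, mu = unlist(beta), Sigma = as.matrix(vcov))
> dim(simb)
> summary(simb)
> hist(simb[, 1])  # distribution of the first simulated coefficient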
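If you want to try this step without the Stata output, the sketch below uses a made-up coefficient vector and variance-covariance matrix in place of the imported `beta` and `vcov`. The numbers are hypothetical; only the mechanics of `mvrnorm()` are the point.

```r
library(MASS)   # provides mvrnorm()

# Hypothetical estimates standing in for the imported beta and vcov
beta <- c(x1 = 0.8, x2 = -0.3, const = -1.5)
vcov <- matrix(c( 0.040,  0.002, -0.010,
                  0.002,  0.010, -0.003,
                 -0.010, -0.003,  0.090), nrow = 3,
               dimnames = list(names(beta), names(beta)))

nsims <- 1000
set.seed(12345)
simb <- mvrnorm(nsims, mu = beta, Sigma = vcov)   # nsims x length(beta)

dim(simb)                      # one row per draw, one column per coefficient
round(colMeans(simb), 2)       # should be close to beta
round(apply(simb, 2, sd), 2)   # should be close to sqrt(diag(vcov))
```

Checking the column means and standard deviations against the inputs is a cheap way to confirm that the draws come from the distribution you intended.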

Step 5: Set X 

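Let’s say we are interested in the effect of the military capability variable (Capability ratio)
and want to calculate its substantive effects along with 95% confidence intervals. We will vary the
capability variable from its minimum to its maximum value in the sample, while holding
the other variables constant at their mean or median values.

Let’s first load the data set.

R code

> # Assumes the file read by insheet in Section 2.3 is tab-delimited
> data <- read.delim("output/smw06162010dataS.txt")
> data$smldep <- data$smldep * 1000  # same rescaling as in the Stata code
> x.of.i <- data$lncaprat
> range(x.of.i)

It turns out that x.of.i, the capability variable, ranges from 0 to 8.5. We split this range into 49
intervals (50 evenly spaced points).

R code

> # Column order follows beta: binary variables at median, others at mean, constant last
> set.x <- cbind(mean(data$smldem), mean(data$lrgdem), mean(data$smldep),
+                median(data$allies),
+                seq(min(x.of.i), max(x.of.i), length.out = 50),
+                median(data$dircont), mean(data$lndstab),
+                median(data$majpower), 1)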

Now, we have set X by “column binding” (cbind) the sequence of x.of.i values and representative
values of the other variables.

• What is the dimension of set.x? 

• It is important that the orders of the regressors in set.x and coefficients in beta are 

matched exactly. 
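One way to guard against misordered columns is to name them and compare the names with those of the coefficient vector. The sketch below does this on a small simulated data set; the variables `dem`, `allies`, and `caprat` and the coefficient values are hypothetical.

```r
# Hypothetical data and coefficient names (not the workshop data)
set.seed(2)
dat <- data.frame(dem    = rnorm(200),
                  allies = rbinom(200, 1, 0.3),
                  caprat = runif(200, 0, 8.5))
beta <- c(dem = -0.07, allies = 0.08, caprat = -0.28, const = -0.5)

# Vary caprat over 50 points; hold dem at its mean and allies at its median
x.seq <- seq(min(dat$caprat), max(dat$caprat), length.out = 50)
set.x <- cbind(dem    = mean(dat$dem),
               allies = median(dat$allies),
               caprat = x.seq,
               const  = 1)

dim(set.x)                                           # 50 rows, one column per coefficient
stopifnot(identical(colnames(set.x), names(beta)))   # stops if the order differs
```

Because `cbind()` recycles the scalar values, every row shares the same representative values and differs only in `caprat`.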

Step 6: Calculate E(Y) = F(Xβ̃)

Now we have

• Profile X with 50 rows and 9 columns, and

• Simulated β̃ with 1000 rows and 9 columns.

This means that we are to obtain a distribution of E(Y) for 50 different regressor profiles.
Then,

• What should be the dimension of E(Y)?

• How do we calculate E(Y)? (Note: our model is logit.)

R code

> x.beta <- set.x %*% t(simb)        # 50 x 1000 matrix of linear predictors
> exp.y <- 1 / (1 + exp(-x.beta))    # inverse logit: one row per profile, one column per draw

3.3 Plotting Substantive Effects 

Run the R code under section 3.3. You will obtain Figure 1 below (output/figs/subeff.pdf). 

[Figure 1 appears here: predicted probability of MID onset (y-axis, 0.0000 to 0.0030) plotted against Capability Ratio (log) (x-axis, 0 to 8), with 95% confidence bands.]

Figure 1: The effect of military capability balance on the probability of MID onset

Insert figure caption here. Explain what this figure means substantively, how the other variables are treated 

(held at mean / median, etc.), how the confidence bands are calculated (in this case, CLARIFY-style), etc. 

Make sure that readers can understand what’s going on without reading the text. 
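For reference, here is a rough sketch of how a figure of this kind can be drawn in base R. It is not the code from 2_RPart.R: the coefficients, their variance-covariance matrix, and the output file name are all made up, and the band is simply the 2.5th and 97.5th percentiles of the simulated E(Y) at each value of X.

```r
library(MASS)

# Hypothetical logit estimates: a slope on x and a constant
beta <- c(x = -0.3, const = -5)
vcov <- matrix(c(0.0009, -0.002, -0.002, 0.01), nrow = 2)
set.seed(12345)
simb  <- mvrnorm(1000, mu = beta, Sigma = vcov)   # 1000 x 2
x.seq <- seq(0, 8.5, length.out = 50)
set.x <- cbind(x.seq, 1)                          # 50 x 2
exp.y <- plogis(set.x %*% t(simb))                # 50 x 1000

# Point prediction and 95% band: row-wise mean and 2.5th / 97.5th percentiles
pred  <- rowMeans(exp.y)
lower <- apply(exp.y, 1, quantile, probs = 0.025)
upper <- apply(exp.y, 1, quantile, probs = 0.975)

out <- file.path(tempdir(), "subeff_sketch.pdf")  # hypothetical output file
pdf(out, width = 6, height = 4)
plot(x.seq, pred, type = "l", ylim = range(lower, upper),
     xlab = "x", ylab = "Predicted probability")
lines(x.seq, lower, lty = 2)
lines(x.seq, upper, lty = 2)
legend("topright", legend = c("Predicted probability", "95% confidence band"),
       lty = c(1, 2), bty = "n")
dev.off()
```

Using the row-wise mean as the point prediction is one choice; the row-wise median is a common alternative.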

3.4 Plotting Substantive Effects: When X is a Discrete Variable... 

If the regressor of interest takes only a limited number of discrete values (for example,
0 or 1), then consider using histograms instead of line plots. An example of a histogram
representation of substantive effects is shown in Figure 2. To produce this figure, run the
R code under section 3.4 and you will obtain (output/figs/subeff2.pdf)

[Figure 2 appears here: two histograms of the predicted probability of MID onset (x-axis, 0.00 to 0.07; y-axis, frequency from 0 to 400), one for noncontiguous dyads and one for contiguous dyads.]

Figure 2: The effect of geographic contiguity on the probability of MID onset

The figure illustrates the effect of contiguity on the probability that a pair of states experiences a militarized
interstate dispute. The black histogram located to the left shows the distribution
of the predicted probability of MID onset for dyads that do not share a geographic border, whereas the gray
histogram to the right shows that for contiguous dyads. All the covariates other than the contiguity variable
are held at their median (binary variables) or mean (continuous variables) values. Uncertainty estimates are
obtained by resampling 1000 draws of β coefficients from the sampling distribution reported in Figure 4.
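A histogram comparison of this kind can be sketched in base R as follows. Again, this is not the workshop code: the two coefficients, their variance-covariance matrix, and the output file name are hypothetical.

```r
library(MASS)

# Hypothetical logit estimates: a binary regressor and a constant
beta <- c(contig = 1.1, const = -5)
vcov <- matrix(c(0.02, -0.01, -0.01, 0.04), nrow = 2)
set.seed(12345)
simb <- mvrnorm(1000, mu = beta, Sigma = vcov)

# Two profiles: regressor = 0 and regressor = 1 (constant in the second column)
set.x <- rbind(c(0, 1), c(1, 1))
exp.y <- plogis(set.x %*% t(simb))                 # 2 x 1000

# Common breaks so that the two histograms are comparable
brks <- seq(0, max(exp.y) * 1.01, length.out = 41)
out  <- file.path(tempdir(), "subeff2_sketch.pdf") # hypothetical output file
pdf(out, width = 6, height = 4)
hist(exp.y[1, ], breaks = brks, col = "black", border = "white",
     main = "", xlab = "Predicted probability")
hist(exp.y[2, ], breaks = brks, col = "gray", border = "white", add = TRUE)
legend("topright", legend = c("x = 0", "x = 1"),
       fill = c("black", "gray"), bty = "n")
dev.off()
```

When the two distributions barely overlap, as here, the difference between the two groups is easy to read off the figure.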

3.5 Descriptive Statistics: Plotting (Im)Balance 

This figure illustrates the difference between treated and untreated (control) observation
units in terms of pre-treatment covariates (and/or the outcome variable). You might find this
type of figure useful when you want to show (im)balance statistics before and after matching.

In this particular example, the unit of observation is the dyad-year, and the “treatment” variable
is a binary variable measuring whether or not the members of a dyad are allied in the
observation year.

Run the R code under section 3.5, and you will obtain Figure 3 (output/figs/sumstat.pdf). 

[Figure 3 appears here: eight panels, one per variable: MID Onset; Lower trade-to-GDP ratio; Distance (log); Lower democracy; Capability ratio (log); Major Power; Higher democracy; Contiguity. Legend: black circle = mean for allied dyads (N = 788); white circle = mean for not allied dyads (N = 11,445); vertical tick = median; horizontal line segment = 25th and 75th percentiles.]

Figure 3: Summary statistics

Each panel shows descriptive statistics of each variable. Circles show the mean, vertical ticks show the 

median, solid horizontal line segments show the lower and upper quartiles, and dotted lines span the range 

of the variables. Black circles represent the mean of the 788 dyad-years that have formal alliance ties with 

one another, and white circles represent the mean of the 11,445 dyad-years that are not allied. 
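The numbers behind a panel like these are just group-wise summaries. As a sketch, with simulated data and hypothetical variable names (`allied` for the treatment, `dist` for one covariate), they can be computed like this:

```r
# Hypothetical dyad-year data (simulated, not the workshop data)
set.seed(3)
dat <- data.frame(allied = rbinom(500, 1, 0.2),
                  dist   = rnorm(500, 6, 1.5))

# Mean, quartiles, and range of one covariate, by treatment status
balance <- sapply(split(dat$dist, dat$allied), function(v) {
  c(mean   = mean(v),
    q25    = unname(quantile(v, 0.25)),
    median = median(v),
    q75    = unname(quantile(v, 0.75)),
    min    = min(v),
    max    = max(v))
})
round(balance, 2)
```

The columns of `balance` correspond to the untreated ("0") and treated ("1") groups, and each row is one of the statistics drawn in the figure.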

3.6 Plotting Regression Coefficients 

Run the R code under section 3.6, and you will obtain Figure 4 (output/figs/coefplot.pdf). 

[Figure 4 appears here: point estimates with horizontal confidence intervals for Lower democracy, Higher democracy, Lower trade-to-GDP ratio, Allies, Capability ratio, Contiguity, Distance, and Major Power; x-axis from −1 to 4.]

Figure 4: Estimated logit coefficients

This figure shows the estimated coefficients from the logistic regression of MID onset (n = 12,233). Circles 

represent the point estimates, and the associated horizontal line segments show the 90% confidence intervals 

for the estimate. 
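A coefficient plot of this type needs only the point estimates and their standard errors. The sketch below is not the workshop code: the three estimates, their standard errors, the labels, and the output file name are made up. The 90% interval is the estimate plus or minus qnorm(0.95) standard errors.

```r
# Hypothetical estimates and standard errors (not the workshop results)
est <- c("Variable A" = -0.07, "Variable B" = 0.04, "Variable C" = 1.1)
se  <- c(0.01, 0.01, 0.15)

# 90% confidence intervals: estimate +/- qnorm(0.95) * standard error
lo <- est - qnorm(0.95) * se
hi <- est + qnorm(0.95) * se

k   <- length(est)
out <- file.path(tempdir(), "coefplot_sketch.pdf")   # hypothetical output file
pdf(out, width = 6, height = 3)
par(mar = c(4, 7, 1, 1))                 # wide left margin for the labels
plot(est, k:1, xlim = range(lo, hi, 0), ylim = c(0.5, k + 0.5), pch = 19,
     yaxt = "n", xlab = "Logit coefficient", ylab = "")
segments(lo, k:1, hi, k:1)               # horizontal interval for each coefficient
abline(v = 0, lty = 2)                   # reference line at zero
axis(2, at = k:1, labels = names(est), las = 1)
dev.off()
```

Plotting the first coefficient at the top (`k:1`) keeps the order the same as in the regression table.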
