STATA 11 for Windows SAMPLE SESSION - Food Security Group ...


Stata 11 Sample Session Section 2 – Restructuring Data Files – Table Lookup & Aggregation

Rename any key variables in both files to the same name.

For each case in the production file (c-q4.dta), we need to look up the product and unit in the conver.dta file. We will merge the information from this file into the file in memory (the production file). The variable holding the conversion factor will then be available to calculate the total kilograms (kgs) produced. In Stata we want to use the “joinby” command for this merge. It can be found through the menus with the following choice:

Data
Combine datasets
Form all pairwise combinations within groups.
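The menu choice above corresponds to the joinby command. The sketch below walks through the whole lookup on a tiny invented dataset rather than on c-q4.dta and conver.dta: the product and unit codes, the quantity variable qty, and the conversion factors in factor are all hypothetical. It renames p1a to unit so the key names match, joins on prod and unit, and computes kilograms. Run it as a single do-file so the tempfile name stays defined. One point worth confirming in help joinby: by default joinby keeps only matched observations, so unmatched(master) is used here to retain production records whose product-unit pair has no conversion factor.

```stata
* Hypothetical conversion table (stands in for conver.dta)
clear
input prod unit factor
1 1 90
1 2 50
2 1 25
end
sort prod unit
tempfile conver
save "`conver'"

* Hypothetical production records (stand in for c-q4.dta)
clear
input prod p1a qty
1 1 3
1 2 4
2 1 10
2 3 1
end

* Key variables must have the same name in both files
rename p1a unit

* Pair each record with its conversion factor; keep unmatched master records
joinby prod unit using "`conver'", unmatched(master)

* Total kilograms produced; missing where no factor was found (_merge == 1)
generate kgs = qty * factor
list prod unit qty factor kgs _merge
```

In this toy run the record with unit code 3 finds no factor, so it ends up with _merge equal to 1 and a missing kgs value, which flags it for follow-up instead of silently dropping it.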

The input files for a merge must be sorted by the key variable(s) (key variables are the variables used to match observations between the two files). Since there is a unique conversion factor for each product-unit combination, both the product variable and the unit variable are key variables. The CONVER.DTA file is already sorted by prod and unit. We must sort the working file in memory the same way, keeping in mind that its unit variable is named p1a, not unit. To sort the cases:

1. From the Data menu select
   Sort
   Ascending data
   The Sort - Sort data dialog box will open.
2. In the Variables: box select prod and p1a.
3. Click on the “copy” icon and then click on OK.
4. Switch to the do-file editor and paste the command.

The Stata command is:

sort prod p1a
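The matching step depends on each product-unit combination in the conversion table having exactly one conversion factor. That can be checked before merging with Stata's isid command, which stops with an error if the listed variables do not uniquely identify the observations. The sketch below uses a small invented table in place of conver.dta (all values are hypothetical); on the real file the same commands would follow opening it.

```stata
* Hypothetical conversion table standing in for conver.dta
clear
input prod unit factor
1 1 90
1 2 50
2 1 25
end

* Stops with an error if any prod-unit pair appears more than once
isid prod unit

* Summarizes how many observations share the same prod-unit values
duplicates report prod unit

* Sort by the key variables and show the resulting sort order
sort prod unit
local keys : sortedby
display "Sorted by: `keys'"
```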

Let’s look at the two variables using the tab1 command. We can type in the Command window:

tab1 prod p1a

There are 1,693 cases, covering many products. The tabulation of p1a shows two values that have no labels (0 and 1), and only 1,670 cases contain a value for p1a, 23 fewer than the total. These are possible data problems: we would expect to see a value in p1a for every crop that was harvested. How would you determine whether there are missing data in the p1a variable? If possible, corrections should be made before proceeding further.
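One way to answer the question above is sketched here on a small invented dataset; the codes and the value-label name unitlbl are hypothetical, chosen so that, as in the text, codes 0 and 1 carry no label. count with the missing() function gives the number of cases with no unit, the missing option makes tabulate show those cases as a separate row, and list displays the records that are missing or use an unlabeled code so they can be inspected.

```stata
* Hypothetical production data; "." marks a missing unit code
clear
input prod p1a
1 2
1 .
2 0
2 1
3 2
end

* Hypothetical value label: codes 0 and 1 are deliberately left unlabeled
label define unitlbl 2 "bag"
label values p1a unitlbl

* Number of cases with no unit recorded
count if missing(p1a)

* Tabulation that includes a row for missing values
tabulate p1a, missing

* Records that are missing or carry an unlabeled code
list prod p1a if missing(p1a) | inlist(p1a, 0, 1), noobs
```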

We cannot merge the two files unless the variables that

