10.08.2013 Views

Introduction to Stata 8


10.6. Combining files [U] 25

Appending files [R] append

To combine the information from two files with the same variables, but different persons:

    // c:\dokumenter\proj1\gen.filab.do
    use c:\dokumenter\proj1\fila.dta , clear
    append using c:\dokumenter\proj1\filb.dta
    save c:\dokumenter\proj1\filab.dta
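To try append without touching real project files, the following minimal sketch builds two toy data sets in temporary files. The variable sex and all values are made up for illustration; only lbnr comes from the text above.

```stata
version 8
// Toy version of fila: two persons (values are hypothetical)
clear
input lbnr sex
1 1
2 2
end
tempfile fila
save `fila'

// Toy version of filb: same variables, two other persons
clear
input lbnr sex
3 1
4 2
end
tempfile filb
save `filb'

// Stack filb below fila; the result has one observation per person
use `fila', clear
append using `filb'
list
```

The temporary files disappear when the do-file ends, so the sketch leaves nothing behind in the project folder.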

Merging files [R] merge

To combine the information from two files with different information about the same persons:

    // c:\dokumenter\proj1\gen.filab.do
    use c:\dokumenter\proj1\fila.dta , clear
    merge lbnr using c:\dokumenter\proj1\filb.dta
    save c:\dokumenter\proj1\filab.dta

Both files must be sorted beforehand by the matching key (lbnr in the example above), and the matching key must have the same name in both data sets. Apart from the matching key, the variable names should be different; where a name occurs in both files, merge by default keeps the value from the file in memory (fila). Below, A and B symbolize the variable sets in the input files, and numbers represent the matching key. Missing information is shown by . (period):

    fila    filb    filab    _merge
    1A      1B      1AB      3
    2A              2A.      1
            3B      3.B      2
    4A1     4B      4A1B     3
    4A2             4A2B     3
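The schematic above can be reproduced with toy data. This is a sketch only: the variables a and b stand in for the variable sets A and B, and all values are invented. It uses the Stata 8 merge syntax shown in this section; version 8 is stated so that later Stata releases interpret the commands the same way.

```stata
version 8
// Toy fila: variable set A, with lbnr 4 occurring twice
clear
input lbnr a
1 10
2 20
4 41
4 42
end
sort lbnr
tempfile fila
save `fila'

// Toy filb: variable set B
clear
input lbnr b
1 100
3 300
4 400
end
sort lbnr
tempfile filb
save `filb'

// Both files are sorted by the matching key before merging
use `fila', clear
merge lbnr using `filb'
sort lbnr a
list lbnr a b _merge
```

The listing should show five observations, with b missing for lbnr 2, a missing for lbnr 3, and the single filb record for lbnr 4 attached to both fila records.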

Stata creates the variable _merge, which takes the value 1 if only data set 1 (fila) contributes, 2 if only data set 2 (filb) contributes, and 3 if both sets contribute. Check for mismatches by:

    tab1 _merge
    list lbnr _merge if _merge < 3
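In a do-file it can be convenient to count the mismatches rather than read them off a table. The sketch below types in a hypothetical merge result directly (the values mirror the schematic above) so that it runs on its own. Keeping only matched persons is an assumption made for the example, not a general recommendation.

```stata
version 8
// Hypothetical merge result, typed in for illustration
clear
input lbnr _merge
1 3
2 1
3 2
4 3
4 3
end

// Count and show observations where only one file contributed
count if _merge < 3
display "Unmatched observations: " r(N)
list lbnr _merge if _merge < 3

// One possible follow-up: keep matched persons only, then drop _merge,
// since a later merge would otherwise find _merge already defined
keep if _merge == 3
drop _merge
```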

For lbnr 4 there were two observations in fila, but only one in filb. The result was two observations, with the information from filb assigned to both of them. This makes it possible to distribute information, e.g. about doctors, to each of their patients, if that is what you want.

But what if the duplicate lbnr 4 was an error? To check for duplicate id's before merging, sort and compare each observation with the previous one:

    sort lbnr
    list lbnr if lbnr==lbnr[_n-1]
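A small sketch of this check on invented data follows. The variable copies is a hypothetical helper that records how many observations share each lbnr, which makes it easy to list every copy instead of only the second and later ones.

```stata
version 8
// Unsorted toy data in which lbnr 4 occurs twice (values are made up)
clear
input lbnr a
4 41
1 10
4 42
2 20
end

// The comparison with the previous observation only works after sorting
sort lbnr
list lbnr if lbnr == lbnr[_n-1]

// Hypothetical helper: number of observations per lbnr
by lbnr: generate copies = _N
list lbnr a if copies > 1
count if copies > 1
```

Note that the comparison with lbnr[_n-1] flags only the repeated observation, whereas copies > 1 flags both records for lbnr 4.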

Another way to check for and list observations with duplicate id's is:

    duplicates report lbnr
    duplicates list lbnr
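If the check should run inside a do-file, two related commands may help, assuming your Stata version provides them: duplicates tag marks the duplicated observations in a new variable, and isid exits with an error when the key is not unique. The data and the variable name dup below are made up for illustration.

```stata
version 8
// Toy data with a duplicated lbnr (values are hypothetical)
clear
input lbnr a
1 10
2 20
4 41
4 42
end

duplicates report lbnr

// dup holds the number of surplus copies for each observation's lbnr
duplicates tag lbnr, generate(dup)
list lbnr a if dup > 0

// capture keeps the do-file running; _rc is nonzero if lbnr is not unique
capture isid lbnr
display "isid return code: " _rc
```

Without capture, isid would stop the do-file at this point, which may be exactly what is wanted just before a merge that assumes one record per person.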

merge is a lot more flexible than described here; see [R] merge.<br />
