07.02.2014 Views

Introduction to Stata 8 - (GRIPS

10.6. Combining files [U] 25<br />

Appending files<br />

[R] append<br />

To combine the information from two files with the same variables, but different persons:<br />

// c:\dokumenter\proj1\gen.filab.do<br />

use c:\dokumenter\proj1\fila.dta , clear<br />

append using c:\dokumenter\proj1\filb.dta<br />

save c:\dokumenter\proj1\filab.dta<br />
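After appending, the combined file no longer shows which file each person came from. A minimal sketch of one way to keep track, assuming neither file already contains a variable called source (the name is hypothetical, chosen only for this illustration) and that overwriting an existing filab.dta is acceptable:

```stata
// Sketch only: mark the origin of each observation before and after append.
// "source" is an assumed variable name, not part of the original example.
use c:\dokumenter\proj1\fila.dta , clear
generate byte source = 1                  // observations from fila
append using c:\dokumenter\proj1\filb.dta
replace source = 2 if source == .         // appended observations come from filb
tabulate source                           // number of observations per input file
save c:\dokumenter\proj1\filab.dta , replace
```

The replace option on save is an addition; without it Stata refuses to overwrite an existing file.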

Merging files<br />

[R] merge<br />

To combine the information from two files with different information about the same persons:<br />

// c:\dokumenter\proj1\gen.filab.do<br />

use c:\dokumenter\proj1\fila.dta , clear<br />

merge lbnr using c:\dokumenter\proj1\filb.dta<br />

save c:\dokumenter\proj1\filab.dta<br />
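The merge syntax shown here relies on both files already being sorted by the key. A minimal sketch of one way to prepare them, assuming it is acceptable to overwrite filb.dta with a sorted copy and to replace an existing filab.dta (both are assumptions, not part of the original example):

```stata
// Sketch only: sort both files by lbnr before merging.
// Step 1: sort filb and save it sorted (overwrites filb.dta; an assumption).
use c:\dokumenter\proj1\filb.dta , clear
sort lbnr
save c:\dokumenter\proj1\filb.dta , replace

// Step 2: sort fila in memory, then merge as in the example above.
use c:\dokumenter\proj1\fila.dta , clear
sort lbnr
merge lbnr using c:\dokumenter\proj1\filb.dta
save c:\dokumenter\proj1\filab.dta , replace
```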

Both files must be sorted beforehand by the matching key (lbnr in the example above), and<br />

the matching key must have the same name in both data sets. Apart from the matching key the<br />

variable names should be different. Below A and B symbolize the variable set in the input<br />

files, and numbers represent the matching key. Missing information is shown by . (period):<br />

fila   filb   filab   _merge<br />
1A     1B     1AB     3<br />
2A            2A.     1<br />
       3B     3.B     2<br />
4A1    4B     4A1B    3<br />
4A2           4A2B    3<br />

Stata creates the variable _merge, which takes the value 1 if only data set 1 (fila)<br />

contributes, 2 if only data set 2 (filb) contributes, and 3 if both sets contribute. Check for<br />

mismatches by:<br />

tab1 _merge<br />

list lbnr _merge if _merge < 3<br />
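Once the mismatches have been inspected, a common follow-up is to decide what to keep. The sketch below assumes you want only persons found in both files; that is an assumption about the analysis, not a general rule. Dropping _merge afterwards is useful because a later merge stops with an error if _merge already exists.

```stata
// Sketch only: count mismatches, keep matched observations, tidy up.
count if _merge < 3        // number of observations found in only one file
keep if _merge == 3        // assumption: only matched persons are wanted
drop _merge                // lets a later merge create _merge again
```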

For lbnr 4 there were two observations in fila, but only one in filb. The result was two<br />

observations with the information from filb assigned to both of them. This lets you<br />

distribute information, e.g. about doctors, to each of their patients, if that is what you desire.
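The behaviour described above can be tried on toy data. The sketch below builds two small data sets that mirror fila and filb in the table; the string variables a and b are hypothetical stand-ins for the variable sets A and B. It must be run as one do-file, because the temporary file name is held in a local macro.

```stata
// Sketch only: toy versions of fila and filb, merged by lbnr.
// Variables a and b are assumed names standing in for variable sets A and B.
clear
input lbnr str2 b
1 "B"
3 "B"
4 "B"
end
sort lbnr
tempfile filb
save `filb'

clear
input lbnr str2 a
1 "A"
2 "A"
4 "A1"
4 "A2"
end
sort lbnr
merge lbnr using `filb'
sort lbnr a
list lbnr a b _merge       // compare with the filab and _merge columns above
```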

But what if the duplicate lbnr 4 was an error? To check for duplicate IDs before merging,

sort and compare with the previous observation:<br />

sort lbnr<br />

list lbnr if lbnr==lbnr[_n-1]<br />

Another way to check for and list observations with duplicate IDs is:

duplicates report lbnr<br />

duplicates list lbnr<br />
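If duplicates should never occur, the check can be made to stop the do-file instead of only printing a report. A minimal sketch, assuming the duplicates tag subcommand and the isid command are available in your Stata version (they are in current releases; check your own installation); dup is a hypothetical variable name:

```stata
// Sketch only: flag duplicate ids, then require lbnr to be unique.
use c:\dokumenter\proj1\fila.dta , clear
duplicates tag lbnr , generate(dup)   // dup = number of other obs with the same lbnr
list lbnr if dup > 0                  // show the offending ids, if any
drop dup
isid lbnr                             // stops with an error unless lbnr is unique
```

Because isid raises an error when the key is not unique, a do-file halts at this line before any merge is attempted.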

merge is a lot more flexible than described here; see [R] merge.<br />
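Note that the syntax in this section is the one used by Stata 8. From Stata 11 onwards merge asks you to state the type of match explicitly, and the files no longer need to be sorted beforehand. A minimal sketch of the same doctors-to-patients style merge, assuming Stata 11 or later and assuming lbnr is unique in filb but may repeat in fila:

```stata
// Sketch only: newer merge syntax (Stata 11 or later; will not run in Stata 8).
// m:1 = many observations per lbnr in fila, exactly one per lbnr in filb.
use c:\dokumenter\proj1\fila.dta , clear
merge m:1 lbnr using c:\dokumenter\proj1\filb.dta
tabulate _merge
```

With m:1, Stata stops with an error if lbnr turns out not to be unique in filb, so an unexpected duplicate in that file is caught at merge time.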
