Introduction to Stata 8 - (GRIPS
10.6. Combining files [U] 25<br />
Appending files<br />
[R] append<br />
To combine the information from two files with the same variables, but different persons:<br />
// c:\dokumenter\proj1\gen.filab.do<br />
use c:\dokumenter\proj1\fila.dta , clear<br />
append using c:\dokumenter\proj1\filb.dta<br />
save c:\dokumenter\proj1\filab.dta<br />
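A minimal sketch of the append step on toy data, in case it helps to try it without real files. The values, the variable age and the marker variable source are assumptions made for illustration; they are not part of the original files. The sketch uses temporary files, so it should be run as one do-file.

```stata
// Toy stand-ins for fila and filb (hypothetical values)
clear
input lbnr age
3 47
4 29
end
generate byte source = 2      // marks rows that come from the second file
tempfile filb
save "`filb'"

clear
input lbnr age
1 34
2 51
end
generate byte source = 1      // marks rows that come from the first file

append using "`filb'"         // stacks the filb rows below the rows in memory
list lbnr age source
```

A marker such as source is optional, but after appending it is the only way to tell which file an observation came from.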
Merging files<br />
[R] merge<br />
To combine the information from two files with different information about the same persons:<br />
// c:\dokumenter\proj1\gen.filab.do<br />
use c:\dokumenter\proj1\fila.dta , clear<br />
merge lbnr using c:\dokumenter\proj1\filb.dta<br />
save c:\dokumenter\proj1\filab.dta<br />
Both files must be sorted beforehand by the matching key (lbnr in the example above), and<br />
the matching key must have the same name in both data sets. Apart from the matching key the<br />
variable names should be different. Below A and B symbolize the variable set in the input<br />
files, and numbers represent the matching key. Missing information is shown by . (period):<br />
fila: 1A 2A 4A1 4A2<br />
filb: 1B 3B 4B<br />
Result of the merge:<br />
filab _merge<br />
1AB 3<br />
2A. 1<br />
3.B 2<br />
4A1B 3<br />
4A2B 3<br />
Stata creates the variable _merge which takes the value 1 if only data set 1 (fila)<br />
contributes, 2 if only data set 2 (filb) contributes, and 3 if both sets contribute. Check for<br />
mismatches by:<br />
tab1 _merge<br />
list lbnr _merge if _merge < 3<br />
For lbnr 4 there were two observations in fila, but only one in filb. The result was two<br />
observations with the information from filb assigned to both of them. This lets you<br />
distribute information, e.g. about doctors, to each of their patients, if that is what you desire.<br />
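The schematic example can be reproduced on toy data. This is a sketch only: the variables a and b and all values are hypothetical stand-ins for the variable sets A and B, and the merge command uses the Stata 8 syntax shown above. Run it as one do-file so that the temporary files persist.

```stata
// Toy version of fila: lbnr 4 occurs twice (hypothetical values)
clear
input lbnr a
1 10
2 20
4 41
4 42
end
sort lbnr                      // the matching key must be sorted before saving
tempfile fila
save "`fila'"

// Toy version of filb: one observation per lbnr
clear
input lbnr b
1 100
3 300
4 400
end
sort lbnr
tempfile filb
save "`filb'"

use "`fila'", clear
merge lbnr using "`filb'"      // Stata 8 syntax: merge key using file
sort lbnr a
list lbnr a b _merge
tab1 _merge
```

Under these assumptions the listing should show five observations: lbnr 2 with b missing (_merge 1), lbnr 3 with a missing (_merge 2), and both lbnr 4 observations carrying the same value of b (_merge 3).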
But what if the duplicate lbnr 4 was an error? To check for duplicate id's before merging,<br />
sort and compare with the previous observation:<br />
sort lbnr<br />
list lbnr if lbnr==lbnr[_n-1]<br />
Another way to check for and list observations with duplicate id's is:<br />
duplicates report lbnr<br />
duplicates list lbnr<br />
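As a sketch, both checks can be tried on a toy data set in which lbnr 4 occurs twice. The values and the helper variable copies are assumptions made for illustration.

```stata
// Toy data with one duplicated id (hypothetical values)
clear
input lbnr a
1 10
2 20
4 41
4 42
end

sort lbnr
list lbnr if lbnr == lbnr[_n-1]    // lists the second lbnr 4 observation

duplicates report lbnr             // frequency table of copies per id
duplicates list lbnr               // lists all observations with a repeated id

// Optional: store the number of observations per id for later use
bysort lbnr: generate copies = _N
list lbnr a if copies > 1
```

The comparison with lbnr[_n-1] flags only the second and later copies of an id, while copies > 1 flags every observation that shares its id with another one.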
merge is a lot more flexible than described here; see [R] merge.<br />