11.07.2015 Views

[U] User's Guide

[U] User's Guide

[U] User's Guide

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

156 [ U ] 13 Functions and expressions. sort patient time. by patient: generate lasttime=time[_N]. by patient: generate lastvital=vital[_N]. by patient: drop if _n!=113.8 Indicator values for levels of factor variablesStata’s factor-variable features let us access virtual indicator variables for categorical variables andtheir interactions; see [U] 11.4.3 Factor variables and [U] 25 Working with categorical data andfactor variables. We can use those virtual indicator variables in expressions just as though the virtualvariables existed in our data. If you have not read about factor-variable varlists in [U] 11.4.3 Factorvariables, do so now.If group is a categorical variable taking on the value 1, 2, or 3, consider the expression. generate group1 = 1.groupWe have taken the virtual indicator variable that is 1 when group = 1 and 0 when group ≠ 1and made it into a real variable—group1. That is strictly true only if group is never missing. Ifgroup can be missing, we need to add that 1.group is missing when group is missing.These virtual variables extend to interactions. If we also have a variable, sex, that is 0 for malesand 1 for females, then. generate sex0grp2 = 0.sex#2.groupcreates the variable sex0grp2, which is 1 when sex = 0 and group = 2, . (missing) when sex orgroup is missing, and 0 otherwise.Virtual indicator variables can be used in any expression, including if expressions.13.9 Time-series operatorsTime-series operators allow you to refer to the lag of gnp by typing L.gnp, the second lag bytyping L2.gnp, etc. There are also operators for lead (F), difference D, and seasonal difference S.Time-series operators can be used with varlists and with expressions. See [U] 11.4.4 Time-seriesvarlists if you have not read it already. This section has to do with using time-series operators inexpressions such as with generate. You do not have to create new variables; you can use thetime-series operated variables directly.13.9.1 Generating lags, leads, and differencesIn a time-series context, referring to L2.gnp is better than referring to gnp[ n-2] because theremight be missing observations. Pretend that observation 4 contains data for t = 25 and observation5 data for t = 27. L2.gnp will still produce correct answers; L2.gnp for observation 5 will be thevalue from observation 4 because the time-series operators look at t to find the relevant observation.The more mechanical gnp[ n-2] just goes 2 observations back, which, here, would not produce thedesired result.This same idea holds for differences. In our example, D.gnp will produce a missing value inobservation 5 (t = 27) because there is no data recorded for t = 26, and therefore there is no firstdifference for t = 27.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!