30.12.2014 Views

Solutions to the practice problems. - UCLA Biostatistics

Solutions to the practice problems. - UCLA Biostatistics

Solutions to the practice problems. - UCLA Biostatistics

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Propensity scores can also be used <strong>to</strong> deal with missing data. Instead of creating a logistic model for “treatment”<br />

one creates a logistic model for “missingness” for each variable that has missing values based on <strong>the</strong><br />

o<strong>the</strong>r available variables. One can <strong>the</strong>n use <strong>the</strong> propensity scores ei<strong>the</strong>r <strong>to</strong> up weight points that are similar<br />

<strong>to</strong> <strong>the</strong> observations that had missing values or else use <strong>the</strong> propensity scores <strong>to</strong> perform imputation, using<br />

subjects with similar propensity scores <strong>to</strong> those with missing values <strong>to</strong> do <strong>the</strong> filling in.<br />

(3) Survival Analysis Example: The numbers below represent <strong>the</strong> survival times in years for two groups<br />

with a rare disease. The values with (+) are censored. Use <strong>the</strong> values <strong>to</strong> answer <strong>the</strong> following questions.<br />

(Note: (You should know how <strong>to</strong> do <strong>the</strong> calculations in this problem manually but it’s much easier in STATA<br />

or SAS. In STATA <strong>the</strong> relevant commands are under <strong>the</strong> “sts” header. You use “sts set” <strong>to</strong> tell STATA what<br />

your time and censoring variables are. You use “sts graph, by(group)” <strong>to</strong> get survival curves, separated by<br />

groups if you like. You use “sts test time group, logrank” <strong>to</strong> obtain <strong>the</strong> log rank test. I would never make<br />

<strong>the</strong> calculations this messy on <strong>the</strong> exam–I might simply give you <strong>the</strong> tables of calculations like <strong>the</strong> one on<br />

<strong>the</strong> class handout and ask you <strong>to</strong> explain how one of <strong>the</strong> values was computed....)<br />

Group 0: 1.33+, 2.21, 2.80+, 3.92. 5.44+, 9.99, 12.01, 12.07, 17.31, 28.65<br />

Group 1: 0.122+, 0.43, .64, 1.58, 2.02, 3.08, 3.62+, 4.33, 5.52+, 11.86<br />

(a) and (b) Show how <strong>to</strong> find <strong>the</strong> Kaplan-Meier estimates of <strong>the</strong> survival curves for <strong>the</strong>se two groups and<br />

sketch <strong>the</strong> first few values. Does it appear that one group does better than <strong>the</strong> o<strong>the</strong>r (The complete<br />

curves are shown in <strong>the</strong> accompanying graphics file for your reference.) Specifically, ive <strong>the</strong> estimated 3-year<br />

survival probabilities, S 1 (3) and S 2 (3) based on <strong>the</strong>se data along with <strong>the</strong> corresponding confidence intervals.<br />

Solution: To compute <strong>the</strong> Kaplan-Meier curve we need <strong>to</strong> order <strong>the</strong> observed event times (deaths, not<br />

censoring times) and count <strong>the</strong> number of people who die at those event times and who are at risk at each<br />

of those event times. If we let d i be <strong>the</strong> number who die at event time t i and Y i be <strong>the</strong> number at risk at<br />

time t i <strong>the</strong>n <strong>the</strong> K-M estima<strong>to</strong>r of <strong>the</strong> survival function is<br />

The corresponding variance is<br />

Ŝ(t) = Π ti≤t(1 − d i<br />

Y i<br />

)<br />

Ŝ 2 i (t) ∑ t i≤t<br />

d i<br />

Y i (Y i − d i )<br />

For group 0 we have 7 uncensored event times and no two people die at <strong>the</strong> same time so we have d i = 1 in all<br />

cases. To get <strong>the</strong> number at risk we need <strong>to</strong> account for <strong>the</strong> censoring–only people who have not previously<br />

died or been censored by time t i get counted as at risk for <strong>the</strong> subsequent interval. I show <strong>the</strong> first few calculations<br />

for both groups. Below we are interested in 3-year survival so we really only need <strong>the</strong> calculations<br />

that occur before t = 3. For group 0, <strong>the</strong> first event time is t 1 = 2.21. One person is censored before that<br />

time, so 9 out of 10 subjects are still in study and we have Y 1 = 9. The resulting estimate of <strong>the</strong> survival time<br />

is Ŝ(t 1) = 1−1/9 = 8/9 = .889. The corresponding estimate of <strong>the</strong> variance is (.889) 2 (1/(9∗(9−1))) = .011.<br />

The second observed death occurs at 3.92 which is already past t = 3. This means that our 3-year estimate<br />

of <strong>the</strong> survival probability for group 1 is .889. Let’s do <strong>the</strong> next calculation anyway. There is ano<strong>the</strong>r person<br />

censored in <strong>the</strong> interim so Y i = 7. Our estimated survival value is Ŝ(t 2) = (1 − 1/9)(1 − 1/7) = .762. The<br />

corresponding variance is (.762) 2 (1/72 + 1/42) = .022. To plot <strong>the</strong>se we draw a “step function” which is flat<br />

with Ŝ(t) = 1 for times prior <strong>to</strong> t 1 = 2.21 and <strong>the</strong>n drops down <strong>to</strong> .889 between t 1 = 2.21 and t 2 = 3.92 and<br />

<strong>the</strong>n <strong>to</strong> .762 until t 3 = 9.99 and so on.<br />

For group 1 <strong>the</strong>re are four events before time t = 3 and only one censoring time, which occurs before <strong>the</strong><br />

first event. Thus our numbers at risk our Y 1 = 9, Y 2 = 8, Y 3 = 7 and Y 4 = 6. The 3-year estimate of <strong>the</strong><br />

4

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!