Statistical Considerations


International Study of Public Health and Air Pollution in Asia

Statistical Considerations

Hung‐Mo Lin Sc.D.

Mount Sinai School of Medicine

New York, USA


Objectives

• Share experience (Primary)

– To discuss the important role of statisticians

– To encourage statistical scientists to pursue multidisciplinary collaboration in the public health arena

– To outline basic statistical considerations in an international air pollution and health effects project

• Report findings (Secondary)


The New York Times Headline

• On August 6, 2009 front page:

For Today’s Graduate, Just One Word: Statistics


In the article

• “We are rapidly entering a world where everything can be monitored and measured.”

• “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”

Exciting time to be a statistician!!


The PAPA Project

• Public Health and Air Pollution in Asia

• New partnership within the Clean Air Initiative for Asian Cities

• Carried out by the Health Effects Institute (HEI)

• Supported by government, industry, Asian officials, and others

• Shanghai, Hong Kong, Wuhan, and Bangkok


Air Pollution and Health

• Increased daily mortality is associated with air pollution

– The PM10 (particulate matter ≤ 10 μm in aerodynamic diameter) effect is generally in the range of 0.2–2% excess deaths per 10 μg/m³ increment

• Air pollution effects are generally similar across North American and Western European regions


Why Asia

• Expanding the current science base to inform Asian air quality regulatory decisions

• Western research is relevant to Asian populations; however, extrapolation poses challenges

– Population characteristics

– Pollution sources and mixes

– Climate

• Are observed risks similar? Greater? Smaller?


• Clean data and rigorous analytic methods are the key to credibility

• Credibility is the key to policy making and change

• Need to start from common ground before allowing individual cities to deviate

– Study Design


Approach

• In order to

– provide a basis for combining estimates

– isolate important city‐specific independent factors

• A common protocol for study design, data management and statistical analysis

• A management framework to conduct the coordinated analysis


Structure

[Diagram: Data Collection; Analysis Method]


Role of Statisticians

• Initial kick‐off meeting

• Annual meeting

• Professional meetings in Asia and the USA

• For every meeting,

– A statistician is REQUIRED to be present

– All the technical details are discussed

• data management, statistical methods, report writing

– Decisions are approved by the scientific oversight committee


PART II

Statistical Considerations


Overview

• Model Choice

• Model Fitting Process

• Smoothing

• Sensitivity Analysis


Wuhan Study

• To investigate whether the effect of air pollution on daily mortality is modified by season in Wuhan, China, using data from 2001–2004

• Wuhan has extremely cold and hot weather and is known as an “oven city”


Mortality


PM10 Data


Time Series Data

Confounders


Estimation of PM10 Effect

Modeling Choices

• Linear Model

– Death = α + β PM10

• Additive Model (fully, non-, or semi-parametric)

– Death = α + β PM10 + s(Temp)

• Generalized Linear Model (GLM)

– g(Death) = α + β PM10

• Generalized Additive Model (GAM)

– g(Death) = α + β PM10 + s(Temp)
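The four model forms can also be written with the daily death count made explicit. This is a sketch of one common reading, assuming the link function g acts on the expected count and s(·) is a smooth function. The symbol Y_t (deaths on day t) and the subscript t are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\text{Linear model:}\quad   & E[Y_t] = \alpha + \beta\,\mathrm{PM10}_t \\
\text{Additive model:}\quad & E[Y_t] = \alpha + \beta\,\mathrm{PM10}_t + s(\mathrm{Temp}_t) \\
\text{GLM:}\quad            & g\bigl(E[Y_t]\bigr) = \alpha + \beta\,\mathrm{PM10}_t \\
\text{GAM:}\quad            & g\bigl(E[Y_t]\bigr) = \alpha + \beta\,\mathrm{PM10}_t + s(\mathrm{Temp}_t)
\end{align*}
```

With a log link, g(·) = log(·), the last two lines match the Log(Death) form used for the Poisson models in this talk.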


Generalized Additive Model

• Widely used in environmental epidemiology in time series studies

• Identify and represent non‐linear effects of confounding variables, such as trend, seasonality, and weather

• Smoothing function

• Alternative to considering polynomial terms or searching for the appropriate transformation


Base Model (Poisson)

• Log(Death) = α + γ(Weekday) + s(Day) + s(Temp) + s(Humidity) …

– Develop the best model without PM10

– Control for day of the week (and others)

– s( ) is a smoothing function, i.e., the relationships between day, temperature, humidity and log(death) can take any form

– Residuals of the time series data should have small partial autocorrelation
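The residual check in the last bullet can be sketched in a few lines. This is a minimal illustration, assuming the model residuals are already available as a list of numbers. The function names are hypothetical, the Durbin-Levinson recursion is one standard way to get the sample PACF, and the ±1.96/√n band is a common rule of thumb rather than something stated on the slides.

```python
import math

def autocorrelation(series, max_lag):
    """Sample autocorrelations rho[0..max_lag] of a list of residuals."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def partial_autocorrelation(series, max_lag):
    """Sample PACF at lags 1..max_lag via the Durbin-Levinson recursion."""
    rho = autocorrelation(series, max_lag)
    phi_prev = []   # coefficients of the AR(k-1) fit
    pacf = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

def lags_outside_band(pacf_values, n):
    """Lags whose PACF exceeds the approximate 95% band of +/- 1.96 / sqrt(n)."""
    band = 1.96 / math.sqrt(n)
    return [lag for lag, v in enumerate(pacf_values, start=1) if abs(v) > band]
```

If `lags_outside_band` returns many lags for the base model residuals, the smooth term for time may need more degrees of freedom.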


Partial Autocorrelation Function

[Figure: PACF (vertical axis, -0.10 to 0.10) against lag (horizontal axis, 0 to 30). Panels: (a) All Natural, (b) CVD, (c) Stroke, (d) CARD]


PM10 Model

• Log(Death) = α + β PM10 + γ(Weekday) + s(Day) + s(Temp) + s(Humid) …

– Assume the PM10 effect is parametric (linear) on the log(Death) scale

– Best base model + PM10 effect

– Is a linear relationship reasonable?

– Is the effect confounded by other pollutants (e.g., O3, SO2)?
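To make the role of β concrete, the sketch below fits a stripped-down version of the model, Log(Death) = α + β PM10, and converts β into percent excess risk per 10 μg/m³. It rests on several assumptions: the weekday and smooth terms are left out for brevity, the fit uses a plain Newton-Raphson update for the Poisson likelihood, the data in the usage note are synthetic, and all function names are hypothetical.

```python
import math

def fit_poisson_loglinear(x, y, iterations=50, tol=1e-12):
    """Fit log(E[y]) = alpha + beta * x by Newton-Raphson (Poisson likelihood)."""
    alpha = math.log(sum(y) / len(y))  # start from the intercept-only fit
    beta = 0.0
    for _ in range(iterations):
        mu = [math.exp(alpha + beta * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # 2 x 2 Fisher information matrix
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system in closed form
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta

def excess_risk_percent(beta, increment=10.0):
    """Percent increase in expected deaths per `increment` units of PM10."""
    return 100.0 * (math.exp(beta * increment) - 1.0)
```

For example, with synthetic PM10 values `x = [40, 80, 120, 160, 200, 240]` and counts generated as `exp(4.0 + 0.0005 * x)`, the fit recovers β ≈ 0.0005, and `excess_risk_percent(0.0005)` is about 0.5% per 10 μg/m³, which sits inside the 0.2–2% range quoted earlier in the talk.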


Exposure Response Curve


Smoothing

• Time trends and weather-related variables are confounders; their own effects on mortality are not of direct interest

• Nonparametric techniques for fitting a regression function in a flexible, data‐defined manner

• Appropriate when the data have an unusual structure


Spline

• Smoothly joined piecewise polynomial segments

• Knots: points where the segments join together

• Issue: where and how many knots?

– Too many knots: too wiggly

– Too few knots: too smooth
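The effect of knots can be seen in the simplest case, a piecewise-linear spline built from a truncated power basis. This is only an illustrative sketch: the analyses in this talk would use smoother (for example cubic) splines, and the function names and the temperature knots in the usage note are hypothetical.

```python
def linear_spline_basis(x, knots):
    """Basis row [1, x, (x - k1)+, (x - k2)+, ...] for a piecewise-linear spline."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def evaluate_spline(x, coefs, knots):
    """Spline value at x: the slope changes by coefs[2 + i] at knots[i]."""
    return sum(c * b for c, b in zip(coefs, linear_spline_basis(x, knots)))

def design_matrix(xs, knots):
    """One basis row per observation; each extra knot adds one column."""
    return [linear_spline_basis(x, knots) for x in xs]
```

For instance, with knots at 10 and 25 (say, degrees Celsius), `linear_spline_basis(30, [10, 25])` gives `[1.0, 30, 20, 5]`. Every added knot adds a column and therefore a coefficient to estimate, which is why too many knots give a wiggly fit and too few give an overly smooth one.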


Choices of Knots

• Natural spline

– Pre‐specified

– Somewhat arbitrary

• Smoothing spline

– Every unique observation is a knot

– Tends to be too many

• Penalized spline

– Chosen by computer algorithm


Sensitivity analysis

• Model mis-specification:

– Under-fitting, over-fitting, and mis-fitting

• Sensitivity analyses:

– Use different spline methods and numbers of knots

– 6–8 DF / year for time

– 3–4 DF / year for temperature and humidity

• Temporal structure:

– Explore different temporal structures
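One way to organize such a sensitivity analysis is to enumerate the degrees-of-freedom settings up front and refit the model once per setting. The sketch below only builds that grid. It assumes 4 years of data (2001–2004, as stated for the Wuhan study) and takes the per-year DF ranges at face value from the list above; the function and key names are hypothetical.

```python
from itertools import product

YEARS = 4  # 2001-2004, the study period given for Wuhan

def sensitivity_grid(time_df_per_year=(6, 7, 8),
                     weather_df_per_year=(3, 4),
                     years=YEARS):
    """All combinations of DF settings, with totals over the study period."""
    grid = []
    for t, w in product(time_df_per_year, weather_df_per_year):
        grid.append({
            "time_df_per_year": t,
            "weather_df_per_year": w,
            "time_df_total": t * years,
            "weather_df_total": w * years,
        })
    return grid

# Each entry would drive one refit of the PM10 model; a stable PM10
# estimate across entries suggests the result is not an artifact of smoothing.
for setting in sensitivity_grid():
    print(setting)
```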


Different Models


Excess Risk (%) of Mortality


Wuhan – Extreme Weather

[Figure: mean % change in mortality, shown for Low, Normal and High groups, with tick labels 3, 5, 7, 10 and 15 repeated within each group. Panels: (A) Non-Accidental, (B) Cardiovascular, (C) Stroke, (D) Cardiac, (E) Respiratory, (F) Cardiopulmonary]


Reference

• Environ Health Perspect 2008; 116:1172–8.

• Environ Health Perspect 2008; 116:1195–1202.

• Health Effects Institute website (PAPA)

– www.healtheffects.org/international.htm#PAPA


Google

• “I keep saying the sexy job in the next 10 years will be statisticians. …The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades.”

– Hal Varian, Google’s chief economist


Conclusion

• By sharing the experience gained in this study, the speaker wishes to motivate more statistical scientists to pursue multidisciplinary and international collaboration in the public health arena.


[Figure caption: Smoothing spline with cross-validation (solid) and pre-specified knots (dashed)]
