
Introduction to Bayesian Statistics

Alessandra Guglielmi

Politecnico di Milano
Dipartimento di Matematica
Milan, Italy

e-mail: alessandra.guglielmi@polimi.it

1 October 2012



Bayesian learning

Y = sample space = set of all possible datasets;
y: a single dataset
Θ = parameter space = set of all possible parameter values, from which we hope to identify the value that best represents the true population characteristics

Under the Bayesian approach, there are TWO random elements (Y, θ):

π(θ) prior distribution: describes our belief that θ represents the true population characteristics
p(y|θ) likelihood: describes our belief that y would be the outcome if θ is the true parameter value



Bayes’ Theorem

Once we obtain y, we update our beliefs about θ by computing

π(θ|y) posterior distribution: describes our belief that θ is the true value, having observed y

The posterior distribution is computed via Bayes’ Theorem:

π(θ|y) = p(y|θ) π(θ) / ∫_Θ p(y|θ) π(θ) dθ

Recall Bayes’ Theorem for a finite partition.

R Example: binomial_beta.R
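The file binomial_beta.R is not reproduced here; the following is a minimal sketch of what such a conjugate beta-binomial update could look like (the prior parameters and data are illustrative assumptions, not taken from the course file):

# Beta-binomial conjugate update: theta ~ Beta(a, b) prior, y successes in n trials
a <- 1; b <- 1                                # illustrative uniform prior
n <- 10; y <- 3                               # illustrative data
theta <- seq(0, 1, length.out = 500)
prior     <- dbeta(theta, a, b)
posterior <- dbeta(theta, a + y, b + n - y)   # Beta(a + y, b + n - y) by conjugacy
plot(theta, posterior, type = "l", xlab = "theta", ylab = "density")
lines(theta, prior, lty = 2)
legend("topright", legend = c("posterior", "prior"), lty = c(1, 2))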



Bayes’ Theorem

It is a typical scientific approach: the prior belief is UPDATED via the observed data and yields the posterior distribution.

It suggests that scientific inference is based on 2 parts: one depends on the scientist’s subjective opinion and understanding of the phenomenon under study BEFORE an EXPERIMENT is performed; the other depends on the observed data the scientist has obtained from the experiment.



Parameter estimation

It is carried out by summarizing the posterior distribution π(θ|y_1, ..., y_n) through, for example,

the posterior mean E[θ|y_1, ..., y_n]
the posterior variance Var[θ|y_1, ..., y_n]
interval estimates C : P(θ ∈ C|y_1, ..., y_n) ≥ 0.95

Computational methods are needed to obtain these estimates, above all to simulate from the posterior distribution.

MCMC: Markov chain Monte Carlo methods

IDEA: if it is NOT possible to simulate iid random variables from the posterior distribution, one builds a Markov chain whose limiting distribution is the posterior, applying the Ergodic Theorem to approximate posterior integrals.
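To make the MCMC idea concrete (this example is not from the original slides): a random-walk Metropolis sampler targeting the beta posterior of the binomial_beta.R sketch above, whose ergodic average approximates the posterior mean E(θ|y). The proposal scale and iteration counts are illustrative assumptions.

# Random-walk Metropolis targeting the Beta(a + y, b + n - y) posterior
# (same illustrative values as in the beta-binomial sketch above)
set.seed(1)
a <- 1; b <- 1; n <- 10; y <- 3
log_post <- function(theta) dbeta(theta, a + y, b + n - y, log = TRUE)

n_iter <- 10000
chain <- numeric(n_iter)
chain[1] <- 0.5
for (t in 2:n_iter) {
  prop <- chain[t - 1] + rnorm(1, sd = 0.1)              # random-walk proposal
  log_ratio <- log_post(prop) - log_post(chain[t - 1])   # -Inf outside (0, 1)
  chain[t] <- if (log(runif(1)) < log_ratio) prop else chain[t - 1]
}
mean(chain[-(1:1000)])   # ergodic average, close to E(theta | y) = 4/12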



Prediction of new observations

Observe n tosses of a coin: (y_1, ..., y_n) (y_i = 1 if the coin was H at the i-th toss).

What is the probability that the next toss will be H?

Bayesian prediction: P(Y_{n+1} = 1 | Y_1 = y_1, ..., Y_n = y_n)

Posterior predictive distribution: L(Y_new | Y = y)



Prediction

What is the probability that the next toss will be H?

Likelihood: θ^(Σ y_i) (1 − θ)^(n − Σ y_i); prior π(θ) (here a Beta(α, β) density, as in the beta-binomial example); posterior π(θ|y)

P(Y_{n+1} = 1 | Y_1 = y_1, ..., Y_n = y_n) = ∫_(0,1) P(Y_{n+1} = 1 | θ) π(θ|y) dθ
= ∫_(0,1) θ π(θ|y) dθ = E(θ|y) = (α + Σ y_i)/(α + β + n)

Ex: n = 10, Σ y_i = 3, α = β = 1:

P(Y_11 = 1 | Y_1 = y_1, ..., Y_n = y_n) = E(θ|y) = 4/12 = 1/3 ≠ 1/2 = E(θ)
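A one-line numerical check of this example, using the closed form above:

# Posterior predictive P(Y_{n+1} = 1 | y) under a Beta(alpha, beta) prior
alpha <- 1; beta <- 1; n <- 10; s <- 3   # s = sum of the y_i
(alpha + s) / (alpha + beta + n)         # 0.333... = 1/3
alpha / (alpha + beta)                   # prior mean E(theta) = 1/2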



Bayesian vs non-Bayesian approach

FREQUENTIST approach

parameters are fixed at their true but unknown value
“objective” notion of probability
good large sample properties
estimation: maximizing the likelihood
confidence intervals: difficult to interpret
no symmetry in testing hypotheses H_0 and H_1; difficult interpretation of p-values

BAYESIAN approach

parameters are r.v.s with distributions attached to them
subjective notion of probability (prior) combined with data
does not require large sample approximations (inference is exact for any n)
estimation: via summary statistics of the posterior distributions; their computation via simulation-based approaches (MCMC)
credible intervals: NO problems in interpreting them
H_0 and H_1 are symmetric



Bayesian Hierarchical Models

Multilevel data:
results of a test for students in a population of schools in the US
patients within several hospitals
people (or items) within provinces within regions within countries

Two levels:
groups
units within groups

y_ij is the datum for the i-th unit in group j, i = 1, ..., n_j, j = 1, ..., J

(Y_1, Y_2, ..., Y_J), with Y_j = (Y_{1,j}, ..., Y_{n_j,j})



Bayesian Gaussian hierarchical model

Y_{1,j}, ..., Y_{n_j,j} | ϕ_j iid∼ N(ϕ_j, σ²)   (within-group model)
ϕ_1, ..., ϕ_J | (µ, τ²) iid∼ N(µ, τ²)
(µ, τ²) ∼ π   (between-group model)

Population of groups/group-parameters: this allows prediction for a student coming from a new school, selected at random from the population of groups.

The group-specific parameters ϕ_1, ..., ϕ_J are NOT (marginally) independent, since we want to share information between the groups; the dependence is mild (exchangeability).



Bayesian Gaussian hierarchical model

The prior is completed assuming:

1/σ² ∼ gamma(ν_0/2, ν_0 σ_0²/2)   (σ² within-group variance)
1/τ² ∼ gamma(η_0/2, η_0 σ̃_0²/2)   (τ² between-group variance)
µ ∼ N(µ_0, γ_0²)

R Example: Bayesian_hierarchical.R
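The file Bayesian_hierarchical.R is not reproduced here; as a minimal sketch, the following simulates one dataset from the model above (all hyperparameter values and group sizes are illustrative assumptions):

# Simulate from the Gaussian hierarchical model
set.seed(42)
J <- 8
n_j <- sample(5:30, J, replace = TRUE)        # illustrative group sizes

# illustrative hyperparameters
nu0 <- 1; sigma0_sq <- 1                      # prior on 1/sigma^2
eta0 <- 1; sigmatilde0_sq <- 1                # prior on 1/tau^2
mu0 <- 0; gamma0_sq <- 25                     # prior on mu

sigma_sq <- 1 / rgamma(1, nu0 / 2, rate = nu0 * sigma0_sq / 2)
tau_sq   <- 1 / rgamma(1, eta0 / 2, rate = eta0 * sigmatilde0_sq / 2)
mu       <- rnorm(1, mu0, sqrt(gamma0_sq))
phi      <- rnorm(J, mu, sqrt(tau_sq))        # group-specific means
y        <- lapply(1:J, function(j) rnorm(n_j[j], phi[j], sqrt(sigma_sq)))
str(y)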



Borrowing strength

E(ϕ_j | ȳ_j, µ, τ², σ²) = (n_j/σ²)/(n_j/σ² + 1/τ²) · ȳ_j + (1/τ²)/(n_j/σ² + 1/τ²) · µ

a weighted average of ȳ_j (the frequentist estimator of ϕ_j) and µ (the prior mean of ϕ_j).

When n_j is small, i.e. group j gives little info about ϕ_j: E(ϕ_j | ...) ≈ µ. The Bayesian estimate is obtained by borrowing strength from the other groups (through µ).

When τ² is large (heterogeneous groups): E(ϕ_j | ...) ≈ ȳ_j; there is less shrinkage towards µ, relying more on the info in group j.

NO POOLING: τ² = +∞ ⇔ one analysis for each group
COMPLETE POOLING: τ² = 0 ⇔ ϕ_1 = ... = ϕ_J = µ, one single parameter
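A quick numerical illustration of the shrinkage weights (all values are illustrative assumptions):

# Posterior mean of phi_j as a precision-weighted average of ybar_j and mu
shrink <- function(n_j, ybar_j, mu, sigma_sq, tau_sq) {
  w <- (n_j / sigma_sq) / (n_j / sigma_sq + 1 / tau_sq)   # weight on ybar_j
  w * ybar_j + (1 - w) * mu
}
shrink(n_j = 2,   ybar_j = 10, mu = 0, sigma_sq = 1, tau_sq = 0.5)   # pulled toward mu
shrink(n_j = 200, ybar_j = 10, mu = 0, sigma_sq = 1, tau_sq = 0.5)   # close to ybar_j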

