Introduction to Bayesian Statistics - Politecnico di Milano
Introduction to Bayesian Statistics

Alessandra Guglielmi
Politecnico di Milano
Department of Mathematics
Milan, Italy
e-mail: alessandra.guglielmi@polimi.it
1 October 2012
A. Guglielmi Bayesian Statistics 1
Bayesian learning

Y = sample space = set of all possible datasets;
y: a single dataset
Θ = parameter space = set of all possible parameter values,
from which we hope to identify the value that best represents
the true population characteristics

Under the Bayesian approach, there are TWO random elements (Y, θ):

π(θ), the prior distribution: describes our belief that θ
represents the true population characteristics
p(y|θ), the likelihood: describes our belief that y would be the
outcome if θ is the true parameter value
Bayes’ Theorem

Once we obtain y, we update our beliefs about θ by computing
π(θ|y), the posterior distribution: describes our belief that θ is
the true value, having observed y

The posterior distribution is computed via Bayes’ Theorem:

π(θ|y) = p(y|θ) π(θ) / ∫_Θ p(y|θ) π(θ) dθ

Recall Bayes’ Theorem for a finite partition.

R Example: binomial_beta.R
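The referenced script binomial_beta.R is not included here; as an illustration only, the conjugate beta-binomial update it presumably performs can be sketched in Python (the function name and default values are hypothetical, chosen to match the coin example on the next slides):

```python
def beta_binomial_posterior(successes, n, alpha, beta):
    """Conjugate update: a Beta(alpha, beta) prior combined with a
    binomial likelihood gives a Beta(alpha + s, beta + n - s) posterior."""
    a_post = alpha + successes
    b_post = beta + n - successes
    post_mean = a_post / (a_post + b_post)  # E(theta | y)
    return a_post, b_post, post_mean

# Uniform prior Beta(1, 1); 3 heads observed out of 10 tosses
a, b, m = beta_binomial_posterior(3, 10, 1.0, 1.0)
print(a, b, m)  # 4.0 8.0 0.333...
```

Note that no integral needs to be computed: conjugacy gives the normalizing constant in Bayes’ Theorem in closed form.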
Bayes’ Theorem

It is a typical scientific approach:
the prior belief is UPDATED via observed data and yields the
posterior distribution
it suggests that scientific inference is based on 2 parts:
one depends on the scientist’s subjective opinion and
understanding of the phenomenon under study BEFORE
the EXPERIMENT is performed; the other depends on
the observed data the scientist has obtained from the
experiment.
Parameter estimation

Carried out by summarizing the posterior distribution π(θ|y_1, ..., y_n)
through, for example,
the posterior mean E[θ|y_1, ..., y_n]
the posterior variance Var[θ|y_1, ..., y_n]
interval estimates C : P(θ ∈ C|y_1, ..., y_n) ≥ 0.95

Computational methods are needed to obtain these estimates, above all
to simulate from the posterior distribution:
MCMC: Markov chain Monte Carlo methods
IDEA: if it is NOT possible to simulate iid random variables from the
posterior, one constructs a Markov chain whose limiting
distribution is the posterior, and applies the Ergodic Theorem
to approximate posterior integrals.
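The MCMC idea above can be illustrated with a minimal random-walk Metropolis sampler, shown here in Python rather than R. This is a sketch, not the course’s implementation; it targets the Beta(4, 8) posterior from the coin example (3 heads in 10 tosses, uniform prior), and the step size and iteration counts are arbitrary choices:

```python
import math
import random

def log_post(theta, successes=3, n=10, alpha=1.0, beta=1.0):
    """Unnormalized log posterior of the beta-binomial model;
    the normalizing constant cancels in the Metropolis ratio."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return ((alpha + successes - 1) * math.log(theta)
            + (beta + n - successes - 1) * math.log(1.0 - theta))

def metropolis(n_iter=50000, step=0.2, seed=42):
    """Random-walk Metropolis: the chain's limiting distribution is the
    posterior, so ergodic averages approximate posterior integrals."""
    rng = random.Random(seed)
    theta, samples = 0.5, []
    for _ in range(n_iter):
        prop = theta + rng.uniform(-step, step)
        # Accept with probability min(1, post(prop) / post(theta))
        if math.log(rng.random()) < log_post(prop) - log_post(theta):
            theta = prop
        samples.append(theta)
    return samples

draws = metropolis()
burned = draws[5000:]              # discard burn-in
print(sum(burned) / len(burned))   # ergodic average, approx E(theta|y) = 1/3
```

Here the ergodic average of the draws approximates the posterior mean, which is known exactly (1/3) and so serves as a check.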
Prediction of new observations

Observe n tosses of a coin: (y_1, ..., y_n) (y_i = 1 if the coin was
H at the i-th toss)
What is the probability that the next toss will be H?
Bayesian prediction: P(Y_{n+1} = 1 | Y_1 = y_1, ..., Y_n = y_n)
Posterior predictive distribution: L(Y_new | Y = y)
Prediction

What is the probability that the next toss will be H?
Likelihood: θ^{Σ y_i} (1 − θ)^{n − Σ y_i}; prior π(θ); posterior π(θ|y)

P(Y_{n+1} = 1 | Y_1 = y_1, ..., Y_n = y_n) = ∫_(0,1) P(Y_{n+1} = 1 | θ) π(θ|y) dθ
= ∫_(0,1) θ π(θ|y) dθ = E(θ|y) = (α + Σ y_i) / (α + β + n)

Ex: n = 10, Σ y_i = 3, α = β = 1:

P(Y_11 = 1 | Y_1 = y_1, ..., Y_10 = y_10) = E(θ|y) = 4/12 = 1/3 ≠ 1/2 = E(θ)
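The predictive integral above can also be approximated by simulation, which is how it would be handled when no closed form exists. A hedged Python sketch (function name and simulation size are illustrative choices): draw θ from the Beta posterior and average P(Y_{n+1} = 1 | θ) = θ over the draws.

```python
import random

def predictive_mc(successes, n, alpha=1.0, beta=1.0,
                  n_sims=100000, seed=1):
    """Monte Carlo version of the predictive integral:
    draw theta ~ pi(theta|y) = Beta(alpha + s, beta + n - s),
    then average P(Y_{n+1} = 1 | theta) = theta over the draws."""
    rng = random.Random(seed)
    a, b = alpha + successes, beta + n - successes
    draws = [rng.betavariate(a, b) for _ in range(n_sims)]
    return sum(draws) / n_sims

est = predictive_mc(3, 10)  # approximates (1 + 3) / (1 + 1 + 10) = 1/3
print(est)
```

The simulation estimate agrees with the exact value E(θ|y) = 1/3 up to Monte Carlo error.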
Bayesian vs non-Bayesian approach

FREQUENTIST approach
parameters are fixed at their true but unknown value
“objective” notion of probability
good large sample properties
estimation: maximizing the likelihood
confidence intervals: difficult to interpret
no symmetry in testing hypotheses H_0 and H_1, difficult
interpretation of p-values

BAYESIAN approach
parameters are r.v.s with distributions attached to them
subjective notion of probability (prior) combined with data
does not require large sample approximations (inference is
exact for any n)
estimation: via summary statistics of the posterior
distributions; their computation via simulation-based
approaches (MCMC)
credible intervals: NO problems in interpreting them
H_0 and H_1 are symmetric
Bayesian Hierarchical Models

Multilevel data:
results of a test for students in a population of schools in the US
patients within several hospitals
people (or items) within provinces within regions within countries

Two levels:
groups
units within groups

y_ij is the data of the i-th unit in group j, i = 1, ..., n_j,
j = 1, ..., J

(Y_1, Y_2, ..., Y_J), where Y_j = (Y_{1,j}, ..., Y_{n_j,j})
Bayesian Gaussian hierarchical model

Y_{1,j}, ..., Y_{n_j,j} | ϕ_j  iid ~ N(ϕ_j, σ²)      within-group model
ϕ_1, ..., ϕ_J | (µ, τ²)  iid ~ N(µ, τ²)              between-group model
(µ, τ²) ~ π

population of groups/group-parameters: prediction on a student
coming from a new school, selected at random from the
population of groups

group-specific parameters:
ϕ_1, ..., ϕ_J are NOT independent, since we want to share
information between the groups; the dependency is mild
(exchangeability)
Bayesian Gaussian hierarchical model

The prior is completed assuming:

1/σ² ~ gamma(ν_0/2, ν_0 σ_0²/2)      σ² = within-group variance
1/τ² ~ gamma(η_0/2, η_0 σ̃_0²/2)      τ² = between-group variance
µ ~ N(µ_0, γ_0²)

R Example: Bayesian_hierarchical.R
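The referenced script Bayesian_hierarchical.R is not reproduced here; to make the two-level structure concrete, the following hedged Python sketch forward-simulates data from the model (the function name and all numerical values are illustrative, not taken from the course):

```python
import random

def simulate_hierarchical(J=8, n_j=20, mu=50.0, tau=5.0,
                          sigma=10.0, seed=0):
    """Forward simulation of the two-level Gaussian model:
    phi_j ~ N(mu, tau^2) for each group j, then
    y_{ij} ~ N(phi_j, sigma^2) for each unit i within group j."""
    rng = random.Random(seed)
    phi = [rng.gauss(mu, tau) for _ in range(J)]                 # group means
    data = [[rng.gauss(p, sigma) for _ in range(n_j)] for p in phi]
    return phi, data

phi, data = simulate_hierarchical()
group_means = [sum(g) / len(g) for g in data]  # y-bar_j for each group
```

Sampling a new ϕ from N(µ, τ²) and then a new observation from N(ϕ, σ²) is exactly the "student from a new school" prediction described on the previous slide.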
Borrowing strength

E(ϕ_j | ȳ_j, µ, τ², σ²) = [n_j/σ² / (n_j/σ² + 1/τ²)] ȳ_j + [1/τ² / (n_j/σ² + 1/τ²)] µ

a weighted average of ȳ_j, the frequentist estimator of ϕ_j, and µ, the prior mean of ϕ_j.

When n_j is small, i.e. group j gives little info about ϕ_j:
E(ϕ_j | ...) ≈ µ
The Bayesian estimate is obtained by borrowing strength from the
other groups (through µ).

When τ² is large (heterogeneous groups): E(ϕ_j | ...) ≈ ȳ_j; there
is less shrinkage toward µ, relying more on the info in group j.

NO POOLING: τ² = +∞ ⇔ one analysis for each group
COMPLETE POOLING: τ² = 0 ⇔ ϕ_1 = ... = ϕ_J = µ, one single parameter
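The shrinkage formula above is just a precision-weighted average, which a few lines of Python make explicit (function name and the numbers below are illustrative):

```python
def shrinkage_estimate(ybar_j, n_j, mu, sigma2, tau2):
    """Conditional posterior mean of phi_j: a precision-weighted
    average of the group mean ybar_j and the population mean mu."""
    w_data = n_j / sigma2    # precision of ybar_j given phi_j
    w_prior = 1.0 / tau2     # prior precision of phi_j
    return (w_data * ybar_j + w_prior * mu) / (w_data + w_prior)

# Small group: little data, the estimate shrinks almost all the way to mu
print(shrinkage_estimate(ybar_j=70.0, n_j=2, mu=50.0,
                         sigma2=100.0, tau2=1.0))      # about 50.4
# Large group: the data dominate and the estimate stays near ybar_j
print(shrinkage_estimate(ybar_j=70.0, n_j=5000, mu=50.0,
                         sigma2=100.0, tau2=1.0))      # about 69.6
```

Letting tau2 grow without bound recovers the no-pooling estimate ȳ_j, while tau2 → 0 forces every group estimate to µ (complete pooling), matching the two limiting cases on this slide.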