13.07.2015 Views

Identifying Speculative Bubbles with an Infinite Hidden Markov Model

Identifying Speculative Bubbles with an Infinite Hidden Markov Model

Identifying Speculative Bubbles with an Infinite Hidden Markov Model

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

1 Introduction<strong>Bubbles</strong>, which are recognized as germs of economic <strong>an</strong>d fin<strong>an</strong>cial instability, have drawnconsiderable attention over the last several decades.specific data generating processes of bubbles has not yet been reached.Nevertheless, a general agreement onEv<strong>an</strong>s (1991), forexample, recommends a periodically collapsing explosive process for bubbles. The explosivebehavior of bubbles prevails throughout the sample period, however faces a probability ofcollapsing (to a non-zero value) when it exceeds a certain threshold. If the bubble survives,it exp<strong>an</strong>ds at a rate faster th<strong>an</strong> the previous stage. 1 In contrast, Phillips et al. (2011b,PWY hereafter) propose a locally explosive bubble process in which the explosive behavior isa temporary phenomenon. Namely, asset prices tr<strong>an</strong>sit from a unit root regime to a mildlyexplosive regime when bubbles originate <strong>an</strong>d slide back to the level before origination (<strong>with</strong>a small perturbation) upon collapsing. 2Although the aforementioned processes have beenfrequently utilized as the data generating processes when comparing different econometric testsor dating algorithms for bubbles 3 , they have not been justified by <strong>an</strong>y data series containingbubble episodes.This paper applies <strong>an</strong> infinite hidden <strong>Markov</strong> model (iHMM) to reconcile existing datagenerating processes <strong>with</strong>in a unified <strong>an</strong>d coherent Bayesi<strong>an</strong> framework. This new approach isattractive to practitioners from three perspectives.First of all, the iHMM is generic to the bubble problem, since current literature assumes thatthe data dynamics in the presence of bubbles are associated <strong>with</strong> regime ch<strong>an</strong>ges. Heuristically,Figure 1b illustrates the periodically collapsing bubble process of Ev<strong>an</strong>s (1991) while Figure 1cshows the locally explosive bubble behavior of PWY. One common feature is that they bothassume a fixed number of regimes a priori, hence the model uncertainty problem is ignored.One contribution of the iHMM is to provide much more flexibility by allowing <strong>an</strong> unknownnumber of regimes, which is treated as a parameter <strong>an</strong>d estimated endogenously. This model1 Charemza <strong>an</strong>d Deadm<strong>an</strong> (1995) propose a stochastic explosive bubble process based on the periodicallycollapsing process of Ev<strong>an</strong>s (1991).2 Shi et al. (2011) modify the locally explosive process by replacing the one-period bubble collapsing process<strong>with</strong> a stationary me<strong>an</strong>-reverting process.3 See Ev<strong>an</strong>s (1991) <strong>an</strong>d Phillips et al. (2011b), among others.2


0 20 40 60 80 100(a) The non-bubble scenariotime0 20 40 60 80 100(b) The periodically collapsing explosivebehavior of Ev<strong>an</strong>s (1991)time0 20 40 60 80 100time0 20 40 60 80 100time(c) The locally explosive behavior of PWY(d) The hybrid bubble behaviorFigure 1: Illustration of different data generating processes of bubblesincorporates linear dynamic (Figure 1a), existing nonlinear dynamics (Figure 1a-1b), <strong>an</strong>d somemuch richer dynamic <strong>with</strong> multiple <strong>an</strong>d heterogeneous bubbles (Figure 1d). 4The second contribution is that we propose <strong>an</strong> easy <strong>an</strong>d coherent dating algorithm for bubblesby estimating the iHMM in the Bayesi<strong>an</strong> framework. One of the prevailing approachesfor date stamping bubbles 5 is the <strong>Markov</strong>-switching Augmented Dickey-Fuller (MSADF) testproposed by Hall et al. (1999, HPS hereafter). The MSADF test requires to assume or testfor state dimension before estimating the model. However, to the best of our knowledge, theperform<strong>an</strong>ce of testing procedures for the state dimension of a <strong>Markov</strong>-switching model which4 Phillips et al. (2011a) argues that multiples bubbles is a inherent feature of a long-sp<strong>an</strong> economic or fin<strong>an</strong>cialprice series.5 Another prevailing approach is the sup type unit root test of Phillips et al. (2011b) <strong>an</strong>d Phillips et al.(2011a).3


involves nonstationary (especially explosive) behavior has not yet been investigated. A subjectiveor <strong>an</strong> inaccurate selection of the state dimension may cause signific<strong>an</strong>t bias in parameterestimation <strong>an</strong>d regime classification. Moreover, the boostrapping procedure embedded in theMSADF test is computationally burdensome as Psaradakis et al. (2001) pointed out <strong>an</strong>d theasymptotic correctness of such bootstrapping procedure has not yet been established <strong>an</strong>d isfar from obvious. In contrast to existing frequentists’ approaches, the Bayesi<strong>an</strong> methodologyallows us to draw inference <strong>with</strong> a small sample size. The number of regimes <strong>an</strong>d other modelparameters are estimated simult<strong>an</strong>eously using <strong>Markov</strong> Chain Monte Carlo methods. Thedating algorithm is then built on the posterior distributions of the iHMM’s parameters. Theimplementation of this algorithm is much less computational dem<strong>an</strong>ding compared <strong>with</strong> HPS.Lastly, out approach is less subjective th<strong>an</strong> the iHMM of Teh et al. (2006) <strong>an</strong>d Fox et al.(2011) by using two parallel hierarchical structures for the model parameters. Geweke <strong>an</strong>dJi<strong>an</strong>g (2011) emphasize the import<strong>an</strong>ce of the prior elicitation for regime ch<strong>an</strong>ge models. Oneprominent approach to dealing this problem is using hierarchical structures as Pesar<strong>an</strong> et al.(2006) among m<strong>an</strong>y. Simply speaking, we estimate the prior for the parameters which characterizeeach regime instead of assuming it as fixed. This methodology produces results robustto the prior choice from <strong>an</strong> empirical point of view. It is also very convenient from the computationalperspective, since regime switching may be practically infeasible <strong>with</strong> some wild prior.The hierarchical structure will shrink it to a reasonable one, hence facilitates the mixing of the<strong>Markov</strong> chain.The first application of the iHMM is to the money base, exch<strong>an</strong>ge rate <strong>an</strong>d consumer pricein Argentina from J<strong>an</strong>uary 1983 to November 1989 as in HPS. It is designed to investigate ifthere exist <strong>an</strong>y new discoveries after we extend the finite hidden <strong>Markov</strong> model to the infinitedimension. The two-regime <strong>Markov</strong> switching model of HPS (MS2 thereafter) is estimated inthe Bayesi<strong>an</strong> framework as a benchmark. On one h<strong>an</strong>d, some similarities exist between theiHMM <strong>an</strong>d the MS2. On the other h<strong>an</strong>d, we find new prominent features implied by the iHMM.First, the iHMM <strong>an</strong>d MS2 have the same results for the money base, which are resembleto the locally explosive behavior of PWY. Second, the iHMM implies that the exch<strong>an</strong>ge rate’s4


dynamic is similar to to the periodically collapsing process of Ev<strong>an</strong>s (1991), while the MS2finds no sign of bubble collapse. Lastly, the iHMM finds evidence of bubble existence inthe consumer price throughout the whole sample period, whereas the MS2 suggests that theexplosive behaviors only appear for two short periods starting in June 1985 <strong>an</strong>d July 1989,respectively.We use the predictive likelihood as the criterion for model comparison. It is based onprediction <strong>an</strong>d acts as the Ockham’s razor by automatically punishing overparametrization.The results show that the two-regime specification is as good as the iHMM for the money base.However, for the exch<strong>an</strong>ge rate <strong>an</strong>d the consumer price, the predictive likelihood stronglysupports the iHMM against the MS2. Hence, the results found by the iHMM is more crediblefrom the statistical point of view. We find the explosive money growth in June 1985 did nottrigger signific<strong>an</strong>t ch<strong>an</strong>ge of dynamics for the exch<strong>an</strong>ge rate <strong>an</strong>d consumer price. On the otherh<strong>an</strong>d, the explosive growth of the money base in July 1989 is associated <strong>with</strong> both the exch<strong>an</strong>gerate <strong>an</strong>d the consumer price switching to explosive dynamic regimes.The second application is to U.S. oil price from April 1983 to December 2010. The oilinventory is used as a proxy to the market fundamental. According to the predictive likelihoods,the iHMM fits the oil price better th<strong>an</strong> the MS2. The iHMM suggests that there exist mildbubbles in the oil price during most of the sample period <strong>with</strong> four major bubble collapsingperiods following the 1985 oil price war, the first Persi<strong>an</strong> Gulf War, the 1998 Asi<strong>an</strong> fin<strong>an</strong>cialcrisis <strong>an</strong>d the subprime mortgage crisis. On the other h<strong>an</strong>d, no explosive dynamic is discoveredin the oil inventory data.The rest of the paper is org<strong>an</strong>ized as follows. Section 2 introduces the infinite hidden<strong>Markov</strong> model. The estimation procedure, along <strong>with</strong> the dating algorithm of bubbles <strong>an</strong>dthe model comparison method, are described in Section 3. The application to the Argentinahyperinflation period <strong>an</strong>d the U.S oil price are in Section 4 <strong>an</strong>d Section 5. Section 6 concludesthe paper.5


2 <strong>Infinite</strong> <strong>Hidden</strong> <strong>Markov</strong> <strong>Model</strong>The infinite hidden <strong>Markov</strong> model is expressed asy t | s t = i, Θ, Y 1,t−1 ∼ f(y t | θ i , Y 1,t−1 ), (1)Pr(s t = i | s t−1 = j, S 1,t−2 , P, Y 1,t−1 ) = Pr(s t = i | s t−1 = j, P ) = π ji , (2)where i, j = 1, 2, · · · . y t is the data at time t <strong>an</strong>d Y 1,t−1 = (y 1 , · · · , y t−1 ). s t is the regimeindicator at time t <strong>an</strong>d S 1,t−2 = (s 1 , · · · , s t−2 ). Θ = (θ 1 , θ 2 , · · · ) is the collection of parameterθ i ’s, which characterize each regime. P is <strong>an</strong> infinite dimensional tr<strong>an</strong>sition matrix <strong>with</strong> π ji onits jth row <strong>an</strong>d ith column. 6 (1) shows that the conditional density of y t depends on s t <strong>an</strong>dthe past information Y 1,t−1 . (2) implies that the dynamic of s t only depends on s t−1 .A finite hidden <strong>Markov</strong> model (HMM) <strong>with</strong> K regimes, for inst<strong>an</strong>ce the two-regime <strong>Markov</strong>switchingmodel of HPS (K = 2), is nested in the iHMM by assuming K π ji = 1 for j∑=1, · · · , K, <strong>an</strong>d the initial regime s 1 ∈ {1, · · · , K}. In addition, the periodically collapsingprocess of Ev<strong>an</strong>s (1991) <strong>an</strong>d the locally explosive process of Phillips et al. (2011b) c<strong>an</strong> becaptured by a three-regime HMM.In order to identify exuber<strong>an</strong>t dynamics, this paper assumes (1) to have a form as of theaugmented Dickey-Fuller (ADF) test:i=1f(∆y t | θ st , Y 1,t−1 ) ∼ N(ϕ st ,0 + β st y t−1 + ϕ st, 1∆y t−1 + · · · + ∆ϕ st, qy t−q , σ 2 s t), (3)where q is the lag order <strong>an</strong>d θ st = (ϕ ′ s t, σ st ) by construction <strong>with</strong> ϕ st = (ϕ st ,0, β st , ϕ st, 1, · · · , ϕ st ,q).Notice that we model ∆y t instead of y t . The existence of explosive behaviors is determined bythe coefficient of y t−1 , namely β st . A r<strong>an</strong>dom walk process implies β st = 0 in <strong>an</strong> ADF test. Inthis paper, a positive β st shows that y t is explosive at time t.The Bayesi<strong>an</strong> approach is applied in estimation to deal <strong>with</strong> infinite dimensionality. Twoparallel hierarchical priors, one governing Θ <strong>an</strong>d the other governing P , are introduced as6 ∑By definition, π ji ≥ 0 for all j, i = 1, 2, · · · <strong>an</strong>d ∞ π ji = 1 for all j = 1, 2, · · · .i=16


follows.2.1 Prior of ΘWe assume θ i to have the regular normal gamma distribution NG(ϕ, H, χ 2 , ν 2) (see Geweke(2009)), which is conjugate to linear models. In detail,( χσi −2 ∼ G2 , ν )2<strong>an</strong>d ϕ i | σ i ∼ N ( ϕ, σ 2 i H −1) . (4)The inverse of the vari<strong>an</strong>ce σ −2iis drawn from a gamma distribution <strong>with</strong> degree of freedom ν 2<strong>an</strong>d multiplier χ 2 . Conditional on σ i, the vector of regression coefficients, ϕ i , has a multivariatenormal distribution <strong>with</strong> me<strong>an</strong> ϕ <strong>an</strong>d covari<strong>an</strong>ce matrix σ 2 i H−1 . 7Define λ = (ϕ, H, χ, ν) as the collection of the parameters in the normal gamma distribution.A common practice for a finite hidden <strong>Markov</strong> model is to assume λ as const<strong>an</strong>t. For the iHMM,however, the number of regimes c<strong>an</strong> grow <strong>with</strong> the sample size, which allows us to learn λ byusing the information across regimes. Hence, we estimate it by giving it the prior as follows:H ∼ W(A 0 , a 0 ), (5)ϕ | H ∼ N(m 0 , τ 0 H −1 ), (6)χ ∼ G( d 02 , c 0),2(7)ν ∼ Exp(ρ ν ). (8)H is a positive definite matrix <strong>an</strong>d drawn from a Wishart distribution. 8 Conditional on H, ϕhas a multivariate normal distribution <strong>with</strong> me<strong>an</strong> m 0 <strong>an</strong>d covari<strong>an</strong>ce matrix τ 0 H −1 . The priorof χ is a gamma distribution <strong>with</strong> multiplier d 0 /2 <strong>an</strong>d degree of freedom of c 0 /2. The prior ofν is <strong>an</strong> exponential distribution <strong>with</strong> parameter ρ ν .7 χ <strong>an</strong>d ν are positive scalars. The me<strong>an</strong> <strong>an</strong>d vari<strong>an</strong>ce of σ −2i are ν 2ν<strong>an</strong>d , respectively. The me<strong>an</strong> of σ 2 χ χ 2 i isχ. ϕ is a (q + 2) × 1 vector <strong>an</strong>d H is a (q + 2) × (q + 2) positive definite matrix.(ν−2) 8 The Wishart distribution is the extension of the gamma distribution to the multivariate setting. ParametersA 0 is a (q + 2) × (q + 2) positive definite matrix <strong>an</strong>d a 0 is a positive scalar. A sample from Wishart distributionW (A 0 , a 0 ) is a positive definite matrix. The prior me<strong>an</strong> of H is A 0 a 0 . The vari<strong>an</strong>ce of H ij , the ith row <strong>an</strong>d thejth column element of H, is a 0 (A 2 ij + A ii A ij ), where A ij is the ith row <strong>an</strong>d the jth column element of A 0 .7


This methodology is less subjective th<strong>an</strong> Teh et al. (2006) <strong>an</strong>d Fox et al. (2011) becausethe hierarchical structure is robust to prior elicitation. It also facilitates the mixing of the<strong>Markov</strong> chain by shrinking λ to a reasonable region so that a new regime c<strong>an</strong> be easily born ifa structural ch<strong>an</strong>ge is implied by the data.2.2 Prior of PThe infinite-dimensional tr<strong>an</strong>sition matrix P is comprised of <strong>an</strong> infinite number of infinitedimensionalrow vector π j ’s, where j = 1, 2, · · · . Each π j = (π j1 , π j2 , · · · ) represents a probabilitymeasure on the natural numbers, namely, Pr(A | π j ) = ∞ π ji 1 i∈A . By definition,∑we∑should have π ji ≥ 0 for each j <strong>an</strong>d i <strong>an</strong>d ∞ π ji = 1 for each j. The prior of P is set asi=1i=1π 0 ∼ SBP(γ), (9)π j | π 0 ∼ DP (c, (1 − ρ)π 0 + ρδ j ) . (10)π 0 is a r<strong>an</strong>dom probability measure on the natural numbers <strong>an</strong>d drawn from a stick breakingprocess (SBP). 9 It serves as a hierarchical parameter of all π j ’s. Conditional on π 0 , each π jis drawn from a Dirichlet process (DP) <strong>with</strong> concentration parameter c <strong>an</strong>d shape parameter(1 − ρ)π 0 + ρδ j . 10 The aforementioned constraints on π j ’s are automatically satisfied by thisprior.From (10), the shape parameter, (1 − ρ)π 0 + ρδ j , is <strong>an</strong> infinite discrete distribution <strong>an</strong>drepresents the me<strong>an</strong> of π j by the definition of DP. It is a convex combination of the hierarchicaldistribution π 0 <strong>an</strong>d a degenerate distribution at integer j, δ j 11 , <strong>with</strong> ρ ∈ [0, 1]. The hierarchical9 The stick breaking process generates a probability measure over natural numbers. Each number is associated<strong>with</strong> a non-zero probability. For a probability measure p ≡ (p 1 , p 2 , · · · ) ∼ SBP(γ), where γ is a positive scalarwhich controls the concentration of a r<strong>an</strong>dom probability measure, p i is the probability associated <strong>with</strong> integeri <strong>with</strong> i = 1, 2, · · · . Appendix A.2 provides detailed expl<strong>an</strong>ation of this process.10 A Dirichlet process is a distribution of discrete distributions. It has two parameters: the shape parameter<strong>an</strong>d the concentration parameter. The shape parameter is a probability measure <strong>an</strong>d controls the centre of ther<strong>an</strong>dom samples, which is <strong>an</strong>alogous to the me<strong>an</strong> of a distribution. The concentration parameter is a positivescalar <strong>an</strong>d controls the tightness of a r<strong>an</strong>dom draw, which is <strong>an</strong>alogous to the inverse of the vari<strong>an</strong>ce of adistribution. Appendix A.1 provides a detailed{discussion of this process.1 if j ∈ A11 δ j is a probability measure <strong>with</strong> δ j (A) = .0 o.w.8


distribution π 0 creates a common shape for each π j <strong>an</strong>d δ j reflects the prior belief of regimepersistence. By construction, conditional on π 0 <strong>an</strong>d ρ, the me<strong>an</strong> of the tr<strong>an</strong>sition matrix P isa convex combination of two infinite-dimensional matrices, expressed by⎡E(P | π 0 , ρ) = (1 − ρ) ·⎢⎣⎤ ⎡π 01 π 02 π 03 · · ·π 01 π 02 π 03 · · ·+ ρ ·π 01 π 02 π 03 · · · ⎥ ⎢⎦ ⎣.. . . ..⎤1 0 0 · · ·0 1 0 · · ·.0 0 1 · · · ⎥⎦.. . . ..The above conditional me<strong>an</strong> of P shows that the self-tr<strong>an</strong>sition probability is larger as ρ goescloser to 1. In the rest of the paper, ρ is referred to as the sticky coefficient. It is introduced tothe iHMM for two reasons. First, empirical evidence shows that regime persistence is a salientfeature of m<strong>an</strong>y macroeconomic <strong>an</strong>d fin<strong>an</strong>ce variables. The sticky coefficient explicitly embedsthis feature into the prior. Second, a finite hidden <strong>Markov</strong> model usually has a small number ofregimes, which guar<strong>an</strong>tees that each regime c<strong>an</strong> have a reasonable amount of data. The infinitehidden <strong>Markov</strong> model, however, may assign each data to one distinct regime. This phenomenonis called state saturation, which is obviously not interesting <strong>an</strong>d harmful to forecasting. Thesticky coefficient shrinks the over-dispersed regime allocation towards a coherent one <strong>an</strong>d henceavoids the state saturation problem.In summary, the iHMM is comprised of (1) <strong>an</strong>d (2), in which (1) takes the form of (3) forbubble detection <strong>an</strong>d estimation. (4)-(8) comprise the hierarchical prior for Θ, <strong>an</strong>d (9)-(10)comprise the hierarchical prior for P .3 Estimation, Dating Algorithm <strong>an</strong>d <strong>Model</strong> Comparison3.1 EstimationThe posterior sampling is based on a <strong>Markov</strong> Chain Monte Carlo (MCMC) method. Fox et al.(2011) show that the block sampler which approximates the iHMM <strong>with</strong> truncation is more9


efficient th<strong>an</strong> the individual sampler. 12 We approximate the iHMM by truncation <strong>with</strong> a finitebut large number of regimes. If the number of regimes is large enough, the finite approximationis equivalent to the iHMM in practice. 13Suppose L is the maximal number of regimes in the approximation, the model is as follows:( γπ 0 ∼ DirL L), · · · , γ , (11)π j | π 0 ∼ Dir ((1 − ρ)cπ 01 , ..., (1 − ρ)cπ 0i + ρc, · · · , (1 − ρ)cπ 0L ) , (12)s t | s t−1 = j ∼ π j , (13)(ϕ, H, χ, ν) ∼ G, (14)θ i ∼ NG(ϕ, H, χ 2 , ν ), (15)2∆y t | Y 1,t−1 ∼ N(ϕ st,0 + β st y t−1 + ϕ st,1∆y t−1 + · · · + ∆ϕ st,qy t−q , σ 2 s t), (16)where j = 1, 2, · · · , L <strong>an</strong>d Dir represents the Dirichlet distribution. Notice that the only approximationis (11), which is from Ishwar<strong>an</strong> <strong>an</strong>d Zarepour (2000). They show that Dir ( γL , · · · , γ )Lconverges to SBP(γ) as L → ∞. (12) is not <strong>an</strong> approximation, because a DP is equivalent toa Dirichlet distribution if its shape parameter only has support on a finite set. (11) <strong>an</strong>d (12)comprise the prior of P . The dynamic of the regime indicator (13) is the same as (2). (14)<strong>an</strong>d (15) comprise the prior of Θ, which is the same as (4) - (8) in the iHMM. The conditionaldata density (16) is the same as (1). Different L’s are tried to investigate the robustness of theblock sampler in applications.To sample from the posterior distribution, the MCMC method partitions the parameterspace into four parts: (S, I), (Θ, P, π 0 ), (ϕ, H, χ) <strong>an</strong>d ν, where S <strong>an</strong>d I are the collection ofregime indicators s t ’s <strong>an</strong>d binary auxiliary variables I t ’s respectively. 14 Each part is r<strong>an</strong>domlysampled conditional on the other parts <strong>an</strong>d the data Y = (y 1 , · · · , y T ). The sampling algorithmsare as follows (see Appendix B for more details):12 Consistency proof of the approximation c<strong>an</strong> be found in Ishwar<strong>an</strong> <strong>an</strong>d Zarepour (2000), Ishwar<strong>an</strong> <strong>an</strong>dZarepour (2002). Ishwar<strong>an</strong> <strong>an</strong>d James (2001) compare the individual sampler <strong>with</strong> the block sampler <strong>an</strong>d findthat the later one is more efficient in terms of mixing.13 For example, the approximation is exact if the number of regimes equals to the number of data T .14 I t is <strong>an</strong> auxiliary variable to sample π 0 . Details are in the appendix B.10


1. Sample (S, I) | Θ, P, Y(a) Sample S | Θ, P, Y by the forward filtering <strong>an</strong>d backward sampling method of Chib(1996).(b) Sample I | S by a Polya Urn scheme.2. Sample (Θ, P, π 0 ) | S, I, Y(a) Sample π 0 | I from a Dirichlet distribution.(b) Sample P | π 0 , S from Dirichlet distributions.(c) Sample Θ | S, Y by the regular linear model results.3. Sample (ϕ, H, χ) | S, Θ, ν(a) Sample (ϕ, H) | S, Θ by conjugacy of the Normal-Wishart distribution.(b) Sample χ | ν, S, Θ from a gamma distribution.4. Sample ν | χ, S, Θ by a Metropolis-Hastings algorithm.After initiating the parameter values, the algorithm is applied iteratively to obtain a largenumber of samples of the model parameters. We discard the first block of the samples to removedependence on the initial values. The rest of the samples, {S (i) , Θ (i) , P (i) , π (i)0 , ϕ(i) , H (i) , χ (i) , ν (i) } N i=1 ,are used for inferences as if they were drawn from their posterior distributions. Simulation consistentposterior statistics are computed as sample averages. For example, the posterior me<strong>an</strong>of ϕ is calculated by 1 N∑ Ni=1 ϕ(i) . In order to avoid the label-switching problem in mixturemodels 15 , we use label-invari<strong>an</strong>t statistics suggested by Geweke (2007) so that the posteriorsampling algorithm c<strong>an</strong> be implemented <strong>with</strong>out modification.15 For example, switching the values of (θ j , π j ) <strong>an</strong>d (θ k , π k ), swapping the values of state s t for s t = j, k whilekeeping the other parameters unch<strong>an</strong>ged results in the same likelihood.11


3.2 Dating Algorithm of <strong>Bubbles</strong>The explosive behavior of bubbles depends on the estimates of (16). The autoregressive coefficientof y t−1 (i.e. β st ) should be positive in the presence of explosive bubbles <strong>an</strong>d negativewhen the data is locally stationary me<strong>an</strong>-reverting.The Bayesi<strong>an</strong> approach produces the whole posterior distribution of β st instead of a pointestimate, so one c<strong>an</strong> make decisions based on his/her specific loss function. This paper considerstwo intuitive posterior statistics, which are derived from two simple loss functions, to helpidentify bubbles. One is the posterior probability P (β st > 0 | Y ) <strong>an</strong>d the other is the posteriorme<strong>an</strong> E (β st | Y ).For the first statistic, we claim that a bubble exists at time t if the posterior probability ofβ st > 0 is above 0.5 <strong>an</strong>d there is no bubble otherwise. More concisely,bubble exists in period t if P (β st > 0 | Y ) > 0.5 .It is easy to see that this criterion is derived from <strong>an</strong> absolute value loss function. The cutoffvalue c<strong>an</strong> be different from 0.5 if the loss function is asymmetric.The second statistic is based on a quadratic loss function. We claim that a bubble exists ifthe posterior me<strong>an</strong> of β stis above zero <strong>an</strong>d there is no bubble otherwise. More concisely,bubble exists in period t if E (β st | Y ) > 0 .This statistic also shows the magnitude of explosiveness. The higher the value is, the fasterthe bubble exp<strong>an</strong>ses.3.3 <strong>Model</strong> ComparisonWe use the predictive likelihood for model comparison as suggested by Geweke <strong>an</strong>d Amis<strong>an</strong>o(2010). Conditional on <strong>an</strong> initial data set Y 1,t , the predictive likelihood of Y t+1,T = (y t+1 , · · · , y T )12


4 Empirical Application: Argentina HyperinflationIn this section, we apply the iHMM approach to the money base, exch<strong>an</strong>ge rate <strong>an</strong>d consumerprice in Argentina from J<strong>an</strong>uary 1983 to November 1989. The money base is used as a proxyfor market fundamental <strong>an</strong>d the exch<strong>an</strong>ge rate data series is to capture fundamentally determinedbubble-like behavior. The purpose is to investigate whether there is evidence of bubblebehaviors in the consumer price.These three data series are also examined in HPS <strong>an</strong>d Shi (2010). Both HPS <strong>an</strong>d Shi (2010)conduct a two-regime <strong>Markov</strong>-switching ADF (MSADF) test (<strong>with</strong> different specifications inthe error vari<strong>an</strong>ce) on these three data series <strong>an</strong>d conclude no evidence of bubbles in theconsumer price. The two-regime <strong>Markov</strong>-switching (MS2) models of HPS <strong>an</strong>d Shi (2010) areboth estimated by MLE.As a benchmark, we estimate a MS2 model using the Bayesi<strong>an</strong> approach, which is 16∆y t | s t = j, Y 1,t−1 ∼ N(ϕ j0 + β j y t−1 + ϕ j1 ∆y t−1 + · · · + ϕ j4 ∆y t−4 , σ 2 j ) (18)Pr(s t = j | s t−1 = j) = p jj (19)<strong>with</strong> j = 1, 2. The prior of self-tr<strong>an</strong>sition probabilities p 11 <strong>an</strong>d p 22 is Beta(9, 1) <strong>an</strong>d the priorof (ϕ j , σ j ) is a normal-gamma distribution, namely σ −2jThe prior me<strong>an</strong> <strong>an</strong>d vari<strong>an</strong>ce of precision σ −2j∼ G(1, 1) <strong>an</strong>d ϕ j | σ j ∼ N(0, σ 2 I).are unity. The infinite hidden <strong>Markov</strong> model,(11)-(16) is estimated by setting L = 5 <strong>an</strong>d <strong>with</strong> priors ϕ, H ∼ NW(0, 1, 0.2I, 5), χ ∼ G(1, 1)<strong>an</strong>d ν ∼ Exp(1). 17 We set this prior in order to make the prior parameters of the MS2 modelequal to the me<strong>an</strong> of the hierarchical prior of the iHMM.Figure 2 illustrates the posterior probabilities of β st> 0 (i.e P (β st > 0|Y )) for the logarithmicmoney base, exch<strong>an</strong>ge rate <strong>an</strong>d consumer price. From the MS2 model (dotted line),we c<strong>an</strong> see that the posterior probability exceeds the 0.5 in June 1985 <strong>an</strong>d July 1989 for allthree data series, which suggests the existence of explosive behaviors. Me<strong>an</strong>while, since thespikes appear simult<strong>an</strong>eous in these two periods, the explosive behavior of market fundamentals16 The lag order is the same as that in HPS <strong>an</strong>d Shi (2010).17 Larger Ls produce the same results in bubble detection.14


(money growth) is consistent <strong>with</strong> the explosive dynamics of the exch<strong>an</strong>ge rate <strong>an</strong>d consumerprice. However, we also find bubbles in the exch<strong>an</strong>ge rate emerge in April 1987, October 1987<strong>an</strong>d September 1988, which have some locally explosive dynamics that c<strong>an</strong> not be explainedby the money base. This is further supported by the posterior me<strong>an</strong> of β st (i.e. E(β st |Y ))displayed in Figure 3.0.0 0.2 0.4 0.6 0.8money baseP(βt>0 | Data,IHMM)P(βt>0 | Data,MS2)0.2 0.4 0.6 0.80.0 0.2 0.4 0.6 0.8 1.0 1.20.0 0.2 0.4 0.6 0.8 1.0exch<strong>an</strong>ge rateP(βt>0 | Data,IHMM)P(βt>0 | Data,MS2)priceP(βt>0 | Data,IHMM)P(βt>0 | Data,MS2)0.2 0.4 0.6 0.8 1.0 0.5 0.6 0.7 0.8 0.91984−01 1985−01 1986−01 1987−01 1988−01 1989−01Figure 2: The growth rate <strong>an</strong>d the posterior probabilities of β stexch<strong>an</strong>ge rate <strong>an</strong>d consumer price> 0 for the money base,The posterior probability <strong>an</strong>d posterior me<strong>an</strong> of the money base from the iHMM (dashedline) are identical to those from the MS2 model. Therefore, it is reasonable to have a two-regimespecification for the money base. On the other h<strong>an</strong>d, the iHMM shows distinct probability <strong>an</strong>dme<strong>an</strong> patterns for the exch<strong>an</strong>ge rate <strong>an</strong>d consumer price.For the exch<strong>an</strong>ge rate, the posterior probability of β st > 0 implied by the iHMM in Figure 2is almost a mirror image of that by the MS2 model except in 1989. Any plunge of P (β st > 0 | Y )of the iHMM is associated <strong>with</strong> a decrease of the exch<strong>an</strong>ge rate, which is consistent <strong>with</strong> Ev<strong>an</strong>s’s(1991) bubble collapse regime. Ironically, for the MS2 model, 4 out of 5 spikes in Figure 2correspond to decreasing of the exch<strong>an</strong>ge rate, which is counter intuitive. This is caused by the15


0.0 0.2 0.4 0.6 0.80.0 0.2 0.4 0.6 0.8 1.0 1.20.0 0.2 0.4 0.6 0.8 1.0money baseE(βt | Data,IHMM)E(βt | Data,MS2)exch<strong>an</strong>ge rateE(βt | Data,IHMM)E(βt | Data,MS2)priceE(βt | Data,IHMM)E(βt | Data,MS2)0.00 0.02 0.04 0.06 0.08 0.10 −0.005 0.005 0.015 0.0250.00 0.02 0.04 0.06 0.081984−01 1985−01 1986−01 1987−01 1988−01 1989−01Figure 3: The growth rate <strong>an</strong>d posterior me<strong>an</strong> of β stconsumer pricefor the money base, exch<strong>an</strong>ge rate <strong>an</strong>dlimited number of regimes in the MS2 model. The iHMM in Figure 2 <strong>an</strong>d 3 shows that thereare approximately 3 regimes: one is associated <strong>with</strong> the bubble emerging at July 1989; oneis related to the exch<strong>an</strong>ge rate decreasing periods starting at June 1985, April 1987, October1987 <strong>an</strong>d September 1988; <strong>an</strong>d one for the rest of the sample. The MS2 model clearly combinesthe first two regimes into one <strong>an</strong>d produces confusing results.For the consumer price, the iHMM implies that bubbles exist throughout the whole sampleperiod in the bottom p<strong>an</strong>el of Figure 2. The bottom p<strong>an</strong>el of Figure 3 shows the dynamicsof the consumer price is consistent <strong>with</strong> Ev<strong>an</strong>s’s (1991) assumption <strong>with</strong> time-varying degreesof explosiveness. The exp<strong>an</strong>sion rate is relatively higher in the one <strong>an</strong>d a half years sp<strong>an</strong>ningfrom June 1987 to August 1988, <strong>an</strong>d a considerable increase occurred over the period fromMarch 1989 to July 1989. The posterior me<strong>an</strong> of β st at the peak of this episode is 0.08. Therate then dropped rapidly so that by August 1989 it was 0.01 (same as the rate before theincrease occurred). On the other h<strong>an</strong>d, the MS2 is unable to capture the dynamics becauseof assuming a limited number of regimes. Although it identifies two bubble periods staring at16


Raftery’s (1995) criterion.5 Empirical Application: U.S Oil PriceThe second application investigates the existence of speculative bubble behaviors in the U.Soil price. Several papers have studied the evidence for bubbles in the oil price (among others,Phillips <strong>an</strong>d Yu (2011), Sornette et al. (2009) <strong>an</strong>d Shi <strong>an</strong>d Arora (2011)). Although the sampleperiods <strong>an</strong>d methodologies used in these papers are different, most studies have found evidenceof bubble existence.Our data is sampled from April 1983 to December 2010. The price of the nearest-monthWest Texas Intermediate futures contract, obtained from DataStream International, is usedas a proxy for the spot oil price.We deflate the oil price by U.S. Consumer Price Indexobtained from the U.S Bureau of Labour Statistics. The market fundamental is proxied bythe oil inventory, which is the ending stocks excluding Strategic Petroleum Reserve of crudeoil <strong>an</strong>d petroleum products (thous<strong>an</strong>d barrels) <strong>an</strong>d comes from the U.S. Energy InformationAdministration.We apply both the MS2 model <strong>an</strong>d the iHMM to the logarithmic real oil price <strong>an</strong>d thelogarithmic oil inventory. 18 The priors are assumed to be the same as in Section 4. Theposterior probabilities of β st > 0 <strong>an</strong>d the posterior me<strong>an</strong> of β st5, respectively.are presented in Figure 4 <strong>an</strong>dFor the oil inventory, both models suggest that there is no evidence of explosive behavior.Namely, the posterior probabilities (posterior me<strong>an</strong>s) of the inventory from both models arebelow 0.5 (zero). In fact, the posterior me<strong>an</strong> of β stremains the same throughout the sampleperiod. 19 This supports that regime switching is not a feature of the oil inventory dynamic inthe sample period.For the oil price , the patterns of the posterior probability P (β st > 0|Y ) <strong>an</strong>d the posterior18 For the oil inventory, we have estimated three versions of the data: the raw data, the data that is scaled by10 −5 <strong>an</strong>d the log of the raw data. All results are the same for bubble detection <strong>an</strong>d regime identification. So weonly report the results based on the log of the raw data.19 Notice that the posterior me<strong>an</strong> of β st from the iHMM is smaller th<strong>an</strong> that from the MS2 model. This isbecause the iHMM uses hierarchical structures but the two-regime MS model does not.18


20 40 60 80 100 120Oil PriceP(β t>0 | Data,IHMM)P(β t>0 | Data,MS2)0.0 0.2 0.4 0.6 0.8 1.013.70 13.75 13.80 13.85 13.90Log oil inventoryP(β t>0 | Data,IHMM)P(β t>0 | Data,MS2)0.0 0.2 0.4 0.6 0.8 1.0198412 198709 199006 199302 199511 199808 200105 200401 200610 200907Figure 4: Posterior probabilities of β st > 0 for the logarithmic oil price <strong>an</strong>d inventory20 40 60 80 100 120Oil PriceE(β t | Data,IHMM)E(β t | Data,MS2)−0.06 −0.04 −0.02 0.0013.70 13.75 13.80 13.85 13.90Log Oil inventoryE(β t | Data,IHMM)E(β t | Data,MS2)−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3198412 198709 199006 199302 199511 199808 200105 200401 200610 200907Figure 5: Posterior me<strong>an</strong> of β stfor the logarithmic oil price <strong>an</strong>d inventory19


me<strong>an</strong> E (β st |Y ) from the MS2 model are different from the iHMM. For the MS2 model, theposterior probabilities of β st > 0 are always smaller th<strong>an</strong> 0.5 <strong>an</strong>d the posterior me<strong>an</strong>s of β stare negative. Therefore, no evidence of bubble exists based on the MS2 model. However, theiHMM implies that the oil price dynamic is comprised of mild explosive behavior <strong>an</strong>d bubblecollapse phases, which is a prominent feature <strong>an</strong>d different from Ev<strong>an</strong>s (1991) <strong>an</strong>d PWY. Formost of the sample period, the posterior me<strong>an</strong> of β st is slightly positive. There are signific<strong>an</strong>tfalls (below zero) in periods following the 1985 oil price war (1985M11-1986M09), the firstPersi<strong>an</strong> Gulf War (1990M04-1991M03), the 1998 Asi<strong>an</strong> fin<strong>an</strong>cial crisis (1998M12-1999M08)<strong>an</strong>d the subprime mortgage crisis (2008M07-2009M07).The formal model comparison is in Table 2. It reports the log predictive likelihoods forthe oil price <strong>an</strong>d the oil inventory. We c<strong>an</strong> see that the iHMM outperforms the two-regimeMS model for both data series. The differences of the log predictive likelihoods are 72 <strong>an</strong>d 488for the log price <strong>an</strong>d the log inventory respectively. The results strongly support the iHMMagainst the two-regime switching model.Table 2: Log predictive likelihoods2-regime MS modeliHMMOil price 249 321Oil inventory -2397 -1909The last 300 observations (out of 333) are use to calculatethe predictive likelihood.6 ConclusionsThis paper proposes <strong>an</strong> infinite hidden <strong>Markov</strong> model to integrate the detection, date-stamping<strong>an</strong>d estimation of bubble behaviors in a coherent Bayesi<strong>an</strong> framework. It reconciles the existingdata generating processes of speculative bubbles <strong>an</strong>d is able to reveal the dynamic patterns ofreal data series. Two parallel hierarchical structures provide a parsimonious methodology forprior elicitation <strong>an</strong>d improve out-of-sample forecasts.20


The iHMM is applied to Argentina money base, exch<strong>an</strong>ge rate <strong>an</strong>d consumer price fromJ<strong>an</strong>uary 1983 to November 1989 <strong>an</strong>d U.S. oil price <strong>an</strong>d oil inventory from April 1983 to December2010. The predictive likelihoods strongly support the iHMM against a finite <strong>Markov</strong>switching model.The dynamic of the Argentina money base is similar to the locally explosive behavior ofPWY, where the explosive behavior is a tr<strong>an</strong>sitory phenomenon. The Argentina exch<strong>an</strong>ge rate<strong>an</strong>d consumer price, on the other h<strong>an</strong>d, resemble the periodically collapsing explosive behaviorof Ev<strong>an</strong>s (1991), where the explosive behavior prevails throughout the sample period <strong>an</strong>d therate of explosiveness is time-varying. Furthermore, we discover that the exp<strong>an</strong>sion of the moneysupply in July 1989 is closely related to the simult<strong>an</strong>eous regime ch<strong>an</strong>ges of the exch<strong>an</strong>ge rate<strong>an</strong>d the consumer price (to a regime <strong>with</strong> a faster exp<strong>an</strong>sion rate).For the U.S. oil price, we find that mild explosive behavior exists for most of the timeexcept 4 major bubble collapsing periods following the 1985 oil price war, the first Persi<strong>an</strong> GulfWar, the 1998 Asi<strong>an</strong> fin<strong>an</strong>cial crisis <strong>an</strong>d the subprime mortgage crisis. This feature is differentfrom existing data generating processes as Ev<strong>an</strong>s (1991) <strong>an</strong>d PWY. On the other h<strong>an</strong>d, regimech<strong>an</strong>ge is not a feature of U.S. oil inventory data.ReferencesJ. Berger. The case for objective bayesi<strong>an</strong> <strong>an</strong>alysis. Bayesi<strong>an</strong> Analysis, 1(3):385–402, 2006.Wojciech W. Charemza <strong>an</strong>d Derek F. Deadm<strong>an</strong>. <strong>Speculative</strong> bubbles <strong>with</strong> stochastic explosiveroots: The failure of unit root testing. Journal of Empirical Fin<strong>an</strong>ce, 2:153–163, 1995.S. Chib. Calculating posterior distributions <strong>an</strong>d modal estimates in <strong>Markov</strong> mixture models.Journal of Econometrics, 75(1):79–97, 1996.George W. Ev<strong>an</strong>s. Pitfalls in testing for explosive bubbles in asset prices. The Americ<strong>an</strong>Economic Review, 81(4):922–930, Sep. 1991.21


Ferguson. A bayesi<strong>an</strong> <strong>an</strong>alysis of some nonparametric problem. The Annals of Statistics, 1(2):209–230, 1973.E.B. Fox, E.B. Sudderth, M.I. Jord<strong>an</strong>, <strong>an</strong>d A.S. Willsky. A Sticky HDP-HMM <strong>with</strong> Applicationto Speaker Diarization. Annals of Applied Statistics, 5(2A):1020–1056, 2011.J. Geweke. Interpretation <strong>an</strong>d inference in mixture models: Simple MCMC works. ComputationalStatistics & Data Analysis, 51(7):3529–3550, 2007.J. Geweke. Complete <strong>an</strong>d Incomplete Econometric <strong>Model</strong>s. Princeton Univ Pr, 2009.J. Geweke <strong>an</strong>d G. Amis<strong>an</strong>o. Comparing <strong>an</strong>d evaluating Bayesi<strong>an</strong> predictive distributions ofasset returns. International Journal of Forecasting, 2010.J. Geweke <strong>an</strong>d Y. Ji<strong>an</strong>g. Inference <strong>an</strong>d prediction in a multiple structural break model. Journalof Econometrics, 2011.Stephen G. Hall, Zacharias Psaradakis, <strong>an</strong>d Martin Sola. Detecting periodically collapsingbubbles: A markov-switching unit root test. Journal of Applied Econometrics, 14:143–154,1999.H. Ishwar<strong>an</strong> <strong>an</strong>d L.F. James. Gibbs Sampling Methods for Stick-Breaking Priors. Journal ofthe Americ<strong>an</strong> Statistical Association, 96(453), 2001.H. Ishwar<strong>an</strong> <strong>an</strong>d M. Zarepour. <strong>Markov</strong> chain Monte Carlo in approximate Dirichlet <strong>an</strong>d betatwo-parameter process hierarchical models. Biometrika, 87(2):371, 2000.H. Ishwar<strong>an</strong> <strong>an</strong>d M. Zarepour. Dirichlet prior sieves in finite normal mixtures. Statistica Sinica,12(3):941–963, 2002.R.E. Kass <strong>an</strong>d A.E. Raftery. Bayes factors. Journal of the Americ<strong>an</strong> Statistical Association,90(430), 1995.P. Müller. A generic approach to posterior integration <strong>an</strong>d Gibbs sampling. Rapport technique,pages 91–09, 1991.22


M.H. Pesar<strong>an</strong>, D. Pettenuzzo, <strong>an</strong>d A. Timmerm<strong>an</strong>n. Forecasting time series subject to multiplestructural breaks. Review of Economic Studies, 73(4):1057–1084, 2006.Peter C. B. Phillips <strong>an</strong>d Jun Yu. Dating the timeline of fin<strong>an</strong>cial bubbles during the subprimecrisis. Qu<strong>an</strong>titative Economics, forthcoming, 2011.Peter C. B. Phillips, Shu-Ping Shi, <strong>an</strong>d Jun Yu. Testing for multiple bubbles. SingaporeM<strong>an</strong>agement University, Center for Fin<strong>an</strong>cial Econometrics Working Paper 2010-03, 2011a.Peter C. B. Phillips, Y<strong>an</strong>gru Wu, <strong>an</strong>d Jun Yu. Explosive behavior in the 1990s Nasdaq: Whendid exuber<strong>an</strong>ce escalate asset values? International Economic Review, 52:201–226, 2011b.Zacharias Psaradakis, Martin Sola, <strong>an</strong>d Fabio Spagnolo. A simple procedure for detectingperiodically collapsing rationa bubbles. Economics Letters, 24(72):317–323, 2001.GO Roberts, A. Gelm<strong>an</strong>, <strong>an</strong>d WR Gilks. Weak convergence <strong>an</strong>d optimal scaling of r<strong>an</strong>domwalk Metropolis algorithms. The Annals of Applied Probability, 7(1):110–120, 1997.J Sethuram<strong>an</strong>. A constructive definition of dirichlet priors. Statistica Sinica, 4:639–650, 1994.Shu-Ping Shi. <strong>Bubbles</strong> or volatility: A markov-switching adf test <strong>with</strong> regime-varying errorvari<strong>an</strong>ce. Australi<strong>an</strong> National University, School of Economics Working Paper 2010-524,2010.Shu-Ping Shi <strong>an</strong>d Vipin Arora. An application of models of speculative behaviour to oil prices.Australi<strong>an</strong> National University, CAMA Working paper 2011-11, 2011.Shu-Ping Shi, Peter C. B. Phillips, <strong>an</strong>d Jun Yu. Specification sensitivities in right-tailed unitroot testing for fin<strong>an</strong>cial bubbles.Paper 17, 2011.Hong Kong Institute for Monetary Research, WorkingD. Sornette, R. Woodard, <strong>an</strong>d W. Zhou. The 2006-2008 oil bubble <strong>an</strong>d beyond. Physica A,388:1571–1576, 2009.23


Y.W. Teh, M.I. Jord<strong>an</strong>, M.J. Beal, <strong>an</strong>d D.M. Blei. Hierarchical dirichlet processes. Journal ofthe Americ<strong>an</strong> Statistical Association, 101(476):1566–1581, 2006.24


ADirichlet Process <strong>an</strong>d Stick Breaking ProcessA.1 Dirichlet ProcessBefore introducing the Dirichlet process, the definition of the Dirichlet distribution is thefollowing:Definition The Dirichlet distribution is denoted by Dir(α), where α is a K-dimensionalvector of positive values. Each sample x from Dir(α) is a K-dimensional vector <strong>with</strong> x i ∈ (0, 1)∑<strong>an</strong>d K x i = 1. The probability density function isi=1p(x | α) =∑Γ( K α i ) K∏i=1K∏Γ(α i )i=1i=1x α i−1iA special case is the Beta distribution, where K = 2.∑Define α 0 = K α i <strong>an</strong>d X i as the ith element of the r<strong>an</strong>dom vector X from a Dirichleti=1distribution Dir(α). The r<strong>an</strong>dom variable X i has me<strong>an</strong> α iα 0<strong>an</strong>d vari<strong>an</strong>ce α i(α 0 −α i )α 2 0 (α 0+1) . Hence,we c<strong>an</strong> further decompose α into two parts: a shape parameter G 0 = ( α 1α 0, · · · , α Kα0) <strong>an</strong>d aconcentration parameter α 0 . The shape parameter G 0 represents the center of the r<strong>an</strong>domvector X <strong>an</strong>d the concentration parameter α 0 controls how close X is to G 0 .The Dirichlet distribution is conjugate to the multi-nominal distribution in the followingsense: ifX ∼ Dir(α),β = (n 1 , . . . , n K ) | X ∼ Mult(X),25


∑where n i is the number of occurrences of i in a sample of n = K n i points from the discretedistribution on {1, · · · , K} defined by X. Then,i=1X | β = (n 1 , . . . , n K ) ∼ Dir(α + β).This relationship is used in Bayesi<strong>an</strong> statistics to estimate the hidden parameters X, given acollection of n samples. Intuitively, if the prior is represented as Dir(α), then Dir(α + β) isthe posterior following a sequence of observations <strong>with</strong> histogram β.The Dirichlet process was introduced by Ferguson (1973) as the extension of the Dirichletdistribution from finite dimensions to infinite dimensions. It is a distribution of distributions<strong>an</strong>d has two parameters: the shape parameter G 0 is a distribution over a sample space Ω <strong>an</strong>dthe concentration parameter α 0 is a positive scalar. They have similar interpretation as theircounterparts in the Dirichlet distribution. The formal definition is the following:Definition The Dirichlet process over a set Ω is a stochastic process whose sample path is aprobability distribution over Ω. For a r<strong>an</strong>dom distribution F distributed according to a Dirichletprocess DP(α 0 , G 0 ), given <strong>an</strong>y finite measurable partition A 1 , A 2 , · · · , A K of the samplespace Ω, the r<strong>an</strong>dom vector (F (A 1 ), · · · , F (A K )) is distributed as a Dirichlet distribution <strong>with</strong>parameters (α 0 G 0 (A 1 ), · · · , α 0 G 0 (A K )).Use the results form the Dirichlet distribution, for <strong>an</strong>y measurable set A, the r<strong>an</strong>domvariable F (A) has me<strong>an</strong> G 0 (A) <strong>an</strong>d vari<strong>an</strong>ce G 0(A)(1−G 0 (A))α 0 +1. The me<strong>an</strong> implies the shapeparameter G 0 represents the center of a r<strong>an</strong>dom distribution F drawn from a Dirichlet processDP(α 0 , G 0 ). Define a i ∼ F as <strong>an</strong> observation drawn from the distribution F . Because bydefinition P (a i ∈ A | F ) = F (A), we c<strong>an</strong> derive P (a i ∈ A | G 0 ) = E(P (a i ∈ A | F ) | G 0 ) =E(F (A) | G 0 ) = G 0 (A). Hence, the shape parameter G 0 is also the marginal distribution of<strong>an</strong> observation a i . The vari<strong>an</strong>ce implies the concentration parameter α 0 controls how close ther<strong>an</strong>dom distribution F is to the shape parameter G 0 . The larger α 0 is, the more likely F isclose to G 0 , <strong>an</strong>d vice versa.26


Suppose there are n observations, a = (a 1 , · · · , a n ), drawn from the distribution F . Usen∑δ ai (A j ) toi=1represent the number of a i in set A j , where A 1 , · · · , A K is a measurable partition of thesample space Ω <strong>an</strong>d δ ai (A j ) is the Dirac measure, where⎧⎪⎨ 1 if a i ∈ A jδ ai (A j ) =.⎪⎩ 0 if a i /∈ A j( n∑)n∑Conditional on (F (A 1 ), · · · , F (A K )), the vector δ ai (A 1 ), · · · , δ ai (A K ) has a multinominaldistribution. By the conjugacy of Dirichlet distribution to the multi-nominali=1i=1distribution,the posterior distribution of (F (A 1 ), · · · , F (A K )) is still a Dirichlet distribution(F (A 1 ), · · · , F (A K )) | a ∼ Dir(α 0 G 0 (A 1 ) +n∑δ ai (A 1 ), · · · , α 0 G 0 (A K ) +i=1)n∑δ ai (A K )Because this result is valid for <strong>an</strong>y finite measurable partition, the posterior of F is still Dirichletprocess by definition, <strong>with</strong> new parameters α ∗ 0 <strong>an</strong>d G∗ 0 , wherei=1α ∗ 0 = α 0 + nG ∗ 0 = α 0α 0 + n G 0 +nα 0 + nThe posterior shape parameter, G ∗ 0 , is the mixture of the prior <strong>an</strong>d the empirical distributionimplied by observations. As n → ∞, the shape parameter of the posterior converges tothe empirical distribution. The concentration parameter α ∗ 0n∑i=1δ ain→ ∞ implies the posterior of Fconverges to the empirical distribution <strong>with</strong> probability one. Ferguson (1973) showed that ar<strong>an</strong>dom distribution drawn from a Dirichlet process is almost sure discrete, although the shapeparameter G 0 c<strong>an</strong> be continuous.27


A.2 Stick breaking processFor a r<strong>an</strong>dom distribution F ∼ DP(α 0 , G 0 ), because F is almost surely discrete, it c<strong>an</strong> berepresented by two parts: different values θ i ’s <strong>an</strong>d their corresponding probabilities p i ’s, wherei = 1, 2, · · · . Sethuram<strong>an</strong> (1994) found the stick breaking representation of the Dirichlet processby writing F ≡ (θ, p), where θ ≡ (θ 1 , θ 2 , · · · ) ′ , p ≡ (p 1 , p 2 , · · · ) ′ ∑<strong>with</strong> p i > 0 <strong>an</strong>d ∞ p i = 1. TheF ∼ DP(α 0 , G 0 ) c<strong>an</strong> be generated byi=1V iiid∼ Beta(1, α 0 ) (20)∏i−1p i = V i (1 − V j ) (21)j=1θ iiid∼ G 0 (22)where i = 1, 2, · · · . In this representation, p <strong>an</strong>d θ are generated independently. The processgenerating p, (20) <strong>an</strong>d (21), is called the stick breaking process <strong>an</strong>d denoted by p ∼ SBP(α 0 ).The name comes after the p i ’s generation. For each i, the remaining probability, 1 − i−1 ∑p j ,is sliced by a proportion of V i <strong>an</strong>d given to p i . It’s like breaking a stick <strong>an</strong> infinite number oftimes.j=1BBlock samplerB.1 Sample (S, I) | Θ, P, YS | Θ, P, Y is sampled by the forward filter <strong>an</strong>d backward sampler of Chib (1996).I is introduced to facilitate the π 0 sampling. From (11) <strong>an</strong>d (12), the filtered distributionof π j conditional on S 1,t <strong>an</strong>d π 0 is a Dirichlet distribution:()π j | S 1,t , π 0 ∼ Dir c(1 − ρ)π 01 + n (t)j1 , · · · , c(1 − ρ)π 0j + cρ + n (t)jj , · · · , c(1 − ρ)π 0L + n (t)jLwhere n (t)jiis the number of {τ | s τ = i, s τ−1 = j, τ ≤ t}. Integrating out π j , the conditional28


distribution of s t+1 given S 1,t <strong>an</strong>d π 0 is:p(s t+1 = i | s t = j, S 1,t , π 0 ) ∝ c(1 − ρ)π 0i + cρδ j (i) + n (t)jiConstruct a variable I t <strong>with</strong> a Bernoulli distribution⎧∑ ⎪⎨ cρ + L n (t)jiif I t+1 = 0p(I t+1 | s t = j, S 1,t , π 0 ) ∝j=1⎪⎩c(1 − ρ) if I t+1 = 1<strong>an</strong>d the conditional distribution:p(s t+1 = i | I t+1 = 0, s t = j, S t , β) ∝ n (t)ji+ cρδ j (i)p(s t+1 = i | I t+1 = 1, s t = j, S t , β) ∝ π 0iThis construction preserves the same conditional distribution of s t+1 given S 1,t <strong>an</strong>d π 0 . Tosample I | S, use the Bernoulli distribution:c(1 − ρ)π 0iI t+1 | s t+1 = i, s t = j, π 0 ∼ Ber(n (t)).ji+ cρδ j (i) + c(1 − ρ)π 0iB.2 Sample (Θ, P, π 0 ) | S, I, YAfter sampling I <strong>an</strong>d S, write m i = ∑ I t . By construction, the conditional posterior of π 0s t=igiven S <strong>an</strong>d I only depends on I <strong>an</strong>d is a Dirichlet distribution by conjugacy:π 0 | S, I ∼ Dir( γ L + m 1, · · · , γ L + m L)This approach of sampling π 0 is simpler th<strong>an</strong> Fox et al. (2011).Conditional on π 0 <strong>an</strong>d S, the sampling of π j is straightforward by conjugacy:π j | π 0 , S ∼ Dir(c(1 − ρ)π 01 + n j1 , · · · , c(1 − ρ)π 0j + cρ + n jj , · · · , c(1 − ρ)π 0L + n jL )29


where n ji is the number of {τ | s τ = i, s τ−1 = j}.Sampling Θ | S, Y uses the results of regular linear models. The prior is:(ϕ i , σ −2i) ∼ NG(ϕ, H, χ 2 , ν 2 ).By conjugacy, the posterior is:<strong>with</strong>(ϕ i , σi−2 ) | S, Y ∼ NG(ϕ i , H i , χ i2 , ν i2 )ϕ i = H −1i (Hϕ + X ′ iY i )H i = H + X ′ iX iχ i = χ + Y ′i Y i + ϕ ′ Hϕ − ϕ ′ Hϕν i = ν + n iwhere Y i is the collection of y t in regime i. x t = (1, y t−1 , · · · , y t−q ). X i <strong>an</strong>d n i are the collectionof x t <strong>an</strong>d the number of observations in regime i, respectively.B.3 Sample (ϕ, H, χ) | S, Θ, νThe conditional posterior is:ϕ, H | {ϕ i , σ i } K i=1 ∼ NW(m 1 , τ 1 , A 1 , a 1 )30


where K is the number of active regimes, <strong>with</strong> which at least one data point is associated. ϕ i<strong>an</strong>d σ i are the parameters of regime i.m 1 =τ 1 =A 1 =1τ0 −1 ∑+ K σi−2i=11τ −10 + K ∑(i=1A −10 +a 1 = a 0 + K.σ −2iK∑i=1(τ −10 m 0 +K∑i=1)σi −2 ϕ iσi−2 ϕ i ϕ ′ i + τ0 −1 m 0m ′ 0 − τ1 −1 m 1m ′ 1) −1The conditional posterior of χ is:χ | ν, {σ i } K i=1 ∼ G(d 1 /2, c 1 /2)<strong>with</strong> d 1 = d 0 + K ∑i=1σ −2i<strong>an</strong>d c 1 = c 0 + Kν.B.4 Sample ν | χ, S, ΘThe conditional posterior of ν has no regular density form:p(ν | χ, {σ i } K i=1) ∝() K ((χ/2) ν/2 K) ν/2∏σi−2 exp{− ν }.Γ(ν/2)ρi=1νThe Metropolis-Hastings method is applied to sample ν.distribution:{<strong>with</strong> accept<strong>an</strong>ce probability minν | ν ′ ∼ G( ζ νν ′ , ζ ν)Draw a new ν from a proposal}, where ν ′ is the value from the pre-1, p(ν|χ,{σ i} K i=1 )f G(ν ′ ; ζ νν ,ζ ν )p(ν ′ |χ,{σ i } K i=1 )f G(ν; ζ νν ′ ,ζ ν )vious sweep. ζ ν is fine tuned to produce a reasonable accept<strong>an</strong>ce rate around 0.5, as suggestedby Roberts et al. (1997) <strong>an</strong>d Müller (1991).31

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!