Prediction with Mixture Autoregressive Models

Georgi N. Boshnakov

First version: 31 January 2006

Research Report No. 6, 2006, Probability and Statistics Group
School of Mathematics, The University of Manchester


Prediction with mixture autoregressive models

Georgi N. Boshnakov
Mathematics Department
University of Manchester Institute of Science and Technology
P O Box 88
Manchester M60 1QD, UK
E-mail: georgi.boshnakov@umist.ac.uk
Tel: (+44) (0)161 200 3684
Fax: (+44) (0)161 200 3669

Abstract

Mixture autoregressive (MAR) models have the attractive property that the shape of the conditional distribution of a forecast depends on the recent history of the process. In particular, it may have a varying number of modes over time. We show that the distributions of the multi-step predictors in MAR models are also mixtures and specify them analytically. In the important case when the original MAR model is a mixture of normal or, more generally, α-stable distributions, the multi-step distributions are also mixtures of normal (respectively α-stable) distributions. The latter provide a framework for introducing (possibly skewed) multi-modal components with heavy tails. It is valuable to know that these important classes of mixtures are "closed" with respect to prediction, since this allows the predictive distributions to be derived by simple arithmetic manipulation of the parameters of the original distributions. Our approach is applicable to other models as well.

Keywords: mixture autoregression, prediction, time series, stable distributions


1 Introduction

Mixture autoregressive models are studied in Wong and Li (2000) and Wong (1998). This is a relatively simple class of models having the attractive property that the shape of the conditional distribution of a forecast depends on the recent history of the process. In particular, it may have a varying number of modes over time. Modes are intuitive, and a simple graph of a conditional distribution of a predictor having two modes may be far more useful than a report of the forecast (conditional mean) and its (conditional) variance, simply because the qualitative features are more stable than parameter estimates.

Wong and Li (2000) note that the multi-step conditional distributions of predictors in MAR models are analytically intractable and resort to Monte Carlo simulations. Our aim is to show that the distributions of the multi-step predictors in MAR models are also mixtures and to specify them analytically. Our approach is also applicable to switching-regime autoregressive models, where the sequence of random variables used to choose a component (regime) is a Markov process rather than a sequence of independent variables.

An important methodological question is whether analytical results matter, since the number of components in the distribution of the h-step predictor grows quickly with h and with the number of components in the MAR model. A simulation involving thousands or even millions of replications can safely be considered a single-step calculation. Normally the simulation approach avoids submersion in analytical calculations and provides a common framework for the solution of entire classes of problems. This is, however, no longer a prerogative of simulation methods. Systems for symbolic computation are becoming mainstream tools, e.g., in financial applications. In such systems, computing the necessary distributions analytically is hardly more difficult than setting up a simulation. For example, computation of integrals reduces to typing them in the syntax of the corresponding symbolic system (at least in principle).

While it is true that a mixture of, say, 2^7 components (corresponding to a 7-lags-ahead predictor from a two-component MAR model) is hardly suitable for presentation to humans, it can be viewed as a black box and simply used for further calculations, plotting or decision making. After all, we do not ask to see the random sequences used in a simulation, as long as we believe they have the required properties. Moreover, if the original MAR model is a mixture of normal or, more generally, α-stable distributions, then the multi-step distributions are also mixtures of normal (respectively α-stable) distributions. The latter provide a framework for introducing (possibly skewed) multi-modal components with heavy tails. It is valuable to know that these important classes of mixtures are "closed" with respect to prediction, since this allows the distributions to be derived by simple arithmetic manipulation of the parameters of the original distributions.

We also demonstrate that (conditional) characteristic functions are a natural tool for calculations in this type of model. Conditional means and variances are not sufficient for prediction (even when they exist) in the presence of severe deviations from normality. On the other hand, their calculation is very intuitive. A natural alternative to them is the conditional characteristic function. A characteristic function contains the entire distributional information. If needed, the conditional mean, variance or other moments can be obtained from it. Moreover, unlike variances and means, characteristic functions always exist and hence provide more general results. This is not only of academic interest, since the idea of a non-existing variance, or mean, filters down to applications and will, apparently, gain the same level of acceptance as the fact that many common distributions admit arbitrarily large values of the quantities of interest. In the important class of α-stable distributions with 0 < α < 2, moments of order greater than or equal to α do not exist (see Zolotarev (1986)). Time series models based on stable distributions are studied actively (see Samorodnitsky and Taqqu (1994), Rachev and Mittnik (2000) and the references therein). Also, it is difficult to obtain manageable forms of the probability densities of stable distributions (except in some special cases). On the other hand, their characteristic functions have a remarkably simple form and are thus the natural tool to use. Some of the most efficient methods for simulation of α-stable distributions are based on their characteristic functions, not on their densities.

2 The MAR model

A process {y(t)} is said to be a mixture autoregressive process if the conditional distribution function of y(t), given the information from the past of the process, is a mixture of the following form:

F_{t|t-1}(x) \equiv \Pr(y(t) \le x \mid \mathcal{F}_{t-1})
            = \sum_{k=1}^{g} \pi_k F_k\!\left( \frac{x - \phi_{k,0} - \sum_{i=1}^{p_k} \phi_{k,i}\, y(t-i)}{\sigma_k} \right),    (1)


or, alternatively, if {y(t)} evolves according to the equation

y(t) =
\begin{cases}
\phi_{1,0} + \sum_{i=1}^{p_1} \phi_{1,i}\, y(t-i) + \sigma_1 \varepsilon_1(t), & \text{with probability } \pi_1, \\
\phi_{2,0} + \sum_{i=1}^{p_2} \phi_{2,i}\, y(t-i) + \sigma_2 \varepsilon_2(t), & \text{with probability } \pi_2, \\
\quad \vdots \\
\phi_{g,0} + \sum_{i=1}^{p_g} \phi_{g,i}\, y(t-i) + \sigma_g \varepsilon_g(t), & \text{with probability } \pi_g,
\end{cases}    (2)

where

• F_t is the sigma field generated by the process {y(t)} up to and including time t;
• g is a positive integer, the number of components in the model;
• the probabilities π_k > 0, k = 1, . . . , g, with \sum_{k=1}^{g} π_k = 1, define a discrete distribution, π;
• for each k = 1, . . . , g:
  – the distribution function of ε_k(t), the kth noise component, is F_k;
  – the kth (autoregressive) component, y_k(t), is of order p_k ≥ 1;
  – φ_{k,i}, i = 0, 1, . . . , p_k, are the autoregressive coefficients of the kth component;
  – the distribution whose distribution function is F_k has density function f_k and characteristic function ϕ_k; the "location" and "scale" parameters of this distribution are assumed to be zero and one, respectively;
  – σ_k > 0 is a scaling for the kth noise component.

It is convenient to set p = max_{1 ≤ k ≤ g} p_k and φ_{k,i} = 0 for i > p_k. We assume also that t > p, thus avoiding the necessity to specify the mechanism generating y(t) for t = 1, . . . , p.

So, at each time t one of g autoregressive-like equations is picked up at random to generate y(t). The definition of the MAR model through equation (1) is more compact and somewhat more general than that based on (2), since it does not introduce explicitly the noise components {ε_k(t)} and thus avoids the discussion of their dependence structure.
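To make the generating mechanism in (2) concrete, here is a small Python sketch that simulates a two-component Gaussian MAR process. It is only an illustration: the function name simulate_mar and all parameter values are ours, not taken from any model in this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative two-component Gaussian MAR model (made-up values, not a fitted model).
    pi    = np.array([0.6, 0.4])                            # mixing probabilities pi_k
    phi0  = np.array([0.0, 0.5])                            # intercepts phi_{k,0}
    phi   = [np.array([0.5, 0.2]), np.array([1.1, -0.4])]   # AR coefficients phi_{k,1..p_k}
    sigma = np.array([1.0, 2.0])                            # scale parameters sigma_k

    def simulate_mar(n, y_init):
        """Generate n values of y(t) by drawing z_t from pi and then applying
        the z_t-th autoregression in equation (2)."""
        y = list(y_init)                                      # starting values y(1), ..., y(p)
        for _ in range(n):
            k = rng.choice(len(pi), p=pi)                     # z_t = k with probability pi_k
            lags = np.array(y[-len(phi[k]):][::-1])           # y(t-1), y(t-2), ..., y(t-p_k)
            mu_k = phi0[k] + phi[k] @ lags                    # mu_k(t)
            y.append(mu_k + sigma[k] * rng.standard_normal()) # add sigma_k * eps_k(t)
        return np.array(y[len(y_init):])

    series = simulate_mar(500, y_init=[0.0, 0.0])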


Equation (2) can be written in a more compact form, useful for calculations. Define

\mu_k(t) = \phi_{k,0} + \sum_{i=1}^{p_k} \phi_{k,i}\, y(t-i),  \qquad k = 1, \dots, g,

y_k(t) = \phi_{k,0} + \sum_{i=1}^{p_k} \phi_{k,i}\, y(t-i) + \sigma_k \varepsilon_k(t) = \mu_k(t) + \sigma_k \varepsilon_k(t),  \qquad k = 1, \dots, g.

Let {z_t} be an iid sequence of random variables with distribution π, such that Pr{z_t = k} = π_k, k = 1, . . . , g. Then y(t) is given by component k in equation (2) if z_t = k. Now equation (2) can be written as

y(t) = \phi_{z_t,0} + \sum_{i=1}^{p} \phi_{z_t,i}\, y(t-i) + \sigma_{z_t} \varepsilon_{z_t}(t)    (3)
     = \mu_{z_t}(t) + \sigma_{z_t} \varepsilon_{z_t}(t).    (4)

The conditional density corresponding to F_{t|t-1}(x) is

f_{t|t-1}(x) = \sum_{k=1}^{g} \frac{\pi_k}{\sigma_k}\, f_k\!\left( \frac{x - \phi_{k,0} - \sum_{i=1}^{p_k} \phi_{k,i}\, y(t-i)}{\sigma_k} \right).    (5)

The functions F_{t|t-1}(x) and f_{t|t-1}(x) are often evaluated at observed values of y(t), e.g.,

F_{t|t-1}(y(t)) = \sum_{k=1}^{g} \pi_k F_k\!\left( \frac{y(t) - \phi_{k,0} - \sum_{i=1}^{p_k} \phi_{k,i}\, y(t-i)}{\sigma_k} \right).

The meaning of y(t) (random variable or observed value) is normally obvious from the context.

Equation (3) (or, equivalently, (2)) allows for studying the MAR model using the properties of the conditional expectation operator, i.e., using probabilistic reasoning. Alternatively, using the conditional distribution function directly could be called "analytical" reasoning. Although the two approaches are equivalent, each of them has its own merits.

One inconvenience in the representation of the MAR model through equation (3) is that the processes ε_k(t) and z_t are introduced explicitly and therefore their dependence structure has to be specified explicitly. Weaker conditions are possible, but we assume that the ε_k(t) are jointly independent and are also independent of past y's, in the sense that for each t the σ-field generated by the set of random variables {ε_k(t + n), n ≥ 1, 1 ≤ k ≤ g} is independent of F_t. Further, we assume that the choice of the component at time t (i.e., z_t) does not depend on F_{t-1} and {ε_k(t), t ≥ 1, 1 ≤ k ≤ g}.
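For Gaussian noise components, the one-step conditional density (5) is simply a weighted sum of normal densities, one per component. A minimal sketch (the helper name and argument layout are ours):

    import numpy as np
    from scipy.stats import norm

    def one_step_density(x, past, pi, phi0, phi, sigma):
        """Evaluate f_{t|t-1}(x) of equation (5) for Gaussian noise components.
        `past` holds the observed values [y(t-1), y(t-2), ...]."""
        x = np.asarray(x, dtype=float)
        dens = np.zeros_like(x)
        for k in range(len(pi)):
            p_k = len(phi[k])
            mu_k = phi0[k] + np.dot(phi[k], past[:p_k])   # mu_k(t) = phi_{k,0} + sum_i phi_{k,i} y(t-i)
            dens += pi[k] * norm.pdf(x, loc=mu_k, scale=sigma[k])
        return dens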


For many applications it is sufficient to consider noise components having a standard normal distribution, i.e., ϕ_k(s) = e^{-s²/2}. Another useful class is the class of α-stable distributions; e.g., ϕ_k(s) = e^{-|s|^α} is the characteristic function of a symmetric α-stable noise component.

The requirement on the location (shift) and scale parameters of the noise components is justified by the presence of φ_{k,0} and σ_k in (2). For symmetric uni-modal distributions the median or the mean (if it exists) is normally chosen as the location parameter. Similarly, the standard deviation (if it exists) can be chosen as the scale parameter. The choice may also depend on the family of distributions. For example, when the normal distribution is considered part of the α-stable family with α = 2, a scale parameter equal to one gives the characteristic function e^{-|s|²}, but this is a normal distribution with standard deviation √2, not 1.

3 Representation of y(t) for h-step prediction

Equation (3) can be used for one-step prediction. For a longer horizon, say h, we need a generalization of (3) which does not contain y(t-1), . . . , y(t-h+1). This can be obtained by iteratively replacing y(t-1), y(t-2), . . . (see Appendix A). We formulate the result in the following lemma.

Lemma 1  The following representation holds for any h ≥ 1:

y(t) = \mu_{z_t,\dots,z_{t-h+1}}(t) + \sum_{i=0}^{h-1} \theta_i^{(z_t,\dots,z_{t-h+1})} \varepsilon_{z_{t-i}}(t-i),    (6)

where

\mu_{z_t,\dots,z_{t-h+1}}(t) = \beta_0^{(z_t,\dots,z_{t-h+1})} + \sum_{i=1}^{p} \beta_i^{(z_t,\dots,z_{t-h+1})}\, y(t-h+1-i),    (7)

with coefficients obtained from (3)–(4) for h = 1 and defined recursively for h ≥ 2 by

\beta_i^{(z_t,\dots,z_{t-h})} =
\begin{cases}
\beta_0^{(z_t,\dots,z_{t-h+1})} + \beta_1^{(z_t,\dots,z_{t-h+1})} \phi_{z_{t-h},0} & \text{for } i = 0, \\
\beta_{i+1}^{(z_t,\dots,z_{t-h+1})} + \beta_1^{(z_t,\dots,z_{t-h+1})} \phi_{z_{t-h},i} & \text{for } i = 1, \dots, p-1, \\
\beta_1^{(z_t,\dots,z_{t-h+1})} \phi_{z_{t-h},p} & \text{for } i = p,
\end{cases}    (8)

\theta_i^{(z_t,\dots,z_{t-h})} =
\begin{cases}
\theta_i^{(z_t,\dots,z_{t-h+1})} & \text{for } i = 0, \dots, h-1, \\
\beta_1^{(z_t,\dots,z_{t-h+1})} \sigma_{z_{t-h}} & \text{for } i = h.
\end{cases}    (9)
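The recursion (8)–(9) is easy to implement directly. The sketch below computes, for one component path (k_1, . . . , k_h) = (z_t, . . . , z_{t-h+1}), the coefficients β_0, . . . , β_p and θ_0, . . . , θ_{h-1}; the function name is ours, and phi is assumed to be padded with zeros so that every component has p coefficients.

    import numpy as np

    def path_coefficients(path, phi0, phi, sigma, p):
        """Coefficients of Lemma 1 for the component path (k_1, ..., k_h).
        Returns (beta, theta) with beta = (beta_0, ..., beta_p) and
        theta = (theta_0, ..., theta_{h-1}); phi[k] must have length p."""
        k1 = path[0]
        beta = np.concatenate(([phi0[k1]], phi[k1]))     # h = 1: beta_0 = phi_{k,0}, beta_i = phi_{k,i}
        theta = [sigma[k1]]                              # h = 1: theta_0 = sigma_k
        for k in path[1:]:                               # extend the path by z_{t-h} = k, equations (8)-(9)
            new_beta = np.empty_like(beta)
            new_beta[0] = beta[0] + beta[1] * phi0[k]                    # i = 0
            new_beta[1:p] = beta[2:p + 1] + beta[1] * phi[k][:p - 1]     # i = 1, ..., p-1
            new_beta[p] = beta[1] * phi[k][p - 1]                        # i = p
            theta.append(beta[1] * sigma[k])                             # new theta_h = beta_1 * sigma_{z_{t-h}}
            beta = new_beta
        return beta, np.array(theta)

For h = 2 and a path (k, l) this reproduces equation (10) below: β_0 = φ_{k,0} + φ_{k,1}φ_{l,0}, θ_0 = σ_k and θ_1 = φ_{k,1}σ_l.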


The coefficients θ_i^{(z_t,...,z_{t-h+1})} and β_i^{(z_t,...,z_{t-h+1})} are functions of z_t, . . . , z_{t-h+1} and the parameters φ_{k,i} and σ_k of the MAR model. For example, for h = 2 we have

\beta_i^{(k,l)} =
\begin{cases}
\phi_{k,0} + \phi_{k,1}\phi_{l,0} & \text{for } i = 0, \\
\phi_{k,i+1} + \phi_{k,1}\phi_{l,i} & \text{for } 1 \le i \le p-1, \\
\phi_{k,1}\phi_{l,p} & \text{for } i = p,
\end{cases}
\qquad
\theta_i^{(k,l)} =
\begin{cases}
\sigma_k & \text{for } i = 0, \\
\phi_{k,1}\sigma_l & \text{for } i = 1.
\end{cases}    (10)

µ_k(t) has the meaning of a location parameter of the (conditional) distribution of the kth component of the MAR model. In particular, it is its conditional expectation when the mean E ε_k(t) exists (see Appendix A), i.e.,

E(y_k(t) \mid \mathcal{F}_{t-1}) = \mu_k(t).

The variables y_k(t) are somewhat artificial, but an important consequence of the last equation is that

\mu_k(t) = E(y(t) \mid \mathcal{F}_{t-1}, z_t = k),  \qquad k = 1, \dots, g.

More generally, for any horizon h ≥ 1 and any h-tuple k_1, . . . , k_h such that 1 ≤ k_i ≤ g for 1 ≤ i ≤ h,

\mu_{k_1,\dots,k_h}(t) = E(y(t) \mid \mathcal{F}_{t-h}, z_t = k_1, z_{t-1} = k_2, \dots, z_{t-h+1} = k_h)
                       = \beta_0^{(k_1,\dots,k_h)} + \sum_{i=1}^{p} \beta_i^{(k_1,\dots,k_h)}\, y(t-h+1-i),

when the means of the noise components exist, but we should stress again that our results do not rely on the existence of any moments.

4 Conditional characteristic functions for the MAR model

The (one-step) conditional characteristic function of y(t) is

\varphi_{t|t-1}(s) \equiv E(e^{isy(t)} \mid \mathcal{F}_{t-1}) = \sum_{k=1}^{g} \pi_k e^{is\mu_k(t)} \varphi_k(\sigma_k s).    (11)


The two-step conditional characteristic function can be found by applying the iterated rule for conditional expectations and replacing y(t-1) using equations (11) and (3) (see Appendix B):

\varphi_{t|t-2}(s) \equiv E(e^{isy(t)} \mid \mathcal{F}_{t-2}) = E\big(E(e^{isy(t)} \mid \mathcal{F}_{t-1}) \mid \mathcal{F}_{t-2}\big)
                  = \sum_{k=1}^{g} \sum_{l=1}^{g} \pi_k \pi_l\, e^{is\mu_{k,l}(t)} \varphi_k(\sigma_k s) \varphi_l(\phi_{k,1}\sigma_l s),    (12)

where µ_{k,l}(t) is a linear combination of y(t-i), i ≥ 2 (see equation (19)). Thus, the conditional distribution of y(t) given F_{t-2} is a mixture of g² components with mixing probabilities π_k π_l. For Gaussian noise components we have

\varphi_k(\sigma_k s)\varphi_l(\sigma_l \phi_{k,1} s) = e^{-\sigma_k^2 s^2/2}\, e^{-\sigma_l^2 \phi_{k,1}^2 s^2/2} = e^{-(\sigma_k^2 + \sigma_l^2 \phi_{k,1}^2) s^2/2},

which together with equation (12) shows that the conditional distribution of the two-step predictor is a mixture of normal distributions with variances σ_k² + σ_l²φ_{k,1}² and means µ_{k,l}(t). It is worth noting that the calculations for the general case are not more complicated than those for the Gaussian case.

The conditional characteristic functions for longer horizons can be obtained recursively using equation (6), Lemma 1 and the iterated rule for conditional expectations (see Appendix B):

E(e^{isy(t)} \mid \mathcal{F}_{t-h}) = E\big(E(e^{isy(t)} \mid \mathcal{F}_{t-h}, z_t, \dots, z_{t+1-h}) \mid \mathcal{F}_{t-h}\big)
    = \sum_{k_1,\dots,k_h=1}^{g} (\pi_{k_1} \cdots \pi_{k_h})\, e^{is\,\mu_{k_1,\dots,k_h}(t)} \prod_{i=0}^{h-1} \varphi_{k_{i+1}}\!\big(\theta_i^{(k_1,\dots,k_h)} s\big).    (13)

It is straightforward to do this calculation for the case of dependent z_t forming a Markov chain. The notation will be somewhat more complicated, but the end result may even contain fewer terms if the transition matrix of the Markov chain contains zeroes. Also, joint conditional distributions can be found using the same technique.

The product of the noise characteristic functions in equation (13) shows that if the components of the MAR model are normal, then the conditional distributions of the predictors for all lags are mixtures of normals. More generally, if the ε_k(t) are α-stable, S_α(1, β_k, 0), k = 1, . . . , g, then the product \prod_{i=0}^{h-1} \varphi_{k_{i+1}}(\theta_i^{(k_1,\dots,k_h)} s) is the characteristic function of an α-stable distribution (see Section 5 and Appendix C).
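Computationally, (13) is a sum over the g^h component paths. The sketch below evaluates the h-step conditional characteristic function on a grid of s values, reusing the hypothetical path_coefficients helper from Section 3; cf[k] can be any characteristic function, e.g. lambda u: np.exp(-u**2 / 2) for standard normal noise (all names are ours).

    import itertools
    import numpy as np

    def h_step_cf(s, h, past, pi, phi0, phi, sigma, cf, p):
        """Conditional characteristic function (13)/(14) of the h-step predictor.
        `past` = [y(t), y(t-1), ...] (most recent first); `cf[k]` is phi_k."""
        s = np.asarray(s, dtype=complex)
        total = np.zeros_like(s)
        for path in itertools.product(range(len(pi)), repeat=h):      # all (k_1, ..., k_h)
            beta, theta = path_coefficients(path, phi0, phi, sigma, p)
            mu = beta[0] + np.dot(beta[1:], past[:p])                  # mu_{k_1,...,k_h}(t+h)
            weight = np.prod([pi[k] for k in path])                    # pi_{k_1} * ... * pi_{k_h}
            comp = weight * np.exp(1j * s * mu)
            for j, k in enumerate(path):                               # product of phi_{k_{j+1}}(theta_j s)
                comp = comp * cf[k](theta[j] * s)
            total += comp
        return total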


5 Conditional distributions of predictors

We summarise below the results of Section 4 and derive some useful corollaries. Recall that ε_k(t) and y_k(t) are the kth noise component and the kth component of the MAR model, respectively.

Theorem 1  For each h ≥ 1 the conditional characteristic function, ϕ_{t+h|t}(s) ≡ E(e^{isy(t+h)} | F_t), of the h-step predictor at time t of the MAR process (2) is given by

\varphi_{t+h|t}(s) = \sum_{k_1,\dots,k_h=1}^{g} (\pi_{k_1} \cdots \pi_{k_h})\, e^{is\,\mu_{k_1,\dots,k_h}(t+h)} \prod_{i=0}^{h-1} \varphi_{k_{i+1}}\!\big(\theta_i^{(k_1,\dots,k_h)} s\big),    (14)

where

\mu_{k_1,\dots,k_h}(t+h) = \beta_0^{(k_1,\dots,k_h)} + \sum_{i=1}^{p} \beta_i^{(k_1,\dots,k_h)}\, y(t+1-i).

The expression on the right-hand side of (14) represents a mixture. Hence, to stress the mixture property, this theorem can be stated alternatively as follows.

Theorem 2  The conditional distribution of the h-step predictor at time t of the MAR process (2) is a mixture of g^h components. The probability associated with the (k_1, . . . , k_h)th component is π_{k_1}···π_{k_h} and its conditional characteristic function is

e^{is\,\mu_{k_1,\dots,k_h}(t+h)} \prod_{i=0}^{h-1} \varphi_{k_{i+1}}\!\big(\theta_i^{(k_1,\dots,k_h)} s\big).

Note that the exponent, e^{is(...)}, is the only term in the characteristic function of the (k_1, . . . , k_h)th component that depends on the past of the process {y(t)}. Since such a factor corresponds to a shift of the distribution, we have the following result.

Corollary 1  Only the location parameter of the conditional distribution of the (k_1, . . . , k_h)th component of the h-step predictor depends on the past of the process {y(t)}.

It should be stressed, however, that this qualitative property holds for the conditional distributions of the individual components, not for the conditional distribution of y(t+h) itself. The shape and other properties of the latter (such as Var(y(t+h) | F_t)) do depend on y(t), y(t-1), etc.

Since we have assumed that the location parameters of the noise components are zero, it is tempting to state that the location parameter of the distribution of the (k_1, . . . , k_h)th component of the h-step predictor at time t is µ_{k_1,...,k_h}(t+h), but this is not always the case (see Corollary 3, equation (15), below for an example).
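In computational terms, Theorem 2 says that the h-step predictive distribution is fully described by a list of g^h triples (weight, location, vector of θ coefficients), one per component path. A small sketch of that bookkeeping, again using the hypothetical path_coefficients helper:

    import itertools
    import numpy as np

    def predictor_components(h, past, pi, phi0, phi, sigma, p):
        """List the components of the h-step predictor (Theorem 2): for each path
        (k_1, ..., k_h) return its weight pi_{k_1}...pi_{k_h}, its location
        mu_{k_1,...,k_h}(t+h) and its vector of theta coefficients."""
        components = []
        for path in itertools.product(range(len(pi)), repeat=h):
            beta, theta = path_coefficients(path, phi0, phi, sigma, p)
            weight = np.prod([pi[k] for k in path])
            mu = beta[0] + np.dot(beta[1:], past[:p])   # only this term depends on the past (Corollary 1)
            components.append((weight, mu, theta))
        return components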


In general, the distributions of the components of the h-step predictors depend both on h and on the noise distributions in the specification of the MAR model and do not necessarily belong to a common class. However, if the components of the MAR model are stable, then the components of the predictors are stable as well. Hence we deduce the following corollaries from Theorems 1 and 2 for this important case. The case of normal components is treated separately, since the other stable distributions do not have finite variance.

Corollary 2  If the components of the MAR model (2) are normal, then for each h ≥ 1 the components of the h-step ahead predictor at time t are also normal. The conditional mean and variance of the (k_1, . . . , k_h)th component are µ_{k_1,...,k_h}(t+h) and \sum_{i=0}^{h-1} \big(\theta_i^{(k_1,\dots,k_h)}\big)^2, respectively.

For the two-step predictors (i.e., for h = 2) these simplify to µ_{k,l}(t+2) and σ_k² + σ_l²φ_{k,1}², respectively.

Corollary 3  If the components of the MAR model (2) are α-stable, S_α(1, β_i, 0), i = 1, . . . , g, with 0 < α ≤ 2, then for each h ≥ 1 the components of the h-step ahead predictor at time t are α-stable. The distribution of the (k_1, . . . , k_h)th component is S_α(σ, β, µ), where

\mu =
\begin{cases}
\mu_{k_1,\dots,k_h}(t+h) & \text{if } \alpha \ne 1, \\
\mu_{k_1,\dots,k_h}(t+h) + \frac{2}{\pi} \sum_{i=0}^{h-1} \beta_{k_{i+1}} \theta_i^{(k_1,\dots,k_h)} \log\big|\theta_i^{(k_1,\dots,k_h)}\big| & \text{if } \alpha = 1,
\end{cases}    (15)

\sigma = \Big( \sum_{i=0}^{h-1} \big|\theta_i^{(k_1,\dots,k_h)}\big|^\alpha \Big)^{1/\alpha},
\qquad
\beta = \frac{\sum_{i=0}^{h-1} \beta_{k_{i+1}} \big|\theta_i^{(k_1,\dots,k_h)}\big|^\alpha \operatorname{sign}\big(\theta_i^{(k_1,\dots,k_h)}\big)}{\sum_{i=0}^{h-1} \big|\theta_i^{(k_1,\dots,k_h)}\big|^\alpha}.    (16)

6 Example

Wong and Li (2000) built a mixture autoregressive model for the daily IBM stock closing price data (Box and Jenkins (1976)) and provided an example of a two-step predictive distribution. Their model has the following parameters: g = 3; π = (0.5439, 0.4176, 0.0385); σ_1 = 4.8227, σ_2 = 6.0082, σ_3 = 18.1716; p_1 = 2, p_2 = 2, p_3 = 1; φ_{k,0} = 0, k = 1, 2, 3; φ_{1,1} = 0.6792, φ_{1,2} = 0.3208, φ_{2,1} = 1.6711, φ_{2,2} = -0.6711, φ_{3,1} = 1. Putting these parameters into (5) gives the one-step conditional density

f_{t+1|t}(x) = 0.000845235\, e^{-0.0015142 (x - y_t)^2}
             + 0.0449924\, e^{-0.0214976 (x - 0.6792 y_t - 0.3208 y_{t-1})^2}
             + 0.0277285\, e^{-0.013851 (x - 1.6711 y_t + 0.6711 y_{t-1})^2}.
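As a quick numerical check, plugging these estimates into the Gaussian form of (5) (for instance via the one_step_density sketch from Section 2) reproduces, up to rounding, the coefficients of the closed-form density just displayed; the short script below computes them directly.

    import numpy as np

    # Parameters of the Wong and Li (2000) model for the IBM data (from the text above).
    pi    = np.array([0.5439, 0.4176, 0.0385])
    sigma = np.array([4.8227, 6.0082, 18.1716])
    phi0  = np.array([0.0, 0.0, 0.0])
    phi   = [np.array([0.6792, 0.3208]),     # component 1 (p_1 = 2)
             np.array([1.6711, -0.6711]),    # component 2 (p_2 = 2)
             np.array([1.0, 0.0])]           # component 3 (p_3 = 1, padded to p = 2)

    # Coefficients pi_k / (sqrt(2*pi) * sigma_k) and exponent factors 1 / (2 * sigma_k^2):
    print(pi / (np.sqrt(2 * np.pi) * sigma))   # approx. 0.04499, 0.02773, 0.000845
    print(1 / (2 * sigma ** 2))                # approx. 0.02150, 0.01385, 0.001514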


Figure 1: Pdf of the 1-step predictor at t = 258 for the IBM data.

From Theorem 1 and Corollary 2 (see also equation (12)), the two-step conditional density is

f_{t+2|t}(x) = 0.0000230104\, e^{-0.000757101 (x - y_t)^2}
             + 0.000530972\, e^{-0.00201992 (x - y_t)^2}
             + 0.000264531\, e^{-0.000850474 (x - y_t)^2}
             + 0.0104655\, e^{-0.00666972 (x - 0.463911 y_t - 0.536089 y_{t-1})^2}
             + 0.000444341\, e^{-0.00141457 (x - 0.6792 y_t - 0.3208 y_{t-1})^2}
             + 0.0188846\, e^{-0.0128023 (x - 0.782113 y_t - 0.217887 y_{t-1})^2}
             + 0.0131094\, e^{-0.0104654 (x - 1.45581 y_t + 0.455811 y_{t-1})^2}
             + 0.000335127\, e^{-0.00136498 (x - 1.6711 y_t + 0.6711 y_{t-1})^2}
             + 0.00708503\, e^{-0.00518551 (x - 2.12148 y_t + 1.12148 y_{t-1})^2}.    (17)

For this series y(258) = 361, y(257) = 399. Fig. 1 and Fig. 2 provide graphs of the conditional densities f_{259|258}(x) and f_{260|258}(x), respectively. Both conditional densities are multi-modal in this case. Wong and Li (2000) provide a graph of the two-step density obtained by simulation. It is (at least visually) indistinguishable from the plot of formula (17) given in Figure 2.
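Formula (17) is easy to evaluate directly. The script below plots it for y(258) = 361, y(257) = 399 and should reproduce Figure 2 (the plotting code and function name are ours).

    import numpy as np
    import matplotlib.pyplot as plt

    def f_two_step(x, yt, ytm1):
        """The two-step predictive density of equation (17)."""
        # (coefficient, exponent factor, coefficient of y_t, coefficient of y_{t-1})
        terms = [
            (0.0000230104, 0.000757101, 1.0,      0.0),
            (0.000530972,  0.00201992,  1.0,      0.0),
            (0.000264531,  0.000850474, 1.0,      0.0),
            (0.0104655,    0.00666972,  0.463911, 0.536089),
            (0.000444341,  0.00141457,  0.6792,   0.3208),
            (0.0188846,    0.0128023,   0.782113, 0.217887),
            (0.0131094,    0.0104654,   1.45581,  -0.455811),
            (0.000335127,  0.00136498,  1.6711,   -0.6711),
            (0.00708503,   0.00518551,  2.12148,  -1.12148),
        ]
        return sum(c * np.exp(-a * (x - b1 * yt - b2 * ytm1) ** 2)
                   for c, a, b1, b2 in terms)

    x = np.linspace(280, 440, 1000)
    plt.plot(x, f_two_step(x, yt=361.0, ytm1=399.0))
    plt.title("Two-step predictive density at t = 258, equation (17)")
    plt.show()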


Figure 2: Exact pdf of the 2-step predictor at t = 258 for the IBM data. Formula (17) is used with y(258) = 361, y(257) = 399.

7 Conclusion

We have shown that the multi-step predictors for a mixture autoregressive model are mixtures and can be computed effectively. Moreover, when the noise components are normal or stable, the predictors remain mixtures of normal, respectively stable, distributions for all lags. We have also demonstrated that the conditional characteristic function is a useful and intuitive instrument for analysis of MAR models. It is well suited for work with stable distributions. At the same time, it is hardly more difficult to deal with than the conditional means.

8 Acknowledgements

I am grateful to Dr Wong for providing me with a copy of the dissertation Wong (1998).


A  Derivation of the representation of y(t) given by Lemma 1

To clarify the ideas we start with the case h = 2. It is necessary to eliminate y(t-1) from the expression for y(t). From equations (3)-(4) we get

y(t) = \mu_{z_t}(t) + \sigma_{z_t} \varepsilon_{z_t}(t)
     = (\mu_{z_t}(t) - \phi_{z_t,1}\, y(t-1)) + \phi_{z_t,1}\, y(t-1) + \sigma_{z_t} \varepsilon_{z_t}(t)
     = (\mu_{z_t}(t) - \phi_{z_t,1}\, y(t-1)) + \phi_{z_t,1}(\mu_{z_{t-1}}(t-1) + \sigma_{z_{t-1}} \varepsilon_{z_{t-1}}(t-1)) + \sigma_{z_t} \varepsilon_{z_t}(t)
     = \mu_{z_t,z_{t-1}}(t) + \phi_{z_t,1}\sigma_{z_{t-1}} \varepsilon_{z_{t-1}}(t-1) + \sigma_{z_t} \varepsilon_{z_t}(t),    (18)

where

\mu_{k,l}(t) \equiv \mu_k(t) - \phi_{k,1}\, y(t-1) + \phi_{k,1} \mu_l(t-1)
             = \phi_{k,0} + \phi_{k,1}\phi_{l,0} + \sum_{i=1}^{p-1} (\phi_{k,i+1} + \phi_{k,1}\phi_{l,i})\, y(t-1-i) + \phi_{k,1}\phi_{l,p}\, y(t-1-p).    (19)

Equations (18)-(19) provide the required representation of y(t) in the form (6), with coefficients given by equation (10).

The above procedure can be carried out for any lag h. Indeed, assume that y(t) is represented in the form (6)–(7) for some h ≥ 2. From equation (6) we get

y(t) = \mu_{z_t,\dots,z_{t-h+1}}(t) + \sum_{i=0}^{h-1} \theta_i^{(z_t,\dots,z_{t-h+1})} \varepsilon_{z_{t-i}}(t-i)
     = \big(\mu_{z_t,\dots,z_{t-h+1}}(t) - \beta_1^{(z_t,\dots,z_{t-h+1})}\, y(t-h)\big) + \beta_1^{(z_t,\dots,z_{t-h+1})}\, y(t-h) + \sum_{i=0}^{h-1} \theta_i^{(z_t,\dots,z_{t-h+1})} \varepsilon_{z_{t-i}}(t-i)
     = \big(\mu_{z_t,\dots,z_{t-h+1}}(t) - \beta_1^{(z_t,\dots,z_{t-h+1})}\, y(t-h)\big) + \beta_1^{(z_t,\dots,z_{t-h+1})} \mu_{z_{t-h}}(t-h)
       + \beta_1^{(z_t,\dots,z_{t-h+1})} \sigma_{z_{t-h}} \varepsilon_{z_{t-h}}(t-h) + \sum_{i=0}^{h-1} \theta_i^{(z_t,\dots,z_{t-h+1})} \varepsilon_{z_{t-i}}(t-i).    (20)

Our aim is to eliminate y(t-h). From equation (7) we get

\mu_{z_t,\dots,z_{t-h+1}}(t) - \beta_1^{(z_t,\dots,z_{t-h+1})}\, y(t-h) = \beta_0^{(z_t,\dots,z_{t-h+1})} + \sum_{i=1}^{p-1} \beta_{i+1}^{(z_t,\dots,z_{t-h+1})}\, y(t-h-i).    (21)


From equation (4) we get

\beta_1^{(z_t,\dots,z_{t-h+1})}\, y(t-h) = \beta_1^{(z_t,\dots,z_{t-h+1})} \big(\mu_{z_{t-h}}(t-h) + \sigma_{z_{t-h}} \varepsilon_{z_{t-h}}(t-h)\big),    (22)

\mu_{z_{t-h}}(t-h) = \phi_{z_{t-h},0} + \sum_{i=1}^{p} \phi_{z_{t-h},i}\, y(t-h-i).    (23)

From equation (20) we can now see that the required representation of y(t) for lag h + 1 is

y(t) = \mu_{z_t,\dots,z_{t-h}}(t) + \beta_1^{(z_t,\dots,z_{t-h+1})} \sigma_{z_{t-h}} \varepsilon_{z_{t-h}}(t-h) + \sum_{i=0}^{h-1} \theta_i^{(z_t,\dots,z_{t-h+1})} \varepsilon_{z_{t-i}}(t-i),    (24)

where

\mu_{z_t,\dots,z_{t-h}}(t) = \big(\mu_{z_t,\dots,z_{t-h+1}}(t) - \beta_1^{(z_t,\dots,z_{t-h+1})}\, y(t-h)\big) + \beta_1^{(z_t,\dots,z_{t-h+1})} \mu_{z_{t-h}}(t-h),    (25)

since the last expression is a linear combination of y(t-i), i ≥ h + 1. To obtain the formulae (8)–(9) we use equations (20), (22) and (23) to simplify equation (25) to

\mu_{z_t,\dots,z_{t-h}}(t) = \beta_0^{(z_t,\dots,z_{t-h+1})} + \sum_{i=1}^{p-1} \beta_{i+1}^{(z_t,\dots,z_{t-h+1})}\, y(t-h-i)
                           + \beta_1^{(z_t,\dots,z_{t-h+1})} \phi_{z_{t-h},0} + \sum_{i=1}^{p} \beta_1^{(z_t,\dots,z_{t-h+1})} \phi_{z_{t-h},i}\, y(t-h-i).    (26)

Comparing equations (24) and (26) with (6)–(9) we get the desired result.

When the mean E ε_k(t) exists we have

E(y_k(t) \mid \mathcal{F}_{t-1}) = E(\mu_k(t) + \sigma_k \varepsilon_k(t) \mid \mathcal{F}_{t-1})
                                 = E(\mu_k(t) \mid \mathcal{F}_{t-1}) + E(\sigma_k \varepsilon_k(t) \mid \mathcal{F}_{t-1})
                                 = \mu_k(t) + \sigma_k E(\varepsilon_k(t))
                                 = \mu_k(t).


B  Derivation of the conditional characteristic functions

The two-step conditional characteristic function can be found by using the iterated rule for conditional expectations and (11) as follows:

\varphi_{t|t-2}(s) \equiv E(e^{isy(t)} \mid \mathcal{F}_{t-2})
  = E\big(E(e^{isy(t)} \mid \mathcal{F}_{t-1}) \mid \mathcal{F}_{t-2}\big)
  = E\Big( \sum_{k=1}^{g} \pi_k e^{is(\phi_{k,0} + \sum_{m=1}^{p_k} \phi_{k,m} y(t-m))} \varphi_k(\sigma_k s) \,\Big|\, \mathcal{F}_{t-2} \Big)
  = \sum_{k=1}^{g} \pi_k e^{is(\phi_{k,0} + \sum_{m=2}^{p_k} \phi_{k,m} y(t-m))} \varphi_k(\sigma_k s)\, E(e^{is\phi_{k,1} y(t-1)} \mid \mathcal{F}_{t-2})
  = \sum_{k=1}^{g} \pi_k e^{is(\phi_{k,0} + \sum_{m=2}^{p_k} \phi_{k,m} y(t-m))} \varphi_k(\sigma_k s) \sum_{l=1}^{g} \pi_l e^{is\phi_{k,1} \mu_l(t-1)} \varphi_l(\phi_{k,1} \sigma_l s)
  = \sum_{k=1}^{g} \pi_k e^{is(\phi_{k,0} + \sum_{m=2}^{p_k} \phi_{k,m} y(t-m))} \varphi_k(\sigma_k s) \sum_{l=1}^{g} \pi_l e^{is\phi_{k,1}(\phi_{l,0} + \sum_{m=1}^{p_l} \phi_{l,m} y(t-1-m))} \varphi_l(\phi_{k,1} \sigma_l s)
  = \sum_{k=1}^{g} \sum_{l=1}^{g} \pi_k \pi_l\, e^{is\mu_{k,l}(t)} \varphi_k(\sigma_k s) \varphi_l(\phi_{k,1} \sigma_l s).

More generally, using equation (6) we get for any h ≥ 2

\varphi_{t|t-h}(s) \equiv E(e^{isy(t)} \mid \mathcal{F}_{t-h})
  = E\big(E(e^{isy(t)} \mid \mathcal{F}_{t-h}, z_t, \dots, z_{t+1-h}) \mid \mathcal{F}_{t-h}\big)
  = E\Big(E\big(e^{is(\mu_{z_t,\dots,z_{t+1-h}}(t) + \sum_{m=0}^{h-1} \theta_m^{(z_t,\dots,z_{t+1-h})} \varepsilon_{z_{t-m}}(t-m))} \,\big|\, \mathcal{F}_{t-h}, z_t, \dots, z_{t+1-h}\big) \,\Big|\, \mathcal{F}_{t-h}\Big)
  = E\Big(e^{is\mu_{z_t,\dots,z_{t+1-h}}(t)}\, E\big(e^{is \sum_{m=0}^{h-1} \theta_m^{(z_t,\dots,z_{t+1-h})} \varepsilon_{z_{t-m}}(t-m)} \,\big|\, \mathcal{F}_{t-h}, z_t, \dots, z_{t+1-h}\big) \,\Big|\, \mathcal{F}_{t-h}\Big)
  = E\Big(e^{is\mu_{z_t,\dots,z_{t+1-h}}(t)} \prod_{m=0}^{h-1} \varphi_{z_{t-m}}\!\big(\theta_m^{(z_t,\dots,z_{t+1-h})} s\big) \,\Big|\, \mathcal{F}_{t-h}\Big)
  = \sum_{k_1,\dots,k_h=1}^{g} (\pi_{k_1} \cdots \pi_{k_h})\, e^{is\mu_{k_1,\dots,k_h}(t)} \prod_{m=0}^{h-1} \varphi_{k_{m+1}}\!\big(\theta_m^{(k_1,\dots,k_h)} s\big).

C  α-stable noise components

Let ε_k(t) be α-stable, S_α(1, β_k, 0), k = 1, . . . , g. Then their characteristic functions are given by (see Lukacs (1979, Theorem 5.7.3))

\varphi_k(s) = e^{-|s|^\alpha (1 - i\beta_k \operatorname{sign}(s)\, w(|s|, \alpha))},    (27)

where

w(|s|, \alpha) =
\begin{cases}
\tan\frac{\pi\alpha}{2}, & \text{when } \alpha \ne 1, \\
\frac{2}{\pi}\log|s|, & \text{when } \alpha = 1.
\end{cases}    (28)

Hence, for any a ≠ 0,

w(|as|, \alpha) =
\begin{cases}
\tan\frac{\pi\alpha}{2}, & \text{when } \alpha \ne 1, \\
\frac{2}{\pi}\log|s| + \frac{2}{\pi}\log|a|, & \text{when } \alpha = 1,
\end{cases}

and

\log(\varphi_k(as)) = -|as|^\alpha (1 - i\beta_k \operatorname{sign}(as)\, w(|as|, \alpha))
                    = -|a|^\alpha |s|^\alpha + i|a|^\alpha |s|^\alpha \beta_k \operatorname{sign}(a) \operatorname{sign}(s)\, w(|as|, \alpha).

After some algebraic manipulation and using β as defined in equation (16), we get that \log\big(\prod_{m=0}^{h-1} \varphi_{k_{m+1}}(\theta_m^{(k_1,\dots,k_h)} s)\big) is equal to

-|s|^\alpha \sum_{m=0}^{h-1} \big|\theta_m^{(k_1,\dots,k_h)}\big|^\alpha \Big(1 - i\beta \operatorname{sign}(s) \tan\frac{\pi\alpha}{2}\Big)

when α ≠ 1, and to

-|s| \sum_{m=0}^{h-1} \big|\theta_m^{(k_1,\dots,k_h)}\big| \Big(1 - i\beta \operatorname{sign}(s)\, \frac{2}{\pi}\log|s|\Big) + is\, \frac{2}{\pi} \sum_{m=0}^{h-1} \beta_{k_{m+1}} \theta_m^{(k_1,\dots,k_h)} \log\big|\theta_m^{(k_1,\dots,k_h)}\big|

otherwise. These expressions are indeed the logarithms of the characteristic functions of the α-stable distributions specified in Corollary 3.
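A direct transcription of (27)–(28) and of the scale and skewness aggregation (16) might look as follows; the function names are ours, and the extra location term of (15) for α = 1 is left to the caller, as in Corollary 3.

    import numpy as np

    def log_cf_stable(s, alpha, beta_k):
        """log phi_k(s) for an S_alpha(1, beta_k, 0) noise component, equations (27)-(28).
        For alpha = 1 the point s = 0 should be excluded."""
        s = np.asarray(s, dtype=float)
        if alpha != 1:
            w = np.tan(np.pi * alpha / 2)
        else:
            w = (2 / np.pi) * np.log(np.abs(s))
        return -np.abs(s) ** alpha * (1 - 1j * beta_k * np.sign(s) * w)

    def predictor_stable_params(theta, beta_k, alpha):
        """Scale sigma and skewness beta of one predictor component, equation (16);
        theta = (theta_0, ..., theta_{h-1}), beta_k = skewness of the matching noise terms."""
        theta = np.asarray(theta, dtype=float)
        beta_k = np.asarray(beta_k, dtype=float)
        abs_a = np.abs(theta) ** alpha
        sigma = np.sum(abs_a) ** (1 / alpha)
        beta = np.sum(beta_k * abs_a * np.sign(theta)) / np.sum(abs_a)
        return sigma, beta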


References

Box, G. E. and Jenkins, G. M. (1976) Time Series Analysis: Forecasting and Control. Rev. ed., Holden-Day Series in Time Series Analysis. San Francisco: Holden-Day.

Lukacs, E. (1979) Characteristic Functions (Kharakteristicheskie funktsii). Translated from the English by V. M. Zolotarev. Moskva: Nauka.

Rachev, S. and Mittnik, S. (2000) Stable Paretian Models in Finance. Chichester: Wiley.

Samorodnitsky, G. and Taqqu, M. S. (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Stochastic Modeling. New York, NY: Chapman & Hall.


Wong, C. S. (1998) Statistical Inference for Some Nonlinear Time Series Models. PhD thesis, University of Hong Kong, Hong Kong.

Wong, C. S. and Li, W. K. (2000) On a mixture autoregressive model. J. R. Stat. Soc., Ser. B, Stat. Methodol. 62(1), 95–115.

Zolotarev, V. (1986) One-Dimensional Stable Distributions. Translated from the Russian by H. H. McFaden, edited by Ben Silver. Translations of Mathematical Monographs, Vol. 65. Providence, Rhode Island: American Mathematical Society.
