GLS estimation of dynamic factor models

1 Introduction

Since the influential work of Forni, Hallin, Lippi, and Reichlin (2000), Stock and Watson (2002a, 2002b), Bai and Ng (2002), and Bai (2003), dynamic factor models have become an important tool in macroeconomic forecasting (e.g. Watson 2003; Eickmeier and Ziegler 2007) and structural analysis (e.g. Giannone, Reichlin, and Sala 2002; Bernanke, Boivin, and Eliasz 2004; Eickmeier 2007). Under the weak assumptions of an approximate factor model (Chamberlain and Rothschild 1983), the parameters of the models can be consistently estimated by applying the traditional principal component (PC) estimator (Stock and Watson 2002a; Bai 2003) or, in the frequency domain, by using the dynamic principal component estimator (Forni et al. 2000). Assuming Gaussian i.i.d. errors, the PC estimator is equivalent to the ML estimator and, therefore, the PC estimator is expected to share the asymptotic properties of the ML estimator. It is well known that a Generalized Least Squares (GLS) type criterion function yields a more efficient estimator than the OLS-based PC estimator if the errors are heteroskedastic (e.g. Boivin and Ng 2006; Doz, Giannone, and Reichlin 2006a; Choi 2007). However, it is less clear how the estimator can be improved in the case of serially correlated errors. Stock and Watson (2005) suggest a GLS transformation similar to the one that is used to correct for autocorrelation in the linear regression model. As we will argue below, this approach has the disadvantage that such a transformation may inflate the number of factors.

In this paper we consider the Gaussian ML estimator in models where the errors are heteroskedastic and autocorrelated. We derive the first order conditions for a maximum of the (approximate) log-likelihood function and show that the resulting system of equations can be solved by running a sequence of GLS regressions. Specifically, the factors can be estimated by taking into account possible heteroskedasticity of the errors, whereas the factor loadings are estimated by using the usual GLS transformation for autocorrelated errors. We show that asymptotically the estimated covariance parameters do not affect the limiting distribution of the PC-GLS estimator. Therefore, the feasible two-step GLS estimation procedure is asymptotically as efficient as the estimator that maximizes the approximate likelihood function. In small samples, however, our Monte Carlo simulations suggest that the iterated PC-GLS estimator can be substantially more efficient than the simpler two-step estimator. In a related paper, Jungbacker and Koopman (2008) consider the state space representation of the factor model,


3 The PC-GLS estimator

In this section we follow Stock and Watson (2005) and assume that the idiosyncratic components have a stationary heterogenous autoregressive representation of the form

$$e_{it} = \rho_{1,i} e_{i,t-1} + \cdots + \rho_{p_i,i} e_{i,t-p_i} + \varepsilon_{it} \qquad (4)$$
$$\rho_i(L) e_{it} = \varepsilon_{it}, \qquad (5)$$

where $\rho_i(L)$ is defined above. The autoregressive structure of the idiosyncratic component can be represented in matrix format by defining the $(T - p_i) \times T$ matrix

$$R(\rho^{(i)}) = \begin{bmatrix} -\rho_{p_i,i} & -\rho_{p_i-1,i} & -\rho_{p_i-2,i} & \cdots & 1 & 0 & 0 & \cdots \\ 0 & -\rho_{p_i,i} & -\rho_{p_i-1,i} & \cdots & -\rho_{1,i} & 1 & 0 & \cdots \\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \cdots \end{bmatrix}.$$

Thus, the autoregressive representation (4) is written in matrix form as

$$R(\rho^{(i)}) e_i = \varepsilon_i, \qquad (6)$$

where $\varepsilon_i = [\varepsilon_{i,p_i+1}, \ldots, \varepsilon_{iT}]'$ and $e_i = [e_{i1}, \ldots, e_{iT}]'$. Furthermore, we do not impose the assumption that the idiosyncratic errors have the same variances across $i$ and $t$, but assume that $\sigma_i^2 = E(\varepsilon_{it}^2)$ may be different across $i$.

We do not need to make specific assumptions about the dynamic properties of the vector of common factors, $F_t$. Apart from some minor regularity conditions, the only consequential assumption that we have to impose on the factors is that they are weakly serially correlated (cf. Assumption 1 in section 4).

The PC-GLS estimator maximizes the approximate Gaussian log-likelihood function

$$S(F, \Lambda, \rho, \Sigma) = -\frac{T}{2}\sum_{i=1}^N \log\sigma_i^2 - \sum_{i=1}^N\sum_{t=p_i+1}^T \frac{(e_{it} - \rho_{1,i}e_{i,t-1} - \cdots - \rho_{p_i,i}e_{i,t-p_i})^2}{2\sigma_i^2}, \qquad (7)$$

where $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2)$. If $x_{it}$ is normally distributed and $N \to \infty$, then the PC-GLS estimator is asymptotically equivalent to the ML estimator. This can be seen by writing the log-likelihood function as $L(X) = L(e|F) + L(F)$, where $L(e|F)$ denotes the logarithm of the conditional density function of $e_{11}, \ldots, e_{NT}$ conditional on the factors $F$ and $L(F)$ is the log-density of $(F_1', \ldots, F_T')$. Since $L(e|F)$ is $O_p(NT)$ and $L(F)$ is $O_p(T)$, it follows that as $N \to \infty$ maximizing $L(e|F)$ is equivalent to maximizing the full likelihood function $L(X)$.
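To make the quasi-differencing matrix concrete, the following minimal numpy sketch builds $R(\rho^{(i)})$ row by row; the language choice, the function name ar_filter_matrix, and the zero-based indexing are our own illustration, not part of the paper.

```python
import numpy as np

def ar_filter_matrix(rho, T):
    """(T - p) x T matrix R(rho) of eq. (6): R(rho) @ e applies the AR
    filter rho(L), i.e. row t yields e_{t+p} - rho_1 e_{t+p-1} - ... - rho_p e_t."""
    p = len(rho)
    R = np.zeros((T - p, T))
    for t in range(T - p):
        R[t, t + p] = 1.0                  # coefficient on e_{t+p}
        for k in range(1, p + 1):
            R[t, t + p - k] = -rho[k - 1]  # coefficient -rho_k on e_{t+p-k}
    return R

# example: for an AR(1) error with rho = 0.5 and T = 5,
# (R @ e)[t] equals e[t+1] - 0.5 * e[t]
R = ar_filter_matrix([0.5], 5)
```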


The gradients of the criterion function stated above are obtained as

$$g_{\lambda_i}(\cdot) = \frac{\partial S(\cdot)}{\partial\lambda_i} = \frac{1}{\sigma_i^2}\sum_{t=p_i+1}^T \varepsilon_{it}[\rho_i(L)F_t] \qquad (8)$$

$$g_{F_t}(\cdot) = \frac{\partial S(\cdot)}{\partial F_t} = \sum_{i=1}^N \frac{1}{\sigma_i^2}\left(\varepsilon_{it}\lambda_i - \rho_{1,i}\varepsilon_{i,t+1}\lambda_i - \cdots - \rho_{p_i,i}\varepsilon_{i,t+p_i}\lambda_i\right) = \sum_{i=1}^N \frac{1}{\sigma_i^2}[\rho_i(L^{-1})\varepsilon_{it}]\lambda_i \qquad (9)$$

$$g_{\rho_{k,i}}(\cdot) = \frac{\partial S(\cdot)}{\partial\rho_{k,i}} = \frac{1}{\sigma_i^2}\sum_{t=p_i+1}^T \varepsilon_{it}(x_{i,t-k} - \lambda_i'F_{t-k}) \qquad (10)$$

$$g_{\sigma_i^2}(\cdot) = \frac{\partial S(\cdot)}{\partial\sigma_i^2} = \frac{\sum_{t=p_i+1}^T \varepsilon_{it}^2}{2\sigma_i^4} - \frac{T}{2\sigma_i^2}, \qquad (11)$$

where $\varepsilon_{is} = 0$ for $s > T$. The PC-GLS estimator is obtained by setting these gradients equal to zero and solving the resulting system iteratively. A practical problem is the large dimension of the system consisting of $2Nr + N + \sum p_i$ equations. Accordingly, in many practical situations it is very demanding to compute the inverse of the Hessian matrix that is required to construct an iterative minimization algorithm. We therefore suggest a simple two-step estimator that will be shown to be asymptotically equivalent to the PC-GLS estimator.

Let us first assume that the covariance parameters $\rho$ and $\Sigma$ are known. The (infeasible) two-step estimators $\tilde F_t$ ($t = 1, \ldots, T$) and $\tilde\lambda_i$ ($i = 1, \ldots, N$) that result from using the PC estimators as first stage estimators are obtained by solving the following sets of equations:

$$g_{F_t}(\hat\Lambda, \tilde F_t, \rho, \Sigma) = 0 \qquad (12)$$
$$g_{\lambda_i}(\tilde\lambda_i, \hat F, \rho, \Sigma) = 0, \qquad (13)$$

where $\hat F = [\hat F_1, \ldots, \hat F_T]'$ and $\hat\Lambda = [\hat\lambda_1, \ldots, \hat\lambda_N]'$ are the ordinary PC-OLS estimators of $F$ and $\Lambda$.

It is not difficult to see that the two-step estimator of $\lambda_i$ is equivalent to the least-squares estimator of $\lambda_i$ in the regression

$$\rho_i(L)x_{it} = [\rho_i(L)\hat F_t]'\lambda_i + \varepsilon_{it}^* \qquad (t = p_i+1, \ldots, T), \qquad (14)$$

where $\varepsilon_{it}^* = \varepsilon_{it} + \rho_i(L)(F_t - \hat F_t)'\lambda_i$.
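Regression (14) is then just OLS after quasi-differencing both the series and the estimated factors. A minimal sketch, reusing the ar_filter_matrix helper from above (our naming; a balanced panel is assumed):

```python
def loadings_gls(x_i, F_hat, rho_i):
    """Two-step PC-GLS estimator of lambda_i, eq. (14): OLS of
    rho_i(L) x_it on rho_i(L) F_hat_t for t = p_i+1, ..., T."""
    R = ar_filter_matrix(rho_i, len(x_i))
    y = R @ x_i      # quasi-differenced series rho_i(L) x_it
    Z = R @ F_hat    # quasi-differenced factor estimates, (T - p_i) x r
    lam_i, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lam_i
```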


The two-step estimator of $F_t$ (given $\hat\Lambda$) is more difficult to understand. Consider the two-way GLS transformation that accounts for both serial correlation and heteroskedasticity:

$$\frac{1}{\sigma_i}\rho_i(L)x_{it} = \frac{1}{\sigma_i}\hat\lambda_i'[\rho_i(L)F_t] + \frac{1}{\sigma_i}\varepsilon_{it}, \qquad (15)$$

where for notational convenience we assume $p_i = p$ for all $i$. Furthermore, our notation ignores the estimation error that results from replacing $\lambda_i$ by $\hat\lambda_i$.⁴

⁴ The complete error term is given by $\sigma_i^{-1}[\varepsilon_{it} + (\lambda_i - \hat\lambda_i)'\rho_i(L)F_t]$. However, as we will show in section 4, the estimation error in $\hat\lambda_i$ does not affect the asymptotic properties of the estimator.

We will argue below that in order to estimate $F_t$ we can ignore the GLS transformation that is due to serial correlation. But let us first consider the full two-step GLS estimator of $F_t$ that corresponds to condition (9). Collecting the equations for $t = p+1, \ldots, T$ the model can be re-written in matrix notation as

$$\tilde X_i = \tilde Z_i f + \tilde\varepsilon_i, \qquad (16)$$

where $\tilde X_i = \sigma_i^{-1}[\rho_i(L)x_{i,p+1}, \ldots, \rho_i(L)x_{iT}]'$, $\tilde\varepsilon_i = \sigma_i^{-1}[\varepsilon_{i,p+1}, \ldots, \varepsilon_{iT}]'$, $\tilde Z_i = \sigma_i^{-1}[\hat\lambda_i' \otimes R(\rho^{(i)})]$, and $f = \mathrm{vec}(F)$. The complete system can be written as

$$\tilde x = \tilde Z f + \tilde\varepsilon, \qquad (17)$$

where $\tilde x = [\tilde X_1', \ldots, \tilde X_N']'$, $\tilde Z = [\tilde Z_1', \ldots, \tilde Z_N']'$, and $\tilde\varepsilon = [\tilde\varepsilon_1', \ldots, \tilde\varepsilon_N']'$. To see that the least-squares estimator of $f$ is equivalent to a two-step estimator setting the gradient (9) equal to zero (given some initial estimator of $\lambda_i$), consider the model with only one factor (i.e., $f = F$) and $\rho_i(L) = 1 - \rho_iL$. Since

$$\sum_{i=1}^N \tilde Z_i'\tilde\varepsilon_i = \sum_{i=1}^N \frac{\hat\lambda_i}{\sigma_i^2}\begin{bmatrix} -\rho_i & 0 & 0 & \cdots & 0 \\ 1 & -\rho_i & 0 & \cdots & 0 \\ 0 & 1 & -\rho_i & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}\begin{bmatrix} \varepsilon_{i2} \\ \varepsilon_{i3} \\ \varepsilon_{i4} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix} = \sum_{i=1}^N \frac{\hat\lambda_i}{\sigma_i^2}\begin{bmatrix} -\rho_i\varepsilon_{i2} \\ \varepsilon_{i2} - \rho_i\varepsilon_{i3} \\ \varepsilon_{i3} - \rho_i\varepsilon_{i4} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix},$$

it follows that the system estimator based on (17) solves the first order condition (9). Note that the resulting estimator involves the inversion of the $T \times T$ matrix $\tilde Z'\tilde Z$, which is computationally demanding if $T$ is large.

Fortunately, this estimator can be simplified, since the GLS transformation due to the serial correlation of the errors is irrelevant. The GLS transformation resulting from heteroskedastic errors yields $X_t^* = \Lambda^*F_t + u_t$, where $X_t^* = \Sigma^{-1/2}X_t$,


$\Lambda^* = \Sigma^{-1/2}\Lambda$, and $u_t = \Sigma^{-1/2}e_t$. Replacing $\Lambda^*$ by $\hat\Lambda^* = \Sigma^{-1/2}\hat\Lambda$, two-step estimation implies estimating $F_1, \ldots, F_T$ from the system

$$X_1^* = \hat\Lambda^*F_1 + u_1^*$$
$$\vdots$$
$$X_T^* = \hat\Lambda^*F_T + u_T^*,$$

where $u_t^* = u_t + (\Lambda^* - \hat\Lambda^*)F_t$. Note that the vectors $u_t^*$ and $u_s^*$ are correlated, which suggests estimating the system by using a GLS (SUR) estimator. However, it is well known that the GLS estimator is identical to (equation-wise) OLS estimation if the regressor matrix is identical in all equations. Indeed, since in the present setup the regressor matrix is $\hat\Lambda^*$ for all equations, it follows that single-equation OLS estimation is as efficient as estimating the whole system by using a GLS approach. Thus, the estimation procedure for $F_t$ can be simplified by ignoring the serial correlation of the errors. This suggests estimating $F_t$ from the cross-section regression

$$\frac{1}{\omega_i}x_{it} = \left(\frac{1}{\omega_i}\hat\lambda_i'\right)F_t + u_{it}^* \qquad (i = 1, \ldots, N), \qquad (18)$$

where $\omega_i^2 = E(e_{it}^2)$. In what follows we focus on this simplified version of the two-step estimation approach as its properties are equivalent to those of the full GLS estimation procedure.
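In code, the simplified second step is a single weighted least-squares fit per period, which can be vectorized over all $t$. A minimal sketch of (18) under the same assumptions as before (our helper name):

```python
def factors_gls(X, Lam_hat, omega2):
    """Two-step PC-GLS estimator of the factors, eq. (18):
    F_t = (Lam' Omega^{-1} Lam)^{-1} Lam' Omega^{-1} X_t, i.e. a
    cross-section WLS with weights 1/omega_i^2, computed for every t."""
    W = Lam_hat / omega2[:, None]            # rows lambda_i / omega_i^2
    A = Lam_hat.T @ W                        # Lam' Omega^{-1} Lam, r x r
    return np.linalg.solve(A, W.T @ X.T).T   # T x r matrix with rows F_t'
```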


4 Asymptotic distribution of the two-step PC-GLS estimator

In this section we first study the asymptotic properties of the infeasible two-step estimator, that is, we assume that the covariance parameters $\rho_{k,i}$ and $\omega_i^2$ are known. In the following section we study what happens if the unknown covariance parameters are replaced by estimates. As we will see, the results derived in this section carry over to the case of estimated covariance parameters.

Our analysis is based on the same set of assumptions as in Bai (2003):

Assumption 1: (i) $E\|F_t\|^4 < \infty$ for all $t$ and $T^{-1}\sum_{t=1}^T F_tF_t' \xrightarrow{p} \Psi_F$ (p.d.). (ii) $\|\lambda_i\| < \infty$ for all $i$ and $N^{-1}\Lambda'\Lambda \to \Psi_\Lambda$ (p.d.). (iii) For the idiosyncratic components it is assumed that $E(e_{it}) = 0$, $E|e_{it}|^8 < \infty$, $0 < |\gamma_N(s,s)| < \infty$,

$F_t'$ as the $t$-th row of $F^*$) with the understanding that $F$ obeys the normalization $T^{-1}F'F \xrightarrow{p} I_r$.

As we do not impose the assumptions of a strict factor model with stationary idiosyncratic errors, we define the following "pseudo-true" values of the autoregressive and variance parameters:

$$[\rho_{1,i}, \ldots, \rho_{p_i,i}]' = \Gamma_{i,11}^{-1}\Gamma_{i,10}, \qquad \omega_i^2 = \lim_{T\to\infty} T^{-1}\sum_{t=1}^T E(e_{it}^2),$$

where

$$\Gamma_i = \lim_{T\to\infty} E\left(\frac{1}{T}\sum_{t=p_i+1}^T \begin{bmatrix} e_{i,t-1} \\ \vdots \\ e_{i,t-p_i} \end{bmatrix}\begin{bmatrix} e_{it} & \cdots & e_{i,t-p_i} \end{bmatrix}\right) = \begin{bmatrix} \Gamma_{i,10} & \Gamma_{i,11} \end{bmatrix},$$

$\Gamma_{i,10}$ is a $p_i \times 1$ vector, and $\Gamma_{i,11}$ is a $p_i \times p_i$ matrix.

For the asymptotic analysis we need to impose the following assumption.

Assumption 2: (i) There exists a positive constant $C < \infty$ such that $1/C < \omega_i^2$


(ii) If $(N, T \to \infty)$ and $\sqrt{N}/T \to 0$, then for each $t$,

$$\sqrt{N}(\tilde F_t - F_t) \xrightarrow{d} N\left(0, \tilde\Psi_\Lambda^{-1}\tilde V_{\lambda e}^{(t)}\tilde\Psi_\Lambda^{-1}\right),$$

where

$$\tilde V_{\lambda e}^{(t)} = \lim_{N\to\infty} N^{-1}\sum_{i=1}^N\sum_{j=1}^N \frac{1}{\omega_i^2\omega_j^2}\lambda_i\lambda_j'E(e_{it}e_{jt})$$

and $\tilde\Psi_\Lambda = \lim_{N\to\infty} N^{-1}\Lambda'\Omega^{-1}\Lambda$ and $\Omega = \mathrm{diag}(\omega_1^2, \ldots, \omega_N^2)$.

Remark 1: From (i) it follows that the asymptotic distribution remains the same if the estimate $\rho_i(L)\hat F_t$ in (14) is replaced by $\rho_i(L)F_t$. This suggests that the estimation error in $\hat F_t$ does not affect the asymptotic properties of the estimator $\tilde\lambda_i$. A similar result holds for the regressor $\omega_i^{-1}\hat\lambda_i$. In other words, the additional assumptions on the relative rates of $N$ and $T$ ensure that the estimates of the regressors in equations (14) and (18) can be treated as "super-consistent". The following section sheds more light on this important property.

Remark 2: The assumptions on the relative rates of $N$ and $T$ may appear to be in conflict with each other. However, the two parts of Theorem 1 are fulfilled if $N = cT^\delta$ where $1/2 < \delta < 2$. Therefore, the limiting distribution should give reliable guidance if both dimensions $N$ and $T$ are of comparable magnitude.

Remark 3: It is interesting to compare the result of Theorem 1 with the asymptotic distribution obtained by Choi (2007). In the latter paper it is assumed that $E(e_te_t') = \Omega$ for all $t$, where $e_t = [e_{1t}, \ldots, e_{Nt}]'$, i.e. the idiosyncratic components are assumed to be stationary. In this case the model can be transformed as $X^* = F^*\Lambda^{*\prime} + e^*$, where $X^* = X\Omega^{-1/2}$, $F^* = FJ$, $\Lambda^* = \Omega^{-1/2}\Lambda(J')^{-1}$, $e^* = e\Omega^{-1/2}$, and

$$J = T\Lambda'\Omega^{-1}\Lambda F'\tilde F(\tilde F'X\Omega^{-1}X'\tilde F)^{-1},$$

such that the covariance matrix of $e^*$ is identical to the identity matrix. Note that the normalization of the factor space is different from the normalization of the PC-OLS estimator, whereas our PC-GLS estimator adopts the original normalization. Imposing the former normalization, the asymptotic covariance matrix of the GLS estimator $\tilde F$ reduces to a diagonal matrix (cf. Choi 2007).

Remark 4: The two-step approach can also be employed to an unbalanced data set with different numbers of time periods for the variables. Stock and Watson (2002b) suggest an EM algorithm, where the missing values are replaced


by an estimate of the common component. Let $\hat x_{it} = \hat\lambda_i'\hat F_t$ denote the estimated observation based on the balanced data set ignoring all time periods with missing observations. The updated estimates of the common factors and factor loadings are obtained by applying the PC-OLS estimator to the enhanced data set, where the missing values are replaced by the estimates $\hat x_{it}$. Employing the updated estimates of $F_t$ and $\lambda_i$ results in improved estimates of the missing values that can in turn be used to yield new PC-OLS estimates of the common factors and factor loadings. This estimation procedure can be iterated until convergence. Using the two-step procedure, the estimation procedure can be initialized by using the reduced (balanced) data set to obtain the PC-OLS estimates $\hat F_t$ and $\hat\lambda_i$. In the second step the vector of common factors is estimated from regression (18). As the $T$ cross-section regressions may employ different numbers of observations, missing values do not raise any problems. Similarly, the $N$ time-series regressions (14) may be based on different sample sizes. As in the EM algorithm, this estimation procedure can be iterated until convergence.

5 The feasible GLS estimator

In practice the covariance parameters are unknown and must be replaced by consistent estimates. The feasible two-step PC-GLS estimators $\tilde\lambda_{i,\hat\rho}$ and $\tilde F_{t,\hat\omega}$ solve the first order conditions

$$\tilde g_{\lambda_i}(\tilde\lambda_{i,\hat\rho}, \hat F, \hat\rho^{(i)}) = \sum_{t=p_i+1}^T [\hat\rho_i(L)(x_{it} - \tilde\lambda_{i,\hat\rho}'\hat F_t)][\hat\rho_i(L)\hat F_t] = 0 \qquad (19)$$

$$\tilde g_{F_t}(\hat\Lambda, \tilde F_{t,\hat\omega}, \hat\Omega) = \sum_{i=1}^N \frac{1}{\hat\omega_i^2}(x_{it} - \hat\lambda_i'\tilde F_{t,\hat\omega})\hat\lambda_i = 0, \qquad (20)$$

where

$$\hat\omega_i^2 = \frac{1}{T}\sum_{t=1}^T \hat e_{it}^2 \qquad (21)$$

and $\hat e_{it} = x_{it} - \hat\lambda_i'\hat F_t$. Furthermore, $\hat\rho^{(i)} = [\hat\rho_{1,i}, \ldots, \hat\rho_{p_i,i}]'$ is the least-squares estimator from the regression

$$\hat e_{it} = \hat\rho_{1,i}\hat e_{i,t-1} + \cdots + \hat\rho_{p_i,i}\hat e_{i,t-p_i} + \hat\varepsilon_{it}. \qquad (22)$$

These estimators can be iterated using the resulting estimates $\tilde\lambda_{i,\hat\rho}$ and $\tilde F_{t,\hat\omega}$ instead of the estimates $\hat F_t$ and $\hat\lambda_i$ in regressions (14) and (18).
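The feasible version only needs the residual-based estimates (21) and (22). A minimal sketch (our helper name; a common AR order p for all i is assumed):

```python
def covariance_params(X, F_hat, Lam_hat, p):
    """Feasible covariance parameters: omega_i^2 from eq. (21) and the
    AR coefficients rho_(i) from the residual regression (22)."""
    E = X - F_hat @ Lam_hat.T              # PC residuals e_hat_it, T x N
    omega2 = (E ** 2).mean(axis=0)         # eq. (21)
    T, N = X.shape
    rho = np.zeros((N, p))
    for i in range(N):
        e = E[:, i]
        # lagged regressors [e_{t-1}, ..., e_{t-p}] for t = p+1, ..., T
        Z = np.column_stack([e[p - k:T - k] for k in range(1, p + 1)])
        rho[i], *_ = np.linalg.lstsq(Z, e[p:], rcond=None)  # eq. (22)
    return rho, omega2
```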


Similarly, improved estimators of the covariance parameters can be obtained from the second step residuals $\tilde e_{it} = x_{it} - \tilde\lambda_{i,\hat\rho}'\tilde F_{t,\hat\omega}$. This iterative estimation scheme converges to the estimators that maximize the criterion function given by equation (7). To study the limiting distribution of the feasible two-step PC-GLS estimator, the following lemma is used.

Lemma 1: Let $\hat\rho^{(i)} = [\hat\rho_{1,i}, \ldots, \hat\rho_{p_i,i}]'$ denote the least-squares estimates from (22) and let $\hat\omega_i^2$ be the estimator defined in (21). Under Assumption 1 we have as $(N, T \to \infty)$

$$\hat\rho^{(i)} = \rho^{(i)} + O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2}) \qquad \text{and} \qquad \hat\omega_i^2 = \omega_i^2 + O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2}),$$

where $\delta_{NT} = \min(\sqrt{N}, \sqrt{T})$.

The following theorem shows that the asymptotic distributions of the feasible two-step PC-GLS estimators of $\lambda_i$ and $F_t$ are identical to the ones stated in Theorem 1.

Theorem 2: Let $\tilde\lambda_{i,\hat\rho}$ and $\tilde F_{t,\hat\omega}$ denote the feasible two-step PC-GLS estimators based on (19) and (20). Under Assumptions 1–3 and if $(N, T \to \infty)$ and $\sqrt{T}/N \to 0$:

$$\sqrt{T}(\tilde\lambda_{i,\hat\rho} - \tilde\lambda_i) \xrightarrow{p} 0.$$

If $(N, T \to \infty)$ and $\sqrt{N}/T \to 0$:

$$\sqrt{N}(\tilde F_{t,\hat\omega} - \tilde F_t) \xrightarrow{p} 0$$

and, therefore, the estimators $\tilde\lambda_{i,\hat\rho}$ and $\tilde F_{t,\hat\omega}$ solving $\tilde g_{\lambda_i}(\tilde\lambda_{i,\hat\rho}, \hat F, \hat\rho^{(i)}) = 0$ and $\tilde g_{F_t}(\hat\Lambda, \tilde F_{t,\hat\omega}, \hat\Omega) = 0$, respectively, have the same limiting distribution as the (infeasible) GLS estimators of Theorem 1.

The proof of this theorem is based on the fact that additional terms in the Taylor series expansion of the first order conditions that depend on the derivatives with respect to the remaining parameters converge to zero in probability under the given conditions on $N$ and $T$. Another important consequence of Theorem 2 is that the iterated PC-GLS estimators of $\lambda_i$ and $F_t$ have the same asymptotic properties as the feasible two-step estimators. Thus, additional steps do not improve the asymptotic distribution of the estimators. However, further iterations may improve the small sample properties.
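Combining the sketches above gives an iterated PC-GLS routine; the cap of five iterations mirrors the choice made in the Monte Carlo section below. This composition is our own illustration and ignores the re-normalization of the factor space across iterations:

```python
def iterated_pc_gls(X, F_hat, Lam_hat, p, n_iter=5):
    """Iterated PC-GLS: alternate the covariance-parameter update with
    the loadings regressions (14) and the factor regressions (18),
    starting from the PC-OLS estimates F_hat, Lam_hat."""
    F, Lam = F_hat, Lam_hat
    for _ in range(n_iter):
        rho, omega2 = covariance_params(X, F, Lam, p)
        Lam = np.vstack([loadings_gls(X[:, i], F, rho[i])
                         for i in range(X.shape[1])])
        F = factors_gls(X, Lam, omega2)
    return F, Lam
```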


6 Small sample properties

In order to investigate the small sample properties of the proposed estimators, we perform a Monte Carlo study. In particular, we calculate the relative efficiency of the respective estimator compared to the infeasible estimator that solves the first order conditions $\tilde g_{\lambda_i}(\tilde\lambda_i, F, \rho^{(i)})$ and $\tilde g_{F_t}(\Lambda, \tilde F_t, \Omega)$. The latter is optimal in the sense that the respective estimates are based on the true parameter values describing heteroskedasticity and autocorrelation. Furthermore, the infeasible estimator employs the true factor loadings when estimating the factors, and when estimating the loadings it uses the true common factors. Thus, inefficiencies which could arise from employing estimated regressors and estimated autocorrelation and heteroskedasticity parameters are absent.

This infeasible estimator is the reference point for four different approaches considered in this investigation: the standard PC estimator, the two-step and iterated PC-GLS estimators as described above, and the quasi maximum likelihood (QML) estimator of Doz et al. (2006a). The latter authors suggest maximum likelihood estimation of the approximate dynamic factor model via the Kalman filter employing the EM algorithm. In order to make the standard estimation approach of traditional factor analysis feasible in the large approximate dynamic factor environment, Doz et al. (2006a, 2006b) approximate the probabilistic model, where their approximation does not allow for cross-sectional correlation of the idiosyncratic component. However, their estimation procedure does take into account factor dynamics as well as heteroskedasticity of the idiosyncratic error.⁵

The data-generating process of our Monte Carlo study is the following:

$$x_{it} = \lambda_iF_t + e_{it},$$
$$F_t = \gamma F_{t-1} + u_t, \qquad u_t \stackrel{iid}{\sim} N(0, 1-\gamma^2),$$
$$e_{it} = \rho_ie_{i,t-1} + \varepsilon_{it}, \qquad \varepsilon_{it} \stackrel{iid}{\sim} N(0, \sigma_i^2(1-\rho_i^2)),$$

where $\rho_i \stackrel{iid}{\sim} U[a, b]$ and $\lambda_i \stackrel{iid}{\sim} U(0, 1)$.

⁵ Even though their actual implementation of the estimator does not allow for serial correlation of the idiosyncratic component, Doz et al. (2006a) point out that, in principle, it is possible to take into account this feature in the estimation approach. However, the resulting estimator is computationally demanding as it implies $N$ additional transition equations for the idiosyncratic components.
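A sketch of this data-generating process for the one-factor case (our function; both processes are started at zero rather than drawn from their stationary distributions, which is immaterial for moderate T):

```python
def simulate_dgp(N, T, gamma, a, b, sigma2, seed=None):
    """One-factor Monte Carlo DGP: AR(1) factor with unit variance and
    AR(1) idiosyncratic errors with variances sigma2 (length-N array)."""
    rng = np.random.default_rng(seed)
    lam = rng.uniform(0.0, 1.0, N)        # lambda_i ~ U(0, 1)
    rho = rng.uniform(a, b, N)            # rho_i ~ U[a, b]
    F = np.zeros(T)
    e = np.zeros((T, N))
    for t in range(1, T):
        F[t] = gamma * F[t - 1] + rng.normal(0.0, np.sqrt(1 - gamma**2))
        e[t] = rho * e[t - 1] + rng.normal(0.0, np.sqrt(sigma2 * (1 - rho**2)), N)
    X = np.outer(F, lam) + e
    return X, F, lam
```

For instance, the autocorrelation scenario described below would correspond to a call such as simulate_dgp(N, T, gamma=0.7, a=0.5, b=0.9, sigma2=np.full(N, 2.0)).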


The interval from which the $\rho_i$'s are drawn, i.e. the choice of $a$ and $b$, varies across simulation setups and is specified below. Furthermore, in our baseline simulations we set the number of static and dynamic factors equal to one and, therefore, $F_t$ is a scalar. In order to check the robustness of our results, however, we also consider setups where the number of factors is increased to five. Two different scenarios are considered. First, we concentrate on the dynamic aspects and set $\gamma = 0.7$, $\rho_i \stackrel{iid}{\sim} U[0.5, 0.9]$, as well as $\sigma_i^2 = 2$ for all $i$. In the second case, the focus is on heteroskedasticity, where $\gamma = 0$, $\rho_i = 0$ for all $i$, and $\sigma_i \stackrel{iid}{\sim} |N(\sqrt{2}, 0.25)|$.

We generate 1000 replications for different $(N, T)$-specifications. In particular, we set $N = 50, 100, 200, 300$ and $T = 50, 100, 200$. To construct a performance measure that is invariant to the normalization of the factors (or loadings), our measure of relative efficiency is based on the ratio

$$\mathrm{eff}(\hat F, \hat F_0) = \frac{1 - R^2(F, \hat F_0)}{1 - R^2(F, \hat F)}, \qquad (23)$$

where $R^2(F, \hat F)$ is the coefficient of determination of a regression of $F$ (the true factor) on $\hat F$ (the estimator under consideration) and a constant, and $R^2(F, \hat F_0)$ denotes the $R^2$ of a similar regression where $\hat F$ is replaced by the benchmark estimator $\hat F_0$ (the infeasible GLS estimator). Consequently, numbers close to one indicate high accuracy of the estimator compared to the infeasible GLS estimator, whereas numbers close to zero imply low efficiency.⁶

⁶ In the multiple factor case, our measure of relative efficiency is generalized by considering the trace $R^2$ of a regression of the true factors on the respective estimated factors and a constant.
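The measure (23) can be computed directly from two regression R²'s; a minimal sketch (our function names):

```python
def relative_efficiency(F, F_est, F_bench):
    """Measure (23): (1 - R^2(F, F_bench)) / (1 - R^2(F, F_est)).
    Values near one mean F_est is almost as accurate as the
    infeasible benchmark F_bench; values near zero mean low efficiency."""
    def r2(y, f):
        # R^2 of a regression of y on f and a constant
        Z = np.column_stack([np.ones(len(f)), f])
        resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        return 1.0 - resid.var() / y.var()
    return (1.0 - r2(F, F_bench)) / (1.0 - r2(F, F_est))
```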


We begin with a setup featuring only one factor. Table 1 reports the results for the case of autocorrelated errors. Apparently, the PC and QML estimators perform relatively poorly, where the efficiency measures of the two estimators are of comparable magnitude. The low accuracy can be explained by the fact that both estimators fail to take into account serial correlation of the idiosyncratic component. However, the QML procedure takes into account the dynamics of the common factors. As has been argued in section 3, the dynamic properties of the factors are irrelevant for the asymptotic properties as $N \to \infty$. In contrast, for the factor loadings both the two-step and the iterated PC-GLS estimator achieve a considerable efficiency gain compared to the PC and QML estimators. In particular, with larger $T$ the two PC-GLS estimators become increasingly accurate and show a similar performance, as expected from Theorem 2. This picture changes somewhat when examining the results for the factors. Using the two-step estimator does not lead to an efficiency gain relative to PC. In this respect, note that the two-step regression for the common factors is not affected by possible autocorrelation of the errors but exploits possible heteroskedasticity of these terms. Interestingly, iterating the PC-GLS estimator until convergence leads to a dramatic increase in relative accuracy.⁷ This is due to the fact that the loadings are estimated more precisely by taking into account the autocorrelation of the errors. Thus, in the second step, the regressors have a smaller error and this improves the efficiency of the factor estimates.

⁷ The number of iterations is limited to a maximum of 5. First, this reduces the computational burden; second, we find no further improvement if the number of iterations is increased.

The results for heteroskedastic errors are presented in Table 2. Again, the PC estimator performs relatively poorly when considering the common factors, which comes as no surprise, since PC does not take into account possible heteroskedasticity of the idiosyncratic component. This contrasts with the high accuracy of the estimated factor loadings. This is due to the fact that, to a first degree, as explained in section 3, what is important for the efficient estimation of the factor loadings is to allow for autocorrelation of the errors. Not surprisingly, since there is no serial correlation in this scenario, the accuracy of PC is relatively high. Furthermore, the two-step PC-GLS estimator of the factor loadings has the same asymptotic properties as the ordinary PC estimator if the errors are serially uncorrelated. In fact, PC exhibits a similar performance as the two-step PC-GLS estimator. A slight efficiency improvement with respect to the loadings is attainable by employing the iterated PC-GLS estimator, in particular if $N$ is small compared to $T$. Analogous to the case with autocorrelated errors, the efficiency gain is due to the fact that by estimating the factors more precisely via incorporating heteroskedasticity, in the second step, the regressors have a smaller error, thus improving the accuracy of the estimated factor loadings. However, in line with Theorem 2, for larger samples the two PC-GLS estimators perform similarly. The same is true for the factor estimates, even though, in accordance with Theorem 2, there are slight efficiency improvements when iterating the PC-GLS estimator in cases with a large $N$ compared to $T$. Finally, the QML estimates of the factors as well as the factor loadings show a strong performance, even slightly better than the iterated PC-GLS estimator. This is due to the fact that in this scenario the approximating model coincides with the true model and the QML estimator is equivalent to the exact ML estimator.

This broad picture also emerges in the case of multiple factors. In particular,


we consider a setup with five factors, where the corresponding results are presented in Tables 3 and 4. The main difference to the case considered above is that, in general, larger sample sizes are needed to achieve the same relative performance. This is particularly true for the two-step estimator and, to a smaller extent, also for the iterated PC-GLS estimator. Overall, this feature seems to be more important in the scenario with autocorrelation than in the setup with heteroskedasticity.

For example, under autocorrelation the two-step estimates of the factor loadings do not achieve as considerable a gain in efficiency compared to PC as in the one-factor case. For the advantage with respect to accuracy of this estimator to become apparent, larger sample sizes are needed than in the preceding scenario. Increasing the cross-section and time-series dimension up to 500, for example, leads to relative efficiency measures for the PC, two-step, and iterated PC-GLS estimators of 0.296, 0.927, and 0.946, respectively. Similarly, the factor estimates of the iterated PC-GLS estimator are a little less precise for the sample sizes considered than in the one-factor case. This is due to the fact that the loadings are estimated a little less accurately than before, thus leading to less precise estimates of the regressors, which in turn negatively affects the efficiency of the factor estimates. Nevertheless, in particular compared to the other estimators under consideration, the iterated PC-GLS estimator shows a quite good performance.

On the other hand, as indicated above, the qualitative findings in the case of heteroskedasticity are quite close to the one-factor case. The most visible difference is the steep increase in efficiency with the sample size. For example, in accordance with Theorem 2, the estimates of the loadings of the two-step estimator gain considerably in accuracy when increasing $N$. The respective measure increases for $T = 200$ from a value of 0.324 for $N = 50$ to 0.843 for $N = 300$, whereas in the one-factor case the corresponding increase is only from 0.663 to 0.950. Analogous results are found for the same estimator with respect to the factor estimates. In this case, however, as expected from the aforementioned theorem, the efficiency gain results when increasing $T$. For $N = 300$ the measure of relative efficiency increases from 0.368 to 0.759 when $T$ rises from 50 to 200. The corresponding numbers in the scenario with only one factor are 0.705 and 0.811. Moreover, also in this setup, iterating the PC-GLS estimator until convergence increases the accuracy of the estimates, sometimes considerably.⁸

⁸ Unfortunately, in very small sample sizes, convergence of the iterated PC-GLS estimator is not always assured. In our setup this only happens for a couple of simulation runs in the case of heteroskedasticity where N = 50 and T = 200. That is why we choose not to report results for this specification.

7 Conclusion

In this paper we propose a GLS-type estimation procedure that allows for heteroskedastic and autocorrelated errors. Since the estimation of the covariance parameters does not affect the limiting distribution of the estimators, the feasible two-step PC-GLS estimator is asymptotically as efficient as the infeasible GLS estimator (assuming that the covariance parameters are known) and the iterated version that solves the first order condition of the (approximate) ML estimator. Notwithstanding these asymptotic results, the results of our Monte Carlo experiments suggest that the gain in efficiency from iterating the sequential GLS estimator may be substantial.

If one is willing to accept the framework of a strict factor model (that is, a model with cross-sectionally uncorrelated factors and idiosyncratic errors), then our approach can also be employed for inference. For example, recent work by Breitung and Eickmeier (2008) shows that a Chow-type test for structural breaks can be derived using the iterated PC-GLS estimator. Other possible applications are LR tests for the number of common factors or tests of hypotheses on the factor space.


Appendix

The following lemma plays a central role in the proofs of the following theorems:

Lemma A.1: It holds for all $k \le p_i$ that

(i) $T^{-1}\sum_{t=p_i+1}^T (\hat F_t - F_t)F_{t-k}' = O_p(\delta_{NT}^{-2})$, $\quad T^{-1}\sum_{t=p_i+1}^T (\hat F_t - F_t)\hat F_{t-k}' = O_p(\delta_{NT}^{-2})$

(ii) $T^{-1}\sum_{t=p_i+1}^T \hat F_t\hat F_{t-k}' = T^{-1}\sum_{t=p_i+1}^T F_tF_{t-k}' + O_p(\delta_{NT}^{-2})$

(iii) $T^{-1}\sum_{t=p_i+1}^T (\hat F_t - F_t)e_{i,t-k} = O_p(\delta_{NT}^{-2})$

(iv) $N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}(\hat\lambda_i - \lambda_i)\lambda_i' = O_p(\delta_{NT}^{-2})$, $\quad N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}(\hat\lambda_i - \lambda_i)\hat\lambda_i' = O_p(\delta_{NT}^{-2})$

(v) $N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}(\hat\lambda_i - \lambda_i)e_{it} = O_p(\delta_{NT}^{-2})$.

Proof: (i) The proof follows closely the proof for $k = 0$ provided by Bai (2003, Lemmas B.2 and B.3). We therefore present only the main steps.

We start from the representation

$$\hat F_t - F_t = \frac{1}{NT}V_{NT}^{-1}\left(\hat F'F\Lambda'e_t + \hat F'e\Lambda F_t + \hat F'ee_t\right),$$

where $e_t = [e_{1t}, \ldots, e_{Nt}]'$, $e = [e_1, \ldots, e_T]'$, and $V_{NT}$ is a $r \times r$ diagonal matrix of the $r$ largest eigenvalues of $(NT)^{-1}XX'$ (cf. Bai, 2003, Theorem 1). Consider

$$T^{-1}\sum_{t=p_i+1}^T (\hat F_t - F_t)F_{t-k}' = \frac{1}{NT^2}V_{NT}^{-1}\left(\hat F'F\Lambda'\sum_{t=p_i+1}^T e_tF_{t-k}' + \hat F'e\Lambda\sum_{t=p_i+1}^T F_tF_{t-k}' + \hat F'e\sum_{t=p_i+1}^T e_tF_{t-k}'\right) = I + II + III.$$

From Assumption 1 (v) it follows that

$$\Lambda'\sum_{t=p_i+1}^T e_tF_{t-k}' = \sum_{i=1}^N\sum_{t=p_i+1}^T e_{it}\lambda_iF_{t-k}' = O_p(\sqrt{NT}).$$


and using Lemma B.2 of Bai (2003) it follows that $T^{-1}\hat F'F = T^{-1}F'F + T^{-1}(\hat F - F)'F = T^{-1}F'F + O_p(\delta_{NT}^{-2})$. Thus, we obtain

$$I = V_{NT}^{-1}\left(T^{-1}\hat F'F\right)\left(\frac{1}{\sqrt{NT}}\Lambda'\sum_{t=p_i+1}^T e_tF_{t-k}'\right)\frac{1}{\sqrt{NT}} = O_p\left(\frac{1}{\sqrt{NT}}\right).$$

Next, we consider

$$\Lambda'e'\hat F = \Lambda'\sum_{t=1}^T e_tF_t' + \Lambda'\sum_{t=1}^T e_t(\hat F_t - F_t)'.$$

Following Bai (2003, p. 160), we have

$$\frac{1}{NT}\Lambda'\sum_{t=1}^T e_tF_t' = O_p\left(\frac{1}{\sqrt{NT}}\right)$$
$$\frac{1}{NT}\Lambda'\sum_{t=1}^T e_t(\hat F_t - F_t)' = O_p\left(\frac{1}{\delta_{NT}\sqrt{N}}\right).$$

Using $T^{-1}\sum_{t=p_i+1}^T F_t'F_{t-k} = O_p(1)$, we obtain

$$II = V_{NT}^{-1}\left(\frac{1}{NT}\hat F'e\Lambda\right)\left(\frac{1}{T}\sum_{t=p_i+1}^T F_tF_{t-k}'\right) = \left[O_p\left(\frac{1}{\sqrt{NT}}\right) + O_p\left(\frac{1}{\delta_{NT}\sqrt{N}}\right)\right]O_p(1).$$

For the remaining term, we obtain

$$\frac{1}{NT^2}\hat F'e\sum_{t=p_i+1}^T e_tF_{t-k}' = \frac{1}{T^2}\sum_{s=1}^T\sum_{t=p_i+1}^T \hat F_sF_{t-k}'\zeta_{NT}(s,t) + \frac{1}{T^2}\sum_{s=1}^T\sum_{t=p_i+1}^T \hat F_sF_{t-k}'\gamma_N(s,t),$$

where

$$\zeta_{NT}(s,t) = e_s'e_t/N - \gamma_N(s,t), \qquad \gamma_N(s,t) = E(e_s'e_t/N).$$

As in Bai (2003, p. 164f), we obtain

$$III = V_{NT}^{-1}\left[O_p\left(\frac{1}{\delta_{NT}\sqrt{T}}\right) + O_p\left(\frac{1}{\delta_{NT}\sqrt{N}}\right)\right].$$


Collecting these results, we obtain

$$I + II + III = O_p\left(\frac{1}{\sqrt{NT}}\right) + O_p\left(\frac{1}{\sqrt{T}\delta_{NT}}\right) + O_p\left(\frac{1}{\sqrt{N}\delta_{NT}}\right) = O_p\left(\frac{1}{\delta_{NT}^2}\right).$$

The proof of the second result in (i) is a similar modification of Lemma A.1 in Bai (2003) and is therefore omitted.

(ii) Consider

$$T^{-1}\sum_{t=p_i+1}^T \hat F_t\hat F_{t-k}' = T^{-1}\sum_{t=p_i+1}^T [F_t + (\hat F_t - F_t)][F_{t-k} + (\hat F_{t-k} - F_{t-k})]'$$
$$= T^{-1}\sum_{t=p_i+1}^T \Big(F_tF_{t-k}' + \underbrace{F_t(\hat F_{t-k}' - F_{t-k}')}_{a} + \underbrace{(\hat F_t - F_t)\hat F_{t-k}'}_{b}\Big) = T^{-1}\sum_{t=p_i+1}^T F_tF_{t-k}' + a + b.$$

Using (i), the terms $a$ and $b$ can be shown to be $O_p(\delta_{NT}^{-2})$.

(iii) The proof for $k = 0$ is given in Bai (2003, Lemma B.1). It is not difficult to see that the result remains unchanged if $k \ne 0$.

(iv) Following Bai (2003, p. 165) we have

$$\hat\lambda_i - \lambda_i = T^{-1}F'e_i + T^{-1}\hat F'(F - \hat F)\lambda_i + T^{-1}(\hat F - F)'e_i, \qquad (24)$$

where $e_i = [e_{i1}, \ldots, e_{iT}]'$. Post-multiplying by $\omega_i^{-2}\lambda_i'$ and averaging yields

$$N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}(\hat\lambda_i - \lambda_i)\lambda_i' = T^{-1}F'\left(N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}e_i\lambda_i'\right) + T^{-1}\hat F'(F - \hat F)\left(N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}\lambda_i\lambda_i'\right) + T^{-1}(\hat F - F)'\left(N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}e_i\lambda_i'\right).$$

From Bai (2003, p. 165) it follows that the last two terms are $O_p(\delta_{NT}^{-2})$. From Assumption 1 (v) and Assumption 2 (i) it follows that

$$\left\|\frac{1}{T}\sum_{t=1}^T \frac{1}{\omega_i^2}F_t\lambda_i'e_{it}\right\| \le \frac{1}{\omega_{\min}^2}\left\|\frac{1}{T}\sum_{t=1}^T F_t\lambda_i'e_{it}\right\| = O_p(1/\sqrt{T}),$$


where $\omega_{\min} = \min(\omega_1, \ldots, \omega_N)$. Thus, the first part of (iv) is $O_p(\delta_{NT}^{-2})$. The second equation can be shown by using the first part and Lemma A.1 (v).

(v) From (24) it follows that

$$N^{-1}\sum_{i=1}^N (\hat\lambda_i - \lambda_i)e_{it} = N^{-1}T^{-1}\sum_{s=1}^T\sum_{i=1}^N \hat F_se_{is}e_{it} + N^{-1}T^{-1}\sum_{s=1}^T\sum_{i=1}^N \hat F_s(F_s - \hat F_s)'\lambda_ie_{it} = a + b.$$

For expression $a$ we write

$$N^{-1}T^{-1}\sum_{s=1}^T\sum_{i=1}^N \hat F_se_{is}e_{it} = T^{-1}\sum_{s=1}^T \hat F_s\left[N^{-1}\sum_{i=1}^N \big(e_{is}e_{it} - E(e_{is}e_{it})\big)\right] + T^{-1}\sum_{s=1}^T \hat F_s\gamma_N(s,t).$$

From Lemma A.2 (a) and (b) of Bai (2003) it follows that the first term on the r.h.s. is $O_p(N^{-1/2}\delta_{NT}^{-1})$, whereas the second term is $O_p(T^{-1})$.

To analyze $b$ we note that by Lemma A.1 (i) and Assumption 1 (v)

$$\left[T^{-1}\sum_{s=1}^T \hat F_s(F_s - \hat F_s)'\right]\left[N^{-1}\sum_{i=1}^N \lambda_ie_{it}\right] = O_p(\delta_{NT}^{-2})O_p(N^{-1/2}).$$

Collecting these results, it follows that

$$\left\|N^{-1}\sum_{i=1}^N \frac{1}{\omega_i^2}(\hat\lambda_i - \lambda_i)e_{it}\right\| \le \frac{1}{\omega_{\min}^2}\left\|N^{-1}\sum_{i=1}^N (\hat\lambda_i - \lambda_i)e_{it}\right\| = O_p(T^{-1}) + O_p(N^{-1/2}\delta_{NT}^{-1}) + O_p(N^{-1/2}\delta_{NT}^{-2}) = O_p(\delta_{NT}^{-2}).$$

Proof of Theorem 1: The two-step estimator of $\lambda_i$ is obtained as

$$\tilde\lambda_i = [\hat F'R(\rho^{(i)})'R(\rho^{(i)})\hat F]^{-1}\hat F'R(\rho^{(i)})'R(\rho^{(i)})X_i$$
$$= [\hat F'R(\rho^{(i)})'R(\rho^{(i)})\hat F]^{-1}\hat F'R(\rho^{(i)})'R(\rho^{(i)})(F\lambda_i + e_i)$$
$$= [\hat F'R(\rho^{(i)})'R(\rho^{(i)})\hat F]^{-1}\hat F'R(\rho^{(i)})'R(\rho^{(i)})\{[\hat F + (F - \hat F)]\lambda_i + e_i\}$$
$$\tilde\lambda_i - \lambda_i = [\hat F'R(\rho^{(i)})'R(\rho^{(i)})\hat F]^{-1}\hat F'R(\rho^{(i)})'R(\rho^{(i)})[(F - \hat F)\lambda_i + e_i],$$

where $e_i = [e_{i1}, \ldots, e_{iT}]'$. Using Lemma A.1 (ii) it follows that

$$T^{-1}\hat F'R(\rho^{(i)})'R(\rho^{(i)})\hat F = T^{-1}F'R(\rho^{(i)})'R(\rho^{(i)})F + O_p(\delta_{NT}^{-2}) \xrightarrow{p} \tilde\Psi_F^{(i)}.$$


Using Lemma A.1 (i) we obtain that $T^{-1}\sum_{t=p_i+1}^T \hat F_{t-k}(\hat F_{t-k}' - F_{t-k}')$ is $O_p(\delta_{NT}^{-2})$ and, therefore,

$$T^{-1}\hat F'R(\rho^{(i)})'R(\rho^{(i)})(\hat F - F)\lambda_i = O_p(\delta_{NT}^{-2}).$$

Finally we consider

$$T^{-1/2}\sum_{t=p_i+1}^T [\rho_i(L)\hat F_t][\rho_i(L)e_{it}] = T^{-1/2}\sum_{t=p_i+1}^T \rho_i(L)[F_t + (\hat F_t - F_t)]\rho_i(L)e_{it} = T^{-1/2}\sum_{t=p_i+1}^T \rho_i(L)F_t[\rho_i(L)e_{it}] + O_p(\sqrt{T}/\delta_{NT}^2),$$

where Lemma A.1 (iii) is used. Thus we find

$$\sqrt{T}(\tilde\lambda_i - \lambda_i) = [T^{-1}F'R(\rho^{(i)})'R(\rho^{(i)})F]^{-1}T^{-1/2}F'R(\rho^{(i)})'R(\rho^{(i)})e_i + O_p(\sqrt{T}/\delta_{NT}^2),$$

where $\sqrt{T}/\delta_{NT}^2 \to 0$ if $\sqrt{T}/N \to 0$. Finally, Assumption 1 (v) implies

$$T^{-1/2}F'R(\rho^{(i)})'R(\rho^{(i)})e_i \xrightarrow{d} N(0, \tilde V_{Fe}^{(i)}),$$

where $\tilde V_{Fe}^{(i)}$ is defined in Theorem 1. With these results, part (i) of the theorem follows.

The proof of part (ii) is similar. We therefore present the main steps only. The two-step estimator of $F_t$ is given by

$$\tilde F_t = (\hat\Lambda'\Omega^{-1}\hat\Lambda)^{-1}\hat\Lambda'\Omega^{-1}X_t = (\hat\Lambda'\Omega^{-1}\hat\Lambda)^{-1}\hat\Lambda'\Omega^{-1}[(\hat\Lambda - \hat\Lambda + \Lambda)F_t + e_t]$$
$$\tilde F_t - F_t = (\hat\Lambda'\Omega^{-1}\hat\Lambda)^{-1}\hat\Lambda'\Omega^{-1}[(\Lambda - \hat\Lambda)F_t + e_t],$$

where $e_t = [e_{1t}, \ldots, e_{Nt}]'$. Following Bai (2003) and using Lemma A.1 (iv) and (v) it follows that

$$N^{-1}\hat\Lambda'\Omega^{-1}\hat\Lambda = N^{-1}\Lambda'\Omega^{-1}\Lambda + O_p(\delta_{NT}^{-2}) \xrightarrow{p} \tilde\Psi_\Lambda$$
$$N^{-1}\hat\Lambda'\Omega^{-1}(\hat\Lambda - \Lambda) = O_p(\delta_{NT}^{-2})$$
$$N^{-1}(\hat\Lambda - \Lambda)'\Omega^{-1}e_t = O_p(\delta_{NT}^{-2})$$
$$N^{-1/2}\hat\Lambda'\Omega^{-1}e_t = N^{-1/2}\Lambda'\Omega^{-1}e_t + O_p(\sqrt{N}/\delta_{NT}^2)$$
$$N^{-1/2}\Lambda'\Omega^{-1}e_t \xrightarrow{d} N(0, \tilde V_{\lambda e}^{(t)})$$

with

$$\tilde V_{\lambda e}^{(t)} = E\left(\lim_{N\to\infty} N^{-1}\Lambda'\Omega^{-1}e_te_t'\Omega^{-1}\Lambda\right) = \lim_{N\to\infty} N^{-1}\sum_{i=1}^N\sum_{j=1}^N \frac{1}{\omega_i^2\omega_j^2}\lambda_i\lambda_j'E(e_{it}e_{jt}).$$

From these results the limit distribution stated in Theorem 1 (ii) follows.


Proof of Lemma 1: Let

$$z_t = \begin{bmatrix} e_{it} \\ \vdots \\ e_{i,t-p_i+1} \end{bmatrix} \qquad \text{and} \qquad \hat z_t = \begin{bmatrix} x_{it} - \hat\lambda_i'\hat F_t \\ \vdots \\ x_{i,t-p_i+1} - \hat\lambda_i'\hat F_{t-p_i+1} \end{bmatrix}.$$

Using the same arguments as in Lemma 4 of Bai and Ng (2002) it can be shown that

$$T^{-1}\sum_{t=p_i+1}^T \hat e_{it}\hat z_{t-1} - T^{-1}\sum_{t=p_i+1}^T e_{it}z_{t-1} = O_p(\delta_{NT}^{-2})$$

and $T^{-1}\sum_{t=p_i+1}^T (\hat z_{t-1}\hat z_{t-1}' - z_{t-1}z_{t-1}') = O_p(\delta_{NT}^{-2})$. Therefore, we obtain for the least-squares estimator of $\rho^{(i)}$

$$\hat\rho^{(i)} = \rho^{(i)} + \left(\sum_{t=p_i+1}^T z_{t-1}z_{t-1}'\right)^{-1}\sum_{t=p_i+1}^T z_{t-1}\varepsilon_{it} + O_p(\delta_{NT}^{-2}) = \rho^{(i)} + O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2})$$

and, similarly, for the least-squares estimator of $\omega_i^2$:

$$\hat\omega_i^2 = \omega_i^2 + \left(T^{-1}\sum_{t=p_i+1}^T e_{it}^2 - \omega_i^2\right) + T^{-1}\sum_{t=p_i+1}^T (\hat e_{it}^2 - e_{it}^2) = \omega_i^2 + O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2}).$$

Proof of Theorem 2: The estimation error of the first-step estimators does not affect the second-step estimators if the first derivative of the first order condition is of smaller stochastic order (e.g. Newey and McFadden 1994). A first order Taylor expansion of the first order condition around the true parameter values yields

$$\tilde g_{\lambda_i}(\lambda_i, \hat F, \hat\rho^{(i)}) \simeq \tilde g_{\lambda_i}(\lambda_i, F, \rho^{(i)}) + \sum_{t=p_i+1}^T D_{1F_t}(\hat F_t - F_t) + \sum_{k=1}^{p_i} D_{1\rho_{k,i}}(\hat\rho_{k,i} - \rho_{k,i})$$
$$\tilde g_{F_t}(\hat\Lambda, F_t, \hat\Omega) \simeq \tilde g_{F_t}(\Lambda, F_t, \Omega) + \sum_{i=1}^N D_{2\lambda_i}(\hat\lambda_i - \lambda_i) + \sum_{i=1}^N D_{2\omega_i^2}(\hat\omega_i^2 - \omega_i^2),$$


where

$$D_{1F_t} = \partial\tilde g_{\lambda_i}(\lambda_i, F, \rho^{(i)})/\partial F_t' = \rho_i(L^{-1})\varepsilon_{it} - \lambda_i[\rho_i(L^{-1})\rho_i(L)F_t]'$$
$$D_{2\lambda_i} = \partial\tilde g_{F_t}(\Lambda, F_t, \Omega)/\partial\lambda_i' = -\frac{1}{\omega_i^2}[F_t\lambda_i' - e_{it}I_r]$$
$$D_{1\rho_{k,i}} = \partial\tilde g_{\lambda_i}(\lambda_i, F, \rho^{(i)})/\partial\rho_{k,i} = -\left(\sum_{t=p_i+1}^T \varepsilon_{it}F_{t-k} + e_{i,t-k}[\rho_i(L)F_t]\right)$$
$$D_{2\omega_i^2} = \partial\tilde g_{F_t}(\Lambda, F_t, \Omega)/\partial\omega_i^2 = -\frac{1}{\omega_i^4}e_{it}\lambda_i.$$

Using the results of Lemma A.1 we obtain

$$T^{-1}\sum_{t=p_i+1}^T D_{1F_t}(\hat F_t - F_t) = O_p(\delta_{NT}^{-2})$$
$$N^{-1}\sum_{i=1}^N D_{2\lambda_i}(\hat\lambda_i - \lambda_i) = O_p(\delta_{NT}^{-2}).$$

From Assumption 1 (v) it follows that $D_{1\rho_{k,i}}$ is $O_p(T^{1/2})$ and by using Lemma 1 we obtain

$$D_{1\rho_{k,i}}(\hat\rho_{k,i} - \rho_{k,i}) = O_p(1) + O_p(\sqrt{T}/\delta_{NT}^2)$$
$$D_{2\omega_i^2}(\hat\omega_i^2 - \omega_i^2) = -\frac{e_{it}(e_i'e_i/T - \omega_i^2)}{\omega_i^4}\lambda_i + O_p(\delta_{NT}^{-2}) = O_p(T^{-1/2}) + O_p(\delta_{NT}^{-2}).$$

Under Assumption 1 (iii) it follows that

$$N^{-1}\sum_{i=1}^N D_{2\omega_i^2}(\hat\omega_i^2 - \omega_i^2) = O_p(N^{-1/2}T^{-1/2}) + O_p(\delta_{NT}^{-2}).$$

With these results we obtain

$$\frac{1}{\sqrt{T}}\tilde g_{\lambda_i}(\lambda_i, \hat F, \hat\rho^{(i)}) = \frac{1}{\sqrt{T}}\tilde g_{\lambda_i}(\lambda_i, F, \rho^{(i)}) + O_p(\sqrt{T}/\delta_{NT}^2)$$
$$\frac{1}{\sqrt{N}}\tilde g_{F_t}(\hat\Lambda, F_t, \hat\Omega) = \frac{1}{\sqrt{N}}\tilde g_{F_t}(\Lambda, F_t, \Omega) + O_p(\sqrt{N}/\delta_{NT}^2) + O_p(T^{-1/2}).$$

If $(N, T \to \infty)$ and $\sqrt{T}/N \to 0$, then $T^{-1/2}\tilde g_{\lambda_i}(\lambda_i, \hat F, \hat\rho^{(i)})$ converges to $T^{-1/2}\tilde g_{\lambda_i}(\lambda_i, F, \rho^{(i)})$. If $\sqrt{N}/T \to 0$, then $N^{-1/2}\tilde g_{F_t}(\hat\Lambda, F_t, \hat\omega_i^2)$ converges to $N^{-1/2}\tilde g_{F_t}(\Lambda, F_t, \omega_i^2)$.


The first order conditions for $\rho_{k,i}$ and $\omega_i^2$ result as

$$\tilde g_{\rho_{k,i}}(\lambda_i, F, \rho_{k,i}) = -\sum_{t=p_i+1}^T [\rho_i(L)x_{it} - \lambda_i'\rho_i(L)F_t](x_{i,t-k} - \lambda_i'F_{t-k})$$
$$\tilde g_{\omega_i^2}(\lambda_i, F, \omega_i^2) = -\sum_{t=p_i+1}^T \left[(x_{it} - \lambda_i'F_t)^2 - \omega_i^2\right].$$

The derivatives with respect to $\lambda_i$ are given by

$$\frac{\partial\tilde g_{\rho_{k,i}}(\cdot)}{\partial\lambda_i'} = -\sum_{t=p_i+1}^T \left(e_{i,t-k}[\rho_i(L)F_t] + \varepsilon_{it}F_{t-k}\right) = O_p(T^{1/2})$$
$$\frac{\partial\tilde g_{\omega_i^2}(\cdot)}{\partial\lambda_i'} = 2\sum_{t=p_i+1}^T e_{it}F_t = O_p(T^{1/2}).$$

If $e_{it}$ is weakly cross-correlated as assumed in Assumption 1 and $\sqrt{T}/N \to 0$, then

$$\frac{\partial\tilde g_{\rho_{k,i}}(\cdot)}{\partial\lambda_i'}\underbrace{(\hat\lambda_i - \lambda_i)}_{O_p(T^{-1/2})} = O_p(1).$$

Similarly,

$$\frac{\partial\tilde g_{\omega_i^2}(\cdot)}{\partial\lambda_i'}(\hat\lambda_i - \lambda_i) = O_p(1).$$

Since $\tilde g_{\rho_{k,i}}(\cdot)$ and $\tilde g_{\omega_i^2}(\cdot)$ are $O_p(T^{1/2})$, it follows that the estimation error of the estimate of $\lambda_i$ does not affect the asymptotic properties if $\sqrt{T}/N \to 0$.

The derivatives with respect to $F_t$ are obtained as

$$\frac{\partial\tilde g_{\rho_{k,i}}(\cdot)}{\partial F_t} = \rho_i(L^{-1})e_{i,t-k}\lambda_i + \varepsilon_{i,t+k}\lambda_i$$
$$\frac{\partial\tilde g_{\omega_i^2}(\cdot)}{\partial F_t} = 2e_{it}\lambda_i.$$

It follows from Lemma A.1 that

$$T^{-1}\sum_{t=p_i+1}^T \frac{\partial\tilde g_{\rho_{k,i}}(\cdot)}{\partial F_t'}(\hat F_t - F_t) = O_p(\delta_{NT}^{-2})$$
$$T^{-1}\sum_{t=1}^T \frac{\partial\tilde g_{\omega_i^2}(\cdot)}{\partial F_t'}(\hat F_t - F_t) = O_p(\delta_{NT}^{-2}).$$

Therefore, the first derivatives with respect to $F_t$ vanish if $\sqrt{T}/\delta_{NT}^2 \to 0$ or $\sqrt{T}/N \to 0$, and the asymptotic properties of the estimators $\tilde\lambda_{i,\hat\rho} - \tilde\lambda_i$ and $\tilde F_{t,\hat\omega} - \tilde F_t$ are the same as if they were computed by using the true errors $e_{it}$.


References

Anderson, T. W. (1984), Introduction to Multivariate Statistical Analysis, 2nd ed., John Wiley: New York.

Bai, J. (2003), Inferential Theory for Factor Models of Large Dimensions, Econometrica, 71, 135–172.

Bai, J. (2004), Estimating Cross-section Common Stochastic Trends in Nonstationary Panel Data, Journal of Econometrics, 122, 137–183.

Bai, J. (2005), Panel Data Models with Interactive Fixed Effects, New York University, mimeo.

Bai, J., and S. Ng (2002), Determining the Number of Factors in Approximate Factor Models, Econometrica, 70, 191–221.

Bai, J., and S. Ng (2004), A PANIC Attack on Unit Roots and Cointegration, Econometrica, 72, 1127–1177.

Bernanke, B. S., J. Boivin, and P. Eliasz (2004), Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach, NBER Working Paper 10220.

Boivin, J., and S. Ng (2006), Are More Data Always Better for Factor Analysis?, Journal of Econometrics, 132, 169–194.

Breitung, J., and S. Eickmeier (2008), Testing for Structural Breaks in Dynamic Factor Models, working paper, University of Bonn.

Chamberlain, G., and M. Rothschild (1983), Arbitrage, Factor Structure and Mean-Variance Analysis in Large Asset Markets, Econometrica, 51, 1305–1324.

Choi, I. (2007), Efficient Estimation of Factor Models, Working Paper, http://ihome.ust.hk/~inchoi.

Doz, C., D. Giannone, and L. Reichlin (2006a), A Quasi Maximum Likelihood Approach for Large Approximate Dynamic Factor Models, Working Paper Series 674, European Central Bank.

Doz, C., D. Giannone, and L. Reichlin (2006b), A Two-step Estimator for Large Approximate Dynamic Factor Models Based on Kalman Filtering, Working Paper, ECARES, Université Libre de Bruxelles.

Eickmeier, S. (2007), Business Cycle Transmission from the US to Germany: A Structural Factor Approach, European Economic Review, 51, 521–551.

Eickmeier, S., and C. Ziegler (2007), How Successful are Dynamic Factor Models at Forecasting Output and Inflation? A Meta-Analytic Approach, forthcoming in: Journal of Forecasting.

Forni, M., M. Hallin, F. Lippi, and L. Reichlin (2000), The Generalized Dynamic Factor Model: Identification and Estimation, The Review of Economics and Statistics, 82, 540–554.

Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2005), The Generalized Dynamic Factor Model: One-sided Estimation and Forecasting, Journal of the American Statistical Association, 100, 830–840.

Giannone, D., L. Reichlin, and L. Sala (2002), Tracking Greenspan: Systematic and Unsystematic Monetary Policy Revisited, Working Paper, ECARES, Université Libre de Bruxelles.

Jungbacker, B., and S. J. Koopman (2008), Likelihood-based Analysis for Dynamic Factor Models, Free University of Amsterdam, mimeo.

Newey, W., and D. McFadden (1994), Large Sample Estimation and Hypothesis Testing, in: Handbook of Econometrics, Vol. 4, eds. R. Engle and D. McFadden, Amsterdam: Elsevier Science.

Phillips, P. C. B. (1986), Understanding Spurious Regressions in Econometrics, Journal of Econometrics, 33, 311–340.

Stock, J. H., and M. W. Watson (2002a), Macroeconomic Forecasting Using Diffusion Indexes, Journal of Business & Economic Statistics, 20, 147–162.

Stock, J. H., and M. W. Watson (2002b), Forecasting Using Principal Components From a Large Number of Predictors, Journal of the American Statistical Association, 97, 1167–1179.

Stock, J. H., and M. W. Watson (2005), Implications of Dynamic Factor Models for VAR Analysis, NBER Working Paper 11467.

Watson, M. W. (2003), Macroeconomic Forecasting Using Many Predictors, in: Dewatripont, M., L. Hansen, and S. Turnovsky (eds.), Advances in Econometrics, Theory and Applications, Eighth World Congress of the Econometric Society, Vol. III, 87–115.


Figure 1: Histogram of the sample variances $\sigma_i^2$ (mean = 0.55, std = 0.29); frequency plotted against $\sigma_i^2$.


Figure 2: Histogram of the sample autocorrelations $\rho_i$ (mean = 0.01, std = 0.41); frequency plotted against $\rho_i$.


Table 1: Relative efficiency: one factor, autocorrelation ($\gamma = 0.7$, $\rho_i \stackrel{iid}{\sim} U[0.5, 0.9]$, $\sigma_i^2 = 2$)

                 loadings ($\lambda_i$)              factors ($F_t$)
           PC   two-step iterated   QML     PC   two-step iterated   QML
T = 50
 N = 50   0.446   0.663   0.850   0.431   0.409   0.403   0.735   0.300
 N = 100  0.457   0.735   0.914   0.457   0.317   0.314   0.763   0.242
 N = 200  0.467   0.778   0.943   0.472   0.222   0.219   0.780   0.166
 N = 300  0.465   0.787   0.951   0.475   0.165   0.163   0.765   0.133
T = 100
 N = 50   0.376   0.787   0.845   0.341   0.692   0.674   0.871   0.469
 N = 100  0.386   0.865   0.916   0.377   0.621   0.606   0.878   0.464
 N = 200  0.397   0.908   0.950   0.392   0.530   0.519   0.892   0.411
 N = 300  0.401   0.922   0.962   0.396   0.452   0.444   0.887   0.361
T = 200
 N = 50   0.328   0.823   0.829   0.294   0.868   0.851   0.931   0.651
 N = 100  0.346   0.907   0.914   0.327   0.840   0.825   0.941   0.668
 N = 200  0.356   0.946   0.955   0.344   0.789   0.774   0.944   0.661
 N = 300  0.357   0.959   0.967   0.350   0.749   0.736   0.946   0.634

Notes: Entries are the performance measure defined in (23). PC is the ordinary principal component estimator; two-step and iterated denote the two-step PC-GLS and iterated PC-GLS estimators, respectively, introduced in section 3; QML is the quasi maximum likelihood estimator of Doz et al. (2006b).


Table 2: Relative efficiency: one factor, heteroskedasticity ($\gamma = 0$, $\rho_i = 0$ for all $i$, $\sigma_i \stackrel{iid}{\sim} |N(\sqrt{2}, 0.25)|$)

                 loadings ($\lambda_i$)              factors ($F_t$)
           PC   two-step iterated   QML     PC   two-step iterated   QML
T = 50
 N = 50   0.818   0.804   0.932   0.953   0.344   0.692   0.821   0.860
 N = 100  0.914   0.890   0.956   0.978   0.305   0.722   0.833   0.843
 N = 200  0.966   0.940   0.969   0.991   0.259   0.718   0.839   0.864
 N = 300  0.983   0.953   0.970   0.994   0.239   0.705   0.849   0.769
T = 100
 N = 50   0.748   0.740   0.936   0.951   0.376   0.794   0.872   0.927
 N = 100  0.887   0.873   0.963   0.977   0.337   0.810   0.877   0.929
 N = 200  0.951   0.933   0.974   0.989   0.289   0.792   0.874   0.921
 N = 300  0.971   0.952   0.979   0.995   0.257   0.780   0.876   0.922
T = 200
 N = 50   0.667   0.663   0.938   0.944   0.401   0.852   0.897   0.958
 N = 100  0.855   0.846   0.967   0.975   0.344   0.843   0.895   0.959
 N = 200  0.936   0.926   0.982   0.990   0.302   0.831   0.895   0.951
 N = 300  0.962   0.950   0.985   0.989   0.268   0.811   0.891   0.830

Notes: See Table 1.


Table 3: Relative efficiency: five factors, autocorrelation ($\gamma = 0.7$, $\rho_i \stackrel{iid}{\sim} U[0.5, 0.9]$, $\sigma_i^2 = 2$)

                 loadings ($\lambda_i$)              factors ($F_t$)
           PC   two-step iterated   QML     PC   two-step iterated   QML
T = 50
 N = 50   0.469   0.521   0.584   0.460   0.607   0.607   0.612   0.571
 N = 100  0.464   0.520   0.618   0.459   0.407   0.405   0.473   0.370
 N = 200  0.463   0.520   0.645   0.458   0.239   0.237   0.326   0.219
 N = 300  0.459   0.518   0.655   0.460   0.167   0.165   0.250   0.155
T = 100
 N = 50   0.320   0.414   0.508   0.297   0.610   0.607   0.653   0.530
 N = 100  0.319   0.442   0.662   0.301   0.428   0.423   0.619   0.357
 N = 200  0.318   0.462   0.805   0.304   0.271   0.267   0.618   0.223
 N = 300  0.318   0.475   0.861   0.307   0.202   0.199   0.615   0.166
T = 200
 N = 50   0.235   0.365   0.452   0.194   0.663   0.657   0.736   0.530
 N = 100  0.246   0.468   0.717   0.212   0.532   0.525   0.807   0.395
 N = 200  0.265   0.613   0.864   0.233   0.424   0.418   0.849   0.301
 N = 300  0.275   0.694   0.908   0.248   0.364   0.358   0.858   0.260

Notes: See Table 1.


Table 4: Relative efficiency: five factors, heteroskedasticity ($\gamma = 0$, $\rho_i = 0$ for all $i$, $\sigma_i \stackrel{iid}{\sim} |N(\sqrt{2}, 0.25)|$)

                 loadings ($\lambda_i$)              factors ($F_t$)
           PC   two-step iterated   QML     PC   two-step iterated   QML
T = 50
 N = 50   0.636   0.631   0.697   0.770   0.389   0.435   0.497   0.590
 N = 100  0.696   0.689   0.811   0.877   0.265   0.342   0.492   0.595
 N = 200  0.804   0.790   0.913   0.952   0.206   0.335   0.597   0.665
 N = 300  0.869   0.851   0.940   0.973   0.196   0.368   0.656   0.689
T = 100
 N = 50   0.465   0.463   0.559   0.738   0.393   0.449   0.522   0.729
 N = 100  0.588   0.583   0.818   0.892   0.300   0.438   0.680   0.793
 N = 200  0.783   0.773   0.937   0.954   0.283   0.579   0.821   0.822
 N = 300  0.864   0.851   0.956   0.970   0.268   0.615   0.828   0.825
T = 200
 N = 50   0.325   0.324    n/a    0.749   0.404   0.472    n/a    0.855
 N = 100  0.508   0.505   0.816   0.895   0.346   0.571   0.786   0.892
 N = 200  0.765   0.759   0.943   0.952   0.340   0.743   0.878   0.900
 N = 300  0.851   0.843   0.961   0.969   0.322   0.759   0.876   0.898

Notes: See Table 1.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!