Moneta Chlass Entner Hoyer

second step, conditional independence relations (or d-separations, which are the graphical characterization of conditional independence) are merely used to erase edges and, in further steps, to direct edges. The output of such algorithms is not necessarily one single graph, but a class of Markov equivalent graphs.

There is nothing in either the Markov or the faithfulness condition, nor in the constraint-based algorithms, that limits them to linear and Gaussian settings. Graphical causal models do not per se require any a priori specification of the functional dependence between variables. However, in applications of graphical models to SVAR, conditional independence is ascertained by testing vanishing partial correlations (Swanson and Granger, 1997; Bessler and Lee, 2002; Demiralp and Hoover, 2003; Moneta, 2008). Since the normal distribution guarantees the equivalence between zero partial correlation and conditional independence, these applications deal de facto with linear and Gaussian processes.

2.2. Testing residuals zero partial correlations

There are alternative methods to test zero partial correlations among the error terms $\hat{u}_t = (\hat{u}_{1t}, \ldots, \hat{u}_{kt})'$. Swanson and Granger (1997) use the partial correlation coefficient. That is, in order to test, for instance, $\rho(u_{it}, u_{kt} \mid u_{jt}) = 0$, they use the standard $t$ statistic from a least squares regression of the model:

$$u_{it} = \alpha_j u_{jt} + \alpha_k u_{kt} + \varepsilon_{it}, \qquad (7)$$

on the basis that $\alpha_k = 0 \Leftrightarrow \rho(u_{it}, u_{kt} \mid u_{jt}) = 0$.
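A minimal sketch of this regression-based test might look as follows; the simulated residual series and the coefficient value 0.8 are illustrative assumptions, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical residual series: u_k drives u_i, while u_j is independent noise,
# so rho(u_i, u_k | u_j) != 0 and the test should reject alpha_k = 0.
T = 500
u_j = rng.standard_normal(T)
u_k = rng.standard_normal(T)
u_i = 0.8 * u_k + rng.standard_normal(T)

# OLS of u_i on (u_j, u_k) with no intercept, as in equation (7).
X = np.column_stack([u_j, u_k])
beta, *_ = np.linalg.lstsq(X, u_i, rcond=None)
resid = u_i - X @ beta
sigma2 = resid @ resid / (T - 2)                      # residual variance
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X).diagonal())
t_k = beta[1] / se[1]                                 # t statistic for alpha_k = 0
```

Comparing `t_k` with standard normal (or Student-t) critical values then decides whether the order-one partial correlation vanishes.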
Since Swanson and Granger (1997) impose the partial correlation constraints looking only at the set of partial correlations of order one (that is, conditioned on only one variable), in order to run their tests they consider regression equations with only two regressors, as in equation (7).

Bessler and Lee (2002) and Demiralp and Hoover (2003) use Fisher's $z$, which is incorporated in the software TETRAD (Scheines et al., 1998):

$$z(\rho_{XY.K}, T) = \frac{1}{2}\sqrt{T - |K| - 3}\,\log\frac{|1 + \rho_{XY.K}|}{|1 - \rho_{XY.K}|}, \qquad (8)$$

where $|K|$ equals the number of variables in $K$ and $T$ the sample size. If the variables (for instance $X = u_{it}$, $Y = u_{kt}$, $K = (u_{jt}, u_{ht})$) are normally distributed, we have that

$$z(\rho_{XY.K}, T) - z(\hat{\rho}_{XY.K}, T) \sim N(0, 1) \qquad (9)$$

(see Spirtes et al., 2000, p. 94).

A different approach, which takes into account the fact that correlations are obtained from residuals of a regression, is proposed by Moneta (2008). In this case it is useful to write the VAR model of equation (3) in a more compact form:

$$Y_t = \Pi' X_t + u_t, \qquad (10)$$
Causal Search in SVAR

where $X_t' = [Y_{t-1}', \ldots, Y_{t-p}']$, which has dimension $(1 \times kp)$, and $\Pi' = [A_1, \ldots, A_p]$, which has dimension $(k \times kp)$. In case of a stable VAR process (see next subsection), the conditional maximum likelihood estimate of $\Pi$ for a sample of size $T$ is given by

$$\hat{\Pi}' = \left[\sum_{t=1}^{T} Y_t X_t'\right]\left[\sum_{t=1}^{T} X_t X_t'\right]^{-1}.$$

Moreover, the $i$th row of $\hat{\Pi}'$ is

$$\hat{\pi}_i' = \left[\sum_{t=1}^{T} Y_{it} X_t'\right]\left[\sum_{t=1}^{T} X_t X_t'\right]^{-1},$$

which coincides with the estimated coefficient vector from an OLS regression of $Y_{it}$ on $X_t$ (Hamilton 1994: 293). The maximum likelihood estimate of the matrix of variances and covariances among the error terms $\Sigma_u$ turns out to be $\hat{\Sigma}_u = (1/T)\sum_{t=1}^{T} \hat{u}_t \hat{u}_t'$, where $\hat{u}_t = Y_t - \hat{\Pi}' X_t$. Therefore, the maximum likelihood estimate of the covariance between $u_{it}$ and $u_{jt}$ is given by the $(i, j)$ element of $\hat{\Sigma}_u$: $\hat{\sigma}_{ij} = (1/T)\sum_{t=1}^{T} \hat{u}_{it} \hat{u}_{jt}$. Denoting by $\sigma_{ij}$ the $(i, j)$ element of $\Sigma_u$, let us first define the following matrix transform operators: vec, which stacks the columns of a $k \times k$ matrix into a vector of length $k^2$, and vech, which vertically stacks the elements of a $k \times k$ matrix on or below the principal diagonal into a vector of length $k(k+1)/2$. For example:

$$\mathrm{vec}\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{12} \\ \sigma_{22} \end{bmatrix}, \qquad \mathrm{vech}\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{22} \end{bmatrix}.$$

The process being stationary and the error terms Gaussian, it turns out that:

$$\sqrt{T}\,\left[\mathrm{vech}(\hat{\Sigma}_u) - \mathrm{vech}(\Sigma_u)\right] \xrightarrow{d} N(0, \Omega), \qquad (11)$$

where $\Omega = 2 D_k^+ (\Sigma_u \otimes \Sigma_u)(D_k^+)'$, $D_k^+ \equiv (D_k' D_k)^{-1} D_k'$, $D_k$ is the unique $(k^2 \times k(k+1)/2)$ matrix satisfying $D_k \mathrm{vech}(\Omega) = \mathrm{vec}(\Omega)$, and $\otimes$ denotes the Kronecker product (see Hamilton 1994: 301).
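To make the vech operator and the covariance $\Omega$ of equation (11) concrete, here is a small sketch; the duplication-matrix construction follows the definition just given, while the numerical $\Sigma_u$ is invented for illustration:

```python
import numpy as np

def vech(A):
    """Stack the on-and-below-diagonal elements of A, column by column."""
    k = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(k)])

def duplication_matrix(k):
    """The unique D_k with D_k vech(A) = vec(A) for symmetric k x k A."""
    # Column index in vech for each lower-triangular position (i, j), i >= j.
    idx, c = {}, 0
    for j in range(k):
        for i in range(j, k):
            idx[(i, j)] = c
            c += 1
    D = np.zeros((k * k, k * (k + 1) // 2))
    for j in range(k):
        for i in range(k):
            r = j * k + i                      # vec stacks columns
            D[r, idx[(max(i, j), min(i, j))]] = 1.0
    return D

# Illustrative (made-up) error covariance for k = 2.
Sigma_u = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
D2 = duplication_matrix(2)
# Sanity check: D_k vech(A) recovers vec(A) (column-major flattening).
assert np.allclose(D2 @ vech(Sigma_u), Sigma_u.flatten(order="F"))

# Omega = 2 D_k^+ (Sigma_u kron Sigma_u) (D_k^+)' , equation (11).
Dplus = np.linalg.inv(D2.T @ D2) @ D2.T
Omega = 2 * Dplus @ np.kron(Sigma_u, Sigma_u) @ Dplus.T
```

For $k = 2$ this reproduces, entry by entry, the $3 \times 3$ covariance matrix written out explicitly below.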
For example, for $k = 2$, we have

$$\sqrt{T}\begin{bmatrix} \hat{\sigma}_{11} - \sigma_{11} \\ \hat{\sigma}_{12} - \sigma_{12} \\ \hat{\sigma}_{22} - \sigma_{22} \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2\sigma_{11}^2 & 2\sigma_{11}\sigma_{12} & 2\sigma_{12}^2 \\ 2\sigma_{11}\sigma_{12} & \sigma_{11}\sigma_{22} + \sigma_{12}^2 & 2\sigma_{12}\sigma_{22} \\ 2\sigma_{12}^2 & 2\sigma_{12}\sigma_{22} & 2\sigma_{22}^2 \end{bmatrix}\right).$$

Therefore, to test the null hypothesis that $\rho(u_{it}, u_{jt}) = 0$ from the VAR estimated residuals, it is possible to use the Wald statistic:

$$\frac{T\,\hat{\sigma}_{ij}^2}{\hat{\sigma}_{ii}\hat{\sigma}_{jj} + \hat{\sigma}_{ij}^2} \approx \chi^2(1).$$
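As an illustration of this Wald test, the following sketch simulates hypothetical residuals (the true covariance matrix and seed are invented for the example) and computes the statistic for one correlated and one uncorrelated pair:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical residuals: components 0 and 1 are correlated, 2 is independent.
T = 400
cov_true = np.array([[1.0, 0.4, 0.0],
                     [0.4, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
u = rng.multivariate_normal(np.zeros(3), cov_true, size=T)
Sigma_hat = (u.T @ u) / T          # ML estimate (1/T) sum_t u_t u_t'

def wald_zero_corr(S, i, j, T):
    """Wald statistic for H0: rho(u_i, u_j) = 0; approx chi^2(1) under H0."""
    return T * S[i, j] ** 2 / (S[i, i] * S[j, j] + S[i, j] ** 2)

w01 = wald_zero_corr(Sigma_hat, 0, 1, T)   # true correlation 0.4: typically large
w02 = wald_zero_corr(Sigma_hat, 0, 2, T)   # true correlation 0: typically small
```

Each statistic is then compared with the $\chi^2(1)$ critical value (3.84 at the 5% level) to decide whether the corresponding residual correlation vanishes.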