stricted to being lower triangular (in an appropriate ordering of the variables). While in general this model is not identifiable because we cannot uniquely match the shocks to the residuals, Lacerda et al. (2008) showed that the model is identifiable when assuming stability of the generating model in (15) (the absolute value of the biggest eigenvalue in $B_0$ is smaller than one) and disjoint cycles.

Another restriction of the above model is that all relevant variables must be included in the model (causal sufficiency). Hoyer et al. (2008b) extended the above model by allowing for hidden variables. This leads to an overcomplete basis ICA model, meaning that there are more independent non-Gaussian sources than observed mixtures. While there exist methods for estimating overcomplete basis ICA models, those methods which achieve the required accuracy do not scale well. Additionally, the solution is again only unique up to ordering, scaling, and sign, and when including hidden variables the ordering ambiguity cannot be resolved and in some cases leads to several observationally equivalent models, just as in the cyclic case above.

We note that it is also possible to combine the approach of Section 2 with that described here. That is, if some of the shocks are Gaussian or close to Gaussian, it may be advantageous to use a combination of constraint-based search and non-Gaussianity-based search. Such an approach was proposed in Hoyer et al. (2008a).
In particular, the proposed method does not make any assumptions on the distributions of the VAR residuals $u_t$. Basically, the PC algorithm (see Section 2) is run first, followed by utilization of whatever non-Gaussianity there is to further direct edges. Note that there is no need to know in advance which shocks are non-Gaussian since finding such shocks is part of the algorithm.

Finally, we need to point out that while the basic ICA-based approach does not require the faithfulness assumption, the extensions discussed at the end of this section do.

4. Nonparametric setting

4.1. Theory

Linear systems dominate VAR, SVAR, and more generally, multivariate time series models in econometrics. However, it is not always the case that we know how a variable X may cause another variable Y. We may have little or no a priori knowledge about how Y depends on X. In its most general form, we want to know whether X is independent of Y conditional on the set of potential graphical parents Z, i.e.

$$H_0:\ Y \perp X \mid Z, \qquad (17)$$

where {Y, X, Z} is a set of time series variables. Thereby, we do not per se require an a priori specification of how Y possibly depends on X. However, constraint-based algorithms typically specify conditional independence in a very restrictive way. In continuous settings, they simply test for nonzero partial correlations, or in other words, for linear (in)dependencies. Hence, these algorithms will fail whenever the data generation process (DGP) includes nonlinear causal relations.
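The failure mode described above can be illustrated with a small simulation. The following is only a sketch under assumptions that are not part of the text: the data are synthetic, the nonlinear relation is chosen to be quadratic, and the helper `partial_corr` is a hypothetical name for a residual-based partial correlation.

```python
import numpy as np

def partial_corr(x, y, z):
    """Partial correlation of x and y given z, computed from the
    residuals of OLS regressions of x and y on (1, z)."""
    design = np.column_stack([np.ones(len(z)), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

rng = np.random.default_rng(0)
n = 20000
z = rng.standard_normal(n)
x = 0.5 * z + rng.standard_normal(n)

# Linear DGP: Y depends linearly on X and Z.
y_lin = 0.8 * x + 0.5 * z + 0.1 * rng.standard_normal(n)
# Nonlinear DGP (illustrative assumption): Y depends on X only through X**2.
y_nonlin = x**2 + 0.5 * z + 0.1 * rng.standard_normal(n)

pc_lin = partial_corr(x, y_lin, z)        # picks up the linear dependence
pc_nonlin = partial_corr(x, y_nonlin, z)  # population value is zero here
print(f"partial correlation, linear DGP:    {pc_lin:.3f}")
print(f"partial correlation, nonlinear DGP: {pc_nonlin:.3f}")
```

In the second design Y is strongly dependent on X given Z, yet the partial correlation carries no signal, so a test based on it would tend to accept conditional independence.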
In search of a more general specification of conditional independence, Chlaß and Moneta (2010) suggest a procedure based on nonparametric density estimation. Therein, neither the type of dependency between Y and X, nor the probability distributions of the variables need to be specified. The procedure exploits the fact that if two random variables are independent of a third, one obtains their joint density by the product of the joint density of the first two, and the marginal density of the third. Applied to conditional densities, Y ⊥ X | Z means that f(Y | X, Z) = f(Y | Z). Hence, hypothesis test (17) translates into:

$$H_0:\ \frac{f(Y, X, Z)}{f(X, Z)} = \frac{f(Y, Z)}{f(Z)}. \qquad (18)$$

If we define $h_1(\cdot) := f(Y, X, Z)\, f(Z)$ and $h_2(\cdot) := f(Y, Z)\, f(X, Z)$, we have:

$$H_0:\ h_1(\cdot) = h_2(\cdot). \qquad (19)$$

We estimate $h_1$ and $h_2$ using a kernel smoothing approach (see Wand and Jones, 1995, ch. 4). Kernel smoothing has the outstanding property that it is insensitive to autocorrelation phenomena and, therefore, immediately applicable to longitudinal or time series settings (Welsh et al., 2002).

In particular, we use a so-called product kernel estimator:

$$\hat{h}_1(x, y, z; b) = \frac{1}{N^2 b^{m+d}} \left\{ \sum_{i=1}^{n} K\!\left(\frac{X_i - x}{b}\right) K\!\left(\frac{Y_i - y}{b}\right) K_p\!\left(\frac{Z_i - z}{b}\right) \right\} \left\{ \sum_{i=1}^{n} K_p\!\left(\frac{Z_i - z}{b}\right) \right\},$$

$$\hat{h}_2(x, y, z; b) = \frac{1}{N^2 b^{m+d}} \left\{ \sum_{i=1}^{n} K\!\left(\frac{X_i - x}{b}\right) K_p\!\left(\frac{Z_i - z}{b}\right) \right\} \left\{ \sum_{i=1}^{n} K\!\left(\frac{Y_i - y}{b}\right) K_p\!\left(\frac{Z_i - z}{b}\right) \right\}, \qquad (20)$$

where $X_i$, $Y_i$, and $Z_i$ are the $i$th realizations of the respective time series, $K$ denotes the kernel function, $b$ indicates a scalar bandwidth parameter, and $K_p$ represents a product kernel (see footnote 2).

So far, we have shown how we can estimate $h_1$ and $h_2$. To see whether these are different, we require some similarity measure between both conditional densities.
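The estimators in (20) can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, X and Y are assumed to be scalar series, the kernel and the rule-of-thumb bandwidth are the ones given in the footnote to (20), and the normalizing exponent is assumed to be $2d + 2$ because the text does not define $m$. Since both estimators share the same normalizing factor, this assumption does not affect the ratio $\hat{h}_2 / \hat{h}_1$.

```python
import numpy as np

def kernel(u):
    """K(u) = (3 - u^2) * phi(u) / 2, with phi the standard normal pdf.
    Note that this kernel takes negative values for |u| > sqrt(3)."""
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return (3.0 - u**2) * phi / 2.0

def product_kernel(U):
    """K_p(U) = prod_j K(U[:, j]) over the d columns of U."""
    return np.prod(kernel(U), axis=-1)

def h_hat(x, y, z, X, Y, Z, b):
    """Evaluate the estimates of h1 and h2 at one point (x, y, z).
    X, Y have shape (n,); Z has shape (n, d); z has shape (d,)."""
    n, d = Z.shape
    kx = kernel((X - x) / b)
    ky = kernel((Y - y) / b)
    kz = product_kernel((Z - z) / b)
    # Assumption: common normalization n^2 * b^(2d + 2) for scalar X and Y.
    norm = n**2 * b ** (2 * d + 2)
    h1 = np.sum(kx * ky * kz) * np.sum(kz) / norm
    h2 = np.sum(kx * kz) * np.sum(ky * kz) / norm
    return h1, h2

# Synthetic example in which X and Y are related only through Z.
rng = np.random.default_rng(1)
n, d = 500, 1
Z = rng.standard_normal((n, d))
X = Z[:, 0] + rng.standard_normal(n)
Y = Z[:, 0] + rng.standard_normal(n)
b = n ** (-1.0 / 8.5)  # rule-of-thumb bandwidth from the footnote

h1, h2 = h_hat(0.0, 0.0, np.zeros(d), X, Y, Z, b)
print(h1, h2)
```

Evaluating `h_hat` at every sample point $(X_i, Y_i, Z_i)$ yields the two vectors of estimates that a distance measure would then compare.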
There are different ways to measure the distance between the two products of densities $h_1$ and $h_2$:

(i) The weighted Hellinger distance proposed by Su and White (2008):

$$d_H = \frac{1}{n} \sum_{i=1}^{n} \left\{ 1 - \sqrt{\frac{h_2(X_i, Y_i, Z_i)}{h_1(X_i, Y_i, Z_i)}} \right\}^{2} a(X_i, Y_i, Z_i), \qquad (21)$$

where $a(\cdot)$ is a nonnegative weighting function. Both the weighting function $a(\cdot)$ and the resulting test statistic are specified in Su and White (2008).

(ii) The Euclidean distance proposed by Szekely and Rizzo (2004) in their 'energy test':

$$d_E = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \lVert h_{1i} - h_{2j} \rVert - \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} \lVert h_{1i} - h_{1j} \rVert - \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} \lVert h_{2i} - h_{2j} \rVert, \qquad (22)$$

2. I.e. $K_p((Z_i - z)/b) = \prod_{j=1}^{d} K((Z_{ji} - z_j)/b)$. For our simulations (see next section) we choose the kernel $K(u) = (3 - u^2)\phi(u)/2$, with $\phi(u)$ the standard normal probability density function. We use a "rule-of-thumb" bandwidth: $b = n^{-1/8.5}$.
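Both distances in (21) and (22) are straightforward to compute once $h_1$ and $h_2$ have been evaluated at the sample points. The sketch below makes several assumptions that are not in the text: the function names are hypothetical, the inputs are synthetic positive values rather than actual kernel estimates, and the weighting function $a(\cdot)$ is replaced by uniform weights as a placeholder, since its definition is deferred to Su and White (2008).

```python
import numpy as np

def hellinger_distance(h1, h2, a=None):
    """d_H = (1/n) * sum_i (1 - sqrt(h2_i / h1_i))^2 * a_i, as in (21).
    Requires h1 > 0 and h2 >= 0 so that the square root is defined."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    if a is None:
        # Placeholder assumption: uniform weights instead of the
        # weighting function specified in Su and White (2008).
        a = np.ones_like(h1)
    return float(np.mean((1.0 - np.sqrt(h2 / h1)) ** 2 * a))

def energy_distance(h1, h2):
    """d_E as in (22) for two equal-length vectors of scalar values,
    where the norm reduces to the absolute value."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    n = len(h1)
    between = np.abs(h1[:, None] - h2[None, :]).sum()
    within_1 = np.abs(h1[:, None] - h1[None, :]).sum()
    within_2 = np.abs(h2[:, None] - h2[None, :]).sum()
    return float(between / n - within_1 / (2 * n) - within_2 / (2 * n))

# Synthetic stand-ins for the estimates evaluated at the sample points.
rng = np.random.default_rng(2)
h1_vals = rng.uniform(0.5, 1.5, size=200)
h2_equal = h1_vals.copy()                              # mimics H0: h1 = h2
h2_other = h1_vals * rng.uniform(0.5, 1.5, size=200)   # mimics a violation of H0

print(hellinger_distance(h1_vals, h2_equal), hellinger_distance(h1_vals, h2_other))
print(energy_distance(h1_vals, h2_equal), energy_distance(h1_vals, h2_other))
```

With the kernel from footnote 2, actual estimates can be negative in sparse regions, so the ratio in (21) may need trimming or weighting before the square root is taken; how this is handled is left to the cited references.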