Popescu

β > 0

$$
y_{C,i} = \sum_{k=1}^{K} \begin{bmatrix} a_{11} & 0 & 0 \\ a_{12} & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}_{C,k} y_{C,i-k} + w_{C,i} \tag{39}
$$

$$
x_{N,i} = \sum_{k=1}^{K} \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{22} \end{bmatrix}_{N,k} x_{N,i-k} + w_{N,i}, \qquad y_{N,i} = B\, x_{N,i} \tag{40}
$$

$$
x_{D,i} = \sum_{k=1}^{K} \begin{bmatrix} a_{11} & 0 & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{22} \end{bmatrix}_{D,k} x_{D,i-k} + w_{D,i}
$$

$$
y_{MC} = (1 - |\beta|)\,\frac{y_N}{\lVert y_N \rVert_F} + |\beta|\,\frac{y_C}{\lVert y_C \rVert_F} \tag{41}
$$

$$
y_{DC} = (1 - \chi)\,\frac{y_{MC}}{\lVert y_{MC} \rVert_F} + \chi\,\frac{y_D}{\lVert y_D \rVert_F} \tag{42}
$$

Table 3, like the tables in the preceding section, shows results for all the usual methods, as well as for PSIpartial ($\Psi_p$ in the table), which is PSI computed on the partial coherence as defined above, using Welch (cross-spectral) estimators, for the case of mixed noise and a common driver.

Table 3: TRIPLES: Commonly driven, additive mixed colored noise

| Method | Max. accuracy (100) | Max. accuracy (500) | Max. accuracy (1000) | Max. accuracy (5000) | TP rate at FP < 0.10 (100) | TP rate at FP < 0.10 (500) | TP rate at FP < 0.10 (1000) | TP rate at FP < 0.10 (5000) |
|---|---|---|---|---|---|---|---|---|
| $\Psi_p$ | 0.53 | 0.61 | 0.71 | 0.75 | 0.12 | 0.31 | 0.49 | 0.56 |
| $\Psi$ | 0.54 | 0.60 | 0.70 | 0.72 | 0.10 | 0.25 | 0.40 | 0.52 |
| CSI | 0.51 | 0.60 | 0.69 | 0.76 | 0.09 | 0.27 | 0.38 | 0.45 |
| PDC | 0.55 | 0.54 | 0.60 | 0.58 | 0.13 | 0.12 | 0.16 | 0.13 |
| DTF | 0.51 | 0.56 | 0.59 | 0.61 | 0.12 | 0.09 | 0.09 | 0.11 |

Note that the TP rates are lower for all methods than in Table 2, which represents the mixed-noise situation without a common driver.

10. Discussion

In a recent talk, Emanuel Parzen (Parzen, 2004) proposed, both in hindsight and for future consideration, that the aim of statistics should be an ‘answer machine’, i.e. a more intelligent, automatic and comprehensive version of Fisher’s almanac, which currently consists of a plenitude of chapters and sections related to different types of hypotheses
and assumption sets meant to model, insofar as possible, the ever-expanding variety of available data. These categories and sub-categories are not always distinct, and furthermore there are competing general approaches to the same problems (e.g. Bayesian vs. frequentist). Is an ‘answer machine’ realistic for time-series causality, whose prerequisites are found throughout this almanac and which has developed in parallel in different disciplines?

This work began by discussing Granger causality in abstract terms, pointing out the implausibility of finding a general method of causal discovery, since that depends on the general learning and time-series prediction problems, which are incomputable. However, if any consistent pattern can be found (using non-parametric tests) that maps the history of one time-series variable to the current state of another, there is sufficient evidence of causal interaction, and the null hypothesis of Granger non-causality is rejected. Such a determination still does not address the direction of interaction or the relative strength of causal influence, which may require a complete model of the DGP. This study, like many others, relied on the rather strong assumption of stationary linear Gaussian DGPs, but otherwise made weak assumptions on model order, sampling and observation noise. Are there, instead, more general assumptions we can use?
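The rejection logic described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a linear least-squares predictor as the "pattern" that maps one series' history to the other's current state, and a circular-shift surrogate of the candidate cause as the non-parametric null. The function names (`lagged_design`, `gc_statistic`, `gc_surrogate_test`), the lag order K = 2 and the simulated bivariate VAR(1) are hypothetical choices, not the procedure evaluated in this study. Like any such test, it only rejects Granger non-causality; it does not by itself establish direction or relative strength.

```python
import numpy as np

# Hypothetical helpers (illustrative only): a linear predictor stands in for
# the "consistent pattern", and circular shifts of the cause give the null.


def lagged_design(series_list, K):
    """Design matrix with an intercept and lags 1..K of each series.

    Row r corresponds to time t = K + r and holds s[t-1], ..., s[t-K]
    for every series s in series_list.
    """
    T = len(series_list[0])
    cols = [np.ones(T - K)]
    for s in series_list:
        for k in range(1, K + 1):
            cols.append(s[K - k:T - k])
    return np.column_stack(cols)


def residual_variance(target, predictors, K):
    """Mean squared residual of a least-squares fit of target[t] on lagged predictors."""
    y = target[K:]
    X = lagged_design(predictors, K)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid) / len(y)


def gc_statistic(cause, effect, K):
    """Log ratio of restricted (effect's own past) to full (plus cause's past) residual variance."""
    restricted = residual_variance(effect, [effect], K)
    full = residual_variance(effect, [effect, cause], K)
    return float(np.log(restricted / full))


def gc_surrogate_test(cause, effect, K=2, n_surrogates=199, seed=None):
    """Compare the observed statistic with circularly shifted surrogates of `cause`.

    A circular shift keeps the cause's own autocorrelation but breaks its
    time alignment with `effect`, giving an empirical null distribution.
    """
    rng = np.random.default_rng(seed)
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    observed = gc_statistic(cause, effect, K)
    T = len(cause)
    null = np.empty(n_surrogates)
    for b in range(n_surrogates):
        shift = int(rng.integers(K + 1, T - K))
        null[b] = gc_statistic(np.roll(cause, shift), effect, K)
    p_value = (1 + np.sum(null >= observed)) / (n_surrogates + 1)
    return observed, float(p_value)


# Demo on a simulated bivariate VAR(1) in which x drives y (assumed toy DGP).
rng = np.random.default_rng(0)
T = 1000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

stat_xy, p_xy = gc_surrogate_test(x, y, K=2, seed=1)  # x -> y (true link)
stat_yx, p_yx = gc_surrogate_test(y, x, K=2, seed=1)  # y -> x (no link)
print(f"x->y: stat={stat_xy:.3f}, p={p_xy:.3f}")
print(f"y->x: stat={stat_yx:.3f}, p={p_yx:.3f}")
```

A circular shift is used instead of a random permutation of samples so that the surrogate cause keeps its autocorrelation; the null distribution then reflects only the loss of temporal alignment between the two series.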
The following is a list of competing approaches in increasing order of (subjectively judged) strength of the underlying assumption(s):

∙ Non-parametric tests of conditional probability for Granger non-causality rejection. These directly compare the probability distributions $P(y_{1,j} \mid y_{1,j-1..1}, u_{j-1..1})$ and $P(y_{1,j} \mid y_{1,j-1..1}, y_{2,j-1..1}, u_{j-1..1})$ to detect a possible statistically significant difference. Proposed approaches (see the chapter by Moneta et al. (2011) in this volume for a detailed overview and a tabulated robustness comparison) include product-kernel density estimation with kernel smoothing (Chlaß and Moneta, 2010), made robust by bootstrapping and paired with density distances such as the Hellinger (Su and White, 2008) or Euclidean (Szekely and Rizzo, 2004) distance, or with completely nonparametric difference tests such as Cramér-von Mises or Kolmogorov-Smirnov. A potential pitfall of nonparametric approaches is their loss of power as the dimensionality of the space over which the probabilities are estimated grows, the so-called curse of dimensionality (Yatchew, 1998). This can occur if the lag order K that must be considered is high, if the system memory is long, or if the number of other variables on which GC must be conditioned ($u_{j-1..1}$) is high. In the case of mixed noise, strong GC estimation would require accounting for all observed variables (which in neuroscience can number in the hundreds). While non-parametric non-causality rejection is a very useful tool (and could be valid even if the lag considered in the analysis is much smaller than the true lag K), in practice we require robust estimates of causal direction and of the relative strength of different factors, which implies a complete accounting of all relevant factors.
As already discussed, in many cases Granger non-causality is likely to be rejected in both directions, so it is useful to find the dominant one.

∙ General parametric or semi-parametric (black-box) predictive modeling subject to GC interpretation, which can provide directionality, factor analysis and inter-