

2.2 Regularized Linear Classification

There is some debate about whether to use linear or non-linear methods to classify single EEG trials; see the discussion in [33]. In our experience, linear methods perform well if an appropriate preprocessing of the data is performed. For example, band-power features themselves are far from Gaussian distributed due to the involved squaring, and the best classification of such features is nonlinear. But applying the logarithm to those features makes their distribution close enough to Gaussian that linear classification typically works well. Linear methods are easy to use and robust. There is one caveat, however, that also applies to linear methods: if the number of dimensions of the data is high, simple classification methods like Linear Discriminant Analysis (LDA) will not work properly. The good news is that there is a remedy called shrinkage (or regularization) that helps in this case. A more detailed analysis of the problem and the presentation of its solution is quite mathematical. Accordingly, the subsequent subsection is only intended for readers interested in those technical details.
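As a minimal illustration of the log band-power point made above (not code from the original chapter), the following Python sketch band-pass filters surrogate EEG epochs, takes the logarithm of the band power per channel, and classifies the resulting features with plain LDA. The sampling rate, frequency band, and data shapes are assumed values; SciPy and scikit-learn are assumed to be available.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def log_bandpower(epochs, fs, band=(8.0, 12.0)):
    """Log band power per channel; epochs has shape (n_trials, n_channels, n_samples).

    The squaring makes raw band-power values strongly non-Gaussian; taking the
    logarithm brings them close enough to Gaussian for linear classification.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, epochs, axis=-1)
    power = np.mean(filtered ** 2, axis=-1)          # band power per trial and channel
    return np.log(power)                             # shape (n_trials, n_channels)

# Surrogate data only (random noise), so the accuracy will be near chance level;
# with real motor-imagery EEG the same pipeline is a standard baseline.
rng = np.random.default_rng(0)
fs = 100
epochs = rng.standard_normal((200, 16, 300))         # 200 trials, 16 channels, 3 s
labels = rng.integers(0, 2, size=200)

X = log_bandpower(epochs, fs)
scores = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5)
print("LDA accuracy on log band-power features: %.2f" % scores.mean())
```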

2.2.1 Mathematical Part

For known Gaussian distributions with the same covariance matrix for all classes, it can be shown that Linear Discriminant Analysis (LDA) is the optimal classifier in the sense that it minimizes the risk of misclassification for new samples drawn from the same distributions [34]. Note that LDA is equivalent to Fisher Discriminant Analysis and to Least Squares Regression [34]. For EEG classification, the assumption of Gaussianity can be fulfilled rather well by appropriate preprocessing of the data. However, the means and covariance matrices of the distributions have to be estimated from the data, since the true distributions are not known.
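The stated equivalence to least-squares regression can be checked numerically: for two classes, regressing the features onto $\pm 1$ labels yields a weight vector that points in the same direction as the LDA weight $\hat{\Sigma}_w^{-1}(\hat{\mu}_1 - \hat{\mu}_2)$. The following sketch uses illustrative Gaussian data (not data from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 500
mu1, mu2 = np.zeros(d), np.ones(d)
cov = np.eye(d) + 0.3                                # shared class covariance
X1 = rng.multivariate_normal(mu1, cov, n)
X2 = rng.multivariate_normal(mu2, cov, n)
X = np.vstack([X1, X2])
y = np.concatenate([np.ones(n), -np.ones(n)])        # class labels coded as +/-1

# LDA direction: inverse pooled within-class covariance times mean difference.
Sw = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))
w_lda = np.linalg.solve(Sw, X1.mean(axis=0) - X2.mean(axis=0))

# Least-squares regression of the +/-1 labels on the features (with intercept).
A = np.hstack([X, np.ones((2 * n, 1))])
w_ls = np.linalg.lstsq(A, y, rcond=None)[0][:d]

cos = w_lda @ w_ls / (np.linalg.norm(w_lda) * np.linalg.norm(w_ls))
print("cosine similarity of LDA and least-squares directions: %.4f" % cos)  # ~1.0
```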

The standard estimator for a covariance matrix is the empirical covariance (see equation (1) below). This estimator is unbiased and has good properties under usual conditions. But in the extreme case of high-dimensional data with only few data points given, the estimate may become imprecise, because the number of unknown parameters that have to be estimated is quadratic in the number of dimensions. This leads to a systematic error: large eigenvalues of the original covariance matrix are estimated too large, and small eigenvalues are estimated too small; see Fig. 4. This error in the estimation degrades classification performance (and invalidates the optimality statement for LDA). Shrinkage is a common remedy for the systematic bias [35] of the estimated covariance matrices (e.g. [36]):

Let $x_1, \ldots, x_n \in \mathbb{R}^d$ be $n$ feature vectors and let
$$
\hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top} \qquad (1)
$$
be the unbiased estimator of the covariance matrix. In order to counterbalance the estimation error, $\hat{\Sigma}$ is replaced by
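A commonly used concrete form of such a replacement is the shrinkage estimator $\tilde{\Sigma}(\gamma) = (1-\gamma)\,\hat{\Sigma} + \gamma\,\nu I$ with $\nu = \operatorname{tr}(\hat{\Sigma})/d$ and a shrinkage parameter $\gamma \in [0, 1]$; this particular form is a standard choice and is assumed here purely for illustration. The NumPy sketch below reproduces the eigenvalue distortion described above and shows how this form of shrinkage counteracts it; the dimensions, ground-truth covariance, and value of $\gamma$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 100, 50                                       # many dimensions, few samples
true_cov = np.diag(np.linspace(1.0, 5.0, d))         # known ground-truth covariance
X = rng.multivariate_normal(np.zeros(d), true_cov, n)

# Empirical (unbiased) covariance estimate as in equation (1).
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
emp_cov = Xc.T @ Xc / (n - 1)

# Shrinkage towards a multiple of the identity: (1 - gamma) * Sigma_hat + gamma * nu * I.
# gamma = 0.5 is an arbitrary illustrative value; in practice it is tuned or set analytically.
gamma = 0.5
nu = np.trace(emp_cov) / d
shrunk_cov = (1 - gamma) * emp_cov + gamma * nu * np.eye(d)

for name, C in [("true", true_cov), ("empirical", emp_cov), ("shrunk", shrunk_cov)]:
    eig = np.linalg.eigvalsh(C)
    print("%9s covariance: smallest eigenvalue %6.2f, largest %6.2f" % (name, eig[0], eig[-1]))

# The empirical estimate exaggerates the eigenvalue spread (here many eigenvalues are
# even exactly zero because n < d); shrinkage pulls all eigenvalues back towards nu.
```

For classification, scikit-learn's `LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")` provides shrinkage-LDA with the shrinkage intensity chosen analytically (Ledoit–Wolf), so $\gamma$ does not need to be set by hand.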
