Mathematics in Independent Component Analysis

Chapter 1

Statistical machine learning of biomedical data

Biostatistics deals with the analysis of high-dimensional data sets originating from biological or biomedical problems. An important challenge in this analysis is to identify underlying statistical patterns that facilitate the interpretation of the data set using techniques from machine learning. A possible approach is to learn a more meaningful representation of the data set, one that maximizes certain statistical features. Such representations, which are often linear, have several potential applications, including the decomposition of objects into 'natural' components (Lee and Seung, 1999), redundancy and dimensionality reduction (Friedman and Tukey, 1975), biomedical data analysis, microarray data mining and enhancement, and feature extraction of images in nuclear medicine (Alpaydin, 2004, Bishop, 2007, Cichocki and Amari, 2002, Hyvärinen et al., 2001c, MacKay, 2003, Mitchell, 1997).

In the following, we will review some statistical representation models and discuss identifiability conditions. The resulting separation algorithms will be applied to various biomedical problems in the last part of this summary.

1.1 Introduction

Assume the data is given by a multivariate time series $x(t) \in \mathbb{R}^m$, where $t$ indexes time, space or some other quantity. Data analysis can be defined as finding a meaningful representation of $x(t)$, i.e. as $x(t) = f(s(t))$ with unknown features $s(t) \in \mathbb{R}^n$ and mixing mapping $f$. Often, $f$ is assumed to be linear, so we are dealing with the situation

$$x(t) = As(t) \qquad (1.1)$$

with a mixing matrix $A \in \mathbb{R}^{m \times n}$. Often, white noise $n(t)$ is added to the model, yielding $x(t) = As(t) + n(t)$; this noise can be included in $s(t)$ by increasing its dimension. In the situation (1.1), the analysis problem is reformulated as the search for a (possibly overcomplete) basis in which the feature signal $s(t)$ allows more insight into the data than $x(t)$ itself. This of course has to be specified within a statistical framework.
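To make the generative model (1.1) concrete, the following is a minimal sketch on synthetic data: two hidden source signals are mixed by a random matrix $A$ into three observed channels, with additive white noise $n(t)$. All concrete choices here (signal shapes, dimensions, noise level, variable names) are illustrative assumptions and are not taken from the text.

```python
# Minimal sketch of the linear mixing model x(t) = A s(t) + n(t) from (1.1).
# All numerical choices below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

T = 1000                                    # number of time points t
t = np.linspace(0, 8 * np.pi, T)

# Two hidden feature signals s(t) in R^2: a sinusoid and a super-Gaussian signal
s = np.vstack([np.sin(t), rng.laplace(size=T)])

# Mixing matrix A in R^{3x2}: three observed channels, two sources
A = rng.normal(size=(3, 2))

# Observed multivariate time series x(t) in R^3, with additive white noise n(t)
x = A @ s + 0.05 * rng.normal(size=(3, T))

print(x.shape)  # (3, 1000): m = 3 observed signals over T samples
```

In this setting only $x(t)$ is observed; recovering $A$ and $s(t)$ from $x(t)$ alone is the separation problem addressed by the models reviewed in the following sections.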

There are two general approaches to finding data representations or models as in (1.1):

