27.08.2013 Views

Libro de Resúmenes / Book of Abstracts (Español/English)



DNA microarrays are typical examples of a high-dimensional context. They are part of a new class of biotechnologies that allow the monitoring (and quantification) of expression levels for thousands of genes simultaneously, yielding gene expression profiles.

DNA microarray data contain more variables than observations (a few profiles that quantify the expression levels of several thousand genes). For this reason, traditional multivariate classification methods tend to produce unstable results due to overparametrization. In this context, feature selection becomes relevant.

In supervised classification, all definitions of “feature selection” emphasize the problem of choosing a minimal subset of relevant features that characterizes and discriminates between classes in datasets other than the learning set.

We analyze a data set from a DNA microarray experiment comparing wild-type (Col-0) and mutant (Npr-1, Jar-1) plants of Arabidopsis thaliana. A total of 11250 gene expression profiles were evaluated, before and 18 hours after inoculation with the fungus Erysiphe cichoracearum.

The goal is to choose a manageable set of genes made up of those that are clearly activated or repressed by the infection and those that distinguish among the different kinds of plants.

Obtaining pertinent feature subsets is very important for the classification task because experimental confirmation of the statistical findings can only be carried out on a small set of genes, using costly laboratory techniques.

This work presents an empirical comparison of the performance of different feature selection methods on this dataset. The methods are based on distance and consistency measures and are built from known procedures such as the RELIEF filter (Kira and Rendell, 1992).
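
To illustrate the kind of distance-based filter mentioned above, the following is a minimal sketch of a two-class RELIEF-style weighting in plain Python. It is not the variant evaluated in this work: the function name `relief`, its parameters, and the toy data are hypothetical, and the update uses range-scaled absolute differences, a common textbook formulation assumed here for simplicity. Each sampled profile is compared with its nearest neighbour of the same class (the hit) and of the other class (the miss); a feature gains weight when it differs more on the miss than on the hit.

```python
import random


def relief(X, y, n_iter=None, seed=0):
    """Return one relevance weight per feature (higher means more relevant).

    Hypothetical helper for illustration; assumes two classes with at
    least two observations each and numeric features.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(p):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance on the range-scaled features.
        return sum(diff(j, a, b) for j in range(p))

    m = n_iter or n
    weights = [0.0] * p
    for _ in range(m):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(p):
            # Reward separation from the miss, penalise spread from the hit.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / m
    return weights


# Toy data (invented): feature 0 separates the classes, the others do not.
X = [
    [0.1, 0.9, 0.5],
    [0.2, 0.1, 0.4],
    [0.0, 0.5, 0.6],
    [0.9, 0.8, 0.5],
    [1.0, 0.2, 0.4],
    [0.8, 0.6, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

weights = relief(X, y)
# Features ordered from most to least relevant; feature 0 comes first here.
ranking = sorted(range(len(weights)), key=lambda j: -weights[j])
print(ranking, weights)
```

With thousands of genes and only a few profiles, a ranking of this kind would typically be cut off at a small number of top-weighted genes, which matches the stated goal of obtaining a set small enough for laboratory confirmation.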
