Libro de Resúmenes / Book of Abstracts (Español/English)
DNA microarrays are a typical example of a high-dimensional setting.
They belong to a new class of biotechnologies that allow the expression
levels of thousands of genes to be monitored (and quantified)
simultaneously, yielding gene expression profiles.

DNA microarray data contain more variables than observations (a few
profiles that quantify the expression levels of several thousand genes).
For this reason, traditional multivariate classification methods tend to
produce unstable results due to overparameterization. In this context,
feature selection becomes relevant.

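One way to see why classical methods become unstable is that, with fewer profiles than genes, the sample covariance matrix cannot have full rank, so procedures that need to invert it are ill-posed. The following is a minimal sketch on simulated data, not the data of this study; the helper `matrix_rank` and the toy sizes (6 profiles, 50 genes) are assumptions made only for illustration.

```python
import random

def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

rng = random.Random(0)
n_profiles, n_genes = 6, 50  # toy sizes (assumption), far smaller than a real array
data = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_profiles)]

# Center each gene (column) on its mean, as a covariance estimate would.
means = [sum(row[j] for row in data) / n_profiles for j in range(n_genes)]
centered = [[row[j] - means[j] for j in range(n_genes)] for row in data]

# The covariance matrix of the genes has the same rank as the centered data.
rank = matrix_rank(centered)
print(rank, "independent directions for", n_genes, "genes")
```

The centered data have rank at most `n_profiles - 1`, so a 50 x 50 covariance matrix estimated from them is singular. Reducing the number of genes below the number of profiles is one way to restore a well-posed problem.
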
In supervised classification, all definitions of “feature selection”
emphasize the problem of choosing a minimal subset of relevant features
that characterizes and discriminates between classes in datasets other
than the learning set.

We analyze a dataset from a DNA microarray experiment comparing
wild-type (Col-0) and mutant (Npr-1, Jar-1) plants of Arabidopsis
thaliana. A total of 11250 gene expression profiles were evaluated before
and 18 hours after inoculation with the fungus Erysiphe cichoracearum.

The goal is to choose a manageable set of genes comprising those that
are clearly activated or repressed by the infection and those that
distinguish among the different kinds of plants.

Obtaining pertinent feature subsets is very important for the
classification task because the statistical findings can only be
confirmed experimentally on a small set of genes, using costly laboratory
techniques.

This work presents an empirical comparison of the performance of
different feature selection methods on this dataset. The methods are
based on distance and consistency measures and are built from known
procedures such as the RELIEF filter (Kira and Rendell, 1992).
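
As an illustration of a distance-based filter, the basic two-class Relief idea can be sketched as follows: each feature is rewarded when it differs between an instance and its nearest neighbour from the other class (nearest miss) and penalised when it differs from its nearest neighbour of the same class (nearest hit). This is only a sketch under stated assumptions, not the procedure used in this work: the function name `relief_weights`, the toy data, the use of every instance instead of a random sample by default, and the restriction to two classes are all choices made here. A comparison of three plant types would need a multiclass extension such as ReliefF.

```python
import random

def relief_weights(X, y, n_iter=None, seed=0):
    """Basic two-class Relief: reward features that differ on the nearest miss
    and penalise features that differ on the nearest hit."""
    n, p = len(X), len(X[0])
    # Scale every feature to [0, 1] so that differences are comparable.
    lo = [min(row[j] for row in X) for j in range(p)]
    hi = [max(row[j] for row in X) for j in range(p)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(p)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(p)] for row in X]

    def dist(a, b):
        # Squared Euclidean distance between two scaled instances.
        return sum((u - v) ** 2 for u, v in zip(a, b))

    rng = random.Random(seed)
    # Assumption: visit every instance once unless a sample size is given.
    picks = list(range(n)) if n_iter is None else [rng.randrange(n) for _ in range(n_iter)]
    weights = [0.0] * p
    for i in picks:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(Z[i], Z[k]))
        miss = min(misses, key=lambda k: dist(Z[i], Z[k]))
        for j in range(p):
            weights[j] += (Z[i][j] - Z[miss][j]) ** 2 - (Z[i][j] - Z[hit][j]) ** 2
    return [w / len(picks) for w in weights]

# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.0, 0.5], [0.9, 0.8], [1.0, 0.2], [0.8, 0.4]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y)
print(w)
```

Genes would then be ranked by weight and the top-ranked ones kept. Where to cut the ranking is a separate decision that the sketch does not address.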