28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management



Classifying Patterns in Bioinformatics Databases

50% (1655) corresponds to non-significant instances. The genetic sequences consist of 60 nucleotides, and their binary coding is also shown in Table 2. Since we only want to distinguish between EI and IE sequences, the remaining 50% of the instances are treated as non-significant.
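As an illustration of this coding step, the following minimal Python sketch concatenates the one-hot codes of the A, T, C and G rows of Table 2, so that a 60-nucleotide window becomes a 240-bit binary pattern. The names `ONE_HOT` and `encode_sequence` are hypothetical, the input is a placeholder rather than a real splice-junction instance, and the sketch handles only the four unambiguous nucleotides.

```python
# Hypothetical lookup: 4-bit one-hot codes for A, T, C and G, taken from Table 2.
ONE_HOT = {"A": (1, 0, 0, 0), "T": (0, 1, 0, 0), "C": (0, 0, 1, 0), "G": (0, 0, 0, 1)}


def encode_sequence(seq: str) -> list:
    """Concatenate the 4-bit code of every nucleotide into one flat binary pattern."""
    pattern = []
    for nucleotide in seq.upper():
        pattern.extend(ONE_HOT[nucleotide])  # raises KeyError for symbols outside A/T/C/G
    return pattern


splice_window = "ATCG" * 15  # placeholder 60-nucleotide window, not real data
bits = encode_sequence(splice_window)
print(len(bits))  # 240 (60 nucleotides x 4 bits)
```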

Because alpha-beta heteroassociative memories operate on binary values, the patterns used in the learning and recall phases must also be binary. It was therefore necessary to build a correspondence table between DNA symbols and binary patterns. The results shown in this work and in [32] indicate that one-hot coding performs best.
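For readers unfamiliar with these binary operations, the sketch below implements the α and β operators and a max-type alpha-beta heteroassociative memory as they are commonly defined in the associative-memory literature. It is an assumption-based illustration of the general technique, not the ABMMC classifier evaluated in this chapter; `ALPHA`, `BETA`, `learn_max` and `recall_max` are hypothetical names.

```python
# Assumed standard definitions of the alpha-beta operators:
# alpha maps {0,1} x {0,1} -> {0,1,2}; beta maps {0,1,2} x {0,1} -> {0,1}.
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}
BETA = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1, (2, 0): 1, (2, 1): 1}


def learn_max(pairs):
    """Learning phase of a max-type memory: v_ij = max over all pairs of alpha(y_i, x_j)."""
    x0, y0 = pairs[0]
    memory = [[0] * len(x0) for _ in range(len(y0))]  # 0 is the smallest alpha value
    for x, y in pairs:
        for i, yi in enumerate(y):
            for j, xj in enumerate(x):
                memory[i][j] = max(memory[i][j], ALPHA[(yi, xj)])
    return memory


def recall_max(memory, x):
    """Recall phase of a max-type memory: y_i = min over j of beta(v_ij, x_j)."""
    return [min(BETA[(vij, xj)] for vij, xj in zip(row, x)) for row in memory]


# Toy associations: one-hot inputs mapped to 2-bit class patterns.
pairs = [([1, 0, 0, 0], [1, 0]), ([0, 1, 0, 0], [0, 1])]
V = learn_max(pairs)
print(recall_max(V, [1, 0, 0, 0]))  # [1, 0]
print(recall_max(V, [0, 1, 0, 0]))  # [0, 1]
```

Because β(α(y, x), x) = y for every pair of binary values, a single stored association is always recalled exactly; with several associations, exact recall depends on the stored patterns, as in this toy case.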

This coding is especially useful when some positions of a DNA sequence must indicate that more than one value is possible. For example, in the sequence ATCG the third position could be either “C” or “T”. To build a binary pattern that represents both values, we simply apply the bitwise OR operator to the codes for “C” and “T”. This works because alpha-beta memories perform well in the presence of altered patterns (see Table 2).
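The OR construction described above can be sketched in a few lines of Python, assuming the one-hot codes of Table 2; `ONE_HOT` and `ambiguous_code` are hypothetical names used only for illustration.

```python
# One-hot codes for the four nucleotides, as 4-bit integers (from Table 2).
ONE_HOT = {"A": 0b1000, "T": 0b0100, "C": 0b0010, "G": 0b0001}


def ambiguous_code(*options: str) -> str:
    """Bitwise-OR the codes of every nucleotide allowed at one position."""
    code = 0
    for nucleotide in options:
        code |= ONE_HOT[nucleotide]
    return format(code, "04b")  # render as a 4-character bit string


print(ambiguous_code("C", "T"))  # 0110
print(ambiguous_code("A", "G"))  # 1001, the same bits as the R row of Table 2

# ATCG where the third position may be either C or T:
positions = [("A",), ("T",), ("C", "T"), ("G",)]
print([ambiguous_code(*p) for p in positions])  # ['1000', '0100', '0110', '0001']
```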

Table 2. Nucleotide Conversion

Nucleotide   Code
A            1000
T            0100
C            0010
G            0001
D            1011
N            1111
S            0101
R            1001

To estimate how well the algorithm learned both the promoter concept and the EI/IE splice-junction sequences, we carried out a series of experiments and compared the results with those of other algorithms tested under the same conditions.

4.1 DNA Promoter Sequence Classification

Our results can be compared with two other works, J. Ortega [33] and Baffes and Mooney [34]. Both carried out their experiments under the following conditions: 85 instances were randomly selected to build the training set, and the remaining 21 were left for the test set. This procedure was repeated 100 times. Table 3 compares [33], [34], and ABMMC, where N/D stands for Not Determined.
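The evaluation protocol described above (random 85/21 splits repeated 100 times) is a form of repeated random subsampling. A minimal Python sketch follows; `repeated_holdout` and the `classify` callback are hypothetical, the majority-vote classifier is a trivial stand-in rather than ABMMC, and the data are toy values, not the promoter dataset. Averaging test accuracy over the repetitions is an assumption about how results of this kind are aggregated.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def repeated_holdout(
    instances: Sequence,
    labels: Sequence,
    classify: Callable,  # hypothetical: classify(train_x, train_y, test_x) -> predicted labels
    n_train: int = 85,
    repetitions: int = 100,
    seed: int = 0,
) -> float:
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(instances)))
    accuracies = []
    for _ in range(repetitions):
        rng.shuffle(indices)  # new random split on every repetition
        train, test = indices[:n_train], indices[n_train:]
        predictions = classify(
            [instances[i] for i in train],
            [labels[i] for i in train],
            [instances[i] for i in test],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        accuracies.append(correct / len(test))
    return mean(accuracies)


def majority(train_x, train_y, test_x):
    """Trivial stand-in classifier: predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)


data = list(range(106))  # 106 toy instances: 85 for training, 21 for testing
labels = ["+"] * 53 + ["-"] * 53
print(repeated_holdout(data, labels, majority))
```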

ABMMC clearly outperforms the other algorithms, even when trained with only 80 or 70 instances
