28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management


Classifying Patterns in Bioinformatics Databases

Inside DNA chains there are nucleotide strings known as genes, which contain coding sequences of nucleotides called exons. These exons are of particular interest because each corresponds to a certain protein: it codes for that protein. However, exons are scattered along DNA chains in a disorderly fashion, which makes it very difficult to take a portion of the chain and say that it codes for a certain protein as is. Between exons there are sequences which do not code for a protein, called introns; the non-coding sequences that lie between genes are known as intergenic regions [7], [10], [11].

On the other hand, proteins are polypeptides formed inside cells as sequences of 20 different amino acids [12], which are denoted by 20 different letters. Each of these 20 amino acids is coded by one or more codons [9]. The chemical properties that differentiate the 20 amino acids make them group together to form proteins with particular three-dimensional structures, which define the specific functions of a cell [11].
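The codon-to-amino-acid correspondence described above can be sketched in a few lines. This is only an illustration, assuming the standard genetic code written over DNA bases; the names `CODON_TABLE`, `translate`, and `codons_for` are hypothetical helpers introduced here, not part of the chapter.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# '*' marks a stop codon. The one-letter codes are the "20 different letters".
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons (e.g. "ATG") to a one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA string codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    # Ignore any trailing bases that do not fill a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def codons_for(amino_acid):
    """Return every codon that codes for the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
```

For instance, `translate("ATGGCCTAA")` reads the codons ATG, GCC, and TAA, and `codons_for("L")` lists the six codons for leucine, which shows how one amino acid can be coded by more than one codon.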

2.1.2 Main Problems in Bioinformatics

The problems addressed by Bioinformatics can be classified into three categories: genomic tasks, proteomic tasks, and gene expression tasks. The first refers to the study of various genomes when taken as entities with similar contents, while the second refers to the study of all the proteins which arise from a genome [13]. The last one is concerned with the relationships between a nucleotide string and the proteins generated by that string. In this chapter we are interested in the genomic task, particularly in both promoter and splice-junction identification. Promoters are regions located immediately before each gene, indicating that what follows is a gene; they also regulate the beginning of transcription [14]. A splice-junction zone is where an intron becomes an exon and vice versa; identifying these zones is important in order to determine which segments of the sequence code for a protein [14].
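To make the notion of a splice junction concrete, the sketch below derives junction positions from a list of already annotated exons and then joins the exons into the coding sequence. It assumes exons are given as sorted, non-overlapping, half-open `(start, end)` index pairs; that annotation format and the function names are hypothetical and serve only to illustrate the definition. Identifying junctions in unannotated sequences is the actual classification problem and is not attempted here.

```python
def splice_junctions(exons):
    """List the boundaries between consecutive exons.

    `exons` is a sorted list of half-open (start, end) intervals.
    The gap between two consecutive exons is treated as an intron.
    """
    junctions = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        junctions.append((prev_end, "exon->intron"))
        junctions.append((next_start, "intron->exon"))
    return junctions


def coding_sequence(sequence, exons):
    """Concatenate the exon segments, dropping the introns between them."""
    return "".join(sequence[start:end] for start, end in exons)


# Toy example: upper case marks exons, lower case marks introns.
toy_sequence = "ATGGCgtaagCCTTAcagGTAA"
toy_exons = [(0, 5), (10, 15), (18, 22)]
```

With the toy data, `splice_junctions(toy_exons)` reports the four positions where the sequence switches between exon and intron, and `coding_sequence(toy_sequence, toy_exons)` returns only the upper-case segments.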

2.1.3 Computational Tools Applied to Bioinformatics

Artificial neural networks are among the most widely used computational methods in Bioinformatics. They are a set of models which emulate biological neural networks, and they are most often applied to classification and function approximation problems. It is worth noting, however, that even though neural networks and associative memories are equivalent models, as shown by Hopfield [15], associative memories have not been used much in Bioinformatics. Results obtained by feed-forward back-propagation neural networks have been compared to those of BLAST, one of the most widely used tools in Bioinformatics. In these comparisons, the neural network trained with the back-propagation algorithm showed better processing speed, but its results were not as effective [16]-[20].
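As a rough illustration of how a neural network can behave as an associative memory, the following sketch implements a small Hopfield-style network with Hebbian weights and synchronous sign updates. It is a generic textbook construction under stated assumptions (bipolar patterns of +1 and -1, a zero diagonal, ties leaving a unit unchanged), not the model developed in this chapter; the function names and the stored patterns are hypothetical.

```python
def train_hebbian(patterns):
    """Build the weight matrix as a sum of outer products with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, state, max_steps=10):
    """Update all units synchronously until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            if field == 0:
                new_state.append(state[i])  # tie: keep the current value
            else:
                new_state.append(1 if field > 0 else -1)
        if new_state == state:
            break
        state = new_state
    return state


# Two orthogonal bipolar patterns to store.
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hebbian(stored)

# The first pattern with its first component flipped.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
restored = recall(weights, noisy)
```

Here `restored` equals the first stored pattern: the memory is addressed by content, so a corrupted input retrieves the stored pattern closest to it. The choice of synchronous updates keeps the sketch short; asynchronous updates are the more common formulation.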
