Algorithmes de prediction et de recherche de multi-structures d'ARN
Chapter 1. Background – RNAs, non-coding RNAs, and bioinformatics
The method combines two measures, one for RNA secondary structure conservation and
one for the thermodynamic stability of the sequence. The consensus secondary
structure of the alignment is predicted using RNAalifold. These two independent diagnostic
features of structural ncRNAs are used to classify an alignment as “structural RNA” or
“other”. For this purpose, RNAz uses a support vector machine (SVM) learning algorithm
trained on a large set of well-known ncRNAs.
• Ab initio gene finders. Some methods use basic statistical features of the data, such as GC
percentage, to recognize the regions that exhibit these features. The study in [20] uses
a machine learning approach to extract common characteristics (such as base composition
bias) among known RNAs in order to predict new RNA genes in the unannotated regions
of prokaryotic and archaeal genomes. Further work has been carried out in this direction
in [98].
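As a rough illustration of the statistical screening idea, the following Python sketch slides a fixed-size window along a sequence and reports the windows whose GC fraction reaches a threshold. The function names, window size, step and threshold are assumptions chosen for the example; they are not taken from the cited studies, and [20] relies on a machine learning model rather than on a single cutoff.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(genome: str, window: int = 50, step: int = 10,
                    threshold: float = 0.6) -> List[Tuple[int, int, float]]:
    """Slide a window along genome and keep the windows whose GC fraction
    is at least threshold. Returns (start, end, gc) triples, end exclusive.

    window, step and threshold are illustrative defaults (assumptions).
    """
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if gc >= threshold:
            hits.append((start, start + window, gc))
    return hits
```

Overlapping hits would normally be merged into candidate regions and then passed to a more specific classifier; that step is omitted here.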
1.2.8 Comparison of RNA structures
Comparing two RNA structures involves defining and computing a similarity between them.
This similarity is often based on a distance between tree-like representations, such as trees or
arc-annotated sequences, and can take into account both the primary and the secondary structure.
Structure comparison can be used to confirm or reject the membership of an unknown RNA
in a given ncRNA family, and hence to predict its function, or to support evolutionary studies.
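To make the notion of a distance between structures concrete, here is a minimal Python sketch. It assumes that structures are written in dot-bracket notation (an assumption; the notation is not introduced in this section), converts each one to its set of arcs, and counts the base pairs found in only one of the two structures. This base-pair distance is a deliberately simple stand-in: it is not the tree edit or alignment distance on which tree-based comparison relies, and it ignores the primary sequence.

```python
from typing import Set, Tuple


def parse_dot_bracket(structure: str) -> Set[Tuple[int, int]]:
    """Convert a dot-bracket string into a set of arcs (i, j) with i < j,
    i.e. the arc-annotated view of a pseudoknot-free secondary structure."""
    stack = []
    arcs = set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            arcs.add((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return arcs


def base_pair_distance(a: str, b: str) -> int:
    """Number of base pairs present in exactly one of the two structures.
    Both structures must annotate sequences of the same length."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(parse_dot_bracket(a) ^ parse_dot_bracket(b))
```

For example, `base_pair_distance("((..))", "(....)")` counts the single inner pair that the second structure lacks. Because it requires equal lengths, this measure cannot compare members of a family with insertions or deletions, which is one reason for using tree-based distances.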
Comparison of RNA secondary structures is implemented in tools such as RNAdistance [51,
101, 100], RNAforester [55, 56, 57], Migal [3] and Gardenia [14]. To compute a distance value, it
has also been suggested to compare the whole folding spaces of two sequences through their partition
functions and matrices of base pairing probabilities, as in RNAPDIST [16, 51]. Finally, [2] reports
a benchmark of several RNA comparison tools.
1.2.9 Tools for non-secondary structures
As mentioned previously, a secondary structure is not the real 3D structure of an RNA. We
now briefly mention methods and tools that go beyond secondary structures, or that combine
secondary structures.
• Prediction of RNA-RNA interactions. Some small non-coding RNAs have a post-transcriptional
regulatory role: by base-pairing with mRNAs, they modify gene expression.
A particular case is miRNA interaction, for which many tools have been proposed
(reviewed in [10]). For generic RNA-RNA interactions, some methods consider only the
interaction site, neglecting intra-RNA interactions, such as RNAhybrid [91], RNAduplex
[51] or RNAplex [109]. Other approaches take into account both intra-RNA structure
and inter-RNA interactions:
– A first idea is to compute the MFE or the partition function of the concatenation of
the two RNA sequences, as in Pairfold [7] and RNAcofold [51, 12];
– Some approaches, such as RNAup [79], consider the problem in two steps, first breaking
the intramolecular base pairing in both RNAs, then adding the intermolecular interaction. The
energy gained from hybridization is thus the sum of the energies of the two steps;