Algorithmes de prediction et de recherche de multi-structures d'ARN
Algorithmes de prediction et de recherche de multi-structures d'ARN
Algorithmes de prediction et de recherche de multi-structures d'ARN
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
16 Chapter 1. Background – RNAs, non-coding RNAs, and bioinformatics<br />
for secondary structure <strong>prediction</strong> maximizing the number of base pairs was introduced later, in<br />
1978, by Nussinov <strong>et</strong> al. [84]. Base pair maximization is a very simple mo<strong>de</strong>l and does not take<br />
into account full energy param<strong>et</strong>ers. The same year, Waterman and Smith [125] presented a<br />
m<strong>et</strong>hod to predict the MFE structure through a simple nearest-neighbor mo<strong>de</strong>l. In 1981, Zuker<br />
and Stiegler [134] proposed a more elaborated dynamic programming algorithm, predicting the<br />
MFE structure. Later in the 1980’s, these folding algorithms were applied on the Turner energy<br />
mo<strong>de</strong>l [131].<br />
mfold/unafold [132] and RNAfold [51] are some popular implementations of these algorithms.<br />
For a sequence of length n, they run in time O(n 3 ) (or even O(n 4 ) if one takes into account interior<br />
loops of any size) and space O(n 2 ). D<strong>et</strong>ailed analysis of suboptimal <strong>prediction</strong>s, especially<br />
for mfold/unafold, will be done in Chapter 3. Recently, some optimizations in average less time<br />
and space have been proposed [8].<br />
1.2.4 Structure <strong>prediction</strong>, beyond the MFE structure<br />
Most of the time, computing the MFE structure is not enough:<br />
• Even if the Turner mo<strong>de</strong>l is well established, this empiric thermodynamic data can be incompl<strong>et</strong>e:<br />
slight changes in the thermodynamic mo<strong>de</strong>l could produce very different foldings<br />
with similar energy levels;<br />
• The Nearest Neighbor mo<strong>de</strong>l does not allow for pseudoknots and base tripl<strong>et</strong>s, and it<br />
does not reflect the interactions of the RNA with other molecules in the cell. Thus the<br />
projection of a real RNA structure on its secondary component may not be optimal;<br />
• In the cell, RNA molecules do not take only the most stable structure, but they seem to<br />
rapidly change their conformation b<strong>et</strong>ween <strong>structures</strong> with similar free energies. These<br />
changes can be induced by the interaction of the RNA with other molecules, and mostly<br />
concern their 3D structure. Some RNAs, including several mRNAs in their untranslated<br />
region, have even been shown to bear conformational switching in their secondary structure.<br />
Thus the MFE structure is not sufficient to fully un<strong>de</strong>rstand the folding possibilities of an<br />
RNA. Instead of g<strong>et</strong>ting the optimal structure, a compl<strong>et</strong>e backtracking through the dynamic<br />
programming table can generate all suboptimal secondary <strong>structures</strong> within a given threshold<br />
from minimum free energy. Being able to compute alternative <strong>structures</strong> may also be useful<br />
in <strong>de</strong>signing RNA sequences which not only have low folding energy, but whose folding landscape<br />
would suggest rapid and robust folding. Such suboptimal backtracking is implemented in<br />
RNAsubopt [127] using Waterman and Byers algorithm [124], whereas mfold does not implement<br />
this full backtrack (see Chapter 3).<br />
However, the s<strong>et</strong> of suboptimal <strong>structures</strong> is often huge : the number of such <strong>structures</strong><br />
is exponential in the length of the sequence. RNAshapes [105] suggests “abstract shapes” to<br />
classify the secondary <strong>structures</strong>. During computation of the suboptimal <strong>structures</strong>, RNAshapes<br />
partitions the folding space into s<strong>et</strong>s of <strong>structures</strong> according to their shapes.<br />
One may be interested by selecting only relevant <strong>structures</strong>, for example those corresponding<br />
to a local minimum of energy. The goal is to keep relevant suboptimal <strong>structures</strong> in the free<br />
energy landscape that correspond to local minima. The notions of saturated <strong>structures</strong> and<br />
locally optimal secondary <strong>structures</strong> me<strong>et</strong> these requirements.