Algorithmes de prediction et de recherche de multi-structures d'ARN
Chapter 1. Background – RNAs, non-coding RNAs, and bioinformatics
The “full” set of sequences is automatically produced with Infernal [81] (covariance models
using SCFGs [31], see Section 1.2.6) on regions selected by sequence similarity scans. Some of
these “full” sequences may be “promoted” to “seed” sequences after a “manual verification”. All
“full” sequences are aligned and annotated with a consensus secondary structure. Note that this
consensus structure is based on the alignment, and thus may not be present in each sequence.
There are other, more specialized ncRNA databases, for example:
• The Sprinzl tRNA Database [103] 5 and the Genomic tRNA Database 6 both focus on
tRNAs. The second database contains predictions obtained with tRNAscan-SE [66];
• miRBase [45] 7 is a database of microRNA sequences and their corresponding mature
sequences;
• RNaseP Database [18] 8 is a database of RNase P RNAs;
• RDP-II [24] 9 and SILVA [88] 10 are databases of ribosomal RNAs. The first contains
bacterial and archaeal small-subunit 16S rRNA sequences. The second contains
datasets of aligned small-subunit (16S/18S, SSU) and large-subunit (23S/28S, LSU)
ribosomal RNA (rRNA) sequences.
1.3 Contents of the thesis<br />
Algorithms on RNA secondary structures are far more efficient than those on pseudoknotted or
3D structures. Moreover, the secondary structure alone is often sufficient to identify and classify
ncRNAs. In this thesis, I focused on sets of secondary structures (either putative or real)
on the same sequence, which I call RNA multi-structures.
Indeed, even if the tools presented in Section 1.2.4 can predict distinct structures, including
“suboptimal” ones, no tool allows a set of several structures to be analyzed further at once. Anyone
interested in several “candidates” in a typical mfold/unafold or RNAsubopt output must repeat
the analysis (including pattern matching) on every candidate, even if these candidates share common
parts such as helices.
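To make the notion of shared helices concrete, the following minimal Python sketch lists the helices common to several candidate structures on the same sequence. It is only an illustration under stated assumptions, not the method developed in this thesis: the candidates are assumed to be given in dot-bracket notation without pseudoknots, a helix is taken to be a maximal run of stacked base pairs, and the function names (base_pairs, helices, shared_helices) are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of 0-based base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def helices(dot_bracket):
    """Group stacked pairs (i, j), (i+1, j-1), ... into maximal helices.

    Each helix is reported as (i, j, length), where (i, j) is its
    outermost base pair. This definition is an assumption of the sketch.
    """
    pairs = base_pairs(dot_bracket)
    found = set()
    for (i, j) in pairs:
        if (i - 1, j + 1) in pairs:
            continue  # (i, j) is not the outermost pair of its helix
        length = 1
        while (i + length, j - length) in pairs:
            length += 1
        found.add((i, j, length))
    return found


def shared_helices(structures):
    """Return the helices that appear identically in every structure."""
    if not structures:
        return set()
    common = helices(structures[0])
    for structure in structures[1:]:
        common &= helices(structure)
    return common


# Two hypothetical candidates on the same 20-nt sequence.
candidate_1 = "((((...))))..((...))"
candidate_2 = "((((...)))).((....))"
print(sorted(shared_helices([candidate_1, candidate_2])))  # [(0, 10, 4)]
```

In this example the first helix is common to both candidates, so any analysis of it would only need to be done once. Note that the sketch compares helices exactly: a helix that is one base pair longer in one candidate is not reported as shared.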
Moreover, as mentioned in Section 1.2.4, several RNAs, such as riboswitches, seem to change
their conformations: in these cases, studying only one structure, such as the optimal MFE
structure, is not enough to understand the RNA well. Is there any other structure
with the same energy value that could yield better information?
To efficiently represent multi-structures, I propose to consider them as nested levels of flat
structures (Chapter 2). This decomposition naturally follows the tree representation of RNA,
factorizing the common helices of the structures. I studied the following two questions:
5 http://www.staff.uni-bayreuth.de/~btc914/search/index.html
6 http://gtrnadb.ucsc.edu/<br />
7 http://www.mirbase.org/<br />
8 http://jwbrown.mbio.ncsu.edu/RNaseP<br />
9 http://rdp.cme.msu.edu/<br />
10 http://www.arb-silva.de/