11.01.2013

Prediction and search algorithms for RNA multi-structures



22 Chapter 1. Background – RNAs, non-coding RNAs, and bioinformatics

The “full” set of sequences is automatically produced with Infernal [81] (covariance models using SCFGs [31], see Section 1.2.6) on regions selected by sequence similarity scans. Some of these “full” sequences may be “promoted” to “seed” sequences after “manual verification”. All “full” sequences are aligned and annotated with a consensus secondary structure. Note that this consensus structure is derived from the alignment, and thus may not be formed by every individual sequence.
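The remark that a consensus structure need not be formed by every sequence can be checked mechanically. The sketch below is illustrative only and rests on several assumptions:

- The consensus is given in dot-bracket notation over alignment columns, using parentheses only. Stockholm files annotate it in a `#=GC SS_cons` line that may use other bracket types.
- Only Watson-Crick and G-U pairs count as admissible.
- The function names are hypothetical.

For one aligned sequence, the sketch lists the consensus base pairs that this sequence cannot form, either because of a gap or because of a non-canonical pair.

```python
from typing import List, Tuple

# Simplifying assumption: only Watson-Crick and G-U wobble pairs are admissible.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def consensus_pairs(ss_cons: str) -> List[Tuple[int, int]]:
    """Base pairs (i, j), in alignment-column coordinates, of a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for k, c in enumerate(ss_cons):
        if c == "(":
            stack.append(k)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {k}")
            pairs.append((stack.pop(), k))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


def unsupported_pairs(aligned_seq: str, ss_cons: str) -> List[Tuple[int, int]]:
    """Consensus pairs that this aligned sequence cannot form (gap or non-canonical)."""
    if len(aligned_seq) != len(ss_cons):
        raise ValueError("sequence and consensus must span the same columns")
    seq = aligned_seq.upper().replace("T", "U")
    return [(i, j) for i, j in consensus_pairs(ss_cons)
            if (seq[i], seq[j]) not in CANONICAL]


if __name__ == "__main__":
    ss_cons = "((((...))))"
    print(unsupported_pairs("GCGAUAAUCGC", ss_cons))  # []
    print(unsupported_pairs("GCG-UAAACGC", ss_cons))  # [(3, 7)]
```

Any gap symbol (`-` or `.`) falls outside `CANONICAL`, so a pair involving a gapped column is always reported as unsupported. A real energy-based check would be more permissive than this canonical-pair test.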

There are other, more specialized ncRNA databases, for example:
• The Sprinzl tRNA Database [103]⁵ and the Genomic tRNA Database⁶ both focus on tRNAs. The second database contains predictions obtained with tRNAscan-SE [66];
• miRBase [45]⁷ is a database of microRNA precursor (hairpin) sequences and their corresponding mature sequences;
• RNaseP Database [18]⁸ is a database of RNase P RNAs;
• RDP-II [24]⁹ and SILVA [88]¹⁰ are databases of ribosomal RNAs. The first one covers bacterial and archaeal small-subunit 16S rRNA sequences. The second one contains aligned datasets of small-subunit (16S/18S, SSU) and large-subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences.

1.3 Contents of the thesis<br />

Algorithms on RNA secondary structures are far more efficient than those on pseudoknotted or 3D structures. For instance, classical minimum free energy folding of secondary structures runs in O(n³) time for a sequence of length n, whereas predicting structures with arbitrary pseudoknots is NP-hard under standard energy models. Moreover, the secondary structure alone is often sufficient to identify and classify ncRNAs. In this thesis, I focused on sets of secondary structures – either putative or real – on the same sequence, which I call RNA multi-structures.

Indeed, even though the tools presented in Section 1.2.4 can predict distinct structures, including “suboptimal” ones, no tool can then analyze a set of several structures at once. Anyone interested in several “candidates” from a typical mfold/UNAFold or RNAsubopt output must repeat the analysis (including pattern matching) on every candidate, even when these candidates share common parts such as helices.
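The redundancy described above can be made concrete with a small sketch. This is not the representation proposed in this thesis.

1. Each candidate, written in dot-bracket notation (the format RNAsubopt prints), is parsed into its set of base pairs.
2. The pairs common to all candidates are separated from the candidate-specific ones.

Any analysis run candidate by candidate processes the common pairs once per candidate. The function names `base_pairs` and `split_common` are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def base_pairs(db: str) -> Set[Pair]:
    """Set of base pairs (i, j), 0-based with i < j, of a dot-bracket structure."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for k, c in enumerate(db):
        if c == "(":
            stack.append(k)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {k}")
            pairs.add((stack.pop(), k))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def split_common(candidates: List[str]) -> Tuple[Set[Pair], List[Set[Pair]]]:
    """Return the pairs shared by every candidate and, per candidate, the remaining pairs."""
    if not candidates:
        return set(), []
    if len({len(db) for db in candidates}) != 1:
        raise ValueError("all candidates must describe the same sequence")
    pair_sets = [base_pairs(db) for db in candidates]
    common = set.intersection(*pair_sets)
    return common, [s - common for s in pair_sets]


if __name__ == "__main__":
    candidates = [
        "((((....))))((...))",
        "((((....)))).......",
        "((((....))))(.....)",
    ]
    common, specific = split_common(candidates)
    print(sorted(common))                 # [(0, 11), (1, 10), (2, 9), (3, 8)]
    print([sorted(s) for s in specific])  # [[(12, 18), (13, 17)], [], [(12, 18)]]
```

In this example, the pair (12, 18) is shared by two of the three candidates but not by all of them. A plain intersection does not capture this kind of partial sharing.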

Moreover, as mentioned in Section 1.2.4, several RNAs, such as riboswitches, seem to change their conformation. In these cases, studying only one structure, such as the MFE structure, is not enough to understand the RNA well. Could another structure with the same energy value provide better information?

To represent multi-structures efficiently, I propose to consider them as nested levels of flat structures (Chapter 2). This decomposition naturally follows the tree representation of RNA and factorizes the helices common to the structures. I studied the following two questions:

5 http://www.staff.uni-bayreuth.de/~btc914/search/index.html
6 http://gtrnadb.ucsc.edu/
7 http://www.mirbase.org/
8 http://jwbrown.mbio.ncsu.edu/RNaseP
9 http://rdp.cme.msu.edu/
10 http://www.arb-silva.de/
