13.07.2015 Views

Computational Analysis and Paleogenomics of Interspersed ...

Computational Analysis and Paleogenomics of Interspersed ...

Computational Analysis and Paleogenomics of Interspersed ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Computational</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Interspersed</strong> Repeats | 39the biological termini <strong>of</strong> repeats (Bao <strong>and</strong> Eddy, 2002), this remains an inherentlydifficult problem for the downstream steps <strong>of</strong> genome annotation, <strong>and</strong> inparticular for the automated classification <strong>of</strong> the repeats. In theory, short seedextension approaches such as those used by RepeatScout or ReaS should beless sensitive to this problem <strong>and</strong> generate more faithfully defined consensussequences. Nevertheless, these programs still appear to suffer from the problem<strong>of</strong> repeat fragmentation, in particular when low coverage whole genome shotgunsequences are used as input (N. Ranganathan <strong>and</strong> C. F., unpublished observation).Classification <strong>of</strong> the repeatsRecent advances in the development <strong>of</strong> de novo repeat identification tools havebeen fueled by the pressing need to automate the construction <strong>of</strong> repeat librariesto assist in the assembly <strong>of</strong> the multitude <strong>of</strong> genome sequencing projects.All <strong>of</strong> the programs described above, in particular when used in combination(Quesneville et al., 2005), have the potential to greatly accelerate the construction<strong>of</strong> consensus repeat libraries <strong>and</strong> as such they should largely fulfill the needsfor raw repeat masking <strong>and</strong> assisting in genome assembly. However, the outputproduced by programs such as RECON (Bao <strong>and</strong> Eddy, 2002) or RepeatScout(Price et al., 2005) provide little if any biological information on the repeatsthemselves <strong>and</strong> therefore cannot be used to annotate <strong>and</strong> classify the repeats. Atpresent, there is no program to completely automate this tedious yet biologicallyimportant phase <strong>of</strong> genome annotation.Developing such s<strong>of</strong>tware represents a very challenging issue in computationalbiology. Not only are there currently only a few experts whose knowledge<strong>of</strong> the elements can cover the extreme diversity <strong>of</strong> TEs, but also new groups <strong>of</strong>elements are continuously being discovered <strong>and</strong> existing groups being reclassified,merged, or subdivided. Consequently, the body <strong>of</strong> TE literature is enormous<strong>and</strong> rapidly exp<strong>and</strong>ing. As <strong>of</strong> September 2005 a single keyword searchin Medline using the combination “(retro)transposable OR (retro)transposon”retrieves over 21 000 publications, including ~1500 review articles! Thus, synthesizingthe diversity <strong>of</strong> TEs into a rational <strong>and</strong> comprehensive classificationupon which one could develop an automatic system <strong>of</strong> repeat annotation isextremely difficult.As mentioned earlier, TEs are divided into two classes depending on theirtransposition intermediate. Class 1 elements transpose via reverse-transcription<strong>of</strong> a transcribed RNA intermediate, while class 2 elements move directly througha DNA intermediate (Finnegan, 1989). The two classes are further dividedinto distinct subclasses <strong>and</strong> a plethora <strong>of</strong> clades <strong>and</strong> superfamilies using otherstructural features related to other aspects <strong>of</strong> their transposition mechanism.For example, non-LTR retrotransposons used a target-primed mechanism <strong>of</strong>transposition where reverse-transcription <strong>and</strong> integration are coupled, whileLTR retrotransposons are reverse transcribed within a viral-like particle <strong>and</strong> theresulting cDNA integrated in the genome as a DNA intermediate.Uncorrected pro<strong>of</strong>s — not for distribution

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!