13.07.2015 Views

Computational Analysis and Paleogenomics of Interspersed ...

Computational Analysis and Paleogenomics of Interspersed ...

Computational Analysis and Paleogenomics of Interspersed ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Computational</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Interspersed</strong> Repeats | 45Consensus sequences <strong>and</strong> ancestral repeat reconstructionThe consensus sequence is generated from a multiple alignment <strong>of</strong> a sample<strong>of</strong> interspersed copies assigned to the same TE family using a simple majorityrule for each base position in the alignment (Fig. 2.3). The consensus may befurther improved by manually correcting ambiguous or overlooked positionsthat could introduce stop codons <strong>and</strong> frameshifts in the ORF(s). Given thatall TEs from a family arose from a single or few active TE copies (also knownas source or master elements), the consensus can be considered a particularlyaccurate approximation <strong>of</strong> the sequence <strong>of</strong> the original master copy that gaverise to the family under consideration ( Jurka <strong>and</strong> Milosavljevic, 1991; Smit etal., 1995). This principle has been experimentally validated through the demonstrationthat the reconstructed consensus sequence <strong>of</strong> three different animalDNA transposase families restored their enzymatic activity in vitro <strong>and</strong>/or invivo (Ivics et al., 1997; Lampe et al., 1996; Miskey et al., 2003). Another earlysuccessful exploitation <strong>of</strong> the consensus approach was the resurrection <strong>of</strong> anextinct promoter for the human L1 retrotransposon based on a phylogeneticallyreconstructed consensus (Adey et al., 1994).An extension <strong>of</strong> the concept <strong>of</strong> consensus is that sequence comparisons <strong>of</strong>individual repeat copies to their consensus can be used to infer an approximateage <strong>of</strong> the repeat family, i.e. the time lapsed since transposition <strong>and</strong> insertion<strong>of</strong> the copies (Kapitonov <strong>and</strong> Jurka, 1996; Smit et al., 1995). This is because atCopy 1Copy 2Copy 3Copy 4Copy 5Copy 6Copy 7Copy 8CONS.66756756Figure 2.3 The concept <strong>of</strong> TE consensus sequence <strong>and</strong> its utility to infer the age <strong>of</strong> aTE family. In this schematic representation, eight related TE sequences are shown asthick horizontal lines. The consensus (CONS.) is deduced from a multiple alignment<strong>of</strong> TE sequences that vary at several Figure positions. 3 Most sequence variations occur atdiscrete positions, but some are also shared between copies. All shared <strong>and</strong> nonsharednucleotide changes may be used as phylogenetic characters to infer the relationship<strong>of</strong> the elements within the family (see Figure 2.4). Changes include single nucleotidesubstitutions (vertical ticks) <strong>and</strong> insertions (triangle above the sequence) <strong>and</strong> deletions(interrupted line) <strong>of</strong> variable size. The number <strong>of</strong> pairwise nucleotide substitutionsto the consensus is shown for each copy on the right h<strong>and</strong> side <strong>of</strong> the figure. Theaverage number <strong>of</strong> substitution to the consensus (here n = 6) can be used to infer theapproximate age <strong>of</strong> the family if the neutral rate <strong>of</strong> nucleotide substitution per year isknown for the host species. For example, if the neutral rate is 1 substitution per site permillion year, the family depicted here would be ~6 million year old.Uncorrected pro<strong>of</strong>s — not for distribution

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!