13.07.2015 Views

The Genom of Homo sapiens.pdf

126 WANG, BUHLER, AND BRENT

[Figure 2, panels A and B. X-axis in both panels: % ID in aligned CDS (genome similarity).]

Figure 2. (A) TWINSCAN performance on the greater CFTR region of mouse with informant sequence from various organisms, plotted by accuracy of exact exon prediction (Y-axis) versus percent identity in aligned coding regions (a proxy for evolutionary distance, X-axis). (B) BLASTN alignments between the greater CFTR region of mouse and informant sequence from various organisms. For each pair of species, both the percentage of CDS sequence that aligns and the percentage of intron sequence that aligns (excluding splice sites) are plotted against percent identity in aligned coding regions.

[...] TWINSCAN accuracy using each informant database is plotted against the percent identity in aligned mouse coding sequences (a proxy for evolutionary divergence), the result is a unimodal curve peaking at chicken (Fig. 2A). Human is the second best informant for mouse, then Fugu, then rat. Accuracy with cat as the informant is very similar to accuracy with human, as expected given their similar divergence from mouse; likewise, the Tetraodon and Fugu informant sequences yield similar accuracy.

To gain insight into these accuracy results, we analyzed the BLASTN alignments that were used to create the conservation sequences for TWINSCAN (see Methods). For each informant database, we compared the percentage of mouse coding sequence (CDS) that aligns with the informant to the percentage of mouse intron sequence that aligns with the informant (excluding splice site regions). This analysis provides a compelling explanation for the observed differences in gene prediction accuracy (Fig. 2B). The comparison between mouse and Fugu exhibits essentially no alignment in the introns, but less than 40% of the CDS aligns. Moving closer in divergence, intron alignment remains very low in the mouse–chicken comparison (0.3%), but CDS alignment jumps to more than 80%. Thus, chicken alignments appear to have great power to discriminate between coding and noncoding sequence. Moving even closer, the human alignments cover ten times more of the mouse introns than do the chicken alignments, but only about 1.2 times more of the CDS. Since there is 48 times more noncoding sequence than exon sequence in this region, about 25 times more noncoding bases than CDS bases align to rat. Intuitively, the fact that many more of the aligned bases are noncoding than coding would seem to yield little discriminative power, even though a higher percentage of coding bases are aligned than noncoding bases. The results for cat and human are very similar to one another, as are those for Fugu and Tetraodon, suggesting that most of the observed differences are due to evolutionary distance.

Close examination of individual genes provides additional insight into how alignments affect gene-structure prediction. For example, the CFTR gene itself contains 27 exons and spans more than 150 kb of genomic sequence. Alignments of four informant databases from different lineages to the mouse CFTR gene are shown in Figure 3A. Clearly, chicken alignments correspond very closely to the mouse exons, whereas many exons are missed by the fish [...]

Figure 3. (A) The mouse CFTR gene (red) together with BLASTN alignments from the “greater CFTR” regions of Tetraodon, chicken, human, and rat. (B) The mouse CFTR gene (red), the corresponding TWINSCAN prediction without using any informant (green), the TWINSCAN prediction using chicken as the informant (blue), and blocks of mouse–chicken alignment (black).
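The coverage comparison and the aligned-base arithmetic described in the text can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the half-open interval convention, and the toy coordinates are all assumptions made for illustration. The ratio calculation also makes three simplifications: it treats "exon sequence" and CDS as interchangeable, it uses the intron alignment fraction as a stand-in for all noncoding sequence, and it reads "more than 80%" as exactly 0.80. The human values (3% of introns, 96% of CDS) are derived from the stated "ten times" and "about 1.2 times" multiples of the chicken values.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(features, alignments):
    """Fraction of feature bases lying inside at least one alignment block."""
    features = merge_intervals(features)
    alignments = merge_intervals(alignments)
    total = sum(end - start for start, end in features)
    covered = 0
    for f_start, f_end in features:
        for a_start, a_end in alignments:
            overlap = min(f_end, a_end) - max(f_start, a_start)
            if overlap > 0:
                covered += overlap
    return covered / total if total else 0.0


def aligned_base_ratio(size_ratio, frac_noncoding_aligned, frac_cds_aligned):
    """Aligned noncoding bases per aligned CDS base.

    size_ratio is (noncoding length) / (CDS length) for the region.
    """
    return size_ratio * frac_noncoding_aligned / frac_cds_aligned


# Toy coordinates (hypothetical, not taken from the paper).
cds = [(100, 200), (500, 600)]
introns = [(200, 500)]
blocks = [(90, 210), (520, 580)]
cds_cov = covered_fraction(cds, blocks)        # 160 of 200 CDS bases
intron_cov = covered_fraction(introns, blocks)  # 10 of 300 intron bases

# Ratios under the assumptions stated above (size ratio 48 from the text).
chicken_ratio = aligned_base_ratio(48, 0.003, 0.80)  # about 0.18
human_ratio = aligned_base_ratio(48, 0.03, 0.96)     # about 1.5

# The stated rat ratio of about 25 implies this noncoding/CDS coverage ratio.
rat_implied_coverage_ratio = 25 / 48                 # about 0.52
```

Under these assumptions, chicken alignments contain far fewer noncoding bases than CDS bases, human alignments contain somewhat more noncoding than CDS bases, and the rat figure of about 25 corresponds to noncoding sequence aligning at roughly half the rate of CDS. This matches the text's point that an informant can align a higher percentage of coding bases and still contribute mostly noncoding alignment.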
