Sequencing
SFAF2016%20Meeting%20Guide%20Final%203
11th Annual Sequencing, Finishing, and Analysis in the Future Meeting<br />
TINK: A NOVEL EUKARYOTIC EVIDENCE BASED<br />
PAN-TRANSCRIPTOME GENERATION PIPELINE<br />
Wednesday, 1st June 18:30 La Fonda NM Room (1st floor) Poster (PS‐1a.03)<br />
Chandler Roe, Jason Travis, Nathan Hicks, Elizabeth Driebe, David Engelthaler, Paul Keim<br />
TGen North<br />
While next generation sequencing has become an increasingly easy laboratory procedure, eukaryotic<br />
genome annotation is still a challenging bioinformatic task. High‐throughput mRNA sequencing<br />
(RNA‐Seq) platforms allow for a variety of applications such as novel transcript and isoform discovery,<br />
expression estimate analysis, alternative splicing as well as exploration of non‐model‐organism<br />
transcriptomes. However, the required genome assembly and annotation is a complicated and time‐consuming<br />
process that requires multiple steps and command‐line skills. Our pipeline, TINK, generates<br />
an evidence‐based pan‐transcriptome reference to be used for RNA‐Seq analysis. It provides<br />
a rapid, all‐encompassing, one‐time analysis that allows for discovery of unique transcripts. This<br />
pipeline combines ab initio gene prediction using the program AUGUSTUS, protein homology prediction<br />
utilizing AAT, and de novo RNA‐Seq assemblies using both PASA and Trinity. These results<br />
are weighted and combined using EVidenceModeler to create individual genome annotations for<br />
each sequenced sample. The pipeline then compiles, clusters, and de‐replicates these annotations to create<br />
a novel pan‐transcriptome reference. We have used this technique to explore differential expression<br />
and identify novel transcripts from the fungal pathogen Cryptococcus gattii. This pathogen has been<br />
characterized into four types, I‐IV, within which subgroups exist. In order to capture transcripts<br />
unique to one subtype of C. gattii as well as differing expression levels, multiple analyses would need<br />
to be performed using a different reference each time, which is both computationally expensive and<br />
time consuming. TINK provided a single reference that allowed one analysis of these data, greatly reducing<br />
time and resources.<br />
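
The compile and de‐replicate step described above can be illustrated with a minimal sketch. It is not TINK's code: the abstract does not state how clustering is done, so this example assumes the simplest case, exact (case‐insensitive) sequence identity, and the names `dereplicate`, `unique_to`, and the toy sample labels are hypothetical.

```python
def dereplicate(per_sample_transcripts):
    """Merge per-sample transcript sets into one de-replicated reference.

    per_sample_transcripts maps a sample name to {transcript_id: sequence}.
    Returns {unique_sequence: sorted list of samples containing it}.
    Assumption: two transcripts are "the same" only if their sequences are
    identical ignoring case; a real pipeline would likely cluster by
    similarity instead.
    """
    pan = {}
    for sample, transcripts in per_sample_transcripts.items():
        for seq in transcripts.values():
            pan.setdefault(seq.upper(), set()).add(sample)
    return {seq: sorted(samples) for seq, samples in pan.items()}


def unique_to(pan, sample):
    """Return sequences found in exactly one sample, the given one."""
    return [seq for seq, samples in pan.items() if samples == [sample]]


# Toy per-sample annotations (hypothetical data, two molecular types).
annotations = {
    "typeI": {"t1": "ATGGCCTAA", "t2": "ATGTTTTGA"},
    "typeII": {"t1": "atggcctaa", "t3": "ATGCCCTAG"},
}

pan = dereplicate(annotations)
print(len(pan))                    # number of entries in the pan-reference
print(unique_to(pan, "typeII"))    # transcripts seen only in typeII
```

Because each entry in the merged reference records which samples contributed it, subtype‐specific transcripts can be read off from a single reference rather than from one analysis per subtype.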