11th Annual Sequencing, Finishing, and Analysis in the Future Meeting<br />
ALGORITHMIC SOLUTIONS FOR HIGH ACCURACY<br />
GENE FINDING IN JUST ASSEMBLED NGS GENOMES<br />
Thursday, 2nd June 9:30 La Fonda Ballroom Invited Speaker (IS‐2)<br />
Mark Borodovsky<br />
Georgia Institute of Technology<br />
Gene prediction and annotation play a central role in genomics. However, in spite of much attention, open<br />
problems still exist and stimulate a search for new algorithmic solutions in all categories of gene finding.<br />
Prokaryotic genes can be identified with higher average accuracy than eukaryotic ones. Nevertheless,<br />
the error rate is not negligible and is largely species‐specific. Most errors are made in the prediction of genes<br />
located in genomic regions with atypical G+C composition. I will talk about our efforts to improve<br />
GeneMarkS, a self‐training tool used in many genome projects. The new tool, GeneMarkS‐2 (Tang et al.,<br />
submitted), uses local G+C‐specific heuristic models to make initial predictions of atypical genes that<br />
serve as ‘external’ evidence in subsequent self‐training iterations. Unlike the current GeneMarkS, the<br />
new tool adjusts the model structure within the self‐training process. In multiple tests we<br />
have demonstrated that the new tool compares favorably with existing gene finders.<br />
We also report progress in developing tools for structural annotation of eukaryotic genomes. We have<br />
constantly updated the self‐training ab initio gene prediction tool, GeneMark‐ES. Recently, it was<br />
extended to the fully automated GeneMark‐ET (Lomsadze et al., 2014), which integrates information on<br />
mapped RNA‐Seq reads, and to GeneMark‐EP (Lomsadze et al., in preparation), which uses<br />
initial self‐training and gene prediction to generate external evidence in the form of genomic footprints of<br />
homologous proteins.<br />
For ab initio gene prediction in fungal genomes, we have developed fungi‐specific self‐training methods.<br />
The constantly updated fungal version of GeneMark‐ES has been used in a number of DOE JGI and Broad<br />
Institute fungal genome sequencing projects since 2007.<br />
Our metagenomic gene finder, MetaGeneMark (Zhu et al., 2010), which is employed in IMG/M for metagenome<br />
annotation and conventionally used for analysis of bacterial and archaeal sequences, was further<br />
developed to predict genes in fungal metagenomes.<br />
Finally, we describe BRAKER1 (Hoff et al., 2015), a pipeline for unsupervised RNA‐Seq‐based genome<br />
annotation that combines advantages of GeneMark‐ET and AUGUSTUS. We observed that BRAKER1 was<br />
more accurate than MAKER2 (Holt et al., 2011) when the latter used assembled RNA‐Seq data as its sole source of<br />
extrinsic evidence. BRAKER1 does not require pre‐trained parameters or a separate manually curated<br />
training step.<br />
All the tools described above can be applied to the analysis of just assembled NGS genomes.<br />
Speaker’s biographical sketch<br />
Mark Borodovsky is a Regents' Professor at the Joint Wallace H. Coulter Department of Biomedical<br />
Engineering of Georgia Institute of Technology and Emory University and Director of the Center for<br />
Bioinformatics and Computational Genomics at Georgia Tech. He is also Chair of the Department of<br />
Bioinformatics at the Moscow Institute of Physics and Technology in Moscow, Russia.<br />
Dr. Borodovsky is interested in promoting bioinformatics education. He is a Founder of the Georgia<br />
Tech Bioinformatics M.Sc. and Ph.D. Program, a Member of the Educational Committee of the International<br />
Society for Computational Biology, and the organizer of a series of International Conferences in<br />
Bioinformatics at Georgia Tech that started in 1997.<br />