01.06.2016 Views

Sequencing

SFAF2016 Meeting Guide Final 3


11th Annual Sequencing, Finishing, and Analysis in the Future Meeting

ALGORITHMIC SOLUTIONS FOR HIGH ACCURACY GENE FINDING IN JUST ASSEMBLED NGS GENOMES

Thursday, 2nd June 9:30 La Fonda Ballroom Invited Speaker (IS‐2)

Mark Borodovsky

Georgia Institute of Technology

Gene prediction and annotation play a central role in genomics. However, despite much attention, open problems remain and stimulate a search for new algorithmic solutions in all categories of gene finding. Prokaryotic genes can be identified with higher average accuracy than eukaryotic ones. Nevertheless, the error rate is not negligible and is largely species‐specific. Most errors are made in predicting genes located in genomic regions with atypical G+C composition. I will talk about our efforts to improve GeneMarkS, a self‐training tool used in many genome projects. The new tool, GeneMarkS‐2 (Tang et al., submitted), uses local G+C‐specific heuristic models to make initial predictions of atypical genes, which serve as ‘external’ evidence in subsequent self‐training iterations. Unlike the current GeneMarkS, the new tool adjusts the model structure within the self‐training process. In multiple tests we have demonstrated that the new tool compares favorably with existing gene finders.
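To make the notion of "atypical G+C composition" concrete, the following is a minimal sketch that flags sequence windows whose G+C fraction deviates from the genome‐wide value. It is not the GeneMarkS‐2 procedure: the function names, window size, step, and deviation threshold are all illustrative assumptions.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def atypical_windows(
    genome: str,
    window: int = 1000,          # hypothetical window length
    step: int = 500,             # hypothetical step between windows
    max_deviation: float = 0.10  # hypothetical threshold, not from the abstract
) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for windows whose G+C fraction differs
    from the genome-wide G+C fraction by more than max_deviation."""
    genome_gc = gc_fraction(genome)
    flagged = []
    # Slide a fixed-size window; a genome shorter than the window
    # is treated as a single window.
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        chunk = genome[start:start + window]
        gc = gc_fraction(chunk)
        if abs(gc - genome_gc) > max_deviation:
            flagged.append((start, start + len(chunk), gc))
    return flagged
```

Genes overlapping the flagged windows would be the ones a typical‐composition model is most likely to miss, which is the motivation the abstract gives for using G+C‐specific models on them.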

We also report progress in developing tools for structural annotation of eukaryotic genomes. We have continually updated the self‐training ab initio gene prediction tool, GeneMark‐ES. Recently, it was extended to the fully automated GeneMark‐ET (Lomsadze et al., 2014), which integrates information on mapped RNA‐Seq reads, and to GeneMark‐EP (Lomsadze et al., in preparation), which uses initial self‐training and gene prediction to generate external evidence in the form of genomic footprints of homologous proteins.

For ab initio gene prediction in fungal genomes we have developed fungus‐specific self‐training methods. The continually updated fungal version of GeneMark‐ES has been used in a number of DOE JGI and Broad Institute fungal genome sequencing projects since 2007.

Our metagenomic gene finder, MetaGeneMark (Zhu et al., 2010), which is employed in IMG/M for metagenome annotation and conventionally used for analysis of bacterial and archaeal sequences, was further developed to predict genes in fungal metagenomes.

Finally, we describe BRAKER1 (Hoff et al., 2015), a pipeline for unsupervised RNA‐Seq‐based genome annotation that combines the advantages of GeneMark‐ET and AUGUSTUS. We observed that BRAKER1 was more accurate than MAKER2 (Holt et al., 2011) when assembled RNA‐Seq data served as the sole source of extrinsic evidence. BRAKER1 does not require pre‐trained parameters or a separate manually curated training step.
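Accuracy comparisons between gene finders are commonly reported as sensitivity and specificity at the exon or gene level. The sketch below shows one such exon‐level calculation under an exact‐coordinate‐match criterion. It is an illustration only, not the evaluation protocol used for BRAKER1 and MAKER2; the `Exon` tuple layout and the function name are assumptions. As is usual in the gene‐finding literature, "specificity" here means the fraction of predictions that are correct, TP/(TP+FP).

```python
from typing import Iterable, Set, Tuple

# Hypothetical exon record: (sequence id, start, end, strand)
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(
    predicted: Iterable[Exon], reference: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for exact exon matches.

    sensitivity = TP / number of reference exons
    specificity = TP / number of predicted exons
    """
    pred: Set[Exon] = set(predicted)
    ref: Set[Exon] = set(reference)
    # An exon counts as a true positive only if sequence id, both
    # coordinates, and strand all agree with a reference exon.
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    specificity = true_pos / len(pred) if pred else 0.0
    return sensitivity, specificity
```

For example, if two of three predicted exons exactly match a reference set of four exons, the function returns a sensitivity of 0.5 and a specificity of 2/3.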

All the tools described above can be applied to the analysis of just‐assembled NGS genomes.

Speaker’s biographical sketch

Mark Borodovsky is a Regents' Professor at the Joint Wallace H. Coulter Department of Biomedical Engineering of Georgia Institute of Technology and Emory University and Director of the Center for Bioinformatics and Computational Genomics at Georgia Tech. He is also Chair of the Department of Bioinformatics at the Moscow Institute of Physics and Technology in Moscow, Russia.

Dr. Borodovsky is interested in promoting bioinformatics education. He is the founder of the Georgia Tech Bioinformatics M.Sc. and Ph.D. Program, a member of the Educational Committee of the International Society for Computational Biology, and the organizer of a series of International Conferences in Bioinformatics at Georgia Tech, which began in 1997.
