01.06.2016

SFAF2016 Meeting Guide Final 3

11th Annual Sequencing, Finishing, and Analysis in the Future Meeting

SYNERCLUST, A TRULY SCALABLE ORTHOLOG CLUSTERING TOOL

Wednesday, 1st June 20:00 La Fonda NM Room (1st floor) Poster (PS-1b.21)

Christophe Georgescu¹, Alison D Griggs¹, Aviv Regev¹, Ilan Wapinski², Brian J Haas¹, Ashlee Earl¹

¹ Broad Institute, ² enEvolv

Accurate ortholog identification is a vital component of comparative genomic studies. Popular sequence-similarity-based approaches, such as OrthoMCL, struggle to cluster orthologs when there are high rates of paralogs, and although phylogenetic-based methods handle paralogs, they are not sufficiently fast or scalable to work on large sets of whole genomes. Furthermore, most approaches do not take synteny into account, which means information useful for distinguishing paralogs is unused. Synergy, originally developed to work on eukaryotic species, uses a hybrid approach to resolve ortholog clusters, relying upon sequence similarity, synteny, and phylogeny. Here, we present SynerClust, a tool that takes the fundamentals of Synergy and adds a number of improvements that retain Synergy's high accuracy but make it amenable to ortholog clustering of hundreds to thousands of whole-genome data sets, representing either eukaryotic or prokaryotic species.
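To illustrate why synteny helps separate an ortholog from a paralog, here is a minimal Python sketch. It is not taken from Synergy or SynerClust: the function names (`flanking`, `synteny_score`), the window size, and the `family` lookup (a stand-in for sequence-similarity hits) are all assumptions made for the example. A candidate gene whose neighbors belong to the same families as the query's neighbors scores high, while a paralog sitting in an unrelated neighborhood scores low.

```python
def flanking(genome, index, window):
    """Return the genes within `window` positions on either side of `index`."""
    left = genome[max(0, index - window):index]
    right = genome[index + 1:index + 1 + window]
    return left + right


def synteny_score(genome_a, i, genome_b, j, family, window=2):
    """Toy synteny score: shared neighbor families / smaller neighborhood size.

    `family` maps a gene id to a family label; it is a hypothetical stand-in
    for the result of a sequence-similarity search.
    """
    fams_a = {family[g] for g in flanking(genome_a, i, window)}
    fams_b = {family[g] for g in flanking(genome_b, j, window)}
    if not fams_a or not fams_b:
        return 0.0
    return len(fams_a & fams_b) / min(len(fams_a), len(fams_b))


# Genome B carries two copies of gene x: xb in a conserved neighborhood,
# and xb2 (a paralog) surrounded by unrelated genes.
genome_a = ["a1", "a2", "xa", "a3", "a4"]
genome_b = ["b1", "b2", "xb", "b3", "b4", "c1", "c2", "xb2", "c3", "c4"]
family = {
    "a1": "F1", "b1": "F1", "a2": "F2", "b2": "F2",
    "a3": "F3", "b3": "F3", "a4": "F4", "b4": "F4",
    "xa": "FX", "xb": "FX", "xb2": "FX",
    "c1": "G1", "c2": "G2", "c3": "G3", "c4": "G4",
}

score_ortholog = synteny_score(genome_a, 2, genome_b, 2, family)
score_paralog = synteny_score(genome_a, 2, genome_b, 7, family)
```

Sequence similarity alone cannot tell `xb` from `xb2` here, since both belong to family `FX`; the neighborhood comparison is what separates them.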

SynerClust bypasses the all-vs-all BLAST requirement inherent to other clustering tools by selecting and comparing cluster representatives at each node in an input species tree. Working from tip to root, SynerClust solves and keeps track of orthology relationships, ultimately providing the most parsimonious solution that takes into account gene gains and losses, which are commonplace in prokaryotes. We have also optimized SynerClust for memory usage and made it amenable to running on many different compute infrastructures.
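The tip-to-root idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SynerClust's implementation: the nested-tuple tree encoding, the `identity` function (a toy replacement for a BLAST score), the greedy matching, the 0.7 threshold, and the rule of reusing the left child's representative are all hypothetical. The sketch also omits synteny and the parsimony step. It only shows how comparing one representative per cluster at each node avoids comparing every gene against every other gene.

```python
from itertools import product


def identity(seq_a, seq_b):
    """Toy similarity: fraction of matching positions (stand-in for BLAST)."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / max(len(seq_a), len(seq_b))


def cluster_node(node, genomes, stats, threshold=0.7):
    """Return clusters at `node` as dicts {"rep": sequence, "members": [ids]}."""
    if isinstance(node, str):
        # Tip of the species tree: every gene starts as its own cluster.
        return [{"rep": seq, "members": [gene]}
                for gene, seq in genomes[node].items()]

    left = cluster_node(node[0], genomes, stats, threshold)
    right = cluster_node(node[1], genomes, stats, threshold)

    # Compare only the representatives of the two children's clusters.
    scored = []
    for (i, lc), (j, rc) in product(enumerate(left), enumerate(right)):
        stats["comparisons"] += 1
        scored.append((identity(lc["rep"], rc["rep"]), i, j))
    scored.sort(reverse=True)

    used_left, used_right, merged = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break
        if i in used_left or j in used_right:
            continue
        used_left.add(i)
        used_right.add(j)
        merged.append({"rep": left[i]["rep"],
                       "members": left[i]["members"] + right[j]["members"]})

    # Unmatched clusters (candidate gains or losses) are carried upward as-is.
    merged += [c for i, c in enumerate(left) if i not in used_left]
    merged += [c for j, c in enumerate(right) if j not in used_right]
    return merged


# Hypothetical input: four genomes; D lacks gene 2 and has a unique gene d3.
genomes = {
    "A": {"a1": "MKTAYIAK", "a2": "GGSLLPWQ"},
    "B": {"b1": "MKTAYIAR", "b2": "GGSLLPWE"},
    "C": {"c1": "MKTGYIAK", "c2": "GGSLVPWQ"},
    "D": {"d1": "MKTAYLAK", "d3": "PPPPHHHH"},
}
tree = (("A", "B"), ("C", "D"))

stats = {"comparisons": 0}
clusters = cluster_node(tree, genomes, stats)
for cluster in clusters:
    print(sorted(cluster["members"]))
```

Because each internal node sees only one representative per child cluster, the number of comparisons depends on how many clusters reach that node rather than on the total number of genes beneath it.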
