11th Annual Sequencing, Finishing, and Analysis in the Future Meeting
SYNERCLUST, A TRULY SCALABLE ORTHOLOG CLUSTERING TOOL
Wednesday, 1st June 20:00 La Fonda NM Room (1st floor) Poster (PS-1b.21)
Christophe Georgescu 1, Alison D Griggs 1, Aviv Regev 1, Ilan Wapinski 2, Brian J Haas 1, Ashlee Earl 1
1 Broad Institute, 2 enEvolv
Accurate ortholog identification is a vital component of comparative genomic studies. Popular
sequence-similarity-based approaches, such as OrthoMCL, struggle to cluster orthologs when
paralogs are frequent, and although phylogenetic-based methods handle paralogs, they are not
sufficiently fast or scalable to work on large sets of whole genomes. Furthermore, most approaches
do not take synteny into account, which means information useful for distinguishing paralogs goes
unused. Synergy, originally developed to work on eukaryotic species, uses a hybrid approach to
resolve ortholog clusters, relying upon sequence similarity, synteny and phylogeny. Here, we present
SynerClust, a tool that takes the fundamentals of Synergy and adds a number of improvements
that retain Synergy’s high accuracy but make it amenable to ortholog clustering of hundreds to
thousands of whole-genome data sets, representing either eukaryotic or prokaryotic species.
SynerClust bypasses the all-vs-all BLAST requirement inherent to other clustering tools by selecting
and comparing cluster representatives at each node in an input species tree. Working from tip to
root, SynerClust solves and keeps track of orthology relationships, ultimately providing the most
parsimonious solution that takes into account gene gains and losses, which are commonplace in
prokaryotes. We have also optimized SynerClust for memory usage and made it amenable to running
on many different compute infrastructures.
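
The tip-to-root idea described above can be illustrated with a minimal sketch. This is not SynerClust's code: the `Cluster` type, the `similarity` function (a toy stand-in for a BLAST score), the 0.8 threshold, the nested-tuple tree encoding, and the rule of keeping the left child's representative are all assumptions made for illustration. Synteny, phylogeny, and the parsimony treatment of gene gains and losses are left out entirely. The sketch only shows how comparing one representative per cluster at each tree node avoids comparing every gene against every other gene.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    representative: str   # sequence standing in for the whole cluster (assumed rule)
    members: list         # (genome_name, gene_id) pairs


def similarity(a: str, b: str) -> float:
    """Toy stand-in for a BLAST score: fraction of identical aligned positions."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def merge_children(left, right, threshold=0.8):
    """Merge the clusters of two child nodes by comparing representatives only."""
    merged = []
    unused = list(right)
    for lc in left:
        # Index of the best-scoring unused cluster from the right child, if any.
        best = max(
            range(len(unused)),
            key=lambda i: similarity(lc.representative, unused[i].representative),
            default=None,
        )
        if best is not None and similarity(
            lc.representative, unused[best].representative
        ) >= threshold:
            rc = unused.pop(best)
            merged.append(Cluster(lc.representative, lc.members + rc.members))
        else:
            merged.append(lc)
    merged.extend(unused)  # right-hand clusters with no partner stay separate
    return merged


def cluster_tree(node, genomes, threshold=0.8):
    """Post-order (tip-to-root) traversal.

    A leaf is a genome name; an internal node is a (left, right) tuple.
    """
    if isinstance(node, str):
        return [Cluster(seq, [(node, gene)]) for gene, seq in genomes[node].items()]
    left, right = node
    return merge_children(
        cluster_tree(left, genomes, threshold),
        cluster_tree(right, genomes, threshold),
        threshold,
    )


# Hypothetical toy input: three genomes and the species tree ((A, B), C).
genomes = {
    "A": {"a1": "MKTAYIAKQR", "a2": "GGGGSSSSLL"},
    "B": {"b1": "MKTAYIAKQK", "b2": "PPPPPWWWWW"},
    "C": {"c1": "MKTAYIAKQR"},
}
tree = (("A", "B"), "C")
clusters = cluster_tree(tree, genomes)
for c in clusters:
    print(c.members)
```

At the root, gene `c1` is compared against three cluster representatives rather than against all four genes from genomes A and B. That saving is the point of the sketch, and it grows with the number of genomes.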