SFAF2016 Meeting Guide Final 3
11th Annual Sequencing, Finishing, and Analysis in the Future Meeting
EXPLORATION OF READ DEPTH AND LENGTH FOR
STRUCTURAL VARIATION DETECTION
Wednesday, 1st June 20:00 La Fonda NM Room (1st floor) Poster (PS-1b.22)
Adam English, Jesse Farek, Donna Muzny, William Salerno,
Eric Boerwinkle, Richard Gibbs
Baylor College of Medicine
Long-read sequencing (>1 kbp) offers more complete genomic information than short-read
sequencing (~100 bp), but its accuracy and relatively high cost per base limit the practicality
of long reads as the sole data source in high-throughput whole-genome sequencing projects. An alternative,
more cost-effective strategy is to combine data types, an approach that has been implemented effectively
by de novo assembly tools including pacbioToCA and PBJelly. Here we illustrate how SV detection
varies with different combinations of sequencing technologies, methods, and coverages.
We first create calls for the haploid cell line CHM1-tert using PBHoney (PMID: 24915764) on 40x
PacBio coverage, 134x/400 bp Illumina data, and an independently derived set of PacBio SVs. These are
combined through Parliament (PMID: 25886820), a consolidation SV discovery tool, to generate ~25,000 variant loci,
~9,000 of which are supported by short- and long-read hybrid assembly.
Next, using lower coverage per data type, we explore SV detection in the diploid
human HS1011 using 20x PacBio coverage (i.e., 10x per haploid genome) and multiple coverages
and insert sizes of Illumina paired-end sequencing, as well as other technologies including aCGH and
BioNano Irys optical mapping. These combinations show that using PacBio data for evaluation expands
the hybrid-assembled variants by 42%, and PBHoney's PacBio discovery by an additional 46%.
Finally, we evaluate the added value of long-read data for an Ashkenazim trio with ~30x coverage for
each parent and ~60x coverage for the proband. We find a Mendelian consistency rate of 90% for parental
homozygous calls and 75% for proband homozygous calls.
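The abstract does not define how the Mendelian consistency rate is computed. As a minimal sketch of one plausible definition (an assumption, not the authors' method): a homozygous-alternate call in a parent is counted as consistent if the proband carries at least one alternate allele, and a homozygous-alternate call in the proband is counted as consistent if both parents carry at least one alternate allele. The function names, the genotype string format, and the example genotypes are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical record: (father, mother, proband) genotypes at one SV locus,
# written VCF-style, e.g. "0/0", "0/1", "1/1".
Trio = Tuple[str, str, str]


def alt_count(genotype: str) -> int:
    """Count alternate alleles in a diploid genotype string."""
    return sum(1 for allele in genotype.replace("|", "/").split("/") if allele == "1")


def parental_hom_consistency(trios: Iterable[Trio]) -> float:
    """Fraction of parental homozygous-alt calls seen (>= 1 alt allele) in the proband."""
    checked = consistent = 0
    for father, mother, proband in trios:
        for parent in (father, mother):
            if alt_count(parent) == 2:
                checked += 1
                if alt_count(proband) >= 1:
                    consistent += 1
    return consistent / checked if checked else float("nan")


def proband_hom_consistency(trios: Iterable[Trio]) -> float:
    """Fraction of proband homozygous-alt calls where both parents carry the variant."""
    checked = consistent = 0
    for father, mother, proband in trios:
        if alt_count(proband) == 2:
            checked += 1
            if alt_count(father) >= 1 and alt_count(mother) >= 1:
                consistent += 1
    return consistent / checked if checked else float("nan")


# Made-up genotypes for illustration only; not data from the study.
example_trios = [
    ("1/1", "0/0", "0/1"),  # father hom-alt, proband carries it: consistent
    ("1/1", "0/1", "0/0"),  # father hom-alt, proband lacks it: inconsistent
    ("0/1", "0/1", "1/1"),  # proband hom-alt, both parents carry it: consistent
    ("0/0", "0/1", "1/1"),  # proband hom-alt, father lacks it: inconsistent
]
print(parental_hom_consistency(example_trios))
print(proband_hom_consistency(example_trios))
```

Under this definition, an inconsistency can reflect a missed call in one sample as well as a false call in another, so the rate is sensitive to per-sample coverage.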
By exploring coverage titration points, we have quantified the impact on SV detection of specific
combinations of short- and long-read data. Together, these experiments suggest that robust SV
detection from whole-genome data can be achieved with hybrid read data at notably low coverages.