SFAF2016 Meeting Guide Final 3


11th Annual Sequencing, Finishing, and Analysis in the Future Meeting

EXPLORATION OF READ DEPTH AND LENGTH FOR STRUCTURAL VARIATION DETECTION

Wednesday, 1st June, 20:00, La Fonda NM Room (1st floor), Poster (PS‐1b.22)

Adam English, Jesse Farek, Donna Muzny, William Salerno, Eric Boerwinkle, Richard Gibbs

Baylor College of Medicine

Long‐read sequencing (>1 kbp) offers more complete genomic information than short‐read sequencing (~100 bp), but the lower per‐base accuracy and relatively high cost‐per‐base of long reads limit their practicality as the sole data source in high‐throughput whole‐genome sequencing projects. An alternative, more cost‐effective strategy is to combine data types, an approach implemented effectively by de novo assembly tools such as pacbioToCA and PBJelly. Here we illustrate how structural variant (SV) detection varies with different combinations of sequencing technologies, methods, and coverages.

We first generate SV calls for the haploid cell line CHM1‐tert by combining PBHoney (PMID: 24915764) calls from 40x PacBio coverage, calls from 134x/400 bp Illumina data, and an independently derived set of PacBio SVs, consolidated through Parliament (PMID: 25886820), an SV discovery and consolidation tool. This yields ~25,000 variant loci, ~9,000 of which are supported by short‐ and long‐read hybrid assembly.
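Consolidating calls from several sources into one set of loci is the step that turns per-caller outputs into counts like those above. The sketch below shows one common way to do this, greedy clustering of same-type calls by reciprocal overlap; it is an illustrative assumption, not Parliament's actual algorithm. The `SVCall`/`Locus` types, the 50% reciprocal-overlap threshold, the source labels (e.g. `"hybrid_assembly"`) and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int   # 0-based start (hypothetical convention)
    end: int     # exclusive end; assumes every call spans at least 1 bp
    svtype: str  # e.g. "DEL", "INS", "INV"
    source: str  # caller or data type that produced the call (hypothetical labels)


@dataclass
class Locus:
    chrom: str
    start: int   # coordinates are those of the first (leftmost) call in the locus
    end: int
    svtype: str
    sources: set = field(default_factory=set)


def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Overlap length divided by the longer interval (the smaller of the two overlap fractions)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a_end - a_start, b_end - b_start)


def merge_calls(calls: list[SVCall], min_ro: float = 0.5) -> list[Locus]:
    """Assign each call to the first same-chrom, same-type locus it overlaps reciprocally."""
    loci: list[Locus] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for locus in loci:
            if (locus.chrom == call.chrom
                    and locus.svtype == call.svtype
                    and reciprocal_overlap(locus.start, locus.end, call.start, call.end) >= min_ro):
                locus.sources.add(call.source)
                break
        else:
            # No compatible locus found: this call seeds a new one.
            loci.append(Locus(call.chrom, call.start, call.end, call.svtype, {call.source}))
    return loci


def count_supported(loci: list[Locus], source: str) -> int:
    """Number of merged loci containing at least one call from `source`."""
    return sum(1 for locus in loci if source in locus.sources)


if __name__ == "__main__":
    toy_calls = [
        SVCall("chr1", 1_000, 2_000, "DEL", "pbhoney"),
        SVCall("chr1", 1_050, 2_020, "DEL", "hybrid_assembly"),
        SVCall("chr1", 50_000, 50_400, "DEL", "illumina"),
    ]
    loci = merge_calls(toy_calls)
    print(len(loci), count_supported(loci, "hybrid_assembly"))  # 2 1
```

The quadratic scan keeps the sketch short; a real consolidation over tens of thousands of calls would index loci by chromosome and position.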

Next, using lower coverage per data type, we explore SV detection in the diploid human sample HS1011 using 20x PacBio coverage (i.e., 10x per haploid genome), Illumina paired‐end sequencing at multiple coverages and insert sizes, and other technologies including aCGH and BioNano Irys optical mapping. These combinations show that using the PacBio data for evaluation expands the set of hybrid‐assembled variants by 42% and the set of PBHoney's PacBio‐discovered variants by an additional 46%.
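The "10x per haploid genome" figure follows from simple arithmetic: mean depth is the number of sequenced bases divided by the genome size, and in a diploid sample that depth is split, on average, between the two haplotypes. A minimal sketch, assuming reads sample both haplotypes equally; the function names and the 60 Gbp / 3.0 Gbp example values are illustrative assumptions, not figures from the study.

```python
def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced (or aligned) bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def per_haplotype_coverage(coverage: float, ploidy: int = 2) -> float:
    """Expected depth per haplotype, assuming reads sample each copy equally."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return coverage / ploidy


if __name__ == "__main__":
    # Hypothetical example: 60 Gbp of PacBio reads over a ~3.0 Gbp genome.
    total = mean_coverage(60_000_000_000, 3_000_000_000)
    print(total, per_haplotype_coverage(total))  # 20.0 10.0
```

For a haploid line such as CHM1‐tert, `ploidy=1` makes the per-haplotype depth equal to the total depth.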

Finally, we evaluate the added value of long‐read data for an Ashkenazim trio, with ~30x coverage for each parent and ~60x coverage for the proband. We find a Mendelian consistency rate of 90% for parental homozygous calls and 75% for proband homozygous calls.
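Mendelian consistency asks whether a child's genotype can be formed from one allele of each parent. The abstract does not say exactly how the rates were computed, so the sketch below rests on assumptions: genotypes are biallelic pairs (0 = reference, 1 = SV allele), "homozygous" means homozygous for the SV allele, and a call counts toward the parental rate when either parent is homozygous. The function names and toy genotypes are hypothetical.

```python
from itertools import product
from typing import Callable

Genotype = tuple[int, int]                  # (allele, allele); 0 = reference, 1 = SV allele
Trio = tuple[Genotype, Genotype, Genotype]  # (child, mother, father)


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if some maternal allele plus some paternal allele reproduces the child's genotype."""
    return any(sorted(child) == sorted((m, f)) for m, f in product(mother, father))


def is_hom_alt(g: Genotype) -> bool:
    return g == (1, 1)


def parent_hom(t: Trio) -> bool:
    # Assumption: "parental homozygous" = homozygous for the SV allele in either parent.
    return is_hom_alt(t[1]) or is_hom_alt(t[2])


def proband_hom(t: Trio) -> bool:
    return is_hom_alt(t[0])


def consistency_rate(trios: list[Trio], select: Callable[[Trio], bool]) -> float:
    """Fraction of selected trio calls that are Mendelian-consistent (NaN if none selected)."""
    selected = [t for t in trios if select(t)]
    if not selected:
        return float("nan")
    return sum(is_mendelian_consistent(*t) for t in selected) / len(selected)


if __name__ == "__main__":
    toy_trios: list[Trio] = [
        ((1, 1), (1, 1), (0, 1)),  # consistent
        ((1, 1), (0, 0), (0, 1)),  # inconsistent: mother carries no SV allele
        ((0, 1), (1, 1), (0, 0)),  # consistent
    ]
    print(consistency_rate(toy_trios, parent_hom), consistency_rate(toy_trios, proband_hom))  # 1.0 0.5
```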

By exploring coverage titration points, we have quantified the impact of specific combinations of short‐ and long‐read data on SV detection. Together, these experiments suggest that robust SV detection from whole‐genome data can be achieved with hybrid read data at notably low coverages.
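Coverage titration points are commonly produced by randomly downsampling a high-coverage dataset. The sketch below is an assumption about how such points could be generated, not a description of the authors' pipeline: each read (or read pair, keyed by one ID) receives a single uniform random draw and is kept at fraction f when that draw is below f, where f is the target coverage divided by the full coverage. The function names, read IDs and seed are hypothetical.

```python
import random


def titration_fractions(full_coverage: float, targets: list[float]) -> dict[float, float]:
    """Fraction of reads to keep so each target depth is reached on average."""
    fractions = {}
    for target in targets:
        if not 0 < target <= full_coverage:
            raise ValueError(f"target {target}x must be in (0, {full_coverage}x]")
        fractions[target] = target / full_coverage
    return fractions


def nested_subsamples(read_ids: list[str], fractions: list[float], seed: int = 0) -> dict[float, list[str]]:
    """Bernoulli-subsample reads at each fraction, reusing one draw per read so subsets nest."""
    rng = random.Random(seed)  # fixed seed for reproducible titration points
    draws = {read_id: rng.random() for read_id in read_ids}
    return {f: [r for r in read_ids if draws[r] < f] for f in fractions}


if __name__ == "__main__":
    fracs = titration_fractions(40, [5, 10, 20, 40])
    print(fracs)  # {5: 0.125, 10: 0.25, 20: 0.5, 40: 1.0}
    reads = [f"read{i}" for i in range(1_000)]
    subsets = nested_subsamples(reads, sorted(fracs.values()))
    print({f: len(s) for f, s in subsets.items()})
```

Reusing one draw per read means every lower-coverage subset is contained in every higher one, so differences between titration points reflect depth rather than a different random sample. `samtools view -s` offers comparable fraction-based subsampling directly on BAM files.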
