Sequencing

11th Annual Sequencing, Finishing, and Analysis in the Future Meeting

IMPROVING GENOME ANALYSIS USING LINKED-READS
Friday, 3rd June 14:00, La Fonda Ballroom, Talk (OS-9.01)
Deanna Church, Kristina Giorda, Cassandra Jabara, Sofia Kyriazopoulu Panagiotopoulou, Andrew Wei Xu, Heather Ordonez, Haynes Heaton, Mark Pratt, Patrick Marks, Paul Hardenbol, Adrian Fehr, Michael Schnall Levin
10x Genomics, Inc

High-throughput sequencing (HTS) has revolutionized genome analysis. Tens of thousands of genomes and hundreds of thousands of exomes have been analyzed globally, allowing for new biological insights at both population and individual levels. Despite these advances, it has become increasingly clear that traditional methods are insufficient for providing a complete view of the genome. Paralogous sequences can often confound alignment, leaving biomedically important regions of the genome with low-quality alignments and variant calls. Extracting information on large-scale events, including copy number variants (CNVs) and complex structural variants (SVs), is challenging using only short read data. Further, haplotype-level resolution in a single individual is not attainable using short read analysis. To address these problems, we have developed a technology that retains long-range information while preserving the power, accuracy, and scalability of short read sequencing technologies, producing a data type referred to as ‘Linked-Reads’ that enables a more complete analysis of a genome. At its core, haplotype-level dilution of long input molecules into over 1 million barcoded partitions allows for high-resolution reference-based analysis. We have demonstrated the ability to reconstruct individual haplotypes that span several megabases and have validated these haplotype reconstructions using trio sequencing data.
Coupling Linked-Reads with novel algorithms that take advantage of these linkages allows for improved performance in regions of the genome that are typically inaccessible due to the presence of paralogous sequence. Validation of these variant calls has been challenging as they typically fall outside the Genome In a Bottle (GIAB) high-confidence regions, but we have confirmed several hundred of these using orthogonal sequencing technologies. The power of the long-range linkages also enables improved detection of complex structural variants. In addition to identifying CNVs, we detect inter- and intra-chromosomal events as well as more complex structural rearrangements. Linked-Read technology can be used in both a whole-genome and a targeted sequencing context, allowing access to a broader range of applications. The development of Linked-Reads is an important step in the evolution of genome analysis, allowing access to more of the genome, resolving complex variants, and reconstructing long-range haplotypes.
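The idea that reads sharing a barcode and lying close together probably derive from the same long input molecule can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `infer_molecules`, the `(barcode, chrom, pos)` read representation, and the 50 kb `max_gap` threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group (barcode, chrom, pos) reads into putative source molecules.

    Reads with the same barcode and chromosome are sorted by position; a new
    molecule is started whenever the gap to the previous read exceeds max_gap
    (a hypothetical threshold, not a published value).
    Returns tuples of (barcode, chrom, start, end, read_count).
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, open a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Toy input: three nearby reads and one distant read share barcode BC01.
reads = [
    ("BC01", "chr1", 10_000),
    ("BC01", "chr1", 32_000),
    ("BC01", "chr1", 55_000),
    ("BC01", "chr1", 900_000),
    ("BC02", "chr1", 20_000),
]
molecules = infer_molecules(reads)
```

In this toy input, the three nearby BC01 reads form one putative molecule, while the distant BC01 read and the BC02 read each form their own. Heterozygous variants covered by reads from the same inferred molecule could then be assigned to the same haplotype, which is the kind of linkage the abstract describes.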
A REFERENCE-AGNOSTIC AND RAPIDLY QUERYABLE NGS READ DATA FORMAT ALLOWS FOR FLEXIBLE ANALYSIS AT SCALE
Friday, 3rd June 14:20, La Fonda Ballroom, Talk (OS-9.02)
Niranjan Shekar (1), William Salerno (2), Adam English (2), Adina Mangubat (1), Jeremy Bruestle (1), Eric Boerwinkle (3), Richard Gibbs (2)
(1) Spiral Genetics Inc, (2) Human Genome Sequencing Center, Baylor College of Medicine, (3) University of Texas Health Science Center at Houston

In identifying the complement of genetic variants that are associated with complex disease, larger sample sizes increase power. Studies such as the Alzheimer’s Disease Sequencing Project and the CHARGE Consortium, where samples are collected from a range of centers, show heterogeneous data, requiring informatics that can additively scale to thousands of samples and analytics that go beyond identifying small variants in NGS data. At scale, the challenge of evaluating SNPs, indels, and SVs becomes the “N+1” problem of incrementally adding samples without having to perpetually reevaluate petabytes of population read data stored in BAM files. The BioGraph Analysis Format (BAF) is a method of indexing NGS data that extends the Burrows-Wheeler Transform to allow for multiple paths, effectively creating a read overlap graph of the data. A BAF of HiSeq X 30x WGS data is 8.3 Gb, 95% smaller than the corresponding BAM. Generated from the BAM in 14 hours, the BAF can be queried up to 200,000 times a second. Multiple BAFs can be combined, which at scale results in a file size of approximately 3 GB per individual. Because the BAF can be batched across individuals, query time grows less than linearly with the number of individuals. For example, if 30,000 putative SV sites were to be queried, SV-typing these sites across 10,000 HiSeq X WGS samples in BAF would require less than 30 TB of storage (for all the read data), 16 CPU hours, and 10 minutes (using 100 machines).
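The scaling figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers from the abstract; it assumes decimal units (1 TB = 1000 GB), perfectly even division of CPU time across machines with one CPU each, and that the 8.3 figure and the BAM size are in the same unit. The variable names are illustrative.

```python
# Figures taken from the abstract.
samples = 10_000
gb_per_sample = 3        # approximate combined-BAF size per individual
cpu_hours = 16
machines = 100
baf_size = 8.3           # single-sample BAF size, unit as stated in the abstract
size_reduction = 0.95    # BAF is 95% smaller than the corresponding BAM

# Total storage for all read data, in decimal terabytes.
storage_tb = samples * gb_per_sample / 1000

# Wall-clock minutes if the CPU hours split evenly across machines
# (assumption: one CPU per machine, no scheduling overhead).
wall_minutes = cpu_hours * 60 / machines

# BAM size implied by the stated reduction, in the same unit as baf_size.
implied_bam_size = baf_size / (1 - size_reduction)

print(f"storage: {storage_tb:.0f} TB")
print(f"wall time: {wall_minutes:.1f} minutes")
print(f"implied BAM size: {implied_bam_size:.0f}")
```

Under these assumptions the storage comes to 30 TB and the wall time to 9.6 minutes, in line with the "less than 30 TB" and "10 minutes" quoted, and the implied BAM size is about 166 in the same unit as the BAF figure.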
Here, we perform read overlap assembly to genotype 4,276 SVs larger than 80 bp detected by Pindel in at least one individual of the Ashkenazi Jewish Trio. At 1,195 of these locations, at least one individual had an SV call, and all but 25 (2.1%) of these calls were consistent with Mendelian inheritance. Further, read overlap assembly to genotype variants was performed at 3,935 locations where PBHoney called an SV with long read sequencing data on the same Trio. Of those, 1,327 locations had at least one genotype, with all but 55 (4.1%) being consistent with Mendelian inheritance. Additionally, the data are reference-agnostic, so variants can be called against any reference or against the read graph of any other set of individuals, dramatically reducing the time for data harmonization. Further, information is divided such that the “read overlap graph” created from all the individuals is separate from the information indicating the path through the graph for each individual. This allows a search for a particular variation of interest directly from the read data, remotely and rapidly, without revealing the exact individual(s) from whom the variant originates. Because the data are essentially a read overlap graph, it is possible to accurately characterize SVs by traversing the graph from a particular location or by searching for a particular sequence associated with the SV. Thus, fast querying of small files with reasonable compute requirements provides an N+1 solution for SVs.
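The Mendelian consistency check referred to above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes diploid autosomal genotypes encoded as pairs of allele indices (0 = reference, 1 = SV allele) and ignores de novo events. The function names are hypothetical. The second function reproduces the percentages quoted in the abstract from its counts.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be explained by one allele
    inherited from each parent (diploid, autosomal, no de novo events)."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def inconsistency_rate(n_inconsistent, n_sites):
    """Percentage of sites whose trio genotypes violate Mendelian inheritance."""
    return 100.0 * n_inconsistent / n_sites


# Heterozygous child with a homozygous-reference mother and a heterozygous
# father is consistent; a homozygous-alternate child is not.
consistent_example = is_mendelian_consistent((0, 1), (0, 0), (0, 1))
violation_example = is_mendelian_consistent((1, 1), (0, 0), (0, 1))

# Counts taken from the abstract.
pindel_rate = round(inconsistency_rate(25, 1195), 1)
pbhoney_rate = round(inconsistency_rate(55, 1327), 1)
print(pindel_rate, pbhoney_rate)
```

The rounded rates match the 2.1% and 4.1% reported for the Pindel and PBHoney site sets respectively.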