Application of Sequencing Alignment Techniques to CUDA GPU Technologies
NVIDIA GTC, 5/15/2012
D. Andrew Carr, Ph.D.
Director of Bioinformatics
acarr@atlab.com
The key: locate the best match(es)
• The computational question: at what location(s) do the best match(es) occur?
• The biological question: within the context of the match, what sequence differences are observed?
[Figure: a perfect alignment compared with a proper short-read alignment]
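A brute-force way to pose the computational question: slide the read along the reference without gaps, count mismatches at every offset, and keep every offset tied for the best count. This is a hypothetical illustration (the function name and the ungapped mismatch-count scoring are assumptions, not the method of any tool on these slides). It costs O(read length × genome length), which is why real aligners index the target, but each offset is scored independently, the kind of work that maps naturally onto many GPU threads.

```python
def best_match_locations(read: str, genome: str) -> tuple[int, list[int]]:
    """Hypothetical ungapped scan: return the minimum number of mismatches
    and every 0-based offset in `genome` that achieves it."""
    best = len(read) + 1
    hits: list[int] = []
    for start in range(len(genome) - len(read) + 1):
        window = genome[start:start + len(read)]
        # Hamming distance between the read and this window
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best:
            best, hits = mismatches, [start]
        elif mismatches == best:
            hits.append(start)  # keep ties: repeats give several "best" locations
    return best, hits


if __name__ == "__main__":
    print(best_match_locations("ACGTTACG", "ACGTTACGATACGTTACGAA"))  # (0, [0, 10])
```

Reporting all tied locations, rather than only the first, is what lets a repeated sequence be seen as repeated instead of silently mapped to one copy.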
Theory to practice: the three most common alignment algorithms
• Dynamic programming
  – Smith-Waterman (local alignment)
    • Accuracy: good with gapped alignments.
    • Processing: computationally expensive, O(N²); with trace-back a lot of memory is also required, so it is slow.
    • Limitations: an index is required to find targets.
  – Needleman-Wunsch (global alignment)
    • Good for small genomes and long matching alignments.
    • Processing: O(N²). A talk today showed a novel pruning technique for use on large matches.
    • Limitations: requires a hard left-hand bound and known query and target sizes.
  – Basic Local Alignment Search Tool (BLAST)
    • Accuracy: finds exact 9-mer matches between read and scaffold, then extends them outward.
      – Burke et al.* showed that a 7-mer gave the best accuracy, but it was computationally intractable.
      – Our research shows that a key length of 4-6 is even better, depending on the algorithm.
    • Limitations: misses locations when reads are short.
* Burke,J., Davison,D. and Hide,W. (1999) d2_cluster: a validated method for clustering EST and full-length cDNA sequences. Genome Res., 9, 1135–1142.
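To make the dynamic-programming cost concrete, here is a minimal Smith-Waterman scoring sketch. The scoring values (+2 match, -1 mismatch, -2 linear gap) are assumptions chosen for illustration, not the parameters of SeqNFind® or any other tool named here. It fills the full O(N²) recurrence but keeps only two rows, so it reports the best score and where the alignment ends without being able to trace it back; storing the whole matrix for trace-back is what drives the memory cost noted above.

```python
def smith_waterman(query: str, target: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> tuple[int, int, int]:
    """Best local alignment score with a linear gap penalty (assumed scores).

    Returns (score, query_end, target_end); ends are exclusive 0-based
    positions. Only two DP rows are kept, so no trace-back is possible.
    """
    best = (0, 0, 0)
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                0,                  # local alignment: restart rather than go negative
                prev[j - 1] + s,    # align query[i-1] with target[j-1]
                prev[j] + gap,      # query base opposite a gap
                curr[j - 1] + gap,  # target base opposite a gap
            )
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


if __name__ == "__main__":
    # One inserted T in the target costs a single gap: 8 matches * 2 - 2 = 14
    print(smith_waterman("ACGTACGT", "ACGTTACGT"))  # (14, 8, 9)
```

Needleman-Wunsch uses the same recurrence without the zero floor, initialises the first row and column with cumulative gap penalties, and reads its score from the bottom-right cell, which is why it needs both sequences' extents fixed in advance.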
Why GPUs?
• A massively parallel environment is ideal for searching large numbers of locations simultaneously!
• Very complex space:
  – 1000 Genomes × 6 billion nucleotides.
  – 100 Mb search space per run.
  – Concerned with inexact matching.
• Sources of noise:
  – Sequencer
  – Algorithmic
  – Analyst
• Currently, researchers have to run more than one tool.
Current GPU-based tools
• Burrows-Wheeler Aligner (BWA) style implementations
  – CUSHAW (http://sourceforge.net/projects/cushaw/)
    • Features: 10x faster than BWA on a standard (CPU) architecture
    • Limitations: no gaps and a single pre-compressed target
  – BarraCUDA (http://www.many-core.group.cam.ac.uk/projects/lam.shtml)
    • Features: about as fast as multithreaded BWA, with the same accuracy as BWA
    • Limitations: single pre-compressed target
  – SOAP3-GPU (http://soap.genomics.org.cn/soap3.html), “the next version of SOAP”
    • Features: combination of indexing and compression
    • Limitations: single pre-compressed target
• Smith-Waterman implementations
  – CUDASW++
    • Features: Smith-Waterman accuracy
    • Limitations: handles only segments of ≥144 nt and a scaffold of 59,000 nt
  – SWIFT
    • Seeded hash
    • Removes repeats in the genome by default
  – SeqNFind® SW
    • Features: the scaffold can be a full chromosome in length (on a single card); queries up to 256 nt; optimized to run on a GPU cluster; target indexing and compression on the fly
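The BWA-style tools above search a pre-built, compressed index of one target. As a sketch of why, here is the textbook Burrows-Wheeler transform with FM-index backward search for exact matches. It is illustrative only: construction by sorting rotations is far too slow for a genome (real tools build suffix arrays), the occurrence counts are recomputed by slicing instead of read from sampled tables, and none of the tools listed is claimed to use this exact code. The index is built once for a fixed text, which is the "single pre-compressed target" limitation in miniature.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform; `text` must end with a unique '$'
    (which sorts before A, C, G, T)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(pattern: str, bwt_str: str) -> int:
    """FM-index backward search: number of exact occurrences of `pattern`."""
    # C[c] = number of characters in the text that sort before c
    C: dict[str, int] = {}
    total = 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c: str, k: int) -> int:
        # occurrences of c in bwt_str[:k]; real indexes use sampled rank tables
        return bwt_str[:k].count(c)

    lo, hi = 0, len(bwt_str)  # half-open range of sorted rotations
    for c in reversed(pattern):  # extend the match one base to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("ACGTACGT$")
    print(count_occurrences("ACG", index))  # 2
```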
A test data set for alignment algorithms
• Synthetic sequences
  – 1 million 50-mers
    • randomly selected from the human genome
    • randomly altered with up to 3 alterations
  – Insertions, deletions and single-nucleotide modifications
  – This is done to reflect their frequency in real data and to push algorithmic assumptions.
• Aligned with BWA and Smith-Waterman
• BWA (1) and Bowtie (CPU version) (2)
  – 86% accuracy
    • within 30 bp of the target
    • best hit only
  – Expected: BWA is best at 200+ bp (http://bio-bwa.sourceforge.net/)
• SeqNFind® Smith-Waterman
  – 98% accuracy
  – The true location may be only the second-best hit in the genome, but because all locations within a set score are reported, it was always returned.
(1) Li,H. and Durbin,R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics.
(2) Langmead,B. et al. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25.
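The slide describes the test set but not its generator. A hypothetical sketch of the same idea: sample 50-mers at random, apply 0 to 3 random insertions, deletions or substitutions, keep the true start, and score a reported hit as correct if it lands within 30 bp. The uniform choice of edit count, edit type and position is an assumption for illustration; the slide's generator was tuned to reflect real-data frequencies, which are not given here. All function names are illustrative.

```python
import random

BASES = "ACGT"


def mutate(read: str, n_edits: int, rng: random.Random) -> str:
    """Apply n_edits random insertions, deletions or substitutions
    (types and positions drawn uniformly: an assumption)."""
    seq = list(read)
    for _ in range(n_edits):
        kind = rng.choice(("ins", "del", "sub"))
        pos = rng.randrange(len(seq))
        if kind == "ins":
            seq.insert(pos, rng.choice(BASES))
        elif kind == "del":
            del seq[pos]
        else:  # substitution: always change the base
            seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def simulate_reads(genome: str, n_reads: int, read_len: int = 50,
                   max_edits: int = 3, seed: int = 0) -> list[tuple[int, str]]:
    """Sample reads from `genome`; return (true_start, altered_read) pairs."""
    rng = random.Random(seed)  # seeded so the test set is reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = mutate(genome[start:start + read_len], rng.randint(0, max_edits), rng)
        reads.append((start, read))
    return reads


def is_correct(true_start: int, reported_start: int, tolerance: int = 30) -> bool:
    """Accuracy criterion from the slide: hit within `tolerance` bp of the truth."""
    return abs(true_start - reported_start) <= tolerance
```

Keeping the true start with each read is what makes the "within 30 bp" accuracy figure computable for any aligner's output.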
Why are locations missed?
• Algorithmic properties must be considered with respect to biological sequence properties.
  – BWA:
    • most commonly when a mismatch and a gap are consecutive.
  – SW:
    • when the errors are close to the beginning of the sequence, because not enough score has accumulated yet.
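One way to see the "not enough accumulated score" effect is a simplified, ungapped local alignment along a single diagonal with assumed scores (+2 match, -3 mismatch; not the SeqNFind® scoring). A mismatch at the second base drives the running score to zero, so the local alignment restarts and the reported start shifts; the same mismatch mid-read is absorbed by the score already built up. The function is a hypothetical illustration, not the mechanism of any specific tool.

```python
def local_diagonal_start(read: str, ref: str, match: int = 2, mismatch: int = -3) -> int:
    """Ungapped local alignment along one diagonal (read[i] vs ref[i]).
    Returns the 0-based read position where the best-scoring segment starts."""
    score, start = 0, 0
    best_score, best_start = 0, 0
    for i, (a, b) in enumerate(zip(read, ref)):
        score += match if a == b else mismatch
        if score <= 0:  # Smith-Waterman floor: too little score to absorb the penalty
            score, start = 0, i + 1
        elif score > best_score:
            best_score, best_start = score, start
    return best_start


if __name__ == "__main__":
    ref = "ACGT" * 5
    early = ref[0] + "G" + ref[2:]      # mismatch at position 1
    middle = ref[:10] + "T" + ref[11:]  # mismatch at position 10
    print(local_diagonal_start(early, ref), local_diagonal_start(middle, ref))  # 2 0
```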
Barriers to using GPUs for genomics
– Data size:
  • The genomic data is massive.
  • It requires compression in a searchable manner.
    – Even with compression, it is difficult to stay within the available bandwidth.
    – This requires a new algorithmic approach.
    – GPUs are great for highly parallel computation.
    – Difficult due to the parallel memory-access requirement:
      » genomic search and comparison require heavy address lookup;
      » long development time, with unknown rewards.
  • The bioinformatics community is slow to accept new tools or technologies.
    – Willing to accept the current high error rate.
    – Lack of awareness of the power and application of GPUs to genomics.
    – Wants open source.
      » Open-source validation!
– Single card
  » Multiple cards may be better…
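A simple example of compression that stays searchable is 2-bit packing: four bases per byte, a 4x reduction from one ASCII byte per base, while any base or k-mer can still be read directly by offset without decompressing the whole sequence. This is a hypothetical sketch (function names are illustrative); it assumes an A/C/G/T-only alphabet, so N and other ambiguity codes would need separate handling, and it is not the scheme used by any tool on these slides. The integer k-mer codes can serve directly as keys into a seed index.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> bytearray:
    """Pack A/C/G/T at 2 bits per base, 4 bases per byte (first base in the low bits)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return out


def code_at(packed: bytearray, i: int) -> int:
    """Random access: 2-bit code of base i, read without unpacking anything else."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


def base_at(packed: bytearray, i: int) -> str:
    return "ACGT"[code_at(packed, i)]


def kmer_code(packed: bytearray, start: int, k: int) -> int:
    """Integer code of the k-mer at `start` (2k bits), usable as an index key."""
    code = 0
    for i in range(start, start + k):
        code = (code << 2) | code_at(packed, i)
    return code


if __name__ == "__main__":
    p = pack("ACGTTGCA")
    print(len(p), base_at(p, 4), kmer_code(p, 0, 4))  # 2 T 27
```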
Lessons learned in developing GPU genomics tools
• Multiple-card bandwidth
  – The PCI bus is limiting.
  – High-bandwidth tasks: large amounts of data.
• Want more registers
  – and increased cache to allow more threads.
• CUDA quirks: CUDA has its own rules.
  – Internal loops over cache between VRAM or texture-map address lookups
    • influence run times.
    • Example: vRAM call vs. local cache loop.
  – Depends on the driver.
• Driver (volatile)
  – Can reduce speed by 4-50%.
• Most recent test
  – Test set of 33-mer queries against chrY.
  – Workstation with four 2050s (Fermi).

Time to run on multiple Fermi cards (seconds):
Driver   2 cards (s)   4 cards (s)
4.0      136           101
4.1      128            86
4.2      153           168
Recent test of SeqNFind® SW showing variation due to change of driver.
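A quick arithmetic check on the driver timings from this slide: expressing each result as the fraction of throughput lost relative to the fastest driver (4.1) gives about 6% (4.0, 2 cards) up to about 49% (4.2, 4 cards), within the 4-50% range quoted. The same numbers show that with driver 4.2, going from 2 to 4 cards made the run slower (speed-up below 1), while 4.1 gave about 1.49x. The helper names are illustrative.

```python
# Run times (seconds) from the SeqNFind® SW driver test on this slide.
TIMES = {
    "4.0": {2: 136, 4: 101},
    "4.1": {2: 128, 4: 86},
    "4.2": {2: 153, 4: 168},
}


def speed_loss(times: dict[str, dict[int, int]]) -> dict[tuple[str, int], float]:
    """Fraction of throughput lost vs. the fastest driver at each card count."""
    result = {}
    for cards in (2, 4):
        fastest = min(t[cards] for t in times.values())
        for driver, t in times.items():
            result[(driver, cards)] = 1 - fastest / t[cards]
    return result


def scaling(times: dict[str, dict[int, int]]) -> dict[str, float]:
    """Speed-up from 2 to 4 cards for each driver (ideal would be 2.0)."""
    return {driver: t[2] / t[4] for driver, t in times.items()}


if __name__ == "__main__":
    for (driver, cards), loss in sorted(speed_loss(TIMES).items()):
        print(f"driver {driver}, {cards} cards: {loss:.1%} throughput lost vs. best")
    for driver, s in scaling(TIMES).items():
        print(f"driver {driver}: 2 -> 4 card speed-up {s:.2f}x")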
GPUs are the future of bioinformatics
• A massively parallel environment is ideal for searching large numbers of locations simultaneously!
• The advent of Hyper-Q and Dynamic Parallelism!
  – Very exciting!
• 20k more cores: more threads!
• More complicated space to come:
  – Methylation and acetylation data
  – FASTQ quality scores
  – Topological information
  – Conformational information
  – More genomes (diploid)
The future of GPUs and genomics
As GPUs improve, more of the currently used algorithms will be easy to port.
◦ These will carry their underlying flaws with them.
◦ They are not necessarily the best fit for GPU architecture.
◦ Solution: innovation beyond 1D.
◦ More and more genomic attributes are linked with structure:
  • G-quadruplexes
  • Hairpins
  • mRNA
[Figure: RNA secondary structure of vertebrate telomerase RNA. Image from the Rfam database.]
◦ Although protein-folding tools exist, the less stringent nature of the nucleotide backbone makes nucleic-acid folding more of a challenge.
◦ Bridging 1D into 3D
[Figure: 3D structure rendering. Image copyright D. Andrew Carr, developed in CarrVis.]
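As a hedged illustration of how much structure is already implied by the 1D sequence, here is the textbook Nussinov algorithm, which finds the maximum number of nested base pairs (for example, the stem of a hairpin). It is a swapped-in classic technique, not the CarrVis method and not how Rfam structures are derived; it ignores stacking energies, which is why practical folding tools use energy models instead. Allowing G-U wobble pairs and a minimum hairpin loop of 3 bases are assumptions. Like Smith-Waterman it is a dynamic program, here O(N³).

```python
def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov DP).
    Bases i < j may pair only if at least `min_loop` unpaired bases lie between them."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best pair count within seq[i..j]
    for span in range(min_loop + 1, n):  # span = j - i, shortest intervals first
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))  # 3: a 3-pair stem closing an AAA loop
```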
Acknowledgements
• ATL
  – Don Kolva
  – Dr. Christine Paszko
• UNCC
  – Dr. Shannon Schlueter
  – Dr. Jennifer Weller
• Saeed Khoshnevis
• Chris Overall
Questions?
ATL offers the first GPU-based complete sequence analysis solution.
Workstation and cluster configurations are available.
For additional information, please call 800.565.5467 to speak with a sales representative.