Shotgun Sequencing

First Presentation - DIMACS REU

Shotgun Sequencing
• Multiple copies of genome
• Sheared into random fragments
• Sequenced reads (BIG DATA)
• Contig assembly
• Scaffold assembly

Assembly difficulties
• 0.1%-15% per-base error rate, depending on technology
• One solution: remove all infrequent K-mers
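The "remove all infrequent K-mers" idea can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names `count_kmers` and `filter_infrequent`, the toy reads, and the threshold `min_count=2` are all assumptions chosen for the example.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring (K-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def filter_infrequent(counts, min_count):
    """Keep only K-mers observed at least min_count times (hypothetical threshold)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

# Toy data: three correct reads and one read with a single base error (T -> A).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
counts = count_kmers(reads, k=4)
solid = filter_infrequent(counts, min_count=2)
print(sorted(solid))  # ['ACGT', 'CGTA', 'GTAC']
```

A single base error creates up to k erroneous K-mers, each typically seen only once, which is why a frequency threshold removes them. The threshold only works when real K-mers are covered often enough to clear it.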

anana-slug.soe.ucsc.edu

Base call error filtering

Single Cell Genomics

www.nature.com

Single cell problems

Single cell problems
• Heterogeneous sample coverage
• Amplified single cell sample
et al.

A possibility
• Collapse Hamming distance balls around the most frequent K-mers
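One way to read "collapse Hamming distance balls" is as a greedy procedure: visit K-mers from most to least frequent, and let each unassigned K-mer absorb the counts of all unassigned K-mers within a fixed Hamming radius. The sketch below follows that reading. The greedy order, the `radius` parameter, and the names `hamming` and `collapse_hamming_balls` are assumptions for illustration, and the quadratic scan is only practical for small inputs.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def collapse_hamming_balls(counts, radius=1):
    """Greedily merge each K-mer into the most frequent K-mer within `radius`.

    Returns (collapsed, assigned): summed counts per ball center, and a map
    from every K-mer to the center that absorbed it.
    """
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    order = sorted(counts, key=lambda kmer: (-counts[kmer], kmer))
    assigned = {}
    collapsed = {}
    for kmer in order:
        if kmer in assigned:
            continue  # already absorbed by a more frequent neighbour
        assigned[kmer] = kmer
        collapsed[kmer] = counts[kmer]
        for other in order:
            if other not in assigned and hamming(kmer, other) <= radius:
                assigned[other] = kmer
                collapsed[kmer] += counts[other]
    return collapsed, assigned

counts = {"ACGT": 10, "ACGA": 1, "TCGT": 2, "GGGG": 5}
collapsed, assigned = collapse_hamming_balls(counts, radius=1)
print(collapsed)  # {'ACGT': 13, 'GGGG': 5}
```

Unlike a hard frequency cutoff, this keeps a low-coverage K-mer as long as nothing more frequent lies within the radius, which matters when coverage is uneven. The cost is that a genuine variant sitting one mismatch away from a frequent K-mer is merged away as well.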

How can this be improved?
• First: Does this really work? How well?
– How many real K-mers do we lose?
– What kind of data performs best?
• More biologically sophisticated heuristics
– Ex. Maximum likelihood/maximum entropy
• Find pairs of reliable K-mers with a reliable distance estimate
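The question "How many real K-mers do we lose?" can be explored on simulated data, where the true genome is known. The sketch below is one possible setup under stated assumptions: a random genome, uniformly placed reads, substitution-only errors, and a plain frequency filter. All names and parameter values (`simulate_reads`, `lost_real_fraction`, `k=15`, `min_count=3`) are hypothetical, and uniform read placement does not model the heterogeneous coverage of amplified single-cell samples.

```python
import random
from collections import Counter

def kmers(seq, k):
    """All length-k substrings of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample reads at uniform random positions with substitution errors only."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = []
        for base in genome[start:start + read_len]:
            if rng.random() < error_rate:
                # Replace the base with one of the three other nucleotides.
                base = rng.choice([b for b in "ACGT" if b != base])
            read.append(base)
        reads.append("".join(read))
    return reads

def lost_real_fraction(genome, reads, k, min_count):
    """Fraction of the genome's true K-mers that a frequency filter would discard."""
    real = set(kmers(genome, k))
    counts = Counter(km for read in reads for km in kmers(read, k))
    lost = {km for km in real if counts[km] < min_count}
    return len(lost) / len(real)

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(500))
for rate in (0.001, 0.05, 0.15):  # spans the 0.1%-15% range from the slides
    reads = simulate_reads(genome, n_reads=200, read_len=50, error_rate=rate, rng=rng)
    print(rate, round(lost_real_fraction(genome, reads, k=15, min_count=3), 3))
```

Swapping in a different filter, or a non-uniform read placement, gives a direct way to compare heuristics on the same ground truth.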
