Shotgun Sequencing
[Figure, labelled "BIG DATA": multiple copies of genome → sheared into random fragments → sequenced reads → contig assembly → scaffold assembly]
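The shotgun process in the figure can be mimicked with a toy read simulator. This is a minimal sketch under stated assumptions: the function name `shotgun_reads` is hypothetical, reads are sampled uniformly from a single genome string, and only substitution errors are modelled (no indels, paired ends, or fragment-length distribution).

```python
import random

def shotgun_reads(genome, num_reads, read_len, error_rate=0.0, seed=0):
    """Hypothetical toy simulator: sample reads from uniformly random start
    positions (standing in for randomly sheared genome copies) and substitute
    each base with probability error_rate."""
    rng = random.Random(seed)
    bases = "ACGT"
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, b in enumerate(read):
            if rng.random() < error_rate:
                # Substitution error: replace with a different base.
                read[i] = rng.choice([x for x in bases if x != b])
        reads.append("".join(read))
    return reads

genome = "".join(random.Random(1).choice("ACGT") for _ in range(1000))
reads = shotgun_reads(genome, num_reads=200, read_len=50, error_rate=0.01)
print(len(reads), len(reads[0]))  # 200 50
```

Setting `error_rate` anywhere in the 0.001 to 0.15 range mirrors the per-base error rates quoted for different technologies.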
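Assembly difficulties
• 0.1% to 15% per-base error rate, depending on sequencing technology
• One solution: remove all infrequent K-mers (a single base error creates up to K erroneous K-mers, each rarely seen elsewhere in the data)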
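Removing infrequent K-mers amounts to counting every K-mer across all reads and keeping only those above a cutoff. A minimal sketch, assuming a hypothetical cutoff `min_count` chosen by hand and ignoring reverse complements (real tools usually count canonical K-mers); the helper names are illustrative only.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_count):
    """Keep K-mers seen at least min_count times; rarer ones are assumed
    to contain sequencing errors and are discarded."""
    return {kmer for kmer, c in counts.items() if c >= min_count}

reads = ["ACGTAC", "ACGTAC", "ACGTTC"]  # the last read carries an A->T error
print(sorted(solid_kmers(count_kmers(reads, 4), 2)))  # ['ACGT', 'CGTA', 'GTAC']
```

The erroneous K-mers `CGTT` and `GTTC` occur once and are dropped, while K-mers shared by the correct reads survive.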
Base call error filtering
[Image: anana-slug.soe.ucsc.edu]
Single Cell Genomics
Single cell problems
[Image: www.nature.com]
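Single cell problems
[Figure panels: Heterogeneous Sample Coverage; Amplified Single Cell Sample Coverage. Sources: www.neb.uk.com; Nikolenko et al.]
• Amplifying the genome of a single cell gives highly uneven coverage, so a fixed frequency cutoff also discards real K-mers from low-coverage regions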
A possibility
• Collapse Hamming-distance balls around the most frequent K-mers
[Image: www.math.cornell.edu]
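One simple way to collapse Hamming balls is a greedy pass: take K-mers from most to least frequent, make each not-yet-assigned K-mer a center, and absorb every unassigned K-mer within a chosen radius. This is a hypothetical O(n²) sketch for illustration, not the clustering used by any particular tool; `radius` and the function names are assumptions.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def collapse_hamming_balls(counts, radius):
    """Greedy sketch: map each K-mer to the most frequent center within
    `radius` mismatches that was chosen before it."""
    order = sorted(counts, key=lambda km: (-counts[km], km))
    center_of = {}
    for center in order:
        if center in center_of:
            continue  # already absorbed by a more frequent center
        center_of[center] = center
        for other in order:
            if other not in center_of and hamming(center, other) <= radius:
                center_of[other] = center
    return center_of

def merged_counts(counts, center_of):
    """Add each K-mer's count to its center's total."""
    merged = Counter()
    for km, c in counts.items():
        merged[center_of[km]] += c
    return merged

counts = Counter({"ACGT": 10, "ACGA": 1, "TCGT": 1, "GGGG": 5})
center_of = collapse_hamming_balls(counts, radius=1)
print(dict(merged_counts(counts, center_of)))  # {'ACGT': 12, 'GGGG': 5}
```

Unlike a plain cutoff, the rare variants `ACGA` and `TCGT` are corrected toward `ACGT` rather than thrown away, and their counts are preserved.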
How can this be improved?
• First: Does this really work? How well?
  – How many real K-mers do we lose?
  – What kind of data performs best?
• More biologically sophisticated heuristics
  – e.g., maximum likelihood / maximum entropy
• Find pairs of reliable K-mers with a reliable distance estimate
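The question "How many real K-mers do we lose?" can be answered directly when the true genome is known, for example on simulated reads. A minimal sketch, assuming such a reference is available; `evaluate_filter` is a hypothetical helper. Note that "lost" here also counts K-mers never sampled by any read, so it does not separate cutoff losses from coverage gaps.

```python
from collections import Counter

def kmer_set(seq, k):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def evaluate_filter(genome, reads, k, min_count):
    """Compare K-mers kept by a frequency cutoff with the genome's true K-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    kept = {km for km, c in counts.items() if c >= min_count}
    true_kmers = kmer_set(genome, k)
    lost = true_kmers - kept        # real K-mers removed or never sampled
    false_kept = kept - true_kmers  # erroneous K-mers that survived
    return {
        "true": len(true_kmers),
        "lost": len(lost),
        "false_kept": len(false_kept),
        "lost_fraction": len(lost) / len(true_kmers) if true_kmers else 0.0,
    }

print(evaluate_filter("ACGTACGGT", ["ACGTAC", "ACGTAC", "CGGT", "ACGTTC"], 4, 2))
# {'true': 6, 'lost': 3, 'false_kept': 0, 'lost_fraction': 0.5}
```

Running this over reads with uneven coverage (for instance, sampling some genome regions far more often than others) shows how a single cutoff trades lost real K-mers against surviving erroneous ones.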