2014.03.31.GenomeAccess.Sequencing and Assembly
2014.03.31.GenomeAccess.Sequencing and Assembly
2014.03.31.GenomeAccess.Sequencing and Assembly
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Ingredients for a good assembly<br />
Coverage<br />
L<strong>and</strong>er Waterman Expected Contig Length vs Coverage<br />
Read Length<br />
Quality<br />
Expected Contig Length<br />
Expected Contig Length (bp)<br />
100 1k 10k 100k 1M<br />
dog N50 +<br />
dog mean +<br />
p<strong>and</strong>a N50 +<br />
p<strong>and</strong>a mean +<br />
1000 bp<br />
710 bp<br />
250 bp<br />
100 bp<br />
52 bp<br />
30 bp<br />
0 5 10 15 20 25 30 35 40<br />
Read Coverage<br />
Read Coverage<br />
High coverage is required<br />
– Oversample the genome to ensure<br />
every base is sequenced with long<br />
overlaps between reads<br />
– Biased coverage will also fragment<br />
assembly<br />
Reads & mates must be longer<br />
than the repeats<br />
– Short reads will have false overlaps<br />
forming hairball assembly graphs<br />
– With long enough reads, assemble<br />
entire chromosomes into contigs<br />
Errors obscure overlaps<br />
– Reads are assembled by finding<br />
kmers shared in pair of reads<br />
– High error rate requires very short<br />
seeds, increasing complexity <strong>and</strong><br />
forming assembly hairballs<br />
Current challenges in de novo plant genome sequencing <strong>and</strong> assembly<br />
Schatz MC, Witkowski, McCombie, WR (2012) Genome Biology. 12:243