25.04.2014 Views

2014.03.31.GenomeAccess.Sequencing and Assembly

2014.03.31.GenomeAccess.Sequencing and Assembly

2014.03.31.GenomeAccess.Sequencing and Assembly

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Ingredients for a good assembly<br />

Coverage<br />

L<strong>and</strong>er Waterman Expected Contig Length vs Coverage<br />

Read Length<br />

Quality<br />

Expected Contig Length<br />

Expected Contig Length (bp)<br />

100 1k 10k 100k 1M<br />

dog N50 +<br />

dog mean +<br />

p<strong>and</strong>a N50 +<br />

p<strong>and</strong>a mean +<br />

1000 bp<br />

710 bp<br />

250 bp<br />

100 bp<br />

52 bp<br />

30 bp<br />

0 5 10 15 20 25 30 35 40<br />

Read Coverage<br />

Read Coverage<br />

High coverage is required<br />

– Oversample the genome to ensure<br />

every base is sequenced with long<br />

overlaps between reads<br />

– Biased coverage will also fragment<br />

assembly<br />

Reads & mates must be longer<br />

than the repeats<br />

– Short reads will have false overlaps<br />

forming hairball assembly graphs<br />

– With long enough reads, assemble<br />

entire chromosomes into contigs<br />

Errors obscure overlaps<br />

– Reads are assembled by finding<br />

kmers shared in pair of reads<br />

– High error rate requires very short<br />

seeds, increasing complexity <strong>and</strong><br />

forming assembly hairballs<br />

Current challenges in de novo plant genome sequencing <strong>and</strong> assembly<br />

Schatz MC, Witkowski, McCombie, WR (2012) Genome Biology. 12:243

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!