The Genom of Homo sapiens.pdf


Evolving Methods for the Assembly of Large Genomes

R.A. GIBBS AND G.M. WEINSTOCK
Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030

Several large genomes have been sequenced in recent years. This has led to a general perception that genome assembly is a solved problem and that, aside from the considerable expense of generating the DNA sequence reads, the path to completing future projects is straightforward. Closer examination shows that this is not the case. The complexity of large genomes, the relative newness of the tools available for piecing them together, and the rapid evolution of related technologies make this an active area of scientific development. A particular concern is the continuing role of mapping and of large-insert bacterial artificial chromosome clones (BACs) when tackling new species for which resources and reagents are scarce.

Current approaches for assembling large genomes include the “hierarchical” strategy and the whole-genome shotgun (WGS) methods (Weber and Myers 1997). The hierarchical strategy was used for the publicly funded nematode (The C. elegans Sequencing Consortium 1998) and human (Lander et al. 2001) genome projects. WGS methods were used for the privately funded human sequence (Venter et al. 2001), Drosophila melanogaster (Adams et al. 2000), mouse (Waterston et al. 2002b), and, more recently, ciona (Dehal et al. 2002), zebrafish (http://www.sanger.ac.uk/Projects/D_rerio/), Drosophila pseudoobscura (http://www.hgsc.bcm.tmc.edu/projects/drosophila/), the chimp (Pennisi 2003), pufferfish (Aparicio et al. 2002), dog (Kirkness et al. 2003), and honeybee (http://www.hgsc.bcm.tmc.edu/projects/honeybee/). Conceptually, these methods represent opposite extremes (Fig. 1).
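Both strategies ultimately rely on software that pieces overlapping sequence reads back together. The Python sketch below illustrates only that core operation. It greedily merges reads by their longest suffix-prefix overlap. It is a hypothetical toy that assumes error-free reads and exact overlaps. The function names (`overlap`, `drop_contained`, `greedy_assemble`) and the minimum overlap of 3 bases are illustrative assumptions. None of this comes from the paper, and it is not the algorithm of any assembler cited here.

```python
# Hypothetical toy assembler (illustrative assumption): error-free reads,
# exact overlaps only; not the method of any assembler cited in the text.

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`,
    provided it is at least `min_len` bases long; otherwise 0."""
    start = 0
    while True:
        # Next occurrence in `a` of the first `min_len` bases of `b`.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # It is an overlap if everything from `start` to the end of `a` begins `b`.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def drop_contained(seqs: list[str]) -> list[str]:
    """Remove duplicates and any sequence lying wholly inside another one."""
    unique = list(dict.fromkeys(seqs))  # de-duplicate, keeping input order
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap
    until no overlap of at least `min_len` bases remains."""
    contigs = drop_contained(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: these are the final contigs
        # Join the two contigs, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


# Three overlapping 6-base reads, deliberately listed out of genome order.
print(greedy_assemble(["CATGTC", "ACGTTG", "TTGCAT"]))  # ['ACGTTGCATGTC']
```

Greedy merging of this kind is easily misled by repeats longer than a read. This is one reason large-genome assemblers also draw on longer-range information, such as mate-pair constraints.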
[Figure 1 panels: BAC by BAC; Whole Genome Shotgun (WGS)]

In the hierarchical strategy, considerable effort is required at the start of the project, generally for generating, manipulating, and analyzing BACs. Ideally, the precise positional relationships among the BACs are established before sequencing. Next, each ~200-kb clone is treated as a separate, localized DNA sequencing project. This includes random sequencing as well as a more painstaking finishing phase. Finally, the BAC sequences are joined at the end of the project to reconstruct the complete genome.

In contrast, in a WGS project, sequences are generated from the ends of inserts of randomly selected (shotgunned) subclones of the whole genome. Once a sufficient average depth of sequence coverage is achieved, the genome is assembled by advanced software. This software works primarily by recognizing overlaps between individual sequence reads. To maintain assembly accuracy over long genomic distances, the sequences are generated from subclones with different insert sizes. The assembly is then constrained by requiring that reads from the two ends of the same subclone (mate-pairs) lie at their expected relative positions in the final sequence (Edwards et al. 1990). In many instances, these clone-end sequences include at least a small fraction of BAC ends, since those clones are easily manipulated and their inserts span hundreds of kilobases.

Depending on the depth of coverage, a randomly sequenced genome can be referred to as a preliminary draft (
