13.07.2015 Views

The Genom of Homo sapiens.pdf


190 GIBBS AND WEINSTOCK

[Figure 3 panel labels: BAC, ATLAS, WGS]

Figure 3. Formation of eBACs by BAC-Fishing. Sequencing reads from a lightly sequenced (~1x coverage) BAC are used to identify reads from WGS sequencing (5–6x coverage) that overlap and thus come from the genomic region of the BAC. The relevant WGS reads are co-assembled with the BAC reads using Phrap to produce an eBAC, an assembly with much greater coverage (6–7x) than the original skimmed BAC sequence.

[...]quence (Myers et al. 2002; Waterston et al. 2002a, 2003). Even apart from these debates, the shortcomings of each approach are apparent. On the one hand, the hierarchical approach is laborious and requires the development of an intermediate product (the clone map) that is ultimately abandoned. On the other hand, WGS sequencing requires extensive work to piece together the segments of genomes that are repeated over distances longer than one kilobase.

To improve the sequencing of large genomes, we therefore developed a new “Combined Strategy” and applied it to the Rat Genome Sequencing Project (RGSP) (Rat Genome Sequencing Project Consortium 2004). The underlying principle is to combine the precision and orientation afforded by BAC clones with the ease and scalability of a WGS approach (Fig. 2). As a consequence, the RGSP was designed to include low-coverage (1.5x) sequence from a full set of BAC clones that covered the entire rat genome, as well as an abundance of WGS reads that would provide deep overall sequence coverage. The rationale was that specific WGS reads could be recruited to join the sequences that were derived from within BAC clones.
When brought together, these mixed reads could be assembled using our familiar software for handling BAC-sized sequencing projects.

This was the first, real-time combined use of BACs and WGS data to generate a complete genome assembly, since, although the previous Drosophila and concurrent murine genome projects both used WGS and BACs, these were employed in sequential and separate phases. We predicted the BAC component of the Combined Strategy would be particularly important in the rat genome sequencing program, as it aimed to generate a “draft” of about 7x sequence coverage, but not finished sequence. In this case, the BACs were likely to confer an overall higher quality on the eventual assembly.

Figure 3 illustrates the process of BAC Fishing, the first step and a key element of the Combined Strategy. At this stage, the small numbers of “bait” sequence reads from individual BACs are used to identify larger numbers of matching “catch” DNA sequence reads from WGS libraries. Typically 500–800 BAC reads “fish” 2,500–3,000 reads from a 6x WGS pool. We developed BACfisher (Havlak et al. 2004) software that enabled this fishing process, and PHRAP was used for the subsequent stringent step of local assembly to form the “enriched BACs” (eBACs) containing the two kinds of reads.

The high quality of the data contained within each eBAC validated the basic principle of the Combined Strategy (Fig. 4). The eBACs were made publicly available as a useful resource while the project progressed, and formed the foundation for the remainder of the rat assembly.

To generate a complete rat genome draft, ~19,000 eBACs were assembled into strings of BACtigs on the basis of their sequence overlaps.
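
The BAC Fishing step described above can be illustrated with a toy sketch. The text does not say how BACfisher matches reads, so the shared k-mer test below is a stand-in chosen for illustration. The function names (`kmers`, `reverse_complement`, `fish_reads`), the parameters `k` and `min_shared`, and the example sequences are all hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string containing only A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def fish_reads(bait_reads: List[str], wgs_reads: Dict[str, str],
               k: int = 8, min_shared: int = 3) -> List[str]:
    """Return IDs of WGS reads sharing at least min_shared k-mers with the bait.

    Assumption: k-mer sharing stands in for real overlap detection; this is
    not the BACfisher algorithm, whose details are not given in the text.
    """
    # Index every k-mer of the bait reads on both strands.
    bait_index: Set[str] = set()
    for read in bait_reads:
        bait_index |= kmers(read, k)
        bait_index |= kmers(reverse_complement(read), k)

    # A WGS read is "caught" when enough of its k-mers hit the bait index.
    catch = []
    for read_id, seq in wgs_reads.items():
        if len(kmers(seq, k) & bait_index) >= min_shared:
            catch.append(read_id)
    return catch


# Made-up example: one bait read from a skimmed BAC and three WGS reads.
bait = ["ACGTTGCAAGGCTTACCGAT"]
wgs_pool = {
    "w1": "GCAAGGCTTACCGATAGGCT",                    # overlaps the bait
    "w2": "TTTTTTTTTTCCCCCCCCCC",                    # unrelated sequence
    "w3": reverse_complement("CGTTGCAAGGCTTACC"),    # overlaps on the other strand
}
print(fish_reads(bait, wgs_pool))  # ['w1', 'w3']
```

In this picture, the caught reads and the bait reads would then go together into a local assembler, which is the role the text assigns to PHRAP.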
The BACtigs were next joined into super-BACtigs by large clone insert mate-pair information and ultimately aligned to genome-mapping data to form the complete assembly (Fig. 5). More details of this assembly procedure are discussed elsewhere (Havlak et al. 2004).

The overall quality of the rat assembly was very high, and the project therefore managed to capture both the economic and genome coverage advantages of WGS while retaining the ease and precision of assembly afforded by the BAC components. The additional advantages of the BACs were illustrated by a comparison with [...]

[Figure 2 panel labels: WGS, BAC, Combined Assembly]

Figure 2. The Combined Strategy for sequencing large genomes.

Figure 4. High quality of eBACs. An example of a base-by-base comparison between an eBAC (ordinate) and the same region of finished sequence (abscissa). The colinearity and a few gaps are evident. No repeat filtering was performed, resulting in the large amount of signal off the diagonal throughout the graph.
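
The read counts quoted for BAC Fishing can be checked against the stated coverages with simple arithmetic. The sketch below assumes that BAC and WGS reads have similar lengths and sample the region uniformly; neither assumption is stated in the text. The function names are hypothetical.

```python
def expected_catch(bait_reads: int, bac_coverage: float, wgs_coverage: float) -> float:
    """Expected number of WGS reads falling in the BAC region.

    Assumption: equal read lengths and uniform sampling, so read count
    scales linearly with coverage.
    """
    return bait_reads * wgs_coverage / bac_coverage


def ebac_coverage(bac_coverage: float, wgs_coverage: float) -> float:
    """Coverage of an eBAC: the BAC skim plus the recruited WGS reads."""
    return bac_coverage + wgs_coverage


# 500-800 bait reads at 1.5x BAC coverage, fishing in a 6x WGS pool.
low = expected_catch(500, 1.5, 6.0)   # 2000.0
high = expected_catch(800, 1.5, 6.0)  # 3200.0
print(low, high)

# Figure 3 values: ~1x BAC skim plus 5-6x WGS gives 6-7x in the eBAC.
print(ebac_coverage(1.0, 5.0), ebac_coverage(1.0, 6.0))  # 6.0 7.0
```

Under these assumptions the expected catch is roughly 2,000–3,200 reads, a range that contains the 2,500–3,000 reads reported, and the summed coverage matches the 6–7x given for eBACs in Figure 3.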
