The Genom of Homo sapiens.pdf


192 GIBBS AND WEINSTOCK

sentation of BACs in a pool, which is generally found sporadically and is of limited impact when multiple pools are used. Typically, a total sequence coverage of about 1x per BAC is distributed among the rows and columns, and this is sufficient for deconvolution of most reads to individual BACs. The contigs that result from assembly of the deconvoluted reads, while representing partial coverage of a BAC, can be used as bait in BAC-Fishing to add WGS reads which, when assembled, result in full BAC coverage comparable to eBACs, described above. Moreover, since the overhead associated with construction of individual eBAC sequences is reduced dramatically by pooling, it is possible to employ deeper tiling paths, improving the quality of draft sequences by reducing gaps and resolving duplications on a finer scale.

This same approach can be used with much lower sequence coverage to map BACs to a reference sequence and build a BAC map. Pooled Genome Indexing (PGI) provides cross-species alignments by comparing reads against the genome of a closely related organism (Csuros and Milosavljevic 2002, 2004). Where a row read and a column read align close to each other in the genome (within a BAC insert length), the reads are deconvoluted to the BAC at the row–column intersection, and the BAC is simultaneously mapped to the region between the two matches. PGI pooling schemes are being applied to map rhesus macaque BAC clones to the human genome. This sequence-based mapping integrates the mapping and sequencing components of a genome project.
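The row–column pairing rule described for PGI can be illustrated with a short sketch. This is only a toy illustration under stated assumptions, not the published procedure of Csuros and Milosavljevic: the `Hit` record, the `pgi_map` function, the 200 kb default insert length, and the example coordinates are all hypothetical, and the alignment of pooled reads to the reference genome is taken as already done.

```python
from collections import namedtuple

# Hypothetical record: one pooled read aligned to the reference genome.
# pool  = name of the row or column pool the read came from
# chrom = reference chromosome, pos = alignment position (bp)
Hit = namedtuple("Hit", ["pool", "chrom", "pos"])


def pgi_map(row_hits, col_hits, max_insert=200_000):
    """Pair row-pool and column-pool hits lying within one BAC insert length.

    max_insert is an assumed insert length, not a value from the source text.
    Returns a dict mapping (row_pool, col_pool) to a list of
    (chrom, start, end) regions between the two matching reads.
    """
    placements = {}
    for r in row_hits:
        for c in col_hits:
            if r.chrom != c.chrom:
                continue  # reads on different chromosomes cannot share a BAC
            if abs(r.pos - c.pos) <= max_insert:
                # The BAC at the row-column intersection is mapped to the
                # region between the two matches.
                region = (r.chrom, min(r.pos, c.pos), max(r.pos, c.pos))
                placements.setdefault((r.pool, c.pool), []).append(region)
    return placements


# Toy data: only the R1/C1 pair aligns within one insert length.
row_hits = [Hit("R1", "chr1", 1_000_000), Hit("R2", "chr2", 5_000_000)]
col_hits = [Hit("C1", "chr1", 1_120_000), Hit("C2", "chr2", 9_000_000)]
result = pgi_map(row_hits, col_hits)
print(result)
```

In this toy input the reads from row pool R1 and column pool C1 align 120 kb apart on the same chromosome, so the BAC at their intersection is placed between them; the R2 and C2 reads are too far apart to come from a single insert. A realistic implementation would also have to handle repeats and chance co-alignments, which this sketch ignores.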
Although it is possible that a PGI approach could replace BAC fingerprinting, in practice, fingerprinting still provides an independent data set that is valuable in quality assessment. However, the ability to map high clone coverage with PGI suggests that an effective strategy is to limit future fingerprinting to those BACs on a tiling path derived from PGI. These various BAC pooling methods are on firm theoretical and technical footing and are currently being tested in the genome projects at the BCM-HGSC (Table 2). They are expected to lead to cost reductions of the order of $1.5–2.0 million for a full draft sequence of a mammalian genome.

OTHER COMPONENTS OF GENOME PROJECTS

Additional data, beyond the final draft DNA sequence, are required to realize the full utility of a genome project. Three avenues of additive data that increase the value of a genome project are finishing, characterization of polymorphisms, and sequencing cDNAs. Not surprisingly, the use of BACs in the genome assemblies has ramifications for the development of these additional component data.

Finishing encompasses a range of activities that produce differing degrees of polishing of the draft sequence. At one extreme are gap-filling and quality improvements that can be driven by automated sequence analysis and produce a useful but not complete improvement in the sequence.
Alternatively, reaching the highest grade of sequence quality requires much more sophisticated and labor-intensive efforts for resolving difficult regions. These latter approaches employ a variety of techniques such as different sequencing chemistries, specialized shotgun library techniques, or transposon-based methods. A reasonable compromise at the genome scale is for a draft sequence product to be greatly improved by more automated approaches, whereas selected regions are finished to a higher grade. This makes practical the possibility of finishing essentially all coding regions as well as targeted features, such as genes of interest recommended by the research community (e.g., QTLs); genes related by homology with disease or important models in other species; genes of interest identified by differential rates of evolution; presence/absence in closely related genomes; boundaries of syntenic regions with human or closely related genomes; difficult-to-assemble regions (e.g., due to high repeat content); and members of new repeat classes.

As noted above in the rat project, finished sequence also provides a gold standard against which the draft assembly can be compared to quantitatively assess quality (Fig. 4). BAC-based projects provide a convenient way to define regions for finishing as well as the ideal reagent to use for these directed finishing approaches.

Exploring polymorphism in a genome is also an impor-

Table 2. Genome Sequencing Projects Under Way at the BCM-HGSC

Species                          Dates      Size    Goal      Status       Methods a           %BCM
Human (Homo sapiens)             1990–2003  2.9 Gb  finished  complete     clone by clone      10.5
Mouse (Mus musculus)             1997–1999  2.6 Gb  finished  N/A          clone by clone      80 a
Rhesus monkey (M. mulatta)       2003–      2.9 Gb  draft     in progress  CAPSS/PGI-Combined  >60 a
Red flour beetle (T. castaneum)  2004–      0.2 Gb  draft     in progress  WGS                 100
Bacteria (numerous)              ongoing
