13.07.2015 Views

The Genom of Homo sapiens.pdf

THE FINISHED HUMAN GENOME

Figure 1. Schematic representation of the hierarchical mapping and sequencing strategy used for the Human Genome Project.

Information from the different maps was integrated by markers that were shared between them, thus providing independent confirmation of each level. Because most markers had sequence information attached to them (i.e., they were sequence-tagged sites, or STSs, in the genome), they were also integrated into the genomic sequence. This approach made it possible to select clones for sequencing that ensured maximal coverage of the genome, while minimizing sequencing redundancy. In addition, obtaining all the sequence on a clone-by-clone basis ensured that any regions which proved difficult to sequence could be resolved locally within the 40- to 200-kb segment of the large-insert bacterial clone.

Clones identified for sequencing were shotgun subcloned into single-stranded M13 bacteriophage or double-stranded plasmid vectors (e.g., pUC18). Sequence reads from one or both ends were generated after propagation of the subclones in Escherichia coli and extraction of the DNA. For efficient generation of high-quality finished sequence, each BAC or PAC clone was typically sequenced to give a minimum of six- or eightfold coverage in random shotgun reads, which were assembled using the program PHRAP (http://www.phrap.org/) to generate on average between six and ten contigs. This assembled unfinished sequence was then manually assessed in order to determine the best strategy for closing gaps and resolving all ambiguities. Additional sequence was generated from subclones or PCR products that spanned gaps. A range of directed sequencing strategies, such as the use of small-insert (McMurray et al. 1998) or transposon-tagged libraries (Devine et al. 1997), were developed to obtain sequence in particularly difficult regions. At the end of the finishing process, the sequence of each bacterial clone was at least 99.99% accurate and contained no gaps, achieving the standard agreed upon by the international consortium.

Through the course of the HGP, considerable improvements in sequencing chemistry and hardware were implemented to improve the overall quality of the product. Examples of chemistry improvements include the replacement of fluorescent dye primer sequencing by robust, fluorescent dye terminator sequencing, and the development of sequencing strategies for tracts of DNA that can adopt secondary structures inhibitory to polymerase progression. Early sequencing hardware improvements included increasing the number of lanes on each polyacrylamide slab gel, and extending read lengths. Substantial further improvement was provided by the later introduction of capillary sequencers, which had the advantages of very accurate lane tracking, higher throughput (8 runs/day compared with 3 runs/day on slab gel sequencers), longer read lengths, and no requirement for manual preparation of polyacrylamide gels. Most of the large sequencing laboratories also invested considerable effort in automating many of the steps in the sequencing process, including plaque/colony picking, DNA extraction, and sequencing.
As a result of these improvements, the total capacity of sequencing centers engaged in the project had risen to over 100 million reads per year by August 2000.

In 1999 and 2000, the increased efficiency in shotgun sequence generation was used to accelerate the production of sequence for the whole genome by generating draft sequence (assemblies of at least fourfold sequence coverage in high-quality sequence) for every clone in the tilepath (see Fig. 2). By October 2000, a “working draft” of the genome was assembled comprising sequence from 29,298 large bacterial clones (Lander et al. 2001). Of the clones, 8,277, representing approximately 30% of the genome, were already finished (see above). This included the sequences of Chromosomes 22 and 21, published in 1999 and 2000, respectively (Dunham et al. 1999; Hattori et al. 2000).

With the completion of the working draft, the IHGSC continued the process of generating a finished genome sequence. At this stage, completion of the clone tilepaths for each chromosome was assigned to chromosome coordinators distributed among the sequencing centers (see Table 2). To finish the map, gaps were closed wherever possible by exhaustive screening of multiple libraries (BAC, PAC, cosmid, or YAC) to identify additional clones to be sequenced. In the late stages of this process, the sizes of the remaining gaps were estimated by fluorescent in situ hybridization of extended DNA fibers, interphase nuclei, or metaphase chromosomes. In addition, when the mouse and rat draft genome sequences became available, it was possible to align the human sequence in the vicinity of the gap to the syntenic position in the rodent sequence and obtain an estimate of gap size.
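
The rodent-based gap estimate described above can be illustrated with a minimal sketch. It is not the procedure used by the sequencing centers, whose details are not given here. It assumes that one aligned block on each side of the gap is available, that the two blocks are collinear in the rodent sequence, and that the local human-to-rodent length ratio observed in the flanking blocks also holds across the gap. The `Anchor` type and `estimate_gap_size` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """One aligned block: a human interval and its syntenic rodent interval.

    Hypothetical structure for illustration. Human coordinates are
    contig-relative, because absolute positions across a gap are unknown.
    """
    human_start: int
    human_end: int
    rodent_start: int
    rodent_end: int


def estimate_gap_size(left: Anchor, right: Anchor) -> float:
    """Estimate the human gap (bp) from the rodent distance between anchors."""
    rodent_gap = right.rodent_start - left.rodent_end
    if rodent_gap < 0:
        raise ValueError("anchors are not collinear in the rodent sequence")

    # Local scale factor: human bases per rodent base in the flanking blocks.
    human_len = (left.human_end - left.human_start) + (
        right.human_end - right.human_start
    )
    rodent_len = (left.rodent_end - left.rodent_start) + (
        right.rodent_end - right.rodent_start
    )
    if rodent_len <= 0:
        raise ValueError("flanking rodent intervals must have positive length")

    # Assumption: the flanking scale factor also applies inside the gap.
    return rodent_gap * (human_len / rodent_len)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    left = Anchor(human_start=0, human_end=10_000,
                  rodent_start=0, rodent_end=8_000)
    right = Anchor(human_start=0, human_end=10_000,
                   rodent_start=48_000, rodent_end=56_000)
    print(estimate_gap_size(left, right))
```

An estimate of this kind is only as good as the assumption of conserved spacing. Insertions, deletions, or rearrangements in either lineage would bias it, which is presumably why it complemented rather than replaced the hybridization-based measurements.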
To finish the sequence of each BAC or PAC clone, at least twofold additional shotgun sequence coverage of each clone was generated to bring the overall coverage to at least sixfold. The iterative steps of manual review and directed sequencing (described above) were then taken to complete the sequence to the standards agreed upon by the IHGSC.

Further work on chromosome closure continues to be pursued at the genome centers, taking advantage of new resources and new techniques for manipulating DNA to close gaps that have been refractory to the methods used previously. For example, using sequence reads generated from flow-sorted Chromosome 20 for SNP discovery, it has already been possible to add a further 120 kb of sequence and to close three of the gaps that were not represented in large-insert clone libraries (P. Deloukas, pers. comm.).
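
The coverage figures quoted in this section (at least fourfold for draft clones, raised by at least twofold to reach at least sixfold for finishing, and six- or eightfold for efficient finishing) can be related to read counts and expected gaps with the idealized Lander-Waterman model of random shotgun sequencing. The sketch below is an illustration only. The clone size of 150 kb and the read length of 500 bp are assumptions chosen for the example, not values taken from the text, and the function names are hypothetical.

```python
import math


def reads_needed(clone_bp: int, read_bp: int, coverage: float) -> int:
    """Number of random reads needed to reach a target fold coverage."""
    return math.ceil(coverage * clone_bp / read_bp)


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read, assuming uniform sampling."""
    return math.exp(-coverage)


def expected_contigs(n_reads: int, coverage: float) -> float:
    """Idealized Lander-Waterman contig count, ignoring any minimum overlap."""
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    CLONE_BP = 150_000  # assumed clone size, within the 40- to 200-kb range
    READ_BP = 500       # assumed usable read length, not stated in the text

    print("fold  reads  uncovered_fraction  expected_contigs")
    for fold in (4, 6, 8):
        n = reads_needed(CLONE_BP, READ_BP, fold)
        print(fold, n, fraction_uncovered(fold), expected_contigs(n, fold))
```

The model treats reads as uniformly random and ignores cloning bias, repeats, and the overlap that an assembler needs to join two reads, so observed contig counts need not match its predictions. It does show why each additional fold of coverage sharply reduces the amount of sequence left for directed finishing.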
