12.07.2015 Views

Initial sequencing and analysis of the human genome - Vitagenes



ultimate goal of a completely finished sequence. The results below are based on the map and sequence data available on 7 October 2000, except as otherwise noted. At the end of this section, we provide a brief update of key data.

Clone selection

The hierarchical shotgun method involves the sequencing of overlapping large-insert clones spanning the genome. For the Human Genome Project, clones were largely chosen from eight large-insert libraries containing BAC or P1-derived artificial chromosome (PAC) clones (Table 1; refs 82–88). The libraries were made by partial digestion of genomic DNA with restriction enzymes. Together, they represent around 65-fold coverage (redundant sampling) of the genome. Libraries based on other vectors, such as cosmids, were also used in early stages of the project.

The libraries (Table 1) were prepared from DNA obtained from anonymous human donors in accordance with US Federal Regulations for the Protection of Human Subjects in Research (45CFR46) and following full review by an Institutional Review Board.
Briefly, the opportunity to donate DNA for this purpose was broadly advertised near the two laboratories engaged in library

Box 1
Genome glossary

Sequence

Raw sequence: Individual unassembled sequence reads, produced by sequencing of clones containing DNA inserts.

Paired-end sequence: Raw sequence obtained from both ends of a cloned insert in any vector, such as a plasmid or bacterial artificial chromosome.

Finished sequence: Complete sequence of a clone or genome, with an accuracy of at least 99.99% and no gaps.

Coverage (or depth): The average number of times a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a 'high-quality base' is defined as one with an accuracy of at least 99% (corresponding to a PHRED score of at least 20).

Full shotgun coverage: The coverage in random raw sequence needed from a large-insert clone to ensure that it is ready for finishing; this varies among centres but is typically 8–10-fold. Clones with full shotgun coverage can usually be assembled with only a handful of gaps per 100 kb.

Half shotgun coverage: Half the amount of full shotgun coverage (typically, 4–5-fold random coverage).

Clones

BAC clone: Bacterial artificial chromosome vector carrying a genomic DNA insert, typically 100–200 kb. Most of the large-insert clones sequenced in the project were BAC clones.

Finished clone: A large-insert clone that is entirely represented by finished sequence.

Full shotgun clone: A large-insert clone for which full shotgun sequence has been produced.

Draft clone: A large-insert clone for which roughly half-shotgun sequence has been produced.
Operationally, the collection of draft clones produced by each centre was required to have an average coverage of fourfold for the entire set and a minimum coverage of threefold for each clone.

Predraft clone: A large-insert clone for which some shotgun sequence is available, but which does not meet the standards for inclusion in the collection of draft clones.

Contigs and scaffolds

Contig: The result of joining an overlapping collection of sequences or clones.

Scaffold: The result of connecting contigs by linking information from paired-end reads from plasmids, paired-end reads from BACs, known messenger RNAs or other sources. The contigs in a scaffold are ordered and oriented with respect to one another.

Fingerprint clone contigs: Contigs produced by joining clones inferred to overlap on the basis of their restriction digest fingerprints.

Sequenced-clone layout: Assignment of sequenced clones to the physical map of fingerprint clone contigs.

Initial sequence contigs: Contigs produced by merging overlapping sequence reads obtained from a single clone, in a process called sequence assembly.

Merged sequence contigs: Contigs produced by taking the initial sequence contigs contained in overlapping clones and merging those found to overlap.
These are also referred to simply as 'sequence contigs' where no confusion will result.

Sequence-contig scaffolds: Scaffolds produced by connecting sequence contigs on the basis of linking information.

Sequenced-clone contigs: Contigs produced by merging overlapping sequenced clones.

Sequenced-clone-contig scaffolds: Scaffolds produced by joining sequenced-clone contigs on the basis of linking information.

Draft genome sequence: The sequence produced by combining the information from the individual sequenced clones (by creating merged sequence contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes.

N50 length: A measure of the contig length (or scaffold length) containing a 'typical' nucleotide. Specifically, it is the maximum length L such that 50% of all nucleotides lie in contigs (or scaffolds) of size at least L.

Computer programs and databases

PHRED: A widely used computer program that analyses raw sequence to produce a 'base call' with an associated 'quality score' for each position in the sequence. A PHRED quality score of X corresponds to an error probability of approximately $10^{-X/10}$.
Thus, a PHRED quality score of 30 corresponds to 99.9% accuracy for the base call in the raw read.

PHRAP: A widely used computer program that assembles raw sequence into sequence contigs and assigns to each position in the sequence an associated 'quality score', on the basis of the PHRED scores of the raw sequence reads. A PHRAP quality score of X corresponds to an error probability of approximately $10^{-X/10}$. Thus, a PHRAP quality score of 30 corresponds to 99.9% accuracy for a base in the assembled sequence.

GigAssembler: A computer program developed during this project for merging the information from individual sequenced clones into a draft genome sequence.

Public sequence databases: The three coordinated international sequence databases: GenBank, the EMBL data library and DDBJ.

Map features

STS: Sequence tagged site, corresponding to a short (typically less than 500 bp) unique genomic locus for which a polymerase chain reaction assay has been developed.

EST: Expressed sequence tag, obtained by performing a single raw sequence read from a random complementary DNA clone.

SSR: Simple sequence repeat, a sequence consisting largely of a tandem repeat of a specific k-mer (such as $(\mathrm{CA})_{15}$).
Many SSRs are polymorphic and have been widely used in genetic mapping.

SNP: Single nucleotide polymorphism, or a single nucleotide position in the genome sequence for which two or more alternative alleles are present at appreciable frequency (traditionally, at least 1%) in the human population.

Genetic map: A genome map in which polymorphic loci are positioned relative to one another on the basis of the frequency with which they recombine during meiosis. The unit of distance is centimorgans (cM), denoting a 1% chance of recombination.

Radiation hybrid (RH) map: A genome map in which STSs are positioned relative to one another on the basis of the frequency with which they are separated by radiation-induced breaks. The frequency is assayed by analysing a panel of human–hamster hybrid cell lines, each produced by lethally irradiating human cells and fusing them with recipient hamster cells such that each carries a collection of human chromosomal fragments. The unit of distance is centirays (cR), denoting a 1% chance of a break occurring between two loci.

NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com © 2001 Macmillan Magazines Ltd 865
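
Two of the Box 1 definitions are quantitative and can be illustrated with a minimal Python sketch: the relation between a PHRED or PHRAP quality score and its approximate error probability, and the N50 length of a set of contigs. The function names (`phred_to_error_prob`, `error_prob_to_phred`, `n50`) are hypothetical and chosen only for illustration; they are not part of PHRED, PHRAP or GigAssembler, and the sketch assumes contig lengths are given as positive integers.

```python
import math


def phred_to_error_prob(score):
    """Approximate error probability for a quality score X: 10^(-X/10)."""
    return 10 ** (-score / 10)


def error_prob_to_phred(error_prob):
    """Inverse relation: quality score for a given error probability."""
    return -10 * math.log10(error_prob)


def n50(lengths):
    """Largest length L such that at least 50% of all nucleotides
    lie in contigs (or scaffolds) of size at least L."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating nucleotides,
    # and stop at the first contig that brings the total to half.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty collection")


if __name__ == "__main__":
    for q in (20, 30):
        p = phred_to_error_prob(q)
        print(f"score {q}: error probability {p:g}, accuracy {1 - p:.2%}")
    print("N50:", n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

With the example lengths, `n50` returns 8: the two contigs of length 8 together hold 16 of the 32 nucleotides, which is exactly half. A score of 20 gives an error probability of about 0.01, matching the Box 1 threshold of 99% accuracy for a 'high-quality base'.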
