13.07.2015 Views

Rice Genetics IV - IRRI books - International Rice Research Institute



nal assembly may point to areas that were incorrectly sorted by the finisher. One must always remember not to rely entirely on assembly algorithms, however, as they are not always adept at sorting high-identity repeat sequences. Combined PHRAP quality values can be generated for each base in the assembly based on the quality, strand, and chemistry of the underlying primary data. PHRAP scores of 40 or better indicate an extremely accurate sequence. Overlapping data from clones that share a common portion of sequence can be independently finished, and the resulting data can be compared for differences and discrepancies. Analysis of the assembly using alternate navigation software (such as Gap4 from the Staden package [Staden et al 1998]) and independent resequencing of questionable areas within the assembly can also be informative.

The sequencing pipeline, finishing time, and data release

In most if not all large-scale sequencing operations, an assembly line is constructed. In these labs, clones are processed one group after another from beginning until end. The first step is the making of subclone libraries. This typically takes 1 to 2 weeks per clone. Following subclone library construction is a phase of production sequencing. As an example, in our lab, the production phase typically takes 1 to 2 weeks as well. The next phase is the finishing phase, which involves assembly, resolution of problems, filling of gaps, and verification of the assembled sequence. On an uncomplicated clone, this might take about 3–5 weeks. Some complicated clones might take months or longer to complete. Some of the clones we sequenced from the centromere of Arabidopsis chromosome IV took more than 1 year to finish (Mayer et al 1999). Following finishing and validation of the clones, the sequence is analyzed.

Several factors concerning this process need to be emphasized. First, this is an extremely parallel process, that is, many clones, 10 or more, may be processed at about the same time. The clones are not done sequentially. Second, most of the data are available after the production-sequencing phase. The gaps and ambiguities remaining prior to finishing often account for only a small percentage of the total sequence. Although the data are definitely more difficult to use prior to finishing, they are mostly complete, and hence very valuable. Last, as a result of the process at most large sequencing centers, a very large amount of partially finished sequence is available at any given time. In fact, it would not be surprising if the amount of unfinished sequence exceeds the amount of finished sequence available for much of the life of a project, even if the sequence is being finished as rapidly as possible. This is the result of the parallel nature of the process, the bottleneck of finishing, and the typical gradual increase in sequencing capacity. This phenomenon has been known since large-scale sequencing began in the early 1990s.

Aware of this phenomenon, virtually all groups doing large-scale genome sequencing since 1995 have agreed to a policy of immediate release of assembled contiguous fragments of a sequence of more than 2 kilobases. This is the currently accepted international standard for data release referred to as the Bermuda Standards (Bentley 1996, Guyer 1998, HUGO Web site at http://www.gene.ucl.ac.uk/hugo/publicationsreports.html; Wellcome Trust Web site at http://www.wellcome.ac.uk/en/1/
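
The backlog effect described above, where unfinished sequence accumulates while finishing lags behind production, can be illustrated with a small bookkeeping model. This is only a sketch under stated assumptions, not a description of any real sequencing center: the phase lengths are fixed at values inside the ranges quoted in the text (2 weeks for libraries, 2 for production, 4 for finishing), and the number of clones started per week (`start_rate`) and its weekly increase (`growth`) are hypothetical parameters chosen for illustration.

```python
def simulate(weeks, library=2, production=2, finishing=4,
             start_rate=10, growth=0):
    """Return a list of (unfinished, finished) clone counts, one per week.

    Assumed model (illustrative only): a clone started in week s has
    production data available from week s + library + production, and
    is finished from week s + library + production + finishing.
    """
    # Clones entering the pipeline each week; growth models a gradual
    # increase in sequencing capacity (hypothetical linear ramp).
    starts = [start_rate + growth * w for w in range(weeks)]
    draft_lag = library + production
    done_lag = draft_lag + finishing

    history = []
    for now in range(weeks):
        # Clones whose production phase has ended by this week.
        drafted = sum(starts[: max(0, now - draft_lag + 1)])
        # Clones whose finishing phase has also ended by this week.
        finished = sum(starts[: max(0, now - done_lag + 1)])
        history.append((drafted - finished, finished))
    return history


def crossover_week(history):
    """First week in which finished clones outnumber unfinished ones."""
    for week, (unfinished, finished) in enumerate(history):
        if finished > unfinished:
            return week
    return None


if __name__ == "__main__":
    for label, growth in (("constant capacity", 0), ("growing capacity", 2)):
        week = crossover_week(simulate(30, growth=growth))
        print(f"{label}: finished exceeds unfinished from week {week}")
```

With these assumed values, finished clones first outnumber unfinished ones in week 12 (counting from week 0) at constant capacity, and in week 14 when capacity grows by 2 clones per week. A longer finishing phase or faster capacity growth pushes the crossover later, which is the direction of the effect the text describes; the specific weeks depend entirely on the assumed parameters.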
