The Genom of Homo sapiens.pdf

198 HAYASHIZAKI

passes per day. Therefore, we decided to develop it in-house. A detailed description is given elsewhere (Shibata et al. 2000), but the following points deserve brief mention.

After construction of the FL cDNA library, the clones spread onto agar plates are picked up by a Qbot system (Genetix Ltd., UK) to make a master plate. A replica plate is made from this master plate as a backup (Itoh et al. 1999; Shibata et al. 2000). The plated clones then undergo inoculation, cultivation, plasmid preparation, cycle sequencing reaction or transcriptional sequencing, and sequencing. These steps were set up as a pipeline through our RISA Inoculator, RISA Filtrator and Densitometer, RISA plasmid preparator, GS384 thermal cycler, and RISA 384 capillary sequencer.

A basic concept in the development of the RISA system was that errors in clone identification should be eliminated. A significant difference between the requirements of large-scale cDNA sequencing and those of large-scale genome sequencing is that, for cDNA cloning, the correspondence between the sequence data and the clones on the master plates has to be perfect. We always have to go back to the master plate to recover the FL cDNA clones; therefore, clone identification is essential for analysis after sequencing. With genome shotgun sequencing, by contrast, only the trace sequences are assembled after sequencing, and the shotgun clones themselves are not used again. Thus, in the RISA system, the relative location of the clones is not changed until the data are obtained. For this reason, samples processed by the RISA system are transferred in 384-format plates (or 96-format plates for the plasmid preparation step).
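Because the pipeline depends on never losing the link between a sequence trace and its master-plate well, a minimal Python sketch of position tracking may help. It assumes the common interleaved-quadrant layout for combining four 96-well plates into one 384-well plate; the text does not say which layout RISA actually used, and the names `WellAddress`, `well_96_to_384`, `well_384_to_96`, and the plate ID `MP0001` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

ROWS_384 = "ABCDEFGHIJKLMNOP"  # 16 rows x 24 columns
ROWS_96 = "ABCDEFGH"           # 8 rows x 12 columns


@dataclass(frozen=True)
class WellAddress:
    """A clone's fixed position: plate ID plus well (hypothetical naming)."""
    plate: str
    row: str
    col: int

    def __str__(self) -> str:
        return f"{self.plate}:{self.row}{self.col:02d}"


def well_96_to_384(quadrant: int, row: str, col: int) -> Tuple[str, int]:
    """Place a well from one of four 96-well plates (quadrant 0-3) onto a
    384-well plate with interleaved quadrants (assumed layout): quadrant
    0 -> A1, 1 -> A2, 2 -> B1, 3 -> B2, repeating every 2 rows/columns."""
    if not 0 <= quadrant <= 3:
        raise ValueError("quadrant must be 0-3")
    if len(row) != 1 or row not in ROWS_96 or not 1 <= col <= 12:
        raise ValueError("not a 96-well position")
    r = ROWS_96.index(row)
    return ROWS_384[2 * r + quadrant // 2], 2 * (col - 1) + quadrant % 2 + 1


def well_384_to_96(row: str, col: int) -> Tuple[int, str, int]:
    """Inverse mapping: recover (quadrant, row, col) on the 96-well plate."""
    if len(row) != 1 or row not in ROWS_384 or not 1 <= col <= 24:
        raise ValueError("not a 384-well position")
    r = ROWS_384.index(row)
    quadrant = 2 * (r % 2) + (col - 1) % 2
    return quadrant, ROWS_96[r // 2], (col - 1) // 2 + 1


# Usage: a clone from quadrant 2, well C5, keeps one unique address on the
# 384-well plate "MP0001" (a made-up plate ID).
row, col = well_96_to_384(2, "C", 5)
print(WellAddress("MP0001", row, col))  # MP0001:F09
print(well_384_to_96(row, col))         # (2, 'C', 5)
```

Because this mapping is a bijection, a trace read from any 384-well position leads back to exactly one 96-well source position and one master-plate address, which is the one-to-one correspondence the text requires for cDNA clones.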
The RISA sequencer is also designed to inject all samples directly from 384-format plates into 384-format capillary arrays.

Figure 4. The number of genes discovered in the course of this project. Discoveries of new singleton cDNAs are still increasing, whereas abundant and middle abundant cDNAs have already reached saturation.

COLLECTION OF FL cDNA CLONES (PHASE I)

We started by collecting samples from various mouse organs at various developmental stages (Carninci et al. 2003). In total, 267 tissues were collected, including primordial germ cells, fertilized eggs, and two-cell and four-cell zygotes. After constructing the FL cDNA libraries, we picked clones randomly from each library and subjected the first 5000 clones to end sequencing. If a library appeared to be saturated, we stopped sequencing clones from it; if not, we continued sequencing. In Figure 4, the horizontal axis represents the date from June 1996 to July 2002, and the vertical axis represents the number of clusters. As the figure shows, the abundant cDNAs (more than 10 appearances) and the middle abundant cDNAs (6–10 appearances) have already been collected, and the middle rare cDNAs (2–5 appearances) are also saturated. Only the new cDNAs collected as singletons (a single appearance) are still increasing.

In total, 1,916,592 clones were picked from normalized and subtracted libraries. Calculation of the enrichment efficiency of normalization and subtraction suggests that this is equivalent to 14,400,000 clones from a standard library. All of these clones were end-sequenced from the 3′ end and classified into 188,000 clusters.
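The abundance classes behind Figure 4 and the per-library saturation decision can be illustrated with a short Python sketch. The class thresholds come from the text; the `new_cluster_fraction` measure, the `looks_saturated` rule, and the 5% cutoff are hypothetical, since the text does not state how saturation was judged, and all function names are illustrative only.

```python
from collections import Counter
from typing import Iterable

# Hypothetical cutoff: the text does not say how saturation was judged.
SATURATION_CUTOFF = 0.05


def abundance_class(appearances: int) -> str:
    """Bin a cluster by how many end-sequenced clones fell into it,
    using the thresholds quoted for Figure 4."""
    if appearances < 1:
        raise ValueError("a cluster needs at least one clone")
    if appearances == 1:
        return "singleton"
    if appearances <= 5:
        return "middle rare"
    if appearances <= 10:
        return "middle abundant"
    return "abundant"


def class_counts(cluster_ids: Iterable[str]) -> Counter:
    """Number of clusters in each abundance class, given one cluster ID
    per sequenced clone."""
    per_cluster = Counter(cluster_ids)
    return Counter(abundance_class(n) for n in per_cluster.values())


def new_cluster_fraction(previous: Iterable[str], batch: Iterable[str]) -> float:
    """Fraction of clones in `batch` that open a cluster not seen before,
    either in `previous` or earlier in the same batch."""
    seen = set(previous)
    batch = list(batch)
    if not batch:
        return 0.0
    new = 0
    for cid in batch:
        if cid not in seen:
            new += 1
            seen.add(cid)
    return new / len(batch)


def looks_saturated(previous: Iterable[str], batch: Iterable[str]) -> bool:
    """Hypothetical stopping rule: stop sequencing a library once few
    new clusters appear in the latest batch."""
    return new_cluster_fraction(previous, batch) < SATURATION_CUTOFF


# Usage with toy cluster IDs (not project data).
earlier = ["c1"] * 12 + ["c2"] * 7 + ["c3"] * 3 + ["c4"]
latest = ["c1", "c5", "c5", "c6"]
print(class_counts(earlier))
print(new_cluster_fraction(earlier, latest))  # 0.5
print(looks_saturated(earlier, latest))       # False
```

Measuring the share of new clusters in the most recent batch mirrors the shape of Figure 4: once a library stops yielding new clusters, further sequencing from it mostly re-hits abundant and middle abundant cDNAs.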
The clustering data, together with the mapping of all 520,311 5′-end sequences onto the genome sequence, were used to select 60,770 clones for full-length sequencing.

EXPRESSION PROFILES AND PROTEIN–PROTEIN INTERACTION

Expression profiles were analyzed using Stanford-type cDNA microarrays (Miki et al. 2001; Bono et al. 2003). We used mRNA prepared from the whole body of a C57BL/6 mouse embryo at 17.5 days. At FANTOM1, we produced expression profiles for 20,000 genes (with redundancy) in 49 tissues; at FANTOM2, we additionally produced profiles for 60,000 genes (with redundancy) in 21 tissues. To analyze protein–protein interactions, we developed a high-throughput mammalian two-hybrid system. The system has a capacity of 20,000 wells/day and operates as a single assay unit consisting of one robot and one fluorescence reader. As a pilot study, we have produced a 6000 × 6000 protein–protein interaction panel (Suzuki et al. 2001, 2003). These two systems are very good platforms for subsequent analysis and annotation.

FANTOM AND INTERNATIONAL FANTOM CONSORTIUM

To annotate the full-length sequencing data, we organized an international meeting named FANTOM (Functional ANnoTation of Mouse cDNA). More than a hundred leading scientists from various fields of the life sciences were invited to join the international FANTOM consortium to annotate the function of each gene. We held two meetings: FANTOM1 (August 29 to September 5, 2000) (Kawai et al. 2001) and FANTOM2 (the FANTOM2 Typhoon meeting, October 2001, and the FANTOM2 Cherry Blossom meeting, April 2002) (Okazaki et al. 2002). For FANTOM2, we developed a teleconference system (named MATRICS) that allowed annotation to be carried out online, so that every member could annotate gene function in detail.
FANTOM members revised 25% of the computer-generated annotation database. The conclusion was that the 60,770 FANTOM2 sequences still contained redundancy and that the database con-
