Views
4 years ago

Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of ...

Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of ...

letters to nature high

letters to nature high degree of similarity of the Arabidopsis homologues to the human RNA helicase 1 gene (4365c) and to the yeast SUV3 ATPdependent RNA helicase (3435c) may indicate significant functional conservation of RNA processing mechanisms among euakaryotes. Two homologues of kinesins from Caenorhabditis elegans and Xenopus, 3115c and 3205c, show sequence conservation over the three functional domains, indicating that mechanisms of intracellular motility may be conserved between C. elegans, vertebrates and plants. Mutation of a putative Ca 2+ -binding kinesin called ZWI- CHEL caused morphological defects in trichomes 21 , demonstrating that this class of motor molecule contributes to cell shape in plants. Twelve per cent of the predicted genes found in the FCA region encode potential components of signal transduction pathways. For example, a homologue of a C. elegans calcium-sensor protein related to Drosophila frequenin, a calcium-binding protein that modulates neuronal efficiency 22 , is encoded by 4205c. Another Arabidopsis homologue of the yeast SNF-1 kinase, which regulates cellular sugar metabolism, is encoded by 3330c. 3280c encodes a protein kinase closely related to a developmentally regulated yeast kinase, SPS1, required for spore wall formation 23 . Four highly significant findings have been made in this pilot-scale sequencing project. First, there is a consistently high gene density over an extended contiguous region, with 389 predicted genes in 1.87 Mb. The relatively high gene density encountered in this region is also found in other sequenced regions. On the basis of this work and the 13 Mb of available sequence from four chromosomes, and the size of YAC contigs covering most of the low-copy regions of the five chromosomes, the total number of protein-coding genes is probably about 21,000. Second, the genome sequence has a high information content: 54% of the predicted genes can be assigned cellular roles on the basis of enzymatic, structural or other functions inferred by sequence similarity to proteins of known function. Nevertheless, the specific functions of most of these genes in plants requires further analysis. The remaining 46% of predicted genes, which either have no significant similarities to other genes or are similar to genes of unknown function, require extensive systematic experimentation to determine their cellular roles. Third, nearly 20% of the predicted genes are members of gene families that may have arisen by gene duplication and divergence. In other sequenced eukaryotes, such as C. elegans and yeast, the proportion is not as high. If the number of gene families in Arabidopsis is found to be 15,000 after more comprehensive sequencing, Arabidopsis will have a similar-sized genome complement to the model metazoans, Drosophila 24 and C. elegans 25 . This may represent a minimal number of genes required for the function of complex multicellular organisms with highly diverged mechanisms of development and environmental interactions 26 . Finally, it is now clear that a straightforward shotgun-sequencing strategy can generate contiguous sequence from nearly all of the low-copy regions of the Arabidopsis genome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Methods Contig assembly. The physical map of Arabidopsis chromosome 4 is represented by 4 YAC contigs covering 17 Mb (ref. 2). Sequencing was initiated at the FCA locus on the 13.5-Mb lower arm. The YAC map was used to assemble two cosmid contigs containing 1.2 Mb, using subcloning of YACs into cosmids in order to complete regions not represented in genomic cosmid libraries 5 . BAC clones were identified by hybridization to YAC clones and were assembled into the contigs to complete coverage 27 . Sequencing strategy. Sequence was determined from an overlapping continguous set of 6 BAC clones, 15 Lorist cosmids, 16 CC cosmids, 5 CAt cosmids, and 45 YAC subclones in the binary cosmid vector pCLD 04541. All libraries were prepared using DNA isolated from the Columbia ecotype. Clones were distributed in a network of 17 EU labs for sequencing, and the resulting sequence was assembled at the Martinsrieder Institut für Protein Sequenzen (MIPS). The quality controls used during sequence assembly involved comparison of 220,134 bp of overlap sequences, resequencing of selected regions (90,566 bp), and resequencing of suspected low accuracy regions (8,684 bp), such as those harbouring potential frameshifts revealed by gene modelling. The total sequence produced was 2,094,637 bp, and the total nonredundant sequence was 1,874,503 bp. Shotgun sequencing of cosmid and BAC clones was the most common sequencing strategy. Sequence analysis. An initial BLASTX analysis 28 was used to compare all reading frames with all protein sequences and separately with the translations of Arabidopsis and other plant ESTs. A search for tRNAs and repeats was also made. Genefinder, Genmark and XGRAIL were modified using published Arabidopsis sequence and used for the identification of Arabidopsis genes. NetPlantGene was used for recognition of splice sites 29 . Where possible, the predictions were checked for consistency with known protein sequences or cognate cDNA sequences, and the gene models adjusted accordingly. A total of 21,031 bp of cognate cDNA sequences were produced to aid modelling. The resulting protein sequences were extracted using FINDORFS for further analysis. 8 Received 8 August; accepted 31 October 1997. 1. Meyerowitz, E. M. & Somerville, C. R. (eds) Arabidopsis (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1994). 2. Schmidt, R. et al. Physical map and orgnaization of Arabidopsis chromosome 4. Science 270, 480–483 (1995). 3. Zachgo, E. A. et al. A physical map of chromosome 2 of Arabidopsis thaliana. Genome Res. 6, 19–25 (1996). 4. Schmidt, R., Love, K., West, J., Lenehan, Z. & Dean, C. Description of 31 YAC contigs spanning the majority of Arabidopsis thaliana chromosome 5. Plant J. 11, 563–572 (1997). 5. Bancroft, I. et al. A strategy involving the use of high-redundancy YAC subclone libraries facilitates the contiguous representation in cosmid and BAC clones of 1.7 Mb of the genome of Arabidopsis thaliana. Weeds World 4, 1–9 (1997). 6. Sato, S. et al. Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones. DNA Res. 4, 215–230 (1997). 7. Pearson, W. R. & Lipman, D. J. Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA 85, 2444–2448 (1988). 8. Riley, M. Functions of gene products in E. coli. Microbiol. Rev. 57, 862–952 (1993). 9. Mewes, H.-W. et al. Overview of the yeast genome. Nature 387 (suppl.) 7–84 (1997). 10. White, S. E., Habera, L. F. & Wessler, S. R. Retrotransposons in the flanking regions of normal plant genes: A role for copia-like elements in the evolution fo gene structure and expression. Proc. Natl Acad. Sci. USA 91, 11792–11796 (1994). 11. Konieczny, A., Voytas, D. F., Cummings, M. P. & Ausubel, F. M. A superfamily of Arabidopsis thaliana retrotransposons. Genetics 127, 801–809 (1991). 12. SanMiguel, P. et al. Nested retrotransposons in the intergenic regions of the maize genome. Science 274, 765–768 (1996). 13. Wessler, S. R., Bureau, T. E. & White, S. E. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5, 814–821 (1995). 14. Parker, J. E. et al. The Arabidopsis downy mildew resistance gene RPP5 shares similarity to the Toll and interleukin-1 receptors with N and L6. The Plant Cell 9, 879–894 (1997). 15. Pear, J. R., Kawagoe, Y., Screckengost, W. E., Delmer, D. P. & Stalker, D. M. Higher plants contain homologues of the bacterial celA genes encoding the catalytic subunit of cellulose synthase. Proc. Natl Acad. Sci. USA 93, 12637–12642 (1996). 16. Back, K. & Chappell, J. Cloning and bacterial expression of a sesquiterpene cyclase from Hyoscamus muticus and its molecular comparison to related terpene cyclases. J. Biol. Chem. 270, 7375–7381 (1995). 17. Gavin, K. A., Hidaka, M. & Stillman, B. Conserved initiator proteins in eukaryotes. Science 270, 1667– 1671 (1995). 18. Fishel, R. et al. The human mutator gene homologue MHS2 and its association with hereditary nonpolyposis colon cancer. Cell 75, 1027–1038 (1993). 19. Marcus, G. A., Silverman, N., Berger, S. L., Horiuchi, J. & Guarente, L. Functional similarity and physical association between GCN5 and ADA2: putative transcriptional adaptors. EMBO J. 13, 4807– 4815 (1994). 20. Neuwald, A. F. & Landsman, D. GCN5-related histone N-acetyltransferases belong to a diverse superfamily that includes the yeast SPT10 protein. Trends Biochem. Sci. 22, 154–155 (1997). 21. Oppenheimer, D. G. et al. Essential role for a kinesin-like protein in Arabidopsis trichome morphogenesis. Proc. Natl Acad. Sci. USA 94, 6261–6266 (1997). 22. Pongs, O. et al. Frequenin–a novel calcium-binding protein that modulates synaptic efficacy in the Drosophila nervous system. Neuron 11, 15–28 (1993). 23. Friesen, H., Lunz, R., Doyle, S. & Segall, J. Mutation in the SPS1-encoded protein kinase of Saccharomyces cerevisiae leads to defects in transcription and morphology during spore formation. Genes Dev. 8, 2162–2175 (1994). 24. Gabor Miklos, G. & Rubin, G. M. The role of the genome project in determining gene function: Insights from model organisms. Cell 86, 521–529 (1996). 25. Wilson, R. et al. 2.2Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368, 32–38 (1994). 26. Meyerowitz, E. M. Plants and the logic of development. Genetics 145, 5–9 (1997. 27. Bent, E., Johnson, S., Bancroft, I. BAC representation of two low-copy regions of the genome of Arabidopsis thaliana. The Plant J. (in the press). 28. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990). 29. Hebsgaard, S. M. et al. Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res. 24, 3439–3452 (1996). Acknowledgements. This work was initiated and sponsored by the European Commission, DG-XII Life Sciences. Additional support from the BBSRC Plant and Animal Genome Analysis Programme, GREG (Groupe de Recherche et d’Etude des Genomes), BioResearch Ireland, and Plan Nacional de Investigacion Cientifica y Technica is gratefully acknowledged. Correspondence and requests for materials should be addressed to M.B. (e-mail: bevan@bbsrc.ac.uk). Nature © Macmillan Publishers Ltd 1998 488 NATURE | VOL 391 | 29 JANUARY 1998

3000w 3010w 3035w 3040w 3070w 3085w 3100w 3110w 3120w 3130w 150000 3005c 3015c 3020c 3025c 3030c 3045c 3050c 3055c 3060c 3075c 3080c 3090c 3095c 3105c 3115c 3125c 3135c 3145c 3065c 3140c 3245w 3150w 3175w 3180w 3190w 3205w 3230w 3235w 3240w 3265w 3270w 3275w 3290w 1 150001 300000 3145c 3155c 3160c 3165c 3170c 3185c 3195c 3200c 3210c 3215c 3220c 3225c 3250c 3255c3260c 3280c 3285c 3295c 3310w 3300w 3320w 3325w 3335w 3340w 3355w 3360w 3375w 3385w 3390w 3405w 3415w 3420w 300001 450000 3305c 3315c 3330c 3345c 3350c 3365c 3370c 3380c 3395c 3400c 3410c 3425c 3470w 3550w 3430w 3440w 3445w 3455w 3465w 3475w 3485w 3505w 3510w 3515w 3525w 3545w 3555w 3575w 3590w 3595w 3605w 3610w 450001 600000 3435c 3450c 3460c 3480c 3490c 3495c 3500c 3520c 3530c 3540c 3560c 3570c 3580c 3585c 3600c 3615c 3535c 3565c 3620c 3630w 3655w 3625w 3635w 3660w 3675w 3685w 3700w 3720w 3725w 3735w 600001 750000 3645c 3650c 3665c 3670c 3680c 3690c 3695c 3705c 3710c 3715c 3730c 3620c 3640c 3750w 3755w 3760w 3765w 3775w 3795w 3800w 3810w 3820w 3825w 3845w 3850w 3855w 3865w 3870w 3875w 3880w 3885w 3890w 3895w 750001 900000 3740c 3745c 3770c 3780c 3785c 3790c 3805c 3815c 3830c 3840c 3860c 3835c 4050w 4065w 3900w 3925w 3935w 3950w 3960w 3965w 3980w 3985w 3990w 3995w 4005w 4010w 4025w 4045w 4055w 900001 1050000 3905c 3910c 3915c 3920c 3930c 3940c 3955c 3970c 3975c 4000c 4020c 4030c 4035c 4040c 4060c 3945c 4015c 4065w 4190w 4070w 4085w 4090w 4095w 4105w 4110w 4115w 4120w 4135w 4140w 4150w 4155w 4175w 4185w 1050001 1200000 4075c 4080c 4100c 4125c 4130c 4145c 4160c 4165c 4170c 4180c 4195c 4200c 4210w 4225w 4230w 4240w 4250w 4255w 4265w 4280w 4290w 4310w 4320w 4325w 4350w 4355w 4360w 1200001 1350000 4205c 4215c 4220c 4235c 4245c 4260c 4270c 4275c 4285c 4295c 4300c 4305c 4315c 4330c 4335c 4345c 4365c 4233c 4340c 4440w 4380w 4390w 4395w 4415w 4435w 4445w 4450w 1350001 1500000 4370c 4375c 4385c 4400c 4405c 4410c 4420c 4425c 4430c 4460c 4470c 4475c 4480c 4490c 4495c 4500c 4505c 4465c 4485c 4525w 4535w 4555w 4570w 4580w 4610w 4625w 4635w 4660w 4665w 4670w 4680w 4685w 1500001 1650000 4505c 4510c 4515c 4520c 4530c 4545c 4550c 4560c 4565c 4575c 4585c 4590c 4595c 4605c 4615c 4620c 4630c 4640c 4645c 4650c 4655c 4675c 4690c 4700c 4540c 4600c 4695c 4855w 4705w 4710w 4720w 4725w 4735w 4740w 4755w 4765w 4785w 4795w 4805w 4815w 4835w 4845w 4860w 4865w 1650001 1800000 4700c 4715c 4730c 4745c 4750c 4760c 4770c 4775c 4780c 4790c 4800c 4810c 4820c 4825c 4830c 4840c 4850c 4870c 4880w 4885w 4900w 4915w 4920w 4930w 4940w 4945w 1800001 1950000 8 4875c 4890c 4895c 4905c 4910c 4925c 4935c Nature © Macmillan Publishers Ltd 1998

Analysis of 106 kb of contiguous DNA sequence from the D genome ...
articles The DNA sequence of human chromosome 21
The sequence and analysis of duplication-rich human chromosome 16
The DNA sequence and analysis of human chromosome 14
Sequence analysis of the Bacillus subtilis chromosome region ...
Analysis of the genome sequence of the ¯owering plant Arabidopsis ...
Sequencing and analysis of a 63 kb bacterial artificial chromosome ...
The DNA sequence of human chromosome 22 - The University of ...
The DNA sequence of human chromosome 7 - Eddy/Rivas Lab
Analysis of the genome sequence of the ¯owering plant Arabidopsis ...
Comparative Analysis of Expressed Sequence Tags in Resistant ...
Initial sequencing and analysis of the human genome
Initial sequencing and comparative analysis of ... - Medgen.unige.ch
Initial sequencing and analysis of the human genome - Vitagenes
Analysis of expressed sequence tags from biodiesel plant Jatropha ...
Genome sequence and comparative analysis of the model ... - CCB
Initial sequencing and comparative analysis of the mouse genome
Cloning and Sequence Analysis of the rpsL and - Journal of ...
Initial sequencing and analysis of the human genome - Vitagenes
from genome sequence to functional analysis and medical ...
Full genome sequence analysis of parechoviruses from Brazil ...
Sequence Analysis of a Cryptic Plasmid pKW2124 from Weissella ...
Analysis of a normalised expressed sequence tag (EST) library from ...
Comparative Analysis of Expressed Genes from ... - Annals of Botany
Computational genomic study of LTP pathway in the context of Raf/Ksr homologue in human and chimpanzee