13.07.2015 Views

The Genom of Homo sapiens.pdf


Evolution of Eukaryotic Gene Repertoire and Gene Structure: Discovering the Unexpected Dynamics of Genome Evolution

I.B. ROGOZIN,* V.N. BABENKO,* N.D. FEDOROVA,* J.D. JACKSON,* A.R. JACOBS,* D.M. KRYLOV,* K.S. MAKAROVA,* R. MAZUMDER,* † S.L. MEKHEDOV,* B.G. MIRKIN,† A.N. NIKOLSKAYA,* B.S. RAO,* S. SMIRNOV,* A.V. SOROKIN,* A.V. SVERDLOV,* S. VASUDEVAN,* Y.I. WOLF,* J.J. YIN,* D.A. NATALE,* AND E.V. KOONIN*

*National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland and †School of Information Systems and Computer Science, Birkbeck College, University of London, London, WC1E 7HX, United Kingdom

COMPARATIVE GENOMICS, EVOLUTIONARY CLASSIFICATION OF GENES, AND PHYLETIC PATTERNS

Comparative genomics has already changed our understanding of genome evolution. In what might amount to a paradigm shift in evolutionary biology, genome comparisons have shown that lineage-specific gene loss and horizontal gene transfer (HGT) are not freak incidents of evolution but extremely common phenomena that, to a large degree, have shaped the extant genomes, at least those of prokaryotes (Doolittle 1999; Gogarten et al. 2002; Snel et al. 2002; Mirkin et al. 2003). The extent of gene loss occurring in certain lineages of prokaryotes, particularly parasites, is astonishing: In some cases, >80% genes in the genome have been lost over ~200 million years of evolution (Moran 2002). Horizontal gene transfer is harder to document, but a strong case has been made for its extensive contribution to the evolution of prokaryotes (Ochman et al. 2000; Koonin et al. 2001; Mirkin et al. 2003).

Gene exchange between phylogenetically distant eukaryotes does not appear to be an important evolutionary phenomenon.
In contrast, the contribution of gene loss to the evolution of eukaryotic genomes was probably substantial, although the level of genome fluidity observed in prokaryotes is unlikely to have been attained in eukaryotic evolution. A comparison of the genomes of two yeasts, Saccharomyces cerevisiae and Schizosaccharomyces pombe, showed that, in the S. cerevisiae lineage, up to 10% of genes have been lost since the divergence of the two species (Aravind et al. 2000). It appears likely that, in eukaryotic parasites with small genomes, e.g., the microsporidia, much more massive gene elimination has occurred (Katinka et al. 2001). In contrast, the extent of gene loss in complex, multicellular eukaryotes remains unclear, although the small number of unique genes in the human genome when compared to the mouse genome (and vice versa) suggests considerable stability of the gene repertoire (Waterston et al. 2002).

Present address: Protein Identification Resource, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007.

We are interested in quantitative analysis of the dynamics of genome evolution. A prerequisite for such studies is a classification of the genes from the sequenced genomes based on homologous relationships. The two principal categories of homologs are orthologs and paralogs (Fitch 1970; Sonnhammer and Koonin 2002). Orthologs are homologous genes that evolved via vertical descent from a single ancestral gene in the last common ancestor of the compared species. Paralogs are homologous genes, which, at some stage of evolution, have evolved by duplication of an ancestral gene.
Orthology and paralogy are two sides of the same coin because, when a duplication (or a series of duplications) occurs after the speciation event that separated the compared species, orthology becomes a relationship between sets of paralogs, rather than individual genes (genes that belong to orthologous sets are sometimes called co-orthologs) (Sonnhammer and Koonin 2002).

Robust identification of orthologs and paralogs is critical for the construction of evolutionary scenarios, which include, along with vertical inheritance, lineage-specific gene loss and, possibly, HGT (Snel et al. 2002; Mirkin et al. 2003). The algorithms for the construction of these scenarios involve, in one form or another, tracing the fates of individual genes, which is feasible only when orthologs (including co-orthologs) are known. In principle, orthologs, including co-orthologs, should be identified by phylogenetic analysis of entire families of homologous proteins, which is expected to define orthologous protein sets as clades (e.g., Sicheritz-Ponten and Andersson 2001). However, for genome-wide protein sets, such analysis remains labor-intensive and error-prone. Thus, procedures have been developed for identification of sets of probable orthologs without explicit use of phylogenetic methods. Generally, these approaches are based on the notion of a genome-specific best hit (BeT), i.e., the protein from a target genome, which has the greatest sequence similarity to a given protein from the query genome (Tatusov et al. 1997; Huynen and Bork 1998). The central assumption here is that orthologs have a greater similarity to each other than to any other protein from the respective genomes due to the conservation of functional constraints.
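
The notion of a genome-specific best hit can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the protein names, the similarity scores (which stand in for something like alignment scores), and the helper names `best_hits` and `symmetric_best_hits` are hypothetical and do not come from the text. The symmetric check is a common simplification of best-hit approaches and is not presented here as the authors' procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical similarity scores keyed by (query_protein, target_protein).
# Names and numbers are invented for illustration only.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """For each query protein, return the target protein with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        # Keep the first target seen on ties (strict comparison).
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def symmetric_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a.

    Assumption: symmetric best hits are treated as probable orthologs.
    """
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Genome A proteins searched against genome B, and the reverse direction.
a_vs_b: Scores = {
    ("a1", "b1"): 310.0, ("a1", "b2"): 120.0,
    ("a2", "b1"): 240.0, ("a2", "b2"): 95.0,
    ("a3", "b3"): 50.0,
}
b_vs_a: Scores = {
    ("b1", "a1"): 305.0, ("b1", "a2"): 238.0,
    ("b2", "a2"): 97.0, ("b2", "a1"): 90.0,
    ("b3", "a3"): 52.0,
}

print(best_hits(a_vs_b))
print(symmetric_best_hits(a_vs_b, b_vs_a))
```

In this toy data, `a2` has `b1` as its best hit, but `b1` points back to `a1`, so the pair is not symmetric and is left out. This is the kind of situation that can arise when paralogs are present in one of the genomes.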
When multiple genomes are analyzed, pairs of probable orthologs detected on the basis of

Cold Spring Harbor Symposia on Quantitative Biology, Volume LXVIII. © 2003 Cold Spring Harbor Laboratory Press 0-87969-709-1/04. 293
