2007 BOOK REVIEWS 1025

Phylogeny Reconstruction (1996), Ziheng Yang does not attempt to cover the field of phylogenetics completely. Instead, he adopts the point of view that molecular evolutionary analyses should be formulated as problems of statistical inference. This is a long-standing viewpoint (Cavalli-Sforza and Edwards, 1967), but one that has been neglected in most texts. The book is divided into three parts, namely, Modeling Molecular Evolution, Phylogeny Reconstruction, and Advanced Topics. The first impression on reading the book is its high level of detail and discussion. Topics are described in a logical chain from simple to complex, where each increase in complexity is motivated by biological facts. Concepts and notation are used consistently throughout the book. This consistency enhances readability and is an advantage over edited books, where the notation often changes from one chapter to the next. Modeling Molecular Evolution comprises two chapters: Models of Nucleotide Substitution and Models of Amino Acid and Codon Substitution. Both chapters form the foundation for the rest of the book. Starting with the traditional problem of estimating the evolutionary distance between two aligned homologous sequences, the first chapter demonstrates clearly how biological evidence needs to be included in the modeling process. As usual, the argument starts with the simplest model, the Jukes-Cantor model (Jukes and Cantor, 1969), and then adds more and more biological realism, ending with the general time-reversible model (Tavaré, 1986) including among-site rate heterogeneity.
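To make the distance problem that opens chapter one concrete, here is a minimal Python sketch of the JC69 distance, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. It also shows the two-parameter K80 distance, d = -(1/2) ln(1 - 2S - V) - (1/4) ln(1 - 2V), which separates transitional (S) from transversional (V) differences and so adds one step of biological realism. The sketch assumes gap-free, equal-length, uppercase DNA sequences. The function names and example sequences are hypothetical, and the code is an illustration rather than code from the book.

```python
import math

PURINES = set("AG")  # A and G are purines; C and T are pyrimidines


def check_aligned(seq1, seq2):
    """Hypothetical helper: require two non-empty sequences of equal length."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("expected two non-empty sequences of equal length")


def p_distance(seq1, seq2):
    """Proportion of sites at which two aligned sequences differ."""
    check_aligned(seq1, seq2)
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jc69_distance(seq1, seq2):
    """JC69 distance in expected substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC69 distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k80_distance(seq1, seq2):
    """K80 distance from the proportions of transitions (S) and transversions (V)."""
    check_aligned(seq1, seq2)
    s = v = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            s += 1  # purine<->purine or pyrimidine<->pyrimidine: a transition
        else:
            v += 1  # purine<->pyrimidine: a transversion
    S, V = s / len(seq1), v / len(seq1)
    if 1 - 2 * S - V <= 0 or 1 - 2 * V <= 0:
        raise ValueError("the K80 distance is undefined for these proportions")
    return -0.5 * math.log(1 - 2 * S - V) - 0.25 * math.log(1 - 2 * V)


s1 = "AAAAAAAAAAAA"
s2 = "GCTAAAAAAAAA"  # one transition (A->G) and two transversions (A->C, A->T)
print(jc69_distance(s1, s2), k80_distance(s1, s2))
```

Under JC69 one expects S = p/3 and V = 2p/3. For sequences with exactly that pattern, as in the example, the two formulas coincide, which makes a convenient sanity check.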
The next step toward biological realism, relaxing the assumption that sites evolve independently, is unfortunately not covered, although many suitable models for this extension exist (Schöniger and von Haeseler, 1994; Pollock et al., 1999). Chapter two applies Markov processes to model amino acid and codon substitutions. Ziheng Yang distinguishes between empirical and mechanistic rate matrices. A well-known example of an empirical model is the Dayhoff substitution matrix (Dayhoff et al., 1978), whereas Yang presents the codon model (Yang et al., 1998) as a prime example of a mechanistic model. The chapter closes with a description of the numerical calculation of the transition probability matrix. The second part of the book deals with phylogeny reconstruction. The third chapter presents the terminology for trees, including the problem of searching through tree space. Then the nonstatistical tree reconstruction approaches (distance methods, maximum parsimony) are briefly summarized. Readers who want to learn more about these methods will need to consult other texts (Swofford et al., 1996; Salemi and Vandamme, 2003; Felsenstein, 2004). Chapter four delves into maximum likelihood analysis to infer both phylogenies and the parameters of the substitution process. The “pruning” algorithm for calculating the likelihood of a fixed tree under different models is described thoroughly. Particularly valuable is the section “Numerical Algorithms for Maximum Likelihood Estimation,” which offers advice on how to improve the efficiency of maximum likelihood software. The numerical aspects of likelihood computation are often neglected in other textbooks, but they play an important role when methods are implemented. Bayesian analysis is another school of statistical inference.
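Returning to the numerical side of chapters two and four, the two computations singled out there can be sketched in plain Python: the transition probability matrix P(t) = exp(Qt), and Felsenstein's pruning algorithm for the likelihood of one alignment column on a fixed tree. Several choices here are assumptions for illustration and are not taken from the book: the JC69 rate matrix, the nested-list tree encoding, the helper names, the taxon names, the branch lengths, and the toy alignment. P(t) is computed by scaling and squaring with a truncated Taylor series. This is one standard method, but not necessarily the one the book favors; eigendecomposition of a reversible Q is a common alternative.

```python
import math

BASES = "TCAG"  # state order for rows/columns of Q and P(t) (an arbitrary choice)


def mat_mul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transition_matrix(Q, t, terms=20):
    """P(t) = exp(Qt) by scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    norm = max(sum(abs(q) for q in row) for row in Q) * t
    # choose s so that the scaled matrix Qt / 2**s has norm well below 1
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[q * t / 2 ** s for q in row] for row in Q]
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    term = [row[:] for row in P]
    for k in range(1, terms + 1):  # accumulate A^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        P = [[p + x for p, x in zip(prow, trow)] for prow, trow in zip(P, term)]
    for _ in range(s):  # undo the scaling: exp(A)^(2^s)
        P = mat_mul(P, P)
    return P


def jc69_rate_matrix():
    """JC69 generator scaled so branch lengths are expected substitutions per site."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]


def conditional_likelihoods(node, site, Q):
    """Pruning: L_node(x) = P(data below node | state x at node).

    A leaf is a taxon name; an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):  # leaf: indicator vector of the observed base
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    L = [1.0] * 4
    for child, t in node:
        P = transition_matrix(Q, t)
        Lc = conditional_likelihoods(child, site, Q)
        for x in range(4):
            L[x] *= sum(P[x][y] * Lc[y] for y in range(4))
    return L


def site_likelihood(tree, site, Q, pi=(0.25, 0.25, 0.25, 0.25)):
    """Average the root's conditional likelihoods over the stationary distribution."""
    L = conditional_likelihoods(tree, site, Q)
    return sum(p * l for p, l in zip(pi, L))


Q = jc69_rate_matrix()
tree = [([("human", 0.1), ("chimp", 0.1)], 0.05), ("gorilla", 0.2)]  # hypothetical
alignment = {"human": "ACGTA", "chimp": "ACGTT", "gorilla": "ACCTT"}  # hypothetical
loglik = sum(
    math.log(site_likelihood(tree, {name: seq[i] for name, seq in alignment.items()}, Q))
    for i in range(len(alignment["human"]))
)
print(loglik)
```

Sites are treated as independent, so the log-likelihood of the alignment is the sum over columns. A production implementation would also reuse each P(t) across sites instead of recomputing it per column, as done here for brevity.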
Although the basic theory of Bayesian inference has been established for centuries, its application to phylogenetic analysis has been studied for only about a decade (Rannala and Yang, 1996; Mau and Newton, 1997). The two main obstacles are the computational burden and the subjective choice of prior distributions for the parameters. The first has been alleviated by powerful computers and by Markov chain Monte Carlo (MCMC) approximation. The second, however, remains an unavoidable point of criticism from opponents. Chapter five discusses criticisms of both the maximum likelihood and the Bayesian schools of thought. It then proceeds through Bayes's theorem, MCMC approximation, and its application to calculating the posterior probability of phylogenies under different models. Biologists might struggle with formula 5.40 for the posterior probability of a tree, but the formula can be rewritten in a more elegant manner (Huelsenbeck and Ronquist, 2001). The last chapter of part two reviews methods for comparing phylogenetic reconstruction approaches and for assessing the reliability of reconstructed trees. Because different reconstruction methods rest on different philosophies, comparing their results is a tricky business. It is therefore not surprising that the controversy about the “best” method cannot be resolved. Nevertheless, this chapter presents basic and well-established statistical concepts for evaluating methods, e.g., “consistency, efficiency and robustness.” Besides modeling molecular evolution and reconstructing phylogenies, computational molecular evolution obviously comprises other crucial topics. The advanced part of the book describes a selection of them.
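Returning briefly to chapter five, the MCMC machinery discussed there can be illustrated with a toy problem. The sketch samples the posterior distribution of the JC69 distance θ between two sequences, given x differences among n sites. It uses a Metropolis sampler with a sliding-window proposal reflected at zero, which keeps the proposal symmetric. Several elements are hypothetical choices for illustration, not values from the book: the exponential prior with mean 0.2, the window width, the counts x and n, and all function names.

```python
import math
import random


def log_likelihood(theta, x, n):
    """JC69 log-likelihood of x differences among n sites at distance theta.

    Constant factors that do not depend on theta are dropped.
    """
    p = 0.75 - 0.75 * math.exp(-4.0 * theta / 3.0)  # probability that a site differs
    if p <= 0.0:
        return -math.inf if x > 0 else 0.0
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


def log_prior(theta, mean=0.2):
    """Exponential prior on theta; the mean of 0.2 is an arbitrary assumption."""
    return -theta / mean - math.log(mean)


def mcmc_jc69(x, n, iterations=20000, window=0.1, seed=1):
    """Metropolis sampler for theta with a sliding window reflected at zero."""
    rng = random.Random(seed)
    theta = 0.1  # arbitrary starting value
    current = log_likelihood(theta, x, n) + log_prior(theta)
    samples = []
    for _ in range(iterations):
        proposal = abs(theta + rng.uniform(-window / 2, window / 2))  # reflect at 0
        new = log_likelihood(proposal, x, n) + log_prior(proposal)
        # symmetric proposal: accept with probability min(1, posterior ratio)
        if new >= current or rng.random() < math.exp(new - current):
            theta, current = proposal, new
        samples.append(theta)
    return samples


x, n = 90, 948  # hypothetical: 90 differences among 948 aligned sites
post = sorted(mcmc_jc69(x, n)[2000:])  # discard the first 2000 samples as burn-in
print("posterior mean:", sum(post) / len(post))
print("95% interval:", post[int(0.025 * len(post))], post[int(0.975 * len(post))])
```

With this much data the posterior mean lands close to the maximum likelihood estimate, -(3/4) ln(1 - 4x/(3n)), because the likelihood dominates the weak prior. The prior matters more when sequences are short.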
The four chapters of this third part cover Molecular Clock and Estimation of Species Divergence Time, Neutral and Adaptive Protein Evolution, Simulating Molecular Evolution, and, finally, Perspectives. A common source of confusion is that the branch lengths of a tree reflect the expected number of substitutions per site, that is, the product of rate and time, rather than divergence time itself. To infer divergence times (or the ages of ancestors), fossil calibrations must therefore be incorporated. Combining molecular data with as many fossil calibrations as possible is crucial for actually dating splitting events. Chapter seven guides the reader from traditional approaches, which rest on the unrealistic assumption of a global molecular clock, to more sophisticated maximum likelihood and Bayesian approaches. This chapter is definitely interesting and clearly shows that the entire field is still in a developmental stage, so novices in the field may find inspiration for further research here.
Downloaded from http://sysbio.oxfordjournals.org/ by guest on April 4, 2013
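The rate-times-time confounding described in chapter seven can be shown with simple arithmetic under the traditional global-clock assumption that the chapter moves beyond. If two lineages split T time units ago and each evolves at rate r, their pairwise distance is d = 2rT. A calibrated split therefore fixes r, and r in turn dates other splits. The pooling of several calibrations below is a naive least-squares choice that minimizes the sum of (d_i - 2rT_i)^2. It is an assumption for illustration, not the book's method (the Bayesian approaches in that chapter treat calibrations quite differently). All names and numbers are hypothetical.

```python
def rate_from_calibrations(calibrations):
    """Least-squares strict-clock rate from (distance, age) pairs, assuming d = 2 r T.

    Minimizing sum((d - 2 r T)^2) gives r = sum(d * T) / (2 * sum(T^2)).
    """
    num = sum(d * t for d, t in calibrations)
    den = 2.0 * sum(t * t for _, t in calibrations)
    return num / den


def divergence_time(distance, rate):
    """Invert d = 2 r T to obtain the age T of the split between two lineages."""
    return distance / (2.0 * rate)


# hypothetical calibrations: (pairwise distance in substitutions/site, fossil age in Myr)
calibrations = [(0.2, 10.0), (0.4, 20.0)]
rate = rate_from_calibrations(calibrations)
print(rate, divergence_time(0.1, rate))
```

Without a calibration, only the product rT is identifiable from the distances. This is exactly why branch lengths alone cannot be read as times.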

1026 SYSTEMATIC BIOLOGY VOL. 56

Although Darwinian selection dominates evolutionary studies, the neutral theory (Kimura, 1968) proposes a different view of molecular evolution. It holds that most evolutionary change at the molecular level is driven not by fitness advantages but by the random fixation of selectively neutral mutations. Chapter eight, Neutral and Adaptive Protein Evolution, lucidly presents the neutral theory and computational approaches for testing the neutrality of amino acid substitutions. Researchers interested in protein evolution will find the section Amino Acid Sites Undergoing Adaptive Evolution a useful summary of current strategies for detecting amino acid sites under positive selection. Chapter nine presents technical material on simulating molecular evolution. Because in most cases we lack the luxury of knowing the true alignments and the true phylogenetic trees, simulation is one appropriate way to validate approaches to phylogenetic inference. Indeed, the more realistic the simulation design, the more reliable the conclusions that can be drawn. The last chapter discusses some perspectives within molecular evolution, such as exploring heterogeneous datasets, investigating genome-rearrangement data, and comparing genomes. It points students and younger researchers toward areas likely to be active in the foreseeable future. In summary, Ziheng Yang has presented a very interesting and readable book that highlights aspects of computational molecular evolution not covered in other contributions to the field. It also points out open problems that leave ample room for future research. The book covers essential topics in a logical order that progressively educates the reader.
It describes topics in great detail with appropriate biological motivation and is full of valuable discussions. The book is highly recommended to computer scientists, mathematicians, and biologists at the graduate level. However, every novice in the field should be aware that considerable mathematical, statistical, and biological knowledge is needed to follow the full argumentation of the book. Although Ziheng Yang sometimes presents a very personal view of computational molecular evolution, the book is a valuable source of thought-provoking ideas about the field. It also discusses some very interesting but little-known statistical papers. Thus, the book clearly shows that one should consult, every now and then, the wealth of publications in statistics and probability theory.

Le Sy Vinh, Invertebrate Zoology Division, American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024, USA; E-mail: vle@amnh.org

Arndt von Haeseler, Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr Gasse 9/6, A-1030 Vienna, Austria; E-mail: arndt.von.haeseler@univie.ac.at

REFERENCES

Cavalli-Sforza, L. L., and A. W. F. Edwards. 1967. Phylogenetic analysis: Models and estimation procedures. Evolution 21:550–570.
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model for evolutionary change in proteins. Pages 345–352 in Atlas of protein sequence and structure, volume 5, supplement 3 (M. O. Dayhoff, ed.). National Biomedical Research Foundation, Washington, DC.
Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates, Sunderland, Massachusetts.
Huelsenbeck, J. P., and F. Ronquist. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pages 21–132 in Mammalian protein metabolism (H. N. Munro, ed.). Academic Press, New York.
Kimura, M. 1968. Evolutionary rate at the molecular level. Nature 217:624–626.
Mau, B., and M. Newton. 1997. Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo. J. Comput. Graph. Stat. 6:122–131.
Pollock, D. D., W. R. Taylor, and N. Goldman. 1999. Coevolving protein residues: Maximum likelihood identification and relationship to structure. J. Mol. Biol. 287:187–198.
Rannala, B., and Z. Yang. 1996. Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. J. Mol. Evol. 43:304–311.
Salemi, M., and A.-M. Vandamme (eds.). 2003. The phylogenetic handbook: A practical approach to DNA and protein phylogeny. Cambridge University Press, Cambridge, UK.
Schöniger, M., and A. von Haeseler. 1994. A stochastic model for the evolution of autocorrelated DNA sequences. Mol. Phylogenet. Evol. 3:240–247.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pages 407–514 in Molecular systematics, 2nd edition (D. M. Hillis, C. Moritz, and B. K. Mable, eds.). Sinauer Associates, Sunderland, Massachusetts.
Tavaré, S. 1986. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math. Life Sci. 17:57–86.
Yang, Z., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol. Biol. Evol. 15:1600–1611.