Grundlagen der Bioinformatik, SS’09, D. Huson, May 10, 2009

The distance-based SP-score of the profile alignment $A^*$ is:

$$D_{sp}(A^*) = \sum_{1 \le p < q \le r} \sum_{i=1}^{L} s(a^*_{pi}, a^*_{qi}) = \dots$$
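As a rough illustration of a sum-of-pairs score, the following Python sketch sums a column score s over all pairs of rows p < q and all columns i of an alignment. The function names `sp_score` and `mismatch_cost`, and the particular cost values (0 for identical symbols, 1 for a mismatch, 2 for a symbol against a gap), are assumptions chosen for this example and are not taken from the notes.

```python
from itertools import combinations


def mismatch_cost(a, b):
    """Hypothetical distance-like column score s(a, b); values are illustrative."""
    if a == b:
        return 0  # identical symbols (including gap against gap) cost nothing
    if a == "-" or b == "-":
        return 2  # assumed cost of a symbol aligned to a gap
    return 1      # assumed cost of a mismatch


def sp_score(alignment, s=mismatch_cost):
    """Sum s over all row pairs p < q and all columns i of the alignment."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        s(x, y)
        for row_p, row_q in combinations(alignment, 2)  # all pairs p < q
        for x, y in zip(row_p, row_q)                   # all columns i
    )


example = ["AC-T", "ACGT", "A-GT"]
total = sp_score(example)
```

Because the score is a plain sum over row pairs, it can also be accumulated pair by pair, which is how the pairwise contributions are usually inspected.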
Sequences are aligned bottom-up along the guide tree, first aligning pairs of sequences, then sequences against profiles (sub-alignments) and then profiles against profiles. Different algorithms use different methods to compute the guide tree.

4.9.4 Feng-Doolittle

A first progressive alignment algorithm was published in 1987 by Feng and Doolittle¹.

Algorithm 4.9.3
1. Calculate all $\binom{r}{2}$ pairwise alignment scores and convert them into distances.
2. Construct a rooted guide tree from the distance matrix using the “Fitch–Margoliash” algorithm.
3. Build a multiple alignment bottom-up along the guide tree and return the alignment of all sequences that is produced at the root of the tree.

The distance score used by Feng-Doolittle is:

$$D = -\log S_{\mathrm{eff}} = -\log \frac{S_{\mathrm{obs}} - S_{\mathrm{rand}}}{S_{\mathrm{max}} - S_{\mathrm{rand}}},$$

where
• $S_{\mathrm{obs}}$ is the observed similarity score for a pair of sequences,
• $S_{\mathrm{max}}$ is the maximum possible score, and
• $S_{\mathrm{rand}}$ is the expected score of an alignment of two random sequences of the same length and composition.

The “effective score” $S_{\mathrm{eff}}$ can be viewed as a normalised percentage similarity.

The sequence-sequence alignments are conducted using the profile alignment approach.

4.9.5 CLUSTALW

CLUSTALW² is still one of the most popular programs for computing an MSA, although more recent methods such as T-Coffee or Muscle are designed to produce better alignments in practice.

Algorithm 4.9.4 (ClustalW progressive alignment)
1. Construct a distance matrix of all $\binom{r}{2}$ pairs by pairwise dynamic programming alignment followed by approximate conversion of similarity scores to evolutionary distances.
2. Construct a guide tree using the Neighbor-Joining tree-building method from the distance matrix.
3. Progressively align sequences at nodes of the tree in order of decreasing similarity, using sequence-sequence, sequence-profile and profile-profile alignment.

¹ Feng, D-F. & Doolittle, R.F. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25:351-360, 1987.
² Thompson, J.D., Higgins, D.G. & Gibson, T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680, 1994.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. & Higgins, D.G. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research, 25:4876-4882, 1997.
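The Feng-Doolittle distance from Section 4.9.4, D = −log S_eff, and the first step of both algorithms (one distance per pair of sequences) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the layout of the `scores` dictionary, and the choice to return infinity when the observed score is no better than the random expectation are all assumptions made for this example, and the three scores per pair are taken as given rather than computed by an actual alignment.

```python
import math
from itertools import combinations


def feng_doolittle_distance(s_obs, s_max, s_rand):
    """Return D = -log S_eff with S_eff = (S_obs - S_rand) / (S_max - S_rand)."""
    if s_max <= s_rand:
        raise ValueError("s_max must be larger than s_rand")
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # Assumption: a score no better than random is treated as infinitely distant.
        return math.inf
    return -math.log(s_eff)


def distance_matrix(names, scores):
    """Compute one distance for each of the r-choose-2 pairs of sequences.

    `scores` is a hypothetical input mapping (name_p, name_q) to the
    triple (s_obs, s_max, s_rand) for that pair.
    """
    return {
        (p, q): feng_doolittle_distance(*scores[(p, q)])
        for p, q in combinations(names, 2)
    }


names = ["seq1", "seq2", "seq3"]
scores = {
    ("seq1", "seq2"): (100.0, 100.0, 20.0),  # identical: S_eff = 1
    ("seq1", "seq3"): (60.0, 100.0, 20.0),   # S_eff = 0.5
    ("seq2", "seq3"): (20.0, 100.0, 20.0),   # no better than random
}
dist = distance_matrix(names, scores)
```

The resulting dictionary holds the entries of the distance matrix from which a guide tree would then be built, for example by Fitch–Margoliash or Neighbor-Joining; tree construction itself is not shown here.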