Practical Course on Multiple Sequence Alignment - CNB - Protein ...
Practical Course on Multiple Sequence Alignment - CNB - Protein ...
Practical Course on Multiple Sequence Alignment - CNB - Protein ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
precise meaning of equivalence is generally c<strong>on</strong>text dependent: for the<br />
phylogeneticist, equivalent residues have comm<strong>on</strong> evoluti<strong>on</strong>ary ancestry; for<br />
the structural biologist, equivalent residues corresp<strong>on</strong>d to analogous positi<strong>on</strong>s<br />
bel<strong>on</strong>ging to homologous folds in a set of proteins; for the molecular biologist,<br />
equivalent residues play similar functi<strong>on</strong>al roles in their corresp<strong>on</strong>ding<br />
proteins. In each case, an alignment provides a bird’s eye view of the<br />
underlying evoluti<strong>on</strong>ary, structural, or functi<strong>on</strong>al c<strong>on</strong>straints characterizing a<br />
protein family in a c<strong>on</strong>cise, visually intuitive format.<br />
Many bioinformatic methods rely <strong>on</strong> MSAs. Success of methods relies <strong>on</strong><br />
quality of alignment. MSA a difficult problem, although current automatic<br />
approaches are good, usually they can be improved by human interventi<strong>on</strong> to<br />
yield better alignments and hence better analysis. It is difficult to teach how to<br />
do this well - depends <strong>on</strong> obtaining experience. This course aims to provide a<br />
starting point to obtain this experience.<br />
A typical multiple sequence alignment workflow will be:<br />
1. Clearly formulate the questi<strong>on</strong> you want to answer. E.g.: “What is the<br />
sec<strong>on</strong>dary structure predicti<strong>on</strong> for protein sequence”<br />
2. Collect a set of sequences to address this questi<strong>on</strong>. E.g. using a<br />
BLAST database search.<br />
3. Create and examine initial automatic MSA. E.g. align using Clustal<br />
and examine in JalView<br />
4. Adjust the set of sequences to obtain an optimal set. E.g. remove<br />
unrelated sequences, add sequences from additi<strong>on</strong>al searches.<br />
5. Manually adjust alignment to correct automatically-introduced<br />
errors.<br />
Some notes <strong>on</strong> protein evoluti<strong>on</strong><br />
The evoluti<strong>on</strong>ary variati<strong>on</strong>s of a protein provide much informati<strong>on</strong> about the<br />
protein itself. Evoluti<strong>on</strong>ary divergence into different species has resulted in<br />
many variants of the same protein, all with essentially the same biological<br />
functi<strong>on</strong> but different amino acid sequences. The differences and similarities<br />
of the amino acid sequences of these variants reflect the c<strong>on</strong>straints of<br />
structure and functi<strong>on</strong> for that protein.<br />
The number of possible RNA, DNA, or protein sequences is so great that it is<br />
implausible that similar l<strong>on</strong>g sequences could have arisen by any mechanism<br />
other than evoluti<strong>on</strong>ary divergence from the same ancestor.<br />
Nucleic acids and proteins that have evolved from a comm<strong>on</strong> ancestor are said<br />
to be homologous. The sequences of homologous genes and proteins were<br />
identical at the time they originated by replicati<strong>on</strong> of a single gene.<br />
Subsequently the two genes accumulate mutati<strong>on</strong>al changes, and the<br />
sequences of homologous genes and proteins can be identical, similar to<br />
varying degrees, or unrecognizably dissimilar because of extensive mutati<strong>on</strong>.<br />
<strong>Protein</strong>s and genes are either homologous or not, because they either did or<br />
did not descent from a comm<strong>on</strong> ancestor.<br />
The <strong>on</strong>ly explanati<strong>on</strong> other than divergence for similarities am<strong>on</strong>g sequences is<br />
c<strong>on</strong>vergence, in which two or more unrelated sequences have become similar<br />
under the pressure selecti<strong>on</strong> for similar functi<strong>on</strong>s. C<strong>on</strong>vergence is a fairly<br />
6