12.05.2015 Views

Practical Course on Multiple Sequence Alignment - CNB - Protein ...

Practical Course on Multiple Sequence Alignment - CNB - Protein ...

Practical Course on Multiple Sequence Alignment - CNB - Protein ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

precise meaning of equivalence is generally c<strong>on</strong>text dependent: for the<br />

phylogeneticist, equivalent residues have comm<strong>on</strong> evoluti<strong>on</strong>ary ancestry; for<br />

the structural biologist, equivalent residues corresp<strong>on</strong>d to analogous positi<strong>on</strong>s<br />

bel<strong>on</strong>ging to homologous folds in a set of proteins; for the molecular biologist,<br />

equivalent residues play similar functi<strong>on</strong>al roles in their corresp<strong>on</strong>ding<br />

proteins. In each case, an alignment provides a bird’s eye view of the<br />

underlying evoluti<strong>on</strong>ary, structural, or functi<strong>on</strong>al c<strong>on</strong>straints characterizing a<br />

protein family in a c<strong>on</strong>cise, visually intuitive format.<br />

Many bioinformatic methods rely <strong>on</strong> MSAs. Success of methods relies <strong>on</strong><br />

quality of alignment. MSA a difficult problem, although current automatic<br />

approaches are good, usually they can be improved by human interventi<strong>on</strong> to<br />

yield better alignments and hence better analysis. It is difficult to teach how to<br />

do this well - depends <strong>on</strong> obtaining experience. This course aims to provide a<br />

starting point to obtain this experience.<br />

A typical multiple sequence alignment workflow will be:<br />

1. Clearly formulate the questi<strong>on</strong> you want to answer. E.g.: “What is the<br />

sec<strong>on</strong>dary structure predicti<strong>on</strong> for protein sequence”<br />

2. Collect a set of sequences to address this questi<strong>on</strong>. E.g. using a<br />

BLAST database search.<br />

3. Create and examine initial automatic MSA. E.g. align using Clustal<br />

and examine in JalView<br />

4. Adjust the set of sequences to obtain an optimal set. E.g. remove<br />

unrelated sequences, add sequences from additi<strong>on</strong>al searches.<br />

5. Manually adjust alignment to correct automatically-introduced<br />

errors.<br />

Some notes <strong>on</strong> protein evoluti<strong>on</strong><br />

The evoluti<strong>on</strong>ary variati<strong>on</strong>s of a protein provide much informati<strong>on</strong> about the<br />

protein itself. Evoluti<strong>on</strong>ary divergence into different species has resulted in<br />

many variants of the same protein, all with essentially the same biological<br />

functi<strong>on</strong> but different amino acid sequences. The differences and similarities<br />

of the amino acid sequences of these variants reflect the c<strong>on</strong>straints of<br />

structure and functi<strong>on</strong> for that protein.<br />

The number of possible RNA, DNA, or protein sequences is so great that it is<br />

implausible that similar l<strong>on</strong>g sequences could have arisen by any mechanism<br />

other than evoluti<strong>on</strong>ary divergence from the same ancestor.<br />

Nucleic acids and proteins that have evolved from a comm<strong>on</strong> ancestor are said<br />

to be homologous. The sequences of homologous genes and proteins were<br />

identical at the time they originated by replicati<strong>on</strong> of a single gene.<br />

Subsequently the two genes accumulate mutati<strong>on</strong>al changes, and the<br />

sequences of homologous genes and proteins can be identical, similar to<br />

varying degrees, or unrecognizably dissimilar because of extensive mutati<strong>on</strong>.<br />

<strong>Protein</strong>s and genes are either homologous or not, because they either did or<br />

did not descent from a comm<strong>on</strong> ancestor.<br />

The <strong>on</strong>ly explanati<strong>on</strong> other than divergence for similarities am<strong>on</strong>g sequences is<br />

c<strong>on</strong>vergence, in which two or more unrelated sequences have become similar<br />

under the pressure selecti<strong>on</strong> for similar functi<strong>on</strong>s. C<strong>on</strong>vergence is a fairly<br />

6

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!