Practical Course on Multiple Sequence Alignment - CNB - Protein ...
Practical Course on Multiple Sequence Alignment - CNB - Protein ...
Practical Course on Multiple Sequence Alignment - CNB - Protein ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Representing multiple sequence alignments<br />
Although most of the time you will represent a multiple sequence alignment as<br />
a matrix of aligned residues, in some cases it is c<strong>on</strong>venient to use alternative<br />
representati<strong>on</strong>s (as shortcuts or for generati<strong>on</strong> of nice figures). Two of these<br />
alternative representati<strong>on</strong>s are the c<strong>on</strong>sensus sequences, and sequence logos.<br />
Alternative more complex representati<strong>on</strong>s will be presented in the next<br />
secti<strong>on</strong>s (Working with profiles).<br />
C<strong>on</strong>sensus sequence<br />
C<strong>on</strong>sensus sequence refers to the most comm<strong>on</strong> nucleotide or amino acid at a<br />
particular positi<strong>on</strong> after multiple sequences are aligned. A c<strong>on</strong>sensus<br />
sequence is a way of representing the results of a multiple sequence<br />
alignment, where related sequences are compared to each other, and similar<br />
functi<strong>on</strong>al sequence motifs are found. The c<strong>on</strong>sensus sequence shows the<br />
residues that are most abundant in the alignment at each positi<strong>on</strong>.<br />
Example: WW domain (from SMART database) represented as c<strong>on</strong>sensus<br />
sequences c<strong>on</strong>sidering 80, 65 and 50%.<br />
O54971/1-33<br />
CONSENSUS/80%<br />
CONSENSUS/65%<br />
CONSENSUS/50%<br />
PLPPGWEKRTDSN-GRVYFVN---HNTRITQWEDPRS<br />
.h..sW..hhs.p.sh.aahs.....stpopWptPt.<br />
slsssWppthsss.GphYYhs...ppocpopWpcPp.<br />
sLPsGWccttsss.G+sYYaN...ppT+copW-cPss<br />
The grouping of amino acids to classes and class abbreviati<strong>on</strong> (the key) used<br />
within c<strong>on</strong>sensus sequences are shown below.<br />
Class Key Residues<br />
alcohol o S,T<br />
aliphatic l I,L,V<br />
any . A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y<br />
aromatic a F,H,W,Y<br />
charged c D,E,H,K,R<br />
hydrophobic h A,C,F,G,H,I,K,L,M,R,T,V,W,Y<br />
negative - D,E<br />
polar p C,D,E,H,K,N,Q,R,S,T<br />
positive + H,K,R<br />
small s A,C,D,G,N,P,S,T,V<br />
tiny u A,G,S<br />
turnlike t A,C,D,E,G,H,K,N,Q,R,S,T<br />
<strong>Sequence</strong> logos<br />
<strong>Sequence</strong> logos are a graphical way for presenting multiple alignments. The<br />
sequence logo shows how well residues are c<strong>on</strong>served at each positi<strong>on</strong>: the<br />
fewer the number of residues, the higher the letters will be, because the better<br />
the c<strong>on</strong>servati<strong>on</strong> is at that positi<strong>on</strong>. Different residues at the same positi<strong>on</strong> are<br />
20