07.02.2013 Views

Bioinformatics Algorithms: Techniques and Applications


GRAPH THEORETICAL APPROACHES

3.3 RECONSTRUCTING PHYLOGENIES

Consider a set of taxa, where each taxon is represented by a vector of attributes, the so-called characters. We assume that every character can take one of a finite number of states and that the set of taxa evolved from a common ancestor through changes of states of the corresponding characters. For example, the set of taxa can be described by the columns of a multiple sequence alignment of protein sequences. In this case, each column in the alignment is a character that can assume one of twenty possible states (the twenty amino acids).
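To make the column-as-character view concrete, the following minimal sketch turns an alignment into a list of characters, one per column. The taxon names, the sequences, and the function name `alignment_to_characters` are invented for illustration and do not come from the text.

```python
# Hypothetical toy alignment: names and sequences are invented for illustration.
alignment = {
    "taxon_A": "MKV",
    "taxon_B": "MRV",
    "taxon_C": "MKI",
}


def alignment_to_characters(alignment):
    """Return one dict per alignment column, mapping taxon name to its state."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    # Column i becomes character i; the residue in that column is the state.
    return [
        {taxon: seq[i] for taxon, seq in alignment.items()}
        for i in range(n_columns)
    ]


characters = alignment_to_characters(alignment)
```

Each entry of `characters` then plays the role of one character, and each taxon is described by the vector of its states across all entries.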

Parsimony methods seek a phylogenetic tree that explains the observed characters with the minimum number of character changes along the branches of the tree.
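The quantity being minimized can be illustrated for a single, fixed tree. The sketch below uses Fitch's algorithm, a standard method for counting the minimum number of changes on a given rooted binary tree; the text does not name it, so it is brought in here as an assumption. The tree shape, the taxon names `t1` to `t4`, and the character states are hypothetical. Searching over all possible trees, which is the hard part of the parsimony problem, is not attempted.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes for one character on a fixed tree.

    tree: nested 2-tuples (rooted binary tree) with leaf names as strings.
    leaf_states: dict mapping each leaf name to its observed state.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its state set is the observed state
            return {leaf_states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:                         # children agree on some state: no change
            return common
        changes += 1                       # disjoint sets: one change is required
        return a | b

    visit(tree)
    return changes


def parsimony_score(tree, characters):
    """Sum of per-character Fitch scores; characters are scored independently."""
    return sum(fitch_score(tree, states) for states in characters)


# Hypothetical four-taxon example with two binary characters.
tree = (("t1", "t2"), ("t3", "t4"))
characters = [
    {"t1": 1, "t2": 1, "t3": 0, "t4": 0},
    {"t1": 1, "t2": 0, "t3": 1, "t4": 0},
]
score = parsimony_score(tree, characters)
```

A parsimony method would compare such scores across candidate trees and keep a tree with the smallest one.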

In our working example for this section, the set of taxa includes eight species shown in Fig. 3.4a; each species is described by two binary characters. As there

FIGURE 3.4 A set of eight species: Anopheles gambiae (Ag), Arabidopsis thaliana (At), Caenorhabditis elegans (Ce), Drosophila melanogaster (Dm), Homo sapiens (Hm), Plasmodium falciparum (Pf), Saccharomyces cerevisiae (Sc), and Schizosaccharomyces pombe (Sp). (a) The species are described by binary characters that correspond to the presence (value of 1) or absence (value of 0) of introns. This is truncated data limited to just two introns (105 and 256) out of about 7236 from the study of Rogozin et al. [59]. (b) A phylogenetic tree: the leaves are the species in the set and are labeled with the input character states; the internal nodes are ancestral species and are labeled with the inferred character states. This particular tree requires three character changes, which are marked with solid bars on the corresponding edges. (c) The character overlap graph. There are four vertices, one vertex per character state: 105 (state “1” of the character “intron 105”), −105 (state “0” of the character “intron 105”), 256 (state “1” of the character “intron 256”), and −256 (state “0” of the character “intron 256”). Two vertices are connected by an edge if the corresponding character states are observed together in some taxon. The edge (105, −256), for example, is due to species Ag and Dm.
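The construction of the character overlap graph described in the caption can be sketched as follows. Only the states of Ag and Dm (intron 105 present, intron 256 absent) follow from the caption; the taxa `X1` and `X2` and their states are hypothetical placeholders, since the data of Fig. 3.4a is not reproduced here. The function name and the encoding of state "0" of character c as the vertex −c are choices made for this sketch.

```python
from itertools import combinations


def character_overlap_graph(taxa):
    """Build the character overlap graph for binary characters.

    taxa: dict mapping taxon name to a dict {character id: 0 or 1}.
    Returns a dict mapping each edge (a frozenset of two vertices) to the
    list of taxa in which both character states are observed together.
    """
    edges = {}
    for taxon, states in taxa.items():
        # Vertex c stands for state 1 of character c, vertex -c for state 0.
        vertices = [c if s == 1 else -c for c, s in states.items()]
        for u, v in combinations(vertices, 2):
            edges.setdefault(frozenset((u, v)), []).append(taxon)
    return edges


taxa = {
    "Ag": {105: 1, 256: 0},  # consistent with the caption
    "Dm": {105: 1, 256: 0},  # consistent with the caption
    "X1": {105: 0, 256: 1},  # hypothetical taxon, invented states
    "X2": {105: 0, 256: 0},  # hypothetical taxon, invented states
}
graph = character_overlap_graph(taxa)
```

Here `graph[frozenset((105, -256))]` lists the taxa responsible for the edge (105, −256), which in this toy input are Ag and Dm.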
