15.12.2012 Views

Bioinformatics, Volume I Data, Sequence Analysis and Evolution

3.1. Splice Graph Representation of Gene Structure and Alternative Splicing

Fig. 10.1. Splice graph representation of gene structure and alternative splicing. (A) The exon-intron structure of a three-exon gene. The middle exon is alternatively spliced. (B) The splice graph representation of the gene structure. Alternative splicing of the second exon is represented by a directed edge from node 1 to node 3.

Reconstruction of Full-Length Isoforms from Splice Graphs

Conventionally, a gene structure is represented as a linear string of exons (Fig. 10.1A), ordered by exon position from 5′ to 3′. This simple representation, however, is insufficient for analyzing alternative splicing and inferring full-length isoforms. By definition, alternative splicing introduces branches into the gene structure, so a single linear order of all exons is no longer valid.

In a pioneering study, Heber and colleagues introduced the concept of the “splice graph” (3), a directed acyclic graph representation of gene structure. In the splice graph, each exon is represented as a node, and each splice junction is represented as a directed edge between two nodes (i.e., exons) (Fig. 10.1B). Different types of alternative splicing events, such as exon skipping, alternative 5′/3′ splicing, and intron retention, can be easily represented using splice graphs. Fig. 10.2 shows the splice graph of the multi-exon human gene TCN1. The observed skipping of exon 2 is represented as a directed edge from node 1 to node 3. Similarly, the observed skipping of exons 5/6 is represented as a directed edge from node 4 to node 7. One EST from unspliced genomic DNA is represented as a single isolated node of the splice graph (node 8). Under such a representation, the isoform problem becomes a graph traversal problem: multiple traversals of the splice graph correspond to multiple isoforms of a gene. Furthermore, the splice graph can be weighted, with each edge weight reflecting the strength of experimental evidence for a particular splice junction. For EST data, this can be the number of ESTs in which two exons are connected by a splice junction. For microarray data, this can be the signal intensity of a particular exon junction probe.
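The traversal view described above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the chapter's algorithm: it stores a splice graph as a dictionary mapping (upstream exon, downstream exon) pairs to edge weights, and lists every path from a node with no incoming edge to a node with no outgoing edge. The function names `enumerate_isoforms` and `junction_weights` are hypothetical, and the edge weights for the three-exon gene of Fig. 10.1 are invented for the example.

```python
from collections import defaultdict


def enumerate_isoforms(edges):
    """List every source-to-sink path of a splice graph.

    `edges` maps (upstream_exon, downstream_exon) -> weight.
    The graph is assumed to be acyclic, as a splice graph is.
    """
    successors = defaultdict(list)
    has_incoming = set()
    nodes = set()
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        has_incoming.add(downstream)
        nodes.update((upstream, downstream))

    # Sources are exons that no splice junction leads into.
    sources = sorted(n for n in nodes if n not in has_incoming)

    paths = []

    def walk(node, path):
        following = sorted(successors.get(node, []))
        if not following:          # sink reached: one candidate isoform
            paths.append(path)
            return
        for nxt in following:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


def junction_weights(path, edges):
    """Return the weight of each splice junction used by one path."""
    return [edges[(a, b)] for a, b in zip(path, path[1:])]


# Three-exon gene of Fig. 10.1 with a skipped middle exon.
# The weights (5, 5, 2) are made-up evidence counts for illustration.
edges = {(1, 2): 5, (2, 3): 5, (1, 3): 2}

isoforms = enumerate_isoforms(edges)
for path in isoforms:
    print(path, junction_weights(path, edges))
```

In this sketch the two traversals correspond to the exon-inclusion and exon-skipping forms of the gene. An isolated node, such as node 8 in Fig. 10.2, has no edges and would need to be tracked in a separate node set. Exhaustive enumeration can grow quickly for genes with many alternative events, so it is shown here only to illustrate the correspondence between paths and isoforms.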

To construct a splice graph, we start from sequence-based detection of exon-intron structure and alternative splicing (described in detail in the previous chapter). We treat each exon as a node in the splice graph. Alternative donor/acceptor splicing can produce two exon forms with a common splice site at one end and different splice sites at the other end; we treat these two exon forms as different nodes in the splice graph. Next, we go through each expressed sequence to obtain the edge information of the splice graph. We connect two nodes with a directed edge if the two exons are linked by a splice junction in the expressed sequences. The edge weight is set to N if the connection between the two nodes is observed in N expressed sequences. In the end, we obtain a directed acyclic graph (DAG) that represents all splicing events of a gene and their numbers of occurrences in the sequence data.
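The construction steps above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' implementation: each expressed sequence is assumed to have already been aligned to the genome and reduced to an ordered list of exon forms, each written as a hypothetical (start, end) coordinate pair. The function name `build_splice_graph` and all coordinates are invented for the example.

```python
from collections import Counter


def build_splice_graph(expressed_sequences):
    """Build a weighted splice graph from expressed sequences.

    Each expressed sequence is a list of exon forms in 5'-to-3' order,
    where an exon form is a (start, end) tuple. Exon forms that differ
    at either end are distinct tuples and therefore distinct nodes.
    Returns (nodes, edge_weights), where edge_weights maps
    (upstream_exon, downstream_exon) -> number of supporting sequences.
    """
    nodes = set()
    edge_weights = Counter()
    for exons in expressed_sequences:
        nodes.update(exons)
        # Consecutive exons in one sequence are joined by a splice junction.
        for upstream, downstream in zip(exons, exons[1:]):
            edge_weights[(upstream, downstream)] += 1
    return nodes, dict(edge_weights)


# Hypothetical exon forms; B_ALT shares its 3' end with B but starts
# at a different acceptor site, so it is a separate node.
A, B, B_ALT, C = (100, 200), (300, 400), (320, 400), (500, 600)
UNSPLICED = (700, 900)

expressed_sequences = [
    [A, B, C],      # exon-inclusion form
    [A, B, C],      # second sequence supporting the same junctions
    [A, B_ALT, C],  # alternative acceptor form of the middle exon
    [A, C],         # exon-skipping form
    [UNSPLICED],    # single unspliced sequence: isolated node, no edges
]

nodes, weights = build_splice_graph(expressed_sequences)
for edge, n in sorted(weights.items()):
    print(edge, n)
```

Because edges always run from an upstream exon to a downstream exon in genomic order, the resulting graph has no cycles, which matches the DAG property stated above. The sketch assumes that at most one junction of each kind is counted per expressed sequence and does not handle sequences on the opposite strand.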
