16.07.2013 Views

Vol 9 No1 - Journal of Cell and Molecular Biology - Haliç Üniversitesi


Arumugam KUNTHAVAI and Somasundaram VASANTHA RATHNA

array. The Lcp-interval tree of S = acaaacatat$ is shown in Figure 2.

Figure 2. The Lcp-interval tree of S = acaaacatat$

An interval [i..j], where 0 ≤ i < j ≤ n, in an Lcp-array is called an Lcp-interval of Lcp-value ℓ (denoted by ℓ-[i..j]) if

1. Lcptab[i] < ℓ,

2. Lcptab[k] ≥ ℓ for all k with i+1 ≤ k ≤ j,

3. Lcptab[k] = ℓ for at least one k with i+1 ≤ k ≤ j,

4. Lcptab[j+1] < ℓ.
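The four conditions can be checked directly. The following is a minimal sketch, not the authors' implementation. It assumes that the sentinel $ sorts after every other character, that Lcptab[0] is stored as 0, and that positions 0 and n+1 act as boundaries with value -1 when they are tested by conditions 1 and 4. The helper names suffix_array, lcp_table, lcp_at and is_lcp_interval are hypothetical.

```python
def suffix_array(s, sentinel="$"):
    # Naive sort of all suffixes; adequate for short examples only.
    # Assumption: the sentinel sorts after every other character.
    return sorted(range(len(s)), key=lambda p: s[p:].replace(sentinel, "\uffff"))


def lcp_table(s, suftab):
    # lcptab[r] = length of the longest common prefix of the suffixes
    # at ranks r-1 and r; lcptab[0] is stored as 0.
    def lcp(a, b):
        k = 0
        while a + k < len(s) and b + k < len(s) and s[a + k] == s[b + k]:
            k += 1
        return k
    return [0] + [lcp(suftab[r - 1], suftab[r]) for r in range(1, len(suftab))]


def lcp_at(lcptab, k):
    # Assumption of this sketch: positions 0 and n+1 are boundaries (-1).
    if k <= 0 or k >= len(lcptab):
        return -1
    return lcptab[k]


def is_lcp_interval(lcptab, i, j, l):
    n = len(lcptab) - 1
    if not (0 <= i < j <= n):
        return False
    inner = lcptab[i + 1:j + 1]
    return (lcp_at(lcptab, i) < l          # condition 1
            and min(inner) >= l            # condition 2
            and l in inner                 # condition 3
            and lcp_at(lcptab, j + 1) < l) # condition 4


s = "acaaacatat$"
suftab = suffix_array(s)
lcptab = lcp_table(s, suftab)
print(suftab)
print(lcptab)
print(is_lcp_interval(lcptab, 0, 5, 1))
```

Under these assumptions, [0..5] is accepted as a 1-interval, while [0..3] is rejected for ℓ = 1 because Lcptab[4] is not smaller than 1.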

Every index k, i+1 ≤ k ≤ j, with Lcptab[k] = ℓ is called an ℓ-index. The set of all ℓ-indices of an ℓ-interval [i..j] will be denoted by ℓIndices(i, j). If [i..j] is an ℓ-interval such that ω = S[Suftab[i]..Suftab[i]+ℓ-1] is the longest common prefix of the suffixes S_Suftab[i], S_Suftab[i+1], …, S_Suftab[j], then [i..j] is also called an ω-interval.

Based on the analogy between the suffix array and the suffix tree, it is desirable to enhance the suffix array with additional information so that, for any ℓ-interval [i..j], all its child intervals can be determined in constant time. This is done by enhancing the suffix array with two tables.
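As a small illustration of ℓ-indices and ω-intervals, the sketch below lists the ℓ-indices of a given interval and extracts its common prefix ω. The suffix and Lcp tables are written out by hand for S = acaaacatat$ under the assumption that $ sorts after every letter. The function names l_indices and omega are hypothetical, and the interval passed in is assumed to be a valid Lcp-interval.

```python
s = "acaaacatat$"
# Assumption: tables for s with "$" sorted after every letter,
# worked out by hand for this sketch.
suftab = [2, 3, 0, 4, 6, 8, 1, 5, 7, 9, 10]
lcptab = [0, 2, 1, 3, 1, 2, 0, 2, 0, 1, 0]


def l_indices(lcptab, i, j):
    """Return (l, indices) for [i..j], assumed to be an lcp-interval."""
    inner = range(i + 1, j + 1)
    l = min(lcptab[k] for k in inner)            # the lcp-value of the interval
    return l, [k for k in inner if lcptab[k] == l]


def omega(s, suftab, lcptab, i, j):
    """Common prefix of length l shared by all suffixes in [i..j]."""
    l, _ = l_indices(lcptab, i, j)
    start = suftab[i]
    return s[start:start + l]


print(l_indices(lcptab, 0, 5), omega(s, suftab, lcptab, 0, 5))
print(l_indices(lcptab, 6, 7), omega(s, suftab, lcptab, 6, 7))
```

For the interval [0..5] this yields ℓ = 1 with ℓ-indices 2 and 4 and ω = a; for [6..7] it yields ℓ = 2 with ℓ-index 7 and ω = ca.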

Enhanced Suffix array<br />

The new data structure consists of the suffix array, the Lcp-interval table, and an additional table: the child-table cldtab shown in Table 2.

The child-table is a table of size n+1 indexed from 0 to n, and each entry contains three values: up, down, and nextℓIndex. Each of these three values requires 4 bytes in the worst case. The values of each cldtab-entry are defined as follows (it is assumed that min ∅ = max ∅ = ⊥):

1. cldtab[i].up = min {q ∈ [0..i-1] | Lcptab[q] > Lcptab[i] and for all k ∈ [q+1..i-1]: Lcptab[k] ≥ Lcptab[q]}

2. cldtab[i].down = max {q ∈ [i+1..n] | Lcptab[q] > Lcptab[i] and for all k ∈ [i+1..q-1]: Lcptab[k] > Lcptab[q]}

3. cldtab[i].nextℓIndex = min {q ∈ [i+1..n] | Lcptab[q] = Lcptab[i] and for all k ∈ [i+1..q-1]: Lcptab[k] > Lcptab[i]}
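The three definitions can be evaluated literally to see which entries they produce. The sketch below is a brute-force reading of the set definitions (quadratic or worse in n), intended only as an illustration and not as the construction used in practice. It uses None for an empty set, implements the down rule with the strict comparison Lcptab[k] > Lcptab[q] so that down points at the first ℓ-index of a child interval, and takes the Lcp table of S = acaaacatat$ as given under the assumption that $ sorts after every letter. The function name child_table is hypothetical.

```python
def child_table(lcptab):
    """Brute-force up/down/nextlIndex values; None stands for an empty set."""
    n = len(lcptab) - 1
    up, down, next_l = [], [], []
    for i in range(n + 1):
        # up: smallest q < i with a larger lcp-value and nothing smaller
        # than Lcptab[q] strictly between q and i.
        up.append(min(
            (q for q in range(0, i)
             if lcptab[q] > lcptab[i]
             and all(lcptab[k] >= lcptab[q] for k in range(q + 1, i))),
            default=None))
        # down: largest q > i with a larger lcp-value such that every
        # value strictly between i and q exceeds Lcptab[q].
        down.append(max(
            (q for q in range(i + 1, n + 1)
             if lcptab[q] > lcptab[i]
             and all(lcptab[k] > lcptab[q] for k in range(i + 1, q))),
            default=None))
        # nextlIndex: nearest q > i with the same lcp-value and only
        # larger values in between.
        next_l.append(min(
            (q for q in range(i + 1, n + 1)
             if lcptab[q] == lcptab[i]
             and all(lcptab[k] > lcptab[i] for k in range(i + 1, q))),
            default=None))
    return up, down, next_l


# Assumption: Lcp table of "acaaacatat$" with "$" sorted after every letter.
lcptab = [0, 2, 1, 3, 1, 2, 0, 2, 0, 1, 0]
up, down, next_l = child_table(lcptab)
for i in range(len(lcptab)):
    print(i, up[i], down[i], next_l[i])
```

For this table the sketch gives, for example, down = 2 and nextℓIndex = 6 at index 0, and up = 2, down = 7, nextℓIndex = 8 at index 6.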

The child-table stores the parent-child relationship of Lcp-intervals. For an ℓ-interval [i..j] whose ℓ-indices are i1 < i2
