Bioinformatics Algorithms: Techniques and Applications

FIGURE 8.4 The tree of strong common intervals of the permutations G3 and G4. Leaves are ordered according to G4.

(1) [Q-nodes] Any union of consecutive children of N is a common interval of G.

(2) [P-nodes] No union of consecutive children of N is a common interval of G, except the union of all its children; in this case the union equals N itself.

In PQ-trees, the P-nodes are traditionally depicted as roundish boxes and the Q-nodes as rectangular boxes. The tree of Fig. 8.4 has only Q-nodes. A more general example is given by the tree of Fig. 8.5, which represents the strong common intervals of the permutations:

G5 = ( 1 2 3 4 5 6 7 8 9 10 11 )
G6 = ( 1 4 2 5 3 11 10 8 9 7 6 ).

FIGURE 8.5 The tree of strong common intervals of the permutations G5 and G6. Leaves are ordered according to G6.

In Fig. 8.5, the node corresponding to the strong common interval {4, 2, 3, 5} is a P-node, since no union of consecutive children is a common interval. This representation of strong common intervals allows them to serve as a basis for generating all common intervals of a set of permutations. We have the following result.
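The example of Fig. 8.5 can be checked by brute force. The sketch below is only an illustration under stated assumptions: it is a quadratic enumeration, not one of the published algorithms, it assumes the first permutation is the identity (as G5 is), and the function names are hypothetical.

```python
# Brute-force sketch; assumes the other permutation is the identity 1..n.

def common_intervals(perm):
    """Common intervals of the identity and `perm`, as (low, high) value ranges.

    A window perm[a..b] holds consecutive integers exactly when
    max - min == b - a, so its elements also form an interval of the identity.
    """
    n = len(perm)
    found = set()
    for a in range(n):
        low = high = perm[a]
        for b in range(a, n):
            low = min(low, perm[b])
            high = max(high, perm[b])
            if high - low == b - a:
                found.add((low, high))
    return found


def overlaps(x, y):
    """True when two ranges intersect and neither contains the other."""
    (a, b), (c, d) = x, y
    return a < c <= b < d or c < a <= d < b


def strong_intervals(intervals):
    """Common intervals that overlap no other common interval."""
    return {x for x in intervals if not any(overlaps(x, y) for y in intervals)}


G6 = [1, 4, 2, 5, 3, 11, 10, 8, 9, 7, 6]
common = common_intervals(G6)
strong = strong_intervals(common)

# Strong intervals with more than one element are the internal nodes of the tree.
internal = sorted(s for s in strong if s[0] < s[1])
print(internal)  # [(1, 11), (2, 5), (6, 11), (8, 9)]

# The children of the node {2,...,5} are single elements, so it is a P-node
# when no smaller multi-element common interval lies inside it.
inside = [c for c in common if 2 <= c[0] < c[1] <= 5 and c != (2, 5)]
print(inside)  # []
```

The four internal nodes found this way match the internal nodes drawn in Fig. 8.5, and the empty list confirms that {4, 2, 3, 5} behaves as a P-node.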
Proposition 8.3 [10] Let T be the PQ-tree of the strong common intervals of a set G of permutations, ordered according to one of the permutations in G. A set S is a common interval of G if and only if it is the union of consecutive children of a Q-node or the union of all children of a P-node.

8.4.1.1 Computing Common Intervals and Strong Intervals

The algorithmic history of efficient computation of common and strong intervals has an interesting twist. From the start, Uno and Yagiura [50] proposed an algorithm, with theoretical running time O(n + N), to compute the common intervals of two permutations, where n is the number of elements of each permutation and N is the number of common intervals of the two permutations. Such an algorithm can be considered optimal, since it runs in time proportional to the sum of the size of the input and the size of the output. However, the authors acknowledged that their algorithm was "quite complicated" and that, in practice, simpler O(n^2) algorithms run faster on randomly generated permutations.

Building on Uno and Yagiura's analysis, Heber and Stoye [27] proposed an algorithm to generate all common intervals of a set of K permutations in time proportional to Kn + N. They achieved the extension to K permutations by considering the set of irreducible common intervals, that is, common intervals that are not the union of two overlapping common intervals. Like the strong intervals, the irreducible common intervals form a basis of size O(n) that generates the common intervals, here by unions of overlapping irreducible intervals.

The drawback of these algorithms is that they use complex data structures that are difficult to implement. A simpler way to generate the common intervals is to compute a basis that generates intervals using intersections instead of unions.
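To make the notion of irreducible common intervals concrete, here is a minimal sketch that enumerates the common intervals of K permutations naively, in O(Kn^2) time, and then filters out those that are unions of two overlapping ones. It is not the algorithm of Heber and Stoye; the function names are hypothetical, and restricting attention to intervals of at least two elements is an assumption made for this illustration.

```python
# Naive sketch of irreducible common intervals; names are hypothetical.

def common_intervals(perms):
    """Common intervals (size >= 2) of a list of permutations, as frozensets."""
    ref = perms[0]
    n = len(ref)
    # Position of every element in every permutation.
    pos = [{v: p for p, v in enumerate(perm)} for perm in perms]
    found = set()
    for a in range(n):
        low = [pos_k[ref[a]] for pos_k in pos]
        high = list(low)
        for b in range(a + 1, n):
            for k, pos_k in enumerate(pos):
                p = pos_k[ref[b]]
                low[k] = min(low[k], p)
                high[k] = max(high[k], p)
            # ref[a..b] is contiguous in permutation k when its positions
            # span exactly b - a + 1 slots.
            if all(h - l == b - a for l, h in zip(low, high)):
                found.add(frozenset(ref[a:b + 1]))
    return found


def overlap(x, y):
    """True when the sets intersect and neither contains the other."""
    return bool(x & y) and not (x <= y or y <= x)


def irreducible(intervals):
    """Common intervals that are not the union of two overlapping ones."""
    reducible = {x | y for x in intervals for y in intervals if overlap(x, y)}
    return intervals - reducible


G5 = tuple(range(1, 12))
G6 = (1, 4, 2, 5, 3, 11, 10, 8, 9, 7, 6)
all_common = common_intervals([G5, G6])
basis = irreducible(all_common)
print(len(all_common), len(basis))  # 15 8
```

On G5 and G6, 8 of the 15 multi-element common intervals are irreducible; for instance {8, 9, 10, 11} is reducible because it is the union of the overlapping common intervals {8, 9, 10} and {10, 11}.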
Definition 8.7 Let G be a set of K permutations on n elements that contains the identity permutation. A generator for the common intervals of G is a pair (R, L) of vectors of size n such that

(1) R[i] ≥ i and L[j] ≤ j for all i, j ∈ {1, 2, ..., n},

(2) (i, ..., j) is a common interval of G if and only if (i, ..., j) = (i, ..., R[i]) ∩ (L[j], ..., j).

It is not immediate that such generators even exist, but it turns out that they are far from unique, and some of them can be computed using elementary data structures such as stacks and arrays [10]. The algorithms are easy to implement, and the theoretical complexity is O(Kn + N). The strong common intervals can also be computed in O(Kn).

8.4.1.2 The Use of Common Intervals in Comparative Genomics

Datasets based on permutations that use real "genes" are not frequent in comparative genomics since real genes are often found in several copies within the genome of an organism. In order to obtain permutations, it is possible to eliminate all duplicates, or even better,
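Returning to Definition 8.7, one way to see that a generator exists is to take R[i] as the largest right end of a common interval starting at i, and L[j] as the smallest left end of a common interval ending at j. This works because the intersection of two common intervals that share an element is again a common interval. The sketch below builds this pair by brute force and checks both conditions of the definition. It is an illustration only, not the stack-based O(Kn + N) algorithm of [10]; the function names are hypothetical, and the identity is assumed to be perms[0].

```python
# Brute-force generator sketch; assumes perms[0] is the identity 1..n.

def common_table(perms):
    """common[i][j] is True when {i,...,j} is contiguous in every permutation."""
    n = len(perms[0])
    pos = [{v: p for p, v in enumerate(perm)} for perm in perms]
    common = [[False] * (n + 1) for _ in range(n + 1)]  # index 0 unused
    for i in range(1, n + 1):
        low = [p[i] for p in pos]
        high = list(low)
        for j in range(i, n + 1):
            for k, p in enumerate(pos):
                low[k] = min(low[k], p[j])
                high[k] = max(high[k], p[j])
            common[i][j] = all(h - l == j - i for l, h in zip(low, high))
    return common


def naive_generator(common):
    """R[i]: largest right end from i; L[j]: smallest left end up to j."""
    n = len(common) - 1
    R = [0] * (n + 1)
    L = [0] * (n + 1)
    for i in range(1, n + 1):
        R[i] = max(j for j in range(i, n + 1) if common[i][j])
        L[i] = min(k for k in range(1, i + 1) if common[k][i])
    return R, L


def satisfies_definition(common, R, L):
    """Check conditions (1) and (2) of Definition 8.7 for every i <= j."""
    n = len(common) - 1
    for i in range(1, n + 1):
        if R[i] < i or L[i] > i:
            return False
        for j in range(i, n + 1):
            meet = set(range(i, R[i] + 1)) & set(range(L[j], j + 1))
            if common[i][j] != (meet == set(range(i, j + 1))):
                return False
    return True


G5 = list(range(1, 12))
G6 = [1, 4, 2, 5, 3, 11, 10, 8, 9, 7, 6]
table = common_table([G5, G6])
R, L = naive_generator(table)
print(R[1:])  # [11, 11, 3, 4, 5, 11, 11, 11, 9, 11, 11]
print(L[1:])  # [1, 2, 3, 4, 1, 6, 6, 8, 6, 6, 1]
print(satisfies_definition(table, R, L))  # True
```

For example, (6, ..., 8) is rejected because L[8] = 8, so (6, ..., R[6]) ∩ (L[8], ..., 8) is just (8), while (6, ..., 9) is accepted because L[9] = 6 and R[6] = 11.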