Quick review 2012-2013 - LIFL

Laurent

LIFL, Université Lille 1 - INRIA

Journées au vert

June 11 and 12, 2013

But first ...


Quick review 2011-2012

Laurent

LIFL, Université Lille 1 - INRIA

Journées au vert

June 13 and 14, 2012

Last year I ended with:

Thank you for the many PJIs supervised this year!!

Do not hesitate to propose even more next year :-)

So I can start again, now adding:

Thank you to this year's permanent and occasional jury chairs!!

Do not hesitate to take on even more (module) next year :-)


Friday, June 1

Date Time Room # Title Author Supervisor Student1 Student2

2012-06-01 08h00 M5-A7 124 


2012-06-01 09h00 M5-A7 96 Intégration d'un mécanisme de récompenses pour des Nicolas Haderer Romain Rouvoy Nacim Hamdad Benjamin Bertein 

2012-06-01 09h30 M5-A7 47 Conception d'une interface Web pour la visualisation Adel Boris Couturier 

Noureddine Romain Rouvoy Jonathan Decrocq 

2012-06-01 10h45 M5-A7 93 [AGIL-IT] Site de suivi des demandes de l'accueil RH Lionel Drain 

2012-06-01 11h15 M5-A7 92 Régis Servant 

[AGIL-IT] [http://www.mobitic.fr/] Développement et Patricia Plénacoste Loïc Daara Sébastien Poulmane 

2012-06-01 11h45 M5-A7 50 Vers un campus ubiquitaire et social Yvan Peter Yvan Peter 

2012-06-01 08h00 M5-A8 33 Classification des Historiques de Ventes en grande 

2012-06-01 08h30 M5-A8 45 Application web de consultations de données Jean-Christophe Routier Jean-Christophe Routier 

2012-06-01 09h00 M5-A8 128 Interface de saisie et de restitution de données Jean-Christophe Routier Jean-Christophe Routier Julien Milan 

2012-06-01 09h30 M5-A8 88 Application web de gestion du flux des achats Bruno Bogaert Bruno Bogaert Benjamin Jacquet 

2012-06-01 10h45 M5-A8 63 Souris 3D 

2012-06-01 11h15 M5-A8 1 Suivi multi-flux d'objets mobiles Chabane Djeraba Chabane Djeraba Alexandre Mandy 

2012-06-01 11h45 M5-A8 109 Comparaisons de séquences musicales symboliques Mathieu Giraud Mathieu Giraud Corentin Bertiaux Anthony Lerouge 

2012-06-01 14h00 M5-A7 86 Annotation de génomes Sylvain Denis 

Hélène Touzet Mikael Salson Pauline Wauquier 

2012-06-01 14h30 M5-A7 112 Mise en place d'une base de données des élus et mMikaël Salson Mikaël Salson Goulven Rozec Patience Ngami-Nana 

2012-06-01 15h00 M5-A7 66 distrSégolène Caboche Eric Piette 

Développement d'un outil de visualisation de la Mikael Salson Luigi Palmiero 


2012-06-01 16h45 M5-A7 11 Recherche dans des millions de courtes séquencesMikaël Salson 

Mikaël Salson 

Florian Recourt 

2012-06-01 17h15 M5-A7 94 [AGIL-IT] Création d’un site internet de promotion de Julien Bliart 

Laurent Noé 



MINY - Multimodality Is Nice for You! Xavier Le Pallec Xavier Le Pallec Alain Laraki 

2012-06-01 14h30 M5-A9 102 Clement Dufour 

2012-06-01 15h00 M5-A9 60 Jean Martinet 

Kinect et ZCam, le face à face Amel Aissaoui Ramy Arbid Antoni Pauchet 

2012-06-01 15h30 M5-A9 41 Extraction d'information de Twitter Ali Abbas 

Luigi Lancieri Eric Lepretre David Deroo 

desSamuel Blanquart Samuel Blanquart Benoît-Charles Detuncq 

2012-06-01 16h45 M5-A9 35 Développement d'une interface de visualisation Yannick Leroy 

2012-06-01 17h15 M5-A9 37 Développement de sites déployables pour la gestion Samuel Blanquart Samuel Blanquart Ismael Souissi Samuel Queniart 


Implémentation d'un jeu de tir à l'arc avec Kinect Thomas Pietrzak Thomas Pietrzak Guillaume Devos Joffrey Hochart 

Visage en relief avec Zcam : application à la détectiAfifa Dahmane Afifa Dahmane Bahare Shirazi 

Patricia Plénacoste 

Emmanuel Ardiot 

Simon Debaecke 

Christophe Leemans 

Sylvain Mongy Jean-Stéphane Varré Benjamin Fisset Taha Touati 

Kouami-Aderibgbe Adekambi 

Marc Duez 

Matthieu Calmels 

Cédric Montay 

Géry Casiez Géry Casiez Kevin Pollaert Axel Delahaye 

Alecsia : Aide à L'Evaluation et à la Correction SemMikaël Salson Mikaël Salson Ludovic Loridan 

Tony Proum 

Développement d’une base de données de glycosylAnne Harduin-Lepers Olgo Plechakova & Maria-CeciliaAnthony Tonglet Antoine Baluzolanga-Kiatoko 

Paint collaboratif par Smartphones Xavier Le Pallec Xavier Le Pallec Maxime Raverdy Damien Level 

Développement avec un framework PHP pour aiderVincent Vatelot Gilles Vanwormhoudt Adel Ben-Elkrizi Mamadou Cellou Dara Diallo

Monday, June 4


2012-06-04 08h00 M5-A7 90 [ATOS-IT] MOW (groupe 1 : étudiants 1 et 2) 

Lionel Seinturier Lionel Seinturier Gaetan Mallants 

2012-06-04 08h30 M5-A7 91 [ATOS-IT] MOW (groupe 1 : étudiants 3 et 4) Manuel Servais 

2012-06-04 09h00 M5-A7 121 Client WindowsPhone pour plate-forme d'échange de Pierre Kopaczewski Lionel Seinturier Matthias Mellouli Louis Dekeister 

2012-06-04 09h30 M5-A7 80 Cloud Security - Simulations d'attaques distribuées Damien Riquet 

Gilles Grimaud Ludovic Moreau 

2012-06-04 10h45 M5-A7 2 Recherche de sous-graphes dans un graphe, appliquée Laurent Noé 

Maude Pupin 

2012-06-04 11h15 M5-A7 22 Développement d'un outil de dessin facilitant la créat Valérie Leclère 

Maude Pupin 

2012-06-04 11h45 M5-A7 34 Implémentation d’une solution BI et analyse technolSylvain Maude Pupin 

Mongy Adil Ayar Morgan Auchede 


2012-06-04 08h30 M5-A8 72 bas Maria Cecilia Arias 

Développement d’interfaces graphiques pour une Olga Plechakova Gorgui-Djire Ndong Naby Gueye 

2012-06-04 09h00 M5-A8 18 Base de données Intranet du matériel biologique d'une Christophe Remi Duriez 

D'Hulst Olga Plechakova Benjamin Bellangeon 

2012-06-04 09h30 M5-A8 126 intégrat Jean-Frédéric Berthelot Jean-Frédéric Berthelot 

Développement d’extensions pour CMS pour Mickael Lemaitre Jerome Deboffles 

2012-06-04 10h45 M5-A8 127 permettant Jean-Frédéric Berthelot Jean-Frédéric Berthelot 

Développement d’un plugin pour digiKam Iliya Ivanov Nathan Damie 



2012-06-04 14h00 M5-A7 21 Algorithmes de recherche locale pour l’optimisation Arnaud Liefooghe Arnaud Liefooghe Yoann Dufresne 


2012-06-04 15h00 M5-A7 7 Site Web 2.0 communautaire d'appariement 


2012-06-04 16h45 M5-A7 55 Module de construction moléculaire 3D Fabrice Aubert 

Sébastien Canneaux Nadir Cherifi Mohamed El-Amrani 

2012-06-04 17h15 M5-A7 69 Fabrice Aubert Fabrice Aubert 

Lecteur web de vidéos 360 et WebSocket Nathanaël Deboeuf Pierre Denquin 

2012-06-04 17h45 M5-A7 95 Pierre-Hubert Olivier 

[SCOTLER] Scotler C&C WA Céline Kuttler Jamal-Dine Youlhajen 

2012-06-04 14h00 M5-A9 144 [ALTERNANT] étudiant:Kévin Labat & entreprise:Audax Vincent Cordonnier Maude Pupin 

2012-06-04 14h30 M5-A9 145 [ALTERNANT] étudiant:Valentin Lecerf & entreprise:Laurent Vansuypeene Pierre Boulet 

2012-06-04 15h00 M5-A9 136 [ALTERNANT] étudiant:Nicolas Cousin & entreprise: Céline Bilasco 

Céline Kuttler 

Nicolas Cousin 

2012-06-04 16h15 M5-A9 146 [ALTERNANT] étudiant:Florian Ledoux & entreprise: Nicolas Ruff 

Sophie Tison Florian Ledoux 

2012-06-04 16h45 M5-A9 147 [ALTERNANT] étudiant:Benoit Petit & entreprise:Quadr Sébastien Lucas Laetitia Jourdan Benoit Petit 

2012-06-04 17h15 M5-A9 140 [ALTERNANT] étudiant:Guillaume Gallant & entreprFrançois Pasquereau Laetitia Jourdan 

Lionel Seinturier Lionel Seinturier Nicolas Crappe Alexandre Dubus 

Evelyne Ferot 

Remi Degruson 

Clément Pasek 

Développement d'un outil interactif de contrôle de la Valérie Leclère Olga Plechakova Chaste Isabane Anais Ngo-Xuan-Coi 

Implementation of a web server for protein function Marc Lensink Guillaume Brysbaert Qiang Liu Roshanak Gharagozlou 

Development of an XML language for protein function Marc Lensink Guillaume Brysbaert Doga Ozturk 

Optimisation de ressources dans les "clouds" GooglFrançois Clautiaux Arnaud Liefooghe Charle-Edmond Bihr 

Maxime Morge Maxime Morge Valentine Maillart Tristan Bourgois 

Middleware DDS en environnement métro Christophe Gransart Christophe Gransart Seilendria Hadiwardoyo 

Kévin Labat 

Valentin Lecerf 

Guillaume Gallant

Tuesday, June 5


2012-06-05 08h00 M5-A9 122 Rubik Francesco Eric Wegrzynowski Oscar Gest 

Lego (A) De Comite 

Geoffrey Verhille 

2012-06-05 08h30 M5-A9 125 Rubik Lego (B) Francesco De Comite Eric Wegrzynowski 

Guillaume Macke 2012-06-05 09h00 M5-A9 24 Kinect Jean-Claude Tarby Jean-Claude Tarby Cyprien Cuvillier 

Applications Jean-Claude Tarby Jean-Claude Tarby Claude Saint-Georges 

2012-06-05 09h30 M5-A9 58 pilotées par le cerveau 

2012-06-05 10h45 M5-A9 77 Migration d’une base de données relationnelles en bas Céline Anas El-Achiqi 

Bilasco Marius Bilasco 

Marius Bilasco Marius Bilasco 

Warren Moreau 

Web of Metadata 2012-06-05 11h15 M5-A9 123 

2012-06-05 11h45 M5-A9 130 Reprise d'une application pour la génération du WHO Marius Frederic Bellano Bilasco Marius Bilasco Larbi Noufli 

[ALTERNANT] étudiant:Kévin Defives & entreprise:ORomain Lahoche 

Sophie Tison 

Kévin Defives 


entreprise: Jean-Jacques Decrucq Mikaël Salson 

2012-06-05 09h30 M5-A8 142 [ALTERNANT] étudiant:Geoffrey Hecht & Geoffrey Hecht 

2012-06-05 10h15 M5-A8 134 [ALTERNANT] étudiant:Christopher Coat & entreprisJonathan Christopher Coat 

Carpentier Patricia Plenacoste 

[ALTERNANT] étudiant:Guillaume Dauster & entrepr Jonathan Alexandre Sedoglavic Guillaume Dauster 

2012-06-05 11h15 M5-A8 137 Carpentier 

[ALTERNANT] étudiant:Vincent Herbulot & entreprisDesrumaux Romain Rouvoy Vincent Herbulot 


Sabine 

[ALTERNANT] étudiant:Henri Roussez & entreprise:Bertrand Hudzia Philippe Marquet Henri Roussez 


[ALTERNANT] étudiant:Amaury David & entreprise:OLaurent Decool 

Philippe Marquet 

Amaury David 



Maxime Colmant 

[ALTERNANT] étudiant:Maxime Colmant & entreprisRalida Azzi Xavier Le Pallec 

[ALTERNANT] étudiant:Kevin Guilbert & entreprise:Yann Marzack Xavier Le Pallec Kevin Guilbert 


Chair: Xavier Le Pallec / Laetitia Jourdan

Chair: Fabrice Aubert

Chair: Laurent Noé

Chair: Mikael Salson

A few points on the “administrative” side

1. Licence Cybersécurité (cybersecurity bachelor's degree): 2012-2013 for 2013-2014 ...

2. UPMC: McF

3. PJI, my friends :-)

4. Reviews (only seeds this year, or almost ...)


A few points on research

1. Mappi (mid-term slides + cf. Jenya's talk)

2. Seeds and Product (draft)

3. Peptide Matching → internship of Yoann Dufresne


MAPPI

5 tasks; I will describe those carried out in the Lille context

- Task 1: New index structures for approximate pattern matching
- Task 2: Mapping for metagenomics and metatranscriptomics
- Task 3: Assembly tools for NGS
- Task 4: Guided assembly of metatranscriptomic data
- Task 5: Bioinformatics pipeline

Task 1: New index structures for approximate pattern matching

Context: Read Mapping

Done:

1. Port of the Wu-Manber algorithm to GPU
[Bit-Parallel Multiple Pattern Matching. T. T. Tran, M. Giraud, J.-S. Varré. PPAM / PBC 2011.]

2. Indexing of k-mer neighborhoods

Goal: take advantage of the efficiency of the GPU/CPU cache

Two indexing methods considered:

- direct indexing (sort the words → binary search)
- perfect hashing

+ not yet published, but some results:

- OpenCL implementation (working on CPU and GPU)
- performance gain between x10 and x60
- read mapper prototype in progress

[LIFL] Tuan Tu Tran, Mathieu Giraud, Jean-Stéphane Varré

[LIAFA] Djamal Belazzougui, Mathieu Raffinot
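The “direct indexing” idea (sort the words, then search by dichotomy) can be illustrated with a minimal sketch. It is not the OpenCL implementation mentioned on the slide: the function names are hypothetical, the neighborhood is assumed to be “at most one substitution”, and the neighborhood is enumerated at query time rather than stored in the index.

```python
from bisect import bisect_left, bisect_right

def build_index(text, k):
    """Sorted list of (k-mer, position) pairs: the 'direct indexing' variant."""
    return sorted((text[p:p + k], p) for p in range(len(text) - k + 1))

def lookup(index, kmer):
    """Binary search for all positions of an exact k-mer."""
    lo = bisect_left(index, (kmer, -1))
    hi = bisect_right(index, (kmer, float("inf")))
    return [pos for _, pos in index[lo:hi]]

def neighborhood(kmer, alphabet="ACGT"):
    """Assumed neighborhood: the k-mer itself plus all single substitutions."""
    words = {kmer}
    for p, c in enumerate(kmer):
        for a in alphabet:
            if a != c:
                words.add(kmer[:p] + a + kmer[p + 1:])
    return words

def approximate_lookup(index, kmer):
    """Positions of every indexed word lying in the neighborhood of kmer."""
    hits = set()
    for word in neighborhood(kmer):
        hits.update(lookup(index, word))
    return sorted(hits)

index = build_index("ACGTACGA", 3)
print(lookup(index, "ACG"))              # exact occurrences
print(approximate_lookup(index, "ACC"))  # occurrences within one substitution
```

A sorted array keeps the lookups contiguous in memory, which is one plausible reason for the cache-friendliness goal stated above; a perfect hash would replace `lookup` with a single table access.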

Task 4: Guided assembly of metatranscriptomic data

Context: identification of ribosomal RNAs (16S/18S, 23S/28S, ...)

Goals:

- elimination
- classification

A new problem on metatranscriptomic data

[Figure: rRNA secondary structure diagram, 5’ to 3’, positions labelled 100 to 1800, colored by information content (bits). Legend (count): 100% gaps 0; [0.000-0.400) 172; [0.400-0.800) 205; [0.800-1.200) 238; [1.200-1.600) 259; [1.600-1.990) 677; [1.990-2.000] 330. Created by the SSU-ALIGN package (http://eddylab.org/software.html); structure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/).]


Context: identification of ribosomal RNAs

Done:

- design of an efficient filter for selecting rRNA families (SortMeRNA)
- work based on the Burst Trie and the Levenshtein automaton

[Figure: burst trie over the alphabet A, C, G, U, annotated with Levenshtein automaton states]


Context: identification of ribosomal RNAs

In progress:

- talk at the London Stringology Days
- publication being submitted
- visit planned at Génoscope for the transition from the end of Task 4 to the start of Task 2, April 10-13

[LIFL] Evguenia Kopylova (ANR), Laurent Noé, Hélène Touzet

[GENOSCOPE] Olivier Jaillon

Spaced seed design for precise read-mapping on HMM profiles for 

NGS read-mapping 

efficient sliding window product on the matrix semi-group 

Laurent Noé 

May 10, 2012 

Abstract 

We propose a new method and an associated algorithm to efficiently compute seed sensitivity when considering that HTS reads are mapped along sub-parts of a known HMM alignment profile. This computation makes particular sense with positioned spaced seeds. It relies on automata theory (previous work [KNR06]) combined with a matrix product problem.

Interestingly, it brings to light an “interval product problem” considered more than twenty years ago in [AS87], but in a “sliding window” form. We propose here an efficient algorithm to compute this sliding window product using a linear number of products on the (associative, but non-commutative and non-invertible) matrix semi-group.

This computational scheme is implemented in the ongoing 1.06 version of Iedera http://bioinfo.lifl.fr/yass/iedera.php.

1 Introduction 

Spaced seed design remains an important, but complex and challenging problem. Many papers have been devoted to this subject (mainly in the last decade), from the mere (but at first unintuitive) idea that such seeds perform better [CR93, Buh02] and can be optimized [MTL02, BK01], to spaced seed sensitivity definition and computation [KLMT04], extended models of seeds and their computation [BBV05, Bro05, MGB06, CM07, YZ08, II09, KWS+11], and bounds and complexity problems [FCLCST05, NR08, MY09, EM11]. Several software tools are now publicly available to design spaced seeds [SB05, NGK10, IIMB11]¹.

High throughput sequencing technologies have thrown a new light on the seed design process, mainly because the reads obtained are relatively short and quality-labelled. Some of the most sensitive algorithms to map such reads onto related genomes use spaced seeds (SHRiMP [RLD+09], ZOOM [LZZ+08], BFAST [HMN09], PerM [CSC09], LAST [KWS+11], SToRM [NGK10], ...).

However, most of the regular seeds designed within these tools are based on the assumption that the mapped alignment profile remains “unknown”, thus preferring an i.i.d. “randomly” generated profile. There are several (if not many) cases where this assumption can be removed, because the profile of what is searched / filtered out is known (prior knowledge on the sequences being searched).

We propose in the main part of this paper an extended method to efficiently compute seed sensitivity or the lossless property when considering that short reads are mapped on substrings of a known HMM alignment profile. This computation is especially useful when designing positioned spaced seeds. It relies mainly on dynamic programming on automata, which can be computed by a set of matrix products along overlapping intervals.

DRAFT 

¹ Currently, more than one hundred references have been directly related to the spaced seeds problem; see for example http://www.lifl.fr/~noe/spaced_seeds.html

This “interval product problem” has been considered in [AS87], where the authors provide an efficient solution in terms of preprocessing, in order to be able to answer any product query with a given constant bound k on the number of products. We propose here to consider this “interval product problem” with an incremental aspect, using a form of “sliding window”, and propose an efficient algorithm to compute it using a linear number of products on the (associative, but non-commutative and non-invertible) matrix semi-group.

In part 2, we briefly recall the seed design principle, focusing on the seed sensitivity computation. We then state the (matrix) product problem in part 3, and show how it can be solved. Finally, in part 4, we give some measurements on a practical implementation included in the ongoing 1.06 version of Iedera http://bioinfo.lifl.fr/yass/iedera.php.

2 Seed design process 

Spaced seeds are now a frequently used hashing technique for biological sequence analysis. Their implementation (as direct hashing) is straightforward and brings high sensitivity for the same theoretical selectivity. Interestingly, in practice, a slightly reduced computational cost can also be observed when using spaced seeds compared to contiguous seeds of the same weight.

Spaced seeds have been generalized by several extended seed models (Vector seeds [BBV05], Indel seeds [MGB06], Subset seeds [KNR06, ZF07, YZ08], Neighbor seeds [CM07]). To increase the overall sensitivity, they can usually be designed jointly as multiple seeds [YWC+04, SB05], and (on quality-labelled sequences) as positioned seeds [LZZ+08, NGK10].

In addition to the seed model, one needs a selection criterion for good seed shapes: this criterion is (almost always) established on a model of the alignment being searched (usually a word on a match/mismatch binary alphabet), itself “weighted” by a probabilistic model. Here again the initially proposed i.i.d. Bernoulli model [KLMT04] has been extended into Markov models [BKS05] and HMM models [BBV04], with several extensions [MB07, CP10].

In practice, the criterion considered to select good spaced seed shapes is “the probability to hit at least once” (sensitivity), or the guarantee to always hit at least once (lossless property).

Such a criterion can be measured by a dynamic programming algorithm on automata, with a probabilistic model (a probabilistic automaton, e.g. HMM (vinar) ) - represented by regular expressions - computation involved -

3 Matrices product 

Finite automata are frequently represented by matrices (obviously sparse matrices when DFA are used). In practice, matrices are multiplied or raised to a power, in such a way that properties of the initial languages of the finite automata are computed on “semi-rings”: for example, probabilities are computed on a classical semi-ring ($E = \mathbb{R}_{0 \le r \le 1}$, $\oplus = +$, $\odot = \cdot$, $0_\oplus = 0$, $1_\odot = 1$), whereas costs are computed on a tropical semi-ring [Sim88, Pin98, MS09] ($E = \mathbb{N}$, $\oplus = \min$, $\odot = +$, $0_\oplus = \infty$, $1_\odot = 0$). Sometimes (but not always), on tropical semi-rings, such costs are log probability ratios; in that case, the underlying problem one has to solve is to find the best path (if any) in terms of expected value.

More generally, on both classical and tropical semi-rings, the same algorithm can be applied to compute seed sensitivity [KNR06] for (what is commonly named) the lossy (classical semi-ring) or lossless (tropical semi-ring) seed design framework.
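To make the “same algorithm on both semi-rings” remark concrete, here is a minimal sketch of a matrix product parameterized by the semi-ring operations. The function names and the toy matrices are illustrative assumptions, not code from Iedera.

```python
INF = float("inf")

def semiring_matmul(A, B, add, mul, zero):
    """Matrix product where (add, mul, zero) define the semi-ring."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[zero] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = zero
            for k in range(inner):
                acc = add(acc, mul(A[r][k], B[k][c]))
            C[r][c] = acc
    return C

def classical(A, B):
    """Classical semi-ring: add is +, mul is *, neutral element of add is 0."""
    return semiring_matmul(A, B, lambda x, y: x + y, lambda x, y: x * y, 0.0)

def tropical(A, B):
    """Tropical semi-ring: add is min, mul is +, neutral element of add is infinity."""
    return semiring_matmul(A, B, min, lambda x, y: x + y, INF)

P = [[0.5, 0.5], [0.0, 1.0]]   # toy transition probabilities
W = [[0, 2], [1, INF]]         # toy transition costs (INF = no transition)
print(classical(P, P))         # two-step probabilities
print(tropical(W, W))          # cheapest two-step costs
```

Only the three semi-ring parameters change between the lossy and the lossless setting; the triple loop is identical.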

On the classical semi-ring, HMM models (HMM alignment models) are frequently used in language recognition and seed sensitivity computation [BBV04, KNR06, HR08]: they give a set of probabilities (emission probabilities for each state, together with transition probabilities between states) that are computed out of a “profile” alignment. But when such HMM models have to be used with NGS reads to design seeds, one has to face a new problem: taking into account the fact that the read can be any sub-string generated by the HMM alignment model, and thus that the computation may start at any “position” on the alignment HMM. This is, in some way, a more challenging problem.

3.1 Sliding windows product 

Such a computation, translated into matrix form, requires computing, for an ordered set of (non-invertible) matrices $M_1, M_2, \ldots, M_n$, a set of products in one of the two following forms:

Problem 1. Compute

$$\prod_{u=i}^{i+w} M_u \quad \forall i \in [1..n-w] \qquad (1)$$

where $w$ is the length of the read,

or more generally:

Problem 2. Compute

$$\prod_{u=i(t)}^{j(t)} M_u \quad \forall t \text{ with } i(t) \le j(t) \qquad (2)$$

such that $i(t)$ and $j(t)$ are two monotonically increasing functions (each step adds +0 or +1).

Definition (2) is particularly well suited when the length of the read is not fixed: for example with the 454 sequencing process, where homopolymers are read in a single step and thus give variable read lengths. In other words, definition (1) is just a special case of (2), where after increasing $j$ up to $w$, a stepwise increment of both $i$ and $j$ is applied. We will thus consider the second definition (2) in the next parts.
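As a baseline for what follows, the products of form (1) can be computed naively, recomputing each window from scratch. The sketch below assumes 0-based indices, toy 2x2 integer matrices and hypothetical helper names; it counts the number of matrix products spent.

```python
def matmul(A, B):
    """Plain integer matrix product (associative, non-commutative)."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[r][k] * B[k][c] for k in range(m)) for c in range(p)]
            for r in range(n)]

def naive_windows(M, windows, mul):
    """Compute the product M[i] ... M[j] for each (i, j), from scratch.

    Returns the list of products and the number of calls to mul.
    """
    results, count = [], 0
    for i, j in windows:
        acc = M[i]
        for u in range(i + 1, j + 1):
            acc = mul(acc, M[u])
            count += 1
        results.append(acc)
    return results, count

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
M = [A, B, A, A, B, B]                            # toy ordered set of matrices
w = 3
fixed = [(i, i + w) for i in range(len(M) - w)]   # form (1), 0-based
results, count = naive_windows(M, fixed, matmul)
print(len(results), count)
```

Each window of form (1) costs w products here, so the total grows as the number of windows times w; this is the cost the sliding window product aims to reduce to a constant per move.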

3.2 Previous work for the online query product after preprocessing

Alon and Schieber [AS87] have proposed an optimal online way to answer any (non-commutative) product $\prod_{t=i}^{j} M_t$ for any $i$ and $j$ in a constant number $k$ of products, after a preprocessing in $\Theta(n \cdot \lambda(k,n))$, where $\lambda(k,n)$ is defined as the inverse of a certain function at the $\lfloor k/2 \rfloor$-th level of the primitive recursive hierarchy. For example $\lambda(0,n) = \lceil n/2 \rceil$, $\lambda(1,n) = \lceil \sqrt{n} \rceil$, $\lambda(2,n) = \log(n)$, $\lambda(3,n) = \log\log(n)$, $\lambda(4,n) = \log^{*}(n)$.

This fits perfectly when the length of the window and its position are randomly drawn. But when there are dependencies on the positions of the windows, a sliding window product may be more appropriate.

3.3 Algorithm proposed to compute the Sliding windows product 

In our case, online queries are not required, so we can avoid both preprocessing and processing by using an “online sliding window product” that moves the two ends of the window separately or jointly: it costs an amortized constant number of products on problem 1 and problem 2.

This process does not depend on the size of the sliding window in the second problem (which can otherwise be asymptotically improved using an approach similar to [AS87]). We are here able to move both left and right ends $i$ and $j$ of the window step-wise, keeping a set of matrices, and computing the product for any of the windows obtained.

U(k) definition and “pre”-processing: the main additional data is a set of block products $U(k)$ (for $k \in [i..j]$) that is preprocessed and kept, as shown in Figure 1. $U(k)$ is defined as the product of a given contiguous block of matrices of size $u(k,j)$ starting from $k$ (to $k + u(k,j) - 1$). More precisely, $U(k) = \prod_{t=k}^{k+u(k,j)-1} M_t$. $u(k,j)$ is defined as the largest possible value $2^p$ such that $k + u(k,j) - 1 \le j$ and such that $u(k,j)$ divides $k$. Such $U(k)$ blocks are thus of size $u(k,j) = 2^p$, and this size, once fixed, can only increase (by doubling) depending on the value of $j$, before disappearing (when $i > k$).

Maintaining such matrices $U(k)$ for $k \in [i..j]$ costs at most (in amortized analysis) one product per increase of $j$ (see Appendix 6.2). Note that increasing $i$ simply deletes the last $U(i)$ and thus does not require any additional product on the $U(k)$'s. A pseudo-code of the add right process (increment of $j$) is provided in Algorithm 1.

Figure 1: U(k) matrices: example when i = 0 and j = 24

Algorithm 1: add right: increments the right border j by one, and updates the set U_{i..j} using the matrix M_j

Data: the set of matrices M_1, M_2, ..., M_n, the original set U_{i..j}

Result: the updated set U_{i..j}

/* a) only before the first increment */
if j = 0 then
    U_0 ← M_0;

/* b) increment j */
inc(j);

/* c) and process the set of U_{j−t} matrices that have to be updated */
U_j ← M_j;
u ← j + 1; t_old ← 0; t ← 1;
while u is even and j − t ≥ i do
    U_{j−t} ← U_{j−t} . U_{j−t_old};
    t_old ← t; t ← 2.t + 1; u ← u/2;
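The add right step can be transcribed almost line by line. In this sketch (an illustration under stated assumptions, not the Iedera code), indices are 0-based, `U` is a dictionary from block start to block product, and string concatenation stands in for the matrix product, since it is also associative, non-commutative and non-invertible and makes the block contents readable.

```python
def add_right(U, M, i, j, mul):
    """Move the right border from j to j + 1 and update the blocks U[k]."""
    j += 1
    U[j] = M[j]                          # new block of size 1
    u, t_old, t = j + 1, 0, 1
    while u % 2 == 0 and j - t >= i:
        # merge two adjacent blocks of equal size into one of double size
        U[j - t] = mul(U[j - t], U[j - t_old])
        t_old, t, u = t, 2 * t + 1, u // 2
    return j

count = 0
def concat(x, y):
    """Stand-in product that also counts how many products are performed."""
    global count
    count += 1
    return x + y

M = list("abcdefghijklmnopqrstuvwxy")    # 25 elements, indices 0..24
i = j = 0
U = {0: M[0]}                            # step a): only before the first increment
while j < 24:
    j = add_right(U, M, i, j, concat)

print(U[0], U[16], U[24], count)
```

With i = 0 and j = 24 the three blocks that tile the window start at 0, 16 and 24, and the 24 increments of j cost fewer than 24 products in total, in line with the amortized bound of one product per increase of j.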

Without considering any previously kept computation, it is directly possible to compute the product $M_i \cdot M_{i+1} \cdots M_j$ for any $i, j$ ($j > i$) in $O(\log(j - i))$ products using the updated $U(k)$ set of matrices for $k \in [i..j]$ (see Appendix 6.1).
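A minimal sketch of such a query follows: starting from i, it multiplies the block U(k) and jumps to the end of that block, until j is passed. The helper names are hypothetical, the blocks are built here directly from the definition of u(k, j) (not incrementally), and string concatenation again stands in for the matrix product.

```python
from functools import reduce

def block_size(k, j):
    """u(k, j): largest power of two that divides k with k + u - 1 <= j."""
    size = 1
    while k % (2 * size) == 0 and k + 2 * size - 1 <= j:
        size *= 2
    return size

def build_blocks(M, i, j, mul):
    """All U(k) for k in [i..j], computed directly from the definition."""
    return {k: reduce(mul, M[k:k + block_size(k, j)]) for k in range(i, j + 1)}

def range_product(U, i, j, mul):
    """Product M[i] ... M[j] as a chain of U blocks; also returns block starts."""
    k, starts, result = i, [], None
    while k <= j:
        result = U[k] if result is None else mul(result, U[k])
        starts.append(k)
        k += block_size(k, j)
    return result, starts

M = list("abcdefghijklmnopqrstuvwxy")    # indices 0..24
concat = lambda x, y: x + y
U = build_blocks(M, 1, 24, concat)
product, starts = range_product(U, 1, 24, concat)
print(product, starts)
```

For i = 1 and j = 24 the window is covered by six blocks, whose sizes first grow and then shrink, which is where the logarithmic number of products comes from.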

But when the product is computed while $i$ and $j$ follow the “increasing step” functions defined before, the number of products can be reduced to a constant for each step-move of $i$ and $j$ (or for both moves when the distance $w$ separating $i$ and $j$ is fixed):

middle definition: we need to define here the middle $m$ of $i$ and $j$ as the beginning position of the maximal (in size) U-block included in the interval $i..j$. In other words, $m$ corresponds to the value between $i$ and $j$ that can be the “most factorized by 2”. If two maximal blocks are between $i$ and $j$, we choose the beginning of the second block (see Figure 3.3) (as it always corresponds to the value $m$ that can be the “most factorized by 2”). This middle border makes it possible to split the computation in two parts when needed, which we will call left (colored in green on Figure 3.3) and right (red on Figure 3.3). Note that $m < \frac{1}{3} i + \frac{2}{3} j$. Note also that when there is only one maximal sized block, $m < \frac{1}{2} i + \frac{1}{2} j$, and when there are two maximal sized blocks, $m > \frac{2}{3} i + \frac{1}{3} j$.

In the next part, we will compute in two separate parts $M_{i..m-1}$ and $M_{m..j}$, considering first the case when $m$ is fixed, and then two cases when $m$ is increased.
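The block-based definition of the middle can be checked on small cases with a short sketch. It uses a linear scan for clarity only (a real implementation would presumably maintain m incrementally), and the function names are illustrative assumptions.

```python
def block_size(k, j):
    """u(k, j): largest power of two that divides k with k + u - 1 <= j."""
    size = 1
    while k % (2 * size) == 0 and k + 2 * size - 1 <= j:
        size *= 2
    return size

def middle(i, j):
    """Start of the maximal U-block inside [i..j]; on a tie, the second block."""
    best_k, best_size = i, 0
    for k in range(i, j + 1):
        size = block_size(k, j)
        if size >= best_size:            # ">=" keeps the right-most maximal block
            best_k, best_size = k, size
    return best_k

print(middle(1, 24))    # two maximal blocks of size 8 start at 8 and 16
print(middle(9, 15))    # a single maximal block of size 4 starts at 12
```

With i = 1 and j = 24, the left part is then the product over positions 1..15 and the right part the product over positions 16..24.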

Figure 2: U(k) matrices: example when i = 1 and j = 24 (m = 16)

middle unchanged: if we suppose that the middle $m$ does not change during a computational step, it can be observed that:

• when $j$ is increased (so that $j = j_{old} + 1$), updating the product $M_{m..j}$ can be done with one product, considering that we keep the previous computation $M_{m..j_{old}}$. Thus, considering that we also update the $U(k)$ values at the same time, an amortized single product must be added (Amortization on $j$: see Appendix 6.2). Joining $M_{i..m-1}$ with $M_{m..j}$ then costs one extra product, giving a total number of products of three.

• when $i$ is increased ($i = i_{old} + 1$), the previous computation $M_{i_{old}..m-1}$ does not help and can be erased here. However, if we suppose that we keep all the previously computed products $M_{k..m-1}$ in a stack for all the blocks $U_k$ visited before, reusing and updating this part can be done with one single amortized product (Amortization on $m$: see Appendix 6.3). Joining $M_{i..m-1}$ and $M_{m..j}$ then costs one extra product, giving a total number of products of two.

At first sight, a $\{cost(i) \le i + m;\ cost(j) \le 3j\}$ cost is applied (when $m$ does not change). However, this computation has to be updated when $m$ changes; this is considered in the next part:

middle changed: if we suppose that the middle $m$ does change, the previous computation cut in two parts $M_{i..m-1}$ and $M_{m..j}$ is somehow “compromised”. Let us now see when $m$ changes, and moreover why:

• when $m$ changes due to a $j$ increase: as $m$ follows the beginning of the largest right-most $U_k$ block, $j$ can increase the maximal block size by a factor of two, either without changing $m$ (case handled before), or by jumping to the next power-of-two block, thus from $m_{old} = odd \times 2^p$ to $m = (odd+1) \times 2^p = \frac{odd+1}{2} \times 2^{p+1}$. The last case has no consequence on the product $M_{m..j}$, which is immediately computed by the update of the $U(k)$ values, as $M_{m..j}$ corresponds to a single maximal block in $U(k)$, thus in one single product here (and not two).

However, moving $m$ will obviously compromise the left stack of previous computations $M_{k..m_{old}-1}$, which will no longer help the computation of the next $M_{i..m_{old}-1}$ on the next increase of $i$, since $m_{old}$ is now pushed to the next power of two $m$; this stack can be erased.

This extra cost can however be amortized by $\frac{9}{8}$ of $\Delta m$, where $\Delta m$ represents the increase of $m$ (Amortization on $m$, see Appendix 6.4). Joining $M_{i..m-1}$ with $M_{m..j}$ then costs one extra product.

Finally, when $m$ changes due to a $j$ increase, a $\{cost(j) \le 2j + \frac{9}{8} m\}$ cost is applied.

• when $i$ is increased so that $i > m$ (thus $i = m + 1$), $m$ can only “jump” to a next block of smaller size: the cost on the left stack $[i..m-1]$ is already paid, as it corresponds to a “legal” move of $i$ that is amortized by one product as seen previously (Amortization on $m$: see Appendix 6.3).

However, moving $m$ will obviously compromise the right computation of $M_{m_{old}..j}$, since $m_{old}$ is now pushed to the next (smaller) block; this computation can be erased and recomputed.

This cost can however be amortized by $\frac{9}{8}$ of $\Delta m$, where $\Delta m$ is the increase of $m$ (Amortization on $m$, see Appendix 6.5). Joining $M_{i..m-1}$ and $M_{m..j}$ then costs one extra product.

Finally, when $m$ changes due to an $i$ increase, a $\{cost(i) \le i + \frac{9}{8} m\}$ cost is applied.

To conclude, a $\{cost(i) \le i + \frac{9}{8} m;\ cost(j) \le 3j + \frac{1}{8} m\}$ cost is applied.

4 Practical implementation 

First, it is very likely that these bounds can be improved by a more precise analysis. However, going under a bound of 3.0 per move is unlikely, at least without any initial amortized costs, since we have found at least one example for which the number of products is 3.00325 per move².

Moreover, when $j$ is increased by “runs” while $i$ is fixed, the proposed algorithm can be enhanced with a greedy computation of the $M_{i..j}$ product (which can be done quickly provided that $i$ is fixed for a while). In practice, this implementation always gives fewer products than the proposed one, but has not been carefully analysed yet.

On the other hand, some more practical considerations also show that, when applied to sparse matrices, such a product cannot be considered as a “constant” operation, but more likely as a “function of the sparsity”. Such an implementation however needs to know this “sparsity cost” for all the possible products, which, in practice on unknown automata, is similar to simulating the product, and thus costs as much as the product itself...

5 Experiments 

The previous algorithm has been implemented and tested in iedera where

Speedup in practice ... over the naive range product.

On a typical example, for a window of length 108 (which corresponds to the Illumina read length here) and a profile of size 1605 (16S rRNA), the number of products for the full computation of the 1605−108+1 = 1498 windows is 5933 (note that each window needs a displacement on both i and j).
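The figures of this example can be put in perspective with a short calculation. The reported numbers (1605, 108, 5933) come from the text; the naive count is an assumption made here for comparison, namely that each window of w matrices is recomputed from scratch with w − 1 products.

```python
profile_len, window_len, reported_products = 1605, 108, 5933

windows = profile_len - window_len + 1               # number of windows
per_window = reported_products / windows             # products per window
per_move = reported_products / (2 * windows)         # one i-move and one j-move each

# Assumption: a naive range product spends w - 1 products per window.
naive_products = windows * (window_len - 1)
ratio = naive_products / reported_products

print(windows, round(per_window, 2), round(per_move, 2))
print(naive_products, round(ratio, 1))
```

Under that assumption, the sliding window scheme uses roughly 4 products per window instead of 107, about 27 times fewer products than the naive recomputation.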

References 

[AS87] Noga Alon and Baruch Schieber. Optimal preprocessing for answering on-line product queries. Technical Report TR 71/87, Inst. of Comp. Science, Tel-Aviv Univ., 1987.

[BBV04] Broňa Brejová, Daniel G. Brown, and Tomáš Vinař. Optimal spaced seeds for homologous coding regions. Journal of Bioinformatics and Computational Biology, 1(4):595–610, Jan 2004. (earlier version in CPM 2003).

[BBV05] Broňa Brejová, Daniel G. Brown, and Tomáš Vinař. Vector seeds: An extension to spaced seeds. Journal of Computer and System Sciences, 70(3):364–380, 2005. (earlier version in WABI 2003).

[BK01] Stefan Burkhardt and Juha Kärkkäinen. Better filtering with gapped q-grams. In Proceedings of the 12th Symposium on Combinatorial Pattern Matching (CPM), volume 2089 of Lecture Notes in Computer Science, pages 73–85. Springer, July 2001.

Footnote 2: 1,2,−1,3..24,−2,−3,25..51,−4,52..72,−5,73..392,−6,393..441,−7,−8,442..577,−9,578..3071, where i-moves are written with a minus sign.


[BKS05] Jeremy Buhler, Uri Keich, and Yanni Sun. Designing seeds for similarity search in genomic DNA. Journal of Computer and System Sciences, 70(3):342–363, 2005. (earlier version in RECOMB 2003).

[Bro05] Daniel G. Brown. Optimizing multiple seeds for protein homology search. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2(1):29–38, January 2005. (earlier version in WABI 2004).

[Buh02] Jeremy Buhler. Provably sensitive indexing strategies for biosequence similarity search. In RECOMB, Washington, DC (USA), pages 90–99. ACM Press, April 2002.

[CM07] Miklós Csűrös and Bin Ma. Rapid homology search with neighbor seeds. Algorithmica, 48(2):187–202, Jun. 2007. (earlier version in COCOON 2005).

[CP10] Won-Hyoung Chung and Seong-Bae Park. Hit integration for identifying optimal spaced seeds. BMC Bioinformatics - Selected articles from the 8th Asia-Pacific Bioinformatics Conference (APBC), 18-21 January, Bangalore, India, 11(Suppl 1):S37, 2010.

[CR93] A. Califano and I. Rigoutsos. Flash: A fast look-up algorithm for string homology. In Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology (ISMB), pages 56–64, July 1993.

[CSC09] Yangho Chen, Tate Souaiaia, and Ting Chen. PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics, 25(19):2514–2521, 2009.

[EM11] Lavinia Egidi and Giovanni Manzini. Spaced seeds design using perfect rulers. In Proceedings of the 18th International Symposium on String Processing and Information Retrieval (SPIRE), Pisa (Italy), volume 7024 of Lecture Notes in Computer Science, pages 32–43. Springer, 2011.

[FCLCST05] Martin Farach-Colton, Gad M. Landau, Süleyman Cenk Sahinalp, and Dekel Tsur. Optimal spaced seeds for faster approximate string matching. In Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP'05), Lisboa (Portugal), volume 3580 of Lecture Notes in Computer Science, pages 1251–1262. Springer, 2005.

[HMN09] Nils Homer, Barry Merriman, and Stanley F. Nelson. BFAST: An alignment tool for large scale genome resequencing. PLoS One, 4(11):e7767, 2009.

[HR08] Inke Herms and Sven Rahmann. Computing alignment seed sensitivity with probabilistic arithmetic automata. In Proceedings of the 8th International Workshop on Algorithms in Bioinformatics (WABI), Karlsruhe (Germany), volume 5251 of Lecture Notes in Bioinformatics, pages 318–329. Springer, Sept. 2008.

[II09] Lucian Ilie and Silvana Ilie. Fast computation of neighbor seeds. Bioinformatics, 25(6):822–823, 2009.

[IIMB11] Lucian Ilie, Silvana Ilie, and Anahita Mansouri Bigvand. SpEED: fast computation of sensitive spaced seeds. Bioinformatics, 2011.

[KLMT04] Uri Keich, Ming Li, Bin Ma, and John Tromp. On spaced seeds for similarity search. Discrete Applied Mathematics, 138(3):253–263, 2004. (preliminary version in 2002).

[KNR06] Gregory Kucherov, Laurent Noé, and Mikhail A. Roytberg. A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology, 4(2):553–569, November 2006.

[KWS+11] Szymon M. Kiełbasa, Raymond Wan, Kengo Sato, Paul Horton, and Martin C. Frith. Adaptive seeds tame genomic sequence comparison. Genome Research, 21(3):487–493, 2011.

[LZZ+08] Hao Lin, Zefeng Zhang, Michael Q. Zhang, Bin Ma, and Ming Li. ZOOM! Zillions Of Oligos Mapped. Bioinformatics, 24(21):2431–2437, 2008.

[MB07] Denise Y.F. Mak and Gary Benson. All hits all the time: parameter free calculation of seed sensitivity. In D. Sankoff, L. Wang, and F. Chin, editors, Proceedings of the 5th Asia Pacific Bioinformatics Conference (APBC), volume 5 of Advances in Bioinformatics and Computational Biology, pages 327–340. Imperial College Press, 2007.

[MGB06] Denise Y.F. Mak, Yevgeniy Gelfand, and Gary Benson. Indel seeds for homology search. Bioinformatics, 22(14):e341–e349, 2006.


[MS09] Diane Maclagan and Bernd Sturmfels. Introduction to tropical geometry. (draft book-in-progress), 2009.

[MTL02] Bin Ma, John Tromp, and Ming Li. PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18(3):440–445, 2002.

[MY09] Bin Ma and Hongyi Yao. Seed optimization for i.i.d. similarities is no easier than optimal Golomb ruler design. Information Processing Letters, 109(19):1120–1124, 2009.

[NGK10] Laurent Noé, Marta Gîrdea, and Gregory Kucherov. Designing efficient spaced seeds for SOLiD read mapping. Advances in Bioinformatics, 2010:ID 708501, July 2010.

[NR08] François Nicolas and Éric Rivals. Hardness of optimal spaced seed design. Journal of Computer and System Sciences, 74(5):831–849, Aug. 2008. (earlier version in CPM 2005).

[Pin98] Jean-Éric Pin. Tropical semirings. In J. Gunawardena, editor, Idempotency, volume 11 of Publ. Newton Inst., pages 50–69, Bristol, 1998. Cambridge Univ. Press.

[RLD+09] Stephen M. Rumble, Phil Lacroute, Adrian V. Dalca, Marc Fiume, Arend Sidow, and Michael Brudno. SHRiMP: Accurate mapping of short color-space reads. PLoS Comput Biol, 5(5):e1000386, May 2009.

[SB05] Yanni Sun and Jeremy Buhler. Designing multiple simultaneous seeds for DNA similarity search. Journal of Computational Biology, 12(6):847–861, 2005. (earlier version in RECOMB 2004).

[Sim88] Imre Simon. Recognizable sets with multiplicities in the tropical semiring. In Mathematical foundations of computer science, 1988 (Carlsbad, 1988), volume 324 of Lecture Notes in Comput. Sci., pages 107–120. Springer, Berlin, 1988.

[YWC+04] I-Hsuan Yang, Sheng-Ho Wang, Yang-Ho Chen, Pao-Hsian Huang, Liang Ye, Xiaoqiu Huang, and Kun-Mao Chao. Efficient methods for generating optimal single and multiple spaced seeds. In Proceedings of the IEEE 4th Symposium on Bioinformatics and Bioengineering (BIBE), Taichung (Taiwan), pages 411–416. IEEE Computer Society Press, 2004.

[YZ08] Jialiang Yang and Louxin Zhang. Run probabilities of seed-like patterns and identifying good transition seeds. Journal of Computational Biology, 15(10):1295–1313, Dec. 2008. (earlier version in APBC 2008).

[ZF07] Leming Zhou and Liliana Florea. Designing sensitive and specific spaced seeds for cross-species mRNA-to-genome alignment. Journal of Computational Biology, 14(2):113–130, Mar. 2007.


6 Appendix 

6.1 Worst case number of products from i to j 

We denote by n the number of single matrices: n = j − i + 1 (n is thus the length of the block being computed with the help of the Uk matrices already given). We illustrate below how to obtain the smallest size n as a function of the number of products x.

• if x is odd, the worst case is produced by a concatenation of blocks of size $2^i$ on both ends, for $i \in [0 .. \frac{x-1}{2}]$ (see Figure 3 for x = 5):

Figure 3: U(k) matrices and product: example when i = 9 and j = 23 (n = 14)

$$n = 2 \sum_{i=0}^{\frac{x-1}{2}} 2^i = 2\sqrt{2} \times 2^{\frac{x}{2}} - 2 \quad\Longleftrightarrow\quad x = 2 \log_2\left(\frac{n+2}{2\sqrt{2}}\right)$$

• if x is even, the worst case is produced by a concatenation of blocks of size $2^i$ on both ends of a block of size $2^{\frac{x}{2}}$, for $i \in [0 .. \frac{x-2}{2}]$:

Figure 4: U(k) matrices and product: example when i = 9 and j = 19 (n = 10)

$$n = 2 \sum_{i=0}^{\frac{x-2}{2}} 2^i + 2^{\frac{x}{2}} = 3 \times 2^{\frac{x}{2}} - 2 \quad\Longleftrightarrow\quad x = 2 \log_2\left(\frac{n+2}{3}\right)$$

Combining those two cases, it can be shown that when the number of products is set to x = 1, 2 or 3, the minimal size is exactly 2 × x, and that when x > 3 (or x = 0) this minimal bound is never reached again.

Figure 5: minimal n (for x even and odd) functions compared to 2 × x

In other words, the number of products x is always $\le \frac{n}{2}$.
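The two worst-case layouts can be checked numerically. The sketch below builds the minimal n directly from the block sizes described above and compares it with 2 × x; the function name `minimal_n` is introduced here for illustration only.

```python
def minimal_n(x):
    """Smallest n = j - i + 1 that forces x products (x >= 1)."""
    if x % 2 == 1:
        # Odd x: blocks of size 2^i on both ends, i in 0..(x-1)/2.
        return 2 * sum(2**i for i in range((x - 1) // 2 + 1))
    # Even x: blocks of size 2^i on both ends (i in 0..(x-2)/2)
    # around one central block of size 2^(x/2).
    return 2 * sum(2**i for i in range((x - 2) // 2 + 1)) + 2 ** (x // 2)


for x in range(1, 9):
    n = minimal_n(x)
    print(x, n, "tight" if n == 2 * x else "n > 2x")
```

For x = 5 and x = 4 this gives n = 14 and n = 10, the values shown in Figures 3 and 4, and the bound n = 2 × x is met only for x = 1, 2 and 3.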

6.2 Amortized analysis of Uk blocks when i = 0 and j ≥ 0

Summing the number of products needed when computing Uk should give 2 on average, and not 1: a quick analysis shows that, indeed, if one product is done half of the time, two are done one time out of four, three one time out of eight, and so on, then $\sum_{u=1}^{\infty} \frac{u}{2^u} = 2$.

However, we will show here that the amortized number of products when considering j is only 1. We use an amortized analysis, giving one coin each time j is increased (i is supposed to stay at 0, but this assumption can be dropped since it can be seen as a worst case when updating Uk), to show that any sub-block Uk generates one extra coin; thus, when it is grouped with its neighbor block of the same size (itself generating one extra coin), the father block built from those two also generates (1 + 1) − 1 = one extra coin.


• this is true for blocks of size 2, since they are built from blocks of size 1 that do not generate any product: the cost for such a block of size 2 is thus 1, and 1 extra coin remains.

• this can be easily verified for blocks of size $2^p$ (p > 1), since by the induction hypothesis the two sub-blocks of size $2^{p-1}$ each give one extra coin: the cost of joining the two sub-blocks then removes one coin, and one extra coin remains again.

Note that this analysis holds for any i ≥ 0 and any j > i, provided that an extra j − i coins are given at first.
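The coin argument can be replayed mechanically. The recursive sketch below assumes that a block of size $2^p$ is always built by one product of its two halves, and that each single matrix brings one coin; `build_block` is a name introduced for this illustration.

```python
def build_block(p):
    """Return (products, coins_left) for a block of size 2**p.

    One coin is received per single matrix (one per j increase),
    and each product that joins two sub-blocks spends one coin.
    """
    if p == 0:
        return 0, 1  # size-1 block: no product, its coin is kept
    products_left, coins_left = build_block(p - 1)
    products_right, coins_right = build_block(p - 1)
    return products_left + products_right + 1, coins_left + coins_right - 1


for p in range(5):
    print(2**p, build_block(p))
```

Under these assumptions a block of size $2^p$ costs $2^p - 1$ products and always keeps exactly one coin, so the number of products never exceeds the number of j increases.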

6.3 Amortized analysis of the left Mi..m−1 blocks when m is fixed and i is increased

Summing the number of products needed when computing Mi..m−1 for any i from 0 to m gives 1 on average: a quick analysis shows indeed that if zero products are done half of the time (when i is even), one product is done one time out of four, two one time out of eight, and so on, then $\sum_{u=0}^{\infty} \frac{u}{2^{u+1}} = 1$.


But this does not guarantee that the total number of products paid when increasing i from any value (for example 0) to m is always less than m. Here we will show that the number of products (once m is fixed) for computing Mi..m−1 for any i from 0 to a given $m = 2^p$ is $\le 2^p - p - 1$.

m = 2 → 0

m = 4 → 1

m = 8 → 4

m = 16 → 11

A method similar to that of Section 6.2 can be applied.

First we consider the case when i = 0 and m has been increased to reach a given (and fixed) value $2^p$.

• this is true when p = 1 (thus when m = 2) since, using the Uk blocks, no product is needed to compute M0..1 and M1..1.

• this can be verified for blocks of size $2^p$ (p > 1), since we can then use the two sub-blocks of size $2^{p-1}$: when i is within the first sub-block, as the product is done from m to i and stacked in such a way that any suffix Mk..m is kept, it costs the products produced by this sub-block ($2^{p-1} - (p-1) - 1$) plus the $\log_2(\frac{m}{2}) = p - 1$ extra products needed to cover the second sub-block of size $2^{p-1}$; when i is within the second sub-block, it costs exactly the number of products produced by this sub-block ($2^{p-1} - (p-1) - 1$). Thus, when summing these two quantities, the number of products is $\le 2 \times \left(2^{p-1} - (p-1) - 1\right) + (p-1) = 2^p - p - 1$.

Thus, increasing i from any value ≥ 0 to m and computing all the possible products (with the help of the blocks Uk) costs $\le m - \log_2(m) - 1$ products, and thus less than m.

Note that this analysis holds for any i ≥ 0 and any m (not necessarily a strict power of 2, but written as $m = a \times 2^p$ such that $2^p$ is the maximal block size of Uk for k ∈ [i..j]).
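The closed form and the induction step can be cross-checked in a few lines. In the sketch below, `bound_by_induction` follows the two-sub-block recurrence of the proof. The third function is an assumption of this example rather than something stated in the draft: it models the "zero products half of the time, one a quarter of the time, ..." distribution by the number of trailing zero bits of i.

```python
def bound(p):
    """Closed form 2^p - p - 1 quoted in the text."""
    return 2**p - p - 1


def bound_by_induction(p):
    """Two sub-blocks of size 2^(p-1), plus p-1 extra products."""
    return 0 if p == 1 else 2 * bound_by_induction(p - 1) + (p - 1)


def trailing_zeros(i):
    """Number of trailing zero bits of i > 0 (assumed per-step cost model)."""
    return (i & -i).bit_length() - 1


for p in range(1, 5):
    m = 2**p
    total = sum(trailing_zeros(i) for i in range(1, m))
    print(m, bound(p), bound_by_induction(p), total)
```

The three quantities agree on the values listed above (0, 1, 4, 11 for m = 2, 4, 8, 16), which is at least consistent with the bound of $m - \log_2(m) - 1$.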

6.4 Amortized analysis of the left Mi..m−1 blocks when m is increased (due to a j increase) and i is fixed

When j increases while i is fixed, m may change to a new (and of course increased) value pointing to an equal (or twice larger) block: this happens when m goes from $m_{old} = 2^{p_{old}} \times a_{old}$ (with $a_{old}$ odd) to its new value $m = m_{old} + 2^{p_{old}} = (a_{old} + 1) \times 2^{p_{old}} = a \times 2^p$ (with $a = \frac{a_{old}+1}{2}$ and $p = p_{old} + 1$), as illustrated on Figure 6.
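The update of m described above is a one-liner on the binary representation of $m_{old}$. The sketch below only covers this case (with $a_{old}$ odd); the function names are introduced here for illustration.

```python
def split_power_of_two(m):
    """Write m > 0 as a * 2**p with a odd; return (a, p)."""
    p = (m & -m).bit_length() - 1
    return m >> p, p


def next_m(m_old):
    """New value m = m_old + 2**p_old, where m_old = a_old * 2**p_old."""
    _, p_old = split_power_of_two(m_old)
    return m_old + 2**p_old


m_old = 36                         # 36 = 9 * 2^2
m = next_m(m_old)                  # 36 + 4 = 40
print(split_power_of_two(m_old), m, split_power_of_two(m))
```

With $m_{old} = 36$ this gives m = 40 = 5 × 2^3, the values that appear in the example of Figure 6.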

Figure 6: U(k) matrices and Mi..m−1 product: example when i = 33 and j goes from 47 to 48 (m_old = 36, m = 40)

We are here interested in the computation of Mi..m−1 due to this $\Delta m = m - m_{old} = 2^{p_{old}}$ increase. In practice, since m has changed, the full set of left-stack matrices Mk..m−1 has to be recomputed for some k ∈ [i..m], and some products already done for Mk..mold−1 unfortunately have to be done a second time here.

This twice-cost is at most $\log_2\left(\frac{m_{old} - i + 1}{2}\right) \le \log_2\left(\frac{m - m_{old}}{2}\right) = \log_2\left(\frac{\Delta m}{2}\right)$ ($m_{old} - i$ […]

[…] m while j is fixed, m must change to a new (and of course increased) value pointing to a smaller block. This happens when m goes from $m_{old} = 2^{p_{old}} \times a_{old}$ to its new value $m = m_{old} + 2^{p_{old}} = (a_{old} + 1) \times 2^{p_{old}}$, as illustrated on Figure 7.

Compared to the case previously seen in Appendix 6.4, there is no twice-cost on the left stack, so this part can still be amortized as in Section 6.3. However, the right part now has to be recomputed.

Figure 7: U(k) matrices and Mm..j product: example when j = 47 and i goes from 32 to 33 (m_old = 32, m = 40)

We are here interested in the computation of Mm..j due to this m change (red mark on Figure 7).

This cost is at most $\log_2\left(\frac{j - m + 1}{2}\right) \le \log_2\left(\frac{m - m_{old}}{2}\right) = \log_2\left(\frac{\Delta m}{2}\right)$ ($j - m$ […]

Using Appendix 6.2 and Appendix 6.5 (in a similar way to 6.4), moving i thus implies

• a cost of 1 (per i increase) + 1 (per "m amortized increase", see Section 6.3) when m is fixed,

• a cost of 1 (per i increase) + $\log_2\left(\frac{\Delta m}{2}\right)$ when m is increased by $\Delta m = 2^p$.

A rapid analysis of the two cases combined shows that this cost can be bounded by $1 \times i + \frac{9}{8} \times m$ (this worst case can be produced when ∆m = 8 or ∆m = 16). This cost is thus $\le \frac{11}{8} \times i + \frac{3}{4} \times j$ (since $m \le \frac{2 \times j + i}{3}$) and can be roughly bounded by $2\frac{1}{8}$ per j increase (since $i \le j$).
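The last two substitutions are pure arithmetic and can be checked with exact fractions. The sketch below takes the bound $i + \frac{9}{8} m$ and the inequality $m \le \frac{2j + i}{3}$ as given by the text; it does not re-derive them, and `cost_bound` is a name introduced here.

```python
from fractions import Fraction as F


def cost_bound(i, j):
    """i + 9/8 * m, evaluated at the largest allowed m = (2j + i) / 3."""
    m_max = F(2 * j + i, 3)
    return i + F(9, 8) * m_max


i, j = 40, 100
assert cost_bound(i, j) == F(11, 8) * i + F(3, 4) * j

# With i = j (the largest i allowed by i <= j), the bound is 17/8 = 2 + 1/8 per j.
assert cost_bound(j, j) == F(17, 8) * j
```

The first assertion is the step $i + \frac{9}{8} \cdot \frac{2j+i}{3} = \frac{11}{8} i + \frac{3}{4} j$; the second one is the "2 1/8 per j increase" figure.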







Peptides to match

...

...

this time ... no last-minute excuse found 



thanks for the many PJIs supervised or chaired this year!! 








1 we are looking for a new PJI chair for next year 

(Gery is leaving) 

2 we are looking for someone to take over the module 

(who is in this room) 


Tuesday 28 May 

Date Time Room Project Title Author Supervisor Student1 Student2 

2013-05-28 16h00 M5-A8 127 [ALTERNANT] étudiant:Stephane Drubay & entreprise:OdysysPierre-Eric Marez Philippe Marquet Stephane Drubay 

2013-05-28 16h30 M5-A8 132 [ALTERNANT] étudiant:Laura Leclercq & entreprise:Odysys Pierre-Eric Marez Philippe Marquet Laura Leclercq 

2013-05-28 17h00 M5-A8 137 [ALTERNANT] étudiant:Kévin Moulart & entreprise:Proges Plu Philippe Viot Maude Pupin Kévin Moulart 

2013-06-03 17h30 M5-A8 134 [ALTERNANT] étudiant:Laurent Leleux & entreprise:Proges PlMaryvonne Viot Maude Pupin Laurent Leleux 

Wednesday 29 May 


2013-05-29 09h00 M5-A9 22 Évolution de l'application de gestion du personnel de l'IEEA Jean-Christophe RoutierJean-Christophe RoutierMélissa Blain Jérôme Wyckaert 

2013-05-29 09h30 M5-A9 89 Annuaire de l'association des anciens de la MIAGE de Lille Anne-Cécile Caron Anne-Cécile Caron Nicolas VandemeulebrouFlorian Bruffaert 

2013-05-29 10h30 M5-A9 111 Outils de communication pour l'association AVERS Anne-Cécile Caron Anne-Cécile Caron Mamadou Bachir Bah 

2013-05-29 11h00 M5-A9 90 Génération de documents pédagogiques Anne-Cécile Caron Anne-Cécile Caron Maxime Boucher Latifou Sano 

2013-05-29 14h00 M5-A9 86 Amélioration d'un logiciel de visualisation d'orbite Florent Deleflie Francesco De Comité Romain Frangi Dimitri Descamps 

2013-05-29 14h30 M5-A9 112 Concours Infotel Anne-Cécile Caron Anne-Cécile Caron Christopher Laethem Zakariae Azaroual 

2013-05-29 15h00 M5-A9 113 Concours Infotel (suite) Anne-Cécile Caron Anne-Cécile Caron Nassim Hassaine Zouhair Makhout 

2013-05-29 15h30 M5-A9 45 Analyse automatique de l'historique Git des logiciels Martin Monperrus Martin Monperrus Sylvain Magnier Maxence Montauzan 

2013-05-29 16h00 M5-A9 95 Export de code source Python en XML Martin Monperrus Martin Monperrus Pierre Frayer Antoine Goubel 

Thursday 30 May 


2013-05-30 09h30 M5-A9 93 Optimisation de flots François Clautiaux Marie-Emile Voge Irina Bakardzhieva Ophélie Debiève 

2013-05-30 10h30 M5-A9 77 Suivi d'accueil des enfants dans un centre périscolaire - facturPeriscope Marius Bilasco Rémi Kaczmarek Maxime Vanpeene 

2013-05-30 11h00 M5-A9 115 Reprise application de gestion de listes de présences alternanMarius Bilasco Marius Bilasco Alexis Boutrouille Pierre Bailleul 

2013-05-30 14h00 M5-A9 116 Application web de gestion de suivis de recherche de stage Patricia Plénacoste Maude Pupin Soufiane Agadr Thomas Aubry 

2013-05-30 14h30 M5-A9 92 Création d’une base de données sur la glycosylation du poissoYann Guerardel Olga Plechakova Karl Deleforterie Franck David 

2013-05-30 15h30 M5-A9 23 Base de données et données géographiques Francis Bossut Francis Bossut Pierrick Lesage Alexandre Bienvenu 

2013-05-30 16h00 M5-A9 99 Site web de la MDE Eric Bros Raphaël Marvie Djamel Amara 

Friday 31 May 


2013-05-31 08h30 M5-A7 2 Reconstituer le puzzle : depuis des fragments jusqu'à l'ARN Mikaël Salson Mikaël Salson Charles Husquin 

2013-05-31 09h00 M5-A7 4 Alecsia apprend à lire les ODT et PDF Mikaël Salson Mikaël Salson Anthony Tonglet 

2013-05-31 10h15 M5-A7 78 Evolution de l'application de suivi d'alternants et stages Marius Bilasco Marius Bilasco Ayoub Nejmeddine Sara El-Arbaoui 

2013-05-31 10h45 M5-A7 81 Take a photo for me Marius Bilasco Marius Bilasco Jérémie Samson Victor Paumier 

2013-05-31 11h15 M5-A7 82 Interagir avec votre ordinateur de la tête Marius Bilasco Marius Bilasco Mamadou Diop 

2013-05-31 11h45 M5-A7 84 Analyse contextuelle de collections de photos privées Marius Bilasco Marius Bilasco Benjamin Allaert Benjamin Flahauw 

2013-05-31 14h30 M5-A7 6 Frameworks PHP et back-offices pour applications mobiles Jean-Claude Tarby Jean-Claude Tarby Omar Chahbouni Abderrahime El Idrissi 

2013-05-31 15h00 M5-A7 8 Intégration des ondes cérébrales dans la vie courante Jean-Claude Tarby Jean-Claude Tarby Mickaël Duruisseau Nicolas Coyard 

2013-05-31 16h15 M5-A7 68 Conception d'un Raspberry pi dédié aux présentations Bruno Bogaert Bruno Bogaert Louis Billiet Sylvain Goulliart 

2013-05-31 16h45 M5-A7 69 Écosystème pour gestion d'emploi du temps hebdomadaire Bruno Bogaert Bruno Bogaert Dhia Elhak Lakhal Sylvain Malfait 

2013-05-31 10h15 M5-A8 46 Intégration de Drone à une plateforme logicielle Gwenael Cattez Gwenael Cattez Ali Hedjaz Tony Tran 

2013-05-31 10h45 M5-A8 65 Moteur de scripts sous iOS Nicolas Haderer, RomainRomain Rouvoy Benjamin Digeon Florent David 

2013-05-31 11h15 M5-A8 66 Utiliser les téléphones mobiles pour l’estimation de la densité dNicolas Haderer Romain Rouvoy Julien Duribreux Justin Dufour 

2013-05-31 14h00 M5-A8 34 Interface de visualisation de molécules Maude Pupin Laurent Noé Antonia Ludunge 

2013-05-31 14h30 M5-A8 91 Pipeline d'analyse de régions de cassures Jean-Stéphane Varré Jean-Stéphane Varré Gauvain Marquet 

2013-05-31 15h00 M5-A8 30 Robot lego solveur de Sudoku Francesco De Comité Leopold Weinberg Oulamine Youssef El Achiqi Anas 

2013-05-31 16h15 M5-A8 37 Traitement semi-automatique des feuilles de présence Géry Casiez Géry Casiez Alexis Linke Maxence Gaudry 

2013-05-31 16h45 M5-A8 3 Conception d'un reseau social orienté vidéo Antoine Thomas Antoine Thomas Emmanuel Pede Thomas Besset 

2013-05-31 08h30 M5-A9 105 Suivi d'un capteur en 3D a l'aide d'une webcam Jean Rioult Sébastien Ambellouis Matthieu Fesselier Guillaume Huylebroeck 

2013-05-31 09h00 M5-A9 110 Algorithmes de placement en deux dimensions François Clautiaux François Clautiaux Romain Windels 

2013-05-31 10h15 M5-A9 70 Home Cloud Server Cedric Dumoulin Cedric Dumoulin Lison Gallos Arnaud Caulier 

2013-05-31 10h45 M5-A9 27 Framework de modélisation dans les Tablettes Android Amine El Kouhen Cédric Dumoulin Malika Rakhaoui ‎ Fatou-Laye Mbaye 

2013-05-31 11h15 M5-A9 71 Etude de la spécification des représentations arborescentes Cedric Dumoulin Cedric Dumoulin Adrien Burillon Thomas Camberlin 

2013-05-31 11h45 M5-A9 72 Generateur de GUI Android Cedric Dumoulin Cedric Dumoulin Gerard Paligot 

2013-05-31 14h00 M5-A9 33 Intégration du support multitouch dans Pharo Stéphane Ducasse Stéphane Ducasse Francois Lepan Benjamin V. Ryseghem 

2013-05-31 14h30 M5-A9 50 Interaction Kinect pour une application ludique Samuel Degrande Patricia Plenacoste Thomas Crepel Rémi Boens 

2013-05-31 15h00 M5-A9 108 Développement d'un plugin Eclipse de transformation et d'ana Martin Monperrus Benoit Cornu Amina El-Mekky Ouardia Ma-Z 

Monday 3 June 


2013-06-03 09h00 M5-A8 138 [ALTERNANT] étudiant:Augustin Petre & entreprise:DecathlonJulien Mouchon Jean-Claude Tarby Augustin Petre 

2013-06-03 09h30 M5-A8 124 [ALTERNANT] étudiant:Olivier Debreu & entreprise:Noolitic Sylvain Deceuninck Gilles Grimaud Olivier Debreu 

2013-06-03 10h15 M5-A8 135 [ALTERNANT] étudiant:Alexandre Loywick & entreprise:GenesGaël Even Mikaël Salson Alexandre Loywick 

2013-06-03 10h45 M5-A8 123 [ALTERNANT] étudiant:Tristan Cavelier & entreprise:Nexedi Jean-Paul Smets Mikaël Salson Tristan Cavelier 

2013-06-03 11h45 M5-A8 141 [ALTERNANT] étudiant:Dominique Testelin & entreprise:Idees3Guillaume Palamin Fabrice Aubert Dominique Testelin 

2013-06-03 14h00 M5-A8 136 [ALTERNANT] étudiant:Nathanael Martin & entreprise:Unis Michaël Macquart Yves Roos Nathanael Martin 

2013-06-03 15h00 M5-A8 143 [ALTERNANT] étudiant:Donovan Watteau & entreprise:Cerise Gauthier M Dequidt Arnaud Liefooghe Donovan Watteau 

2013-06-03 14h00 M5-A7 52 Recherche de candidats/jobs sans contact Nabil Djarallah, Nicolas HNabil Djarallah Gens Maxime Camille Riquier 

2013-06-03 14h30 M5-A7 53 API de contrôle de drones volants Nabil Djarallah, Nicolas PNicolas Petitprez Mohamed Ouannane Jeremy Diaz 

2013-06-03 15h00 M5-A7 54 Petites annonces en réalité augmentée Nabil Djarallah, Nicolas PNicolas Petitprez Alexandre Raulin Yann Duval 

2013-06-03 16h15 M5-A7 118 Intégration du uPnP dans le serveur embarqué SMEWS Gilles Grimaud Gilles Grimaud Edouard Berton Nicolas Ryckembusch 

2013-06-03 16h45 M5-A7 119 Interface graphique en python pour la commande de compilatiGilles Grimaud Gilles Grimaud Rabab Bouziane Narjes Jomaa 

2013-06-03 14h00 M5-A9 20 Capture de mouvement 3D avec une caméra Microsoft KinectHazem Wannous Hazem Wannous Derek Hendrickx Benjamin Makusa 

2013-06-03 14h30 M5-A9 107 Essayage 3D des lunettes virtuelles avec une caméra MicrosoHazem Wannous Hazem Wannous Pierre Villoutreix Maxime Chaste 

2013-06-03 15h00 M5-A9 31 Robot lego machine de Turing Francesco De Comité Eric Wegrzynowski Matthieu Poudroux Ronan Dhellemmes 

2013-06-03 16h15 M5-A9 55 Extraction d'information textuelles multilingue à partir de flux sLuigi Lancieri Luigi Lancieri Shichen Zhao Amira Kamli 

2013-06-03 16h45 M5-A9 56 Analyse du buzz sur twitter Luigi Lancieri Luigi Lancieri Florian Michiel Alessio Trunfio 

Tuesday 4 June 


2013-06-04 13h30 M5-A7 85 Plugin de visualisation 3D pour la consommation énergétique Romain Rouvoy Romain Rouvoy Aurore Allart Benjamin Ruytoor 

2013-06-04 14h00 M5-A7 109 Réseau de neurones artificiels pour reconnaissance d'émotion Pierre Boulet Pierre Boulet Sanaa Mouatassim 

2013-06-04 14h30 M5-A7 10 IHM HTML5 pour un simulateur de marchés financiers Yann Secq Philippe Mathieu Thomas Buisine Romain Belmonte 

2013-06-04 15h00 M5-A7 106 Mise en place d'une application vidéo sur la carte xilinx ZynbqJean-Luc Dekeyser Jean-Luc Dekeyser Quang-Tung Nguyen Antoine B. Kiatoko 

2013-06-04 15h30 M5-A7 117 Experimentation d'un codeur jpeg sur Homade : une approcheRabie Ben Atitallah Jean-Luc Dekeyser Aurelien Bertiaux 

2013-06-04 16h15 M5-A7 102 Ecosystèmes virtuels et programmation 3D : spécification et d Samuel Blanquart Samuel Blanquart Lois Arens Yoann Bouquet 

2013-06-04 09h00 M5-A8 131 [ALTERNANT] étudiant:Jules Ivanic & entreprise:Gfi Thomas Ribeaucoup Jean-Christophe RoutierJules Ivanic 

2013-06-04 09h30 M5-A8 133 [ALTERNANT] étudiant:Sebastien Leclercq & entreprise:LifedoHerve Fourmeaux Jean-Christophe RoutierSebastien Leclercq 

2013-06-04 10h15 M5-A8 142 [ALTERNANT] étudiant:Valois Vander-Cruyssen & entreprise:MAnthony Dhondt Jean-Luc Levaire Valois Vander-Cruyssen 

2013-06-04 10h45 M5-A8 121 [ALTERNANT] étudiant:Loic Allart & entreprise:Vekia Vincent Wauters Laetitia Jourdan Loic Allart 

2013-06-04 11h15 M5-A8 126 [ALTERNANT] étudiant:Stefan Dochez & entreprise:AlternativeGuillaume Pellien Lionel Seinturier Stefan Dochez 

2013-06-04 11h45 M5-A8 130 [ALTERNANT] étudiant:Etienne Helluy-Lafont & entreprise:Adv Jeremie Jourdin Pierre Boulet Etienne Helluy-Lafont 

2013-06-04 12h15 M5-A8 128 [ALTERNANT] étudiant:Thibaut Frain & entreprise:Valipost Thierry Thibaut Philippe Marquet Thibaut Frain 

2013-06-04 14h00 M5-A8 129 [ALTERNANT] étudiant:Rémi Gosselin & entreprise:J2S Jean-Yves Jourdain Samuel Hym Rémi Gosselin 

2013-06-04 14h30 M5-A8 139 [ALTERNANT] étudiant:Fabien Piette & entreprise:Recisio Jean-Baptiste Defossez Samuel Hym Fabien Piette 

2013-06-04 15h00 M5-A8 125 [ALTERNANT] étudiant:Jérôme Desjardins & entreprise:StadlinPascal Farange Marius Bilasco Jérôme Desjardins 

2013-06-04 15h30 M5-A8 140 [ALTERNANT] étudiant:Cesar Splete & entreprise:Audaxis Vincent Hosatte Marius Bilasco Cesar Splete 

2013-06-04 16h15 M5-A8 120 [ALTERNANT] étudiant:Romuald Alapide & entreprise:Cap GeJean-Yves Byhet Alexandre Sedoglavic Romuald Alapide 

Session chairs 

Laetitia Jourdan 

Anne-Cécile Caron 

Fabrice Aubert 

Gery Casiez 

Laurent Noé 

Mikael Salson

Friday 31 May 


2013-05-31 08h30 M5-A7 2 Reconstituer le puzzle : depuis des fragments juMikaël Salson Mikaël Salson Charles Husquin 

2013-05-31 09h00 M5-A7 4 Alecsia apprend à lire les ODT et PDF Mikaël Salson Mikaël Salson Anthony Tonglet 

2013-05-31 10h15 M5-A7 78 Evolution de l'application de suivi d'alternants et Marius Bilasco Marius Bilasco Ayoub Nejmeddine Sara El-Arbaoui 

2013-05-31 10h45 M5-A7 81 Take a photo for me Marius Bilasco Marius Bilasco Jérémie Samson Victor Paumier 

2013-05-31 11h15 M5-A7 82 Interagir avec votre ordinateur de la tête Marius Bilasco Marius Bilasco Mamadou Diop 

2013-05-31 11h45 M5-A7 84 Analyse contextuelle de collections de photos prMarius Bilasco Marius Bilasco Benjamin Allaert Benjamin Flahauw 

2013-05-31 14h30 M5-A7 6 Frameworks PHP et back-offices pour applicationJean-Claude Tarby Jean-Claude Tarby Omar Chahbouni Abderrahime El Idrissi 

2013-05-31 15h00 M5-A7 8 Intégration des ondes cérébrales dans la vie couJean-Claude Tarby Jean-Claude Tarby Mickaël Duruisseau Nicolas Coyard 

2013-05-31 16h15 M5-A7 68 Conception d'un Raspberry pi dédié aux présentBruno Bogaert Bruno Bogaert Louis Billiet Sylvain Goulliart 

2013-05-31 16h45 M5-A7 69 Écosystème pour gestion d'emploi du temps hebBruno Bogaert Bruno Bogaert Dhia Elhak Lakhal Sylvain Malfait 

2013-05-31 10h15 M5-A8 46 Intégration de Drone à une plateforme logicielle Gwenael Cattez Gwenael Cattez Ali Hedjaz Tony Tran 

2013-05-31 10h45 M5-A8 65 Moteur de scripts sous iOS Nicolas Haderer, RomainRomain Rouvoy Benjamin Digeon Florent David 

2013-05-31 11h15 M5-A8 66 Utiliser les téléphones mobiles pour l’estimation Nicolas Haderer Romain Rouvoy Julien Duribreux Justin Dufour 

2013-05-31 14h00 M5-A8 34 Interface de visualisation de molécules Maude Pupin Laurent Noé Antonia Ludunge 

2013-05-31 14h30 M5-A8 91 Pipeline d'analyse de régions de cassures Jean-Stéphane Varré Jean-Stéphane Varré Gauvain Marquet 

2013-05-31 15h00 M5-A8 30 Robot lego solveur de Sudoku Francesco De Comité Leopold Weinberg Oulamine Youssef El Achiqi Anas 

2013-05-31 16h15 M5-A8 37 Traitement semi-automatique des feuilles de pré Géry Casiez Géry Casiez Alexis Linke Maxence Gaudry 

2013-05-31 16h45 M5-A8 3 Conception d'un reseau social orienté vidéo Antoine Thomas Antoine Thomas Emmanuel Pede Thomas Besset 

2013-05-31 08h30 M5-A9 105 Suivi d'un capteur en 3D a l'aide d'une webcam Jean Rioult Sébastien Ambellouis Matthieu Fesselier Guillaume Huylebroeck 

2013-05-31 09h00 M5-A9 110 Algorithmes de placement en deux dimensions François Clautiaux François Clautiaux Romain Windels 

2013-05-31 10h15 M5-A9 70 Home Cloud Server Cedric Dumoulin Cedric Dumoulin Lison Gallos Arnaud Caulier 

2013-05-31 10h45 M5-A9 27 Framework de modélisation dans les Tablettes AAmine El Kouhen Cédric Dumoulin Malika Rakhaoui ‎ Fatou-Laye Mbaye 

2013-05-31 11h15 M5-A9 71 Etude de la spécification des représentations arbCedric Dumoulin Cedric Dumoulin Adrien Burillon Thomas Camberlin 

2013-05-31 11h45 M5-A9 72 Generateur de GUI Android Cedric Dumoulin Cedric Dumoulin Gerard Paligot 

2013-05-31 14h00 M5-A9 33 Intégration du support multitouch dans Pharo Stéphane Ducasse Stéphane Ducasse Francois Lepan Benjamin V. Ryseghem 

2013-05-31 14h30 M5-A9 50 Interaction Kinect pour une application ludique Samuel Degrande Patricia Plenacoste Thomas Crepel Rémi Boens 

2013-05-31 15h00 M5-A9 108 Développement d'un plugin Eclipse de transformMartin Monperrus Benoit Cornu Amina El-Mekky Ouardia Maiz 

Monday 3 June 


2013-06-03 14h00 M5-A7 52 Recherche de candidats/jobs sans contact Nabil Djarallah, Nicolas HNabil Djarallah Gens Maxime Camille Riquier 

2013-06-03 14h30 M5-A7 53 API de contrôle de drones volants Nabil Djarallah, Nicolas PNicolas Petitprez Mohamed Ouannane Jeremy Diaz 

2013-06-03 15h00 M5-A7 54 Petites annonces en réalité augmentée Nabil Djarallah, Nicolas PNicolas Petitprez Alexandre Raulin Yann Duval 

2013-06-03 16h15 M5-A7 118 Intégration du uPnP dans le serveur embarqué SGilles Grimaud Gilles Grimaud Edouard Berton Nicolas Ryckembusch 

2013-06-03 16h45 M5-A7 119 Interface graphique en python pour la commandGilles Grimaud Gilles Grimaud Rabab Bouziane Narjes Jomaa 

2013-06-03 14h00 M5-A9 20 Capture de mouvement 3D avec une caméra Micr Hazem Wannous Hazem Wannous Derek Hendrickx Benjamin Makusa 

2013-06-03 14h30 M5-A9 107 Essayage 3D des lunettes virtuelles avec une caHazem Wannous Hazem Wannous Pierre Villoutreix Maxime Chaste 

2013-06-03 15h00 M5-A9 31 Robot lego machine de Turing Francesco De Comité Eric Wegrzynowski Matthieu Poudroux Ronan Dhellemmes 

2013-06-03 16h15 M5-A9 55 Extraction d'information textuelles multilingue à pLuigi Lancieri Luigi Lancieri Shichen Zhao Amira Kamli 

2013-06-03 16h45 M5-A9 56 Analyse du buzz sur twitter Luigi Lancieri Luigi Lancieri Florian Michiel Alessio Trunfio 

Tuesday 4 June 


2013-06-04 13h30 M5-A7 85 Plugin de visualisation 3D pour la consommationRomain Rouvoy Romain Rouvoy Aurore Allart Benjamin Ruytoor 

2013-06-04 14h00 M5-A7 109 Réseau de neurones artificiels pour reconnaissaPierre Boulet Pierre Boulet Sanaa Mouatassim 

2013-06-04 14h30 M5-A7 10 IHM HTML5 pour un simulateur de marchés finaYann Secq Philippe Mathieu Thomas Buisine Romain Belmonte 

2013-06-04 15h00 M5-A7 106 Mise en place d'une application vidéo sur la carteJean-Luc Dekeyser Jean-Luc Dekeyser Quang-Tung Nguyen 

2013-06-04 15h30 M5-A7 117 Experimentation d'un codeur jpeg sur Homade : Rabie Ben Atitallah Jean-Luc Dekeyser Aurelien Bertiaux 

2013-06-04 16h15 M5-A7 102 Ecosystèmes virtuels et programmation 3D : spéSamuel Blanquart Samuel Blanquart Lois Arens 

Yoann Bouquet 

Session chairs

Fabrice Aubert 

Gery Casiez 

Laurent Noé 

Mikael Salson

A few points

Teaching:

1 Bioinfo, Algo [1st semester]

2 PDS, Networks, internship supervision (3), PJI my friends :-) [2nd semester]

3 New curriculum

Research:

1 PEPS Sand (accepted), ANR BnB (er ... answer on the 17th), internship, recruitment, code (bugs), evaluations of all kinds ...

2 Reviews (seeds again, lossless this time ...)

3 →

1 Mappi: see Jenya's talk

2 see Yoann's talk

3 Seeds and Product (a never-ending story)

(draft)

Spaced seed design on profile HMMs for precise HTS read-mapping 

efficient sliding window product on the matrix semi-group 

Laurent Noé 

May 28, 2013 

Abstract 

We propose a new method and its associated algorithm to efficiently compute seed sensitivity when High Throughput Sequencing reads are mapped along sub-parts of a known HMM alignment profile. This computation particularly makes sense with positioned spaced seeds. It relies on automata theory (previous work [KNR06]) combined with a matrix product problem.

Interestingly, it brings to light an interval product problem considered more than twenty years ago in [AS87], but here with a sliding window aspect: we propose an efficient algorithm to compute this sliding window set of products using a linear number of unit products on the (associative, but non commutative and non invertible) matrix semi-group.

This computational scheme is implemented in the ongoing 1.06 version of Iedera, which is available at

http://bioinfo.lifl.fr/yass/iedera.php

1 Introduction 

Spaced seed design remains an important, but complex and challenging, problem. Many papers have been devoted to this subject (mainly during the last decade), from the (at first counter-intuitive) idea that such seeds perform better [CR93, Buh02] and can be optimized [MTL02, BK01], to spaced seed sensitivity definition and computation [KLMT04], extended seed models and their computation [BBV05, Bro05, MGB06, CM07, YZ08, II09, KWS+11], and the investigation of bounds and complexity problems [FCLCST05, NR08, MY09, EM11]. Several software tools are now publicly available to design spaced seeds [SB05, NGK10, IIMB11, DDDD+12, Nue11, MHKR12] (see footnote 1).

High Throughput Sequencing (HTS) technologies have thrown a new light on the seed design process, because the HTS reads obtained are relatively short and quality labelled. Some of the most sensitive algorithms to map such reads onto related genomes use spaced seeds (SHRiMP [RLD+09, DDL+11], ZOOM [LZZ+08], BFAST [HMN09], PerM [CSC09], LAST [KWS+11], SToRM [NGK10], ...).

But most of the regular seeds designed within these tools are based on the assumption that the mapped alignment profile remains “unknown”, thus preferring an i.i.d. “randomly” generated profile. There are several (if not many) cases where this assumption can be removed thanks to a known profile of what is searched [SB09] or filtered out (prior knowledge on the sequences being searched). However, an additional constraint comes from the fact that HTS reads are (most of the time) relatively short compared to the known profile and are thus aligned against any sub-profile extracted from the original profile.

We thus propose in the main part of this paper an extended method to efficiently compute seed sensitivity or the lossless property when HTS reads are mapped on sub-profiles (overlapping windows) of a known HMM alignment profile, which is especially useful when designing positioned spaced seeds. This computation is first known to rely on a dynamic programming algorithm applied on the automaton that recognizes the language matched by the seed, combined with the HMM model [KNR06]. This computation also depends, due to the sub-profile constraint, on a set of matrix products done along overlapping intervals, which is the idea explored in this paper.

1 Currently, more than one hundred references have been directly related to the spaced seeds problem, see for example http://www.lifl.fr/~noe/spaced_seeds.html

The interval product problem has been considered in [AS87], where the authors provide an efficient solution in terms of preprocessing, in order to answer any query product with a given constant number of products. We consider this interval product problem with an incremental aspect, using a sliding window, and propose an efficient algorithm to compute it without preprocessing, using an amortized linear number of products on the associative, but non commutative and non invertible, matrix semi-group that stores the property being computed (probability, cost, score, ...), itself represented by a semi-ring.

In part 2, we briefly recall the seed design principle, focusing on the seed sensitivity computation. We then state the (matrix) product problem in part 3, and propose a method to solve it. Finally, in part 4, we give some measurements on a practical implementation included in the ongoing 1.06 version of Iedera (http://bioinfo.lifl.fr/yass/iedera.php), before concluding remarks in part 5.

2 Seed design process 

Spaced seeds are now a frequently used hashing technique for biological sequence analysis. Their implementation (as a direct hashing method) is straightforward and brings higher sensitivity for the same theoretical selectivity compared to contiguous seeds of an equivalent weight. Interestingly, in practice, a slightly reduced computational cost can even be observed when using spaced seeds compared with contiguous seeds of the same weight.

Spaced seeds have been generalized by several extended seed models (Vector seeds [BBV05], Indel seeds [MGB06], Subset seeds [KNR06, ZF07, YZ08], Neighbor seeds [CM07]). To increase the overall sensitivity, they can usually be designed jointly as multiple seeds [YWC+04, SB05], and (for example on quality labelled sequences) as positioned seeds [LZZ+08, NGK10].

In addition to the seed model, one needs a selection criterion for good seed shapes: this criterion is (almost always) established on a model of the alignments being matched (usually represented as words on a binary match/mismatch alphabet), itself weighted by a probabilistic/cost/score/... model (possibly any combination of such “semi-groups”). Here again, the initially proposed i.i.d. Bernoulli model [KLMT04] has been extended to Markov models [BKS05] and HMMs [BBV04], with several extensions of its parametrization [MB07, CP10].

In practice, the criterion considered to select good spaced seed shapes is “the probability to hit at least once” (sensitivity), or “the guarantee to always hit at least once” (lossless property). Such criteria can then be measured by a dynamic programming algorithm based on the decomposition of alignment word suffixes detected by the seed [KLMT04, BK03], or more directly on the regular language recognized by the seed, itself compiled into a deterministic finite automaton [BKS05, KNR06, HR08].
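The sensitivity criterion can be illustrated on the simplest model, the i.i.d. Bernoulli one. The sketch below is not the automaton-based algorithm of [KNR06]: it is a plain dynamic programming over the last span − 1 alignment symbols, and the names `hits` and `sensitivity`, as well as the seed notation (a string where "1" is a required match and any other character a joker), are assumptions made for this illustration.

```python
from itertools import product


def hits(seed, window):
    """True if every '1' position of the seed faces a match (1) in the window."""
    return all(a == 1 for s, a in zip(seed, window) if s == "1")


def sensitivity(seed, n, p):
    """Probability that `seed` hits at least once in an i.i.d. alignment of
    length n whose columns are matches with probability p."""
    span = len(seed)
    if n < span:
        return 0.0
    # State: the last (span - 1) alignment symbols.
    # Value: probability of being in that state with no hit so far.
    states = {}
    for prefix in product((0, 1), repeat=span - 1):
        ones = sum(prefix)
        states[prefix] = p ** ones * (1 - p) ** (span - 1 - ones)
    for _ in range(n - span + 1):
        following = {}
        for suffix, prob in states.items():
            for symbol, p_symbol in ((1, p), (0, 1 - p)):
                window = suffix + (symbol,)
                if hits(seed, window):
                    continue  # a hit: this mass leaves the "no hit" set
                key = window[1:]
                following[key] = following.get(key, 0.0) + prob * p_symbol
        states = following
    return 1.0 - sum(states.values())
```

For instance, `sensitivity("1101", 64, 0.7)` gives the hit probability of the seed 1101 on alignments of length 64 with 70% identity. The number of states grows as 2^(span − 1), which is one reason why automaton-based approaches are preferred for longer seeds.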

3 Matrices product 

Given an automaton for the language recognized by the seed, and given a model (probabilistic/cost/score model) provided by a transducer, it is possible to compute properties (probabilities, costs, scores ...) of the initial language (see the illustrative example provided in Figure 1 for probabilities). In practice, the resulting matrices obtained from the model and the seed language are multiplied and/or powered; the computation “within matrices” is performed on “semi-rings” representing the properties. For example, language probabilities are computed on a classical semi-ring $(E = \mathbb{R}_{0 \le r \le 1},\ \oplus = +,\ \odot = \times,\ 0_\oplus, \varepsilon_\odot = 0,\ 1_\odot = 1)$, whereas language costs are computed on a tropical semi-ring [Sim88, Pin98, MS09, Moh09] $(E = \mathbb{R},\ \oplus = \min,\ \odot = +,\ 0_\oplus, \varepsilon_\odot = \infty,\ 1_\odot = 0)$, and language scores on $(E = \mathbb{R},\ \oplus = \max,\ \odot = +,\ 0_\oplus, \varepsilon_\odot = -\infty,\ 1_\odot = 0)$.
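To make the role of the semi-ring concrete, here is a minimal sketch of a matrix product parameterized by its semi-ring. The `Semiring` class, the three constants and `mat_mul` are names chosen for this illustration and are not taken from Iedera; matrices are plain lists of lists.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Semiring:
    add: Callable[[float, float], float]  # the "⊕" operation
    mul: Callable[[float, float], float]  # the "⊙" operation
    zero: float                           # neutral for ⊕, absorbing for ⊙
    one: float                            # neutral for ⊙


PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
MIN_COST = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)
MAX_SCORE = Semiring(max, lambda a, b: a + b, float("-inf"), 0.0)

Matrix = List[List[float]]


def mat_mul(sr: Semiring, a: Matrix, b: Matrix) -> Matrix:
    """Product of two matrices where + and x are replaced by sr.add and sr.mul."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[sr.zero] * cols for _ in range(rows)]
    for r in range(rows):
        for k in range(inner):
            if a[r][k] == sr.zero:
                continue  # absorbing element: contributes nothing
            for c in range(cols):
                out[r][c] = sr.add(out[r][c], sr.mul(a[r][k], b[k][c]))
    return out
```

The same `mat_mul` then serves the lossy framework (with `PROBABILITY`) and the lossless one (with `MIN_COST`); only the semi-ring passed as first argument changes.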

In practice, for a set of seeds (and in general for any regular expression), the same algorithm [KNR06, MHKR12] can be applied on both classical and tropical semi-rings: it computes, for example, either the seed sensitivity on the classical semi-ring, for what is commonly called the lossy seed design framework, or the minimal cost, and thus the lossless property, on the tropical semi-ring for the lossless seed design framework [NGK10]. Note also that it can be adapted to a score framework, provided that the problem is clearly defined (e.g. [KNP04]).

Figure 1: Product of the seed 1*1 automaton (states q1 to q5) with an ad hoc probabilistic model (states p1 and p2), giving a transition matrix indexed by the product states (q1×p1) to (q5×p2)

In the lossy framework, HMMs are frequently used in biological sequence and alignment representation (for example as profile HMMs [Edd98]; see footnote 2). They can thus be easily applied to seed sensitivity computation [BBV04, KNR06, HR08]: they give a set of probabilities (emission probabilities for each state, together with transition probabilities between states) that are computed out of a profile alignment. But when such HMMs have to be used with HTS reads to design seeds, one must face a new problem: the read can be any sub-part of the HMM (HMM local alignment), and thus the computation may start at any “position” on the alignment HMM. Designing seeds is then, in some way, a more challenging problem, since one needs to know precisely the hit probability of a set of (positioned) seeds for each window along the HMM.

3.1 Sliding window product 

Such a computation, translated into matrix form, requires computing, for a list of (non-invertible) matrices $M_0, M_1, M_2, \ldots, M_{n-1}$, a set of products of one of the two following forms:

Problem. compute

$$\prod_{u=i}^{i+w} M_u \quad \forall i \in [0..n-w-1] \qquad (1)$$

where $w$ is the length of the read,

or more generally:

Problem. compute

$$\prod_{u=i(t)}^{j(t)} M_u$$

∀t with 0≤i(t)≤j(t)k).

2 Notice also that Position Weight Matrices (PWM) with indels, such as those used for example in Prosite, can be seen as a rough equivalent of the profile HMM in the tropical semi-ring...

Maintaining such matrices Uk for k ∈ [i..j] costs at most (in amortized analysis) one product per j-increase (see Appendix 7.2). Note that increasing i simply deletes the last Ui and thus does not cost any additional product on the Uk’s. A pseudo-code of the add right process (increment of j) is provided in Algorithm 1.

Figure 2: Uk matrices: example when i = 0 and j = 24

Without assuming that any previous computation is kept, it is directly possible to compute the Mi..j product, as Mi × Mi+1 × ··· × Mj for any i, j (j > i), in O(log(j − i)) products using the updated set of Uk matrices for k ∈ [i..j] (see Appendix 7.1).

But if the product is computed while i and j follow two monotonically increasing functions (each step adds +0 or +1), the number of products can be reduced to an (amortized) constant for each i and j step-move (or for both moves).
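The logarithmic bound can be made concrete on indices alone. The sketch below assumes, as Algorithm 1 and Figures 2 and 3 suggest, that Uk holds the product of the largest block starting at k whose size is a power of two dividing k and that does not extend past j. The function names `block_size` and `decompose` are hypothetical.

```python
def block_size(k, j):
    """Size of the largest power-of-two block starting at k, aligned on its own
    size (k % size == 0) and ending at or before j."""
    size = 1
    while k % (2 * size) == 0 and k + 2 * size - 1 <= j:
        size *= 2
    return size


def decompose(i, j):
    """Cover [i..j] greedily, from left to right, with (start, size) blocks."""
    blocks = []
    k = i
    while k <= j:
        size = block_size(k, j)
        blocks.append((k, size))
        k += size
    return blocks


# Interval of Figure 3: i = 1, j = 24
print(decompose(1, 24))
```

For i = 1 and j = 24 the cover is made of the blocks starting at 1, 2, 4, 8, 16 and 24: the sizes first grow, then shrink, so the number of blocks (and thus of products needed to join them) stays logarithmic in j − i.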

3.3.2 Middle m definition and Mi..j product update 

Figure 3: Uk matrices: example when i = 1 and j = 24 (here m = 16)

To split the computation when only i or j is moved, we need to define here the middle m of i and j. It is defined as the beginning position of the maximal (in size) U-block included in the interval i..j. If two equal-size maximal blocks are between i and j, we choose m as the one that is the most factorizable by two, which corresponds (see footnote 3) to the beginning of the right maximal block (see Figure 3). This middle border makes it possible to split the computation in two parts when needed, which we will call left (colored in green in Figure 3) and right (red in Figure 3). Note that $m < \frac{1}{3} i + \frac{2}{3} j$. Note also that when there is only one maximal sized block, then $m < \frac{1}{2} i + \frac{1}{2} j$, and when there are two maximal sized blocks, then $m > \frac{2}{3} i + \frac{1}{3} j$.

3 proof: the other choice would imply that the two maximal left and right blocks would be merged, which contradicts the “maximality” of the left block; thus only the right block can be increased in size. To conclude: for two contiguous blocks of equal size, the beginning of the right block is divisible by at least one more power of two than the beginning of the left block.
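The definition of the middle can be checked with a direct, deliberately naive scan. The sketch assumes the same block layout as in Algorithm 1 (the block starting at k has the largest power-of-two size that divides k and fits before j); `block_size` and `middle` are names introduced for this illustration, and the linear scan is not the update procedure used by the algorithm.

```python
def block_size(k, j):
    """Largest power-of-two size of an aligned block starting at k and ending <= j."""
    size = 1
    while k % (2 * size) == 0 and k + 2 * size - 1 <= j:
        size *= 2
    return size


def middle(i, j):
    """Start of the maximal block included in [i..j]; on ties, the right-most one."""
    best_start, best_size = i, 0
    for k in range(i, j + 1):
        size = block_size(k, j)
        if size >= best_size:  # ">=" keeps the right-most block when sizes are equal
            best_start, best_size = k, size
    return best_start


print(middle(1, 24))  # the case drawn in Figure 3
```

With i = 1 and j = 24, the two maximal blocks have size 8 and start at 8 and 16, so the scan returns 16, the value of m shown in Figure 3.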


Algorithm 1: add right: increments the right border j, and updates the set Ui..j using Mj

Input:

• M0, M1, M2, ..., Mn−1: original matrices.

Global:

• i, j: integers,

• Ui, ..., Uj: original and updated set of matrices.

Local:

• u, t, told: integers.

/* a) only before the first increment */

if j = 0 then

    U0 ← M0;

/* b) increment j */

inc(j);

/* c) and process the subset of Uj−t matrices that have to be updated */

Uj ← Mj;

u ← j + 1; told ← 0; t ← 1;

while u is even and j − t ≥ i do

    Uj−t ← Uj−t × Uj−told;

    told ← t; t ← 2.t + 1; u ← u/2;
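A runnable transcription of Algorithm 1 may help to follow the index arithmetic. This is a sketch under assumptions: the class name `RightBlocks` is hypothetical, j starts at −1 so that the first call stores M0 (which replaces step a of the pseudocode), and the semi-group product is passed as a function. String concatenation is used in the usage example because it is associative, non commutative and non invertible, like the matrix product.

```python
class RightBlocks:
    """Maintains the U[k] blocks while the right border j is incremented."""

    def __init__(self, mul, i=0):
        self.mul = mul   # associative product of the semi-group
        self.i = i       # left border (blocks before i are never updated)
        self.j = -1      # right border, -1 while nothing has been added
        self.U = {}      # U[k]: product of the block starting at k
        self.products = 0

    def add_right(self, m_j):
        """Append M_j and merge the equal-size blocks that end at j."""
        self.j += 1
        j = self.j
        self.U[j] = m_j
        u, t_old, t = j + 1, 0, 1
        while u % 2 == 0 and j - t >= self.i:
            # U[j - t] and U[j - t_old] are adjacent blocks of the same size
            self.U[j - t] = self.mul(self.U[j - t], self.U[j - t_old])
            self.products += 1
            t_old, t, u = t, 2 * t + 1, u // 2


letters = "abcdefghijklmnopqrstuvwxy"  # 25 elements, indices 0..24
blocks = RightBlocks(lambda a, b: a + b)
for letter in letters:
    blocks.add_right(letter)
print(blocks.U[0], blocks.U[16], blocks.U[24], blocks.products)
```

After the 25 insertions, U[0] covers positions 0 to 15, U[16] covers 16 to 23 and U[24] covers position 24 alone, as in Figure 2, and the number of products stays below the number of insertions.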

In the next part, we will compute in two separate parts Mi..m−1 and Mm..j, considering the case when 

m is fixed first, and then two cases when m is increased. 

middle unchanged: if we suppose that the middle m does not change during a computational step, the following can be observed:

• when j is increased (so that j = jold + 1), updating the product Mm..j can be done with one product, considering that we keep the previous computation Mm..jold. Thus, considering that we also update the Uk’s values at the same time, an amortized single product must be added (amortization on j: see Appendix 7.2). Joining Mi..m−1 with Mm..j will then cost one extra product, giving a total number of products of three.

• when i is increased (i = iold + 1), the previous computation Miold..m−1 does not help and can be erased here. However, if we suppose that we keep all the previously computed products Mk..m−1 in a stack for all the blocks Uk visited before, reusing and updating this part can be done with one single amortized product (amortization on m: see Appendix 7.3). Joining Mi..m−1 and Mm..j will then cost one extra product, giving a total number of products of two.

At first glance, a {cost(i) ≤ i + m; cost(j) ≤ 3j} cost is applied when m does not change. Otherwise this computation has to be updated, and this will be considered in the next part:

middle changed: if we suppose that the middle m does change, then the previous computation cut in two parts Mi..m−1 and Mm..j is somehow “compromised”. Let us now see when m changes, and moreover, why:

• when m changes due to a j-increase, as m follows the beginning of the largest right-most Uk block, j can increase the maximal block size by two without changing m (case handled before, corresponding to one single maximal block), or j can make m jump to the next power of two “potential” block, thus from $m_{old} = odd \times 2^{p}$ to $m = (odd+1) \times 2^{p} = \frac{odd+1}{2} \times 2^{p+1}$ (case not handled, that corresponds to two maximal blocks of equal size, the right-most being now the “m one”). This last case has no consequence on the product Mm..j, which is immediately computed by the update of the Uk’s values as Mm..j corresponds to the right-most maximal block in Uk, thus in one single product here (and not two as shown before).

However, moving m will obviously compromise the left stack of Mk..mold−1 previous computations: it will not help the computation of the next Mi..mold−1 on the next i-increase, since mold is now pushed to the next power of two m, and it can be erased. This cost can however be bounded by $\log_2(\frac{\Delta m}{2})$, where ∆m represents the m increase (see Appendix 7.4).

At the end, joining Mi..m−1 with Mm..j will cost one extra product.

Using an amortization on m and j and combining the two j-increase cases (when m does change, or not) gives a $cost(j) \le 3j + \frac{1}{8} m$ (see Appendix 7.4).

• when i is increased so that i > m (thus i = m+1), m (that corresponds to the largest right-most block) can only “jump” to a next block of smaller size: the cost on the left stack [i..m − 1] is already paid, as it corresponds to a “legal” move of i that is amortized by one product as seen previously (amortization on m, see Appendix 7.3).

However, moving m will obviously compromise the right computation of Mmold..j, since mold is now pushed to the next (smaller) block: it can be erased and recomputed. This cost can however be bounded by $\log_2(\frac{\Delta m}{2})$, where ∆m represents the m increase (see Appendix 7.5).

At the end, joining Mi..m−1 and Mm..j will cost one extra product.

Using an amortization on m and i and combining the two i-increase cases (when m does change, or not) gives a $cost(i) \le i + \frac{9}{8} m$ (see Appendix 7.5).

To conclude, a $\{cost(i) \le i + \frac{9}{8} m;\ cost(j) \le 3j + \frac{1}{8} m\}$ cost is applied.

3.3.3 Pseudocode 

To illustrate the previously described computation, an associated pseudocode is given in Algorithm 2. The proposed algorithm returns the Mi..j product (still defined as Mi × Mi+1 × ··· × Mj). It can only be applied once the Ui..j matrices have been updated by Algorithm 1. The main global data structure used in

Algorithm 2 is a stack of matrices left products to m stack that keeps a set of products Mk..m−1 (where k 

is i ≤ k pairs, 

• right product from m : < matrix,int > pair. 

Local: 

• Pleft,Pright : matrices, 

• kleft,kright : integers. 

Result: 

• the product Mi..j 

/* a) update m (update algorithm not described here) */ 

mold ← m ; m ← update(m,i,j); 

/* a.1) reset all global variables when m changes */

if mold ≠ m then 

left products to m stack ←∅; 

right products from m ← ; 

/* b) update the left stack products*/ 

/* b.1) remove stacked products that are not useful */

while left products to m stack ≠ ∅ and top(left products to m stack).int 

pop(left products to m stack); 

if left products to m stack ≠ ∅ then 

← top(left products to m stack); 

else 

← ; 

/* b.2) and compute / stack left products from i to m */ 

while kleft > i do

kleft ← kleft −size of block before(kleft,i); 

Pleft ← Ukleft ×Pleft; 

push(left products to m stack,< Pleft,kleft >); 


/* c) compute the right product from m to j */ 

← right product from m; 

while kright

1 

4 Experiments on seed design 

The previous algorithm has been implemented and tested in Iedera, where it can now be activated with the -ll option (see http://bioinfo.lifl.fr/yass/iedera.php).

We designed spaced seeds on reads using an alignment model obtained from a profile HMM. On a typical example, for a read/window length of 100 (respectively of 200), which corresponds to an observed current Illumina single read length (respectively two merged reads), and a simplified (see footnote 4) profile HMM alignment of size 1605 (from a 16S rRNA database), the number of products required for the full computation of the 1605 − 100 + 1 = 1506 windows of length 100 was 5931 (see footnote 5) (respectively 5720 products for the 1605 − 200 + 1 = 1406 windows of length 200). This must be compared to the number of products for the naive range algorithm: ≈ 150000 (respectively ≈ 300000).
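The orders of magnitude quoted above can be rechecked with a few lines of arithmetic. The sketch assumes that the naive range algorithm performs one product per position of each window, which is an interpretation consistent with the ≈ 150000 and ≈ 300000 figures but is not spelled out in the text; the function names are hypothetical.

```python
def window_count(profile_length, window_length):
    """Number of windows of a given length along the profile."""
    return profile_length - window_length + 1


def naive_products(profile_length, window_length):
    """Assumed naive cost: one product per position of every window."""
    return window_count(profile_length, window_length) * window_length


# Product counts reported in the text for the proposed algorithm
REPORTED = {100: 5931, 200: 5720}

for window_length, reported in REPORTED.items():
    windows = window_count(1605, window_length)
    naive = naive_products(1605, window_length)
    print(window_length, windows, naive, round(reported / windows, 2))
```

Under this assumption the naive counts are 150600 and 281200, and the proposed algorithm uses about four products per window in both settings, which agrees with one i-move and one j-move per window.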

However, it must be noticed that the products required for the naive algorithm are less time consuming (as matrix × vector products) compared to our case (matrix × matrix products). We thus compared the execution time of both approaches under the conditions proposed above, for window lengths of both 100 and 200. We conducted two experiments, one on spaced seeds, and the other on positioned spaced seeds. For the first experiment, the seeds were set at every position along the HMM, and the sensitivity was computed on all the windows along the HMM. For the second experiment, we additionally set a fixed number of positions along the HMM (10, 20, 40, 80, 160, 320, 640, 1280) where seeds were set: seed positions were drawn, and the sensitivity was again computed on all the windows along the HMM.

For both spaced seeds and positioned spaced seeds, we have chosen seeds of weight w ranging from 8 to 12 (Figures 4 and 5: x-axis bottom label), span s ranging from w to 2×w (Figures 4 and 5: x-axis top label), and, for each pair (w,s), we have computed the sensitivity on 100 seeds and measured the time elapsed (Figures 4 and 5: y-axis label). Note that the set of seeds (respectively the set of positions for each seed) was identical for both methods being evaluated. The computation was carried out exclusively on an HP Z800 computer (Intel(R) Xeon(R) CPU E5620 @ 2.40GHz) with 20 GB of RAM (in practice, not more than 20% of the RAM was used), using a single thread.

The obtained results are illustrated in Figure 4 for one single seed, and in Figure 5 for a set of two seeds: they show a substantial improvement in almost all cases considered in the experiments.

A twofold speedup is observed in the most time consuming problems: this happens for seeds of the largest span in the set. This is the worst case, in the sense that large and dense matrices are produced. In practice, for seeds of reasonable span/weight ratio (e.g. ≤ 1.8), the proposed algorithm is at least four times faster than the naive algorithm on non positioned seeds. The practical speedup for positioned seeds is less obvious on middle span seeds, but appears to increase if the seeds are of small or very large span, and when the set of positions increases. Finally, it must be noticed that on non positioned seeds, increasing the window length from 100 to 200 has a strong impact on the overall performance.

5 Concluding remarks 

First, it is very likely that the bounds proposed at the end of section 3 could be improved by a more precise analysis. However, going under a bound of 3.0 per move while computing, for each move, the window product, is unlikely (at least without any initial amortized cost), since we have found at least one example such that the amortized number of products is $\frac{9253}{3081} \approx 3.0032457$ per move (see footnote 6).

Moreover, when j is increased by “runs” while i is fixed, the proposed algorithm can be enhanced with a greedy computation of the Mi..j product (that can be done quickly provided that i is fixed for a while). In practice, this implementation always gives fewer or the same number of products than the proposed one, but remains to be carefully analyzed.

4 only matching states are kept: insertion and deletion states are removed, but we keep track of transitions between matching ↔ indel states to generate some break between contiguous blocks of matches: we thus keep the indel even without its length, and without any letter it may add here, since they are not supposed to match any seed

5 it must be noticed that each window needs a displacement both on i and j

6 1,2,−1,3..24,−2,−3,25..51,−4,52..72,−5,73..392,−6,393..441,−7,−8,442..577,−9,578..3071 where i-moves are given with a minus notation

Figure 4: Iedera speed improvement for one seed. Panels: positioned seeds and non positioned seeds, for window lengths 100 and 200; x-axis: weight (bottom) and span (top); y-axis: time (seconds); curves: naive range algorithm and proposed algorithm, with × 2 markers.

Figure 5: Iedera speed improvement for two seeds. Panels: positioned seeds and non positioned seeds (two seeds), for window lengths 100 and 200; x-axis: weight (bottom) and span (top); y-axis: time (seconds), with × 2 markers.

Finally, it is also very likely that the “binary block division” of the algorithm may be modified and analyzed to get an even better bound. For example, some sorting algorithms such as Smoothsort [Dij82] use Leonardo numbers (closely related to Fibonacci numbers) to partition data, rather than following the “binary tree” of the classical Heapsort. Another interesting point of view is to consider the size of each matrix (here unknown most of the time, and unfortunately square on trivial cases), in order to combine the sliding window problem with the classical matrix chain multiplication problem for the overall computation.

From a more practical point of view, matrix products used within the algorithm, when applied on sparse/non sparse matrices, cannot be considered as a “constant” operation, but more likely as a “function of the sparsity”. However, such an implementation needs to know this “sparsity cost” for all the possible products, which, in practice on unknown automata, is not predictable: estimating it amounts to simulating the product, and thus costs as much as the product itself. We have adopted in Iedera the choice of representing matrices (and the associated product) with the two possibilities: the algorithm chooses, for each matrix row, a sparse implementation if less than 20% of the cells are present, or a dense implementation otherwise. However, it is still possible here to get very high costs with the full matrix product: an alternative solution would be to combine elements of the naive range product with sub-computations from the proposed algorithm if such cases appear.
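The per-row choice between sparse and dense storage can be sketched as follows. This is only an illustration of the 20% rule stated above, on the classical (+, ×) semi-ring: the names `SPARSE_THRESHOLD`, `encode_row` and `row_times_matrix` are hypothetical and do not describe the actual Iedera data structures.

```python
SPARSE_THRESHOLD = 0.20  # rows with fewer than 20% of present cells are stored sparse


def encode_row(values, zero=0.0):
    """Return a {column: value} dict for a sparse row, or a plain list otherwise."""
    present = {c: v for c, v in enumerate(values) if v != zero}
    if len(present) < SPARSE_THRESHOLD * len(values):
        return present
    return list(values)


def row_times_matrix(row, matrix):
    """Multiply one encoded row by a matrix given as a list of dense rows."""
    width = len(matrix[0])
    out = [0.0] * width
    cells = row.items() if isinstance(row, dict) else enumerate(row)
    for k, value in cells:
        if value == 0.0:
            continue  # nothing to add for an absent cell
        for c in range(width):
            out[c] += value * matrix[k][c]
    return out
```

A sparse row only iterates over its present cells, so the cost of the product follows the number of such cells rather than the row width, which is the “function of the sparsity” mentioned above.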

Finally, a last aspect that can be taken into account is to parallelize the block product carefully, since it heavily depends on separate calculations for the same window being considered. The naive algorithm is in fact more difficult to parallelize efficiently within each window for at least two reasons: first, there is a flow dependency between the set of products; worse, within each product, synchronization is needed when accessing the post-computation vector, unless one reverses the computation by considering this vector first, which implies reversing the matrix cell access from row-first to column-first.

6 Acknowledgments 

This research was supported by the ANR project MAPPI (ANR-2010-COSI-004-02), LIFL (UMR CNRS 

8022 Université de Lille 1) and Inria Lille Nord-Europe. Project MAPPI is associated with the Tara Oceans 

expedition where the principal tasks involve the development of new software for mapping and assembling 

metagenomic and metatranscriptomic data. 

References 

[AS87] 

[BBV04] 

[BBV05] 

[BK01] 

[BK03] 

NogaAlonandBaruchSchieber. Optimalpreprocessingforansweringon-lineproductqueries. Technical 

Report TR 71/87, Inst. of Comp. Science, Tel-Aviv Univ., 1987. 

Broňa Brejová, Daniel G. Brown, and Tomáš Vinař. Optimal spaced seeds for homologous coding 

regions. Journal of Bioinformatics and Computational Biology, 1(4):595–610, Jan 2004. (earlier version 

in CPM 2003). URL: http://www.worldscinet.com/jbcb/01/0104/S0219720004000326.html, doi: 

10.1142/S0219720004000326. 

BroňaBrejová, DanielG.Brown, andTomášVinař. Vectorseeds: Anextensiontospacedseeds. Journal 

of Computer and System Sciences, 70(3):364–380, 2005. (earlier version in WABI 2003). URL: http:// 

linkinghub.elsevier.com/retrieve/pii/S0022000004001527, doi:10.1016/j.jcss.2004.12.008. 

Stefan Burkhardt and Juha Kärkkäinen. Better filtering with gapped q-grams. In Proceedings of 

the 12th Symposium on Combinatorial Pattern Matching (CPM),volume2089ofLecture Notes in 

Computer Science, pages 73–85. Springer, July 2001. URL: http://www.springerlink.com/content/ 

gykw51mpjqnwrmqx, doi:10.1007/3-540-48194-X_6. 

Stefan Burkhardt and Juha Kärkkäinen. Better filtering with gapped q-grams. Fundamenta Informaticae, 

56(1-2):51–70, 2003. Preliminary version in Combinatorial Pattern Matching 2001. URL: 

http://iospress.metapress.com/content/8ad9p3mqeday8vt5. 

[BKS05] 

Jeremy Buhler, Uri Keich, and Yanni Sun. Designing seeds for similarity search in genomic DNA. 

Journal of Computer and System Sciences, 70(3):342–363, 2005. (earlier version in RECOMB 2003). 


11 


12

URL: http://linkinghub.elsevier.com/retrieve/pii/S0022000004001515, doi:10.1016/j.jcss. 

2004.12.003. 

[Bro05] Daniel G. Brown. Optimizing multiple seeds for protein homology search. IEEE/ACM Transactions 

on Computational Biology and Bioinformatics (TCBB), 2(1):29–38, january 2005. (earlier version in 

WABI 2004). URL: http://ieeexplore.ieee.org/xpl/freeabs_all.jsparnumber=1416848, doi: 

10.1109/tcbb.2005.13. 

[Buh02] Jeremy Buhler. Provably sensitive indexing strategies for biosequence similarity search. In RECOMB, Washington, DC (USA), pages 90–99. ACM Press, April 2002. URL: http://doi.acm.org/10.1145/565196.565208, doi:10.1145/565196.565208.

[CM07] Miklós Csűrös and Bin Ma. Rapid homology search with neighbor seeds. Algorithmica, 48(2):187–202, Jun. 2007. (earlier version in COCOON 2005). URL: http://www.springerlink.com/content/45446712u14n0416, doi:10.1007/s00453-007-0062-y.

[CP10] Won-Hyoung Chung and Seong-Bae Park. Hit integration for identifying optimal spaced seeds. BMC Bioinformatics - Selected articles from the 8th Asia-Pacific Bioinformatics Conference (APBC), 18-21 January, Bangalore, India, 11(Suppl 1):S37, 2010. URL: http://www.biomedcentral.com/1471-2105/11/S1/S37, doi:10.1186/1471-2105-11-S1-S37.

[CR93] Andrea Califano and Isidore Rigoutsos. Flash: A fast look-up algorithm for string homology. In Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology (ISMB), pages 56–64, July 1993.

[CSC09] Yangho Chen, Tate Souaiaia, and Ting Chen. PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics, 25(19):2514–2521, 2009. URL: http://bioinformatics.oxfordjournals.org/content/25/19/2514, doi:10.1093/bioinformatics/btp486.

[DDDD+12] Dong Do Duc, Huy Q. Dinh, Thanh Hai Dang, Kris Laukens, and Xuan Huan Hoang. AcoSeeD: An ant colony optimization for finding optimal spaced seeds in biological sequence search. In Proceedings of the 8th International Conference on Swarm Intelligence (ANTS), Brussels (Belgium), volume 7461 of Lecture Notes in Computer Science, pages 204–211. Springer, 2012. URL: http://www.springerlink.com/content/n1476j612302410k/, doi:10.1007/978-3-642-32650-9_19.

[DDL+11] Matei David, Misko Dzamba, Dan Lister, Lucian Ilie, and Michael Brudno. SHRiMP2: Sensitive yet practical short read mapping. Bioinformatics, 2011. doi:10.1093/bioinformatics/btr046.

[Dij82] Edsger W. Dijkstra. Smoothsort, an alternative to sorting in situ. Sci. Comp. Progr., 1:223–233, 1982.

[Edd98] Sean R. Eddy. Profile hidden Markov models. Bioinformatics, 14(9):755–763, 1998. doi:10.1093/bioinformatics/14.9.755.

[EM11] Lavinia Egidi and Giovanni Manzini. Spaced seeds design using perfect rulers. In Proceedings of the 18th International Symposium on String Processing and Information Retrieval (SPIRE), Pisa (Italy), volume 7024 of Lecture Notes in Computer Science, pages 32–43. Springer, 2011. URL: http://www.springerlink.com/content/c18m78j1214h7k21/, doi:10.1007/978-3-642-24583-1_5.

[FCLCST05] Martin Farach-Colton, Gad M. Landau, Süleyman Cenk Sahinalp, and Dekel Tsur. Optimal spaced seeds for faster approximate string matching. In Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP'05), Lisboa (Portugal), volume 3580 of Lecture Notes in Computer Science, pages 1251–1262. Springer, 2005. URL: http://www.springerlink.com/content/815pej6c1kc09upj, doi:10.1007/11523468_101.

[HMN09] Nils Homer, Barry Merriman, and Stanley F. Nelson. BFAST: An alignment tool for large scale genome resequencing. PLoS One, 4(11):e7767, 2009. doi:10.1371/journal.pone.0007767.

[HR08] Inke Herms and Sven Rahmann. Computing alignment seed sensitivity with probabilistic arithmetic automata. In Proceedings of the 8th International Workshop on Algorithms in Bioinformatics (WABI), Karlsruhe (Germany), volume 5251 of Lecture Notes in Bioinformatics, pages 318–329. Springer, Sept. 2008. URL: http://www.springerlink.com/content/e8w1g39288144l56, doi:10.1007/978-3-540-87361-7_27.

[II09] Lucian Ilie and Silvana Ilie. Fast computation of neighbor seeds. Bioinformatics, 25(6):822–823, 2009. URL: http://bioinformatics.oxfordjournals.org/content/25/6/822, doi:10.1093/bioinformatics/btp054.

[IIMB11] Lucian Ilie, Silvana Ilie, and Anahita Mansouri Bigvand. SpEED: fast computation of sensitive spaced seeds. Bioinformatics, 2011. doi:10.1093/bioinformatics/btr368.

[KLMT04] Uri Keich, Ming Li, Bin Ma, and John Tromp. On spaced seeds for similarity search. Discrete Applied Mathematics, 138(3):253–263, 2004. (preliminary version in 2002). doi:10.1016/S0166-218X(03)00382-2.

[KNP04] Gregory Kucherov, Laurent Noé, and Yann Ponty. Estimating seed sensitivity on homogeneous alignments. In Proceedings of the IEEE 4th Symposium on Bioinformatics and Bioengineering (BIBE), May 19-21, 2004, Taichung (Taiwan), pages 387–394. IEEE Computer Society Press, April 2004. URL: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1317369, arXiv:cs.OH/0603106, doi:10.1109/BIBE.2004.1317369.

[KNR06] Gregory Kucherov, Laurent Noé, and Mikhail A. Roytberg. A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology, 4(2):553–569, November 2006. URL: http://www.worldscinet.com/jbcb/04/0402/S0219720006001977.html, arXiv:cs.DS/0601116, doi:10.1142/S0219720006001977.

[KWS+11] Szymon M. Kiełbasa, Raymond Wan, Kengo Sato, Paul Horton, and Martin C. Frith. Adaptive seeds tame genomic sequence comparison. Genome Research, 21(3):487–493, 2011. URL: http://genome.cshlp.org/content/21/3/487, doi:10.1101/gr.113985.110.

[LZZ+08] Hao Lin, Zefeng Zhang, Michael Q. Zhang, Bin Ma, and Ming Li. ZOOM! Zillions Of Oligos Mapped. Bioinformatics, 24(21):2431–2437, 2008. URL: http://bioinformatics.oxfordjournals.org/content/24/21/2431, doi:10.1093/bioinformatics/btn416.

[MB07] Denise Y.F. Mak and Gary Benson. All hits all the time: parameter free calculation of seed sensitivity. In D. Sankoff, L. Wang, and F. Chin, editors, Proceedings of the 5th Asia Pacific Bioinformatics Conference (APBC), volume 5 of Advances in Bioinformatics and Computational Biology, pages 327–340. Imperial College Press, 2007. URL: http://eproceedings.worldscinet.com/9781860947995/9781860947995_0035.html, doi:10.1142/9781860947995_0035.

[MGB06] Denise Y.F. Mak, Yevgeniy Gelfand, and Gary Benson. Indel seeds for homology search. Bioinformatics, 22(14):e341–e349, 2006. URL: http://bioinformatics.oxfordjournals.org/content/22/14/e341, doi:10.1093/bioinformatics/btl263.

[MHKR12] Tobias Marschall, Inke Herms, Hans-Michael Kaltenbach, and Sven Rahmann. Probabilistic arithmetic automata and their applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9(6):1737–1750, 2012. URL: http://doi.ieeecomputersociety.org/10.1109/tcbb.2012.109, doi:10.1109/TCBB.2012.109.

[Moh09] Mehryar Mohri. Handbook of Weighted Automata, chapter Weighted automata algorithms, pages 213–254. Springer, 2009. doi:10.1007/978-3-642-01492-5_6.

[MS09] Diane Maclagan and Bernd Sturmfels. Introduction to tropical geometry. (draft book-in-progress), 2009.

[MTL02] Bin Ma, John Tromp, and Ming Li. PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18(3):440–445, 2002. URL: http://bioinformatics.oxfordjournals.org/content/18/3/440, doi:10.1093/bioinformatics/18.3.440.

[MY09] Bin Ma and Hongyi Yao. Seed optimization for i.i.d. similarities is no easier than optimal Golomb ruler design. Information Processing Letters, 109(19):1120–1124, 2009. URL: http://linkinghub.elsevier.com/retrieve/pii/S0020019009002270, doi:10.1016/j.ipl.2009.07.008.

[NGK10] Laurent Noé, Marta Gîrdea, and Gregory Kucherov. Designing efficient spaced seeds for SOLiD read mapping. Advances in Bioinformatics, 2010:ID 708501, July 2010. URL: http://www.hindawi.com/journals/abi/2010/708501/, doi:10.1155/2010/708501.

[NR08] François Nicolas and Éric Rivals. Hardness of optimal spaced seed design. Journal of Computer and System Sciences, 74(5):831–849, Aug. 2008. (earlier version in CPM 2005). URL: http://linkinghub.elsevier.com/retrieve/pii/S0022000007001444, doi:10.1016/j.jcss.2007.10.001.

[Nue11] Gregory Nuel. Bioinformatics - Trends and Methodologies, chapter Significance Score of Motifs in Biological Sequences. InTech, 2011. doi:10.5772/18448.

[Pin98] Jean-Éric Pin. Tropical semirings. In J. Gunawardena, editor, Idempotency, volume 11 of Publ. Newton Inst., pages 50–69, Bristol, 1998. Cambridge Univ. Press.


[RLD+09] Stephen M. Rumble, Phil Lacroute, Adrian V. Dalca, Marc Fiume, Arend Sidow, and Michael Brudno. SHRiMP: Accurate mapping of short color-space reads. PLoS Comput Biol, 5(5):e1000386, May 2009. URL: http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000386, doi:10.1371/journal.pcbi.1000386.

[SB05] Yanni Sun and Jeremy Buhler. Designing multiple simultaneous seeds for DNA similarity search. Journal of Computational Biology, 12(6):847–861, 2005. (earlier version in RECOMB 2004). URL: http://www.liebertonline.com/doi/abs/10.1089/cmb.2005.12.847, doi:10.1089/cmb.2005.12.847.

[SB09] Yanni Sun and Jeremy Buhler. Designing patterns and profiles for faster HMM search. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 6(2):232–243, 2009. doi:10.1109/tcbb.2008.14.

[Sim88] Imre Simon. Recognizable sets with multiplicities in the tropical semiring. In Mathematical foundations of computer science, 1988 (Carlsbad, 1988), volume 324 of Lecture Notes in Comput. Sci., pages 107–120. Springer, Berlin, 1988. doi:10.1007/BFb0017135.

[YWC+04] I-Hsuan Yang, Sheng-Ho Wang, Yang-Ho Chen, Pao-Hsian Huang, Liang Ye, Xiaoqiu Huang, and Kun-Mao Chao. Efficient methods for generating optimal single and multiple spaced seeds. In Proceedings of the IEEE 4th Symposium on Bioinformatics and Bioengineering (BIBE), Taichung (Taiwan), pages 411–416. IEEE Computer Society Press, 2004. URL: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1317372, doi:10.1109/BIBE.2004.1317372.

[YZ08] Jialiang Yang and Louxin Zhang. Run probabilities of seed-like patterns and identifying good transition seeds. Journal of Computational Biology, 15(10):1295–1313, Dec. 2008. (earlier version in APBC 2008). URL: http://www.liebertonline.com/doi/abs/10.1089/cmb.2007.0209, doi:10.1089/cmb.2007.0209.

[ZF07] Leming Zhou and Liliana Florea. Designing sensitive and specific spaced seeds for cross-species mRNA-to-genome alignment. Journal of Computational Biology, 14(2):113–130, Mar. 2007. URL: http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.0130, doi:10.1089/cmb.2006.0130.


7 Appendix 

Figure 6: $U_k$ matrices and product: example when i = 9 and j = 23 (n = 14)

Figure 7: $U_k$ matrices and product: example when i = 9 and j = 19 (n = 10)

7.1 Worst case number of products from i to j 

We denote by n the number of single matrices: n = j − i + 1 (n is thus the length of the block being computed with the help of the $U_k$ matrices already given). We illustrate below how to obtain the smallest size n as a function of the number of products x.

• If x is odd, the worst case is produced by a concatenation of blocks of size $2^i$ on both ends, for $i \in [0..\frac{x-1}{2}]$:

\[ n = 2\sum_{i=0}^{\frac{x-1}{2}} 2^i = 2\sqrt{2}\times 2^{\frac{x}{2}} - 2 \qquad\qquad x = 2\log_2\left(\frac{n+2}{2\sqrt{2}}\right) \]

• If x is even, the worst case is produced by a concatenation of blocks of size $2^i$ on both ends of a block of size $2^{\frac{x}{2}}$, for $i \in [0..\frac{x}{2}-1]$ (see Figure 7 for x = 4):

\[ n = 2\sum_{i=0}^{\frac{x}{2}-1} 2^i + 2^{\frac{x}{2}} = 3\times 2^{\frac{x}{2}} - 2 \qquad\qquad x = 2\log_2\left(\frac{n+2}{3}\right) \]

Figure 8: minimal n (for x even and odd) functions compared to 2×x. Plotted curves: $2\sqrt{2}\times 2^{x/2} - 2$ (x odd), $3\times 2^{x/2} - 2$ (x even) and $2\times x$, for x from 0 to 5.

Figure 9: $U_k$ matrices and $M_{i..m-1}$ product: example when i = 33 and j goes from 47 to 48

Note that this integer sequence is listed in the OEIS as http://oeis.org/A027383, defined there as the partial sums of http://oeis.org/A016116.

Combining these two cases, it can be shown that when the number of products is x = 1, 2 or 3, the minimal size is exactly 2×x (illustrated in Figure 8), and that when x > 3 (or x = 0) this bound is never reached again.

In other words, the number of products x is always $\le \frac{n}{2}$.
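The two closed forms can be checked by brute force. The sketch below rests on an assumption that this appendix does not state explicitly and that is only suggested by Figures 6 and 7: the $U_k$ matrices are taken to be aligned blocks whose sizes are powers of two, and a half-open range [a, b) is covered greedily from the left by the largest aligned blocks that fit. The function names are illustrative, not taken from the paper.

```python
def dyadic_blocks(a, b):
    """Cover the half-open range [a, b) with aligned power-of-two blocks.

    Assumption: a block of size s may only start at a multiple of s.
    Returns the list of (start, size) pairs, chosen greedily from the left.
    """
    blocks = []
    while a < b:
        # Largest size allowed by the alignment of a (unbounded when a == 0).
        size = a & -a if a else 1 << (b - a).bit_length()
        # Shrink until the block fits in what remains of the range.
        while size > b - a:
            size >>= 1
        blocks.append((a, size))
        a += size
    return blocks


def min_n(x):
    """Smallest n needing x products, from the two cases of Section 7.1."""
    half = x // 2
    if x % 2:  # x odd: blocks 2^0 .. 2^((x-1)/2) on both ends
        return 2 * sum(2 ** i for i in range(half + 1))
    # x even: blocks 2^0 .. 2^(x/2-1) on both ends of one block of size 2^(x/2)
    return 2 * sum(2 ** i for i in range(half)) + 2 ** half


def brute_force_min_n(limit):
    """For each number of products x, the smallest n found with b <= limit."""
    best = {}
    for a in range(limit):
        for b in range(a + 1, limit + 1):
            x = len(dyadic_blocks(a, b)) - 1  # x products join x + 1 blocks
            best[x] = min(best.get(x, b - a), b - a)
    return best


if __name__ == "__main__":
    observed = brute_force_min_n(64)
    for x in range(9):
        print(x, min_n(x), observed[x])
```

Under these assumptions, `dyadic_blocks(9, 23)` yields blocks of sizes 1, 2, 4, 4, 2, 1, that is 5 products for a range of length 14, which matches the configuration of Figure 6.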

7.2 Amortized analysis of $U_k$ blocks when i = 0 and j ≥ 0

At first sight, the number of products needed when computing $U_k$ should be 2 on average, and not 1: a quick analysis shows that, if one product is done half of the time, two are done a quarter of the time, three an eighth of the time, and so on, then the average is $\sum_{u=1}^{\infty} \frac{u}{2^u} = 2$.

However, we show here that the amortized number of products per increase of j is only 1. We use an amortized analysis that gives one coin each time j is increased (i is supposed to stay at 0, but this assumption can be lifted since it can be seen as a worst case when updating $U_k$). We show that any sub-block $U_k$ generates one extra coin; when it is grouped with its neighbour block of the same size (itself generating one extra coin), the father block built from those two costs one product and therefore also generates (1+1)−1 = one extra coin.

• This is true for blocks of size 2, since they are built from blocks of size 1 that do not generate any product: the cost of such a block of size 2 is thus 1, and 1 extra coin remains.

• This can easily be verified for blocks of size $2^p$ (p > 1), since by the induction hypothesis the two sub-blocks of size $2^{p-1}$ each give one extra coin: joining the two sub-blocks removes one coin, and one extra coin remains again.

Note that this analysis holds for any i ≥ 0 and any j > i, provided that an extra number of j − i coins is given at first.
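The coin argument can be illustrated by a small simulation. It assumes, as a simplification that the text does not spell out, that the $U_k$ blocks behave like the digits of a binary counter: each increase of j adds a block of size 1, and two neighbouring blocks of equal size are merged with exactly one product. The function name and the model are illustrative only.

```python
def products_after_each_step(j_max):
    """Cumulative number of products after j = 1, 2, ..., j_max.

    Assumed model: every step appends a block of size 1, then merges the two
    rightmost blocks (one product per merge) while they have the same size.
    """
    sizes = []      # sizes of the current blocks, largest first
    products = 0
    history = []
    for _ in range(j_max):
        sizes.append(1)
        while len(sizes) >= 2 and sizes[-1] == sizes[-2]:
            merged = sizes.pop() + sizes.pop()
            sizes.append(merged)
            products += 1
        history.append(products)
    return history


if __name__ == "__main__":
    for j, total in enumerate(products_after_each_step(16), start=1):
        # One coin is given per step: the coins left over are j - total.
        print(j, total, j - total)
```

In this model the total number of products after j steps never exceeds j − 1, so fewer than one product is paid per step on average. A completed block of size $2^p$ has cost $2^p - 1$ products for $2^p$ coins, which leaves the single extra coin used in the induction.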

Figure 9 (labels recovered from the drawing): i = 33, $m_{old}$ = 36, m = 40, $j_{old}$ = 47, j = 48.

7.3 Amortized analysis of the left $M_{i..m-1}$ blocks when m is fixed and i is increased

The number of products needed when computing $M_{i..m-1}$ for any i from 0 to m is 1 on average: a quick analysis shows indeed that, if no product is done half of the time (when i is even), one product is done a quarter of the time, two an eighth of the time, and so on, then the average is $\sum_{u=0}^{\infty} \frac{u}{2^{u+1}} = 1$.

But this does not guarantee that the total number of products paid when increasing i from any value (for example 0) to m is always less than m. Here we show that the number of products (once m is fixed) for computing $M_{i..m-1}$ for any i from 0 to a given $m = 2^p$ is $\le 2^p - p - 1$.

m = 2 → 0
m = 4 → 1
m = 8 → 4
m = 16 → 11

A method similar to that of Section 7.2 can be applied.

First we consider the case when i = 0 and m has been increased to reach a given (and fixed) value $2^p$.

• This is true when p = 1 (thus when m = 2) since, using $U_k$ blocks, no product is needed to compute $M_{0..1}$ and $M_{1..1}$.

• This can be verified for blocks of size $2^p$ (p > 1), since we can then use the two sub-blocks of size $2^{p-1}$: when i is within the first sub-block, as the product is done from m to i and stacked in such a way that any suffix $M_{k..m}$ is kept, it costs the products produced by this sub-block ($2^{p-1} - (p-1) - 1$) added to the $\log_2(\frac{m}{2}) = p-1$ extra products needed to cover the second sub-block of size $2^{p-1}$; when i is within the second sub-block, it costs exactly the number of products produced by this sub-block ($2^{p-1} - (p-1) - 1$). Thus, when summing these two quantities, the number of products is $\le 2 \times (2^{p-1} - (p-1) - 1) + (p-1) = 2^p - p - 1$.

Thus, increasing i from any value ≥ 0 to m and computing all the possible products (with the help of the $U_k$ blocks) takes $\le m - \log_2(m) - 1$ products, and thus costs less than m.

Note that this analysis holds for any i ≥ 0 and any m (not necessarily a power of 2, but written as $m = a \times 2^p$ such that $2^p$ is the maximal block size of $U_k$ for $k \in [i..j]$).
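The induction step can be replayed numerically. The sketch below only unrolls the recurrence used in the argument, T(1) = 0 and T(p) = 2 T(p−1) + (p−1), and compares it with the closed form $2^p - p - 1$; the name `left_stack_products` is illustrative, and the code does not simulate the matrix products themselves.

```python
def left_stack_products(p):
    """Bound on the number of products for m = 2**p, following Section 7.3.

    T(1) = 0, and T(p) = 2 * T(p - 1) + (p - 1): both sub-blocks of size
    2**(p - 1) cost T(p - 1), plus p - 1 products to cover the second one.
    """
    total = 0
    for q in range(2, p + 1):
        total = 2 * total + (q - 1)
    return total


if __name__ == "__main__":
    for p in range(1, 11):
        m = 2 ** p
        # Closed form stated in the text: 2^p - p - 1 = m - log2(m) - 1.
        print(m, left_stack_products(p), m - p - 1)
```

For p = 1 to 4 the recurrence gives 0, 1, 4 and 11, the values listed above for m = 2, 4, 8 and 16.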

7.4 Amortized analysis of the left $M_{i..m-1}$ blocks when m is increased (due to a j increase) and i is fixed

When j increases while i is fixed, m may change to a new (and of course larger) value pointing to a block of equal size (or twice as large): this happens when m goes from $m_{old} = 2^{p_{old}} \times a_{old}$ (with $a_{old}$ odd) to its new value $m = m_{old} + 2^{p_{old}} = (a_{old}+1) \times 2^{p_{old}} = a \times 2^p$ (with $a = \frac{a_{old}+1}{2}$ and $p = p_{old}+1$), as illustrated in Figure 9.
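As a small illustration of this update rule, the sketch below computes the new m from $m_{old}$ using the lowest set bit of $m_{old}$ as $2^{p_{old}}$. The function name `next_m` is hypothetical, and a and p are derived exactly as in the formula above, so a is not forced to be odd.

```python
def next_m(m_old):
    """Return (m, a, p) after the update m = m_old + 2**p_old.

    m_old is written a_old * 2**p_old with a_old odd, so 2**p_old is the
    lowest set bit of m_old. Following the text, a = (a_old + 1) / 2 and
    p = p_old + 1 (a itself is not required to be odd here).
    """
    if m_old <= 0:
        raise ValueError("m_old must be a positive integer")
    step = m_old & -m_old            # 2**p_old
    p_old = step.bit_length() - 1
    a_old = m_old >> p_old           # odd by construction
    m = m_old + step
    return m, (a_old + 1) // 2, p_old + 1


if __name__ == "__main__":
    print(next_m(36))
```

With $m_{old} = 36 = 9 \times 2^2$, the update gives $m = 40 = 5 \times 2^3$, so that $\Delta m = 4 = 2^{p_{old}}$.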


Here we are interested in the computation of $M_{i..m-1}$ due to this $\Delta m = m - m_{old} = 2^{p_{old}}$ increase.

In practice, since m has changed, the full set of left stack matrices $M_{k..m-1}$ has to be recomputed for some $k \in [i..m]$, and some products already done for $M_{k..m_{old}-1}$ (amortized in Section 7.3) unfortunately have to be done a second time here.

This twice-cost is at most $\log_2\left(\frac{m_{old}-i+1}{2}\right) \le \log_2\left(\frac{m-m_{old}}{2}\right) = \log_2\left(\frac{\Delta m}{2}\right)$ ($m_{old} - i$ …

7.6 Analysis 1 

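If the size of the $\Delta m$ block increase is given by $2^u$, the function f(u) that represents the amortized increase per j move (Appendix 7.4) is:

\[ f(u) = \left(3 - \frac{1}{2^u}\right)[j] + \frac{u-1}{2^u}[m] \;\le\; g(u) = \left(3 - \frac{1}{2^u}\right)[j] + \frac{u-1}{2^u}\left[\frac{i+2\times j}{3}\right] \qquad \text{since } m < \frac{i+2\times j}{3} \]

\[ f'(u) = \frac{\ln(2)}{2^u}[j] + \frac{1+\ln(2)-u\ln(2)}{2^u}[m] \qquad g'(u) = \frac{\ln(2)}{2^u}[j] + \frac{1+\ln(2)-u\ln(2)}{2^u}\left[\frac{i+2\times j}{3}\right] \]

Note that

\[ g'(x) \ge 0 \text{ if } x \le 1 + \frac{1}{\ln(2)} \approx 2.44 \]

\[ g'(x) \le 0 \text{ if } x \ge \frac{5}{2} + \frac{1}{\ln(2)} \approx 3.94 \]

so the integer value of u that maximizes g is one of 2, 3 or 4. Since

\[ g(3) - g(2) = \frac{1}{8}[j] \ge 0 \]

\[ g(4) - g(3) = \frac{1}{48}([j]-[i]) \ge 0 \]

then $g(4) = 3[j] + \frac{[i]+[j]}{16}$ is the maximal value. Thus, $f(u) \le 3[j] + \frac{[i]+[j]}{16}$.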
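The two differences and the maximality of g(4) can be checked with exact rational arithmetic. The sketch below assumes that the bracketed quantities [i] and [j] can be treated as plain non-negative numbers with i ≤ j, which is an interpretation rather than something stated in the text; the function names are illustrative.

```python
from fractions import Fraction


def g(u, i, j):
    """g(u) = (3 - 1/2^u) j + ((u - 1)/2^u) (i + 2 j)/3, with exact rationals.

    Assumption: the bracketed quantities [i] and [j] are treated as plain
    non-negative numbers with i <= j.
    """
    scale = Fraction(1, 2 ** u)
    return (3 - scale) * j + (u - 1) * scale * Fraction(i + 2 * j, 3)


def check(i, j, u_max=40):
    """Check both differences, then whether g(4) is the maximum over integers."""
    assert g(3, i, j) - g(2, i, j) == Fraction(j, 8)
    assert g(4, i, j) - g(3, i, j) == Fraction(j - i, 48)
    assert g(4, i, j) == 3 * j + Fraction(i + j, 16)
    return max(g(u, i, j) for u in range(u_max + 1)) == g(4, i, j)


if __name__ == "__main__":
    print(all(check(i, j) for j in range(50) for i in range(j + 1)))
```

Using `Fraction` avoids any rounding, so the equalities are tested exactly rather than up to a floating-point tolerance.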

Note also that

\[ f(u) = \left(3 - \frac{1}{2^u}\right)[j] + \frac{u-1}{2^u}[m] \;\le\; 3[j] + \frac{u-2}{2^u}[m] \qquad \text{since } m \ldots \]
