12.07.2015 Views

Slim: Directly Mining Descriptive Patterns - Universiteit Antwerpen

[Plot: relative compression (L%) and running time in seconds per dataset, for Slim, Krimp, and Slim at L% of Krimp.]

Figure 1: Comparing Slim and Krimp: relative compression (L%) (top), overall running time in seconds (bottom), and time needed by Slim to reach the compression attained by Krimp (mid-grey). Note that besides obtaining better compression, the Slim prototype is nearly always faster than the optimised Krimp implementation.

[Plot: relative compression (L%) against number of accepted itemsets on Adult, for Slim, Slam, Krimp, and Kramp.]

Figure 2: Convergence plot for Adult. The lines in the lower left corner show the final attained relative compression.

[Plot: estimated ∆L (in bits) against exact ∆L (in bits) on DNA amplification, with rejected and accepted candidates marked.]

Figure 3: Correlation plot of the estimated versus exact ∆L (in bits) on the DNA amplification dataset.

6.5 Number of Candidates. Next, we compare the number of instantiated candidates |F|, shown in Table 1. This is the number of itemsets for which we calculate the total compressed size by covering the data. For Krimp this equals the number of frequent itemsets at the listed minsup threshold. For Slim this reflects the number of materialised unions of code table elements. As Table 1 shows, Slim evaluates two orders of magnitude fewer candidates. In general, Slim considers between 10 thousand and 10 million itemsets, whereas Krimp processes millions to billions of itemsets.

Only for Abstracts and BMS-webview 1 does Slim evaluate more candidates than Krimp. These are datasets for which a minute decrease in minsup leads to an explosion of candidates. As such, for this sparse data too, Slim can consider candidates at lower support than Krimp can handle, while for the other datasets Slim requires only a fraction of the candidates to reach better compression.
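To make the two notions of "candidate" concrete, the following minimal Python sketch counts both on a toy database. It is an illustration under stated assumptions, not the authors' implementation: the database `DB` and the helpers `support`, `frequent_itemsets`, and `pairwise_unions` are hypothetical names. The frequent itemsets are enumerated by brute force, and only one round of pairwise unions over a singleton-only code table is shown, without any ordering by estimated gain.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
DB = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("abc"),
    frozenset("cd"),
    frozenset("abcd"),
]


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db, minsup):
    """Brute-force list of all itemsets with support >= minsup.

    This stands in for the Krimp-style candidate collection: every
    frequent itemset at the given threshold is a candidate.
    """
    items = sorted(set().union(*db))
    return [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), db) >= minsup
    ]


def pairwise_unions(code_table, db):
    """Distinct unions of two code table elements that occur in the data.

    This stands in for one round of Slim-style candidates: unions X | Y
    of code table elements, skipping those already in the code table.
    """
    unions = {x | y for x, y in combinations(code_table, 2)}
    return {u for u in unions if u not in code_table and support(u, db) > 0}


# Assumed starting point for the sketch: a code table of singletons only.
singletons = [frozenset(i) for i in "abcd"]

print(len(frequent_itemsets(DB, 1)))         # candidates at minsup = 1
print(len(frequent_itemsets(DB, 2)))         # candidates at minsup = 2
print(len(pairwise_unions(singletons, DB)))  # unions of code table elements
```

On this toy data the frequent-itemset collection grows from 12 to 15 itemsets when minsup drops from 2 to 1, while the first round of unions yields 6 candidates. The gap is tiny here, but the frequent-itemset count can grow exponentially with the number of items, whereas the number of pairwise unions is bounded by the square of the code table size.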
