30.07.2015 Views

Actas JP2011 - Universidad de La Laguna

Actas JP2011 - Universidad de La Laguna

Actas JP2011 - Universidad de La Laguna

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Actas</strong> XXII Jornadas <strong>de</strong> Paralelismo (<strong>JP2011</strong>) , <strong>La</strong> <strong>La</strong>guna, Tenerife, 7-9 septiembre 2011We have used two different reference databases:Spanish : A Spanish dictionary with 51,589 wordsand we used the edit distance [10] to measure similarity.On this metric-space we processed 40,000 queriesselected from a sample of the Chilean Web which wastaken from the TODOCL search engine. This can beconsi<strong>de</strong>red a low dimensional metric space.Images : We took a collection of images from aNASA database containing 40,700 images vectors,and we used them as an empirical probability distributionfrom which we generated a large collectionof random image objects containing 120,000 objects.We built each in<strong>de</strong>x with the 80% of the objects andthe remaining 20% objects were used as queries. Inthis collection we used the eucli<strong>de</strong>an distance to measurethe similarity between two objects. Intrinsic dimensionalityof this space higher than in the previousdatabase, but it is still consi<strong>de</strong>red low.In the vector database (Images) the radius usedwere those that retrieve on average the 0.01%, 0.1%and 1% of the elements of the database per query.In the Spanish database the radius were 1, 2 and3. Similar values have been also used in previouspapers [9][8]. In all the proposed methods, the set ofqueries are previously copied to <strong>de</strong>vice memory.Regarding the GPU implementation, we performeda wi<strong>de</strong> exploration to obtain the best parametersfor each in<strong>de</strong>xed structure. Regarding LC wefound that 64 elements per cluster is the best optionfor the vector database, while 32 performs thebest in the Spanish database. We already discussedSSS-In<strong>de</strong>x tuning in Section IV-C. The conclusionsthere drawn hold for the Images database, so a singlepivot (α = 0.66) is used. However, for the Spanishdatabase it is better to use 64 pivots (α = 0.5).Figure 1 illustrates the performance characteristicsof our GPU implementations. Brute Force stands forthe exhaustive-search algorithm. LC and SSS-In<strong>de</strong>xshow the results for the two implemented in<strong>de</strong>xingmechanisms with the parameters indicated above.All figures are normalized to the largest value of eachversion.We first place our attention on the total numberof distance evaluations (Figure 1(a)). Both databasebehaves as expected: in<strong>de</strong>xing mechanisms do significantly<strong>de</strong>crease the number of distance evaluationswhen compared to the brute force search method.SSS-In<strong>de</strong>x typically outperforms LC when checkingdistance evaluations. And it does it if we consi<strong>de</strong>rthe Spanish database. However, since we just usedone pivot for the Images, LC becomes the winner inthis category. One would expect that running timesmimic the trend exhibited by the distance evaluationsbut results in Figure 1(b) partially contradictsthis intuition: the brute force search algorithm behavesbetter than expected in the Images database.It equals or even improves SSS-In<strong>de</strong>x performance.For the Spanish database changes are not so drastic,but LC becomes the best implementation even if itperforms more distance evaluations.Figure 1(c) has the clue: the brute force techniqueNormalized Average of D.E.Normalized Running TimeNormalized Quantity of Read−Write OperationsD.B. ImagesD.B. Spanish0.80.70.60.50.40.30.20.100.9 1 0.01 0.1 1 1 2 3Percentage of Database RetrievedBrute ForceSSS−In<strong>de</strong>xL.C.Radii(a) Number of Distance EvaluationsD.B. ImagesD.B. Spanish0.80.70.60.50.40.30.20.100.9 1 0.01 0.1 1 1 2 3Percentage of Database RetrievedD.B Images(b) Running timeBrute ForceSSS−In<strong>de</strong>xL.C.RadiiD.B Spanish0.80.70.60.50.40.30.20.100.9 1 0.01 0.1 1 1 2 3Percentage of Database RetrievedBrute ForceSSS−In<strong>de</strong>xL.C.Radii(c) Number of read/write operationsFig. 1. Normalized a) Distance evaluations per query (average)b) Running time and c) Read-write Operations (of32, 64 o 128 bytes) to <strong>de</strong>vice memory.has a slightly better memory access pattern overSSS-In<strong>de</strong>x when <strong>de</strong>aling with the Images database.The alignment of memory access heavily influencesperformance on current GPUs. As stated in SectionIII, when a warp launches misaligned or nonconsecutivememory accesses, hardware is not able tocoalesce it and a single reference may become up to32 separate accesses. The LC shows the best resultson this aspect on both databases, which explains itssuperior performance previously reported.A. Performance of parallel implementationsFigure 2 shows the performance speed-ups of ourGPU in<strong>de</strong>xed implementations over optimized sequentialimplementations. Results are very impressivefor SSS-In<strong>de</strong>x (up to 248x for 1% elements re-<strong>JP2011</strong>-362

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!