Automatic Refinement of Parallel Applications Structure Detection
Fig. 1. Example structure detection using a cluster analysis based on the DBSCAN algorithm. Scatter plots show Completed Instructions vs. IPC; figures on the left represent the input data, and figures on the right show the results of applying DBSCAN with Eps = 0.0050 and MinPoints = 4.

A. Computation regions characterization

In order to perform this characterization, we apply a cluster analysis to the computation bursts present in a parallel application. We define a computation burst, or CPU burst, as the region in between communications. Using the clustering algorithm, the target is to detect similarities between these bursts in terms of hardware counter metrics.

Presented with a large set of metrics that describe each burst (duration plus up to eight hardware counters), we select a subset of the counters to be used as the parameters for the cluster analysis. As shown in [4], using Completed Instructions and IPC, which focuses on a general performance view of the application, obtained good results in the majority of cases. This combination is able to detect regions of the application with different computational complexity (Completed Instructions) and, at the same time, to differentiate regions with the same complexity but different performance (IPC). All experiments presented in this paper were done using this pair of metrics.

The clustering algorithm we use is DBSCAN [6], one of the most representative algorithms in the density-based clustering family. The most important characteristic of these algorithms is that they make no assumptions about the underlying data model, which makes them able to detect clusters of arbitrary shape. This is a key fact when analyzing hardware counter data, because in our studies we observed that the points representing the counters associated to each burst do not follow any particular distribution.

In brief, the DBSCAN algorithm requires two parameters, Epsilon (or Eps for short) and MinPoints. The algorithm then tries to find the neighborhoods in the data space with more than MinPoints individuals, where the distance across each pair is less than Eps. The resulting neighborhoods are the clusters themselves; the points that do not fulfill these characteristics are marked as noise. When, in the rest of the paper, we describe an Eps as more or less restrictive, we mean lower and higher Eps values respectively. Using a restrictive (small) Eps value produces smaller and more compact clusters, but more noise points. On the other hand, using a less restrictive (big) Eps value results in bigger clusters, with more variability, but fewer noise points.
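To make this parametrization concrete, the following minimal sketch runs DBSCAN over a handful of synthetic bursts using the two selected metrics. It assumes a Python environment with scikit-learn; the toy data, the log-scale normalization, and the relaxed Eps are illustrative choices, not the authors' actual tool chain.

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Hypothetical input: one row per CPU burst with the two selected metrics,
    # [Completed Instructions, IPC]. Instructions span orders of magnitude, so
    # we take log10 (as in the paper's plots) and normalize both axes to [0, 1]
    # so that a single Eps is meaningful in both dimensions.
    bursts = np.array([[2.0e7, 0.120], [2.1e7, 0.123], [2.2e7, 0.127], [2.3e7, 0.130],
                       [9.2e7, 0.270], [9.4e7, 0.273], [9.6e7, 0.277], [9.8e7, 0.280],
                       [3.0e6, 0.050]])
    X = np.column_stack([np.log10(bursts[:, 0]), bursts[:, 1]])
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    # MinPoints = 4 as in the Figure 1 example. The paper's Eps = 0.0050 targets
    # dense real traces; for this 9-point toy set we use a looser Eps so the two
    # synthetic groups become clusters instead of noise.
    labels = DBSCAN(eps=0.05, min_samples=4).fit_predict(X)
    print(labels)  # cluster id per burst; -1 marks noise (the isolated burst)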
Figure 1 serves as an example to show how DBSCAN is able to detect the different groups inside the hardware counter data. The application used was the BT benchmark, one of the NAS Parallel Benchmarks, run with four tasks. We show the result of two iterations of the main loop so as to easily illustrate how the methodology works. As mentioned before, in this example we used the combination of Completed Instructions and IPC to detect the benchmark structure. Plot 1(a) shows the counters associated to each of the CPU bursts that correspond to the blue regions in the time-line 1(b). The next two figures show the results of the cluster analysis. In the scatter plot 1(c) it is easy to observe that each of the different groups has been detected as a distinct cluster by DBSCAN. Finally, the time-line 1(d) depicts how these different groups are distributed along the application execution. In this time-line, the vertical black line separates a pattern that repeats two times, matching the number of iterations used, starting with Cluster 1 (light green) and finishing with Cluster 8 (orange).

B. Cluster Sequence Score

Once we discovered the properties of DBSCAN in the parallel performance analysis scenario and its utility to detect the structure of parallel applications, we also found a major drawback to fully automating the analysis task. The characteristic of making no assumptions about the underlying data model, which is essential to produce a good characterization of the computation bursts, has the collateral effect of removing any chance of fitting the discovered clusters to a given distribution. It is therefore difficult to numerically assess the quality of a given result. Because of this fact, in the work presented in [4] we proposed to use an expert validation.

In our need to automate the cluster analysis process, an automatic index or score to evaluate the clusters found was critical.


There have been some efforts in the community to propose indexes to evaluate clusters [7], [8], but we found that none of them was able to correctly evaluate what we consider a good structure detection. For this reason, we proposed the Cluster Sequence Score [5].

This score is built upon the premise that in an SPMD application, at a given time, all tasks should be performing the same operation. So, if we consider the sequence of (computing) actions of each task, expressed as the sequence of clusters obtained after the cluster analysis, this problem can be understood as a Multiple Sequence Alignment (MSA) of DNA or proteins, a classic problem in the bio-informatics area. Once the MSA algorithm has computed a possible alignment of the cluster sequences, the Cluster Sequence Score evaluates the "percentage alignment" of each of the detected clusters. That is, when a cluster appears at one position of the sequence, this percentage alignment shows how many of the total tasks are performing the same cluster. Then, when all positions in the sequence have been considered, we weight the score obtained for each cluster by the total time the cluster represents in the application, to produce a global score for the whole application.

Fig. 2. Example of Cluster Sequence Score alignment of the cluster analysis results. Image (a) shows the output of the alignment algorithm, marking with asterisks those clusters with perfect alignment. Time-line (b) presents the cluster distribution during the application execution.

Figure 2 illustrates how the Cluster Sequence Score works. The application used is the NPB BT Class A, also used in Figure 1, but clustered with a more restrictive Eps that produces more noise. Sub-figure 2(a) is a capture of ClustalX, a bio-informatics software package that computes MSAs. It contains the translation of the clusters into amino acid sequences. Those positions in the sequence that have a perfect alignment are marked with an asterisk on top. It is easy to see that the noise points (Z) produce some misalignment, which leads to a global score of 84.64%, while the cluster example of Figure 1 would produce a global score of 100%. The time-line in 2(b) presents the distribution of the clusters during the application execution, and one can clearly see that the noise bursts (brown) disturb the actual SPMD pattern defined by the rest of the clusters.
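As an illustration of the scoring step, here is a simplified Python sketch of how the percentage alignment and the time-weighted global score could be computed once the sequences are aligned. The alignment itself (done with ClustalX in the paper) is assumed given; the gap symbol, the toy sequences, and the duration weights are hypothetical.

    from collections import defaultdict

    GAP = '-'  # gap symbol introduced by the alignment

    def cluster_sequence_score(aligned_seqs, durations):
        # aligned_seqs: one cluster-id string per task, equal length after MSA.
        # durations: total time each cluster represents in the application.
        n_tasks = len(aligned_seqs)
        pct_sum, pos_count = defaultdict(float), defaultdict(int)
        for p in range(len(aligned_seqs[0])):
            column = [seq[p] for seq in aligned_seqs]
            for c in set(column) - {GAP}:
                # fraction of tasks performing cluster c at this position
                pct_sum[c] += column.count(c) / n_tasks
                pos_count[c] += 1
        per_cluster = {c: pct_sum[c] / pos_count[c] for c in pct_sum}
        total = sum(durations[c] for c in per_cluster)
        global_score = sum(s * durations[c] for c, s in per_cluster.items()) / total
        return per_cluster, global_score

    # Four tasks, perfectly aligned except one noise burst ('Z') in task 3.
    per_cluster, global_score = cluster_sequence_score(
        ["ABAB", "ABAB", "AZAB", "ABAB"], {'A': 4.0, 'B': 3.0, 'Z': 0.5})
    print(per_cluster, round(global_score, 3))  # A: 1.0, B: 0.875, Z: 0.25 -> 0.9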
III. AGGREGATIVE CLUSTER REFINEMENT

Having presented the two basic pieces of our analysis methodology, in this section we present the work that merges them. The main contribution of the present paper is the Aggregative Cluster Refinement, a method that combines these two previous works into a single process that automatically maximizes the Cluster Sequence Score and, consequently, generates a structure detection according to the expected SPMD pattern.

Before detailing how the process works, it is important to justify the need for this automatic refinement analysis. Basically, this necessity comes from the fact that, to fully automate the analysis process, we should minimize the parametrization of the DBSCAN algorithm. As deeply discussed in [5], any pair of parameters will produce a description of the application structure, but we have to choose a trade-off between the level of detail, the amount of noise acceptable, etc. For example, considering a fixed election of MinPoints as a percentage of the total tasks of the application, the more restrictive the Eps we choose, the more detailed the structure we obtain, sometimes losing the SPMD patterns that we consider the easiest way to analyze an application. In addition to this, we also find an inherent problem of the DBSCAN algorithm: its inability to correctly detect clusters when the density varies across the data space. This problem is directly related to the use of a single Eps value. The effect is that if we choose a very restrictive Eps, some points that really belong to a cluster are classified as noise, while if we choose a less restrictive Eps, some clusters that should be separated appear as a single cluster.

A. Background

It is not the aim of this paper to present a survey on clustering algorithms, but some background is needed. One of the most basic algorithms used in cluster analysis is hierarchical clustering [9]. In a naive way, what hierarchical algorithms do is link elements of the data set according to a given metric, for example the Euclidean distance. This is a bottom-up process, where on each level those individuals or groups of individuals with the lowest values of the metric get merged. As a result, we obtain a dendrogram, a tree where the leaves are the individuals and the root is the whole data set. Each intermediate level offers a possible partition of the data at a different granularity. Then, to obtain a final partition, the user has to decide which of these intermediate levels is the most interesting. In other words, where to cut the dendrogram.

There exists an implicit parallelism between hierarchical algorithms and DBSCAN. If we fix the value of MinPoints, each Eps value we use on a fixed data set can be understood as a cut at a different level of the dendrogram. In this way, using a more restrictive Eps value places the "cut" close to the leaves of the dendrogram, i.e. single noise points, and using a less restrictive Eps produces a "cut" close to the root, i.e. big clusters grouping a high number of individuals.

B. Methodology Description

Knowing this parallelism between a hierarchical algorithm and DBSCAN, the way the Aggregative Cluster Refinement works is easy to understand (see the dendrogram-cut sketch below). Instead of performing a brute-force attack, consisting of selecting a range of possible Eps values and deciding which is the best using the Cluster Sequence Score, this process builds a tree, similar to a dendrogram, where clusters which do not have a perfect alignment merge so as to reach the desired structure. It is an aggregative refinement because the successive cluster analyses use Eps values from smaller (restrictive) to bigger, imitating the bottom-up construction of a dendrogram in a hierarchical algorithm.
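To make the dendrogram-cut analogy concrete, this small sketch (assuming SciPy is available; the synthetic data and thresholds are arbitrary) cuts the same single-linkage dendrogram at two heights, mimicking a restrictive and a permissive Eps.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    # Two well-separated synthetic groups of 2-D points.
    X = np.vstack([rng.normal(0.0, 0.02, (20, 2)),
                   rng.normal(0.5, 0.02, (20, 2))])

    Z = linkage(X, method='single')  # bottom-up merges by Euclidean distance

    # Cutting low in the tree ~ a restrictive Eps: many small, compact groups.
    # Cutting high in the tree ~ a permissive Eps: few big clusters.
    fine = fcluster(Z, t=0.02, criterion='distance')
    coarse = fcluster(Z, t=0.30, criterion='distance')
    print(len(set(fine)), len(set(coarse)))  # e.g. several fine groups vs. 2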


The pseudo-code in Algorithm 1 illustrates this process in more detail.

    Input:  CPUBurstSet = points representing the CPU bursts of the application
    Input:  n = number of different Eps values to generate
    Input:  ApplicationTasks = number of application tasks
    Output: FinalPartition = final partition of the data in clusters
            maximizing the Cluster Sequence Score
    Output: FinalScores = Cluster Sequence Score of FinalPartition

    MinPoints = ApplicationTasks / 4
    ComputeEpsilons(CPUBurstSet, EpsSet, n)
    RunDBSCAN(CPUBurstSet, MinPoints, Eps_1, Partition_1)
    ComputeScores(CPUBurstSet, Partition_1, Scores_1)
    UpdateTree(CPUBurstSet, Partition_1, Scores_1)
    foreach Eps_i in EpsSet, i = 2..n do
        GenerateCandidatePoints(CPUBurstSet, Scores_{i-1}, CPUBurstsSubset)
        if CPUBurstsSubset is empty then
            convergence (break)
        else
            RunDBSCAN(CPUBurstsSubset, MinPoints, Eps_i, Partition_i)
            ComputeScores(CPUBurstSet, Partition_i, Scores_i)
            UpdateTree(CPUBurstSet, Partition_i, Scores_i)
        end
    end
    ProcessLastPartition(CPUBurstSet, last computed Partition_i,
                         last computed Scores_i,
                         Partition_PostProcessed, Scores_PostProcessed)
    FinalPartition = Partition_PostProcessed
    FinalScores = Scores_PostProcessed

    Algorithm 1: Aggregative Cluster Refinement Process

The inputs of the algorithm are the data set composed of the counters associated to each of the CPU bursts, CPUBurstSet, the number of different Eps values we want to use, n (10 by default), and the number of tasks present in the application we want to analyze, ApplicationTasks. The outputs of the algorithm are a partition of the data, FinalPartition, with the cluster identifiers assigned to each burst in the input set, plus the scores obtained for each cluster found and the global score of the last partition, stored in FinalScores.

The process starts by setting MinPoints to a quarter of the total tasks the application has. This value is selected because we consider that the minimum acceptable SPMD region should cover 25% of the total tasks. Next, using the data set, the n different Eps values are generated in ComputeEpsilons, sorted increasingly, and stored in the EpsSet set. The way we generate these different Eps values is crucial and is described in depth later in this Section. The following step is to run DBSCAN using the first (smallest) Eps value in EpsSet. Then we compute the score associated to each cluster found, in the ComputeScores procedure, and finally the lowest level of the tree is built in UpdateTree.

Once we have executed this initial DBSCAN, what we obtain is a set of clusters that group the individuals at very fine grain, so they will become the leaves of the tree. Next, on each iteration of the "foreach" loop, the Eps_i used will be bigger and some of those initial clusters that are close in the data space will start merging so as to maximize their score. To avoid merging clusters that already have a good score, GenerateCandidatePoints selects only the bursts that belong to clusters without a perfect score (100% alignment) to take part in the current step's analysis. These candidates are stored in the set CPUBurstsSubset. If this subset of the data is empty, it means that we have arrived at a convergence point, where all clusters scored the maximum and no new candidates are generated. This breaks the loop. Otherwise, the loop iterates n − 1 times and the execution finishes when no more Eps_i values are available.
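For readers who prefer code to pseudo-code, here is a condensed Python skeleton of the loop just described. The four callables are placeholders for the procedures of Algorithm 1 (a DBSCAN run, the Cluster Sequence Score, the tree update, and the final merging), not a real library API; partitions are assumed to be burst-to-cluster-id mappings.

    def aggregative_cluster_refinement(cpu_bursts, eps_set, min_points,
                                       run_dbscan, compute_scores,
                                       update_tree, process_last_partition):
        # eps_set is assumed sorted from restrictive (small) to permissive (big);
        # min_points would be ApplicationTasks / 4, as in Algorithm 1.
        partition = dict(run_dbscan(cpu_bursts, min_points, eps_set[0]))
        scores = compute_scores(cpu_bursts, partition)  # cluster id -> [0, 1]
        update_tree(cpu_bursts, partition, scores)      # leaves of the tree

        for eps in eps_set[1:]:
            # Only bursts of clusters below a perfect score are re-clustered;
            # perfectly aligned clusters stop growing (their subtree is frozen).
            candidates = [b for b in cpu_bursts if scores[partition[b]] < 1.0]
            if not candidates:
                break  # convergence: every cluster reached a 100% score
            partition.update(run_dbscan(candidates, min_points, eps))
            scores = compute_scores(cpu_bursts, partition)
            update_tree(cpu_bursts, partition, scores)

        # Merge leftover imperfect clusters that co-occur at the same aligned
        # positions, and return the final partition with its scores.
        return process_last_partition(cpu_bursts, partition, scores)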
Prior to assigning the outputs, a last processing step is applied to the data. If the main loop has finished because all Eps values have been explored, some clusters may not have been aggregated enough to determine an SPMD section. In ProcessLastPartition, we process the sequences resulting from the last partition computed. Those clusters that do not have a perfect score and occur simultaneously at more than one position in the sequences get merged. Finally, when this last process is done, FinalPartition and FinalScores are assigned to the ones produced by this last method.

In Figures 3 and 4 we present an example to illustrate how this process works. In this experiment we used the same application as in the examples of Section II, the NPB BT Class A with 4 tasks. We used n = 10 as the number of possible Eps values to evaluate, but the algorithm converged in 7 steps. Figure 3 shows the tree obtained after a whole execution of the refinement process. In this tree, the nodes that represent the clusters with a perfect score are filled. Each level represents one step of the iterative process, and we can see how the clusters merge as the Eps increases. Actually, what we finally obtain is a set of trees, because those clusters with perfect alignment stop growing to higher levels. Figure 4 shows the time-lines corresponding to those steps where clusters with a perfect score appear: time-line 4(a) for Step 1, 4(b) for Step 4, 4(c) for Step 6 and 4(d) for the final Step 7. The Eps range selected for this example produced a good number of clusters with a perfect score in Step 1, and we can see how noise points (brown) move to actual clusters at different levels. It is also interesting to see how some clusters that are independent get merged with others to reach the perfect alignment; for example, comparing time-line 4(b) with time-line 4(a), Cluster 9 (light blue) merged with Cluster 10 (light green).

We want to highlight two interesting facts about this algorithm. Firstly, it avoids a possible trick against the Cluster Sequence Score that consists of choosing an Eps big enough to place all bursts in a single cluster. That would obtain a perfect score, because all sequences of actions would be the same, but it would be unusable, because no structure would be presented to the analyst. The way the algorithm aggregates the clusters and cuts the expansion of those perfectly aligned guarantees that this over-aggregation never happens. Secondly, the decision to perform the refinement in a bottom-up approach is based on the efficiency of DBSCAN when using small Eps values. We first implemented a divisive, top-down approach, but using a big Eps in the first steps resulted in expensive analyses in the initial iterations. With the aggregative approach, we force the analyses with big Eps values to be executed with fewer input bursts, because some of them will have been cut in previous levels.
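Returning to the ProcessLastPartition step described at the beginning of this subsection, a rough, self-contained sketch of the merging rule might look as follows; the union-find bookkeeping and the gap symbol are our own illustrative reading of the rule, not the paper's implementation.

    from collections import Counter
    from itertools import combinations

    def merge_simultaneous_clusters(aligned_seqs, scores, gap='-'):
        # Union-find over cluster ids; path halving keeps find() cheap.
        parent = {}
        def find(c):
            parent.setdefault(c, c)
            while parent[c] != c:
                parent[c] = parent[parent[c]]
                c = parent[c]
            return c

        # Count, per pair of imperfect clusters, the aligned positions where
        # different tasks perform them simultaneously.
        cooccur = Counter()
        for p in range(len(aligned_seqs[0])):
            column = {seq[p] for seq in aligned_seqs} - {gap}
            imperfect = sorted(c for c in column if scores[c] < 1.0)
            for a, b in combinations(imperfect, 2):
                cooccur[(a, b)] += 1

        # "Occur simultaneously at more than one position" -> merge the pair.
        for (a, b), positions in cooccur.items():
            if positions > 1:
                parent[find(a)] = find(b)
        # Clusters absent from the mapping simply keep their own id.
        return {c: find(c) for c in parent}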


(Tree layers: Step 1, Eps = 0.00185696, through Step 7, Eps = 0.00469895, with the per-cluster scores at each step.)

Fig. 3. Complete aggregative refinement cluster analysis tree obtained from NPB BT Class A executed with 4 tasks. The empty nodes of the tree depict those clusters that are discarded because they need to be merged. Filled ones are those selected in the final partition of the data. In this case, due to the convergence, all selected nodes got a 100% score. Each layer represents one step in the refinement loop.

Fig. 4. Application time-lines showing the clusters found at different steps of the refinement process corresponding to Figure 3. (a) is the initial clustering, Step 1, where most of the main clusters have already been discovered. (b) Time-line of Step 4, where Cluster 9 (light green) gets a perfect alignment. (c) Time-line of Step 6, where Cluster 8 (orange) gets a perfect alignment. (d) Time-line of the final partition of the data, Step 7 in the tree, where Cluster 11 obtains a perfect score. Dots on the top of the time-lines serve as a guide to clearly see these clusters.


TABLE I
SUMMARY OF THE EXPERIMENTS PERFORMED USING THE AGGREGATIVE CLUSTER REFINEMENT ANALYSIS ON SIX PARALLEL APPLICATIONS

    Name        Task Count   Data Points   Aggregative Refinement    Manual Analysis         Score Difference
                                           Clusters  Global Score    Clusters  Global Score
    CPMD           128         37,490         17        99.93%          16        99.15%         +0.78%
    GAPgeofem       16         15,136          2        99.94%           3        99.77%         +0.17%
    PEPC            32         14,233         12        98.20%          13        98.09%         +0.11%
    SOCORRO         16            623          8        99.69%          10        97.08%         +2.61%
    VAC            128         10,240          5        99.94%          10        97.62%         +2.33%
    WRF             16         11,733         10        98.01%           6        95.87%         +2.14%

... up to six different clusters. If we look at the time-lines, we can observe that Cluster 6 in the Aggregative Refinement corresponds to two different SPMD regions perfectly aligned in 6(b); in the Manual Analysis time-line 6(d), these regions are not perfectly SPMD, mixing groups of tasks into different clusters. It is interesting to note that the Aggregative Cluster Refinement merges these clusters in the final process performed by ProcessLastPartition, as can be seen in Figure 7. On the other hand, what the Manual Analysis detected is an imbalance in this section of code, which is actually an interesting region to analyze, but the target of the automatic refinement in the first instance is to detect the different SPMD regions, not their internal detail.

Then, the question that arises is: why not increase the Eps in the Manual Analysis? The answer is related to one of the major drawbacks of DBSCAN: it cannot correctly detect clusters with different densities. If we increase the Eps in the Manual Analysis, the clusters in this region will merge, as happens in the Aggregative Refinement, but Clusters 3 (red) and 4 (green) in Figure 6 will merge too, and that would be a clear case of over-aggregation, because these clusters really define well-separated SPMD regions.

The case of WRF is completely the opposite. The Aggregative Refinement detects 10 different clusters, with a Global Score of 98.01%, while the best Manual Analysis obtained just 6, with a score of 95.87%. We do not show the graphical results of these experiments, due to space considerations, but the numbers in Table I demonstrate that, for this application, the automatic analysis based on the Aggregative Refinement produced a superior structure detection to the "best" possible Manual Analysis.

V. RELATED WORK

As we mentioned in the introduction, cluster analysis is becoming more and more popular in the parallel performance analysis community. Initial works trace back to Nickolayev et al. [2], where cluster analysis was used to detect common behaviour across different tasks in an on-line scenario. This kind of analysis, where the target is to group the tasks of a parallel application that behave in a similar way, was also exploited by Ahn et al. [1], who introduced the use of hardware counters to detect this similarity, and Huck et al. [10], where cluster analysis was applied in a multi-experiment analyzer.

Recent works are moving back to the original on-line scenario. Good examples are the work presented by Llort et al. [11], which is an on-line port of the DBSCAN cluster analysis, and the paper by Szebenyi et al. [12], where an on-line cluster analysis is used to reduce the volume of analysis data generated.
VI. CONCLUSIONS

In this paper we have presented a methodology, the Aggregative Cluster Refinement, to automatically detect the inner structure of SPMD parallel applications. This methodology relies on a cluster analysis using the DBSCAN algorithm plus a quality score based on a Multiple Sequence Alignment algorithm to iteratively refine the SPMD phases detected. It offers the application analyst an easy way to start the analyses, as it is capable of detecting the different phases of an application at fine grain.

The correctness and usefulness of the Aggregative Cluster Refinement have been demonstrated by analyzing 6 different real-world applications. In the experiments, this automatic method, which requires no user intervention, was able to correctly detect the applications' SPMD structures, outperforming the "best" results obtained manually using the original DBSCAN analysis.

Our most immediate work is to port the developments presented in this paper to an on-line scenario and combine them with other automatic tools, so as to define a fully automatic analysis environment for parallel applications.

ACKNOWLEDGMENTS

We would like to acknowledge the BSC Tools team for their support with the tools used for the development of the current paper. This work is granted by the IBM/BSC MareIncognito project and by the Comisión Interministerial de Ciencia y Tecnología (CICYT), contract TIN2007-60625.

REFERENCES

[1] D. H. Ahn and J. S. Vetter, "Scalable Analysis Techniques for Microprocessor Performance Counter Metrics," in SC, 2002.
[2] O. Y. Nickolayev, P. C. Roth, and D. A. Reed, "Real-Time Statistical Clustering for Event Trace Reduction," International Journal of Supercomputer Applications and High Performance Computing, 1997.
[3] C. W. Lee, C. Mendes, and L. Kale, "Towards Scalable Performance Analysis and Visualization through Data Reduction," in IPDPS, 2008.
[4] J. Gonzalez, J. Gimenez, and J. Labarta, "Automatic Detection of Parallel Applications Computation Phases," in IPDPS, 2009.
[5] ——, "Automatic Evaluation of the Computation Structure of Parallel Applications," in PDCAT, 2009.
[6] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," in KDD, 1996.
[7] M. Halkidi and M. Vazirgiannis, "Clustering Validity Assessment Using Multi Representatives," in SETN, 2002.
[8] Y. Liu, Z. Li, H. Xiong, X. Gao, and J. Wu, "Understanding of Internal Clustering Validation Measures," in ICDM, 2010.
[9] J. H. Ward, Jr., "Hierarchical Grouping to Optimize an Objective Function," Journal of the American Statistical Association, 1963.
[10] K. A. Huck and A. D. Malony, "PerfExplorer: A Performance Data Mining Framework for Large-Scale Parallel Computing," in SC, 2005.
[11] G. Llort, J. Gonzalez, H. Servat, J. Gimenez, and J. Labarta, "On-line Detection of Large-scale Parallel Application's Structure," in IPDPS, 2010.
[12] Z. Szebenyi, F. Wolf, and B. J. N. Wylie, "Performance Analysis of Long-running Applications," in IPDPS, 2011.


Fig. 6. Scatter plots (Completed Instructions vs. IPC) and time-lines of the resulting clusters from the VAC application, obtained using the Aggregative Refinement Cluster analysis, (a) and (b), and performing a Manual Analysis, (c) and (d).

(Tree layers: Step 1 through Step 10 plus a post-process layer, with the per-cluster scores at each step.)

Fig. 7. Tree obtained in the Aggregative Refinement analysis of the VAC application. As in Figure 3, filled nodes represent those clusters selected in the final partition. The main difference with the previous example lies in the fact that this analysis needed to use all possible Eps values, and also that the last process based on sequence alignment merged a set of clusters to produce the final Cluster 6.
