Automatic Refinement of Parallel Applications Structure Detection
Fig. 1. Example structure detection using a cluster analysis based on the DBSCAN algorithm. Scatter plots show Completed Instructions vs. IPC; figures on the left represent the input data, and figures on the right show the results of applying DBSCAN with Eps = 0.0050 and MinPoints = 4.

A. Computation regions characterization

In order to perform this characterization, we apply a cluster analysis to the computation bursts present in a parallel application. We define a computation burst, or CPU burst, as the region in between communications. Using the clustering algorithm, the target is to detect similarities between these bursts in terms of hardware counter metrics.

Presented with a large set of metrics that describe each burst (duration plus up to eight hardware counters), we select a subset of the counters to be used as the parameters for the cluster analysis. As shown in [4], using Completed Instructions and IPC, which focuses on a general performance view of the application, obtained good results in the majority of cases. This combination is able to detect regions of the application with different computational complexity (Completed Instructions) and, at the same time, to differentiate regions with the same complexity but different performance (IPC). All experiments presented in this paper were done using this pair of metrics.

The clustering algorithm we use is DBSCAN [6], one of the most representative algorithms in the density-based clustering family. The most important characteristic of these algorithms is that they make no assumptions about the underlying data model, which makes them able to detect clusters of arbitrary shape. This is a key fact when analyzing hardware counter data, because in our studies we observed that the points representing the counters associated to each burst do not follow any particular distribution.

In brief, the DBSCAN algorithm requires two parameters, Epsilon (or Eps for short) and MinPoints. The algorithm then tries to find the neighborhoods in the data space with more than MinPoints individuals, where the distance across each pair is less than Eps. The resulting neighborhoods are the clusters themselves; the points that do not fulfill these characteristics are marked as noise. When, in the rest of the paper, we describe an Eps as more or less restrictive, we mean lower and higher Eps values respectively. Using a restrictive (small) Eps value produces smaller and more compact clusters, but more noise points. On the other hand, using a less restrictive (big) Eps value results in bigger clusters, with more variability, but fewer noise points.
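To make this parametrization concrete, the following minimal sketch runs DBSCAN over a handful of synthetic bursts using the two selected metrics. It assumes a Python environment with scikit-learn; the toy data, the log-scale normalization, and the relaxed Eps are illustrative choices, not the authors' actual tool chain.

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Hypothetical input: one row per CPU burst with the two selected metrics,
    # [Completed Instructions, IPC]. Instructions span orders of magnitude, so
    # we take log10 (as in the paper's plots) and normalize both axes to [0, 1]
    # so that a single Eps is meaningful in both dimensions.
    bursts = np.array([[2.0e7, 0.120], [2.1e7, 0.123], [2.2e7, 0.127], [2.3e7, 0.130],
                       [9.2e7, 0.270], [9.4e7, 0.273], [9.6e7, 0.277], [9.8e7, 0.280],
                       [3.0e6, 0.050]])
    X = np.column_stack([np.log10(bursts[:, 0]), bursts[:, 1]])
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    # MinPoints = 4 as in the Figure 1 example. The paper's Eps = 0.0050 targets
    # dense real traces; for this 9-point toy set we use a looser Eps so the two
    # synthetic groups become clusters instead of noise.
    labels = DBSCAN(eps=0.05, min_samples=4).fit_predict(X)
    print(labels)  # cluster id per burst; -1 marks noise (the isolated burst)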
Figure 1 serves as an example to show how DBSCAN is able to detect the different groups inside the hardware counter data. The application used was the BT benchmark, one of the NAS Parallel Benchmarks, run with four tasks. We show the result of two iterations of the main loop so as to easily illustrate how the methodology works. As mentioned before, in this example we used the combination of Completed Instructions and IPC to detect the benchmark structure. Plot 1(a) shows the counters associated to each of the CPU bursts that correspond to the blue regions in the time-line 1(b). The next two figures show the results of the cluster analysis. In the scatter plot 1(c) it is easy to observe that each of the different groups has been detected as a distinct cluster by DBSCAN. Finally, the time-line 1(d) depicts how these different groups are distributed along the application execution. In this time-line, the vertical black line separates a pattern that repeats two times, matching the number of iterations used, starting with Cluster 1 (light green) and finishing with Cluster 8 (orange).

B. Cluster Sequence Score

Once we discovered the properties of DBSCAN in the parallel performance analysis scenario and its utility to detect the structure of parallel applications, we also found a major drawback to fully automating the analysis task. The characteristic of making no assumptions about the underlying data model, which is essential to produce a good characterization of the computation bursts, has the collateral effect of removing any chance of fitting the discovered clusters to a given distribution. It is therefore difficult to numerically assess the quality of a given result. Because of this fact, in the work presented in [4] we proposed to use an expert validation.

In our need to automate the cluster analysis process, an automatic index or score to evaluate the clusters found was critical.


There have been some efforts in the community to propose indexes to evaluate clusters [7], [8], but we found that none of them was able to correctly evaluate what we consider a good structure detection. For this reason, we proposed the Cluster Sequence Score [5].

This score is built upon the premise that in an SPMD application, at a given time, all tasks should be performing the same operation. So, if we consider the sequence of (computing) actions of each task, expressed as the sequence of clusters obtained after the cluster analysis, this problem can be understood as a Multiple Sequence Alignment (MSA) of DNA or proteins, a classic problem in the bio-informatics area. Once the MSA algorithm has computed a possible alignment of the cluster sequences, the Cluster Sequence Score evaluates the "percentage alignment" of each of the detected clusters. That is, when a cluster appears at one position of the sequence, this percentage alignment shows how many of the total tasks are performing the same cluster. Then, when all positions in the sequence have been considered, we weight the score obtained for each cluster by the total time the cluster represents in the application, to produce a global score for the whole application.

Fig. 2. Example of Cluster Sequence Score alignment of the cluster analysis results. Image (a) shows the output of the alignment algorithm, marking with asterisks those clusters with perfect alignment. Time-line (b) presents the cluster distribution during the application execution.

Figure 2 illustrates how the Cluster Sequence Score works. The application used is the NPB BT Class A, also used in Figure 1, but clustered with a more restrictive Eps that produces more noise. Sub-figure 2(a) is a capture of ClustalX, a bio-informatics software package that computes MSAs. It contains the translation of the clusters into amino acid sequences. Those positions in the sequence that have a perfect alignment are marked with an asterisk on top. It is easy to see that the noise points (Z) produce some misalignment, which leads to a global score of 84.64%, while the cluster example of Figure 1 would produce a global score of 100%. The time-line in 2(b) presents the distribution of the clusters during the application execution, and one can clearly see that the noise bursts (brown) disturb the actual SPMD pattern defined by the rest of the clusters.
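As an illustration of the scoring step, here is a simplified Python sketch of how the percentage alignment and the time-weighted global score could be computed once the sequences are aligned. The alignment itself (done with ClustalX in the paper) is assumed given; the gap symbol, the toy sequences, and the duration weights are hypothetical.

    from collections import defaultdict

    GAP = '-'  # gap symbol introduced by the alignment

    def cluster_sequence_score(aligned_seqs, durations):
        # aligned_seqs: one cluster-id string per task, equal length after MSA.
        # durations: total time each cluster represents in the application.
        n_tasks = len(aligned_seqs)
        pct_sum, pos_count = defaultdict(float), defaultdict(int)
        for p in range(len(aligned_seqs[0])):
            column = [seq[p] for seq in aligned_seqs]
            for c in set(column) - {GAP}:
                # fraction of tasks performing cluster c at this position
                pct_sum[c] += column.count(c) / n_tasks
                pos_count[c] += 1
        per_cluster = {c: pct_sum[c] / pos_count[c] for c in pct_sum}
        total = sum(durations[c] for c in per_cluster)
        global_score = sum(s * durations[c] for c, s in per_cluster.items()) / total
        return per_cluster, global_score

    # Four tasks, perfectly aligned except one noise burst ('Z') in task 3.
    per_cluster, global_score = cluster_sequence_score(
        ["ABAB", "ABAB", "AZAB", "ABAB"], {'A': 4.0, 'B': 3.0, 'Z': 0.5})
    print(per_cluster, round(global_score, 3))  # A: 1.0, B: 0.875, Z: 0.25 -> 0.9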
III. AGGREGATIVE CLUSTER REFINEMENT

Having presented the two basic pieces of our analysis methodology, in this section we present the work that merges them. The main contribution of the present paper is the Aggregative Cluster Refinement, a method that combines these two previous works into a single process that automatically maximizes the Cluster Sequence Score and, consequently, generates a structure detection according to the expected SPMD pattern.

Before detailing how the process works, it is important to justify the need for this automatic refinement analysis. Basically, this necessity comes from the fact that, to fully automate the analysis process, we should minimize the parametrization of the DBSCAN algorithm. As deeply discussed in [5], any pair of parameters will produce a description of the application structure, but we have to choose a trade-off between the level of detail, the amount of noise acceptable, etc. For example, considering a fixed election of MinPoints as a percentage of the total tasks of the application, the more restrictive the Eps we choose, the more detailed the structure we obtain, sometimes losing the SPMD patterns that we consider the easiest way to analyze an application. In addition to this, we also find an inherent problem of the DBSCAN algorithm: its inability to correctly detect clusters when the density varies across the data space. This problem is directly related to the use of a single Eps value. The effect is that if we choose a very restrictive Eps, some points that really belong to a cluster are classified as noise, while if we choose a less restrictive Eps, some clusters that should be separated appear as a single cluster.

A. Background

It is not the aim of this paper to present a survey on clustering algorithms, but some background is needed. One of the most basic algorithms used in cluster analysis is hierarchical clustering [9]. In a naive way, what hierarchical algorithms do is link elements of the data set according to a given metric, for example the Euclidean distance. This is a bottom-up process, where on each level those individuals or groups of individuals with the lowest values of the metric get merged. As a result, we obtain a dendrogram, a tree where the leaves are the individuals and the root is the whole data set. Each intermediate level offers a possible partition of the data at a different granularity. Then, to obtain a final partition, the user has to decide which of these intermediate levels is the most interesting. In other words, where to cut the dendrogram.

There exists an implicit parallelism between hierarchical algorithms and DBSCAN. If we fix the value of MinPoints, each Eps value we use on a fixed data set can be understood as a cut at a different level of the dendrogram. In this way, using a more restrictive Eps value places the "cut" close to the leaves of the dendrogram, i.e. single noise points, and using a less restrictive Eps produces a "cut" close to the root, i.e. big clusters grouping a high number of individuals.

B. Methodology Description

Knowing this parallelism between a hierarchical algorithm and DBSCAN, the way the Aggregative Cluster Refinement works is easy to understand (see the dendrogram-cut sketch below). Instead of performing a brute-force attack, consisting of selecting a range of possible Eps values and deciding which is the best using the Cluster Sequence Score, this process builds a tree, similar to a dendrogram, where clusters which do not have a perfect alignment merge so as to reach the desired structure. It is an aggregative refinement because the successive cluster analyses use Eps values from smaller (restrictive) to bigger, imitating the bottom-up construction of a dendrogram in a hierarchical algorithm.
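To make the dendrogram-cut analogy concrete, this small sketch (assuming SciPy is available; the synthetic data and thresholds are arbitrary) cuts the same single-linkage dendrogram at two heights, mimicking a restrictive and a permissive Eps.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    # Two well-separated synthetic groups of 2-D points.
    X = np.vstack([rng.normal(0.0, 0.02, (20, 2)),
                   rng.normal(0.5, 0.02, (20, 2))])

    Z = linkage(X, method='single')  # bottom-up merges by Euclidean distance

    # Cutting low in the tree ~ a restrictive Eps: many small, compact groups.
    # Cutting high in the tree ~ a permissive Eps: few big clusters.
    fine = fcluster(Z, t=0.02, criterion='distance')
    coarse = fcluster(Z, t=0.30, criterion='distance')
    print(len(set(fine)), len(set(coarse)))  # e.g. several fine groups vs. 2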


The pseudo-code in Algorithm 1 illustrates this process in more detail.

    Input:  CPUBurstSet = points representing the CPU bursts of the application
    Input:  n = number of different Eps values to generate
    Input:  ApplicationTasks = number of application tasks
    Output: FinalPartition = final partition of the data in clusters
            maximizing the Cluster Sequence Score
    Output: FinalScores = Cluster Sequence Score of FinalPartition

    MinPoints = ApplicationTasks / 4
    ComputeEpsilons(CPUBurstSet, EpsSet, n)
    RunDBSCAN(CPUBurstSet, MinPoints, Eps_1, Partition_1)
    ComputeScores(CPUBurstSet, Partition_1, Scores_1)
    UpdateTree(CPUBurstSet, Partition_1, Scores_1)
    foreach Eps_i in EpsSet, i = 2..n do
        GenerateCandidatePoints(CPUBurstSet, Scores_{i-1}, CPUBurstsSubset)
        if CPUBurstsSubset is empty then
            convergence (break)
        else
            RunDBSCAN(CPUBurstsSubset, MinPoints, Eps_i, Partition_i)
            ComputeScores(CPUBurstSet, Partition_i, Scores_i)
            UpdateTree(CPUBurstSet, Partition_i, Scores_i)
        end
    end
    ProcessLastPartition(CPUBurstSet, last computed Partition_i,
                         last computed Scores_i,
                         Partition_PostProcessed, Scores_PostProcessed)
    FinalPartition = Partition_PostProcessed
    FinalScores = Scores_PostProcessed

    Algorithm 1: Aggregative Cluster Refinement Process

The inputs of the algorithm are the data set composed of the counters associated to each of the CPU bursts, CPUBurstSet, the number of different Eps values we want to use, n (10 by default), and the number of tasks present in the application we want to analyze, ApplicationTasks. The outputs of the algorithm are a partition of the data, FinalPartition, with the cluster identifiers assigned to each burst in the input set, plus the scores obtained for each cluster found and the global score of the last partition, stored in FinalScores.

The process starts by setting MinPoints to a quarter of the total tasks the application has. This value is selected because we consider that the minimum acceptable SPMD region should cover 25% of the total tasks. Next, using the data set, the n different Eps values are generated in ComputeEpsilons, sorted increasingly, and stored in the EpsSet set. The way we generate these different Eps values is crucial and is described in depth later in this Section. The following step is to run DBSCAN using the first (smallest) Eps value in EpsSet. Then we compute the score associated to each cluster found, in the ComputeScores procedure, and finally the lowest level of the tree is built in UpdateTree.

Once we have executed this initial DBSCAN, what we obtain is a set of clusters that group the individuals at very fine grain, so they will become the leaves of the tree. Next, on each iteration of the "foreach" loop, the Eps_i used will be bigger and some of those initial clusters that are close in the data space will start merging so as to maximize their score. To avoid merging clusters that already have a good score, GenerateCandidatePoints selects only the bursts that belong to clusters without a perfect score (100% alignment) to take part in the current step's analysis. These candidates are stored in the set CPUBurstsSubset. If this subset of the data is empty, it means that we have arrived at a convergence point, where all clusters scored the maximum and no new candidates are generated. This breaks the loop. Otherwise, the loop iterates n − 1 times and the execution finishes when no more Eps_i values are available.
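For readers who prefer code to pseudo-code, here is a condensed Python skeleton of the loop just described. The four callables are placeholders for the procedures of Algorithm 1 (a DBSCAN run, the Cluster Sequence Score, the tree update, and the final merging), not a real library API; partitions are assumed to be burst-to-cluster-id mappings.

    def aggregative_cluster_refinement(cpu_bursts, eps_set, min_points,
                                       run_dbscan, compute_scores,
                                       update_tree, process_last_partition):
        # eps_set is assumed sorted from restrictive (small) to permissive (big);
        # min_points would be ApplicationTasks / 4, as in Algorithm 1.
        partition = dict(run_dbscan(cpu_bursts, min_points, eps_set[0]))
        scores = compute_scores(cpu_bursts, partition)  # cluster id -> [0, 1]
        update_tree(cpu_bursts, partition, scores)      # leaves of the tree

        for eps in eps_set[1:]:
            # Only bursts of clusters below a perfect score are re-clustered;
            # perfectly aligned clusters stop growing (their subtree is frozen).
            candidates = [b for b in cpu_bursts if scores[partition[b]] < 1.0]
            if not candidates:
                break  # convergence: every cluster reached a 100% score
            partition.update(run_dbscan(candidates, min_points, eps))
            scores = compute_scores(cpu_bursts, partition)
            update_tree(cpu_bursts, partition, scores)

        # Merge leftover imperfect clusters that co-occur at the same aligned
        # positions, and return the final partition with its scores.
        return process_last_partition(cpu_bursts, partition, scores)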
Prior to assigning the outputs, a last processing step is applied to the data. If the main loop has finished because all Eps values have been explored, some clusters may not have been aggregated enough to determine an SPMD section. In ProcessLastPartition, we process the sequences resulting from the last partition computed. Those clusters that do not have a perfect score and occur simultaneously at more than one position in the sequences get merged. Finally, when this last process is done, FinalPartition and FinalScores are assigned to the ones produced by this last method.

In Figures 3 and 4 we present an example to illustrate how this process works. In this experiment we used the same application as in the examples of Section II, the NPB BT Class A with 4 tasks. We used n = 10 as the number of possible Eps values to evaluate, but the algorithm converged in 7 steps. Figure 3 shows the tree obtained after a whole execution of the refinement process. In this tree, the nodes that represent the clusters with a perfect score are filled. Each level represents one step of the iterative process, and we can see how the clusters merge as the Eps increases. Actually, what we finally obtain is a set of trees, because those clusters with perfect alignment stop growing to higher levels. Figure 4 shows the time-lines corresponding to those steps where clusters with a perfect score appear: time-line 4(a) for Step 1, 4(b) for Step 4, 4(c) for Step 6 and 4(d) for the final Step 7. The Eps range selected for this example produced a good number of clusters with a perfect score in Step 1, and we can see how noise points (brown) move to actual clusters at different levels. It is also interesting to see how some clusters that are independent get merged with others to reach the perfect alignment; for example, comparing time-line 4(b) with time-line 4(a), Cluster 9 (light blue) merged with Cluster 10 (light green).

We want to highlight two interesting facts about this algorithm. Firstly, it avoids a possible trick against the Cluster Sequence Score that consists of choosing an Eps big enough to place all bursts in a single cluster. That would obtain a perfect score, because all sequences of actions would be the same, but it would be unusable, because no structure would be presented to the analyst. The way the algorithm aggregates the clusters and cuts the expansion of those perfectly aligned guarantees that this over-aggregation never happens. Secondly, the decision to perform the refinement in a bottom-up approach is based on the efficiency of DBSCAN when using small Eps values. We first implemented a divisive, top-down approach, but using a big Eps in the first steps resulted in expensive analyses in the initial iterations. With the aggregative approach, we force the analyses with big Eps values to be executed with fewer input bursts, because some of them will have been cut in previous levels.
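Returning to the ProcessLastPartition step described at the beginning of this subsection, a rough, self-contained sketch of the merging rule might look as follows; the union-find bookkeeping and the gap symbol are our own illustrative reading of the rule, not the paper's implementation.

    from collections import Counter
    from itertools import combinations

    def merge_simultaneous_clusters(aligned_seqs, scores, gap='-'):
        # Union-find over cluster ids; path halving keeps find() cheap.
        parent = {}
        def find(c):
            parent.setdefault(c, c)
            while parent[c] != c:
                parent[c] = parent[parent[c]]
                c = parent[c]
            return c

        # Count, per pair of imperfect clusters, the aligned positions where
        # different tasks perform them simultaneously.
        cooccur = Counter()
        for p in range(len(aligned_seqs[0])):
            column = {seq[p] for seq in aligned_seqs} - {gap}
            imperfect = sorted(c for c in column if scores[c] < 1.0)
            for a, b in combinations(imperfect, 2):
                cooccur[(a, b)] += 1

        # "Occur simultaneously at more than one position" -> merge the pair.
        for (a, b), positions in cooccur.items():
            if positions > 1:
                parent[find(a)] = find(b)
        # Clusters absent from the mapping simply keep their own id.
        return {c: find(c) for c in parent}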


(Tree layers: Step 1, Eps = 0.00185696, through Step 7, Eps = 0.00469895, with the per-cluster scores at each step.)

Fig. 3. Complete aggregative refinement cluster analysis tree obtained from NPB BT Class A executed with 4 tasks. The empty nodes of the tree depict those clusters that are discarded because they need to be merged. Filled ones are those selected in the final partition of the data. In this case, due to the convergence, all selected nodes got a 100% score. Each layer represents one step in the refinement loop.

Fig. 4. Application time-lines showing the clusters found at different steps of the refinement process corresponding to Figure 3. (a) is the initial clustering, Step 1, where most of the main clusters have already been discovered. (b) Time-line of Step 4, where Cluster 9 (light green) gets a perfect alignment. (c) Time-line of Step 6, where Cluster 8 (orange) gets a perfect alignment. (d) Time-line of the final partition of the data, Step 7 in the tree, where Cluster 11 obtains a perfect score. Dots on the top of the time-lines serve as a guide to clearly see these clusters.


TABLE I
SUMMARY OF THE EXPERIMENTS PERFORMED USING THE AGGREGATIVE CLUSTER REFINEMENT ANALYSIS ON SIX PARALLEL APPLICATIONS

    Name        Task Count   Data Points   Aggregative Refinement    Manual Analysis         Score Difference
                                           Clusters  Global Score    Clusters  Global Score
    CPMD           128         37,490         17        99.93%          16        99.15%         +0.78%
    GAPgeofem       16         15,136          2        99.94%           3        99.77%         +0.17%
    PEPC            32         14,233         12        98.20%          13        98.09%         +0.11%
    SOCORRO         16            623          8        99.69%          10        97.08%         +2.61%
    VAC            128         10,240          5        99.94%          10        97.62%         +2.33%
    WRF             16         11,733         10        98.01%           6        95.87%         +2.14%

... up to six different clusters. If we look at the time-lines, we can observe that Cluster 6 in the Aggregative Refinement corresponds to two different SPMD regions perfectly aligned in 6(b); in the Manual Analysis time-line 6(d), these regions are not perfectly SPMD, mixing groups of tasks into different clusters. It is interesting to note that the Aggregative Cluster Refinement merges these clusters in the final process performed by ProcessLastPartition, as can be seen in Figure 7. On the other hand, what the Manual Analysis detected is an imbalance in this section of code, which is actually an interesting region to analyze, but the target of the automatic refinement in the first instance is to detect the different SPMD regions, not their internal detail.

Then, the question that arises is: why not increase the Eps in the Manual Analysis? The answer is related to one of the major drawbacks of DBSCAN: it cannot correctly detect clusters with different densities. If we increase the Eps in the Manual Analysis, the clusters in this region will merge, as happens in the Aggregative Refinement, but Clusters 3 (red) and 4 (green) in Figure 6 will merge too, and that would be a clear case of over-aggregation, because these clusters really define well-separated SPMD regions.

The case of WRF is completely the opposite. The Aggregative Refinement detects 10 different clusters, with a Global Score of 98.01%, while the best Manual Analysis obtained just 6, with a score of 95.87%. We do not show the graphical results of these experiments, due to space considerations, but the numbers in Table I demonstrate that, for this application, the automatic analysis based on the Aggregative Refinement produced a superior structure detection to the "best" possible Manual Analysis.

V. RELATED WORK

As we mentioned in the introduction, cluster analysis is becoming more and more popular in the parallel performance analysis community. Initial works trace back to Nickolayev et al. [2], where cluster analysis was used to detect common behaviour across different tasks in an on-line scenario. This kind of analysis, where the target is to group the tasks of a parallel application that behave in a similar way, was also exploited by Ahn et al. [1], who introduced the use of hardware counters to detect this similarity, and Huck et al. [10], where cluster analysis was applied in a multi-experiment analyzer.

Recent works are moving back to the original on-line scenario. Good examples are the work presented by Llort et al. [11], which is an on-line port of the DBSCAN cluster analysis, and the paper by Szebenyi et al. [12], where an on-line cluster analysis is used to reduce the volume of analysis data generated.
VI. CONCLUSIONS

In this paper we have presented a methodology, the Aggregative Cluster Refinement, to automatically detect the inner structure of SPMD parallel applications. This methodology relies on a cluster analysis using the DBSCAN algorithm plus a quality score based on a Multiple Sequence Alignment algorithm to iteratively refine the SPMD phases detected. It offers the application analyst an easy way to start the analyses, as it is capable of detecting the different phases of an application at fine grain.

The correctness and usefulness of the Aggregative Cluster Refinement have been demonstrated by analyzing 6 different real-world applications. In the experiments, this automatic method, which requires no user intervention, was able to correctly detect the applications' SPMD structures, outperforming the "best" results obtained manually using the original DBSCAN analysis.

Our most immediate work is to port the developments presented in this paper to an on-line scenario and combine them with other automatic tools, so as to define a fully automatic analysis environment for parallel applications.

ACKNOWLEDGMENTS

We would like to acknowledge the BSC Tools team for their support with the tools used for the development of the current paper. This work is granted by the IBM/BSC MareIncognito project and by the Comisión Interministerial de Ciencia y Tecnología (CICYT), contract TIN2007-60625.

REFERENCES

[1] D. H. Ahn and J. S. Vetter, "Scalable Analysis Techniques for Microprocessor Performance Counter Metrics," in SC, 2002.
[2] O. Y. Nickolayev, P. C. Roth, and D. A. Reed, "Real-Time Statistical Clustering for Event Trace Reduction," International Journal of Supercomputer Applications and High Performance Computing, 1997.
[3] C. W. Lee, C. Mendes, and L. Kale, "Towards Scalable Performance Analysis and Visualization through Data Reduction," in IPDPS, 2008.
[4] J. Gonzalez, J. Gimenez, and J. Labarta, "Automatic Detection of Parallel Applications Computation Phases," in IPDPS, 2009.
[5] ——, "Automatic Evaluation of the Computation Structure of Parallel Applications," in PDCAT, 2009.
[6] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," in KDD, 1996.
[7] M. Halkidi and M. Vazirgiannis, "Clustering Validity Assessment Using Multi Representatives," in SETN, 2002.
[8] Y. Liu, Z. Li, H. Xiong, X. Gao, and J. Wu, "Understanding of Internal Clustering Validation Measures," in ICDM, 2010.
[9] J. H. Ward, Jr., "Hierarchical Grouping to Optimize an Objective Function," Journal of the American Statistical Association, 1963.
[10] K. A. Huck and A. D. Malony, "PerfExplorer: A Performance Data Mining Framework for Large-Scale Parallel Computing," in SC, 2005.
[11] G. Llort, J. Gonzalez, H. Servat, J. Gimenez, and J. Labarta, "On-line Detection of Large-scale Parallel Application's Structure," in IPDPS, 2010.
[12] Z. Szebenyi, F. Wolf, and B. J. N. Wylie, "Performance Analysis of Long-running Applications," in IPDPS, 2011.


Fig. 6. Scatter plots (Completed Instructions vs. IPC) and time-lines of the resulting clusters from the VAC application, obtained using the Aggregative Refinement Cluster analysis, (a) and (b), and performing a Manual Analysis, (c) and (d).

(Tree layers: Step 1 through Step 10 plus a post-process layer, with the per-cluster scores at each step.)

Fig. 7. Tree obtained in the Aggregative Refinement analysis of the VAC application. As in Figure 3, filled nodes represent those clusters selected in the final partition. The main difference with the previous example lies in the fact that this analysis needed to use all possible Eps values, and also that the last process based on sequence alignment merged a set of clusters to produce the final Cluster 6.
