Actas JP2011 - Universidad de La Laguna


Actas XXII Jornadas de Paralelismo (JP2011), La Laguna, Tenerife, 7-9 September 2011

in the LRU sequence. Each value of the matrix indicates the number of accesses to every position in the LRU sequence for every set. This information is very useful for offline analysis, given that we can determine the number of misses for a given number of ways w in our cache by simply adding the accesses for the last n − w columns of the matrix.

We can also use this matrix to compute the misses with faulty cells. For this purpose, and according to Equation 5, we need to calculate the number of misses if a given number of ways were disabled in the cache due to permanent faults. First, we accumulate the number of accesses in every position per set. Then, we perform the same operation per set to obtain a vector which indicates the exact number of misses our cache would suffer as a consequence of losing from 0 to w ways.

For our experiments, we have simulated a processor architecture by means of Virtutech Simics [21] and GEMS [22]. Simics is a functional simulator executing a Solaris 10 Unix distribution and simulating the UltraSPARC-III ISA. GEMS is a timing simulator which, coupled to Simics, provides detailed results for the memory system. We have performed several modifications to the simulator to extract cache address traces. These traces are then used to generate the map of accesses for every possible cache configuration by means of the all-associativity algorithm, as explained previously.

We have conducted our experiments by executing different applications from SPECcpu-2000 [23] (bzip2, gap, gzip, parser, twolf, vpr). Benchmarks are run for 1 billion cycles.
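The miss computation from the LRU access matrix described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that column 0 is the most recently used position, that every access is recorded in one of the n columns, and that a set with k disabled ways behaves like a set with w − k ways. The function names are hypothetical.

```python
from itertools import accumulate


def suffix_sums(row):
    """Return s where s[p] = accesses at LRU positions >= p, and s[len(row)] = 0."""
    running = list(accumulate(reversed(row)))  # totals counted from the LRU end
    return running[::-1] + [0]


def misses_for_ways(hist, w):
    """Misses of a w-way cache: accesses in the last n - w columns, over all sets."""
    return sum(suffix_sums(row)[w] for row in hist)


def misses_by_disabled_ways(hist, w):
    """Entry [s][k]: misses in set s when k of its w ways are disabled (assumed model)."""
    return [[suffix_sums(row)[w - k] for k in range(w + 1)] for row in hist]


# Toy matrix: 2 sets, n = 4 LRU positions (illustrative numbers only).
hist = [[10, 5, 3, 2],
        [8, 4, 1, 0]]
total_misses = misses_for_ways(hist, 2)
per_set = misses_by_disabled_ways(hist, 2)
```

Precomputing the suffix sums once per set means that the miss count for any number of surviving ways is a single lookup, which is what makes the offline analysis cheap.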
In all cases, cache warm-up has been taken into account.

The different p_fail values used for the evaluation of the caches are shown in Section I, with the exception of 6.1e-13, which produces virtually no faulty blocks in our experiments. Additionally, we have evaluated a p_fail of 1e-03, which is considered in many related papers.

B. Random Fault Map Methodology

Before proceeding to determine the EMR using the proposed methodology, we determine how well randomly generated fault maps approximate the expected number of faults obtained using Eq. 1.

Figure 1 shows the probability distribution of the number of faulty blocks for different p_fail values (we have omitted 7.3e-09 and 1.5e-06 because they yield from 0 to 1 and from 0 to 4 faulty blocks, respectively) in a 32KB, 2-way associative cache with 558 bits per block (footnote 3). Results show the estimated faulty blocks obtained analytically (analytical line) and with different numbers of fault maps (from 100 to 10 million). As can be observed, a few fault maps are not able to capture the exact behaviour of the analytical model. However, when the number of maps increases (1K maps or more), the number of faulty blocks becomes more accurate. Nonetheless, this study cannot conclude how well random maps approximate the expected misses of a cache, since misses directly depend on the location of faults among the different cache sets.

C. EMR and SD_MR for SPEC applications

In this section we show the calculated EMR and SD_MR for several benchmarks and a 2-way 32KB L1 cache with different p_fail values.

Surprisingly, Figure 2 shows that a small number of fault maps, 100-1000, is enough to approximate the EMR and SD_MR provided by the model. The reason for this is the homogeneity of accesses across the different sets of the cache.
In other words, for the applications we have evaluated, there are no particular sets that are clearly accessed more than others during the overall execution of the benchmark. This makes the EMR and SD_MR virtually independent of the fault locations, and that is the reason why fault maps are able to provide such good estimations.

We establish the cache access homogeneity with a study of the correlation of accesses between all the sets in our cache by calculating the Pearson correlation coefficient. A Pearson coefficient close to 0 means that there is no linear correlation between the variables, whereas a coefficient close to 1 means a strong positive correlation between them. We have calculated the matrix of correlations of the number of accesses for a 2-way 32KB L1 cache for the evaluated benchmarks. Table II reflects the average value of the Pearson coefficients as well as their standard deviation. As we can see, all coefficients are very close to 1, which means that the accesses among sets are highly correlated.

TABLE II
Pearson Coefficient Matrix for each benchmark.

Benchmark   Mean Pearson Coeff.   DEV
bzip2       .993                  .007
gap         .9                    .086
gzip        .997                  .002
parser      .998                  .003
twolf       .943                  .119
vpr         .995                  .006

The key insight from this study is that, because of the high correlation, a small number of random fault maps is sufficient to obtain accurate expected cache behaviour with faults. If data accesses among sets were not highly correlated, a few fault maps would not be able to provide an accurate prediction of the expected behaviour with faults.

D. PD_MR for SPEC applications

In Section II-C, we developed a method to calculate a PD_MR for the expected values of the EMR. As explained, we follow a constructive approach, calculating the different p_fail from 0 to n faulty blocks.
Then, for each of these values we calculate its EMR.

Footnote 3: We consider blocks comprised of 64 bytes for data and 11 bits for its ECC, 25 bits for the tag and 7 bits for its ECC, and 3 control bits for valid, disable and dirty states.
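As a rough illustration of the random fault-map study in Section B, the sketch below generates fault maps for the 32KB, 2-way cache with 558-bit blocks (512 bits of data plus 46 bits of ECC, tag and control, per the footnote) and compares the average number of faulty blocks with an analytical expectation. It assumes that bit failures are independent, so that a block is faulty with probability 1 − (1 − p_fail)^558; this may not match Eq. 1 exactly, which is not reproduced on this page. All names are hypothetical.

```python
import random

BITS_PER_BLOCK = 64 * 8 + 11 + 25 + 7 + 3   # 558 bits, from the footnote
NUM_WAYS = 2
NUM_SETS = (32 * 1024) // 64 // NUM_WAYS    # 32KB of data, 64-byte blocks


def block_fail_prob(p_fail, bits=BITS_PER_BLOCK):
    """Probability that at least one bit of a block fails (independence assumed)."""
    return 1.0 - (1.0 - p_fail) ** bits


def expected_faulty_blocks(p_fail):
    """Analytical expectation of the number of faulty blocks in the cache."""
    return NUM_SETS * NUM_WAYS * block_fail_prob(p_fail)


def random_fault_map(p_fail, rng):
    """One fault map: fault_map[s][w] is True when way w of set s is faulty."""
    p_block = block_fail_prob(p_fail)
    return [[rng.random() < p_block for _ in range(NUM_WAYS)]
            for _ in range(NUM_SETS)]


def mean_faulty_blocks(p_fail, num_maps, seed=0):
    """Average number of faulty blocks over num_maps random fault maps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_maps):
        fault_map = random_fault_map(p_fail, rng)
        total += sum(sum(ways) for ways in fault_map)
    return total / num_maps


analytical = expected_faulty_blocks(1e-3)
estimated = mean_faulty_blocks(1e-3, num_maps=1000)
```

Keeping the map indexed by set and way, rather than as a flat count, preserves the fault locations, which is the information that a count of faulty blocks alone cannot provide when estimating misses.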

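The set-access homogeneity study in Section C can be sketched in a similar way. The example below computes the Pearson coefficient for every pair of sets and summarizes the coefficients by their mean and standard deviation, in the spirit of Table II. It assumes that each set is described by a vector of access counts (for instance, one count per sampling interval); the page does not state how the authors built these vectors or whether DEV is a population or a sample deviation, so both choices here are assumptions, and the data is made up.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant input


def correlation_summary(per_set_counts):
    """Mean and population standard deviation of all pairwise coefficients."""
    coeffs = [pearson(a, b) for a, b in combinations(per_set_counts, 2)]
    return mean(coeffs), pstdev(coeffs)


# Toy access counts for 3 sets over 4 intervals (illustrative numbers only).
counts = [[10, 20, 30, 40],
          [11, 19, 31, 41],
          [20, 40, 60, 80]]
mean_coeff, dev = correlation_summary(counts)
```

A mean close to 1 with a small deviation would indicate that all sets follow the same access pattern, which is the condition under which a small number of fault maps is reported to suffice.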