Speculative Instruction Validation for Performance-Reliability Trade-off

Sumeet Kumar
Electrical and Computer Engineering
Binghamton University
Binghamton, NY 13902
skumar1@binghamton.edu

Aneesh Aggarwal
Electrical and Computer Engineering
Binghamton University
Binghamton, NY 13902
aneesh@binghamton.edu

978-1-4244-2070-4/08/$25.00 ©2008 IEEE

Abstract

With reducing feature size, increasing chip capacity, and increasing clock speed, microprocessors are becoming increasingly susceptible to transient (soft) errors. Redundant multi-threading (RMT) is an attractive approach for concurrent error detection. RMT provides complete error coverage, while incurring a significant performance impact because of the redundant thread. Achieving perfect reliability at the expense of a high performance drop is not a good design option for systems where slight vulnerability may still achieve the desired error rates.

In this paper, we explore speculative mechanisms to trade off reliability for performance in RMT. Our basic approach validates the execution of an instruction by comparing its result against the expected result. Only those instructions are redundantly executed for which the validations fail. This mechanism is expected to have a minimal vulnerability impact because it is highly unlikely that an erroneous result matches the expected value. We also propose several extensions to the basic approach that further explore the performance-reliability trade-off design space. A combination of these techniques incurs about 10% performance impact and about 0.09% undetected base error rate, compared to about 25% performance impact for RMT with no undetected errors.

Keywords: Concurrent Error Detection, Reducing Instruction Redundancy, Redundant Multi-threading, Instruction Validation, Performance-Reliability Trade-off

1 Introduction

With the current trends in transistor size, voltage, and clock frequency, microprocessors are becoming increasingly susceptible to hardware failures. A large portion of the errors in the current technology are soft errors [4, 22] that randomly flip the bit values in the processor. These errors are temporary in nature and are particularly troublesome because they elude most of the current testing methods.

A popular approach to detect soft errors is Redundant Multi-threading (RMT), where a thread is executed redundantly and errors are detected by corroborating the results from redundant executions. There are various implementations of RMT [1, 4, 6, 9, 16, 18, 20, 21, 23, 27, 28, 31]. One popular RMT implementation is SRT, where the redundant threads execute entirely independently of each other in a simultaneous multi-threaded (SMT) environment [6, 20, 21, 28]. Redundantly executing every instruction in an application for complete error coverage results in significant performance degradation. For instance, our experiments show that SRT (our SRT architecture is explained in detail in Section 2) results in about a 25% reduction in the average IPC of the simulated SPEC2K benchmarks. For many systems, achieving perfect reliability at the expense of such a high performance impact is not a good design option, especially if a slightly vulnerable system may still achieve the desired error rates. Hence, it is essential to explore techniques that achieve a good performance-reliability trade-off [7].

Recently, there have been a few proposals, such as ReStore [30], PB [17], and LB [5], that do not perform any redundant execution. These approaches flag errors if perturbations are observed in program behaviors such as branch predictions, instruction results, TLB misses, etc. The false positive rate for these schemes can be high if the program behavior changes correctly. Similarly, the false negative rate for these schemes can also be high. For instance, these schemes will miss errors in instructions that do not have predictable behavior and/or do not lead to the behavior that is heeded. It is important to note that these approaches may still have a high performance impact because of reverting back on every false positive.
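To make the contrast between full redundancy and the validation-based approach from the abstract concrete, the following is a minimal software sketch, not the hardware mechanism of the paper. It assumes, purely for illustration, that the "expected result" of an instruction is the last result observed at the same program counter; this excerpt does not say how expected values are obtained, and the names `run_full_rmt`, `run_validated`, `build_trace`, and `flip_bit` are hypothetical. A soft error is modeled as a single flipped bit in the primary result, and the redundant execution is assumed to be fault-free.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    executed: int = 0   # primary executions
    redundant: int = 0  # additional redundant executions
    detected: int = 0   # errors flagged by a result mismatch


def flip_bit(value: int, bit: int) -> int:
    """Model a soft error as one flipped bit in an instruction result."""
    return value ^ (1 << bit)


def build_trace():
    """Toy trace of (pc, compute) pairs: one stable and one changing result."""
    trace = []
    for k in range(4):
        trace.append((0x10, lambda: 7))           # same result every time
        trace.append((0x14, lambda k=k: k * 2))   # result changes every time
    return trace


def run_full_rmt(trace, faulty_index=None):
    """Full redundancy: every instruction is executed twice and compared."""
    stats = Stats()
    for i, (pc, compute) in enumerate(trace):
        result = compute()
        if i == faulty_index:
            result = flip_bit(result, 3)
        stats.executed += 1
        check = compute()                # redundant execution (assumed fault-free)
        stats.redundant += 1
        if check != result:
            stats.detected += 1
    return stats


def run_validated(trace, faulty_index=None):
    """Redundantly execute only when the result fails validation."""
    stats = Stats()
    last_value = {}  # ASSUMPTION: expected result = last result seen at this pc
    for i, (pc, compute) in enumerate(trace):
        result = compute()
        if i == faulty_index:
            result = flip_bit(result, 3)
        stats.executed += 1
        if last_value.get(pc) == result:
            continue                     # validation passed: skip redundancy
        check = compute()                # validation failed: execute redundantly
        stats.redundant += 1
        if check != result:
            stats.detected += 1          # mismatch between the two executions
        else:
            last_value[pc] = result      # train the expected value
    return stats
```

On this toy trace, full redundancy performs a redundant execution for all eight instructions, while the validated run performs one only for the five results that do not match the expected value. An injected bit flip in a previously stable result fails validation and is caught by the redundant execution. An error would escape only if the corrupted result happened to equal the expected value, which is the vulnerability the abstract argues is small.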
One method to mitigate the performance impact of redundancy is to avoid redundant execution of as many instructions as possible. Two approaches have been proposed to reduce redundancy in RMT. The first set of proposals, such as PER-IRTR [7], VCSR [26], and RMT-Toggling [29], react to the processor state in order to reduce redundancy in RMT. For instance, PER-IRTR avoids redundant execution during high IPC program phases. Similarly, VCSR performs redundant executions only when the AVF of certain processor structures increases beyond the acceptable limits. RMT-Toggling uses AVF prediction to identify low AVF program phases, and avoids redundant execution for those phases. In these approaches, any error that may affect the program outcome may go undetected during the unprotected phases. This may result in a high false negative rate, especially in those approaches, such as PER-IRTR, that give priority to performance. Similarly, approaches that give priority to reliability, such as VCSR [26] and RMT-Toggling [29], may have a high performance impact.

A second set of approaches, such as SlicK [14], SSmod [19] (modified SlipStream [27]), DIE-IRB [15], proac-