Exterminator- A ... with High Probability.pdf - DSpace at CUSAT ...
Exterminator- A ... with High Probability.pdf - DSpace at CUSAT ...
Exterminator- A ... with High Probability.pdf - DSpace at CUSAT ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically Correcting Memory Errors<br />
<strong>with</strong> <strong>High</strong> <strong>Probability</strong><br />
SEMINAR REPORT<br />
2009-2011<br />
In partial fulfillment of Requirements in<br />
Degree of Master of Technology<br />
In<br />
SOFTWARE ENGINEERING<br />
SUBMITTED BY<br />
NISHAMOL P H<br />
DEPARTMENT OF COMPUTER SCIENCE<br />
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY<br />
KOCHI – 682 022
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY<br />
KOCHI – 682 022<br />
DEPARTMENT OF COMPUTER SCIENCE<br />
CERTIFICATE<br />
This is to certify th<strong>at</strong> the seminar report entitled “<strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically<br />
Correcting Memory Errors <strong>with</strong> <strong>High</strong> <strong>Probability</strong>” is being submitted by<br />
NISHAMOL P H in partial fulfillment of the requirements for the award of<br />
M.Tech in Software Engineering is a bonafide record of the seminar<br />
presented by her during the academic year 2009.<br />
Dr.Sumam Mary Idicula Prof. Dr.K.Poulose Jacob<br />
Reader Director<br />
Dept. of Computer Science Dept. of Computer Science
ACKNOWLEDGEMENT<br />
First of all let me thank our Director Prof: Dr. K. Paulose Jacob,<br />
Dept. of Computer Science, who provided <strong>with</strong> the necessary facilities and<br />
advice. I am also thankful to Dr. Sumam Mary Idicula, Reader, Dept. of<br />
Computer Science, for her valuable suggestions and support for the<br />
completion of this seminar. With gre<strong>at</strong> pleasure I remember Mr. G.<br />
Santhoskumar for his sincere guidance. Also I am thankful to all of my<br />
teaching and non-teaching staff in the department and my friends for<br />
extending their warm kindness and help.<br />
I would like to thank my parents <strong>with</strong>out their blessings and support I<br />
would not have been able to accomplish my goal. Finally, I thank the<br />
almighty for giving the guidance and blessings.
CONTENTS<br />
1. Introduction -------------------------------------------------------------------- 1<br />
2. Memory errors ----------------------------------------------------------------- 3<br />
3. Software Architecture ---------------------------------------------------------<br />
3.1. DieHard Overview --------------------------------------------------------<br />
3.2. <strong>Extermin<strong>at</strong>or</strong>’s Heap Layout ---------------------------------------------<br />
3.3. DieFast: A Probabilistic Debugging Alloc<strong>at</strong>or ---------------------------<br />
3.4. Modes of Oper<strong>at</strong>ion ------------------------------------------------------<br />
4. Error Isol<strong>at</strong>ion (ITERATIVE AND REPLICATED) ----------------------------- 11<br />
4.1. Buffer overflow detection ------------------------------------------------ 11<br />
4.2. Dangling Pointer Isol<strong>at</strong>ion ----------------------------------------------- 13<br />
5. CUMULATIVE ERROR ISOLATION -------------------------------------------- 14<br />
5.1. Buffer overflow detection -----------------------------------------------<br />
5.2. Dangling pointer isol<strong>at</strong>ion -----------------------------------------------<br />
6. Error Correction --------------------------------------------------------------- 16<br />
6.1. Buffer overflow correction -----------------------------------------------<br />
6.2. Dangling pointer correction ---------------------------------------------<br />
6.3. The Correcting Memory Alloc<strong>at</strong>or ---------------------------------------<br />
6.4. Collabor<strong>at</strong>ive Correction ----------------------------------------------<br />
7. Results ------------------------------------------------------------------------- 19<br />
7.1. <strong>Extermin<strong>at</strong>or</strong> Runtime Overhead ---------------------------------------- 19<br />
7.2. Memory Error Correction ------------------------------------------------ 20<br />
7.3 P<strong>at</strong>ch Overhead ----------------------------------------------------------- 22<br />
8. Rel<strong>at</strong>ed Work ------------------------------------------------------------------ 23<br />
8.1. Randomized Memory Managers ----------------------------------------- 23<br />
8.2. Autom<strong>at</strong>ic Repair --------------------------------------------------------<br />
8.3. Autom<strong>at</strong>ic Debugging ---------------------------------------------------<br />
8.4. Fault Tolerance ----------------------------------------------------------<br />
8.5. Memory Managers -------------------------------------------------------<br />
9. Future Work ------------------------------------------------------------------ 25<br />
10. Conclusion -------------------------------------------------------------------<br />
References ------------------------------------------------------------------<br />
4<br />
4<br />
5<br />
6<br />
8<br />
14<br />
15<br />
16<br />
16<br />
16<br />
18<br />
23<br />
23<br />
24<br />
24<br />
26<br />
27
<strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically Correcting Memory Errors<br />
<strong>with</strong> <strong>High</strong> <strong>Probability</strong><br />
Abstract<br />
Programs written in C and C++ are susceptible to memory errors,including buffer<br />
overflows and dangling pointers. These errors,which can lead to crashes, erroneous<br />
execution, and security vulnerabilities, are notoriously costly to repair. Tracking down<br />
their loc<strong>at</strong>ion in the source code is difficult, even when the full memory st<strong>at</strong>e of the<br />
program is available. Once the errors are finally found, fixing them remains challenging:<br />
even for critical security-sensitive bugs, the average time between initial reports and the<br />
issuance of a p<strong>at</strong>ch is nearly one month.<br />
We present <strong>Extermin<strong>at</strong>or</strong>, a system th<strong>at</strong> autom<strong>at</strong>ically corrects heap-based memory<br />
errors <strong>with</strong>out programmer intervention. <strong>Extermin<strong>at</strong>or</strong> exploits randomiz<strong>at</strong>ion to pinpoint<br />
errors <strong>with</strong> high precision. From this inform<strong>at</strong>ion, <strong>Extermin<strong>at</strong>or</strong> derives runtime p<strong>at</strong>ches<br />
th<strong>at</strong> fix these errors both in current and subsequent executions. In addition, <strong>Extermin<strong>at</strong>or</strong><br />
enables collabor<strong>at</strong>ive bug correction by merging p<strong>at</strong>ches gener<strong>at</strong>ed by multiple users. We<br />
present analytical and empirical results th<strong>at</strong> demonstr<strong>at</strong>e <strong>Extermin<strong>at</strong>or</strong>’s effectiveness <strong>at</strong><br />
detecting and correcting both injected and real faults.
1. Introduction<br />
<strong>Extermin<strong>at</strong>or</strong><br />
The use of manual memory management and unchecked memory accesses in C and<br />
C++ leaves applic<strong>at</strong>ions written in these languages susceptible to a range of memory<br />
errors. These include buffer overruns, where reads or writes go beyond alloc<strong>at</strong>ed regions,<br />
and dangling pointers, when a program de alloc<strong>at</strong>es memory while it is still live. Memory<br />
errors can cause programs to crash or produce incorrect results. Worse, <strong>at</strong>tackers are<br />
frequently able to exploit these memory errors to gain unauthorized access to systems.<br />
Debugging memory errors is notoriously difficult and time consuming. Reproducing<br />
the error requires an input th<strong>at</strong> exposes it. Since inputs are often unavailable from deployed<br />
programs, developers must either concoct such an input or find the problem via code<br />
inspection. Once a test input is available, software developers typically execute the<br />
applic<strong>at</strong>ion <strong>with</strong> heap debugging tools like Purify and Valgrind , which slow execution<br />
by an order of magnitude. When the bug is ultim<strong>at</strong>ely discovered, developers must<br />
construct and carefully test a p<strong>at</strong>ch to ensure th<strong>at</strong> it fixes the bug <strong>with</strong>out introducing any<br />
new ones. According to Symantec, the average time between the discovery of a critical,<br />
remotely exploitable memory error and the release of a p<strong>at</strong>ch for enterprise applic<strong>at</strong>ions is<br />
28 days.<br />
As an altern<strong>at</strong>ive to debugging memory errors, researchers have proposed a number<br />
of systems th<strong>at</strong> either detect or toler<strong>at</strong>e them. Fail-stop systems are compiler-based<br />
approaches th<strong>at</strong> require access to source code, and abort programs when they performs<br />
illegal oper<strong>at</strong>ions like buffer overflows. They rely either on conserv<strong>at</strong>ive garbage<br />
collection or pool alloc<strong>at</strong>ion to prevent or detect dangling pointer errors. Failure- oblivious<br />
systems are also compiler-based, but manufacture read values and drop or cache illegal<br />
writes for l<strong>at</strong>er reuse. Finally, fault-tolerant systems mask the effect of errors, either by<br />
logging and replaying inputs in an environment th<strong>at</strong> pads alloc<strong>at</strong>ion requests and defers<br />
dealloc<strong>at</strong>ions, or through randomiz<strong>at</strong>ion and optional voting-based replic<strong>at</strong>ion th<strong>at</strong> reduces<br />
the odds th<strong>at</strong> an error will have any effect (e.g., DieHard ).<br />
Contributions: This paper presents <strong>Extermin<strong>at</strong>or</strong>, a runtime system th<strong>at</strong> not only<br />
toler<strong>at</strong>es but also detects and corrects heap-based memory errors. <strong>Extermin<strong>at</strong>or</strong> requires<br />
neither source code nor programmer intervention, and fixes existing errors <strong>with</strong>out<br />
introducing new ones. To our knowledge, this system is the first of its kind.<br />
<strong>Extermin<strong>at</strong>or</strong> relies on an efficient probabilistic debugging alloc<strong>at</strong>or th<strong>at</strong> we call<br />
DieFast. DieFast is based on DieHard’s alloc<strong>at</strong>or, which ensures th<strong>at</strong> heaps are<br />
independently randomized. However, while DieHard can only probabilistically toler<strong>at</strong>e<br />
errors, DieFast probabilistically detects them.<br />
When <strong>Extermin<strong>at</strong>or</strong> discovers an error, it dumps a heap image th<strong>at</strong> contains the<br />
complete st<strong>at</strong>e of the heap. <strong>Extermin<strong>at</strong>or</strong>’s probabilistic error isol<strong>at</strong>ion algorithm then<br />
processes one or more heap images to loc<strong>at</strong>e the source and size of buffer overflows and<br />
Dept. Of Computer Science & Engg. ~ 1 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
dangling pointer errors. This error isol<strong>at</strong>ion algorithm has provably low false positive and<br />
false neg<strong>at</strong>ive r<strong>at</strong>es.<br />
Once <strong>Extermin<strong>at</strong>or</strong> loc<strong>at</strong>es a buffer overflow, it determines the alloc<strong>at</strong>ion site of the<br />
overflowed object, and the size of the over flow. For dangling pointer errors, <strong>Extermin<strong>at</strong>or</strong><br />
determines both the alloc<strong>at</strong>ion and deletion sites of the dangled object, and computes how<br />
prem<strong>at</strong>urely the object was freed.<br />
With this inform<strong>at</strong>ion in hand, <strong>Extermin<strong>at</strong>or</strong> corrects the errors by gener<strong>at</strong>ing<br />
runtime p<strong>at</strong>ches. These p<strong>at</strong>ches oper<strong>at</strong>e in the context of a correcting alloc<strong>at</strong>or. The<br />
correcting alloc<strong>at</strong>or prevents overflows by padding objects, and prevents dangling pointer<br />
errors by deferring object dealloc<strong>at</strong>ions. These actions impose little space overhead<br />
because <strong>Extermin<strong>at</strong>or</strong>’s runtime p<strong>at</strong>ches are tailored to the specific alloc<strong>at</strong>ion and<br />
dealloc<strong>at</strong>ion sites of each error.<br />
After <strong>Extermin<strong>at</strong>or</strong> completes p<strong>at</strong>ch gener<strong>at</strong>ion, it both stores the p<strong>at</strong>ches to correct<br />
the bug in subsequent executions, and triggers a p<strong>at</strong>ch upd<strong>at</strong>e in the running program to fix<br />
the bug in the current execution. <strong>Extermin<strong>at</strong>or</strong>’s p<strong>at</strong>ches also compose straightforwardly,<br />
enabling collabor<strong>at</strong>ive bug correction: users running <strong>Extermin<strong>at</strong>or</strong> can autom<strong>at</strong>ically<br />
merge their p<strong>at</strong>ches, thus system<strong>at</strong>ically and continuously improving applic<strong>at</strong>ion reliability.<br />
<strong>Extermin<strong>at</strong>or</strong> can oper<strong>at</strong>e in three distinct modes: an iter<strong>at</strong>ive mode for runs over the<br />
same input, a replic<strong>at</strong>ed mode th<strong>at</strong> can correct errors on-the-fly, and a cumul<strong>at</strong>ive mode<br />
th<strong>at</strong> corrects errors across multiple runs of the same applic<strong>at</strong>ion.<br />
We experimentally demonstr<strong>at</strong>e th<strong>at</strong>, in exchange for modest runtime overhead<br />
(geometric mean of 25%), <strong>Extermin<strong>at</strong>or</strong> effectively isol<strong>at</strong>es and corrects both injected and<br />
real memory errors, including buffer overflows in the Squid web caching server and the<br />
Mozilla web browser.<br />
Dept. Of Computer Science & Engg. ~ 2 ~ Cochin University of Science & Technology
2. Memory Errors<br />
<strong>Extermin<strong>at</strong>or</strong><br />
Table 1 summarizes the memory errors th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> addresses, and its response to<br />
each. <strong>Extermin<strong>at</strong>or</strong> identifies and corrects dangling pointers, where a heap object is freed<br />
while it is still live, and buffer overflows (a.k.a. buffer overruns) of heap objects. Notice<br />
th<strong>at</strong> this differs substantially from DieHard, which toler<strong>at</strong>es these errors probabilistically<br />
but cannot detect or correct them.<br />
<strong>Extermin<strong>at</strong>or</strong>’s alloc<strong>at</strong>or (DieFast) inherits from DieHard its immunity from two other<br />
common memory errors: double frees, when a heap object is dealloc<strong>at</strong>ed multiple times<br />
<strong>with</strong>out an intervening alloc<strong>at</strong>ion, and invalid frees, when a program dealloc<strong>at</strong>es an object<br />
th<strong>at</strong> was never returned by the alloc<strong>at</strong>or. These errors have serious consequences in other<br />
systems, where they can lead to heap corruption or abrupt program termin<strong>at</strong>ion.<br />
<strong>Extermin<strong>at</strong>or</strong> prevents these invalid dealloc<strong>at</strong>ion requests from having any impact.<br />
DieFast’s bitmap-based alloc<strong>at</strong>or makes multiple frees benign since a bit can only be reset<br />
once. By checking ranges, DieFast detects and ignores invalid frees.<br />
Limit<strong>at</strong>ions<br />
<strong>Extermin<strong>at</strong>or</strong>’s ability to correct both dangling pointer errors and buffer overflows has<br />
several limit<strong>at</strong>ions. First, <strong>Extermin<strong>at</strong>or</strong> assumes th<strong>at</strong> buffer overflows always corrupt<br />
memory <strong>at</strong> higher addresses—th<strong>at</strong> is, they are forward overflows. While it is possible to<br />
extend <strong>Extermin<strong>at</strong>or</strong> to handle backwards overflows, we have not implemented this<br />
functionality. <strong>Extermin<strong>at</strong>or</strong> can only correct finite overflows, so th<strong>at</strong> it can contain any<br />
given overflow by overalloc<strong>at</strong>ion. Similarly, <strong>Extermin<strong>at</strong>or</strong> corrects dangling pointer errors<br />
by inserting finite delays before freeing particular objects. Finally, in iter<strong>at</strong>ed and<br />
replic<strong>at</strong>ed modes, <strong>Extermin<strong>at</strong>or</strong> assumes th<strong>at</strong> overflows and dangling pointer errors are<br />
deterministic. However, the cumul<strong>at</strong>ive mode does not require deterministic errors.<br />
Unlike DieHard, <strong>Extermin<strong>at</strong>or</strong> does not detect uninitialized reads, where a program<br />
makes use of a value left over in a previously-alloc<strong>at</strong>ed object. Because the intended value<br />
is unknown,it is not generally possible to repair such errors <strong>with</strong>out additional inform<strong>at</strong>ion,<br />
e.g. d<strong>at</strong>a structure invariants. Instead, <strong>Extermin<strong>at</strong>or</strong> fills all alloc<strong>at</strong>ed objects <strong>with</strong> zeroes.<br />
Dept. Of Computer Science & Engg. ~ 3 ~ Cochin University of Science & Technology
3. Software Architecture<br />
<strong>Extermin<strong>at</strong>or</strong><br />
<strong>Extermin<strong>at</strong>or</strong>’s software architecture extends and modifies DieHard to enable its error<br />
isol<strong>at</strong>ing and correcting properties.<br />
3.1 DieHard Overview<br />
The DieHard system includes a bitmap-based, fully-randomized memory alloc<strong>at</strong>or<br />
th<strong>at</strong> provides probabilistic memory safety. The l<strong>at</strong>est version of DieHard, upon which<br />
<strong>Extermin<strong>at</strong>or</strong> is based, adaptively sizes its heap be M times larger than the maximum<br />
needed by the applic<strong>at</strong>ion (see Figure 2). This version of Die-Hard alloc<strong>at</strong>es memory from<br />
increasingly large chunks th<strong>at</strong> we call miniheaps. Each miniheap contains objects of<br />
exactly one size. If an alloc<strong>at</strong>ion would cause the total number of objects to exceed1/M,<br />
DieHard alloc<strong>at</strong>es a new miniheap th<strong>at</strong> is twice as large as the previous largest miniheap.<br />
Alloc<strong>at</strong>ion randomly probes a miniheap’s bitmap for the given size class for a free bit:<br />
this oper<strong>at</strong>ion takes O(1) expected time. Freeing a valid object resets the appropri<strong>at</strong>e bit.<br />
DieHard’s use of randomiz<strong>at</strong>ion across an over-provisioned heap makes it probabilistically<br />
likely th<strong>at</strong> buffer overflows will land on free space, and unlikely th<strong>at</strong> a recently-freed<br />
object will be reused soon, making dangling pointer errors rare.<br />
DieHard optionally uses replic<strong>at</strong>ion to increase the probability of successful execution.<br />
In this mode, it broadcasts inputs to a number of replicas of the applic<strong>at</strong>ion process, each<br />
equipped <strong>with</strong> a different random seed. A voter intercepts and compares outputs across the<br />
replicas, and only actually gener<strong>at</strong>es output agreed on by a plurality of the replicas. The<br />
independent randomiz<strong>at</strong>ion of each replica’s heap makes the probabilities of memory<br />
errors independent. Replic<strong>at</strong>ion thus exponentially decreases the likelihood of a memory<br />
error affecting output, since the probability of an error striking a majority of the replicas is<br />
low.<br />
Dept. Of Computer Science & Engg. ~ 4 ~ Cochin University of Science & Technology
3.2 <strong>Extermin<strong>at</strong>or</strong>’s Heap Layout<br />
<strong>Extermin<strong>at</strong>or</strong><br />
Figure 1 presents <strong>Extermin<strong>at</strong>or</strong>’s heap layout, which includes five fields per object for<br />
error isol<strong>at</strong>ion and correction: an object id, alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion sites, dealloc<strong>at</strong>ion<br />
time, which records when the object was freed, and a canary bitset th<strong>at</strong> indic<strong>at</strong>es if the<br />
object was filled <strong>with</strong> canaries (Section 3.3).<br />
An object id of n means th<strong>at</strong> the object is the nth object alloc<strong>at</strong>ed. <strong>Extermin<strong>at</strong>or</strong> uses<br />
object ids to identify objects across multiple heaps. These ids are needed because the<br />
object’s address cannot be used to identify it across differently-randomized heaps.<br />
The site inform<strong>at</strong>ion fields capture the calling context for alloc<strong>at</strong>ions and<br />
dealloc<strong>at</strong>ions. For each, <strong>Extermin<strong>at</strong>or</strong> hashes the least significant bytes of the five mostrecent<br />
return addresses into 32 bits using the DJB2 hash function (see Figure 3).<br />
This out-of-band metad<strong>at</strong>a accounts for approxim<strong>at</strong>ely 16 bytes plus two bits of space<br />
overhead for every object. This overhead is comparable to th<strong>at</strong> of typical free list-based<br />
memory managers like the Lea alloc<strong>at</strong>or, which prepend 8-byte (on 32-bit systems) or 16byte<br />
headers (on 64-bit systems) to alloc<strong>at</strong>ed objects.<br />
Dept. Of Computer Science & Engg. ~ 5 ~ Cochin University of Science & Technology
3.3 DieFast: A Probabilistic Debugging Alloc<strong>at</strong>or<br />
<strong>Extermin<strong>at</strong>or</strong><br />
<strong>Extermin<strong>at</strong>or</strong> uses a new, probabilistic debugging alloc<strong>at</strong>or th<strong>at</strong> we call DieFast.<br />
DieFast uses the same randomized heap layout as DieHard, but extends its alloc<strong>at</strong>ion and<br />
dealloc<strong>at</strong>ion algorithms to detect and expose errors. Figure 4 presents pseudo-code for the<br />
DieFast alloc<strong>at</strong>or. Unlike previous debugging alloc<strong>at</strong>ors, DieFast has a number of unusual<br />
characteristics tailored for its use in the context of <strong>Extermin<strong>at</strong>or</strong>.<br />
Implicit Fence-posts<br />
Many existing debugging alloc<strong>at</strong>ors pad alloc<strong>at</strong>ed objects <strong>with</strong> fence-posts (filled <strong>with</strong><br />
canary values) on both sides. They can thus detect buffer overflows by checking the<br />
integrity of these fence posts. This approach has the disadvantage of increasing space<br />
requirements. Combined <strong>with</strong> the already-increased space requirements of a DieHardbased<br />
heap, the additional space overhead of padding may be unacceptably large.<br />
DieFast exploits two facts to obtain the effect of fence-posts <strong>with</strong>out any additional<br />
space overhead. First, because its heap lay out is header less, one fence-post serves double<br />
duty: a fence-post following an object can act as the one preceding the next object. Second,<br />
Dept. Of Computer Science & Engg. ~ 6 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
because alloc<strong>at</strong>ed objects are separ<strong>at</strong>ed by E(M−1) freed objects on the heap, we use freed<br />
space to act as fence-posts.<br />
Random Canaries<br />
Traditional debugging canaries include values th<strong>at</strong> are readily distinguished from<br />
normal program d<strong>at</strong>a in a debugging session, such as the hexadecimal value<br />
0xDEADBEEF. However, one drawback of a deterministically-chosen canary is th<strong>at</strong> it is<br />
always possible for the program to use the canary p<strong>at</strong>tern as a d<strong>at</strong>a value. Because DieFast<br />
uses canaries loc<strong>at</strong>ed in freed space r<strong>at</strong>her than in alloc<strong>at</strong>ed space, a fixed canary would<br />
lead to a high false positive r<strong>at</strong>e if th<strong>at</strong> d<strong>at</strong>a value were common in alloc<strong>at</strong>ed objects.<br />
DieFast instead uses a random 32-bit value set <strong>at</strong> startup. Sinceboth the canary and<br />
heap addresses are random and differ on every execution, any fixed d<strong>at</strong>a value has a low<br />
probability of colliding <strong>with</strong> the canary, thus ensuring a low false positive r<strong>at</strong>e (see Theorem<br />
2). To increase the likelihood of detecting an error, DieFast sets the last bit of the<br />
canary. Setting this bit will cause an alignment error if the canary is dereferenced, but<br />
keeps the probability of an accidental collision <strong>with</strong> the canary low (1/231).<br />
Probabilistic Fence-posts<br />
Intuitively, the most effective way to expose a dangling pointer error is to fill freed<br />
memory <strong>with</strong> canary values. For example, dereferencing a canary-filled pointer will likely<br />
trigger a segment<strong>at</strong>ion viol<strong>at</strong>ion.<br />
Unfortun<strong>at</strong>ely, reading random values does not necessarily cause programs to fail. For<br />
example, in the espresso benchmark, some objects hold bitsets. Filling a freed bitset <strong>with</strong> a<br />
random value does not cause the program to termin<strong>at</strong>e but only affects the correctness of<br />
the comput<strong>at</strong>ion.<br />
If reading from a canary-filled dangling pointer causes a program to diverge, there is no<br />
way to narrow down the error. In the worst-case, half of the heap could be filled <strong>with</strong> freed<br />
objects, all overwritten <strong>with</strong> canaries. All of these objects would then be potential sources<br />
of dangling pointer errors.<br />
In cumul<strong>at</strong>ive mode, <strong>Extermin<strong>at</strong>or</strong> prevents this scenario by non-deterministically<br />
writing canaries into freed memory randomly <strong>with</strong> probability p, and setting the<br />
appropri<strong>at</strong>e bit in the canary bitmap. This probabilistic approach may seem to degrade<br />
<strong>Extermin<strong>at</strong>or</strong>’s ability to find errors. However, it is required to isol<strong>at</strong>e read-only dangling<br />
pointer errors, where the canary itself remains intact. Because it would take an<br />
impractically large number of iter<strong>at</strong>ions or replicas to isol<strong>at</strong>e these errors, <strong>Extermin<strong>at</strong>or</strong><br />
always fills freed objects <strong>with</strong> canaries when not running in cumul<strong>at</strong>ive mode<br />
Probabilistic Error Detection<br />
Whenever DieFast alloc<strong>at</strong>es memory, it examines the memory to be returned to verify<br />
th<strong>at</strong> any canaries are intact. If not, in addition to signaling an error (see Section 3.4),<br />
DieFast sets the alloc<strong>at</strong>ed bit for this chunk of memory. This “bad object isol<strong>at</strong>ion” ensures<br />
Dept. Of Computer Science & Engg. ~ 7 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
th<strong>at</strong> the object will not be reused for future alloc<strong>at</strong>ions, preserving its contents for<br />
<strong>Extermin<strong>at</strong>or</strong>’s subsequent use. Checking canary integrity on each alloc<strong>at</strong>ion ensures th<strong>at</strong><br />
DieFast will detect heap corruption <strong>with</strong>in E(H) alloc<strong>at</strong>ions, where H is the number of<br />
objects on the heap.<br />
After every dealloc<strong>at</strong>ion, DieFast checks both the preceding and subsequent objects.<br />
For each of these, DieFast checks if they are free. If so, it performs the same canary check<br />
as above. Recall th<strong>at</strong> because DieFast’s alloc<strong>at</strong>ion is random, the identity of these adjacent<br />
objects will differ from run to run. Checking the predecessor and successor on each free<br />
allows DieFast to detect buffer overruns immedi<strong>at</strong>ely upon object dealloc<strong>at</strong>ion.<br />
3.4 Modes of Oper<strong>at</strong>ion<br />
<strong>Extermin<strong>at</strong>or</strong> can be used in three modes of oper<strong>at</strong>ion: an iter<strong>at</strong>ive mode suitable for<br />
testing or whenever all inputs are available, a replic<strong>at</strong>ed mode th<strong>at</strong> is suitable both for<br />
testing and for restricted deployment scenarios, and a cumul<strong>at</strong>ive mode th<strong>at</strong> is suitable for<br />
broad deployment. All of these rely on the gener<strong>at</strong>ion of heap images, which <strong>Extermin<strong>at</strong>or</strong><br />
examines to isol<strong>at</strong>e errors and compute runtime p<strong>at</strong>ches.<br />
If <strong>Extermin<strong>at</strong>or</strong> discovers an error when executing a program, or if DieFast signals an<br />
error, <strong>Extermin<strong>at</strong>or</strong> forces the process to emit a heap image file. This file is akin to a core<br />
dump, but contains less d<strong>at</strong>a (e.g., no code), and is organized to simplify processing. In<br />
addition to the full heap contents and heap metad<strong>at</strong>a, the heap image includes the current<br />
alloc<strong>at</strong>ion time (measured by the number of alloc<strong>at</strong>ions to d<strong>at</strong>e)<br />
Iter<strong>at</strong>ive Mode<br />
<strong>Extermin<strong>at</strong>or</strong>’s iter<strong>at</strong>ive mode oper<strong>at</strong>es <strong>with</strong>out replic<strong>at</strong>ion. To find a single bug,<br />
<strong>Extermin<strong>at</strong>or</strong> is initially invoked via a command-line option th<strong>at</strong> directs it to stop as soon<br />
as it detects an error. <strong>Extermin<strong>at</strong>or</strong> then re-executes the program in “replay” mode over the<br />
same input (but <strong>with</strong> a new random seed). In this mode, <strong>Extermin<strong>at</strong>or</strong> reads the alloc<strong>at</strong>ion<br />
time from the initial heap image to abort execution <strong>at</strong> th<strong>at</strong> point; we call this a malloc<br />
breakpoint. <strong>Extermin<strong>at</strong>or</strong> then begins execution and ignores DieFast error signals th<strong>at</strong> are<br />
raised before the malloc breakpoint is reached.<br />
Once it reaches the malloc breakpoint, <strong>Extermin<strong>at</strong>or</strong> triggers another heap image<br />
dump. This process can be repe<strong>at</strong>ed multiple times to gener<strong>at</strong>e independent heap images.<br />
<strong>Extermin<strong>at</strong>or</strong> then performs post-mortem error isol<strong>at</strong>ion and runtime p<strong>at</strong>ch gener<strong>at</strong>ion. A<br />
small number of iter<strong>at</strong>ions usually suffices for <strong>Extermin<strong>at</strong>or</strong> to gener<strong>at</strong>e runtime p<strong>at</strong>ches<br />
for an individual error, as we show in Section 7.2. When run <strong>with</strong> a correcting memory<br />
alloc<strong>at</strong>or th<strong>at</strong> in corpor<strong>at</strong>es these changes (described in detail in Section 6.3), these p<strong>at</strong>ches<br />
autom<strong>at</strong>ically fix the isol<strong>at</strong>ed errors.<br />
Replic<strong>at</strong>ed Mode<br />
The iter<strong>at</strong>ed mode described above works well when all inputs are available so th<strong>at</strong> rerunning<br />
an execution is feasible. However, when applic<strong>at</strong>ions are deployed in the field,<br />
Dept. Of Computer Science & Engg. ~ 8 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
such inputs may not be available, and replaying may be impractical. The replic<strong>at</strong>ed mode<br />
of oper<strong>at</strong>ion allows <strong>Extermin<strong>at</strong>or</strong> to correct errors while the program is running, <strong>with</strong>out<br />
the need for multiple iter<strong>at</strong>ions.<br />
Like DieHard, <strong>Extermin<strong>at</strong>or</strong> can run a number of differently randomized replicas<br />
simultaneously (as separ<strong>at</strong>e processes), broadcasting inputs to all and voting on their<br />
outputs. However, <strong>Extermin<strong>at</strong>or</strong> uses DieFast-based heaps, each <strong>with</strong> a correcting<br />
alloc<strong>at</strong>or. This organiz<strong>at</strong>ion lets <strong>Extermin<strong>at</strong>or</strong> discover and fix errors.<br />
In replic<strong>at</strong>ed mode, when DieFast signals an error or the voter detects divergent<br />
output, <strong>Extermin<strong>at</strong>or</strong> sends a signal th<strong>at</strong> triggers a heap image dump for each replica. If the<br />
program crashes because of a segment<strong>at</strong>ion viol<strong>at</strong>ion, a signal handler also dumps a heap<br />
image.<br />
If DieFast signals an error, the replicas th<strong>at</strong> dump a heap image do not have to stop<br />
executing. If their output continues to be in agreement, they can continue executing<br />
concurrently <strong>with</strong> the error isol<strong>at</strong>ion process. When the runtime p<strong>at</strong>ch gener<strong>at</strong>ion process is<br />
complete, th<strong>at</strong> process signals the running replicas to tell the correcting alloc<strong>at</strong>ors to reload<br />
their runtime p<strong>at</strong>ches. Thus, subsequent alloc<strong>at</strong>ions in the same process will be p<strong>at</strong>ched onthe-fly<br />
<strong>with</strong>out interrupting execution.<br />
Dept. Of Computer Science & Engg. ~ 9 ~ Cochin University of Science & Technology
Cumul<strong>at</strong>ive Mode<br />
<strong>Extermin<strong>at</strong>or</strong><br />
While the replic<strong>at</strong>ed mode can isol<strong>at</strong>e and correct errors on-the fly in deployed<br />
applic<strong>at</strong>ions, it may not be practical in all situ<strong>at</strong>ions. For example, replic<strong>at</strong>ing applic<strong>at</strong>ions<br />
<strong>with</strong> high resource requirements may cause unacceptable overhead. In addition,<br />
multithreaded or non-deterministic applic<strong>at</strong>ions can exhibit different alloc<strong>at</strong>ion activity and<br />
so cause object ids to diverge across replicas. To support these applic<strong>at</strong>ions, <strong>Extermin<strong>at</strong>or</strong><br />
uses its third mode of oper<strong>at</strong>ion, cumul<strong>at</strong>ive mode, which isol<strong>at</strong>es errors <strong>with</strong>out<br />
replic<strong>at</strong>ion or multiple identical executions.<br />
When oper<strong>at</strong>ing in cumul<strong>at</strong>ive mode, <strong>Extermin<strong>at</strong>or</strong> reasons about objects grouped by<br />
alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion sites instead of individual objects, since objects are no longer<br />
guaranteed to be identical across different executions.<br />
Because objects from a given site only occasionally cause errors, often <strong>at</strong> low<br />
frequencies, <strong>Extermin<strong>at</strong>or</strong> requires more executions than in replic<strong>at</strong>ed or iter<strong>at</strong>ive mode in<br />
order to identify these low-frequency errors <strong>with</strong>out a high false positive r<strong>at</strong>e. Instead of<br />
storing heap images from multiple runs, <strong>Extermin<strong>at</strong>or</strong> computes relevant st<strong>at</strong>istics about<br />
each run and stores them in its p<strong>at</strong>ch file. The retained d<strong>at</strong>a is on the order of a few<br />
kilobytes per execution, compared to tens or hundreds of megabytes for each heap image.<br />
Dept. Of Computer Science & Engg. ~ 10 ~ Cochin University of Science & Technology
4. Error Isol<strong>at</strong>ion<br />
<strong>Extermin<strong>at</strong>or</strong><br />
<strong>Extermin<strong>at</strong>or</strong> employs two different families of error isol<strong>at</strong>ion algorithms: one set<br />
for replic<strong>at</strong>ed and iter<strong>at</strong>ive modes, and another for cumul<strong>at</strong>ive mode.<br />
ITERATIVE AND REPLICATED ERROR ISOLATION<br />
When oper<strong>at</strong>ing in its replic<strong>at</strong>ed or iter<strong>at</strong>ive modes, <strong>Extermin<strong>at</strong>or</strong>’s probabilistic<br />
error isol<strong>at</strong>ion algorithm oper<strong>at</strong>es by searching for discrepancies across multiple heap<br />
images. <strong>Extermin<strong>at</strong>or</strong> relies on corrupted canaries stored in freed objects to indic<strong>at</strong>e<br />
the presence of an error. A corrupted canary (one th<strong>at</strong> has been overwritten) can mean two<br />
things. If the same object (identified by object id) across all heap images has the same<br />
corruption, then the error is likely to be a dangling pointer. If canaries are corrupted in<br />
multiple freed objects, then the error is likely to be a buffer overflow. <strong>Extermin<strong>at</strong>or</strong> limits<br />
the number of false positives for both overflows and dangling pointer errors.<br />
4.1 Buffer overfow detection<br />
<strong>Extermin<strong>at</strong>or</strong> examines heap images looking for discrepancies across the heaps, both in<br />
overwritten canaries and in live objects. If an object is not equivalent across the heaps,<br />
extermin<strong>at</strong>or considers it to be a candid<strong>at</strong>e victim of an overflow.<br />
To identify victim objects, extermin<strong>at</strong>or compares the contents of equivalent objects,<br />
as identified by their object Id across all heaps. <strong>Extermin<strong>at</strong>or</strong> builds an overflow mask<br />
th<strong>at</strong> comprises the discrepancies found across all heaps. However, because the same<br />
logical object may legitim<strong>at</strong>ely differ across multiple heaps, <strong>Extermin<strong>at</strong>or</strong> must take<br />
care not to consider these occurrences as overflows.<br />
First, a freed object may differ across heaps because it was filled <strong>with</strong> canaries only in<br />
some of the heaps. <strong>Extermin<strong>at</strong>or</strong> uses the canary bitmap to identify this case.<br />
Second, an object can contain pointers to other objects, which are randomly loc<strong>at</strong>ed<br />
on their respective heaps. <strong>Extermin<strong>at</strong>or</strong> uses both deterministic and probabilistic<br />
techniques to distinguish integers from pointers. Briefly, if a value interpreted as a<br />
pointer points inside the heap area and points to the same logical object across all heaps,<br />
then <strong>Extermin<strong>at</strong>or</strong> considers it to be the same logical pointer, and thus not a discrepancy.<br />
<strong>Extermin<strong>at</strong>or</strong> also handles the case where pointers point into dynamic libraries, which<br />
newer versions of Linux place <strong>at</strong> pseudorandom base addresses.<br />
Finally, an object can contain values th<strong>at</strong> legitim<strong>at</strong>ely differ from process to process.<br />
Examples of these values include process ids, file handles, pseudorandom numbers, and<br />
pointers in d<strong>at</strong>a structures th<strong>at</strong> depend on addresses (e.g., some red-black tree<br />
implement<strong>at</strong>ions). When <strong>Extermin<strong>at</strong>or</strong> examines an object and encounters any word<br />
th<strong>at</strong> differs <strong>at</strong> the same position across all the heaps, it considers it to be legitim<strong>at</strong>ely<br />
different, and not an indic<strong>at</strong>ion of buffer overflow.<br />
Dept. Of Computer Science & Engg. ~ 11 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
For small overflows, the risk of missing an overflow by ignoring overwrites of<br />
the same objects across multiple heaps is low:<br />
Theorem 1: Let k be the number of heap images, S the length (in number of objects) of<br />
the overflow string, and h the number of objects on the heap. Then the probability of an<br />
overflow over-writing an object on all k heaps is<br />
Proof. Assume th<strong>at</strong> buffer overflows overwrite past the end of an object. Thus, for an<br />
overflow from object i to land on a given object j, it must both precede it and be large<br />
enough to span the distance from i to j. An object i precedes j in k heaps <strong>with</strong> probability<br />
(1/2) k . Objects i and j are separ<strong>at</strong>ed by S or fewer objects <strong>with</strong> probability <strong>at</strong> most<br />
(1/(H −S)) k . Combining these terms yields the joint probability.<br />
False neg<strong>at</strong>ive analysis: We now bound the worst-case false neg<strong>at</strong>ive r<strong>at</strong>e for buffer<br />
overflows; th<strong>at</strong> is, the odds of not finding a buffer overflow because it failed to overwrite<br />
any canaries.<br />
Theorem 2: Let M be the heap multiplier, so a heap is never more than 1/M full. The<br />
likelihood th<strong>at</strong> an overflow of length b bytes fails to be detected by comparison against a<br />
canary is <strong>at</strong> most:<br />
Proof. Each heap is <strong>at</strong> least (M −1)/M free. Since DieFast fills free space <strong>with</strong> canaries<br />
<strong>with</strong> P = 1/2, the fraction of each heap filled <strong>with</strong> canaries is <strong>at</strong> least (M −1)/2M. The<br />
likelihood of a random write not landing on a canary across all k heaps is thus <strong>at</strong> most<br />
(1−(M−1)/2M) k . The overflow string could also m<strong>at</strong>ch the canary value. Since the canary<br />
is randomly chosen, the odds of this are <strong>at</strong> most (1/256) b .<br />
Culprit Identific<strong>at</strong>ion<br />
At this point, <strong>Extermin<strong>at</strong>or</strong> has identified the possible victims of overflows. For each<br />
victim, it scans the heap images for a m<strong>at</strong>ching culprit, the source of the overflow into a<br />
victim. Because <strong>Extermin<strong>at</strong>or</strong> assumes th<strong>at</strong> overflows are deterministic when oper<strong>at</strong>ing in<br />
iter<strong>at</strong>ive or replic<strong>at</strong>ed modes, the culprit must be the same distance δ bytes away from the<br />
victim in every heap image. In addition, <strong>Extermin<strong>at</strong>or</strong> requires th<strong>at</strong> the overflowed values<br />
have some bytes in common across the images, and ranks them by their similarity.<br />
<strong>Extermin<strong>at</strong>or</strong> checks every other heap image for the candid<strong>at</strong>e culprit, and examines the<br />
object th<strong>at</strong> is the same δ bytes forwards. If th<strong>at</strong> object is free and should be filled <strong>with</strong><br />
canaries but they are not intact, then it adds this culprit-victim pair to the candid<strong>at</strong>e list.<br />
False positive analysis: Because buffer overflows can be discontiguous, every object in the<br />
heap th<strong>at</strong> precedes an overflow is a potential culprit. However, each additional heap<br />
dram<strong>at</strong>ically lowers this number:<br />
Dept. Of Computer Science & Engg. ~ 12 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
Theorem 3 : The expected number of objects (possible culprits) the same distance δ from<br />
any given victim object across k heaps is:<br />
Proof. Without loss of generality, assume th<strong>at</strong> the victim object occupies the last slot in<br />
every heap. An object can thus be in any of the remaining n = H−1 slots. The odds of it<br />
being in the same slot in k heaps is p = 1/(H −1) k−1 . This is a binomial distribution, so<br />
E(possible culprits) = np = 1/(H −1) k−2 .<br />
With only one heap image, all (H−1) objects are potential culprits, but one additional<br />
image reduces the expected number of culprits for any victim to just 1 (1/(H −1) 0 ),<br />
effectively elimin<strong>at</strong>ing the risk of false positives.<br />
Once <strong>Extermin<strong>at</strong>or</strong> identifies a culprit-victim pair, it records the overflow size for th<strong>at</strong><br />
culprit as the maximum of any observed δ to a victim. <strong>Extermin<strong>at</strong>or</strong> also assigns each<br />
culprit-victim pair a score th<strong>at</strong> corresponds to its confidence th<strong>at</strong> it is an actual overflow.<br />
This score is 1−(1/256) S , where S is the sum of the length of detected<br />
overflow strings across all pairs. Intuitively, small overflow strings (e.g., one byte)<br />
detected in only a few heap images are given low scores, and large overflow strings present<br />
in many heap images get high scores.<br />
After overflow processing completes, <strong>Extermin<strong>at</strong>or</strong> gener<strong>at</strong>es a runtime p<strong>at</strong>ch for the<br />
top-ranked overflow for each culprit.<br />
4.2 Dangling Pointer Isol<strong>at</strong>ion<br />
Isol<strong>at</strong>ing dangling pointer errors falls into two cases: a program may read and write to<br />
the dangled object, leaving it partially or completely overwritten, or it may only read<br />
through the dangling pointer. <strong>Extermin<strong>at</strong>or</strong> does not handle read-only dangling pointer<br />
errors in iter<strong>at</strong>ive or replic<strong>at</strong>ed mode because it would require too many replicas (e.g.,<br />
around 20; see Section 7.2). However, it handles overwritten dangling objects<br />
straightforwardly.<br />
When a freed object is overwritten <strong>with</strong> identical values across multiple heap images,<br />
<strong>Extermin<strong>at</strong>or</strong> classifies the error as a dangling pointer overwrite. As Theorem 1 shows, this<br />
situ<strong>at</strong>ion is highly unlikely to occur for a buffer overflow. <strong>Extermin<strong>at</strong>or</strong> then gener<strong>at</strong>es an<br />
appropri<strong>at</strong>e runtime p<strong>at</strong>ch, as Section 6.2 describes.<br />
Dept. Of Computer Science & Engg. ~ 13 ~ Cochin University of Science & Technology
5. CUMULATIVE ERROR ISOLATION<br />
<strong>Extermin<strong>at</strong>or</strong><br />
Unlike iter<strong>at</strong>ive and replic<strong>at</strong>ed mode, cumul<strong>at</strong>ive mode focuses on detecting,<br />
isol<strong>at</strong>ing, and correcting errors th<strong>at</strong> happen in the field. In this context, replic<strong>at</strong>ion,<br />
identical inputs, and deterministic execution are infeasible. Worse, program errors may<br />
manifest themselves in ways th<strong>at</strong> are inherently hard to detect. For example, a program<br />
th<strong>at</strong> reads a canary written into a free object may fail immedi<strong>at</strong>ely, or may execute<br />
incorrectly for some time.<br />
The approach to error detection in this mode is to consider exceptional program<br />
events, such as prem<strong>at</strong>ure termin<strong>at</strong>ion, raising unexpected signals, etc., to be evidence th<strong>at</strong><br />
memory was corrupted during execution. We counter the lack of error reproducibility in<br />
these cases <strong>with</strong> st<strong>at</strong>istical accumul<strong>at</strong>ion of evidence before assuming an error needs to be<br />
corrected. <strong>Extermin<strong>at</strong>or</strong> isol<strong>at</strong>es memory errors in cumul<strong>at</strong>ive mode by computing<br />
summary inform<strong>at</strong>ion accumul<strong>at</strong>ed over multiple executions, r<strong>at</strong>her than by oper<strong>at</strong>ing<br />
over multiple heap images.<br />
5.1 Buffer overflow detection<br />
<strong>Extermin<strong>at</strong>or</strong>’s buffer overflow isol<strong>at</strong>ion algorithm proceeds in three phases. First, it<br />
identifies heap corruption by looking for overwritten canary values. Second, for each<br />
alloc<strong>at</strong>ion site, it computes an estim<strong>at</strong>e of the probability th<strong>at</strong> an object from th<strong>at</strong> site could<br />
be the source of the corruption. Third, it combines these independent estim<strong>at</strong>es from<br />
multiple runs to identify sites th<strong>at</strong> consistently appear as candid<strong>at</strong>es for<br />
causing the corruption.<br />
<strong>Extermin<strong>at</strong>or</strong>’s randomized alloc<strong>at</strong>or allows us to compute the probability of<br />
certain properties in the heap. For example, the probability of an object occurring on<br />
a given miniheap can be estim<strong>at</strong>ed given the miniheap size and the number of miniheaps.<br />
If objects from some alloc<strong>at</strong>ion site are sources of overflows, then those objects will<br />
occur on miniheaps containing corruptions more often than expected. <strong>Extermin<strong>at</strong>or</strong> tracks<br />
how often objects from each alloc<strong>at</strong>ion site occur on corrupted miniheaps across<br />
multiple runs. Using this inform<strong>at</strong>ion, it uses a st<strong>at</strong>istical hypothesis test th<strong>at</strong> identifies<br />
sites th<strong>at</strong> occur <strong>with</strong> corruption too often to be random chance, and identifies them as<br />
overflow culprits.<br />
Once <strong>Extermin<strong>at</strong>or</strong> identifies an erroneous alloc<strong>at</strong>ion site, it produces a runtime<br />
p<strong>at</strong>ch th<strong>at</strong> corrects the error. To find the correct pad size, it searches backward from the<br />
corruption found during the current run until it finds an object alloc<strong>at</strong>ed from the site. It<br />
then uses the distance between th<strong>at</strong> object and the end of the corruption as the pad size.<br />
Dept. Of Computer Science & Engg. ~ 14 ~ Cochin University of Science & Technology
5.2 Dangling pointer isol<strong>at</strong>ion<br />
<strong>Extermin<strong>at</strong>or</strong><br />
As <strong>with</strong> buffer overflows, <strong>Extermin<strong>at</strong>or</strong>’s dangling pointer isol<strong>at</strong>or computes<br />
summary inform<strong>at</strong>ion over multiple runs. To force each run to have a different effect,<br />
<strong>Extermin<strong>at</strong>or</strong> fills freed objects <strong>with</strong> canaries <strong>with</strong> some probability p, turning every<br />
execution into a series of Bernoulli trials. In this scenario, if the program reads canary<br />
d<strong>at</strong>a through the dangling pointer, the program may crash. Thus writing the canary for th<strong>at</strong><br />
object increases the probability th<strong>at</strong> the program will l<strong>at</strong>er crash. Conversely, if an<br />
object is not freed prem<strong>at</strong>urely, then overwriting it <strong>with</strong> canaries has no influence on the<br />
failure or success of the program. <strong>Extermin<strong>at</strong>or</strong> then uses the same hypothesis testing<br />
framework as its buffer overflow algorithm to identify sources of dangling pointer<br />
errors.<br />
The choice of p reflects a trade-off between the precision of the buffer<br />
overflow algorithm and dangling pointer isol<strong>at</strong>ion. Since overflow isol<strong>at</strong>ion relies on<br />
detecting corrupt canaries, low values of p increase the number of runs (though not the<br />
number of failures) required to isol<strong>at</strong>e overflows. However, lower values of p increase<br />
the precision of dangling pointer isol<strong>at</strong>ion by reducing the risk th<strong>at</strong> certain alloc<strong>at</strong>ion<br />
sites (those th<strong>at</strong> alloc<strong>at</strong>e large numbers of objects) will always observe one canary value.<br />
We currently set p = 1/2, though some dangling pointer errors may require lower values of<br />
p to converge <strong>with</strong>in a reasonable number of runs.<br />
<strong>Extermin<strong>at</strong>or</strong> then estim<strong>at</strong>es the required lifetime extension by loc<strong>at</strong>ing the oldest<br />
canaried object from an identified alloc<strong>at</strong>ion site, and computing the number of<br />
alloc<strong>at</strong>ions between the time it was freed and the time th<strong>at</strong> the program failed. The<br />
correcting alloc<strong>at</strong>or then extends the lifetime of all objects corresponding to this<br />
alloc<strong>at</strong>ion/dealloc<strong>at</strong>ion site by twice this number.<br />
Dept. Of Computer Science & Engg. ~ 15 ~ Cochin University of Science & Technology
6. Error Correction<br />
<strong>Extermin<strong>at</strong>or</strong><br />
How <strong>Extermin<strong>at</strong>or</strong> uses the inform<strong>at</strong>ion from its error isol<strong>at</strong>ion algorithms to correct<br />
specific errors? <strong>Extermin<strong>at</strong>or</strong> first gener<strong>at</strong>es runtime p<strong>at</strong>ches for each error. It then relies<br />
on a correcting alloc<strong>at</strong>or th<strong>at</strong> uses this inform<strong>at</strong>ion, padding alloc<strong>at</strong>ions to prevent<br />
overflows, and deferring dealloc<strong>at</strong>ions to prevent dangling pointer errors.<br />
6.1 Buffer overflow correction<br />
For every culprit-victim pair th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> encounters, it gener<strong>at</strong>es a runtime<br />
p<strong>at</strong>ch consisting of the alloc<strong>at</strong>ion site hash and the padding needed to contain the overflow<br />
(δ + the size of the over-flow). If a runtime p<strong>at</strong>ch has already been gener<strong>at</strong>ed for a given<br />
alloc<strong>at</strong>ion site, <strong>Extermin<strong>at</strong>or</strong> uses the maximum padding value encountered so far.<br />
6.2 Dangling pointer correction<br />
The runtime p<strong>at</strong>ch for a dangling pointer consists of the combin<strong>at</strong>ion of its alloc<strong>at</strong>ion<br />
site info and a time by which to delay its dealloc<strong>at</strong>ion. <strong>Extermin<strong>at</strong>or</strong> computes this delay as<br />
follows. Let τ be the recorded dealloc<strong>at</strong>ion time of the dangled object, and T be the last<br />
alloc<strong>at</strong>ion time. <strong>Extermin<strong>at</strong>or</strong> has no way of knowing how long the object is supposed to<br />
live, so computing an exact delay time is impossible. Instead, it extends the object’s<br />
lifetime (delays its free) by twice the distance between its prem<strong>at</strong>ure free and the last<br />
alloc<strong>at</strong>ion time, plus one: 2×(T −τ)+1.<br />
This choice ensures th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> will compute a correct p<strong>at</strong>ch in a logarithmic<br />
number of executions. As we show in Section 7.2, multiple iter<strong>at</strong>ions to correct pointer<br />
errors are rare in practice, because the last alloc<strong>at</strong>ion time can be well past the time th<strong>at</strong> the<br />
object should have been freed.<br />
It is important to note th<strong>at</strong> this dealloc<strong>at</strong>ion deferral does not multiply its lifetime but<br />
r<strong>at</strong>her its drag. To illustr<strong>at</strong>e, an object might live for 1000 alloc<strong>at</strong>ions and then be freed just<br />
10 alloc<strong>at</strong>ions too soon. If the program immedi<strong>at</strong>ely crashes, <strong>Extermin<strong>at</strong>or</strong> will extend its<br />
lifetime by 21 alloc<strong>at</strong>ions, increasing its lifetime by less than 1% (1021/1010).<br />
6.3 The Correcting Memory Alloc<strong>at</strong>or<br />
The correcting memory alloc<strong>at</strong>or incorpor<strong>at</strong>es the runtime p<strong>at</strong>ches described above<br />
and applies them when appropri<strong>at</strong>e. Figure 6 presents pseudo-code for the alloc<strong>at</strong>ion and<br />
dealloc<strong>at</strong>ion functions.<br />
At start-up, or upon receiving a reload signal (Section 3.4), the correcting alloc<strong>at</strong>or<br />
loads the runtime p<strong>at</strong>ches from a specified file. It builds two hash tables: a pad table<br />
mapping alloc<strong>at</strong>ion sites to pad sizes, and a deferral table, mapping pairs of alloc<strong>at</strong>ion and<br />
dealloc<strong>at</strong>ion sites to a deferral value. Because it can reload the run-time p<strong>at</strong>ch file and<br />
Dept. Of Computer Science & Engg. ~ 16 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
rebuild these tables on-the-fly, <strong>Extermin<strong>at</strong>or</strong> can apply p<strong>at</strong>ches to running programs<br />
<strong>with</strong>out interrupting their execution. This aspect of <strong>Extermin<strong>at</strong>or</strong>’s oper<strong>at</strong>ion may be<br />
especially useful for systems th<strong>at</strong> must be kept running continuously.<br />
On every dealloc<strong>at</strong>ion, the correcting alloc<strong>at</strong>or checks to see if the object to be freed<br />
needs to be deferred. If it finds a deferral value for the object’s alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion<br />
site, it pushes onto the deferral priority queue the pointer and the time to actually free it<br />
(the current alloc<strong>at</strong>ion time plus the deferral value).<br />
The correcting alloc<strong>at</strong>or then checks the deferral queue on every alloc<strong>at</strong>ion to see if<br />
an object should now be freed. It then checks whether the current alloc<strong>at</strong>ion site has an<br />
associ<strong>at</strong>ed pad value. If so, it adds the pad value to the alloc<strong>at</strong>ion request, and forwards the<br />
alloc<strong>at</strong>ion request to the underlying alloc<strong>at</strong>or.<br />
Dept. Of Computer Science & Engg. ~ 17 ~ Cochin University of Science & Technology
6.4 Collabor<strong>at</strong>ive Correction<br />
<strong>Extermin<strong>at</strong>or</strong><br />
Each individual user of an applic<strong>at</strong>ion is likely to experience different errors. To allow<br />
an entire user community to autom<strong>at</strong>ically improve software reliability, <strong>Extermin<strong>at</strong>or</strong><br />
provides a simple utility th<strong>at</strong> supports collabor<strong>at</strong>ive correction. This utility takes as input a<br />
number of runtime p<strong>at</strong>ch files. It then combines these p<strong>at</strong>ches by computing the maximum<br />
buffer pad required for any alloc<strong>at</strong>ion site, and the maximal deferral amount for any given<br />
alloc<strong>at</strong>ion site. The result is a new runtime p<strong>at</strong>ch file th<strong>at</strong> covers all observed errors.<br />
Because the size of p<strong>at</strong>ch files is limited by the number of alloc<strong>at</strong>ion sites in a program, we<br />
expect these files to be compact and practical to transmit. For example, the size of the<br />
runtime p<strong>at</strong>ches th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> gener<strong>at</strong>es for injected errors in espresso was just 130K,<br />
and shrinks to 17K when compressed <strong>with</strong> gzip.<br />
Dept. Of Computer Science & Engg. ~ 18 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
7. Results<br />
Wh<strong>at</strong> is the runtime overhead of using <strong>Extermin<strong>at</strong>or</strong>?<br />
How effective is <strong>Extermin<strong>at</strong>or</strong> <strong>at</strong> finding and correcting memory errors, both for<br />
injected and real faults?<br />
Wh<strong>at</strong> is the overhead of <strong>Extermin<strong>at</strong>or</strong>’s runtime p<strong>at</strong>ches?<br />
7.1 <strong>Extermin<strong>at</strong>or</strong> Runtime Overhead<br />
Here evalu<strong>at</strong>e <strong>Extermin<strong>at</strong>or</strong>’s performance <strong>with</strong> the SPECint2000 suite running<br />
reference workloads, as well as a suite of alloc<strong>at</strong>ion-intensive benchmarks. We use the<br />
l<strong>at</strong>ter suite of benchmarks both because they are widely used in memory management<br />
studies and because their high alloc<strong>at</strong>ion-intensity stresses memory management<br />
performance. For all experiments, we fix <strong>Extermin<strong>at</strong>or</strong>’s heap multiplier (value of M) <strong>at</strong> 2.<br />
All results are the average of five runs on a quiescent, dual-processor Linux system<br />
<strong>with</strong> 3GB of RAM, <strong>with</strong> each 3.06 GHz Intel Xeon processor (hyperthreading active)<br />
equipped <strong>with</strong> 512K L2 caches. Our observed experimental variance is below 1%.<br />
We focus on the nonreplic<strong>at</strong>ed mode (iter<strong>at</strong>ive/cumul<strong>at</strong>ive), which we expect to be a<br />
key limiting factor for <strong>Extermin<strong>at</strong>or</strong>’s performance and the most common usage scenario.<br />
We compare the runtime of <strong>Extermin<strong>at</strong>or</strong> (DieFast plus the correcting alloc<strong>at</strong>or) to<br />
the GNU libc alloc<strong>at</strong>or. This alloc<strong>at</strong>or is based on the Lea alloc<strong>at</strong>or, which is among the<br />
fastest avail-able. Figure shows th<strong>at</strong>, versus this alloc<strong>at</strong>or, <strong>Extermin<strong>at</strong>or</strong> degrades<br />
performance by from 0% (186.crafty) to 132% (cfrac), <strong>with</strong> a geometric mean of<br />
25.1%. While <strong>Extermin<strong>at</strong>or</strong>’s overhead is substantial for the alloc<strong>at</strong>ion-intensive suite<br />
(geometric mean: 81.2%), where the cost of computing alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion<br />
contexts domin<strong>at</strong>es, its overhead is significantly less pronounced across the SPEC<br />
benchmarks (geometric mean: 7.2%).<br />
Figure: Runtime overhead for <strong>Extermin<strong>at</strong>or</strong> across a suite of<br />
benchmarks, normalized to the performance of GNU libc (Linux) alloc<strong>at</strong>or.<br />
Dept. Of Computer Science & Engg. ~ 19 ~ Cochin University of Science & Technology
7.2 Memory Error Correction<br />
Injected Faults<br />
<strong>Extermin<strong>at</strong>or</strong><br />
To measure <strong>Extermin<strong>at</strong>or</strong>’s effectiveness <strong>at</strong> isol<strong>at</strong>ing and correcting bugs, we used the<br />
fault injector th<strong>at</strong> accompanies the DieHard distribution to inject buffer overflows and<br />
dangling pointer errors. For each d<strong>at</strong>a point, we run the injector using a random seed until<br />
it triggers an error or divergent output. We next use this seed to deterministically trigger a<br />
single error in <strong>Extermin<strong>at</strong>or</strong>, which we run in iter<strong>at</strong>ive mode. We then measure the number<br />
of iter<strong>at</strong>ions required to isol<strong>at</strong>e and gener<strong>at</strong>e an appropri<strong>at</strong>e runtime p<strong>at</strong>ch. The total<br />
number of images (iter<strong>at</strong>ions plus the first run) corresponds to the number of replicas th<strong>at</strong><br />
would be required when running <strong>Extermin<strong>at</strong>or</strong> in replic<strong>at</strong>ed mode.<br />
Buffer overflows: Figure presents the result of triggering 26 different buffer overflows for<br />
three different sizes (4, 8, and 16 bytes) in the espresso benchmark. The number of images<br />
required to isol<strong>at</strong>e and correct these errors is generally just 3, although occasionally there<br />
are outlier cases, where the number of images required reaches <strong>at</strong> most 5. Notice th<strong>at</strong> this<br />
result is substantially better than the analytical worst-case. For three images, Theorem 2<br />
bounds the worst-case likelihood of missing an overflow to 42% (Section 4.1),r<strong>at</strong>her than<br />
the 4% false neg<strong>at</strong>ive r<strong>at</strong>e we observe here.<br />
Figure: The number of images required for <strong>Extermin<strong>at</strong>or</strong> to<br />
isol<strong>at</strong>e and correct buffer overflows of different sizes<br />
Dangling pointer errors: We then triggered 10 dangling pointer faults in espresso <strong>with</strong><br />
<strong>Extermin<strong>at</strong>or</strong> running in iter<strong>at</strong>ive and in cumul<strong>at</strong>ive modes. In iter<strong>at</strong>ive mode, <strong>Extermin<strong>at</strong>or</strong><br />
succeeds in isol<strong>at</strong>ing the error in only 4 runs. In another 4 runs, espresso does not write<br />
through the dangling pointer. Instead, it reads a canary value through the dangled pointer,<br />
tre<strong>at</strong>s it as valid d<strong>at</strong>a, and either crashes or aborts. Since no corruption is present in the<br />
heap, <strong>Extermin<strong>at</strong>or</strong> cannot isol<strong>at</strong>e the source of the error. In the remaining 2 runs, writing<br />
Dept. Of Computer Science & Engg. ~ 20 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
canaries into the dangled object triggers a cascade of errors th<strong>at</strong> corrupt large segments of<br />
the heap. In these cases, the corruption destroys the inform<strong>at</strong>ion <strong>Extermin<strong>at</strong>or</strong> requires to<br />
isol<strong>at</strong>e the error.<br />
Figure: The number of images required for <strong>Extermin<strong>at</strong>or</strong> to<br />
isol<strong>at</strong>e and correct dangling pointer errors<br />
In cumul<strong>at</strong>ive mode, however, <strong>Extermin<strong>at</strong>or</strong> successfully isol<strong>at</strong>es all 10 injected<br />
errors. For runs where no large-scale heap corruption occurs, <strong>Extermin<strong>at</strong>or</strong> requires<br />
between 22 and 30 executions to isol<strong>at</strong>e and correct the errors. In each case, 15 failures<br />
must be observed before the erroneous site pair crosses the likelihood threshold. Because<br />
objects are overwritten randomly, the number of runs required to yield 15 failures varies.<br />
Where writing canaries corrupts a large fraction of the heap, <strong>Extermin<strong>at</strong>or</strong> requires 18<br />
failures and 34 total runs. In some of the runs, execution continues long enough for the<br />
alloc<strong>at</strong>or to reuse the culprit object, preventing <strong>Extermin<strong>at</strong>or</strong> from observing th<strong>at</strong> it was<br />
overwritten.<br />
Real Faults<br />
We also tested <strong>Extermin<strong>at</strong>or</strong> <strong>with</strong> actual bugs in two applic<strong>at</strong>ions:<br />
the Squid web caching server, and the Mozilla web browser.<br />
Squid web cache: Version 2.3s5 of Squid has a buffer overflow; certain inputs cause<br />
Squid to crash <strong>with</strong> either the GNU libc alloc<strong>at</strong>or or the Boehm-Demers-Weiser collector.<br />
We run Squid three times under <strong>Extermin<strong>at</strong>or</strong> in iter<strong>at</strong>ive mode <strong>with</strong> an input th<strong>at</strong><br />
triggers a buffer overflow. <strong>Extermin<strong>at</strong>or</strong> continues executing correctly in each run, but the<br />
overflow corrupts a canary. <strong>Extermin<strong>at</strong>or</strong>’s error isol<strong>at</strong>ion algorithm identifies a single alloc<strong>at</strong>ion<br />
site as the culprit and gener<strong>at</strong>es a pad of exactly 6 bytes,<br />
fixing the error.<br />
Dept. Of Computer Science & Engg. ~ 21 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
Mozilla web browser: We also tested <strong>Extermin<strong>at</strong>or</strong>’s cumul<strong>at</strong>ive mode on a known heap<br />
overflow inMozilla 1.7.3 / Firefox 1.0.6 and earlier. This overflow (bug 307259) occurs<br />
because of an error in Mozilla’s processing of Unicode characters in domain names. Not<br />
only is Mozilla multi-threaded, leading to non-deterministic alloc<strong>at</strong>ion behavior, but even<br />
slight differences in moving the mouse cause alloc<strong>at</strong>ion sequences to diverge. Thus,<br />
neither replic<strong>at</strong>ed nor iter<strong>at</strong>ive modes can identify equivalent objects across multiple runs.<br />
We perform two case studies th<strong>at</strong> represent plausible scenarios for using<br />
<strong>Extermin<strong>at</strong>or</strong>’s cumul<strong>at</strong>ive mode. In the first study, the user starts Mozilla and immedi<strong>at</strong>ely<br />
loads a page th<strong>at</strong> triggers the error. This scenario corresponds to a testing environment<br />
where a proof-of-concept input is available. In the second study, the user first navig<strong>at</strong>es<br />
through a selection of pages (different on each run), and then visits the error-triggering<br />
page. This scenario approxim<strong>at</strong>es deployed use where the error is triggered in the wild.<br />
In both cases, <strong>Extermin<strong>at</strong>or</strong> correctly identifies the overflow <strong>with</strong> no false positives.<br />
In the first case, <strong>Extermin<strong>at</strong>or</strong> requires 23 runs to isol<strong>at</strong>e the error. In the second, it requires<br />
34 runs. We believe th<strong>at</strong> this scenario requires more runs because the site th<strong>at</strong> produces the<br />
overflowed object alloc<strong>at</strong>es more correct objects, making it harder to identify it as<br />
erroneous.<br />
7.3 P<strong>at</strong>ch Overhead<br />
<strong>Extermin<strong>at</strong>or</strong>’s approach to correcting memory errors does not impose additional<br />
execution time overhead in the presence of p<strong>at</strong>ches. However, it consumes additional<br />
space, either by padding alloc<strong>at</strong>ions or by deferring dealloc<strong>at</strong>ions. We measure the space<br />
overhead for buffer overflow corrections by multiplying the size of the pad by the<br />
maximum number of live objects th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> p<strong>at</strong>ches. The most space overhead we<br />
observe is for the buffer overflow experiment <strong>with</strong> overflows of size 36, where the total<br />
increased space overhead is between 320 and 2816 bytes.<br />
We measure space overhead for dangling pointer corrections by multiplying the object<br />
size by the number of alloc<strong>at</strong>ions for which the object is deferred; th<strong>at</strong> is, we compute the<br />
total additional drag. In the dangling pointer experiment, the amount of excess memory<br />
ranges from 32 bytes to 1024 bytes (one 256 byte object is deferred for 4 dealloc<strong>at</strong>ions).<br />
This amount constitutes less than 1% of the maximum memory consumed by the<br />
applic<strong>at</strong>ion.<br />
Dept. Of Computer Science & Engg. ~ 22 ~ Cochin University of Science & Technology
8. Rel<strong>at</strong>ed Work<br />
8.1 Randomized Memory Managers<br />
<strong>Extermin<strong>at</strong>or</strong><br />
Several memory management systems employ some degree of randomiz<strong>at</strong>ion,<br />
including loc<strong>at</strong>ing the heap <strong>at</strong> a random base address, adding random padding to alloc<strong>at</strong>ed<br />
objects , shuffling recently-freed objects , or a mix of padding and object deferral. This<br />
level of randomiz<strong>at</strong>ion is insufficient for <strong>Extermin<strong>at</strong>or</strong>, which requires full heap<br />
randomiz<strong>at</strong>ion.<br />
<strong>Extermin<strong>at</strong>or</strong> builds on DieHard, which toler<strong>at</strong>es errors probabilistically; Section 3.1<br />
provides an overview. <strong>Extermin<strong>at</strong>or</strong> substantially modifies and extends DieHard’s heap<br />
layout and alloc<strong>at</strong>ion algorithms. It also uses probabilistic algorithms th<strong>at</strong> identify and<br />
correct errors.<br />
8.2 Autom<strong>at</strong>ic Repair<br />
We are aware of only one other system th<strong>at</strong> repairs errors autom<strong>at</strong>ically: Demsky et<br />
al.’s autom<strong>at</strong>ic d<strong>at</strong>a structure repair enforces d<strong>at</strong>a structure consistency specific<strong>at</strong>ions,<br />
guided by a formal description of the program’s d<strong>at</strong>a structures. <strong>Extermin<strong>at</strong>or</strong> <strong>at</strong>tacks a<br />
different problem, namely th<strong>at</strong> of isol<strong>at</strong>ing and correcting memory errors, and is<br />
orthogonal and complementary to d<strong>at</strong>a structure repair.<br />
Sidiroglou et al. propose STEM, a self-healing runtime th<strong>at</strong> executes functions in a<br />
transactional environment so th<strong>at</strong> if they detect the function misbehaving, they can prevent<br />
it from doing damage. Using STEM, they implement error virtualiz<strong>at</strong>ion, which maps the<br />
set of possible errors in a function onto those th<strong>at</strong> have an explicit error handler. The more<br />
recent SEAD system goes beyond STEM requiring no source code changes, handling I/O<br />
<strong>with</strong> virtual proxies, and by specifying the repair policy explicitly through an external<br />
description. While STEM and SEAD are promising approaches to autom<strong>at</strong>ically<br />
recovering from errors, neither provides solutions for as broad a class of errors as<br />
<strong>Extermin<strong>at</strong>or</strong>, nor do they provide mechanisms to semantically elimin<strong>at</strong>e the source of the<br />
error autom<strong>at</strong>ically, as <strong>Extermin<strong>at</strong>or</strong> does.<br />
8.3 Autom<strong>at</strong>ic Debugging<br />
Two previous systems apply techniques designed to help isol<strong>at</strong>e bugs.<br />
St<strong>at</strong>istical bug isol<strong>at</strong>ion is a distributed assertion sampling technique th<strong>at</strong> helps pinpoint<br />
the loc<strong>at</strong>ion of errors, including but not limited to memory errors. It works by injecting<br />
lightweight tests into the source code; the result of these tests, in bit vector form, can be<br />
processed to gener<strong>at</strong>e likely sources of the errors. This st<strong>at</strong>istical processing differs from<br />
<strong>Extermin<strong>at</strong>or</strong>’s probabilistic error isol<strong>at</strong>ion algorithms, although Liu et al. also use<br />
hypothesis testing. Like st<strong>at</strong>istical bug isol<strong>at</strong>ion, <strong>Extermin<strong>at</strong>or</strong> can leverage the runs of<br />
deployed programs. However, unlike st<strong>at</strong>istical bug isol<strong>at</strong>ion, <strong>Extermin<strong>at</strong>or</strong> requires<br />
Dept. Of Computer Science & Engg. ~ 23 ~ Cochin University of Science & Technology
<strong>Extermin<strong>at</strong>or</strong><br />
neither source code nor a large deployed user base in order to find errors, and<br />
autom<strong>at</strong>ically gener<strong>at</strong>es runtime p<strong>at</strong>ches th<strong>at</strong> correct them.<br />
Delta debugging autom<strong>at</strong>es the process of identifying the smallest possible inputs th<strong>at</strong> do<br />
and do not exhibit a given error. Given these inputs, it is up to the software developer to<br />
actually loc<strong>at</strong>e the bugs themselves. <strong>Extermin<strong>at</strong>or</strong> focuses on a narrower class of errors, but<br />
is able to isol<strong>at</strong>e and correct an error given just one erroneous input, regardless of its size.<br />
8.4 Fault Tolerance<br />
Recently, there has been an increasing focus on approaches for toler<strong>at</strong>ing hardware<br />
transient errors th<strong>at</strong> are becoming more common due to fabric<strong>at</strong>ion process limit<strong>at</strong>ions.<br />
Work in this area ranges from proposed hardware support to software fault tolerance.While<br />
<strong>Extermin<strong>at</strong>or</strong> also uses redundancy as a method for detecting and correcting errors,<br />
<strong>Extermin<strong>at</strong>or</strong> goes beyond toler<strong>at</strong>ing software errors, which are not transient, to correcting<br />
them permanently. Like <strong>Extermin<strong>at</strong>or</strong>, other efforts in the fault tolerance community seek<br />
to g<strong>at</strong>her d<strong>at</strong>a from multiple program executions to identify potential errors. For example,<br />
Guo et al. use st<strong>at</strong>istical techniques on internal monitoring d<strong>at</strong>a to probabilistically detect<br />
faults, including memory leaks and deadlocks. <strong>Extermin<strong>at</strong>or</strong> goes beyond this previous<br />
work by characterizing each memory error so specifically th<strong>at</strong> a correction can be<br />
autom<strong>at</strong>ically gener<strong>at</strong>ed for it.<br />
Rinard et al. present a compiler-based approach called boundless buffers th<strong>at</strong> caches<br />
out-of-bound writes in a hash table for l<strong>at</strong>er reuse. This approach elimin<strong>at</strong>es buffer<br />
overflow errors (though not dangling pointer errors), but requires source code and imposes<br />
higher performance overheads (1.05x to 8.9x).<br />
Rx oper<strong>at</strong>es by checkpointing program execution and logging inputs. Rx rolls back<br />
crashed applic<strong>at</strong>ions and replays inputs to it in a new environment th<strong>at</strong> pads all alloc<strong>at</strong>ions<br />
or defers all dealloc<strong>at</strong>ions by some amount. If this new environment does not yield<br />
success, Rx rolls back the applic<strong>at</strong>ion again and increases the pad values, up to some<br />
threshold. Unlike Rx, <strong>Extermin<strong>at</strong>or</strong> does not require checkpointing or rollback, and<br />
precisely isol<strong>at</strong>es and corrects memory errors.<br />
8.5 Memory Managers<br />
Conserv<strong>at</strong>ive garbage collection prevents dangling pointer errors, but does not prevent<br />
buffer overflows. <strong>Extermin<strong>at</strong>or</strong>’s error isol<strong>at</strong>ion and correction is orthogonal to garbage<br />
collection.<br />
Finally, there have been numerous debugging memory alloc<strong>at</strong>ors; the document<strong>at</strong>ion<br />
for one of them, mp<strong>at</strong>rol, includes a list of over ninety such systems. Notable recent<br />
alloc<strong>at</strong>ors <strong>with</strong> debugging fe<strong>at</strong>ures include dnmalloc, Heap Server, and version 2.8 of the<br />
Lea alloc<strong>at</strong>or. <strong>Extermin<strong>at</strong>or</strong> either prevents or corrects errors th<strong>at</strong> these alloc<strong>at</strong>ors can only<br />
detect.<br />
Dept. Of Computer Science & Engg. ~ 24 ~ Cochin University of Science & Technology
9. Future Work<br />
<strong>Extermin<strong>at</strong>or</strong><br />
While <strong>Extermin<strong>at</strong>or</strong> can effectively loc<strong>at</strong>e and correct memory errors on the heap, it<br />
does not yet address stack errors. We are investig<strong>at</strong>ing approaches to apply <strong>Extermin<strong>at</strong>or</strong> to<br />
the stack.<br />
In addition, while <strong>Extermin<strong>at</strong>or</strong>’s runtime p<strong>at</strong>ches contain inform<strong>at</strong>ion th<strong>at</strong> describe<br />
the error loc<strong>at</strong>ion and its extent, it is not in a human-readable form. We plan to develop a<br />
tool to process runtime p<strong>at</strong>ches into bug reports <strong>with</strong> suggested fixes.<br />
<strong>Extermin<strong>at</strong>or</strong> currently requires programs to be deterministic in their alloc<strong>at</strong>ion<br />
p<strong>at</strong>terns and their storage to heap objects. Individual executions of a program are expected<br />
to yield the same sequence of object alloc<strong>at</strong>ion identifiers (i.e., object i always refers to the<br />
same semantic object).<br />
These requirements make <strong>Extermin<strong>at</strong>or</strong> well-suited for serial programs, but less<br />
useful for multi-threaded programs. While <strong>Extermin<strong>at</strong>or</strong> can run programs <strong>with</strong> a small<br />
amount of non determinism, its isol<strong>at</strong>ion algorithm is not robust <strong>with</strong> respect to disruptions<br />
to the alloc<strong>at</strong>ion sequence.<br />
While simul<strong>at</strong>ion or deterministic replay mechanisms would enable <strong>Extermin<strong>at</strong>or</strong> to<br />
work reliably for multi-threaded programs, we plan to explore a variety of ways to directly<br />
handle programs <strong>with</strong> extensive non-determinism. One way to improve its behavior in the<br />
face of non-determinism would be for <strong>Extermin<strong>at</strong>or</strong> to associ<strong>at</strong>e object identifiers <strong>with</strong><br />
thread ids. This change would prevent multiple threads from disrupting a global alloc<strong>at</strong>ion<br />
sequence. We are especially interested in combining st<strong>at</strong>istical alignment techniques from<br />
machine learning <strong>with</strong> inform<strong>at</strong>ion gleaned from locks and other synchroniz<strong>at</strong>ion variables<br />
to let us recover alloc<strong>at</strong>ion sequences across multiple threads.<br />
Dept. Of Computer Science & Engg. ~ 25 ~ Cochin University of Science & Technology
10. Conclusion<br />
<strong>Extermin<strong>at</strong>or</strong><br />
This paper presents <strong>Extermin<strong>at</strong>or</strong>, a system th<strong>at</strong> autom<strong>at</strong>ically corrects heap-based<br />
memory errors in C and C++ programs <strong>with</strong> high probability. <strong>Extermin<strong>at</strong>or</strong> oper<strong>at</strong>es<br />
entirely <strong>at</strong> the runtime level on unaltered binaries, and consists of three key components:<br />
DieFast, a probabilistic debugging alloc<strong>at</strong>or<br />
a probabilistic error isol<strong>at</strong>ion algorithm, and<br />
a correcting memory alloc<strong>at</strong>or.<br />
Ex-termin<strong>at</strong>or’s probabilistic error isol<strong>at</strong>ion isol<strong>at</strong>es the source and ex-tent of memory<br />
errors <strong>with</strong> provably low false positive and false neg<strong>at</strong>ive r<strong>at</strong>es. Its correcting memory<br />
alloc<strong>at</strong>or incorpor<strong>at</strong>es runtime p<strong>at</strong>ches th<strong>at</strong> the error isol<strong>at</strong>ion algorithm gener<strong>at</strong>es to<br />
correct memory errors. <strong>Extermin<strong>at</strong>or</strong> is not only suitable for use during testing, but also<br />
can autom<strong>at</strong>ically correct deployed programs.<br />
Dept. Of Computer Science & Engg. ~ 26 ~ Cochin University of Science & Technology
REFERENCES<br />
<strong>Extermin<strong>at</strong>or</strong><br />
Source : Communic<strong>at</strong>ions of the ACM, v.51 n.12, <strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically<br />
correcting memory errors <strong>with</strong> high probability, Pages : 87-95, December 2008<br />
SECTION:Research highlights<br />
Authors : Gene Novark , University of Massachusetts, Amherst, MA<br />
Emery D. Berger , University of Massachusetts, Amherst, MA<br />
Benjamin G. Zorn , Microsoft Research, Redmond, WA<br />
o http://www.cs.umass.edu/~gnovark/<br />
o www.cs.umass.edu/~gnovark/pldi028-novark.<strong>pdf</strong><br />
o www.cs.umass.edu/~emery/pubs/UMCS-2006-058.<strong>pdf</strong><br />
o www.slideshare.net/.../autom<strong>at</strong>ically-toler<strong>at</strong>ing-and-correcting-memoryerrors<br />
-<br />
Dept. Of Computer Science & Engg. ~ 27 ~ Cochin University of Science & Technology