16.08.2013 Views

Exterminator- A ... with High Probability.pdf - DSpace at CUSAT ...

Exterminator- A ... with High Probability.pdf - DSpace at CUSAT ...

Exterminator- A ... with High Probability.pdf - DSpace at CUSAT ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically Correcting Memory Errors<br />

<strong>with</strong> <strong>High</strong> <strong>Probability</strong><br />

SEMINAR REPORT<br />

2009-2011<br />

In partial fulfillment of Requirements in<br />

Degree of Master of Technology<br />

In<br />

SOFTWARE ENGINEERING<br />

SUBMITTED BY<br />

NISHAMOL P H<br />

DEPARTMENT OF COMPUTER SCIENCE<br />

COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY<br />

KOCHI – 682 022


COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY<br />

KOCHI – 682 022<br />

DEPARTMENT OF COMPUTER SCIENCE<br />

CERTIFICATE<br />

This is to certify th<strong>at</strong> the seminar report entitled “<strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically<br />

Correcting Memory Errors <strong>with</strong> <strong>High</strong> <strong>Probability</strong>” is being submitted by<br />

NISHAMOL P H in partial fulfillment of the requirements for the award of<br />

M.Tech in Software Engineering is a bonafide record of the seminar<br />

presented by her during the academic year 2009.<br />

Dr.Sumam Mary Idicula Prof. Dr.K.Poulose Jacob<br />

Reader Director<br />

Dept. of Computer Science Dept. of Computer Science


ACKNOWLEDGEMENT<br />

First of all let me thank our Director Prof: Dr. K. Paulose Jacob,<br />

Dept. of Computer Science, who provided <strong>with</strong> the necessary facilities and<br />

advice. I am also thankful to Dr. Sumam Mary Idicula, Reader, Dept. of<br />

Computer Science, for her valuable suggestions and support for the<br />

completion of this seminar. With gre<strong>at</strong> pleasure I remember Mr. G.<br />

Santhoskumar for his sincere guidance. Also I am thankful to all of my<br />

teaching and non-teaching staff in the department and my friends for<br />

extending their warm kindness and help.<br />

I would like to thank my parents <strong>with</strong>out their blessings and support I<br />

would not have been able to accomplish my goal. Finally, I thank the<br />

almighty for giving the guidance and blessings.


CONTENTS<br />

1. Introduction -------------------------------------------------------------------- 1<br />

2. Memory errors ----------------------------------------------------------------- 3<br />

3. Software Architecture ---------------------------------------------------------<br />

3.1. DieHard Overview --------------------------------------------------------<br />

3.2. <strong>Extermin<strong>at</strong>or</strong>’s Heap Layout ---------------------------------------------<br />

3.3. DieFast: A Probabilistic Debugging Alloc<strong>at</strong>or ---------------------------<br />

3.4. Modes of Oper<strong>at</strong>ion ------------------------------------------------------<br />

4. Error Isol<strong>at</strong>ion (ITERATIVE AND REPLICATED) ----------------------------- 11<br />

4.1. Buffer overflow detection ------------------------------------------------ 11<br />

4.2. Dangling Pointer Isol<strong>at</strong>ion ----------------------------------------------- 13<br />

5. CUMULATIVE ERROR ISOLATION -------------------------------------------- 14<br />

5.1. Buffer overflow detection -----------------------------------------------<br />

5.2. Dangling pointer isol<strong>at</strong>ion -----------------------------------------------<br />

6. Error Correction --------------------------------------------------------------- 16<br />

6.1. Buffer overflow correction -----------------------------------------------<br />

6.2. Dangling pointer correction ---------------------------------------------<br />

6.3. The Correcting Memory Alloc<strong>at</strong>or ---------------------------------------<br />

6.4. Collabor<strong>at</strong>ive Correction ----------------------------------------------<br />

7. Results ------------------------------------------------------------------------- 19<br />

7.1. <strong>Extermin<strong>at</strong>or</strong> Runtime Overhead ---------------------------------------- 19<br />

7.2. Memory Error Correction ------------------------------------------------ 20<br />

7.3 P<strong>at</strong>ch Overhead ----------------------------------------------------------- 22<br />

8. Rel<strong>at</strong>ed Work ------------------------------------------------------------------ 23<br />

8.1. Randomized Memory Managers ----------------------------------------- 23<br />

8.2. Autom<strong>at</strong>ic Repair --------------------------------------------------------<br />

8.3. Autom<strong>at</strong>ic Debugging ---------------------------------------------------<br />

8.4. Fault Tolerance ----------------------------------------------------------<br />

8.5. Memory Managers -------------------------------------------------------<br />

9. Future Work ------------------------------------------------------------------ 25<br />

10. Conclusion -------------------------------------------------------------------<br />

References ------------------------------------------------------------------<br />

4<br />

4<br />

5<br />

6<br />

8<br />

14<br />

15<br />

16<br />

16<br />

16<br />

18<br />

23<br />

23<br />

24<br />

24<br />

26<br />

27


<strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically Correcting Memory Errors<br />

<strong>with</strong> <strong>High</strong> <strong>Probability</strong><br />

Abstract<br />

Programs written in C and C++ are susceptible to memory errors,including buffer<br />

overflows and dangling pointers. These errors,which can lead to crashes, erroneous<br />

execution, and security vulnerabilities, are notoriously costly to repair. Tracking down<br />

their loc<strong>at</strong>ion in the source code is difficult, even when the full memory st<strong>at</strong>e of the<br />

program is available. Once the errors are finally found, fixing them remains challenging:<br />

even for critical security-sensitive bugs, the average time between initial reports and the<br />

issuance of a p<strong>at</strong>ch is nearly one month.<br />

We present <strong>Extermin<strong>at</strong>or</strong>, a system th<strong>at</strong> autom<strong>at</strong>ically corrects heap-based memory<br />

errors <strong>with</strong>out programmer intervention. <strong>Extermin<strong>at</strong>or</strong> exploits randomiz<strong>at</strong>ion to pinpoint<br />

errors <strong>with</strong> high precision. From this inform<strong>at</strong>ion, <strong>Extermin<strong>at</strong>or</strong> derives runtime p<strong>at</strong>ches<br />

th<strong>at</strong> fix these errors both in current and subsequent executions. In addition, <strong>Extermin<strong>at</strong>or</strong><br />

enables collabor<strong>at</strong>ive bug correction by merging p<strong>at</strong>ches gener<strong>at</strong>ed by multiple users. We<br />

present analytical and empirical results th<strong>at</strong> demonstr<strong>at</strong>e <strong>Extermin<strong>at</strong>or</strong>’s effectiveness <strong>at</strong><br />

detecting and correcting both injected and real faults.


1. Introduction<br />

<strong>Extermin<strong>at</strong>or</strong><br />

The use of manual memory management and unchecked memory accesses in C and<br />

C++ leaves applic<strong>at</strong>ions written in these languages susceptible to a range of memory<br />

errors. These include buffer overruns, where reads or writes go beyond alloc<strong>at</strong>ed regions,<br />

and dangling pointers, when a program de alloc<strong>at</strong>es memory while it is still live. Memory<br />

errors can cause programs to crash or produce incorrect results. Worse, <strong>at</strong>tackers are<br />

frequently able to exploit these memory errors to gain unauthorized access to systems.<br />

Debugging memory errors is notoriously difficult and time consuming. Reproducing<br />

the error requires an input th<strong>at</strong> exposes it. Since inputs are often unavailable from deployed<br />

programs, developers must either concoct such an input or find the problem via code<br />

inspection. Once a test input is available, software developers typically execute the<br />

applic<strong>at</strong>ion <strong>with</strong> heap debugging tools like Purify and Valgrind , which slow execution<br />

by an order of magnitude. When the bug is ultim<strong>at</strong>ely discovered, developers must<br />

construct and carefully test a p<strong>at</strong>ch to ensure th<strong>at</strong> it fixes the bug <strong>with</strong>out introducing any<br />

new ones. According to Symantec, the average time between the discovery of a critical,<br />

remotely exploitable memory error and the release of a p<strong>at</strong>ch for enterprise applic<strong>at</strong>ions is<br />

28 days.<br />

As an altern<strong>at</strong>ive to debugging memory errors, researchers have proposed a number<br />

of systems th<strong>at</strong> either detect or toler<strong>at</strong>e them. Fail-stop systems are compiler-based<br />

approaches th<strong>at</strong> require access to source code, and abort programs when they performs<br />

illegal oper<strong>at</strong>ions like buffer overflows. They rely either on conserv<strong>at</strong>ive garbage<br />

collection or pool alloc<strong>at</strong>ion to prevent or detect dangling pointer errors. Failure- oblivious<br />

systems are also compiler-based, but manufacture read values and drop or cache illegal<br />

writes for l<strong>at</strong>er reuse. Finally, fault-tolerant systems mask the effect of errors, either by<br />

logging and replaying inputs in an environment th<strong>at</strong> pads alloc<strong>at</strong>ion requests and defers<br />

dealloc<strong>at</strong>ions, or through randomiz<strong>at</strong>ion and optional voting-based replic<strong>at</strong>ion th<strong>at</strong> reduces<br />

the odds th<strong>at</strong> an error will have any effect (e.g., DieHard ).<br />

Contributions: This paper presents <strong>Extermin<strong>at</strong>or</strong>, a runtime system th<strong>at</strong> not only<br />

toler<strong>at</strong>es but also detects and corrects heap-based memory errors. <strong>Extermin<strong>at</strong>or</strong> requires<br />

neither source code nor programmer intervention, and fixes existing errors <strong>with</strong>out<br />

introducing new ones. To our knowledge, this system is the first of its kind.<br />

<strong>Extermin<strong>at</strong>or</strong> relies on an efficient probabilistic debugging alloc<strong>at</strong>or th<strong>at</strong> we call<br />

DieFast. DieFast is based on DieHard’s alloc<strong>at</strong>or, which ensures th<strong>at</strong> heaps are<br />

independently randomized. However, while DieHard can only probabilistically toler<strong>at</strong>e<br />

errors, DieFast probabilistically detects them.<br />

When <strong>Extermin<strong>at</strong>or</strong> discovers an error, it dumps a heap image th<strong>at</strong> contains the<br />

complete st<strong>at</strong>e of the heap. <strong>Extermin<strong>at</strong>or</strong>’s probabilistic error isol<strong>at</strong>ion algorithm then<br />

processes one or more heap images to loc<strong>at</strong>e the source and size of buffer overflows and<br />

Dept. Of Computer Science & Engg. ~ 1 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

dangling pointer errors. This error isol<strong>at</strong>ion algorithm has provably low false positive and<br />

false neg<strong>at</strong>ive r<strong>at</strong>es.<br />

Once <strong>Extermin<strong>at</strong>or</strong> loc<strong>at</strong>es a buffer overflow, it determines the alloc<strong>at</strong>ion site of the<br />

overflowed object, and the size of the over flow. For dangling pointer errors, <strong>Extermin<strong>at</strong>or</strong><br />

determines both the alloc<strong>at</strong>ion and deletion sites of the dangled object, and computes how<br />

prem<strong>at</strong>urely the object was freed.<br />

With this inform<strong>at</strong>ion in hand, <strong>Extermin<strong>at</strong>or</strong> corrects the errors by gener<strong>at</strong>ing<br />

runtime p<strong>at</strong>ches. These p<strong>at</strong>ches oper<strong>at</strong>e in the context of a correcting alloc<strong>at</strong>or. The<br />

correcting alloc<strong>at</strong>or prevents overflows by padding objects, and prevents dangling pointer<br />

errors by deferring object dealloc<strong>at</strong>ions. These actions impose little space overhead<br />

because <strong>Extermin<strong>at</strong>or</strong>’s runtime p<strong>at</strong>ches are tailored to the specific alloc<strong>at</strong>ion and<br />

dealloc<strong>at</strong>ion sites of each error.<br />

After <strong>Extermin<strong>at</strong>or</strong> completes p<strong>at</strong>ch gener<strong>at</strong>ion, it both stores the p<strong>at</strong>ches to correct<br />

the bug in subsequent executions, and triggers a p<strong>at</strong>ch upd<strong>at</strong>e in the running program to fix<br />

the bug in the current execution. <strong>Extermin<strong>at</strong>or</strong>’s p<strong>at</strong>ches also compose straightforwardly,<br />

enabling collabor<strong>at</strong>ive bug correction: users running <strong>Extermin<strong>at</strong>or</strong> can autom<strong>at</strong>ically<br />

merge their p<strong>at</strong>ches, thus system<strong>at</strong>ically and continuously improving applic<strong>at</strong>ion reliability.<br />

<strong>Extermin<strong>at</strong>or</strong> can oper<strong>at</strong>e in three distinct modes: an iter<strong>at</strong>ive mode for runs over the<br />

same input, a replic<strong>at</strong>ed mode th<strong>at</strong> can correct errors on-the-fly, and a cumul<strong>at</strong>ive mode<br />

th<strong>at</strong> corrects errors across multiple runs of the same applic<strong>at</strong>ion.<br />

We experimentally demonstr<strong>at</strong>e th<strong>at</strong>, in exchange for modest runtime overhead<br />

(geometric mean of 25%), <strong>Extermin<strong>at</strong>or</strong> effectively isol<strong>at</strong>es and corrects both injected and<br />

real memory errors, including buffer overflows in the Squid web caching server and the<br />

Mozilla web browser.<br />

Dept. Of Computer Science & Engg. ~ 2 ~ Cochin University of Science & Technology


2. Memory Errors<br />

<strong>Extermin<strong>at</strong>or</strong><br />

Table 1 summarizes the memory errors th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> addresses, and its response to<br />

each. <strong>Extermin<strong>at</strong>or</strong> identifies and corrects dangling pointers, where a heap object is freed<br />

while it is still live, and buffer overflows (a.k.a. buffer overruns) of heap objects. Notice<br />

th<strong>at</strong> this differs substantially from DieHard, which toler<strong>at</strong>es these errors probabilistically<br />

but cannot detect or correct them.<br />

<strong>Extermin<strong>at</strong>or</strong>’s alloc<strong>at</strong>or (DieFast) inherits from DieHard its immunity from two other<br />

common memory errors: double frees, when a heap object is dealloc<strong>at</strong>ed multiple times<br />

<strong>with</strong>out an intervening alloc<strong>at</strong>ion, and invalid frees, when a program dealloc<strong>at</strong>es an object<br />

th<strong>at</strong> was never returned by the alloc<strong>at</strong>or. These errors have serious consequences in other<br />

systems, where they can lead to heap corruption or abrupt program termin<strong>at</strong>ion.<br />

<strong>Extermin<strong>at</strong>or</strong> prevents these invalid dealloc<strong>at</strong>ion requests from having any impact.<br />

DieFast’s bitmap-based alloc<strong>at</strong>or makes multiple frees benign since a bit can only be reset<br />

once. By checking ranges, DieFast detects and ignores invalid frees.<br />

Limit<strong>at</strong>ions<br />

<strong>Extermin<strong>at</strong>or</strong>’s ability to correct both dangling pointer errors and buffer overflows has<br />

several limit<strong>at</strong>ions. First, <strong>Extermin<strong>at</strong>or</strong> assumes th<strong>at</strong> buffer overflows always corrupt<br />

memory <strong>at</strong> higher addresses—th<strong>at</strong> is, they are forward overflows. While it is possible to<br />

extend <strong>Extermin<strong>at</strong>or</strong> to handle backwards overflows, we have not implemented this<br />

functionality. <strong>Extermin<strong>at</strong>or</strong> can only correct finite overflows, so th<strong>at</strong> it can contain any<br />

given overflow by overalloc<strong>at</strong>ion. Similarly, <strong>Extermin<strong>at</strong>or</strong> corrects dangling pointer errors<br />

by inserting finite delays before freeing particular objects. Finally, in iter<strong>at</strong>ed and<br />

replic<strong>at</strong>ed modes, <strong>Extermin<strong>at</strong>or</strong> assumes th<strong>at</strong> overflows and dangling pointer errors are<br />

deterministic. However, the cumul<strong>at</strong>ive mode does not require deterministic errors.<br />

Unlike DieHard, <strong>Extermin<strong>at</strong>or</strong> does not detect uninitialized reads, where a program<br />

makes use of a value left over in a previously-alloc<strong>at</strong>ed object. Because the intended value<br />

is unknown,it is not generally possible to repair such errors <strong>with</strong>out additional inform<strong>at</strong>ion,<br />

e.g. d<strong>at</strong>a structure invariants. Instead, <strong>Extermin<strong>at</strong>or</strong> fills all alloc<strong>at</strong>ed objects <strong>with</strong> zeroes.<br />

Dept. Of Computer Science & Engg. ~ 3 ~ Cochin University of Science & Technology


3. Software Architecture<br />

<strong>Extermin<strong>at</strong>or</strong><br />

<strong>Extermin<strong>at</strong>or</strong>’s software architecture extends and modifies DieHard to enable its error<br />

isol<strong>at</strong>ing and correcting properties.<br />

3.1 DieHard Overview<br />

The DieHard system includes a bitmap-based, fully-randomized memory alloc<strong>at</strong>or<br />

th<strong>at</strong> provides probabilistic memory safety. The l<strong>at</strong>est version of DieHard, upon which<br />

<strong>Extermin<strong>at</strong>or</strong> is based, adaptively sizes its heap be M times larger than the maximum<br />

needed by the applic<strong>at</strong>ion (see Figure 2). This version of Die-Hard alloc<strong>at</strong>es memory from<br />

increasingly large chunks th<strong>at</strong> we call miniheaps. Each miniheap contains objects of<br />

exactly one size. If an alloc<strong>at</strong>ion would cause the total number of objects to exceed1/M,<br />

DieHard alloc<strong>at</strong>es a new miniheap th<strong>at</strong> is twice as large as the previous largest miniheap.<br />

Alloc<strong>at</strong>ion randomly probes a miniheap’s bitmap for the given size class for a free bit:<br />

this oper<strong>at</strong>ion takes O(1) expected time. Freeing a valid object resets the appropri<strong>at</strong>e bit.<br />

DieHard’s use of randomiz<strong>at</strong>ion across an over-provisioned heap makes it probabilistically<br />

likely th<strong>at</strong> buffer overflows will land on free space, and unlikely th<strong>at</strong> a recently-freed<br />

object will be reused soon, making dangling pointer errors rare.<br />

DieHard optionally uses replic<strong>at</strong>ion to increase the probability of successful execution.<br />

In this mode, it broadcasts inputs to a number of replicas of the applic<strong>at</strong>ion process, each<br />

equipped <strong>with</strong> a different random seed. A voter intercepts and compares outputs across the<br />

replicas, and only actually gener<strong>at</strong>es output agreed on by a plurality of the replicas. The<br />

independent randomiz<strong>at</strong>ion of each replica’s heap makes the probabilities of memory<br />

errors independent. Replic<strong>at</strong>ion thus exponentially decreases the likelihood of a memory<br />

error affecting output, since the probability of an error striking a majority of the replicas is<br />

low.<br />

Dept. Of Computer Science & Engg. ~ 4 ~ Cochin University of Science & Technology


3.2 <strong>Extermin<strong>at</strong>or</strong>’s Heap Layout<br />

<strong>Extermin<strong>at</strong>or</strong><br />

Figure 1 presents <strong>Extermin<strong>at</strong>or</strong>’s heap layout, which includes five fields per object for<br />

error isol<strong>at</strong>ion and correction: an object id, alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion sites, dealloc<strong>at</strong>ion<br />

time, which records when the object was freed, and a canary bitset th<strong>at</strong> indic<strong>at</strong>es if the<br />

object was filled <strong>with</strong> canaries (Section 3.3).<br />

An object id of n means th<strong>at</strong> the object is the nth object alloc<strong>at</strong>ed. <strong>Extermin<strong>at</strong>or</strong> uses<br />

object ids to identify objects across multiple heaps. These ids are needed because the<br />

object’s address cannot be used to identify it across differently-randomized heaps.<br />

The site inform<strong>at</strong>ion fields capture the calling context for alloc<strong>at</strong>ions and<br />

dealloc<strong>at</strong>ions. For each, <strong>Extermin<strong>at</strong>or</strong> hashes the least significant bytes of the five mostrecent<br />

return addresses into 32 bits using the DJB2 hash function (see Figure 3).<br />

This out-of-band metad<strong>at</strong>a accounts for approxim<strong>at</strong>ely 16 bytes plus two bits of space<br />

overhead for every object. This overhead is comparable to th<strong>at</strong> of typical free list-based<br />

memory managers like the Lea alloc<strong>at</strong>or, which prepend 8-byte (on 32-bit systems) or 16byte<br />

headers (on 64-bit systems) to alloc<strong>at</strong>ed objects.<br />

Dept. Of Computer Science & Engg. ~ 5 ~ Cochin University of Science & Technology


3.3 DieFast: A Probabilistic Debugging Alloc<strong>at</strong>or<br />

<strong>Extermin<strong>at</strong>or</strong><br />

<strong>Extermin<strong>at</strong>or</strong> uses a new, probabilistic debugging alloc<strong>at</strong>or th<strong>at</strong> we call DieFast.<br />

DieFast uses the same randomized heap layout as DieHard, but extends its alloc<strong>at</strong>ion and<br />

dealloc<strong>at</strong>ion algorithms to detect and expose errors. Figure 4 presents pseudo-code for the<br />

DieFast alloc<strong>at</strong>or. Unlike previous debugging alloc<strong>at</strong>ors, DieFast has a number of unusual<br />

characteristics tailored for its use in the context of <strong>Extermin<strong>at</strong>or</strong>.<br />

Implicit Fence-posts<br />

Many existing debugging alloc<strong>at</strong>ors pad alloc<strong>at</strong>ed objects <strong>with</strong> fence-posts (filled <strong>with</strong><br />

canary values) on both sides. They can thus detect buffer overflows by checking the<br />

integrity of these fence posts. This approach has the disadvantage of increasing space<br />

requirements. Combined <strong>with</strong> the already-increased space requirements of a DieHardbased<br />

heap, the additional space overhead of padding may be unacceptably large.<br />

DieFast exploits two facts to obtain the effect of fence-posts <strong>with</strong>out any additional<br />

space overhead. First, because its heap lay out is header less, one fence-post serves double<br />

duty: a fence-post following an object can act as the one preceding the next object. Second,<br />

Dept. Of Computer Science & Engg. ~ 6 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

because alloc<strong>at</strong>ed objects are separ<strong>at</strong>ed by E(M−1) freed objects on the heap, we use freed<br />

space to act as fence-posts.<br />

Random Canaries<br />

Traditional debugging canaries include values th<strong>at</strong> are readily distinguished from<br />

normal program d<strong>at</strong>a in a debugging session, such as the hexadecimal value<br />

0xDEADBEEF. However, one drawback of a deterministically-chosen canary is th<strong>at</strong> it is<br />

always possible for the program to use the canary p<strong>at</strong>tern as a d<strong>at</strong>a value. Because DieFast<br />

uses canaries loc<strong>at</strong>ed in freed space r<strong>at</strong>her than in alloc<strong>at</strong>ed space, a fixed canary would<br />

lead to a high false positive r<strong>at</strong>e if th<strong>at</strong> d<strong>at</strong>a value were common in alloc<strong>at</strong>ed objects.<br />

DieFast instead uses a random 32-bit value set <strong>at</strong> startup. Sinceboth the canary and<br />

heap addresses are random and differ on every execution, any fixed d<strong>at</strong>a value has a low<br />

probability of colliding <strong>with</strong> the canary, thus ensuring a low false positive r<strong>at</strong>e (see Theorem<br />

2). To increase the likelihood of detecting an error, DieFast sets the last bit of the<br />

canary. Setting this bit will cause an alignment error if the canary is dereferenced, but<br />

keeps the probability of an accidental collision <strong>with</strong> the canary low (1/231).<br />

Probabilistic Fence-posts<br />

Intuitively, the most effective way to expose a dangling pointer error is to fill freed<br />

memory <strong>with</strong> canary values. For example, dereferencing a canary-filled pointer will likely<br />

trigger a segment<strong>at</strong>ion viol<strong>at</strong>ion.<br />

Unfortun<strong>at</strong>ely, reading random values does not necessarily cause programs to fail. For<br />

example, in the espresso benchmark, some objects hold bitsets. Filling a freed bitset <strong>with</strong> a<br />

random value does not cause the program to termin<strong>at</strong>e but only affects the correctness of<br />

the comput<strong>at</strong>ion.<br />

If reading from a canary-filled dangling pointer causes a program to diverge, there is no<br />

way to narrow down the error. In the worst-case, half of the heap could be filled <strong>with</strong> freed<br />

objects, all overwritten <strong>with</strong> canaries. All of these objects would then be potential sources<br />

of dangling pointer errors.<br />

In cumul<strong>at</strong>ive mode, <strong>Extermin<strong>at</strong>or</strong> prevents this scenario by non-deterministically<br />

writing canaries into freed memory randomly <strong>with</strong> probability p, and setting the<br />

appropri<strong>at</strong>e bit in the canary bitmap. This probabilistic approach may seem to degrade<br />

<strong>Extermin<strong>at</strong>or</strong>’s ability to find errors. However, it is required to isol<strong>at</strong>e read-only dangling<br />

pointer errors, where the canary itself remains intact. Because it would take an<br />

impractically large number of iter<strong>at</strong>ions or replicas to isol<strong>at</strong>e these errors, <strong>Extermin<strong>at</strong>or</strong><br />

always fills freed objects <strong>with</strong> canaries when not running in cumul<strong>at</strong>ive mode<br />

Probabilistic Error Detection<br />

Whenever DieFast alloc<strong>at</strong>es memory, it examines the memory to be returned to verify<br />

th<strong>at</strong> any canaries are intact. If not, in addition to signaling an error (see Section 3.4),<br />

DieFast sets the alloc<strong>at</strong>ed bit for this chunk of memory. This “bad object isol<strong>at</strong>ion” ensures<br />

Dept. Of Computer Science & Engg. ~ 7 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

th<strong>at</strong> the object will not be reused for future alloc<strong>at</strong>ions, preserving its contents for<br />

<strong>Extermin<strong>at</strong>or</strong>’s subsequent use. Checking canary integrity on each alloc<strong>at</strong>ion ensures th<strong>at</strong><br />

DieFast will detect heap corruption <strong>with</strong>in E(H) alloc<strong>at</strong>ions, where H is the number of<br />

objects on the heap.<br />

After every dealloc<strong>at</strong>ion, DieFast checks both the preceding and subsequent objects.<br />

For each of these, DieFast checks if they are free. If so, it performs the same canary check<br />

as above. Recall th<strong>at</strong> because DieFast’s alloc<strong>at</strong>ion is random, the identity of these adjacent<br />

objects will differ from run to run. Checking the predecessor and successor on each free<br />

allows DieFast to detect buffer overruns immedi<strong>at</strong>ely upon object dealloc<strong>at</strong>ion.<br />

3.4 Modes of Oper<strong>at</strong>ion<br />

<strong>Extermin<strong>at</strong>or</strong> can be used in three modes of oper<strong>at</strong>ion: an iter<strong>at</strong>ive mode suitable for<br />

testing or whenever all inputs are available, a replic<strong>at</strong>ed mode th<strong>at</strong> is suitable both for<br />

testing and for restricted deployment scenarios, and a cumul<strong>at</strong>ive mode th<strong>at</strong> is suitable for<br />

broad deployment. All of these rely on the gener<strong>at</strong>ion of heap images, which <strong>Extermin<strong>at</strong>or</strong><br />

examines to isol<strong>at</strong>e errors and compute runtime p<strong>at</strong>ches.<br />

If <strong>Extermin<strong>at</strong>or</strong> discovers an error when executing a program, or if DieFast signals an<br />

error, <strong>Extermin<strong>at</strong>or</strong> forces the process to emit a heap image file. This file is akin to a core<br />

dump, but contains less d<strong>at</strong>a (e.g., no code), and is organized to simplify processing. In<br />

addition to the full heap contents and heap metad<strong>at</strong>a, the heap image includes the current<br />

alloc<strong>at</strong>ion time (measured by the number of alloc<strong>at</strong>ions to d<strong>at</strong>e)<br />

Iter<strong>at</strong>ive Mode<br />

<strong>Extermin<strong>at</strong>or</strong>’s iter<strong>at</strong>ive mode oper<strong>at</strong>es <strong>with</strong>out replic<strong>at</strong>ion. To find a single bug,<br />

<strong>Extermin<strong>at</strong>or</strong> is initially invoked via a command-line option th<strong>at</strong> directs it to stop as soon<br />

as it detects an error. <strong>Extermin<strong>at</strong>or</strong> then re-executes the program in “replay” mode over the<br />

same input (but <strong>with</strong> a new random seed). In this mode, <strong>Extermin<strong>at</strong>or</strong> reads the alloc<strong>at</strong>ion<br />

time from the initial heap image to abort execution <strong>at</strong> th<strong>at</strong> point; we call this a malloc<br />

breakpoint. <strong>Extermin<strong>at</strong>or</strong> then begins execution and ignores DieFast error signals th<strong>at</strong> are<br />

raised before the malloc breakpoint is reached.<br />

Once it reaches the malloc breakpoint, <strong>Extermin<strong>at</strong>or</strong> triggers another heap image<br />

dump. This process can be repe<strong>at</strong>ed multiple times to gener<strong>at</strong>e independent heap images.<br />

<strong>Extermin<strong>at</strong>or</strong> then performs post-mortem error isol<strong>at</strong>ion and runtime p<strong>at</strong>ch gener<strong>at</strong>ion. A<br />

small number of iter<strong>at</strong>ions usually suffices for <strong>Extermin<strong>at</strong>or</strong> to gener<strong>at</strong>e runtime p<strong>at</strong>ches<br />

for an individual error, as we show in Section 7.2. When run <strong>with</strong> a correcting memory<br />

alloc<strong>at</strong>or th<strong>at</strong> in corpor<strong>at</strong>es these changes (described in detail in Section 6.3), these p<strong>at</strong>ches<br />

autom<strong>at</strong>ically fix the isol<strong>at</strong>ed errors.<br />

Replic<strong>at</strong>ed Mode<br />

The iter<strong>at</strong>ed mode described above works well when all inputs are available so th<strong>at</strong> rerunning<br />

an execution is feasible. However, when applic<strong>at</strong>ions are deployed in the field,<br />

Dept. Of Computer Science & Engg. ~ 8 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

such inputs may not be available, and replaying may be impractical. The replic<strong>at</strong>ed mode<br />

of oper<strong>at</strong>ion allows <strong>Extermin<strong>at</strong>or</strong> to correct errors while the program is running, <strong>with</strong>out<br />

the need for multiple iter<strong>at</strong>ions.<br />

Like DieHard, <strong>Extermin<strong>at</strong>or</strong> can run a number of differently randomized replicas<br />

simultaneously (as separ<strong>at</strong>e processes), broadcasting inputs to all and voting on their<br />

outputs. However, <strong>Extermin<strong>at</strong>or</strong> uses DieFast-based heaps, each <strong>with</strong> a correcting<br />

alloc<strong>at</strong>or. This organiz<strong>at</strong>ion lets <strong>Extermin<strong>at</strong>or</strong> discover and fix errors.<br />

In replic<strong>at</strong>ed mode, when DieFast signals an error or the voter detects divergent<br />

output, <strong>Extermin<strong>at</strong>or</strong> sends a signal th<strong>at</strong> triggers a heap image dump for each replica. If the<br />

program crashes because of a segment<strong>at</strong>ion viol<strong>at</strong>ion, a signal handler also dumps a heap<br />

image.<br />

If DieFast signals an error, the replicas th<strong>at</strong> dump a heap image do not have to stop<br />

executing. If their output continues to be in agreement, they can continue executing<br />

concurrently <strong>with</strong> the error isol<strong>at</strong>ion process. When the runtime p<strong>at</strong>ch gener<strong>at</strong>ion process is<br />

complete, th<strong>at</strong> process signals the running replicas to tell the correcting alloc<strong>at</strong>ors to reload<br />

their runtime p<strong>at</strong>ches. Thus, subsequent alloc<strong>at</strong>ions in the same process will be p<strong>at</strong>ched onthe-fly<br />

<strong>with</strong>out interrupting execution.<br />

Dept. Of Computer Science & Engg. ~ 9 ~ Cochin University of Science & Technology


Cumul<strong>at</strong>ive Mode<br />

<strong>Extermin<strong>at</strong>or</strong><br />

While the replic<strong>at</strong>ed mode can isol<strong>at</strong>e and correct errors on-the fly in deployed<br />

applic<strong>at</strong>ions, it may not be practical in all situ<strong>at</strong>ions. For example, replic<strong>at</strong>ing applic<strong>at</strong>ions<br />

<strong>with</strong> high resource requirements may cause unacceptable overhead. In addition,<br />

multithreaded or non-deterministic applic<strong>at</strong>ions can exhibit different alloc<strong>at</strong>ion activity and<br />

so cause object ids to diverge across replicas. To support these applic<strong>at</strong>ions, <strong>Extermin<strong>at</strong>or</strong><br />

uses its third mode of oper<strong>at</strong>ion, cumul<strong>at</strong>ive mode, which isol<strong>at</strong>es errors <strong>with</strong>out<br />

replic<strong>at</strong>ion or multiple identical executions.<br />

When oper<strong>at</strong>ing in cumul<strong>at</strong>ive mode, <strong>Extermin<strong>at</strong>or</strong> reasons about objects grouped by<br />

alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion sites instead of individual objects, since objects are no longer<br />

guaranteed to be identical across different executions.<br />

Because objects from a given site only occasionally cause errors, often <strong>at</strong> low<br />

frequencies, <strong>Extermin<strong>at</strong>or</strong> requires more executions than in replic<strong>at</strong>ed or iter<strong>at</strong>ive mode in<br />

order to identify these low-frequency errors <strong>with</strong>out a high false positive r<strong>at</strong>e. Instead of<br />

storing heap images from multiple runs, <strong>Extermin<strong>at</strong>or</strong> computes relevant st<strong>at</strong>istics about<br />

each run and stores them in its p<strong>at</strong>ch file. The retained d<strong>at</strong>a is on the order of a few<br />

kilobytes per execution, compared to tens or hundreds of megabytes for each heap image.<br />

Dept. Of Computer Science & Engg. ~ 10 ~ Cochin University of Science & Technology


4. Error Isol<strong>at</strong>ion<br />

<strong>Extermin<strong>at</strong>or</strong><br />

<strong>Extermin<strong>at</strong>or</strong> employs two different families of error isol<strong>at</strong>ion algorithms: one set<br />

for replic<strong>at</strong>ed and iter<strong>at</strong>ive modes, and another for cumul<strong>at</strong>ive mode.<br />

ITERATIVE AND REPLICATED ERROR ISOLATION<br />

When oper<strong>at</strong>ing in its replic<strong>at</strong>ed or iter<strong>at</strong>ive modes, <strong>Extermin<strong>at</strong>or</strong>’s probabilistic<br />

error isol<strong>at</strong>ion algorithm oper<strong>at</strong>es by searching for discrepancies across multiple heap<br />

images. <strong>Extermin<strong>at</strong>or</strong> relies on corrupted canaries stored in freed objects to indic<strong>at</strong>e<br />

the presence of an error. A corrupted canary (one th<strong>at</strong> has been overwritten) can mean two<br />

things. If the same object (identified by object id) across all heap images has the same<br />

corruption, then the error is likely to be a dangling pointer. If canaries are corrupted in<br />

multiple freed objects, then the error is likely to be a buffer overflow. <strong>Extermin<strong>at</strong>or</strong> limits<br />

the number of false positives for both overflows and dangling pointer errors.<br />

4.1 Buffer overfow detection<br />

<strong>Extermin<strong>at</strong>or</strong> examines heap images looking for discrepancies across the heaps, both in<br />

overwritten canaries and in live objects. If an object is not equivalent across the heaps,<br />

extermin<strong>at</strong>or considers it to be a candid<strong>at</strong>e victim of an overflow.<br />

To identify victim objects, extermin<strong>at</strong>or compares the contents of equivalent objects,<br />

as identified by their object Id across all heaps. <strong>Extermin<strong>at</strong>or</strong> builds an overflow mask<br />

th<strong>at</strong> comprises the discrepancies found across all heaps. However, because the same<br />

logical object may legitim<strong>at</strong>ely differ across multiple heaps, <strong>Extermin<strong>at</strong>or</strong> must take<br />

care not to consider these occurrences as overflows.<br />

First, a freed object may differ across heaps because it was filled <strong>with</strong> canaries only in<br />

some of the heaps. <strong>Extermin<strong>at</strong>or</strong> uses the canary bitmap to identify this case.<br />

Second, an object can contain pointers to other objects, which are randomly loc<strong>at</strong>ed<br />

on their respective heaps. <strong>Extermin<strong>at</strong>or</strong> uses both deterministic and probabilistic<br />

techniques to distinguish integers from pointers. Briefly, if a value interpreted as a<br />

pointer points inside the heap area and points to the same logical object across all heaps,<br />

then <strong>Extermin<strong>at</strong>or</strong> considers it to be the same logical pointer, and thus not a discrepancy.<br />

<strong>Extermin<strong>at</strong>or</strong> also handles the case where pointers point into dynamic libraries, which<br />

newer versions of Linux place <strong>at</strong> pseudorandom base addresses.<br />

Finally, an object can contain values th<strong>at</strong> legitim<strong>at</strong>ely differ from process to process.<br />

Examples of these values include process ids, file handles, pseudorandom numbers, and<br />

pointers in d<strong>at</strong>a structures th<strong>at</strong> depend on addresses (e.g., some red-black tree<br />

implement<strong>at</strong>ions). When <strong>Extermin<strong>at</strong>or</strong> examines an object and encounters any word<br />

th<strong>at</strong> differs <strong>at</strong> the same position across all the heaps, it considers it to be legitim<strong>at</strong>ely<br />

different, and not an indic<strong>at</strong>ion of buffer overflow.<br />

Dept. Of Computer Science & Engg. ~ 11 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

For small overflows, the risk of missing an overflow by ignoring overwrites of<br />

the same objects across multiple heaps is low:<br />

Theorem 1: Let k be the number of heap images, S the length (in number of objects) of<br />

the overflow string, and h the number of objects on the heap. Then the probability of an<br />

overflow over-writing an object on all k heaps is<br />

Proof. Assume th<strong>at</strong> buffer overflows overwrite past the end of an object. Thus, for an<br />

overflow from object i to land on a given object j, it must both precede it and be large<br />

enough to span the distance from i to j. An object i precedes j in k heaps <strong>with</strong> probability<br />

(1/2) k . Objects i and j are separ<strong>at</strong>ed by S or fewer objects <strong>with</strong> probability <strong>at</strong> most<br />

(1/(H −S)) k . Combining these terms yields the joint probability.<br />

False neg<strong>at</strong>ive analysis: We now bound the worst-case false neg<strong>at</strong>ive r<strong>at</strong>e for buffer<br />

overflows; th<strong>at</strong> is, the odds of not finding a buffer overflow because it failed to overwrite<br />

any canaries.<br />

Theorem 2: Let M be the heap multiplier, so a heap is never more than 1/M full. The<br />

likelihood th<strong>at</strong> an overflow of length b bytes fails to be detected by comparison against a<br />

canary is <strong>at</strong> most:<br />

Proof. Each heap is <strong>at</strong> least (M −1)/M free. Since DieFast fills free space <strong>with</strong> canaries<br />

<strong>with</strong> P = 1/2, the fraction of each heap filled <strong>with</strong> canaries is <strong>at</strong> least (M −1)/2M. The<br />

likelihood of a random write not landing on a canary across all k heaps is thus <strong>at</strong> most<br />

(1−(M−1)/2M) k . The overflow string could also m<strong>at</strong>ch the canary value. Since the canary<br />

is randomly chosen, the odds of this are <strong>at</strong> most (1/256) b .<br />

Culprit Identific<strong>at</strong>ion<br />

At this point, <strong>Extermin<strong>at</strong>or</strong> has identified the possible victims of overflows. For each<br />

victim, it scans the heap images for a m<strong>at</strong>ching culprit, the source of the overflow into a<br />

victim. Because <strong>Extermin<strong>at</strong>or</strong> assumes th<strong>at</strong> overflows are deterministic when oper<strong>at</strong>ing in<br />

iter<strong>at</strong>ive or replic<strong>at</strong>ed modes, the culprit must be the same distance δ bytes away from the<br />

victim in every heap image. In addition, <strong>Extermin<strong>at</strong>or</strong> requires th<strong>at</strong> the overflowed values<br />

have some bytes in common across the images, and ranks them by their similarity.<br />

<strong>Extermin<strong>at</strong>or</strong> checks every other heap image for the candid<strong>at</strong>e culprit, and examines the<br />

object th<strong>at</strong> is the same δ bytes forwards. If th<strong>at</strong> object is free and should be filled <strong>with</strong><br />

canaries but they are not intact, then it adds this culprit-victim pair to the candid<strong>at</strong>e list.<br />

False positive analysis: Because buffer overflows can be discontiguous, every object in the<br />

heap th<strong>at</strong> precedes an overflow is a potential culprit. However, each additional heap<br />

dram<strong>at</strong>ically lowers this number:<br />

Dept. Of Computer Science & Engg. ~ 12 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

Theorem 3 : The expected number of objects (possible culprits) the same distance δ from<br />

any given victim object across k heaps is:<br />

Proof. Without loss of generality, assume th<strong>at</strong> the victim object occupies the last slot in<br />

every heap. An object can thus be in any of the remaining n = H−1 slots. The odds of it<br />

being in the same slot in k heaps is p = 1/(H −1) k−1 . This is a binomial distribution, so<br />

E(possible culprits) = np = 1/(H −1) k−2 .<br />

With only one heap image, all (H−1) objects are potential culprits, but one additional<br />

image reduces the expected number of culprits for any victim to just 1 (1/(H −1) 0 ),<br />

effectively elimin<strong>at</strong>ing the risk of false positives.<br />

Once <strong>Extermin<strong>at</strong>or</strong> identifies a culprit-victim pair, it records the overflow size for th<strong>at</strong><br />

culprit as the maximum of any observed δ to a victim. <strong>Extermin<strong>at</strong>or</strong> also assigns each<br />

culprit-victim pair a score th<strong>at</strong> corresponds to its confidence th<strong>at</strong> it is an actual overflow.<br />

This score is 1−(1/256) S , where S is the sum of the length of detected<br />

overflow strings across all pairs. Intuitively, small overflow strings (e.g., one byte)<br />

detected in only a few heap images are given low scores, and large overflow strings present<br />

in many heap images get high scores.<br />

After overflow processing completes, <strong>Extermin<strong>at</strong>or</strong> gener<strong>at</strong>es a runtime p<strong>at</strong>ch for the<br />

top-ranked overflow for each culprit.<br />

4.2 Dangling Pointer Isol<strong>at</strong>ion<br />

Isol<strong>at</strong>ing dangling pointer errors falls into two cases: a program may read and write to<br />

the dangled object, leaving it partially or completely overwritten, or it may only read<br />

through the dangling pointer. <strong>Extermin<strong>at</strong>or</strong> does not handle read-only dangling pointer<br />

errors in iter<strong>at</strong>ive or replic<strong>at</strong>ed mode because it would require too many replicas (e.g.,<br />

around 20; see Section 7.2). However, it handles overwritten dangling objects<br />

straightforwardly.<br />

When a freed object is overwritten <strong>with</strong> identical values across multiple heap images,<br />

<strong>Extermin<strong>at</strong>or</strong> classifies the error as a dangling pointer overwrite. As Theorem 1 shows, this<br />

situ<strong>at</strong>ion is highly unlikely to occur for a buffer overflow. <strong>Extermin<strong>at</strong>or</strong> then gener<strong>at</strong>es an<br />

appropri<strong>at</strong>e runtime p<strong>at</strong>ch, as Section 6.2 describes.<br />

Dept. Of Computer Science & Engg. ~ 13 ~ Cochin University of Science & Technology


5. CUMULATIVE ERROR ISOLATION<br />

<strong>Extermin<strong>at</strong>or</strong><br />

Unlike iter<strong>at</strong>ive and replic<strong>at</strong>ed mode, cumul<strong>at</strong>ive mode focuses on detecting,<br />

isol<strong>at</strong>ing, and correcting errors th<strong>at</strong> happen in the field. In this context, replic<strong>at</strong>ion,<br />

identical inputs, and deterministic execution are infeasible. Worse, program errors may<br />

manifest themselves in ways th<strong>at</strong> are inherently hard to detect. For example, a program<br />

th<strong>at</strong> reads a canary written into a free object may fail immedi<strong>at</strong>ely, or may execute<br />

incorrectly for some time.<br />

The approach to error detection in this mode is to consider exceptional program<br />

events, such as prem<strong>at</strong>ure termin<strong>at</strong>ion, raising unexpected signals, etc., to be evidence th<strong>at</strong><br />

memory was corrupted during execution. We counter the lack of error reproducibility in<br />

these cases <strong>with</strong> st<strong>at</strong>istical accumul<strong>at</strong>ion of evidence before assuming an error needs to be<br />

corrected. <strong>Extermin<strong>at</strong>or</strong> isol<strong>at</strong>es memory errors in cumul<strong>at</strong>ive mode by computing<br />

summary inform<strong>at</strong>ion accumul<strong>at</strong>ed over multiple executions, r<strong>at</strong>her than by oper<strong>at</strong>ing<br />

over multiple heap images.<br />

5.1 Buffer overflow detection<br />

<strong>Extermin<strong>at</strong>or</strong>’s buffer overflow isol<strong>at</strong>ion algorithm proceeds in three phases. First, it<br />

identifies heap corruption by looking for overwritten canary values. Second, for each<br />

alloc<strong>at</strong>ion site, it computes an estim<strong>at</strong>e of the probability th<strong>at</strong> an object from th<strong>at</strong> site could<br />

be the source of the corruption. Third, it combines these independent estim<strong>at</strong>es from<br />

multiple runs to identify sites th<strong>at</strong> consistently appear as candid<strong>at</strong>es for<br />

causing the corruption.<br />

<strong>Extermin<strong>at</strong>or</strong>’s randomized alloc<strong>at</strong>or allows us to compute the probability of<br />

certain properties in the heap. For example, the probability of an object occurring on<br />

a given miniheap can be estim<strong>at</strong>ed given the miniheap size and the number of miniheaps.<br />

If objects from some alloc<strong>at</strong>ion site are sources of overflows, then those objects will<br />

occur on miniheaps containing corruptions more often than expected. <strong>Extermin<strong>at</strong>or</strong> tracks<br />

how often objects from each alloc<strong>at</strong>ion site occur on corrupted miniheaps across<br />

multiple runs. Using this inform<strong>at</strong>ion, it uses a st<strong>at</strong>istical hypothesis test th<strong>at</strong> identifies<br />

sites th<strong>at</strong> occur <strong>with</strong> corruption too often to be random chance, and identifies them as<br />

overflow culprits.<br />

Once <strong>Extermin<strong>at</strong>or</strong> identifies an erroneous alloc<strong>at</strong>ion site, it produces a runtime<br />

p<strong>at</strong>ch th<strong>at</strong> corrects the error. To find the correct pad size, it searches backward from the<br />

corruption found during the current run until it finds an object alloc<strong>at</strong>ed from the site. It<br />

then uses the distance between th<strong>at</strong> object and the end of the corruption as the pad size.<br />

Dept. Of Computer Science & Engg. ~ 14 ~ Cochin University of Science & Technology


5.2 Dangling pointer isol<strong>at</strong>ion<br />

<strong>Extermin<strong>at</strong>or</strong><br />

As <strong>with</strong> buffer overflows, <strong>Extermin<strong>at</strong>or</strong>’s dangling pointer isol<strong>at</strong>or computes<br />

summary inform<strong>at</strong>ion over multiple runs. To force each run to have a different effect,<br />

<strong>Extermin<strong>at</strong>or</strong> fills freed objects <strong>with</strong> canaries <strong>with</strong> some probability p, turning every<br />

execution into a series of Bernoulli trials. In this scenario, if the program reads canary<br />

d<strong>at</strong>a through the dangling pointer, the program may crash. Thus writing the canary for th<strong>at</strong><br />

object increases the probability th<strong>at</strong> the program will l<strong>at</strong>er crash. Conversely, if an<br />

object is not freed prem<strong>at</strong>urely, then overwriting it <strong>with</strong> canaries has no influence on the<br />

failure or success of the program. <strong>Extermin<strong>at</strong>or</strong> then uses the same hypothesis testing<br />

framework as its buffer overflow algorithm to identify sources of dangling pointer<br />

errors.<br />

The choice of p reflects a trade-off between the precision of the buffer<br />

overflow algorithm and dangling pointer isol<strong>at</strong>ion. Since overflow isol<strong>at</strong>ion relies on<br />

detecting corrupt canaries, low values of p increase the number of runs (though not the<br />

number of failures) required to isol<strong>at</strong>e overflows. However, lower values of p increase<br />

the precision of dangling pointer isol<strong>at</strong>ion by reducing the risk th<strong>at</strong> certain alloc<strong>at</strong>ion<br />

sites (those th<strong>at</strong> alloc<strong>at</strong>e large numbers of objects) will always observe one canary value.<br />

We currently set p = 1/2, though some dangling pointer errors may require lower values of<br />

p to converge <strong>with</strong>in a reasonable number of runs.<br />

<strong>Extermin<strong>at</strong>or</strong> then estim<strong>at</strong>es the required lifetime extension by loc<strong>at</strong>ing the oldest<br />

canaried object from an identified alloc<strong>at</strong>ion site, and computing the number of<br />

alloc<strong>at</strong>ions between the time it was freed and the time th<strong>at</strong> the program failed. The<br />

correcting alloc<strong>at</strong>or then extends the lifetime of all objects corresponding to this<br />

alloc<strong>at</strong>ion/dealloc<strong>at</strong>ion site by twice this number.<br />

Dept. Of Computer Science & Engg. ~ 15 ~ Cochin University of Science & Technology


6. Error Correction<br />

<strong>Extermin<strong>at</strong>or</strong><br />

How <strong>Extermin<strong>at</strong>or</strong> uses the inform<strong>at</strong>ion from its error isol<strong>at</strong>ion algorithms to correct<br />

specific errors? <strong>Extermin<strong>at</strong>or</strong> first gener<strong>at</strong>es runtime p<strong>at</strong>ches for each error. It then relies<br />

on a correcting alloc<strong>at</strong>or th<strong>at</strong> uses this inform<strong>at</strong>ion, padding alloc<strong>at</strong>ions to prevent<br />

overflows, and deferring dealloc<strong>at</strong>ions to prevent dangling pointer errors.<br />

6.1 Buffer overflow correction<br />

For every culprit-victim pair th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> encounters, it gener<strong>at</strong>es a runtime<br />

p<strong>at</strong>ch consisting of the alloc<strong>at</strong>ion site hash and the padding needed to contain the overflow<br />

(δ + the size of the over-flow). If a runtime p<strong>at</strong>ch has already been gener<strong>at</strong>ed for a given<br />

alloc<strong>at</strong>ion site, <strong>Extermin<strong>at</strong>or</strong> uses the maximum padding value encountered so far.<br />

6.2 Dangling pointer correction<br />

The runtime p<strong>at</strong>ch for a dangling pointer consists of the combin<strong>at</strong>ion of its alloc<strong>at</strong>ion<br />

site info and a time by which to delay its dealloc<strong>at</strong>ion. <strong>Extermin<strong>at</strong>or</strong> computes this delay as<br />

follows. Let τ be the recorded dealloc<strong>at</strong>ion time of the dangled object, and T be the last<br />

alloc<strong>at</strong>ion time. <strong>Extermin<strong>at</strong>or</strong> has no way of knowing how long the object is supposed to<br />

live, so computing an exact delay time is impossible. Instead, it extends the object’s<br />

lifetime (delays its free) by twice the distance between its prem<strong>at</strong>ure free and the last<br />

alloc<strong>at</strong>ion time, plus one: 2×(T −τ)+1.<br />

This choice ensures th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> will compute a correct p<strong>at</strong>ch in a logarithmic<br />

number of executions. As we show in Section 7.2, multiple iter<strong>at</strong>ions to correct pointer<br />

errors are rare in practice, because the last alloc<strong>at</strong>ion time can be well past the time th<strong>at</strong> the<br />

object should have been freed.<br />

It is important to note th<strong>at</strong> this dealloc<strong>at</strong>ion deferral does not multiply its lifetime but<br />

r<strong>at</strong>her its drag. To illustr<strong>at</strong>e, an object might live for 1000 alloc<strong>at</strong>ions and then be freed just<br />

10 alloc<strong>at</strong>ions too soon. If the program immedi<strong>at</strong>ely crashes, <strong>Extermin<strong>at</strong>or</strong> will extend its<br />

lifetime by 21 alloc<strong>at</strong>ions, increasing its lifetime by less than 1% (1021/1010).<br />

6.3 The Correcting Memory Alloc<strong>at</strong>or<br />

The correcting memory alloc<strong>at</strong>or incorpor<strong>at</strong>es the runtime p<strong>at</strong>ches described above<br />

and applies them when appropri<strong>at</strong>e. Figure 6 presents pseudo-code for the alloc<strong>at</strong>ion and<br />

dealloc<strong>at</strong>ion functions.<br />

At start-up, or upon receiving a reload signal (Section 3.4), the correcting alloc<strong>at</strong>or<br />

loads the runtime p<strong>at</strong>ches from a specified file. It builds two hash tables: a pad table<br />

mapping alloc<strong>at</strong>ion sites to pad sizes, and a deferral table, mapping pairs of alloc<strong>at</strong>ion and<br />

dealloc<strong>at</strong>ion sites to a deferral value. Because it can reload the run-time p<strong>at</strong>ch file and<br />

Dept. Of Computer Science & Engg. ~ 16 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

rebuild these tables on-the-fly, <strong>Extermin<strong>at</strong>or</strong> can apply p<strong>at</strong>ches to running programs<br />

<strong>with</strong>out interrupting their execution. This aspect of <strong>Extermin<strong>at</strong>or</strong>’s oper<strong>at</strong>ion may be<br />

especially useful for systems th<strong>at</strong> must be kept running continuously.<br />

On every dealloc<strong>at</strong>ion, the correcting alloc<strong>at</strong>or checks to see if the object to be freed<br />

needs to be deferred. If it finds a deferral value for the object’s alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion<br />

site, it pushes onto the deferral priority queue the pointer and the time to actually free it<br />

(the current alloc<strong>at</strong>ion time plus the deferral value).<br />

The correcting alloc<strong>at</strong>or then checks the deferral queue on every alloc<strong>at</strong>ion to see if<br />

an object should now be freed. It then checks whether the current alloc<strong>at</strong>ion site has an<br />

associ<strong>at</strong>ed pad value. If so, it adds the pad value to the alloc<strong>at</strong>ion request, and forwards the<br />

alloc<strong>at</strong>ion request to the underlying alloc<strong>at</strong>or.<br />

Dept. Of Computer Science & Engg. ~ 17 ~ Cochin University of Science & Technology


6.4 Collabor<strong>at</strong>ive Correction<br />

<strong>Extermin<strong>at</strong>or</strong><br />

Each individual user of an applic<strong>at</strong>ion is likely to experience different errors. To allow<br />

an entire user community to autom<strong>at</strong>ically improve software reliability, <strong>Extermin<strong>at</strong>or</strong><br />

provides a simple utility th<strong>at</strong> supports collabor<strong>at</strong>ive correction. This utility takes as input a<br />

number of runtime p<strong>at</strong>ch files. It then combines these p<strong>at</strong>ches by computing the maximum<br />

buffer pad required for any alloc<strong>at</strong>ion site, and the maximal deferral amount for any given<br />

alloc<strong>at</strong>ion site. The result is a new runtime p<strong>at</strong>ch file th<strong>at</strong> covers all observed errors.<br />

Because the size of p<strong>at</strong>ch files is limited by the number of alloc<strong>at</strong>ion sites in a program, we<br />

expect these files to be compact and practical to transmit. For example, the size of the<br />

runtime p<strong>at</strong>ches th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> gener<strong>at</strong>es for injected errors in espresso was just 130K,<br />

and shrinks to 17K when compressed <strong>with</strong> gzip.<br />

Dept. Of Computer Science & Engg. ~ 18 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

7. Results<br />

Wh<strong>at</strong> is the runtime overhead of using <strong>Extermin<strong>at</strong>or</strong>?<br />

How effective is <strong>Extermin<strong>at</strong>or</strong> <strong>at</strong> finding and correcting memory errors, both for<br />

injected and real faults?<br />

Wh<strong>at</strong> is the overhead of <strong>Extermin<strong>at</strong>or</strong>’s runtime p<strong>at</strong>ches?<br />

7.1 <strong>Extermin<strong>at</strong>or</strong> Runtime Overhead<br />

Here evalu<strong>at</strong>e <strong>Extermin<strong>at</strong>or</strong>’s performance <strong>with</strong> the SPECint2000 suite running<br />

reference workloads, as well as a suite of alloc<strong>at</strong>ion-intensive benchmarks. We use the<br />

l<strong>at</strong>ter suite of benchmarks both because they are widely used in memory management<br />

studies and because their high alloc<strong>at</strong>ion-intensity stresses memory management<br />

performance. For all experiments, we fix <strong>Extermin<strong>at</strong>or</strong>’s heap multiplier (value of M) <strong>at</strong> 2.<br />

All results are the average of five runs on a quiescent, dual-processor Linux system<br />

<strong>with</strong> 3GB of RAM, <strong>with</strong> each 3.06 GHz Intel Xeon processor (hyperthreading active)<br />

equipped <strong>with</strong> 512K L2 caches. Our observed experimental variance is below 1%.<br />

We focus on the nonreplic<strong>at</strong>ed mode (iter<strong>at</strong>ive/cumul<strong>at</strong>ive), which we expect to be a<br />

key limiting factor for <strong>Extermin<strong>at</strong>or</strong>’s performance and the most common usage scenario.<br />

We compare the runtime of <strong>Extermin<strong>at</strong>or</strong> (DieFast plus the correcting alloc<strong>at</strong>or) to<br />

the GNU libc alloc<strong>at</strong>or. This alloc<strong>at</strong>or is based on the Lea alloc<strong>at</strong>or, which is among the<br />

fastest avail-able. Figure shows th<strong>at</strong>, versus this alloc<strong>at</strong>or, <strong>Extermin<strong>at</strong>or</strong> degrades<br />

performance by from 0% (186.crafty) to 132% (cfrac), <strong>with</strong> a geometric mean of<br />

25.1%. While <strong>Extermin<strong>at</strong>or</strong>’s overhead is substantial for the alloc<strong>at</strong>ion-intensive suite<br />

(geometric mean: 81.2%), where the cost of computing alloc<strong>at</strong>ion and dealloc<strong>at</strong>ion<br />

contexts domin<strong>at</strong>es, its overhead is significantly less pronounced across the SPEC<br />

benchmarks (geometric mean: 7.2%).<br />

Figure: Runtime overhead for <strong>Extermin<strong>at</strong>or</strong> across a suite of<br />

benchmarks, normalized to the performance of GNU libc (Linux) alloc<strong>at</strong>or.<br />

Dept. Of Computer Science & Engg. ~ 19 ~ Cochin University of Science & Technology


7.2 Memory Error Correction<br />

Injected Faults<br />

<strong>Extermin<strong>at</strong>or</strong><br />

To measure <strong>Extermin<strong>at</strong>or</strong>’s effectiveness <strong>at</strong> isol<strong>at</strong>ing and correcting bugs, we used the<br />

fault injector th<strong>at</strong> accompanies the DieHard distribution to inject buffer overflows and<br />

dangling pointer errors. For each d<strong>at</strong>a point, we run the injector using a random seed until<br />

it triggers an error or divergent output. We next use this seed to deterministically trigger a<br />

single error in <strong>Extermin<strong>at</strong>or</strong>, which we run in iter<strong>at</strong>ive mode. We then measure the number<br />

of iter<strong>at</strong>ions required to isol<strong>at</strong>e and gener<strong>at</strong>e an appropri<strong>at</strong>e runtime p<strong>at</strong>ch. The total<br />

number of images (iter<strong>at</strong>ions plus the first run) corresponds to the number of replicas th<strong>at</strong><br />

would be required when running <strong>Extermin<strong>at</strong>or</strong> in replic<strong>at</strong>ed mode.<br />

Buffer overflows: Figure presents the result of triggering 26 different buffer overflows for<br />

three different sizes (4, 8, and 16 bytes) in the espresso benchmark. The number of images<br />

required to isol<strong>at</strong>e and correct these errors is generally just 3, although occasionally there<br />

are outlier cases, where the number of images required reaches <strong>at</strong> most 5. Notice th<strong>at</strong> this<br />

result is substantially better than the analytical worst-case. For three images, Theorem 2<br />

bounds the worst-case likelihood of missing an overflow to 42% (Section 4.1),r<strong>at</strong>her than<br />

the 4% false neg<strong>at</strong>ive r<strong>at</strong>e we observe here.<br />

Figure: The number of images required for <strong>Extermin<strong>at</strong>or</strong> to<br />

isol<strong>at</strong>e and correct buffer overflows of different sizes<br />

Dangling pointer errors: We then triggered 10 dangling pointer faults in espresso <strong>with</strong><br />

<strong>Extermin<strong>at</strong>or</strong> running in iter<strong>at</strong>ive and in cumul<strong>at</strong>ive modes. In iter<strong>at</strong>ive mode, <strong>Extermin<strong>at</strong>or</strong><br />

succeeds in isol<strong>at</strong>ing the error in only 4 runs. In another 4 runs, espresso does not write<br />

through the dangling pointer. Instead, it reads a canary value through the dangled pointer,<br />

tre<strong>at</strong>s it as valid d<strong>at</strong>a, and either crashes or aborts. Since no corruption is present in the<br />

heap, <strong>Extermin<strong>at</strong>or</strong> cannot isol<strong>at</strong>e the source of the error. In the remaining 2 runs, writing<br />

Dept. Of Computer Science & Engg. ~ 20 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

canaries into the dangled object triggers a cascade of errors th<strong>at</strong> corrupt large segments of<br />

the heap. In these cases, the corruption destroys the inform<strong>at</strong>ion <strong>Extermin<strong>at</strong>or</strong> requires to<br />

isol<strong>at</strong>e the error.<br />

Figure: The number of images required for <strong>Extermin<strong>at</strong>or</strong> to<br />

isol<strong>at</strong>e and correct dangling pointer errors<br />

In cumul<strong>at</strong>ive mode, however, <strong>Extermin<strong>at</strong>or</strong> successfully isol<strong>at</strong>es all 10 injected<br />

errors. For runs where no large-scale heap corruption occurs, <strong>Extermin<strong>at</strong>or</strong> requires<br />

between 22 and 30 executions to isol<strong>at</strong>e and correct the errors. In each case, 15 failures<br />

must be observed before the erroneous site pair crosses the likelihood threshold. Because<br />

objects are overwritten randomly, the number of runs required to yield 15 failures varies.<br />

Where writing canaries corrupts a large fraction of the heap, <strong>Extermin<strong>at</strong>or</strong> requires 18<br />

failures and 34 total runs. In some of the runs, execution continues long enough for the<br />

alloc<strong>at</strong>or to reuse the culprit object, preventing <strong>Extermin<strong>at</strong>or</strong> from observing th<strong>at</strong> it was<br />

overwritten.<br />

Real Faults<br />

We also tested <strong>Extermin<strong>at</strong>or</strong> <strong>with</strong> actual bugs in two applic<strong>at</strong>ions:<br />

the Squid web caching server, and the Mozilla web browser.<br />

Squid web cache: Version 2.3s5 of Squid has a buffer overflow; certain inputs cause<br />

Squid to crash <strong>with</strong> either the GNU libc alloc<strong>at</strong>or or the Boehm-Demers-Weiser collector.<br />

We run Squid three times under <strong>Extermin<strong>at</strong>or</strong> in iter<strong>at</strong>ive mode <strong>with</strong> an input th<strong>at</strong><br />

triggers a buffer overflow. <strong>Extermin<strong>at</strong>or</strong> continues executing correctly in each run, but the<br />

overflow corrupts a canary. <strong>Extermin<strong>at</strong>or</strong>’s error isol<strong>at</strong>ion algorithm identifies a single alloc<strong>at</strong>ion<br />

site as the culprit and gener<strong>at</strong>es a pad of exactly 6 bytes,<br />

fixing the error.<br />

Dept. Of Computer Science & Engg. ~ 21 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

Mozilla web browser: We also tested <strong>Extermin<strong>at</strong>or</strong>’s cumul<strong>at</strong>ive mode on a known heap<br />

overflow inMozilla 1.7.3 / Firefox 1.0.6 and earlier. This overflow (bug 307259) occurs<br />

because of an error in Mozilla’s processing of Unicode characters in domain names. Not<br />

only is Mozilla multi-threaded, leading to non-deterministic alloc<strong>at</strong>ion behavior, but even<br />

slight differences in moving the mouse cause alloc<strong>at</strong>ion sequences to diverge. Thus,<br />

neither replic<strong>at</strong>ed nor iter<strong>at</strong>ive modes can identify equivalent objects across multiple runs.<br />

We perform two case studies th<strong>at</strong> represent plausible scenarios for using<br />

<strong>Extermin<strong>at</strong>or</strong>’s cumul<strong>at</strong>ive mode. In the first study, the user starts Mozilla and immedi<strong>at</strong>ely<br />

loads a page th<strong>at</strong> triggers the error. This scenario corresponds to a testing environment<br />

where a proof-of-concept input is available. In the second study, the user first navig<strong>at</strong>es<br />

through a selection of pages (different on each run), and then visits the error-triggering<br />

page. This scenario approxim<strong>at</strong>es deployed use where the error is triggered in the wild.<br />

In both cases, <strong>Extermin<strong>at</strong>or</strong> correctly identifies the overflow <strong>with</strong> no false positives.<br />

In the first case, <strong>Extermin<strong>at</strong>or</strong> requires 23 runs to isol<strong>at</strong>e the error. In the second, it requires<br />

34 runs. We believe th<strong>at</strong> this scenario requires more runs because the site th<strong>at</strong> produces the<br />

overflowed object alloc<strong>at</strong>es more correct objects, making it harder to identify it as<br />

erroneous.<br />

7.3 P<strong>at</strong>ch Overhead<br />

<strong>Extermin<strong>at</strong>or</strong>’s approach to correcting memory errors does not impose additional<br />

execution time overhead in the presence of p<strong>at</strong>ches. However, it consumes additional<br />

space, either by padding alloc<strong>at</strong>ions or by deferring dealloc<strong>at</strong>ions. We measure the space<br />

overhead for buffer overflow corrections by multiplying the size of the pad by the<br />

maximum number of live objects th<strong>at</strong> <strong>Extermin<strong>at</strong>or</strong> p<strong>at</strong>ches. The most space overhead we<br />

observe is for the buffer overflow experiment <strong>with</strong> overflows of size 36, where the total<br />

increased space overhead is between 320 and 2816 bytes.<br />

We measure space overhead for dangling pointer corrections by multiplying the object<br />

size by the number of alloc<strong>at</strong>ions for which the object is deferred; th<strong>at</strong> is, we compute the<br />

total additional drag. In the dangling pointer experiment, the amount of excess memory<br />

ranges from 32 bytes to 1024 bytes (one 256 byte object is deferred for 4 dealloc<strong>at</strong>ions).<br />

This amount constitutes less than 1% of the maximum memory consumed by the<br />

applic<strong>at</strong>ion.<br />

Dept. Of Computer Science & Engg. ~ 22 ~ Cochin University of Science & Technology


8. Rel<strong>at</strong>ed Work<br />

8.1 Randomized Memory Managers<br />

<strong>Extermin<strong>at</strong>or</strong><br />

Several memory management systems employ some degree of randomiz<strong>at</strong>ion,<br />

including loc<strong>at</strong>ing the heap <strong>at</strong> a random base address, adding random padding to alloc<strong>at</strong>ed<br />

objects , shuffling recently-freed objects , or a mix of padding and object deferral. This<br />

level of randomiz<strong>at</strong>ion is insufficient for <strong>Extermin<strong>at</strong>or</strong>, which requires full heap<br />

randomiz<strong>at</strong>ion.<br />

<strong>Extermin<strong>at</strong>or</strong> builds on DieHard, which toler<strong>at</strong>es errors probabilistically; Section 3.1<br />

provides an overview. <strong>Extermin<strong>at</strong>or</strong> substantially modifies and extends DieHard’s heap<br />

layout and alloc<strong>at</strong>ion algorithms. It also uses probabilistic algorithms th<strong>at</strong> identify and<br />

correct errors.<br />

8.2 Autom<strong>at</strong>ic Repair<br />

We are aware of only one other system th<strong>at</strong> repairs errors autom<strong>at</strong>ically: Demsky et<br />

al.’s autom<strong>at</strong>ic d<strong>at</strong>a structure repair enforces d<strong>at</strong>a structure consistency specific<strong>at</strong>ions,<br />

guided by a formal description of the program’s d<strong>at</strong>a structures. <strong>Extermin<strong>at</strong>or</strong> <strong>at</strong>tacks a<br />

different problem, namely th<strong>at</strong> of isol<strong>at</strong>ing and correcting memory errors, and is<br />

orthogonal and complementary to d<strong>at</strong>a structure repair.<br />

Sidiroglou et al. propose STEM, a self-healing runtime th<strong>at</strong> executes functions in a<br />

transactional environment so th<strong>at</strong> if they detect the function misbehaving, they can prevent<br />

it from doing damage. Using STEM, they implement error virtualiz<strong>at</strong>ion, which maps the<br />

set of possible errors in a function onto those th<strong>at</strong> have an explicit error handler. The more<br />

recent SEAD system goes beyond STEM requiring no source code changes, handling I/O<br />

<strong>with</strong> virtual proxies, and by specifying the repair policy explicitly through an external<br />

description. While STEM and SEAD are promising approaches to autom<strong>at</strong>ically<br />

recovering from errors, neither provides solutions for as broad a class of errors as<br />

<strong>Extermin<strong>at</strong>or</strong>, nor do they provide mechanisms to semantically elimin<strong>at</strong>e the source of the<br />

error autom<strong>at</strong>ically, as <strong>Extermin<strong>at</strong>or</strong> does.<br />

8.3 Autom<strong>at</strong>ic Debugging<br />

Two previous systems apply techniques designed to help isol<strong>at</strong>e bugs.<br />

St<strong>at</strong>istical bug isol<strong>at</strong>ion is a distributed assertion sampling technique th<strong>at</strong> helps pinpoint<br />

the loc<strong>at</strong>ion of errors, including but not limited to memory errors. It works by injecting<br />

lightweight tests into the source code; the result of these tests, in bit vector form, can be<br />

processed to gener<strong>at</strong>e likely sources of the errors. This st<strong>at</strong>istical processing differs from<br />

<strong>Extermin<strong>at</strong>or</strong>’s probabilistic error isol<strong>at</strong>ion algorithms, although Liu et al. also use<br />

hypothesis testing. Like st<strong>at</strong>istical bug isol<strong>at</strong>ion, <strong>Extermin<strong>at</strong>or</strong> can leverage the runs of<br />

deployed programs. However, unlike st<strong>at</strong>istical bug isol<strong>at</strong>ion, <strong>Extermin<strong>at</strong>or</strong> requires<br />

Dept. Of Computer Science & Engg. ~ 23 ~ Cochin University of Science & Technology


<strong>Extermin<strong>at</strong>or</strong><br />

neither source code nor a large deployed user base in order to find errors, and<br />

autom<strong>at</strong>ically gener<strong>at</strong>es runtime p<strong>at</strong>ches th<strong>at</strong> correct them.<br />

Delta debugging autom<strong>at</strong>es the process of identifying the smallest possible inputs th<strong>at</strong> do<br />

and do not exhibit a given error. Given these inputs, it is up to the software developer to<br />

actually loc<strong>at</strong>e the bugs themselves. <strong>Extermin<strong>at</strong>or</strong> focuses on a narrower class of errors, but<br />

is able to isol<strong>at</strong>e and correct an error given just one erroneous input, regardless of its size.<br />

8.4 Fault Tolerance<br />

Recently, there has been an increasing focus on approaches for toler<strong>at</strong>ing hardware<br />

transient errors th<strong>at</strong> are becoming more common due to fabric<strong>at</strong>ion process limit<strong>at</strong>ions.<br />

Work in this area ranges from proposed hardware support to software fault tolerance.While<br />

<strong>Extermin<strong>at</strong>or</strong> also uses redundancy as a method for detecting and correcting errors,<br />

<strong>Extermin<strong>at</strong>or</strong> goes beyond toler<strong>at</strong>ing software errors, which are not transient, to correcting<br />

them permanently. Like <strong>Extermin<strong>at</strong>or</strong>, other efforts in the fault tolerance community seek<br />

to g<strong>at</strong>her d<strong>at</strong>a from multiple program executions to identify potential errors. For example,<br />

Guo et al. use st<strong>at</strong>istical techniques on internal monitoring d<strong>at</strong>a to probabilistically detect<br />

faults, including memory leaks and deadlocks. <strong>Extermin<strong>at</strong>or</strong> goes beyond this previous<br />

work by characterizing each memory error so specifically th<strong>at</strong> a correction can be<br />

autom<strong>at</strong>ically gener<strong>at</strong>ed for it.<br />

Rinard et al. present a compiler-based approach called boundless buffers th<strong>at</strong> caches<br />

out-of-bound writes in a hash table for l<strong>at</strong>er reuse. This approach elimin<strong>at</strong>es buffer<br />

overflow errors (though not dangling pointer errors), but requires source code and imposes<br />

higher performance overheads (1.05x to 8.9x).<br />

Rx oper<strong>at</strong>es by checkpointing program execution and logging inputs. Rx rolls back<br />

crashed applic<strong>at</strong>ions and replays inputs to it in a new environment th<strong>at</strong> pads all alloc<strong>at</strong>ions<br />

or defers all dealloc<strong>at</strong>ions by some amount. If this new environment does not yield<br />

success, Rx rolls back the applic<strong>at</strong>ion again and increases the pad values, up to some<br />

threshold. Unlike Rx, <strong>Extermin<strong>at</strong>or</strong> does not require checkpointing or rollback, and<br />

precisely isol<strong>at</strong>es and corrects memory errors.<br />

8.5 Memory Managers<br />

Conserv<strong>at</strong>ive garbage collection prevents dangling pointer errors, but does not prevent<br />

buffer overflows. <strong>Extermin<strong>at</strong>or</strong>’s error isol<strong>at</strong>ion and correction is orthogonal to garbage<br />

collection.<br />

Finally, there have been numerous debugging memory alloc<strong>at</strong>ors; the document<strong>at</strong>ion<br />

for one of them, mp<strong>at</strong>rol, includes a list of over ninety such systems. Notable recent<br />

alloc<strong>at</strong>ors <strong>with</strong> debugging fe<strong>at</strong>ures include dnmalloc, Heap Server, and version 2.8 of the<br />

Lea alloc<strong>at</strong>or. <strong>Extermin<strong>at</strong>or</strong> either prevents or corrects errors th<strong>at</strong> these alloc<strong>at</strong>ors can only<br />

detect.<br />

Dept. Of Computer Science & Engg. ~ 24 ~ Cochin University of Science & Technology


9. Future Work<br />

<strong>Extermin<strong>at</strong>or</strong><br />

While <strong>Extermin<strong>at</strong>or</strong> can effectively loc<strong>at</strong>e and correct memory errors on the heap, it<br />

does not yet address stack errors. We are investig<strong>at</strong>ing approaches to apply <strong>Extermin<strong>at</strong>or</strong> to<br />

the stack.<br />

In addition, while <strong>Extermin<strong>at</strong>or</strong>’s runtime p<strong>at</strong>ches contain inform<strong>at</strong>ion th<strong>at</strong> describe<br />

the error loc<strong>at</strong>ion and its extent, it is not in a human-readable form. We plan to develop a<br />

tool to process runtime p<strong>at</strong>ches into bug reports <strong>with</strong> suggested fixes.<br />

<strong>Extermin<strong>at</strong>or</strong> currently requires programs to be deterministic in their alloc<strong>at</strong>ion<br />

p<strong>at</strong>terns and their storage to heap objects. Individual executions of a program are expected<br />

to yield the same sequence of object alloc<strong>at</strong>ion identifiers (i.e., object i always refers to the<br />

same semantic object).<br />

These requirements make <strong>Extermin<strong>at</strong>or</strong> well-suited for serial programs, but less<br />

useful for multi-threaded programs. While <strong>Extermin<strong>at</strong>or</strong> can run programs <strong>with</strong> a small<br />

amount of non determinism, its isol<strong>at</strong>ion algorithm is not robust <strong>with</strong> respect to disruptions<br />

to the alloc<strong>at</strong>ion sequence.<br />

While simul<strong>at</strong>ion or deterministic replay mechanisms would enable <strong>Extermin<strong>at</strong>or</strong> to<br />

work reliably for multi-threaded programs, we plan to explore a variety of ways to directly<br />

handle programs <strong>with</strong> extensive non-determinism. One way to improve its behavior in the<br />

face of non-determinism would be for <strong>Extermin<strong>at</strong>or</strong> to associ<strong>at</strong>e object identifiers <strong>with</strong><br />

thread ids. This change would prevent multiple threads from disrupting a global alloc<strong>at</strong>ion<br />

sequence. We are especially interested in combining st<strong>at</strong>istical alignment techniques from<br />

machine learning <strong>with</strong> inform<strong>at</strong>ion gleaned from locks and other synchroniz<strong>at</strong>ion variables<br />

to let us recover alloc<strong>at</strong>ion sequences across multiple threads.<br />

Dept. Of Computer Science & Engg. ~ 25 ~ Cochin University of Science & Technology


10. Conclusion<br />

<strong>Extermin<strong>at</strong>or</strong><br />

This paper presents <strong>Extermin<strong>at</strong>or</strong>, a system th<strong>at</strong> autom<strong>at</strong>ically corrects heap-based<br />

memory errors in C and C++ programs <strong>with</strong> high probability. <strong>Extermin<strong>at</strong>or</strong> oper<strong>at</strong>es<br />

entirely <strong>at</strong> the runtime level on unaltered binaries, and consists of three key components:<br />

DieFast, a probabilistic debugging alloc<strong>at</strong>or<br />

a probabilistic error isol<strong>at</strong>ion algorithm, and<br />

a correcting memory alloc<strong>at</strong>or.<br />

Ex-termin<strong>at</strong>or’s probabilistic error isol<strong>at</strong>ion isol<strong>at</strong>es the source and ex-tent of memory<br />

errors <strong>with</strong> provably low false positive and false neg<strong>at</strong>ive r<strong>at</strong>es. Its correcting memory<br />

alloc<strong>at</strong>or incorpor<strong>at</strong>es runtime p<strong>at</strong>ches th<strong>at</strong> the error isol<strong>at</strong>ion algorithm gener<strong>at</strong>es to<br />

correct memory errors. <strong>Extermin<strong>at</strong>or</strong> is not only suitable for use during testing, but also<br />

can autom<strong>at</strong>ically correct deployed programs.<br />

Dept. Of Computer Science & Engg. ~ 26 ~ Cochin University of Science & Technology


REFERENCES<br />

<strong>Extermin<strong>at</strong>or</strong><br />

Source : Communic<strong>at</strong>ions of the ACM, v.51 n.12, <strong>Extermin<strong>at</strong>or</strong>: Autom<strong>at</strong>ically<br />

correcting memory errors <strong>with</strong> high probability, Pages : 87-95, December 2008<br />

SECTION:Research highlights<br />

Authors : Gene Novark , University of Massachusetts, Amherst, MA<br />

Emery D. Berger , University of Massachusetts, Amherst, MA<br />

Benjamin G. Zorn , Microsoft Research, Redmond, WA<br />

o http://www.cs.umass.edu/~gnovark/<br />

o www.cs.umass.edu/~gnovark/pldi028-novark.<strong>pdf</strong><br />

o www.cs.umass.edu/~emery/pubs/UMCS-2006-058.<strong>pdf</strong><br />

o www.slideshare.net/.../autom<strong>at</strong>ically-toler<strong>at</strong>ing-and-correcting-memoryerrors<br />

-<br />

Dept. Of Computer Science & Engg. ~ 27 ~ Cochin University of Science & Technology

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!