mayhem-oakland-12

(unlike standard concolic execution), and the re-execution happens concretely. Figure 3 shows the intuition behind hybrid execution. We provide a detailed comparison between online, offline, and hybrid execution in §VIII-C.

C. Design and Implementation of the CEC

The CEC takes in the binary program, a list of input sources to be considered symbolic, and an optional checkpoint input that contains execution state information from a previous run. The CEC concretely executes the program, hooks input sources and performs taint analysis on input variables. Every basic block that contains tainted instructions is sent to the SES for symbolic execution. As a response, the CEC receives the address of the next basic block to be executed and whether to save the current state as a restoration point. Whenever an execution path is complete, the CEC context-switches to an unexplored path selected by the SES and continues execution. The CEC terminates only if all possible execution paths have been explored or a threshold is reached. If we provide a checkpoint, the CEC first executes the program concretely until the checkpoint and then continues execution as before.

Virtualization Layer. During an online execution run, the CEC handles multiple concrete execution states of the analyzed program simultaneously. Each concrete execution state includes the current register context, memory and OS state (the OS state contains a snapshot of the virtual filesystem, network and kernel state). Under the guidance of the SES and the path selector, the CEC context switches between different concrete execution states depending on the symbolic executor that is currently active. The virtualization layer mediates all system calls to the host OS and emulates them. Keeping separate copies of the OS state ensures there are no side-effects across different executions.
For instance, if one executor writes a value to a file, this modification will only be visible to the current execution state; all other executors will have a separate instance of the same file.

Efficient State Snapshot. Taking a full snapshot of the concrete execution state at every fork is very expensive. To mitigate the problem, the CEC shares state across execution states, similar to other systems [9], [28]. Whenever execution forks, the new execution state reuses the state of the parent execution. Subsequent modifications to the state are recorded in the current execution.

D. Design and Implementation of the SES

The SES manages the symbolic execution environment and decides which paths are executed by the CEC. The environment consists of a symbolic executor for each path, a path selector which determines which feasible path to run next, and a checkpoint manager.

The SES caps the number of symbolic executors to keep in memory. When the cap is reached, MAYHEM stops generating new interpreters and produces checkpoints: execution states that will explore program paths that MAYHEM was unable to explore in the first run due to the memory cap. Each checkpoint is prioritized and used by MAYHEM to continue exploration of these paths at a subsequent run. Thus, when all pending execution paths terminate, MAYHEM selects a new checkpoint and continues execution, until all checkpoints are consumed and MAYHEM exits.

Each symbolic executor maintains two contexts (as state): a variable context containing all symbolic register values and temporaries, and a memory context keeping track of all symbolic data in memory.
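As an illustration only, the two contexts and the sharing of state between a parent and a forked execution can be sketched in Python with layered maps. The class name `SymbolicExecutor`, the method `fork`, the register and address values, and the use of plain strings as symbolic expressions are all assumptions made for this sketch; none of them is taken from MAYHEM's implementation, which may use a different data structure entirely.

```python
from collections import ChainMap


class SymbolicExecutor:
    """Hypothetical per-path symbolic state made of two contexts."""

    def __init__(self, variables=None, memory=None):
        # Variable context: register or temporary name -> symbolic expression.
        self.variables = variables if variables is not None else ChainMap()
        # Memory context: address -> symbolic expression.
        self.memory = memory if memory is not None else ChainMap()

    def fork(self):
        """Return a new executor that shares everything written so far."""
        shared_vars = self.variables.maps
        shared_mem = self.memory.maps
        # Both sides get a fresh private top layer. ChainMap writes only go
        # to the top layer, so the shared layers are never modified again.
        self.variables = ChainMap({}, *shared_vars)
        self.memory = ChainMap({}, *shared_mem)
        return SymbolicExecutor(ChainMap({}, *shared_vars),
                                ChainMap({}, *shared_mem))


# Usage: state written before the fork is shared, later writes are private.
parent = SymbolicExecutor()
parent.variables["eax"] = "input[0] + 1"
parent.memory[0x8000] = "input[1]"

child = parent.fork()
child.variables["eax"] = "input[0] * 2"   # recorded only in the child
parent.memory[0x8004] = "input[2]"        # recorded only in the parent
```

In this sketch a fork copies no entries; it only adds an empty layer per context. The trade-off is that lookups walk one layer per fork on the path, so a production system would more likely use a persistent (immutable) map.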
Whenever execution forks, the SES clones the current symbolic state (to keep memory low, we keep the execution state immutable to take advantage of copy-on-write optimizations, similar to previous work [9], [28]) and adds a new symbolic executor to a priority queue. This priority queue is regularly updated by our path selector to include the latest changes (e.g., which paths were explored, instructions covered, and so on).

Preconditioned Symbolic Execution: MAYHEM implements preconditioned symbolic execution as in AEG [2]. In preconditioned symbolic execution, a user can optionally give a partial specification of the input, such as a prefix or length of the input, to reduce the range of search space. If a user does not provide a precondition, then the SES tries to explore all feasible paths. This corresponds to the user providing the minimum amount of information to the system.

Path Selection: MAYHEM applies path prioritization heuristics, as found in systems such as SAGE [13] and KLEE [9], to decide which path should be explored next. Currently, MAYHEM uses three heuristic ranking rules: a) executors exploring new code (e.g., instead of executing known code more times) have high priority, b) executors that identify symbolic memory accesses have higher priority, and c) execution paths where symbolic instruction pointers are detected have the highest priority. The heuristics are designed to prioritize paths that are most likely to contain a bug. For instance, the first heuristic relies on the assumption that previously explored code is less likely to contain a bug than new code.

E. Performance Tuning

MAYHEM employs several optimizations to speed up symbolic execution. We present three optimizations that were most effective: 1) independent formula, 2) algebraic simplifications, and 3) taint analysis.

Similar to KLEE [9], MAYHEM splits the path predicate into independent formulas to optimize solver queries.
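The idea of splitting a path predicate into independent formulas can be sketched as grouping constraints that share input variables, directly or transitively. The sketch below uses a small union-find over variable names. The function name `split_independent` and the representation of a constraint as a `(formula, variables)` pair are assumptions chosen for illustration, not MAYHEM's or KLEE's actual data structures.

```python
from collections import defaultdict


def split_independent(constraints):
    """Group constraints that (transitively) share input variables.

    `constraints` is a list of (formula, variables) pairs, where `formula`
    is any opaque object and `variables` is a set of input-variable names.
    Returns a list of groups, each a list of formulas.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Variables that appear in the same constraint belong to the same set.
    for _, variables in constraints:
        names = sorted(variables)
        for name in names:
            find(name)
        for other in names[1:]:
            union(names[0], other)

    # Constraints whose variables share a root form one independent formula.
    groups = defaultdict(list)
    for formula, variables in constraints:
        key = find(min(variables)) if variables else None
        groups[key].append(formula)
    return list(groups.values())


# Usage: x and z are linked through "x + z == 3"; y is independent.
path_predicate = [
    ("x > 0", {"x"}),
    ("y < 5", {"y"}),
    ("x + z == 3", {"x", "z"}),
    ("z != 1", {"z"}),
]
independent = split_independent(path_predicate)
```

With such a split, a solver query about a condition over `y` only needs the group containing `y < 5`, rather than the whole path predicate.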
A small implementation difference compared to KLEE is that MAYHEM keeps a map from input variables to formulas at all times. It is not constructed only for querying the solver (this representation allows more optimizations, §V). MAYHEM also applies other standard optimizations as proposed by previous systems such as the constraint subsumption optimization [13], a counter-example cache [9] and others. MAYHEM also
