
Certified Self-Modifying Code

Hongxu Cai
Department of Computer Science and Technology, Tsinghua University
Beijing, 100084, China
hxcai00@mails.tsinghua.edu.cn

Zhong Shao
Department of Computer Science, Yale University
New Haven, CT 06520, USA
shao@cs.yale.edu

Alexander Vaynberg
Department of Computer Science, Yale University
New Haven, CT 06520, USA
alv@cs.yale.edu

Abstract

Self-modifying code (SMC), in this paper, broadly refers to any program that loads, generates, or mutates code at runtime. It is widely used in many of the world's critical software systems to support runtime code generation and optimization, dynamic loading and linking, OS boot loaders, just-in-time compilation, binary translation, and dynamic code encryption and obfuscation. Unfortunately, SMC is also extremely difficult to reason about: existing formal verification techniques—including Hoare logic and type systems—consistently assume that program code stored in memory is fixed and immutable; this severely limits their applicability and power.

This paper presents a simple but novel Hoare-logic-like framework that supports modular verification of general von Neumann machine code with runtime code manipulation. By dropping the assumption that code memory is fixed and immutable, we are forced to apply local reasoning and separation logic at the very beginning, and to treat program code uniformly as a regular data structure. We address the interaction between separation and code memory and show how to establish the frame rules for local reasoning even in the presence of SMC. Our framework is realistic, but designed to be highly generic, so that it can support assembly code under all modern CPUs (including both x86 and MIPS). Our system is expressive and fully mechanized. We prove its soundness in the Coq proof assistant and demonstrate its power by certifying a series of realistic examples and applications—all of which can directly run on the SPIM simulator or on stock x86 hardware.

1. Introduction

Self-modifying code (SMC), in this paper, broadly refers to any program that purposely loads, generates, or mutates code at runtime. It is widely used in many of the world's critical software systems. For example, runtime code generation and compilation can improve the performance of operating systems [21] and other application programs [20, 13, 31]. Dynamic code optimization can improve performance [4, 11] or minimize code size [7]. Dynamic code encryption [29] or obfuscation [15] can support code protection and tamper-resistant software [3]; they also make it hard for crackers to debug or decompile the protected binaries. Evolutionary computing systems can use dynamic techniques to support genetic programming [26].
SMC also arises in applications such as just-in-time compilers, dynamic loading and linking, OS boot loaders, binary translation, and virtual machine monitors.

Unfortunately, SMC is also extremely difficult to reason about: existing formal verification techniques—including Hoare logic [8, 12] and type systems [27, 23]—consistently assume that program code stored in memory is immutable; this significantly limits their power and applicability.

                           Example                      System    Where
  Common Basic Constructs  opcode modification          GCAP2     Sec 5.2
                           control flow modification    GCAP2     Sec 6.2
                           unbounded code rewriting     GCAP2     Sec 6.5
                           runtime code checking        GCAP1     Sec 6.4
                           runtime code generation      GCAP1     Sec 6.3
                           multilevel RCG               GCAP1     Sec 4.1
                           self-mutating code block     GCAP2     Sec 6.9
                           mutual modification          GCAP2     Sec 6.7
                           self-growing code            GCAP2     Sec 6.6
                           polymorphic code             GCAP2     Sec 6.8
                           code optimization            GCAP2/1   Sec 6.3
  Typical Applications     code compression             GCAP1     Sec 6.10
                           code obfuscation             GCAP2     Sec 6.9
                           code encryption              GCAP1     Sec 6.10
                           OS boot loaders              GCAP1     Sec 6.1
                           shellcode                    GCAP1     Sec 6.11

  Table 1. A summary of GCAP-supported applications

In this paper we present a simple but powerful Hoare-logic-style framework—namely GCAP (i.e., CAP [33] on a General von Neumann machine)—that supports modular verification of general machine code with runtime code manipulation. By dropping the assumption that code memory is fixed and immutable, we are forced to apply local reasoning and separation logic [14, 30] at the very beginning, and to treat program code uniformly as a regular data structure. Our framework is realistic, but designed to be highly generic, so that it can support assembly code under all modern CPUs (including both x86 and MIPS). Our paper makes the following new contributions:

• Our GCAP system is the first formal framework that can successfully certify any form of runtime code manipulation—including all the common basic constructs and important applications given in Table 1. We are the first to successfully certify a realistic OS boot loader that can directly boot on stock x86 hardware. All of our MIPS examples can be directly executed in the SPIM 7.3 simulator [18].

• GCAP is the first successful extension of Hoare-style program logic that treats machine instructions as regular mutable data structures. A general GCAP assertion can not only specify the usual precondition for data memory but also ensure that code segments are correctly loaded into memory before execution. We develop the idea of parametric code blocks to specify and reason about all possible outputs of each self-modifying program. These results are general and can easily be applied to other Hoare-style verification systems.


• GCAP supports both modular verification [9] and frame rules for local reasoning [30]. Program modules can be verified separately and with minimized import interfaces. GCAP pinpoints the precise boundary between non-self-modifying code groups and those that do manipulate code at runtime. Non-self-modifying code groups can be certified without any knowledge about each other's implementation, yet they can still be safely linked together with other self-modifying code groups.

• GCAP is highly generic in the sense that it is the first attempt to support different machine architectures and instruction sets in the same framework without modifying any of its inference rules. This is done by making use of several auxiliary functions that abstract away the machine-specific semantics and by constructing generic (platform-independent) inference rules for certifying well-formed code sequences.

In the rest of this paper, we first present our von Neumann machine GTM in Section 2. We stage the presentation of GCAP: Section 3 presents a Hoare-style program logic for GTM; Section 4 presents a simple extension, GCAP1, for certifying runtime code loading and generation; Section 5 presents GCAP2, which extends GCAP1 with general support for SMC. In Section 6, we present a large set of certified SMC applications to demonstrate the power and practicality of our framework. Our system is fully mechanized—the Coq implementation (including the full soundness proof) is available on the web [5].

2. General Target Machine GTM

In this section, we present the abstract machine model and its operational semantics, both of which are formalized inside a mechanized meta logic (Coq [32]). After that, we use an example to demonstrate the key features of a typical self-modifying program.

2.1 SMC on Stock Hardware

Before proceeding further, a few practical issues with implementing SMC on today's stock hardware need to be clarified. First, most CPUs nowadays prefetch a certain number of instructions before executing them from the cache; an instruction modification therefore has to occur long before it becomes visible. Meanwhile, for backward compatibility, some processors detect and handle such modifications themselves (at the cost of performance). Another issue is that some RISC machines require the use of branch delay slots. In our system, we assume that all these details are hidden from programmers at the assembly level (which is true for the SPIM simulator).

There are more obstacles to SMC at the OS level. Operating systems usually assign a flag to each memory page to protect data from being executed, or code from being modified, but this protection can often be circumvented with special techniques. For example, under Microsoft Windows only the stack is allowed to be both writable and executable at the same time, which provides one way to do SMC.

To simplify the presentation, we do not consider these effects in the rest of this paper, because they do not affect our core ideas. It is important to first sort out the key techniques for reasoning about SMC; applying our system to specific hardware and OS restrictions is left as future work.

2.2 Abstract Machine

Our general machine model, namely GTM, is an abstract framework for von Neumann machines. GTM is general because it can be used to model modern computing architectures such as x86, MIPS, or PowerPC. Fig 1 shows the essential elements of GTM. An instance M of a GTM machine is modeled as a 5-tuple that determines the machine's operational semantics.

A machine state S consists of at least a memory component M, which is a partial map from memory addresses to stored Byte values. Byte specifies the machine byte, the minimum unit of memory addressing.
Note that because the memory component is a partial map, its domain can be any subset of the natural numbers. E represents the additional components of a state, which may include register files, disks, etc.

  (Machine)   M ::= (Extension, Instr, Ec : Instr → ByteList,
                     Next : Address → Instr → State ⇀ State,
                     Npc : Address → Instr → State → Address)
  (State)     S ::= (M, E)
  (Mem)       M ::= {f ↦ b}*
  (Extension) E ::= ...
  (Address)   f, pc ::= ... (nat nums)
  (Byte)      b ::= ... (0..255)
  (ByteList)  bs ::= b, bs | b
  (Instr)     ι ::= ...
  (World)     W ::= (S, pc)

  Figure 1. Definition of target machine GTM

  If Decode(S, pc, ι) is true, then (S, pc) ⟼ (Next_{pc,ι}(S), Npc_{pc,ι}(S))

  Figure 2. GTM program execution

No explicit code heap is involved: all code is encoded and stored in the memory and can be accessed just like regular data. Instr specifies the instruction set, with an encoding function Ec describing how instructions are stored in memory as byte sequences. We also introduce an auxiliary Decode predicate, defined as follows:

  Decode((M, E), f, ι) ≜ Ec(ι) = (M[f], ..., M[f + |Ec(ι)| − 1])

In other words, Decode(S, f, ι) states that under the state S, the consecutive bytes stored starting at memory address f are precisely the encoding of instruction ι.

Program execution is modeled as a small-step transition relation between two Worlds (i.e., W ⟼ W′), where a world W is just a state plus a program counter pc indicating the next instruction to be executed. The transition relation is formalized in Fig 2. Next and Npc are two functions that define the behavior of all available instructions: when instruction ι located at address pc is executed in state S, Next_{pc,ι}(S) is the resulting state and Npc_{pc,ι}(S) is the resulting program counter. Note that Next may be a partial function (since memory is partial), while Npc is always total.

To make a step, a certain number of bytes starting from pc are fetched and decoded into an instruction, which is then executed following the Next and Npc functions. There is no transition if Next is undefined on a given state; as expected, if there is no valid transition from a world, the execution gets stuck.

To make program execution deterministic, the following condition should be satisfied:

  ∀S, f, ι1, ι2. Decode(S, f, ι1) ∧ Decode(S, f, ι2) → ι1 = ι2

In other words, Ec should be prefix-free: under no circumstances may the encoding of one instruction be a prefix of the encoding of another. Instruction encodings on real machines follow regular patterns (e.g., the actual value of each operand is extracted from certain bits); these properties are critical when operand-modifying instructions are involved. Appel et al. [2, 22] give a more specific decoding relation and an in-depth analysis.
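To make the GTM interface concrete, here is a minimal executable sketch in OCaml. It mirrors Fig 1 and Fig 2 but is only an illustration (the authoritative formalization is the Coq implementation [5]); the record layout and the names machine, decode, and step are our own, while Ec, Next, Npc, and the state extension stay abstract.

  (* A minimal sketch of GTM, mirroring Fig 1 and Fig 2. *)
  module IntMap = Map.Make (Int)

  type mem = int IntMap.t                    (* partial map: address -> byte *)
  type ('e, 'i) machine = {
    ec   : 'i -> int list;                   (* Ec: instruction encoding *)
    next : int -> 'i -> mem * 'e -> (mem * 'e) option;  (* Next (partial) *)
    npc  : int -> 'i -> mem * 'e -> int;     (* Npc (total) *)
    dec  : mem -> int -> 'i option;          (* a decoder consistent with Ec *)
  }

  (* Decode(S, f, i): the bytes at f .. f+|Ec(i)|-1 are exactly Ec(i). *)
  let decode (m : ('e, 'i) machine) (mem : mem) (f : int) (i : 'i) : bool =
    let bytes = m.ec i in
    List.for_all2
      (fun off b -> IntMap.find_opt (f + off) mem = Some b)
      (List.init (List.length bytes) Fun.id) bytes

  (* One step of Fig 2: fetch, decode, then apply Next and Npc.
     Returns None when the machine is stuck (no valid transition). *)
  let step (m : ('e, 'i) machine) ((mem, e), pc) =
    match m.dec mem pc with
    | None -> None
    | Some i ->
        (match m.next pc i (mem, e) with
         | None -> None
         | Some s' -> Some (s', m.npc pc i (mem, e)))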


The definitions of the Next and Npc functions should also guarantee the following property: if ((M, E), pc) ⟼ ((M′, E′), pc′) and M′′ is a memory whose domain overlaps with neither M nor M′, then ((M ∪ M′′, E), pc) ⟼ ((M′ ∪ M′′, E′), pc′). In other words, adding extra memory does not affect the execution of the original world. This property can be further refined into the following two fundamental requirements on the Next and Npc functions:

  ∀M, M′, M0, E, E′, pc, ι.
    M ⊥ M0 ∧ Next_{pc,ι}(M, E) = (M′, E′) →
    M′ ⊥ M0 ∧ Next_{pc,ι}(M ⊎ M0, E) = (M′ ⊎ M0, E′)               (1)

  ∀pc, ι, M, M′, E.  Npc_{pc,ι}(M, E) = Npc_{pc,ι}(M ⊎ M′, E)       (2)

where M ⊥ M′ ≜ dom(M) ∩ dom(M′) = ∅ and M ⊎ M′ ≜ M ∪ M′ if M ⊥ M′.

2.3 Specialization

By specializing every component of the machine M according to different architectures, we obtain different machine instances.

MIPS specialization. The MIPS machine M_MIPS is built as an instance of the GTM framework (Fig 3). In M_MIPS, the machine state consists of an (M, R) pair, where R is a register file, defined as a map from each of the 31 registers to a stored value. $0 is not included in the register set since it always stores constant zero and is immutable by MIPS convention. A machine Word is the composition of four Bytes. To mediate between registers and memory, two operators, ⟨·⟩1 and ⟨·⟩4, are defined (details omitted here) to convert between Word and Value. The set of instructions Instr is minimal, containing only basic MIPS instructions, but extensions can easily be made. M_MIPS supports direct jump, indirect jump, and jump-and-link (jal) instructions.

  (State)    S ::= (M, R)
  (RegFile)  R ∈ Register → Value
  (Register) r ::= $1 | ... | $31
  (Value)    i, ⟨w⟩1 ::= ... (int nums)
  (Word)     w, ⟨i⟩4 ::= b, b, b, b
  (Instr)    ι ::= nop | li r_d, i | add r_d, r_s, r_t | addi r_t, r_s, i
                |  move r_d, r_s | lw r_t, i(r_s) | sw r_t, i(r_s)
                |  la r_d, f | j f | jr r_s | beq r_s, r_t, i | jal f
                |  mul r_d, r_s, r_t | bne r_s, r_t, i

  Figure 3. M_MIPS data types

  (Ec(ι) omitted)

  If ι =               then Next_{pc,ι}(M, R) =
  jal f                (M, R{$31 ↦ pc + 4})
  nop                  (M, R)
  li r_d, i            (M, R{r_d ↦ i})
  la r_d, f            (M, R{r_d ↦ f})
  add r_d, r_s, r_t    (M, R{r_d ↦ R(r_s) + R(r_t)})
  addi r_t, r_s, i     (M, R{r_t ↦ R(r_s) + i})
  move r_d, r_s        (M, R{r_d ↦ R(r_s)})
  lw r_t, i(r_s)       (M, R{r_t ↦ ⟨M(f), ..., M(f+3)⟩1})   if f = R(r_s) + i ∈ dom(M)
  sw r_t, i(r_s)       (M{f, ..., f+3 ↦ ⟨R(r_t)⟩4}, R)      if f = R(r_s) + i ∈ dom(M)
  mul r_d, r_s, r_t    (M, R{r_d ↦ R(r_s) × R(r_t)})
  otherwise            (M, R)

  If ι =               then Npc_{pc,ι}(M, R) =
  j f                  f
  jr r_s               R(r_s)
  beq r_s, r_t, i      pc + i  when R(r_s) = R(r_t);  pc + 4  when R(r_s) ≠ R(r_t)
  jal f                f
  bne r_s, r_t, i      pc + i  when R(r_s) ≠ R(r_t);  pc + 4  when R(r_s) = R(r_t)
  otherwise            pc + |Ec(ι)|

  Figure 4. M_MIPS operational semantics
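As a cross-check of Fig 4, a few representative rows transcribe into the same OCaml style. This is again an illustrative sketch under the assumption that values are unbounded integers; memory and the remaining instructions are elided, and unmapped registers default to 0.

  (* A few rows of Fig 4 as executable OCaml (a sketch, not the full set). *)
  module RMap = Map.Make (Int)
  type regfile = int RMap.t
  let get r rf = Option.value (RMap.find_opt r rf) ~default:0

  type instr =
    | Add of int * int * int                 (* add rd, rs, rt *)
    | Addi of int * int * int                (* addi rt, rs, i *)
    | Beq of int * int * int                 (* beq rs, rt, i (relative) *)
    | J of int                               (* j f (absolute) *)

  (* Next: the state transformer; memory is omitted since none of
     these instructions touch it. *)
  let next (rf : regfile) = function
    | Add (rd, rs, rt) -> RMap.add rd (get rs rf + get rt rf) rf
    | Addi (rt, rs, i) -> RMap.add rt (get rs rf + i) rf
    | Beq _ | J _ -> rf

  (* Npc: the next program counter, exactly as in Fig 4. *)
  let npc (pc : int) (rf : regfile) = function
    | Beq (rs, rt, i) -> if get rs rf = get rt rf then pc + i else pc + 4
    | J f -> f
    | _ -> pc + 4                            (* |Ec(i)| = 4 on MIPS *)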
M_MIPS also provides relative addressing for branch instructions (e.g., beq r_s, r_t, i), but for clarity we will keep using code labels for branch targets in our examples. The Ec function follows the official MIPS documentation and is omitted; interested readers can consult our Coq implementation. Fig 4 gives the definitions of Next and Npc. It is easy to see that these functions satisfy the requirements mentioned earlier.

x86 (16-bit) specialization. In Fig 5, we show our x86 machine, M_x86, as an instance of GTM. The specification of M_x86 is a restriction of the real x86 architecture: it is limited to 16-bit real mode and has only a small subset of the instructions, including indirect and conditional jumps. We must note, however, that this is not due to any limitation of the GTM framework.

  (Word)         w ::= b, b
  (State)        S ::= (M, R, D)
  (RegFile)      R ::= {r16 ↦ w}* ∪ {rs ↦ w}*
  (Disk)         D ::= {l ↦ b}*
  (Word Regs)    r16 ::= rAX | rBX | rCX | rDX | rSI | rDI | rBP | rSP
  (Byte Regs)    r8 ::= rAH | rAL | rBH | rBL | rCH | rCL | rDH | rDL
  (Segment Regs) rs ::= rDS | rES | rSS
  (Instr)        ι ::= movw w, r16 | movw r16, rs | movb b, r8
                    |  jmp b | jmpl w, w | int b | ...

  Figure 5. M_x86 data types

  R(rAH) := R(rAX) & (255 ≪ 8)        R(rAL) := R(rAX) & 255
  R{rAH ↦ b} := R{rAX ↦ (R(rAX) & 255) | (b ≪ 8)}
  R{rAL ↦ b} := R{rAX ↦ (R(rAX) & (255 ≪ 8)) | b}
  (and analogously for the B, C, and D register pairs)

  Figure 6. M_x86 8-bit register use

  If ι =             then Next_{pc,ι}(M, R, D) =
  movw w, r16        (M, R{r16 ↦ w}, D)
  movw r16a, r16b    (M, R{r16b ↦ R(r16a)}, D)
  movw r16, rs       (M, R{rs ↦ R(r16)}, D)
  movw rs, r16       (M, R{r16 ↦ R(rs)}, D)
  movb b, r8         (M, R{r8 ↦ b}, D)
  movb r8a, r8b      (M, R{r8b ↦ R(r8a)}, D)
  jmp b              (M, R, D)
  jmpl w1, w2        (M, R, D)
  int b              BIOS Call b (M, R, D)
  ...                ...

  If ι =             then Npc_{pc,ι}(M, R) =
  jmp b              pc + 2 + b
  jmpl w1, w2        w1 × 16 + w2
  non-jump instr.    pc + |Ec(ι)|

  Figure 7. M_x86 operational semantics
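The bit manipulations of Fig 6 are easy to get wrong, so here is the AX pair spelled out as a small OCaml sketch (the B, C, and D pairs are identical). Note that, following the figure, reading rAH yields the high byte left in place rather than shifted down.

  (* Fig 6 for the AX register pair, as OCaml bit operations. *)
  let read_ah ax = ax land (255 lsl 8)   (* high byte, in place, as in Fig 6 *)
  let read_al ax = ax land 255           (* low byte *)

  let write_ah ax b = (ax land 255) lor (b lsl 8)         (* R{rAH <- b} *)
  let write_al ax b = (ax land (255 lsl 8)) lor b         (* R{rAL <- b} *)

  let () =
    assert (write_ah (write_al 0 0x34) 0x12 = 0x1234);
    assert (read_al 0x1234 = 0x34 && read_ah 0x1234 = 0x1200)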


  Call 0x13 (disk operations)      (id = 0x13)
  Command 0x02 (disk read)         (R(rAH) = 0x02)
  Parameters:  Count = R(rAL)      Cylinder = R(rCH)
               Sector = R(rCL)     Head = R(rDH)
               Disk Id = R(rDL)    Bytes = Count × 512
               Src = (Sector − 1) × 512
               Dest = R(rES) × 16 + R(rBX)
  Conditions:  Cylinder = 0        Head = 0
               Disk Id = 0x80      Sector ≠ 0

  Figure 8. BIOS call specification (disk read)

  Safe(W) ≜ ∀n : Nat. Safe_n(W)

Safe_n(W) states that the machine is safe to execute n steps from the world W, while Safe(W) states that the world is safe to run forever. The invariant-based method [17] is a common technique for proving such safety properties of programs.

Definition 3.1 An invariant is a predicate, namely Inv, defined over machine worlds, such that the following holds:

  • ∀W. Inv(W) → ∃W′. (W ⟼ W′)              (Progress)
  • ∀W. Inv(W) ∧ (W ⟼ W′) → Inv(W′)          (Preservation)

The existence of an invariant immediately implies program safety, as shown by the following theorem.

Theorem 3.2 If Inv is an invariant, then ∀W. Inv(W) → Safe(W).

The invariant-based method can be used directly to prove the example we just presented in Fig 9: one can construct a global invariant for the code, much like the tedious one in Fig 10, which satisfies Definition 3.1.

Invariant-based verification is powerful, especially when carried out in a rich meta logic. But a global invariant for a program is usually troublesome and complicated, and it is often unrealistic to find and write one down directly: a global invariant needs to specify the condition at every memory address in the code. Even worse, this prevents separate verification of different components and local reasoning, since every program module has to be annotated with the same global invariant, which requires knowledge of the entire code base.
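Definition 3.1 has a simple operational reading. Assuming a step function in the style of the Section 2 sketch, Safe_n is the bounded approximation one can actually test by running the machine; Safe itself quantifies over all n, which is exactly why an invariant is needed. An illustrative OCaml sketch:

  (* Safe_n as a bounded, runnable check: keep stepping n times.
     step returns None when the machine is stuck. *)
  let rec safen (step : 'w -> 'w option) (n : int) (w : 'w) : bool =
    n = 0 ||
    (match step w with
     | None -> false                     (* stuck: Progress fails at w *)
     | Some w' -> safen step (n - 1) w')

  (* Toy demo: a counter machine that always steps is safe for any n
     we care to test; Theorem 3.2 is what lets us conclude this for
     every n without running. *)
  let () = assert (safen (fun n -> Some (n + 1)) 1000 0)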


  Inv(M, R, pc) ≜
    Decode(M, 100, ι100) ∧ Decode(M, 200, ι200) ∧ Decode(M, 208, ι208)
    ∧ Decode(M, 212, ι212) ∧ Decode(M, 216, ι216) ∧ Decode(M, 224, ι224)
    ∧ Decode(M, 232, ι232)
    ∧ (  (pc = 200 ∧ Decode(M, 204, ι204))
       ∨ (pc = 204 ∧ Decode(M, 204, ι204))
       ∨ (pc = 204 ∧ Decode(M, 204, ι100) ∧ R($2) = R($4))
       ∨ (pc = 208 ∧ R($4) ≤ R($2) ≤ R($4) + 1)
       ∨ (pc = 212 ∧ R($4) ≤ R($2) ≤ R($4) + 1)
       ∨ (pc = 216 ∧ R($2) ≠ R($4))
       ∨ (pc = 224 ∧ R($2) = R($4) ∧ ⟨R($9)⟩4 = Ec(ι100))
       ∨ (pc = 232 ∧ R($2) = R($4) ∧ Decode(M, 204, ι100)))

  Figure 10. A global invariant for the opcode example

  Figure 11. Hoare-style reasoning of assembly code (diagram: a static
  code heap divided into code blocks f1, f2, f3, each assigned a
  precondition p1, p2, p3, with the control flow between the blocks)

Traditional Hoare-style reasoning about assembly programs (e.g., CAP [33]) is illustrated in Fig 11. Program code is assumed to be stored in a static code heap separate from the main memory. A code heap can be divided into code blocks (i.e., consecutive instruction sequences), which serve as the basic certification units. A precondition is assigned to every code block; no postconditions show up, since we use CPS (continuation-passing style) to reason about low-level programs. The blocks can be verified independently and then linked together to form a global invariant, completing the verification.

Here we present a Hoare-logic-based system, GCAP0, for GTM. Compared with the existing TAL and CAP systems, a system working directly on GTM has several advantages.

First, the instruction set and the operational semantics come as an integrated part of a TAL or CAP system, and thus are usually fixed and limited. Extensions can be made, but they cost a fair amount of effort. GTM abstracts both components out, so a system that works directly on it is robust and suitable for different architectures.

Second, GTM is more realistic: programs are encoded and stored in memory just as on real computing platforms. Besides regular assembly code, a GTM-based verification system can therefore also handle real binary code.

Our generalization is not trivial. First, unifying different kinds of instructions (especially regular commands and control-transfer instructions) without loss of usability requires an intrinsic understanding of the relation between instructions and program specifications. Second, code immutability is easy to guarantee in an abstract machine that keeps the code heap as a separate component, which GTM does not. Surprisingly, the same immutability can be enforced in the inference rules using a simple separating conjunction borrowed from separation logic [14, 30].

3.2 Specification Language

Our specification language is defined in Fig 12.

  (CodeBlock) B ::= f : I
  (InstrSeq)  I ::= ι; I | ι
  (CodeHeap)  C ::= {f ↦ I}*
  (Assertion) a ∈ State → Prop
  (ProgSpec)  Ψ ∈ Address ⇀ Assertion

  Figure 12. Auxiliary constructs and specifications

A code block B is a syntactic unit that represents a sequence I of instructions, beginning at a specific memory address f.
Note that in CAP we usually insist that jump instructions appear only at the end of a code block. This is no longer required in our new system, so the division into code blocks is much more flexible.

The code heap C is a collection of non-overlapping code blocks, represented as a finite map from addresses to instruction sequences; thus a code block can also be understood as a singleton code heap. To support Hoare-style reasoning, assertions are defined as predicates over GTM machine states (i.e., via "shallow embedding"). A program specification Ψ is a partial function mapping memory addresses to assertions, intended to give the precondition of each code block; thus Ψ has entries exactly at the locations where code blocks begin.

Fig 13 defines the implication (⇒) and equivalence (⇔) relations between assertions and lifts all the standard logical operators to the assertion level. Note the difference between a → a′ and a ⇒ a′: the former is an assertion, while the latter is a proposition! We also define the standard separation logic primitives [14, 30] as assertion operators. The separating conjunction (∗) of two assertions holds if they can be satisfied on two disjoint regions of memory (the register file can be shared). Separating implication, the empty heap, and singleton heaps can also be defined directly in our meta logic. A solid body of work exists on separation logic, and we use it heavily in our proofs.

  a ⇒ a′ ≜ ∀S. (a S → a′ S)          a ⇔ a′ ≜ ∀S. (a S ↔ a′ S)
  ¬a ≜ λS. ¬(a S)                    a ∧ a′ ≜ λS. a S ∧ a′ S
  a ∨ a′ ≜ λS. a S ∨ a′ S            a → a′ ≜ λS. a S → a′ S
  ∀x. a ≜ λS. ∀x. (a S)              ∃x. a ≜ λS. ∃x. (a S)
  a ∗ a′ ≜ λ(M0, R). ∃M, M′. M0 = M ⊎ M′ ∧ a (M, R) ∧ a′ (M′, R)
  where M ⊎ M′ ≜ M ∪ M′ when M ⊥ M′, and M ⊥ M′ ≜ dom(M) ∩ dom(M′) = ∅

  Figure 13. Assertion operators

Lemma 3.3 (Properties of separation logic) Let a, a′, and a′′ be assertions. Then:
1. a ∗ a′ ⇔ a′ ∗ a.
2. (a ∗ a′) ∗ a′′ ⇔ a ∗ (a′ ∗ a′′).
3. Given a type T, for any assertion function P ∈ T → Assertion and any assertion a, (∃x ∈ T. P x) ∗ a ⇔ ∃x ∈ T. (P x ∗ a).
4. Given a type T, for any assertion function P ∈ T → Assertion and any assertion a, (∀x ∈ T. P x) ∗ a ⇔ ∀x ∈ T. (P x ∗ a).

We omit the proofs of these properties since they are standard laws of separation logic [14, 30].
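For intuition, separating conjunction can be checked by brute force on a finite memory: enumerate all ways to split the bindings into two disjoint halves. The OCaml sketch below is illustrative only (exponential, and not how the logic is ever used), with assertions modeled as predicates over a memory and a register file.

  (* a * a' from Fig 13, checked by enumerating all disjoint splits. *)
  module IntMap = Map.Make (Int)
  type mem = int IntMap.t
  type 'r assertion = mem -> 'r -> bool

  (* All ways to split a list of bindings into two disjoint parts. *)
  let rec splits = function
    | [] -> [ ([], []) ]
    | x :: rest ->
        List.concat_map
          (fun (l, r) -> [ (x :: l, r); (l, x :: r) ])
          (splits rest)

  let sep (a : 'r assertion) (a' : 'r assertion) : 'r assertion =
   fun m regs ->
    splits (IntMap.bindings m)
    |> List.exists (fun (l, r) ->
           a (IntMap.of_seq (List.to_seq l)) regs
           && a' (IntMap.of_seq (List.to_seq r)) regs)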


  blk(f : I) ≜ λS. Decode(S, f, ι)                                if I = ι
  blk(f : I) ≜ λS. Decode(S, f, ι) ∧ (blk(f + |Ec(ι)| : I′) S)    if I = ι; I′

  blk(C) ≜ ∀f ∈ dom(C). blk(f : C(f))

  Ψ ⇒ Ψ′ ≜ ∀f ∈ dom(Ψ). (Ψ(f) ⇒ Ψ′(f))

  (Ψ1 ∪ Ψ2)(f) ≜ Ψ1(f)            if f ∈ dom(Ψ1) \ dom(Ψ2)
                 Ψ2(f)            if f ∈ dom(Ψ2) \ dom(Ψ1)
                 Ψ1(f) ∨ Ψ2(f)    if f ∈ dom(Ψ1) ∩ dom(Ψ2)

  Figure 14. Predefined functions

Fig 14 defines a few important macros: blk(B) holds if B is stored properly in the memory of the current state, and blk(C) holds if all code blocks in the code heap C are properly stored. We also define two operators on program specifications. One program specification is stricter than another, written Ψ ⇒ Ψ′, if every assertion in Ψ implies the corresponding assertion at the same address in Ψ′. The union of two program specifications takes the disjunction of the corresponding assertions at each address. Clearly, any two program specifications are stricter than their union.
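Since blk is just an iterated Decode, it is directly computable. A sketch in the same illustrative OCaml style, abstracting the per-address decode check and the instruction size |Ec(ι)| as parameters:

  (* blk(f : I) from Fig 14 as a fold over the instruction sequence:
     each instruction must decode at its own address, the next one
     |Ec(i)| bytes further on. *)
  let rec blk ~(decode : int -> 'i -> bool) ~(size : 'i -> int)
              (f : int) (seq : 'i list) : bool =
    match seq with
    | [] -> true
    | i :: rest -> decode f i && blk ~decode ~size (f + size i) rest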
3.3 Inference Rules

Fig 15 presents the inference rules of GCAP0. We give three sets of judgments (from local to global): well-formed code block, well-formed code heap, and well-formed world.

  Ψ ⊢ W  (Well-formed World)

    Ψ ⊢ C : Ψ    (a ∗ (blk(C) ∧ blk(pc : I))) S    Ψ ⊢ {a} pc : I
    --------------------------------------------------------------- (PROG)
                             Ψ ⊢ (S, pc)

  Ψ ⊢ C : Ψ′  (Well-formed Code Heap)

    Ψ1 ⊢ C1 : Ψ1′    Ψ2 ⊢ C2 : Ψ2′    dom(C1) ∩ dom(C2) = ∅
    ---------------------------------------------------------- (LINK-C)
               Ψ1 ∪ Ψ2 ⊢ C1 ∪ C2 : Ψ1′ ∪ Ψ2′

           Ψ ⊢ {a} f : I
    ---------------------------- (CDHP)
     Ψ ⊢ {f ↦ I} : {f ↦ a}

  Ψ ⊢ {a} B  (Well-formed Code Block)

    Ψ ⊢ {a′} f + |Ec(ι)| : I    Ψ ∪ {f + |Ec(ι)| ↦ a′} ⊢ {a} f : ι
    ---------------------------------------------------------------- (SEQ)
                         Ψ ⊢ {a} f : ι; I

    ∀S. a S → Ψ(Npc_{f,ι}(S)) (Next_{f,ι}(S))
    ------------------------------------------- (INSTR)
                  Ψ ⊢ {a} f : ι

  Figure 15. Inference rules for GCAP0

Intuitively, a code block is well-formed (Ψ ⊢ {a} B) if, starting from a state satisfying its precondition a, the code block is safe to execute until it jumps to a location in a state satisfying the specification Ψ. The well-formedness of a single instruction (rule INSTR) directly follows this understanding. Inductively, to validate the well-formedness of a code block beginning with ι under precondition a (rule SEQ), we find an intermediate assertion a′ that serves simultaneously as the precondition of the tail code sequence and the postcondition of ι. In the second premise of SEQ, since our syntax has no postconditions, a′ is fed directly into the accompanying specification.

Note that for a well-formed code block, even though we add an extra entry to the program specification Ψ when validating each individual instruction, the Ψ used for validating the code block and its tail code sequence remains the same.

We can instantiate the INSTR and SEQ rules for each instruction if necessary. For example, specializing INSTR for the direct jump (j f′) yields

        a ⇒ Ψ(f′)
    ------------------- (J)
    Ψ ⊢ {a} f : j f′

Specializing SEQ for the add instruction makes

    Ψ ⊢ {a′} f + 4 : I    Ψ ∪ {f + 4 ↦ a′} ⊢ {a} f : add r_d, r_s, r_t
    ---------------------------------------------------------------------
                    Ψ ⊢ {a} f : add r_d, r_s, r_t; I

which via INSTR can be further reduced to

    Ψ ⊢ {a′} f + 4 : I
    ∀(M, R). a (M, R) → a′ (M, R{r_d ↦ R(r_s) + R(r_t)})
    ------------------------------------------------------ (ADD)
              Ψ ⊢ {a} f : add r_d, r_s, r_t; I

Another interesting case is a conditional jump instruction such as beq, which can be instantiated from rule SEQ as

    Ψ ⊢ {a′} f + 4 : I
    ∀(M, R). a (M, R) → ((R(r_s) = R(r_t) → Ψ(f + i) (M, R))
                       ∧ (R(r_s) ≠ R(r_t) → a′ (M, R)))
    ----------------------------------------------------------- (BEQ)
               Ψ ⊢ {a} f : beq r_s, r_t, i; I

The instantiated rules are straightforward to understand and convenient to use. Most importantly, they can be generated automatically from the operational semantics of GTM.

The well-formedness of a code heap (Ψ ⊢ C : Ψ′) states that, given Ψ′ specifying the precondition of each code block in C, all the code in C can be safely executed with respect to the specification Ψ. Here the domains of C and Ψ′ are always the same. The CDHP rule casts a code block into a corresponding well-formed singleton code heap, and the LINK-C rule merges two disjoint well-formed code heaps into a larger one.

A world is well-formed with respect to a global specification Ψ (the PROG rule) if:
• the entire code heap is well-formed with respect to Ψ;
• the code heap and the current code block are properly stored;
• a precondition a is satisfied, separately from the code section;
• the instruction sequence is well-formed under a.

The PROG rule also shows how we use the separating conjunction to ensure that the whole code heap is indeed in memory and always immutable: because the assertion a cannot refer to the memory region occupied by C, and the memory domain never grows during the execution of a program, the whole reasoning process below the top level never involves the code heap region. This guarantees that no code modification can happen during program execution.

To verify the safety and correctness of a program, one first establishes the well-formedness of each code block. All the code blocks are then linked together progressively, resulting in a well-formed global code heap whose two accompanying specifications must match. Finally, the PROG rule is used to prove the safety of the initial world of the program.

Soundness and frame rules. The soundness of GCAP0 guarantees that any well-formed world is safe to execute. Establishing a well-formed world is equivalent to an invariant-based proof of program correctness: the accompanying specification Ψ corresponds to a global invariant that the current world satisfies. Soundness is established through a series of weakening lemmas followed by progress and preservation lemmas; the frame rules are also easy to prove.


Lemma 3.4 Ψ ⊢ C : Ψ′ if and only if for every f ∈ dom(Ψ′) we have f ∈ dom(C) and Ψ ⊢ {Ψ′(f)} f : C(f).

Proof Sketch: For the necessity, prove by inversion of the LINK-C rule, the CDHP rule, and the WEAKEN-BLOCK rule. For the sufficiency, repeatedly apply the LINK-C rule. □

Lemma 3.5 contains the standard weakening rules and can be proved by induction over the well-formed code block rules and by using Lemma 3.4.

Lemma 3.5 (Weakening Properties)

    Ψ ⊢ {a} B    Ψ ⇒ Ψ′    a′ ⇒ a
    --------------------------------- (WEAKEN-BLOCK)
              Ψ′ ⊢ {a′} B

    Ψ1 ⊢ C : Ψ2    Ψ1 ⇒ Ψ1′    Ψ2′ ⇒ Ψ2
    --------------------------------------- (WEAKEN-CDHP)
               Ψ1′ ⊢ C : Ψ2′

Proof:
1. By induction over the INSTR rule and the SEQ rule.
2. Immediate from 1 and Lemma 3.4. □

Lemma 3.6 (Progress) If Ψ ⊢ W, then there exists a world W′ such that W ⟼ W′.

Proof: By inversion of the PROG rule we know that a code sequence I is indeed in memory, starting at the current pc. Suppose the first instruction of I is ι; then from the rules for well-formed code blocks we learn that there exists a Ψ′ such that Ψ′ ⊢ {a} pc : ι. Since a is satisfied in the current state, inversion of the INSTR rule guarantees that the world W can be safely executed for at least one more step. □

Lemma 3.7 (Preservation) If Ψ ⊢ W and W ⟼ W′, then Ψ ⊢ W′.

Proof Sketch: Suppose the premises of the PROG rule hold. Again, a code block pc : I is present in the current memory. Given W′ = ((M′, E′), pc′) with W ⟼ W′, we only need to find an a′ and an I′ satisfying (a′ ∗ (blk(C) ∧ blk(pc′ : I′))) S′ and Ψ ⊢ {a′} pc′ : I′. There are two cases:
1. I = ι is a single-instruction sequence. It is easy to show that pc′ ∈ dom(Ψ) and pc′ ∈ dom(C). We choose a′ = Ψ(pc′) and I′ = C(pc′) to satisfy the two conditions.
2. I = ι; I′ is a multi-instruction sequence. Then by inversion of the SEQ rule, either we reduce to the previous case, or pc′ = pc + |Ec(ι)| and there is an a′ such that Ψ ⊢ {a′} pc′ : I′. Thus a′ and I′ are what we choose to satisfy the two conditions. □

Theorem 3.8 (Soundness of GCAP0) If Ψ ⊢ W, then Safe(W).

Proof: Define the predicate Inv ≜ λW. (Ψ ⊢ W). By Lemma 3.6 and Lemma 3.7, Inv is an invariant of GTM. Together with Theorem 3.2, the conclusion holds. □

The following lemma (a.k.a. the frame rules) captures the essence of local reasoning in separation logic.

Lemma 3.9

              Ψ ⊢ {a} B
    --------------------------------- (FRAME-BLOCK)
    (λf. Ψ(f) ∗ a′) ⊢ {a ∗ a′} B

where a′ is independent of every register modified by B.

              Ψ ⊢ C : Ψ′
    ------------------------------------------ (FRAME-CDHP)
    (λf. Ψ(f) ∗ a) ⊢ C : λf. Ψ′(f) ∗ a

where a is independent of every register modified by C.

Proof: For the first rule, we need the following important fact about our GTM specializations: if M0 ⊥ M1 and Next_{pc,ι}(M0, E) = (M0′, E′), then M0′ ⊥ M1 and Next_{pc,ι}(M0 ⊎ M1, E) = (M0′ ⊎ M1, E′). The rule then follows by induction over the two well-formed code block rules. The second rule is thus straightforward. □

Note that the correctness of this rule relies on the condition we gave in Sec 2 (incorporating extra memory does not affect program execution), as also pointed out by Reynolds [30].

With the FRAME-BLOCK rule, one can extend a locally certified code block with an extra assertion, provided that this assertion holds separately, in conjunction with the original assertion as well as the specification.
Frame rules at different levels will be our main tool for separating code from data when we attack the SMC problem later. All the derived rules and the soundness proof have been fully mechanized in Coq [5] and will be used freely in our examples.

4. Certifying Runtime Code Generation

GCAP1 is a simple extension of GCAP0 that supports runtime code generation. In the top PROG rule of GCAP0, the precondition a for the current code block must not specify memory regions occupied by the code heap, and all the code must be stored in memory and remain immutable during the whole execution. For runtime code generation this requirement has to be relaxed, since the entire code may not be in memory at the very beginning—some of it can be generated dynamically!

Inference rules. GCAP1 borrows the definitions of well-formed code heaps and well-formed code blocks from GCAP0: they use the same set of inference rules (see Fig 15). To support runtime code generation, we change the top rule and insert an extra layer of judgments, called well-formed code specification (see Fig 16), between well-formed world and well-formed code heap.

  Ψ ⊢ W  (Well-formed World)

    Ψ ⊢ (C, Ψ)    C′ ⊆ C    (a ∗ (blk(C′) ∧ blk(pc : I))) S    Ψ′ ⊢ {a} pc : I
    ∀f ∈ dom(Ψ′). (Ψ′(f) ∗ (blk(C′) ∧ blk(pc : I)) ⇒ Ψ(f))
    ----------------------------------------------------------------------------- (PROG-G)
                                   Ψ ⊢ (S, pc)

  Ψ ⊢ (C, Ψ′)  (Well-formed Code Specification)

    Ψ1 ⊢ (C1, Ψ1′)    Ψ2 ⊢ (C2, Ψ2′)    dom(C1) ∩ dom(C2) = ∅
    ------------------------------------------------------------- (LINK-G)
                 Ψ1 ∪ Ψ2 ⊢ (C1 ∪ C2, Ψ1′ ∪ Ψ2′)

                 Ψ ⊢ C : Ψ′
    ---------------------------------------- (LIFT)
      Ψ ∗ blk(C) ⊢ (C, Ψ′ ∗ blk(C))

  Figure 16. Inference rules for GCAP1
  (Ψ ∗ blk(C) abbreviates λf. Ψ(f) ∗ blk(C))

If "well-formed code heap" is a static reasoning layer, "well-formed code specification" is a dynamic one. In an assertion for a well-formed code heap, no information about program code is included, since it is implicitly guaranteed by the code immutability property. For a well-formed code specification, on the other hand, all information about the required program code must be provided in the preconditions of the code blocks.

We use the LIFT rule to transform a well-formed code heap into a well-formed code specification by attaching the whole code information to the specifications on both sides. The LINK-G rule has the same form as LINK-C, except that it works on the dynamic layer.


The new top rule (PROG-G) replaces the well-formed code heap with a well-formed code specification. The initial condition is now weakened! Only the current (locally immutable) code heap together with the current code block, rather than the whole code heap, is required to be in memory. Also, when proving the well-formedness of the current code block, the current code heap information is stripped from the global program specification.

Local reasoning. On the dynamic reasoning layer, since code information is carried in assertions and passed between code modules all the time, verifying one module usually involves knowledge of the code of another (as a precondition). Sometimes such knowledge is redundant and breaks local reasoning. Fortunately, a frame rule can be established at the code specification level as well. We can first verify a module locally, then extend it with the frame rule so that it can be linked with other modules later.

Lemma 4.1

                Ψ ⊢ (C, Ψ′)
    ------------------------------------------------- (FRAME-SPEC)
    (λf. Ψ(f) ∗ a) ⊢ (C, λf. Ψ′(f) ∗ a)

where a is independent of any register modified by C.

Proof: By induction over the derivation of Ψ ⊢ (C, Ψ′). There are only two cases. If the final step is via the LINK-G rule, the conclusion follows immediately from the induction hypothesis. If the final step is via the LIFT rule, it must be derived from a well-formed code heap derivation

    Ψ0 ⊢ C : Ψ0′    (3)

with Ψ = λf. Ψ0(f) ∗ blk(C) and Ψ′ = λf. Ψ0′(f) ∗ blk(C); we first apply the FRAME-CDHP rule to (3) to obtain

    (λf. Ψ0(f) ∗ a) ⊢ C : λf. Ψ0′(f) ∗ a

and then apply the LIFT rule to get the conclusion. □

In particular, by setting the assertion a to be the knowledge about code not touched by the current module, that code can be excluded from the local verification.

As a more concrete example, suppose we have two locally certified code modules C1 and C2, where C2 is generated by C1 at runtime. We first apply FRAME-SPEC to extend C2 with the assertion blk(C1), which reflects the fact that C1 does not change during the whole execution of C2. After this, the LINK-G rule is applied to link them together into a well-formed code specification. We give more examples of GCAP1 in Section 6.

Soundness. The soundness of GCAP1 can be established in the same way as Theorem 3.8 (see the TR [5] for more detail).

Theorem 4.2 (Soundness of GCAP1) If Ψ ⊢ W, then Safe(W).

This theorem can be established following a similar technique as in GCAP0 (more cases need to be analyzed because of the additional layer). However, we delay the proof to the next section to give a better understanding of the relationship between GCAP1 and our more general framework GCAP2. By converting a well-formed GCAP1 program into a well-formed GCAP2 program, we will see that GCAP1 is sound given that GCAP2 is sound (see Theorem 5.5 and Theorem 5.8 in the next section).
To verify a program that involves runtime code generation, we first establish the well-formedness of each code module (which never modifies its own code) using the rules for well-formed code heaps, as in GCAP0. We then use the dynamic layer to combine these code modules into a global code specification. Finally, we use the new PROG-G rule to establish the initial state and prove the correctness of the entire program.

  The original code:

       main:  la   $9, gen        # get the target addr
              li   $8, 0xac880000 # load Ec(sw $8,0($4))
              sw   $8, 0($9)      # store to gen
              li   $8, 0x00800008 # load Ec(jr $4)
  B1          sw   $8, 4($9)      # store to gen+4
              la   $4, ggen       # $4 = ggen
              la   $9, main       # $9 = main
              li   $8, 0x01200008 # load Ec(jr $9) to $8
              j    gen            # jump to target
       gen:   nop                 # to be generated
              nop                 # to be generated
       ggen:  nop                 # to be generated

  The generated code:

  B2   gen:   sw   $8, 0($4)
              jr   $4
  B3   ggen:  jr   $9

  Figure 17. mrcg.s: Multilevel runtime code generation

4.1 Example: Multilevel Runtime Code Generation

We use the small example mrcg.s in Fig 17 to demonstrate the usability of GCAP1 on runtime code generation. mrcg.s is already fairly subtle—it does multilevel RCG, meaning that code generated at runtime may itself generate new code. Multilevel RCG has practical uses [13]. In this example, the code block B1 generates B2 (containing two instructions), which in turn generates B3 (containing a single instruction).

The first step is to verify B1, B2, and B3 separately and locally, as the following three judgments show:

  {gen ↦ a2 ∗ blk(B2)} ⊢ {a1} B1
  {ggen ↦ a3 ∗ blk(B3)} ⊢ {a2} B2
  {main ↦ a1} ⊢ {a3} B3

where

  a1 = λS. True,
  a2 = λ(M, R). R($9) = main ∧ R($8) = Ec(jr $9) ∧ R($4) = ggen,
  a3 = λ(M, R). R($9) = main

As we can see, B1 has no requirement on its precondition; B2 simply requires that proper values be stored in registers $4, $8, and $9; and B3 demands that $9 point to the label main.

All three judgments are straightforward to establish by means of the GCAP1 inference rules (the SEQ rule and the INSTR rule). For example, the pre- and selected intermediate conditions for B1 are as follows:

  {λS. True}
  main:  la $9, gen
  {λ(M, R). R($9) = gen}
         li $8, 0xac880000
         sw $8, 0($9)
  {(λ(M, R). R($9) = gen) ∗ blk(gen : sw $8, 0($4))}
         li $8, 0x00800008
         sw $8, 4($9)
  {blk(B2)}
         la $4, ggen
         la $9, main
         li $8, 0x01200008
  {a2 ∗ blk(B2)}
         j gen

The first five instructions generate the body of B2. Then the registers are loaded with proper values to match B2's requirement. Notice the three li instructions: the encoding of each generated instruction is specified directly as an immediate value here.

Notice that blk(B1) has to be satisfied as a precondition of B3, since B3 jumps back to B1. However, to achieve modularity we do not require it in B3's local precondition. Instead, we leave this condition to be added later via our frame rule.
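The immediates in those three li instructions can be checked against the standard MIPS instruction formats (sw is the I-type opcode 0x2b; jr is R-type with function code 8). A small OCaml sanity check, using our own helper names rather than the paper's Ec:

  (* Just enough encoding logic to validate the immediates in mrcg.s. *)
  let enc_sw ~rt ~base ~imm =              (* sw rt, imm(base): I-type *)
    (0x2b lsl 26) lor (base lsl 21) lor (rt lsl 16) lor (imm land 0xffff)

  let enc_jr rs = (rs lsl 21) lor 0x08     (* jr rs: SPECIAL, funct 8 *)

  let () =
    assert (enc_sw ~rt:8 ~base:4 ~imm:0 = 0xac880000);  (* Ec(sw $8,0($4)) *)
    assert (enc_jr 4 = 0x00800008);                     (* Ec(jr $4) *)
    assert (enc_jr 9 = 0x01200008)                      (* Ec(jr $9) *)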


After the three code blocks are certified locally, the CDHP rule and then the LIFT rule are applied to each of them, as illustrated in Fig 18, resulting in three well-formed singleton code heaps. Afterwards, B2 and B3 are linked together, and we apply the FRAME-SPEC rule to the resulting code heap so that it can be linked with the remaining code heap, forming the coherent global well-formed specification (as Fig 18 indicates):

  Ψ_G = {main ↦ a1 ∗ blk(B1),
         gen  ↦ a2 ∗ blk(B2) ∗ blk(B1),
         ggen ↦ a3 ∗ blk(B3) ∗ blk(B1)}

which satisfies Ψ_G ⊢ (C, Ψ_G) (where C stands for the entire code heap).

  Figure 18. mrcg.s: GCAP1 specification (diagram: the judgments for
  main, gen, and ggen are lifted via LIFT, extended via FRAME-SPEC,
  and joined via LINK-G into the global specification Ψ_G)

Now we can finish the last step—applying the PROG-G rule to the initial world—so that the safety of the whole program is assured.

5. Supporting General SMC

Although GCAP1 is a nice extension of GCAP0, it can hardly be used to certify general SMC. For example, it cannot verify the opcode modification example given in Fig 9 at the end of Sec 2. In fact, GCAP1 does not even allow the same memory region to contain different runtime instructions.

General SMC does not distinguish between code heap and data heap and therefore poses new challenges: first, at runtime, the instructions stored at a given memory location may vary from time to time; second, the control flow is much harder to understand and represent; third, it is unclear how to divide a self-modifying program into code blocks so that they can be reasoned about separately.

To tackle these obstacles, we have developed a new verification system, GCAP2, that supports general SMC. Our system is still built upon our machine model GTM.

5.1 Main Development

The main idea of GCAP2 is illustrated in Fig 19. Again, the potential runtime code is decomposed into code blocks, representing the instruction sequences that may possibly be executed. Each code block is assigned a precondition, so that it can be certified individually. Unlike in GCAP1, since instructions can be modified, different runtime code blocks may overlap in memory, or even share the same entry location. Hence if a code block contains a jump instruction to a certain memory address (such as f1 in Fig 19) at which several blocks start, it is usually not possible to tell statically which block it will jump to at runtime. What our system requires instead is that whenever the program counter reaches this address (e.g., f1 in Fig 19), there exist at least one code block there whose precondition is matched. After all the code blocks are certified, they can be linked together to establish the correctness of the program.

  Figure 19. Typical control flow in GCAP2 (diagram: overlapping
  runtime code blocks f1 .. f4 with preconditions p1, p1′, p2, p3,
  p4; several blocks may start at the same address)

  (ProgSpec) Ψ ∈ Address ⇀ Assertion
  (CodeSpec) Φ ∈ CodeBlock ⇀ Assertion

  Figure 20. Assertion language for GCAP2

To support self-modification, we relax the requirements on well-formed code blocks. Specifically, a well-formed code block now describes an execution sequence of instructions starting at a certain memory address, rather than merely a static instruction sequence currently stored in memory.
There is no difference between these two readings in the non-self-modifying case, since static code always executes as it is, but a fundamental difference appears in the more general SMC cases. The new reading, execution code block, better characterizes the real control flow of the program. Sec 6.9 discusses the importance of this generalization further.

Specification language. The specification language is almost the same as GCAP1's, but GCAP2 introduces one new concept called a code specification (denoted Φ in Fig 20), which generalizes the previous (code heap, specification) pair to resolve the problem of having multiple code blocks start at a single address. A code specification is a partial function that maps code blocks to their assertions. When certifying a program, the domain of the global Φ indicates all the code blocks that can show up at runtime, and the corresponding assertion of a code block describes its global precondition. The reader should note that although Φ is a partial function, it can have an infinite domain (indicating that there may be an infinite number of possible runtime code blocks).

Inference rules. GCAP2 has three sets of judgments (see Fig 21): well-formed world, well-formed code specification, and well-formed code block. The key idea of GCAP2 is to eliminate the well-formed code heap layer of GCAP1 and push the "dynamic reasoning layer" down inside each code block, even into a single instruction. Interestingly, this makes the rule set of GCAP2 look much more like GCAP0 than GCAP1.

The inference rules for well-formed code blocks have one tiny but essential difference from GCAP0/GCAP1: a well-formed instruction (INSTR) carries the additional requirement that the instruction actually be at the proper location in memory. In GCAP1 this was guaranteed by the LIFT rule, which adds the whole static code heap to the preconditions; in GCAP2 only the currently executing instruction is required to be present in memory.

Intuitively, the well-formedness of a code block Ψ ⊢ {a} f : I now states that if a machine state satisfies assertion a, then I is the only possible code sequence to be executed starting from f, until we reach a program point where the specification Ψ can be matched.


  Ψ ⊢ W  (Well-formed World)

    ⌊Φ⌋ ⊢ Φ    a S    ⌊Φ⌋ ⊢ {a} pc : I
    ------------------------------------- (PROG)
               ⌊Φ⌋ ⊢ (S, pc)

    where ⌊Φ⌋ ≜ λf. ∃I. Φ(f : I)

  Ψ ⊢ Φ  (Well-formed Code Specification)

    Ψ1 ⊢ Φ1    Ψ2 ⊢ Φ2    dom(Φ1) ∩ dom(Φ2) = ∅
    ----------------------------------------------- (LINK)
                 Ψ1 ∪ Ψ2 ⊢ Φ1 ∪ Φ2

    ∀B ∈ dom(Φ). Ψ ⊢ {Φ B} B
    --------------------------- (CDHP)
              Ψ ⊢ Φ

  Ψ ⊢ {a} B  (Well-formed Code Block)

    Ψ ⊢ {a′} f + |Ec(ι)| : I    Ψ ∪ {f + |Ec(ι)| ↦ a′} ⊢ {a} f : ι
    ---------------------------------------------------------------- (SEQ)
                          Ψ ⊢ {a} f : ι; I

    ∀S. a S → (Decode(S, f, ι) ∧ Ψ(Npc_{f,ι}(S)) (Next_{f,ι}(S)))
    --------------------------------------------------------------- (INSTR)
                          Ψ ⊢ {a} f : ι

  Figure 21. Inference rules for GCAP2

The precondition of a non-self-modifying code block B must now include B itself, i.e., blk(B). This extra requirement does not compromise modularity, since the code is already present and can easily be moved into the precondition. For dynamic code, the initially stored code may differ from the code actually being executed. An example is as follows:

  {blk(100 : movi r0, Ec(j 100); sw r0, 102)}
  100: movi r0, Ec(j 100)
       sw r0, 102
       j 100

The third instruction is actually generated by the first two, and thus does not appear in the precondition. This kind of judgment can never be expressed in GCAP1, nor in any previous Hoare-logic model.

Note that our generalization does not make verification more difficult: as long as the specification and precondition are given, the well-formedness of a code block can be established in the same mechanized way as before.

The judgment Ψ ⊢ Φ (well-formed code specification) is fairly comparable with the corresponding judgment in GCAP1, once we notice that the pair (C, Ψ) is just a way to represent a more limited Φ. The rules basically follow the same idea, except that the CDHP rule allows universal quantification over code blocks: if every block in a code specification's domain is well-formed with respect to a program specification, then the code specification is well-formed with respect to that program specification.

The interpretation operator ⌊−⌋ establishes the semantic relation between program specifications and code specifications: it transforms a code specification into a program specification by uniting (i.e., taking the assertion disjunction of) the assertions of all blocks starting at the same address. In the judgment for a well-formed world (rule PROG), we use ⌊Φ⌋ as the specification to establish the well-formed code specification and the current well-formed code block. We do not need to require the current code block to be stored in memory (as GCAP1 did), since such a requirement is already specified in the assertion a.

Soundness. The soundness proof follows almost the same techniques as in GCAP0. First, owing to the similarity between the rules for well-formed code blocks, the same weakening lemmas still hold:

Lemma 5.1

    Ψ ⊢ {a} B    Ψ ⇒ Ψ′    a′ ⇒ a
    --------------------------------- (WEAKEN-BLOCK)
              Ψ′ ⊢ {a′} B

    Ψ ⊢ Φ    Ψ ⇒ Ψ′    ∀B ∈ dom(Φ′). (Φ′ B ⇒ Φ B)
    -------------------------------------------------- (WEAKEN-SPEC)
                        Ψ′ ⊢ Φ′

We also have the relation between well-formed code specifications and well-formed code blocks.

Lemma 5.2 Ψ ⊢ Φ if and only if for every B ∈ dom(Φ) we have Ψ ⊢ {Φ B} B.

Proof Sketch: For the necessity, prove by inversion of the LINK and CDHP rules and the weakening properties (Lemma 5.1). The sufficiency is trivial by the CDHP rule. □

Lemma 5.3 (Progress) If Ψ ⊢ W, then there exists a world W′ such that W ⟼ W′.

Proof Sketch: Immediate by inversion of the PROG rule, the SEQ rule, and the INSTR rule, successively. □

Lemma 5.4 (Preservation) If Ψ ⊢ W and W ⟼ W′, then Ψ ⊢ W′.

Proof Sketch: The first premise is already guaranteed. The other two premises can be verified by checking the two cases of the well-formed code block rules. □
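Returning to the interpretation operator ⌊−⌋ used in PROG: when Φ has a finite domain, it is directly computable, since at each address one simply disjoins the assertions of the blocks that start there. An illustrative OCaml sketch with block bodies and assertions kept abstract:

  (* |Phi|(f) = the disjunction, over every code block in Phi that
     starts at f, of that block's assertion. *)
  type ('i, 's) code_spec = ((int * 'i list) * ('s -> bool)) list
                                   (* (start addr, body) -> assertion *)

  let interp (phi : ('i, 's) code_spec) : int -> 's -> bool =
   fun f s ->
    List.exists (fun ((f', _body), a) -> f' = f && a s) phi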
Theorem 5.5 (Soundness of GCAP2) If Ψ ⊢ W, then Safe(W).

Local reasoning. Frame rules are still the key to supporting local reasoning.

Theorem 5.6 (Frame Rules)

              Ψ ⊢ {a} B
    --------------------------------- (FRAME-BLOCK)
    (λf. Ψ(f) ∗ a′) ⊢ {a ∗ a′} B

where a′ is independent of every register modified by B.

                 Ψ ⊢ Φ
    ------------------------------------------ (FRAME-SPEC)
    (λf. Ψ(f) ∗ a) ⊢ (λB. Φ B ∗ a)

where a is independent of every register modified by any code block in the domain of Φ.

In fact, since we no longer have the static code layer in GCAP2, the frame rules play an even more important role in achieving modularity. For example, to link two code modules that do not modify each other, we first use the frame rule to feed the code information of each module into the other's specification, and then apply the LINK rule.

5.2 Example and Discussion

We can now use GCAP2 to certify the opcode modification example given in Fig 9. There are four runtime code blocks that need to be handled. Fig 22 shows the formal specification of each code block, in both its local and its global version. Note that B1 and B2 overlap in memory, so this example cannot be certified with GCAP1 alone.

Locally, we need to make sure that each code block is indeed stored in memory before it is executed. To execute B1 and B4, we also require that the memory location at the address new store a proper instruction (which will be loaded later). On the other hand, since B4 and B2 can be executed if and only if the branch at main is taken, they both carry the precondition R($2) = R($4).


  B1   main:   beq  $2, $4, modify
               move $2, $4          # B2 starts here
               j    halt

  B2   target: addi $2, $2, 1
               j    halt

  B3   halt:   j    halt

  B4   modify: lw   $9, new
               sw   $9, target
               j    target

  Let
    a1  ≜ blk(new : addi $2, $2, 1) ∗ blk(B1)
    a1′ ≜ blk(B3) ∗ blk(B4)
    a2  ≜ (λ(M, R). R($2) = R($4)) ∗ blk(B2)
    a3  ≜ (λ(M, R). R($4) ≤ R($2) ≤ R($4) + 1)
    a4  ≜ (λ(M, R). R($2) = R($4)) ∗ blk(new : addi $2, $2, 1) ∗ blk(B1)

  Then the local judgments are as follows:
    {modify ↦ a4, halt ↦ a3} ⊢ {a1} B1
    {halt ↦ a3} ⊢ {a2} B2
    {halt ↦ a3 ∗ blk(B3)} ⊢ {a3 ∗ blk(B3)} B3
    {target ↦ a2} ⊢ {a4 ∗ blk(B4)} B4

  After applying frame rules and linking, the global specification becomes:
    {modify ↦ a4 ∗ a1′, halt ↦ a3 ∗ a1′} ⊢ {a1 ∗ a1′} B1
    {halt ↦ a3 ∗ blk(B3)} ⊢ {a2 ∗ blk(B3)} B2
    {halt ↦ a3 ∗ blk(B3)} ⊢ {a3 ∗ blk(B3)} B3
    {target ↦ a2 ∗ blk(B3)} ⊢ {a4 ∗ a1′} B4

  Figure 22. opcode.s: Code and specification

After verifying all the code blocks against their local specifications, we apply the frame rule to establish the extended specifications. As Fig 22 shows, the frame rule is applied to the local judgments of B2 and B4, adding blk(B3) to both sides to form the corresponding global judgments. For B1, blk(B3) ∗ blk(B4) is added; here the additional blk(B4) in the specification entry for halt will be weakened away by the LINK rule (the union of two program specifications used in the LINK rule is defined in Fig 14).

Finally, all these judgments are joined together via the LINK rule to establish the well-formedness of the global code. This is similar to how we certified code using GCAP1 in the previous section, except that the LIFT step is no longer required. The global code specification is exactly

  Φ_G = {B1 ↦ a1 ∗ a1′, B2 ↦ a2 ∗ blk(B3),
         B3 ↦ a3 ∗ blk(B3), B4 ↦ a4 ∗ a1′}

which satisfies ⌊Φ_G⌋ ⊢ Φ_G; ultimately the PROG rule can be applied to validate the correctness of opcode.s. In fact we have proved not only the type safety of the program but also its partial correctness: for instance, whenever the program reaches the line halt, the assertion a3 holds.

Parametric code. In SMC, as mentioned earlier, the number of code blocks we need to certify might be infinite, making it impossible to enumerate and verify them one by one. To resolve this issue, we introduce auxiliary variables (i.e., parameters) into the code body, obtaining parametric code blocks and, correspondingly, parametric code specifications.

Traditional Hoare logic allows auxiliary variables only in the pre- or post-condition of a code sequence. In our new framework, by allowing parameters to appear in the code body and its assertions at the same time, assertions, code body, and specifications can interact with each other. This makes our program logic even more expressive.

The simplest case of a parametric code block is

  f: li $2, k
     j halt

with the number k as a parameter. It simply represents a family of code blocks where k ranges over all natural numbers.

Code parameters can potentially be anything, e.g., instructions, code locations, or the operands of instructions. Taking a whole code block or code heap as a parameter allows us to express and prove even more interesting applications.

Certifying parametric code makes use of the universal quantifier in rule CDHP. In the example above we need to prove the judgment

  ∀k. (Ψ_k ⊢ {λS. True} f : li $2, k; j halt)

where Ψ_k(halt) = (λ(M, R). R($2) = k), to guarantee that the parametric code block is well-formed with respect to the parametric specification Ψ_k.
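Operationally, a parametric code block is nothing but a function from the parameter to a block, and its parametric specification a function from the parameter to assertions. A sketch, with a hypothetical two-instruction syntax and the register file modeled as a function:

  (* The family "f: li $2, k; j halt", one block per natural number k. *)
  type instr = Li of int * int | J of int    (* Li (rd, k); J target *)

  let block ~halt (k : int) : instr list = [ Li (2, k); J halt ]

  (* Psi_k(halt) = fun (M,R) -> R($2) = k. *)
  let psi (k : int) (regs : int -> int) : bool = regs 2 = k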
Parametric code blocks are not limited to verifying SMC; they are useful in other circumstances as well. For example, to certify position-independent code, i.e., code whose behavior does not depend on the absolute memory address at which it is stored, we can parameterize over the base address of the code. Parametric code can also improve modularity, for example by abstracting certain code modules out as parameters. We give more examples of parametric code blocks in Sec 6.

Expressiveness. The following theorem shows the expressiveness of GCAP2: as long as there exists an invariant for the safety of a program, GCAP2 can certify the program with a program specification that is equivalent to the invariant.

Theorem 5.7 (Expressiveness of GCAP2) If Inv is an invariant of GTM, then there is a Ψ such that for any world (S, pc) we have Inv(S, pc) ←→ ((Ψ ⊢ (S, pc)) ∧ (Ψ(pc) S)).

Proof: We construct Ψ from Inv as follows:

  Φ(f : ι) ≜ λS. Inv(S, f) ∧ Decode(S, f, ι),  for all f, ι
  Ψ ≜ ⌊Φ⌋ = λf. λS. Inv(S, f)

We first prove the forward direction. Given a world (S, pc) satisfying Inv(S, pc), we want to show that there exists an a such that Ψ ⊢ (S, pc). Observe that for all f and S,

  ⌊Φ⌋(f) S ←→ ∃ι. Φ(f : ι) S
           ←→ ∃ι. Inv(S, f) ∧ Decode(S, f, ι)
           ←→ Inv(S, f) ∧ ∃ι. Decode(S, f, ι)
           ←→ Inv(S, f)

where the last equivalence is due to the Progress property of Inv together with the transition relation of GTM.

Now we prove the first premise of PROG, i.e., ⌊Φ⌋ ⊢ Φ, which by rule CDHP is equivalent to

  ∀B ∈ dom(Φ). (⌊Φ⌋ ⊢ {Φ B} B)

that is,

  ∀f, ι. (Ψ ⊢ {Φ(f : ι)} f : ι)    (4)

which can be rewritten as

  ∀f, ι. (λf. λS′. Inv(S′, f) ⊢ {λS′. Inv(S′, f) ∧ Decode(S′, f, ι)} f : ι)    (5)

On the other hand, since Inv is an invariant, it satisfies the Progress and Preservation properties. Applying the INSTR rule, (5) is easily proved. Therefore, ⌊Φ⌋ ⊢ Φ.


It is then obvious that a S holds. And since the correctness of Φ ⊢ {a} pc:ι is included in (4), all the premises of rule STATE are satisfied. So we get Ψ ⊢ (S,pc), and it is trivial to see that Ψ(pc) S.

For the other direction: given our Ψ, if Ψ(pc) S, then Inv obviously holds on the world (S,pc). This completes the proof. □

Together with the soundness theorem (Theorem 5.5), this shows that global program specifications and global invariants are in exact correspondence for any program.

It should also come as no surprise that any program certified under GCAP1 can always be translated into GCAP2; in fact the judgments of GCAP1 and GCAP2 are closely connected. The translation is described by the following theorem, where we write ⊢GCAP1 and ⊢GCAP2 for the two kinds of judgments, and where the lifting of a GCAP1 specification is defined by

⌈({fi ↦ Ii}∗, {fi ↦ ai}∗)⌉ ≜ {(fi : Ii) ↦ ai}∗

Theorem 5.8 (GCAP1 to GCAP2)
1. If Ψ ⊢GCAP0 {a} B, then (λf. Ψ(f) ∗ blk(B)) ⊢GCAP2 {a ∗ blk(B)} B.
2. If Ψ ⊢GCAP1 (C, Ψ′), then Ψ ⊢GCAP2 ⌈(C, Ψ′)⌉.
3. If Ψ ⊢GCAP1 W, then Ψ ⊢GCAP2 W.

Proof:
1. Suppose a ∗ blk(B) holds initially. By the frame rule of CAP, blk(B) remains satisfied and unmodified throughout the execution of B, so the additional premise of INSTR is always satisfied; the conclusion follows by induction over the length of the code block.
2. With the help of 1, if Ψ ⊢GCAP1 (C, Ψ′), then for every f ∈ dom(Ψ′) we have f ∈ dom(C) and Ψ ⊢GCAP2 {Ψ′(f)} f : C(f), from which the result is immediate.
3. Directly derived from 2. □

Finally, as promised, Theorem 5.8 together with the soundness of GCAP2 directly gives the soundness of GCAP1 (Theorem 4.2).

6. More Examples and Applications

We show the certification of a number of representative examples and applications using GCAP (see Table 1 in Sec 1).

6.1 A Certified OS Boot Loader

An OS boot loader is a simple yet prevalent application of runtime code loading. It is an artifact of a limited bootstrapping protocol, but one that continues to exist to this day. The limitation on the x86 architecture is that the BIOS will only load the first 512 bytes of code into main memory for execution. The boot loader is the code contained in those bytes; it loads the rest of the OS from disk into main memory and begins executing the OS (Fig 24).

{(rDL = 0x80 ∧ D(512) = Ec(jmp $-2)) ∗ blk(B1)}
B1: bootld: movw $0, %bx        # can not use ES
            movw %bx, %es       # kernel segment
            movw $0x1000, %bx   # kernel offset
            movb $1, %al        # num sectors
            movb $2, %ah        # disk read command
            movb $0, %ch        # starting cylinder
            movb $2, %cl        # starting sector
            movb $0, %dh        # starting head
            int  $0x13          # call BIOS
            ljmp $0, $0x1000    # jump to kernel
{blk(B2)}
B2: kernel: jmp  $-2            # loop

Figure 23. bootloader.s: Code and specification

Figure 24. A typical boot loader. (Diagram: the BIOS copies sector 1 of the disk, the boot loader, to memory address 0x7c00 before start-up and starts executing there; the boot loader then copies sector 2, the kernel, to address 0x1000 and jumps to it.)
Certifying a boot loader is therefore an important piece of a complete OS certification.

To show that we support a real boot loader, we have created one that runs on the Bochs simulator [19] and on real x86 hardware. The code (Fig 23) is very simple: it sets up the registers needed for the BIOS call that reads the hard disk into the desired memory location, makes the call to actually read the disk, and then jumps to the loaded memory.

The specifications of the boot loader are also simple. rDL = 0x80 ensures that the number of the boot disk is given to the boot loader by the hardware; this value is passed unaltered to the int instruction and is needed for the BIOS call to read from the correct disk. D(512) = Ec(jmp $-2) ensures that the disk actually contains a kernel with specific code. The kernel code itself is not important, but its entry point must be verifiable under a trivial precondition, namely that the kernel is loaded. The kernel code can be changed, and the boot loader proof will not change with it, since the proof relies only on the fact that the kernel code is certified. The assertion blk(B1) just says that the boot loader is in memory when executed.

6.2 Control Flow Modification

The fundamental unit of self-modification is the modification of a single instruction, and there are several ways an instruction can be modified. Opcode modification, which mutates an operation instruction (i.e., an instruction that does not jump) into another operation instruction, was shown in Sec 5.2 as a typical example. Such code is relatively easy to deal with, since it does not change the control flow of the program. A trickier case is control flow modification: an operation instruction is mutated into a control transfer instruction (or the other way around), or one control transfer instruction is modified into another.

Control flow modification is useful, for example, for patching subroutine call addresses. We give an example here that demonstrates control flow modification and show how to verify it.

halt:   j    halt
main:   la   $9, end
        lw   $8, 0($9)
        addi $8, $8, -1
        addi $8, $8, -2
        sw   $8, 0($9)
end:    j    dead
dead:

The entry point of the program is the main line. At first look the code seems "deadly", because there is a jump to the dead line, which does not contain a valid instruction.


Before the program reaches that line, however, the instruction at end is fetched, decreased by 3, and stored back. Since a MIPS jump instruction encodes its target word address directly in the instruction word, decreasing the encoded word by 3 retargets the jump three instructions (12 bytes) earlier, so the jump at end now points to the second addi instruction. The program then continues executing from that line until it finally hits the halt line and loops forever.

Our specification for the program is given in Fig 25. There are four code blocks that need to be certified, and with them the execution process becomes very clear.

{blk(main : la $9,end; I1; j dead)}
main:   la   $9, end
        lw   $8, 0($9)
        addi $8, $8, -1
        addi $8, $8, -2
        sw   $8, 0($9)
        j    mid2

{(λ(M,R). ⟨R($8)⟩4 = Ec(j mid2) ∧ R($9) = end) ∗ blk(mid2 : I2; j mid2)}
mid2:   addi $8, $8, -2
        sw   $8, 0($9)
        j    mid1

{(λ(M,R). ⟨R($8)⟩4 = Ec(j mid1) ∧ R($9) = end) ∗ blk(mid1 : I1; j mid1)}
mid1:   lw   $8, 0($9)
        addi $8, $8, -1
        addi $8, $8, -2
        sw   $8, 0($9)
        j    halt

{blk(halt : j halt)}
halt:   j    halt

where I1 = lw $8,0($9); addi $8,$8,−1; I2   and   I2 = addi $8,$8,−2; sw $8,0($9)

Figure 25. Control flow modification

6.3 Runtime Code Generation and Code Optimization

Runtime code generation (also known as dynamic code generation), which dynamically creates executable code for later use during the execution of a program [16], is probably the broadest usage of SMC. Code generated at runtime usually performs better than static code, because more information is available at runtime. For example, if the value of a variable does not change at runtime, the program can generate specialized code with that variable as an immediate constant, enabling optimizations such as pre-calculation and dead code elimination.

[28] gives many uses of runtime code generation, and the certification of all of those applications fits into our system. Here we show the certification of one of them: the runtime code generation version of a fundamental matrix multiplication operation, vector dot product, which was also discussed in [31].

Efficient matrix multiplication is important for digital signal processing and scientific computation. The matrices involved often have characteristics such as sparseness (a large number of zeros) or small integer entries.
Runtime code generation allows locally optimized code, specialized to these characteristics, to be produced for each row of the matrix based on its actual values. Since the code for each row is generated once but used n times by the other matrix (once per column), the performance improvement easily surpasses the cost of generating the code at runtime.

The code shown in Fig 26 (which includes the specification) reads the vector v1 stored at vec1, generates specialized code at gen in which multiplications by 0 are eliminated, then runs the code at gen to compute the dot product of v1 and v2 and stores the result at memory address result. For the vector (22, 0, 25), the generated code is the one shown in Fig 28.

        .data                 # Data declaration section
vec1:   .word 22, 0, 25
vec2:   .word 7, 429, 6
result: .word 0
tpl:    li   $2, 0            # template code
        lw   $13, 0($4)
        li   $12, 0
        mul  $12, $12, $13
        add  $2, $2, $12
        jr   $31

        .text                 # Code section
# {True}
main:   li   $4, 3            # vector size
        li   $8, 0            # counter
        la   $9, gen          # target address
        la   $11, tpl         # template address
        lw   $12, 0($11)
        sw   $12, 0($9)       # copy the 1st instr
        addi $9, $9, 4
# {p(R($8))}
loop:   beq  $8, $4, post
        li   $13, 4
        mul  $13, $13, $8
        lw   $10, vec1($13)
        beqz $10, next        # skip if zero
        lw   $12, 4($11)
        add  $12, $12, $13
        sw   $12, 0($9)
        lw   $12, 8($11)
        add  $12, $12, $10
        sw   $12, 4($9)
        lw   $12, 12($11)
        sw   $12, 8($9)
        lw   $12, 16($11)
        sw   $12, 12($9)
        addi $9, $9, 16
# {p(R($8)+1)}
next:   addi $8, $8, 1
        j    loop
# {p(3)}
post:   lw   $12, 20($11)
        sw   $12, 0($9)
        la   $4, vec2
        jal  gen
# {λ(M,R). R($2) = v1 · v2}
        sw   $2, result
        j    main
gen:

Figure 26. Vector Dot Product

I(k,u) ≜ lw $13,k($4); li $12,u; mul $12,$12,$13; add $2,$2,$12   if u > 0
I(k,u) ≜ ∅                                                        if u = 0

p(k) ≜ λ(M,R). R($4) = 3 ∧ k ≤ R($4) ∧ R($9) = gen + 4 + 16k ∧
        blk(gen : li $2,0; I(0, M(vec1+0)); …; I(k−1, M(vec1+k−1))) (M,R)

Figure 27. Dot Product Assertion Macro

Since this program does not involve code modification, we use GCAP1 to certify it. Each code block can be verified separately with the assertions given in Fig 26 and Fig 27. The main part of the program can be directly linked and certified to form a well-formed code heap, while the generated code, as in Fig 28, can be verified on its own.
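The generator's central idiom is arithmetic on encoded instruction words, under the linear-encoding assumption also used later (footnote 1 in Sec 6.5): adding n to an encoded word adjusts its immediate operand by n, so adding the element value u to Ec(li $12, 0) yields Ec(li $12, u). The following self-contained sketch (ours, not from the paper) isolates just this idiom; it assumes a SPIM-style simulator that, like the one used for the paper's examples, permits stores into the text segment:

        .text
main:   la   $11, tpl         # address of the template instruction
        lw   $12, 0($11)      # $12 = Ec(li $2, 0)
        addi $12, $12, 42     # linear encoding: $12 = Ec(li $2, 42)
        sw   $12, 0($11)      # patch the template in place
tpl:    li   $2, 0            # by now rewritten: executes as "li $2, 42"
halt:   j    halt             # $2 = 42 here

In Fig 26 the same trick is applied twice per nonzero element: the byte offset held in $13 is added to Ec(lw $13, 0($4)), and the element value held in $10 is added to Ec(li $12, 0).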


# {R($31) = back ∧ R($4) = vec2}
gen:    li   $2, 0            # int gen(int *v)
        lw   $13, 0($4)       # {
        li   $12, 22          #   int res = 0;
        mul  $12, $12, $13    #   res += 22 * v[0];
        add  $2, $2, $12      #   res += 25 * v[2];
        lw   $13, 8($4)       #   return res;
        li   $12, 25          # }
        mul  $12, $12, $13
        add  $2, $2, $12
        jr   $31

Figure 28. Generated Code of Vector Dot Product

Once both parts are verified, we use the FRAMESPEC rule and the LINK rule to merge them, which completes the whole verification. The modularity of our system guarantees that the verification of the generated code (Fig 28) needs no information about its "mother code".

6.4 Runtime Code Checking

As we mentioned, the limitation of previous verification frameworks is not only about self-modifying code: they also cannot deal with "self-reading" code that performs no code modification at all. The example below, which we verify with GCAP1, is a program in which reading its own code helps decide how execution proceeds.

It is an interesting application of code-reading programs. Most architectures provide a subroutine call instruction (jal in MIPS, call in x86), which is essentially a composition of two operations: first store the address of the next instruction into the return address register ($31 in MIPS), then jump to the target address. When the subroutine finishes executing, the return address register should, hopefully, point to the next instruction of the parent program, so that execution can continue normally. This is not guaranteed, however: a vicious parent program can store a bad address into the return address register and jump into the subroutine directly. The code in Fig 29 shows how to defeat this kind of cheating with a runtime code-checking mechanism.

The main program calls f as a subroutine. After the subroutine has done its job (storing 42 into register $8), and before returning, it checks whether the instruction preceding the return address is indeed a jal (or call) instruction. If it is not, the subroutine simply halts the program; otherwise it returns. The precondition of f asserts the return address, and both addr and main store the jal f instruction, so that the code block can check the equality of these two memory locations.

# {True}
main:   jal  f
# {λ(M,R). R($8) = 42}
        move $2, $8
        j    halt
# {(λ(M,R). R($31) = main+4) ∗ blk(addr : jal f) ∗ blk(main : jal f)}
f:      li   $8, 42           # the subroutine's real job
        lw   $9, -4($31)      # fetch the instruction before the return address
        lw   $10, addr        # fetch the known-good "jal f" stored at addr
        bne  $9, $10, halt    # the caller cheated: refuse to return
        jr   $31              # otherwise the return address is trustworthy
# {R($2) = R($8) = 42}
halt:   j    halt
addr:   jal  f

Figure 29. Runtime code-checking

        .data                 # Data declaration section
num:    .byte 8
        .text                 # Code section
main:   lw   $4, num          # set argument
        lw   $9, key          # $9 = Ec(addi $2,$2,0)
        li   $8, 1            # counter
        li   $2, 1            # accumulator
loop:   beq  $8, $4, halt     # check if done
        addi $8, $8, 1        # inc counter
        add  $10, $9, $2      # new instr to put
key:    addi $2, $2, 0        # accumulate
        sw   $10, key         # store new instr
        j    loop             # next round
halt:   j    halt

Figure 30. fib.s: Fibonacci number

{blk(B1)}
B1:     main: lw   $4, num
              lw   $9, key
              li   $8, 1
              li   $2, 1
{(λ(M,R). R($4) = M(num) ∧ R($9) = Ec(addi $2,$2,0) ∧ R($8) = k+1 ∧ R($2) = fib(k+1)) ∗ blk(B2,k)}
B2,k:   loop: beq  $8, $4, halt
              addi $8, $8, 1
              add  $10, $9, $2
        key:  addi $2, $2, fib(k)
              sw   $10, key
              j    loop
{(λ(M,R). R($2) = fib(M(num))) ∗ blk(B3)}
B3:     halt: j    halt

Figure 32. fib.s: Code and specification
Note that the precondition of main does not specify the content of addr, since that can be added later via the frame rule.

6.5 Fibonacci Number and Parametric Code

To demonstrate the usage of parametric code, we construct an example fib.s, shown in Fig 30, that calculates the Fibonacci function fib(0) = 0, fib(1) = 1, fib(i+2) = fib(i) + fib(i+1). More specifically, fib.s calculates fib(M(num)), which is fib(8) = 21, and stores it into register $2.

It looks strange that this is possible, since throughout the whole program the only instructions that write $2 are the fourth instruction, which assigns 1 to it, and the line key, which seemingly does nothing.

The main trick, of course, is the code-modifying instruction on the line after key. On every loop iteration, the third operand of the addi instruction on line key is altered to the next Fibonacci number (computed beforehand and held in register $10 until the instruction is modified). Fig 31 illustrates the complete execution process.

Since the instruction at line key has an unbounded number of possible runtime encodings, we need the help of parametric code blocks. The formal specification of each code block is shown in Fig 32. We specify the program using three code blocks, where the second block, the kernel loop B2,k of our program, is a parametric one.


The parameter k appears in the operand of the key instruction, as an argument of the Fibonacci function.

Consider the execution of code block B2,k. Before it executes, $9 stores Ec(addi $2,$2,0) and $2 stores fib(k+1). The third instruction, add $10,$9,$2, therefore changes $10 into Ec(addi $2,$2,fib(k+1)),¹ so at the end of the loop $2 holds fib(k+1) + fib(k) = fib(k+2), key holds the instruction addi $2,$2,fib(k+1) instead of addi $2,$2,fib(k), and the program continues to the next iteration B2,k+1.

¹ To simplify the case, we assume in this example that the encoding of the addi instruction is linear with respect to its numerical operand.

The global code specification we finally get is

Φ ≜ {B1 ↦ a1, B2,k ↦ a2,k | k ∈ Nat, B3 ↦ a3}    (6)

Note that this formulation is only for readability; it is not directly expressible in our meta logic. To express parametric code blocks we use existential quantifiers, so the example is in fact represented as

Φ ≜ λB.λS. (B = B1 ∧ a1 S) ∨ (∃k. B = B2,k ∧ a2,k S) ∨ (B = B3 ∧ a3 S).

One can easily see the equivalence between this definition and (6).

B1:     main: lw $4, num; lw $9, key; li $8, 1; li $2, 1
B2,0:   {$8=1, $2=1, $4=8}    key line reads: addi $2, $2, 0
B2,1:   {$8=2, $2=1, $4=8}    key line reads: addi $2, $2, 1
 …
B2,6:   {$8=7, $2=13, $4=8}   key line reads: addi $2, $2, 8
B2,7:   {$8=8, $2=21, $4=8}   key line reads: addi $2, $2, 13
B3:     {$2=fib($8)=21}       halt: j halt

where fib(0)=0, fib(1)=1, fib(i+2)=fib(i)+fib(i+1), and each B2,k is the loop body
loop: beq $8,$4,halt; addi $8,$8,1; add $10,$9,$2; key: addi $2,$2,fib(k); sw $10,key; j loop.

Figure 31. fib.s: Control flow

6.6 Self Replication

Combining self-reading code with runtime code generation, we can produce a self-growing program that keeps replicating itself forever. Such code appears commonly in Core War, a game in which players write assembly programs that attack one another. Our demo code selfgrow.s is shown in Fig 33; after initializing the registers, the code repeatedly duplicates itself and continues executing the new copy, as Fig 34 shows.

main:   la   $8, loop
        la   $9, new
        move $10, $9
loop:   lw   $11, 0($8)
        sw   $11, 0($9)
        addi $8, $8, 4
        addi $9, $9, 4
        bne  $8, $10, loop
        move $10, $9
new:

Figure 33. selfgrow.s: Self-growing Code

Figure 34. selfgrow.s: Self-growing process. (Diagram: the instructions from lw through move are copied block after block down the code heap, the same lw …, …, move … pattern repeating at each new location.)

{blk(B0)}
B0:  main:  la   $8, loop
            la   $9, new
            move $10, $9
{(λ(M,R). ∃0 ≤ i < …

The block starting at loop is the code body that keeps being duplicated: during execution this part is copied to the new location, and the program then continues executing from new until another code block is duplicated. Note that here we rely on the property that branch instructions are encoded with relative addressing; thus every time our code is duplicated, the target address of the bne instruction changes accordingly.
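The relative-addressing point can be checked with a small calculation (ours, using the standard MIPS I-type branch encoding): the bne sits 16 bytes after loop, so its encoded offset field is (loop − (bne_addr + 4)) / 4 = (−16 − 4) / 4 = −5. The field mentions no absolute address, so the identical instruction word, copied anywhere, still branches five instructions back, into the copy it belongs to.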


6.7 Mutual Modification

In the examples so far, the "mother code" modifies the "child code" but never the other way around. If two modules modify each other, the situation becomes trickier, but GCAP2 handles it without much difference. Here is a mini example showing what can happen in this circumstance:

main:   j    alter            # overwritten with a nop on the first round
        sw   $8, alter        # reached on the second visit to main
alter:  lw   $8, main         # $8 = Ec(j alter), the original word at main
        li   $9, 0            # 0 encodes a nop
        sw   $9, main         # overwrite the instruction at main
        j    main

The program first jumps to the alter block; after alter has modified the instruction at main and transferred control back, the main block modifies the instruction at alter back, which makes the program loop forever.

The specification of this tangle is fairly simple, as shown in Fig 36. The number of code blocks that need to be certified is actually four, although there are only two starting addresses; both the precondition and the code body of the two blocks at the same starting address differ completely. For example, the first time the program enters main it requires only B1 itself, while on the second entry the code body has changed to B3 and the precondition requires that register $8 holds the correct value. Certifying this example in fact makes the control flow much clearer to the reader.

{blk(B1)}
B1:  main:  j    alter
{blk(B1) ∗ blk(B2)}
B2:  alter: lw   $8, main
            li   $9, 0
            sw   $9, main
            j    main
{(λ(M,R). ⟨R($8)⟩4 = Ec(j alter)) ∗ blk(B3)}
B3:  main:  nop
            sw   $8, alter
{blk(B4)}
B4:  alter: j    alter

Figure 36. Specification - Mutual Modification

6.8 Polymorphic Code

Polymorphic code is code that mutates itself while keeping its algorithm equivalent. It is commonly used in writing computer viruses and trojans that want to hide their presence. Since anti-virus software usually identifies a virus by recognizing particular byte patterns in memory, polymorphic code has become a way to evade such recognition.

The following oscillating code is a simple demo example. Every time it executes through, the content of the code switches (between B1 and B2), but its function is always to add 42 to register $2.

# B0 =
main:   la   $10, body

# B1 =                          # B2 =
body:   lw   $8, 12($10)        body:   lw   $8, 12($10)
        lw   $9, 16($10)                lw   $9, 16($10)
        sw   $9, 12($10)                sw   $8, 16($10)
        j    dead                       addi $2, 21
        addi $2, 21                     j    dead
        sw   $8, 16($10)                sw   $9, 12($10)
        lw   $9, 8($10)                 lw   $9, 8($10)
        lw   $8, 20($10)                lw   $8, 20($10)
        sw   $9, 20($10)                sw   $9, 20($10)
        sw   $8, 8($10)                 sw   $8, 8($10)
        j    body                       j    body

The safety is not obvious, as there is an illegal instruction j dead in the program. Since the code blocks that need to be certified overlap, we use GCAP2 to prove it; the specification is shown in Fig 37. Note the tiny difference between the stored code above and the code in Fig 37: it demonstrates the distinction between "stored" code blocks and "executing" code blocks, and it is what makes our verification work.

{blk(B0)}
main:   la   $10, body

{blk(B1)}                       {blk(B2)}
body:   lw   $8, 12($10)        body:   lw   $8, 12($10)
        lw   $9, 16($10)                lw   $9, 16($10)
        sw   $9, 12($10)                sw   $8, 16($10)
        addi $2, 21                     addi $2, 21
        addi $2, 21                     addi $2, 21
        sw   $8, 16($10)                sw   $9, 12($10)
        lw   $9, 8($10)                 lw   $9, 8($10)
        lw   $8, 20($10)                lw   $8, 20($10)
        sw   $9, 20($10)                sw   $9, 20($10)
        sw   $8, 8($10)                 sw   $8, 8($10)
        j    body                       j    body

Figure 37. Specification - Polymorphic Code

6.9 Code Obfuscation

The purpose of code obfuscation is to confuse debuggers.
Although there are many approaches to code obfuscation that do not involve SMC, such as control-flow-based obfuscation, SMC is a much more effective one because it is so hard to understand. By instrumenting normal code with a large amount of redundant, garbage SMC, the many memory reads and writes considerably confuse the disassembler. The method described in [15] is a typical attempt; in a PLDI'06 tutorial, Bjorn De Sutter also demonstrated Diablo, a binary rewriting tool that instruments programs with self-modifying code for obfuscation.

Certifying an SMC-obfuscated program with traditional verification techniques is extremely tough. At the same time, reasoning about this kind of code is fairly simple in our framework. Take the following code as an example:

B1:  main:  la   $8, g
            lw   $9, 0($8)        # $9  = Ec(sw $9,0($8)), the word at g
            addi $10, $9, 4       # $10 = Ec(sw $9,4($8)), by linear encoding
            sw   $10, g           # g now writes to h rather than to g
            lw   $11, h           # $11 = Ec(j dead), saved for restoration
     g:     sw   $9, 0($8)        # runs as "sw $9,4($8)": overwrites h
     h:     j    dead             # runs as "sw $9,0($8)": restores g
            sw   $11, h           # puts "j dead" back at h
            j    main

The program does several weird memory loads and stores, and it appears to contain a dead instruction, while actually it does nothing. The lines g and h change during execution and are changed back afterwards, so the code looks intact after each round of execution. With the last line removed, this piece of code can be instrumented into a regular program to fool a debugger.

In our GCAP2 system, such code can be verified with a single judgment, as simply as Fig 38 shows.

One observation is that this program can also be certified in the more traditional-style system GCAP1. In contrast, though, since GCAP1, like traditional reasoning, does not allow a code block to mutate any of its own code, the program would have to be divided into several smaller intact parts for certification. That approach has several disadvantages. First, the resulting code blocks are too small and too difficult to manipulate. Second, this approach would fail …


{blk(B1)}
main:   la   $8, g
        lw   $9, 0($8)
        addi $10, $9, 4
        sw   $10, g
        lw   $11, h
        sw   $9, 4($8)
        sw   $9, 0($8)
        sw   $11, h
        j    main

Figure 38. Specification - Obfuscation

[Figure: code encryption. Before decryption, the code heap holds the decrypter followed by the encrypted main code, opaque words such as 0x12c5fa93, 0x229ab1d8, 0x98d93bc3, 0xcc693de8, 0x127f983b, 0x907e6d64, …; after decrypting, the same region holds the decrypted main code: beq $8, $9, 8; nop; li $2, 1; li $3, 2; add $2, $2, $3; li $4, 5; …]

# {True}
main:   la   $8, pg
        la   $9, pgend
        li   $10, 0xffffffff
# {(λ(M,R). (pg ≤ R($8) < …
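A decrypter of this shape establishes the bounds of the encrypted region (pg to pgend) and a key in $10. A minimal sketch of the matching decryption loop, our reconstruction rather than the paper's actual code, XORs each word of the region in place and then falls through into the freshly decrypted code; it assumes the trivial bitwise-NOT key suggested by the li above, a hypothetical two-word payload, and a SPIM-style simulator that accepts .word data in the text segment and stores into it:

        .text
main:   la   $8, pg           # start of the encrypted region
        la   $9, pgend        # end of the encrypted region
        li   $10, 0xffffffff  # key: decryption is bitwise NOT
dec:    lw   $11, 0($8)       # load one encrypted word
        xor  $11, $11, $10    # decrypt it
        sw   $11, 0($8)       # write the instruction back in place
        addi $8, $8, 4        # advance one word
        bne  $8, $9, dec      # loop until the whole region is decrypted
pg:     .word 0xffffffff      # decrypts to 0x00000000, i.e. nop
        .word 0xefff0000      # decrypts to 0x1000ffff, "beq $0,$0,-1": a halt loop
pgend:

As with the boot loader in Sec 6.1, one would expect the precondition at pg to only need to guarantee that certified code has been placed there, leaving the decrypter's proof independent of which program was encrypted.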


Self-modifying techniques in shellcoding offer nothing new compared with the previous examples and can be certified without surprise.

7. Implementation

We use the Coq proof assistant [32], which comes with an expressive underlying logic, to implement our full system, including the formal construction of GTM, M_MIPS, M_x86, GCAP0, GCAP1, and GCAP2, as well as the proofs of soundness and related properties of all three systems. The assertion language and the basic properties of separation logic are also implemented and proved.

We have certified several typical examples, including a complete OS boot loader written in x86 to demonstrate GCAP1 and the Fibonacci example in MIPS to demonstrate GCAP2. The system can be immediately used for more realistic and theoretical applications.

8. Related Work and Conclusion

Previous assembly code certification systems (e.g., TAL [23], FPCC [1, 10], and CAP [33, 25]) all treat code separately from data memory, so that only immutable code is supported. Appel et al. [2, 22] described a model that treats machine instructions as data in the von Neumann style; they raised the verification of runtime code generation as an open problem but did not provide a solution. TALT [6] also implements a von Neumann machine model in which code is stored in memory, but it still does not support SMC. TAL/T [13, 31] is a typed assembly language that provides some limited capabilities for manipulating code at runtime; TAL/T code is compiled from Cyclone, a type-safe C subset with extra support for template-based runtime code generation. However, since its code generation is implemented by specific macro instructions, it does not support any code modification at runtime. Beyond these, very little work has been done on the certification of self-modifying code: previous program verification systems, including Hoare logic, type systems, and proof-carrying code [24], consistently maintain the assumption that program code stored in memory is immutable.

We have developed a simple Hoare-style framework for modularly verifying general von Neumann machine programs, with strong support for self-modifying code. By statically specifying and reasoning about all possible runtime code sequences, we can now successfully verify arbitrary runtime code modification and/or generation. Taking a unified view of code and data has given us some surprising benefits: we can now apply separation logic to support local reasoning on both program code and regular data structures.

Acknowledgment

We thank Xinyu Feng, Zhaozhong Ni, Hai Fang, and the anonymous referees for their suggestions and comments on an earlier version of this paper. Hongxu Cai's research is supported in part by the National Natural Science Foundation of China under Grant No. 60553001 and by the National Basic Research Program of China under Grants No. 2007CB807900 and No. 2007CB807901. Zhong Shao and Alexander Vaynberg's research is based on work supported in part by gifts from Intel and Microsoft, and NSF grant CCR-0524545. Any opinions, findings, and conclusions contained in this document are those of the authors and do not reflect the views of these agencies.

References

[1] A. W. Appel. Foundational proof-carrying code. In Proc. 16th IEEE Symp. on Logic in Computer Science, pages 247–258, June 2001.

[2] A. W. Appel and A. P. Felty. A semantic model of types and machine instructions for proof-carrying code. In Proc. 27th ACM Symposium on Principles of Programming Languages, pages 243–253, Jan. 2000.

[3] D. Aucsmith.
Tamper resistant software: An implementation. In Proceedings of the First International Workshop on Information Hiding, pages 317–333, London, UK, 1996. Springer-Verlag.

[4] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proc. 2000 ACM Conf. on Prog. Lang. Design and Implementation, pages 1–12, 2000.

[5] H. Cai, Z. Shao, and A. Vaynberg. Certified self-modifying code (extended version & Coq implementation). Technical Report YALEU/DCS/TR-1379, Yale Univ., Dept. of Computer Science, Mar. 2007. http://flint.cs.yale.edu/publications/smc.html.

[6] K. Crary. Toward a foundational typed assembly language. In Proc. 30th ACM Symposium on Principles of Programming Languages, pages 198–212, Jan. 2003.

[7] S. Debray and W. Evans. Profile-guided code compression. In Proceedings of the 2002 ACM Conference on Programming Language Design and Implementation, pages 95–105, New York, NY, 2002. ACM Press.

[8] R. W. Floyd. Assigning meaning to programs. Communications of the ACM, Oct. 1967.

[9] N. Glew and G. Morrisett. Type-safe linking and modular assembly language. In Proc. 26th ACM Symposium on Principles of Programming Languages, pages 250–261, Jan. 1999.

[10] N. A. Hamid, Z. Shao, V. Trifonov, S. Monnier, and Z. Ni. A syntactic approach to foundational proof-carrying code. In Proc. 17th Annual IEEE Symp. on Logic in Computer Science, pages 89–100, July 2002.

[11] G. M. Henry. Flexible high-performance matrix multiply via a self-modifying runtime code. Technical Report TR-2001-46, Department of Computer Sciences, The University of Texas at Austin, Dec. 2001.

[12] C. A. R. Hoare. Proof of a program: FIND. Communications of the ACM, Jan. 1971.

[13] L. Hornof and T. Jim. Certifying compilation and run-time code generation. Higher Order Symbol. Comput., 12(4):337–375, 1999.

[14] S. S. Ishtiaq and P. W. O'Hearn. BI as an assertion language for mutable data structures. In Proc. 28th ACM Symposium on Principles of Programming Languages, pages 14–26, 2001.

[15] Y. Kanzaki, A. Monden, M. Nakamura, and K. Matsumoto. Exploiting self-modification mechanism for program protection. In COMPSAC '03, page 170, 2003.

[16] D. Keppel, S. J. Eggers, and R. R. Henry. A case for runtime code generation. Technical Report UWCSE 91-11-04, University of Washington, November 1991.

[17] L. Lamport. The temporal logic of actions. ACM Transactions on Programming Languages and Systems, 16(3):872–923, May 1994.

[18] J. Larus. SPIM: a MIPS32 simulator. v7.3, 2006.

[19] K. Lawton. BOCHS: IA-32 emulator project. v2.3, 2006.

[20] P. Lee and M. Leone. Optimizing ML with run-time code generation. In Proc. 1996 ACM Conf. on Prog. Lang. Design and Implementation, pages 137–148. ACM Press, 1996.

[21] H. Massalin. Synthesis: An Efficient Implementation of Fundamental Operating System Services. PhD thesis, Columbia University, 1992.

[22] N. G. Michael and A. W. Appel. Machine instruction syntax and semantics in higher order logic. In International Conference on Automated Deduction, pages 7–24, 2000.

[23] G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to typed assembly language. ACM Transactions on Programming Languages and Systems, 21(3):527–568, 1999.

[24] G. Necula. Proof-carrying code. In Proc. 24th ACM Symposium on Principles of Programming Languages, pages 106–119, New York, Jan. 1997. ACM Press.


[25] Z. Ni and Z. Shao. Certified assembly programming with embedded code pointers. In Proc. 33rd ACM Symposium on Principles of Programming Languages, Jan. 2006.

[26] P. Nordin and W. Banzhaf. Evolving Turing-complete programs for a register machine with self-modifying code. In Proc. of the 6th International Conf. on Genetic Algorithms, pages 318–327, 1995.

[27] B. C. Pierce. Advanced Topics in Types and Programming Languages. The MIT Press, Cambridge, MA, 2005.

[28] M. Poletto, W. C. Hsieh, D. R. Engler, and M. F. Kaashoek. 'C and tcc: a language and compiler for dynamic code generation. ACM Transactions on Programming Languages and Systems, 21(2):324–369, 1999.

[29] Ralph. Basics of SMC. http://web.archive.org/web/20010425070215/awc.rejects.net/files/text/smc.txt, 2000.

[30] J. Reynolds. Separation logic: a logic for shared mutable data structures. In Proc. 17th IEEE Symp. on Logic in Computer Science, 2002.

[31] F. M. Smith. Certified Run-Time Code Generation. PhD thesis, Cornell University, Jan. 2002.

[32] The Coq Development Team, INRIA. The Coq proof assistant reference manual. The Coq release v8.0, 2004-2006.

[33] D. Yu, N. A. Hamid, and Z. Shao. Building certified libraries for PCC: Dynamic storage allocation. Science of Computer Programming, 50(1-3):101–127, Mar. 2004.


