
MaJIC: Compiling MATLAB for Speed and Responsiveness*

George Almási and David Padua
galmasi,padua@cs.uiuc.edu
Department of Computer Science
University of Illinois at Urbana-Champaign

ABSTRACT

This paper presents and evaluates techniques to improve the execution performance of MATLAB. Previous efforts concentrated on source-to-source translation and batch compilation; MaJIC provides an interactive frontend that looks like MATLAB and compiles/optimizes code behind the scenes in real time, employing a combination of just-in-time and speculative ahead-of-time compilation. Performance results show that the proper mixture of these two techniques can yield near-zero response time as well as performance gains previously achieved only by batch compilers.

Categories and Subject Descriptors

D.3.4 [Programming Languages]: Interpreters, Compilers, Code Generation, Run-time environments

General Terms

Design, Languages, Algorithms, Performance

1. INTRODUCTION

MATLAB [15], a product of Mathworks Inc., is a popular programming language and development environment for numeric applications. The MATLAB programming language resembles FORTRAN 90 in that it deals with vectors and matrices, but unlike FORTRAN it is weakly typed and polymorphic.

The main strengths of MATLAB lie both in its interactive nature, which makes it a handy exploration tool, and in the richness of its precompiled libraries and toolboxes.

The main weakness of MATLAB is its slow execution, especially when compared to similarly written code in FORTRAN. Because MATLAB has weak typing, the interpreter in the development environment has to check types at runtime, resulting in substantial performance loss.

∗ This work was supported in part by NSF contract ACI98-70687.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

PLDI'02, June 17-19, 2002, Berlin, Germany.
Copyright 2002 ACM 1-58113-463-0/02/0006 ...$5.00.


Previous work with MATLAB to FORTRAN translators, notably the FALCON compiler [9, 8], has shown a performance increase of up to three orders of magnitude by employing compile-time type analysis to reduce the number of runtime checks.

MaJIC (Matlab Just-In-time Compiler) aims to achieve the same performance goals without sacrificing the interactive nature of MATLAB. Like FALCON, it attempts to remove the overhead of runtime type checks by compiling code instead of interpreting it. Unlike FALCON, which is a batch compiler, MaJIC preserves interactive behavior by minimizing – or hiding – compilation time. MaJIC attempts to compile code ahead of time by speculation; whenever speculation fails, MaJIC falls back to just-in-time compilation.

MaJIC's dynamic (JIT) compiler reduces compile time as much as possible. It consists of an extremely fast type inference engine and a relatively naive, but fast, code generation engine. Compilation is performed as late as possible in order to gather more runtime information, on the theory that better runtime information allows the compiler to skip time-consuming optimization steps.

In addition to JIT compilation, MaJIC also performs speculative ahead-of-time compilation. Looking at source code only, the compiler guesses the run-time context most likely to occur in practice. If the guess is correct, the end result is highly optimized code that will have been compiled by the time it is needed, effectively hiding compilation latency. A wrong guess by the compiler results, at worst, in degraded performance, but never affects program correctness: MaJIC contains a mechanism to ensure that code is only executed if its semantics are guaranteed.

The rest of this paper is structured as follows. Section 2 describes the software architecture of MaJIC and optimization techniques related to JIT type inference and speculative type inference. Section 3 presents and analyzes the performance results we obtained. Section 4 offers a brief survey of related work. In Section 5 we present our conclusions.

2. SOFTWARE ARCHITECTURE

MaJIC's users interact with a MATLAB-like front end: a compatible interpreter that can execute MATLAB code at approximately MATLAB's original speed. However, MaJIC's front end doesn't attempt to execute all code: it defers computationally complex tasks (in the current implementation, function calls) to the code repository. To pass work to the repository, the MaJIC front end builds an invocation containing the name of a MATLAB function and the values of the parameters (if any).



The code repository is a database of compiled code. It compiles code on its own, ahead of time, by snooping the source code directories, maintaining dependency information between source code and object code and triggering recompilations when the source code changes. The repository can also compile code as a result of user actions (such as invoking MATLAB functions).

The code repository collects the type information necessary for compiling MATLAB code. This type information comes from different sources: directly from the user (i.e. when the user calls a function directly), from earlier runs of the same code, or from the type speculator.

The code repository responds to requests for compiled code by the interpreter. It has a type matching system (described in Section 2.2.1) that allows the retrieval of semantically correct compiled code for a given invocation by the interpreter. A failure to find appropriate code usually triggers a compilation; since this typically happens during program execution, where time is at a premium, the JIT compiler is used in this situation. The generated code can later be recompiled (and replaced in the repository) using a better compiler.

The compiler itself has the task of turning source code into executable code. The compiler's passes are shown in Figure 1.

• The first pass is a scanner/parser which transforms MATLAB source into an abstract syntax tree (AST). MaJIC's parser is based on FALCON's parser with a few minor improvements.

• Next, preliminary data flow analysis (disambiguation) is performed to build a static symbol table. At this point the compiler can optionally perform function inlining (which then necessitates the re-building of the symbol table).

• When the symbol table is complete, the compiler performs type inference. This pass conservatively assigns types to all expressions in the program text. In JIT compilation mode, the type inference engine uses runtime information fed to it by the repository; in speculative mode, the inference engine uses only the AST and the symbol table and produces speculative results.

• The last step of the compilation is code generation. There is one code generator each for JIT and for speculative mode. The JIT code generator builds code fast and in memory; in speculative mode, the code generator builds C or Fortran source code, which is then compiled and linked with platform native tools.

[Figure 1: MaJIC compiler passes — (1) parser, taking source code and a type signature; (2) disambiguator and inliner, producing the compiled AST, symbol table and U/D chains; (3) JIT or speculative type inference, producing type annotations and a function call map; (4) JIT or speculative-mode code generator, producing native/object code]

In the next few sections we present some of the compiler passes in more detail.

2.1 Disambiguating MATLAB symbols

Other than keywords, symbols in MATLAB can represent variables, calls to built-in primitives, or calls to user functions. The interpreter recognizes a symbol as a variable when it appears on the left side of an assignment, or else if it has an entry in the dynamic symbol table of the interpreter. A symbol not recognized as a variable is potentially a built-in primitive; if it cannot be resolved as either a variable or a built-in, the MATLAB interpreter also consults the dynamic table of existing user functions. If the symbol cannot be found there either, its occurrence is treated as an error.

Unlike the MATLAB interpreter, MaJIC needs to identify symbol meanings at compile time; but some symbols' meanings are hard to determine without running the code. Figure 2 shows code with ambiguous symbols. The first box shows a loop where the first occurrence of the symbol i is ambiguous, interpreted by MATLAB as √−1 in the first iteration, and as a variable in all following iterations.

The second box contains a loop where compiler analysis would recognize the right-hand-side occurrence of y as a possibly undefined variable, or even a user function, if control flow is not taken into account. Looking at control flow, however, makes it obvious that y can be accessed only after having been defined.

    clear
    while(...),
        z = i;
        i = z+1;
    end

    clear
    x=0;
    for p=1:N,
        if (p ≥ 2) then x = y;
        y = p;
    end

Figure 2: Ambiguous symbols in MATLAB

Ambiguous symbols are rare in practice and almost always a sign of buggy code. MaJIC does deal with them: it defers their processing until runtime. Non-ambiguous variables can, however, be identified at compile time by a variation of reaching definitions analysis: a symbol that has a reaching definition as a variable on all paths leading to it must be a variable. This analysis is the first pass of the MaJIC compiler.

2.2 The type system

MaJIC's type system is used by the type inference engine and by the code repository. The type system is inspired by that of FALCON, which in turn was influenced by the APL [6] and SETL compilers. MaJIC's notion of a type is represented by the Cartesian product of several lattices, as follows.

The intrinsic type of an expression is an element of the finite lattice Li formed by the elements real, integer, boolean, complex and string, together with the requisite comparison operator:

    Li = {J, ⊥i, ⊤i, ⊑i, ⊔i}, where
    J = {⊥i, bool, int, real, cplx, strg, ⊤i}
    ⊥i ⊑i bool ⊑i int ⊑i real ⊑i cplx ⊑i ⊤i and
    ⊥i ⊑i strg ⊑i ⊤i

A MaJIC expression's shape Ls consists of a pair of values, one each for the number of rows and columns of the expression. In the current version of MaJIC we only consider Fortran-like two-dimensional shapes:

    Ls = {N × N, ⊥s, ⊤s, ⊑s, ⊔s}, where
    ⊥s = ⟨0, 0⟩, ⊤s = ⟨∞, ∞⟩;
    ⟨a, b⟩ ⊑s ⟨c, d⟩ iff a ≤ c and b ≤ d

An expression's range Ll is the interval of values the expression can take [4]. We define ranges only for real numbers; strings and complex expressions do not have associated ranges. The two numbers in the range define the (inclusive) lower and upper limits of an interval. The lower limit is always less than or equal to the upper limit, or else the range is malformed:

    Ll = {R × R, ⊥l, ⊤l, ⊑l, ⊔l}, where
    ⊥l = ⟨nan, nan⟩; ⊤l = ⟨−∞, ∞⟩;
    ⟨a, b⟩ ⊑l ⟨c, d⟩ iff ⟨a, b⟩ = ⊥l or (c ≤ a and b ≤ d)

The type system is the Cartesian product T = Li × Ls × Ls × Ll. Ls appears twice, because MaJIC tracks lower as well as upper bounds of shape descriptors. We will use the collective term "shape" to mean both descriptors together. Thus the type system consists of intrinsic type, shape and range information.
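To make this concrete, the following sketch (ours, not MaJIC's source; all names are hypothetical) shows one plausible C encoding of the product T with component-wise join operators:

    #include <math.h>

    typedef enum { I_BOT, I_BOOL, I_INT, I_REAL, I_CPLX, I_STRG, I_TOP } itype_t;

    /* Join in Li: bool < int < real < cplx along the numeric chain;
     * strg is comparable only with bottom and top. */
    static itype_t itype_join(itype_t a, itype_t b) {
        if (a == b) return a;
        if (a == I_BOT) return b;
        if (b == I_BOT) return a;
        int a_num = (a >= I_BOOL && a <= I_CPLX);
        int b_num = (b >= I_BOOL && b <= I_CPLX);
        if (a_num && b_num) return a > b ? a : b;  /* numeric chain */
        return I_TOP;                              /* strg vs. numeric */
    }

    typedef struct { double rows, cols; } shape_t;  /* INFINITY encodes top */
    typedef struct { double lo, hi; } range_t;      /* NaN,NaN encodes bottom */

    /* Component-wise join in Ls: ⊑s is component-wise <=, so join is max. */
    static shape_t shape_join(shape_t a, shape_t b) {
        shape_t r = { fmax(a.rows, b.rows), fmax(a.cols, b.cols) };
        return r;
    }

    /* Join in Ll is the interval hull; bottom is the empty range. */
    static range_t range_join(range_t a, range_t b) {
        if (isnan(a.lo)) return b;
        if (isnan(b.lo)) return a;
        range_t r = { fmin(a.lo, b.lo), fmax(a.hi, b.hi) };
        return r;
    }

    /* T = Li x Ls x Ls x Ll: intrinsic type, both shape bounds, range. */
    typedef struct {
        itype_t intrinsic;
        shape_t minshape, maxshape;
        range_t range;
    } type_t;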

2.2.1 Type signatures

Suppose that a function we are compiling has n formal parameters {f1, f2, ..., fn}. We assign the following types to the parameters: T = {T1, T2, ..., Tn}, where each Ti is an element of the type system T, 1 ≤ i ≤ n. We call T the type signature of the compiled code.

We use type signatures to determine whether compiled code is safe to execute, given a particular invocation. MaJIC generates code in such a way that an invocation of the compiled code with the actual parameters {a1, a2, ..., an} having types {Q1, Q2, ..., Qn} is safe if Qi ⊑ Ti, 1 ≤ i ≤ n. An actual invocation is safe as long as the types of the inputs are subtypes of the type signature of the compiled code.

The code repository may contain, at any time, several compiled versions of the same code, differing only in the assumptions about the types of input parameters (Figure 3 shows a simple function with a single parameter as an example). The function locator has to match a given invocation to a version of compiled code in the repository that is safe to execute (i.e. preserves the semantics of the program), and at the same time is optimal performance-wise. In order to do so, the function locator checks the type signature of the invocation against the signatures of the existing compiled objects in the repository, until a matching object is found or all repository objects are exhausted. When several matching objects exist, the code repository uses simple heuristics to find the best matching candidate for a particular call, based on a Manhattan-like "distance" between the type signature of the invocation and the matching compiled code.
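A hedged sketch of such a locator, reusing the type_t encoding from the previous sketch (sig_matches and sig_distance are invented names; only the intrinsic component is spelled out here):

    /* a <= b in the intrinsic lattice iff their join is b */
    static int itype_le(itype_t a, itype_t b) {
        return itype_join(a, b) == b;
    }

    /* An invocation with actual types Q may run code with signature T
     * only if every Q[i] is a subtype of T[i]. */
    static int sig_matches(const type_t *Q, const type_t *T, int n) {
        for (int i = 0; i < n; i++) {
            if (!itype_le(Q[i].intrinsic, T[i].intrinsic))
                return 0;
            /* ... analogous checks on the shape bounds and ranges ... */
        }
        return 1;
    }

    /* Among the safe candidates, prefer the "closest" signature: a
     * Manhattan-like distance summed over per-parameter gaps. */
    static int sig_distance(const type_t *Q, const type_t *T, int n) {
        int d = 0;
        for (int i = 0; i < n; i++)
            d += (int)T[i].intrinsic - (int)Q[i].intrinsic;
        return d;
    }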

2.3 Type inference

The type inference engine is an iterative, join-over-all-paths monotone data-flow analysis framework [17]. It starts out with the control flow graph (CFG) of a MATLAB program and, in the case of JIT type inference, a type signature T (where |T| is equal to the number of formal parameters of the function that is being compiled). The result of type inference is a set of type annotations S, one type for each expression node in the abstract syntax tree. S is a conservative estimate of the types that expression nodes can assume during execution. The annotations are later used by the code generator.

Because MaJIC has a relatively simple type system, and because the type inference engine avoids symbolic computation and caps the number of iterations, the type inference engine is fast enough for use by the JIT compiler.
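As a toy illustration of the iteration scheme (ours, reusing itype_join from the earlier sketch), the program below infers the intrinsic type of x at the head of a loop such as x = 1; while (...), x = x + 0.5; end, with the iteration count capped as described:

    #include <stdio.h>

    int main(void) {
        itype_t x = I_INT;                 /* x = 1 before the loop enters */
        for (int iter = 0; iter < 10; iter++) {  /* capped, per the text */
            /* transfer for the body "x = x + 0.5": adding a real literal
             * promotes the result along the numeric chain */
            itype_t body = itype_join(x, I_REAL);
            /* loop head: join of the entry type and the back-edge type */
            itype_t head = itype_join(I_INT, body);
            if (head == x) break;          /* fixpoint reached */
            x = head;
        }
        printf("x at loop head: %s\n", x == I_REAL ? "real" : "other");
        return 0;
    }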

2.3.1 Transfer functions

The transfer functions of the type inference engine are implemented as a set of rules in a type calculator. The calculator has two modes of operation: in forward mode it infers expression types from argument types; in backward mode it infers argument types from the expressions' types (this mode is used by the type speculator).

Multiple type calculation rules may exist for each AST node type. Each rule is guarded by a boolean precondition. When the type calculator is invoked with a particular AST node as argument, the corresponding rules' preconditions are tested in order until one evaluates to true; the rule is then applied to calculate the result(s).

A rational way of ordering type inference rules is to progress from the most restrictive ones to the least restrictive ones. Evaluating more restrictive rules first makes sense because these generally lead to better performance, whereas more general rules tend to yield generic, low-performance code. If no rule's precondition evaluates to true, the type calculator applies the implicit default rule: all output types are set to ⊤. This allows the type inference engine to behave conservatively for language constructs that have no corresponding rules in the database.

Thus, for example, the "*" operator in MaJIC can be evaluated successively as an instance of: integer scalar multiply; real scalar multiply; complex scalar multiply; real scalar × vector or vector × scalar; part of a dgemv operation; or a generic complex matrix multiply. This does not exhaust all possibilities, but these are the categories for which MaJIC can generate successively less optimized code.

The MATLAB source in Figure 3 is:

    function p=poly(x)
    p = x.^5+3*x+2;
    return

Generated code (C plus MATLAB C library functions) for successively weaker type signatures:

itype(x)=int, shape(x)=scalar, limits(x)=⟨3,3⟩ (a constant):

    int poly1_sig0() {
        return 254;
    }

itype(x)=int, shape(x)=scalar, limits(x)=⊤l:

    int poly1_sig1(int x) {
        return x*x*x*x*x+3*x+2;
    }

itype(x)=real, shape(x)=scalar, limits(x)=⊤l:

    double poly1_sig2(double x) {
        return x*x*x*x*x+3.0*x+2.0;
    }

itype(x)=real, minshape(x)=maxshape(x) (a fixed three-element shape), limits(x)=⊤l:

    double *poly_sig3(double x[3]) {
        static double tmp2[3];
        tmp2[0]=x[0]*x[0]*x[0]*x[0]*x[0]+3.0*x[0]+2.0;
        tmp2[1]=x[1]*x[1]*x[1]*x[1]*x[1]+3.0*x[1]+2.0;
        tmp2[2]=x[2]*x[2]*x[2]*x[2]*x[2]+3.0*x[2]+2.0;
        return tmp2;
    }

itype(x)=complex, shape(x)=⊤s, limits(x)=⊤l:

    mxArray *poly4_sig1(mxArray *x) {
        mxArray *tmp1 = mlfScalar(5.0);
        mxArray *tmp2 = mlfPower(x,tmp1); mxFree(tmp1);
        mxArray *tmp3 = mlfScalar(3.0);
        mxArray *tmp4 = mlfTimes(tmp3,x); mxFree(tmp3);
        mxArray *tmp5 = mlfPlus(tmp2, tmp4);
        mxFree(tmp2); mxFree(tmp4);
        mxArray *tmp6 = mlfScalar(2.0);
        mxArray *tmp7 = mlfPlus(tmp5, tmp6);
        mxFree(tmp5); mxFree(tmp6);
        return tmp7;
    }

Figure 3: Type signatures and generated code. The operators itype(x), shape(x) and limits(x) refer to type components from the type lattice defined earlier.
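Such guarded rule dispatch could look roughly like this (again a sketch building on the earlier type_t encoding; rule_t, is_scalar and eval_mul are illustrative names, not MaJIC's implementation):

    /* Rules are tried most-restrictive first; the first precondition
     * that holds fires. */
    typedef struct {
        int    (*guard)(const type_t *a, const type_t *b);  /* precondition */
        type_t (*apply)(const type_t *a, const type_t *b);  /* transfer     */
    } rule_t;

    static int is_scalar(const type_t *t) {
        return t->maxshape.rows <= 1 && t->maxshape.cols <= 1;
    }

    /* Guard for the most restrictive "*" rule: integer scalar multiply. */
    static int both_int_scalar(const type_t *a, const type_t *b) {
        return is_scalar(a) && is_scalar(b)
            && a->intrinsic <= I_INT && b->intrinsic <= I_INT;
    }
    /* ... guards for real scalar, complex scalar, scalar*vector, dgemv ... */

    type_t eval_mul(const rule_t *rules, int nrules,
                    const type_t *a, const type_t *b) {
        for (int i = 0; i < nrules; i++)
            if (rules[i].guard(a, b))
                return rules[i].apply(a, b);
        /* implicit default rule: every output type goes to TOP */
        type_t top = { I_TOP, { 0, 0 }, { INFINITY, INFINITY },
                       { -INFINITY, INFINITY } };
        return top;
    }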

Currently, MaJIC's type calculator contains about 250 rules. Each MATLAB expression/operator type has at least one entry in the database; many of MATLAB's built-in functions have several entries each. Our current implementation covers just enough of MATLAB to execute the benchmarks efficiently. The type inference engine can handle all other language features by resorting to the default rule.

2.4 JIT type inference

In JIT mode, the type calculator performs only forward analysis. Type inference propagates the type signature of the function to calculate type annotations for the function body. Since the type inference system is biased toward speed at the expense of precision, one would expect the quality of type annotations to suffer when performing just-in-time type inference. However, JIT type inference operates with very precise initial data: the type signature of the code, derived directly from the input values of the runtime invocation. Under these circumstances type inference is not only precise but lends itself to a number of extensions, which extract additional information from the type inference process at little or no additional cost:

• Constant propagation: Range propagation (the part of type inference which deals with the Ll lattice) can be thought of as a generalization of constant propagation for real scalars. A real value is a constant if its lower and upper limits are equal. Given a type signature that contains many constants, most of the transfer functions are able to calculate exact lower and upper limits for scalar objects, effectively performing constant propagation as part of type inference.

Range propagation does not work for complex numbers and non-numeric values, so constant propagation does not work for these either.

• Exact shape inference: MaJIC propagates lower and upper bounds for array shape information. An array's shape is exactly determined if the lower and upper shape bounds are equal. Just as constants can be determined given good input data, exact array shapes can be determined also. Sometimes value range propagation and shape propagation collaborate on determining exact shapes. For example, in the statement A = zeros(m,n), the value ranges of m and n may uniquely determine the shape of A.

In array assignments of the form A(i)=..., the range of the index can determine the shape of the array A (because MATLAB arrays reshape themselves to accommodate indices).

There are a number of ways in which exact shapes can be used to achieve better performance. For example, by completely unrolling simple operations on small arrays we can eliminate all control flow from the operation.

• Subscript check removal: MATLAB mandates subscript checks on all array accesses. The removal of unnecessary subscript checks is a major source of performance enhancement in MaJIC (a sketch of the resulting code contrast follows this list).

Older versions of MATLAB's own compiler, mcc, have command line switches to disable subscript checks (including resizing checks). This can cause code that otherwise is correct to run incorrectly when compiled with mcc. Newer versions of mcc have consequently discontinued the option.

MaJIC removes subscript checks automatically and conservatively, by using the range and shape information readily available during type inference. Because JIT type inference propagates these exactly, the extra effort needed for subscript check analysis is extremely low, comparing favorably with more conventional techniques [13].
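As an illustration of the code contrast involved (our example, not MaJIC's actual output; bounds_error is a hypothetical runtime helper), consider a MATLAB-style loop for i=1:n, s = s + A(i); end:

    /* without range information: every access is checked */
    extern void bounds_error(const char *name, int idx);

    double sum_checked(const double *a, int a_len, int n) {
        double s = 0.0;
        for (int i = 1; i <= n; i++) {
            if (i < 1 || i > a_len)      /* MATLAB-mandated subscript check */
                bounds_error("A", i);
            s += a[i - 1];
        }
        return s;
    }

    /* with JIT range propagation having proved 1 <= i <= n <= a_len,
     * the check is provably redundant and the emitted loop is simply: */
    double sum_unchecked(const double *a, int n) {
        double s = 0.0;
        for (int i = 1; i <= n; i++)
            s += a[i - 1];
        return s;
    }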

2.5 Type speculation

Just-in-time type inference assumes that the full calling context (i.e. the type signature) is available to the analyzer. By contrast, type speculation assumes nothing about the calling context: it guesses the likely types of the arguments. This allows the compiler to process the code ahead of time, applying advanced (and time-consuming) loop optimizations in order to generate good code.

The type speculator's trick is to back-propagate certain type hints from the body of the code to the input parameters. Type hints are collected from syntactic constructs that suggest, but do not command, particular semantic meanings. These constructs originate in part from programmers' tendency to avoid arcane MATLAB-specific constructs, sticking instead to features already prevalent in Fortran-like languages. Other hints can be derived from some MATLAB built-in functions' affinity towards certain inputs. The following list summarizes the type hints used by MaJIC's speculator:

• When processing the colon operator (:), used to specify index ranges, MATLAB silently ignores the imaginary part of the index arguments. Even if the index is a complex array, only the real part of its first element is used, and all indices are of course rounded before use. This suggests that operands of the interval operator are almost always integer scalars.

• Relational operators disregard the imaginary components of their operands. Also, relational operations between vectors are possible but are rare in practice, since their semantics are non-intuitive. This holds even more strongly for expressions that form the condition of an if-statement or a while-statement.

• The MATLAB bracket operator (vector constructor) collates several matrices into a new, larger matrix. The components all have to have either the same number of rows or the same number of columns. In practice the bracket operator is often used to build vectors out of scalars. When we can prove that one of the arguments xi of the bracket operator [x1 x2 ... xn] is a scalar, all other arguments are probably scalars too.

• In matrix index expressions of the form A(idx) and A(idx1,idx2), if the subscript is an expression or a variable then it is likely scalar. This is a reasonable assumption because many MATLAB applications use either Fortran77- or Fortran90-compatible array indexing operations. Fortran90 syntax is indicated by the presence of the colon (:) operator; the lack of colons indicates Fortran77 syntax.

• Arguments to a number of built-in functions, such as zeros, ones, rand, the second argument of size and many others, are likely integer scalars. MATLAB issues warnings when the arguments in question are non-scalars or non-integers, but does not stop processing. However, most well-written MATLAB programs don't intentionally produce these warnings.

These hints are implemented as type calculator rules. Note that the hints involve backwards propagation of types, since they make statements about input arguments rather than the result types of MATLAB expressions. Thus, in order to propagate hints, the type inference engine must be used in backwards mode.

Speculative type inference consists of a number of alternating backward and forward type inference passes. A speculative (backward) pass infers a credible type signature from the code body; it is immediately followed by a normal type inference pass to re-calculate the types in the body. The alternating backwards-forwards process can be iterated several times until convergence.
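In outline, the alternation might be driven by a loop of the following shape (a sketch under invented names; ast_t, backward_infer and forward_infer are hypothetical):

    typedef struct ast ast_t;                     /* opaque AST handle  */
    int  backward_infer(ast_t *, type_t *, int);  /* returns "changed?" */
    void forward_infer(ast_t *, type_t *, int);

    enum { MAX_SPEC_PASSES = 4 };                 /* illustrative budget */

    void speculate(ast_t *body, type_t *sig, int nparams) {
        for (int pass = 0; pass < MAX_SPEC_PASSES; pass++) {
            /* back-propagate hints from the body into the signature */
            int changed = backward_infer(body, sig, nparams);
            /* re-annotate the body under the refined signature */
            forward_infer(body, sig, nparams);
            if (!changed) break;                  /* converged */
        }
    }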

2.6 Code generation

MaJIC has two code generation systems: a fast, lightweight code generator used for JIT compilation, and a C (or Fortran) based code generator that uses the host system to compile, optimize and link the code. Both code generators use the parsed AST and type annotations to drive code selection. The code generators follow the same general selection rules, but build radically different code.

The JIT code generator is able to build executable code directly in memory by using the vcode [11] dynamic assembler. The code generator makes a single code selection pass through the parsed AST. No loop optimizations or instruction scheduling are performed. Register allocation is done using the linear-scan register allocator [19]. This, and the small total number of code generation passes, results in a fast code generator.

The source code generator is somewhat more complicated. It uses the same code selection pass as the JIT code generator, but builds C or Fortran source code in a temporary file. This file is then compiled with the native compiler using the most aggressive optimization mode that is available. The compiler generates a relocatable object which is then dynamically linked into the MaJIC executable. Unlike the JIT code generator, the source code generator is quite slow, hampered by the large overheads of loading and executing the compiler and the linker. Compilation, optimization and linking can take several seconds.
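The slow path amounts to a write/compile/load cycle along these lines (a minimal sketch assuming a POSIX host; the paths, flags and entry symbol are illustrative, not what MaJIC actually uses):

    #include <stdio.h>
    #include <stdlib.h>
    #include <dlfcn.h>

    typedef double (*fun1_t)(double);   /* assumed entry-point shape */

    fun1_t compile_and_load(const char *c_source) {
        FILE *f = fopen("/tmp/majic_gen.c", "w");
        if (!f) return NULL;
        fputs(c_source, f);
        fclose(f);
        /* most aggressive native optimization, then link as shared object */
        if (system("cc -O3 -shared -fPIC -o /tmp/majic_gen.so /tmp/majic_gen.c"))
            return NULL;
        /* dynamically link the generated object into this process */
        void *h = dlopen("/tmp/majic_gen.so", RTLD_NOW);
        return h ? (fun1_t)dlsym(h, "entry") : NULL;
    }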

2.6.1 Code selection rules

As mentioned before, the two code generators use the same selection rules even though they use them to produce different code. A few of the selection rules are listed below.

• The implicit default rule for any operator is that the numeric operands are complex matrices. This is the unoptimized fall-back option for operations that have not been type-inferred. The MATLAB C library provides functions that implement these generic operators.

• MaJIC inlines scalar arithmetic and logical operations, elementary math functions and assignments of scalar integers, reals and complex numbers. This is probably the most important performance optimization in MaJIC: it relies on type annotations to replace MATLAB's polymorphic operations with single machine instructions.

• MaJIC inlines scalar and F90-like array index operations. The MATLAB interpreter discriminates between array expression types at runtime, spending hundreds of cycles. By contrast, an inlined scalar index operation takes only a few cycles.

• Small temporary arrays of known sizes are pre-allocated. MATLAB's expression evaluation semantics sometimes forces the existence of temporary buffers to hold intermediary array results. Replacing dynamically allocated buffers with statically allocated ones saves a lot of overhead at the expense of a small amount of heap memory.

• Elementary vector operations, such as arithmetic operations and vector concatenation, are completely unrolled when exact array shapes are known. This technique is very effective on small (up to 3 × 3) matrices and vectors because it completely eliminates loop overhead.

• MaJIC performs code selection to combine several AST nodes into a single library call. For example, expressions like a*X+b*C*Y are transformed into a single call to the BLAS routine dgemv [7].

• Unlike Fortran, MATLAB resizes arrays on demand. In general, this occurs when an array index overflow occurs on the left-hand side. Repetitive array resizing (e.g. in a loop) can be tremendously expensive. MaJIC applies the simple but effective technique of "oversizing" arrays, i.e. allocating about 10% more space for a resized array than strictly necessary, so that subsequent growth of the array does not necessitate another resize operation (see the sketch after this list).

MaJIC performs oversizing carefully in order to preserve the original semantics of the code. The oversized array, when queried, returns accurate size information. Oversizing is also limited by the amount of available memory and the size of the array. Large arrays are never oversized.

• MaJIC inlines calls to small (less than 200 lines of code) functions. Inlining preserves the call-by-value semantics of MATLAB by making copies of the actual parameters. However, read-only formal parameters are not copied. This can result in huge performance gains when large matrices are being passed as read-only arguments in the call.
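The oversizing rule could be realized roughly as follows (our sketch; marray_t and its fields are invented names). The array keeps its logical length separate from its allocated capacity, so size queries stay accurate:

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        double *data;
        size_t  len;   /* logical length reported to the program */
        size_t  cap;   /* allocated length, kept ~10% above need */
    } marray_t;

    /* MATLAB-style growing store A(idx) = v (error handling omitted).
     * Invariant: data[len..cap) stays zeroed, matching MATLAB's
     * zero-fill semantics when an array grows. */
    void marray_store(marray_t *a, size_t idx /* 1-based */, double v) {
        if (idx > a->cap) {
            size_t newcap = idx + idx / 10;          /* ~10% slack */
            a->data = realloc(a->data, newcap * sizeof *a->data);
            memset(a->data + a->cap, 0,
                   (newcap - a->cap) * sizeof *a->data);
            a->cap = newcap;
        }
        if (idx > a->len)
            a->len = idx;   /* grow logical size; the gap is already zero */
        a->data[idx - 1] = v;
    }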

3. PERFORMANCE EVALUATION

In this section, we evaluate the overall performance of the MaJIC compiler. Although the repository is part of the interactive MaJIC system, an evaluation of its performance is not a goal of this paper. We were interested in analyzing the quality and speed of the JIT and speculative compilers.

To test JIT compilation, we started our experiments with an empty repository. This resulted in the JIT compiler being invoked for any function call.

To test speculative compilation, we also started up MaJIC with an initially empty repository, but we invoked the benchmarks only after MaJIC's repository had ample time to find them and compile them speculatively.

3.1 Benchmarks

MaJIC was tested with 15 MATLAB benchmarks, between 50 and 250 lines long each. Table 1 lists the names, origin and functional description of the benchmarks, as well as the associated problem size for which measurements were run (matrix sizes in some of the benchmarks). In addition, we list the number of lines in each benchmark and the runtime on a reference system (the SPARC platform described in Section 3.3) using a stock MATLAB interpreter.

Many of the benchmarks were originally used to evaluate FALCON; we reused them in order to facilitate a direct comparison of MaJIC and FALCON.

In order to make the subsequent discussion easier, we group the benchmarks into four partially overlapping categories. Benchmarks in the same category tend to be optimized in similar ways by the compiler, and show similar performance gains:

• Scalar, or Fortran-like, benchmarks: dirich, finedif, icn, mandel and, to some extent, crnich are written in a style that closely resembles Fortran 77. All array indices in these benchmarks are scalars.

• Benchmarks with built-in functions: cgopt, qmr, sor and mei spend a large portion of their runtime in built-in MATLAB library functions. Typically, these codes are hard to optimize, since the library functions themselves are already optimized.

• Array benchmarks: orbec, orbrk, fractal and adapt have many operations on small fixed-size MATLAB vectors. adapt features a large (and dynamically growing) array as well as small vectors.

• Recursive benchmarks: fibo and ack contain recursion, which makes inlining and type inference harder.

3.2 Measurement methodology

Our performance figures are derived from the running times of the benchmarks. The most important gauge of performance we use is the speedup of compiled code relative to interpreted code, i.e. the expression s = ti/tc, where ti is the runtime of the code in MATLAB's interpreter, and tc is the runtime under the compiler.

We measured MaJIC's speedups in both JIT and speculative compilation mode. In JIT mode, runtime includes the time spent by the JIT compiler producing object code. In speculative mode, the repository is assumed to have generated the code ahead of time; hence compile time is not included in the runtime in this case, unless the speculatively generated code turns out not to match the benchmark's invocation – in this latter case the JIT compiler kicks in and helps out with the code generation. This mode of measuring runtimes is consistent with the expected real-world usage pattern of MaJIC.

For purposes of comparison, we also measured the speedups of mcc, the compiler supplied by the Mathworks Inc. We set a number of compile-time options for this compiler in order to guarantee the best performance: we manually eliminated subscript checks, and replaced operations on complex numbers with real-number operations where it was safe to do so.

    benchmark   source      short description                                problem size    lines  runtime (s)
    adapt       [14]        adaptive quadrature                              approx. 2500    81     5.24
    cgopt       [3]         conjugate gradient w. diagonal preconditioner    420 x 420       38     0.43
    crnich      [14]        Crank-Nicholson heat equation solver             321 x 321       40     16.33
    dirich      [14]        Dirichlet solution to Laplace's equation         134 x 134       34     277.89
    finedif     [14]        finite difference solution to the wave equation  1000 x 1000     21     57.81
    galrkn      [12]        Galerkin's method (finite element method)        40 x 40         43     8.02
    icn         R. Bramley  Cholesky factorization                           400 x 400       29     7.72
    mei         unknown     fractal landscape generator                      31 x 14         24     10.77
    orbec       [12]        Euler-Cromer method for 1-body problem           62400 points    24     19.10
    orbrk       [12]        Runge-Kutta method for 1-body problem            5000 points     52     9.30
    qmr         [12]        linear equation system solver, QMR method        420 x 420       119    5.29
    sor         [3]         lin. eq. sys. solver, successive overrelaxation  420 x 420       29     4.77
    ackermann   authors     Ackermann's function                             ackermann(3,5)  15     3.84
    fractal     authors     Barnsley fern generator                          25000 points    35     26.55
    mandel      authors     Mandelbrot set generator                         200 x 200       16     8.64
    fibonacci   authors     recursive Fibonacci function                     fibonacci(20)   10     1.29

Table 1: MaJIC benchmarks

We measured the speedups of FALCON by repeating the experiments described in [9] on our test machines. We instructed FALCON to eliminate subscript checks wherever this did not break the code.

Execution times were measured on a "best of 10 runs" basis on a quiet system.

Our performance graphs show four bars for each benchmark. The four bars are the speedups achieved by mcc, FALCON, MaJIC in JIT mode and MaJIC in speculative mode, respectively. Because the speedups are distributed over four orders of magnitude, ranging from 0.1 to about 1000, the graphs are represented on a logarithmic scale.

3.3 Testing platforms and speedups

We measured the interpreted execution time ti of all benchmarks using the MATLAB 6 (release 12) integrated environment on two architectures:

• The development platform for MaJIC is a 400MHz UltraSparc 10 workstation with 256MB of RAM, running Solaris 7 and equipped with the Sparcworks 5.0 C compiler. The performance results for this machine are summarized in Figure 4. As described above, the figure has bars for each benchmark, called "mmc", "falcon", "jit" and "spec" respectively. A few of the speedup bars are missing: there are no FALCON speedup bars for the benchmarks ack, fractal, fibo and mandel, because these were not part of the original FALCON benchmark series and are unsuitable for compilation with FALCON. The speedup bars of cgopt appear to be missing because they are very close to 1.0.

• We also ran some of the experiments on an SGI Origin 200 machine equipped with 4 180MHz R10000 processors, IRIX 6.5 and the MIPSPro C compiler. The JIT compiler on this platform is not yet completely implemented. Some benchmarks (like adapt) were left out of the graphs for this reason. Others are included, but run at reduced performance due to the poor quality of the generated code. Figure 5 shows the results.

[Figure 4: Performance on the SPARC platform – log-scale speedups (0.1 to 1000) of mmc, falcon, jit+gen and spec for each benchmark]

[Figure 5: Performance on the MIPS platform – log-scale speedups (0.1 to 10000) of mmc, falcon, jit and spec for each benchmark]

3.4 Comparative performance analysis

The two groups of benchmarks that most clearly benefit from compilation are the Fortran-like benchmarks and the small vector benchmarks. These types of codes incur the most overhead during interpreted execution; they profit the most from the removal of overhead.

By contrast, the benchmarks that are heavy in built-in function calls benefit very little, and sometimes not at all, from compilation. Obviously, the execution speed of built-in functions is not influenced by compiling the calling code.

The orbrk benchmark demonstrates that inlining at compile time is beneficial. Recursive functions like fibo and ack also generally benefit from inlining. MaJIC does not attempt to inline more than 3 levels of recursive calls, in order to avoid code explosion.

While mcc is not particularly successful at removing the interpretive overhead, both FALCON and MaJIC do succeed in eliminating it, although using different strategies. FALCON relies heavily on the native Fortran compiler to generate good code. MaJIC has a few specific optimizations (described in Section 2.6.1) that make it less reliant on the native compiler and allow it to generate reasonable code even with the JIT code generator.

On the SPARC platform the native Fortran-90 compiler generates relatively poor code, causing MaJIC to outperform FALCON in a few of the benchmarks. On the MIPS platform the native compiler is excellent, causing MaJIC's JIT compiler to fall behind FALCON.

3.5 Analysis of JIT compilation

For the analysis of JIT compilation we rely mostly on results gathered on the SPARC platform, since the JIT code generator was optimized for this platform. The performance figures are remarkable when considering that the code in question is generated in a fraction of a second and without the benefit of backend optimizations. On the other hand, there is room for future optimizations; however, before adding these on, it will be necessary to test whether the increased compile time will destroy the performance gained by optimization.

Figure 6 shows the time composition of the runtime of each JIT-compiled benchmark. With the exception of orbrk, most benchmarks spend a relatively modest amount of time compiling the code. The compile time/runtime ratio is artificially high anyway, in part because the benchmarks run on modestly sized problems. There is definitely room for at least basic back-end optimizations in the JIT compiler, such as common subexpression elimination, loop unrolling, loop invariant removal and some form of instruction scheduling. Preliminary experiments with the finedif and dirich benchmarks suggest that loop unrolling alone can reduce execution time by about 50% at a reasonable cost in overhead.

3.5.1 The effect of existing JIT optimizations

The effect of optimizations in any compiler is cumulative and hard to study in isolation. In this section we evaluate the effectiveness of JIT-specific optimizations by individually disabling them and studying the resulting drop in performance. Figure 7 shows the measurement results.

[Figure 6: The composition of JIT execution – normalized execution time of each JIT-compiled benchmark, split into disamb, typeinf, codegen and exec]

[Figure 7: Disabling JIT optimizations – performance relative to fully optimized JIT under "no ranges", "no min. shapes" and "no regalloc"]

The first set of bars ("no range") was obtained by disabling range propagation during JIT type inference. The primary effect of this measure is to disable subscript check removal. The relative increase in execution time is highest in the benchmarks that have many array accesses: dirich, finedif and mandel are good examples.

The second set of bars ("no min. shape") was obtained by disabling the propagation of minimum shape information. This disables subscript check removal in some cases, and does not allow the compiler to unroll small vector operations. orbec, orbrk and fractal are the most affected, because these consist mostly of operations on small vectors and matrices.

The last set of bars ("no regalloc") was obtained by forcing the linear-scan register allocator to spill every variable. This is roughly equivalent to compiling with the -g flag set on a regular compiler like gcc.

The results clearly show that range propagation, minimum shape propagation and register allocation are essential to JIT performance.



3.6 Analysis of the speculator

The speedup results produced by speculation generally match those of FALCON. We conclude that speculation is generally successful. However, we cannot expect a speculative technique to be universally successful; we need to analyze the consequences of failure.

MaJIC's type speculator fails in two ways: by being too aggressive and generating useless code, or by not being aggressive enough and generating suboptimal code. The first type of failure, unreasonable specialization of input types, is easily countered: the type signature check, done by the repository at runtime, will eliminate such code from consideration.

A more insidious failure is when the speculator generates code that is perfectly safe to execute, but suboptimal. Such cases are not caught at runtime. The performance of the invoked code will be lower, but it is not immediately clear by how much.

    benchmark   crnich   dirich   finedif   icn     mandel
    spec.       181      817      412       48      36
    JIT         181      817      413       51      54.0

    benchmark   cgopt    mei      qmr       sor     adapt
    spec.       1        4.24     4.52      1.68    4.09
    JIT         1.16     5.67     5.68      1.79    4.16

    benchmark   orbec    orbrk    fractal   galrkn  ack
    spec.       146      465      663       61.7    4.04
    JIT         174      465      664       72.9    6.00

    benchmark   fibo
    spec.       3.49
    JIT         5.16

Table 2: JIT vs. speculative type inference

Table 2 attempts to quantify the speculator's performance. It compares the speedups produced by the same code generator using type annotations generated with either speculation or JIT type inference (the speedups were calculated without considering compile time). Looking at this table, it is obvious that speculative type inference closely matches the performance of JIT type inference in many cases. We conclude that:

• Speculation works best on scalar (Fortran 77-like) and vector codes. Speculative rules look for exactly the kinds of features that are prevalent in these codes.

• Benchmarks with built-in functions typically fare badly because the speculative rules currently present in MaJIC do not account for the language features used by these codes. MaJIC mispredicts a "*" operator in qmr to represent scalar multiplication, whereas in fact it is a matrix-vector multiplication. In mei the speculator is unable to predict that the arguments to an eig function call are reals; instead it considers them complex values, which leads to performance loss. A similar situation occurs in mandel due to the use of the built-in function i.

• Recursive benchmarks are not handled correctly by speculative compilation. They always need to be recompiled at runtime.


4. RELATED WORK<br />

<strong>MaJIC</strong> is patterned after FALCON [9, 8], a <strong>MATLAB</strong><br />

to Fortran-90 translator developed by L. DeRose in 1996.<br />

FALCON per<strong>for</strong>ms type inference to generate declarations<br />

<strong>for</strong> variables. It then generates Fortran code using these<br />

declarations. However, FALCON’s type inference engine is<br />

facing a limiting factor. Because FALCON is a batch compiler,<br />

it has no in<strong>for</strong>mation about the calling context of the<br />

functions it tries to compile. This makes type inference potentially<br />

ineffective. FALCON circumvents this problem by<br />

“peeking” into the input files of the code it compiles <strong>and</strong><br />

extracting type in<strong>for</strong>mation from there.<br />

MENHIR [5], developed by Francois Bodin at INRIA, is another batch compiler similar to FALCON: it generates code for MATLAB and exploits parallelism by using optimized runtime libraries. MENHIR's code generation is retargetable (it produces C or FORTRAN code), and it contains a type inference engine similar to FALCON's.

MATCH [2] is a MATLAB compiler targeted at heterogeneous architectures, such as DSP chips and FPGAs. It also uses type analysis and generates code for multiple large functional units.

Vijay Menon's vectorizer [16] is an alternative to compilation. Menon observed that scalar operations in MATLAB were slower than vector operations because they incurred more overhead per floating-point operation. He proposed to eliminate this overhead not by compilation but by translating Fortran 77-like scalar operations into Fortran 90-like vector expressions in the MATLAB source code. Menon's vectorizer is built on top of the MaJIC infrastructure.
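As a rough illustration of this kind of source-to-source rewriting (our own example, not one drawn from [16]), a scalar loop and its vectorized equivalent look as follows:

% Hypothetical before/after pair for source-level vectorization.
n = 1e6;
x = rand(n, 1);
y = zeros(n, 1);

% Fortran 77-style scalar loop: the interpreter dispatches and
% type-checks every elementwise operation separately.
for k = 1:n
    y(k) = 2 * x(k) + 1;
end

% Fortran 90-style vector form: a single dispatch, with the
% elementwise work done inside a precompiled library routine.
y = 2 * x + 1;

Because the vector form amortizes interpreter overhead over the whole array, it approaches library speed without any compilation.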

Just-in-time compilation has been around since 1984, when Deutsch described a dynamic compilation system for the Smalltalk language [10]. The technique became truly popular with the Java language, and countless Java JIT compilers have been proposed and implemented in recent times. MaJIC's JIT compiler reuses code and ideas from the vcode [11] and tcc [18] packages. vcode was originally built as a general-purpose, platform-independent, RISC-like dynamic assembly language to facilitate dynamic code generation, and is used in almost unchanged form by MaJIC. tcc was built on top of vcode and provides an implementation of 'C, a C-like programming language with a LISP-like backquote operator that facilitates the building of dynamic code by composition. We did not reuse the 'C parser, but we did use ICODE, the tcc intermediate language specification, and re-implemented the register allocator used by tcc.

5. CONCLUSIONS

In an effort to bring high performance to the MATLAB integrated environment, we have designed, built and evaluated two paradigms for compiling MATLAB code: JIT compilation and speculative compilation.

JIT compilation is remarkably successful in bringing compile time down to almost nil while obtaining reasonable performance gains (up to two orders of magnitude faster than the MATLAB interpreter). It falls behind in performance when compared with the best that a static compiler (like FALCON) can do. Of our benchmarks, the most affected were the Fortran-like and small vector codes, where the lack of backend optimization is felt the most.

In order to estimate the effect of adding more optimizations to the JIT compiler, we hand-optimized the finedif benchmark by hand-unrolling its innermost loop and performing common subexpression elimination. The result was almost 100% faster than the normal JIT-compiled finedif, and within 20% of the performance of the best (native compiler-generated) version of the code. Preliminary data suggests that similar, although less impressive, improvements can be obtained with some of the other Fortran-like benchmarks, which leaves the door open for future enhancements of the JIT compiler.
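To show the flavor of these two hand optimizations (a generic one-dimensional stencil of our own, not the actual finedif source), consider the following sketch:

% Hypothetical stencil update illustrating innermost-loop
% unrolling combined with common subexpression elimination.
n = 1024;
c = 0.25;
u = rand(1, n);
unew = u;

% Straightforward loop: u(k) and u(k+1) are fetched repeatedly
% across neighboring iterations.
for k = 2:n-1
    unew(k) = u(k) + c*(u(k-1) - 2*u(k) + u(k+1));
end

% Unrolled by two, with the shared operands u(k) and u(k+1)
% loaded once per pair of iterations via temporaries.
for k = 2:2:n-2
    uk  = u(k);
    uk1 = u(k+1);
    unew(k)   = uk  + c*(u(k-1) - 2*uk  + uk1);
    unew(k+1) = uk1 + c*(uk     - 2*uk1 + u(k+2));
end

Eliminating the redundant loads and halving the loop overhead in this way mirrors what a stronger backend would do automatically, which is why the hand-tuned finedif approaches native-compiled performance.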

Speculative compilation is successful in bringing performance up to, and beyond, FALCON levels. However, generating optimized code takes time; speculation is designed to allow this compilation latency to be hidden. Speculation is not universally successful: it can result in a loss of performance when it fails.

It is interesting to note that the speculative type hints used most successfully by MaJIC's speculator are tied to the very same language features of MATLAB that slow down the interpreter. Hence, speculation tends to succeed where it is most needed.

6. REFERENCES

[1] George Almasi. MaJIC: A MATLAB Just-In-Time Compiler. PhD thesis, University of Illinois at Urbana-Champaign, June 2001.

[2] P. Banerjee, N. Shenoy, A. Choudhary, S. Hauck, C. Bachmann, M. Chang, M. Haldar, P. Joisha, A. Jones, A. Kanhare, A. Nayak, S. Periyacheri, and M. Walkden. MATCH: A MATLAB compiler for configurable computing systems. Technical Report CPDC-TR-9908-013, Center for Parallel and Distributed Computing, Northwestern University, Aug. 1999.

[3] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edition. SIAM, Philadelphia, PA, 1994.

[4] William Blume and Rudolf Eigenmann. Symbolic range propagation. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.

[5] Francois Bodin. MENHIR: High performance code generation for MATLAB. http://www.irisa.fr/caps/PEOPLE/Francois/.

[6] Timothy Budd. An APL Compiler. Springer Verlag, 1988.

[7] J. Choi, J. Dongarra, and D. W. Walker. BLAS reference manual (version 1.0beta). Technical Report ORNL/TM-12469, Oak Ridge National Laboratory, March 1994.

[8] Luiz DeRose and David Padua. Techniques for the translation of MATLAB programs into Fortran 90. ACM Transactions on Programming Languages and Systems (TOPLAS), 21(2):285–322, March 1999.

[9] Luiz Antonio DeRose. Compiler Techniques for MATLAB Programs. Technical Report UIUCDCS-R-96-1996, Department of Computer Science, University of Illinois, 1996.


[10] L. Peter Deutsch and Alan Schiffman. Efficient implementation of the Smalltalk-80 system. In Proceedings of the 11th Symposium on the Principles of Programming Languages, Salt Lake City, UT, 1984.

[11] Dawson R. Engler. VCODE: a portable, very fast dynamic code generation system. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '96), Philadelphia, PA, May 1996.

[12] Alejandro L. Garcia. Numerical Methods for Physics. Prentice Hall, 1994.

[13] Rajiv Gupta. Optimizing array bounds checks using flow analysis. ACM Letters on Programming Languages and Systems, 2(1–4):135–150, 1993.

[14] John H. Mathews. Numerical Methods for Mathematics, Science and Engineering. Prentice Hall, 1992.

[15] Mathworks Inc. homepage. www.mathworks.com.

[16] Vijay Menon and Keshav Pingali. High-level semantic optimization of numerical codes. In 1999 ACM Conference on Supercomputing. ACM SIGARCH, June 1999.

[17] Steven S. Muchnick and Neil D. Jones. Program Flow Analysis: Theory and Application. Prentice Hall, 1981.

[18] Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. tcc: A system for fast, flexible, and high-level dynamic code generation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '97), pages 109–121, Las Vegas, Nevada, May 1997.

[19] Massimiliano Poletto and Vivek Sarkar. Linear scan register allocation. ACM Transactions on Programming Languages and Systems, 21(5):895–913, 1999.
