MaJIC: Compiling MATLAB for Speed and Responsiveness*
ABSTRACT

This paper presents and evaluates techniques to improve the execution performance of MATLAB. Previous efforts concentrated on source-to-source translation and batch compilation; MaJIC provides an interactive front end that looks like MATLAB and compiles/optimizes code behind the scenes in real time, employing a combination of just-in-time and speculative ahead-of-time compilation. Performance results show that the proper mixture of these two techniques can yield near-zero response time as well as performance gains previously achieved only by batch compilers.
Categories and Subject Descriptors
D.3.4 [Programming Languages]: Interpreters, Compilers, Code Generation, Run-time environments

General Terms
Design, Languages, Algorithms, Performance
1. INTRODUCTION

MATLAB [15], a product of MathWorks Inc., is a popular programming language and development environment for numeric applications. The MATLAB programming language resembles FORTRAN 90 in that it deals with vectors and matrices, but unlike FORTRAN it is weakly typed and polymorphic.

The main strengths of MATLAB lie in its interactive nature, which makes it a handy exploration tool, and in the richness of its precompiled libraries and toolboxes.

The main weakness of MATLAB is its slow execution, especially when compared to similarly written FORTRAN code. Because MATLAB is weakly typed, the interpreter in the development environment has to check types at runtime, resulting in substantial performance loss.
∗ This work was supported in part by NSF contract ACI98-70687.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PLDI'02, June 17-19, 2002, Berlin, Germany.
Copyright 2002 ACM 1-58113-463-0/02/0006 ...$5.00.
George Almási and David Padua
galmasi,padua@cs.uiuc.edu
Department of Computer Science
University of Illinois at Urbana-Champaign
Previous work with MATLAB-to-FORTRAN translators, notably the FALCON compiler [9, 8], has shown performance increases of up to three orders of magnitude by employing compile-time type analysis to reduce the number of runtime checks.

MaJIC (Matlab Just-In-time Compiler) aims to achieve the same performance goals without sacrificing the interactive nature of MATLAB. Like FALCON, it attempts to remove the overhead of runtime type checks by compiling code instead of interpreting it. Unlike FALCON, which is a batch compiler, MaJIC preserves interactive behavior by minimizing, or hiding, compilation time. MaJIC attempts to compile code ahead of time by speculation; whenever speculation fails, MaJIC falls back to just-in-time compilation.
MaJIC's dynamic (JIT) compiler reduces compile time as much as possible. It consists of an extremely fast type inference engine and a relatively naive, but fast, code generation engine. Compilation is performed as late as possible in order to gather more runtime information, on the premise that better runtime information allows the compiler to skip time-consuming optimization steps.

In addition to JIT compilation, MaJIC also performs speculative ahead-of-time compilation. Looking at source code only, the compiler guesses the run-time context most likely to occur in practice. If the guess is correct, the end result is highly optimized code that will have been compiled by the time it is needed, effectively hiding compilation latency. A wrong guess by the compiler results, at worst, in degraded performance, but never affects program correctness: MaJIC contains a mechanism to ensure that code is only executed if its semantics are guaranteed.

The rest of this paper is structured as follows. Section 2 describes the software architecture of MaJIC and optimization techniques related to JIT type inference and speculative type inference. Section 3 presents and analyzes the performance results we obtained. Section 4 offers a brief survey of related work. In Section 5 we present our conclusions.
2. SOFTWARE ARCHITECTURE

MaJIC's users interact with a MATLAB-like front end: a compatible interpreter that can execute MATLAB code at approximately MATLAB's original speed. However, MaJIC's front end doesn't attempt to execute all code: it defers computationally complex tasks (in the current implementation, function calls) to the code repository. To pass work to the repository, the MaJIC front end builds an invocation containing the name of a MATLAB function and the values of the parameters (if any).
The code repository is a database of compiled code. It compiles code on its own, ahead of time, by snooping the source code directories, maintaining dependency information between source code and object code, and triggering recompilations when the source code changes. The repository can also compile code as a result of user actions (such as invoking MATLAB functions).

The code repository collects the type information necessary for compiling MATLAB code. This type information comes from different sources: directly from the user (i.e. when the user calls a function directly), from earlier runs of the same code, or from the type speculator.

The code repository responds to requests for compiled code by the interpreter. It has a type matching system (described in Section 2.2.1) that allows the retrieval of semantically correct compiled code for a given invocation by the interpreter. A failure to find appropriate code usually triggers a compilation; since this typically happens during program execution, where time is at a premium, the JIT compiler is used in this situation. The generated code can later be recompiled (and replaced in the repository) using a better compiler.

The compiler itself has the task of turning source code into executable code. The compiler's passes are shown in Figure 1.
• The first pass is a scanner/parser which transforms MATLAB source into an abstract syntax tree (AST). MaJIC's parser is based on FALCON's parser with a few minor improvements.

• Next, preliminary data flow analysis (disambiguation) is performed to build a static symbol table. At this point the compiler can optionally perform function inlining (which then necessitates the re-building of the symbol table).

• When the symbol table is complete, the compiler performs type inference. This pass conservatively assigns types to all expressions in the program text. In JIT compilation mode, the type inference engine uses runtime information fed to it by the repository; in speculative mode, the inference engine uses only the AST and the symbol table and produces speculative results.
• The last step of the compilation is code generation. There is one code generator each for JIT and speculative mode. The JIT code generator builds code fast and in memory; in speculative mode, the code generator builds C or Fortran source code, which is then compiled and linked with platform-native tools.

Figure 1: MaJIC compiler passes.

In the next few sections we present some of the compiler passes in more detail.
2.1 Disambiguating MATLAB symbols

Other than keywords, symbols in MATLAB can represent variables, calls to built-in primitives, or calls to user functions. The interpreter recognizes a symbol as a variable when it appears on the left side of an assignment, or else if it has an entry in the dynamic symbol table of the interpreter. A symbol not recognized as a variable is potentially a built-in primitive; if it cannot be resolved as either a variable or a built-in, the MATLAB interpreter also consults the dynamic table of existing user functions. If the symbol cannot be found there either, its occurrence is treated as an error.

Unlike the MATLAB interpreter, MaJIC needs to identify symbol meanings at compile time; but some symbols' meanings are hard to determine without running the code.
Figure 2 shows code with ambiguous symbols. The left box shows a loop where the first occurrence of the symbol i is ambiguous, interpreted by MATLAB as √−1 in the first iteration and as a variable in all following iterations.

The right code box contains a loop where compiler analysis would recognize the right-hand-side occurrence of y as a possibly undefined variable, or even a user function, if control flow is not taken into account. Looking at control flow, however, makes it obvious that y can only be accessed after having been defined.

    clear
    while(...),
        z = i;
        i = z+1;
    end

    clear
    x = 0;
    for p=1:N,
        if (p >= 2), x = y; end
        y = p;
    end

Figure 2: Ambiguous symbols in MATLAB
Ambiguous symbols are rare in practice and almost always a sign of buggy code. MaJIC does deal with them: it defers their processing until runtime. Non-ambiguous variables can, however, be identified at compile time by a variation of reaching definitions analysis: a symbol that has a reaching definition as a variable on all paths leading to it must be a variable. This analysis is the first pass of the MaJIC compiler.
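The analysis above can be illustrated with a small sketch (a hypothetical simplification for a single symbol; the CFG encoding and function names are ours, not MaJIC's). It is a forward must-analysis: a symbol is definitely a variable at a node only if it was assigned on every path reaching that node.

```c
#include <assert.h>

/* Hypothetical sketch of "variable on all paths" analysis for one symbol:
   a forward must-analysis over a CFG whose meet operator is logical AND. */
#define MAXPRED 16

typedef struct {
    int npred;
    int pred[MAXPRED];
    int assigns_symbol;      /* node contains "sym = ..." */
} node;

/* in[i]/out[i]: symbol is definitely a variable before/after node i.
   Node 0 is the entry, where the symbol is not yet a variable. */
void solve(const node *cfg, int n, int in[], int out[]) {
    for (int i = 0; i < n; i++) in[i] = out[i] = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            int v = (i == 0) ? 0 : 1;          /* meet over predecessors */
            for (int p = 0; p < cfg[i].npred; p++)
                v = v && out[cfg[i].pred[p]];
            int o = v || cfg[i].assigns_symbol;
            if (v != in[i] || o != out[i]) { in[i] = v; out[i] = o; changed = 1; }
        }
    }
}
```

On the left-hand example of Figure 2, the use of i in the loop body has a predecessor path (entry → header) with no assignment, so the analysis correctly reports it as not definitely a variable, i.e. ambiguous.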
2.2 The type system

MaJIC's type system is used by the type inference engine and by the code repository. The type system is inspired by that of FALCON, which in turn was influenced by the APL [6] and SETL compilers. MaJIC's notion of a type is represented by the Cartesian product of several lattices, as follows:
The intrinsic type of an expression is an element of the finite lattice Li formed by the elements real, integer, boolean, complex and string, with the requisite comparison operator:

Li = {J, ⊥i, ⊤i, ⊑i, ⊔i}, where
J = {⊥i, bool, int, real, cplx, strg, ⊤i},
⊥i ⊑i bool ⊑i int ⊑i real ⊑i cplx ⊑i ⊤i, and
⊥i ⊑i strg ⊑i ⊤i
A MaJIC expression's shape Ls consists of a pair of values, one each for the number of rows and columns of the expression. In the current version of MaJIC we only consider Fortran-like two-dimensional shapes:

Ls = {N × N, ⊥s, ⊤s, ⊑s, ⊔s}, where
⊥s = ⟨0, 0⟩, ⊤s = ⟨∞, ∞⟩;
⟨a, b⟩ ⊑s ⟨c, d⟩ iff a ≤ c and b ≤ d
An expression's range Ll is the interval of values the expression can take [4]. We define ranges only for real numbers; strings and complex expressions do not have associated ranges. The two numbers in the range define the (inclusive) lower and upper limits of an interval. The lower limit is always less than or equal to the upper limit, or else the range is malformed:

Ll = {R × R, ⊥l, ⊤l, ⊑l, ⊔l}, where
⊥l = ⟨nan, nan⟩; ⊤l = ⟨−∞, ∞⟩;
⟨a, b⟩ ⊑l ⟨c, d⟩ iff ⟨a, b⟩ = ⊥l or (c ≤ a and b ≤ d)
The type system is the Cartesian product T = Li × Ls × Ls × Ll. Ls appears twice because MaJIC tracks lower as well as upper bounds of shape descriptors. We will use the collective name "shape" to mean both descriptors together. Thus the type system consists of intrinsic type, shape and range information.
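As a concrete illustration, the join ⊔i on the intrinsic-type lattice Li could be implemented along these lines (a hypothetical sketch; the enum encoding and helper names are ours, not MaJIC's):

```c
#include <assert.h>

/* Sketch of MaJIC's intrinsic-type lattice Li.
   Numeric chain: bot ⊑ bool ⊑ int ⊑ real ⊑ cplx ⊑ top;
   strg sits on a separate chain bot ⊑ strg ⊑ top. */
typedef enum { T_BOT, T_BOOL, T_INT, T_REAL, T_CPLX, T_STRG, T_TOP } itype;

static int chain_rank(itype t) {   /* position on the numeric chain, -1 if off it */
    switch (t) {
    case T_BOT:  return 0;
    case T_BOOL: return 1;
    case T_INT:  return 2;
    case T_REAL: return 3;
    case T_CPLX: return 4;
    case T_TOP:  return 5;
    default:     return -1;        /* strg */
    }
}

/* Least upper bound (join, ⊔i) of two intrinsic types. */
itype itype_join(itype a, itype b) {
    if (a == b) return a;
    if (a == T_BOT) return b;
    if (b == T_BOT) return a;
    /* strg joined with any numeric type can only go to top */
    if (a == T_STRG || b == T_STRG) return T_TOP;
    return chain_rank(a) > chain_rank(b) ? a : b;
}
```

The join is what the inference engine applies at CFG merge points: a value that is int on one path and real on another is typed real after the merge, while int merged with strg must conservatively go to ⊤.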
2.2.1 Type signatures

Suppose that a function we are compiling has n formal parameters {f1, f2, ... fn}. We assign the following types to the parameters: T = {T1, T2, ... Tn}, where Ti ∈ T, 1 ≤ i ≤ n. We call T the type signature of the compiled code.

We use type signatures to determine whether compiled code is safe to execute, given a particular invocation. MaJIC generates code in such a way that an invocation of the compiled code with the actual parameters {a1, a2, ... an} having types {Q1, Q2, ... Qn} is safe if Qi ⊑ Ti, 1 ≤ i ≤ n. An actual invocation is safe as long as the types of the inputs are subtypes of the type signature of the compiled code.
The code repository may contain, at any time, several compiled versions of the same code, differing only in the assumptions about the types of input parameters (Figure 3 shows a simple function with a single parameter as an example). The function locator has to match a given invocation to a version of compiled code in the repository that is safe to execute (i.e. preserves the semantics of the program) and at the same time is optimal performance-wise. In order to do so, the function locator checks the type signature of the invocation against the signatures of the existing compiled objects in the repository, until a matching object is found or all repository objects are exhausted. When several matching objects exist, the code repository uses simple heuristics to find the best matching candidate for a particular call, based on a Manhattan-like "distance" between the type signature of the invocation and the matching compiled code.
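The locator's safety check and distance heuristic can be sketched roughly as follows (a hypothetical simplification using only the intrinsic-type component of signatures, with the string type omitted; the struct layout, names and distance weights are illustrative, not MaJIC's actual ones):

```c
#include <assert.h>
#include <stddef.h>

/* Intrinsic types on the numeric chain bot < bool < int < real < cplx < top;
   the enum rank doubles as a crude lattice height. */
typedef enum { BOT, BOOL, INT, REAL, CPLX, TOP } itype;

typedef struct {
    size_t nparams;
    const itype *sig;     /* type signature the code was compiled for */
    void *code;           /* compiled entry point */
} compiled_version;

/* Safe iff every actual type is a subtype of the formal type. */
static int is_safe(const itype *actual, const compiled_version *v, size_t n) {
    if (v->nparams != n) return 0;
    for (size_t i = 0; i < n; i++)
        if (actual[i] > v->sig[i]) return 0;
    return 1;
}

/* Manhattan-like distance: sum of per-parameter lattice gaps.
   Smaller distance means a more specialized (likely faster) version. */
static int distance(const itype *actual, const compiled_version *v, size_t n) {
    int d = 0;
    for (size_t i = 0; i < n; i++)
        d += (int)v->sig[i] - (int)actual[i];
    return d;
}

/* Pick the safe version closest to the invocation, or NULL (=> JIT compile). */
const compiled_version *locate(const itype *actual, size_t n,
                               const compiled_version *versions, size_t nv) {
    const compiled_version *best = NULL;
    int best_d = 0;
    for (size_t i = 0; i < nv; i++) {
        if (!is_safe(actual, &versions[i], n)) continue;
        int d = distance(actual, &versions[i], n);
        if (!best || d < best_d) { best = &versions[i]; best_d = d; }
    }
    return best;
}
```

For an int-typed actual argument and a repository holding a real version and a cplx version, both are safe, but the real version is closer and gets picked; a NULL result is what triggers the JIT compilation described above.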
2.3 Type inference

The type inference engine is an iterative, join-of-all-paths monotonic data-flow analysis framework [17]. It starts out with the control flow graph (CFG) of a MATLAB program and, in the case of JIT type inference, a type signature T (where |T| is equal to the number of formal parameters of the function that is being compiled). The result of type inference is a set of type annotations S, one type for each expression node in the abstract syntax tree. S is a conservative estimate of the types that expression nodes can assume during execution. The annotations are later used by the code generator.

Because MaJIC has a relatively simple type system, and because the type inference engine avoids symbolic computation and caps the number of iterations, the type inference engine is fast enough for use by the JIT compiler.
2.3.1 Transfer functions

The transfer functions of the type inference engine are implemented as a set of rules in a type calculator. The calculator has two modes of operation: in forward mode it infers expression types from argument types; in backward mode it infers argument types from expression types (this mode is used by the type speculator).

Multiple type calculation rules may exist for each AST node type. Each rule is guarded by a boolean precondition. When the type calculator is invoked with a particular AST node as argument, the corresponding rules' preconditions are tested in order until one evaluates to true; that rule is then applied to calculate the result(s).

A rational way of ordering type inference rules is to progress from the most restrictive to the least restrictive. Evaluating more restrictive rules first makes sense because these generally lead to better performance, whereas more general rules tend to yield generic, low-performance code. If no rule's precondition evaluates to true, the type calculator applies the implicit default rule: all output types are set to ⊤. This allows the type inference engine to behave conservatively for language constructs that have no corresponding rules in the database.
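The guarded-rule dispatch can be sketched as a table of precondition/transfer pairs tried in order (an illustrative simplification with two rules for "*" and made-up names; MaJIC's actual calculator has many more rules and richer types):

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BOT, BOOL, INT, REAL, CPLX, TOP } itype;

/* A guarded type-calculation rule for a two-operand AST node. */
typedef struct {
    int   (*guard)(itype a, itype b);   /* precondition */
    itype (*apply)(itype a, itype b);   /* transfer function */
} rule;

static int   both_int(itype a, itype b)   { return a == INT && b == INT; }
static itype yield_int(itype a, itype b)  { (void)a; (void)b; return INT; }
static int   both_real(itype a, itype b)  { return a <= REAL && b <= REAL; }
static itype yield_real(itype a, itype b) { (void)a; (void)b; return REAL; }

/* Rules for "*", ordered most restrictive first. */
static const rule mul_rules[] = {
    { both_int,  yield_int  },    /* integer scalar multiply */
    { both_real, yield_real },    /* real scalar multiply */
};

/* Try each guard in order; fall back to the implicit default rule. */
itype calc(const rule *rules, size_t n, itype a, itype b) {
    for (size_t i = 0; i < n; i++)
        if (rules[i].guard(a, b))
            return rules[i].apply(a, b);
    return TOP;                   /* default: behave conservatively */
}
```

With this ordering, int*int dispatches to the specialized integer rule, int*real falls through to the real rule, and cplx*real hits the conservative default.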
Thus, for example, the "*" operator in MaJIC can be evaluated successively as an instance of: integer scalar multiply; real scalar multiply; complex scalar multiply; real scalar × vector or vector × scalar; part of a dgemv operation; or a generic complex matrix multiply. This does not exhaust all possibilities, but these are the categories for which MaJIC can generate successively less optimized code.

MATLAB code:

    function p=poly1(x)
    p = x.^5+3*x+2;
    return

Generated code (C + MATLAB C library functions), one version per type signature:

itype(x)=int, shape(x)=scalar, limits(x)=⟨3,3⟩:

    int poly1_sig0() {
        return 254;
    }

itype(x)=int, shape(x)=scalar, limits(x)=⊤l:

    int poly1_sig1(int x) {
        return x*x*x*x*x + 3*x + 2;
    }

itype(x)=real, shape(x)=scalar, limits(x)=⊤l:

    double poly1_sig2(double x) {
        return x*x*x*x*x + 3.0*x + 2.0;
    }

itype(x)=real, minshape(x)=maxshape(x) (an exact 3-element shape), limits(x)=⊤l:

    double *poly1_sig3(double x[3]) {
        static double tmp2[3];
        tmp2[0] = x[0]*x[0]*x[0]*x[0]*x[0] + 3.0*x[0] + 2.0;
        tmp2[1] = x[1]*x[1]*x[1]*x[1]*x[1] + 3.0*x[1] + 2.0;
        tmp2[2] = x[2]*x[2]*x[2]*x[2]*x[2] + 3.0*x[2] + 2.0;
        return tmp2;
    }

itype(x)=complex, shape(x)=⊤s, limits(x)=⊤l:

    mxArray *poly1_sig4(mxArray *x) {
        mxArray *tmp1 = mlfScalar(5.0);
        mxArray *tmp2 = mlfPower(x, tmp1);   mxFree(tmp1);
        mxArray *tmp3 = mlfScalar(3.0);
        mxArray *tmp4 = mlfTimes(tmp3, x);   mxFree(tmp3);
        mxArray *tmp5 = mlfPlus(tmp2, tmp4); mxFree(tmp2); mxFree(tmp4);
        mxArray *tmp6 = mlfScalar(2.0);
        mxArray *tmp7 = mlfPlus(tmp5, tmp6); mxFree(tmp5); mxFree(tmp6);
        return tmp7;
    }

Figure 3: Type signatures and generated code. The operators itype(x), shape(x) and limits(x) refer to type components from the type lattice defined earlier.

Currently, MaJIC's type calculator contains about 250 rules. Each MATLAB expression/operator type has at least one entry in the database; many of MATLAB's built-in functions have several entries each. Our current implementation covers just enough of MATLAB to execute the benchmarks efficiently. The type inference engine can handle all other language features by resorting to the default rule.
2.4 JIT type inference

In JIT mode, the type calculator performs only forward analysis. Type inference propagates the type signature of the function to calculate type annotations for the function body. Since the type inference system is biased towards speed at the expense of precision, one would expect the quality of type annotations to suffer when performing just-in-time type inference. However, JIT type inference operates with very precise initial data: the type signature of the code, derived directly from the input values of the runtime invocation. Under these circumstances type inference is not only precise but lends itself to a number of extensions, which extract additional information from the type inference process at little or no additional cost:
• Constant propagation: Range propagation (the part of type inference which deals with the Ll lattice) can be thought of as a generalization of constant propagation for real scalars. A real value is a constant if its lower and upper limits are equal. Given a type signature that contains many constants, most of the transfer functions are able to calculate exact lower and upper limits for scalar objects, effectively performing constant propagation as part of type inference.

Range propagation does not work for complex numbers and non-numeric values, so constant propagation does not work for these either.
• Exact shape inference: MaJIC propagates lower and upper bounds for array shape information. An array's shape is exactly determined if the lower and upper shape bounds are equal. Just as constants can be determined given good input data, exact array shapes can be determined also. Sometimes value range propagation and shape propagation collaborate on determining exact shapes. For example, in the statement A = zeros(m,n), the value ranges of m and n may uniquely determine the shape of A.

In array assignments of the form A(i)=..., the range of the index can determine the shape of the array A (because MATLAB arrays reshape themselves to accommodate indices).

There are a number of ways in which exact shapes can be used to achieve better performance. For example, by completely unrolling simple operations on small arrays we can eliminate all control flow from the operation.
• Subscript check removal: MATLAB mandates subscript checks on all array accesses. The removal of unnecessary subscript checks is a major source of performance enhancement in MaJIC.

Older versions of MATLAB's own compiler, mcc, had command line switches to disable subscript checks (including resizing checks). This could cause otherwise correct code to run incorrectly when compiled with mcc. Newer versions of mcc have consequently discontinued the option.

MaJIC removes subscript checks automatically and conservatively, by using the range and shape information readily available during type inference. Because JIT type inference propagates these exactly, the extra effort needed for subscript check analysis is extremely low, comparing favorably with more conventional techniques [13].
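The decision itself reduces to comparing the index's inferred value range against the array's lower shape bound (a hypothetical simplification flattening the two-dimensional shape to an element count; the names are ours, not MaJIC's):

```c
#include <assert.h>

/* Sketch of conservative subscript-check elision.
   idx is the inferred value range of the (1-based) index expression;
   min_elems is a lower bound on the array's element count, taken from
   the lower shape descriptor. */
typedef struct { double lo, hi; } range;

/* The runtime check on A(idx) can be removed only when the whole range
   is provably a valid index: at least 1 and at most min_elems. */
int subscript_check_removable(range idx, double min_elems) {
    return idx.lo >= 1.0 && idx.hi <= min_elems;
}
```

For a loop index i with range ⟨1, N⟩ over an array known to hold at least N elements, the check disappears; if the array might be smaller, or the index might reach 0, the check conservatively stays.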
2.5 Type speculation

Just-in-time type inference assumes that the full calling context (i.e. the type signature) is available to the analyzer. By contrast, type speculation assumes nothing about the calling context: it guesses the likely types of the arguments. This allows the compiler to process the code ahead of time, applying advanced (and time-consuming) loop optimizations in order to generate good code.

The type speculator's trick is to back-propagate certain type hints from the body of the code to the input parameters. Type hints are collected from syntactic constructs that suggest, but do not command, particular semantic meanings. These constructs originate in part from programmers' tendency to avoid arcane MATLAB-specific constructs, sticking instead to features already prevalent in Fortran-like languages. Other hints can be derived from some MATLAB built-in functions' affinity towards certain inputs. The following list summarizes the type hints used by MaJIC's speculator:
• When processing the colon operator (:), used to specify index ranges, MATLAB silently ignores the imaginary part of the index arguments. Even if the index is a complex array, only the real part of its first element is used, and all indices are of course rounded before use. This suggests that operands of the interval operator are almost always integer scalars.

• Relational operators disregard the imaginary components of their operands. Also, relational operations between vectors are possible but rare in practice, since their semantics are non-intuitive. This holds even more strongly for expressions that form the condition of an if-statement or a while-statement.
• The MATLAB bracket operator (vector constructor) collates several matrices into a new, larger matrix. The components must all have either the same number of rows or the same number of columns. In practice the bracket operator is often used to build vectors out of scalars. When we can prove that one of the arguments xi of the bracket operator [x1 x2 ... xn] is a scalar, all other arguments are probably scalars too.

• In matrix index expressions of the form A(idx) and A(idx1,idx2), if the subscript is an expression or a variable then it is likely scalar. This is a reasonable assumption because many MATLAB applications use either Fortran77- or Fortran90-compatible array indexing operations. Fortran90 syntax is indicated by the presence of the colon (:) operator; the lack of colons indicates Fortran77 syntax.
• Arguments to a number of built-in functions, such as zeros, ones, rand, the second argument of size, and many others, are likely integer scalars. MATLAB issues warnings when the arguments in question are non-scalars or non-integers, but does not stop processing. However, most well-written MATLAB programs don't intentionally produce these warnings.

These hints are implemented as type calculator rules. Note that the hints involve backwards propagation of types, since they make statements about input arguments rather than the result types of MATLAB expressions. Thus, in order to propagate hints, the type inference engine must be used in backwards mode.
Speculative type inference consists of a number of alternating backward and forward type inference passes. A speculative (backward) pass infers a credible type signature from the code body; it is immediately followed by a normal type inference pass to re-calculate the types in the body. The alternating backwards-forwards process can be iterated several times until convergence.
2.6 Code generation<br />
<strong>MaJIC</strong> has two code generation systems: a fast lightweight<br />
code generator used <strong>for</strong> JIT compilation, <strong>and</strong> a C (or Fortran)<br />
based code generator that uses the host system to compile,<br />
optimize <strong>and</strong> link the code. Both code generators use<br />
the parsed AST <strong>and</strong> type annotations to drive code selection.<br />
The code generators follow the same general selection<br />
rules, but build radically different code.<br />
The JIT code generator is able to build executable code<br />
directly in memory by using the vcode [11] dynamic assembler.<br />
The code generator makes a single code selection pass<br />
through the parsed AST. No loop optimizations or instruction<br />
scheduling are per<strong>for</strong>med. Register allocation is done<br />
using the linear-scan register allocator [19]. This, <strong>and</strong> the<br />
small total number of code generation passes, results in a<br />
fast code generator.<br />
The source code generator is somewhat more complicated. It uses the same code selection pass as the JIT code generator, but builds C or Fortran source code in a temporary file. This file is then compiled with the native compiler using the most aggressive optimization mode available. The compiler generates a relocatable object, which is then dynamically linked into the MaJIC executable. Unlike the JIT code generator, the source code generator is quite slow, hampered by the large overhead of loading and executing the compiler and the linker. Compilation, optimization and linking can take several seconds.
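The workflow is essentially: emit source, invoke the host compiler at its highest optimization level, and dynamically link the result. The sketch below only constructs the compile command; the `cc` driver name and the `-O3 -shared -fPIC` flags are assumptions of the example, not MaJIC's actual invocation:

```python
# Sketch of the source-path workflow: write generated C to a temporary
# file and build the host-compiler command line that would turn it into
# a dynamically loadable shared object.

import tempfile, os

def emit_and_build(c_source, cc="cc", opt="-O3"):
    """Write generated C to a temp file; return (compile command, object path)."""
    fd, src = tempfile.mkstemp(suffix=".c")
    with os.fdopen(fd, "w") as f:
        f.write(c_source)
    obj = src[:-2] + ".so"
    cmd = [cc, opt, "-shared", "-fPIC", src, "-o", obj]
    return cmd, obj

# The resulting object is then linked into the running process,
# e.g. via dlopen()/dlsym() in C, or ctypes.CDLL(obj) in Python,
# after running `cmd` with subprocess.
```

The several-second latency quoted in the text comes mostly from launching the compiler and linker processes, not from the code selection itself.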
2.6.1 Code selection rules
As mentioned before, the two code generators use the same selection rules even though they use them to produce different code. A few of the selection rules are listed below.
• The implicit default rule for any operator is that the numeric operands are complex matrices. This is the unoptimized fall-back option for operations that have not been type-inferred. The MATLAB C library provides functions that implement these generic operators.
• MaJIC inlines scalar arithmetic and logical operations, elementary math functions, and assignments of scalar integers, reals and complex numbers. This is probably the most important performance optimization in MaJIC: it relies on type annotations to replace MATLAB's polymorphic operations with single machine instructions.
• MaJIC inlines scalar and F90-like array index operations. The MATLAB interpreter discriminates between array expression types at runtime, spending hundreds of cycles. By contrast, an inlined scalar index operation takes only a few cycles.
• Small temporary arrays of known sizes are pre-allocated. MATLAB's expression evaluation semantics sometimes forces the existence of temporary buffers to hold intermediate array results. Replacing dynamically allocated buffers with statically allocated ones saves a great deal of overhead at the expense of a small amount of heap memory.
• Elementary vector operations, such as arithmetic operations and vector concatenation, are completely unrolled when exact array shapes are known. This technique is very effective on small (up to 3 × 3) matrices and vectors because it completely eliminates loop overhead.
• MaJIC performs code selection to combine several AST nodes into a single library call. For example, expressions like a*X+b*C*Y are transformed into a single call to the BLAS routine dgemv [7].
• Unlike Fortran, MATLAB resizes arrays on demand. In general, this occurs when an array index overflow occurs on the left-hand side. Repetitive array resizing (e.g. in a loop) can be tremendously expensive. MaJIC applies the simple but effective technique of "oversizing" arrays, i.e. allocating about 10% more space for a resized array than strictly necessary, so that subsequent growth of the array does not necessitate another resize operation.
MaJIC performs oversizing carefully in order to preserve the original semantics of the code. The oversized array, when queried, returns accurate size information. Oversizing is also limited by the amount of available memory and the size of the array. Large arrays are never oversized.
• MaJIC inlines calls to small (less than 200 lines of code) functions. Inlining preserves the call-by-value semantics of MATLAB by making copies of the actual parameters. However, read-only formal parameters are not copied. This can result in a huge performance gain when large matrices are passed as read-only arguments in the call.
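The oversizing rule in particular is easy to sketch. The growth policy below is illustrative: the 10% factor matches the text, but the size cap and the bookkeeping are assumptions of the example, not MaJIC's implementation:

```python
# Toy sketch of the "oversizing" growth policy: when an assignment
# indexes past the end of an array, allocate ~10% headroom so that a
# loop growing the array one element at a time does not reallocate on
# every iteration. Factor and cap are illustrative.

OVERSIZE_FACTOR = 1.10
OVERSIZE_CAP = 1 << 20          # large arrays are never oversized

class GrowableArray:
    def __init__(self):
        self.data = []          # physical buffer (may hold hidden headroom)
        self.length = 0         # logical size reported to the program

    def store(self, index, value):
        if index >= len(self.data):               # index overflow on the LHS
            new_cap = index + 1
            if new_cap < OVERSIZE_CAP:
                new_cap = int(new_cap * OVERSIZE_FACTOR) + 1
            self.data.extend([0.0] * (new_cap - len(self.data)))
        self.data[index] = value
        self.length = max(self.length, index + 1)

    def size(self):
        return self.length      # queries never see the hidden headroom
```

Reporting `self.length` rather than the buffer size is what preserves MATLAB semantics: the program cannot observe the headroom.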
3. PERFORMANCE EVALUATION
In this section, we evaluate the overall performance of the MaJIC compiler. Although the repository is part of the interactive MaJIC system, an evaluation of its performance is not a goal of this paper. We were interested in analyzing the quality and speed of the JIT and speculative compilers.
To test JIT compilation, we started our experiments with an empty repository. This resulted in the JIT compiler being invoked for every function call.
To test speculative compilation, we also started up MaJIC with an initially empty repository, but we invoked the benchmarks only after MaJIC's repository had ample time to find them and compile them speculatively.
3.1 Benchmarks
MaJIC was tested with 15 MATLAB benchmarks, between 50 and 250 lines long each. Table 1 lists the names, origin and functional description of the benchmarks, as well as the associated problem size for which measurements were run (matrix sizes in some of the benchmarks). In addition, we list the number of lines in each benchmark and the runtime on a reference system (the SPARC platform described in Section 3.3) using a stock MATLAB interpreter.
Many of the benchmarks were originally used to evaluate FALCON; we reused them in order to facilitate a direct comparison of MaJIC and FALCON.
To make the subsequent discussion easier, we group the benchmarks into four partially overlapping categories. Benchmarks in the same category tend to be optimized in similar ways by the compiler, and show similar performance gains:
• Scalar, or Fortran-like, benchmarks: dirich, finedif, icn, mandel and, to some extent, crnich are written in a style that closely resembles Fortran 77. All array indices in these benchmarks are scalars.
• Benchmarks with built-in functions: cgopt, qmr, sor and mei spend a large portion of their runtime in built-in MATLAB library functions. Typically, these codes are hard to optimize, since the library functions themselves are already optimized.
• Array benchmarks: orbec, orbrk, fractal and adapt have many operations on small fixed-size MATLAB vectors. adapt features a large (and dynamically growing) array as well as small vectors.
• Recursive benchmarks: fibo and ack contain recursion, which makes inlining and type inference harder.
3.2 Measurement methodology
Our performance figures are derived from the running times of the benchmarks. The most important gauge of performance we use is the speedup of compiled code relative to interpreted code, i.e. the expression s = ti/tc, where ti is the runtime of the code in MATLAB's interpreter and tc is the runtime of the compiled code.
We measured MaJIC's speedups in both JIT and speculative compilation mode. In JIT mode the runtime includes the time spent by the JIT compiler producing object code. In speculative mode the repository is assumed to have generated the code ahead of time; hence compile time is not included in the runtime in this case, unless the speculatively generated code turns out not to match the benchmark's invocation, in which case the JIT compiler kicks in and helps out with the code generation. This mode of measuring runtimes is consistent with the expected real-world usage pattern of MaJIC.
For purposes of comparison, we also measured the speedups of mcc, the compiler supplied by Mathworks Inc. We set a number of compile-time options for this compiler in order to guarantee the best performance: we manually eliminated
benchmark   source      short description                                problem size    lines of code  runtime (s)
adapt       [14]        adaptive quadrature                              approx. 2500    81             5.24
cgopt       [3]         conjugate gradient w. diagonal preconditioner    420 x 420       38             0.43
crnich      [14]        Crank-Nicholson heat equation solver             321 x 321       40             16.33
dirich      [14]        Dirichlet solution to Laplace's equation         134 x 134       34             277.89
finedif     [14]        finite difference solution to the wave equation  1000 x 1000     21             57.81
galrkn      [12]        Galerkin's method (finite element method)        40 x 40         43             8.02
icn         R. Bramley  Cholesky factorization                           400 x 400       29             7.72
mei         unknown     fractal landscape generator                      31 x 14         24             10.77
orbec       [12]        Euler-Cromer method for 1-body problem           62400 points    24             19.10
orbrk       [12]        Runge-Kutta method for 1-body problem            5000 points     52             9.30
qmr         [12]        linear equation system solver, QMR method        420 x 420       119            5.29
sor         [3]         lin. eq. sys. solver, successive overrelaxation  420 x 420       29             4.77
ackermann   authors     Ackermann's function                             ackermann(3,5)  15             3.84
fractal     authors     Barnsley fern generator                          25000 points    35             26.55
mandel      authors     Mandelbrot set generator                         200 x 200       16             8.64
fibonacci   authors     recursive Fibonacci function                     fibonacci(20)   10             1.29
subscript checks, and replaced operations on complex numbers with real-number operations where it was safe to do so.
We measured the speedups of FALCON by repeating the experiments described in [9] on our test machines. We instructed FALCON to eliminate subscript checks wherever this did not break the code.
Execution times were measured on a "best of 10 runs" basis on a quiet system.
Our performance graphs show four bars for each benchmark. The four bars are the speedups achieved by mcc, FALCON, MaJIC in JIT mode and MaJIC in speculative mode, respectively. Because the speedups are distributed over four orders of magnitude, ranging from 0.1 to about 1000, the graphs use a logarithmic scale.
3.3 Testing platforms and speedups
We measured the interpreted execution time ti of all benchmarks using the MATLAB 6 (release 12) integrated environment on two architectures:
• The development platform for MaJIC is a 400MHz UltraSparc 10 workstation with 256MB of RAM, running Solaris 7 and equipped with the Sparcworks 5.0 C compiler. The performance results for this machine are summarized in Figure 4.
As described above, the figure has bars for each benchmark, labeled "mcc", "falcon", "jit" and "spec", respectively. A few of the speedup bars are missing: there are no FALCON speedup bars for the benchmarks ack, fractal, fibo and mandel, because these were not part of the original FALCON benchmark series and are unsuitable for compilation with FALCON.
The speedup bars of cgopt appear to be missing because they are very close to 1.0.
• We also ran some of the experiments on an SGI Origin 200 machine equipped with 4 180MHz R10000 processors, IRIX 6.5 and the MIPSPro C compiler. The JIT compiler on this platform is not yet completely implemented. Some benchmarks (like adapt) were left out of the graphs for this reason. Others are included, but
Table 1: MaJIC benchmarks
[Figure 4: Performance on the SPARC platform. Log-scale speedup bars (mcc, falcon, jit, spec) for each benchmark; y-axis from 0.1 to 1000.]
run at reduced performance due to the poor quality of the generated code. Figure 5 shows the results.
3.4 Comparative performance analysis
The two groups of benchmarks that most clearly benefit from compilation are the Fortran-like benchmarks and the small-vector benchmarks. These types of codes incur the most overhead during interpreted execution; they profit the most from the removal of overhead.
By contrast, the benchmarks that are heavy in built-in function calls benefit very little, and sometimes not at all, from compilation. Obviously, the execution speed of built-in functions is not influenced by compiling the calling code.
The orbrk benchmark demonstrates that inlining at compile time is beneficial. Recursive functions like fibo and ack also generally benefit from inlining. MaJIC does not attempt to inline more than 3 levels of recursive calls, in order to avoid code explosion.
While mcc is not particularly successful at removing the interpretive overhead, both FALCON and MaJIC do succeed in eliminating it, although using different strategies. FALCON relies heavily on the native Fortran compiler to
[Figure 5: Performance on the MIPS platform. Log-scale speedup bars (mcc, falcon, jit, spec) for each benchmark; y-axis from 0.1 to 10000.]
generate good code. MaJIC has a few specific optimizations (described in Section 2.6.1) that make it less reliant on the native compiler and allow it to generate reasonable code even with the JIT code generator.
On the SPARC platform the native Fortran-90 compiler generates relatively poor code, causing MaJIC to outperform FALCON on a few of the benchmarks. On the MIPS platform the native compiler is excellent, causing MaJIC's JIT compiler to fall behind FALCON.
3.5 Analysis of JIT compilation
For the analysis of JIT compilation we rely mostly on results gathered on the SPARC platform, since the JIT code generator was optimized for this platform. The performance figures are remarkable considering that the code in question is generated in a fraction of a second and without the benefit of backend optimizations. On the other hand, there is room for future optimizations; however, before adding them, it will be necessary to test whether the increased compile time would destroy the performance gained by optimization.
Figure 6 shows the composition of the runtime of each JIT-compiled benchmark. With the exception of orbrk, most benchmarks spend a relatively modest amount of time compiling the code. The compile time/runtime ratio shown is artificially high, in part because the benchmarks run on modestly sized problems. There is definitely room for at least basic back-end optimizations in the JIT compiler, such as common subexpression elimination, loop unrolling, loop-invariant removal and some form of instruction scheduling. Preliminary experiments with the finedif and dirich benchmarks suggest that loop unrolling alone can reduce execution time by about 50% at a reasonable cost in overhead.
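To make the unrolling opportunity concrete, a code generator can replicate the body of a counted loop a fixed number of times and emit a cleanup loop for the remainder. The sketch below is an illustration at the level of statement templates, not MaJIC's actual IR:

```python
# Illustrative loop unrolling: a counted loop whose body is produced by
# a statement template is replicated by a factor k, with a cleanup loop
# handling trip counts not divisible by k.

def unroll(start, stop, body_template, k=4):
    """body_template: function i -> list of statements (strings)."""
    stmts = []
    i = start
    while i + k <= stop:
        for j in range(k):          # k replicated copies, no branch between them
            stmts += body_template(i + j)
        i += k
    while i < stop:                 # cleanup for the remaining iterations
        stmts += body_template(i)
        i += 1
    return stmts
```

Unrolling pays off here because it removes the per-iteration branch and exposes more scheduling freedom, at the cost of a modest amount of extra generated code.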
3.5.1 The effect of existing JIT optimizations
The effect of optimizations in any compiler is cumulative and hard to study in isolation. In this section we evaluate the effectiveness of JIT-specific optimizations by individually disabling them and studying the resulting drop in performance. Figure 7 shows the measurement results.
The first set of bars ("no range") was obtained by disabling range propagation during JIT type inference. The primary effect of this measure is to disable subscript check
[Figure 6: The composition of JIT execution. Normalized execution time per benchmark, broken down into disamb, typeinf, codegen and exec phases.]
[Figure 7: Disabling JIT optimizations. Performance relative to fully optimized JIT, with bars for "no ranges", "no min. shapes" and "no regalloc".]
removal. The relative increase in execution time is highest in the benchmarks that have many array accesses: dirich, finedif and mandel are good examples.
The second set of bars ("no min. shapes") was obtained by disabling the propagation of minimum shape information. This disables subscript check removal in some cases, and does not allow the compiler to unroll small vector operations. orbec, orbrk and fractal are the most affected, because these consist mostly of operations on small vectors and matrices.
The last set of bars ("no regalloc") was obtained by forcing the linear-scan register allocator to spill every variable. This is roughly equivalent to compiling with the -g flag set on a regular compiler like gcc.
The results clearly show that range propagation, minimum shape propagation and register allocation are essential to JIT performance.
3.6 Analysis of the speculator
The speedup results produced by speculation generally match those of FALCON. We conclude that speculation is generally successful. However, we cannot expect a speculative technique to be universally successful; we need to analyze the consequences of failure.
MaJIC's type speculator fails in two ways: either by being too aggressive and generating useless code, or by not being aggressive enough and generating suboptimal code. The first type of failure, unreasonable specialization of input types, is easily countered: the type signature check, done by the repository at runtime, will eliminate such code from consideration.
A more insidious failure is when the speculator generates code that is perfectly safe to execute, but suboptimal. Such cases are not caught at runtime. The performance of the invoked code will be lower, but it is not immediately clear by how much.
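The runtime signature check that guards against the first failure mode can be sketched as a dispatch table keyed on (function, argument-type signature), with the JIT compiler as the fallback. All names below are illustrative, not MaJIC's actual interfaces:

```python
# Sketch of the repository's runtime dispatch: speculatively compiled
# code is used only if its type signature matches the actual call;
# otherwise the JIT compiler produces a matching version on the spot.

def signature(args):
    """Derive a type signature from the actual arguments."""
    return tuple(type(a).__name__ for a in args)

class Repository:
    def __init__(self, jit_compile):
        self.compiled = {}              # (function, signature) -> compiled code
        self.jit_compile = jit_compile  # fallback compiler

    def add_speculative(self, fn, sig, code):
        """Register code produced ahead of time for a guessed signature."""
        self.compiled[(fn, sig)] = code

    def invoke(self, fn, *args):
        sig = signature(args)
        code = self.compiled.get((fn, sig))
        if code is None:                # speculation missed: JIT kicks in
            code = self.jit_compile(fn, sig)
            self.compiled[(fn, sig)] = code
        return code(*args)
```

A mispredicted signature is thus never executed; it simply wastes the ahead-of-time compilation effort and pays one JIT compilation at call time.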
benchmark  crnich  dirich  finedif  icn     mandel
spec.      181     817     412      48      36
JIT        181     817     413      51      54.0

benchmark  cgopt   mei     qmr      sor     adapt
spec.      1       4.24    4.52     1.68    4.09
JIT        1.16    5.67    5.68     1.79    4.16

benchmark  orbec   orbrk   fractal  galrkn  ack
spec.      146     465     663      61.7    4.04
JIT        174     465     664      72.9    6.00

benchmark  fibo
spec.      3.49
JIT        5.16

Table 2: JIT vs. speculative type inference
Table 2 attempts to quantify the speculator's performance. It compares the speedups produced by the same code generator using type annotations generated with either speculation or JIT type inference (the speedups were calculated without considering compile time). Looking at this table, it is obvious that speculative type inference closely matches the performance of JIT type inference in many cases. We conclude that:
• Speculation works best on scalar (Fortran 77-like) and vector codes. Speculative rules look for exactly the kinds of features that are prevalent in these codes.
• Benchmarks with built-in functions typically fare badly because the speculative rules currently present in MaJIC do not account for the language features used by these codes. MaJIC mispredicts a "*" operator in qmr to represent scalar multiplication, whereas in fact it is a matrix-vector multiplication. In mei the speculator is unable to predict that the arguments to an eig function call are reals; instead it considers them complex values, which leads to a performance loss. A similar situation occurs in mandel due to the use of the built-in function i.
• Recursive benchmarks are not handled correctly by speculative compilation. They always need to be recompiled at runtime.
4. RELATED WORK
MaJIC is patterned after FALCON [9, 8], a MATLAB to Fortran-90 translator developed by L. DeRose in 1996. FALCON performs type inference to generate declarations for variables. It then generates Fortran code using these declarations. However, FALCON's type inference engine faces a limiting factor: because FALCON is a batch compiler, it has no information about the calling context of the functions it tries to compile. This makes type inference potentially ineffective. FALCON circumvents this problem by "peeking" into the input files of the code it compiles and extracting type information from there.
MENHIR [5], developed by Francois Bodin at INRIA, is another batch compiler similar to FALCON: it generates code for MATLAB and exploits parallelism by using optimized runtime libraries. MENHIR's code generation is retargetable (it generates C or FORTRAN code). It also contains a type inference engine similar to FALCON's.
MATCH [2] is a MATLAB compiler targeted at heterogeneous architectures, such as DSP chips and FPGAs. It also uses type analysis and generates code for multiple large functional units.
Vijay Menon's vectorizer [16] is an alternative to compilation. Menon observed that scalar operations in MATLAB were slower than vector operations because they involved more overhead per floating-point operation. He proposed to eliminate this overhead not by compilation, but by translating Fortran 77-like scalar operations into Fortran 90-like vector expressions in the MATLAB source code. Menon's vectorizer is built on top of the MaJIC infrastructure.
Just-in-time compilation has been around since 1984, when Deutsch described a dynamic compilation system for the Smalltalk language [10]. The technique became truly popular with the Java language, and countless Java JIT compilers have been proposed and implemented in recent times.
MaJIC's JIT compiler reuses code and ideas from the vcode [11] and tcc [18] packages. vcode was originally built as a general-purpose, platform-independent, RISC-like dynamic assembly language to facilitate dynamic code generation, and is used in almost unchanged form by MaJIC. tcc was built on top of vcode and provides an implementation of 'C, a C-like programming language with a LISP-like backquote operator that facilitates the building of dynamic code by composition. We did not reuse the 'C parser, but we did use the tcc intermediate language specification, ICODE, and re-implemented the register allocator used by tcc.
5. CONCLUSIONS
In an effort to bring high performance to the MATLAB integrated environment, we have designed, built and evaluated two paradigms for compiling MATLAB code: JIT compilation and speculative compilation.
JIT compilation is remarkably successful in bringing compile time down to almost nil, while obtaining reasonable performance gains (up to two orders of magnitude faster than the MATLAB interpreter). It falls behind in terms of performance when compared to the best that a static compiler (like FALCON) can do. Of our benchmarks, the most affected were the Fortran-like and small-vector codes, where the lack of backend optimization is felt the most.
In order to estimate the effect of adding more optimizations to the JIT compiler, we hand-optimized the finedif benchmark by hand-unrolling its innermost loop and performing common subexpression elimination. We obtained a version of finedif that was almost 100% faster than the normal JIT-compiled finedif, and within 20% of the performance of the best (native-compiler-generated) version of the code. Preliminary data suggest that similar, although less impressive, performance improvements can be obtained with some of the other Fortran-like benchmarks, which leaves the door open for future enhancements of the JIT compiler.
Speculative compilation is successful in bringing performance up to, and beyond, FALCON levels. However, generation of optimized code takes time; speculation is designed to allow the hiding of compilation latency. Speculation is not universally successful; it can result in a loss of performance when it fails.
It is interesting to note that the speculative type hints used most successfully by MaJIC's speculator are tied to the very same language features of MATLAB that slow down the interpreter. Hence, speculation tends to succeed when it is most needed.
6. REFERENCES
[1] George Almasi. MaJIC: a MATLAB Just-In-Time Compiler. PhD thesis, University of Illinois at Urbana-Champaign, June 2001.
[2] P. Banerjee, N. Shenoy, A. Choudhary, S. Hauck, C. Bachmann, M. Chang, M. Haldar, P. Joisha, A. Jones, A. Kanhare, A. Nayak, S. Periyacheri, and M. Walkden. MATCH: A MATLAB compiler for configurable computing systems. Technical Report CPDC-TR-9908-013, Center for Parallel and Distributed Computing, Northwestern University, Aug. 1999.
[3] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.
[4] William Blume and Rudolf Eigenmann. Symbolic range propagation. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.
[5] Francois Bodin. MENHIR: High performance code generation for MATLAB. http://www.irisa.fr/caps/PEOPLE/Francois/.
[6] Timothy Budd. An APL Compiler. Springer Verlag, 1988.
[7] J. Choi, J. Dongarra, and D. W. Walker. BLAS reference manual (version 1.0beta). Technical Report ORNL/TM-12469, Oak Ridge National Laboratory, March 1994.
[8] Luiz DeRose and David Padua. Techniques for the translation of MATLAB programs into Fortran 90. ACM Transactions on Programming Languages and Systems (TOPLAS), 21(2):285–322, March 1999.
[9] Luiz Antonio DeRose. Compiler Techniques for MATLAB Programs. Technical Report UIUCDCS-R-96-1996, Department of Computer Science, University of Illinois, 1996.
[10] L. Peter Deutsch and Alan Schiffman. Efficient Implementation of the Smalltalk-80 System. In Proceedings of the 11th Symposium on the Principles of Programming Languages, Salt Lake City, UT, 1984.
[11] Dawson R. Engler. VCODE: a portable, very fast dynamic code generation system. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '96), Philadelphia, PA, May 1996.
[12] Alejandro L. Garcia. Numerical Methods for Physics. Prentice Hall, 1994.
[13] Rajiv Gupta. Optimizing array bounds checks using flow analysis. ACM Letters on Programming Languages and Systems, 2(1-4):135–150, 1993.
[14] John H. Mathews. Numerical Methods for Mathematics, Science and Engineering. Prentice Hall, 1992.
[15] Mathworks Inc. homepage. www.mathworks.com.
[16] Vijay Menon and Keshav Pingali. High-level semantic optimization of numerical codes. In 1999 ACM Conference on Supercomputing. ACM SIGARCH, June 1999.
[17] Steven S. Muchnick and Neil D. Jones. Program Flow Analysis: Theory and Applications. Prentice Hall, 1981.
[18] Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. tcc: A system for fast, flexible, and high-level dynamic code generation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '97), pages 109–121, Las Vegas, Nevada, May 1997.
[19] Massimiliano Poletto and Vivek Sarkar. Linear scan register allocation. ACM Transactions on Programming Languages and Systems, 21(5):895–913, 1999.