
Figure 2 shows a demand paging approach that uses a limited amount of RAM as a cache for NAND flash. The "romized" program code stays in NAND flash memory, and an MMU loads into the cache only the portions of code that are about to be executed. The major advantage of this approach is that it consumes less RAM: several kilobytes are enough to cache a NAND flash memory. Using less RAM also makes it easier to integrate the CPU, MMU, and cache into a single chip (the shadowed part in Figure 2). The startup latency is short, since the CPU is ready to run as soon as the first NAND flash page has been loaded into the cache. The material cost is relatively lower than that of the previous approach. The MMU may be realized in either hardware or software; its implementation is not covered in this paper.
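As a rough sketch of how such a software-managed cache can work, the following direct-mapped lookup translates an address in the romized code image into a RAM address, loading the containing NAND page on demand. All identifiers and sizes here (nand_read_page, demand_fetch, PAGE_SIZE, NUM_FRAMES) are hypothetical, not taken from the paper:

    #include <stdint.h>

    #define PAGE_SIZE   512          /* assumed small-block NAND page size */
    #define NUM_FRAMES  16           /* 16 frames x 512 B = 8 KB of cache RAM */

    typedef struct {
        int32_t tag;                 /* NAND page number held here, -1 = empty */
        uint8_t data[PAGE_SIZE];
    } frame_t;

    static frame_t cache[NUM_FRAMES];

    /* Hypothetical NAND driver call: copies one flash page into RAM. */
    extern void nand_read_page(int32_t page, uint8_t *dst);

    void cache_init(void)
    {
        for (int i = 0; i < NUM_FRAMES; i++)
            cache[i].tag = -1;       /* mark every frame empty */
    }

    /* Translate a "romized" code address into a cached RAM address,
     * loading the containing NAND page on demand (direct-mapped). */
    uint8_t *demand_fetch(uint32_t addr)
    {
        int32_t page = (int32_t)(addr / PAGE_SIZE);
        uint32_t off = addr % PAGE_SIZE;
        frame_t *f = &cache[page % NUM_FRAMES];

        if (f->tag != page) {        /* miss: roughly 200x the cost of a hit */
            nand_read_page(page, f->data);
            f->tag = page;
        }
        return &f->data[off];        /* hit: a plain RAM access */
    }

The direct-mapped placement is chosen only for brevity; any replacement policy fits behind the same lookup interface.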

However, performance is the major drawback of this approach. The penalty of each cache miss is high, because loading contents from a NAND flash page is nearly 200 times slower than the same operation on RAM. Reducing cache misses therefore becomes a critical issue for such a configuration.
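A back-of-the-envelope calculation illustrates the point. Suppose, for the sake of argument, that a cache hit costs time t and a miss costs 200t. With a 99% hit rate the average access time is 0.99·t + 0.01·200t ≈ 3t, three times slower than pure RAM; at a 95% hit rate it degrades to roughly 11t. Even rare misses dominate the average.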

Fig. 2. Using a cache unit to access NAND flash

3.2 KVM Internals

Source Level. With respect to functionality, the KVM can be broken down into several parts: startup, class file loading and constant pool resolving, the interpreter, garbage collection, and KVM cleanup. Lafond et al. [11] measured the energy consumption of each part of the KVM. Their study showed that the interpreter consumed more than 50% of the total energy. In our experiments running the Embedded Caffeine Benchmark [12], the interpreter contributed 96% of all memory accesses. This evidence establishes the interpreter as the performance bottleneck of the KVM, and it motivated us to focus on reducing the cache misses generated by the interpreter.

Figure 3 shows the program structure of the interpreter. It is a loop enclosing a large switch-case dispatcher. The loop fetches bytecode instructions from the Java application, and each "case" sub-clause, i.e., a bytecode handler, deals with one bytecode instruction. The control flow graph of the interpreter, as illustrated in Figure 4, is a flat and shallow spanning tree. There are three major steps in the interpreter: fetching the next bytecode instruction, dispatching through the switch to the matching handler, and executing that handler.
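A minimal sketch of this structure is shown below, assuming a simple stack machine; the opcode subset and identifiers are illustrative, not the KVM's actual source:

    #include <stdint.h>

    /* Illustrative subset of JVM opcodes; the KVM handles the full set. */
    enum { OP_NOP = 0x00, OP_ICONST_1 = 0x04, OP_IADD = 0x60, OP_RETURN = 0xB1 };

    void interpret(const uint8_t *code)
    {
        int32_t stack[32];           /* toy operand stack */
        int sp = 0;
        const uint8_t *ip = code;

        for (;;) {
            uint8_t opcode = *ip++;  /* step 1: fetch the next bytecode */
            switch (opcode) {        /* step 2: dispatch to a handler */
            case OP_NOP:             /* step 3: execute the handler */
                break;
            case OP_ICONST_1:        /* push the constant 1 */
                stack[sp++] = 1;
                break;
            case OP_IADD:            /* pop two ints, push their sum */
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case OP_RETURN:
                return;
            default:                 /* bytecodes not handled in this sketch */
                return;
            }
        }
    }

Every iteration of the loop touches the fetch and dispatch code plus one handler, which is consistent with the interpreter dominating the memory accesses measured above.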
