Lecture Notes in Computer Science 4917
C.-C. Lin and C.-L. Chen
higher capacity than NOR flash technology does. As the applications of embedded
devices become larger and more complicated, more mainstream devices adopt NAND flash
memories to replace NOR flash memories.
In this paper, we offer one answer to the following question: can we speed up
a Java-enabled device that uses NAND flash memories to store programs? We build
our approach on the page-oriented access property of NAND flash memories, because
each access to NAND flash memory carries a higher penalty than an access to RAM.
Exploiting the unique nature of the KVM interpreter, we found a way to discover the
locality of the KVM during execution, and implemented a post-processing program that
runs after the compiler's code generation stage. The post-processing program refines
the machine code placement of the KVM based on a graph that formalizes both Java
instruction trace patterns and code size constraints. The tuned KVM dramatically
reduces page accesses to NAND flash memories and thus saves battery power as well.
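To make the effect of code placement on page accesses concrete, the following minimal sketch counts the NAND page loads that one bytecode trace would trigger under two different orderings of interpreter handlers. Everything in it is assumed for illustration only: the handler names and sizes, the 512-byte page size, and a buffer that holds a single page at a time. It is not the post-processing program described above.

```python
PAGE_SIZE = 512  # assumed NAND page size in bytes (hypothetical value)

def assign_pages(order, sizes, page_size=PAGE_SIZE):
    """Place handlers back to back in the given order and return a
    mapping handler -> page index. A handler that would straddle a
    page boundary is moved to the start of the next page."""
    pages, addr = {}, 0
    for name in order:
        size = sizes[name]
        offset = addr % page_size
        if offset and offset + size > page_size:
            addr += page_size - offset  # skip to the next page boundary
        pages[name] = addr // page_size
        addr += size
    return pages

def count_page_loads(trace, pages):
    """Count page loads for a single-page buffer: a load happens
    whenever the next handler lives on a different page."""
    loads, current = 0, None
    for name in trace:
        if pages[name] != current:
            loads += 1
            current = pages[name]
    return loads

# Hypothetical handler sizes (bytes) and a trace with two hot pairs.
sizes = {"iload": 200, "iadd": 200, "istore": 200, "goto": 200}
trace = ["iload", "iadd"] * 50 + ["istore", "goto"] * 50

scattered = assign_pages(["iload", "istore", "iadd", "goto"], sizes)
grouped = assign_pages(["iload", "iadd", "istore", "goto"], sizes)

scattered_loads = count_page_loads(trace, scattered)
grouped_loads = count_page_loads(trace, grouped)
```

In this toy setting, placing handlers that follow each other in the trace on the same page removes almost all page loads, which is the kind of gain a trace-aware placement aims for.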
2 Related Works<br />
Park et al. [2] proposed a hardware module connected to NAND flash that allows
direct code execution from NAND flash memory. In this approach, program code
stored in NAND flash pages is loaded into a RAM cache on demand instead of
moving the entire contents into RAM. Their work is a universal hardware-based solution
that does not consider application-specific characteristics.
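The on-demand loading idea can be illustrated with a small software model. The sketch below is not the hardware module of [2]; the class name, the 512-byte page size, the cache capacity, and the least-recently-used eviction policy are all assumptions chosen to keep the example short.

```python
from collections import OrderedDict

class DemandPagedCode:
    """Serve byte reads from a small RAM cache, loading a NAND page
    only when an address on that page is first touched."""

    def __init__(self, nand_image, page_size=512, cache_pages=4):
        self.nand = nand_image          # simulated NAND contents
        self.page_size = page_size      # assumed page size in bytes
        self.cache_pages = cache_pages  # assumed cache capacity in pages
        self.cache = OrderedDict()      # page index -> bytes, oldest first
        self.page_loads = 0             # number of NAND page reads so far

    def _load_page(self, page):
        start = page * self.page_size
        self.page_loads += 1
        return self.nand[start:start + self.page_size]

    def read_byte(self, addr):
        page, offset = divmod(addr, self.page_size)
        if page in self.cache:
            self.cache.move_to_end(page)        # mark as recently used
        else:
            if len(self.cache) >= self.cache_pages:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[page] = self._load_page(page)
        return self.cache[page][offset]
```

A caller would wrap the flash image once, for example `mem = DemandPagedCode(image, cache_pages=2)`, and then fetch instructions through `mem.read_byte(addr)`; `mem.page_loads` then reports how many pages were actually read from the simulated NAND.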
Samsung Electronics offers a commercial product called “OneNAND” [3] based
on the same concept as the above approach. It is a single chip with a standard NOR flash
interface that internally contains a NAND flash memory array for data storage. The
vendor intended to provide a cost-effective alternative to the NOR flash memories used
in existing designs. The internal structure of OneNAND comprises a NAND flash
memory, control logic, hardware ECC, and 5KB of buffer RAM. The buffer RAM
consists of three buffers: 1KB of boot RAM and a pair of 2KB bi-directional
data buffers. Our approach is suitable for systems using this type of
flash memory.
Park et al. [4] proposed another, pure software approach to achieve execute-in-place
by using a customized compiler that inserts NAND flash read operations
into the program code at the proper places. Their compiler determines the insertion
points by summing up the sizes of basic blocks along the calling tree. Although, in
contrast to their previous work [2], special hardware is no longer required, the
approach still needs a tailor-made compiler.
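The text gives only a one-sentence description of how insertion points are chosen, so the sketch below is one plausible reading rather than the algorithm of [4]. It assumes that sizes are accumulated along each root-to-leaf path of the calling tree and that a read is inserted whenever the running total would exceed a RAM budget. The function name, the tree encoding, the block names, and the 1024-byte budget are all hypothetical.

```python
def find_insertion_points(tree, sizes, root, budget):
    """Walk the calling tree depth-first, summing block sizes along
    each path. Return the blocks before which a NAND read would be
    inserted because the code loaded so far no longer fits in budget.

    tree:   dict mapping a block to the list of blocks it calls
    sizes:  dict mapping a block to its size in bytes
    """
    points = []
    stack = [(root, None)]  # (block, bytes loaded since the last read)
    while stack:
        node, loaded = stack.pop()
        if loaded is None or loaded + sizes[node] > budget:
            points.append(node)    # a read is needed before this block
            loaded = sizes[node]   # the running total restarts here
        else:
            loaded += sizes[node]
        # Push children in reverse so they are visited left to right.
        for child in reversed(tree.get(node, [])):
            stack.append((child, loaded))
    return points

# Hypothetical calling tree and block sizes in bytes.
tree = {"main": ["parse", "run"], "run": ["step"]}
sizes = {"main": 300, "parse": 400, "run": 500, "step": 600}
points = find_insertion_points(tree, sizes, "main", budget=1024)
```

With these numbers a read is placed before `main` (nothing is loaded yet) and before `step`, because the path `main`, `run`, `step` adds up to 1400 bytes and exceeds the assumed budget.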
Conventional studies on refining code placement to minimize cache misses can
be applied to NAND flash cache systems. Parameswaran et al. [5] used a bin-packing
approach that reorders the program code by examining the execution frequency of
basic blocks. Code segments with higher execution frequency are placed next to each
other within the cache. Janapsatya et al. [6] proposed a pure software heuristic
approach that reduces the number of cache misses by relocating program sections in
main memory. Their approach analyzes the program flow graph, then identifies and packs
basic blocks within the same loop. They also related cache misses to energy
consumption. Although their approach can identify loops within a
program, it is hard to break the interpreter of a virtual machine into individual loops
because all the loops share the same starting point.
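The frequency-driven placement attributed to [5] above can be approximated by a simple greedy packing. The sketch below swaps in a first-fit heuristic over blocks sorted by execution count; it is not the algorithm of [5], and the block names, sizes, counts, and the 512-byte bin size are assumptions for illustration.

```python
def pack_by_frequency(blocks, bin_size):
    """Pack basic blocks into fixed-size bins, hottest blocks first.

    blocks:   list of (name, size_in_bytes, execution_count)
    bin_size: capacity of one bin in bytes (for example one page)
    Returns a list of bins, each a list of block names in placement order.
    """
    bins = []  # each entry is [free_bytes, [block names]]
    hottest_first = sorted(blocks, key=lambda b: b[2], reverse=True)
    for name, size, _count in hottest_first:
        if size > bin_size:
            raise ValueError(f"block {name!r} does not fit in one bin")
        for entry in bins:
            if entry[0] >= size:       # first bin with enough room
                entry[0] -= size
                entry[1].append(name)
                break
        else:
            bins.append([bin_size - size, [name]])
    return [names for _free, names in bins]

# Hypothetical profile: (name, size in bytes, execution count).
profile = [("A", 300, 10), ("B", 200, 900), ("C", 250, 850),
           ("D", 100, 5), ("E", 400, 40)]
layout = pack_by_frequency(profile, bin_size=512)
```

Here the two hottest blocks, `B` and `C`, end up in the same bin, so code that alternates between them stays within one cached unit. As the paragraph notes, a scheme that relies on separating loops is harder to apply to an interpreter whose handlers all return to one dispatch point.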