
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 23, 1865-1887 (2007)

Design, Implementation, and Performance Evaluation of Flash Memory-based File System on Chip*

SEONGJUN AHN, JONGMOO CHOI¹, DONGHEE LEE², SAM H. NOH³, SANG LYUL MIN AND YOOKUN CHO

Department of Electrical Engineering and Computer Sciences
Seoul National University
Seoul, 151-742 Korea
¹Division of Information and Computer Science
Dankook University
Seoul, 140-714 Korea
²School of Computer Science
University of Seoul
Seoul, 130-743 Korea
³School of Information and Computer Engineering
Hongik University
Seoul, 121-791 Korea

Interoperability is an important requirement for portable storage devices that are increasingly being used to exchange and share data among diverse hosts. However, interoperability cannot be provided if different host systems use different file systems. To address this problem, we propose a storage device that contains a file system within itself, which we refer to as FSOC (File System On Chip). In this paper, we explain the design and implementation of a Flash memory-based FSOC as a proof-of-concept. We also propose a performance model for FSOC, which is derived by analyzing the operations of the host and the storage device. Using this model, we show that aside from qualitative benefits, there are quantitative benefits to using FSOC instead of a conventional storage device. Results from a series of experiments that compare the performance of a conventional storage device and the FSOC using synthetic workloads as well as real applications are given, and they verify the proposed model.

Keywords: embedded system, file system, flash memory, interoperability, portable storage

1. INTRODUCTION

Portable storage devices such as CompactFlash [1] and Multimedia Card [2] are increasingly being used as nonvolatile storage to exchange and share data among multiple hosts, including mobile ones. Hence, interoperability is a key requirement of such systems. However, if the file system on the portable storage device is not compatible with the file system of the host, the applications running on the host cannot access the stored data.

Developing or porting a file system is not a simple task. Thus, providing a file system for each of the many different embedded systems and mobile devices, such as digital cameras, MP3 players, and PDAs, each with its own specific environment, takes a lot of time and effort [3]. This results in a prolonged time-to-market for the product, which, in turn, may influence the success of the product.

Received November 3, 2005; revised January 27 & May 29, 2006; accepted July 19, 2006. Communicated by Tei-Wei Kuo.
* This research was partly supported by grant No. R01-2004-000-10188-0 from the Basic Research Program of the Korea Science & Engineering Foundation and in part by MIC & IITA through the IT Leading R&D Support Project.

In this paper, we propose that the file system be embedded within the portable storage device, which we refer to as FSOC (File System on Chip). The benefits of FSOC can be categorized into qualitative and quantitative ones. Qualitative advantages of FSOC compared with conventional storage devices include the following.

• FSOC provides a high degree of interoperability. When conventional storage devices are used, a host can access data stored in a storage device only if the file system of the host is compatible with that in the storage device. Using FSOC, any host can access the data residing in the storage device simply by adding a simple interface for FSOC to the host.
• Host system developers need not implement a file system. Therefore, FSOC eliminates the burden of developing or porting a file system, and thus reduces the time-to-market.
• FSOC improves file system performance by optimizing the file system for the storage media it uses. Generally, when the file system resides in the host system, optimizing its performance is difficult since it needs to support a variety of storage media with varying characteristics [4]. In FSOC, as the file system is developed for the specific storage media that it contains, there are more opportunities for optimization [5-7].

Aside from these qualitative benefits, the obtainable quantitative benefits of FSOC are summarized below, with details provided in later sections.

• Since the file system code is now executed in the storage device, more processor time in the host system can be allocated to the execution of application code.
• FSOC reduces the data traffic between the host and the storage device. This is because metadata required during file operations are not transferred between the FSOC and the host [8].
• FSOC lowers the energy consumption of the system by reducing data traffic and by lowering the clock speed and the supply voltage through parallel processing of the host and the storage device [9].

However, these benefits, qualitative as well as quantitative, are not always attainable. Our study shows that the use of FSOC is desirable in the following situations: (1) when the host is unable to provide diverse file systems due to limitations on resources and/or development time; (2) when an application requires a large amount of computation, and I/O can be overlapped with the computation; (3) when multiple applications are being executed on the host; and/or (4) when applications perform metadata-intensive I/O operations. However, when applications require large amounts of I/O and I/O time is critical for the application, or when the computation power of the host is sufficiently greater than that of the storage device, the use of FSOC may not be desirable.

As a proof-of-concept, we designed and implemented an FSOC that uses Flash memory as its storage media. We also show how the interface between the FSOC and the host can be defined based on the standard interface that applications use to access files, which is an important factor in interoperability. Our contribution in the theoretical aspect is in presenting a performance analysis based on a theoretical model and validating the model through experiments using an FSOC implementation. Although the approach of off-loading part of the host's work to the storage device is not new, to our knowledge our work is the first attempt to apply it to embedded systems such as portable storage. In such circumstances the models and performance analysis results from previous work cannot be directly applied, as the environment of embedded systems is restrictive.

The rest of the paper is organized as follows. Section 2 presents work related to this paper, and section 3 describes the design and implementation of an FSOC in a Flash memory card. In section 4, performance models of FSOC and a conventional storage device are presented, and the performance of these devices is compared using the presented model. In section 5, a performance comparison of the implemented FSOC and a conventional storage device is presented. Finally, we provide a summary and conclude in section 6.

2. RELATED WORK

The FSOC approach, that is, off-loading file system work to a storage device, is not a new idea. There have been several previous research results that consider the performance of intelligent storage devices [8, 10-13]. However, in these studies the storage device of interest is disk storage, and their main concern is exploiting parallelism among the storage devices. In this section, we describe some of the previous research that utilizes additional resources within the storage device to improve performance and/or functionality. In contrast to these works, our work is more applicable to storage devices that cannot exploit this kind of parallelism, such as the portable storage devices that are becoming more and more prevalent. In this section, we also describe research on Flash memory, which is the platform on which we implement our FSOC.

Active disk [8, 10], IDISK (Intelligent Disk) [11], and OSD (Object-based Storage Device) [14, 15] are some of the research efforts targeted at improving the performance and/or functionality of storage devices. In the Active disk approach, a portion of the application code is downloaded and executed in the storage device to reduce data traffic and to improve the performance of data-intensive portions of the application. IDISK proposes an architecture that uses a high-speed network to interconnect multiple disks that have application code execution capabilities. This architecture improves application execution parallelism and can be used to improve the efficiency of data-intensive applications such as decision support systems. Finally, in OSD, which is most closely related to FSOC, interoperability is enhanced by making the OSD responsible for locating data blocks. However, OSD is different from FSOC in that OSD manages data in the form of objects rather than files, and naming of files and managing directories remain the responsibility of the host. Moreover, the main purpose of OSD is in constructing storage appliances, whereas the purpose of FSOC is in providing portable storage that can be used by mobile devices.

Riedel et al. proposed a performance model for Active disk [12, 13]. The performance model for Active disk focused on parallelism among storage devices that execute part of the host application. It is similar to our model in that it models the performance of storage devices that process off-loaded computation as well as I/O requests. However, in this model, parallelism is considered only among the storage devices.

Currently, interfaces such as ATA or SCSI provide only simple read and write access to blocks in the storage device. This limited interface has been identified as an obstacle to making the storage device intelligent. It has been suggested that to make use of computational resources within the storage device, a more expressive interface between the host and the storage is needed [4]. We propose a new interface for FSOC in section 3.

The storage media used in our implementation of FSOC is Flash memory. Flash memory is suitable as storage media for mobile devices because it is light, rigid, and has low power consumption [16, 17]. However, Flash memory has a unique characteristic that distinguishes it from hard disks: to overwrite a physical block in Flash memory, an erase operation must be performed before the actual write operation [18, 19]. Hence, simply adapting a conventional file system developed for hard disks may not be possible. Two approaches have been used to circumvent this problem.

The first method is to use a software layer called a Flash Translation Layer (FTL) [18-22], as is generally done for Flash memory cards such as CompactFlash [1] and Multimedia Card [2]. The FTL is a sector-remapping software layer that provides the file system with an interface similar to that of a disk.
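As a rough illustration of what such a layer does, the sketch below shows a generic page-mapping scheme (not the particular FTL of these cards): each logical sector overwrite is redirected to a clean physical page, and the stale page is merely marked invalid for later erasure. All names are assumptions for illustration.

```c
#include <stdint.h>

#define NUM_SECTORS 1024   /* illustrative capacity */

/* Logical-to-physical map: the essence of a page-mapping FTL. */
static uint32_t l2p[NUM_SECTORS];                  /* sector -> flash page     */
extern uint32_t flash_alloc_clean_page(void);      /* from a pre-erased pool   */
extern void     flash_program(uint32_t page, const void *data);
extern void     flash_mark_invalid(uint32_t page); /* reclaimed by GC later    */

/* "Overwrite" a sector without an in-place erase: program a clean page,
 * repoint the map, and invalidate the old page. */
void ftl_write_sector(uint32_t sector, const void *data)
{
    uint32_t new_page = flash_alloc_clean_page();
    flash_program(new_page, data);
    flash_mark_invalid(l2p[sector]);
    l2p[sector] = new_page;
}
```

Reads simply follow the map, which is why the file system above the FTL can treat the card as an ordinary block device.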

The other approach is to develop a file system that takes into consideration the limitations of Flash memory. JFFS [23] and YAFFS [24] are such file systems. These file systems run on the Linux operating system and manage the physical structure of Flash memory by themselves. They adopted the approach used in the Log-structured File System (LFS) [25], since in LFS only append operations are used, which eliminates the need for overwrites. However, we did not consider JFFS and YAFFS as candidates for the file system in FSOC, since both of them require a considerable amount of resources, which cannot be assumed in a consumer device.

3. DESIGN AND IMPLEMENTATION OF FSOC

In this section, we first describe the design of our FSOC. The relationship and the interface between the host and the storage device, as compared with conventional storage devices, are presented. We then describe in detail the hardware platform and the software structure of the prototype implementation of the FSOC.

3.1 Structure of FSOC

Fig. 1 shows the structure of a conventional storage device and the FSOC. Consider how a file request actually accesses the storage medium. In conventional storage devices, the request is converted to block access requests by the file system of the host operating system. The block access requests are then passed to the device driver, and the device driver converts the block access requests to sector requests and transmits them to the storage device.

In the FSOC, the file system is embedded within the storage device. Therefore, the host that uses the FSOC does not have to be equipped with a file system. Instead, it only needs a simple stub that serves as an interface between the host and the FSOC. All file access requests are transmitted to the FSOC through this stub. The stub arranges the parameters of file requests and converts them to FSOC requests, which are then transmitted to the file system in the FSOC. The file system in the FSOC then fulfills the file request by making a direct request to the storage media, i.e., it writes new data to the storage media or reads data from the storage media. It also updates the metadata when necessary. The results of the file operation are sent to the stub, and then passed on to the host application.

Fig. 1. Structure of conventional storage device and FSOC. (In the conventional case, an application's file request goes to the kernel file system, which issues block requests to the device driver; the driver sends sector requests over the bus to the storage media in the storage device. In the FSOC case, the application's file request goes to the FSOC stub in the kernel and travels over the bus as a file request to the file system inside the FSOC, which issues sector requests to its storage media.)

The stub can be implemented as a stand-alone module or as a pseudo file system under the Virtual File System (VFS) layer [26]. In the latter case, applications can access the FSOC using an interface that is identical to that of other file systems.

Fig. 2. Interface between the host and the FSOC. (Application system calls such as open(), read(), write(), unlink(), mkdir(), rmdir(), and rename() are mapped by the stub to client routines open_cli(), read_cli(), write_cli(), unlink_cli(), mkdir_cli(), rmdir_cli(), and rename_cli(), whose file requests are served by the corresponding FSOC service routines open_svc(), read_svc(), write_svc(), unlink_svc(), mkdir_svc(), rmdir_svc(), and rename_svc().)

Fig. 2 shows the interface between the application and the stub, and between the stub and the FSOC file system. The FSOC interface is similar to that of an RPC (Remote Procedure Call) [27]. The FSOC has service routines corresponding to each file request, and the stub has the client routines. For example, a read operation is performed as follows: (1) the application makes a read() system call and passes the identifier of the file, the file offset, the amount of data to be read, and the buffer address to the stub; (2) the stub executes the read_cli() routine and converts the file identifier, the file offset, and the amount of data into an FSOC request; (3) the converted request is transmitted to the FSOC; (4) upon receiving the request, the read_svc() routine within the FSOC is called, which reads data from the storage media; (5) the read_svc() routine returns the read data to the stub of the host; and (6) the stub, finally, passes the data to the application.
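As an illustration of steps (1)-(6), a minimal sketch of what a stub client routine might look like is shown below. The request layout, opcode value, and bus primitives (fsoc_send(), fsoc_recv()) are assumed names for illustration, not the actual interface of our prototype; for simplicity the sketch assumes a full unit of data comes back before the status word.

```c
#include <stdint.h>

/* Hypothetical on-the-wire request header: one per file operation. */
struct fsoc_req {
    uint8_t  opcode;   /* e.g. FSOC_OP_READ, matching read_svc() on the device */
    int32_t  fd;       /* file identifier returned by a previous open_cli()    */
    uint32_t offset;   /* file offset to read from                             */
    uint32_t count;    /* number of bytes requested                            */
};

#define FSOC_OP_READ 2  /* assumed opcode value */

/* Assumed bus primitives provided by the host-side transport (e.g. PCMCIA). */
extern int fsoc_send(const void *buf, uint32_t len);
extern int fsoc_recv(void *buf, uint32_t len);

/* Client routine run by the stub: marshal the arguments of read(), ship
 * them to the FSOC, and unmarshal the reply (file data, then status). */
int32_t read_cli(int32_t fd, void *buf, uint32_t offset, uint32_t count)
{
    struct fsoc_req req = {
        .opcode = FSOC_OP_READ, .fd = fd, .offset = offset, .count = count,
    };
    int32_t status;

    if (fsoc_send(&req, sizeof req) < 0)        /* step (3): transmit request */
        return -1;
    if (fsoc_recv(buf, count) < 0)              /* step (5): receive the data */
        return -1;
    if (fsoc_recv(&status, sizeof status) < 0)  /* final step: status code    */
        return -1;
    return status;  /* bytes read, or a negative error code from read_svc() */
}
```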


The FSOC interface needs to be comprehensive and general in order to provide interoperability between the storage device and the various hosts. For this purpose, the commands, parameters, and responses of the FSOC interface are defined based on the POSIX interface for files and directories [28], which has been widely adopted by various operating systems.

The communication protocol between the host and the FSOC begins with the host issuing a command that initiates a file operation. The host then sends the parameters that are required for the issued file operation. If the requested file operation is a file write, the host sends the data to be written to the FSOC in the next step; otherwise this step is omitted. If file data or metadata are requested by the host, the FSOC sends them to the host after completing the request. The response, which indicates whether the operation succeeded or failed, along with the error code if it failed, is transferred to the host in the final step.
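The device side of this exchange can be summarized as a simple service loop. The following sketch uses assumed helper names (bus_recv(), bus_send(), dispatch()) and a request header mirroring the host-side one; it only illustrates the command/parameter/data/response sequence described above, not the prototype's internals.

```c
#include <stdint.h>

struct fsoc_req { uint8_t opcode; int32_t fd; uint32_t offset; uint32_t count; };
enum { FSOC_OP_READ = 2, FSOC_OP_WRITE = 3 };   /* assumed opcode values */

extern int bus_recv(void *buf, uint32_t len);   /* assumed bus primitives */
extern int bus_send(const void *buf, uint32_t len);
extern int32_t dispatch(const struct fsoc_req *req, void *data); /* runs *_svc() */

static uint8_t io_buffer[4096];                 /* transfer buffer in SRAM */

/* Device-side service loop: command and parameters, then a data phase
 * (writes only), then data out (reads), and finally the status code. */
void fsoc_service_loop(void)
{
    for (;;) {
        struct fsoc_req req;
        int32_t status;

        bus_recv(&req, sizeof req);            /* command + parameters     */
        if (req.opcode == FSOC_OP_WRITE)       /* data phase, writes only  */
            bus_recv(io_buffer, req.count);

        status = dispatch(&req, io_buffer);    /* run matching *_svc()     */

        if (req.opcode == FSOC_OP_READ && status > 0)
            bus_send(io_buffer, status);       /* requested data back      */
        bus_send(&status, sizeof status);      /* success/failure + code   */
    }
}
```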

3.2 Implementation of FSOC

We implemented an FSOC on a CompactFlash memory card that uses NAND-type Flash memory as its storage media. As shown in Fig. 3, the CompactFlash has an ARM7TDMI core that operates at 24MHz and 48KB of NOR-type Flash memory that stores the FTL code. It also has 16KB of SRAM for the stack, data, and buffer areas that are necessary for executing the FTL code. The interface with the host is PCMCIA [29]. The file system we implemented was embedded in the NOR Flash memory along with the FTL code. Fig. 4 depicts the CompactFlash development board that was used to implement the FSOC.

Fig. 3. Hardware structure of the CompactFlash. (The ARM7TDMI core, NOR Flash, and SRAM are connected by a global bus; a Flash controller attaches the NAND Flash via a local bus; the host connects through PCMCIA.)

Fig. 4. CompactFlash development board used to implement the FSOC.

In designing a FAT-based file system for our FSOC, we considered three requirements. The first requirement is quick recovery after power failure, as FSOC is expected to be used mostly in mobile environments where power is turned off frequently, inadvertently or not. For this reason, we added a journaling mechanism [30, 31] that records the contents of file operations before actually modifying the file system, for recovery purposes. Specifically, when metadata such as FAT entries or directory entries need to be updated, the operation is written to the log, which is maintained as a file in the root directory, prior to the actual update. This results in a slight performance degradation for write operations compared to the original FAT file system. However, read performance is not affected.
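To make the write-ahead ordering concrete, the following is a minimal sketch; the log-record layout and the helper routines (log_append(), log_commit(), fat_write()) are illustrative assumptions, not the prototype's actual code.

```c
#include <stdint.h>

/* Assumed journal record: enough to redo (or discard) one metadata update. */
struct log_rec {
    uint8_t  type;      /* e.g. 1 = FAT update, 2 = directory-entry update */
    uint32_t target;    /* FAT entry index or directory-entry location     */
    uint32_t old_val;   /* previous contents                               */
    uint32_t new_val;   /* contents to be written                          */
};

extern int log_append(const struct log_rec *rec); /* append to the log file    */
extern int log_commit(void);                      /* make the record durable   */
extern int fat_write(uint32_t idx, uint32_t val); /* the actual in-place update */

/* Update one FAT entry with recovery support: the operation is recorded in
 * the log (a file in the root directory) before the FAT itself is touched,
 * so an interrupted update can be redone after a power failure. */
int journaled_fat_update(uint32_t idx, uint32_t old_val, uint32_t new_val)
{
    struct log_rec rec = {
        .type = 1, .target = idx, .old_val = old_val, .new_val = new_val,
    };

    if (log_append(&rec) < 0 || log_commit() < 0)
        return -1;                 /* the extra write is the cost noted above */
    return fat_write(idx, new_val);
}
```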

The second requirement is efficient execution on the low-performance processors that are expected to be used in FSOC. To meet this requirement, summary information about the file names contained in a directory is retained in the directory entries. The summary information is also cached in the main memory of the FSOC and managed in an LRU manner. This caching mechanism reduces the file name lookup time, thereby improving execution efficiency.
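A minimal sketch of such a cache follows; the summary representation (here, a per-directory bitmap of hashed name characters) and the array-based LRU are illustrative assumptions, not the prototype's actual data structures.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8   /* small, to respect the 16KB SRAM budget */

/* One cached summary: which name hashes occur in a directory. A lookup
 * whose hash bit is clear can skip scanning that directory entirely. */
struct dir_summary {
    uint32_t dir_cluster;   /* directory identity                        */
    uint32_t name_bitmap;   /* bit i set => some name hashes to i mod 32 */
    uint8_t  valid;
};

static struct dir_summary cache[CACHE_SLOTS];  /* slot 0 = most recent */

/* LRU lookup: on a hit, move the entry to the front. On a miss, the caller
 * loads the summary from the directory entry and inserts it at slot 0,
 * evicting the least recently used slot. */
struct dir_summary *summary_lookup(uint32_t dir_cluster)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].dir_cluster == dir_cluster) {
            struct dir_summary hit = cache[i];
            memmove(&cache[1], &cache[0], i * sizeof cache[0]);
            cache[0] = hit;       /* promote the hit to most recent */
            return &cache[0];
        }
    }
    return NULL;  /* miss: fetch the summary from the directory entry */
}
```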

The last requirement concerns code and data memory. Recall that the NOR-type Flash memory size is only 48KB, and the FTL code occupies 13.6KB. Therefore, the file system must fit into what is left. Also, there is only 16KB of SRAM available. Again, 6.2KB of it is required by the FTL code, and some space is also required for the buffer used to transfer data between the host and the FSOC. For this purpose, we used the 16-bit Thumb ISA supported by the ARM7TDMI rather than the 32-bit ARM ISA, and avoided compiler optimizations that can increase the code size. As a result, the resulting file system uses only 10KB of the NOR-type Flash memory and 6.5KB of SRAM, which meets the memory requirements.

4. PERFORMANCE MODEL FOR FSOC

In this section, we present the performance model for FSOC and use the model to compare its performance with a conventional storage device. The performance evaluation criterion is the application run time, including storage access time. Some common assumptions that we make for our model are as follows:

(1) There is only one application executing.
(2) An application reads a unit of data, and computation is performed on this data. A constant amount of time is consumed for this computation. This read-computation cycle is repeated N times.
(3) Write operations of the application are non-blocking, and hence do not affect the run time of the application. (Note that these write requests will be queued and processed later by the operating system.)

4.1 Performance Model for Serial Execution of I/O and Computation

Applications that use blocking reads will block upon a request for data. For these kinds of applications, the reading of data and the computation on the read data can only be executed one after the other. Hence, overlapping of I/O and computation is impossible, leading to serial execution of I/O and computation.

The performance model, in this case, is derived from the operations of the host CPU, bus, and storage device. In a conventional storage device, execution of an application consists of application code execution, file system code execution, device driver code execution, and I/O processing, where I/O processing is divided into storage media access and data transfer. The application run time, denoted Tconv_serial, can then be expressed as Eq. (1). (Refer to Table 1 for the definitions of the symbols used in all equations.)

Tconv_serial = Tcomp_serial + N(TFS_host + Tdriver + Tmedia + Ttrans_conv + Tcomp)    (1)


Table 1. Definition of symbols used in the performance model.

Used model | Symbol | Meaning
Common | N | Total number of data units to be processed
 | Tcomp_serial | Execution time of application code that is not dependent on a particular data unit and cannot be overlapped with I/O (e.g., initializing memory at program start-up or outputting the overall result at the end of the program)
 | Tcomp | Application code execution time for performing computation on a unit of data
 | Tmedia | Storage media access time for a unit of data
 | Ttrans | Time for a unit of data to be transferred between host and storage device, when it is the same for the conventional storage device and the FSOC
Conventional storage device | Tconv_serial | Total application run time for the conventional storage device when all I/O and computation are serially executed
 | Tconv_parallel | Total application run time for the conventional storage device when some I/O and computation are executed in parallel
 | TFS_host | File system code execution time of the host for a unit of data (does not include device driver code execution time)
 | Tdriver | Device driver code execution time of the host for a unit of data
 | Ttrans_conv | Data transfer time between the host and the conventional storage device for a unit of data
FSOC | TFSOC_serial | Total application run time for FSOC when all I/O and computation are serially executed
 | TFSOC_parallel | Total application run time for FSOC when some I/O and computation are processed in parallel
 | Tstub | Stub code execution time for a unit of data
 | TFS_FSOC | File system code execution time of FSOC for a unit of data
 | Ttrans_FSOC | Data transfer time between the host and the FSOC for a unit of data

When FSOC is used, execution of an application consists of application code execution, stub code execution, and I/O processing. I/O processing for FSOC is divided into three parts: file system code execution within the FSOC, storage media access, and data transfer. The key difference between executing an application with FSOC and with a conventional storage device is that the host executes the stub code instead of the device driver code, and that, for FSOC, the file system code is executed within the device, not in the host. The application run time, denoted TFSOC_serial, can be expressed as Eq. (2).

TFSOC_serial = Tcomp_serial + N(Tstub + TFS_FSOC + Tmedia + Ttrans_FSOC + Tcomp)    (2)
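To make the serial model easy to evaluate, Eqs. (1) and (2) can be transcribed directly into code. The sketch below is a straightforward transcription; the structure and function names are ours, and the parameter values are to be filled in with measurements.

```c
/* Direct transcription of Eqs. (1) and (2). All times are per data unit
 * except t_comp_serial; use whatever unit the measurements use (e.g. ms). */
struct model_params {
    double n;             /* N: number of data units                  */
    double t_comp_serial; /* non-overlappable application time        */
    double t_comp;        /* computation per data unit                */
    double t_media;       /* media access per data unit               */
    double t_fs_host, t_driver, t_trans_conv;   /* conventional device */
    double t_stub, t_fs_fsoc, t_trans_fsoc;     /* FSOC                */
};

double t_conv_serial(const struct model_params *p)   /* Eq. (1) */
{
    return p->t_comp_serial +
           p->n * (p->t_fs_host + p->t_driver + p->t_media +
                   p->t_trans_conv + p->t_comp);
}

double t_fsoc_serial(const struct model_params *p)   /* Eq. (2) */
{
    return p->t_comp_serial +
           p->n * (p->t_stub + p->t_fs_fsoc + p->t_media +
                   p->t_trans_fsoc + p->t_comp);
}
```

Subtracting the two shows the serial-case gap is N(Tstub + TFS_FSOC − Tdriver − TFS_host), which reduces to N(TFS_FSOC − TFS_host) under the simplifying assumptions introduced in section 4.2.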

4.2 Performance Model for Parallel Execution of I/O and Computation

In this subsection, we present the performance model for the case where I/O processing and computation may be overlapped. We assume that parallel execution of I/O and computation is possible, that is, the application can determine which data will be needed before the computation on the currently read data is complete, and reads are non-blocking; and/or, without loss of generality, we assume that two processes cooperate per application, one for I/O processing and one for computation. We also assume that data read requests are issued as soon as the bus and storage become available, that is, the highest priority is given to I/O processing, and when I/O processing becomes available the operating system notifies the process so that it can suspend computation, issue the I/O request, and resume computation.

As with the performance model presented for serial execution of I/O and computation, the application execution time for a conventional storage device can be derived from the operations of the host CPU, bus, and storage device. Fig. 5 depicts the execution behavior when I/O and computation may be overlapped. The application execution time for this case is given in Eq. (3). Similarly, the execution behavior for FSOC is depicted in Fig. 6, and the application execution time is given in Eq. (4). Note that if N is 1, then Eqs. (3) and (4) become the same as Eqs. (1) and (2), respectively. Both situations represent the case where I/O processing and computation cannot overlap.

Fig. 5. Operations of host CPU, bus, and storage device for a conventional storage device when I/O and computation are executed in parallel (N = 3, Tcomp_serial = 2 time units, Tcomp = 3 time units, and all other parameters are 1 time unit). (Timeline of CPU, bus, and storage activity from initialization through the read requests for data units 1-3, their processing, and outputting the result. Legend: C: application code execution; D: device driver code execution; F: file system code execution; M: media access; T: data transfer.)

Fig. 6. Operations of host CPU, bus, and storage device for FSOC when I/O and computation are executed in parallel (N = 3, Tcomp_serial = 2 time units, Tcomp = 3 time units, and all other parameters are 1 time unit). (Same layout as Fig. 5. Legend: C: application code execution; S: stub code execution; F: file system code execution; M: media access; T: data transfer.)



Tconv_parallel = Tcomp_serial + N(TFS_host + Tdriver) + Tmedia + Ttrans_conv + Tcomp + (N − 1)max(Tcomp, Tmedia + Ttrans_conv)    (3)

TFSOC_parallel = Tcomp_serial + N × Tstub + TFS_FSOC + Tmedia + Ttrans_FSOC + Tcomp + (N − 1)max(Tcomp, TFS_FSOC + Tmedia + Ttrans_FSOC)    (4)
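Eqs. (3) and (4) can be transcribed the same way; the sketch below reuses the hypothetical struct model_params from the listing in section 4.1.

```c
#include <math.h>   /* for fmax() */

double t_conv_parallel(const struct model_params *p)   /* Eq. (3) */
{
    double io = p->t_media + p->t_trans_conv;          /* per-unit I/O time */
    return p->t_comp_serial +
           p->n * (p->t_fs_host + p->t_driver) +
           io + p->t_comp +
           (p->n - 1) * fmax(p->t_comp, io);
}

double t_fsoc_parallel(const struct model_params *p)   /* Eq. (4) */
{
    double io = p->t_fs_fsoc + p->t_media + p->t_trans_fsoc;
    return p->t_comp_serial +
           p->n * p->t_stub +
           io + p->t_comp +
           (p->n - 1) * fmax(p->t_comp, io);
}
```

Plotting these four functions over Tcomp, with all other parameters fixed, reproduces the shape of Fig. 7.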

The performance of executing on a conventional storage device and on FSOC can be compared based on the presented performance model. In our analysis, we assume that TFS_FSOC > TFS_host, because the computation power of a storage device would, in general, be lower than that of the host. For simplicity, we also assume that the execution times of the device driver code and the stub code are the same, as the stub basically plays the role of the device driver for FSOC¹. Another simplification we make is that the data transfer times for both devices are the same. Hence, we denote both Ttrans_FSOC and Ttrans_conv as Ttrans.

¹ Strictly speaking, Tstub is larger than Tdriver, but the difference is negligible. The only extra overhead in executing the stub code compared with an ordinary device driver is copying the arguments of the file operation to a contiguous memory area so that they can be transferred via DMA. In most cases, the size of the arguments to be copied is around 20 bytes, small enough to be negligible. We measured Tstub in our prototype implementation, and it was only 4% larger than Tdriver when reading 1MB of data.

Fig. 7. Application run time as application code execution time is varied. (Application run time versus Tcomp for Tconv_serial, TFSOC_serial, Tconv_parallel, and TFSOC_parallel; the x-axis is divided into phases (1), (2.a), (2.b), and (3).)

The performance comparison results obtained from Eqs. (1) to (4) are shown in Fig. 7. In this figure, the application code execution time for a unit of data (Tcomp) is varied, while all other parameters are fixed. Tconv_serial and TFSOC_serial increase linearly with N × Tcomp. For the case when I/O and computation are executed serially, the conventional storage device shows better performance for all Tcomp, and the difference (TFSOC_serial − Tconv_serial) is constant at N(TFS_FSOC − TFS_host).

When I/O and computation are executed in parallel, observe from this figure that there are three phases of execution, which we denote by (1), (2), and (3). We discuss each of these phases separately.

(1) This is when the application code execution time is less than the I/O time of the conventional storage device, that is, Tcomp < Tmedia + Ttrans. Here, the application run times for both the conventional storage device and the FSOC are mainly dominated by the I/O time, with an increase rate of 1. Note that in Fig. 7 the results seem to be a constant value, but this is because the rate of increase is relatively very small compared to the other phases.
The difference is (TFSOC_parallel − Tconv_parallel) = N(TFS_FSOC − TFS_host) (recall that we assume that Tstub and Tdriver are the same). The difference in run time is caused by the difference between executing the file system code in the host and in the FSOC, and the conventional storage device shows better performance. Detailed execution times occurring at each system component for this phase are depicted in Fig. 8 (a).

(2) This is when the application code execution time is greater than the I/O time of the conventional storage device and smaller than the I/O time of FSOC, that is, Tmedia + Ttrans ≤ Tcomp < TFS_FSOC + Tmedia + Ttrans. Here, the application run time of the conventional storage device is dominated by Tcomp and the rate of increase is N. On the other hand, the application run time of FSOC is still dominated by the I/O time and the rate of increase is 1. Therefore, the difference in application execution time grows smaller as Tcomp increases, and eventually crosses over.
In order to emphasize the crossover point, we divide phase (2) into two sub-phases: phase (2.a), where the conventional storage device shows better performance, and phase (2.b), where the FSOC shows better performance. In phase (2.a), Tmedia + Ttrans ≤ Tcomp < (Tmedia + Ttrans) + N/(N − 1)(TFS_FSOC − TFS_host), and the difference is N(TFS_FSOC − TFS_host) + (N − 1)((Tmedia + Ttrans) − Tcomp).
In phase (2.b), (Tmedia + Ttrans) + N/(N − 1)(TFS_FSOC − TFS_host) ≤ Tcomp < Tmedia + Ttrans + TFS_FSOC, and the difference is given as N(TFS_host − TFS_FSOC) + (N − 1)(Tcomp − (Tmedia + Ttrans)). Figs. 8 (b) and (c) show the detailed execution times occurring at each system component for these two situations, respectively.
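As a worked example of the crossover formula, take the per-unit parameter values later measured for our prototype and listed with Fig. 11 (b): N = 256, Tmedia + Ttrans = 5.6, TFS_host = 0.24, and TFS_FSOC = 0.42. The boundary between phases (2.a) and (2.b) then lies at Tcomp = 5.6 + (256/255)(0.42 − 0.24) ≈ 5.78, in the same time units as the parameters; for any larger per-unit computation time, the model predicts that the FSOC outperforms the conventional storage device.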

(3) This is when the application code execution time is greater than the I/O time of the FSOC, that is, Tcomp ≥ TFS_FSOC + Tmedia + Ttrans. Here, the application run times of both the conventional storage device and the FSOC are dominated by the application code execution time. Hence, the application run time increases proportionally to Tcomp for both devices and the rate of increase is N. The difference in application run time between the conventional storage device and the FSOC is (Tconv_parallel − TFSOC_parallel) = N × TFS_host − TFS_FSOC. Fig. 8 (d) shows the detailed execution times occurring at each system component for this phase.

In summary, FSOC performs better than the conventional storage device when the application code execution time is larger than the I/O time. This performance gain is due to the fact that parallel execution of the file system code and the application code is possible with FSOC. Otherwise, the conventional storage device performs better.

Fig. 8. Operation examples of the conventional storage device and FSOC. (Four pairs of CPU/bus/storage timelines, one pair (Conv and FSOC) for each phase: (a) Tcomp = 2, phase (1); (b) Tcomp = 3, phase (2.a); (c) Tcomp = 4, phase (2.b); (d) Tcomp = 5, phase (3). Legend: Ci: application code execution for initialization; Co: application code execution for outputting the overall result; Cn: application code execution for the n-th data unit; D: device driver code execution; S: stub code execution; F: file system code execution; M: media access; T: data transfer.)

5. PERFORMANCE EVALUATION

In this section, the quantitative aspect of FSOC is evaluated through several experiments using our prototype implementation. For this purpose, the performance of FSOC is compared against a conventional storage device. The conventional storage device used in our experiments is the CompactFlash memory card that we described in the previous section. This card has exactly the same hardware and software configuration as the one on which the FSOC prototype was implemented. The relationship between the host and the storage devices it operates is shown in Fig. 9. For the host system, we used an embedded system development board with an ARM920T core running the Linux 2.4.18 operating system. The same host is used for both the FSOC and the conventional storage device. For the FSOC, a stub was implemented and added to the kernel. The file system used in the FSOC prototype was ported to the Linux kernel to operate the conventional storage device. Therefore, the file system in the host is exactly the same as the file system in the FSOC. This was done for the purpose of a fair comparison. For all the experiments, the PCMCIA interface was used, and the host clock rate was fixed at 56MHz unless otherwise stated. The NAND Flash memory we used has a read bandwidth of 42MB/sec and a write bandwidth of 2.56MB/sec. The data transfer rate of the bus ranges between 700KB/sec and 1MB/sec depending on the host clock speed.

Fig. 9. Host implementation supporting FSOC and CompactFlash. (On the host, applications sit above the Virtual File System layer of the Linux kernel. For the conventional CompactFlash, the FSOC file system ported to the kernel runs under VFS and drives the device driver, with the FTL and Flash memory inside the card. For the FSOC, an FSOC stub under VFS forwards requests to the FSOC, which contains the file system, FTL, and Flash memory.)

5.1 Computation Time and I/O Time of an Application

There are two performance implications of moving the file system from the host to the storage device. The first is that the host CPU burden is reduced, as the file system code is no longer executed there. This leaves more room for other CPU activities, including application code execution in the host system, which may proceed in parallel with the file system code executed in the FSOC, having a positive influence on performance. On the other hand, the CPU in the FSOC, which is generally slower than the one in the host system, now has more work to do than before, having a negative influence on performance.

In this section, we show how the ratio between the computation time and the I/O time of the application influences the overall performance of FSOC. For this purpose, we perform experiments with a synthetic workload that varies the ratio between the computation time and the I/O time. Fig. 10 shows the pseudo code for the synthetic workload. We used non-blocking reads for parallel execution of I/O processing and computation. I/O processing and computation are performed in 4KB data units. Step 5 in Fig. 10 is a dummy loop that does not perform any useful computation; it was inserted to control the computation time so that we could control the ratio of computation to I/O time.

The results from this experiment are shown in Fig. 11 (a), where the x-axis is the initial value of the counter variable and the y-axis is the total execution time of the synthetic application. 'FSOC' denotes the results for the FSOC prototype and 'Conv' denotes the results for the conventional storage device.

1) issue non-blocking read request for initial 4KB of data
2) check if read request is completed
   a. if not completed, wait for completion
3) issue non-blocking read request for the next 4KB of data
4) sum all values in the read data
5) count from the specified initial value to 0 (dummy loop)
6) check if the total amount of read data is 1MB
   a. if it is, terminate program
   b. otherwise go to step 2)

Fig. 10. Pseudo code for the synthetic workload using non-blocking read.
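For readers who want a runnable approximation of this pseudo code, the sketch below uses POSIX asynchronous I/O (link with -lrt on older glibc) in place of the prototype's non-blocking read facility; the input file name and the dummy-loop bound are illustrative, and error handling is kept minimal.

```c
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define UNIT  4096            /* 4KB data unit, as in the experiment */
#define TOTAL (1024 * 1024)   /* terminate after 1MB has been read   */

int main(void)
{
    static char buf[2][UNIT];          /* double buffer: process one, fill one */
    struct aiocb cb;
    long sum = 0;
    int cur = 0, nblocks = TOTAL / UNIT;
    int fd = open("testfile", O_RDONLY);   /* hypothetical 1MB input file */

    if (fd < 0)
        return 1;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_nbytes = UNIT;
    cb.aio_buf    = buf[cur];
    cb.aio_offset = 0;
    aio_read(&cb);                         /* step 1: initial 4KB request */

    for (int b = 0; b < nblocks; b++) {
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);        /* step 2: wait for completion */
        aio_return(&cb);

        if (b + 1 < nblocks) {             /* step 3: request next 4KB    */
            cb.aio_buf    = buf[1 - cur];
            cb.aio_offset = (off_t)(b + 1) * UNIT;
            aio_read(&cb);
        }
        for (int i = 0; i < UNIT; i++)     /* step 4: sum the read data   */
            sum += buf[cur][i];
        for (volatile long d = 15000; d > 0; d--)
            ;                              /* step 5: tunable dummy loop  */
        cur = 1 - cur;                     /* step 6 is the loop bound    */
    }
    printf("sum = %ld\n", sum);
    close(fd);
    return 0;
}
```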

Fig. 11. Result of the synthetic workload experiment. ((a) Measured application execution time of the synthetic workload: application execution time (ms) versus computation overhead (iteration count of the dummy loop), with curves T_conv and T_FSOC. (b) Comparison of the measured application execution times with the values derived from Eqs. (3) and (4), with parameters N = 256, Tmedia + Ttrans = 5.6, Tdriver = 0.72, TFS_host = 0.24, TFS_FSOC = 0.42, Tstub = 0.75, and Tcomp_serial = 38.)

When the initial value of the counter variable is less than 15000, the application run times of both the conventional storage device and the FSOC are bounded by the I/O time, which consists of the file/storage device access time and the data transfer time. In this range, since most of the computation time is hidden by the I/O time, the application run times of both the conventional storage device and the FSOC increase very slowly, looking almost as if they are constant, with the FSOC increasing even more slowly, as more file/storage activities are performed in the storage device, which has a slower CPU.

When the initial value is over 15000, things start to change for the conventional storage device. Now, not all of the computation time can be hidden behind the I/O time, and the computation time starts to dictate the application run time. Hence, as the computation time increases, the total execution time increases with it. For FSOC, since its I/O time is greater than that of the conventional storage device, this phenomenon does not occur until the counter variable reaches 17000. Between 15000 and 17000, the execution time of FSOC remains almost constant while that of the conventional storage device increases linearly. Hence, before the two cross over (in our case this happens when the counter variable reaches 16000), the conventional storage device performs better, while after the crossover point the FSOC starts to perform better. Beyond 17000, both devices are dominated by the computation time, and so the difference in performance remains constant, with FSOC performing better.

Fig. 11 (b) compares the results obtained through actual measurements, shown in Fig. 11 (a), with the values obtained from the model, as in Fig. 7. The parameters used for the model were obtained from actual measurements². Observe the similarity between the results. The margin of error is in the 2-3% range. The error comes from the extra overhead caused by executing the measurement code. When we calibrated the parameters by subtracting the measurement overhead, the margin of error was reduced to below 1%. These results experimentally validate the presented performance model for the case where I/O processing and computation may be executed in parallel.

² We obtained the actual execution times by inserting measurement instructions into the device driver, stub, and file system code. TFS_host is measured by recording timestamps at the entry of the file system code and of the device driver code in the host, and calculating the difference. TFS_FSOC is measured by a similar method, recording timestamps at the entry of the file system code and of the flash memory access code in the storage device.

5.2 Computing Power of the Host and the Storage Device

Portable storage devices such as the FSOC can be used with diverse hosts, and the execution times of the application code and the file system code vary depending on the computing power of the host. In this section, we analyze the influence of the computing power of the host and the storage device on application performance. For this purpose, we execute the synthetic workload described in section 5.1 with various host clock speeds.

Fig. 12 shows the experimental results for host clock speeds of 45MHz, 56MHz, and 67MHz, while the clock speed of the storage device core remains fixed at 24MHz. Notice that the crossover point, where the FSOC starts to perform better than the conventional storage device, moves to the right as the clock speed increases. This is a natural consequence, as with a faster clock more computation can be hidden behind the I/O time. Except for this, we can observe the same performance trends as in Fig. 11 (a).

We also performed experiments with three real-world applications: cat, gzip, <strong>and</strong><br />

mpeg. The versions that we used were cat that is embedded in busybox 0.60.3, gzip 1.3.2<br />

from GNU, mpeg2play 1.1 from MPEG S<strong>of</strong>tware Simulation Group. Cat dumps the contents<br />

<strong>of</strong> a 1MB file on the terminal, gzip compresses a file whose original size is 4.8MB,<br />

2 We obtained the actual execution time by inserting measurement instructions into the device driver, stub, <strong>and</strong><br />

the file system code. TFS_host is measured by recording timestamps at the entry <strong>of</strong> the file system code <strong>and</strong> the<br />

device driver code in the host, <strong>and</strong> calculating the difference. TFS_FSOC is measured by similar method, recording<br />

timestamps at the entry <strong>of</strong> the file system code <strong>and</strong> the flash memory access code in the storage device.<br />

1879


Application execution time (ms) .<br />

1880<br />

2100<br />

1900<br />

1700<br />

1500<br />

S. J. AHN, J. M. CHOI, D. H. LEE, S. H. NOH, S. L. MIN AND Y. K. CHO<br />

App. execution time (ms)<br />

2900<br />

2700<br />

2500<br />

2300<br />

2100<br />

1900<br />

1700<br />

Host 45MHz, Conv<br />

Host 45MHz, FSO C<br />

Host 56MHz, Conv<br />

Host 56MHz, FSO C<br />

Host 67MHz, Conv<br />

Host 67MHz, FSO C<br />

1500<br />

5000 10000 15000 20000 25000 30000<br />

Computation overhead (interation count <strong>of</strong> dummy loop)<br />

Fig. 12. Results <strong>of</strong> synthetic workload execution with various host clock speeds.<br />

[Figure] Fig. 13. Application run time of cat, gzip, and mpeg: (a) Cat, (b) Gzip, (c) Mpeg. (x-axis: host clock speed, MHz; y-axis: application execution time, ms; curves: FSOC and Conv.)


For our experiments, we modified gzip and mpeg to spawn a process whose task is to read data and transfer the read data to the parent process via pipe IPC. Also, the original mpeg program outputs the decoded result onto a display device; to reduce the effect of the display, we removed the output part of the program.
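A minimal sketch of this reader-process change, assuming standard POSIX fork/pipe primitives (the actual patches to gzip and mpeg are not shown in the paper):

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define CHUNK 4096

    /* Spawn a reader process that feeds the file at 'path' into a
     * pipe. The parent (the compressor or decoder proper) reads from
     * the returned descriptor instead of the file, so file I/O can
     * proceed in parallel with the parent's computation. */
    int spawn_reader(const char *path)
    {
        int fds[2];
        pid_t pid;

        if (pipe(fds) < 0) { perror("pipe"); exit(1); }
        pid = fork();
        if (pid < 0) { perror("fork"); exit(1); }

        if (pid == 0) {                        /* child: the reader  */
            char buf[CHUNK];
            ssize_t n;
            int in = open(path, O_RDONLY);
            if (in < 0) { perror("open"); _exit(1); }
            close(fds[0]);                     /* child only writes  */
            while ((n = read(in, buf, sizeof buf)) > 0)
                write(fds[1], buf, (size_t)n); /* hand data to parent */
            close(fds[1]);
            close(in);
            _exit(0);
        }
        close(fds[1]);                         /* parent only reads  */
        return fds[0];
    }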

Fig. 13 shows the results for the three real-world applications. The x-axis in the graphs represents the host clock speed and the y-axis the application execution time in milliseconds; note that the scales differ from graph to graph. In the case of the I/O-bound cat, the conventional storage device shows better performance, while for the computation-bound mpeg, the FSOC performs better at all clock speeds. In the case of gzip, however, the performance of the conventional storage device and the FSOC crosses over at a host clock speed of around 50 MHz. These results are consistent with the synthetic workload experiments presented above.

5.3 Effects of Multiprogramming

In this section, we compare the performance of the FSOC and conventional storage when multiple applications run concurrently on the host system. To examine the effect of multiprogramming, we executed the I/O-bound application cat in parallel with an application (pi) that calculates the value of the circular constant pi, and measured the time elapsed until both applications completed. Fig. 14 shows the results of the experiment.
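The measurement harness amounts to the sketch below; run_cat() and run_pi() are hypothetical stand-ins for the two workloads, which in the experiment run as ordinary processes.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    /* Hypothetical workload entry points standing in for the two
     * applications of the experiment. */
    extern void run_cat(void);  /* I/O-bound: dump a 1MB file      */
    extern void run_pi(void);   /* CPU-bound: compute digits of pi */

    static double now_ms(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
    }

    int main(void)
    {
        double start = now_ms();
        if (fork() == 0) { run_cat(); _exit(0); } /* I/O-bound job */
        if (fork() == 0) { run_pi();  _exit(0); } /* CPU-bound job */
        wait(NULL);
        wait(NULL);              /* elapsed time covers both jobs  */
        printf("elapsed: %.1f ms\n", now_ms() - start);
        return 0;
    }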

[Figure] Fig. 14. Effect of multiprogramming. (x-axis: host clock speed, MHz; y-axis: application execution time, ms; curves: FSOC (cat+pi), Conv (cat+pi), FSOC (cat), Conv (cat), FSOC (pi), Conv (pi).)

The application pi performs no file operations, so its run time is the same for both conventional storage and the FSOC; in Fig. 14, the FSOC (pi) and Conv (pi) results therefore overlap completely and appear as a single line. As discussed previously, cat is slower with the FSOC when executed alone. However, when executed concurrently with pi, the FSOC shows better performance. The implication is that when the host CPU is kept busy via multiprogramming, the FSOC can perform better even for I/O-bound applications. This indicates that as more applications run concurrently on embedded systems, the performance benefit of the FSOC will increase as well.


5.4 Data Traffic between the Host and the Storage Device

Data traffic between the host and the storage device is reduced with the FSOC, again leading to improved performance. This is because metadata necessary for file operations need not be transferred, since the file system now resides in the storage device. For example, to create a new file on a FAT file system, the file allocation table and the directory entry must be modified. If the file system runs in the host system, the file allocation table and directory entry data are transferred to the host, and they are transferred back to the storage device after the required modifications are made. In the FSOC, however, these operations are performed within the storage device, eliminating the need for metadata transfers; only the name of the file to be created needs to be transferred to the FSOC.
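The difference in bus traffic can be illustrated with the toy accounting below. The sector numbers, packet format, and transfer counts are made up (a real FAT create may touch more sectors); only the contrast matters: sector-granularity metadata round trips versus a single small command.

    #include <stdio.h>
    #include <string.h>

    enum { SECTOR = 512, FAT_SECTOR = 1, DIR_SECTOR = 2 };
    static long bus_bytes;  /* bytes crossing the host-device bus */

    /* Hypothetical bus primitives; each sector moves 512 bytes. */
    static void read_sector(int lba)  { (void)lba; bus_bytes += SECTOR; }
    static void write_sector(int lba) { (void)lba; bus_bytes += SECTOR; }

    /* Conventional storage: host-side FAT code round-trips metadata. */
    static void create_conventional(void)
    {
        read_sector(FAT_SECTOR);   /* allocation table to host  */
        read_sector(DIR_SECTOR);   /* directory entries to host */
        write_sector(FAT_SECTOR);  /* modified metadata back    */
        write_sector(DIR_SECTOR);
    }

    /* FSOC: only a small "create" command with the file name
     * crosses the bus; all metadata stays inside the device. */
    static void create_fsoc(const char *name)
    {
        bus_bytes += 8 + (long)strlen(name); /* header + name */
    }

    int main(void)
    {
        bus_bytes = 0; create_conventional();
        printf("conventional create: %ld bytes\n", bus_bytes); /* 2048 */
        bus_bytes = 0; create_fsoc("report.txt");
        printf("FSOC create:         %ld bytes\n", bus_bytes); /* 18 */
        return 0;
    }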

[Figure] Fig. 15. Data traffic ratio between the FSOC and the conventional storage device. (x-axis: file operations (directory creation, directory removal, file creation, file removal, rename, file read/write); y-axis: data traffic ratio FSOC/Conv, %, from 0 to 100.)

The amount of metadata transferred varies with the type of file operation. Fig. 15 shows the ratio of the data traffic of the FSOC to that of the conventional storage device for the various types of file operations. The read and write operations require a relatively small amount of metadata compared with the file data, and thus the data traffic of the conventional storage device and the FSOC is almost the same. However, for file operations such as directory creation, directory removal, file creation, and file removal, which mainly manipulate metadata, the data traffic required by the FSOC is much smaller (between 5% and 30% of that required by the conventional storage device).

To examine the effects of data traffic on the performance of the FSOC and conventional storage, we performed three experiments. The first two use synthetic workloads representing two extreme cases. The first experiment sequentially reads data from a 1MB file; the amount of metadata required for this operation is trivial compared with the amount of file data. The second experiment creates 500 files, corresponding to the case where metadata operations dominate the execution. The third experiment is the second phase of the Andrew benchmark [32], which represents real-life file system operations, that is, copying files of various sizes mixed with file creations, file reads, and file writes.


Table 2. Elapsed time for file operations (unit: ms).

    Operation        FSOC     Conv
    File read        1730     1603
    File creation    8986    28926
    Andrew           4920     4961

The results are presented in Table 2. The file read experiment shows that the performance of the FSOC is slightly lower (by about 8%) than that of the conventional storage device, due to the lower computing power of the storage device compared with that of the host system. On the other hand, the file creation experiment shows that the FSOC outperforms the conventional storage device by about 69%, completing in 8986 ms versus 28926 ms. In the Andrew benchmark experiment, the conventional storage device and the FSOC show almost identical performance, which shows that the effect of reduced data traffic compensates for the performance degradation due to the lower computing power of the storage device.

5.5 Summary of Experimental Results

We can summarize the experimental results of this section as follows:

• The performance model presented in section 4 matches well with the measurement results obtained from the prototype implementation.
• The FSOC is beneficial when the computation time of the application is larger than the I/O time and the I/O time can be hidden by parallel execution. The gain in this case is equivalent to the file system code execution time of the host.
• The lower the computing power of the host, the larger the gain for the FSOC, because the computing power of the host determines the file system code execution time at the host.
• When multiple applications are executed, more benefit may be attainable for the FSOC, as I/O time can be hidden by the computation time of other concurrently executing applications.
• The FSOC also reduces data traffic between the host and the storage, especially for metadata-intensive operations.

6. CONCLUSION

In this paper, we proposed and implemented the FSOC, a storage device that contains a file system within itself and uses Flash memory as its storage medium. The FSOC provides a higher degree of interoperability than conventional storage devices and improves the efficiency of the storage device by allowing the file system to be optimized specifically for the storage medium of interest. Moreover, the FSOC reduces the burden of developing or porting different file systems in the host system.

Aside from these qualitative advantages, the FSOC also provides quantitative advantages. Performance gains can be obtained through parallel execution of the application code in the host and the file system code in the storage device. The FSOC also reduces data traffic between the host and the storage. These performance issues were evaluated through a performance model and several experiments with synthetic workloads and real applications. The experimental results showed that the FSOC performed better than the conventional storage device when the computation time is larger than the I/O time and/or when there are many file operations that require access to metadata.

As future work, we plan to extend the proposed model to cover the case where multiple applications are executed independently. In this case, we need to consider the behavior of multiple processes that do not cooperate with each other, a point not considered in the model presented in this paper. We also plan to examine the effect of the FSOC for a variety of file systems, and the impact on performance of different file system designs and implementations.

REFERENCES

1. CompactFlash Association, "Information about CompactFlash," http://www.compactflash.org/.
2. MultiMediaCard Association, http://www.mmca.org.
3. E. Zadok and J. Nieh, "FiST: a language for stackable file systems," in Proceedings of the Annual USENIX Technical Conference, 2000, pp. 55-70.
4. G. R. Ganger, "Blurring the line between OSes and storage devices," Technical Report No. CMU-CS-01-166, Carnegie Mellon University, 2001.
5. C. R. Lumb, J. Schindler, and G. R. Ganger, "Freeblock scheduling outside of disk firmware," in Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002, pp. 275-288.
6. J. Schindler, J. L. Griffin, C. R. Lumb, and G. R. Ganger, "Track-aligned extents: matching access patterns to disk drive characteristics," in Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002, pp. 259-274.
7. R. Wang, T. E. Anderson, and D. A. Patterson, "Virtual log-based file systems for a programmable disk," in Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, 1999, pp. 29-43.
8. A. Acharya, M. Uysal, and J. Saltz, "Active disks: programming model, algorithms and evaluation," in Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998, pp. 81-91.
9. A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE Journal of Solid-State Circuits, Vol. 27, 1992, pp. 473-484.
10. E. Riedel, G. Gibson, and C. Faloutsos, "Active storage for large-scale data mining and multimedia," in Proceedings of the 24th International Conference on Very Large Data Bases, 1998, pp. 62-73.
11. K. Keeton, D. A. Patterson, and J. M. Hellerstein, "A case for intelligent disks (IDISKs)," in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1998, pp. 42-52.
12. E. Riedel, "Active disks - remote execution for network-attached storage," Ph.D. Dissertation, No. CMU-CS-99-177, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1999.
13. E. Riedel, C. Faloutsos, G. A. Gibson, and D. Nagle, "Active disks for large-scale data processing," IEEE Computer, Vol. 34, 2001, pp. 68-74.
14. D. Anderson, "Object based storage devices: a command set proposal," Technical Report, National Storage Industry Consortium, 1999.
15. OSD workgroup, http://www.snia.org/tech_activities/workgroups/osd.
16. F. Douglis, R. Caceres, F. Kaashoek, K. Li, B. Marsh, and J. A. Tauber, "Storage alternatives for mobile computers," in Proceedings of the 1st Symposium on Operating Systems Design and Implementation, 1994, pp. 25-37.
17. B. Marsh, F. Douglis, and P. Krishnan, "Flash memory file caching for mobile computers," in Proceedings of the 27th Annual Hawaii International Conference on System Sciences, 1994, pp. 451-461.
18. J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho, "A space-efficient flash translation layer for CompactFlash systems," IEEE Transactions on Consumer Electronics, Vol. 48, 2002, pp. 366-375.
19. M. Wu and W. Zwaenepoel, "eNVy: a non-volatile, main memory storage system," in Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 1994, pp. 86-97.
20. Intel Corporation, "Understanding the flash translation layer (FTL) specification," http://developer.intel.com.
21. A. Kawaguchi, S. Nishioka, and H. Motoda, "A flash-memory based file system," in Proceedings of the USENIX Technical Conference, 1995, pp. 155-164.
22. MTD, "Memory technology device (MTD) subsystem for Linux," http://www.linux-mtd.infradead.org.
23. D. Woodhouse, Red Hat, Inc., "JFFS: the journaling flash file system," http://sources.redhat.com/jffs2/jffs2-html/.
24. Aleph One Company, "Yet another flash filing system," http://www.aleph1.co.uk/armlinux/projects/yaffs/.
25. M. Rosenblum and J. K. Ousterhout, "The design and implementation of a log-structured file system," ACM Transactions on Computer Systems, Vol. 10, 1992, pp. 26-52.
26. S. Kleiman, "Vnodes: an architecture for multiple file system types in Sun UNIX," in Proceedings of the USENIX Technical Conference, 1986, pp. 238-247.
27. A. Birrell and B. Nelson, "Implementing remote procedure calls," ACM Transactions on Computer Systems, Vol. 2, 1984, pp. 39-59.
28. IEEE, Information Technology - Portable Operating System Interface (POSIX) Part 1: System Application Program Interface (API) [C Language], IEEE, 1990.
29. Personal Computer Memory Card International Association, "PCMCIA PC Card Standard Release 2.1," 1993.
30. R. Hagmann, "Reimplementing the Cedar file system using logging and group commit," in Proceedings of the 11th Symposium on Operating System Principles, 1987, pp. 155-162.
31. M. S. Kwon, S. H. Bae, S. S. Jung, D. Y. Seo, and C. K. Kim, "KFAT: log-based transactional FAT file system for embedded mobile systems," in Proceedings of the US-Korea Conference on Science, Technology, and Entrepreneurship.
32. J. K. Ousterhout, "Why aren't operating systems getting faster as fast as hardware?" in Proceedings of the USENIX Technical Conference, 1990, pp. 247-256.


Seongjun Ahn received the M.S. and Ph.D. degrees in Computer Engineering from Seoul National University, Seoul, Korea, in 1999 and 2006, respectively. He has been working at Software Laboratories, Samsung Electronics Company, Korea, since 2006. His current research interests include operating systems, flash memory software, embedded systems, and high performance storage systems.

Jongmoo Choi received the B.S. degree in Oceanography from Seoul National University, Korea, in 1993, and the M.S. and Ph.D. degrees in Computer Engineering from Seoul National University in 1995 and 2001, respectively. Currently, he is an assistant professor in the Division of Information and Computer Science, Dankook University, Seoul, Korea. Previously, he was a senior engineer at the Ubiquix Company, where he participated in developing a real-time micro operating system for PDAs and smart phones. His research interests include embedded systems, system software, flash memory, RTOS, and data mining.

Donghee Lee received the M.S. and Ph.D. degrees in Computer Engineering, both from Seoul National University, Seoul, Korea, in 1991 and 1998, respectively. He has been with the University of Seoul, Korea, since 2002, where he is now an associate professor in the School of Computer Science. Previously, in 1998, he was a senior engineer at Samsung Electronics Company, Korea. From 1999 to 2001, he was an assistant professor at Cheju National University, Korea, and in 2001 he was an assistant professor at Hanyang University, Korea. His research interests include operating systems, flash memory software, embedded systems, and high performance storage systems.

Sam H. Noh received the B.S. degree in Computer Engineering from Seoul National University, Korea, in 1986, and the Ph.D. degree from the University of Maryland at College Park in 1993. He held a visiting faculty position at George Washington University from 1993 to 1994 before joining Hongik University in Seoul, Korea, where he is now a professor in the School of Information and Computer Engineering. From August 2001 to August 2002, he was a visiting associate professor at the University of Maryland Institute for Advanced Computer Studies (UMIACS), College Park. He was a member of the program committee for the Performance and Reliability track of the Twelfth International WWW Conference (WWW2003). His current research interests include Web systems, parallel and distributed systems, operating systems with emphasis on I/O issues, and real-time systems.

Sang Lyul Min received the B.S. and M.S. degrees in Computer Engineering, both from Seoul National University, Seoul, Korea, in 1983 and 1985, respectively. In 1985, he was awarded a Fulbright scholarship to pursue further graduate studies at the University of Washington. He received the M.S. and Ph.D. degrees in Computer Science from the University of Washington, Seattle, in 1988 and 1989, respectively. He is currently a professor in the School of Computer Science and Engineering, Seoul National University, Seoul, Korea. Previously, he was an assistant professor in the Department of Computer Engineering, Pusan National University, Pusan, Korea, from 1989 to 1992, and a visiting scientist at the IBM T. J. Watson Research Center, Yorktown Heights, New York, from 1989 to 1990. His research interests include computer architecture, real-time computing, parallel processing, and computer performance evaluation.

Yookun Cho received the B.S. degree from Seoul National University, Korea, in 1971, and the Ph.D. degree in Computer Science from the University of Minnesota at Minneapolis in 1978. He has been with the School of Computer Science and Engineering, Seoul National University, since 1979, where he is currently a professor. He was a visiting assistant professor at the University of Minnesota during 1985 and the director of the Educational and Research Computing Center at Seoul National University from 1993 to 1995. He also served as the president of the Korea Information Science Society from 2001 to 2002. He was a member of the program committee of IPPS/SPDP '98 in 1997 and of the International Conference on High-Performance Computing from 1995 to 1997. His research interests include operating systems, algorithms, system security, and fault-tolerant computing systems.
