
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 23, 1865-1887 (2007)

Design, Implementation, and Performance Evaluation of Flash Memory-based File System on Chip*

SEONGJUN AHN, JONGMOO CHOI¹, DONGHEE LEE², SAM H. NOH³, SANG LYUL MIN AND YOOKUN CHO

Department of Electrical Engineering and Computer Sciences
Seoul National University
Seoul, 151-742 Korea
¹Division of Information and Computer Science
Dankook University
Seoul, 140-714 Korea
²School of Computer Science
University of Seoul
Seoul, 130-743 Korea
³School of Information and Computer Engineering
Hongik University
Seoul, 121-791 Korea

Interoperability is an important requirement for portable storage devices that are increasingly being used to exchange and share data among diverse hosts. However, interoperability cannot be provided if different host systems use different file systems. To address this problem, we propose a storage device that contains a file system within itself, which we refer to as FSOC (File System On Chip). In this paper, we explain the design and implementation of a Flash memory-based FSOC as a proof-of-concept. We also propose a performance model for FSOC, which is derived by analyzing the operations of the host and the storage device. Using this model, we show that aside from qualitative benefits, there are quantitative benefits to using FSOC instead of a conventional storage device. Results from a series of experiments that compare the performance of a conventional storage device and the FSOC using synthetic workloads as well as real applications are given, and they verify the proposed model.

Keywords: embedded system, file system, flash memory, interoperability, portable storage

1. INTRODUCTION

Portable storage devices such as CompactFlash [1] and Multimedia Card [2] are increasingly being used as nonvolatile storage to exchange and share data among multiple hosts, including mobile ones. Hence, interoperability is a key requirement of such systems. However, if the file system on the portable storage device is not compatible with the file system of the host, the applications running on the host cannot access the stored data.

Developing or porting a file system is not a simple task. Thus, providing a file system for each of the many different embedded systems and mobile devices, such as digital cameras, MP3 players, and PDAs, each with its own specific environment, takes a lot of time and effort [3]. This results in a prolonged time-to-market for the product, which, in turn, may influence the success of the product.

Received November 3, 2005; revised January 27 & May 29, 2006; accepted July 19, 2006. Communicated by Tei-Wei Kuo.
* This research was partly supported by grant No. R01-2004-000-10188-0 from the Basic Research Program of the Korea Science & Engineering Foundation and in part by MIC & IITA through the IT Leading R&D Support Project.

In this paper, we propose that the file system be embedded within the portable storage device, which we refer to as FSOC (File System on Chip). The benefits of FSOC can be categorized into qualitative and quantitative ones. Qualitative advantages of FSOC compared with conventional storage devices include the following.

• FSOC provides a high degree of interoperability. When conventional storage devices are used, a host can access data stored in a storage device only if the file system of the host is compatible with that in the storage device. Using FSOC, any host can access the data residing in the storage device simply by adding a simple interface for FSOC to the host.
• Host system developers need not implement a file system. Therefore, FSOC eliminates the burden of developing or porting a file system, and thus reduces the time-to-market.
• FSOC improves file system performance by optimizing the file system for the storage media it uses. Generally, when the file system resides in the host system, optimizing its performance is difficult since it needs to support a variety of storage media with varying characteristics [4]. In FSOC, as the file system is developed for the specific storage media that it contains, there are more opportunities for optimization [5-7].

Aside from these qualitative benefits, the obtainable quantitative benefits of FSOC are summarized below, with details provided in later sections.

• Since the file system code is now executed in the storage device, more processor time in the host system can be allocated to the execution of application code.
• FSOC reduces the data traffic between the host and the storage device. This is because metadata required during file operations are not transferred between the FSOC and the host [8].
• FSOC lowers the energy consumption of the system by reducing data traffic and by lowering the clock speed and the supply voltage through parallel processing of the host and the storage device [9].

However, these benefits, qualitative as well as quantitative, are not always attainable. Our study shows that the use of FSOC is desirable in the following situations: (1) when the host is unable to provide diverse file systems due to limitations on resources and/or development time; (2) when an application requires a large amount of computation, and I/O can be overlapped with the computation; (3) when multiple applications are being executed on the host; and/or (4) when applications perform metadata-intensive I/O operations. However, when applications require large amounts of I/O and I/O time is critical for the application, or when the computation power of the host is sufficiently greater than that of the storage device, the use of FSOC may not be desirable.

As a proof-of-concept, we designed and implemented an FSOC that uses Flash memory as its storage media. We also show how the interface between the FSOC and the host can be defined based on the standard interface that applications use to access files, which is an important factor in interoperability. Our contribution in the theoretical aspect is in presenting a performance analysis based on a theoretical model and validating the model through experiments using an FSOC implementation. Although the approach of off-loading part of the host's work to the storage device is not new, to our knowledge our work is the first attempt to apply it to embedded systems such as portable storage. In such circumstances the models and performance analysis results from previous work cannot be directly applied, as the environment of embedded systems is restrictive.

The rest of the paper is organized as follows. Section 2 presents work related to this paper, and section 3 describes the design and implementation of an FSOC in a Flash memory card. In section 4, performance models of FSOC and a conventional storage device are presented, and the performance of these devices is compared using the presented model. In section 5, a performance comparison of the implemented FSOC and a conventional storage device is presented. Finally, we provide a summary and conclude in section 6.

2. RELATED WORK

The FSOC approach, that is, off-loading file system work to a storage device, is not a new idea. There have been several previous research results that consider the performance of intelligent storage devices [8, 10-13]. However, in these studies the storage device of interest is disk storage, and their main concern is exploiting parallelism among the storage devices. In this section, we describe some of the previous research that utilizes additional resources within the storage device to improve performance and/or functionality. In contrast to these works, our work is more applicable to storage devices that cannot exploit this kind of parallelism, such as the portable storage devices that are becoming more and more prevalent. In this section, we also describe research on Flash memory, which is the platform on which we implement our FSOC.

Active disk [8, 10], IDISK (Intelligent Disk) [11], and OSD (Object-based Storage Device) [14, 15] are some of the research efforts targeted at improving the performance and/or functionality of storage devices. In the Active disk approach, a portion of the application code is downloaded and executed in the storage device to reduce data traffic and to improve the performance of data-intensive portions of the application. IDISK proposes an architecture that uses a high-speed network to interconnect multiple disks that have application code execution capabilities. This architecture improves application execution parallelism and can be used to improve the efficiency of data-intensive applications such as decision support systems. Finally, in OSD, which is most closely related to FSOC, interoperability is enhanced by making the OSD responsible for locating data blocks. However, OSD is different from FSOC in that OSD manages data in the form of objects rather than files, and naming of files and managing directories remain the responsibility of the host. Moreover, the main purpose of OSD is in constructing storage appliances, whereas the purpose of FSOC is in providing portable storage that can be used by mobile devices.

Riedel et al. proposed a performance model for Active disk [12, 13]. The performance model for Active disk focused on parallelism among storage devices that execute part of the host application. It is similar to our model in that it models the performance of storage devices that process off-loaded computation as well as I/O requests. However, in this model, parallelism is considered only among the storage devices.

Currently, interfaces such as ATA or SCSI provide only simple read and write access to blocks in the storage device. This limited interface has been identified as an obstacle to making the storage device intelligent. It has been suggested that to make use of computational resources within the storage device, a more expressive interface between the host and the storage is needed [4]. We propose a new interface for FSOC in section 3.

The storage media used in our implementation of FSOC is Flash memory. Flash memory is suitable as storage media for mobile devices because it is light, rigid, and has low power consumption [16, 17]. However, Flash memory has a unique characteristic that distinguishes it from hard disks: to overwrite a physical block in Flash memory, an erase operation must be performed before the actual write operation [18, 19]. Hence, simply adapting a conventional file system developed for hard disks may not be possible. Two approaches have been used to circumvent this problem.

The first method is to use a software layer called a Flash Translation Layer (FTL) [18-22], as is generally done for Flash memory cards such as CompactFlash [1] and Multimedia Card [2]. The FTL is a sector-remapping software layer that provides the file system with an interface similar to that of a disk.
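As a rough illustration of what such a layer does, the sketch below shows a generic page-mapping scheme (not the particular FTL of these cards): each logical sector overwrite is redirected to a clean physical page, and the stale page is merely marked invalid for later erasure. All names are assumptions for illustration.

```c
#include <stdint.h>

#define NUM_SECTORS 1024   /* illustrative capacity */

/* Logical-to-physical map: the essence of a page-mapping FTL. */
static uint32_t l2p[NUM_SECTORS];                  /* sector -> flash page     */
extern uint32_t flash_alloc_clean_page(void);      /* from a pre-erased pool   */
extern void     flash_program(uint32_t page, const void *data);
extern void     flash_mark_invalid(uint32_t page); /* reclaimed by GC later    */

/* "Overwrite" a sector without an in-place erase: program a clean page,
 * repoint the map, and invalidate the old page. */
void ftl_write_sector(uint32_t sector, const void *data)
{
    uint32_t new_page = flash_alloc_clean_page();
    flash_program(new_page, data);
    flash_mark_invalid(l2p[sector]);
    l2p[sector] = new_page;
}
```

Reads simply follow the map, which is why the file system above the FTL can treat the card as an ordinary block device.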

The other approach is to develop a file system that takes into consideration the limitations of Flash memory. JFFS [23] and YAFFS [24] are such file systems. These file systems run on the Linux operating system and manage the physical structure of Flash memory by themselves. They adopted the approach used in the Log-structured File System (LFS) [25], since in LFS only append operations are used, which eliminates the need for overwrites. However, we did not consider JFFS and YAFFS as candidates for the file system in FSOC, since both of them require a considerable amount of resources, which cannot be assumed in a consumer device.

3. DESIGN AND IMPLEMENTATION OF FSOC

In this section, we first describe the design of our FSOC. The relationship and the interface between the host and the storage device, as compared with conventional storage devices, are presented. We then describe in detail the hardware platform and the software structure of the prototype implementation of the FSOC.

3.1 Structure of FSOC

Fig. 1 shows the structure of a conventional storage device and the FSOC. Consider how a file request actually accesses the storage medium. In conventional storage devices, the request is converted to block access requests by the file system of the host operating system. The block access requests are then passed to the device driver, and the device driver converts the block access requests to sector requests and transmits them to the storage device.

In the FSOC, the file system is embedded within the storage device. Therefore, the host that uses the FSOC does not have to be equipped with a file system. Instead, it only needs a simple stub that serves as an interface between the host and the FSOC. All file access requests are transmitted to the FSOC through this stub. The stub arranges the parameters of file requests and converts them to FSOC requests, which are then transmitted to the file system in the FSOC. The file system in the FSOC then fulfills the file request by making a direct request to the storage media, i.e., it writes new data to the storage media or reads data from the storage media. It also updates the metadata when necessary. The results of the file operation are sent to the stub, and then passed on to the host application.

Fig. 1. Structure of conventional storage device and FSOC. (In the conventional case, an application's file request goes to the kernel file system, which issues block requests to the device driver; the driver sends sector requests over the bus to the storage media in the storage device. In the FSOC case, the application's file request goes to the FSOC stub in the kernel and travels over the bus as a file request to the file system inside the FSOC, which issues sector requests to its storage media.)

The stub can be implemented as a stand-alone module or as a pseudo file system under the Virtual File System (VFS) layer [26]. In the latter case, applications can access the FSOC using an interface that is identical to that of other file systems.

Fig. 2. Interface between the host and the FSOC. (Application system calls such as open(), read(), write(), unlink(), mkdir(), rmdir(), and rename() are mapped by the stub to client routines open_cli(), read_cli(), write_cli(), unlink_cli(), mkdir_cli(), rmdir_cli(), and rename_cli(), whose file requests are served by the corresponding FSOC service routines open_svc(), read_svc(), write_svc(), unlink_svc(), mkdir_svc(), rmdir_svc(), and rename_svc().)

Fig. 2 shows the interface between the application and the stub, and between the stub and the FSOC file system. The FSOC interface is similar to that of an RPC (Remote Procedure Call) [27]. The FSOC has service routines corresponding to each file request, and the stub has the client routines. For example, a read operation is performed as follows: (1) the application makes a read() system call and passes the identifier of the file, the file offset, the amount of data to be read, and the buffer address to the stub; (2) the stub executes the read_cli() routine and converts the file identifier, the file offset, and the amount of data into an FSOC request; (3) the converted request is transmitted to the FSOC; (4) upon receiving the request, the read_svc() routine within the FSOC is called, which reads data from the storage media; (5) the read_svc() routine returns the read data to the stub of the host; and (6) the stub, finally, passes the data to the application.
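As an illustration of steps (1)-(6), a minimal sketch of what a stub client routine might look like is shown below. The request layout, opcode value, and bus primitives (fsoc_send(), fsoc_recv()) are assumed names for illustration, not the actual interface of our prototype; for simplicity the sketch assumes a full unit of data comes back before the status word.

```c
#include <stdint.h>

/* Hypothetical on-the-wire request header: one per file operation. */
struct fsoc_req {
    uint8_t  opcode;   /* e.g. FSOC_OP_READ, matching read_svc() on the device */
    int32_t  fd;       /* file identifier returned by a previous open_cli()    */
    uint32_t offset;   /* file offset to read from                             */
    uint32_t count;    /* number of bytes requested                            */
};

#define FSOC_OP_READ 2  /* assumed opcode value */

/* Assumed bus primitives provided by the host-side transport (e.g. PCMCIA). */
extern int fsoc_send(const void *buf, uint32_t len);
extern int fsoc_recv(void *buf, uint32_t len);

/* Client routine run by the stub: marshal the arguments of read(), ship
 * them to the FSOC, and unmarshal the reply (file data, then status). */
int32_t read_cli(int32_t fd, void *buf, uint32_t offset, uint32_t count)
{
    struct fsoc_req req = {
        .opcode = FSOC_OP_READ, .fd = fd, .offset = offset, .count = count,
    };
    int32_t status;

    if (fsoc_send(&req, sizeof req) < 0)        /* step (3): transmit request */
        return -1;
    if (fsoc_recv(buf, count) < 0)              /* step (5): receive the data */
        return -1;
    if (fsoc_recv(&status, sizeof status) < 0)  /* final step: status code    */
        return -1;
    return status;  /* bytes read, or a negative error code from read_svc() */
}
```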


The FSOC interface needs to be comprehensive and general in order to provide interoperability between the storage device and the various hosts. For this purpose, the commands, parameters, and responses of the FSOC interface are defined based on the POSIX interface for files and directories [28], which has been widely adopted by various operating systems.

The communication protocol between the host and the FSOC begins with the host issuing a command that initiates a file operation. The host then sends the parameters that are required for the issued file operation. If the requested file operation is a file write, the host sends the data to be written to the FSOC in the next step; otherwise this step is omitted. If file data or metadata are requested by the host, the FSOC sends them to the host after completing the request. The response, which indicates whether the operation succeeded or failed, along with the error code if it failed, is transferred to the host in the final step.
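The device side of this exchange can be summarized as a simple service loop. The following sketch uses assumed helper names (bus_recv(), bus_send(), dispatch()) and a request header mirroring the host-side one; it only illustrates the command/parameter/data/response sequence described above, not the prototype's internals.

```c
#include <stdint.h>

struct fsoc_req { uint8_t opcode; int32_t fd; uint32_t offset; uint32_t count; };
enum { FSOC_OP_READ = 2, FSOC_OP_WRITE = 3 };   /* assumed opcode values */

extern int bus_recv(void *buf, uint32_t len);   /* assumed bus primitives */
extern int bus_send(const void *buf, uint32_t len);
extern int32_t dispatch(const struct fsoc_req *req, void *data); /* runs *_svc() */

static uint8_t io_buffer[4096];                 /* transfer buffer in SRAM */

/* Device-side service loop: command and parameters, then a data phase
 * (writes only), then data out (reads), and finally the status code. */
void fsoc_service_loop(void)
{
    for (;;) {
        struct fsoc_req req;
        int32_t status;

        bus_recv(&req, sizeof req);            /* command + parameters     */
        if (req.opcode == FSOC_OP_WRITE)       /* data phase, writes only  */
            bus_recv(io_buffer, req.count);

        status = dispatch(&req, io_buffer);    /* run matching *_svc()     */

        if (req.opcode == FSOC_OP_READ && status > 0)
            bus_send(io_buffer, status);       /* requested data back      */
        bus_send(&status, sizeof status);      /* success/failure + code   */
    }
}
```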

3.2 Implementation of FSOC

We implemented an FSOC on a CompactFlash memory card that uses NAND-type Flash memory as its storage media. As shown in Fig. 3, the CompactFlash has an ARM7TDMI core that operates at 24MHz and 48KB of NOR-type Flash memory that stores the FTL code. It also has 16KB of SRAM for the stack, data, and buffer areas that are necessary for executing the FTL code. The interface with the host is PCMCIA [29]. The file system we implemented was embedded in the NOR Flash memory along with the FTL code. Fig. 4 depicts the CompactFlash development board that was used to implement the FSOC.

Fig. 3. Hardware structure of the CompactFlash. (The ARM7TDMI core, NOR Flash, and SRAM are connected by a global bus; a Flash controller attaches the NAND Flash via a local bus; the host connects through PCMCIA.)

Fig. 4. CompactFlash development board used to implement the FSOC.

In designing a FAT-based file system for our FSOC, we considered three requirements. The first requirement is quick recovery after power failure, as FSOC is expected to be used mostly in mobile environments where power is turned off frequently, inadvertently or not. For this reason, we added a journaling mechanism [30, 31] that records the contents of file operations before actually modifying the file system, for recovery purposes. Specifically, when metadata such as FAT entries or directory entries need to be updated, the operation is written to the log, which is maintained as a file in the root directory, prior to the actual update. This results in a slight performance degradation for write operations compared to the original FAT file system. However, read performance is not affected.
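To make the write-ahead ordering concrete, the following is a minimal sketch; the log-record layout and the helper routines (log_append(), log_commit(), fat_write()) are illustrative assumptions, not the prototype's actual code.

```c
#include <stdint.h>

/* Assumed journal record: enough to redo (or discard) one metadata update. */
struct log_rec {
    uint8_t  type;      /* e.g. 1 = FAT update, 2 = directory-entry update */
    uint32_t target;    /* FAT entry index or directory-entry location     */
    uint32_t old_val;   /* previous contents                               */
    uint32_t new_val;   /* contents to be written                          */
};

extern int log_append(const struct log_rec *rec); /* append to the log file    */
extern int log_commit(void);                      /* make the record durable   */
extern int fat_write(uint32_t idx, uint32_t val); /* the actual in-place update */

/* Update one FAT entry with recovery support: the operation is recorded in
 * the log (a file in the root directory) before the FAT itself is touched,
 * so an interrupted update can be redone after a power failure. */
int journaled_fat_update(uint32_t idx, uint32_t old_val, uint32_t new_val)
{
    struct log_rec rec = {
        .type = 1, .target = idx, .old_val = old_val, .new_val = new_val,
    };

    if (log_append(&rec) < 0 || log_commit() < 0)
        return -1;                 /* the extra write is the cost noted above */
    return fat_write(idx, new_val);
}
```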

The second requirement is efficient execution on the low-performance processors that are expected to be used in FSOC. To meet this requirement, summary information about the file names contained in a directory is retained in the directory entries. The summary information is also cached in the main memory of the FSOC and managed in an LRU manner. This caching mechanism reduces the file name lookup time, thereby improving execution efficiency.
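A minimal sketch of such a cache follows; the summary representation (here, a per-directory bitmap of hashed name characters) and the array-based LRU are illustrative assumptions, not the prototype's actual data structures.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8   /* small, to respect the 16KB SRAM budget */

/* One cached summary: which name hashes occur in a directory. A lookup
 * whose hash bit is clear can skip scanning that directory entirely. */
struct dir_summary {
    uint32_t dir_cluster;   /* directory identity                        */
    uint32_t name_bitmap;   /* bit i set => some name hashes to i mod 32 */
    uint8_t  valid;
};

static struct dir_summary cache[CACHE_SLOTS];  /* slot 0 = most recent */

/* LRU lookup: on a hit, move the entry to the front. On a miss, the caller
 * loads the summary from the directory entry and inserts it at slot 0,
 * evicting the least recently used slot. */
struct dir_summary *summary_lookup(uint32_t dir_cluster)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].dir_cluster == dir_cluster) {
            struct dir_summary hit = cache[i];
            memmove(&cache[1], &cache[0], i * sizeof cache[0]);
            cache[0] = hit;       /* promote the hit to most recent */
            return &cache[0];
        }
    }
    return NULL;  /* miss: fetch the summary from the directory entry */
}
```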

The last requirement concerns code and data memory. Recall that the NOR-type Flash memory size is only 48KB, and the FTL code occupies 13.6KB. Therefore, the file system must fit into what is left. Also, there is only 16KB of SRAM available. Again, 6.2KB of it is required by the FTL code, and some space is also required for the buffer used to transfer data between the host and the FSOC. For this purpose, we used the 16-bit Thumb ISA supported by the ARM7TDMI rather than the 32-bit ARM ISA, and avoided compiler optimizations that can increase the code size. As a result, the resulting file system uses only 10KB of the NOR-type Flash memory and 6.5KB of SRAM, which meets the memory requirements.

4. PERFORMANCE MODEL FOR FSOC

In this section, we present the performance model for FSOC and use the model to compare its performance with a conventional storage device. The performance evaluation criterion is the application run time, including storage access time. Some common assumptions that we make for our model are as follows:

(1) There is only one application executing.
(2) An application reads a unit of data, and computation is performed on this data. A constant amount of time is consumed for this computation. This read-computation cycle is repeated N times.
(3) Write operations of the application are non-blocking, and hence do not affect the run time of the application. (Note that these write requests will be queued and processed later by the operating system.)

4.1 Performance Model for Serial Execution of I/O and Computation

Applications that use blocking reads will block upon a request for data. For these kinds of applications, the reading of data and the computation on the read data can only be executed one after the other. Hence, overlapping of I/O and computation is impossible, leading to serial execution of I/O and computation.

The performance model, in this case, is derived from the operations of the host CPU, bus, and storage device. In a conventional storage device, execution of an application consists of application code execution, file system code execution, device driver code execution, and I/O processing, where I/O processing is divided into storage media access and data transfer. The application run time, denoted Tconv_serial, can then be expressed as Eq. (1). (Refer to Table 1 for the definitions of the symbols used in all equations.)

Tconv_serial = Tcomp_serial + N(TFS_host + Tdriver + Tmedia + Ttrans_conv + Tcomp)    (1)


Table 1. Definition of symbols used in the performance model.

Used model | Symbol | Meaning
Common | N | Total number of data units to be processed
 | Tcomp_serial | Execution time of application code that is not dependent on a particular data unit and cannot be overlapped with I/O (e.g., initializing memory at program start-up or outputting the overall result at the end of the program)
 | Tcomp | Application code execution time for performing computation on a unit of data
 | Tmedia | Storage media access time for a unit of data
 | Ttrans | Time for a unit of data to be transferred between host and storage device, when it is the same for the conventional storage device and the FSOC
Conventional storage device | Tconv_serial | Total application run time for the conventional storage device when all I/O and computation are serially executed
 | Tconv_parallel | Total application run time for the conventional storage device when some I/O and computation are executed in parallel
 | TFS_host | File system code execution time of the host for a unit of data (does not include device driver code execution time)
 | Tdriver | Device driver code execution time of the host for a unit of data
 | Ttrans_conv | Data transfer time between the host and the conventional storage device for a unit of data
FSOC | TFSOC_serial | Total application run time for FSOC when all I/O and computation are serially executed
 | TFSOC_parallel | Total application run time for FSOC when some I/O and computation are processed in parallel
 | Tstub | Stub code execution time for a unit of data
 | TFS_FSOC | File system code execution time of FSOC for a unit of data
 | Ttrans_FSOC | Data transfer time between the host and the FSOC for a unit of data

When FSOC is used, execution of an application consists of application code execution, stub code execution, and I/O processing. I/O processing for FSOC is divided into three parts: file system code execution within the FSOC, storage media access, and data transfer. The key difference between executing an application with FSOC and with a conventional storage device is that the host executes the stub code instead of the device driver code, and that, for FSOC, the file system code is executed within the device, not in the host. The application run time, denoted TFSOC_serial, can be expressed as Eq. (2).

TFSOC_serial = Tcomp_serial + N(Tstub + TFS_FSOC + Tmedia + Ttrans_FSOC + Tcomp)    (2)
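To make the serial model easy to evaluate, Eqs. (1) and (2) can be transcribed directly into code. The sketch below is a straightforward transcription; the structure and function names are ours, and the parameter values are to be filled in with measurements.

```c
/* Direct transcription of Eqs. (1) and (2). All times are per data unit
 * except t_comp_serial; use whatever unit the measurements use (e.g. ms). */
struct model_params {
    double n;             /* N: number of data units                  */
    double t_comp_serial; /* non-overlappable application time        */
    double t_comp;        /* computation per data unit                */
    double t_media;       /* media access per data unit               */
    double t_fs_host, t_driver, t_trans_conv;   /* conventional device */
    double t_stub, t_fs_fsoc, t_trans_fsoc;     /* FSOC                */
};

double t_conv_serial(const struct model_params *p)   /* Eq. (1) */
{
    return p->t_comp_serial +
           p->n * (p->t_fs_host + p->t_driver + p->t_media +
                   p->t_trans_conv + p->t_comp);
}

double t_fsoc_serial(const struct model_params *p)   /* Eq. (2) */
{
    return p->t_comp_serial +
           p->n * (p->t_stub + p->t_fs_fsoc + p->t_media +
                   p->t_trans_fsoc + p->t_comp);
}
```

Subtracting the two shows the serial-case gap is N(Tstub + TFS_FSOC − Tdriver − TFS_host), which reduces to N(TFS_FSOC − TFS_host) under the simplifying assumptions introduced in section 4.2.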

4.2 Performance Model for Parallel Execution of I/O and Computation

In this subsection, we present the performance model for the case where I/O processing and computation may be overlapped. We assume that parallel execution of I/O and computation is possible, that is, the application can determine which data will be needed before the computation on the currently read data is complete, and reads are non-blocking; and/or, without loss of generality, we assume that two processes cooperate per application, one for I/O processing and one for computation. We also assume that data read requests are issued as soon as the bus and storage become available, that is, the highest priority is given to I/O processing, and when I/O processing becomes available the operating system notifies the process so that it can suspend computation, issue the I/O request, and resume computation.

As with the performance model presented for serial execution of I/O and computation, the application execution time for a conventional storage device can be derived from the operations of the host CPU, bus, and storage device. Fig. 5 depicts the execution behavior when I/O and computation may be overlapped. The application execution time for this case is given in Eq. (3). Similarly, the execution behavior for FSOC is depicted in Fig. 6, and the application execution time is given in Eq. (4). Note that if N is 1, then Eqs. (3) and (4) become the same as Eqs. (1) and (2), respectively. Both situations represent the case where I/O processing and computation cannot overlap.

Fig. 5. Operations of host CPU, bus, and storage device for a conventional storage device when I/O and computation are executed in parallel (N = 3, Tcomp_serial = 2 time units, Tcomp = 3 time units, and all other parameters are 1 time unit). (Timeline of CPU, bus, and storage activity from initialization through the read requests for data units 1-3, their processing, and outputting the result. Legend: C: application code execution; D: device driver code execution; F: file system code execution; M: media access; T: data transfer.)

Fig. 6. Operations of host CPU, bus, and storage device for FSOC when I/O and computation are executed in parallel (N = 3, Tcomp_serial = 2 time units, Tcomp = 3 time units, and all other parameters are 1 time unit). (Same layout as Fig. 5. Legend: C: application code execution; S: stub code execution; F: file system code execution; M: media access; T: data transfer.)



Tconv_parallel = Tcomp_serial + N(TFS_host + Tdriver) + Tmedia + Ttrans_conv + Tcomp + (N − 1)max(Tcomp, Tmedia + Ttrans_conv)    (3)

TFSOC_parallel = Tcomp_serial + N × Tstub + TFS_FSOC + Tmedia + Ttrans_FSOC + Tcomp + (N − 1)max(Tcomp, TFS_FSOC + Tmedia + Ttrans_FSOC)    (4)
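Eqs. (3) and (4) can be transcribed the same way; the sketch below reuses the hypothetical struct model_params from the listing in section 4.1.

```c
#include <math.h>   /* for fmax() */

double t_conv_parallel(const struct model_params *p)   /* Eq. (3) */
{
    double io = p->t_media + p->t_trans_conv;          /* per-unit I/O time */
    return p->t_comp_serial +
           p->n * (p->t_fs_host + p->t_driver) +
           io + p->t_comp +
           (p->n - 1) * fmax(p->t_comp, io);
}

double t_fsoc_parallel(const struct model_params *p)   /* Eq. (4) */
{
    double io = p->t_fs_fsoc + p->t_media + p->t_trans_fsoc;
    return p->t_comp_serial +
           p->n * p->t_stub +
           io + p->t_comp +
           (p->n - 1) * fmax(p->t_comp, io);
}
```

Plotting these four functions over Tcomp, with all other parameters fixed, reproduces the shape of Fig. 7.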

The performance of executing on a conventional storage device and on FSOC can be compared based on the presented performance model. In our analysis, we assume that TFS_FSOC > TFS_host, because the computation power of a storage device would, in general, be lower than that of the host. For simplicity, we also assume that the execution times of the device driver code and the stub code are the same, as the stub basically plays the role of the device driver for FSOC¹. Another simplification we make is that the data transfer times for both devices are the same. Hence, we denote both Ttrans_FSOC and Ttrans_conv as Ttrans.

¹ Strictly speaking, Tstub is larger than Tdriver, but the difference is negligible. The only extra overhead in executing the stub code compared with an ordinary device driver is copying the arguments of the file operation to a contiguous memory area so that they can be transferred via DMA. In most cases, the size of the arguments to be copied is around 20 bytes, small enough to be negligible. We measured Tstub in our prototype implementation, and it was only 4% larger than Tdriver when reading 1MB of data.

Fig. 7. Application run time as application code execution time is varied. (Application run time versus Tcomp for Tconv_serial, TFSOC_serial, Tconv_parallel, and TFSOC_parallel; the x-axis is divided into phases (1), (2.a), (2.b), and (3).)

The performance comparison results obtained from Eqs. (1) to (4) are shown in Fig. 7. In this figure, the application code execution time for a unit of data (Tcomp) is varied, while all other parameters are fixed. Tconv_serial and TFSOC_serial increase linearly with N × Tcomp. For the case when I/O and computation are executed serially, the conventional storage device shows better performance for all Tcomp, and the difference (TFSOC_serial − Tconv_serial) is constant at N(TFS_FSOC − TFS_host).

When I/O and computation are executed in parallel, observe from this figure that there are three phases of execution, which we denote by (1), (2), and (3). We discuss each of these phases separately.

(1) This is when the application code execution time is less than the I/O time of the conventional storage device, that is, Tcomp < Tmedia + Ttrans. Here, the application run times for both the conventional storage device and the FSOC are mainly dominated by the I/O time, with an increase rate of 1. Note that in Fig. 7 the results seem to be a constant value, but this is because the rate of increase is relatively very small compared to the other phases.
The difference is (TFSOC_parallel − Tconv_parallel) = N(TFS_FSOC − TFS_host) (recall that we assume that Tstub and Tdriver are the same). The difference in run time is caused by the difference between executing the file system code in the host and in the FSOC, and the conventional storage device shows better performance. Detailed execution times occurring at each system component for this phase are depicted in Fig. 8 (a).

(2) This is when the application code execution time is greater than the I/O time of the conventional storage device and smaller than the I/O time of FSOC, that is, Tmedia + Ttrans ≤ Tcomp < TFS_FSOC + Tmedia + Ttrans. Here, the application run time of the conventional storage device is dominated by Tcomp and the rate of increase is N. On the other hand, the application run time of FSOC is still dominated by the I/O time and the rate of increase is 1. Therefore, the difference in application execution time grows smaller as Tcomp increases, and eventually crosses over.
In order to emphasize the crossover point, we divide phase (2) into two sub-phases: phase (2.a), where the conventional storage device shows better performance, and phase (2.b), where the FSOC shows better performance. In phase (2.a), Tmedia + Ttrans ≤ Tcomp < (Tmedia + Ttrans) + N/(N − 1)(TFS_FSOC − TFS_host), and the difference is N(TFS_FSOC − TFS_host) + (N − 1)((Tmedia + Ttrans) − Tcomp).
In phase (2.b), (Tmedia + Ttrans) + N/(N − 1)(TFS_FSOC − TFS_host) ≤ Tcomp < Tmedia + Ttrans + TFS_FSOC, and the difference is given as N(TFS_host − TFS_FSOC) + (N − 1)(Tcomp − (Tmedia + Ttrans)). Figs. 8 (b) and (c) show the detailed execution times occurring at each system component for these two situations, respectively.
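As a worked example of the crossover formula, take the per-unit parameter values later measured for our prototype and listed with Fig. 11 (b): N = 256, Tmedia + Ttrans = 5.6, TFS_host = 0.24, and TFS_FSOC = 0.42. The boundary between phases (2.a) and (2.b) then lies at Tcomp = 5.6 + (256/255)(0.42 − 0.24) ≈ 5.78, in the same time units as the parameters; for any larger per-unit computation time, the model predicts that the FSOC outperforms the conventional storage device.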

(3) This is when the application code execution time is greater than the I/O time of the FSOC, that is, Tcomp ≥ TFS_FSOC + Tmedia + Ttrans. Here, the application run times of both the conventional storage device and the FSOC are dominated by the application code execution time. Hence, the application run time increases proportionally to Tcomp for both devices and the rate of increase is N. The difference in application run time between the conventional storage device and the FSOC is (Tconv_parallel − TFSOC_parallel) = N × TFS_host − TFS_FSOC. Fig. 8 (d) shows the detailed execution times occurring at each system component for this phase.

In summary, FSOC performs better than the conventional storage device when the application code execution time is larger than the I/O time. This performance gain is due to the fact that parallel execution of the file system code and the application code is possible with FSOC. Otherwise, the conventional storage device performs better.

Fig. 8. Operation examples of the conventional storage device and FSOC. (Four pairs of CPU/bus/storage timelines, one pair (Conv and FSOC) for each phase: (a) Tcomp = 2, phase (1); (b) Tcomp = 3, phase (2.a); (c) Tcomp = 4, phase (2.b); (d) Tcomp = 5, phase (3). Legend: Ci: application code execution for initialization; Co: application code execution for outputting the overall result; Cn: application code execution for the n-th data unit; D: device driver code execution; S: stub code execution; F: file system code execution; M: media access; T: data transfer.)

5. PERFORMANCE EVALUATION

In this section, the quantitative aspect of FSOC is evaluated through several experiments using our prototype implementation. For this purpose, the performance of FSOC is compared against a conventional storage device. The conventional storage device used in our experiments is the CompactFlash memory card that we described in the previous section. This card has exactly the same hardware and software configuration as the one on which the FSOC prototype was implemented. The relationship between the host and the storage devices it operates is shown in Fig. 9. For the host system, we used an embedded system development board with an ARM920T core running the Linux 2.4.18 operating system. The same host is used for both the FSOC and the conventional storage device. For the FSOC, a stub was implemented and added to the kernel. The file system used in the FSOC prototype was ported to the Linux kernel to operate the conventional storage device. Therefore, the file system in the host is exactly the same as the file system in the FSOC. This was done for the purpose of a fair comparison. For all the experiments, the PCMCIA interface was used, and the host clock rate was fixed at 56MHz unless otherwise stated. The NAND Flash memory we used has a read bandwidth of 42MB/sec and a write bandwidth of 2.56MB/sec. The data transfer rate of the bus ranges between 700KB/sec and 1MB/sec depending on the host clock speed.

Fig. 9. Host implementation supporting FSOC and CompactFlash. (On the host, applications sit above the Virtual File System layer of the Linux kernel. For the conventional CompactFlash, the FSOC file system ported to the kernel runs under VFS and drives the device driver, with the FTL and Flash memory inside the card. For the FSOC, an FSOC stub under VFS forwards requests to the FSOC, which contains the file system, FTL, and Flash memory.)

5.1 Computation Time and I/O Time of an Application

There are two performance implications of moving the file system from the host to the storage device. The first is that the host CPU burden is reduced, as the file system code is no longer executed there. This leaves more room for other CPU activities, including application code execution in the host system, which may proceed in parallel with the file system code executed in the FSOC, having a positive influence on performance. On the other hand, the CPU in the FSOC, which is generally slower than the one in the host system, now has more work to do than before, having a negative influence on performance.

In this section, we show how the ratio between the computation time and the I/O time of the application influences the overall performance of FSOC. For this purpose, we perform experiments with a synthetic workload that varies the ratio between the computation time and the I/O time. Fig. 10 shows the pseudo code for the synthetic workload. We used non-blocking reads for parallel execution of I/O processing and computation. I/O processing and computation are performed in 4KB data units. Step 5 in Fig. 10 is a dummy loop that does not perform any useful computation; it was inserted to control the computation time so that we could control the ratio of computation to I/O time.

The results from this experiment are shown in Fig. 11 (a), where the x-axis is the initial value of the counter variable and the y-axis is the total execution time of the synthetic application. 'FSOC' denotes the results for the FSOC prototype and 'Conv' denotes the results for the conventional storage device.

1) issue non-blocking read request for initial 4KB of data
2) check if read request is completed
   a. if not completed, wait for completion
3) issue non-blocking read request for the next 4KB of data
4) sum all values in the read data
5) count from the specified initial value to 0 (dummy loop)
6) check if the total amount of read data is 1MB
   a. if it is, terminate program
   b. otherwise go to step 2)

Fig. 10. Pseudo code for the synthetic workload using non-blocking read.
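For readers who want a runnable approximation of this pseudo code, the sketch below uses POSIX asynchronous I/O (link with -lrt on older glibc) in place of the prototype's non-blocking read facility; the input file name and the dummy-loop bound are illustrative, and error handling is kept minimal.

```c
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define UNIT  4096            /* 4KB data unit, as in the experiment */
#define TOTAL (1024 * 1024)   /* terminate after 1MB has been read   */

int main(void)
{
    static char buf[2][UNIT];          /* double buffer: process one, fill one */
    struct aiocb cb;
    long sum = 0;
    int cur = 0, nblocks = TOTAL / UNIT;
    int fd = open("testfile", O_RDONLY);   /* hypothetical 1MB input file */

    if (fd < 0)
        return 1;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_nbytes = UNIT;
    cb.aio_buf    = buf[cur];
    cb.aio_offset = 0;
    aio_read(&cb);                         /* step 1: initial 4KB request */

    for (int b = 0; b < nblocks; b++) {
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);        /* step 2: wait for completion */
        aio_return(&cb);

        if (b + 1 < nblocks) {             /* step 3: request next 4KB    */
            cb.aio_buf    = buf[1 - cur];
            cb.aio_offset = (off_t)(b + 1) * UNIT;
            aio_read(&cb);
        }
        for (int i = 0; i < UNIT; i++)     /* step 4: sum the read data   */
            sum += buf[cur][i];
        for (volatile long d = 15000; d > 0; d--)
            ;                              /* step 5: tunable dummy loop  */
        cur = 1 - cur;                     /* step 6 is the loop bound    */
    }
    printf("sum = %ld\n", sum);
    close(fd);
    return 0;
}
```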

Fig. 11. Result of the synthetic workload experiment. ((a) Measured application execution time of the synthetic workload: application execution time (ms) versus computation overhead (iteration count of the dummy loop), with curves T_conv and T_FSOC. (b) Comparison of the measured application execution times with the values derived from Eqs. (3) and (4), with parameters N = 256, Tmedia + Ttrans = 5.6, Tdriver = 0.72, TFS_host = 0.24, TFS_FSOC = 0.42, Tstub = 0.75, and Tcomp_serial = 38.)

When the initial value of the counter variable is less than 15000, the application run times of both the conventional storage device and the FSOC are bounded by the I/O time, which consists of the file/storage device access time and the data transfer time. In this range, since most of the computation time is hidden by the I/O time, the application run times of both the conventional storage device and the FSOC increase very slowly, looking almost as if they are constant, with the FSOC increasing even more slowly, as more file/storage activities are performed in the storage device, which has a slower CPU.

When the initial value is over 15000, things start to change for the conventional storage device. Now, not all of the computation time can be hidden behind the I/O time, and the computation time starts to dictate the application run time. Hence, as the computation time increases, the total execution time increases with it. For FSOC, since its I/O time is greater than that of the conventional storage device, this phenomenon does not occur until the counter variable reaches 17000. Between 15000 and 17000, the execution time of FSOC remains almost constant while that of the conventional storage device increases linearly. Hence, before the two cross over (in our case this happens when the counter variable reaches 16000), the conventional storage device performs better, while after the crossover point the FSOC starts to perform better. Beyond 17000, both devices are dominated by the computation time, and so the difference in performance remains constant, with FSOC performing better.

Fig. 11 (b) compares the results obtained through actual measurements, shown in Fig. 11 (a), with the values obtained from the model, as in Fig. 7. The parameters used for the model were obtained from actual measurements². Observe the similarity between the results. The margin of error is in the 2-3% range. The error comes from the extra overhead caused by executing the measurement code. When we calibrated the parameters by subtracting the measurement overhead, the margin of error was reduced to below 1%. These results experimentally validate the presented performance model for the case where I/O processing and computation may be executed in parallel.

² We obtained the actual execution times by inserting measurement instructions into the device driver, stub, and file system code. TFS_host is measured by recording timestamps at the entry of the file system code and of the device driver code in the host, and calculating the difference. TFS_FSOC is measured by a similar method, recording timestamps at the entry of the file system code and of the flash memory access code in the storage device.

5.2 Computing Power of the Host and the Storage Device

Portable storage devices such as the FSOC can be used with diverse hosts, and the execution times of the application code and the file system code vary depending on the computing power of the host. In this section, we analyze the influence of the computing power of the host and the storage device on application performance. For this purpose, we execute the synthetic workload described in section 5.1 with various host clock speeds.

Fig. 12 shows the experimental results for host clock speeds of 45MHz, 56MHz, and 67MHz, while the clock speed of the storage device core remains fixed at 24MHz. Notice that the crossover point, where the FSOC starts to perform better than the conventional storage device, moves to the right as the clock speed increases. This is a natural consequence, as with a faster clock more computation can be hidden behind the I/O time. Except for this, we can observe the same performance trends as in Fig. 11 (a).

We also performed experiments with three real-world applications: cat, gzip, <strong>and</strong><br />

mpeg. The versions that we used were cat that is embedded in busybox 0.60.3, gzip 1.3.2<br />

from GNU, mpeg2play 1.1 from MPEG S<strong>of</strong>tware Simulation Group. Cat dumps the contents<br />

<strong>of</strong> a 1MB file on the terminal, gzip compresses a file whose original size is 4.8MB,<br />

2 We obtained the actual execution time by inserting measurement instructions into the device driver, stub, <strong>and</strong><br />

the file system code. TFS_host is measured by recording timestamps at the entry <strong>of</strong> the file system code <strong>and</strong> the<br />

device driver code in the host, <strong>and</strong> calculating the difference. TFS_FSOC is measured by similar method, recording<br />

timestamps at the entry <strong>of</strong> the file system code <strong>and</strong> the flash memory access code in the storage device.<br />

1879


Application execution time (ms) .<br />

1880<br />

2100<br />

1900<br />

1700<br />

1500<br />

S. J. AHN, J. M. CHOI, D. H. LEE, S. H. NOH, S. L. MIN AND Y. K. CHO<br />

App. execution time (ms)<br />

2900<br />

2700<br />

2500<br />

2300<br />

2100<br />

1900<br />

1700<br />

Host 45MHz, Conv<br />

Host 45MHz, FSO C<br />

Host 56MHz, Conv<br />

Host 56MHz, FSO C<br />

Host 67MHz, Conv<br />

Host 67MHz, FSO C<br />

1500<br />

5000 10000 15000 20000 25000 30000<br />

Computation overhead (interation count <strong>of</strong> dummy loop)<br />

Fig. 12. Results <strong>of</strong> synthetic workload execution with various host clock speeds.<br />

[Figure] Fig. 13. Application run time of cat, gzip, and mpeg: (a) Cat, (b) Gzip, (c) Mpeg. (x-axis: host clock speed, MHz; y-axis: application execution time, ms; curves: FSOC and Conv.)


For our experiments, we modified gzip and mpeg to spawn a process whose task is to read data and transfer the read data to the parent process via pipe IPC. Also, the original mpeg program outputs the decoded result onto a display device; to reduce the effect of the display, we removed the output part of the program.
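A minimal sketch of this reader-process change, assuming standard POSIX fork/pipe primitives (the actual patches to gzip and mpeg are not shown in the paper):

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define CHUNK 4096

    /* Spawn a reader process that feeds the file at 'path' into a
     * pipe. The parent (the compressor or decoder proper) reads from
     * the returned descriptor instead of the file, so file I/O can
     * proceed in parallel with the parent's computation. */
    int spawn_reader(const char *path)
    {
        int fds[2];
        pid_t pid;

        if (pipe(fds) < 0) { perror("pipe"); exit(1); }
        pid = fork();
        if (pid < 0) { perror("fork"); exit(1); }

        if (pid == 0) {                        /* child: the reader  */
            char buf[CHUNK];
            ssize_t n;
            int in = open(path, O_RDONLY);
            if (in < 0) { perror("open"); _exit(1); }
            close(fds[0]);                     /* child only writes  */
            while ((n = read(in, buf, sizeof buf)) > 0)
                write(fds[1], buf, (size_t)n); /* hand data to parent */
            close(fds[1]);
            close(in);
            _exit(0);
        }
        close(fds[1]);                         /* parent only reads  */
        return fds[0];
    }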

Fig. 13 shows the results for the three real-world applications. The x-axis in the graphs represents the host clock speed and the y-axis the application execution time in milliseconds; note that the scales differ from graph to graph. In the case of the I/O-bound cat, the conventional storage device shows better performance, while for the computation-bound mpeg, the FSOC performs better at all clock speeds. In the case of gzip, however, the performance of the conventional storage device and the FSOC crosses over at a host clock speed of around 50 MHz. These results are consistent with the synthetic workload experiments presented above.

5.3 Effects of Multiprogramming

In this section, we compare the performance of the FSOC and conventional storage when multiple applications run concurrently on the host system. To examine the effect of multiprogramming, we executed the I/O-bound application cat in parallel with an application (pi) that calculates the value of the circular constant pi, and measured the time elapsed until both applications completed. Fig. 14 shows the results of the experiment.
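The measurement harness amounts to the sketch below; run_cat() and run_pi() are hypothetical stand-ins for the two workloads, which in the experiment run as ordinary processes.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    /* Hypothetical workload entry points standing in for the two
     * applications of the experiment. */
    extern void run_cat(void);  /* I/O-bound: dump a 1MB file      */
    extern void run_pi(void);   /* CPU-bound: compute digits of pi */

    static double now_ms(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
    }

    int main(void)
    {
        double start = now_ms();
        if (fork() == 0) { run_cat(); _exit(0); } /* I/O-bound job */
        if (fork() == 0) { run_pi();  _exit(0); } /* CPU-bound job */
        wait(NULL);
        wait(NULL);              /* elapsed time covers both jobs  */
        printf("elapsed: %.1f ms\n", now_ms() - start);
        return 0;
    }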

[Figure] Fig. 14. Effect of multiprogramming. (x-axis: host clock speed, MHz; y-axis: application execution time, ms; curves: FSOC (cat+pi), Conv (cat+pi), FSOC (cat), Conv (cat), FSOC (pi), Conv (pi).)

The application pi performs no file operations, so its run time is the same for both conventional storage and the FSOC; in Fig. 14, the FSOC (pi) and Conv (pi) results therefore overlap completely and appear as a single line. As discussed previously, cat is slower with the FSOC when executed alone. However, when executed concurrently with pi, the FSOC shows better performance. The implication is that when the host CPU is kept busy via multiprogramming, the FSOC can perform better even for I/O-bound applications. This indicates that as more applications run concurrently on embedded systems, the performance benefit of the FSOC will increase as well.


5.4 Data Traffic between the Host and the Storage Device

Data traffic between the host and the storage device is reduced with the FSOC, again leading to improved performance. This is because metadata necessary for file operations need not be transferred, since the file system now resides in the storage device. For example, to create a new file on a FAT file system, the file allocation table and the directory entry must be modified. If the file system runs in the host system, the file allocation table and directory entry data are transferred to the host, and they are transferred back to the storage device after the required modifications are made. In the FSOC, however, these operations are performed within the storage device, eliminating the need for metadata transfers; only the name of the file to be created needs to be transferred to the FSOC.
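The difference in bus traffic can be illustrated with the toy accounting below. The sector numbers, packet format, and transfer counts are made up (a real FAT create may touch more sectors); only the contrast matters: sector-granularity metadata round trips versus a single small command.

    #include <stdio.h>
    #include <string.h>

    enum { SECTOR = 512, FAT_SECTOR = 1, DIR_SECTOR = 2 };
    static long bus_bytes;  /* bytes crossing the host-device bus */

    /* Hypothetical bus primitives; each sector moves 512 bytes. */
    static void read_sector(int lba)  { (void)lba; bus_bytes += SECTOR; }
    static void write_sector(int lba) { (void)lba; bus_bytes += SECTOR; }

    /* Conventional storage: host-side FAT code round-trips metadata. */
    static void create_conventional(void)
    {
        read_sector(FAT_SECTOR);   /* allocation table to host  */
        read_sector(DIR_SECTOR);   /* directory entries to host */
        write_sector(FAT_SECTOR);  /* modified metadata back    */
        write_sector(DIR_SECTOR);
    }

    /* FSOC: only a small "create" command with the file name
     * crosses the bus; all metadata stays inside the device. */
    static void create_fsoc(const char *name)
    {
        bus_bytes += 8 + (long)strlen(name); /* header + name */
    }

    int main(void)
    {
        bus_bytes = 0; create_conventional();
        printf("conventional create: %ld bytes\n", bus_bytes); /* 2048 */
        bus_bytes = 0; create_fsoc("report.txt");
        printf("FSOC create:         %ld bytes\n", bus_bytes); /* 18 */
        return 0;
    }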

[Figure] Fig. 15. Data traffic ratio between the FSOC and the conventional storage device. (x-axis: file operations (directory creation, directory removal, file creation, file removal, rename, file read/write); y-axis: data traffic ratio FSOC/Conv, %, from 0 to 100.)

The amount of metadata transferred varies with the type of file operation. Fig. 15 shows the ratio of the data traffic of the FSOC to that of the conventional storage device for the various types of file operations. The read and write operations require a relatively small amount of metadata compared with the file data, and thus the data traffic of the conventional storage device and the FSOC is almost the same. However, for file operations such as directory creation, directory removal, file creation, and file removal, which mainly manipulate metadata, the data traffic required by the FSOC is much smaller (between 5% and 30% of that required by the conventional storage device).

To examine the effects of data traffic on the performance of the FSOC and conventional storage, we performed three experiments. The first two use synthetic workloads representing two extreme cases. The first experiment sequentially reads data from a 1MB file; the amount of metadata required for this operation is trivial compared with the amount of file data. The second experiment creates 500 files, corresponding to the case where metadata operations dominate the execution. The third experiment is the second phase of the Andrew benchmark [32], which represents real-life file system operations, that is, copying files of various sizes mixed with file creations, file reads, and file writes.


Table 2. Elapsed time for file operations (unit: ms).

    Operation        FSOC     Conv
    File read        1730     1603
    File creation    8986    28926
    Andrew           4920     4961

The results are presented in Table 2. The file read experiment shows that the performance of the FSOC is slightly lower (by about 8%) than that of the conventional storage device, due to the lower computing power of the storage device compared with that of the host system. On the other hand, the file creation experiment shows that the FSOC outperforms the conventional storage device by about 69%, completing in 8986 ms versus 28926 ms. In the Andrew benchmark experiment, the conventional storage device and the FSOC show almost identical performance, which shows that the effect of reduced data traffic compensates for the performance degradation due to the lower computing power of the storage device.

5.5 Summary of Experimental Results

We can summarize the experimental results of this section as follows:

• The performance model presented in section 4 matches well with the measurement results obtained from the prototype implementation.
• The FSOC is beneficial when the computation time of the application is larger than the I/O time and the I/O time can be hidden by parallel execution. The gain in this case is equivalent to the file system code execution time of the host.
• The lower the computing power of the host, the larger the gain for the FSOC, because the computing power of the host determines the file system code execution time at the host.
• When multiple applications are executed, more benefit may be attainable for the FSOC, as I/O time can be hidden by the computation time of other concurrently executing applications.
• The FSOC also reduces data traffic between the host and the storage, especially for metadata-intensive operations.

6. CONCLUSION

In this paper, we proposed and implemented the FSOC, a storage device that contains a file system within itself and uses Flash memory as its storage medium. The FSOC provides a higher degree of interoperability than conventional storage devices and improves the efficiency of the storage device by allowing the file system to be optimized specifically for the storage medium of interest. Moreover, the FSOC reduces the burden of developing or porting different file systems in the host system.

Aside from these qualitative advantages, the FSOC also provides quantitative advantages. Performance gains can be obtained through parallel execution of the application code in the host and the file system code in the storage device. The FSOC also reduces data traffic between the host and the storage. These performance issues were evaluated through a performance model and several experiments with synthetic workloads and real applications. The experimental results showed that the FSOC performed better than the conventional storage device when the computation time is larger than the I/O time and/or when there are many file operations that require access to metadata.

As future work, we plan to extend the proposed model to cover the case where multiple applications are executed independently. In this case, we need to consider the behavior of multiple processes that do not cooperate with each other, a point not considered in the model presented in this paper. We also plan to examine the effect of the FSOC for a variety of file systems, and the impact on performance of different file system designs and implementations.

REFERENCES

1. CompactFlash Association, "Information about CompactFlash," http://www.compactflash.org/.
2. MultiMediaCard Association, http://www.mmca.org.
3. E. Zadok and J. Nieh, "FiST: a language for stackable file systems," in Proceedings of the Annual USENIX Technical Conference, 2000, pp. 55-70.
4. G. R. Ganger, "Blurring the line between OSes and storage devices," Technical Report No. CMU-CS-01-166, Carnegie Mellon University, 2001.
5. C. R. Lumb, J. Schindler, and G. R. Ganger, "Freeblock scheduling outside of disk firmware," in Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002, pp. 275-288.
6. J. Schindler, J. L. Griffin, C. R. Lumb, and G. R. Ganger, "Track-aligned extents: matching access patterns to disk drive characteristics," in Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002, pp. 259-274.
7. R. Wang, T. E. Anderson, and D. A. Patterson, "Virtual log-based file systems for a programmable disk," in Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, 1999, pp. 29-43.
8. A. Acharya, M. Uysal, and J. Saltz, "Active disks: programming model, algorithms and evaluation," in Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998, pp. 81-91.
9. A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE Journal of Solid-State Circuits, Vol. 27, 1992, pp. 473-484.
10. E. Riedel, G. Gibson, and C. Faloutsos, "Active storage for large-scale data mining and multimedia," in Proceedings of the 24th International Conference on Very Large Data Bases, 1998, pp. 62-73.
11. K. Keeton, D. A. Patterson, and J. M. Hellerstein, "A case for intelligent disks (IDISKs)," in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1998, pp. 42-52.
12. E. Riedel, "Active disks - remote execution for network-attached storage," Ph.D. Dissertation, No. CMU-CS-99-177, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1999.
13. E. Riedel, C. Faloutsos, G. A. Gibson, and D. Nagle, "Active disks for large-scale data processing," IEEE Computer, Vol. 34, 2001, pp. 68-74.
14. D. Anderson, "Object based storage devices: a command set proposal," Technical Report, National Storage Industry Consortium, 1999.
15. OSD workgroup, http://www.snia.org/tech_activities/workgroups/osd.
16. F. Douglis, R. Caceres, F. Kaashoek, K. Li, B. Marsh, and J. A. Tauber, "Storage alternatives for mobile computers," in Proceedings of the 1st Symposium on Operating Systems Design and Implementation, 1994, pp. 25-37.
17. B. Marsh, F. Douglis, and P. Krishnan, "Flash memory file caching for mobile computers," in Proceedings of the 27th Annual Hawaii International Conference on System Sciences, 1994, pp. 451-461.
18. J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho, "A space-efficient flash translation layer for CompactFlash systems," IEEE Transactions on Consumer Electronics, Vol. 48, 2002, pp. 366-375.
19. M. Wu and W. Zwaenepoel, "eNVy: a non-volatile, main memory storage system," in Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 1994, pp. 86-97.
20. Intel Corporation, "Understanding the flash translation layer (FTL) specification," http://developer.intel.com.
21. A. Kawaguchi, S. Nishioka, and H. Motoda, "A flash-memory based file system," in Proceedings of the USENIX Technical Conference, 1995, pp. 155-164.
22. MTD, "Memory technology device (MTD) subsystem for Linux," http://www.linux-mtd.infradead.org.
23. D. Woodhouse, Red Hat, Inc., "JFFS: the journaling flash file system," http://sources.redhat.com/jffs2/jffs2-html/.
24. Aleph One Company, "Yet another flash filing system," http://www.aleph1.co.uk/armlinux/projects/yaffs/.
25. M. Rosenblum and J. K. Ousterhout, "The design and implementation of a log-structured file system," ACM Transactions on Computer Systems, Vol. 10, 1992, pp. 26-52.
26. S. Kleiman, "Vnodes: an architecture for multiple file system types in Sun UNIX," in Proceedings of the USENIX Technical Conference, 1986, pp. 238-247.
27. A. Birrell and B. Nelson, "Implementing remote procedure calls," ACM Transactions on Computer Systems, Vol. 2, 1984, pp. 39-59.
28. IEEE, Information Technology - Portable Operating System Interface (POSIX) Part 1: System Application Program Interface (API) [C Language], IEEE, 1990.
29. Personal Computer Memory Card International Association, "PCMCIA PC Card Standard Release 2.1," 1993.
30. R. Hagmann, "Reimplementing the Cedar file system using logging and group commit," in Proceedings of the 11th Symposium on Operating System Principles, 1987, pp. 155-162.
31. M. S. Kwon, S. H. Bae, S. S. Jung, D. Y. Seo, and C. K. Kim, "KFAT: log-based transactional FAT file system for embedded mobile systems," in Proceedings of the US-Korea Conference on Science, Technology, and Entrepreneurship.
32. J. K. Ousterhout, "Why aren't operating systems getting faster as fast as hardware?" in Proceedings of the USENIX Technical Conference, 1990, pp. 247-256.


Seongjun Ahn received the M.S. and Ph.D. degrees in Computer Engineering from Seoul National University, Seoul, Korea, in 1999 and 2006, respectively. He has been working at Software Laboratories, Samsung Electronics Company, Korea, since 2006. His current research interests include operating systems, flash memory software, embedded systems, and high performance storage systems.

Jongmoo Choi received the B.S. degree in Oceanography from Seoul National University, Korea, in 1993, and the M.S. and Ph.D. degrees in Computer Engineering from Seoul National University in 1995 and 2001, respectively. Currently, he is an assistant professor in the Division of Information and Computer Science, Dankook University, Seoul, Korea. Previously, he was a senior engineer at the Ubiquix Company, where he participated in developing a real-time micro operating system for PDAs and smart phones. His research interests include embedded systems, system software, flash memory, RTOS, and data mining.

Donghee Lee received the M.S. and Ph.D. degrees in Computer Engineering, both from Seoul National University, Seoul, Korea, in 1991 and 1998, respectively. He has been with the University of Seoul, Korea, since 2002, where he is now an associate professor in the School of Computer Science. Previously, in 1998, he was a senior engineer at Samsung Electronics Company, Korea. From 1999 to 2001, he was an assistant professor at Cheju National University, Korea, and in 2001 he was an assistant professor at Hanyang University, Korea. His research interests include operating systems, flash memory software, embedded systems, and high performance storage systems.

Sam H. Noh received the B.S. degree in Computer Engineering from Seoul National University, Korea, in 1986, and the Ph.D. degree from the University of Maryland at College Park in 1993. He held a visiting faculty position at George Washington University from 1993 to 1994 before joining Hongik University in Seoul, Korea, where he is now a professor in the School of Information and Computer Engineering. From August 2001 to August 2002, he was a visiting associate professor at the University of Maryland Institute for Advanced Computer Studies (UMIACS), College Park. He was a member of the program committee for the Performance and Reliability track of the Twelfth International WWW Conference (WWW2003). His current research interests include Web systems, parallel and distributed systems, operating systems with emphasis on I/O issues, and real-time systems.

Sang Lyul Min received the B.S. and M.S. degrees in Computer Engineering, both from Seoul National University, Seoul, Korea, in 1983 and 1985, respectively. In 1985, he was awarded a Fulbright scholarship to pursue further graduate studies at the University of Washington. He received the M.S. and Ph.D. degrees in Computer Science from the University of Washington, Seattle, in 1988 and 1989, respectively. He is currently a professor in the School of Computer Science and Engineering, Seoul National University, Seoul, Korea. Previously, he was an assistant professor in the Department of Computer Engineering, Pusan National University, Pusan, Korea, from 1989 to 1992, and a visiting scientist at the IBM T. J. Watson Research Center, Yorktown Heights, New York, from 1989 to 1990. His research interests include computer architecture, real-time computing, parallel processing, and computer performance evaluation.

Yookun Cho received the B.S. degree from Seoul National University, Korea, in 1971, and the Ph.D. degree in Computer Science from the University of Minnesota at Minneapolis in 1978. He has been with the School of Computer Science and Engineering, Seoul National University, since 1979, where he is currently a professor. He was a visiting assistant professor at the University of Minnesota during 1985 and the director of the Educational and Research Computing Center at Seoul National University from 1993 to 1995. He also served as the president of the Korea Information Science Society from 2001 to 2002. He was a member of the program committee of IPPS/SPDP '98 in 1997 and of the International Conference on High-Performance Computing from 1995 to 1997. His research interests include operating systems, algorithms, system security, and fault-tolerant computing systems.
