NASA Scientific and Technical Aerospace Reports

holdings. A study released by the School of Information Management and Systems at the University of California, Berkeley, estimates that over 5 exabytes of data was created in 2002. Almost 99 percent of this information originally appeared on magnetic media. The theme for MSST2004 is therefore both timely and appropriate. There have been many discussions about rapid technological obsolescence, incompatible formats and inadequate attention to the permanent preservation of knowledge committed to digital storage. Tutorial sessions at MSST2004 detail some of these concerns, and steps being taken to alleviate them. Over 30 papers deal with topics as diverse as performance, file systems, and stewardship and preservation. A number of short papers, extemporaneous presentations, and works in progress will detail current and relevant research on the MSST2004 theme.

Author

Information Management; Computer Networks; Metadata; Computer Storage Devices; Data Storage<br />

20040121031 Houston Univ., TX, USA<br />

Identifying Stable File Access Patterns<br />

Shah, Purvi; Paris, Jehan-Francois; Amer, Ahmed; Long, Darrell D. E.; NASA/IEEE MSST 2004 Twelfth NASA Goddard Conference on Mass Storage Systems and Technologies in cooperation with the Twenty-First IEEE Conference on Mass Storage Systems and Technologies; April 2004, pp. 159-163; In English; See also 20040121020

Contract(s)/Grant(s): NSF CCR-99-88390; NSF ANI-03-25353; NSF CCR-02-04358; No Copyright; Avail: CASI; A01, Hardcopy

Disk access times have not kept pace with the evolution of disk capacities, CPU speeds and main memory sizes. They have only improved by a factor of 3 to 4 in the last 25 years whereas other system components have almost doubled their performance every other year. As a result, disk latency has an increasingly negative impact on the overall performance of many computer applications. Two main techniques can be used to mitigate this problem, namely caching and prefetching. Caching keeps in memory the data that are the most likely to be used again while prefetching attempts to bring data in memory before they are needed. Both techniques are widely implemented at the data block level. More recent work has focused on caching and prefetching entire files. There are two ways to implement file prefetching. Predictive prefetching attempts to predict which files are likely to be accessed next in order to read them before they are needed. While being conceptually simple, the approach has two important shortcomings. First, the prefetching workload will get in the way of the regular disk workload. Second, it is difficult to predict file accesses sufficiently ahead of time to ensure that the predicted files can be brought into main memory before they are needed. A more promising alternative is to group together on the disk drive files that are often accessed at the same time. This technique is known as implicit prefetching and suffers none of the shortcomings of predictive prefetching because each cluster of files can now be brought into main memory in a single I/O operation. The sole drawback of this new approach is the need to identify stable file access patterns in order to build long-lived clusters of related files. We present here a new file predictor that identifies stable access patterns and can predict between 50 and 70 percent of next file accesses over a period of one year. Our First Stable Successor keeps track of the successor of each individual file. Once it has detected m successive accesses to file Y, each immediately following an access to file X, it predicts that file Y will always be the successor of file X and never alters this prediction. The remainder of this paper is organized as follows. Section 2 reviews previous work on file access prediction. Section 3 introduces our First Stable Successor predictor and Section 4 discusses its performance. Finally, Section 5 states our conclusions.

Author (revised)<br />

File Maintenance (Computers); Memory (Computers); Data Retrieval; Predictions<br />
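The First Stable Successor rule described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on one reading of the abstract: the counter for file X advances each time the same file Y immediately follows X, restarts when a different file follows X, and the prediction for X is frozen once the counter reaches m. The class name, method names, and the default m = 2 are hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, Optional, Tuple


class FirstStableSuccessor:
    """Illustrative sketch only; names and the default m are assumptions."""

    def __init__(self, m: int = 2) -> None:
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        # X -> Y, set once and never altered afterwards.
        self.stable: Dict[Hashable, Hashable] = {}
        # X -> (most recent successor of X, length of its current run).
        self.candidate: Dict[Hashable, Tuple[Hashable, int]] = {}
        self.previous: Optional[Hashable] = None

    def predict(self, file_id: Hashable) -> Optional[Hashable]:
        """Return the frozen successor of file_id, or None if none exists yet."""
        return self.stable.get(file_id)

    def access(self, file_id: Hashable) -> None:
        """Record one file access and update the successor statistics."""
        prev = self.previous
        self.previous = file_id
        # Nothing to learn on the first access, or once prev is frozen.
        if prev is None or prev in self.stable:
            return
        last, run = self.candidate.get(prev, (None, 0))
        # Extend the run if the same successor repeats, otherwise restart it.
        run = run + 1 if run and last == file_id else 1
        if run >= self.m:
            self.stable[prev] = file_id
            self.candidate.pop(prev, None)
        else:
            self.candidate[prev] = (file_id, run)
```

With m = 2, feeding the trace A, B, A, B, A, C, A, B through `access` freezes B as the successor of A after the second A-to-B transition. The later A-to-C transition does not change that prediction, which matches the "never alters this prediction" behavior stated in the abstract.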

20040121032 Toronto Univ., Ontario, Canada<br />

Clotho: Transparent Data Versioning at the Block I/O Level<br />

Flouris, Michail D.; Bilas, Angelos; NASA/IEEE MSST 2004 Twelfth NASA Goddard Conference on Mass Storage Systems and Technologies in cooperation with the Twenty-First IEEE Conference on Mass Storage Systems and Technologies; April 2004, pp. 315-328; In English; See also 20040121020; No Copyright; Avail: CASI; A03, Hardcopy

Recently storage management has emerged as one of the main problems in building cost effective storage infrastructures. One of the issues that contribute to management complexity of storage systems is maintaining previous versions of data. Up till now such functionality has been implemented by high-level applications or at the filesystem level. However, many modern systems aim at higher scalability and do not employ such management entities as filesystems. In this paper we propose pushing the versioning functionality closer to the disk by taking advantage of modern, block-level storage devices. We present Clotho, a storage block abstraction layer that allows transparent and automatic data versioning at the block level. Clotho provides a set of mechanisms that can be used to build flexible higher-level version management policies that range from keeping all data modifications to version capturing triggered by timers or other system events. Overall, we find that our approach is promising
