NASA Scientific and Technical Aerospace Reports
holdings. A study released by the School of Information Management and Systems at the University of California, Berkeley, estimates that over 5 exabytes of data were created in 2002. Almost 99 percent of this information originally appeared on magnetic media. The theme for MSST2004 is therefore both timely and appropriate. There have been many discussions about rapid technological obsolescence, incompatible formats, and inadequate attention to the permanent preservation of knowledge committed to digital storage. Tutorial sessions at MSST2004 detail some of these concerns and the steps being taken to alleviate them. Over 30 papers deal with topics as diverse as performance, file systems, and stewardship and preservation. A number of short papers, extemporaneous presentations, and works in progress will detail current and relevant research on the MSST2004 theme.
Author<br />
Information Management; Computer Networks; Metadata; Computer Storage Devices; Data Storage<br />
20040121031 Houston Univ., TX, USA<br />
Identifying Stable File Access Patterns<br />
Shah, Purvi; Paris, Jehan-Francois; Amer, Ahmed; Long, Darrell D. E.; NASA/IEEE MSST 2004 Twelfth NASA Goddard Conference on Mass Storage Systems and Technologies in cooperation with the Twenty-First IEEE Conference on Mass Storage Systems and Technologies; April 2004, pp. 159-163; In English; See also 20040121020
Contract(s)/Grant(s): NSF CCR-99-88390; NSF ANI-03-25353; NSF CCR-02-04358; No Copyright; Avail: CASI; A01, Hardcopy
Disk access times have not kept pace with the evolution of disk capacities, CPU speeds, and main memory sizes. They have improved by only a factor of 3 to 4 over the last 25 years, whereas other system components have almost doubled their performance every other year. As a result, disk latency has an increasingly negative impact on the overall performance of many computer applications. Two main techniques can be used to mitigate this problem: caching and prefetching. Caching keeps in memory the data that are most likely to be used again, while prefetching attempts to bring data into memory before they are needed. Both techniques are widely implemented at the data block level; more recent work has focused on caching and prefetching entire files. There are two ways to implement file prefetching. Predictive prefetching attempts to predict which files are likely to be accessed next so that they can be read before they are needed. Although conceptually simple, this approach has two important shortcomings. First, the prefetching workload interferes with the regular disk workload. Second, it is difficult to predict file accesses far enough ahead of time to ensure that the predicted files can be brought into main memory before they are needed. A more promising alternative is to group together on the disk drive files that are often accessed at the same time. This technique, known as implicit prefetching, suffers none of the shortcomings of predictive prefetching because each cluster of files can be brought into main memory in a single I/O operation. The sole drawback of this approach is the need to identify stable file access patterns in order to build long-lived clusters of related files. We present here a new file predictor that identifies stable access patterns and can predict between 50 and 70 percent of next file accesses over a period of one year. Our First Stable Successor predictor keeps track of the successor of each individual file. Once it has detected m successive accesses to file Y, each immediately following an access to file X, it predicts that file Y will always be the successor of file X and never alters this prediction. The remainder of this paper is organized as follows. Section 2 reviews previous work on file access prediction. Section 3 introduces our First Stable Successor predictor, and Section 4 discusses its performance. Finally, Section 5 states our conclusions.
Author (revised)<br />
File Maintenance (Computers); Memory (Computers); Data Retrieval; Predictions<br />
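The First Stable Successor rule summarized in the abstract above can be illustrated with a short replay over an access trace. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the names `FirstStableSuccessor` and `evaluate` are hypothetical, files are identified by arbitrary hashable keys, "m successive accesses" is read as m consecutive observed successors of X all being Y, and the predictor is assumed to make no prediction for X until such a stable successor exists (the abstract does not say what happens before then).

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple


class FirstStableSuccessor:
    """Hypothetical sketch of a First Stable Successor file predictor.

    For each file X, track the most recent successor Y and how many times in a
    row Y has immediately followed X. When that run reaches m, Y is frozen as
    X's permanent prediction and is never changed afterwards.
    """

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        self._candidate: Dict[Hashable, Tuple[Hashable, int]] = {}  # X -> (Y, run length)
        self._stable: Dict[Hashable, Hashable] = {}  # X -> frozen successor
        self._previous: Optional[Hashable] = None  # last file accessed

    def predict(self, current: Hashable) -> Optional[Hashable]:
        # Assumption: return None until `current` has a stable successor.
        return self._stable.get(current)

    def observe(self, accessed: Hashable) -> None:
        prev = self._previous
        # Stable pairs are never revisited, so only files without one are updated.
        if prev is not None and prev not in self._stable:
            cand = self._candidate.get(prev)
            # Extend the run if the same successor follows X again; otherwise restart it.
            run = cand[1] + 1 if cand is not None and cand[0] == accessed else 1
            if run >= self.m:
                self._stable[prev] = accessed  # freeze X -> Y permanently
                self._candidate.pop(prev, None)
            else:
                self._candidate[prev] = (accessed, run)
        self._previous = accessed


def evaluate(trace: Sequence[Hashable], m: int) -> Tuple[int, int, int]:
    """Replay a trace online; return (correct predictions, predictions made, transitions)."""
    fss = FirstStableSuccessor(m)
    correct = made = 0
    for i, accessed in enumerate(trace):
        if i > 0:
            # Predict the next file from the previous one before learning from it.
            guess = fss.predict(trace[i - 1])
            if guess is not None:
                made += 1
                if guess == accessed:
                    correct += 1
        fss.observe(accessed)
    return correct, made, max(len(trace) - 1, 0)


if __name__ == "__main__":
    trace = ["a", "b", "a", "b", "a", "b", "a", "c"]
    print(evaluate(trace, m=2))  # (2, 3, 7)
```

With m = 2, the pair a→b becomes stable at the fourth access and b→a at the fifth; the final access to c after a counts as a miss, yet the prediction for a stays b because stable pairs are never revised. Stable pairs found this way could be chained into candidate file clusters for implicit prefetching, but the abstract does not describe how the authors form clusters, so that step is not sketched here.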
20040121032 Toronto Univ., Ontario, Canada<br />
Clotho: Transparent Data Versioning at the Block I/O Level<br />
Flouris, Michail D.; Bilas, Angelos; NASA/IEEE MSST 2004 Twelfth NASA Goddard Conference on Mass Storage Systems and Technologies in cooperation with the Twenty-First IEEE Conference on Mass Storage Systems and Technologies; April 2004, pp. 315-328; In English; See also 20040121020; No Copyright; Avail: CASI; A03, Hardcopy
Recently, storage management has emerged as one of the main problems in building cost-effective storage infrastructures. One of the issues that contributes to the management complexity of storage systems is maintaining previous versions of data. Until now, such functionality has been implemented by high-level applications or at the filesystem level. However, many modern systems aim at higher scalability and do not employ such management entities as filesystems. In this paper, we propose pushing the versioning functionality closer to the disk by taking advantage of modern block-level storage devices. We present Clotho, a storage block abstraction layer that allows transparent and automatic data versioning at the block level. Clotho provides a set of mechanisms that can be used to build flexible higher-level version management policies, ranging from keeping all data modifications to capturing versions triggered by timers or other system events. Overall, we find that our approach is promising