THE CHAU HIIX ARCHIVE:
PRINCIPLES, PROBLEMS, AND SOLUTIONS
Caroline Beebe, Ph.D.
Data Manager, Chau Hiix Project
(carolinebeebe@hotmail.com)
Begin by watching “Team Digital and the Deadly Cryptic Conundrum”
at http://www.youtube.com/user/wepreserve
The research design for excavations at Chau Hiix, Belize, is concerned with understanding the Maya
Collapse in terms of bureaucratic interference in the exploitation strategies of smallholders (PI is
Dr. Anne Pyburn, Indiana University). A regional comparison of data from other sites such as Altun
Ha and Lamanai, as well as chronological comparisons, would facilitate an understanding of the
similarities and differences among the various sites and the complexity of their development and eventual
demise. Digital technology provides an ideal tool to facilitate data comparisons through time and
space. Digital data collection, management, and archiving, however, are fraught with new
operational logistics, a lack of established methodologies, and ever-changing technology
developments that represent yet another area of expertise to be developed by a principal
investigator. Large institutions and businesses have well-established systems in place for all stages
of digital workflow, but individuals and small projects (sometimes referred to as “small science”)
must design these systems themselves without the aid of built-in supercomputer systems or large
number-crunching grants.
Digital archiving is the culminating step of the digital data workflow. Though there is a lot of
information being written about this topic, there are no specific examples. Available information
assumes the reader understands what format and syntax mean, how to build a database, how to
create controlled vocabularies, or how to name files (such as what constitutes a good filename
within the boundaries of type and number of characters). This document lays out the principles and
guidelines for digital archiving, as interpreted by the Chau Hiix project, and then describes the Chau
Hiix Data Archive as a specific instantiation. The methodology for this archive is the result of 15
years of trial and error in the identification of principles and rules, and constant modification of the
implementation of those rules. It is also the result of both conceptual and data input from students,
staff and professional personnel, without which the Chau Hiix Digital Archive would not exist.
The goal of the Chau Hiix Digital Archive is for the data to be share-able and available in the future.
The principles that make this possible are: Data are organized and consistent, and data are archived
with a stewardship plan. Controlled vocabularies and archival formats are both key to supporting
these principles. The archive process for digital data is simply stated: Save-as text, copy, print and
store. (Since image files cannot be stored as text, the only recourse is to save in an archival image
format.) Each of these steps is replete with problems and decisions, and the process still does not
guarantee the accessibility of the data in perpetuity but brings the data closer to that potential.
chau-hiix-digital-archive.doc 3/30/2010 3:17 PM
The material that follows is at times cryptic as it uses terminologies and concepts from both
archaeology and information technology. All information technologists have gaps in their digital
knowledge, so it is unreasonable to expect an archaeologist to be an expert in the digital arena in
addition to their own content area. On the other hand, information technologists supporting
archaeology activities are often too fascinated with the latest and greatest emerging technology.
Their technology recommendations often presume skill sets beyond the scope of the daily
experiences of humanities scientists and so are costly in time by requiring constant learning or
re-learning. Digital archiving is a complex and emerging activity and would be embraced by more
people, as well as other disciplines, if presented in some sort of Digital Archiving for Dummies
format.
Problem areas this archive still struggles with include: the naming of some types of image files
(how specific should the name be?); training surveyors in developing consistent point description
codes; establishing guidelines for the selection of data to archive (e.g., if processed data can be
recreated from the primary data, does it need to be included in the archive?); and stewardship
issues, such as long-term storage plans and version control of data corrections/analyses. The
checklist at the end of this document summarizes the specific tasks as instantiated at this time, and
the rest of this document explains those tasks, though it sometimes references the Lab Manual for
more detail (see the author for access to the Lab Manual).
Principles for the Chau Hiix Digital Archive
Preserve primary data
Chau Hiix values the preservation of primary data, with the understanding that the data discovery
process is filtered by the archaeologist. Assigning a new context and making a photograph of it is
interpreting the current stasis of an excavation as data of value. The data (artifacts, photographs,
written descriptions) of these interpretive decisions are the most primary data that can be collected
and preserved. The archaeologist then analyzes these primary data for descriptive types and
sequencing, and the analysis is summarized and discussed in a publication, which is considered
secondary data, and its story can be told many ways. If secondary data are the only documents
preserved, as in publication, then the opportunity is lost for multiple perspectives on the primary
data: for re-analysis based on new data, for new or comparative interpretations, or for the
application of emerging technologies.
Preserving the primary data involves more decision-making processes: What data are worthy of
preservation; what methods are available for preserving digital data; and how will access be
provided? Selecting data of value to preserve is for another discussion, but note that it becomes
confounded in the digital realm because of the ease of saving multiple versions. Chau Hiix would
like to provide the most likely scenario for the digital data to survive and remain accessible for the
next 100 years. Though not an unreasonable goal, it is challenged by the lack of standards for digital
archive methodologies and ever-changing technologies. Methods for preserving digital data are the
focus of this document, while provision for access in the future is still a problem without a
long-term solution.
Data should be shared
Chau Hiix supports the principle that primary data should be shared. This would allow for data reuse
(e.g., with new technologies), or large-scale data integration, such as meaningful analysis of
currently unlinked sets of data from different geographic or cultural areas. But as noted by
Silverman and Parezo, “How much of what had been in their [excavator's] notes will never have
seen print, and how much of what had been published will demand reexamination of the primary
records - if these are available?” (Silverman and Parezo 1995, p. 1). In order for data to be
share-able in the future it must be organized and consistent, and it must be archived with a
stewardship plan to be accessible.
Data are consistent and organized
“The key to successful digital archiving is thorough documentation of the data, how they were
collected, what standards were used to describe them and how they have been managed since
collection … all digital project archives must have three components: data, documentation,
and an index.” (AHDS 2000)
The Chau Hiix Archive considers this a principle of consistency and organization of data, and
implements it by maintaining the data in archival formats (TXT, TIF, JPG, PDF); documenting data
collection and detailed description in a Manual; and providing a simple file-folder structure as an
index.
Consistency has to do with both the consistent use of a finite set of data formats and consistency of
values that represent data. Without consistency it is not possible to lump data together or split data
into groups. Consistency of data values is exemplified in controlled vocabularies and coding
systems. If a group of artifacts is labeled variously as plaster, daub, ceramics, sherds, pots, it will
require a good bit of re-labeling to group like objects: when sorting alphabetically on Material
Type, the daub and plaster will not group together, nor will the ceramics, sherds, and pots.
Similarly, if there are no standard survey codes for indicating the bottom southwest (BSW) corner
of a building mound, then much time will be spent re-plotting the survey points to identify the
building configuration.
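The re-labeling problem described above can be sketched in a few lines of Python. This is only an illustration, not the project's own tool: the mapping table, the preferred terms "plaster" and "ceramic", and the function name normalize_material are assumptions made for the example.

```python
# Hypothetical controlled vocabulary: each free-text label maps to one Material Type.
# The groupings (plaster/daub, ceramics/sherds/pots) follow the example in the text;
# the preferred terms "plaster" and "ceramic" are assumptions for illustration.
CONTROLLED_VOCABULARY = {
    "plaster": "plaster",
    "daub": "plaster",
    "ceramic": "ceramic",
    "ceramics": "ceramic",
    "sherds": "ceramic",
    "pots": "ceramic",
}

def normalize_material(label):
    """Return the controlled term for a free-text label, or raise if it is unknown."""
    key = label.strip().lower()
    if key not in CONTROLLED_VOCABULARY:
        raise ValueError(f"'{label}' is not in the controlled vocabulary")
    return CONTROLLED_VOCABULARY[key]

labels = ["Sherds", "daub", "pots", "plaster", "ceramics"]
normalized = sorted(normalize_material(label) for label in labels)
print(normalized)
```

Once every label has been replaced by its controlled term, an alphabetical sort groups like objects together. Rejecting unknown labels, rather than passing them through, forces a decision about each new term at data-entry time.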
In years past each excavator collected data in his or her own way, leading to a myriad of
variations of software and descriptive terminology. Without consistency in data format, users of an
archive will need access to the software that generated the various format types. Some software
products create proprietary file formats requiring access to that specific software. Some survey and
GIS type software require hardware keys in order to open files. Databases are useful but their files
are often not interchangeable between products. Selecting archival formats that are hardware and
software independent facilitates access to data.
Archival formats. Text (.TXT) is considered the only archival format for alpha-numeric files. Text is
the only format providing software independence. Select either Text (ASCII) or Unicode (which
subsumes ASCII; do not use the ANSI text option as it is specific to Microsoft). Though TXT is the
baseline format, a version of the file as created in its software application should be archived along
with the TXT version. For example, the MS Excel file MyData.XLS is archived as well as the version
saved-as MyData.TXT (if you save-as in the .CSV format, open it in Notepad/text-editor to save it in
the .TXT format).
Documentation of the TXT file is essential. Ensure that there is a header in each TXT file that
describes what the data in the file represent. This may require more than just a delineation of the
column headings; each heading may need explication.
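As a rough sketch of what a self-documenting TXT file might contain, the Python function below assembles a title line, a syntax line, one explanatory line per column, and then the comma-delimited records. The function name build_archival_text, the layout of the explanatory lines, and the sample title are assumptions for illustration; the use of x for missing information follows the project's convention.

```python
def build_archival_text(title, columns, rows, missing="x"):
    """Assemble the text of a self-documenting, comma-delimited TXT file.

    columns: list of (name, explanation) pairs; rows: list of value lists.
    """
    names = [name for name, _ in columns]
    lines = [title, "Data syntax= " + ",".join(names)]
    # One explanatory line per column, since headings alone may not be enough.
    lines += [f"{name}: {explanation}" for name, explanation in columns]
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}: {row}")
        values = [missing if value is None else str(value) for value in row]
        if any("," in value for value in values):
            raise ValueError(f"comma inside a value would break the delimiter: {row}")
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"

text = build_archival_text(
    "INVENTORY EXAMPLE (hypothetical)",
    [("provenience", "provenience number of the context"),
     ("material", "material type from the controlled vocabulary"),
     ("count", "number of pieces in the bag")],
    [["che-48-1-2-96", "lithic", 8], ["che-48-1-20-96", "ceramic", None]],
)
print(text)
```

The resulting string can be written to disk with Python's built-in open(), for example with encoding="utf-8", to produce the archival copy.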
Images should be saved in TIF format. The JPG format is acceptable for images when resolution is of
less importance or there is no other option due to available technology (digital cameras should be
set at their highest resolution). The PDF format is acceptable, but not optimal, for scanned images of
notebook pages and other handwritten documents, but care must be taken to ensure the pages are
readable. TIF is the optimal image format due to non-compressed high resolution.
The goal of the archival format for image data is to save in the format with the greatest chance of
industry format migration, and in the highest resolution possible (without compression)
appropriate to the value of the image. For example, the photograph of a burial in situ is more
valuable than an image scan of a field notebook.
Organization of the archive can be accomplished with a simple folder-file structure and does not
require a full index of all documents or a relational database structure that has to be updated and
migrated. Create a simple Read-me.txt file that documents how the folder-file organization is laid
out, serving as a type of site-map.
Files are Archived with a Stewardship Plan
In order for data to be accessible in the future it must be archived with a stewardship plan.
Short-term planning, for the next five years or until the PI retires, may be the only strategy at this time.
Long-term planning is dependent on the perceived value of the archive as a digital asset and the
availability of institutional, organizational, or public Data Repositories. Even the best-laid plans may
change over time. Consider the plight of Nicholson Baker who “pleads the case for saving our
recorded heritage in its original form while telling the story of how and why our greatest research
libraries betrayed the public trust by auctioning off or pulping irreplaceable [newspaper]
collections” (Baker 2001). The libraries had microfilmed all the newspapers, a process that not only
lost all the color print but also rendered a good number of pages unreadable due to bleed-through and
poor imaging. Diligence in stewardship does not guarantee that data will be accessible in the future.
Identify the data steward.
The Chau Hiix Archive is still in its formative years and so the PI (Dr. Pyburn) and the Data Manager
maintain the archive. This is a fine plan as long as they are alive and well and still in the workforce.
But what happens when they are gone? Where does the data go? Because they are members of a large
university community, the data archive is backed up on a well-maintained remote system, but how long will
the archive remain on the system and who will have access once the faculty researcher retires?
Universities have failed to step up to the need for creating Research Data Archives. While the
university Library may be the steward of scholarly discourse (in the form of print or electronic
publication), it has yet to step up or be mandated to steward digital data. Scholars need to be
advocating for the creation of research data repositories whether they be institutional or
organizational, public or privately funded. For a repository example, see AHDS (the Arts and
Humanities Data Service); for the latest discussion on the future of archived data, see the Blue
Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA 2010).
Stewardship tasks
While analog documents age in a slow and foreseeable way, aging occurs rapidly in the digital
realm. Digital stewardship involves three areas of yearly monitoring and activity: format migration
and media transfer, data verification, and accessibility.
Format migration and media transfer. The data steward must monitor the need for data
migration from one format to the next and for transfer from one medium to the next generation.
Archival formats evolve and may require data to be migrated from an old standard to a new
standard. Alphanumeric data in the TXT format (ASCII-based) is safe because its next generation is
Unicode, which is backward compatible with ASCII and expands the character set to include
requirements from other languages. Image formats, however, are in flux and need to be closely
monitored. For example, the PDF format is proprietary to Adobe. Should the company make a bad
business decision and close down, or be acquired, there is no guarantee that the format will remain
viable in its current form. A migration path may only be available for a short period of time, after
which it may take considerable expense to locate and execute the migration.
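The backward compatibility of Unicode with ASCII noted above can be checked directly. The short Python sketch below assumes UTF-8 as the Unicode encoding (the text does not name one); the accented word is a hypothetical value chosen only to show a character that ASCII cannot hold.

```python
# A record from a 1996 inventory file, containing only ASCII characters.
record = "che-48-1-2-96,lithic,6-Mar-96,8,x,rs/cw,1,x"

ascii_bytes = record.encode("ascii")
# For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
same_bytes = ascii_bytes == record.encode("utf-8")
# An old ASCII file can therefore be read as UTF-8 without any conversion.
round_trip = ascii_bytes.decode("utf-8") == record

# Hypothetical value with an accented letter, which ASCII cannot represent.
accented = "cer\u00e1mica"
try:
    accented.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(same_bytes, round_trip, fits_in_ascii)
```

In other words, an ASCII-based TXT file needs no migration step to be treated as UTF-8 text, while new records that use other character sets require the Unicode option when saving.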
Similarly, the media on which data are stored need to be monitored for viability. Floppy drives and
diskettes have become obsolete and it is costly to find services that will transfer data on older
media to new media. In addition, media can simply fail mechanically, thus the need for backup
copies on various media types.
Data verification. Transfer of data to a new medium can fail, partially fail, or cause data
corruption, and so data verification is essential. Migration of data to a new format can cause
unanticipated changes in the data, so the data need to be carefully verified during a migration
process. Even if the data have not been migrated or transferred to new media, they should be
verified on a yearly basis as this will uncover the failure of a particular medium, storage device, or
location. Randomly opening several documents in different formats would be sufficient for
verification.
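A complementary check, which is not part of the Chau Hiix procedure as described here, is to record a checksum for every file when a copy of the archive is made and to recompute the checksums at each yearly review. The Python sketch below uses only the standard library; the function names and the throwaway demonstration folder are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def checksum(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {str(p.relative_to(root)): checksum(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def changed_files(root, manifest):
    """List files that are missing or whose contents no longer match the manifest."""
    root = Path(root)
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or checksum(path) != expected:
            problems.append(name)
    return problems

# Demonstration on a throwaway folder standing in for one copy of the archive.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "Inventory-Artifacts-96.txt"
    sample.write_text("che-48-1-2-96,lithic,6-Mar-96,8,x,rs/cw,1,x\n", encoding="utf-8")
    manifest = build_manifest(folder)
    intact = changed_files(folder, manifest)    # nothing has changed yet
    sample.write_text("corrupted\n", encoding="utf-8")
    damaged = changed_files(folder, manifest)   # the altered file is reported

print(intact, damaged)
```

A checksum only shows that the bytes are unchanged; it does not show that the format can still be opened, so opening a sample of documents remains worthwhile.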
Over time multiple professionals may analyze portions of the data set and identify errors. Some
process must be put in place to verify data errors and incorporate corrections into the archive.
Accessibility. Digital data has to physically reside somewhere, on some server, on some storage
device. Repositories can be private, institutional/industry, or national/international. Undoubtedly
all researchers have private copies of all their data and can provide access, that is, if it is organized
and regularly verified. In the context of this document, the university, as an institution, has
provided a remote backup and storage system. But access is still only available through private
contact with the researcher. Other institutions (libraries, societies, and organizations) have failed as
yet to manage such data in a way that a wider audience could have access.
Industrial repositories include electronic publishers, web services (similar to FLICKR and
Facebook), or what are called “cloud” type services that offer remote data backup and archiving.
But at issue with industries is cost as well as their viability as a company. The private sector has no
mandate to continue to provide service in perpetuity.
Lots of Copies Keep Stuff Safe (LOCKSS) is both a strategy and a program (Stanford University
2008). While the Stanford program may be beyond the scope of “small science” projects, the
strategy is useful if well managed. The data steward needs to track the locations of the various
copies to maintain version control and to ensure data migration and media transfer. Chau Hiix
maintains stewardship over five copies of the archive, six if you include the print version.
Other national or international repositories have yet to be established for “small science” projects
such as Chau Hiix but the Arts and Humanities Data Service (AHDS 2000) provides a good model.
Simple Not Easy: The digital process
History of Chau Hiix Digital Data
Prior to 1996 all Chau Hiix recordkeeping was done with paper and pencil. Digital data collection
began with replacing pencil and paper with what was effectively a typewritten record, thus
exchanging problems of penmanship readability for typing errors. The portable computers were physically
large, heavy, dependent on 3.5-inch diskettes for data storage, and expensive for the small budget of a
university-based project. In addition, operating systems and software were not as user friendly as
today, and both students and staff were inexperienced in their use. There was no money to buy
spreadsheet software or hire a database programmer, and there was no time to teach staff and
students to use them reliably. These first Chau Hiix digital data records were created in a Text
Editor (creating TXT files). Every data record was therefore entered as a single line in a text file
with a consistent and documented syntax listed at the top of the file.
A sample of a 1996 digital data file follows. This file exemplifies the Save-as Text archival version of
all digital data.
INVENTORY LITHIC/CERAMIC CHAU HIIX EXCAVATION 48 in 1996
Data syntax= provenience,material,date,count,weight-in-ounces,excavator-initials,#-of-bags,notes
(use x comma for missing information)
che-48-1-2-96,lithic,6-Mar-96,8,x,rs/cw,1,x,
che-48-1-20-96,ceramic,7-Mar-96,108,40,cw/rs,1,treefall cache
che-48-1-21-96,ceramic,7-Mar-96,103,44,x,1,treefall cache
che-48-1-23-96,ceramic,7-Mar-96,68,32,rs/gw,3of3,x,
che-48-1-23-96,ceramic,7-Mar-96,26,32,rs/gw,2of3,x,
che-48-1-23-96,ceramic,7-Mar-96,68,38,rs/gw,1of3,x,
che-48-1-29-96,lithic,18-Mar-96,10,3,x,1,backdirt
Note: Below is an explanation of provenience number construction.
cht-70-2-1-99
| | |  | | |
a b c  d e f
a. Location: CH= Chau Hiix; XT= Xtabai; NR= NaRob
b. Type of provenience: e=excavation, t=testpit, p= posthole, s=surface collection.
c. Provenience number for reference and sync with survey points.
d. Operation number for multiple excavations/years at the same survey point.
e. Context numbers are assigned by the excavator.
f. Year code (YY) is redundant but useful for workflow and analysis.
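The construction rules above lend themselves to a small parser. The Python sketch below is an illustration under stated assumptions: the function name parse_provenience is hypothetical, codes are matched in lowercase as they appear in the sample records, and parts c through e are assumed to be whole numbers.

```python
import re

# Codes taken from the explanation above; lowercase matching is an assumption.
LOCATIONS = {"ch": "Chau Hiix", "xt": "Xtabai", "nr": "NaRob"}
PROVENIENCE_TYPES = {"e": "excavation", "t": "testpit", "p": "posthole",
                     "s": "surface collection"}

# Location (2 letters) + type (1 letter), then four hyphen-separated numbers.
PROVENIENCE_PATTERN = re.compile(r"^([a-z]{2})([a-z])-(\d+)-(\d+)-(\d+)-(\d{2})$")

def parse_provenience(text):
    """Split a provenience number into its six documented parts."""
    match = PROVENIENCE_PATTERN.match(text.strip().lower())
    if match is None:
        raise ValueError(f"not a provenience number: {text!r}")
    location, kind, number, operation, context, year = match.groups()
    if location not in LOCATIONS or kind not in PROVENIENCE_TYPES:
        raise ValueError(f"unknown location or type code in {text!r}")
    return {
        "location": LOCATIONS[location],
        "type": PROVENIENCE_TYPES[kind],
        "provenience": int(number),
        "operation": int(operation),
        "context": int(context),
        "year": year,  # kept as the two-digit code (YY)
    }

parsed = parse_provenience("cht-70-2-1-99")
print(parsed)
```

A parser of this kind could be run over the provenience column of each seasonal file to flag malformed numbers before the files are merged.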
During the field season these text lines would be visually scanned by the Lab Manager for data
input errors. Since the comma is used as the data delimiter (i.e., the separator used to indicate a new
unit of data), misplacement of the comma is the source of most errors, such as a comma separating
multiple Excavator-initials (use “cjb/glp” instead of “cjb, glp”) or appearing within a Note (use
“backdirt/burial45” instead of “backdirt, burial45”).
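The visual scan for misplaced commas can be supplemented with a simple field count. The Python sketch below is a minimal illustration, not the Lab Manager's actual procedure: the function name check_line is hypothetical, a single trailing comma is assumed to be tolerated because some 1996 sample records end with one, and the third record is an invented example of the “cjb, glp” mistake.

```python
# Field names follow the documented syntax line of the 1996 inventory file.
FIELDS = ["provenience", "material", "date", "count", "weight-in-ounces",
          "excavator-initials", "#-of-bags", "notes"]

def check_line(line):
    """Return None if the line has the expected number of fields, else a message."""
    body = line.rstrip("\n")
    # Assumption: one trailing comma, as seen in some sample records, is tolerated.
    if body.endswith(","):
        body = body[:-1]
    values = body.split(",")
    if len(values) != len(FIELDS):
        return f"expected {len(FIELDS)} fields, found {len(values)}: {line!r}"
    if any(value.strip() == "" for value in values):
        return f"empty field (use x for missing information): {line!r}"
    return None

records = [
    "che-48-1-20-96,ceramic,7-Mar-96,108,40,cw/rs,1,treefall cache",
    "che-48-1-2-96,lithic,6-Mar-96,8,x,rs/cw,1,x,",
    # Hypothetical faulty record: a comma between two sets of initials.
    "che-48-1-29-96,lithic,18-Mar-96,10,3,cjb, glp,1,backdirt",
]
problems = [message for message in map(check_line, records) if message]
print(problems)
```

A count check catches only commas that change the number of fields; typing and consistency errors still need the sorting checks done in the spreadsheet.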
Post-season, the Data Manager imports the data files into a spreadsheet for further error checking.
Typographic errors are constantly a concern and the spreadsheet sorting function provides a good
tool for checking the consistency of data entry. Data files are then merged with seasonal files from
all field seasons. Using the spreadsheet’s Save-as function, a new TXT file is created as the archival
copy of this total catalog of data. Spreadsheets are then made available for analysis.
By 2003 students and staff had enough computer experience for data entry to be done directly into
a spreadsheet. The comma is still used as the delimiter for the archival TXT copy of the file, so data
entry errors are still related to use of the comma within a cell, as well as typing and consistency
errors. The Data Manager continues to follow the same post-season procedure: check for errors,
merge all seasonal files with previous seasons’ files, make the spreadsheets available for analysis,
and Save-as a TXT file to add to the archive.
As students, staff and professionals who work at Chau Hiix become more digitally sophisticated, so
do the hardware and software they employ, as well as the number of files they create. The basic
data archive plan, however, remains the same. The archived TXT files of 2009 look exactly like the
archived files of 1996.
Simple Archive: Save-as Text, Print, Backup
The archive plan for Chau Hiix digital data is both conceptually and deceptively simple: Save-as text,
print, and make three digital backup copies to store in separate locations.
Not Easy in Details
Implementing this simple plan is not easy; the devil is in the details. Each step is filled with
problems and decisions, and the archival processes still do not guarantee the accessibility
of the data in perpetuity.
There are two sets of tasks in the digital data process: the Lab Manager tasks for collecting the
data, and the Data Manager tasks for archiving the data.
Lab Manager Tasks for Collecting Digital Data. The Lab Manager is responsible for standardized
file formats and names, consistency of data entry and ongoing backups during the data collection
process. Use the following rules:
File Formats<br />
Format refers to the extension (typically three letters) that follows the period in a file name. Every<br />
software application uses an extension for its data files, and most applications offer a choice of<br />
formats under the Save-as option. Chau Hiix uses the following software: Microsoft Excel<br />
(format is .XLS), Microsoft Word (format is .DOC or .DOCX), <strong>and</strong> TDS Survey Pro<br />
(proprietary formats .AR5, .CR5, .RW5, but with a Save-as .TXT option). However, data may<br />
be collected in any software as long as <strong>the</strong>re is an option to Save-as or Export <strong>the</strong> data to an<br />
archival format. Archival formats are limited to TXT or Unicode, TIF, JPG, PDF.<br />
File Names<br />
There are generally two types of files: Data <strong>and</strong> image.<br />
Data files that are required are spreadsheets established <strong>and</strong> explained in <strong>the</strong> Lab Manual.<br />
Names of required spreadsheet data files for each Chau Hiix excavation season are:<br />
Personnel-Notebooks-YY.xls<br />
Provenience-Masterlist-YY.xls<br />
Inventory-Artifacts-YY.xls<br />
Inventory-Burial-YY.xls<br />
Survey-Gridpoints-YY.xls<br />
Survey-TotalPoints-YY.xls<br />
O<strong>the</strong>r data files may be generated as needed, such as Survey point collection files:<br />
TDS-mainplatform-07.cr5<br />
TDS-lagoon-07.cr5<br />
Data files are saved in both Data-entry format (of the specific software being used, e.g. Excel<br />
.XLS) and Backup format (an archival format such as TXT or Unicode). Both the data-entry and<br />
backup versions of a file have exactly the same filename and are differentiated by the three-letter<br />
format extension. For example:<br />
Personnel-Notebooks-07.xls<br />
Personnel-Notebooks-07.txt<br />
TDS-mainplatform-07.cr5<br />
TDS-mainplatform-07.txt<br />
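The pairing rule above (every data-entry file has a same-named archival copy) can be checked mechanically. The following is a minimal Python sketch, not part of the Chau Hiix procedure; the function name `missing_archival_copies` and the extension set are assumptions made for illustration.

```python
from pathlib import PurePath

# Archival extensions named in the text. Encoding them as this set is an
# assumption made for the sketch.
ARCHIVAL_EXTENSIONS = {".txt", ".tif", ".jpg", ".pdf"}


def missing_archival_copies(filenames):
    """Return data-entry files that have no same-named .txt backup."""
    names = [PurePath(name) for name in filenames]
    # Stems (names without extension) that already exist as .txt files.
    txt_stems = {p.stem.lower() for p in names if p.suffix.lower() == ".txt"}
    missing = []
    for p in names:
        if p.suffix.lower() in ARCHIVAL_EXTENSIONS:
            continue  # already stored in an archival format
        if p.stem.lower() not in txt_stems:
            missing.append(p.name)
    return sorted(missing)


season_files = [
    "Personnel-Notebooks-07.xls",
    "Personnel-Notebooks-07.txt",
    "TDS-mainplatform-07.cr5",
    "TDS-mainplatform-07.txt",
    "TDS-lagoon-07.cr5",
]
print(missing_archival_copies(season_files))  # ['TDS-lagoon-07.cr5']
```

Run against a folder listing, a check of this kind would flag any survey or spreadsheet file whose TXT copy has not yet been made.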
Image files are either digital photographs or document scans (notebook pages, drawings,<br />
forms, cards, etc.). Digital photos are collected in a Photograph folder and placed in subfolders<br />
according to context type: Artifacts, Drawings and Maps, Proveniences, Camp-Scenes, etc.<br />
Files are named according to their content, beginning with the Provenience number,<br />
or are collected in a folder for a specific Provenience.<br />
str150-plan.tif (plan view of structure 150)<br />
str7-032905.jpg (structure 7 on Mar 29, 2005)<br />
che-71-1-15-98-lithics.tif (debitage from this context)<br />
cht-141-1-1-98-rim.tif (ceramic rim sherd)<br />
Or folders named:<br />
che-18-1-8-03-ceramic (large cache of sherds)<br />
che-16-x-x-03-lithic (contains multiple operations <strong>and</strong> contexts)<br />
che-22-excavations (general views of <strong>the</strong> excavation in progress)<br />
Document scans follow a similar filename-folder scheme except for Notebook page scans.<br />
Notebooks designated for scanning by the PI have filenames that begin with a two-digit<br />
year YY and then an arbitrary two-digit personnel number assigned at the beginning of the<br />
season to each field member. This is followed by a three-digit number for the first notebook<br />
page in the file. Notebook scans are limited to 8 pages per file to keep the file size<br />
manageable. For example, in 2007 Dr. Pyburn was assigned personnel number 01, so the<br />
scanned files of her field notebook were named:<br />
0701001.tif (note this file contains pages 1 through 8)<br />
0701009.tif (note this file contains pages 9 through 16)<br />
0701017.tif (note this file contains pages 17 through 24)<br />
0701025.tif etc.<br />
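The notebook scan filename packs three fields into seven digits. As an illustration only, the sketch below splits such a name back into its parts and builds one from its parts; the function names and the regular expression are assumptions based on the description above.

```python
import re

# Pattern assumed from the text: two-digit year, two-digit personnel number,
# three-digit first page, then the .tif extension.
NOTEBOOK_SCAN = re.compile(r"^(\d{2})(\d{2})(\d{3})\.tif$", re.IGNORECASE)


def parse_notebook_scan(filename):
    """Split a notebook scan filename into year, personnel number, first page."""
    match = NOTEBOOK_SCAN.match(filename)
    if match is None:
        raise ValueError(f"not a notebook scan filename: {filename!r}")
    year, personnel, first_page = match.groups()
    return {"year": year, "personnel": personnel, "first_page": int(first_page)}


def notebook_scan_name(year, personnel, first_page):
    """Build the filename for a scan file whose first page is first_page."""
    return f"{year:02d}{personnel:02d}{first_page:03d}.tif"


print(parse_notebook_scan("0701009.tif"))
# {'year': '07', 'personnel': '01', 'first_page': 9}
print(notebook_scan_name(7, 1, 1))
# 0701001.tif
```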
All o<strong>the</strong>r files that are created should be named following <strong>the</strong> recommendations adapted<br />
from <strong>the</strong> Media Vault. The same guidelines apply to <strong>the</strong> names of Folders.<br />
File/folder names should not contain blank spaces. File/folder names should only<br />
include <strong>the</strong> letters A-Z <strong>and</strong> a-z, <strong>the</strong> numbers 0-9, plus underscores <strong>and</strong> hyphens.<br />
Underscores, hyphens, <strong>and</strong> zeros cannot occur as <strong>the</strong> first character of <strong>the</strong> name of a<br />
file/folder. File/folder names are considered case sensitive (because <strong>the</strong>y will be in<br />
some environments), but best practice is to not use case variation in considering<br />
uniqueness of naming. File/folder names should not exceed 64 characters, including<br />
file extension. Shorter is better. (MediaVault 2009)<br />
(Note that filenames that begin with zero are a problem when using the sort function in MS Excel,<br />
so do not name files with leading zeros.)<br />
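The naming recommendations quoted above lend themselves to an automated check. The sketch below is one possible reading of those rules in Python; the function names are hypothetical, and allowing a single period before the extension is an assumption, since the quoted rules do not mention the period.

```python
import re

MAX_LENGTH = 64  # the limit includes the file extension

# Letters, digits, underscores and hyphens, with one optional extension.
# Permitting exactly one period is an assumption of this sketch.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$")


def name_problems(name):
    """Return a list of rule violations for one file or folder name."""
    problems = []
    if " " in name:
        problems.append("contains a blank space")
    elif not ALLOWED.match(name):
        problems.append("contains a character outside A-Z, a-z, 0-9, _ and -")
    if name[:1] in ("_", "-", "0"):
        problems.append("starts with an underscore, hyphen, or zero")
    if len(name) > MAX_LENGTH:
        problems.append("longer than 64 characters")
    return problems


def case_collisions(names):
    """Return groups of names that differ only by letter case."""
    seen = {}
    for name in names:
        seen.setdefault(name.lower(), []).append(name)
    return [group for group in seen.values() if len(group) > 1]


print(name_problems("str150-plan.tif"))            # []
print(name_problems("Survey-Gridpoints -07.xls"))  # ['contains a blank space']
print(case_collisions(["Plan.tif", "plan.tif", "rim.tif"]))
# [['Plan.tif', 'plan.tif']]
```

The second function reflects the advice not to rely on case variation for uniqueness: two names that collide when lower-cased are reported together.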
Data Entry<br />
Accuracy and consistency of data entry require constant attention. Most fields in the<br />
required spreadsheets use controlled vocabularies for data entry options which are listed<br />
<strong>and</strong> explained in <strong>the</strong> Lab Manual. (Note that it is possible to create drop down menus for<br />
populating spreadsheet cells, but <strong>the</strong> Lab Manager must underst<strong>and</strong> <strong>the</strong> implementation of<br />
this function in case changes need to be managed in <strong>the</strong> field.) The Lab Manager is wise to<br />
check for data errors on a daily basis, making <strong>the</strong>m easier to identify <strong>and</strong> correct. Data<br />
errors generally fall into three categories: typographical, auto-fill related, <strong>and</strong> explicative.<br />
Typographical errors are most often found by doing a sorting process on each column <strong>and</strong><br />
checking for inconsistencies. Typographical errors also occur when capturing survey data in<br />
<strong>the</strong> Total Station. The surveyor should be asked to edit <strong>the</strong>se errors, or “little finger<br />
mistakes” as one surveyor put it, as <strong>the</strong> Shot Description field can be quite cryptic.<br />
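Sorting a column works because misspellings end up next to, or far from, the values they should match. A rough programmatic equivalent, offered here only as a sketch, is to count how often each distinct value occurs and review the rare ones. The function name `rare_values` and the sample column are made up for illustration.

```python
from collections import Counter


def rare_values(column, max_count=1):
    """Return values that occur at most max_count times, sorted alphabetically."""
    counts = Counter(value.strip() for value in column)
    return sorted(value for value, n in counts.items() if n <= max_count)


# Hypothetical Material sub-type column with one misspelling and one
# inconsistent capitalization.
material_column = ["chert", "chert", "obsidian", "chert", "obsidain", "obsidian", "Chert"]
print(rare_values(material_column))  # ['Chert', 'obsidain']
```

A value that appears only once is not necessarily wrong, so the output is a list of candidates for the Lab Manager to review, not a list of errors.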
Auto-fill errors arise when the spreadsheet software tries to be “helpful” by auto-filling cells<br />
based on previous data entry. (Note that it is possible to turn OFF auto-fill but again, the Lab<br />
Manager must understand the management of this function.) These errors, such as the<br />
date of collection becoming the date of data entry, are much harder to uncover. Those doing data<br />
entry are asked to step away from the task every twenty minutes to regain focus and help<br />
minimize this “helpful” problem. Daily error checking by the Lab Manager is the best<br />
deterrent.<br />
Explicative data errors arise from the excitement of data discovery and from attempts during the<br />
data entry process to provide as much information as possible or to begin <strong>the</strong> analysis<br />
process. For example, those doing data entry might see a particularly interesting lithic <strong>and</strong><br />
try to describe it in <strong>the</strong> Material column as “stemmed macro blade retouched”, or a ceramic<br />
as a “diagnostic rim sherd”. The information in <strong>the</strong>se types of entries belongs in <strong>the</strong> Notes<br />
field where it will not interfere with <strong>the</strong> st<strong>and</strong>ard vocabulary for Material type <strong>and</strong> thus<br />
cause sorting <strong>problems</strong>. Consistency of terminology (<strong>and</strong> spelling) is essential to being able<br />
to sort <strong>the</strong> data. This holds true for survey description also. Survey descriptive codes must<br />
be consistent <strong>and</strong> documented. New descriptors, codes, or vocabulary can be added as<br />
necessary but with discussion <strong>and</strong> agreement by all professionals.<br />
Backups<br />
Data files that have been modified need to be backed up each day, or more often if <strong>the</strong>re is<br />
extensive data entry or many changes in <strong>the</strong> personnel doing data entry. In order to control<br />
<strong>the</strong> file versions, file naming <strong>and</strong> a backup schedule should be established. Use a file name<br />
beginning with DB for daily backups (or DB1, DB2, DB3, etc. for more frequent backups)<br />
followed by a filename designator <strong>and</strong> <strong>the</strong>n a date in <strong>the</strong> form of DDMMYY. For example, <strong>the</strong><br />
Inventory-Artifacts-07.xls file would have daily backup files named:<br />
DB-IA-120307.xls<br />
DB-IA-130307.xls<br />
DB1-IA-140307.xls<br />
DB2-IA-140307.xls<br />
DB-IA-150307.xls<br />
These daily backups could be deleted once a weekly backup file is created <strong>and</strong> checked<br />
against <strong>the</strong> latest version of <strong>the</strong> daily backup. Weekly backups should be retained as part of<br />
<strong>the</strong> seasonal field record to provide for data entry error analysis by <strong>the</strong> Data Manager at <strong>the</strong><br />
end of <strong>the</strong> season. The weekly backup (WB) would use a similar filename for version<br />
control:<br />
WB-IA-160307.xls<br />
For version control, designate <strong>the</strong> last backup at <strong>the</strong> end of <strong>the</strong> field season (EOS) as:<br />
EOS-IA-220407.xls<br />
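The backup naming scheme (prefix, file designator, DDMMYY date) can be generated instead of typed, which removes one source of typing errors. This is a minimal sketch under the conventions described above; the function name `backup_name` and its parameters are illustrative assumptions.

```python
from datetime import date


def backup_name(prefix, designator, day, extension="xls", sequence=None):
    """Build a backup filename such as DB-IA-120307.xls.

    prefix     -- "DB" (daily), "WB" (weekly) or "EOS" (end of season)
    designator -- short code for the master file, e.g. "IA"
    day        -- the backup date, written into the name as DDMMYY
    sequence   -- optional number when several daily backups share a date
    """
    if prefix not in ("DB", "WB", "EOS"):
        raise ValueError(f"unknown backup prefix: {prefix!r}")
    if sequence is not None and prefix != "DB":
        raise ValueError("only daily backups are numbered")
    tag = prefix if sequence is None else f"{prefix}{sequence}"
    return f"{tag}-{designator}-{day.strftime('%d%m%y')}.{extension}"


print(backup_name("DB", "IA", date(2007, 3, 12)))              # DB-IA-120307.xls
print(backup_name("DB", "IA", date(2007, 3, 14), sequence=2))  # DB2-IA-140307.xls
print(backup_name("EOS", "IA", date(2007, 4, 22)))             # EOS-IA-220407.xls
```

Because DDMMYY names do not sort chronologically across months, any script that needs the latest backup should compare the parsed dates and not the filenames.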
Each day <strong>the</strong> backup files should be copied to a portable storage device that is kept with <strong>the</strong><br />
Lab Manager’s passport. In <strong>the</strong> event of a disaster (fire, flood, hurricane, etc.) <strong>the</strong> Lab<br />
Manager is most likely to remember to grab <strong>the</strong> passport <strong>and</strong> thus <strong>the</strong> data also.<br />
Data Manager Process for Archiving Digital Data. The Data Manager is responsible for creating<br />
<strong>and</strong> maintaining <strong>the</strong> data <strong>archive</strong>. The maintenance tasks are best accomplished as soon as possible<br />
after <strong>the</strong> end of <strong>the</strong> field season; <strong>the</strong> more time that passes <strong>the</strong> less likely it is that personnel will<br />
remember details if <strong>the</strong>re are questions or anomalies.<br />
Organize <strong>the</strong> Archive<br />
Controlled vocabularies are <strong>the</strong> foundation of <strong>the</strong> <strong>archive</strong> as well as a major source of data<br />
entry errors. Vocabulary (or values allowed in each cell) must be consistent to provide for<br />
accurate data sorting when using the spreadsheets for analysis. Consistency in data entry is<br />
the most difficult principle to convey to field personnel who are not experienced in sorting<br />
data. This includes a surveyor who may not be experienced in the use of GIS-type<br />
software and fails to consistently label each point with standardized description codes.<br />
Building <strong>and</strong> maintaining <strong>the</strong> controlled vocabularies is a primary task for <strong>the</strong> Data<br />
Manager. Before each field season <strong>the</strong> Data Manager updates <strong>and</strong> explicates <strong>the</strong> controlled<br />
vocabulary lists in <strong>the</strong> Lab Manual.<br />
Material Type is an example of a controlled vocabulary: Name <strong>the</strong>m pots, sherds or<br />
ceramics, but pick one term to use consistently for each concept. The Material Type (<strong>and</strong><br />
sub-types) terms in <strong>the</strong> controlled vocabulary used by Chau Hiix are:<br />
Historic (ceramic, glass, metal)<br />
Ceramic (polychrome, mendhole, netweight, partial-vessel)<br />
Shell (marine, freshwater)<br />
Bone (human, faunal)<br />
Lithic (chert, groundstone, jade, obsidian, hematite)<br />
Plaster (painted)<br />
Carbon<br />
Soil (chemical, flotation, waterscreen)<br />
Misc (wood, stone, coral)<br />
All controlled vocabularies are constantly in review for additions or clarifications. Sub-types<br />
of a vocabulary term are not required initially but may be added or separated into more<br />
detail as <strong>the</strong> analysis phase evolves.<br />
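A controlled vocabulary of this kind maps naturally onto a lookup table, and entries can then be checked against it. The sketch below encodes the Material Type list above in Python; the function name, the lower-case storage, and the case-insensitive comparison are all assumptions of the sketch, not documented project practice.

```python
# Material types and sub-types as listed in the text. Sub-types are optional,
# following the text; storing everything in lower case is an assumption.
MATERIAL_TYPES = {
    "historic": {"ceramic", "glass", "metal"},
    "ceramic": {"polychrome", "mendhole", "netweight", "partial-vessel"},
    "shell": {"marine", "freshwater"},
    "bone": {"human", "faunal"},
    "lithic": {"chert", "groundstone", "jade", "obsidian", "hematite"},
    "plaster": {"painted"},
    "carbon": set(),
    "soil": {"chemical", "flotation", "waterscreen"},
    "misc": {"wood", "stone", "coral"},
}


def check_material(material, subtype=""):
    """Return a description of the problem, or None if the entry is valid."""
    key = material.strip().lower()
    if key not in MATERIAL_TYPES:
        return f"unknown Material type: {material!r}"
    sub = subtype.strip().lower()
    if sub and sub not in MATERIAL_TYPES[key]:
        return f"unknown sub-type {subtype!r} for {material!r}"
    return None


# Hypothetical spreadsheet rows: (Material, sub-type).
rows = [
    ("Lithic", "chert"),
    ("Ceramic", ""),
    ("stemmed macro blade retouched", ""),
    ("Shell", "jade"),
]
for row_number, (material, subtype) in enumerate(rows, start=1):
    problem = check_material(material, subtype)
    if problem:
        print(f"row {row_number}: {problem}")
```

The third sample row is the kind of explicative entry that belongs in the Notes field; a check like this would surface it during the daily review.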
The folder structure of <strong>the</strong> Chau Hiix Archive is simple. The Data Manager establishes <strong>the</strong><br />
folder structure as well as all <strong>the</strong> rules <strong>and</strong> tasks for <strong>the</strong> files it contains: File formats; file<br />
naming schemes; processes for data entry; <strong>and</strong> backups. In addition, <strong>the</strong> Data Manager, in<br />
concert with <strong>the</strong> PI, maintains controlled vocabularies that are used to fill in each cell in <strong>the</strong><br />
spreadsheets. Rules, tasks <strong>and</strong> vocabularies are all reviewed <strong>and</strong> updated on a yearly basis<br />
as influenced by <strong>the</strong> Lab Manager’s experience <strong>and</strong> as dem<strong>and</strong>ed by changes in both data<br />
<strong>and</strong> technology.<br />
The Chau Hiix Archive consists of <strong>the</strong> following Folders:<br />
0-LogArchiveChanges<br />
1-Manuals<br />
2-Personnel<br />
3-Notebooks<br />
4-Proveniences<br />
5-Artifacts<br />
6-Survey<br />
7-Visuals<br />
8-Analyses<br />
9-Documents<br />
10-SeasonalFiles<br />
FORMS<br />
PROBLEMS<br />
SOFTWARE<br />
Populating <strong>the</strong> Archive<br />
The Data Manager collects all <strong>the</strong> digital data at <strong>the</strong> end of <strong>the</strong> field season from <strong>the</strong> Lab<br />
Manager <strong>and</strong> begins <strong>the</strong> process of data validation, error-checking, documentation, <strong>and</strong><br />
storing archival versions. See <strong>the</strong> checklist that follows for <strong>the</strong> details of this process.<br />
Locations for the storage of the Chau Hiix Digital Archive include: onsite at Chau Hiix, in<br />
country with the department of archaeology, in the PI's office, in the Data Manager's office, and<br />
on a university backup system. The PRINT version of the TXT files is stored in a fireproof<br />
cabinet in <strong>the</strong> office of <strong>the</strong> PI.<br />
Maintain <strong>the</strong> Archive<br />
The five Archive locations are maintained by <strong>the</strong> Data Manager. This entails replacing <strong>the</strong><br />
whole folder system with a new version at least after each field season <strong>and</strong> more often in<br />
<strong>the</strong> case of more frequent inter-season changes or additions, typically analysis data. At least<br />
three full versions of the whole archive are maintained on the university backup system,<br />
identified by folder name and version date (YYMMDD; this date syntax is documented<br />
in the LogArchiveChanges files). For example, archive folders are named:<br />
ChauHiixArchive-070621<br />
ChauHiixArchive-091020<br />
ChauHiixArchive-100112<br />
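Because the version date is written as YYMMDD, the folder names sort chronologically within a century, and the most recent versions are easy to select. The following sketch shows one way to build and rank such names; the helper names and the `keep=3` default (taken from the "at least three full versions" rule) are assumptions for illustration.

```python
from datetime import date, datetime

PREFIX = "ChauHiixArchive-"


def archive_folder_name(day):
    """Name an archive version folder with a YYMMDD suffix."""
    return PREFIX + day.strftime("%y%m%d")


def version_date(folder_name):
    """Parse the YYMMDD suffix of an archive folder name into a date.

    Python's %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    """
    if not folder_name.startswith(PREFIX):
        raise ValueError(f"not an archive folder name: {folder_name!r}")
    return datetime.strptime(folder_name[len(PREFIX):], "%y%m%d").date()


def latest_versions(folder_names, keep=3):
    """Return the `keep` most recent archive folders, newest first."""
    return sorted(folder_names, key=version_date, reverse=True)[:keep]


print(archive_folder_name(date(2010, 1, 12)))  # ChauHiixArchive-100112
```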
When a new version of the archive is created, random files need to be opened to ensure the<br />
copy process has completed successfully. If a new archive folder is not created within a<br />
two-year period, the Data Manager should open the latest version and randomly check files<br />
for data and media viability.<br />
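The random check described above can be supplemented by comparing checksums of a random sample of files in the original and in the copy. This is a sketch of that supplementary technique, not the documented Chau Hiix procedure; the function names and the choice of SHA-256 are assumptions. A matching checksum shows the copy is byte-identical, but it does not show that the format is still readable, so opening files remains necessary.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(original_root, copy_root, sample_size=10, seed=None):
    """Compare a random sample of files between an archive and its copy.

    Returns the relative paths that are missing from the copy or differ.
    """
    original_root, copy_root = Path(original_root), Path(copy_root)
    files = sorted(
        p.relative_to(original_root)
        for p in original_root.rglob("*")
        if p.is_file()
    )
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    failures = []
    for relative in sample:
        copied = copy_root / relative
        if not copied.is_file():
            failures.append(relative.as_posix())
        elif sha256_of(original_root / relative) != sha256_of(copied):
            failures.append(relative.as_posix())
    return sorted(failures)
```

A call such as `spot_check("ChauHiixArchive-100112", "/mnt/backup/ChauHiixArchive-100112")` (both paths hypothetical) returns an empty list when every sampled file matches.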
The Data Manager is responsible for monitoring <strong>the</strong> viability of both archival formats <strong>and</strong><br />
media integrity <strong>and</strong> identifying <strong>the</strong> need for migration of data format, storage media, or<br />
storage location. The need for migration could be triggered by technology obsolescence, by<br />
upgrades or retirement of formats, software, or operating systems, or by changes in location accessibility.<br />
Digital Archive Checklist of Tasks<br />
Pre-season:<br />
1. Lab Manual is reviewed <strong>and</strong> updated as necessary <strong>and</strong> copies of all required Master files<br />
are made available for reference in <strong>the</strong> new field season.<br />
2. Prepare blank copies of required Master files for data entry in current season.<br />
3. Assign an arbitrary two-digit number to each field season participant <strong>and</strong> record it in<br />
<strong>the</strong> new, blank Personnel-notebook spreadsheet.<br />
Daily during <strong>the</strong> field season:<br />
1. Make daily backup files (DB) of each file that has been modified.<br />
2. Sort by columns each required spreadsheet to identify <strong>and</strong> correct errors.<br />
3. Copy daily backup files to an external storage device <strong>and</strong> store with your passport.<br />
Weekly during <strong>the</strong> field season:<br />
1. Make weekly backup files (WB) of each file.<br />
2. Sort by columns each required spreadsheet to identify <strong>and</strong> correct errors.<br />
3. Copy weekly backup files to an external storage device <strong>and</strong> store with your passport.<br />
End of field season:<br />
1. Copy all WB files to TWO different external storage devices, leaving master copies on<br />
<strong>the</strong> Lab computer.<br />
2. Create a final EOS file of each spreadsheet representing the total data entry for the<br />
season. Make two copies of the EOS files and store with the WB files on external devices.<br />
3. Write a short report documenting problems, anomalies or suggestions regarding these<br />
files.<br />
Post-season:<br />
1. Copy each EOS file and rename it according to its Master file name but append YY for<br />
year and the letter F for Field. For example: EOS-IA-220407.xls is copied to<br />
Inventory-Artifacts-07F.xls. This file becomes part of the permanent Archive in folder<br />
10-SeasonalFiles. (See list of Archive folders above.)<br />
2. Make a second copy of each EOS file and rename it according to its Master file name but<br />
append YY for year and the letter P for Postseason. This is the Postseason working file<br />
where changes or edits are made. For example: EOS-IA-220407.xls is copied to<br />
Inventory-Artifacts-07P.xls.<br />
3. Sort by columns the spreadsheets (filename-YYP.xls) to identify and correct errors.<br />
4. When confident of data validity, combine each of these files (filename-YYP) with its<br />
previous season’s Master file to become the next/future season’s Master file. For<br />
example: Inventory-Artifacts-07P.xls is combined with Inventory-Artifacts-05P.xls and the<br />
new Inventory-Artifacts-07P.xls becomes part of the permanent Archive in folder<br />
5-Artifacts.<br />
5. The Master file from the previous 2005 season, Inventory-Artifacts-05P.xls, is then moved<br />
to the 2005 field season subfolder of folder 10-SeasonalFiles. That subfolder<br />
now contains two files: Inventory-Artifacts-05F.xls and<br />
Inventory-Artifacts-05P.xls.<br />
6. Make a TXT file for every Master file, adding a data syntax header to each one, <strong>the</strong>n<br />
PRINT each TXT file.<br />
7. In the current year’s subfolder of folder 10-SeasonalFiles, make a TXT copy of<br />
every file.<br />
8. Delete all DB and WB files as they are now represented by the file with the -YYF<br />
designation. Retain the EOS files as insurance against post-season data editing errors.<br />
9. File all other seasonal files in the appropriate Archive folders. For example, the season<br />
notebook scans are put in folder 3-Notebooks, and photographs are put in folder<br />
7-Visuals.<br />
10. Delete any extraneous files.<br />
11. Write a brief document (.TXT) of problems, anomalies or suggestions from the<br />
postseason archive process. File this document in folder 0-LogArchiveChanges.<br />
12. Copy the whole Archive to THREE external devices or locations. Randomly open files on<br />
the external devices to ensure data viability. Deposit the print versions of the TXT files<br />
in a designated fireproof location.<br />
Yearly:<br />
1. Verify <strong>the</strong> functioning of <strong>the</strong> storage media <strong>and</strong>/or location by opening files.<br />
2. Monitor <strong>the</strong> need to transfer media or migrate data.<br />
3. Monitor changes in accessibility or access rights.<br />
4. Provide for <strong>the</strong> addition of new analyses or data error correction.<br />
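The post-season renaming in steps 1 and 2 of the checklist (one EOS file becomes a Field copy and a Postseason working copy) follows a fixed pattern. The sketch below derives both names from an EOS filename. Only the "IA" designator appears in the text, so the `MASTER_NAMES` mapping is a hypothetical structure that would need an entry for each required spreadsheet.

```python
# Designator-to-master-name mapping. Only "IA" is given in the text; the
# mapping itself is a hypothetical structure for illustration.
MASTER_NAMES = {"IA": "Inventory-Artifacts"}


def seasonal_copies(eos_filename):
    """Return the Field (F) and Postseason (P) names for an EOS backup file."""
    stem, extension = eos_filename.rsplit(".", 1)
    prefix, designator, ddmmyy = stem.split("-")
    if prefix != "EOS":
        raise ValueError(f"not an end-of-season file: {eos_filename!r}")
    year = ddmmyy[-2:]  # the date is DDMMYY, so the year is the last two digits
    master = MASTER_NAMES[designator]
    return (f"{master}-{year}F.{extension}", f"{master}-{year}P.{extension}")


print(seasonal_copies("EOS-IA-220407.xls"))
# ('Inventory-Artifacts-07F.xls', 'Inventory-Artifacts-07P.xls')
```

The function only computes names. Copying the files themselves is left out so that nothing in the archive is touched until the Data Manager has reviewed the result.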
References cited:<br />
AHDS, Arts <strong>and</strong> Humanities Data Service. 2000. Digital Archives from Excavation <strong>and</strong> Fieldwork<br />
Guide to Good Practice. http://ads.ahds.ac.uk/project/goodguides/excavation/sect82.html.<br />
Baker, Nicholson. 2001. Double Fold: Libraries <strong>and</strong> <strong>the</strong> Assault on Paper. New York: Vintage Books.<br />
BRTF-SDPA, Blue Ribbon Task Force on Sustainable Digital Preservation <strong>and</strong> Access. 2010.<br />
Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital<br />
Information. http://brtf.sdsc.edu/index.html.<br />
MediaVault. 2009. Acceptable Characters in File Names. Naming Your Files,<br />
http://mediavault.wordpress.com/documentation/.<br />
Silverman, Sydel, <strong>and</strong> Nancy J. Parezo. 1995. Preserving <strong>the</strong> Anthropological Record. New York:<br />
Wenner-Gren Foundation for Anthropological Research.<br />
Stanford University. 2008. LOCKSS, Lots of Copies Keeps Stuff Safe.<br />
http://lockss.stanford.edu/lockss/Home.<br />