
THE CHAU HIIX ARCHIVE:
PRINCIPLES, PROBLEMS, AND SOLUTIONS

Caroline Beebe, Ph.D.
Data Manager, Chau Hiix Project
(carolinebeebe@hotmail.com)

Begin by watching “Team Digital and the Deadly Cryptic Conundrum” at http://www.youtube.com/user/wepreserve

The research design for excavations at Chau Hiix, Belize, is concerned with understanding the Maya Collapse in terms of bureaucratic interference in the exploitation strategies of smallholders (PI is Dr. Anne Pyburn, Indiana University). A regional comparison of data from other sites such as Altun Ha and Lamanai, as well as chronological comparisons, would facilitate an understanding of the similarities and differences among sites and the complexity of their development and eventual demise. Digital technology provides an ideal tool to facilitate data comparisons through time and space. Digital data collection, management, and archiving, however, are fraught with new operational logistics, a lack of established methodologies, and ever-changing technology developments that represent yet another area of expertise to be developed by a principal investigator. Large institutions and businesses have well-established systems in place for all stages of digital workflow, but individuals and small projects (sometimes referred to as “small science”) must design these systems themselves without the aid of built-in supercomputer systems or large number-crunching grants.

Digital archiving is the culminating step of the digital data workflow. Though a lot of information is being written about this topic, there are no specific examples. Available information assumes the reader understands what format and syntax mean, how to build a database, how to create controlled vocabularies, or how to name files (such as what constitutes a good filename within the limits on type and number of characters). This document lays out the principles and guidelines for digital archiving, as interpreted by the Chau Hiix project, and then describes the Chau Hiix Data Archive as a specific instantiation. The methodology for this archive is the result of 15 years of trial and error in the identification of principles and rules, and constant modification of the implementation of those rules. It is also the result of both conceptual and data input from students, staff and professional personnel, without which the Chau Hiix Digital Archive would not exist.

The goal of the Chau Hiix Digital Archive is for the data to be share-able and available in the future. The principles that make this possible are: Data are organized and consistent, and data are archived with a stewardship plan. Controlled vocabularies and archival formats are both key to supporting these principles. The archive process for digital data is simply stated: Save-as text, copy, print and store. (Since image files cannot be stored as text, the only recourse is to save in an archival image format.) Each of these steps is replete with problems and decisions, and the process still does not guarantee the accessibility of the data in perpetuity but brings the data closer to that potential.

chau-hiix-digital-archive.doc 1 3/30/2010 3:17 PM


The material that follows is at times cryptic, as it uses terminology and concepts from both archaeology and information technology. All information technologists have gaps in their digital knowledge, so it is unreasonable to expect an archaeologist to be an expert in the digital arena in addition to their own content area. On the other hand, information technologists supporting archaeology activities are often too fascinated with the latest and greatest emerging technology. Their technology recommendations often presume skill sets beyond the scope of the daily experiences of humanities scientists and so are costly in time, requiring constant learning or re-learning. Digital archiving is a complex and emerging activity and would be embraced by more people, as well as other disciplines, if presented in some sort of Digital Archiving for Dummies format.

Problem areas this archive still struggles with include: the naming of some types of image files (how specific should the name be?); training surveyors in developing consistent point description codes; establishing guidelines for the selection of data to archive (e.g., if processed data can be recreated from the primary data, does it need to be included in the archive?); and stewardship issues, such as long-term storage plans and version control of data corrections/analyses. The checklist at the end of this document summarizes the specific tasks as instantiated at this time, and the rest of this document explains those tasks, though it sometimes refers to the Lab Manual for more detail (see the author for access to the Lab Manual).

Principles for the Chau Hiix Digital Archive

Preserve primary data

Chau Hiix values the preservation of primary data, with the understanding that the data discovery process is filtered by the archaeologist. Assigning a new context and making a photograph of it is interpreting the current stasis of an excavation as data of value. The data (artifacts, photographs, written descriptions) of these interpretive decisions are the most primary data that can be collected and preserved. The archaeologist then analyzes these primary data for descriptive types and sequencing, and the analysis is summarized and discussed in a publication, which is considered secondary data and whose story can be told many ways. If secondary data are the only documents preserved, as in publication, then the opportunity is lost for multiple perspectives on the primary data: for re-analysis based on new data, for new or comparative interpretations, or for the application of emerging technologies.

Preserving the primary data involves more decision-making processes: What data are worthy of preservation? What methods are available for preserving digital data? How will access be provided? Selecting data of value to preserve is for another discussion, but note that it becomes confounded in the digital realm because of the ease of saving multiple versions. Chau Hiix would like to provide the most likely scenario for the digital data to survive and remain accessible for the next 100 years. Though not an unreasonable goal, it is challenged by the lack of standards for digital archive methodologies and by ever-changing technologies. Methods for preserving digital data are the focus of this document, while provision for access in the future is still a problem without a long-term solution.



Data should be shared

Chau Hiix supports the principle that primary data should be shared. This would allow for data reuse (e.g., with new technologies), or large-scale data integration, such as meaningful analysis of currently unlinked sets of data from different geographic or cultural areas. But as noted by Silverman and Parezo, “How much of what had been in their [excavator's] notes will never have seen print, and how much of what had been published will demand reexamination of the primary records - if these are available?” (Silverman and Parezo 1995, p. 1). In order for data to be share-able in the future they must be organized and consistent, and they must be archived with a stewardship plan to be accessible.

Data are consistent and organized

“The key to successful digital archiving is thorough documentation of the data, how they were collected, what standards were used to describe them and how they have been managed since collection … all digital project archives must have three components: data, documentation, and an index.” (AHDS 2000)

The Chau Hiix Archive considers this a principle of consistency and organization of data, and implements it by maintaining the data in archival formats (TXT, TIF, JPG, PDF); documenting data collection and detailed description in a Manual; and providing a simple file-folder structure as an index.

Consistency has to do with both the consistent use of a finite set of data formats and consistency of the values that represent data. Without consistency it is not possible to lump data together or split data into groups. Consistency of data values is exemplified in controlled vocabularies and coding systems. If a group of artifacts is labeled variously as plaster, daub, ceramics, sherds, pots, it will require a good bit of re-labeling to group like objects: When sorting alphabetically on Material Type, the daub and plaster will not group together, and neither will the ceramics, sherds, and pots. Similarly, if there are no standard survey codes for indicating the bottom southwest (BSW) corner of a building mound, then much time will be spent re-plotting the survey points to identify the building configuration.
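The effect of a controlled vocabulary on grouping can be sketched in a few lines of Python. The mapping below is hypothetical (the project's actual vocabulary is kept in the Lab Manual and is not reproduced here); it only illustrates how free-text labels might be folded into a fixed set of Material Type terms so that like objects sort together.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: free-text label -> approved term.
# These pairings are illustrative assumptions, not the Chau Hiix list.
CONTROLLED_TERMS = {
    "plaster": "plaster/daub",
    "daub": "plaster/daub",
    "ceramics": "ceramic",
    "sherds": "ceramic",
    "pots": "ceramic",
}


def normalize_material(label):
    """Return the approved term for a label, or raise if the label is unknown."""
    key = label.strip().lower()
    if key not in CONTROLLED_TERMS:
        raise ValueError(f"'{label}' is not in the controlled vocabulary")
    return CONTROLLED_TERMS[key]


def group_by_material(labels):
    """Group the raw labels under their approved term, keeping input order."""
    groups = defaultdict(list)
    for label in labels:
        groups[normalize_material(label)].append(label)
    return dict(groups)
```

Rejecting unknown labels, rather than passing them through, is a design choice made for this sketch: it surfaces a new term at data entry time, when it is cheapest to decide whether the vocabulary should grow.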

In years past each excavator collected data in his or her own way, leading to a myriad of variations in software and descriptive terminology. Without consistency in data format, users of an archive will need access to the software that generated the various format types. Some software products create proprietary file formats requiring access to that specific software. Some survey and GIS software requires hardware keys in order to open files. Databases are useful, but their files are often not interchangeable between products. Selecting archival formats that are hardware and software independent facilitates access to data.

Archival formats. Text (.TXT) is considered the only archival format for alpha-numeric files. Text is the only format providing software independence. Select either Text (ASCII) or Unicode (which subsumes ASCII; do not use the ANSI text option as it is specific to Microsoft). Though TXT is the baseline format, a version of the file as created in its software application should be archived along with the TXT version. For example, the MS Excel file MyData.XLS is archived as well as the version saved-as MyData.TXT (if you save-as in the .CSV format, open it in Notepad or another text editor to save it in the .TXT format).

Documentation of the TXT file is essential. Ensure that there is a header in each TXT file that describes what the data in the file represent. This may require more than just a listing of the column headings; each heading may need explication.
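As a rough illustration of the header requirement, the Python sketch below assembles a comma-delimited text file whose first lines describe its contents. The function names `build_archival_text` and `save_as_txt`, and the one-line-per-column header layout, are assumptions made for this example; the guideline itself only asks that a descriptive header be present.

```python
def build_archival_text(title, columns, rows):
    """Return the full text of an archival TXT file.

    columns is a list of (name, explanation) pairs, so every heading
    carries its own explication in the header.
    """
    names = [name for name, _ in columns]
    lines = [title, "Data syntax= " + ",".join(names)]
    lines += [f"{name}: {explanation}" for name, explanation in columns]
    for row in rows:
        values = [str(value) for value in row]
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(values)}")
        if any("," in value for value in values):
            raise ValueError("values must not contain the comma delimiter")
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


def save_as_txt(path, text):
    """Write the text as UTF-8, which is byte-identical to ASCII for ASCII-only data."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)
```

Keeping the text assembly separate from the file write makes it easy to print the same text for the paper copy of the archive.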

Images should be saved in TIF format. The JPG format is acceptable for images when resolution is of less importance or there is no other option due to available technology (digital cameras should be set at their highest resolution). The PDF format is acceptable, but not optimal, for scanned images of notebook pages and other handwritten documents, but care must be taken to ensure the pages are readable. TIF is the optimal image format because it stores high-resolution images without compression.

The goal of the archival format for image data is to save in the format with the greatest chance of industry format migration, and in the highest resolution possible (without compression) appropriate to the value of the image. For example, the photograph of a burial in situ is more valuable than an image scan of a field notebook.

Organization of the archive can be accomplished with a simple folder-file structure and does not require a full index of all documents or a relational database structure that has to be updated and migrated. Create a simple Read-me.txt file that documents how the folder-file organization is laid out, serving as a type of site-map.
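A starting point for such a Read-me.txt could be generated from the folders themselves. The Python sketch below is one possible approach, not the project's procedure: `list_relative_paths` and `site_map_lines` are hypothetical helper names, and the generated outline would still need hand-written notes explaining what each folder holds.

```python
import os


def list_relative_paths(root):
    """Collect every file under root as a path relative to root, using '/' separators."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for filename in files:
            full = os.path.join(folder, filename)
            found.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(found)


def site_map_lines(relative_paths):
    """Turn 'folder/sub/file' paths into an indented outline, one entry per line."""
    lines = []
    seen_folders = set()
    for path in sorted(relative_paths):
        parts = path.split("/")
        # Emit each enclosing folder once, the first time it is encountered.
        for depth in range(len(parts) - 1):
            folder = "/".join(parts[: depth + 1])
            if folder not in seen_folders:
                seen_folders.add(folder)
                lines.append("  " * depth + parts[depth] + "/")
        lines.append("  " * (len(parts) - 1) + parts[-1])
    return lines
```

Calling `site_map_lines(list_relative_paths(archive_folder))` and joining the result with newlines gives text that can be pasted into Read-me.txt, where `archive_folder` stands for wherever the archive copy lives.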

Files are Archived with a Stewardship Plan

In order for data to be accessible in the future they must be archived with a stewardship plan. Short-term planning, for the next five years or until the PI retires, may be the only strategy at this time. Long-term planning is dependent on the perceived value of the archive as a digital asset and the availability of institutional, organizational, or public Data Repositories. Even the best-laid plans may change over time. Consider the plight of Nicholson Baker, who “pleads the case for saving our recorded heritage in its original form while telling the story of how and why our greatest research libraries betrayed the public trust by auctioning off or pulping irreplaceable [newspaper] collections” (Baker 2001). The libraries had microfilmed all the newspapers, which not only lost all the color print but also rendered a good number of pages unreadable due to bleed-through and poor imaging. Diligence in stewardship does not guarantee that data will be accessible in the future.

Identify the data steward.

The Chau Hiix Archive is still in its formative years, and so the PI (Dr. Pyburn) and the Data Manager maintain the archive. This is a fine plan as long as they are alive and well and still in the workforce. But what happens when they are gone? Where do the data go? Because they are members of a large university community, the data archive is backed up on a well-maintained remote system, but how long will the archive remain on the system, and who will have access once the faculty researcher retires?

Universities have failed to step up to the need for creating Research Data Archives. While the university library may be the steward of scholarly discourse (in the form of print or electronic publication), it has yet to step up or be mandated to steward digital data. Scholars need to be advocating for the creation of research data repositories, whether they be institutional or organizational, publicly or privately funded. For a repository example, see AHDS (the Arts and Humanities Data Service); for the latest discussion on the future of archived data, see the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA 2010).

Stewardship tasks

While analog documents age in a slow and foreseeable way, aging occurs rapidly in the digital realm. Digital stewardship involves three areas of yearly monitoring and activity: format migration and media transfer, data verification, and accessibility.

Format migration and media transfer. The data steward must monitor the need for data migration from one format to the next and for transfer from one medium to the next generation of media. Archival formats evolve and may require data to be migrated from an old standard to a new standard. Alphanumeric data in the TXT format (ASCII-based) are safe because the next generation is Unicode, which is backward compatible with ASCII and expands the character set to meet the requirements of other languages. Image formats, however, are in flux and need to be closely monitored. For example, the PDF format is proprietary to Adobe. Should the company make a bad business decision and close down, or be acquired, there is no guarantee that the format will remain viable in its current form. A migration path may only be available for a short period of time, after which it may take considerable expense to locate and execute the migration.

Similarly, the media on which data are stored need to be monitored for viability. Floppy drives and diskettes have become obsolete, and it is costly to find services that will transfer data on older media to new media. In addition, media can simply fail mechanically, thus the need for backup copies on various media types.

Data verification. Transfer of data to a new medium can fail, partially fail, or cause data corruption, and so data verification is essential. Migration of data to a new format can cause unanticipated changes in the data, so the data need to be carefully verified during a migration process. Even if the data have not been migrated or transferred to new media, they should be verified on a yearly basis, as this will uncover the failure of a particular medium, storage device, or location. Randomly opening several documents in different formats would be sufficient for verification.
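The yearly spot check can be made a little more systematic by letting a script choose which files to open. The Python sketch below is only a helper for selecting the sample, under the assumption that a few files per format is enough; `pick_spot_check` is a hypothetical name, and a person still has to open each chosen file and confirm it is readable.

```python
import os
import random
from collections import defaultdict


def pick_spot_check(paths, per_format=2, seed=None):
    """Pick up to per_format files of each extension for a manual open-and-read check."""
    rng = random.Random(seed)
    by_format = defaultdict(list)
    for path in paths:
        # Group case-insensitively so MyData.TXT and notes.txt count as one format.
        extension = os.path.splitext(path)[1].lower() or "(none)"
        by_format[extension].append(path)
    sample = {}
    for extension in sorted(by_format):
        candidates = sorted(by_format[extension])
        count = min(per_format, len(candidates))
        sample[extension] = rng.sample(candidates, count)
    return sample
```

Passing a fixed `seed` reproduces the same selection, which may be useful if the list of checked files is recorded in a stewardship log; leaving it as `None` gives a fresh sample each year.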

Over time multiple professionals may analyze portions of the data set and identify errors. Some process must be put in place to verify data errors and incorporate corrections into the archive.

Accessibility. Digital data have to physically reside somewhere, on some server, on some storage device. Repositories can be private, institutional/industry, or national/international. Undoubtedly all researchers have private copies of all their data and can provide access, provided the data are organized and regularly verified. In the context of this document, the university, as an institution, has provided a remote backup and storage system. But access is still only available through private contact with the researcher. Other institutions (libraries, societies, and organizations) have so far failed to manage such data in a way that would give a wider audience access.



Industrial repositories include electronic publishers, web services (similar to FLICKR and Facebook), or what are called “cloud” type services that offer remote data backup and archiving. But at issue with industry repositories is cost, as well as the viability of the company. The private sector has no mandate to continue to provide service in perpetuity.

Lots of Copies Keeps Stuff Safe (LOCKSS) is both a strategy and a program (Stanford University 2008). While the Stanford program may be beyond the scope of “small science” projects, the strategy is useful if well managed. The data steward needs to track the locations of the various copies to maintain version control and to ensure data migration and media transfer. Chau Hiix maintains stewardship over five copies of the archive, six if you include the print version.

Other national or international repositories have yet to be established for “small science” projects such as Chau Hiix, but the Arts and Humanities Data Service (AHDS 2000) provides a good model.

History of Chau Hiix Digital Data

Simple Not Easy: The digital process

Prior to 1996 all Chau Hiix recordkeeping was done with paper and pencil. Digital data collection began by replacing pencil and paper with what was effectively a typewritten record, thus exchanging penmanship problems for typing errors. The portable computers were physically large, heavy, dependent on 3.5-inch diskettes for data storage, and expensive for the small budget of a university-based project. In addition, operating systems and software were not as user friendly as today, and both students and staff were inexperienced in their use. There was no money to buy spreadsheet software or hire a database programmer, and there was no time to teach staff and students to use them reliably. These first Chau Hiix digital data records were created in a text editor (creating TXT files). Every data record was therefore entered as a single line in a text file, with a consistent and documented syntax listed at the top of the file.

A sample of a 1996 digital data file follows. This file exemplifies the Save-as Text archival version of all digital data.

INVENTORY LITHIC/CERAMIC CHAU HIIX EXCAVATION 48 in 1996

Data syntax= provenience,material,date,count,weight-in-ounces,excavator-initials,#-of-bags,notes

(use x comma for missing information)

che-48-1-2-96,lithic,6-Mar-96,8,x,rs/cw,1,x,<br />

che-48-1-20-96,ceramic,7-Mar-96,108,40,cw/rs,1,treefall cache<br />

che-48-1-21-96,ceramic,7-Mar-96,103,44,x,1,treefall cache<br />

che-48-1-23-96,ceramic,7-Mar-96,68,32,rs/gw,3of3,x,<br />

che-48-1-23-96,ceramic,7-Mar-96,26,32,rs/gw,2of3,x,<br />

che-48-1-23-96,ceramic,7-Mar-96,68,38,rs/gw,1of3,x,<br />

che-48-1-29-96,lithic,18-Mar-96,10,3,x,1,backdirt<br />

Note: Below is an explanation of provenience number construction.

cht-70-2-1-99
| | |  | | |
a b c  d e f

a. Location: CH= Chau Hiix; XT= Xtabai; NR= NaRob
b. Type of provenience: e=excavation, t=testpit, p=posthole, s=surface collection.
c. Provenience number for reference and sync with survey points.
d. Operation number for multiple excavations/years at the same survey point.
e. Context numbers are assigned by the excavator.
f. Year code (YY) is redundant but useful for workflow and analysis.
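The construction rules a through f can be expressed as a small parser. The Python sketch below is an illustration rather than project code: the names `Provenience` and `parse_provenience` are invented here, and it assumes that parts c, d and e are always whole numbers and that the year is always two digits, as in the samples shown in this document.

```python
import re
from typing import NamedTuple

# Code tables taken from items a and b of the explanation.
LOCATIONS = {"ch": "Chau Hiix", "xt": "Xtabai", "nr": "NaRob"}
TYPES = {"e": "excavation", "t": "testpit", "p": "posthole", "s": "surface collection"}

# Two location letters, one type letter, then number-operation-context-year.
PATTERN = re.compile(r"^([a-z]{2})([a-z])-(\d+)-(\d+)-(\d+)-(\d{2})$")


class Provenience(NamedTuple):
    location: str
    kind: str
    number: int
    operation: int
    context: int
    year: str  # kept as the two-digit code exactly as written


def parse_provenience(code):
    """Split a provenience number such as 'cht-70-2-1-99' into its six parts."""
    match = PATTERN.match(code.strip().lower())
    if match is None:
        raise ValueError(f"'{code}' does not follow the provenience syntax")
    location, kind, number, operation, context, year = match.groups()
    if location not in LOCATIONS:
        raise ValueError(f"unknown location code '{location}'")
    if kind not in TYPES:
        raise ValueError(f"unknown provenience type '{kind}'")
    return Provenience(
        LOCATIONS[location], TYPES[kind], int(number), int(operation), int(context), year
    )
```

For example, `parse_provenience("cht-70-2-1-99")` yields a test pit at Chau Hiix, provenience 70, operation 2, context 1, year code "99". The year is left as a string because expanding it to four digits would require guessing the century.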

During the field season these text lines would be visually scanned by the Lab Manager for data input errors. Since the comma is used as the data delimiter (i.e., the separator that indicates a new unit of data), misplacement of the comma is the source of most errors, such as a comma separating multiple Excavator-initials (use “cjb/glp” instead of “cjb, glp”) or appearing within a Note (use “backdirt/burial45” instead of “backdirt, burial45”).
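A stray comma changes the number of fields on a line, so a simple count can supplement the visual scan. The Python sketch below assumes the eight-field syntax of the 1996 inventory file and tolerates a single trailing comma, since some sample lines end with “x,”; the function names are hypothetical.

```python
# Field names from the "Data syntax=" header of the 1996 inventory file.
EXPECTED_FIELDS = [
    "provenience", "material", "date", "count",
    "weight-in-ounces", "excavator-initials", "#-of-bags", "notes",
]


def split_record(line):
    """Split one data line on commas, ignoring a single trailing comma."""
    fields = [field.strip() for field in line.strip().split(",")]
    if fields and fields[-1] == "":
        fields.pop()
    return fields


def find_delimiter_errors(lines):
    """Return (line number, field count, text) for each line with the wrong field count."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines between records
        fields = split_record(line)
        if len(fields) != len(EXPECTED_FIELDS):
            problems.append((number, len(fields), line.strip()))
    return problems
```

The check only catches commas that change the field count. It cannot tell that a value landed in the wrong column when the count still comes out to eight, so sorting in a spreadsheet remains a useful second pass.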

Post-season, the Data Manager imports the data files into a spreadsheet for further error checking. Typographic errors are a constant concern, and the spreadsheet sorting function provides a good tool for checking the consistency of data entry. Data files are then merged with seasonal files from all field seasons. Using the spreadsheet’s Save-as function, a new TXT file is created as the archival copy of this total catalog of data. Spreadsheets are then made available for analysis.

By 2003 students and staff had enough computer experience for data entry to be done directly into a spreadsheet. The comma is still used as the delimiter for the archival TXT copy of the file, so data entry errors are still related to use of the comma within a cell, as well as typing and consistency errors. The Data Manager continues to follow the same post-season procedure: check for errors, merge all seasonal files with previous seasons’ files, make the spreadsheets available for analysis, and Save-as a TXT file to add to the archive.

As the students, staff and professionals who work at Chau Hiix become more digitally sophisticated, so do the hardware and software they employ, and the number of files they create grows. The basic data archive plan, however, remains the same. The archived TXT files of 2009 look exactly like the archived files of 1996.

Simple Archive: Save-as Text, Print, Backup

The archive plan for Chau Hiix digital data is both conceptually and deceptively simple: Save-as text, print, and make three digital backup copies to store in separate locations.

Not Easy in Details

Implementing this simple plan is not easy; the devil is in the details. Each step is filled with problems and decisions, and the archival processes still do not guarantee the accessibility of the data in perpetuity.

There are two sets of tasks in the digital data process: the Lab Manager tasks for collecting the data, and the Data Manager tasks for archiving the data.

Lab Manager Tasks for Collecting Digital Data. The Lab Manager is responsible for standardized file formats and names, consistency of data entry, and ongoing backups during the data collection process. Use the following rules:

File Formats

Format refers to the extension, usually three letters, that follows the period in a file name. Every software application uses an extension for its data files, and most software offers a choice of formats under the Save-as option. Chau Hiix uses the following software: Microsoft Excel (format is .XLS), Microsoft Word (format is .DOC or .DOCX), and TDS Survey Pro (proprietary formats .AR5, .CR5, .RW5, but with a Save-as .TXT option). However, data may be collected in any software as long as there is an option to Save-as or Export the data to an archival format. Archival formats are limited to TXT or Unicode, TIF, JPG, and PDF.

File Names

There are generally two types of files: data and image.

The required data files are spreadsheets established and explained in the Lab Manual. Names of required spreadsheet data files for each Chau Hiix excavation season are:

Personnel-Notebooks-YY.xls
Provenience-Masterlist-YY.xls
Inventory-Artifacts-YY.xls
Inventory-Burial-YY.xls
Survey-Gridpoints-YY.xls
Survey-TotalPoints-YY.xls

Other data files may be generated as needed, such as Survey point collection files:

TDS-mainplatform-07.cr5
TDS-lagoon-07.cr5

Data files are saved in both Data-entry format (of the specific software being used, e.g. Excel .XLS) and Backup format (archival format such as TXT or Unicode). Both the data-entry and backup format of a file have the exact same filename and are differentiated by the three-letter format extension. For example:

Personnel-Notebooks-07.xls
Personnel-Notebooks-07.txt
TDS-mainplatform-07.cr5
TDS-mainplatform-07.txt
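The pairing rule above lends itself to a quick automated check. The sketch below is one possible way to list data-entry files that lack a same-named TXT copy. It assumes the file names are available as a simple list, and the function name and the set of data-entry extensions are illustrative choices rather than project conventions.

```python
from pathlib import PurePath

ARCHIVAL_TEXT_EXT = ".txt"


def missing_text_copies(filenames, data_entry_exts=(".xls", ".cr5")):
    """Return the TXT names that should exist but do not, one per unpaired data-entry file."""
    present = {PurePath(name).name.lower() for name in filenames}
    missing = []
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() in data_entry_exts:
            expected = path.stem + ARCHIVAL_TEXT_EXT
            if expected.lower() not in present:
                missing.append(expected)
    return sorted(missing)


files = [
    "Personnel-Notebooks-07.xls",
    "Personnel-Notebooks-07.txt",
    "TDS-mainplatform-07.cr5",  # no TXT copy saved yet
]
print(missing_text_copies(files))
```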

Image files are either digital photographs or document scans (notebook pages, drawings, forms, cards, etc.). Digital photos are collected in a Photograph folder and placed in subfolders according to context type: Artifacts, Drawings and Maps, Proveniences, Camp-Scenes, etc. Files are either named according to their content, beginning with the Provenience number, or collected in a specific Provenience folder.

str150-plan.tif (plan view of structure 150)
str7-032905.jpg (structure 7 on Mar 29, 2005)
che-71-1-15-98-lithics.tif (debitage from this context)
cht-141-1-1-98-rim.tif (ceramic rim sherd)

Or folders named:

che-18-1-8-03-ceramic (large cache of sherds)
che-16-x-x-03-lithic (contains multiple operations and contexts)
che-22-excavations (general views of the excavation in progress)

Document scans follow a similar filename-folder scheme except for Notebook page scans. Notebooks, designated for scanning by the PI, have filenames that begin with a two-digit year YY and then an arbitrary two-digit personnel number assigned at the beginning of the season to each field member. This is followed by a three-digit number for the first notebook page in the file. Notebook scans are limited to 8 pages per file to keep the file size manageable. For example, in 2007 Dr. Pyburn was assigned personnel number 01, so the scanned files of her field notebook were named:

0701001.tif (note this file contains pages 1 through 8)
0701009.tif (note this file contains pages 9 through 17)
0701018.tif (note this file contains pages 18 through 26)
0701027.tif etc.
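The notebook scan naming scheme (two-digit year, two-digit personnel number, three-digit first page) can be expressed as a pair of small helper functions. This is only a sketch: the function names are hypothetical, and the .tif default is assumed from the examples above.

```python
def notebook_scan_name(year, personnel_number, first_page, extension=".tif"):
    """Build a scan file name: YY + two-digit personnel number + three-digit first page."""
    if not 0 <= personnel_number <= 99:
        raise ValueError("personnel number must fit in two digits")
    if not 1 <= first_page <= 999:
        raise ValueError("first page must fit in three digits")
    return f"{year % 100:02d}{personnel_number:02d}{first_page:03d}{extension}"


def parse_notebook_scan_name(name):
    """Split a scan file name back into its year, personnel number, and first page."""
    stem = name.rsplit(".", 1)[0]
    if len(stem) != 7 or not stem.isdigit():
        raise ValueError(f"not a notebook scan name: {name}")
    return {
        "year": int(stem[:2]),
        "personnel_number": int(stem[2:4]),
        "first_page": int(stem[4:]),
    }


print(notebook_scan_name(2007, 1, 1))
print(parse_notebook_scan_name("0701009.tif"))
```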

All other files that are created should be named following the recommendations adapted from the Media Vault. The same guidelines apply to the names of Folders.

File/folder names should not contain blank spaces. File/folder names should only include the letters A-Z and a-z, the numbers 0-9, plus underscores and hyphens. Underscores, hyphens, and zeros cannot occur as the first character of the name of a file/folder. File/folder names are considered case sensitive (because they will be in some environments), but best practice is to not use case variation in considering uniqueness of naming. File/folder names should not exceed 64 characters, including file extension. Shorter is better. (MediaVault 2009)

(Note that filenames that begin with zero are a problem when using the MS Excel sorting function, so do not name files with leading zeros.)
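The naming recommendations above can be turned into a simple validation routine. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and a single period before the extension is assumed to be permitted even though the quoted rule does not list the period among the allowed characters.

```python
import re

MAX_LENGTH = 64  # includes the file extension
# Letters, digits, underscores, and hyphens, plus one optional ".ext" (an assumption).
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$")


def name_problems(name):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if " " in name:
        problems.append("contains a blank space")
    if not ALLOWED.match(name):
        problems.append("contains characters other than A-Z, a-z, 0-9, underscore, hyphen")
    if name[:1] in ("_", "-", "0"):
        problems.append("starts with an underscore, hyphen, or zero")
    if len(name) > MAX_LENGTH:
        problems.append("longer than 64 characters")
    return problems


def case_collisions(names):
    """Group names that differ only by upper/lower case."""
    seen = {}
    for name in names:
        seen.setdefault(name.lower(), []).append(name)
    return [group for group in seen.values() if len(group) > 1]


print(name_problems("str150-plan.tif"))
print(name_problems("camp scenes.jpg"))
print(case_collisions(["Plan.tif", "plan.tif", "rim.tif"]))
```

Returning every violation at once, rather than stopping at the first, lets the Lab Manager fix a name in a single pass.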

Data Entry

Accuracy and consistency of data entry require constant attention. Most fields in the required spreadsheets use controlled vocabularies for data entry options, which are listed and explained in the Lab Manual. (Note that it is possible to create drop-down menus for populating spreadsheet cells, but the Lab Manager must understand the implementation of this function in case changes need to be managed in the field.) The Lab Manager is wise to check for data errors on a daily basis, which makes them easier to identify and correct. Data errors generally fall into three categories: typographical, auto-fill related, and explicative.

Typographical errors are most often found by sorting each column and checking for inconsistencies. Typographical errors also occur when capturing survey data in the Total Station. The surveyor should be asked to edit these errors, or “little finger mistakes” as one surveyor put it, as the Shot Description field can be quite cryptic.
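The column-sorting check can also be approximated outside the spreadsheet by listing each distinct value in a column with its count, so that rare spellings stand out next to the common ones. This is a minimal sketch assuming the rows are available as dictionaries keyed by column name; the function name and the sample values are hypothetical.

```python
from collections import Counter


def column_values(rows, column):
    """Return (value, count) pairs for one column, sorted alphabetically ignoring case."""
    counts = Counter(row[column] for row in rows)
    return sorted(counts.items(), key=lambda item: item[0].lower())


# Hypothetical entries showing a case variant and a misspelling.
rows = [
    {"Material": "Ceramic"},
    {"Material": "Ceramic"},
    {"Material": "ceramic"},
    {"Material": "Cermaic"},
    {"Material": "Lithic"},
]
for value, count in column_values(rows, "Material"):
    print(value, count)
```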

Auto-fill errors arise when the spreadsheet software tries to be “helpful” and auto-fills cells based on previous data entry. (Note that it is possible to turn OFF auto-fill but, again, the Lab Manager must understand the management of this function.) These errors, such as the date-of-collection becoming the date of data entry, are much harder to uncover. Those doing data entry are asked to step away from the task every twenty minutes to regain focus and help minimize this “helpful” problem. Daily error checking by the Lab Manager is the best deterrent.

Explicative data errors stem from the excitement of data discovery: attempts during the data entry process to provide as much information as possible or to begin the analysis process. For example, those doing data entry might see a particularly interesting lithic and try to describe it in the Material column as “stemmed macro blade retouched”, or a ceramic as a “diagnostic rim sherd”. The information in these types of entries belongs in the Notes field, where it will not interfere with the standard vocabulary for Material type and thus cause sorting problems. Consistency of terminology (and spelling) is essential to being able to sort the data. This holds true for survey description also. Survey descriptive codes must be consistent and documented. New descriptors, codes, or vocabulary can be added as necessary, but only with discussion and agreement by all professionals.

Backups

Data files that have been modified need to be backed up each day, or more often if there is extensive data entry or many changes in the personnel doing data entry. To control file versions, a file naming scheme and a backup schedule should be established. Use a file name beginning with DB for daily backups (or DB1, DB2, DB3, etc. for more frequent backups) followed by a filename designator and then a date in the form DDMMYY. For example, the Inventory-Artifacts-07.xls file would have daily backup files named:

DB-IA-120307.xls
DB-IA-130307.xls
DB1-IA-140307.xls
DB2-IA-140307.xls
DB-IA-150307.xls

These daily backups could be deleted once a weekly backup file is created and checked against the latest version of the daily backup. Weekly backups should be retained as part of the seasonal field record to provide for data entry error analysis by the Data Manager at the end of the season. The weekly backup (WB) would use a similar filename for version control:

WB-IA-160307.xls

For version control, designate the last backup at the end of the field season (EOS) as:

EOS-IA-220407.xls
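The DB, WB, and EOS naming pattern can be generated rather than typed, which removes one source of inconsistency. The following is a minimal sketch; the function name and its parameters are hypothetical, and the designator (IA here) is taken from the examples above.

```python
from datetime import date


def backup_name(kind, designator, when, sequence=None, extension=".xls"):
    """Build a backup file name such as DB-IA-120307.xls (date written as DDMMYY)."""
    if kind not in ("DB", "WB", "EOS"):
        raise ValueError(f"unknown backup kind: {kind}")
    if sequence is not None and kind != "DB":
        raise ValueError("only daily backups (DB) take a sequence number")
    prefix = kind if sequence is None else f"{kind}{sequence}"
    return f"{prefix}-{designator}-{when.strftime('%d%m%y')}{extension}"


print(backup_name("DB", "IA", date(2007, 3, 12)))
print(backup_name("DB", "IA", date(2007, 3, 14), sequence=1))
print(backup_name("WB", "IA", date(2007, 3, 16)))
print(backup_name("EOS", "IA", date(2007, 4, 22)))
```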

Each day the backup files should be copied to a portable storage device that is kept with the Lab Manager’s passport. In the event of a disaster (fire, flood, hurricane, etc.) the Lab Manager is most likely to remember to grab the passport and thus the data also.

Data Manager Process for Archiving Digital Data. The Data Manager is responsible for creating and maintaining the data archive. The maintenance tasks are best accomplished as soon as possible after the end of the field season; the more time that passes, the less likely it is that personnel will remember details if there are questions or anomalies.

Organize the Archive

Controlled vocabularies are the foundation of the archive as well as a major source of data entry errors. Vocabulary (or values allowed in each cell) must be consistent to provide for accurate data sorting when using the spreadsheets for analysis. Consistency in data entry is the most difficult principle to convey to field personnel unless they are experienced in sorting data. This includes the surveyor, who may not be experienced in the use of GIS-type software and may fail to consistently label each point with standardized description codes.

Building and maintaining the controlled vocabularies is a primary task for the Data Manager. Before each field season the Data Manager updates and explicates the controlled vocabulary lists in the Lab Manual.

Material Type is an example of a controlled vocabulary: name them pots, sherds or ceramics, but pick one term to use consistently for each concept. The Material Type (and sub-types) terms in the controlled vocabulary used by Chau Hiix are:

Historic (ceramic, glass, metal)
Ceramic (polychrome, mendhole, netweight, partial-vessel)
Shell (marine, freshwater)
Bone (human, faunal)
Lithic (chert, groundstone, jade, obsidian, hematite)
Plaster (Painted)
Carbon
Soil (chemical, flotation, waterscreen)
Misc (wood, stone, coral)

All controlled vocabularies are under constant review for additions or clarifications. Sub-types of a vocabulary term are not required initially but may be added or separated into more detail as the analysis phase evolves.
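A controlled vocabulary such as Material Type can be checked mechanically. The sketch below encodes the terms listed above and flags any entry outside them. It assumes that Material and sub-type are entered as separate values and that matching is exact (including case); both are assumptions for illustration, and the function name is hypothetical.

```python
# Terms copied from the Material Type list above, spelled and capitalized as listed.
MATERIAL_TYPES = {
    "Historic": {"ceramic", "glass", "metal"},
    "Ceramic": {"polychrome", "mendhole", "netweight", "partial-vessel"},
    "Shell": {"marine", "freshwater"},
    "Bone": {"human", "faunal"},
    "Lithic": {"chert", "groundstone", "jade", "obsidian", "hematite"},
    "Plaster": {"Painted"},
    "Carbon": set(),
    "Soil": {"chemical", "flotation", "waterscreen"},
    "Misc": {"wood", "stone", "coral"},
}


def check_material(material, subtype=""):
    """Return a list of problems; an empty list means the entry matches the vocabulary."""
    if material not in MATERIAL_TYPES:
        return [f"unknown Material type: {material!r}"]
    if subtype and subtype not in MATERIAL_TYPES[material]:
        return [f"unknown sub-type {subtype!r} for {material}"]
    return []


print(check_material("Lithic", "chert"))
print(check_material("stemmed macro blade retouched"))
```

An explicative entry such as “stemmed macro blade retouched” is reported as an unknown Material type, which is the cue to move that description into the Notes field.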

The folder structure of the Chau Hiix Archive is simple. The Data Manager establishes the folder structure as well as all the rules and tasks for the files it contains: file formats, file naming schemes, processes for data entry, and backups. In addition, the Data Manager, in concert with the PI, maintains controlled vocabularies that are used to fill in each cell in the spreadsheets. Rules, tasks and vocabularies are all reviewed and updated on a yearly basis as influenced by the Lab Manager’s experience and as demanded by changes in both data and technology.

The Chau Hiix Archive consists of the following Folders:

0-LogArchiveChanges
1-Manuals
2-Personnel
3-Notebooks
4-Proveniences
5-Artifacts
6-Survey
7-Visuals
8-Analyses
9-Documents
10-SeasonalFiles
FORMS
PROBLEMS
SOFTWARE

Populating the Archive

The Data Manager collects all the digital data at the end of the field season from the Lab Manager and begins the process of data validation, error-checking, documentation, and storing archival versions. See the checklist that follows for the details of this process.

Locations for the storage of the Chau Hiix Digital Archive include: onsite at Chau Hiix, in-country with the department of archaeology, in the PI’s office, in the Data Manager’s office, and on a university backup system. The PRINT version of the TXT files is stored in a fireproof cabinet in the office of the PI.

Maintain the Archive

The five Archive locations are maintained by the Data Manager. This entails replacing the whole folder system with a new version at least after each field season, and more often when there are inter-season changes or additions, typically analysis data. At least three full versions of the whole archive are maintained on the university backup system, identified by folder name and version date (YYMMDD; this date syntax is documented in the LogArchiveChanges files). For example, archive folders are named:

ChauHiixArchive-070621
ChauHiixArchive-091020
ChauHiixArchive-100112

When a new version of the archive is created, random files need to be opened to ensure the copy process has completed successfully. If a new archive folder is not created after a two-year period, the Data Manager should open the latest version and randomly check files for data and media viability.
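Opening random files by hand can be complemented by an automated spot check. The sketch below names a new archive folder with the YYMMDD version date and compares a random sample of files between the original and a copy using SHA-256 checksums. The checksum comparison is a swapped-in technique, not the procedure described above, and it does not replace opening files to confirm that their contents are still readable. The function names and the default sample size are assumptions.

```python
import hashlib
import random
from datetime import date
from pathlib import Path


def archive_folder_name(version_date):
    """Build an archive folder name such as ChauHiixArchive-070621 (date as YYMMDD)."""
    return "ChauHiixArchive-" + version_date.strftime("%y%m%d")


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(original_root, copy_root, sample_size=20, seed=None):
    """Compare a random sample of files; return relative paths that are missing or differ."""
    original_root, copy_root = Path(original_root), Path(copy_root)
    files = sorted(p for p in original_root.rglob("*") if p.is_file())
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    failures = []
    for source in sample:
        relative = source.relative_to(original_root)
        target = copy_root / relative
        if not target.is_file() or sha256_of(source) != sha256_of(target):
            failures.append(str(relative))
    return failures


print(archive_folder_name(date(2010, 1, 12)))
```

A call such as spot_check(original, copy) returns an empty list when every sampled file matches; any path it returns should be recopied and then opened to confirm it is intact.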

The Data Manager is responsible for monitoring the viability of both archival formats and media integrity, and for identifying the need for migration of data format, storage media, or storage location. The need for migration could be triggered by technology obsolescence, by upgrades to or retirement of formats, software, or operating systems, or by changes in location accessibility.

Digital Archive Checklist of Tasks

Pre-season:

1. Review and update the Lab Manual as necessary and make copies of all required Master files available for reference in the new field season.

2. Prepare blank copies of required Master files for data entry in the current season.

3. Assign an arbitrary two-digit number to each field season participant and record it in the new, blank Personnel-Notebooks spreadsheet.

Daily during the field season:

1. Make daily backup files (DB) of each file that has been modified.

2. Sort each required spreadsheet by columns to identify and correct errors.

3. Copy daily backup files to an external storage device and store with your passport.

Weekly during the field season:

1. Make weekly backup files (WB) of each file.

2. Sort each required spreadsheet by columns to identify and correct errors.

3. Copy weekly backup files to an external storage device and store with your passport.

End of field season:

1. Copy all WB files to TWO different external storage devices, leaving master copies on the Lab computer.

2. Create a final EOS file of each spreadsheet representing the total data entry for the season. Make two copies of the EOS files and store with the WB files on external devices.

3. Write a short report documenting problems, anomalies or suggestions regarding these files.

Post-season:

1. Copy each EOS file and rename it according to its Master file name, but append YY for year and the letter F for Field. For example: EOS-IA-220407.xls is copied to Inventory-Artifacts-07F.xls. This file becomes part of the permanent Archive in folder 10-SeasonalFiles. (See list of Archive folders above.)

2. Make a second copy of each EOS file and rename it according to its Master file name, but append YY for year and the letter P for Postseason. This is the Postseason working file where changes or edits are made. For example: EOS-IA-220407.xls is copied to Inventory-Artifacts-07P.xls.

3. Sort the spreadsheets (filename-YYP.xls) by columns to identify and correct errors.

4. When confident with data validity, combine each of these files (filename-YYP) with its previous season’s Master file to become the next/future season’s Master file. For example: Inventory-Artifacts-07P.xls is combined with Inventory-Artifacts-05P.xls and the new Inventory-Artifacts-07P.xls becomes part of the permanent Archive in folder 5-Artifacts.

5. The Master file from the previous 2005 season, Inventory-Artifacts-05P.xls, is then moved to the folder 10-SeasonalFiles subfolder for the 2005 field season. That subfolder now contains two files: Inventory-Artifacts-05F.xls and Inventory-Artifacts-05P.xls.

6. Make a TXT file for every Master file, adding a data syntax header to each one, then PRINT each TXT file.

7. In the current year’s subfolder of folder 10-SeasonalFiles, make a TXT copy of every file.

8. Delete all DB and WB files, as they are now represented by the file with the -YYF designation. Retain the EOS files as insurance against post-season data editing errors.

9. File all other seasonal files in the appropriate Archive folders. For example, the season notebook scans are put in folder 3-Notebooks, and photographs are put in folder 7-Visuals.

10. Delete any extraneous files.

11. Write a brief document (.TXT) of problems, anomalies or suggestions from the postseason archive process. File this document in folder 0-LogArchiveChanges.

12. Copy the whole Archive to THREE external devices or locations. Randomly open files on the external devices to ensure data viability. Deposit the print versions of the TXT files in a designated fireproof location.

Yearly:

1. Verify the functioning of the storage media and/or location by opening files.

2. Monitor the need to transfer media or migrate data.

3. Monitor changes in accessibility or access rights.

4. Provide for the addition of new analyses or data error correction.
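The first two post-season steps of the checklist (copying each EOS file to a Field copy and a Postseason working copy) follow a fixed renaming rule, sketched below. The lookup table contains only the IA designator because that is the only one given in the text; other designators, and the function name itself, would be assumptions to fill in from the Lab Manual.

```python
import re

# Designator -> Master file name. Only IA appears in the text; others are not guessed here.
MASTER_NAMES = {"IA": "Inventory-Artifacts"}

# EOS-<designator>-<DDMMYY>.xls
EOS_PATTERN = re.compile(r"^EOS-([A-Za-z]+)-(\d{2})(\d{2})(\d{2})\.xls$")


def postseason_copies(eos_name):
    """Return the (Field, Postseason) file names derived from an EOS backup name."""
    match = EOS_PATTERN.match(eos_name)
    if match is None:
        raise ValueError(f"not an EOS file name: {eos_name}")
    designator, _day, _month, year = match.groups()
    master = MASTER_NAMES[designator]
    return f"{master}-{year}F.xls", f"{master}-{year}P.xls"


print(postseason_copies("EOS-IA-220407.xls"))
```

The function only computes names; the actual copying is left to the Data Manager so that each copy can be checked as it is made.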

References cited:

AHDS, Arts and Humanities Data Service. 2000. Digital Archives from Excavation and Fieldwork: Guide to Good Practice. http://ads.ahds.ac.uk/project/goodguides/excavation/sect82.html.

Baker, Nicholson. 2001. Double Fold: Libraries and the Assault on Paper. New York: Vintage Books.

BRTF-SDPA, Blue Ribbon Task Force on Sustainable Digital Preservation and Access. 2010. Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information. http://brtf.sdsc.edu/index.html.

MediaVault. 2009. Acceptable Characters in File Names. Naming Your Files, http://mediavault.wordpress.com/documentation/.

Silverman, Sydel, and Nancy J. Parezo. 1995. Preserving the Anthropological Record. New York: Wenner-Gren Foundation for Anthropological Research.

Stanford University. 2008. LOCKSS, Lots of Copies Keeps Stuff Safe. http://lockss.stanford.edu/lockss/Home.
