26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...

step, because epigraphic texts should also be “fully queryable and manipulable” in a digital environment:

By the term “queryable”, we do not simply mean that the text may be scanned for particular patterns of characters; we mean that features of the text indicated by Leiden should be able to be investigated also. So, for example, a corpus of inscriptions should be able to be queried for the full list of abbreviations used within it, or for the number of occurrences of a word in its full form, neither abbreviated nor supplemented. One can imagine many uses for a search engine able to do these kinds of queries on text (Cayless et al. 2009).
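To make the two example queries concrete, the following is a minimal Python sketch over a hypothetical, simplified EpiDoc-style fragment. It assumes the usual TEI/EpiDoc encodings of `<expan>` containing `<abbr>` and `<ex>` for a resolved abbreviation and `<supplied reason="lost">` for restored text. It also assumes that every word is wrapped in a `<w>` element, which not all EpiDoc files do. The sample data and the function names `abbreviations` and `full_form_counts` are illustrative only and do not come from Cayless et al.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace in ElementTree's {uri}tag form

# Hypothetical, simplified edition of two inscriptions. In Leiden they would read
#   1: Imp(eratori) Caes[ari] divi f(ilio) Augusto
#   2: divi Aug(usti) libertus
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="edition">
  <ab n="1">
    <w><expan><abbr>Imp</abbr><ex>eratori</ex></expan></w>
    <w>Caes<supplied reason="lost">ari</supplied></w>
    <w>divi</w>
    <w><expan><abbr>f</abbr><ex>ilio</ex></expan></w>
    <w>Augusto</w>
  </ab>
  <ab n="2">
    <w>divi</w>
    <w><expan><abbr>Aug</abbr><ex>usti</ex></expan></w>
    <w>libertus</w>
  </ab>
</div>
"""

def abbreviations(root):
    """List every abbreviation as (abbreviated form, editor's expansion), in document order."""
    result = []
    for expan in root.iter(TEI + "expan"):
        abbr = "".join(expan.find(TEI + "abbr").itertext())
        full = "".join(expan.itertext())  # <abbr> text followed by <ex> text
        result.append((abbr, full))
    return result

def full_form_counts(root):
    """Count words that are neither abbreviated (<expan>) nor supplemented (<supplied>)."""
    counts = Counter()
    for w in root.iter(TEI + "w"):
        if w.find(".//" + TEI + "expan") is None and w.find(".//" + TEI + "supplied") is None:
            counts["".join(w.itertext()).strip()] += 1
    return counts

root = ET.fromstring(SAMPLE.strip())
print(abbreviations(root))     # [('Imp', 'Imperatori'), ('f', 'filio'), ('Aug', 'Augusti')]
print(full_form_counts(root))  # Counter({'divi': 2, 'Augusto': 1, 'libertus': 1})
```

Because the markup records *why* letters are present, "Caesari" is excluded from the full-form count: part of it was restored by the editor. A plain character search over the same text could not draw that distinction.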

The ability to do searches that “leverage the structures” embedded within Leiden, according to Cayless et al. (2009), first requires marked-up inscription text that can then be parsed and converted into data structures that support the operations listed above. Such parsing requires lexical analysis that produces token streams, which can be fed into a parser that produces parse trees; these trees can then be acted upon and queried in different ways. The authors granted that while EpiDoc is only one “possible serialization of the Leiden data structure,” it has the added advantage that many tools already exist to work with it.
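The lexer-to-parser pipeline described here can be illustrated with a small Python sketch. It is a hypothetical toy, not the authors' implementation and not a full Leiden parser. It recognizes only two Leiden sigla: square brackets for text lost from the stone and restored by the editor, and parentheses for letters added to expand an abbreviation. Erasures, lacunae, line divisions, and the other conventions are not handled. `tokenize` performs the lexical analysis, `Parser` builds a parse tree by recursive descent, and `reading` and `on_stone` are two example queries over that tree. All of these names are invented for illustration.

```python
import re
from dataclasses import dataclass, field

# Lexer for a hypothetical toy subset of Leiden with only two sigla:
#   [ ... ]  text lost from the stone and restored by the editor
#   ( ... )  letters added by the editor to expand an abbreviation
TOKEN_RE = re.compile(r"\s+|[\[\]()]|[^\s\[\]()]+")

def tokenize(text):
    """Lexical analysis: turn a Leiden string into a stream of (kind, value) tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        value = m.group()
        if value.isspace():
            tokens.append(("SPACE", value))
        elif value in ("[", "]", "(", ")"):
            tokens.append((value, value))
        else:
            tokens.append(("TEXT", value))
    return tokens

@dataclass
class Node:
    kind: str  # "root", "text", "space", "supplied" or "expansion"
    value: str = ""
    children: list = field(default_factory=list)

class Parser:
    """Recursive-descent parser that builds a parse tree from the token stream."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r} at token {self.pos}, found {self.peek()!r}")
        self.pos += 1
        return self.tokens[self.pos - 1][1]

    def parse(self):
        return Node("root", children=self.sequence(stop=None))

    def sequence(self, stop):
        """Parse nodes until the closing token `stop` (or end of input when stop is None)."""
        nodes = []
        while self.peek() not in (stop, None):
            kind = self.peek()
            if kind == "TEXT":
                nodes.append(Node("text", self.take("TEXT")))
            elif kind == "SPACE":
                nodes.append(Node("space", self.take("SPACE")))
            elif kind == "[":
                self.take("[")
                nodes.append(Node("supplied", children=self.sequence(stop="]")))
                self.take("]")  # raises SyntaxError if the bracket is never closed
            elif kind == "(":
                self.take("(")
                nodes.append(Node("expansion", self.take("TEXT")))
                self.take(")")
            else:
                raise SyntaxError(f"unexpected {kind!r} at token {self.pos}")
        return nodes

def reading(node):
    """The editor's full reading: restorations and expansions included, sigla removed."""
    if node.kind in ("text", "space", "expansion"):
        return node.value
    return "".join(reading(child) for child in node.children)

def on_stone(node):
    """Only the letters actually preserved on the stone."""
    if node.kind in ("text", "space"):
        return node.value
    if node.kind in ("supplied", "expansion"):
        return ""
    return "".join(on_stone(child) for child in node.children)

tree = Parser(tokenize("Imp(eratori) Caes[ari] divi f(ilio)")).parse()
print(reading(tree))   # Imperatori Caesari divi filio
print(on_stone(tree))  # Imp Caes divi f
```

Once the text is a tree rather than a string, different views of the same inscription become simple traversals. A restoration that spans a word boundary, such as `[Imp(eratori) Caes]ari`, parses into a single `supplied` node. An unclosed bracket is reported as an error instead of being silently accepted.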

Cayless et al. stated that, rather than making use of standards such as EpiDoc, the databases that supported most online epigraphy projects typically included various metadata fields and a large text field into which the Leiden transcription of the inscription was entered directly, without any markup or encoding (a fact supported by the survey in this review). The convenience of such a database setup is that it permits various fielded and full-text searches, it is easy to connect to web-based front ends for forms, data can be easily extracted using Structured Query Language (SQL), and new data can also be easily added. This makes it easy to insert new inscriptions as they are discovered. Nonetheless, according to Cayless et al., this standard database approach has two major flaws: (1) in terms of digital preservation, each “digital corpus” or database does not have distributed copies as a print corpus does; and (2) these databases lack the ability to “customize queries” and thus to “see how result sets are being constructed.”
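The typical setup described above can be sketched with Python's built-in `sqlite3` module. The table name, columns, and sample rows are hypothetical. The sketch confirms that fielded queries and insertion are straightforward. It also shows a consequence of storing Leiden as unstructured text, which ties back to Cayless et al.'s point about queryability: the database treats the sigla as ordinary characters, so a substring search for "Caesari" misses an inscription transcribed as "Caes[ari]".

```python
import sqlite3

# Hypothetical schema mirroring the typical setup: metadata columns plus one large
# text column holding the Leiden transcription exactly as typed, with no markup.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE inscription (
           id INTEGER PRIMARY KEY,
           findspot TEXT,
           material TEXT,
           leiden TEXT
       )"""
)

# Adding newly discovered inscriptions is a plain INSERT.
conn.executemany(
    "INSERT INTO inscription (findspot, material, leiden) VALUES (?, ?, ?)",
    [
        ("Roma", "marble", "Imp(eratori) Caes[ari] divi f(ilio)"),
        ("Ostia", "marble", "Imperatori Caesari divi filio"),
        ("Roma", "limestone", "divi Aug(usti) libertus"),
    ],
)

# Fielded search on a metadata column: simple and exact.
in_rome = conn.execute(
    "SELECT id FROM inscription WHERE findspot = ? ORDER BY id", ("Roma",)
).fetchall()

# Full-text search on the Leiden column: the brackets in row 1 split the word,
# so a plain substring match does not find it.
caesari = conn.execute(
    "SELECT id FROM inscription WHERE leiden LIKE ? ORDER BY id", ("%Caesari%",)
).fetchall()

print(in_rome)  # [(1,), (3,)]
print(caesari)  # [(2,)]
```

Row 1 records the same word as row 2, but nothing in the schema tells the query engine that "[ari]" is a restoration. A user also has no way to see or adjust how the match was made short of writing SQL against the raw column.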

Another significant issue is that the way databases or their interfaces are designed can greatly influence the types of questions that can be asked. Making arguments similar to those of Dunn (2009) and Bodard and Garcés (2009), Cayless et al. argued that technical decisions such as the creation of a database are also “editorial and scholarly decisions” and that access to the raw data is required so that users can both examine and correct those decisions. Long-term digital repositories for inscriptions thus have at least two major requirements: (1) the ability to export part or all of the data in standard formats; and (2) persistent identifiers (such as digital object identifiers [DOIs]) at the level of the individual digital object, so that they can be used to cite these objects independently of the location from which they were retrieved. As Cayless et al. explain, in a future where published digital inscriptions may be stored in various locations, the ability to cite items using persistent identifiers will be very important. They see EpiDoc as a key component of such a future digital infrastructure for epigraphy, because it could serve not only as an interchange format but also as a means of storing, distributing, and preserving epigraphic data in a digital format.

All of these arguments lead the authors to one fundamental conclusion about epigraphy, namely, that inscriptions are texts in complex environments, not just physical objects:

This fact argues for treating them from the start as complex digital packages with their own deep structure, history, and associated data (such as images), rather than as simple elements in a
