Rome Wasn't Digitized in a Day - Council on Library and Information ...
step, because epigraphic texts should also be “fully queryable and manipulable” in a digital
environment:
By the term “queryable”, we do not simply mean that the text may be scanned for particular
patterns of characters; we mean that features of the text indicated by Leiden should be able to
be investigated also. So, for example, a corpus of inscriptions should be able to be queried for
the full list of abbreviations used within it, or for the number of occurrences of a word in its full
form, neither abbreviated nor supplemented. One can imagine many uses for a search engine
able to do these kinds of queries on text (Cayless et al. 2009).
The ability to do searches that “leverage the structures” embedded within Leiden, according to Cayless
et al. (2009), first requires marked-up inscription text that can then be parsed and converted into data
structures that support the operations listed above. Such parsing requires lexical
analysis that produces token streams that can be fed into a parser, which in turn produces parse trees that
can be acted upon and queried in different ways. The authors granted that while EpiDoc is only one
“possible serialization of the Leiden data structure,” it has the added advantage that many
tools already exist to work with it.
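The pipeline described above (lexical analysis, token stream, parser, queryable structure) can be sketched in a few lines of Python. This is a minimal illustration, not the approach of Cayless et al.: it assumes a tiny subset of Leiden in which `( )` marks the editorial expansion of an abbreviation and `[ ]` marks text supplied by the editor, it ignores nesting and every other Leiden siglum, and the names `tokenize`, `parse`, `Word`, `abbreviations`, and `count_full_form` are hypothetical.

```python
import re
from dataclasses import dataclass

# One named group per token kind; every character matches exactly one group.
TOKEN_RE = re.compile(
    r"(?P<LPAREN>\()|(?P<RPAREN>\))|(?P<LBRACK>\[)|(?P<RBRACK>\])"
    r"|(?P<SPACE>\s+)|(?P<TEXT>[^()\[\]\s]+)"
)

def tokenize(text):
    """Lexical analysis: yield (kind, value) tokens from a Leiden string."""
    for match in TOKEN_RE.finditer(text):
        yield match.lastgroup, match.group()

@dataclass
class Word:
    # Each segment is (kind, text); kind is "plain", "expansion" or "supplied".
    segments: list

    @property
    def is_abbreviated(self):
        return any(kind == "expansion" for kind, _ in self.segments)

    @property
    def is_supplemented(self):
        return any(kind == "supplied" for kind, _ in self.segments)

    @property
    def as_written(self):
        return "".join(text for kind, text in self.segments if kind == "plain")

    @property
    def full_form(self):
        return "".join(text for _, text in self.segments)

def parse(tokens):
    """Build a flat list of Word nodes (assumption: brackets are never nested)."""
    words, segments, mode = [], [], "plain"
    for kind, value in tokens:
        if kind == "TEXT":
            segments.append((mode, value))
        elif kind == "SPACE":
            if segments:  # whitespace closes the current word, keeps the mode
                words.append(Word(segments))
                segments = []
        elif kind == "LPAREN":
            mode = "expansion"
        elif kind == "LBRACK":
            mode = "supplied"
        else:  # RPAREN or RBRACK
            mode = "plain"
    if segments:
        words.append(Word(segments))
    return words

def abbreviations(words):
    """List (form on the stone, expanded form) for every abbreviated word."""
    return [(w.as_written, w.full_form) for w in words if w.is_abbreviated]

def count_full_form(words, form):
    """Count occurrences written in full: neither abbreviated nor supplemented."""
    return sum(
        1 for w in words
        if w.full_form == form and not w.is_abbreviated and not w.is_supplemented
    )

words = parse(tokenize("Imp(erator) Caes(ar) [divi] f(ilius) Augustus"))
print(abbreviations(words))
print(count_full_form(words, "Augustus"))
```

The two query functions correspond to the two examples in the quotation: the list of abbreviations in a text, and the number of occurrences of a word in its full form. A realistic parser would need a tree rather than a flat list to handle nested and multi-line features.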
Cayless et al. stated that, rather than making use of standards such as EpiDoc, the databases that
supported most online epigraphy projects typically included various metadata fields and a large text
field with the Leiden inscription directly transcribed without any markup or encoding (a fact supported
by the survey in this review). The convenience of such a database setup is that it permits various
fielded and full-text searches, it is easy to connect with web-based front ends for forms, data can be
easily extracted using Structured Query Language (SQL), and data can also be easily added to these
systems. This makes it easy to insert new inscriptions as they are discovered. Nonetheless, this
standard database approach has two major flaws, according to Cayless et al.: (1) in terms of digital
preservation, each “digital corpus” or database does not have distributed copies as a print corpus does;
and (2) these databases lack the ability to “customize queries” and thus “see how result sets are being
constructed.”
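A small sketch may make the trade-off of this database setup concrete. It uses Python's built-in `sqlite3` module with an in-memory database; the table name, column names, and the three sample rows are invented for illustration and do not come from any project surveyed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inscriptions ("
    " id INTEGER PRIMARY KEY,"
    " findspot TEXT,"       # hypothetical metadata field
    " leiden_text TEXT)"    # Leiden transcription stored as one plain string
)
conn.executemany(
    "INSERT INTO inscriptions (findspot, leiden_text) VALUES (?, ?)",
    [
        ("Roma", "Imp(eratori) Caes(ari) Aug(usto)"),
        ("Ostia", "[Imp(eratori)] Augusto sacrum"),
        ("Roma", "D(is) M(anibus)"),
    ],
)

def ids(query, params):
    """Run a query and return the matching row ids as a list."""
    return [row[0] for row in conn.execute(query, params)]

# Fielded search on a metadata column.
in_rome = ids("SELECT id FROM inscriptions WHERE findspot = ? ORDER BY id", ("Roma",))

# Full-text search: a character-pattern match on the raw Leiden string.
# Row 1 contains the same word as "Aug(usto)", but the pattern cannot see it.
augusto = ids(
    "SELECT id FROM inscriptions WHERE leiden_text LIKE ? ORDER BY id",
    ("%Augusto%",),
)

print(in_rome)
print(augusto)
```

Adding and retrieving rows is trivial, which is the convenience described above. The pattern search, however, treats the sigla as ordinary characters, so it cannot distinguish an abbreviated, supplemented, or fully written word.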
Another significant issue is that the way databases or their interfaces are designed can greatly influence
the types of questions that can be asked. Making arguments similar to those of Dunn (2009) and
Bodard and Garcés (2009), Cayless et al. argued that technical decisions such as the creation of a
database are also “editorial and scholarly decisions” and that access to raw data is required to provide
users the ability to both examine and correct decisions. Long-term digital repositories for inscriptions
thus have at least two major requirements: (1) the ability to export part or all of the data in standard
formats; and (2) persistent identifiers (such as digital object identifiers [DOIs]) at the level of a digital
object so that they can be used to cite these objects independent of the location from where they were
retrieved. As Cayless et al. explain, in a future where published digital inscriptions may be stored in
various locations, the ability to cite items using persistent identifiers will be very important. They see
EpiDoc as a key component of such a future digital infrastructure for epigraphy, because it could serve
not only as an interchange format but also as a means of storing, distributing, and preserving
epigraphic data in a digital format.
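The two repository requirements above (export in a standard format, and an identifier that travels with the object) can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree`. It writes a bare TEI-namespaced skeleton in the general shape of an EpiDoc file; the function name `export_record`, the record layout, and the placeholder identifier are assumptions. The output is not schema-valid EpiDoc, and a real export would mark up the Leiden features rather than copy the string verbatim.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def export_record(record):
    """Serialize one inscription record, carrying its identifier inside the file."""
    def el(parent, tag, text=None, **attrs):
        node = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}", attrs)
        node.text = text
        return node

    root = ET.Element(f"{{{TEI_NS}}}TEI")
    file_desc = el(el(root, "teiHeader"), "fileDesc")
    publication = el(file_desc, "publicationStmt")
    # The identifier is stored with the object, not derived from where it is hosted.
    el(publication, "idno", record["identifier"], type="DOI")

    body = el(el(root, "text"), "body")
    edition = el(body, "div", type="edition")
    el(edition, "ab", record["leiden_text"])  # simplification: unencoded text
    return ET.tostring(root, encoding="unicode")

# Placeholder identifier, invented for this example.
xml_out = export_record(
    {"identifier": "10.0000/example-inscription-1", "leiden_text": "D(is) M(anibus)"}
)
print(xml_out)
```

Because the identifier is part of the exported file, any copy of it, wherever it is stored, can be cited in the same way.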
All of these arguments lead the authors to one fundamental conclusion about epigraphy, namely, that
inscriptions are texts in complex environments, not just physical objects:
This fact argues for treating them from the start as complex digital packages with their own
deep structure, history, and associated data (such as images), rather than as simple elements in a