Rome Wasn't Digitized in a Day - Council on Library and Information ...
While the advanced document-recognition technology used with the Archimedes Palimpsest has been
discussed previously, the metadata and linking strategies developed to connect manuscript metadata,
images, and transcriptions merit further discussion. Two recent articles by Doug Emery and
Michael B. Toth (Emery and Toth 2009, Toth and Emery 2008) have described this process in detail.
The creation of the Archimedes Palimpsest Digital product, which released one terabyte of
integrated image and transcription data, required the spatial linking of registered images for each leaf
“to diplomatic transcriptions that scholars initially created in various nonstandard formats, with
associated standardized metadata” (Emery and Toth 2009). The transcription encoding built on
previous work conducted by the HMT project, and Emery and Toth noted that standardized metadata
were critical for three purposes: “(1) access to and integration of images for digital processing and
enhancement, (2) management of transcriptions from those images, and (3) linkage of the images with
the transcriptions.”

The authors also described how the great disciplinary variety of scholars working on the palimpsest,
from students of Ancient Greek to those exploring the history of science, necessitated the ability to
capture data from a range of scholars in a standard digital format. This necessity led to a “Transcription
Integration Plan” that incorporated Unicode, Dublin Core, and the TEI. They explained that they chose
Dublin Core as their major integration standard for digital images and transcriptions because it would
allow for “hosting and integration of this data set and other cultural works across service providers,
libraries and cultural institutions” (Toth and Emery 2008). While they used the “Identification,” “Data
Type,” and “Data Content” elements from the Dublin Core element set, they also needed to extend this
standard with elements such as “Spatial Data Reference” drawn from the Federal Geographic Data
Committee Content Standard for Digital Geospatial Metadata.
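
To make the idea of extending a Dublin Core record concrete, the following is a minimal sketch and not the project's actual schema. It assumes that the “Identification,” “Data Type,” and “Data Content” categories can be approximated by the standard dc:identifier, dc:type, and dc:description elements. The extension namespace, the spatialDataReference element name, the record wrapper, and all sample values are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
EXT_NS = "urn:example:spatial-extension"     # hypothetical extension namespace


def build_record(identifier, data_type, content, spatial):
    """Build one metadata record: three Dublin Core elements plus a
    hypothetical spatial-reference extension element."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("ext", EXT_NS)
    record = ET.Element("record")  # wrapper element name is an assumption
    ET.SubElement(record, f"{{{DC_NS}}}identifier").text = identifier
    ET.SubElement(record, f"{{{DC_NS}}}type").text = data_type
    ET.SubElement(record, f"{{{DC_NS}}}description").text = content
    # Extension: spatial properties stored as attributes of one element.
    ref = ET.SubElement(record, f"{{{EXT_NS}}}spatialDataReference")
    for key, value in spatial.items():
        ref.set(key, str(value))
    return record


# Sample values are invented placeholders, not data from the project.
record = build_record(
    identifier="example-folio-001r",
    data_type="Image",
    content="Placeholder description of one registered image",
    spatial={"dpi": 600, "widthPx": 4000, "heightPx": 5000},
)
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping the extension in its own namespace means that a consumer that only understands Dublin Core can still read the core fields and ignore the rest, which is the kind of cross-institution integration the authors cite as their reason for choosing the standard.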

Emery and Toth (2009) argued that one of the guiding principles behind both their choice of common
standards and their emphasis on integrating data and metadata was the need to create a
digital archive for both today and the distant future. The data set they created thus also follows the
principles of the Open Archival Information System (OAIS) [396]. In their data set, every image bears all
relevant metadata in its header, and each image file or folio directory serves as a self-contained
preservation unit that includes all the images of a given folio side, XMP metadata files, checksum data,
and the spatially mapped TEI-XML transcriptions. In addition, the project developed its own
Archimedes Palimpsest Metadata Standard that “provides a metadata structure specifically geared to
relating all images of a folio side in a single multi- or hyper-spectral data ‘cube’”
(Emery and Toth 2009). Because each image has its own embedded metadata, the images can either
stand alone or be related to other members of the same cube. Finally, more than 140 of the 180 folio
sides include a transcription, and the lines in these transcriptions are mapped to rectangular regions in
the folio images using the TEI element. This mapping serves two useful purposes: it
allows the digital transcriptions to provide “machine-readable content” and allows easy movement
between the transcription and the image.
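
The line-to-region mapping described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's encoding: it assumes the TEI P5 facsimile mechanism, in which a zone element carries the corner coordinates ulx, uly, lrx, and lry and a transcribed line points to its zone through the facs attribute. The sample document is an abbreviated, invented fragment (it omits the TEI header), and the helper names load_line_regions and line_at are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abbreviated, invented fragment: two zones on one surface, two lines of text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <zone xml:id="z1" ulx="100" uly="200" lrx="900" lry="260"/>
      <zone xml:id="z2" ulx="100" uly="270" lrx="900" lry="330"/>
    </surface>
  </facsimile>
  <text><body>
    <l facs="#z1">first transcribed line (placeholder)</l>
    <l facs="#z2">second transcribed line (placeholder)</l>
  </body></text>
</TEI>"""


def load_line_regions(tei_xml):
    """Return a list of ((ulx, uly, lrx, lry), line_text) pairs."""
    root = ET.fromstring(tei_xml)
    zones = {}
    for zone in root.iter(TEI + "zone"):
        corners = tuple(int(zone.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
        zones[zone.get(XML_ID)] = corners
    regions = []
    for line in root.iter(TEI + "l"):
        zone_id = line.get("facs", "").lstrip("#")
        if zone_id in zones:
            regions.append((zones[zone_id], line.text or ""))
    return regions


def line_at(regions, x, y):
    """Image to transcription: return the line whose rectangle contains (x, y)."""
    for (ulx, uly, lrx, lry), text in regions:
        if ulx <= x <= lrx and uly <= y <= lry:
            return text
    return None


regions = load_line_regions(SAMPLE)
clicked = line_at(regions, 500, 300)  # a point inside the second rectangle
```

The same regions list supports the opposite direction as well: given a transcribed line, its rectangle tells a viewer which part of the folio image to display, which is the "easy movement" between transcription and image that the mapping is meant to provide.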

In addition to the challenges presented by individual manuscripts, other digital projects have explored
the challenges of managing multiple manuscripts of the same text. The Roman de la Rose Digital
Library (RRDL) [397], a joint project of the Sheridan Libraries of Johns Hopkins University and the
Bibliothèque Nationale de France (BnF), seeks to ultimately provide access to digital surrogates of all
of the manuscripts (more than 300) containing the Roman de la Rose poem. The creation of this digital

[396] For more on this ISO standard, see http://public.ccsds.org/publications/archive/650x0b1.pdf
[397] http://romandelarose.org/ - home