26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


While the advanced document-recognition technology used with the Archimedes Palimpsest has been discussed previously, the metadata and linking strategies developed to connect manuscript metadata, images, and transcriptions merit further discussion. Two recent articles by Doug Emery and Michael B. Toth (Emery and Toth 2009, Toth and Emery 2008) have described this process in detail. The creation of the Archimedes Palimpsest Digital product, which released one terabyte of integrated image and transcription data, required the spatial linking of registered images for each leaf “to diplomatic transcriptions that scholars initially created in various nonstandard formats, with associated standardized metadata” (Emery and Toth 2009). The transcription encoding built on previous work conducted by the HMT project, and Emery and Toth noted that standardized metadata were critical for three purposes: “(1) access to and integration of images for digital processing and enhancement, (2) management of transcriptions from those images, and (3) linkage of the images with the transcriptions.”

The authors also described how the great disciplinary variety of scholars working on the palimpsest, from students of Ancient Greek to those exploring the history of science, necessitated the ability to capture data from a range of scholars in a standard digital format. This necessity led to a “Transcription Integration Plan” that incorporated Unicode, Dublin Core, and the TEI. They explained that they chose Dublin Core as their major integration standard for digital images and transcriptions because it would allow for “hosting and integration of this data set and other cultural works across service providers, libraries and cultural institutions” (Toth and Emery 2008). While they used the “Identification,” “Data Type,” and “Data Content” elements from the Dublin Core element set, they also needed to extend this standard with elements such as “Spatial Data Reference” drawn from the Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata.
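
To make the idea of a core element set plus an extension more concrete, the following is a minimal sketch in Python. It is not the project's actual schema: only the four group names ("Identification," "Data Type," "Data Content," and "Spatial Data Reference") come from the description above, while the class name `ImageMetadata`, the helper `missing_groups`, and every field value are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ImageMetadata:
    """Hypothetical record grouping metadata the way the text describes."""
    # Groups the text attributes to the Dublin Core element set.
    identification: Dict[str, str]
    data_type: Dict[str, str]
    data_content: Dict[str, str]
    # Extension group the text attributes to the FGDC content standard.
    spatial_data_reference: Dict[str, str] = field(default_factory=dict)


def missing_groups(record: ImageMetadata) -> List[str]:
    """Return the names of element groups that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


# All values below are invented placeholders, not data from the project.
record = ImageMetadata(
    identification={"identifier": "example-folio-001r"},
    data_type={"format": "image/tiff"},
    data_content={"description": "placeholder description"},
)

# The extension group has not been filled in yet, so it is reported.
print(missing_groups(record))
```

A check of this kind is one plausible way to confirm that each image carries both the core and the extension metadata before it is packaged with its transcription.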

Emery and Toth (2009) argued that one of the guiding principles behind both their choice of common standards and their emphasis on the importance of integrating data and metadata was the need to create a digital archive for both today and the distant future. The data set they created thus also follows the principles of the Open Archival Information System (OAIS).³⁹⁶ In their data set, every image bears all relevant metadata in its header, and each image file or folio directory serves as a self-contained preservation unit that includes all the images of a given folio side, XMP metadata files, checksum data, and the spatially mapped TEI-XML transcriptions. In addition, the project developed its own Archimedes Palimpsest Metadata Standard that “provides a metadata structure specifically geared to relating all images of a folio side in a single multi- or hyper-spectral data ‘cube’” (Emery and Toth 2009). Because each image has its own embedded metadata, the images can either stand alone or be related to other members of the same cube. Finally, more than 140 of the 180 folio sides include a transcription, and the lines in these transcriptions are mapped to rectangular regions in the folio images using the TEI element. This mapping serves two useful purposes: it allows the digital transcriptions to provide “machine-readable content” and allows easy movement between the transcription and the image.
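
The two-way movement between transcription and image can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than the project's TEI encoding: the class `LineRegion`, the functions `line_at` and `region_for`, the pixel coordinates, and the placeholder line texts are all hypothetical. It shows only the general technique of pairing each transcribed line with a rectangle in image coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LineRegion:
    """One transcribed line tied to a rectangle on a folio image."""
    line_id: str
    text: str
    x: int        # left edge, in pixels
    y: int        # top edge, in pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the pixel (px, py) lies inside this rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def line_at(regions: List[LineRegion], px: int, py: int) -> Optional[LineRegion]:
    """Image to transcription: find the line under a given pixel."""
    for region in regions:
        if region.contains(px, py):
            return region
    return None


def region_for(regions: List[LineRegion], line_id: str) -> Optional[LineRegion]:
    """Transcription to image: find the rectangle for a given line."""
    for region in regions:
        if region.line_id == line_id:
            return region
    return None


# Invented sample data: two lines on one hypothetical folio side.
regions = [
    LineRegion("l1", "placeholder text of line 1", 100, 120, 900, 40),
    LineRegion("l2", "placeholder text of line 2", 100, 170, 900, 40),
]

hit = line_at(regions, 150, 130)
print(hit.line_id if hit else "no line here")
print(region_for(regions, "l2"))
```

A viewer built on such a structure could highlight the rectangle for a selected line, or show the transcribed text for a clicked point on the image.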

In addition to the challenges presented by individual manuscripts, other digital projects have explored the challenges of managing multiple manuscripts of the same text. The Roman de la Rose Digital Library³⁹⁷ (RRDL), a joint project of the Sheridan Libraries of Johns Hopkins University and the Bibliothèque Nationale de France (BnF), seeks to ultimately provide access to digital surrogates of all of the manuscripts (more than 300) containing the Roman de la Rose poem. The creation of this digital

396 For more on this ISO standard, see http://public.ccsds.org/publications/archive/650x0b1.pdf

397 http://romandelarose.org/ - home
