
Rome Wasn't Digitized in a Day - Council on Library and Information ...


should segment the text into blocks, which may be smaller than words, while recognizing (Edwards et al. 2004).

In their model, Edwards and colleagues chose not to model word-to-word transition probabilities, since word order in Latin is highly arbitrary. The method had reasonable accuracy: 75 percent of the letters were correctly transcribed, and the searching ability was reported to be relatively strong.

Some research on document analysis of Latin manuscripts has focused on assisting palaeographers. The discipline of palaeography is explored further in its subsection, but in general, palaeography studies the writing style of ancient documents.[53] Moalla et al. (2006) conducted automatic analysis of the writing styles of ancient Latin manuscripts from the eighth to the sixteenth centuries and focused on the extraction of “sufficiently discriminative features” to be able to differentiate between sufficiently large numbers of Latin writings. A number of problems complicated their image analysis, including the complexity of the shapes of letters, hybrid writing styles, poor manuscript quality, overlapping lines and words, and poor-quality manuscript images. Their discriminant analysis of 15 Latin classes achieved a classification accuracy of only 59 percent in its first iteration, but eliminating four classes that were not statistically well represented raised the rate to 81 percent.

Another key area of technology research is the development of techniques for digitizing and searching incunabula, or early printed books, a large number of which were printed in Latin. One major project in this area is CAMENA (Latin Texts of Early Modern Europe),[54] hosted by the University of Mannheim. Its digital library includes five collections: a collection of Neo-Latin poetry composed by German authors, available as images and machine-readable texts; a collection of Latin historical and political writing from early modern Germany; a reference collection of dictionaries and handbooks from 1500 to 1750 that helps provide a reading environment; a corpus of Latin letters written by German scholars between 1530 and 1770; and a collection of early printed editions of Italian Renaissance humanists born before 1500. This project also includes the Termini and Lemmata databases, which are now part of the eAQUA Project. The wealth of Neo-Latin materials online is well documented by the “Philological Museum: An Analytic Bibliography of On-Line Neo Latin Texts,”[55] an extensive website created by Dana F. Sutton of the University of California, Irvine, that since 1999 has served as an “analytic bibliography of Latin texts written during the Renaissance and later that are freely available to the general public on the Web” and includes more than 33,960 records.

Digitizing incunabula, or books printed before 1500, poses a number of challenges, as outlined by Schibel and Rydberg-Cox (2006) and Rydberg-Cox (2009). As they explained:

“The primary challenges arise from the use of nonstandard typographical glyphs based on medieval handwriting to abbreviate words. Further difficulties are posed by the practice of inconsistently marking word breaks at the end of lines and of reducing or even eliminating spacing between some words” (Rydberg-Cox 2009).

In addition, such digitized texts are often presented to a modern audience only after extensive editing and annotation, a level of effort that does not scale to million-book libraries.

[53] An excellent resource for exploring ancient writing systems is Mnamon: Ancient Writing Systems in the Mediterranean (http://lila.sns.it/mnamon/index.php?page=Home&lang=en), which not only provides extensive descriptions of various writing systems but also includes selected electronic resources.

[54] http://www.uni-mannheim.de/mateo/camenahtdocs/camena.html

[55] http://www.philological.bham.ac.uk/bibliography/
