26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


such as a word grammar, and an “Analytical Lexicon” (a data file that contains previous scholarly analyses), data sets that are gradually built up from patterns that are registered in the Analytical Lexicon, and active human intervention where researchers must decide to accept or reject proposals (or add a new analysis) made by an automatic morphology program called Analyse.
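The review loop described above can be pictured with a small sketch. It is purely illustrative: the `Lexicon` class, the `review` method, and the placeholder strings are hypothetical and do not reflect how the Analyse program or the Analytical Lexicon are actually implemented. It only shows the general idea of reusing earlier decisions and asking a human reviewer to accept a proposal or supply a new analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Lexicon:
    """Hypothetical stand-in for a file of previously accepted analyses."""

    entries: Dict[str, str] = field(default_factory=dict)  # word form -> analysis

    def review(
        self,
        form: str,
        proposals: List[str],
        decide: Callable[[str, List[str]], str],
    ) -> str:
        """Return the analysis for `form`, consulting the reviewer only if needed.

        `proposals` are machine-generated candidates. `decide` represents the
        human reviewer: it returns one of the proposals (accept) or a new
        analysis of its own (reject and replace).
        """
        if form in self.entries:
            # An earlier decision exists, so it is reused as-is.
            return self.entries[form]
        chosen = decide(form, proposals)
        self.entries[form] = chosen  # the lexicon grows as decisions are made
        return chosen


lexicon = Lexicon()
# The reviewer accepts the second proposal for a placeholder word form.
first = lexicon.review("form-x", ["analysis-1", "analysis-2"], lambda f, p: p[1])
# On the second encounter the stored decision is returned without review.
again = lexicon.review("form-x", ["analysis-1"], lambda f, p: "not consulted")
```

Storing each decision means the same word form receives the same analysis on every later occurrence, which is one simple way to obtain consistent results.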

The major reason to use an encoding procedure rather than a tagging methodology, according to van Peursen, is that it will guarantee consistent morphological analysis since all functional deductions are automatically produced. “It has the advantage that not only the interpretation of a word, but also the data which led to a certain interpretation can be retrieved,” van Peursen argued, “whereas the motivation behind a tagging is usually not visible. It also has the advantage that both the surface forms and the functional analysis are preserved” (van Peursen 2009). In addition, van Peursen stated that by using language-specific files such as the Analytical Lexicon, their system was utilizing existing scholarly knowledge regarding Semitic studies. The encoding system ultimately deployed also supported a first within Semitic studies, van Peursen insisted: the ability to test alternative interpretations or scholarly assumptions against actual data. This ability to represent multiple textual interpretations was particularly important as the manuscript evidence for Syriac typically includes a large number of “orthographic variants.” 85 The encoding system used by TURGAMA thus supports searching for both “attested surface forms” found in actual manuscript witnesses and the abstract morphemes for these words.
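As a rough illustration of what searching on both levels involves, the sketch below indexes each encoded word by its attested surface form and by its abstract morphemes. Everything in it is an assumption made for the example: the `EncodedWord` and `Corpus` names, the witness sigla, and the placeholder forms and morpheme labels are hypothetical and are not taken from the TURGAMA encoding.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EncodedWord:
    witness: str                # manuscript siglum (hypothetical)
    surface: str                # form as attested in that witness
    morphemes: Tuple[str, ...]  # abstract morphemes assigned by the encoding


class Corpus:
    """Toy index that can be searched by surface form or by morphemes."""

    def __init__(self) -> None:
        self._by_surface: Dict[str, List[EncodedWord]] = defaultdict(list)
        self._by_morphemes: Dict[Tuple[str, ...], List[EncodedWord]] = defaultdict(list)

    def add(self, word: EncodedWord) -> None:
        self._by_surface[word.surface].append(word)
        self._by_morphemes[word.morphemes].append(word)

    def find_surface(self, surface: str) -> List[EncodedWord]:
        return list(self._by_surface.get(surface, []))

    def find_morphemes(self, morphemes: Tuple[str, ...]) -> List[EncodedWord]:
        return list(self._by_morphemes.get(tuple(morphemes), []))

    def variants(self, morphemes: Tuple[str, ...]) -> List[str]:
        """Distinct attested spellings that share one abstract analysis."""
        return sorted({w.surface for w in self.find_morphemes(morphemes)})


corpus = Corpus()
# Two witnesses spell the same abstract word differently (placeholder data).
corpus.add(EncodedWord("MS-A", "form-x", ("ROOT1", "SUFFIX1")))
corpus.add(EncodedWord("MS-B", "form-xy", ("ROOT1", "SUFFIX1")))

spellings = corpus.variants(("ROOT1", "SUFFIX1"))
```

Because the surface form and the analysis are stored side by side, a query on the morphemes returns every orthographic variant across witnesses, while a query on one spelling returns only the witnesses that attest it.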

“This way of encoding the verb forms attested in multiple textual witnesses provides us with a large database from which language variation data can be retrieved,” van Peursen explained. “In some cases language development is involved as well, and the data can be used for diachronic analysis” (van Peursen 2010). Van Peursen cautioned, however, that all encodings developed by the TURGAMA Project should ultimately be considered “hypotheses” that can be tested against the data they have created.

Cuneiform Texts and Sumerian

Generally considered to be the earliest writing system known in the world, cuneiform script was used in the Ancient Near East from about 3200 BC to about 100 AD. While the largest number of cuneiform texts represent the Sumerian language, the cuneiform script was adapted for other languages, including Akkadian, Elamite, and Hittite. Sumero-Akkadian cuneiform is the most common by far and is a complex “syllabic and ideographic writing system, with different signs for the various syllables” (Cohen et al. 2004). There are approximately 1,000 different cuneiform signs that form a complex script system where most signs are also “polyvalent,” or have multiple phonemic and semantic realizations. Additional glyphs have also shown great “palaeographic development” over their three millennia of use (Cohen et al. 2004). Sumerian has also been described as a “language isolate”: one for which no other related languages have been identified and that therefore lacks resources such as a “standardized sign list and comprehensive dictionary” (Ebeling 2007). These various factors make digitizing, transliterating, and presenting cuneiform online a complicated task.
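A small sketch can make the effect of polyvalence on transliteration concrete. The two-sign table below is a heavily simplified illustration, not a sign list: it records only a few well-known readings of the signs conventionally named AN and UD, and the function names are invented for this example.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

# Simplified illustration: a few commonly cited readings for two signs.
SIGN_READINGS: Dict[str, List[str]] = {
    "AN": ["an", "dingir"],
    "UD": ["ud", "babbar", "utu"],
}


def build_reverse_index(sign_readings: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each reading back to the set of signs that can express it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for sign, readings in sign_readings.items():
        for reading in readings:
            index[reading].add(sign)
    return index


def candidate_transliterations(
    signs: List[str], sign_readings: Dict[str, List[str]]
) -> List[Tuple[str, ...]]:
    """Enumerate every combination of readings for a sequence of signs."""
    return list(product(*(sign_readings[sign] for sign in signs)))


reverse_index = build_reverse_index(SIGN_READINGS)
candidates = candidate_transliterations(["AN", "UD"], SIGN_READINGS)
```

Even with this tiny table, a two-sign sequence yields six candidate readings (two for AN times three for UD). With roughly 1,000 signs, most of them polyvalent, the number of candidates grows quickly, which is why context and scholarly judgment are needed to choose among them.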

As indicated by the size of the previously described CDLI, there are hundreds of thousands of cuneiform tablets and other texts around the world in both private and public collections. In addition to the CDLI, there are a number of significant digital collections and corpora of cuneiform texts online. This section briefly describes them, along with relevant literature.

85 The topic of textual variants found within various witnesses to a text and the need to develop appropriate encoding/markup models to represent them was also reported in the sections on Greek and Sanskrit, and is more fully discussed in the section on Digital Editions.
