Rome Wasn't Digitized in a Day - Council on Library and Information ...
such as a word grammar, and an “Analytical Lexicon” (a data file that contains previous scholarly
analyses), data sets that are gradually built up from patterns that are registered in the Analytical
Lexicon, and active human intervention where researchers must decide to accept or reject proposals (or
add a new analysis) made by an automatic morphology program called Analyse.
The major reason to use an encoding procedure rather than a tagging methodology, according to van
Peursen, is that it guarantees consistent morphological analysis, since all functional deductions are
automatically produced. “It has the advantage that not only the interpretation of a word, but also the
data which led to a certain interpretation can be retrieved,” van Peursen argued, “whereas the
motivation behind a tagging is usually not visible. It also has the advantage that both the surface forms
and the functional analysis are preserved” (van Peursen 2009). In addition, van Peursen stated that by
using language-specific files such as the Analytical Lexicon, their system drew on existing
scholarly knowledge in Semitic studies. The encoding system ultimately deployed also
supported what van Peursen insisted was a first within Semitic studies: the ability to test alternative
interpretations or scholarly assumptions against actual data. This ability to represent multiple textual
interpretations was particularly important because the manuscript evidence for Syriac typically includes a
large number of “orthographic variants.” 85 The encoding system used by TURGAMA thus supports
searching for both the “attested surface forms” found in actual manuscript witnesses and the abstract
morphemes for these words.
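The distinction between encoding and tagging described above can be illustrated with a minimal sketch. It is not the TURGAMA data format or the Analyse program. The class name, the rule table, the witness sigla (W1, W2), and the transliterated forms are all hypothetical placeholders chosen for illustration. The sketch shows only the general idea: each attested surface form is stored alongside its abstract morphemes, functional labels are derived by rule rather than assigned by hand, and the corpus can be searched at either level.

```python
from dataclasses import dataclass

# Hypothetical rule table: maps an abstract morpheme to the function it signals.
RULES = {
    "prefix:n": "imperfect",
    "suffix:t": "perfect",
}


@dataclass(frozen=True)
class EncodedWord:
    witness: str      # manuscript siglum (placeholder)
    surface: str      # form as attested in that witness
    lexeme: str       # abstract lexeme shared by all variants
    morphemes: tuple  # abstract morphemes recorded by the encoder

    def functions(self):
        # Derived from the encoding by rule, never stored by hand, so
        # identical encodings always receive identical analyses.
        return [RULES[m] for m in self.morphemes if m in RULES]


# Illustrative records only; not taken from any real edition.
CORPUS = [
    EncodedWord("W1", "nktwb", "ktb", ("prefix:n",)),
    EncodedWord("W2", "nktb", "ktb", ("prefix:n",)),
    EncodedWord("W1", "ktbt", "ktb", ("suffix:t",)),
]


def by_surface(corpus, surface):
    """Find words by the form actually written in a witness."""
    return [w for w in corpus if w.surface == surface]


def by_lexeme(corpus, lexeme):
    """Find words by abstract lexeme, regardless of spelling."""
    return [w for w in corpus if w.lexeme == lexeme]


def variants(corpus, lexeme, morphemes):
    """Collect the distinct spellings that share one abstract analysis."""
    return {
        w.surface
        for w in corpus
        if w.lexeme == lexeme and w.morphemes == morphemes
    }
```

Because the spelling and the abstract analysis are kept as separate fields, a query such as `variants(CORPUS, "ktb", ("prefix:n",))` gathers the orthographic variants of one form across witnesses, while the evidence behind each functional label remains retrievable from the record itself.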
“This way of encoding the verb forms attested in multiple textual witnesses provides us with a large
database from which language variation data can be retrieved,” van Peursen explained. “In some cases
language development is involved as well, and the data can be used for diachronic analysis” (van
Peursen 2010). Van Peursen cautioned, however, that all encodings developed by the TURGAMA Project
should ultimately be considered “hypotheses” that can be tested against the data they have created.
Cuneiform Texts and Sumerian
Generally considered to be the earliest writing system known in the world, cuneiform script was used
in the Ancient Near East from about 3200 BC to about 100 AD. While the largest number of cuneiform
texts represent the Sumerian language, the cuneiform script was adapted for other languages, including
Akkadian, Elamite, and Hittite. Sumero-Akkadian cuneiform is the most common by far and is a
complex “syllabic and ideographic writing system, with different signs for the various syllables”
(Cohen et al. 2004). There are approximately 1,000 different cuneiform signs that form a complex
script system where most signs are also “polyvalent,” or have multiple phonemic and semantic
realizations. Additional glyphs have also shown great “palaeographic development” over their three
millennia of use (Cohen et al. 2004). Sumerian has also been described as a “language isolate”: one for
which no other related languages have been identified and that therefore lacks resources such as a
“standardized sign list and comprehensive dictionary” (Ebeling 2007). These various factors make
digitizing, transliterating, and presenting cuneiform online a complicated task.
As indicated by the size of the previously described CDLI, there are hundreds of thousands of
cuneiform tablets and other texts around the world in both private and public collections. In addition to
the CDLI, there are a number of significant digital collections and corpora of cuneiform texts online.
This section briefly describes them, along with relevant literature.
85 The topic of textual variants found within various witnesses to a text and the need to develop appropriate encoding/markup models to represent them
was also reported in the sections on Greek and Sanskrit, and is more fully discussed in the section on Digital Editions.