26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


The eBeth Arké Syriac Studies Collection, described by its project website as “an electronic online resource library for Syriac studies,” complements the Syriac Studies Reference Library 74 that is hosted by BYU. Holdings for both collections have come from the Semitics/Institute of Christian Oriental Research Library 75 at the Catholic University of America in Washington, DC. When complete, eBeth Arké will include approximately 650 digitized items. 76 This collection of Syriac texts includes early printed catalogs, grammars, and lexicons, as well as many other rare volumes. The eBeth Arké Collection is hosted on Vivarium, the digital library of the Hill Museum & Manuscript Library (HMML), 77 which uses the proprietary software CONTENTdm 78 to manage its digital collection, and can be searched or browsed by a variety of options (e.g., keyword, name of author). Each digitized item can be viewed online as an image book (in PDF format), and users can also create a collection of favorites. Access to these books is also available through the Syriac Studies Reference Library at BYU, and the collection can be browsed by ancient author (e.g., Cyril of Alexandria, Philoxenus) or topic, or searched by keyword.

The largest research project in terms of providing access to digitized Syriac texts is currently under way at BYU and seeks to create a comprehensive Syriac corpus of electronic texts. The project website notes that no “coordinated and large scale effort has yet been attempted.” 79 BYU therefore began working to create a Syriac corpus in 2001, and its efforts were joined by David G. K. Taylor of Oxford University in 2004. The project is working with both printed editions of Syriac and manuscript collections. 80 An electronic lexicon will be fully integrated into this corpus, and the project has chosen to use Jessie Payne-Smith’s Compendious Syriac Dictionary. This dictionary is being converted into a lexical database, and each word that is tagged in the corpus will be linked to its appropriate lexicon entry.
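The linkage between tagged corpus words and lexicon entries can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the project's actual schema: the class names, the `entry_id` values, and the simplified ASCII transliterations and glosses below are all hypothetical and are not quoted from the Compendious Syriac Dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LexiconEntry:
    """One record in a hypothetical lexical database."""
    entry_id: str
    headword: str
    gloss: str


@dataclass(frozen=True)
class Token:
    """One corpus word; entry_id is None until the word has been tagged."""
    surface: str
    entry_id: Optional[str]


# Hypothetical entries, for illustration only.
LEXICON: Dict[str, LexiconEntry] = {
    "lex-0001": LexiconEntry("lex-0001", "malka", "king"),
    "lex-0002": LexiconEntry("lex-0002", "ktaba", "book"),
}


def resolve(token: Token, lexicon: Dict[str, LexiconEntry]) -> Optional[LexiconEntry]:
    """Return the lexicon entry a tagged token points to, or None if untagged or unknown."""
    if token.entry_id is None:
        return None
    return lexicon.get(token.entry_id)


# A surface form may differ from its headword (here a prefixed form),
# which is why the link is stored as an identifier rather than as a string match.
tokens = [
    Token("malka", "lex-0001"),
    Token("wktaba", "lex-0002"),
    Token("xyz", None),  # an untagged word has no link yet
]
linked = [(t.surface, resolve(t, LEXICON)) for t in tokens]
```

Storing an identifier on each token, rather than matching surface strings against headwords, keeps the link stable when a word appears with prefixes or suffixes attached.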

Several recent publications address the work currently being undertaken in the development of the “BYU-Oxford Corpus of Syriac Literature.” McClanahan et al. (2010) describe Syriac as an “underresourced” dialect in that there are few or no language tools (e.g., morphological analyzers, POS taggers) available to work with and relatively little “labeled data” (e.g., tagged corpora, digitized Syriac texts) upon which to train and test algorithms. 81 Despite these challenges, McClanahan et al. sought to replicate the type of manual annotation used to create the Peshitta New Testament (Kiraz 1994) on a far larger scale, using a number of computational tools and the data from this single labeled resource. In essence, they sought to annotate Syriac texts automatically in a “data-driven fashion,” an approach they labeled Syromorph. They created a probabilistic morphological analyzer for Syriac that made use of “five probabilistic sub-models that can be trained in a supervised fashion and combined in a joint model of morphological annotation” (McClanahan et al. 2010).
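As a rough intuition for what combining probabilistic sub-models can mean, the sketch below scores each candidate analysis of a word by summing the log-probabilities that five sub-models assign to it, then keeps the highest-scoring candidate. This is a generic illustration and an assumption on our part, not a description of how Syromorph itself combines its sub-models; the candidate names and probability values are invented for the example.

```python
import math
from typing import Dict, List


def joint_log_score(sub_model_probs: List[float]) -> float:
    """Sum of log-probabilities, equivalent to the log of their product.

    Assumes every probability is strictly positive; math.log raises
    ValueError for zero.
    """
    return sum(math.log(p) for p in sub_model_probs)


def best_analysis(candidates: Dict[str, List[float]]) -> str:
    """Return the name of the candidate with the highest joint score."""
    return max(candidates, key=lambda name: joint_log_score(candidates[name]))


# Hypothetical probabilities from five sub-models for two competing analyses.
candidates = {
    "analysis_A": [0.9, 0.6, 0.5, 0.8, 0.7],
    "analysis_B": [0.7, 0.9, 0.8, 0.6, 0.9],
}

chosen = best_analysis(candidates)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why the scores are summed rather than multiplied directly.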

One other major contribution of their work is the introduction of “novel algorithms” for the important natural language processing (NLP) subtasks of segmentation (often described as tokenization), dictionary linkage, and morphological tagging. All these algorithms made use of maximum entropy

74 http://www.lib.byu.edu/dlib/cua/

75 http://libraries.cua.edu/semicoll/index.html

76 Information about the digitization of these materials is available in a project report that was published in Hugoye, the online journal of Syriac studies (http://syrcom.cua.edu/Hugoye/Vol8No1/HV8N1PRBlanchard.html). Both the digitization and preparation of initial metadata were undertaken by a project team from Beth Mardutho of the Syriac Institute.

77 http://www.hmml.org/

78 http://www.contentdm.org/

79 The project notes that the most significant effort so far has been the work of the Comprehensive Aramaic Lexicon with the Peshitta, a Syriac translation of the Bible (http://cal1.cn.huc.edu/).

80 See http://cpart.byu.edu/page=114&sidebar for a list of initial texts that will be available.

81 The work reported here made use of an annotated version of the Peshitta New Testament as well as of a concordance described in Kiraz (1994) and (2000).
