Rome Wasn't Digitized in a Day - Council on Library and Information ...
individual texts can be much simpler; and (5) “an MVD is the format of an application not a standard.”
Schmidt suggests that MVD documents should be stored in a binary format, particularly if the content
of each text is in XML. In their current work, they have created a MultiVersion wiki tool where
scholars can work on cultural heritage texts that exist in multiple versions.
Computational Linguistics and Natural Language Processing
Computational linguistics 142 has been defined as “the branch of linguistics in which the techniques of
computer science are applied to the analysis and synthesis of language and speech.” 143 NLP has been
defined as an “area of computer science that develops systems that implement natural language
understanding,” and it is often listed as a subdiscipline of computational linguistics. 144 The use of
computational linguistics and of NLP has grown enormously in the humanities over the past 20 years,
and they have an even longer history in classical computing, as described in the introduction to this
review. 145 Bamman and Crane (2009) have argued that both computational linguistics and NLP will be
necessary components of any cyberinfrastructure for classics:
In deciding how we want to design a cyberinfrastructure for Classics over the next ten years,
there is an important question that lurks between “where are we now” and “where do we want
to be”: where are our colleagues already? Computational linguistics and natural language
processing generally perform best in high-resource languages—languages like English, on
which computational research has been focusing for over sixty years, and for which expensive
resources (such as treebanks, ontologies and large, curated corpora) have long been developed.
Many of the tools we would want in the future are founded on technologies that already exist
for English and other languages; our task in designing a cyberinfrastructure may simply be to
transfer and customize them for Classical Studies (Bamman and Crane 2009).
This section describes three applications from computational linguistics and NLP in terms of services
for digital classics as a whole: treebanks, automatic morphological analysis, and lexicons.
Treebanks<br />
A treebank can be defined as a “database of sentences which are annotated with syntactic information,
often in the form of a tree.” 146 Treebanks can be either manually or automatically constructed, and
they are used to support a variety of computational tasks such as those involved in corpus linguistics,
the study of syntactic features in computational linguistics, and training and testing parsers. There has
been a large growth in the number of historical treebanks in recent years, including treebanks in Greek
and Latin. Currently there are two major treebank projects for Latin, the Perseus Project’s Latin
Dependency Treebank (classical Latin) and the Index Thomisticus (IT) Treebank (medieval Latin), and
142 Relatively little work has been done utilizing computational linguistics for historical languages such as Latin and Greek, but for some fairly recent
experiments with Latin, see Sayeed and Szpakowicz (2004) and Casadio and Lambek (2005).
143 “computational linguistics plural noun” The Oxford Dictionary of English (revised edition). Ed. Catherine Soanes and Angus Stevenson. Oxford
University Press, 2005. Oxford Reference Online. Oxford University Press. Tufts University. 12 April 2010
144 “natural-language processing” A Dictionary of Computing. Ed. John Daintith and Edmund Wright. Oxford University Press, 2008. Oxford Reference
Online. Oxford University Press. Tufts University.
145 For some recent examinations of the potential of computational linguistics and of NLP for the humanities, see Sporleder (2010), de Jong (2009), and
Lüdeling and Zeldes (2007).
146 http://en.wiktionary.org/wiki/treebank