
Rome Wasn't Digitized in a Day - Council on Library and Information ...



individual texts can be much simpler; and (5) “an MVD is the format of an application not a standard.”
Schmidt suggests that MVD documents should be stored in a binary format, particularly if the content
of each text is in XML. In their current work, they have created a MultiVersion wiki tool where
scholars can work on cultural heritage texts that exist in multiple versions.

Computational Linguistics and Natural Language Processing

Computational linguistics [142] has been defined as “the branch of linguistics in which the techniques of
computer science are applied to the analysis and synthesis of language and speech.” [143] NLP has been
defined as an “area of computer science that develops systems that implement natural language
understanding,” and it is often listed as a subdiscipline of computational linguistics. [144] The use of
computational linguistics and of NLP has grown enormously in the humanities over the past 20 years,
and they have an even longer history in classical computing, as described in the introduction to this
review. [145] Bamman and Crane (2009) have argued that both computational linguistics and NLP will be
necessary components of any cyberinfrastructure for classics:

In deciding how we want to design a cyberinfrastructure for Classics over the next ten years,
there is an important question that lurks between “where are we now” and “where do we want
to be”: where are our colleagues already? Computational linguistics and natural language
processing generally perform best in high-resource languages—languages like English, on
which computational research has been focusing for over sixty years, and for which expensive
resources (such as treebanks, ontologies and large, curated corpora) have long been developed.
Many of the tools we would want in the future are founded on technologies that already exist
for English and other languages; our task in designing a cyberinfrastructure may simply be to
transfer and customize them for Classical Studies (Bamman and Crane 2009).

This section describes three applications from computational linguistics and NLP in terms of services
for digital classics as a whole: treebanks, automatic morphological analysis, and lexicons.

Treebanks

A treebank can be defined as a “database of sentences which are annotated with syntactic information,
often in the form of a tree.” [146] Treebanks can be either manually or automatically constructed, and
they are used to support a variety of computational tasks such as those involved in corpus linguistics,
the study of syntactic features in computational linguistics, and training and testing parsers. There has
been a large growth in the number of historical treebanks in recent years, including treebanks in Greek
and Latin. Currently there are two major treebank projects for Latin, the Perseus Project’s Latin
Dependency Treebank (classical Latin) and the Index Thomisticus (IT) Treebank (medieval Latin), and

[142] Relatively little work has been done utilizing computational linguistics for historical languages such as Latin and Greek, but for some fairly recent experiments with Latin, see Sayeed and Szpakowicz (2004) and Casadio and Lambek (2005).

[143] “computational linguistics plural noun.” The Oxford Dictionary of English (revised edition). Ed. Catherine Soanes and Angus Stevenson. Oxford University Press, 2005. Oxford Reference Online. Oxford University Press. Tufts University. 12 April 2010.

[144] “natural-language processing.” A Dictionary of Computing. Ed. John Daintith and Edmund Wright. Oxford University Press, 2008. Oxford Reference Online. Oxford University Press. Tufts University.

[145] For some recent examinations of the potential of computational linguistics and of NLP for the humanities, see Sporleder (2010), de Jong (2009), and Lüdeling and Zeldes (2007).

[146] http://en.wiktionary.org/wiki/treebank
