26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...

and these authors noted that even more interesting results could be gained by using such techniques with the large corpus of Latin that is growing online, from multiple editions of classical Latin authors to neo-Latin texts. In a larger corpus, a dynamic lexicon could be used to explore how classical Latin authors such as Caesar and Ovid used words differently, or the use of a word could be compared between classical and neo-Latin texts. Another advantage of a dynamic lexicon is that rather than presenting several highly illustrative examples of word usage (as is done with the Cambridge Greek English Lexicon), it can present as many examples as are found in the corpus. Finally, because the dynamic lexicon can search across Latin and Greek texts using English translations of Greek and Latin words, it offers a “close approximation to real cross-language information retrieval.” Perhaps most important, Bamman and Crane argue that their work to create a dynamic lexicon illustrates how even small structured-knowledge sources can be used to mine interesting patterns from larger collections:

The application of structured knowledge to much larger but unstructured collections addresses a gap left by the massive digitization efforts of groups such as Google and the Open Content Alliance (OCA). While these large projects are creating truly million-book collections, the services they provide are general (e.g., key term extraction, named entity analysis, related works) and reflect the wide array of texts and languages they contain. By applying the language-specific knowledge of experts (as encoded in our treebank), we are able to create more specific services to complement these general ones already in place. In creating a dynamic lexicon built from the intersection of a 3.5 million word corpus and a 30,457 word treebank, we are highlighting the immense role that even very small structured knowledge sources can play (Bamman and Crane 2008).

The authors also observed that since many of the technologies used to build the lexicon, such as word-sense disambiguation and syntactic parsing, are modular, any separate improvements made to these algorithms could be incorporated back into the lexicon. Similarly, as tagging and parsing accuracy improve with the size of a corpus and as the training corpus of Latin grows, so will the treebank. In addition, this work illustrates how small-domain tools might be repurposed to work with larger collections.

Bamman and Crane (2009) have investigated these issues further in their overview of computational linguistics and lexicography. They noted that while the TLG and Perseus provide “dirty results,” or the ability to find all the instances of a lemma in their collections, the TLL gives a smaller subset of impeccably precise results. Bamman and Crane argued that in the future, a combination of these two approaches will be necessary, and lexicography will need to utilize both machine-learning techniques that learn from large textual collections and the knowledge and labor invested in handcrafted lexicons to help such techniques learn. The authors also noted that new lexicons built for a classical cyberinfrastructure would need to support new levels of research:

Manual lexicography has produced fantastic results for Classical languages, but as we design a cyberinfrastructure for Classics in the future, our aim must be to build a scaffolding that is essentially enabling: it must not only make historical languages more accessible on a functional level, but intellectually as well; it must give students the resources they need to understand a text while also providing scholars the tools to interact with it in whatever ways they see fit (Bamman and Crane 2009).
