26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


implications of handling archaeological information digitally… this is [SIC] also pertains to the broader arts and humanities VRE agenda. The act of publishing a database of archaeological information implicitly disguises the fact that creating the database in the first place is an interpretive process (Dunn 2009).

Dunn’s warning is an important reminder that the design of any infrastructure in the humanities must take into account the interpretative nature of most humanities scholarship. He also detailed how archaeological workflows are “idiosyncratic, partly informal, and extremely difficult to define,” all factors that make them hard to translate into a digital infrastructure (Dunn 2009).

Recent research combining topic modeling with an archaeological database has illustrated how subjective human interpretations of archaeological data can be. David Mimno (2009) used topic modeling and a database of objects discovered in houses from Pompeii (note 259) to examine the validity of the typological classifications that were initially assigned to these objects. This database contains 6,000 artifact records for finds in 30 architecturally similar houses in Pompeii, and each artifact is labeled with one of 240 typological categories and the room in which it was found. Because of the large amount of data available, Mimno argued that the use of statistical data-mining tools could help provide some new insights into these data:

In this paper we apply one such tool, statistical topic modeling ... in which rooms are modeled as having mixtures of functions, and functions are modeled as distributions over a “vocabulary” of object types. The purpose of this study is not to show that topic modeling is the best tool for archeological investigation, but that it is an appropriate tool that can provide a complement to human analysis. To this aim, we attempt to provide a perspective on several issues raised by Allison, that is, if not unbiased, then at least mathematically concrete in its biases (Mimno 2009).
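The modeling idea in the passage quoted above (rooms as mixtures of functions, functions as distributions over object types) can be illustrated with a small sketch. This is not Mimno's implementation: the room names, object labels, and the helper functions `encode` and `fit_topic_model` are hypothetical, and the collapsed Gibbs sampler for latent Dirichlet allocation used here is an assumption, since the text does not say which topic-model variant or inference method was applied. The sketch uses only room-level co-occurrence of integer-coded object types, with no typological information.

```python
import random

# Hypothetical toy data: each room lists the object-type labels found in it.
rooms = {
    "room_a": ["pot", "pan", "ladle", "pot", "pan"],
    "room_b": ["pot", "ladle", "pan", "ladle"],
    "room_c": ["lamp", "coin", "key", "lamp", "coin"],
    "room_d": ["coin", "key", "lamp", "key"],
    "room_e": ["pot", "lamp", "pan", "coin"],
}


def encode(rooms):
    """Replace each object-type label with an integer id."""
    vocab = sorted({label for objs in rooms.values() for label in objs})
    index = {label: i for i, label in enumerate(vocab)}
    docs = [[index[label] for label in objs] for objs in rooms.values()]
    return docs, vocab


def fit_topic_model(docs, n_types, n_topics=2, alpha=0.5, beta=0.1,
                    sweeps=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi)."""
    rng = random.Random(seed)
    room_topic = [[0] * n_topics for _ in docs]            # counts per room
    topic_type = [[0] * n_types for _ in range(n_topics)]  # counts per topic
    topic_total = [0] * n_topics
    assign = []
    # Start from a random topic assignment for every object.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            room_topic[d][z] += 1
            topic_type[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each object's topic given all other assignments.
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                room_topic[d][z] -= 1
                topic_type[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (room_topic[d][k] + alpha)
                    * (topic_type[k][w] + beta)
                    / (topic_total[k] + beta * n_types)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                room_topic[d][z] += 1
                topic_type[z][w] += 1
                topic_total[z] += 1
    # Smoothed estimates: theta = room-over-topics, phi = topic-over-types.
    theta = [[(c + alpha) / (len(doc) + alpha * n_topics) for c in row]
             for row, doc in zip(room_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + beta * n_types)
            for c in topic_type[k]] for k in range(n_topics)]
    return theta, phi


docs, vocab = encode(rooms)
theta, phi = fit_topic_model(docs, n_types=len(vocab))

# List the most probable object types under each inferred "function".
for k, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:3]
    print(f"topic {k}:", [vocab[w] for w in top])
```

Each row of `theta` is one room's mixture over inferred functions, and each row of `phi` is one function's distribution over object types. On a real database the number of topics and the smoothing parameters `alpha` and `beta` would need to be chosen with care, and the resulting topics would still require interpretation by an archaeologist.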

In common archaeological practice, Mimno explained, artifacts that are excavated are removed to secure storage, and while their location is carefully noted in modern digs, artifacts in storage are typically analyzed “in comparison to typologically similar objects rather than within their original context.” Consequently, Mimno reasoned that determining the function of many artifacts had been driven by “arbitrary tradition” and the perception of individual researchers in terms of what an artifact resembles. Two classes of artifacts, in fact, the casseruola (casserole dish) and forma di pasticceria (pastry mold), were named based on similarities to nineteenth-century household objects, and the creator of the Pompeii database (Penelope Allison) contended that modern archaeologists often made unvalidated assumptions about objects based on their modern names (Allison 2001).

For these reasons, Mimno decided to use topic modeling to reduce this bias and explored the function of artifact types using only object co-occurrence data and no typology information. All object descriptions were reduced to integers, and a statistical topic model was then used to detect “clusters of object cooccurrence” that might indicate functions. While Mimno admitted that this system still relied on experts having accurately classified physical objects into appropriate categories in the first place, no other archaeological assumptions were made by the training model. The basic assumption was that if two objects had similar patterns of use, they should have a high probability of co-occurring in one or more “topics.” Initial analysis of a topic model for the casseruola and forma di pasticceria showed them to have little connection to other food-preparation objects and thus supported Allison’s claim that the modern names for these items are incorrect. This work illustrates how

259 http://www.stoa.org/projects/ph/home
