26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...



Additionally, Schmidt and Colomb postulated that the lack of an “accurate model of textual variation” and the ability to implement such a model in a digital world have continued to frustrate many humanists.

A related problem identified by Schmidt and Colomb is that of overlapping hierarchies, which arise when different markup structures (e.g., generic structural markup, linguistic markup, literary markup) overlap in a text. Markup is said to overlap in that “the tags in one perspective are not always well formed with respect to tags in another” (e.g., as in well-formed XML). Schmidt and Colomb proposed that the term “overlapping hierarchies” is essentially incorrect: “Firstly, not all overlap is between competing hierarchies, and secondly what is meant by the term ‘hierarchy’ is actually ‘trees’, that is a specific kind of hierarchy in which each node, except for the root, has only one parent.” They argued that although there have been over 50 papers dealing with this topic, one fundamental and common weakness in the proposed approaches was that they offered solutions to problematic markup by using markup itself. The authors further asserted that all cases of overlapping hierarchies are also cases of textual variation, even if the reverse is not always true. “The overlapping hierarchies problem, then, boils down to variation in the metadata,” Schmidt and Colomb declared, adding that “it is entirely subsumed by the textual variation problem because textual variation is variation in the entire text, not only in the markup” (Schmidt and Colomb 2009). They thus concluded that textual variation was the problem that needed solving.
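To make the notion of overlap concrete, the following minimal sketch checks whether spans from two markup perspectives cross each other, which is the situation a single well-formed XML tree cannot express directly. The sample text, the character offsets, and the function names `crosses` and `crossing_pairs` are illustrative assumptions and do not come from Schmidt and Colomb.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character range [start, end)

def crosses(a: Span, b: Span) -> bool:
    """Return True when two spans partially overlap (neither nested nor disjoint)."""
    (a_start, a_end), (b_start, b_end) = a, b
    return (a_start < b_start < a_end < b_end) or (b_start < a_start < b_end < a_end)

def crossing_pairs(first: List[Span], second: List[Span]) -> List[Tuple[Span, Span]]:
    """List every pair of spans, one from each perspective, that cross."""
    return [(a, b) for a in first for b in second if crosses(a, b)]

# Hypothetical sample text, marked up from two perspectives.
text = "The sun sets. Night falls on Rome."

# Linguistic perspective: sentences.
sentences = [(0, 13), (14, 34)]    # "The sun sets." / "Night falls on Rome."

# Literary perspective: verse lines (an invented line break after "Night").
verse_lines = [(0, 19), (20, 34)]  # "The sun sets. Night" / "falls on Rome."

# The second sentence starts inside the first verse line and ends outside it,
# so the two perspectives cannot both be encoded as properly nested elements.
print(crossing_pairs(sentences, verse_lines))
```

Spans that are nested or disjoint can share one tree, whereas a crossing pair forces an encoder to fragment one element or resort to workarounds such as milestones, which is the markup-fixing-markup pattern the authors criticized.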

Maintaining that neither version control systems nor multiple sequence alignment (inspired by bioinformatics) can adequately address the problem of text variants, Schmidt and Colomb proposed modeling text variation as either a “minimally redundant directed graph” or as an “ordered list of pairs” where each pair contains a “set of versions and a fragment of text or data.” The greatest challenge with variant graphs, they explained, is how to process them efficiently. At a minimum, users would need functions for reading a single version of a text, searching a multiversion text, comparing two versions of a text, determining what was a variant of what else, creating and editing a variant graph, and separating content and variation. The solution proposed by Schmidt (2010) is the multiversion document format (MVD):

The Multi-Version Document or MVD model represents all the versions of a work, whether they arise from corrections to a text or from the copying of one original text into several variant versions, or some combination of the two, as four atomic operations: insertion, deletion, substitution, and transposition. … An MVD can be represented as a directed graph, with one start node and one end-node. … Alternatively it can be serialized as a list of paired values, each consisting of a fragment of text and a set of versions to which that fragment belongs. As the number of versions increases, the number of fragments increases, their size decreases, and the size of their version-sets increases. This provides a good scalability as it trades off complexity for size, something that modern computers are very good at handling. By following a path from the start-node to the end-node any version can be recovered. When reading the list form of the graph, fragments not belonging to the desired version are merely skipped over (Schmidt 2010).
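The list form described in the quotation can be sketched in a few lines. This is only an illustration of the idea of pairing text fragments with version sets, not Schmidt's actual MVD file format or software. The sample sentences, the version labels "A", "B", and "C", and the helpers `read_version` and `compare` are assumptions made for the example, and transpositions are not modeled.

```python
from typing import FrozenSet, List, Tuple

# Each pair holds the set of versions a fragment belongs to, plus the fragment.
Pair = Tuple[FrozenSet[str], str]

ALL = frozenset({"A", "B", "C"})

# Hypothetical work with three versions:
#   A: "The quick brown fox jumps."
#   B: "The quick red fox jumps."
#   C: "The quick red fox jumps high."
mvd: List[Pair] = [
    (ALL, "The quick "),
    (frozenset({"A"}), "brown"),
    (frozenset({"B", "C"}), "red"),
    (ALL, " fox jumps"),
    (frozenset({"C"}), " high"),
    (ALL, "."),
]

def read_version(pairs: List[Pair], version: str) -> str:
    """Recover one version by skipping fragments that do not belong to it."""
    return "".join(fragment for versions, fragment in pairs if version in versions)

def compare(pairs: List[Pair], left: str, right: str) -> Tuple[List[str], List[str]]:
    """Return the fragments found only in `left` and only in `right`."""
    only_left = [f for vs, f in pairs if left in vs and right not in vs]
    only_right = [f for vs, f in pairs if right in vs and left not in vs]
    return only_left, only_right

for label in sorted(ALL):
    print(label, "->", read_version(mvd, label))
print(compare(mvd, "A", "C"))
```

Text shared by every version is stored once, so adding a version mostly adds small fragments and enlarges version sets, which matches the trade-off of complexity for size described in the quotation.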

Schmidt listed a number of benefits of the MVD format for humanists, including the following: (1) it supports the automatic computation of insertions, deletions, variants, and transpositions between a set of versions; (2) MVDs are agnostic about the content format of individual versions, so they can be used with any generalized markup or plain text; (3) an MVD is “not a collection of files” and instead stores “only the differences between all the versions of a work as one digital entity and interrelates them” (Schmidt 2010); (4) since the MVD stores the overlapping structures of a set of versions, the markup of
