Rome Wasn't Digitized in a Day - Council on Library and Information ...

As part of this phase, the VMR at the University of Birmingham [410] also plans to join with a parallel VMR being built at the University of Münster in Germany and provide seamless access to both collections. Four features distinguish Interedition from previous manuscript digitization projects: (1) it is designed around granular metadata, so instead of simply presenting metadata records for whole manuscripts, records are provided for each page image, for the transcription of the text on that page, and for specifying what text is on that page; (2) “the metadata states the exact resource type associated with the URL specified in each record” (e.g., whether a text file is in XML and what schema has been used); (3) all VMR materials will be stored in Birmingham’s institutional repository and be accessible through the library online public access catalog (OPAC); and (4) the VMR will support full reuse of its materials, not just access to them. This fourth feature is perhaps the most distinctive, for as the survey of projects in this section shows, much manuscript-digitization work has focused on supporting the discovery of digital manuscripts for use online rather than on enabling scholars to obtain the raw digital materials and use them in their own projects. The VMR plans to provide access to all the metadata it creates through a syndicated RSS feed so that users can create their own interfaces to VMR data.
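A minimal sketch of what page-level records and their syndication might look like, using only Python's standard library. The `PageRecord` fields, the example identifiers and URLs, and the choice of the RSS `<category>` element to carry the resource type are all assumptions made for illustration; the source does not describe the actual VMR record format or feed layout.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class PageRecord:
    """One granular record: a single resource for a single manuscript page."""
    manuscript: str     # hypothetical manuscript identifier
    page: str           # hypothetical page or folio label
    resource_type: str  # exact type of what the URL returns, e.g. "image/jpeg"
    url: str            # hypothetical resource location
    content: str        # which text the page carries


def to_rss(records, feed_title, feed_link):
    """Serialize records as a small RSS 2.0 feed, one <item> per record."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Page-level manuscript records"
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = (
            f"{rec.manuscript}, {rec.page}: {rec.content}"
        )
        ET.SubElement(item, "link").text = rec.url
        # Assumption: the resource type is carried in <category> so that a
        # client knows what kind of file the link returns before fetching it.
        ET.SubElement(item, "category").text = rec.resource_type
    return ET.tostring(rss, encoding="unicode")


records = [
    PageRecord("MS-01", "f. 1r", "image/jpeg",
               "http://example.org/ms01/1r.jpg", "John 1:1-10"),
    PageRecord("MS-01", "f. 1r", "text/xml; schema=TEI",
               "http://example.org/ms01/1r.xml", "John 1:1-10"),
]
feed = to_rss(records, "Example manuscript feed", "http://example.org/feed")
```

Because the image and the transcription of the same page are separate records, a client reading the feed can build its own interface by pairing items that share a manuscript and page label.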
In addition, the VMR plans to allow other users to add material by creating a “metadata record for the resource following VMR protocols” and then adding it to the RSS feed of any VMR project. The importance of supporting new collaboration models that allow many individuals to contribute related digital manuscript resources has also been discussed in Robinson (2009, 2010). While the amount of manuscript metadata, digital images, and transcriptions available online has increased, there are still few easy ways to link between them if they exist in different collections. A related problem is the limited ability to automate, even partially, the linking of manuscript images with their transcriptions, even when both are known to exist. Arianna Ciula has argued that the work of palaeographers would greatly benefit from descriptive encoding or technology that supported more sophisticated linking between images and texts, particularly “the possibility to export the association between descriptions of specific palaeographical properties and the coordinates within a manuscript image in a standard format such as the encoding proposed by the TEI facsimile module or SVG” (Ciula 2009). Recently Hugh Cayless has developed a series of tools and techniques to assist in this process that have been grouped under the name img2XML [411] and have been described in detail in Cayless (2008, 2009).
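To make Ciula's suggestion concrete, the following sketch exports region coordinates and a short description using elements from the TEI facsimile module (`<facsimile>`, `<surface>`, `<graphic>`, `<zone>` with its `ulx`, `uly`, `lrx`, `lry` attributes). The function name, the zone identifier, the coordinates, and the use of a `<note>` child to hold the description are illustrative assumptions; the output has not been validated against a TEI schema and is not taken from any of the projects discussed.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def facsimile_with_zones(image_url, zones):
    """Build a TEI <facsimile> holding one <zone> per described region.

    zones: iterable of (zone_id, (ulx, uly, lrx, lry), description), where
    the four numbers are the upper-left and lower-right pixel coordinates.
    """
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    facsimile = ET.Element(f"{{{TEI_NS}}}facsimile")
    surface = ET.SubElement(facsimile, f"{{{TEI_NS}}}surface")
    ET.SubElement(surface, f"{{{TEI_NS}}}graphic", url=image_url)
    for zone_id, (ulx, uly, lrx, lry), description in zones:
        zone = ET.SubElement(surface, f"{{{TEI_NS}}}zone", {
            f"{{{XML_NS}}}id": zone_id,
            "ulx": str(ulx), "uly": str(uly),
            "lrx": str(lrx), "lry": str(lry),
        })
        # Assumption: the palaeographical description travels as a <note>.
        ET.SubElement(zone, f"{{{TEI_NS}}}note").text = description
    return ET.tostring(facsimile, encoding="unicode")


tei = facsimile_with_zones(
    "page1.jpg",
    [("z1", (10, 20, 110, 60), "hypothetical description of a letter form")],
)
```

A transcription can then point at a region through the TEI `facs` attribute, for example `facs="#z1"` on the element that encodes the corresponding line or word.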
As has been previously discussed by Monella (2008) and Boschetti (2009), Cayless noted that manuscript transcriptions are typically published in one of two formats, either as a critical edition where the editors’ comments are included as an integral part of the text or as a diplomatic transcription that tries to “faithfully reproduce the text” (Cayless 2009). While TEI allows the production of both types of transcriptions from the same marked-up text, Cayless argued that the next important step is to automatically link such transcriptions to their page images. While many systems link manuscript images and transcriptions on the page level, [412] the work of Cayless sought to support more granular linking, such as at the level of individual lines or even words.

[410] The VMR at Birmingham has been funded by JISC and is being created by the Institute of Textual Scholarship and Electronic Editing (ITSEE), http://www.itsee.bham.ac.uk/
[411] http://github.com/hcayless/img2xml
[412] One such system is EPPT (discussed earlier in this paper), and another tool listed by Cayless is the Image Markup Tool (IMT), http://www.tapor.uvic.ca/%7Emholmes/image_markup/index.php, which allows a user to annotate “rectangular sections of an image” by using a drawing tool with which they can first “draw shape overlays on an image”; these overlays can then be linked to “text annotations entered by the user” (Cayless 2009).
Cayless thus developed a method for generating a “Scalable Vector Graphics (SVG) [413] representation of the text in an image of a manuscript” (Cayless 2009). This work was inspired by experiments conducted using the OpenLayers [414] JavaScript library by Tom Elliott and Sean Gillies to trace the text on a sample inscription, [415] and Cayless sought to create a “toolchain” that used only open-source software. To begin, Cayless converted JPEG images of manuscripts into a bitmap format using ImageMagick; [416] he then used an open-source tool called Potrace [417] to convert the bitmap to SVG. The SVG conversion process required some manual intervention, and an SVG editor called Inkscape was used to clean up the resulting SVG files. The resulting SVG documents were analyzed using a Python script that attempted to “detect lines in the image and organize paths within those lines into groups within the document” (Cayless 2008). After the text image within a larger manuscript page image had been converted into SVG paths, these paths could be grouped within the document to mark the words therein, and these groups could then be linked using various methods to tokenized versions of the transcriptions (Cayless 2009).
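The toolchain described above can be sketched in two parts. This is not Cayless's code: the function names, the `.pbm` intermediate format, the 50 percent cutoff, the 0.5 overlap threshold, and the assumption that each traced path has already been reduced to a bounding box are all illustrative. The first function only builds command lines for ImageMagick's `convert` (with its `-threshold` option) and for `potrace` (with `--svg` and `-o`) and does not run them; the second groups path bounding boxes into lines with a simple vertical-overlap test of the kind Cayless (2008) mentions.

```python
from pathlib import PurePosixPath


def conversion_commands(jpeg_name, cutoff_percent=50):
    """Build (not run) the commands for JPEG -> bitmap -> SVG."""
    jpeg = PurePosixPath(jpeg_name)
    bitmap = jpeg.with_suffix(".pbm")
    svg = jpeg.with_suffix(".svg")
    return [
        # The black/white cutoff is a manual choice; 50% is only a default.
        ["convert", str(jpeg), "-threshold", f"{cutoff_percent}%", str(bitmap)],
        ["potrace", "--svg", "-o", str(svg), str(bitmap)],
    ]


def vertical_overlap(a, b):
    """Shared height of two boxes as a fraction of the shorter box.

    Boxes are (x_min, y_min, x_max, y_max) in image coordinates.
    """
    shared = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    if shared <= 0 or shorter <= 0:
        return 0.0
    return shared / shorter


def group_into_lines(boxes, min_overlap=0.5):
    """Greedily assign each box to the first line it overlaps enough."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for line in lines:
            extent = (0, line["top"], 0, line["bottom"])
            if vertical_overlap(box, extent) >= min_overlap:
                line["boxes"].append(box)
                line["top"] = min(line["top"], box[1])
                line["bottom"] = max(line["bottom"], box[3])
                break
        else:
            # No existing line overlaps enough, so this box starts a new one.
            lines.append({"top": box[1], "bottom": box[3], "boxes": [box]})
    # Within each line, order the boxes left to right.
    return [sorted(line["boxes"], key=lambda b: b[0]) for line in lines]


commands = conversion_commands("page.jpg", cutoff_percent=60)
lines = group_into_lines(
    [(12, 21, 22, 29), (0, 0, 10, 10), (12, 2, 20, 11), (0, 20, 10, 30)]
)
```

Each command list could be passed to `subprocess.run(cmd, check=True)` on a machine where both tools are installed. Once paths are grouped into lines, the n-th group can be paired with the n-th line of a tokenized transcription, which is one simple way to realize the linking step.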
Cayless then used the OpenLayers library to display the linked manuscript image and TEI transcription simultaneously; importantly, OpenLayers “allows the insertion of a single image as a base layer (though it supports tiled images as well), so it is quite simple to insert a page image into it” (Cayless 2008). This initial system also required the addition of several functions to the OpenLayers library, particularly the ability to support “paths and groups of paths.” Ultimately, Cayless reported that:

“The experiments outlined above prove that it is feasible to go from a page image with a TEI-based transcription to an online display in which the image can be panned and zoomed, and the text on the page can be linked to the transcription (and vice-versa). The steps in the process that have not yet been fully automated are the selection of a black/white cutoff for the page image, the decision of what percentage of vertical overlap to use in recognizing that two paths are members of the same line, and the need for line beginning (<lb/>) tags to be inserted into the TEI transcription” (Cayless 2008).

While automatic analysis of the SVG output has supported the detection of lines of text in page images, work continues to allow the automatic detection of words or other features in the image. Cayless concluded that this research raised two issues. To begin with, further research would need to consider what structures (beyond lines) could be detected in an SVG document and how they could be linked to transcriptions.
Second, TEI transcriptions often define document structure in a “semantic” rather than physical way, and even though line, word, and letter segments can be marked in TEI, they often are not. This makes it difficult, if not impossible, to automate the linking process. Cayless proposed that a standard would need to be developed for this type of linking. Other experiments in automatic linking of images and transcriptions have been conducted by the TILE project. [418] This project seeks to build a “web-based image markup tool” [419] and is based on the existing code of the Ajax XML (AXE) image encoder. [420] It will be interoperable with both the EPPT and the

[413] SVG is “a language for describing two-dimensional graphics and graphical applications in XML.” http://www.w3.org/Graphics/SVG/
[414] http://trac.openlayers.org/wiki/Release/2.6/Notes
[415] http://sgillies.net/blog/691/digitizing-ancient-inscriptions-with-openlayers
[416] http://www.imagemagick.org/
[417] http://potrace.sourceforge.net/
[418] This project’s approach to digital editions was discussed earlier in this paper.
[419] An initial release of TILE 0.9 is now available for download at http://mith.umd.edu/tile/, including extensive step-by-step documentation (http://mith.info/tile/documentation/) and a forum for users. This initial version includes an image markup tool, importing and exporting tools, and a semiautomated line recognizer. There is also a TILE sandbox (http://mith.umd.edu/tile/sandbox/), a “MITH-hosted version of TILE allowing users to try the tool before installing their own copy.”
[420] http://mith.info/AXE/
