Slovenian Biographical Lexicon: From a Digital Edition to an On ...


INFuture2009: “Digital Resources and Knowledge Sharing”

tion of data pertaining to people and places and improved specifications for encoding textual alternatives. Additionally, TEI P5 takes advantage of the power of XML schema languages, so that other XML tag-sets, such as MathML or SVG, can now be referenced from within a TEI document, and a TEI document can be embedded within other types of XML documents, such as METS and MODS records (Burnard, Bauman, 2007). This turned out to be crucial for the implementation of our on-line repository, since it makes TEI a well-behaved XML citizen, able to take part in any XML processing chain or composite document, however complex.
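The namespace mechanism that lets TEI coexist with other XML vocabularies can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` parser and a hand-written fragment in which a MathML `math` element sits inside a TEI `formula` element. The fragment is an invented illustration, not taken from the SBL data, and it is not a complete, schema-valid TEI document (it has no `teiHeader`, for instance). The helper name `namespaces_used` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the two vocabularies mixed in one document.
TEI_NS = "http://www.tei-c.org/ns/1.0"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Illustrative fragment only (an assumption, not SBL data):
# a TEI paragraph that embeds a MathML expression.
DOC = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <p>Energy: <formula><math xmlns="{MATHML_NS}"><mi>E</mi></math></formula></p>
  </body></text>
</TEI>"""


def namespaces_used(root):
    """Return the set of namespace URIs of all elements below root."""
    # ElementTree stores qualified names as "{uri}local".
    return {el.tag[1:].split("}")[0] for el in root.iter() if el.tag.startswith("{")}


root = ET.fromstring(DOC)
math_elements = root.findall(f".//{{{MATHML_NS}}}math")
print(namespaces_used(root))
print(len(math_elements))
```

Because each element carries its namespace, a generic XML tool can pick out the MathML part without knowing anything about TEI, which is the property that makes composite documents practical.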

XML Data Source Structure

The vast majority of SBL articles present information on the life and actions of a single person, while some describe well-known families, detailing the life and work of several family members. An article usually starts with the name of the person or family and its variants (mostly those used towards the end of the person's life, or the most generally used ones), followed by a chronological summary of the person’s life and activity, including birth, death, locations, occupations, activities etc. An article consists of one paragraph (the usual case) or several, depending on how exhaustive it is. It is written in dense language, using abbreviations wherever possible, and ends with a brief bibliography and other materials relevant to the person, such as portraits or photographs.

The text of the articles was digitized and manually revised to fix OCR errors before it was automatically converted into the basic TEI XML format. In the next stage, segments of text that needed to be marked up but could not be identified automatically had to be tagged manually. This applied in particular to details such as the different variants of names (linguistic and orthographic variations, married names, ecclesiastic names and titles, pseudonyms, complex name parts in the case of foreign names, names with a denotation of nobility etc.), which made the process slow and error-prone. Since the original data was not normalized, considerable effort had to be spent to achieve high-quality TEI XML markup, and some data normalization work is still ongoing. (The major aspects of this conversion process are reported in more detail in Vide Ogrin, Erjavec 2007.) In this manner, essential information about the subject of the article and its bibliographical section has been encoded with special-purpose elements from the TEI P5 biographical and prosopographical modules. Each article is represented as a [...] element starting with a [...] element, which contains detailed information on the subject of the article (names, sex, birth and death dates and locations, locations of activities, occupations or activities etc.) as well as metadata (volume and year of first publication, author of the article, revision status etc.). This element is followed by one or more [...] elements with the article text and a [...] element with the extracted bibliographical data.
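The article layout described above can be sketched as follows. The element names did not survive in this copy of the text, so the names used here (`div`, `person`, `persName`, `sex`, `birth`, `death`, `occupation`, `p`, `listBibl`, `bibl`) are assumptions drawn from the general TEI P5 vocabulary, not the confirmed SBL encoding. All values are invented placeholders, the fragment has not been validated against any TEI schema, and the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Assumed layout: one container per article, a person description first,
# then the article paragraphs, then the extracted bibliography.
# Every value below is a made-up placeholder.
ARTICLE = f"""<div xmlns="{TEI_NS}" type="article">
  <person>
    <persName type="main">Example, Person</persName>
    <persName type="pseudonym">E. P.</persName>
    <sex>M</sex>
    <birth when="1850-01-01"><placeName>Place A</placeName></birth>
    <death when="1900-12-31"><placeName>Place B</placeName></death>
    <occupation>writer</occupation>
  </person>
  <p>Article text, first paragraph.</p>
  <p>Article text, second paragraph.</p>
  <listBibl><bibl>Placeholder bibliographic entry</bibl></listBibl>
</div>"""


def summarize(article_xml):
    """Extract the structured header data and basic counts from one article."""
    div = ET.fromstring(article_xml)
    person = div.find("tei:person", NS)
    return {
        # Each name variant keeps its type (main form, pseudonym, ...).
        "names": [(n.get("type"), n.text) for n in person.findall("tei:persName", NS)],
        "born": person.find("tei:birth", NS).get("when"),
        "died": person.find("tei:death", NS).get("when"),
        "paragraphs": len(div.findall("tei:p", NS)),
        "bibl": [b.text for b in div.findall("tei:listBibl/tei:bibl", NS)],
    }


summary = summarize(ARTICLE)
print(summary)
```

Keeping the name variants and dates in dedicated elements rather than in running text is what allows this kind of direct lookup. The dense, abbreviated prose of the paragraphs would be much harder to query reliably.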
