24.04.2013 Views

Experiences from the Europeana regia project - Herzog August ...


Cataloguing manuscripts in an international context

Experiences from the Europeana Regia project

Torsten Schaßan – Encodage de documents – Caen – 2012, March 30


Europeana Regia: partners

• "A digital cooperative library of royal manuscripts in Medieval and Renaissance Europe"
• Co-funded by the European Commission (ICT-PSP, 50%)
• Jan 2010 - June 2012
• 5 partners
  – BnF Bibliothèque nationale de France (and many municipal libraries)
  – BSB Bayerische Staatsbibliothek, Munich
  – BHUV Biblioteca historica, Universitat de Valencia
  – HAB Herzog August Bibliothek Wolfenbüttel
  – KBR Bibliothèque royale de Belgique, Brussels


ER: collections + formats

• 3 collections
  – Bibliotheca Carolina (8th/9th cent., 425 mss)
  – The Library of King Charles V of France (14th cent., 167 mss)
  – The library of the Aragonese Kings of Naples (15th cent., 282 mss)
• Information in 6 languages
  – Catalan, Dutch, English, French, German, Spanish
• Descriptions in 5 formats
  – MARC, EAD, TEI, MAB, MXML (= format of the German national manuscript database Manuscripta Mediaevalia)


Work packages

1. Management
2. Specification of metadata
3. Integration of metadata
4. Digitisation
5. Integration of images
6. Dissemination

WP leaders: BnF 1+6, HAB 2+3, BSB 4+5


WP2 Specification of metadata

• This work package aims at consolidating the list and format of metadata to be used by each participant.
  – formats
  – level of metadata
  – quality of metadata
  – organise ingestion format


D2.1 State of the art in metadata

• global survey of cataloguing and metadata standards
• obtain a matched OAI extraction despite the different formats used in libraries (e.g. EAD and TEI)
  – BnF: EAD
  – KBR: local DB, had to decide upon the format (= TEI)
  – BSB: MAB → METS/MODS, MXML → TEI
  – HAB: TEI
  – BHUV: MARC-XML
  – Lyons: special data, with mapping to MODS
• quality and amount of metadata varied widely


D2.2 Minimum metadata

• Which information about a manuscript is necessary as a very short overview and sufficient for basic orientation?
  – the manuscript identifiers (current and former shelf marks, hosting institution or possessors)
  – a summary title
  – basic historical information (date of origin, place of origin, previous owners)
  – basic material information (material of the support, number of leaves, size of leaves)
  – introductory bibliographical information


D2.2 Minimum metadata

• Europeana's view (according to ESE v3.2):
  – Obligatory: europeana:rights
  – Strongly recommended
    • dc:title
    • dcterms:alternative
    • dc:creator
    • dc:date
    • dc:contributor
    • dcterms:created
    • dcterms:issued
  – Consider how your data will perform in response to "who", "what", "where", and "when" queries
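The ESE field lists above can be turned into a simple completeness check. The following is a minimal sketch, not the project's tooling: it assumes a record is held as a plain dictionary from element name to a list of values (an assumption made for illustration), and the sample record contains invented values.

```python
# Field lists as given on the slide above (ESE v3.2 as summarised there).
OBLIGATORY = ["europeana:rights"]
STRONGLY_RECOMMENDED = [
    "dc:title", "dcterms:alternative", "dc:creator", "dc:date",
    "dc:contributor", "dcterms:created", "dcterms:issued",
]


def check_record(record):
    """Return (missing_obligatory, missing_recommended) for one record.

    `record` maps element names to lists of string values. This flat
    layout is an assumption of the sketch, not a format from the project.
    """
    def is_missing(name):
        # An element counts as missing if absent or holding only blank values.
        values = record.get(name, [])
        return not any(v.strip() for v in values)

    missing_obligatory = [n for n in OBLIGATORY if is_missing(n)]
    missing_recommended = [n for n in STRONGLY_RECOMMENDED if is_missing(n)]
    return missing_obligatory, missing_recommended


# Hypothetical record with invented values, for illustration only.
example = {
    "dc:title": ["Example manuscript title"],
    "dc:date": ["9th century"],
    "europeana:rights": [""],  # present but empty, so reported as missing
}

missing_obligatory, missing_recommended = check_record(example)
print("Missing obligatory:", missing_obligatory)
print("Missing recommended:", missing_recommended)
```

A check of this kind could run before delivery to the aggregator, so that records without a rights statement are caught locally rather than rejected later.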


D2.2 "Academic" metadata

• All information beyond this minimum level is, together with the minimum level, what the project calls "academic metadata"; it will have to be added in a second step.
• The project partners will make sure that very important bits of information (authors' names, standardized titles of the contained texts) are accurately provided.
• References to authority files (norm data) such as VIAF will be included in the descriptions.
• These special metadata will be translated into the relevant languages.


D2.3 Vademecum for librarians

• study of the existing descriptions (printed catalogues, card files, computer files)
• selection of common metadata to be provided by each participant, formalised in a guideline for librarians and academic staff
• description of the format of metadata, in TEI, EAD and MARC


D2.4 Attractive Guidelines

• Guidelines (were intended to) cover the areas
  – Content aggregation, metadata, image processing
  – Now: description of the project, the partners, the collections, minimum metadata, and users' requirements
• They (were intended to) address an external audience
  – Librarians, scholars, professionals
  – Now: interested public
• "Technical description" (former D2.1 – D2.3) will be published separately


WP3 Integration of metadata

• Integration of the existing (minimum or academic) metadata in each library's system
• Description of the digital object with a table of contents editor or image-related XML file, to update the metadata according to recent research (i.e. date, patron, artist, origin)
• Possibly providing more detailed information in other internet resources
• Ingestion of data in Europeana


WP3 Progress on Metadata

• As a first step, librarians will enter the minimum existing data in each library's system, in order to make these metadata immediately available for Europeana.
• Metadata will be updated and (if necessary) amended, following a scheme agreed among the project partners that will allow improved access to the digital copy.
• Full descriptions of the manuscripts will be accessible in specialized databases, e.g. Manuscripta Mediaevalia.
• As far as possible this information should be made available to Europeana.


Mapping data: General rules

• Map as many of the original source elements as possible to the available ESE elements
• If this is not possible, leave it unmapped or consider using [...]
• If possible use one of the more specific refinements
• Consider how best to meet the expectations of the user and the functionality of the system
• Consider how the data would perform in response to "who, what, where and when" queries. This therefore encompasses names, types, places and dates relevant to the object and what it depicts
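The general rules above amount to a lookup table with an explicit fallback for fields that have no ESE target. The sketch below illustrates this under stated assumptions: the source field names (`summary_title`, `date_of_origin`, and so on) are invented for the example, and only the ESE element names come from the slides. The date field is mapped to `dcterms:created`, a refinement of `dc:date`, to illustrate the "more specific refinement" rule.

```python
# Hypothetical mapping table: local source field -> ESE element.
# Source field names are invented; ESE element names are those on the slides.
MAPPING = {
    "summary_title": "dc:title",
    "author": "dc:creator",
    "date_of_origin": "dcterms:created",  # refinement preferred over dc:date
    "rights": "europeana:rights",
}


def map_record(source):
    """Split a source record into mapped ESE elements and unmapped leftovers.

    Returns (mapped, unmapped): `mapped` maps ESE element names to lists of
    values; `unmapped` keeps the source fields that have no ESE target, so
    that nothing is silently dropped.
    """
    mapped = {}
    unmapped = {}
    for field, value in source.items():
        target = MAPPING.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


# Invented sample record, for illustration only.
source_record = {
    "summary_title": "Example title",
    "date_of_origin": "15th century",
    "binding": "leather over wooden boards",
}

mapped, unmapped = map_record(source_record)
print(mapped)
print(unmapped)
```

Keeping the unmapped fields in a separate dictionary makes the "leave it unmapped" decision visible and reviewable, instead of losing the information during export.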


WP2 vs. WP3

• WP2 = Theoretical foundation (Me)
• WP3 = Practical implementation (Stefanie Gehrke)


Local presentation: HAB
[Screenshot: Wolfenbütteler Buchspiegel]

Europeana: Search result
[Screenshot: Wolfenbütteler Buchspiegel]

Europeana: Detail view
[Screenshot: Wolfenbütteler Buchspiegel]

europeanaregia.eu: Homepage
[Screenshot: Wolfenbütteler Buchspiegel]

europeanaregia.eu: byRepository - HAB
[Screenshot: Wolfenbütteler Buchspiegel]

europeanaregia.eu: ms detail
[Screenshot: Wolfenbütteler Buchspiegel]


Decisions: customise + delivery

• All institutions needed to adapt their cataloguing formats
  – Customise the ENRICH-TEI schema, adapt TEI, adapt AMREMM
• Aggregation via TEL (The European Library)
  – obligatory: rights declaration → tei:availability
  – obligatory: thumbnail → tei:pubPlace/tei:ptr
  – needed: project / collection → tei:projectDesc
• Delivery of ESE, preparation for EDM
  – EDM still under construction, but possibility "to represent" manuscript data
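The three TEI locations named above can be checked mechanically before a file is handed to the aggregator. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample document, its values, and the placement of the elements inside the header are assumptions made for illustration and do not reproduce the project's customised schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Items named on the slide, with the element paths given there.
REQUIRED = {
    "rights declaration": ".//tei:availability",
    "thumbnail": ".//tei:pubPlace/tei:ptr",
    "project / collection": ".//tei:projectDesc",
}


def missing_delivery_items(tei_xml):
    """Return the labels of the items that cannot be found in the TEI file."""
    root = ET.fromstring(tei_xml)
    return [
        label
        for label, path in REQUIRED.items()
        if root.find(path, TEI_NS) is None
    ]


# Invented sample header; it lacks tei:projectDesc on purpose.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example description</title></titleStmt>
      <publicationStmt>
        <pubPlace><ptr target="http://example.org/thumb.jpg"/></pubPlace>
        <availability><p>Example rights statement</p></availability>
      </publicationStmt>
      <sourceDesc><p>Example source</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""

print(missing_delivery_items(SAMPLE))
```

This only tests for the presence of the elements. Whether their content is usable (for instance, whether the thumbnail link resolves) would need a separate check.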


Local Decisions: Theory

• One TEI file for the manuscript, referencing the facsimile and resources (descriptions, websites, etc.)
  – Only "minimum" metadata, ready to export to Europeana
  – Will be updated
  – = metadata, to be stored in [...]
• One TEI file for each description
  – As rich a description as it was originally
  – Will stay as it is, as it represents a "historical" document
  – = data, to be stored in [...]


Mapping: Practice

• TEL had not dealt with TEI itself
  – Crosswalk had to be implemented → we supplied XSLTs
• How to
  – make sure every institution submits the same set of information?
  – make similar information from different formats look alike?
• Refinement of the mapping table prepared in WP2


Decisions: translate + harmonise

• Translation
  – Some of the basic categories can be translated (semi-)automatically (names, dates, etc.)
  – In order to "avoid" translation, use Latin names and text titles
• Harmonisation
  – Done during transformation (e.g. via OAI-PMH) or during processing by the aggregator
  → break with habits: in TEL the summary title contains the shelfmark
• Normalisation, semantic quality
  – Norm data like VIAF, TGN, etc. shall be applied wherever possible
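The harmonisation point above, a summary title that contains the shelfmark, can be sketched as a small normalising function applied during transformation. The "shelfmark: title" pattern and the sample values below are assumptions made for illustration; the slides only state that the summary title in TEL contains the shelfmark, not how it is formatted.

```python
def harmonise_title(shelfmark, title):
    """Build a summary title that starts with the shelfmark.

    The "shelfmark: title" pattern is an assumption of this sketch.
    Whitespace is collapsed so that records exported from different
    systems look alike.
    """
    shelfmark = " ".join(shelfmark.split())
    title = " ".join(title.split())
    if not title:
        return shelfmark
    if title.startswith(shelfmark):
        return title  # already harmonised, do not prefix twice
    return f"{shelfmark}: {title}"


# Invented shelfmark and title, for illustration only.
print(harmonise_title("Cod. Example 1", "  Example   title "))
print(harmonise_title("Cod. Example 1", "Cod. Example 1: Example title"))
```

Making the function idempotent (the second call leaves the title unchanged) matters when the same record may pass through both the local export and the aggregator's processing.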


Wish-list

• Subject classification
  – In order to allow for browsing through repositories, subject classification would be helpful → special index entries?
• Ontologies
  – Norm vocabulary would be helpful, e.g. for bindings, decoration, musical notation, scripts, etc.
  → cf. Europeana Regia's TEI customisation


Chan(c|g)es

• The project has seen many changes
  – Implement workflows for the first time (digitisation + metadata - KBR; OAI for mss → HAB; norm data → BHUV, KBR)
  – Adaptations (AMREMM – BHUV)
  – Delivery to Europeana → aggregation through TEL
  – Adapt export formats → harmonisation through TEL
    • Adaptation of OAI difficult for BnF/BSB → selection by TEL
  – ESE → EDM
  – Copyright status: RR-F → CC0
  – Organise multilingual access ourselves → europeanaregia.eu


Conclusions I

• Cataloguing in a world of electronic publication and distribution, portals, and the need to exchange data needs to take into account
  – Data formats (local practices)
  – Publication paths (in print, electronic)
  – Mapping paths (generalisation of data types)
  – even: arranging data (position of kinds of data in the character stream)


Conclusions II

• Additionally, some bits of information need to be encoded (more) explicitly
  – e.g. textual language
• Data tends to sit in multiple places
  – Each of them with special views, interests
  – Still: the most up-to-date and complete information will usually be available from local presentations
• Until the realisation of the Semantic Web (and having solved some copyright issues) we might do with slim descriptions in portals and the richness in local (i.e. specialised) presentations.


References

• http://www.europeanaregia.eu
• http://www.europeana.eu
• http://www.hab.de
• http://diglib.hab.de/?db=mss


Glossary

AMREMM  Descriptive Cataloguing of Ancient, Medieval, Renaissance, and Early Modern Manuscripts
EDM  Europeana Data Model
ENRICH  European Networking Resources and Information concerning Cultural Heritage
ESE  Europeana Semantic Elements
MAB  Maschinelles Austauschformat für Bibliotheken
OAI-PMH  Open Archives Initiative Protocol for Metadata Harvesting
PND (GND)  Personennamendatei (→ Gemeinsame Normdatei)
TEL  The European Library
TGN  Getty Thesaurus of Geographic Names
VIAF  Virtual International Authority File
