Experiences from the Europeana regia project - Herzog August ...
Cataloguing manuscripts in an international context
Experiences from the Europeana regia project
Torsten Schaßan – Encodage de documents – Caen – 2012, March 30
Europeana Regia: partners
• „A digital cooperative library of royal manuscripts in Medieval and Renaissance Europe“
• Co-funded by the European Commission (ICT-PSP, 50%)
• Jan 2010 - June 2012
• 5 partners
– BnF Bibliothèque nationale de France (and many municipal libraries)
– BSB Bayerische Staatsbibliothek, Munich
– BHUV Biblioteca historica, Universitat de Valencia
– HAB Herzog August Bibliothek Wolfenbüttel
– KBR Bibliothèque royale de Belgique, Brussels
ER: collections + formats
• 3 collections
– Bibliotheca Carolina (8th/9th cent., 425 mss)
– The Library of King Charles V of France (14th cent., 167 mss)
– The Library of the Aragonese Kings of Naples (15th cent., 282 mss)
• Information in 6 languages
– Catalan, Dutch, English, French, German, Spanish
• Descriptions in 5 formats
– MARC, EAD, TEI, MAB, MXML (= format of the German national manuscript database Manuscripta Mediaevalia)
Workpackages
1. Management
2. Specification of metadata
3. Integration of metadata
4. Digitisation
5. Integration of images
6. Dissemination
WP leaders: BnF 1+6, HAB 2+3, BSB 4+5
WP2 Specification of metadata
• This work package aims at consolidating the list and format of metadata to be used by each participant.
– formats
– level of metadata
– quality of metadata
– organise ingestion format
D2.1 State of the art in metadata
• global survey of cataloguing and metadata standards
• obtain a matched OAI extraction despite the different formats used in libraries (e.g. EAD and TEI)
– BnF: EAD
– KBR: local DB, had to decide upon the format (= TEI)
– BSB: MAB → METS/MODS, MXML → TEI
– HAB: TEI
– BHUV: MARC-XML
– Lyons: special data, with mapping to MODS
• quality and amount of metadata varied heavily
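A matched OAI extraction means harvesting every partner's repository in the same way, whatever the local format. The following is a minimal Python sketch of an OAI-PMH `ListRecords` harvest. The verb and arguments (`verb`, `metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH 2.0 protocol. The base URL, the `fetch` callable and the function names are hypothetical, and OAI-PMH error responses (e.g. `noRecordsMatch`) are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None,
                     token: Optional[str] = None) -> str:
    """Build a ListRecords request URL."""
    if token:
        # resumptionToken is an exclusive argument: no metadataPrefix or set with it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url: str, metadata_prefix: str,
            fetch: Callable[[str], str],
            set_spec: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens page by page.

    `fetch` (hypothetical) takes a URL and returns the response body as text.
    """
    token: Optional[str] = None
    while True:
        url = list_records_url(base_url, metadata_prefix, set_spec, token)
        root = ET.fromstring(fetch(url))
        yield from root.iterfind(".//oai:record", OAI)
        el = root.find(".//oai:resumptionToken", OAI)
        # An absent or empty resumptionToken marks the last page.
        token = (el.text or "").strip() if el is not None else ""
        if not token:
            return
```

For a live endpoint, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. Passing the fetcher in keeps the paging logic testable offline.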
D2.2 Minimum metadata
• What information about a manuscript is necessary for a very short overview and sufficient for basic orientation?
– the manuscript identifiers (actual and former shelf marks, hosting institution or possessors)
– a summary title
– basic historical information (date of origin, place of origin, previous owners)
– basic material information (material of the support, number of leaves, size of leaves)
– introductory bibliographical information
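One way to make the minimum level checkable is to model it as a record type. The Python sketch below is hypothetical. Its field names illustrate the categories listed above and are not taken from any project schema. `missing()` only reports which fields are still empty.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class MinimumRecord:
    """Hypothetical container for the D2.2 minimum metadata of one manuscript."""
    # Identifiers
    shelfmark: str                                   # actual shelf mark
    institution: str                                 # hosting institution or possessor
    former_shelfmarks: List[str] = field(default_factory=list)
    # Summary title
    summary_title: str = ""
    # Basic historical information
    date_of_origin: Optional[str] = None             # free text, e.g. "9th cent."
    place_of_origin: Optional[str] = None
    previous_owners: List[str] = field(default_factory=list)
    # Basic material information
    support: Optional[str] = None                    # material of the support
    leaves: Optional[int] = None                     # number of leaves
    leaf_size: Optional[str] = None                  # size of leaves
    # Introductory bibliographical information
    bibliography: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the names of fields that are still empty (None, "" or [])."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "", [])]
```

Listing the gaps rather than rejecting the record fits the two-step approach. Minimum data can be delivered early and completed later.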
D2.2 Minimum metadata
• Europeana's view (according to ESE v3.2):
– Obligatory: europeana:rights
– Strongly recommended
• dc:title
• dcterms:alternative
• dc:creator
• dc:date
• dc:contributor
• dcterms:created
• dcterms:issued
– Consider how your data will perform in response to „who“, „what“, „where“, and „when“ queries
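The split between obligatory and strongly recommended elements maps naturally onto errors and warnings in a pre-delivery check. The Python sketch below is hypothetical. It uses only the element names listed on this slide and assumes a record is a plain dictionary from element name to a list of values.

```python
from typing import Dict, List

# Element lists as given on the slide (ESE v3.2).
OBLIGATORY = ["europeana:rights"]
STRONGLY_RECOMMENDED = [
    "dc:title", "dcterms:alternative", "dc:creator", "dc:date",
    "dc:contributor", "dcterms:created", "dcterms:issued",
]


def check_ese(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report absent obligatory elements as errors, absent recommended ones as warnings.

    An element counts as present only if it has at least one non-blank value.
    """
    def absent(names: List[str]) -> List[str]:
        return [n for n in names
                if not any(v.strip() for v in record.get(n, []))]

    return {"errors": absent(OBLIGATORY),
            "warnings": absent(STRONGLY_RECOMMENDED)}
```

A record with errors should not be delivered. A record with only warnings could still go out, but it will answer fewer „who“ and „when“ queries.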
D2.2 „Academic“ metadata
• All information beyond this minimum level, together with the minimum level itself, is what the project calls "academic metadata"; it will be added in a second step.
• The project partners will make sure that very important bits of information (authors' names, standardized titles of the contained texts) are accurately provided.
• References to norm data like VIAF will be included in the descriptions.
• These special metadata will be translated into the relevant languages.
D2.3 Vademecum for librarians
• study of the existing descriptions (printed catalogues, card files, computer files)
• selection of common metadata to be provided by each participant, formalised in a guideline for librarians and academic staff
• description of the format of metadata, in TEI, EAD and MARC
D2.4 Attractive Guidelines
• Guidelines (were intended to) cover the areas
– Content aggregation, metadata, image processing
– Now: description of the project, the partners, the collections, minimum metadata, and users' requirements
• They (were intended to) address an external audience
– Librarians, scholars, professionals
– Now: interested public
• „Technical description“ (former D2.1 – D2.3) will be published separately
WP3 Integration of metadata
• Integration of the existing (minimum or academic) metadata in each library's system
• Description of the digital object with a table of contents editor or an image-related XML file, to update the metadata according to recent research (e.g. date, patron, artist, origin)
• Possibly providing more detailed information in other internet resources
• Ingestion of data in Europeana
WP3 Progress on Metadata
• As a first step, librarians will enter the minimum existing data in each library's system, in order to make these metadata immediately available for Europeana.
• Metadata will be updated and (if necessary) amended, following a scheme agreed among the project partners that allows improved access to the digital copy.
• Full descriptions of the manuscripts will be accessible in specialized databases, e.g. Manuscripta Mediaevalia.
• As far as possible, this information should be made available to Europeana.
Mapping data: General rules
• Map as many of the original source elements as possible to the available ESE elements
• If this is not possible, leave it unmapped or consider using
• If possible, use one of the more specific refinements
• Consider how best to meet the expectations of the user and the functionality of the system
• Consider how the data would perform in response to “who, what, where and when” queries. This therefore encompasses names, types, places and dates relevant to the object and what it depicts
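The first and third rules can be expressed as a crosswalk table whose candidate targets are ordered from the most specific refinement to the generic element. The Python sketch below is hypothetical. The TEI source elements (`title`, `author`, `origDate`, `origPlace`) and the Dublin Core terms exist, but the pairings are illustrative and are not the project's mapping table. `available` stands for whatever set of ESE elements the target profile offers.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical crosswalk: source element -> candidate ESE targets,
# ordered from the most specific refinement to the generic element.
CROSSWALK: Dict[str, List[str]] = {
    "tei:title": ["dc:title"],
    "tei:author": ["dc:creator"],
    "tei:origDate": ["dcterms:created", "dc:date"],
    "tei:origPlace": ["dcterms:spatial", "dc:coverage"],
}


def map_record(source: Dict[str, List[str]],
               crosswalk: Dict[str, List[str]],
               available: Set[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map what can be mapped, prefer refinements, and report the rest as unmapped."""
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for element, values in source.items():
        # Candidates the target profile actually offers, most specific first.
        targets = [t for t in crosswalk.get(element, []) if t in available]
        if not targets:
            unmapped.append(element)   # no suitable ESE element: leave it unmapped
            continue
        mapped.setdefault(targets[0], []).extend(values)
    return mapped, unmapped
```

Returning the unmapped elements explicitly makes it easy to review them. A reviewer can then decide, case by case, whether to leave them out or route them elsewhere.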
WP2 vs. WP3
• WP2 = Theoretical foundation (me)
• WP3 = Practical implementation (Stefanie Gehrke)
Local presentation: HAB
Wolfenbütteler Buchspiegel
Europeana: Search result
Europeana: Detail view
europeanaregia.eu: Homepage
europeanaregia.eu: byRepository - HAB
europeanaregia.eu: ms detail
Decisions: customise + delivery
• All institutions needed to adapt their cataloguing formats
– Customise the ENRICH-TEI schema, adapt TEI, adapt AMREMM
• Aggregation via TEL (The European Library)
– obligatory: rights declaration → tei:availability
– obligatory: thumbnail → tei:pubPlace/tei:ptr
– needed: project / collection → tei:projectDesc
• Delivery of ESE, preparation for EDM
– EDM is still under construction, but it already offers the possibility “to represent” manuscript data
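Before handing records to TEL, one can check that the three values listed above can actually be found in each TEI file. The Python sketch below does this with the standard library's ElementTree. The element paths follow the slide. The function name, the whitespace handling and the abridged sample header are hypothetical, and the sample is not a valid, complete TEI document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def delivery_fields(tei_xml: str) -> Dict[str, Optional[str]]:
    """Pull the values TEL needed out of a TEI header (element paths as on the slide)."""
    root = ET.fromstring(tei_xml)

    def text_of(path: str) -> Optional[str]:
        el = root.find(path, TEI)
        if el is None:
            return None
        # Join all descendant text and collapse whitespace.
        return " ".join("".join(el.itertext()).split())

    ptr = root.find(".//tei:pubPlace/tei:ptr", TEI)
    return {
        "rights": text_of(".//tei:availability"),                     # obligatory
        "thumbnail": ptr.get("target") if ptr is not None else None,  # obligatory
        "project": text_of(".//tei:projectDesc"),                     # needed
    }


# Hypothetical, heavily abridged TEI header for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>
  <fileDesc><publicationStmt>
    <pubPlace><ptr target="http://example.org/thumb.jpg"/></pubPlace>
    <availability><p>CC0</p></availability>
  </publicationStmt></fileDesc>
  <encodingDesc><projectDesc><p>Europeana Regia</p></projectDesc></encodingDesc>
</teiHeader></TEI>"""

if __name__ == "__main__":
    print(delivery_fields(SAMPLE))
    # {'rights': 'CC0', 'thumbnail': 'http://example.org/thumb.jpg', 'project': 'Europeana Regia'}
```

A `None` in the result flags a file that would fail one of TEL's requirements.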
Local Decisions: Theory
• One TEI file for the manuscript, referencing the facsimile and resources (descriptions, websites, etc.)
– Only „minimum“ metadata, ready to export to Europeana
– Will be updated
– = metadata, to be stored in
• One TEI file for each description
– As rich a description as the original
– Will stay as it is, as it represents a „historical“ document
– = data, to be stored in
Mapping: Practice
• TEL had not dealt with TEI itself
– A crosswalk had to be implemented → we supplied XSLTs
• How to
– make sure every institution submits the same set of information?
– make similar information from different formats look alike?
• Refinement of the mapping table prepared in WP2
Decisions: translate + harmonise
• Translation
– Some of the basic categories can be translated (semi-)automatically (names, dates, etc.)
– In order to „avoid“ translation, use Latin names and text titles
• Harmonisation
– Done during transformation (e.g. via OAI-PMH) or during processing by the aggregator
→ break with habits: in TEL, the summary title contains the shelfmark
• Normalisation, semantic quality
– Norm data like VIAF, TGN, etc. shall be applied wherever possible
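Putting the shelfmark into the summary title is the kind of harmonisation that can be applied during transformation, so that titles from different partners end up in the same shape. The Python helper below is a hypothetical sketch. The „shelfmark, title“ order and the comma separator are assumptions, not TEL's documented convention.

```python
import re


def harmonise_title(shelfmark: str, title: str, sep: str = ", ") -> str:
    """Return a summary title that begins with the shelfmark.

    Order and separator are assumptions; the aim is one shape for every partner.
    """
    shelfmark = " ".join(shelfmark.split())   # collapse stray whitespace
    title = " ".join(title.split())
    # Already prefixed? Require a non-word character (or the end) after the shelfmark,
    # so that "Cod. 1" does not match a title starting with "Cod. 10".
    if re.match(re.escape(shelfmark) + r"(?!\w)", title, re.IGNORECASE):
        return title
    return f"{shelfmark}{sep}{title}" if title else shelfmark
```

Because the function leaves already-prefixed titles unchanged, it is idempotent. It can be run at the partner's side or at the aggregator without doubling the shelfmark.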
Wish-list
• Subject classification
– In order to allow browsing through repositories, subject classification would be helpful → special index entries?
• Ontologies
– Norm vocabulary would be helpful, e.g. for bindings, decoration, musical notation, scripts, etc.
→ cf. Europeana Regia's TEI customisation
Chan(c|g)es
• The project has seen many changes
– Implementing workflows for the first time (digitisation + metadata - KBR; OAI for mss → HAB; norm data → BHUV, KBR)
– Adaptations (AMREMM – BHUV)
– Delivery to Europeana → aggregation through TEL
– Adapt export formats → harmonisation through TEL
• Adaptation of OAI difficult for BnF/BSB → selection by TEL
– ESE → EDM
– Copyright status: RR-F → CC0
– Organise multilingual access ourselves → europeanaregia.eu
Conclusions I
• Cataloguing in a world of electronic publication and distribution, portals, and the need to exchange data needs to take into account
– Data formats (local practices)
– Publication paths (in print, electronic)
– Mapping paths (generalisation of data types)
– even: arranging data (position of kinds of data in the character stream)
Conclusions II
• Additionally, some bits of information need to be encoded (more) explicitly
– e.g. textual language
• Data tends to sit in multiple places
– Each of them with special views, interests
– Still: the most up-to-date and complete information will usually be available from local presentations
• Until the realisation of the Semantic Web (and having solved some copyright issues) we might make do with slim descriptions in portals and the richness in local (i.e. specialised) presentations.
References
• http://www.europeanaregia.eu
• http://www.europeana.eu
• http://www.hab.de
• http://diglib.hab.de/?db=mss
Glossary
AMREMM Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts
EDM Europeana Data Model
ENRICH European Networking Resources and Information concerning Cultural Heritage
ESE Europeana Semantic Elements
MAB Maschinelles Austauschformat für Bibliotheken
OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting
PND (GND) Personennamendatei (→ Gemeinsame Normdatei)
TEL The European Library
TGN Getty Thesaurus of Geographic Names
VIAF Virtual International Authority File