
European Cultural Heritage Online - ECHO



European Cultural Heritage Online (ECHO)

PUBLIC

Contract n°: HPSE / 2002 / 00137
Title: State-of-the-art review on humanities, social sciences, and cultural heritage knowledge bases
Authors: Sven Strömqvist, Juergen Renn, Elisabeth Kieven, Peter Wittenburg, Dimitri Karadimas & altera
Concerned WPs: WP1 – State-of-the-Art (D1.1)

Abstract: The report offers a bird's eye perspective on the situation related to bringing the four domains (Languages, Ethnography and museum objects, History of science, and History of art) of the European cultural heritage online. The report is based on experiences during the first 6 months of the ECHO pilot project and seeks to shed light on the prospects of creating shared, searchable and browsable digital content for the humanities in the EU. The report centers around four key notions: 1) technology, 2) content, 3) metadata, infrastructure, interoperability, and 4) users.

It is argued that part of the explanation for the lack of accessible digital content in the humanities so far has to do with lack of metadata, infrastructure and interoperability, as well as with insufficient attention to user perspectives. This points to the crucial role that the AGORA has to play in a full-fledged ECHO project, bringing Europe's cultural heritage online.

Published in:
Keywords: social sciences, cultural heritage, online, state of the art
Date of issue of this report: June 2003

Project financed within the Key Action Improving the Socio-economic Knowledge Base


Content

European Cultural Heritage Online
1. Introduction (Renn & Strömqvist)
    The goals of ECHO
    The present report
2. Technology in the cultural heritage sector (Wittenburg)
3. The content
    Increasing quantities
    The four cultural domains
  3.1 Languages (Strömqvist, Uneson, Eklund & Crasborn)
    The present situation
    The call for language resources
    Workshops and training
    Minority languages
    Sign languages
    Language education
  3.2 Ethnography and museum objects (Karadimas et al)
    Non-European objects and Societies within the European Museums (NECEP)
    Conclusions and directions for the future
  3.3 History of science (Renn)
    Clearing houses
    History of science information on the Web
    Annotation and collaboration through the Web
    The role of the history of science case study on the Web
  3.4 History of art (Kieven)
    Museum and other collections in web databases
    Collections and web portals of digital images
    Annotation and collaboration through the web
    The role of Lineamenta and ZUCCARO in the situation described above
4. Interoperability (Wittenburg and Strömqvist)
    The issues
    Workshops
5. User groups (Strömqvist)
    Lund University
    Schools
    Museum visitors
6. Conclusions (Renn & Strömqvist)


1. Introduction (Renn & Strömqvist)

The goals of ECHO

The aim of the creation of the Internet in the late 1960s was to interconnect a small number of mainframes, in order to share computational resources more efficiently. The creation of the World Wide Web in the early 1990s was initiated at the CERN laboratory in order for the research community in high-energy physics to be able to search and find information in a large number of documents across different computers and networks. Both endeavours were driven by economic reasoning (to make better use of fragmented resources) and both proved to be extremely good investments.

The time is now ripe for the humanities to make truly advanced use of the Internet and the World Wide Web, to overcome the present fragmentation of data essential to research and education, a fragmentation which represents an enormous waste of resources. ECHO (European Cultural Heritage Online) proposes technology and infrastructure to start that process. ECHO is conducive to more creative and efficient use of resources in the humanities, for research and education, in order to secure the vitality of our cultural heritage.

The goal of ECHO is threefold:

First, an assessment of the present situation related to bringing European cultural heritage online. In view of the fragmentation of endeavours presently undertaken, it is necessary to assess the implementation of Information Technology for preserving, sharing, and studying this heritage in different disciplines and nations.

Second, the exploration of a novel IT-based cooperative research infrastructure. The project will create, within its limited scope, a model implementation of a new cooperative research infrastructure that aims at mobilising and bringing together all relevant actors (universities, museums, libraries, archives, (national) research councils, digital heritage organisations, and companies) in the broad field of the humanities and cultural heritage in Europe.

Third, a paradigmatic proof of the new potentials for research offered by this infrastructure. By taking up four paradigmatic content areas in the humanities, from the history of art, the history of science, language studies, and social and cultural anthropology, respectively, the project aims at demonstrating the innovative potential for research offered by this infrastructure, the AGORA.

The present report

The present report offers a bird's eye perspective on the situation related to bringing these four domains of the European cultural heritage online. The report is based on experiences during the first 6 months of the ECHO pilot project and seeks to shed light on the prospects of creating shared, searchable and browsable digital content for the humanities in the EU. The report centers around four key notions: 1) technology, 2) content, 3) metadata, infrastructure, interoperability, and 4) users. It is argued that part of the explanation for the lack of accessible digital content in the humanities so far has to do with lack of metadata, infrastructure and interoperability, as well as with insufficient attention to user perspectives. This points to the crucial role that the AGORA has to play in a full-fledged ECHO project, bringing Europe's cultural heritage online.


2. Technology in the cultural heritage sector (Wittenburg)

Various discussions at recent meetings with leading cultural heritage (CH) institutions in Europe and with partners in ECHO confirmed the impressions of the state of technology in the CH domain that were recorded in the ECHO documents. In detail we can say the following:

• Some institutions have created their own database solution that serves exactly the needs of the particular institute. Others are at the very beginning of setting up such databases.
• Combining the different database solutions will be non-trivial even at the structural and repository level, since only a few institutions have formulated open accessibility as a goal. Institutions often lack the knowledge of how to access their databases, for example to harvest certain descriptions.
• The most urgent needs are directed towards establishing a descriptive metadata domain to allow easy resource management and discovery on the institute's holdings. CH institutes are faced with gigantic problems here, since they often have to create hundreds of thousands of such descriptions.
• Even the concept of descriptive metadata is not at all clear to all institutions, and others have not yet decided on the set to be used (even typical libraries such as the Dutch National Library came to the conclusion that Dublin Core will not be sufficient for their purposes). Only a few are at an advanced state in that they have decided on a set, have clear and open concept definitions and can offer the descriptions for harvesting.
• CH institutions are very much interested in tools that allow them to create and maintain metadata descriptions (semi-)automatically, since otherwise they will only be able to describe part of their holdings.
• More advanced notions discussed in ECHO, such as rich and collaborative annotations, are seen as future visions far away from concrete reality. The institutions know of no tools that support this kind of work and are acceptable to them.
• In general these CH institutions see the potential of offering their holdings on the net and even allowing other people to add comments, photos etc. They know that there are many groups of experts who know much more about parts of their holdings than the archivists themselves, and they would, of course, like to include this expert knowledge. But they have not yet had time to think about solutions.
• Interoperability issues have only just started to be discussed by the more advanced institutions, and there are as yet no attempts to achieve interoperability at the semantic level. Creating the necessary ontological categories for the individual disciplines and, even more demanding, interrelating them, is a very difficult endeavor, and it cannot yet be seen when and how this can be tackled.
• Almost all institutions see big problems with respect to intellectual property rights (IPR) aspects [1]; they are therefore still hesitant about how to present their holdings on the Internet. This problem is solved partly by having ready-made web sites as "guided tours", such that the institutions can put their "brand name" everywhere and can use low-resolution images etc. They see the great potential of offering their holdings such that external people can define their own view on collections from different museums, i.e. combining the holdings to get new insights. However, this would mean that they completely lose control of the usage of their holdings, which could mean that the brand name of the institute (the anchor for earning the money they need to survive) could disappear from the newly created web offering. Therefore, we need a new ethical foundation and a clarification of the legal situation on a world-wide scale, since web offerings are available world-wide, in order to ensure that the principles mentioned in the ECHO charter can be followed.

[1] In this note just one aspect related to IPR is discussed, but there are in fact several. It is strongly recommended to organize a big workshop about IPR questions under the ECHO umbrella. First discussions have taken place in this respect and we expect to organize such an ECHO workshop in January.

In conclusion, we are, as yet, far away from what is referred to in the ECHO proposal as a common technological framework. In most cases, we lack the simplest levels that would make a combination of collections possible. Moreover, different methods have been established in the different disciplines and these are partly linked to commercial considerations. These could form additional obstacles on the way to such a framework. It is obvious that for the current ECHO project only small steps can be achieved. It will require great collaborative efforts from technologists and experts from CH institutions to overcome the current limitations, and it will require new ethical and legal principles to convince the major institutions to contribute in the way the ECHO charter describes.

3. The content

Increasing quantities

One way of approximating the amount of information relevant to the European cultural heritage that is currently available on the web is to consult available statistics. On a national level, statistics of this kind are sometimes available from the library world. Thus, since 1997, the Royal Library of Sweden in Stockholm has swept the web for Swedish cultural resources on an annual basis. The results are made public in the form of descriptive statistics distributed over four tables: Bytes, Web sites, Number of files, and The most frequent file types (text/html, gif, jpeg, text/plain, and appl/pdf).
These four tables, as updated by 2003, are rendered in Appendix 3.0 together with short descriptive comments in English. Although the statistics are limited to information judged relevant to Sweden's cultural heritage only, the numbers may give a hint as to the rapidly increasing quantities of information. Among other things, there is a steep increase in the number of files during the past six-year period: from around 7 million to 40 million files. Calculated in bytes, the increase goes from around 160 Gbytes to 1800 Gbytes, with the greatest proportional increase from 1999.

The four cultural domains

With respect to the cultural domain of languages, the situation varies greatly. Languages appear in written, spoken and signed (sign languages) forms. Because of restrictions of the storage medium, exemplars or samples of language usage before the late 19th century only exist in written forms. Later samples include spoken language (audio) and signed language (film, video). Also, computer technology has led to a growing amount of recordings of on-line writing (as distinct from the old text samples of off-line writing). Further, the combination of sound, (moving) picture and text, recently in the form of computerized multimedia representations, allows for detailed documentation of linguistic communication online in its particular setting (physical and social circumstances). The new technologies are used to an increasing degree for creating computerized samples of language usage, for the purpose of analysing phenomena as diverse as, for example, spoken languages, sign languages (interestingly, these languages lack adequate writing systems, and multimedia representations therefore have an important role to play for their documentation), minority languages and endangered languages, child language acquisition, writing development in school children, translation and simultaneous interpretation, etc. Still, written text is, to date, the prevailing manifestation of languages which the research community is concerned with. Spoken languages are often studied in a transcribed form (that is, after they have been written down) and many so-called national language corpora are based solely on written texts (such as newspaper text and novels). Today, every self-respecting language department works with computerized language samples and computational methods of analysis. But there is little coordination of efforts and very little systematic sharing of data. In particular, metadata (systematic and structured descriptions of the data) are either absent or extremely meagre. In the languages section of this report (3.1), we focus on experiences during the first six months of the ECHO pilot project of trying to recruit providers of data and metadata as well as to explore potential communities of users.

With respect to the cultural domain of ethnography, the present situation is principally built around objects which are kept in ethnographic museums and/or in institutions dealing with extra-European material. Ethnographical material, however, is not understandable without additional comments on the societies that have produced it.
These comments can be made by natives or by specialists (ethnographers and anthropologists) who have studied them. The fact that collections and ethnographers are not under the same roof therefore poses an essential challenge: communication and infrastructural support are needed to realize an exchange and complementarity between the two. This is the kind of challenge ECHO is designed to meet. In the ethnography section of this report (3.2), we focus on observations made from a widely distributed questionnaire concerning digitized databases of non-European objects and societies within European museums.

With respect to the cultural domain of history of science, the situation also varies. The material presented on the Web is largely unorganized and differs widely in quantity and quality. To use the Web for research, it seems necessary not only to classify the web presentations but also to evaluate the presented material as well as the tools, the interoperability functions and the annotation possibilities. In the history of science section of this report (3.3), we focus on significant examples and the experiences collected by international scientists in the area of history of science.

With respect to the cultural domain of history of art, there is a lack of presentations on history of art on the Internet. Various collections and presentations of museums are available, but the tools to do research work with this material are mostly missing. In the history of art section of this report (3.4), we focus on a comparison of different examples of web presentations, including an analysis of available tools for using the Internet for research work in the area of history of art.

3.1 Languages (Strömqvist, Uneson, Eklund & Crasborn)

The present situation

A quick search on the world wide web for "language resources european" gives a first indication of the large amount of descriptions of languages, tools for linguistic analysis and samples of languages which are scattered around the world, especially the university world.


The three search words yielded more than 1,290,000 hits. The top ten are rendered in appendix 3.1a. A handful of other searches also yielded very large amounts of URLs. 15 sites from the searches were reviewed more closely with respect to content, access restrictions etc (see appendix 3.1b). The procedure showed that most of the sites represent organisations holding large and interesting language resources and that these resources are more or less inaccessible. Whereas some sites made at least some language data available to the visitor, other sites made no language data available at all. Similarly, the sites provided no or little metadata.

From personal experience and personal communication with research colleagues around the world, we would briefly characterize the situation as follows. There are many research groups who have collected very interesting language samples, digitized the material and developed interesting analysis tools. But these efforts are seldom coordinated and the resultant digitized materials are seldom shared with other research teams. Often, the software developed is not generalizable/applicable to digitized data other than those of the local group, and the particular programmer may well be the only person with proper knowledge of the program developed. The situation means that a great deal of effort and ingenuity is never propagated outside the local group and thus represents a considerable waste of resources. Interoperability issues as well as incentives for daring to share both data and methods across research groups should therefore be a top priority.

An exception to the rule is the Child Language Data Exchange System (CHILDES, see MacWhinney 1991). CHILDES is a system for sharing digitized research resources concerning child language development, with three main modules: a data archive, analysis tools and an information bulletin board.
While CHILDES does not represent cutting edge technology, it provides a user friendly system, closely tailored to the needs of the child language research community, and this is the key factor behind the success of the system. The child language research community experiences CHILDES as a resource to profit from, as a source of research contacts and cooperation, shared data and tools, something which gives an added value, provided that you dare to share data and ideas. In comparison to ECHO, however, CHILDES is much narrower, both in terms of technology and in terms of user groups. But the moral from the CHILDES project prevails: no success is guaranteed unless the users feel they have something to gain. And for some reason, which is almost criterial of the present state of the art, CHILDES, just like the other language resource sites, does not make use of metadata for systematic searches in its archive.

The call for language resources

In order to elicit more systematic information about language resources and explore groups of potential providers of data and metadata, a call for language resources was made within the frame of the ECHO project. A letter of invitation was sent out to over 400 addressees with requests that they visit our home page and fill in an electronic form designed to describe their data. It was decided that special efforts should be made to attract to the ECHO language resource concept the interest of potential providers of sign language data, minority language data, data of relevance to language educators, data representing disparate data formats, crosslinguistic data, unique data, and data from at least one institute representing ongoing digitization. Sign languages, in particular, were judged to constitute a domain of high priority, since these languages cannot be adequately rendered in writing and since the propagation and analysis of samples of sign languages therefore have to rely on multimedia. Also, sign languages are, probably, the least known languages contributing to the cultural heritage and linguistic diversity of Europe.

Only a little more than 30 recipients of the letter of invitation responded that they would be clearly interested in participating in the ECHO project, and only 10 of them responded with detailed descriptions of their data or with samples of metadata and an informal commitment to contribute larger amounts of metadata and data to the ECHO project.

Workshops and training

To strengthen the recruitment procedure, workshops with metadata training sessions were organized. The goal of such an event is to guide the participant/potential provider through steps 1 and 2 in the recruitment scheme below, and to prepare him/her for steps 3 and 4.

1) convey the concept of ECHO and language resources
2) describe own data and produce metadata sample
3) produce all metadata
4) produce the data

Four events of this kind have been organized:

1. Local Lund workshop, April 7-8
2. Sign Languages, MPI-Psycholinguistics, May 7-8
3. Minority Languages, Lund, May 19-21
4. ECHO workshop in connection with International conference on minority languages, Kiruna, June 5

A tentative conclusion from the workshops we have organized so far is that workshop participation is an efficient road to real understanding of the power of metadata, which, in turn, is conducive to ECHO commitment. As a result of the combination of letters of invitation, web-based information propagation, and workshops, browsable metadata domains from 10-15 institutes/projects/organisations are now in progress.
The budding network of providers is briefly described below:

Minority languages

MPI-PL, Helsinki
    rich collections
    Saami
Phonogrammarchiv, Vienna
    unique material (UNESCO Memory of the world register)
    collections of Basque, Celtic, Inuit from 1900-1910
    specialised digitization skills (wax rolls...)
    25000 recordings, in total 7000 hours (selection, of course)
University of Groningen and University of St Petersburg
    endangered languages in Siberia; Frisian
U Lund (Swedia)
    1200 hours dialectal recordings 1998-2000
    excellent quality, spontaneous & elicited
    rich metadata on paper!
    some ML
Sweden's national dialect archive (SOFI)
    Sami data, unique recordings

Sign languages

U Nijmegen + U Sthlm + U Bristol
    ECHO case study
U Utrecht
U Hamburg (Europe's largest SL archive)

Language education

Spencer
    Crosslinguistic archive of spoken and written text development in school children
    CHAT format, meagre metadata
    seven languages (Dutch, Eng, Fr, Heb, Icel, Sp, Sw)
Scriptlog
    Crosslinguistic archive of online writing resources
    meagre metadata embedded in binaries
    small corpus so far (young researchers with little data of their own)
Stavanger, Jyväskylä
    reading research and L2-L1 / L1-L2 transfer in spoken and written texts
    IMDI from beginning

There is at least a handful of other institutes and organisations who are strong candidate contributors to ECHO language resources, among them two institutes in Russia with data from endangered languages (Kibrik, Dagestan lgs; Churikov, Itelmen Language) and Meertens Institute, Amsterdam.

To make recruitment more efficient, a web distributed IT-tutorial on ECHO and IMDI is under construction by the MPI-Psycholinguistics and the Lund University teams. The tutorial should allow prospective contributors to take steps 1 and 2 in the recruitment scheme (see above) on their own.
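
As an illustration of what step 2 of the recruitment scheme (describing one's own data in a metadata sample) makes possible, the following is a minimal sketch of a descriptive-metadata record and a search over such records. It is not the IMDI schema: the class, its field names and the modality vocabulary are assumptions made for this example, and the sample values are transcribed loosely from the provider overview above.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceDescription:
    """Hypothetical descriptive-metadata record (field names are illustrative, not IMDI)."""
    provider: str
    modality: str                                   # assumed vocabulary: "spoken", "written", "signed"
    languages: list = field(default_factory=list)   # languages covered, where stated
    hours: float = 0.0                              # recorded hours, where stated


def search(records, language=None, modality=None):
    """Return the records that match every criterion that is not None."""
    return [
        r for r in records
        if (language is None or language in r.languages)
        and (modality is None or r.modality == modality)
    ]


# Sample values loosely transcribed from the provider overview; modalities are assumed.
RECORDS = [
    ResourceDescription("Phonogrammarchiv, Vienna", "spoken", ["Basque", "Celtic", "Inuit"], 7000),
    ResourceDescription("SOFI", "spoken", ["Sami"]),
    ResourceDescription("U Hamburg", "signed"),
]

print([r.provider for r in search(RECORDS, language="Sami")])
print([r.provider for r in search(RECORDS, modality="signed")])
```

The point of the sketch is only that once every provider fills in the same structured fields, a systematic search across archives becomes a simple filter, which is what the reviewed language resource sites currently do not offer.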


3.2 Ethnography and museum objects (Karadimas et al)

Non-European objects and Societies within the European Museums (NECEP)

A report on the state of the art concerning digitized databases dealing with non-European objects and Societies within the European Museums, "NECEP" (Non European Component of European Patrimony), is presented by Karadimas et al (2003). The report is based on a NECEP questionnaire, distributed to a large number of important museums and institutes in Europe. The NECEP report is attached to the present state-of-the-art report.

The NECEP investigation addressed 78 major museums and institutions from 23 European countries. The current estimate of the total number of museum objects is based on the 15 countries (out of 23) from which data were obtained. The total proved to be more than 3 million objects (3,086,067). It should be stressed that this estimate is based on partial information and that it represents a minimum. The estimate will be subject to continuous revision as further information reaches us. There are, however, a few tentative observations to be made from the information currently available.

Thus, on the basis of current sampling, the major museums of England, Germany and Holland alone hold close to two thirds of the NECEP objects in Europe (2,062,519).

Further, with regard to the proportion of the museums' objects that have been digitized, we only have information for 7 of the 23 countries concerned. The data concerning this process are thus very incomplete. Again, using the information available, we observe a rate of digitization of 37% (slightly more than 1/3 of the collections, or 1,150,000 objects, not counting archive material).

The situation, however, is very heterogeneous with respect to the different countries.
Thus, the major Dutch and English museums have practically completed the digitization of their collections, whereas those of France and Italy, for example, are still in process (for example, the collections of the Quai Branly museum will be completely digitized by 2005).

The museums were also reviewed in terms of the functionality of their web sites. Three situations were discerned:

1. Publicity (general information on the museum; location, contact, etc.)
2. Publicity & Diffusion (situation 1 plus some specific collection information)
3. Publicity, Diffusion & Research (situation 2 plus presence of a search engine for the databases of the museums).

On the basis of the review, it was found that the great majority of the European museums dealing with NECEP collections use their web sites as a publicity and diffusion platform (that is, situation 2 above). The first exception to this rule is formed by some museums in Switzerland and Great Britain and all the Dutch ones, which offer a higher functional level representing situation 3. The second exception is formed by Germany and France, which do not have a systematic presentation of their material through the Internet.

Further, the most advanced web sites in terms of making scientific information available through the Internet are those of the British and Dutch museums.

A summary overview of the NECEP questionnaire and its results is presented in Appendix 3.2.
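
The proportions quoted in the NECEP summary can be re-derived from the reported totals. The short sketch below does so; note that applying the 37% rate to the full object count is an assumption made here for illustration, since the report's rate rests on data from only 7 of the 23 countries.

```python
# Figures quoted in the NECEP summary above.
TOTAL_OBJECTS = 3_086_067        # minimum estimate, data from 15 of 23 countries
BIG_THREE_OBJECTS = 2_062_519    # major museums of England, Germany and Holland
DIGITIZATION_RATE = 0.37         # observed rate, data from 7 of 23 countries


def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


# Share of all counted NECEP objects held by the three countries.
big_three_share = share(BIG_THREE_OBJECTS, TOTAL_OBJECTS)

# Assumption: the 37% rate is applied to the full object count.
digitized_estimate = round(TOTAL_OBJECTS * DIGITIZATION_RATE)

print(f"Share held by England, Germany and Holland: {big_three_share:.1%}")
print(f"Estimated digitized objects: {digitized_estimate:,}")
```

Under these assumptions the share comes out just above two thirds and the digitized estimate just above 1.14 million, which agrees with the rounded figures given in the text.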


Conclusions and directions for the future

The summary and conclusions of the ethnography report are:

• The majority of the European museums are organized around their objects and collections.
• None of the major European museums has, in its object-related databases, data on the cultures that produced these objects: even if a field "ethnic group" often appears, it only contains the name of the society from which the object originated.
• For ECHO and its NECEP component, the museums alone will not be able to meet the demand for information on the cultures which produced these objects. The strictly ethnological component of the NECEP part is currently underrepresented: only the museums (holders of 'patrimonial' objects) were initially approached.
• According to specific and personal surveys of some French and German ethnological and anthropological research centres, a large quantity (currently not quantifiable) of raw data relating to non-European cultures is presently held there.
• The data held by these centres and institutes consist of ethnographic investigation material (lexicons, linguistic questionnaires on non-European languages, field notes, accounts, myths, photographs, films and sound recordings…).
• This material is also part of the European heritage concerning non-European societies, although it is neither recognized nor inventoried by the major 'patrimonial' institutions. The research centres, which do not have a patrimonial vocation, are however reluctant to deposit it in the museums for fear of subsequently losing open access to it.
• Until now, only a few research centres were approached, principally those that already have important collections (e.g. Cambridge University). Continued work is needed at the European level to make an inventory of ethnological institutes and research centres.
• In order to improve the present situation, infrastructural support at the European level is essential. The institutes and research centres that will be responsible for providing information on non-European cultures remain to be defined. The procedure for choosing those centres also remains to be worked out. The various research groups will have to be put in contact at the European level (e.g., in terms of networks built around the large cultural areas: the Andes, Mesoamerica, sub-Saharan Africa, Amazonia, Melanesia, etc.)

3.3 History of science (Renn)

Clearing houses

Several clearing houses covering the area of history of science are available on the Internet (e.g. http://echo.gmu.edu/, http://www.loc.gov). They are intended to support navigation in a flood of unorganized information. Although clearing house sites provide very useful orientation, they lack mechanisms to evaluate users' feedback and suggestions from other sites, i.e. they do not cater to the needs of users as is now possible on the web with dynamic sites such as "Google" or "Amazon".


If, in the future, content on the world wide web is to assume a reference status comparable to that of established encyclopedias, such clearing houses will have to satisfy increasingly demanding criteria such as:

• reliability,
• up-to-dateness,
• comprehensiveness.

In addition, in view of the dynamic qualities of the electronic medium, they should allow

• interactive augmentation and rating, and
• multiple user interfaces for browsing, search, and retrieval.

Given these criteria, the available clearing houses must be considered to be still far from exploiting the potential of the new media. Many, for example, refer to dead links, offer outdated and even false information, and are far from being comprehensive.

This deplorable situation is, however, not merely a matter of lack of effort or engagement but rather points to a more fundamental problem of today's world wide web: its lack of appropriate ontologies and metadata models, which impedes a semantic organization of its contents. Most of the sites referenced by the clearing houses do not include metainformation about their entries and offer information in an unsystematic way. A link to "Galileo and Einstein" could hence be the entry point to a rich source collection, or merely point to some outdated lecture notes. A dynamic clearing house for cultural heritage on the web, which would also comprise the area of history of science, would thus be essential. It would, furthermore, be desirable to introduce standards of evaluation and criteria of quality control such as those of the HON foundation (http://www.hon.ch) or AFGIS (http://www.afgis.de), which have initiated a medical "web of trust".

History of science information on the Web

A preliminary survey of websites in the history of science has revealed an enormous discrepancy between project goals and actual realization. In particular, many of the announcements of projects promising to make scholarly materials available have not been fulfilled. It seems to be the case, furthermore, that funding agencies still lack appropriate mechanisms for ensuring a satisfactory implementation of the new media in the humanities.

This is all the more regrettable as the potential of the new media for the history of science is considerable. Evidently, the electronic storage of historical sources improves their accessibility and allows for new and powerful methods of information retrieval. Scanning and optical character recognition techniques can be used to digitize historical sources, which form the basis for building electronic archives. Databases and software tools can be developed to assist research and editorial activities. Working environments are conceivable that make it possible to integrate historical details into coherent models of historical developments. They require, however, both the availability of a wide range of sources accessible to the scientific community as a whole, within the framework of open digital research libraries, and scholarly cooperation extending well beyond a single institution. Such cooperations, characterized by a novel unity of research and dissemination, have, however, hardly been undertaken. Hence the potential of the Internet to cut across the traditional distinctions between research institutions, universities, and libraries is, in the area of the history of science, still not sufficiently exploited.

An attempt to single out and evaluate the best resources for the history of science on the world wide web for this state-of-the-art report does not seem, in view of the situation described in the


last paragraph, to be very promising. Instead, the following list of problems, from which all resources for the history of science suffer to a greater or lesser extent, is provided.

• The Volume Problem: Only a tiny fraction of the material relevant to the history of science is presented on the Web
• The Presentation Problem: Most of the available sources are not appropriately presented
• The Connectivity Problem: The hypertext capability of the medium is not sufficiently exploited for connecting the available pieces across sources, collections, countries and disciplines
• The Accessibility Problem: The few digitized collections, such as libraries and archives of documents, are often not accessible to research or to the public
• The Search Problem: A content-based index structure for effective search is not available
• The Exploitation Problem: The scientific and educational exploitation of digitally available resources is severely hampered by the lack of appropriate methods and tools
• The Suitability Problem: Current web technology does not yet meet the requirements posed by the complexity and heterogeneity of cultural, historic and scientific content
• The Distribution Problem: Composition of sites from distributed resources and interoperability of different data structures and software systems is difficult and sometimes impossible
• The Interactivity Problem: It is hardly ever possible to comment and never possible to aggregate new contents from old.

The last problem, interactivity, will be discussed further in the next section.

Annotation and collaboration through the Web

Even the best websites available are still far from realizing the original vision of the Internet as an interactive read-write medium and do not offer platforms for scholarly cooperation on historical sources. Two exceptional examples are the "History of Recent Science and Technology" project (http://hrst.mit.edu/) and the Archimedes Project (http://archimedes.mpiwg-berlin.mpg.de and http://archimedes.fas.harvard.edu), which is building up a digital research library that at the same time offers tools for the comprehension of sources in the history of mechanics. These tools comprise not only contemporary dictionaries and encyclopedias linked to the sources but also software tools for the annotation of these texts (Arboreal). Furthermore, the project allows locally distributed editorial work on the sources, combining visual representation with textual analysis.

The role of the history of science case study on the Web

The role of the history of science case study is to provide seed collections combining some of the technical solutions developed by advanced projects such as the Archimedes Project, in order to apply them to a wider range of sources and to offer an infrastructure supporting the "dynamic accumulation" of further sources by lowering the technical and competence threshold for putting them online. Furthermore, a dynamic clearing house model has been developed to test its feasibility.

3.4 History of art (Kieven)

Until now, Art History is present on the Internet almost only in the form of digitized museum collections that can be accessed through the web. If they can be used for free, they allow users to see images of the objects in a quality sufficient for display in a web browser but not for scholarly work. A search function is provided, but in many cases limited to a simple full


text search over the database. Though some information is given on the objects themselves, scholarly information of a scientific level is almost always lacking. Exchangeable metadata are not offered.

The following list gives an overview of web databases that can be seen as examples of the different approaches to art historical images and metadata on the web. We tried to select the most advanced or useful examples containing larger amounts of images and information.

Museum and other collections in web databases

A very advanced example of a freely accessible online database of a museum's collection is the website of the Guggenheim Museum, New York City:
http://www.guggenheimcollection.org
It allows searching in different directories and categories, but (of course) no annotation or linkage of found objects to other resources. The quality of the images is, as mentioned above, sufficient for use in a web browser, but not for advanced scholarly use. Information regarding the objects is given in English articles only (i.e. it contains no metadata) and covers the content of a normal printed catalogue. There is no means of referencing information from these entries other than bookmarking the whole web page, which is not very helpful because the links do not contain any indication of the content, e.g.:
http://www.guggenheimcollection.org/site/artist_work_md_126_15.html
for Picasso's "Le Moulin de la Galette". Nevertheless, this particular web page, though obviously created dynamically, can be found by internet search engines like Google if the user knows the name of the painting in the original French version used in the database. Users expecting an English title in this database might therefore not find it.

The French governmental database project JOCONDE contains images and data from about 84 museums in France with very little metadata (in French only):
http://www.culture.fr/documentation/joconde/pres.htm
Again, no means are provided to search or link the little metadata that is given.

Besides this, some projects initiated by (art) historians tend to concentrate on their special subject without the intention of interrelating their data with those of other projects, and therefore do not offer other users this possibility for collaboration.

An interesting example of this sort of database, especially because of the rich metadata provided and the quality and high resolution of the images, is the CEEC (Codices Electronici Ecclesiastici Coloniensis: http://www.ceec.uni-koeln.de), a web database of the medieval codices from the Erzbischöfliche Diözesan- und Dombibliothek Köln/Cologne.

But many museums and collections, even very famous ones (like the Staatliche Museen zu Berlin - Preußischer Kulturbesitz: http://www.smb.spk-berlin.de), do not have a web database of their cultural heritage at all, though they may own a database of their objects for internal use.

The main problem for art museums in publishing their content seems to be the gap between public interest and their own need to earn money by selling digital images of their objects and other relevant cultural content.


Collections and web portals of digital images

Another sort of database tries to offer access to groups of similar databases through a web interface that hides the differences between the databases' structures from the user:

PROMETHEUS (http://www.prometheus-bildarchiv.de) is a growing but as yet not fully functioning network of archives and databases of digital images for research and teaching. The quality of the images as well as of the textual data provided depends, of course, on the individual institution and therefore varies widely. Search is provided as a selection from a small set of metadata describing the images. Tools for interactive work with images from different sources are under development, but focus clearly on teaching processes and do not include collaboration between users and/or annotation of the sources.

A similar project (as a repository of large numbers of images) is the German "Bildindex": http://www.bildindex.de, provided by the partners of a network centered around the Bildarchiv Foto Marburg. It allows every user to search for images of interest and collect them in a private folder. Though some metadata regarding the images can be seen, they are not searchable and rely on HIDA, a large set of abbreviations for the description of images.

In any case, a possibility to make comments, annotate the given information or add new information as well as links to other, comparable digitized objects on the web does not exist, with very few and limited exceptions:

Annotation and collaboration through the web

An interesting software project aiming at the exploration and connection of different databases is FEDORA (http://www.fedora.info). The "Flexible Extensible Digital Object and Repository Architecture" (= FEDORA) was "designed to be a foundation upon which interoperable web-based digital libraries, institutional repositories and other information management systems can be built" and could therefore be interesting for new database projects in the humanities that want to link their data with other databases through the web. Written in Java, it is (almost) open source and platform independent, but concentrates on textual information.

COLOSSEUM (http://colosseum.biblhertz.it) is an independent project supported by the Bibliotheca Hertziana as an internet platform offering the possibility to contribute and discuss scholarly (art historical) papers or articles, including images, through the web. But again, the quality of the images is very low, and cannot be higher because of copyright limitations.

These limitations, set by the new copyright laws that different countries have adapted for the internet age and intended to protect the rights and financial benefits of the copyright owner (but not to serve the scientific community or a broader public interested in the cultural heritage of the world), create growing barriers for scholars and scientists, especially in the fields concerned with images of art historical and other objects of the cultural heritage. In particular, databases of art historical objects created by institutions such as universities or museums, especially but not only in North America, do not offer free access through the internet, but charge institutional and private users high access fees, often for access limited to a short period such as a year.

Another interesting project, "VBI ERAT LVPA" (http://www.ubi-erat-lupa.austrogate.at/), concentrating on archaeology and therefore not an art historical web database in a strict sense,


allows contributing information through the web, but is based on proprietary software. Its most interesting point is the use of the CIDOC schema as a point of reference for the structuring of the data. Though CIDOC itself is intended to serve, again, the purposes of museum collections and archives and therefore establishes a set of metadata more related to archival information than to scientific data, its CRM (Conceptual Reference Model), implemented as an XML DTD, comes structurally close to what is intended and needed for our project, Lineamenta, and, in our opinion, for other art historical database projects too.

Of course, there are lots of other databases useful for art historians, but as far as we can see none of them offers the possibilities intended by the creators of the World Wide Web and to be realized by the ECHO partners for real scholarly collaboration over the Internet. So the thousands of entries returned by a quick Google search for "art historical databases" and similar words are better classified as "inventory databases for art historical objects" than as "art historical databases" in a scientific sense of the notion "art history".

Therefore, the main gap between possibilities and realities concerning art historical information and its exchange over the web is the lack of tools and possibilities for scholarly annotation of existing data, not to mention for creating new information or interconnecting it between different databases over the web. In any case, the creation of links is always left to the maintainer of the database and/or website, and is not available to external contributors or users.

The role of Lineamenta and ZUCCARO in the situation described above

A simple first version of our web database for architectural drawings, Lineamenta, intended for technical tests and as a "proof of concept", allows insertion of data through the web in a browser interface and is used for collecting basic information on content material. Any person with a sufficient "role" (the sum of rights concerning the use of the database) can add and comment on information or create links to any resources on the web. We use this feature, for instance, to address the very large files of digital images created for our database and served by Digilib, a free, open source image server that can be located anywhere on the web because its content is addressed and its functions are operated by the use of URLs only.

Recently, information from another database project on architectural drawings at the Staatliche Museen Kassel (Germany), using a different technology, has been imported into our database. Now the colleagues from Kassel can work with their data over the internet and also relate them easily to other content in Lineamenta, as well as profit from the infrastructure created for our database. This project is very important for us because it demonstrates, even in its simple structure, the benefits and opportunities that real collaboration over the internet offers to scientists and researchers. As a next step, we can create a "clone" of Lineamenta on the server in Kassel in order to allow users to be temporarily disconnected from our database server, replacing the permanent connection with regularly processed synchronization: a feature that we have so far only tested with local installations on portable computers, but not over the net, and that, as far as is known to us, does not yet exist anywhere in the scientific internet world of Art History.

But the search for suitable software to fulfill our wider aims for a final version of Lineamenta led us to the decision to develop a software framework, or rather "tool box", for (art) historical purposes, based solely on completely free and open source software and applicable on all important operating systems.
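Addressing an image server purely through URLs, as described for Digilib above, can be illustrated by a small sketch. The endpoint below is hypothetical, and the query parameter names (fn for the image path, dw and dh for the requested size, wx, wy, ww and wh for a relative region) are assumptions chosen to illustrate the idea rather than a verified description of Digilib's interface.

```python
from urllib.parse import urlencode


def image_url(base, path, width, height, region=None):
    """Build a request URL for a URL-addressed image server.

    base   -- hypothetical server endpoint
    path   -- path of the image on the server
    width, height -- requested size of the delivered image in pixels
    region -- optional (x, y, w, h) given as fractions of the full image
    """
    # Parameter names are assumptions for illustration only.
    params = {"fn": path, "dw": width, "dh": height}
    if region is not None:
        x, y, w, h = region
        params.update({"wx": x, "wy": y, "ww": w, "wh": h})
    return base + "?" + urlencode(params)


# Hypothetical server and image path.
BASE = "http://images.example.org/servlet/Scaler"
full_view = image_url(BASE, "/drawings/sheet001", 800, 600)
detail = image_url(BASE, "/drawings/sheet001", 800, 600,
                   region=(0.25, 0.25, 0.5, 0.5))
print(full_view)
print(detail)
```

Because the whole request is contained in the URL, such a link can be stored in any database entry or annotation as an ordinary web reference, independently of where the image server itself is located.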


Our software framework, called ZUCCARO (Zope-based Universally Configurable Classes for Academic Research Online), consists of an XML Schema describing about 60 so-called "classes" of objects of (art) historical interest, including generalizable metadata, and a workflow system for collaboration on entries through the web interface. The XML Schema is intended to be used with ZOPE (including an object-oriented database) and PostgreSQL (an object-relational database conforming to the SQL standard) as well as an additional native XML repository (not a database) for all data. Any other implementation of ZUCCARO with different software is also possible, while the exchange of data and collaboration between ZUCCARO-based databases via XML export and import is preserved. By modifying the XML Schema (adding new classes or changing existing ones), the creator of a new database has every possibility to adapt the database to the special needs of the new project without destroying the chance to exchange data with other databases. The rapidly growing usage of XML in web applications as well as in office software will allow users to contribute to ZUCCARO-based databases in a completely transparent way by working with their favorite software. Working through web interfaces in the browser (offering simple editing tools) may nevertheless be the best way to contribute to such databases, for instance in the wider scientific framework of ECHO, because it allows all the required relations (links) to other resources on the web to be created in the easiest way, using ECHO's AGORA interface.

Twelve prominent hits from an internet search for arts resources (of the same kind as the one conducted for language resources, see appendices 3.1a and 3.1b above) were reviewed more closely with respect to content, access restrictions etc. (see appendix 3.4).

4. Interoperability (Wittenburg and Strömqvist)

The issues

We have repeatedly stressed the importance of sharing resources as a means to arrive at a more creative and efficient usage of data and, thereby, to enhance the vitality of the cultural heritage. This process will have a positive effect already within disciplines, but the great potential lies in its application across disciplines.

ECHO covers several disciplines which play an active role in the area of cultural heritage, such as History of Science, History of Arts, Languages, Ethnology and Philosophy. This multidisciplinary arena helps the ECHO partners to study interaction and technology development processes which cross discipline boundaries, so as to arrive at a framework which allows researchers to integrate resources from various disciplines to gain new insights.

Cross-disciplinary work requires interoperability at several levels. The most obvious level is that of discovering suitable resources that can be exploited in an interoperable way. The second level deals with accessing such resources, since they are stored in particular containers and formats. The third level addresses the syntactic aspects, i.e. how we can identify the elements of the individual resources and extract relevant content. The last and most difficult level focuses on the semantic aspects [2], i.e. how we can exploit and combine the different contents.

[2] For the present purposes, we exclude aspects of pragmatics from this consideration.

There is yet another and even more fundamental level of interoperability that has to be addressed: Do scholars have a real interest in interdisciplinary work that exceeds the boundary


of tailored projects? And if so, do we understand each other's language and objectives sufficiently well?

Due to the AGORA approach of ECHO we have to address all levels of interoperability. WP2 has already started discussing the first three levels and has identified a number of problems that are not at all trivial and need to be solved (see WP2 reports under http://www.mpi.nl/echo). Within WP2, two institutes have already started to discuss how their catalogue information could be mapped to allow joint queries [3].

[3] Bibliotheca Hertziana and MPI for Psycholinguistics have started information exchange about this issue.

Workshops

On May 20 the MPI-PL held a meeting with the big Dutch Cultural Heritage institutions (Rijksmuseum Amsterdam, National Library, National Archive, etc). One of the main topics was metadata, and guidelines for advancing the state of the art in the Netherlands were worked out. The participating institutions saw the meeting as the first major step towards a manageable solution. The solution will focus on

- Accepting all metadata (MD) sets that are already in use. One of the relevant sets is ICONCLASS based, and this will be further elaborated.
- Interoperability via modularization and Semantic Web technologies.
- New attempts to generate MD automatically.
- Different portals that show domain-specific terminology, but also simple access possibilities.

The meeting confirmed that the ECHO ideas are at the top of the priority list of other initiatives as well.

In order to further the discussion about semantic interoperability, WP2, in cooperation with Lund and Bern, will organize an Interoperability workshop in Lund in September 2003.

5. User groups (Strömqvist)

During the pilot phase of ECHO, user groups associated with research in the four targeted areas (history of science, history of art, ethnology and languages) are being established. Using the ECHO infrastructure to make more efficient use of their own data will help improve the research situation for many research groups themselves. Sharing data across research groups will mean still more efficiency and creativity. But the user groups of ECHO resources are not confined to the scientific communities. There is a large variety of groups for which ECHO resources might prove very useful, for example teachers and students, first and foremost in higher education but also in schools, and organizers and visitors of museum exhibitions.

Lund University

The university world represents a complex organization with both research and education, and we therefore decided to start exploring the potential of ECHO for different user groups at the university level. The vice chancellor of Lund University took a strong interest in this concept


and kindly offered Lund University as a testbed. As a result, ECHO was described in the local press and a briefing was given to the Board of deans (Humanities, Social Sciences, Medicine, Natural Sciences, Technical Sciences, Law, and Economy). Also, the faculty of the Humanities has decided to announce a doctoral position dedicated to ECHO. Lund University is an organisation which can profit from ECHO on all levels (research, education, construction of IT-based courses etc), and where we would stand a chance of evaluating the effects and effective usage of ECHO resources, not least since LU has internal procedures for evaluation.

At a local ECHO workshop for Lund University, April 7-8, the ECHO concept was presented in more detail to groups of postgraduate students, senior researchers and heads of department. The idea of profiting from ECHO as a framework for systematizing and sharing one's own research data was widely welcomed. Many participants decided to make reference to ECHO in their future research applications, and to introduce graduate students to IMDI as a method for systematizing their own research data. Representatives of IT-based education and distance education saw new opportunities in using ECHO resources.

The focus of the Lund workshop was on language data, but one professor of film science became interested in using the ECHO framework for film data, and one professor of musicology for a large project concerning the history of music in the Baltic countries (song texts, examples of singing performance, biographical notes on musicians, etc). Although the history of music cannot be a priority within the ECHO pilot, the example testifies to the need for precisely the infrastructural support offered by ECHO and the AGORA, not only for the four targeted domains of the pilot phase of ECHO but also for other scientific and cultural domains.

Further, a new research library is presently being built at the Centre for languages and literature at Lund University. This library will serve as a privileged testbed for the integration of library resources and ECHO resources to create a powerful information access environment for students and researchers. At the department of Library and Information Science at Lund University, the work on integrating ECHO with the new library has already generated three MA theses. Indeed, if this work proves successful, ECHO could work out criteria for "ECHO certified research libraries".

Schools

In the same vein, the educational authority of public schools in the Lund region has signalled an interest in ECHO, and a contact seminar between the schools and Lund University has been initiated. Among other things, five schools now participate in a project using and contributing to ECHO pilot material on writing traditions and writing development in school children in different European countries.

In the domain of natural science, the JASON project is a successful attempt to build a pedagogical interface on the Internet between university research labs and activities under school curricula. We believe that interfaces between ECHO and school activities can help extend ECHO to serve a similar role in the humanities as that of JASON in the natural sciences.

Museum visitors

Two more potential user groups are worth mentioning in this context. The first is Skånes


kulturarv (Scania's cultural heritage board), an organization coordinating joint activities between museums as well as IT-based dissemination of information concerning the cultural heritage of southern Sweden. Skånes kulturarv has signalled a strong interest in ECHO. The second is the Interactive Institute, an organization concerned with innovative uses of information technology for information dissemination, pedagogy, etc. The Interactive Institute has a project together with the Swedish National Museum in Stockholm concerning IT-based presentations to museum visitors. The leader of WP 1 has presented ECHO to the project group and has been invited to be a consultant on the project.

6. Conclusions (Renn & Strömqvist)

There is a handful of conclusions to be drawn from the present report. The first is that the fragmentation of resources, data and objects pertaining to the cultural heritage represents an enormous waste of resources. Many objects and data are stored in ways which make them inaccessible or hard to access for the research community. Other data are collected, digitized and analysed by research groups that have developed their own computational solutions, which admittedly serve the local research group well but isolate the group and its data from the rest of the scientific community. Most importantly, there are hardly any metadata. There is an enormous need for metadata to make data searchable and easier to share. Sharing different kinds of data further presupposes advanced interoperability.

All these observations point to the crucial role that the ECHO infrastructure initiative AGORA has to play in a full-fledged ECHO project. Provision of training, support, metadata conversion, etc. are basic functions which are much needed now and which can be expected to be much needed in the future. Further, it is important to grant providers full control of their data. The use of ECHO facilities should be conducive to further usage by the same users as well as by new users.

In our limited experience from the first six months of the ECHO project, workshops and training sessions tailored to candidate ECHO participants have proven very successful, both for recruiting providers of data and metadata and for deepening the understanding of user needs. Workshops and training sessions produce good examples of user needs, problems and solutions, which can then be used to propagate the ECHO concept more effectively to new users. Good examples, and not just abstract descriptions of, for example, metadata, are very powerful.

Further, we believe in the power of the researcher's own social/professional network. We should seek to give privileged training and support to key groups and scholars, who can then act as agents for spreading the ECHO concept to their own network of scientific partners.

Also, our experiences so far suggest that a dialogue with future users should take place while the tools are still being developed. This helps to motivate future user groups and to deepen our understanding of user needs and perspectives. ECHO is not just a technological feasibility project. It also aims at building networks of users, both scientific and educational.
