LIBER 39TH ANNUAL CONFERENCE - Statsbiblioteket

Session 3.3

Clemens Neudecker, Asaf Tzadok (IMPACT Project, The Netherlands): User Collaboration for Improving Access to Historical Texts

The paper will address how web-based collaboration tools can engage users in building the historical printed text resources created by mass digitisation projects. The drivers for developing such tools will be presented, identifying the benefits that both the user community and cultural heritage institutions can derive from them. The perceived risks, such as new errors introduced by users, and the limitations of engaging with users in this way will be set out. These will be discussed together with the lessons that can be learnt from existing activities such as the National Library of Australia’s newspaper website, which supports collaborative correction of Optical Character Recognition (OCR) output.

The paper will present the work of the IMPACT (Improving Access to Text, http://www.impact-project.eu) project, a large-scale integrating project funded by the European Commission as part of the Seventh Framework Programme (FP7). One of the project's aims is to develop tools that help improve OCR results for historical printed texts, specifically works published before the industrial production of books began in the middle of the 19th century.

The coordinator of the IMPACT project is the KB, National Library of the Netherlands. In the coming years, the KB will work intensively to realise a digital library that is accessible to everyone with an Internet connection. As the national library, the KB collects and maintains all publications that appear in the Netherlands, as well as part of the international publications about the Netherlands. One of its large, labour-intensive challenges is to digitise all the books, periodicals and newspapers that have appeared in the Netherlands. The KB aims to have 10% of all Dutch books, newspapers and periodicals digitised by 2013 (60 million pages by the KB, 13 million by third parties). It also aims to offer the full-text collections in such a way that researchers can use them immediately.

To realise this goal, technological improvements to image processing and OCR engines are key. However, the user community also has an important role to play: involving the intended users can help achieve the levels of accuracy currently found in born-digital materials. Improving OCR results to this level is key to producing resources that support better resource discovery. Such resources also enable better performance when text mining and accessibility tools are applied to the extracted text.

Specifically, the IMPACT project will develop two tools. The first supports collaborative correction and validation of OCR results. The second allows users to take part in building historical dictionaries, which can be used to validate word recognition. These technologies use characteristics of human perception as a basis for error detection.
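The abstract does not describe how the historical dictionaries validate word recognition. The Python below is a minimal sketch that assumes the simplest possible approach: a plain lexicon lookup. It is not IMPACT's actual method, which the abstract says draws on characteristics of human perception. The sketch flags OCR tokens that are not attested in a lexicon and lets a user's verdict grow that lexicon. The names `HISTORICAL_LEXICON`, `WORD_PATTERN`, `flag_suspect_tokens` and `record_user_verdict`, and the sample word forms, are all hypothetical.

```python
import re

# Hypothetical sample of attested historical word forms (illustrative only).
HISTORICAL_LEXICON = {"the", "olde", "booke", "hath", "been", "printed"}

# Runs of letters only; Python's \w is Unicode-aware, so accented forms match too.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def flag_suspect_tokens(ocr_text: str, lexicon: set[str]) -> list[str]:
    """Return OCR tokens (lowercased, in reading order) absent from the lexicon.

    A flagged token is either an OCR error or a genuine historical spelling
    the lexicon does not yet contain; a person has to decide which.
    """
    tokens = WORD_PATTERN.findall(ocr_text.lower())
    return [token for token in tokens if token not in lexicon]


def record_user_verdict(lexicon: set[str], token: str, is_valid_form: bool) -> None:
    """Add a token to the lexicon once a user confirms it is a real word form."""
    if is_valid_form:
        lexicon.add(token.lower())
```

For example, `flag_suspect_tokens("The olde booke hath bcen printed", HISTORICAL_LEXICON)` returns `["bcen"]`, a likely misreading of "been" that would be passed to a user for review. A genuine but unlisted historical spelling is flagged in exactly the same way. The user-verdict step is what lets the lexicon, and with it the validation, improve over time.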

Clemens Neudecker holds an M.A. in Philosophy, Computer Science and Political Science. He was a member of the Munich Digitisation Centre (MDZ) from 2003 to 2009, where he was mostly involved with OCR processing, authority files and databases. He has held numerous responsibilities in almost 20 digitisation projects since 2003. Through this work he has gained in-depth knowledge of all steps of an in-house digitisation process, from capture to online publication. He currently works as Interoperability Manager for IMPACT at the KB, National Library of the Netherlands.

30 June 2010
