LIBER 39TH ANNUAL CONFERENCE - Statsbiblioteket
Session 3.3
Clemens Neudecker, Asaf Tzadok (IMPACT Project, The Netherlands): User Collaboration for Improving Access to Historical Texts

The paper will address how web-based collaboration tools can engage users in building the historical printed text resources created by mass digitisation projects. It will present the drivers for developing such tools and identify the benefits they can bring to both the user community and cultural heritage institutions. The perceived risks, such as new errors introduced by users, and the limitations of engaging with users in this way will be set out, together with lessons learnt from existing activities such as the National Library of Australia's newspaper website, which supports collaborative correction of Optical Character Recognition (OCR) output.

The paper will present the work of the IMPACT (Improving Access to Text, http://www.impact-project.eu) project, a large-scale integrating project funded by the European Commission under the Seventh Framework Programme (FP7). One of the project's aims is to develop tools that improve OCR results for historical printed texts, specifically works published before the industrial production of books began in the middle of the 19th century.

The coordinator of the IMPACT project is the KB – National library of the Netherlands. In the coming years the KB will work intensively to realise a digital library that is accessible to everyone with an Internet connection. As the national library, the KB collects and maintains all publications that appear in the Netherlands, as well as part of the international publications about the Netherlands. One of its large, labour-intensive challenges is to digitise all the books, periodicals and newspapers that have appeared in the Netherlands. The KB aims to have 10% of all Dutch books, newspapers and periodicals digitised by 2013 (60 million pages by the KB, 13 million by third parties), and to offer the full-text collections in a form that researchers can use immediately.

To realise this goal, improvements in image processing and OCR engine technology are key. However, engaging the user community also has an important role to play: involving the intended users can help achieve the levels of accuracy currently found in born-digital materials. Improving OCR results to this level is essential for producing resources that support better resource discovery and that allow text mining and accessibility tools to perform better on the extracted text.

The IMPACT project will specifically develop a tool that supports collaborative correction and validation of OCR results, and a tool that allows users to help build historical dictionaries that can be used to validate word recognition. These technologies use the characteristics of human perception as a basis for error detection.

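The dictionary-based validation of word recognition mentioned above can be illustrated with a minimal sketch. This is not the IMPACT tool: the lexicon contents, the function name `flag_unknown_words` and the simple exact-match lookup are hypothetical assumptions. The sketch marks OCR tokens that are missing from a historical lexicon (which lists historical spellings alongside modern forms) as candidates for users to check.

```python
import re

# Hypothetical historical lexicon (illustrative entries only, not IMPACT data):
# historical spellings such as "olde" and "booke" are accepted next to modern forms.
HISTORICAL_LEXICON = {
    "the", "and", "of", "was", "printed",
    "book", "booke", "old", "olde",
}


def flag_unknown_words(ocr_text: str, lexicon: set[str]) -> list[tuple[int, str]]:
    """Return (character offset, token) pairs for OCR tokens not found in the lexicon.

    Tokens found in the lexicon count as validated; flagged tokens are
    candidates for review in a collaborative correction interface.
    """
    flagged = []
    # [^\W\d_]+ matches runs of Unicode letters, so characters such as the long s are kept.
    for match in re.finditer(r"[^\W\d_]+", ocr_text):
        token = match.group(0)
        if token.lower() not in lexicon:
            flagged.append((match.start(), token))
    return flagged


if __name__ == "__main__":
    sample = "The olde booke was printcd"
    for pos, word in flag_unknown_words(sample, HISTORICAL_LEXICON):
        print(f"check '{word}' at offset {pos}")
```

Running it prints `check 'printcd' at offset 19`: the historical spellings "olde" and "booke" are accepted because the lexicon lists them, while the misrecognised word is flagged. A production system would likely also need fuzzy matching and handling of line-end hyphenation.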
Clemens Neudecker holds an M.A. in Philosophy, Computer Science and Political Science. He was a member of the Munich Digitisation Centre (MDZ) from 2003 to 2009, where he was mainly involved with OCR processing, authority files and databases. Thanks to his responsibilities in almost 20 digitisation projects since 2003, he has in-depth knowledge of every step of an in-house digitisation process, from capture to online publication. He currently works as Interoperability Manager for IMPACT at the KB – National library of the Netherlands.

30 June 2010