Tools for mass digitisation and long-term preservation in cultural heritage institutions<br />
Cezary Mazurek, Tomasz Parkoła, Marcin Werla<br />
SEEDI conference, 17-18 May 2012
Agenda<br />
• Introduction<br />
• Mass digitisation<br />
• Long-term preservation<br />
• Tools supporting digitisation activities<br />
• dLibra and dMuseion – digital library/repository/museum<br />
• dArceo – long-term preservation services<br />
• dLab – digitisation workflow management tool<br />
• Summary<br />
Introduction<br />
Mass digitisation is a large-scale, automated process of converting analogue material into digital form, including enhancements such as OCR and transcription.<br />
Long-term preservation ensures that digital information remains accessible today, tomorrow, in a year, in 10 years, and beyond.
Supporting tools<br />
Developed by Poznań Supercomputing and Networking Center (Polish Academy of Sciences)<br />
• dLibra – digital library framework<br />
• dMuseion – digital museum framework<br />
• dArceo – long-term preservation services<br />
• dLab – digitisation workflow management<br />
dLibra<br />
Developed by Poznań Supercomputing and Networking Center (Polish Academy of Sciences)<br />
• In development at PSNC since 1999<br />
• The first Polish software for building digital libraries<br />
• A key element in stimulating the growth of digital libraries in Poland<br />
dLibra – deployments<br />
Number of deployments by year<br />
Year: 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011<br />
Institutional deployments: 0 0 1 1 5 7 13 22 26 28<br />
Regional deployments: 1 1 1 5 10 11 17 25 28 32
dLibra<br />
Polish Digital Libraries<br />
Approx. number of digital objects: 1 000 000<br />
Number of digital libraries: over 80<br />
Participating institutions: several hundred<br />
All resources are available via the Polish national aggregator, the Digital Libraries Federation: http://fbc.pionier.net.pl<br />
Regional digital libraries<br />
Institutional digital libraries<br />
dLibra – regional digital library<br />
dLibra – institutional digital library<br />
dMuseion<br />
• Dedicated software tool for digital museums<br />
– Developed in cooperation with the National Museum in Warsaw<br />
– The aim is to make museum resources available on the Internet and to provide an easy-to-use software package for building digital museums<br />
• Why a dedicated solution?<br />
– Different types of resources than in libraries/repositories and archives (paintings, sculptures, etc.)<br />
– Themed collections of museum resources<br />
– Terminology<br />
dMuseion – main page<br />
dMuseion – digital object metadata<br />
dMuseion – 3D object<br />
dArceo<br />
• In development at PSNC since 2011<br />
• Based on prototype services developed within the SYNAT project, financed by the Polish National Center for Research and Development<br />
• Designed to preserve master, optimized and even presentation files, with a primary focus on:<br />
– Textual data (e.g. PDF/A)<br />
– Images (e.g. TIFF, JPEG2000)<br />
– Audiovisual data (e.g. MPEG-4)<br />
dArceo<br />
Basic functions and characteristics (1)<br />
• Can utilise various types of data storage<br />
– Local hard drive (RAID is suggested), disk array, etc.<br />
– (S)FTP, e.g. archiving services of PLATON U4 (R&D project)<br />
• Internal representation uses well-known standards and formats (e.g. METS, PREMIS)<br />
• Dedicated built-in mechanism for data monitoring<br />
– Loss risk calculation based on PRONOM/UDFR<br />
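A minimal sketch of how a format-based loss-risk score might be computed from registry facts. The slides do not describe the actual calculation, so everything here is an assumption for illustration: the `FormatFacts` fields, the weights, the threshold and the registry entries are hypothetical and are not taken from PRONOM, UDFR or dArceo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatFacts:
    """Facts a format registry might expose (hypothetical fields)."""
    open_specification: bool
    actively_supported: bool
    known_tools: int


# Hypothetical registry excerpt; the keys are made-up labels, not real identifiers.
REGISTRY = {
    "format-a": FormatFacts(open_specification=True, actively_supported=True, known_tools=12),
    "format-b": FormatFacts(open_specification=False, actively_supported=False, known_tools=1),
}


def loss_risk(facts: FormatFacts) -> float:
    """Return a score in [0, 1]; higher means a higher risk of losing access."""
    risk = 0.0
    if not facts.open_specification:
        risk += 0.4  # assumed weight
    if not facts.actively_supported:
        risk += 0.4  # assumed weight
    if facts.known_tools < 3:
        risk += 0.2  # assumed weight
    return round(risk, 2)


def formats_at_risk(registry, threshold=0.5):
    """List the formats whose score reaches the (assumed) threshold."""
    return sorted(name for name, facts in registry.items() if loss_risk(facts) >= threshold)
```

A monitoring job could run `formats_at_risk(REGISTRY)` periodically and flag the affected files as candidates for migration.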
dArceo<br />
Basic functions and characteristics (2)<br />
• Key functionality<br />
– Data migration for long-term preservation (OAIS transformation approach)<br />
– Data conversion, e.g. for digital libraries and online availability of resources<br />
– Advanced data delivery, e.g. a dedicated tool for visualising large-size data, a transcription tool<br />
• Migration/conversion plans can be defined<br />
– A migration or conversion can have several steps (pipelining of services)<br />
– Semantic technologies are applied to orchestrate the data manipulation services<br />
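The multi-step plans mentioned above can be pictured as a chain of services in which each step's output format must match the next step's input format. The sketch below is an illustration only: `Step`, `run_plan` and the stand-in converters are hypothetical names, not dArceo's API, and the semantic orchestration of services is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One service call in a plan (hypothetical structure)."""
    name: str
    source_format: str
    target_format: str
    run: Callable[[bytes], bytes]


def run_plan(data: bytes, data_format: str, plan: List[Step]):
    """Apply each step in order, checking that the formats chain correctly."""
    log = []
    for step in plan:
        if step.source_format != data_format:
            raise ValueError(
                f"{step.name} expects {step.source_format}, got {data_format}"
            )
        data = step.run(data)
        data_format = step.target_format
        log.append(step.name)
    return data, data_format, log


# Stand-in services that only tag the bytes; real ones would call converters.
plan = [
    Step("tiff-to-jpeg2000", "TIFF", "JPEG2000", lambda b: b"jp2:" + b),
    Step("jpeg2000-to-pdfa", "JPEG2000", "PDF/A", lambda b: b"pdfa:" + b),
]

result, final_format, log = run_plan(b"scan", "TIFF", plan)
```

Checking the format at every step means a badly assembled plan fails before any service runs on the wrong kind of input.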
dArceo<br />
Basic functions and characteristics (3)<br />
• Migration, conversion and advanced delivery services can be shared<br />
– By means of synchronisation and P2P-like communication<br />
• Data safety and security<br />
– AAA (authentication, authorisation, accounting) of users and external services<br />
– Data safety should be assured by the data storage (e.g. redundancy, distant locations)<br />
dLab<br />
Basic functions and characteristics<br />
• General<br />
– Supports digitisation activities<br />
– Management of the digitisation workflow<br />
– Monitoring capability<br />
• Terminology<br />
– Digitisation task<br />
• Basic element in the system, related to a particular item, e.g. a book, an issue, etc.<br />
• Covers all the activities necessary to finish the digitisation of that item<br />
– Activity within a digitisation task<br />
• Represents certain work to be done, e.g. preparing optimized files<br />
• The work is done by a human (user) or a machine (tool)<br />
• Flexibility<br />
– A set of activities within the scope of a task<br />
– Order constraints<br />
– Pluggable architecture<br />
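A task made of activities with order constraints can be modelled as a dependency graph, and a valid execution order is then a topological sort of that graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The activity names are illustrative and the specific constraints are assumptions; this is not how dLab is known to store or evaluate them.

```python
from graphlib import TopologicalSorter

# Activity -> set of activities that must finish first (assumed constraints).
constraints = {
    "select object": set(),
    "prepare master files": {"select object"},
    "create PDF": {"prepare master files"},
    "archive master files": {"prepare master files"},
    "verify": {"create PDF"},
    "submit PDF": {"verify"},
}


def execution_order(constraints):
    """Return one order that satisfies every constraint.

    graphlib raises CycleError if the constraints contradict each other.
    """
    return list(TopologicalSorter(constraints).static_order())


order = execution_order(constraints)
```

Several orders can be valid at once; here "create PDF" and "archive master files" are independent of each other, so either may come first.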
dLab – digitisation task<br />
[Diagram: digitisation task "Task A". Roles: Editor, Scanner operator, Tool, QA. Activities: Select object to digitise, Prepare master files, Create PDF, Archive master files, Verify, Submit PDF to digital library.]
dLab – external tools<br />
[Diagram: dLab and external tools. External tools: dLibra, dArceo, FineReader, Document Express. Plugins for external tools: dLibra plugin, dArceo plugin, FR plugin, DE plugin. dLab: Activity A, Activity B, Activity C, …; dLab UI; Working space (e.g. disk array).]
Summary (1)<br />
• Digitisation is an important mission of all cultural heritage institutions<br />
• Long-term preservation is a key element of each digital library or museum<br />
• Mass digitisation needs strong management support<br />
• Poznań Supercomputing and Networking Center acts as a supporting institution<br />
– R&D center (PAS) in the IT area, including digital libraries<br />
– Cooperates with various national and international bodies to stimulate the growth and advancement of the information society<br />
Summary (2)<br />
Comprehensive software solution for cultural heritage institutions<br />
• dLab – digitisation process<br />
• dArceo – master files, long-term preservation<br />
• dLibra or dMuseion – presentation files, online availability of resources<br />
Questions?<br />
Tomasz Parkoła<br />
tparkola@man.poznan.pl<br />
http://dl.psnc.pl<br />
Thank you!