
Tools for mass digitisation and long-term preservation in cultural heritage institutions

Cezary Mazurek, Tomasz Parkoła, Marcin Werla

SEEDI conference, 17-18 May 2012


Agenda

• Introduction
• Mass digitisation
• Long-term preservation
• Tools supporting digitisation activities
• dLibra and dMuseion – digital library/repository/museum
• dArceo – long-term preservation services
• dLab – digitisation workflow management tool
• Summary


Introduction

Mass digitisation is a large-scale, automated process of capturing analog content in digital form, including enhancements such as OCR and transcription.

Long-term preservation ensures that digital information remains accessible today, tomorrow, in a year, in 10 years, and so on.


Supporting tools

Developed by Poznań Supercomputing and Networking Center (Polish Academy of Sciences)

• dLibra – digital library framework
• dMuseion – digital museum framework
• dArceo – long-term preservation services
• dLab – digitisation workflow management


dLibra

Developed by Poznań Supercomputing and Networking Center (Polish Academy of Sciences)

• Has been developed by PSNC since 1999
• The first Polish software for building digital libraries
• Key element in stimulating the growth of digital libraries in Poland


dLibra – deployments

[Chart: number of deployments per year, 2002-2011]

Year           2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
Institutional     0     0     1     1     5     7    13    22    26    28
Regional          1     1     1     5    10    11    17    25    28    32
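The two series can be added to get the total number of dLibra deployments for each year. The short sketch below only redoes that arithmetic on the figures from this slide; the variable names are illustrative.

```python
# Deployment counts transcribed from the chart on this slide.
years = list(range(2002, 2012))
institutional = [0, 0, 1, 1, 5, 7, 13, 22, 26, 28]
regional = [1, 1, 1, 5, 10, 11, 17, 25, 28, 32]

# Total per year = institutional + regional.
totals = {
    year: inst + reg
    for year, inst, reg in zip(years, institutional, regional)
}

for year, total in totals.items():
    print(year, total)
```

For 2011 this gives 28 + 32 = 60 deployments, which matches the top of the chart's axis.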


dLibra

Polish Digital Libraries

• Approx. no. of digital objects: 1 000 000
• Number of digital libraries: over 80
• Participating institutions: several hundred

All resources are available via the Polish national aggregator, the Digital Libraries Federation: http://fbc.pionier.net.pl

[Map legend: regional digital libraries, institutional digital libraries]


dLibra – regional digital library

dLibra – institutional digital library


dMuseion

• Dedicated software tool for digital museums
  – Developed in cooperation with the National Museum in Warsaw
  – The aim is to make museum resources available on the Internet and to prepare an easy-to-use software package for building digital museums
• Why a dedicated solution?
  – Different types of resources than in libraries, repositories and archives (paintings, sculptures, etc.)
  – Themed collections of museum resources
  – Terminology


dMuseion – main page

dMuseion – digital object metadata

dMuseion – 3D object


dArceo

• Has been developed by PSNC since 2011
• Based on the prototype services developed within the SYNAT project, financed by the Polish National Center for Research and Development
• Designed to preserve master, optimized and even presentation files, with a primary focus on:
  – Textual data (e.g. PDF/A)
  – Images (e.g. TIFF, JPEG2000)
  – Audiovisual (e.g. MPEG-4)


dArceo

Basic functions and characteristics (1)

• Can utilise various types of data storage
  – Local hard drive (RAID is suggested), disk array, etc.
  – (S)FTP, e.g. archiving services of PLATON U4 (R&D project)
• Internal representation uses well-known standards and formats (e.g. METS, PREMIS)
• Dedicated built-in mechanism for data monitoring
  – Loss risk calculation based on PRONOM/UDFR
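The slides do not say how dArceo computes the loss risk, so the following is only a minimal sketch of the general idea: score each file format from registry-style facts about it. The `FormatInfo` fields, the weights and the example values are all hypothetical; a real implementation would take its facts from PRONOM or UDFR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatInfo:
    """Hypothetical registry record (not the PRONOM/UDFR schema)."""
    name: str
    open_specification: bool  # is the format specification openly published?
    tool_count: int           # number of known tools able to read the format


def loss_risk(info: FormatInfo) -> float:
    """Return an illustrative risk score in [0, 1]; higher means riskier."""
    risk = 0.0
    if not info.open_specification:
        risk += 0.5
    # The fewer tools that can read the format, the higher the risk (max 0.5).
    risk += 0.5 / max(info.tool_count, 1)
    return min(risk, 1.0)


# Invented example values, for illustration only.
well_supported = FormatInfo("TIFF", open_specification=True, tool_count=10)
obscure = FormatInfo("LegacyFormat", open_specification=False, tool_count=1)

print(well_supported.name, loss_risk(well_supported))
print(obscure.name, loss_risk(obscure))
```

A monitoring job could run such a function over the formats in the archive and flag those above a chosen threshold as candidates for migration.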


dArceo

Basic functions and characteristics (2)

• The most important functionality
  – Data migration for the needs of long-term preservation (OAIS transformation approach)
  – Data conversion, e.g. for the needs of digital libraries and online availability of resources
  – Advanced data delivery, e.g. dedicated tool for large-size data visualisation, transcription tool
• It is possible to define migration/conversion plans
  – Migration or conversion can have several steps (pipelining of the services)
  – Semantic technologies applied for the orchestration of the data manipulation services
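A multi-step migration or conversion plan can be pictured as a pipeline in which each step consumes the output of the previous one. The sketch below is an assumption-laden illustration, not dArceo's API: `FileRecord`, `make_step` and `run_plan` are hypothetical names, the steps only relabel the format instead of converting any data, and the semantic orchestration mentioned on the slide is not modelled.

```python
from typing import Callable, List, NamedTuple


class FileRecord(NamedTuple):
    """Hypothetical description of a file handled by a plan."""
    name: str
    fmt: str


Step = Callable[[FileRecord], FileRecord]


def make_step(source_fmt: str, target_fmt: str) -> Step:
    """Build a placeholder conversion step from source_fmt to target_fmt."""
    def step(record: FileRecord) -> FileRecord:
        if record.fmt != source_fmt:
            raise ValueError(
                f"step expects {source_fmt}, got {record.fmt}"
            )
        stem = record.name.rsplit(".", 1)[0]
        # A real service would convert the data here; we only relabel it.
        return FileRecord(f"{stem}.{target_fmt.lower()}", target_fmt)
    return step


def run_plan(plan: List[Step], record: FileRecord) -> FileRecord:
    """Apply the steps in order, feeding each result into the next step."""
    for step in plan:
        record = step(record)
    return record


# Illustrative two-step plan: TIFF -> JPEG2000 -> PDF.
plan = [make_step("TIFF", "JPEG2000"), make_step("JPEG2000", "PDF")]
result = run_plan(plan, FileRecord("page_001.tiff", "TIFF"))
print(result)
```

Because every step checks its input format, a plan whose steps do not chain correctly fails at the mismatching step rather than producing a mislabelled file.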


dArceo

Basic functions and characteristics (3)

• Capability of sharing migration, conversion and advanced delivery services
  – By means of synchronisation and P2P-like communication
• Data safety and security
  – AAA (authentication, authorisation, accounting) of the users and external services
  – Data safety should be assured by the data storage (e.g. redundancy, distant locations)


dLab

Basic functions and characteristics

• General
  – Support for digitisation activities
  – Management of the digitisation workflow
  – Monitoring capability
• Terminology
  – Digitisation task
    • Basic element in the system, related to a particular item, e.g. a book or an issue
    • Covers all the activities necessary to finish the digitisation of that item
  – Activity within a digitisation task
    • Represents certain work to be done, e.g. prepare optimized files
    • The work is done by a human (user) or a machine (tool)
• Flexibility
  – A set of activities within the scope of a task
  – Order constraints
  – Pluggable architecture
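The terminology above maps naturally onto a small data model: a task is a set of activities, each done by a user or a tool, with order constraints between them. The following is a minimal sketch under stated assumptions. The activity names come from the Task A example in this presentation, but the `Activity` class, the performer assignments and the specific constraints are hypothetical and not taken from dLab.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Tuple


@dataclass(frozen=True)
class Activity:
    name: str
    performer: str                # "user" or "tool" (assumed assignment)
    after: Tuple[str, ...] = ()   # activities that must finish first


# Assumed order constraints for the example task.
task_a = [
    Activity("Select object to digitise", "user"),
    Activity("Prepare master files", "user", ("Select object to digitise",)),
    Activity("Create PDF", "tool", ("Prepare master files",)),
    Activity("Archive master files", "tool", ("Prepare master files",)),
    Activity("Verify", "user", ("Create PDF",)),
    Activity("Submit PDF to digital library", "tool", ("Verify",)),
]

# Map each activity to its predecessors and derive one valid execution order.
graph = {activity.name: set(activity.after) for activity in task_a}
order = list(TopologicalSorter(graph).static_order())

for name in order:
    print(name)
```

Expressing the constraints as a graph rather than a fixed list leaves room for activities that can run in either order, such as creating the PDF and archiving the master files in this example.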


dLab – digitisation task

[Diagram: activities of Task A and the roles involved]

• Roles: Editor, Scanner operator, Tool, QA
• Activities in Task A: Select object to digitise, Prepare master files, Create PDF, Archive master files, Verify, Submit PDF to digital library


dLab – external tools

[Diagram: dLab and its plugins for external tools]

• dLab: dLab UI; activities (Activity A, Activity B, Activity C, …)
• Plugins for external tools: dLibra plugin, dArceo plugin, FR (FineReader) plugin, DE (Document Express) plugin
• External tools: dLibra, dArceo, FineReader, Document Express
• Working space (e.g. disk array)
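One common way to build such a pluggable design is a registry that maps each external tool to a plugin, so that activities call tools only through the registry. The sketch below illustrates that pattern and is not dLab's actual plugin interface: `PLUGINS`, `register`, `run_activity` and the plugin functions are hypothetical, and the plugins return a message instead of contacting any real system.

```python
from typing import Callable, Dict

# Registry of plugins, keyed by the name of the external tool.
PLUGINS: Dict[str, Callable[[str], str]] = {}


def register(tool_name: str):
    """Decorator that adds a plugin function to the registry."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        PLUGINS[tool_name] = func
        return func
    return decorator


@register("dArceo")
def archive_master_files(workspace_path: str) -> str:
    # Placeholder: a real plugin would hand the files over to dArceo.
    return f"archived master files from {workspace_path}"


@register("dLibra")
def publish_presentation_files(workspace_path: str) -> str:
    # Placeholder: a real plugin would submit the files to dLibra.
    return f"published presentation files from {workspace_path}"


def run_activity(tool_name: str, workspace_path: str) -> str:
    """Run an activity by delegating to the plugin for the given tool."""
    try:
        plugin = PLUGINS[tool_name]
    except KeyError:
        raise LookupError(f"no plugin registered for {tool_name!r}") from None
    return plugin(workspace_path)


print(run_activity("dArceo", "/work/task-a"))
```

Supporting another external tool then means registering one more function; the code that runs activities does not change.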


Summary (1)

• Digitisation is an important mission of all cultural heritage institutions
• Long-term preservation is a key element of each digital library or museum
• Mass digitisation needs strong management support
• Poznań Supercomputing and Networking Center acts as a supporting institution
  – R&D center (PAS) in the IT area, including digital libraries
  – Cooperates with various national and international bodies to stimulate the growth and advancement of the information society


Summary (2)

Comprehensive software solution for cultural heritage institutions

• dLab: digitisation process
• dArceo: master files, long-term preservation
• dLibra or dMuseion: presentation files, online availability of resources


Questions?<br />

Tomasz Parkoła<br />

tparkola@man.poznan.pl<br />

http://dl.psnc.pl<br />



Thank you!
