set of shared metadata which is the same for all types, and document type-specific metadata recorded in the case that the document to be submitted has already been published elsewhere. The content of all ingested documents is described by means of a controlled vocabulary. Authors can search for appropriate subject headings with a built-in search tool and add them to the submission form. [101] Although authors are also given the opportunity to add free descriptive keywords, these are usually replaced by descriptors/subject headings from the controlled vocabulary by pedocs staff. Particularly where abstracts are included, the descriptive information available appears sufficient to locate and analyze (see OAIS 2002, 4-30) digital objects stored in the repository.

Via the OAI interface, pedocs metadata can be mapped onto the Dublin Core Element Set, of which eight elements are currently used by pedocs. In addition, documents are classified by means of DDC; however, since only the DDC main classes are used, all documents receive the same notation, i.e. 370. This notation is not displayed in the records and, being identical for every document, cannot be used for browsing.

2.2.2 JUWEL Ingest

Ingest: Receive Submission

Digital objects reach the repository via the author submission form (Web Submission User Interface [UI]) or by email to the repository staff, who then ingest the submitted files into the repository. In addition, campus press publications are ingested into JUWEL in an automated batch ingest process [102] which can either be initiated by an administrator or be scheduled to run automatically as a so-called cron job at designated times.

Currently, most publications stored in JUWEL are in PDF format, although sometimes the “original” from which the PDF was created is archived along with the PDF. As yet, no policy specifies which file formats can and will be preserved (i.e. curated) in JUWEL and/or the long-term archive with which it might cooperate one day. However, according to Wagner it seems unlikely that proprietary formats such as .doc will be included in preservation efforts beyond bitstream preservation, if at all. In contrast, the use and support of open formats such as LaTeX seems much more likely.

While the “JUWEL-Policy” [103] states that authors submitting material to the repository have to give permission that their works can be stored, copied, transformed into other formats, and made accessible via JUWEL [104], the policy does not specify that documents submitted must be free from DRM or other security or protection measures (e.g. password protection). For example, the record accessible at http://hdl.handle.net/2128/3140 (25.10.2009) links to an encrypted campus press publication that is password-protected against changes. As already explained above, this is highly problematic if the document is to be preserved in a long-term archive. It therefore seems advisable not only to modify the policy accordingly, but also to take measures to ascertain that the repository indeed has complete control over these digital objects, as otherwise long-term preservation might not be possible or might be considerably more difficult to realize for them.

Ingest: Quality Assurance

Integrity: DSpace versions up to 1.5 carry out a format identification that is based merely on the filename extension, which is checked against a locally installed format registry. [105] As Larry Stone explains in a paper delivered at the Open Repositories Conference 2008, commenting on problems with the current method,

    [t]he data model for technical metadata (the BitstreamFormat object) has been essentially unchanged from the inception of DSpace through release 1.5. Although its design anticipated the use of external format registries, it was never completed. There are some other problems with the current data model and implementation:
    1. Formats are identified by arbitrarily-assigned descriptive names such as “Adobe PDF” which have no meaning outside of DSpace. This impedes any attempt to share format descriptions with any other application.
    2. There is no provision for collecting more extensive format technical metadata, such as standards documents, that would help future preservationists interpret obsolete formats.
    3. A Bitstream's format is only identified by comparing its filename extension to entries in the format registry. This method is prone to errors, ambiguous results, and outright failures [...]. (2008, 2)

Thus, currently neither thorough format identification nor format validation, which would have to be performed by an external tool such as JHOVE, is carried out for submissions to JUWEL. Consequently, whether a file is functional is at the moment determined merely by opening it, which is clearly not a sufficient procedure if digital objects are to be preserved for the long term.

Additionally, the DSpace submission workflow gives authors the opportunity to download their publications immediately after upload in order to verify that the MD5 checksum generated during the ingest process is the same as the one of the original

[101] http://www.pedocs.de/schlagwortsuche.php?set=01&status=call (03.11.2009). The subject terms offered are an excerpt from the controlled vocabulary used for the FIS Bildung database.
[102] In addition to the Web Submission User Interface (UI), a “batch item importer” is available in DSpace, “which turns an external SIP (an XML metadata document with some content files) into an 'in progress submission' object” (Tansley et al. 2006, 16). Users cannot initiate such a batch ingest and are thus required to use the Web Submission UI in all cases.
[103] http://juwel.fz-juelich.de:8080/dspace/help/policy.jsp (03.11.2009). Hereafter cited as JUWEL Policy.
[104] Note that the policy contains the deposit agreement rather than more general policy information: “By accepting this provision, the Central Library is granted the non-exclusive right to store the resource, to reproduce it, to make it accessible worldwide, and to produce printed and electronic copies as needed […]. By accepting this provision, the publishing party grants the Central Library the right to convert the resource into other electronic and physical formats as needed (e.g. for migration, accessibility, improved access, indexing) and to exploit these in accordance with paragraph 1” (JUWEL Policy; translated from the German).
[105] Plans exist to implement an automatic format identification service in DSpace that makes use of external format registries, as well as a new data model for technical metadata, in a future DSpace release. These efforts are explicitly related to considerations concerning digital preservation. As Stone explains, “[t]o address digital preservation problems and take advantage of tools currently being developed, DSpace needs more fine-grained format classification, as well as globally-recognized identifiers for formats […]. Both of these are provided by a preservation-minded external format registry such as PRONOM or GDFR” (Stone 2008, 5.2). In addition, the importance of format validation has been recognized, and according to Stone the new data model “is ready to record validation; successfully validated Bitstreams are marked with a Confidence value of 'VALIDATED'” (Stone 2008, 5.2). As outlined in the DSpace wiki, future projects will address the following preservation applications: “Detecting and notifying administrators of obsolete formats in the archive,” “Format migration and normalization (migration on ingest),” and “Data format validation (some of which is already implemented in pending JHOVE integration work)” (DSpace Wiki 2008a).
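
The weakness of purely extension-based format identification discussed in this section can be illustrated with a minimal Python sketch. It is not DSpace code: the registry contents, the signature list, and the function names `identify_by_extension`, `identify_by_signature`, and `check_submission` are hypothetical and chosen for illustration only. The sketch compares the format claimed by the filename extension with the format suggested by the file's leading bytes. Such a signature check is still only identification; it does not replace the format validation that a tool such as JHOVE would perform.

```python
from pathlib import Path

# Hypothetical stand-in for a locally installed format registry:
# filename extension -> descriptive format name.
EXTENSION_REGISTRY = {
    ".pdf": "Adobe PDF",
    ".doc": "Microsoft Word",
    ".tex": "LaTeX",
}

# A few well-known leading byte signatures ("magic numbers").
# The second entry is the OLE2 compound file header used by legacy
# .doc files (and other Office formats); it is labeled as Word here
# only to keep the illustration small.
SIGNATURES = [
    (b"%PDF-", "Adobe PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "Microsoft Word"),
]


def identify_by_extension(filename: str) -> str:
    """Look up the format by filename extension only."""
    return EXTENSION_REGISTRY.get(Path(filename).suffix.lower(), "Unknown")


def identify_by_signature(data: bytes) -> str:
    """Look up the format by the file's leading bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "Unknown"


def check_submission(filename: str, data: bytes) -> tuple[str, str, str]:
    """Return (claimed format, detected format, status).

    status is 'match', 'mismatch', or 'unverified' (no known signature).
    """
    claimed = identify_by_extension(filename)
    detected = identify_by_signature(data)
    if detected == "Unknown":
        status = "unverified"
    elif detected == claimed:
        status = "match"
    else:
        status = "mismatch"
    return claimed, detected, status


if __name__ == "__main__":
    # A Word file that was merely renamed to .pdf passes an
    # extension-only check but is flagged by the signature check.
    renamed = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 16
    print(check_submission("report.pdf", renamed))
```

In this sketch, a file whose extension and content disagree is reported as a mismatch rather than being silently accepted, which is exactly the case that an extension-only lookup cannot detect. Formats without a distinctive signature, such as LaTeX sources, remain "unverified" and would need a different kind of check.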