Integrity can be defined as “completeness” and “intactness” of the digital objects, in particular of their significant properties (nestor 2008, 6; my translation). It can be put at risk both by malfunctioning technology and by human action, whether intentional or by mistake (nestor 2008, 6). During Ingest, integrity may, among other measures, be ascertained and protected by means of virus checking, cyclic redundancy checks or other checksums, as well as by ascertaining the validity of the file format. Thus, in order to determine the digital object's properties as comprehensively as possible, the following actions should be performed:

- Format identification is the process of determining the format to which a digital object conforms; in other words, it answers the question: “I have a digital object; what format is it?”
- Format validation is the process of determining the level of compliance of a digital object to the specification for its purported format, e.g.: “I have an object purportedly of format F; is it?” […]. 77

For the purpose of long-term preservation, such identification and validation processes, which can, for example, be carried out with the tool JHOVE (JSTOR/Harvard Object Validation Environment) 78, should be combined with format characterization, i.e. “the process of determining the format-specific significant properties of an object of a given format, e.g.: 'I have an object of format F; what are its salient properties?'” (ibid.). 79 Such format identification, validation, and characterization measures clearly exceed the range of “traditional”, non-preservation-related repository tasks. Nonetheless, it would be desirable if available repository software included such features as standard.
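The checksum and format identification steps mentioned above can be illustrated with a minimal Python sketch. It is not how JHOVE or DROID work internally: the signature table, the function names (`fixity`, `identify_format`, `is_intact`), and the choice of CRC-32 plus SHA-256 are assumptions made for illustration only. The sketch covers identification by leading "magic" bytes and a fixity comparison; validation and characterization require parsing a file against its format specification and are not attempted here.

```python
import hashlib
import zlib
from typing import Dict, Optional

# Illustrative signature table (an assumption for this sketch). Real tools
# draw on much larger format registries.
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def fixity(data: bytes) -> Dict[str, str]:
    """Compute checksums that can be recorded at Ingest and re-checked later."""
    return {
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def identify_format(data: bytes) -> Optional[str]:
    """Answer 'what format is it?' from the leading bytes, or None if unknown."""
    for signature, mime_type in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return mime_type
    return None


def is_intact(data: bytes, recorded: Dict[str, str]) -> bool:
    """Compare freshly computed checksums with those recorded earlier."""
    return fixity(data) == recorded
```

A repository could record the result of `fixity` when an object is submitted and call `is_intact` on later audits; any change to the bytes, whether caused by faulty technology or by human action, makes the comparison fail.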
If the software made functions important for long-term preservation easily available, repositories deciding in the future to begin implementing long-term preservation strategies would find that their collections are, at least to some extent, already prepared for such preservation efforts, a circumstance that might make it considerably easier for them to decide to take a step towards long-term preservation.

Both authenticity and integrity of files may be protected by making use of secure connections (e.g. by means of the SSL or HTTPS protocols) for file and metadata upload by submitting authors. In the context of institutional and subject repositories, this not only helps to protect the personal information authors give about themselves when submitting a digital object (i.e. information which will not be visible in the public record for the publication), but also prevents third parties from interfering with the submission. In order to identify and authenticate a depositor, many repositories work with user accounts and log-in procedures or even require authors to identify themselves by means of PGP/GPG keys. 80

77 http://hul.harvard.edu/jhove/ – 03.11.2009.
78 Another freely available tool for format identification is DROID (Digital Record Object Identification), which was developed by the UK National Archives. It is available for download from http://droid.sourceforge.net/. Together with JHOVE and other tools, DROID forms part of FITS (File Information Tool Set), an open source tool which “combines the abilities of many different open-source file identification, validation, and metadata extraction tools. The File Information Tool Set (FITS) acts as a wrapper around these tools, invoking, normalizing, and combining their output” (Spencer et al. 2009; no pag.). Further format identification and validation tools are listed in the “Preservation-related Tools” section of the IDEALS (Illinois Digital Environment for Access to Learning and Scholarship) Wiki at https://services.ideals.uiuc.edu/wiki/bin/view/IDEALS/Internal/PreservationTools – 03.11.2009.
79 In this context, so-called format registries play an important role. Among the best-known registries are those built up by the PRONOM and GDFR initiatives, which “joined forces” in April 2009 to form the Unified Digital Formats Registry (UDFR) (http://www.gdfr.info/index.html – 03.11.2009).

The nestor and TRAC catalogs contain criteria dealing with integrity and authenticity of the files submitted to and archived by the repository. While, as we have seen, the nestor criteria mention the two concepts explicitly (cf. criteria blocks 6 and 7), TRAC only refers to “completeness and correctness” of the submitted objects (2007, B1.4). Important aspects of the concept of authenticity as outlined above are touched upon in TRAC criterion B1.3, focusing on “authenticat[ing] the source of all materials,” and requiring the repository to “ensure the digital objects are obtained from the expected source, that the appropriate provenance has been maintained, and that the objects are the expected objects” (2007, B1.3). DINI approaches authenticity and integrity primarily from the technical side. Thus criterion 2.5 (Sicherheit, Authentizität und Integrität, i.e. security, authenticity, and integrity) among others 81 recommends the use of advanced digital signatures according to SigG 2001 (Gesetz über Rahmenbedingungen für elektronische Signaturen) 82 (DINI 2007, 2.5.2) and requires that (technical) measures be taken to prevent files which do not meet the criteria outlined in the repository's policy from being uploaded onto the server (DINI 2007, 2.5.1). Thus, for example, files containing viruses or files with formats not accepted by the repository should be rejected automatically. 83

During the Generate AIP functional sub-entity, an Archival Information Package is generated from the Submission Information Package. This AIP must, according to OAIS, “conform to the archive's data formatting and documentation standards” (2002, 4-6; emphasis omitted), that is, it must be built and structured in accordance with packaging designs developed by the Preservation Planning Functional Entity (see below) and adopted by Administration.
Like all OAIS Information Packages, the AIP

is a conceptual container of two types of information called Content Information and Preservation Description Information (PDI). The Content Information and PDI are viewed as being encapsulated and identifiable by the Packaging Information. The resulting package is viewed as being discoverable by virtue of the Descriptive Information. (OAIS 2002, 2-5)

80 TUBdok, the OPUS-based repository of the Technical University Hamburg-Harburg, uses PGP/GPG keys to authenticate the source of materials published on its server. Thus, every record contains a link to a page serving as evidence for the attached document's integrity (“Unversehrtheitsnachweis”), including a reference to digital signatures of the author(s) and the library. See Marahrens 2005 for a description of the project in which these and other changes were implemented.
81 In addition, DINI criterion 2.5 for example suggests access controls to the server (required; see 2007, 2.5.1) and contains the requirement that a document whose content was changed has to be treated like a new document (see 2007, 2.5.2). These and other relevant criteria, requirements, and recommendations from 2.5 will be listed and discussed under the functional entities for which they are most relevant.
82 It does not become entirely clear how the signature is to be used and what is meant to be signed with it. See Winkler (2008) 76-78 for an explanation of possible uses of digital signatures in the context of long-term preservation. Becker explicitly comments on the DINI suggestions and points to possible problems the use of digital signatures can pose for long-term preservation efforts (2008, 36-37).
83 Of course, each repository must decide for itself whether it wants to restrict the formats it will accept, and whether these should really be rejected upon upload. As pointed out by Dr. Wagner, such an automated rejection process might mean that files which could, for example, be converted into a different format will not reach the repository at all, as users might not want to make a second attempt or contact repository staff after a file was rejected. This scenario shows quite clearly that, in contrast to digital long-term archives solely serving the purpose of preservation, for institutional or subject repositories long-term preservation efforts might conflict with usability and the concern to acquire a critical mass of repository content.
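The conceptual structure of an AIP quoted from OAIS above can be made more tangible with a small Python sketch. OAIS prescribes no implementation, so everything here is an assumption for illustration: the class and field names, the `generate_aip` helper, and the use of SHA-256 as fixity information. The four PDI fields follow the categories named in OAIS 2002 (Reference, Provenance, Context, Fixity).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PreservationDescriptionInformation:
    reference: str          # identifier of the content
    provenance: List[str]   # history of the content, e.g. its source
    context: str            # relationship to its environment
    fixity: str             # checksum protecting against undocumented change


@dataclass
class ArchivalInformationPackage:
    content_information: bytes                 # the object to be preserved
    pdi: PreservationDescriptionInformation
    packaging_information: Dict[str, str]      # binds content and PDI together
    descriptive_information: Dict[str, str]    # makes the package discoverable


def generate_aip(sip_content: bytes, identifier: str, source: str,
                 title: str) -> ArchivalInformationPackage:
    """Hypothetical 'Generate AIP' step: wrap submitted content in an AIP."""
    pdi = PreservationDescriptionInformation(
        reference=identifier,
        provenance=[f"received from {source}"],
        context="institutional repository submission",
        fixity=hashlib.sha256(sip_content).hexdigest(),
    )
    return ArchivalInformationPackage(
        content_information=sip_content,
        pdi=pdi,
        packaging_information={"layout": "single file", "id": identifier},
        descriptive_information={"title": title},
    )
```

In this sketch the fixity value is computed once, when the AIP is generated, so that later integrity checks have a recorded reference value to compare against.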