generating an Archival Information Package (AIP) which complies with the archive's data formatting and documentation standards, extracting Descriptive Information from the AIPs for inclusion in the archive database, and coordinating updates to Archival Storage and Data Management. (OAIS 2002, 4-1)

Ingest relies upon rules established on the organizational side to determine the metadata that must be present, the formats that are acceptable, the means that may be used for transferring objects, and the quality checks that must be performed […]. The ingest functions must be able to determine that the files and their metadata are complete and correct as sent. Next the metadata must be generated to tie the objects into the structure of the archive by generating the Archival Information Package (AIP). Any text that will be used for searching or for display must also be created and associated with the objects – the Descriptive Information – and sent to Data Management. After all the complex objects are created, they are moved to Archival Storage. [71]

During Ingest the repository will typically receive an Information Package from a producer. [72] It is with Ingest that “the repository takes intellectual control of the [Information Package] and processes it for preservation” (Bodleian Library 2007, 55). In the following, the process in which an institutional or subject repository receives a digital object for inclusion in its digital collection will be described as Ingest, despite the fact that the repositories treated here are not the long-term archive.
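The requirement quoted above, that Ingest must "determine that the files and their metadata are complete and correct as sent," can be made concrete by having the producer deliver a checksum manifest together with the SIP. The Python sketch below is one possible illustration, not the procedure of any repository discussed here: it assumes a hypothetical SIP delivered as a directory of payload files plus a BagIt-like manifest of SHA-256 checksums, and the function names (`sha256_of`, `parse_manifest`, `check_sip`, `demo`) are invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict:
    """Parse lines of the form '<sha256>  <relative/path>' (assumed BagIt-like layout)."""
    expected = {}
    for line in text.splitlines():
        if line.strip():
            checksum, rel_path = line.strip().split(maxsplit=1)
            expected[rel_path] = checksum.lower()
    return expected


def check_sip(sip_dir: Path, manifest_text: str) -> dict:
    """Compare the files actually received against what the producer declared."""
    expected = parse_manifest(manifest_text)
    received = {
        p.relative_to(sip_dir).as_posix(): p
        for p in sip_dir.rglob("*")
        if p.is_file()
    }
    report = {"missing": [], "corrupted": [], "unexpected": []}
    for rel_path, checksum in sorted(expected.items()):
        if rel_path not in received:
            report["missing"].append(rel_path)    # declared but not delivered
        elif sha256_of(received[rel_path]) != checksum:
            report["corrupted"].append(rel_path)  # delivered but altered in transfer
    # Delivered but never declared by the producer
    report["unexpected"] = sorted(set(received) - set(expected))
    return report


def demo() -> dict:
    """Build a throwaway SIP with one good, one damaged, one missing and one extra file."""
    with tempfile.TemporaryDirectory() as tmp:
        sip = Path(tmp)
        (sip / "article.pdf").write_bytes(b"%PDF-1.4 example")
        (sip / "data").mkdir()
        (sip / "data" / "table.csv").write_bytes(b"a,b\n1,2\n")
        (sip / "notes.txt").write_bytes(b"not in manifest")
        manifest = (
            f"{sha256_of(sip / 'article.pdf')}  article.pdf\n"
            f"{'0' * 64}  data/table.csv\n"
            f"{'0' * 64}  cover_letter.pdf\n"
        )
        return check_sip(sip, manifest)


if __name__ == "__main__":
    print(demo())
    # {'missing': ['cover_letter.pdf'], 'corrupted': ['data/table.csv'], 'unexpected': ['notes.txt']}
```

The report keeps apart three failure modes a repository would probably handle differently: declared files that never arrived, files altered in transfer, and files the producer did not declare. The sketch assumes the directory holds payload files only; a manifest stored inside it would itself be reported as unexpected.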
While it would be possible to designate this repository-level process as a kind of pre-Ingest, it will become apparent in the following that important data and information are collected at this stage without which long-term preservation would not be possible later, and that the repository Ingest can hence, at least conceptually, be regarded as part of the “Ingest proper.”

From the point of Ingest onward the repository bears responsibility for the ingested Information Package and for the preservation of the content information according to the defined storage or preservation goals. To fulfill its tasks, the repository must above all have physical and technical control over the digital objects it receives, a requirement reflected in all three criteria catalogs (see Appendix A). Concretely, this means that the SIP must be free from any Digital Rights Management (DRM) software or other technical measures limiting, for example, the ability to view, save, copy, or print the digital object. This latter aspect, which is also highly important in the context of user access to objects stored in a repository, is explicitly mentioned in the explanatory notes to nestor criterion 9.3 [73] as well as in the DINI criteria (see DINI 2007, 2.8 [74]). In contrast, TRAC, while stating that “[t]he repository must obtain complete control of the bits of the digital objects conveyed with each SIP” (2007, B1.5), focuses less on DRM or similar issues than on the fact that digital objects may contain references to other digital objects, which the repository should attempt to harvest and ingest as well in order to ensure that the digital objects it preserves are as complete as possible.

Part of the Ingest process is Quality Assurance, during which the repository ascertains the integrity and authenticity of the SIP before archiving it. Definitions of both concepts abound, and Clifford Lynch's observation from 2000 that authenticity and integrity are “elusive properties” which, as we try to define them, “recurse into a wilderness of mirrors, of questions about trust and identity in the networked information world” (2000, no pag.) still rings true. [75] Authenticity can be understood as a measure of the digital object's “trustworthiness,” in the sense that an object is authentic to the degree that it is what it seems or purports to be (see nestor 2008, 7). [76] As Factor et al. note,

[a]uthenticity refers to the reliability of the data in the broad sense [...]. To validate authenticity of a preserved data object provenance is needed, i.e., the documented history of creation, ownership, accesses, and changes that have occurred over time for a given data object. Also a means is needed to guarantee that data is whole and uncorrupted (integrity). (2009, no pag.)

It follows that authenticity strongly depends on the repository's ability to ascertain and guarantee that a digital object was created by the specified author/source at the specified time – information that is also indispensable if the object is to be usable for/in scholarly research – and on documentation of every transformation the object may have undergone since submission:

An important aspect is that the object at hand was created by the specified source and at the specified time. Furthermore, authenticity includes complete evidence of all transformations carried out on the objects in the course of preservation measures. (nestor 2008, 7; translated from the German)

Such provenance information is crucial because, in the context of long-term preservation, we need to take into account that

in most cases digital objects cannot be preserved without any change in the bit stream, and we have to modify the original object to have the ability to reproduce it in the future. Unfortunately, this runs counter to the assumption that preserving authenticity implies retaining the identity and integrity of a digital object, i.e. free from tampering or corruption. It is a sort of paradox, where preservation entails change, while authenticity needs fixity. (Factor et al. 2009, no pag.)

[71] http://www.icpsr.umich.edu/dpm/dpm-eng/foundation/oais/overview.html – 24.10.2009

[72] However, the Ingest process may also be carried out for an existing AIP which has been changed or updated in any way. Thus, the OAIS itself might act as producer, for example, when AIPs which underwent a migration are re-submitted to the repository as SIPs. See OAIS 2002, chapter 4.2 for a detailed definition of its information model, including information packages in chapter 4.2.2.

[73] Hereafter, the criteria catalogs will be cited by short name (nestor, TRAC, and DINI respectively), year, and criterion number rather than page number.

[74] Note, however, that DINI is somewhat unclear in its remarks concerning technical measures for digital rights management. Thus, the following minimum standard is described under 2.8 Langzeitverfügbarkeit (Long Term Accessibility): “Any archival copies created in addition to the author's submitted original files are free from protective measures (DRM) that prevent the application of long-term accessibility strategies (migration, emulation)” (2007, 2.8; emphasis added; translated from the German). Hence DINI distinguishes between the submitted files, which are not required to be free from DRM, and archival copies, which may but need not be created, and which would have to be free from DRM. The creation of DRM-free archival copies is thus currently not a requirement in the DINI criteria. From the perspective of long-term preservation this is certainly problematic, as a repository that decides in the future to undertake long-term preservation activities may find a large number of digital objects unfit for preservation in its collection. In addition, the following is recommended (i.e., not required): “Use of open file formats which are suitable for long-term archiving […] and free from protective measures (DRM)” (DINI 2007, 2.8; translated from the German). This statement is similarly unclear, in that DRM is something applied to files or software rather than to file formats.

[75] See the DigiCULT thematic issue on “Integrity and Authenticity of Digital Cultural Heritage Objects” (2002) for very similar observations. http://www.digicult.info/downloads/thematic_issue_1_final.pdf – 31.10.2009. “Authenticity” is also the focus of two letters to the editor published in issue 4.2 (2009) of the International Journal of Digital Curation (see Wilson 2009 and Jantz 2009).

[76] See also the revised version of the OAIS model (May 2009 version for public comment), which defines authenticity as “the degree to which a person (or system) may regard an object as what it is purported to be” (1-8).
The question of authenticity is also considered by Wilson in the InSPECT 2.2 work package (2007). Wilson draws attention to the fact that rather than adhering to a “broad meaning” of authenticity, with “all the connotations of that much overused word truth,” it makes sense in the context of digital long-term preservation to “limit 'authenticity' to its archival meanings to do with what a record purports to be and how it was created. Variations of this definition abound but the central core of the concept is fixed [...]. For the UK National Archives assessing authenticity involves establishing the integrity and identity of the object – integrity here referring to the objects 'wholeness and soundness', and identity referring to attributes such as context and provenance” (Wilson 2007, 4).
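One way to reconcile the paradox described by Factor et al., that preservation entails change while authenticity needs fixity, is to document every transformation together with fixity values taken before and after it, so that the bit stream currently held can be traced back to the one originally submitted. The Python sketch below is a minimal illustration under stated assumptions: the event fields and the functions `record`, `verify_chain` and `demo` are hypothetical and do not follow any particular provenance metadata standard or repository implementation.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a bit stream."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProvenanceEvent:
    """One documented action on a digital object (hypothetical field names)."""
    action: str         # e.g. "ingest" or "migration"
    agent: str          # who or what carried out the action
    date: str           # ISO 8601 date of the action
    input_sha256: str   # fixity value before the action
    output_sha256: str  # fixity value after the action


def record(log: list, action: str, agent: str, date: str,
           before: bytes, after: bytes) -> None:
    """Append an event capturing the object's fixity before and after a change."""
    log.append(ProvenanceEvent(action, agent, date,
                               sha256_hex(before), sha256_hex(after)))


def verify_chain(log: list, submitted_sha256: str, current: bytes) -> bool:
    """Return True if the documented history links the submitted bits to the current ones.

    The first event must start from the checksum taken at submission, every later
    event must start from the state its predecessor produced, and the last event
    must end in the bit stream the repository holds now.
    """
    expected = submitted_sha256
    for event in log:
        if event.input_sha256 != expected:
            return False  # a change happened that no event documents
        expected = event.output_sha256
    return expected == sha256_hex(current)


def demo() -> tuple:
    """Migrate a toy object once, then verify an intact and an altered copy."""
    submitted = b"report as submitted by the author"
    migrated = b"report after migration to an archival format"
    log = []
    record(log, "ingest", "repository", "2010-01-15", submitted, submitted)
    record(log, "migration", "repository", "2012-06-01", submitted, migrated)
    intact = verify_chain(log, sha256_hex(submitted), migrated)
    altered = verify_chain(log, sha256_hex(submitted), migrated + b" [edited]")
    return intact, altered


if __name__ == "__main__":
    print(demo())  # (True, False)
```

Here the migrated object no longer matches the checksum taken at submission, yet it still verifies, because each documented step accounts for the change; an undocumented edit breaks the chain. A failed check does not identify which step is at fault, and a real repository would also need to protect the event log itself against tampering, which this sketch does not attempt.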