integrity between AIPs and their metadata. Thus, TRAC criterion B5.3 requires that“[e]very AIP must have some descriptive information and all descriptive information mustpoint to at least one AIP, such that the integrity can be validated” and suggests thefollowing evidence: “Descriptive metadata; persistent identifier/locator associated with AIP;documented relationship between AIP and metadata; system documentation and technicalarchitecture; process workflow documentation” (TRAC 2007, B5.3). Criterion B5.4requires that this referential integrity must also be maintained. 138 To achieve referentialintegrity, the nestor catalog suggests that persistent identifiers are used to identify digitalobjects and all their parts (including metadata), and/or that metadata and content objectare stored in one <strong>file</strong> or directory (cf. nestor 2008, 12.7). Similarly, DINI criterion 2.8requires that a permanent link is created between metadata and document, e.g. by meansof persistent identifiers or containers. In practice, these functions are primarily performedby the repository software; more particularly, the DBMS managing the tables containingthe information.2.4.1 pedocs Data ManagementAs just pointed out, a primary concern of the Data Management functional entity isthe referential integrity between archived information packages and their associatedmetadata (cf. TRAC 2007, B5.3). This can be achieved by saving objects and theirmetadata together in one container. pedocs does not currently use such containers butbuilds its information “packages” virtually, so to speak, by linking metadata and object viathe relational database tables.In addition, a cover sheet will be used in the future to permanently link metadata anddigital objects linked by including them in the same <strong>file</strong>. Thus, each document uploadedinto pedocs will automatically (via a PHP script) receive a cover sheet added to the <strong>PDF</strong>as its first page during the Ingest process. The workflow for this procedure is currentlybeing tested, and a number of documents have already received cover sheets (see forexample http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:0111-opus-18568 –01.11.2009). It is planned that by the end of 2009 all documents contained in pedocs willhave cover sheets containing core bibliographic metadata, the URN, copyright/useinformation, as well as the logo of the publisher (where applicable) by whom the work wasfirst published. In addition, if the work was digitized by pedocs, this is also stated. Thispractice is clearly described in the pedocs policy document so that submitting authors areaware of the fact that the <strong>file</strong>s they submit will be modified in this manner. With this coversheet, the problem that currently the <strong>PDF</strong> documents are separated from their metadataonce they have been opened, printed, and/or saved on a user's computer is solved.138 Suggested evidence includes “[l]og detailing ongoing monitoring/checking of referential integrity,especially following repair/modification of AIP; legacy descriptive metadata; persistence ofidentifier/locator; documented relationship between AIP and metadata; system documentation andtechnical architecture; process workflow documentation” (TRAC 2007, B5.4).56
2.4.2 JUWEL Data ManagementThe structure of JUWEL information packages is currently defined by the DSpacestandard configuration. For AIPs, this implies in particular the structuring of items intobundles and bitstreams outlined above. The actual bitstreams are placed in the so-called“Bitstream Store” which is either the <strong>file</strong> system on the server or provided by a storagemanager (Storage Resource Broker, SRB). A bitstream's location in the <strong>file</strong> system isidentified by means of a “38-digit internal ID […] used to determine the exact location(relative to the relevant store directory) that the bitstream is stored in [...]” (Tansley et al.2006, 84). All of these components are linked by means of ID numbers which function askeys in the relational databases. 139 In addition, DSpace is capable of creating informationpackages for export containing, among others, an XML <strong>file</strong> with the metadata, the item'shandle, the license text <strong>file</strong>s as well as the content document(s).According to the DSpace wiki, among the changes planned for the DSpace 2.0release is also the implementation of an AIP Asset Store. Thus,[i]n the current architecture, all metadata is in a relational database, and all content bitstreamsare in the <strong>file</strong> system on the server. This makes certain preservation-related activities complex,including backups, auditing, and replication/distribution. The proposed new asset store storesmetadata and content together as standards-based Archival Information Packages OAISterminology. This does not replace the current relational database in DSpace. Although theseAIPs are the authoritative version of information in the system, the relational database andsearch indices are still used for performant access; however, these are considered caches ofthe information in the AIPs […].An AIP consists of:- A core metadata METS document, conforming to the DSpace AIP METS pro<strong>file</strong> [...].- Zero or more bitstreams. These may be content <strong>file</strong>s (which are in the <strong>file</strong> manifest of theMETS document), or additional metadata (referred to by the METS document).- A checksum for the METS. (DSpace Wiki 2009) 140While this Asset Store DSpace on the one hand makes an effort to further OAISimplementationby creating actual AIPs rather than storing the information only in therelational databases. At the same time however, this might lead to storing the checksumand the digital object which it describes in one container, which would not be inaccordance with the TRAC catalog of criteria. In addition, as the relational database is toremain functional, it might become more complex to establish the integrity of theinformation stored in the repository: thus, not only referential integrity of the database hasto be monitored, but in addition it has to be ascertained that database and asset store donot deviate from each other in terms of the information they contain.As mentioned in above quotation, a METS pro<strong>file</strong> will be available for DSpace AIPs.Similarly, a METS pro<strong>file</strong> for SIPs has been proposed in order to answer to “a need toprepare a METS pro<strong>file</strong> or pro<strong>file</strong>s that will govern the creation of the three types ofcontent 'packages' defined by the [OAIS] reference model” (DSpace Wiki 2008b). Thus,139 A graphical visualization of the relational database is available athttp://dspace.cvs.sourceforge.net/dspace/dspace/docs/image/db-schema.gif?view=markup – 30.10.2009.140 Please note that the page is currently marked as requiring an update. It is, however, not transparentwhich information exactly needs to be updated.57
- Page 6:
AbstractTaking its cue from the inc
- Page 13: and benefit from the development an
- Page 18 and 19: German repositories have already be
- Page 20 and 21: [t]he Open Archival Information Sys
- Page 22 and 23: Thus it seems highly recommendable
- Page 24 and 25: actively pursuing the long-term pre
- Page 26 and 27: Like pedocs, the repository is not
- Page 28 and 29: application for a project grant was
- Page 30 and 31: generating an Archival Information
- Page 32 and 33: Integrity can be defined as “comp
- Page 34 and 35: It is with this step that the metad
- Page 36 and 37: documents submitted to the reposito
- Page 38 and 39: [t]he majority of OCR software supp
- Page 40 and 41: One of the shortcomings of the soft
- Page 42 and 43: set of shared metadata which is the
- Page 44 and 45: document, and that hence the docume
- Page 46: Structural metadata: In DSpace it i
- Page 49 and 50: dc.description.provenancedc.descrip
- Page 51 and 52: Für die Langzeitverfügbarkeit der
- Page 53 and 54: ecord for a title; although a workf
- Page 55 and 56: Source (where applicable)Publicatio
- Page 57 and 58: checksums. In particular, TRAC requ
- Page 59: checksums are currently not checked
- Page 63 and 64: Before any SIPs are accepted, the r
- Page 65 and 66: guarantee that documents are “arc
- Page 67 and 68: pedocs is a scholarly open access d
- Page 69 and 70: formats. 151 Although the possible
- Page 71 and 72: While the preferred file format is
- Page 73 and 74: from nestor criterion 8, making it
- Page 75 and 76: preserved for the long term, will h
- Page 77 and 78: It seems that of all three reposito
- Page 79 and 80: associated with them, or has define
- Page 81 and 82: versioning functionality which allo
- Page 83 and 84: communication channels, responsibil
- Page 85 and 86: Works CitedAllinson, Julie (2006):
- Page 87 and 88: DSpace Homepage. http://www.dspace.
- Page 89 and 90: Lynch, Clifford A. (2000): Authenti
- Page 91 and 92: in the EU. Amsterdam: Amsterdam Uni
- Page 93 and 94: Ingestnestor TRAC DINIReceive Submi
- Page 95 and 96: Archival Storagenestor TRAC DINIRec
- Page 97 and 98: Archival Information Update10.4 Das
- Page 99 and 100: Preservation PlanningnestorMonitor