documents submitted to the repository must be free from any DRM, password protection, or other technical protection measures that might be used, for example, to limit or disable the print option. Thus, provided that authors follow the guidelines, pedocs obtains full physical/technical control over the digital objects as required by nestor, DINI, and TRAC. As of October 2009, a newly developed tool automatically identifies documents with DRM restrictions during upload. As a result, while pedocs still contains files with such technical limitations that were uploaded before the new tool was implemented,[91] these can now be identified and further action taken where necessary.

Ingest: Quality Assurance

Integrity: As outlined above, virus checking and cyclic redundancy checks help to ascertain a submitted file's completeness and correctness (on a technical/bitstream level) and to determine whether any errors occurred during transmission. Currently, virus checks are not implemented in the pedocs workflow as such, i.e. files uploaded through the web interface are stored in the pedocs file system without having been checked for viruses. However, every uploaded file is opened by repository staff in order to see whether it is functional and is thus automatically scanned by the antivirus software installed on each computer.

As explained above, a digital object's integrity also depends on the validity of its file format. Even small mistakes in a file's code might make it harder to preserve the object for the long term, for example because the behavior of corrupted files during format migrations cannot be predicted.
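The checks discussed in this section can be illustrated in code. The following is a minimal sketch, not the tool used by pedocs: it assumes the upload is available as a byte string, and the function names `inspect_upload` and `matches_checksum` as well as the report keys are hypothetical. It computes a CRC-32 and a SHA-256 checksum, looks for the PDF header and end-of-file marker, and treats the presence of an `/Encrypt` entry as a rough indicator of encryption or usage restrictions. These are shallow heuristics; a full validator such as JHOVE checks far more than this.

```python
import hashlib
import re
import zlib

# A PDF file starts with a header of the form "%PDF-1.4".
PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def inspect_upload(data: bytes) -> dict:
    """Return basic integrity and format facts for an uploaded file."""
    header = PDF_HEADER.search(data[:1024])
    return {
        # Checksums allow later detection of transmission or storage errors.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "sha256": hashlib.sha256(data).hexdigest(),
        # Shallow format identification based on the file signature.
        "looks_like_pdf": header is not None,
        "pdf_version": header.group(1).decode("ascii") if header else None,
        # Heuristic: a complete PDF carries an %%EOF marker near its end.
        "has_eof_marker": b"%%EOF" in data[-1024:],
        # Heuristic: encrypted or restricted PDFs reference an /Encrypt
        # dictionary; a plain substring test can yield false positives.
        "references_encrypt": b"/Encrypt" in data,
    }


def matches_checksum(data: bytes, expected_sha256: str) -> bool:
    """Compare the file's SHA-256 checksum with a previously recorded value."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

A file whose report shows `references_encrypt` as true, or whose checksum no longer matches the recorded value, would be flagged for manual review rather than rejected automatically, since both tests are only indicators.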
A format validation (preceded by a format identification), which ascertains that a file's bitstream is composed exactly according to the specifications, is thus highly important in the context of long-term preservation. pedocs carries out format validation and characterization as part of the ingest process by means of JHOVE, which is used to generate a limited amount of technical metadata to be submitted to the DNB as part of information packages (i.e. file format and format version, supplemented by a checksum; also see below). Invalid files will not be submitted to the long-term archive.

The completeness and correctness of the SIPs is additionally ascertained by means of intellectual control, especially with regard to metadata. Thus, for the purpose of quality assurance the metadata submitted by authors is sometimes modified; also, further metadata can be added by repository staff in order to complete the information package stored and published on the pedocs server. This practice is explicitly mentioned in the pedocs policy, and authors are thus aware that their submitted metadata may be subject to modification.

Authenticity: In the context of the Ingest functional entity, the question of authenticity of digital objects is primarily a question of the extent to which the submitting person or institution can be trusted, e.g. trusted to be the author of a work and/or to have the right to publish it through the repository. Although there are different possible ways of authenticating a user (see above), none of these is currently used at pedocs, mainly out of a concern not to create too many obstacles that might keep authors from submitting their works to the repository. In contrast to other types of repositories, pedocs as a subject repository is not in the convenient situation that its users already have an ID and/or password, e.g. for the use of other services. For a repository belonging to the services offered by a university library, by contrast, it can be comparatively easy to (re-)use the ID and password its patrons already have (e.g. for the use of the library's on- and offline services) as a means of user authentication with the repository, an approach taken by the TU Hamburg-Harburg, for example. As a disciplinary repository, pedocs does not have such infrastructure ready at hand. This is relatively unproblematic where pedocs enters into direct contact with producers, as is the case, for example, with many publishers approached by the repository. However, where contact between repository and submitting person is based only on e-mail, authenticating the source of submitted material remains a problem.

[91] Thus, for example, the documents attached to the record 807 are both encrypted and password protected. In addition, the possibility of copying text from the PDF has been disabled. See http://www.pedocs.de/volltexte/2009/807/ (30.10.2009).

pedocs is addressing this problem with its newly drafted author agreement, which will have to be accepted by submitting authors in a double opt-in procedure in the future.[92] Producers will thus have to confirm in a contract (author agreement) that they have the rights to publish the document in question with pedocs. They e-mail the contract to pedocs, are notified that the contract was received by the repository, and have to confirm again that they intended to submit the contract. This procedure cannot offer the same confirmation of the identity of the depositor as, for example, the creation of user accounts in combination with digital signatures.
Yet depositors do enter into a contract with pedocs by submitting the agreement, in which they affirm that they are who they claim to be, i.e. the person holding the necessary rights to publish the submitted document with pedocs, and this does offer some (if limited) assurance concerning the authenticity of the source.

In the process of conceptualizing and developing pedocs, a further problem was identified only recently, which also touches upon the question of authenticity or, more precisely, the question of how accurately the pedocs record and the document attached to it represent the submitted “original” object. This question concerns publications that were digitized by pedocs (or a third-party service provider) and that underwent OCR in the digitization process. Depending on the quality of the original printed document, OCR may misrecognize characters, which will lead to an inaccurate representation of the original text. According to Tanner, Muñoz and Ros,

[92] Currently authors have to accept the following shorter agreement when submitting their documents to the repository through the web interface (translated from the German): “I grant the Deutsches Institut für Internationale Pädagogische Forschung (DIPF), the Deutsche Nationalbibliothek in Frankfurt and Leipzig, and the responsible special subject collection library (Sondersammelgebietsbibliothek) the right to store the submitted document(s) electronically and to make it/them publicly accessible in data networks. I further grant the DIPF the right to convert the submitted file for the purpose of long-term archiving, with due regard to preserving its content. The original archived version is retained” (http://www.pedocs.de/uni/index.php, 11.10.2009).
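The effect of OCR misrecognition on a digitized text can be quantified. The following is a minimal sketch under a stated assumption: that a manually corrected reference transcription of a sample page is available, which the text above does not claim for pedocs. The function names `edit_distance` and `character_error_rate` are hypothetical. The sketch computes the character error rate, i.e. the Levenshtein edit distance between reference and OCR output divided by the length of the reference.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Share of character-level errors relative to the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

For instance, if the word "Bildung" is recognized as "Bi1dung", one of seven characters is wrong, so `character_error_rate("Bildung", "Bi1dung")` is 1/7. Measured on a sample of pages, such a figure gives a rough indication of how faithfully the OCR text represents the printed original.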