Assessment of file format testing tools - Jisc
Digital Asset Assessment Tool – File Format Testing Tools – Version 1.2 – 2006-12-13
5. Using DROID, JHOVE or Empirical Walker as part of a D-PAS tool
5.1 Overview
In theory, a D-PAS tool might be able to integrate with any of these file format analysis tools. If our project’s hierarchy model proves workable, a file format analysis tool could integrate its results at what we have defined as ‘item’ level. In particular, as the D-PAS tool is currently built, a tool such as DROID or JHOVE would come into play at our ‘Level 5’, and its results would be applicable to Sections 8-11 of the questionnaire.
This assumes that two parallel surveys are under way: the D-PAS tool collecting top-level profile information about the organisation and its collections, and the file format tool gathering profile information on individual assets. At some point, the results of the top-down survey and the bottom-up survey would have to be connected for such an exercise to work.
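One way to give the two surveys that connectivity, assuming each top-down collection profile records the storage location it occupies (a field D-PAS does not necessarily collect), is to attach each bottom-up file record to the collection whose root path is its most specific ancestor. The class `CollectionProfile` and the function `link_items` below are hypothetical names used only for illustration; the sketch needs Python 3.9 or later for `is_relative_to`.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class CollectionProfile:
    """Top-down record from the D-PAS survey (hypothetical fields)."""
    name: str
    root: str  # storage location the collection is assumed to occupy
    items: list = field(default_factory=list)


def link_items(collections, item_paths):
    """Attach each bottom-up file record to the collection whose root is the
    deepest ancestor of the file path.

    Returns the paths that fall under no known collection, which the two
    surveys would have to reconcile by hand.
    """
    # Deepest roots first, so /archive/maps wins over /archive.
    ordered = sorted(
        collections,
        key=lambda c: len(PurePosixPath(c.root).parts),
        reverse=True,
    )
    unmatched = []
    for path in item_paths:
        p = PurePosixPath(path)
        for coll in ordered:
            # Compares whole path components, so /archive/mapsX is not under /archive/maps.
            if p.is_relative_to(PurePosixPath(coll.root)):
                coll.items.append(path)
                break
        else:
            unmatched.append(path)
    return unmatched
```

For example, with collections rooted at `/archive/maps` and `/archive`, a file at `/archive/maps/a.tif` joins the first, `/archive/b.doc` joins the second, and `/scratch/c.txt` is returned as unmatched. Matching on path components rather than raw string prefixes avoids false matches between sibling directories with similar names.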
There are some fields in D-PAS Sections 8-11 where an overlap, if not an exact correlation, with DROID/JHOVE-type information can be clearly seen. For example, the following D-PAS questions from Section 8:
• ‘File format type and version’
• ‘Application (and version) used to create the asset’
• ‘Location metadata’
There remains the question of how to automate the integration of the tools: whether results from DROID or JHOVE can be extracted and automatically imported into a D-PAS database, and have specific scores assigned to them, thus adding to the final D-PAS score.
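The scoring step might look something like the sketch below, assuming the tool’s output has already been reduced to (path, format name) pairs. D-PAS defines no such scoring table: the format names, the point values and the two fallback penalties are all hypothetical, and only illustrate how identification results could be turned into points that feed a final score.

```python
# Hypothetical risk points per identified format. D-PAS does not define
# these values; they only illustrate how tool output could feed a score.
FORMAT_RISK = {
    "Plain Text File": 1,
    "Portable Document Format": 2,
}
DEFAULT_RISK = 3       # hypothetical: identified, but not yet scored in the table
UNIDENTIFIED_RISK = 5  # hypothetical: the tool returned no identification at all


def score_results(results):
    """Return (total, per_file) for an iterable of (path, format_name) pairs,
    where format_name is None if the tool could not identify the file."""
    per_file = {}
    for path, fmt in results:
        if fmt is None:
            per_file[path] = UNIDENTIFIED_RISK
        else:
            per_file[path] = FORMAT_RISK.get(fmt, DEFAULT_RISK)
    return sum(per_file.values()), per_file
```

Keeping the per-file breakdown alongside the total means an assessor can see which items drive the score, rather than only the aggregate.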
The specific D-PAS questions in Section 10 (File Formats) and Section 11 (Software) are more complex. They ask about the stability of formats and the reliability of software, but are phrased in ways that DROID or JHOVE cannot really answer. If this integrated approach were to succeed, D-PAS would need a certain amount of reworking.
5.2 DROID integration with D-PAS
In terms of adding an automated ‘crawl and assess’ feature to a D-PAS tool, we think DROID leaves a lot to be desired. DROID will automatically crawl all the files on a drive, but it will only provide ‘static’ information on each file’s format, based on whatever information is currently stored in PRONOM. Importantly, DROID isn’t really looking ‘inside’ a file, just reporting on the extension. To put it bluntly, DROID will tell you exactly the same thing about every one of your files that ends in .TXT.
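The difference between reporting on an extension and looking inside a file can be illustrated with a small sketch. This is not DROID’s code; the lookup table and the handful of leading-byte signatures are illustrative choices, although the PDF, PNG and ZIP signatures shown are the standard ones for those formats.

```python
from pathlib import Path

# Extension-only lookup: every file with the same extension gets the same answer.
BY_EXTENSION = {".txt": "Plain text", ".pdf": "PDF", ".png": "PNG image"}

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_by_extension(name: str) -> str:
    """Report only on the name: the file's bytes are never read."""
    return BY_EXTENSION.get(Path(name).suffix.lower(), "Unknown")


def identify_by_content(data: bytes) -> str:
    """Look 'inside' the file by comparing its first bytes to known signatures."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "No signature matched"
```

A file named `report.txt` that actually holds a PDF would be reported as plain text by the first function and as a PDF by the second. Note that genuine plain text has no leading signature, so content inspection alone cannot distinguish one text file from another either.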
In ULCC’s test, the sample of files included ‘non-asset’ files with extensions such as HLP, BAK, INI and EXE. These were included because such files may be needed to access and read the digital data, and the hardware environment may also depend on them. If these files are missing or unsafe, the asset is at risk. Unfortunately, DROID does not recognise any of these system file types. Wherever possible we have commented on why we think they ought to be included in any tool used for risk assessment.
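As a minimal sketch of how a risk-assessment tool could keep such files in view rather than silently dropping them, the function below splits a file list into identified assets, recognised support files and unrecognised files. It assumes the format tool’s output has been reduced to a set of identified paths; the function name `classify` and the role labels are hypothetical, and only the four extensions come from ULCC’s sample.

```python
from pathlib import Path

# Support/system extensions from ULCC's sample; the role labels are illustrative.
SUPPORT_EXTENSIONS = {
    ".hlp": "help file",
    ".bak": "backup copy",
    ".ini": "configuration",
    ".exe": "executable",
}


def classify(paths, identified):
    """Split paths into three buckets.

    `identified` is assumed to be the set of paths a format tool identified.
    Support files are reported separately so their absence or damage can be
    counted as a risk to the assets that depend on them.
    """
    result = {"identified": [], "support": [], "unrecognised": []}
    for p in paths:
        if p in identified:
            result["identified"].append(p)
        elif Path(p).suffix.lower() in SUPPORT_EXTENSIONS:
            result["support"].append(p)
        else:
            result["unrecognised"].append(p)
    return result
```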
DROID does not extract metadata from a file. This may not be a problem for D-PAS: metadata is useful for preservation overall, but it is not particularly germane to the risk assessment that D-PAS is meant to perform. It is, however, relevant to a risk assessment if no metadata is available at all.
In terms of output, DROID can generate a .CSV file, so the results of a DROID survey could conceivably be integrated with a D-PAS-type database.
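A minimal sketch of that import, using Python’s standard `csv` and `sqlite3` modules as a stand-in for a D-PAS-type database. The column names `FILE_PATH`, `PUID` and `FORMAT_NAME` are assumptions; the header DROID writes varies between versions, so they would need checking against a real export. The table name `droid_item` is hypothetical, and the sample rows in the usage note are illustrative, not real DROID output.

```python
import csv
import io
import sqlite3


def load_droid_csv(csv_text: str, conn: sqlite3.Connection) -> int:
    """Import selected columns of a DROID CSV export into a table.

    Assumes columns named FILE_PATH, PUID and FORMAT_NAME; empty cells
    (unidentified files) are stored as NULL. Returns the number of rows read.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS droid_item ("
        " file_path TEXT PRIMARY KEY, puid TEXT, format_name TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["FILE_PATH"], r.get("PUID") or None, r.get("FORMAT_NAME") or None)
        for r in reader
    ]
    # Re-running a survey replaces earlier results for the same path.
    conn.executemany("INSERT OR REPLACE INTO droid_item VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For example, loading `FILE_PATH,PUID,FORMAT_NAME` followed by the rows `/data/a.txt,x-fmt/111,Plain Text File` and `/data/b.ini,,` into an in-memory database stores two rows, with the unidentified INI file’s PUID held as NULL so it can be picked out later with a simple `WHERE puid IS NULL` query.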