
Assessment of file format testing tools - Jisc


Digital Asset Assessment Tool – File Format Testing Tools – Version 1.2 – 2006-12-13

5. Using DROID, JHOVE or Empirical Walker as part of a D-PAS tool

5.1 Overview

In theory, a D-PAS tool might be able to integrate with any of these file format analysis tools. If our project’s hierarchy model proves workable, then a file format analysis tool could integrate its results at the level we have determined as the ‘item’ level. In particular, the way the D-PAS tool is currently built, a tool such as DROID or JHOVE would come into play at our ‘Level 5’, and its results would be applicable to Sections 8-11 of the questionnaire.

This assumes that there are two parallel surveys going on: the D-PAS tool collecting the top-level profile information about the organisation and its collections, with the file format tool gathering profile information on individual assets. At some point, the results of the top-down survey and the bottom-up survey would have to be connected for such an exercise to work.

There are some fields in D-PAS Sections 8-11 where an overlap, if not an exact correlation, with DROID / JHOVE type information can be clearly seen. For example, the following D-PAS questions from Section 8:

• ‘File format type and version’
• ‘Application (and version) used to create the asset’
• ‘Location metadata’

There remains a question as to how to automate the integration of the tools: whether results from DROID or JHOVE can be extracted and automatically imported into a D-PAS database, and have specific scores assigned to them, thus adding to the final D-PAS score.
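The scoring step raised above can be pictured as a small mapping from item-level identification results to points. The following is a minimal sketch only: the `FormatResult` shape, the `risk_points` and `total_risk_points` helpers, and the point values are all hypothetical and are not taken from D-PAS, DROID or JHOVE.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FormatResult:
    """One item-level identification result (hypothetical shape)."""
    path: str
    format_name: Optional[str]     # None if the tool could not identify the format
    format_version: Optional[str]  # None if no version was reported


def risk_points(result: FormatResult) -> int:
    """Assign illustrative risk points; the values are assumptions, not D-PAS scores."""
    if result.format_name is None:
        return 2  # unidentified format
    if result.format_version is None:
        return 1  # format known, version unknown
    return 0      # format and version both known


def total_risk_points(results: Iterable[FormatResult]) -> int:
    """Sum the points so they could feed into an overall score."""
    return sum(risk_points(r) for r in results)
```

Keeping the point values in one small function would make it easy to revise them if the questionnaire's own scoring rules were reworked.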

The specific D-PAS questions in Section 10 (File Formats) and Section 11 (Software) are slightly more complex. They ask about the stability of formats and the reliability of software, but are phrased in ways that DROID or JHOVE can’t really answer. If this integrated approach were to succeed, D-PAS would need a certain amount of reworking.

5.2 DROID integration with D-PAS

In terms of adding an automated ‘crawl and assess’ feature to a D-PAS tool, we think DROID leaves a lot to be desired. DROID will do an automated crawl of all the files on a drive, but it will only provide ‘static’ information on each file's format, based on whatever information is currently stored in PRONOM. Importantly, DROID isn't really looking ‘inside’ a file, just reporting on the extension. To put it bluntly, for all your files which end in .TXT, DROID will tell you exactly the same thing for all of them.
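The distinction between reporting on the extension and looking ‘inside’ a file can be illustrated with a short sketch. It is not a description of how DROID or PRONOM work internally: the two lookup tables and both function names are hypothetical, and only the leading byte signatures of PDF and PNG are used as examples.

```python
import os
from typing import Optional

# Illustrative lookup tables only; a real registry holds far more entries.
EXTENSION_FORMATS = {".txt": "Plain text", ".pdf": "PDF"}
CONTENT_SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def identify_by_extension(filename: str) -> Optional[str]:
    """Report a format from the file name alone; the content is never read."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_FORMATS.get(extension)


def identify_by_content(data: bytes) -> Optional[str]:
    """Report a format from the leading bytes of the file."""
    for signature, format_name in CONTENT_SIGNATURES:
        if data.startswith(signature):
            return format_name
    return None
```

A file named `notes.txt` whose bytes begin with `%PDF-` would be reported as plain text by the first function and as PDF by the second, which is the kind of difference an extension-only report cannot show.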

In ULCC’s test, the sample of files included ‘non asset’ files with extensions such as HLP, BAK, INI and EXE. These were included because such files may be used to access and read the digital data, and the hardware environment may also depend on them. If these files are missing or not safe, the asset is at risk. Unfortunately, DROID does not recognise any of these system file types. Wherever possible we have commented on why we think they ought to be included in any tool used for risk assessment.
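As a rough illustration of how a crawl might take account of such supporting files, the sketch below walks a directory tree and lists files with the four extensions mentioned above. The extension set comes from ULCC's sample; the function name and the idea of treating the list as a separate report are assumptions made for this example.

```python
from pathlib import Path
from typing import List

# Extensions of 'non asset' files named in the test sample.
SUPPORT_EXTENSIONS = {".hlp", ".bak", ".ini", ".exe"}


def find_support_files(root: str) -> List[Path]:
    """Return every file under root whose extension marks it as a supporting file."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORT_EXTENSIONS
    )
```

The comparison is case-insensitive, so `VIEWER.EXE` and `viewer.exe` are treated alike.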

DROID does not extract metadata from a file. This lack of metadata extraction may not be a problem for D-PAS. Metadata is useful overall for preservation, but it's not particularly germane to the risk assessment that D-PAS is meant to do. However, it is relevant to a risk assessment if there is no metadata available at all.

In terms of output, DROID is capable of generating a .CSV file, so the results of a DROID survey could conceivably be integrated with a D-PAS-type database.
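A CSV import of this kind could be sketched as follows, using SQLite as a stand-in for a D-PAS-type database. The column headers (`FILE_PATH`, `FORMAT_NAME`, `FORMAT_VERSION`), the `item_formats` table and the function name are all assumptions made for illustration; the actual layout of a DROID export would need to be checked against the tool itself.

```python
import csv
import io
import sqlite3


def import_identification_csv(csv_text: str, conn: sqlite3.Connection) -> int:
    """Load identification rows from CSV text into a table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_formats ("
        "file_path TEXT, format_name TEXT, format_version TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Column names below are hypothetical, not a documented export layout.
    rows = [
        (row["FILE_PATH"], row["FORMAT_NAME"], row["FORMAT_VERSION"])
        for row in reader
    ]
    conn.executemany("INSERT INTO item_formats VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Once the rows are in a table, they could in principle be joined to item-level records collected by the questionnaire, provided both sides share a common identifier such as the file path.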
