EN100-web


Special theme: Scientific Data Sharing and Re-use

Capturing the Experimental Context via Research Objects

by Catherine Jones, Brian Matthews and Antony Wilson

Data publication and sharing are becoming accepted parts of the data ecosystem to support research, and this is becoming recognised in the field of ‘facilities science’. We define facilities science as that undertaken at large-scale scientific facilities, in particular neutron and synchrotron x-ray sources, although similar characteristics can also apply to large telescopes, particle physics institutes, environmental monitoring centres and satellite observation platforms. In facilities science, a centrally managed set of specialized and high value scientific instruments is made accessible to users to run experiments which require the particular characteristics of those instruments.

The institutional nature of the facilities, with the provision of support infrastructure and staff, has allowed the facilities to support their user communities by systematically providing data acquisition, management, cataloguing and access. This has been successful to date; however, as the expectations of facilities users and funders develop, this approach has its limitations in the support of validation and reuse, and thus we propose to evolve the focus of the support provided.

A research project produces many outputs during its lifespan; some are formally published, some relate to the administration of the project and some relate to stages in the research process. Changes in culture are encouraging the re-use of existing data, which means that data should be retained, discoverable and usable over the long term. For example, a scientist wishing to reuse data may have discovered the data through a journal article; but to be able to reuse this data they will also need to understand the analysis done to produce the data behind the publication. This reuse may happen years after the original experiment was undertaken; to make it possible, the digital data object and its context must be preserved from the start.

We propose that, instead of focussing on traditional artefacts such as data or publications as the unit of dissemination, we elevate the notion of the experiment or ‘investigation’ to a first class object of discourse, which can be managed, published and cited in its own right. An investigation is an aggregation of the artefacts and supporting metadata surrounding a particular experiment on a facility. By providing this aggregate ‘research object’, we can provide information at the right level to support validation and reuse, capturing the context for a given digital object and preserving that context over the long term.

In the SCAPE project, STFC has built on the notion of a Research Object, which enables the aggregation of information about research artefacts. Research Objects are usually represented as a Linked Data graph; thus RDF is used as the underlying model and representation, with URIs used to uniquely identify artefacts and OAI-ORE used as an aggregation container, together with standard vocabularies for provenance, citation and facilities investigations. Within the SCAPE project, the focus of the research lifecycle is the experiment undertaken at the ISIS Neutron Spallation Facility. By following the lifecycle of a successful beam line application, we can collect all the artefacts and objects related to it, with their appropriate relationships. Because the investigation is strongly related to the allocation of the facility's resources, it is a highly appropriate intellectual unit for the facility, which wants to record and evaluate the scientific results arising from the allocation of its scarce resources.

Figure 1: Schematic of an Investigation Research Object.
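
To make the aggregation idea concrete, the following is a minimal sketch of how an investigation research object might be expressed as RDF triples, with an OAI-ORE aggregation grouping the artefacts and a provenance vocabulary linking them. It is an illustration only, not the SCAPE or STFC implementation: the `example.org` identifiers, the artefact names and the helper functions are hypothetical, and W3C PROV is assumed here as the provenance vocabulary. Only the OAI-ORE terms `Aggregation` and `aggregates`, `rdf:type` and `prov:wasDerivedFrom` are taken from the published vocabularies.

```python
# Hypothetical sketch: an investigation research object as a set of RDF triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
PROV = "http://www.w3.org/ns/prov#"  # assumed provenance vocabulary
BASE = "http://example.org/investigation/1234/"  # hypothetical identifier scheme


def build_research_object(base, artefacts, derivations):
    """Return (subject, predicate, object) URI triples for one investigation."""
    aggregation = base + "aggregation"
    # The aggregation itself is the citable unit that stands for the investigation.
    triples = [(aggregation, RDF_TYPE, ORE + "Aggregation")]
    # Each artefact is identified by a URI and aggregated by the investigation.
    for name in artefacts:
        triples.append((aggregation, ORE + "aggregates", base + name))
    # Provenance links record which artefact was derived from which.
    for derived, source in derivations:
        triples.append((base + derived, PROV + "wasDerivedFrom", base + source))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical artefacts collected along the lifecycle of one beam line application.
artefacts = ["proposal", "raw-data", "analysed-data", "publication"]
derivations = [("analysed-data", "raw-data"), ("publication", "analysed-data")]

research_object = build_research_object(BASE, artefacts, derivations)
print(to_ntriples(research_object))
```

Keeping the aggregation as a resource with its own URI is what would allow the investigation to be managed, published and cited as a single unit, while the individual artefacts remain separately addressable.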

ERCIM NEWS 100 January 2015
