07.08.2014 Views

Integration of Data and Publications - Alliance for Permanent Access


Report on Integration of Data and Publications Grant Agreement no.: 261530

descriptions of the data to so-called data publications all the way to (linking) the full publication using the data. Services like Pangaea that require researchers to submit metadescriptions with their data and adhere to certain formatting conventions (so that all datasets can be interpreted in a similar way) are a solid beginning. Crosslinks between articles and data are another means to support interpretability, because the verbalized interpretation of the dataset in a publication helps the understanding of the original dataset. While links from articles to data are becoming increasingly common, links in the other direction, from data to articles, are not yet as widely used, but good examples exist: e.g., Pangaea, PubChem and the Cambridge Crystallographic Database Centre. From a technical viewpoint, the interpretability of datasets can be ensured by separating them from vulnerable data carriers like CD-ROMs or DVDs and storing them on hard drives, including backups, forward migration and replication. Data centres seem to be best equipped to take on this challenge. In disciplines where there are no established data centres (yet), the university's institutional data centre, well-equipped libraries, library federations or initiatives like Dryad UK should stand in, although this may perpetuate the risk of fragmentation.

Re-usability

Ensuring re-usability is the most difficult goal of data management in a data centre and library setting. In addition to all the preconditions needed to ensure interpretability, re-usability often requires software to be available for analysing the datasets. The researcher who wants to re-use another researcher’s dataset needs not only an intellectual, discipline-specific understanding of the available datasets, but also the skills to operate the appropriate software. Besides constant monitoring of the data holdings, libraries and data centres need to maintain format and software registries to plan for data preservation actions. First approaches to the preservation of scientific data were, for example, developed in the CASPAR project 70 and are followed up in the APARSEN network of excellence 71, but continued research is needed.

General dilemmas

Altogether, the many new initiatives in the area of data integration are promising. However, against the expected explosion of research data (see chapters 1 and 2) they are still more or less exceptional cases. There are a couple of pioneering libraries, often embedded in big and capable universities and involved in several initiatives at one time. The danger is that a few actors master the transition to a data-intensive scholarly information infrastructure well, and that the majority of stakeholders follow in a passive manner.

70 http://www.casparpreserves.eu/

71 http://www.alliancepermanentaccess.org/current-projects/aparsen

Opportunities for Data Exchange (ODE) – www.ode-project.eu
