Integration of Data and Publications - Alliance for Permanent Access
Report on Integration of Data and Publications, Grant Agreement no.: 261530
descriptions of the data to so-called data publications, all the way to (linking) the full
publication that uses the data. Services like Pangaea, which require researchers to submit
metadata descriptions with their data and to adhere to certain formatting conventions (so
that all datasets can be interpreted in a similar way), are a solid beginning. Crosslinks
between articles and data are another means of supporting interpretability, because the
verbal interpretation of a dataset in a publication helps readers understand the original
dataset. While links from articles to data are becoming increasingly common, links in the
other direction, from data to articles, are not yet widely used, although good examples
exist, e.g., Pangaea, PubChem and the Cambridge Crystallographic Data Centre. From a
technical viewpoint, keeping datasets interpretable first requires keeping them readable:
they should be separated from vulnerable data carriers such as CD-ROMs or DVDs and stored
on hard drives, with backups, forward migration and replication. Data centres seem best
equipped to take on this challenge. In disciplines where there are no established data
centres (yet), the university's institutional data centre, well-equipped libraries, library
federations or initiatives like Dryad UK should stand in, although this may perpetuate
the risk of fragmentation.
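To make the two linking directions concrete, here is a minimal sketch of a bidirectional crosslink index, assuming that articles and datasets are identified by plain string identifiers such as DOIs. The class `CrossLinkIndex` and the example identifiers are hypothetical; this does not describe how Pangaea, PubChem or the Cambridge Crystallographic Data Centre actually implement their links.

```python
from collections import defaultdict


class CrossLinkIndex:
    """Hypothetical bidirectional index between article and dataset identifiers.

    Identifiers are plain strings (e.g. DOIs); no real repository API is modelled.
    """

    def __init__(self) -> None:
        self._datasets_by_article: dict[str, set[str]] = defaultdict(set)
        self._articles_by_dataset: dict[str, set[str]] = defaultdict(set)

    def link(self, article_id: str, dataset_id: str) -> None:
        # Store the link in both directions at once, so a data-to-article
        # lookup always succeeds whenever an article-to-data link exists.
        self._datasets_by_article[article_id].add(dataset_id)
        self._articles_by_dataset[dataset_id].add(article_id)

    def datasets_for(self, article_id: str) -> list[str]:
        # .get() avoids creating empty entries for unknown identifiers.
        return sorted(self._datasets_by_article.get(article_id, set()))

    def articles_for(self, dataset_id: str) -> list[str]:
        return sorted(self._articles_by_dataset.get(dataset_id, set()))


# Usage with hypothetical identifiers: two articles reuse the same dataset.
index = CrossLinkIndex()
index.link("10.1234/article.1", "10.5678/dataset.A")
index.link("10.1234/article.2", "10.5678/dataset.A")
print(index.articles_for("10.5678/dataset.A"))
# ['10.1234/article.1', '10.1234/article.2']
```

Recording both directions in a single `link` call is the design choice that matters here: the data-to-article direction, which the text notes is still rarely offered, then comes for free whenever an article cites its data.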
Re-usability<br />
Ensuring re-usability is the most difficult goal of data management in a data centre and
library setting. In addition to all the preconditions needed to ensure interpretability,
re-usability often requires software to be available for analysing the datasets. A
researcher who wants to re-use another researcher's dataset needs not only an
intellectual, discipline-specific understanding of the available datasets, but also the
skills to operate the appropriate software. Besides constantly monitoring their data
holdings, libraries and data centres need to maintain format and software registries in
order to plan data preservation actions. First approaches to the preservation of
scientific data were developed, for example, in the CASPAR project [70] and are being
followed up in the APARSEN network of excellence [71], but continued research is needed.
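The monitoring and registry tasks described above can be sketched as a small routine that checks each holding's fixity against a checksum recorded at ingest and looks up its format in a registry. This is a minimal sketch under stated assumptions: `FORMAT_REGISTRY`, `Holding` and `plan_actions` are hypothetical, the format identifiers are illustrative media types, and real registries such as PRONOM use their own identifiers and far richer records. It is not the approach taken in CASPAR or APARSEN.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical format registry mapping a format identifier to a risk status.
# Real registries (e.g. PRONOM) use their own identifiers and richer records.
FORMAT_REGISTRY = {
    "text/csv": "supported",
    "application/x-netcdf": "supported",
    "application/x-legacy-instrument": "at-risk",
}


@dataclass
class Holding:
    dataset_id: str
    format_id: str
    content: bytes
    recorded_sha256: str  # checksum stored when the dataset was ingested


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_actions(holdings: list[Holding]) -> dict[str, list[str]]:
    """Return, per dataset, the preservation actions a curator should review."""
    actions: dict[str, list[str]] = {}
    for h in holdings:
        todo = []
        # Fixity check: has the stored copy changed since ingest?
        if sha256_hex(h.content) != h.recorded_sha256:
            todo.append("restore from replica")
        # Registry check: unknown or at-risk formats need migration planning.
        status = FORMAT_REGISTRY.get(h.format_id, "unknown")
        if status != "supported":
            todo.append(f"plan migration ({status} format)")
        if todo:
            actions[h.dataset_id] = todo
    return actions


# Usage with hypothetical holdings: one healthy, one at-risk, one corrupted.
good = b"station,temp\nA,12.5\n"
legacy = b"\x00\x01"
holdings = [
    Holding("ds-1", "text/csv", good, sha256_hex(good)),
    Holding("ds-2", "application/x-legacy-instrument", legacy, sha256_hex(legacy)),
    Holding("ds-3", "text/csv", b"corrupted", sha256_hex(good)),
]
print(plan_actions(holdings))
# {'ds-2': ['plan migration (at-risk format)'], 'ds-3': ['restore from replica']}
```

The routine only proposes actions for a curator to review; deciding how to migrate an at-risk format, and which software to keep available for re-use, remains the harder problem the text describes.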
General dilemmas<br />
Altogether, the many new initiatives in the area of data integration are promising.
However, measured against the expected explosion of research data (see chapters 1 and 2),
they are still more or less exceptional cases. A handful of pioneering libraries, often
embedded in large, well-resourced universities, are involved in several initiatives at once.
The danger is that a few actors will master the transition to a data-intensive scholarly
information infrastructure well, while the majority of stakeholders follow passively.
[70] http://www.casparpreserves.eu/
[71] http://www.alliancepermanentaccess.org/current-projects/aparsen
Opportunities for Data Exchange (ODE) – www.ode-project.eu 77