RDMF13: notes from breakout group 2 (systems)

Breakout group 2, session 1 – What systems are involved in data preparation processes, and what impacts do they have? – chair: John Beaman, University of Leeds

M Wolf – systems for non-digital data (x2)

Several institutions do not have data repositories in place yet

Interest in EPrints and DSpace as data platforms

JB – The better prepared data is before it enters the repo, the more likely it is to be reused. But metadata creation is something of a chore – how might systems lighten this burden?

Types of system/activities used – integrity checking
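The integrity checking mentioned above usually means fixity checking: computing a checksum at deposit time and re-verifying it later to detect corruption. A minimal sketch in Python (the function name is illustrative, not something named in the discussion):

```python
import hashlib

def sha256_checksum(path, chunk_size=8192):
    """Compute the SHA-256 digest of a file, reading in chunks so
    large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the digest alongside the deposit lets the repository re-run the same computation on a schedule and flag any file whose checksum has changed.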

MW – (Highly) negative feedback about Recollect.

Data should be documented at every stage, though not necessarily using formal metadata standards – those are too much of a turn-off for researchers.

GB (Lboro) – key thing is to get researchers engaged: if you make it too onerous, they won’t do it. So start light, then build on that.

JB – Simplicity is vital.

John Lewis – we should differentiate between research documentation and data documentation (metadata).

Peter (BTO) – documentation can be zipped into / bundled with the data deposit
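Peter's suggestion of bundling documentation with the deposit can be sketched as a small packaging step; all file and function names here are illustrative:

```python
import os
import zipfile

def bundle_deposit(data_files, readme_path, archive_path):
    """Bundle data files plus a readme into a single zip deposit.

    The readme is stored at the archive root as README.txt so it is
    the first thing a re-user sees on unpacking.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(readme_path, arcname="README.txt")
        for path in data_files:
            zf.write(path, arcname=os.path.basename(path))
```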

DOIs map to the DataCite schema, which is widely understood – only five mandatory fields in DataCite, so pretty minimal.
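For reference, the five mandatory DataCite properties are Identifier, Creator, Title, Publisher and PublicationYear. A minimal record might look like the following (all values are illustrative placeholders, not real identifiers):

```python
# The five mandatory properties of the DataCite metadata schema.
# Every value below is a made-up placeholder for illustration only.
minimal_record = {
    "identifier": "10.1234/example-dataset",   # the DOI itself
    "creators": [{"name": "Smith, Jane"}],
    "titles": [{"title": "Example survey dataset"}],
    "publisher": "Example University",
    "publicationYear": "2013",
}
```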

JB – Paper discovery is most likely to be done via Google Scholar or some other aggregator/harvester – not via university or repository search function. Likely to be the same for data – so simplicity (of metadata creation and discovery mechanisms) is a key theme.

Does anyone disagree that minimal metadata is the best approach? Suzanne and Shannon do! Minimal metadata is fine for human readers, but supporting other (machine-driven) uses requires more. A more detailed schema enhances discoverability.

Aneesha – who checks quality? A: Sometimes librarians, who do sanity checking and spot obvious mistakes/typos. But, being non-specialists, they are likely to miss subtler issues. Ideally researchers would spend some time on this, but it is hard to get them to do so. Aneesha's institution follows a relatively intensive metadata approach, since many deposits are poorly documented when submitted.

JB – pros and cons of a one-size-fits-all approach (i.e. bundling everything into a zip file, accompanied by a readme file). Not great from a discoverability standpoint, and may lead to more errors – so more care is required on the researcher's side, but there is little incentive for this at present.

JB – Google Refine / Open Refine

Scientific instruments introduce closed, proprietary data formats (often without much forethought). Ideally such data can also be converted into an open format, with both versions preserved.