Breakout group 2, session 1 – What systems are involved in data preparation processes, and what impacts do they have? – chair: John Beaman, University of Leeds
M Wolf – systems for non-digital data (x2)
Several institutions do not have data repositories in place yet
Interest in EPrints and DSpace as data platforms
JB – The better prepared data is before it enters the repo, the more likely it is to be reused. But metadata creation is something of a chore – how might systems lighten this burden?
Types of systems/activities used – integrity (fixity) checking (see the sketch below).
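[A minimal sketch of what integrity/fixity checking can look like in practice: computing SHA-256 checksums for every file in a deposit and re-verifying them later. The manifest filename follows the BagIt convention; the function names and paths are illustrative, not from the session.]

    import hashlib
    from pathlib import Path

    def sha256_of(path):
        """Compute the SHA-256 checksum of a file, reading in chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def write_manifest(data_dir, manifest="manifest-sha256.txt"):
        """Record a checksum for every file in the deposit."""
        with open(manifest, "w") as out:
            for path in sorted(Path(data_dir).rglob("*")):
                if path.is_file():
                    out.write(f"{sha256_of(path)}  {path}\n")

    def verify_manifest(manifest="manifest-sha256.txt"):
        """Re-check every recorded checksum; report any mismatch."""
        ok = True
        with open(manifest) as f:
            for line in f:
                digest, path = line.rstrip("\n").split("  ", 1)
                if sha256_of(path) != digest:
                    print(f"FIXITY FAILURE: {path}")
                    ok = False
        return ok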
MW – (Highly) negative feedback about Recollect.
Data should be documented at every stage, but not necessarily utilising metadata standards – too much of a turn-off for researchers.
GB (Lboro) – key thing is to get researchers engaged: if you make it too onerous, they won’t do it. So start light, then build on that.
JB – Simplicity is vital.
John Lewis – we should differentiate between research documentation and data documentation (metadata).
Peter (BTO) – documentation can be zipped into / bundled with the data deposit
DOIs map deposits to the DataCite schema and are widely understood – DataCite has only five mandatory fields, so it is pretty minimal (see the sketch below).
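[For reference, the five mandatory DataCite properties are Identifier, Creator, Title, Publisher and PublicationYear. A sketch of such a minimal record as a plain Python dict – the values are invented for illustration, not a real deposit.]

    # The five mandatory properties of the DataCite metadata schema
    # (values are invented examples only).
    minimal_record = {
        "identifier": "10.1234/example-doi",   # the DOI itself
        "creators": ["Smith, Jane"],
        "title": "Example survey dataset, 2014",
        "publisher": "University of Example",
        "publicationYear": "2014",
    }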
JB – Paper discovery is most likely to happen via Google Scholar or some other aggregator/harvester, not via a university or repository search function. The same is likely to be true for data – so simplicity (of metadata creation and of discovery mechanisms) is a key theme.
Does anyone disagree that minimal metadata is the best approach? Suzanne and Shannon do! Minimal metadata is fine for human readers, but for other uses (e.g. machine processing) it needs to be richer – a more detailed schema enhances discoverability.
Aneesha – who checks quality? Answer: sometimes librarians, who sanity-check deposits and spot obvious mistakes/typos, but as non-specialists they are likely to miss the subtler issues. So ideally researchers would spend some time on this, but it is hard to get them to do it. Aneesha's institution follows a relatively intensive metadata approach, given that many deposits are poorly documented when submitted.
JB – pros and cons of a one-size-fits-all approach (i.e. bundling everything into a zip file, accompanied by a readme file – see the sketch below). Not great from a discoverability standpoint, and may lead to more errors, so more care is required on the researcher's side – but there is little incentive for this at present.
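[A minimal sketch of the "bundle everything into a zip with a readme" approach discussed above, assuming a deposit directory that already contains the data files and a README alongside them – the directory and file names are illustrative.]

    import zipfile
    from pathlib import Path

    def bundle_deposit(data_dir, out_zip="deposit.zip"):
        """Zip the data files together with their README documentation."""
        with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as z:
            for path in sorted(Path(data_dir).rglob("*")):
                if path.is_file():
                    z.write(path, path.relative_to(data_dir))

    # Usage: bundle_deposit("my_study/")  # expects README.txt alongside the data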
JB – Google Refine / OpenRefine (a tool for cleaning and transforming messy tabular data).
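[OpenRefine itself is interactive, but the kind of per-cell cleanup it automates can be sketched in a few lines of Python – here, trimming stray whitespace and normalising case across a column of values. The sample data is invented.]

    raw = ["  Leeds ", "LEEDS", "leeds", " Loughborough"]

    # Strip whitespace and normalise case – the sort of transform
    # OpenRefine applies across a whole column at once.
    cleaned = [value.strip().title() for value in raw]
    print(cleaned)  # ['Leeds', 'Leeds', 'Leeds', 'Loughborough']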
Scientific instruments introduce closed, proprietary data formats (often without much forethought). Ideally the data can also be converted into an open format, with both versions preserved.
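[A sketch of the "convert but keep both" idea, assuming for illustration that the instrument output happens to be an Excel workbook that pandas can read; a genuinely closed format would need a vendor or community reader instead. Function and directory names are invented.]

    from pathlib import Path
    import shutil
    import pandas as pd

    def preserve_both(raw_file, archive_dir="archive"):
        """Archive the original proprietary file AND an open-format copy."""
        archive = Path(archive_dir)
        archive.mkdir(exist_ok=True)
        src = Path(raw_file)
        shutil.copy(src, archive / src.name)                    # original, bit-for-bit
        df = pd.read_excel(src)                                 # parse the proprietary format
        df.to_csv(archive / (src.stem + ".csv"), index=False)   # open CSV derivative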