This is a guest post by Laurence Horton, Data Librarian at The London School of Economics and Political Science (LSE).
It was a hot and stormy day; the rain fell in torrents – except at occasional intervals, when I dodged the showers to walk to King’s College London (for it is in London that our scene lies). There, sat in a circle in a windowless room like some kind of self-help support group, was LARD.
LARD is London Area Research Data, and this was its inaugural meeting, bringing together various people from London-based institutions (and from as far away as Reading) who are charged in some way with Research Data Management (RDM) – be it research support or repository work. These are my notes. They lack attribution partly because I couldn’t remember where every person was from, and partly because it wasn’t clear whether the meeting was on or off the record. Nonetheless, I felt some interesting points deserve sharing as an insight into how UK universities (and one research centre) are dealing with RDM less than a year before the EPSRC deadline for complying with its expectations for research data.
The first item in what was a free-form discussion (think RDM jazz – hence my beat-style note-taking, albeit with full stops) was policies. Some institutions have data policies, some have draft policies, and others have no policy at all. The mood seemed to be that a policy was more effective as a mandate for focusing university attention and resources on support services than as a way of grabbing researchers’ attention. Researchers, it was said, tend to react more to what funders want than to university policies or documents. Those universities that competed for Medical Research Council (MRC) funding felt the MRC demanded institutional data policies, so those institutions tended to have adopted policies or to have drafts ready for adoption. Yet most researchers are not funded by one of the RCUK councils, and their funders often have no data mandates.
The group found it difficult to tell researchers that they don’t own their own data (ownership often lies with funders, or with institutions through employee-created works clauses). There was also a sense that researchers worry about data protection and are looking for practical guidance on how to keep data safe and secure. It was also recognised that disciplines matter: in disciplines without a strong culture of data sharing, the weight of institutional support, in the form of infrastructure for RDM, can help. This tackles the disciplinary focus of researchers, or localism. One example of how a bad experience can focus attention was a researcher who lost data by plugging a malware-infected hard drive into a university network and had to have both the drive and the copy of the data destroyed. Episodes like this can be used to tackle the culture of “improvisation” when it comes to researchers “backing up” their data without institutional support, or without engaging with it. Aside from acting as a “wake-up call” for researchers, such episodes can push universities into providing workable, easy-to-use institutional storage – either working storage or preservation in an institutional repository.
Discussion then moved on to the EPSRC expectations for research data. Those who had attended a recent DCC event on the expectations reported that the EPSRC are not looking to get rid of opportunities for supporting research, and so are not likely to cut off funding come May 2015. However, they do expect to see evidence that institutions are working towards, or trying to improve, storage, support, and data discovery and access. Either way, there is no doubt the EPSRC policy has focused institutional knowledge and effort on RDM.
Then training came up. When the “T” word is mentioned, I often think of the quote attributed to Yogi Berra: “If people don’t want to come to the ballpark, how are you going to stop them?” To save us from preparing to teach to empty rooms, the thinking now seems to be towards providing support when people need it and building up a directory of experts to refer people to when appropriate. Structured support is based on identifying four key stages in the data lifecycle: submitting a proposal (help with data management planning), acceptance of the proposal (implementing RDM), mid-project (supporting implementation), and towards the close of the project (preservation). The key is to maintain engagement with researchers. One institution is trying to do this for all its research projects and is working with its research office to target RCUK-funded projects. Another institution initially plans to work with a sample of projects.
By now the discussion had moved on to data management planning. One institution had a Data Management Plan (DMP) template and a DMP requirement as part of its data policy, with separate plans for staff and postgraduate students. The feeling was that template texts are not such a good thing if they are simply copied and pasted into DMPs. A case was mentioned of a research funder refusing to fund a project because its DMP used text identical to another DMP submitted from the same institution. The DCC’s DMPOnline tool was mentioned, particularly its ability to be customised for an institution, and it was noted that DMPOnline has improved considerably in later versions. One institution reported a policy of not offering storage until a DMP has been completed; another reported that its research office has a checkbox to signify that the DMP has been looked at by the data management officer.
The RDM equivalent of Godwin’s law (or Godwin’s Rule of Nazi Analogies) is that at some point cost will be mentioned. How to cost RDM is an ongoing problem. On top of the difficulty of identifying costs that relate specifically to RDM activity, as opposed to typical research requirements that have an RDM aspect, there is a further problem: RCUK funders mostly allow budgeting for RDM, but that budgeting must not include activity already supported through general institutional funding. Auditing costs is also a problem. Storage tends to have the most easily identified costs (cost per byte, for example), but this can cause problems if data ends up in an institutional repository when the project budget identified separate storage costs. For this reason, solutions like Arkivum may be advantageous, as they can be specified as auditable costs.
The coda to this discussion concerned metadata. It was said that funders are keen to ensure that good-quality metadata accompanies the research data generated by projects they support, and that they are willing to accept proposals that factor in additional time and resources for metadata. However, an obvious problem is who should add that metadata: researchers, who know the data but do not necessarily know the standard or see its importance in the way RDM support staff do; or RDM staff, particularly repository staff, who know the type of information required but do not necessarily know the data or discipline that well. Finally, settling on a standard that is applicable to all data is a problem. Social science is not the same as genetics; art history is not the same as management. It was then asked whether there was a way to harvest metadata when it is created elsewhere (say, by the UK Data Service). The DCC and the UK Data Service are both working on a Jisc-funded Research Data Registry and Discovery Service, and the European Union is also working on data discovery platforms that import and export catalogue record metadata.
The feeling at the end of this initial meeting was that LARD provided a useful forum for sharing practice and learning from contemporaries, and there was enthusiasm for follow-up meetings, including some based around structured themes.
If you work in a big city, and there are people doing similar things to you in that city, take advantage and get together to talk. So, thanks to Gareth Knight (LSHTM), Stephen Grace (UEL), and Veronica Howe (KCL) for organising, facilitating, and hosting LARD #1.