(Katie Fraser from the University of Nottingham reports on the Research Software Preservation breakout session at the RDMF16 event, which took place in Edinburgh in late November 2016...)
The Research Software breakout session was proposed and facilitated by Jez Cope from the University of Sheffield, and was attended by delegates from the RDMF, including representatives from a number of universities, the British Library, and Arkivum. We all shared an interest in the use of software in research, and the challenges it presents for research data.
Our discussion opened on the topic of research funders. We noted that few of the research councils explicitly address software in their requirements. However, we recognised the Software Sustainability Institute (SSI) as a leader in best practice around research software. As discussed by Professor Les Carr earlier in the RDMF programme, the SSI was funded by EPSRC, suggesting there is significant interest in software from at least one research funder.
Research software sits on a spectrum, from ubiquitous commercial software and systems to small personal scripts. We identified a range of licensing issues and practical challenges between these extremes. However, the two key questions we felt RDM needed to grapple with are addressed below.
How do we ensure access to research data which needs specific software to be read?
We agreed that the short-term view “I had to buy software x to process my data; you have to buy it to read it” is of limited use, and that ensuring legacy access to data needs to be addressed proactively. Options we discussed included:
- dual-saving data – in both proprietary data formats and open ones – in order to maximise the chance of long-term access to core information, while also preserving data’s original form
- archiving the software needed to read the data in private repository collections, on the grounds that intellectual property concerns would only become critical if the software was otherwise lost
- using virtual machines and emulators to recreate software environments: learning from an approach particularly common in videogames (which themselves can be research outputs).
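To make the first of these options concrete, here is a minimal sketch of dual-saving in Python. It is an illustration only, not something presented at the session: the function name `dual_save` is hypothetical, and Python's `pickle` format simply stands in for whatever tool-specific or proprietary format a researcher's software would normally write. The open copy is plain CSV.

```python
import csv
import pickle
from pathlib import Path


def dual_save(records, stem, out_dir="."):
    """Save the same records twice: in a tool-specific binary format
    and as plain CSV that any text editor or spreadsheet can open.

    `records` is a non-empty list of dicts sharing the same keys.
    Returns the two paths written (original-format copy, open copy).
    """
    if not records:
        raise ValueError("need at least one record to save")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Stand-in for the data's original form (assumption: pickle plays
    # the role of the proprietary or tool-specific format here).
    original_path = out / f"{stem}.pkl"
    with open(original_path, "wb") as fh:
        pickle.dump(records, fh)

    # Open copy: CSV with a header row taken from the first record.
    open_path = out / f"{stem}.csv"
    with open(open_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

    return original_path, open_path
```

Calling `dual_save([{"sample": "A1", "temp_c": 21.5}], "readings", "archive")` would write `archive/readings.pkl` and `archive/readings.csv`. The CSV copy loses type information (every value is read back as text), which is one reason to keep the original form alongside it.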
How do we approach the software our researchers produce?
Software produced in the course of research stimulated even more discussion! We discussed:
- licensing and documenting code appropriately
- establishing institutional responsibility
- archiving code simply and expediently.
Programmers may use code in their software that is new, reworked or replicated, and software will often include pre-existing subroutines developed outside a project. This can make licensing code, and establishing its provenance, complex. However, we saw scope to simplify matters by actively tapping into best practice from existing approaches to software development, including advice on open source software, and the professional approaches to code writing and documentation covered in programming and software engineering courses.
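As a small illustration of what such practice can look like, the sketch below shows a Python file with a licence declared in its header, a note on provenance, and a documented function. Everything in it is hypothetical: the MIT licence choice, the function `mean_and_spread` and the project it belongs to are assumptions made for the example, not recommendations from the session. The `SPDX-License-Identifier` line is a widely used convention for stating a licence in a machine-readable way.

```python
# SPDX-License-Identifier: MIT
# Provenance: mean_and_spread() is new code written for this (hypothetical)
# project; it relies only on the Python standard library.
"""Summary statistics for a hypothetical measurement series."""

from statistics import mean, stdev


def mean_and_spread(values):
    """Return the mean and sample standard deviation of ``values``.

    Parameters
    ----------
    values : sequence of float
        At least two measurements, in any consistent unit.

    Returns
    -------
    tuple of (float, float)
        The arithmetic mean and the sample standard deviation.

    Raises
    ------
    ValueError
        If fewer than two values are supplied.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    return mean(values), stdev(values)
```

A reader who has never seen the project can tell from the header how the code may be reused and where it came from, and from the docstring what the function expects and returns.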
While some developers approached research software with excellent skills, most of us had encountered researchers who had moved into developing software from a purely research background. These researchers were unlikely to have had formal training in coding. We debated how much responsibility institutions should take to ensure code documentation skills are learnt by all their researchers, by developing new routes for training and support, or widening access to existing routes.
Our group had a range of experience assisting in archiving code. Some of our institutional repositories are already hosting code, e.g. the University of Bristol data repository. We were also looking towards more specialist solutions, like Zenodo, where the GitHub-to-Zenodo integration allows archiving directly from a tool that is already integrated into many programmers’ workflows. The idea of building a similar bridge between GitHub and our own repositories was raised, but this would require significant development work.
The discussion was wide-ranging and interesting, and I think the area is one where many of us involved in RDM will be growing our understanding and offering over the next few years. The takeaways from the session for me, and the questions I’ll be pursuing in my own institution, are:
- Do our institutional research data workflows do enough to encourage and support researchers in thinking about the software needed to read their data?
- Do we need to make information about licensing of research software more readily available to our research community?
- Does our institution need to promote training on best practice around coding and documentation more widely?
- What other services and systems could we offer to ensure the research software our researchers develop can be appropriately archived?
Leave me a comment if you’re willing to share how your own institution has approached these challenges.