Author: Digital Curation Centre blogs

Active, actionable DMPs

Roadmap project IDCC briefing
We had a spectacularly productive IDCC last month thanks to everyone who participated in the various meetings and events focused on the DMPRoadmap project and machine-actionable DMPs. Thank you, thank you! Sarah has since …

Roadmap retrospective: 2016

 
Here’s an update on DMPRoadmap, courtesy of Stephanie Simms at CDL
 
2016 in review
 
The past year has been a wild ride, in more ways than one… Despite our respective political climates, UC3 and DCC remain enthusiastic about our partnership and the future of DMPs. Below is a brief retrospective about where we’ve been in 2016 and a roadmap (if you will…we also wish we’d chosen a different name for our joint project) for where we’re going in 2017. Jump to the end if you just want to know how to get involved with DMP events at the International Digital Curation Conference (IDCC 2017, 20–23 Feb in Edinburgh, register here).
 
In 2016 we consolidated our UC3-DCC project team, our plans for the merged platform (see the roadmap to MVP), and began testing a co-development process that will provide a framework for community contributions down the line. We’re plowing through the list of features and adding documentation to the GitHub repo—all are invited to join us at IDCC 2017 for presentations and demos of our progress to date (papers, slides, etc. will all be posted after the event). For those not attending IDCC, please let us know if you have ideas, questions, anything at all to contribute ahead of the event!
 
DMPs sans frontières 
 
Now we’d like to take a minute and reflect on events of the past year, particularly in the realm of open data policies, and the implications for DMPs and data management writ large. The open scholarship revolution has progressed to a point where top-level policies mandate open access to the results of government-funded research, including research data, in the US, UK, and EU, with similar principles and policies gaining momentum in Australia, Canada, South Africa, and elsewhere. DMPs are the primary vehicle for complying with these policies, and because research is a global enterprise, awareness of DMPs has spread throughout the research community. Another encouraging development is the ubiquity of the term FAIR data (Findable, Accessible, Interoperable, Reusable), which suggests that we’re all in agreement about what we’re trying to achieve.
 
On top of the accumulation of national data policies, 2016 ushered in a series of related developments in openness that contribute to the DMP conversation. To name a few:
 
  • More publishers articulated clear data policies, e.g., Springer Nature Research Data Policies apply to over 600 journals.
  • PLOS now requires an ORCID for all corresponding authors at the time of manuscript submission to promote discoverability and credit.
  • The Gates Foundation reinforced support for open access and open data by preventing funded researchers from publishing in journals that do not comply with its policy, which came into force at the beginning of 2017; this includes non-compliant high-impact journals such as Science, Nature, PNAS, and NEJM.
  • Researchers throughout the world continued to circumvent subscription access to scholarly literature by using Sci-Hub (Bohannon, 2016).
  • Library consortia in Germany and Taiwan canceled (or threatened to cancel) subscriptions to Elsevier journals because of open-access related conflicts, and Peru canceled over a lack of government funding for expensive paid access (Schiermeier and Rodríguez Mega, 2017).
  • Reproducibility continued to gain prominence, e.g., the US National Institutes of Health (NIH) Policy on Rigor and Reproducibility came into force for most NIH and AHRQ grant proposals received in 2016.
  • The Software Citation Principles (Smith et al., 2016) recognized software as an important product of modern research that needs to be managed alongside data and other outputs.
This flurry of open scholarship activity, both top-down and bottom-up, across all stakeholders continues to drive adoption of our services. DMPonline and the DMPTool were developed in 2011 to support open data policies in the UK and US, respectively, but today our organizations engage with users throughout the world. An upsurge in international users is evident from email addresses for new accounts and web analytics. In addition, local installations of our open source tools, as both national and institutional services, continue to multiply (see a complete list here). 
 
Over the past year, the DMP community has validated our decision to consolidate our efforts by merging our technical platforms and coordinating outreach activities. The DMPRoadmap project feeds into a larger goal of harnessing the work of international DMP projects to benefit the entire community. We’re also engaged with some vibrant international working groups (e.g., Research Data Alliance Active DMPs, FORCE11 FAIR DMPs, Data Documentation Initiative DMP Metadata group) that have provided the opportunity to begin developing use cases for machine-actionable DMPs. So far the use cases encompass a controlled vocabulary for DMPs; integrations with other systems (e.g., Zenodo, Dataverse, Figshare, OSF, PURE, grant management systems, electronic lab notebooks); passing information to/from repositories; leveraging persistent identifiers (PIDs); and building APIs. 
 
2017 things to come
This brings us to outlining plans for 2017 and charting a course for DMPs of the future. DCC will be running the new Roadmap code soon, and once we’ve added everything from the development roadmap, the DMPTool will announce its plans for migration. At IDCC we’ll kick off the conversation about bringing the many local installations of our tools along for the ride to actualize the vision of a core, international DMP infrastructure. A Canadian and a French team are our gracious guinea pigs for testing the draft external contributor guidelines.
 
There will be plenty of opportunities to connect with us at IDCC. If you’re going to be at the main conference, we encourage you to attend our practice paper and/or join a DMP session we’ll be running in parallel with the BoFs on Wednesday afternoon, 22 Feb. The session will begin with a demo and update on DMPRoadmap; then we’ll break into two parallel tracks. One track will be for developers who want to contribute to the code to learn more about recent data model changes and developer guidelines. The other track will be a buffet of DMP discussion groups. Given the overwhelming level of interest in the workshop (details below), one of these groups will cover machine-actionable DMPs. We’ll give a brief report on the workshop and invite others to feed into the discussion. The other groups are likely to cover training and support for DMPs, evaluation cribsheets for reviewing DMPs, or other topics per community requests. If there’s something you’d like to propose, please let us know!
 
IDCC DMP utopia workshop
We’re also hosting a workshop on Monday, 20 Feb entitled “A postcard from the future: Tools and services from a perfect DMP world.” The focus will be on machine-actionable DMPs and how to integrate DMP tools into existing research workflows and services.  
 
The program includes presentations, activities, and discussion to address questions such as:
  • Where and how do DMPs fit in the overall research lifecycle (i.e., beyond grant proposals)?
  • Which data could be fed automatically from other systems into DMPs (or vice versa)?
  • What information can be validated automatically?
  • Which systems/services should connect with DMP tools?
  • What are the priorities for integrations?
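To make the “validated automatically” question concrete: some fields in a DMP can be checked without any human review at all. For example, ORCID publishes the check-digit algorithm (ISO 7064 MOD 11-2) for its iDs, so a DMP tool could flag a mistyped ORCID at the moment of entry. A minimal sketch in Python (the function name is our own; confirming that an iD is actually registered would additionally require a call to the ORCID public API):

```python
def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the check digit of an ORCID iD (ISO 7064 MOD 11-2).

    This only proves the identifier is well-formed, not that it is
    registered with ORCID.
    """
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:          # the first 15 characters must be digits
        if not ch.isdigit():
            return False
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11  # a result of 10 is written as 'X'
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected

# The well-known example iD from ORCID's own documentation:
print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
print(orcid_checksum_ok("0000-0002-1825-0096"))  # False
```

Checks like this are the low-hanging fruit; validating, say, that a named repository actually accepts the declared data types is a harder integration problem of the kind the workshop aims to scope.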
We’ve gathered an international cohort of diverse players in the DMP game—repository managers, data librarians, funders, researchers, developers, etc.—to continue developing machine-actionable use cases and craft a vision for a DMP utopia of the future. We apologize again that we weren’t able to accommodate everyone who wanted to participate in the workshop, but rest assured that we plan to share all of the outputs and will likely convene similar events in the future. 
 
Keep a lookout for more detailed information about the workshop program in the coming weeks and feel free to continue providing input before, during, and afterward. This is absolutely a community-driven effort and we look forward to continuing our collaborations into the new year!

RDMF16 Research Software Breakout Session

(Katie Fraser from the University of Nottingham reports on the Research Software Preservation breakout session at the RDMF16 event, which took place in Edinburgh in late November 2016…)
The Research Software breakout session was proposed and facilita…

Finding our Roadmap rhythm

In keeping with our monthly updates about the merged Roadmap platform, here’s the short and the long of what we’ve been up to lately courtesy of Stephanie Simms of the DMPTool:

Short update

Long(er) update

This month our main focus has been on getting into a steady 2-week sprint groove that you can track on our GitHub Projects board. DCC/DMPonline is keen to migrate to the new codebase so in preparation we’re revising the database schema and optimizing the code. This clean-up work not only makes things easier for our core development team, but will facilitate community development efforts down the line. It also addresses some scalability issues that we encountered during a week of heavy use on the hosted instance of the Finnish DMPTuuli (thanks for the lessons learned, Finland!). We’ve also been evaluating dependencies and fixing all the bugs introduced by the recent Rails and Bootstrap migrations.

Once things are in good working order, DMPonline will complete their migration and we’ll shift focus to adding new features from the MVP roadmap. DMPTool won’t migrate to the new system until we’ve added everything on the list and conducted testing with our institutional partners from the steering committee. The CDL UX team is also helping us redesign some things, with particular attention to internationalization and improving accessibility for users with disabilities.

The rest of our activities revolve around gathering requirements and refining some use cases for machine-actionable DMPs. This runs the gamut from big-picture brainstorming to targeted work on features that we’ll implement in the new platform. The first step to achieving the latter involves a collaboration with Substance.io to implement a new text editor (Substance Forms). The new editor offers increased functionality, provides a framework for future work on machine-actionability, and delivers a better user experience throughout the platform. In addition, we’re refining the DMPonline themes (details here)—we’re still collecting feedback and are grateful to all those who have weighed in so far. Sarah and I will consolidate community input and share the new set of themes during the first meeting of a DDI working group to create a DMP vocabulary. We plan to coordinate our work on the themes with this parallel effort—more details as things get moving on that front in Nov.

Future brainstorming events include PIDapalooza—come to Iceland and share your ideas about persistent identifiers in DMPs!—and the International Digital Curation Conference (IDCC) 2017 for which registration is now open. We’ll present a Roadmap update at IDCC along with a demo of the new system. In addition, we’re hosting an interactive workshop for developers et al. to help us envision (and plan for) a perfect DMP world with tools and services that support FAIR, machine-actionable DMPs (more details forthcoming).

Two final, related bits of info: 1) we’re still seeking funding to speed up progress toward building machine-actionable DMP infrastructure; we weren’t successful with our Open Science Prize application but are hoping for better news on an IMLS preliminary proposal (both available here). 2) We’re also continuing to promote greater openness with DMPs; one approach involves expanding the RIO Journal Collection of exemplary plans. Check out the latest plan from Ethan White that also lives on GitHub and send us your thoughts on DMP workflows, publishing and sharing DMPs.

Getting our ducks in a row

Recent activity on the Roadmap project encompasses two major themes: 1) machine-actionable data management plans and 2) kicking off co-development of the shared codebase.

Image credit: ‘Get Your Ducks in a Row’ CC-BY-SA by Cliff Johnson

Machine-actionable DMPs

The first of these has been a hot topic of conversation among stakeholders in the data management game for some time now, although most use the phrase “machine-readable DMPs.” So what do we mean by machine-actionable DMPs? Per the Data Documentation Initiative definition, “this term refers to information that is structured in a consistent way so that machines can be programmed against the structure.” The goal of machine-actionable DMPs, then, is to better facilitate good data management and reuse practices (think FAIR: Findable, Accessible, Interoperable, Reusable) by enabling:

  • Institutions to manage their data
  • Funders to mine the DMPs they receive
  • Infrastructure providers to plan their resources
  • Researchers to discover data

This term is consistent with the Research Data Alliance Active DMPs Interest Group and the FORCE11 FAIR DMPs group mission statements, and it seems to capture what we’re all thinking: i.e., we want to move beyond static text files to a dynamic inventory of digital research methods, protocols, environments, software, articles, data… One reason for the DMPonline-DMPTool merger is to develop a core infrastructure for implementing use cases that make this possible. We still need a human-readable document with a narrative, but underneath the DMP could have more thematic richness with value for all stakeholders.
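To make “structured in a consistent way so that machines can be programmed against the structure” concrete, here is a deliberately simplified sketch of what a fragment of a machine-actionable DMP might look like as structured data, together with the kind of trivial query a funder or infrastructure provider could run across many plans at once. The field names below are our invention for illustration only; agreeing a controlled vocabulary is precisely the work the groups above are taking on.

```python
import json

# A hypothetical fragment of a machine-actionable DMP.
# All field names are illustrative, not an agreed standard.
dmp_fragment = json.loads("""
{
  "dmp_id": "doi:10.9999/example-dmp",
  "project": {"funder": "Example Funder", "grant_id": "ABC-123"},
  "datasets": [
    {
      "title": "Survey responses",
      "license": "CC-BY-4.0",
      "repository": {"name": "Zenodo", "url": "https://zenodo.org"},
      "estimated_size_gb": 2
    },
    {
      "title": "Interview recordings",
      "license": "restricted",
      "repository": null,
      "estimated_size_gb": 40
    }
  ]
}
""")

# Because the plan is structured rather than free text, a machine can
# act on it -- for example, flag datasets with no deposit repository:
missing = [d["title"] for d in dmp_fragment["datasets"]
           if d["repository"] is None]
print(missing)  # ['Interview recordings']
```

The human-readable narrative still sits on top; the point is that the same information, held in structured form underneath, becomes queryable by institutions, funders, and infrastructure providers alike.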

A recent CERN/RDA workshop presented the perfect opportunity to consolidate our notes and ideas. In addition to the Roadmap project members, Daniel Mietchen (NIH) and Angus Whyte (DCC) participated in the exercise. We conducted a survey of previous work on the topic (we know we didn’t capture everything so please alert us to things we missed) and began outlining concrete use cases for machine-actionable DMPs, which we plan to develop further through community engagement over the coming months. Another crucial piece of our presentation was a call to make DMPs public, open, discoverable resources. We highlighted existing efforts to promote public DMPs (e.g., the DMPTool Public DMPs list, publishing exemplary DMPs in RIO Journal) but these are just a drop in the bucket compared to what we might be able to do if all DMPs were open by default.

You can review our slides here. And please send feedback—we want to know what you think!

Let the co-development begin!

Now for the second news item: our ducks are all in a row and work is underway on the shared Roadmap codebase.

We open with a wistful farewell to Marta Ribeiro, who is moving on to an exciting new gig at the Urban Big Data Centre. DCC has hired two new developers to join our ranks—Ray Carrick and Jimmy Angelakos—both from our sister team at EDINA. The finalized co-development team commenced weekly check-in calls and in the next week or two we’ll begin testing the draft co-development process by adding three features from the roadmap:

  1. Enhanced institutional branding
  2. Funder template export
  3. Linking an ORCID via OAuth

In the meantime, Brian is completing the migration to Rails 4.2 and both teams are getting our development environments in place. Our intention is to iterate on the process for a few sprints, iron out the kinks, and then use it and the roadmap as the touchstones for a monthly community developer check-in call. We hope this will provide a forum for sharing use cases and plans for future work (on all instances of the tool) in order to prioritize, coordinate, and alleviate duplication of effort.

The DCC interns have also been plugging away at their respective projects. Sam Rust just finished building some APIs for creating plans and extracting guidance, and is now starting work on the statistics use case. Damodar Sójka, meanwhile, is completing the internationalization project, drawing on work done by the Canadian DMP Assistant team. We’ll share more details about their work once we roll it back into the main codebase.

Next month the UC Berkeley Web Services team will evaluate the current version of DMPonline to flag any accessibility issues that need to be addressed in the new system. We’ve also been consulting with Rachael Hu on UX strategy. We’re keeping track of requests for the new system and invite you to submit feedback via GitHub issues.

Stay tuned to GitHub and our blog channels for more documentation and regular progress updates.

#IDCC16: Atomising data: Rethinking data use in the age of explicitome

Data re-use is an elixir for those involved in research data.

Make the data available, add rich metadata, and then users will download the spreadsheets, databases, and images. The archive will be visited, making librarians happy. Datasets will be cited, making researchers happy. Datasets may even be re-used by the private sector, making university deans even happier.

But it seems to me that data re-use, or at least the particular conceptualisation of re-use established in most data repositories, is not the definitive way of conceiving of data in the 21st century.

Two great examples from the International Digital Curation Conference illustrated this.

Barend Mons declared that the real scientific value in scholarly communication is not abstracts, articles or supplementary information. Rather, the data that sits behind these outputs is the real oil to be exploited, featuring millions of assertions about all kinds of biological entities.

Mons describes the sum of these assertions as the explicitome, which enables cross-fertilisation between distinct pieces of scientific work. With all experimental data made available in the explicitome, researchers taking an aerial view can suddenly see all kinds of new connections and patterns between entities cited in wholly different research projects.

The second example came from Eric Kansa’s talk on the Open Context framework for publishing archaeological data. Following the same principle as Barend Mons, Open Context breaks data down into individual items. Instead of downloading a whole spreadsheet relating to a single excavation, you can access individual bits of data. From an excavation, you can see the data related to a particular trench, and then the items discovered in that trench.

A screenshot from Open Context

In both cases, data re-use is promoted, but in an entirely different way to datasets being uploaded to an archive and then downloaded by a re-user.

In the model proposed by Mons and Kansa, data is atomised, and then published. Each individual item, or each individual assertion, gets its own identity. And that piece of data can then easily be linked to other relevant pieces of data.

This hugely increases the chance of data re-use; not whole datasets of course, but tiny fractions of datasets. An archaeologist examining remains of jars on French archaeological sites might not even think to look at a dataset from a Turkish excavation. But if the latter dataset is atomised in a way that allows it to identify the presence of jars as well, then suddenly that element of the Turkish dataset becomes useful.
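The mechanics of this are simple to sketch: once each observation carries its own identifier and is described with shared terms, a cross-dataset query falls out almost for free. The record structure and identifiers below are invented purely for illustration:

```python
# Two datasets from unrelated excavations, atomised so that every find
# is an individually identified record using a shared vocabulary.
# All identifiers and field names here are invented for illustration.
french_site = [
    {"id": "fr-2016-0001", "trench": "A1", "object_type": "jar"},
    {"id": "fr-2016-0002", "trench": "A1", "object_type": "coin"},
]
turkish_site = [
    {"id": "tr-2014-0917", "trench": "C3", "object_type": "jar"},
    {"id": "tr-2014-0918", "trench": "C3", "object_type": "bone"},
]

# A researcher interested in jars no longer needs to know which
# spreadsheets to download: a query over atomised records surfaces the
# relevant fragments of both datasets, however unrelated the projects.
jars = [rec["id"] for rec in french_site + turkish_site
        if rec["object_type"] == "jar"]
print(jars)  # ['fr-2016-0001', 'tr-2014-0917']
```

In a real linked-data setting the identifiers would be persistent URIs and `object_type` would come from a controlled vocabulary, but the principle is the same: the unit of discovery shrinks from the dataset to the individual record.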

This approach to data is the big challenge for those charged with archiving such data. Many data repositories, particularly institutional ones, store individual files but not individual pieces of data. How research data managers begin to cope with the explicitome – enabling it, nourishing and sustaining it – may well be a topic of interest for IDCC17.

#IDCC16: Strategies and tactics in changing behaviour around research data

The International Digital Curation Conference (IDCC) continues to be about change. That is, how do we change the eco-system so that managing data is an essential component of the research lifecycle? How can we free the rich data trapped in PDFs or lost to linkrot? How can we get researchers to data mine and not data whine?

While, for some, the pace of change is not quick enough, IDCC still demonstrates an impressive breadth of strategy and tactics to enable this change.

On the first day of the conference, Barend Mons set out the vision. The value of research is not in journals but in the underlying data – thousands and thousands of assertions about genes, bacteria, viruses, proteins, indeed any biological entity are locked in figures and tables. Release such data and the interconnections between related entities in different datasets reveal whole new patterns. How to make this happen? One part of the solution: all projects should allocate 5% of their budget to data stewardship.

Andrew Sallans of the Center for Open Science followed this up with their eponymous platform for managing open science, which links data to all kinds of cloud providers and (fingers crossed) institutions’ data repositories. In large-scale projects, sharing and versioning data can easily get out of control; the framework helps to manage this process more easily. They have some pretty nifty financial incentives to change practice too – $1000 awards for pre-registration of research plans.

Following this we saw many posters – tactics to alter behaviours of individuals and groups of researchers. There were some great ideas here, such as plans at the University of Toronto to develop packages of information for librarians on data requirements of different disciplines. 

Despite this, my principal concern was the huge gap between the massive sweep of the strategic visions and the tactics for implementing change. Many of the posters were valiant but were locked in an institutional setting – the libraries wrestling with how to influence faculty without the in-depth knowledge (or institutional clout) to make winning arguments within a particular area.

What still seems to be missing from IDCC is the disciplinary voice. How are particular subjects approaching research data? How can the existing community work more closely with them? There was one excellent presentation on building workflows for physicists studying gravitational waves, and others on results from OCLC’s work with social scientists and zoologists. But in most cases it was librarians doing the talking rather than a shared platform with the researchers. If we want that change to happen, there still needs to be greater engagement with the subjects that are creating the research data in the first place.

Where are they now? An RDM update from the University of Glasgow

A guest blog post by Mary Donaldson, Research Data Management Services Co-ordinator, University of Glasgow.

Over recent years, central support for research data management (RDM) at the University of Glasgow has been limited. The JISC-funded C4D project, which ran until September 2013, provided some basic support, which was augmented with expert advice from the Digital Curation Centre (DCC). We used our DCC institutional engagement to assist with the formulation of our draft Institutional Data Policy and our Engineering and Physical Sciences Research Council (EPSRC) Roadmap, and for help with RDM training. The DCC also helped run a Data Asset Framework (DAF) survey and associated follow-up interviews, which allowed us to assess current RDM practices in the University. Joy Davidson and Sarah Jones were invaluable for their support in the early development of RDM awareness at Glasgow.

Between the end of the initial formal engagement with the DCC and late 2014, work to develop and promote RDM at Glasgow was carried out on an ad hoc basis. In late 2014, the University of Glasgow began appointing an RDM team to run its institutional RDM service and research data registry and repository. The team currently comprises an RDM officer with responsibility for the technical side of operations and an RDM officer with responsibility for the coordination of the service. In June 2015, our team will be complete when a third member, an RDM officer with responsibility for staff training and support, joins us. The RDM team has been working systematically to develop the RDM service on several fronts:

Registry and Repository:

As part of the CERIF for Datasets (C4D) project, Glasgow set up a fledgling Research Data Registry (http://researchdata.gla.ac.uk) using EPrints repository software. The Registry uses a metadata specification developed during the C4D project, working with other EPrints sites to agree standard functionality. The Registry offers various functions to help researchers manage their research data, including the capability to mint Digital Object Identifiers (DOIs) for data.

Following the appointment of the Glasgow RDM team in late 2014, the Registry has been augmented with data archiving capability provided by Arkivum, using the Arkivum EPrints plugin to link the Registry seamlessly with off-site Arkivum storage.

Plans for the coming months include: linking research data with research publications and theses, building on work carried out by the University of London Computer Centre and the University of East London; linking research data with University staff profile pages; enhancing the Registry front-end with a responsive design so that researchers can use the Registry across multiple devices; and iteratively developing data ingest and curation procedures as more datasets come through the Registry and workflows are tested and revised.

Researcher Engagement:

With the looming 1st May 2015 deadline for the EPSRC expectations, the majority of our researcher engagement activities this year have been focussed on EPSRC-funded researchers.  We have been contacting EPSRC-funded researchers to offer face-to-face meetings with a member of the RDM team to clarify funder expectations and to explain how the RDM service can help.  Through these meetings, we have identified a few examples of really good practice and have cultivated these researchers as ‘data management champions’ who are willing to speak at RDM engagement events about how they go about RDM activities. We also take opportunities to go and speak at researcher gatherings to raise the profile of RDM within the University’s research community. In addition to our proactive work with EPSRC-funded researchers, our services are also available to all other research staff in the University. In recent weeks, with the release of the ESRC Data Policy, we have been looking at ways in which we can engage with ESRC-funded researchers at Glasgow. We anticipate that as compliance with the EPSRC and ESRC requirements becomes part of the normal research workflow, we’ll turn our attention to other RCUK-funded researchers. We are also working with the Open Access Service to coordinate our service offerings and to reduce the number of emails being received by research staff.

Training Offering:

Recently we have been working on extending our researcher training offering to make sure we cover all aspects of RDM and the data lifecycle, and make this training available to researchers at all stages of their careers.

Through the Staff Development Service, we will be increasing access opportunities to the existing workshop, ‘Managing Research Data’, and we will also be offering a new workshop, ‘Data Management Planning’. We will also be contributing appropriate material to several other workshops run by the Staff Development Service.

Through the Graduate Schools, we will be offering workshops on Research Data Management for Postgraduate researchers. We will also be contributing appropriate material to other training courses offered by the Graduate Schools.

In addition we will be delivering training to staff and student groups within the University such as the Early Career Researcher Fellowship Application Mentoring Group.

Service coordination:

We are continuing to work with other services at Glasgow to ensure that consideration is given to RDM at the appropriate times in a research project lifecycle.  With the contracts team we have agreed wording for collaboration agreements that makes provision for data sharing at the end of the project. We are also working with the University Ethics Committee to inform them of RDM considerations that they might need to take into account when considering applications, and with the Research Support Office to get researchers to complete a Data Management Plan for their project and to cost for RDM when making funding bids.

We have also made two successful proposals to the Research Strategy and Planning Committee:

  1. To put responsibility for the quality assurance of our data curation processes within the remit of our Vice Principal for Research and Enterprise.
  2. To strongly encourage all researchers within the University of Glasgow to prepare data management plans for their projects, regardless of whether this is required by their funders as part of the application process.

Future aspirations:

  • To ensure all researchers have access to support that facilitates compliance with funder requirements and good data management practice.
  • To extend the University of Glasgow-specific guidance in DMPonline.
  • To fully embed data management and planning into the normal workflow of researchers at Glasgow.
  • To update our training offering and resources with examples of best RDM practice from within our own research community.