Common standards and PIDs for machine-actionable DMPs

Image: QR code cupcakes. From Flickr by Amber Case, CC BY-NC 2.0.

Picking up where we left off from “Machine-actionable DMPs: What can we automate?”… Let’s unpack a couple of topics central to our machine-actionable DMP prototyping and automating efforts. These are the top rallying themes from all conversations, workshops, and working groups we’ve been privy to in the past few years. In addition, they feature in the “10 principles for machine-actionable DMPs” (principles 4 and 5):

  • DMP common standards
  • Persistent identifiers (PIDs)

DMP common standards
There’s community consensus about the need to first establish common standards for DMPs in order to enable anything else (Simms et al. 2017). Interoperability and delivery of DMP information across systems require a common data model; without one we can’t alleviate administrative burdens, improve the quality of information, or reap other benefits.

To address this requirement, the DMP Common Standards working group was launched at the 9th RDA plenary meeting in Barcelona. They’re making excellent progress and are on track to deliver a set of recommendations in 2019, which we intend to incorporate into our existing tools and emerging prototypes. Adoption of the common data model will enable tools and systems (e.g., CRIS, repositories, funder systems) involved in processing research data to read and write information to/from DMPs. The working group deliverables will be publicly available under a CC0 license and will consist of models, software, and documentation. For a summary of their scope and activities to date see Miksa et al. 2018.

A second round of consultation is currently underway to tease out more details and gather additional requirements about which DMP information each stakeholder group needs, and when. This international, multi-stakeholder working group is open to all; check out their session at the next RDA plenary in Botswana and contribute to the DMP common data model (6 Nov; remote participation is available).

Current/traditional DMPs - model questionnaires

<administrative data>
    <question>Who will be the Principal Investigator?</question>
    <answer>The PI will be John Smith from our university.</answer>
</administrative data>

Machine-actionable DMPs - model information

"dc:creator":[ {
    "foaf:name":"John Smith",
} ],

Caption: An example of data models for traditional DMPs (upper part) and machine-actionable DMPs (lower part). (Miksa et al. 2018: Fig. 1)
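To make the contrast in the figure concrete, here is a minimal Python sketch of what each model asks of a consuming tool. It is an illustration only: the function names are hypothetical, the record is the figure's fragment wrapped in braces so that it parses as JSON, and the free-text heuristic is deliberately naive.

```python
import json
import re
from typing import List, Optional

# Traditional DMP: the answer is free text, so a tool has to guess at the name.
questionnaire_answer = "The PI will be John Smith from our university."

# Machine-actionable DMP: the same fact as structured data. This is the figure's
# fragment, wrapped in braces here (an assumption) so it is a valid JSON object.
madmp_record = '{"dc:creator": [{"foaf:name": "John Smith"}]}'


def creator_names(record_json: str) -> List[str]:
    """Return every creator name found in a machine-actionable DMP record."""
    record = json.loads(record_json)
    return [c["foaf:name"] for c in record.get("dc:creator", []) if "foaf:name" in c]


def guess_name_from_text(answer: str) -> Optional[str]:
    """Brittle heuristic for free text: the first pair of capitalized words."""
    match = re.search(r"\b([A-Z][a-z]+ [A-Z][a-z]+)\b", answer)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(creator_names(madmp_record))
    print(guess_name_from_text(questionnaire_answer))
    # Rephrasing the answer is enough to break the heuristic.
    print(guess_name_from_text("The Principal Investigator is John Smith."))
```

The structured lookup keeps working however the plan is worded, while the text heuristic returns "The Principal" for the rephrased answer. That gap is what a common data model is meant to close.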

PIDs and DMPs
The story of PIDs in DMPs, or at least my involvement in the discussion, began with a lot of hand waving and musical puns at PIDapalooza 2016 (slides). After a positive reception and many deep follow-on conversations (unexpected yet gratifying to discover a new nerd community), things evolved into what is now a serious exploration of how to leverage PIDs for and in DMPs. The promise of PIDs to identify and connect research-relevant entities is tremendous and we’re fortunate to ride the coattails of some smart people who are making significant strides in this arena.

For our own PID-DMP R&D we’re partnering with one of the usual PID suspects, DataCite, to draw from their expertise and technical capabilities. DataCite contributed to the timely publication of the European Commission-funded FREYA report, which provides the necessary background research and a straightforward starting point. There’s also an established RDA PID interest group that we plan to engage with more as things progress.

A primary goal of FREYA is the creation and expansion of the “PID Graph.” The PID Graph “connects and integrates PID systems, creating relationships across a network of PIDs and serving as a basis for new services.” The report summarizes the current state of PID services as well as some emerging initiatives that we hope to harness (each is classified as mature, emerging, or immature):

  • ORCID iDs for researchers (mature)
  • DOIs for publications and data (mature), and software (emerging; also see SWH IDs)
  • Research OrgIDs for organizations (aka ROR; emerging and CDL is participating so we have an intimate view)
  • Global grant IDs (emerging and very exciting to track the prototyping efforts of Wellcome, NIH, and MRC!)
  • Data repository IDs (immature but on the radar as we address DMPs)
  • Project IDs/RAiDs (emerging and we see a lot of overlap with DMPs)

It also describes a vast array of PIDs for other things (Table 1: RRIDs, protocols, research facilities, field stations, physical samples, cultural artifacts, conferences, etc.), all of which are potentially useful for maDMPs as we reconfigure them as an inventory of linked research outputs. Taken together, these efforts are aimed at extending the universe of things that can be identified with PIDs and expanding what can be done with them. This, in turn, supports automation and machine-actionability to achieve better research data management and promote open science.
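As a rough illustration of a DMP as an inventory of linked research outputs, the Python sketch below stores PID-to-PID relationships as edges and walks them from a DMP. Everything in it is an assumption made for the example: the identifiers are made-up placeholders, the relation names are invented, and it says nothing about how FREYA's PID Graph or our own graph database is actually modeled.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source PID, relation, target PID)

# Placeholder identifiers and hypothetical relation names, for illustration only.
EDGES: List[Edge] = [
    ("dmp:example-1", "has_contributor", "orcid:example-researcher"),
    ("dmp:example-1", "funded_by", "grant:example-grant"),
    ("dmp:example-1", "describes", "doi:example-dataset"),
    ("orcid:example-researcher", "affiliated_with", "org:example-university"),
    ("doi:example-dataset", "deposited_in", "repo:example-repository"),
]


def build_graph(edges: Iterable[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Index edges by source PID so each node lists its outgoing links."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph


def reachable(graph: Dict[str, List[Tuple[str, str]]], start: str) -> Set[str]:
    """Breadth-first walk returning every PID reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)  # report only the linked entities, not the DMP itself
    return seen


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for pid in sorted(reachable(graph, "dmp:example-1")):
        print(pid)
```

Starting from the DMP, the walk reaches the researcher, grant, and dataset directly, and the organization and repository one hop further out. That is the kind of question (who and what is connected to this plan?) that a graph of PIDs makes cheap to answer.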

Summing up
For now we’ll continue exploring our graph database and interviewing stakeholders who contributed seed data to dive deeper into their workflows, challenges, and use cases for maDMPs. This runs parallel to the activities of the RDA DMP Common Standards WG and various emerging PID initiatives. Based on this overlapping community research, we can move forward with outlining what to implement and test. The recommendations of the RDA group for DMP common standards are a given, and below is a high-level plan for PID prototyping:

PIDs for DMPs and PIDs in DMPs:

  • DOIs for DMPs: define metadata
  • PIDs in DMPs: What can we achieve by leveraging mature PID services? How do we make the information flow between stakeholders and systems?
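One small thing a mature PID service already makes possible is checking an identifier locally before a DMP tool stores or shares it. The sketch below validates an ORCID iD using the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers. The function names are hypothetical, and a passing check only means the iD is well formed, not that it is registered to anyone.

```python
import re

# Four hyphen-separated groups of four; the final character may be a digit or X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if `orcid` is well formed and its check digit matches."""
    bare = orcid.strip()
    for prefix in ("https://orcid.org/", "http://orcid.org/"):
        if bare.startswith(prefix):
            bare = bare[len(prefix):]
    if not ORCID_PATTERN.match(bare):
        return False
    digits = bare.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    print(is_valid_orcid("https://orcid.org/0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```

A check like this catches typos at entry time. Confirming that the iD belongs to the named researcher would still require a lookup against the PID service itself.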

Stay tuned as the story develops here on the blog! I’ll also be presenting on maDMPs in a data repositories session convened by our BCO-DMO partners at the upcoming American Geophysical Union meeting in DC (program here, 11 Dec). And Daniel Mietchen will be at PIDapalooza 2019 (Dublin, 23-24 Jan) promoting a highly relevant initiative: PIDs for FAIR ethics review processes.