This week saw the first RDA UK workshop hosted by Jisc in Birmingham. The Research Data Alliance is a community-driven organisation aiming to build the social and technical infrastructure to enable open sharing of data. Members come together throu…
In keeping with our monthly updates about the merged Roadmap platform, here’s the short and the long of what we’ve been up to lately, courtesy of Stephanie Simms of the DMPTool:
- Co-development on Roadmap codebase (current sprint)
- Adding documentation to DMPRoadmap GitHub wiki
- Machine-actionable DMPs
- Public DMPs: RIO Journal Collection and curating the DMPTool Public DMPs list
This month our main focus has been on getting into a steady 2-week sprint groove that you can track on our GitHub Projects board. DCC/DMPonline is keen to migrate to the new codebase so in preparation we’re revising the database schema and optimizing the code. This clean-up work not only makes things easier for our core development team, but will facilitate community development efforts down the line. It also addresses some scalability issues that we encountered during a week of heavy use on the hosted instance of the Finnish DMPTuuli (thanks for the lessons learned, Finland!). We’ve also been evaluating dependencies and fixing all the bugs introduced by the recent Rails and Bootstrap migrations.
Once things are in good working order, DMPonline will complete their migration and we’ll shift focus to adding new features from the MVP roadmap. DMPTool won’t migrate to the new system until we’ve added everything on the list and conducted testing with our institutional partners from the steering committee. The CDL UX team is also helping us redesign some things, with particular attention to internationalization and improving accessibility for users with disabilities.
The rest of our activities revolve around gathering requirements and refining some use cases for machine-actionable DMPs. This runs the gamut from big-picture brainstorming to targeted work on features that we’ll implement in the new platform. The first step to achieving the latter involves a collaboration with Substance.io to implement a new text editor (Substance Forms). The new editor offers increased functionality and a framework for future work on machine-actionability, and delivers a better user experience throughout the platform. In addition, we’re refining the DMPonline themes (details here)—we’re still collecting feedback and are grateful to all those who have weighed in so far. Sarah and I will consolidate community input and share the new set of themes during the first meeting of a DDI working group to create a DMP vocabulary. We plan to coordinate our work on the themes with this parallel effort—more details as things get moving on that front in November.
Future brainstorming events include PIDapalooza—come to Iceland and share your ideas about persistent identifiers in DMPs!—and the International Digital Curation Conference (IDCC) 2017 for which registration is now open. We’ll present a Roadmap update at IDCC along with a demo of the new system. In addition, we’re hosting an interactive workshop for developers et al. to help us envision (and plan for) a perfect DMP world with tools and services that support FAIR, machine-actionable DMPs (more details forthcoming).
Two final, related bits of info: 1) we’re still seeking funding to speed up progress toward building machine-actionable DMP infrastructure; we weren’t successful with our Open Science Prize application but are hoping for better news on an IMLS preliminary proposal (both available here). 2) We’re also continuing to promote greater openness with DMPs; one approach involves expanding the RIO Journal Collection of exemplary plans. Check out the latest plan from Ethan White that also lives on GitHub and send us your thoughts on DMP workflows, publishing and sharing DMPs.
When the DCC revised DMPonline in 2013, we introduced the concept of themes to the tool. The themes represent the most common topics addressed in Data Management Plans (DMPs) and work like tags to associate questions and guidance. Questions within DMP …
Recent activity on the Roadmap project encompasses two major themes: 1) machine-actionable data management plans and 2) kicking off co-development of the shared codebase.
Image credit: ‘Get Your Ducks in a Row’ CC-BY-SA by Cliff Johnson
The first of these has been a hot topic of conversation among stakeholders in the data management game for some time now, although most use the phrase “machine-readable DMPs.” So what do we mean by machine-actionable DMPs? Per the Data Documentation Initiative definition, “this term refers to information that is structured in a consistent way so that machines can be programmed against the structure.” The goal of machine-actionable DMPs, then, is to better facilitate good data management and reuse practices (think FAIR: Findable, Accessible, Interoperable, Reusable) by enabling:
- Institutions to manage their data
- Funders to mine the DMPs they receive
- Infrastructure providers to plan their resources
- Researchers to discover data
This term is consistent with the Research Data Alliance Active DMPs Interest Group and the FORCE11 FAIR DMPs group mission statements, and it seems to capture what we’re all thinking: i.e., we want to move beyond static text files to a dynamic inventory of digital research methods, protocols, environments, software, articles, data… One reason for the DMPonline-DMPTool merger is to develop a core infrastructure for implementing use cases that make this possible. We still need a human-readable document with a narrative, but underneath the DMP could have more thematic richness with value for all stakeholders.
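To make the idea concrete, here is a minimal sketch of what “programmed against the structure” could look like once a DMP is structured data rather than a static text file. The field names and values are entirely hypothetical illustrations, not a proposed schema:

```python
# Hypothetical example: a DMP fragment as structured data. Once a plan is
# machine-actionable, each stakeholder use case above becomes a simple query.
# Field names here are illustrative only, not an actual DMP standard.
dmp = {
    "project": "Example Project",
    "datasets": [
        {"title": "Survey responses", "format": "CSV",
         "repository": "institutional", "license": "CC-BY", "size_gb": 2},
        {"title": "Interview audio", "format": "WAV",
         "repository": "restricted", "license": None, "size_gb": 40},
    ],
}

def storage_needed(plan):
    """An infrastructure provider could mine DMPs to plan storage resources."""
    return sum(d["size_gb"] for d in plan["datasets"])

def openly_licensed(plan):
    """A funder or researcher could filter for openly licensed, reusable data."""
    return [d["title"] for d in plan["datasets"] if d["license"]]

print(storage_needed(dmp))
print(openly_licensed(dmp))
```

None of these queries is possible against a narrative PDF; the human-readable document would sit on top of structure like this.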
A recent CERN/RDA workshop presented the perfect opportunity to consolidate our notes and ideas. In addition to the Roadmap project members, Daniel Mietchen (NIH) and Angus Whyte (DCC) participated in the exercise. We conducted a survey of previous work on the topic (we know we didn’t capture everything so please alert us to things we missed) and began outlining concrete use cases for machine-actionable DMPs, which we plan to develop further through community engagement over the coming months. Another crucial piece of our presentation was a call to make DMPs public, open, discoverable resources. We highlighted existing efforts to promote public DMPs (e.g., the DMPTool Public DMPs list, publishing exemplary DMPs in RIO Journal) but these are just a drop in the bucket compared to what we might be able to do if all DMPs were open by default.
You can review our slides here. And please send feedback—we want to know what you think!
Let the co-development begin!
Now for the second news item: our ducks are all in a row and work is underway on the shared Roadmap codebase.
We open with a wistful farewell to Marta Ribeiro, who is moving on to an exciting new gig at the Urban Big Data Centre. DCC has hired two new developers to join our ranks—Ray Carrick and Jimmy Angelakos—both from its sister team at EDINA. The finalized co-development team has commenced weekly check-in calls, and in the next week or two we’ll begin testing the draft co-development process by adding three features from the roadmap:
- Enhanced institutional branding
- Funder template export
- OAuth link for ORCID
In the meantime, Brian is completing the migration to Rails 4.2 and both teams are getting our development environments in place. Our intention is to iterate on the process for a few sprints, iron out the kinks, and then use it and the roadmap as the touchstones for a monthly community developer check-in call. We hope this will provide a forum for sharing use cases and plans for future work (on all instances of the tool) in order to prioritize, coordinate, and alleviate duplication of effort.
The DCC interns have also been plugging away at their respective projects. Sam Rust just finished building some APIs for creating plans and extracting guidance, and is now starting work on the statistics use case. Damodar Sójka meanwhile is completing the internationalization project, drawing from work done by the Canadian DMP Assistant team. We’ll share more details about their work once we roll it back into the main codebase.
Next month the UC Berkeley Web Services team will evaluate the current version of DMPonline to flag any accessibility issues that need to be addressed in the new system. We’ve also been consulting with Rachael Hu on UX strategy. We’re keeping track of requests for the new system and invite you to submit feedback via GitHub issues.
Stay tuned to GitHub and our blog channels for more documentation and regular progress updates.
This week we hosted the DMPTool team to flesh out our plans for ‘roadmap’ – the joint codebase we’re building together based on DMPonline and DMPTool. The key focus was reviewing and prioritising tasks for an initial release. …
Our collaboration with the DMPTool team continues. Marta was in Oakland at the end of May and we’re preparing to host the US team in Glasgow next week. We’ve been experiencing Californian weather for the past few weeks – hope it lasts long enough so they experience Scotland at its best.
Below is an update from Stephanie on Marta’s visit. We’ll post more news soon on the UK side of the trip.
Roadmap team-building exercises: US edition – reposted from the DMPTool blog
Last week we hosted Marta Ribeiro, the lead developer for DMPonline, for an intense, donut-fueled planning meeting to define our co-development process and consolidate our joint roadmap. The following is a debriefing on what we accomplished and what we identified as our next steps.
The project team is established, with Brian Riley joining as the DMPTool technical lead. Marta is busy completing the migration of DMPonline to Rails 4.2 to deposit the code into our new GitHub repository: DMPRoadmap. There’s nothing to see just yet—we’re in the midst of populating it with documentation about our process, roadmap, issues, etc. As soon as everything is in place, we’ll send word so that anyone who’s interested can track our progress. This will also allow us to begin sussing out how to incorporate external development efforts to benefit the larger DMP community. In addition, Marta is mentoring a pair of summer interns who are undertaking the internationalization work and building APIs. Meanwhile, Brian will finish building the servers for the Roadmap development and staging environments on AWS with another new member of the UC3 team: Jim Vanderveen (DevOps/Developer). Additional core team members include Stephanie Simms and Sarah Jones as Service/Project Managers, Marisa Strong as the Technical Manager, and the CDL UX team (many thanks to our UX Design Manager, Rachael Hu, for spending so much time with us!). UC3 and DCC will also rely on their existing user groups for testing and feedback on both sides of the pond.
Other groundwork activities include a web accessibility evaluation for DMPonline, to ensure that the new system is accessible to users with disabilities, and exploring what we (and others) mean when we talk about “machine-readable DMPs.” Stephanie just received an RDA/US Data Share Fellowship to develop use cases for making DMPs machine readable, in consultation with the Active DMPs Interest Group and the research community at large. In line with this effort, she’ll be participating in an interdisciplinary and international workshop on active DMPs next month, co-hosted by CERN and the RDA group. We’re actively seeking and summarizing thoughts on the topic so please send us your ideas!
We conclude this edition with a draft of our project roadmap (below); it lists all of the features that we’ll be adding to the DMPonline codebase before we release the new platform. Most of the features already exist in the DMPTool and were slated for future enhancements to DMPonline. Stay tuned for our next update following a UC3 exchange visit to Glasgow/Edinburgh in mid June to prioritize the roadmap and commence co-development work.
- Migration to Rails v.4.2
- Bring DMP Assistant’s internationalization upstream for multi-lingual support
- Adding the concept of locales so specific organizations, funders, and templates can be defined and filtered out for certain users/contexts
- Shibboleth support through eduGAIN
- OAuth link for ORCID
- APIs to create plans, extract guidance, and generate usage statistics
- More robust institutional branding
- A lifecycle to indicate the status of plans and allow institutional access to plans
- Support for reviewing plans
- Public sharing option > Public DMPs library
- Flag test plans (to exclude them from usage stats)
- Email notification system
- Admin controls for assigning admin rights to others
- Export template with guidance
- Copy template option for creating new templates
- Copy plan option for creating new plans
- Toggle switch for navigating between Plan area and Admin area
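As an illustration of the locales idea in the list above, filtering templates by locale might look something like the sketch below. The data model and names are hypothetical, invented for this post rather than taken from the DMPRoadmap codebase:

```python
# Hypothetical sketch of locale-based filtering: each funder template is
# tagged with the locales it applies to, so users only see templates
# relevant to their organization or country. Not the real DMPRoadmap schema.
templates = [
    {"name": "NSF Generic", "locales": ["en-US"]},
    {"name": "EPSRC", "locales": ["en-GB"]},
    {"name": "Horizon 2020", "locales": ["en-GB", "en-US", "fi-FI"]},
]

def templates_for(locale, all_templates):
    """Return the template names visible to a user in the given locale."""
    return [t["name"] for t in all_templates if locale in t["locales"]]

print(templates_for("en-GB", templates))
```

The same tagging approach would let a hosted instance such as DMPTuuli surface only the templates its users can actually apply for.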
The Jisc Research Data Spring workshop at the Warwick Conference Centre in Coventry had some welcome moments of blue sky before the mid-December dull grey set in. These included a breakout session from one of the projects: Collaboration for Research Enhancement by Active Metadata (CREAM). Their breakout session explored the active use of metadata in the arts and sciences, a theme the project members have been exploring for some time.
The workshop was titled ‘Observations on Commonalities of Process’, and was led by Iris Garrelfs and Graham Klyne’s two-handed presentation on key parallels between the arts and sciences. Iris Garrelfs spoke as a PhD student and artist who works “on the cusp of music, art and sociology”. Graham Klyne of Nine by Nine spoke as a former University of Oxford bioinformatician and contributor to many semantic web standards.
This seemed unaccustomedly philosophical territory for a Jisc programme workshop, in my experience anyway. And, despite any seasonal temptation, nobody made any rubbish puns about C. P. Snow, or made too much of his big theme: the rift between the ‘two cultures’ of the sciences and humanities, which the chemist-turned-novelist famously wrote about and which is still with us today. Much of the research-driven impetus behind Research Data Management has come from the STEM disciplines. Perhaps understandably, given the impact of the EPSRC’s data policy on UK institutions, this has antagonised many humanities researchers, who would rather deal with policy directions couched in their own terms. So I guess if Snow were still around he would have approved of this session.
Rather than becoming bogged down in the differences of terminology and epistemology, the session brought fresh thinking on common methods and tools for dealing with arts and humanities metadata. The four discussion themes were:
- planning and agility
- workflow and lifecycle
Each theme was introduced by Iris and Graham, based on the project’s effort to develop a model for ‘active metadata’. They included reflections on the research processes followed by artists at University of the Arts London, and by chemists and geoscientists in Southampton and Edinburgh. So there were many contributions from collaborators Athanasios Velios, Simon Coles, and others outside the project.
1. Planning and agility
Some of the fresh thinking mentioned earlier is in the shape of Iris Garrelfs’s Procedural Blending model. This is an abstract framework for describing creative processes, set out in her PhD thesis, and based on her work in sound art.
If I picked up Iris’s quick introduction to the model correctly, the gist of it is that creative processes do not follow a stepwise linear path from input to output, but blend parallel strands of action (or ways of framing a problem) that become joined together at key points in the research process. The question is, how can this be recorded in useful ways?
Provenance metadata is part of the answer for CREAM. Reflecting on his involvement in the W3C PROV collaboration, Graham Klyne’s take on this standard for provenance metadata was that it offers a very useful structure for encoding process, but it is not forthcoming about describing its less mechanical aspects. The Procedural Blending model, he said, has offered a fascinating counterbalance to PROV. It may offer the provenance standards a broader framework for these less tangible aspects of data management. Of course provenance is a retrospective record of action, and research planning and workflow design are prospective. Addressing the tacit and intangible seems key to working out how to apply the provenance metadata emerging from a project as a resource for planning-in-action.
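For readers unfamiliar with PROV, its core is a small vocabulary of entities (things like datasets or recordings), activities (process steps), and relations between them (wasGeneratedBy, used, wasDerivedFrom, and so on). A toy sketch of that structure, using plain Python tuples rather than any of the real PROV serializations, and with entirely made-up entity names:

```python
# Toy provenance record in the spirit of W3C PROV: subject-predicate-object
# statements linking entities (data) to activities (process steps).
# A simplified illustration, not the PROV-O/PROV-N serialization.
statements = [
    ("raw_recording", "wasGeneratedBy", "field_session"),
    ("edited_track", "wasDerivedFrom", "raw_recording"),
    ("edited_track", "wasGeneratedBy", "editing_session"),
    ("editing_session", "used", "raw_recording"),
]

def lineage(entity, stmts):
    """Walk wasDerivedFrom links to recover an entity's ancestry."""
    chain = [entity]
    while True:
        parents = [o for s, p, o in stmts
                   if s == chain[-1] and p == "wasDerivedFrom"]
        if not parents:
            return chain
        chain.append(parents[0])

print(lineage("edited_track", statements))
```

The record is mechanical, which is exactly Klyne’s point: it captures what was derived from what, but not the improvisation or re-framing that led a researcher (or sound artist) to take that path.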
At first arts-science differences were most evident when the project began working out how to take that forward, but then parallels became clear. These include several aspects of the trade-off between planning and agility in research.
- Amendments and changes in process. The project has considered research around chemical reactions, responses to planning that research, and the role of improvisation. At one extreme improvisation can be thought of as ‘developing a plan in the moment’. At the other it can refer to points where a researcher is responding to observations and adapting (say) a spreadsheet to record experiment outcomes.
- Re-framing. Iris pointed out that artists are used to taking conceptual and physical objects and turning them on their head to look at them from a different perspective, whether literally or metaphorically. Science aims to nail down processes in a more definitive and reproducible way. But as Graham and others commented, the way that science research is reported suggests design that is more planned than it is in practice. So CREAM has become focused on the messy aspects of research design; not just when the milk gets spilled, as it were, but acting on the smell of it; those points when arbitrary choices are made, or the data that researchers are faced with suggest a new line of investigation. Here it is detailed knowledge of background that makes for the ability to make decisions about what line to take.
These points resonated with challenges to reproducibility that the RDM community is trying to address. Simon Coles mentioned for example that a minority, perhaps 20%, of chemical syntheses are reproducible, because of tacit knowledge and arbitrary decisions that do not get recorded.
Neil Jefferies made a further connection with the under-reporting of negative results, and the idea that capturing this vast body of knowledge of ‘what didn’t work’ could save time by identifying what won’t work in future. Simon Coles pointed out that accounting for negative results that don’t go according to plan is a very different thing from accounting for agility in planning. And Southampton ex-colleague Matthew Addis pointed out that the desire for this level of accounting varies by discipline, but isn’t restricted to academia. Chemists try to record everything, and so does the pharma industry.
The Southampton University Chemistry groups’ work with electronic lab notebooks (ELN) has taught them a thing or two about what people actually record and do with paper and digital notebooks. The greater sense of ownership researchers have of paper notebooks affects their willingness to make the switch. ELN take-up is difficult where there are specific research values around data ownership, and they are investigating ways to encourage submission of ELNs.
Humanities scholars tend to deal with the subjectivity of research decision-making differently to scientists. Iris spoke about artists’ and scholars’ concern to understand motivations and influences. She pointed out that history and archaeology are concerned with similar problems to scientific reproducibility – doing forensic studies of ‘how people got there’. Generalising across the humanities, the view tends to be much more that provenance is debatable, while for scientists it’s a record of the path of their research that does not need or deserve debate.
These differences can be productive though; some present commented on the value of capturing motivations in science. There is also value in drawing out the role of decisions that can’t be planned for e.g. apparently arbitrary decisions about what is looked at, or selected for analysis. Conventional recording processes rarely allow for this in the sciences, and appraisal processes emphasise deliberate and rational choice.
Having light shed on arbitrary decision-making from models of the creative process may help to incorporate into the scientific record metadata on how ideas are made, and how spontaneity is dealt with. As Simon Coles pointed out, research depends on the ability to deviate from a plan on the fly. And as Neil Jefferies remarked, collaborations often depend on people knowing that they share a similar feeling about the problem at hand, one that comes from their aggregated history. Exposing the aspects that influence how decisions are made could help with reproducibility, but they are challenging to record.
Workflow and lifecycle
If you have tried to apply research data lifecycle models in practice and thought ‘ok that’s fine for the fly-by overview, but life’s not like that’, you will probably appreciate the problem CREAM is trying to tackle. One of the marked similarities the project found was between the procedural blending model and models of the research lifecycle more common in the sciences.
There are well-known barriers to the practicality of documenting more fine-grained and realistic metadata; the prime one being justifying the expense of doing it. But there are nuances to the cost-benefit trade-offs. Obviously automating the metadata gathering helps, but only if the metadata is more meaningful and useful than that which is handcrafted but based on fallible memory and hindsight-based rationalisations about what happened. This is where the CREAM collaborators believe workflow models that allow for provenance metadata to be applied prospectively may help. So far, they said they had been pleasantly surprised at how much the fluidity of the artistic view could shed light on scientific process and, from that artistic perspective, at the potential of scientific workflow techniques for recording process.
Ownership and attribution issues were the key ones highlighted at this point in the discussion. Copyright and plagiarism concerns drive a reluctance to record research processes. For some present that pointed to the need to enable a hierarchy of access to data. Workflows for research data sharing must allow for much of the data to be kept to known collaborators for much of the time. The RDM community’s general invocation to be as open as possible, as quickly as feasible, can drown out that message.
Fiona Murphy, who coupled her humanities background with her experience in scientific publishing, highlighted an important question – how important is it who actually makes the observations that create data? From a reproducibility perspective these observations should, in principle at least, be independent of the observer making them, but how is that actually viewed in practice? Some of the scientists present were happy to acknowledge that some people are better than others at making observations. Had there been more humanists or sociologists of science in the room this might have sparked further debate about epistemology, or about how researchers’ biographies and social networks actually affect what research gets done.
Other participants reiterated the earlier point that science reporting also tends to play down the creative elements of the research process. And from her arts background Iris Garrelfs mentioned that the convention of working within a genre, and following its rules of provenance (among other things), has similarities with the call for reproducibility in science.
Two main points wrapped up this session; firstly that scientific and humanistic datasets can be used by researchers on the other side of the divide for purposes neither imagined, so it makes sense to have common data management frameworks. The other was to encourage researchers, and others involved in the RDM field, to go beyond a mandate-fulfilling view of reproducibility. Records of process aren’t just useful for re-treading your own path, they can be a resource for doing things outside your own field.
From my own point of view I liked this workshop a lot, and was pleased to see there’s a similar one planned for IDCC. Many of the themes will be familiar to provenance researchers and also touch on the sociology of science. I was also reminded of Arthur Koestler’s ‘bisociation’ theory of creative thinking. I used that in my very first published journal article [1989, lost to digital rot, but papyrus still available!] so it had plenty of personal resonances.
CREAM are pursuing a novel approach, and more recent parallels struck me. On the sociological side of things, there is work by the Information Systems group at the LSE on ‘Collective agility, paradox and organizational improvisation’, based on a study of particle physics research processes in the GridPP collaboration. More current still, the Research Data Alliance has several groups addressing the ‘planning and agility’ theme: the interest group on Active Data Management Plans, plus another on ‘De-constructing the Data Lifecycle: Agile Curation’.
CREAM is part of a flurry of tech development aiming for better record-making tools in research. The hope is they’ll offer metadata that’s actually useful for research before it’s done, as well as more accurate about how it’s done, all with less effort and higher usability than the traditional lab record or artist’s notepad. The results remain to be seen.
Photo credit: ‘just spilled milk’ by Post Memes CC-BY-2.0
More on the Jisc Research Data Spring projects is available at: https://www.jisc.ac.uk/rd/projects/research-data-spring
For example: Cerys Willoughby, Colin Bird, Jeremy Frey (2015) User-Defined Metadata: Using Cues and Changing Perspectives, International Journal of Digital Curation, 10(1), pp. 18-47. doi:10.2218/ijdc.v10i1.343
Metadata in Action workshop, IDCC16, Amsterdam, 25 Feb 2016 http://www.dcc.ac.uk/events/idcc16/workshops#Workshop%2010
Arthur Koestler (1964) The Act of Creation, London: Penguin Books
Yingqin Zheng, Will Venters, Tony Cornford (2011) ‘Collective agility, paradox and organizational improvisation: the development of a particle physics grid’, Information Systems Journal. doi:10.1111/j.1365-2575.2010.00360.x. Pre-print available at: http://eprints.lse.ac.uk/30029/1/Collective_agility_%28LSERO%29.pdf
Research Data Alliance, ‘Active Data Management Plans’ IG, see: https://rd-alliance.org/groups/active-data-management-plans.html
Research Data Alliance, ‘Deconstructing the Data Life Cycle: Agile Curation’ Birds of a Feather Group, see: https://rd-alliance.org/groups/deconstructing-data-life-cycle-agile-curation.html
This breakout group was a discussion on the challenges of integrating systems for research data management. It was chaired by Rory McNicholl.
The group was asked to give examples of systems that could be integrated with research data infrastructure.
This post is about the first part of a three-part briefing on the responses to our survey carried out in May 2015. The briefing is available on our survey page. A basic report giving analysis of these responses was sent to the survey participants at th…