Preserving digital stuff for the future is a weighty responsibility. With digital photos, for instance, would it be possible someday to generate perfectly sharp high-density, high-resolution photos from blurry or low-resolution digital originals? Probably not but who knows? The technological future is unpredictable. The possibility invites the question: shouldn’t we save our digital photos at […]
The European Commission recently held a public consultation on open research data. The President of DataCite, Adam Farquhar, had the opportunity to speak and highlighted the importance of identifying and citing research data. There were numerous DataCite members and data centres present.
Adam introduced the organization and highlighted how our data centres have assigned over 1.7 million DOIs, of which over 270 thousand were assigned this year.
Just a few years ago, Adam explained, identification was dominated by local, national, or disciplinary initiatives. It has now matured substantially with the growth of international cross-disciplinary organizations such as DataCite. In other areas, we are also seeing researcher identifiers in ORCID and article identifiers from CrossRef.
There is widespread consensus that identification and citation-level metadata are essential to making data accessible and re-usable, and to establishing incentive systems that encourage data sharing.
He also explained some lessons learned by DataCite over the last few years:
- Data identification requires interoperable APIs and metadata. We’ve worked with CrossRef to enhance the DOI APIs to support content negotiation for better machine-to-machine interactions. The DataCite Metadata Schema provides cross-disciplinary citation-level metadata for research data. It has been adopted by others, such as OpenAire, and supports third-party services.
- While data identification has some distinct requirements, it has been possible to enhance existing approaches, such as DOI, to meet them. For example, we’ve worked with IDF so that their business model is now better suited to research data requirements.
- Together, open infrastructure, metadata, and APIs enable third parties, including commercial organizations (e.g. Thomson Reuters) and publishers (e.g. Thieme and Elsevier), to build enhanced services. They also enable repositories like Dryad, Pangaea, FigShare, and Zenodo to ensure that the data they hold is identified and citable for the long term.
- Data identification is more than assigning a number. Success requires robust services, robust policies, and a strong community of practice. It also means that allocating agents and data stewards must establish formal, often contractual, relationships with the long term in mind. Without these essential steps, data identification becomes just another breeding ground for 404 errors – data not found.
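To make the content negotiation point above a little more concrete, here is a minimal sketch of how a client might ask a DOI resolver for machine-readable metadata rather than a landing page. It is an illustration under assumptions, not DataCite's or CrossRef's reference client: the resolver URL, the media types listed, the function names, and the DOI used in the usage note are all assumptions or hypothetical placeholders that do not come from this post.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the public DOI resolver address; the post does not name one.
RESOLVER = "https://doi.org/"

# Assumption: media types commonly used with DOI content negotiation.
FORMATS = {
    "datacite_xml": "application/vnd.datacite.datacite+xml",
    "csl_json": "application/vnd.citationstyles.csl+json",
    "bibtex": "application/x-bibtex",
}


def build_request(doi: str, fmt: str = "datacite_xml") -> urllib.request.Request:
    """Build (but do not send) a GET request for a DOI's metadata.

    The Accept header is what asks the resolver for metadata in a given
    format instead of the human-readable landing page.
    """
    url = RESOLVER + quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch_metadata(doi: str, fmt: str = "datacite_xml", timeout: float = 10.0) -> str:
    """Send the request and return the response body as text.

    Raises urllib.error.HTTPError if the resolver answers with an error
    status, for example 404 for a DOI that does not resolve.
    """
    with urllib.request.urlopen(build_request(doi, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_metadata("10.1234/example", "csl_json")` would attempt a live lookup; the DOI shown is a made-up placeholder and will not resolve. Keeping request construction separate from sending makes it easy to inspect the URL and headers without network access.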
An open approach has also enabled us to work on the broader identifier challenges through collaborations. The DataCite-STM statement encouraged bi-directional links between articles and data. Through ODIN, the ORCID and DataCite Interoperability Network, we will learn how to link authors with their articles, data, and more.
And, while challenges remain, we have a very strong basis for an interoperable identification infrastructure – one that weaves data, articles, and researchers together into a new fabric of open research.
During the meeting there was strong consensus on the essential role that data identification and citation play, as well as on the need for data management plans. There were also some areas of disagreement. Some industry representatives argued against the need for data to be open or to have ‘open’ be the default setting. There was also robust discussion on the appropriate size of data repositories: some argued for large-scale repositories, others for many small ones.
The following is a guest post from Megan Phillips, NARA’s Electronic Records Lifecycle Coordinator and an elected member of the NDSA coordinating committee, and Andrea Goethals, Harvard Library’s Manager of Digital Preservation and Repository Services and co-chair of the NDSA Standards and Practices Working Group. As part of the effort to publicize the NDSA Levels of […]
This is a guest post by Abbie Grotke, the Library of Congress Web Archiving Team Lead and Co-Chair of the National Digital Stewardship Alliance Content Working Group. In this installment of the Content Matters series of the National Digital Stewardship Alliance Content Working Group we’re featuring an interview with staff from the U.S. National Park […]
As per the Nature copyright assignment form for Comments, I can post the text of my Comment 6 months after publication. That’s this week, so here it is! Piwowar H. (2013). Value all research products, Nature, 493 (7431) 159-159. DOI: 10.1038/493159a For more on this article, see previous blog posts: Just Published: Value All Research Products First […]
A few months back several members of the National Digital Stewardship Alliance’s Levels of Digital Preservation team presented a short paper at Archiving 2013, The NDSA Levels of Digital Preservation: An Explanation and Uses. While the Levels of Digital Preservation will continue to be refined and improved we are thrilled to report that they are […]