A draft program and the registration form for the DataCite Summer Meeting (Washington DC, Thursday, 19 September – Friday, 20 September 2013) are now available.
We are looking forward to seeing you in September!
The European Commission recently held a public consultation on open research data. The President of DataCite, Adam Farquhar, had the opportunity to speak and highlighted the importance of identifying and citing research data. There were numerous DataCite members and data centres present.
Adam introduced the organization and highlighted that our data centres have assigned over 1.7 million DOIs, of which more than 270,000 were assigned this year.
Just a few years ago, Adam explained, identification was dominated by local, national, or disciplinary initiatives. It has now matured substantially with the growth of international cross-disciplinary organizations such as DataCite. In other areas, we are also seeing researcher identifiers in ORCID and article identifiers from CrossRef.
There is widespread consensus that identification and citation-level metadata are essential to making data accessible and re-usable, and to establishing incentive systems that encourage data sharing.
He also explained some lessons learned by DataCite over the last few years:
- Data identification requires interoperable APIs and metadata. We’ve worked with CrossRef to enhance the DOI APIs to support content negotiation for better machine-to-machine interactions. The DataCite Metadata Schema provides cross-disciplinary citation-level metadata for research data. It has been adopted by others, such as OpenAire, and supports third party services.
- While data identification has some distinct requirements, it has been possible to enhance existing approaches, such as DOI, to meet them. For example, we’ve worked with IDF so that their business model is now better suited to research data requirements.
- Together, open infrastructure, metadata, and APIs enable third parties, including commercial organizations (e.g. Thomson Reuters) and publishers (e.g. Thieme and Elsevier), to build enhanced services. They also enable repositories like Dryad, Pangaea, FigShare, and Zenodo to ensure that the data they hold is identified and citable for the long term.
- Data identification is more than assigning a number. Success requires robust services, robust policies, and a strong community of practice. It also means that allocating agents and data stewards must establish formal, often contractual, relationships with the long term in mind. Without these essential steps, data identification becomes just another breeding ground for 404 errors – data not found.
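The content negotiation mentioned above can be sketched in a few lines. This is an illustrative snippet, not DataCite's own tooling: `metadata_request` is a hypothetical helper, and the example DOI is an arbitrary Dryad identifier; the `Accept` media type is the DataCite JSON type understood by the doi.org resolver for machine-readable citation metadata.

```python
def metadata_request(doi: str,
                     media_type: str = "application/vnd.datacite.datacite+json"):
    """Build the URL and headers for a content-negotiated DOI lookup.

    Instead of resolving to the landing page, the Accept header asks the
    DOI resolver to return citation-level metadata for the dataset.
    """
    url = f"https://doi.org/{doi}"
    headers = {"Accept": media_type}
    return url, headers

url, headers = metadata_request("10.5061/dryad.8515")
print(url)                # https://doi.org/10.5061/dryad.8515
print(headers["Accept"])  # application/vnd.datacite.datacite+json

# To actually resolve the DOI (network access required), one could do:
# import urllib.request
# req = urllib.request.Request(url, headers=headers)
# print(urllib.request.urlopen(req).read().decode())
```

The same request with a different `Accept` value (for example a BibTeX media type) would yield the metadata in another format, which is what makes the approach useful for machine-to-machine interactions.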
An open approach has also enabled us to work on broader identifier challenges through collaborations. The DataCite-STM statement encouraged bi-directional links between articles and data. Through ODIN (the ORCID and DataCite Interoperability Network) we will learn how to link authors with their articles, data, and more.
And, while challenges remain, we have a very strong basis for an interoperable identification infrastructure – one that weaves data, articles, and researchers together into a new fabric of open research.
During the meeting there was strong consensus on the essential role that data identification and citation play, as well as on the need for data management plans. There were also some areas of disagreement. Some industry representatives argued against the need for data to be open or to have ‘open’ be the default setting. There was also robust discussion on the appropriate size of data repositories – some argued for large scale, others for many small ones.