How to find an appropriate research data repository.


As more and more funders and journals adopt data policies that require researchers to deposit underlying research data in a data repository, the question of where to store this data and how to choose a repository becomes increasingly important. Heinz Pampel is one of the people behind the Registry of Research Data Repositories, an Open Science tool that helps researchers to easily identify a suitable repository for their data and thus comply with the requirements set out in data policies.

The debate on open access to research data is gaining relevance. This February, U.S. federal agencies were told by the Office of Science and Technology Policy (OSTP) to maximize access to data from publicly funded research. In June, the G8 science ministers published a set of principles for open scientific research data. The ministers declared that, if possible, “publicly funded scientific research data should be open”. And as early as last year, the European Commission announced a pilot framework in Horizon 2020, the coming EU framework programme for research and innovation, to promote open access to research data.

Although scientists agree on the potential benefit of data sharing for scientific progress, the majority are hesitant when it comes to practical implementation. One reason for this reluctance is a lack of reliable “systems that make it quick and easy to share data” (Tenopir et al. 2011).

The current landscape of data repositories is heterogeneous. Some initiatives, such as the Data Seal of Approval (DSA) and the World Data System (WDS), are working on the standardization of data repositories. Certification and auditing procedures for data repositories also exist already, for example the DIN 31644 and ISO 16363 standards, but these are not widely used yet. Research data repositories and their services are mostly shaped by the scientific discipline they serve. They store a wide variety of file formats under different conditions for access and reuse. In many cases it is difficult for researchers to find an appropriate repository for the storage of their data. To overcome these shortcomings we started the Registry of Research Data Repositories.

Registry of Research Data Repositories

Launched in 2012, the registry provides an overview of existing research data repositories. As of September 2013, it lists 600 research data repositories, 400 of which are described in detail using a comprehensive vocabulary. The registry covers data repositories from all academic disciplines.

In the registry, researchers can easily see each data repository's terms of access and use, along with other characteristics. Information icons help researchers to quickly identify an adequate repository for the storage and reuse of their data.


Aspects of a research data repository with the corresponding icons used in the registry.

The registry covers the following aspects of a research data repository:

  • general information (e.g. short description of the repository, content types, keywords),
  • responsibilities (e.g. institutions responsible for funding, content or technical issues),
  • policies (e.g. guidelines and policies of the repository),
  • legal aspects (e.g. licenses of the database and datasets),
  • technical standards (e.g. APIs, versioning of datasets, software of the repository),
  • quality standards (e.g. certificates, audit processes).

The portal offers two ways to search: (1) free-text search through a simple search box, and (2) filters for more specific searches. In the list of results, each record includes the name of the repository, the subjects covered, a brief description of the content and a set of icons visualizing key properties of the repository. Clicking on the name of a repository in the search results opens the full descriptive record. It is also possible to simply browse through the list of indexed data repositories.
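To make the combination of free-text search and filters more concrete, here is a minimal sketch in Python. It is not the registry's actual implementation or API: the `RepositoryRecord` class, its field names, the property keys (`data_access`, `persistent_identifier`) and the sample records are all hypothetical and only loosely follow the aspects listed above.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryRecord:
    """Hypothetical, simplified record; field names are illustrative only."""
    name: str
    description: str
    subjects: list[str] = field(default_factory=list)
    # Key properties that a portal might visualize as icons (assumed keys).
    properties: dict[str, str] = field(default_factory=dict)


def search(records, text="", **filters):
    """Return records that match the free-text query and every given filter."""
    needle = text.lower()
    results = []
    for record in records:
        # Free-text search: case-insensitive substring match on name,
        # description and subjects.
        haystack = " ".join(
            [record.name, record.description, *record.subjects]
        ).lower()
        if needle and needle not in haystack:
            continue
        # Filters: every requested property must match exactly.
        if all(record.properties.get(k) == v for k, v in filters.items()):
            results.append(record)
    return results


# Made-up sample records, not real registry entries.
records = [
    RepositoryRecord(
        "Example Geo Archive",
        "Seismic and geodetic datasets.",
        ["geosciences"],
        {"data_access": "open", "persistent_identifier": "DOI"},
    ),
    RepositoryRecord(
        "Example Bio Bank",
        "Sequence data.",
        ["life sciences"],
        {"data_access": "restricted", "persistent_identifier": "none"},
    ),
]

hits = search(records, text="geo", persistent_identifier="DOI")
print([r.name for r in hits])
```

In this sketch the free-text query narrows the list first and the filters then have to match exactly, which mirrors the idea of starting with the search box and refining with filters.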


Example screenshot of search results for geosciences data repositories using persistent identifiers.

Operators of data repositories can suggest their infrastructures for listing in the registry by filling in an online application form. A repository is indexed when it meets the minimum requirements for inclusion, which are described in the vocabulary. The project team reviews each repository, and reviewed repositories are identified by a green check mark.

The project cooperates with other Open Science initiatives such as BioSharing, DataCite and OpenAIRE. Some publishers already refer to the registry in their editorial policies as a tool for identifying suitable data repositories.

Next Steps

In the upcoming project phase the focus will be on improving usability and implementing new features. Among other things, the dialog with repository operators will be supported by a workflow system. Beyond the development of the registry, the project will promote the standardization of research data repositories. The registry is funded by the German Research Foundation (DFG). Project partners are GFZ German Research Centre for Geosciences, Humboldt-Universität zu Berlin and Karlsruhe Institute of Technology (KIT). These three partners, with their expertise in information infrastructures, guarantee the sustainability of the registry.

Further information on the registry can be found in a recently published article in PLOS ONE: Pampel, H., et al. (2013). Making Research Data Repositories Visible: The Registry. PLOS ONE. doi: 10.1371/journal.pone.0078080

This post originally appeared on the PLOS Tech blog and is reposted under CC-BY.

Heinz Pampel works as a project manager at the GFZ German Research Centre for Geosciences. He is currently involved in the establishment of the Registry of Research Data Repositories. He is also a member of the Helmholtz Association’s Open Access Coordination Office.