Finding and Choosing a Data Repository

What makes a good repository for imaging data?

Public data sharing builds trust in scientific rigor, facilitates reproducibility, and enables the reuse of data in other projects. Funding agencies may require that the original source data of a publication be published, too, and many journals either require or at least strongly recommend it. However, there are practical things to consider. For example, the submission guidelines of Nature Cell Biology state for imaging data: „For imaging source data, we encourage deposition to a relevant repository due to size constraints.“ (last access: 2023-06-01)

So the question is: Where should one publish the data? And how does it get there?

Specifically for bioimaging data, two European general-purpose repositories have been established in recent years.

Many domain-specific repositories also exist and may be suitable for your data.

General criteria for trustworthy repositories

Because data deposited in a repository is meant to be found and reused by others, there are general criteria to orient yourself by when choosing a repository. For example, the Science Europe „Practical Guide to the International Alignment of Research Data Management“ (2021, Zenodo) outlines the following minimum criteria for trustworthy repositories:

  1. Provision of Persistent and Unique Identifiers (PID)
    • Allow data discovery and identification
    • Enable searching, citing, and retrieval of data
    • Provide support for data versioning
  2. Metadata
    • Enable finding of data
    • Enable referencing to related relevant information, such as other data and publications
    • Provide information that is publicly available and maintained, even for non-published, protected, retracted, or deleted data
    • Use metadata standards that are broadly accepted (by the scientific community)
    • Ensure that metadata are machine-retrievable
  3. Data Access and Usage Licences
    • Enable access to data under well-specified conditions
    • Ensure data authenticity and integrity
    • Enable retrieval of data
    • Provide information about licensing and permissions (in ideally machine-readable form)
    • Ensure confidentiality and respect rights of data subjects and creators
  4. Preservation
    • Ensure persistence of metadata and data
    • Be transparent about mission, scope, preservation policies, and plans (including governance, financial sustainability, retention period, and continuity plan)

(taken from: Science Europe. (2021). Practical Guide to the International Alignment of Research Data Management – Extended Edition. https://doi.org/10.5281/zenodo.4915862, license: http://creativecommons.org/licenses/by/4.0/)
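
Two of the criteria above, persistent identifiers and machine-retrievable metadata, can be tried out directly on any DOI. The following is a minimal sketch, assuming the DOI's registration agency supports content negotiation through doi.org (Crossref and DataCite do). The function names are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Media type for citation metadata in CSL JSON, requested via the Accept header.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    doi = doi.strip()
    # Accept bare DOIs as well as "doi:" prefixes and full resolver URLs.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": CSL_JSON},
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Resolve the DOI and return its metadata as a dict (needs network access)."""
    request = build_metadata_request(doi)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (requires an internet connection), using the DOI cited above:
#   record = fetch_metadata("10.5281/zenodo.4915862")
#   print(record.get("title"), record.get("publisher"))
```

If a dataset's identifier resolves this way and returns structured metadata, that is a useful first indication, though not proof, that a repository meets criteria 1 and 2.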

Help for researchers to choose a suitable repository

The Science Europe Practical Guide also includes a section aimed at researchers to help them choose a repository. Moreover, members of the international bioimaging community have agreed on consensus recommendations for repositories in the field of bioimaging (Swedlow et al., 2021). In brief, these state that repositories should:

  • Have clear metadata specifications for submission
  • Integrate into the bioimage data ecosystem as either data archives or added-value databases (see: Introduction to Image Data Repositories and Public Archives)
  • Have clear author identification and data access regulations
  • Meet trustworthiness criteria as catalogued, e.g., by FAIRsharing.org or other resources
  • Follow recognized guidelines on how to integrate person-identifiable data
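
When comparing several candidate repositories, it can help to record the criteria and recommendations above as an explicit checklist. The sketch below is one possible way to do this. The criterion keys, the class name, and the example repositories are hypothetical, and the wording of the criteria is paraphrased from the lists above rather than quoted.

```python
from dataclasses import dataclass, field

# Paraphrased from the Science Europe criteria and the community recommendations.
CRITERIA = {
    "pid": "Provides persistent and unique identifiers",
    "metadata": "Uses accepted, machine-retrievable metadata standards",
    "access_licence": "States access conditions and licences clearly",
    "preservation": "Is transparent about preservation and sustainability",
    "submission_spec": "Has clear metadata specifications for submission",
    "author_id": "Has clear author identification and data access regulations",
}


@dataclass
class RepositoryAssessment:
    """Checklist for one repository (hypothetical helper, for illustration)."""

    name: str
    # Criterion key -> True (met) or False (not met); missing keys are unchecked.
    answers: dict[str, bool] = field(default_factory=dict)

    def unmet(self) -> list[str]:
        """Criteria that were checked and found not to be met."""
        return [key for key in CRITERIA if self.answers.get(key) is False]

    def unchecked(self) -> list[str]:
        """Criteria for which no answer has been recorded yet."""
        return [key for key in CRITERIA if key not in self.answers]

    def is_candidate(self) -> bool:
        """True only when every criterion has been checked and is met."""
        return not self.unmet() and not self.unchecked()


# Usage with made-up entries:
archive = RepositoryAssessment("Example Image Archive", {key: True for key in CRITERIA})
lab_server = RepositoryAssessment("Hypothetical Lab Server", {"pid": False, "metadata": True})

for repo in (archive, lab_server):
    print(repo.name, "| candidate:", repo.is_candidate(), "| open questions:", repo.unchecked())
```

Keeping "not met" and "not yet checked" separate is deliberate: several criteria are hard for an individual to verify, so an unanswered item is a prompt to consult a registry or a data steward rather than a reason to rule a repository out.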

Online registries to identify suitable repositories

As a researcher, it can be difficult to decide which repository is suitable for data deposition. Several of the above-mentioned criteria are difficult for individuals to check. While professional data stewards at your research institution (if available) may be able to help, another good approach is to use repository registries such as re3data, FAIRsharing, and OpenAIRE Explore:

What is re3data?

re3data is a global registry of research data repositories. The registry covers research data repositories from different academic disciplines. re3data presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions. re3data aims to promote a culture of sharing, increased access and better visibility of research data.

In December 2012 re3data launched an alpha version. The registry will be continuously developed and improved together with relevant stakeholders. re3data is funded by the German Research Foundation (DFG).

Why is re3data useful?

re3data helps researchers to find appropriate repositories for the storage and access of research data. Further, it can be used by funding organisations to promote permanent access to research data from their funded research projects. In addition re3data offers publishers and academic institutions a tool for the identification of research data repositories where scientists can deposit their data.

(taken from: re3data.org/faq – Registry of Research Data Repositories. https://doi.org/10.17616/R3D, last accessed: 2023-03-14, license: https://creativecommons.org/licenses/by/4.0/)

FAIRsharing is much more than a repository registry. However, it also lets you search for suitable repositories based on subject, method, and more.

„Using community participation, the FAIRsharing team precisely curates information on standards employed for the identification, citation and reporting of data and metadata, via four standards subtypes. First, minimum reporting guidelines—also known as guiding principles or checklists—outline the necessary and sufficient information vital for contextualizing and understanding a digital object. Second, terminology artifacts or ‘semantics’, ranging from dictionaries to ontologies, provide definitions and unambiguous identification for concepts and objects. Third, models and formats define the structure and relationship of information for a conceptual model and include transmission formats to facilitate the exchange of data between different systems. And lastly, identifier schemata are formal systems for resources and other digital objects that allow their unique and unambiguous identification. FAIRsharing monitors the evolution of these standards, their implementation in databases and repositories, and recommendation by journal and funder data policies“

(taken from: Sansone, SA., McQuilton, P., Rocca-Serra, P. et al. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol 37, 358–367 (2019). https://doi.org/10.1038/s41587-019-0080-8, license: https://creativecommons.org/licenses/by/4.0/)

OpenAIRE Explore is a service of the EU-funded OpenAIRE project. It harvests information about data sources, licenses, publications, research software, and more. Among these, you can search for repositories under data sources: https://explore.openaire.eu/search/content-providers

„EXPLORE is an AI-driven, open research search engine to help you discover and outreach validated research results available in the OpenAIRE Research Graph. It integrates content from more than 12K trusted data sources worldwide, providing discovery and navigation to all types of scholarly works (publications, data, software) and links to funding/grants, organisations, metrics, people etc.“

(taken from: https://catalogue.openaire.eu/service/openaire.discovery_portal/overview last accessed: 2023-03-14, license: http://creativecommons.org/licenses/by/4.0/)