Why research-discipline-specific metadata standards are needed

The FAIR (findable, accessible, interoperable, reusable) principles provide guidance in the form of general criteria for what renders data FAIR. However, they do not prescribe a concrete technical implementation of these rather loosely defined criteria, because no universal solution can meet the needs of all research disciplines. Instead, the technical solutions for making data FAIR must fit each research discipline's specific techniques and best practices.

The challenge, however, is that (…) investigators (…) don’t know how to begin to operationalize such principles (…). What exactly makes metadata attributes “accurate and relevant”? What qualifies a metadata specification as “rich”? Which community standards are the most important? The challenge with the FAIR Guiding Principles is that they are abstract and lack context. (Musen et al., 2022, Sci Data)

Therefore, community-derived consensus recommendations for metadata in bioimaging are an essential step toward making microscopy data FAIR. Given the variety of imaging techniques in bioimaging, metadata requirements vary even within the discipline. Here, we introduce selected metadata standardization approaches in bioimaging.

Which standards for metadata annotation exist in the field of bioimaging?

After acquiring microscopy data, researchers must decide how best to communicate what they’ve done. This includes:

  • technical information about the instrument used;
  • sample information about what was imaged (biological entity, specimen origin, etc.); and
  • experimental information about how the investigation was performed.

Find an overview of metadata for bioimaging explained by S. Kunis here. Deciding which annotations to include, in which annotation format, and with which words (terms or vocabulary) can be challenging in everyday practice. While no general metadata standard exists for all bioimaging experiments, several community-developed standards have been proposed. Here we give a short introduction to some of them:

  • OME data model (Technical metadata description of an imaging experiment)
  • REMBI (Recommended Metadata for Biological Images)
  • 4DN-BINA-OME-QUAREP (BNO) tiered metadata guidelines
  • 3D-MMS (3D microscopy metadata standard)
  • BIDS (Brain Imaging Data Structure) for Microscopy
  • MITI (Minimum Information for Highly Multiplexed Tissue Images)
  • MIHCSME (Minimum Information for High Content Screening Microscopy Experiments)

Proposed by community groups with expertise in different imaging-related scientific fields, these standards are not mutually exclusive. Rather, they overlap, build on each other, or build on generic metadata standards such as DataCite.

Metadata annotation facilitates data retrieval and supports a clear understanding of the data, whether for reproducing results or for reusing the data to address new research questions. The minimum requirement is to make the data identifiable as the original data underlying a research finding. Beyond that, metadata is particularly important for enabling computers (and, via search engines, humans) to find relevant data in a database or on the internet. Therefore, apart from the content of image annotations, it is also important to consider the format of metadata annotations.

In this article, we focus on the recommended content, providing practical orientation on what to annotate rather than on the annotation format. Most standards also come with a suggestion for a suitable annotation format.

OME data model

At the beginning of the 21st century, most data formats (including the stored metadata) were dictated to microscopists by their choice of commercial instrument, since vendors provided their own proprietary file and metadata formats for technical descriptions. What was saved, and how, was sometimes difficult to retrieve. Files had to be converted from one format to another during the research process, often leading to loss of metadata along the way. Swedlow et al. started working on the Open Microscopy Environment as a framework to enable lossless handling of images together with their metadata, translating the proprietary formats into an XML-based common metadata model. In 2005, the OME data model was released together with the OME-TIFF format as an open, broadly usable file format for microscopy data (Goldberg et al., 2005). The data model contains elements for relevant information, e.g., on the Project, the Experiment, the Instrument, the Image, and so on: in total, 12 top-level elements, each with multiple sub-elements. In OME-TIFF, the model is stored as XML (OME-XML) in the header of the TIFF file, in its ImageDescription tag.

The OME data model has become a core data model for many bioimaging applications. It has laid the groundwork for better interoperability between different software tools and instruments. The model is described in detail in the OME documentation.
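
To illustrate what “XML-based metadata model” means in practice, the following minimal sketch builds a very small OME-XML document with Python's standard library. It assumes the 2016-06 OME schema namespace and uses only a handful of elements (Image, Pixels, Channel, TiffData); the image name and sizes are arbitrary examples. It is not validated against the schema and omits the Instrument, Experiment, and other top-level elements that a real file would usually carry. In everyday work, tools such as Bio-Formats write this metadata for you.

```python
import xml.etree.ElementTree as ET

# Assumption: the 2016-06 OME schema namespace is targeted.
OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"
ET.register_namespace("", OME_NS)  # serialize OME as the default namespace


def minimal_ome_xml(size_x, size_y, size_z=1, size_c=1, size_t=1, pixel_type="uint16"):
    """Build a minimal, illustrative OME-XML string describing a single image."""
    ome = ET.Element(f"{{{OME_NS}}}OME")
    image = ET.SubElement(ome, f"{{{OME_NS}}}Image", ID="Image:0", Name="example")
    # Pixels carries the technical description of the image dimensions.
    pixels = ET.SubElement(
        image, f"{{{OME_NS}}}Pixels",
        ID="Pixels:0", DimensionOrder="XYZCT", Type=pixel_type,
        SizeX=str(size_x), SizeY=str(size_y), SizeZ=str(size_z),
        SizeC=str(size_c), SizeT=str(size_t),
    )
    # One Channel element per channel; IDs follow the "Channel:<image>:<index>" pattern.
    for c in range(size_c):
        ET.SubElement(pixels, f"{{{OME_NS}}}Channel", ID=f"Channel:0:{c}", SamplesPerPixel="1")
    # TiffData links this metadata to the image planes stored in the TIFF file.
    ET.SubElement(pixels, f"{{{OME_NS}}}TiffData")
    return ET.tostring(ome, encoding="unicode")


print(minimal_ome_xml(512, 256, size_c=2))
```

The resulting string is the kind of text an OME-TIFF writer places in the TIFF header, so the metadata travels with the pixel data instead of living in a separate vendor-specific file.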

REMBI (Recommended Metadata for Biological Images)

The OME model originated from the technical necessities of handling image data. Its often quite technical elements go beyond what researchers typically think about when performing their experiments in the lab. Another approach to metadata annotation starts from the question: What should every researcher performing a microscopy experiment make sure is annotated at a minimum (if it is not recorded automatically)?
Based on community workshops with a group of international imaging scientists, the REMBI guidelines for a set of recommended metadata annotations were published in 2021 (Sarkans et al., 2021). A key idea was that metadata should (or must) contain information relevant to three main groups of science professionals:

  • Biologists using microscopes for their research
  • Computer vision researchers interested in image processing and analysis
  • Imaging scientists working on the development of imaging techniques

To cover these needs, 35 metadata items were proposed that should be annotated for imaging data at a minimum. Each item is a “Key” combined with a specific value that provides the information for this Key (“Key-Value Pair” annotation). For example, one key item is “Imaging Method”. For a user annotating data, the value for this item would be the exact microscopy technique used to record the image, for example, “line-scanning confocal fluorescence microscopy”. The 35 items are grouped into eight categories. Each item is described in the REMBI publication, together with examples. To avoid items being annotated with arbitrary terms, the authors suggest which controlled vocabularies or ontologies the annotated values should be taken from.
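
A Key-Value Pair annotation maps naturally onto a dictionary. The sketch below shows how such an annotation could be checked for missing keys and for values outside a controlled vocabulary. Only the key “Imaging Method” and its example value come from the text above; the other keys and the small vocabulary set are hypothetical placeholders, not the actual REMBI item list. In practice, the allowed terms would be drawn from an ontology, for example an imaging-methods ontology such as FBbi.

```python
# Hypothetical subset of REMBI-style keys: only "Imaging Method" is taken from
# the REMBI example above; the other keys are illustrative placeholders.
REQUIRED_KEYS = ["Imaging Method", "Organism", "Sample Preparation"]

# Hypothetical controlled vocabulary for the "Imaging Method" key.
IMAGING_METHOD_TERMS = {
    "line-scanning confocal fluorescence microscopy",
    "widefield fluorescence microscopy",
}


def check_annotation(annotation: dict) -> list[str]:
    """Return human-readable problems found in a key-value annotation."""
    # A key counts as missing if it is absent or has an empty value.
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if not annotation.get(k)]
    method = annotation.get("Imaging Method")
    if method and method not in IMAGING_METHOD_TERMS:
        problems.append(f"uncontrolled term for Imaging Method: {method!r}")
    return problems


image_annotation = {
    "Imaging Method": "line-scanning confocal fluorescence microscopy",
    "Organism": "Mus musculus",
}
print(check_annotation(image_annotation))  # ['missing key: Sample Preparation']
```

Restricting values to a predefined term list is what makes annotations from different labs comparable: two researchers describing the same technique end up with the same string.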

The REMBI recommendations can be used to annotate individual images. However, several REMBI items are best suited to annotating datasets from an experiment or even a series of experiments within a study. REMBI has a flat structure and is a good starting point for beginners thinking about what to annotate in microscopy metadata.
An example of how REMBI could be used to annotate metadata in an OMERO database was proposed by the RDMbites team of ELIXIR-UK: https://www.youtube.com/watch?v=3J5zqqO9LNs

4DN-BINA-OME-QUAREP tiered metadata

Building on the OME data model, members of the 4D Nucleome project (4DN), BioImaging North America (BINA), the Open Microscopy Environment (OME), and Quality Assessment and Reproducibility of Images in Light Microscopy (QUAREP-LiMi) have proposed guidelines for annotating metadata at several levels of complexity (Hammer et al., 2021). The authors suggest a tiered system in which the need for more extensive and fine-grained metadata increases with the type of experiment performed and the perceived reuse value of the images.

This model extends the well-established OME data model and is designed to fit that model’s structure. Hence, it proposes not only what to annotate but also how to annotate it. The result is a flexible metadata system. Regarding the format, users decide whether to focus on the OME core model or to go beyond it and include the additional items suggested by the guidelines. Regarding the level of detail, three tiers are suggested to guide users:

  • Focus on Tier 1 for the minimal requirements of metadata annotation for any imaging experiment.
  • Include Tier 2 if the aim is to perform advanced quantification based on the imaging experiment.
  • If the goal is to develop a new imaging method or modality, or to further develop an existing one, Tier 3 metadata annotation is required to fully describe the data.
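
Read this way, the tiers are cumulative: each goal requires its own tier plus all tiers below it. The following sketch expresses that selection logic under that assumption. The item names per tier are hypothetical placeholders chosen for illustration; the actual item lists are defined in the published guidelines.

```python
from enum import IntEnum


class Goal(IntEnum):
    """Experimental aims from the tier guidance, ordered by required metadata depth."""
    ANY_IMAGING = 1              # minimal requirements -> Tier 1
    ADVANCED_QUANTIFICATION = 2  # -> Tiers 1 and 2
    METHOD_DEVELOPMENT = 3       # -> Tiers 1, 2, and 3


# Hypothetical example items per tier (not the real guideline lists).
TIER_ITEMS = {
    1: ["Microscope stand", "Objective", "Light source"],
    2: ["Detector settings", "Filter specifications"],
    3: ["Calibration measurements"],
}


def items_for(goal: Goal) -> list[str]:
    """Collect the items of every tier up to and including the tier the goal requires."""
    return [item for tier in range(1, goal + 1) for item in TIER_ITEMS[tier]]


print(items_for(Goal.ADVANCED_QUANTIFICATION))
```

Modelling the goals as an ordered enum keeps the rule “higher tiers include lower ones” in one place, so adding an item to Tier 1 automatically propagates to every more demanding goal.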

In practice, annotating data with all the metadata items of a given tier can be challenging. Given the model’s complexity, new tools to ease annotation are being developed, one of which is the Micro-Meta-App, intended to provide a graphical, even playful, interface for metadata annotation.

3D-MMS (3D microscopy metadata standard)

Many microscopy techniques produce either a 3D reconstruction of a specimen from several images or a 3D volume rendering based on voxels (three-dimensional pixels). One example of a powerful 3D microscopy technique is Selective Plane Illumination Microscopy (SPIM), also known as light-sheet microscopy, together with its specialized derived modalities. These techniques often require not only specialized, advanced instruments but also sophisticated sample preparation, for example “tissue clearing”: chemical processes that render originally opaque specimens transparent so that light can penetrate the tissue with minimal scattering and absorption.

Therefore, an expert group of 3D microscopy scientists has developed specific metadata items for 3D microscopy (Ropelewski et al., 2022). They propose a set of 91 items in seven categories, intended to describe image datasets (not single images). Of these, 31 items are deemed the mandatory minimum that would be required if researchers intend to upload their images to the Brain Imaging Library repository. The authors provide JSON and Excel templates for recording the metadata.
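
Because the standard describes whole datasets and distinguishes mandatory from optional items, a dataset-level record can be checked for completeness before it is serialized to JSON. The sketch below assumes a flat JSON record; all field names and values are hypothetical stand-ins, not the actual items of the published templates.

```python
import json

# Hypothetical field names standing in for the mandatory items; the real
# 91 items (31 mandatory) are defined in the published 3D-MMS templates.
MANDATORY_FIELDS = {"dataset_title", "species", "imaging_modality", "voxel_size_um"}


def validate_dataset_record(record: dict) -> set[str]:
    """Return the mandatory fields that are missing or empty in a dataset-level record."""
    return {f for f in MANDATORY_FIELDS if record.get(f) in (None, "", [])}


record = {
    "dataset_title": "Whole-brain light-sheet volume",
    "species": "Mus musculus",
    "imaging_modality": "light-sheet fluorescence microscopy",
    "voxel_size_um": [1.8, 1.8, 2.0],  # x, y, z voxel size (illustrative values)
    "clearing_method": "hypothetical clearing protocol",  # optional item
}

missing = validate_dataset_record(record)
# Only serialize the record once every mandatory field is present.
json_text = json.dumps(record, indent=2) if not missing else None
print(missing or json_text)
```

Keeping the mandatory set separate from the record itself mirrors the standard's split between a minimal core and a larger optional catalogue of items.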

The biological background of this standard is mainly brain research, but its application is not limited to that field. Unlike REMBI, it does not suggest specific ontologies or controlled vocabularies.

BIDS (Brain Imaging Data Structure) Microscopy Extension

Also rooted in neuroscience, the BIDS standard goes beyond metadata annotation. It is a file-structure standard that defines file tree paths, and the documents therein, for any brain-imaging-related experiment (it was originally developed for magnetic resonance imaging). A file tree structure is, however, a kind of metadata in itself, since relationships and hierarchies between individual files are implicitly encoded in the structure.

The BIDS standard is designed to be extended: community stakeholders can propose extensions through BIDS Extension Proposals (BEPs). This procedure allows for an inclusive, democratic approach to defining the standard for a given application. The BIDS extension for microscopy was published in 2022 (Bourget et al., 2022).
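
The idea that a file tree is itself metadata can be made concrete: in BIDS, key-value “entities” appear both in folder names and in file names, so information can be recovered from the path alone. The sketch below follows our reading of the microscopy extension (a `micr` folder, a required `sample` entity, and a suffix such as `BF` for bright-field or `SPIM`); it deliberately omits optional entities such as acquisition, stain, run, or chunk, and the helper names are hypothetical. Please check the current BIDS specification before relying on the exact naming rules.

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_micr_path(subject: str, sample: str, suffix: str,
                   session: Optional[str] = None,
                   extension: str = ".ome.tif") -> PurePosixPath:
    """Build a simplified BIDS-style microscopy path (hypothetical helper)."""
    folders = [f"sub-{subject}"]
    if session:
        folders.append(f"ses-{session}")
    # The file name repeats the folder entities, so each file is self-describing.
    name = "_".join(folders + [f"sample-{sample}", suffix]) + extension
    return PurePosixPath(*folders, "micr", name)


def parse_entities(path: PurePosixPath) -> dict[str, str]:
    """Recover the key-value metadata implicitly encoded in a BIDS file name."""
    stem = path.name.split(".")[0]          # drop multi-part extensions like .ome.tif
    *entities, suffix = stem.split("_")     # last underscore-separated part is the suffix
    meta = dict(e.split("-", 1) for e in entities)
    meta["suffix"] = suffix
    return meta


p = bids_micr_path("01", "A", "SPIM", session="02")
print(p)
print(parse_entities(p))
```

The round trip from metadata to path and back illustrates why a shared file-structure convention helps machines: any tool that knows the naming rules can read subject, session, and sample without opening the file.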

MITI (Minimum Information Guidelines for Highly Multiplexed Tissue Images)

A set of technologies combining imaging with sequencing has become known as “spatial omics”. With appropriate setups, it is possible to resolve, for example, the expression profiles of many genes not only at the level of single cells (single-cell RNA sequencing) but also in the original tissue context (i.e., with spatial information on where the transcripts are located in the tissue; spatial transcriptomics). Different technical solutions exist for this task. One approach is highly multiplexed tissue imaging, in which a prepared tissue section is imaged multiple times in succession, each time with a stain for a different target mRNA (which can be washed out after each round). For example, specific nucleotide probes coupled to fluorophores can be used to quantify RNA amplicons in situ (in the tissue) based on the fluorescence signal intensity in the microscopy image.

The MITI guidelines cover metadata items on the biospecimen, reagents, data acquisition, and data analysis. They consider imaging with antibodies, aptamers, peptides, dyes, and similar detection reagents (Shapiro et al., 2022). The guidelines describe the data at five data levels:

  • Level 1 refers to raw image tiles in their original file formats
  • Level 2 refers to data assembled into multi-channel images or image sets
  • Level 3 describes a dataset after quality control, potentially with segmentation masks or similar
  • Level 4 contains spatial feature tables used for analysis, e.g., with dimensionality reduction (such as UMAP)
  • Level 5 is the top level, integrating spatial features with images

To help researchers collect all metadata items, the authors provide standardized collection formats in the supplementary material (e.g., a spreadsheet-based entry mask). These tools let users pick metadata values from predefined entries, which the tool then validates.
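
Validating spreadsheet entries against predefined options is straightforward to sketch. The example below checks each row of a CSV export against allowed values. The reagent types and the five data levels come from the description above, but the column names, the CSV layout, and the sample rows are hypothetical; the actual entry masks are those in the MITI supplementary material.

```python
import csv
import io

# Hypothetical columns; allowed values follow the reagent types and the
# five data levels described in the MITI guidelines.
ALLOWED_VALUES = {
    "reagent_type": {"antibody", "aptamer", "peptide", "dye"},
    "data_level": {"1", "2", "3", "4", "5"},
}


def validate_rows(csv_text: str) -> list[str]:
    """Check each row of a spreadsheet export against the predefined entries."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, allowed in ALLOWED_VALUES.items():
            value = (row.get(column) or "").strip()
            if value not in allowed:
                errors.append(f"row {row_number}: {column}={value!r} not in {sorted(allowed)}")
    return errors


sheet = "marker,reagent_type,data_level\nCD3,antibody,2\nKi67,nanobody,6\n"
for error in validate_rows(sheet):
    print(error)
```

Reporting spreadsheet row numbers rather than raising on the first problem lets a researcher fix all entries in one pass, which is roughly what an entry mask with built-in validation aims for.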

MIHCSME (Minimum Information for High Content Screening Microscopy Experiments)

Published in 2023, these minimum information guidelines and templates address metadata annotation for high content screening (HCS) microscopy experiments. HCS mostly involves automated imaging of many individual wells, often across many plates with differently treated samples, so a large amount of data is generated relatively quickly. HCS typically involves automated analysis of suitable image read-outs, such as the cellular phenotype or the fluorescence intensity of a marker protein. To make sense of such data, each well and each plate must be described exactly with suitable metadata. So far, such annotations are not well standardized, which makes it difficult to compare screens from unrelated sources. The Image Data Resource (IDR) public repository has invested considerable work in providing richly annotated HCS datasets from published studies for cross-study data mining.

Hosseini et al. (2023) propose a minimum information standard for HCS that is based on REMBI (see above) and integrates with the generic ISA framework (Investigation, Study, Assay). The authors provide an Excel-based tabular entry mask that helps researchers annotate relevant HCS metadata at the stage of experimental data acquisition. The template guides users not only by providing example values for the standardized annotation items (Keys) but also by offering a preselected list of ontology-derived standard vocabulary to choose from (Learn here what an ontology is with the help of the FAIR Cookbook by ELIXIR Europe).
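
The requirement that every well and every plate be described can be illustrated with a small nested data model. The sketch below arranges plates under the ISA levels Investigation, Study, and Assay and lists wells lacking required annotations. How MIHCSME actually maps plates and wells onto ISA is defined in the published templates; the mapping used here, the required per-well keys, and all example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Well:
    position: str                                   # e.g. "A01"
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class Plate:
    name: str
    wells: list[Well] = field(default_factory=list)


@dataclass
class Assay:  # assumption: one assay groups the plates of a screen run
    name: str
    plates: list[Plate] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list[Study] = field(default_factory=list)


# Hypothetical per-well keys; the real items are defined by MIHCSME.
REQUIRED_WELL_KEYS = ("Compound", "Concentration", "Cell line")


def unannotated_wells(inv: Investigation) -> list[str]:
    """List 'plate:well' positions that lack any of the required per-well keys."""
    return [
        f"{plate.name}:{well.position}"
        for study in inv.studies
        for assay in study.assays
        for plate in assay.plates
        for well in plate.wells
        if any(not well.annotations.get(k) for k in REQUIRED_WELL_KEYS)
    ]


ok = Well("A01", {"Compound": "DMSO", "Concentration": "0.1%", "Cell line": "HeLa"})
incomplete = Well("A02", {"Compound": "hypothetical compound X"})
screen = Investigation("Example screen", [
    Study("Study 1", [Assay("Assay 1", [Plate("Plate 1", [ok, incomplete])])]),
])
print(unannotated_wells(screen))  # ['Plate 1:A02']
```

With thousands of wells per screen, an automated completeness check like this is what makes the “describe every well” requirement practical rather than aspirational.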

What is the best standard to use?

The answer to this question is both encouraging and frustrating: it depends on your research, your needs, and the extent to which you would like to make your data reusable. If, for example, your research is part of a larger collaborative consortium creating high-value datasets, agreeing on a specific metadata standard will be necessary. The flexibility is great, but uncertainty about which standard is suitable can be frustrating. What all of these guidelines certainly offer is a great starting point for researchers to think about how to get the most out of their data, both for themselves and for other researchers once the data is published.