Overview of metadata for bioimaging in the context of linked data

The world of metadata might seem confusing, especially for researchers who have not previously focused in detail on this aspect of research. Metadata annotation is important not only for organizing and curating the data of individual researchers or research collaborations; at a higher level, metadata curation contributes to the ultimate goal of increasing the collective knowledge generated by scientific research. This is the goal of what has been coined the “semantic web”: to connect knowledge (instead of merely linking documents) over the internet.

To get a better picture of the meaning of metadata for bioimaging in the context of linked (open) data, and to learn about the technical background associated with this topic, a glossary poster was created by J. Dohle and S. Kunis, which is presented to you in this video:

Structuring of Data and Metadata in Bioimaging: Concepts and technical Solutions in the Context of Linked Data.

S. Kunis and J. Dohle, 2022.
Available for download at:
https://doi.org/10.5281/zenodo.7018928

See also: information on research data management.

The video demonstrates that metadata encompasses various aspects, with different levels of complexity and depth of knowledge. Here, we focus on a pragmatic approach: How can researchers annotate metadata in their daily work in a user-friendly way, while simultaneously enhancing their metadata annotation to move towards FAIR, machine-readable metadata?

Metadata Annotation Tools for Bioimaging, and Tips and Tricks

Vendor-specific file formats are typically optimized for image acquisition at the microscope. In most cases, the image files contain metadata, which is the essential supplementary information required to understand and work with the measurement data. For example, pixel size, laser intensities, light path information, and similar details are stored as technical metadata. In imaging data, the metadata is often contained in the header of an image file. However, proprietary file formats may differ significantly in their metadata structure. For accessibility and interoperability, it is vital that metadata can be read correctly and completely. The Bio-Formats library is an example of software designed to translate file formats from proprietary to open, mapping the metadata to the well-defined metadata model of an open file format (e.g., OME-TIFF).
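As a minimal sketch of what technical metadata in an open format looks like to a program, the following Python snippet parses a hand-written, heavily shortened OME-XML header and reads the pixel size. The sample header and the function name read_pixel_size are made up for illustration. The element and attribute names (Pixels, PhysicalSizeX, PhysicalSizeY) follow the OME-XML schema, and the 2016-06 schema namespace is assumed here. A real file converted with Bio-Formats would contain far more detail.

```python
import xml.etree.ElementTree as ET

# Namespace of the OME-XML schema (assumption: 2016-06 release).
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

# Hypothetical, heavily shortened OME-XML header, for illustration only.
SAMPLE_OME_XML = """
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint16"
            SizeX="512" SizeY="512" SizeZ="1" SizeC="1" SizeT="1"
            PhysicalSizeX="0.325" PhysicalSizeY="0.325"/>
  </Image>
</OME>
""".strip()


def read_pixel_size(ome_xml: str) -> dict:
    """Return the physical pixel size of the first image in an OME-XML string."""
    root = ET.fromstring(ome_xml)
    pixels = root.find("ome:Image/ome:Pixels", OME_NS)
    if pixels is None:
        raise ValueError("No Pixels element found in the OME-XML header")
    return {
        "x": float(pixels.attrib["PhysicalSizeX"]),
        "y": float(pixels.attrib["PhysicalSizeY"]),
        # The unit attribute is optional; micrometre is used as the fallback.
        "unit": pixels.get("PhysicalSizeXUnit", "µm"),
    }


if __name__ == "__main__":
    print(read_pixel_size(SAMPLE_OME_XML))
```

Because the pixel size sits in a well-defined place of the metadata model, any program can retrieve it without knowing which microscope vendor produced the original file.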

For data to add value by containing rich, machine-readable metadata, more information needs to be captured beyond “just” the technical details. Information about the specimen, measured entities, sample preparation methods, and similar aspects is required to interpret the content of the image. Traditionally, this information has been documented in the researcher’s lab notebook. If properly published in the methods section of a paper, in theory, another researcher can understand, reuse, or reproduce the dataset. However, in a world full of data, human-readability alone limits what can be done with the data. In a FAIR data world, datasets are richly annotated with metadata in a way that allows computers to find, read, and interpret the information.

How can researchers add such relevant and important metadata directly to images, especially if the information is contained in the image header? Metadata must not only be curated but must also comply with interpretable structures. Therefore, structured annotations are essential for ensuring machine interoperability with metadata. To prevent researchers from having to write metadata in complex, coding-like ways (e.g., creating an XML file), software tools have been developed to facilitate the capture and editing of metadata. Below, three tools are introduced that have been developed in recent years and are still being actively improved:

  • MDE.mic is an editor that reads the original metadata using Bio-Formats and allows users to write new metadata into the metadata container, guided by fields and ontology masks. Embedded in the image data management software OMERO (as OMERO.mde), this tool allows users to edit the metadata during image upload with the OMERO.insight client on the researcher’s computer.
    Further information:

  • MicroMetaApp is a specialized app to capture technical metadata from any customized microscope setup that the user configures. In a graphical user interface that provides a virtual representation of a microscope, a user can drag and drop individual microscopy hardware items, annotate each of them with all relevant information, and then combine this information into a comprehensive metadata file for export. The metadata is stored according to the 4DN-BINA-OME metadata model.
    Further information:

  • MethodsJ2 is designed to ease drafting a comprehensive methods section based on the metadata contained in the imaging file. MethodsJ2 is a script that runs in the widely used, open-source image analysis software ImageJ/Fiji. It can build on metadata sources from the MicroMetaApp and from the original image files using Bio-Formats.
    Further information:

Practical metadata annotation in OMERO.web

In addition to the tools mentioned above, a relatively straightforward way to curate metadata for bioimaging data files is by adding metadata directly within the OMERO.web interface. This option is available if you have access to an institutional OMERO instance for bioimage data management. OMERO.web allows users to include structured metadata annotations in the form of Tags and Key-Value Pairs.

Tags are particularly useful for organizing and structuring data. Think of tags as a replacement for deep folder hierarchies. Each tag represents an attribute, and by combining tags, you can group data logically, similar to organizing them into nested folders on a computer. The key advantage: tags are far more flexible. You can easily change the combination of attributes, effectively simulating different folder hierarchies that may better suit specific ways of viewing the data.
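The idea of tags as flexible "virtual folders" can be sketched in a few lines of Python. The image names and tags below are hypothetical, and the snippet does not use the OMERO API; it only illustrates why combining tags is more flexible than a fixed folder hierarchy.

```python
# Hypothetical images, each annotated with a set of tags.
IMAGES = {
    "img_001.tif": {"mouse", "liver", "confocal"},
    "img_002.tif": {"mouse", "kidney", "confocal"},
    "img_003.tif": {"zebrafish", "liver", "widefield"},
}


def images_with_tags(images: dict, required) -> list:
    """Return the names of all images that carry every tag in `required`."""
    required = set(required)
    # `required <= tags` is True when `required` is a subset of `tags`.
    return sorted(name for name, tags in images.items() if required <= tags)


if __name__ == "__main__":
    # View 1: behaves like a folder "mouse/confocal".
    print(images_with_tags(IMAGES, {"mouse", "confocal"}))
    # View 2: behaves like a folder "liver", across organisms and microscopes.
    print(images_with_tags(IMAGES, {"liver"}))
```

The same three images appear in different groupings depending on the tag combination, without moving or copying any file.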

Key-Value Pairs, as the name suggests, consist of two components: a key and a value. The key represents a real-world object or abstract concept, while the value specifies it further. For example, if you are working with cells derived from different mouse strains, the Key-Value Pairs could look like this:

Key: Organism strain Value: C57BL/6J

This combination allows both humans and machines to interpret the metadata as: “The mouse strain (used here) is C57BL/6J.”
Find more detailed training on how the I3D:bio team recommends using Tags and Key-Value Pairs in “I3D:bio’s OMERO training material”, published on Zenodo.
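To make the structure of Key-Value Pairs concrete, here is a small Python sketch. It models the pairs as an ordered list so that a key may occur more than once; that modelling choice, the class name KeyValuePair, and the second pair ("Organism") are assumptions made for illustration, not a description of how OMERO stores annotations internally.

```python
from typing import NamedTuple


class KeyValuePair(NamedTuple):
    key: str
    value: str


# The first pair is the example from the text; the second one is hypothetical.
ANNOTATIONS = [
    KeyValuePair("Organism strain", "C57BL/6J"),
    KeyValuePair("Organism", "Mus musculus"),
]


def find_values(pairs, key: str) -> list:
    """Return all values annotated under `key`, in their original order."""
    return [pair.value for pair in pairs if pair.key == key]


def as_sentence(pair: KeyValuePair) -> str:
    """Render a pair the way a human would read it."""
    return f"The {pair.key.lower()} is {pair.value}."


if __name__ == "__main__":
    print(find_values(ANNOTATIONS, "Organism strain"))
    print(as_sentence(ANNOTATIONS[0]))
```

A program can look up a value by its key, while a human can still read each pair as a short statement.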

Beginner’s Guide to REMBI, Ontologies, and Standard Terms

We can now take metadata annotation one step further. While structured metadata allows both humans and machines to locate and interpret metadata, how can we ensure that both understand the meaning of the metadata? Furthermore, what keys should humans or machines look for to find specific details about an experiment?

Two key concepts address these questions:

1) Standard metadata checklists to unify the information that should be made available.

2) Defined standard terms for annotation, ensuring that the meaning of terms is unambiguous.

Standard Metadata Checklists

Community-driven metadata checklists make it easier for third-party users of original data to understand the dataset. Such standards depend on the research field and the technology used. Peer groups and communities of specialists determine what information is important and should always accompany publicly available data.

The “Recommended Metadata for Biological Images” (REMBI) checklist is an example. REMBI is not a formal computer standard but rather a collection of information items organized into groups. Together, these groups give a clear picture of what the data represents. For instance, REMBI suggests using the key “Biological Entity” to describe what is being investigated in the image. When working with biological images, it is advisable to include this key along with a value that best describes your data. On top of REMBI, you can add as many further Key-Value Pairs as you need. Additional (minimum) recommendations may be available, depending on the field of research or the research technique. Find a list here. When in doubt, use at least REMBI.
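A checklist lends itself to a simple automated check: before sharing a dataset, verify that every expected key is present and filled in. The following Python sketch shows the idea. Only “Biological Entity” is taken from the text above; the other two keys and the function name missing_keys are placeholders chosen for illustration and are not the complete REMBI checklist.

```python
# Illustrative subset of checklist keys (assumption, not the full REMBI list).
CHECKLIST_KEYS = ["Biological Entity", "Organism", "Imaging Method"]


def missing_keys(annotations: dict, checklist) -> list:
    """Return the checklist keys that are absent or have an empty value."""
    return [key for key in checklist if not annotations.get(key, "").strip()]


if __name__ == "__main__":
    # Hypothetical annotations of one image.
    image_annotations = {
        "Biological Entity": "CD4+ T cell",
        "Organism": "",  # the key exists, but the value was never filled in
    }
    print(missing_keys(image_annotations, CHECKLIST_KEYS))
```

Treating an empty value like a missing key is a design choice of this sketch: a key without a value carries no information for a third-party user.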

Ontologies for Machine-Interpretable, Standard Terms

When we use a term like “image,” how can we ensure that its meaning is clear? In the context of microscopy data, “image” likely refers to a microscopy image. But the term “image” as an isolated value is ambiguous. An image could be a Docker image, i.e., a packaged, deployable snapshot of a piece of software. It could also be a synonym for the word “metaphor”. Or it could be the associations and connotations we attach to a person, of whom we have a certain “image” in the sense of a mental representation.

Using Key-Value Pairs, some terms may require detailed definitions to clarify their meaning. For example, the term “CD4+ T cell” may be immediately understood by immunologists as “a cell expressing the CD4 surface marker, which is a co-receptor molecule involved in T cell receptor activation via peptide-MHC-II interactions on antigen presenting cells.” However, more specific subtypes of CD4+ T cells exist, and the broader term may suffice depending on the level of detail required. We can, however, acknowledge that the descriptive information is much more interpretable than the term “CD4+ T cell” alone.

This is where ontologies come in. Ontologies are hierarchical collections of terms with precise definitions, relationships, and attributes. For example, within an ontology of blood cell types, “CD4+ T cell” is a subtype of lymphocytes, which are derived from the bone marrow in mammals. By referencing this ontology, a single term can convey extensive information about the entity it represents.
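How a single term can carry more information through its position in a hierarchy can be sketched with a toy example. The "is a" relations below are a simplified, hand-written hierarchy for illustration only; they are not copied from any published ontology, and real ontologies allow several parents per term and additional relation types.

```python
# Toy "is a" hierarchy: child term -> parent term (simplified assumption).
IS_A = {
    "CD4+ T cell": "T cell",
    "T cell": "lymphocyte",
    "lymphocyte": "leukocyte",
    "leukocyte": "blood cell",
}


def ancestors(term: str, is_a: dict) -> list:
    """Walk up the hierarchy and return all broader terms, nearest first."""
    chain = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:  # guard against accidental cycles in the toy data
            break
        chain.append(term)
        seen.add(term)
    return chain


if __name__ == "__main__":
    print(ancestors("CD4+ T cell", IS_A))
```

Annotating an image with "CD4+ T cell" therefore also makes it findable for someone searching for lymphocytes, provided the search tool follows the hierarchy.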

The Value of Ontologies in Metadata Annotation

Ontologies formalize domain knowledge, enabling computers to construct semantically connected knowledge graphs. By using ontology-compliant terms, researchers can ensure that the metadata is:

  • Non-ambiguous, with clearly defined meanings.
  • Machine-interpretable, enabling advanced data integration and analysis.

To implement this, researchers can use an ontology lookup service (see the guide to ontology lookup services or the TS4NFDI-based SemLookP) to find the appropriate terms and link them via their unique URI. This simple step can make data significantly more accessible and interoperable, which is a major step toward fulfilling the FAIR data principles.

Coming back to our example from above:

Key: Organism strain Value: C57BL/6J
Key: Organism strain term accession number Value: http://www.ebi.ac.uk/efo/EFO_0000606

The URI given as the value of “Organism strain term accession number” resolves to a clearly defined term in the Experimental Factor Ontology (EFO).
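For a program, the term URI is the part of the annotation that can be processed reliably. As a small sketch, the following Python function splits such a URI into the ontology prefix and the accession number. It assumes the common pattern in which the last path segment has the form PREFIX_NUMBER, as in the example above; the function name split_term_uri is hypothetical, and URIs of other ontologies may follow a different pattern.

```python
from urllib.parse import urlparse


def split_term_uri(uri: str) -> tuple:
    """Split a term URI into (ontology prefix, accession number).

    Assumes the last path segment looks like 'PREFIX_NUMBER'.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an http(s) URI: {uri!r}")
    local_id = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    prefix, separator, accession = local_id.partition("_")
    if not (prefix and separator and accession):
        raise ValueError(f"Unexpected term identifier: {local_id!r}")
    return prefix, accession


if __name__ == "__main__":
    prefix, accession = split_term_uri("http://www.ebi.ac.uk/efo/EFO_0000606")
    print(f"{prefix}:{accession}")
```

Storing the human-readable value and the term URI side by side keeps the annotation readable for people while giving machines an unambiguous identifier.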

Find more comprehensive guidance on how to use ontologies in Key-Value Pairs in OMERO in I3D:bio’s OMERO training material.