A deeper dive into metadata structures, and the OME data model
How is metadata structured and stored in files?
Metadata, i.e., the data providing essential accessory information about data (like image pixel values), can be structured in various ways. A specification for this structuring is defined by a data model, which provides an abstract set of rules about how to describe concepts or objects from the real world inside the computer. In the field of bioimaging, data models based on the OME data model are of particular importance:
The OME data model is a specification of how to store metadata
The Open Microscopy Environment Consortium (OME) has defined an extensible data model for bioimaging (metadata), described in detail in the original publication from 2005. It has been updated since (see, e.g., Linkert et al., 2010). As opposed to genomic sequencing data, the specific context in which a microscopy image was generated is of high importance for interpreting the image. That means information on the instrument, including its variable hardware components like lasers, objectives, etc., but also sample details (cell type, state, experimental modifications of the specimen) and sample preparation (fixation, permeabilization, contrast method, staining procedures, etc.) needs to be described for full context. The technical and biological context is especially important if downstream image processing and analysis are applied to yield quantitative information. Traditionally, proprietary file formats from microscope vendors had to be converted into another format for image visualization, processing, and analysis, and essential metadata were often lost in the process. The OME data model was designed as a “natural broker among a multitude of otherwise incompatible software tools” (Goldberg et al. 2005), making it possible to preserve metadata throughout the image data life cycle.
For the OME data model, an image is defined as an (up to) 5D representation of recorded data:
- x and y positions of the pixels in the image (2D image plane)
- focal position z of the 2D image plane
- channels at each pixel position (e.g., three different fluorescent channels)
- time points at which images were taken in a time series
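The five dimensions listed above can be sketched in a few lines of Python. This is only an illustration, not part of the OME tooling: the class name ImageShape and its methods are hypothetical, and the plane ordering (z varies fastest, then channel, then time) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageShape:
    """Sizes of the five dimensions of one image (hypothetical helper)."""
    size_x: int  # pixels per row of a 2D plane
    size_y: int  # rows per 2D plane
    size_z: int  # focal positions
    size_c: int  # channels
    size_t: int  # time points

    def plane_count(self) -> int:
        # One 2D plane exists for every (z, c, t) combination.
        return self.size_z * self.size_c * self.size_t

    def plane_index(self, z: int, c: int, t: int) -> int:
        # Assumption: planes are stored with z varying fastest, then c, then t.
        for value, size in ((z, self.size_z), (c, self.size_c), (t, self.size_t)):
            if not 0 <= value < size:
                raise IndexError("plane coordinate out of range")
        return z + self.size_z * (c + self.size_c * t)


# Example values chosen for illustration only.
shape = ImageShape(size_x=512, size_y=512, size_z=10, size_c=3, size_t=20)
print(shape.plane_count())               # 600
print(shape.plane_index(z=4, c=1, t=2))  # 74
```

The point of the sketch is that a single "image" in this sense can consist of many 2D planes, each addressed by its focal position, channel, and time point.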
The model accounts for individual images, but also a hierarchical context between different images, e.g.:
- Project: Groups of datasets logically connected as part of a bigger investigation
- Dataset: Groups of images analyzed together (e.g. from the same experiment)
- Image: The 5D representation of a recorded experiment
- Pixels: The image points in space at which a specific value is recorded
- and more
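As a rough illustration of this hierarchy, the objects can be mirrored with Python dataclasses. This is a simplified sketch under stated assumptions: objects are nested directly inside each other, only a handful of fields are shown, and all names and values are hypothetical. The actual OME data model is considerably richer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pixels:
    # Dimension sizes of the recorded data; z, c and t default to 1.
    size_x: int
    size_y: int
    size_z: int = 1
    size_c: int = 1
    size_t: int = 1


@dataclass
class Image:
    name: str
    pixels: Pixels


@dataclass
class Dataset:
    name: str
    images: List[Image] = field(default_factory=list)


@dataclass
class Project:
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def image_names(self) -> List[str]:
        # Walk the hierarchy: project -> datasets -> images.
        return [img.name for ds in self.datasets for img in ds.images]


# Hypothetical example content.
project = Project(
    name="hypothetical investigation",
    datasets=[
        Dataset(
            name="experiment 1",
            images=[
                Image("cell_01", Pixels(size_x=512, size_y=512, size_c=3)),
                Image("cell_02", Pixels(size_x=512, size_y=512, size_c=3)),
            ],
        )
    ],
)
print(project.image_names())  # ['cell_01', 'cell_02']
```

Because every object has a defined place in the tree, a metadata item such as a channel count can always be traced back to the image, dataset, and project it belongs to.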
These definitions build the vocabulary that standardizes what each metadata item refers to. They are OME objects in the OME data model. Specific objects are also defined for images from a high-content screening experiment, in which multiple images are taken from multiple wells of a plate. Moreover, the model accounts for “features”, i.e., the marking of spatial regions such as the cell nucleus based on image analysis and segmentation (often referred to as “region of interest” or “ROI”). Beyond these descriptions of the images themselves, the OME data model includes additional context information, for example, the description of the experimenter, the research group, the microscope name, and so on.
An example of the hierarchical schema of the metadata captured by the OME data model is shown below for the branch of the object tree for the OME object “Image”. This picture is taken from the official OME documentation, where you can find a full description and overview of the model.
OME-XML and OME-TIFF as file formats based on the OME data model
To apply this concept in practice, the model is implemented in a serialization format, i.e., a way of encoding the information so that computers can read and transfer it. An OME-XML file is the implementation of the data model using the schema of the Extensible Markup Language (XML).
It consists of specific elements, written in a specific way, so that the computer “knows” what information is included here.
For example, written according to the XML schema, the acquisition date of an image could be encoded like this:
<AcquisitionDate>2008-06-19T00:39:00</AcquisitionDate>
Here, the term “AcquisitionDate” is the “markup” that tells the computer that the content inside this field is always the acquisition date. Whatever is not enclosed in < > (or some other defined characters) is the specific information saved for this data object.
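To illustrate how software can use such markup, the following minimal sketch reads the acquisition date with Python's standard library. The surrounding Image element, its ID and Name values are assumptions made for the example. Real OME-XML files also declare an XML namespace, which is left out here to keep the sketch short, so element lookups in a real file need namespace-qualified names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified, namespace-free fragment (illustrative only).
fragment = """
<Image ID="Image:0" Name="example">
  <AcquisitionDate>2008-06-19T00:39:00</AcquisitionDate>
</Image>
""".strip()

root = ET.fromstring(fragment)

# The tag name tells the program which piece of information it is reading.
raw = root.findtext("AcquisitionDate")

# The element content is the value itself, here an ISO 8601 timestamp.
acquired = datetime.fromisoformat(raw)

print(root.get("Name"), acquired.year, acquired.month, acquired.day)
```

The program never has to guess where the date is stored: it asks for the element by name and receives the value, which is what makes the format machine-readable.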
The OME-XML file format allows metadata to be stored in a standardized format with an open specification that can be read across many different software platforms. At the same time, the data model implemented in the file format allows the structured annotation of data with the help of databases. The OME data model underlies the functionality of OMERO, an open-source image data management platform developed by the OME consortium.
If you are interested in examples of what an OME-XML file looks like, please review the OME documentation, e.g. OME-XML for filters.
OME-TIFF is a TIFF image file format using OME-XML as its metadata header
Images recorded at microscopes are mostly written in proprietary file formats (e.g., .czi format by Zeiss, or .nd2 format by Nikon). In contrast, the OME-TIFF format (.ome.tif) is an open file format based on the Tagged Image File Format (TIFF). Thus, any program that can read TIFF can read the image, while the metadata can be extracted from the standardized OME-XML header.
The image below shows an example of a proprietary microscopy image file from an experiment using laser scanning confocal microscopy. Besides the actual image that is rendered on the screen from the pixel data, the image file contains essential metadata that allows the software to interpret and render the binary data of the recorded image. Using the Bio-Formats library (integrated in the software OMERO) the image was translated to OME-TIFF:
Core model and extensions of the model
Naturally, no single file format or specification of metadata items will be able to capture all the necessary information about the microscopy experiments that exist now and that will exist in the future. That is why the “extensibility” of the OME model is important to emphasise. To provide sufficient standardization, specific aspects of the model should be adhered to strictly. These are referred to as the core model. On top of that, adapted extensions of the model can be implemented by research communities, companies or interest groups in order to make the model work with their specific imaging modality or field of research.
The tiered NBO guidelines for extended metadata
Based on the OME data model, members of bioimaging communities worldwide have worked together to propose much more detailed guidelines on metadata specifications. These guidelines were mainly developed by members of the 4D-Nucleome project (4DN), members of BioImaging North America (BINA), and members of the group Quality Assessment and Reproducibility for Images in Light Microscopy (QUAREP-LiMi). Because they originate from 4DN and BINA and build on the OME data model, they were termed the NBO specifications (4DN-BINA-OME). In brief, these guidelines propose to extend the OME core model in a modular fashion. This allows the metadata to be tiered to different needs depending on the complexity of the imaging experiment that is described, while still providing a standardization framework as complexity increases. The details behind the NBO guidelines are described by Hammer et al., 2021.
How do researchers use these data models and guidelines in practice?
The data models, the metadata file formats, and the extended, tiered metadata guidelines provide a framework for storing metadata in a machine-readable, open, and interoperable way. However, researchers who are not familiar with computer science concepts, programming, or file format structures might find it difficult to make use of these concepts in their everyday research practice.
For this purpose, different tools and user-centric recommendations are being developed. Find more guidance towards annotating metadata in practice in the Metadata Guide and in I3D:bio’s OMERO training material.