Bioimaging File Formats Explained

A computer’s most basic unit of information is a simple distinction between two states: on or off. This is represented as a binary digit (or bit) with a value of either 1 (on) or 0 (off).

To store more complex information, multiple bits are combined. For example, 8 bits together make up a “byte”, which is a standard unit of data storage in computer memory. Since a byte consists of 8 bits, it can represent 256 different values (because 2^8 = 256). Each of these 256 possible values can be assigned a specific meaning. For example, in the ASCII (American Standard Code for Information Interchange) system:

  • The lowercase letter “a” is stored as 01100001 in binary.
  • The digit “1” is stored as 00110001 in binary.

Similarly, a byte can also represent a number. In an 8-bit binary system:

  • 00000001 represents the decimal number 1.
  • 11111111 represents the decimal number 255.

To store even larger numbers or more complex data, computers use larger chunks of bits, such as 12-bit, 16-bit, 32-bit, or even 64-bit storage. The more bits available, the more possible values can be represented.
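
These relationships can be checked directly in a few lines of Python (used here purely for illustration; none of the values are specific to bioimaging, and Python drops leading zeros when printing binary numbers):

    # Binary representations of the examples above.
    print(bin(ord("a")))       # 0b1100001 -> ASCII code of "a" (01100001)
    print(int("00110001", 2))  # 49        -> ASCII code of the digit "1"
    print(int("11111111", 2))  # 255       -> largest value an 8-bit byte can hold

    # The number of representable values doubles with every additional bit:
    for bits in (8, 12, 16, 32, 64):
        print(f"{bits}-bit storage: {2**bits:,} possible values")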

Computers store, process, and transmit information using sequences of bits (1s and 0s). However, computer scientists and programmers don’t work directly with binary code when writing programs or managing data. Instead, they use programming languages and structured data formats. For example, a table is structured with rows and columns, meaning it exists in at least two dimensions. However, when a computer stores or transmits this table, it must convert it into a sequence of bits.

This process is called serialization—it transforms structured data into a linear sequence of bits so that it can be saved, transmitted, and later reconstructed by another computer. To ensure the original structure and relationships in the data are preserved during this process, different serialization formats are used. Some common serialization formats include:

  • JSON (JavaScript Object Notation)
  • XML (Extensible Markup Language)
  • YAML (YAML Ain’t Markup Language)

For example, if a program on one computer serializes data in JSON format, another computer can read and deserialize it using the same JSON rules to restore the original structure.
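
As a minimal illustration, the following Python sketch serializes a small, hypothetical table to JSON and restores it again (the file and column names are invented for the example):

    import json

    # A table-like structure: rows and columns, i.e., at least two dimensions.
    table = {
        "columns": ["filename", "channel", "exposure_ms"],
        "rows": [["img_001.tif", "DAPI", 50], ["img_002.tif", "GFP", 120]],
    }

    # Serialization: the structured data becomes a linear sequence of bytes.
    payload = json.dumps(table).encode("utf-8")

    # Deserialization: another program restores the original structure
    # by applying the same JSON rules.
    restored = json.loads(payload.decode("utf-8"))
    assert restored == table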

Data models vs. file formats

The data model specifies how information is subdivided into pieces and how these pieces are semantically structured to best represent the full information content. In bioimaging, a data model is a scheme defining how all necessary pieces of information about a microscopy acquisition result (a single image or multiple images) are structured, so that all information about the acquired image(s) can be retrieved. In other words, the data model is the abstract set of rules defining how information from the outside world is represented inside a computer, with specific terms and relationships, in a machine-readable way.

To make use of a data model, it must be defined how its set of rules is implemented in the form of bits and bytes in a computer. This technical implementation of a data model is defined by the file format. In other words, file formats are containers describing how the information is organized so that a computer can read, exchange, store, and visualize it with the appropriate software. Different file formats can hence use the same data model, each encoding it differently for the computer.
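
This distinction can be illustrated with a deliberately simplified, hypothetical data model — a toy “image” with three attributes — encoded once as JSON and once as XML. The model stays the same; only the encoding differs:

    import json
    import xml.etree.ElementTree as ET

    # A toy, hypothetical data model: an image with a name, bit depth, channels.
    image = {"name": "example", "bit_depth": 16, "channels": 3}

    # Encoding 1: the model written as JSON.
    as_json = json.dumps(image)
    # {"name": "example", "bit_depth": 16, "channels": 3}

    # Encoding 2: the very same model written as XML.
    root = ET.Element("Image", {key: str(value) for key, value in image.items()})
    as_xml = ET.tostring(root, encoding="unicode")
    # <Image name="example" bit_depth="16" channels="3" />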

When a data model or a file format is used by the majority of computer applications within or across a specific field, it may be referred to as a standard.

“Standard” file formats and (meta)data models in bioimaging

Most research microscopes are supplied by industry vendors who build their instruments including the software necessary to control the microscope and record images. Naturally, each vendor would ensure that their system provides optimal performance to record data with their microscopes. This includes engineering the file formats in which their microscope software saves the recorded images. The multitude of different imaging modalities from many different microscope vendors has led to the creation of a vast number of different file formats for microscopy, many of which are “owned” by the companies and are not open to the community (i.e., they are proprietary file formats).

This means there is no universal standard for file formats or metadata in the field of bioimaging.
Converging on a standard or a set of standard file formats for bioimaging (Swedlow et al., 2021, Nat Meth) is a declared goal collaboratively pursued by the international bioimaging community, organized under the umbrella of the international exchange network Global BioImaging.

However, the OME data model with the OME-XML file format (Goldberg et al., 2005, Genome Biol) and the OME-TIFF file format (Linkert et al., 2010, J Cell Biol) have become de facto standards for many applications. Since many computer applications can read TIFF, the OME-TIFF file format allows quite broad use of the imaging data in different applications. OME-TIFF belongs to the classical file formats. At present, a new file format suitable for accessing data in cloud environments is under development. Its format specification is OME-NGFF (Moore et al., 2021, Nat Meth), which can be categorized as a so-called next-generation file format (NGFF). A concrete implementation of OME-NGFF is the file format OME-Zarr (Moore et al., 2023, Histochem Cell Biol). A high-level explanation of the strengths of Zarr is given in an Open Source Science Initiative article by J. Moore (2022).
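
To make the OME-TIFF side concrete: in Python, for example, the widely used tifffile library can read both the pixel data and the embedded OME-XML header of an OME-TIFF. A minimal sketch (the file name is hypothetical):

    import tifffile  # pip install tifffile

    # Hypothetical file name; any valid OME-TIFF would do.
    with tifffile.TiffFile("example.ome.tif") as tif:
        pixels = tif.asarray()       # image planes as a NumPy array
        ome_xml = tif.ome_metadata   # embedded OME-XML metadata (a string)

    print(pixels.shape, pixels.dtype)
    print(ome_xml[:200])  # start of the XML header describing the data model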

Microscope vendors often engineer their own specific file formats to work optimally with their instruments during recording, and with their own software for processing and analysis. Industry-owned, closed formats are called proprietary file formats (PFF). Examples of proprietary file formats in microscopy, with an open-source reading sketch after the list, are:

  • .czi (a file format by Zeiss)
  • .lif (a file format by Leica)
  • .nd2 (a file format by Nikon)
  • … and many more.
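
Open-source libraries such as Bio-Formats, or Python wrappers like AICSImageIO, can read many of these proprietary formats into a common structure. A sketch, assuming AICSImageIO and the matching format-specific reader plugins are installed (the file name is hypothetical):

    from aicsimageio import AICSImage  # pip install aicsimageio

    # The same call works for .czi, .lif, .nd2, and many other formats,
    # provided the matching reader plugin is installed.
    img = AICSImage("experiment.czi")  # hypothetical file name

    print(img.dims)   # dimension sizes, typically ordered TCZYX
    data = img.data   # pixel data as a NumPy array
    print(data.shape, data.dtype)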

In contrast to the PFFs from different vendors, the Open Microscopy Environment (OME) Consortium has developed the OME data model, a specification for storing biological imaging data, together with an open file format, OME-TIFF, which includes an OME-XML metadata header.

Classical file formats are written to hard drive or flash drive storage in such a way that the information from the image planes is organized in a linear sequence of bits. This way of storing the information has limitations when it comes to the dynamic use of only parts of the information stored in image files. Typically, for a computer program to use an image, the computer must load it into its temporary memory (RAM). Even if access is required for only a subset of a plane, or for subsets spread across different planes, all relevant planes still have to be loaded into RAM. Very large files take considerable amounts of time to load and require large RAM sizes, which are often not available to all users. Additionally, if data is transferred from one location to another, the full file must be transferred completely in one piece before the information is accessible and usable. Large files take a long time to transfer, impeding their readiness for access and use in shared or remote environments.

Next-generation file formats offer a different approach to storing the information, which allows relevant parts of a file (chunks of data) to be loaded dynamically. Essentially, this works by breaking the file down into a large number of small pieces that can be loaded individually or in combination, as needed (see the sketch below).
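
The following Python sketch shows this chunked access pattern with the zarr library. The store name and the TCZYX axis layout are assumptions for the example; "0" is conventionally the full-resolution level of an OME-Zarr multiscale image:

    import zarr  # pip install zarr

    # Hypothetical OME-Zarr store; "0" is conventionally the
    # full-resolution level of the multiscale image pyramid.
    root = zarr.open("example.ome.zarr", mode="r")
    level0 = root["0"]

    print(level0.shape, level0.chunks)  # the array is split into chunks on disk

    # Slicing reads only the chunks overlapping the requested region;
    # the rest of the (potentially huge) file never has to be loaded into RAM.
    region = level0[0, 0, 5, 1000:1512, 2000:2512]
    print(region.shape)  # (512, 512)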
An excellent introduction to the difference between classical and next-generation file formats in the context of bioimaging is given in a talk by C. Tischer during a Global BioImaging workshop in 2022.

(This page was last updated on March 6th, 2025)