What’s the issue with classical file formats?
To open and process an image file, a computer has to load it into memory. Large files can, for example, be subdivided into planes, so that only the required planes of a stack need to be loaded. However, accessing any part of a plane still means loading the whole plane. Loading data that cuts across many planes, or at user-defined angles, becomes all but impossible because some files (e.g., in volume EM imaging, light-sheet microscopy, or multiplexed imaging) are very large, from gigabytes to terabytes. Processing terabyte-sized image files is beyond what most scientists’ computers can do. Powerful workstations or high-performance computing clusters offer a solution, but they are neither widely accessible nor easy to use. Moreover, such files are very difficult to access remotely, since the data has to be transferred over a network. Furthermore, these classical file formats are not well suited to object storage such as S3.
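To give a feeling for the scale of the problem, here is a rough back-of-the-envelope sketch in Python. The volume dimensions and the 16-bit pixel depth are illustrative assumptions, not values from a particular dataset, and the function name is hypothetical. The sketch compares how many bytes must be read from a plane-wise file to obtain one XY plane versus one XZ cross-section that cuts through every plane.

```python
def bytes_read_plane_wise(shape_zyx, bytes_per_pixel, view):
    """Bytes read from a file that can only be loaded one whole XY plane at a time.

    shape_zyx: (z, y, x) size of the stack (assumed example values below).
    view: "xy" for a single plane, "xz" for a cross-section through all planes.
    """
    z, y, x = shape_zyx
    plane_bytes = y * x * bytes_per_pixel
    if view == "xy":
        return plane_bytes          # one plane holds everything we need
    if view == "xz":
        return z * plane_bytes      # every plane must be loaded in full
    raise ValueError(f"unknown view: {view!r}")


# Assumed example: 2000 planes of 8000 x 8000 pixels, 16 bit (2 bytes) each.
shape = (2000, 8000, 8000)
xy_bytes = bytes_read_plane_wise(shape, 2, "xy")
xz_bytes = bytes_read_plane_wise(shape, 2, "xz")
useful_xz_bytes = shape[0] * shape[2] * 2   # pixels actually shown in the XZ view

print(f"XY plane: {xy_bytes / 1e6:.0f} MB read")
print(f"XZ slice: {xz_bytes / 1e9:.0f} GB read for {useful_xz_bytes / 1e6:.0f} MB of pixels")
```

With these assumed numbers, a single plane costs 128 MB, while the XZ cross-section requires reading the full 256 GB stack to display only 32 MB of pixels.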
Why are next-generation file formats a solution?
An important difference between a classical file format and a next-generation file format (NGFF) is that the latter makes it easy to access and process only the part of an image that is of interest at a given time. This is possible because the file structure is different. Instead of the so-called monolithic organization found in classical formats, an NGFF breaks the data down into a multitude of small pieces (chunks) that are logically bound together to represent the whole image, and each chunk can be accessed on its own. In other words, a next-generation file format stores chunked N-dimensional arrays. Thus, a multi-dimensional image can be accessed along any dimension, loading only the chunks of interest. This feature also makes NGFFs “cloud-ready” formats: data chunks of a large file can be streamed instead of transferring the whole file. Find more information on NGFF provided by the OME team here or watch a talk by C. Tischer introducing NGFF for bioimaging here.
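The idea of loading only the chunks of interest can be sketched in a few lines of Python. This is a conceptual illustration only, not the API of Zarr or any other library, which handle this lookup internally. The function name, the 64 x 64 x 64 chunk size, and the volume dimensions are assumptions chosen for the example.

```python
from itertools import product


def chunks_for_region(region, chunk_shape):
    """Return the grid indices of all chunks that overlap a region.

    region: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk size per dimension.
    """
    per_axis = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(region, chunk_shape)
    ]
    return list(product(*per_axis))


# Assumed example: a (z, y, x) volume stored in 64 x 64 x 64 chunks.
chunk_shape = (64, 64, 64)
volume_shape = (2000, 8000, 8000)

# An XZ cross-section at y = 4000, restricted to a 512-pixel-wide window in x.
region = ((0, 2000), (4000, 4001), (1000, 1512))
needed = chunks_for_region(region, chunk_shape)

total_chunks = 1
for size, chunk in zip(volume_shape, chunk_shape):
    total_chunks *= -(-size // chunk)   # ceiling division

print(f"{len(needed)} of {total_chunks} chunks are needed")
```

Under these assumptions the cross-section touches 288 of 500000 chunks, so only a tiny fraction of the data has to be read or streamed, regardless of the direction in which the image is cut.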
A graphical representation of the difference between monolithic, classical files and NGFF (here in the form of a Zarr directory) is depicted below in a comic by H. Falk: