NGFF: Democratize Access to Next-Generation Bioimaging Data

Project description

N-dimensional image data are essential in the life sciences, where they are used extensively to measure and analyze the structure, function, and behavior of biological systems, and they often form the foundation of research published in peer-reviewed journals. However, sharing and publishing such data is challenging: numerous file formats, large file sizes, and long download times mean that what is typically disseminated is a low-quality representation rather than the original data. The Open Microscopy Environment (OME) consortium has addressed the format problem by developing the Bio-Formats library, which provides reverse-engineered support for a wide variety of image file formats.

Recognizing the growing need for scalable, cloud-based solutions, OME initiated the development of NGFF (“next-generation file format”) in 2019. Its implementation, OME-Zarr, is built on the Zarr format and enables efficient storage and retrieval of large imaging datasets across cloud providers. OME-Zarr stores each image as a multiscale pyramid of progressively lower resolutions, so that, much as with Google Maps, a client downloads only the resolution level and region it needs to display rather than the full dataset.
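The multiscale idea can be illustrated with a short, standard-library-only Python sketch. It assumes a single 2D (y, x) image, 2x downsampling per level, a hypothetical 0.5 µm pixel size, and a stopping size of 256 pixels; the helper names (`pyramid_shapes`, `multiscales_metadata`, `choose_level`) are hypothetical and not part of any OME library. The metadata dictionary is modelled on the v0.4 `multiscales` layout, but the OME-Zarr specification remains the authority on which fields are required. The sketch shows how a viewer could pick the coarsest pyramid level that still fills a 1920×1080 screen instead of downloading the full-resolution image.

```python
import json
import math
from typing import Dict, List, Tuple


def pyramid_shapes(shape_yx: Tuple[int, int], min_size: int = 256) -> List[Tuple[int, int]]:
    """Hypothetical helper: halve y/x until the larger side is <= min_size."""
    shapes = [shape_yx]
    y, x = shape_yx
    while max(y, x) > min_size:
        # Each pyramid level is 2x coarser than the previous one (rounded up).
        y, x = math.ceil(y / 2), math.ceil(x / 2)
        shapes.append((y, x))
    return shapes


def multiscales_metadata(n_levels: int, pixel_size_um: float) -> Dict:
    """Hypothetical helper: a multiscales block modelled on OME-Zarr v0.4 (yx only)."""
    datasets = []
    for level in range(n_levels):
        factor = 2 ** level  # physical pixel size doubles at each coarser level
        datasets.append({
            "path": str(level),  # sub-array holding this resolution level
            "coordinateTransformations": [
                {"type": "scale", "scale": [pixel_size_um * factor, pixel_size_um * factor]}
            ],
        })
    return {
        "multiscales": [{
            "version": "0.4",
            "axes": [
                {"name": "y", "type": "space", "unit": "micrometer"},
                {"name": "x", "type": "space", "unit": "micrometer"},
            ],
            "datasets": datasets,
        }]
    }


def choose_level(shapes: List[Tuple[int, int]], viewport_yx: Tuple[int, int]) -> int:
    """Return the coarsest level whose y and x extents both cover the viewport.

    Falls back to level 0 (full resolution) if no level is large enough.
    """
    best = 0
    for level, (y, x) in enumerate(shapes):
        if y >= viewport_yx[0] and x >= viewport_yx[1]:
            best = level
    return best


if __name__ == "__main__":
    shapes = pyramid_shapes((20000, 30000))
    level = choose_level(shapes, (1080, 1920))
    full = shapes[0][0] * shapes[0][1]
    chosen = shapes[level][0] * shapes[level][1]
    print(len(shapes), level, shapes[level], full // chosen)  # 8 3 (2500, 3750) 64
    meta = multiscales_metadata(len(shapes), pixel_size_um=0.5)
    print(json.dumps(meta["multiscales"][0]["datasets"][level]))
```

For a whole-image overview on a 1920×1080 display, the chosen level holds 64 times fewer pixels than level 0. A real viewer would additionally request only the Zarr chunks that intersect the visible region, further reducing the download when zoomed in.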

Interest in NGFF as a universal standard for bioimaging data is growing, with institutes, open-source developers, and commercial vendors seeking a unified way to define and share bioimaging data. The proposed project will fund a dedicated position to lead efforts to make bioimaging data open and FAIR (findable, accessible, interoperable, and reusable). This role will involve coordinating the development of new specifications, fostering a consensus-based community process, and establishing a sustainable governance structure. The position holder will also liaise with key players in the field, including major institutions and data projects such as HuBMAP, Janelia, and EMBL, to encourage widespread publication of imaging data in the open NGFF format across various platforms, including Zenodo, commercial clouds, and national supercomputing centers. By formalizing imaging as a defined Zarr data type, this initiative aims to create a resilient, self-sustaining ecosystem for sharing bioimaging data and to promote open access to large, complex datasets in the life sciences community.

Image: “Multiple clients” by Henning Falk, ©2022 NumFOCUS, is used under a CC BY 4.0 license.