What is suitable storage for bioimaging data?
It’s Not Just About Space – It’s About Structure
Choosing the right storage solution is crucial, not just for keeping your data safe, but also for ensuring it remains accessible, shareable, and usable for years to come. The choice of storage should therefore be made in close collaboration with IT professionals. Most academic IT centers can provide centralized storage for their scientists, along with secure backup and network access. Don’t do it alone.
At a high level, we can distinguish a few storage types.
- Tape storage: Data is physically written to tapes, which can be kept over the long term at low energy and maintenance cost. This storage type is often used in academic libraries when the purpose is to hold a faithful record of the data over the long term. However, data on tape is not readily accessible. To work with it again, the data is typically transferred back from tape to a more accessible and performant storage type. This is why tape storage is often referred to as “cold” storage: the data is frozen away and is not meant to be taken out of the freezer and put back in frequently.
- File storage: To actively work with data, i.e., to frequently load it into a computer program, analyze it, etc., the data needs to be stored on a readily accessible storage type. File storage systems are the classical storage type that allows frequent access, which is why such systems are often referred to as “warm” or “hot” storage. Most researchers are used to working with file storage, since it is also the type of storage built into PCs and laptops. File storage can reside on local disks, network-attached storage (NAS), or high-performance computing clusters.
- Object storage: Another readily accessible storage type is object storage. If the file format allows it (e.g., OME-Zarr), object storage makes it possible to access data, and subsets of the data, readily through a network (or the Internet). An example is S3 storage, named after the Amazon Web Services (AWS) Simple Storage Service (S3). In this type of storage, data is hosted as units (objects), each with its own metadata and identifier. This allows independent access to pieces of data that would classically be wrapped together in a larger file format container.
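The idea of reading only part of an image from object storage can be illustrated with a small sketch. It is a toy model, not the OME-Zarr or S3 API: a plain Python dictionary stands in for the object store, and the chunk size, the key layout ("prefix/row.column"), and the function names are all assumptions made for illustration. It also assumes that the image dimensions are multiples of the chunk size and that pixel values fit into one byte.

```python
from typing import Dict, List, Tuple

CHUNK = 4  # hypothetical chunk edge length in pixels


def write_chunks(image: List[List[int]], store: Dict[str, bytes], prefix: str) -> None:
    """Split a 2D image into CHUNK x CHUNK tiles and store each tile as its own object."""
    height, width = len(image), len(image[0])
    for cy in range(0, height, CHUNK):
        for cx in range(0, width, CHUNK):
            tile = [row[cx:cx + CHUNK] for row in image[cy:cy + CHUNK]]
            # Hypothetical key layout: one key per tile, addressed by tile row and column.
            key = f"{prefix}/{cy // CHUNK}.{cx // CHUNK}"
            store[key] = bytes(value for row in tile for value in row)


def read_region(
    store: Dict[str, bytes], prefix: str, y0: int, y1: int, x0: int, x1: int
) -> Tuple[List[List[int]], List[str]]:
    """Read the pixels in rows y0..y1-1 and columns x0..x1-1, fetching only the tiles needed."""
    fetched = []
    region = [[0] * (x1 - x0) for _ in range(y1 - y0)]
    for cy in range(y0 // CHUNK, (y1 - 1) // CHUNK + 1):
        for cx in range(x0 // CHUNK, (x1 - 1) // CHUNK + 1):
            key = f"{prefix}/{cy}.{cx}"
            data = store[key]  # in a real object store this would be one network request
            fetched.append(key)
            for i in range(CHUNK):
                for j in range(CHUNK):
                    y, x = cy * CHUNK + i, cx * CHUNK + j
                    if y0 <= y < y1 and x0 <= x < x1:
                        region[y - y0][x - x0] = data[i * CHUNK + j]
    return region, fetched


# An 8 x 8 test image whose pixel value encodes its position.
image = [[y * 8 + x for x in range(8)] for y in range(8)]
store: Dict[str, bytes] = {}
write_chunks(image, store, "image/0")

# Reading a 2 x 4 corner touches one of the four stored objects.
region, fetched = read_region(store, "image/0", 0, 2, 4, 8)
print(fetched)
print(region)
```

With a monolithic file, the same request would typically mean opening or transferring the whole container first. Here the cost scales with the number of tiles that overlap the requested region.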
Each storage type has its advantages and disadvantages, depending on the specific use. Therefore, it is essential that core facilities, IT centers, and scientists consider the entire data lifecycle, from acquisition through processing and analysis workflows to public data sharing and long-term preservation.
What is the optimal storage for bioimaging data?
Bioimaging encompasses a diverse range of technologies and applications. Depending on the specific image data, any of the above-mentioned storage solutions, or a combination of them, can be most suitable. When approaching IT support, make sure to involve your core facility professionals as well, and be prepared to answer the following questions:
- How large are individual files?
- How many files typically belong to one dataset?
- What is the file format?
- How many files or datasets do you expect, and over which period of time?
- Does data need to be moved frequently?
- How many people need access to the data, from which locations (at work, at collaboration partners, from home, through a VPN), and for which purposes?
- How and on which machines (PCs, workstations, cloud computing clusters) is the data processed?
- Where should the data be ultimately archived?
- How much manual metadata enrichment is required?
This information will help start the conversation and ensure you’re on the right track when deciding which storage options to pursue.
When in doubt, seek support from the I3D:bio team or the NFDI4BIOIMAGE Help Desk.
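The answers about file size, files per dataset, and expected number of datasets can be turned into a rough volume estimate before talking to IT. The following is a minimal back-of-the-envelope sketch. The class and function names are hypothetical, and all numbers are invented example values, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionProfile:
    """Hypothetical summary of the answers to the sizing questions above."""
    file_size_gb: float       # typical size of one file
    files_per_dataset: int    # files that belong to one dataset
    datasets_per_month: int   # expected acquisition rate
    months: int               # planning period
    copies: int = 1           # e.g. 2 if a backup copy is counted as well


def estimate_total_tb(profile: AcquisitionProfile) -> float:
    """Estimate the total raw data volume in terabytes (decimal, 1 TB = 1000 GB)."""
    total_gb = (
        profile.file_size_gb
        * profile.files_per_dataset
        * profile.datasets_per_month
        * profile.months
        * profile.copies
    )
    return total_gb / 1000


# Invented example: 2 GB files, 50 files per dataset, 10 datasets per month,
# planned over 3 years, with one backup copy.
example = AcquisitionProfile(
    file_size_gb=2.0,
    files_per_dataset=50,
    datasets_per_month=10,
    months=36,
    copies=2,
)
print(f"Estimated volume: {estimate_total_tb(example):.1f} TB")
```

Such an estimate covers only raw data. Processed results, intermediate files, and growth in acquisition rate would need to be added on top.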