NFDI4BIOIMAGE News
23.07.2025
#MeetOurDataStewards: Jens Wendt
Our Data Stewards are central to NFDI4BIOIMAGE’s Community Support. Get to know them! In our social media campaign, “Meet the Data Stewards”, we introduce one member of our NFDI4BIOIMAGE Data Steward Team every month. The #MeetTheDataSteward series continues with Jens Wendt, who gives us an insight into a help desk request he responded to as a data steward of NFDI4BIOIMAGE, which involved a number of complex tasks.
Research data management and FAIR data publication can appear challenging at first. At NFDI4BIOIMAGE, we understand that what sounds simple – uploading your data to a public archive along with sufficient metadata – is, in fact, a considerable effort. However, it is an effort that, once you are willing to make it, enables your data to live beyond its original publication. You’re helping to make science more reproducible and sustainable. You may open paths to reuse that you have not even thought of yet. And here is the good news: You’re not alone!
Our second featured data steward, Jens Wendt, is working at the Imaging Network at the University of Münster and has a background in Electrical Engineering, Information Technologies, and Biomedical Engineering. At NFDI4BIOIMAGE, his focus lies on administration, development and customization around OMERO, image data management, Python, and some image analysis tools.
Here, Jens tells us what he has been working on lately:
Let’s explore a genuine example of a help desk request that we received. I promise to be plain and open about the challenges we faced, which makes it all the more rewarding that together we succeeded in making a complex imaging dataset FAIR and, quite soon, publicly available.
The data: A research group imaged over a dozen 384-well plates with different chemical compounds in each well and analyzed the fluorescence signal to draw conclusions about the effects on cells. The data they generated comprises a robust corpus of valuable images. The research paper is about to be submitted and, thankfully, the authors are highly motivated to share their data publicly alongside the article. For this purpose, they approached the NFDI4BIOIMAGE help desk for support in sharing the data according to community-approved practices.
Getting the raw image data is the first step for every data publication. This ensures that nothing has been altered unknowingly, e.g., by JPEG compression artifacts, and that every pixel is present at its full bit depth. Even this step wasn’t straightforward in this case. The plates were imaged with an Operetta microscope from Revvity (formerly part of PerkinElmer) and the research group used the proprietary Columbus software for their internal image data management and analysis. Fortunately, the Columbus software offers a way to export the images as “raw” .tif files.
Compatibility with OMERO was the next question. As we decided to submit to the Image Data Resource (IDR), which is based on the image data management platform OMERO (Allan et al., 2012, Nat Methods), we needed to make sure that the ~5,700 exported single .tif files per plate would be correctly matched to their well, position and channel. The prerequisite for this is technical metadata, in a companion file or in each file header, that adheres to the OME schema and specifies these details. An easy way to check this is to use the Bio-Formats CLI (command line interface) tools or simply to try uploading the files into an OMERO instance (e.g., the NFDI OMERO demo instance). Thankfully, the required metadata was there and the .tif files were correctly arranged into images, wells and plates.
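Before running the Bio-Formats or OMERO check, a quick look at the file names alone can already reveal missing fields or channels. The following is a minimal sketch, not the procedure used for this dataset. The file name pattern (row, column, field, plane, channel encoded as "r01c02f03p01-ch2...") is an assumption, and the function names are made up for illustration; the regular expression needs to be adapted to what the export actually produces.

```python
import re
from collections import defaultdict

# ASSUMPTION: exported files are named like "r01c02f03p01-ch2sk1fk1fl1.tiff"
# (row, column, field, plane, channel). Adapt the pattern to the real export.
NAME_PATTERN = re.compile(
    r"r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p(?P<plane>\d+)-ch(?P<channel>\d+)"
)


def well_label(row, col):
    """Turn 1-based row/column indices into a label such as 'B3'."""
    return f"{chr(ord('A') + row - 1)}{col}"


def group_by_well(filenames):
    """Map each well label to the sorted (field, channel) pairs found for it.

    Returns the mapping and a list of names that did not match the pattern.
    """
    wells = defaultdict(list)
    unmatched = []
    for name in filenames:
        match = NAME_PATTERN.search(name)
        if match is None:
            unmatched.append(name)
            continue
        key = well_label(int(match["row"]), int(match["col"]))
        wells[key].append((int(match["field"]), int(match["channel"])))
    return {well: sorted(pairs) for well, pairs in wells.items()}, unmatched


def incomplete_wells(wells, n_fields, n_channels):
    """Return wells that do not contain every expected field/channel pair."""
    expected = [
        (f, c)
        for f in range(1, n_fields + 1)
        for c in range(1, n_channels + 1)
    ]
    return sorted(well for well, pairs in wells.items() if pairs != expected)
```

Such a check only looks at names. Whether the OME metadata itself is valid still has to be confirmed with Bio-Formats or a test import into OMERO.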
(Experimental) Metadata needs to be supplied to the IDR in the form of a structured spreadsheet. This makes the published image data searchable and links metadata terms to ontologies. The research group provided us with their internal, very well curated spreadsheet of several thousand rows of matched image names and chemical compounds. This had to be transferred into the format the IDR expects, which is best done with advanced Excel formulas and scripts or, alternatively, in Python.
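As a rough idea of what the Python route can look like, here is a minimal sketch that collapses an image-level sheet into one row per well. Everything specific in it is an assumption for illustration: the input columns "image_name" and "compound", the image naming scheme "plate01_B03_f01.tif", and the output headers. The real headers should be taken from the IDR submission templates.

```python
import csv
import io

# ASSUMPTION: placeholder output headers; use the ones from the IDR templates.
OUTPUT_COLUMNS = ["Plate", "Well", "Compound Name"]


def to_well_table(internal_csv_text):
    """Collapse image-level rows into one row per (plate, well).

    ASSUMPTION: the internal sheet has the columns "image_name" and
    "compound", and image names look like "plate01_B03_f01.tif".
    """
    reader = csv.DictReader(io.StringIO(internal_csv_text))
    seen = {}
    for row in reader:
        stem = row["image_name"].rsplit(".", 1)[0]
        plate, well, _field = stem.split("_")
        compound = row["compound"].strip()
        # All images of one well must agree on the compound.
        if seen.setdefault((plate, well), compound) != compound:
            raise ValueError(f"conflicting compounds for {plate} {well}")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(OUTPUT_COLUMNS)
    for (plate, well), compound in sorted(seen.items()):
        writer.writerow([plate, well, compound])
    return out.getvalue()
```

Raising an error on conflicting entries is a deliberate choice here: with several thousand rows, a silent overwrite would be very hard to spot later.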
Conversion to OME-Zarr was necessary as the IDR has switched over to this newly developed cloud-ready image file format standard (Moore et al., 2023, Histochem Cell Biol). OME-Zarr is the implementation of the OME-NGFF (next-generation file format) specification (Moore et al., 2021, Nat Methods). Glencoe Software has developed a very user-friendly and feature-rich GUI (graphical user interface) tool that everyone can use, the NGFF-converter. Therefore, the conversion proved to be straightforward and fast on our analysis server with 128 cores. I then uploaded the converted data to our University-hosted S3 storage to allow the IDR staff to double-check that the conversion was valid.
The analysis workflow that has been applied to the data to achieve scientific results is a crucial aspect of FAIR data as part of a publication, facilitating its reproducibility. In this case, the analysis workflow was performed to segment individual cells and vesicles within the images. Many modern analysis tools allow for an export of the workflow in a human- or machine-readable manner or provide compatibility with the Common Workflow Language (CWL), which would even allow for fully re-running an analysis workflow. The Columbus software used here did not provide a readable export of the workflow. As the next best solution, we tried to at least export the segmented cells and vesicles on which the measurements were based as region-of-interest annotations (ROIs) and label images (binary images containing labels vs. background) to integrate within the OME-Zarr files. ROIs are pixel-accurate polyhedra that allow for distinguishing exactly whether a given pixel is counted as part of the segmented object or as outside of the object (background). Unfortunately, the Columbus software did not allow ROI export either and we were left with what is called the “bounding boxes” of the ROIs. These are rectangles sized just large enough to enclose all projections of the actual ROI in any direction. Of course, this information suffices only to know where an ROI was generated during the segmentation, but not to reproduce the segmentation down to the level of the actual ROIs.
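The information lost when only bounding boxes are available can be illustrated with a toy example. This is a minimal sketch in plain Python; the small mask and the function names are made up for illustration and have nothing to do with the Columbus output format.

```python
def bounding_box(mask):
    """Return (min_row, min_col, max_row, max_col), inclusive, of object pixels."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("mask contains no object pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def box_area(box):
    """Number of pixels covered by an inclusive bounding box."""
    min_row, min_col, max_row, max_col = box
    return (max_row - min_row + 1) * (max_col - min_col + 1)


def object_area(mask):
    """Number of pixels that actually belong to the segmented object."""
    return sum(1 for row in mask for value in row if value)


# Toy label image: 1 = object, 0 = background (illustrative only).
mask = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(mask)
print(box, box_area(box), object_area(mask))
```

In this toy mask the box covers 9 pixels while the object itself has only 5. Many different shapes share the same box, so the box alone cannot tell which pixels contributed to a measurement.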
Measurements of the segmented objects, e.g., how much fluorescence intensity is measured across all pixels within a given ROI, can be exported from Columbus into .csv files (comma-separated value files, a tabular file format) but then need a conversion and proper matching with the images via a Python script or in Excel, so they can be properly displayed as part of the public data in the IDR. The end results are OME-Zarr files for each plate and structured .csv files containing experimental metadata of each well, e.g., the chemical compound used.
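The matching step can be sketched in Python as follows. This is only an illustration under stated assumptions: the column names "Row", "Column" and "Intensity" are hypothetical stand-ins for whatever the export contains, and averaging per well is just one possible summary.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_intensity_per_well(measurements_csv_text):
    """Average per-object intensities for each well.

    ASSUMPTION: rows carry 1-based numeric "Row" and "Column" indices and an
    "Intensity" value; the real export uses different column names.
    """
    per_well = defaultdict(list)
    for row in csv.DictReader(io.StringIO(measurements_csv_text)):
        well = f"{chr(ord('A') + int(row['Row']) - 1)}{int(row['Column'])}"
        per_well[well].append(float(row["Intensity"]))
    return {well: mean(values) for well, values in per_well.items()}


def match_to_images(per_well, image_wells):
    """Split measurements into matched wells and report both kinds of gaps."""
    matched = {w: v for w, v in per_well.items() if w in image_wells}
    missing = sorted(set(image_wells) - set(per_well))   # images without data
    orphaned = sorted(set(per_well) - set(image_wells))  # data without images
    return matched, missing, orphaned
```

Reporting wells without measurements and measurements without wells separately makes it easier to see whether a mismatch comes from the export or from the image conversion.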
Now, we are ready to tie it all together, which will involve uploading the data to the BioImage Archive (BIA), where the OME-Zarr images will be hosted on long-term public S3 storage and then accessed by the IDR. For the BIA publication, we will need to fill in the experimental metadata according to the REMBI template. In the end, this will lead to maximum visibility and findability of this valuable dataset.
Don’t hesitate to reach out to our Data Stewards via the Help Desk.
