
xeno-spectral dataset

Welcome to the xeno-spectral dataset! This dataset contains semantically annotated spectral data from multiple species and three different perfusion states (physiological, malperfused and ICG). The HSI cubes were acquired with the Tivita® Tissue camera system from Diaspective Vision. Each data cube has the dimensions (480, 640, 100) = (height, width, channels), with the 100 non-overlapping spectral channels covering the range from 500 nm to 1000 nm at a spectral resolution of around 5 nm. This repository contains the raw data, related metadata, and the preprocessed files.
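As an illustration of the spectral axis, the approximate channel-center wavelengths can be reconstructed under the assumption of evenly spaced channels across the stated range (a sketch only; the exact calibration comes from the camera metadata):

```python
import numpy as np

# Sketch: approximate channel-center wavelengths, assuming the 100
# channels are evenly spaced across the 500-1000 nm range
wavelengths = np.linspace(500, 1000, 100)

print(wavelengths[0], wavelengths[-1])  # 500.0 1000.0
# Spacing between adjacent channels is roughly 5 nm (500 nm / 99 steps)
print(round(float(np.diff(wavelengths)[0]), 2))  # 5.05
```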

📝 More details can be found in our publication: TODO

Interactive visualizations

We provide interactive visualizations for every image in the dataset. This allows you to browse the dataset, visualize the annotations and search for specific images without downloading them. Further, the following PCA plots visualize every image in the dataset color-coded by species and symbol-coded by perfusion state. Click on a point to open the corresponding interactive visualization for that image.

Principal component analysis (PCA) of median spectra for all images in the dataset stratified by class label. The median spectrum is computed per image and class label independently for each channel. Additional metadata for an image is available on hover (description of the metadata can be found in the visualizations). A click on a point leads you to the corresponding interactive visualization for that image. Transparent markers denote unclear ICG or malperfusion states.
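The per-class median spectrum computation described in the caption can be sketched with synthetic data as follows (array contents and label values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.random((480, 640, 100))       # HSI cube: (height, width, channels)
labels = rng.integers(0, 3, (480, 640))  # synthetic semantic annotation

# Median spectrum per class label, computed independently for each channel
median_spectra = {
    label: np.median(cube[labels == label], axis=0)  # shape: (100,)
    for label in np.unique(labels)
}
print(median_spectra[0].shape)  # (100,)
```

Each of the resulting 100-dimensional median spectra is then what enters the PCA as one point per image and class.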
Additional labels

The following figure shows additional labels that were not part of the xeno-learning paper but are included in the dataset because those labels were used in the semantic annotations.

Download

There is a separate dataset for each species, and each can be downloaded and used independently. Each dataset is available as a ZIP archive containing the raw data, metadata, and preprocessed files, and is structured similarly to the HeiPorSPECTRAL dataset.

The full dataset can be downloaded at:

We do not recommend downloading the archive via the browser due to its large size. Instead, you may use a download manager of your choice or wget as in the example below, which allows you to interrupt the download and resume it later. Please make sure beforehand that you have enough free storage to both download the ZIP archive and decompress its contents. Prior to decompression, it is recommended to verify the checksum of the downloaded file:

# Example for Unix-based systems
# Best to be run in a screen environment (e.g. https://linuxize.com/post/how-to-use-linux-screen/)

# Download the pig dataset
wget --continue https://e130-hyperspectal-tissue-classification.s3.dkfz.de/data/xeno_spectral_pig.zip
# Verify the checksum of the downloaded file
curl https://e130-hyperspectal-tissue-classification.s3.dkfz.de/data/xeno_spectral_pig.sha256 | sha256sum -c -

# Download the rat dataset
wget --continue https://e130-hyperspectal-tissue-classification.s3.dkfz.de/data/xeno_spectral_rat.zip
# Verify the checksum of the downloaded file
curl https://e130-hyperspectal-tissue-classification.s3.dkfz.de/data/xeno_spectral_rat.sha256 | sha256sum -c -

Exemplary versions of the datasets are also available for exploring without the need to download the full dataset. For each species, the exemplary dataset contains three images (one for each perfusion state).

Usage


We recommend using the data with the htc framework, which offers:

  • a pipeline to efficiently load and process the HSI cubes, annotations and metadata.
  • a framework to train neural networks on the data, including the implementation of several classification and segmentation models.
  • simple usage of pre-trained models.

Installation (example for Unix-based systems):

# Make the datasets available
export PATH_Tivita_xeno_spectral_pig=/mnt/nvme_8tb/xeno_spectral_pig
export PATH_Tivita_xeno_spectral_rat=/mnt/nvme_8tb/xeno_spectral_rat

# Install the htc package
# Note: in case the installation fails, please check out alternative installation methods at https://github.com/IMSY-DKFZ/htc?tab=readme-ov-file#package-installation
pip install imsy-htc

As a teaser, this is how you can use the htc framework to read a data cube, its corresponding annotation and metadata:

from htc import DataPath

# You can load every image based on its unique name
path = DataPath.from_image_name("P190#2023_10_27_14_33_29")

# HSI cube format: (height, width, channels)
print(path.read_cube().shape)
# (480, 640, 100)

# Semantic annotation
print(path.read_segmentation("semantic#primary").shape)
# (480, 640)

# Retrieve arbitrary meta information (like the perfusion state)
print(path.meta("perfusion_state"))
# malperfused

# Or the species name
print(DataPath.from_image_name("R047#2025_01_26_12_14_55").meta("species_name"))
# rat

License

Creative Commons License
This dataset is made available under the Creative Commons Attribution 4.0 International License (CC-BY 4.0). If you wish to use or reference this dataset, please cite our paper TODO.

Cite via BibTeX
TODO

Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (project NEURAL SPICING grant agreement No. 101002198) and the National Center for Tumor Diseases (NCT) Heidelberg's Surgical Oncology Program. It was further supported by the German Cancer Research Center (DKFZ) and the Helmholtz Association under the joint research school HIDSS4Health (Helmholtz Information and Data Science School for Health). This publication was supported through state funds approved by the State Parliament of Baden-Württemberg for the Innovation Campus Health + Life Science alliance Heidelberg Mannheim.

The authors gratefully acknowledge the data storage service SDS@hd supported by the Ministry of Science, Research and the Arts Baden-Württemberg (MWK) and the German Research Foundation (DFG) through grant INST 35/1314-1 FUGG and INST 35/1503-1 FUGG. Furthermore, the authors gratefully acknowledge the support from the NCT (National Center for Tumor Diseases in Heidelberg, Germany) through its structured postdoc program and the Surgical Oncology program. We also acknowledge the support through state funds approved by the State Parliament of Baden-Württemberg for the Innovation Campus Health + Life Science Alliance Heidelberg Mannheim from the structured postdoc program for Alexander Studier-Fischer: Artificial Intelligence in Health (AIH) - A collaboration of DKFZ, EMBL, Heidelberg University, Heidelberg University Hospital, University Hospital Mannheim, Central Institute of Mental Health, and the Max Planck Institute for Medical Research. Furthermore, we acknowledge the support through the DKFZ Hector Cancer Institute at the University Medical Center Mannheim. For the publication fee we acknowledge financial support by Deutsche Forschungsgemeinschaft within the funding programme Open Access Publikationskosten as well as by Heidelberg University.

We would like to thank Hannah Gottlieb, Oray Kalayci, Hussein Bahaaeldin, Polina Borisova, Pit Beckius, Lotta Biehl, Fatmanur Yilmaz, Patrick Unverdorben and Laura Mehlan for annotating the data.