ACROBAT - a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology
SND-ID: 2022-190-1. Version: 1. DOI: https://doi.org/10.48723/w728-p041
Alternative title
ACROBAT
Creator/Principal investigator(s)
Mattias Rantalainen - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Johan Hartman - Karolinska Institutet, Department of Oncology-Pathology
Research principal
Karolinska Institutet - Department of Medical Epidemiology and Biostatistics
Description
The ACROBAT data set consists of 4,212 whole slide images (WSIs) from 1,153 female primary breast cancer patients. The WSIs in the data set are available at 10X magnification and show tissue sections from breast cancer resection specimens stained with hematoxylin and eosin (H&E) or by immunohistochemistry (IHC). For each patient, one WSI of H&E-stained tissue and between one and four WSIs of corresponding tissue stained with the routine diagnostic stains ER, PGR, HER2 and KI67 are available. The data set was acquired as part of the CHIME study (chimestudy.se), and its primary purpose was to facilitate the ACROBAT WSI registration challenge (acrobat.grand-challenge.org). The histopathology slides originate from routine diagnostic pathology workflows and were digitised for research purposes at Karolinska Institutet (Stockholm, Sweden). The image acquisition process resembles the routine digital pathology digitisation workflow and used three Hamamatsu WSI scanners: one NanoZoomer S360 and two NanoZoomer XR. The WSIs in this data set are accompanied by a data ta
The data set consists of three subsets, the training, validation and test set, as defined for the ACROBAT WSI registration challenge. The training set contains 750 cases, each with one H&E WSI and one to four IHC WSIs, for 3,406 WSIs in total. The validation set consists of 100 cases with 200 WSIs in total, and the test set of 303 cases with 606 WSIs in total. For both the validation and the test set, one H&E WSI and one randomly selected IHC WSI are available per case.
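The composition rules and split totals above can be expressed as a small consistency check. A minimal sketch, assuming each case is represented as a list of stain labels (`"HE"`, `"ER"`, `"PGR"`, `"HER2"`, `"KI67"`); these label strings and the helpers `check_case` and `registration_pairs` are hypothetical and do not reflect the data set's file naming. It also assumes at most one WSI per IHC stain per case, which the description implies but does not state outright. The case and WSI counts are the figures given above.

```python
from collections.abc import Sequence

# Hypothetical stain labels; the data set's actual naming convention may differ.
IHC_STAINS = {"ER", "PGR", "HER2", "KI67"}

# (cases, WSIs) per split, as stated in the data set description.
SPLITS = {"train": (750, 3406), "valid": (100, 200), "test": (303, 606)}


def check_case(stains: Sequence[str], split: str) -> bool:
    """Return True if a case's stain list matches the composition rules for its split."""
    he_count = sum(1 for s in stains if s == "HE")
    ihc = [s for s in stains if s != "HE"]
    # Exactly one H&E WSI; IHC stains from the routine panel.
    # Assumption: each IHC stain appears at most once per case.
    if he_count != 1 or not set(ihc) <= IHC_STAINS or len(set(ihc)) != len(ihc):
        return False
    if split == "train":
        return 1 <= len(ihc) <= 4
    # Validation and test cases have exactly one randomly selected IHC WSI.
    return len(ihc) == 1


def registration_pairs(stains: Sequence[str]) -> list[tuple[str, str]]:
    """Pair each IHC WSI of a case with that case's H&E WSI."""
    return [(s, "HE") for s in stains if s != "HE"]


# The split totals add up to the overall figures of the data set.
total_cases = sum(cases for cases, _ in SPLITS.values())
total_wsis = sum(wsis for _, wsis in SPLITS.values())
assert total_cases == 1153 and total_wsis == 4212
```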
WSIs were anonymised by deleting the associated macro images, by assigning filenames with random case IDs and by overwriting metadata fields that could contain personal information. The Hamamatsu NDPI files were then converted using libvips (libvips.org/). The WSIs are available as generic tiled TIFF files (openslide.org/formats/generic-tiff/) at 10X magnification, together with lower-resolution pyramid levels.
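Because the highest-resolution level corresponds to 10X, reading a region at a lower magnification means choosing a pyramid level by its downsample factor. A minimal sketch, assuming the per-level downsample factors are available, for example from `level_downsamples` of an `openslide.OpenSlide` object; the helper `level_for_magnification` is hypothetical. Generic TIFF files may not carry objective-power metadata, so the 10X base magnification is taken from this description rather than read from the file. OpenSlide's own `get_best_level_for_downsample` performs a similar lookup.

```python
from collections.abc import Sequence

# Level 0 of the ACROBAT WSIs is 10X, per the data set description.
BASE_MAGNIFICATION = 10.0


def level_for_magnification(level_downsamples: Sequence[float], target_mag: float) -> int:
    """Return the coarsest pyramid level that still resolves at least target_mag.

    level_downsamples[i] is the downsample factor of level i relative to level 0,
    e.g. as exposed by openslide.OpenSlide(path).level_downsamples.
    """
    if not 0 < target_mag <= BASE_MAGNIFICATION:
        raise ValueError("target magnification must be in (0, 10]")
    wanted = BASE_MAGNIFICATION / target_mag
    best = 0
    for i, ds in enumerate(level_downsamples):
        # Small tolerance because stored downsample factors are often not exact powers of two.
        if ds <= wanted * (1 + 1e-3) and ds >= level_downsamples[best]:
            best = i
    return best


# Usage with openslide-python (the path is hypothetical):
# import openslide
# slide = openslide.OpenSlide("case.tiff")
# level = level_for_magnification(slide.level_downsamples, 2.5)
# tile = slide.read_region((0, 0), level, (1024, 1024))  # RGBA PIL image; location in level-0 coordinates
```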
The data set is available for download as seven separate ZIP archives: five for the training data (train_part1.zip, 71.47 GB; train_part2.zip, 70.59 GB; train_part3.zip, 75.91 GB; train_part4.zip, 71.63 GB; train_part5.zip, 69.09 GB), one for the validation data (valid.zip, 21.79 GB) and one for the test data (test.zip, 68.11 GB).
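The listed sizes add up to the total download volume per split. A minimal sketch that sums them and compares the total with the free space on a target drive using Python's `shutil.disk_usage`; it assumes the sizes are decimal gigabytes (10^9 bytes), which the description does not state, and the 10% margin and helper names are illustrative choices. Space needed to extract the archives is not included.

```python
import shutil

# Archive sizes in GB, as listed in the data set description.
ARCHIVES_GB = {
    "train_part1.zip": 71.47,
    "train_part2.zip": 70.59,
    "train_part3.zip": 75.91,
    "train_part4.zip": 71.63,
    "train_part5.zip": 69.09,
    "valid.zip": 21.79,
    "test.zip": 68.11,
}


def split_total_gb(prefix: str) -> float:
    """Sum the listed sizes of all archives whose name starts with prefix ("" selects all)."""
    return round(sum(gb for name, gb in ARCHIVES_GB.items() if name.startswith(prefix)), 2)


def fits_on_disk(path: str = ".", margin: float = 1.1) -> bool:
    """Check whether all archives, plus a safety margin, fit in the free space at path."""
    # Assumption: listed sizes are decimal gigabytes (1 GB = 10**9 bytes).
    needed_bytes = sum(ARCHIVES_GB.values()) * 1e9 * margin
    return shutil.disk_usage(path).free >= needed_bytes
```

For example, `split_total_gb("train")` gives the combined size of the five training archives, and `split_total_gb("")` the size of all seven.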
File listings and SHA-1 checksums are provided for verifying the integrity of the archives and data after download.
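A minimal sketch for checking downloaded archives against their SHA-1 checksums with Python's standard `hashlib`. The layout of the provided checksum files is not described here; the sketch assumes `sha1sum`-style lines (`<40-hex-digest>  <filename>`), and the helpers `sha1_of_file`, `parse_checksums` and `verify` are hypothetical. Files are hashed in chunks because the archives are tens of gigabytes.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute the SHA-1 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict[str, str]:
    """Parse sha1sum-style lines ('<digest>  <filename>') into {filename: digest}.

    Assumption: the checksum listing uses this layout; adapt if it does not.
    """
    checksums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # sha1sum marks binary-mode entries with a leading '*'.
        checksums[name.lstrip("*")] = digest.lower()
    return checksums


def verify(directory: Path, checksums: dict[str, str]) -> dict[str, bool]:
    """Return {filename: matches} for every file listed in checksums; missing files count as mismatches."""
    return {
        name: (directory / name).is_file() and sha1_of_file(directory / name) == digest
        for name, digest in checksums.items()
    }


# Usage (file and directory names are hypothetical):
# checksums = parse_checksums(Path("checksums.sha1").read_text())
# print(verify(Path("downloads"), checksums))
```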
Notifying SND of publications that use this data set, by email to request@snd.gu.se, is appreciated but not required for using the data.
Data contains personal data
No
Population
Anonymised female primary breast cancer patients from the Stockholm region
Study design
Observational study
Time period(s) investigated
2012 – 2018
Number of individuals/objects
1153
Geographic spread
Geographic location: Stockholm County
Responsible department/unit
Department of Medical Epidemiology and Biostatistics
Contributor(s)
Masi Valkonen - University of Turku, Institute of Biomedicine
Kimmo Kartasalo - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Kajsa Ledesma Eriksson - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Leena Latonen - University of Eastern Finland, Institute of Biomedicine
Constance Boissin - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Yanbo Feng - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Philippe Weitz - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Dusan Rasic - Zealand University Hospital, Department of Surgical Pathology
Sonja Koivukoski - University of Eastern Finland, Institute of Biomedicine
Pekka Ruusuvuori - University of Turku, Institute of Biomedicine
Circe Carr - University of Turku, Institute of Biomedicine
Sandra Pouplier - Zealand University Hospital, Department of Surgical Pathology
Leslie Solorzano - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Abhinav Sharma - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
Anne-Vibeke Laenkholm - Zealand University Hospital, Institute of Biomedicine
Aino Kuusela - University of Turku, Institute of Biomedicine
Ethics Review
Stockholm - Ref. 2017/2106-31
Amendment: 2018/1462-32
Research area
Science and technology (CESSDA Topic Classification)
Information technology (CESSDA Topic Classification)
Medical image processing (Standard för svensk indelning av forskningsämnen 2011)
Medical and health sciences (Standard för svensk indelning av forskningsämnen 2011)
Cancer and oncology (Standard för svensk indelning av forskningsämnen 2011)
Publications
Weitz P, et al. ACROBAT -- a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology. arXiv. 2022. DOI: https://doi.org/10.48550/ARXIV.2211.13621
Weitz P, Valkonen M, Solorzano L, Carr C, Kartasalo K, Boissin C, Koivukoski S, Kuusela A, Rasic D, Feng Y, Sinius Pouplier S, Sharma A, Ledesma Eriksson K, Latonen L, Laenkholm AV, Hartman J, Ruusuvuori P, Rantalainen M. A Multi-Stain Breast Cancer Histological Whole-Slide-Image Data Set from Routine Diagnostics. Sci Data. 2023 Aug 24;10(1):562. DOI: https://doi.org/10.1038/s41597-023-02422-6