Uncertainty quantification for deep learning-based metastatic lesion segmentation on whole body PET/CT

Brayden Schott; Victor Santoro-Fernandes; Žan Klaneček; Scott Perlman; Robert Jeraj

doi:10.1088/1361-6560/add9df

Phys Med Biol. 2025 May 23;70(11). doi: 10.1088/1361-6560/add9df.

Authors

Brayden Schott¹, Victor Santoro-Fernandes¹, Žan Klaneček², Scott Perlman³, Robert Jeraj¹,²

Affiliations

¹ Department of Medical Physics, School of Medicine and Public Health, University of Wisconsin, Madison, WI, United States of America.
² Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia.
³ Department of Radiology, Section of Nuclear Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI, United States of America.

PMID: 40378868
DOI: 10.1088/1361-6560/add9df

Abstract

Objective. Deep learning models are increasingly being implemented for automated medical image analysis to inform patient care. Most models, however, lack uncertainty information, without which the reliability of model outputs cannot be ensured. Several uncertainty quantification (UQ) methods exist to capture model uncertainty, yet it is not clear which method is optimal for a given task. The purpose of this work was to investigate several commonly used UQ methods for the critical yet understudied task of metastatic lesion segmentation on whole body PET/CT.

Approach. A total of 59 whole body ⁶⁸Ga-DOTATATE PET/CT images of patients undergoing theranostic treatment of metastatic neuroendocrine tumors were used in this work. A 3D U-Net was trained for lesion segmentation using five-fold cross validation. Uncertainty measures derived from four UQ methods (probability entropy, Monte Carlo dropout, deep ensembles, and test time augmentation) were investigated. Each uncertainty measure was assessed across four quantitative evaluations of its ability: (1) to detect artificially degraded image data at low, medium, and high degradation magnitudes; (2) to detect false-positive (FP) predicted regions; (3) to recover false-negative (FN) predicted regions; and (4) to establish correlations with model biomarker extraction and segmentation performance metrics.

Main results. Test time augmentation achieved the highest and probability entropy the lowest degraded image detection at low (AUC = 0.68 vs. 0.54), medium (AUC = 0.82 vs. 0.70), and high (AUC = 0.90 vs. 0.83) degradation magnitudes. For detecting FPs, all UQ methods achieved strong performance, with AUC values ranging narrowly between 0.77 and 0.81. FN region recovery performance was strongest for test time augmentation and weakest for probability entropy. Performance in the correlation analysis was mixed: the strongest correlation was achieved by test time augmentation for SUV_total capture (ρ = 0.57) and segmentation Dice coefficient (ρ = 0.72), by Monte Carlo dropout for SUV_mean capture (ρ = 0.35), and by probability entropy for segmentation cross entropy (ρ = 0.96).

Significance. Overall, test time augmentation demonstrated superior UQ performance and is recommended for use in metastatic lesion segmentation tasks. It also offers the advantage of being post hoc and computationally efficient. In contrast, probability entropy performed the worst, highlighting the need for advanced UQ approaches for this task.
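As an illustration only, two of the UQ measures named above (probability entropy and test time augmentation) can be sketched for a voxelwise foreground probability map. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `predict` callable, and the choice of single-axis flips as augmentations are hypothetical, since the abstract does not specify which augmentations or uncertainty statistics were used.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Voxelwise Shannon entropy (in bits) of a foreground probability map."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))


def tta_uncertainty(predict, image, flip_axes=(0, 1, 2)):
    """Mean and variance of predictions over the identity plus single-axis flips.

    `predict` is a hypothetical callable that maps an image array to a
    foreground probability map of the same shape. Flips are an assumed
    augmentation set, chosen here only because they are easy to invert.
    """
    probs = [predict(image)]
    for axis in flip_axes:
        flipped = np.flip(image, axis=axis)
        # Undo the flip so every prediction is in the original frame.
        probs.append(np.flip(predict(flipped), axis=axis))
    probs = np.stack(probs)
    return probs.mean(axis=0), probs.var(axis=0)
```

In this sketch, entropy needs only a single forward pass, while the test time augmentation variance needs one pass per augmentation but no retraining, which is consistent with the post hoc character noted in the abstract.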

Keywords: PET/CT; deep learning; metastatic lesion segmentation; segmentation; theranostics; uncertainty estimation; uncertainty quantification.

Creative Commons Attribution license.

MeSH terms

Deep Learning*
Humans
Image Processing, Computer-Assisted* / methods
Monte Carlo Method
Neoplasm Metastasis
Neuroendocrine Tumors / diagnostic imaging
Neuroendocrine Tumors / pathology
Positron Emission Tomography Computed Tomography*
Uncertainty
Whole Body Imaging*