Is segmentation uncertainty useful?
A critical view on multiple common probabilistic models.
Probabilistic image segmentation models provide a distribution of likely segmentation masks. Various characteristics of the distribution can be used to assess the prediction confidence of the model. Find the paper here and the code on Github.
Models
We investigate four common models used to predict segmentation uncertainty:
- A simple U-Net with a softmax-output layer.
- An ensemble of U-Nets.
- MC-Dropout, a method that approximates Bayesian weight distributions.
- Probabilistic U-Net, a fusion between a VAE and a U-Net.
Results
We evaluate the models on two datasets: (1) Skin leision segmentation and (2) Lung cancer segmentation. We see, that all models are relatively certain on correct predictions (TP + TN), and less certain on wrong predictions (FP + FN). Outliers are the MC-dropout model, which seems to be never completely certain in our case (possibly due to bad calibration), and TPs on the lung cancer dataset. The Lung cancer dataset has a large class imbalance, with very little cancer cells. This might explain my models are reluctant to be overly confident on detected cancer cells.
Separating model performance by whether different human annotators fully agree, somewhat agree, or completely disagree shows very little difference between models: All models are certain when the humans agree, and uncertain when humans disagree.
Active learning
Finally, we try to use the uncertainty estimations: In an active learning setting, we try to develop a “smart” strategy on what samples to include into the dataset next. This can inform annotation strategies and lower annotation cost. We trial an uncertainty-sampling strategy against random sampling. In most cases, the model with the random strategy performs better.
Investigating the samples selected by the uncertainty-based strategy reveals that the model frequently chose to include samples where even the group of expert annotators provided disagreeing segmentation masks, providing little information for the model to learn from.
Conclusion
Is segmentation uncertainty useful? We find that uncertainty, even in the simplest models, reliably gives practitioners an indication of areas of an image that might be ambiguous, or wrongly segmented. Using uncertainty estimates to reduce the annotation load has proven challenging, with no significant advantage over a random strategy.