
AIMI Research + Industry Trends Meeting: Problems and Shortcuts in Deep Learning for Screening Mammography - Trevor Tsue | Whiterabbit.ai

Event Details:

Thursday, February 9, 2023
3:00pm - 4:00pm PST

Location

Hybrid: In-Person | Virtual

This event is open to:

Faculty/Staff
Students

Abstract:

Background: Deep learning models often exploit spurious shortcuts during training. Understanding model performance and correcting these shortcuts are essential for ensuring safety and reliability in clinical use.

Purpose: Deep learning models have been studied for mammography, yet this work shows that their performance and generalizability remain unclear. We (1) identify spurious shortcuts in models and evaluation issues that can inflate reported performance and (2) propose training and analysis methods to address them.

Materials and methods: We train an AI model to classify cancer using 94,363 USA mammography exams (850 cancers) and 14,916 UK exams (4,424 cancers), and evaluate on a test set of 11,593 USA exams (102 cancers) and 1,880 UK exams (590 cancers). We present methods that determine whether an attribute is spuriously correlated with cancer and used as a shortcut by the model. Shortcuts are identified by inspecting probability distributions, subset performance, and attribute prediction from the cancer model's features. We apply these methods to four different shortcuts to show universal applicability: view markers, dataset, exam type, and scanner model. To mitigate these shortcuts, we investigate removing view markers, balancing datasets, and removing diagnostic exams.
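The attribute-prediction check described above can be sketched with a simple linear probe: if an exam attribute is recoverable from the cancer model's features, it is encoded there and available as a shortcut. The features, attribute leak, and probe below are synthetic stand-ins for illustration, not the talk's actual data or model:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 64

# Synthetic stand-in for frozen penultimate-layer features of a cancer
# model, with an exam attribute (scanner model) leaked into one dimension.
scanner = rng.integers(0, 2, n)            # binary scanner attribute
features = rng.normal(0.0, 1.0, (n, d))
features[:, 0] += 2.0 * scanner            # the leaked attribute signal

# Linear probe with an intercept, fit by least squares on half the exams.
X = np.column_stack([np.ones(n), features])
w, *_ = np.linalg.lstsq(X[:1000], scanner[:1000], rcond=None)

# Held-out accuracy far above the ~50% base rate means the attribute is
# recoverable from the features, flagging it as a potential shortcut.
acc = np.mean((X[1000:] @ w > 0.5) == scanner[1000:])
print(acc)
```

An attribute the probe cannot predict above the base rate is unlikely to be driving the model's cancer scores; one it predicts easily warrants the stratified evaluation described in the Results.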

Results: Using only view markers as the image input, the model achieves an AUC of 0.691 [0.644, 0.736], revealing the need to remove this shortcut. Stratifying on the datasets, the model achieves an AUC of 0.945 [95% CI: 0.936, 0.954] on the joint test dataset but surprisingly only 0.838 [0.792, 0.878] on the USA and 0.892 [0.875, 0.906] on the UK test datasets alone. This “inflated AUC phenomenon” is an example of Simpson’s paradox, caused by the model associating UK data with cancer and USA data with non-cancer due to a 30x difference in cancer prevalence. This shortcut is mitigated by sampling cancers and non-cancers equally from both datasets. A similar inflated AUC (0.903 [0.886, 0.919]) arises from associating screening exams (0.861 [0.818, 0.898]) with non-cancer and diagnostic exams (0.862 [0.837, 0.884]) with cancer. Removing diagnostic exams during training removes this bias. Finally, the model does not exhibit the inflated AUC across scanner models; however, it associates Selenia Dimensions (SD) exams with cancer and Hologic Selenia (HS) exams with non-cancer. Our analysis shows that the inflated AUC phenomenon is caused by differing cancer prevalences and the differing probabilities the model assigns to properties of exams. Thus, stratification over all properties of the exam can be the best way to identify and prevent shortcuts.

About:

Trevor Tsue joined Whiterabbit.ai in June 2018. He received his Master’s degree in artificial intelligence from the Department of Computer Science at Stanford University, where his research spanned metastatic cancer biology, deep generative models, and applications of deep learning in healthcare. His work at Whiterabbit.ai explores multi-modal adaptation using 3D DBT images and time series data, leveraging semi-supervised and weakly-supervised approaches to train malignancy models, generalization of deep learning models to new clinics, and the development of the cancer rule-out algorithm.

Attendance is open to the Stanford community. If you would like to attend in-person or on Zoom, please contact the AIMI Center at aimicenter@stanford.edu.
