Skip to main content Skip to secondary navigation

INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

Main content start

Download here

Dataset Description

Synthesizing information from various data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. INSPECT contains data from 19,438 patients, including CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals). Using our provided dataset, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE related tasks. We evaluate image-only, EHR-only, and fused models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best our knowledge, INSPECT is the largest multimodal dataset for enabling reproducible research on strategies for integrating 3D medical imaging and EHR data. NOTE: this is the first part of release due to PHI review. This release has 20078 CT scans, 21,266 impression sections and the EHR modality data will be uploaded to Stanford Redivis website (https://redivis.com/Stanford)

Canonical URL

Full Description

Terms & Conditions

License