
Shared Datasets

Stanford AIMI shares annotated data to foster transparent and reproducible collaborative research to advance AI in medicine. 

Our datasets are available to the public to view and use without charge for non-commercial research purposes. For research use, please click on the dataset titles below to be taken to the dataset download page. For commercial use, please see here for more information. 

PLEASE NOTE: All users of the AIMI data/images are expected to acknowledge Stanford AIMI in all publications, presentations, etc., with the following language: “This research used data provided by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). AIMI curated a publicly available imaging data repository containing clinical imaging and data from Stanford Health Care, the Stanford Children’s Hospital, the University Healthcare Alliance and Packard Children's Health Alliance clinics provisioned for research use by the Stanford Medicine Research Data Repository (STARR).”


Featured Datasets

CheXpert Plus

Notable for its organization and depth, the CheXpert Plus dataset is a comprehensive collection that brings together text and images in the medical field, featuring a total of 223,462 unique pairs of radiology reports and chest X-rays across 187,711 studies from 64,725 patients.

Merlin Abdominal CT

Merlin Abdominal CT Dataset is an abdominal CT dataset consisting of 25,494 scans from 18,317 patients. Each scan is paired with its corresponding radiology report.

All Datasets

Bone Age X-ray

A data set composed of 387 clinical radiographs of the left hand from Lucile Packard Children’s Hospital at Stanford University.

BrainMetShare

156 pre- and post-contrast whole brain MRI studies, including high-resolution, multi-modal pre- and post-contrast sequences in patients with at least 1 brain metastasis accompanied by ground-truth segmentations by radiologists.

CheXlocalize

CheXlocalize is a radiologist-annotated segmentation dataset on chest X-rays. The dataset consists of two types of radiologist annotations for the localization of 10 pathologies: pixel-level segmentations and most-representative points.  The validation and test sets consist of 234 chest X-rays from 200 patients and 668 chest X-rays from 500 patients, respectively. 

CheXmultimodal

CheXmultimodal is a publicly available, multimodal dataset of 324 patient studies containing chest X-rays and clinical history from Stanford University Hospital.

CheXpert Plus

The CheXpert Plus dataset is a comprehensive collection that brings together text and images in the medical field, featuring a total of 223,462 unique pairs of radiology reports and chest X-rays across 187,711 studies from 64,725 patients.

CheXphoto

A training set of natural photos and synthetic transformations of 10,507 x-rays from 3,000 unique patients that were sampled at random from the CheXpert training set, and a validation and test set of natural and synthetic transformations applied to all 234 x-rays from 200 patients and 668 x-rays from 500 patients in the CheXpert validation and test sets, respectively.

COCA - Coronary Calcium and Chest CTs

We provide two datasets: 1) gated coronary CT DICOM images with corresponding coronary artery calcium segmentations and scores (XML files), and 2) non-gated chest CT DICOM images with coronary artery calcium scores.

DDI - Diverse Dermatology Images

Artificial intelligence (AI) may aid in triaging skin disease.  However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset - the first publicly available, deeply curated, and pathologically confirmed image dataset with diverse skin tones.   

DDI2 - Diverse Dermatology Images -2

The DDI-2 dataset consists of 665 clinical photos corresponding to 169 unique biopsy-proven diagnoses from 550 patients who self-identified as Asian and were evaluated at Stanford Dermatology from 2012 to 2022.

EchoNet-Dynamic

10,030 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes.

EchoNet-LVH

The EchoNet-LVH dataset includes 12,000 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac chamber size and wall thickness.

EchoNet-Pediatric

The EchoNet-Peds database includes 7,643 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes. The database includes patients ranging from 0-18 years (43% female) with a wide range of sizes.

EchoNet-Tee-View-Classifier

Intraoperative TEE videos from approximately 500 unique adult cardiac surgery patients from Stanford University Medical Center. This dataset represents the external test dataset for our TEE view classification study.

EHRSHOT

EHRSHOT is a benchmark dataset of structured electronic health record data (i.e., no images or clinical text) for 6,712 patients.

GazeXPErT

GazeXPErT is a 4D eye-tracking dataset with expert decisions for tumor detection and measurement on 346 dual-annotated FDG-PET/CTs. The dataset contributes 9,030 unique gaze-to-lesion trajectories derived from 3,948 minutes of 60 Hz eye-tracking data, rendered in COCO-style format for machine learning applications.

INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

INSPECT contains data from 19,438 patients, including CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals).

LERA - Lower Extremity Radiographs

182 patients who underwent a radiographic examination at Stanford between 2003 and 2014. Includes images of the foot, knee, ankle, or hip associated with each patient.

Merlin Abdominal CT

Merlin Abdominal CT Dataset is an abdominal CT dataset consisting of 25,494 scans from 18,317 patients. Each scan is paired with its corresponding radiology report.

MRA-MIDAS: Multimodal Image Dataset for AI-based Skin Cancer

The Melanoma Research Alliance Multimodal Image Dataset for AI-based Skin Cancer (MRA-MIDAS) is the first publicly available, prospectively recruited, systematically paired dermoscopic and clinical image-based dataset across a range of skin-lesion diagnoses.

MRNet: Knee MRIs

1,370 knee MRI exams performed at Stanford. Contains 1,104 (80.6%) abnormal exams, with 319 (23.3%) ACL tears and 508 (37.1%) meniscal tears; labels were obtained through manual extraction from clinical reports. 

MURA: MSK X-rays

A large dataset of musculoskeletal radiographs containing 40,561 images from 14,863 studies, where each study is manually labeled by radiologists as either normal or abnormal. 

OL3I

OL3I: Opportunistic L3 CT slices for Ischemic heart disease risk assessment. OL3I (pronounced olé) is a multimodal dataset of 8,139 axial computed tomography (CT) slices at the third lumbar vertebra (L3) level of individuals, along with tabular medical record data from up to one year prior to the scan.

OPAL: The Oxygen-15 PET/MR Cerebral Perfusion Atlas

This dataset contains simultaneous brain oxygen-15-labeled water ([¹⁵O]H₂O) PET/MRI data designed to support the development and evaluation of advanced non-invasive perfusion imaging methods.

RadFusion: Multimodal Pulmonary Embolism Dataset

1,794 patients susceptible to pulmonary embolism at Stanford. The dataset consists of chest CT, patient demographics, and medical history.

RadGraph: CheXpert Results

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema designed to structure radiology reports. 

RadGraph XL

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports.

SCOPE-HN

SCOPE-HN: A Segmentation-based Collection of Oropharyngeal Structures Using Flexible Endoscopy for Head and Neck Cancers.

SinoCT

9,776 head CTs with reconstructed images and a high-quality simulated sinogram, each labeled as normal/abnormal by experienced radiologists at the time of interpretation. Labels for hemorrhage are available.

SKM-TEA

Imaging data and annotations for 155 quantitative double echo steady state MRI knee scans acquired clinically at Stanford. The data includes the raw kspace, DICOM images, segmentations of six tissues, and bounding boxes for 16 pathologies. 

SPADe

SPADe (Stanford PET/CT Abnormality Detection) is a dataset of whole-body FDG-PET/CT scans with hypermetabolic abnormalities labeled by anatomical region.

Thyroid Ultrasound Cine-clip

167 patients with biopsy-confirmed thyroid nodules (n=192) at Stanford. The dataset consists of ultrasound cine-clip images, radiologist-annotated segmentations, patient demographics, lesion size and location, TI-RADS descriptors, and histopathological diagnoses.

TumorMap-Lung

A set of chest CTs from patients with metastatic non-small cell lung cancer. Two radiation oncologists contoured/segmented all lung tumors in 3D on each scan.