
Shared Datasets


Stanford AIMI shares annotated data to foster transparent and reproducible collaborative research. 

Our datasets are available to the public to view and use without charge for non-commercial research purposes. The datasets can be downloaded using the links below. If you are interested in using the datasets for commercial purposes, please submit the Commercial Use Interest Form linked below to start a conversation about the details.

Commercial Use Interest Form

PLEASE NOTE: All users of the AIMI data/images are expected to acknowledge Stanford AIMI in all publications, presentations, etc., with the following language: “This research used data provided by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). AIMI curated a publicly available imaging data repository containing clinical imaging and data from Stanford Health Care, the Stanford Children’s Hospital, the University Healthcare Alliance and Packard Children's Health Alliance clinics provisioned for research use by the Stanford Medicine Research Data Repository (STARR).”



EchoNet-Peds

The EchoNet-Peds database includes 7,643 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes. The database includes patients ranging from 0 to 18 years of age (43% female) with a wide range of sizes.





156 whole-brain MRI studies with high-resolution, multi-modal pre- and post-contrast sequences from patients with at least one brain metastasis, accompanied by ground-truth segmentations performed by radiologists.

COCA - Coronary Calcium and Chest CTs

We provide two datasets: 1) gated coronary CT DICOM images with corresponding coronary artery calcium segmentations and scores (XML files), and 2) non-gated chest CT DICOM images with coronary artery calcium scores.
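
Coronary artery calcium is conventionally quantified with the Agatston method, which combines lesion area with a weight based on peak attenuation. The Python sketch below is a simplified, per-slice illustration of that idea, assuming you already have a Hounsfield-unit array and a boolean calcium mask; the function names, toy arrays, and the 0.25 mm² pixel area are our own illustrative assumptions, not part of the dataset's official tooling.

```python
import numpy as np

def agatston_weight(max_hu: float) -> int:
    """Density weighting factor used by the Agatston method."""
    if max_hu >= 400:
        return 4
    if max_hu >= 300:
        return 3
    if max_hu >= 200:
        return 2
    if max_hu >= 130:
        return 1
    return 0

def agatston_score(slice_hu: np.ndarray, mask: np.ndarray, pixel_area_mm2: float) -> float:
    """Score one axial slice: lesion area (mm^2) times the HU-based weight.

    slice_hu:       2-D array of Hounsfield units
    mask:           boolean array marking the calcium segmentation
    pixel_area_mm2: in-plane pixel area (from the DICOM PixelSpacing tag)
    """
    if not mask.any():
        return 0.0
    area_mm2 = mask.sum() * pixel_area_mm2
    return area_mm2 * agatston_weight(slice_hu[mask].max())

# Toy example: a 3x3 calcified region of 450 HU on a 0.5 mm x 0.5 mm grid.
hu = np.full((10, 10), 30.0)
mask = np.zeros((10, 10), dtype=bool)
hu[2:5, 2:5] = 450.0
mask[2:5, 2:5] = True
print(agatston_score(hu, mask, 0.25))  # 9 px * 0.25 mm^2 * weight 4 = 9.0
```

A full Agatston score additionally sums contributions over all slices (and, strictly, over connected lesions above a minimum area), which this per-slice sketch omits.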

CT Pulmonary Angiography

A collection of CT pulmonary angiography (CTPA) studies for patients susceptible to pulmonary embolism (PE). In addition to slice-level PE labels, we provide labels for PE location, RV/LV ratio, and PE type.


CheXlocalize

CheXlocalize is a radiologist-annotated segmentation dataset on chest X-rays. The dataset consists of two types of radiologist annotations for the localization of 10 pathologies: pixel-level segmentations and most-representative points. The validation and test sets consist of 234 chest X-rays from 200 patients and 668 chest X-rays from 500 patients, respectively.

CheXpert Demo Data

Self-reported race labels for the popular CheXpert dataset, released in the interest of open science, experimental validation, and reproducibility, and to encourage further work in this important area.

CheXpert: Chest X-rays

224,316 chest radiographs of 65,240 patients who underwent a radiographic examination at Stanford between October 2002 and July 2017, in both inpatient and outpatient centers.


A training set of natural photos and synthetic transformations of 10,507 X-rays from 3,000 unique patients sampled at random from the CheXpert training set, plus validation and test sets of natural and synthetic transformations applied to all 234 X-rays from 200 patients and all 668 X-rays from 500 patients in the CheXpert validation and test sets, respectively.


A radiologist-annotated segmentation dataset on chest X-rays, with an accompanying competition for automated pathology segmentation. The dataset can also be used to evaluate X-ray interpretation models.

DDI - Diverse Dermatology Images

Artificial intelligence (AI) may aid in triaging skin disease. However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset, the first publicly available, deeply curated, and pathologically confirmed image dataset with diverse skin tones.


10,030 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes.

EchoNet-LVH

The EchoNet-LVH dataset includes 12,000 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac chamber size and wall thickness.

LERA - Lower Extremity Radiographs

182 patients who underwent a radiographic examination at Stanford between 2003 and 2014. Includes images of the foot, knee, ankle, or hip for each patient.

MRNet: Knee MRIs

1,370 knee MRI exams performed at Stanford. Contains 1,104 (80.6%) abnormal exams, with 319 (23.3%) ACL tears and 508 (37.1%) meniscal tears; labels were obtained through manual extraction from clinical reports. 
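
The prevalences quoted above follow directly from the raw counts; the short Python check below reproduces them (the counts are taken from the description, while the variable names are ours):

```python
# Label counts reported for the MRNet dataset.
total = 1370       # knee MRI exams
abnormal = 1104    # exams labeled abnormal
acl_tears = 319    # exams with ACL tears
meniscal = 508     # exams with meniscal tears

for name, n in [("abnormal", abnormal), ("ACL tear", acl_tears), ("meniscal tear", meniscal)]:
    print(f"{name}: {n}/{total} = {100 * n / total:.1f}%")
# abnormal: 1104/1370 = 80.6%
# ACL tear: 319/1370 = 23.3%
# meniscal tear: 508/1370 = 37.1%
```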

MURA: MSK X-rays

A large dataset of musculoskeletal radiographs containing 40,561 images from 14,863 studies, where each study is manually labeled by radiologists as either normal or abnormal. 

Multimodal Pulmonary Embolism Dataset

1,794 patients susceptible to pulmonary embolism at Stanford. The dataset consists of chest CT images, patient demographics, and medical history.

RadGraph: CheXpert Results

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema designed to structure radiology reports. 


Imaging data and annotations for 155 quantitative double-echo steady-state MRI knee scans acquired clinically at Stanford. The data include the raw k-space, DICOM images, segmentations of six tissues, and bounding boxes for 16 pathologies.
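
For working with raw k-space, a magnitude image can be recovered from fully sampled Cartesian k-space with a centered inverse 2-D FFT. The NumPy sketch below is a generic illustration on synthetic data, not this dataset's actual loading or reconstruction pipeline:

```python
import numpy as np

def kspace_to_image(kspace: np.ndarray) -> np.ndarray:
    """Reconstruct a magnitude image from fully sampled Cartesian k-space
    using a centered inverse 2-D FFT."""
    img = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))
    return np.abs(img)

# Round trip on a synthetic image: centered forward FFT to k-space, then back.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))
recon = kspace_to_image(kspace)
print(np.allclose(recon, image))  # True
```

Real multi-coil acquisitions additionally require combining per-coil images (e.g., root-sum-of-squares) after the per-coil inverse FFT.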

Thyroid Ultrasound Cine-clip

167 patients with biopsy-confirmed thyroid nodules (n=192) at Stanford. The dataset consists of ultrasound cine-clip images, radiologist-annotated segmentations, patient demographics, lesion size and location, TI-RADS descriptors, and histopathological diagnoses.