
Shared Datasets


Stanford AIMI shares annotated data to foster transparent and reproducible collaborative research to advance AI in medicine. 

Our datasets are available to the public to view and use without charge for non-commercial research purposes. For research use, please click on the dataset titles below to be taken to the dataset download page. For commercial use, please submit a commercial use interest form to start a conversation around the details. 

PLEASE NOTE: All users of the AIMI data/images are expected to acknowledge Stanford AIMI in all publications, presentations, etc., with the following language: “This research used data provided by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). AIMI curated a publicly available imaging data repository containing clinical imaging and data from Stanford Health Care, the Stanford Children’s Hospital, the University Healthcare Alliance and Packard Children's Health Alliance clinics provisioned for research use by the Stanford Medicine Research Data Repository (STARR).”


Featured Datasets

CheXpert Plus

Notable for its organization and depth, the CheXpert Plus dataset is a comprehensive collection that brings together text and images in the medical field, featuring a total of 223,462 unique pairs of radiology reports and chest X-rays across 187,711 studies from 64,725 patients.

All Datasets

Name and description

BrainMetShare

156 pre- and post-contrast whole brain MRI studies, including high-resolution, multi-modal pre- and post-contrast sequences in patients with at least 1 brain metastasis accompanied by ground-truth segmentations by radiologists.

COCA - Coronary Calcium and Chest CTs

We provide two datasets: 1) gated coronary CT DICOM images with corresponding coronary artery calcium segmentations and scores (XML files), and 2) non-gated chest CT DICOM images with coronary artery calcium scores.

CT Pulmonary Angiography

A collection of CT pulmonary angiography (CTPA) studies for patients susceptible to pulmonary embolism (PE). In addition to slice-level PE labels, we provide labels for PE location, RV/LV ratio, and PE type.

CheXlocalize

CheXlocalize is a radiologist-annotated segmentation dataset on chest X-rays. The dataset consists of two types of radiologist annotations for the localization of 10 pathologies: pixel-level segmentations and most-representative points.  The validation and test sets consist of 234 chest X-rays from 200 patients and 668 chest X-rays from 500 patients, respectively. 

CheXpert Demo Data

Self-reported race labels for the popular CheXpert dataset, released in the interest of open science, experimental validation, and reproducibility, and to encourage further work in this important area.

CheXpert: Chest X-rays

224,316 chest radiographs of 65,240 patients who underwent a radiographic examination at Stanford between October 2002 and July 2017, in both inpatient and outpatient centers.

CheXphoto

A training set of natural photos and synthetic transformations of 10,507 X-rays from 3,000 unique patients that were sampled at random from the CheXpert training set, and a validation and test set of natural and synthetic transformations applied to all 234 X-rays from 200 patients and 668 X-rays from 500 patients in the CheXpert validation and test sets, respectively.

DDI - Diverse Dermatology Images

Artificial intelligence (AI) may aid in triaging skin disease.  However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset - the first publicly available, deeply curated, and pathologically confirmed image dataset with diverse skin tones.   

EchoNet-Dynamic

10,030 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes.

EchoNet-LVH

The EchoNet-LVH dataset includes 12,000 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac chamber size and wall thickness.

EchoNet-Pediatric

The EchoNet-Peds database includes 7,643 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes. The database includes patients ranging from 0-18 years (43% female) with a wide range of sizes.

EchoNet-Tee-View-Classifier

Intraoperative TEE videos from approximately 500 unique adult cardiac surgery patients from Stanford University Medical Center. This dataset represents the external test dataset for our TEE view classification study.

INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

INSPECT contains data from 19,438 patients, including CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals).

LERA - Lower Extremity Radiographs

182 patients who underwent a radiographic examination at Stanford between 2003 and 2014. Includes images of the foot, knee, ankle, or hip associated with each patient.

MRA-MIDAS: Multimodal Image Dataset for AI-based Skin Cancer

The Melanoma Research Alliance Multimodal Image Dataset for AI-based Skin Cancer (MRA-MIDAS) is the first publicly available, prospectively recruited, systematically paired dermoscopic and clinical image-based dataset across a range of skin-lesion diagnoses.

MRNet: Knee MRIs

1,370 knee MRI exams performed at Stanford. Contains 1,104 (80.6%) abnormal exams, with 319 (23.3%) ACL tears and 508 (37.1%) meniscal tears; labels were obtained through manual extraction from clinical reports. 
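
As a quick sanity check on the MRNet figures above, the stated percentages can be recomputed from the exam counts. This is only a minimal sketch: the dictionary keys and the `prevalence_percent` helper are hypothetical names chosen for illustration and are not part of the dataset or any AIMI tooling.

```python
# Exam counts as stated in the MRNet description.
TOTAL_EXAMS = 1370
# Hypothetical label names, used here for illustration only.
LABEL_COUNTS = {"abnormal": 1104, "acl_tear": 319, "meniscal_tear": 508}


def prevalence_percent(count: int, total: int = TOTAL_EXAMS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Share of all exams carrying each label.
prevalences = {label: prevalence_percent(n) for label, n in LABEL_COUNTS.items()}

for label, pct in prevalences.items():
    print(f"{label}: {pct}%")
```

The recomputed values agree with the 80.6%, 23.3%, and 37.1% quoted in the description.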

MURA: MSK X-rays

A large dataset of musculoskeletal radiographs containing 40,561 images from 14,863 studies, where each study is manually labeled by radiologists as either normal or abnormal. 

RadFusion: Multimodal Pulmonary Embolism Dataset

1,794 patients susceptible to pulmonary embolism at Stanford. The dataset consists of chest CT, patient demographics, and medical history.

RadGraph: CheXpert Results

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema designed to structure radiology reports. 

SinoCT

9,776 head CTs with reconstructed images and a high-quality simulated sinogram, each labeled as normal/abnormal by experienced radiologists at the time of interpretation. Labels for hemorrhage are available.

SKM-TEA

Imaging data and annotations for 155 quantitative double echo steady state MRI knee scans acquired clinically at Stanford. The data includes the raw k-space, DICOM images, segmentations of six tissues, and bounding boxes for 16 pathologies.

Thyroid Ultrasound Cine-clip

167 patients with biopsy-confirmed thyroid nodules (n=192) at Stanford. The dataset consists of ultrasound cine-clip images, radiologist-annotated segmentations, patient demographics, lesion size and location, TI-RADS descriptors, and histopathological diagnoses.