Shared Datasets

Stanford AIMI shares annotated data to foster transparent and reproducible collaborative research to advance AI in medicine.

Our datasets are available to the public to view and use without charge for non-commercial research purposes. For research use, please click on the dataset titles below to be taken to the dataset download page. For commercial use, please submit a commercial use interest form to start a conversation around the details.

PLEASE NOTE: All users of the AIMI data/images are expected to acknowledge Stanford AIMI in all publications, presentations, etc., with the following language: “This research used data provided by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). AIMI curated a publicly available imaging data repository containing clinical imaging and data from Stanford Health Care, the Stanford Children’s Hospital, the University Healthcare Alliance and Packard Children's Health Alliance clinics provisioned for research use by the Stanford Medicine Research Data Repository (STARR).”

Featured Dataset

SinoCT

9,776 head CTs with reconstructed images and a high-quality simulated sinogram, each labeled as normal/abnormal by experienced radiologists at the time of interpretation. Labels for hemorrhage are available.

All Datasets

Name	Description
BrainMetShare	156 pre- and post-contrast whole brain MRI studies, including high-resolution, multi-modal pre- and post-contrast sequences in patients with at least 1 brain metastasis accompanied by ground-truth segmentations by radiologists.
COCA - Coronary Calcium and Chest CTs	We provide two datasets: 1) gated coronary CT DICOM images with corresponding coronary artery calcium segmentations and scores (XML files), and 2) non-gated chest CT DICOM images with coronary artery calcium scores.
CT Pulmonary Angiography	A collection of CT pulmonary angiography (CTPA) studies for patients susceptible to pulmonary embolism (PE). In addition to slice-level PE labels, we provide labels for PE location, RV/LV ratio, and PE type.
CheXlocalize	CheXlocalize is a radiologist-annotated segmentation dataset on chest X-rays. The dataset consists of two types of radiologist annotations for the localization of 10 pathologies: pixel-level segmentations and most-representative points. The validation and test sets consist of 234 chest X-rays from 200 patients and 668 chest X-rays from 500 patients, respectively.
CheXpert Demo Data	Self-reported race labels for the popular CheXpert dataset in the interest of open science, experimental validation and reproducibility, and to encourage further work in this important area.
CheXpert: Chest X-rays	224,316 chest radiographs of 65,240 patients who underwent a radiographic examination at Stanford between October 2002 and July 2017, in both inpatient and outpatient centers.
CheXphoto	A training set of natural photos and synthetic transformations of 10,507 x-rays from 3,000 unique patients that were sampled at random from the CheXpert training set, and a validation and test set of natural and synthetic transformations applied to all 234 x-rays from 200 patients and 668 x-rays from 500 patients in the CheXpert validation and test sets, respectively.
CheXplanation	Radiologist-annotated segmentation dataset on chest x-rays and competition for automated pathology segmentation. The dataset can also be used for evaluation of x-ray interpretation models.
DDI - Diverse Dermatology Images	Artificial intelligence (AI) may aid in triaging skin disease. However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset - the first publicly available, deeply curated, and pathologically confirmed image dataset with diverse skin tones.
EchoNet-Dynamic	10,030 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes.
EchoNet-LVH	The EchoNet-LVH dataset includes 12,000 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac chamber size and wall thickness.
EchoNet-Pediatric	The EchoNet-Peds database includes 7,643 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes. The database includes patients ranging from 0-18 years (43% female) with a wide range of sizes.
LERA - Lower Extremity Radiographs	182 patients who underwent a radiographic examination at Stanford between 2003 and 2014. Includes images of the foot, knee, ankle, or hip associated with each patient.
MRNet: Knee MRIs	1,370 knee MRI exams performed at Stanford. Contains 1,104 (80.6%) abnormal exams, with 319 (23.3%) ACL tears and 508 (37.1%) meniscal tears; labels were obtained through manual extraction from clinical reports.
MURA: MSK X-rays	A large dataset of musculoskeletal radiographs containing 40,561 images from 14,863 studies, where each study is manually labeled by radiologists as either normal or abnormal.
Multimodal Pulmonary Embolism Dataset	1,794 patients susceptible to pulmonary embolism at Stanford. The dataset consists of chest CT, patient demographics and medical history.
RadGraph: CheXpert Results	RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema designed to structure radiology reports.
SinoCT	9,776 head CTs with reconstructed images and a high-quality simulated sinogram, each labeled as normal/abnormal by experienced radiologists at the time of interpretation. Labels for hemorrhage are available.
SKM-TEA	Imaging data and annotations for 155 quantitative double echo steady state MRI knee scans acquired clinically at Stanford. The data includes the raw kspace, DICOM images, segmentations of six tissues, and bounding boxes for 16 pathologies.
Thyroid Ultrasound Cine-clip	167 patients with biopsy-confirmed thyroid nodules (n=192) at Stanford. The dataset consists of ultrasound cine-clip images, radiologist-annotated segmentations, patient demographics, lesion size and location, TI-RADS descriptors, and histopathological diagnoses.
