Shared Datasets
Stanford AIMI shares annotated data to foster transparent and reproducible collaborative research to advance AI in medicine.
Our datasets are available to the public to view and use without charge for non-commercial research purposes. For research use, please click on the dataset titles below to be taken to the dataset download page. For commercial use, please submit a commercial use interest form to start a conversation around the details.
PLEASE NOTE: All users of the AIMI data/images are expected to acknowledge Stanford AIMI in all publications, presentations, etc., with the following language: “This research used data provided by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). AIMI curated a publicly available imaging data repository containing clinical imaging and data from Stanford Health Care, the Stanford Children’s Hospital, the University Healthcare Alliance and Packard Children’s Health Alliance clinics provisioned for research use by the Stanford Medicine Research Data Repository (STARR).”
Featured Datasets
CheXpert Plus | Notable for its organization and depth, the CheXpert Plus dataset is a comprehensive collection that brings together text and images in the medical field, featuring a total of 223,462 unique pairs of radiology reports and chest X-rays across 187,711 studies from 64,725 patients. |
All Datasets
Name | Description |
---|---|
| | 156 pre- and post-contrast whole-brain MRI studies, including high-resolution, multi-modal pre- and post-contrast sequences, in patients with at least one brain metastasis, accompanied by ground-truth segmentations by radiologists. |
| | Two datasets: 1) gated coronary CT DICOM images with corresponding coronary artery calcium segmentations and scores (XML files), and 2) non-gated chest CT DICOM images with coronary artery calcium scores. |
| | A collection of CT pulmonary angiography (CTPA) studies for patients susceptible to pulmonary embolism (PE). In addition to slice-level PE labels, we provide labels for PE location, RV/LV ratio, and PE type. |
CheXlocalize | CheXlocalize is a radiologist-annotated segmentation dataset of chest X-rays. The dataset consists of two types of radiologist annotations for the localization of 10 pathologies: pixel-level segmentations and most-representative points. The validation and test sets consist of 234 chest X-rays from 200 patients and 668 chest X-rays from 500 patients, respectively. |
| | Self-reported race labels for the popular CheXpert dataset, released in the interest of open science, experimental validation, and reproducibility, and to encourage further work in this important area. |
| | 224,316 chest radiographs of 65,240 patients who underwent a radiographic examination at Stanford between October 2002 and July 2017, in both inpatient and outpatient centers. |
| | A training set of natural photos and synthetic transformations of 10,507 X-rays from 3,000 unique patients sampled at random from the CheXpert training set, plus validation and test sets of natural and synthetic transformations applied to all 234 X-rays from 200 patients and 668 X-rays from 500 patients in the CheXpert validation and test sets, respectively. |
Diverse Dermatology Images (DDI) | Artificial intelligence (AI) may aid in triaging skin disease. However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset, the first publicly available, deeply curated, and pathologically confirmed image dataset with diverse skin tones. |
| | 10,030 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes. |
EchoNet-LVH | The EchoNet-LVH dataset includes 12,000 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac chamber size and wall thickness. |
EchoNet-Peds | The EchoNet-Peds database includes 7,643 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline to study cardiac motion and chamber sizes. The database includes patients ranging from 0 to 18 years (43% female) with a wide range of sizes. |
EchoNet-Tee-View-Classifier | Intraoperative TEE videos from approximately 500 unique adult cardiac surgery patients from Stanford University Medical Center. This dataset represents the external test dataset for our TEE view classification study. |
INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis | INSPECT contains data from 19,438 patients, including CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals). |
| | 182 patients who underwent a radiographic examination at Stanford between 2003 and 2014. Includes images of the foot, knee, ankle, or hip associated with each patient. |
MRA-MIDAS: Multimodal Image Dataset for AI-based Skin Cancer | The Melanoma Research Alliance Multimodal Image Dataset for AI-based Skin Cancer (MRA-MIDAS) is the first publicly available, prospectively recruited, systematically paired dermoscopic and clinical image-based dataset across a range of skin-lesion diagnoses. |
| | 1,370 knee MRI exams performed at Stanford. Contains 1,104 (80.6%) abnormal exams, with 319 (23.3%) ACL tears and 508 (37.1%) meniscal tears; labels were obtained through manual extraction from clinical reports. |
| | A large dataset of musculoskeletal radiographs containing 40,561 images from 14,863 studies, where each study is manually labeled by radiologists as either normal or abnormal. |
| | 1,794 patients susceptible to pulmonary embolism at Stanford. The dataset consists of chest CT, patient demographics, and medical history. |
RadGraph | RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema designed to structure radiology reports. |
| | 9,776 head CTs with reconstructed images and a high-quality simulated sinogram, each labeled as normal/abnormal by experienced radiologists at the time of interpretation. Labels for hemorrhage are available. |
| | Imaging data and annotations for 155 quantitative double echo steady state MRI knee scans acquired clinically at Stanford. The data includes the raw k-space, DICOM images, segmentations of six tissues, and bounding boxes for 16 pathologies. |
| | 167 patients with biopsy-confirmed thyroid nodules (n=192) at Stanford. The dataset consists of ultrasound cine-clip images, radiologist-annotated segmentations, patient demographics, lesion size and location, TI-RADS descriptors, and histopathological diagnoses. |