
Stanford Aims For Global Data Access For AI Research

Tuesday, August 3, 2021



Part of the idea, says the center, is to create an open, global repository that will let researchers explore important clinical use cases and pursue broader research opportunities.

AI has an insatiable appetite: for data, that is.

In recognition of that fact, Stanford University’s Center for Artificial Intelligence in Medicine and Imaging (AIMI) is redoubling its efforts to expand what it says is already “the world’s largest free repository of AI-ready annotated medical imaging datasets.”

As Matthew Lungren, co-director of AIMI and an assistant professor of radiology at Stanford, put it, “What drives this technology, whether you’re a surgeon or an obstetrician, is data. We want to double down on the idea that medical data is a public good, and that it should be open to the talents of researchers anywhere in the world.”

To that end, AIMI has joined with Microsoft’s AI for Health program to launch a new platform that will be more automated, accessible, and visible. It will be able to host and organize many more images from institutions around the world while also serving as a hub for sharing research, making it easier to refine models and to identify differences between population groups. The platform will also offer cloud-based computing power, so researchers won’t need to build their own resource-intensive clinical machine-learning infrastructure.

The center says it already has nine datasets containing more than 1 million images, and Lungren predicts that number will double within the next year, with two new datasets to be released with the new platform.

“This platform will have the largest diversity and volume of AI-ready medical datasets in the world,” he said.

The overall goal is to create an entire ecosystem for AI medical research, while also offering standardized machine-learning tools and pre-trained models leveraging open-source data and common architectures, all “to spur a wave of crowd-sourced AI research.”

The center says that by offering data at no cost, researchers will be able to explore niche areas, such as medical problems that affect particular communities, that large corporations might well overlook.

Moreover, the diversity of the datasets should enable researchers to detect hidden biases in the data or in algorithms, a problem that can distort AI models and make them more accurate for some population groups than for others.

“We love that corporations are doing all this work, but we don’t love the fact that the opportunity to share information is asymmetric,” Lungren said. “If they amass data but then lock it down, they will be the only ones who can innovate, which would shut out the important contributions by computer scientists and clinicians around the world. That’s not a position we want to be in.”