Truveta Raises $320M to Sequence 10M Exomes and Build Largest Genetic Database

The health data company Truveta, in collaboration with Regeneron, Illumina, and 17 major U.S. health systems, has launched the Truveta Genome Project. The initiative aims to sequence the exomes of up to 10 million consenting participants, creating a diverse database that links genetic information with de-identified medical records.

While the UK Biobank currently holds the world’s largest whole-genome sequencing dataset, with 500,000 participants, the Truveta Genome Project focuses on exome sequencing, which reads only the protein-coding regions of the genome, at a much larger scale. By prioritizing diversity and representation, the project aims to uncover insights that will support drug discovery, optimize clinical trials, and advance personalized healthcare.

Aris Baras, senior vice president at Regeneron and head of the Regeneron Genetics Center, says that the Truveta Genome Project "will enable us to explore the complex interplay between genetics and health in unprecedented detail". With nearly three million exomes sequenced at the Regeneron Genetics Center to date, Regeneron scientists have already "identified dozens of genetic-based drug targets for conditions including chronic liver disease, obesity, cancer, and neurodegenerative diseases—many of which have progressed to clinical-stage treatments".

The project will use biospecimens left over from routine medical tests, with the Regeneron Genetics Center conducting the sequencing. Microsoft’s Azure platform will provide the cloud infrastructure to store and analyze the data securely. A key component of the project is the Truveta Language Model, an AI system, also built on Azure, that is designed to process and standardize large volumes of genetic and clinical data. By applying AI to this dataset, researchers hope to better understand genetic contributions to health and disease.

This project seeks to close gaps in genetic research by including participants from diverse backgrounds, ensuring the data reflects a broad range of ancestries, genders, and social factors. By integrating genomic and clinical data into Truveta Data, powered by the Truveta Language Model, researchers can identify genetic links to health outcomes with greater precision. This data also supports more efficient clinical trials by predicting patient responses to therapies and improving trial design. The project’s combination of scale, diversity, and AI integration aims to advance understanding of disease and develop more effective, targeted treatments.

As part of Truveta's $320 million Series C funding round, Regeneron has invested $119.5 million in the project and Illumina has contributed $20 million. The round also includes investments from 17 health systems such as Advocate Health, CommonSpirit Health, and Northwell Health. Truveta, founded in 2020, has already compiled de-identified electronic health records from 120 million patients through its partnerships with 30 health systems.
