Beyond Legacy Tools: Defining Modern AI Drug Discovery for 2025 and Beyond

by Andrii Buvailo, PhD, Oleg Kucheriavyi, PhD | updated on June 6, 2025

In this report:

Intro: The New Framework
“AI Drug Discovery” is About Holism
"AI Drug Discovery" is About Building Software
Access to Data is King
Validation is Critical for AI Drug Discovery Platforms
Grand Vision of AI Drug Discovery for 2025 and Beyond

Disclaimer

This report aims to provide an educational, balanced, and pragmatic perspective on AI-driven drug discovery (AIDD). No part of this report should be construed as promotional content or marketing communication.

Some companies featured are past or current clients, and certain organizations provided factual input during the research process. All analysis and conclusions were developed independently to ensure objectivity.

This report does not constitute investment advice or an endorsement. While we strive for accuracy and neutrality, we accept no liability for decisions made based on this content. Readers are encouraged to conduct their own due diligence.

As of 2025, the emerging category of artificial intelligence-driven drug discovery companies (hereinafter, AIDD) still lacks a robust definition.

The purpose of this report is to suggest a qualitative framework for classification of AIDD companies, combining the four key attributes that define the leading players in this area:

Focus on holism vs reductionism in biology
Creating robust AI platforms (software)
Priority of data acquisition
Technology validation (via demonstrable ability to discover novel targets, discover and develop clinical-grade drug candidates rapidly, a track record of platform partnerships, scientific publications, patents, and so on)

We discuss the framework in more detail below, but in a nutshell, it boils down to this:

AI drug discovery (AIDD) framework — Diagram 1

Indeed, setting aside the specifics of tech stack and platform design, the value of an AI platform for business outcomes comes down to three key questions:

Is the computational platform scalable and robust enough to impact the R&D workflow, collaboration patterns, and daily decision-making of a wide range of specialists in a given organization, and thereby make a productivity difference?
Can it represent biology in silico with sufficient depth and breadth to capture relevant and useful dependencies, patterns, and network biology effects, and so inform scientific decision-making beyond mainstream research workflows?
Is the AI platform capable of addressing the above two questions in a repeatable, stable, standardized way across all levels of R&D workflows in the organization? Would a third-party collaborator be able to get sustainable value out of using the AI software if they had access?

In our opinion, AIDD is about being able to answer “yes” to all three questions. This is what makes the AIDD platform a tangible business asset.

“AI Drug Discovery” is About Holism

As we explore the newly suggested framework, one key distinction emerges: the difference in what we attempt to model and represent computationally in today’s AI-driven landscape versus what was typically addressed using earlier generations of computational tools.

A helpful starting point is to consider the conceptual gap between traditional software—developed decades ago and still widely used in drug discovery for specific tasks—and modern AI-enabled platforms that are increasingly positioned as end-to-end solutions. While both types of tools play valuable roles, their underlying philosophies differ significantly.

In simple terms, “traditional” or “legacy” cheminformatics and bioinformatics rely on human-driven approaches: cheminformatics uses predefined chemical descriptors (like molecular weight or logP), statistical methods and some machine learning approaches for tasks like QSAR modeling and docking, while bioinformatics applies statistical methods, including dimensionality reduction techniques, to analyze complex biological datasets (e.g., genomics, proteomics) and uncover potential drug targets. These methods are hypothesis-driven, modular, and work with smaller, well-structured datasets.
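To make the descriptor-driven workflow concrete, here is a minimal sketch of a single-descriptor QSAR fit in plain Python. The logP and pIC50 values are made-up illustration data, and `fit_line` and `predict` are hypothetical helpers rather than functions from any cheminformatics toolkit.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical training set: an expert-chosen descriptor (logP)
# paired with a measured activity (pIC50) for four compounds.
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.0, 5.5, 6.0, 6.5]

slope, intercept = fit_line(logp, pic50)


def predict(logp_value):
    """Predict activity for a new compound from its logP alone."""
    return slope * logp_value + intercept
```

The point of the sketch is the division of labor: a human picks the descriptor and the model form, and the computation only estimates two coefficients from a small, well-structured table.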

Conceptually, legacy computational systems and simpler machine learning methods are useful in the paradigm of “biological reductionism.” And they do a great job there, even today.

A classic example of the reductionist approach is structure-based drug discovery, where modulating a specific protein is believed to be the answer to a drug discovery problem (it sometimes is). The computational part, therefore, is mostly focused on narrow-scope tasks like fitting a ligand into a protein pocket (docking) or computationally identifying a new type of chemistry for a given target (ligand-based virtual screening).

Structure-based Drug Discovery illustration — "Flow chart for structure based drug design" by Laozhengzz, via Wikimedia Commons (CC BY-SA 3.0)

In stark contrast, cutting-edge AI-driven drug discovery companies attempt to shift to a systems biology level and a hypothesis-agnostic approach, using deep learning-based systems to integrate largely multimodal data (phenotype, omics, patient data, chemical structures, texts, images, etc.) and construct complex, comprehensive representations of biology (e.g. “knowledge graphs”).

For example, the scientific underpinnings of the Pharma.AI computational platform by Hong Kong-based Insilico Medicine are rooted in a novel combination of policy-gradient-based reinforcement learning (RL) and generative models, enabling multi-objective optimization to balance parameters such as potency, toxicity, and novelty.
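As a rough illustration of what "balancing" several objectives can mean in practice, the sketch below collapses potency, toxicity, and novelty into one scalar reward via a weighted sum. This is a generic scalarization technique, not Insilico Medicine's actual reward function; the weights, score ranges, and molecule names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MoleculeScores:
    potency: float   # higher is better, assumed scaled to [0, 1]
    toxicity: float  # higher is worse, assumed scaled to [0, 1]
    novelty: float   # higher is better, assumed scaled to [0, 1]


# Hypothetical weights expressing how much each objective matters.
WEIGHTS = {"potency": 0.5, "toxicity": 0.3, "novelty": 0.2}


def reward(s: MoleculeScores) -> float:
    """Weighted-sum reward; toxicity is inverted so that low toxicity scores high."""
    return (
        WEIGHTS["potency"] * s.potency
        + WEIGHTS["toxicity"] * (1.0 - s.toxicity)
        + WEIGHTS["novelty"] * s.novelty
    )


# Two made-up candidates: one potent but toxic, one balanced.
candidates = {
    "mol_a": MoleculeScores(potency=0.9, toxicity=0.8, novelty=0.5),
    "mol_b": MoleculeScores(potency=0.7, toxicity=0.1, novelty=0.6),
}
best = max(candidates, key=lambda name: reward(candidates[name]))
```

In a policy-gradient setup, a scalar of this kind would be the signal used to nudge the generator toward molecules like `mol_b`, which trades a little potency for much lower toxicity.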

According to the company, the PandaOmics target identification module leverages 1.9 trillion data points from over 10 million biological samples (including RNA sequencing and proteomics) and 40 million documents (such as patents and clinical trials), using NLP and machine learning to uncover and prioritize novel therapeutic targets.

The Chemistry42 module applies deep learning, including generative adversarial networks (GANs) and reinforcement learning, to design novel drug-like molecules optimized for binding affinity, metabolic stability, and bioavailability.

In the context of clinical development, inClinico predicts trial outcomes using historical and ongoing trial data, offering insights into patient selection and endpoint optimization.

On the algorithmic side, Pharma.AI incorporates advanced reward shaping, allowing it to fine-tune generated molecules to specific target profiles or polypharmacological goals. Additionally, Insilico emphasizes the use of knowledge graph embeddings, which encode biological relationships (such as gene–disease, gene–compound, and compound–target interactions) into vector spaces.

These embeddings are augmented by attention-based neural architectures, inspired by transformer models, to focus on biologically relevant subgraphs, refining hypotheses for target identification and biomarker discovery.
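To give a feel for how relationships can live in a vector space, here is a toy sketch of TransE-style scoring, one well-known family of knowledge graph embeddings in which a true triple (head, relation, tail) should satisfy head + relation ≈ tail. The source does not say which embedding method Pharma.AI uses, so this is purely illustrative; the entity names and 2-D vectors are invented.

```python
import math

# Hypothetical, hand-picked 2-D embeddings (real systems learn these
# from data and use hundreds of dimensions).
entity = {
    "GENE_X": (1.0, 0.0),
    "DISEASE_Y": (1.0, 1.0),
    "DISEASE_Z": (-2.0, 3.0),
}
relation = {"associated_with": (0.0, 1.0)}


def transe_score(head, rel, tail):
    """Negative distance ||h + r - t||; values closer to 0 are more plausible."""
    h, r, t = entity[head], relation[rel], entity[tail]
    translated = [hi + ri for hi, ri in zip(h, r)]
    return -math.dist(translated, t)


# Rank candidate diseases for GENE_X by plausibility of the link.
ranked = sorted(
    ["DISEASE_Y", "DISEASE_Z"],
    key=lambda d: transe_score("GENE_X", "associated_with", d),
    reverse=True,
)
```

Once relationships are scored this way, ranking candidate gene–disease or compound–target links reduces to nearest-neighbor arithmetic in the embedding space.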

The platform employs a continuous active learning and iterative feedback process, retraining models on new experimental data, including biochemical assays, phenotypic screens, and in vivo validations, to accelerate the design–make–test–analyze (DMTA) cycle by rapidly eliminating suboptimal candidates and enhancing lead generation.
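The selection step of such an active learning loop can be sketched as follows. The example uses ensemble disagreement as the uncertainty signal, which is one common strategy and an assumption here, not a description of the platform's internals; the three "models" are toy stand-ins and `select_batch` is a hypothetical helper.

```python
from statistics import pstdev


def ensemble_predict(models, x):
    """Collect one prediction per ensemble member for candidate x."""
    return [model(x) for model in models]


def select_batch(models, pool, batch_size):
    """Pick the candidates whose ensemble predictions disagree the most."""
    by_uncertainty = sorted(
        pool,
        key=lambda x: pstdev(ensemble_predict(models, x)),
        reverse=True,
    )
    return by_uncertainty[:batch_size]


# Toy stand-ins for trained property predictors; each candidate
# is represented by a single numeric feature.
models = [lambda x: 1.0 * x, lambda x: 1.1 * x, lambda x: 0.9 * x]
pool = [0.5, 4.0, 2.0, 8.0]

# Candidates to send to the wet lab in the next DMTA iteration.
batch = select_batch(models, pool, batch_size=2)
```

In a full loop, the selected candidates would be synthesized and assayed, the new measurements appended to the training set, and the models retrained before the next round of selection.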

Furthermore, the platform’s multi-modal data fusion integrates textual information from published literature, patents, and clinical trial data with omics-level insights and chemical libraries. To this end, Natural Language Processing (NLP) models are used to extract relevant biological context and side-effect annotations from these textual sources, which are then enriched with phenotypic screening data, enabling a holistic view of the drug discovery process.

You can familiarize yourself with some of the aspects of the Pharma.AI platform by reading a recent paper “A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models” (image below is from the paper).

Another relevant example of what can be classified as an AI drug discovery approach is Recursion’s OS platform.

The Recursion OS is a vertical platform of diverse technologies that enables the company to map and navigate trillions of biological, chemical, and patient-centric relationships utilizing approximately 65 petabytes of proprietary data.

According to a commentary by Recursion, the OS integrates ‘Real World’ data generated in the company’s own wet laboratories or by select partners with a ‘World Model’, a collection of AI computational models that are also built in-house. Today, the company’s scaled ‘wet-lab’ biology, chemistry, and patient-centric experimental data feed its ‘dry-lab’ computational tools to identify, validate, and translate therapeutic insights, which can then be validated in the wet lab. The Recursion OS is powered by BioHive-2, which the company claims is the fastest supercomputer wholly owned and operated by a biopharma company.

While different from Insilico Medicine in model architectures and workflows, Recursion is, however, focused on the same key objective: to create a comprehensive representation of biology to be able to mine crucial insights for drug discovery:

Conceptualized representation of the Recursion OS platform

Key models of Recursion OS include Phenom-2, a 1.9 billion-parameter ViT-G/8 MAE trained on 8 billion microscopy images, achieving a 60% improvement in genetic perturbation separability, according to company claims.

MolPhenix, winner of NeurIPS 2024 Best Paper, predicts molecule-phenotype effects with a considerable improvement over baselines. MolGPS, a 3-billion-parameter model, excels in molecular property prediction and integrates proprietary phenomics data, outperforming benchmarks in 12 of 22 ADMET tasks. MolE, trained on 842 million molecular graphs, leads in 10 of 22 ADMET tasks.

An interesting component of the Recursion OS is a knowledge graph tool that evaluates promising signals found by the platform through a complex lens of topics of interest in biology and drug discovery, including global trend scores, protein pockets and structure, competitive landscape, and clinical trials. The knowledge graph allows researchers to perform “target deconvolution” (identifying and validating the molecular targets behind a small molecule's phenotypic responses) in order to narrow hundreds of possibilities down to the best target opportunity.

A more recent example comes from California-based Iambic Therapeutics, founded in 2019. The team at Iambic developed a drug discovery platform that integrates three specialized AI systems (Magnet, NeuralPLexer, and Enchant) into a unified pipeline that computationally spans molecular design, structure prediction, and clinical property inference.

Magnet generates synthetically accessible small molecules by leveraging reaction-aware generative models constrained by Iambic’s automated chemistry infrastructure. These molecules are passed to NeuralPLexer, a multi-scale diffusion-based generative model that directly predicts atom-level, ligand-induced conformational changes in protein-ligand complexes using only protein sequence and ligand graph as input. The resulting structural complexes inform both target engagement and binding specificity.

Finally, Enchant uses a multi-modal transformer architecture trained across diverse, noisy preclinical datasets to predict human pharmacokinetics and other clinical outcomes via transfer learning, achieving high predictive accuracy even with minimal clinical data. This architecture enables an iterative, model-driven workflow where molecular candidates are designed, structurally evaluated, and clinically prioritized entirely in silico before synthesis.

Finally, there is a notable example from the area of neurodegenerative diseases, Verge Genomics. The CONVERGE® platform developed by Verge is an end-to-end, closed-loop machine learning system that integrates large-scale human-derived biological data with predictive modeling.

At its core, CONVERGE® leverages high-dimensional, multi-modal datasets—including over 60 terabytes of human gene expression and inferred gene relationships, thousands of gene perturbation and ChIP-seq studies, millions of protein-protein interactions, and direct-from-human clinical samples across diseases such as ALS, Parkinson’s, and FTD.

These data are used to train machine learning models that identify and prioritize drug targets with increased translational relevance, avoiding reliance on animal or artificial cell models that poorly mimic human biology. Predictions from these models are experimentally validated in-house using Verge’s wet lab infrastructure, forming a feedback loop that continuously refines both biological hypotheses and model performance.

This integration of patient-derived tissue data, mechanistic genomics, and computational target prioritization is aimed at the identification of clinically viable drug candidates without brute-force screening. Verge’s internally developed clinical compound was derived entirely through CONVERGE® in under four years, including the target discovery stage.

Conceptually, “AI drug discovery”, in contrast to “legacy” computational systems, refers to a modern computational tech stack, usually a multimodal ensemble, that is capable of modeling biology holistically, drawing on molecular, phenotypic, and clinical data of all types and sizes (chemical, omics, text, images such as cell staining, EHR, etc.), either all at once or across a substantial part of that variety.

Generative AI

Another crucial aspect distinguishing modern AIDD from earlier computational tools is its generative capabilities.

While companies like Insilico Medicine pioneered the use of Generative Adversarial Networks (GANs) for generative chemistry back in 2016, leveraging their ability to model complex molecular distributions and propose novel chemical structures, it was the introduction of the transformer architecture in 2017, followed by models like BERT and GPT, that in our opinion brought about a paradigm shift in generative modeling across domains.

We consider 2017 a pivotal year for generative AI, including in chemistry and biology, owing to the landmark paper “Attention Is All You Need”.

These architectures, pioneered by Google, and later developed by OpenAI, Anthropic, Mistral AI, and others, demonstrated unparalleled scalability and capacity for capturing long-range dependencies in sequential data.

By pretraining on vast corpora of text (hundreds of billions and even trillions of tokens) and employing self-attention to dynamically weight input relationships, transformers enabled large-scale generative models such as GPT-3 and GPT-4 to generate highly coherent and contextually accurate outputs.
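For readers who want to see what "dynamically weighting input relationships" looks like, below is a minimal pure-Python sketch of scaled dot-product attention, the core operation of the transformer. As a simplifying assumption, queries, keys, and values are the raw token vectors themselves; real models first pass them through learned projection matrices and use many attention heads.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Three toy 2-D token vectors attending to one another (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
```

Each row of `contextual` is a blend of all token vectors, with the blend weights computed from the data rather than fixed in advance, which is what lets the model capture long-range dependencies.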

Yes, “hallucinations” are still a major issue. But the shift is profound, nonetheless. The pioneering commercial products in this regard are ChatGPT for primarily text-to-text generation, Midjourney for text-to-image generation, and many others for text-to-video, text-to-music, etc.

The emergence of practically feasible transformers and large language models catalyzed a sort of race in computational chemistry and biology towards so-called foundation models. The article 19 Companies Pioneering AI Foundation Models in Pharma and Biotech summarizes some of the initiatives in this domain.

To summarize, here is a simple, generalizable framework for drawing a line between legacy computer-aided drug design (CADD) and modern AIDD:

Table 1

Dimension	Traditional Chem(Bio)informatics	AI Drug Discovery
Primary Focus	Methodical QSAR, structure-based design, library searches	Automated, data-intensive predictions and/or generative output, end-to-end optimization, novel hypothesis generation, biology scoring, etc.
Core Techniques	- QSAR (linear/non-linear models) - Docking & virtual screening - Descriptor-driven modeling	- Deep learning (CNNs, GNNs) - Generative models (VAEs, GANs) - Transformers, attention algorithm - Active learning, reinforcement learning
Feature Engineering	- Heavily reliant on manually crafted descriptors - Traditional molecular fingerprints	- Automated feature extraction from raw data (e.g., molecular graphs) - Learns non-obvious patterns
Data Sources	- Limited to known chemical and structural data - Smaller curated databases	- Integration of large-scale multi-modal data (omics, real-world evidence) - Massive virtual libraries - Synthetic data
Generative Capability	- Rule-based or library-based enumeration - Similarity-driven searches	- Machine learning–based de novo molecule generation - Novel chemistry exploration
Scalability	- Often constrained by computational cost of docking or QSAR on moderate-sized libraries	- Designed to handle billions of compounds or biological data points in silico - Cloud-based, high-throughput pipelines
Human Involvement	- Significant expert intervention needed (e.g., choosing descriptors, scoring functions)	- Reduced manual involvement through automation - AI suggests experiments and molecules for validation
Integration Across Stages	- Typically used as isolated tools (e.g., for docking or property prediction)	- Can form an end-to-end platform (target ID to lead optimization, to clinical trial optimization ideas or predicting clinical trial success) - Real-time feedback loops
Scope of Insights	- Narrowly focused on chemical structures and known SAR rules	- Deeper pattern recognition across complex, high-dimensional datasets - Potential for discovering novel biology and chemistry, novel hypotheses
Value Proposition	- Proven track record for well-known targets and chemical series	- Potential for identifying breakthrough hypotheses, targets, biomarkers, and molecules, as well as diagnostic solutions - Accelerated and more efficient R&D cycles

Next, as we have reviewed what “AI drug discovery” attempts to model (holistic biology vs mainstream “reductionism”), and what kind of models are generally capable of doing so, let’s discuss another crucial aspect — AI platform “maturity” as a software product.

"AI Drug Discovery" is Also About Building Software

A characteristic feature of leading AIDD platforms, as opposed to “superficial” AI companies, is a demonstrable focus on building actual software. This is a simple observation, yet one overlooked by many analysts, journalists, and commentators in this area.

An AIDD company should be able to demonstrate a robust, self-contained software platform that supports critical functionalities, ranging from user-friendly graphical interfaces (GUI) for data input and parameter tuning to configurable machine learning modules (including algorithm selection, hyperparameter adjustment, and visualization of model performance).

Such a platform should integrate standardized data ingestion pipelines (e.g., for omics data, small-molecule libraries, or clinical metadata) with back-end components enabling dynamic model training, validation, and iterative optimization (e.g., active learning, reinforcement learning loops).

A well-documented application programming interface (API) is also essential for interoperability with external tools, ensuring end users can automate workflows and seamlessly exchange data between software components. Additionally, a proper end-to-end solution should incorporate security, data integrity measures (version control, audit trails, encryption), and deployment options (on-premises or cloud-based) to fit diverse organizational needs.

For platforms that meet the definition of “AI drug discovery” suggested by the new framework, you can actually see and even access a demo of their software and, in some cases (as with Insilico Medicine, Schrodinger, OWKIN, Iktos, CytoReason, BenchSci, and others), license it and use it for your internal projects.

We were unable to identify information about software characteristics or live demos from the overwhelming majority of companies claiming to be AI-driven businesses.

In the context of AI-driven drug discovery (AIDD), the maturity of a company’s software platform is not a minor detail — it’s foundational. This is because no AI solution currently exists that can independently produce a clinical-grade therapeutic at the push of a button. Despite impressive advances, today’s AI systems serve primarily as intelligent co-pilots — tools that support, rather than replace, the expertise of human scientists.

Given this supporting role, the real value of an AI system lies in how seamlessly it integrates into a company’s internal workflows. To have a tangible impact on R&D productivity and innovation quality, the software must be more than a set of models or interfaces — it must be a mature, interoperable platform capable of scaling across the organization. Without that level of software sophistication, the promise of AI in drug discovery remains largely theoretical.

AI drug discovery (AIDD) companies operate at the intersection of software and life sciences, yet their valuation and benchmarking remain largely pharma-centric. Current assessments prioritize clinical pipeline progression, regulatory milestones, and wet-lab validation, while largely overlooking key software-driven metrics such as model accuracy, algorithmic scalability, data ownership, and compute efficiency.

Given that many AIDD firms generate value through AI-powered platforms, predictive analytics, and proprietary datasets, their business potential should also be measured using software industry methodologies—such as revenue from AI-as-a-service models, IP valuation of proprietary algorithms, and cloud-based scalability. A more comprehensive benchmarking approach should integrate both software and pharmaceutical industry frameworks to more accurately capture the diverse value propositions of AIDD companies.

Access to Data is King

We expect 2025 to bring increased momentum in areas related to data generation, integration, and applied analytics — particularly across technologies like next-generation sequencing (NGS), advanced proteomics, mass spectrometry, cryo-EM, organ-on-chip systems, and robotics-enabled laboratories. These technologies are foundational to enabling richer, more comprehensive datasets for use in drug discovery and translational research.

In this context, companies such as Tempus represent a broader class of data infrastructure providers. Tempus focuses on aggregating and structuring clinical and molecular data, and developing software systems to support data accessibility and clinical decision-making. Their platform is used by healthcare institutions and life sciences organizations to inform diagnostics, treatment choices, and research efforts. As the volume and complexity of biomedical data continue to increase, such platforms may become central to integrating real-world and experimental datasets in support of AI-driven discovery workflows.

From early on, AI drug discovery companies like Insilico Medicine, BPGbio, Recursion, and, more recently, NOETIK have been investing in data acquisition as a cornerstone of their asset valuation.

For instance, Insilico Medicine's so-called “6th-generation” intelligent robotics drug discovery laboratory, launched in December 2022, integrates AI-powered decision-making with fully automated robotic modules for target discovery, compound screening, precision medicine development, and translational research.

Insilico Medicine's 6th-generation robotic lab — Insilico Medicine’s 6th-generation robotic lab in Suzhou BioBAY, China

According to company claims, by combining its Pharma.AI platform with six functional modules—spanning automated cell culture, high-throughput screening, next-generation sequencing (NGS), and high-content imaging—the lab forms a closed-loop system that validates novel targets, optimizes lead compounds, and generates high-quality biological data to train and refine AI models.

In another example, Recursion's data foundation includes 65+ petabytes of proprietary multiomics data, such as phenomics, transcriptomics, proteomics, ADME, InVivomics, genomics, and patient data. Internally, they’ve generated ~36 petabytes via 2.2 million weekly high-throughput experiments, using CRISPR-Cas9 editing and Brightfield imaging to create one of the largest pharma-related datasets. This data is embedded using AI models for advanced biological analysis.

According to an exclusive interview with a company representative, Recursion processes 2.2+ petabytes of transcriptomics data and integrates 20 petabytes of patient data from Helix and Tempus, covering whole genome and exome sequencing from hundreds of thousands of cases.

In cell manufacturing, Recursion produces 1 trillion hiPSC-derived neuronal cells, creating the "Neuromap" for neuroscience and oncology programs with Roche and Genentech, spanning 40 therapeutic programs.

In the next example, CytoReason constructs its data foundation by integrating extensive public and proprietary datasets, encompassing bulk and single-cell transcriptomics, proteomics, and clinical data, into a unified AI-driven Disease Model Platform.

This platform employs advanced machine learning algorithms to map and compare treatments, patient groups, and disease mechanisms at cellular and molecular levels, enabling comprehensive analyses across various diseases and tissues.

Yet another example is Berg Health (now BPGbio), which established an extensive biobank comprising over 100,000 clinically annotated human specimens, including biofluids and tissue samples, to fuel their AI-driven drug discovery platform.

The company conducted comprehensive multi-omics profiling of the specimens—encompassing genomics, proteomics, metabolomics, and lipidomics—to capture a holistic view of human biology.

The resulting high-dimensional datasets were analyzed using their proprietary NAi Interrogative Biology® platform, which integrates Bayesian artificial intelligence learning algorithms to identify disease-specific biomarkers and therapeutic targets. This, arguably, led to the company’s quite successful clinical trial launches over the years.

Finally, a more recent but promising approach comes from NOETIK, which is building its data foundation by sourcing curated human tumor specimens for its in-house biobank, applying stringent quality controls on parameters like ischemia time, necrosis percentage, and sample age, and ensuring each sample is pathologist-reviewed before inclusion.

The company generates multimodal datasets through advanced techniques such as spatial transcriptomics for single-cell RNA expression, whole exome sequencing for genomic alterations, and custom protein panels to map tumor-immune microenvironment interactions, all anchored by spatially randomized tissue microarrays to mitigate slide-level artifacts.

This vast data pipeline, coupled with their patent-pending processes, enables the creation of high-quality, self-supervised training datasets that power their AI engine OCTO, designed to model tumor biology and predict patient-specific therapeutic responses.
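The idea behind spatial randomization can be illustrated with a short sketch: if samples are assigned to slides at random rather than in collection order, slide-level artifacts are less likely to line up with any biological grouping. The source does not describe NOETIK's actual procedure, so the function, sample names, and round-robin scheme below are assumptions for illustration only.

```python
import random


def randomize_layout(sample_ids, n_slides, seed=0):
    """Shuffle samples, then deal them round-robin across slides."""
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    slides = [[] for _ in range(n_slides)]
    for i, sample in enumerate(shuffled):
        slides[i % n_slides].append(sample)
    return slides


# Twelve hypothetical tumor cores spread over three microarray slides.
samples = [f"tumor_{i:02d}" for i in range(12)]
layout = randomize_layout(samples, n_slides=3)
```

Recording the seed alongside the layout makes the assignment auditable, which matters when the resulting data are later used to train models.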

Validation is Critical for AI Drug Discovery Platforms

Finally, a central measure of credibility in AIDD is platform validation, which typically involves demonstrating tangible outcomes and reproducibility across diverse use cases.

Possible ways to validate a platform include a combination of the following:

(1) by advancing internal pipelines of novel therapeutics, where the AI engine is used to support the R&D team in discovering, designing, and optimizing lead molecules that progress through preclinical and, in some cases, clinical development.

(2) through partnerships with established pharmaceutical or biotech organizations, enabling third-party testing of the AI platform’s predictive power and generative capabilities on proprietary datasets. A track-record of public milestone announcements is critical.

(3) via public software demos or proof-of-concept studies published in peer-reviewed journals, and patents.

(4) via regular publishing of AIDD case studies in peer-reviewed journals.

Below is a table showing historical pipeline growth dynamics for several notable companies frequently referenced in the AI-driven drug discovery space, including BenevolentAI, Healx, Insilico Medicine, Schrodinger, Relay Therapeutics, Recursion, Valo Health, Verge Genomics, and Exscientia, which was acquired by Recursion in 2024:

Table 2 (the indicated data is for end of May 2025). DISCLAIMER: Historical data sourced from archived public sources, including the Webarchive service (see references 1-36); the data is NOT provided by the companies, and may have evolved.
Company	Program	Ownership	Indication	Target	2019	2020	2021	2022	2023	2024	2025^◔
BenevolentAI	BEN-8744	Whole	Ulcerative Colitis	PDE10			Discovery	Preclinical	Phase 1	Phase 1	unknown
	BEN-28010	Whole	Glioblastoma Multiforme	CHK1			Discovery	Preclinical	Preclinical	Preclinical	unknown
	BEN-34712	Whole	ALS	RARαβ				Discovery	Preclinical	Preclinical	unknown
	-	Whole	Parkinson's disease	-				Discovery	Discovery	Discovery	unknown
	-	Whole	Fibrosis	-				Discovery	Discovery	Discovery	unknown
	-	Co-owner w/ AstraZeneca	Chronic Kidney Disease	-					Discovery	Discovery	unknown
	-	Co-owner w/ AstraZeneca	Heart Failure	-						Discovery	unknown
	-	Co-owner w/ AstraZeneca	Systemic Lupus Erythematosus	-						Discovery	unknown
	-	Co-owner w/ Merck	Oncology	-					Discovery	Discovery	unknown
	-	Co-owner w/ Merck	Neurology	-					Discovery	Discovery	unknown
	-	Co-owner w/ Merck	Immunology	-					Discovery	Discovery	unknown
	-	Co-owner w/ AstraZeneca	Idiopathic Pulmonary Fibrosis	-					Discovery	unknown
	BEN-2293	Whole	Atopic Dermatitis	TrkA, TrkB, and TrkC	Discovery	Preclinical	Phase 1	Phase 2	Phase 2
	BEN-9160	Whole	ALS	Bcr-Abl			Discovery	Preclinical	unknown
	-	Whole	Inflammatory Bowel disease (IBD)	-				Discovery	unknown
	-	Whole	Antiviral	-				Discovery	unknown
	-	Whole	Oncology	-				Discovery	unknown
	-	Whole	Oncology	-				Discovery	unknown
	-	Whole	NASH	-				Discovery	unknown
	-	Whole	Oncology	-				Discovery	unknown
	-	Whole	Parkinson's disease	-				Discovery	unknown
	-	Whole	Inflammation	-				Discovery	unknown
Healx	HLX-1502	-	Neurofibromatosis Type 1: plexiform/ cutaneous neurofibroma	-				Preclinical	Preclinical	Phase 1	Phase 2
	HLX-1502	-	Neurofibromatosis Type 2	-						Preclinical	Preclinical
	HLX-0213	-	Neurofibromatosis Type 1	-						Preclinical	Preclinical
	HLX-0205 + HLX-0206	-	Fragile X syndrome	-				Preclinical	Preclinical	Preclinical	Preclinical
	HLX-0553	-	Angelman syndrome	-				Preclinical	Preclinical	Preclinical	Preclinical
	HLX-1066	-	Autosomal dominant polycystic kidney disease (ADPKD)	-				Preclinical	Preclinical	Preclinical	unknown
	-	-	Autosomal recessive polycystic kidney disease (ARPKD)	-				Preclinical	Preclinical	unknown
	-	-	Autosomal Dominant Polycystic Liver Disease	-				Preclinical	Preclinical	unknown
	-	-	Myotonic Dystrophy type-1	-					Preclinical	unknown
	-	-	Autosomal Dominant Optic Atrophy	-					Preclinical	unknown
	HLX-2607	-	Autosomal Dominant Polycystic Kidney Disease	-						Discovery	Preclinical
	-	-	Leber Hereditary Optic Neuropathy	-					Discovery	unknown
	-	-	Spinocerebellar Ataxia	-					Discovery	unknown
	-	-	Pseudoachondroplasia	-					Discovery	unknown
	-	-	Chronic pancreatitis	-				Preclinical	unknown
	-	-	Renal undisclosed disease	-				Preclinical	unknown
	-	-	Facioscapulohumeral muscular dystrophy (FSHD)	-				Preclinical	unknown
	-	-	COVID-19	-				Preclinical	unknown
	-	-	Bone undisclosed disease	-				Preclinical	unknown
	-	-	Liver undisclosed disease	-				Preclinical	unknown
	-	-	Liver undisclosed disease	-				Preclinical	unknown
	-	-	Neuromuscular undisclosed disease	-				Preclinical	unknown
Insilico Medicine	INS018_055	Whole	IPF	TNIK		Discovery	Preclinical	Phase 1	Phase 2	Phase 2	Phase 2/3
	ISM012	Whole	Anemia of Chronic Kidney Disease	PHD1/2			Discovery	Preclinical	Phase 1	Phase 1	Phase 1
	ISM5411	Whole	Inflammatory bowel disease (IBD)	PHD1/2			Discovery	Preclinical	Phase 1	Phase 1	Phase 1
	ISM8207	Co-owner w/ Fosun	Immuno-oncology	QPCTL			Discovery	Preclinical	Phase 1	Phase 1	Phase 1
	ISM3312	-	COVID-19	3CLpro			Discovery	Preclinical	Phase 1	Phase 1	Phase 1
	ISM3091	Out-licensed, Exelixis	BRCA-mutant cancer	USP1			Discovery	Preclinical	Phase 1	Phase 1	Phase 1
	ISM5043	Out-licensed, Menarini	ER+/HER2-breast cancer	KAT6					Preclinical	Phase 1	Phase 1
	-	Whole	Kidney fibrosis	TNIK			Discovery	Preclinical	Preclinical	Preclinical	Preclinical
	ISM3412	Whole	MTAP-/-cancer	MAT2A			Discovery	Preclinical	Preclinical	Preclinical	Preclinical
	-	Whole	IPF (inhalable)	TNIK			Discovery	Discovery	Preclinical	Preclinical	Preclinical
	ISM9274	Whole	Solid tumors	CDK12/13			Discovery	Discovery	Preclinical	Preclinical	Preclinical
	ISM5939	Whole	solid tumors	ENPP1			Discovery	Discovery	Preclinical	Preclinical	Preclinical
	ISM4525	Whole	Solid tumors	DGKA					Preclinical	Preclinical	Preclinical
	ISM8001	Whole	Solid tumors	FGFR2/3					Preclinical	Preclinical	Preclinical
	ISM6331	Whole	Solid tumors	TEAD					Preclinical	Preclinical	Preclinical
	-	Whole	Solid tumors	KIF18A						Preclinical	Preclinical
	ISM2196	Whole	Solid tumors	WRN						Preclinical	Preclinical
	ISM027	Whole	Solid tumors	cMYC					Discovery	Discovery	unknown
	ISM016	Whole	Gout flare	NLRP3			Discovery	Discovery	Discovery	Preclinical	Preclinical
	ISM022	Whole	AML, Solid tumors	CDK8			Discovery	Discovery	unknown
	ISM023	Whole	Solid tumors	PARP7			Discovery	Discovery	unknown
	-	-	Skin Fibrosis	TNIK			Discovery	Discovery	unknown
	-	Co-owner w/ Fosun	Diabetic Nephropathy, FSGS	-			Discovery	unknown
Recursion / Exscientia	REC-2282	Whole	Neurofibromatosis Type 2	HDAC		Preclinical	Phase 1	Phase 1	Phase 2	Phase 2	Phase 2
	REC-4881	Whole	Familial Adenomatous Polyposis	MEK1 and MEK2		Preclinical	Phase 1	Phase 1	Phase 2	Phase 2	Phase 2
	SYCAMORE / REC-994	Whole	Cerebral Cavernous Malformation	antioxidant, no specific target		Preclinical	Phase 1	Phase 1	Phase 2	Phase 2	Phase 2
	REC-4881	Whole	AXIN1 or APC Mutant Cancers	MEK1 and MEK2					Phase 1	Phase 2
	REC-3964	Whole	Clostridium Difficile Colitis	C. difficile toxins		Discovery	Preclinical	Preclinical	Phase 1	Phase 2	Phase 2
	REC-1245	Whole	HR-proficient Ovarian Cancer RBM39	RBM39					Preclinical	Phase 1	Phase 1
	REC-4209	in-licensed from Bayer	Idiopathic Pulmonary Fibrosis	-						Preclinical	Preclinical
	-	Whole	Oncology	-			Discovery	Discovery	Discovery	unknown
	Immunotherapy Target Alpha	Whole	Oncology	-				Discovery	Discovery	unknown
	Immunotherapy Target Delta	Whole	-	-					Preclinical	Preclinical
	REC-3599	Whole	GM2 Gangliosidosis	PKC and GSK3β		Preclinical	Phase 1	terminated
	-	Whole	Immune Checkpoint resistance in STK11-NSCLC	-			Preclinical	Preclinical	unknown
	-	-	Pulmonary Arterial Hypertension	-				Preclinical	unknown
	-	Whole	-	-				Preclinical	unknown
	-	Whole	Neuroinflammation	-			Discovery	Discovery	unknown
	-	Whole	Charcot-Marie-Tooth Disease Type 2	-			Discovery	Discovery	unknown
	Immunotherapy Target Beta	Whole	Oncology	-				Discovery	unknown
	-	Whole	Hepatocellular Carcinoma	-				Discovery	unknown
	-	Whole	Batten Disease	-			Discovery	unknown
	REC-617	Co-owner w/ Apeiron	Transcriptionally addicted cancers	CDK7			Preclinical	Preclinical	Phase 1/2	Phase 1/2	Phase 1/2
	EXS4318	Out-licensed, BMS	inflammatory and immunologic diseases	PKC-theta		Preclinical	Preclinical	Preclinical	Phase 1	Phase 1
	REC-4539	Whole	Oncology, AML, SCLC	LSD1			Discovery	Discovery	Preclinical	Preclinical	Phase 1
	REC-3565	Whole	Oncology, Hematology	MALT1			Discovery	Discovery	Preclinical	Preclinical	Phase 1
	REV102	Co-owner	Hypophosphatasia	ENPP1			Discovery	Preclinical	Preclinical	Preclinical	Preclinical
	EXS21546	Majority, w/ Evotec	High Adenosine Signature Cancers	A2aR		Preclinical	Phase 1	Phase 1/2	Phase 1/2
	-	Whole	COVID-19	Mpro			Discovery	Preclinical	unknown
	-	Whole	Inflammation and Immunity	NLRP3			Discovery	Preclinical	unknown
	-	Co-owner	Psychiatry	-			Discovery	Preclinical	unknown
	-	Co-owner	Oncology	ENPP1			Discovery	Preclinical	unknown
	-	Co-owner	Oncology	-			Discovery	Discovery	unknown
	-	Co-owner	Inflammation and immunity	-			Discovery	Discovery	unknown
	-	Co-owner	Inflammation and Immunity	-			Discovery	Discovery	unknown
	-	Co-owner	Oncology	-			Discovery	Discovery	unknown
	-	Co-owner	Oncology	-			Discovery	Discovery	unknown
	-	Whole	Immuno-Oncology	HPK1			Discovery	unknown
	-	Whole	Oncology	-			Discovery	unknown
	-	Whole	Oncology	-			Discovery	unknown
	-	Whole	Oncology	-			Discovery	unknown
	-	Whole	Oncology	-			Discovery	unknown
	-	Whole	Anti-infective	-			Discovery	unknown
Relay Therapeutics	RLY-4008	Whole	FGFR2-altered cholangiocarcinoma (CCA)	FGFR2 (mutant+WT)	Discovery	Phase 1	Phase 1	Phase 1	Phase 1/2	Phase 1/2	Phase 1/2
	RLY-2608 monotherapy	Whole	Breast cancer and solid tumors	PI3Kα				Phase 1	Phase 1	Phase 1	Phase 1
	RLY-2608	Whole	Vascular malformations	PI3Kα						Preclinical	Preclinical
	RLY-1013 (degrader)	Whole	Breast Cancer	ERα					Discovery	Preclinical	Preclinical
	NRAS	Whole	melanoma, colorectal and non-small-cell lung	NRAS						Preclinical	Preclinical
	αGal Chaperone	Whole	Fabry disease	αGal						Preclinical	Preclinical
	RLV-PI3K1047 (RLY-5836)	Whole	-	PI3Kα			Discovery	Preclinical	Phase 1	unknown
	RLY-2139	Whole	Oncology	CDK2			Discovery	Discovery	Preclinical	paused
	GDC-1971	Co-owner w/ Genentech	Cancers, expand into multiple combination	SHP2	Preclinical	Phase 1	Phase 1	Phase 1	Phase 1	Phase 1	Phase 1
	-	Whole	-	PI3Kα				Discovery	unknown
	-	Whole	Oncology	-			Discovery	Discovery	unknown
	-	Whole	Oncology	-			Discovery	Discovery	unknown
	-	Whole	Genetic disease	-			Discovery	Discovery	unknown
	-	Whole	Genetic disease	-			Discovery	Discovery	unknown
Schrödinger	SGR-1505	Whole	Relapsed or refractory B-cell lymphoma, chronic lymphocytic leukemia	MALT1	Discovery	Discovery	Preclinical	Phase 1	Phase 1	Phase 1	Phase 1
	SGR-2921	Whole	Hematological cancers and solid tumors	CDC7				Preclinical	Phase 1	Phase 1	Phase 1
	SGR-3515	Whole	Solid tumors	WEE1/MYT1				Discovery	Preclinical	Phase 1	Phase 1
	SDGR5	Whole	KRAS-driven Cancers	SOS1		Discovery	Discovery	Discovery	Preclinical	Preclinical	Preclinical
	-	Whole	Neurology	LRRK2				Discovery	Discovery	Discovery	Discovery
	-	Whole	Oncology	PRMT5-MTA				Discovery	Discovery	Discovery	Discovery
	-	Whole	Oncology	EGFR(C797S)				Discovery	Discovery	Discovery	Discovery
	-	Whole	Immunology	NLRP3				Discovery	Discovery	Discovery	Discovery
	-	Whole	Oncology	-				Discovery	Discovery	unknown
	-	Whole	Oncology	-				Discovery	Discovery	unknown
	-	Whole	Immunology	-				Discovery	Discovery	unknown
	SDGR1	Whole	Esophageal and Lung Cancers,	CDC7	Discovery	Discovery	Discovery	unknown
	SDGR2	Whole	Ovarian, Pancreatic, Breast and Lung Cancers	WEE1	Discovery	Discovery	Discovery	unknown
	TAK-279	Co-owner w/ Takeda	Psoriasis	TYK2					Phase 2	Phase 3	Phase 3
	-	Gilead	NASH	ACC				Phase 2	Phase 2	Phase 2	Phase 2
	MORF-057	Lilly	Inflammatory bowel diseases	α4β7				Phase 2	Phase 2	Phase 2	Phase 2
	-	Co-owner w/ Nimbus Therapeutics	Immuno-oncology	HPK1					Phase 1/1	Phase 1/2	Phase 1/2
	-	Co-owner w/ Structure Therapeutics	Pulmonary arterial hypertension	APJR					Phase 1	Phase 1	Phase 1
	-	Structure Therapeutics	Idiopathic pulmonary fibrosis	LPA1R					Preclinical	Preclinical	Preclinical
	-	Co-owner w/ Ajax	Oncology	JAK2					Discovery	Preclinical	Preclinical
	-	BMS	Neurology	-					Discovery	Discovery	Discovery
	-	Collab. w/ BMS	Oncology, Immunology, Neurology	-				Discovery	Discovery	Discovery	Discovery
	-	Co-owner w/ Takeda	Oncology	-				Discovery	Discovery	Discovery	Discovery
	-	Co-owner w/ Lilly	Immunology	-				Discovery	Discovery	Discovery	Discovery
	-	Lilly	Pulmonary arterial hypertension	-				Discovery	Discovery	Discovery	Discovery
	-	Lilly	Solid tumors, fibrosis	αvβ8				Discovery	Discovery	Discovery	Discovery
	-	Lilly	GI indications	α4β7				Discovery	Discovery	Discovery	Discovery
	-	Co-owner w/ Bright Angel Therapeutics	Antifungal	HSP90					Discovery	Discovery	Discovery
	-	Structure Therapeutics	-	-					Discovery	Discovery	Discovery
	-	Otsuka	CNS	-				Discovery	Discovery	Discovery	Discovery
	-	Co-owner w/ Loxo Therapeutics	oncology	-					Phase 1	unknown
	-	Co-owner w/ BMS	immunology	-					Discovery	unknown
	-	Co-owner w/ Sanofi	oncology	-					Discovery	unknown
	-	Co-owner w/ BMS	Oncology	-				Discovery	unknown
	-	Co-owner w/ BMS	Oncology	-				Discovery	unknown
	-	Co-owner w/ BMS	Immunology	-				Discovery	unknown
	-	Co-owner w/ Zai Lab	Oncology	-				Discovery	unknown
	SDGR4	Co-owner w/ BMS	Renal Cell Carcinoma	HIF-2a		Discovery	Discovery	unknown
	-	Co-owner w/ BMS	Oncology, Immunology, Neurology	-			Discovery	unknown
Verge Genomics	VRG50635	Co-developer w/ Ferrer	ALS	PIKfyve		Discovery	Preclinical	Phase 1	Phase 1	Phase 1	Phase 1
	VRG201	Whole	Obesity	CD38						Preclinical	Preclinical
	VRG201	Whole	Metabolic Syndrome	CD38						Preclinical	Preclinical
	-	Whole	Alzheimer disease / Parkinson's Disease	PIKfyve				Discovery	Discovery	Discovery	Discovery
	-	Whole	Neurodegenerative Diseases	CD38				Discovery	Discovery	Discovery	Discovery
	-	Whole	Peripheral	PIKfyve						Discovery	Discovery
	-	Whole	Schizophrenia	-				Discovery	Discovery	Discovery	Discovery
	-	Whole	Frontotemporal Dementia	-				Discovery	Discovery	Discovery	Discovery
	-	Whole	Progressive Supranuclear Palsy	-				Discovery	Discovery	Discovery	Discovery
	-	-	Crohn's Disease	-					Discovery	Discovery	Discovery
	-	-	Ulcerative Colitis	-					Discovery	Discovery	Discovery
	-	-	Psoriasis	-					Discovery	Discovery	Discovery
	-	-	Lewy Body Dementia	-						Discovery	Discovery
	-	-	Friedreich’s Ataxia	-						Discovery	Discovery
	-	-	Myotonic Dystrophy 1	-						Discovery	Discovery
	-	-	Pick's Disease	-						Discovery	Discovery
	Partnered Programs	Co-owner w/ Lilly	ALS	-					Discovery	Discovery	Discovery
	Partnered Programs	Co-owner w/ Lilly	ALS	-					Discovery	Discovery	Discovery
	-	-	Atopic Dermatitis	-					Discovery	unknown
	Partnered Programs	Co-owner w/ Alexion	Neurodegenerative Diseases	-					Discovery	unknown
	Partnered Programs	Co-owner w/ Alexion	Neuromuscular Diseases	-					Discovery	unknown
	-	Whole	COVID-19	PIKfyve			Discovery	Preclinical	unknown
	-	Whole	Undisclosed	-				Discovery	unknown
	-	Whole	Parkinson's Disease	-				Discovery	unknown
	-	Whole	Parkinson's Disease	-				Discovery	unknown
Valo Health	OPL-0301	-	Heart failure and Acute Kidney Injury	S1P1 agonist			Phase 1	Phase 2	Phase 2	unknown
	OPL-0401	-	Diabetic Retinopathy	ROCK 1/2 inhibitor			Phase 1	Phase 2	Phase 2	Phase 2
	OPAL-0022	-	Atherosclerosis	-			Discovery	unknown
	OPAL-0004	-	Atherosclerosis, Glioblastoma	-			Discovery	unknown
	OPAL-0018	-	Atherosclerosis	-			Discovery	unknown
	OPAL-0003	-	Heart Failure, Glioblastoma	-			Discovery	unknown
	OPL-0101	-	Immuno-Oncology	-		Discovery	Preclinical	unknown
	OPAL-0021	-	cancer	-			Discovery	unknown
	OPAL-0015	-	NSCLC, Squamous Cell Carcinoma, Targeted Defined Tumors	USP28			Discovery	unknown
	OPAL-0024	-	Solid Tumors	-			Discovery	unknown
	OPAL-0001	-	Medulla/Glioblastoma Brain Tumors, Breast Cancer	PARP1			Discovery	unknown
	OPAL-0014	-	Pancreatic Ductal Adenocarcinoma (PDAC), Targeted Defined Tumors	-			Discovery	unknown
	OPAL-0023	-	Defined Tumors, Immune Modulation	-			Discovery	unknown
	OPAL-0012	-	NSCLC	USP7			Discovery	unknown
	OPAL-0016	-	Induced Neuropathy and Cardiomyopathy	-			Discovery	unknown
	OPAL-0002	-	Neurodegenerative disorders	-			Discovery	unknown
	OPAL-0006	-	Neurodegenerative: Oncology (metastatic)	-			Discovery	unknown

The data in the table and infographics above show that Insilico Medicine has achieved notable pipeline growth over the past five years. Specifically, the company has launched 31 therapeutic programs targeting diverse indications, with 22 preclinical candidates nominated from 2021 and nine more in 2022, and a total of 10 programs receiving IND approval. Insilico's lead program, for idiopathic pulmonary fibrosis (IPF), progressed from concept to Phase 1 trials in just under 30 months and is now in Phase 2 clinical trials in both the United States and China, while five other programs are in Phase 1.

Additionally, several clinical assets have been out-licensed or co-developed with third parties, with recent milestone payments received. These accomplishments appear to highlight a significant productivity boost, yet the question remains whether they definitively prove that AI technologies, rather than more conventional R&D structures and partnerships, are the key drivers behind this rapid expansion.

A similar phenomenon is observed with Schrödinger’s platform, which, although not explicitly marketed as “AI,” has enabled significant pipeline development. While software capabilities can help streamline decision-making (for example, by accelerating target identification and optimizing lead compounds), drug discovery is not a simple matter of “printing” successful molecules. Determining the true impact of AI requires assessing the extent to which such platforms reduce discovery cycles or increase success rates in a statistically meaningful way. One potentially stronger validation approach is to examine how many third-party organizations license and effectively use these tools for drug development, thereby providing external feedback and real-world performance benchmarks.

Beyond just the number of drug candidates in the pipelines of AI companies, it is interesting to look at the target novelty landscape of some of the well-known AI players:

Novelty of targets that drug discovery AI companies work with — Diagram 3

Speed and Cost of Drug Development

The timelines for nominating preclinical candidates and reaching IND, as reported by some AI-driven drug discovery companies over the last several years, suggest an accelerated path compared with known industry averages.

For instance, companies like Insilico Medicine, Recursion, and Exscientia have compressed the discovery phase from the industry-standard 2.5 to 4 years (40-50 months) down to 9 to 18 months in some cases.

According to a recently published benchmark, Insilico Medicine averages 12-18 months per program, testing only 60-200 molecules, while Recursion advances candidates in 18 months with fewer than 200 molecules per program. Exscientia, which merged with Recursion, claims to have shortened its timeline from four to five years to just 12 to 18 months, screening 150-250 molecules — a notable contrast to traditional methods that sometimes require testing 3,000-5,000 molecules per program.

Table 3

Company	Discovery Timelines	Programs (and molecules tested, where reported)
BenevolentAI	Around 24 months in the case of BEN‑8744	A small molecule PDE10 inhibitor for UC treatment
Evaxion	Around 12 months for EVX‑01	Neoantigen vaccine EVX-01 for metastatic melanoma
Exscientia (merged with Recursion)	Around 11 months for EXS4318; ~12-18 months on average	EXS4318 (PKC-theta inhibitor) for inflammatory and immunologic diseases; on average, 150-200 molecules per program
Iambic Therapeutics	Around 8 months for IAM1363	A small molecule for the treatment of HER2-altered cancers
Insilico Medicine	~12 months on average across 22 preclinical candidates	Programs in idiopathic pulmonary fibrosis (IPF), inflammatory bowel disease (IBD), immuno-oncology, COVID-19, and others; on average, 60-200 molecules per program
Nimbus Therapeutics	Around 56 months for NDI‑034858	NDI-034858 is an allosteric TYK2 inhibitor for the treatment of multiple autoimmune diseases
Recursion	At least 18 months in the case of REC‑1245	Recursion's REC-1245 RBM39 degrader for solid tumors and lymphoma; <200 molecules per program
Relay	Around 48 months for RLY‑4008	FGFR2-specific inhibitor RLY-4008 for cholangiocarcinoma
Schrödinger	Over 24 months for SGR‑3515	A Type 1 kinase inhibitor oncology project
Traditional approaches	2.5-4 years (40-50 months)	3000-5000 molecules per program
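As a rough illustration of what these figures imply, the following Python sketch converts the ranges quoted above into fold-reduction ranges relative to the traditional baseline. It uses only the self-reported numbers from the text and Table 3, which this report did not independently validate; the function and variable names are illustrative assumptions, not part of any company's methodology.

```python
# Illustrative sketch: fold reduction implied by the self-reported ranges.
# All ranges are (low, high) tuples taken from the text and Table 3.

def fold_reduction(baseline, reported):
    """Return (conservative, optimistic) fold reduction for two (low, high) ranges.

    Conservative: smallest baseline value vs. largest reported value.
    Optimistic: largest baseline value vs. smallest reported value.
    """
    base_low, base_high = baseline
    rep_low, rep_high = reported
    return base_low / rep_high, base_high / rep_low


# Traditional baseline as quoted in the text.
TRADITIONAL = {"months": (40, 50), "molecules": (3000, 5000)}

# Company-reported figures as quoted in the prose above the table.
REPORTED = {
    "Insilico Medicine": {"months": (12, 18), "molecules": (60, 200)},
    "Exscientia": {"months": (12, 18), "molecules": (150, 250)},
}


def summarize():
    """Build {company: {metric: (conservative, optimistic)}} rounded to 1 decimal."""
    rows = {}
    for company, figures in REPORTED.items():
        rows[company] = {
            metric: tuple(
                round(x, 1)
                for x in fold_reduction(TRADITIONAL[metric], figures[metric])
            )
            for metric in figures
        }
    return rows


if __name__ == "__main__":
    for company, row in summarize().items():
        print(company, row)
```

Under these assumptions, Insilico Medicine's reported figures correspond to roughly 15 to 83 times fewer molecules and about 2.2 to 4.2 times shorter discovery timelines than the traditional baseline. The width of these ranges is itself a reminder of how loosely such self-reported comparisons are defined.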

However, these seemingly accelerated timelines have yet to fully translate into clinical success. While some AI-developed drugs have progressed into ongoing clinical trials — such as those from Insilico Medicine, Iambic, and Recursion — there are a number of failed clinical trials or discontinuations for strategic reasons. Some examples are discussed in our 2024 report “It’s Been a Decade of AI in the Drug Discovery Race. What’s Next?”

Although AI-based computational tools may predict promising candidates faster (according to various claims), this does not guarantee that these drugs will be clinically viable, effective, or safe. The reduced number of molecules screened in AI-driven programs may also pose risks, as narrowing the search space too aggressively could lead to overlooked liabilities that emerge later in clinical development.

Apart from timelines, cost is another important parameter. While this study does not examine the cost structure of such programs, Insilico Medicine once reported that some of its AI-designed drug candidates were discovered at around 10% of a “conventional” program cost, a claim we did not specifically validate.

Pragmatic Considerations for Evaluating AIDD Companies:

Look Beyond Candidate Counts: Merely tallying the number of pipeline assets does not capture the incremental value AI platforms may provide. Faster program initiation or improved attrition rates, for instance, could be more telling indicators.
Evaluate Decision-Making Efficiency: Pinpoint where AI significantly shortens R&D workflows — e.g., by expediting hit-to-lead stages or improving target validation, or supporting more efficient clinical trial protocol design.
Scrutinize External Adoption: Seek third-party evidence of productivity gains, such as collaboration announcements, successful milestones, or continued software licensing agreements. Tools that are openly licensed or sold commercially allow for real competitive benchmarking.
Consider Contextual Factors: Keep in mind that corporate strategy, funding, and existing R&D infrastructure often play major roles in pipeline output. AI’s contribution cannot be isolated without analyzing these concurrent influences, and in practice it is almost impossible to calculate the precise impact of AI algorithms on the drug development process.
Incorporate Financial Signals with Caution: The real value of AIDD platforms comes from their scientific impact. Still, selected financial indicators (e.g. how fast revenue is growing, how much is spent on R&D, or steady income from software) may offer useful context about platform maturity and commercial progress. For a snapshot of financial performance across selected AIDD companies, see Table 4 and Table 5 below.

Table 4 ** (all monetary values in USD thousands)

DISCLAIMER: The financial data provided below is intended solely for contextual analysis and educational purposes. It should not be interpreted as investment advice, endorsement, or an indicator of future performance. All figures are sourced from publicly available annual reports and financial disclosures as of the latest reporting period. Readers are encouraged to conduct independent due diligence when evaluating any financial metrics or company performance.

	Year	Total Revenue	Drug discovery services	Software solution services	Total Revenue Growth, %	Drug discovery services Growth, %	Software solution services Growth, %	Research and development expenses	Operating Loss	Total current assets
BenevolentAI ^*1	2021	6,254						-76,962	-164,052	76,567
	2022	12,774			104%			-86,958	-238,352	183,977
	2023	9,334			-27%			-77,380	-98,766	116,355
	2024	^*2			^*2			^*2	^*2	12,496 ^*3
Insilico Medicine ^*4	2021	4,713	3,687	1,026				-38,489	-49,359	161,191
	2022	30,147	28,648	1,499	540%	677%	46%	-78,175	-221,828	218,751
	2023	51,180	47,818	3,362	70%	67%	124%	-97,341	-211,640	188,653
	2024	85,834 ^*5	79,733	3,970	68%	67%	18%	-91,895	-17,096	133,409
Recursion	2021	10,178						-135,271	-182,775	534,718
	2022	39,843			291%			-155,696	-245,727	569,814
	2023	44,575			12%			-241,226	-350,060	438,137
	2024	58,839			32%			-314,421	-479,004	714,269
Relay Therapeutics	2021	3,029						-172,650	-364,698	976,242
	2022	1,381			-54%			-246,355	-299,275	1,019,505
	2023	25,546			1750%			-330,018	-373,000	770,103
	2024	10,007			-61%			-319,089	-372,468	809,204
Schrodinger	2021	137,931	24,695	113,236				-90,904	-111,443	625,060
	2022	180,955	45,377	135,578	31%	84%	20%	-126,372	-146,817	533,989
	2023	216,666	57,542	159,124	20%	27%	17%	-181,766	-177,448	567,796
	2024	207,539	27,174	180,365	-4%	-53%	13%	-201,785	-209,296	634,993

** - data taken from annual reports; see references 37-52

*1 - for comparison purposes, values in the UK companies' reports (originally in GBP) have been converted to USD at the GBP/USD closing rate on 31 December of the respective report year:
2021 - 1.3522; 2022 - 1.2097; 2023 - 1.2732; 2024 - 1.2515

*2 - BenevolentAI was delisted and merged into Osaka Holdings S.à r.l. in March 2025; its annual report for 2024 has not been disclosed

*3 - total current assets value was calculated based on the pre-merger pro forma balance sheet of BenevolentAI published in its accounting statements

*4 - The values for years 2024, 2023, and 2022 were taken from the public annual report, while Insilico disclosed 2021 data in our previous AI report

*5 - Includes $2,131 thousand in revenue that is part of the amount attributed to “Other Discovery”
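The growth columns in Table 4 can be reproduced from the revenue columns with a simple year-over-year calculation. The sketch below is a minimal check, assuming growth is computed as (current - previous) / previous and rounded to a whole percent; the function names are illustrative, and the revenue figures are copied from the Total Revenue column above.

```python
# Illustrative sketch: reproduce the "Total Revenue Growth, %" column of Table 4.
# Revenue values are in USD thousands, copied from the table.

def yoy_growth(previous, current):
    """Year-over-year growth as a whole-number percentage."""
    return round((current - previous) / previous * 100)


TOTAL_REVENUE = {
    "BenevolentAI": {2021: 6254, 2022: 12774, 2023: 9334},
    "Insilico Medicine": {2021: 4713, 2022: 30147, 2023: 51180, 2024: 85834},
    "Recursion": {2021: 10178, 2022: 39843, 2023: 44575, 2024: 58839},
    "Relay Therapeutics": {2021: 3029, 2022: 1381, 2023: 25546, 2024: 10007},
    "Schrodinger": {2021: 137931, 2022: 180955, 2023: 216666, 2024: 207539},
}


def growth_by_year(revenue_by_year):
    """Map each year (except the first) to its growth versus the prior year."""
    years = sorted(revenue_by_year)
    return {
        year: yoy_growth(revenue_by_year[prev], revenue_by_year[year])
        for prev, year in zip(years, years[1:])
    }


if __name__ == "__main__":
    for company, revenue in TOTAL_REVENUE.items():
        print(company, growth_by_year(revenue))
```

Large swings such as Relay Therapeutics' 1750% in 2023 follow from a very small prior-year base, which is one reason these percentages should be read alongside the absolute figures.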

Table 5 ``

Company	Short % of Shares	Current Price (USD) 2025-05-30 (US Trading Hour)	52-Week Price Change %	Current MarketCap (USD'MM)	Last 12 months' Operating Cash Flow (USD'MM)	Last 12 months' Revenue (USD'MM)
Recursion Pharmaceuticals, Inc	25.64%	4.18	-49.5%	1,671	-359.2	58.8
Schrodinger, Inc.	13.92%	21.62	0.5%	1,388	-157.4	207.5
Relay Therapeutics, Inc.	13.51%	3.00	-53.2%	514	-249.1	10.0
Insilico Medicine	N/A	N/A	N/A	?	-57.4	85.8

`` - data taken from Yahoo Finance for May 30, 2025.

Grand Vision of AI Drug Discovery for 2025 and Beyond

Having reviewed many self-described AI drug discovery companies over a decade of progress, hype, and reality checks (such as failed clinical trials of arguably AI-designed drug candidates), we believe it is time to define the goal and method of AI drug discovery, and to accept that the overwhelming majority of companies are not there yet.

The entire idea of the AIDD movement is, in fact, not about improving existing drug discovery processes, such as structure-based drug discovery or virtual screening, by using better models, advanced machine learning, etc.

Obviously, such improvements help, and most companies in this business pursue them. Using better models for screening or docking can bring marginal improvements to research processes, but it does not change the fundamental problem of drug discovery: poor translation of hypotheses to clinical results and a high rate of clinical failures due to unexpected toxicity or poor efficacy in a (sometimes poorly) selected patient subpopulation.

The novelty and ambition of the AIDD approach lie in redesigning the existing mainstream drug discovery paradigm into something different.

We suggest calling it “Holistic Drug Development (HDD).”

The idea is to start from modeling the entirety of real-world data about patients (coming from specimens, analytical samples, EHRs, and other biomedical data), take into account all available preclinical data and experience, and build the path down to a relevant underlying hypothesis on a molecular level. The next step is to walk that path in reverse: from the newly discovered hypothesis, via drug design and development, back to the patient, hopefully with an improved probability of success. We believe we are still years away from this reality, but a number of companies are already building pieces of the puzzle of the industrialized research workflow of the future.

Time will tell whether AIDD proves to be the better way to achieve the HDD vision; we are cautiously optimistic.

References

1. BenevolentAI, pipeline, February 2025 https://web.archive.org/web/20250210224107/https://www.benevolent.com/pipeline/

2. BenevolentAI, pipeline, December 2023 https://web.archive.org/web/20231205114116/https://www.benevolent.com/pipeline/

3. BenevolentAI, annual report (PDF), 2022 https://www.benevolent.com/application/files/9816/7939/1282/BenevolentAI_Annual_Report_2022.pdf

4. Healx, pipeline, April 2025 https://web.archive.org/web/20250423114523/https://healx.ai/pipeline/

5. Healx, pipeline, April 2024 https://web.archive.org/web/20240417123453/https://healx.ai/pipeline/

6. Healx, pipeline, April 2023 https://web.archive.org/web/20230329164007/https://healx.ai/pipeline/

7. Healx, pipeline, December 2022 https://web.archive.org/web/20221203025122/https://healx.ai/pipeline/

8. Insilico, pipeline, April 2025 https://insilico.com/pipeline

9. Insilico, pipeline, December 2023 https://web.archive.org/web/20231204133620/https://insilico.com/pipeline

10. Insilico, pipeline, October 2022 https://web.archive.org/web/20221007131323/https://insilico.com/pipeline

11. Insilico, pipeline, February 2022 https://web.archive.org/web/20220213125657/https://insilico.com/pipeline

12. Exscientia, pipeline, November 2023 https://web.archive.org/web/20231130165922/https://www.exscientia.ai/pipeline

13. Exscientia, PR, August 2022 https://www.businesswire.com/news/home/20220817005681/en/Exscientia-Business-Update-for-Second-Quarter-and-First-Half-2022

14. Exscientia, article, July 2022 https://www.nanalyze.com/2022/07/exscientia-stock-ai-drug-discovery/

15. Exscientia, annual report, 2021 https://s28.q4cdn.com/460399462/files/doc_financials/2021/ar/2021-UK-Annual-Report.pdf

16. Recursion, pipeline, April 2025 https://web.archive.org/web/20250425190557/https://www.recursion.com/pipeline

17. Recursion, pipeline, April 2024 https://web.archive.org/web/20240414085309/https://www.recursion.com/pipeline

18. Recursion, pipeline, March 2023 https://web.archive.org/web/20230324234118/https://www.recursion.com/pipeline

19. Recursion, pipeline, January 2022 https://web.archive.org/web/20220131104947/https://www.recursion.com/pipeline

20. Recursion, pipeline, February 2021 https://web.archive.org/web/20210225041638/https://www.recursion.com/pipeline

21. Recursion, pipeline, January 2021 https://web.archive.org/web/20210129043831/https://www.recursion.com/pipeline

22. Relay, pipeline, March 2025 https://web.archive.org/web/20250321133438/https://relaytx.com/pipeline/

23. Relay, pipeline, February 2024 https://web.archive.org/web/20240227231146/https://relaytx.com/pipeline/

24. Relay, pipeline, November 2023 https://web.archive.org/web/20231111223956/https://relaytx.com/pipeline/

25. Relay, annual report (PDF), 2022 https://ir.relaytx.com/static-files/1b13dc48-4fb1-4ec3-b639-69636bc3ace1

26. Relay, annual report (PDF), 2021 https://ir.relaytx.com/static-files/65cffc5e-e6e3-42a3-9b87-cc44b93c2856

27. Relay, annual report (PDF), 2020 https://ir.relaytx.com/static-files/08d959ca-abd2-4a9c-bd25-be8eef73d732

28. Schrodinger, pipeline, April 2025 https://web.archive.org/web/20250421111538/https://www.schrodinger.com/pipeline

29. Schrodinger, pipeline, April 2024 https://web.archive.org/web/20240427094807/https://www.schrodinger.com/pipeline

30. Schrodinger, pipeline, November 2022 https://web.archive.org/web/20221124124721/https://www.schrodinger.com/pipeline

31. Schrodinger, pipeline, June 2021 https://web.archive.org/web/20210620183431/https://www.schrodinger.com/pipeline

32. Schrodinger, pipeline, June 2020 https://web.archive.org/web/20200606152921/https://www.schrodinger.com/pipeline

33. Schrodinger, pipeline, July 2019 https://web.archive.org/web/20190717045358/https://www.schrodinger.com/pipeline

34. Verge Genomics, pipeline, April 2025 https://www.vergegenomics.com/pipeline

35. Verge Genomics, pipeline, February 2024 https://web.archive.org/web/20240306224636/https://www.vergegenomics.com/pipeline

36. Verge Genomics, pipeline, November 2022 https://web.archive.org/web/20221104085232/https://www.vergegenomics.com/pipeline

37. BenevolentAI, report release, 2021 and 2022 https://www.benevolent.com/news-and-media/press-releases-and-in-media/benevolentai-unaudited-preliminary-results-year-ended-31-december-2022/

38. BenevolentAI, report release, 2023 https://www.benevolent.com/application/files/2417/1136/4663/BenevolentAI_Annual_Report_2023.pdf

39. BenevolentAI, accounting statements, 2024 https://www.benevolent.com/application/files/7717/3916/6608/Benevolent_AI__OSAKA_Holdings_Pro_forma_BS_-_Final.pdf

40. Insilico, annual report (PDF), 2024 https://www1.hkexnews.hk/app/sehk/2025/107348/documents/sehk25050802048.pdf

41. Recursion, annual report (HTML), 2021 https://ir.recursion.com/node/6926/html

42. Recursion, annual report (HTML), 2022 https://ir.recursion.com/node/8131/html

43. Recursion, annual report (HTML), 2023 https://ir.recursion.com/node/9691/html

44. Recursion, annual report (HTML), 2024 https://ir.recursion.com/node/11351/html

45. Relay, annual report (HTML), 2021 https://ir.relaytx.com/node/7691/html

46. Relay, annual report (HTML), 2022 https://ir.relaytx.com/node/8531/html

47. Relay, annual report (HTML), 2023 https://ir.relaytx.com/node/9196/html

48. Relay, annual report (HTML), 2024 https://ir.relaytx.com/node/10066/html

49. Schrodinger, annual report (PDF), 2021 https://d18rn0p25nwr6d.cloudfront.net/CIK-0001490978/7a72e457-9a9e-4efc-b9b3-5ead018c904d.pdf

50. Schrodinger, annual report (PDF), 2022 https://d18rn0p25nwr6d.cloudfront.net/CIK-0001490978/6835c32b-f977-482f-82c5-254066f66d06.pdf

51. Schrodinger, annual report (PDF), 2023 https://d18rn0p25nwr6d.cloudfront.net/CIK-0001490978/b3224b2d-5cc5-4081-ba8b-d89a31181139.pdf

52. Schrodinger, annual report (PDF), 2024 https://d18rn0p25nwr6d.cloudfront.net/CIK-0001490978/2ad2903d-0825-4d27-b42a-2e6966d88206.pdf

Edits

Edit 1 (2025-04-17): Following a clarification from Iambic representatives, we have updated the Iambic timeline in Table 3, replacing 24 months with 8 months. The company explains that 24 months is the time to reach the clinic, while it took only 8 months to get to IND studies.
Edit 2 (2025-04-29): Insilico Medicine headquarters location updated
Edit 3 (2025-05-12): Recursion pipeline updated in Table 2 (source)
Edit 4 (2025-06-06): Financial summary tables and a related bullet point were added to provide contextual insight into AIDD company performance

Report methodology

An analysis of historical therapeutic pipeline data (Table 2) was carried out using archived snapshots from the Web Archive, allowing us to review how pipeline diagrams appeared at earlier points in time. In some instances, annual financial reports were also consulted to retrieve pipeline details for previous years.

Efforts were made to track each molecule or program within a given pipeline across successive years, and if a particular program did not appear in the following year’s records, it was generally assumed that it had been put on hold for various reasons.
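The tracking rule described above can be sketched in a few lines of Python. This is a minimal illustration of the stated assumption (a program missing from the next snapshot is marked "unknown"), not the actual tooling used for Table 2; the data structure, function name, and example program names are hypothetical.

```python
# Illustrative sketch of the tracking rule: a program that disappears from the
# following year's snapshot is marked "unknown" once, then left blank.

def build_timeline(snapshots):
    """Convert {year: {program: stage}} into {program: [one cell per year]}.

    Cells are the recorded stage, "unknown" for the first year a previously
    seen program is missing, or "" when there is nothing to report.
    """
    years = sorted(snapshots)

    # Collect programs in first-seen order.
    programs = []
    for year in years:
        for program in snapshots[year]:
            if program not in programs:
                programs.append(program)

    timeline = {}
    for program in programs:
        row = []
        seen = False      # program has appeared in an earlier snapshot
        flagged = False   # "unknown" already recorded for the current gap
        for year in years:
            stage = snapshots[year].get(program)
            if stage is not None:
                row.append(stage)
                seen, flagged = True, False
            elif seen and not flagged:
                row.append("unknown")
                flagged = True
            else:
                row.append("")
        timeline[program] = row
    return timeline


if __name__ == "__main__":
    # Hypothetical snapshots for two made-up programs.
    example = {
        2022: {"PROG-A": "Discovery"},
        2023: {"PROG-A": "Preclinical"},
        2024: {"PROG-B": "Preclinical"},
        2025: {"PROG-B": "Phase 1"},
    }
    for program, row in build_timeline(example).items():
        print(program, row)
```

A consequence of this design is that "unknown" records only an absence from public pipeline pages. It does not distinguish between a paused program, a terminated one, or one that was simply no longer listed.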

Target novelty analysis for Diagram 3 was performed based on the methodology and mathematical formula outlined in this file.

Correction policy

If you come across any factual inaccuracies or outdated information, please don’t hesitate to contact us promptly. We will address these issues by issuing corrections in a dedicated section of our report, pending editorial review.

This correction policy covers company profiles, technology evaluations, and all comparative analyses included in our report. Stakeholders are encouraged to report potential errors to our editorial team using this form.

All corrections will be clearly dated and thoroughly detailed to uphold the integrity of our comparative report and ensure our readers have access to the most accurate and up-to-date information.
