In 2020, AI models predicted the efficacy of repurposed drugs for Covid-19. Clinical trials later showed that eight out of nine predictions were correct. Here Imran S Haque, VP Data Science at Recursion, describes the journey of the AI platform behind those predictions.
The Covid-19 pandemic demonstrated the power of the scientific community to rapidly mobilise resources and accelerate vaccines and treatments using novel technologies. We witnessed the unprecedented pace of vaccine development, fuelled by cutting-edge mRNA technology.
At Recursion, we saw this emerging health crisis as an opportunity to redirect our artificial intelligence (AI)-powered platform to identify potential treatments. Our core technology uses machine learning (ML) models on image-based measurements of cellular morphology to broadly explore biology and chemistry and accelerate drug discovery efforts. We screen cells that have been altered by hundreds of thousands of different chemical and genetic factors to represent different states of healthy, sick and treated cells, while our ML algorithms analyse the resulting millions of images to predict relationships across biological contexts and chemical factors.
One question we often get asked (and we ask ourselves) is how we know our approach actually works. How can we verify and trust our models and their predictions? In a field like ours, where technology reveals something completely new about biological function that we as humans can’t detect on our own, validation is critical. We need to prove to ourselves and our partners that this is worth investing and believing in; we need to show that it works.
The onset of the Covid-19 pandemic provided an opportunity to do exactly that. In the months and years that followed our screens to rapidly identify repurposable treatments for Covid-19, we watched closely as our predictions made their way through clinical studies and into patients. Today, there have been efficacy readouts from large randomised controlled clinical trials for nine of our predictions – eight of which proved correct. That is a remarkable testament to the power of AI to change how discovery science is done.

An opportunity for AI to shine
When the pandemic hit in March 2020, most of our employees transitioned to working remotely and much of our laboratory work decelerated. But along with this massive disruption came an opportunity: to apply our platform to researching the SARS-CoV-2 virus. In an emerging public health crisis, time is of the essence. There was an urgent need for quick and adaptable drug discovery in the context of a complex and poorly understood disease. This is exactly the type of problem that an AI-powered drug discovery engine is designed to address. Our goal was to quickly identify existing therapeutics that could be repurposed for Covid-19 treatment by screening a library of approved and late-stage development drugs against a disease model in our platform.
In more traditional target-based drug discovery, researchers generate a therapeutic hypothesis focused on a given target: eg, modulation of target X plays a role in the downstream biology of disease Y. They create a model of the disease, select a set of biomarkers to predict translational benefits, and design bespoke, univariate assays to measure each one. These steps are time-consuming and specific to the particular scientific question at hand. They cannot be easily scaled or broadly applied to other therapeutic programs or disease areas, and the data generated from any compound screens will be narrowly focused on the target hypothesis.
Contrast that with an AI-powered approach, where everything is designed with standardisation and generalisation in mind. This enables scalability and comparability of data across programs and therapeutic areas, allowing us to extract biological information without any pre-defined target or therapeutic hypothesis. Because our platform was built on these foundational principles, we were able to quickly deploy it against disease models for Covid-19 and explore a vast amount of downstream biology.
Phenomics
The foundation of our data generation capabilities comes from a process called phenomics – the analysis of high-content microscopy images to examine cellular response to a range of genetic and chemical perturbations. Deep-learning algorithms extract high-dimensional and dose-dependent fingerprints of cellular morphological changes, or ‘phenoprints’, from images to support a variety of downstream applications. These phenoprints characterise subtle morphological changes far beyond human ability, and a standardised assay allows the phenoprints of millions of cellular samples to be related across time and experimental conditions.
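To make the embedding step concrete, the sketch below shows the general shape of such a pipeline: a convolutional network maps a multi-channel microscopy image to a fixed-length vector that can serve as a phenoprint. The backbone (a ResNet-18), the six-channel input and the 128-dimensional output are illustrative assumptions, not a description of Recursion's actual architecture.

```python
# Minimal sketch of phenoprint extraction: a convolutional network maps each
# multi-channel cell image to a fixed-length embedding ("phenoprint").
# Model choice, channel count and embedding size are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PhenoprintEncoder(nn.Module):
    def __init__(self, in_channels: int = 6, embedding_dim: int = 128):
        super().__init__()
        backbone = resnet18(weights=None)
        # Fluorescence imaging often uses more than three channels, so the
        # first convolution is replaced to accept `in_channels`.
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
        self.backbone = backbone

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, channels, height, width) -> (batch, embedding_dim)
        return self.backbone(images)

encoder = PhenoprintEncoder()
batch = torch.randn(4, 6, 512, 512)   # four hypothetical well images
phenoprints = encoder(batch)          # one phenoprint per image
print(phenoprints.shape)              # torch.Size([4, 128])
```

In practice, the value of such a representation comes from training on large volumes of perturbation data, so that distances between phenoprints reflect biological relationships rather than raw pixel-level similarity.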
The power of a high-dimensional readout like imaging lies in the massive amount of information that can be collected from each experiment. An image provides much more data than a univariate assay designed to evaluate only the target of interest; you start to see both on-target and off-target activities. And unlike other high-dimensional approaches, image-based assays are relatively inexpensive, allowing them to be scaled to levels of throughput comparable to traditional low-dimensional screening modalities. Moreover, we believe cellular morphology more accurately reflects biological function than other assays that examine upstream regulation, such as mRNA or protein function.
The challenge, however, is in distilling this massive amount of unstructured data into something that can be computed and used to build models. That’s where AI and ML come in. Our models are trained to analyse unstructured data and build biologically meaningful mathematical representations of each cell image, allowing us to extract biological information with minimal per-disease custom investment. These same models also play a significant role in correcting for the inherent variability in the technical execution of experiments, separating any experimental ‘noise’ from real biological signals. Importantly, these models are broad and generalisable enough to be applied across different therapeutic areas, cell types and perturbations with few changes – ensuring scalability.
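As a rough illustration of the noise-correction idea, image-based profiling pipelines commonly standardise each embedding against negative-control wells from the same plate or batch, so that plate-to-plate technical variation is removed before profiles are compared across experiments. The function below is a minimal sketch of that pattern; it is not the specific correction used on Recursion's platform.

```python
# Illustrative batch-effect correction: centre and scale each phenoprint
# using the negative-control wells from its own experimental batch, so that
# technical variation is removed before embeddings are compared.
import numpy as np

def normalise_to_controls(embeddings: np.ndarray,
                          batch_ids: np.ndarray,
                          is_control: np.ndarray,
                          eps: float = 1e-8) -> np.ndarray:
    """Standardise each embedding against control wells from the same batch."""
    corrected = np.empty_like(embeddings, dtype=float)
    for batch in np.unique(batch_ids):
        in_batch = batch_ids == batch
        controls = embeddings[in_batch & is_control]
        mu, sigma = controls.mean(axis=0), controls.std(axis=0) + eps
        corrected[in_batch] = (embeddings[in_batch] - mu) / sigma
    return corrected
```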
Deployment
It took us less than four weeks to establish our Covid-19 disease models, run compound screens and identify hits using our platform. There was no need to develop disease- or virus-specific assays and biomarkers, which greatly simplified rapid deployment of the platform. The biggest upfront effort was establishing an accurate disease model. Our aim was to identify compounds that could address two key components of Covid-19 disease progression: the direct effects of viral infection, and the damaging effects of an unresolved inflammatory response, or cytokine storm.
Our first disease model – the direct effects of viral infection – was relatively straightforward to establish. We acquired the active SARS-CoV-2 virus, modelled infection in relevant cells and completed our screen in a Biosafety Level 3 lab.
However, the most severe cases of Covid-19 – which, at the time, represented the biggest medical need – are the result of an exaggerated cytokine response from the body’s immune system, manifesting as acute respiratory distress syndrome (ARDS). We were able to model the cytokine storm in endothelial cells by applying cocktails of circulating proteins that mirror those from severe Covid-19 patients. The result was a reverse-translational disease model built directly from patient data.
From there, we ran a screen of 1,670 and 2,913 compounds against cells in the infection and cytokine storm models, respectively, and evaluated how well these compounds rescued cellular morphology in our high-dimensional phenomics assay. Using a single platform, we identified a handful of drugs that were actively being studied in clinical trials to treat Covid-19. Remdesivir showed strong modulation in our infection model, while JAK inhibitors like baricitinib and tofacitinib demonstrated activity in our cytokine storm model. We published a preprint of our results1 and released the related datasets on our website in August 2020, prior to any clinical trial results becoming available. These positive efficacy predictions for remdesivir, baricitinib and tofacitinib were later supported by clinical data.
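For illustration, one simple way to score morphological ‘rescue’ in an embedding space is to ask whether a treated, diseased sample’s phenoprint has moved back towards the healthy state. The sketch below uses cosine similarity against healthy and disease centroids; the metric, the 128-dimensional profiles and the compound names are hypothetical, and the platform’s actual hit-calling may differ.

```python
# Hedged sketch of ranking screen hits by morphological "rescue": a compound
# scores highly when it moves the diseased phenoprint back towards healthy.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rescue_score(treated: np.ndarray,
                 healthy_centroid: np.ndarray,
                 disease_centroid: np.ndarray) -> float:
    # Positive when the treated profile resembles healthy cells more than diseased ones.
    return cosine(treated, healthy_centroid) - cosine(treated, disease_centroid)

# Hypothetical usage: rank each screened compound's mean phenoprint.
rng = np.random.default_rng(0)
healthy = rng.normal(size=128)
disease = rng.normal(size=128)
compound_profiles = {f"compound_{i}": rng.normal(size=128) for i in range(5)}
ranked = sorted(compound_profiles,
                key=lambda name: rescue_score(compound_profiles[name], healthy, disease),
                reverse=True)
print(ranked)
```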
Clinical trial readouts also demonstrated no or limited efficacy for several drugs originally hypothesised to benefit severe Covid-19, such as hydroxychloroquine or antivirals like lopinavir and ritonavir, which were later discontinued in clinical trials. For other drugs, clinical trial results initially suggested a positive result but were later rejected by regulatory agencies. This was the case with fluvoxamine: a trial demonstrated a reduction in hospitalisations among high-risk patients, but the FDA declined to grant an Emergency Use Authorization, citing limitations in the study design and a lack of clinically meaningful outcomes. Our platform predicted no efficacy for any of these compounds.
Validation of AI-powered drug discovery
The introduction of machine learning and data science in drug discovery is enabling us to create generalisable screening systems that can extract a tremendous amount of biological information with relatively small upfront investment. The Covid-19 pandemic was an opportunity to demonstrate the power of this approach to rapidly identify potential therapeutics that could address an urgent health crisis. And more than two years later, the clinical trial readouts show it worked.
It’s clear we’re entering a new, digitised era of drug discovery that is truly data-driven. These technologies can broadly explore systems biology in an unbiased way, not only accelerating the process, but also revealing novel insights that we as humans are not equipped to discover on our own.
DDW Volume 24 – Issue 1, Winter 2022/2023