The complexity of disease biology gives opportunities for drug discovery

Steve Gardner, CEO of PrecisionLife, examines the lack of return on investment in pharmaceutical R&D and the huge pockets of unmet medical need in common chronic disorders.

Over the last decade, many pharma companies, such as GSK, AbbVie and AstraZeneca (AZ), have analysed the success factors in their drug discovery and development processes. They have recognised that selecting new targets through a multi-layered decision process, such as AZ's 5Rs framework, significantly improves the likelihood of a new drug reaching the clinic. Such a process is grounded in a deeper understanding of the target's biology, its genetic mechanism and its role in patients. AZ has reported an almost five-fold increase in the number of new projects successfully completing Phase III trials after adopting the 5Rs framework to select new targets.

The biggest costs of bringing new drugs to market are incurred in late-stage clinical trials. Clinical trial failures increasingly happen late, and around 60% of them are due to a failure to demonstrate efficacy. In many cases the drug may be effective for some patients, but not for enough of the trial's cohort to demonstrate this. This reflects a poor understanding of the influences on the biology of the selected target across the trial's patient population, and therefore flawed recruitment criteria for the trial.
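A worked calculation shows why a drug that works only in a subgroup can fail to show efficacy. This is a minimal sketch. It assumes a two-arm trial with a continuous endpoint, a standardised effect size, the textbook normal-approximation sample-size formula, zero benefit in non-responders, and an unchanged outcome standard deviation. Under these assumptions, if only a fraction p of the cohort responds, the average effect shrinks to about p times the responder effect. The required number of patients then grows by roughly 1/p². The effect size and responder fractions are illustrative assumptions, not figures from any trial.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per arm for a two-sided, two-sample
    comparison of means, with effect_size in standard-deviation units."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def diluted_effect(effect_in_responders: float, responder_fraction: float) -> float:
    """Average effect across the whole cohort when only a fraction responds.
    Assumption: non-responders get zero benefit and the SD is unchanged."""
    return effect_in_responders * responder_fraction


if __name__ == "__main__":
    d = 0.5  # illustrative standardised effect in true responders
    for p in (1.0, 0.5, 0.25):
        n = patients_per_arm(diluted_effect(d, p))
        print(f"responder fraction {p:.2f}: ~{n} patients per arm")
```

With these assumptions, halving the responder fraction roughly quadruples the trial size. A cohort in which only a quarter of patients respond needs about sixteen times as many patients per arm.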

At the same time, many patients are left without effective treatment options for a wide range of diseases. This is especially true outside cancer and rare disorders, in chronic diseases such as neurodegenerative, neuropsychiatric, respiratory, immunological and cardiovascular diseases, which account for 85% of health systems' treatment budgets. This represents several costs at once:

- a lost commercial opportunity for pharma companies;
- a major source of waste in health systems, through the prescription of ineffective therapies;
- a huge long-term socioeconomic burden, caused by poor patient outcomes.

Solution

To make headway in chronic diseases, we need better tools. Chronic diseases are particularly heterogeneous and polygenic, but their diagnoses are often based on similar observed symptoms (e.g. breathlessness and wheezing in asthmatics), even though these symptoms may result from multiple different pathways.

This means that a population of patients sharing the same diagnosis will often contain multiple patient subgroups (endotypes) with different disease aetiologies, severities and therapy responses. These subgroups will be defined not by simple single gene variants, but by combinations of relatively common variants which come together to exert a complex, non-linear effect on the disease phenotype. This is highly challenging for existing genomic analysis and target discovery tools, and for downstream demonstration of efficacy.
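A toy example shows how a combination of common variants can matter when no single variant does. The sketch assumes a hypothetical cohort in which disease occurs only when exactly one of two SNPs carries the risk allele. This XOR-style interaction is chosen purely for illustration. Each SNP on its own shows the same case rate whether or not it is carried, so a one-variant-at-a-time test sees nothing. The pair, by contrast, separates cases from controls cleanly. This is a textbook illustration of epistasis, not the combinatorial algorithm described in this article.

```python
from itertools import product

# Hypothetical cohort: 100 people for each combination of carrier status at two SNPs.
# Toy assumption: disease occurs when exactly one of the two risk alleles is carried.
cohort = [
    {"snp_a": a, "snp_b": b, "case": a != b}
    for a, b in product((0, 1), repeat=2)
    for _ in range(100)
]


def case_rate(people):
    """Fraction of the given people who are cases."""
    return sum(p["case"] for p in people) / len(people)


def marginal_rates(snp):
    """Case rate among carriers vs non-carriers of one SNP, ignoring the other SNP."""
    carriers = [p for p in cohort if p[snp] == 1]
    non_carriers = [p for p in cohort if p[snp] == 0]
    return case_rate(carriers), case_rate(non_carriers)


def combination_rates():
    """Case rate for each joint genotype of the two SNPs."""
    return {
        (a, b): case_rate([p for p in cohort if p["snp_a"] == a and p["snp_b"] == b])
        for a, b in product((0, 1), repeat=2)
    }


if __name__ == "__main__":
    print("snp_a carriers vs non-carriers:", marginal_rates("snp_a"))  # (0.5, 0.5)
    print("snp_b carriers vs non-carriers:", marginal_rates("snp_b"))  # (0.5, 0.5)
    print("joint genotypes:", combination_rates())
```

Real interactions are rarely this clean, and they involve more than two variants. Even so, the pattern is the same: signal that is absent from single-variant marginals can appear only when combinations are examined.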

Combinatorial analysis is a new, hypothesis-free approach to understanding the disease mechanisms that are relevant to the different patient subgroups within a complex disease population[1]. Its outputs are explainable and reproducible, and they offer significant advantages to companies pursuing R&D in complex chronic diseases.

Combinatorial analysis can be applied to genomic, multi-omic, clinical and epidemiological datasets to perform high-resolution patient stratification, linking specific disease-associated targets to each patient subgroup. The result is a detailed map of the different causes and potential targets for different disease endotypes (Figure 1).
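Figure 1 describes such a map as a network. Each node is a disease-associated SNP, edges link co-associated SNPs, and colours mark patient subgroups. The sketch below is a minimal version of that data structure. It makes a simplifying assumption: a subgroup is treated as a connected cluster of co-associated SNPs, with clusters found by union-find. The SNP identifiers and edges are hypothetical placeholders. Real stratification methods may define subgroups quite differently, for example with community detection or patient-level clustering.

```python
from collections import defaultdict

# Hypothetical network: nodes are disease-associated SNPs (placeholder IDs),
# edges link SNPs found to be co-associated in the same patients.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6"]
co_associations = [("snp1", "snp2"), ("snp2", "snp3"), ("snp4", "snp5")]


def find(parent, x):
    """Return the root of x's cluster, halving the path as it walks up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def subgroups(nodes, edges):
    """Group SNPs into the connected clusters of the co-association network."""
    parent = {n: n for n in nodes}
    for a, b in edges:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters
    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(parent, n)].add(n)
    # Sort clusters by their sorted members for a stable, readable order.
    return sorted(clusters.values(), key=sorted)


if __name__ == "__main__":
    for i, cluster in enumerate(subgroups(snps, co_associations), start=1):
        print(f"subgroup {i}: {sorted(cluster)}")
```

Union-find is used here because it is simple and scales almost linearly with the number of edges. An isolated SNP such as `snp6` still forms its own cluster.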

These novel disease insights are prioritised using a computational equivalent of the 5Rs framework, applying discovery criteria to identify and validate novel targets suitable for different modalities and associating them with highly predictive combinatorial patient stratification biomarkers.
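The article does not spell out the computational form of these criteria. One plausible reading, sketched here purely as an assumption, scores each candidate target against AZ's five dimensions: right target, right tissue, right safety, right patient and right commercial potential. Only targets that pass every gate are kept. The field names, the 0 to 1 scales, the single shared threshold and the target names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateTarget:
    # All fields are hypothetical evidence summaries on a 0-1 scale.
    name: str
    genetic_evidence: float      # right target: strength of disease association
    tissue_expression: float     # right tissue: expression in disease-relevant tissue
    safety_risk: float           # right safety: predicted liability (lower is better)
    subgroup_specificity: float  # right patient: how well a biomarker picks out responders
    unmet_need: float            # right commercial potential: size of unserved population


def passes_gates(t: CandidateTarget, threshold: float = 0.5) -> bool:
    """Keep a target only if it clears every one of the five gates."""
    return (
        t.genetic_evidence >= threshold
        and t.tissue_expression >= threshold
        and t.safety_risk <= 1 - threshold
        and t.subgroup_specificity >= threshold
        and t.unmet_need >= threshold
    )


def prioritise(targets, threshold=0.5):
    """Return names of targets passing all gates, strongest genetic evidence first."""
    kept = [t for t in targets if passes_gates(t, threshold)]
    return [t.name for t in sorted(kept, key=lambda t: t.genetic_evidence, reverse=True)]


if __name__ == "__main__":
    candidates = [
        CandidateTarget("TARGET_A", 0.9, 0.8, 0.2, 0.7, 0.6),
        CandidateTarget("TARGET_B", 0.7, 0.3, 0.1, 0.8, 0.9),  # fails the tissue gate
        CandidateTarget("TARGET_C", 0.6, 0.7, 0.4, 0.6, 0.7),
    ]
    print(prioritise(candidates))  # ['TARGET_A', 'TARGET_C']
```

Hard gates reflect the framework's spirit: a target that is weak on any one dimension is dropped rather than rescued by strength on the others.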

An effective new tool

Combinatorial analysis finds additional signal in patient datasets that is invisible to GWAS and other existing genetic analysis methods. This signal corresponds to the non-linear interactions of genetic and metabolic networks. The scale of these additional insights is shown by a meta-analysis of several large-scale studies into the genetic factors underpinning the Covid-19 host response, as it relates to disease susceptibility and severity.

A GWAS involving 1,131 severe patients and 15,434 mild controls identified 11 loci associated with a high risk of developing severe Covid-19, along with one gene, the ABO blood group gene[2]. A much larger study, with 13,641 severe disease patients and over 2 million controls, identified 15 genome-wide significant loci associated with severe manifestations of Covid-19[3]. However, these findings cannot explain the diverse symptoms of Covid-19, which include micro-coagulation and cardiovascular, neurological, renal and other consequences beyond inflammation-driven disease.

Before these studies could collect sufficient patients, a combinatorial analysis had been run on the very first Covid-19 datasets available from UK Biobank, which included just a few hundred severe patients. Working from a much smaller dataset than the GWAS, this study identified 156 loci associated with severe disease. These mapped to 68 protein-coding genes with plausible mechanistic correlations to the range of observed Covid-19 symptoms[4].

The novel disease-associated mechanisms identified were replicated and validated in a set of de-identified patient health records in the UnitedHealth Group Covid-19 Data Suite. Several of the novel targets have since been validated in drug repurposing studies using viral plaque assays and other disease models.

Building a new biotech pipeline for the future

Combinatorial analytics generates more insight into complex disease processes and populations from much smaller patient datasets. This opens up new opportunities across a wide range of disease areas that have been difficult for drug discovery and development. Studies using a combinatorial analytics approach across many chronic disorders have uncovered new disease-associated mechanisms of action, with multiple innovative druggable targets across dozens of clinically relevant patient subgroups. These novel targets bring new mechanistic insights to challenging disease areas. They could lead to novel therapeutic approaches that address the unmet medical needs of currently refractory patients.

As part of the analysis, a systematic repositioning study is performed to identify the known active chemistry at each of the targets. This identifies tool compounds and potential repurposing candidates that can be used in phenotypic assays. It also creates an opportunity to assess the repositioning potential of existing patented drugs in other diseases and endotypes that are associated with the same target or pathway.
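At its simplest, a repositioning pass of this kind is a join between the prioritised targets and a table of known compound-target activities. The sketch below assumes a small hypothetical activity table; in practice this would come from a bioactivity database. It separates approved drugs (repurposing candidates) from other active compounds (potential tool compounds). All compound and target names are placeholders.

```python
from collections import defaultdict

# Hypothetical bioactivity records: (compound, target, is_approved_drug).
# Placeholder names; real records would come from a bioactivity database.
activities = [
    ("compound_1", "TARGET_A", True),
    ("compound_2", "TARGET_A", False),
    ("compound_3", "TARGET_C", False),
    ("compound_4", "TARGET_Z", True),  # active at a target that was not prioritised
]


def reposition(prioritised_targets, records):
    """Map each prioritised target to known actives, split into approved drugs
    (repurposing candidates) and other compounds (potential tool compounds).
    Targets with no known actives are omitted from the result."""
    wanted = set(prioritised_targets)
    hits = defaultdict(lambda: {"repurposing": [], "tool": []})
    for compound, target, approved in records:
        if target in wanted:
            hits[target]["repurposing" if approved else "tool"].append(compound)
    return dict(hits)


if __name__ == "__main__":
    for target, found in reposition(["TARGET_A", "TARGET_C"], activities).items():
        print(target, found)
```

The same index can be read in reverse. The approved drug at `TARGET_Z` is exactly the kind of record that points to repositioning opportunities in other diseases sharing that target.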

All of these new disease insights and the associated patient stratification biomarkers are banked. New targets are prioritised with reference to the scientific literature, target databases, industry pipelines and key opinion leaders (KOLs). The most promising targets are ultimately validated in one of two ways:

- with the tool compounds, to demonstrate disease-modification potential in patient-derived iPSCs or cellular assays;
- with other, non-therapeutic methods such as CRISPR, siRNA or AAVs.

These selected novel targets can then be rapidly evaluated by AI-augmented, data-driven computational chemistry approaches to identify opportunities for rapid and scalable small-molecule drug discovery programmes. This allows candidates to be advanced as precision medicine programmes through a rapid lead generation and optimisation process (Figure 2).

Viewed through a proteome-wide lens, the polypharmacological binding potential of chemistry at each potential target can be evaluated across all the active sites of all targets in the human proteome. This enables the design of specific therapeutic product profiles that prioritise tissue availability and delivery routes while avoiding ADME and toxicity issues.

Assessing specificity and selectivity across all available binding sites allows prediction of the potential for multiple on- and off-target interactions, including novel and rare ones. Selectivity can also be evaluated between and within protein families, down to specific members and even isoforms. This further informs the choice of targets with exactly the desired profile of therapeutic interactions.
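One common way to express selectivity is as a window in log units. The window is the predicted affinity (for example pKd) at the intended target minus the highest predicted affinity at any other binding site. The sketch below assumes such predictions are already available as a dictionary of hypothetical site names to pKd values. The 2-log-unit (roughly 100-fold) margin is an illustrative assumption, not a universal rule.

```python
def selectivity_window(predicted_pkd: dict, intended: str) -> float:
    """Log-unit gap between the intended site and the strongest off-target site.
    A window of 2.0 corresponds to roughly 100-fold selectivity."""
    on_target = predicted_pkd[intended]
    off_target = max(v for site, v in predicted_pkd.items() if site != intended)
    return on_target - off_target


def off_target_risks(predicted_pkd: dict, intended: str, margin: float = 2.0) -> list:
    """Sites whose predicted affinity comes within `margin` log units of the intended site."""
    on_target = predicted_pkd[intended]
    return sorted(
        site for site, v in predicted_pkd.items()
        if site != intended and on_target - v < margin
    )


if __name__ == "__main__":
    # Hypothetical proteome-wide predictions for one compound (pKd values).
    predictions = {"TARGET_A": 8.5, "TARGET_A_isoform2": 7.0, "KINASE_X": 5.5, "GPCR_Y": 4.0}
    print(selectivity_window(predictions, "TARGET_A"))  # 1.5
    print(off_target_risks(predictions, "TARGET_A"))    # ['TARGET_A_isoform2']
```

In this made-up case, the compound is well separated from unrelated families but not from a close isoform. That is the within-family distinction the paragraph above describes.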

These are key decision points. They are supported by the ability of data-driven AI chemistry platforms to work in three ways:

- from targeted compound libraries;
- semi-generatively, designing novel compounds within the bounds of a virtual compound library;
- fully generatively.

The semi-generative approach enumerates databases of billions of synthetically feasible molecules that comply with Lipinski's “rule of 5” and Veber criteria for drug-like compounds.
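These drug-likeness filters are well defined.

- Lipinski's rule of 5 flags molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors and more than 10 hydrogen-bond acceptors. It is commonly applied allowing at most one violation.
- Veber's criteria, in their commonly used form, require no more than 10 rotatable bonds and a polar surface area of at most 140 Å².

The sketch assumes descriptors have already been computed, for example with a cheminformatics toolkit, and are supplied as plain numbers. The example molecules are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    # Precomputed molecular descriptors (assumed to come from a cheminformatics toolkit).
    mol_weight: float      # Da
    clogp: float           # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float            # topological polar surface area, square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-5 limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and polar surface area of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


def is_drug_like(d: Descriptors, max_lipinski_violations: int = 1) -> bool:
    """Keep a molecule if it passes Veber and has at most one rule-of-5 violation."""
    return lipinski_violations(d) <= max_lipinski_violations and passes_veber(d)


if __name__ == "__main__":
    small = Descriptors(350.4, 2.1, 2, 5, 4, 75.0)      # hypothetical compact molecule
    large = Descriptors(620.0, 5.8, 3, 9, 12, 110.0)    # hypothetical large, greasy, flexible molecule
    print(is_drug_like(small), is_drug_like(large))     # True False
```

In a virtual library, a filter like this would run once over the enumerated molecules before any more expensive scoring is attempted.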

This enables very rapid and reliable synthesis of molecules of interest. As a result, scaffold-hopping away from tool compounds, and design-make-test cycles for derivative hit/lead series, become very cost- and time-efficient.

The final component in this discovery pipeline of the future is automated in vitro assays. New fully automated robotic platforms for delivering reproducible cellular, biochemical, and molecular biology assays, and target-based profiling are becoming available. These provide a fast, cost-effective and data rich source of high-quality confirmatory evidence for the activity of hit and lead compounds, further accelerating the discovery process.

Such an integrated approach has three components:

- best-of-breed combinatorial biological analysis, performing high-resolution patient stratification on large-scale multimodal patient datasets;
- data-driven chemistry;
- automated in vitro assays.

Together, these can be disease-agnostic and highly scalable, and can embrace the inherent complexity of chronic diseases.

Example chronic disease studies

A range of previously poorly served disease areas, including respiratory, cardiovascular, immunological, neurodegenerative and neuropsychiatric disorders, can be analysed using this approach.

A good example is non-T2 asthma. Asthma patients can be broadly categorised into two molecular phenotypes: those with high type 2 T-helper cell expression (T2) and those with low type 2 T-helper cell expression (non-T2). Asthma patients with a T2 phenotype currently have a range of targeted biologic treatment options available to them. Non-T2 patients, however, lack personalised therapies. They often have to rely on conventional symptom-control therapies (such as bronchodilators and inhaled corticosteroids) that do little to combat the underlying disease pathology.

The drivers of disease in the T2 and non-T2 asthma patient populations were examined using combinatorial analysis. This identified clear differences in the genetic pathways associated with disease between the T2 and non-T2 asthma cohorts.

These findings were well aligned with the common understanding that cytokine regulation (especially of IL-5 and IL-13) plays a key role in T2 asthma. They also provided some promising novel insights into the mechanisms underpinning the pathogenesis of non-T2 asthma. These differences hold significant potential for better patient stratification and diagnostic biomarkers. They also point to new treatment options for the 35% of asthmatics who do not respond to the new biologic medicines aimed at T2 mechanisms, a significant pool of unmet medical need.

While most of the significant disease-associated genes in the T2 cohort related to immune pathways and interleukins characteristic of Th2-driven allergic asthma, many of the genes that were significant in non-T2 asthmatic patients corresponded to metabolic and neuronal pathways.

Over 20 novel genes were identified as significant in the non-T2 population only, each with strong, testable hypotheses for its mechanism of action. The 5Rs criteria reduced these to a shortlist of 8 targets. The shortlisted targets were then evaluated using data-driven AI chemistry approaches. The aim was to find targets with small molecules of high predicted affinity and selectivity and few predicted ADMET issues. The 6 remaining targets represent highly promising opportunities for the development of personalised therapies for patients presenting with non-allergic asthma.
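The narrowing described above works as a staged funnel. The sketch below takes three hypothetical inputs:

- gene sets significant in each cohort;
- a shortlist assumed to come from an upstream 5Rs-style review;
- per-target predicted compound properties.

Genes significant only in the non-T2 cohort are found by set difference. They are then filtered against the shortlist and by predicted affinity, selectivity and ADMET flags, with the count recorded at each stage. Gene names (other than IL5 and IL13), thresholds and values are illustrative placeholders, not the study's data.

```python
def funnel(t2_genes, non_t2_genes, shortlist, chem_predictions,
           min_pkd=7.0, min_selectivity=2.0):
    """Return surviving targets and the number of targets remaining after each stage."""
    stages = {}
    # Stage 1: genes significant in the non-T2 cohort only.
    non_t2_only = set(non_t2_genes) - set(t2_genes)
    stages["non_t2_only"] = len(non_t2_only)
    # Stage 2: keep those on the (externally produced) 5Rs-style shortlist.
    shortlisted = non_t2_only & set(shortlist)
    stages["shortlisted"] = len(shortlisted)

    # Stage 3: keep targets with at least one predicted compound that is potent,
    # selective and free of predicted ADMET flags.
    def tractable(target):
        return any(
            c["pkd"] >= min_pkd and c["selectivity"] >= min_selectivity and not c["admet_flag"]
            for c in chem_predictions.get(target, [])
        )

    survivors = sorted(t for t in shortlisted if tractable(t))
    stages["tractable"] = len(survivors)
    return survivors, stages


if __name__ == "__main__":
    t2 = {"IL5", "IL13", "GENE_1"}
    non_t2 = {"GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"}
    shortlist = {"GENE_2", "GENE_3", "GENE_4"}
    chem = {
        "GENE_2": [{"pkd": 7.8, "selectivity": 2.5, "admet_flag": False}],
        "GENE_3": [{"pkd": 6.1, "selectivity": 3.0, "admet_flag": False}],  # not potent enough
        "GENE_4": [{"pkd": 8.2, "selectivity": 2.1, "admet_flag": True},
                   {"pkd": 7.2, "selectivity": 2.4, "admet_flag": False}],
    }
    print(funnel(t2, non_t2, shortlist, chem))
```

Recording the count at each stage makes the attrition auditable. That matters when explaining why a given target was dropped.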

This single example has led to multiple partnering opportunities. In the respiratory area alone, similar studies are underway in T2 asthma, idiopathic pulmonary fibrosis, COPD, chronic bronchitis and interstitial lung disease. There is now a broad pipeline of similar projects in neurodegenerative, neuropsychiatric, immunological, cardiovascular and metabolic diseases.


The combination of new AI-enabled biological, chemical and automated testing platforms is transformational for drug discovery. It delivers:

- deeper disease insights that identify novel mechanisms and targets for a wider range of diseases;
- faster hit-to-lead optimisation cycles and rapid, reliable in vitro validation;
- highly predictive patient stratification biomarkers to aid downstream development and even product launch.

The key enabling starting point, however, is deeper insight into the complex biology of chronic diseases. This means using the non-linear effects of combinations of genetic and metabolic networks to better map the drivers of disease biology and pathophysiology to patient subgroups, and to more accurately identify potential therapeutic options for those patients.

Volume 22, Issue 3 – Summer 2021

Figure 1: Disease architecture map of Sjögren's syndrome. Each colour represents a patient subgroup, each circle represents a disease-associated SNP, and edges represent co-associated SNPs.


Figure 2: Schematic illustrating how novel targets can be rapidly evaluated by AI-augmented, data-driven computational chemistry to discover candidate chemistry.

About the author

Steve Gardner, PhD, is CEO and a founder of PrecisionLife. He is an experienced serial entrepreneur who has successfully developed and commercialised ground-breaking data science and informatics in healthcare, life sciences and agri-food. He is a former Global Director of Research Informatics for Astra AB and has consulted with over 20 biopharma companies.


1 Shelton JF, Shastri AJ, Ye C, et al. Trans-ethnic analysis reveals genetic and non-genetic associations with COVID-19 susceptibility and severity. medRxiv 2020.09.04.20188318; doi:

2 Shelton JF, Shastri AJ, Ye C, et al. Trans-ethnic analysis reveals genetic and non-genetic associations with COVID-19 susceptibility and severity. medRxiv 2020.09.04.20188318; doi:

3 The COVID-19 Host Genetics Initiative, Ganna A. Mapping the human genetic architecture of COVID-19 by worldwide meta-analysis. medRxiv 2021.03.10.21252820; doi:

4 Taylor K, Das S, Pearson M, Kozubek J, Pawlowski M, Jensen CE, Skowron Z, Møller GL, Strivens M, Gardner S. Analysis of genetic host response risk factors in severe COVID-19 patients. medRxiv 2020.06.17.20134015; doi:




