Opportunities and challenges for AI in drug discovery

By Naheed Kurji, Co-founder and Executive of the Alliance for Artificial Intelligence in Healthcare (AAIH), CEO of Cyclica; and Andreas Windemuth, Ph.D, Chief Science Officer of Cyclica

Over the past decade, a new industry has arisen that seeks to apply artificial intelligence (AI), or more aptly machine learning (ML) technologies, to healthcare. This surge was driven by methodological advances, increased computing power, and increased availability of data.

There are many different terms that are associated with AI, including but not limited to Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP). For the purpose of this perspective, we will not go into detail about each of these as they are covered in sufficient detail in the white paper published by the AAIH.

Within the pharmaceutical industry, there has been considerable focus on AI for drug discovery and development. This industry is commonly referred to as “AI for Drug Discovery” and includes research organisations, AI innovators (both early-stage and well-funded biotech companies), and multinational pharma companies. Though AI has only recently taken hold in the marketplace, computational approaches to drug discovery – namely in chemistry and biology – have a history that dates back almost as far as electronic computing itself. In the 1970s, Martin Karplus, Michael Levitt, Arieh Warshel and others laid the foundation for the computational modeling of macromolecules as a means of understanding and predicting chemical and biological processes. For this, Karplus, Levitt, and Warshel shared the 2013 Nobel Prize in Chemistry.

This article will explore the various approaches to drug design, including AI, and the applications of AI in novel target identification, biomarker development, and patient stratification along the important “Three Races” in pharma: the race to the molecule, the race to the clinic, and the race to the patient. To win each race, we present the “Five Rights”: the right target, the right molecule, the right tissue, the right safety, and the right patient.

The right target

Understanding the biological drivers of a disease in an accurately stratified patient population is key to hitting the right target with drug discovery programs and avoiding late-stage efficacy failures. Many drug candidates fail because the drug’s targeted mechanism turns out to not have the desired effect. This can manifest as a lack of efficacy or as an off-target toxic side effect that cannot be circumvented.

There are several ways in which AI can help with identifying the right target. First, finding the right target is essentially a problem of understanding the underlying biology of disease. Given that few major diseases are caused by single genes, this requires deeper understanding of the non-linear effects of network biology on disease phenotypes and an ability to find key genetic modifiers. AI systems can help process the information scientists have gathered in large and diverse systems biology databases to identify promising targets. Natural language processing systems can even process the scientific literature directly, for the same purpose. Another way of finding the right target is to defer the target hypothesis entirely. This is known as phenotypic drug design and is in many ways a revisiting of the oldest way to find drugs: Observing a compound’s effect without any knowledge or hypothesis of how it works. AI can help here by mechanizing and systematizing the observation of phenotypic effects, through image processing to recognize cell deformities, or through motion analysis to classify behaviors in animals. After a drug is found with a desired phenotypic effect, it is normally the next step to find the target and mechanism responsible for the effect. AI can help with that, too, for example by predicting the proteins most likely to bind to a given drug.

The right molecule

The moment of inception for a small molecule therapeutic is when the chemical structure of an active molecule is first proposed. This can happen by screening large libraries of compounds for activity, or by generating novel chemicals in some way. Because of its central importance, we go into some detail below on the different ways AI (and, more generally, computation) can play a role in finding the right molecule for a given target.

Nearly all computer-aided drug discovery (CADD) is done in one of two ways: structure-based, through attempts to simulate the underlying molecular biophysics with knowledge of the molecular structure of the receptor (e.g. molecular docking), or ligand-based, through the quantitative analysis of phenomenological structure-activity relationships based only on ligand structures (e.g. quantitative structure-activity relationship – QSAR).

The structure-based approach, long predominant before the advent of “AI in Drug Discovery”, requires detailed knowledge of the 3D structure of the target and uses biophysical simulation alone, i.e. the explicit evaluation of intermolecular interactions. These methods are very much in use today, and range in sophistication from semi-empirical force fields for molecular docking, to detailed simulation of dynamics and even quantum chemistry (e.g. free energy perturbation, FEP). To discover new drugs, structure-based approaches start out with a well-established target protein and binding site that are carefully selected to have the desired biological effect. They then try to fit millions of different chemicals into that site, selecting those that work best for further development. This is called virtual screening. Structure-based virtual screening allows researchers to find active compounds for targets that do not already have known drugs to use as a starting point, thus enabling the discovery of first-in-class drugs.
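
The selection step of virtual screening can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `virtual_screen` function, the `TOY_SCORES` table, and its values are hypothetical stand-ins for a real docking program, and a lower (more negative) score is assumed to mean a better predicted fit.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def virtual_screen(
    library: Iterable[str],
    score: Callable[[str], float],
    keep: int,
) -> List[Tuple[float, str]]:
    """Score every compound and return the `keep` best, lowest score first."""
    scored = ((score(smiles), smiles) for smiles in library)
    # nsmallest streams through the library without sorting all of it,
    # which matters when the library holds millions of compounds.
    return heapq.nsmallest(keep, scored)


# Hypothetical, made-up scores standing in for docking results
# (assumption: more negative means a better predicted fit).
TOY_SCORES = {
    "CCO": -3.1,
    "c1ccccc1": -5.4,
    "CC(=O)O": -2.7,
    "CCN": -4.2,
}

hits = virtual_screen(TOY_SCORES, TOY_SCORES.__getitem__, keep=2)
for docking_score, smiles in hits:
    print(f"{smiles}: {docking_score}")
```

In a real campaign the scoring call dominates the cost, so the ranking logic stays this simple while the effort goes into the score itself.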

However, early enthusiasm for CADD in the 1980s was hampered by the high computational cost inherent in the method, as well as by the scarcity of proteins with known molecular structure. In addition, docking is notoriously inaccurate and slow, and the biophysically motivated model that underlies it is not informed by the large amount of experimental data available for drug/target interactions in the real world. It is accurate enough to be useful in virtual screening and can help in understanding detailed molecular interactions, but despite intense research, its predictive accuracy and hit rate in virtual screening have not significantly improved over the last two decades. The inclusion of more physical detail through molecular dynamics and quantum chemistry, at large computational expense, does not substantially improve predictivity, either. Nevertheless, a few companies like Schrödinger have developed structure-based approaches that have been widely used to successfully discover drugs, emphasizing the importance of the biophysical insights provided by protein structure.

More recently, large databases of assay results have been collected that contain millions of examples of protein/molecule pairs that are observed to bind to each other in practice. This presents an opportunity to go beyond the first principles approach of docking and create predictive models that are informed by all this real-world data. Unlike structure-based models, ligand-based models rely on – and are thus limited by – the amount of data available for any given target.  This approach involves collecting experimental binding data for many molecules to a specific target and then fitting a machine learning model to the data to predict the binding of new molecules. Protein structure is not taken into account, and the model can only be informed by data generated specifically for the target in question. In-vitro data for hundreds of molecules is needed to inform a ligand-based model, so it is very difficult to apply these methods to new targets that do not have existing chemical matter associated with them. This methodology has long been used under the name of QSAR, but it has recently been greatly enhanced with modern ML methods. The vast majority of companies in the space of AI for Drug Discovery use the ligand-based approach, which is easily adapted to new ML methods.
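
To make the ligand-based idea concrete, here is a minimal sketch of one of the simplest possible models: a similarity-weighted nearest-neighbour predictor over molecular fingerprints. It is not the method of any particular company. The fingerprints (sets of integer feature IDs), the activity values, and the function names are all hypothetical, chosen only to show how predictions depend entirely on data already measured for the target.

```python
from typing import FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared features divided by total distinct features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_activity(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, float]],
    k: int = 2,
) -> Optional[float]:
    """Similarity-weighted mean activity of the k most similar known ligands.

    Returns None when the query shares no features with any known ligand,
    i.e. when the data for this target says nothing about the molecule.
    """
    ranked = sorted(
        ((tanimoto(query, fp), activity) for fp, activity in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(similarity for similarity, _ in ranked)
    if total == 0:
        return None
    return sum(similarity * activity for similarity, activity in ranked) / total


# Hypothetical binding data for one target: (fingerprint, activity such as pIC50).
TRAINING = [
    (frozenset({1, 2, 3, 4}), 7.0),
    (frozenset({1, 2, 3, 5}), 6.0),
    (frozenset({8, 9}), 4.0),
]

print(predict_activity(frozenset({1, 2, 3}), TRAINING))   # close analogue of known ligands
print(predict_activity(frozenset({20, 21}), TRAINING))    # novel chemistry: no prediction
```

The second call returns no prediction at all, which is the limitation described above in miniature: without existing chemical matter for the target, a ligand-based model has nothing to learn from.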

The best solution, then, would be to combine ligand-based machine learning with structure-based modeling approaches. However, this is considerably more difficult, and there are only a few companies that we know of that have gone to market using ML together with receptor structures.

The right tissue

Finding the right molecule for the right target is really only the beginning of the drug development process. Most small molecule drugs are taken orally, and to be effective a drug must first make its way to the right place in the body. The molecule must enter the bloodstream (Absorption), travel to the organ where it can affect the disease (Distribution), maintain its chemical identity (Metabolism), and stay around long enough (Excretion). The collection of these different processes is known as ADME, and the behavior of the molecule under them is known as pharmacokinetics (PK). AI is useful in predicting the molecular ADME properties that determine PK. There is a long tradition of predicting ADME properties using QSAR models, and that field has recently been invigorated by advanced ML. At the back end of ADME property prediction, mathematical models are used to understand the dynamic behavior of drug concentrations, which is known as quantitative systems pharmacology and has also been of increasing importance in recent years.
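
As an illustration of what such a concentration model looks like, the sketch below implements the textbook one-compartment model with first-order oral absorption and elimination, probably the simplest concentration-time model in use. All parameter values are hypothetical and chosen only for the example; real models are considerably more detailed.

```python
import math


def concentration(
    t_h: float,      # time after dosing, hours
    dose_mg: float,  # oral dose, mg
    f_oral: float,   # fraction of the dose reaching the bloodstream (0..1)
    ka: float,       # first-order absorption rate constant, 1/h
    ke: float,       # first-order elimination rate constant, 1/h
    vd_l: float,     # apparent volume of distribution, L
) -> float:
    """Plasma concentration (mg/L) for a one-compartment oral-dose model."""
    if math.isclose(ka, ke):
        raise ValueError("this closed form requires ka != ke")
    scale = f_oral * dose_mg * ka / (vd_l * (ka - ke))
    return scale * (math.exp(-ke * t_h) - math.exp(-ka * t_h))


def time_of_peak(ka: float, ke: float) -> float:
    """Time (h) at which the concentration curve reaches its maximum."""
    return math.log(ka / ke) / (ka - ke)


# Hypothetical compound: 100 mg dose, 50% bioavailable, fast absorption,
# slow elimination, 40 L volume of distribution.
PARAMS = dict(dose_mg=100.0, f_oral=0.5, ka=1.0, ke=0.1, vd_l=40.0)

t_peak = time_of_peak(PARAMS["ka"], PARAMS["ke"])
print(f"peak at {t_peak:.2f} h, concentration {concentration(t_peak, **PARAMS):.3f} mg/L")
```

The inputs to this kind of model (absorbed fraction, rate constants, volume of distribution) are exactly the quantities that ADME property prediction tries to estimate before a compound is ever dosed.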

The right safety

It’s not enough for a drug to be effective. To be useful, a drug must do more good than harm, and a large amount of effort goes into ensuring that there are no, or minimal, toxic side effects. Ultimately, this is addressed during preclinical animal testing, but there are many ways that toxicity can be accounted for in advance of that, making drugs safer and reducing the need for animal testing.

One important source of toxicity is polypharmacology. It is well known that most drugs interact with proteins other than their intended target, which can lead to side effects that are hard to predict. AI can help with predicting such off-target effects, ideally early in the process during hit generation or lead optimization.

Another important source of toxicity is ADME-related. Drugs, or their metabolites, often accumulate at high concentrations in the liver or kidneys, a tendency that is related to the special function of these organs in metabolism and excretion. Any mild general cytotoxicity that would normally not be a problem can, due to high concentration, cause damage to these organs. AI can help identify such problems early, using the same methods described above under “the right tissue”.

A third important source of toxicity is the interference of drugs with nerve action by modulating the function of ion channels. Many drugs have failed in human trials or been withdrawn from the market because of such problems, most often due to interference with the electrophysiology of the heart. The most significant of these effects is Long QT Syndrome, which is caused by interference with the hERG potassium ion channel and can lead to fainting, seizures, and sudden death. AI can help identify problem compounds early in the discovery process using ligand-based QSAR models similar to the ones used for ADME property prediction.

The right patient

Ideally, a drug will work equally well for every patient, but in practice, this is far from the truth. The human genome has millions of polymorphisms, i.e. base pairs that differ between individuals in the human population. In addition to individual differences, there are many polymorphisms that vary widely in prevalence between different subpopulations, such as racial or ethnic groups, leading to ethical concerns in the selection of clinical trial participants. Any one of these polymorphisms could have an effect on the efficacy or safety of a drug, a phenomenon that is studied in the field of pharmacogenomics. AI can play a role in identifying or predicting the effect of a given polymorphism, if any, and in cross-referencing that effect with the intended or unintended effects of a drug. Of particular interest is structural pharmacogenomics, which maps genetic polymorphisms onto protein 3D structure in order to better understand the mechanism of such interactions. Patient stratification biomarkers that can be used as inclusion/exclusion criteria are also key to delivering efficient recruitment and a clear signal from downstream clinical trials. Given the genomic heterogeneity of the human population, AI can also help in using genomic markers to select the right participants in clinical trials. This can enhance statistical power and address concerns about the underrepresentation of particular subpopulations. AI is also being used in clinical trials to create synthetic control arms and “digital twins”, two ways to improve how clinical trials are conducted.

It is clear that there are a large number of applications of AI in healthcare, and when applied appropriately to specific problems and questions, AI can have far-reaching implications across the entire healthcare value chain, including diagnosing diseases earlier, discovering and developing better medicines for individual patients or populations, detecting safety signals in clinical trials or in the market, prescribing those medicines more effectively, monitoring patient adherence to prescription, and managing patient care. When relevant data is available in high quality, AI will undeniably play a large role in the near future in addressing these issues, by providing solutions to specific research questions that would otherwise be onerous or lengthy to answer solely by traditional approaches. Given the magnitude of the impact AI will have across the entire healthcare industry, a body like the AAIH – a coalition of technology developers, pharmaceutical companies, and research organizations – is key to ensuring that industry players operate in a way where standards are well understood and adopted, that the broader industry is aware of progress, and that there is continuous innovation around application and validation of methods.

About the authors

Naheed Kurji

Naheed is the Co-founder, President and CEO of Cyclica, a leading AI in drug discovery company that advances molecules to medicines by embracing the complexity of disease. In addition to his role at Cyclica, Naheed is also a Co-founder and Director of Entheogenix Biosciences, a psychedelic-inspired biotech company for mental health, a Co-founder, Board Member and Executive Officer of the Alliance for Artificial Intelligence in Healthcare (AAIH), and a Member of the Life Sciences Advisory Group for Global Affairs Canada. Naheed is passionate about how people interact with technology to inform effective decision making, and is dedicated to advancing the responsible application of AI to healthcare and its impact on patients. Naheed holds an MBA from Rotman School of Business, a BSc from the University of Ottawa, and a certificate in AI from MIT.

Andreas Windemuth

Andreas Windemuth, Ph.D., is the Chief Science Officer of Cyclica and guides the company’s vision of creating a scientifically rigorous platform that is integral to the drug discovery pipeline. Prior to joining Cyclica, Andreas served as Chief Information Officer at Firefly BioWorks, a biotechnology company from MIT, where he created all computational aspects of the company’s multiplexed assay technology. Andreas received his Ph.D. in Theoretical Physics from the Technical University of Munich and his Ph.D. in Theoretical Biophysics from the University of Illinois Urbana-Champaign.
