Ben Folwell, Principal Consultant at Citeline, maps out how AI, swarm learning and open access to clinical data could revolutionise drug discovery.
Artificial intelligence (AI) is notorious for its industry hype, touted to revolutionise the way multiple sectors operate over the next five to 10 years. Its application in pharma, most commonly in drug discovery, has been steadily growing, with global partnerships between tech vendors and pharma companies becoming more commonplace. Although still a fairly nascent practice, it has undoubtedly seen success in areas such as ligand discovery and in identifying molecules that bind to a biological target of interest. But has its application been as revolutionary as industry bods predicted?
The simple answer is no. Whilst we know that AI has been effective in practices such as validating targets that may be suitable to treat a disease, it faces a plethora of significant challenges. Many of these centre around the data that is available to those working directly with the AI; much of it is guarded by strict data privacy laws and what is accessible is often sourced from the earliest stages of research. Yet despite this, the biggest players in the global pharma sector are increasingly opening their doors to partnerships with AI and machine learning companies, cognisant of the potential benefits these could provide across broad therapy areas such as oncology, autoimmune and rare diseases.
How AI can streamline drug discovery
AI algorithms are able to quickly analyse and spot patterns in vast swathes of data. With the advent of natural language processing, algorithms can now also contextualise these data and, combined with trend analysis, can guide researchers towards areas more likely to show promising treatments.
Multinational science and technology company Merck cites one of the benefits [1] of using AI in the drug discovery process as being able to predict the properties of a specific compound, saving time and money by preventing work on others that are unlikely to be effective. It can also generate ideas for entirely novel compounds, which could hugely accelerate the discovery of effective new drugs. Lastly, it alleviates the need for repetitive tasks, such as the analysis of thousands of histology images, saving hundreds of hours of laboratory work. In fact, the market research firm Bekryl predicts that AI has the potential to offer over $70 billion in savings for the drug discovery process by 2028.
The significance of these predictions has not been lost on investors in the health and technology space. According to Hampleton Partners’ latest Healthtech M&A market report [2], digital health companies raised a total of $57.2 billion in 2021, an increase of 79% from 2020. AI-based diagnostic software and clinical trials technology proved to be two of the most attractive health investments, alongside mental health and wellness and medical imaging companies.
The unavailability of good data
Perhaps one of the largest obstacles facing technical teams looking to analyse data and convert the findings into real-world impact is the availability of data with which they can train their AI. Much of the most widely available data is preclinical, meaning that it is usually collected at an early stage of research where most validation experiments are conducted with microorganisms, cells, or biological molecules outside their normal biological context. While undoubtedly useful in lead optimisation, these data cannot be directly translated into predictions of how a drug will behave in an animal model or, even more importantly, the human body.
The result of the lack of clinical data is that AI researchers often try to apply preclinical data to predict how a drug will behave in clinical situations. The problem is that not enough is known about how the human body functions and metabolises drugs for an AI to make accurate predictions. AI can excel at predicting how two chemicals will interact or bind, as the rules governing this are well defined and repeatable in a way that experiments in biological systems are not. Overall, the unpredictability of biological systems means that positive findings from preclinical experiments can differ substantially from the results obtained in the human body.
Data privacy and the advent of ‘swarm learning’
Another big problem looms for organisations attempting to access private data sets: ever-stricter data privacy laws. Legislation such as the General Data Protection Regulation (GDPR) in the EU and UK and the Health Insurance Portability and Accountability Act (HIPAA) in the US restricts how individuals’ health data can be used for commercial purposes. Healthcare and pharma are already highly regulated industries, and steps are being taken globally to find ways to protect citizens’ health data and privacy whilst still arming pharma companies with the best and most accurate data sets for drug discovery.
This is where ‘swarm learning’, which is predicted to change the way the biotech sector operates, comes in. Perhaps one of the most promising recent breakthroughs, ‘swarm learning’ is a decentralised machine learning approach that trains models where the data is held and uses blockchain technology to coordinate the participants, meaning that ownership of the data never changes hands. This allows the algorithm to learn from multiple data sets, creating far more accurate results without violating data privacy laws.
This development is far from commercial. In fact, details about the practice were only made public in April [3] by the University of Leeds, where much of the initial research was carried out. The researchers wanted to find out whether this form of AI could be used to help computers predict cancer in medical images of patient tissue samples without releasing the data from hospitals.
According to the official news release: ‘Swarm learning trains AI algorithms to detect patterns in data that is held by a hospital or university, such as genetic changes within images of human tissue. The swarm learning system then sends this newly trained algorithm – but importantly no local data or patient information – to a central computer. There, it is combined with algorithms generated by other hospitals in an identical way to create an optimised algorithm. This is then sent back to the original hospital, where it is reapplied to the original data, improving detection of genetic changes thanks to its more sensitive detection capabilities. By undertaking this several times, the algorithm can be improved, and one created that works on all the data sets. This means that the technique can be applied without the need for any data to be released to third party companies or to be sent between hospitals or across international borders.’
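The train-locally, combine, send-back loop described in the news release can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the release does not say how the algorithms are combined, so the sketch assumes a simple size-weighted average of model parameters (the approach usually called federated averaging), a toy one-variable linear model, and synthetic data standing in for each hospital's records. All function names here (`local_update`, `combine`, `train_across_sites`, `make_site`) are hypothetical, and the blockchain coordination layer is not modelled.

```python
import random


def local_update(weights, data, lr=0.05, epochs=20):
    """Train a copy of the shared model on one site's private data.

    The model is y = w*x + b, fitted by gradient descent on squared error.
    Only the updated (w, b) pair is returned; the data never leaves the site.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def combine(models, sizes):
    """Merge site models with a size-weighted average (an assumed rule)."""
    total = sum(sizes)
    w = sum(m[0] * s for m, s in zip(models, sizes)) / total
    b = sum(m[1] * s for m, s in zip(models, sizes)) / total
    return (w, b)


def train_across_sites(sites, rounds=30):
    """Repeat: send the shared model out, train locally, combine, send back."""
    shared = (0.0, 0.0)
    for _ in range(rounds):
        local_models = [local_update(shared, data) for data in sites]
        shared = combine(local_models, [len(data) for data in sites])
    return shared


def make_site(rng, n, lo, hi):
    """Synthetic stand-in for one hospital's data: y = 3x + 1 plus noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


rng = random.Random(0)
# Three hypothetical hospitals, each seeing a different slice of the x range.
sites = [
    make_site(rng, 40, 0, 1),
    make_site(rng, 60, 1, 2),
    make_site(rng, 50, -1, 0),
]
w, b = train_across_sites(sites)
print(f"shared model: y = {w:.2f}x + {b:.2f}")
```

Each site only ever passes model parameters to the combining step, which mirrors the release's point that no local data or patient information is shared. Because each synthetic site covers a different range of inputs, the combined model ends up closer to the underlying relationship than a model trained at any one site for the same number of steps would.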
The challenge of partnering with Big Pharma
Despite its potential, the widespread use of ‘swarm learning’ is far in the future. Currently, many of the start-ups working in biotech use publicly available data, such as genetic data stored on the National Center for Biotechnology Information (NCBI) database, as a foundation. They look for partnerships with clinical or pharma companies, which often grant them access to a far richer goldmine of private data that they feed back into their AI to refine the results further. This lack of access for many has catalysed a growing industry call for companies to stop selective reporting and to disclose failures as well as positive results, to reduce the prevalence of duplicated research efforts.
Globally, there are around 250 companies designing drugs with the help of AI. These vary from tiny start-ups to well-established industry players. Some recent highlights from market leaders include:
DeepMind, a British AI company owned by Google, focuses on predicting protein structures and has released the predicted structures of 350,000 proteins, covering almost the entire human proteome.
Insilico Medicine, a biotech company based in Hong Kong, has seen its candidate ISM001-055 for idiopathic pulmonary fibrosis (the first wholly AI-developed drug) enter Phase 1 trials in 2022.
BenevolentAI, headquartered in Luxembourg and arguably at the forefront of AI in drug discovery, has won multiple industry partnerships such as with AstraZeneca and Helix Group.
Nuritas, based in Dublin, Ireland, is using AI to locate peptides in plants and clinically test them before they are integrated into consumer products for topical or oral use.
NEC, the Japanese IT and electronics corporation, has pivoted towards drug discovery with a focus on personalised medicines. The AI Drug Development Division is focused on AI-guided design and development of individualised neoantigen vaccines in oncology.
When considering which companies to partner with, Big Pharma will look for several green flags. Many of the most successful enterprises working in this space were founded by pharma executives or spun out of universities: founders who tend to know what drugs are needed and, as such, bring an attractive proposition to pharma companies as a potential partner. Biotech and AI companies founded purely by tech professionals often opt to bring in pharma executives with insider knowledge further down the line.
Interestingly, where Big Pharma companies have traditionally not had the in-house expertise to develop market-leading AI propositions, the tide is now beginning to turn. Notably, Charlotte Allerton [4], Pfizer’s head of medicine design, was reported as saying: ‘We feel we are in good shape in having built state-of-the-art AI models in order to enable us to more efficiently design oral drugs.’ For the wider industry, the likely result is that smaller biotechnology companies will be pushed to focus on more niche research.
Which areas of drug discovery research stand to benefit most?
The clinical research areas set to benefit the most from the use of AI in drug discovery are those where the underlying biology has traditionally been poorly understood but technological advances are providing more insight every year. These areas include oncology, autoimmune diseases such as multiple sclerosis, inflammatory bowel disease or lupus, and rare diseases such as fragile X syndrome. Other areas where current treatment options are extremely limited, such as congenital disorders of glycosylation, could also be revolutionised by continuing AI-driven in silico research for drug repurposing, which does not require the same level of monetary investment as other drug discovery approaches.
Looking ahead, the future is bright. Great hope is being placed in newer forms of therapy such as cell and gene therapies and therapeutic RNA, as well as the advent of personalised medicine, where AI will undoubtedly have an impact. Many of the primary sources of knowledge for AI come from scientific literature, and continuing improvements to machine learning algorithms, such as natural language processing, which allows AI to find context as well as patterns in data, will accelerate progress.
Not only this, but the amount and quality of data collected during clinical trials is also increasing, which means that any AI trained on it may start to predict results in vivo, and ultimately in humans, more accurately, saving both time and costs. For AI, which was widely touted as a biotech disrupter, this is a huge and important step towards proving that it can live up to the hype.
Volume 23 – Issue 4, Fall 2022
About the author
Ben Folwell is Principal Consultant at Citeline (formerly Informa Pharma Intelligence) where he specialises in the use of AI in drug discovery. Prior to this, Folwell managed field-based projects for major oil field operators. He holds a PhD in microbiology from the University of Essex.