Big data – charting a new path to drug discovery and development


Tim Lowery, President, JSR Life Sciences, asks whether artificial intelligence can do for life sciences what it has done in other sectors and whether these tools can keep up with the complexities of human biology.

Drug discovery and development timelines can span 10-15 years or more from initial discovery to market approval and typically require the analysis of massive amounts of data. Now, with the growth of publicly available genomics, transcriptomics, and proteomics databases, the ability to quickly carry out large-scale DNA, RNA, and protein screenings, and the availability of massive sets of de-identified patient data, the amount of high-value, analysable data has reached enormous proportions.

While the potential for better insights has grown, acquiring powerful tools that can harness that data, and turn it into useful, actionable information, has increasingly become one of the most important aspects of life science research.

As the capacity to process and analyse complex data at previously unattainable speed and accuracy takes root, artificial intelligence (AI) and machine learning (ML) are emerging as requisite tools for addressing these complexities within reasonable time frames. However, many questions remain. Up to this point, AI and ML have had limited success in the therapeutic space: the algorithms can be flawed, the databases (although huge by current standards) are still not comprehensive enough, and existing computing resources are not sufficient to carry out a proper analysis. In addition, the vast complexity of human disease often means we need a better understanding of all the variables involved in a specific process. Yet there is no denying that these powerful tools have had ground-breaking impacts in other industries. So, where do we go from here to realise their potential in life sciences?

AI/ML 101

At their most basic level, AI solutions replace cumbersome, antiquated, or manual processes with more powerful, automated workflows that deliver rich data and analyses with greater efficiency (and potentially better accuracy). However, to be effective, AI requires advanced mathematical algorithms, extensive, high-quality data sets, and considerable computing power.

AI algorithms often (but not always) rely on rule-based systems, in which the rules for how the data should be analysed and how decisions are reached are defined upfront.

ML is a type of AI technology. ML algorithms can learn (much faster and better than humans can) from the numerous examples in a dataset and come to conclusions based on those analyses. They can also learn to decipher patterns in data sets at scale, surpassing human ability to form connections and creating the potential to identify previously unknown correlations.

These two fundamentally different approaches to building AI systems (rules-based versus pattern recognition) provide multiple opportunities to apply them to disease diagnostics and treatments.
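The contrast between a rule defined upfront and a rule learned from examples can be made concrete with a very small sketch. Everything in it is hypothetical and chosen purely for illustration: the marker levels, the response labels, the 5.0 cut-off, and the function names `rule_based_call` and `learn_cutoff` do not come from any real assay or library, and the learning step is a deliberately simple single-threshold search rather than a production ML method.

```python
# Toy illustration only: marker levels, labels, and the 5.0 cut-off are invented.

def rule_based_call(marker_level, cutoff=5.0):
    """Rule-based: the cut-off is fixed by a person before any data is seen."""
    return marker_level >= cutoff


def learn_cutoff(levels, labels):
    """Pattern-based: pick the cut-off that misclassifies the fewest examples."""
    candidates = sorted(set(levels))
    best_cutoff, best_errors = candidates[0], len(levels) + 1
    for c in candidates:
        # Count examples where the prediction (level >= c) disagrees with the label.
        errors = sum((x >= c) != y for x, y in zip(levels, labels))
        if errors < best_errors:
            best_cutoff, best_errors = c, errors
    return best_cutoff


# Hypothetical training examples: marker level and whether the patient responded.
levels = [1.2, 2.8, 3.1, 3.9, 4.4, 6.0, 7.5]
labels = [False, False, False, True, True, True, True]

learned = learn_cutoff(levels, labels)
print("learned cut-off:", learned)
print("rule-based call for a level of 4.0:", rule_based_call(4.0))
print("learned call for a level of 4.0:", 4.0 >= learned)
```

In this toy data the fixed rule and the learned rule disagree for a marker level of 4.0, which is the kind of gap that examples can expose when an upfront rule was set without them.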

How AI and ML are influencing drug development

Biopharma's use of these technologies has grown significantly over the past 10 years. McKinsey & Company recently identified nearly 270 companies working in the AI-driven drug discovery industry [1], and partnering between AI companies and biopharmaceutical companies continues to grow. For example, Sanofi, Pfizer, Gilead, Novartis, BMS, Genentech, and Bayer have all announced significant alliances to integrate these big data technologies into drug discovery operations.

Much of the early AI work in the biopharmaceutical industry was done in small molecule drug discovery, because small molecules have well-understood chemical structures and large amounts of high-quality physicochemical data are available for them. However, other areas of the drug development pathway, such as target discovery, biomarker discovery, and product manufacturing, are increasingly looking to AI and ML to improve outcomes.

Biomarker discovery is becoming a vibrant area for exploiting the power of AI and ML. Advances in imaging technology, high-quality DNA and mRNA sequencing, and large-scale proteomics have yielded a plethora of data that can be analysed and cross-referenced to identify disease target genes or biological response signatures. One system that has been especially useful in generating relevant and accurate biological data is the patient-derived xenograft (PDX) model for cancer studies. These models are developed from tumour fragments surgically dissected from cancer patients and transplanted into immunodeficient mice. The transplanted tumours retain the unique genetic characteristics of the patients’ tumours and are considered a gold standard for studying disease biology and predicting patient-specific tumour responses. As the data sets and insights from these models continue to grow, AI algorithms can provide powerful tools to identify new and better biomarkers to predict responses specific to a patient’s cancer.

AI is also emerging as a valuable tool in bioprocessing. It has proven instrumental in helping cell line development companies improve the productivity of their manufacturing lines.

For example, one of JSR’s partnering companies fully sequenced its production cell line’s genome and transcriptome and developed AI algorithms to interrogate the data. From that deep dataset, the company built a comprehensive understanding of its cells’ transcriptional and genomic landscape, including insights into changes that could potentially limit the engineered cells’ stability and productivity. This knowledge enabled cell line engineering campaigns that boosted yield and stability across various production bottlenecks.

In another published report [2], a biotech company needed to improve its centrifugation step during monoclonal antibody (mAb) harvesting. By analysing five years of historical batch data with AI algorithms, researchers determined that by varying the lapse time between process operations, they could save 277 grams of mAb per batch, resulting in $5 million in recovered revenue.
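The report does not describe the algorithms used, so the following is only a minimal sketch of the general idea of mining historical batch records: group batches by the lapse time between operations and compare the average amount of product recovered. The batch values, the units, and the helper name `mean_yield_by_lapse` are all assumptions made for illustration and are not taken from the report.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical batch records (invented values):
# (lapse time between process operations in hours, mAb recovered in grams)
batches = [
    (1.0, 105.0), (1.0, 95.0),
    (2.0, 90.0), (2.0, 90.0),
    (4.0, 70.0), (4.0, 80.0),
]


def mean_yield_by_lapse(records):
    """Average the recovered amount for each distinct lapse time."""
    groups = defaultdict(list)
    for lapse, grams in records:
        groups[lapse].append(grams)
    return {lapse: mean(values) for lapse, values in groups.items()}


summary = mean_yield_by_lapse(batches)
best_lapse = max(summary, key=summary.get)    # lapse time with the highest mean recovery
worst_lapse = min(summary, key=summary.get)   # lapse time with the lowest mean recovery
gain_per_batch = summary[best_lapse] - summary[worst_lapse]

print("mean recovery by lapse time:", summary)
print("best lapse time (hours):", best_lapse)
print("grams gained per batch versus worst lapse time:", gain_per_batch)
```

A real analysis would need to control for other process variables that change alongside lapse time; a simple group-and-compare like this only points to where a closer look may be worthwhile.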

Advances in quantum computing are also showing early promise for increasingly complex data analyses. Quantum computing could address problems too difficult for more traditional technologies, with potentially much faster computing times. While the mathematical frameworks for quantum computing differ from those of AI and ML, and there is still work to be done to validate this approach, quantum computing holds the promise of additional opportunities for transforming drug discovery.

What’s next for elevating AI and ML in drug discovery?

There are still significant issues inhibiting life sciences from realising the full potential of these technologies in addressing patient needs.

The output from AI/ML approaches is often met with scepticism by end-users because the workings of the algorithms are not readily available or comprehensible, the so-called ‘AI black box problem’. Any discoveries or conclusions need to be validated in disease model systems and be readily understandable to get buy-in from the regulatory agencies and the ultimate end-users. As a result, companies utilising AI and ML algorithms must ensure that the computational teams analysing the data work closely with research scientists so that the data analysis and validation are done in the context of deep biological understanding.

It is highly unusual for computational experts to have the training in life sciences needed to best interpret the data output. Hence, most AI-driven companies (approximately 85%) fall into the category of AI enablement for biopharma as a service, rather than also building their own product pipeline [3].

‘Industry 4.0’ is the term used to define the next evolution of industry powered by artificial intelligence tools and systems in which AI and ML are utilised in smart factories. While it is understood that the biomanufacturing industry could benefit from smart manufacturing, given the significant investment required and the currently limited guidance from the regulatory agencies, the industry has lagged in this transition. Nonetheless, AI and ML are expected to play a more significant role in bioprocessing as plants are updated. It is also anticipated that the Food and Drug Administration (FDA) and European Medicines Agency (EMA) will provide more guidance on requirements for AI/ML-enabled processes across the pharmaceutical industry as they become more mainstream.

Given the state of the field, much of the FDA guidance to date has concerned the regulation of AI in medical devices. In 2021, the FDA published an action plan on AI/ML-based software as a medical device [4], and in September 2022 it published the long-awaited guidance on Clinical Decision Support Software [5]. However, much of the industry response to this most recent guidance has been critical, highlighting some of the challenges and unresolved issues that still need to be addressed.

Other efforts are underway to broaden the effective and responsible use of AI/ML beyond the medical devices field. For example, Xavier University has been working with industry leaders, in coordination with the FDA, to develop Good Machine Learning Practices (GMLP) [6] that advance the responsible use of AI to support improved success across the healthcare continuum, including healthcare diagnoses, product development, clinical trials, manufacturing operations, supply chain operations, and quality assurance. Additionally, the EMA and the Heads of Medicines Agencies (HMA) published a report in 2020 on Evolving Data-Driven Regulation [7] and, in 2021, sought feedback from pharmaceutical stakeholders regarding big data and AI-related technologies [8]. The need for and value of AI/ML are clear, and the regulatory bodies are working with the key stakeholders to figure out how to harness those benefits safely and responsibly.


While there has been much hype around AI and ML, the life sciences industry has yet to experience the degree of progress seen in other sectors, such as engineering and finance. Yet even in their nascent state in life sciences, these tools are increasing the possibility of taking advantage of the breadth and depth of high-quality data that is now driving the understanding of complex diseases and the design of new products. When paired with rigorous science and external validation, AI/ML can drive significant leaps in identifying new treatments, improving therapeutic safety, personalising treatments on a patient-by-patient basis, and improving manufacturing. The understanding and use of AI and ML will continue to evolve, and efforts to apply these tools to biopharmaceutical development will undoubtedly play an essential role in future generations of therapeutics.

DDW Volume 24 – Issue 3, Summer 2023


  1. industries/life-sciences/our-insights/ai-in-biopharma-research-a-time-to-focus-and-scale
  2. https://www.compliance-insight.com/ai-in-pharma-adoption-part-4-three-real-use-cases-of-ai-implementation-in-pharma-not-science-fiction/
  3. industries/life-sciences/our-insights/ai-in-biopharma-research-a-time-to-focus-and-scale
  4. FDA, Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan (January 2021).
  5. FDA, Clinical Decision Support Software: Guidance for Industry and Food and Drug Administration Staff (issued September 28, 2022).
  6. team/
  7. documents/other/hma-ema-joint-big-data-taskforce-phase-ii-report-evolving-data-driven-regulation_en.pdf
  8. events/joint-hmaema-workshop-artificial-intelligence-medicines-regulation

About the author

Tim Lowery is President of JSR Life Sciences, which recently launched its NGS (next-generation sequencing)-AI division, a research and development group focused on using omics and bioinformatics to advance drug development. With this initiative, JSR is working to spur the development of new technologies, innovative algorithms, and assets to drive the growth of new standards and processes inside and outside JSR Life Sciences.
