How ML and automation are changing the future of protein therapeutics


Kristen Hopson, Vice President of Preclinical Discovery and Development at Generate Biomedicines, and Yue Liu, Associate Director, HTS at Generate Biomedicines, reflect on the impact of artificial intelligence (AI), machine learning (ML), cloud computing, and advances in laboratory automation and high-throughput screening (HTS) on life science research.

Proteins are the basic building blocks and engines of biology. For decades, drug discovery and development have been constrained by limits in our knowledge of the relationship between protein sequence, structure and function, and the methods available to explore novel drug targets. Therapeutic candidates were discovered randomly, with little control over the specificity of target engagement; understanding the function, developability, and safety/tolerability of potential candidates was a costly and time-consuming process, with low success rates; platforms were usually limited to individual modalities or disease areas.  

Today, the convergence of AI and ML, cloud computing, and advances in laboratory automation and high-throughput screening (HTS), all tied together by digital infrastructure, is changing the paradigm of life sciences research and creating a data-driven revolution in the discovery and development of protein therapeutics.

Therapeutic leads can now be generated with precision to a specific epitope or defined target engagement. Rather than taking months or years, therapeutic leads can be generated in seconds and validated in weeks, with function, developability, and safety included simultaneously in the parameters of the initial generation process. New platforms based on this approach, building on core principles of protein structure, inherently span multiple modalities and disease areas.  

Instead of being limited to the vanishingly small fraction of possible proteins that nature has evolved over 3.5 billion years, it is now possible to design a vast diversity of proteins never seen in nature and to program them for optimum combinations of symmetry, shape, binding affinity, potency, and developability. This capability, combined with the ability to measure critical molecular characteristics and function at unprecedented speed and scale, holds great promise for developing novel therapeutics and bringing them to patients faster, more cheaply, and better tailored to their specific conditions.

What are the roots of this new direction in protein therapeutics? As computers grew in power and sophistication in recent years, the idea began to take hold in different fields that computers, if fed enough data, could learn the principles of systems and then use those principles to invent things that had never existed before. For instance, AI has made it possible for computers to create realistic images of people who don’t actually exist, to create text in different languages, and to create images from natural language inputs. We see examples of these uses of AI and ML every day – when Amazon suggests something it predicts you may want to buy, or when social media tailors news streams and notifications based on your preferences and networks of connections.

There are two principal shortcomings of existing efforts to use data and ML to advance our understanding of potential new therapeutics. First, even armed with the structure of a potential therapeutic protein, we know nothing of its function and druggability. Second, focusing only on known proteins limits the horizons for novel therapeutics based on functional proteins that have never existed before. This is a direction that several organisations, including Generate Biomedicines, are now exploring.

At Generate, we are adapting the new tools of ML and combining them with laboratory facilities (including microfluidics and automation of HTS assays to enable rapid, standardised experimentation to maximise data quality and scale) to test protein sequences generated in silico with a specific therapeutic question in mind. Linking our wet lab efforts with the output of the ML platform (or dry lab) creates a productive cycle of generate, build, measure, and learn. This cycle provides the right data at the right scale for designing the right medicines quickly, which can then be tested in the clinic and brought to patients rapidly. Two main characteristics differentiate Generate: the ability to generate proteins that have never existed in nature before, and the integration of dry lab and wet lab capabilities through a robust digital infrastructure.

We have built customised lab automation systems to carry out production and characterisation efforts at high throughput in the wet lab. The large amount of data produced from a diverse set of assays feeds back to the ML algorithm to optimise the function and biophysical properties of the designed proteins. Our biologists are trained to use HTS technologies and automation to convert functional assays into a miniaturised format and to use robots to carry out characterisation at scale. Automated screening processes provide high-quality, biologically relevant data from which to learn.

ML approaches, coupled with advances in laboratory automation, have brought about a new paradigm for designing protein therapeutics, focusing on generalisability from data and technology. The field is developing at a dizzying pace and we can reasonably expect that this new paradigm will affect every aspect of drug discovery and development over the coming years. The ability to program protein function will have a transformative impact on our efforts to improve human health – with implications as profound as the industrial revolution caused by the steam engine in the 18th century, the development of electric power in the 19th century, or the information revolution that emerged from Silicon Valley in the 20th century.

SLAS 2023 Supplement, Volume 24 – Issue 1, Winter 2022/2023

About the authors

Yue Liu is an Associate Director leading the bioassay innovation and discovery team at Generate Biomedicines. In her current role she applies interdisciplinary knowledge of biologics characterisation and lab automation to generate data at scale, and integrates wet lab and dry lab capabilities to build a machine learning technology platform and pipeline programs. Liu received a PhD in Pharmacology from the University of Florida and an MBA from the University of Massachusetts Amherst.

Kristen Hopson is head of preclinical discovery and development at Generate Biomedicines. Hopson joined Generate from Moderna Therapeutics and has previous experience in translational medicine and vaccine development in the context of T-cell vaccines targeting various oncology, viral, and bacterial targets. Hopson received a PhD in molecular medicine from Boston University School of Medicine and completed her postdoctoral training at Harvard Medical School/MGH and Pfizer.
