At IBM Research, a group of scientists and researchers from around the globe is working to support the power of the scientific method to invent what’s next for IBM, its clients, and the world. Joshua Smith, PhD, IBM TJ Watson Research Center, speaks with DDW’s Megan Thomas about AI, and the acceleration and discovery of therapeutics and biomarkers.
MT: What do you hope the audience will take from your keynote at SLAS2023?
JS: The SLAS2023 conference theme of Innovation Accelerated with respect to automation, discovery, research, technology, and collaboration is one that resonated immediately. It’s something we’ve given considerable thought to with specific focus on how these topics relate to accelerated therapeutic and biomarker discovery.
Tremendous effort has been applied across the industry in this direction, and at this stage several key technologies, including AI, have matured to the point where they can leverage available data to augment and transform scientific workflows and transcend existing bottlenecks in this space.
Among the core technologies poised to drive impact at scale are AI, cloud and high-performance computing (HPC), and quantum computing, each a game-changer in its own right. AI can help generate and unlock insights from the vast amount of available data. Hybrid cloud and HPC can help tie together heterogeneous and distributed resources and pull in data from public clouds, private data centres, and edge/IoT, including labs and instruments. Additionally, the advent of quantum computing promises to tackle certain problems intractable for classical computing.
My desire would be for listeners to walk away with a more holistic view of how these technologies can converge to provide end-to-end acceleration in scientific workflows, from knowledge integration and AI-enriched simulation to generative modelling and automated experimentation.
MT: Over the past 10 years, we have been living through an information revolution in healthcare and life sciences. What sparked this revolution, and what is fueling it still?
JS: Underlying this information revolution has been the explosion of data created in this space. By some estimations, data being generated within the healthcare industry now represents ~30% of all global data volume with significant projected growth in the coming years.1
Without question, many factors have contributed here, but at the heart of this spark lies a convergence of technology, awareness, desire for transparency, and necessity. The rise of new and maturing of technologies (including new assays and cheaper cost of sequencing), an ageing population, increased digitisation of health information, cloud-based information exchange, and a vast array of consumer products capable of tracking personal health data have all contributed here. This has led to a wealth of information encompassing multi-omics and foundational scientific data, as well as in diagnostic, treatment, outcome, and other related clinical data. Data from extended care facilities and other monitoring outside the hospital, such as data from wearables, have added significantly to this pool as well.
The ability to effectively weave this information together to present a more complete picture of individual health with actionable insights that improve outcomes is key moving forward. While growth in all these areas will continue to fuel this revolution, there is some evidence of a shift in the centre of gravity from data collected on patients diagnosed with a disease toward smart devices and wearables that can continuously monitor human wellness and quality of life (QoL), providing a much broader picture than medical records can provide alone.
MT: There is a lot of talk about the potential of AI in drug discovery. Do you think this potential is achievable and how can all sectors of the industry access its benefits?
JS: Absolutely. Given the enormous wealth and variety of data available, AI allows us to learn increasingly rich models that help with several tasks useful in drug discovery. It can accelerate and automate steps, provide end-to-end workflows, and augment human creativity where data sets are too large for us to digest, interpret patterns in, and draw insights from on our own.
Generally speaking, AI can be utilised in several ways to achieve end-to-end acceleration. It can be used to ingest, integrate, structure, and search a wide variety of data and knowledge at scale to acquire deep insights. Further, AI-enriched simulation can help identify the most promising simulations to run on a massive data set, thereby reducing the amount of compute required, for example, to identify drug candidates. Generative AI can be applied to automate hypothesis generation, pulling from massive amounts of data to reverse-engineer molecules that fit the characteristics researchers want in a new molecule, significantly expanding the scope of possibilities to explore. Finally, AI can be applied on the experimental front as well, for tasks like chemical reaction prediction and automated chemical recipe creation.
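The candidate-triage step described above can be illustrated with a minimal sketch. This is not IBM's actual pipeline: the scoring fields stand in for the outputs of a trained model, and all molecule strings, thresholds, and names are invented for illustration.

```python
# Hypothetical sketch: a model's predicted scores (mocked here) rank
# candidate molecules so that only the most promising go on to costly
# simulation or wet-lab experiments. All values are illustrative.
from dataclasses import dataclass

@dataclass
class Candidate:
    smiles: str                 # molecular structure (SMILES string)
    predicted_affinity: float   # mock model output, higher is better
    predicted_toxicity: float   # mock model output, lower is better

def triage(candidates, affinity_floor=0.7, toxicity_ceiling=0.3):
    """Keep only candidates the (hypothetical) model rates promising."""
    kept = [c for c in candidates
            if c.predicted_affinity >= affinity_floor
            and c.predicted_toxicity <= toxicity_ceiling]
    # Rank best-first so downstream simulation spends its budget on top hits.
    return sorted(kept, key=lambda c: c.predicted_affinity, reverse=True)

pool = [
    Candidate("CCO", 0.91, 0.10),
    Candidate("c1ccccc1", 0.65, 0.05),  # filtered: affinity too low
    Candidate("CC(=O)O", 0.80, 0.45),   # filtered: toxicity too high
    Candidate("CCN", 0.75, 0.20),
]
shortlist = triage(pool)
```

In a real workflow, the mock scores would come from learned property predictors, and the shortlist would feed AI-enriched simulation rather than being an end result.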
AI can also bring specific focus to accelerate therapeutic and biomarker discovery. It can help improve drug safety and efficacy by automating the construction of a quantitative map between drug and disease mechanisms at the population level or help uncover disease stages and progression patterns from complex longitudinal data to name just a couple of examples. But anywhere there is data and the need for deeper insight, AI has a role to play in accelerating innovation at scale.
MT: In your experience, what are some key examples of how the industry, as well as IBM Research specifically, has benefitted from access to this previously inaccessible data?
JS: For the industry as well as IBM Research, I think it is less of a focus on data accessibility because there is a lot that is already there, and more of an emphasis on what to do with data that is already accessible. So, how do you leverage different kinds of data, having multiple modalities, coming from different sources to accelerate discovery? This is really the focus. You need to be able to connect and fuse this data and perform analytics to generate information that gives you a more complete picture, instead of doing this analysis in siloed isolation.
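The kind of cross-source linkage described here can be sketched in miniature. This is a toy illustration only: the record layout, field names, and subject identifiers are all invented, and real multimodal fusion involves far harder problems of identity resolution, consent, and schema alignment.

```python
# Toy illustration of fusing heterogeneous records by a shared subject
# identifier, a stand-in for connecting siloed data sources.
genomics = {"p01": {"variant": "BRCA1"}, "p02": {"variant": "none"}}
wearables = {"p01": {"avg_hr": 72}, "p03": {"avg_hr": 80}}

def fuse(*sources):
    """Outer-join record dictionaries keyed by subject id."""
    merged = {}
    for source in sources:
        for subject_id, fields in source.items():
            # Merge fields from every source that mentions this subject.
            merged.setdefault(subject_id, {}).update(fields)
    return merged

profile = fuse(genomics, wearables)
```

The point of the sketch is the outer join: a subject appearing in any one source still gets a profile, and subjects appearing in several sources get a richer, fused one.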
MT: How can this information be utilised for accelerated therapeutics and biomarker discovery, and what is currently stopping this from happening?
JS: As mentioned previously, there is already a considerable amount of data that is accessible that can be used to create actionable information when it comes to accelerated therapeutics and biomarker discovery. When it comes to handling sensitive data, such as protected health information (PHI) or other personally identifiable information (PII), there are currently some great solutions in place, such as data privacy vaults, that address challenges with respect to secure analytics and data sharing, compliance with laws and regulations, and data residency and security. There are even API-based solutions available as a service. So, I don’t believe there are currently barriers preventing meaningful discovery in this space. Again, it’s how this information is connected that is important.
MT: In terms of regulation, what milestones need to be reached before completely open access to this data is possible? Why does regulation matter?
JS: That is a difficult question and one that is probably better suited for a regulatory agency. Generally speaking, I don’t foresee a world in which all data is completely open access, and for good reason. Certainly, withholding sensitive personal information (SPI) and PII is necessary to protect individuals, and health data definitely falls into this category. But while protected, it needs to be made available to enable payment and healthcare practitioners, as well as for treatment, general public health, and research. There is a balance to strike between privacy and protection on one side and operational enablement and discovery on the other. I think there is room for both.
The real question, I think, is: Is the data achieving its expected value? Throughout the policy making process, especially in recent years, there has been a constant drive toward collecting more data and making that data publicly available. But while access to data is necessary, it is not sufficient to improve the healthcare delivery system.
MT: What key opportunities and challenges does implementing this technology present?
JS: The discovery process allows individuals, institutions, and enterprises to pose questions, build knowledge, surface scenarios, enhance operations, and ultimately discover outcomes, iterating on this process for complex problems where no one knows the answers. This is true for drug discovery as well as many other scientific problems.
I think that the value and opportunity lie in the holistic, end-to-end acceleration of the scientific method across all relevant workstreams, identifying and providing technology that addresses all of the bottlenecks and pain points in the workflows of end users – the scientist, the researcher, the developer. This frees them from the information technology concerns so that they can focus on the actual science. At present, this cycle is very far from automated and very far from something that can be executed without thinking about the underlying technology in most cases.
Opportunity lies in providing these users with a technological framework capable of harnessing cutting-edge AI/ML, cloud capabilities, and heterogeneous infrastructure and compute, such as HPC and quantum, for their particular use case, where they can bring their own data, domain context, and even algorithms, models, and tools to bear on a problem. As part of this story, interoperability and integration of what they bring with other data and with sets of reusable, generalisable as well as domain-specific, built-for-purpose AI capabilities, such as causal inference, multimodal data fusion, and molecule generation, is crucial.
Some of the big challenges here include how you access and handle sensitive data and how you create a technological framework with this kind of flexibility across the entire drug discovery pipeline – everything from target identification to post-approval in market. That said, we are already seeing AI help achieve acceleration on a number of fronts across this entire flow, through applications in drug repurposing, finding new molecular entities, improving safety and efficacy, disease staging, and trial enhancement, to name just a few.
SLAS 2023 Supplement, Volume 24 – Issue 1, Winter 2022/2023
1. RBC Capital Markets.
Joshua Smith, PhD, is the Global Lead for Accelerated Discovery Partnerships, a Research Staff Member, and an IBM Master Inventor at IBM Research. With 50-plus granted patents and 20-plus co-authored journal articles, his research has been highlighted by Forbes, CNN Money, Pharma Technology Focus and TED.