Philip Gribbon is Head of Discovery Research at the Fraunhofer Institute for Translational Medicine and Pharmacology in Hamburg, Germany. Ahead of his participation in a Data Sharing panel at SLAS Europe 2022, Gribbon tells Megan Thomas about the importance and lasting impact of good quality data.
Health and data science
The Fraunhofer Society is the largest applied research organisation in Europe, and it devotes a significant proportion of its efforts to advancing health-related sciences and technologies. Philip Gribbon is the Head of Discovery Research at the Institute for Translational Medicine and Pharmacology, and he says a large part of his role is using data science solutions to help the drug discovery industry solve problems. Gribbon and his team in the institute were originally focussed on generating data sets to support hit identification and target validation as part of early-stage drug discovery projects. As SLAS attendees will know, these approaches typically involve high-throughput screening of large collections of compounds against targets in biochemical or cellular assays. From that, the project team is able to develop an understanding of how these compounds might work in cells, how they interact with the targets and how they might cause disease-modifying effects.
He says: “We’ve generated a lot of data, and about six or seven years ago, we started getting more deeply involved in how we can better exploit the full potential of these results. We’re a publicly funded research organisation working mainly with external industrial partners and collaborators. The idea was to try and understand how one project can learn from another – since projects don’t exist in isolation. They’re generally using the same compound sets and there are also a lot of commonalities between the assay systems or instruments used. In addition, there is a great wealth of external data being generated by the wider community, which appears typically in scientific publications and in some cases is uploaded to data resources such as ChEMBL, a bioactivity database run by the European Bioinformatics Institute.
“While there is a practically limitless number of possible compounds that can be synthesised, there’s a much smaller and finite number which are physically available for screening at different organisations. As a result, many of the results that we generate will have similar or complementary data already deposited in a public database, or the information may sit on the server of another academic screening organisation or company. There’s a lot of value to be gained from being able to compare across all of these data, and we realised that was becoming an increasing proportion of our work here: not just generating the data, not just doing the biology and the chemistry, but also working to integrate and interoperate with public data and in this way maximise the impact of the newly generated data.”
Gribbon sees his job as continuing to work on advancing the biology and the chemistry elements of projects but always exploring ways to get more out of the data that they generate, link it through to other data resources, and then hopefully make better decisions. He says: “In the end, it’s all about decision making. We want to make good decisions to advance our projects and make investments to only generate essential new data, taking fully into account relevant data that have been produced by other investigators in previous studies.”
In a SLAS Europe 2022 panel, Gribbon will engage in more detail on the topic of data sharing and its importance in the drug discovery and development ecosystems. Gribbon emphasises that it is not just him and Fraunhofer who value data and how it’s used. Public funders, such as the European Commission, supported many of the research projects which originally generated the data and are now looking to promote further reuse of these results. The EU is doing this by funding programmes which create resources and infrastructure that allow people to work with and share scientific data more effectively. Fraunhofer is involved in many of those projects.
Gribbon is involved in FAIRplus, a project supported by the Innovative Medicines Initiative (IMI), now the Innovative Health Initiative, which works on “data FAIRness”. He says: “We want to FAIRify data resources and data analysis workflows from IMI-supported projects and in doing so create useful tools for use by the wider community. We want to improve data findability, its accessibility, its interoperability and ultimately, increase its reusability. Other programmes include the European Open Science Cloud (EOSC).”
“We’re involved in an EOSC project called EOSC-Life, where life science infrastructures get together to try to create resources for working with biological and clinical-related data from different infrastructures and giving the wider community access to it. There is also a project we are part of called EOSC FUTURE, which is looking at how these resources are developing over time and how different research portals, resources and services can be linked together in a sustainable manner for the longer term.”
Gribbon continues: “We’re also involved in the BY-COVID project, where we’re looking at the last two years of Covid-19 research as well as new research data being generated going forward. The BY-COVID project will promote connection and aggregation of these diverse data thus allowing them to be effectively reused.” Gribbon and his colleagues are also involved in a whole set of other projects, including IMI-supported work related to FAIR data management in antibacterial drug discovery. He continues: “These are all activities which meet the demands of funders in individual countries and the EU who don’t want to see data sitting in silos but actually being reused, thereby achieving a greater return on their historical investments. What I’m going to be trying to talk about at the meeting is how we are involved in some of these initiatives, and what it means for people who generate the data, people who analyse the data and everybody in between.”
AI and ML
Acknowledging the current excitement around the application of approaches like machine learning (ML) and artificial intelligence (AI), Gribbon explains that their success is predicated on the availability of good-quality training data sets, which allow developers to optimise the algorithms in order to make them more predictive, powerful and useful. He says: “Getting the data into a state where it can be effectively reused, for AI-type use cases, really has to happen at the very beginning of data creation. At the meeting, we can talk about some of the requirements when you’re generating datasets – how to make them FAIR, how to apply ontologies, how to apply dictionaries and all these other procedures that need to happen. I’ll be covering those types of points at SLAS Europe as well as why you need to think about open data, FAIR data, deploying your data on public resources and what the benefits are for both data providers and users.”
Prior to Covid-19 projects, Gribbon and his team would typically keep primary data ‘in house’ and draw upon public resources to compare with internal results. He says: “We’ve run high-throughput biochemical-based repurposing screens against seven of the main SARS-CoV-2 viral proteins, we’ve also run phenotypic assays and been involved in in vivo studies, which is a huge amount of practical work on the Covid-19 side. I think one of the things that we really learned from the Covid example was that there were a lot of different drugs from in-vitro repurposing screens that were being highlighted as being possible treatments. However, looking across different studies, results were often inconsistent and in many cases contradictory. To properly interpret these data coming from different groups, having access to the experimental metadata is critical: knowing how the experiment was performed is essential.
“We focused on making even our primary data available and much of our primary data from screens has been deposited into public resources. This will be further extended in the BY-COVID project, which links different European research infrastructures.”
AI, effective data and Google
Gribbon says: “When I was thinking about a standout example of what effective data – open data and data sharing – is, and how that can be useful to the community, I thought of the AlphaFold resource. This is where the folks within Alphabet and Google have been working for many years with the structural biology community to come up with ways of being able to predict the secondary and tertiary structures of proteins based upon their sequence. To predict protein structures is incredibly complex and it would be impossible to do it solely from knowing which amino acids were in a protein’s polypeptide chain.
“But what they did was use public datasets from the structural biology community, including the Protein Data Bank (PDB), which is highly curated, highly annotated and consistent in the way the data is curated. Google and their academic collaborators were able to use that data to train their systems to predict the structures of practically all human proteins.
“That wouldn’t have happened without the decades of work from tens of thousands of scientists, all working in different labs, all doing their experiments slightly differently, but all reporting their data consistently. For me, this is a shining example of what can now be done in structural biology – something which few thought possible four or five years ago. We’ve come a long way.”
Volume 23, Issue 2 – Spring 2022 – SLAS Europe Supplement
Philip Gribbon is Head of Discovery Research at the Fraunhofer Institute for Translational Medicine and Pharmacology in Hamburg, Germany. He is involved in several national and European consortia working on compound repurposing applied to infectious and rare diseases and was previously coordinator of the European Infrastructure for Chemical Biology, EU-OPENSCREEN. Previously, Gribbon was Chief Scientific Officer of the European ScreeningPort, a manager at GlaxoSmithKline, and a Principal Scientist at Pfizer.