By Dr Christof Gänzler, Product Marketing Biology at PerkinElmer Informatics.
It’s a centuries-old truism: the fastest runner doesn’t always win the race. It takes more than speed to advance and maintain performance. It takes the right skill set, good information, and the ability to change course as needed. In today’s quest to accelerate time to market for novel drugs, part of the success formula requires leveraging discovery data with more precision and speed, and earlier in the process. The ability to do this is essential at each of the three steps in the drug discovery process – create, test, and choose.
For pharma and biotech companies – whether working on small or large molecule drugs, or cell and gene therapies – this means empowering scientists with better and faster data management approaches, best practices, and leading technologies so they can pinpoint new molecules and formulations and modify existing ones more easily and quickly. Having the insights and flexibility to pursue or re-examine secondary candidates is also critical.
The end game: reducing late-stage candidate failure and driving stronger innovation, productivity, and throughput. As the industry is painfully aware, however, shrinking time to market is not only about generating mountains of data; it’s about more effectively and efficiently accessing, visualising, analysing, and sharing the right data at the right time across the right organisations.
Getting Data on FAIR Footing
The core challenge preventing many labs from reaching a better information destination is that discovery data is akin to mercury: challenging to collect and hard to hold.
What’s needed – and what will help organisations increase their overall productivity as projects move from lab to market – is the creation of a new, streamlined and connected informatics ecosystem centred on cloud-based technologies such as electronic lab notebooks (ELNs), data automation, and secure but open collaboration. Such an ecosystem exists today and helps scientists understand not only where critical data lives and how it can best be used, but also how it can be applied to minimise manual work and related errors and to avoid the data silos that defeat seamless workflows.
As a backdrop to this informatics ecosystem, it can be helpful to tap into a set of guiding principles established in 2016 to support improved scientific data management and stewardship, the FAIR Principles.
The tenets of the FAIR Principles include improving the Findability, Accessibility, Interoperability, and Reusability of digital assets. FAIR is all about breaking up data stovepipes, exchanging knowledge, and creating meaningful data management and process changes to drive better results.
Taken together, the FAIR Principles show ways to properly collect, annotate, and archive data to create valuable data assets for immediate use and long-term reuse. Here are some key points to consider:
Find – This principle asserts that all metadata and data should be easy to find by both humans and computers. This requires unique and persistent identifiers, rich metadata, and registration or indexing in a searchable resource.
For years, the ultimate goal of IT infrastructure was to locate all the existing data. Every network drive, cloud storage, and database needed to be indexed and searchable. Data lakes served this same purpose, but while the idea was good, the results of search engines were often disappointing. Though you could search and find files, manual work was still required to make sense of them and combine the information to make it actionable. Most of the time, the specialist knowledge of how the data was produced was missing.
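As a minimal illustration of the Find principle, findability comes down to giving every data asset a persistent identifier and rich metadata, then registering it in a searchable index. The class names and record fields below are hypothetical, not any specific product’s API:

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class AssayRecord:
    """A data asset carrying a persistent identifier and rich metadata."""
    title: str
    metadata: dict
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))

class MetadataIndex:
    """Toy searchable resource: registers records, matches metadata terms."""
    def __init__(self):
        self.records = {}

    def register(self, record: AssayRecord) -> str:
        self.records[record.record_id] = record
        return record.record_id

    def search(self, **criteria) -> list:
        """Return records whose metadata matches every given key/value."""
        return [r for r in self.records.values()
                if all(r.metadata.get(k) == v for k, v in criteria.items())]

index = MetadataIndex()
rid = index.register(AssayRecord(
    title="Kinase inhibition screen, plate 7",
    metadata={"assay_type": "IC50", "target": "EGFR", "instrument": "plate reader"},
))
hits = index.search(target="EGFR")  # found by metadata, not by file name
```

The point is that the search runs against curated metadata rather than file contents, which is what makes results actionable without the manual detective work described above.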
Access – The goal of this principle is to ensure that, once found, data can be readily accessed, possibly including authentication and authorisation of the person who requests the data. This requires a standardised communications protocol that can retrieve data or metadata by their identifiers, with metadata remaining available even after the data itself is no longer available.
For data and information to be accessible as a result of a search, you need to have the right tools and these tools need to work together. Mixed media search results with documents, presentations, PDF files, and spreadsheets are one thing, but what’s to be done with podcasts, recorded webcasts, and websites? Although all the individual entities might be accessible, the data buried in them might not be easily extractable for manual analysis. Trying to do data science with mixed media files is even worse. The way to circumvent this situation is to start collecting the data before it ends up in a podcast. Mining historical data is still a challenge, but mining new data will not be as challenging if the right access structure is put into place.
Interoperate – In this tenet, data must usually be integrated with other data, and the data must also operate seamlessly with applications or workflows for analysis, storage, and processing. This requires formal rules for knowledge representation that can be shared, vocabularies that follow FAIR Principles, and cross-references between all data and metadata.
After making data more findable and accessible, the next hurdle is to understand if the different assets inform about the same thing. Comparing two data sets coming from two different publications, for example, is frequently impossible because the scientists are focused on presenting their results and not focused on comparing their results with those from other scientists. Many efforts have been made to standardise certain types of results. For example, GEO, the gene expression omnibus database, has been collecting gene expression and functional genomics datasets and making them available to the public for more than 20 years. It is based on establishing the minimum information an experiment must provide to be regarded as interoperable with other experiments of the same kind. The same can be established in-house to facilitate the interoperability of current data with future data.
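An in-house version of such a minimum-information check can be very simple. In this sketch, the required field names are illustrative, loosely inspired by MIAME-style checklists rather than any official schema:

```python
# Minimum information a gene-expression experiment must carry to be
# regarded as interoperable with other experiments of the same kind.
# Field names are hypothetical, not an official standard.
REQUIRED_FIELDS = {"organism", "sample_type", "platform", "normalisation", "replicates"}

def interoperability_gaps(record: dict) -> set:
    """Return the required metadata fields missing from a dataset record."""
    return REQUIRED_FIELDS - record.keys()

dataset = {
    "organism": "Homo sapiens",
    "sample_type": "tumour biopsy",
    "platform": "RNA-seq",
    "replicates": 3,
}
missing = interoperability_gaps(dataset)  # the record lacks "normalisation"
```

Running such a check at submission time, rather than years later, is what makes current data interoperable with future data.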
Reuse – Lastly, the ultimate goal of FAIR is to optimise data reuse, so metadata and data should be described thoroughly, allowing it to be replicated and/or combined in different settings. Having core or minimum information also supports the reuse of data beyond original experiments. Reusing assay results requires more interoperability between different kinds of experiments, and further data reduction and harmonisation needs to have its starting point in the wet lab. Then even if today’s data is analysed with current statistics, it will still be reusable together with future data and statistics and vice versa.
Core technologies and best practices for FAIR digital informatics ecosystems
With the FAIR principles as a roadmap, here are some core technologies and best practices helping labs create more effective informatics ecosystems today:
Cloud-based ELNs: Many labs are looking to ELNs to break down data silos, foster collaboration and knowledge sharing, and combat the loss of institutional and skills knowledge resulting from high staff turnover rates. By employing ELNs, labs know what data they have and where it lives. They can do Google-like searches to pinpoint critical data, and they can ensure information is stored securely but where scientists can retrieve what they need, when they need it. This is in stark contrast to traditional, manual models where assay result reporting and analytics are shared through graphs and summary tables using slides and spreadsheets via email, which gives no context or understanding of the rich underlying data or processes.
Automated Data Management and Data Analytics: Automating the lab to increase the throughput of experiments, utilising pre-built assays, and looking to outsource some lab tasks can all provide big pieces of the discovery acceleration puzzle. Equally important, however, is the ability to efficiently harness and leverage information by using automated processes to calculate results from raw data. Automation in the lab should work in tandem with automation of data analysis and data management. With a better handle on data, better decisions on drug candidates can be made faster and earlier – without scientists needing to be IT experts or getting bogged down in data management tasks.
While there are many ways to support scientific decision-making, making it easier for a group of scientists to collaborate on a project remains a struggle. Every team needs to gather all available data, compare, analyse, and ultimately decide on the right new drug candidates to move forward. This is the case at any stage of decision-making, not only at the end of the process, and it’s true for single experiments, a group of orthogonal assays, in vitro and in vivo datasets, and certainly for entire research projects.
This creates a hierarchy of decision points starting at the individual experiment. While assay stability and outlier detection are crucial for individual assays, these quality measures are expected to be handled by the scientists conducting the experiment. To ensure this high level of quality, standard operating procedures and data automation must be applied to reduce errors in executing the lab work.
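As a sketch of what an automated quality check on an individual experiment can look like, the snippet below flags replicate measurements that deviate strongly from the rest. The threshold and replicate values are illustrative, not a validated SOP:

```python
import statistics

def flag_outliers(values, z_threshold=3.0):
    """Flag measurements whose z-score exceeds the threshold – a basic
    automated quality gate before results enter the record of an experiment."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd > z_threshold for v in values]

# Hypothetical absorbance replicates; the last value suggests a pipetting error
replicates = [0.82, 0.79, 0.85, 0.81, 2.40]
flags = flag_outliers(replicates, z_threshold=1.5)
```

Embedding checks like this in the data pipeline, rather than leaving them to memory, is what turns a standard operating procedure into reduced error rates in practice.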
Defining Endpoints and Reducing Data: An important preparation step in any data management process is to define the decision-critical endpoints of each experiment. The more data that is captured and analysed, the more reliable the endpoint will be. But a data reduction step is required.
All the different metadata information and the individual datapoints must be reduced to a single piece of information. A prominent example of data reduction is the creation of IC50 values out of hundreds of data and metadata points. These reliable IC50 endpoints make the next level of decision-making easier and more efficient.
Mapping a new approach with the scientific method as true north
Beyond all the technological advancements and optimisation strategies in the labs, the scientific method must always prevail. This overarching principle of creativity and empirical knowledge in the natural sciences is why scientists will always be needed. They cannot be taken out of the equation and replaced by an algorithm.
What will change in the near future, however, is the amount of digital work versus work in the wet lab. Critical thinkers will always need their flexibility and degrees of freedom to do their work, and this can be supported by technology. The traditional approach, where manual processes and individual knowledge were regarded as the key to groundbreaking innovation, is increasingly being replaced by digital processes where co-working and collaboration are king. This shifts the focus to teamwork, bringing critical thinking from letters, journals, and yearly scientific conferences into the daily cadence of a new generation of more digital lab scientists.
Harnessing the full potential of data produced in a drug research lab depends on many factors, including the way research is done. Lab automation using pre-built assays, environmental monitoring, and software automation are the technical foundation of this journey.
FAIR principles are a good way to frame data needs on the technical side. But everything depends on how much knowledge transfer, collaboration, and teamwork is happening. This is very dependent on the way the underlying processes and data streams are connected to all levels of decision-making and the buy-in of the stakeholders.
Building on a strong digital foundation starts today
Living by the FAIR Principles, and moving from a siloed to a sharing culture, can be supported with software which is already available today. Reaching a holistic view of the data and decision processes starts with an understanding that every data point counts and is an important asset to the project – if it is annotated and used correctly and in context.
Data flows from the planning phase in an electronic lab notebook to the experiments in the lab, while results flow back to be combined with the planning to create decision points for the next set of experiments. With FAIR as the guiding path for overcoming isolated data sets or point solutions and encouraging teamwork, every scientific organisation can be on its journey to implement the future of data decision making.
As a result, the automation of work in the lab and in the software will create a different work experience for pharma scientists and increase productivity and successful outcomes in what they do best – innovating drug discovery through smart science – without their needing to become IT experts.
Volume 23 – Issue 4, Fall 2022
About the author
Christof Gänzler gained his doctorate in molecular biology from the German Cancer Research Center (DKFZ) in Heidelberg, Germany, working on Human Papilloma Virus vaccines. He held positions at LION Bioscience as a Scientific Bioinformatics Consultant before joining TIBCO Spotfire and then ZephyrHealth. Gänzler then came to PerkinElmer Informatics as Manager, Scientific Analytics and today works in the Biology Informatics area of the business.