This article is sponsored by Benchling.
“What’s the single most important factor when building a machine learning model? Data! Having the right data, and an understanding of the uncertainty in that data often makes the difference between useful and useless models,” said Pat Walters, Chief Data Officer at Relay Therapeutics.
R&D leaders today know that artificial intelligence (AI) and machine learning (ML) accelerate pipelines, make drug discovery more efficient, and open a path to innovation. And if data is the cornerstone of AI, biotech is awash in opportunity.
But there are critical investments these companies need to make before they can even implement this technology, let alone benefit from it. The common thread? Businesses can only get value from AI and ML by first investing heavily in strong data foundations. Read on for the three critical steps biotech companies must take to be ready to bring AI and ML into their R&D strategies.
Develop infrastructure to capture quality data
All AI and ML technology starts with data. Luckily for biotech, there’s no shortage of it. Biotech data, largely fueled by genomics, is doubling in size every seven months, putting biotech on track to be the industry with the largest data volume by 2025. Because that data typically lives in silos, scattered across multiple disconnected systems in unstructured form, companies first need to implement a data strategy that allows them to capture the data they’re generating.
Moreover, scientists are often not confident in the data they do have access to. In fact, in a Benchling study, not a single company surveyed reported high confidence in its data quality. To start the journey toward AI and ML, companies must build data infrastructure that supports the collection of quality data.
Standardise data collection for simplified search and management
According to the Benchling study, 65% of companies reported that each person on their teams spends five to seven hours every week just searching for and collecting the data they need to do their jobs.
That points to the next step: managing data in a standardised way across departments. In some industries, such as commerce, banking, or advertising, data models are easier to define in a common way across the entire industry, which makes data standardisation easier to achieve and implement. In scientific fields, there is far more variation in how data is structured and featurised, leaving R&D companies struggling to standardise not only across the company but even across teams.
Moreover, there is a lot more process data to collect in R&D: to model the outcome of an experiment accurately, you need to track the right metadata about the physical workflow that led to the measurements you want to optimise. The next challenge biotechs must overcome is establishing standardised nomenclature, and even processes that enforce it, to capture the right information about each workflow, so that teams can locate and use the data they’re collecting.
Invest in a culture that sets the stage for data science
Once data is collected and managed, companies can start to think about how to enable data scientists to engage with it. A common mistake R&D companies make is jumping ahead to data science before solving the hard foundational problem of establishing the data pipeline: how to operationalise the flow of data coming from experimental pipelines so that data scientists can use it for analysis. R&D companies must start by building a strong data culture before hiring engineers who can manipulate data.
Building data models that scale effectively in R&D requires past experience working in the life sciences, familiarity with those datasets, and an understanding of the complexity behind the processes used to generate them. But data science and data modeling are relatively new skill sets, and it’s tough for R&D companies to find talent that pairs these new skills with the prior experience needed to apply them. If businesses invest in a data culture from the start, bringing in data hires early and in leadership roles, they’ll find it much easier to bring on the data scientists and data engineers who will allow them to truly make the most of their data.
Benchling is the pioneer of the R&D Cloud, software that unlocks the power of biotechnology. More than 200,000 scientists at over 1,000 companies and 7,500 academic and research institutions globally have adopted the Benchling R&D Cloud to make breakthrough discoveries and bring the next generation of medicines, food, and materials to market faster. The Benchling R&D Cloud helps these organisations modernise their scientific processes and accelerate collaboration so they can convert the complexity of biology into world-changing results. For more, please visit Benchling.com.