Abbey Vangeloff is the Director of Business Development, Life Sciences, at Yahara Software. She will be chairing a session at SLAS2023. DDW’s Megan Thomas catches up with Vangeloff to learn why AI and data science are so important for the future of the sector.
Abbey Vangeloff first participated in a SLAS event four years ago and has kept a presence in the organisation ever since; she considers chairing a session the next step in that ongoing relationship. She believes it is a keystone conference for Yahara, as a software company, to understand what people are interested in. She says: “There is always a very strong data and technology track at SLAS, and I think it’s really interesting to get that focus if you’re talking about automating. But today, there’s so much technology involved in that, and in what people are focusing on – the ways that they’re thinking about how to create those structures and systems, and the longevity of those structures and systems. The way modularity plays into that has really been a theme that I’ve been hearing over the past couple of years.”
Then and now
Vangeloff remembers that four or five years ago, everybody was acknowledging the need for data. Since then, she says, this has shifted to a conversation about what to actually do with this data. “For a while, it was all about the big words like ‘blockchain’ and ‘security’, and making sure that you could follow the provenance of the data and really make sure that it came from where you thought,” she says. Now, she thinks people are starting to move into AI and ask, ‘if we could consume more data in a way that’s going to provide better insight, why would we not be doing that?’ Not only does Vangeloff think that this dovetails nicely into personalised healthcare, but she also thinks it builds well on what we were hearing years ago.
She adds: “Scientists are always going to want all the data. But now, moving into looking at AI and machine learning, the ability to consume vast amounts of data in a meaningful way has really progressed. At SLAS, we get to be on the bleeding edge, and hear those sorts of talks. SLAS is both about the problems people are trying to solve and the technology they’re bringing to bear on that. We always come away with some really interesting ideas and some very cool stories and new tools that our clients – or we personally – would be interested in.”
What we’ve learnt
Vangeloff enjoys working in software supporting science because: “Science is messy. Biology is messy. There’s a lot of variables, it wanders and meanders and even at your best, creating hypotheses, we don’t know what we’re going to find,” she says. She finds that marrying that with past iterations of software development, where the software is highly deterministic, is a whole discipline in and of itself. By this, she means you basically have to know going in what pathways you’re looking at, and develop for those pathways, those edge cases, and the non-happy paths as a whole. She adds: “You’re trying to take something incredibly biological and organic and apply this very regimented system to it. So, concessions get made, and there are strategies around that.” This, Vangeloff says, is the part they get excited about when working with life science clients. She thinks that with AI or machine learning correctly applied, you can wrap your arms around more of the complexity in the analysis than you can with a deterministic strategy.
“There are places where machine learning and AI might not be appropriate, and you do want that more systematic approach and the logic there, but having another tool in the toolbox to be able to do that makes a lot of sense. A lot of where we’re headed to in healthcare is more personalised medicine to be able to address more of those complexities on an individual basis, and on a population basis. It’s all coming together, we’re now collecting the data and we have more of the tools to be able to encompass that,” she adds.
Vangeloff says that data is going to continue to iterate. She believes we are starting to understand that we can’t just collect data; we must collect it in a certain way. She thinks people will start deciding which algorithm they want to apply and make sure the data is collected accordingly, instead of trying to go back and recapitulate from datasets they’ve had before. “We’ll be more proactive about knowing what the applications for the data are going to be and making sure that the dataset supports the vision and the goals. Nobody collects data because they like data; we collect data because we want the insight. So, creating datasets with the application and the insight in mind is something I think will be valuable in the future,” she says.
Harnessing data and AI for drug discovery
Vangeloff sees vast possibility when it comes to harnessing data and AI technology for the benefit of the drug discovery and delivery industry, as well as the potential for new drugs. She notes that there are several companies that are going back and evaluating drugs currently on the market for different applications. “I think that’s a really fascinating discipline. I mean, everybody knows about side effects on drugs, but to purposefully say, ‘we’re going to go and evaluate something that’s already FDA approved for different use cases, and different value adds’, can bring a lot of value quickly in ways that creating a new drug cannot,” she states.
Realising potential in 2023
There is no doubt that the potential is almost unlimited when it comes to data science, and the use of AI in the life sciences. So, realistically, how much of this potential is it possible to realise in 2023? In some ways, Vangeloff feels like AI and ML are the ‘new thing’ that people want to throw at everything. She says: “There are specific applications that are really valuable for this and there are limits – preparation that has to happen. So, my whole session is really around saying, ‘Okay, you want to use some AI, you want to apply this machine learning tool to a data set, but what does that actually mean?’ You can’t just yell ‘AI’ at a computer – there must be a process and a system. The datasets must be structured, curated and annotated in a way that you can actually get the machine to learn so that the algorithm and the AI can understand the structures. You must teach the machine about the data set.”
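Vangeloff’s point that datasets must be structured, curated and annotated before any machine learning can happen can be sketched in a few lines. This is a minimal, purely illustrative example (the column names, values and labelling scheme are hypothetical, not taken from any real dataset she describes):

```python
import pandas as pd

# Hypothetical raw assay readings: mixed types, a missing value,
# and free-text outcomes that a learning algorithm cannot use directly.
raw = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "signal":    ["0.82", "1.10", None, "0.47"],
    "outcome":   ["active", "inactive", "active", "inactive"],
})

# Structure: enforce a numeric type and drop rows the model cannot use.
curated = raw.assign(signal=pd.to_numeric(raw["signal"])).dropna(subset=["signal"])

# Annotate: map free-text labels to the numeric targets a learner expects.
curated["label"] = curated["outcome"].map({"inactive": 0, "active": 1})

print(curated[["sample_id", "signal", "label"]])
```

Even in this toy form, the curation step changes the dataset: one sample is lost to a missing reading, which is exactly the kind of preparation cost Vangeloff says teams run up against.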
Vangeloff thinks that going forward, we are going to see a lot of people starting to dive into AI, but not knowing how to actually operationalise it. She gives examples of the sorts of questions which need to be asked: What’s the team of people needed? How do I pull data that I already have? How do I start looking forward, to collect data in an appropriate way to apply these tools? Do we have the bioinformaticians and the technologists who can design and iterate on those algorithms, and apply them to the data?
Vangeloff shares that one of the talks in her session at SLAS2023 is from a company that started around an AI algorithm. Another purchased an AI algorithm as a value add – it was selling instrumentation and wanted to be able to do some analysis based on the images its clients were getting. There are also companies that have developed interesting tools, already on the market, which make it easier for people to get into bioinformatics pipelines. She says: “I think we’re going to see more of everyone starting to think not just that this is something they want to do, but ask how to make it happen and start to pull the teams into place to get the datasets into a shape and structure where this can happen.”
So, what’s currently standing in the way of realising the industry’s full potential? Vangeloff thinks the answer is two-fold. First, there is a specific scientific discipline around how the data must be structured. Even on the periphery of it, a lot of effort goes into creating an appropriate dataset that you can point an algorithm at and perform machine learning on. She thinks that is something a lot of people are going to run up against when they start to think about implementing this tool. There is a gap, she says, in tactical implementation strategy: people will have to invest both in that strategy and in the people they will need to make use of the tools in a way that doesn’t just ‘give them gibberish’ at the end.
The second obstacle is that, while this has improved over the past 10 years, cloud computing really must be a major part of this. Vangeloff says that there are a number of cloud providers, and they’re incredibly secure. The trouble, though, is that she still sees some reticence around safety. For those who have collected it, the data is something of a lifeblood that has required a great deal of work, so there is anxiety about putting it in the cloud. She says: “There are still some misperceptions about how secure that data really is. If you think about the fact that Amazon is using this, Google is using this, and the amount of data that they’re trying to secure on the cloud… it’s a secure system. It really is a tool that people are going to have to lean into if they want to be able to analyse datasets of this size with this complex nature – more than I think people are comfortable with at the moment, depending on the company and their strategy.”
A move towards open data
“I think [open data] is kind of the holy grail for everybody”, says Vangeloff. “Most people who get into science do it for the love of knowledge. They want that insight. I think that part of what I love about working in life sciences is it tends to be very collaborative more than it is competitive, even on the side of science that can be highly competitive. I think there is still a very good instinct to share the data that everybody could benefit from, that you’re not using in a proprietary fashion. Everybody’s in business, you have to keep your business going, but I think there’s still that reaction that says if I have something valuable, and it would be valuable to other people, and not to my detriment to share it, I would like to be able to do that.”
The problem we run into with data sharing is that scientific data is so complex. Vangeloff gives the example of healthcare data: with HL7 and FHIR and the languages they use to share it, the community has created a structure that confines the data into defined categories so that you can compare ‘apples to apples’. This has its limitations. She says: “You can’t control everything; you can’t put everything into a defined systematic language. But on the scientific side, there have been some attempts at that, especially around instruments, languages, and integration. I think that a lot of places are working on projects to be able to better share protocols so that you’re not just going to some paper and looking at a method. I think anybody who’s done that knows that you can sometimes recapitulate how they’ve done it, and sometimes you can’t. So, if we can share the methods, then we know how the data was collected. If we can start to categorise and say, ‘this is the method I used exactly and I can share it with you exactly, so that you can also collect data in this way’, then we can create pockets of shareable data. Obviously, nobody wants their PII or their PHI in multiple people’s hands, but there are ways to aggregate and abstract data into meaningful sets so you get the data, but not the identifiers, or in an aggregated way where it’s still valuable, in some way, shape or form.”
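The aggregation-without-identifiers idea Vangeloff describes can be illustrated with a small sketch. The records, field names and values below are entirely hypothetical; the point is only the shape of the transformation, where the shared set keeps the signal but drops the individuals:

```python
import pandas as pd

# Hypothetical patient-level records containing a direct identifier.
records = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3", "P4"],
    "age_band":   ["40-49", "40-49", "50-59", "50-59"],
    "biomarker":  [3.2, 2.8, 4.1, 3.9],
})

# Aggregate over a coarse attribute and leave the identifier out entirely:
# the result carries group-level values, not individual rows.
shared = (records
          .groupby("age_band")["biomarker"]
          .agg(["mean", "count"])
          .reset_index())

print(shared)
```

A real de-identification pipeline would also need safeguards such as minimum group sizes, but even this toy version shows the trade Vangeloff mentions: the shared table is still analytically useful while no row maps back to a person.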
SLAS 2023 Supplement, Volume 24 – Issue 1, Winter 2022/2023
Abbey Vangeloff has over 12 years of experience working in IT support and project management and over five years of experience working in a scientific laboratory setting, and holds a master’s degree in Biochemistry from UW-Madison. She has worked in the life sciences, healthcare, and software development fields and has been with Yahara Software for 10 years.