Dr. Steve Arlington and Dr. Nick Lynch, The Pistoia Alliance, provide insight into pre-competitive collaboration and data sharing as we think about R&D beyond the pandemic
At the Pistoia Alliance, our mission is to encourage collaboration across the life sciences ecosystem. We believe that scientific advances and breakthroughs cannot be made alone and require the commitment of organisations to work together. The Alliance was formed in 2009 by representatives of AstraZeneca, GSK, Novartis and Pfizer, during a conference in Pistoia, Italy. Discussing our work at the time, we realised that many of us were effectively duplicating each other’s efforts. The obvious waste of resources was something we felt couldn’t continue, and so the Alliance was formed. It has grown considerably from that small group in the last decade; more than 150 member companies from around the world now make up our non-profit organisation, which includes larger pharma, smaller biotechs and startups, academic groups, patient organisations, publishers and technology companies.
Sharing, partnering and collaborating have become even more crucial to drug discovery and innovation of late. The COVID-19 pandemic has driven this and proven that working together is essential if we’re to find vital treatments for patients. All organisations face many of the same issues – whether that’s a shortage of skills, a lack of interoperable tools, or data being impossible to share. This is why our focus is on encouraging pre-competitive collaboration: so that problems encountered can be solved together and the whole industry can benefit from advances, including the most important group of stakeholders – patients.
We currently have a portfolio of 14 projects and community of interest expert groups working to address a variety of issues (see fig.1), from retrieving data from electronic lab notebooks (ELNs), to an open-source tool suite (HELM) to help researchers represent complex biomolecules. The thread that ties our projects together is data stewardship and the FAIR principles (which we expand upon below): essentially how we capture, use and share data. In recent years, a greater interest in and accessibility of personal genomics, electronic health records (EHRs), ‘smart’ lab instrumentation, and wearable devices has increased the amount of data generated, in a vast variety of formats. To gain insights from the data, the interest in analytics, artificial intelligence (AI), and machine and deep learning (ML/DL) has grown simultaneously. This has increased the need for industry-wide data standards to ensure these technologies can meet their potential and benefit all users.
Many of our projects and themes focus on the role of such standards; in this article, we’ll take a closer look at three of those initiatives – FAIR Implementation, the Chemical Safety Library (CSL) and Lab of the Future (LotF) – as well as discuss the future of collaborative working in the life sciences.
FAIR Implementation
One of our most significant projects is our work around FAIR, because it applies across many of our other initiatives (1). The FAIR guiding principles (Findable, Accessible, Interoperable, Reusable) are essential for ensuring organisations’ AI and ML tools and initiatives are interoperable with the efforts of their peers, and can be run at scale (2). The FAIR Implementation project includes a range of tools to help organisations adopt these industry-wide principles for data management and stewardship, and we’ve also been collaborating with other FAIR initiatives, including the FAIRplus project, to drive wider industry impact. As the life sciences sector continues to digitally transform, and in particular in light of the current COVID-19 crisis, this clear and practical guidance on how data and relevant metadata are captured and managed will be crucial to ensure greater collaboration and more effective partnerships.
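To make the four principles concrete, here is a minimal sketch of how a dataset's metadata record might be screened against them. The `DatasetRecord` fields and the `fair_gaps` helper are hypothetical, chosen purely for illustration; they are not taken from the FAIR Toolkit or any Pistoia Alliance tool, and a real FAIR assessment covers far more than the presence of four fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for one dataset (illustrative fields only)."""
    title: str
    identifier: str = ""   # Findable: a persistent, unique ID such as a DOI
    access_url: str = ""   # Accessible: where the data can be retrieved
    vocabulary_terms: List[str] = field(default_factory=list)  # Interoperable: shared vocabulary terms
    licence: str = ""      # Reusable: clear terms of use


def fair_gaps(record: DatasetRecord) -> List[str]:
    """Return the FAIR principles for which this record lacks basic metadata."""
    gaps = []
    if not record.identifier:
        gaps.append("Findable")
    if not record.access_url:
        gaps.append("Accessible")
    if not record.vocabulary_terms:
        gaps.append("Interoperable")
    if not record.licence:
        gaps.append("Reusable")
    return gaps


# Example: a record that has an identifier but nothing else.
record = DatasetRecord(title="Assay results", identifier="doi:10.0000/example")
print(fair_gaps(record))  # ['Accessible', 'Interoperable', 'Reusable']
```

A check of this kind only flags missing metadata; whether an identifier is genuinely persistent, or a vocabulary genuinely shared, still needs human judgement.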
A key element is our freely accessible Toolkit, which we launched this year (3). It enables organisations to realise the value of their data, accomplish effective data management and build a more collaborative research environment, and contains numerous methods, tools, training and a change management guide to help with FAIR adoption. We already have five use cases, developed with organisations such as AstraZeneca, Roche, Bayer and the Dutch company The Hyve, which are using the Toolkit to build knowledge graphs from COVID-19 clinical trials data, with the ultimate aim of finding suitable drug candidates.
The FAIR Implementation project and its principles around data management and stewardship are also part of broader strategic themes and core areas we are currently planning. These principles are similarly an underlying element of most other projects, like our DataFAIRy: Bioassay initiative, which aims to convert biological assay protocols and metadata contained in research publications into a machine-readable FAIR format (4). As our work around FAIR continues, we’re calling on as many organisations as possible to share their expertise and get involved with the project.
The Chemical Safety Library
Launched in 2017 and designed to capture, store and share hazardous reaction information to improve laboratory safety, the Chemical Safety Library (CSL) project helps ensure critical safety incident details are not stored in internal silos and databases, but are easily discoverable and sharable (5). The CSL incorporates a database, which provides a repository for scientists to capture and share hazardous reaction information – to avoid the same safety incidents happening again. This year, we expanded the project by partnering with the Chemical Abstracts Service (CAS), a division of the American Chemical Society, which will grow the crowd-sourced database and further enhance its reach (6).
The collaboration with CAS and the expansion of the project have been crucial in bringing about a culture change in the chemical community. CAS’s technical capabilities will be vital in developing a new search interface for the library, amongst other important features, and the momentum to drive the project forward has been critical. Even with the most comprehensive database in the world, safety won’t improve without a culture that encourages incident data to be recorded and shared. With the CSL project, the aim has always been to put it out to the community and receive continuous feedback, and CAS’s influence within the chemical industry will enable us to do this. Looking forward, we hope to see the project grow and thrive as it continues to be enhanced by the wider community, making the global chemical community safer.
The Lab of the Future
Against the backdrop of increasing digital transformation in R&D, our Lab of the Future (LotF) strategic area, community and projects aim to help organisations modernise their lab environments by embracing technology, data and automation (7). As the application of digital tools in drug discovery advances, a considerable increase in instrument and device data will be generated, alongside the aforementioned rising interest in personal genomics, digitisation of health records, and use of wearable devices. This puts even greater demands on the already limited data sharing systems used in R&D, which in turn hampers the integrity and reproducibility of experiments. The primary aim of the LotF theme is to address these issues.
One particular area of focus to date has been the Methods Database (MethodDB) initiative, for which we partnered with the Allotrope Foundation (8, 9). MethodDB is a framework used to store, search and retrieve a digital version of an HPLC analytical method, to improve a scientist’s ability to retain institutional knowledge and improve overall execution and reproducibility. As with the CSL, we look to members to guide interest; we’re hosting virtual lab meetings to identify the challenges scientists are facing, and which projects will be most useful for the LotF community. We’re also planning a ‘Methods on Demand’ initiative that’s something of an evolution of MethodDB: a repository of both public and private (or commercial) methods, to launch in late 2020. Likewise, our Universal Integration Knowledge Base (UIKB) is currently in the planning stages and will act as a database of successful integration ‘recipes’ and APIs, saving time and money during experiments and improving data sharing. Coupled with better integration within the lab environment, our Semantic Enrichment of Experiment Data (SEED) project is also looking to enhance the data that is collected through the use of improved vocabularies and by offering a best practice workflow (10).
Both Methods on Demand and the UIKB will be contributed to and sustained by the community, which is something that underpins all of our aspirations for the Alliance going forward. We want our projects not just to be of use within the realms of the organisation, but to be continually enhanced, evolved and sustained by companies and individuals across the entire R&D ecosystem. That is why we want to encourage participation and engagement throughout the industry, so that everyone can benefit.
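The store, search and retrieve pattern described for MethodDB can be illustrated with a toy in-memory repository. This is a sketch under stated assumptions, not the MethodDB design: the `HplcMethod` fields and the `MethodStore` class are hypothetical, and a real digital method would carry far richer, standardised metadata.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HplcMethod:
    """Hypothetical digital HPLC method (illustrative fields only)."""
    method_id: str
    analyte: str
    column: str
    flow_rate_ml_min: float


class MethodStore:
    """Toy in-memory repository supporting store, search and retrieve."""

    def __init__(self) -> None:
        self._methods: Dict[str, HplcMethod] = {}

    def store(self, method: HplcMethod) -> None:
        # Storing under an existing ID replaces the earlier version.
        self._methods[method.method_id] = method

    def retrieve(self, method_id: str) -> HplcMethod:
        # Raises KeyError if the ID is unknown.
        return self._methods[method_id]

    def search(self, analyte: str) -> List[HplcMethod]:
        # Case-insensitive exact match on the analyte name.
        wanted = analyte.lower()
        return [m for m in self._methods.values() if m.analyte.lower() == wanted]


# Example usage with made-up values.
store = MethodStore()
store.store(HplcMethod("M-001", "Caffeine", "C18", 1.0))
store.store(HplcMethod("M-002", "Ibuprofen", "C18", 0.8))
print([m.method_id for m in store.search("caffeine")])
print(store.retrieve("M-002").flow_rate_ml_min)
```

Keeping each method as an immutable record with a stable identifier is what lets a colleague retrieve and rerun exactly the same method later, which is the reproducibility benefit the project describes.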
An interdisciplinary approach
Looking to the future beyond our individual projects, data stewardship and digitalisation will continue to be a huge focus for the Pistoia Alliance. Technical advancement, particularly around AI and automation, is very competitive, and organisations are so intent on using their own platforms that tools often remain incompatible. To that end, we recently launched our digital health project, which is initially focused on clinical trials but will continue to explore healthcare and health data across a range of areas (11).
Our exploration of digital health, as opposed to the ‘pure’ life sciences disciplines we’ve worked within to date, is part of our recent strategy refresh, as we know there are many more gains to be made by looking at the crossover between research and healthcare delivery. It’s also evident that scientific breakthroughs are made at the boundaries between disciplines, so we are focused on exploring both life sciences and healthcare to expose the issues and areas where the Alliance can help. Technology – from AI and blockchain to quantum computing – is maturing day by day, and this will be essential as we look to run new projects across life sciences and healthcare. The rapid changes made to how and where we work, and the remarkable adoption of technology to overcome the limitations of virtual working, have shown that we can make these fundamental changes. We should continue to push forward so that we can leverage all forms of data at any point in the value chain and use them to deliver faster, safer and more efficacious treatments.
The road ahead for R&D innovation
We are in the midst of the most rapidly evolving landscape for R&D and drug discovery, as the race to find treatments and vaccines for COVID-19 continues. The pandemic has inspired great collaborations across industries and ecosystems, particularly around producing a vaccine, with Chinese and Australian researchers making the SARS-CoV-2 genome freely available to find a vaccine faster (12). However, the situation has also exposed significant challenges with sharing knowledge and data. The results from a lot of COVID-19 research have still not been easily shared or used across disciplines – whether that’s due to data on regional outbreaks being shared too late, or real-time treatment data being siloed in hospital systems (13, 14). This is by no means for want of trying; there is evidently a growing willingness to share data, which is why we are committed to ensuring the frameworks are in place to facilitate this sharing.
For the Pistoia Alliance, the challenges brought about by the pandemic have furthered our dedication to driving collaboration at a time when working together is more important than ever. Helping companies to be more dynamic and productive in the wake of COVID-19 and sparking greater engagement from all sectors of life sciences and healthcare will continue to be our primary goals. This is why our strategy continues to evolve to develop projects with a collaborative and community focus. This includes the previously mentioned Methods on Demand and UIKB, as well as the HELM (Hierarchical Editing Language for Macromolecules) project, which includes a public monomer repository that we’re working on with Scilligence.
These community-focused projects and the tools they’ll provide will be crucial for life sciences organisations to continue to innovate in a post-COVID-19 world – but for such initiatives to work, they require buy-in from across the industry. It’s essential that cross-organisation and interdisciplinary collaboration goes beyond existing commercial partnerships and extends the boundaries of innovation. Only this way can data be shared and knowledge be acted upon, which is why we welcome all organisations that can contribute in some way to life sciences R&D to come and get involved, to invest and actively participate in helping us ensure the benefits of scientific discovery can reach those who need them faster.
Volume 21, Issue 4 – Fall 2020
Main image credit: NESA by Makers

Nick Lynch, Consultant at the Pistoia Alliance, has over 20 years’ experience in life science informatics and was at AstraZeneca for 13 years leading teams in R&D Informatics, working especially on global integration projects within pre-clinical and early clinical research, externalisation and data exchange in R&D. He is a co-founder of the Pistoia Alliance and also supported the initial Open PHACTS definitions, eTriks and the EBI industry programme.

Steve Arlington, President of the Pistoia Alliance, has worked in the pharmaceutical and diagnostics industry for over 40 years. He began as a research scientist in the field of immunology and was part of the team that developed and launched Clearblue pregnancy tests. He is a retired partner from PwC, where he led the Pharmaceutical Team in Advisory Services, and also previously led the IBM Life Sciences and Pharmaceutical Global Teams. Arlington became President of the Pistoia Alliance in 2015.
References
1. https://www.pistoiaalliance.org/projects/current-projects/fair-implementation/
2. https://www.nature.com/articles/sdata201618
3. https://fairtoolkit.pistoiaalliance.org/
4. https://www.pistoiaalliance.org/projects/current-projects/datafairy-project/
5. https://www.pistoiaalliance.org/projects/current-projects/chemical-safety-library/
6. https://www.pistoiaalliance.org/news/cas-csl-announcement/
7. https://www.pistoiaalliance.org/projects/future-projects/lab-of-the-future/
8. https://www.pistoiaalliance.org/projects/current-projects/methods-database/
9. https://www.pistoiaalliance.org/news/methoddb_launch/
10. https://www.pistoiaalliance.org/projects/current-projects/seed-project/
11. https://www.pistoiaalliance.org/news/12835/
12. https://www.weforum.org/agenda/2020/05/global-science-collaboration-open-source-covid-19/
13. https://news.sky.com/story/coronavirus-leicester-lockdown-unnecessary-if-data-had-been-known-earlier-says-citys-mayor-12029450
14. https://www.thelancet.com/pdfs/journals/landig/PIIS2589-7500(20)30082-0.pdf