Wellcome Sanger Institute researchers have developed new software to quickly query databases generated from single-cell sequencing.
The open access scfind software can be used to identify the cell types in which any combination of genes is active, and enables a wide range of users to analyse multiple datasets containing millions of cells on a standard computer.
Processing these datasets takes a few seconds, saving time and computing costs. It is “a tool that can function like a search engine”, and users can input free text and gene names. For ease of use, it incorporates techniques from natural language processing to allow for arbitrary queries.
Sequencing techniques for genetic material from an individual cell have advanced rapidly over the last decade. Single-cell RNA sequencing (scRNAseq), used to assess which genes are active in individual cells, can be used on millions of cells at once and generates vast amounts of data (2.2 GB for the Human Kidney Atlas). Projects including the Human Cell Atlas and the Malaria Cell Atlas are using such techniques to uncover and characterise all of the cell types present in an organism or population. Data must be easy to access and query, by a wide range of researchers, to get the most value from them.
To allow for fast and efficient access, the new software tool uses a two-step strategy to compress data ~100-fold. According to the Wellcome Sanger Institute, efficient decompression makes it possible to query the data quickly, and scfind can perform large-scale analysis of datasets involving millions of cells without special hardware.
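To give a feel for how a compressed, search-engine-style index can answer "which cells express all of these genes?", here is a minimal Python sketch. It is an illustration only, not scfind's actual method or API: the article does not say what the two compression steps are, so the sketch swaps in a simple gap (delta) plus variable-length byte encoding of sorted cell IDs. The function names, gene labels and cell IDs are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def encode_gaps(sorted_ids: List[int]) -> bytes:
    """Store sorted, non-negative cell IDs as gaps, each packed as a varint."""
    out = bytearray()
    prev = 0
    for cell_id in sorted_ids:
        gap = cell_id - prev
        prev = cell_id
        # Varint: 7 payload bits per byte; the high bit marks "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_gaps(blob: bytes) -> List[int]:
    """Invert encode_gaps: rebuild the sorted cell IDs from the packed gaps."""
    ids: List[int] = []
    current = 0
    gap = 0
    shift = 0
    for byte in blob:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap = 0
            shift = 0
    return ids


def build_index(expressing_cells: Dict[str, Iterable[int]]) -> Dict[str, bytes]:
    """Map each gene to a compressed list of the cells in which it is active."""
    return {
        gene: encode_gaps(sorted(set(cells)))
        for gene, cells in expressing_cells.items()
    }


def cells_expressing_all(index: Dict[str, bytes], genes: Iterable[str]) -> Set[int]:
    """Return the cells in which every queried gene is active (a logical AND)."""
    result = None
    for gene in genes:
        cells = set(decode_gaps(index.get(gene, b"")))
        result = cells if result is None else result & cells
    return result if result is not None else set()


# Hypothetical toy data: gene label -> IDs of cells where the gene is detected.
index = build_index({
    "GENE_A": [1, 5, 9, 300],
    "GENE_B": [5, 9, 42],
    "GENE_C": [9, 300],
})
print(sorted(cells_expressing_all(index, ["GENE_A", "GENE_B"])))  # [5, 9]
```

Only the lists for the queried genes are decompressed, which is one reason an index of this kind can stay small in memory while still answering queries quickly. A real tool would also need to map cells back to cell-type labels and to keep expression levels, both of which this sketch leaves out.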
Dr Jimmy Lee, Postdoctoral Fellow at the Wellcome Sanger Institute and lead author of the research, said: “The advances of multiomics methods have opened up an unprecedented opportunity to appreciate the landscape and dynamics of gene regulatory networks. Scfind will help us identify the genomic regions that regulate gene activity – even if those regions are distant from their targets.”
Researchers show that scfind is a more accurate and precise method to identify new genetic markers that are associated with, or define, a cell type, compared with manually curated databases or other computational methods available.
Dr Jonah Cool, Science Program Officer at the Chan Zuckerberg Initiative, said: “New, faster analysis methods are crucial for finding promising insights in single-cell data, including in the Human Cell Atlas. User-friendly tools like scfind are accelerating the pace of science and the ability of researchers to build off of each other’s work, and the Chan Zuckerberg Initiative is proud to support the team that developed this technology.”