Interpreting leukemia proteomics data
Many things can limit the pace of research. For medical professor Steven Kornblau, a significant limitation has been the time pressure on the bioinformaticians he works with.
Kornblau, a Professor of Medicine in the Department of Leukemia and the Department of Stem Cell Transplantation and Cellular Therapy at UT MD Anderson Cancer Center in Houston, Texas, US, uses proteomics – the large-scale study of proteins – in the fight against leukemia. The approach that he and his colleagues take is to look simultaneously at hundreds of proteins to define patterns of protein activation in acute myeloid leukemia (AML) and identify proteins whose functions are key to the survival of leukemic cells.
His laboratory developed the techniques required to use a reverse phase protein array (RPPA), which measures the relative expression levels of a protein in many samples simultaneously, for the study of leukemia and he is recognized as the leader in this field. Another key feature of Kornblau’s work is the vast repository of leukemia patient samples that he has collected and can use for studies.
Kornblau and his colleagues – a team that includes five people in the lab, two coordinators who collect samples from the patients, and a database manager – prepare DNA, RNA, protein and serum, and cryopreserve viable cells. He explained that ‘these materials can be used for a huge list of experiments by the researchers that we send samples to.’
Gaining insight from these many experiments is a complex task. The repository includes samples from over 2000 AML patients and the RPPA that Kornblau’s team has built uses samples from 511 AML cases. This approach enables many studies to go on simultaneously, speeding up research and enabling patterns to be uncovered. However, the volumes of data are huge, particularly considering that there are around 80 different fields of clinical information for each patient, which all need to be treated differently depending on whether they are simple details like the patient’s name or more complicated features such as the patient’s response to a particular drug or combination chemotherapy regimen. These samples are then screened against hundreds of antibodies.
The volume and complexity of the data prevent patterns being spotted without in-depth statistical analysis that is outside the normal range of expertise of biologists, even biologists like Kornblau, who also has an economics degree.
As a result, data is taken to statistician colleagues for analysis, a process that can take time as these colleagues’ expertise is very much in demand. ‘I was always having to rely on statisticians. They did a great job but it involved a lot of going back and forth,’ explained Kornblau.
All this changed in June 2013 when Kornblau purchased Qlucore Omics Explorer. ‘Qlucore allows me to do analysis on my own,’ he said. ‘I can now check things that I know are biologically relevant on the fly.’ This, he said, has ‘speeded up the process and facilitated discovery. It has definitely been a big enabler.’
The software allows Kornblau to explore his data statistically and, for example, change outlier parameters. It also enables him to generate heat maps that quickly show relevant patterns. “We can look at proteins, see why some are particularly important compared with other functionally-related proteins and look for correlations between protein clusters from one functional group. We can see how proteins are interacting with antibodies and can classify them into groups. Formerly my statistician would do all this,” he observed. “The more I use the software the more I keep discovering new things.”
He said that the functions he uses most often in Qlucore Omics Explorer are the heatmap generator and principal component analysis (PCA). He also uses the 3D functions within the PCA. The change has been dramatic. “I’ve been working with protein arrays for seven years and always had to get bioinformaticians to do the analysis. They are very much in demand so they are a rate-limiting step.” In fact, he said that it usually took a couple of months to get results. “Now it is pretty easy to put my data in and I can get results within an hour – and I play with my dataset more now that I can do it myself.”
Kornblau is one of the first researchers to apply the Qlucore software to proteomics. However, he believes that it is well suited to his area of research and to other areas too.
He has found the software quite straightforward to use, with advice from Qlucore. “Like all complicated software it has a learning curve. However, once I’d learnt how the program set up its dataset there was almost nothing that I needed to do. I just had to rearrange my dataset a bit, for example to ensure that my 230 antibodies are at the end. I don’t think it’s any more difficult to learn than something like Excel,” he explained, although he added that it does help to have some statistical background in order to get the most out of it.
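The kind of rearrangement Kornblau describes, moving the antibody measurements to the end of the table, can be sketched in a few lines of Python. This is only an illustration: the column names, the `AB_` prefix used to mark antibody columns, and the sample values are all made up for the example, and the import layout that Qlucore actually expects is not described here.

```python
def move_columns_to_end(header, rows, is_measurement):
    """Reorder columns so measurement columns come last.

    The relative order inside each group (annotation, measurement) is kept.
    """
    annotation = [i for i, name in enumerate(header) if not is_measurement(name)]
    measurement = [i for i, name in enumerate(header) if is_measurement(name)]
    order = annotation + measurement
    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_header, new_rows


# Hypothetical table: clinical fields mixed with antibody readings.
header = ["patient_id", "AB_AKT", "response", "AB_TP53", "age"]
rows = [
    ["P001", "0.42", "CR", "-1.10", "61"],
    ["P002", "-0.35", "NR", "0.87", "48"],
]

# Assumption for this sketch: antibody columns are recognised by an "AB_" prefix.
new_header, new_rows = move_columns_to_end(
    header, rows, lambda name: name.startswith("AB_")
)
print(new_header)
```

With a real dataset the same idea would apply to the roughly 80 clinical fields and 230 antibody columns, using whatever rule distinguishes the two groups.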
And when he has found things that could be improved he has fed these ideas back to Qlucore, which is working on improvements to the software based on user feedback.
Qlucore started as a collaborative research project at Lund University, Sweden, supported by researchers at the Departments of Mathematics and Clinical Genetics, in order to address the vast amount of high-dimensional data generated with microarray gene expression analysis. As a result, it was recognised that an interactive scientific software tool was needed to conceptualise the ideas evolving from the research collaboration.
The basic concept behind the software is to provide a tool that can take full advantage of the most powerful pattern recogniser that exists - the human brain. The result is a core software engine that visualises the data in 3D and aids the user in identifying hidden structures and patterns. Over the last few years, major efforts have been made to optimise the early ideas and to develop a core software engine that is extremely fast, allowing the user to explore and analyse high-dimensional data sets interactively and in real time on a normal PC.
Qlucore was founded in early 2007 and the first product released was the “Qlucore Gene Expression Explorer 1.0”. The latest version of this software, now called "Qlucore Omics Explorer 2.0", was released in May 2009, and represents a major step forward with added support for hierarchical clustering, scatter plots and a powerful log function. The combination of instant visualisation and advanced statistics support gives the user new opportunities. All user action is at most two mouse clicks away. The company's early customers are mainly from the life science and biotech industries, but solutions for other industries are currently under development.
One of the early key methods used by Qlucore Gene Expression Explorer to visualise data is dynamic principal component analysis (PCA), an innovative way of combining PCA with immediate user interaction. Because the view responds instantly, the user is free to explore all possible versions of the presented view while keeping a comprehensive picture of the whole data set. Later versions combine PCA with other analysis methods such as hierarchical clustering. http://www.qlucore.com/
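For readers unfamiliar with PCA, the following is a minimal sketch of the underlying calculation in plain Python. It is not Qlucore's engine, whose implementation is not described here. It centres the data, builds the covariance matrix and finds the first principal component by power iteration. The function names and the tiny dataset are invented for illustration.

```python
import math


def center(data):
    """Subtract each column's mean so every variable has mean zero."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centred data (rows are samples)."""
    n = len(centered)
    d = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration.

    Assumes the uniform start vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Score of each sample along direction v."""
    return [sum(x * c for x, c in zip(row, v)) for row in centered]


# Made-up data: 4 samples, 3 variables; the second is twice the first,
# the third is constant and carries no information.
data = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
centered = center(data)
v = first_component(covariance(centered))
scores = project(centered, v)
print(v, scores)
```

One way to picture the interactive part is that a calculation like this is rerun each time the user changes which samples or variables are included, so the plot updates immediately. That description is an assumption made for illustration rather than a statement about how the product works internally.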