Toxicological safety testing of compounds has not yet advanced to the point where measurement of gene expression is being incorporated in the internal decision process of most companies, much less the FDA approval process. The FDA has asked for voluntary submissions, is preparing guidelines for gene expression data and has identified surrogate biomarker assays as the avenue to escape the dilemma highlighted in the FDA white paper “Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products”1.
However, toxicology methods for testing drug candidates prior to clinical trials still rely on traditional animal studies and histopathology at the end of the preclinical discovery and optimisation process. Animal tissues are examined for morphological damage following rising dose or multidose treatment and sacrifice, and clinical chemistry indicators of tissue damage are measured.
It is well known that the rat, dog and sometimes non-human primate models used for toxicological testing often do not predict human response, and thus drug failures occur during clinical development or even later due to unanticipated adverse effects in humans. If in vitro methods of predicting safety in the human can be developed, or predictive molecular biomarkers useful in vitro and in vivo can be identified, then by discovering, profiling and addressing safety issues earlier in the preclinical process the overall success and productivity of drug discovery can be increased.
The cost to develop a new successful drug is between $0.8 and $1.7 billion2. If predictive toxicology methods are successfully implemented, a big gain will be the increased productivity that results from more submissions and more drugs successfully reaching the clinic for the same investment. However, even a small decrease in the failure rate matters: a “10% improvement in predicting failures before clinical trials, could save $100 million in development costs per drug”3. That alone is significant.
There is a shared belief that gene expression analysis holds the key to implementing a more efficient and successful drug discovery process – from discovery to market. However, any use of surrogate markers must prove to be as reliable and dose dependent as the actual histopathology and clinical chemistry, must provide data with a faster turn-around time than histopathology, and/or must provide higher sensitivity and earlier prediction of safety issues, or permit cost-effective assessment of safety at the level of cell-based studies or provide a means to better translate animal findings to human. The industry hopes that once biomarkers are identified, they can be measured in vitro using primary or cultured animal and human cells. This would permit compound safety to be profiled early in the drug discovery optimisation process, and possibly give medicinal chemists an assay that can be used to optimise safety in the same way that efficacy and selectivity are optimised today before testing in animals begins: by performing precise dose response curves on structural analogs and compiling the quantitative structure activity relationship (QSAR) for safety. Furthermore, these in vitro cell studies should provide a basis to derive direct correlations between animal and human safety, by comparing cell preparations from each species. If this can be achieved, then the conundrum that the industry has had to accept all these years, that animals are not good models of human, can be successfully addressed and a more efficient drug discovery process will result.
As important as high density arrays have been to the identification of new targets, and as useful as real time PCR has been to understand many aspects of gene expression at the basic research level, neither has turned out to be the right tool for safety. Nor have any of the other methods that have been introduced during the past 10 to 20 years. In addition, it is now clear that a single gene predictive of each type of toxicity is unlikely to be identified; rather, a family of genes, or a ‘signature’, will have to be measured in order to reliably reflect each safety ‘phenotype’.
What is the basis for stating that current high density arrays (eg Affymetrix, Santa Clara, CA; Agilent, Palo Alto, CA; Codelink, Amersham Biosciences, Division of GE Healthcare, Piscataway, NJ; and Illumina, San Diego, CA) will not provide the required methodology? A mini-monograph was published in March 2004 describing the results of the HESI Collaborative Research Program4. The conclusion was that even though dose dependent changes could be observed, there was a lack of agreement between research centres and platforms at the level of what genes were identified from the same samples, though all methods pointed to the same pathways. Harsher criticisms have been levelled by others. Tan et al, testing the same samples on three platforms, found that not only did the platforms identify different numbers of genes that changed by greater than two-fold at a given confidence limit (117, 67 and 34 in total for each platform, respectively), but only four genes were identified in common by all three5. The lack of agreement, or concordance, reported in these and other studies, as well as the difficulty of repeating results between labs, is anathema to safety assessment – which must be rock solid and repeatable. A critical review of expert opinions by Eliot Marshall appeared in the October 2004 issue of Science6, concluding that high density array data is just not reliable. Whether the sources of variability in the high density array assay process can be identified and quantified sufficiently for the assay to pass FDA audit is also a very real issue. Aside from these problems, the materials, reagent and labour cost of $200 to $500 to test each sample makes this approach prohibitively expensive even if all the quantitative and repeatability issues can be solved.
What are the issues with PCR, a technology that has been available for the past 20 years? Quantitative real-time reverse-transcription PCR (RT-PCR; ABI, Foster City, CA) was developed to measure gene expression. However, even as recently as September 2004, the conclusion was that in using PCR “it can be difficult to achieve not just a technically accurate but a biologically relevant result. ...Real-time RT-PCR appears to be a fragile assay that makes accurate data interpretation difficult”7. The authors of this publication point to many issues, but one which is universally recognised in the field as a problem is the need to extract or purify RNA from the sample before assay. Not only does this introduce variability between samples but, to obtain the best data, extraction must be performed manually. The best technician can only extract ~40 samples a day. On top of this, each gene is typically measured by itself, though PCR can be run measuring two genes per well. Thus, it can take from two to four months to generate the necessary data from a single animal study – the same length of time that it takes to read the histology. A second issue is whether the PCR measurement of gene expression can predict safety, either at earlier time points in a multi-dose experiment, or at a lower concentration, than standard histology. Figure 1 depicts the PCR measurement of a gene that is a predictor of kidney toxicity. This is data from a one-week safety study in primates conducted by Schering-Plough8, with seven days of dosing with the antibiotics Everninomicin or Gentamicin or the combination of the two. The level of the gene Clusterin determined using PCR is shown for each animal (red symbols), compared to the cut-off for significance used to distinguish effect from control (dashed red line). Adverse histology findings were observed for the starred treatments: day seven of high dose Everninomicin, and both day one and day seven of combination treatment.
By PCR, Clusterin was found to be only marginally elevated on day one of the high dose Everninomicin treatment, and clearly elevated by the treatments for which adverse histology was observed. Thus, the PCR data was essentially only significant under the same conditions in which adverse histology, an end-stage measurement, was observed, and therefore did not provide any additional useful or predictive information.
A bigger issue is the quantitative accuracy of RT-PCR data, for instance the difficulty of obtaining reliable dose response curves from which precise EC50 values (the dose of compound producing a half maximal effect) can be determined. In drug discovery EC50s are used to quantify drug efficacy, specificity, metabolism and safety. If you go high enough in dose, nearly every drug will produce adverse side-effects. Sound safety assessment requires that this toxic dose be identified. It is the therapeutic safety window, the difference between the dose that confers efficacy and the dose at which adverse side-effects are seen, that determines the marketability of a potential drug. Just scanning the literature tells the story about PCR – there are next to no dose response curves reported. Compare this situation to protein-based biochemical assays, for which there are tens of thousands of dose response curves reported, and on which the modern drug discovery process is based. Figure 2 depicts dose response data for PCR9, showing that measurements can differ by as much as 20-fold between doses at which the response should have reached saturation, and where there should therefore be no difference, making it impossible to obtain a precise EC50 for a compound using PCR.
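To make the EC50 concept concrete, the following is a minimal sketch of estimating an EC50 from dose response data with a one-site saturation model. The data points, the function names and the simple grid search are illustrative assumptions and are not taken from any of the studies cited; a real analysis would normally use a dedicated curve-fitting package and replicate measurements.

```python
def saturation(dose, top, ec50):
    """One-site saturation model: response rises from 0 toward `top`."""
    return top * dose / (ec50 + dose)


def fit_ec50(doses, responses, n_grid=2000):
    """Least-squares EC50 estimate by log-spaced grid search (illustrative only).

    For each candidate EC50 the best-fitting `top` has a closed form,
    so only one parameter has to be searched.
    """
    positive = [d for d in doses if d > 0]
    lo, hi = min(positive) / 100.0, max(positive) * 100.0
    best = None
    for i in range(n_grid + 1):
        ec50 = lo * (hi / lo) ** (i / n_grid)
        shape = [d / (ec50 + d) for d in doses]
        top = sum(s * r for s, r in zip(shape, responses)) / sum(s * s for s in shape)
        sse = sum((r - top * s) ** 2 for s, r in zip(shape, responses))
        if best is None or sse < best[0]:
            best = (sse, ec50, top)
    return best[1], best[2]


# Hypothetical, noise-free data generated from EC50 = 2.0 and top = 100
doses = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
responses = [saturation(d, 100.0, 2.0) for d in doses]
ec50, top = fit_ec50(doses, responses)
print(f"estimated EC50 = {ec50:.2f}, maximal response = {top:.1f}")
```

With scattered data of the kind described for PCR, the residual error of such a fit is large at every candidate EC50, which is another way of saying that no precise EC50 can be extracted.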
The first of a new generation of assay tools has been launched which provides the multiplexed, quantitative, and high sample throughput performance necessary for drug discovery and safety. This is the ArrayPlate quantitative nuclease protection assay (qNPA™; HTG, Tucson, AZ). Using a standard pipetting workstation, one person can test 2,000 samples a day against 16 genes. This means all the samples from an animal study can be tested in a single day – work that would take more than 50 days to perform by PCR, based solely on the rate of extracting 40 samples a day and not taking into account the difference between measuring 16 genes at a time versus one or two genes at a time.
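The throughput comparison is simple arithmetic, sketched below using only the rates quoted in the text. The function and variable names are illustrative, and the PCR figure covers the manual extraction step alone.

```python
import math


def days_needed(n_samples, samples_per_day):
    """Whole working days needed to process n_samples at a given daily rate."""
    return math.ceil(n_samples / samples_per_day)


study_samples = 2000    # one day of qNPA work, as quoted in the text
extraction_rate = 40    # manual RNA extractions per technician per day
qnpa_rate = 2000        # qNPA samples per person per day
genes_per_well = 16     # genes measured per sample by qNPA

pcr_extraction_days = days_needed(study_samples, extraction_rate)
qnpa_days = days_needed(study_samples, qnpa_rate)
qnpa_measurements_per_day = qnpa_rate * genes_per_well

print(pcr_extraction_days, qnpa_days, qnpa_measurements_per_day)
```

Extraction alone accounts for 50 working days; the PCR runs themselves, at one or two genes per well, come on top of that.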
But throughput without quantitative, repeatable accuracy and precision is not useful. Figure 2 depicts a qNPA dose response curve compared to PCR. Even the raw data without normalisation to 18S control RNA fits classic saturation kinetics (black diamonds), and enables a precise EC50 to be calculated. Lilly has presented data demonstrating the fit of qNPA data to saturation kinetics, showing the repeatability of the dose response curves across different days, and also showing the measurement of differential EC50 values for the regulation of two genes (the apoptosis related genes RelB and A20) by the same compound (Figure 3). This is an exciting set of data, because it demonstrates two points. One is an important systems biology observation. The precision of qNPA reveals that the specificity by which a compound regulates gene expression can be differentiated based on EC50 values, the same as the specificity of compound effect on different proteins can be differentiated. Since therapeutic window is all about specificity, this demonstrates that a gene expression therapeutic window can be determined using qNPA, namely, a low EC50 for regulation of genes related to efficacy, and a high EC50 for the regulation of genes related to toxicity. The other point is that these qNPA dose response results demonstrate that medicinal chemists can establish how the structure of a class of analogs quantitatively relates to their safety activity – safety QSAR. Armed with this assay and this type and quality of information, medicinal chemists will be able for the first time to routinely optimise the safety of clinical candidates not only during testing in animals to assess efficacy, but with an in vitro cellular model system, before testing in animals. Using this assay and an in vitro system, medicinal chemists will be able to correlate safety in animals and animal cells to human cells, and consequently predict safety in humans better than possible today. 
Conversely, it will be possible to identify when a toxicity seen in animals is likely not to be an issue in humans, permitting compounds to be salvaged which otherwise might be abandoned.
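As a rough illustration of a gene expression therapeutic window and safety QSAR, the sketch below ranks analogs by the fold separation between the EC50 for a toxicity-related gene and the EC50 for an efficacy-related gene. Expressing the window as a ratio is an assumption made here for convenience, and the analog names and EC50 values are hypothetical.

```python
def therapeutic_window(ec50_efficacy, ec50_toxicity):
    """Fold separation between toxicity and efficacy EC50s (same units)."""
    if ec50_efficacy <= 0 or ec50_toxicity <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_toxicity / ec50_efficacy


# Hypothetical analogs: (efficacy-gene EC50, toxicity-gene EC50), both in µM
analogs = {
    "analog_A": (0.05, 5.0),
    "analog_B": (0.02, 0.1),
    "analog_C": (0.5, 100.0),
}

# Widest window first: the most potent analog is not necessarily the safest
ranked = sorted(
    analogs,
    key=lambda name: therapeutic_window(*analogs[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(therapeutic_window(*analogs[name]), 1))
```

In this made-up series the most potent compound (analog_B) has the narrowest window, which is the kind of trade-off a safety QSAR is meant to expose before animal testing.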
There is more. The same quantitative (not just relative) level of gene expression can be measured by different labs, or by the same lab months (or years) later, with 80% to 90% agreement in the quantitative result. This is the repeatability that is necessary for investigators to be able to compare results or independently confirm results. Because qNPA does not require that the RNA be extracted, the nature of the samples is not an issue. Samples can be fresh or frozen cells, tissues or organisms, or can be formalin fixed paraffin embedded (FFPE) tissue. The latter is extremely important because in current practice (animal and human) tissues are fixed for histology. Therefore the identical fixed tissue used for histology can be used to measure gene expression biomarkers, providing a parallel molecular assessment of toxicity. Furthermore, there are huge archives of such fixed samples available that can now be used to validate gene biomarkers. qNPA has more than sufficient sensitivity to measure samples as small as a few hundred cells or the equivalent of 5 ng total RNA (the amount that could be recovered from ~500 cells with a 100% yield), or to measure 20 mg tissue, or to detect as few as 600 molecules, however one wants to express sensitivity.
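The sample-size figures can be converted between cells and total RNA with a single factor. The sketch below derives that factor from the numbers above (5 ng from ~500 cells at 100% yield, ie about 10 pg per cell); the function name and the yield parameter are illustrative assumptions.

```python
# 5 ng (5,000 pg) of total RNA from ~500 cells at 100% yield -> 10 pg per cell
RNA_PER_CELL_PG = 5_000 / 500


def cells_required(total_rna_ng, yield_fraction=1.0):
    """Cells needed to recover total_rna_ng of total RNA at a given extraction yield."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return total_rna_ng * 1000 / (RNA_PER_CELL_PG * yield_fraction)


for rna_ng in (5, 50, 200):
    print(f"{rna_ng} ng total RNA -> {cells_required(rna_ng):,.0f} cells at 100% yield")

# A more realistic yield raises the cell requirement proportionally
print(f"50 ng at 50% yield -> {cells_required(50, 0.5):,.0f} cells")
```

The same factor reproduces the cell counts quoted later in the article for platforms that need 50 to 200 ng of total RNA (5,000 to 20,000 cells).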
qNPA delivers the necessary statistical power: whole assay reproducibility of >90% (ie <10% average coefficient of variation, %CV) between treatments, whether in vitro or measured from animal tissues (in vivo), and whether the treatment lasts hours or days. Table 1 demonstrates that a CV of 7% between animals (within treatment) can be obtained for the measurement of 15 genes from liver tissue, following 11 days of dosing. This data also demonstrates that the variability of each gene can be evaluated across a tissue, as well as between animals within a treatment, to identify outliers, such as gene 15, that just do not belong in a set of genes being used to predict tox. What does this precision mean when it comes to tox assessment? Figure 1 depicts the ability to demonstrate dose dependent effects that are indicative of kidney toxicity, and to generate data on day one that is predictive of more severe effects, up to and including the appearance of histological findings, that are otherwise not evident until day seven of this seven-day dosing primate model. The PCR data (red symbols) has already been discussed; the blue symbols depict the Clusterin level of each animal measured using qNPA, compared to the cut-off for significance above the normal controls (blue dashed line). The qNPA data indicates a significant increase in Clusterin on day one of low dose Everninomicin treatment, although histology does not pick up the toxicity of this compound until seven days of dosing with a high concentration. The qNPA results are dose dependent as a function of both the number of doses and the level of dosing, which means that qNPA provides a good quantitative handle on the assessment of kidney tox. Therefore, the sensitivity to detect potential adverse effects earlier than they can be detected by classic means, or by PCR, is associated with the ability to precisely quantify those effects.
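For reference, %CV is the standard deviation expressed as a percentage of the mean. The sketch below computes it per gene across the animals in one treatment group and flags genes that exceed a chosen limit. The signal values, gene labels and the 10% limit are hypothetical and are not the data of Table 1.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation as a percentage: sample SD / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return statistics.stdev(values) / mean * 100.0


def flag_high_cv(gene_values, limit=10.0):
    """Return the genes whose between-animal %CV exceeds `limit`."""
    return sorted(g for g, v in gene_values.items() if percent_cv(v) > limit)


# Hypothetical signals for three animals in one treatment group
signals = {
    "gene_1": [100.0, 105.0, 95.0],
    "gene_2": [200.0, 210.0, 190.0],
    "gene_15": [50.0, 120.0, 20.0],
}

for gene, values in signals.items():
    print(gene, f"{percent_cv(values):.1f}%")
print("flagged:", flag_high_cv(signals))
```

A gene flagged this way is a candidate for exclusion from a predictive set, since its animal-to-animal scatter would swamp any treatment effect.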
There are other platforms that may offer a similar quantitative, high throughput, multiplexed gene expression measurement solution in the future. These include the Illumina bead-based Sentrix® slide arrays (San Diego, CA) and the Luminex (Austin, TX) flow cytometry-based liquid array xMAP® bead approach. Illumina and Luminex both use beads, with a different derivatised bead for each gene, and both are measurement methods rather than assay methods. There are multiple beads per gene, and colour differences between the beads are used to identify which gene is being measured. In the case of the Illumina Sentrix® Array products, the beads (at least 30 per gene) are arrayed on to a surface of fibre-optic light guides, one bead per light guide, and interrogated to identify which gene is being measured and how much of the target gene is bound. The RNA is purified, reverse transcribed to cDNA (the same problematic process as required for PCR), then amplified and labelled by in vitro transcription to produce a cRNA target that is captured on to the beads for measurement. The capture molecules on each bead have a 23 base address used for decoding, and a 50 base cRNA target recognition sequence. Cost per sample, at $100 to $200, remains prohibitively high for high throughput testing. Platform precision of 1.3-fold (reproducibility of 30% CV) translates to significantly worse whole assay reproducibility once the sample-to-sample variability of extraction, amplification and labelling, much less in vitro or in vivo treatment, is factored in and reported to give an apples-to-apples comparison to the <10% average CV for qNPA. The sensitivity requires relatively large samples from which 50 to 200 ng total RNA can be recovered (requiring use of ~5,000 to 20,000 cells per sample if there is a 100% yield, and more cells in practical terms). It is not clear whether the sensitivity is sufficient to detect single copy genes.
Multiple beads are required to measure each gene, and each bead is measured separately, which means that the amount of detected material per bead is diluted. With Luminex assays, the beads remain in solution and analysis is performed by a dedicated flow cytometer, which flows each bead past an excitation source and detector and quantifies each. The degree of multiplexing is limited to 50 to 100 genes per sample. The same dilution effect on sensitivity occurs as in the Illumina system. Using a very similar assay (purification of RNA, reverse transcription, and amplification/labelling of cRNA), the sensitivity was only sufficient to detect moderately abundant genes (10 to 30 copies per cell) at a sample size of 1,000 to 2,000 ng total RNA11, severely limiting the utility of the Luminex approach. Standard deviation (reproducibility) ranged from 5% to 35% of the mean. GenoSpectra (Fremont, CA) has partnered with Luminex to offer the bDNA12 QuantiGene™ assay, a singleplexed assay, in a multiplexed format. No data is available on the performance of this product. Meso Scale Discovery (Gaithersburg, MD) markets the MultiSpot™ microplate system, which utilises electrochemiluminescence and an array of four to 10 elements/well of a 96-well microplate. Current protocols require RNA extraction, and sensitivity requires the use of at least 100,000 cells/sample to detect highly expressed genes such as actin (1,000 to 5,000 copies per cell). RNA is captured and then detected with a Ru(bpy)₃²⁺-labelled detection probe. The need for greater than 50,000 cells/sample typically precludes their culture or treatment in microplates.
To date, qNPA is the only method with the required performance. The one question about practical use of qNPA might be whether the assay is sufficiently multiplexed. The ArrayPlate platform in which qNPA is marketed today is based on a 96-well microplate and use of a Universal Array™ of 16 genes/well. While there is no limitation on density per well and higher density array products are likely to be launched in the future, the current product has tremendous multiplexing flexibility, permitting it to be used to measure 16 to more than 100 genes at a time. The Universal Array in each well is programmed by the investigator to measure a specific custom set of genes. Every well can be programmed to measure the same 16 genes, permitting 96 samples to be measured per ArrayPlate, or 128 genes can be programmed across eight wells, permitting 12 samples to be measured per ArrayPlate. Most investigators are finding that predictive gene sets are limited to a handful of genes, easily handled by the ArrayPlate qNPA.
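The trade-off between genes per sample and samples per plate follows directly from the 96-well, 16 genes/well format described above. A minimal sketch, with an illustrative function name:

```python
def samples_per_plate(n_genes, genes_per_well=16, wells_per_plate=96):
    """Samples that fit on one plate when each sample needs enough wells for n_genes."""
    if n_genes < 1:
        raise ValueError("n_genes must be at least 1")
    wells_per_sample = -(-n_genes // genes_per_well)  # ceiling division
    return wells_per_plate // wells_per_sample


for genes in (16, 32, 128):
    print(f"{genes} genes -> {samples_per_plate(genes)} samples per plate")
```

Gene sets that do not fill a whole number of wells still consume the partly used well, so a 100-gene set needs seven wells per sample and fits 13 samples on a plate.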
The data above demonstrates that there is now an assay that delivers the repeatable, quantitative, sensitive, precise and statistically powerful data necessary to implement gene expression based safety assessment. With this assay, the industry can begin in earnest to establish and validate the gene sets and the model systems for assessing safety in vitro and in vivo. There is no barrier to the use of cells, tissues, organisms or fixed tissues, or to the number of samples that can be efficiently and rapidly tested. This work can proceed at multiple centres and be compiled into a common database, or can be independently corroborated with as much confidence as any protein-based biochemical assay result. Only the systems biology itself is limiting, namely whether unique sets of genes exist that reflect each type of toxicity. The technology for identifying, validating and using these sets of genes for safety assessment is no longer limiting.
Bruce Seligmann is President, CEO and Chairman of High Throughput Genomics in Tucson, Arizona, USA, where he is responsible for the company’s overall strategy and business direction. Prior to High Throughput Genomics, he founded combinatorial chemistry pioneer SIDDCO and served as its President, CEO and Chairman through its sale to Discovery Partners International and the resulting divestiture of HTG. Prior to SIDDCO, Seligmann was Center Director of Selectide. During his tenure with that company, he tripled its size and staged it for purchase by Marion Merrell Dow (now Aventis). Before Selectide, he was a Senior Research Fellow at Ciba-Geigy (now Novartis) for seven years. Seligmann also spent seven years as a Senior Staff Scientist at the National Institutes of Health, National Institute of Allergy and Infectious Diseases and Laboratory of Clinical Investigation, where he became internationally known for his research. Seligmann earned his doctorate from the University of Maryland and holds a BS in chemistry from Davidson College.
1 FDA White Paper. “Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products”, March, 2004.
2 Gilbert, J, Henske, P and Singh, A. “Rebuilding Big Pharma’s Business Model”, In Vivo, the Business & Medicine Report, Windhover Information, 21:10, Nov. 2003.
3 Boston Consulting Group. “A revolution in R&D: How Genomics and Genetics Will Affect Drug Development Costs and Times”, in PAREXEL’s Pharmaceutical R&D Statistical Sourcebook, 2002/2003.
4 “Application of Genomics to Mechanism-Based Risk Assessment”. Environmental Health Perspectives, 112: March, 2004.
5 Tan, PK, Downey, TJ, Spitsnagel Jr, EL, Xu, P, Fu, D, Dimitrov, DS, Lempicki, RA, Raaka, BM and Cam, MC. “Evaluation of Gene Expression Measurements from Commercial Microarray Platforms”. Nucleic Acids Research, 31: 5676-5684, 2003.
6 Marshall, E. “Getting the Noise Out of Gene Arrays”. Science, 22: 630-631, 2004.
7 Bustin, SA and Nolan, T. “Pitfalls of Quantitative Real-Time Reverse-Transcription Polymerase Chain Reaction”. J. Biomolecular Tech., 15: 155-166, 2004.
8 Botros, I, Goodsaid, FM, Seligmann, B, Davis II, JW, Crawford, M, Smith, RJ, Martel, R and Rosenblum, IY. “Evaluation of a High Throughput ArrayPlate™ Test Platform for Genomic Biomarkers of Toxicity”. Society of Toxicology 43rd Annual Meeting and ToxExpo, Abstract 1256, March 20-25, Baltimore, MD, 2004.
9 Seligmann, B. “The q-NPA™ ArrayPlate: High-Throughput, Multiplexed, Gene Expression Microplate Assay for Target Validation, Screening, QSAR, Safety, and Diagnostics”. CHI Chips to Hits, Boston, MA, Sept 20-23, 2004.
10 Lee, J. “High Content and High Information Cell Based Assays”. CHI High Content Analysis, San Francisco, CA, Jan 30, 2004.
11 Yang, LI, Tran, DK and Wang, X. “BADGE, BeadsArray for the Detection of Gene Expression, a High-Throughput Diagnostic Bioassay”. Genome Research, 11: 1888-1989 (2001).
12 Collins, MI, Dauley, PJ, Shen, LP, Urdea, MS, Wuestehube, LJ and Kolberg, JA. “Branched DNA (bDNA) Technology for Direct Quantification of Nucleic Acids: Design and Performance”. Gene Quantification, F. Ferre, Ed. 1998.