Drug Discovery World
Drug discovery in the age of big data
By Dr Prem Premsrirut
Fall 2017

How RNA interference and CRISPR/Cas9 technologies are helping to build better mouse models and push drug discovery into a new era.

Most, if not all, consumers are feeling the weight of rising drug costs. But while much of the controversy surrounding rising drug prices has focused on a handful of pharmaceutical companies that have snapped up old drugs and then aggressively raised prices, the fact is that drug prices are far more likely to further increase due to the volume of candidates that falter in late-stage efficacy studies and the fact that there is no more low-hanging fruit.

This may seem incongruous but it is not. A widely circulated study published last year by a trio of economists found that the costs of compounds abandoned during testing were linked to the costs of compounds that obtained FDA approval (1). Built into the jaw-dropping US$2.5 billion plus that the analysts estimated it now costs to bring a new drug to market – a figure derived from an analysis of 106 randomly selected drugs from 10 companies – were the costs of unsuccessful projects that faltered in the clinic.

A study published last year illustrates the substantial financial risks that can occur when candidate drugs flop in the clinic. From 2013-15, 24% of candidates in Phase II and Phase III trials failed to meet safety endpoints, while 54% of the candidates got shelved because they did not work (2). A separate analysis reported last year by PAREXEL, a life science consulting company, identified 38 trials enrolling more than 145,000 patients that failed to show efficacy (3). Not only is this impacting drug prices, it is also impacting productivity and patient care. One study found that for every billion US dollars spent on R&D, the number of new drugs approved has decreased by approximately 50% every nine years since 1950 (inflation-adjusted) (4).
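To make that last figure concrete, the short sketch below compounds a halving every nine years. It assumes a perfectly smooth exponential decline, which is a simplification of the trend reported in (4), and the function name is purely illustrative.

```python
def relative_rd_efficiency(years_elapsed: float, halving_period: float = 9.0) -> float:
    """Fraction of the starting 'new drugs per billion dollars' that remains,
    assuming the figure halves once every `halving_period` years."""
    return 0.5 ** (years_elapsed / halving_period)


# Illustrative span only: 1950 to 2010 is 60 years, i.e. 60 / 9 halvings.
remaining = relative_rd_efficiency(2010 - 1950)
fold_decline = 1 / remaining

print(f"Remaining efficiency: {remaining:.4f}")
print(f"Fold decline: {fold_decline:.0f}x")
```

Under this idealised assumption, six decades of halving every nine years works out to roughly a hundredfold drop in output per inflation-adjusted dollar.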

One way to improve the outcomes of clinical trials is to rigorously validate each new drug in numerous preclinical assays and animal models; however, the difficulty is that there are so many targets being discovered that we cannot possibly build compounds to all targets and test each one in multiple disease states. Big data genomics, expression profiling and screening platforms are continuously identifying new causes for genetic disorders, which has dramatically increased the number of potential novel therapeutic targets, but we do not know enough about each newly-identified target to readily pick winners. At least not fast enough to make a dent in rising drug prices.

This overload of data has created a bottleneck in the target validation process (5). What we need is to take advantage of new technologies in preclinical research to help us validate novel targets quickly and tell us what the potential toxicity profile is before we begin to develop a drug and spend that $100 million to get to a Phase II trial. While we can use bioinformatics and in vitro culture systems to help understand gene function, there is no substitute for animal models. Disease states do not exist outside a whole organism, which contains an intact immune system, microenvironment and 3-D structures that play a role in not only disease pathogenesis, but also therapeutic responses. So while animal models remain the gold standard for target validation and toxicity assessment, the long lead times and high cost associated with genetically engineered animals have prevented their routine use in preclinical studies.

RNA interference (RNAi) in tandem with CRISPR/Cas9 technology provides a solution to this problem. With the evolution of RNAi and the advent of CRISPR/Cas9 technologies, the speed and precision with which genetically-engineered mouse models can be created are unprecedented. Powerful new algorithms and expression vectors give us the ability to generate reliable RNAi tools, which can be exploited experimentally to effectively and reversibly silence nearly any gene or gene combination, not only in vitro but also in live mice and soon rats and higher organisms. In addition, continued progress in the implementation of CRISPR/Cas9 as a gene editing tool allows us to introduce specific genetic alterations in animals and create ‘designer’ models. Synergising these technologies will help us to better model clinical disorders and evaluate genetic and environmental stimuli in animal models, which will increase our confidence in predicting drug responses in humans and push drug discovery research into a new era.

In short, RNAi and CRISPR/Cas9 will bring us into a whole new era of preclinical in vivo validation, where thorough investigation of mechanisms is not only possible but will become a prerequisite for entry into the clinical arena.

The genomic revolution

Neither RNAi nor CRISPR/Cas9 would have much meaning today were it not for a massive effort that began in 1990 to identify and map our genes. The completion of the Human Genome Project (www.genome.gov) in 2003 (6,7), among other things, gave us many new drug targets to explore, and functional genomics tools have endeavoured to prioritise these targets and translate that knowledge into rational and reliable drug discovery (8).

The minefield is huge. Scientists from the International Human Genome Sequencing Consortium estimated in 2004 that there are between 20,000 and 24,000 protein coding genes (9). This means we have tens of thousands of genes and at least as many proteins as potential targets for drug intervention to control human disease or injury, light years from the 600 or so proteins that drugs have targeted over the past 100 years (2).

But although human genomics has the capacity to dramatically increase the number of potential drug targets, the limited knowledge available about these targets has also led to an increased attrition rate for early-stage research projects (8). So, we need technologies capable of identifying, validating and prioritising thousands of genes to select the most promising targets.

It goes without saying that these technologies need to be high throughput to develop depth of knowledge about each target. And we need integration of multiple technological platforms to understand the role of genes in biological pathways that are involved in various diseases to select best points of intervention.

RNAi: A game-changing tool

RNAi is a naturally-occurring process that regulates gene expression in many organisms and can be exploited experimentally via the expression of synthetic short hairpin RNAs (shRNAs) to silence almost any gene (10,11).

In other words, RNAi technology is a way for us to rapidly turn genes off and on. It serves as a fast alternative to gene deletion and, importantly, because it is reversible, gene silencing by RNAi better mimics the dynamics of small molecule inhibition than permanent genetic knockouts. In a sense, it allows us to mimic drug therapy without the actual drug, which allows us to predict toxicities before we actually develop the drug.

Since 1995, when the first pathways of RNAi were discovered, we have developed a much deeper understanding of how this process works enabling us to bring RNAi technology to its highest peak thus far. We have essentially learned how to hijack RNAi by delivering synthetic RNAs to effectively silence any gene of interest.

In the early days of RNAi, we utilised small synthetic interfering RNAs (siRNAs) that could be transiently delivered to cells (12). Although rapid, cheap and effective for gene silencing, high concentrations of siRNAs were often used, which had the potential to cause off-target effects and generate artifacts (13-15). The second generation stem-loop short hairpin or shRNAs (so-named because of their structure) were a huge improvement because we could integrate them into the genome to get stable gene suppression (16). However, they too were not without flaws, often requiring screening of more than a dozen sequences to find an effective one, which discouraged some labs from using RNAi altogether.

Despite the skeptics, the field of RNAi has matured and found its way around this impediment. We quickly realised that sequence really matters in using this technology. Rather than relying on inaccurate prediction tools, our team, while at Cold Spring Harbor Laboratory, developed a high-throughput functional ‘sensor’ assay to unbiasedly evaluate anywhere from 20,000 to 40,000 shRNA sequences in parallel (17). We used this process to understand the requirements for effective RNAi and build a better system for potent gene knockdown (18).

We also learned that we were doing it wrong. By utilising only simple stem-loop shRNA structures, we were providing not one, but two RNA substrates because the passenger strand was also being incorporated into the RNA-induced silencing complex (RISC) at a high frequency, which can cause a lot of off-target effects and dilute the signal. One way to avoid this is to embed shRNAs into endogenous microRNA (miRNA) structures, guiding a more natural process where the passenger strand is degraded almost 100% of the time, helping to increase the potency of the shRNAs and avoid off-target effects caused by the passenger strand (19).
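The relationship between the two strands can be sketched in a few lines. This is an idealised illustration only: it assumes a perfectly paired duplex, whereas real miRNA scaffolds add mismatches, loops and flanking sequences that are not shown. The function name and the example sequence are made up.

```python
# Watson-Crick pairing for RNA bases.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def guide_and_passenger(target_site: str) -> tuple[str, str]:
    """Return (guide, passenger) for a target site on the mRNA, written 5'->3'.

    The passenger strand has the same sequence as the mRNA site (sense);
    the guide strand is its reverse complement (antisense), which is what
    lets a guide-loaded RISC base-pair with the target transcript.
    """
    site = target_site.upper().replace("T", "U")  # accept DNA-style input
    passenger = site
    guide = site.translate(_RNA_COMPLEMENT)[::-1]
    return guide, passenger


# Hypothetical 22-nucleotide target site, not taken from any real gene.
guide, passenger = guide_and_passenger("AUGGCUAACGUUCGAUCCGAUA")
print("guide:    ", guide)
print("passenger:", passenger)
```

Because the passenger strand is itself complementary to other sequences in the transcriptome, loading it into RISC creates a second, unintended silencing trigger, which is the problem the miRNA-embedded designs aim to avoid.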

Now we have even progressed beyond our third-generation miR30 structures and developed miRE, the most effective scaffold for RNAi-mediated gene silencing (19,20). With this miRNA backbone, we see effective gene silencing even with only a single-copy genomic integration, which also dramatically decreases the chance of off-target effects. With this new structure, we can also express shRNAs in tandem, enabling potent inhibition of multiple gene targets simultaneously. By using this approach we can mirror drugs that inhibit protein families rather than single enzymes, providing an avenue for critical preclinical evaluation of multitarget inhibition or combination therapies.

Our group also spent about a decade building data sets (more than 500,000 shRNAs) and compiled them, in collaboration with Christina Leslie’s group at Memorial Sloan Kettering Cancer Center, into a sequential learning algorithm called SplashRNA that allows researchers to predict the best shRNAs for a given gene of interest with a high degree of certainty (18,21). This dramatically reduces the amount of screening needed to find effective microRNA-based shRNAs for many genes. Once we identify a good shRNA sequence we can use it to create mice in as little as three months. When combined with the tetracycline-inducible system (22), expression of shRNAs can be controlled by treating mice with doxycycline (a tetracycline analog) in their food or drinking water, which will induce gene silencing. Removal of doxycycline will reverse the system and cause gene re-expression to endogenous levels in four to seven days (23).

A study of acute myeloid leukaemia (AML) nicely illustrates the utility of RNAi in mimicking therapy. The researchers conducting the study wanted to determine whether epigenetic regulators of an AML cell line they had created might be potential targets for drugs (24). Armed with 300 genes and a library of about 1,000 shRNAs targeting those genes, they added viral particles encoding pooled shRNA DNA to the AML cells, cultured them and then evaluated which shRNAs were lost after culturing – potential evidence that specific gene inhibition had killed the AML cell lines. From this experiment, the group identified the protein bromodomain-containing 4 (Brd4) as being critically required for disease maintenance and its inhibition as a vulnerability in tumour growth. The researchers found that suppression of Brd4, whether by shRNAs or by JQ1, a potent inhibitor of the BET bromodomain family (BRD2, BRD3, BRD4 and BRDT) (25), had a robust effect on disease progression both in vitro and in vivo. This result is one of dozens illustrating the ability of RNAi to mimic drug therapy.
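The logic of such a pooled 'dropout' screen can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the study: the read counts, hairpin names, pseudocount and thresholds are all invented for the example.

```python
import math
from collections import defaultdict


def log2_fold_changes(before, after, pseudocount=1.0):
    """Log2 change in each shRNA's share of the pool after culturing.

    Counts are converted to fractions of the total reads at each time point,
    and a pseudocount avoids taking the log of zero for lost hairpins.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for shrna, n_before in before.items():
        frac_before = (n_before + pseudocount) / total_before
        frac_after = (after.get(shrna, 0) + pseudocount) / total_after
        changes[shrna] = math.log2(frac_after / frac_before)
    return changes


def depleted_genes(changes, shrna_to_gene, threshold=-2.0, min_hairpins=2):
    """Genes for which at least `min_hairpins` independent shRNAs dropped out."""
    hits = defaultdict(list)
    for shrna, value in changes.items():
        if value <= threshold:
            hits[shrna_to_gene[shrna]].append(shrna)
    return {gene: sorted(s) for gene, s in hits.items() if len(s) >= min_hairpins}


# Invented read counts: two hairpins against a hypothetical essential gene,
# two neutral controls.
before = {"GeneA.sh1": 1000, "GeneA.sh2": 1000, "Ctrl.sh1": 1000, "Ctrl.sh2": 1000}
after = {"GeneA.sh1": 50, "GeneA.sh2": 100, "Ctrl.sh1": 1900, "Ctrl.sh2": 1950}
shrna_to_gene = {name: name.split(".")[0] for name in before}

changes = log2_fold_changes(before, after)
hits = depleted_genes(changes, shrna_to_gene)
print(hits)
```

Requiring more than one independent hairpin per gene is a common guard against a single shRNA scoring through an off-target effect.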

Following these results, using RNAi mice containing a tet-inducible Brd4 shRNA, Scott Lowe’s group at MSKCC was able to show that suppression of Brd4 alters normal hematopoiesis, causes skin and hair follicle abnormalities, resulting in hair loss, and depletes specific stem cell populations in the intestine, leading to dramatic weight loss (26). Importantly, these effects were completely reversible upon doxycycline removal and Brd4 re-expression. Two years following this study, near-identical phenotypes were reported in mice treated with the optimised BET inhibitors CPI203 and I-BET151 (27).

These results highlight the potential of RNAi mice, which can be used to predict side-effects within susceptible tissues and organs and evaluate therapeutic indices of pharmacological inhibitors a priori.

CRISPR/Cas9: A game-changing tool

As you can see, advances in RNAi have enabled us to model drug therapy. Now, by successfully harnessing CRISPR/Cas9 technology for genome editing, we can induce targeted, disease-specific mutations in the same RNAi animals, thus enabling the systematic interrogation of mammalian genome function in specific disease states.

The potential of CRISPR/Cas9 has been widely reported, not just in the science press but in the popular press as well, and the technology is fast becoming the preferred methodology for engineering mice. From cancer to Huntington’s, scientists are using CRISPR/Cas9 to generate mouse models of disease, and some have begun using the CRISPR/Cas9 system to generate animal models other than mice. To put things in context, it used to take 12-18 months to make a transgenic mouse using traditional techniques. CRISPR/Cas9 does it in anywhere from three to nine months.

Using programmable nucleases to perform genome editing is actually not a new concept: zinc finger nucleases were described in the 1990s, and TALENs followed (28,29). However, the CRISPR/Cas9 system is much more flexible and efficient, making it faster and cheaper to use (30-32). It consists of a Cas9 enzyme that snips through DNA like a pair of molecular scissors and a small RNA molecule that directs the scissors to a specific sequence of DNA. Following DNA cleavage, there are two different kinds of repair mechanisms that can be used to introduce a desired mutation into a cell’s genome: the homology-directed repair (HDR) pathway, which uses a DNA template to copy and repair, and the non-homologous end-joining (NHEJ) pathway. HDR is precise but occurs at very low frequency in mammalian cells. NHEJ is more efficient but less precise.
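How the guide RNA picks out 'a specific sequence of DNA' can be illustrated with a small search routine. One detail not spelled out above: the commonly used Streptococcus pyogenes Cas9 only cuts where the roughly 20-nucleotide target is immediately followed by an 'NGG' motif, known as the PAM. The sketch below simply lists such candidate sites on both strands. The function names and the test sequence are hypothetical, and real guide design also scores off-target risk and cutting efficiency, which this does not attempt.

```python
import re

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def find_cas9_sites(dna: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for every NGG-adjacent site.

    `start` is the 0-based offset of the protospacer within the strand being
    scanned; for '-' hits it refers to the reverse-complemented sequence.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Made-up sequence containing exactly one site: 20 nt followed by 'TGG'.
example = "TTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "TTTT"
sites = find_cas9_sites(example)
for site in sites:
    print(site)
```

The protospacer returned for each hit is the sequence a guide RNA would be designed to match; the PAM itself sits in the genomic DNA and is not part of the guide.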

By delivering specific DNA templates, scientists can use the HDR pathway to make specific gene modifications. They have tried to make this process more precise and more efficient by using proteins to inhibit the most dominant repair protein of NHEJ and inserting a gene into a predefined position of the genome in mouse cells (33). Another lab has been working on trying to improve the utility of the HDR pathway by using Scr7, which appears to enhance the efficiency and specificity of CRISPR by inhibiting DNA ligase IV. In fact, the group found a 19-fold increase in HDR efficiency with Scr7 (33).

Both of these approaches have been used to develop many different CRISPR-generated disease models, either by NHEJ, which generates random mutations to inactivate genes, or by HDR, which can replace portions of a gene, disrupting it with an artificial piece of DNA or even replacing it with an alternative gene version (ie, a human gene) (34-36).

While the applied use of Cas9 is now routine in many research labs, and even being used in a few clinical studies, there are other naturally-occurring Cas proteins. As these become better characterised they could potentially be incorporated into other systems.

The marriage of RNAi and CRISPR/Cas9

The literature often casts these two technologies as competitors. But my view, and the view of others I work with, is that they really are complementary. Cancer offers a good example of the benefits of combining RNAi and CRISPR/Cas9 in animal models. To genetically engineer a cancer model, multiple mutations must be engineered in the same animal in order for cancer to occur de novo. Traditionally this was accomplished by interbreeding mice with specific mutations to one another for years on end until the desired multi-allelic model was obtained, a labour-intensive and expensive process. CRISPR/Cas9 combined with RNAi allows us to accelerate this. For instance, we can take pre-engineered mouse embryonic stem (ES) cells and, using CRISPR/Cas9 in vitro, generate multiple modified alleles that together will give rise to a specific cancer. We can then sequentially target the same CRISPR-modified ES cells with a specific shRNA and then use those ES cells to make mice. These mice will not only get cancer but they will also carry one or more shRNAs to test for therapeutic intervention, all in the same animal. This process dramatically reduces the time it takes to create multi-allelic disease models and validate them.

In another instance, animals harbouring a Cas9 transgene are effective for generating rapid somatic mutations (37,38). Using this Cas9 mouse, we can simply deliver guide RNAs by nanoparticles, lentiviral vectors or adeno-associated viral vectors to the organ of interest and induce mutagenesis to promote disease. This then allows us to make in vivo somatic mutations rapidly without further engineering. Furthermore, the financial and temporal cost of generating new mouse alleles and incorporating them into increasingly complex mouse models will be dramatically decreased.

This was recently demonstrated in a model of pancreatic ductal adenocarcinoma (PDAC), which unfortunately is a leading cancer killer in the US (39). Whereas existing in vivo models failed to model the stepwise progression and adult onset of pancreatic cancer, the Winslow lab at Stanford University School of Medicine was able to induce targeted genomic deletion of Lkb1 specifically in the pancreas of adult mice by retrograde pancreatic ductal injection of virus containing sgRNAs into a Cas9-expressing mouse. In combination with oncogenic Kras expression, Lkb1 deletion led to the rapid formation of pancreatic tumours, confirming its role in pancreatic tumour development.

This approach is game-changing for the cancer biologist. The CRISPR/Cas toolkit for transgenic in vivo mutagenesis should make loss-of-function experiments in vivo no more difficult than altering those genes in vitro, meaning this system should enable the rapid functional investigation of any gene of interest in a live animal. Conceptually, sgRNA-directed Cas9 cutting combined with tet-inducible RNAi will enable multiplexed gene inactivation to rapidly give rise to disease, followed by evaluation of therapeutic targets via RNAi-mediated gene silencing in the same animal.

Beyond animal models, the synergy of RNAi and CRISPR/Cas9 enables genetic in vitro manipulation of human primary cell models, often patient-derived or with disease-associated genotypes edited in, enabling side-by-side comparison of data to improve the accuracy of these predictive models. Recognising the potential efficiencies this presents for drug discovery researchers, CROs such as Mirimus and Charles River have invested in sophisticated genome editing platforms to enable their partners to combine RNAi and CRISPR technologies to streamline the discovery process for target identification, validation and disease modelling both in vitro and in vivo.


When we think about all the new and exciting ways we have of probing the molecular make-up of cancer and other diseases, it is important to keep all of these tools in context. The CRISPR frenzy notwithstanding, none of these by themselves are the magic bullet, but used together not only can we broaden our understanding of disease mechanisms, we might be able to determine whether a designated target that looked promising in the earliest stages of discovery indeed has the kind of efficacy and limited toxicity that succeeds in patients. It might also help us to wage a winning war against those high drug prices.

Grateful acknowledgement to Qiantong Hu for contributing to the research of this article.


Dr Prem Premsrirut is Founder and Chief Executive Officer of Mirimus, which uses RNAi and CRISPR/Cas9 technologies to engineer mouse models.



1 DiMasi, JA, Grabowski, HG and Hansen, RW. Innovation in the pharmaceutical industry: New estimates of R&D costs. J Health Econ 47, 20-33, doi:10.1016/j.jhealeco.2016.01.012 (2016).

2 Santos, R et al. A comprehensive map of molecular drug targets. Nat Rev Drug Discov 16, 19-34, doi:10.1038/nrd.2016.230 (2017).

3 Grignolo, A, Pretorius, S. Phase III Trial Failures: Costly, But Preventable. Applied Clinical Trials 25 (2016).

4 Scannell, JW, Blanckley, A, Boldon, H and Warrington, B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat Rev Drug Discov 11, 191-200, doi:10.1038/nrd3681 (2012).

5 Chen, B and Butte, AJ. Leveraging big data to transform target selection and drug discovery. Clin Pharmacol Ther 99, 285-297, doi:10.1002/cpt.318 (2016).

6 International Human Genome Sequencing, C. Finishing the euchromatic sequence of the human genome. Nature 431, 931-945, doi:10.1038/nature03001 (2004).

7 Venter, JC et al. The sequence of the human genome. Science 291, 1304-1351, doi:10.1126/science.1058040 (2001).

8 Kramer, R and Cohen, D. Functional genomics to new drug targets. Nat Rev Drug Discov 3, 965-972, doi:10.1038/nrd1552 (2004).

9 Lipardi, C and Paterson, BM. Retraction for Lipardi and Paterson, “Identification of an RNA-dependent RNA polymerase in Drosophila involved in RNAi and transposon suppression”. Proc Natl Acad Sci U S A 108, 15010, doi:10.1073/pnas.1111383108 (2011).

10 Crotty, S and Pipkin, ME. In vivo RNAi screens: concepts and applications. Trends Immunol 36, 315-322, doi:10.1016/j.it.2015.03.007 (2015).

11 Livshits, G and Lowe, SW. Accelerating cancer modeling with RNAi and nongermline genetically engineered mouse models. Cold Spring Harb Protoc 2013, doi:10.1101/pdb.top069856 (2013).

12 Zamore, PD, Tuschl, T, Sharp, PA and Bartel, DP. RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101, 25-33, doi:10.1016/S0092-8674(00)80620-0 (2000).

13 Fedorov, Y et al. Off-target effects by siRNA can induce toxic phenotype. RNA 12, 1188-1196, doi:10.1261/rna.28106 (2006).

14 Jackson, AL et al. Expression profiling reveals off-target gene regulation by RNAi. Nat Biotechnol 21, 635-637, doi:10.1038/nbt831 (2003).

15 Jackson, AL and Linsley, PS. Recognizing and avoiding siRNA off-target effects for target identification and therapeutic application. Nat Rev Drug Discov 9, 57-67, doi:10.1038/nrd3010 (2010).

16 Silva, JM et al. Second-generation shRNA libraries covering the mouse and human genomes. Nat Genet 37, 1281-1288, doi:10.1038/ng1650 (2005).

17 Fellmann, C et al. Functional identification of optimized RNAi triggers using a massively parallel sensor assay. Mol Cell 41, 733-746, doi:10.1016/j.molcel.2011.02.008 (2011).

18 Fellmann, C and Lowe, SW. Stable RNA interference rules for silencing. Nat Cell Biol 16, 10-18, doi:10.1038/ncb2895 (2014).

19 Fellmann, C et al. An optimized microRNA backbone for effective single-copy RNAi. Cell Rep 5, 1704-1713, doi:10.1016/j.celrep.2013.11.020 (2013).

20 Watanabe, C, Cuellar, TL and Haley, B. Quantitative evaluation of first, second, and third generation hairpin systems reveals the limit of mammalian vector-based RNAi. RNA Biol 13, 25-33, doi:10.1080/15476286.2015.1128062 (2016).

21 Pelossof, R et al. Prediction of potent shRNAs with a sequential classification algorithm. Nat Biotechnol 35, 350-353, doi:10.1038/nbt.3807 (2017).

22 Gossen, M and Bujard, H. Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proc Natl Acad Sci U S A 89, 5547-5551 (1992).

23 Premsrirut, PK et al. A rapid and scalable system for studying gene function in mice using conditional RNA interference. Cell 145, 145-158, doi:10.1016/j.cell.2011.03.012 (2011).

24 Zuber, J et al. RNAi screen identifies Brd4 as a therapeutic target in acute myeloid leukaemia. Nature 478, 524-528, doi:10.1038/nature10334 (2011).

25 Filippakopoulos, P et al. Selective inhibition of BET bromodomains. Nature 468, 1067-1073, doi:10.1038/nature09504 (2010).

26 Bolden, JE et al. Inducible in vivo silencing of Brd4 identifies potential toxicities of sustained BET protein inhibition. Cell Rep 8, 1919-1929, doi:10.1016/j.celrep.2014.08.025 (2014).

27 Nakagawa, A et al. Selective and reversible suppression of intestinal stem cell differentiation by pharmacological inhibition of BET bromodomains. Sci Rep 6, 20390, doi:10.1038/srep20390 (2016).

28 Kim, YG, Cha, J and Chandrasegaran, S. Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc Natl Acad Sci U S A 93, 1156-1160 (1996).

29 Miller, JC et al. A TALE nuclease architecture for efficient genome editing. Nat Biotechnol 29, 143-148, doi:10.1038/nbt.1755 (2011).

30 Cong, L et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823, doi:10.1126/science.1231143 (2013).

31 Jinek, M et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821, doi:10.1126/science.1225829 (2012).

32 Mali, P et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826, doi:10.1126/science.1232033 (2013).

33 Maruyama, T et al. Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining. Nat Biotechnol 33, 538-542, doi:10.1038/nbt.3190 (2015).

34 Li, D et al. Heritable gene targeting in the mouse and rat using a CRISPR-Cas system. Nat Biotechnol 31, 681-683, doi:10.1038/nbt.2661 (2013).

35 Wang, H et al. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910-918, doi:10.1016/j.cell.2013.04.025 (2013).

36 Yang, H et al. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell 154, 1370-1379, doi:10.1016/j.cell.2013.08.022 (2013).

37 Platt, RJ et al. CRISPR-Cas9 knockin mice for genome editing and cancer modeling. Cell 159, 440-455, doi:10.1016/j.cell.2014.09.014 (2014).

38 Dow, LE et al. Inducible in vivo genome editing with CRISPR-Cas9. Nat Biotechnol 33, 390-394, doi:10.1038/nbt.3155 (2015).

39 Chiou, SH et al. Pancreatic cancer modeling using retrograde viral vector delivery and in vivo CRISPR/Cas9-mediated somatic genome editing. Genes Dev 29, 1576-1585, doi:10.1101/gad.264861.115 (2015).