Prediction versus Attrition in Drug Discovery & Development

By Professor Malcolm Young

The research-intensive part of the pharmaceutical industry is facing unusual commercial difficulties, whose elements are now well-known.

A substantial number of very valuable drugs have gone, or are about to go, off patent; too few good new premium drugs are coming through to market authorisation, to the extent that no new blockbusters are predicted to be launched by the largest companies until 2012; conventional R&D costs are crippling, whether these costs are met through in-house programmes, outsourced to service providers, or picked up through in-licensing products in which others have invested; and many thousands of experienced R&D professionals are departing previously secure R&D programmes in major companies. Share prices are sliding.


Productivity of the drug discovery and development process is the central problem in this dismal commercial picture (1). Too few medically important new drugs are now being derived, and the processes in place to derive them simply cost too much for investor confidence to be sustained.

Many factors influence productivity, but attrition at later, very expensive, stages of development is the central manifestation of disappointing productivity. What, practically, can be done to improve productivity and decrease late-stage attrition?


Conventional discovery and development approaches represent a heuristic. In most, but not all, discovery programmes, the processes begin with selection of an interesting protein target. Screening, or hit-to-lead, and then lead optimisation, processes match a likely lead candidate to the protein target; pharmacokinetics, toxicological features and basic efficacy are examined, usually in proxy or simplified systems, and then the efficacy and safety risks are evaluated somewhat serially by seeing whether these risks manifest in people, in the clinic.

During risk evaluation in the clinic, the entire spend on the development programme to that point hangs on the possibility that the trial, usually expensive in itself, will not manifest an important efficacy or safety risk.



The success rate of this heuristic approach is very low. For example, the average probability that a candidate emerging from lead optimisation will not make it to be a drug is above 99.8% (2). Building suspension bridges, cars or TVs with this sort of failure rate would not be thought reasonable by investors in those businesses, and searching questions are now being asked of the processes that are intended to build value in pharmaceuticals.


Prediction, cost and risk

To take a lead from the bridge-builders: the crucial advance that they made was to be able to predict increasingly accurately what the properties of their constructions would be, in advance of actually expending vast sums building them. Prior to that, bridge building was also largely heuristic, in that the expensive building of the structure was itself the way to de-risk its design.

In drug discovery, we need also to be able to predict accurately what risks a candidate has as cheaply and early as possible, and not rely on hugely expensive clinical trials as the sole or principal means of de-risking our candidates. If we cannot predict accurately, as at present, most drugs will fail after substantial sums are spent on them, since very expensive trials would be the only way to evaluate the risks.

This would mean that at any point the vast majority of discovery and development resource will be tied up in drug candidates that are going to fail, and productivity will be unsustainably low. This is what we see in the industry at the moment.


But if we could predict accurately and early, most drugs that are predicted to be worth developing would not fail, and expensive trials would usually evaluate the risks as low. This would mean that at any point the majority of discovery and development resource would be invested in drug candidates that will probably not fail because technical risks manifest, and productivity would be much higher.

This is what we would like to see in the industry in the future. Commercial risk for individual companies will not diminish in this scenario, indeed it would rise if more good drugs come to the market, but that would perhaps be a better problem to have.
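The contrast between the two scenarios above can be made concrete with Bayes' rule. The sketch below is illustrative only: it assumes that roughly 0.2% of candidates are genuinely worth developing (echoing the failure figure quoted earlier), and the sensitivity and specificity values of the hypothetical predictor are invented for the example rather than taken from any study.

```python
def fraction_of_advanced_candidates_that_succeed(base_rate, sensitivity, specificity):
    """Bayes' rule: P(candidate is truly good | predictor says 'advance')."""
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumption: about 0.2% of candidates would succeed if developed.
BASE_RATE = 0.002

# Hypothetical predictors: uninformative, moderately accurate, highly specific.
for sensitivity, specificity in [(0.5, 0.5), (0.9, 0.9), (0.9, 0.999)]:
    share = fraction_of_advanced_candidates_that_succeed(
        BASE_RATE, sensitivity, specificity
    )
    print(
        f"sensitivity={sensitivity}, specificity={specificity}: "
        f"{share:.1%} of advanced candidates would succeed"
    )
```

Under these assumptions an uninformative predictor leaves the success rate of advanced candidates at the base rate, and only a predictor with a very low false-positive rate moves the majority of resource onto candidates that will not fail.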


Hence, a very great deal turns on whether one can in fact predict accurately what the efficacy, safety and deliverability issues of a candidate molecule are – before undertaking expensive testing. Prediction accuracy is now seen by many as the central issue in the research-intensive side of the industry. ‘Predictive de-risking’ has become a sort of Excalibur, mythical, but pretty usefully sharp if found.


It is evident from current attrition rates that the processes implemented in the recent past, and in the present if they differ little from what has been traditionally employed, are not sufficiently accurately predictive to yield the required productivity across the industry.

This can be a little dissonant for those used to drawing together extensive preclinical packages for IND submissions, but, on average, the PK, tox, in vitro and in vivo studies required for an IND plainly do not predict efficacy, safety and deliverability in human patients even nearly well enough.

Some of the elements of a full picture of the Signal Detection Theory performance of preclinical packages for new molecules, such as correct identification rate and false positive rate, can be estimated, but correct rejections are presumably hidden among the cases that did not in fact gain IND status, and it is anyone’s guess how many incorrect rejections (misses) there have been in which, for example, a candidate would have worked very beneficially in humans – if only we’d known – but was unfortunately not beneficial to mice.
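For readers less familiar with Signal Detection Theory, the four outcome classes mentioned above combine into a hit rate, a false-positive rate and a sensitivity index (d'). The sketch below uses invented counts purely for illustration; as noted above, the misses and correct rejections are in practice largely unobservable, which is exactly why the full picture cannot be computed for real preclinical packages.

```python
from statistics import NormalDist


def sdt_summary(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-positive rate, d') from the four outcome counts."""
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    # Inverse standard-normal CDF; it is undefined for rates of exactly 0 or 1.
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm_rate)
    return hit_rate, false_alarm_rate, d_prime


# Hypothetical counts, not real data.
hit_rate, false_alarm_rate, d_prime = sdt_summary(
    hits=8, misses=2, false_alarms=20, correct_rejections=80
)
print(f"hit rate={hit_rate:.2f}, false-positive rate={false_alarm_rate:.2f}, d'={d_prime:.2f}")
```

A d' near zero would mean the package cannot tell good candidates from bad ones; larger values indicate better discrimination.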

Figure 1: These diagrams illustrate a protein network of the bacterium Staphylococcus aureus


The road to prediction

Predictability starts to be an issue at the very start of a discovery programme. Selection of a protein target is often based on evidence that the specific protein is significant in a pathway relevant to the disease of interest, this evidence perhaps being in the form of a knock-out showing an effect in changing cell physiology, and on evidence that the protein target’s function can be affected by the binding of a drug molecule to it.

This approach is very deeply ingrained in the current intellectual furniture in discovery, and is characterised as the basis for ‘rational drug discovery’. Iconoclasts, however, are sometimes disposed to cast core beliefs in science as scientific claims, the better to examine their plausibility. In this case, the target-based approach essentially makes the scientific claim that (for example) ‘inhibiting this one protein will make the patient better’.

Probably there are some diseases in which this claim can seem plausible, but for most diseases, especially the complex diseases for which we now need new and effective medicines, it is akin to suggesting that one will fix an ailing economy by deleting one company, or that one will disable the enemy’s command and control network by sniping one ill-fated radio operator.


Indeed, the claim implicit in single-target discovery is stronger than this. It could be cast, again iconoclastically, as ‘even though we know that every drug molecule we develop for this one protein target will bind in varying degrees to many other proteins, these off-target effects don’t matter much’.

I’ll avoid developing any analogy to claims that collateral damage is not important. However, this focus is almost ubiquitously evident in optimising for nanomolar potency against the chosen single target. And yet potency cannot predict efficacy accurately if even one other protein has an affinity for the candidate molecule, since another affinity may be high enough that availability at the putative target is greatly reduced in vivo, and there may be powerful synergies in the network effects of multiple interventions within a complex biological network (3).

A rather concrete example of this issue is that atorvastatin, which enjoys some considerable success, and cerivastatin, which enjoyed rather less, cannot readily be distinguished on the basis of their presumptive primary protein target, since both are HMG-CoA reductase inhibitors of similar potency (4).

It is the everything-else that makes the difference, and accurate prediction would seem to require that we take account of a more complete spectrum of the interaction of drug molecules with biological systems, and learn to interpret what these interactions mean in terms of efficacy and safety.


What of Systems Biology?

Prediction is synonymous with biological simulation for many in this area. Simulations, however, have presented some problems in application to drug discovery which have so far limited the degree to which they can contribute to accurate prediction of drug properties.

The first issue is noise in the data used as the basis for a tissue model or cell simulation. Because each individual measurement in the data has to be modelled and ascribed a value, and often a whole set of values, for the simulation to proceed, the ever-present noise in experimental biological data risks producing gross uncertainties in the dynamics of the simulation. Cleaner data will doubtless help, and instrumentation and methodology are constantly advancing.

A second issue with simulation as a basis for accurate prediction is a logical one. A simulation is simply a way to see what the consequences of one’s premises – the basis data and one’s assumptions – are when these are too difficult to see by eye or to calculate analytically. Hence simulations have the logical form: if p (the set of premises) then q (the consequence) is the behaviour of the simulation.

Unfortunately, if the simulation’s behaviour looks like that of the real biology, the consequent, q, is affirmed, and one is not entitled to believe the truth of the premises, p, because of the fallacy of affirming the consequent. Hence, a ‘successful’ simulation is information free. If, however, the simulation does not behave as does the real biology, it can be inferred validly that the premises are false, but here one encounters the third issue.

Because of the simplifications and extrapolations required to undertake the simulation, it is often very difficult to find which specific premises were incorrect. This combination of susceptibility to data noise, logical problems, and necessary simplifications suggests that more research is needed before simulations are likely to come centre stage in predictive de-risking.
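The logical point made above about simulations can be checked mechanically with a two-variable truth table. In this small sketch, `p` stands for "the premises are true" and `q` for "the simulation behaves like the real biology"; the variable names are ours, introduced only for illustration.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q'."""
    return (not p) or q


worlds = list(product([True, False], repeat=2))

# Simulation matches biology: 'if p then q' holds and q is observed.
match_worlds = [(p, q) for p, q in worlds if implies(p, q) and q]

# Simulation does not match biology: 'if p then q' holds and q is not observed.
mismatch_worlds = [(p, q) for p, q in worlds if implies(p, q) and not q]

# A match leaves p undetermined (affirming the consequent proves nothing) ...
print("after a match, p may be:", sorted({p for p, _ in match_worlds}))
# ... whereas a mismatch forces p to be false (modus tollens).
print("after a mismatch, p may be:", sorted({p for p, _ in mismatch_worlds}))
```

After a match, both truth values of `p` remain possible; after a mismatch, only `p = False` survives, although this says nothing about which of the many premises bundled into `p` was the faulty one.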


What else is there? Complex systems science is developing a wide variety of analytic (as opposed to simulation) tools for complex networks, including networks of full biological complexity (5). This area is moving quickly at present, motivated by the ubiquity and importance of complex networks in human activity. Where these tools have been applied to biological systems there is some evidence of encouraging predictive accuracy.

Early attempts to apply network features derived from scale-free topology to predicting essentiality in model organisms were quite accurate (6). Approaches based on identifying the interactions between connectional and functional clusters were also quite successful in finding therapeutic targets (7). Approaches based on searching with multiple topological features, and then mapping synergetic combinations of proteins in specific cell-types back to compounds that could affect these combinations, have also been successful (8).

These were capable of finding all efficacious classes at a predictive accuracy characterised by a conditional probability of recovering these known molecules by chance below 10^-180. These results suggest that complex systems science tools are capable of delivering predictive accuracy great enough to make a contribution to the economics and productivity of drug development, but it is not known yet whether or how far they generalise.
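To give a flavour of what an analytic, topology-based approach looks like in its simplest form, the sketch below ranks the nodes of a toy interaction network by degree, that is, by number of interaction partners. This is an illustration only, not the method of any of the studies cited above: the protein names and interactions are invented placeholders, and real analyses use far richer topological features than degree alone.

```python
from collections import defaultdict

# Hypothetical toy interaction list; names are placeholders, not real proteins.
interactions = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("E", "F"), ("F", "G"),
]


def degree_ranking(edges):
    """Rank nodes by number of distinct interaction partners, highest first."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbours[u].add(v)
            neighbours[v].add(u)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(neighbours, key=lambda node: (-len(neighbours[node]), node))


ranking = degree_ranking(interactions)
print("candidates ordered by connectivity:", ranking)
```

In this toy network the hub "A" ranks first. Whether highly connected proteins are in fact the most important ones in a given organism or cell type is an empirical question that such a ranking can only help to pose.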


Initiatives for predictive de-risking

There is no shortage of enthusiasm for predictive de-risking, nor of initiatives aimed at generating better approaches to it. The €2 billion European Innovative Medicines Initiative, for example, suggests that ‘The biopharmaceutical industry’s greatest need is to be able to predict failure at the earliest possible stage of the medicine development process.

The ability to identify both lack of efficacy and the potential for adverse reactions as soon as possible would greatly increase the productivity of R&D, and accelerate the discovery and development of better medicines’ (9). These initiatives are very important, and will help. However, important advances in analytical capability have historically most often been derived by small groups of highly motivated scientists and mathematicians, and often by individual insight.

In this context, it is possible to envisage a Grand Challenge set against this important frontier, like those between rival machine vision approaches, or game theoretical algorithms. Perhaps the best example of a Grand Challenge that inspired and motivated a successful response was the Ansari X-prize for reusable space flight, which involved a prize of $10 million to the team that could first build and launch a spacecraft capable of carrying three people to 100 kilometers above the earth’s surface, twice within two weeks.

The prize was won, very surprisingly, by superbly innovative engineers, with very strange looking contraptions made of roughly the same material as my kayak, whose success followed from not pursuing the conventional knowledge in spaceflight design, but from designing from first principles.

Arguably, our challenge in predictive de-risking for drug development is even more important than developing space tourism, although some ingenuity in setting an achievable but very stretching target for predictive de-risking in drug discovery would be required. We all need rapidly to know which approaches can deliver better predictive accuracy, since improved predictive accuracy, there is rigorous reason to believe, will underpin much more efficient discovery of drugs.

This article originally featured in the DDW Fall 2008 Issue



Professor Malcolm Young is one of the UK’s leading scientists in informatics. He has recently held a number of senior academic positions, including Director of the Complex Systems Group and Pro-Vice Chancellor for Strategic Development at Newcastle University, following a Royal Society Research Fellowship at the RIKEN Institute in Japan, and at Oxford University. Malcolm’s research experience and interest lie in complex systems analysis and informatics, and his main goal is to understand how biological function arises from structural aspects of complex biological systems. To this end, Malcolm founded e-Therapeutics, a systems biology drug discovery company which aims to de-risk the process of drug discovery and development. He has led significant funding rounds, taking the company through an IPO in December 2007, and has led its development since, driving e-Therapeutics to develop drug candidates for asthma, oncology, superbugs such as MRSA and C. difficile, cholesterol and pain management. In addition to his role at e-Therapeutics, Malcolm is Non-Executive Chairman of Novotech Investment Ltd, an innovative technology investment company, and of OGS Search Ltd, a post-Google search engine company. Malcolm is one of 18 scientists worldwide nominated by The Sunday Times as the ‘Brains behind the 21st Century’.




1 Booth, R and Zemmel, R. Prospects for productivity. Nature Reviews Drug Discovery 3: 451-456, 2004.


2 European Commission, Innovative Medicines Initiative: better tools for better medicines. Luxembourg: Office for Official Publications of the European Communities, 2008.


3 Young, MP, Hilgetag, CC and Scannell, JW. On imputing function to structure from the behavioural effects of brain lesions. Philosophical Transactions of the Royal Society: Biological Sciences, 355:147-161, 2000.


4 Carbonell, T and Freire, E. Binding Thermodynamics of Statins to HMG-CoA Reductase. Biochemistry, 44 (35), 11741-11748, 2005.


5 Young, MP and Shapiro, E. From complexity to coherence. In Microsoft 2020 Science, Emmott, S and Risen, S (Eds), Microsoft Corporation, pp26-28, 2006.


6 Jeong, H, Mason, SP, Barabasi, AL and Oltvai, ZN. Lethality and centrality in protein networks. Nature 411: 41-42, 2001.


7 Anderson, A. Elucidating Essential Targets in Pharmacologically Relevant System Models. UCSF/UC Berkeley Bioengineering Graduate Group, 2002.


8 Idowu, OC, Lynden, SJ, Young, MP and Andras, P. Protein Interaction Network Analysis, pp. 623-625, IEEE Computational Systems Bioinformatics, 2004.


9 European Commission, Innovative Medicines Initiative: better tools for better medicines, page 5. Luxembourg: Office for Official Publications of the European Communities, 2008.
