Many stages in the process of drug discovery require the separation, identification and analysis of biomolecules such as proteins, peptides and nucleic acids. Such analyses have traditionally been accomplished by electrophoretic separations in gels or capillaries, or more advanced techniques such as mass spectroscopy.
While advances have been made in all sectors of these productivity tools, the electrophoresis approach remains central to the majority of drug discovery genomics and proteomics tools. Conventionally, this has been accomplished by the use of labels attached to the biomolecules, allowing an excellent signal-to-noise ratio. However, the inherent disadvantages of imaging a substance attached to the biomolecule being studied, rather than the biomolecule itself, have led many groups to consider label-free alternatives.
Recent advances in instrumentation, microfluidic systems, signal processing and computational tools have enabled the first effective execution of Label-Free Intrinsic Imaging (LFII™) analysis in Proteomics and Genomics. In this context, ‘effective’ means sufficient resolution, sensitivity and reproducibility to allow complex biomolecule capillary electrophoretic analyses to be made and utilised. By imaging the biomolecule itself, rather than the attached label, as is current practice, significant advances in system functionality and the ability to quantify the amount of biomolecule are gained.
The excellent sensitivities and throughputs characteristic of advanced microfluidic systems can also be retained. By miniaturising the system, new technologies are allowing more rapid separations, and by removing labels, the rapid drop-off in signal associated with decreasing the sample size can be mitigated. As the fundamental driver in biotechnology is to reduce the size of the separation systems and thereby speed up the analysis process, the LFII™ approach has a built-in and increasing advantage. Finally, by removing labels, separation goals can be achieved while reducing the number of steps needed in sample preparation and label elution.
Other advantages of LFII™ systems include Health and Safety benefits, reduced reagent costs and sample sizes. LFII™ also allows the imaging of the biomolecules in real-time as they traverse the separation system. This affords the ability to separate and identify (including using advanced data-mining tools) and collect unlabelled proteins from these systems.
Productivity tools for drug discovery such as capillary electrophoresis must reach standards of performance appropriate to the tasks at hand. ‘Performance’ can be judged according to parameters such as:
Sensitivity (a measure of the smallest amount of biomolecule, eg protein, which can be detected and, generally, quantified).
Resolution (the ability to separate proteins of similar molecular weight, with the figure of merit, in effect, being the number of proteins one can resolve in a given sample).
Quantification ability (how well one measures the amount of a given protein).
Throughput (the number of fully analysed separations a given system can achieve in a given period).
Dynamic range (the range of concentrations accessible, and the range in molecular weights or read lengths attainable).
As is usual in instrumentation, one must generally compromise one or more of these parameters in achieving levels of performance in others so as to achieve a given analytical goal. Consequently, if one stays within a given technological sphere, improvements are usually incremental, often slight and always hard-won.
If a new technology arrives, however, or an old technology is applied in a novel way, the usual incremental improvements can be replaced with a step-change in capability, resulting in dramatic improvements in more than one, perhaps even all, of the important parameters. This disruptive approach can often result in a radical improvement in the power of the resulting productivity tool in a wide range of applications. We believe that LFII™ represents such a disruptive technology in biomolecule analysis.
Drug discovery and analytical tools
1. Drug discovery depends on good basic science
Many drug discovery processes are fundamentally based in canonical cell and molecular biology based techniques. Examples include gel electrophoresis in one form or another and adaptations such as various blotting/affinity additions. These analyses of DNA/RNA and proteins form the backbone of the discovery process because they are the means by which diseases are defined. The definition of the disease process is an essential step in developing a therapy. Drug discovery needs a sure and reliable regimen to define what protein or group of proteins are up or down regulated in a disease state. This may be associated with a Single Nucleotide Polymorphism (SNP) or combination of SNPs in the patient’s genomic make-up. Both the proteomic signal, and the genomic information may well influence the choice and effectiveness of the therapy regime. All these applications return to the central problem that the discovery process needs good reliable basic research to be performed at the bench.
There are divided and occasionally contentious opinions on whether disease is best studied at the genomic or proteomic (or other ‘omic’) level. The pragmatic approach to this is to assume that for varying circumstances, one can draw on different biological signals or correlations between such signals. Over the past 10 years the core science of Molecular Biology has, somewhat artificially, been separated into the two broad categories of Genomics and Proteomics. Among the drivers of this schism, and the further ‘omic’ sub-headings such as Transcriptomics and Peptidomics, have been differing perspectives – even trends – in the opinion on what is important and also market forces, where the need to draw in specialised investment has made compartmentalisation an advantage. However, this trend may have thrown up artificial barriers in a field better served by a more universal approach, which would be served by advances in multifunctional technologies. Advances in computational tools and the more standardised approach afforded by the label-free approach are going a long way to reconcile this division.
The key issue is that most molecular biology technology is still embedded in the era of its birth, the 1960s-70s, where molecule analysis is based on a one-dimensional separation by charge/mass criteria. These techniques, essentially sorting molecules by size through various sieving matrices, are inherently limited due to problems in complexity, reproducibility, data handling and storage. The core science has stayed within the molecular biological sphere. A large percentage of technological advances in this field have targeted improvements in current gel-based agarose or polyacrylamide electrophoresis. Product offerings such as pre-cast gels (Bio-Rad PROTEAN II, Amersham MultiPhor II), advanced staining systems such as GE Healthcare's CyDye range and improved sample preparation (SIGMA Proteoprep) are current examples.
These products are helping the scientific community to improve its productivity, but are also prolonging the user’s dependence on decades-old technology. Pharmaceutical and other forms of biological research simply cannot be expected to meet the 21st century needs of a growing and ageing population using mid-20th century technologies. Advances in analytic technologies must be complemented by bioinformatics tools capable of handling increasingly complex and high volumes of data. Furthermore, the correlations between genomic and proteomic data require sophisticated renormalisation and association techniques which are unfeasible with old analogue technologies. Molecular biological research must be transformed from its slab gel dependence to embrace the new generation of tools. The problem with this concept is the in-built reluctance of many workers to trust their painstakingly synthesised or isolated samples to new ‘unproven’ technology. In DNA sequencing these new technologies are exemplified by advances such as Pyrosequencing (www.biotage.com, www.454.com) and Solexa’s array-based approach (www.solexa.co.uk). Both of these technologies are being released commercially over the coming months.
In proteomics, array-based assays are in the ascendancy and form the key new technology in the field, but they are hampered by several key issues, such as accessibility, normalisation, complexity and cost. These approaches have been made possible partly due to the failure of Capillary Electrophoresis (CE) to displace slab-gel systems. There are a variety of reasons for this, but generally the problems have been to do with the wide range of technological components. These include the complex chemistries involved, the large number of contributing physical forces, their effects and their dependence on environmental issues such as pH and temperature. In the centre of this web of difficulties lies the fact that conventionally, CE has used chemiluminescent labels to image the biomolecule. This has resulted in poor reproducibility, lack of quantification, small dynamic range, inadequate resolution and undesirable changes to the biomolecules themselves. The traditional approach of the analytic tools companies to improving performance has been a steady refinement of these complex chemistries, resulting in an incremental improvement in system performance. The alternative approach has involved the introduction of radically different technology, perhaps derived from another area (eg x-ray crystallography and mass spectroscopy). These generally make very significant improvements in one or more of the measurable parameters but require a radical change in the modus operandi of the drug discovery process, as well as often very significant capital and operational costs. The LFII™ approach seeks to accomplish the step-change of the mass spectrometer, while staying rooted in the CE area of expertise which most drug discovery organisations have in abundance.
2. What is Capillary Electrophoresis (CE)?
In electrophoresis, the molecules, for example proteins, can be made to move upon the application of an electrical field as they generally have an overall negative charge. This means they will move towards the positively charged end of a separation medium, in this case a capillary. CE is a family of techniques, used for the separation and identification of charged and uncharged species. Each technique has its own set of operative and separative characteristics, but can be performed within the same instrumental configuration, ie a controllable high voltage supply, a fused silica capillary (coated or uncoated), a detector, buffer reservoirs and two electrodes.
Figure 1 shows the most common manifestation, Capillary Gel Electrophoresis (CGE). In CGE the biomolecule to be analysed is loaded into the capillary by electrokinetic or pressure injection. Once loaded, the biomolecule is exposed to an electrical potential and starts to move towards the anode, as in a slab gel system. The smaller molecules move faster and traverse the sieving matrix as discrete bodies once their terminal velocity is reached. The larger molecules do the same and all form distinct bands in the matrix. In conventional systems the bands move across the single-point detector and are imaged.
The second common form, Capillary Zone Electrophoresis (CZE), is a mode of CE that separates molecules based on their charge in an aqueous buffer solution. A condition for separation in capillaries is the presence of electro-osmotic flow (EOF). This is the electrically induced bulk flow of a solution. The inner surface of a bare fused silica capillary contains silanol groups which, when exposed to sodium hydroxide (NaOH), dissociate to give the internal surface of the capillary a negative charge. To preserve electrical neutrality, cations in the buffer aggregate near the capillary surface to form what is known as an immobilised electric double layer (EDL). When the high voltage is applied, these cations move toward the cathode, dragging the rest of the solution with them and thus causing the bulk flow of the fluid within the capillary.
The separation is driven by the EOF in tandem with the intrinsic surface charge of the molecule being separated.
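The interplay between EOF and the analyte's own charge can be sketched numerically. The short Python example below uses the standard textbook relation, in which the net velocity is the sum of the electrophoretic and electro-osmotic mobilities multiplied by the field strength. The function names and all numerical values are illustrative assumptions, not parameters of any system described in this article.

```python
def apparent_velocity(mu_ep, mu_eof, field_strength):
    """Net velocity (m/s) of an analyte, positive towards the cathode.

    mu_ep          -- electrophoretic mobility of the analyte (m^2/V/s);
                      negative for an anion, which is pulled towards the anode
    mu_eof         -- electro-osmotic mobility of the bulk buffer (m^2/V/s)
    field_strength -- applied field (V/m), i.e. voltage / capillary length
    """
    return (mu_ep + mu_eof) * field_strength


def migration_time(length_to_detector, mu_ep, mu_eof, voltage, total_length):
    """Time (s) for the analyte to travel from the inlet to the detector."""
    field = voltage / total_length
    velocity = apparent_velocity(mu_ep, mu_eof, field)
    if velocity <= 0:
        raise ValueError("analyte is not carried towards the detector")
    return length_to_detector / velocity


if __name__ == "__main__":
    # Hypothetical run: 20 kV across a 0.5 m capillary, detector at 0.4 m.
    # The anion moves against the EOF but is still swept to the cathode.
    t = migration_time(0.4, mu_ep=-2e-8, mu_eof=5e-8,
                       voltage=20_000, total_length=0.5)
    print(f"migration time: {t:.0f} s")
```

In this picture a negatively charged molecule still reaches the cathode end as long as the EOF mobility exceeds the magnitude of its own mobility, which is why species of different charge can be separated in a single run.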
3. Imaging separating biomolecules in slab gels
Following separation, the proteins have to be made visible by attaching some sort of coloured or radioactive label, a difficult, expensive and error-prone process. Conventional gel systems require the preparation of the gel (~2 hrs), pre-running the gel (40 minutes) and running the sample (~4 hrs in the example in Figure 2). The run gel must then be separated from the apparatus, at which point breakage can often occur, and stained, in this case with a dye called Coomassie Blue¹. This takes about an hour. The stain sticks to the proteins in the gel, making them visible, but the excess stain, which adheres less strongly to the gel matrix but still sticks to it, then has to be washed away. The slab-gel has already required almost eight hours of process time, of which about three hours is ‘hands on’. At this stage the destaining process is usually conducted overnight. The next day the gel is placed on an absorbent paper and dried in a heated tray. The gel is then ready for data acquisition. This is usually done by eye, but there are also expensive image analysis programs available to do this.
4. Label-free Intrinsic Imaging (LFII™)
LFII™ systems address many of these issues. An example of LFII™ is offered by the deltaDOT approach, which uses UV imaging with many hundreds of detectors. Conventional CE is restricted to just one detector: an apt metaphor is judging the order of horses in a race solely by viewing the jockeys’ colours as they flash past the finishing post. By using many detectors this approach can build up a map analogous to filming the entire race and watching the horses separate from each other in real-time over the entire racetrack. There is so much more information to be gained in this manner that the loss in signal-to-noise ratio caused by removing the jockeys’ colourful shirts is more than compensated for by the fact that you can image over the whole race. The data set is generated as a series of peaks, each one representing a biomolecule. The data is ready to use about 100 times more rapidly than is possible in the slab-gel and is inherently digital. This has huge advantages, not only in time, but also in complying with regulatory strictures.
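The benefit of imaging with many detectors can be illustrated with a simple shift-and-average simulation. This is a minimal sketch, not the deltaDOT algorithm: it assumes a single band moving at a known, constant speed (so that each successive detector sees it a fixed number of samples later), a Gaussian band shape and independent Gaussian noise on every detector. All names and numbers are hypothetical.

```python
import math
import random


def simulate_traces(n_detectors, n_samples, first_arrival, delay_per_detector,
                    band_sigma, noise_sigma, seed=0):
    """One noisy trace per detector; the band reaches detector k at
    sample first_arrival + k * delay_per_detector."""
    rng = random.Random(seed)
    traces = []
    for k in range(n_detectors):
        centre = first_arrival + k * delay_per_detector
        traces.append([
            math.exp(-0.5 * ((t - centre) / band_sigma) ** 2)
            + rng.gauss(0.0, noise_sigma)
            for t in range(n_samples)
        ])
    return traces


def shift_and_average(traces, delay_per_detector):
    """Undo each detector's arrival delay, then average across detectors.
    Independent noise averages down roughly as 1/sqrt(number of detectors)."""
    n = len(traces)
    max_delay = (n - 1) * delay_per_detector
    usable = len(traces[0]) - max_delay
    return [
        sum(traces[k][t + k * delay_per_detector] for k in range(n)) / n
        for t in range(usable)
    ]


if __name__ == "__main__":
    traces = simulate_traces(n_detectors=100, n_samples=500, first_arrival=100,
                             delay_per_detector=2, band_sigma=5.0,
                             noise_sigma=0.5)
    combined = shift_and_average(traces, delay_per_detector=2)
    peak = max(range(len(combined)), key=combined.__getitem__)
    print("band located at sample", peak)
```

A real instrument would have to estimate the speed of each band rather than assume it, but the sketch shows why a weak unlabelled signal seen by hundreds of detectors can still be recovered cleanly.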
5. Digital systems versus Analogue and CFR 21/11
Normal gel-based systems are inherently analogue. The Food and Drug Administration (FDA) has required CFR 21/11 compliance since August 20, 1997:
CFR 21 Part 11, the FDA guidelines for trustworthy electronic records, requires companies to employ procedures and controls designed to ensure the authenticity, integrity and, when appropriate, the confidentiality of electronic records, and to ensure that the signer cannot readily repudiate the signed record as not genuine.
To satisfy this requirement persons must, among other things, employ procedures and controls that include the use of computer generated time stamps2.
Given the rapid rate at which life-sciences operations continue to be automated, the FDA has set forth its current thinking on ways to meet 21 CFR Part 11 requirements for ensuring that all critical electronic records and signatures are trustworthy, reliable and compatible with FDA’s public health responsibilities. The objective of this regulation is to ensure the integrity of all:
“Electronic records and electronic signatures that persons create, modify, maintain, archive, retrieve, or transmit under any records or signature requirements set forth in the Federal Food, Drug and Cosmetic Act, the Public Health Service Act or any FDA regulation.”
In order to comply with this regulation, companies have to follow a complex protocol for every single gel they run:
1. Run the gel.
2. Scan the gel. Create a TIFF file and keep a photographic record.
3. Submit this to IPQC (In Process Quality Control) standards.
4. Scan the TIFF file to produce a densitometry image.
5. Produce purity data on this image, eg this protein is 97.3% pure.
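The purity figure in the final step is, in essence, the target band's share of the total integrated signal. The sketch below shows that arithmetic in Python; the function name, band labels and intensity values are hypothetical and chosen only to reproduce the 97.3% example.

```python
def band_purity(band_intensities, target):
    """Percentage of the total integrated band signal found in the target band.

    band_intensities -- mapping of band label to integrated intensity
                        (arbitrary densitometry units)
    target           -- label of the band of interest
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("no signal detected in any band")
    return 100.0 * band_intensities[target] / total


if __name__ == "__main__":
    # Hypothetical densitometry readings for one lane
    lane = {"target": 9730.0, "contaminant_a": 180.0, "contaminant_b": 90.0}
    print(f"purity: {band_purity(lane, 'target'):.1f}%")
```

The calculation itself is trivial; the burden in the gel workflow lies in the scanning and record-keeping steps needed to obtain trustworthy intensities in the first place.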
Advanced LFII™ CE systems generate digital, time-stamped data that will allow the addition of digital user signatures and bring about complete compliance with CFR 21/11. The time savings this creates will dramatically reduce the amount of resources drug discovery companies need to produce digitised molecular biological data.
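As a rough illustration of what 'digital, time-stamped data' can mean in practice, the Python sketch below wraps a trace in a record carrying a UTC time stamp, an operator name and a SHA-256 digest so that later alteration can be detected. The record layout and field names are assumptions made for this example. A digest of this kind is not in itself an electronic signature, and the sketch is not a compliance implementation.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(payload):
    """SHA-256 of the payload in a canonical (sorted-key) JSON encoding."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_record(trace, operator, instrument_id, acquired_at=None):
    """Bundle a trace with a computer-generated time stamp and a digest."""
    if acquired_at is None:
        acquired_at = datetime.now(timezone.utc).isoformat()
    payload = {
        "instrument_id": instrument_id,
        "operator": operator,
        "acquired_at": acquired_at,
        "trace": list(trace),
    }
    return {"payload": payload, "sha256": _digest(payload)}


def verify_record(record):
    """True if the payload still matches the digest stored with it."""
    return _digest(record["payload"]) == record["sha256"]


if __name__ == "__main__":
    record = make_record([0.01, 0.02, 0.85, 0.03], operator="analyst_1",
                         instrument_id="CE-01")
    print("record intact:", verify_record(record))
```

Because the data never exists as a physical gel, the time stamp and integrity check are created at acquisition rather than reconstructed afterwards from scans and photographs.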
6. Proteomic applications of advanced CE systems
In the LFII™ system ‘Peregrine’, recently introduced by deltaDOT (www.deltadot.com), protein, DNA and RNA sizing can be accomplished without the use of labels at all.
This technology has imaged a range of protein standards from 14.5 to 205 kDa in under 15 minutes. Wider dynamic ranges are possible. E. coli cell lysates have also been separated at similar rates (Figure 2). LFII™ can achieve ‘virtual’ resolutions, which allow unprecedented separation and analysis of individual proteins and groups of proteins in complex mixtures. Peak shape information is retained, and such peaks can be associated with the fully analysed peaks. For extra analytical power, the peak height (as opposed to the peak area) gives an accurate measure of the protein concentration. Smaller proteins and peptides may also be analysed on these advanced systems; Figure 3 shows a comparative analysis of peptide standards on the system. The system has also been extensively used to analyse chemicals such as thiourea and caffeine.
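Sizing an unknown protein against a set of standards is commonly done with a calibration curve. The sketch below assumes the widely used approximation that the logarithm of molecular weight varies linearly with migration time in a sieving matrix, and fits that line by least squares. Whether this particular model is the one used in the Peregrine software is not stated here. Only the 14.5 and 205 kDa end points come from the text; the intermediate standards and every migration time are invented for illustration.

```python
import math


def fit_size_calibration(migration_times, masses_kda):
    """Least-squares fit of log10(mass) = intercept + slope * time."""
    logs = [math.log10(m) for m in masses_kda]
    n = len(migration_times)
    mean_t = sum(migration_times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in migration_times)
    sxy = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(migration_times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def estimate_mass_kda(migration_time, slope, intercept):
    """Molecular weight (kDa) predicted for a peak at the given time."""
    return 10 ** (intercept + slope * migration_time)


if __name__ == "__main__":
    # Hypothetical standards: masses in kDa, migration times in minutes
    standard_masses = [14.5, 45.0, 116.0, 205.0]
    standard_times = [6.0, 8.5, 11.0, 12.5]
    slope, intercept = fit_size_calibration(standard_times, standard_masses)
    unknown = estimate_mass_kda(9.6, slope, intercept)
    print(f"unknown peak at 9.6 min is roughly {unknown:.0f} kDa")
```

The same fitted line can be reused for every peak in a run, which is one reason inherently digital peak data is quicker to interpret than bands judged by eye.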
A final example of this system’s capability in proteomics is the detection of anomalies in protein analysis. Figure 4 shows the detection of an extra gene product in Genetically Modified (GM) crops. The ability to detect anomalies such as this has direct application to drug discovery in the field of biomarker proteins. Disease and trauma cause various reactions in the body. Disease marker proteins are one of a fairly new group of phenomena being used to detect diseases. These are proteins whose production in the cell the body increases or reduces in direct reaction to the problem it is facing. The resultant protein can be used as a diagnostic signal to indicate the presence of a disease or other deleterious state. These proteins can be identified as they appear as an anomaly in the usual pattern of the patient’s protein makeup. Cells, such as liver cells, are composed of thousands of different proteins and these are present in abundances defined by the state of the cell. Rather like a fingerprint, this pattern or protein profile can be used to monitor the cell. During routine cell life these profiles will vary within set parameters. At the onset of a disease one of the first events will be a change in the protein profile. Some proteins will diminish in abundance (down-regulation), while others will increase (up-regulation). These events give important clues as to which proteins are related to the disease state, whether they are causative or a symptom. These changes may be subtle and not manifest themselves until the disease is well developed. This makes them interesting for research into possible disease therapies, but not necessarily as a diagnostic indicator.
Recent research has highlighted a group of proteins whose up-regulation is so dramatic that it allows them to be used as an early phase diagnostic tool. These proteins can undergo increases of several orders of magnitude, allowing them to be used to spot the disease in sufficient time to begin therapy. A good example of such a protein is C-reactive protein (CRP). This protein is intimately linked to a type of cardiovascular inflammation which can lead to heart disease. A positive association between CRP and coronary artery disease has been established. In a survey of 388 British men aged 50-69, the occurrence of coronary artery disease multiplied 1.5 fold for each doubling of CRP level³,⁴. Although there is a wide range of such proteins⁵, CRP may well be the most important. CRP is produced in the body in reaction to trauma, injury, infection such as Staphylococcus sp., or other types of inflammation. The quantity of CRP is directly proportional to the stage of inflammation; the protein can be up-regulated by a factor of thousands.
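The '1.5 fold for each doubling' relationship quoted above can be written as a simple power law. The Python sketch below makes the arithmetic explicit; the function name and the example levels are assumptions for illustration, and the relationship should not be extrapolated beyond the range of CRP levels covered by the survey.

```python
import math


def relative_occurrence(crp_level, reference_level, fold_per_doubling=1.5):
    """Occurrence of coronary artery disease relative to a reference CRP
    level, assuming a fixed fold increase for every doubling of CRP."""
    if crp_level <= 0 or reference_level <= 0:
        raise ValueError("CRP levels must be positive")
    doublings = math.log2(crp_level / reference_level)
    return fold_per_doubling ** doublings


if __name__ == "__main__":
    # Levels are in arbitrary but identical units; only their ratio matters.
    for level in (1.0, 2.0, 4.0, 8.0):
        ratio = relative_occurrence(level, reference_level=1.0)
        print(f"CRP x{level:g}: relative occurrence {ratio:.2f}")
```

Two doublings therefore correspond to 1.5 × 1.5 = 2.25 times the reference occurrence, which shows how quickly even modest changes in a quantifiable marker translate into a meaningful signal.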
The ability to detect such up- (and down-) regulated proteins using label-free technologies may herald a new era in diagnostic analysis of disease.
7. DNA sequencing in advanced CE systems
Since the advent of DNA sequencing, its main area of application has been in high throughput discovery genomics. However, it is difficult to overestimate the need to sequence short templates of DNA. Genetic manipulations of single bases, such as point mutations, or of larger DNA fragments, as in gene silencing, are now routine in many laboratories. This was not the case when DNA sequencing machines were first launched, and the need to perform short run sequencing was not so apparent. Sequencing is the best possible way to check these processes, and generally a template of less than 100 bases needs to be read. The validation of known single nucleotide polymorphisms (SNPs) is also becoming increasingly important as their role in genetic disorders becomes clear. Again this can be achieved by sequencing templates only a few tens of bases long.
Only a few years ago this sort of work would be performed by isotope labelled Sanger sequencing⁶ and large gel electrophoresis. In a high percentage of modern laboratories all sequencing is now sent out to a core service facility in a university or research institute. While not very expensive for an individual template (typically ~$20 or less), the costs can add up for a large facility with many workers performing genetic manipulations, and this will rapidly become astronomical when thousands of SNPs have to be quickly validated and correlations between them analysed.
Another problem is time. DNA sequencing services usually take a minimum of three days. For short sequencing, rapid turnaround is very important. LFII™ DNA sequencing is currently being developed to sequence 200 base templates in about an hour (Figure 5: DNA sequencing on a Merlin prototype).
8. LFII™ and microfluidics systems
In biotechnology, small is indeed beautiful. Huge gains in separation speed, significant reductions in the amounts of material needed for separation and the possibility of massive parallelisation allow for far better systematic controls in the analysis of biomolecules. For this reason, miniaturisation has always been perceived as an essential aspiration for LFII™. To understand how microfluidic systems can be adapted for LFII™, we must first explore the fundamentals of the electrophoresis process. Conventional CE resolution is mainly limited by the electrophoresis process itself. Improvements that can be made include using higher voltage, tuning the injection, controlling the electro-dispersion and enhancing other parameters. The choice and optimisation of the sieving matrix is also a major resolution factor. Many of the currently used sieving matrices are non-gel based, and consist of long chain carbohydrate molecules. These types of materials have several advantages over polyacrylamide and agarose-based systems, not only in cost and health and safety issues, but also in the increased simplicity in filling complex microfluidic structures. One of the main matrices used is Polyethylene oxide (PEO). PEO is a polymer comprising CH₂-CH₂-O units of varying chain lengths from 1 × 10⁶ to 9 × 10⁶. PEO has good UV transparency properties at the appropriate wavelengths and has low viscosity (at the lower molecular weights) to enable chip filling. PEO also binds to the silica walls of the capillaries. While this might appear sub-optimal, in fact it is an immense benefit as this attribute suppresses Electro-Osmotic Flow (EOF – the tendency for chemically created reverse fields to form at the matrix/capillary wall interface). EOF can cause major electrophoretic problems, but it can also be exploited as a technique to enhance protein and chemical separations⁷.
These facts make PEO particularly suited to microfluidic LFII™ systems, and we anticipate that there will be an introduction of a range of chip-based LFII™ systems which take advantage of the new technologies available in the near future (Figure 6).
Chip technologies and the associated benefits of miniaturisation are definitely the way forward in molecular biology and therefore drug discovery. The advantages in cost, while minimal in the development phase, will lead to the creation of high throughput devices featuring disposable consumable elements that are simple to use and within the scope of Point of Care (PoC). PoC systems that allow a healthcare worker to diagnose a patient quickly and cheaply are the gateway to personalised medicine, the ‘Holy Grail’ of healthcare in the 21st century.
Label-free analysis of biomolecules will potentially transform the genomic and proteomic research landscape. The advantages in terms of efficiency, resolution, reproducibility and throughput are clear. LFII™ will find applications in such diverse areas as regulatory approval and biomarker discovery. However, as with all disruptive technologies, gaining the critical acceptance of opinion leaders and research peers will be essential if current technologies are to be superseded. The approach offered by LFII™ provides the disruptive step-change in capability that our industry needs to face future challenges, while allowing laboratory workers to continue the established and trusted procedures of capillary electrophoresis.
The label-free analysis of proteins and nucleic acids will undoubtedly enhance efficiencies in the drug discovery process, but there may be other potential spin-off benefits in the quality control of protein/gene therapeutics, forensic science and biodefence which will be worthy of examination.
Dr Stuart Hassard is Head of Biology and cofounder of deltaDOT. He is responsible for deltaDOT’s IP portfolio, bio-defence programme and its label-free DNA sequencing programme, and led the biochip effort in its early days. Stuart
1 Hames, BD (1998). Gel Electrophoresis of Proteins 197. The Practical Approach Series.
2 http://www.ntpsystems.com/21CFRwp.asp?engine=adwords!3230&keyword=%2821+cfr+11%29&match_type
3 Mendall, MA. Inflammatory responses and coronary heart disease. BMJ, 1998. 316(7136): p. 953-954.
4 Riker, PHP. Prospective studies of C-reactive protein as a risk factor for cardiovascular disease. J Investig Med, 1998. 46: p. 391-395.
5 Tokac, MOA, Aktan, M, Altunkeser, BB, Ozdemir, K, Duzenli, A, Gok, H. The role of inflammation markers in triggering acute coronary events. Heart Vessels, 2003. 18(4): p. 171-176.
6 Sanger, F, Nicklen, S and Coulson, AR. DNA sequencing with chain-terminating inhibitors. PNAS, 1977. 74: p. 5463-5467.
7 Chang et al (2004). Dynamic Coatings for Selectable, Buffer-Insensitive Electroosmosis. American Laboratory News, 36, 8.