HTS Systems: impact of reliability on downtime and drug discovery. Winter 10
Few survey respondents appear to have given much thought to the impact of HTS systems’ reliability on their company’s research enterprise as a whole, and only minimal attempts have been made to assign an economic value to the downtime associated with HTS systems. In contrast, system integrators are actively working to deliver improved reliability and have identified and implemented many changes to their hardware and software over recent years to minimise downtime. A call is made for drug discovery companies to give greater internal recognition to the significant contribution HTS systems make. Equally, HTS system operators need to take greater ownership of that role, champion their impact more widely across drug discovery, and embrace the challenge of 24-7 system operation. Somewhat belatedly, companies are waking up to the fact that HTS system downtime represents a failure to fully exploit a strategic resource, and that this failure affects their bottom line.
It is widely observed that an impromptu visit to a Pharma robotic screening facility is unlikely to find an integrated HTS assay system in full operation (ie, performing any obvious visible task). The typical explanation is that the system is waiting for a command or is in the middle of an incubation period. However, the impression persists that many of these systems are under-utilised, and that industrial 24-7 operation has rarely been applied. It is also suspected that the underlying cause of system downtime is poor reliability and a lack of operational robustness, such that some groups are reluctant to run assays when key personnel are not on site, eg outside normal working hours. Now, as HTS groups are put under greater pressure to achieve more with fewer resources, some companies are taking a deeper look at the return they are getting from these big investments.
Increasingly there is a realisation that the functional operation and output of an HTS system cannot be considered in isolation: such systems play a pivotal role in the success of an organisation’s drug discovery process, so system failure (or lack of availability) carries a cost. With this in mind, HTStec recently undertook a survey to collect opinion on the reliability of automated HTS assay systems and the effect of reliability on output¹. The objective was to understand the impact of HTS system reliability on the company’s research enterprise as a whole and to assign an economic value to the downtime of one of these HTS systems. Feedback was obtained from the supervisors, operators, owners and engineers that support HTS robotic assay systems. In this survey an ‘HTS system’ was defined as having robotically integrated capabilities for: 1) reagent/liquid dispensing into assay plates; 2) assay plate incubation or holding; and 3) assay plate reading. Systems reviewed had an on-board storage capacity of at least 150 assay plates and were able to process (screen) plates at a rate of at least 10,000 wells per 24h day. Assay workstations and other automated systems that primarily reformat compounds or create assay-ready plates were excluded from the analysis. In the subsequent discussion we refer to an integrated HTS system as a ‘system’.
Typical system downtime
Survey respondents reported a mean system downtime (ie, not operating for any reason) of 8.1 days per month (Figure 1). In addition, 40% of respondents claimed their system was down for 10 days or more per month. This represents a significant under-utilisation of a resource that carries both a large investment cost and a high operational cost: on average, each system has a team of three FTEs devoted to its operation. On closer investigation, the majority (61%) of system downtime (ie, time the system was not being used for screening) was idle time, with scheduled repairs or maintenance accounting for 14% and other (unclassified) downtime for 6% (Figure 2). The remaining 19% of downtime was attributed to unscheduled activities (ie, system breakdown or malfunction). This equates to about 1.5 days per month when systems are down because of poor reliability.
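The arithmetic behind the 1.5-day figure is easy to check. The short Python sketch below simply splits the reported mean of 8.1 downtime days per month by the category shares in Figure 2 (8.1 × 0.19 comes to about 1.54 days of unscheduled downtime). The constant and function names are illustrative only, and because the shares are survey means the per-category days are indicative rather than measured values for any one system.

```python
# Survey means quoted in the text (Figures 1 and 2).
MEAN_DOWNTIME_DAYS_PER_MONTH = 8.1
DOWNTIME_SHARES = {
    "idle": 0.61,
    "scheduled repair/maintenance": 0.14,
    "unscheduled (breakdown/malfunction)": 0.19,
    "other (unclassified)": 0.06,
}


def days_by_category(total_days: float, shares: dict) -> dict:
    """Split mean monthly downtime (days) into days per downtime category."""
    # The categories are meant to account for all downtime, so shares must sum to 1.
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("downtime shares must sum to 1")
    return {name: total_days * share for name, share in shares.items()}


if __name__ == "__main__":
    breakdown = days_by_category(MEAN_DOWNTIME_DAYS_PER_MONTH, DOWNTIME_SHARES)
    for name, days in breakdown.items():
        print(f"{name:<38} {days:.2f} days/month")
```

A lab could substitute its own logged downtime and category shares to see how much of its lost time is genuinely attributable to reliability rather than to idle capacity.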
System functioning at an acceptable level
Respondents reported that, even when their systems were operational, they functioned at an acceptable level for only 82% of that time (Figure 3). In other words, on average, for 18% of operational time systems performed at a level respondents considered unacceptable.
Data points excluded
It is therefore not surprising to learn that respondents excluded a mean of 9% of all data points generated on their system due to an unacceptable level of quality, some of which may have been a direct consequence of poor system reliability (Figure 4).
Cause of greatest downtime
Peripheral component hardware (eg, readers, liquid handlers) was ranked as the most frequent cause of system problems and as having the greatest impact on downtime. This was followed by integration hardware (eg, robots, plate handlers and movers), integration software (eg, scheduler, device drivers) and then peripheral component software (Figure 5).
What contributes to system failure?
The introduction of a new assay readout/technology was rated as having the greatest positive effect on the successful operation of a system, while reagent characteristics (eg, viscosity, homogeneity, surface tension) were rated as having the greatest negative effect, contributing to failures (Figure 6). Weekend and holiday operation, cell-based assays and longer incubation times (ie, extended assay duration) were broadly neutral in their effect on system failure. Interestingly, most hardware changes and software patches or upgrades were associated with a negative effect on system operation, as were unplanned absences of key personnel.
Realistic advantage of improved reliability
When respondents were asked what they thought would be the realistic advantages of a more reliable system, they rated increased user (customer) satisfaction as the most likely outcome (Figure 7). This was followed by running screens more quickly and then repeating fewer wells (where data do not meet acceptance criteria). The least likely advantages were identifying weaker inhibitors, reducing the number of false positives and discovering more lead series. These responses suggest that most respondents do not think much beyond their immediate task (ie, running the screen on their system for the customer group) and tend to see the advantages of improved reliability primarily in these terms, rather than in terms of how it might affect downstream activities or the drug discovery process more widely.
This finding was reinforced by respondents’ answers to the question of what HTS (ie, their system) has achieved in their organisation to date (Figure 8). More data points screened was rated the greatest achievement of HTS in respondents’ organisations. This was followed by enabling targets to be screened faster, generating better quality data points, and then enabling many more targets to be screened. Rated as least achieved were being responsible for compounds in the clinic, increasing the bottom line (ie, company profits) and bringing new drugs to market earlier. Respondents clearly view their efforts in running HTS systems as somewhat detached from their company’s goals as a whole.
This view was further reinforced by the answers to the question: does system reliability affect your organisation’s drug discovery success? (Figure 9). The largest group (35%) of respondents thought system reliability had a minor effect on their organisation’s drug discovery success, and a further 8% claimed it had no effect. The remaining 57% thought its effects were greater (rated moderate, important or major), but only 17% believed system reliability had a major effect, and 25% an important effect, on their organisation’s drug discovery success. If having a fully operational system is key to successfully running screens, it is difficult to see how it could not also be of major importance to an organisation’s drug discovery success.
Putting a cost on unscheduled downtime
When respondents were asked to quantify the cost of unscheduled downtime or system failure, the mean cost of lost operation was estimated to be $5,800/day (24h) (Figure 10). However, estimates varied widely, with 19% suggesting a value of <$500/day. Perhaps most interestingly, 44% of respondents did not know or were not able to assign a cost to their downtime, suggesting they had not fully considered the cost associated with it. Respondents were, however, able to estimate the uptime that might be gained if their systems were more reliable: a median of two additional operational days per month and one additional primary screen per year. The most tangible cost of poor system reliability can probably be gauged from the estimated cost of repeating the unacceptable (excluded) wells in each screen; here respondents reported a mean cost (additional reagent and plate cost only) of $15.3K per biochemical screen and $16.4K per cell-based screen.
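Combining two of the survey figures gives a rough, illustrative order of magnitude for what reliability is worth to a single system. This combination is our own back-of-envelope calculation, not a result reported by the survey: it multiplies the mean cost of $5,800/day by the roughly 1.5 days/month of unscheduled downtime (giving $8,700/month, or $104,400/year) and by the median two extra operational days/month (giving $11,600/month). It ignores the wide spread of estimates and the 44% who could not assign a cost, so labs should substitute their own figures; the names are illustrative.

```python
MEAN_COST_PER_DAY_USD = 5_800      # survey mean cost of lost operation per 24h day (Figure 10)
UNSCHEDULED_DAYS_PER_MONTH = 1.5   # unscheduled downtime derived from Figures 1 and 2
EXTRA_UPTIME_DAYS_PER_MONTH = 2    # median extra operational days if systems were more reliable


def cost_of_days(cost_per_day: float, days_per_month: float) -> tuple:
    """Return the (monthly, annual) value of a number of system days per month."""
    monthly = cost_per_day * days_per_month
    return monthly, monthly * 12


if __name__ == "__main__":
    for label, days in [("unscheduled downtime", UNSCHEDULED_DAYS_PER_MONTH),
                        ("extra uptime gained", EXTRA_UPTIME_DAYS_PER_MONTH)]:
        monthly, annual = cost_of_days(MEAN_COST_PER_DAY_USD, days)
        print(f"{label}: ${monthly:,.0f}/month, ${annual:,.0f}/year")
```

Note that this values lost system time only; the $15.3K to $16.4K per screen for repeating excluded wells would be an additional, separate cost.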
What are system integrators doing to improve system reliability?
Rather than review a list of features and benefits of the various system integrations on offer, we asked system integrators what they thought contributed to poor reliability and what, specifically, they are doing or have done to improve the reliability of their systems.
Agilent Automation Solutions (www.agilent.com/chem/automation) has a reputation for delivering and supporting integrated systems with long walkaway times and the highest possible throughput. Several factors affect system reliability, but in the end there are three key factors in achieving the desired 24/365 availability: design for automation; test the system as close as possible to its final configuration and environment; and ensure that change control is established by the end user. Although these may appear obvious, reliability is often lost in the detail, and careful impact analysis and log control are often the best methods to maintain and troubleshoot reliability. Before reliability can be maintained on-site, the integrator needs to achieve it in the built system. Four aspects are essential for this task. Firstly, the devices, their software and the labware used must be made for automation; most importantly, the regular tuning and service that are acceptable for manually-used devices might not be compatible with fully integrated operation. Agilent rigorously pre-evaluates any new device and tests it once integrated, with any required modifications or additional handling software. Occasionally, the use of devices with a poor track record cannot be avoided, and it is then that the user has to consider a system that can handle true device pooling, ie one that, on device failure, automatically removes the device from the pool and continues the protocol. Secondly, the use of very low maintenance, scientist-friendly robots is essential. This has driven Agilent to build dedicated robots, hubs, translators and conveyors for the BioCel platform, as well as a large number of automation building blocks such as plate sealers, centrifuges and others. Thirdly, the software environment is an often overlooked factor with a huge influence on overall reliability.
Most systems depend on software for error handling and recovery, and a properly tuned and backed-up system is essential. Finally, there is the human factor: a missing tube or a plate 2mm off its stage can each bring a system down. The Agilent VWorks software uses a patented error library that, besides allowing automatic plate rotation, pick-and-place retry and other predefined actions, can also quarantine suspect plates or trigger a controlled early end to the protocol to protect samples and avoid wasting reagents (Figure 11).
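Device pooling of the kind described above can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not VWorks or any other vendor's implementation: the Device, DeviceError and DevicePool names are our own assumptions, plates are dispatched round-robin across interchangeable devices, and a device that raises an error is moved offline while the same plate is retried on the next device in the pool. When no devices remain, the sketch raises an error so that the caller can perform a controlled stop.

```python
from dataclasses import dataclass, field


class DeviceError(Exception):
    """Raised by a (hypothetical) device driver when an operation fails."""


@dataclass(eq=False)  # compare devices by identity, not by field values
class Device:
    name: str
    fail_on: set = field(default_factory=set)  # plate IDs that simulate a failure

    def process(self, plate: str) -> str:
        if plate in self.fail_on:
            raise DeviceError(f"{self.name} failed on {plate}")
        return f"{plate}@{self.name}"


class DevicePool:
    """Round-robin pool of interchangeable devices; failed devices are taken offline."""

    def __init__(self, devices):
        self.online = list(devices)
        self.offline = []
        self._next = 0

    def process(self, plate: str) -> str:
        # Keep trying online devices until one succeeds or none are left.
        while self.online:
            device = self.online[self._next % len(self.online)]
            try:
                result = device.process(plate)
            except DeviceError:
                # Remove the failed device from the pool and retry the same plate.
                self.online.remove(device)
                self.offline.append(device)
                continue
            self._next += 1
            return result
        raise RuntimeError(f"no devices left for {plate}: controlled stop required")


if __name__ == "__main__":
    pool = DevicePool([Device("reader1", fail_on={"P3"}), Device("reader2")])
    print([pool.process(p) for p in ["P1", "P2", "P3", "P4"]])
    print("offline:", [d.name for d in pool.offline])
```

In the usage example, reader1 fails on plate P3, is taken out of the pool, and P3 and all later plates are read on reader2 without stopping the run.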
Beckman Coulter (www.beckmancoulter.com) develops systems that improve reliability through software functionality. Its systems combine best-practice runtime reliability features with method development software for discovering errors before a run begins. The software knows the positions of obstructions on the liquid handler deck, prevents the pod from moving into a location where a crash could occur (collision detection) and reroutes gripper movements around obstacles (obstacle avoidance). Labware definitions prevent tips going lower in the well than the defined safe height, and one or more tip touches can be made to any side or corner of the well. Volume tracking within methods prevents aspirating more liquid from a well than it contains, or overfilling a well. Using an accurate timing, location and resource model, SAMI EX’s planning scheduler allows a method to be reviewed without simulating the run. SAMI EX’s runtime software displays a view of the method showing plates, numbered by family, moving through the flow chart. Any device failure can be addressed programmatically at the device module level, automatically by the system executive using response definitions, or interactively by prompting the user (through email alerts, alarms and prompts). To assist in error responses, every plate in the system is listed with its current location, position in the method and properties. Device modules can be reloaded or restarted, the method rescheduled, or families added to the method being run. Beckman’s development processes adhere to a strict ISO 9001:2008-registered quality system. Systems are set up, maintained and serviced by a worldwide service organisation (Figure 12).
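Volume tracking of this kind amounts to checking each transfer against the known contents of the source well and the capacity of the destination before the move is committed. The Python sketch below is a hypothetical illustration of that idea, not SAMI EX's implementation; volumes are in microlitres and the Well, transfer and VolumeError names are our own.

```python
class VolumeError(Exception):
    """Raised when a transfer would over-aspirate a source or overfill a destination."""


class Well:
    def __init__(self, capacity_ul: float, volume_ul: float = 0.0):
        if not 0 <= volume_ul <= capacity_ul:
            raise ValueError("initial volume must lie between 0 and capacity")
        self.capacity_ul = capacity_ul
        self.volume_ul = volume_ul


def transfer(src: Well, dst: Well, volume_ul: float) -> None:
    """Move liquid between wells, checking both limits before changing any state."""
    if volume_ul <= 0:
        raise ValueError("transfer volume must be positive")
    if volume_ul > src.volume_ul:
        raise VolumeError(f"cannot aspirate {volume_ul} uL: source holds {src.volume_ul} uL")
    if dst.volume_ul + volume_ul > dst.capacity_ul:
        raise VolumeError(f"dispensing {volume_ul} uL would overfill the destination")
    # Both checks passed, so update the tracked volumes together.
    src.volume_ul -= volume_ul
    dst.volume_ul += volume_ul


if __name__ == "__main__":
    src, dst = Well(200, 50), Well(100, 80)
    transfer(src, dst, 15)      # allowed: src now holds 35 uL, dst 95 uL
    try:
        transfer(src, dst, 10)  # rejected: would take dst to 105 uL
    except VolumeError as err:
        print("blocked:", err)
```

Checking both limits before updating either well means a rejected step leaves the tracked volumes unchanged, so the method can be corrected without the model drifting from what is actually on the deck.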
In CyBio’s (www.cybio-ag.com) Scheduler, inaccuracies of measurement systems can be compensated for by appropriately tolerant measurement value descriptions, eg specification of lower and upper bounds. Furthermore, in the event of an instrument failure, alternative data results can be generated or the user can be asked to submit alternative data results. A typical example is the generation of substitution barcodes when a barcode reader fails. Instrument failures should be handled automatically as far as possible. However, this requires flexible failure management that is not restricted to local failure evaluation; instead, the overall consequence for the workflow must be taken into account, including the prevention of deadlock conditions. The CyBio® Scheduler software implements multi-level failure management, which ensures appropriate standard failure handling for any use case. The standard failure handling can be and consistency verification. Moreover, within its automated systems, CyBio offers several sensor-based consistency verifications, such as fill level monitoring, microplate detection sensors and many more (Figure 13).
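Both ideas, tolerant value descriptions with lower and upper bounds and substitution barcodes when a reader fails, can be illustrated with a short hypothetical Python sketch. It is not CyBio Scheduler code: the Tolerance and BarcodeService names, the SUBST- barcode format, and the assumption that a failed read surfaces as an OSError are all our own.

```python
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tolerance:
    """A tolerant value description: accept any measurement within [lower, upper]."""
    lower: float
    upper: float

    def accepts(self, value: float) -> bool:
        return self.lower <= value <= self.upper


class BarcodeService:
    """Wraps a barcode reader and issues substitution barcodes if a read fails."""

    def __init__(self, reader: Callable[[], str]):
        self._reader = reader
        self._counter = itertools.count(1)
        self.substituted = []  # record of issued substitutes, for later reconciliation

    def read(self) -> str:
        try:
            return self._reader()
        except OSError:  # assumption: the reader driver signals a failed read with OSError
            code = f"SUBST-{next(self._counter):06d}"
            self.substituted.append(code)
            return code


if __name__ == "__main__":
    fill_level = Tolerance(95.0, 105.0)  # eg acceptable fill level as % of nominal
    print(fill_level.accepts(101.2), fill_level.accepts(106.0))

    def broken_reader() -> str:
        raise OSError("no read")

    print(BarcodeService(broken_reader).read())
```

Keeping a list of issued substitutes matters: the run can continue, but each substituted plate still has to be matched to its real identity afterwards.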
GNF Systems’ (www.GNFSystems.com) high throughput screening instruments are designed, developed, built and tested with reliability as the core objective. Beginning at the concept stage, its applications and engineering teams have spent significant time and effort refining and simplifying processes and designs to achieve maximum screening utility with the least complexity. GNF Systems then designs its hardware and systems with the goal of zero failures. Nothing is left to chance in the design of its integrated genomics and high throughput screening platforms. For example, GNF Systems robots never simply set a microplate down and hope it hasn’t drifted out of position; microplates are always positively and actively positioned at all stages of plate handling and transfer, including within the GNF Systems robotic microplate gripper. Reliable screening systems enable researchers to concentrate on drug discovery rather than automation issues. To this end, GNF Systems designed and built its own lines of bulk reagent washer/dispensers, automated incubators and compound transfer devices. It also integrates a range of third-party instruments, including readers and acoustic dispensers; however, such instruments are extensively tested and validated to meet GNF Systems’ reliability objectives before being incorporated into a GNF System. Innovations to improve the reliability of every hardware and software component of a GNF system, together with lessons learned from a decade of use, have led to an automated platform that excels in assay performance, speed and, most importantly, reliability (Figure 14).
HighRes Biosolutions’ (www.highresbio.com) service records show that approximately 90% of service tickets relate to the failure of third-party integrated devices. Accordingly, HighRes is putting considerable research and development effort into improving device reliability. It found that, among these device-related tickets, the vast majority of failures involved storage devices, ie incubators, freezers and carousels. To address this, HighRes has developed a complete line of highly reliable storage devices and other instruments. Another way to improve system reliability is to build in redundancy, so that a single point of failure cannot ruin an entire run. To achieve redundancy, several instances of a given component can be installed on a system at once. Although effective, this uses valuable robot real estate at the expense of other instruments that could increase the functionality and flexibility of the system. HighRes’ docking technology allows redundant instruments to remain off-line until they are needed, at which point they can be hot-swapped with failed devices. Lastly, HighRes has developed powerful software error-handling routines so that, when instruments do fail, the user has the greatest chance of recovering as much of the run as possible (Figure 15).
Experience at paa (www.paa.co.uk) has shown that there are a number of causes of unreliability in HTS workcells. The first is equipment reliability within the workcell: careful selection of instrumentation for reliability as well as functionality is critical. The ability to trap common functional errors and instigate remedial actions before reporting the error can improve system reliability. Non-standard SBS microplates and lids can cause unexpected errors; although nominally the same size, plates can vary from batch to batch. Errors are reduced by careful robot teaching and the use of compliance locators (plate nests) in the work area. Robot accuracy and gripper design are crucial for system reliability. Using 3D modelling and software simulation packages, paa designs its systems to reduce plate placement and transportation failures. Through system simulation, it is able to determine plate cycle time and identify potential sources of failure before the system is constructed. By incorporating factory automation methodologies, such as industrial components and failure mode analysis, our systems are built to react to both environmental changes and instrument failures. The workcell control systems also anticipate and warn of upcoming issues that demand user intervention, such as low levels of liquids and consumables. All errors can be fed back to the user via email, and paa has implemented workcell panelling that changes colour to indicate system health. Our aim is to keep users advised of system status and to attempt to recover from an error before it becomes fatal (Figure 16).
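Anticipating consumable shortfalls can be as simple as comparing what is loaded with what the rest of the run will use. The sketch below is a hypothetical Python illustration with made-up consumables and usage rates, not paa's control software; in a real workcell the warnings would be routed to email or status panelling rather than returned as strings.

```python
from dataclasses import dataclass


@dataclass
class Consumable:
    name: str
    level: float          # amount currently loaded (eg mL of buffer, number of tip boxes)
    use_per_plate: float  # amount consumed per plate processed


def shortfall_warnings(items, plates_remaining: int) -> list:
    """Return a warning for each consumable that will run out before the run ends."""
    warnings = []
    for item in items:
        if item.use_per_plate * plates_remaining > item.level:
            # Whole plates that can still be processed with what is loaded.
            plates_possible = int(item.level // item.use_per_plate)
            warnings.append(
                f"{item.name}: runs out after {plates_possible} of {plates_remaining} plates"
            )
    return warnings


if __name__ == "__main__":
    loaded = [Consumable("assay buffer (mL)", 250.0, 2.5), Consumable("tip boxes", 40, 0.25)]
    for message in shortfall_warnings(loaded, plates_remaining=120):
        print("WARNING:", message)
```

Reporting how many plates can still be processed, rather than just flagging a low level, tells the operator how long they have before intervention is needed, which is what makes an overnight or weekend run plannable.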
In labs from industry to academia, researchers are actively increasing their number of applications and seeking the latest proven technologies to improve performance and productivity, while concurrently reducing costs and streamlining their research. To that end, high throughput screening (HTS) system reliability is critical, as systems must enable a broad range of applications without compromising performance or accuracy. PerkinElmer’s (www.perkinelmer.com) cell::explorer™ and plate::explorer™ systems focus strongly on proper and intelligent integration of each individual component to increase reliability and enable robust 24/7 operation. Although the application techniques required to perform HTS are standardised, the intended functionality and performance must be properly analysed when selecting and integrating devices. Moreover, the components should be monitored and controlled with sophisticated scheduling software to further ensure 24/7 performance without compromising system stability or incurring delays due to non-performance of individual devices and components (eg, automated flushing to clear clogged tubing, or shutting down lasers, can improve the performance or lifetime of the devices). Proper pre-installation testing, service and technical support also help ensure that an HTS system performs at its best. Understanding the customer’s needs and current laboratory configuration, and testing a system prior to shipment, are essential. In this industry, where accurate results and system reliability are the very backbone of a facility’s success throughout the drug discovery process, these critical steps are far too often overlooked. During the assessment and implementation stage, training and on-site support can significantly affect the overall longevity and performance of the complex integrated systems required to run high throughput screening applications effectively (Figure 17).
HTS combines laboratory instruments, a robot to move labware, and software to manage the process to a defined schedule. Instruments, robotics, labware and software all contribute to system failures. Software is recognised as a common cause of lost productivity; it can constrain HTS systems, making it difficult to replace unreliable hardware, and can provide poor support for managing common failure modes. RTS Life Science (www.rts-group.com) has developed the new Sprint™6 process management and scheduling software to improve productivity and increase uptime on any existing hardware platform. Instruments are typically designed for laboratory use, not 24/7 operation, and failures should be expected. Within Sprint™6 software, features such as device pooling and critical schedule points ensure valuable assay reagents and data are preserved in the event of an error or failure. Powerful reporting and messaging features can alert engineers when problems occur, and configurable automated error management helps resolve problems without user interaction. Hardware configuration management and abstraction, and a large library of verified device drivers, enable users to swap out unreliable instrumentation and even transfer assays between systems comprising different hardware with comparative ease. Modern software with drag-and-drop interfaces, simulation and assay statistics modules removes the trial and error involved in developing schedules. New assays can even be validated on virtual platforms in the office before running on hardware. On-screen visual representations of the hardware platform and assay workflow, updated live, provide powerful diagnostic insight and help operators quickly identify the source of problems. Good software can make all the difference (Figure 18).
Poor reliability in complex automation systems, such as advanced HTS platforms, stems from several root causes, principally: hand-offs between robots and other devices, where the slightest misalignment or movement can cause failure; poor gripper or nest design that is intolerant of typical variations; poor or damaged barcodes; dimensional variation between supposedly standard SBS plate types; and, in many cases, significant variation within the same type of labware, which can not only affect system reliability but also introduce systematic variation in assay results. Other factors, such as failure of pneumatic actuators or vacuum pick-ups, or drift in make/break sensor positions, can also lead to system failure. TAP (www.theautomationpartnership.com) has developed numerous strategies to cope with these variations and risks. Low-level control software is coded to automatically retry operations that may occasionally fail, preventing an operator call-out and the associated process disruption for intermittent faults. Robotic teach points, particularly for storage racking/hotels, can be set automatically using a non-contact system of lasers and flags, ensuring that positional variation is easily and consistently compensated for. Grippers and nests are designed with sufficient lead-ins and self-datuming features to accommodate typical variations in labware and positions, including differences between the pick and place locations in the same nest. Where necessary, TAP also designs and supplies labware with the demanding tolerances that reliable automated operation may require, such as glass ‘QC’ pipettes that are guaranteed to be straight enough to always engage with the corner of a flask or plate (Figure 19).
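The retry strategy described for low-level control can be sketched as a small wrapper around a single device operation. This is a hypothetical Python illustration, not TAP's control code: TransientFault and with_retries are our own names, and the default attempt count and delay are arbitrary. Faults classed as transient are retried a bounded number of times; only a fault that persists is passed up to the caller, which is where an operator call-out would be raised.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientFault(Exception):
    """An intermittent fault (eg a missed vacuum pick-up) that is worth retrying."""


def with_retries(op: Callable[[], T], attempts: int = 3, delay_s: float = 0.5) -> T:
    """Run op, retrying transient faults before escalating to an operator call-out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except TransientFault:
            if attempt == attempts:
                raise  # persistent fault: let the caller alert an operator
            time.sleep(delay_s)  # brief pause before the next attempt
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = []

    def flaky_pick() -> str:
        # Simulated plate pick that misses twice, then succeeds.
        calls.append(1)
        if len(calls) < 3:
            raise TransientFault("missed pick-up")
        return "placed"

    print(with_retries(flaky_pick, attempts=3, delay_s=0.1), "after", len(calls), "attempts")
```

Only exceptions deliberately classed as transient are retried; anything else (a collision, say) propagates immediately, since blindly repeating a hazardous move could make matters worse.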
The increasing number of applications that can be automated on liquid handling workstations has necessitated the development of process security features to safeguard complex liquid handling operations. These include innovative software and hardware solutions, and magnetic sensors to confirm correct placement of labware. By actively monitoring user interactions with the worktable, the software is able to alert the operator if labware has been loaded incorrectly or removed unexpectedly, further reducing the risk of errors. With a growing reliance on automated liquid handling and high throughput techniques, the cost of failed or aborted runs is increasing almost exponentially. In addition to the loss of productivity, the loss of expensive reagents or compounds can represent a significant cost to laboratories. Error handling protocols not only safeguard against system errors, but are also an important mechanism for dealing with complex liquid handling requirements and avoiding unnecessary losses through operator error. For some applications, it is important that the user is able to take corrective action at the earliest opportunity. Remote monitoring systems allow the operator to protect assays without having to physically supervise operation, offering increased walkaway time. A variety of strategies for remote monitoring are now available. Tecan’s CNS provides remote monitoring via an intranet or internet connection, as well as on portable devices such as the iPhone®. These systems provide the operator with real-time monitoring of automated laboratory processes on multiple systems, allowing them to perform other tasks from remote locations during processing (Figure 20).
Thermo Fisher Scientific (www.thermo-scientific.com/automate) partners with its customers to ensure they are involved in every aspect of system development, including hardware selection, specification of software requirements and instrument integration. The key factor affecting system reliability is appropriate component selection, covering both the automation and the instrumentation. Thermo Fisher Scientific offers a comprehensive line of movers designed specifically for microtiter plate handling, from simple instrument loading to complex multi-robot integrations, to remove automation issues. It has implemented a host of instrument and software strategies to enhance overall reliability. Often, the weakest link is an instrument with a limited Automation Programming Interface; to compensate for such limitations, Thermo has implemented a wealth of software features. Its innovative scheduling and workflow platform, Thermo Scientific Momentum, is feature-rich, with appropriate recovery options and verbose contextual error messaging that enable swift recovery from instrument failure. In fact, it is the leading innovator of error recovery, having pioneered the development of unattended run modes. Thermo’s ‘unattended mode’ automatically attempts to recover from issues, limiting their impact on overall system performance. Momentum is the only scheduling software that offers comprehensive deadlock avoidance, detection and recovery. Momentum identifies potential bottlenecks in advance and then schedules to resolve the issue. The company’s comprehensive portfolio of key technologies, combined with advanced software solutions, provides the industry with the highest standards in enabling reliability (Figure 21).
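Deadlock, mentioned by both CyBio and Thermo Fisher Scientific, arises when, for example, plate A is waiting for a device held by plate B while B is waiting for a device held by A, so neither can proceed. A classic way to detect it is to look for a cycle in a "wait-for" graph. The Python sketch below is a textbook depth-first cycle search offered only as an illustration; it is not how Momentum or the CyBio Scheduler work, the find_deadlock name is our own, and avoidance (scheduling so that such cycles never form) is a separate and harder problem.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if no deadlock exists.

    waits_for maps each plate (or task) to the plates holding resources it waits on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge to the current path: a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(waits_for):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # plate1 waits on plate2's reader; plate2 waits on plate1's incubator slot.
    print(find_deadlock({"plate1": {"plate2"}, "plate2": {"plate1"}, "plate3": set()}))
```

Once a cycle is found, recovery means breaking it, for instance by parking one of the plates involved so its device is released; which plate to sacrifice is a scheduling policy decision the sketch leaves open.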
There are a number of factors that contribute to poor integrated system reliability. Four of the main failure points on systems are automated tissue culture incubators, lid removal and replacement, dropped plates and poor software integration. Complicated automated tissue culture incubator designs lead to high failure rates. Wako’s (www.wakousa.com/automation) patented automated incubator is reliable because of its simple design, which relies on the main system arm to reach inside and grab plates. Wako’s incubator has fewer moving parts and less electronics than any other on the market. Lid removal and replacement is another failure point, which Wako has again addressed with a simple design. Wako lids have ‘wings’: when a plate is lowered over a Wako static lid removal station, the ‘wings’ are caught, allowing the lid to sit on the station while the robot hand moves away with the plate. Most robot grippers hold a plate using two fingers, fitted with rubber or pins, that apply pressure to either side of the plate, and dust accumulation, DMSO spills and other factors lead to dropped plates. Wako’s patented grippers hold plates from underneath and on three sides, making dropped plates virtually impossible. Finally, buggy software with poor communication protocols can cause a system to stop for seemingly no reason. These are the most annoying issues faced with an integrated system, often leading to finger-pointing between manufacturers. Wako prides itself on software robustness and is happy to provide customer references (Figure 22).
One of the most interesting findings of the survey is the wide variation in reliability and downtime across respondents’ systems and labs. Overall, the survey findings appear to corroborate the view that HTS systems are associated with a high degree of downtime, a significant proportion of which can be attributed to reliability issues. Given the importance of HTS as the main source of new leads for the pharma industry, it is perhaps surprising how little attention this downtime gets. More critically, few survey respondents appear to have given much thought to the impact of their HTS systems’ reliability on their company’s research enterprise as a whole, and only minimal attempts have been made to assign an economic value to the downtime associated with one of these HTS systems.
In contrast, system integrators are actively focused on delivering reliability and have collectively identified and implemented many improvements to their processes and systems over recent years to minimise downtime. Some of the principal developments adopted by system integrators to enhance reliability include (but are not limited to) the following: 1) improvements to scheduling software functionality, particularly with respect to error-handling capabilities, deadlock avoidance and fault recovery; 2) more extensive component selection, plus greater testing and validation of all third-party instrumentation; 3) designing and building more of their own lines of components and peripheral devices; 4) more rigorous testing of a system, as close as possible to the final environment/application, prior to shipment; 5) improving gripper or nest design to ensure accurate and proper placement of labware; 6) redesign of labware to better accommodate automated handling; 7) development and supply of specific labware manufactured to higher tolerances; 8) deployment of a wider variety of strategies for remote monitoring; 9) increased use of factory automation methodologies such as failure mode analysis, 3D modelling and software simulation packages to identify potential sources of error; and 10) greater emphasis on operator training and on-site support. Readers should be aware that not every system integrator has adopted all of these developments.
To the outsider there does, however, appear to be a disparity between what system integrators believe they have delivered with their systems and what the end-users surveyed have actually achieved with respect to system reliability. Bridging this reliability gap requires both parties to work more closely to better understand the underlying causes of poor reliability, so that improvements can be driven by observation and the confidence needed to promote wider adoption of 24-7 operation can be gained. For example, detailed analysis of the individual survey responses revealed one lab that gave its system relatively high marks for reliability and reported little downtime, but at the same time commented that the software doesn’t work very well and needs to be entirely rewritten.
What this article also highlights is the need for drug discovery companies to give greater recognition to the significant contribution HTS systems make. In addition, the groups tasked with operating such systems need to take greater ownership of that role, champion their impact more widely on the drug discovery process and embrace the challenge of facilitating the widest possible operation of their systems. Tackling downtime is not only about the immediate rectification of system malfunctions or errors; it also means recognising that failure to maximise the use of such strategic assets is costing the industry dearly. DDW
Dr John Comley is Managing Director of HTStec Limited, an independent market research consultancy whose focus is on assisting clients delivering novel enabling platform technologies (liquid handling, laboratory automation, detection instrumentation and assay reagent technologies) to drug discovery and the life sciences. Since its formation seven years ago, HTStec has published more than 50 market reports on enabling technologies and Dr Comley has authored more than 30 review articles in Drug Discovery World. Please contact firstname.lastname@example.org for more information about HTStec reports.