Abstract

Objectives: For 50 years, structure, process, and outcomes measures have assessed health care quality. For clinical laboratories, structural quality has generally been assessed by inspection. For assessing process, quality indicators (QIs), statistical monitors of steps in the clinical laboratory total testing process, have proliferated across the globe. Connections between structural and process laboratory measures and patient outcomes, however, have rarely been demonstrated.

Methods: To inform further development of clinical laboratory quality systems, we conducted a selective but worldwide review of publications on clinical laboratory quality assessment.

Results: Some QIs, like the seven generic College of American Pathologists Q-Tracks monitors, have demonstrated significant process improvement; other measures have uncovered critical opportunities to improve test selection and result management. The Royal College of Pathologists of Australasia Key Incident Monitoring and Management System has deployed risk calculations, introduced from failure mode and effects analysis, as surrogate measures for outcomes. Showing economic value from clinical laboratory testing quality remains a challenge.

Conclusions: Clinical laboratories should converge on fewer (7-14) rather than more (21-35) process monitors; monitors should cover all steps of the testing process under laboratory control and include especially high-risk specimen-quality QIs. Clinical laboratory stewardship, the combination of educational interventions among clinician test orderers and report consumers with revision of test order formats and result reporting schemes, improves test ordering, but improving result reception is more difficult. Risk calculation reorders the importance of quality monitors by balancing three probabilities: defect frequency, weight of potential harm, and detection difficulty.
The triple approach of (1) a more focused suite of generic consensus quality indicators, (2) more active clinical laboratory testing stewardship, and (3) integration of formal risk assessment enhances economic value rather than competing with it.

Keywords: Clinical laboratory testing; Process quality indicators; Pre- and postanalytic testing; Risk assessment; Diagnostic error; Q-Tracks; KIMMS; FMEA

In 1966, Avedis Donabedian, a public health researcher at the University of Michigan, responded to a question from the newly created Health Services Research Section of the US Public Health Service: what is quality in health care? Donabedian replied that health care quality is a set of measurable attributes—structures, processes, and outcomes. For 50 years, Donabedian's framework has guided government regulators, insurers, professional accrediting agencies, practitioners, administrators, and researchers as they assess performance in health care delivery.1-4 For clinical laboratories, over the Donabedian half century, compliance with requirements, demonstrated on inspection to government regulators and/or accrediting agencies, has been the usual means to demonstrate structural quality.5,6 Measurement of clinical laboratory process and outcome quality indicators (QIs) has not reached a similar consensus.

For diagnostic testing and patient monitoring, Donabedian1-3 cited (1) technical competence in performing procedures, (2) appropriateness and completeness of the information that these procedures produce, and (3) success in communicating this information as the three pillars of process quality. As outlined below, a variety of clinical laboratory QIs have used statistical control of production processes to demonstrate quality in these three spheres. Some of these monitors also impressed on observers a need for stewardship (defined below) of the testing process, both before test ordering and after result reporting.
Donabedian1-3 listed patient (1) recovery, (2) restoration of function, and (3) survival as indicators of outcome quality. Connecting attributes of laboratory testing directly to these outcomes has proved challenging. In the absence of measurable direct outcome connections, risk, a concept from human factors engineering, and value, a concept from business economics, have been introduced to link laboratory processes with patient outcomes. This selective review examines progress in measuring clinical laboratory process indicators, the role of clinical laboratory stewardship, and problems connecting process with outcome quality; it suggests monitoring fewer key quality measures and offers risk calculations as criteria for choosing indices; finally, it considers the relation of indicators of clinical laboratory performance to economic value.

Process Quality

Clinical Laboratory Testing Is a Production Process

Production processes are sequences of steps that start from inputs and end with outputs. All real production processes are liable to defects. Clinical laboratory testing's inputs are test orders, identified patients, and samples from those patients; its outputs are reported results. Every production process opens itself to improvement, including prevention of errors, through statistical monitoring.7

Unwanted Variation Is the Focus of Process Control

Regarding process variations, statistical monitoring of quality uses one important term, error, in two distinct ways:

1. Degree of any variation among measurements of any variable
2. Degree of unwanted variation in inputs, process steps, or results that interferes with the process reaching its intended goal

Process QIs quantify the second sort of statistical error.

QIs Monitor Unwanted Process Variation Step by Step

QIs almost always measure frequencies of unwanted events (incidents), which interfere with specific process steps reaching their immediate objectives.
Most often, monitors of unwanted variation divide the number of incidents (defects) by the total number of times the monitored steps' performance was observed, to produce incident rates. Sometimes QIs measure processing (turnaround) times. Quality monitoring also identifies causes of the unwanted variation at monitored steps and, finally, assesses effects of countermeasures introduced to decrease the measured, unwanted variation.

Most Unwanted Variation Occurs in Pre- and Postanalytic Testing Process Steps

Most unwanted variation appears either in the preanalytic phase, before the central, analytic phase steps, which actually generate results, or in the postanalytic phase, after result generation. Documenting unwanted variation's typical distribution, in the late 1990s, Plebani and Carraro8 reviewed 40,490 tests performed by the stat section of the Padua University Hospital's clinical laboratory. They found 189 laboratory mistakes: 129 (68.2%) were preanalytic, 25 (13.3%) analytic, and 35 (18.5%) postanalytic, so almost seven of eight defects (86.7%) were either pre- or postanalytic.

Quality Benchmarks Connect Process Monitoring to Structural Quality

The College of American Pathologists (CAP)—involved since the 1950s in external quality assurance (proficiency testing) of the total testing process's middle, analytic, result-generation phase—began, in the early 1990s, to conduct Q(uality)-Probes studies that also measured performance quality in preanalytic steps (test ordering, patient identification, and specimen collection and transport) and postanalytic steps (report composition, communication, and recording). These studies produced quality performance benchmarks: process measurements that define the best in a class of, in this case, laboratories performing specific process steps. The Q-Probes studies both quantified performance and connected performance measures with laboratory structural attributes that facilitated or hindered performance.
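As a minimal sketch (not code from any of the cited studies), the incident-rate arithmetic described above can be expressed in a few lines, using the Plebani and Carraro figures quoted in the text:

```python
# QI arithmetic: incident rate = defects / observed opportunities, plus
# each testing phase's share of all detected defects. Figures are from
# the Plebani & Carraro study cited in the text.

def incident_rate(defects: int, observed: int) -> float:
    """QI defect rate: incidents divided by observed opportunities."""
    return defects / observed

def phase_shares(defects_by_phase: dict) -> dict:
    """Each testing phase's share of all detected defects."""
    total = sum(defects_by_phase.values())
    return {phase: n / total for phase, n in defects_by_phase.items()}

defects = {"preanalytic": 129, "analytic": 25, "postanalytic": 35}

rate = incident_rate(sum(defects.values()), 40_490)  # about 0.47% of tests
shares = phase_shares(defects)  # preanalytic share is roughly 68%
```

The same two functions serve any monitored step: only the defect counts and the number of observed opportunities change.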
Over a 25-year period, the CAP sponsored 117 clinical pathology Q-Probes studies, which quantified process quality in various settings; many measured error rates, some process ("turnaround") times. The Q-Probes studies produced 98 peer-reviewed publications that defined quality performance benchmarks. These benchmarks shaped 61 items in the CAP Laboratory Accreditation Program's Inspection Checklists, which CAP laboratory inspectors use to connect structural to process quality in laboratories.9

Continuous Quality Monitoring Leads to Continuous Quality Improvement

The success of the Q-Probes benchmark studies suggested that continuous quality improvement process monitors, like those that had improved industrial production processes, would work in clinical laboratory settings. Beginning in 1999, the CAP's Quality Practice Committee oversaw the Q-Tracks program of just such continuous statistical monitors of clinical laboratory performance.10

By 2006, Seven Q-Tracks Monitors Had Covered All Total Testing Process Steps Under Laboratory Control

These steps ranged from measuring order entry accuracy to assessing result report integrity. By 2011, six generic monitors, monitors not dependent on a specific genre of testing, had demonstrated significant decreases in defect rates, and the seventh had demonstrated significant shortening of stat processing times. Q-Tracks monitors also identified practice characteristics associated with better or worse participating laboratory performance. Improvement in all monitors was most striking among participants who initially had the most room to improve and among laboratories that had been in the program the longest.11

Seven Generic Q-Tracks Monitors Covered the Laboratory Steps of the Total Testing Process

In the suite of seven monitors, three indicators indexed defects in preanalytic steps: order entry, identification of patients, and collection, transport, and handling of specimens.
Two addressed process (turnaround) time, each in a different way: one measuring median order-to-report time for tests whose results triggered time-sensitive interventions, the other quantifying fractions of test-processing events that exceeded a set reporting time objective for other time-critical reporting events. Finally, in the postanalytic phase, one indicator monitored the timing and completeness of reporting events that inform clinicians of results of immediate import (called, in the past, in the United States, critical values). The other postanalytic indicator measured the integrity of issued reports, indexing how many had to be edited (in the United States, amended) or withdrawn (Table 1).

Table 1. Seven Generic Q-Tracks Monitors That Cover the Laboratory-Controlled Steps of the Clinical Laboratory Total Testing Process

1. Test order entry errors
2. Wrong or incomplete patient identification
3. Inadequate sample identification, format, or integrity
4. Prolonged stat test
5. Fraction of outliers beyond expected specimen reception to result cycle time
6. Slow or incomplete report events of particularly important results (called in the United States critical values)
7. Fraction of reports corrected or withdrawn after initial transmission to patient records (called in the United States amended reports)

International Development of QIs Has Been Widespread and Extensive

Not only in North America but across the globe, laboratorians have recognized that QIs both specify key benchmarks and improve the total testing process (TTP).12 An early publication on process indicators appeared in Australia in 1996.13 Over the past decade, several have been published from European centers, including Barcelona and Zagreb,14,15 while investigators in Brazil and the United Kingdom have also published similar accounts of QI development.16-19 Beginning in 2008, the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) Working Group: Laboratory Errors and Patient Safety (WG:LEPS) has identified more than 50 model quality indicators (MQIs) and developed a website into which participants input data from monitors and from which the MQIs report quality indicator results. The WG:LEPS is now harmonizing QIs to involve more clinical laboratories in collecting data and to report more relevant analyses.20-24

Monitoring Clinical Laboratory QIs Produces Quantified Improvements in All Testing Process Phases

In the preanalytic phase, QIs have, for hundreds of laboratories, led specifically to fewer misidentified patients, fewer specimens collected in wrong tubes, and fewer collection events yielding an inadequate sample volume, as well as fewer blood samples collected in a hemolyzed or clotted condition. In the postanalytic phase of the process, QIs have stimulated more timely and complete reporting of critical value results and led to fewer corrected (amended) or withdrawn reports.
In the more stable, middle, analytic phase of the TTP, QIs have clarified the determinants of stat processing times, both from order to result and from laboratory specimen reception to result availability.25

Laboratory Process Quality Begins and Ends "Outside the Laboratory"

QIs have, however, also identified two important sources of process defects less vulnerable to laboratory-based process improvement efforts: faulty test-ordering practices and failure to register the arrival of actionable test results. In this respect, QIs have taught three lessons: (1) interventions to improve test ordering are most realistic when they are based on real, local observations rather than plausible, general claims; (2) a new model of test-ordering management (stewardship) is required; and (3) process monitoring by itself is vulnerable to the important criticism that it does not connect in obvious ways to improving patient safety.

Intervention to Improve Test Ordering Should Start From Monitoring Actual Events

Test-ordering behavior, the pattern of clinicians' test selection, has long been recognized as a likely important source of unwanted variation. Non-measurement-based pseudodata have, however, gotten in the way of understanding test ordering's role in triggering the total process. For example, much has been made of claims that 70% of medical decisions are driven by laboratory test results. On investigation, this assertion appears to be a medical (or hospital administrative) version of an urban legend. Generating a contrasting, factual basis for action, Ngo and colleagues26 recently monitored 72,196 patient encounters across a wide range of clinical conditions: they found laboratory tests were ordered in "only" 35% of all encounters and showed that the fraction of encounters with laboratory testing varied widely, from 98% of inpatient encounters, to 56% of emergency department visits, to 29% of outpatient appointments.
The authors concluded, "It is not possible to use a single number to categorize the frequency with which laboratory tests occur in patient encounters."26 When real, local monitoring of actual events is undertaken, as Kim and colleagues27 did, test-ordering patterns for what appear to be similar syndromes turn out indeed to vary widely from practitioner to practitioner. Process monitoring also proves essential to assessing the ongoing effectiveness of plausible interventions, such as manipulation of the computerized order menus presented to ordering practitioners, to decrease unwanted variation in ordering behavior.28

Clinical Laboratory Testing Stewardship Engages Test Orderers and Result Consumers

Drawing on an Australian experience, to address obstacles to decreasing unwanted variation in test ordering, Spelman29 proposed two sets of interventions: one to educate test orderers and the other to change the formats of test ordering. A clinical microbiologist, Spelman saw similarity between his two-pronged proposal for "clinical pathology stewardship" and antimicrobial stewardship strategies, which also combine educational and computer-based interventions. The educational interventions teach medical students and residents both general "golden rules" for test ordering and specific standard menus for particular clinical syndromes. Stewardship also introduces rotations in laboratory medicine, specifically for medical residents, to give them a real, local understanding of the downstream effects of ordering practices.
For the second sort of stewardship intervention, context-altering sets of computer menu changes, process monitoring turns out to be essential to demonstrate that menu manipulations reach their three objectives: to decrease variability in the order sets stimulated by similar syndromes, to obstruct duplicate test ordering, and to prevent retesting within minimal intervals.30 Monitoring is further essential to interventions that aim to change test orderers' "cognitive context": for example, feedback to clinicians of audits of their ordering behaviors and display of the costs of very expensive tests.31

Process Monitoring Has Revealed High Frequencies of Missed Reported Results

In 2011, Australian investigators published a systematic review of studies of follow-up of test results. Lack of follow-up ranged from 20% to 62% for tests on inpatients and from 1% to 75% for emergency department patients, when non-follow-up incidents were calculated as a proportion of all tests. Critical test results (understood as results that should trigger immediate patient [re]evaluation or immediate intervention) and results for patients moving across health care settings (in events often characterized as "handoffs") were two settings in which challenges were particularly evident.32 In their review, Callen and colleagues32 pointed out that quality monitoring demonstrated various interventions' failure to counter lack of follow-up across a spectrum of paper-based, mixed paper and electronic, and electronic information systems. They concluded that, despite hopes for electronic surveillance, evidence for the effectiveness of electronic test management systems was limited.
The problem of missed results has received substantial attention in the United States because of its medical-legal implications in malpractice litigation.33

Objections That Process-Based Indicators and Clinical Laboratory Stewardship Fail to Link to Patient Outcomes

Epner et al34 argue that clinical laboratory testing causes diagnostic errors in five ways: (1) an inappropriate test is ordered, (2) an appropriate test is not ordered, (3) an appropriately ordered test result is produced late, (4) an appropriately ordered test result is inaccurate, or (5) an appropriately ordered, accurate test result is misapplied or not applied. They insist that an adequate taxonomy of laboratory testing–related potential causes of diagnostic error must attend to these outcome-related attributes. Their laboratory quality taxonomy, they argue, makes three judgments that process monitors do not make: (1) that a test order was inappropriate, (2) that the lack of an order was inappropriate, and (3) that the application or nonapplication of an appropriate result was defective. Epner et al34 also urge that the two diagnostic defects that process monitors do measure, delay and inaccuracy, be connected immediately to the magnitude of the harm to patients that these errors can be regarded as causing.

Outcomes Quality

Few Laboratory Process Errors Can Be Shown to Cause Patient Harm

In their seminal 1997 study, already referred to above, Plebani and Carraro8 found, among almost 40,500 stat tests in their hospital laboratory in Padua, only 189 laboratory errors. Only 49 of these had a detectable impact on patient outcomes: in 37, the negative impact was to provoke further, otherwise unnecessary investigations; only 12 laboratory errors instigated inappropriate care interventions or unjustified modifications of treatment.
The state of the question, in which this well-done but 20-year-old study continues to be frequently cited, indicates a dearth of published assessments relating process to outcome quality in clinical laboratory testing—and suggests the practical difficulties involved in making such assessments.

Diagnostic Errors Are Still Considered Important Dangers to Patients

Regarding patient harm, a pervasively cited 1999 report from the Institute of Medicine (IOM) of the American National Academy of Sciences (now the American National Academy of Medicine [NAM]) advocated best practices and safety tools for key processes in the management of diagnostic and screening tests.35 The report's recommendations indeed influenced the design and use of two of the seven generic Q-Tracks monitors, launched in 1999 and the early 2000s: (1) monitoring of patient identification band integrity and (2) measurement of the success and completeness of critical value test result notification events.25 A decade and a half later, the IOM/NAM issued another report, Improving Diagnosis in Health Care, that recommended (1) that health care educators address performance in the diagnostic process, including areas such as clinical reasoning, teamwork, and communication, and (2) that educators address appropriate use of diagnostic tests and appropriate application of test results to subsequent decision making. Besides education, the report also advocated (3) that "dynamic team-based" collaborations among pathologists, radiologists, "other diagnosticians," and "treating health care professionals" focus on improving diagnostic processes; (4) that monitoring of diagnostic processes identify, learn from, and reduce diagnostic errors; and, finally, (5) that accrediting organizations, government regulators, and health insurers demand such monitoring.36 The first and second recommendations expand on Spelman's stewardship approach. The fourth and fifth recommendations support process monitoring.
The third recommendation, "to improve the diagnostic process," places diagnostic information and its communication into the contexts of both physicians' thinking and health system procedures.

Cognitive and System Failures Both Contribute to Bad Diagnostic Outcomes

Over the past decade, researchers have extensively developed both "cognitive" and "system" themes in the study of diagnostic error. In the United States, Graber,37 Schiff,38 and Singh et al39,40 have probed the cognitive (heuristic) influences on error in diagnosis and contrasted them with system (information generation and communication) influences that also prompt erroneous diagnoses and impede their detection, amelioration, and prevention. In these studies, laboratory test results appear sometimes as stimuli to wrong diagnoses, on the cognitive side, or as failed triggers for error detection, mitigation, or prevention efforts, on the system side, but testing's role in causing bad diagnostic outcomes is very seldom a focus of investigators' attention.

Risk

Difficulties Defining Test Order Inappropriateness and Test Result Misapplication in Harm-Based Taxonomies of Laboratory Defects

In practice, the first key criterion of Epner et al,34 inappropriateness, is hard to define, case by case, in real time. Determining test orders' appropriateness entails retrieving events both temporally antecedent and logically anterior to the orders themselves. Even with access to electronic medical records, characterizing the antecedent clinical events that determine order appropriateness is a capacity not usually available in day-to-day medical laboratory settings. Assessing the second key criterion of Epner et al,34 results' misapplication, likewise requires defining the posttest clinical contexts in which result reports are well or poorly applied and assessing diagnostic and monitoring clinical decisions. These decisions are both temporally consequent and logically posterior to test results.
Informed, serious chart reviewers come to appreciate that adequate evaluation of result-related decisions draws on an extraordinarily wide range of medical knowledge and clinical judgment seasoned by extensive experience. Up to now, laboratorians have not cultivated such synthetic and analytic evaluative skills based on a wide, deep nonlaboratory knowledge base. The other two challenges of the Epner et al34 classification scheme—assessing the impact of delays in report production and/or communication and assessing the impact of report inaccuracy—are more straightforward. At present, however, getting over the first two obstacles to a harm-based classification of laboratory errors seems a steep climb for which laboratorians lack equipment.

Failure Mode and Effects Analysis and Risk Assignment Offer an Alternative Approach to a Harm-Based Taxonomy

Rather than attempting direct assaults on inappropriateness and misapplication, the connection between process defect causes and bad patient outcome effects may be made indirectly by adding patient risk assessment to the evaluation of untoward process variations. Failure mode and effects analysis (FMEA) is a candidate approach to make the risk connection.41,42 Just as process QIs developed from the recognition that clinical laboratory testing is a production process, so FMEA in medical settings developed from the recognition that medical cognitive and system errors are similar to accidents that occur in complex industrial processes.43,44 In human factors engineering, FMEA calculates the risk attached to each error type under study. It does so by multiplying objective error event probabilities (cause occurrence frequencies) by subjective harm probabilities (estimates of errors' bad effects, usually assigning potential harms weights on a 1-10 scale) and subjective detection difficulty scores (quantifying how hard it would be to discover the error, also estimated on a 1-10 scale).
Risk is the product of this multiplication, calculated from the first, second, and third of FMEA's eight steps (Table 2). An example of an FMEA of specimen identification defects is given in online Appendix 1 (all supplemental materials can be found at American Journal of Clinical Pathology online).

Risk Calculation From FMEA Has Six Obvious Limitations

A first, major limitation of FMEA risk calculation is that the specific harm effects and detection difficulties of a specific cause failure (characterized in FMEA step 2 in Table 2) often vary widely from one diagnostic error event to the next. This variation only increases from one medical setting to another. A second, inevitable limitation of FMEA in the medical setting is that the subjective elements in the assignment of degrees of detection difficulty (also in FMEA step 2 in Table 2) and harm (severity) (in FMEA step 3 in Table 2) are substantial. A third major limitation in many FMEAs is that attaching a specific process defect cause to a specific harm effect outcome is often difficult in complex medical circumstances (see FMEA step 4 in Table 2). Examining the function of controls (see FMEA step 5 in Table 2) is a fourth limitation, but one more easily overcome in laboratory testing than in other medical settings. Statistical analysis (see FMEA step 6 in Table 2), however, suffers from a fifth FMEA limitation: it may give a deceptive appearance of precision to estimates derived from highly variable inputs (from steps 2 and 3) or from very subjective inputs (in steps 2, 3, and 4).

Table 2. Failure Mode and Effects Analysis (FMEA) Event Frequencies, Harms, and Risks

1. Measure the objective probability of failure (error event frequency) at each step in a process implicated in detected error. (For the steps in the total testing process, quality indicators [QIs] provide these probabilities.)
2. (a) Ascertain how likely specific bad effect(s)/harm outcome(s) are due to the error event. (b) Determine how hard it is to discover this connection (detection difficulty) (eg, that missed or wrong diagnoses ensued from confusion of result reports on two patients because the patients' sample tube labels were confused). At this step, two medical judgments assign estimated likelihoods: (i) that error events will lead to harm outcomes and (ii) that the connection between the process error and harm can be detected.
3. (a) Assign levels of severity to the defect(s) (harm outcome weights). At this step, a third medical judgment estimates how bad the diagnostic error's effects are. (b) Calculate risk by multiplying the probabilities from steps 1 to 3: event frequency (step 1) × detection difficulty (step 2) × harm weight (step 3).
4. Account for how event causes plausibly lead to harm effects. This is an instance of hypothesis generation, which is always provisional.
5. Examine the function (effectiveness/ineffectiveness) of the controls detecting problems like the defect cause (eg, the QIs), in light of detection difficulty.
6. Analyze critically the quantifiable information from steps 1, 2, and 3 that goes into risk calculations—event frequencies, detection difficulties, and harm estimates.
7. Define remedial action(s) based on risk and plausibility: (a) risk from the error event/harm outcome calculation and (b) plausibility from the risk-informed hypothesis that connects event and outcome.
8. Determine the urgency and practicality of acting on event frequencies, detection difficulties, harm estimates, risk calculations, cause-effect hypotheses, assessments of control functions, and specification of remedial actions—the products of the FMEA.

Risk Calculation From FMEA Also Has Six Countervailing Strengths

Its primary strength is the explicit shape it gives to connections between error events, detection difficulties, and harm outcomes via calculated risks.
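The risk product defined in steps 1 to 3 can be sketched in a few lines; the defect type and the detection and severity scores below are illustrative assumptions, not values taken from the article:

```python
# Sketch of the FMEA risk product (steps 1-3): event frequency x
# detection difficulty x harm weight. Here the occurrence term is a
# QI-measured incident rate, as the text describes; the detection and
# severity scores are subjective 1-10 estimates.

def fmea_risk(occurrence: float, detection: int, severity: int) -> float:
    """Risk = event frequency x detection difficulty x harm weight."""
    if not (1 <= detection <= 10 and 1 <= severity <= 10):
        raise ValueError("detection and severity are scored on a 1-10 scale")
    return occurrence * detection * severity

# Hypothetical monitored defect: mislabeled specimens.
occurrence = 0.004  # QI-measured rate: 4 defects per 1,000 samples
detection = 8       # subjective score: hard to detect (1-10)
severity = 9        # subjective score: potentially severe harm (1-10)

risk = fmea_risk(occurrence, detection, severity)  # 0.004 * 8 * 9 = 0.288
```

Ranking monitored defect types by this product, rather than by raw frequency alone, is what lets a rare but hard-to-detect, high-harm defect outrank a common but benign one.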
A second major strength is the clear definition of the different contributions that objective evidence (event frequencies) and subjective judgments (probabilities of detection difficulty and harm weights) make to these connections. Although the time and effort that FMEA requires are substantial, a third strength, in the laboratory context, is that harm-causing defects in clinical laboratory testing processes that lead to untoward effects on patients are very few, so the expenditures of time and effort in laboratory testing–based FMEA are not impractically great. A fourth strong point is that, in the laboratory setting, FMEA's ability to measure error event frequencies step by step in the total testing process is greater than almost anywhere else among the processes of medical care. A fifth advantage is FMEA's ability to provide, in its steps 5 to 8, an explicit forum that frames debates and decisions about the allocation of limited resources for remediation and prevention of harms: through the calculation of risk, it (1) quantifies unwanted variation, (2) makes explicit how hard it is to connect such variations with bad outcomes, and (3) makes the case for the impact of specific harms as justification for specific systemic changes, which require institutional action—and resources.
A final strength is that accrediting agencies, governmental regulators, and students of patient safety have all strenuously advocated the adoption of the FMEA model in many medical environments.45 Risk Assessment Has Inevitable Political, Adjudicatory, and Cultural Dimensions A quarter of a century ago, the British anthropologist Mary Douglas pointed out that communities apply the concept of risk in three contexts: “blame, justice, and danger.”46 Participants in FMEAs in modern medical centers can confirm Douglas’s insights that (1) “risks are always political,” (2) judgments always intervene in getting from “chance to danger,” and (3) “cultural differences”—for example, between the cultures of surgeons and physicians—shape different appreciations of “uncertainty.” Still, familiarity with FMEA’s terms of engagement assists interested parties’ communication about error events and understanding of associated harm outcomes.47 Risk Can Be Integrated Into Process Monitoring on a Large Scale In Australia, proficiency supervisors have developed a Key Incident Monitoring and Management System (KIMMS) that uses FMEA calculations of risk to construct a hierarchy of incidents (untoward process events) based on the event types’ different contributions to total risk.48 Combination of monitored incident frequencies, estimated detection difficulties, and rank ordering of harms of incidents integrates event monitoring with risk calculation in both interesting and useful ways. At present, KIMMS monitors 21 incident types among 69 laboratories in one country. The system has detected hundreds of thousands of incidents, with tens of thousands falling into three sorts of “top 10” categories of incident types:
1. Incident types most frequently monitored by participating laboratories
2. Incident types that generate the highest frequencies of unwanted events
3.
Incident types with the highest risk ratings
Hemolyzed specimens lead all three lists: 82% of participants monitor that QI; it accounts for 23% of all monitored events and contributes 27% to total calculated risk. From there, the “top 10” lists vary considerably. Retracted reports do not make the most frequently monitored top 10 and are 10th in incident frequency because they account for only 3% of all monitored incidents; however, they are number 2 on the risk list, because they contribute 11% to total risk. The Most Frequently Monitored and Highest Risk Lists Have Six Monitors in Common Their contributions to incident detection frequencies and total risk are as follows:
1. As just noted above, hemolyzed samples lead both lists; they contribute more than a fifth to total error frequency and more than a quarter to total risk.
2. Retracted reports contribute, as noted above, 3% to total frequency and 11% to total risk.
3. Specimens not collected are second in frequency (18%) and sixth in risk (5%).
4. Registration errors add 8% to total incident frequency and 9% to total risk.
5. Mismatched or discrepant labeling contributes similar percentages to total error frequency and total risk: 5% of incident frequency and 6% of risk.
6. Incorrect fill (improper sample volume) is eighth in frequency (accounting for 4% of total incidents) and seventh in risk (contributing a similar 4% to total risk).
Four of these six KIMMS monitors, which appear on both the event frequency and risk calculation “top 10” lists, are specimen-quality QIs. Their prominence seems to support the focus on sample quality in the KIMMS QI list.
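The reordering that risk calculation produces can be seen by ranking these six shared monitors two ways, using the percentage shares quoted above. This is only an illustrative sketch: ranks are relative to this six-monitor subset, not the full KIMMS “top 10” lists.

```python
# Percentage contributions of the six shared KIMMS monitors to total
# incident frequency and to total calculated risk, as quoted in the text.
shares = {
    "hemolyzed specimens":     {"frequency": 23, "risk": 27},
    "retracted reports":       {"frequency": 3,  "risk": 11},
    "specimens not collected": {"frequency": 18, "risk": 5},
    "registration errors":     {"frequency": 8,  "risk": 9},
    "mismatched labeling":     {"frequency": 5,  "risk": 6},
    "incorrect fill":          {"frequency": 4,  "risk": 4},
}

# Rank the same six monitors by frequency share and then by risk share.
by_frequency = sorted(shares, key=lambda m: shares[m]["frequency"], reverse=True)
by_risk = sorted(shares, key=lambda m: shares[m]["risk"], reverse=True)

print("by frequency:", by_frequency)
print("by risk:     ", by_risk)
# Retracted reports sit last by frequency share but second by risk share,
# illustrating how risk calculation reorders the importance of monitors.
```

The two orderings agree on hemolysis at the top but diverge below it, which is the point of weighting frequencies by detection difficulty and harm.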
KIMMS Combines Likelihoods of Detection and Potential Severities of Patient Outcomes Some incidents are easy to detect, such as the absence of the patient’s name on the blood test tube (detection difficulty scored as 1/10), whereas others are almost impossible to detect, such as when the wrong patient’s name is on both the request and blood tube (detection difficulty scored as 9/10). Some harm outcomes have minor severity, such as when the potential risk involved is an extra venesection (phlebotomy) (harm weight scored 1/10). Other harms are potentially catastrophic, such as misidentification of a blood transfusion sample leading to a transfusion reaction (harm weight scored 9/10). Interestingly, because the incident type of sample hemolysis is extremely common, it represents a major risk for laboratories even after allowing for its comparatively easy detection and usually low clinical impact. In contrast, the low-prevalence incident type of transfusion identification errors represents a major risk because of its potentially catastrophic consequences. What’s to Be Done Next? The four immediate tasks are to (1) pursue clinical laboratory stewardship, (2) concentrate on fewer monitors, (3) calculate risks, and (4) demonstrate economic value for laboratory tests, as addressed in the following sections. Pursue Clinical Laboratory Stewardship The imperative here is to extend quality measurement to steps in the total testing process that laboratorians do not control, antecedent to test ordering and subsequent to result reporting. Clinical laboratory stewardship’s collaboration with clinician test orderers and clinical consumers of result report information is essential not only to develop relevant, measurable indices at both ends of the total testing process but also to act on the evidence that they generate.
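The contrast just described can be sketched with the FMEA risk product (frequency × detection difficulty × harm weight). The detection difficulty and transfusion harm scores below are the x/10 values quoted above; the frequency scores and the hemolysis harm score are illustrative assumptions, not KIMMS data.

```python
# FMEA-style risk product: frequency x detection difficulty x harm weight,
# each factor scored from 1 (lowest) to 10 (highest).

def risk(frequency: int, detection_difficulty: int, harm_weight: int) -> int:
    """Multiply the three FMEA factors into a single risk score."""
    for score in (frequency, detection_difficulty, harm_weight):
        if not 1 <= score <= 10:
            raise ValueError("each FMEA factor is scored from 1 to 10")
    return frequency * detection_difficulty * harm_weight

# Hemolysis: very common (frequency 9, assumed), easy to detect
# (detection difficulty 1/10), usually low clinical impact (harm 2, assumed).
hemolysis = risk(9, 1, 2)

# Transfusion sample misidentification: rare (frequency 1, assumed), almost
# impossible to detect (9/10), potentially catastrophic (harm 9/10).
transfusion_misid = risk(1, 9, 9)

print("hemolysis risk score:", hemolysis)
print("transfusion misidentification risk score:", transfusion_misid)
```

Sheer frequency keeps hemolysis among the major risks despite its low per-event scores, while detection difficulty and severity alone make the rare misidentification event a major risk.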
Concentrate on Fewer Monitors The imperatives here are simultaneously to distribute appropriately a suite of a limited number of monitors over the entire sequence of process steps and to recognize the particular importance of specimen quality. Across the globe, QIs have proliferated. The IFCC WG:LEPS has defined 35 priority 1 QIs (five times the total of seven generic Q-Track monitors), two priority 2 QIs, three priority 3 QIs, and five priority 4 QIs for a total of 45 quality monitors. The WG:LEPS priority 1 QIs focus on the preanalytic phase of the total testing process: 22 (63%) of 35 monitor preanalytic steps. Within the preanalytic phase, the WG:LEPS QIs’ attention concentrates particularly on monitors of specimen quality, with 15 (68%) of 22 preanalytic QIs monitoring specimen defects.21 In contrast, the US Q-Tracks’ relatively few QIs are spread evenly over the entire laboratory-controlled testing process. The Q-Tracks QIs gather several specific WG:LEPS QIs into more widely based generic monitors: the Q-Tracks order entry monitor covers five WG:LEPS QIs, the Q-Tracks specimen acceptability QI includes 11 WG:LEPS specimen defect QIs, the Q-Tracks critical value reporting event QI tracks variables followed in three WG:LEPS reporting event QIs, and the Q-Tracks amended report QI also collects information monitored in three WG:LEPS QIs. In all, the seven Q-Tracks generic, less specimen quality–focused monitoring approach covers 25 (71%) of 35 WG:LEPS specific, specimen defect–concentrated priority 1 QIs.11,22 The Australian KIMMS follows a middle course. It monitors 21 QIs, three-fifths of the WG:LEPS total but three times the Q-Tracks seven-monitor suite. The KIMMS QIs divide into four categories: seven QIs for identification incidents, eight QIs for collection and transport incidents, three for intralaboratory preanalytic incidents, and three postanalytic occasions of unwanted variation. 
KIMMS, like the WG:LEPS model, maintains a focus on preanalytic variables, with 18 (86%) of 21 KIMMS monitors in that phase. KIMMS, however, divides the preanalytic QIs differently from the WG:LEPS scheme, sorting patient and specimen identification defects into the same category but distinguishing incidents occurring before specimens arrive in laboratories from intralaboratory incidents after they have arrived.22,49 Step-by-step coverage by more inclusive monitors and focus on QIs that attend to risk are two selection criteria for the suite of standard measures that have emerged from extensive experience in the United States and Australia. All in all, laboratory-driven process QIs should probably converge in suites of fewer measures, 7 to 14 rather than 21 to 35. These suites of fewer monitors ought to maintain coverage of all process steps, as in the Q-Tracks model, but also account for error frequency and risk from errors, as in the six QIs present in both the total frequency and total risk “top 10” lists’ contributors in the KIMMS model. Calculate Risk The imperative here is to connect process defects with potential bad outcomes in a more explicit way. FMEA introduces calculated risks of patient harm outcomes by defining them as measured error event frequencies multiplied by error harm estimates and projections of detection difficulties. Harm estimates often have limited exportability from one medical setting to another, but KIMMS has accumulated experience that has focused attention and prevention on two kinds of process events: events whose high frequencies contribute substantially to total risk and those events whose frequencies are low but whose high-harm estimates gain them a place in the “top 10” of clinical laboratory testing risks. Meet the Challenge of Demonstrating Economic Value Recently, Paul Epner,49 senior author of the Epner et al34 article discussed above, made an even more sweeping critique of how laboratory quality is appraised.
He now argues that the pursuit of production performance quality, beyond taking inadequate account of patient safety, has failed to demonstrate increased value from laboratory testing to five sorts of nonlaboratorians economically interested in clinical laboratory tests: clinicians, health system administrators, patients, government regulators and insurers, and diagnostics developers. An example of government regulators and insurers analyzing the economic value of a laboratory test is given in online Appendix 2.50,51 Value Epner49 defines value as the remainder when harms and missed opportunities from testing are subtracted from its benefits. He proposes, instead of a calculation of laboratory testing’s quality, a calculation of testing’s value, defined as test results’ variable worth, as a commodity, to the five sorts of consumers. Despairing of convenient connections between laboratory testing and Donabedian’s patient outcomes, Epner proposes three more global measures of laboratory testing’s positive economic value: broadening test process turnaround time to time to diagnosis, broadening result report accuracy to diagnostic accuracy, and adding a new intermediate outcome—diagnostic completeness. The new index of diagnostic completeness is composed of a set of standards determined from consumers’ standpoints. These consumer-set standards define the economic demands of consumers of laboratory information in the five domains: (1) care teams of clinicians, (2) administrators of health systems in which clinicians work, (3) patients to whom health systems provide services, (4) governmental regulators and insurers paying for patient services delivered by systems, and (5) researchers who develop diagnostic and management techniques for the other four types of consumers. Epner49 cites two classes of negative values: (1) harms suffered by consumers and (2) missed opportunities that consumers lament.
Harms and missed opportunities subtract from the positive values for which consumers are willing to pay for laboratories to perform testing. The positive commodity value, as Epner sees it, is the ability to shorten the time to diagnosis, capacity to make diagnoses more accurate, and potential to make diagnoses more complete, from consumers’ different standpoints in the five domains.49 The negative values vary from one domain to another. As the concept of process quality emerged from statistical process control and the strategy of risk estimation emerged from human factors engineering, so consumer value emerges from business economics. In postindustrial, information-focused consumer economies, consumer decision makers apply one or another calculus of economic value: clinicians find themselves paid less to do more in less time; for them, clinical laboratory testing has a negative value when it is not a labor-saving device. Cost pressures drive system administrators to consider clinical laboratory testing negatively when it generates even small direct cost increases that escape administrative control. Patients regard with suspicion test results that fail to reassure them about health status or fail to force otherwise distracted providers and expense-weary insurers to deliver care. On the other side of that economic hill, regulators and insurers, in their widening domain, recognize that clinical laboratory testing primarily amplifies or dampens demand for other services, so its negative value registers when testing triggers more costly services that make health care, in aggregate, more expensive. Finally, without regulators’ mandates for their use and insurers’ funds for their purchase, inventors and investors consider with concern shrinking markets for diagnostics.
The Bottom Line Pursuit of economic value, as Epner sees it, shapes the present and will drive the future of clinical laboratory testing, as efforts to increase performance quality, measured by QIs, contributed to shaping the period from 1990 to 2010.49 In this current and future economic context, however, rather than competing with economic value, the triple strategy of (1) clinical laboratory stewardship, (2) concentration on a suite of relatively few quality indicators, and (3) integration of risk calculations contributes to economic value, as the following examples demonstrate. Clinical stewardship is often the pivot on which reducing time to diagnosis of complex disorders turns. For example, one of the seven Q-Tracks QIs added measurable positive economic value by decreasing time to diagnosis of stat tests for myocardial ischemia.25 A suite of a few QIs holds great promise for global improvement of diagnostic accuracy, when the suite’s components balance monitoring of all laboratory control steps with emphasis on high-risk attributes of specimen quality. As further examples, two other generic Q-Tracks monitors demonstrated decreased order entry errors and fewer reports requiring correction across the board. These indices plausibly contribute to diagnostic accuracy, as do two other Q-Track monitors, which showed improvement in all circumstances: fewer misidentified patients and fewer inadequate specimens.
All four monitors thus decrease negative economic values.25 One member of the pair of QIs just cited, patient identification band inspection—at the time of specimen collection or other laboratorian patient contact—and another generic Q-Track monitor—check of critical value reporting events’ speed and completeness—present two examples of process monitors that demonstrated significant improvement in response to regulators’ demands for good behavior, as defined by consensus requirements.25 Conclusion The triple approach of (1) a more focused suite of generic consensus QIs, (2) more active clinical laboratory testing stewardship, and (3) greater integration of formal risk assessment provides surrogate measures for outcomes for clinical laboratory testing and may enhance economic value rather than compete with it. References
1. Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q. 1966;44(suppl):166-206.
2. Donabedian A. The quality of care: how can it be assessed? JAMA. 1988;260:1743-1748.
3. Donabedian A. Evaluating the quality of medical care. 1966. Milbank Q. 2005;83:691-729.
4. Ayanian JZ, Markel H. Donabedian’s lasting framework for health care quality. N Engl J Med. 2016;375:205-207.
5. Wilkerson DS, Wager EA. Quality management in laboratory medicine. In: Wager EA, Horowitz RE, Siegel GP, eds. Laboratory Administration for Pathologists. Northfield, IL: CAP Press; 2011:119-136.
6. Carlson DA. Laboratory inspections: the view from CAP. Laboratory Medicine. 2003;34:373-380.
7. Benneyan JC, Lloyd RC, Plsek PE. Statistical process control as a tool for research and healthcare improvement. Qual Saf Health Care. 2003;12:458-464.
8. Plebani M, Carraro P.
Mistakes in a stat laboratory: types and frequency. Clin Chem. 1997;43:1348-1351.
9. Howanitz PJ, Perrotta PL, Bashleben CP, et al. Twenty-five years of accomplishments of the College of American Pathologists Q-Probes program for clinical pathology. Arch Pathol Lab Med. 2014;138:1141-1149.
10. Nakhleh RE, Souers RJ, Bashleben CP, et al. Fifteen years’ experience of a College of American Pathologists program for continuous monitoring and improvement. Arch Pathol Lab Med. 2014;138:1150-1155.
11. Meier FA, Souers RJ, Howanitz PJ, et al. Seven Q-Tracks monitors of laboratory quality drive general performance improvement: experience from the College of American Pathologists Q-Tracks program 1999-2011. Arch Pathol Lab Med. 2015;139:762-775.
12. Shahangian S, Snyder SR. Laboratory medicine quality indicators: a review of the literature. Am J Clin Pathol. 2009;131:418-431.
13. Khoury M, Burnett L, Mackay MA. Error rates in Australian chemical pathology laboratories. Med J Aust. 1996;165:128-130.
14. Kirchner MJ, Funes VA, Adzet CB, et al. Quality indicators and specifications for key processes in clinical laboratories: a preliminary experience. Clin Chem Lab Med. 2007;45:672-677.
15. Simundic AM, Topic E. Quality indicators. Biochem Med. 2008;18:311-319.
16. Shcolnik W, de Oliveira CA, de São José AS, et al. Brazilian laboratory indicators program. Clin Chem Lab Med. 2012;50:1923-1934.
17. Barth JH. Clinical quality indicators in laboratory medicine: a survey of current practice in the UK. Ann Clin Biochem. 2011;48:238-240.
18. Barth JH. Clinical quality indicators in laboratory medicine.
Ann Clin Biochem. 2012;49:9-16.
19. Barth JH. Selecting clinical quality indicators for laboratory medicine. Ann Clin Biochem. 2012;49:257-261.
20. Plebani M, Sciacovelli L, Marinova M, et al. Quality indicators in laboratory medicine: a fundamental tool for quality and patient safety. Clin Biochem. 2013;46:1170-1174.
21. Plebani M, Chiozza ML, Sciacovelli L. Towards harmonization of quality indicators in laboratory medicine. Clin Chem Lab Med. 2013;51:187-195.
22. Plebani M, Astion ML, Barth JH, et al. Harmonization of quality indicators in laboratory medicine: a preliminary consensus. Clin Chem Lab Med. 2014;52:951-958.
23. Plebani M, Sciacovelli L, Aita A, et al. Harmonization of pre-analytical quality indicators. Biochem Med (Zagreb). 2014;24:105-113.
24. Sciacovelli L, Aita A, Padoan A, et al. Performance criteria and quality indicators for the post-analytical phase. Clin Chem Lab Med. 2015;54:1169-1176.
25. Meier FA, Souers RJ, Howanitz PJ, et al. Seven Q-Tracks monitors of laboratory quality drive general performance improvement: experience from the College of American Pathologists Q-Tracks program 1999-2011. Arch Pathol Lab Med. 2015;139:762-775.
26. Ngo A, Gandhi P, Miller WG. Frequency that laboratory tests influence medical decisions. J Appl Lab Med. 2017;1:410-414.
27. Kim JY, Dzik WH, Dighe AS, et al. Utilization management in a large urban academic medical center: a 10-year experience. Am J Clin Pathol. 2011;135:108-118.
28. Krasowski MD, Chudzik D, Dolezal A, et al. Promoting improved utilization of laboratory testing through changes in an electronic medical record: experience at an academic medical center.
BMC Med Inform Decis Mak. 2015;15:11.
29. Spelman D. Inappropriate pathology ordering and pathology stewardship. Med J Aust. 2015;202:13-15.
30. Procop GW, Keating C, Stagno P, et al. Reducing duplicate testing: a comparison of two clinical decision support tools. Am J Clin Pathol. 2015;143:623-626.
31. Burke W, Zimmern RL. Ensuring the appropriate use of genetic tests. Nat Rev Genet. 2004;5:955-959.
32. Callen J, Georgiou A, Li J, et al. The safety implications of missed test results for hospitalised patients: a systematic review. BMJ Qual Saf. 2011;20:194-199.
33. Saber Tehrani AS, Lee H, Mathews SC, et al. 25-Year summary of US malpractice claims for diagnostic errors 1986-2010: an analysis from the National Practitioner Data Bank. BMJ Qual Saf. 2013;22:672-680.
34. Epner PL, Gans JE, Graber ML. When diagnostic testing leads to harm: a new outcomes-based approach for laboratory medicine. BMJ Qual Saf. 2013;22(suppl 2):ii6-ii10.
35. Kohn LT, Corrigan JM, Donaldson MS, eds. To Err Is Human: Building a Safer Health System. Washington, DC: National Academies Press; 1999:25-27.
36. Balogh EP, Miller BT, Ball JR, eds. Improving Diagnosis in Health Care. Washington, DC: National Academies Press; 2015:9-10.
37. Graber M. Next steps: envisioning a research agenda. Adv Health Sci Educ Theory Pract. 2009;14:107-112.
38. Schiff GD, Hasan O, Kim S, et al. Diagnostic error in medicine: analysis of 583 physician-reported errors. Arch Intern Med. 2009;169:1881-1887.
39. Singh H, Giardina TD, Forjuoh SN, et al.
Electronic health record–based surveillance of diagnostic errors in primary care. BMJ Qual Saf. 2012;21:93-100.
40. Singh H, Weingart SN. Diagnostic errors in ambulatory care: dimensions and preventive strategies. Adv Health Sci Educ Theory Pract. 2009;14(suppl 1):57-61.
41. Chiozza ML, Ponzetti C. FMEA: a model for reducing medical errors. Clin Chim Acta. 2009;404:75-78.
42. McElroy LM, Khorzad R, Nannicelli AP, et al. Failure mode and effects analysis: a comparison of two common risk prioritisation methods. BMJ Qual Saf. 2016;25:329-336.
43. Reason J. Human Error. Cambridge, UK: Cambridge University Press; 1990.
44. Wachter RM. Understanding Patient Safety. 2nd ed. New York, NY: McGraw-Hill Medical; 2012:3-54.
45. Krouwer JS. An improved failure mode effects analysis for hospitals. Arch Pathol Lab Med. 2004;128:663-667.
46. Douglas M. Risk and Blame: Essays in Cultural Theory. London, UK: Routledge; 1992:3-124.
47. Tezak B, Anderson C, Down A, et al. Looking ahead: the use of prospective analysis to improve the quality and safety of care. Healthc Q. 2009;12:80-84.
48. Badrick T, Gay S, Mackay M, et al. The key incident monitoring and management system: history and role in quality improvement. Clin Chem Lab Med. 2018;56:264-272.
49. Epner PL. Appraising laboratory quality and value: what’s missing? Clin Biochem. 2017;50:622-624.
50. Li R, Zhang P, Barker LE, et al. Cost-effectiveness of interventions to prevent and control diabetes mellitus: a systematic review. Diabetes Care. 2010;33:1872-1894.
51. CDC Diabetes Cost-Effectiveness Study Group.
The cost-effectiveness of screening for type 2 diabetes. JAMA. 1998;280:1757-1763.
© American Society for Clinical Pathology, 2018. All rights reserved.
American Journal of Clinical Pathology – Oxford University Press
Published: Mar 1, 2018