Automated Identification of Postoperative Complications Within an Electronic Medical Record Using Natural Language Processing

Abstract

Context: Currently, most automated methods to identify patient safety occurrences rely on administrative data codes; however, free-text searches of electronic medical records could represent an additional surveillance approach.

Objective: To evaluate a natural language processing search approach to identify postoperative surgical complications within a comprehensive electronic medical record.

Design, Setting, and Patients: Cross-sectional study involving 2974 patients undergoing inpatient surgical procedures at 6 Veterans Health Administration (VHA) medical centers from 1999 to 2006.

Main Outcome Measures: Postoperative occurrences of acute renal failure requiring dialysis, deep vein thrombosis, pulmonary embolism, sepsis, pneumonia, or myocardial infarction identified through medical record review as part of the VA Surgical Quality Improvement Program. We determined the sensitivity and specificity of the natural language processing approach to identify these complications and compared its performance with patient safety indicators that use discharge coding information.

Results: The proportion of postoperative events for each sample was 2% (39 of 1924) for acute renal failure requiring dialysis, 0.7% (18 of 2327) for pulmonary embolism, 1% (29 of 2327) for deep vein thrombosis, 7% (61 of 866) for sepsis, 16% (222 of 1405) for pneumonia, and 2% (35 of 1822) for myocardial infarction. Natural language processing correctly identified 82% (95% confidence interval [CI], 67%-91%) of acute renal failure cases compared with 38% (95% CI, 25%-54%) for patient safety indicators. Similar results were obtained for venous thromboembolism (59%, 95% CI, 44%-72% vs 46%, 95% CI, 32%-60%), pneumonia (64%, 95% CI, 58%-70% vs 5%, 95% CI, 3%-9%), sepsis (89%, 95% CI, 78%-94% vs 34%, 95% CI, 24%-47%), and postoperative myocardial infarction (91%, 95% CI, 78%-97% vs 89%, 95% CI, 74%-96%). Both natural language processing and patient safety indicators were highly specific for these diagnoses.

Conclusion: Among patients undergoing inpatient surgical procedures at VA medical centers, natural language processing analysis of electronic medical records to identify postoperative complications had higher sensitivity and lower specificity compared with patient safety indicators based on discharge coding.

Improving patient safety remains an important priority. One method for identifying safety concerns is through screening administrative data for specific International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes that might be suggestive of a medical injury.1,2 To expand on this method, the Agency for Healthcare Research and Quality developed a set of 20 measures, known as the patient safety indicators, which use administrative data to screen for potential adverse events that occur during hospitalization.3 Several private organizations and the Centers for Medicare & Medicaid Services use the patient safety indicator method to provide ratings on individual health care institutions.4-6 Administrative data have several intrinsic strengths as a health care quality surveillance tool. First, administrative data are readily available, easily accessible, and inexpensively captured. However, they are not without limitations.
Concerns exist about the validity of administrative codes,7-9 and it can be difficult to determine from discharge diagnostic codes whether a disease entity existed before the patient was hospitalized or occurred during the hospital admission.10,11 With the rapid expansion of electronic medical record (EMR) use, along with increased federal support for health care information technology, a far richer source of clinical information regarding hospital-related safety events has emerged.12 The development of automated approaches, such as natural language processing, that extract specific medical concepts from textual medical documents without relying on discharge codes offers a powerful alternative to either unreliable administrative data or labor-intensive, expensive manual chart reviews.13 Nevertheless, there have been few studies investigating natural language processing tools for the detection of adverse events.14,15 It is not known whether a surveillance approach based on language processing searches of free-text documents will perform better than currently used tools based on administrative data.

The purpose of this study was to evaluate a language processing–based approach to identify postoperative complications within a multihospital health care network using the same EMR. We hypothesized that the language processing searches would better detect surgical complications than the patient safety indicators identified from administrative discharge information.

Methods

Setting

The study population included a randomly selected sample of Veterans Affairs Surgical Quality Improvement Program (VASQIP)–reviewed surgical inpatient admissions to 6 Veterans Health Administration medical centers across 3 states between fiscal years 1999 and 2006. At each study site the institutional review board approved the study and granted a waiver of the need to obtain informed consent for the use of patient data. Self-reported patient race/ethnicity information was obtained from demographic files.

Database

We linked VASQIP cases to the Veterans Affairs (VA) patient treatment file, an administrative database containing records on all veterans discharged from VA facilities. Linkage was based on the patients' identifier code and on the surgical procedure date falling between a patient treatment file admission and discharge date (a toy sketch of this linkage appears below, following the VASQIP measures description). Narrative clinical notes such as discharge summaries, progress notes, operative notes, microbiology reports, imaging reports, and outpatient visit notes were obtained from the Veterans Health Information System and Technology Architecture. In addition, we acquired structured data tables including demographic data, vital sign information, pharmacy data files, and laboratory results.

Measures

VASQIP. As part of the VASQIP protocol during the time of this study, only major noncardiac surgical procedures were eligible for review. In addition, surgical procedures were excluded if they were performed under local nerve blocks, were low volume, or were low risk. The VASQIP nurse reviewers underwent extensive training to prospectively collect clinical information on surgical cases.16 These nurse reviewers tracked eligible surgical cases for 30 days after surgery and recorded the occurrence of 1 of 20 prespecified postoperative complications. We focused on the 6 postoperative complications (acute renal failure requiring dialysis, sepsis, deep vein thrombosis, pulmonary embolism, myocardial infarction, and pneumonia) that are also included as patient safety indicator events (Table 1).
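To make the date-window linkage described in the Database subsection concrete, the sketch below shows one way such a join could be expressed. It is a minimal illustration only: the column names, example rows, and use of pandas are assumptions, not the study's actual implementation.

```python
import pandas as pd

# Hypothetical column names and toy rows; the real VASQIP and patient
# treatment file (PTF) layouts are not detailed in the article.
vasqip = pd.DataFrame({
    "patient_id": [101, 102],
    "surgery_date": pd.to_datetime(["2004-03-02", "2005-07-15"]),
})
ptf = pd.DataFrame({
    "patient_id": [101, 102],
    "admit_date": pd.to_datetime(["2004-03-01", "2005-08-01"]),
    "discharge_date": pd.to_datetime(["2004-03-10", "2005-08-09"]),
})

# Join on the patient identifier, then keep only surgeries whose date falls
# between a hospitalization's admission and discharge dates.
linked = vasqip.merge(ptf, on="patient_id")
linked = linked[
    (linked["surgery_date"] >= linked["admit_date"])
    & (linked["surgery_date"] <= linked["discharge_date"])
]
print(linked)  # patient 101 links; patient 102's surgery was outside the stay
```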
At the time of this study, the patient safety indicators for myocardial infarction and pneumonia were considered experimental. The nurse reviewer interrater reliability on postoperative occurrences has been estimated at 0.73 for acute renal failure requiring dialysis, 0.65 for pneumonia, 0.60 for myocardial infarction, 0.81 for deep vein thrombosis, and 0.89 for pulmonary embolism.17

Patient Safety Indicators. To evaluate the natural language processing and patient safety indicator approaches simultaneously, we applied additional exclusion rules to the VASQIP cohort in order to match a previously published method for applying patient safety indicator software to VA databases.9,18 First, the patient safety indicators are hospital-based, whereas outcomes for VASQIP are surgery-based (ie, may include inpatient and outpatient outcomes). As such, we matched individual surgeries to specific hospitalizations and limited our sample to patients with a surgical episode occurring during their hospitalization. Because the patient safety indicators were designed to detect potentially preventable adverse occurrences, specific exclusion criteria were created to eliminate patients who were at high risk or had a greater likelihood of an adverse event due to preexisting comorbid illnesses or other circumstances. Thus, we applied the specific safety indicator numerator and denominator exclusion rules for each safety indicator event to the VASQIP database, as originally described by the Agency for Healthcare Research and Quality, to obtain our analytic sample for each study outcome3 (Figure 1). In addition, because the safety indicators were developed to identify complications occurring during hospitalizations, we excluded any postoperative complications identified by VASQIP that had occurred after the patient's hospital discharge. Finally, the safety indicators combined deep vein thrombosis and pulmonary embolism into a single category (venous thromboembolism), so we combined these outcomes as well.

Natural Language Processor. The Multi-threaded Clinical Vocabulary Server natural language processor19 was used to index the free-text records used in this project. The system indexed source materials using a concept-based indexing schema. This indexing schema was in turn based on the robust ontology of medical concepts available in the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) terminology, a clinical health care terminology that contains more than 310 000 active, hierarchically organized concepts.20 The output of the indexing process was an extensible markup language (XML) version of the encoded clinical record and a set of relational tables that was used for measure development and implementation (a toy illustration of this kind of concept matching appears below, after the description of source documents). An earlier version of the language processing tool has been used to examine the quality of VA disability examinations; in that setting, the system's sensitivity for the detection of clinical problems was 99.7% and its specificity was 97.9%.21,22

EMR Measures. Source documents for this study included narrative clinical notes, such as progress notes, consultant notes, imaging reports, microbiology reports, and discharge summaries. Some documents, such as electrocardiograms and some types of imaging reports, were not in a machine-readable format within the EMR and thus could not be processed by the language processing tool. This occurred when documents were scanned in from outside sources or were rendered internally into PDF documents.
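The concept-based indexing described above maps phrases in narrative notes to SNOMED-CT concepts. The following toy sketch conveys the flavor of that mapping; the dictionary, function, and concept identifiers are illustrative assumptions and are far simpler than the Multi-threaded Clinical Vocabulary Server's actual indexing.

```python
# A toy term-to-concept dictionary standing in for the SNOMED-CT-based
# indexing schema; the concept identifiers below are illustrative only and
# should not be taken as the codes actually used by the study's indexer.
CONCEPT_MAP = {
    "pulmonary embolism": "59282003",
    "deep vein thrombosis": "128053003",
    "pneumonia": "233604007",
    "septic shock": "76571007",
}

def index_note(text):
    """Return (matched phrase, concept id) pairs found in a narrative note,
    mimicking concept-based indexing at a very small scale."""
    lowered = text.lower()
    return [(term, code) for term, code in CONCEPT_MAP.items() if term in lowered]

print(index_note("CT angiogram positive for acute pulmonary embolism."))
# [('pulmonary embolism', '59282003')]
```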
Microbiology reports were transformed into structured data using regular expressions that recognized strings of text (in this case, bacterial or fungal organism names) but did not account for the syntax in which the term appeared; an illustrative sketch of this kind of pattern matching appears below, after the Analyses definitions. Search queries were also constructed using structured data from laboratory, pharmacy, and vital-sign databases. Because we were interested in postoperative occurrences, we applied our search queries only to documents and structured data dated after the surgical procedure. In addition, to facilitate our comparison with the patient safety indicator, we applied our language processing queries only to clinical narratives that occurred within the inpatient stay and were directly associated with the surgical procedure.

Narrative clinical notes were initially processed by parsing each note and then electronically identifying specific medical concepts and mapping them to SNOMED-CT concepts. Text documents were also mapped to phrase and sentence strings, allowing the rules to include string searches for colloquial terms or orderings of expressions not yet recognized by the language processing tool's vocabulary. The rule-building process involved clinical teams working from the VASQIP criteria to create specific search criteria (Table 1). These initial queries were tested on a training set of 6 randomly selected cases for each condition and 94 randomly selected controls. Search query development was an iterative 2-stage process in which the training documents were evaluated with individual queries followed by various combinations of queries. We also tested sequential testing strategies, in which an initially highly sensitive single query was applied to the analytic samples, followed by a second round of more specific queries applied to all positive hits generated by the initial query (a simple sketch of this cascade appears there as well). This process was repeated for each of the 6 postoperative complications. These rules were then applied to our patient sample, which excluded any cases or controls included within the training set.

Analyses

VASQIP-identified postoperative complications were considered the referent standard. We applied the natural language processing software and the developed query rule sets to determine the rate of language processing–detected complications and ran the patient safety indicator software, version 3.1, to determine the rate of safety indicator events. We calculated sensitivity and specificity for the 6 adverse outcomes of interest. Because the safety indicators combined deep vein thrombosis and pulmonary embolism into a single event, we present the results of our search algorithms for the 2 events both separately and combined.

Sensitivity was defined as the proportion of the 6 postoperative events that were identified by either the natural language processing or the patient safety indicator approach. Specificity was defined as the proportion of hospitalizations without a VASQIP-identified event that were not flagged by the corresponding natural language processing or patient safety indicator query. The positive predictive value was defined as the proportion of cases flagged by the natural language processing or patient safety indicator query that had a VASQIP-confirmed adverse event. Negative predictive value was defined as the proportion of cases not flagged by the natural language processing or the patient safety indicator query that did not have a VASQIP event.
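As noted at the start of the EMR Measures description, microbiology reports were screened with regular expressions that matched organism names without regard to the surrounding syntax. The sketch below illustrates that style of matching; the organism list and function are hypothetical examples, not the study's actual patterns.

```python
import re

# Illustrative organism names (not the study's actual list) to detect in
# free-text microbiology reports.
ORGANISMS = [
    "staphylococcus aureus",
    "escherichia coli",
    "pseudomonas aeruginosa",
    "klebsiella pneumoniae",
    "candida albicans",
]
ORGANISM_RE = re.compile("|".join(ORGANISMS), re.IGNORECASE)

def organisms_mentioned(report_text):
    """Return organism strings found anywhere in a report. As in the approach
    described above, syntax is ignored, so a negated phrase such as
    'no growth of Staphylococcus aureus' would still match."""
    return [m.group(0).lower() for m in ORGANISM_RE.finditer(report_text)]

print(organisms_mentioned("Blood culture: Staphylococcus aureus isolated."))
# ['staphylococcus aureus']
```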
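The sequential testing strategy described above (a broad, highly sensitive first-pass query followed by more specific queries applied only to the initial positives) can be sketched as a simple cascade. The predicates and data shapes here are assumptions for illustration, not the study's actual queries.

```python
def sequential_screen(cases, broad_query, specific_queries):
    """Two-stage screen: keep cases flagged by a sensitive first-pass query,
    then require at least one of the more specific queries to also match."""
    first_pass = [case for case in cases if broad_query(case)]
    return [case for case in first_pass if any(q(case) for q in specific_queries)]

# Hypothetical usage: each "case" is the concatenated postoperative note text.
cases = [
    "patient febrile, blood cultures drawn, started on broad antibiotics",
    "blood culture grew staphylococcus aureus; septic shock requiring pressors",
]
flagged = sequential_screen(
    cases,
    broad_query=lambda text: "blood culture" in text,
    specific_queries=[lambda text: "septic shock" in text,
                      lambda text: "multiorgan failure" in text],
)
print(flagged)  # only the second case survives both stages
```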
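The definitions of sensitivity, specificity, and predictive values above, together with the Wilson score intervals described in the statistical analysis that follows, reduce to simple arithmetic on a 2x2 table of flagged versus VASQIP-confirmed events. This is a minimal sketch with made-up counts, not data from the study.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion (z = 1.96 for 95% CIs)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

def test_characteristics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table comparing a
    flagging method (NLP query or patient safety indicator) against the
    VASQIP-reviewed referent standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Example with made-up counts (not data from the study):
print(test_characteristics(tp=32, fp=15, fn=7, tn=1870))
print(wilson_ci(32, 39))  # CI around a sensitivity of 32/39
```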
We calculated 95% confidence intervals (CIs) for sensitivity and specificity with the Wilson score method, using R version 2.12.0. We used the McNemar test to compare sensitivity and specificity between the natural language processing approach and the patient safety indicator, using SAS version 9.2 (SAS Institute Inc, Cary, North Carolina). Statistical testing was 2-tailed, and any P value <.05 was considered significant.

Results

Of the 2974 patients included in this study, the median patient age was 64.5 years and 95% were men, typical for the VA population (Table 2). Eighty-two percent of patients had an American Society of Anesthesiologists preoperative score of 3 or higher. Thirty-eight percent of operations were classified as general surgical procedures, 21% were orthopedic surgeries, and 14% were vascular procedures. Within each analytic sample the percentage of postoperative acute renal failure requiring dialysis was 2% (39 of 1924); for pulmonary embolism, 0.7% (18 of 2327); for deep vein thrombosis, 1% (29 of 2327); for sepsis, 7% (61 of 866); for pneumonia, 16% (222 of 1405); and for myocardial infarction, 2% (35 of 1822).

In general, the natural language processing–based approach had higher sensitivities and lower specificities than did the patient safety indicator (Table 3). The increase in sensitivity of the natural language processing–based approach compared with the patient safety indicator was more than 2-fold for acute renal failure and sepsis and over 12-fold for pneumonia. Specificities were 4% to 7% higher with the patient safety indicator method than with the natural language processing approach.

For postoperative acute renal failure requiring dialysis, the patient safety indicator algorithm had a sensitivity of 0.38 (95% CI, 0.25-0.54) with a specificity of 1.00 (95% CI, 0.99-1.00). Natural language processing–based queries of postoperative progress notes using SNOMED terms or string searches had sensitivities ranging from 0.39 (95% CI, 0.25-0.54) to 0.77 (95% CI, 0.62-0.87; Figure 2 and eTable 1). A sequential search strategy using the natural language processing approach first, followed by the patient safety indicator algorithms, had a sensitivity of 0.33 (95% CI, 0.21-0.49) and a positive predictive value of 0.93 (95% CI, 0.69-1.00).

The patient safety indicator algorithm had a sensitivity of 0.46 (95% CI, 0.32-0.60) and a specificity of 0.98 (95% CI, 0.98-0.99) for venous thromboembolism. The natural language processing approach for venous thromboembolism had a sensitivity of 0.59 (95% CI, 0.44-0.72) and a specificity of 0.91 (95% CI, 0.90-0.92; Figure 2 and eTable 2).

The patient safety indicator approach for pneumonia had a sensitivity of 0.05 (95% CI, 0.03-0.09) and a specificity of 0.99 (95% CI, 0.99-1.00; Figure 2 and eTable 3). A search strategy that identified postoperative occurrences of lung consolidation recorded within progress notes or discharge summaries had a sensitivity of 0.64 (95% CI, 0.58-0.70) and a specificity of 0.94 (95% CI, 0.94-0.96).

Occurrences of postoperative sepsis were identified using the patient safety indicator method with a sensitivity of 0.34 (95% CI, 0.24-0.47) and a specificity of 0.99 (95% CI, 0.98-0.99; Figure 2 and eTable 4). An identification strategy combining query searches for multiorgan failure, septic shock, systemic infection, or bacterial or fungal organisms on blood culture reports resulted in a sensitivity of 0.89 (95% CI, 0.78-0.94) with a specificity of 0.95 (95% CI, 0.93-0.96).
The patient safety indicator algorithms identified postoperative myocardial infarctions with a sensitivity of 0.89 (95% CI, 0.74-0.96) and a specificity of 0.99 (95% CI, 0.98-0.99; Figure 2 and eTable 5). Combining cardiac biomarker results obtained from structured data with text searches of postoperative progress notes for SNOMED terms related to “electrocardiographic ST segment changes” resulted in a sensitivity of 0.74 (95% CI, 0.58-0.86) and a specificity of 0.98 (95% CI, 0.98-0.99).

Comment

We found that automated searches of an EMR using a natural language processing–based approach were able to identify occurrences of acute renal failure requiring dialysis, deep vein thrombosis, pulmonary embolism, pneumonia, sepsis, and acute myocardial infarction in patients following surgery. Varying the search strategies and source documents resulted in differing levels of case finding and false-positive alerts; however, for many outcomes, rules could be developed with both high sensitivities and high positive predictive values. For some outcomes, the choice of search strategy required substantial tradeoffs between case finding and false-positive alerts.

Although the patient safety indicator algorithms offered consistently high specificities, the natural language processing approach in general had significantly greater sensitivities with only a small reduction in specificities. In addition, depending on the chosen search strategy, positive predictive values could be moderate to high. In contrast to the patient safety indicator approach, for which test characteristics are fixed, the natural language processing approach offered a wide array of search strategies with varying test characteristics. Nevertheless, in some cases, specifically postoperative myocardial infarction, the patient safety indicator algorithm had excellent test characteristics that were not improved on by the natural language processing approach.

A natural language processing–based approach offers several potential advantages over administrative code–based strategies for identifying health care quality concerns. First is the flexibility of the approach to meet individual institutional needs. Once documents have been processed, different approaches and query strategies to identify a specific outcome can be implemented with relatively low programming effort using standard database query applications. Second, as opposed to administrative codes, search strategies using daily progress notes, microbiology reports, or imaging reports could be monitored on a prospective basis. Thus, this approach could potentially identify complications while a patient is still in the hospital, which could greatly facilitate real-time quality assurance processes. A natural language processing–based search strategy is also far more scalable than manual abstraction, potentially allowing surveillance of an entire health care system population rather than a subsample. Finally, in systems with highly integrated EMRs, prospective surveillance could be extended to the outpatient setting for individuals remaining within the health care system.

Only a few studies have used text-based approaches to identify medical complications. In a study by Melton and Hripcsak,14 the natural language processing system MedLEE was used to identify 45 adverse events tracked as part of the New York Patient Occurrence Reporting and Tracking System. The overall sensitivity of the system was 0.28 (95% CI, 0.17-0.42) with a specificity of 0.99 (95% CI, 0.98-0.99).
This system was limited in that the only electronically available text source was discharge summaries. Penz et al15 compared 2 automated techniques to identify adverse events related to the use of central venous catheters. An approach using a phrase-matching algorithm had a higher sensitivity but lower specificity than an approach using natural language processing. Improvements in using automated approaches to extract information from the clinical narrative are still ongoing, but this approach is well regarded as a current strategy for the detection of adverse events associated with medical care.13,23

Our patient safety indicator results are similar to previously published studies. Romano et al9 and Rivard et al18 found that the sensitivity of the patient safety indicators in a VA population was 44% (95% CI, 32%-56%) for detecting acute renal failure, 56% (95% CI, 50%-63%) for detecting pulmonary embolism or deep vein thrombosis, and 32% (95% CI, 23%-43%) for detecting sepsis. Differences in patient safety indicator event rates between VA and non-VA populations have been generally small and inconsistent.24 Rosen et al24,25 have suggested that these differences are likely a result of inadequate case-mix adjustment.

A strength of our study is its large sample size. In addition, we applied our natural language processing queries only to cases that would have been included within the patient safety indicator denominator. Although this reduced the number of total events that we would have detected, this approach helped ensure the best compatibility between the patient safety indicator and the natural language processing approaches. Another strength of our study was the use of VASQIP nurse-reviewed events as our referent standard. The VASQIP program has been in operation for more than 15 years, and nurse reviewers undergo a rigorous training protocol that has been determined to be reliable.17,26 In addition, this study applied natural language processing methods for extraction of clinical information across multiple types of medical documentation occurring over a longitudinal period of hospitalization.

Our study has several limitations. One is that the patient safety indicators were not originally designed for VA data, and some of the methodological issues in modifying the patient safety indicators for VA data have been previously described.9,21 Nevertheless, patient safety indicator rates appear similar between VA and non-VA populations.24 Perhaps the greatest limitation is that although the adoption of EMRs by health systems is improving, only a small minority of institutions currently use them, so some of the query strategies would not be feasible at all institutions.27 Nevertheless, our results should contribute to the growing literature supporting the utility of EMRs and help to encourage future adoption of such technology and the integration of such systems across health care systems.

In conclusion, using natural language processing with an electronic medical record greatly improves postoperative complication identification compared with the patient safety indicators, an administrative code–based algorithm. Different query strategies produced varying sensitivity and specificity, which in many cases could be improved by combining individual queries to optimize test characteristics. A natural language processing–based approach designed to detect postoperative complications within an EMR identifies several surgical complications with moderate to good sensitivities and specificities.
Developing natural language processing–based algorithms was an iterative process, and in many cases query combinations resulted in improvement of poorly functioning rules. As additional institutions develop fully integrated EMRs, electronic chart reviews for quality purposes should be further developed and evaluated.

Article Information

Corresponding Author: Harvey J. Murff, MD, MPH, Institute for Medicine and Public Health, Vanderbilt Epidemiology Center, 2525 West End Ave, Ste 600, Sixth Floor, Nashville, TN 37203 (harvey.j.murff@vanderbilt.edu).

Author Contributions: Dr Speroff had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Murff, Brown, Speroff. Acquisition of data: Gentry, Brown, Speroff. Analysis and interpretation of data: Murff, FitzHenry, Matheny, Gentry, Kotter, Crimin, Dittus, Rosen, Elkin, Brown, Speroff. Drafting of the manuscript: Murff. Critical revision of the manuscript for important intellectual content: Murff, FitzHenry, Matheny, Gentry, Dittus, Rosen, Elkin, Brown, Speroff. Statistical analysis: Kotter, Crimin. Obtained funding: Murff, Brown, Speroff. Study supervision: Murff, Speroff.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Matheny reported that he is supported by Veterans Health Administration HSR&D Career Development Award CDA-08-020. Drs Matheny and Speroff reported that they are supported by Veterans Health Consortium for Health Informatics Research (CHIR) awards HIR 09-001 and HIR 09-003. Dr Elkin reported a pending grant from the National Institutes of Health. No other disclosures were made.

Funding/Support: This study was supported by grant SAF-03-223 from the Department of Veterans Affairs.

Role of the Sponsor: The sponsors were not involved in the design and conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.

Disclaimer: The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.

Additional Contributions: We thank the VA Surgical Quality Data Use Group (SQDUG) for its role as scientific advisor and for the critical review of data use and analysis presented in the manuscript. We also thank the collaborative investigators for their work in this study: Debra Jo Barrett, MSN, Lexington VA Medical Center, Lexington, Kentucky; William G. Cheadle, MD, Louisville VA Medical Center, Louisville, Kentucky; Brad L. Roper, PhD, Memphis VA Medical Center, Memphis, Tennessee; Teresa England, RN, James H. Quillen VA Medical Center, Mountain Home, Tennessee; and Sandra E. Shaw, RN, Huntington VA Medical Center, Huntington, West Virginia, none of whom received compensation.

References

1. Zhan C, Miller MR. Administrative data based patient safety research: a critical review. Qual Saf Health Care. 2003;12(suppl 2):ii58-ii63.

2. Iezzoni LI, Daley J, Heeren T, et al. Identifying complications of care using administrative data. Med Care. 1994;32(7):700-715.

3. AHRQ Guide to Patient Safety Indicators, Version 3.1. Rockville, MD: Agency for Healthcare Research and Quality; March 2003.
4. HealthGrades. HealthGrades Seventh Annual Patient Safety in American Hospitals Study. 2010. http://www.healthgrades.com/media/DMS/pdf/PatientSafetyInAmericanHospitalsStudy2010.pdf. Accessed July 20, 2010.

5. Premier Inc. CMS/Premier Hospital Quality Incentive Demonstration (HQID). 2010. http://www.premierinc.com/quality-safety/tools-services/p4p/hqi/index.jsp. Accessed July 20, 2010.

6. Centers for Medicare and Medicaid Services. Medicare Quality Monitoring System. http://www.cms.gov. Accessed July 20, 2010.

7. Kaafarani HM, Rosen AK. Using administrative data to identify surgical adverse events: an introduction to the Patient Safety Indicators. Am J Surg. 2009;198(5)(suppl):S63-S68.

8. White RH, Sadeghi B, Tancredi DJ, et al. How valid is the ICD-9-CM based AHRQ patient safety indicator for postoperative venous thromboembolism? Med Care. 2009;47(12):1237-1243.

9. Romano PS, Mull HJ, Rivard PE, et al. Validity of selected AHRQ patient safety indicators based on VA National Surgical Quality Improvement Program data. Health Serv Res. 2009;44(1):182-204.

10. Bahl V, Thompson MA, Kau TY, Hu HM, Campbell DA Jr. Do the AHRQ patient safety indicators flag conditions that are present at the time of hospital admission? Med Care. 2008;46(5):516-522.

11. Houchens RL, Elixhauser A, Romano PS. How often are potential patient safety events present on admission? Jt Comm J Qual Patient Saf. 2008;34(3):154-163.

12. Blumenthal D. Launching HITECH. N Engl J Med. 2010;362(5):382-385.

13. Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008:128-144.

14. Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc. 2005;12(4):448-457.

15. Penz JF, Wilcox AB, Hurdle JF. Automated identification of adverse events related to central venous catheters. J Biomed Inform. 2007;40(2):174-182.

16. Khuri SF, Daley J, Henderson W, et al; National VA Surgical Quality Improvement Program. The Department of Veterans Affairs' NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. Ann Surg. 1998;228(4):491-507.

17. Davis CL, Pierce JR, Henderson W, et al. Assessment of the reliability of data collected for the Department of Veterans Affairs national surgical quality improvement program. J Am Coll Surg. 2007;204(4):550-560.

18. Rivard P, Elwy AR, Loveland S, et al. Applying patient safety indicators (PSIs) across healthcare systems: achieving data comparability. In: Henriksen K, Battles JB, Marks E, Lewin DI, eds. Advances in Patient Safety: From Research to Implementation. Vol 2. Rockville, MD: Agency for Healthcare Research and Quality and Department of Defense; 2005:7-25.

19. Elkin PL, Brown SH, Husser CS, et al. Evaluation of the content coverage of SNOMED CT: ability of SNOMED clinical terms to represent clinical problem lists. Mayo Clin Proc. 2006;81(6):741-748.
20. International Health Terminology Standards Development Organization. About SNOMED CT [Web page]. http://www.ihtsdo.org/snomed-ct/snomed-ct0/. Accessed January 20, 2011.

21. Brown SH, Elkin PL, Rosenbloom ST, Fielstein E, Speroff T. eQuality for all: extending automated quality measurement of free text clinical narratives. AMIA Annu Symp Proc. November 6, 2008:71-75.

22. Brown SH, Speroff T, Fielstein EM, et al. eQuality: electronic quality assessment from narrative clinical reports. Mayo Clin Proc. 2006;81(11):1472-1481.

23. Hripcsak G, Bakken S, Stetson PD, Patel VL. Mining complex clinical data for patient safety research: a framework for event discovery. J Biomed Inform. 2003;36(1-2):120-130.

24. Rosen AK, Rivard P, Zhao S, et al. Evaluating the patient safety indicators: how well do they perform on Veterans Health Administration data? Med Care. 2005;43(9):873-884.

25. Rosen AK, Zhao S, Rivard P, et al. Tracking rates of Patient Safety Indicators over time: lessons from the Veterans Administration. Med Care. 2006;44(9):850-860.

26. Itani KM. Fifteen years of the National Surgical Quality Improvement Program in review. Am J Surg. 2009;198(5)(suppl):S9-S18.

27. Jha AK, DesRoches CM, Campbell EG, et al. Use of electronic health records in US hospitals. N Engl J Med. 2009;360(16):1628-1638.

Publisher: American Medical Association
Copyright: © 2011 American Medical Association. All Rights Reserved.
ISSN: 0098-7484
eISSN: 1538-3598
DOI: 10.1001/jama.2011.1204

Abstract

Abstract Context Currently most automated methods to identify patient safety occurrences rely on administrative data codes; however, free-text searches of electronic medical records could represent an additional surveillance approach. Objective To evaluate a natural language processing search–approach to identify postoperative surgical complications within a comprehensive electronic medical record. Design, Setting, and Patients Cross-sectional study involving 2974 patients undergoing inpatient surgical procedures at 6 Veterans Health Administration (VHA) medical centers from 1999 to 2006. Main Outcome Measures Postoperative occurrences of acute renal failure requiring dialysis, deep vein thrombosis, pulmonary embolism, sepsis, pneumonia, or myocardial infarction identified through medical record review as part of the VA Surgical Quality Improvement Program. We determined the sensitivity and specificity of the natural language processing approach to identify these complications and compared its performance with patient safety indicators that use discharge coding information. Results The proportion of postoperative events for each sample was 2% (39 of 1924) for acute renal failure requiring dialysis, 0.7% (18 of 2327) for pulmonary embolism, 1% (29 of 2327) for deep vein thrombosis, 7% (61 of 866) for sepsis, 16% (222 of 1405) for pneumonia, and 2% (35 of 1822) for myocardial infarction. Natural language processing correctly identified 82% (95% confidence interval [CI], 67%-91%) of acute renal failure cases compared with 38% (95% CI, 25%-54%) for patient safety indicators. Similar results were obtained for venous thromboembolism (59%, 95% CI, 44%-72% vs 46%, 95% CI, 32%-60%), pneumonia (64%, 95% CI, 58%-70% vs 5%, 95% CI, 3%-9%), sepsis (89%, 95% CI, 78%-94% vs 34%, 95% CI, 24%-47%), and postoperative myocardial infarction (91%, 95% CI, 78%-97%) vs 89%, 95% CI, 74%-96%). Both natural language processing and patient safety indicators were highly specific for these diagnoses. Conclusion Among patients undergoing inpatient surgical procedures at VA medical centers, natural language processing analysis of electronic medical records to identify postoperative complications had higher sensitivity and lower specificity compared with patient safety indicators based on discharge coding. Improving patient safety remains an important priority. One method for identifying safety concerns is through screening administrative data for specific International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM) codes that might be suggestive of a medical injury.1,2 To expand on this method, the Agency for Healthcare Research and Quality developed a set of 20 measures, known as the patient safety indicators, which use administrative data to screen for potential adverse events that occur during hospitalization.3 Several private organizations and the Centers for Medicare & Medicaid Services use the patient safety indicator method to provide ratings on individual health care institutions.4-6 Administrative data have several intrinsic strengths as a health care quality surveillance tool. First, administrative data are readily available, easily accessible, and inexpensively captured. However, they are not without limitations. 
Concerns exist about the validity of administrative codes,7-9 and it can be difficult to determine from discharge diagnostic codes whether a disease entity existed before the patient was hospitalized or occurred during the hospital admission.10,11 With the rapid expansion of electronic medical record (EMR) use, along with increased federal support for health care information technology, a far richer source of clinical information regarding hospital-related safety events has emerged.12 The development of automated approaches, such as natural language processing, that extract specific medical concepts from textual medical documents that do not rely on discharge codes offers a powerful alternative to either unreliable administrative data or labor-intensive, expensive manual chart reviews.13 Nevertheless, there have been few studies investigating natural language processing tools for the detection of adverse events.14,15 It is not known whether a surveillance approach based on language processing searches of free-text documents will perform better than currently used tools based on administrative data. The purpose of this study was to evaluate a language processing–based approach to identify postoperative complications within a multihospital health care network using the same EMR. We hypothesized that the language processing searches would better detect surgical complications than the patient safety indicators identified from administrative discharge information. Methods Setting The study population included a randomly selected sample of Veterans Affairs Surgical Quality Improvement Program (VASQIP)–reviewed surgical inpatient admissions to 6 Veterans Health Administration medical centers across 3 states between fiscal years 1999 and 2006. At each study site the institutional review board approved the study and granted a waiver for the need to obtain informed consent for the use of patient data. Self-reported patient race/ethnicity information was obtained from demographic files. Database We linked VASQIP cases to the Veterans Affairs (VA) patient treatment file, an administrative database containing records on all veterans discharged from VA facilities. Linkage was based on the patients' identifier code and having the surgical procedure date fall between a patient treatment file admission and discharge date. Narrative clinical notes such as discharge summaries, progress notes, operative notes, microbiology reports, imaging reports, and outpatient visit notes were obtained from the Veterans Health Information System and Technology Architecture. In addition we acquired structured data tables including demographic data, vital sign information, pharmacy data files, and laboratory results. Measures VASQIP. As part of the VASQIP protocol during the time of this study, only major noncardiac surgical procedures were eligible for review. In addition, surgical procedures were excluded if performed under local nerve blocks, were low volume, or were low risk. The VASQIP nurse reviewers underwent extensive training to prospectively collect clinical information on surgical cases.16 These nurse reviewers tracked eligible surgical cases for 30 days after surgery and recorded the occurrence of 1 of 20 prespecified postoperative complications. We focused on the 6 postoperative complications (acute renal failure requiring dialysis, sepsis, deep vein thrombosis, pulmonary embolism, myocardial infarction, and pneumonia) that are also included as patient safety indicator events (Table 1). 
At the time of this study, the patient safety indicators for myocardial infarction and pneumonia were considered experimental. The nurse reviewer interrater reliability on postoperative occurrences has been estimated at 0.73 for acute renal failure requiring dialysis, 0.65 for pneumonia, 0.60 for myocardial infarction, 0.81 for deep vein thrombosis, and 0.89 for pulmonary embolism.17 Patient Safety Indicators. To simultaneously evaluate the approaches of natural language processing and patient safety indicators, we applied additional exclusion rules to the VASQIP cohort in order to match a previously published method for applying patient safety indicator software to VA databases.9,18 First, the patient safety indicators are hospital-based, whereas outcomes for VASQIP are surgery-based (ie, may include inpatient and outpatient outcomes). As such, we matched individual surgeries to specific hospitalizations and limited our sample to patients with a surgical episode occurring during their hospitalization. Because the patient safety indicators were designed to detect potentially preventable adverse occurrences, specific exclusion criteria were created to eliminate patients who were at high risk or had a greater likelihood of an adverse event due to preexisting comorbid illnesses or other circumstances. Thus, we applied specific safety indicator numerator and denominator exclusion rules for each safety indicator event to the VASQIP database as originally described by the Agency for Healthcare Research and Quality to obtain our analytic sample for each study outcome3 (Figure 1). In addition, as the safety indicators were developed to identify complications occurring during hospitalizations, we excluded any postoperative complications identified by VASQIP that had occurred after the patients' hospital discharge. Finally, the safety indicator combined both deep vein thrombosis and pulmonary embolism into a single category (venous thromboembolism), and as such we combined these outcomes. Natural Language Processor. The Multi-threaded Clinical Vocabulary Server natural language processor system19 was used to index the free text records used in this project. The system indexed source materials using a concept-based indexing schema. This underlying indexing schema was in turn based on the robust ontology of medical concepts available in the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) terminology, a clinical health care terminology index that contains more than 310 000 active hierarchically organized concepts.20 The output of the indexing process was an extensible markup language (XML) version of the encoded clinical record and a set of relational tables that was used for measure development and implementation. An earlier language processing version of the tool has been used to examine the quality of VA disability examinations, and in this setting, the system's sensitivity for the detection of clinical problems was 99.7% and the specificity was 97.9%.21,22 EMR Measures. Source documents for this study included narrative clinical notes, such as progress notes, consultant notes, imaging reports, microbiology reports, and discharge summaries. Some documents such as electrocardiograms and some types of imaging reports were not in a machine-readable format within the EMR and were thus unable to be processed by the language processing tool. This occurred when documents were scanned in from outside sources or were rendered internally into PDF documents. 
Microbiology reports were transformed into structured data using regular expressions that recognized strings of text, in this case bacterial or fungal organisms but did not account for the syntax in which the term was identified. Search queries were also constructed using structured data from laboratory, pharmacy, and vital-sign databases. Because we were interested in postoperative occurrences, we only applied our search queries to documents and structured data with dates occurring after the date of the surgical procedure. In addition, to facilitate our comparison with the patient safety indicator, we applied our language processing queries to clinical narratives that occurred only within the inpatient stay and were directly associated with the surgical procedure. Narrative clinical notes were initially processed by parsing each note and then by electronically identifying specific medical concepts and mapping these concepts to SNOMED-CT concepts. Text documents were also mapped to phrase and sentence strings allowing inclusion in the rules string searches of colloquial terms or ordering of expressions not yet recognized by the language processing tool vocabulary. The rule-building process involved clinical teams working from the VASQIP criteria to create specific search criteria (Table 1). These initial queries were tested on a training set of 6 randomly selected cases for each condition and 94 randomly selected controls. Search query development was an iterative 2-stage process in which the training documents were evaluated with individual queries followed by various combinations of queries. We also tested sequential testing strategies, where an initially highly sensitive single query would be applied to the analytic samples followed by a second round of more specific queries applied to all positive hits generated from the initial query. This process was repeated for each of the 6 postoperative complications. These rules were then applied to our patient sample, which excluded any cases or controls included within the training set. Analyses VASQIP-identified postoperative complications were considered the referent standard. We applied the natural language processing software and development query rule sets to determine the rate of language processing–detected complications and ran the patient safety indicator software version 3.1 to determine the rate of safety indicator events. We calculated sensitivity and specificity for the 6 adverse outcomes of interest. Because the safety indicators combined deep vein thrombosis and pulmonary embolism into a single event, we presented the results of our search algorithms for the 2 events both separately and combined. Sensitivity was defined as the proportion of the 6 postoperative events that were identified by either the natural language processing or the patient safety indicator approach. Specificity was defined as the proportion of hospitalizations without a VASQIP-identified event that were not flagged by the corresponding natural language processing or patient safety indicator query. The positive predictive value was defined as the proportion of cases flagged by the natural language processing or patient safety indicator query that had a VASQIP-confirmed adverse event. Negative predictive value was defined as the proportion of cases not flagged by the natural language processing or the patient safety indicator query that did not have a VASQIP event. 
We calculated 95% confidence intervals (CIs) for sensitivity and specificity using the Wilson score method using R version 2.12.0. We used the McNemar test to compare sensitivity and specificity between the natural language processing approach and the patient safety indicator using SAS version 9.2 (SAS Institute Inc, Cary, North Carolina). Statistical testing was 2 tailed, and any P value <.05 was considered significant. Results Of the 2974 patients included in this study, the median patient age was 64.5 years and 95% were men, typical for the VA population (Table 2). Eighty-two percent of patients had an American Society of Anesthesiologist preoperative score of 3 or higher. Thirty-eight percent of operations were classified as general surgical procedures, 21% were orthopaedic surgeries, and 14% were vascular procedures. Within each analytic sample the percentage of postoperative acute renal failure requiring dialysis was 2% (39 of 1924); for pulmonary embolism, 0.7% (18 of 2327); for deep vein thrombosis, 1% (29 of 2327); for sepsis, 7% (61 of 866); for pneumonia, 16% (222 of 1405), and for myocardial infarction, 2% (35 of 1822). In general, using a natural language processing–based approach had higher sensitivities and lower specificities than did the patient safety indicator (Table 3). The increase in sensitivity of the natural language processing–based approach compared with the patient safety indicator was more than 2-fold for acute renal failure and sepsis and over 12-fold for pneumonia. Specificities were 4% to 7% higher with the patient safety indicator method than the natural language processing approach. For postoperative acute renal failure requiring dialysis, the patient safety indicator algorithm had a sensitivity of 0.38 (95% CI, 0.25-0.54) with a specificity of 1.00 (95% CI, 0.99-1.00). Natural language processing–based queries of postoperative progress notes using SNOMED terms or string searches had sensitivities ranging from 0.39 (95% CI, 0.25-0.54) to 0.77 (95% CI, 0.62-0.87; Figure 2 and eTable 1). A sequential search strategy using a natural language processing approach first followed by the patient safety indicator algorithms had a sensitivity of 0.33 (95% CI, 0.21-0.49) and a positive predictive value of 0.93 (95% CI, 0.69-1.00). The patient safety indicator algorithm had a sensitivity of 0.46 (95% CI, 0.32-0.60) and specificity of 0.98 (95% CI, 0.98-0.99) for venous thromboembolism. The natural language processing approach for venous thromboembolism had a sensitivity of 0.59 (95% CI, 0.44-0.72) and a specificity of 0.91 (95% CI, 0.90-0.92; Figure 2 and eTable 2) The patient safety indicator approach for pneumonia had a sensitivity of 0.05 (95% CI, 0.03-0.09) and a specificity of 0.99 (95% CI, 0.99-1.00; Figure 2 and eTable 3). A search strategy that identified postoperative occurrences of lung consolidation recorded within progress notes or discharge summaries had a lower sensitivity of 0.64 (95% CI, 0.58-0.70), and a specificity of 0.94 (95% CI, 0.94-0.96). Occurrences of postoperative sepsis were identified using the patient safety indicator method with a sensitivity of 0.34 (95% CI, 0.24-0.47) and a specificity of 0.99 (95% CI, 0.98-0.99; Figure 2 and eTable 4). An identification strategy combining query searches for multiorgan failure, septic shock, systemic infection, or bacterial or fungal organisms on blood culture reports resulted in a sensitivity of 0.89 (95% CI, 0.78-0.94) with a specificity of 0.95 (95% CI, 0.93-0.96) . 
The patient safety indicator algorithms identified postoperative myocardial infarctions with a sensitivity of 0.89 (95% CI, 0.74-0.96) and a specificity of 0.99 (95% CI, 0.98-0.99; Figure 2 and eTable 5) Combining cardiac biomarker results obtained from structured data with text searches of postoperative progress notes for SNOMED terms related to “electrocardiographic ST segment changes” resulted in a sensitivity of 0.74 (95% CI, 0.58-0.86) and a specificity of 0.98 (95% CI, 0.98-0.99). Comment We found that automated searches of an EMR using a natural language processing–based approach was able to identify occurrences of acute renal failure requiring dialysis, deep vein thrombosis, pulmonary embolism, pneumonia, sepsis, and acute myocardial infarctions in patients following surgery. Varying the search strategies and source documents resulted in differing levels of case finding and false positive alerts; however, for many outcomes, rules could be developed with both high sensitivities and high positive predictive values. For some outcomes, the choice of search strategy required substantial tradeoffs between case finding and false-positive alerts. Although the patient safety indicator algorithms offered consistently high specificities, the natural language processing approach in general had significantly greater sensitivities with only a small reduction in specificities. In addition, depending on one's chosen search strategy, positive predictive values could be moderate to high. In contrast to the patient safety indicator approach, for which test characteristics are fixed, the natural language processing approach offered a wide array of search strategies with varying test characteristics. Nevertheless in some cases, specifically postoperative myocardial infarction, the patient safety indicator algorithm had excellent test characteristics that were not improved through the natural language processing approach. A natural language processing–based approach offers several potential advantages over administrative-code based strategies to identify health care quality concerns. First is the flexibility of the approach to meet the individual institutional needs. Once documents have been processed, different approaches and query strategies to identify a specific outcome can be implemented at a relatively low programming effort using standard database query applications. Second, as opposed to administrative codes, search strategies using daily progress notes, microbiology reports, or imaging reports could be monitored on a prospective basis. Thus, this approach could potentially identify complications while a patient is still in the hospital, which could greatly facilitate real-time quality assurance processes. A natural language processing–based search strategy is far more scalable than manual abstraction, potentially allowing surveillance on an entire health care system population rather than a subsample. Finally, in systems with highly integrated EMRs, prospective surveillance could be extended to the outpatient setting for individuals remaining with the health care system. Only a few studies have used text-based approaches to identify medical complications. In a study by Melton and Hripcsak,14 the natural language processing system MedLEE was used to identify 45 adverse events tracked as part of the New York Patient Occurrence Reporting and Tracking System. The overall sensitivity of the system was 0.28 (95% CI, 0.17-0.42) with a specificity of 0.99 (95% CI, 0.98-0.99). 
This system was limited in that the only electronically available text source was discharge summaries. Penz et al15 compared 2 automated techniques to identify adverse events related to the use of central venous catheters. An approach using a phrase-matching algorithm had a higher sensitivity but lower specificity than an approach using natural language processing. Improvements in using automated approaches to extract information from the clinical narrative are still ongoing, but this approach is well-regarded as a current strategy for the detection of adverse events associated with medical care.13,23 Our patient safety indicator results are similar to previously published studies. Romano et al9 and Rivard et al18 found the sensitivity of the patient safety indicators in a VA population was 44% (95% CI, 32%-56%) to detect acute renal failure, 56% (95% CI, 50-63) to detect pulmonary embolism or deep vein thrombosis; and 32% (95% CI, 23%-43%) to detect sepsis. Differences in patient safety indicator event rates between VA and non-VA populations have been generally small and inconsistent.24 Rosen et al,24,25 have suggested that these differences are likely a result of inadequate case-mix adjustment. A strength of our study is its large sample size. In addition, we only applied our natural language processing queries to cases that would have been included within the patient safety indicator denominator. Although this reduced the number of total events that we would have detected, this approach helped ensure the best compatibility between the patient safety indicator and the natural language processing approaches. Another strength of our study was the use of VASQIP nurse-reviewed events as our referent standard. The VASQIP program has been in operation for more than 15 years, and nurse reviewers undergo a rigorous training protocol that has been determined to be reliable.17,26 In addition, this study applied natural language processing methods for extraction of clinical information across multiple types of medical documentation occurring over a longitudinal period of hospitalizations. Our study has several limitations. One is that the patient safety indicators were not originally designed for VA data and some of the methodological issues in modifying the patient safety indicators for VA data have been previously described.9,21 Nevertheless, patient safety indicator rates appear similar between the VA and non-VA populations.24 Perhaps the greatest limitation is that although the adoption of EMRs by health systems is improving, only a small minority of institutions currently use them, so some of the query strategies would not be feasible at all institutions.27 Nevertheless, our results should contribute to the growing literature supporting the utility of EMRs and help to encourage future adoption of such technology and the integration of such systems across health care systems. In conclusion, using natural language processing with an electronic medical record greatly improves postoperative complication identification compared with the patient safety indicators, an administrative-code based algorithm. Different query strategies produced varying sensitivity and specificity, which in many cases could be improved through combining individual queries to optimize test characteristic. A natural language processing–based approach designed to detect postoperative complications within an EMR identifies several surgical complications with moderate to good sensitivities and specificities. 
Developing natural language processing–based algorithms was an iterative process, and in many cases combining queries improved poorly functioning rules. As additional institutions develop fully integrated EMRs, electronic chart reviews for quality purposes should be further developed and evaluated.

Article Information

Corresponding Author: Harvey J. Murff, MD, MPH, Institute for Medicine and Public Health, Vanderbilt Epidemiology Center, 2525 West End Ave, Ste 600, Sixth Floor, Nashville, TN 37203 (harvey.j.murff@vanderbilt.edu).

Author Contributions: Dr Speroff had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Murff, Brown, Speroff. Acquisition of data: Gentry, Brown, Speroff. Analysis and interpretation of data: Murff, FitzHenry, Matheny, Gentry, Kotter, Crimin, Dittus, Rosen, Elkin, Brown, Speroff. Drafting of the manuscript: Murff. Critical revision of the manuscript for important intellectual content: Murff, FitzHenry, Matheny, Gentry, Dittus, Rosen, Elkin, Brown, Speroff. Statistical analysis: Kotter, Crimin. Obtained funding: Murff, Brown, Speroff. Study supervision: Murff, Speroff.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Matheny reported that he is supported by Veterans Health Administration HSR&D Career Development Award CDA-08-020. Drs Matheny and Speroff reported that they are supported by Veterans Health Consortium for Health Informatics Research (CHIR) awards HIR 09-001 and HIR 09-003. Dr Elkin reported a pending grant from the National Institutes of Health. No other disclosures were reported.

Funding/Support: This study was supported by grant SAF-03-223 from the Department of Veterans Affairs.

Role of the Sponsor: The sponsor was not involved in the design and conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.

Disclaimer: The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.

Additional Contributions: We thank the VA Surgical Quality Data Use Group (SQDUG) for its role as scientific advisor and for its critical review of the data use and analysis presented in the manuscript. We also thank the collaborating investigators for their work in this study: Debra Jo Barrett, MSN, Lexington VA Medical Center, Lexington, Kentucky; William G. Cheadle, MD, Louisville VA Medical Center, Louisville, Kentucky; Brad L. Roper, PhD, Memphis VA Medical Center, Memphis, Tennessee; Teresa England, RN, James H. Quillen VA Medical Center, Mountain Home, Tennessee; and Sandra E. Shaw, RN, Huntington VA Medical Center, Huntington, West Virginia, none of whom received compensation.

References

1. Zhan C, Miller MR. Administrative data based patient safety research: a critical review. Qual Saf Health Care. 2003;12(suppl 2):ii58-ii63.
2. Iezzoni LI, Daley J, Heeren T, et al. Identifying complications of care using administrative data. Med Care. 1994;32(7):700-715.
3. AHRQ Guide to Patient Safety Indicators, Version 3.1. Rockville, MD: Agency for Healthcare Research and Quality; March 2003.
4. HealthGrades. HealthGrades Seventh Annual Patient Safety in American Hospitals Study. 2010. http://www.healthgrades.com/media/DMS/pdf/PatientSafetyInAmericanHospitalsStudy2010.pdf. Accessed July 20, 2010.
5. Premier Inc. CMS/Premier Hospital Quality Incentive Demonstration (HQID). 2010. http://www.premierinc.com/quality-safety/tools-services/p4p/hqi/index.jsp. Accessed July 20, 2010.
6. Centers for Medicare & Medicaid Services. Medicare Quality Monitoring System. http://www.cms.gov. Accessed July 20, 2010.
7. Kaafarani HM, Rosen AK. Using administrative data to identify surgical adverse events: an introduction to the Patient Safety Indicators. Am J Surg. 2009;198(5)(suppl):S63-S68.
8. White RH, Sadeghi B, Tancredi DJ, et al. How valid is the ICD-9-CM based AHRQ patient safety indicator for postoperative venous thromboembolism? Med Care. 2009;47(12):1237-1243.
9. Romano PS, Mull HJ, Rivard PE, et al. Validity of selected AHRQ patient safety indicators based on VA National Surgical Quality Improvement Program data. Health Serv Res. 2009;44(1):182-204.
10. Bahl V, Thompson MA, Kau TY, Hu HM, Campbell DA Jr. Do the AHRQ patient safety indicators flag conditions that are present at the time of hospital admission? Med Care. 2008;46(5):516-522.
11. Houchens RL, Elixhauser A, Romano PS. How often are potential patient safety events present on admission? Jt Comm J Qual Patient Saf. 2008;34(3):154-163.
12. Blumenthal D. Launching HITECH. N Engl J Med. 2010;362(5):382-385.
13. Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008:128-144.
14. Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc. 2005;12(4):448-457.
15. Penz JF, Wilcox AB, Hurdle JF. Automated identification of adverse events related to central venous catheters. J Biomed Inform. 2007;40(2):174-182.
16. Khuri SF, Daley J, Henderson W, et al; National VA Surgical Quality Improvement Program. The Department of Veterans Affairs' NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. Ann Surg. 1998;228(4):491-507.
17. Davis CL, Pierce JR, Henderson W, et al. Assessment of the reliability of data collected for the Department of Veterans Affairs national surgical quality improvement program. J Am Coll Surg. 2007;204(4):550-560.
18. Rivard P, Elwy AR, Loveland S, et al. Applying patient safety indicators (PSIs) across healthcare systems: achieving data comparability. In: Henriksen K, Battles JB, Marks E, Lewin DI, eds. Advances in Patient Safety: From Research to Implementation. Vol 2. Rockville, MD: Agency for Healthcare Research and Quality and Department of Defense; 2005:7-25.
19. Elkin PL, Brown SH, Husser CS, et al. Evaluation of the content coverage of SNOMED CT: ability of SNOMED clinical terms to represent clinical problem lists. Mayo Clin Proc. 2006;81(6):741-748.
20. International Health Terminology Standards Development Organization. About SNOMED CT. http://www.ihtsdo.org/snomed-ct/snomed-ct0/. Accessed January 20, 2011.
21. Brown SH, Elkin PL, Rosenbloom ST, Fielstein E, Speroff T. eQuality for all: extending automated quality measurement of free text clinical narratives. AMIA Annu Symp Proc. 2008:71-75.
22. Brown SH, Speroff T, Fielstein EM, et al. eQuality: electronic quality assessment from narrative clinical reports. Mayo Clin Proc. 2006;81(11):1472-1481.
23. Hripcsak G, Bakken S, Stetson PD, Patel VL. Mining complex clinical data for patient safety research: a framework for event discovery. J Biomed Inform. 2003;36(1-2):120-130.
24. Rosen AK, Rivard P, Zhao S, et al. Evaluating the patient safety indicators: how well do they perform on Veterans Health Administration data? Med Care. 2005;43(9):873-884.
25. Rosen AK, Zhao S, Rivard P, et al. Tracking rates of Patient Safety Indicators over time: lessons from the Veterans Administration. Med Care. 2006;44(9):850-861.
26. Itani KM. Fifteen years of the National Surgical Quality Improvement Program in review. Am J Surg. 2009;198(5)(suppl):S9-S18.
27. Jha AK, DesRoches CM, Campbell EG, et al. Use of electronic health records in US hospitals. N Engl J Med. 2009;360(16):1628-1638.

Journal: JAMA (American Medical Association)

Published: Aug 24, 2011

Keywords: deep vein thrombosis; pulmonary embolism; postoperative complications; renal failure, acute; natural language processing; pneumonia; postoperative care; surgical procedures, operative; sepsis; electronic medical records; patient safety indicators; hemodialysis; myocardial infarction; venous thromboembolism; postoperative myocardial infarction; surgical complications; veterans
