A Lightweight API-Based Approach for Building Flexible Clinical NLP Systems

Hindawi Journal of Healthcare Engineering, Volume 2019, Article ID 3435609, 11 pages. https://doi.org/10.1155/2019/3435609

Research Article

Zhengru Shen, Hugo van Krimpen, and Marco Spruit
Department of Computing and Information Sciences, Utrecht University, Utrecht, Netherlands
Correspondence should be addressed to Zhengru Shen; z.shen@uu.nl
Received 18 February 2019; Revised 20 June 2019; Accepted 26 July 2019; Published 15 August 2019
Academic Editor: Haihong Zhang

Copyright © 2019 Zhengru Shen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Natural language processing (NLP) has become essential for the secondary use of clinical data. Over the last two decades, many clinical NLP systems were developed in both academia and industry. However, nearly all existing systems are restricted to specific clinical settings, mainly because they were developed for and tested with specific datasets, and they often fail to scale up. Therefore, using existing NLP systems for one's own clinical purposes requires substantial resources and long-term time commitments for customization and testing. Moreover, maintenance is also troublesome and time-consuming. This research presents a lightweight approach for building clinical NLP systems with limited resources. Following the design science research approach, we propose a lightweight architecture which is designed to be composable, extensible, and configurable. It treats NLP as an external component which can be accessed independently and orchestrated in a pipeline via web APIs.
To validate its feasibility, we developed a web-based prototype for clinical concept extraction with six well-known NLP APIs and evaluated it on three clinical datasets. In comparison with available benchmarks for the datasets, three high F1 scores (0.861, 0.724, and 0.805) were obtained from the evaluation. It also gained a low F1 score (0.373) on one of the tests, which is probably due to the small size of that test dataset. The development and evaluation of the prototype demonstrate that our approach has great potential for building effective clinical NLP systems with limited resources.

1. Introduction

Today's technologies allow the accumulation of vast textual data, which has consequently boosted the popularity of NLP research. A huge number of papers have been published and a variety of NLP systems and toolkits crafted in multiple domains over the last two decades. Among them, clinical NLP occupies a large portion. There are clinical NLP systems, such as Apache cTAKES, that integrate different NLP tools to process clinical documents [1, 2]. There are also NLP tools which target specific clinical needs, including extracting medication information [3], identifying locations of pulmonary embolism from radiology reports [4], and categorizing pain status [5].

Figure 1 presents a general architecture of a clinical NLP system that contains two main components: background knowledge and framework [1]. Background knowledge contains ontologies, domain models, domain knowledge, and trained corpora. The widely used clinical domain knowledge is the Unified Medical Language System (UMLS) [6]. Framework refers to a software platform that integrates various NLP tasks or modules either sequentially or hierarchically into NLP pipelines. GATE and UIMA are the leading open-source frameworks [7, 8]. There are two levels of NLP tasks: low-level tasks and high-level tasks. Low-level tasks include tokenization, part-of-speech tagging, sentence boundary detection, and so on. High-level tasks refer to semantic-level processing such as named entity recognition, relation extraction, and sentiment analysis.

Figure 1: A general architecture of clinical NLP systems.

History has shown that building a successful clinical NLP system requires a tremendous amount of resources. For instance, it took a team from Columbia University 14 years to commercialize the MedLEE system [9]. The development of cTAKES started at the Mayo Clinic in 2006, and further external collaborations with four other universities in 2010 resulted in the first release of the current Apache project [2]. Therefore, creating reusable NLP pipelines based on open-source modular frameworks like GATE and UIMA becomes more reasonable [9, 10]. Although this dramatically reduces the required resources and level of expertise, we argue that it is not an efficient and effective solution, for two main reasons. Firstly, nearly every NLP pipeline created to address a single specific clinical need, whether rule- or machine-learning-based, has been proven to be useful only for its designated purposes [11]. Thus, such pipelines are difficult to reuse. Secondly, deploying cTAKES-based NLP pipelines implies a high cost of operation, requiring the installation and configuration of multiple components by NLP experts [12]. Besides, maintenance of a deployed NLP system requires a continuous investment.

With the purpose of simplifying and outsourcing the NLP implementation, software as a service (SaaS) has been introduced to the NLP world in recent years [13]. SaaS generally refers to a mode of software delivery where end-users pay a monthly or annual subscription fee to utilize a set of functionalities over the Internet [14]. NLP systems distributed in the SaaS model are often available through web application programming interfaces (APIs) and are named NLP APIs or cloud-based NLP APIs [13, 15]. Many NLP APIs have emerged from both companies and universities and are growing in popularity [7]. A few prominent examples are IBM Watson, Aylien, Lexalytics, and TextRazor [13]. From a cost-benefit perspective, these NLP APIs allow developers to rapidly create NLP-enabled tools without investing abundant resources in implementing the necessary NLP techniques in code and in regular maintenance. A number of applications based on NLP APIs have been built [16-18].

To utilize NLP APIs, API-based frameworks have been produced [15, 19-21]. API-based systems, also known as cloud-based systems, are tools that are built on external web APIs and have their functionalities partially or fully accomplished with one API or a pipeline of APIs. Due to the growing popularity of web APIs in the software industry, API-based tools are abundant in companies. For instance, an API-based CMS (content management system) is utilized to save development resources and follow-up maintenance [22]. Furthermore, researchers have also investigated this approach in recent years. Rizzo and Troncy proposed the Named Entity Recognition and Disambiguation (NERD) framework, which incorporates the results of ten different public API-based NLP extractors [21]. A web-based tool called TeXTracT was devised to support the setup and deployment of NLP techniques on demand [15]. Abdallah et al. developed a flexible and extensible framework for integrating named entity recognition (NER) web APIs and assessed it across multiple domains [19]. Although these tools exhibit promising results, few were built for clinical NLP or evaluated on clinical datasets. Therefore, it is safe to say that adopting these tools in clinical settings would be problematic due to the unique characteristics of the clinical domain. For example, privacy is considered to be of the utmost importance, but none of the above tools have taken it into consideration.

This paper thus proposes a lightweight framework which enables the rapid development of clinical NLP systems with external NLP APIs. The approach has the following advantages compared to traditional NLP frameworks: (1) fast development; (2) lower costs; (3) flexibility; and (4) programming-language independence. Deployment is minimized by outsourcing both NLP tasks and background knowledge to external API services. Thus, NLP systems can be quickly and cost-efficiently developed based on the proposed framework. The framework is flexible in many respects. To begin with, it supports the flexible combination of different NLP tasks from external APIs. Secondly, users have the freedom to choose their preferred NLP API vendors, and multiple APIs can be integrated to achieve better results. To evaluate the framework, we have built a web-based open-source clinical NLP application.

2. Methods

2.1. Design Science. Our research followed design science research, as we built and evaluated the framework, because of its strength and popularity in solving real-world problems by designing and building an innovative IT artifact [23]. In our case, the artifact is a lightweight framework that facilitates the development of clinical NLP systems. We follow the design science research methodology (DSRM) proposed by Peffers et al., which consists of six steps: problem identification and motivation, definition of the objectives for a solution, design and development, demonstration, evaluation, and communication [24].

The DSRM is initiated by the (I) problem identification and motivation, which we addressed through a literature study. Previous studies have described the general architecture of clinical NLP systems and how expensive it is to build them. Even though the introduction of modular NLP frameworks reduced the complexity of NLP systems, it is still challenging for many healthcare institutions to create clinical NLP systems due to limited resources. Based on the identified problem, we inferred the (II) objectives for a solution: creating a lightweight NLP framework that enables the rapid development of an API-based clinical NLP system. In the (III) design and development, we developed the framework based on the general architecture we identified, after which each of its components is explained in detail. To (IV) demonstrate and (V) evaluate the framework, a web-based open-source clinical NLP application was developed. Moreover, experiments were carried out with three clinical datasets, primarily to examine whether external NLP APIs would deliver state-of-the-art performance. The final step of the DSRM is the communication; this paper serves as the start of our (VI) communication on this topic.

2.2. Evaluation Design. Three English anonymized clinical datasets were used in our evaluation.
Two of the datasets were obtained from the Informatics for Integrating Biology and the Bedside (i2b2) center: the 2008 Obesity Challenge and the 2009 Medication Challenge. The third dataset comes from a European clinical trial called OPERAM. Since the primary goal of our evaluation is to show that external general-purpose NLP APIs can yield good performance on clinical data, we only used a subset of the two large i2b2 datasets.

(i) 2008 Obesity Challenge. This dataset consists of 611 discharge letters. All discharge letters are annotated with 16 different medical condition terms in the context of obesity, including asthma, gastroesophageal disorder, and depression. Terms could be annotated as being in the document, not being in the document, or undecided/unknown, which was treated as not being in the document. The strength of this dataset, concerning the aim of these tests, is that it contains many documents; its weakness is that it is only annotated for 16 abstract terms in the context of obesity. To simplify the experiment, we randomly selected 100 discharge letters and labeled each document with the medical conditions that are annotated as "present."

(ii) 2009 Medication Challenge. 947 out of 1243 deidentified discharge letters in total have gold standard annotations. Medication names in the annotations are used for the evaluation. By comparing the annotated medication names with those generated by our application, we calculate the evaluation metrics. We also randomly selected 100 out of the 947 documents.

(iii) OPERAM Dataset. This dataset consists of five discharge letters that were used during the pilot of the OPERAM clinical trial [25]. Medical experts of the trial annotated these letters with both medical conditions and pharmaceutical drugs. Moreover, standardized clinical codes for each annotation are included. With this dataset, we aim to demonstrate the performance of our NLP application on clinical documents from practice, even though it is clear that the small size limits our findings.

We extracted entities of "medical condition" or "pharmaceutical drug" from the, in total, 205 clinical documents and then encoded them with UMLS. Based on the encodings, extracted entities were filtered so that distinct entities were obtained for each clinical document. To measure the performance of our extraction, we used well-known metrics: precision, recall, and F1 score. They are computed from true positives (TP), false positives (FP), and false negatives (FN) for each document. As stated above, the annotations of the 2008 Obesity Challenge differ from those of the other two datasets. To simplify the identification of positives and negatives, we divided annotations into two groups: positives, which are mentioned in the text, and negatives, which are not. Therefore, comparing the clinical entities extracted by our application to the ground truth, we calculate the following:

(i) TP: entities that were both extracted and annotated as positives
(ii) FP: entities that were extracted as positives but were annotated as negatives
(iii) FN: entities that were not extracted but were annotated as positives

Precision (1) represents the proportion of extracted positives that are annotated positives. Recall (2), on the other hand, is the proportion of annotated positives that were correctly extracted as such. The F1 score (3) is the harmonic mean of precision and recall:

precision = TP / (TP + FP),  (1)

recall = TP / (TP + FN),  (2)

F1 score = 2 * (precision * recall) / (precision + recall).  (3)

3. Results

This section presents the results in two parts: the framework and a web-based open-source clinical NLP application. The architecture lays down the technical groundwork upon which the application was constructed. The following explains each of them in detail.

3.1. A Lightweight NLP Architecture for Clinical NLP. The architecture addresses the issues of existing clinical NLP applications, including interoperability, flexibility, and specific restrictions within the clinical field, such as privacy and security. The strength of our proposed architecture is shown in its capabilities: (1) freedom to assemble suitable NLP APIs either sequentially or hierarchically depending on the scenario; (2) encoding clinical terms with comprehensive and standardized clinical codes; (3) a built-in deidentification function to anonymize clinical documents. Figure 2 depicts its four main components: external APIs, infrastructure, NLP pipelines, and Apps.

Figure 2: A lightweight NLP architecture for clinical NLP.

3.1.1. External APIs. In this architecture, two types of APIs, namely, an NLP API and a domain knowledge API, are included to parse unstructured medical text and map parsed terms against a medical metathesaurus, respectively. The NLP API provides various cloud-based NLP services that parse unstructured text for different purposes, including entity recognition and document classification. The domain knowledge API supports the mapping of medical text to concepts from the UMLS metathesaurus. As the most used biomedical database, UMLS contains millions of biomedical concept names and their relations. In addition, domain models and training corpora are available for specific clinical documents such as radiology reports, pathology reports, and discharge summaries [1]. The UMLS is a major part of the solution for standardization and interoperability, as it maps terms extracted by multiple APIs to standardized codes such as ATC and ICD10.
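As a minimal sketch of the document-level evaluation described in Section 2.2, the metrics can be computed from sets of annotated and extracted entities. The entity names below are invented for illustration:

```python
def prf1(gold_positive, gold_negative, extracted):
    """Compute precision, recall, and F1 for one document.

    gold_positive: entities annotated as present in the text
    gold_negative: entities annotated as not present
    extracted: entities returned by the extraction pipeline
    """
    tp = len(extracted & gold_positive)   # extracted and annotated positive
    fp = len(extracted & gold_negative)   # extracted but annotated negative
    fn = len(gold_positive - extracted)   # annotated positive but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example for a single discharge letter
gold_pos = {"asthma", "depression", "hypertension"}
gold_neg = {"gallstones"}
extracted = {"asthma", "depression", "gallstones"}
p, r, f = prf1(gold_pos, gold_neg, extracted)  # each equals 2/3 here
```

Per-document scores like these would then be aggregated over all 205 documents.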
3.1.2. Infrastructure. The infrastructure layer prepares clinical data before sending them to external APIs, through deidentification and by adding authentication. Furthermore, it processes the results received from external APIs for later integration. An optional component, locally implemented NLP techniques, is also incorporated.

(i) API Processing. The purposes of API processing are twofold: (1) prepare clinical text before sending it to external APIs and (2) process results returned from external APIs. Given the differences between multiple APIs, data processing is inevitable to achieve interoperability. Specific API processing tasks include formatting clinical text for API requests, filtering results returned from APIs, and data conversion.

(ii) Privacy. Privacy protection is a critical issue in clinical data sharing for both research and clinical practice, and privacy violations often incur legal problems with substantial consequences. The privacy component embedded in the infrastructure offers technical solutions to deidentify or anonymize patient-level data, such as CRATE [26] and DEDUCE [27]. CRATE is an open-source software system that anonymizes an electronic health records database to create a research database with anonymized patient data. With CRATE implemented, our approach can directly use patients' data. In comparison with CRATE, DEDUCE is more lightweight. As a Python package, it processes sensitive patient information with commands like "deduce.deidentify_annotations()."

(iii) Security. The security component controls access to clinical data and to all external APIs. Authentication and encryption are added to safeguard data sharing via the Internet.

(iv) (Optional) Local NLP Tasks. As discussed previously, an external NLP API grants no control over which NLP techniques are employed. In case specific NLP techniques are required, the local NLP technique component provides the option of implementing one's own NLP techniques locally in a preferred language.

3.1.3. NLP Pipelines. This layer provides a list of NLP services from which clinical applications can select the most suitable ones on demand. First of all, differences among NLP API providers in terms of their available NLP services are apparent. However, as shown in Table 1, there are also a number of common NLP services. Secondly, systematic studies have summarized some commonly used NLP techniques in clinical NLP applications [11]. By combining the common NLP services of various APIs and the useful NLP techniques in clinical settings, a shortlist of NLP services is selected for the architecture.

Table 1: NLP services of common NLP API providers.

NLP API          | Available NLP services
IBM Watson NLU   | Entity extraction, concept extraction, relation extraction, text classification, language detection, and sentiment analysis
Aylien           | Article extraction, entity extraction, concept extraction, summarization, text classification, language detection, semantic labeling, sentiment analysis, hashtag suggestion, image tagging, and microformat extraction
Lexalytics       | Sentiment analysis, concept extraction, categorization, named entity extraction, theme extraction, and summarization
MeaningCloud     | Topic extraction, text classification, sentiment analysis, language detection, and linguistic analysis (POS tagging, parsing, and lemmatization)
Alchemy API      | Entity extraction, concept tagging, keywords extraction, relation extraction, text classification, language detection, sentiment analysis, microformat extraction, feed detection, and linked data
TextRazor        | Entity extraction, disambiguation, linking, keywords extraction, topic tagging, and classification
Developer Cloud  | Concept extraction, translation, personality insights, and classification
Open Calais      | Entity extraction, relation extraction, and sentiment analysis
Dandelion API    | Entity extraction, text classification, language detection, sentiment analysis, and text similarity
Haven OnDemand   | Autocomplete, concept extraction, document categorization, entity extraction, language detection, sentiment analysis, and text tokenization

Moreover, multiple NLP services from different APIs can be integrated either sequentially or hierarchically for a single clinical NLP task. This enables clinical NLP applications to address the limitations of individual APIs caused by the particular NLP techniques implemented and the data employed to build them. More importantly, having a configurable NLP pipeline brings scalability and flexibility. For instance, an application enabled for clinical concept extraction can combine the entity extraction services of two or more of the NLP APIs in Table 1. However, interoperability between different NLP APIs becomes a challenge, as both their inputs and outputs may vary considerably. Therefore, the NLP pipelines contain an integration component which facilitates interoperability by implementing a proper integration strategy.

3.1.4. Apps. In the application layer, clinical NLP-enabled applications for various needs can be created. They are produced either for performing a specific NLP task, such as extracting diagnoses from discharge summaries and identifying drugs and dosage information from medical records, or with the general purpose of processing unstructured clinical text. Existing NLP applications in clinical domains fall into the following groups:

(i) Concept Extraction. Kreimeyer et al. conducted a systematic literature review of NLP systems constructed to extract terms from clinical documents and map them to standardized clinical codes [11].

(ii) Text Classification. Classification of free text in electronic health records (EHR) has surfaced as a popular topic in clinical NLP research. Koopman et al. devised a binary classifier to detect whether or not a death is related to cancer using the free text of death certificates [28]. Other text classification examples in clinical settings cover classifying a complete patient record with respect to its eligibility for a clinical trial [29], categorizing ICU risk stratification from nursing notes [30], and assessing inpatient violence risk using routinely collected clinical notes [31], among others.

(iii) Sentiment Analysis. Unlocking the subjective meaning of clinical text is particularly helpful in psychology. A shared task for sentiment analysis of suicide notes was carried out as an i2b2 challenge [32].

3.2. Prototype: API-Based Clinical Concept Extraction. To evaluate the architecture, a prototype that extracts clinical concepts from clinical free text has been developed. This section first illustrates the design of its main components. Then, the prototype itself is presented.

3.2.1. External NLP APIs. As described above, web NLP APIs have gained wide popularity over the last few years. Both academics and companies have recognized their importance and extended their NLP systems with web APIs. As shown in Table 2, the prototype incorporates six leading NLP APIs from both academia and industry in its implementation. The selection is based on three criteria: (1) free or free trial available; (2) industrial APIs supported by big companies/teams; (3) academic APIs verified by peers.

3.2.2. NLP Technique Implemented Locally. Studies have revealed that negation is very common in clinical reports [33, 34]. For instance, "no fracture," "patient denies a headache," and "he has no smoking history" often appear in clinical texts. In order to correctly extract clinical terms, negation detection becomes inevitable. However, given that most of the selected NLP APIs are tools for text processing and analysis in the general domain, the negation issue of clinical documents is not properly tackled, and they cannot filter out this irrelevant information. Therefore, negation detection is implemented locally for the prototype. As the most well-known negation detection algorithm, NegEx has been adopted by a number of biomedical applications [35-37]. We implemented this algorithm to handle negation in the prototype.

3.2.3. API Processing. The NLP APIs first extract clinical terms, which are then filtered by the local negator. Then the UMLS API transforms the filtered clinical terms into standardized codes, such as ATC codes, ICD-10, or SNOMED, which ensures that the extracted clinical terms are interoperable after integration.
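As an illustration only, a heavily simplified cue-word version of the NegEx idea can be sketched as follows; the real algorithm uses a much larger trigger list and also handles pseudo-negation triggers and scope termination:

```python
import re

# A few common negation triggers; NegEx's actual trigger list is far larger.
TRIGGERS = {"no", "denies", "without"}

def is_negated(sentence: str, term: str, window: int = 5) -> bool:
    """Return True if `term` appears within `window` tokens after a negation trigger."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    term_first = term.lower().split()[0]
    for i, tok in enumerate(tokens):
        if tok in TRIGGERS:
            # look ahead a fixed window for the term's first token
            if term_first in tokens[i + 1:i + 1 + window]:
                return True
    return False

# e.g. used to filter extracted terms before UMLS encoding
print(is_negated("Patient denies a headache.", "headache"))   # True
print(is_negated("Patient reports a headache.", "headache"))  # False
```

A real deployment would use a full NegEx implementation rather than this toy matcher.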
For each extracted term, the UMLS API returns its top 10 matched codes. These top matches are ranked by their similarity to the extracted term, with the first being the most similar one. The prototype captures the unique identifier of each matched code for later use.

Table 2: NLP APIs selected for the prototype.

API             | Fee        | Company/team      | Reference
IBM Watson NLU  | Free trial | IBM               | https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/
MeaningCloud    | Free trial | MeaningCloud LLC  | https://www.meaningcloud.com/developer/documentation
Open Calais     | Free trial | Thomson Reuters   | http://www.opencalais.com/opencalais-api/
Haven OnDemand  | Free trial | Hewlett Packard   | https://dev.havenondemand.com/apis
TextRazor       | Free trial | TextRazor Ltd.    | https://www.textrazor.com/docs/rest
Dandelion API   | Free trial | Spaziodati        | https://dandelion.eu/docs/

As discussed above, when multiple APIs are applied to one task, the results need to be integrated. The prototype employs a double weight system to integrate multiple APIs. The first weight system determines whether an extracted term is similar to another term extracted from the same document. The weight of a pair of two extracted terms is calculated based on their top 10 matches from the UMLS API and then compared with the similarity threshold γ; if the weight is higher than the threshold, we consider the two terms equal. The weight formula is as follows:

(α / 4) + (3β / 4) > γ,  (4)

where α refers to the percentage of equal terms over all 10 terms and β is the percentage of equal terms over the top 3 terms. α and β are calculated based on the UMLS API matches of the two extracted terms. The weight is a value between 0 and 1, where 0 means that the terms are not similar at all and 1 that they are exactly the same. For a given NLP task, an initial value of γ = 0.1 is recommended; then, according to the number of false positives and false negatives, the value of γ is adjusted to achieve optimal output. The strategy for tuning these parameters is discussed further in Section 3.3.

The second weight system determines whether an extracted clinical term has enough cumulative weight over all NLP APIs. Since the performance of NLP APIs varies, a weight for each individual API is estimated by using the F1 scores calculated after testing each API on a small subset of clinical documents. The F1 score of each API is normalized to an extractor weight ω. For each clinical term extracted, we sum the weights of the extractors that extracted the term. If the sum is over the extractor threshold θ, the term is considered to be actually extracted. If it is less, it is considered a false extraction. The weight is computed as follows:

sum over i = 1 to n of ω_i,  (5)

where ω_i is the weight of an NLP API and n refers to the number of APIs used. The pseudocode of the integration process is shown in Algorithm 1.

Algorithm 1: Pseudocode of the API integration algorithm.

Input: X = [X_1, X_2, ..., X_n]: returns of n APIs;
       W = [ω_1, ω_2, ..., ω_n]: weights of n APIs;
       γ: similarity threshold;
       θ: extractor threshold
Output: T: a list of clinical terms
Initialisation: ω_α = 0.25 and ω_β = 0.75
// Filter out same/similar terms extracted by one API
(1)  for i = 1 to n do
(2)    for x_a in X_i do
(3)      get the rest of the terms: X_j = X_i - x_a
(4)      for x_b in X_j do
(5)        calculate the percentage of equal terms over all 10 terms: α
(6)        calculate the percentage of equal terms over the top 3 terms: β
(7)        calculate the pairwise similarity: δ = ω_α * α + ω_β * β
(8)        if δ ≥ γ then
(9)          discard the same/similar term: X_i = X_i - x_b
(10)       end if
(11)     end for
(12)   end for
(13) end for
(14) get the filtered arrays of terms: X_δ = [X_1δ, X_2δ, ..., X_nδ]
// Filter out extracted terms by the weights over all APIs
(15) compute the weights over all APIs: X_ω = sum over i = 1 to n of X_iδ * W_i
(16) for (ω_sum, x) in X_ω do
(17)   if ω_sum ≥ θ then
(18)     add the term to the final list: T += [x]
(19)   end if
(20) end for
(21) return T

3.2.4. Prototype. Figure 3 shows the overall functional components of the prototype, which is an instantiation of the proposed architecture. The prototype is a web application with a minimalistic user interface, developed with HTML5, CSS, and JavaScript, with PHP for the back end. Given that many existing NLP APIs use JSON as their default format, JSON is the chosen format for data transfer between the different components.

Figure 3: Prototype architecture.

Figure 4 presents a screenshot of the application. Users provide the clinical documents they want to process in the upper input field and then select APIs and coding standards. After clicking the Extract button, the results are displayed in the table at the bottom. "Diseases Remote" lists the extractions of the external NLP APIs, while "Diseases Local" represents the results of combining the external NLP APIs, the local negation handler, and the UMLS API. Unfortunately, the application is not accessible online due to a lack of API token management: sharing our tokens online might incur charges when there is a large number of API requests. Nevertheless, researchers are able to deploy their own version of the system with the source code we share on GitHub at https://github.com/ianshan0915/MABNLP. A demo video is also available at https://youtu.be/dGk9NQGWYfI.

3.3. Evaluation Results. As explained before, the prototype comes with three hyperparameters that adjust the extraction outputs: negation (κ), term similarity threshold (γ), and extractor threshold (θ). The hyperparameter tuning was conducted manually by the researchers in the experiments. The impacts of these hyperparameters on the outputs of our experiments vary. First of all, negation surprisingly shows little positive influence, as shown in Table 3. The main reason probably lies in the fact that the implemented negation algorithm, NegEx, only uses negation cue words without considering the semantics of a sentence [34]. Implementation of more advanced algorithms, such as DEEPEN and ConText, will be conducted in future research.

A higher γ value means a higher similarity threshold for entities to be merged, which results in lower false positive and higher false negative numbers. By increasing the θ value, we require entities to be extracted by more APIs, which subsequently lowers the number of false positives and increases the number of false negatives. However, higher values also bring down the number of true positives. The aim is to strive for the best combination of these hyperparameters for each specific NLP task. The experiments suggested that the values γ = 0.1 and θ = 0.35 are a decent starting point for further exploration.

Results have shown that the performance of the prototype is not consistent. Datasets like the obesity challenge can rely on our approach, but its reliability on datasets such as the medication challenge and the OPERAM dataset needs further improvement and evaluation.
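A toy sketch of the double weight system of Section 3.2.3 may clarify the two thresholds; the API names, weights, and terms below are invented, and real extractor weights would come from normalized F1 scores:

```python
def pairwise_similarity(matches_a, matches_b, w_alpha=0.25, w_beta=0.75):
    """delta = w_alpha * alpha + w_beta * beta, from two top-10 UMLS match lists."""
    alpha = len(set(matches_a) & set(matches_b)) / 10        # overlap over all 10
    beta = len(set(matches_a[:3]) & set(matches_b[:3])) / 3  # overlap over top 3
    return w_alpha * alpha + w_beta * beta

def integrate(extractions, api_weights, theta=0.35):
    """Keep terms whose cumulative extractor weight reaches theta.

    extractions: {api_name: set of extracted terms}
    api_weights: {api_name: normalized extractor weight omega}
    """
    cumulative = {}
    for api, terms in extractions.items():
        for term in terms:
            cumulative[term] = cumulative.get(term, 0.0) + api_weights[api]
    return {t for t, w in cumulative.items() if w >= theta}

# Hypothetical three-API example
extractions = {"api_a": {"asthma", "copd"}, "api_b": {"asthma"}, "api_c": {"copd"}}
weights = {"api_a": 0.4, "api_b": 0.35, "api_c": 0.25}
kept = integrate(extractions, weights, theta=0.7)  # asthma: 0.75 kept, copd: 0.65 dropped
```

Terms whose pairwise similarity exceeds γ would be merged before this cumulative-weight filter is applied.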
We calculated the averages of the top 5 best systems as the baselines. As displayed in Table 4, the prototype performs well and has great potential to be adopted for clinical concept extraction. In the case of the OPERAM dataset, there is no benchmark. Therefore, its performance is evaluated from an expert-intervention perspective. By comparing the automatically extracted clinical concepts with the annotations, we estimate how well the prototype can assist physicians during their manual extraction process. Unfortunately, feedback from physicians indicates that the prototype is not yet considered practically useful. Firstly, its poor performance in extracting medical conditions requires physicians to spend more time filtering out incorrect extractions. Secondly, the prototype fails to identify the associated dosages and frequencies of medications.

Figure 4: Prototype user interface of the multiple NLP API extraction pipeline. A demo video and source code are available online.

Table 3: Impact of negation from the experiments.

    Dataset                   | Negation | Recall | Precision | F1 score
    Obesity challenge         | True     | 0.733  | 0.939     | 0.823
    Obesity challenge         | False    | 0.805  | 0.925     | 0.861
    Medication challenge      | True     | 0.620  | 0.835     | 0.712
    Medication challenge      | False    | 0.636  | 0.838     | 0.724
    OPERAM medical conditions | True     | 0.594  | 0.271     | 0.373
    OPERAM medical conditions | False    | 0.594  | 0.271     | 0.373
    OPERAM medications        | True     | 0.795  | 0.816     | 0.805
    OPERAM medications        | False    | 0.795  | 0.816     | 0.805

4. Discussion

We argue that outsourcing NLP tasks offers an efficient solution for processing unstructured clinical documents. To begin with, outsourcing often leads to a reduction of both IT development and maintenance costs. Furthermore, a lower level of NLP expertise is required when external NLP services are used: a developer with limited knowledge of NLP could build a clinical NLP application such as our prototype. Lastly, the architecture supports NLP services beyond clinical concept extraction. By adding a sentiment analysis pipeline constructed from external NLP APIs, our prototype can perform sentiment analysis on clinical documents. For instance, changing from concept extraction to sentiment analysis can be accomplished by adjusting the API request parameters from "{"features": "entities"}" to "{"features": "sentiment"}".

4.1. Evaluation Results. In comparison with the popular biomedical NLP component collections listed in [40], the main advantage of our proposed approach is its lightweight nature. The popular component collections, such as cTAKES, Bluima, and JCoRe, require an intensive investment in IT resources, including Java developers, NLP specialists with experience in the UIMA framework, and local hardware support; moreover, Bluima has not been updated for four years. In contrast, because our cloud-based approach outsources NLP to external services, clinical institutions could start to process unstructured text with very few resources. Rather than replacing the popular NLP tools, our approach should be considered an alternative in the face of time and resource constraints.
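The parameter switch mentioned in the Discussion above can be made concrete with a small request-builder sketch. The endpoint URL and payload shape below are hypothetical placeholders for a generic NLP API, not a specific vendor's interface:

```python
import json
import urllib.request

def build_payload(text, feature):
    # Switching the pipeline from concept extraction to sentiment analysis
    # changes only the "features" value, as described in the Discussion.
    return json.dumps({"text": text, "features": feature})

def analyze(text, feature, endpoint="https://nlp-api.example.com/v1/analyze"):
    # Sketch of the actual call; the endpoint is a placeholder, so this
    # function is defined but not invoked here.
    request = urllib.request.Request(
        endpoint,
        data=build_payload(text, feature).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

entities_body = build_payload("Patient denies chest pain.", "entities")
sentiment_body = build_payload("Patient denies chest pain.", "sentiment")
```

The two bodies differ only in the `features` field, which is what makes the pipeline repurposable without code changes beyond configuration.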
Table 4: Overall results on the three datasets.

    Dataset                   | Negation  | c   | θ    | Recall | Precision | F1 score
    Obesity challenge         | False     | 0.1 | 0.20 | 0.805  | 0.925     | 0.861
    Obesity challenge         | Baseline* |     |      | 0.771  | 0.815     | 0.787
    Medication challenge      | False     | 0.1 | 0.35 | 0.636  | 0.838     | 0.724
    Medication challenge      | Baseline* |     |      | 0.794  | 0.845     | 0.818
    OPERAM medical conditions | True      | 0.1 | 0.50 | 0.594  | 0.271     | 0.373
    OPERAM medications        | False     | 0.0 | 0.35 | 0.795  | 0.816     | 0.805
    *Average of the top 5 best systems from the challenge.

4.2. Error Analysis. An error analysis has been carried out in order to better understand the performance of the prototype. As explained in Section 2.2, there are two types of errors, namely, false positives (FPs) and false negatives (FNs). Figure 5 shows the percentage of FP and FN errors in all experiments. First of all, one major source of errors in the two i2b2 datasets is false negatives, which means that many annotated terms in these datasets are not extracted by our prototype. The high proportion of FNs is in great part attributable to entity-type detection errors. Since some NLP APIs (MeaningCloud and Open Calais) are unable to extract pharmaceutical drug entities, fewer entities are extracted and more false negatives result. Therefore, to enhance performance, NLP APIs such as MeaningCloud and Open Calais might as well be excluded.

Nevertheless, it was the higher number of false positives that led to an overall performance loss in the OPERAM medical conditions extraction. We found that the problem lies in the annotation. For example, the sentence "Fall during the night, multiple hematomas. Orthostatic hypotension proven." contains two medical conditions: hematoma and orthostatic hypotension. Hematoma was found by two out of six extractors; orthostatic hypotension was found by five out of six. However, neither of the two was annotated, most likely because the sentence was in the past tense and therefore potentially not applicable to the current state of the patient.

Figure 5: Error distribution of all the experiments, false positives vs. false negatives.
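The FP/FN shares plotted in Figure 5 follow directly from the per-dataset error counts; a minimal sketch, with made-up counts rather than the paper's numbers:

```python
def error_distribution(fp, fn):
    # Share of false positives and false negatives among all errors.
    total = fp + fn
    return {"FP%": round(100 * fp / total, 1), "FN%": round(100 * fn / total, 1)}

# Placeholder counts illustrating an FN-dominated dataset:
shares = error_distribution(fp=30, fn=90)   # -> {"FP%": 25.0, "FN%": 75.0}
```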
4.3. Limitations and Future Research. There are a number of hurdles that prevent the adoption of our approach in daily practice, and further research is necessary to sufficiently address these concerns. First of all, practical implementation requires a more thorough privacy and security component. The privacy and security component is part of the proposed architecture and is currently implemented in the prototype using CRATE [26] and HTTPS. However, since only anonymized datasets were used in the evaluation, the deidentification toolkit, CRATE, was not validated. Before practical adoption, we first need to evaluate the performance of the privacy and security component on real-world clinical data.

Another concern lies in the computational efficiency of our approach, namely, its execution time. As shown in the demo video, it takes about 20 seconds to process a discharge letter. Specifically, the majority of the time (15 seconds) goes to annotation, in which extracted terms are first encoded with UMLS and the pairwise similarity between them is then calculated. Since the prototype was running locally on a laptop with 8 GB of RAM, we expect it to become faster when implemented on a larger server.

In practice, clinical NLP is employed to solve various clinical problems, ranging from entity extraction to cohort detection. Our research demonstrates that the proposed approach performs well on clinical concept extraction. It is crucial to conduct further evaluation on other tasks, such as cohort detection and sentiment analysis, before adopting the approach in practice.

Last but not least, given the wide adoption of health information systems (HIS) in healthcare institutions, developing a simple method that supports the integration of our approach with HIS would facilitate its implementation.

5. Conclusion

The proposed NLP architecture offers an efficient solution for developing tools that are capable of processing unstructured clinical data in the healthcare industry. With our approach, less time and fewer resources are required to create and maintain NLP-enabled clinical tools, given that all NLP tasks are outsourced. Moreover, the prototype built upon the approach produces satisfactory overall results, and its performance on certain datasets indicates that its practical application in clinical text processing, particularly clinical concept extraction, is promising. Nevertheless, the high variance among different datasets raises concerns about its generalizability and practicability.
Data Availability

Source code and the OPERAM dataset are available at the GitHub repository https://github.com/ianshan0915/MABNLP. The two i2b2 datasets are accessible from https://www.i2b2.org/NLP/DataSets/Main.php. Finally, a demo video of the prototype is available at https://youtu.be/dGk9NQGWYfI.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is part of the project "OPERAM: OPtimising thERapy to prevent Avoidable hospital admissions in the Multimorbid elderly," supported by the European Commission (EC) HORIZON 2020 programme, proposal 634238, and by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number 15.0137. The opinions expressed and arguments employed herein are those of the authors and do not necessarily reflect the official views of the EC and the Swiss government.

References

[1] S. Doan, M. Conway, T. M. Phuong, and L. Ohno-Machado, "Natural language processing in biomedicine: a unified system architecture overview," in Methods in Molecular Biology, vol. 1168, pp. 275–294, Springer, Clifton, NJ, USA, 2014.
[2] G. K. Savova, J. J. Masanz, P. V. Ogren et al., "Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications," Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 507–513, 2010.
[3] J. Patrick and M. Li, "High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge," Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 524–527, 2010.
[4] T. Cai, A. A. Giannopoulos, S. Yu et al., "Natural language processing technologies in radiology research and clinical applications," Radiographics, vol. 36, no. 1, pp. 176–191, 2016.
[5] M. Kreuzthaler and S. Schulz, "Detection of sentence boundaries and abbreviations in clinical narratives," BMC Medical Informatics and Decision Making, vol. 15, no. S2, pp. 1–13, 2015.
[6] O. Bodenreider, "The unified medical language system (UMLS): integrating biomedical terminology," Nucleic Acids Research, vol. 32, no. 1, pp. D267–D270, 2004.
[7] H. Cunningham, V. Tablan, A. Roberts, and K. Bontcheva, "Getting more out of biomedical documents with GATE's full lifecycle open source text analytics," PLoS Computational Biology, vol. 9, no. 2, Article ID e1002854, 2013.
[8] D. Ferrucci and A. Lally, "UIMA: an architectural approach to unstructured information processing in the corporate research environment," Natural Language Engineering, vol. 10, no. 3-4, pp. 327–348, 2004.
[9] J.-H. Chiang, J.-W. Lin, and C.-W. Yang, "Automated evaluation of electronic discharge notes to assess quality of care for cardiovascular diseases using medical language extraction and encoding system (MedLEE)," Journal of the American Medical Informatics Association, vol. 17, no. 3, pp. 245–252, 2010.
[10] R. E. de Castilho and I. Gurevych, "A broad-coverage collection of portable NLP components for building shareable analysis pipelines," in Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT, pp. 1–11, Dublin, Ireland, August 2014.
[11] K. Kreimeyer, M. Foster, A. Pandey et al., "Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review," Journal of Biomedical Informatics, vol. 73, pp. 14–29, 2017.
[12] D. Carrell, "A strategy for deploying secure cloud-based natural language processing systems for applied research involving clinical text," in Proceedings of the 44th Hawaii International Conference on System Sciences (HICSS 2011), pp. 1–11, Kauai, HI, USA, January 2011.
[13] R. Dale, "NLP meets the cloud," Natural Language Engineering, vol. 21, no. 4, pp. 653–659, 2015.
[14] W. L. Currie, B. Desai, and N. Khan, "Customer evaluation of application services provisioning in five vertical sectors," Journal of Information Technology, vol. 19, no. 1, pp. 39–58, 2004.
[15] A. Rago, F. M. Ramos, J. I. Velez, J. A. Díaz Pace, and C. Marcos, "TeXTracT: a web-based tool for building NLP-enabled applications," in Proceedings of the Simposio Argentino de Ingeniería de Software (ASSE 2016), JAIIO 45, Buenos Aires, Argentina, September 2016.
[16] G. Haffari, M. Carman, and T. D. Tran, "Efficient benchmarking of NLP APIs using multi-armed bandits," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol. 1, pp. 408–416, Valencia, Spain, April 2017.
[17] S. Hellmann, J. Lehmann, S. Auer, and M. Brümmer, "Integrating NLP using linked data," in Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, October 2013.
[18] P. Martínez, J. L. Martínez, I. Segura-Bedmar, J. Moreno-Schneider, A. Luna, and R. Revert, "Turning user generated health-related content into actionable knowledge through text analytics services," Natural Language Processing and Text Analytics in Industry, vol. 78, pp. 43–56, 2016.
[19] Z. S. Abdallah, M. Carman, and G. Haffari, "Multi-domain evaluation framework for named entity recognition tools," Computer Speech & Language, vol. 43, pp. 34–55, 2017.
[20] K. Chard, M. Russell, Y. A. Lussier, E. A. Mendonca, and J. C. Silverstein, "A cloud-based approach to medical NLP," in AMIA Annual Symposium Proceedings, pp. 207–216, 2011.
[21] G. Rizzo and R. Troncy, "NERD: a framework for unifying named entity recognition and disambiguation web extraction tools," in Proceedings of the System Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, April 2012.
[22] API-Based CMS Buyer's Guide, May 2018, https://nordicapis.com/api-based-cms-buyers-guide/.
[23] A. R. Hevner, S. T. March, J. Park, and S. Ram, "Design science in information systems research," Management Information Systems Quarterly, vol. 28, no. 1, p. 6, 2008.
[24] K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, "A design science research methodology for information systems research," Journal of Management Information Systems, vol. 24, no. 3, pp. 45–77, 2007.
[25] Z. Shen, M. Meulendijk, and M. Spruit, "A federated information architecture for multinational clinical trials: STRIPA revisited," in Proceedings of the 24th European Conference on Information Systems (ECIS), Istanbul, Turkey, June 2016.
[26] R. N. Cardinal, "Clinical records anonymisation and text extraction (CRATE): an open-source software system," BMC Medical Informatics and Decision Making, vol. 17, no. 1, p. 50, 2017.
[27] V. Menger, F. Scheepers, L. M. van Wijk, and M. Spruit, "DEDUCE: a pattern matching method for automatic de-identification of Dutch medical text," Telematics and Informatics, vol. 35, no. 4, pp. 727–736, 2018.
[28] B. Koopman, G. Zuccon, A. Nguyen, A. Bergheim, and N. Grayson, "Automatic ICD-10 classification of cancers from free-text death certificates," International Journal of Medical Informatics, vol. 84, no. 11, pp. 956–965, 2015.
[29] Y. Ni, S. Kennebeck, J. W. Dexheimer et al., "Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department," Journal of the American Medical Informatics Association, vol. 22, no. 1, pp. 166–178, 2015.
[30] B. J. Marafino, W. John Boscardin, and R. Adams Dudley, "Efficient and sparse feature selection for biomedical text classification via the elastic net: application to ICU risk stratification from nursing notes," Journal of Biomedical Informatics, vol. 54, pp. 114–120, 2015.
[31] V. Menger, M. Spruit, R. van Est, E. Nap, and F. Scheepers, "Machine learning approach to inpatient violence risk assessment using routinely collected clinical notes in electronic health records," JAMA Network Open, vol. 2, no. 7, Article ID e196709, 2019.
[32] J. P. Pestian, P. Matykiewicz, M. Linn-Gust et al., "Sentiment analysis of suicide notes: a shared task," Biomedical Informatics Insights, vol. 5, suppl. 1, 2012.
[33] W. W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan, "Evaluation of negation phrases in narrative clinical reports," in Proceedings of the AMIA Symposium, pp. 105–109, Washington, DC, USA, November 2001.
[34] S. Mehrabi, A. Krishnan, S. Sohn et al., "DEEPEN: a negation detection system for clinical text incorporating dependency relation into NegEx," Journal of Biomedical Informatics, vol. 54, pp. 213–219, 2015.
[35] W. W. Chapman, G. F. Cooper, P. Hanbury, B. E. Chapman, L. H. Harrison, and M. M. Wagner, "Creating a text classifier to detect radiology reports describing mediastinal findings associated with inhalational anthrax and other disorders," Journal of the American Medical Informatics Association, vol. 10, no. 5, pp. 494–503, 2003.
[36] S. Meystre and P. J. Haug, "Natural language processing to extract medical problems from electronic clinical documents: performance evaluation," Journal of Biomedical Informatics, vol. 39, no. 6, pp. 589–599, 2006.
[37] K. J. Mitchell, M. J. Becich, J. J. Berman et al., "Implementation and evaluation of a negation tagger in a pipeline-based system for information extraction from pathology reports," Studies in Health Technology and Informatics, vol. 107, no. 1, pp. 663–667, 2004.
[38] O. Uzuner, "Recognizing obesity and comorbidities in sparse data," Journal of the American Medical Informatics Association, vol. 16, no. 4, pp. 561–570, 2009.
[39] O. Uzuner, I. Solti, and E. Cadag, "Extracting medication information from clinical text," Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 514–518, 2010.
[40] P. Przybyła, M. Shardlow, S. Aubin et al., "Text mining resources for the life sciences," Database, vol. 2016, 2016.

A Lightweight API-Based Approach for Building Flexible Clinical NLP Systems

Publisher: Hindawi Publishing Corporation
Copyright: © 2019 Zhengru Shen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN: 2040-2295
eISSN: 2040-2309
DOI: 10.1155/2019/3435609

Abstract

Journal of Healthcare Engineering, Volume 2019, Article ID 3435609, 11 pages.

Research Article: A Lightweight API-Based Approach for Building Flexible Clinical NLP Systems

Zhengru Shen, Hugo van Krimpen, and Marco Spruit
Department of Computing and Information Sciences, Utrecht University, Utrecht, Netherlands
Correspondence should be addressed to Zhengru Shen; z.shen@uu.nl

Received 18 February 2019; Revised 20 June 2019; Accepted 26 July 2019; Published 15 August 2019
Academic Editor: Haihong Zhang

Natural language processing (NLP) has become essential for the secondary use of clinical data. Over the last two decades, many clinical NLP systems have been developed in both academia and industry. However, nearly all existing systems are restricted to specific clinical settings, mainly because they were developed for and tested with specific datasets, and they often fail to scale up. Therefore, using existing NLP systems for one's own clinical purposes requires substantial resources and a long-term time commitment for customization and testing; their maintenance is also troublesome and time-consuming. This research presents a lightweight approach for building clinical NLP systems with limited resources. Following the design science research approach, we propose a lightweight architecture that is designed to be composable, extensible, and configurable. It treats NLP as an external component that can be accessed independently and orchestrated in a pipeline via web APIs. To validate its feasibility, we developed a web-based prototype for clinical concept extraction with six well-known NLP APIs and evaluated it on three clinical datasets.
In comparison with available benchmarks for the datasets, three high F1 scores (0.861, 0.724, and 0.805) were obtained in the evaluation. It also gained a low F1 score (0.373) on one of the tests, which is probably due to the small size of that test dataset. The development and evaluation of the prototype demonstrate that our approach has great potential for building effective clinical NLP systems with limited resources.

1. Introduction

Today's technologies allow the accumulation of vast amounts of textual data, which has consequently boosted the popularity of NLP research. A huge number of papers have been published and a variety of NLP systems and toolkits crafted in multiple domains over the last two decades. Among them, clinical NLP occupies a large portion. There are clinical NLP systems, such as Apache cTAKES, that integrate different NLP tools to process clinical documents [1, 2]. There are also NLP tools that target specific clinical needs, including extracting medication information [3], identifying locations of pulmonary embolism from radiology reports [4], and categorizing pain status [5].

Figure 1 presents a general architecture of a clinical NLP system that contains two main components: background knowledge and framework [1]. Background knowledge contains ontologies, domain models, domain knowledge, and trained corpora. The most widely used clinical domain knowledge source is the Unified Medical Language System (UMLS) [6]. Framework refers to a software platform that integrates various NLP tasks or modules either sequentially or hierarchically into NLP pipelines. GATE and UIMA are the leading open-source frameworks [7, 8]. There are two levels of NLP tasks: low-level tasks and high-level tasks. Low-level tasks include tokenization, part-of-speech tagging, sentence boundary detection, and so on. High-level tasks refer to semantic-level processing such as named entity recognition, relation extraction, and sentiment analysis.

History has shown that building a successful clinical NLP system requires a tremendous amount of resources. For instance, it took a team from Columbia University 14 years to commercialize the MedLEE system [9].
The development of cTAKES started at the Mayo Clinic in 2006, and further external collaborations with four other universities in 2010 resulted in the first release of the current Apache project [2]. Therefore, creating reusable NLP pipelines based on open-source modular frameworks like GATE and UIMA becomes more reasonable [9, 10]. Although this dramatically reduces the required resources and level of expertise, we argue that it is not an efficient and effective solution, for two main reasons. Firstly, nearly every NLP pipeline created to address a single specific clinical need, whether rule-based or machine-learning based, has been proven useful only for its designated purpose [11]; reusability is therefore difficult. Secondly, deploying cTAKES-based NLP pipelines implies a high cost of operation, which requires the installation and configuration of multiple components by NLP experts [12]. Besides, the maintenance of a deployed NLP system requires a continuous investment.

Figure 1: A general architecture of clinical NLP systems (background knowledge — ontologies, domain models, domain knowledge, trained corpora — feeding a framework that chains NLP tasks to turn text into a structured form).

With the purpose of simplifying and outsourcing the NLP implementation, software as a service (SaaS) has been introduced to the NLP world in recent years [13]. SaaS generally refers to a mode of software delivery in which end users pay a monthly or annual subscription fee to use a set of functionalities over the Internet [14]. NLP systems distributed in the SaaS model are often made available through web application programming interfaces (APIs) and are known as NLP APIs or cloud-based NLP APIs [13, 15]. Many NLP APIs have emerged from both companies and universities and are growing in popularity [7]. A few prominent examples are IBM Watson, Aylien, Lexalytics, and TextRazor [13]. From a cost-benefit perspective, these NLP APIs allow developers to rapidly create NLP-enabled tools without investing abundant resources in implementing the necessary NLP techniques in code and in regular maintenance. A number of applications based on NLP APIs have been built [16–18].

To utilize NLP APIs, API-based frameworks have been produced [15, 19–21]. API-based systems, also known as cloud-based systems, are tools that are built on external web APIs and have their functionality partially or fully accomplished with one API or a pipeline of APIs. Due to the growing popularity of web APIs in the software industry, API-based tools are abundant in companies; for instance, API-based CMSs (content management systems) are used to save development resources and reduce follow-up maintenance [22]. Furthermore, researchers have also investigated the approach in recent years. Rizzo and Troncy proposed the Named Entity Recognition and Disambiguation (NERD) framework, which incorporates the results of ten different public API-based NLP extractors [21]. A web-based tool called TeXTracT was devised to support the setup and deployment of NLP techniques on demand [15]. Abdallah et al. developed a flexible and extensible framework for integrating named entity recognition (NER) web APIs and assessed it across multiple domains [19]. Although these tools show promising results, few were built for clinical NLP or evaluated on clinical datasets. It is therefore safe to say that adopting these tools in clinical settings would be problematic due to the unique characteristics of the clinical domain. For example, privacy is considered to be of the utmost importance, but none of the above tools take it into consideration.

This paper thus proposes a lightweight framework that enables the rapid development of clinical NLP systems with external NLP APIs. The approach has the following advantages compared to traditional NLP frameworks: (1) fast development; (2) lower costs; (3) flexibility; and (4) programming language independence. Deployment is minimized by outsourcing both the NLP tasks and the background knowledge to external API services; thus, NLP systems can be developed quickly and cost-efficiently on the basis of the proposed framework. The framework is flexible in many respects. To begin with, it supports the flexible combination of different NLP tasks from external APIs. Secondly, users have the freedom to choose their preferred NLP API vendors, and multiple APIs can be integrated to achieve better results. To evaluate the framework, we have built a web-based open-source clinical NLP application.
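The composability described above — NLP as external components orchestrated in a pipeline — can be sketched in a few lines. All stage implementations below are stand-ins for real services (a deidentifier such as CRATE or DEDUCE, an external concept-extraction API, a UMLS lookup); only the composition pattern is the point:

```python
def deidentify(text):
    # Stand-in for a deidentification service such as CRATE or DEDUCE.
    return text.replace("John Doe", "[PATIENT]")

def extract_concepts(text):
    # Stand-in for an external concept-extraction NLP API.
    vocabulary = ("hypertension", "asthma", "diabetes")
    return [term for term in vocabulary if term in text]

def encode_umls(terms):
    # Stand-in for a UMLS lookup; the mapping here is illustrative only.
    codes = {"hypertension": "C0020538", "asthma": "C0004096"}
    return {term: codes.get(term) for term in terms}

def run_pipeline(document, stages):
    # Each stage consumes the previous stage's output, so stages can be
    # swapped or recombined freely -- the flexibility argued for above.
    result = document
    for stage in stages:
        result = stage(result)
    return result

encoded = run_pipeline("John Doe has hypertension.",
                       [deidentify, extract_concepts, encode_umls])
```

Replacing `extract_concepts` with a sentiment-analysis stage, or adding a second extractor, changes only the stage list, not the pipeline driver.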
2. Methods

2.1. Design Science. Our research followed the design science research approach as we built and evaluated the framework, because of its strength and popularity in solving real-world problems by designing and building an innovative IT artifact [23]. In our case, the artifact is a lightweight framework that facilitates the development of clinical NLP systems. We followed the design science research methodology (DSRM) proposed by Peffers et al., which consists of six steps: problem identification and motivation, definition of the objectives for a solution, design and development, demonstration, evaluation, and communication [24].

The DSRM is initiated by (I) problem identification and motivation, which we addressed through a literature study. Previous studies have described the general architecture of clinical NLP systems and how expensive they are to build. Even though the introduction of modular NLP frameworks has reduced the complexity of NLP systems, it is still challenging for many healthcare institutions to create clinical NLP systems due to limited resources. Based on the identified problem, we inferred the (II) objective for a solution: creating a lightweight NLP framework that enables the rapid development of an API-based clinical NLP system.
In the (III) design and development step, we developed the framework based on the general architecture we identified, after which each of its components is explained in detail. To (IV) demonstrate and (V) evaluate the framework, a web-based open-source clinical NLP application was developed. Moreover, experiments were carried out with three clinical datasets, primarily to examine whether external NLP APIs deliver state-of-the-art performance. The final step of the DSRM is communication; this paper serves as the start of our (VI) communication on this topic.

2.2. Evaluation Design. Three anonymized English clinical datasets were used in our evaluation. Two of the datasets were obtained from the Informatics for Integrating Biology and the Bedside (i2b2) center: the 2008 Obesity Challenge and the 2009 Medication Challenge. The third dataset comes from a European clinical trial called OPERAM.
Since the primary goal of our evaluation is to show that external general-purpose NLP APIs can yield good performance on clinical data, we used only a subset of the two large i2b2 datasets.

(i) 2008 Obesity Challenge. This dataset consists of 611 discharge letters. All discharge letters are annotated with 16 medical condition terms in the context of obesity, including asthma, gastroesophageal disorder, and depression. Terms could be annotated as being in the document, not being in the document, or undecided/unknown, which we treated as not being in the document. The strength of this dataset, with respect to the aim of these tests, is that it contains many documents; its weakness is that it is annotated for only 16 abstract terms in the context of obesity. To simplify the experiment, we randomly selected 100 discharge letters and labeled each document with the medical conditions annotated as "present."

(ii) 2009 Medication Challenge. 947 of the 1243 deidentified discharge letters in total have gold-standard annotations. The medication names in the annotations are used for the evaluation: by comparing the annotated medication names with those generated by our application, we calculate the evaluation metrics. We again randomly selected 100 of the 947 documents.
metrics. We also randomly select 100 out of the 947 documents. 3.1. A Lightweight NLP Architecture for Clinical NLP. )e (iii) OPERAM Dataset. )e dataset consists of five architecture addresses the issues of existing clinical NLP discharge letters that have been used during the applications, including interoperability, flexibility, and pilot of the OPERAM clinical trial [25]. Medical specific restrictions within the clinical field, such as privacy experts of the trial annotated these letters by both and security. )e strength of our proposed architecture is medical conditions and pharmaceutical drugs. shown in its capabilities: (1) freedom of assembling suitable Moreover, standardized clinical codes for each NLP APIs either sequentially or hierarchically based on annotation are included. With this dataset, we aim scenarios; (2) encoding clinical terms with comprehensive to demonstrate the performance of our NLP ap- and standardized clinical codes; (3) the built-in deidentifi- plication with clinical documents from practices, cation function to anonymize clinical documents. Figure 2 even though it is clear that the small size limits our depicts its four main components: external APIs, in- findings. frastructure, NLP pipelines, and Apps. We extracted entities of “medical condition” or “phar- maceutical drug” from the, in total, 205 clinical documents and then encoded them with UMLS. Based on the encodings, 3.1.1. External APIs. In this architecture, two types of APIs, extracted entities were filtered so that distinct entities were namely, an NLP API and a domain knowledge API, are extracted for each clinical document. In order to measure the included to parse unstructured medical text and map parsed performance of our extraction, we have used well-known terms against a medical metathesaurus, respectively. )e metrics: precision, recall, and F1 score. 
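The per-document metric computation can be sketched in a few lines; this is an illustrative snippet with invented entity sets, not the evaluation code used in the study.

```python
def prf1(extracted, annotated_positives):
    """Precision, recall, and F1 (equations (1)-(3)) from sets of
    extracted entities and entities annotated as positive."""
    tp = len(extracted & annotated_positives)   # extracted and annotated as positive
    fp = len(extracted - annotated_positives)   # extracted but annotated as negative
    fn = len(annotated_positives - extracted)   # annotated as positive but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy document: two of the three extracted terms are annotated positives.
p, r, f = prf1({"asthma", "depression", "hypertension"},
               {"asthma", "depression", "obesity", "gallstones"})
```

Here precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.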
3. Results

This section presents the results in two parts: the framework and a web-based open-source clinical NLP application. The architecture lays down the technical groundwork upon which the application was constructed. The following explains each of them in detail.

3.1. A Lightweight NLP Architecture for Clinical NLP. The architecture addresses the issues of existing clinical NLP applications, including interoperability, flexibility, and specific restrictions within the clinical field, such as privacy and security. The strength of our proposed architecture is shown in its capabilities: (1) freedom of assembling suitable NLP APIs either sequentially or hierarchically based on scenarios; (2) encoding clinical terms with comprehensive and standardized clinical codes; (3) a built-in deidentification function to anonymize clinical documents. Figure 2 depicts its four main components: external APIs, infrastructure, NLP pipelines, and apps.

[Figure 2: A lightweight NLP architecture for clinical NLP. External APIs (the UMLS API and NLP APIs 1..N) are used by an infrastructure layer (API processing, privacy, security, and optional locally implemented NLP tasks) that feeds NLP pipelines such as concept extraction, summarization, and sentiment analysis, turning text into structured form for apps.]

3.1.1. External APIs. In this architecture, two types of APIs, namely, an NLP API and a domain knowledge API, are included to parse unstructured medical text and to map parsed terms against a medical metathesaurus, respectively. The NLP API provides various cloud-based NLP services that parse unstructured text for different purposes, including entity recognition and document classification. The domain knowledge API supports the mapping of medical text to concepts from the UMLS metathesaurus. As the most used biomedical database, UMLS contains millions of biomedical concept names and their relations. In addition, domain models and training corpora are available for specific clinical documents such as radiology reports, pathology reports, and discharge summaries [1]. The UMLS is a major part of the solution for standardization and interoperability, as it maps terms extracted by multiple APIs to standardized codes such as ATC and ICD-10.

3.1.2. Infrastructure. The infrastructure layer prepares clinical data before sending them to external APIs, by deidentification and by adding authentication. Furthermore, it processes results received from external APIs for later integration. An optional component, locally implemented NLP techniques, is also incorporated.

(i) API Processing. The purposes of API processing are twofold: (1) prepare clinical text before sending it to external APIs and (2) process results returned from external APIs. Given the differences between multiple APIs, data processing is inevitable to achieve interoperability. Specific API processing tasks include formatting clinical text for API requests, filtering results returned from APIs, and data conversion.

(ii) Privacy. Privacy protection is a critical issue in clinical data sharing for both research and clinical practice, and privacy violations often incur legal problems with substantial consequences. The privacy component embedded in the infrastructure offers technical solutions to deidentify or anonymize patient-level data, such as CRATE [26] and DEDUCE [27]. CRATE is an open-source software system that anonymizes an electronic health records database to create a research database with anonymized patient data. With CRATE implemented, our approach can directly use patients' data. In comparison with CRATE, DEDUCE is more lightweight: as a Python package, it processes sensitive patient information with commands like "deduce.deidentify_annotations()."

(iii) Security. The security component controls access to clinical data and all external APIs. Authentication and encryption are added to safeguard data sharing via the Internet.

(iv) (Optional) Local NLP Tasks. As discussed previously, an external NLP API grants no control over which NLP techniques are employed. In case specific NLP techniques are required, the local NLP technique component provides the option of implementing your own NLP techniques locally in a preferred language.

3.1.3. NLP Pipelines. This layer provides a list of NLP services from which clinical applications can select the most suitable ones on demand. First of all, differences among NLP API providers in terms of their available NLP services are apparent. However, as shown in Table 1, there are also a number of common NLP services. Secondly, systematic studies have summarized commonly used NLP techniques in clinical NLP applications [11]. By combining the common NLP services of various APIs and the NLP techniques useful in clinical settings, a shortlist of NLP services was selected for the architecture.

Moreover, multiple NLP services from different APIs can be integrated either sequentially or hierarchically for a single clinical NLP task. This enables clinical NLP applications to address the limitations of individual APIs caused by the particular NLP techniques implemented and the data employed to build them. More importantly, having a configurable NLP pipeline brings scalability and flexibility. For instance, a clinical concept extraction application can combine the entity extraction services of two or more of the NLP APIs in Table 1. However, interoperability between different NLP APIs becomes a challenge, as both their inputs and outputs might vary considerably. Therefore, the NLP pipelines contain an integration component which facilitates interoperability by implementing a proper integration strategy.

Table 1: NLP services of common NLP API providers.
- IBM Watson NLU: entity extraction, concept extraction, relation extraction, text classification, language detection, and sentiment analysis
- Aylien: article extraction, entity extraction, concept extraction, summarization, text classification, language detection, semantic labeling, sentiment analysis, hashtag suggestion, image tagging, and microformat extraction
- Lexalytics: sentiment analysis, concept extraction, categorization, named entity extraction, theme extraction, and summarization
- MeaningCloud: topic extraction, text classification, sentiment analysis, language detection, and linguistic analysis (POS tagging, parsing, and lemmatization)
- Alchemy API: entity extraction, concept tagging, keywords extraction, relation extraction, text classification, language detection, sentiment analysis, microformat extraction, feed detection, and linked data
- TextRazor: entity extraction, disambiguation, linking, keywords extraction, topic tagging, and classification
- Developer Cloud: concept extraction, translation, personality insights, and classification
- Open Calais: entity extraction, relation extraction, and sentiment analysis
- Dandelion API: entity extraction, text classification, language detection, sentiment analysis, and text similarity
- Haven OnDemand: autocomplete, concept extraction, document categorization, entity extraction, language detection, sentiment analysis, and text tokenization

3.1.4. Apps. In the application layer, clinical NLP-enabled applications for various needs can be created. They are produced either for performing a specific NLP task, such as extracting diagnoses from discharge summaries or identifying drugs and dosage information in medical records, or with the general purpose of processing unstructured clinical text. Existing NLP applications in clinical domains are categorized into the following groups:

(i) Concept Extraction. Kreimeyer et al. conducted a systematic literature review of NLP systems constructed to extract terms from clinical documents and map them to standardized clinical codes [11].

(ii) Text Classification. Classification of free text in electronic health records (EHR) has surfaced as a popular topic in clinical NLP research. Koopman et al. devised a binary classifier to detect whether or not death is related to cancer using the free text of death certificates [28]. Other text classification examples in clinical settings cover classifying a complete patient record with respect to its eligibility for a clinical trial [29], categorizing ICU risk stratification from nursing notes [30], and assessing inpatient violence risk using routinely collected clinical notes [31], among others.

(iii) Sentiment Analysis. Unlocking the subjective meaning of clinical text is particularly helpful in psychology. A shared task for sentiment analysis of suicide notes was carried out as an i2b2 challenge [32].

3.2. Prototype: API-Based Clinical Concept Extraction. To evaluate the architecture, a prototype that extracts clinical concepts from clinical free text has been developed. This section first illustrates the design of its main components. Then, the prototype itself is presented.

3.2.1. External NLP APIs. As described above, web NLP APIs have gained wide popularity over the last few years. Both academics and companies recognized their importance and extended their NLP systems with web APIs. As shown in Table 2, the prototype incorporates six leading NLP APIs from both academia and industry. The selection is based on three criteria: (1) free or free-trial availability; (2) industrial APIs supported by large companies or teams; (3) academic APIs verified by peers.

3.2.2. NLP Technique Implemented Locally. Studies have revealed that negation is very common in clinical reports [33, 34]. For instance, "no fracture," "patient denies a headache," and "he has no smoking history" often appear in clinical texts. In order to correctly extract clinical terms, negation detection becomes inevitable. However, given that most of the selected NLP APIs are tools for text processing and analysis in the general domain, the negation issue of clinical documents is not properly tackled, and they cannot filter out this irrelevant information. Therefore, negation detection is implemented locally for the prototype. As the most well-known negation detection algorithm, NegEx has been adopted by a number of biomedical applications [35-37]. We implemented this algorithm to handle negation in the prototype.

3.2.3. API Processing. The NLP APIs first extract clinical terms, which are then filtered by the local negator. Next, the UMLS API transforms the filtered clinical terms into standardized codes, such as ATC, ICD-10, or SNOMED codes, which ensures that the extracted clinical terms are interoperable after integration. For each extracted term, the UMLS API returns its top 10 matched codes. These top matches are ranked by their similarity to the extracted term, with the first as the most similar one. The prototype captures the unique identifier of each matched code for later use.
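The locally implemented negation step can be illustrated with a deliberately simplified, NegEx-inspired cue matcher. Real NegEx distinguishes pre- and post-negation triggers and scope termination terms, and its trigger list is far longer than the sample below; treat this as a sketch, not the prototype's implementation.

```python
import re

# A few NegEx-style negation cues; the real trigger list is far longer.
NEGATION_CUES = ["no", "denies", "without", "no evidence of"]

def is_negated(sentence: str, term: str) -> bool:
    """Simplified NegEx-like check: is the term's clause negated?
    Clauses are split at scope terminators such as 'but'."""
    clauses = re.split(r"\bbut\b|\bhowever\b|[;.]", sentence.lower())
    for clause in clauses:
        if term.lower() in clause:
            return any(re.search(r"\b" + re.escape(cue) + r"\b", clause)
                       for cue in NEGATION_CUES)
    return False

def filter_negated(sentence, terms):
    """Drop extracted terms that appear in a negated context."""
    return [t for t in terms if not is_negated(sentence, t)]
```

For example, filter_negated("Patient denies a headache but reports dizziness.", ["headache", "dizziness"]) keeps only "dizziness", because the clause split at "but" stops the negation scope of "denies".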
As discussed above, when multiple APIs are applied to one task, their results need to be integrated. The prototype employs a double weight system to integrate multiple APIs. The first weight system determines whether an extracted term is similar to another extracted term from the same document. The weight of a pair of extracted terms is calculated based on their top 10 matches from the UMLS API and is then compared with the similarity threshold γ; if the weight is higher than the threshold, we consider the pair to be equal terms. The weight formula is shown as follows:

    (α / 4) + (3β / 4) > γ,    (4)

where α refers to the percentage of equal terms over all 10 terms and β is the percentage of equal terms over the top 3 terms. α and β are calculated based on the UMLS API matches of the two extracted terms. The weight is a value between 0 and 1, with 0 meaning that the terms are not similar at all and 1 that they are exactly the same. For a given NLP task, an initial value of γ = 0.1 is recommended; then, according to the number of false positives and false negatives, we adjust the value of γ to achieve optimal output. The strategy for tuning these parameters is discussed further in Section 3.3.

The second weight system determines whether an extracted clinical term has enough cumulative weight over all NLP APIs. Since the performance of NLP APIs varies, a weight for each individual API is estimated by using the F1 scores calculated after testing each API on a small subset of clinical documents. The F1 score for each API is normalized to an extractor weight ω. For each clinical term extracted, we sum the weights of the extractors that the term was extracted by. If the sum is over the extractor threshold θ, the term is considered to be actually extracted. If it is less, it is considered to be a false extraction. The weight is computed as follows:

    Σ_{i=1..n} ω_i,    (5)

where ω_i is the weight of an NLP API and n refers to the number of APIs used. The pseudocode of the integration process is shown in Algorithm 1.

Table 2: NLP APIs selected for the prototype.
- IBM Watson NLU (free trial; IBM): https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/
- MeaningCloud (free trial; MeaningCloud LLC): https://www.meaningcloud.com/developer/documentation
- Open Calais (free trial; Thomson Reuters): http://www.opencalais.com/opencalais-api/
- Haven OnDemand (free trial; Hewlett Packard): https://dev.havenondemand.com/apis
- TextRazor (free trial; TextRazor Ltd.): https://www.textrazor.com/docs/rest
- Dandelion API (free trial; Spaziodati): https://dandelion.eu/docs/

3.2.4. Prototype. Figure 3 shows the overall functional components of the prototype, which is an instantiation of the proposed architecture. The prototype is a web application with a minimalistic user interface, developed with HTML5, CSS, and JavaScript, and with PHP for the back end. Given that many existing NLP APIs use JSON as the default format, JSON is the chosen format for transferring data between the different components. Figure 4 presents a screenshot of the application. Users provide the clinical documents they want to process in the upper input field and then select APIs and coding standards. After clicking the Extract button, the results are displayed in the table at the bottom. "Diseases Remote" lists the extractions of the external NLP APIs, while "Diseases Local" represents the results of combining the external NLP APIs, the local negation handler, and the UMLS API. Unfortunately, the application is not accessible online due to a lack of API token management: sharing our tokens online might incur a charge when there are a large number of API requests. Nevertheless, researchers are able to deploy their own version of the system with the source code we share on GitHub at https://github.com/ianshan0915/MABNLP. A demo video is also available at https://youtu.be/dGk9NQGWYfI.

3.3. Evaluation Results. As explained before, the prototype comes with three hyperparameters that adjust the extraction outputs: negation (κ), the term similarity threshold (γ), and the extractor threshold (θ). The hyperparameter tuning was conducted manually by the researchers in the experiments. The impacts of the controlling hyperparameters on the outputs of our experiments vary. First of all, negation surprisingly shows little positive influence, as shown in Table 3. The main reason probably lies in the fact that the implemented negation algorithm, NegEx, only uses negation cue words without considering the semantics of a sentence [34]. Implementation of more advanced algorithms, such as DEEPEN and ConText, will be conducted in future research. A higher γ value means a higher similarity threshold for entities to be merged, which results in lower false positive and higher false negative numbers. By increasing the θ value, we require entities to be extracted by more APIs, which subsequently lowers the number of false positives and increases the number of false negatives. However, higher values also bring down the number of true positives. The aim is to strive for the best combination of these hyperparameters for each specific NLP task. The experiments suggested that the values γ = 0.1 and θ = 0.35 are a decent starting point for further exploration.

Results have shown that the performance of the prototype is not consistent. Datasets like the obesity challenge can rely on our approach, but its reliability on datasets such as the medication challenge and the OPERAM dataset needs further improvement and evaluation.
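To make the weight formula in equation (4) concrete, the following sketch computes the pairwise weight of two extracted terms from their top-10 UMLS code lists; the C-prefixed identifiers are placeholders, not real UMLS concept codes.

```python
def pair_weight(codes_a, codes_b):
    """Equation (4): alpha = fraction of shared codes among the top-10
    UMLS matches, beta = fraction among the top-3; the top-3 overlap
    carries three times the weight of the overall overlap."""
    alpha = len(set(codes_a[:10]) & set(codes_b[:10])) / 10
    beta = len(set(codes_a[:3]) & set(codes_b[:3])) / 3
    return alpha / 4 + 3 * beta / 4

# Two terms sharing three of their top-10 codes and two of their top-3.
codes_a = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10"]
codes_b = ["C1", "C2", "C9", "C11", "C12", "C13", "C14", "C15", "C16", "C17"]
delta = pair_weight(codes_a, codes_b)   # alpha = 0.3, beta = 2/3, delta = 0.575
merge = delta > 0.1                     # exceeds gamma = 0.1: treat as equal terms
```

Because the top-3 overlap dominates, two terms whose best UMLS matches coincide merge even when the rest of their match lists differ.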
Input: X = [X_1, X_2, ..., X_n]: returns of n APIs;
       W = [ω_1, ω_2, ..., ω_n]: weights of n APIs;
       γ: similarity threshold; θ: extractor threshold
Output: T: a list of clinical terms
Initialisation: ω_α = 0.25 and ω_β = 0.75
// Filter out same/similar terms extracted by one API
(1)  for i = 1 to n do
(2)    for x_a in X_i do
(3)      get the rest of the terms: X_j = X_i - x_a
(4)      for x_b in X_j do
(5)        calculate the percentage of equal terms over all 10 terms: α
(6)        calculate the percentage of equal terms over the top 3 terms: β
(7)        calculate the pairwise similarity: δ = ω_α·α + ω_β·β
(8)        if δ ≥ γ then
(9)          discard the same/similar term: X_i = X_i - x_b
(10)       end if
(11)     end for
(12)   end for
(13) end for
(14) get the filtered arrays of terms: X_δ = [X_1δ, X_2δ, ..., X_nδ]
// Filter out extracted terms by their weights over all APIs
(15) compute the weights over all APIs: X_ω = Σ_{i=1..n} X_δ · W
(16) for (ω_sum, x) in X_ω do
(17)   if ω_sum ≥ θ then
(18)     add the term to the final list: T += [x]
(19)   end if
(20) end for
(21) return T

Algorithm 1: Pseudocode of the API integration algorithm.

[Figure 3: Prototype architecture. An extraction pipeline of concept extraction APIs (Watson, MeaningCloud, ..., TextRazor) and the UMLS domain knowledge API exchanges JSON with the infrastructure (PHP framework Laravel, Node.js Express server, API integration, negation handler, and CRATE) over HTTPS, consuming documents and producing annotations.]

Many NLP systems have been tested on the two i2b2 datasets, and benchmark performance metrics have been published in the literature [38, 39]. We calculated the averages of the top 5 best systems as the baselines. As displayed in Table 4, the prototype performs well and has great potential to be adopted for clinical concept extraction. In the case of the OPERAM dataset, there is no benchmark. Therefore, its performance is evaluated from an expert intervention perspective. By comparing the automatically extracted clinical concepts with the annotations, we estimate how well the prototype can assist physicians during their manual extraction process.
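Algorithm 1 can be rendered as a short Python sketch under two assumptions: similar terms from different APIs have already been normalized to a shared surface form, and a hypothetical term_codes lookup supplies each term's top-10 UMLS codes. It illustrates the two filtering stages rather than reproducing the prototype's actual implementation.

```python
def integrate(api_terms, api_weights, term_codes, gamma=0.1, theta=0.35):
    """Two-stage integration of Algorithm 1:
    (1) within each API's result list, discard a term whose pairwise
        similarity (0.25*alpha + 0.75*beta) with an earlier term
        reaches gamma;
    (2) keep a term only if the summed weights of the APIs that
        extracted it reach the extractor threshold theta."""
    def similarity(t1, t2):
        c1, c2 = term_codes[t1], term_codes[t2]
        alpha = len(set(c1[:10]) & set(c2[:10])) / 10
        beta = len(set(c1[:3]) & set(c2[:3])) / 3
        return 0.25 * alpha + 0.75 * beta

    filtered = []
    for terms in api_terms:                      # stage 1: per-API dedup
        kept = []
        for t in terms:
            if all(similarity(t, k) < gamma for k in kept):
                kept.append(t)
        filtered.append(kept)

    totals = {}                                  # stage 2: cumulative weights
    for terms, weight in zip(filtered, api_weights):
        for t in terms:
            totals[t] = totals.get(t, 0.0) + weight
    return [t for t, w in totals.items() if w >= theta]

# Two APIs; "high blood pressure" shares codes with "hypertension" and is merged.
term_codes = {
    "hypertension": ["C1", "C2", "C3"],
    "high blood pressure": ["C1", "C2", "C4"],
}
result = integrate([["hypertension", "high blood pressure"], ["hypertension"]],
                   [0.6, 0.5], term_codes)
```

In this toy run, the duplicate within the first API's list is discarded in stage 1, and "hypertension" survives stage 2 with a cumulative weight of 1.1.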
Unfortunately, feedback from physicians indicates that the prototype is not yet considered practically useful. Firstly, its poor performance in extracting medical conditions requires physicians to spend more time filtering out incorrect extractions. Secondly, the prototype fails to identify the associated dosages and frequencies of medications.

[Figure 4: Prototype user interface of the multiple NLP API extraction pipeline. A demo video and source code are available online.]

Table 3: Impact of negation from the experiments.
- Obesity challenge: κ = true: recall 0.733, precision 0.939, F1 0.823; κ = false: recall 0.805, precision 0.925, F1 0.861
- Medication challenge: κ = true: recall 0.62, precision 0.835, F1 0.712; κ = false: recall 0.636, precision 0.838, F1 0.724
- OPERAM medical conditions: κ = true: recall 0.594, precision 0.271, F1 0.373; κ = false: recall 0.594, precision 0.271, F1 0.373
- OPERAM medications: κ = true: recall 0.795, precision 0.816, F1 0.805; κ = false: recall 0.795, precision 0.816, F1 0.805

4. Discussion

We argue that outsourcing NLP tasks offers efficient NLP solutions for processing unstructured clinical documents. To begin with, outsourcing often leads to a reduction of both IT development and maintenance costs. Furthermore, a lower level of NLP expertise is required when external NLP services are used: a developer with limited knowledge of NLP could build a clinical NLP application such as our prototype. Lastly, the architecture supports NLP services beyond clinical concept extraction. By adding a sentiment analysis NLP pipeline constructed from external NLP APIs, our prototype can perform sentiment analysis on clinical documents. For instance, changing from concept extraction to sentiment analysis can be accomplished by adjusting the API request parameters from '{"features": "entities"}' to '{"features": "sentiment"}'.

4.1. Evaluation Results. In comparison with the popular biomedical NLP component collections listed in [40], the main advantage of our proposed approach is its lightweight nature. The popular component collections, such as cTAKES, Bluima, and JCoRe, require an intensive IT resource investment, including Java developers, NLP specialists with experience in the UIMA framework, and local hardware support. On the contrary, clinical institutions could start to process unstructured text with as few resources as possible, because our cloud-based approach outsources NLP to external NLP services. Moreover, Bluima has not been updated for four years. Rather than replacing the popular NLP tools, our approach should be considered an alternative in the face of time and resource constraints.

4.2. Error Analysis. An error analysis has been carried out in order to better understand the performance of the prototype. As explained in Section 2.2, there are two types of errors, namely, FPs and FNs. Figure 5 shows the percentage of FP and FN errors in all experiments. First of all, one major source of errors in the two i2b2 datasets is false negatives, which means that many annotated terms in these datasets are not extracted by our prototype. The high proportion of FNs is in great part attributable to entity-type detection errors. Since some NLP APIs (MeaningCloud and Open Calais) are unable to extract pharmaceutical drug entities, this results in a lower number of extracted entities and higher false negatives. Therefore, to enhance the performance, NLP APIs such as MeaningCloud and Open Calais might as well be excluded. Nevertheless, the higher number of false positives led to an overall performance loss in the OPERAM medical conditions extraction. We found out that the problem lies in the annotation. For example, the sentence "Fall during the night, multiple hematomas. Orthostatic hypotension proven." contains two medical conditions: hematoma and orthostatic hypotension. Hematoma was found by two out of six extractors; orthostatic hypotension was found by five out of six. However, neither of these two was annotated, most likely because the context of the sentence was in the past tense and potentially not applicable to the current state of the patient.
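The request-parameter switch mentioned in the Discussion can be sketched with a generic payload builder. The payload shape and field names here are illustrative only; real providers (IBM Watson NLU, for instance) structure the features field differently, so treat this as a hypothetical generic client rather than a vendor API.

```python
import json

def build_payload(text: str, feature: str) -> str:
    """Build a JSON payload for a generic NLP API; switching tasks only
    means changing the 'features' parameter, while the rest of the
    pipeline stays untouched. (Illustrative shape, not a vendor's API.)"""
    if feature not in ("entities", "sentiment"):
        raise ValueError("unsupported feature: " + feature)
    return json.dumps({"text": text, "features": feature})

concept_request = build_payload("Patient has asthma.", "entities")
sentiment_request = build_payload("Patient has asthma.", "sentiment")
```

Since JSON is the transfer format between the prototype's components, the same serialized payload can be handed to whichever external API the pipeline selects.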
Table 4: Overall results on three datasets (baseline: average of the top 5 best systems from the challenge).
- Obesity challenge (κ = false, γ = 0.1, θ = 0.2): recall 0.805, precision 0.925, F1 0.861; baseline: recall 0.771, precision 0.815, F1 0.787
- Medication challenge (κ = false, γ = 0.1, θ = 0.35): recall 0.636, precision 0.838, F1 0.724; baseline: recall 0.794, precision 0.845, F1 0.818
- OPERAM medical conditions (κ = true, γ = 0.1, θ = 0.5): recall 0.594, precision 0.271, F1 0.373
- OPERAM medications (κ = false, γ = 0, θ = 0.35): recall 0.795, precision 0.816, F1 0.805

[Figure 5: Error distribution of all the experiments, false positives vs. false negatives, for the obesity challenge, the medication challenge, OPERAM medical conditions, and OPERAM medications.]

4.3. Limitations and Future Research. There are a number of hurdles that prevent the adoption of our approach in daily practice. Further research is necessary to sufficiently address these concerns. First of all, practical implementation requires a more thorough privacy and security component. The privacy and security component is part of the proposed architecture and is currently implemented in the prototype using CRATE [26] and HTTPS. However, since only anonymized datasets were used in the evaluation, the deidentification toolkit, CRATE, was not validated. Before practical adoption, we need to first evaluate the performance of the privacy and security component with real-world clinical data.

Another concern lies in the computational efficiency of our approach, namely, its execution time. As shown in the demo video, it takes about 20 seconds to process a discharge letter. Specifically, the majority of the time (15 seconds) goes to annotation, in which extracted terms are first encoded with UMLS and the pairwise similarity between them is then calculated. Since the prototype was running locally on a laptop with 8 GB RAM, we expect it to become faster when implemented on a larger server.

In practice, clinical NLP is employed to solve various clinical problems, ranging from entity extraction to cohort detection. Our research demonstrates that the proposed approach performs well on clinical concept extraction. It is crucial to conduct further evaluation on other tasks, such as cohort detection and sentiment analysis, before adopting the approach in practice.

Last but not least, due to the wide adoption of health information systems (HIS) in healthcare institutions, developing a simple method that supports the integration of our approach with HIS would facilitate its implementation.

5. Conclusion

The proposed NLP architecture offers an efficient solution for developing tools that are capable of processing unstructured clinical data in the healthcare industry. With our approach, less time and fewer resources are required to create and maintain NLP-enabled clinical tools, given that all NLP tasks are outsourced. Moreover, the prototype built upon the approach produces satisfactory overall results, and its performance on certain datasets indicates that its practical application in clinical text processing, particularly clinical concept extraction, is promising. Nevertheless, the high variance among different datasets raises concerns about its generalization and practicability.

Data Availability

Source code and the OPERAM dataset are available at the GitHub repository https://github.com/ianshan0915/MABNLP. The two i2b2 datasets are accessible from https://www.i2b2.org/NLP/DataSets/Main.php. Finally, a demo video of the prototype is available at https://youtu.be/dGk9NQGWYfI.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is part of the project "OPERAM: OPtimising thERapy to prevent Avoidable hospital admissions in the Multimorbid elderly" supported by the European Commission (EC) HORIZON 2020, proposal 634238, and by the Swiss State Secretariat for Education, Research and Innovation (SERI), under contract number 15.0137. The opinions expressed and arguments employed herein are those of the authors and do not necessarily reflect the official views of the EC and the Swiss government.

References

[1] S. Doan, M. Conway, T. M. Phuong, and L. Ohno-Machado, "Natural language processing in biomedicine: a unified system architecture overview," in Methods in Molecular Biology, vol. 1168, pp. 275-294, Springer, Clifton, NJ, USA, 2014.
[2] G. K. Savova, J. J. Masanz, P. V. Ogren et al., "Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications," Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 507-513, 2010.
[3] J. Patrick and M. Li, "High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge," Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 524-527, 2010.
[4] T. Cai, A. A. Giannopoulos, S. Yu et al., "Natural language processing technologies in radiology research and clinical applications," Radiographics, vol. 36, no. 1, pp. 176-191, 2016.
[5] M. Kreuzthaler and S. Schulz, "Detection of sentence boundaries and abbreviations in clinical narratives," BMC Medical Informatics and Decision Making, vol. 15, no. S2, pp. 1-13, 2015.
[6] O. Bodenreider, "The unified medical language system (UMLS): integrating biomedical terminology," Nucleic Acids Research, vol. 32, no. 1, pp. D267-D270, 2004.
[7] H. Cunningham, V. Tablan, A. Roberts, and K. Bontcheva, "Getting more out of biomedical documents with GATE's full lifecycle open source text analytics," PLoS Computational Biology, vol. 9, no. 2, Article ID e1002854, 2013.
[8] D. Ferrucci and A. Lally, "UIMA: an architectural approach to unstructured information processing in the corporate research environment," Natural Language Engineering, vol. 10, no. 3-4, pp. 327-348, 2004.
[9] J.-H. Chiang, J.-W. Lin, and C.-W. Yang, "Automated evaluation of electronic discharge notes to assess quality of care for cardiovascular diseases using medical language extraction and encoding system (MedLEE)," Journal of the American Medical Informatics Association, vol. 17, no. 3, pp. 245-252, 2010.
[10] R. E. de Castilho and I. Gurevych, "A broad-coverage collection of portable NLP components for building shareable analysis pipelines," in Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT, pp. 1-11, Dublin, Ireland, August 2014.
[11] K. Kreimeyer, M. Foster, A. Pandey et al., "Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review," Journal of Biomedical Informatics, vol. 73, pp. 14-29, 2017.
[12] D. Carrell, "A strategy for deploying secure cloud-based natural language processing systems for applied research involving clinical text," in Proceedings of the 44th Hawaii International Conference on System Sciences (HICSS), pp. 1-11, Kauai, HI, USA, January 2011.
[13] R. Dale, "NLP meets the cloud," Natural Language Engineering, vol. 21, no. 4, pp. 653-659, 2015.
[14] W. L. Currie, B. Desai, and N. Khan, "Customer evaluation of application services provisioning in five vertical sectors," Journal of Information Technology, vol. 19, no. 1, pp. 39-58, 2004.
[15] A. Rago, F. M. Ramos, J. I. Velez, J. A. Díaz Pace, and C. Marcos, "TeXTracT: a web-based tool for building NLP-enabled applications," in Proceedings of the Simposio Argentino de Ingeniería de Software (ASSE 2016)-JAIIO 45, Buenos Aires, Argentina, September 2016.
[16] G. Haffari, M. Carman, and T. D. Tran, "Efficient benchmarking of NLP APIs using multi-armed bandits," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol. 1, pp. 408-416, Valencia, Spain, April 2017.
[17] S. Hellmann, J. Lehmann, S. Auer, and M. Brümmer, "Integrating NLP using linked data," in Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, October 2013.
[18] P. Martínez, J. L. Martínez, I. Segura-Bedmar, J. Moreno-Schneider, A. Luna, and R. Revert, "Turning user generated health-related content into actionable knowledge through text analytics services," Natural Language Processing and Text Analytics in Industry, vol. 78, pp. 43-56, 2016.
[19] Z. S. Abdallah, M. Carman, and G. Haffari, "Multi-domain evaluation framework for named entity recognition tools," Computer Speech & Language, vol. 43, pp. 34-55, 2017.
[20] K. Chard, M. Russell, Y. A. Lussier, E. A. Mendonca, and J. C. Silverstein, "A cloud-based approach to medical NLP," AMIA Annual Symposium Proceedings, pp. 207-216, 2011.
[21] G. Rizzo and R. Troncy, "NERD: a framework for unifying named entity recognition and disambiguation web extraction tools," in Proceedings of the System Demonstration at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, April 2012.
[22] API-Based CMS Buyer's Guide, May 2018, https://nordicapis.com/api-based-cms-buyers-guide/.
[23] A. R. Hevner, S. T. March, J. Park, and S. Ram, "Design science in information systems research," Management Information Systems Quarterly, vol. 28, no. 1, p. 6, 2008.
[24] K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, "A design science research methodology for information systems research," Journal of Management Information Systems, vol. 24, no. 3, pp. 45-77, 2007.
[25] Z. Shen, M. Meulendijk, and M. Spruit, "A federated information architecture for multinational clinical trials: STRIPA revisited," in Proceedings of the 24th European Conference on Information Systems (ECIS), Istanbul, Turkey, June 2016.
[26] R. N. Cardinal, "Clinical records anonymisation and text extraction (CRATE): an open-source software system," BMC Medical Informatics and Decision Making, vol. 17, no. 1, p. 50, 2017.
[27] V. Menger, F. Scheepers, L. M. van Wijk, and M. Spruit, "DEDUCE: a pattern matching method for automatic de-identification of Dutch medical text," Telematics and Informatics, vol. 35, no. 4, pp. 727-736, 2018.
[28] B. Koopman, G. Zuccon, A. Nguyen, A. Bergheim, and N. Grayson, "Automatic ICD-10 classification of cancers from free-text death certificates," International Journal of Medical Informatics, vol. 84, no. 11, pp. 956-965, 2015.
[29] Y. Ni, S. Kennebeck, J. W. Dexheimer et al., "Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department," Journal of the American Medical Informatics Association, vol. 22, no. 1, pp. 166-178, 2015.
[30] B. J. Marafino, W. John Boscardin, and R. Adams Dudley, "Efficient and sparse feature selection for biomedical text classification via the elastic net: application to ICU risk stratification from nursing notes," Journal of Biomedical Informatics, vol. 54, pp. 114-120, 2015.
[31] V. Menger, M. Spruit, R. van Est, E. Nap, and F. Scheepers, "Machine learning approach to inpatient violence risk assessment using routinely collected clinical notes in electronic health records," JAMA Network Open, vol. 2, no. 7, Article ID e196709, 2019.
[32] J. P. Pestian, P. Matykiewicz, M. Linn-Gust et al., "Sentiment analysis of suicide notes: a shared task," Biomedical Informatics Insights, vol. 5S1, 2012.
[33] W. W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan, "Evaluation of negation phrases in narrative clinical reports," in Proceedings of the AMIA Symposium, pp. 105-109, Washington, DC, USA, November 2001.
[34] S. Mehrabi, A. Krishnan, S. Sohn et al., "DEEPEN: a negation detection system for clinical text incorporating dependency relation into NegEx," Journal of Biomedical Informatics, vol. 54, pp. 213-219, 2015.
[35] W. W. Chapman, G. F. Cooper, P. Hanbury, B. E. Chapman, L. H. Harrison, and M. M. Wagner, "Creating a text classifier to detect radiology reports describing mediastinal findings associated with inhalational anthrax and other disorders," Journal of the American Medical Informatics Association, vol. 10, no. 5, pp. 494-503, 2003.
[36] S. Meystre and P. J. Haug, "Natural language processing to extract medical problems from electronic clinical documents: performance evaluation," Journal of Biomedical Informatics, vol. 39, no. 6, pp. 589-599, 2006.
[37] K. J. Mitchell, M. J. Becich, J. J. Berman et al., "Implementation and evaluation of a negation tagger in a pipeline-based system for information extraction from pathology reports," Studies in Health Technology and Informatics, vol. 107, no. 1, pp. 663-667, 2004.
[38] O. Uzuner, "Recognizing obesity and comorbidities in sparse data," Journal of the American Medical Informatics Association, vol. 16, no. 4, pp. 561-570, 2009.
[39] O. Uzuner, I. Solti, and E. Cadag, "Extracting medication information from clinical text," Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 514-518, 2010.
[40] P. Przybyła, M. Shardlow, S. Aubin et al., "Text mining resources for the life sciences," Database, vol. 2016, 2016.