Rashed, Abdel-Hamed Mohamed; El-Attar, Noha E.; Abdelminaam, Diaa Salama; Abdelfatah, Mohamed

1. Introduction
Healthcare is one of the most important business sectors in any country. Obtaining high-quality healthcare services at a reasonable cost is a challenging issue, since the good health of citizens reflects positively on their productivity. Healthcare management is a multilateral process: healthcare is provided in hospitals, rehabilitation centers, nursing homes, and clinics, and many professionals may participate in it, such as general practitioners, dentists, midwives, and physiotherapists. Healthcare databases contain massive amounts of data related to the care processes. Consequently, enhancing the performance of these processes will improve the quality of care services and reduce their costs. Business process management (BPM) investigates how to improve the business processes of an organization. BPM domains such as process modeling, process mapping, and process mining are often used interchangeably because of their similarity. Process mapping is used to draw an idealized model of how processes should be performed, using workshops, interviews, and document analysis. Process mining uses automated techniques to extract process models from organizational databases. Process modeling defines how the process model should be; it depends on human perception [1]. Improving healthcare processes directly affects the quality and cost of care services, but there are many barriers to managing healthcare processes: they are highly dynamic, complex, ad hoc, and increasingly multidisciplinary [2]. Nevertheless, processes of varying complexity and duration (up to several months) can be identified. Process mining can address these challenges by relating the actual behavior of organizational processes to the modeled behavior. It reveals insights that may show that the hand-made model (based on opinions and beliefs) differs from reality. Business process mining uses techniques and tools to discover, monitor, and improve real processes based on event logs extracted from business information systems. This paper presents an application of process mining to the dataset of a hospital in Egypt; process discovery algorithms are used to discover the careflows of patients from the event log extracted from the hospital's information system. The discovered model explores the patient's journey inside the hospital, from registration to the end of treatment. The discovered model is then checked for conformance with the standard process model to expose any errors and deviations. The process model and event log were analyzed from the organizational and performance perspectives. The results of the analysis are important for any future improvements suggested for hospital processes. This paper is organized as follows: Section 1 provides background concepts about process mining in healthcare and introduces previous related applications of process mining in healthcare. Section 2 presents the proposed approach. Section 3 discusses the results of applying the proposed approach to the case study: the results of preparing the event logs, the results of applying four miner algorithms to discover the process model from the event log, and, at the end of the section, the results of the two main analysis perspectives:
one for performance analysis and one for organizational analysis. Finally, Section 4 summarizes the results and points to future work.

1.1 Process mining and healthcare
Process mining brings together data-centric analysis (such as data mining) and process-centric analysis (such as simulation). Hand-made simulation is used in healthcare projects to plan processes and make predictions, but because of the complexity of careflows, the simulated processes often do not agree with reality and fail to deliver improvement. Process mining can address that challenge by modeling the care processes from historical event logs, so that the models reflect the real processes and can guide improvement. Process mining leverages data mining methods such as association, classification, clustering, prediction, sequential patterns, and regression in many internal stages, such as preprocessing the event logs, clustering the traces, and prediction [3]. Process mining techniques aim to automatically discover process models from event logs extracted from process-aware information systems (PAIS), then check the conformance of the process models to reality, and improve process performance in terms of key performance indicators (KPIs) such as cost, time, and quality [4]. An event log is a collection of traces, each trace representing a process case (i.e., a specific execution or instance of a process). Each trace is a sequence of events (corresponding to the executions of process activities), ordered by timestamp and performed by the resources (humans, machines, etc.) involved in the case. Every event has required attributes, such as the activity name and execution time, and optional attributes, such as the cost, the lifecycle stage of the activity (e.g., started, assigned, completed), and the resource performing it. Fig 1 shows the implementation of process mining in a clinical environment: once event logs are extracted from the hospital information system, they can be used as the starting point for the process mining functions, which are usually classified into three iterative functions: discovery, conformance checking, and enhancement. First, discovery aims to create process models from the event logs automatically. Second, conformance checking replays the event logs against the designed (hand-made) model to check conformance and to uncover bottlenecks in the process. Finally, enhancement enriches the model with insights extracted from the event logs, for example, using performance data or timing information on the model to display bottlenecks, throughput, and frequencies [5, 6].

Fig 1. Process mining approach including process discovery, conformance checking and enhancements for a hospital [7]. https://doi.org/10.1371/journal.pone.0281836.g001

Discovery is performed under three different perspectives. The control-flow perspective uses the traces in the event log to create a process model that defines all possible paths; the model is expressed as a Petri net or in some other notation (e.g., BPMN, YAWL, EPC). The organizational perspective focuses on the actors (people, machines, roles, systems, and departments) involved in the system and how they are related.
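To make the structure of an event log concrete, the following is a minimal Python sketch (our own illustration, not from the paper) of a log as a list of traces; the attribute keys are illustrative stand-ins for the XES names, and the activity names are borrowed from the case study described later.

```python
from datetime import datetime

# One log = a list of traces; one trace = the time-ordered events of one case.
# Attribute keys are illustrative; the XES standard uses concept:name,
# time:timestamp, org:resource, and lifecycle:transition instead.
event_log = [
    [  # trace of a single hypothetical case
        {"activity": "Admission to Hospital",
         "timestamp": datetime(2021, 1, 4, 9, 15),
         "resource": "Reception", "lifecycle": "complete"},
        {"activity": "First Consultation Checkup",
         "timestamp": datetime(2021, 1, 4, 10, 0),
         "resource": "Cardiologist", "lifecycle": "complete"},
        {"activity": "Discharge and Leave",
         "timestamp": datetime(2021, 1, 6, 12, 30),
         "resource": "Reception", "lifecycle": "complete"},
    ],
]

# The control-flow view of each trace is just its activity sequence.
for trace in event_log:
    print(" -> ".join(event["activity"] for event in trace))
```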
It can give deep insights into the roles and organizational units, or into the relations between individual performers through a social network. The performance perspective concerns the occurrence times and frequencies of events. It graphically shows bottleneck cases and all kinds of performance indicators, for example, the average and variance of the total flow time within a process, or the sojourn time of a given event. Healthcare experts agree on the complexity of managing clinical pathways, which is due to the nature of clinical operations: they are very time-consuming, and it is difficult to set a binding deadline for a clinical process because of the number and variety of the participants and resources involved. Another reason for the complexity of clinical processes is the difficulty of formalizing them. In other words, treating the same disease differs from one patient to another, so the clinical protocol for the same disease varies; this is called "variation" or "drifting" and is due to differences in patients' traditions, behaviors, commitment to treatment, or the traditions of their countries. It is also due to patients' health statuses, where comorbidities make the treatment approach different, and it may be due to differences in the beliefs and experience of doctors and nursing specialists and the resources available to them [8]. These problems in clinical pathways lead to many risks that make healthcare activities inefficient, such as long waiting times, high costs, deviations from the standard model, bottlenecks, and frequently repeated activities. Process mining techniques address these risks by understanding which careflows are really happening and analyzing whether there is any deviation from the prescribed process model. Moreover, the timestamps of events can be used to detect bottlenecks and other inefficiencies. The terms careflows, patient pathways, care processes, and healthcare processes are used here with the same meaning. ProM (promtools.org) is the most frequently used framework for process mining and is used throughout this study. ProM offers a set of tools that support various process mining techniques through plugins, which enable researchers to apply a wide range of algorithms to the collected event logs [9]. Other frequently used process mining tools are Disco (https://fluxicon.com/disco), Celonis (http://my.celonis.de/login), and ProDiscovery (https://www.puzzledata.com/prodiscovery_eng). Custom-made tools and techniques are also often used.

1.2 Related works
Many process mining applications have been used to make business process management more efficient and more manageable. For example, process mining can be used to enhance the quality of education services [10], to optimize loan application processes in the banking sector [11], and to improve the quality of complaint services in large companies [12]. This section introduces a number of related process mining applications in the healthcare sector. The study in [8] used process mining techniques to infer meaningful insights for a hospital in Italy. The event logs were analyzed using the ProM framework from three different perspectives: the control-flow perspective, the organizational perspective, and the performance perspective.
Applying process mining in this hospital enables the management to diagnose problems and plan improvements based on real facts in the form of clinical data. There are two issues with this study: (1) the external validity, i.e., the extent to which the results can be generalized beyond the parameters of the study; and (2) the dataset records activities that took place during a period of only two months, between the end of 2015 and the beginning of 2016. To confirm the validity and robustness of the findings, the same process mining methods should be applied to fresh (and more current) datasets collected over a longer time. The work in [13] carried out conformance checking: it used process mining to check the conformance between the workflow of open-source EMRs (derived from the event logs of an EMR) and the workflow of hospitals (based on domain knowledge). The log and the model were checked using alignment and replay, and the results were evaluated on four metrics (fitness, precision, simplicity, and generalization). The main limitation of this study is the complexity and heterogeneity of healthcare processes. In [14], a healthcare case was presented to improve a business process by enhancing the KPIs associated with process instance data, using process mining tools to extract knowledge from event logs related to the healthcare process; most of the patients were satisfied with the emergency department. The authors in [15] proposed a process redesign lifecycle coupled with process mining as an operational framework. The framework calculates indicators (time, cost, quality, and flexibility) to assess process redesign best practices. A limitation of this work is that the authors analyze variations between the current and proposed processes, with results being positive, negative, or neutral for the desired outcomes. The proposed framework was applied to case studies in a hospital and a tour agency. Another application study was presented in [16], which proposed a new process mining method for simulating unstructured processes. In order to achieve the most suitable process model, the proposed methodology permits the evaluation, comparison, and combination of the results of different process mining algorithms, leveraging conformance checking techniques. One limitation of this study relates to the constraints of using the Petri net language for evaluating quality parameters; another is the possible lack of integration between coexisting information systems, which leads to a lack of reliable event logs. In [17], another relevant study was applied to the dataset of a general university hospital in South Korea, aiming to manage the duration of inpatient stays more efficiently. Electronic health record (EHR) data and process mining technology were used to analyze all event records entered between patient admission and discharge. The limitation of this study is generalizability: the approach was applied to one specific hospital rather than multiple hospitals, and hospitals differ in their admission processes and treatment plans. The study in [18] described how process mining was applied to unstructured event data for evidence-based discovery of a process model of patient journeys from start to end. The resulting process model is the basis for a simulation model.
The resulting process models were used for performance analysis and for verifying the knowledge discovered using process mining techniques. The critical challenge of this study is to identify key process improvement areas, which are normally areas where performance is measured by KPIs. The study in [19] integrated process mining and a goal programming approach to improve the design of an emergency department from an operations management point of view. It was observed that many patients had to travel unnecessarily long distances during their visits owing to inefficient assignment of the clinical units to the available spaces. The proposed method improved the distances traveled by noncritical and critical patients; the study was limited by the lack of information on staff movements and walking behavior, and it still needed to determine the effects of the proposed layout on the walking distances of ED staff and personnel. The paper [20] proposed a framework of process performance indicators for emergency rooms. It suggested multiple process performance indicators that can be obtained by analyzing the event logs from the clinical unit, and verified them through discussions with clinical managers in the emergency department. The paper provided a single case study; as such, the framework needs to be applied to multiple emergency room event logs to be validated. Also, the paper included only a limited set of the emergency room's patient-related attributes, so only a part of them was measured. The study [21] compared patient pathways across four Australian hospitals using process mining techniques, visually detecting the differences in process variants so the hospitals could learn from each other. The challenges of this study were that clinical staff usually have a limited view of patient pathways, which makes it hard to provide better care, and that shared relational databases do not exist across or even within hospitals; merging data via unique identifiers is therefore not straightforward, and piecing healthcare data together is difficult. The study [22] proposed a framework based on process mining to analyze acute care from the time dimension. The framework detected time-critical medical procedures, analyzed the causes of delay, and evaluated the existing treatment process. The study has limitations: the methodology was tested on data from only one hospital, and some medical activities were not recorded completely. In conclusion, Table 1 summarizes the literature review. The table shows, for each related work, the research context, the specialty if specified, and the type of process mining activity. This literature presents applications of process mining in the healthcare sector, but after investigation, no previous papers were found that were devoted to business process mining in Egyptian hospitals. The next section describes the approach applied at a hospital in Egypt, using different process mining perspectives (the control-flow perspective, the organizational perspective, and the performance perspective) to produce analyses and insights that support the management of the hospital.

Table 1. Overview of reviewed researches.
https://doi.org/10.1371/journal.pone.0281836.t001
2. The proposed approach
This paper analyzes the performance of the hospital by utilizing process mining techniques to detect long waiting times and deviations from the base model.
The approach implemented in this study consists of three main phases, as shown in Fig 2: (1) data preprocessing; (2) model discovery; (3) analysis. If the approach started with model discovery without applying any preprocessing steps or clustering techniques to the event logs, the output would be unstructured, as shown in Fig 3. Fig 3 is a "spaghetti-like" (unstructured) model that shows all paths without distinguishing critical paths from trivial ones; it is complicated and hard to read, understand, or analyze. The proposed approach includes several steps. First, in the preprocessing phase, the event logs are extracted from the hospital information system and then prepared by filtering out outliers and noise. Second, process models are mined from the event logs by four well-known process discovery algorithms; if a generated model looks like a spaghetti (unstructured) model, clustering and grouping methods are used to split the whole event log into parts, refining the spaghetti model into a simple, visualizable model. After discovering the models, their quality is evaluated to choose the fittest one for the following steps. Third, the analysis phase covers two perspectives: performance analysis, to discover deviations, bottlenecks, and time-related insights about events and cases; and organizational analysis, to mine the relations between the resources of the hospital. All steps of this methodology are implemented in the ProM framework, which offers numerous plugins.

Fig 2. Proposed methodology based on process mining for patient's careflows. https://doi.org/10.1371/journal.pone.0281836.g002

Fig 3. The first discovered process (a spaghetti model) using every trace. https://doi.org/10.1371/journal.pone.0281836.g003

2.1 Preprocessing phase
2.1.1 Extracting data. Extracting data means building the event log from the raw dataset. It is done by identifying the appropriate attributes to use in the event log, identifying the period over which to extract events, and extracting the events with high clarity. The event log is often saved in the IEEE-standard XES (eXtensible Event Stream) format, which is supported by the wide majority of process mining tools.
2.1.2 Preprocessing data. The extracted event log is not yet suitable for applying the process mining techniques described below; it suffers from issues such as irrelevant data, noise, outliers, and missing data. Also, some process instances are incomplete because they started before the data extraction began. All these issues must be resolved to prepare the event log.
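As an illustration of this preprocessing step, the following sketch (our own, reusing the event-log representation from the earlier sketch) removes cases that were not fully observed; the start and end activity names and the extraction window follow the case study in Section 3.

```python
from datetime import datetime

START_ACTIVITY = "Admission to Hospital"  # planned first activity (Section 3)
END_ACTIVITY = "Discharge and Leave"      # planned last activity (Section 3)

def is_complete(trace):
    """A case is kept only if it was observed from its start to its end."""
    return (bool(trace)
            and trace[0]["activity"] == START_ACTIVITY
            and trace[-1]["activity"] == END_ACTIVITY)

def in_window(trace, start, end):
    """Drop cases that began before the extraction window (truncated prefixes)."""
    return start <= trace[0]["timestamp"] and trace[-1]["timestamp"] <= end

window_start = datetime(2021, 1, 1)
window_end = datetime(2021, 5, 31, 23, 59, 59)

clean_log = [trace for trace in event_log
             if is_complete(trace) and in_window(trace, window_start, window_end)]
print(f"kept {len(clean_log)} of {len(event_log)} cases")
```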
2.2 Model discovery phase
Model discovery is one of the imperative process mining functions; it automatically constructs process models from event logs. The produced process model reflects the actual process as observed through real process executions. It can be critical to apply clustering techniques before model discovery in order to mine a simple model, or sometimes after model discovery if needed. Clustering is very useful for logs that contain various cases following different procedures, which is frequent and commonplace in healthcare systems.
2.2.1 Clustering techniques. Two main approaches for clustering and simplifying the data are abstraction and clustering. The abstraction technique aggregates low-level activities into one high-level activity, so the focus is on the most significant aspects of the process or the main activity. The clustering technique groups similar traces into separate clusters, so that every sub-log contains traces with very similar characteristics. A number of clustering algorithms have been developed for complex networks, such as the fast fuzzy clustering (F2CAN) algorithm, which performs accurate and fast clustering [23]. Two main clustering methods can be used: trace clustering and sequence clustering. Trace clustering clusters careflows based on trace profiles [24]. A trace profile contains characteristics of the trace that are identified as essential, e.g., activities, a certain sequence of activities, medical service providers, etc. A ProM plugin that implements this technique is ActiTraC [25]. The second method is sequence clustering, which focuses only on the sequence of activities in traces and creates simpler models than trace clustering [26]. Clustering not only allows discovering more comprehensible models, but also allows identifying or confirming subgroups based on their behavior.
2.2.2 Process discovery. Process mining considers model discovery the base step for building the model as a graphical structure from the real events. Process discovery uses the event log to generate a process model. Several discovery techniques support process mining; the earliest process discovery algorithm is the α-algorithm [27], which produces a Petri net that represents the event log. The algorithm analyzes the event log and establishes dependency relations between tasks; these relations are considered causal and also describe the sequence of the tasks. However, the α-algorithm is sensitive to noise and to incompleteness of the event logs, and it does not work well on real-life data. This section describes the four discovery algorithms that are applied throughout the paper.
The Heuristics miner addresses many problems of the α-algorithm. To create a model, event logs are analyzed using the dependency values of the activities. The heuristics miner builds a model in two steps: creating a dependency graph and creating a causal matrix. The dependency graph illustrates the dependency (causality) between events; to create it, the dependency matrix and the length-one-loop dependencies are built. The miner first builds the directly-follows matrix from the frequencies between the activities, then derives the dependency matrix using the formula
a ⇒ b = (|a > b| − |b > a|) / (|a > b| + |b > a| + 1),    (1)
where a > b means that a is directly followed by b, and |a > b| is the number of times this happens in the log. Eq (1) is evaluated for every cell of the dependency matrix, and the resulting values lie between −1 and +1: a negative value means there is a negative relation, while a high positive value means there is a strong relation. The heuristics miner then builds a causal matrix to represent the correct process model. There are two types of non-observed activities, AND and XOR: the AND type represents parallel activities, while the XOR type represents sequential activities. By filtering certain relations and then using particular patterns, the Petri net is generated.
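The sketch below (our own illustration) computes the directly-follows frequencies and the dependency values of Eq (1) from a toy log with made-up activity names; length-one loops, which the heuristics miner handles with a separate measure, are omitted for brevity.

```python
from collections import Counter

def dependency_matrix(traces):
    """Dependency measure of Eq (1) from traces given as activity sequences."""
    df = Counter()  # directly-follows frequencies |a > b|
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    dep = {}
    for (a, b) in set(df) | {(b, a) for (a, b) in df}:
        ab, ba = df[(a, b)], df[(b, a)]
        dep[(a, b)] = (ab - ba) / (ab + ba + 1)  # value in (-1, +1)
    return dep

# Toy log: "Register" is usually, but not always, followed by "Triage".
toy_log = [["Register", "Triage", "Treat"],
           ["Register", "Triage", "Treat"],
           ["Register", "Treat", "Triage"]]
dep = dependency_matrix(toy_log)
print(dep[("Register", "Triage")])  # (2 - 0) / (2 + 0 + 1) ≈ 0.67
print(dep[("Triage", "Treat")])     # (2 - 1) / (2 + 1 + 1) = 0.25
```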
The heuristics miner is useful for real-life data with a limited number of distinct events. One of its advantages is that it outputs a heuristics net, which can be converted to other types of process models, such as a Petri net, for further analysis in ProM. It considers frequencies and significance in order to filter out noisy or infrequent behavior, which makes it less sensitive to noise and incomplete logs. Additionally, it can identify short loops, and it permits single activities to be skipped. However, it does not guarantee the soundness of the process model [28].
The Inductive miner works in two steps: first it creates a Stochastic Task Graph (SAG) from the event log, and then it synchronizes the structures of the event log instances to generate the process model. It discovers a main split in the event log (there are different types of splits: sequential, parallel, concurrent, and loop). Once the split has been discovered, the algorithm recurses over the sub-logs (found by applying the split) until a base case is identified. There are different variations of the algorithm; one of them, IMDFc, avoids the recursion on the sub-logs in favor of using the directly-follows graph. The inductive miner guarantees the soundness of the process model and good fitness; internally it does not work on Petri nets but uses process trees, which are then converted to Petri nets [29].
The ILP miner is an integer linear programming approach based on the language-based theory of regions. It always results in a Petri net that fits the event log perfectly, meaning that all traces can be replayed successfully on the net. ILP-based discovery algorithms mine causal dependencies between activities detected in the event log. However, the miner can produce models that are not very structured and less readable (i.e., spaghetti models). The approach performs well only under the assumption that the process under analysis shows frequent behavior, so it is ineffective at describing low-frequency exceptional behavior. The ILP miner is very sensitive to noise, as all traces need to be replayable on the net, so it is not recommended for real-life logs. The ILP miner outputs a Petri net model. As many calculations are required to solve ILP problems, ILP miners are typically slower than other algorithms, and it is difficult to make progress on this topic because these problems are NP-hard [30].
The ETM (Evolutionary Tree Miner) is based on a genetic algorithm that allows the user to steer the discovery process through preferences over the four quality metrics: fitness, complexity, precision, and generalization. It is based on process trees in its implementation, so no unsound model is generated. As a genetic algorithm, ETM performs several steps. The event log of the observed behavior is taken as input, and a population of random process trees is created, each tree containing exactly one instance of each activity. Then, for each candidate in the population, the four quality dimensions are determined, and the overall fitness of the process tree is calculated using the weights assigned to each dimension. Next, certain stop criteria are checked, such as finding a tree with the desired overall fitness. If none of the stop criteria are met, the candidates in the population are modified and the fitness is calculated again; this process is repeated until at least one stop criterion is met, and the fittest candidate is then returned [31]. The technique successfully handles problems like non-trivial constructs and/or noise in the log: the genetic approach uses global search techniques rather than depending on local information, and experiments on synthetic and real-life logs have shown that the fitness metric is complete and precise. In our experiments, the ETM execution time was extremely long (several hours on our target computer), and the computation failed several times due to lack of memory (i.e., Java heap space).
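For readers who wish to reproduce this step outside ProM, the sketch below uses the open-source pm4py library instead; the function names reflect pm4py's top-level API as we understand it and may differ across versions, the XES file name is hypothetical, and pm4py ships no Evolutionary Tree Miner, so only three of the four algorithms are shown. The replay-based fitness and precision scores preview the quality dimensions discussed next.

```python
# Assumes: pip install pm4py. Function names are pm4py's (not ProM's) and
# may change between library versions.
import pm4py

log = pm4py.read_xes("hospital_event_log.xes")  # hypothetical file name

miners = {
    "inductive": pm4py.discover_petri_net_inductive,
    "heuristics": pm4py.discover_petri_net_heuristics,
    "ilp": pm4py.discover_petri_net_ilp,
}

for name, discover in miners.items():
    net, initial, final = discover(log)  # Petri net + initial/final markings
    fitness = pm4py.fitness_token_based_replay(log, net, initial, final)
    precision = pm4py.precision_token_based_replay(log, net, initial, final)
    print(name, fitness["log_fitness"], precision)
```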
2.2.3 Evaluating the quality of the discovered models. Van der Aalst [30] proposed four quality dimensions for a discovered process model: fitness, simplicity, precision, and generalization. Fitness means that the process model is capable of replaying every trace in the log from beginning to end. Simplicity means that the simplest model that can describe the behavior in the log is the best; it concerns human comprehension, since complex process models are difficult to understand and interpret. Simplicity expresses the complexity of the process model and can be measured by the number of arcs and nodes used in the model. Generalization refers to a process model that allows behaviors that are not present in the event log. Lastly, precision refers to the ability of the model to disallow undesirable behavior.
2.3 Analysis phase
The analysis tasks provide insights into the processes of the business. The analysis in process mining is presented here through performance analysis and organizational analysis.
2.3.1 Performance analysis. The performance analysis is implemented by: conformance checking, which detects deviations of the discovered careflow model from what was planned in the hand-made model; and statistical analysis, which measures metrics such as the time consumed by each case and by each activity, the resources used with each activity, which activities consume a lot of time, and the long waiting times before activities start.
2.3.2 Organization analysis. The social network miner uses process logs to discover social networks. Since several social network analysis methods and research results are available, the resulting social network enables analysis of the social relations between the originators involved in process executions.
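As a concrete example of such organizational mining, the short sketch below (our own, reusing the event-log representation from Section 1.1) counts handovers of work, one common metric behind social network miners: how often a case passes from one resource to the next.

```python
from collections import Counter

def handover_of_work(log):
    """Weighted edges of a social network: resource pairs (r1, r2) such that
    r2 performed the event immediately after an event performed by r1
    within the same case."""
    handovers = Counter()
    for trace in log:
        for prev, curr in zip(trace, trace[1:]):
            if prev["resource"] != curr["resource"]:
                handovers[(prev["resource"], curr["resource"])] += 1
    return handovers

for (src, dst), weight in handover_of_work(clean_log).most_common(5):
    print(f"{src} -> {dst}: {weight} handovers")
```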
Extracting data is how to build the event log from the raw data set, it is done by identifying the appropriate attributes that should be used in the event log, identifying the period to extract event logs, and extracting the events with high clarity. The event log often saved in an IEEE standard as XES (eXtensible Event Stream), which is supported by the wide majority of process mining tools. 2.1.2 Preprocessing data. The extracted event log is not qualified to apply the after-mentioned process mining techniques on it; there are many issues in event log such as irrelevant data, noise data, outlier data, and missing data. Also, some process instances are incomplete because they were started before the data extract begins, all these issues must be solved to prepare the event log to be ready. 2.2 Model discovery phase The model discovery is one of imperative process mining functions which automatically constructs process models from the event logs. The produced process model reflects the actual process as observed through real process executions. But it is critical to apply the clustering techniques before the model discovery to mine a simple model or sometimes after model discovery if it is needed. Clustering technique is a very useful for logs which contain various cases following different procedures, as in healthcare systems it is more frequent and common place. 2.2.1 Clustering techniques. Two main approaches for clustering and simplifying the data are abstraction and clustering techniques. The abstraction technique is to aggregate the low level activities into one high level activity. The focus will be on most significant aspects of the process or the main activity. The clustering technique is to group the traces that have the same similarity into separate clusters. Every sub-log groups the traces that have very similar characteristics. There are a number of clustering algorithms has been developed for complex network such as fast fuzzy clustering (F2CAN) algorithm that perform accurate and fast clustering [23]. There are two main methods of clustering technique that can be used, trace clustering and sequence clustering. Trace clustering clusters careflows based up on trace profiles [24]. A trace profile contains certain characteristics of the trace that is identified as essential, e.g. Activities, a certain sequence of activities, medical service providers, etc. From ProM plugins that has been implemented upon this technique is ActiTrac [25]. The second clustering method is the sequence clustering, which focuses only on the sequence of activities in traces and creates more simple models than trace clustering [26]. Clustering not only allows discovering more comprehensible models, but also allows to identify or confirm subgroups based upon their behavior. 2.2.2 Process discovery. Process mining considers the model discovery is the base step to build the model in such as graphical structure from the real events. Process discovery uses event log to generate a process model. Several techniques of discovery model are used to support the process mining, the earlier process discovery algorithm is α-algorithm [27] used to produce a petri net that represents the event log. The algorithm analyzes the event log, then establishes different dependency relations between tasks. Relations between tasks considered as casual and also describe the sequence of the tasks. But it is sensitive to noise and incompleteness of the event logs. On real-life data, such algorithm does not work well. 
This section will provide descriptions of the four discovery model algorithms that are applied throughout the paper: Heuristic miner addresses many problems with the α-algorithm; to create a model, event logs are analyzed by using the dependency values of the activities. Two steps to build a model by the heuristics miner are: creating dependency graph and creating causal matrix. Dependency graph illustrates the dependency (causality) of events. To create a dependency graph, the dependency matrix, and the length-one loop dependency are built. It builds the directly-follows matrix by using the frequencies between the activities. Then it gets the dependency matrix by using the formula: (1)> directly follows, a > b means a is directly followed by b The above Eq (1) calculated all cells in the dependency matrix. Now the gotten values between minus 1 and plus 1. a negative value means actually that there’s a negative relation. a high value in the plus side means that there’s a strong relation. Then the heuristic Miner builds a causal matrix to represent the correct process model. There are two types of non-observed activities, which are AND and XOR. The AND type represents parallel activities, while the XOR type represents sequential activities. By filtering certain relation and then using particular patterns, the Petri net is generated. This algorithm is useful when there is real-life data with a limited number of diverse events, one of the advantages of this algorithm is to output a Heuristics net and that a heuristic net that can be changed to other types of process models, such as a petri net is used for further analysis in ProM. It considers frequencies and significance in order to filter out noisy or infrequent behavior, which makes it less sensitive to noise and incomplete logs. Additionally, it can identify short loops. Thirdly, it permits single activities to be skipped. It does not guarantee the soundness of the process model [28]. Inductive miner includes two steps to achieve its work, firstly it creates Stochastic Task Graph (SAG) from event log, and at that point, it synchronizes the structures of event log instances, to generate the process model. It discovers a main split in the event log (there are different types of splits: sequential, parallel, concurrent and loop). Once the split has been discovered, the algorithm iterates over the sub-logs (found by applying the split), until a base case is identified. There are different variations of the algorithm, one of them—IMDFc—avoids the recursion on the sub-logs in favor of using the Directly Follows graph. It guarantees the soundness of the process model and a good fitness model.it internally does not work on petri nets but uses process trees then convert it to petri net [29]. ILP miner “an integer linear programming” approach based on the language-based theory of regions. That always results in a petri net that fits the event log perfectly, meaning that all traces can be replayed successfully on the net. ILP-based discovery algorithms mine causal dependencies between activities that are detected in the event log. However, it can produce models that are not very structured and less readable (i.e. spaghetti models). This approach performs well only under the assumption that the process under analysis shows frequent behavior, thus it results to be ineffective in describing low-frequent exceptional behavior. ILP miner is very sensitive to noise, as all traces need to be repayable in the net. 
It is not preferred to use this miner on real-life logs. The ILP miner outputs petri net model, As many calculations are required to solve ILP problems, ILP miners are typically slower than other algorithms. It is difficult to achieve progress on this topic because these problems are NP-hard [30]. ETM miner “Evolutionary Tree Miner” based on a genetic algorithm that allows the user to control the model discovery process based on user preferences to the four quality metrics: fitness, complexity, precision, and generalization. It based on process tree in its implementation, so no unsound model is generated. To achieve its work, ETM as genetic algorithm does several steps. When the event log of the observed behavior is input. A population of random process trees is created, and each tree contains exactly one instance of each activity. Then, for each candidate in the population, the four quality dimensions are determined. The overall fitness of the process tree is calculated using the weights assigned to each dimension. In the following step, certain stop criteria are checked, such as locating a tree with the desired overall fitness. If any of the stop requirements are met, the candidates in the population are modified and the fitness is again calculated. This process is repeated until at least one stop requirement is met. And the fittest candidate is then returned [31]. The technique successfully handles the problems like non-trivial constructs and/or noise present in the log. This genetic algorithm approach used global search techniques to handle these problems rather than depending on local information, Experiments were carried out using synthetic and real-life logs to explain that the fitness metric is complete and precise. The ETM execution time was extremely long (several hours on our target computer), and the computation failed several times due to memory lack (i.e. java heap space). 2.2.3 Evaluating the quality of the discovered models. Wil van in [30] proposed four quality dimensions of the discovered process model: fitness, simplicity, precision and generalization. Fitness means that the process model is capable of displaying all trace in logs from beginning to end. Simplicity means the simplest model that can be the best to describe the behavior of the model. It concerns on the human comprehension since it will be difficult to understand and reveal complex process models. The simplicity expresses the complexity of the process model. It can be measured by the number of arcs and nodes which are used in the process models. Generalization refers to the process model that gives more behaviors which are not existent in the event logs displaying in the model. Lastly, the precision implies to the ability of the model which disallowed undesirable behavior. 2.2.1 Clustering techniques. Two main approaches for clustering and simplifying the data are abstraction and clustering techniques. The abstraction technique is to aggregate the low level activities into one high level activity. The focus will be on most significant aspects of the process or the main activity. The clustering technique is to group the traces that have the same similarity into separate clusters. Every sub-log groups the traces that have very similar characteristics. There are a number of clustering algorithms has been developed for complex network such as fast fuzzy clustering (F2CAN) algorithm that perform accurate and fast clustering [23]. 
There are two main methods of clustering technique that can be used, trace clustering and sequence clustering. Trace clustering clusters careflows based up on trace profiles [24]. A trace profile contains certain characteristics of the trace that is identified as essential, e.g. Activities, a certain sequence of activities, medical service providers, etc. From ProM plugins that has been implemented upon this technique is ActiTrac [25]. The second clustering method is the sequence clustering, which focuses only on the sequence of activities in traces and creates more simple models than trace clustering [26]. Clustering not only allows discovering more comprehensible models, but also allows to identify or confirm subgroups based upon their behavior. 2.2.2 Process discovery. Process mining considers the model discovery is the base step to build the model in such as graphical structure from the real events. Process discovery uses event log to generate a process model. Several techniques of discovery model are used to support the process mining, the earlier process discovery algorithm is α-algorithm [27] used to produce a petri net that represents the event log. The algorithm analyzes the event log, then establishes different dependency relations between tasks. Relations between tasks considered as casual and also describe the sequence of the tasks. But it is sensitive to noise and incompleteness of the event logs. On real-life data, such algorithm does not work well. This section will provide descriptions of the four discovery model algorithms that are applied throughout the paper: Heuristic miner addresses many problems with the α-algorithm; to create a model, event logs are analyzed by using the dependency values of the activities. Two steps to build a model by the heuristics miner are: creating dependency graph and creating causal matrix. Dependency graph illustrates the dependency (causality) of events. To create a dependency graph, the dependency matrix, and the length-one loop dependency are built. It builds the directly-follows matrix by using the frequencies between the activities. Then it gets the dependency matrix by using the formula: (1)> directly follows, a > b means a is directly followed by b The above Eq (1) calculated all cells in the dependency matrix. Now the gotten values between minus 1 and plus 1. a negative value means actually that there’s a negative relation. a high value in the plus side means that there’s a strong relation. Then the heuristic Miner builds a causal matrix to represent the correct process model. There are two types of non-observed activities, which are AND and XOR. The AND type represents parallel activities, while the XOR type represents sequential activities. By filtering certain relation and then using particular patterns, the Petri net is generated. This algorithm is useful when there is real-life data with a limited number of diverse events, one of the advantages of this algorithm is to output a Heuristics net and that a heuristic net that can be changed to other types of process models, such as a petri net is used for further analysis in ProM. It considers frequencies and significance in order to filter out noisy or infrequent behavior, which makes it less sensitive to noise and incomplete logs. Additionally, it can identify short loops. Thirdly, it permits single activities to be skipped. It does not guarantee the soundness of the process model [28]. 
Inductive miner includes two steps: first it creates a stochastic task graph (SAG) from the event log, and then it synthesizes the structures of the event-log instances to generate the process model. It discovers a main split in the event log (there are different types of splits: sequential, exclusive, concurrent, and loop). Once the split has been discovered, the algorithm recurses over the sub-logs (found by applying the split) until a base case is identified. There are different variations of the algorithm; one of them, IMDFc, avoids the recursion on the sub-logs in favor of using the directly-follows graph. The inductive miner guarantees the soundness of the process model and a good fitness; internally it does not work on Petri nets but on process trees, which are then converted to Petri nets [29]. ILP miner, an integer linear programming approach, is based on the language-based theory of regions. It always results in a Petri net that fits the event log perfectly, meaning that all traces can be replayed successfully on the net. ILP-based discovery algorithms mine causal dependencies between activities detected in the event log. However, the miner can produce models that are unstructured and hard to read (i.e., spaghetti models). The approach performs well only under the assumption that the process under analysis shows frequent behavior, so it is ineffective in describing low-frequency, exceptional behavior. The ILP miner is very sensitive to noise, as all traces need to be replayable on the net; it is therefore not preferred for real-life logs. The ILP miner outputs a Petri net model. As many calculations are required to solve ILP problems, ILP miners are typically slower than other algorithms, and it is difficult to make progress on this topic because these problems are NP-hard [30]. ETM miner (Evolutionary Tree Miner) is based on a genetic algorithm that allows the user to steer the model discovery process through preferences on the four quality metrics: fitness, precision, generalization, and simplicity. Its implementation is based on process trees, so no unsound model is generated. As a genetic algorithm, ETM performs several steps. The event log of the observed behavior is the input. A population of random process trees is created, each tree containing exactly one instance of each activity. Then, for each candidate in the population, the four quality dimensions are determined, and the overall fitness of the process tree is calculated using the weights assigned to each dimension. Next, certain stop criteria are checked, such as having found a tree with the desired overall fitness. If none of the stop criteria is met, the candidates in the population are modified and the fitness is calculated again; this process is repeated until at least one stop criterion is met, and the fittest candidate is then returned [31]. The technique successfully handles problems such as non-trivial constructs and/or noise in the log; this genetic approach uses global search rather than depending on local information. Experiments on synthetic and real-life logs showed that the fitness metric is complete and precise. In our runs, however, the ETM execution time was extremely long (several hours on our target computer), and the computation failed several times due to lack of memory (i.e., Java heap space).
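The ETM loop described above can be sketched as a generic genetic search. In the toy Python skeleton below, evaluate() and mutate() are placeholders (a real implementation would replay the event log on each process tree and apply structural edit operators); only the weighted-fitness selection loop and the stop criteria mirror ETM.

```python
import random

random.seed(0)

def evaluate(tree):
    # Placeholder: a real ETM scores each tree against the event log.
    return {m: random.random()
            for m in ("fitness", "precision", "generalization", "simplicity")}

WEIGHTS = {"fitness": 10.0, "precision": 5.0, "generalization": 1.0, "simplicity": 1.0}

def overall(scores):
    # Weighted combination of the four quality dimensions.
    return sum(WEIGHTS[m] * scores[m] for m in WEIGHTS) / sum(WEIGHTS.values())

def mutate(tree):
    return tree + "*"  # placeholder change operator

population = [f"tree{i}" for i in range(20)]  # random initial process trees
best_score, best = -1.0, None
for generation in range(100):                 # stop criterion 1: generation budget
    scored = sorted(((overall(evaluate(t)), t) for t in population), reverse=True)
    if scored[0][0] > best_score:
        best_score, best = scored[0]
    if best_score >= 0.95:                    # stop criterion 2: target fitness reached
        break
    elite = [t for _, t in scored[:10]]       # keep the fittest half
    population = elite + [mutate(t) for t in elite]
print(best, round(best_score, 3))
```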
2.2.3 Evaluating the quality of the discovered models. Van der Aalst [30] proposed four quality dimensions for a discovered process model: fitness, simplicity, precision, and generalization. Fitness means that the process model is capable of replaying every trace in the log from beginning to end. Simplicity means that the simplest model able to describe the observed behavior is the best one; it concerns human comprehension, since complex process models are difficult to understand and interpret. Simplicity expresses the complexity of the process model and can be measured by the number of arcs and nodes used in the model. Generalization refers to the model allowing behaviors that are not present in the event log, so the model is not restricted to the observed examples. Lastly, precision refers to the ability of the model to disallow undesirable behavior. 2.3 Analysis phase The analysis tasks provide insights into the processes of the business. Analysis in process mining can be presented through performance analysis and organizational analysis. 2.3.1 Performance analysis. Performance analysis can be implemented by: conformance checking, which detects deviations of the discovered careflow model from what was planned in the hand-made model; and statistical analysis, which measures metrics such as the time consumed by a case, the time consumed by each activity, the resources used with each activity, which activities consume a lot of time, and the long waiting times before activities start. 2.3.2 Organization analysis. The social network miner uses process logs to discover social networks. Because several social network analysis methods and research results are available, the resulting social network enables analysis of the social relations between the originators involved in process executions. 3. The discussion and results 3.1 The case study The data set was collected from the heart surgery unit in an Egyptian hospital. The hospital received more than 1233 heart treatment cases; only the 1233 cases that started and ended their treatment during the period from January 2021 to May 2021 were included. The business process as planned by the healthcare professionals is shown in Fig 4. The patient’s journey begins with the “Admission to Hospital” activity and ends with “Discharge and Leave”. Between these two activities a number of activities may occur, such as “First Consultation Checkup”, “Second Consultation Checkup”, “Radiology Tests”, “Laboratory Tests”, “Scheduling Patient Appointment”, “Cardiac Stent”, “Diagnostic Catheterization”, “Recuperation Ward”, “Prepare patient for Surgery”, “Blood Bank”, “Perform open heart Surgery”, and “ICU”.
A business process is reflected in the careflow, whose event logs build complete cases; Fig 5 shows a part of the event log containing one case of one patient. All included activities refer to the patient’s activities in the cardiac care department and to related activities that occurred in other departments. Service fees, patient names, and the names of the work staff are hidden for the hospital’s privacy reasons. Fig 4. General business model for the heart surgeries in the hospital. https://doi.org/10.1371/journal.pone.0281836.g004 Fig 5. A part of the event log. https://doi.org/10.1371/journal.pone.0281836.g005 3.2 Preprocessing phase results This phase is the primary stage: it includes extracting the events from the data set, selecting the appropriate attributes for the event log, and then cleaning and preparing the extracted log for import into the ProM framework. 1- Extracting data. The undertaken hospital uses a hospital information system to manage its business and data. The hospital database contains more than 40 tables serving all aspects of the hospital, but this study focuses on the tables related to patient movement and its relations, in order to collect the complete activities of the patient. These tables describe services introduced to the patient, such as the admission, labs, radiology, wards, surgery, and medication departments. Fig 6 shows the tables used to define and extract the event logs of the dataset. Each table describes an activity in general or the data needed to describe the event; every activity table contains data about time, resources, the originator, and other information useful for the performance or organizational analysis. The tables have relationships among them, and queries over these tables were used to extract the final dataset from the hospital information system. Fig 6. Database model for the case study. https://doi.org/10.1371/journal.pone.0281836.g006 2- Cleaning and preparing the event log. Data cleaning: before any cleaning, the “patient’s name” attribute was converted into ID codes to maintain confidentiality and privacy, and any duplicate records were deleted. Removing irrelevant data, such as noise and outliers, from the event logs is important for better results. Boxplots of event duration times are used to detect outliers and to decide whether to remove an outlier or replace its value; Fig 7(a) and 7(b) show these boxplots. The activities “Admission to Hospital”, “Cardiac Stent”, and “Discharge and Leave” usually take a fixed time, so outliers of these activities are reset to the known fixed time, while outliers of other activities are removed. Handling missing values: there are 44 empty or noisy values in the “StartDate” and “FinishDate” columns of the event log. These cells are filled with the mean values of the same event in other cases, as approximations of the true values. These two steps (removing outliers and noise, and handling missing values) were implemented using Python pandas, as sketched below.
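The following is a hedged pandas sketch of these two cleaning steps; the column names (“Activity”, “StartDate”, “FinishDate”) and the file name are illustrative, not the hospital’s actual schema, and the boxplot rule used is the standard 1.5×IQR criterion.

```python
import pandas as pd

# Hypothetical event-log frame, one row per event.
log = pd.read_csv("event_log.csv", parse_dates=["StartDate", "FinishDate"])
log["duration_h"] = (log["FinishDate"] - log["StartDate"]).dt.total_seconds() / 3600

def iqr_outliers(s: pd.Series) -> pd.Series:
    # Standard boxplot rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

# Drop duration outliers per activity (fixed-time activities would instead
# be reset to their known duration).
outlier_mask = log.groupby("Activity")["duration_h"].transform(iqr_outliers)
log = log[~outlier_mask]

# Fill missing dates with the per-activity mean timestamp across other cases
# (mean on datetime columns is supported in recent pandas).
for col in ("StartDate", "FinishDate"):
    log[col] = log[col].fillna(log.groupby("Activity")[col].transform("mean"))
```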
Incomplete process instances: to clean the data, incomplete process traces should be removed from the log. The ProM platform has many plugins to deal with noise in traces or event attributes, such as “Filter log on trace attribute values”, “Filter log on event attribute values”, and “Filter log using simple heuristics”. These plugins can eliminate all traces that do not begin and/or end with a particular event; by calculating the frequency of event recurrence, they can also eliminate all events related to a specific process task. Table 2 shows statistics of the event log data before and after applying the preprocessing phase. Fig 7. Boxplots for events durations. https://doi.org/10.1371/journal.pone.0281836.g007 Table 2. Event data before and after applying preprocessing phase. https://doi.org/10.1371/journal.pone.0281836.t002 3.3 Model discovery phase results Two main techniques are used to resolve difficulties in the event log when mining a structured model, before applying the discovery algorithms: this paper used abstraction and clustering to simplify the discovered models. 1- Clustering techniques. The initial event log contains a huge amount of data, resulting from a vast number of activity sequences at a low level of abstraction, which generates a spaghetti model, as shown in Fig 3. This model is unstructured and difficult to understand; in order to apply the discovery and analysis processes, the model must be simple and structured. The abstraction technique aggregates the low-level activities into one high-level activity, so the focus falls on the most significant aspects of the process or the main activity; this case study works at the department level rather than at the level of the detailed steps inside each department. For example, the Radiology department contains many low-level activities, such as “register, scheduling, ultrasound, scan abdomen, chest X-ray, CT scan brain, report, send report, and discharge”; all these activities are discarded and merged into one main activity titled “Radiology”. The same is applied to the low-level activities in the Lab department, which are merged into one main abstract “Lab” activity. In the clustering technique, similar traces are grouped into separate clusters; every sub-log groups traces with very similar characteristics, and the patients of the traces are grouped into the same cluster based on the characteristics of their care journeys. This paper applied trace clustering using the “ActiTraC” plugin and the Markov cluster algorithm (MCL). The clusters generated by the “ActiTraC” plugin are shown in Fig 8; the four generated clusters did not reflect distinctions of interest to the hospital’s experts, so this result was not accepted. Trace clustering was then attempted with the Markov cluster algorithm (MCL) [32], which finds similar traces based on selected perspectives.
Table 3 shows the properties of the two clusters generated using the ProM plugin “cluster cases using Markov clustering” on the “refers from” attribute, and Fig 9 shows the process models of the two generated clusters. Fig 8. Four clusters generated from the “ActiTraC” method. https://doi.org/10.1371/journal.pone.0281836.g008 Fig 9. BPMN models of the two clusters generated using the Markov clustering algorithm upon the “refers from” attribute. (a) Cluster referred from “Doctor” (b) Cluster referred from “Emergency”. https://doi.org/10.1371/journal.pone.0281836.g009 Table 3. The properties of the two clusters generated using the Markov clustering algorithm. https://doi.org/10.1371/journal.pone.0281836.t003 After consultation, the experts of the undertaken hospital saw no point in relying on the previous trace clustering results, so they agreed to cluster according to the aspect they were interested in: the main activity that the cardiac consultant decided for each patient. The three clustering categories are: 1) cardiac stent or diagnostic catheterization surgeries, 2) open heart surgery, 3) the remaining traces, which do not include any surgery and are added to a third cluster named “medication”. The clustering process generated three clusters based on particular sequences of events, and the generated cluster logs were converted to process models, as shown in Fig 10. Fig 10. BPMN models generated after clustering patients upon the interest of the hospital’s experts. (a) Cardiac Stent and Diagnostic Catheterization (b) Open heart surgery (c) No surgery, medication only. https://doi.org/10.1371/journal.pone.0281836.g010 3.3.1 Process discovery results. The domain experts in the hospital prepared a hand-made process model for planning the patient’s processes across the healthcare journey, from admission to the hospital to the end of the journey. The domain experts consider their hand-made model an idealized plan of the processes, but it is often far from the actual processes performed. Process mining considers model discovery the base step for building a graphical model from the real events. For illustration purposes, the names of the activities are represented by letters, as shown in Table 4. Table 5 shows the most frequent, complete traces in the event log; they constitute 87% of the original event log, without any anomalous trace. Process discovery uses an event log to generate a process model; based on that model, the domain experts in the hospital can apply other process mining techniques, such as conformance checking and performance analysis, to gain deeper insights into the hospital. This paper applied four popular process discovery algorithms to the hospital data set and compared the generated models to select the model with the highest scores in the quality metrics (fitness, precision, generalization, simplicity).
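Although the discovery runs in this paper were performed in ProM, an equivalent experiment can be scripted with the open-source pm4py library. The sketch below, assuming an exported XES log with a hypothetical file name, applies two of the four miners discussed here.

```python
import pm4py

# Hypothetical exported log; the hospital's actual file is not public.
log = pm4py.read_xes("cardiac_event_log.xes")

# Inductive miner: guarantees a sound model (process tree -> Petri net).
net_i, im_i, fm_i = pm4py.discover_petri_net_inductive(log)

# Heuristics miner: frequency-based, robust to noise, soundness not guaranteed.
net_h, im_h, fm_h = pm4py.discover_petri_net_heuristics(log)

pm4py.view_petri_net(net_i, im_i, fm_i)  # requires Graphviz installed
```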
Figs 11–14 show the Petri nets output by the four discovery algorithms; these Petri nets are the input to the next step, “Evaluation of the quality of the discovered models”. The four mining algorithms used are the heuristic miner [28], inductive miner [29], ILP miner [30, 33], and ETM miner [31, 34]. These algorithms are supported in ProM, operate on complete traces, and output a Petri net model. Fig 11. Result model from the heuristic miner. https://doi.org/10.1371/journal.pone.0281836.g011 Fig 12. Result model from the inductive miner. https://doi.org/10.1371/journal.pone.0281836.g012 Fig 13. Petri net from the ILP miner. https://doi.org/10.1371/journal.pone.0281836.g013 Fig 14. Result from the ETM miner algorithm. https://doi.org/10.1371/journal.pone.0281836.g014 Table 4. Index of the activities. https://doi.org/10.1371/journal.pone.0281836.t004 Table 5. Most frequent traces. https://doi.org/10.1371/journal.pone.0281836.t005 3- Evaluation of the quality of the discovered models. This section describes how the four quality metrics (fitness, precision, simplicity, and generalization) were measured to evaluate the discovered process models. The fitness metric is measured using the alignment-based conformance checking method. In general, the early alignment methods [35] depended on replaying each trace against the process model one event at a time; replay errors occur when an event in the log or a task in the process model has to be ignored or skipped. This drawback is fixed by the later alignment techniques of [35, 36], where the closest corresponding trace that can be parsed by the model is identified for each trace. This paper used the ProM plugin “Replay a log on petri net for conformance analysis”, which is derived from the alignment-based conformance checking of [35, 36]. This plugin takes the Petri net generated by each of the four discovery miners and the event log after filtering; its outputs are a “Result Replay” and a log containing statistical information such as trace fitness. Fig 15 shows the result of replaying the Petri net from the inductive miner, as one of the four applied discovery miners, against the extracted log after filtering. Fig 16 shows the statistical information output by replaying the Petri net (from the inductive miner) with the extracted log. This step is applied to the Petri nets of the other three discovery miners (heuristic, ILP, and ETM) as well. Fig 15. Result replay of the Petri net from the inductive miner with the extracted log. https://doi.org/10.1371/journal.pone.0281836.g015 Fig 16. The statistical information obtained from replaying the Petri net from the inductive miner with the extracted log.
https://doi.org/10.1371/journal.pone.0281836.g016 The precision and generalization metrics are measured using the plugin “Measure the Precision/Generalization”, which takes three inputs: the Petri net, the event log, and the result replay of Fig 15. The plugin measures precision using the method introduced in [37]. This method measures precision by aligning the log to the process model; if all activities enabled by the model are actually observed, then precision is 1. The method is based on Eq (2):

(2)  precision = (1 / |ε|) · Σ_{e ∈ ε} en_L(e) / en_M(e)

where ε is the collection of unique events, e ∈ ε is an event, en_M(e) is the number of activities enabled in the model at e, and en_L(e) is the number of those enabled activities actually observed to execute in the log. The value of precision lies between 0 and 1: a value near 0 means the model is underfitting, while a value near 1 means the model is more precise. The plugin measures generalization with a similar approach, based on Eq (3):

(3)  generalization = 1 − (1 / |ε|) · Σ_{e ∈ ε} p_new(w, n)

where, for each event e ∈ ε, p_new(w; n) is the estimated probability that the next visit to state state_M(e) will reveal a new path not seen before, w = |diff(e)| is the number of unique activities observed leaving state s, and n = |sim(e)| is the number of times s was visited in the event log. Fig 17 shows a screenshot of a result from the plugin “Measure the Precision/Generalization” for the Petri net from the inductive miner. Fig 17. Precision and generalization results of the Petri net based on the inductive miner. https://doi.org/10.1371/journal.pone.0281836.g017 The simplicity metric quantifies the complexity of the model in terms of the number of arcs and nodes it contains. Simplicity concerns the model itself, not its relation to the event log. The works [38, 39] introduce a number of metrics to measure the simplicity or complexity of a model, such as size, diameter, density, connectivity, node degree, separability, structuredness, sequentiality, depth, gateway mismatch, gateway heterogeneity, control-flow complexity (Cardoso), cyclicity, token splits, and soundness. There is no single perfect metric for quantifying simplicity; every metric has pros and cons. This paper uses a new method, summarized in two steps, to quantify the simplicity metric, decide whether a discovered model is simple, and determine which model is simpler than the others: first, checking the soundness of every model produced by the miner algorithms; second, measuring the control-flow complexity (Cardoso), cyclomatic, and structuredness metrics. Checking the soundness: a process model is sound when it achieves three properties: 1) option to complete: the end state of the process model is reachable from any state; 2) proper completion: no tokens are left behind when the final state of the process model is reached; 3) no dead transitions: each transition in the model can be enabled. The paper used the ProM plugin “Analyze with Woflan” to check the soundness of each model produced by the discovery algorithms. The plugin’s output states that only the models based on the inductive miner and the ETM miner are sound, so the models based on the ILP and heuristic miner algorithms are discarded from the next step.
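Outside ProM, a comparable soundness check is available in pm4py, which ships a Woflan implementation. The following sketch assumes the pm4py module path and parameter names as distributed at the time of writing, plus a hypothetical log file; treat both as assumptions.

```python
import pm4py
from pm4py.algo.analysis.woflan import algorithm as woflan

log = pm4py.read_xes("cardiac_event_log.xes")  # hypothetical file name
net, im, fm = pm4py.discover_petri_net_inductive(log)

# Woflan verifies option-to-complete, proper completion and the absence of
# dead transitions; it returns True only for a sound workflow net.
is_sound = woflan.apply(net, im, fm, parameters={
    woflan.Parameters.RETURN_ASAP_WHEN_NOT_SOUND: True,
    woflan.Parameters.PRINT_DIAGNOSTICS: False,
})
print("sound:", is_sound)
```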
Measuring the metrics of density, control-flow complexity (Cardoso), cyclomatic, and structuredness: density is the ratio between the number of arcs and the maximum possible number of arcs between all nodes. Control-flow complexity (Cardoso) sums all the choices of a process, based on the number of splits of each type and their numbers of outgoing arcs. The cyclomatic metric reflects the cyclic structure of the graph and is calculated by the rule CM = |E| − |V| + P, where |E| is the number of edges, |V| is the number of nodes, and P is the number of connected components. Structuredness is the proportion of well-structured parts with respect to the rest (the non-structured parts) of the process model; the structuredness ratio of the process graph is measured as one minus the number of nodes in the reduced process graph divided by the number of nodes in the original process graph. The paper used the ProM plugin “Show Petri net metrics” to obtain the values of the Cardoso, cyclomatic, and structuredness metrics. Table 6 shows the values of these metrics for the models based on the inductive and ETM discovery miners. Table 6. Quantifying the complexity of the models from the inductive and ETM discovery miners. https://doi.org/10.1371/journal.pone.0281836.t006 The results in Table 6 show that both the inductive miner and ETM miner models are simple; the values of the Cardoso, cyclomatic, and structuredness metrics indicate that the inductive miner model is simpler than the ETM miner model. Table 7 shows the results of the four miner algorithms on the four quality evaluation metrics (fitness, precision, generalization, complexity). The inductive miner algorithm scores the highest values in fitness and generalization, followed by the ETM miner algorithm, while it scores lower than ETM in precision. Based on this comparison, the best choice is to rely on the inductive miner algorithm for process discovery in this case study. It was also noticeable during execution that the ETM miner consumed more time than the other miner algorithms, which was also demonstrated in [40]; therefore, the ETM miner was excluded from further use in this paper. Table 7. Comparison among the four applied algorithms. https://doi.org/10.1371/journal.pone.0281836.t007 3.3.2 The results of the analysis phase. This paper elaborates the analysis tasks to provide insights into the processes of the hospital. Performance analysis is conducted by statistical analysis or by ProM techniques such as the dotted chart, conformance checking, and the inductive visual miner: the dotted chart for case handling, and conformance checking for deviations and bottlenecks. Organizational analysis is conducted by the social network mining method, to provide insights into the collaboration between originators or departments in the hospital. 1- The results of the performance analysis. The performance analysis stage has special importance in applying process mining in the hospital; it is carried out once ‘model discovery’ and ‘evaluating the quality metrics of the generated models’ are finished, as it uses their outputs.
Performance analysis is important for knowing the time a patient consumes from entering the hospital, the time consumed by each activity, the resources used with each activity, which activities consume a lot of the patient’s time, and the long waiting times of the patient before particular operations. It is also necessary to detect deviations of the discovered careflow model from what was planned in the hand-made model, and to discover bottlenecks in the execution of certain operations. Performance analysis can be implemented either with the many techniques the ProM platform provides or with statistical analysis. Statistical analysis: statistical analysis is merged with process mining to analyze the length of the patient journey in the hospital and any correlation between the patient’s diagnosis and the length of the journey or of the hospitalization time. Table 8 shows the length of the patient journey (in days) in the hospital according to the referral type (referred from a doctor or from Emergency); it shows that patients from Emergency have a shorter journey than patients referred from a doctor: the average stay of patients referred from a doctor was ≈13 days, while the stay of patients from Emergency was ≈7 days. Table 8. The length of the patient journey (in days) in the hospital according to the patient referral type. https://doi.org/10.1371/journal.pone.0281836.t008 Table 9 shows the length of the patient journey (in days) according to the patient diagnosis (open heart; cardiac stent and diagnostic catheterization; medication; and “discharged without surgery”); it shows that “open heart” patients have the longest journey of all diagnoses: the average stay of an “open heart” patient was ≈23 days, while patients with other diagnoses stayed ≈7 to 8 days. This is because the activities included in the “open heart surgery” variants take a long time to complete, while the activities of the other diagnoses take shorter times. Table 9. Length of the patient journey (in days) in the hospital according to the patient diagnosis. https://doi.org/10.1371/journal.pone.0281836.t009 The ProM platform provides many performance analysis techniques, such as the conformance checking, dotted chart, and inductive visual miner plugins. Conformance checking: Fig 18 presents the conformance checking analysis obtained by replaying the event log extracted from the information system (after cleaning and preprocessing) on the Petri net of the base “standard” model; the base model was hand-made by the hospital’s domain experts (the Petri net of the base model was converted from the BPMN model of Fig 4). The global statistics of the replay results in Fig 19 show 95% compliance between the base (hand-made) model and the event logs from the information system, meaning that the extracted event log deviates from the model by about 5%. For example, in Fig 19(b), the activity “Recuperation Ward” has a red/green bar indicating that it was not observed in the traces 123 times and was executed correctly 838 times. According to the global statistics in Fig 19(a), some services in the hospital need to be improved.
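The same replay-based check can be scripted with pm4py as a rough cross-check of the ProM result; the file names below are hypothetical, and the returned fitness dictionary is printed as-is rather than assuming specific keys.

```python
import pm4py

# Hypothetical inputs: the cleaned log and the hand-made base model
# exported from the modeling tool as PNML.
log = pm4py.read_xes("cardiac_event_log.xes")
net, im, fm = pm4py.read_pnml("base_model.pnml")

# Alignment-based replay of the log on the base model; the returned
# dictionary summarizes how well the observed behavior fits the model.
fitness = pm4py.fitness_alignments(log, net, im, fm)
print(fitness)  # a log fitness close to 0.95 would match the Fig 19 result
```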
Fig 18. Replaying the event log and the Petri net of the base model “Standard model” for conformance analysis and bottleneck analysis. https://doi.org/10.1371/journal.pone.0281836.g018 Fig 19. The statistical information obtained from replaying the event log with the Petri net of the base model “Standard model” for the conformance checking process. https://doi.org/10.1371/journal.pone.0281836.g019 Inductive visual miner: for further analysis and inspection, the paper detects deviations in the model using the inductive visual miner [41], as shown in Figs 20–22. The model produced by the inductive visual miner starts with a green circle and proceeds along the arrows to the final red circle; each box represents an activity, and the numbers shown under the activity name are the frequencies with which it was executed according to the model. In Fig 20, the red dashed edges indicate paths that include deviations or bottlenecks during execution; several activities, such as “1st Consultant Checkup”, “Laboratory Tests”, “Admission Request”, “Prepare patient for Surgery”, “Blood Bank”, and “Recuperation Ward”, were skipped a number of times although the model requires them to be executed. Fig 21 outlines the patients involved in hospital services during the observed period: a total of 961 patients started with the activities “Admission to Hospital” and “1st Consultant Checkup”. 645 patients passed through “Laboratory Tests” and 642 through “Radiology Tests”, while 266 patients passed through the consultant checkup service without visiting the lab or radiology department; 903 patients passed directly through “Diagnostic Catheterization” and only 305 through the “Cardiac Stent” surgical service. A group of 159 patients underwent the open heart surgery service and then passed to the “Recuperation Ward” service, while a group of 914 patients used the “Recuperation Ward” because they had undergone the “Open Heart” and “Cardiac Stent” surgeries. Finally, all 961 patients passed through the “Discharge and Leave” service to end their stay in the hospital. Fig 22 shows the bottlenecks in the services where the allocation of resources is unbalanced: some resources were overallocated while others were free. In Fig 22 it is possible to observe resources with long queues while other resources were idle at the same instant; this is clear for the Lab resource and the 1st consultant resource. Fig 20. The process model explaining the deviations. https://doi.org/10.1371/journal.pone.0281836.g020 Fig 21. Patients’ distribution among hospital services. https://doi.org/10.1371/journal.pone.0281836.g021 Fig 22. The process model explaining the sojourn time of the resources. https://doi.org/10.1371/journal.pone.0281836.g022 Dotted chart: the paper also used the “Project chart on Dotted Chart” plugin, a dotted chart visualization of the event log; it can handle large event logs and shows overall events and performance information of the log.
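Since a dotted chart is essentially a scatter plot of events (time on the horizontal axis, case on the vertical axis, color by activity), it can be approximated outside ProM in a few lines; the column names in this sketch are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical cleaned log with one row per event.
log = pd.read_csv("event_log.csv", parse_dates=["StartDate"])

# Encode cases and activities as integers for plotting.
cases = {c: i for i, c in enumerate(log["CaseID"].unique())}
acts = {a: i for i, a in enumerate(log["Activity"].unique())}

plt.scatter(log["StartDate"], log["CaseID"].map(cases),
            c=log["Activity"].map(acts), cmap="tab20", s=4)
plt.xlabel("Time")
plt.ylabel("Case")
plt.title("Dotted chart of the event log")
plt.show()
```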
Fig 23 displays the events as dots for the whole event log: the time, ranging from January 2021 to May 2021, is measured along the horizontal axis; the vertical axis represents case IDs; and the events are colored according to their activity name. The dotted chart shows that most patient cases start their activities in the undertaken hospital with the activity “1st Consultant Checkup”, followed by the activities “Laboratory Tests” and “Radiology Tests” together. Fig 24 shows the dotted chart of cluster 1, “cardiac stent and diagnostic catheterization”: the throughput time of most instances is short, less than one day, and most instances contain only a few activities, as few as three activities excluding “Admission to Hospital” and “Discharge and Leave”. Fig 25 shows the dotted chart of cluster 2, “open heart surgery”: the throughput time of most instances is long, reaching more than 30 days. Across the three cluster dotted charts, correlated orderings of specific activities are visible, such as among “Blood Bank”, “Perform open heart Surgery”, and “ICU”, and among “Consultant Checkup”, “Laboratory Tests”, and “Radiology Tests”. Fig 26 shows the dotted chart of cluster 3, the patients who left the hospital without any surgery and only received medication care; a noticeable time gap appears between the “Scheduling Patient Appointment” event and the next “Admission Request” event. Fig 23. The dotted chart of the whole event log. https://doi.org/10.1371/journal.pone.0281836.g023 Fig 24. The dotted chart of cluster 1, “cardiac stent and diagnostic catheterization”. https://doi.org/10.1371/journal.pone.0281836.g024 Fig 25. The dotted chart of cluster 2, “open heart surgery”. https://doi.org/10.1371/journal.pone.0281836.g025 Fig 26. The dotted chart of cluster 3, “patients who left the hospital without any surgery and only received medication care”. https://doi.org/10.1371/journal.pone.0281836.g026 2- The results of the organization analysis. The paper used the social network miner plugged into the ProM platform to gain insights into the collaboration between departments and persons in the hospital. The social network miner discovers social networks from process logs; since several social network analysis techniques and research results are available, the generated social network allows analysis of the social relations between the originators involved in process executions. Fig 27 shows the social network generated from the event log using the “Handover of Work” metric, which measures the frequency of transfers of work among departments. In the undertaken hospital, many departments interact and hand over work to each other, with many-to-many relationships among resources. Fig 27 shows that many originators, such as the laboratory, radiology, accountant, cardiology consultant, and general practitioner, are highly involved in the process and interact with many departments.
Other originators, such as the ICU nurse, the care nurse, and the ward nurse, are often involved but are not directly connected to all other originators. Fewer interactions were also noted between general practitioner 2, accountant 2, and receptionist 2 and the other originators; this can be explained by the fact that these originators work the evening shift, when the density of patient requests is low. Fig 27. Social network of the hospital originators with the handover of work metric. https://doi.org/10.1371/journal.pone.0281836.g027
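To make the handover-of-work metric behind Fig 27 concrete, the sketch below counts, on a toy log, how often an event performed by one resource is directly followed by an event performed by another resource within the same case; the resource names are illustrative.

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Toy traces of (activity, resource) pairs.
cases = [
    [("Admission", "receptionist"), ("Checkup", "consultant"), ("Lab", "laboratory")],
    [("Admission", "receptionist"), ("Checkup", "consultant"), ("Radiology", "radiology")],
]

# Handover of work: resource r1 hands over to r2 whenever an event by r1
# is directly followed by an event by r2 in the same case.
handover = Counter()
for case in cases:
    for (_, r1), (_, r2) in pairwise(case):
        if r1 != r2:
            handover[(r1, r2)] += 1

for (src, dst), n in handover.most_common():
    print(f"{src} -> {dst}: {n}")
```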
Fig 22, shows the bottlenecks on the services where there are unbalanced allocation of resources; some resources were overallocated while others were free. In the Fig 22; it is possible to observe resources with high queue length while other resources were idle in the same instant, it’s clear on the Lab resource and 1st consultant resource. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 20. The process model explains the deviations. https://doi.org/10.1371/journal.pone.0281836.g020 Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 21. Patients’ distribution among hospital services. https://doi.org/10.1371/journal.pone.0281836.g021 Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 22. The process model explains sojourn time of the resources. https://doi.org/10.1371/journal.pone.0281836.g022 Dotted chart; Also the paper used the “Project chart on Dotted Chart” plugin that’s a dotted chart visualization of the event logs, it’s capable of handling large event logs to show overall events and performance information of the event log. Fig 23 displays the events as dots for the whole event log, the time ranging from January 2021 to May 2021 is measured along the horizontal axis of the chart, the vertical axis represents case IDs and the events are colored according to their activity name. The dotted chart shows that most of the patient’s cases start their activities inside the undertaken hospital with activity “1st Consultant Checkup”, then activities “Laboratory Tests” and “Radiology Tests” together. In Fig 24; the dotted chart for cluster 1 of “cardiac stent and diagnostic catheterization”. It’s noted that the throughput time for the most instances is short, it reaches less than one day. Also, it’s noted the most instances has a few number of activities, it reaches only three activities without the activities of the “Admission to hospital” and “Leave and Discharge”. In Fig 25; the dotted chart for cluster 2 of “open heart surgery”. It’s noted that the throughput time for the most instances is long, it reaches to more than 30 days. In general of the three dotted charts; it is shown that there are many correlation orders of a specific activity such as “Blood Bank”, “Perform open heart Surgery” and “ICU”, also there is correlation between “Consultant Checkup”, “Laboratory Tests” and “Radiology Tests”. Based on Fig 26 that shows the dotted chart for cluster 3, where the patients leaved the hospital without any surgery but only took medication care. It’s noted a space-time between dot of “scheduling patient appointment” event and dot of next admission “Admission request”. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 23. The dotted chart of whole event log. https://doi.org/10.1371/journal.pone.0281836.g023 Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 24. The dotted chart of cluster 1 of “cardiac stent and diagnostic catheterization”. https://doi.org/10.1371/journal.pone.0281836.g024 Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 25. The dotted chart of cluster 2 of “open heart surgery”. https://doi.org/10.1371/journal.pone.0281836.g025 Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 26. The dotted chart of cluster 3 of “the patients leaved the hospital without any surgery but only took medication care”. https://doi.org/10.1371/journal.pone.0281836.g026 2- The results of the organization analysis. 
3.3.1 Process discovery results. The domain experts in the hospital prepared a hand-made process model to plan the patient's processes over the healthcare journey, from admission to the hospital to the end of the journey. The domain experts regard their hand-made model as an idealized plan of the processes, but it is often far from the processes actually performed. In process mining, model discovery is the base step that builds the model, as a graphical structure, from the real events. For purposes of illustration, the names of the activities are represented by letters, as shown in Table 4. Table 5 shows the most frequent complete traces in the event log, which make up 87% of the original event log and contain no anomalous traces. Process discovery uses the event log to generate a process model; based on that model, the domain experts in the hospital can apply other process mining techniques, such as conformance checking and performance analysis, to gain deeper insights into the hospital. This paper applied four popular process discovery algorithms to the hospital's data set and compared the generated models to select the one scoring highest on the quality metrics (fitness, precision, generalization, simplicity). The Petri nets output by the discovery algorithms are shown in Figs 11–14; they are the input to the next step, "Evaluation of the quality of the discovered models". The four mining algorithms are the heuristic miner [28], the inductive miner [29], the ILP miner [30, 33], and the ETM miner [31, 34]; all four are supported in ProM, guarantee complete traces, and output a Petri net model.
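For readers who wish to reproduce the discovery step outside ProM, the following minimal Python sketch shows the same idea using the pm4py library; pm4py is not used in this paper, and the file name "hospital_log.xes" is a hypothetical placeholder for the filtered hospital log.

import pm4py

# Load the filtered hospital event log (XES format); the path is illustrative.
log = pm4py.read_xes("hospital_log.xes")

# Discover Petri nets with two of the four miners discussed above; each call
# returns the net together with its initial and final markings.
net_ind, im_ind, fm_ind = pm4py.discover_petri_net_inductive(log)
net_heu, im_heu, fm_heu = pm4py.discover_petri_net_heuristics(log)

# Visual inspection of a discovered net, analogous to Figs 11-14.
pm4py.view_petri_net(net_ind, im_ind, fm_ind)

The ILP and ETM miners applied in the paper are ProM plugins; recent pm4py versions also expose an ILP-based miner, but ETM has no direct counterpart there.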
Fig 11. Result model from the heuristic miner. https://doi.org/10.1371/journal.pone.0281836.g011
Fig 12. Result model from the inductive miner. https://doi.org/10.1371/journal.pone.0281836.g012
Fig 13. Petri net from the ILP miner. https://doi.org/10.1371/journal.pone.0281836.g013
Fig 14. Result from the ETM miner algorithm. https://doi.org/10.1371/journal.pone.0281836.g014
Table 4. Index of the activities. https://doi.org/10.1371/journal.pone.0281836.t004
Table 5. Most frequent traces. https://doi.org/10.1371/journal.pone.0281836.t005
3- Evaluation of the quality of the discovered models. Throughout this section, the paper introduces how the four quality metrics (fitness, precision, simplicity, and generalization) were measured to evaluate the discovered process models.
The fitness metric is measured using the alignment-based conformance checking method. The early alignment methods [35] replayed each trace against the process model one event at a time; replay errors arise when an event in the log or a task in the process model has to be ignored or skipped. This drawback is fixed by the later alignment techniques of [35, 36], where the closest corresponding trace that can be parsed by the model is identified for each trace. This paper used the ProM plugin "Replay a log on petri net for conformance analysis", which is derived from the alignment-based conformance checking of [35, 36]. The plugin takes as input the Petri net generated by each of the four discovery miners together with the filtered event log; its outputs are a "Result Replay" and a log containing statistical information such as the trace fitness. Fig 15 shows the result of replaying the Petri net from the inductive miner (one of the four applied discovery miners) against the filtered event log, and Fig 16 shows the resulting statistical information. The same step is applied to the Petri nets of the other three discovery miners (heuristic, ILP, and ETM).
Fig 15. Result Replay of the Petri net based on the inductive miner with the extracted log. https://doi.org/10.1371/journal.pone.0281836.g015
Fig 16. The statistical information obtained from replaying the Petri net based on the inductive miner with the extracted log. https://doi.org/10.1371/journal.pone.0281836.g016
The precision and generalization metrics are measured with the "Measure the Precision/Generalization" plugin, which takes three inputs: the Petri net, the event log, and the Result Replay of Fig 15. The plugin measures precision using the method introduced in [37]. This method measures precision by aligning the log to the process model: if all activities enabled by the model are actually observed, precision is 1. The method is based on Eq 2:
precision = (1 / |ε|) Σ_{e∈ε} |en_L(e)| / |en_M(e)| (2)
where each event e ∈ ε, ε is the collection of unique events, en_M(e) is the number of activities enabled in the model at event e, and en_L(e) is the number of those enabled activities actually observed in the log. The precision value lies between 0 and 1: a value near 0 indicates an underfitting model, while a value near 1 indicates a more precise model. The same plugin measures generalization with a similar approach, based on Eq 3:
generalization = 1 − (1 / |ε|) Σ_{e∈ε} p_new(w, n) (3)
where each event e ∈ ε, p_new(w; n) is the estimated probability that the next visit to state_M(e) will reveal a new path not seen before, w = |diff(e)| is the number of unique activities observed leaving state s, and n = |sim(e)| is the number of times s was visited in the event log. Fig 17 shows a screenshot of a result from the "Measure the Precision/Generalization" plugin for the Petri net based on the inductive miner.
Fig 17. Precision and generalization results of the Petri net based on the inductive miner. https://doi.org/10.1371/journal.pone.0281836.g017
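To make Eqs 2 and 3 concrete, the following sketch (not part of the original study) computes both measures from per-event data. The toy values are illustrative only, and the w(w+1)/(n(n−1)) estimator for p_new is an assumption based on the common formulation in the process mining literature, not a detail confirmed by the paper.

# Sketch of Eqs 2 and 3, assuming each unique event e carries:
#   en_M: activities the model enables at e's state,
#   en_L: enabled activities actually observed in the log,
#   w:    unique activities observed leaving e's state (|diff(e)|),
#   n:    number of visits to that state (|sim(e)|).
# The toy events below are illustrative placeholders.
events = [
    {"en_M": {"B", "C", "D"}, "en_L": {"B", "C"}, "w": 2, "n": 10},
    {"en_M": {"E"},           "en_L": {"E"},      "w": 1, "n": 6},
]

def precision(events):
    # Eq 2: average share of model-enabled activities actually observed.
    return sum(len(e["en_L"]) / len(e["en_M"]) for e in events) / len(events)

def p_new(w, n):
    # Estimated probability that the next visit to the state reveals a new
    # path; the w(w+1)/(n(n-1)) form is an assumed estimator from the
    # process mining literature.
    return 1.0 if n < w + 2 else (w * (w + 1)) / (n * (n - 1))

def generalization(events):
    # Eq 3: one minus the average chance of still seeing unseen behavior.
    return 1.0 - sum(p_new(e["w"], e["n"]) for e in events) / len(events)

print("precision =", round(precision(events), 3))
print("generalization =", round(generalization(events), 3))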
The simplicity metric quantifies the complexity of the model in terms of the number of arcs and nodes it contains. Simplicity concerns the model itself and is not related to the event log. The works in [38, 39] introduce a number of metrics for measuring the simplicity or complexity of a model, such as size, diameter, density, connectivity, node degree, separability, structuredness, sequentiality, depth, gateway mismatch, gateway heterogeneity, control-flow complexity (Cardoso), cyclicity, token splits, and soundness. There is no single perfect metric for quantifying simplicity; every metric has its pros and cons. This paper therefore uses a new method to quantify the simplicity metric, decide whether a discovered model is simple, and determine which model is simpler than the others. The method consists of two steps: first, checking the soundness of every model produced by the miner algorithms; second, measuring the control-flow complexity (Cardoso), cyclomatic, and structuredness metrics.
Checking the soundness: a process model is sound when it achieves three properties: 1) option to complete: the end state of the process model is reachable from any state; 2) proper completion: no tokens are left behind when the final state is reached; 3) no dead transitions: every transition in the model can be enabled. The paper used the ProM plugin "Analyze with Woflan" to check the soundness of each model produced by the discovery algorithms. The plugin's output shows that only the models based on the inductive miner and the ETM miner are sound, so the models based on the ILP and heuristic miners are discarded from the next step.
Measuring the metrics of density, control-flow complexity (Cardoso), cyclomatic, and structuredness: density is the ratio of the number of arcs to the maximum possible number of arcs between all nodes. Control-flow complexity (Cardoso) sums all the choices of a process, based on the number of splits of each type and their numbers of outgoing arcs. The cyclomatic metric relates the nodes within cycles to the total number of nodes and is calculated as CM = |E| − |V| + P, where |E| is the number of edges, |V| is the number of nodes, and P is the number of connected components. Structuredness is the proportion of well-structured parts with respect to the rest (the non-structured parts) of the process model; the structuredness ratio of a process graph is one minus the number of nodes in the reduced process graph divided by the number of nodes in the original process graph.
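As a small illustration of the last two definitions, the following sketch computes the cyclomatic number and the structuredness ratio from graph counts; the example numbers are placeholders, not values from the discovered models.

def cyclomatic_number(num_edges: int, num_nodes: int, num_components: int) -> int:
    # CM = |E| - |V| + P for a process graph.
    return num_edges - num_nodes + num_components

def structuredness(num_nodes_original: int, num_nodes_reduced: int) -> float:
    # 1 - (nodes remaining after reduction / nodes in the original graph);
    # a fully reducible, well-structured model approaches 1.
    return 1.0 - num_nodes_reduced / num_nodes_original

print(cyclomatic_number(num_edges=40, num_nodes=35, num_components=1))  # -> 6
print(round(structuredness(num_nodes_original=35, num_nodes_reduced=5), 3))  # -> 0.857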
The paper used the ProM plugin "Show Petri net metrics" to obtain the values of the above metrics (Cardoso, cyclomatic, and structuredness). Table 6 shows the resulting values for the models based on the inductive and ETM discovery miners.
Table 6. Quantifying the complexity of the models from the inductive and ETM discovery miners. https://doi.org/10.1371/journal.pone.0281836.t006
The results in Table 6 show that the models based on both the inductive miner and the ETM miner are simple; the values of the Cardoso, cyclomatic, and structuredness metrics indicate that the model based on the inductive miner is simpler than the model based on the ETM miner. Table 7 summarizes the results of the four miner algorithms on the four quality evaluation metrics (fitness, precision, generalization, complexity). The inductive miner scores the highest values in fitness and generalization, followed by the ETM miner, while it scores lower than ETM in precision. Based on this comparison, the best choice is to rely on the inductive miner for discovering the process model in this case study. It was also noticeable during execution that the ETM miner consumed more time than the other miner algorithms, as confirmed in [40]; for this reason the ETM miner was excluded from further use in this paper.
Table 7. Comparison among the four applied algorithms. https://doi.org/10.1371/journal.pone.0281836.t007
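A hedged pm4py analogue of the Table 7 comparison is sketched below; the paper itself used the ProM plugins described above. The module paths follow pm4py 2.x, and the log and net variables are assumed to come from the earlier discovery sketch.

import pm4py
from pm4py.algo.evaluation.generalization import algorithm as generalization_eval
from pm4py.algo.evaluation.simplicity import algorithm as simplicity_eval

def evaluate(log, net, im, fm):
    # Fitness via alignments returns a dict of aggregate replay statistics;
    # precision, generalization, and simplicity each return a single score.
    return {
        "fitness": pm4py.fitness_alignments(log, net, im, fm),
        "precision": pm4py.precision_alignments(log, net, im, fm),
        "generalization": generalization_eval.apply(log, net, im, fm),
        "simplicity": simplicity_eval.apply(net),
    }

# Score one candidate model per discovery algorithm, as in Table 7.
print(evaluate(log, net_ind, im_ind, fm_ind))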
3.3.2 The results of the analysis phase. This paper elaborates on the analysis tasks to provide insights into the processes of the hospital. Performance analysis is conducted by statistical analysis or with ProM techniques such as the dotted chart, conformance checking, and the inductive visual miner: the dotted chart for case-handling behavior, and conformance checking for deviations and bottlenecks. Organizational analysis is conducted with social network mining to provide insights into the collaboration between originators and departments in the hospital.
1- The results of the performance analysis. The performance analysis stage is of special importance when applying process mining in the hospital; it is carried out once model discovery and the evaluation of the quality metrics of the generated models are finished, so that it can use their outputs. Performance analysis reveals the time a patient consumes from entering the hospital, the time consumed by each activity, the resources used by each activity, which activities take a long time with the patient, and the long waiting times a patient experiences while a particular operation is performed. It is also necessary to detect deviations of the discovered careflow model from the hand-made plan, and to discover bottlenecks in the execution of certain operations. Performance analysis can be implemented with the many techniques that the ProM platform provides, or by statistical analysis.
Statistical analysis: statistical analysis is merged with process mining to analyze the length of the patient journey in the hospital and any correlation between the patient's diagnosis and the length of the journey or of the hospitalization. Table 8 shows the length of the patient journey (in days) according to the patient's referral type (referred by a doctor, or from the Emergency department). It shows that patients from Emergency have a shorter path than those referred by a doctor: the average stay of patients referred by a doctor was ≈13 days, while the stay of patients from Emergency was ≈7 days.
Table 8. The length of the patient journey (in days) in the hospital according to the patient's referral type. https://doi.org/10.1371/journal.pone.0281836.t008
Table 9 shows the length of the patient journey (in days) according to the patient's diagnosis (open heart; cardiac stent and diagnostic catheterization; medication; and "discharged without surgery"). It shows that "open heart" patients have the longest path of all diagnoses: their average stay was ≈23 days, while patients with the other diagnoses stayed ≈7 to 8 days. This is because the activities included in the "open heart surgery" variants take a long time to complete, whereas the activities in the other diagnoses are shorter.
Table 9. Length of the patient journey (in days) in the hospital according to the patient's diagnosis. https://doi.org/10.1371/journal.pone.0281836.t009
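The length-of-stay figures of Tables 8 and 9 can be reproduced with ordinary statistical tooling. The following pandas sketch assumes a flat event table with hypothetical column names ("case_id", "timestamp", "referral", "diagnosis"), since the hospital's actual schema is not published here.

import pandas as pd

# Hypothetical flat export of the hospital event log.
events = pd.read_csv("hospital_events.csv", parse_dates=["timestamp"])

# Journey length per case = last event time minus first event time.
journeys = (
    events.groupby("case_id")
    .agg(start=("timestamp", "min"), end=("timestamp", "max"),
         referral=("referral", "first"), diagnosis=("diagnosis", "first"))
)
journeys["days"] = (journeys["end"] - journeys["start"]).dt.days

# Average stay by referral type (cf. Table 8) and by diagnosis (cf. Table 9).
print(journeys.groupby("referral")["days"].mean())
print(journeys.groupby("diagnosis")["days"].mean())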
The ProM platform provides many performance analysis techniques, such as the conformance checking, dotted chart, and inductive visual miner plugins.
Conformance checking: Fig 18 presents the conformance checking analysis obtained by replaying the event log extracted from the information system (after cleaning and preprocessing) on the Petri net of the base "standard" model. The base model was hand-made by the hospital's domain experts (its Petri net was converted from the BPMN model of Fig 4). The global statistics of the replay results in Fig 19 show 95% compliance between the base (hand-made) model and the event log from the information system; that is, the extracted event log deviates from the model by about 5%. For example, in Fig 19(b) the activity "Recuperation Ward" has a red-and-green bar indicating that it was not observed in the traces 123 times (red) and was executed correctly 838 times (green). According to the global statistics in Fig 19(a), some services in the hospital need to be improved.
Fig 18. Replaying the event log on the Petri net of the base "standard" model for conformance and bottleneck analysis. https://doi.org/10.1371/journal.pone.0281836.g018
Fig 19. The statistical information obtained from replaying the event log on the Petri net of the base "standard" model for conformance checking. https://doi.org/10.1371/journal.pone.0281836.g019
Inductive visual miner: for further analysis and inspection, the paper detects deviations in the model using the inductive visual miner [41], as shown in Figs 20–22. The model produced by the inductive visual miner starts at a green circle and follows the arrows to the final red circle; each box represents an activity, and the numbers shown under the activity names are the frequencies with which the activities were executed according to the model. In Fig 20, the red dashed edges indicate paths that include deviations or bottlenecks during execution; activities such as "1st Consultant Checkup", "Laboratory Tests", "Admission Request", "Prepare patient for Surgery", "Blood bank", and "Recuperation Ward" were skipped a number of times even though the model requires them to be executed. Fig 21 outlines the patients involved in hospital services during the observed period: a total of 961 patients started with the activities "Admission to Hospital" and "1st Consultant Checkup". 645 patients passed through "Laboratory Tests" and 642 through "Radiology Tests", while 266 patients passed through the "Consultant Checkup" service without visiting the lab or radiology department; 903 patients passed directly through "Diagnostic Catheterization" and only 305 through the "Cardiac Stent" surgical service. A group of 159 patients underwent open heart surgery and then passed to the "Recuperation Ward" service, while a group of 914 patients used the "Recuperation Ward" because they underwent the "Open Heart" or "Cardiac Stent" surgeries. Finally, all 961 patients passed through the "Discharged and Leave" service to finish their sojourn in the hospital. Fig 22 shows the bottlenecks in the services where the allocation of resources is unbalanced: some resources were over-allocated while others were free. In Fig 22 it is possible to observe resources with long queues while other resources were idle at the same instant; this is clearest for the Lab resource and the 1st Consultant resource.
Fig 20. The process model explaining the deviations. https://doi.org/10.1371/journal.pone.0281836.g020
Fig 21. Patients' distribution among hospital services. https://doi.org/10.1371/journal.pone.0281836.g021
Fig 22. The process model explaining the sojourn time of the resources. https://doi.org/10.1371/journal.pone.0281836.g022
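For reference, the replay-based conformance check of Figs 18 and 19 has a close counterpart in pm4py, sketched below under the assumption that base_net, base_im, and base_fm hold the Petri net converted from the experts' BPMN model and that log is the filtered event log loaded earlier.

import pm4py

# Align each trace of the log against the hand-made base model; the variables
# base_net / base_im / base_fm are assumed to be prepared beforehand.
diagnostics = pm4py.conformance_diagnostics_alignments(log, base_net, base_im, base_fm)

# Traces whose alignment fitness is below 1.0 deviate from the base model,
# mirroring the roughly 5% deviation reported above.
deviating = [d for d in diagnostics if d["fitness"] < 1.0]
print(len(deviating), "of", len(diagnostics), "traces deviate from the base model")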
Dotted chart: the paper also used the "Project chart on Dotted Chart" plugin, a dotted chart visualization of the event log that can handle large logs and show overall event and performance information. Fig 23 displays the events of the whole event log as dots: time, ranging from January 2021 to May 2021, is measured along the horizontal axis; the vertical axis represents case IDs; and the events are colored according to their activity names. The dotted chart shows that most patient cases start their activities in the hospital under study with the activity "1st Consultant Checkup", followed by the activities "Laboratory Tests" and "Radiology Tests" together. Fig 24 shows the dotted chart for cluster 1, "cardiac stent and diagnostic catheterization". The throughput time for most instances is short, less than one day, and most instances contain only a few activities, as few as three, excluding the activities "Admission to Hospital" and "Leave and Discharge". Fig 25 shows the dotted chart for cluster 2, "open heart surgery"; here the throughput time for most instances is long, reaching more than 30 days. Across the three dotted charts, correlated orderings of specific activities can be seen, such as "Blood Bank", "Perform Open Heart Surgery", and "ICU", and there is also a correlation between "Consultant Checkup", "Laboratory Tests", and "Radiology Tests". Fig 26 shows the dotted chart for cluster 3, where the patients left the hospital without any surgery and only took medication care; a time gap is noticeable between the "Scheduling Patient Appointment" event and the next "Admission Request" event.
Fig 23. The dotted chart of the whole event log. https://doi.org/10.1371/journal.pone.0281836.g023
Fig 24. The dotted chart of cluster 1, "cardiac stent and diagnostic catheterization". https://doi.org/10.1371/journal.pone.0281836.g024
Fig 25. The dotted chart of cluster 2, "open heart surgery". https://doi.org/10.1371/journal.pone.0281836.g025
Fig 26. The dotted chart of cluster 3, "the patients who left the hospital without any surgery and only took medication care". https://doi.org/10.1371/journal.pone.0281836.g026
2- The results of the organizational analysis. The paper used the social network miner plugged into the ProM platform to gain insights into the collaboration between departments and persons in the hospital. The social network miner discovers social networks from process logs; since several social network analysis techniques and research results are available, the generated network allows the analysis of social relations between the originators involved in process executions. Fig 27 shows the social network generated from the event log using the "Handover of Work" metric, which measures the frequency of transfers of work among departments. In the hospital under study, many departments interact and hand over work to each other, with many-to-many relationships among resources. Fig 27 shows that many originators, such as the laboratory, radiology, accountant, cardiology consultant, and general practitioner, are highly involved in the process and interact with many departments, while other originators, such as the ICU nurse, care nurse, and ward nurse, are often involved but are not directly connected to all other originators. Fewer interactions are also noted between General Practitioner 2, Accountant 2, and Receptionist 2 and the other originators; this can be explained by the fact that those originators work the evening shift, when the density of patient requests is low.
Fig 27. Social network of the hospital originators with the handover of work metric. https://doi.org/10.1371/journal.pone.0281836.g027
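The core idea behind the "Handover of Work" metric of Fig 27 can be computed directly, as the simplified sketch below shows; it counts transfers between consecutive resources within each case, reusing the hypothetical flat event table from the statistics sketch above with an added "resource" column. (ProM's metric additionally accounts for causal dependencies, which this sketch omits.)

from collections import Counter
import pandas as pd

events = pd.read_csv("hospital_events.csv", parse_dates=["timestamp"])
events = events.sort_values(["case_id", "timestamp"])

handovers = Counter()
for _, case in events.groupby("case_id"):
    resources = case["resource"].tolist()
    for src, dst in zip(resources, resources[1:]):
        if src != dst:  # a real transfer of work between originators
            handovers[(src, dst)] += 1

# The most frequent handovers correspond to the strongest edges in Fig 27.
for (src, dst), count in handovers.most_common(5):
    print(src, "->", dst, ":", count)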
4. Conclusions
The paper suggested applying the process mining approach to analyze the patient's journey from registration at the hospital to discharge. It used a dataset from an Egyptian hospital based on the event logs of cardiac patients. The proposed methodology employed four discovery algorithms to mine a structured process model from unstructured careflows; before this step, the paper applied abstraction and trace clustering techniques to simplify the unstructured form of the model. Then, the quality metrics of the models output by the discovery miners were evaluated to select the process model that best describes the careflows of all groups of patients, based on the fitness, precision, generalization, and simplicity metrics. To evaluate the simplicity of the extracted models, the paper proposed a two-step method to quantify the simplicity metric and decide whether a process model is simple: checking the soundness of every model produced by the miner algorithms, and then measuring the control-flow complexity metrics Cardoso, cyclomatic, and structuredness. The highest-quality process model was selected and compared with the process model hand-made by experts to detect deviations; the matching rate between the discovered process model and the standard one was 95%. The paper derived insights from the careflows and the event log through organizational and performance analysis, using ProM plugins to apply the process mining techniques: extracting and cleaning the event logs, discovering the process models, applying conformance checking, and performing the performance and organizational analyses. As a result, the process mining approach extracted the careflows of large groups of patients with many variants, from the initial "registration" step in the hospital to the final "leave" step, and the various analyses yielded useful insights that hospital management can use to improve their healthcare processes. The major limitation of this methodology is that it was employed only on the current case study and was not tested on other cases with other variables. In the future, it would be worthwhile to extend the implementation of the methodology to another hospital or to multiple hospitals to address the problem of drift. Future work will focus on enhancing the performance of the business process by building prediction models and recommendation systems based on a merger of process mining and machine learning methods; many deep learning methods can be useful for predicting and classifying the parameters, as mentioned in [42, 43]. The recommendation system will be useful for dealing with inefficient behavior such as long waiting times, high costs, deviations from the standard model, and bottlenecks.
TI - Analysis the patients’ careflows using process mining JF - PLoS ONE DO - 10.1371/journal.pone.0281836 DA - 2023-02-23 UR - https://www.deepdyve.com/lp/public-library-of-science-plos-journal/analysis-the-patients-careflows-using-process-mining-yNjPdUSa6B SP - e0281836 VL - 18 IS - 2 DP - DeepDyve ER -