Practical implementation of artificial intelligence algorithms in pulmonary auscultation examination

Abstract

Lung auscultation is an important part of a physical examination. However, its biggest drawback is its subjectivity. The results depend on the experience and ability of the doctor to perceive and distinguish pathologies in sounds heard via a stethoscope. This paper investigates a new method of automatic sound analysis based on neural networks (NNs), which has been implemented in a system that uses an electronic stethoscope for capturing respiratory sounds. It allows the detection of auscultatory sounds in four classes: wheezes, rhonchi, and fine and coarse crackles. In the blind test, a group of 522 auscultatory sounds from 50 pediatric patients was presented, and the results provided by a group of doctors and an artificial intelligence (AI) algorithm developed by the authors were compared. The gathered data show that machine learning (ML)-based analysis is more efficient in detecting all four types of phenomena, which is reflected in high values of recall (also called sensitivity) and F1-score.

Conclusions: The obtained results suggest that the implementation of automatic sound analysis based on NNs can significantly improve the efficiency of this form of examination, leading to a minimization of the number of errors made in the interpretation of auscultation sounds.

What is Known:
• Auscultation performance of the average physician is very low. AI solutions presented in the scientific literature are based on small databases with isolated pathological sounds (which are far from real recordings) and mainly on the leave-one-out validation method; thus, they are not reliable.

What is New:
• The AI learning process was based on thousands of signals from real patients, and a reliable description of the recordings was based on multiple validation by physicians and an acoustician, resulting in practical and statistical proof of the high performance of AI.
Keywords: Auscultation, Artificial intelligence, Machine learning, Respiratory system, Stethoscope

Abbreviations
NN: Neural networks
AI: Artificial intelligence
DNN: Deep neural networks
GS: Golden standard
ML: Machine learning

Communicated by Peter de Winter

* Honorata Hafke-Dys, h.hafke@amu.edu.pl; hafke@stethome.com

1 StethoMe, Winogrady 18A, 61-663 Poznań, Poland
2 Department of Pediatric Pneumonology, Allergology and Clinical Immunology, K. Jonscher Clinical Hospital, Poznań University of Medical Sciences, Szpitalna 27/33, 60-572 Poznań, Poland
3 Institute of Acoustics, Faculty of Physics, Adam Mickiewicz University, Poznań, Umultowska 85, 61-614 Poznań, Poland

Eur J Pediatr (2019) 178:883–890

Background

Auscultation has been considered an integral part of physical examination since the time of Hippocrates. The stethoscope, introduced by Laennec [2] more than two centuries ago, was one of the first medical instruments which enabled internal body structures and their functioning to be checked.

The stethoscope still remains a tool that can provide potentially valuable clinical information. However, the results of such examinations are strongly subjective and cannot be shared and communicated easily, mostly because of doctors' experience and perceptual abilities, which leads to differences in their assessments, depending on their specialization (Hafke et al., submitted for publication). Another important issue is the inconsistent nomenclature of respiratory sounds. This problem is widely recognized [1], but to date, there is still no standardized worldwide classification of the types of phenomena appearing in the respiratory system [10]. There is both a variety of terms used for the same sound by different doctors and different sounds described by the same term. Lung sounds, as defined by Sovijärvi et al. [14], concern all respiratory sounds heard or detected over the chest wall or within the chest, including normal breathing sounds and adventitious sounds. In general, normal respiratory sound is characterized by low noise during inspiration and is hardly audible during expiration. The latter is longer than the former [12]. The spectrum of noise of normal respiratory sound (typically 50–2500 Hz) is broader on the trachea (up to 4000 Hz) [11].

Adventitious sounds are abnormalities (pathologies) superimposed on normal breathing sounds. They can be divided into two sub-classes depending on their duration: continuous (stationary) sounds, i.e., wheezes and rhonchi, and discontinuous (non-stationary) sounds, i.e., fine or coarse crackles.

Wheezes are continuous tonal sounds with a frequency range from less than 100 Hz to more than 1 kHz, and a duration longer than 80 ms [8]. They are generally recognized correctly and rarely misinterpreted, which makes them probably the most easily recognized pathological sound [7]. However, as Hafke et al. (submitted for publication) showed, in the case of describing previously recorded sounds, doctors have difficulty identifying this kind of pathology depending on breathing phase, i.e., inspiratory wheezes were confused with expiratory wheezes and vice versa.

Rhonchi are continuous, periodic, snoring-like sounds, similar to wheezes but of lower fundamental frequency (typically below 300 Hz) and with a duration typically longer than 100 ms [8]. They are one of the most ambiguous classes of pathological sounds, as they are often considered to be on the boundary between wheezes and crackles (especially of the coarse type). Thus, they may be mistaken for them [15]. Although many authors have suggested "rhonchus" as a separate category [10], some doctors use the term "low-pitch wheeze" [6]. Because they have the features of both wheezes and crackles, these phenomena are often classified differently by the respondents. As Hafke et al. showed, this is strongly dependent on the examiner's experience. Moreover, in the cited research, the advantage of pulmonologists was clearly visible. In their case, the proportion of correct rhonchi detections was 51.2%, while for other groups, this value did not exceed 30%, which was the lowest result for all the phenomena taken into account.

Finally, crackles are short, explosive sounds of a non-tonal character. They tend to appear both during inspiration and expiration. Two categories of this phenomenon have been described: fine and coarse crackles. They vary in typical length (ca. 5 ms and ca. 15 ms, respectively) and frequency (broad-band) and may appear in different respiratory system disorders [3]. This is why the proper detection and evaluation of crackles is of high importance.

Auscultation includes the evaluation of sound character, its intensity, frequency, and pathological signals occurring in the breathing sound. Its subjective nature is widely recognized, which has led to a new era of developments, for instance computer-based techniques.

Recordings made with electronic stethoscopes may be further analyzed by a digital system in terms of their acoustic features and, after proper signal processing, delivered to the doctor at an enhanced level of quality or even complemented by a visual representation, e.g., a spectrogram. The latter should be considered as an association between an acoustical signal and its visual representation, and is beneficial to the learning and understanding of those sounds, not only for medical students [13], but also when it comes to doctors diagnosing patients.

Currently, the subject of the greatest attention in the field of computer-based medicine is neural networks (NNs). NNs are a particularly fast developing area of machine learning; they learn from examples, as humans do. A decade ago, NNs were one of many available classifiers. They were trained on a small set of high-level features and produced probability scores of a sample belonging to one of several predefined classes. Their popularity rose sharply when it was shown that deeper neuron structures are able to learn intermediate features from low-level representations by themselves. These intermediate features learned by the NN are much more distinctive and descriptive in comparison to hand-crafted features in many artificial intelligence (AI) tasks, including audio signal analysis and medicine.

Contemporary deep neural networks (DNNs) operate on raw signals directly and are therefore able to identify and exploit all important dependencies that they provide. In order to do that, however, a large number of training examples needs to be provided. Once these initial requirements are met, the NN algorithm is able to match or even surpass human performance. This is also believed to be the best strategy for dealing with respiratory sounds.

Therefore, the aim of this study was to compare the efficiency of AI and a group of five physicians in terms of respiratory sounds identification in four main classes of pathological signals, according to [10]: wheezes (with no differentiation into sub-classes), rhonchi, and coarse and fine crackles.

Material and methods

Auscultation recordings

The auscultation recording files were gathered from 50 visits performed by pediatricians using StethoMe® and Littmann 3200 electronic stethoscopes. All the recordings were made in the Department of Paediatric Pulmonology (Karol Jonscher University Hospital in Poznan, Poland). The subjects were chosen at random from the patients of the abovementioned hospital. The whole procedure of signal collection (recordings) took 6 months. In this period, patients with different diseases (thus, different pathological sounds) were hospitalized. The decision about the recording was made after auscultation by a pulmonologist working at the hospital.

In general, each visit provided a set of 12 recordings, each made at a different auscultation point (Fig. 1). However, in the case of children, as in this research, it is often difficult to document breathing sounds of a sufficiently high quality from such a number of auscultation points, due to children's movements, impatience, and crying, or because of other health issues. As a result, the total number of recordings analyzed from the 50 visits was 522 rather than the nominal 600. The age of patients was within the range of 1 to 18 years (mean, 8.5; median, 8). This parameter was not taken into account by the AI, but the physicians were informed about the age of each patient.

Fig. 1 The specific localization of auscultation points in the front (left panel) and back (right panel) of a chest

Study design

The main goal was to investigate the accuracy of NNs in the classification of respiratory sounds in comparison with medical specialists. It must be emphasized that, in contrast to most published research, which was performed on a small database or in laboratory conditions (e.g., [5, 9]), this research was based on a large number of actual auscultation recordings captured in realistic conditions (hospital). The four abovementioned classes of auscultation phenomena (wheezes, rhonchi, and coarse and fine crackles) were chosen as the most frequently occurring and described. The nomenclature suggested by the European Respiratory Society [10] was applied in order to reduce the influence of ambiguous terminology on the final result. Audio data gathered by electronic stethoscopes were described by doctors in terms of the presence of pathological sounds in certain phases of the breathing cycle and locations on the chest wall. The same description was carried out by the NN.

It should be stressed that, together with the recording presentation, information about the location of the point on the chest or back at which the recording was made, as well as basic information about the sex and age of the child, the diagnosis, and accompanying diseases, was provided with every recording of a particular visit. The medical description consisted of the assessment of whether, in a given recording coming from a particular point, there were adventitious respiratory sounds from each of the four classes. Those descriptions were compared both with the NN descriptions and with the golden standard (GS).

Golden standard

Because there is no objective measure that provides a classification of pathological breath sounds, it was necessary to establish a point of reference, which in this research is specified as the GS. The procedure for establishing the GS is depicted in a few steps (Fig. 2).

Five pediatricians (different from the previous ones) carried out two independent verifications of the previously described recordings. Thus, each recording had a description and two independent verifications. The recordings with double positive medical verifications were automatically qualified to the GS. When the doctors' opinions were ambiguous, meaning that there was one positive verification and one negative, the recording was analyzed by an acoustician experienced in signal recognition. When the acoustician evaluated the description as disputable, which meant its content could be ambiguous in terms of the acoustic parameters, the recording was forwarded to a consilium (two experienced pediatricians and one acoustician), which was convened to establish a medical description again. It must be emphasized that the GS consisted of real-life recordings collected from real patients in real situations (hospital). Many of the recordings contained additional external noise (crying, talking, stethoscope movements, etc.). To make the GS as reliable as possible, those cases were described by the consilium rather than by a single physician. The descriptions from the consilium were not subjected to further verification (Fig. 2).
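The routing of a recording through the GS procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Route` and `route_recording` are hypothetical, the text does not say what happens when the acoustician does not consider a split verdict disputable (modeled here as an assumed "settled by the acoustician" outcome), and it does not say how double-negative verifications were handled (modeled as unspecified).

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    """Possible paths of a described recording (names are illustrative)."""
    DIRECT = "double-positive: qualified to the GS automatically"
    ACOUSTICIAN = "split verdict settled by the acoustician (assumed outcome)"
    CONSILIUM = "split verdict judged disputable: re-described by the consilium"
    UNSPECIFIED = "double-negative: handling not stated in the text"


def route_recording(first_ok: bool, second_ok: bool,
                    acoustician_disputes: Optional[bool] = None) -> Route:
    """Return the path one recording takes, given its two verifications."""
    if first_ok and second_ok:
        # Both verifying pediatricians accepted the description.
        return Route.DIRECT
    if first_ok != second_ok:
        # One positive and one negative verification: the acoustician decides
        # whether the description is disputable.
        if acoustician_disputes is None:
            raise ValueError("a split verdict needs the acoustician's assessment")
        return Route.CONSILIUM if acoustician_disputes else Route.ACOUSTICIAN
    # Two negative verifications: the source does not describe this case.
    return Route.UNSPECIFIED


if __name__ == "__main__":
    print(route_recording(True, True).value)
    print(route_recording(True, False, acoustician_disputes=True).value)
```

Keeping the unspecified cases as explicit enum members, rather than guessing a rule for them, makes the gaps in the written procedure visible in the sketch.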
Finally, the GS contained 322 recordings with double-positive verification and 200 evaluated by the consilium (Table 1). Both no pathology and more than one pathology in one recording were possible; thus, the numbers of recordings in Table 1 do not sum to the total number of 522 recordings used in the experiment.

Fig. 2 Scheme of the GS data acquisition procedure

Table 1 Number of recordings in terms of the appearance of specific pathological phenomena

Phenomenon         Number of recordings
Wheezes            124
Rhonchi            113
Coarse crackles    66
Fine crackles      112

Participants

Doctors

The set of all the GS recordings, accompanied by spectrograms and basic information about each patient, was presented to five pediatricians, who described them in terms of the occurrence of the four pathological sounds (Table 1). One description was made for each recording.

NN

The StethoMe AI NN architecture, based on a modified version of that proposed by Çakır et al. [4], was used. This is a specialized network suitable for polyphonic sound event detection. It is composed of many specialized layers of neurons, including convolutional layers, which are effective at detecting local correlations in the signal, as well as recurrent layers designed to capture long-time dependencies, e.g., a patient's breathing cycle and the associated recurrence of pathological sounds. The NN had been trained and validated on a set of more than 6000 real and 10,071 artificial/synthetic recordings. This dataset was completely different from the GS set. Furthermore, another database was used in order to provide better noise detection. As output, the NN provided a matrix called the probability raster. In this data structure, the rows represent time, discretized into 10 ms frames, while the columns depict the probability of phenomena detection changing over the frames. The probability values are then thresholded in order to obtain boolean values indicating the presence or absence of each phenomenon in each frame (Fig. 3).

Fig. 3 Exemplary probability raster for fine crackles (a) and rhonchi (b): the signal (first line) is transformed into a spectrogram (second line) and analyzed by the NN. The output of the NN is presented as a bidimensional matrix, called a probability raster (third line). The rows in the matrix represent time, framed in windows of 10 ms each; the columns show the probability of positive detection of each phenomenon. The raster is eventually post-processed to obtain boolean values indicating the presence or absence of phenomena for each frame (the fourth line)

Analysis

Results

The GS was used as a point of reference (100%) for the tagging of recordings performed by the doctors and the NN. Therefore, confusion matrices could be analyzed: the values of recall (the proportion of actual positives that are correctly identified as such, also called sensitivity), precision (the fraction of relevant instances among the retrieved instances), specificity (the proportion of actual negatives that are correctly identified), and the F1-score (the harmonic mean of precision and recall) were measured for the doctors' and the NN's phenomena detection in comparison with the GS. First, the chi-square test (α = 0.05) was performed to investigate whether there is a difference in the data gathered for the doctors and the NN. The proposed null hypothesis was rejected for all four phenomena. Therefore, the results gathered for the doctors and the NN are statistically different. Detailed results are presented in Table 2.

Table 2 Juxtaposition of recall (sensitivity), precision, specificity, and F1-score for doctors (pediatricians) and NN

                   Recall (%)       Precision (%)    Specificity (%)  F1-score (%)
                   Doctors  NN      Doctors  NN      Doctors  NN      Doctors  NN
Coarse crackles    56.1     56.1    34.6     40.7    84.6     88.2    42.8     47.1
Fine crackles      72.3     83.9    39.5     52.5    69.8     79.3    51.1     64.6
Wheezes            58.1     78.2    66.1     57.7    90.7     82.2    61.8     66.4
Rhonchi            67.3     87.6    55.9     61.1    85.3     84.6    61.0     72.0
Mean               63.5     76.5    49.0     53.0    82.6     83.6    54.2     62.5

The lowest F1-score was observed for coarse crackles, both in the case of medical and NN descriptions. This may be partially due to the rare occurrence of coarse crackles in the analyzed database (see Table 1). Moreover, this kind of phenomenon is often confused with other types of crackles or rhonchi (Hafke et al., submitted for publication), so its correct detection might be problematic. However, it is important to note that the NN F1-score, which is related to its performance in correct phenomena detection, is higher than in the case of medical descriptions (47.1% vs. 42.8%).

The highest F1-score was obtained for rhonchi and wheezes (both continuous, "musical" sounds). For rhonchi, the F1-score of the medical descriptions against the GS is 61.0%, while the NN is much more accurate at 72.0%. This points to the ambiguous character of rhonchi, which results in poor detection performance (probably caused by mistaking them for other phenomena, as evidenced by low precision and recall (sensitivity) values when compared to the NN).

When it comes to wheezes, despite the slightly lower values of precision and specificity noted for the NN, its final performance, expressed in the F1-score value, is better than in the case of human tagging: 66.4% for the NN vs. 61.8% for the doctors.

It can also be noted that the AI-based analysis is more accurate in detecting rhonchi and wheezes. This may be due to the fact that it is based mainly on the spectrograms, which accurately reflect tonal content in a recording. For the doctors, descriptions are based mainly on acoustical cues, while the visual representation is used rather as an additional, supporting tool. This may be an important issue influencing the proper detection of pathology, especially when a phenomenon is of an ambiguous nature (e.g., rhonchi) or accompanied by louder sounds, which make it barely audible (e.g., silent wheezes).

The biggest difference in F1-scores, meaning a significant predominance of the new automatic system over doctors, is observed for fine crackles: 64.6% vs. 51.1%. All of the other parameters for fine crackles are also higher for the NN.

Generally, for each of the four phenomena, the F1-score for the NN is higher than for the doctors, by an average of 8.4 percentage points (p.p.), which clearly indicates the advantage of the tested algorithm over the group of doctors. The NN is on average 13 p.p. more sensitive and 4 p.p. more precise than the reference group of pediatricians.

Discussion

The main goal of this research was to investigate the effectiveness of pathological respiratory sounds detection for both doctors and the automatic analyzing system based on the NNs developed by the authors.

To measure the performances, the GS was established as a set of 522 recordings taken from the respiratory system of 50 pediatric patients and gathered during auscultation using electronic stethoscopes in real situations. Since auscultation tends to be subjective and there is no objective measure of correctness, those recordings were then tagged (described) by doctors and experienced acousticians in terms of pathological phenomena content. The recordings with consistent taggings were taken as a point of reference. The inconsistent ones were described by a consilium (two experienced pediatricians and one acoustician). Only positively verified recordings were used in the next steps of the experiment. In this way, a very reliable GS was established, which was taken as a point of reference for the evaluation and comparison of the descriptions of both the doctors and the newly developed NN. Since the statistical analysis showed that the performance of those two groups (the doctors and the NN) differs significantly, it is reasonable to state that the ML-based analysis that uses the NN algorithm introduced here is more efficient in detecting all four pathological phenomena (wheezes, rhonchi, and coarse and fine crackles), which is reflected in the high values of recall (sensitivity) and the F1-score. It is worth noting that the biggest difference between the performance of the doctors and the NN was observed in the case of fine crackles, where the NN clearly outperformed the doctors (Table 2). Moreover, it has to be mentioned that the NN performance is also higher than that of the doctors in the case of ambiguous sounds (i.e., rhonchi), which tend to be misinterpreted or evaluated in an improper way in everyday medical practice. Finally, the difference between the performance of the doctors and the NN was less significant when it came to the recognition of wheezes; however, this is just because the performance of doctors with those signals, which are the easiest to interpret, is relatively high. Thus, the potential of the proposed solution seems to be enormous. It must also be emphasized that the NN algorithm was taught using thousands of recordings and taggings, which makes the results unique and reliable.

Conclusions

To conclude, the NN algorithms that were used in this experiment can be described as a very efficient tool for pathological sound detection. This is why AI may become a valuable support for doctors, medical students, or care providers (also lay ones), both when it comes to diagnosing or monitoring processes, on the one hand, and training or education on the other. The database we built is itself a very good tool in this field. Moreover, the AI algorithms can also be beneficial for lay people in terms of monitoring their respiratory system at home, which makes this solution valuable in many areas, e.g., patient safety, reaction speed in case of danger, and reducing the cost of treatment.

It must also be emphasized that there are many publications that correlate pathological sounds with particular diseases; however, the relationship is more complicated. There are many publications showing that the efficiency of physicians is very low [1, 10]; thus, the AI solution is a first step in making auscultation more objective, with fewer incorrect identifications and thus a better correlation with diseases made by physicians.

Finally, AI algorithms can also be used in other areas, such as heart disease, which makes this area even more promising, especially taking into account that this experiment, carried out in real conditions rather than in a laboratory, demonstrated the high performance of the NN.

Acknowledgments The Authors would like to thank the Management of Karol Jonscher Clinical Hospital in Poznań, Poznań University of Medical Sciences, and the physicians from Department of Pediatric Pneumology, Allergology and Clinical Immunology of this hospital for their help in recording and description of the acoustic signals.

Authors' contributions TG, MP, HH, and AB were responsible for conception and design. TG, MP, HH, AP, and JK performed data analysis and interpretation. AP, JK, RB, and MS were responsible for drafting the manuscript for important intellectual content. HH, JK, and AB were responsible for final approval of the version to be published. TG, RB, and MS were accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors read and approved the final manuscript.

Funding This work was supported by the grant from The National Centre of Research and Development (NCRD) in Poland and European Union under Sub-Measure 1.1.1 of the Operational Programme Smart Growth (PO IR) "Industrial R&D carried out by enterprises" (Fast Track), agreement no. POIR.01.01.01-00-0528/16-00.

Compliance with ethical standards

Ethics approval and consent to participate All studies were approved by the Bioethics Commission at Poznań University of Medical Sciences (approval number 193/18).

Consent for publication Not applicable.

Competing interests The authors declare that they have no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Aviles-Solis JC, Vanbelle S, Halvorsen PA (2017) International perception of lung sounds: a comparison of classification across some European borders. BMJ Open Respir Res 4:e000250
2. Bishop PJ (1980) Evolution of the stethoscope. J R Soc Med 73:448–456
3. Bohadana A, Izbicki G, Kraman SS (2014) Fundamentals of lung auscultation. N Engl J Med 370:744–751
4. Çakır E, Parascandolo G, Heittola T, Huttunen H, Virtanen T (2017) Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans Audio Speech Lang Process 25(6):1291–1303
5. Chamberlain D, Mofor J, Fletcher R, Kodgule R (2015) Mobile stethoscope and signal processing algorithms for pulmonary screening and diagnostics. In: IEEE Global Humanitarian Technology Conference (GHTC) IEEE 385–392
6. Crackles DK (2018) and Other lung sounds. In: Priftis KN, Hadjileontiadis LJ, Everard ML, eds. Breath sounds. 1st Edn. Springer International Publishing, pp. 225–236
7. Forgacs M (1978) The functional basis of pulmonary sounds. Chest 73:399–405
8. Marques A (2018) Normal versus adventitious respiratory sounds. In: Priftis KN, Hadjileontiadis LJ, Everard ML, eds. Breath sounds. 1st Edn. Springer International Publishing, pp. 181–206
9. Oweis RJ, Abdulhay EW, Khayal A, Awad A (2015) An alternative respiratory sounds classification system utilizing artificial neural networks. Biom J 38(2):153–161
10. Pasterkamp H, Brand PLP, Everard M, Garcia-Marcos L, Melbye H, Priftis KN (2016) Towards the standardisation of lung sound nomenclature. Eur Respir J 47(3):724–732
11. Reichert S, Gass R, Brandt C, Andrès E (2008) Analysis of respiratory sounds: state of the art. Clin Med Circ Respirat Pulm Med 2:45–58
12. Sarkar M, Madabhavi I, Niranjan N, Dogra M (2015) Auscultation of the respiratory system. Ann Thorac Med 10(3):158–168
13. Sestini P, Renzoni E, Rossi M, Beltrami V, Vagliasindi M (1995) Multimedia presentation of lung sounds as learning aid for medical students. Eur Respir J 8:783–788
14. Sovijärvi AR, Dalmasso F, Vanderschoot J et al (2000) Definition of terms for applications of respiratory sounds. Eur Respir Rev 10:597–610
15. Wilkins RL, Dexter JR, Murphy RL Jr et al (1990) Lung sound nomenclature survey. Chest 98:886–889

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

European Journal of Pediatrics, Springer Journals
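As a supplementary illustration of the evaluation described in the NN and Results sections, the sketch below thresholds a probability raster into boolean frames, derives recall, precision, specificity, and F1-score from confusion counts, and checks the reported F1-scores against the reported precision and recall. It rests on assumptions that are not in the paper: the function names are hypothetical, the threshold of 0.5 is a placeholder (the actual threshold is not stated), and the way frame-level detections were aggregated into recording-level tags is not described, so it is not modeled. The numbers in `TABLE_2` are copied from Table 2.

```python
from typing import Dict, List, Sequence, Tuple


def threshold_raster(raster: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> List[List[bool]]:
    """Turn a raster (rows = 10 ms frames, columns = phenomena) into booleans.

    The 0.5 default is an assumed placeholder, not the authors' value.
    """
    return [[p >= threshold for p in frame] for frame in raster]


def confusion_counts(predicted: Sequence[bool],
                     reference: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) for one phenomenon against the reference tags."""
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division: an empty denominator yields 0.0 instead of an error."""
    return numerator / denominator if denominator else 0.0


def scores(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Recall, precision, specificity, and F1 as defined in the Results."""
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {"recall": recall, "precision": precision,
            "specificity": specificity, "f1": f1}


# (precision %, recall %, F1 %) per phenomenon, copied from Table 2.
TABLE_2 = {
    "coarse crackles": {"doctors": (34.6, 56.1, 42.8), "nn": (40.7, 56.1, 47.1)},
    "fine crackles": {"doctors": (39.5, 72.3, 51.1), "nn": (52.5, 83.9, 64.6)},
    "wheezes": {"doctors": (66.1, 58.1, 61.8), "nn": (57.7, 78.2, 66.4)},
    "rhonchi": {"doctors": (55.9, 67.3, 61.0), "nn": (61.1, 87.6, 72.0)},
}


def max_f1_deviation() -> float:
    """Largest gap between a reported F1 and the one recomputed from P and R."""
    return max(abs(_ratio(2 * p * r, p + r) - f1)
               for rows in TABLE_2.values()
               for p, r, f1 in rows.values())


def mean_f1_gain() -> float:
    """Average NN-minus-doctors F1 difference in percentage points."""
    gains = [rows["nn"][2] - rows["doctors"][2] for rows in TABLE_2.values()]
    return sum(gains) / len(gains)
```

Because the published precision and recall are rounded to one decimal place, a recomputed F1-score can differ slightly from the published one, so any consistency check should allow a small tolerance rather than expect exact equality.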

Publisher
Springer Journals
Copyright
Copyright © 2019 by The Author(s)
ISSN
0340-6199
eISSN
1432-1076
D.O.I.
10.1007/s00431-019-03363-2

Abstract

Lung auscultation is an important part of a physical examination. However, its biggest drawback is its subjectivity. The results depend on the experience and ability of the doctor to perceive and distinguish pathologies in sounds heard via a stethoscope. This paper investigates a new method of automatic sound analysis based on neural networks (NNs), which has been implemented in a system that uses an electronic stethoscope for capturing respiratory sounds. It allows the detection of auscultatory sounds in four classes: wheezes, rhonchi, and fine and coarse crackles. In the blind test, a group of 522 auscultatory sounds from 50 pediatric patients were presented, and the results provided by a group of doctors and an artificial intelligence (AI) algorithm developed by the authors were compared. The gathered data show that machine learning (ML)–based analysis is more efficient in detecting all four types of phenomena, which is reflected in high values of recall (also called as sensitivity) and F1-score. Conclusions: The obtained results suggest that the implementation of automatic sound analysis based on NNs can significantly improve the efficiency of this form of examination, leading to a minimization of the number of errors made in the interpretation of auscultation sounds. What is Known: � Auscultation performance of average physician is very low. AI solutions presented in scientific literature are based on small data bases with isolated pathological sounds (which are far from real recordings) and mainly on leave-one-out validation method thus they are not reliable. What is New: � AI learning process was based on thousands of signals from real patients and a reliable description of recordings was based on multiple validation by physicians and acoustician resulting in practical and statistical prove of AI high performance. . . . . 
Keywords Auscultation · Artificial intelligence · Machine learning · Respiratory system · Stethoscope

Abbreviations
AI Artificial intelligence
DNN Deep neural networks
GS Golden standard
ML Machine learning
NN Neural networks

Communicated by Peter de Winter

* Honorata Hafke-Dys, h.hafke@amu.edu.pl; hafke@stethome.com

StethoMe, Winogrady 18A, 61-663 Poznań, Poland
Department of Pediatric Pneumonology, Allergology and Clinical Immunology, K. Jonscher Clinical Hospital, Poznań University of Medical Sciences, Szpitalna 27/33, 60-572 Poznań, Poland
Institute of Acoustics, Faculty of Physics, Adam Mickiewicz University, Poznań, Umultowska 85, 61-614 Poznań, Poland

Eur J Pediatr (2019) 178:883–890

Background

Auscultation has been considered an integral part of physical examination since the time of Hippocrates. The stethoscope, introduced by Laennec [2] more than two centuries ago, was one of the first medical instruments which enabled internal body structures and their functioning to be checked.

The stethoscope still remains a tool that can provide potentially valuable clinical information. However, the results of such examinations are strongly subjective and cannot be shared and communicated easily, mostly because of doctors' experience and perceptual abilities, which leads to differences in their assessments, depending on their specialization (Hafke et al., submitted for publication). Another important issue is the inconsistent nomenclature of respiratory sounds. This problem is widely recognized [1], but to date, there is still no standardized worldwide classification of the types of phenomena appearing in the respiratory system [10]. There is both a variety of terms used for the same sound by different doctors and different sounds described by the same term. Lung sounds, as defined by Sovijärvi et al. [14], concern all respiratory sounds heard or detected over the chest wall or within the chest, including normal breathing sounds and adventitious sounds. In general, normal respiratory sound is characterized by a low noise during inspiration and is hardly audible during expiration. The latter is longer than the former [12]. The spectrum of noise of normal respiratory sound (typically 50–2500 Hz) is broader on the trachea (up to 4000 Hz) [11].

Adventitious sounds are abnormalities (pathologies) superimposed on normal breathing sounds. They can be divided into two sub-classes depending on their duration: continuous (stationary) sounds, i.e., wheezes and rhonchi, and discontinuous (non-stationary) sounds, i.e., fine or coarse crackles.

Wheezes are continuous tonal sounds with a frequency range from less than 100 Hz to more than 1 kHz, and a duration time longer than 80 ms [8]. They are generally recognized correctly and rarely misinterpreted, which makes them probably the most easily recognized pathological sound [7]. However, as Hafke et al. (submitted for publication) proved, in the case of describing previously recorded sounds, doctors have difficulty identifying this kind of pathology depending on breathing phase, i.e., inspiratory wheezes were confused with expiratory wheezes and vice versa.

Rhonchi are continuous, periodic, snoring-like sounds, similar to wheezes, but of lower fundamental frequency (typically below 300 Hz) and with a duration typically longer than 100 ms [8]. They are one of the most ambiguous classes of pathological sounds, as they are often considered to be on the boundary between wheezes and crackles (especially of the coarse type). Thus, they may be mistaken for them [15]. Although many authors suggested "rhonchus" as a separate category [10], some doctors use the term "low-pitch wheeze" [6]. Because they have the features of both wheezes and crackles, these phenomena are often differently classified by the respondents. As Hafke et al. proved, this is strongly dependent on the examiner's experience. Moreover, in the cited research, the advantage of pulmonologists was clearly visible. In their case, the number of correct rhonchi detections was 51.2%, while for other groups, this value did not exceed 30%, which was the lowest result for all the phenomena taken into account.

Finally, crackles are short, explosive sounds of a non-tonal character. They tend to appear both during inspiration and expiration. Two categories of this phenomenon have been described: fine and coarse crackles. They vary in typical length (ca. 5 ms and ca. 15 ms, respectively) and frequency (broad-band) and may appear in different respiratory system disorders [3]. This is why the proper detection and evaluation of crackles is of high importance.

Auscultation includes the evaluation of sound character, its intensity, frequency, and pathological signals occurring in the breathing sound. Its subjective nature is widely recognized, which has led to a new era of developments, for instance computer-based techniques.

Recordings made with electronic stethoscopes may be further analyzed by a digital system in terms of their acoustic features and, after proper signal processing, delivered to the doctor at an enhanced level of quality or even complemented by a visual representation, e.g., a spectrogram. The latter should be considered as an association between an acoustical signal and its visual representation, and is beneficial to the learning and understanding of those sounds, not only for medical students [13], but also when it comes to doctors diagnosing patients.

Currently, the subject of the greatest attention in the field of computer-based medicine is neural networks (NNs). NNs are a particularly fast developing area of machine learning which learn from examples, as humans do. A decade ago, NNs were one of many available classifiers. They were trained on a small set of high-level features and produced probability scores of a sample belonging to one of several predefined classes. Their popularity sharply rose when it was proven that deeper neuron structures are able to learn intermediate features from low-level representations by themselves. These intermediate features learned by the NN are much more distinctive and descriptive in comparison to hand-crafted features in many artificial intelligence (AI) tasks, including audio signal analysis and medicine.

Contemporary deep neural networks (DNNs) operate on raw signals directly and are therefore able to identify and exploit all important dependencies that they provide. But in order to be able to do that, a large number of training examples need to be provided. Yet, after these initial requirements are met, the NN algorithm is able to match or even surpass human performance. This is also believed to be the best strategy for dealing with respiratory sounds.

Therefore, the aim of this study was to compare the efficiency of AI and a group of five physicians in terms of respiratory sounds identification in four main classes of pathological signals, according to [10]: wheezes (with no differentiation to sub-classes), rhonchi, and coarse and fine crackles.

Material and methods

Auscultation recordings

The auscultation recording files were gathered from 50 visits performed by pediatricians using StethoMe® and Littmann 3200 electronic stethoscopes. All the recordings were made in the Department of Paediatric Pulmonology (Karol Jonscher University Hospital in Poznan, Poland). The subjects were chosen at random from the patients of the abovementioned hospital. The whole procedure of signal collection (recordings) took 6 months. In this period, patients with different diseases (thus, different pathological sounds) were hospitalized. The decision about the recording was made after auscultation by a pulmonologist working at the hospital.

In general, each visit provided a set of 12 recordings, each of them made at a different auscultation point (Fig. 1). However, in the case of children, as was the case in this research, it is often difficult to document breathing sounds of a sufficiently high quality from such a number of auscultation points, due to children's movements, impatience, and crying, or because of other health issues. The age of patients was within the range of 1 to 18 years old (mean, 8.5; median, 8). This parameter, however, was not taken into account by AI, but the physicians were informed about the age of each patient. Therefore, the total number of recordings that were analyzed from 50 visits was 522.

Fig. 1 The specific localization of auscultation points in the front (left panel) and back (right panel) of a chest

Study design

The main goal was to investigate the accuracy of NNs in the classification of respiratory sounds in comparison with medical specialists. It must be emphasized that, in opposition to most research in many scientific journals which was performed on a small database or in laboratory conditions (e.g., [5, 9]), this research was based on a large amount of actual auscultation recordings captured in realistic conditions (hospital). The four abovementioned classes of auscultation phenomena (wheezes, rhonchi, and coarse and fine crackles) were chosen as the most frequently occurring and described. The nomenclature suggested by the European Respiratory Society [10] was applied in order to reduce the influence of ambiguous terminology on the final result. Audio data gathered by electronic stethoscopes was described by doctors in terms of the presence of pathological sounds in certain phases of the breathing cycle and locations on the chest wall. The same description was carried out by the NN.

It should be stressed that together with the recording presentation, the information about the location of the point on the chest or back in which the recording was made, as well as basic information about the sex and age of the child, the diagnosis, and accompanying diseases, were provided with every recording of a particular visit. The medical description consisted of the assessment of whether in a given recording, coming from a particular point, there were adventitious respiratory sounds from each of the four classes. Those descriptions were compared both with the NN descriptions as well as with the golden standard (GS).

Golden standard

Because there is no objective measure that provides a classification of pathological breath sounds, it was necessary to establish a point of reference, which in this research is specified as the GS. The mentioned procedure for the GS is depicted in a few steps (Fig. 2).

Five pediatricians (different from the previous ones) carried out two self-reliant and independent verifications of the previously described recordings. Thus, each recording had a description and two independent verifications. The recordings with double positive medical verifications were automatically qualified to the GS. When the doctors' opinions were ambiguous, which means there was one positive verification and one negative, the recording was analyzed by an acoustician experienced in signal recognition. Once the acoustician evaluated the description as disputable, which meant its content could be ambiguous in terms of the acoustic parameters, the recording was forwarded to a consilium (2 experienced pediatricians and one acoustician), which was convened to establish a medical description again. It must be emphasized that the GS consisted of real-life recordings collected from real patients in real situations (hospital). Many of the recordings contained additional external noise (crying, talking, stethoscope movements, etc.). To make the GS as reliable as possible, the consilium, instead of a single physician, described those cases. The descriptions from the consilium were not subjected to further verification (Fig. 2).
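The routing of a recording through the GS verification steps can be sketched as below. This is only an illustration under stated assumptions: the names `Route` and `route_recording` are hypothetical, and the text does not say what happens after two negative verifications or when the acoustician does not find the description disputable, so those cases are returned as `NOT_SPECIFIED` rather than guessed.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    """Possible outcomes of the verification flow (names are hypothetical)."""
    GOLDEN_STANDARD = "qualified to the GS automatically"
    ACOUSTICIAN = "awaiting analysis by the acoustician"
    CONSILIUM = "forwarded to the consilium for a new description"
    NOT_SPECIFIED = "outcome not described in the text"


def route_recording(first_verification: bool,
                    second_verification: bool,
                    acoustician_disputable: Optional[bool] = None) -> Route:
    """Route one described recording given its two independent verifications.

    acoustician_disputable is None until the acoustician has assessed the
    recording; it is only consulted when the two verifications disagree.
    """
    # Two positive verifications: the description enters the GS directly.
    if first_verification and second_verification:
        return Route.GOLDEN_STANDARD
    # One positive and one negative verification: the acoustician decides.
    if first_verification != second_verification:
        if acoustician_disputable is None:
            return Route.ACOUSTICIAN
        if acoustician_disputable:
            return Route.CONSILIUM
    # Remaining cases are not covered by the description above (assumption:
    # report them explicitly instead of inventing a rule).
    return Route.NOT_SPECIFIED


if __name__ == "__main__":
    print(route_recording(True, True).value)
    print(route_recording(True, False, acoustician_disputable=True).value)
```

Keeping the unspecified branches explicit makes it visible where the written procedure would need to be completed before such a flow could be used in practice.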
Fig. 2 Scheme of the GS data acquisition procedure

Finally, the GS contained 322 recordings with double-positive verification and 200 evaluated by the consilium (Table 1). Both no pathology and more than one pathology in one recording were possible; thus, the number of recordings in Table 1 is not equal to the total number of 522 recordings used in the experiment.

Table 1 Number of recordings in terms of the appearance of specific pathological phenomena

Phenomenon         Number of recordings
Wheezes            124
Rhonchi            113
Coarse crackles    66
Fine crackles      112

Participants

Doctors

The set of all the GS recordings, accompanied with spectrograms and basic information about each patient, was presented to five pediatricians, and they described them in terms of the occurrence of four pathological sounds (Table 1). One description was made for each recording.

NN

The StethoMe AI NN architecture, based on a modified version of that proposed by Çakır et al. [4], was used. This is a specialized network suitable for polyphonic sound event detection. It is composed of many specialized layers of neurons, including convolutional layers, which are effective at detecting local correlations in the signal, as well as recurrent layers designed to capture long-time dependencies, e.g., a patient's breathing cycle and the associated recurrence of pathological sounds. The NN had been trained and validated on a set of more than 6000 real and 10,071 artificial/synthetic recordings. This dataset was completely different from the GS set. Furthermore, another database was used in order to provide better noise detection. As output, the NN provided a matrix called the probability raster. In this data structure, the rows represent time, discretized into 10 ms frames, while the columns depict the probability of phenomena detection changing over the frames. The probability values are then thresholded in order to obtain boolean values indicating the presence or absence of such phenomenon along each frame (Fig. 3).

Fig. 3 Exemplary probability raster for fine crackles (a) and rhonchi (b): the signal (first line) is transformed into a spectrogram (second line) and analyzed by the NN. The output of the NN is presented as a bidimensional matrix, called a probability raster (third line). The rows in the matrix represent time, framed in windows of 10 ms each; the columns show the probability of positive detection of each phenomenon. The raster is eventually post-processed to obtain boolean values indicating the presence or absence of phenomena for each frame (the fourth line)

Analysis

Results

A GS was used as a point of reference (100%) for tagging recordings performed by doctors and the NN. Therefore, confusion matrices could be analyzed: the values of recall (the proportion of actual positives that are correctly identified as such, also called sensitivity), precision (the fraction of relevant instances among the retrieved instances), specificity (the proportion of actual negatives that are correctly identified), and the F1-score (the harmonic mean of precision and recall) were measured for the doctors' and the NN's phenomena detection in comparison with the GS. First, the chi-square test (α = 0.05) was performed to investigate if there is a difference in the data gathered for doctors and the NN. The proposed null hypothesis was rejected for all four phenomena. Therefore, the results gathered for the doctors and the NN are statistically different. Detailed results are depicted in Table 2.

Table 2 Juxtaposition of recall (sensitivity), precision, specificity, and F1-score for doctors (pediatricians) and NN

                   Recall (sensitivity) (%)   Precision (%)      Specificity (%)    F1-score (%)
                   Doctors    NN              Doctors    NN      Doctors    NN      Doctors    NN
Coarse crackles    56.1       56.1            34.6       40.7    84.6       88.2    42.8       47.1
Fine crackles      72.3       83.9            39.5       52.5    69.8       79.3    51.1       64.6
Wheezes            58.1       78.2            66.1       57.7    90.7       82.2    61.8       66.4
Rhonchi            67.3       87.6            55.9       61.1    85.3       84.6    61.0       72.0
Mean               63.5       76.5            49.0       53.0    82.6       83.6    54.2       62.5

The lowest F1-score was observed for coarse crackles both in the case of medical and NN descriptions. This may be partially due to the rare occurrence of coarse crackles in the analyzed database (see Table 1). Moreover, this kind of phenomenon is often confused with other types of crackles or rhonchi (Hafke et al., submitted for publication), so its correct detection might be problematic. However, it is important to note that the NN F1-score, which is related to its performance in correct phenomena detection, is higher than in the case of medical descriptions (47.1% vs. 42.8%).

The highest F1-score was obtained for rhonchi and wheezes (both continuous, "musical" sounds). Medical descriptions for rhonchi agree with the GS at an F1-score of 61.0%, while the NN is much more accurate (72.0%). This is undeniable proof of the ambiguous character of rhonchi, which results in poor detection performance (probably caused by mistaking them for other phenomena, as evidenced by low precision and recall (sensitivity) values when compared to the NN).

When it comes to wheezes, despite the lower values of precision and specificity noted for the NN, its final performance, expressed in the F1-score value, is better than in the case of human tagging. The results are 61.8% for doctors and 66.4% for the NN.

It can also be noted that the AI-based analysis is more accurate in detecting rhonchi and wheezes. This may be due to the fact that it is based mainly on the spectrograms, which accurately reflect tonal content in a recording. For the doctors, descriptions are based mainly on acoustical cues, while the visual representation is used rather as an additional, supporting tool. This may be an important issue influencing the proper detection of pathology, especially when a phenomenon is of an ambiguous nature (e.g., rhonchi) or accompanied by louder sounds, which make it barely audible (e.g., silent wheezes).

The biggest differences in F1-scores, meaning a significant predominance of the new automatic system over doctors, are observed for fine crackles (64.6% vs. 51.1%). Also, all other parameters are higher for the NN.

Generally, for each of the four phenomena, the F1-score for the NN is higher than for doctors, by an average of 8.4 percentage points (p.p.), which clearly indicates the advantage of the tested algorithm over the group of doctors. The NN is on average 13 p.p. more sensitive and 4 p.p. more precise than the reference group of pediatricians.

Discussion

The main goal of this research was to investigate the effectiveness of pathological respiratory sounds detection for both doctors and the automatic analyzing system based on the NNs developed by the authors.

To measure the performances, the GS was established as a set of 522 recordings taken from the respiratory system of 50 pediatric patients and gathered during auscultation using electronic stethoscopes in real situations. Since auscultation tends to be subjective and there is not an objective measure of correctness, those recordings were then tagged (described) by doctors and experienced acousticians in terms of pathological phenomena content. The recordings with consistent taggings were taken as a point of reference. The inconsistent ones were described by a consilium (2 experienced pediatricians and one acoustician). Only positively verified recordings were used in the next steps of the experiment. In this way, a very reliable GS was established which was taken as a point of reference for the evaluation and comparison of the descriptions of both doctors and the newly developed NN. Since the statistical analysis showed that the performance of those two groups (the doctors and NN) is significantly different, it is reasonable to state that ML-based analysis that uses the NN algorithm introduced here is more efficient in detecting all four pathological phenomena (wheezes, rhonchi, and coarse and fine crackles), which is reflected in the high values of recall (sensitivity) and the F1-score. It is worth noting that the biggest difference between the performance of doctors and the NN was observed in the case of fine crackles, where the NN clearly outperformed the doctors. Moreover, it has to be mentioned that the NN performance is also higher than that of the doctors in the case of ambiguous sounds (i.e., rhonchi), which tend to be misinterpreted or evaluated in an improper way in everyday medical practice. Finally, the difference between the performance of the doctors and the NN was less significant when it came to the recognition of wheezes; however, this is just because the performance of doctors with those signals, which are easiest to interpret, is relatively high. Thus, the potential of the proposed solution seems to be enormous. It must be also emphasized that the NN algorithm was taught using thousands of recordings and taggings, which makes the results unique and reliable.

Conclusions

To conclude, the NN algorithms that were used in this experiment can be described as a very efficient tool for pathological sound detection. This is why AI may become a valuable support for doctors, medical students, or care providers (also lay ones), both when it comes to diagnosing or monitoring processes, on the one hand, and training or education on the other. The database we built is itself a very good tool in this field. Moreover, the AI algorithms can be also beneficial for lay people in terms of monitoring their respiratory system at home, which makes this solution valuable in many areas, e.g., patient safety, reaction speed in case of danger, and reducing the cost of treatment.

It also must be emphasized that there are many publications that correlate pathological sounds with particular disease; however, it is more complicated. There are many publications that show that the efficiency of physicians is very low [1, 10]; thus, the AI solution is a first step in making auscultation more objective, with less incorrect identification and thus better correlation with diseases made by physicians.

Finally, AI algorithms can also be used in other areas, such as heart disease, which makes this area even more promising, especially taking into account that this experiment was carried out in real conditions, not in a laboratory, and demonstrated the high performance of the NN.

Acknowledgments The Authors would like to thank the Management of Karol Jonscher Clinical Hospital in Poznań, Poznań University of Medical Sciences, and the physicians from Department of Pediatric Pneumology, Allergology and Clinical Immunology of this hospital for their help in recording and description of the acoustic signals.

Authors' contributions TG, MP, HH, and AB were responsible for conception and design. TG, MP, HH, AP, and JK performed data analysis and interpretation. AP, JK, RB, and MS were responsible for drafting the manuscript for important intellectual content. HH, JK, and AB were responsible for final approval of the version to be published. TG, RB, and MS were accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors read and approved the final manuscript.

Funding This work was supported by the grant from The National Centre of Research and Development (NCRD) in Poland and European Union under Sub-Measure 1.1.1 of the Operational Programme Smart Growth (PO IR) "Industrial R&D carried out by enterprises" (Fast Track), agreement no. POIR.01.01.01-00-0528/16-00.

Compliance with ethical standards

Ethics approval and consent to participate All studies were approved by the Bioethics Commission at Poznań University of Medical Sciences (approval number 193/18).

Consent for publication Not applicable.

Competing interests The authors declare that they have no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Aviles-Solis JC, Vanbelle S, Halvorsen PA (2017) International perception of lung sounds: a comparison of classification across some European borders. BMJ Open Respir Res 4:e000250
2. Bishop PJ (1980) Evolution of the stethoscope. J R Soc Med 73:448–456
3. Bohadana A, Izbicki G, Kraman SS (2014) Fundamentals of lung auscultation. N Engl J Med 370:744–751
4. Çakır E, Parascandolo G, Heittola T, Huttunen H, Virtanen T (2017) Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans Audio Speech Lang Process 25(6):1291–1303
5. Chamberlain D, Mofor J, Fletcher R, Kodgule R (2015) Mobile stethoscope and signal processing algorithms for pulmonary screening and diagnostics. In: IEEE Global Humanitarian Technology Conference (GHTC) IEEE 385–392
6. Crackles DK (2018) and Other lung sounds. In: Priftis KN, Hadjileontiadis LJ, Everard ML, eds. Breath sounds. 1st Edn. Springer International Publishing, pp. 225–236
7. Forgacs M (1978) The functional basis of pulmonary sounds. Chest 73:399–405
8. Marques A (2018) Normal versus adventitious respiratory sounds. In: Priftis KN, Hadjileontiadis LJ, Everard ML, eds. Breath sounds. 1st Edn. Springer International Publishing, pp. 181–206
9. Oweis RJ, Abdulhay EW, Khayal A, Awad A (2015) An alternative respiratory sounds classification system utilizing artificial neural networks. Biom J 38(2):153–161
10. Pasterkamp H, Brand PLP, Everard M, Garcia-Marcos L, Melbye H, Priftis KN (2016) Towards the standardisation of lung sound nomenclature. Eur Respir J 47(3):724–732
11. Reichert S, Gass R, Brandt C, Andrès E (2008) Analysis of respiratory sounds: state of the art. Clin Med Circ Respirat Pulm Med 2:45–58
12. Sarkar M, Madabhavi I, Niranjan N, Dogra M (2015) Auscultation of the respiratory system. Ann Thorac Med 10(3):158–168
13. Sestini P, Renzoni E, Rossi M, Beltrami V, Vagliasindi M (1995) Multimedia presentation of lung sounds as learning aid for medical students. Eur Respir J 8:783–788
14. Sovijärvi AR, Dalmasso F, Vanderschoot J et al (2000) Definition of terms for applications of respiratory sounds. Eur Respir Rev 10:597–610
15. Wilkins RL, Dexter JR, Murphy RL Jr et al (1990) Lung sound nomenclature survey. Chest 98:886–889

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
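As a supplementary illustration of the post-processing and evaluation steps described in the article (thresholding the probability raster, then scoring detections against the GS with recall, precision, specificity, and F1-score), the following minimal sketch may help. It is not the authors' implementation: the 0.5 threshold, the column order in `PHENOMENA`, the "any positive frame" aggregation rule, and all function names are assumptions introduced here for illustration only.

```python
from typing import Dict, List, Sequence

# Assumed column order of the raster; the article does not specify it.
PHENOMENA = ("wheezes", "rhonchi", "coarse crackles", "fine crackles")


def threshold_raster(raster: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> List[List[bool]]:
    """Binarize a probability raster (rows = 10 ms frames, columns = phenomena).

    The threshold value is an assumption; the article only says that the
    probabilities are thresholded.
    """
    return [[p >= threshold for p in frame] for frame in raster]


def recording_labels(bool_raster: Sequence[Sequence[bool]]) -> List[bool]:
    """Assumed aggregation: a phenomenon is present if any frame is positive."""
    return [any(frame[k] for frame in bool_raster)
            for k in range(len(PHENOMENA))]


def confusion_counts(predicted: Sequence[bool],
                     reference: Sequence[bool]) -> Dict[str, int]:
    """Count TP/FP/FN/TN for one phenomenon, using the GS as the reference."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            counts["tp"] += 1
        elif pred:
            counts["fp"] += 1
        elif ref:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def _ratio(numerator: float, denominator: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return numerator / denominator if denominator else 0.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return _ratio(2 * precision * recall, precision + recall)


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Recall (sensitivity), precision, specificity, and F1 as fractions."""
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": _ratio(tn, tn + fp),
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Toy raster with three frames; the values are made up for illustration.
    toy = [[0.1, 0.7, 0.2, 0.0], [0.6, 0.8, 0.1, 0.4], [0.2, 0.3, 0.1, 0.9]]
    print(dict(zip(PHENOMENA, recording_labels(threshold_raster(toy)))))
    # Consistency check against Table 2 (doctors, coarse crackles).
    print(round(f1_score(34.6, 56.1), 1))
```

Applied to the Table 2 values, `f1_score(34.6, 56.1)` gives about 42.8 and `f1_score(52.5, 83.9)` about 64.6, which agrees with the reported F1-scores for doctors on coarse crackles and for the NN on fine crackles.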

Journal

European Journal of Pediatrics (Springer Journals)

Published: Mar 29, 2019
