
Unlocking the Power of Artificial Intelligence and Big Data in Medicine

Lovis | J Med Internet Res 2019 | vol. 21 | iss. 11 | e16607 | https://www.jmir.org/2019/11/e16607

Data-driven science and its corollaries in machine learning and the wider field of artificial intelligence have the potential to drive important changes in medicine. However, medicine is not a science like any other: it is deeply and tightly bound to a large and wide network of legal, ethical, regulatory, economic, and societal dependencies. As a consequence, scientific and technological progress in handling information and its further processing and cross-linking for decision support and predictive systems must be accompanied by parallel changes in the global environment, involving numerous stakeholders, including citizens and society. What can be seen at first glance as a barrier and a mechanism slowing down the progression of data science must, however, be considered an important asset. Only global adoption can transform the potential of big data and artificial intelligence into effective breakthroughs in handling health and medicine. This requires science and society, scientists and citizens, to progress together.

(J Med Internet Res 2019;21(11):e16607) doi: 10.2196/16607

KEYWORDS
medical informatics; artificial intelligence; big data

Introduction

Most of the daily news and recently published scientific papers on research, innovations, and applications in artificial intelligence (AI) refer to what is known as machine learning: algorithms using massive amounts of data and various methodologies to find patterns, support decisions, make predictions, or, for the deep learning part, self-identify important features in data. However, AI is a complex concept to grasp, and most people have little understanding of what it really is.

AI was founded as an academic discipline in 1956 and, despite its youth, already has a rich history [1,2]. In more than 60 years of exploration and progress, AI has become a large field of research and development involving multidisciplinary approaches to address many challenges, from theoretical frameworks, methods, and tools to real implementations, risk analysis, and impact measures. The definition of AI is a moving target and changes over time with the evolution of the field.

Since its early days, the field of AI has produced many techniques supporting decision making and prediction as they are usually made by humans. As early as 1958, a perceptron was expected to be able "to walk, talk, see, write, reproduce itself and be conscious of its existence," which led to a large scientific controversy between the neural network and symbolic reasoning approaches [3].

The landscape of AI research includes knowledge representation and engineering; rule-based and symbolic reasoning; temporal reasoning and planning; sensing and perception; learning; evolutionary, emerging, and social behaviors; the ability to move and manipulate objects, to name the most important [4]; and deep machine learning with its autonomous feature extraction. This broad view is the one taken in this paper, acknowledging, however, that there is a more recent trend to restrict AI to the latter, autonomous deep machine learning. As a consequence of this wide landscape, the field draws at large on philosophy, mathematics, information sciences, computer science, psychology, anthropology, social sciences, linguistics, and many others.

For some experts and visionary people such as Ray Kurzweil, deep machine learning will allow the building of an artificial general intelligence that is able to develop itself autonomously, has the capacity to understand or learn any intellectual task that a human being can, and will even go far beyond the limits of human intelligence [5]. Most experts, however, would agree that some big pieces are still missing and that this is still a long way off, despite recent, potentially important advances in quantum computing [6]. A recent white paper published by the European Commission and authored by the members of the High-Level Expert Group on AI provides, in a few pages, a good overview of what AI is, its main capabilities, applicable expectations, and the disciplines involved [7].
Taking the field of AI at large, it is important to emphasize that AI is already broadly used in medicine today. Decision support based on knowledge engineering and rule-based systems is widely implemented in computerized provider order entry (CPOE) worldwide. Advanced signal processing is implemented in pacemakers and defibrillators to take decisions, in cochlear implants with man-machine interfaces, in electrocardiographs to provide signal analysis and automated diagnosis, and so on.

The AI field in itself is aspirational and is expected to contribute significantly to medicine, from research to citizen-centered health. Machine learning and deep learning have led the most recent major breakthroughs in AI, such as sound (speech and music) recognition, image (face, radiology, pathology, dermatology, etc) recognition, and gaming. Recently, image recognition has almost reached a level of maturity at which it can be used and developed by nonexperts in AI [8,9]. However, the hype around AI in these last few years has built high expectations and similarly high fears. There are still very few systems based on autonomous deep learning that have emerged widely in the commercial market.

The world of AI could roughly be summarized in three sequential and superposed acts:

Act 1: Humans teach machines to handle data and information.
Act 2: Humans teach expertise to machines.
Act 3: Humans teach machines to learn alone.

Challenges

There are many challenges that need to be addressed in the field of AI when it comes to medicine. Most of them are not exclusive to medicine and health, but their combination makes the goals significantly harder to reach.

Bayesian Trap

Medicine and health determinants, in general, are characterized by their usually fundamental Bayesian property. In the Bayesian probability approach, a prior probability is required to evaluate the strength of the prediction. Most of what is used in medicine, notably but not exclusively to establish a diagnosis, falls under the Bayesian approach. For example, if a person has a fever, what is the probability of it being the flu? If a person has a high blood sugar measurement, what is the probability of it being diabetes?

To illustrate the Bayesian trap, let us take a simple example: a pregnancy test. This is a simple test; it can be positive or negative. Let us imagine that we use a classical test with 99% sensitivity and 95% specificity. If tests are performed for 100 persons and 5 turn out to be positive, the question is to know how many of these persons are actually pregnant women. This question amounts to determining the positive predictive value of the test, that is, the probability that a positive test really signals the presence of the factor being tested. To answer it, we need to know the prior probability of being pregnant in the tested population. To understand this, imagine that 100% of the persons tested were men; in this case, none of the 5 positive tests would correspond to a pregnant woman. Similarly, if all persons tested were pregnant women, then all 5 positive tests would correspond to pregnant women. If the prior probability is around 1%, then applying the Bayesian rules returns a probability of being pregnant, given a positive test, of about 17%. This means that about 4 of 5 positive tests are false positives. At the other end, if the prior probability is around 20% (ie, a woman with several factors suggesting a potential pregnancy), the probability of a positive test being a true positive is above 80%; thus, less than one positive test out of five is a false positive. The example shows the major consequences of the prior probability in Bayesian situations.

This has direct consequences for AI: the models must take into account the prior probability in the population in which they are used. This should be better understood even when reporting results in the literature, which are often limited to specificity and sensitivity. Another consequence only becomes visible when several focused and near-to-perfect systems are used together in complex cases. For example, having many systems, each with its own false-positive rate, can end up with consolidated systems that accumulate all the false positives. This has been shown well with decision support systems in CPOE, which exhibit a very high rate of false-positive alerts, especially with patients receiving complex drug therapies [10,11].
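The arithmetic of the pregnancy-test example can be checked directly with Bayes' rule. The following sketch uses the test characteristics given in the text, and also illustrates the alert-accumulation effect described for CPOE, under the simplifying assumption that the alert systems fire independently:

```python
def ppv(prior, sensitivity, specificity):
    """Positive predictive value by Bayes' rule:
    P(condition | positive) = sens*prior / (sens*prior + (1-spec)*(1-prior))."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Test characteristics from the example: 99% sensitivity, 95% specificity.
print(f"PPV at  1% prior: {ppv(0.01, 0.99, 0.95):.0%}")  # about 17%
print(f"PPV at 20% prior: {ppv(0.20, 0.99, 0.95):.0%}")  # about 83%, ie, above 80%

# Stacking near-perfect systems accumulates false alarms: with 10 independent
# alert systems at a 5% false-positive rate each, a case with no true finding
# still triggers at least one alert roughly 40% of the time.
print(f"P(any false alert): {1 - 0.95 ** 10:.0%}")
```

The same formula shows why reporting only sensitivity and specificity, without the prior, says little about how a system will behave in a given population.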
Regulatory Labyrinth

Most diagnostic or therapeutic means used nowadays in medicine have to go through complex regulatory frameworks to get market approval. The regulatory agencies mostly base their decisions on safety, evidence, and added value. In addition, medicoeconomic assessments are often used by health agencies along various dimensions, such as quality-adjusted life years and the burden of disease, using indicators such as disability-adjusted life years [12-14]. These decisions thus have economic and legal consequences, including accountability. The role of regulatory agencies is being discussed, especially around topics that are getting into the market, such as image recognition [15]. For example, a call for inputs on "Artificial Intelligence and Machine Learning in Software as a Medical Device" has been launched by the Food and Drug Administration [16]. This is an important aspect, as regulatory agency support is an important asset in building trust, both for care professionals to use medical tools and for companies to invest in robust products ready for the market. However, this requires us to define a clear regulatory framework, appropriate evaluation processes, and benchmark tools without blocking innovation [17].

Education and Practice Gap

Medicine is a science with numerous tools and devices, from stethoscopes to scalpels, microscopes to scanners, scores, guidelines, etc. Most of these tools and devices require education, and sometimes very specific certification processes, for the care professionals who use them, not to speak of solid experience. This should also be the case for software, algorithms, and other decision support systems. However, it is not. Education in using software and in understanding systems as important as computerized patient records is often minimal. When it comes to big data and AI, education on the topic is worse, usually nonexistent. Only very few medical schools teach the use of AI to future health professionals. AI should become mandatory teaching in all medical schools in the world as a priority. Experts have been raising the question for 20 years, but it has received real focus only recently [18-22]. In 5-10 years, when current young students will be starting their clinical activities, machine learning based on data science will have become embedded in many activities, devices, and software, and its use, misuse, and overuse, along with the consequences for patients and accountability, will depend on how well users master it [23].
Data Quality Chiasm

Data quality is a recurring topic of discussion when it comes to big data and analytics. One of the characteristics of the big data era is that data are often used for a purpose that differs from the one that motivated their acquisition. This is a notable difference from traditional hypothetico-deductive scientific approaches in medicine, where a hypothesis leads to a methodology design, which itself leads to specific data acquisition. In the big data era, the primary goal of data-producing processes is often completely independent of the possible uses of the data. It is interesting to emphasize that long-term clinical cohorts and long-term biobanks face similar challenges. Designing long-term cohorts and building metadata frameworks and standard operating procedures for biobanking are important challenges, as they have to project usages that will be made years after the initial design.

These questions have led to a substantial literature addressing data quality and the secondary usage of clinical data. However, most of this work tries to describe dimensions able to assess the "intrinsic" or "absolute" quality of data [24-28]. Another approach could be to adopt a "fit-for-purpose" view, which considers only the quantitative and descriptive properties of data, allowing further processing. The "qualitative" properties of any dataset can only be assessed in conjunction with a specific secondary usage. This means that the same dataset will be appropriate to answer some scientific questions and not appropriate to answer others. The data are not "good" or "bad" by themselves; they are "good" or "bad" when used in a specific context: the "fit-for-purpose" assessment. This is one of the major objectives of the FAIR data initiative, which aims at ensuring "a posteriori" data usability (see below).

An unexpected consequence of the "data quality chiasm" is its influence on modifying acquisition processes, especially in clinical contexts. One often hears sentences such as "the quality of clinical data is not good enough for research." As a result, there is constant pressure to move toward more structured data acquisition processes. For example, the RECIST (Response Evaluation Criteria In Solid Tumors) guidelines are meant to standardize the radiologic evaluation criteria of solid tumors in oncological trial treatments. They have been developed successfully for trials. Using RECIST requires good experience to avoid interobserver variability, which can be as high as 20% [29-31]. The assessment has been adapted to reflect changes in radiological response, for example, in immunotherapies, where the size of tumors can increase despite a good therapeutic response [32]. Unfortunately, there is growing pressure to extend the use of RECIST and other similarly structured staging guidelines beyond clinical trials to all radiological staging, to improve the capacity to use standard clinical care for therapeutic assessment. As a consequence, this leads to high time pressure on the operational activities of radiology departments and to an increasing number of inexperienced people using these types of staging. With the progression of natural interfaces such as voice recognition and natural language processing, and their increased daily use in a growing number of devices, I would argue in favor of avoiding artificially structuring many data acquisition processes and keeping the data in their most natural form, exploiting more natural interactions such as voice and text, and developing strong natural language processing tools that can be applied to produce structured information in a postprocessing step. This will allow the reprocessing of all narratives whenever new structured resources are required.
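The "fit-for-purpose" view can be sketched in a few lines: the same records pass one requirement set and fail another, so "quality" is a property of the pairing of data and purpose. This is a hypothetical illustration, not an implementation of any published framework; the records, field names, and threshold are invented.

```python
# Invented clinical-style records; one lacks a timestamp.
records = [
    {"glucose": 5.4, "unit": "mmol/L", "timestamp": "2019-03-01T08:10"},
    {"glucose": 7.9, "unit": "mmol/L", "timestamp": None},
    {"glucose": 6.1, "unit": "mmol/L", "timestamp": "2019-03-01T09:05"},
]

def fit_for_purpose(records, required_fields, max_missing=0.0):
    """A record set 'fits' a purpose if the fields that purpose needs are
    present often enough; fields the purpose does not need are ignored."""
    missing = sum(
        1 for r in records
        if any(r.get(f) is None for f in required_fields)
    )
    return missing / len(records) <= max_missing

# A cross-sectional analysis only needs values and units: the data are "good".
print(fit_for_purpose(records, ["glucose", "unit"]))               # True
# A time-series analysis also needs timestamps: the same data are "bad".
print(fit_for_purpose(records, ["glucose", "unit", "timestamp"]))  # False
```

No intrinsic score is attached to the records themselves; only the declared purpose decides the verdict.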
Quest for Truth

Many aspects of the landscape of artificial intelligence require a good idea of what is true. Knowledge engineering builds the graph of the "known" or the "relevant," as is done in SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) or the Open Biological and Biomedical Ontology Foundry [33,34]. The same applies to rule-based techniques and symbolic reasoning, which need to be able to express rules, that is, truth in a formalized way, but also to supervised machine learning approaches, which require training sets that express truth, at least a probabilistic truth. There are many expectations around these approaches, especially when combining them [35,36], but all of them, except unsupervised deep machine learning, require some source of truth, which leads to the fundamental question of finding the sources of truth in the life sciences and the level of evidence supporting that truth. At first glance, it seems to be a trivial question. However, the "truth" is often "lost in text" because, for most of it, the sources rely on complex narratives that contextualize the messages they convey. In addition, the "truth" is very diluted. For example, with more than 2500 papers indexed daily in Medline/PubMed [37], it is nearly impossible for an expert to catch everything published in their own research field. Finally, and by nature, science is evolving, and thus what was scientifically "true" once may no longer be true today. For example, it was clear until recently that there are two types of lymphocytes: B cells and T cells. However, a recent paper by Ahmed et al [38] describes a new type of lymphocyte, bearing characteristics of both B and T cells, which may play a role in driving autoimmunity in some diseases such as diabetes. Sources of truth and their characterizations, such as the level of evidence or their context of use, are increasingly important. They should be available to all, similar to Cochrane [39]; cover all areas of the life sciences; be maintained; and be in machine-readable form.

Building Trust

In science, trust is strongly related to building evidence. Trust is important not only in the scientific community, but also at large, to build adoption, political support, and public acceptance. In summer 2019, a survey published by the Pew Research Center showed a positive trend among the public: science acts for the good, but with concerns about integrity, transparency, and bias. Overall, 86% of Americans say they have at least "a fair amount" of confidence in scientists [40]. One of the challenges is that scientific reliability has often been confused with trustworthiness [41]. Scientific evidence can be very strong, such as for immunization or Web-based health information, but trust can be much lower [42,43]. Many dimensions have been discussed in building trust in science, but they can be summarized in three concepts: one for the scientists and the organizations, one for the objects of the research, and one for the processes. Integrity comes first and is most important, covering scientific integrity, funding, conflicts of interest, etc. Transparency must be present for the motivation, the outcomes, and the process. Finally, the methodologies applied to handle the processes must be strong and robust.

Building evidence requires many dimensions to be taken into account, such as bias, generalizability, reproducibility, and explainability. Some challenges are more difficult in big data and AI. Proper control of data acquisition and flow is usually more difficult than in traditional controlled studies. The consequence is that the data have specific properties that are not always well managed, such as selection biases. Sometimes, the assumptions constraining the use of analytical tools are not well understood, such as homoscedasticity for many statistical tests. In addition, deep machine learning faces the challenges of precise reproducibility and explainability. The latter is currently the object of numerous works trying to understand intermediate representations of data in neural networks that can predict and explain their behavior. Explainability and interpretability are often used interchangeably. Interpretability is the extent to which it is possible to predict how the system will behave, given a change in input or algorithmic parameters. Explainability, on the other hand, is the extent to which the internal mechanics of the deep learning system can be understood and thus explained. Molnar [44] published a very good overview of the problem in an open book available on GitHub. However, explainability might not be the best road to raising global trust in deep machine learning approaches, especially when the explanations themselves are hard to explain. Other dimensions such as transparency, reproducibility, or uncertainty qualification might be more effective [45]. For example, in Science in 2018, Hutson [46] reported a survey of 400 artificial intelligence papers presented at major conferences, of which only 6% included the code for the algorithms and 30% the test data, thus considerably limiting the possibilities for reproducibility.
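Interpretability in the sense defined above, predicting how a system behaves when an input changes, can be probed empirically even on a black box. The following toy sketch uses an invented two-input threshold "model" standing in for a trained network; the weights and value ranges are assumptions for illustration only:

```python
import random

random.seed(0)

# Invented stand-in for a trained black-box model of two inputs.
def model(age, glucose):
    return 1.0 if 0.02 * age + 0.5 * glucose > 4.0 else 0.0

patients = [(random.uniform(20, 80), random.uniform(3, 12)) for _ in range(1000)]
baseline = [model(age, glu) for age, glu in patients]

def behavior_change(feature):
    """Fraction of outputs that flip when one input is shuffled across
    patients, leaving the other input untouched."""
    shuffled = [p[feature] for p in patients]
    random.shuffle(shuffled)
    flips = 0
    for (age, glu), s, out in zip(patients, shuffled, baseline):
        new = model(s, glu) if feature == 0 else model(age, s)
        flips += new != out
    return flips / len(patients)

# Here the output is far more sensitive to glucose than to age, which a user
# can learn without ever opening the model: interpretability without explainability.
print("flips when age is shuffled:    ", behavior_change(0))
print("flips when glucose is shuffled:", behavior_change(1))
```

Note what the probe does and does not deliver: it predicts behavior under input changes, but says nothing about the internal mechanics, which is exactly the distinction drawn in the text.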
FAIR Data Hope

The FAIR Guiding Principles are guidelines to make data discoverable and processable by both humans and machines. They were first published by Wilkinson et al [47] and are based on the set of criteria listed in Textbox 1.

Textbox 1. FAIR data criteria.

Findable
- (Meta)data are assigned a globally unique and persistent identifier
- Data are described with rich metadata (defined below)
- Metadata clearly and explicitly include the identifier of the data described
- (Meta)data are registered or indexed in a searchable resource

Accessible
- (Meta)data are retrievable by their identifier using a standardized communications protocol
- The protocol is open, free, and universally implementable
- The protocol allows for an authentication and authorization procedure, where necessary
- Metadata are accessible, even when the data are no longer available

Interoperable
- (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- (Meta)data use vocabularies that follow FAIR principles
- (Meta)data include qualified references to other (meta)data

Reusable
- (Meta)data are richly described with a plurality of accurate and relevant attributes:
  - (Meta)data are released with a clear and accessible data usage license
  - (Meta)data are associated with detailed provenance
  - (Meta)data meet domain-relevant community standards

Several frameworks have been defined to assess and evaluate compliance with the FAIR criteria, such as the FAIR maturity tools [48,49]. As such, FAIR data do not imply that the data are in the Open Data space [50]. Access can be restricted, as in the Harvard Dataverse, and this is an important point to emphasize. There might be many restrictions on making data or metadata available in the Open Data space, because of national regulation, privacy protection, intellectual property, etc. FAIR data do not make data available; they make data usable under the condition that use is authorized.

The FAIR initiative is crucial. It illustrates the movement of data from objects to assets initiated this last decade and described in the essay by Sabina Leonelli [51] recently published in Nature.

The initiative promotes the use of rich metadata frameworks compliant with standards and formal descriptions. It promotes the use of free and open resources for descriptive information and protocols. It allows us to build a framework of shareable resources that can be used for processing, without actually sharing the data in an open space. FAIR allows the building of a framework that is inclusive of all data sources, including those that are subject to authorization, with clear and open protocols.
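As a sketch of what machine-readable, FAIR-style metadata might look like, here is a hypothetical record patterned on the Textbox 1 criteria. The field names are invented for illustration and do not follow any standard FAIR schema; the point is that the metadata can be complete and checkable even when the data themselves stay behind an authorization barrier.

```python
# Hypothetical metadata record; field names are invented, not a standard schema.
record = {
    "identifier": "doi:10.0000/example.dataset.1",   # globally unique, persistent
    "title": "Glucose measurements, outpatient clinic",
    "license": "CC-BY-4.0",                          # clear, accessible usage license
    "provenance": "Extracted from hospital EHR, 2019-05; curated by data team",
    "vocabulary": "SNOMED CT",                       # shared knowledge representation
    "access_protocol": "https",                      # open, standardized protocol
    "data_available": False,  # FAIR metadata can outlive restricted or removed data
}

REQUIRED = ("identifier", "license", "provenance", "vocabulary", "access_protocol")

def fair_metadata_complete(record):
    """Check that the record carries the (meta)data attributes the criteria ask
    for, independently of whether the data themselves are openly accessible."""
    return all(record.get(field) for field in REQUIRED)

print(fair_metadata_complete(record))  # True even though data_available is False
```

This mirrors the point made above: FAIR does not make data open, it makes data findable and usable once access is authorized.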
Privacy – New Deal

In the era of big data, privacy requires special attention. The usual paradigms of limiting access to deidentified information are becoming less effective at protecting privacy. The increasing heterogeneity of data sources and richness of data about each of us, associated with data linkage techniques, strongly increase the possibility of reidentification, including of anonymized data [52-57]. The challenge and the potential impacts are even bigger for genetic information [58-60]. There is no good technical solution that can reconcile preserving privacy with the increasing need of data-driven science to access large genomic and phenotypic datasets, and there are many ongoing ethical and legal discussions [61-66]. Interestingly, this is not restricted to science, and the same applies to patients' needs for health information [67]. There is a need for better global education about the implications and risks for privacy, reaching citizens, policy makers, students, the research community, and all stakeholders. A recent scoping review has shown that the understanding of anonymization and deidentification is heterogeneous in the scientific community [68]. Discrimination is one of the major risks of privacy breaches, and disclosing private information can have many consequences [69-71], including for reimbursement and insurance coverage [72,73]. It is important to find the right path between naïve positivism and irrational paranoia. An important step forward is to improve the awareness and education of all stakeholders about privacy and the technical limitations to protecting it, and to build regulatory barriers to avoid discrimination.

Conclusions

AI and big data in medicine are only in their childhood stages; they grow up fast. Whether they grow up well is still an open question that the future will answer. However, they will not grow up well without our actively helping them do so. Several important initiatives contribute to this, such as the Global Alliance for Genomics and Health (GA4GH), an organization setting a policy and technical framework for respecting human rights to enable responsible genomic data sharing [74], or the European Union General Data Protection Regulation (GDPR) [75], which sets a completely novel privacy regulation for the European Union. Such initiatives are converging toward building a landscape that enables science while building trust and improving the protection of individual rights. I invite the readers to visit the JMIR Open Access collections available on the Web on the following topics: "Big Data," "Decision Support for Health Professionals," and "Artificial Intelligence" [76-78].

Conflicts of Interest

None declared.

References

1. Bringsjord S, Govindarajulu N. Artificial Intelligence. In: Stanford Encyclopedia of Philosophy Archive. Stanford, CA: Metaphysics Research Lab, Stanford University; 2018. URL: https://plato.stanford.edu/archives/fall2018/entries/artificial-intelligence/ [accessed 2019-09-22]
2. Wikipedia. Artificial intelligence. 2019. URL: https://en.wikipedia.org/w/index.php?title=Artificial_intelligence&oldid=916837449 [accessed 2019-09-22]
A Sociological Study of the Official History of the Perceptrons Controversy. Soc Stud Sci 2016 Jun 29;26(3):611-659. [doi: 10.1177/030631296026003005] 4. Luger G. Artificial intelligence: structures and strategies for complex problem solving. Boston, MA: Pearson Addison-Wesley; 2009:978. 5. Kurzweil R. The singularity is near: when humans transcend biology. New York: Viking; 2005:978. 6. Giles M. MIT Technology Review. Google researchers have reportedly achieved "quantum supremacy" URL: https://www. technologyreview.com/f/614416/google-researchers-have-reportedly-achieved-quantum-supremacy/ [accessed 2019-09-22] 7. High-Level EGOAI(. European Commission. A definition of Artificial Intelligence: main capabilities and scientific disciplines URL: https://ec.europa.eu/digital-single-market/en/news/ definition-artificial-intelligence-main-capabilities-and-scientific-disciplines [accessed 2019-09-22] 8. Godec P, Pancur M, Ilenic N, Copar A, Stražar M, Erjavec A, et al. Democratized image analytics by visual programming through integration of deep models and small-scale machine learning. Nat Commun 2019 Oct 7;10(1):7. [doi: 10.1038/s41467-019-12397-x] 9. Faes L, Wagner SK, Fu DJ, Liu X, Korot E, Ledsam JR, et al. Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study. The Lancet Digital Health 2019 Sep;1(5):e232-e242. [doi: 10.1016/s2589-7500(19)30108-6] 10. Carli D, Fahrni G, Bonnabry P, Lovis C. Quality of Decision Support in Computerized Provider Order Entry: Systematic Literature Review. JMIR Med Inform 2018 Jan 24;6(1):e3. [doi: 10.2196/medinform.7170] 11. Jung M, Hoerbst A, Hackl WO, Kirrane F, Borbolla D, Jaspers MW, et al. Attitude of Physicians Towards Automatic Alerting in Computerized Physician Order Entry Systems. Methods Inf Med 2018 Jan 20;52(02):99-108. [doi: 10.3414/me12-02-0007] 12. Neumann PJ, Sanders GD. Cost-Effectiveness Analysis 2.0. 
N Engl J Med 2017 Jan 19;376(3):203-205. [doi: 10.1056/NEJMp1612619] [Medline: 28099837] 13. Sanders GD, Neumann PJ, Basu A, Brock DW, Feeny D, Krahn M, et al. Recommendations for Conduct, Methodological Practices, and Reporting of Cost-effectiveness Analyses. JAMA 2016 Sep 13;316(10):1093. [doi: 10.1001/jama.2016.12195] 14. Mokdad AH, Ballestros K, Echko M, Glenn S, Olsen HE, Mullany E, et al. The State of US Health, 1990-2016. JAMA 2018 Apr 10;319(14):1444. [doi: 10.1001/jama.2018.0158] 15. Allen B. The Role of the FDA in Ensuring the Safety and Efficacy of Artificial Intelligence Software and De vices. J Am Coll Radiol 2019 Feb;16(2):208-210. [doi: 10.1016/j.jacr.2018.09.007] [Medline: 30389329] 16. U.S. Food and Drug Administration. 2019 Jun 26. Artificial Intelligence and Machine Learning in Software as a Medical Device URL: http://www.fda.gov/medical-devices/software-medical-device-samd/ artificial-intelligence-and-machine-learning-software-medical-device [accessed 2019-09-22] 17. Parikh RB, Obermeyer Z, Navathe AS. Regulation of predictive analytics in medicine. Science 2019 Feb 22;363(6429):810-812 [FREE Full text] [doi: 10.1126/science.aaw0029] [Medline: 30792287] 18. Chan KS, Zary N. Applications and Challenges of Implementing Artificial Intelligence in Medical Education: Integrative Review. JMIR Med Educ 2019 Jun 15;5(1):e13930 [FREE Full text] [doi: 10.2196/13930] [Medline: 31199295] 19. Masters K. Artificial intelligence in medical education. Med Teach 2019 Sep;41(9):976-980. [doi: 10.1080/0142159X.2019.1595557] [Medline: 31007106] https://www.jmir.org/2019/11/e16607 J Med Internet Res 2019 | vol. 21 | iss. 11 | e16607 | p. 6 (page number not for citation purposes) XSL FO RenderX JOURNAL OF MEDICAL INTERNET RESEARCH Lovis 20. Wartman SA, Combs CD. Reimagining Medical Education in the Age of AI. AMA J Ethics 2019 Feb 01;21(2):E146-E152 [FREE Full text] [doi: 10.1001/amajethics.2019.146] [Medline: 30794124] 21. Data Science Institute. 
Training Medical Students and Residents for the AI Future URL: https://www.acrdsi.org/Blog/ Medical-schools-must-prepare-trainees [accessed 2019-09-22] 22. Lillehaug SI, Lajoie SP. AI in medical education--another grand challenge for medical informatics. Artif Intell Med 1998 Mar;12(3):197-225. [Medline: 9626957] 23. Kolachalama VB, Garg PS. Machine learning and medical education. NPJ Digit Med 2018 Sep 27;1(1):54 [FREE Full text] [doi: 10.1038/s41746-018-0061-1] [Medline: 31304333] 24. Stausberg J, Bauer U, Nasseh D, Pritzkuleit R, Schmidt C, Schrader T, et al. Indicators of data quality: review and requirements from the perspective of networked medical research. GMS Med Inform Biom Epidemiol 2019;15(1):05. 25. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013 Jan 01;20(1):144-151 [FREE Full text] [doi: 10.1136/amiajnl-2011-000681] [Medline: 22733976] 26. Weiner MG. Toward Reuse of Clinical Data for Research and Quality Improvement: The End of the Beginning? Ann Intern Med 2009 Sep 01;151(5):359. [doi: 10.7326/0003-4819-151-5-200909010-00141] 27. Sandhu E, Weinstein S, McKethan A, Jain SH. Secondary Uses of Electronic Health Record Data: Benefits and Barriers. The Joint Commission Journal on Quality and Patient Safety 2012 Jan;38(1):34-40. [doi: 10.1016/s1553-7250(12)38005-7] 28. Hersh W. Adding value to the electronic health record through secondary use of data for quality assurance, research, and surveillance. Am J Manag Care 2007 Jun;13(6 Part 1):277-278 [FREE Full text] [Medline: 17567224] 29. Jiang B, Zhou D, Sun Y, Wang J. Systematic analysis of measurement variability in lung cancer with multidetector computed tomography. Ann Thorac Med 2017;12(2):95-100 [FREE Full text] [doi: 10.4103/1817-1737.203750] [Medline: 28469719] 30. Bellomi M, De Piano F, Ancona E, Lodigiani AF, Curigliano G, Raimondi S, et al. 
Evaluation of inter-observer variability according to RECIST 1.1 and its influence on response classification in CT measurement of liver metastases. Eur J Radiol 2017 Oct;95:96-101. [doi: 10.1016/j.ejrad.2017.08.001] [Medline: 28987705]
31. Yoon SH, Kim KW, Goo JM, Kim D, Hahn S. Observer variability in RECIST-based tumour burden measurements: a meta-analysis. Eur J Cancer 2016 Jan;53:5-15. [doi: 10.1016/j.ejca.2015.10.014] [Medline: 26687017]
32. Seymour L, Bogaerts J, Perrone A, Ford R, Schwartz LH, Mandrekar S, RECIST working group. iRECIST: guidelines for response criteria for use in trials testing immunotherapeutics. Lancet Oncol 2017 Mar;18(3):e143-e152 [FREE Full text] [doi: 10.1016/S1470-2045(17)30074-8] [Medline: 28271869]
33. The OBO Foundry. URL: http://www.obofoundry.org/ [accessed 2019-10-05]
34. SNOMED International. URL: https://www.snomed.org/snomed-international/who-we-are [accessed 2018-06-27]
35. Skevofilakas M, Zarkogianni K, Karamanos B, Nikita K. A hybrid Decision Support System for the risk assessment of retinopathy development as a long term complication of Type 1 Diabetes Mellitus. Conf Proc IEEE Eng Med Biol Soc 2010;2010:6713-6716. [doi: 10.1109/IEMBS.2010.5626245] [Medline: 21096083]
36. Hoehndorf R, Queralt-Rosinach N. Data Science and symbolic AI: Synergies, challenges and opportunities. Data Science 2017 Oct 17:1-12. [doi: 10.3233/ds-170004]
37. NIH: U.S. National Library of Medicine. Detailed Indexing Statistics: 1965-2017 URL: https://www.nlm.nih.gov/bsd/index_stats_comp.html [accessed 2017-10-08]
38. Ahmed R, Omidian Z, Giwa A, Cornwell B, Majety N, Bell DR, et al. A Public BCR Present in a Unique Dual-Receptor-Expressing Lymphocyte from Type 1 Diabetes Patients Encodes a Potent T Cell Autoantigen. Cell 2019 May;177(6):1583-1599.e16. [doi: 10.1016/j.cell.2019.05.007]
39. Cochrane. URL: https://www.cochrane.org/ [accessed 2019-10-05]
40. Pew Research Center: Science & Society.
Trust and Mistrust in Americans' Views of Scientific Experts URL: https://www.pewresearch.org/science/2019/08/02/trust-and-mistrust-in-americans-views-of-scientific-experts/ [accessed 2019-10-05]
41. Kerasidou A. Trust me, I'm a researcher!: The role of trust in biomedical research. Med Health Care Philos 2017 Mar;20(1):43-50 [FREE Full text] [doi: 10.1007/s11019-016-9721-6] [Medline: 27638832]
42. Larson HJ, Clarke RM, Jarrett C, Eckersberger E, Levine Z, Schulz WS, et al. Measuring trust in vaccination: A systematic review. Hum Vaccin Immunother 2018 Jul 03;14(7):1599-1609 [FREE Full text] [doi: 10.1080/21645515.2018.1459252] [Medline: 29617183]
43. Sbaffi L, Rowley J. Trust and Credibility in Web-Based Health Information: A Review and Agenda for Future Research. J Med Internet Res 2017 Jun 19;19(6):e218 [FREE Full text] [doi: 10.2196/jmir.7579] [Medline: 28630033]
44. Molnar C. Interpretable Machine Learning. 2019. URL: https://christophm.github.io/interpretable-ml-book/ [accessed 2019-10-18]
45. Begoli E, Bhattacharya T, Kusnezov D. The need for uncertainty quantification in machine-assisted medical decision making. Nat Mach Intell 2019 Jan 7;1(1):20-23. [doi: 10.1038/s42256-018-0004-1]
46. Hutson M. Artificial intelligence faces reproducibility crisis. Science 2018 Feb 16;359(6377):725-726. [doi: 10.1126/science.359.6377.725] [Medline: 29449469]
47. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016 Mar 15;3(1):160018 [FREE Full text] [doi: 10.1038/sdata.2016.18] [Medline: 26978244]
48. GitHub. FAIR Metrics 2019 URL: https://github.com/FAIRMetrics/Metrics [accessed 2019-10-05]
49. Publications Office of the European Union.
Turning FAIR into reality: final report and action plan from the European Commission expert group on FAIR data URL: https://publications.europa.eu/en/publication-detail/-/publication/7769a148-f1f6-11e8-9982-01aa75ed71a1 [accessed 2019-07-04]
50. Harvard University. 2019. Towards an integrated and FAIR Research Data Management @Harvard URL: https://scholar.harvard.edu/mercecrosas/presentations/towards-integrated-and-fair-research-data-management-harvard [accessed 2019-04-10]
51. Leonelli S. Data — from objects to assets. Nature 2019 Oct 15;574(7778):317-320. [doi: 10.1038/d41586-019-03062-w]
52. Malin B. Re-identification of familial database records. AMIA Annu Symp Proc 2006:524-528 [FREE Full text] [Medline: 17238396]
53. El Emam K, Jonker E, Arbuckle L, Malin B. A systematic review of re-identification attacks on health data. PLoS One 2011 Dec 2;6(12):e28071 [FREE Full text] [doi: 10.1371/journal.pone.0028071] [Medline: 22164229]
54. Dankar FK, El Emam K, Neisa A, Roffey T. Estimating the re-identification risk of clinical data sets. BMC Med Inform Decis Mak 2012 Jul 09;12(1):66 [FREE Full text] [doi: 10.1186/1472-6947-12-66] [Medline: 22776564]
55. Benitez K, Malin B. Evaluating re-identification risks with respect to the HIPAA privacy rule. J Am Med Inform Assoc 2010;17(2):169-177 [FREE Full text] [doi: 10.1136/jamia.2009.000026] [Medline: 20190059]
56. Barth-Jones D. The 'Re-identification' of Governor William Weld's Medical Information. Soc Sci Res Netw 2012:19.
57. El Emam K, Dankar FK, Vaillancourt R, Roffey T, Lysyk M. Evaluating the Risk of Re-identification of Patients from Hospital Prescription Records. Can J Hosp Pharm 2009 Jul;62(4):307-319 [FREE Full text] [Medline: 22478909]
58. Kulynych J, Greely HT. Clinical genomics, big data, and electronic medical records: reconciling patient rights with research when privacy and science collide. J Law Biosci 2017 Apr;4(1):94-132 [FREE Full text] [doi: 10.1093/jlb/lsw061] [Medline: 28852559]
59.
Mascalzoni D, Paradiso A, Hansson M. Rare disease research: Breaking the privacy barrier. Appl Transl Genom 2014 Jun 01;3(2):23-29 [FREE Full text] [doi: 10.1016/j.atg.2014.04.003] [Medline: 27275410]
60. Zaaijer S, Gordon A, Speyer D, Piccone R, Groen S, Erlich Y. Rapid re-identification of human samples using portable DNA sequencing. eLife 2017;6:e27798. [doi: 10.7554/eLife.27798]
61. Ienca M, Ferretti A, Hurst S, Puhan M, Lovis C, Vayena E. Considerations for ethics review of big data health research: A scoping review. PLoS One 2018 Oct 11;13(10):e0204937. [doi: 10.1371/journal.pone.0204937]
62. Loukides G, Denny JC, Malin B. The disclosure of diagnosis codes can breach research participants' privacy. J Am Med Inform Assoc 2010;17(3):322-327 [FREE Full text] [doi: 10.1136/jamia.2009.002725] [Medline: 20442151]
63. Malin B, Benitez K, Masys D. Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule. J Am Med Inform Assoc 2011;18(1):3-10 [FREE Full text] [doi: 10.1136/jamia.2010.004622] [Medline: 21169618]
64. El Emam K, Moher E. Privacy and anonymity challenges when collecting data for public health purposes. J Law Med Ethics 2013 Mar;41 Suppl 1:37-41. [doi: 10.1111/jlme.12036] [Medline: 23590738]
65. Thorogood A, Zawati M. International Guidelines for Privacy in Genomic Biobanking (or the Unexpected Virtue of Pluralism). J Law Med Ethics 2015;43(4). [doi: 10.1111/jlme.12312]
66. O'Neill L, Dexter F, Zhang N. The Risks to Patient Privacy from Publishing Data from Clinical Anesthesia Studies. Anesth Analg 2016 Dec;122(6):2017-2027. [doi: 10.1213/ANE.0000000000001331] [Medline: 27172145]
67. Househ M, Grainger R, Petersen C, Bamidis P, Merolli M. Balancing Between Privacy and Patient Needs for Health Information in the Age of Participatory Health and Social Media: A Scoping Review. Yearb Med Inform 2018 Aug;27(1):29-36 [FREE Full text] [doi: 10.1055/s-0038-1641197] [Medline: 29681040]
68.
Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. J Med Internet Res 2019 May 31;21(5):e13484. [doi: 10.2196/13484]
69. Dixon P. A Failure to "Do No Harm" -- India's Aadhaar biometric ID program and its inability to protect privacy in relation to measures in Europe and the U.S. Health Technol (Berl) 2017;7(4):539-567 [FREE Full text] [doi: 10.1007/s12553-017-0202-6] [Medline: 29308348]
70. Rengers JM, Heyse L, Otten S, Wittek RPM. "It's Not Always Possible to Live Your Life Openly or Honestly in the Same Way" - Workplace Inclusion of Lesbian and Gay Humanitarian Aid Workers in Doctors Without Borders. Front Psychol 2019 Feb 27;10:320 [FREE Full text] [doi: 10.3389/fpsyg.2019.00320] [Medline: 30873072]
71. Stangl AL, Lloyd JK, Brady LM, Holland CE, Baral S. A systematic review of interventions to reduce HIV-related stigma and discrimination from 2002 to 2013: how far have we come? J Int AIDS Soc 2013 Nov 13;16(3 Suppl 2):18734 [FREE Full text] [Medline: 24242268]
72. Bélisle-Pipon J, Vayena E, Green RC, Cohen IG. Genetic testing, insurance discrimination and medical research: what the United States can learn from peer countries. Nat Med 2019 Aug;25(8):1198-1204. [doi: 10.1038/s41591-019-0534-z] [Medline: 31388181]
73. Bardey D, De Donder P, Mantilla C. How is the trade-off between adverse selection and discrimination risk affected by genetic testing? Theory and experiment. J Health Econ 2019 Sep 30;68:102223. [doi: 10.1016/j.jhealeco.2019.102223] [Medline: 31581025]
74. Global Alliance for Genomics & Health. URL: https://www.ga4gh.org/ [accessed 2019-10-05]
75. EU GDPR.ORG. URL: https://eugdpr.org/ [accessed 2019-10-05]
76. JMIR Medical Informatics.
E-collection 'Big Data' URL: https://medinform.jmir.org/themes/183
77. JMIR Medical Informatics. E-collection 'Decision Support for Health Professionals' URL: https://medinform.jmir.org/themes/186
78. Journal of Medical Internet Research. E-collection 'Artificial Intelligence (other than Clinical Decision Support)' URL: https://www.jmir.org/themes/797

Abbreviations
AI: artificial intelligence
CPOE: computerized provider order entry
GA4GH: Global Alliance for Genomics and Health
GDPR: General Data Protection Regulation
RECIST: Response Evaluation Criteria In Solid Tumors
SNOMED CT: Systematized Nomenclature of Medicine - Clinical Terms

Edited by G Eysenbach; submitted 09.10.19; peer-reviewed by A Benis, D Gunasekeran; comments to author 13.10.19; revised version received 18.10.19; accepted 20.10.19; published 08.11.19

Please cite as:
Lovis C
Unlocking the Power of Artificial Intelligence and Big Data in Medicine
J Med Internet Res 2019;21(11):e16607
URL: https://www.jmir.org/2019/11/e16607
doi: 10.2196/16607
PMID: 31702565

©Christian Lovis. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 08.11.2019. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.

Unlocking the Power of Artificial Intelligence and Big Data in Medicine

Journal of Medical Internet Research, Volume 21 (11) – Nov 8, 2019

 


Publisher: JMIR Publications
Copyright: © The Author(s). Licensed under Creative Commons Attribution (CC BY 4.0)
ISSN: 1438-8871
DOI: 10.2196/16607

Abstract

Data-driven science and its corollaries in machine learning and the wider field of artificial intelligence have the potential to drive important changes in medicine. However, medicine is not a science like any other: it is deeply and tightly bound with a large and wide network of legal, ethical, regulatory, economic, and societal dependencies. As a consequence, the scientific and technological progress in handling information and its further processing and cross-linking for decision support and predictive systems must be accompanied by parallel changes in the global environment, with numerous stakeholders, including citizens and society. What can be seen at first glance as a barrier and a mechanism slowing down the progression of data science must, however, be considered an important asset. Only global adoption can transform the potential of big data and artificial intelligence into effective breakthroughs in handling health and medicine. This requires science and society, scientists and citizens, to progress together.

(J Med Internet Res 2019;21(11):e16607) doi: 10.2196/16607

KEYWORDS
medical informatics; artificial intelligence; big data

Introduction

Most of the daily news and recently published scientific papers on research, innovations, and applications in artificial intelligence (AI) refer to what is known as machine learning—algorithms using massive amounts of data and various methodologies to find patterns, support decisions, make predictions, or, for the deep learning part, self-identify important features in data. However, AI is a complex concept to grasp, and most people have little understanding of what it really is.

AI was founded as an academic discipline in 1956 and, despite its youth, already has a rich history [1,2]. In more than 60 years of exploration and progress, AI has become a large field of research and development involving multidisciplinary approaches to address many challenges, from theoretical frameworks, methods, and tools to real implementations, risk analysis, and impact measures. The definition of AI is a moving target and changes over time with the evolution of the field. Since its early days, the field of AI has allowed the development of many techniques supporting decision making and prediction, as usually made by humans. As early as 1958, a perceptron was expected to be able "to walk, talk, see, write, reproduce itself and be conscious of its existence," which led to a large scientific controversy between neural network and symbolic reasoning approaches [3].

The landscape of AI research includes knowledge representation and engineering; rule-based and symbolic reasoning; temporal reasoning and planning; sensing and perception; learning; evolutionary, emerging, and social behaviors; and the ability to move and manipulate objects, to name the most important [4]. This broad view, rather than the most visible recent subfield (deep machine learning with autonomous feature extraction), is the point of view taken in this paper, acknowledging, however, that there is a more recent trend to restrict AI to the latter. As a consequence of this wide landscape, the field draws at large on philosophy, mathematics, information sciences, computer science, psychology, anthropology, social sciences, linguistics, and many others.

For some experts and visionary people such as Ray Kurzweil, deep machine learning will allow the building of an artificial general intelligence that is able to develop itself autonomously and has the capacity to understand or learn any intellectual task that a human being can, and even to go far beyond the limits of human intelligence [5]. Most experts, however, would agree that some big pieces are still missing and that this is still a long way off, despite recent, potentially important advances in quantum computing [6]. A recent white paper published by the European Commission and authored by the members of the High-Level Expert Group on AI provides, in a few pages, a good overview of what AI is, its main capabilities, applicable expectations, and the disciplines involved [7].

Taking the field of AI at large, it is important to emphasize that AI is already broadly used today in medicine. Decision support based on knowledge engineering and rule-based systems is implemented widely in computerized provider order entry (CPOE) worldwide. Advanced signal processing is implemented in pacemakers and defibrillators to take decisions, in cochlear implants with man-machine interfaces, in electrocardiograms to provide signal analysis and automated diagnosis, etc.

The AI field in itself is aspirational and is expected to contribute significantly to medicine, from research to citizen-centered health. Machine learning and deep learning have led to the most recent major breakthroughs in AI, such as sound (speech and music) recognition, image (face, radiology, pathology, dermatology, etc) recognition, and gaming. Recently, image recognition has almost reached a level of maturity through which it can be used and developed by nonexperts in AI [8,9]. However, the hype around AI in these last few years has built high expectations and similarly high fears. There are still very few systems based on autonomous deep learning that have emerged widely in the commercial market.

The world of AI could roughly be summarized in three sequential and superposed acts:

Act 1: Humans teach machines to handle data and information.
Act 2: Humans teach expertise to machines.
Act 3: Humans teach machines to learn alone.

Challenges

There are many challenges that need to be addressed in the field of AI when it comes to medicine. Most of them are not exclusive to medicine and health, but their addition makes the goals significantly harder to reach.

Bayesian Trap

Medicine and health determinants, in general, are characterized by their usually fundamental Bayesian property. In the Bayesian probability approach, a prior probability is required to evaluate the strength of the prediction. Most of what is used in medicine, notably but not exclusively to establish a diagnosis, falls in the Bayesian approach. For example, if a person has a fever, what is the probability of it being the flu? If a person has a high measurement of blood sugar, what is the probability of it being diabetes?

To illustrate the Bayesian trap, let us take a simple example—a pregnancy test. This is a simple test; it can be positive or negative. Let us imagine that we use a classical test, which has 99% sensitivity and 95% specificity. If 100 tests are performed for 100 persons and 5 turn out to be positive, the question is to know how many of them correspond to pregnant women. This question is defined as determining the positive predictive value of a positive test, and it gives the probability of a positive test to really signal the presence of the factor the test is testing. To answer this question, we need to know the prior probability of being pregnant in the tested population. To understand this, imagine that only men were tested; in this case, none of the 5 positive tests would correspond to a pregnant woman. Similarly, if all persons tested are pregnant women, then all 5 positive tests would correspond to pregnant women. If the prior probability is around 1%, then applying the Bayesian rules returns that the probability of being pregnant when a test is reported positive is about 17%. This means that about 4 of 5 positive tests are false positives. At the other end, if the prior probability is around 20% (ie, a woman with several factors suggesting a potential pregnancy), the probability of a positive test being a true positive is above 80%. Thus, less than one positive test out of five is a false positive. The example shows the major consequences of the prior probability in Bayesian situations.

These are important consequences for AI: the models must take into account the prior probability in the population in which they are used. This should be better understood, even when reporting results in the literature, which are often limited to specificity and sensitivity. Another consequence only becomes visible when several focused and near-to-perfect systems are used together in complex cases. For example, having many systems, each with its own false-positive rate, can end up with consolidated systems that have the sum of all false positives. This has been shown well with decision support systems in CPOE, with a very high rate of false-positive alerts, especially with patients receiving complex drug therapies [10,11].

Regulatory Labyrinth

Most diagnostic or therapeutic means used nowadays in medicine have to go through complex regulatory frameworks to get market approval. The regulatory agencies mostly base their decisions on safety, evidence, and added value. In addition, medicoeconomic assessments are often used by health agencies according to various dimensions such as quality-adjusted life years and the burden of the disease, using indicators such as disability-adjusted life years [12-14]. These decisions thus have economic and legal consequences, including accountability. The role of regulatory agencies is discussed, especially around topics that are getting into the market, such as image recognition [15]. For example, a call for inputs for "Artificial Intelligence and Machine Learning in Software as a Medical Device" has been launched by the US Food and Drug Administration [16]. This is an important aspect, as regulatory agency support is an important asset in building trust, for most care professionals to use medical tools and for companies to invest in robust products ready for the market. However, this requires us to define a clear regulatory framework, appropriate evaluation processes, and benchmark tools, without blocking innovation [17].
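As a numerical check of the pregnancy-test example in the Bayesian Trap section above, the sketch below applies Bayes' rule directly. It also illustrates how false-positive alerts compound when several decision support systems run together; the 5% per-system alert rate and the independence assumption are illustrative choices of this sketch, not figures from the paper.

```python
# Positive predictive value (PPV) of a test via Bayes' rule.
# sensitivity = P(test+ | condition), specificity = P(test- | no condition),
# prior = P(condition) in the tested population.
def positive_predictive_value(sensitivity, specificity, prior):
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# The pregnancy test above: 99% sensitivity, 95% specificity.
print(positive_predictive_value(0.99, 0.95, 0.01))  # prior 1%  -> ~0.17
print(positive_predictive_value(0.99, 0.95, 0.20))  # prior 20% -> ~0.83

# Compounding of alerts: probability that at least one of n independent
# systems raises a false alarm on a condition-free case.
def p_any_false_alert(fp_rate, n_systems):
    return 1.0 - (1.0 - fp_rate) ** n_systems

print(p_any_false_alert(0.05, 10))  # ten systems at 5% each -> ~0.40
```

With a 1% prior, the PPV is about 17%, matching the order of magnitude stated above; with a 20% prior it exceeds 80%. Even ten "near-to-perfect" independent systems at 5% false positives each would alert wrongly on roughly 40% of condition-free cases.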
Education and Practice Gap

Medicine is a science with numerous tools and devices, from stethoscopes to scalpels, microscopes to scanners, scores, guidelines, etc. Most of these tools and devices require education, and sometimes very specific certification processes, for the care professionals who use them, not to speak of solid experience. This should also be the case for software, algorithms, and other decision support systems. However, it is not. Education in the use of software and in understanding systems as important as the computerized patient record is often minimal. When it comes to big data and AI, education on the topic is worse, usually inexistent. Only very few medical schools teach the use of AI to future health professionals. AI should become mandatory teaching in all medical schools in the world as a priority. Experts have been raising the question for 20 years, but it has received real focus only recently [18-22]. In 5-10 years, when current young students will be starting their clinical activities, machine learning based on data science will have become embedded in many activities, devices, and software, and its use, misuse, and overuse, with their consequences on patients and accountability, will depend on how well users master it [23].

Data Quality Chiasm

Data quality is a recurring topic of discussion when it comes to big data and analytics. One of the characteristics of the big data era is that data are often used for a purpose that differs from the one that motivated their acquisition. This is a notable difference with traditional hypothetico-deductive scientific approaches in medicine, where a hypothesis leads to a methodology design, which itself leads to specific data acquisition. In the big data era, the primary goal of data-producing processes is often completely independent from the possible uses of the data. It is interesting to emphasize that long-term clinical cohorts and long-term biobanks face similar challenges: designing long-term cohorts and building metadata frameworks and standard operating procedures for biobanking are important challenges, as they have to project usages that will be made years after the initial design.

These questions have led to a substantial literature addressing data quality and the secondary usage of clinical data. However, most of this work tries to describe dimensions able to assess the "intrinsic" or "absolute" quality of data [24-28]. Another approach could be to adopt a "fit-for-purpose" view, which considers only the quantitative and descriptive properties of data, allowing further processing. The "qualitative" properties of any dataset can only be assessed in conjunction with a specific secondary usage. This means that the same dataset will be appropriate to answer some scientific questions and not appropriate to answer others. The data are not "good" or "bad" by themselves; they are "good" or "bad" when used in a specific context: the "fit-for-purpose" assessment. This is one of the major objectives of the FAIR data initiative, which aims at ensuring "a posteriori" data usability (see below).

An unexpected consequence of the "data quality chiasm" is its influence on modifying acquisition processes, especially in clinical contexts. One often hears sentences such as "the quality of clinical data is not good enough for research." As a result, there is constant pressure to move toward more structured data acquisition processes. For example, the RECIST (Response Evaluation Criteria In Solid Tumors) guidelines are meant to standardize the radiologic evaluation criteria of solid tumors in oncological trial treatments. This has been successfully developed for trials. Use of RECIST requires good experience to avoid interobserver variability, which can be as high as 20% [29-31]. This assessment has been adapted to reflect changes in radiological response, for example, in immunotherapies, where the size of tumors can increase despite a good therapeutic response [32]. Unfortunately, there is growing pressure to extend the use of RECIST and other similarly structured staging guidelines beyond clinical trials to all radiological staging, to improve the capacity to use standard clinical care for therapeutic assessment. As a consequence, this leads to high time pressure on the operational activities of radiology departments and to an increasing number of inexperienced people using these types of staging. With the progression of natural interfaces such as voice recognition and natural language processing, and their increased daily use in a growing number of devices, I would argue in favor of avoiding the artificial structuring of many data acquisition processes and keeping the data in their most natural form, exploiting more natural interactions such as voice and text and developing strong natural language processing tools that can be applied to produce structured information in a postprocessing step. This will allow the reprocessing of all narratives whenever new structured resources are required.

Quest for Truth

Many aspects of the landscape of artificial intelligence require a good idea of what is true. Knowledge engineering builds the graph of the "known" or the "relevant," such as is made in SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) or the Open Biological and Biomedical Ontology Foundry [33,34]. The same applies to rule-based techniques and symbolic reasoning, which need to be able to express rules, that is, truth, in a formalized way, but also to supervised machine learning approaches, which require training sets that express truth, at least a probabilistic truth. There are a lot of expectations in these approaches, especially when combining them [35,36], but all of them, except unsupervised deep machine learning, require some sources of truth. This leads to the fundamental question of finding the sources of truth in life sciences and the level of evidence supporting that truth. At first glance, it seems to be a trivial question. However, the "truth" is often "lost in text" because, for most of it, the sources rely on complex narratives that contextualize the messages they convey. In addition, the "truth" is very diluted. For example, with more than 2500 papers indexed daily in Medline/PubMed [37], it is nearly impossible for an expert to catch everything published in their own research field. Finally, and by nature, science is evolving; thus, the scientific "truth" of what was true once may no longer be true today.
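Returning to the "fit-for-purpose" assessment described in the Data Quality Chiasm section: the same records can pass or fail a quality check depending on the intended secondary use. The sketch below is illustrative only; the field names, records, and the 90% completeness threshold are invented for the example.

```python
# "Fit-for-purpose" quality assessment sketch: quality is judged against a
# specific secondary use, not as an intrinsic property of the dataset.
records = [
    {"age": 54, "hba1c": 7.2, "smoking": None},
    {"age": 61, "hba1c": None, "smoking": "no"},
    {"age": 47, "hba1c": 6.1, "smoking": "yes"},
]

def completeness(records, field):
    # Fraction of records in which the field is present.
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

# Purpose A: a diabetes follow-up study requiring HbA1c in >= 90% of records.
fit_for_diabetes_study = completeness(records, "hba1c") >= 0.9   # 2/3 -> not fit
# Purpose B: an age-structure report requiring age in >= 90% of records.
fit_for_age_report = completeness(records, "age") >= 0.9         # 3/3 -> fit

print(fit_for_diabetes_study, fit_for_age_report)  # same data, two verdicts
```

The same dataset is "bad" for the first purpose and "good" for the second, which is exactly the point of the fit-for-purpose view.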
To take one example of this evolution of truth: it was clear until recently that there are two types of lymphocytes—the B and T cells. However, a recent paper from Ahmed et al [38] describes a new type of lymphocyte, bearing characteristics of both B and T cells, which may play a role in driving autoimmunity in some diseases such as diabetes. Sources of truth and their characterizations, such as the level of evidence or their context of use, are increasingly important. They should be available to all, similar to Cochrane [39]; cover all areas of life sciences; be maintained; and be in machine-readable form.

Building Trust

In science, trust is strongly related to building evidence. Trust is important not only in the scientific community but also at large, to build adoption, political support, and public acceptance. In summer 2019, a survey published by the Pew Research Center showed a positive trend among the public: science acts for the good, but with concerns about integrity, transparency, and bias. Overall, 86% of Americans say they have at least "a fair amount" of confidence in scientists [40]. One of the challenges is that scientific reliability has often been confused with trustworthiness [41]. Scientific evidence can be very strong, such as for immunization or Web-based health information, but trust can be much lower [42,43]. Many dimensions have been discussed in building trust in science, but they can be summarized in three concepts: one for the scientists and the organizations, one for the objects of the research, and one for the processes. Integrity is first and most important and covers scientific integrity, funding, conflicts of interest, etc. Transparency must be present for the motivation, the outcomes, and the process. Finally, the methodologies applied to handle the processes must be strong and robust.

Building evidence requires many dimensions to be taken into account, such as bias, generalizability, reproducibility, and explainability. Some challenges are more difficult in big data and AI. Proper control of data acquisition and flow is usually more difficult than in traditional controlled studies. The consequence is that the data have specific properties that are not always well managed, such as selection biases. Sometimes, the assumptions constraining the use of analytical tools are not well understood, such as homoscedasticity for many statistical tests. In addition, deep machine learning faces the challenges of precise reproducibility and explainability. The latter is currently the object of numerous works trying to understand intermediate representations of data in neural networks that can predict and explain their behavior. Explainability and interpretability are often used interchangeably. Interpretability is the extent to which it is possible to predict how the system will behave, given a change in input or algorithmic parameters. On the other hand, explainability is the extent to which the internal mechanics of a deep learning system can be understood and thus explained. Molnar [44] published a very good overview of the problem in an open book available on GitHub. However, explainability might not be the best road to raise global trust in deep machine learning approaches, especially when the explanations themselves are hard to explain. Some other dimensions, such as transparency, reproducibility, or uncertainty quantification, might be more effective [45]. For example, in Science in 2018, Hutson [46] reported a survey of 400 artificial intelligence papers presented at major conferences, of which only 6% included code for the algorithms and 30% included test data, considerably limiting the possibilities of reproducibility.
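One concrete instance of the interpretability work surveyed in the book cited above [44] is permutation feature importance: shuffle one feature's values across the dataset and measure how much predictive accuracy drops. The model and data below are toy inventions for illustration, not material from the paper.

```python
# Permutation feature importance, a model-agnostic interpretability sketch.
import random

def model(x):
    # A fixed "classifier" relying heavily on feature 0 and barely on feature 1.
    return 1 if 2.0 * x[0] + 0.1 * x[1] > 1.0 else 0

data = [([1.2, 0.3], 1), ([0.2, 0.9], 0), ([0.9, 0.1], 1), ([0.1, 0.2], 0)]

def accuracy(points):
    return sum(1 for x, y in points if model(x) == y) / len(points)

def permutation_importance(points, feature, seed=0):
    # Shuffle one feature's column and report the resulting drop in accuracy.
    rng = random.Random(seed)
    xs = [x[:] for x, _ in points]
    ys = [y for _, y in points]
    col = [x[feature] for x in xs]
    rng.shuffle(col)
    for x, v in zip(xs, col):
        x[feature] = v
    return accuracy(points) - accuracy(list(zip(xs, ys)))

print(permutation_importance(data, 0), permutation_importance(data, 1))
```

Shuffling the weak feature changes nothing here (importance 0.0), while the importance of the strong feature is whatever accuracy its shuffling costs; such scores describe the model's behavior without opening its internal mechanics, which is exactly the interpretability-versus-explainability distinction drawn above.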
Textbox 1. FAIR data criteria.

Findable
- (Meta)data are assigned a globally unique and persistent identifier
- Data are described with rich metadata (defined below)
- Metadata clearly and explicitly include the identifier of the data described
- (Meta)data are registered or indexed in a searchable resource

Accessible
- (Meta)data are retrievable by their identifier using a standardized communications protocol
- The protocol is open, free, and universally implementable
- The protocol allows for an authentication and authorization procedure, where necessary
- Metadata are accessible, even when the data are no longer available

Interoperable
- (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- (Meta)data use vocabularies that follow FAIR Principles
- (Meta)data include qualified references to other (meta)data

Reusable
- (Meta)data are richly described with a plurality of accurate and relevant attributes:
  - (Meta)data are released with a clear and accessible data usage license
  - (Meta)data are associated with detailed provenance
  - (Meta)data meet domain-relevant community standards

Several frameworks have been defined to assess and evaluate compliance with the FAIR criteria, such as the FAIR maturity tools [48,49]. As such, FAIR data do not imply that the data are in the Open Data space [50]. Access can be restricted, such as in the Harvard Dataverse, and this is an important point to emphasize. There may be many restrictions on having data or metadata available in the Open Data space, because of national regulation, privacy protection, intellectual property, etc. FAIR data do not make data available; they make data usable, under the condition that use is authorized.

The FAIR initiative is crucial. It illustrates the movement of data from objects to assets initiated this last decade and described in the essay by Sabina Leonelli [51] recently published in Nature. The initiative promotes the use of a rich metadata framework, compliant with standards and formal descriptions. It promotes the use of free and open resources for descriptive information and protocols. It allows us to build a framework of shareable resources that can be used for processing, without actually sharing the data in an open space. FAIR thus allows building a framework that is inclusive of all data sources, including those that are subject to authorization, with clear and open protocols.

Privacy – New Deal

In the era of big data, privacy requires special attention. The usual paradigms of limiting access to deidentified information are becoming less effective at protecting privacy. The increasing heterogeneity of data sources and richness of data about each of us, associated with data linkage techniques, strongly increase the possibility of reidentification, even of anonymized data [52-57]. The challenge and potential impacts are even bigger for genetic information [58-60]. There is no good technical solution that can reconcile preserving privacy with the increasing need of data-driven science to access large genomic and phenotypic datasets, and there are many ongoing ethical and legal discussions [61-66]. Interestingly, this is not restricted to science: the same applies to patients’ needs for health information [67]. There is a need for better global education about the implications and risks for privacy among citizens, policy makers, students, the research community, and all stakeholders. A recent scoping review has shown that the understanding of anonymization and deidentification is heterogeneous in the scientific community [68]. Discrimination is one of the major risks of privacy breaches, and disclosing private information can have many consequences [69-71], including for reimbursement and insurance coverage [72,73]. It is important to find the right path between naïve positivism and irrational paranoia. An important step forward is to improve awareness and education of all stakeholders about privacy and the technical limitations to protecting it, and to build regulatory barriers to avoid discrimination.
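The reidentification risk discussed above can be made concrete with a minimal sketch of a k-anonymity check over quasi-identifiers (the records and field names below are hypothetical, not drawn from the cited studies). A release is k-anonymous when every combination of quasi-identifier values is shared by at least k records; k = 1 means at least one person is unique and therefore linkable by anyone holding an external dataset with the same attributes:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the given
    quasi-identifier columns; the release is k-anonymous for this k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical deidentified records: no names, yet the combination of
# postal code, birth year, and sex can still single out an individual.
records = [
    {"zip": "1201", "birth_year": 1950, "sex": "F", "diagnosis": "T1D"},
    {"zip": "1201", "birth_year": 1950, "sex": "F", "diagnosis": "asthma"},
    {"zip": "1205", "birth_year": 1982, "sex": "M", "diagnosis": "T1D"},
]

print(k_anonymity(records, ["zip", "birth_year", "sex"]))  # 1: one record is unique
```

Generalizing the quasi-identifiers (coarser postal codes, birth decade instead of birth year) raises k, which is the classic mitigation; as the studies cited above show, however, rich linked data sources erode even this guarantee.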
Conclusions

AI and big data in medicine are only in their childhood stages; they grow up fast. Whether they grow up well is still an open question that the future will answer. However, they will not grow up well without actively helping them do so. There are several important initiatives that contribute to this, such as the Global Alliance for Genomics and Health (GA4GH), an organization setting a policy and technical framework for respecting human rights to enable responsible genomic data sharing [74], or the European Union General Data Protection Regulation (GDPR) [75], which sets a completely novel privacy regulation for the European Union. Such initiatives are converging toward building a landscape that enables science while building trust in improving the protection of individual rights. I invite the readers to visit the JMIR Open Access collections available on the Web on the following topics: “Big Data,” “Decision Support for Health Professionals,” and “Artificial Intelligence” [76-78].

Conflicts of Interest
None declared.

References
1. Bringsjord S, Govindarajulu N. Artificial Intelligence. In: Stanford Encyclopedia of Philosophy Archive. Stanford, CA: Metaphysics Research Lab, Stanford University; 2018. URL: https://plato.stanford.edu/archives/fall2018/entries/artificial-intelligence/ [accessed 2019-09-22]
2. Wikipedia. Artificial intelligence. 2019. URL: https://en.wikipedia.org/w/index.php?title=Artificial_intelligence&oldid=916837449 [accessed 2019-09-22]
3. Olazaran M.
A Sociological Study of the Official History of the Perceptrons Controversy. Soc Stud Sci 2016 Jun 29;26(3):611-659. [doi: 10.1177/030631296026003005]
4. Luger G. Artificial intelligence: structures and strategies for complex problem solving. Boston, MA: Pearson Addison-Wesley; 2009.
5. Kurzweil R. The singularity is near: when humans transcend biology. New York: Viking; 2005.
6. Giles M. Google researchers have reportedly achieved "quantum supremacy". MIT Technology Review. URL: https://www.technologyreview.com/f/614416/google-researchers-have-reportedly-achieved-quantum-supremacy/ [accessed 2019-09-22]
7. High-Level Expert Group on Artificial Intelligence. A definition of Artificial Intelligence: main capabilities and scientific disciplines. European Commission. URL: https://ec.europa.eu/digital-single-market/en/news/definition-artificial-intelligence-main-capabilities-and-scientific-disciplines [accessed 2019-09-22]
8. Godec P, Pancur M, Ilenic N, Copar A, Stražar M, Erjavec A, et al. Democratized image analytics by visual programming through integration of deep models and small-scale machine learning. Nat Commun 2019 Oct 7;10(1):7. [doi: 10.1038/s41467-019-12397-x]
9. Faes L, Wagner SK, Fu DJ, Liu X, Korot E, Ledsam JR, et al. Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study. Lancet Digit Health 2019 Sep;1(5):e232-e242. [doi: 10.1016/s2589-7500(19)30108-6]
10. Carli D, Fahrni G, Bonnabry P, Lovis C. Quality of Decision Support in Computerized Provider Order Entry: Systematic Literature Review. JMIR Med Inform 2018 Jan 24;6(1):e3. [doi: 10.2196/medinform.7170]
11. Jung M, Hoerbst A, Hackl WO, Kirrane F, Borbolla D, Jaspers MW, et al. Attitude of Physicians Towards Automatic Alerting in Computerized Physician Order Entry Systems. Methods Inf Med 2018 Jan 20;52(02):99-108. [doi: 10.3414/me12-02-0007]
12. Neumann PJ, Sanders GD. Cost-Effectiveness Analysis 2.0.
N Engl J Med 2017 Jan 19;376(3):203-205. [doi: 10.1056/NEJMp1612619] [Medline: 28099837]
13. Sanders GD, Neumann PJ, Basu A, Brock DW, Feeny D, Krahn M, et al. Recommendations for Conduct, Methodological Practices, and Reporting of Cost-effectiveness Analyses. JAMA 2016 Sep 13;316(10):1093. [doi: 10.1001/jama.2016.12195]
14. Mokdad AH, Ballestros K, Echko M, Glenn S, Olsen HE, Mullany E, et al. The State of US Health, 1990-2016. JAMA 2018 Apr 10;319(14):1444. [doi: 10.1001/jama.2018.0158]
15. Allen B. The Role of the FDA in Ensuring the Safety and Efficacy of Artificial Intelligence Software and Devices. J Am Coll Radiol 2019 Feb;16(2):208-210. [doi: 10.1016/j.jacr.2018.09.007] [Medline: 30389329]
16. U.S. Food and Drug Administration. Artificial Intelligence and Machine Learning in Software as a Medical Device. 2019 Jun 26. URL: http://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device [accessed 2019-09-22]
17. Parikh RB, Obermeyer Z, Navathe AS. Regulation of predictive analytics in medicine. Science 2019 Feb 22;363(6429):810-812. [doi: 10.1126/science.aaw0029] [Medline: 30792287]
18. Chan KS, Zary N. Applications and Challenges of Implementing Artificial Intelligence in Medical Education: Integrative Review. JMIR Med Educ 2019 Jun 15;5(1):e13930. [doi: 10.2196/13930] [Medline: 31199295]
19. Masters K. Artificial intelligence in medical education. Med Teach 2019 Sep;41(9):976-980. [doi: 10.1080/0142159X.2019.1595557] [Medline: 31007106]
20. Wartman SA, Combs CD. Reimagining Medical Education in the Age of AI. AMA J Ethics 2019 Feb 01;21(2):E146-E152. [doi: 10.1001/amajethics.2019.146] [Medline: 30794124]
21. Data Science Institute.
Training Medical Students and Residents for the AI Future. URL: https://www.acrdsi.org/Blog/Medical-schools-must-prepare-trainees [accessed 2019-09-22]
22. Lillehaug SI, Lajoie SP. AI in medical education--another grand challenge for medical informatics. Artif Intell Med 1998 Mar;12(3):197-225. [Medline: 9626957]
23. Kolachalama VB, Garg PS. Machine learning and medical education. NPJ Digit Med 2018 Sep 27;1(1):54. [doi: 10.1038/s41746-018-0061-1] [Medline: 31304333]
24. Stausberg J, Bauer U, Nasseh D, Pritzkuleit R, Schmidt C, Schrader T, et al. Indicators of data quality: review and requirements from the perspective of networked medical research. GMS Med Inform Biom Epidemiol 2019;15(1):05.
25. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013 Jan 01;20(1):144-151. [doi: 10.1136/amiajnl-2011-000681] [Medline: 22733976]
26. Weiner MG. Toward Reuse of Clinical Data for Research and Quality Improvement: The End of the Beginning? Ann Intern Med 2009 Sep 01;151(5):359. [doi: 10.7326/0003-4819-151-5-200909010-00141]
27. Sandhu E, Weinstein S, McKethan A, Jain SH. Secondary Uses of Electronic Health Record Data: Benefits and Barriers. Jt Comm J Qual Patient Saf 2012 Jan;38(1):34-40. [doi: 10.1016/s1553-7250(12)38005-7]
28. Hersh W. Adding value to the electronic health record through secondary use of data for quality assurance, research, and surveillance. Am J Manag Care 2007 Jun;13(6 Part 1):277-278. [Medline: 17567224]
29. Jiang B, Zhou D, Sun Y, Wang J. Systematic analysis of measurement variability in lung cancer with multidetector computed tomography. Ann Thorac Med 2017;12(2):95-100. [doi: 10.4103/1817-1737.203750] [Medline: 28469719]
30. Bellomi M, De Piano F, Ancona E, Lodigiani AF, Curigliano G, Raimondi S, et al.
Evaluation of inter-observer variability according to RECIST 1.1 and its influence on response classification in CT measurement of liver metastases. Eur J Radiol 2017 Oct;95:96-101. [doi: 10.1016/j.ejrad.2017.08.001] [Medline: 28987705]
31. Yoon SH, Kim KW, Goo JM, Kim D, Hahn S. Observer variability in RECIST-based tumour burden measurements: a meta-analysis. Eur J Cancer 2016 Jan;53:5-15. [doi: 10.1016/j.ejca.2015.10.014] [Medline: 26687017]
32. Seymour L, Bogaerts J, Perrone A, Ford R, Schwartz LH, Mandrekar S, RECIST working group. iRECIST: guidelines for response criteria for use in trials testing immunotherapeutics. Lancet Oncol 2017 Mar;18(3):e143-e152. [doi: 10.1016/S1470-2045(17)30074-8] [Medline: 28271869]
33. The OBO Foundry. URL: http://www.obofoundry.org/ [accessed 2019-10-05]
34. SNOMED International. URL: https://www.snomed.org/snomed-international/who-we-are [accessed 2018-06-27]
35. Skevofilakas M, Zarkogianni K, Karamanos B, Nikita K. A hybrid Decision Support System for the risk assessment of retinopathy development as a long term complication of Type 1 Diabetes Mellitus. Conf Proc IEEE Eng Med Biol Soc 2010;2010:6713-6716. [doi: 10.1109/IEMBS.2010.5626245] [Medline: 21096083]
36. Hoehndorf R, Queralt-Rosinach N. Data Science and symbolic AI: Synergies, challenges and opportunities. Data Sci 2017 Oct 17:1-12. [doi: 10.3233/ds-170004]
37. NIH: U.S. National Library of Medicine. Detailed Indexing Statistics: 1965-2017. URL: https://www.nlm.nih.gov/bsd/index_stats_comp.html [accessed 2017-10-08]
38. Ahmed R, Omidian Z, Giwa A, Cornwell B, Majety N, Bell DR, et al. A Public BCR Present in a Unique Dual-Receptor-Expressing Lymphocyte from Type 1 Diabetes Patients Encodes a Potent T Cell Autoantigen. Cell 2019 May;177(6):1583-1599.e16. [doi: 10.1016/j.cell.2019.05.007]
39. Cochrane. URL: https://www.cochrane.org/ [accessed 2019-10-05]
40. Pew Research Center: Science & Society.
Trust and Mistrust in Americans’ Views of Scientific Experts. URL: https://www.pewresearch.org/science/2019/08/02/trust-and-mistrust-in-americans-views-of-scientific-experts/ [accessed 2019-10-05]
41. Kerasidou A. Trust me, I'm a researcher!: The role of trust in biomedical research. Med Health Care Philos 2017 Mar;20(1):43-50. [doi: 10.1007/s11019-016-9721-6] [Medline: 27638832]
42. Larson HJ, Clarke RM, Jarrett C, Eckersberger E, Levine Z, Schulz WS, et al. Measuring trust in vaccination: A systematic review. Hum Vaccin Immunother 2018 Jul 03;14(7):1599-1609. [doi: 10.1080/21645515.2018.1459252] [Medline: 29617183]
43. Sbaffi L, Rowley J. Trust and Credibility in Web-Based Health Information: A Review and Agenda for Future Research. J Med Internet Res 2017 Jun 19;19(6):e218. [doi: 10.2196/jmir.7579] [Medline: 28630033]
44. Molnar C. Interpretable Machine Learning. 2019. URL: https://christophm.github.io/interpretable-ml-book/ [accessed 2019-10-18]
45. Begoli E, Bhattacharya T, Kusnezov D. The need for uncertainty quantification in machine-assisted medical decision making. Nat Mach Intell 2019 Jan 7;1(1):20-23. [doi: 10.1038/s42256-018-0004-1]
46. Hutson M. Artificial intelligence faces reproducibility crisis. Science 2018 Feb 16;359(6377):725-726. [doi: 10.1126/science.359.6377.725] [Medline: 29449469]
47. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016 Mar 15;3(1):160018. [doi: 10.1038/sdata.2016.18] [Medline: 26978244]
48. GitHub. FAIR Metrics. 2019. URL: https://github.com/FAIRMetrics/Metrics [accessed 2019-10-05]
49. Publications Office of the European Union.
Turning FAIR into reality: final report and action plan from the European Commission expert group on FAIR data. URL: https://publications.europa.eu/en/publication-detail/-/publication/7769a148-f1f6-11e8-9982-01aa75ed71a1 [accessed 2019-07-04]
50. Harvard University. Towards an integrated and FAIR Research Data Management @Harvard. 2019. URL: https://scholar.harvard.edu/mercecrosas/presentations/towards-integrated-and-fair-research-data-management-harvard [accessed 2019-04-10]
51. Leonelli S. Data - from objects to assets. Nature 2019 Oct 15;574(7778):317-320. [doi: 10.1038/d41586-019-03062-w]
52. Malin B. Re-identification of familial database records. AMIA Annu Symp Proc 2006:524-528. [Medline: 17238396]
53. El Emam K, Jonker E, Arbuckle L, Malin B. A systematic review of re-identification attacks on health data. PLoS One 2011 Dec 2;6(12):e28071. [doi: 10.1371/journal.pone.0028071] [Medline: 22164229]
54. Dankar FK, El Emam K, Neisa A, Roffey T. Estimating the re-identification risk of clinical data sets. BMC Med Inform Decis Mak 2012 Jul 09;12(1):66. [doi: 10.1186/1472-6947-12-66] [Medline: 22776564]
55. Benitez K, Malin B. Evaluating re-identification risks with respect to the HIPAA privacy rule. J Am Med Inform Assoc 2010;17(2):169-177. [doi: 10.1136/jamia.2009.000026] [Medline: 20190059]
56. Barth-Jones D. The 'Re-identification' of Governor William Weld's Medical Information. Soc Sci Res Netw 2012:19.
57. Emam KE, Dankar FK, Vaillancourt R, Roffey T, Lysyk M. Evaluating the Risk of Re-identification of Patients from Hospital Prescription Records. Can J Hosp Pharm 2009 Jul;62(4):307-319. [Medline: 22478909]
58. Kulynych J, Greely HT. Clinical genomics, big data, and electronic medical records: reconciling patient rights with research when privacy and science collide. J Law Biosci 2017 Apr;4(1):94-132. [doi: 10.1093/jlb/lsw061] [Medline: 28852559]
59.
Mascalzoni D, Paradiso A, Hansson M. Rare disease research: Breaking the privacy barrier. Appl Transl Genom 2014 Jun 01;3(2):23-29. [doi: 10.1016/j.atg.2014.04.003] [Medline: 27275410]
60. Zaaijer S, Gordon A, Speyer D, Piccone R, Groen S, Erlich Y. Rapid re-identification of human samples using portable DNA sequencing. eLife 2019:6. [doi: 10.7554/elife.27798]
61. Ienca M, Ferretti A, Hurst S, Puhan M, Lovis C, Vayena E. Considerations for ethics review of big data health research: A scoping review. PLoS One 2018 Oct 11;13(10):e0204937. [doi: 10.1371/journal.pone.0204937]
62. Loukides G, Denny JC, Malin B. The disclosure of diagnosis codes can breach research participants' privacy. J Am Med Inform Assoc 2010;17(3):322-327. [doi: 10.1136/jamia.2009.002725] [Medline: 20442151]
63. Malin B, Benitez K, Masys D. Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule. J Am Med Inform Assoc 2011;18(1):3-10. [doi: 10.1136/jamia.2010.004622] [Medline: 21169618]
64. El Emam K, Moher E. Privacy and anonymity challenges when collecting data for public health purposes. J Law Med Ethics 2013 Mar;41 Suppl 1:37-41. [doi: 10.1111/jlme.12036] [Medline: 23590738]
65. Thorogood A, Zawati M. International Guidelines for Privacy in Genomic Biobanking (or the Unexpected Virtue of Pluralism). J Law Med Ethics 2015;43:4. [doi: 10.1111/jlme.12312]
66. O'Neill L, Dexter F, Zhang N. The Risks to Patient Privacy from Publishing Data from Clinical Anesthesia Studies. Anesth Analg 2016 Dec;122(6):2017-2027. [doi: 10.1213/ANE.0000000000001331] [Medline: 27172145]
67. Househ M, Grainger R, Petersen C, Bamidis P, Merolli M. Balancing Between Privacy and Patient Needs for Health Information in the Age of Participatory Health and Social Media: A Scoping Review. Yearb Med Inform 2018 Aug;27(1):29-36. [doi: 10.1055/s-0038-1641197] [Medline: 29681040]
68.
Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. J Med Internet Res 2019 May 31;21(5):e13484. [doi: 10.2196/13484]
69. Dixon P. A Failure to "Do No Harm" -- India's Aadhaar biometric ID program and its inability to protect privacy in relation to measures in Europe and the U.S. Health Technol (Berl) 2017;7(4):539-567. [doi: 10.1007/s12553-017-0202-6] [Medline: 29308348]
70. Rengers JM, Heyse L, Otten S, Wittek RPM. "It's Not Always Possible to Live Your Life Openly or Honestly in the Same Way" - Workplace Inclusion of Lesbian and Gay Humanitarian Aid Workers in Doctors Without Borders. Front Psychol 2019 Feb 27;10:320. [doi: 10.3389/fpsyg.2019.00320] [Medline: 30873072]
71. Stangl AL, Lloyd JK, Brady LM, Holland CE, Baral S. A systematic review of interventions to reduce HIV-related stigma and discrimination from 2002 to 2013: how far have we come? J Int AIDS Soc 2013 Nov 13;16(3 Suppl 2):18734. [Medline: 24242268]
72. Bélisle-Pipon J, Vayena E, Green RC, Cohen IG. Genetic testing, insurance discrimination and medical research: what the United States can learn from peer countries. Nat Med 2019 Aug;25(8):1198-1204. [doi: 10.1038/s41591-019-0534-z] [Medline: 31388181]
73. Bardey D, De Donder P, Mantilla C. How is the trade-off between adverse selection and discrimination risk affected by genetic testing? Theory and experiment. J Health Econ 2019 Sep 30;68:102223. [doi: 10.1016/j.jhealeco.2019.102223] [Medline: 31581025]
74. Global Alliance for Genomics & Health. URL: https://www.ga4gh.org/ [accessed 2019-10-05]
75. EU GDPR.ORG. URL: https://eugdpr.org/ [accessed 2019-10-05]
76. JMIR Medical Informatics.
E-collection 'Big Data'. URL: https://medinform.jmir.org/themes/183
77. JMIR Medical Informatics. E-collection 'Decision Support for Health Professionals'. URL: https://medinform.jmir.org/themes/186
78. Journal of Medical Internet Research. E-collection 'Artificial Intelligence (other than Clinical Decision Support)'. URL: https://www.jmir.org/themes/797

Abbreviations
AI: artificial intelligence
CPOE: computerized provider order entry
GA4GH: Global Alliance for Genomics and Health
GDPR: General Data Protection Regulation
RECIST: Response Evaluation Criteria In Solid Tumors
SNOMED CT: Systematized Nomenclature of Medicine - Clinical Terms

Edited by G Eysenbach; submitted 09.10.19; peer-reviewed by A Benis, D Gunasekeran; comments to author 13.10.19; revised version received 18.10.19; accepted 20.10.19; published 08.11.19

Please cite as:
Lovis C
Unlocking the Power of Artificial Intelligence and Big Data in Medicine
J Med Internet Res 2019;21(11):e16607
URL: https://www.jmir.org/2019/11/e16607
doi: 10.2196/16607
PMID: 31702565

©Christian Lovis. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 08.11.2019. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
