Personalised medicine challenges: quality of data

Personalised evidence-based medicine aims to use stored health data to prevent future illnesses. This implies that data should be stored in a readable and understandable form, at least until the death of the person in question. The aim of this paper is to discuss the challenges that arise from the existing pressure to maintain health data in electronic format for many decades. Today clinical databases are filled with data that are heterogeneous regarding who collected them, the protocols used, and their detail, precision, and subjectivity. Some data elements (e.g. diagnoses) are typically more exposed to these problems than others (e.g. laboratory results). It is critical that data scientists fully understand how data were collected. It is also very important to store context information, protocols used, and accuracy/precision information in clinical databases to ensure future understanding of such data.

Keywords Quality of data · Personalised medicine · Electronic patient record · Heterogeneous data

NORTE-01-0145-FEDER-000016 (NanoSTIMA) is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF).

Ricardo Cruz-Correia (corresponding author), rcorreia@med.up.pt
Duarte Ferreira, duarteng.ferreira@gmail.com
Gustavo Bacelar, gbacelar@gmail.com
Pedro Marques, pmarques@med.up.pt
Priscila Maranhão, priscilamaranhao@gmail.com

CINTESIS - Center for Health Technology and Services Research, Rua Dr. Plácido da Costa, Porto, Portugal

International Journal of Data Science and Analytics

1 Introduction

Personalised evidence-based medicine (EbM) uses stored health data, namely patient diagnoses, laboratory work, insurance claims, and demographic information, among others. This information makes it possible to move beyond the reactive approach of treating illness, allowing healthcare providers to predict and prevent future illnesses [14], and the field has therefore become a promising application area for data science as a discipline. Nevertheless, this area has some specific challenges, as the use of such data may occur many years after, or in a completely different setting from, when and where it was collected.

Despite many efforts over the years to improve the normalisation and standardisation of clinical data, concerns regarding these aspects are still present in recent initiatives intended to push personalised medicine forward. Projects such as FP7 MyHealthAvatar [38] and DISCIPULUS [37] embody the relevance of having digital clinical information for pursuing personalised medicine, thus reinforcing the importance of guaranteeing completeness of patient data to allow a complete view and integrated analysis of the patient's health: to this end, the methods used for the acquisition of information must be such that information is given as a standardised set of data, preferably provided with uncertainty ranges. These concerns carried over to the next call within the Virtual Physiological Human (VPH), with more hands-on approaches such as p-medicine (from data sharing and integration via VPH models to personalised medicine) [36]. Another relevant aspect, considered critical to the development of personalised medicine, is surfacing from recent initiatives such as the IGNITE (Implementing GeNomics In pracTicE) projects [5], where the priority of having genomic data as part of the electronic health record is considered high. These projects need to deal with data quality and precision issues, data heterogeneity, and data aggregation to create a "big picture" representation of the patient.

Therefore, when one considers health-related data, it becomes very important to consider the longevity of data regarding its usefulness and how data will age [7]. Because we do not know for sure how data will be used in the future, and therefore its value, the best way to protect such opportunities is to store all data in a readable and understandable form, at least until the death of the person in question.

Take, for example, the data of newborns with low birth weight. It is known that children born small for gestational age (SGA) have a higher risk of short stature than children born at normal size [40]. Besides, SGA has been associated with an increased prevalence of cardiovascular disease, essential hypertension, and metabolic disease, particularly type 2 diabetes mellitus [22]. In another example, a short duration of breastfeeding is significantly correlated with metabolic syndrome and obesity in childhood, making breastfeeding an important factor in preventing metabolic disorders [31,54]. In these cases, keeping the birth weight and breastfeeding duration of each newborn in their patient record may become very useful for calculating each person's risk of cardiovascular disease many years later.

Considering that life expectancy at birth in Europe in 2014 was around 80.9 years, according to Eurostat [10], we should aim to use formats and structures that can sustain such a lifespan to store our own data. This means that data collected in 2017 (e.g. the weight of newborns) should be stored in a format understandable until 2098.

The aim of this paper is to discuss the difficulties and possible solutions to problems that arise from the existing pressure to maintain health data in electronic format for many decades.

2 Types of health data

Health data can be very different depending on their type (e.g. observations, evaluations, instructions) and on whether they are collected by humans (e.g. medical doctors, nurses, patients) or machines (e.g. ICU monitoring systems, digital thermometers). Some of these values are interpreted by humans before data input and are therefore vulnerable to subjectivity; others are automatically sampled or aggregated (e.g. the mean blood pressure of the last 24 h). The protocol used to collect data is very rarely recorded and, in some settings, it may change often over the years (e.g. use of a different anatomical location, patient position, or device to measure blood pressure).

OpenEHR is a standard (http://www.openEHR.org) that provides an open-source software infrastructure for implementing an electronic health record (EHR) in a clinical knowledge domain [53]. It is based on a multi-level, service-oriented architecture and follows a single-source modelling approach. At its core, there is a stable reference model that defines the logical structures used in the modelling process. On top of the reference model structures are the archetypes, composable structures that define the way clinical data should be stored. Each archetype has an identifier, and each data point can be accessed through a path inside the archetype. These identifiers and paths are unique and independent of the context in which the archetype is used.

Archetypes can be combined in higher-level, context-specific structures called templates. These templates can inherit all or only part of each archetype's data points. This is a very powerful characteristic that allows reuse in different contexts without losing the ability to compare data points between different templates that use the same archetype. Application developers can base persistence models, interfaces, and forms on openEHR templates.

In Table 1, different data types are grouped according to the types of clinical and administrative entries of the openEHR standard. We also added a system usage data category that is not present in the openEHR standard. This category does not represent clinical or administrative data but data about the systems, errors, access logs, or how the system communicates with other systems. Although not directly related to healthcare delivery, it is of the utmost importance for data provenance, traceability, and data security [8]. The purpose of each clinical entry is as follows [47]:

– Observation for recording information from the patient's world: anything measured by a clinician, a laboratory, or the patient, or reported by the patient as a symptom, event or concern;
– Evaluation for recording opinions and summary statements (usually clinical), such as problems, diagnoses, risk assessments, goals, etc., that are generally based on observation evidence;
– Instruction for recording orders, prescriptions, directives, and any other requested interventions;
– Action for recording actions, which may be due to Instructions, e.g. drug administrations, procedures, etc.;
– Administrative for recording administrative events, e.g. admission, discharge, consent, etc.

Moreover, for each type of data represented in Table 1, a few examples are given and a set of issues is itemised (a − marks a problem and a + marks an advantage). A description of these issues follows the table.

Table 1 Relation between types of data collected and issues affecting quality of data

Type of data          | Examples                 | Collected by humans                      | Collected by machines
----------------------+--------------------------+------------------------------------------+--------------------------------
Observations          | Temperature              | − Make transcriptions                    | + Usually sensors
                      | Blood pressure           | − Subjective                             | − Do not store protocol data
                      |                          | − Lack of standards                      |
Evaluations           | Diagnosis                | − Subjective                             |
                      | Adverse reaction risk    | − Protocol changes                       |
                      |                          | − Need to teach terminology              |
Instructions (orders) | Laboratory orders        | + Reliable                               |
Actions               | Laboratory reports       | − Incomplete data                        | + Automatic analysis
                      | Surgery reports          |                                          |
Administrative        | Patient ID number, name  | − Some basic data (e.g. name) may change | + Easier when based on ID cards
                      | Date of birth and gender |                                          |
System usage          | Audit trails             | − Security credential sharing            | − Consistent time issues
                      | Messaging logs           |                                          |

– Observation data, e.g. temperature and blood pressure, when collected by:
  – Humans: are subject to reading or interpretation errors when transcribing, subjectivity when the value reading is not clear [19], and lack of standardisation in the way they are stored.
  – Machines: the error level is known when using sensors, but the data collection protocol is still rarely stored.
– Evaluation data, e.g. diagnosis and adverse reaction risk, when collected by:
  – Humans: are also subject to subjectivity [19], lack of standardisation in the way terminology is used and understood, and changes in the meaning of some disease concepts over time when we consider decades.
– Instruction data, e.g. laboratory orders, when collected by:
  – Humans: these data are normally reliable in the sense that they describe what the person who filled in the order really asked for. The use of terminologies, in this case, is well known and very common.
– Actions data, e.g. laboratory or surgery reports, when collected by:
  – Humans: these data may suffer from incompleteness, as they tend to be very verbose (many values) and we tend to value the interpretation of those values more than the values themselves.
  – Machines: the automatic collection and analysis of action reports, when possible, has great potential and is common in most healthcare settings.
– Administrative data, e.g. patient or visit identification and demographics data, when collected by:
  – Humans: most systems depend critically on the correct identification of persons (both patients and health professionals). Incorrect identification is more common than most admit and can, obviously, have a huge impact on patient care. Furthermore, some data elements we use daily to identify persons uniquely may not be that unique or may change during our lifetime (e.g. women's names change after marriage in many cultures).
  – Machines: the widespread use of ID cards and other similar identification technologies has improved the quality of the data and reduced the time and effort needed to collect them.
– System usage data, e.g. audit trails and messaging logs, when collected by:
  – Humans: sharing credentials (e.g. login and password) or computer sessions is a major limitation when analysing audit trail information, both for auditing and for process mining purposes.
  – Machines: even data that are collected automatically may have many problems. For instance, having consistent time in logs is still uncommon in many settings. Many servers do not have their time synchronised, and the logs do not use proper date/time standards (e.g. ISO 8601) to deal with time zones or daylight saving time.

3 Data collection form issues

PatID   | Date     | AP     +-| Form in use since 1992 |-+
--------+----------+----    | [ ] Allergy to penicillin  |
254     | 03.05.92 | 0      +----------------------------+
255     | 04.05.92 | 1
...     | ...      | ...    +--| Form in use since 2002 |---------------------------+
1232    | 06.07.02 | 0      | Allergy to penicillin: () Yes () No () Unknown        |
1233    | 07.07.02 | 1      +-------------------------------------------------------+
1234    | 08.07.02 | 9
...     | ...      | ...    +--| Form in use since 2013 |---------------------------------+
8872    | 18.02.13 | 0      | Allergy to penicillin:                                      |
8875    | 20.02.13 | 0      | () Yes, confirmed by doctor  () No, confirmed by doctor     |
8877    | 20.02.13 | 1      | () Yes, informed by patient  () No, informed by patient     |
8879    | 20.02.13 | 9      | () Unknown                                                  |
8901    | 21.02.13 | 2      +-------------------------------------------------------------+
8903    | 22.02.13 | 3

Fig. 1 Illustration of how different versions of a form feed the same data table, making it difficult to interpret the meaning of each answer. AP means allergy to penicillin
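The situation in Fig. 1 can be made concrete with a short Python sketch. It shows one possible way to keep the AP codes interpretable: resolve, for every row, which form was in use and decode the value against that form's own answer list. The code lists below are assumptions (the figure does not say which number corresponds to which answer), the start dates are assumed to be 1 January of each year, and the names `FORM_VERSIONS`, `form_in_use` and `decode_ap` are hypothetical.

```python
from datetime import date

# Hypothetical code lists for the three forms in Fig. 1. The figure does not
# state the number-to-answer mapping, so every mapping here is an assumption.
FORM_VERSIONS = [
    (date(1992, 1, 1), "form-1992", {0: "box not ticked", 1: "box ticked"}),
    (date(2002, 1, 1), "form-2002", {0: "no", 1: "yes", 9: "unknown"}),
    (date(2013, 1, 1), "form-2013", {
        0: "no, confirmed by doctor",
        1: "yes, confirmed by doctor",
        2: "no, informed by patient",
        3: "yes, informed by patient",
        9: "unknown",
    }),
]


def form_in_use(recorded_on: date):
    """Return (version_id, code_list) of the latest form introduced on or before the date."""
    current = None
    for start, version_id, codes in FORM_VERSIONS:
        if start <= recorded_on:
            current = (version_id, codes)
    if current is None:
        raise ValueError(f"no form version known for {recorded_on}")
    return current


def decode_ap(recorded_on: date, ap: int) -> dict:
    """Attach the form version and the answer wording to a raw AP code."""
    version_id, codes = form_in_use(recorded_on)
    meaning = codes.get(ap, "code not on this form")
    return {"form": version_id, "code": ap, "meaning": meaning}


# The same stored value 0 reads differently depending on the form that produced it.
for day, ap in [(date(1992, 5, 3), 0), (date(2002, 7, 6), 0), (date(2013, 2, 18), 0)]:
    print(day.isoformat(), decode_ap(day, ap))
```

Inferring the form from the date is itself fragile, since it depends on knowing exactly when each form went live. Writing the form version into each row at collection time would avoid that guesswork.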
As stated in [45], few problems are more challenging than the development of effective techniques for capturing patient data accurately, completely, and efficiently. Although more and more data are being collected using sensors or other automatic means, most data in clinical databases are still the result of a health professional or patient filling in a form. These forms are present inside electronic patient records (EPRs) and typically have both highly structured data entries and narrative entries to record patient data. The amount of structured data depends on the time it takes to fill in, the importance of such data elements to the institution, and the difficulty of structuring it into multiple data elements as opposed to leaving an open text field.

3.1 Form formats

One very important aspect related to the quality of data is the forms used to present and collect such data [52]. This has been recognised in many clinical scenarios and, therefore, efforts must be made to standardise such forms.

There is evidence that question wording and framing, including the choice and order of response categories, can have an important impact on the nature and quality of responses [27]. McColl et al. also stated that, through careful attention to the design and layout of questionnaires, the risk of errors in posing and interpreting questions and in recording and coding responses can be reduced, and potential inter-rater variability can be minimised.

Another example is a work aiming to define a synopsis format that is effective in delivering essential pathological information, and to evaluate the aesthetic appeal and the impact of varying format styles on the speed and accuracy of data extraction [46]. One of its main conclusions is that human factors engineering should be considered in the display of patient data.

3.2 Data values during form changes

To better illustrate the case, imagine that a particular hospital has been collecting data about its patients' allergy to penicillin (see Fig. 1). These data have been collected since 1992. Over these years, the forms used to collect the data have changed, aiming to improve the quality of data. The software developers chose to record the allergy to penicillin values in the same database field regardless of the form used. Unfortunately, changing forms without changing the data structures where those values are recorded, or without storing the form version used to collect them, is much more common than one would expect. In this (common) case, interpretation of such values becomes much harder.

3.3 Paper versus electronic forms

One important difference between storing patient data in paper forms and in computer systems is that, whilst in paper forms the questions and the answers are stored together (on the paper form), in computer systems the questions exist only as software forms in an application and the answers are stored in databases. Also, health institutions normally do not maintain previous versions of their computer systems, easily leading to a situation where one has the answers provided by health professionals but does not know the exact questions that were asked. Instead of the questions, one ends up with a list of field names that may not describe the question posed to the user. Knowing the answers without knowing the exact questions is not useful and may be dangerous.

3.4 Data transformation

These issues (lack of formalism and clarity in data handling) produce a low rate of reproducibility in research [33]. The use of data provenance, which is a formal representation of computational processes, may be a solution for this issue. Complex computational tools for analysing large quantities of data create the need for more precise descriptions of the origin of data, the transformations that have been applied to those data, and the implications of the results. Pasquier et al.'s suggestion of publishing the source code used for data transformations in scientific papers could be extended to also include the source code of the systems used in data collection. Data provenance refers to attributes of the origin of information; it can help in guaranteeing proof of data integrity [17], which is very useful when using cloud environments or storing data for long periods of time.

4 Storing the error level

Databases are used to store facts. The data should have enough precision, detail, and context to be properly understood and analysed. In the healthcare domain, data can be collected using many different protocols and devices over time.

4.1 Medical devices

A large portion of medical data originates in medical devices. Like all other calibrated devices, they can measure physical properties only to a certain accuracy and precision. Obviously, no measurement is perfect and all have some error associated with them. Accuracy and precision are necessary to ensure that results are valid. As an example, there are few studies addressing the reliability of home blood pressure monitoring devices and the quality of their data. Jung et al. present a study that aimed to evaluate the current status of home BP devices in terms of validation and accuracy [21]. This study showed that non-validated devices are widely used in clinical practice and that a substantial portion are inaccurate. Storing the device's capability to measure reality is also important for properly using patient data in the future.

4.2 Medical records

This reality can also be found when collecting more subjective information in medical records. In these cases, knowing the exact source of the data, e.g. doctor opinion or patient description, may have an impact on the interpretation of data. The possibility of adding the source of data, and the reliability the user places in that source, should be considered. When data quality and validation are guaranteed, medical records might be more suitable data sources for clinical trials [6].

4.3 The variation of the clinical measures

Pagnacco et al. [32] described measurement errors as mostly random in nature. Assuming a random nature of measurement errors also assumes that the within-subject variability, i.e. the variance of the results obtained from the same subject, is similar between the subjects examined. This assumption usually holds true in engineering, where the "subjects" are inanimate objects. However, when dealing with humans and clinical measures, this assumption is rarely satisfied, because the variability is caused not only by external random factors but also by the subjects' conditions and reactions to random endogenous and exogenous stimuli. Thus, researchers should pay particular attention to the reasons for such variation in clinical measurements obtained in clinical trials, mainly intra-patient variation [30].

Different healthcare professionals can get different results when acquiring data from a patient (e.g. vital signs, height, weight). The variability in patients' measurements affects the ability to obtain reliable results. Most of this variability can be explained by the professional's level of training and experience, and also by the patient's individual characteristics [28]. The adoption of EHRs can support quality control procedures, providing definitions and giving targeted advice at the moment of care (e.g. how to collect the data, confounding factors, etc.) [51]. This would improve the standardisation of measurement and benefit the quality of clinical trial data. Nevertheless, a fundamental challenge persists: how to control data variability related to patients' characteristics.

5 Clinical measures in reality

Nutrition assessment seeks to detect nutritional problems, contributing to the promotion and recovery of health [26]. For example, bioimpedance is a useful method to assess body composition. However, it has positive and negative points: there are many rules for the exam to be performed, such as the patient not drinking alcohol or caffeine and not exercising in the 24 h before the exam; women cannot do the exam during their menstrual period; the exam must be done in a fasting state (including water); and the bladder must be empty before the exam [48].

Thus, some studies have reported much variability among bioimpedance results, mainly related to the use of equations without the proper knowledge and to the hydration status of the patient [12,23]. Therefore, we can perceive the importance of the patient following the stipulated rules for clinical research exams. The EHR can help the health professional choose the best way to assess the patient's body composition, but the patient's involvement in the study is fundamental, as he/she needs to follow the project rules.
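The bioimpedance preparation rules described in this section, like the device accuracy and data source discussed in Sect. 4, are context that is rarely stored with the value itself. The following Python sketch shows one way a record could keep them together. It is only an illustration under stated assumptions: every field name, the ±3 accuracy figure and the protocol identifier `bia-protocol-v1` are hypothetical and do not come from any standard or from the studies cited.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class Measurement:
    """One stored value together with the context needed to interpret it later."""
    name: str
    value: float
    unit: str
    taken_at: datetime                # timezone-aware, serialisable as ISO 8601
    source: str                       # e.g. "device", "clinician", "patient-reported"
    device_accuracy: Optional[float]  # +/- error in the same unit; None if unknown
    protocol_id: Optional[str]        # hypothetical identifier of the protocol followed
    protocol_checks: Dict[str, bool] = field(default_factory=dict)

    def interval(self) -> Optional[Tuple[float, float]]:
        """Return (low, high) implied by the stored accuracy, or None if none was stored."""
        if self.device_accuracy is None:
            return None
        return (self.value - self.device_accuracy, self.value + self.device_accuracy)

    def protocol_violations(self) -> List[str]:
        """List the preparation rules recorded as not followed."""
        return sorted(rule for rule, ok in self.protocol_checks.items() if not ok)


# Illustrative values only.
body_fat = Measurement(
    name="body fat", value=24.0, unit="%",
    taken_at=datetime(2017, 3, 14, 9, 30, tzinfo=timezone.utc),
    source="device", device_accuracy=3.0, protocol_id="bia-protocol-v1",
    protocol_checks={
        "fasting": True,
        "no alcohol or caffeine": True,
        "no exercise in previous 24 h": False,
        "empty bladder": True,
    },
)
print(body_fat.taken_at.isoformat())
print(body_fat.interval())
print(body_fat.protocol_violations())
```

With such a record, a later analysis can decide whether to exclude or down-weight a value taken outside the protocol, instead of treating every stored number as equally trustworthy.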
The device exactly in the same position every time, or taking the estimation of consumption based on just a few days of col- same amount of adipose panicle with the adipometer to assess lection leads to critical failures in this context [44]. In other body composition. hands, the short time of food observation does not reflect the Thus, we understand that computer system can improve habitual intake [9]. For this, the number of the days of evalua- data quality, but unfortunately, the adequacy of the patient’s tion and the kind of instrument (24h-record; diary food, etc.) clinical trial rules, and storing information about the protocol are significant tools to obtain accurate results. The Institute used in each case, are still essential for this quality of data. of Medicine (IOM) [18], which published the Dietary Ref- erence Intakes (DRI), takes into account both the variability of the nutrient requirement in individuals and intrapersonal 6 Personalised medicine: pros and cons variability of intake. For its application, however, it is nec- essary to use values of intrapersonal variability, expressed Personalised medicine is defined as the use of genomic and by the intrapersonal standard deviation of ingestion of each other biotechnologies to derive information about an individ- nutrient, obtained in studies with the same population [26]. ual that could then be applied to obtain information on types However, we cannot forget that the patients commitment to of health interventions that would best suit that individual filling all food instrument and not lie about the dietary intake, [4,41]. Over recent years, considerable technical advances is a fundamental point to obtain correct data about the dietary have increasingly linked personalised medicine with preven- intake. tive medicine. Although this process provides benefits in Similarly, the blood pressure measurement presents intrin- treating patients, particularly regarding the genetic profile, sic variability [16]. 
Blood pressure measurement in some challenges, mainly regarding the lay public, still exist. Thus, cases is still performed in a non-standardised way [24]; some certain points must be discussed to ensure the protection and factors as the health professional, environment; equipment, fair treatment of individuals [20,42]. technical and the patient can interfere in trustworthiness to Personalised medicine is an example of what medicine blood pressure assessment. The protocol recommendations desires to be in the future: specific, rigorous, and able to con- include avoiding physical exercise 60 min before the exam, trol disease and death [41]. For this, this field applies tools drinking alcohol, coffee, smoking 30 min before the assess- that enable risk assessment and prediction, such as health ment, not talking and keeping your legs uncrossed, these are risk assessments, family history and, mainly, genetic infor- just a few recommendations [34]. mation [11]. The advantages of personalised medicine are Besides, other types of patients intrapersonal variability, not simply applicable to patient treatment, but also aid in the environments are important to the trustworthiness of clinical prevention and prediction of disease by identifying genetic trials results. Rodriguez-Segade et al. [39] investigated the predispositions, predicting a potential patient [11,41]. association between nephropathy and HbA1c variability in Recent studies have been reported concerning the bene- 2103 patients followed up for a mean 6.6 years. The authors fits of personalised medicine, which covers biomarker used to concluded that in patients with type 2 diabetes, the risk detect specific genetic traits and to guide different approaches of progression of nephropathy increases significantly with towards the prevention and treatment of diseases, offer- HbA1c variability, independently of updates mean HbA1c. ing substantial healthcare savings [35]. Najafzadeh et al. 
To explain this point, the author hypothesizes that lifestyle [29] described several potential advantages of personalised influences HbA1c variability. Greater HbA1c variability hav- medicine, such as possible applications of pharmacoge- ing been reported to be associated with unfavourable lifestyle nomics in tailoring treatments to improve effectiveness and factors among patients with type 1 diabetes [50]. However, minimise adverse effects; disease diagnosis; genomic testing other authors demonstrate that this variability can be involved in preventive medicine and the identification of new condi- in a low socioeconomic class [50], and insulin resistance, tions. which it has been implicated in the pathogenesis of dia- In addition, cancer prevention and treatment appears as the betes complications [13]. These studies demonstrate that the greatest potential in the field of personalised medicine [15]. patient’s variability is associated with environments and also For example, regarding breast cancer, different immunologic genetic factors which are essential be considered by the clin- markers have been applied to indicate the best treatment ical trials investigators. option and to assess metastasis and recurrence risk [1], whilst colon cancer therapy can be evaluated by genetic testing; 123 International Journal of Data Science and Analytics for example, homozygotes subjects for the UGT1A1, *28 about their predisposition to diseases, and the affordability of allele show increased risk for neutropenia after treatment these tests for disadvantaged socioeconomic groups should with irinotecan, with a reduction in the starting dose being also be taken into account [29]. In addition, citizens also advised [25]. 
appear concerned about personalised medicine perspectives; However, although personalised medicine has been applied they believe that personalised tests might be used to ration in multiple areas, mainly in oncology and cardiology, and care and that treatment should be applied only if the patient many benefits for patient care have been noted, risks can- wants it. This issue raises clinical and policy challenges that not be ignored. The field also offers many disadvantages, may undermine the value of personalised medicine. Further and several challenges are presents in this context, related efforts to deliberate with the public are warranted in order to informed consent, confidentiality, genetic discrimination to inform effective, efficient, and equitable translations of and direct-to-consumer genetic testing, among others [20]. personalised medicine [3]. Nevertheless, as research/treatment makes use of person- Through these advantages, it is possible to understand alised medicine, becoming increasingly more common, the that personalised medicine raises certain challenges, both for management of the ethical and legal issues becomes even physician and healthcare systems, including the enormous more necessary. For example, in a hypothetical situation number of available tests, the fast development of testing related to a patient that presented no response to a specific technologies, the decreasing unit cost per tested mutation treatment, the health insurance requires genetic testing to and the potential of diagnostic and screening technologies determine drug safety and efficacy, in order to avoid unnec- to determine subsequent individual care pathways. Addi- essary cost burdens. 
This point is an advantage regarding cost tionally, the support for required economic evidence to be perspective, but it is not ethical, and the patient might prefer produced to improve personalised medicine reimbursement to incur the risks to avoid generating and releasing his/her and coverage decisions is also noteworthy [43]. genomic information [49]. Another ethical issue is related From these described advantages and disadvantages, it to genotype-driven research recruitment. This is a poten- is possible to understand the challenges of personalised tially powerful tool for studying the functional significance medicine in order to aid in creating strategies to prepare the of human genetic variation. However, the genetic informa- future healthcare system, reducing errors and improving the tion generated for one study might be used as the basis for pros of personalised medicine. identifying and reconnecting participants for other studies [2]. These and other points justify the development of rules and clear guidelines to ensure the control and safety of the obtained information. 7 Discussion Other disadvantages and challenges in the use of person- alised medicine can also be cited. Najafzadeh et al. [29] The quality of clinical data reducing errors of clinical results performed a semi-structured focus group with 28 physicians is a challenge. The electronic health record can help to to discuss general themes about personalised medicine. From improve it, but clinicians must have attention on innumerable these focus groups, the authors categorised the disadvan- factors, mainly related to patient involvement. We suggest tages of personalised medicine expressed by experts into that despite the use of EHR, health professionals should three perceived issues: validity uncertainty, equity issues, always discuss their results amply and attentively and pro- and implementation. 
The authors described that the physi- mote the inclusion of the protocol used collecting data in an cians expressed concerns, mainly related to the uncertainty attempt to improve clinical data. around the validity of genomic tests given the complexity of Information in healthcare institution has the potential to gene expression, which was mentioned as a major concern. be very valuable due to the existing amount of data, and the Other points mentioned as potential disadvantages included economic value of the decisions they describe. To fulfil this financial incentives for private companies for excessive mar- potential today and in the future, it is critical that data scien- keting of their services, possible mishandling of genomic tists fully understand how these data were collected. Storing information by private companies, and discrimination based context information, protocols used and precision/accuracy on the genomic data (by insurance companies, the healthcare information in clinical databases helps to ensure future under- system, and employers). standing of such data. In spite of the fact that personalised medicine consid- ers the patient-centred approach to treatment and that it be Compliance with ethical standards most advantageous in the clinical practice, higher public engagement regarding this issue is required. Physicians also believe that lack of public knowledge about genomic tests Conflict of interest On behalf of all authors, the corresponding author presents possible unsafe impacts to patients after learning states that there is no conflict of interest. 123 International Journal of Data Science and Analytics Open Access This article is distributed under the terms of the Creative 19. 
Jansen, A.C.M., van Aalst-Cohen, E.S., Hutten, B.A., Büller, H.R., Commons Attribution 4.0 International License (http://creativecomm Kastelein, J.J.P., Prins, M.H.: Guidelines were developed for data ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, collection from medical records for use in retrospective analyses. and reproduction in any medium, provided you give appropriate credit J. Clin. Epidemiol. 58(3), 269–274 (2005) to the original author(s) and the source, provide a link to the Creative 20. Joly, Y., Saulnier, K.M., Osien, G., Knoppers, B.M.: The ethi- Commons license, and indicate if changes were made. cal framing of personalized medicine. Curr. Opin. Allergy Clin. Immunol. 14(5), 404–408 (2014) 21. Jung, M.-H., Kim, G.-H., Kim, J.-H., Moon, K.-W., Yoo, K.-D., Rho, T.-H., Kim, C.-M.: Reliability of home blood pressure mon- References itoring: in the context of validation and accuracy. Blood Press. Monit. 20(4), 215–220 (2015) 1. Arpino, G., Generali, D., Sapino, A., Del Matro, L., Frassoldati, 22. Kuhle, S., Maguire, B., Ata, N., MacInnis, N., Dodds, L.: Birth A., de Laurentis, M., Pronzato, P., Mustacchi, G., Cazzaniga, M., weight for gestational age, anthropometric measures, and cardio- De Placido, S., et al.: Gene expression profiling in breast cancer: a vascular disease markers in children. J. Pediatr. 182(21), 99–106 clinical perspective. Breast 22(2), 109–120 (2013) (2016) 2. Beskow, L.M., Namey, E.E., Miller, P.R., Nelson, D.K., Cooper, A.: 23. Kushner, R.F.: Bioelectrical impedance analysis: a review of prin- Irb chairs perspectives on genotype-driven research recruitment. ciples and applications. J. Am. Coll. Nutr. 11(2), 199–209 (1992) IRB 34(3), 1 (2012) 24. Lehane, A., O’Brien, E.T., O’Malley, K.: Reporting of blood pres- 3. Bombard, Y., Abelson, J., Simeonov, D., Gauvin, F.-P.: Citizens’ sure data in medical journals. Br. Med. J. 
281(6255), 1603 (1980) perspectives on personalized medicine: a qualitative public delib- 25. Liu, C.-Y., Chen, P.-M., Chiou, T.-J., Liu, J.-H., Lin, J.-K., Lin, T.- eration study. Eur. J. Hum. Genet. 21(11), 1197–1201 (2013) C., Chen, W.-S., Jiang, J.-K., Wang, H.-S., Wang, W.-S.: Ugt1a1* 4. Burke, W., Psaty, B.M.: Personalized medicine in the era of 28 polymorphism predicts irinotecan-induced severe toxicities genomics. JAMA 298(14), 1682–1684 (2007) without affecting treatment outcome and survival in patients with 5. Cavallari, L.H., Sperber, N.R., Carpenter, J.S., et al.: Challenges metastatic colorectal carcinoma. Cancer 112(9), 1932–1940 (2008) and strategies for implementing genomic services in diverse set- 26. Marchioni, D.M.L., Junior, E.V., Galvão Cesar, C.L., Fisberg, tings: experiences from the implementing genomics in practice R.M.: Avaliação da adequação da ingestão de nutrientes na prática (ignite) network. BMC Med. Genom. 10(1), 35 (2017) clínica. Rev. Nutr. 24(6), 825–832 (2011) 6. Cowie, M.R., Blomster, J.I., Curtis, L.H., Duclaux, S., Ford, I., 27. McColl, E., et al. Design and use of questionnaires: a review Fritz, F., Goldman, S., Janmohamed, S., Kreuzer, J., Leenay, M., of best practice applicable to surveys of health service staff et al.: Electronic health records to facilitate clinical research. Clin. and patients. Core Research (2001). https://www.researchgate. Res. Cardiol. 106(1), 1–9 (2017) net/profile/Lois_Thomas/publication/11550481_Design_and_ 7. Cruz-Correia, R.J., Wyatt, J., Dinis-Ribeiro, M., Costa-Pereira, use_of_questionnaires_a_review_of_best_practice_applicable_ to_surveys_of_Health_Service_staff_and_patients/links/ A.M.: Determinants of frequency and longevity of hospital encoun- 00b7d52df7cdf18db6000000/Design-and-use-of-questionnaires- ters’ data. BMC Med. Inform. Decis. Mak. 10(1), 15 (2010) a-review-of-best-practice-applicable-to-surveys-of-Health- 8. 
Cruz-Correia, R., Boldt, I., Lapão, L., Santos-Pereira, C., Servicestaff-and-patients.pdf Rodrigues, P.P., Ferreira, A.M., Freitas, A.: Analysis of the quality 28. McGregor, M., Cambron, J.A., Jedlicka, J., Gudavalli, M.R.: Clin- of hospital information systems audit trails. BMC Med. Inform. ical trial variability: quality control in a randomized clinical trial. Decis. Mak. 13(1), 84 (2013) 9. Cuppari, L.: Aplicacoes das dris na avaliacao da ingestao de nutri- Contemp. Clin. Trials 30(1), 20–23 (2009) entes para individuos. In: Usos e aplicaes das dietary reference 29. Najafzadeh, M., Davis, J.C., Joshi, P., Marra, C.: Barriers for inte- Inatakes DRI, Brasil I (2001) grating personalized medicine into clinical practice: a qualitative 10. Eurostat: Mortality and life expectancy statistics, 2017 analysis. Am. J. Med. Genet. Part A 161(4), 758–763 (2013) 11. Ginsburg, G.S., Willard, H.F.: Genomic and personalized medicine: 30. Normando, D.: Dental press journal of orthodontics: one year later, foundations and applications. Transl. Res. 154(6), 277–287 (2009) and more growth. Dent. Press J. Orthod. 22, 9–10 (2017) 12. Graves, J.E., Pollock, M.L., Colvin, A.B., Van Loan, M., Lohman, 31. Novaes, J.F., Lamounier, J.A., Colosimo, E.A., Franceschini, T.G.: Comparison of different bioelectrical impedance analyzers S.C.C.: Breastfeeding and obesity in Brazilian children. Eur. J. in the prediction of body composition. Am. J. Hum. Biol. 1(5), Public Health 22(3), 383–389 (2012) 603–611 (1989) 32. Pagnacco, G., Carrick, F.R., Wright, C.H.G., Oggero, E.: Between- 13. Groop, P.-H., Forsblom, C.: Mechanisms of disease: pathway- subjects differences of within-subject variability in repeated bal- selective insulin resistance and microvascular complications of ance measures: consequences on the minimum detectable change. diabetes. Nat. Rev. Endocrinol. 1(2), 100 (2005) Gait Posture 41(1), 136–140 (2015) 14. Harvey, A., Brand, A., Holgate, S.T., Kristiansen, L.V., Lehrach, 33. 
Pasquier, T., Lau, M.K., Trisovic, A., Boose, E.R., Couturier, B., H., Palotie, A., Prainsack, B.: The future of technologies for per- Crosas, M., Ellison, A.M., Gibson, V., Jones, C.R., Seltzer, M.: If sonalised medicine. New Biotechnol. 29(6), 625–633 (2012) these data could talk. Sci. Data 4, 170114 (2017) 15. Hayes, D.F., Markus, H.S., Leslie, R.D., Topol, E.J.: Personal- 34. Pickering, T.G., Hall, J.E., Appel, L.J., Falkner, B.E., Graves, J., ized medicine: risk prediction, targeted therapies and mobile health Hill, M.N., Jones, D.W., Kurtz, T., Sheps, S.G., Roccella, E.J.: technology. BMC Med. 12(1), 37 (2014) Recommendations for blood pressure measurement in humans and 16. Imbelloni, L.E., Beato, L., Tolentino, A.P., de Souza, D.D., experimental animals. Circulation 111(5), 697–716 (2005) Cordeiro, J.A.: Monitores automáticos de pressão arterial: avali- 35. Pinho, J.R.R., Sitnik, R., Mangueira, C.L.P.: Personalized medicine and the clinical laboratory. Einstein (São Paulo) 12(3), 366–373 ação de três modelos em voluntárias. Rev. Bras. Anestesiol. 54(1), (2014) 43–52 (2004) 36. Rossi, S., et al.: p-Medicine: from data sharing and integration 17. Imran, M., Hlavacs, H., Haq, I.U., Jan, B., Khan, F.A., Ahmad, A.: via VPH models to personalized medicine. Ecancermedicalscience Provenance based data integrity checking and verification in cloud (2011). https://doi.org/10.3332/ecancer.2011.218 environments. PLoS ONE 12(5), e0177576 (2017) 18. Institute of Medicine: Dietary Reference Intakes: Applications in Dietary Assessment. National Academy Press, Washington (2000) 123 International Journal of Data Science and Analytics 37. Project DISCIPULUS: Digitally integrated scientific data for 46. Strickland-Marmol, L.B., Muro-Cacho, C.A., Barnett, S.D., Banas, patients and populations in user-specific simulations—project id: M.R., Foulis, P.R.: College of American pathologists cancer proto- 288143, 2013 cols: optimizing format for accuracy and efficiency. Arch. Pathol. 
38. Project myhealthavatar: A demonstration of 4d digital avatar infras- Lab. Med. 140(6), 578–587 (2016) tructure for access of complete patient information—project id: 47. Kalra, D., Beale, T., Heard, S.: The openEHR foundation. Stud 600929, 2013 health Technol Inform 115, 153–173 (2005) 39. Rodriguez-Segade, S., Rodriguez, J., Garca Lpez, J.M., Casanueva, 48. Vitolo, M.R. Avaliação nutricional no adulto. In: Vitolo, M.R. F.F., Camia, F.: Intrapersonal hba1c variability and the risk of pro- Nutrição: da gestação ao envelhecimento, 2nd edn, pp. 377–397. gression of nephropathy in patients with type 2 diabetes. Diabetic Rubio, Rio de Janeiro (2008) Med. 29(12), 1562–1566 (2012) 49. Vogenberg, F.R., Barash, C.I., Pursel, M.: Personalized medicine: 40. Saenger, P., Czernichow, P., Hughes, I., Reiter, E.O.: Small for part 2: ethical, legal, and regulatory issues. Pharm. Ther. 35(11), gestational age: short stature and beyond. Endocr. Rev. 28(2), 219– 624 (2010) 251 (2007) 50. Wadén, J., Forsblom, C., Thorn, L.M., Gordin, D., Saraheimo, M.: 41. Savard, J.: Personalised medicine: a critique on the future of health A1c variability predicts incident cardiovascular events, microal- care. J. Bioethical Inquiry 10(2), 197–203 (2013) buminuria, and overt diabetic nephropathy in patients with type 1 42. Scheen, A.J., Giet, D.: Personalized medicine: all benefits for the diabetes. Diabetes 58(11), 2649–2655 (2009) patient but new challenge in the physician–patient relationship. 51. Weiskopf, N.G., Weng, C.: Methods and dimensions of electronic Rev. Med. Liege 70(5–6), 247–250 (2015) health record data quality assessment: enabling reuse for clinical 43. Schleidgen, S., Klingler, C., Bertram, T., Rogowski, W.H., Marck- research. J. Am. Med. Inform. Assoc. 20(1), 144–151 (2013) mann, G.: What is personalized medicine: sharpening a vague term 52. Wyatt, J.C., Wright, P.: Design should help use of patients’ data. based on a systematic literature review. BMC Med. 
Ethics 14(1), Lancet 352, 1375–1378 (1998) 55 (2013) 53. Xiao, L., Cousins, G., Courtney, B., Hederman, L., Fahey, T., 44. Sempos C.T., Looker A.C., Johnson C.L., Woteki C.E. The impor- Dimitrov, B.: Developing an electronic health record (EHR) for tance of within-person variability in estimating prevalence. In: methadone treatment recording and decision support. BMC Med. Macdonald, I. (eds.) Monitoring dietary intakes. ILSI Mono- Inform. Decis. Mak. 11(5), 2–10 (2011) graphs. Springer, London (1991). https://doi.org/10.1007/978-1- 54. Zarrati, M., Shidfar, F., Moradof, M., Nejad, F.N., Keyvani, H., 4471-1828-2_9 Hemami, M.R., Razmpoosh, E.: Relationship between breast feed- 45. Shortliffe, E.H., Cimino, J.J.: Biomedical Informatics: Computer ing and obesity in children with low birth weight. Iran. Red Crescent Applications in Health Care and Biomedicine (Health Informatics), Med. J. 15(8), 676–682 (2013) 4th edn. Springer, Berlin (2015) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png International Journal of Data Science and Analytics Springer Journals
Publisher
Springer International Publishing
Copyright
Copyright © 2018 by The Author(s)
Subject
Computer Science; Data Mining and Knowledge Discovery; Database Management; Artificial Intelligence (incl. Robotics); Computational Biology/Bioinformatics; Business Information Systems
ISSN
2364-415X
eISSN
2364-4168
D.O.I.
10.1007/s41060-018-0127-9

Abstract

Personalised evidence-based-medicine aims to use stored health data to prevent future illnesses. This implies that data should be stored in a readable and understandable form, at least until the death of the person in question. The aim of this paper is to discuss the challenges that arise from the existing pressure to maintain health data in electronic format for many decades. Today clinical databases are filled with heterogeneous data regarding who has collected it, protocols used, detail, precision, and subjectivity. Some data elements are typically more exposed to these problems (e.g. diagnosis) than others (e.g. laboratory results). It is critical that data scientists fully understand how data were collected. Also, it is very important to store context information, protocols used and accuracy/precision information in clinical databases to ensure future understanding of such data.

Keywords
Quality of data · Personalised medicine · Electronic patient record · Heterogeneous data
The device exactly in the same position every time, or taking the estimation of consumption based on just a few days of col- same amount of adipose panicle with the adipometer to assess lection leads to critical failures in this context [44]. In other body composition. hands, the short time of food observation does not reflect the Thus, we understand that computer system can improve habitual intake [9]. For this, the number of the days of evalua- data quality, but unfortunately, the adequacy of the patient’s tion and the kind of instrument (24h-record; diary food, etc.) clinical trial rules, and storing information about the protocol are significant tools to obtain accurate results. The Institute used in each case, are still essential for this quality of data. of Medicine (IOM) [18], which published the Dietary Ref- erence Intakes (DRI), takes into account both the variability of the nutrient requirement in individuals and intrapersonal 6 Personalised medicine: pros and cons variability of intake. For its application, however, it is nec- essary to use values of intrapersonal variability, expressed Personalised medicine is defined as the use of genomic and by the intrapersonal standard deviation of ingestion of each other biotechnologies to derive information about an individ- nutrient, obtained in studies with the same population [26]. ual that could then be applied to obtain information on types However, we cannot forget that the patients commitment to of health interventions that would best suit that individual filling all food instrument and not lie about the dietary intake, [4,41]. Over recent years, considerable technical advances is a fundamental point to obtain correct data about the dietary have increasingly linked personalised medicine with preven- intake. tive medicine. Although this process provides benefits in Similarly, the blood pressure measurement presents intrin- treating patients, particularly regarding the genetic profile, sic variability [16]. 
Blood pressure measurement in some challenges, mainly regarding the lay public, still exist. Thus, cases is still performed in a non-standardised way [24]; some certain points must be discussed to ensure the protection and factors as the health professional, environment; equipment, fair treatment of individuals [20,42]. technical and the patient can interfere in trustworthiness to Personalised medicine is an example of what medicine blood pressure assessment. The protocol recommendations desires to be in the future: specific, rigorous, and able to con- include avoiding physical exercise 60 min before the exam, trol disease and death [41]. For this, this field applies tools drinking alcohol, coffee, smoking 30 min before the assess- that enable risk assessment and prediction, such as health ment, not talking and keeping your legs uncrossed, these are risk assessments, family history and, mainly, genetic infor- just a few recommendations [34]. mation [11]. The advantages of personalised medicine are Besides, other types of patients intrapersonal variability, not simply applicable to patient treatment, but also aid in the environments are important to the trustworthiness of clinical prevention and prediction of disease by identifying genetic trials results. Rodriguez-Segade et al. [39] investigated the predispositions, predicting a potential patient [11,41]. association between nephropathy and HbA1c variability in Recent studies have been reported concerning the bene- 2103 patients followed up for a mean 6.6 years. The authors fits of personalised medicine, which covers biomarker used to concluded that in patients with type 2 diabetes, the risk detect specific genetic traits and to guide different approaches of progression of nephropathy increases significantly with towards the prevention and treatment of diseases, offer- HbA1c variability, independently of updates mean HbA1c. ing substantial healthcare savings [35]. Najafzadeh et al. 
To explain this point, the author hypothesizes that lifestyle [29] described several potential advantages of personalised influences HbA1c variability. Greater HbA1c variability hav- medicine, such as possible applications of pharmacoge- ing been reported to be associated with unfavourable lifestyle nomics in tailoring treatments to improve effectiveness and factors among patients with type 1 diabetes [50]. However, minimise adverse effects; disease diagnosis; genomic testing other authors demonstrate that this variability can be involved in preventive medicine and the identification of new condi- in a low socioeconomic class [50], and insulin resistance, tions. which it has been implicated in the pathogenesis of dia- In addition, cancer prevention and treatment appears as the betes complications [13]. These studies demonstrate that the greatest potential in the field of personalised medicine [15]. patient’s variability is associated with environments and also For example, regarding breast cancer, different immunologic genetic factors which are essential be considered by the clin- markers have been applied to indicate the best treatment ical trials investigators. option and to assess metastasis and recurrence risk [1], whilst colon cancer therapy can be evaluated by genetic testing; 123 International Journal of Data Science and Analytics for example, homozygotes subjects for the UGT1A1, *28 about their predisposition to diseases, and the affordability of allele show increased risk for neutropenia after treatment these tests for disadvantaged socioeconomic groups should with irinotecan, with a reduction in the starting dose being also be taken into account [29]. In addition, citizens also advised [25]. 
appear concerned about personalised medicine perspectives; However, although personalised medicine has been applied they believe that personalised tests might be used to ration in multiple areas, mainly in oncology and cardiology, and care and that treatment should be applied only if the patient many benefits for patient care have been noted, risks can- wants it. This issue raises clinical and policy challenges that not be ignored. The field also offers many disadvantages, may undermine the value of personalised medicine. Further and several challenges are presents in this context, related efforts to deliberate with the public are warranted in order to informed consent, confidentiality, genetic discrimination to inform effective, efficient, and equitable translations of and direct-to-consumer genetic testing, among others [20]. personalised medicine [3]. Nevertheless, as research/treatment makes use of person- Through these advantages, it is possible to understand alised medicine, becoming increasingly more common, the that personalised medicine raises certain challenges, both for management of the ethical and legal issues becomes even physician and healthcare systems, including the enormous more necessary. For example, in a hypothetical situation number of available tests, the fast development of testing related to a patient that presented no response to a specific technologies, the decreasing unit cost per tested mutation treatment, the health insurance requires genetic testing to and the potential of diagnostic and screening technologies determine drug safety and efficacy, in order to avoid unnec- to determine subsequent individual care pathways. Addi- essary cost burdens. 
This point is an advantage regarding cost tionally, the support for required economic evidence to be perspective, but it is not ethical, and the patient might prefer produced to improve personalised medicine reimbursement to incur the risks to avoid generating and releasing his/her and coverage decisions is also noteworthy [43]. genomic information [49]. Another ethical issue is related From these described advantages and disadvantages, it to genotype-driven research recruitment. This is a poten- is possible to understand the challenges of personalised tially powerful tool for studying the functional significance medicine in order to aid in creating strategies to prepare the of human genetic variation. However, the genetic informa- future healthcare system, reducing errors and improving the tion generated for one study might be used as the basis for pros of personalised medicine. identifying and reconnecting participants for other studies [2]. These and other points justify the development of rules and clear guidelines to ensure the control and safety of the obtained information. 7 Discussion Other disadvantages and challenges in the use of person- alised medicine can also be cited. Najafzadeh et al. [29] The quality of clinical data reducing errors of clinical results performed a semi-structured focus group with 28 physicians is a challenge. The electronic health record can help to to discuss general themes about personalised medicine. From improve it, but clinicians must have attention on innumerable these focus groups, the authors categorised the disadvan- factors, mainly related to patient involvement. We suggest tages of personalised medicine expressed by experts into that despite the use of EHR, health professionals should three perceived issues: validity uncertainty, equity issues, always discuss their results amply and attentively and pro- and implementation. 
The authors described that the physi- mote the inclusion of the protocol used collecting data in an cians expressed concerns, mainly related to the uncertainty attempt to improve clinical data. around the validity of genomic tests given the complexity of Information in healthcare institution has the potential to gene expression, which was mentioned as a major concern. be very valuable due to the existing amount of data, and the Other points mentioned as potential disadvantages included economic value of the decisions they describe. To fulfil this financial incentives for private companies for excessive mar- potential today and in the future, it is critical that data scien- keting of their services, possible mishandling of genomic tists fully understand how these data were collected. Storing information by private companies, and discrimination based context information, protocols used and precision/accuracy on the genomic data (by insurance companies, the healthcare information in clinical databases helps to ensure future under- system, and employers). standing of such data. In spite of the fact that personalised medicine consid- ers the patient-centred approach to treatment and that it be Compliance with ethical standards most advantageous in the clinical practice, higher public engagement regarding this issue is required. Physicians also believe that lack of public knowledge about genomic tests Conflict of interest On behalf of all authors, the corresponding author presents possible unsafe impacts to patients after learning states that there is no conflict of interest. 123 International Journal of Data Science and Analytics Open Access This article is distributed under the terms of the Creative 19. 
Jansen, A.C.M., van Aalst-Cohen, E.S., Hutten, B.A., Büller, H.R., Commons Attribution 4.0 International License (http://creativecomm Kastelein, J.J.P., Prins, M.H.: Guidelines were developed for data ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, collection from medical records for use in retrospective analyses. and reproduction in any medium, provided you give appropriate credit J. Clin. Epidemiol. 58(3), 269–274 (2005) to the original author(s) and the source, provide a link to the Creative 20. Joly, Y., Saulnier, K.M., Osien, G., Knoppers, B.M.: The ethi- Commons license, and indicate if changes were made. cal framing of personalized medicine. Curr. Opin. Allergy Clin. Immunol. 14(5), 404–408 (2014) 21. Jung, M.-H., Kim, G.-H., Kim, J.-H., Moon, K.-W., Yoo, K.-D., Rho, T.-H., Kim, C.-M.: Reliability of home blood pressure mon- References itoring: in the context of validation and accuracy. Blood Press. Monit. 20(4), 215–220 (2015) 1. Arpino, G., Generali, D., Sapino, A., Del Matro, L., Frassoldati, 22. Kuhle, S., Maguire, B., Ata, N., MacInnis, N., Dodds, L.: Birth A., de Laurentis, M., Pronzato, P., Mustacchi, G., Cazzaniga, M., weight for gestational age, anthropometric measures, and cardio- De Placido, S., et al.: Gene expression profiling in breast cancer: a vascular disease markers in children. J. Pediatr. 182(21), 99–106 clinical perspective. Breast 22(2), 109–120 (2013) (2016) 2. Beskow, L.M., Namey, E.E., Miller, P.R., Nelson, D.K., Cooper, A.: 23. Kushner, R.F.: Bioelectrical impedance analysis: a review of prin- Irb chairs perspectives on genotype-driven research recruitment. ciples and applications. J. Am. Coll. Nutr. 11(2), 199–209 (1992) IRB 34(3), 1 (2012) 24. Lehane, A., O’Brien, E.T., O’Malley, K.: Reporting of blood pres- 3. Bombard, Y., Abelson, J., Simeonov, D., Gauvin, F.-P.: Citizens’ sure data in medical journals. Br. Med. J. 
281(6255), 1603 (1980) perspectives on personalized medicine: a qualitative public delib- 25. Liu, C.-Y., Chen, P.-M., Chiou, T.-J., Liu, J.-H., Lin, J.-K., Lin, T.- eration study. Eur. J. Hum. Genet. 21(11), 1197–1201 (2013) C., Chen, W.-S., Jiang, J.-K., Wang, H.-S., Wang, W.-S.: Ugt1a1* 4. Burke, W., Psaty, B.M.: Personalized medicine in the era of 28 polymorphism predicts irinotecan-induced severe toxicities genomics. JAMA 298(14), 1682–1684 (2007) without affecting treatment outcome and survival in patients with 5. Cavallari, L.H., Sperber, N.R., Carpenter, J.S., et al.: Challenges metastatic colorectal carcinoma. Cancer 112(9), 1932–1940 (2008) and strategies for implementing genomic services in diverse set- 26. Marchioni, D.M.L., Junior, E.V., Galvão Cesar, C.L., Fisberg, tings: experiences from the implementing genomics in practice R.M.: Avaliação da adequação da ingestão de nutrientes na prática (ignite) network. BMC Med. Genom. 10(1), 35 (2017) clínica. Rev. Nutr. 24(6), 825–832 (2011) 6. Cowie, M.R., Blomster, J.I., Curtis, L.H., Duclaux, S., Ford, I., 27. McColl, E., et al. Design and use of questionnaires: a review Fritz, F., Goldman, S., Janmohamed, S., Kreuzer, J., Leenay, M., of best practice applicable to surveys of health service staff et al.: Electronic health records to facilitate clinical research. Clin. and patients. Core Research (2001). https://www.researchgate. Res. Cardiol. 106(1), 1–9 (2017) net/profile/Lois_Thomas/publication/11550481_Design_and_ 7. Cruz-Correia, R.J., Wyatt, J., Dinis-Ribeiro, M., Costa-Pereira, use_of_questionnaires_a_review_of_best_practice_applicable_ to_surveys_of_Health_Service_staff_and_patients/links/ A.M.: Determinants of frequency and longevity of hospital encoun- 00b7d52df7cdf18db6000000/Design-and-use-of-questionnaires- ters’ data. BMC Med. Inform. Decis. Mak. 10(1), 15 (2010) a-review-of-best-practice-applicable-to-surveys-of-Health- 8. 
Cruz-Correia, R., Boldt, I., Lapão, L., Santos-Pereira, C., Servicestaff-and-patients.pdf Rodrigues, P.P., Ferreira, A.M., Freitas, A.: Analysis of the quality 28. McGregor, M., Cambron, J.A., Jedlicka, J., Gudavalli, M.R.: Clin- of hospital information systems audit trails. BMC Med. Inform. ical trial variability: quality control in a randomized clinical trial. Decis. Mak. 13(1), 84 (2013) 9. Cuppari, L.: Aplicacoes das dris na avaliacao da ingestao de nutri- Contemp. Clin. Trials 30(1), 20–23 (2009) entes para individuos. In: Usos e aplicaes das dietary reference 29. Najafzadeh, M., Davis, J.C., Joshi, P., Marra, C.: Barriers for inte- Inatakes DRI, Brasil I (2001) grating personalized medicine into clinical practice: a qualitative 10. Eurostat: Mortality and life expectancy statistics, 2017 analysis. Am. J. Med. Genet. Part A 161(4), 758–763 (2013) 11. Ginsburg, G.S., Willard, H.F.: Genomic and personalized medicine: 30. Normando, D.: Dental press journal of orthodontics: one year later, foundations and applications. Transl. Res. 154(6), 277–287 (2009) and more growth. Dent. Press J. Orthod. 22, 9–10 (2017) 12. Graves, J.E., Pollock, M.L., Colvin, A.B., Van Loan, M., Lohman, 31. Novaes, J.F., Lamounier, J.A., Colosimo, E.A., Franceschini, T.G.: Comparison of different bioelectrical impedance analyzers S.C.C.: Breastfeeding and obesity in Brazilian children. Eur. J. in the prediction of body composition. Am. J. Hum. Biol. 1(5), Public Health 22(3), 383–389 (2012) 603–611 (1989) 32. Pagnacco, G., Carrick, F.R., Wright, C.H.G., Oggero, E.: Between- 13. Groop, P.-H., Forsblom, C.: Mechanisms of disease: pathway- subjects differences of within-subject variability in repeated bal- selective insulin resistance and microvascular complications of ance measures: consequences on the minimum detectable change. diabetes. Nat. Rev. Endocrinol. 1(2), 100 (2005) Gait Posture 41(1), 136–140 (2015) 14. Harvey, A., Brand, A., Holgate, S.T., Kristiansen, L.V., Lehrach, 33. 
Pasquier, T., Lau, M.K., Trisovic, A., Boose, E.R., Couturier, B., H., Palotie, A., Prainsack, B.: The future of technologies for per- Crosas, M., Ellison, A.M., Gibson, V., Jones, C.R., Seltzer, M.: If sonalised medicine. New Biotechnol. 29(6), 625–633 (2012) these data could talk. Sci. Data 4, 170114 (2017) 15. Hayes, D.F., Markus, H.S., Leslie, R.D., Topol, E.J.: Personal- 34. Pickering, T.G., Hall, J.E., Appel, L.J., Falkner, B.E., Graves, J., ized medicine: risk prediction, targeted therapies and mobile health Hill, M.N., Jones, D.W., Kurtz, T., Sheps, S.G., Roccella, E.J.: technology. BMC Med. 12(1), 37 (2014) Recommendations for blood pressure measurement in humans and 16. Imbelloni, L.E., Beato, L., Tolentino, A.P., de Souza, D.D., experimental animals. Circulation 111(5), 697–716 (2005) Cordeiro, J.A.: Monitores automáticos de pressão arterial: avali- 35. Pinho, J.R.R., Sitnik, R., Mangueira, C.L.P.: Personalized medicine and the clinical laboratory. Einstein (São Paulo) 12(3), 366–373 ação de três modelos em voluntárias. Rev. Bras. Anestesiol. 54(1), (2014) 43–52 (2004) 36. Rossi, S., et al.: p-Medicine: from data sharing and integration 17. Imran, M., Hlavacs, H., Haq, I.U., Jan, B., Khan, F.A., Ahmad, A.: via VPH models to personalized medicine. Ecancermedicalscience Provenance based data integrity checking and verification in cloud (2011). https://doi.org/10.3332/ecancer.2011.218 environments. PLoS ONE 12(5), e0177576 (2017) 18. Institute of Medicine: Dietary Reference Intakes: Applications in Dietary Assessment. National Academy Press, Washington (2000) 123 International Journal of Data Science and Analytics 37. Project DISCIPULUS: Digitally integrated scientific data for 46. Strickland-Marmol, L.B., Muro-Cacho, C.A., Barnett, S.D., Banas, patients and populations in user-specific simulations—project id: M.R., Foulis, P.R.: College of American pathologists cancer proto- 288143, 2013 cols: optimizing format for accuracy and efficiency. Arch. Pathol. 
38. Project myhealthavatar: A demonstration of 4d digital avatar infras- Lab. Med. 140(6), 578–587 (2016) tructure for access of complete patient information—project id: 47. Kalra, D., Beale, T., Heard, S.: The openEHR foundation. Stud 600929, 2013 health Technol Inform 115, 153–173 (2005) 39. Rodriguez-Segade, S., Rodriguez, J., Garca Lpez, J.M., Casanueva, 48. Vitolo, M.R. Avaliação nutricional no adulto. In: Vitolo, M.R. F.F., Camia, F.: Intrapersonal hba1c variability and the risk of pro- Nutrição: da gestação ao envelhecimento, 2nd edn, pp. 377–397. gression of nephropathy in patients with type 2 diabetes. Diabetic Rubio, Rio de Janeiro (2008) Med. 29(12), 1562–1566 (2012) 49. Vogenberg, F.R., Barash, C.I., Pursel, M.: Personalized medicine: 40. Saenger, P., Czernichow, P., Hughes, I., Reiter, E.O.: Small for part 2: ethical, legal, and regulatory issues. Pharm. Ther. 35(11), gestational age: short stature and beyond. Endocr. Rev. 28(2), 219– 624 (2010) 251 (2007) 50. Wadén, J., Forsblom, C., Thorn, L.M., Gordin, D., Saraheimo, M.: 41. Savard, J.: Personalised medicine: a critique on the future of health A1c variability predicts incident cardiovascular events, microal- care. J. Bioethical Inquiry 10(2), 197–203 (2013) buminuria, and overt diabetic nephropathy in patients with type 1 42. Scheen, A.J., Giet, D.: Personalized medicine: all benefits for the diabetes. Diabetes 58(11), 2649–2655 (2009) patient but new challenge in the physician–patient relationship. 51. Weiskopf, N.G., Weng, C.: Methods and dimensions of electronic Rev. Med. Liege 70(5–6), 247–250 (2015) health record data quality assessment: enabling reuse for clinical 43. Schleidgen, S., Klingler, C., Bertram, T., Rogowski, W.H., Marck- research. J. Am. Med. Inform. Assoc. 20(1), 144–151 (2013) mann, G.: What is personalized medicine: sharpening a vague term 52. Wyatt, J.C., Wright, P.: Design should help use of patients’ data. based on a systematic literature review. BMC Med. 
Ethics 14(1), Lancet 352, 1375–1378 (1998) 55 (2013) 53. Xiao, L., Cousins, G., Courtney, B., Hederman, L., Fahey, T., 44. Sempos C.T., Looker A.C., Johnson C.L., Woteki C.E. The impor- Dimitrov, B.: Developing an electronic health record (EHR) for tance of within-person variability in estimating prevalence. In: methadone treatment recording and decision support. BMC Med. Macdonald, I. (eds.) Monitoring dietary intakes. ILSI Mono- Inform. Decis. Mak. 11(5), 2–10 (2011) graphs. Springer, London (1991). https://doi.org/10.1007/978-1- 54. Zarrati, M., Shidfar, F., Moradof, M., Nejad, F.N., Keyvani, H., 4471-1828-2_9 Hemami, M.R., Razmpoosh, E.: Relationship between breast feed- 45. Shortliffe, E.H., Cimino, J.J.: Biomedical Informatics: Computer ing and obesity in children with low birth weight. Iran. Red Crescent Applications in Health Care and Biomedicine (Health Informatics), Med. J. 15(8), 676–682 (2013) 4th edn. Springer, Berlin (2015)

Journal

International Journal of Data Science and Analytics, Springer Journals

Published: Jun 2, 2018
