Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

A Geoprivacy by Design Guideline for Research Campaigns That Use Participatory Sensing Data:

A Geoprivacy by Design Guideline for Research Campaigns That Use Participatory Sensing Data: Participatory sensing applications collect personal data of monitored subjects along with their spatial or spatiotemporal stamps. The attributes of a monitored subject can be private, sensitive, or confidential information. Also, the spatial or spatiotemporal attributes are prone to inferential disclosure of private information. Although there is extensive problem- oriented literature on geoinformation disclosure, our work provides a clear guideline with practical relevance, containing the steps that a research campaign should follow to preserve the participants’ privacy. We first examine the technical aspects of geoprivacy in the context of participatory sensing data. Then, we propose privacy-preserving steps in four categories, namely, ensuring secure and safe settings, actions prior to the start of a research survey, processing and analysis of collected data, and safe disclosure of datasets and research deliverables. Keywords geoprivacy by design, location privacy, spatiotemporal data, mobile participatory sensors, disclosure risk, anonymization methods, research design, spatial analysis Silva, Zeile, de Oliveira Aguiar, Papastefanou, & Bergner, Introduction 2014). An a-spatial example is the HealthSense project that Participatory sensing refers to sensor data gained voluntarily improves the classification of health detection events from participants for personal benefits or to benefit the com- through user feedback information incorporated into munity (Christin, Reinhardt, Kanhere, & Hollick, 2011). machine learning techniques (Stuntebeck, Davis, Abowd, & Sensors are attached to mobile devices such as smartphones Blount, 2008). The application examples mentioned so far or smart wristbands, and typically collect data to be exam- collect and analyze objective measurements from sensors. ined (e.g., heart rate) along with other sensed data such as However, in some spatial studies subjective measurements location, time, pictures, sound, and video. The main sensing (i.e., provided by the participant via a questionnaire app) measurement can be collected for personal interest such as are collected to either complement objective measurements the BALANCE system that detects the caloric expenditure of of biometric sensors (Resch, Summa, Sagl, Zeile, & Exner, a user (Denning et al., 2009). Another application of partici- 2015), or measure emotions and perceptions (e.g., fear of patory sensing is to alert medical staff of their patients’ crime, happiness, perception of environmental and built abnormal behaviors like the MobAsthma application that phenomena, or mood) that are more difficult to capture via measures asthma peak flows, pollution, and location to biometric sensors (MacKerron & Mourato, 2013; Solymosi, inform on asthma attacks (Kanjo, Bacon, Roberts, & Bowers, & Fujiyama, 2015; Törnros et al., 2016; Zeile, Landshoff, 2009). These applications are human centric Memmel, & Exner, 2012). because they collect information about the individual who The usage of spatiotemporal participatory sensing data is carries the sensor. There are also environment-centric appli- a scientific trend in many fields, and the intensity of the cations, where the participant acts as a “human as sensor operator” and carries the mobile device to capture environ- University of Salzburg, Austria mental phenomena such as air quality or noise (Kanjo et al., Center for Geographic Analysis, Harvard University, Cambridge, MA, USA 2009; Maisonneuve, Stevens, Niessen, & Steels, 2009). Corresponding Author: Also, participatory sensing has been used for spatial as Ourania Kounadi, Postdoctoral Researcher, Department of well as a-spatial research studies. The EmbaGIS application Geoinformatics–Z_GIS, University of Salzburg, Schillerstraße 30, depicts stress-level peaks in the movement of handicapped Salzburg 5020, Austria. people for the identification of urban barriers (Rodrigues da Email: ourania.kounadi@sbg.ac.at 204 Journal of Empirical Research on Human Research Ethics 13(3) studies is expected to increase in the future. However, these individual rights to prevent disclosure of the location of one’s home, workplace, daily activities, or trips. The purpose of data entail significant privacy violations risks, partially due protecting geo-privacy is to prevent individuals from being to their complexity, and partially because practitioners and identified through locational information (p. 3). the public are not fully aware of the potential disclosure risks linked to these data. With respect to the usage of par- The disclosure of locations may compromise individ- ticipatory sensing data in research studies, Resch (2013) ual privacy when these are used to infer personal infor- denotes the practitioners’ obligation to address several pri- mation about an individual (e.g., living place, working vacy issues such as data ownership, accessibility, integrity, place, frequently visited places). In addition, confidenti- liability, and participants’ opt-in/opt-out possibility. ality can be breached if the disclosed locations are linked However, practitioners are not always aware of privacy to one or more sensitive attributes such as in confidential implications, methods for protection, and how and when to discrete location datasets. Thus, spatial datasets may apply them in research. Three studies in the fields of medi- pose risks to both the privacy and confidentiality of the cine, health geography, sexual and reproductive health, entities. GIScience, geography, and spatial crime analysis examined Regarding participatory sensing data, Christin et al. how confidential point data of participants were portrayed (2011) provided a definition that gives full control of the on maps, and found numerous cases where original data disclosed information to the users of a participatory sensing were used instead of aggregated or anonymized data application: (Brownstein, Cassa, & Mandl, 2006b; Haley et al., 2016; Kounadi & Leitner, 2014). The studies cover a period Privacy in participatory sensing is the guarantee that between 1994 and 2015, and their findings remain consis- participants maintain control over the release of their sensitive tent; efforts to instill sensitivity to location privacy and dis- information. This includes the protection of information that closure risk have been relatively unsuccessful, and can be inferred from both the sensor readings themselves as researchers ignore or are unaware of the spatial reidentifica- well as from the interaction of the users with the participatory tion risk when publishing point data on maps. The findings sensing system (p. 1934). reveal the need for educating practitioners over privacy and confidentiality issues with the use of spatial data. The definition above describes privacy with respect to Our article aims to establish a general guidelines frame- e-diaries, health monitoring, or other applications. work for privacy-preserving tasks during a research cam- However, when it comes to data that need to be collected paign that collects participatory sensing data. The term for research purposes the disclosed information should be “research campaign” encompasses two possible research predefined in a confidentiality–participation agreement, efforts: First, an institution or research group not only con- and thus the control is transferred to the trusted data hold- ducts surveys for their studies, but they may also consider to ers (i.e., controller). publish the data, share them with other members of the Overall, geoprivacy definitions do not encompass all institution or with third parties. Second, a research group or types and applications of spatial data that are prone to com- an individual researcher collects survey data for a single promising individual privacy and/or confidentiality. For study. In the next sections, we analyze privacy issues and certain types, such as the collection of data through a sur- practices (sections “Geoprivacy, Confidentiality, and vey, a spatial confidentiality definition would be more Spatial Datasets” and “Essential Technical Analysis”), and appropriate to use than a location privacy definition. The then propose recommendations for the different stages of a complexity and several dimensions of the confidentiality research campaign (section “Privacy by Design Research and privacy risks linked to spatial data make the formula- Campaign”). tion of a single definition extremely difficult, if not impos- sible. However, there exist anonymization methods that have not only been developed for one datatype but can also Geoprivacy, Confidentiality, and be applied to another. Furthermore, some privacy threats Spatial Datasets that were mentioned for one datatype may have been Although privacy has been conceptualized and explored for neglected or unacknowledged for another datatype that has quite sometime (Post, 2001; Waldo, Herbert, & Lin Millett, similar risk of reidentification. This shows that privacy and 2007; Westin, 1968), privacy regarding spatial data is confidentiality literature for location data has to be exam- described with separate definitions and in sometimes distin- ined more broadly to bring complete solutions. The spatial guished by the type of spatial dataset that it addresses. A data that are at risk of disclosing private or confidential general definition that describes well geoprivacy for both information are listed below. Our categorization is subjec- confidential discrete location data and spatiotemporal tra- tive and aims at highlighting the differences of the catego- jectories of individuals by Kwan, Casas, and Schmitz ries that they have an effect on the geoprivacy strategy to be (2004) denotes that geoprivacy refers to implemented: Kounadi and Resch 205 1. Mobile phone data Each of the nine datasets has certain characteristics due 2. Location-based services (LBS) data to which protection approaches may differ between catego- 3. Location-based social network (LBSN) data ries of data. A LBS dataset may not only have similar attri- 4. Confidential discrete location data butes to a mobile phone dataset, but it may also have 5. Confidential discrete location data on individuals significant differences in its temporal frequency. The text 6. Sensitive discrete location data on individuals attributes of a LBSN dataset may lead to inferential disclo- 7. Data from mobile technical sensors carried by sure of personal preferences, opinions, and other private “humans as sensor operators” matters. The fourth dataset is about confidential locations 8. Data from mobile technical sensors carried by (e.g., a location where a radioactive material was stolen), “humans as objective sensors” and the fifth dataset is about confidential location data on 9. Data from mobile devices carried by “humans as individuals (e.g., the home location of a patient who has subjective sensors” been diagnosed with a certain disease). The approaches to protect the abovementioned datasets (i.e., method, anonym- Mobile phone data contain the users’ past locations ity measure, anonymity level as requested by authorities attached with their time stamp and other phone-related and institutions, and data to assess the disclosure risk) shall attributes depending on the dataset. The spatiotemporal be different. accuracy may vary depending on the population density, Furthermore, Datatypes 8 and 9 can be considered as the the method of extracting locations, and the type of dataset. most complex ones due to the variety and sensitivity of per- Typically, in areas with high population density, such as sonal information that is collected (i.e., spatial, temporal, cities and towns, the spatiotemporal accuracy is high. A and sensitive/confidential). Also, for research purposes typical example of the second type are applications for additional attributes of the data subjects and/or a combina- navigation services that, like the first type, may collect spa- tion of subjective and objective measurements can be col- tial and temporal information of their users. In the third lected. Our recommendations focus on Datatypes 8 and 9 dataset, a user has the option to disclose his or her location because their complexity and sensitivity can lead to greater along with the time stamp and the attribute information that privacy loss compared with the other datasets. is inherent in most social media applications (e.g., a text on Twitter). The fourth location dataset is the least discussed Essential Technical Analysis in the literature of location privacy. An exemplary dataset here is the Incident and Trafficking Database (ITDB) by Disclosure Risk of Released Data and the International Atomic Energy Agency enclosing the ille- Deliverables gal movement of nuclear and radioactive materials (International Atomic Energy Agency, 2015). The fifth and The comprehension of disclosure risk and reidentification sixth datatypes have been mostly discussed for health and techniques is critical to design efficient privacy implemen- crime geocoded datasets such as the residential locations of tations. Below, we present a list of release scenarios for patients of a disease or household locations of victims of a research efforts that collect microdata and associated deliv- crime. The next three datatypes refer to spatiotemporal erables of Datatypes 8 and 9. Each scenario is analyzed in data collected from participatory mobile sensing applica- terms of the risk of disclosure and privacy threats to the data tions. The “human as sensor operators” refers to examples subjects. The location protection methods and research where users of mobile phones capture environmentally guidelines in the next sections take into consideration these related information such as noise, traffic, and air quality. scenarios. However, we do not claim that this is an exhaus- However, to project this information spatially the temporal tive list. and spatial information of the users is captured as well. The eighth datatype involves physiological measurements of •• Scenario 1: Disclosure of original data the individual who carries the device such as data from biometric sensors used for health-monitoring purposes. In The full dataset is disclosed that includes the values for each the last type, the data subjects act as sensors similar to objective or subjective measurement (or both), the spatial and Datatype 8, but they report their own subjective percep- temporal stamps, as well as the identity of the measurement’s tions of the sensed attribute, which can be either about the subject. environment (e.g., public safety, quality of life, or road safety) or about themselves (e.g., fear or emotions). This is Data from Scenario 1 are prone to similar inference typically done with a smartphone application that sends attacks to data collected in LBSNs. According to Alrayes requests to the participants to enter their emotions or per- and Abdelmoty (2014), LBSNs contain three types of ceptions instantly, or at their earliest convenience (based semantics: the spatial semantics that can be used to infer on experience sampling method). places visited, the nonspatial semantics which are mostly 206 Journal of Empirical Research on Human Research Ethics 13(3) textual information for LBSN, whereas for participatory Scenario 1. The space–time stamps of a participant can be sensing these semantics are the subjective or objective mea- translated to trips with distinguishable start and ending des- surements, and the temporal semantics revealing the time tinations. What if the ending destination of a participant for and duration of a visited place. We filtered out privacy trips after 10:00 p.m. is frequently on the same or a nearby threats from inference attacks that were discussed by the location? This location can be the participant’s home loca- aforementioned authors based on their common characteris- tion. Krumm (2007) analyzed subjects’ trips for a recording tics with participatory sensing data. The following personal period of a minimum 2 weeks and tried to infer their home information can be inferred: (a) home location, (b) work locations using several algorithms. The median distance location, (c) most visited places and time spent at these error of the real home address to the inferred one was 60.7 places, (d) locations and activities during weekends, (e) m. Similar approaches may be used for most inference lunch places and after-work activities, (f) favorite stores, (g) attacks mentioned in Scenario 1. The spatial reidentification time spent away from home, and (h) time spent away from risk of data from participatory sensing applications depends work. In addition to these eight privacy threats, the partici- on the recording period, the residential patchiness study pants of the study will be known, and sensitive private area, and the frequency of the space–time stamps. Although specific reidentification studies for participatory sensing information depending on the measurement will be revealed. data do not exist, previous findings from other spatial This extreme scenario leads to a far-reaching loss of privacy datatypes pinpoint the risk that should not be neglected. and involves all types of disclosures (i.e., identity, attribute, and inferential—for definitions, refer to the supporting •• Scenario 4: Disclosure of quasi-identifiers and data information file). It is also worth mentioning other serious collection meta-data privacy threats that have been identified related to the use of mobile sensing applications such as identity theft, profiling, A dataset is disclosed that includes the values for each objective stalking, embarrassment, extortion, and cooperate use/mis- or subjective measurement (or both), the spatial and temporal use (Barcena, Wueest, & Lau, 2014). stamps, as well as one or more quasi identifiers of the measurement’s subject. •• Scenario 2: Disclosure of key identifiers Identity or attribute disclosure is difficult to achieve A dataset is disclosed that includes the values for each objective when quasi-identifiers (e.g., socioeconomic characteristics or subjective measurement (or both), the spatial and temporal of a subject) exist in a dataset that has multiple and variable stamps, as well as one or more key identifiers of the measurements per participant. This is because a subset of measurement’s subject. measurements cannot be linked to an individual. However, if there are only a couple of measurements with the same While a full name is not present in the dataset, other combination of quasi-identifiers it can be inferred that they identifying elements may be given such as e-mail or home belong to a single individual. Also, if the controller dis- address. E-mail addresses can be linked with other online closes information on the data collection methods (e.g., sources to reveal the identity of a participant. Furthermore, there are a minimum or predefined number of measure- home addresses can disclose the participants’ identities, ments per participant), this information can be used to especially in purely residential single family areas (i.e., a define a subset of measurements for one or more data sub- location depicts a residence of only one household). Even if jects. For example, a study collects 100 measurements per the home address is given as a set of geographical coordi- participant, and discloses this dataset along with the sex and nates, X and Y, instead of textual information, the latter can the occupation of each measurement’s subject. A subse- be inferred using freely available reverse geocoding ser- quent data analysis filters out 100 measurements of a man vices (Kounadi, Lampoltshammer, Leitner, & Heistracher, of occupation “X.” All measurements refer to one individ- 2013). ual, which is known due to data collection meta-data infor- mation. Also, it can be found that there is only one man of •• Scenario 3: Disclosure of pseudonyms this occupation in the study area. Thus, the identity and A dataset is disclosed that includes the values for each objective or attribute disclosure of this participant have been compro- subjective measurement (or both), the spatial and temporal stamps, mised like Scenarios 1 and 2. as well as a pseudonym representing the measurement’s subject. •• Scenario 5: Identifying participants in a digital map This scenario illustrates the inferential disclosure of such or printed map datasets with the use of data mining and geoprocessing techniques. If a participant is distinguished by an id, a sub- A map is disclosed in a digital or printed format that portrays the locations and/or values of the measurements for one or set of location data can be analyzed to infer his or her home more participants. address that will lead to privacy threats mentioned in Kounadi and Resch 207 Data deliverables such as participants’ maps are also Controllers often disclose meta-data regarding the loca- prone to reidentification. For example, a map is uploaded tion protection method or any other disclosure limitation on a website of a research organization portraying the val- technique that is applied to the original data to ensure that ues and locations of the measurements for one participant. confidentiality and privacy of subjects are protected, and Reengineering can be applied to the point map to extract the also to provide information on the spatial information loss geographical coordinates of the participant’s locations. of the anonymized released copy that may be used and ana- Brownstein et al. (2006a) applied a reengineering process lyzed by others. However, reengineering can be improved that involves an unsupervised classification to examine the with the disclosure of anonymization meta-data because, spatial reidentification risk of the publication of high- and just like Scenario 6, it provides hints to a potential attacker. low-resolution point maps. The number of correctly reengi- This has been tested with methods such as aggregation and neered addresses was 79% for the high-resolution map and perturbation (Zimmerman & Pavlik, 2008). 26% for the low-resolution map, indicating that by lowering the resolution of a digital map does not prevent reidentifica- Disclosure Risk of Data Collection and Storing tion. Once the coordinates of the participant are extracted, on Devices the home address can be estimated (Scenario 3), then reverse identification (Scenario 2) will reveal a single Data security has been characterized by Boulos, Curtis, and address or a set of addresses, and finally addresses can be AbdelMalik (2009) as the “missing ring” in privacy-pre- used to infer the identity of the participant. The disclosure serving discussions. The authors describe a scenario of a remains even if the map is in a printed format. In this case, research study that has a well-defined privacy-preserving the map can be scanned and georeferenced to a known coor- plan, has been approved by an institutional review board dinate system. The reengineering error of a point printed (IRB), and employs adequate practices for the publication map was examined by Leitner, Mills, and Curtis (2007) of results and maps. However, the security components are who found that the distance errors (i.e., distance from the not checked and approved as the other parts of the research actual to the reengineered location) ranged from 59.54 m to study such as the subjects’ consent to conduct the study, 156.63 m, and are independent of the map scale. disclosure risk of analysis, reporting findings, and sharing data. Thus, the research process is likely to neglect risks •• Scenario 6: Multiple versions of anonymized regarding data theft, data loss, or data disclosure to nonau- datasets thorized parties. Tracking devices that collect physiological or subjective The controller releases multiple versions of anonymised copies measurements can be smartphone applications that collect of the original data. responses to emotions and perceptions, smartphone appli- cations that exploit built-in sensors, or wearable tracking In this scenario, original data are first anonymized using devices such as a wristband or a watch. The measurements an anonymization method. The controller shares the anony- are stored in databases locally, remotely, or both. Data are mized data with a research firm, and soon after discards viewed and analyzed via computer (smartphone, desktop, them because he or she owns the original data. After some- or laptop), and frequently require Internet access (i.e., time, another research firm may make a request for an ano- cloud-based model). Based on the structure of self-tracking nymized copy. The controller reapplies the anonymization systems, security risks exist when data are stored on the method that incorporates a randomization function, and device, data are stored in the cloud, and data are transmitted therefore the anonymized copy is different than the first to the cloud. Barcena et al. (2014) examined a range of self- one. The more this process is repeated, the more copies are tracking services regarding the security issues that take distributed that increase the spatial reidentification risk of place during the storing or transmission of data. First, they the original data. Multiple versions of anonymized dataset found that Bluetooth-Low-Energy-enabled devices can may give hints regarding the method’s parameters and char- transmit a signal that can be read by scanning devices and acteristics to an attacker who will try to reidentify the origi- provide an estimate location of the device. Therefore, the nal data. This scenario has been tested and confirmed for spatiotemporal patterns of the users can be leaked (the same the “non-deterministic Gaussian skew” location protection applies when Wi-Fi is enabled on the device). Second, 20% method (Cassa, Wieland, & Mandl, 2008). of the examined applications that offer cloud-based service components may transmit login credentials in clear text •• Scenario 7: Disclosure of anonymization meta-data (i.e., nonencrypted data). Third, the examined services con- tacted on an average five unique domains. These domains The controller releases metadata information on the location protection method and/or additional disclosure limitation receive information on the user’s behavior and activities practices applied to the original data. without the users being aware of it. Fourth, the services 208 Journal of Empirical Research on Human Research Ethics 13(3) employ user account-based services that make the sessions way that predefined cross-tabulations are preserved. Also, insecure and potential to be hijacked. Fifth, data leakage most techniques can be applied to the records of the matrix may occur if applications use third-party services. Last but (i.e., record transforming masks) or to the columns of the not least, half of the existing services do not have or do not matrix (i.e., attribute transforming mask; Duncan & make available their privacy policies. Pearson, 1991). Several security and anonymity frameworks, however, The first generation of anonymization methods for con- have been proposed for participatory sensing applications fidential discrete spatial datasets, commonly known as (De Cristofaro & Soriente, 2011; Shin et al., 2011; X. O. “geomasking-techniques,” is based on existing methods on Wang, Cheng, Mohapatra, & Abdelzaher, 2013). These microdata such as aggregation and modification with spe- frameworks provide mechanisms to preserve users’ privacy cific adaptations to protect the spatial attribute of the data. when their data are reported in the cloud to a service pro- According to Zandbergen (2014), “Geographic masking is vider. However, we should outline here that in the context of the process of altering the coordinates of point location data a research campaign it is not necessary to send and store data to limit the risk of re-identification upon release of the data in the cloud or to involve a third-party service provider. (p. 4).” The alteration of the coordinates produces an aggre- gated dataset or a modified dataset depending on the tech- nique to be used. If points are aggregated into areal units, Anonymization Methods the transformed dataset has fewer entities than the original In this section, we refer to widely discussed anonymization dataset with count data for each one of them, similar to methods (Table 1) that aim to protect from Disclosure microdata aggregation. If points are aggregated into a new Scenarios 1 to 5. However, we should outline that most of set of symbolic or surrogate points, the transformed dataset the methods have not been evaluated for Scenarios 6 and 7 may retain the original number of observations (Armstrong, on meta-data disclosure or multiple versions of anonymized Rushton, & Zimmerman, 1999; Leitner & Curtis, 2004). copies. The methods mostly affect the precision or the accu- Regarding the modification of the coordinates, points can racy of the produced anonymized (“masked”) data. be processed at a global level with an affine transformation Precision refers to the exactness of information (in geo- (Armstrong et al., 1999) or other cartographic techniques graphical terms, it is the number of decimal places of the such as flipping and rotation (Leitner & Curtis, 2004), and latitude and longitude of locations), whereas accuracy is the at a local level by modifying points with approaches based relation between a measured value and the ground truth. In on random perturbation (Kwan et al., 2004; Leitner & general, “precision-affecting” methods are accurate with Curtis, 2004), or snapping them along the edges of their respect to the information they report, and “accuracy-affect- corresponding Voronoi polygon (Seidl, Paulus, Jankowski, ing” methods are fairly precise. For example, if an observa- & Regenfelder, 2015). tion is aggregated into a postcode level it is not as precise as Adaptive geomasking techniques are modification tech- a point-level observation, but the information that the niques that displace original point locations within uncer- observation lies within the postcode is accurate. Similarly, tainty areas, where the sizes of these areas are defined by if an observation is translated 300 m to the north it is very the underlying population density. The purpose of these precise but still inaccurate. techniques is to offer a “spatial k-anonymity,” meaning that Early methods are mainly statistical and were developed each confidential or private location on the dataset (e.g., a for the protection of microdata. Due to the nature of the household) cannot be distinguished among k-1 other loca- data, the methods are applied to a matrix in which each row tions. Spatial k-anonymity is an adaptation of the classic is a subject and each column an attribute. Although the k-anonymity model. K-anonymity ensures that an effort to structure of participatory sensing spatiotemporal data is dif- identify information of an entity ambiguously maps infor- ferent, these methods formed the basis for the next genera- mation to at least k entities; in other words, any group is tion of more advanced techniques, including the spatial or hidden in a group of size k regarding the quasi-identifiers the spatiotemporal ones. They can be summarized into four (Samarati & Sweeney, 1998). The uncertainty area of the categories: abbreviation, aggregation, modification, and “population-density-based Gaussian spatial blurring” is cir- fabrication (Cox, 1996). An example of abbreviation is the cular, and the selection of the displacement is based on a suppression of records (in this context, it means removal) normal distribution (Cassa, Grannis, Overhage, & Mandl, from geographical areas of low population density. In 2006). In “donut geomasking,” the uncertainty area has the aggregation, microdata records (one record equals to one form of a torus so as to ensure a minimal displacement data subject) of similar values can be averaged, and there- (Hampton et al., 2010). fore microdata are transformed to tabular data. A typical Furthermore, the “voronoi-based aggregation system” example of modification is perturbation where random (Croft, Shi, Sack, & Corriveau, 2016; a spatial aggregation noise is added to each cell or to certain variables. Last, one approach) and the “triangular displacement” (a modifica- fabrication technique is data swapping between records in a tion approach; Murad, Hilton, Horan, & Tangenberg, 2014) 209 Table 1. Privacy and Confidentiality Approaches for Statistical and Spatial Data. Anonymization Dataset approaches Description Major effect Benefits Limitations Microdata Abbreviation Reduces the volume or granularity of Imprecision Easy implementation; mathematical basis for Current applications are released information location protection methods restricted to a-spatial data Aggregation Combines adjacent categories or replaces with nearby values Modification Changes data values with rounding or Inaccuracy perturbation Fabrication Creates a fictional dataset that has distributional and inferential similarities with the original Confidential discrete Adaptive Actual locations are perturbed Inaccuracy Risk of identification information can be Current applications are spatial data (e.g., geomasking considering the spatial k-anonymity adaptively anonymized to meet data-specific restricted to static, health care, crime, regulations and restrictions; anonymized data nontemporal discrete household surveys) retain the initial discrete structure that is location data crucial for many spatial-point pattern analyses Geomasking Geographical masks that extend spatial Inaccuracy or In addition to the location and sensitive theme, with quasi- k-anonymity to basic k-anonymity to imprecision quasi-identifiers may be disclosed that allow identifiers account for quasi-identifiers further analysis of covariates Synthetic Anonymized data are synthesized from Inaccuracy Retains relationship between locations and geographies the results of spatial estimation models covariates that use covariates as estimators of confidential locations Spatiotemporal data Point A set of locations is replaced by a single Imprecision Adequate for visualizing trajectories of Point aggregation of individuals (e.g., aggregation representative location individuals or movement flows in between underperforms random GPS trajectories, areas perturbation techniques cellular data, LBS, Cloaking Lowers the space and/or time precision Option to decrease the temporal or the spatial Prohibits spatial-point radio-frequency of individual-level data resolution pattern analysis; polygon identification devices clustering may hide [RFID]) significant point clusters Dummies Adds noise that simulates human Inaccuracy Allows spatial-point pattern analysis and The spatial accuracy of the trajectories analysis by user augmented anonymized dataset compared with the original one has not been addressed Pseudonyms Identities are stored with pseudonyms Inferential disclosure is not protected Mix zones Locations are hidden in certain areas, and High positional accuracy is achieved in low Analysis by user or group pseudonyms change when exiting them sensitivity areas; it is harder, if not impossible, of users is not possible to perform inference attacks on individuals’ if pseudonyms change in spatiotemporal behavior if pseudonyms are time changed periodically Note. GPS = global positioning system; LBS = location-based services. 210 Journal of Empirical Research on Human Research Ethics 13(3) can be applied to spatial datasets that include covariates, dummies are an interesting approach, the spatial analytical although there are still open questions with respect to the errors of the increased dataset have not been addressed and spatial analytical error they produce (regarding the Voronoi- should be considered when such a dataset is released for based method) or the quantification of the offered k-ano- research purposes. Another technique that affects the accu- nymity (regarding the triangular displacement method). racy of the data is the use of “unlinked pseudonyms” that Last, concepts of simulated geographies (a fabrication are fake identities associated with data subjects (Cuellar, approach) also require additional attributes to create a pro- 2004). As it is explained earlier, pseudonyms will not pre- tected spatial dataset (Paiva, Chakraborty, Reiter, & vent inferential disclosure when space–time stamps are dis- Gelfand, 2014; H. Wang & Reiter, 2012). Here, the attri- closed. A more sophisticated version of pseudonyms is the butes are used to make spatial predictions on the confiden- “mix zones” method in which a new pseudonym is given to tial theme. The resulting hotspots are then used to synthesize a subject as soon as he or she exits the so called mix zone the anonymized dataset. (Beresford & Stajano, 2003, 2004; Buttyán, Holczer, & The general drawback of techniques on confidential dis- Vajda, 2007). In addition, while being in the mix zone loca- crete spatial data is that they have not been applied to spa- tions are hidden. There are two limitations to be considered tiotemporal data. Tuning of the algorithms is needed to if such methods are to be exploited for participatory sensing consider multiple sensitive measurements per data subject data: First, they take into consideration only the space and as opposed to traditional confidential discrete data where time attributes, whereas participatory sensing data also one location, typically a home address, is given per subject. include confidential measurements and potentially addi- However, an important advantage of geomasking studies tional quasi-identifiers. Second, the anonymity refers to for privacy research design is the extensive evaluation of other or artificially inserted subjects in the dataset (i.e., the produced masked datasets regarding the spatial analyti- users of a service), which may not prevent disclosure of pri- cal error. vate locations (see Scenario 3), unless either the underlying Spatial-point aggregation (Adrienko & Adrienko, 2011; residential/building structure is considered or a very large Monreale et al., 2010), or spatial-areal and temporal aggre- number of participations in the study are achieved. gation, known also as cloaking (Cheng, Zhang, Bertino, & The presented methods have the potential to be used for Prabhakar, 2006; Gruteser & Grunwald, 2003; Kalnis, participatory sensing data if they are combined and/or Ghinita, Mouratidis, & Papadias, 2007), follows the same adapted. Nevertheless, the complexity of a participatory approach as statistical aggregation. In particular, it decreases sensing dataset has to be taken into account. Specifically, a the precision of original data. Point aggregation can be used spatiotemporal trajectory dataset contains the attributes for for both privacy protection and a generalization approach to each data subject for multiple measurements per subject, visualize flows in movements and in between areas. With like a participatory sensing dataset. However, it does not cloaking, the time duration of an object at one location is have sensitive attributes or quasi-identifiers other than the considered as quasi-identifier. Given the number of other spatiotemporal information. On the contrary, a confidential objects at this location and for this time duration, a decision discrete dataset may have quasi-identifiers and sensitive to decrease spatial resolution will be taken. Similarly, one attributes but collects only a single measurement for each can lower the temporal resolution. Because cloaking is data subject. designed for LBS data, the anonymity it offers is calculated Another limitation of the existing techniques is that most based on the number of other data subjects (i.e., users of a of them are based on the concepts of spatial k-anonymity service) at a particular time and location. Considering the and k-anonymity aiming at decreasing the risk of inferential number of users of a LBS, this approach can provide suffi- disclosure or identity disclosure. These concepts cannot cient anonymity. However, the number of participants in prevent attribute disclosure that may occur from homogene- participatory sensing studies will probably be much lower, ity attack (i.e., knowing a person who is in the database) and and this will greatly affect the anonymized dataset’s spatial background knowledge attack (i.e., knowing a person who precision due to larger disclosed regions and/or coarser is in the database, and additional information on the distri- time. Generally, all techniques that involve some sort of bution of the sensitive attribute or on the characteristics of spatial aggregation will affect analytical results due to the the person who is in a database). The problems can be modifiable areal unit problem (Openshaw & Openshaw, solved with the concept of “l-diversity” where an equiva- 1984). In practice, polygon or point clusters of the measure- lent class has at least l “well-represented” values for the ments’ values may appear or disappear depending on the sensitive attributes (Machanavajjhala, Kifer, Gehrke, & aggregation’s division of the space. Venkitasubramaniam, 2007). L-diversity ensures that for A different concept is to add noise to the data with artifi- one sensitive attribute table, all equivalent classes of a table cial trajectories so called “dummies”(Kido, Yanagisawa, & have at least l-distinct values for the sensitive attribute. For Satoh, 2005; You, Peng, & Lee, 2007). Dummies are added the case of multiple sensitive attributes, one sensitive attri- to satisfy the anonymity of each data subject. Although bute is treated as the sole sensitive attribute, while the Kounadi and Resch 211 Table 2. Privacy and Confidentiality Recommendations From Public and Independent Bodies. FCSM CDC-ATSDR NRC Organization 1. Standardize 1. Designate a privacy manager 1. Methodological training in the acquisition and use of data and training and centralize 2. Train all responsible staff 2. Training in ethical considerations of data that include agency review of 3. Define criteria for access to restricted-access explicit location information on participants disclosure-limited files 3. Design studies in ways that provide confidentiality data products 4. Planning for release of PUDS protection for human participants 2. Use consistent practices FCSM CDC-ATSDR ICO (POA) Data processing 3. Remove direct 5. Classify each dataset as a restricted-access or 1. Increase a mapping area to cover more properties or identifiers and limit a PUDS occupants other identifying information FCSM CDC-ATSDR ICO (POA) ICO (GCD) NIJ–CMRC Publication 4. Share information 6. Include disclosure 2. Reduce the frequency 1. The use of heat maps, 1. Decide which data to present: blocks, and zones Point versus aggregate data of data and on assessing statement with or timeliness of deliverables disclosure risk PUDS publication reduces privacy risks 2. Use disclaimers to avoid 2. New ways of liability from misuse or 7. Maintain log of 3. Use mapping formats datasets rereleased that do not allow the representing misinterpretation of data information about 3. Provide information on inference of detailed information crime should be laws, liability, freedom of explored information, and privacy 4. Avoid the publication of spatial information 4. Provide contact information of on a household level persons with privacy expertise and familiarity with the data CDC-ATSDR NRC NIJ–CMRC Release data to 8. Authenticate the identity of data 4. Data stewards should 5. Consider privacy and other implications if data provided a third party requestors develop licensing will be merged with other data 9. All restricted-access data requestors are agreements to provide 6. Decide presentation of research results required to sign a DSA increased access 7. Researchers and the agency decide what data will be 10. Requirements for a standard DSA for to linked social- needed restricted-access data spatial datasets that 8. A nondisclosure agreement may be used to guarantee 11. Monitor user compliance with DSAs include confidential confidentiality 12. Include addendum to the DSA when a information 9. The agency can review any research results before requestor plans to link restricted-access publication data to other data 10. Perform background checks on research personnel who 13. Include addendum to the DSA when a will have access to data requestor plans further data releases from 11. Decide where data will be stored to ensure secure restricted-access data to other parties settings 12. Require researchers to destroy raw data after the research is completed Note. Recommendations have been grouped into four categories according to the topic they address. FCSM = Federal Committee on Statistical Methodology; CDC-ATSDR = Centers for Disease Control and Prevention and the Agency for Toxic Substances and Disease Registry; PUDS = public-use dataset; ICO = Information Commissioner’s Office; POA = Practice on Anonymization; GCD = Geospatial crime data; NRC = National Research Council; NIJ = National Institute of Justice; CMRC = Crime Mapping Research Center; DSA = disclosure sharing agreement. others are treated as quasi-identifiers. Thus, l-diversity sets research group or institution, and are specific to the public or requirements on both the quasi-identifiers and the sensitive independent bodies who issued the documents were filtered attributes. out. The recommendations are shown in Table 2 (some of those may have been paraphrased from the original reports) by each body, and divided into four categories according to Recommendations From Relevant Institutions the topic they address. The top part of the table shows the In this subsection, we examine privacy documents from pub- recommendations regarding the organization processes and lic or independent bodies. We focus on recommendations or training of the staff. The second category is about data pro- guidelines with respect to the usage, anonymization, and cessing, and the third category is about the publication of data release of private or confidential data. Recommendations that and deliverables. The bottom part of the table shows recom- are not applicable to research design, within the context of a mendations regarding the release of data to a third body. Two 212 Journal of Empirical Research on Human Research Ethics 13(3) public bodies provide recommendations with respect to con- more precise recommendations when it comes to presenting fidential microdata (Centers for Disease Control and spatial research outputs (2-4 from ICO [POA]; 1-2 from Prevention [CDC]-CSTE, 2005; Federal Committee on ICO [GPD], and 1 from NIJ). It is also recommended that a Statistical Methodology, 2005). Two bodies discuss social, research output or a disclosed dataset is accompanied by health, or personal spatial data (Graham, 2012; Gutmann & privacy-related information (e.g., disclosure assessment, Stern, 2007). Last, two bodies look into crime events as a laws, liability, etc.) and a reference to contact person (4 special type of confidential discrete spatial data (Information from FCSM, 6 from CDC-ATSDR, 3-4 from NIJ). In addi- Commissioner’s Office [ICO], 2012; Wartell & McEwen, tion, CDC-ATSDR suggests to maintain an inventory of 2001). released datasets. The inventory of restricted-access data The U.S.-based Federal Committee on Statistical should be stored internally to ensure compliance with the Methodology (FCSM) provides assistance and guidance on terms of the disclosure sharing agreement (DSA). On the issues that affect federal statistics such as in situations when contrary, for an anonymized public-use dataset (PUDS) the the Office of Management and Budget applies policies inventory can inform interested parties on the datasets’ related to statistics. The most recent working paper on dis- availability and meta-data. Last, NIJ suggests the use of dis- closure by the agency from 2005 discusses anonymization claimers to reduce liability when outputs, such as maps, methods, practices employed by federal agencies, and offers may lead to ambiguous interpretations. recommendations for good practice for both tables and Regarding data releases to a third party (last category of microdata. Another list of guidelines was published in a Table 2), the bodies agree to the requirement of a formal comprehensive report in 2005 by the Centers for Disease agreement between the controller and the requestor. Also, Control and Prevention and the Agency for Toxic Substances checks of the requestor’s validity may be conducted (8 from and Disease Registry (CDC-ATSDR). CDC and ATSDR are CDC-ATSDR and 10 from NIJ). Then, the particulars of the both U.S. federal agencies under the Department of Health data release and potential uses should be discussed and and Human Services and therefore the focus of the report is decided between the two parties such as merging released on health data. data with other data or presentation of results (12, 13 CDC- The recommendations by the National Research Council ATSDR and 5, 6, 7, 11 NIJ). Although data sharing particu- (NRC) in the United States and the independent body lars are decided with the DSA, the collector should still be Information Commissioner’s Office (ICO) in the United allowed to review research outputs if needed. Kingdom are specific to spatial confidential data. NRC pro- vides services via reports to the government, the public, and Privacy by Design Research Campaign the scientific or engineering communities. The recommen- dations address data collected by federal agencies, individ- While previous research has mainly focused on methods to ual researchers, academic or research organizations, and preserve privacy and measures to examine information dis- outline the need to anonymize discrete spatial data. The closure, we propose practical privacy-preserving steps for code of practice on anonymization by ICO (named as ICO the collection, storage, analysis, and dissemination of indi- [POA] in Table 2) focuses on the requirements set by the vidual measurements from mobile participatory sensing Data Protection Act (The Stationery Office, 1998) to high- applications. A privacy-preserving research campaign light key issues in the anonymization of personal data, and requires a concrete privacy plan of several tasks to be devel- has a dedicated section on spatial information. Furthermore, oped before, during, and after the completion of the cam- ICO has published a separate report (named as ICO [GCD] paign. These tasks are presented here as recommendations, in Table 2) with a focus on geospatial crime data. Due to the because their application depends and varies based on a sensitivity of crime events and the increase of online crime project’s specifications. In this article, we treat initial tasks mapping, the National Institute of Justice (NIJ) in the as prior to starting a survey (subsection Presurvey Activities), United States published as well a detailed report tailored to storing, anonymization, and assessment of derived datasets this topic. It discusses, among other issues, the publication (subsection Processing and Analyzing Collected Data), and of data and maps, and the sharing of data with other agen- actions to eliminate disclosure from published data and cies or researchers. deliverables, or when datasets are shared with third parties Recommendations 1 and 2 from FCSM, 1 to 4 from (subsection Disclosure Prevention). Furthermore, a separate CDC-ATSDR, and 1 to 3 from NRC suggest practices prior subsection is dedicated to recommendations that aim to to the anonymization, release, or sharing of the data such as ensure the appropriateness of the research environment to to offer essential training, establish a privacy plan, and stan- handle a privacy-preserving research campaign (subsection dardize practices. There are a few recommendations regard- Security and Safety). In each subsection, we analyze and ing the processing of the data (3 from FCSM, 5 from explicate the details of the recommendations which are then CDC-ATSDR, and 1 from ICO [POA]), but they do not pro- summed up on a table at the end of the respective subsec- pose concrete anonymization methods. However, there are tions (Tables 3, 4, 5, and 7). Kounadi and Resch 213 Table 3. A List of Initial Activities Prior to the Starting of the shared with third parties, criteria for access to restricted- Survey. access datasets (e.g., research personnel, data requestors) have to be defined and included in the plan. A. Presurvey activities The next presurvey step is the preparation of the partici- 1. Design study in the least privacy invasive manner pation agreement. Essential elements of a participation 2. Develop a privacy-preserving research plan agreement include (a) purpose and procedures of the study, 3. Define criteria for access to restricted-access datasets (b) potential risks and discomforts, (c) anticipated benefits, 4. Prepare a participation agreement (d) alternatives to participation, (e) confidentiality state- 5. Ensure inform consent on location privacy disclosure risks ment, (f) injury statement, (g) contact information, and (h) 6. Obtain institutional approval preferably reviewed from a DRB voluntary participation and withdrawal (Hall, 2016). The confidentiality statement can vary depending on the loca- Note. DRB = disclosure review board. tion of the study area, and respective laws and regulations. The participation agreement should outline the location Table 4. A List of Recommendations to Ensure Secure and Safe privacy protection insertions in each stage of the project and Settings. communicate the remaining disclosure risks, if any. Those who communicate the study to the participants should B. Security and safety explain in common language what is “location privacy” and 1. Assign a privacy manager other related terminologies, and provide examples that 2. Train collectors and/or processors in methods and ethical allow them to make an informed decision about whether to considerations participate or not. An optional step for improvement in 3. Ensure a secure IT system future surveys is to add the participants’ feedback regarding 4. Ensure secure sensing devices the perception and preferences on the established privacy Note. IT = information technology. measures. Last, both the research plan and the participation agree- ment should go through institutional approval from objec- Table 5. A List of Recommendations to Store, Anonymize, and tive and experienced staff of the institution or University Asses Derived Datasets. such as IRB, review ethics committee (REC), or a more C. Processing and analysis of collected data specialized disclosure review board (DRB). With respect to the type of organization, De Wolf (2003) suggests to consult 1. Delete data from sensor devices once stored in the IT system a cross-disciplinary DRB that makes recommendations to 2. Remove identifiers from the dataset the IRB, if the institution’s IRB does not have a standard- 3. standardize anonymization practices ized process for reviewing outputs from survey confidential 4. Ensure that the inclusion of pseudonyms does not lead to disclosure data. The creation of a cross-disciplinary DRB could also 5. Ensure that the inclusion of quasi-identifiers does not lead to serve as a committee that educates researchers on the cur- disclosure rent available anonymization and disclosure techniques. 6. Ensure a sufficient l-diversity of the sensitive attributes 7. Classify each dataset as a restricted-access or anonymized Security and Safety dataset 8. Assess disclosure of anonymized datasets The first step of a research campaign that collects participa- 9. Assess anonymization effect on spatial analysis tory sensing data is to assign a dedicated privacy manager who is responsible for the tasks of this subsection as well as Note. IT = information technology. for consulting on (or performing) the tasks of the following subsections. The privacy manager should train data proces- Presurvey Activities sors and collectors regarding their specific activities, and is also responsible to ensure that the research environment The privacy manager should initially design the study in the provides secure and safe settings regarding the sensing least privacy invasive manner depending on the purposes of devices and the information technology (IT) system where the research study. For example, if analysis by user or group data will be stored and processed. of users is not foreseen, all measurements can be stored With regard to the security of IT systems, Boulos et al. altogether without pseudonyms. The study design should be (2009) provide a comprehensive list of measures that reported within a research plan that has dedicated sections include the usage of (a) advanced cryptography, (b) biomet- regarding privacy preservation. These sections should rics, (c) unlocking the data under the physical presentation describe methods and practices that take place during the of other members, (d) cable locks, (e) computers with a project’s duration, and for the time period for which per- built-in trusted platform module (TPM) chip, (f) password sonal data are to be kept by the team. Also if data are to be 214 Journal of Empirical Research on Human Research Ethics 13(3) attack protection, (g) network security, (h) multilevel secu- According to the Health Insurance Portability and rity (MLS), (i) secure USB flash drives, (j) blanking com- Accountability Act (HIPAA) Privacy Rule, there are 18 ele- puter display and autolog-off, (k) discarding of old ments that should be either removed or generalized to dei- equipment and storage media. dentify a dataset (U.S. Government Publishing Office, Furthermore, security should be scrutinized on the sens- 2009). These are (a) names; (b) geographic subdivisions ing devices. Tracking subjective observations is typically smaller than a State with some exceptions having a popula- performed via smartphone “human-as-sensor” applications tion threshold of 20,000 people; (c) dates directly related to that are developed by research teams tailored to the require- an individual; (d) telephone numbers; (e) fax numbers; (f) ments of a research study (Solymosi et al., 2015; Zeile, electronic mail addresses; (g) social security numbers; (h) Resch, Loidl, Petutschnig, & Dörrzapf, 2016). It is recom- medical record numbers; (i) health plan beneficiary num- mended that the application does not incorporate a closed- bers; (j) account numbers; (k) certificate/license numbers; source third-party code. In this case, the researchers cannot (l) vehicle identifiers and serial numbers, including license accurately estimate the risk because they cannot be certain plate numbers; (m) device identifiers and serial numbers; that the third party will not appropriate the sensed data. (n) Web Universal Resource Locators (URLs); (o) Internet Instead, the “human-as-sensor” software should be devel- Protocol (IP) addresses; (p) biometric identifiers, including oped exclusively by the research team. Also, data should be finger and voice prints; (q) full-face photographic images stored only locally and in an encrypting form to prevent the and any comparable images; and (r) any other unique iden- security risks during transmission, when data are stored in tifying number, characteristic, or code. If necessary, identi- the cloud, and when devices are lost or stolen. Collected fiers linked to pseudonyms or measurements may be kept in data should be transferred regularly to the secure research a separate encrypted database to allow original data and IT system. study results to be sent to the participants. Also, the deletion Also, objective observations are tracked with products of data and removal of identifiers may be a daily task or a (smartphone applications or wearable devices) that measure regular task during the survey when it is conducted for lon- physiological measurements. Although a research cam- ger periods of time. paign may develop and use their own product (Bergner, The next step is data anonymization. The anonymization Zeile, Papastefanou, & Rech, 2011; Zeile, Höffken, & of an identifier’s free spatial dataset is necessary as long as Papastefanou, 2009), professional products may be pur- data subjects are to be distinguished from each other. If chased as well from specialized sensor companies. This multiple datasets are to be collected by the research cam- means that researchers analyze collected data (outputs) of paign, the anonymization approach should be standardized “blackbox” systems. When these systems operate on smart- to ensure consistency on released datasets. Collected data phones that may have access to other applications and sen- should be anonymized prior to their release considering the sors of the device data security risks are harder to estimate. following three principles: (a) inclusion of pseudonyms Thus, we recommend the purchase and use of wearable does not lead to disclosure, (b) inclusion of quasi-identifiers devices. Similar to the “human-as-sensor” applications, does not lead to disclosure, and (c) sensitive attributes are data should be stored only locally and in an encrypted form. “well represented” among the equivalent classes of quasi- In addition, Bluetooth and Wi-Fi should be turned off identifiers. All processed datasets should be classified as while the participants use the devices. If this is not possible restricted-access and anonymized datasets. and the survey is conducted for longer periods of time, the An inevitable result of the anonymization process is the devices should be randomly and regularly interchanged reduced quality and accuracy of the anonymized dataset. In among the participants. Therefore, if the trajectories of a fact, by increasing the privacy levels of an anonymized device are collected by a scanner, they could not be linked dataset the dissimilarity of the dataset to the original one to a single individual. The research group may empty the will also increase. Nevertheless, the analytic usefulness also devices and store the data before each exchange (e.g., on a depends on the anonymized method. For example, anony- daily basis) to retain the trajectories of each participant mized data based on the donut method, random perturba- distinguishable. tion, and adaptive areal elimination performed better in If a research team opts for third-party smartphone appli- detecting spatial clusters compared with aggregation for the cations (for collecting either subjective or objective sensing same level of spatial k-anonymity (Hampton et al., 2010; measurements) which transmit and store data on the cloud, Kounadi & Leitner, 2016). Hence, the person who is respon- the relevant security risks have to be considered and com- sible to anonymize should select the approach that has the municated to the participants of a survey. least effect on the analysis to be performed by future data users, conditioned that the approaches can offer the same level of anonymity. Processing and Analyzing Collected Data For example, if the relationship between the locations of The processor should empty sensor devices once data have measurements and other covariates is important, the syn- been archived, and remove identifiers from the dataset. thetic geographies may be an ideal approach. For clustering Kounadi and Resch 215 Table 6. Measures to Evaluate the Anonymization Effect by Type of Spatial Analysis. Unit of analysis Spatial analysis Measures of spatial error and information loss Points Global descriptive statistics Global divergence index (GDi) Pattern detection/analysis Divergence to clustering distance in cross K function analysis, distance to k-nearest neighbors, or Moran’s I value Univariate spatial prediction Divergence to prediction accuracy index (PAI), prediction efficiency index (PEI) Local indicators of spatial Local divergence index (LDi), stability of hotspot (SoH) association Spatial clustering Detection rate, accuracy, sensitivity, and specificity Multivariate spatial relationship Divergence to R-squared or root-mean-square standardized error Areas Choropleth mapping, density Index of similarity (S), suppression, compactness, discernibility, surface estimation nonuniform entropy and pattern analysis, we suggest adaptive geomasking, Table 7. A List of Recommendations to Prevent Disclosure When (a) Findings Are Published, (b) Anonymized Datasets Are dummies, or mix zones. While geomasking retains the Published, and (c) Data Are Shared With Third Parties. count of the original dataset, dummies add data, and mix zones remove data from the dataset. Hence, they should be D. Disclosure prevention preferred in highly populated areas of low sensitivity where Dissemination of findings it is more likely that the addition or removal of measure- 1. Reduce spatial precision ments has a minimal effect. If data are to be used for areal 2. Reduce temporal precision analysis or for choropleth mapping, cloaking can be used as 3. Consider alternatives to point distribution maps a form of adaptive areal aggregation. The data will be less 4. Assess disclosure on a point distribution map precise than the original data; however, there will be no spa- 5. Provide protection vs. disclosure information tial error involved. On the contrary, the usefulness of the 6. Provide contact information cloaked areas should be considered because they may vary 7. Use disclaimers in size and also overlap other analysis units such as admin- Anonymized datasets istrative areas. In such scenarios, areal interpolation can be 8. Avoid the release of multiple versions of anonymized performed that also involves a spatial error to be estimated. datasets Also, point aggregation, as a form of generalization, can be 9. Avoid the disclosure of anonymization meta-data used to visualize the measurements’ trajectories. Again, 10. Inform about disclosure risk assessment there is no spatial error but less precise data. 11. Provide information on protection and effect The final step is the assessment of the anonymized data 12. Provide contact information regarding the disclosure risk, if any, and the anonymization 13. Maintain log of anonymized disclosed datasets effect of the quality of the masked data. The assessment Data sharing with third parties should be clearly communicated to potential users. In Table 14. Plan a mandatory licensing agreement 6, we present measures that can be used to quantify the 15. Plan a DSA for restricted-access data effect of the anonymization process to the masked data 16. Authenticate the identity of data requestors based on the type of spatial analysis to be performed. The 17. Perform background checks on research personnel who global divergence index (GDi) is a composite indicator will have access to data which considers the spatial mean as a measure of central 18. Ensure requestor’s safe settings tendency, the orientation of the ellipse as a measure of 19. Decide what data will be needed directional trend, and the length of the ellipse’s major axis 20. Consider implications if restricted-access data will be as a measure of spatial dispersion (Kounadi & Leitner, merged with other data 2015). It shows the divergence of global spatial statistics of 21. Decide presentation of research outputs the masked point pattern to the original point pattern. For 22. Decide length of period of retaining restricted-access data 23. Review research outputs before publication point pattern analysis and detection, possible approaches 24. Maintain log of restricted-access disclosed datasets are to calculate Cross K function analysis (Kwan et al., 2004), distance to k-nearest neighbor (Seidl et al., 2015), or Note. DSA = disclosure sharing agreement. Moran’s I value to both masked and original datasets, and report the differences of the results. When locations of masked events are used in univariate spatial prediction, the Uhlig, 2008) and the prediction efficiency index (PEI; Hunt, prediction accuracy index (PAI; Chainey, Tompson, & 2016) can be used to evaluate the predicted hotspot areas 216 Journal of Empirical Research on Human Research Ethics 13(3) where the events are more likely to occur. Then, the PAI and However, this measure cannot be used to compare the PEI of masked and original datasets can be compared and effects of two datasets in different areas that were anony- reported. mized in the same way. In this scenario, the divergence to The local divergence index (LDi) calculates the diver- Moran’s I values or to another global statistic of spatial gence of hotspot areas using the Getis-Ord Gi* statistic. autocorrelation with fixed intervals can be employed. The This index can be used to detect the masking effects to the use of indices and standardized metrics allows the testing local characteristics of the original pattern. Another with several datasets and areas, and can give an overall approach that can be used for the local properties is the evaluation of anonymization technique for its usage in spa- Stability of Hotspot (SoH) metric which was originally tial analysis. designed to measure the clusters’ deviation from the same datasets in different resolutions (Bruns & Simko, 2017). Disclosure Prevention The same metrics can be used to measure the clusters’ devi- ation from different datasets (original vs. masked) of the Dissemination of research findings poses significant pri- same resolution. Regarding spatial clustering, there are a vacy threats as those discussed in Disclosure Scenario 5 few indices that can be used. Clusters’ detection rate is the (Subsection Disclosure Risk of Data and Deliverables). percentage of significant spatial clusters (Olson, Grannis, & Hence, researchers should carefully evaluate their research Mandl, 2006), clusters’ accuracy is the percentage of sig- outputs and only present findings, particularly in the form nificant clusters in which at least half of the masked points of a map, if these are needed to convey important messages originate from clustered original points (Olson et al., 2006), to the readers of a publication. A simple way to avoid dis- clusters’ sensitivity is the percentage of masked points that closure risks is to decrease the spatial and/or the temporal originate from clustered original points and are still clus- precision of findings. While researchers may want to report tered (Cassa et al., 2006; Hampton et al., 2010), and clus- on details of the study area and collected data, they should ters’ specificity is the percentage of masked points that avoid point distribution maps of original data in cases where originate from nonclustered points and are still nonclus- participants can be distinguished (e.g., different coloring tered (Cassa et al., 2006; Hampton et al., 2010). Regression per participant or groups of participants, or each point indi- models such as geographically weighted regression (GWR) cates a private location about one participant). Haley et al. or spatial regression can be applied to the original data and (2016) did a literature review on articles published in a covariate(s) (explanatory variable), and then to the masked PubMed, and identified numerous cases that displayed par- data and the covariate(s). The divergence of the models’ ticipant data in maps as points or small-population geo- results, such as R-squared or root-mean-square standard- graphic units. In more than half of the articles, the authors ized error, can act as a measure of error in prospective mul- either did not refer to employed privacy protection tivariate analysis. approaches or anonymized data inadequately. Safe alterna- Regarding analysis on areas or grid cells, the index of tives to point distributions can be a density surface estima- similarity S can identify the degree to which counts within tion or a clustering spatial distribution that reduce the risk areal units is different (Andresen, 2009; Tompson, Johnson, of spatial reidentification. However, these practices may Ashby, Perkins, & Edwards, 2015). Furthermore, aggrega- portray a negative or positive image about an entire neigh- tion-based anonymization techniques are ideal for chorop- borhood that will be perceived as a hotspot of the sensed leth mapping or density surface estimation. Aggregation measurement. does not affect the accuracy but the precision of the data. If it is necessary for research purposes to present a sensi- Therefore, the effect can be evaluated with information loss tive point map, anonymization techniques such as these metrics such as suppression (i.e., number of suppressed under the category “confidential discrete spatial data” of records), compactness (indicates level of geographic preci- Table 1 should be employed. However, the masked point sion), discernibility (checks for anonymity levels higher distribution will, to some degree, be different from the point than the desired level), and nonuniform entropy (based on distribution of the original dataset. The researchers should the probability of identifying original locations; Croft, Shi, consider this error and the impact it may have on reader’s Sack, & Corriveau, 2017). interpretation of the map. Also, it is important to mention Measures that are in the form of an index or a standard- that location privacy risks appear when participatory data ized metric should be preferred because they allow com- are collected for longer periods of time for which the par- parisons between datasets and study areas that are not ticipant has the device on, meaning that his or her identify- possible for some of the measures listed in Table 6. For ing locations can be captured (for more details, refer to example, it may be useful to calculate the divergence of inferences on places under Disclosure Scenario 1, masked and original data to the third nearest neighbor dis- Subsection Disclosure Risk of Released Data and tance by three anonymization approaches, and identify the Deliverables). Hence, a participant distinguishable map approach that has the least effect on point pattern analysis. poses no privacy risks if data are collected for a clearly Kounadi and Resch 217 defined study area or route, and no further identifying infor- inventory of all disclosed or shared datasets that describes mation about the participants is included on the map. If the datatype based on the classification, the disclosed desti- there are any disclosure risks associated with a published nation (e.g., another institution, open data platform), and map, the responsible researcher must estimate and report other relevant information. them. Last, when research outputs are uploaded on a research project’s web page the usage of disclaimers may Conclusion limit unintended misconceptions of the presented informa- tion. There are no standard disclaimers for use, but they The proposed privacy recommendations were generated depend on the publication and information prone to inter- from two sources of information: The first source of infor- pretation. The wording should specify what does the publi- mation is the technical information, and second one is the cation is not liable for, such as decisions and actions taken experts’ suggestions. Technical information includes the by a reader, and errors in the data such as omissions, sys- disclosure risk and approaches to minimize or eliminate tematic bias, or inaccuracies due to privacy constraints. the risk. The experts’ suggestions are a summary of recom- Furthermore, anonymized datasets may be disclosed as mendations or guidelines regarding confidentiality issues long as data are protected and follow the recommendations that arise from the collection, use, or dissemination of per- below. There are different reasons behind an institution’s or sonal data. A chronological classification of our recommen- research group’s decision to share their data. A research dations involves first these that should take place before the group may wish to make their collected data publicly avail- initiation of the survey (presurvey), second these that ensure able to increase visibility of their work, and allow other the safety and security of the research environment, next as researchers to use them which will in turn make scientific soon as data are collected (processing), and finally after comparisons possible. On the contrary, releasing data may data are processed. Some recommendations are applicable be a compromise against the will to publish in a scientific to all research projects (e.g., ensuring safe settings or the journal that has a data policy which requires research data privacy protection of research outputs). However, recom- to be publicly available (PLOS ONE, 2014). In such cases, mendations regarding the disclosure of anonymized datas- a document should be attached to the released datasets that ets and sharing restricted-access data with third parties are contains information on the disclosure risk, protection applicable only if the data controller opts for these prac- method, masked data quality, and contact information on tices. Our set of recommendations can act as a general privacy matters. In addition, practices that increase the dis- guideline for research campaigns that want to use participa- closure risk such as the release of multiple versions of ano- tory sensing data by enlisting the steps of the campaign nymized datasets or disclosure of anonymization meta-data where privacy actions should be taken. Some of our recom- should be avoided. mendations, such as anonymization and dissemination of It is also possible that collected spatiotemporal participa- findings, can also be applicable to other types of spatial tory data are shared with other institutions or researchers. data. However, privacy restrictions that may be specific to Data sharing should be one of the many privacy insertions other types of data and the bodies that share them are not of the confidentiality statement within the participation discussed here. agreement. The institution is responsible for preparing a An important prerequisite of any research project that licensing agreement for such purposes regardless of the involves spatiotemporal participatory data is that the mem- data nature (i.e., anonymized data or restricted-access data). bers of the project are either trained or experts in location For restricted-access data, a separate DSA should be pre- privacy threats. The training should take place at an early pared, or a respective section within the licensing agree- stage of the research campaign to guarantee success in the ment should be inserted. Recommendations 17 to 23 of next two tasks: The first task is to prepare the research plan Table 7 are intended mainly for restricted-access data. It is and the participation agreement. If the data collector decides advisable that the institution performs checks on the credi- to share sensitive data with third parties, criteria for sharing bility and capability of the requestor to handle sensitive per- restricted-access data (i.e., identifier-free survey data) sonal data such as investigating the requestor’s research should be included in the research plan. Both the research personnel, settings, and identity. The controller and the plan and the participation agreement should be comprehen- requestor should decide together about the data that are sive regarding the privacy insertions to ensure a successful needed, the length of period that will be kept by the institutional approval of the survey. The second task is to requestor, and examine potential linkage-disclosure impli- ensure that the research environment establishes secure cations if original data are to be used with other datasets. measures to prevent privacy and confidentiality breaches of Regarding research outputs, the controller should have the collected and stored data. right to review the presentation as well as the final publica- The processing tasks start as soon as survey data are col- tion deliverables to ensure that anonymity is preserved. Last lected. First, data should be safely stored, the devices should but not least, the privacy manager should maintain an be cleaned from any stored data, and identifiers should be 218 Journal of Empirical Research on Human Research Ethics 13(3) removed from the datasets to be analyzed. These basic yet Best Practices critical steps are frequently neglected during the processing Best practices are discussed in detail in the recommenda- of survey data. The removal of direct identifiers is a prereq- tions sections: Presurvey Activities, Security and Safety, uisite to deidentify the data, but if quasi-identifiers and Processing and Analyzing Collected Data, and Disclosure pseudonyms are to be included an anonymization approach Prevention. The most critical practices are summarized in should be employed as well. As a general principle, the the “Conclusion” section. analysis to be performed should be the guide for selecting an anonymization technique to minimize the effect of masked data to the accuracy of spatial analysis (e.g., clus- Research Agenda tering, point pattern, multivariate, etc.). Anonymization and disclosure risk evaluation are important Then, the research team should calculate the anonymiza- tasks of a privacy-preserving research campaign that require tion effect of the masked data on spatial analysis, evaluate further empirical research. Regarding the anonymization, the remaining disclosure risk, and classify all stored datas- we emphasized on approaches that are heavily discussed in ets as “anonymised” or “restricted-access” datasets. the geoprivacy literature, but we do not claim that this is a Regarding the anonymization effect, we suggested mea- comprehensive list. Additional methods should be explored, sures to evaluate the error or information loss of the masked especially in situations when the anonymization needs to be data in spatial analyses. We focused on measures that quan- tailored to the specifications of a research campaign and tify the magnitude of the effect and, whenever possible, collected survey data. We discussed how the qualities of have been used in the geoprivacy literature because their participatory sensing data urge for a fusion of anonymiza- usage in future studies would allow comparison of results. tion methods that consider both k-anonymity and l-diver- The last set of recommendations refer to the tasks after sity. Currently, there is a lack of methods specified to these data are processed. First, the members of the research cam- qualities that can successfully prevent all types of paign should examine the disclosure risk of their research disclosure. outputs, such as maps in scientific journals, and apply a pro- Also, there has been limited discussion on the evaluation tection approach if private locations of measurements are to or quantification of the disclosure risk. Some of the scholars be published. Second, to ensure ethical conduct of research who developed anonymization methods have either quanti- we suggest reporting generally on the employed privacy fied the disclosure risk with formulas that are typically spec- protection practices of outputs or anonymized data as well ified to their method, or developed a method conditioned as adding disclaimers. Third, careful consideration should that it preserves an estimated anonymity (Allshouse et al., be taken while releasing and reporting on anonymized data- 2010; Beresford & Stajano, 2004; Croft et al., 2016; Paiva sets so as not to provide disclosure hints to a potential pri- et al., 2014; Wieland, Cassa, Mandl, & Berger, 2008; You vacy attacker. Fourth, the privacy manager should prepare et al., 2007; Zhang, Freundschuh, Lenzer, & Zandbergen, licensing and DSAs, and maintain a data inventory of all 2017). However, the results or conclusions of these studies published or shared datasets. Last, the controller must should not be generalized because the characteristics of a investigate the appropriateness of the requestor’s environ- study area or available linked datasets and background infor- ment and personnel to handle sensitive data, and he or she mation of the original dataset can vary. Therefore, a different should have an active role regarding the privacy-preserva- approach to evaluate the disclosure risk may be needed. tion practices of the requestor’s research plan. Furthermore, not all anonymization methods were assessed This set of recommendations establishes ethical scien- regarding the disclosure risk they entail. However, there are tific practices and ensures sufficient privacy protection some studies who looked at the disclosure risk of original which are crucial elements so as to engage people to con- data. For instance, Alrayes and Abdelmoty (2014) examined tribute actively being “human data sources.” This is neces- aspects of potential personal information that may be derived sary to leverage collective information in areas such as from LBSN data. The estimated potential problems are not environmental monitoring, urban planning, security and verified by means of actual disclosure due to the fact that quality of life, emergency management, traffic monitoring, LBSN validation data are hard to obtain (e.g., real private or e-tourism. Nonetheless, the willingness to voluntarily locations or identity of users). De Montjoye, Hidalgo, share personal data is linked with the trust in the security of the data. To make an informed decision on the data’s secu- Verleysen, and Blondel (2013) analyzed mobile phone data, rity, participants need to be aware of the potential misuses, and found that four randomly chosen points are enough to countermeasures, and their efficiency. Yet, privacy-related uniquely characterize 95% of heavy users drawn from a ran- terms, conditions, and technology are mostly hardly under- dom sample. Nevertheless, the extent to which these four standable to nonexperts. Therefore, more simple and bind- locations can lead to a successful inferential or attribute dis- ing ways of communicating this kind of information have to closure of the users’ personal information (e.g., identity or be found. household location of a user) remains unexplored. Thus, the Kounadi and Resch 219 evaluation of the disclosure risk is still a topic that needs to Barcena, M. B., Wueest, C., & Lau, H. (2014). How safe is your quantified self. Mountain View, CA: Symantec. be examined in depth with empirical studies that involve Beresford, A., & Stajano, F. (2003). Location privacy in pervasive validation data. computing. IEEE Pervasive Computing, 2(1), 46-55. Beresford, A., & Stajano, F. (2004). Mix zones: User privacy Educational Implications in location-aware services. In Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Current research in location privacy has revealed that research- Communications Workshops, Orlando, FL, USA (pp. 127-131). ers that use spatial data do not always employ adequate pri- Bergner, B., Zeile, P., Papastefanou, G., & Rech, W. (2011). vacy-preserving practices. This can be partially attributed to Emotional barrier GIS as a new tool for Identification the lack of scientific expertise and technological background. and optimization of urban space barriers. Angewandte To eliminate future practices that may compromise individual Geoinformatik, 430-439. privacy, every research campaign that collects participatory Boulos, M. N. K., Curtis, A. J., & AbdelMalik, P. (2009). Musings sensing data should assign a privacy manager. The privacy on privacy issues in health research involving disaggregate manager must be trained in the following areas: geographic data about individuals. International Journal of Health Geographics, 8, Article 46. Brownstein, J. S., Cassa, C. A., Kohane, I. S., & Mandl, K. D. •• Anonymization techniques and location protection (2006a). An unsupervised classification method for infer- methods ring original case locations from low-resolution disease •• Estimation of the disclosure risk maps. International Journal of Health Geographics, 5(1), •• Analytical methods of participatory sensing data Article 56. Brownstein, J. S., Cassa, C. A., & Mandl, K. D. (2006b). No place Authors’ Note to hide—Reverse identification of patients from published Ourania Kounadi is now affiliated to University of Twente, maps. New England Journal of Medicine, 355, 1741-1742. Faculty of Geo-Information Science and Earth Observation (ITC), Bruns, J., & Simko, V. (2017, July). Stable hotspot analysis for Department of Geo-information Processing (https://people. intra-urban heat islands. Paper presented at the GI_Forum, utwente.nl/o.kounadi) Salzburg, Austria. Buttyán, L., Holczer, T., & Vajda, I. (2007). On the effective- Declaration of Conflicting Interests ness of changing pseudonyms to provide location privacy in VANETs. Security and Privacy in Ad-hoc and Sensor The author(s) declared no potential conflicts of interest with respect Networks, 4572, 129-141. to the research, authorship, and/or publication of this article. Cassa, C. A., Grannis, S. J., Overhage, J. M., & Mandl, K. D. (2006). A context-sensitive approach to anonymizing spatial Funding surveillance data: Impact on outbreak detection. Journal of The author(s) disclosed receipt of the following financial support the American Medical Informatics Association, 13, 160-165. for the research, authorship, and/or publication of this article: This doi:10.1197/Jamia.M1920 research is funded by the Austrian Science Fund (FWF) for the Cassa, C. A., Wieland, S. C., & Mandl, K. D. (2008). project Urban Emotions—development of methods for production Re-identification of home addresses from spatial locations of contextual emotion information in spatial planning with the anonymized by Gaussian skew. International Journal of help of human sensory assessment and crowdsourcing technolo- Health Geographics, 7, Article 45. gies in social networks. Project Number I 3022N33. CDC-CSTE. (2005). CDC-ATSDR data release guidelines and procedures for re-release of state-provided data. Retrieved References from https://stacks.cdc.gov/view/cdc/7563 Adrienko, N., & Adrienko, G. (2011). Spatial generalization and Chainey, S., Tompson, L., & Uhlig, S. (2008). The utility of aggregation of massive movement data. IEEE Transactions hotspot mapping for predicting spatial patterns of crime. on Visualization and Computer Graphics, 17, 205-219. Security Journal, 21(1-2), 4-28. Allshouse, W. B., Fitch, M. K., Hampton, K. H., Gesink, D. C., Cheng R., Zhang Y., Bertino E., Prabhakar S. (2006) Preserving Doherty, I. A., Leone, P. A., . . . Miller, W. C. (2010). Geomasking User Location Privacy in Mobile Data Management sensitive health data and privacy protection: An evaluation using Infrastructures. In: Danezis G., Golle P. (eds) Privacy an E911 database. Geocarto International, 25, 443-452. Enhancing Technologies. PET 2006. Lecture Notes in Alrayes, F., & Abdelmoty, A. (2014). No place to hide: A study of Computer Science, 4258. Springer, Berlin, Heidelberg. privacy concerns due to location sharing on geo-social networks. Christin, D., Reinhardt, A., Kanhere, S. S., & Hollick, M. (2011). International Journal on Advances in Security, 7(3/4), 62-75. A survey on privacy in mobile participatory sensing applica- Andresen, M. A. (2009). Testing for similarity in area-based spa- tions. Journal of Systems and Software, 84, 1928-1946. tial patterns: A nonparametric Monte Carlo approach. Applied Cox, L. H. (1996). Protecting confidentiality in small population Geography, 29, 333-345. health and environmental statistics. Statistics in Medicine, 15, Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). 1895-1905. Geographically masking health data to preserve confidential- Croft, W. L., Shi, W., Sack, J.-R., & Corriveau, J.-P. (2016). ity. Statistics in Medicine, 18, 497-525. Location-based anonymization: Comparison and evaluation of 220 Journal of Empirical Research on Human Research Ethics 13(3) the Voronoi-based aggregation system. International Journal temporal unit problem on crime hot spot stability (Doctoral of Geographical Information Science, 30, 1-23. dissertation, American University). Retrieved from https:// Croft, W. L., Shi, W., Sack, J.-R., & Corriveau, J.-P. (2017). www.ncjrs.gov/App/Publications/abstract.aspx?ID=272536 Comparison of approaches of geographic partitioning for data Information Commissioner’s Office. (2012). Crime-mapping and anonymization. Journal of Geographical Systems, 19, 1-28. geo-spatial crime data: Privacy and transparency principles. Cuellar, J. R. (2004). Geopriv requirements (Internet Draft, Nov. Retrieved from https://ico.org.uk/media/for-organisations/ 2002). Retrieved from https://tools.ietf.org/html/draft-ietf- documents/1543/crime_mapping.pdf geopriv-dhcp-lbyr-uri-option-03.html International Atomic Energy Agency. (2015). Incident and De Cristofaro, E., & Soriente, C. (2011). Short paper: PEPSI— Trafficking Database (ITDB). Retrieved from http://www-ns. privacy-enhanced participatory sensing infrastructure. In iaea.org/security/itdb.asp Proceedings of the fourth ACM Conference on Wireless Kalnis, P., Ghinita, G., Mouratidis, K., & Papadias, D. (2007). Network Security, Hamburg, Germany, 14–17 June 2011, pp. Preventing location-based identity inference in anonymous 71-78. ACM Press. spatial queries. IEEE Transactions on Knowledge and Data De Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M., & Blondel, Engineering, 19, 1719-1733. V. D. (2013). Unique in the crowd: The privacy bounds of Kanjo, E., Bacon, J., Roberts, D., & Landshoff, P. (2009). human mobility. Scientific Reports, 3, Article 1376. MobSens: Making smart phones smarter. IEEE Pervasive Denning, T., Andrew, A., Chaudhri, R., Hartung, C., Lester, J., Computing, 8(4), 50-57. Borriello, G., & Duncan, G. (2009). BALANCE: Towards a Kido, H., Yanagisawa, Y., & Satoh, T. (2005). An anonymous usable pervasive wellness application with accurate activity communication technique using dummies for location-based inference. In Proceedings of the 10th workshop on Mobile services. In Proceedings International Conference on the Computing Systems and Applications, Santa Cruz, CA, 23–24 Pervasive Services (ICPS’05), 11–14 July 2005, pp. 461-464. February 2009, pp. 1-6. ACM. Santorini, Greece, IEEE. De Wolf, V. A. (2003). Issues in accessing and sharing confidential Kounadi, O., Lampoltshammer, T. J., Leitner, M., & Heistracher, survey and social science data. Data Science Journal, 2, 66-74. T. (2013). Accuracy and privacy aspects in free online Duncan, G. T., & Pearson, R. W. (1991). Enhancing access to reverse geocoding services. Cartography and Geographic microdata while protecting confidentiality: Prospects for the Information Science, 40, 140-153. future. Statistical Science, 6, 219-232. Kounadi, O., & Leitner, M. (2014). Why does geoprivacy mat- Federal Committee on Statistical Methodology. (2005). Report on ter? The scientific publication of confidential data presented statistical disclosure limitation methodology. Retrieved from on maps. Journal of Empirical Research on Human Research https://www.hhs.gov/sites/default/files/spwp22.pdf Ethics, 9, 34-45. Graham, C. (2012). Anonymisation: Managing data protection Kounadi, O., & Leitner, M. (2015). Spatial information diver- risk code of practice. Information Commissioner’s Office. gence: Using global and local indices to compare geographi- Retrieved from https://ico.org.uk/media/1061/anonymisation- cal masks applied to crime data. Transactions in GIS, 19, code.pdf 737-757. doi:10.1111/tgis.12125 Gruteser, M., & Grunwald, D. (2003). Anonymous usage of Kounadi, O., & Leitner, M. (2016). Adaptive areal elimination location-based services through spatial and temporal cloaking. (AAE): A transparent way of disclosing protected spatial In Proceedings of the 1st International Conference on Mobile datasets. Computers, Environment and Urban Systems, 56, Systems, Applications and Services, San Francisco, CA, 5–8 59-67. doi:10.1016/j.compenvurbsys.2016.01.004 May 2003, pp. 273-286. Krumm, J. (2007). Inference attacks on location tracks. In A. Gutmann, M. P., & Stern, P. C. (Ed.). (2007). Putting people on LaMarca, M. Langheinrich, & K. Truong (Eds.), Pervasive the map: Protecting confidentiality with linked social-spatial computing (Vol. 4480, pp. 127-143). Berlin, Germany: data. Washington, DC: The National Academies Press. Springer. Haley, D. F., Matthews, S. A., Cooper, H. L., Haardörfer, R., Kwan, M. P., Casas, I., & Schmitz, B. C. (2004). Protection of geo- Adimora, A. A., Wingood, G. M., & Kramer, M. R. (2016). privacy and accuracy of spatial information: How effective Confidentiality considerations for use of social-spatial data are geographical masks? Cartographica: The International on the social determinants of health: Sexual and reproductive Journal for Geographic Information and Geovisualization, health case study. Social Science & Medicine, 166, 49-56. 39(2), 15-28. Hall, W. R. (2016). Human Subjects Protection Program (HSPP): Leitner, M., & Curtis, A. (2004). Cartographic guidelines for geo- Policies and procedures. Office for the Protection of Research graphically masking the locations of confidential point data. Subjects, Health Sciences Institutional Review Board, Cartographic Perspectives, 49, 22-39. University Park Institutional Review Board, University of Leitner, M., Mills, J. W., & Curtis, A. (2007). Can novices to Southern California. Retrieved from https://oprs.usc.edu/hspp/ geospatial technology compromise spatial confidentially? Hampton, K. H., Fitch, M. K., Allshouse, W. B., Doherty, I. A., Kartographische Nachrichten(“cartographic News”), 57, Gesink, D. C., Leone, P. A., . . . Miller, W. C. (2010). Mapping 78-84. health data: Improved privacy protection with donut method Machanavajjhala, A., Kifer, D., Gehrke, J., & Venkitasubramaniam, geomasking. American Journal of Epidemiology, 172, 1062- M. (2007). l-diversity: Privacy beyond k-anonymity. ACM 1069. doi:10.1093/Aje/Kwq248 Transactions on Knowledge Discovery from Data, 1, 1-12. Hunt, J. M. (2016). Do crime hot spots move? Exploring the MacKerron, G., & Mourato, S. (2013). Happiness is greater in natu- effects of the modifiable areal unit problem and modifiable ral environments. Global Environmental Change, 23, 992-1000. Kounadi and Resch 221 Maisonneuve N., Stevens M., Niessen M.E., Steels L. (2009) Solymosi, R., Bowers, K., & Fujiyama, T. (2015). Mapping fear of NoiseTube: Measuring and mapping noise pollution with crime as a context-dependent everyday experience that varies mobile phones. In: Athanasiadis I.N., Rizzoli A.E., Mitkas in space and time. Legal and Criminological Psychology, 20, P.A., Gómez J.M. (eds) Information Technologies in 193-211. Environmental Engineering. Environmental Science and The Stationery Office. (1998). Data Protection Act. Retrieved from Engineering (pp. 53-65). Berlin, Heidelberg: Springer. http://www.legislation.gov.uk/ukpga/1998/29/contents Monreale, A., Andrienko, G., Andrienko, N., Giannotti, F., Stuntebeck, E. P., Davis, I. I. J. S., Abowd, G. D., & Blount, M. Pedreschi, D., Rinzivillo, S., & Wrobel, S. (2010). Movement (2008). HealthSense: Classification of health-related sensor data anonymity through generalization. Transactions on Data data through user-assisted machine learning. In Proceedings Privacy, 3, 91-121. of the 9th Workshop on Mobile Computing Systems and Murad, A., Hilton, B., Horan, T., & Tangenberg, J. (2014). Protecting Applications, Napa, CA, 25–26 February 2008, pp. 6–10. patient geo-privacy via a triangular displacement geo-masking New York: ACM. method. In Proceedings of the 1st ACM SIGSPATIAL Tompson, L., Johnson, S., Ashby, M., Perkins, C., & Edwards, P. International Workshop on Privacy in Geographic Information (2015). UK open source crime data: Accuracy and possibili- Collection and Analysis, Dallas/Fort Worth, TX, 4–7 November ties for research. Cartography and Geographic Information 2014, pp. 1–4. New York, NY: ACM. Science, 42, 97-111. Olson, K. L., Grannis, S. J., & Mandl, K. D. (2006). Privacy Törnros, T., Dorn, H., Reichert, M., Ebner-Priemer, U., Salize, protection versus cluster detection in spatial epidemiol- H., Tost, H., Meyer-Lindenberg, A., & Zipf, A. (2016). A ogy. American Journal of Public Health, 96, 2002-2008. comparison of temporal and location-based sampling strate- doi:10.2105/Ajph.2005.069526 gies for global positioning system-triggered electronic dia- Openshaw, S., & Openshaw, S. (1984). The modifiable areal unit ries. Geospatial Health, 11(3). Retrieved from https://doi. problem. Norwich: Geo Abstracts University of East Anglia. org/10.4081/gh.2016.473 Paiva, T., Chakraborty, A., Reiter, J., & Gelfand, A. (2014). U.S. Government Publishing Office. (2009). 45 CFR 164.514— Imputation of confidential data sets with spatial locations using Other requirements relating to uses and disclosures of pro- disease mapping models. Statistics in Medicine, 33, 1928-1945. tected health information. Available from https://www. PLOS ONE. (2014). PLOS’ new data policy: Public access to gpo.gov/fdsys/pkg/CFR-2009-title45-vol1/xml/CFR-2009- data. Retreived from http://blogs.plos.org/everyone////plos- title45-vol1-sec164-514.xml new-data-policy-public-access-data-/201402242 Waldo, J., Herbert, S., & Lin Millett, L. I. (2007). Engaging pri- Post, R. C. (2001). Three concepts of privacy. Georgetown Law vacy and information technology in a digital age. Washington, Journal, 89, Article 2087. DC: The National Academies Press. Resch, B. (2013, October). People as sensors and collective Wang, H., & Reiter, J. P. (2012). Multiple imputation for sharing sensing-contextual observations complementing geo-sensor precise geographies in public use data. The Annals of Applied network measurements. In Krisp J (ed.) Progress in location- Statistics, 6(1), 229-252. based services (pp. 391-406). Berlin, Germany: Springer. Wang, X. O., Cheng, W., Mohapatra, P., & Abdelzaher, T. (2013). Resch, B., Summa, A., Sagl, G., Zeile, P., & Exner, J.-P. (2015, Artsense: Anonymous reputation and trust in participatory November). Urban emotions—Geo-semantic emotion extrac- sensing. In The Proceedings of IEEE INFOCOM 2013, Turin, tion from technical sensors, human sensors and crowdsourced Italy, 14–19 April 2013, pp. 2652–2660. IEEE. data. In Georg G., & Haoshen H. (eds) Progress in location- Wartell, J., & McEwen, J. T. (2001). Privacy in the information based services 2014 (pp. 199-212). Vienna, Austria: Springer age: A guide for sharing crime maps and spatial data series: International Publishing. Research report. Crime Mapping Research Center, National Rodrigues da Silva, A. N., Zeile, P., de Oliveira Aguiar, F., Institute of Justice, Retrieved from https://www.ncjrs.gov/ Papastefanou, G., & Bergner, B. S. (2014). Smart Sensoring pdffiles1/nij/grants/188739.pdf as a planning support tool for barrier free planning: Project Westin, A. F. (1968). Privacy and freedom. Washington and Lee outcomes and recent developments. In N. N. Pinto, J. A. Law Review, 25(1), Article 20. Tenedório, A. P. Antunes, & J. R. Cladera (eds). Technologies Wieland, S. C., Cassa, C. A., Mandl, K. D., & Berger, B. (2008). for Urban and Spatial Planning: Virtual Cities and Territories Revealing the spatial distribution of a disease while pre- (pp. 1-16), Hershey, PA: IGI Global. serving privacy. Proceedings of the National Academy Samarati, P., & Sweeney, L. (1998). Protecting privacy when of Sciences of the United States of America, 105, 17608- disclosing information: k-anonymity and its enforcement 17613. through generalization and suppression: Technical report. You, T. H., Peng, W. C., & Lee, W. C. (2007). Protecting moving SRI International, Retreived from http://citeseerx.ist.psu.edu/ trajectories with dummies. In The International Conference viewdoc/summary?doi=10.1.1.37.5829 on Mobile Data Management, Washington, DC, 1 May 2007, Seidl, D. E., Paulus, G., Jankowski, P., & Regenfelder, M. (2015). pp. 198–205. IEEE Computer Society. Spatial obfuscation methods for privacy protection of house- Zandbergen, P. A. (2014). Ensuring confidentiality of geocoded hold-level data. Applied Geography, 63, 253-263. health data: Assessing geographic masking strategies for indi- Shin, M., Cornelius, C., Peebles, D., Kapadia, A., Kotz, D., vidual-level data. Advances in Medicine, 2014(2014), Article & Triandopoulos, N. (2011). AnonySense: A system for 567049. anonymous opportunistic sensing. Pervasive and Mobile Zeile, P., Höffken, S., & Papastefanou, G. (2009). Mapping Computing, 7(1), 16-30. people?–The measurement of physiological data in city areas 222 Journal of Empirical Research on Human Research Ethics 13(3) and the potential benefit for urban planning, In Proceedings Author Biographies REAL CORP 2009, Catalonia, Spain, 22–25 April 2009. Ourania Kounadi is a postdoc researcher at the Department of Zeile, P., Memmel, M., & Exner, J.-P. (2012). A new urban Geoinformatics, University of Salzburg. Her main research interests sensing and monitoring approach: Tagging the city with include geoprivacy and spatial confidentiality, urban emotions in the RADAR SENSING app. In Schrenk, M., Popovich, V., spatial planning, fear of crime, and spatial crime analysis. She con- Engelke, D., Elisei, P. (eds) The Proceedings REAL CORP ceived and designed the study. She analyzed the technical aspects of 2012 (pp. 1397–1409). Schwechat, Austria: CORP. geoprivacy, developed the lists of recommendations, and wrote the Zeile, P., Resch, B., Loidl, M., Petutschnig, A., & Dörrzapf, L. paper. (2016). Urban emotions and cycling experience–Enriching Bernd Resch is an assistant professor at the Department of traffic planning for cyclists with human sensor data. Gi_Forum, Geoinformatics, University of Salzburg, and a visiting scholar at 2016(1), 204-216. Harvard University. His main research interests include human and Zhang, S., Freundschuh, S. M., Lenzer, K., & Zandbergen, P. technical sensors, collective sensing, self-learning systems in A. (2017). The location swapping method for geomasking. GIScience, and real-time and smart cities. He contributed to the Cartography and Geographic Information Science, 44(1), 22-34. conception of the study and critically reviewed the manuscript. He Zimmerman, D. L., & Pavlik, C. (2008). Quantifying the addressed key points regarding the particulars of participatory effects of mask metadata disclosure and multiple releases sensing data and the remaining open research questions of on the confidentiality of geographically masked health geoprivacy. data. Geographical Analysis, 40(1), 52-76. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Empirical Research on Human Research Ethics SAGE

A Geoprivacy by Design Guideline for Research Campaigns That Use Participatory Sensing Data:

Loading next page...
1
 
/lp/sage/a-geoprivacy-by-design-guideline-for-research-campaigns-that-use-yZL2h4PzSI

References (85)

Publisher
SAGE
Copyright
Copyright © 2022 by SAGE Publications
ISSN
1556-2646
eISSN
1556-2654
DOI
10.1177/1556264618759877
Publisher site
See Article on Publisher Site

Abstract

Participatory sensing applications collect personal data of monitored subjects along with their spatial or spatiotemporal stamps. The attributes of a monitored subject can be private, sensitive, or confidential information. Also, the spatial or spatiotemporal attributes are prone to inferential disclosure of private information. Although there is extensive problem- oriented literature on geoinformation disclosure, our work provides a clear guideline with practical relevance, containing the steps that a research campaign should follow to preserve the participants’ privacy. We first examine the technical aspects of geoprivacy in the context of participatory sensing data. Then, we propose privacy-preserving steps in four categories, namely, ensuring secure and safe settings, actions prior to the start of a research survey, processing and analysis of collected data, and safe disclosure of datasets and research deliverables. Keywords geoprivacy by design, location privacy, spatiotemporal data, mobile participatory sensors, disclosure risk, anonymization methods, research design, spatial analysis Silva, Zeile, de Oliveira Aguiar, Papastefanou, & Bergner, Introduction 2014). An a-spatial example is the HealthSense project that Participatory sensing refers to sensor data gained voluntarily improves the classification of health detection events from participants for personal benefits or to benefit the com- through user feedback information incorporated into munity (Christin, Reinhardt, Kanhere, & Hollick, 2011). machine learning techniques (Stuntebeck, Davis, Abowd, & Sensors are attached to mobile devices such as smartphones Blount, 2008). The application examples mentioned so far or smart wristbands, and typically collect data to be exam- collect and analyze objective measurements from sensors. ined (e.g., heart rate) along with other sensed data such as However, in some spatial studies subjective measurements location, time, pictures, sound, and video. The main sensing (i.e., provided by the participant via a questionnaire app) measurement can be collected for personal interest such as are collected to either complement objective measurements the BALANCE system that detects the caloric expenditure of of biometric sensors (Resch, Summa, Sagl, Zeile, & Exner, a user (Denning et al., 2009). Another application of partici- 2015), or measure emotions and perceptions (e.g., fear of patory sensing is to alert medical staff of their patients’ crime, happiness, perception of environmental and built abnormal behaviors like the MobAsthma application that phenomena, or mood) that are more difficult to capture via measures asthma peak flows, pollution, and location to biometric sensors (MacKerron & Mourato, 2013; Solymosi, inform on asthma attacks (Kanjo, Bacon, Roberts, & Bowers, & Fujiyama, 2015; Törnros et al., 2016; Zeile, Landshoff, 2009). These applications are human centric Memmel, & Exner, 2012). because they collect information about the individual who The usage of spatiotemporal participatory sensing data is carries the sensor. There are also environment-centric appli- a scientific trend in many fields, and the intensity of the cations, where the participant acts as a “human as sensor operator” and carries the mobile device to capture environ- University of Salzburg, Austria mental phenomena such as air quality or noise (Kanjo et al., Center for Geographic Analysis, Harvard University, Cambridge, MA, USA 2009; Maisonneuve, Stevens, Niessen, & Steels, 2009). Corresponding Author: Also, participatory sensing has been used for spatial as Ourania Kounadi, Postdoctoral Researcher, Department of well as a-spatial research studies. The EmbaGIS application Geoinformatics–Z_GIS, University of Salzburg, Schillerstraße 30, depicts stress-level peaks in the movement of handicapped Salzburg 5020, Austria. people for the identification of urban barriers (Rodrigues da Email: ourania.kounadi@sbg.ac.at 204 Journal of Empirical Research on Human Research Ethics 13(3) studies is expected to increase in the future. However, these individual rights to prevent disclosure of the location of one’s home, workplace, daily activities, or trips. The purpose of data entail significant privacy violations risks, partially due protecting geo-privacy is to prevent individuals from being to their complexity, and partially because practitioners and identified through locational information (p. 3). the public are not fully aware of the potential disclosure risks linked to these data. With respect to the usage of par- The disclosure of locations may compromise individ- ticipatory sensing data in research studies, Resch (2013) ual privacy when these are used to infer personal infor- denotes the practitioners’ obligation to address several pri- mation about an individual (e.g., living place, working vacy issues such as data ownership, accessibility, integrity, place, frequently visited places). In addition, confidenti- liability, and participants’ opt-in/opt-out possibility. ality can be breached if the disclosed locations are linked However, practitioners are not always aware of privacy to one or more sensitive attributes such as in confidential implications, methods for protection, and how and when to discrete location datasets. Thus, spatial datasets may apply them in research. Three studies in the fields of medi- pose risks to both the privacy and confidentiality of the cine, health geography, sexual and reproductive health, entities. GIScience, geography, and spatial crime analysis examined Regarding participatory sensing data, Christin et al. how confidential point data of participants were portrayed (2011) provided a definition that gives full control of the on maps, and found numerous cases where original data disclosed information to the users of a participatory sensing were used instead of aggregated or anonymized data application: (Brownstein, Cassa, & Mandl, 2006b; Haley et al., 2016; Kounadi & Leitner, 2014). The studies cover a period Privacy in participatory sensing is the guarantee that between 1994 and 2015, and their findings remain consis- participants maintain control over the release of their sensitive tent; efforts to instill sensitivity to location privacy and dis- information. This includes the protection of information that closure risk have been relatively unsuccessful, and can be inferred from both the sensor readings themselves as researchers ignore or are unaware of the spatial reidentifica- well as from the interaction of the users with the participatory tion risk when publishing point data on maps. The findings sensing system (p. 1934). reveal the need for educating practitioners over privacy and confidentiality issues with the use of spatial data. The definition above describes privacy with respect to Our article aims to establish a general guidelines frame- e-diaries, health monitoring, or other applications. work for privacy-preserving tasks during a research cam- However, when it comes to data that need to be collected paign that collects participatory sensing data. The term for research purposes the disclosed information should be “research campaign” encompasses two possible research predefined in a confidentiality–participation agreement, efforts: First, an institution or research group not only con- and thus the control is transferred to the trusted data hold- ducts surveys for their studies, but they may also consider to ers (i.e., controller). publish the data, share them with other members of the Overall, geoprivacy definitions do not encompass all institution or with third parties. Second, a research group or types and applications of spatial data that are prone to com- an individual researcher collects survey data for a single promising individual privacy and/or confidentiality. For study. In the next sections, we analyze privacy issues and certain types, such as the collection of data through a sur- practices (sections “Geoprivacy, Confidentiality, and vey, a spatial confidentiality definition would be more Spatial Datasets” and “Essential Technical Analysis”), and appropriate to use than a location privacy definition. The then propose recommendations for the different stages of a complexity and several dimensions of the confidentiality research campaign (section “Privacy by Design Research and privacy risks linked to spatial data make the formula- Campaign”). tion of a single definition extremely difficult, if not impos- sible. However, there exist anonymization methods that have not only been developed for one datatype but can also Geoprivacy, Confidentiality, and be applied to another. Furthermore, some privacy threats Spatial Datasets that were mentioned for one datatype may have been Although privacy has been conceptualized and explored for neglected or unacknowledged for another datatype that has quite sometime (Post, 2001; Waldo, Herbert, & Lin Millett, similar risk of reidentification. This shows that privacy and 2007; Westin, 1968), privacy regarding spatial data is confidentiality literature for location data has to be exam- described with separate definitions and in sometimes distin- ined more broadly to bring complete solutions. The spatial guished by the type of spatial dataset that it addresses. A data that are at risk of disclosing private or confidential general definition that describes well geoprivacy for both information are listed below. Our categorization is subjec- confidential discrete location data and spatiotemporal tra- tive and aims at highlighting the differences of the catego- jectories of individuals by Kwan, Casas, and Schmitz ries that they have an effect on the geoprivacy strategy to be (2004) denotes that geoprivacy refers to implemented: Kounadi and Resch 205 1. Mobile phone data Each of the nine datasets has certain characteristics due 2. Location-based services (LBS) data to which protection approaches may differ between catego- 3. Location-based social network (LBSN) data ries of data. A LBS dataset may not only have similar attri- 4. Confidential discrete location data butes to a mobile phone dataset, but it may also have 5. Confidential discrete location data on individuals significant differences in its temporal frequency. The text 6. Sensitive discrete location data on individuals attributes of a LBSN dataset may lead to inferential disclo- 7. Data from mobile technical sensors carried by sure of personal preferences, opinions, and other private “humans as sensor operators” matters. The fourth dataset is about confidential locations 8. Data from mobile technical sensors carried by (e.g., a location where a radioactive material was stolen), “humans as objective sensors” and the fifth dataset is about confidential location data on 9. Data from mobile devices carried by “humans as individuals (e.g., the home location of a patient who has subjective sensors” been diagnosed with a certain disease). The approaches to protect the abovementioned datasets (i.e., method, anonym- Mobile phone data contain the users’ past locations ity measure, anonymity level as requested by authorities attached with their time stamp and other phone-related and institutions, and data to assess the disclosure risk) shall attributes depending on the dataset. The spatiotemporal be different. accuracy may vary depending on the population density, Furthermore, Datatypes 8 and 9 can be considered as the the method of extracting locations, and the type of dataset. most complex ones due to the variety and sensitivity of per- Typically, in areas with high population density, such as sonal information that is collected (i.e., spatial, temporal, cities and towns, the spatiotemporal accuracy is high. A and sensitive/confidential). Also, for research purposes typical example of the second type are applications for additional attributes of the data subjects and/or a combina- navigation services that, like the first type, may collect spa- tion of subjective and objective measurements can be col- tial and temporal information of their users. In the third lected. Our recommendations focus on Datatypes 8 and 9 dataset, a user has the option to disclose his or her location because their complexity and sensitivity can lead to greater along with the time stamp and the attribute information that privacy loss compared with the other datasets. is inherent in most social media applications (e.g., a text on Twitter). The fourth location dataset is the least discussed Essential Technical Analysis in the literature of location privacy. An exemplary dataset here is the Incident and Trafficking Database (ITDB) by Disclosure Risk of Released Data and the International Atomic Energy Agency enclosing the ille- Deliverables gal movement of nuclear and radioactive materials (International Atomic Energy Agency, 2015). The fifth and The comprehension of disclosure risk and reidentification sixth datatypes have been mostly discussed for health and techniques is critical to design efficient privacy implemen- crime geocoded datasets such as the residential locations of tations. Below, we present a list of release scenarios for patients of a disease or household locations of victims of a research efforts that collect microdata and associated deliv- crime. The next three datatypes refer to spatiotemporal erables of Datatypes 8 and 9. Each scenario is analyzed in data collected from participatory mobile sensing applica- terms of the risk of disclosure and privacy threats to the data tions. The “human as sensor operators” refers to examples subjects. The location protection methods and research where users of mobile phones capture environmentally guidelines in the next sections take into consideration these related information such as noise, traffic, and air quality. scenarios. However, we do not claim that this is an exhaus- However, to project this information spatially the temporal tive list. and spatial information of the users is captured as well. The eighth datatype involves physiological measurements of •• Scenario 1: Disclosure of original data the individual who carries the device such as data from biometric sensors used for health-monitoring purposes. In The full dataset is disclosed that includes the values for each the last type, the data subjects act as sensors similar to objective or subjective measurement (or both), the spatial and Datatype 8, but they report their own subjective percep- temporal stamps, as well as the identity of the measurement’s tions of the sensed attribute, which can be either about the subject. environment (e.g., public safety, quality of life, or road safety) or about themselves (e.g., fear or emotions). This is Data from Scenario 1 are prone to similar inference typically done with a smartphone application that sends attacks to data collected in LBSNs. According to Alrayes requests to the participants to enter their emotions or per- and Abdelmoty (2014), LBSNs contain three types of ceptions instantly, or at their earliest convenience (based semantics: the spatial semantics that can be used to infer on experience sampling method). places visited, the nonspatial semantics which are mostly 206 Journal of Empirical Research on Human Research Ethics 13(3) textual information for LBSN, whereas for participatory Scenario 1. The space–time stamps of a participant can be sensing these semantics are the subjective or objective mea- translated to trips with distinguishable start and ending des- surements, and the temporal semantics revealing the time tinations. What if the ending destination of a participant for and duration of a visited place. We filtered out privacy trips after 10:00 p.m. is frequently on the same or a nearby threats from inference attacks that were discussed by the location? This location can be the participant’s home loca- aforementioned authors based on their common characteris- tion. Krumm (2007) analyzed subjects’ trips for a recording tics with participatory sensing data. The following personal period of a minimum 2 weeks and tried to infer their home information can be inferred: (a) home location, (b) work locations using several algorithms. The median distance location, (c) most visited places and time spent at these error of the real home address to the inferred one was 60.7 places, (d) locations and activities during weekends, (e) m. Similar approaches may be used for most inference lunch places and after-work activities, (f) favorite stores, (g) attacks mentioned in Scenario 1. The spatial reidentification time spent away from home, and (h) time spent away from risk of data from participatory sensing applications depends work. In addition to these eight privacy threats, the partici- on the recording period, the residential patchiness study pants of the study will be known, and sensitive private area, and the frequency of the space–time stamps. Although specific reidentification studies for participatory sensing information depending on the measurement will be revealed. data do not exist, previous findings from other spatial This extreme scenario leads to a far-reaching loss of privacy datatypes pinpoint the risk that should not be neglected. and involves all types of disclosures (i.e., identity, attribute, and inferential—for definitions, refer to the supporting •• Scenario 4: Disclosure of quasi-identifiers and data information file). It is also worth mentioning other serious collection meta-data privacy threats that have been identified related to the use of mobile sensing applications such as identity theft, profiling, A dataset is disclosed that includes the values for each objective stalking, embarrassment, extortion, and cooperate use/mis- or subjective measurement (or both), the spatial and temporal use (Barcena, Wueest, & Lau, 2014). stamps, as well as one or more quasi identifiers of the measurement’s subject. •• Scenario 2: Disclosure of key identifiers Identity or attribute disclosure is difficult to achieve A dataset is disclosed that includes the values for each objective when quasi-identifiers (e.g., socioeconomic characteristics or subjective measurement (or both), the spatial and temporal of a subject) exist in a dataset that has multiple and variable stamps, as well as one or more key identifiers of the measurements per participant. This is because a subset of measurement’s subject. measurements cannot be linked to an individual. However, if there are only a couple of measurements with the same While a full name is not present in the dataset, other combination of quasi-identifiers it can be inferred that they identifying elements may be given such as e-mail or home belong to a single individual. Also, if the controller dis- address. E-mail addresses can be linked with other online closes information on the data collection methods (e.g., sources to reveal the identity of a participant. Furthermore, there are a minimum or predefined number of measure- home addresses can disclose the participants’ identities, ments per participant), this information can be used to especially in purely residential single family areas (i.e., a define a subset of measurements for one or more data sub- location depicts a residence of only one household). Even if jects. For example, a study collects 100 measurements per the home address is given as a set of geographical coordi- participant, and discloses this dataset along with the sex and nates, X and Y, instead of textual information, the latter can the occupation of each measurement’s subject. A subse- be inferred using freely available reverse geocoding ser- quent data analysis filters out 100 measurements of a man vices (Kounadi, Lampoltshammer, Leitner, & Heistracher, of occupation “X.” All measurements refer to one individ- 2013). ual, which is known due to data collection meta-data infor- mation. Also, it can be found that there is only one man of •• Scenario 3: Disclosure of pseudonyms this occupation in the study area. Thus, the identity and A dataset is disclosed that includes the values for each objective or attribute disclosure of this participant have been compro- subjective measurement (or both), the spatial and temporal stamps, mised like Scenarios 1 and 2. as well as a pseudonym representing the measurement’s subject. •• Scenario 5: Identifying participants in a digital map This scenario illustrates the inferential disclosure of such or printed map datasets with the use of data mining and geoprocessing techniques. If a participant is distinguished by an id, a sub- A map is disclosed in a digital or printed format that portrays the locations and/or values of the measurements for one or set of location data can be analyzed to infer his or her home more participants. address that will lead to privacy threats mentioned in Kounadi and Resch 207 Data deliverables such as participants’ maps are also Controllers often disclose meta-data regarding the loca- prone to reidentification. For example, a map is uploaded tion protection method or any other disclosure limitation on a website of a research organization portraying the val- technique that is applied to the original data to ensure that ues and locations of the measurements for one participant. confidentiality and privacy of subjects are protected, and Reengineering can be applied to the point map to extract the also to provide information on the spatial information loss geographical coordinates of the participant’s locations. of the anonymized released copy that may be used and ana- Brownstein et al. (2006a) applied a reengineering process lyzed by others. However, reengineering can be improved that involves an unsupervised classification to examine the with the disclosure of anonymization meta-data because, spatial reidentification risk of the publication of high- and just like Scenario 6, it provides hints to a potential attacker. low-resolution point maps. The number of correctly reengi- This has been tested with methods such as aggregation and neered addresses was 79% for the high-resolution map and perturbation (Zimmerman & Pavlik, 2008). 26% for the low-resolution map, indicating that by lowering the resolution of a digital map does not prevent reidentifica- Disclosure Risk of Data Collection and Storing tion. Once the coordinates of the participant are extracted, on Devices the home address can be estimated (Scenario 3), then reverse identification (Scenario 2) will reveal a single Data security has been characterized by Boulos, Curtis, and address or a set of addresses, and finally addresses can be AbdelMalik (2009) as the “missing ring” in privacy-pre- used to infer the identity of the participant. The disclosure serving discussions. The authors describe a scenario of a remains even if the map is in a printed format. In this case, research study that has a well-defined privacy-preserving the map can be scanned and georeferenced to a known coor- plan, has been approved by an institutional review board dinate system. The reengineering error of a point printed (IRB), and employs adequate practices for the publication map was examined by Leitner, Mills, and Curtis (2007) of results and maps. However, the security components are who found that the distance errors (i.e., distance from the not checked and approved as the other parts of the research actual to the reengineered location) ranged from 59.54 m to study such as the subjects’ consent to conduct the study, 156.63 m, and are independent of the map scale. disclosure risk of analysis, reporting findings, and sharing data. Thus, the research process is likely to neglect risks •• Scenario 6: Multiple versions of anonymized regarding data theft, data loss, or data disclosure to nonau- datasets thorized parties. Tracking devices that collect physiological or subjective The controller releases multiple versions of anonymised copies measurements can be smartphone applications that collect of the original data. responses to emotions and perceptions, smartphone appli- cations that exploit built-in sensors, or wearable tracking In this scenario, original data are first anonymized using devices such as a wristband or a watch. The measurements an anonymization method. The controller shares the anony- are stored in databases locally, remotely, or both. Data are mized data with a research firm, and soon after discards viewed and analyzed via computer (smartphone, desktop, them because he or she owns the original data. After some- or laptop), and frequently require Internet access (i.e., time, another research firm may make a request for an ano- cloud-based model). Based on the structure of self-tracking nymized copy. The controller reapplies the anonymization systems, security risks exist when data are stored on the method that incorporates a randomization function, and device, data are stored in the cloud, and data are transmitted therefore the anonymized copy is different than the first to the cloud. Barcena et al. (2014) examined a range of self- one. The more this process is repeated, the more copies are tracking services regarding the security issues that take distributed that increase the spatial reidentification risk of place during the storing or transmission of data. First, they the original data. Multiple versions of anonymized dataset found that Bluetooth-Low-Energy-enabled devices can may give hints regarding the method’s parameters and char- transmit a signal that can be read by scanning devices and acteristics to an attacker who will try to reidentify the origi- provide an estimate location of the device. Therefore, the nal data. This scenario has been tested and confirmed for spatiotemporal patterns of the users can be leaked (the same the “non-deterministic Gaussian skew” location protection applies when Wi-Fi is enabled on the device). Second, 20% method (Cassa, Wieland, & Mandl, 2008). of the examined applications that offer cloud-based service components may transmit login credentials in clear text •• Scenario 7: Disclosure of anonymization meta-data (i.e., nonencrypted data). Third, the examined services con- tacted on an average five unique domains. These domains The controller releases metadata information on the location protection method and/or additional disclosure limitation receive information on the user’s behavior and activities practices applied to the original data. without the users being aware of it. Fourth, the services 208 Journal of Empirical Research on Human Research Ethics 13(3) employ user account-based services that make the sessions way that predefined cross-tabulations are preserved. Also, insecure and potential to be hijacked. Fifth, data leakage most techniques can be applied to the records of the matrix may occur if applications use third-party services. Last but (i.e., record transforming masks) or to the columns of the not least, half of the existing services do not have or do not matrix (i.e., attribute transforming mask; Duncan & make available their privacy policies. Pearson, 1991). Several security and anonymity frameworks, however, The first generation of anonymization methods for con- have been proposed for participatory sensing applications fidential discrete spatial datasets, commonly known as (De Cristofaro & Soriente, 2011; Shin et al., 2011; X. O. “geomasking-techniques,” is based on existing methods on Wang, Cheng, Mohapatra, & Abdelzaher, 2013). These microdata such as aggregation and modification with spe- frameworks provide mechanisms to preserve users’ privacy cific adaptations to protect the spatial attribute of the data. when their data are reported in the cloud to a service pro- According to Zandbergen (2014), “Geographic masking is vider. However, we should outline here that in the context of the process of altering the coordinates of point location data a research campaign it is not necessary to send and store data to limit the risk of re-identification upon release of the data in the cloud or to involve a third-party service provider. (p. 4).” The alteration of the coordinates produces an aggre- gated dataset or a modified dataset depending on the tech- nique to be used. If points are aggregated into areal units, Anonymization Methods the transformed dataset has fewer entities than the original In this section, we refer to widely discussed anonymization dataset with count data for each one of them, similar to methods (Table 1) that aim to protect from Disclosure microdata aggregation. If points are aggregated into a new Scenarios 1 to 5. However, we should outline that most of set of symbolic or surrogate points, the transformed dataset the methods have not been evaluated for Scenarios 6 and 7 may retain the original number of observations (Armstrong, on meta-data disclosure or multiple versions of anonymized Rushton, & Zimmerman, 1999; Leitner & Curtis, 2004). copies. The methods mostly affect the precision or the accu- Regarding the modification of the coordinates, points can racy of the produced anonymized (“masked”) data. be processed at a global level with an affine transformation Precision refers to the exactness of information (in geo- (Armstrong et al., 1999) or other cartographic techniques graphical terms, it is the number of decimal places of the such as flipping and rotation (Leitner & Curtis, 2004), and latitude and longitude of locations), whereas accuracy is the at a local level by modifying points with approaches based relation between a measured value and the ground truth. In on random perturbation (Kwan et al., 2004; Leitner & general, “precision-affecting” methods are accurate with Curtis, 2004), or snapping them along the edges of their respect to the information they report, and “accuracy-affect- corresponding Voronoi polygon (Seidl, Paulus, Jankowski, ing” methods are fairly precise. For example, if an observa- & Regenfelder, 2015). tion is aggregated into a postcode level it is not as precise as Adaptive geomasking techniques are modification tech- a point-level observation, but the information that the niques that displace original point locations within uncer- observation lies within the postcode is accurate. Similarly, tainty areas, where the sizes of these areas are defined by if an observation is translated 300 m to the north it is very the underlying population density. The purpose of these precise but still inaccurate. techniques is to offer a “spatial k-anonymity,” meaning that Early methods are mainly statistical and were developed each confidential or private location on the dataset (e.g., a for the protection of microdata. Due to the nature of the household) cannot be distinguished among k-1 other loca- data, the methods are applied to a matrix in which each row tions. Spatial k-anonymity is an adaptation of the classic is a subject and each column an attribute. Although the k-anonymity model. K-anonymity ensures that an effort to structure of participatory sensing spatiotemporal data is dif- identify information of an entity ambiguously maps infor- ferent, these methods formed the basis for the next genera- mation to at least k entities; in other words, any group is tion of more advanced techniques, including the spatial or hidden in a group of size k regarding the quasi-identifiers the spatiotemporal ones. They can be summarized into four (Samarati & Sweeney, 1998). The uncertainty area of the categories: abbreviation, aggregation, modification, and “population-density-based Gaussian spatial blurring” is cir- fabrication (Cox, 1996). An example of abbreviation is the cular, and the selection of the displacement is based on a suppression of records (in this context, it means removal) normal distribution (Cassa, Grannis, Overhage, & Mandl, from geographical areas of low population density. In 2006). In “donut geomasking,” the uncertainty area has the aggregation, microdata records (one record equals to one form of a torus so as to ensure a minimal displacement data subject) of similar values can be averaged, and there- (Hampton et al., 2010). fore microdata are transformed to tabular data. A typical Furthermore, the “voronoi-based aggregation system” example of modification is perturbation where random (Croft, Shi, Sack, & Corriveau, 2016; a spatial aggregation noise is added to each cell or to certain variables. Last, one approach) and the “triangular displacement” (a modifica- fabrication technique is data swapping between records in a tion approach; Murad, Hilton, Horan, & Tangenberg, 2014) 209 Table 1. Privacy and Confidentiality Approaches for Statistical and Spatial Data. Anonymization Dataset approaches Description Major effect Benefits Limitations Microdata Abbreviation Reduces the volume or granularity of Imprecision Easy implementation; mathematical basis for Current applications are released information location protection methods restricted to a-spatial data Aggregation Combines adjacent categories or replaces with nearby values Modification Changes data values with rounding or Inaccuracy perturbation Fabrication Creates a fictional dataset that has distributional and inferential similarities with the original Confidential discrete Adaptive Actual locations are perturbed Inaccuracy Risk of identification information can be Current applications are spatial data (e.g., geomasking considering the spatial k-anonymity adaptively anonymized to meet data-specific restricted to static, health care, crime, regulations and restrictions; anonymized data nontemporal discrete household surveys) retain the initial discrete structure that is location data crucial for many spatial-point pattern analyses Geomasking Geographical masks that extend spatial Inaccuracy or In addition to the location and sensitive theme, with quasi- k-anonymity to basic k-anonymity to imprecision quasi-identifiers may be disclosed that allow identifiers account for quasi-identifiers further analysis of covariates Synthetic Anonymized data are synthesized from Inaccuracy Retains relationship between locations and geographies the results of spatial estimation models covariates that use covariates as estimators of confidential locations Spatiotemporal data Point A set of locations is replaced by a single Imprecision Adequate for visualizing trajectories of Point aggregation of individuals (e.g., aggregation representative location individuals or movement flows in between underperforms random GPS trajectories, areas perturbation techniques cellular data, LBS, Cloaking Lowers the space and/or time precision Option to decrease the temporal or the spatial Prohibits spatial-point radio-frequency of individual-level data resolution pattern analysis; polygon identification devices clustering may hide [RFID]) significant point clusters Dummies Adds noise that simulates human Inaccuracy Allows spatial-point pattern analysis and The spatial accuracy of the trajectories analysis by user augmented anonymized dataset compared with the original one has not been addressed Pseudonyms Identities are stored with pseudonyms Inferential disclosure is not protected Mix zones Locations are hidden in certain areas, and High positional accuracy is achieved in low Analysis by user or group pseudonyms change when exiting them sensitivity areas; it is harder, if not impossible, of users is not possible to perform inference attacks on individuals’ if pseudonyms change in spatiotemporal behavior if pseudonyms are time changed periodically Note. GPS = global positioning system; LBS = location-based services. 210 Journal of Empirical Research on Human Research Ethics 13(3) can be applied to spatial datasets that include covariates, dummies are an interesting approach, the spatial analytical although there are still open questions with respect to the errors of the increased dataset have not been addressed and spatial analytical error they produce (regarding the Voronoi- should be considered when such a dataset is released for based method) or the quantification of the offered k-ano- research purposes. Another technique that affects the accu- nymity (regarding the triangular displacement method). racy of the data is the use of “unlinked pseudonyms” that Last, concepts of simulated geographies (a fabrication are fake identities associated with data subjects (Cuellar, approach) also require additional attributes to create a pro- 2004). As it is explained earlier, pseudonyms will not pre- tected spatial dataset (Paiva, Chakraborty, Reiter, & vent inferential disclosure when space–time stamps are dis- Gelfand, 2014; H. Wang & Reiter, 2012). Here, the attri- closed. A more sophisticated version of pseudonyms is the butes are used to make spatial predictions on the confiden- “mix zones” method in which a new pseudonym is given to tial theme. The resulting hotspots are then used to synthesize a subject as soon as he or she exits the so called mix zone the anonymized dataset. (Beresford & Stajano, 2003, 2004; Buttyán, Holczer, & The general drawback of techniques on confidential dis- Vajda, 2007). In addition, while being in the mix zone loca- crete spatial data is that they have not been applied to spa- tions are hidden. There are two limitations to be considered tiotemporal data. Tuning of the algorithms is needed to if such methods are to be exploited for participatory sensing consider multiple sensitive measurements per data subject data: First, they take into consideration only the space and as opposed to traditional confidential discrete data where time attributes, whereas participatory sensing data also one location, typically a home address, is given per subject. include confidential measurements and potentially addi- However, an important advantage of geomasking studies tional quasi-identifiers. Second, the anonymity refers to for privacy research design is the extensive evaluation of other or artificially inserted subjects in the dataset (i.e., the produced masked datasets regarding the spatial analyti- users of a service), which may not prevent disclosure of pri- cal error. vate locations (see Scenario 3), unless either the underlying Spatial-point aggregation (Adrienko & Adrienko, 2011; residential/building structure is considered or a very large Monreale et al., 2010), or spatial-areal and temporal aggre- number of participations in the study are achieved. gation, known also as cloaking (Cheng, Zhang, Bertino, & The presented methods have the potential to be used for Prabhakar, 2006; Gruteser & Grunwald, 2003; Kalnis, participatory sensing data if they are combined and/or Ghinita, Mouratidis, & Papadias, 2007), follows the same adapted. Nevertheless, the complexity of a participatory approach as statistical aggregation. In particular, it decreases sensing dataset has to be taken into account. Specifically, a the precision of original data. Point aggregation can be used spatiotemporal trajectory dataset contains the attributes for for both privacy protection and a generalization approach to each data subject for multiple measurements per subject, visualize flows in movements and in between areas. With like a participatory sensing dataset. However, it does not cloaking, the time duration of an object at one location is have sensitive attributes or quasi-identifiers other than the considered as quasi-identifier. Given the number of other spatiotemporal information. On the contrary, a confidential objects at this location and for this time duration, a decision discrete dataset may have quasi-identifiers and sensitive to decrease spatial resolution will be taken. Similarly, one attributes but collects only a single measurement for each can lower the temporal resolution. Because cloaking is data subject. designed for LBS data, the anonymity it offers is calculated Another limitation of the existing techniques is that most based on the number of other data subjects (i.e., users of a of them are based on the concepts of spatial k-anonymity service) at a particular time and location. Considering the and k-anonymity aiming at decreasing the risk of inferential number of users of a LBS, this approach can provide suffi- disclosure or identity disclosure. These concepts cannot cient anonymity. However, the number of participants in prevent attribute disclosure that may occur from homogene- participatory sensing studies will probably be much lower, ity attack (i.e., knowing a person who is in the database) and and this will greatly affect the anonymized dataset’s spatial background knowledge attack (i.e., knowing a person who precision due to larger disclosed regions and/or coarser is in the database, and additional information on the distri- time. Generally, all techniques that involve some sort of bution of the sensitive attribute or on the characteristics of spatial aggregation will affect analytical results due to the the person who is in a database). The problems can be modifiable areal unit problem (Openshaw & Openshaw, solved with the concept of “l-diversity” where an equiva- 1984). In practice, polygon or point clusters of the measure- lent class has at least l “well-represented” values for the ments’ values may appear or disappear depending on the sensitive attributes (Machanavajjhala, Kifer, Gehrke, & aggregation’s division of the space. Venkitasubramaniam, 2007). L-diversity ensures that for A different concept is to add noise to the data with artifi- one sensitive attribute table, all equivalent classes of a table cial trajectories so called “dummies”(Kido, Yanagisawa, & have at least l-distinct values for the sensitive attribute. For Satoh, 2005; You, Peng, & Lee, 2007). Dummies are added the case of multiple sensitive attributes, one sensitive attri- to satisfy the anonymity of each data subject. Although bute is treated as the sole sensitive attribute, while the Kounadi and Resch 211 Table 2. Privacy and Confidentiality Recommendations From Public and Independent Bodies. FCSM CDC-ATSDR NRC Organization 1. Standardize 1. Designate a privacy manager 1. Methodological training in the acquisition and use of data and training and centralize 2. Train all responsible staff 2. Training in ethical considerations of data that include agency review of 3. Define criteria for access to restricted-access explicit location information on participants disclosure-limited files 3. Design studies in ways that provide confidentiality data products 4. Planning for release of PUDS protection for human participants 2. Use consistent practices FCSM CDC-ATSDR ICO (POA) Data processing 3. Remove direct 5. Classify each dataset as a restricted-access or 1. Increase a mapping area to cover more properties or identifiers and limit a PUDS occupants other identifying information FCSM CDC-ATSDR ICO (POA) ICO (GCD) NIJ–CMRC Publication 4. Share information 6. Include disclosure 2. Reduce the frequency 1. The use of heat maps, 1. Decide which data to present: blocks, and zones Point versus aggregate data of data and on assessing statement with or timeliness of deliverables disclosure risk PUDS publication reduces privacy risks 2. Use disclaimers to avoid 2. New ways of liability from misuse or 7. Maintain log of 3. Use mapping formats datasets rereleased that do not allow the representing misinterpretation of data information about 3. Provide information on inference of detailed information crime should be laws, liability, freedom of explored information, and privacy 4. Avoid the publication of spatial information 4. Provide contact information of on a household level persons with privacy expertise and familiarity with the data CDC-ATSDR NRC NIJ–CMRC Release data to 8. Authenticate the identity of data 4. Data stewards should 5. Consider privacy and other implications if data provided a third party requestors develop licensing will be merged with other data 9. All restricted-access data requestors are agreements to provide 6. Decide presentation of research results required to sign a DSA increased access 7. Researchers and the agency decide what data will be 10. Requirements for a standard DSA for to linked social- needed restricted-access data spatial datasets that 8. A nondisclosure agreement may be used to guarantee 11. Monitor user compliance with DSAs include confidential confidentiality 12. Include addendum to the DSA when a information 9. The agency can review any research results before requestor plans to link restricted-access publication data to other data 10. Perform background checks on research personnel who 13. Include addendum to the DSA when a will have access to data requestor plans further data releases from 11. Decide where data will be stored to ensure secure restricted-access data to other parties settings 12. Require researchers to destroy raw data after the research is completed Note. Recommendations have been grouped into four categories according to the topic they address. FCSM = Federal Committee on Statistical Methodology; CDC-ATSDR = Centers for Disease Control and Prevention and the Agency for Toxic Substances and Disease Registry; PUDS = public-use dataset; ICO = Information Commissioner’s Office; POA = Practice on Anonymization; GCD = Geospatial crime data; NRC = National Research Council; NIJ = National Institute of Justice; CMRC = Crime Mapping Research Center; DSA = disclosure sharing agreement. others are treated as quasi-identifiers. Thus, l-diversity sets research group or institution, and are specific to the public or requirements on both the quasi-identifiers and the sensitive independent bodies who issued the documents were filtered attributes. out. The recommendations are shown in Table 2 (some of those may have been paraphrased from the original reports) by each body, and divided into four categories according to Recommendations From Relevant Institutions the topic they address. The top part of the table shows the In this subsection, we examine privacy documents from pub- recommendations regarding the organization processes and lic or independent bodies. We focus on recommendations or training of the staff. The second category is about data pro- guidelines with respect to the usage, anonymization, and cessing, and the third category is about the publication of data release of private or confidential data. Recommendations that and deliverables. The bottom part of the table shows recom- are not applicable to research design, within the context of a mendations regarding the release of data to a third body. Two 212 Journal of Empirical Research on Human Research Ethics 13(3) public bodies provide recommendations with respect to con- more precise recommendations when it comes to presenting fidential microdata (Centers for Disease Control and spatial research outputs (2-4 from ICO [POA]; 1-2 from Prevention [CDC]-CSTE, 2005; Federal Committee on ICO [GPD], and 1 from NIJ). It is also recommended that a Statistical Methodology, 2005). Two bodies discuss social, research output or a disclosed dataset is accompanied by health, or personal spatial data (Graham, 2012; Gutmann & privacy-related information (e.g., disclosure assessment, Stern, 2007). Last, two bodies look into crime events as a laws, liability, etc.) and a reference to contact person (4 special type of confidential discrete spatial data (Information from FCSM, 6 from CDC-ATSDR, 3-4 from NIJ). In addi- Commissioner’s Office [ICO], 2012; Wartell & McEwen, tion, CDC-ATSDR suggests to maintain an inventory of 2001). released datasets. The inventory of restricted-access data The U.S.-based Federal Committee on Statistical should be stored internally to ensure compliance with the Methodology (FCSM) provides assistance and guidance on terms of the disclosure sharing agreement (DSA). On the issues that affect federal statistics such as in situations when contrary, for an anonymized public-use dataset (PUDS) the the Office of Management and Budget applies policies inventory can inform interested parties on the datasets’ related to statistics. The most recent working paper on dis- availability and meta-data. Last, NIJ suggests the use of dis- closure by the agency from 2005 discusses anonymization claimers to reduce liability when outputs, such as maps, methods, practices employed by federal agencies, and offers may lead to ambiguous interpretations. recommendations for good practice for both tables and Regarding data releases to a third party (last category of microdata. Another list of guidelines was published in a Table 2), the bodies agree to the requirement of a formal comprehensive report in 2005 by the Centers for Disease agreement between the controller and the requestor. Also, Control and Prevention and the Agency for Toxic Substances checks of the requestor’s validity may be conducted (8 from and Disease Registry (CDC-ATSDR). CDC and ATSDR are CDC-ATSDR and 10 from NIJ). Then, the particulars of the both U.S. federal agencies under the Department of Health data release and potential uses should be discussed and and Human Services and therefore the focus of the report is decided between the two parties such as merging released on health data. data with other data or presentation of results (12, 13 CDC- The recommendations by the National Research Council ATSDR and 5, 6, 7, 11 NIJ). Although data sharing particu- (NRC) in the United States and the independent body lars are decided with the DSA, the collector should still be Information Commissioner’s Office (ICO) in the United allowed to review research outputs if needed. Kingdom are specific to spatial confidential data. NRC pro- vides services via reports to the government, the public, and Privacy by Design Research Campaign the scientific or engineering communities. The recommen- dations address data collected by federal agencies, individ- While previous research has mainly focused on methods to ual researchers, academic or research organizations, and preserve privacy and measures to examine information dis- outline the need to anonymize discrete spatial data. The closure, we propose practical privacy-preserving steps for code of practice on anonymization by ICO (named as ICO the collection, storage, analysis, and dissemination of indi- [POA] in Table 2) focuses on the requirements set by the vidual measurements from mobile participatory sensing Data Protection Act (The Stationery Office, 1998) to high- applications. A privacy-preserving research campaign light key issues in the anonymization of personal data, and requires a concrete privacy plan of several tasks to be devel- has a dedicated section on spatial information. Furthermore, oped before, during, and after the completion of the cam- ICO has published a separate report (named as ICO [GCD] paign. These tasks are presented here as recommendations, in Table 2) with a focus on geospatial crime data. Due to the because their application depends and varies based on a sensitivity of crime events and the increase of online crime project’s specifications. In this article, we treat initial tasks mapping, the National Institute of Justice (NIJ) in the as prior to starting a survey (subsection Presurvey Activities), United States published as well a detailed report tailored to storing, anonymization, and assessment of derived datasets this topic. It discusses, among other issues, the publication (subsection Processing and Analyzing Collected Data), and of data and maps, and the sharing of data with other agen- actions to eliminate disclosure from published data and cies or researchers. deliverables, or when datasets are shared with third parties Recommendations 1 and 2 from FCSM, 1 to 4 from (subsection Disclosure Prevention). Furthermore, a separate CDC-ATSDR, and 1 to 3 from NRC suggest practices prior subsection is dedicated to recommendations that aim to to the anonymization, release, or sharing of the data such as ensure the appropriateness of the research environment to to offer essential training, establish a privacy plan, and stan- handle a privacy-preserving research campaign (subsection dardize practices. There are a few recommendations regard- Security and Safety). In each subsection, we analyze and ing the processing of the data (3 from FCSM, 5 from explicate the details of the recommendations which are then CDC-ATSDR, and 1 from ICO [POA]), but they do not pro- summed up on a table at the end of the respective subsec- pose concrete anonymization methods. However, there are tions (Tables 3, 4, 5, and 7). Kounadi and Resch 213 Table 3. A List of Initial Activities Prior to the Starting of the shared with third parties, criteria for access to restricted- Survey. access datasets (e.g., research personnel, data requestors) have to be defined and included in the plan. A. Presurvey activities The next presurvey step is the preparation of the partici- 1. Design study in the least privacy invasive manner pation agreement. Essential elements of a participation 2. Develop a privacy-preserving research plan agreement include (a) purpose and procedures of the study, 3. Define criteria for access to restricted-access datasets (b) potential risks and discomforts, (c) anticipated benefits, 4. Prepare a participation agreement (d) alternatives to participation, (e) confidentiality state- 5. Ensure inform consent on location privacy disclosure risks ment, (f) injury statement, (g) contact information, and (h) 6. Obtain institutional approval preferably reviewed from a DRB voluntary participation and withdrawal (Hall, 2016). The confidentiality statement can vary depending on the loca- Note. DRB = disclosure review board. tion of the study area, and respective laws and regulations. The participation agreement should outline the location Table 4. A List of Recommendations to Ensure Secure and Safe privacy protection insertions in each stage of the project and Settings. communicate the remaining disclosure risks, if any. Those who communicate the study to the participants should B. Security and safety explain in common language what is “location privacy” and 1. Assign a privacy manager other related terminologies, and provide examples that 2. Train collectors and/or processors in methods and ethical allow them to make an informed decision about whether to considerations participate or not. An optional step for improvement in 3. Ensure a secure IT system future surveys is to add the participants’ feedback regarding 4. Ensure secure sensing devices the perception and preferences on the established privacy Note. IT = information technology. measures. Last, both the research plan and the participation agree- ment should go through institutional approval from objec- Table 5. A List of Recommendations to Store, Anonymize, and tive and experienced staff of the institution or University Asses Derived Datasets. such as IRB, review ethics committee (REC), or a more C. Processing and analysis of collected data specialized disclosure review board (DRB). With respect to the type of organization, De Wolf (2003) suggests to consult 1. Delete data from sensor devices once stored in the IT system a cross-disciplinary DRB that makes recommendations to 2. Remove identifiers from the dataset the IRB, if the institution’s IRB does not have a standard- 3. standardize anonymization practices ized process for reviewing outputs from survey confidential 4. Ensure that the inclusion of pseudonyms does not lead to disclosure data. The creation of a cross-disciplinary DRB could also 5. Ensure that the inclusion of quasi-identifiers does not lead to serve as a committee that educates researchers on the cur- disclosure rent available anonymization and disclosure techniques. 6. Ensure a sufficient l-diversity of the sensitive attributes 7. Classify each dataset as a restricted-access or anonymized Security and Safety dataset 8. Assess disclosure of anonymized datasets The first step of a research campaign that collects participa- 9. Assess anonymization effect on spatial analysis tory sensing data is to assign a dedicated privacy manager who is responsible for the tasks of this subsection as well as Note. IT = information technology. for consulting on (or performing) the tasks of the following subsections. The privacy manager should train data proces- Presurvey Activities sors and collectors regarding their specific activities, and is also responsible to ensure that the research environment The privacy manager should initially design the study in the provides secure and safe settings regarding the sensing least privacy invasive manner depending on the purposes of devices and the information technology (IT) system where the research study. For example, if analysis by user or group data will be stored and processed. of users is not foreseen, all measurements can be stored With regard to the security of IT systems, Boulos et al. altogether without pseudonyms. The study design should be (2009) provide a comprehensive list of measures that reported within a research plan that has dedicated sections include the usage of (a) advanced cryptography, (b) biomet- regarding privacy preservation. These sections should rics, (c) unlocking the data under the physical presentation describe methods and practices that take place during the of other members, (d) cable locks, (e) computers with a project’s duration, and for the time period for which per- built-in trusted platform module (TPM) chip, (f) password sonal data are to be kept by the team. Also if data are to be 214 Journal of Empirical Research on Human Research Ethics 13(3) attack protection, (g) network security, (h) multilevel secu- According to the Health Insurance Portability and rity (MLS), (i) secure USB flash drives, (j) blanking com- Accountability Act (HIPAA) Privacy Rule, there are 18 ele- puter display and autolog-off, (k) discarding of old ments that should be either removed or generalized to dei- equipment and storage media. dentify a dataset (U.S. Government Publishing Office, Furthermore, security should be scrutinized on the sens- 2009). These are (a) names; (b) geographic subdivisions ing devices. Tracking subjective observations is typically smaller than a State with some exceptions having a popula- performed via smartphone “human-as-sensor” applications tion threshold of 20,000 people; (c) dates directly related to that are developed by research teams tailored to the require- an individual; (d) telephone numbers; (e) fax numbers; (f) ments of a research study (Solymosi et al., 2015; Zeile, electronic mail addresses; (g) social security numbers; (h) Resch, Loidl, Petutschnig, & Dörrzapf, 2016). It is recom- medical record numbers; (i) health plan beneficiary num- mended that the application does not incorporate a closed- bers; (j) account numbers; (k) certificate/license numbers; source third-party code. In this case, the researchers cannot (l) vehicle identifiers and serial numbers, including license accurately estimate the risk because they cannot be certain plate numbers; (m) device identifiers and serial numbers; that the third party will not appropriate the sensed data. (n) Web Universal Resource Locators (URLs); (o) Internet Instead, the “human-as-sensor” software should be devel- Protocol (IP) addresses; (p) biometric identifiers, including oped exclusively by the research team. Also, data should be finger and voice prints; (q) full-face photographic images stored only locally and in an encrypting form to prevent the and any comparable images; and (r) any other unique iden- security risks during transmission, when data are stored in tifying number, characteristic, or code. If necessary, identi- the cloud, and when devices are lost or stolen. Collected fiers linked to pseudonyms or measurements may be kept in data should be transferred regularly to the secure research a separate encrypted database to allow original data and IT system. study results to be sent to the participants. Also, the deletion Also, objective observations are tracked with products of data and removal of identifiers may be a daily task or a (smartphone applications or wearable devices) that measure regular task during the survey when it is conducted for lon- physiological measurements. Although a research cam- ger periods of time. paign may develop and use their own product (Bergner, The next step is data anonymization. The anonymization Zeile, Papastefanou, & Rech, 2011; Zeile, Höffken, & of an identifier’s free spatial dataset is necessary as long as Papastefanou, 2009), professional products may be pur- data subjects are to be distinguished from each other. If chased as well from specialized sensor companies. This multiple datasets are to be collected by the research cam- means that researchers analyze collected data (outputs) of paign, the anonymization approach should be standardized “blackbox” systems. When these systems operate on smart- to ensure consistency on released datasets. Collected data phones that may have access to other applications and sen- should be anonymized prior to their release considering the sors of the device data security risks are harder to estimate. following three principles: (a) inclusion of pseudonyms Thus, we recommend the purchase and use of wearable does not lead to disclosure, (b) inclusion of quasi-identifiers devices. Similar to the “human-as-sensor” applications, does not lead to disclosure, and (c) sensitive attributes are data should be stored only locally and in an encrypted form. “well represented” among the equivalent classes of quasi- In addition, Bluetooth and Wi-Fi should be turned off identifiers. All processed datasets should be classified as while the participants use the devices. If this is not possible restricted-access and anonymized datasets. and the survey is conducted for longer periods of time, the An inevitable result of the anonymization process is the devices should be randomly and regularly interchanged reduced quality and accuracy of the anonymized dataset. In among the participants. Therefore, if the trajectories of a fact, by increasing the privacy levels of an anonymized device are collected by a scanner, they could not be linked dataset the dissimilarity of the dataset to the original one to a single individual. The research group may empty the will also increase. Nevertheless, the analytic usefulness also devices and store the data before each exchange (e.g., on a depends on the anonymized method. For example, anony- daily basis) to retain the trajectories of each participant mized data based on the donut method, random perturba- distinguishable. tion, and adaptive areal elimination performed better in If a research team opts for third-party smartphone appli- detecting spatial clusters compared with aggregation for the cations (for collecting either subjective or objective sensing same level of spatial k-anonymity (Hampton et al., 2010; measurements) which transmit and store data on the cloud, Kounadi & Leitner, 2016). Hence, the person who is respon- the relevant security risks have to be considered and com- sible to anonymize should select the approach that has the municated to the participants of a survey. least effect on the analysis to be performed by future data users, conditioned that the approaches can offer the same level of anonymity. Processing and Analyzing Collected Data For example, if the relationship between the locations of The processor should empty sensor devices once data have measurements and other covariates is important, the syn- been archived, and remove identifiers from the dataset. thetic geographies may be an ideal approach. For clustering Kounadi and Resch 215 Table 6. Measures to Evaluate the Anonymization Effect by Type of Spatial Analysis. Unit of analysis Spatial analysis Measures of spatial error and information loss Points Global descriptive statistics Global divergence index (GDi) Pattern detection/analysis Divergence to clustering distance in cross K function analysis, distance to k-nearest neighbors, or Moran’s I value Univariate spatial prediction Divergence to prediction accuracy index (PAI), prediction efficiency index (PEI) Local indicators of spatial Local divergence index (LDi), stability of hotspot (SoH) association Spatial clustering Detection rate, accuracy, sensitivity, and specificity Multivariate spatial relationship Divergence to R-squared or root-mean-square standardized error Areas Choropleth mapping, density Index of similarity (S), suppression, compactness, discernibility, surface estimation nonuniform entropy and pattern analysis, we suggest adaptive geomasking, Table 7. A List of Recommendations to Prevent Disclosure When (a) Findings Are Published, (b) Anonymized Datasets Are dummies, or mix zones. While geomasking retains the Published, and (c) Data Are Shared With Third Parties. count of the original dataset, dummies add data, and mix zones remove data from the dataset. Hence, they should be D. Disclosure prevention preferred in highly populated areas of low sensitivity where Dissemination of findings it is more likely that the addition or removal of measure- 1. Reduce spatial precision ments has a minimal effect. If data are to be used for areal 2. Reduce temporal precision analysis or for choropleth mapping, cloaking can be used as 3. Consider alternatives to point distribution maps a form of adaptive areal aggregation. The data will be less 4. Assess disclosure on a point distribution map precise than the original data; however, there will be no spa- 5. Provide protection vs. disclosure information tial error involved. On the contrary, the usefulness of the 6. Provide contact information cloaked areas should be considered because they may vary 7. Use disclaimers in size and also overlap other analysis units such as admin- Anonymized datasets istrative areas. In such scenarios, areal interpolation can be 8. Avoid the release of multiple versions of anonymized performed that also involves a spatial error to be estimated. datasets Also, point aggregation, as a form of generalization, can be 9. Avoid the disclosure of anonymization meta-data used to visualize the measurements’ trajectories. Again, 10. Inform about disclosure risk assessment there is no spatial error but less precise data. 11. Provide information on protection and effect The final step is the assessment of the anonymized data 12. Provide contact information regarding the disclosure risk, if any, and the anonymization 13. Maintain log of anonymized disclosed datasets effect of the quality of the masked data. The assessment Data sharing with third parties should be clearly communicated to potential users. In Table 14. Plan a mandatory licensing agreement 6, we present measures that can be used to quantify the 15. Plan a DSA for restricted-access data effect of the anonymization process to the masked data 16. Authenticate the identity of data requestors based on the type of spatial analysis to be performed. The 17. Perform background checks on research personnel who global divergence index (GDi) is a composite indicator will have access to data which considers the spatial mean as a measure of central 18. Ensure requestor’s safe settings tendency, the orientation of the ellipse as a measure of 19. Decide what data will be needed directional trend, and the length of the ellipse’s major axis 20. Consider implications if restricted-access data will be as a measure of spatial dispersion (Kounadi & Leitner, merged with other data 2015). It shows the divergence of global spatial statistics of 21. Decide presentation of research outputs the masked point pattern to the original point pattern. For 22. Decide length of period of retaining restricted-access data 23. Review research outputs before publication point pattern analysis and detection, possible approaches 24. Maintain log of restricted-access disclosed datasets are to calculate Cross K function analysis (Kwan et al., 2004), distance to k-nearest neighbor (Seidl et al., 2015), or Note. DSA = disclosure sharing agreement. Moran’s I value to both masked and original datasets, and report the differences of the results. When locations of masked events are used in univariate spatial prediction, the Uhlig, 2008) and the prediction efficiency index (PEI; Hunt, prediction accuracy index (PAI; Chainey, Tompson, & 2016) can be used to evaluate the predicted hotspot areas 216 Journal of Empirical Research on Human Research Ethics 13(3) where the events are more likely to occur. Then, the PAI and However, this measure cannot be used to compare the PEI of masked and original datasets can be compared and effects of two datasets in different areas that were anony- reported. mized in the same way. In this scenario, the divergence to The local divergence index (LDi) calculates the diver- Moran’s I values or to another global statistic of spatial gence of hotspot areas using the Getis-Ord Gi* statistic. autocorrelation with fixed intervals can be employed. The This index can be used to detect the masking effects to the use of indices and standardized metrics allows the testing local characteristics of the original pattern. Another with several datasets and areas, and can give an overall approach that can be used for the local properties is the evaluation of anonymization technique for its usage in spa- Stability of Hotspot (SoH) metric which was originally tial analysis. designed to measure the clusters’ deviation from the same datasets in different resolutions (Bruns & Simko, 2017). Disclosure Prevention The same metrics can be used to measure the clusters’ devi- ation from different datasets (original vs. masked) of the Dissemination of research findings poses significant pri- same resolution. Regarding spatial clustering, there are a vacy threats as those discussed in Disclosure Scenario 5 few indices that can be used. Clusters’ detection rate is the (Subsection Disclosure Risk of Data and Deliverables). percentage of significant spatial clusters (Olson, Grannis, & Hence, researchers should carefully evaluate their research Mandl, 2006), clusters’ accuracy is the percentage of sig- outputs and only present findings, particularly in the form nificant clusters in which at least half of the masked points of a map, if these are needed to convey important messages originate from clustered original points (Olson et al., 2006), to the readers of a publication. A simple way to avoid dis- clusters’ sensitivity is the percentage of masked points that closure risks is to decrease the spatial and/or the temporal originate from clustered original points and are still clus- precision of findings. While researchers may want to report tered (Cassa et al., 2006; Hampton et al., 2010), and clus- on details of the study area and collected data, they should ters’ specificity is the percentage of masked points that avoid point distribution maps of original data in cases where originate from nonclustered points and are still nonclus- participants can be distinguished (e.g., different coloring tered (Cassa et al., 2006; Hampton et al., 2010). Regression per participant or groups of participants, or each point indi- models such as geographically weighted regression (GWR) cates a private location about one participant). Haley et al. or spatial regression can be applied to the original data and (2016) did a literature review on articles published in a covariate(s) (explanatory variable), and then to the masked PubMed, and identified numerous cases that displayed par- data and the covariate(s). The divergence of the models’ ticipant data in maps as points or small-population geo- results, such as R-squared or root-mean-square standard- graphic units. In more than half of the articles, the authors ized error, can act as a measure of error in prospective mul- either did not refer to employed privacy protection tivariate analysis. approaches or anonymized data inadequately. Safe alterna- Regarding analysis on areas or grid cells, the index of tives to point distributions can be a density surface estima- similarity S can identify the degree to which counts within tion or a clustering spatial distribution that reduce the risk areal units is different (Andresen, 2009; Tompson, Johnson, of spatial reidentification. However, these practices may Ashby, Perkins, & Edwards, 2015). Furthermore, aggrega- portray a negative or positive image about an entire neigh- tion-based anonymization techniques are ideal for chorop- borhood that will be perceived as a hotspot of the sensed leth mapping or density surface estimation. Aggregation measurement. does not affect the accuracy but the precision of the data. If it is necessary for research purposes to present a sensi- Therefore, the effect can be evaluated with information loss tive point map, anonymization techniques such as these metrics such as suppression (i.e., number of suppressed under the category “confidential discrete spatial data” of records), compactness (indicates level of geographic preci- Table 1 should be employed. However, the masked point sion), discernibility (checks for anonymity levels higher distribution will, to some degree, be different from the point than the desired level), and nonuniform entropy (based on distribution of the original dataset. The researchers should the probability of identifying original locations; Croft, Shi, consider this error and the impact it may have on reader’s Sack, & Corriveau, 2017). interpretation of the map. Also, it is important to mention Measures that are in the form of an index or a standard- that location privacy risks appear when participatory data ized metric should be preferred because they allow com- are collected for longer periods of time for which the par- parisons between datasets and study areas that are not ticipant has the device on, meaning that his or her identify- possible for some of the measures listed in Table 6. For ing locations can be captured (for more details, refer to example, it may be useful to calculate the divergence of inferences on places under Disclosure Scenario 1, masked and original data to the third nearest neighbor dis- Subsection Disclosure Risk of Released Data and tance by three anonymization approaches, and identify the Deliverables). Hence, a participant distinguishable map approach that has the least effect on point pattern analysis. poses no privacy risks if data are collected for a clearly Kounadi and Resch 217 defined study area or route, and no further identifying infor- inventory of all disclosed or shared datasets that describes mation about the participants is included on the map. If the datatype based on the classification, the disclosed desti- there are any disclosure risks associated with a published nation (e.g., another institution, open data platform), and map, the responsible researcher must estimate and report other relevant information. them. Last, when research outputs are uploaded on a research project’s web page the usage of disclaimers may Conclusion limit unintended misconceptions of the presented informa- tion. There are no standard disclaimers for use, but they The proposed privacy recommendations were generated depend on the publication and information prone to inter- from two sources of information: The first source of infor- pretation. The wording should specify what does the publi- mation is the technical information, and second one is the cation is not liable for, such as decisions and actions taken experts’ suggestions. Technical information includes the by a reader, and errors in the data such as omissions, sys- disclosure risk and approaches to minimize or eliminate tematic bias, or inaccuracies due to privacy constraints. the risk. The experts’ suggestions are a summary of recom- Furthermore, anonymized datasets may be disclosed as mendations or guidelines regarding confidentiality issues long as data are protected and follow the recommendations that arise from the collection, use, or dissemination of per- below. There are different reasons behind an institution’s or sonal data. A chronological classification of our recommen- research group’s decision to share their data. A research dations involves first these that should take place before the group may wish to make their collected data publicly avail- initiation of the survey (presurvey), second these that ensure able to increase visibility of their work, and allow other the safety and security of the research environment, next as researchers to use them which will in turn make scientific soon as data are collected (processing), and finally after comparisons possible. On the contrary, releasing data may data are processed. Some recommendations are applicable be a compromise against the will to publish in a scientific to all research projects (e.g., ensuring safe settings or the journal that has a data policy which requires research data privacy protection of research outputs). However, recom- to be publicly available (PLOS ONE, 2014). In such cases, mendations regarding the disclosure of anonymized datas- a document should be attached to the released datasets that ets and sharing restricted-access data with third parties are contains information on the disclosure risk, protection applicable only if the data controller opts for these prac- method, masked data quality, and contact information on tices. Our set of recommendations can act as a general privacy matters. In addition, practices that increase the dis- guideline for research campaigns that want to use participa- closure risk such as the release of multiple versions of ano- tory sensing data by enlisting the steps of the campaign nymized datasets or disclosure of anonymization meta-data where privacy actions should be taken. Some of our recom- should be avoided. mendations, such as anonymization and dissemination of It is also possible that collected spatiotemporal participa- findings, can also be applicable to other types of spatial tory data are shared with other institutions or researchers. data. However, privacy restrictions that may be specific to Data sharing should be one of the many privacy insertions other types of data and the bodies that share them are not of the confidentiality statement within the participation discussed here. agreement. The institution is responsible for preparing a An important prerequisite of any research project that licensing agreement for such purposes regardless of the involves spatiotemporal participatory data is that the mem- data nature (i.e., anonymized data or restricted-access data). bers of the project are either trained or experts in location For restricted-access data, a separate DSA should be pre- privacy threats. The training should take place at an early pared, or a respective section within the licensing agree- stage of the research campaign to guarantee success in the ment should be inserted. Recommendations 17 to 23 of next two tasks: The first task is to prepare the research plan Table 7 are intended mainly for restricted-access data. It is and the participation agreement. If the data collector decides advisable that the institution performs checks on the credi- to share sensitive data with third parties, criteria for sharing bility and capability of the requestor to handle sensitive per- restricted-access data (i.e., identifier-free survey data) sonal data such as investigating the requestor’s research should be included in the research plan. Both the research personnel, settings, and identity. The controller and the plan and the participation agreement should be comprehen- requestor should decide together about the data that are sive regarding the privacy insertions to ensure a successful needed, the length of period that will be kept by the institutional approval of the survey. The second task is to requestor, and examine potential linkage-disclosure impli- ensure that the research environment establishes secure cations if original data are to be used with other datasets. measures to prevent privacy and confidentiality breaches of Regarding research outputs, the controller should have the collected and stored data. right to review the presentation as well as the final publica- The processing tasks start as soon as survey data are col- tion deliverables to ensure that anonymity is preserved. Last lected. First, data should be safely stored, the devices should but not least, the privacy manager should maintain an be cleaned from any stored data, and identifiers should be 218 Journal of Empirical Research on Human Research Ethics 13(3) removed from the datasets to be analyzed. These basic yet Best Practices critical steps are frequently neglected during the processing Best practices are discussed in detail in the recommenda- of survey data. The removal of direct identifiers is a prereq- tions sections: Presurvey Activities, Security and Safety, uisite to deidentify the data, but if quasi-identifiers and Processing and Analyzing Collected Data, and Disclosure pseudonyms are to be included an anonymization approach Prevention. The most critical practices are summarized in should be employed as well. As a general principle, the the “Conclusion” section. analysis to be performed should be the guide for selecting an anonymization technique to minimize the effect of masked data to the accuracy of spatial analysis (e.g., clus- Research Agenda tering, point pattern, multivariate, etc.). Anonymization and disclosure risk evaluation are important Then, the research team should calculate the anonymiza- tasks of a privacy-preserving research campaign that require tion effect of the masked data on spatial analysis, evaluate further empirical research. Regarding the anonymization, the remaining disclosure risk, and classify all stored datas- we emphasized on approaches that are heavily discussed in ets as “anonymised” or “restricted-access” datasets. the geoprivacy literature, but we do not claim that this is a Regarding the anonymization effect, we suggested mea- comprehensive list. Additional methods should be explored, sures to evaluate the error or information loss of the masked especially in situations when the anonymization needs to be data in spatial analyses. We focused on measures that quan- tailored to the specifications of a research campaign and tify the magnitude of the effect and, whenever possible, collected survey data. We discussed how the qualities of have been used in the geoprivacy literature because their participatory sensing data urge for a fusion of anonymiza- usage in future studies would allow comparison of results. tion methods that consider both k-anonymity and l-diver- The last set of recommendations refer to the tasks after sity. Currently, there is a lack of methods specified to these data are processed. First, the members of the research cam- qualities that can successfully prevent all types of paign should examine the disclosure risk of their research disclosure. outputs, such as maps in scientific journals, and apply a pro- Also, there has been limited discussion on the evaluation tection approach if private locations of measurements are to or quantification of the disclosure risk. Some of the scholars be published. Second, to ensure ethical conduct of research who developed anonymization methods have either quanti- we suggest reporting generally on the employed privacy fied the disclosure risk with formulas that are typically spec- protection practices of outputs or anonymized data as well ified to their method, or developed a method conditioned as adding disclaimers. Third, careful consideration should that it preserves an estimated anonymity (Allshouse et al., be taken while releasing and reporting on anonymized data- 2010; Beresford & Stajano, 2004; Croft et al., 2016; Paiva sets so as not to provide disclosure hints to a potential pri- et al., 2014; Wieland, Cassa, Mandl, & Berger, 2008; You vacy attacker. Fourth, the privacy manager should prepare et al., 2007; Zhang, Freundschuh, Lenzer, & Zandbergen, licensing and DSAs, and maintain a data inventory of all 2017). However, the results or conclusions of these studies published or shared datasets. Last, the controller must should not be generalized because the characteristics of a investigate the appropriateness of the requestor’s environ- study area or available linked datasets and background infor- ment and personnel to handle sensitive data, and he or she mation of the original dataset can vary. Therefore, a different should have an active role regarding the privacy-preserva- approach to evaluate the disclosure risk may be needed. tion practices of the requestor’s research plan. Furthermore, not all anonymization methods were assessed This set of recommendations establishes ethical scien- regarding the disclosure risk they entail. However, there are tific practices and ensures sufficient privacy protection some studies who looked at the disclosure risk of original which are crucial elements so as to engage people to con- data. For instance, Alrayes and Abdelmoty (2014) examined tribute actively being “human data sources.” This is neces- aspects of potential personal information that may be derived sary to leverage collective information in areas such as from LBSN data. The estimated potential problems are not environmental monitoring, urban planning, security and verified by means of actual disclosure due to the fact that quality of life, emergency management, traffic monitoring, LBSN validation data are hard to obtain (e.g., real private or e-tourism. Nonetheless, the willingness to voluntarily locations or identity of users). De Montjoye, Hidalgo, share personal data is linked with the trust in the security of the data. To make an informed decision on the data’s secu- Verleysen, and Blondel (2013) analyzed mobile phone data, rity, participants need to be aware of the potential misuses, and found that four randomly chosen points are enough to countermeasures, and their efficiency. Yet, privacy-related uniquely characterize 95% of heavy users drawn from a ran- terms, conditions, and technology are mostly hardly under- dom sample. Nevertheless, the extent to which these four standable to nonexperts. Therefore, more simple and bind- locations can lead to a successful inferential or attribute dis- ing ways of communicating this kind of information have to closure of the users’ personal information (e.g., identity or be found. household location of a user) remains unexplored. Thus, the Kounadi and Resch 219 evaluation of the disclosure risk is still a topic that needs to Barcena, M. B., Wueest, C., & Lau, H. (2014). How safe is your quantified self. Mountain View, CA: Symantec. be examined in depth with empirical studies that involve Beresford, A., & Stajano, F. (2003). Location privacy in pervasive validation data. computing. IEEE Pervasive Computing, 2(1), 46-55. Beresford, A., & Stajano, F. (2004). Mix zones: User privacy Educational Implications in location-aware services. In Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Current research in location privacy has revealed that research- Communications Workshops, Orlando, FL, USA (pp. 127-131). ers that use spatial data do not always employ adequate pri- Bergner, B., Zeile, P., Papastefanou, G., & Rech, W. (2011). vacy-preserving practices. This can be partially attributed to Emotional barrier GIS as a new tool for Identification the lack of scientific expertise and technological background. and optimization of urban space barriers. Angewandte To eliminate future practices that may compromise individual Geoinformatik, 430-439. privacy, every research campaign that collects participatory Boulos, M. N. K., Curtis, A. J., & AbdelMalik, P. (2009). Musings sensing data should assign a privacy manager. The privacy on privacy issues in health research involving disaggregate manager must be trained in the following areas: geographic data about individuals. International Journal of Health Geographics, 8, Article 46. Brownstein, J. S., Cassa, C. A., Kohane, I. S., & Mandl, K. D. •• Anonymization techniques and location protection (2006a). An unsupervised classification method for infer- methods ring original case locations from low-resolution disease •• Estimation of the disclosure risk maps. International Journal of Health Geographics, 5(1), •• Analytical methods of participatory sensing data Article 56. Brownstein, J. S., Cassa, C. A., & Mandl, K. D. (2006b). No place Authors’ Note to hide—Reverse identification of patients from published Ourania Kounadi is now affiliated to University of Twente, maps. New England Journal of Medicine, 355, 1741-1742. Faculty of Geo-Information Science and Earth Observation (ITC), Bruns, J., & Simko, V. (2017, July). Stable hotspot analysis for Department of Geo-information Processing (https://people. intra-urban heat islands. Paper presented at the GI_Forum, utwente.nl/o.kounadi) Salzburg, Austria. Buttyán, L., Holczer, T., & Vajda, I. (2007). On the effective- Declaration of Conflicting Interests ness of changing pseudonyms to provide location privacy in VANETs. Security and Privacy in Ad-hoc and Sensor The author(s) declared no potential conflicts of interest with respect Networks, 4572, 129-141. to the research, authorship, and/or publication of this article. Cassa, C. A., Grannis, S. J., Overhage, J. M., & Mandl, K. D. (2006). A context-sensitive approach to anonymizing spatial Funding surveillance data: Impact on outbreak detection. Journal of The author(s) disclosed receipt of the following financial support the American Medical Informatics Association, 13, 160-165. for the research, authorship, and/or publication of this article: This doi:10.1197/Jamia.M1920 research is funded by the Austrian Science Fund (FWF) for the Cassa, C. A., Wieland, S. C., & Mandl, K. D. (2008). project Urban Emotions—development of methods for production Re-identification of home addresses from spatial locations of contextual emotion information in spatial planning with the anonymized by Gaussian skew. International Journal of help of human sensory assessment and crowdsourcing technolo- Health Geographics, 7, Article 45. gies in social networks. Project Number I 3022N33. CDC-CSTE. (2005). CDC-ATSDR data release guidelines and procedures for re-release of state-provided data. Retrieved References from https://stacks.cdc.gov/view/cdc/7563 Adrienko, N., & Adrienko, G. (2011). Spatial generalization and Chainey, S., Tompson, L., & Uhlig, S. (2008). The utility of aggregation of massive movement data. IEEE Transactions hotspot mapping for predicting spatial patterns of crime. on Visualization and Computer Graphics, 17, 205-219. Security Journal, 21(1-2), 4-28. Allshouse, W. B., Fitch, M. K., Hampton, K. H., Gesink, D. C., Cheng R., Zhang Y., Bertino E., Prabhakar S. (2006) Preserving Doherty, I. A., Leone, P. A., . . . Miller, W. C. (2010). Geomasking User Location Privacy in Mobile Data Management sensitive health data and privacy protection: An evaluation using Infrastructures. In: Danezis G., Golle P. (eds) Privacy an E911 database. Geocarto International, 25, 443-452. Enhancing Technologies. PET 2006. Lecture Notes in Alrayes, F., & Abdelmoty, A. (2014). No place to hide: A study of Computer Science, 4258. Springer, Berlin, Heidelberg. privacy concerns due to location sharing on geo-social networks. Christin, D., Reinhardt, A., Kanhere, S. S., & Hollick, M. (2011). International Journal on Advances in Security, 7(3/4), 62-75. A survey on privacy in mobile participatory sensing applica- Andresen, M. A. (2009). Testing for similarity in area-based spa- tions. Journal of Systems and Software, 84, 1928-1946. tial patterns: A nonparametric Monte Carlo approach. Applied Cox, L. H. (1996). Protecting confidentiality in small population Geography, 29, 333-345. health and environmental statistics. Statistics in Medicine, 15, Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). 1895-1905. Geographically masking health data to preserve confidential- Croft, W. L., Shi, W., Sack, J.-R., & Corriveau, J.-P. (2016). ity. Statistics in Medicine, 18, 497-525. Location-based anonymization: Comparison and evaluation of 220 Journal of Empirical Research on Human Research Ethics 13(3) the Voronoi-based aggregation system. International Journal temporal unit problem on crime hot spot stability (Doctoral of Geographical Information Science, 30, 1-23. dissertation, American University). Retrieved from https:// Croft, W. L., Shi, W., Sack, J.-R., & Corriveau, J.-P. (2017). www.ncjrs.gov/App/Publications/abstract.aspx?ID=272536 Comparison of approaches of geographic partitioning for data Information Commissioner’s Office. (2012). Crime-mapping and anonymization. Journal of Geographical Systems, 19, 1-28. geo-spatial crime data: Privacy and transparency principles. Cuellar, J. R. (2004). Geopriv requirements (Internet Draft, Nov. Retrieved from https://ico.org.uk/media/for-organisations/ 2002). Retrieved from https://tools.ietf.org/html/draft-ietf- documents/1543/crime_mapping.pdf geopriv-dhcp-lbyr-uri-option-03.html International Atomic Energy Agency. (2015). Incident and De Cristofaro, E., & Soriente, C. (2011). Short paper: PEPSI— Trafficking Database (ITDB). Retrieved from http://www-ns. privacy-enhanced participatory sensing infrastructure. In iaea.org/security/itdb.asp Proceedings of the fourth ACM Conference on Wireless Kalnis, P., Ghinita, G., Mouratidis, K., & Papadias, D. (2007). Network Security, Hamburg, Germany, 14–17 June 2011, pp. Preventing location-based identity inference in anonymous 71-78. ACM Press. spatial queries. IEEE Transactions on Knowledge and Data De Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M., & Blondel, Engineering, 19, 1719-1733. V. D. (2013). Unique in the crowd: The privacy bounds of Kanjo, E., Bacon, J., Roberts, D., & Landshoff, P. (2009). human mobility. Scientific Reports, 3, Article 1376. MobSens: Making smart phones smarter. IEEE Pervasive Denning, T., Andrew, A., Chaudhri, R., Hartung, C., Lester, J., Computing, 8(4), 50-57. Borriello, G., & Duncan, G. (2009). BALANCE: Towards a Kido, H., Yanagisawa, Y., & Satoh, T. (2005). An anonymous usable pervasive wellness application with accurate activity communication technique using dummies for location-based inference. In Proceedings of the 10th workshop on Mobile services. In Proceedings International Conference on the Computing Systems and Applications, Santa Cruz, CA, 23–24 Pervasive Services (ICPS’05), 11–14 July 2005, pp. 461-464. February 2009, pp. 1-6. ACM. Santorini, Greece, IEEE. De Wolf, V. A. (2003). Issues in accessing and sharing confidential Kounadi, O., Lampoltshammer, T. J., Leitner, M., & Heistracher, survey and social science data. Data Science Journal, 2, 66-74. T. (2013). Accuracy and privacy aspects in free online Duncan, G. T., & Pearson, R. W. (1991). Enhancing access to reverse geocoding services. Cartography and Geographic microdata while protecting confidentiality: Prospects for the Information Science, 40, 140-153. future. Statistical Science, 6, 219-232. Kounadi, O., & Leitner, M. (2014). Why does geoprivacy mat- Federal Committee on Statistical Methodology. (2005). Report on ter? The scientific publication of confidential data presented statistical disclosure limitation methodology. Retrieved from on maps. Journal of Empirical Research on Human Research https://www.hhs.gov/sites/default/files/spwp22.pdf Ethics, 9, 34-45. Graham, C. (2012). Anonymisation: Managing data protection Kounadi, O., & Leitner, M. (2015). Spatial information diver- risk code of practice. Information Commissioner’s Office. gence: Using global and local indices to compare geographi- Retrieved from https://ico.org.uk/media/1061/anonymisation- cal masks applied to crime data. Transactions in GIS, 19, code.pdf 737-757. doi:10.1111/tgis.12125 Gruteser, M., & Grunwald, D. (2003). Anonymous usage of Kounadi, O., & Leitner, M. (2016). Adaptive areal elimination location-based services through spatial and temporal cloaking. (AAE): A transparent way of disclosing protected spatial In Proceedings of the 1st International Conference on Mobile datasets. Computers, Environment and Urban Systems, 56, Systems, Applications and Services, San Francisco, CA, 5–8 59-67. doi:10.1016/j.compenvurbsys.2016.01.004 May 2003, pp. 273-286. Krumm, J. (2007). Inference attacks on location tracks. In A. Gutmann, M. P., & Stern, P. C. (Ed.). (2007). Putting people on LaMarca, M. Langheinrich, & K. Truong (Eds.), Pervasive the map: Protecting confidentiality with linked social-spatial computing (Vol. 4480, pp. 127-143). Berlin, Germany: data. Washington, DC: The National Academies Press. Springer. Haley, D. F., Matthews, S. A., Cooper, H. L., Haardörfer, R., Kwan, M. P., Casas, I., & Schmitz, B. C. (2004). Protection of geo- Adimora, A. A., Wingood, G. M., & Kramer, M. R. (2016). privacy and accuracy of spatial information: How effective Confidentiality considerations for use of social-spatial data are geographical masks? Cartographica: The International on the social determinants of health: Sexual and reproductive Journal for Geographic Information and Geovisualization, health case study. Social Science & Medicine, 166, 49-56. 39(2), 15-28. Hall, W. R. (2016). Human Subjects Protection Program (HSPP): Leitner, M., & Curtis, A. (2004). Cartographic guidelines for geo- Policies and procedures. Office for the Protection of Research graphically masking the locations of confidential point data. Subjects, Health Sciences Institutional Review Board, Cartographic Perspectives, 49, 22-39. University Park Institutional Review Board, University of Leitner, M., Mills, J. W., & Curtis, A. (2007). Can novices to Southern California. Retrieved from https://oprs.usc.edu/hspp/ geospatial technology compromise spatial confidentially? Hampton, K. H., Fitch, M. K., Allshouse, W. B., Doherty, I. A., Kartographische Nachrichten(“cartographic News”), 57, Gesink, D. C., Leone, P. A., . . . Miller, W. C. (2010). Mapping 78-84. health data: Improved privacy protection with donut method Machanavajjhala, A., Kifer, D., Gehrke, J., & Venkitasubramaniam, geomasking. American Journal of Epidemiology, 172, 1062- M. (2007). l-diversity: Privacy beyond k-anonymity. ACM 1069. doi:10.1093/Aje/Kwq248 Transactions on Knowledge Discovery from Data, 1, 1-12. Hunt, J. M. (2016). Do crime hot spots move? Exploring the MacKerron, G., & Mourato, S. (2013). Happiness is greater in natu- effects of the modifiable areal unit problem and modifiable ral environments. Global Environmental Change, 23, 992-1000. Kounadi and Resch 221 Maisonneuve N., Stevens M., Niessen M.E., Steels L. (2009) Solymosi, R., Bowers, K., & Fujiyama, T. (2015). Mapping fear of NoiseTube: Measuring and mapping noise pollution with crime as a context-dependent everyday experience that varies mobile phones. In: Athanasiadis I.N., Rizzoli A.E., Mitkas in space and time. Legal and Criminological Psychology, 20, P.A., Gómez J.M. (eds) Information Technologies in 193-211. Environmental Engineering. Environmental Science and The Stationery Office. (1998). Data Protection Act. Retrieved from Engineering (pp. 53-65). Berlin, Heidelberg: Springer. http://www.legislation.gov.uk/ukpga/1998/29/contents Monreale, A., Andrienko, G., Andrienko, N., Giannotti, F., Stuntebeck, E. P., Davis, I. I. J. S., Abowd, G. D., & Blount, M. Pedreschi, D., Rinzivillo, S., & Wrobel, S. (2010). Movement (2008). HealthSense: Classification of health-related sensor data anonymity through generalization. Transactions on Data data through user-assisted machine learning. In Proceedings Privacy, 3, 91-121. of the 9th Workshop on Mobile Computing Systems and Murad, A., Hilton, B., Horan, T., & Tangenberg, J. (2014). Protecting Applications, Napa, CA, 25–26 February 2008, pp. 6–10. patient geo-privacy via a triangular displacement geo-masking New York: ACM. method. In Proceedings of the 1st ACM SIGSPATIAL Tompson, L., Johnson, S., Ashby, M., Perkins, C., & Edwards, P. International Workshop on Privacy in Geographic Information (2015). UK open source crime data: Accuracy and possibili- Collection and Analysis, Dallas/Fort Worth, TX, 4–7 November ties for research. Cartography and Geographic Information 2014, pp. 1–4. New York, NY: ACM. Science, 42, 97-111. Olson, K. L., Grannis, S. J., & Mandl, K. D. (2006). Privacy Törnros, T., Dorn, H., Reichert, M., Ebner-Priemer, U., Salize, protection versus cluster detection in spatial epidemiol- H., Tost, H., Meyer-Lindenberg, A., & Zipf, A. (2016). A ogy. American Journal of Public Health, 96, 2002-2008. comparison of temporal and location-based sampling strate- doi:10.2105/Ajph.2005.069526 gies for global positioning system-triggered electronic dia- Openshaw, S., & Openshaw, S. (1984). The modifiable areal unit ries. Geospatial Health, 11(3). Retrieved from https://doi. problem. Norwich: Geo Abstracts University of East Anglia. org/10.4081/gh.2016.473 Paiva, T., Chakraborty, A., Reiter, J., & Gelfand, A. (2014). U.S. Government Publishing Office. (2009). 45 CFR 164.514— Imputation of confidential data sets with spatial locations using Other requirements relating to uses and disclosures of pro- disease mapping models. Statistics in Medicine, 33, 1928-1945. tected health information. Available from https://www. PLOS ONE. (2014). PLOS’ new data policy: Public access to gpo.gov/fdsys/pkg/CFR-2009-title45-vol1/xml/CFR-2009- data. Retreived from http://blogs.plos.org/everyone////plos- title45-vol1-sec164-514.xml new-data-policy-public-access-data-/201402242 Waldo, J., Herbert, S., & Lin Millett, L. I. (2007). Engaging pri- Post, R. C. (2001). Three concepts of privacy. Georgetown Law vacy and information technology in a digital age. Washington, Journal, 89, Article 2087. DC: The National Academies Press. Resch, B. (2013, October). People as sensors and collective Wang, H., & Reiter, J. P. (2012). Multiple imputation for sharing sensing-contextual observations complementing geo-sensor precise geographies in public use data. The Annals of Applied network measurements. In Krisp J (ed.) Progress in location- Statistics, 6(1), 229-252. based services (pp. 391-406). Berlin, Germany: Springer. Wang, X. O., Cheng, W., Mohapatra, P., & Abdelzaher, T. (2013). Resch, B., Summa, A., Sagl, G., Zeile, P., & Exner, J.-P. (2015, Artsense: Anonymous reputation and trust in participatory November). Urban emotions—Geo-semantic emotion extrac- sensing. In The Proceedings of IEEE INFOCOM 2013, Turin, tion from technical sensors, human sensors and crowdsourced Italy, 14–19 April 2013, pp. 2652–2660. IEEE. data. In Georg G., & Haoshen H. (eds) Progress in location- Wartell, J., & McEwen, J. T. (2001). Privacy in the information based services 2014 (pp. 199-212). Vienna, Austria: Springer age: A guide for sharing crime maps and spatial data series: International Publishing. Research report. Crime Mapping Research Center, National Rodrigues da Silva, A. N., Zeile, P., de Oliveira Aguiar, F., Institute of Justice, Retrieved from https://www.ncjrs.gov/ Papastefanou, G., & Bergner, B. S. (2014). Smart Sensoring pdffiles1/nij/grants/188739.pdf as a planning support tool for barrier free planning: Project Westin, A. F. (1968). Privacy and freedom. Washington and Lee outcomes and recent developments. In N. N. Pinto, J. A. Law Review, 25(1), Article 20. Tenedório, A. P. Antunes, & J. R. Cladera (eds). Technologies Wieland, S. C., Cassa, C. A., Mandl, K. D., & Berger, B. (2008). for Urban and Spatial Planning: Virtual Cities and Territories Revealing the spatial distribution of a disease while pre- (pp. 1-16), Hershey, PA: IGI Global. serving privacy. Proceedings of the National Academy Samarati, P., & Sweeney, L. (1998). Protecting privacy when of Sciences of the United States of America, 105, 17608- disclosing information: k-anonymity and its enforcement 17613. through generalization and suppression: Technical report. You, T. H., Peng, W. C., & Lee, W. C. (2007). Protecting moving SRI International, Retreived from http://citeseerx.ist.psu.edu/ trajectories with dummies. In The International Conference viewdoc/summary?doi=10.1.1.37.5829 on Mobile Data Management, Washington, DC, 1 May 2007, Seidl, D. E., Paulus, G., Jankowski, P., & Regenfelder, M. (2015). pp. 198–205. IEEE Computer Society. Spatial obfuscation methods for privacy protection of house- Zandbergen, P. A. (2014). Ensuring confidentiality of geocoded hold-level data. Applied Geography, 63, 253-263. health data: Assessing geographic masking strategies for indi- Shin, M., Cornelius, C., Peebles, D., Kapadia, A., Kotz, D., vidual-level data. Advances in Medicine, 2014(2014), Article & Triandopoulos, N. (2011). AnonySense: A system for 567049. anonymous opportunistic sensing. Pervasive and Mobile Zeile, P., Höffken, S., & Papastefanou, G. (2009). Mapping Computing, 7(1), 16-30. people?–The measurement of physiological data in city areas 222 Journal of Empirical Research on Human Research Ethics 13(3) and the potential benefit for urban planning, In Proceedings Author Biographies REAL CORP 2009, Catalonia, Spain, 22–25 April 2009. Ourania Kounadi is a postdoc researcher at the Department of Zeile, P., Memmel, M., & Exner, J.-P. (2012). A new urban Geoinformatics, University of Salzburg. Her main research interests sensing and monitoring approach: Tagging the city with include geoprivacy and spatial confidentiality, urban emotions in the RADAR SENSING app. In Schrenk, M., Popovich, V., spatial planning, fear of crime, and spatial crime analysis. She con- Engelke, D., Elisei, P. (eds) The Proceedings REAL CORP ceived and designed the study. She analyzed the technical aspects of 2012 (pp. 1397–1409). Schwechat, Austria: CORP. geoprivacy, developed the lists of recommendations, and wrote the Zeile, P., Resch, B., Loidl, M., Petutschnig, A., & Dörrzapf, L. paper. (2016). Urban emotions and cycling experience–Enriching Bernd Resch is an assistant professor at the Department of traffic planning for cyclists with human sensor data. Gi_Forum, Geoinformatics, University of Salzburg, and a visiting scholar at 2016(1), 204-216. Harvard University. His main research interests include human and Zhang, S., Freundschuh, S. M., Lenzer, K., & Zandbergen, P. technical sensors, collective sensing, self-learning systems in A. (2017). The location swapping method for geomasking. GIScience, and real-time and smart cities. He contributed to the Cartography and Geographic Information Science, 44(1), 22-34. conception of the study and critically reviewed the manuscript. He Zimmerman, D. L., & Pavlik, C. (2008). Quantifying the addressed key points regarding the particulars of participatory effects of mask metadata disclosure and multiple releases sensing data and the remaining open research questions of on the confidentiality of geographically masked health geoprivacy. data. Geographical Analysis, 40(1), 52-76.

Journal

Journal of Empirical Research on Human Research EthicsSAGE

Published: Apr 23, 2018

Keywords: geoprivacy by design; location privacy; spatiotemporal data; mobile participatory sensors; disclosure risk; anonymization methods; research design; spatial analysis

There are no references for this article.