Rejoinder to the Discussion by Steven Heeringa

We are very pleased to respond to the discussion by Dr. Heeringa, a senior, well-known, and internationally respected expert in the field of probability sampling and statistical methods for survey studies. Dr. Heeringa summarizes the strengths of our method based on the four criteria for evaluating random sampling methods; he also points out the weaknesses of our method and provides alternative approaches to improve it. We appreciate the comprehensive review, thoughtful comments and critiques, and constructive recommendations. In reply, we further expand our GIS/GPS-assisted sampling method by adding the recommended probability segment sampling as an alternative to the random walk method in stage two sampling. We believe that these two methods each have their own strengths and weaknesses. Investigators can decide which method to use based on the needs of their research and the resources they can command.

The four criteria Dr. Heeringa mentioned are very useful for evaluating a probability sampling method. They can be summarized as: 1) maximizing the study population coverage, 2) controlling the true inclusion probability for all sample cases, 3) optimizing the average size of primary sampling unit (PSU)-level clusters (or “geounits” in our method) and minimizing cluster size variations across all PSUs, and 4) minimizing noninformative variations in the final inclusion probabilities and the corresponding weights for analysis.
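One common way to gauge criterion 4 is Kish's approximate design effect due to unequal weighting, n·Σw²/(Σw)², which equals 1 when all weights are equal and grows as the weights become more variable. The sketch below is a minimal illustration only: the function names and the example weights are assumptions made here, not quantities taken from our study.

```python
def kish_weighting_effect(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    if n == 0:
        raise ValueError("weights must not be empty")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    sum_of_squares = sum(w * w for w in weights)
    return n * sum_of_squares / (total * total)


def effective_sample_size(weights):
    """Nominal sample size divided by the weighting effect."""
    return len(weights) / kish_weighting_effect(weights)


# Hypothetical analysis weights for four respondents (illustrative values only).
example_weights = [1.0, 1.0, 1.0, 3.0]
effect = kish_weighting_effect(example_weights)
n_eff = effective_sample_size(example_weights)
```

A value well above 1 indicates that weight variation is costing precision, which is the situation that finer stratification and better allocation of geounits aim to reduce.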
With our method, investigators 1) have the potential to maximize population coverage for various types of study populations without depending on any hard-to-access data; 2) have the option to utilize simple or stratified sampling methods to select individual geographic areas (PSUs) with equal or unequal probabilities; 3) can optimize the allocation of PSUs (clusters) by stratum to deal with cost constraints for fieldwork; and 4) can use a random-digit method to select individual participants from the secondary sampling frame, which is constructed from a complete enumeration of the selected households on a sampled geounit.

According to Dr. Heeringa, a weakness of our method stems from the random walk method for household selection (Bauer 2016). Several types and sources of error can be introduced by using this method. First, it is not possible for investigators to control the probability of selecting individual households. This method depends on natural landmarks (e.g., the main entrance of a street or the first household in a sampled segment of a street) to determine the starting point for household selection. Second, errors are unavoidable in implementing the random walk method despite detailed guidance and rules for data collection staff to use in the field. This is because all geographic units are randomly selected, and household arrangements in some of the selected geounits can be very complex in both rural and urban settings. Third, the actual size (Ag) of a selected geographic area has to be determined post hoc (i.e., using data collected after the households/individual participants are sampled). Further, the sampled residential area (Ag) has to be estimated by on-site assessment of the geographic area from which the households/participants are sampled, and the assessment has to be completed by data collection staff using a GPS receiver.
Errors can be introduced during these steps, including subjective selection of the geographic boundaries, errors due to the limited precision of GPS for public use (up to 10 meters), and differences between data collectors.

Dr. Heeringa proposes to improve our GIS/GPS sampling method by replacing the random walk method with probabilistic geographic-segment sampling, a well-established spatial sampling method (Wang, Stein, Gao, and Yong 2012) that has also been successfully used in survey studies (Heeringa and O’Muircheartaigh 2010; Heeringa and Zinel 2012; Singh and Clark 2013). Under this method, a geographic unit (as PSU) selected randomly using the area probability sampling method in stage one can be divided into small and mutually exclusive segments, with boundaries determined by the field data collection staff based on satellite images, paper maps, or knowledge from fieldwork. After the segmentation of a selected geounit is complete, a random number is assigned to each segment. The segments are then ordered according to the assigned random numbers. With this information, there is no need for a data collector to locate a natural landmark (e.g., the entrance of a street) for the first household to start recruitment; he or she simply follows the order determined by the random numbers to select households and recruit participants until the predetermined sample size is reached. In figure 1, we have called the random geographic-segment selection recommended by Dr. Heeringa “random ordering” and added it as an option for stage two sampling of our method.

Figure 1. GIS-GPS-Assisted Probability Sampling by Connecting Geographic Area with Households.

Despite many strengths of the random ordering method, it possesses several limitations.
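Before turning to those limitations, the random ordering step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a field protocol: the function names, the segment and household identifiers, and the rule that households within a segment are visited in their listed order are all hypothetical choices made for the example.

```python
import random


def random_order_segments(segment_ids, seed=None):
    """Assign one random number to each segment and sort segments by that number."""
    rng = random.Random(seed)  # a fixed seed makes the ordering reproducible
    draws = {seg: rng.random() for seg in segment_ids}
    return sorted(segment_ids, key=lambda seg: draws[seg])


def recruit_in_order(ordered_segments, households_by_segment, target_size):
    """Visit segments in the given order, taking households until the target is reached."""
    selected = []
    for seg in ordered_segments:
        # Assumption: households within a segment are taken in their listed order.
        for household in households_by_segment.get(seg, []):
            selected.append(household)
            if len(selected) >= target_size:
                return selected
    return selected  # fewer households than the target were available


# Hypothetical geounit divided into three mutually exclusive segments.
segments = ["s1", "s2", "s3"]
households = {"s1": ["a", "b"], "s2": ["c"], "s3": ["d", "e", "f"]}

order = random_order_segments(segments, seed=1)
sample = recruit_in_order(order, households, target_size=4)
```

Because the order is fixed by the random numbers before fieldwork begins, the data collector has no discretion over where recruitment starts.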
First, it may be challenging to implement the method without subjective judgement. Detailed segmentation rules must be established regardless of whether the method is implemented using satellite images, paper maps, or knowledge from fieldwork. If the established rules fail to cover even one type of household arrangement, an investigator must use judgement on how to segment a geounit, creating chances to introduce errors. A second challenge is determining the ratio of residential to nonresidential area for individual segments. Although randomly sampling segments within a geounit provides quality data to assess the probability of the total area being sampled, data are still needed on the ratio of residential area to nonresidential area. Relative to the total geographic area, it will be very hard to determine the ratio for a specific segment, and extra noninformative variation will be introduced if either an average value is used in place of segment-specific residential/nonresidential ratios or a specific value is estimated using the post hoc method we proposed, with a GPS receiver, after completion of sampling and data collection.

Given the strengths and limitations of both the random walk and the random ordering methods, we recommend the following: (1) For large-scale survey studies targeting urban populations with complex household arrangements, with adequate funding, the required expertise in probability area sampling, and experienced personnel for field data collection, random ordering is the first choice. (2) For studies targeting rural or suburban areas with relatively simple household arrangements and limited resources, the random walk method is the first choice.

Regarding the large variation in the sample weights of the Wuhan study pointed out by Dr. Heeringa, we attribute it largely to the extremely high heterogeneity of the population density of rural-to-urban migrants living in urban settings; the high coverage of our sampling method leads to large variation in the sample weights. We agree that this variation can be reduced by more detailed stratification of the geographic areas and further optimization of the allocation of geounits by stratum. Additional research is needed to verify this conclusion.

Looking ahead, rapid development in deep learning and the increasing availability of geospatial big data may enable us to further improve our method. A number of leading research groups are exploring the utility of high-resolution satellite images in population distribution mapping. For example, researchers in Facebook’s Connectivity Lab used a deep convolutional neural network to automatically identify dwelling units (Connectivity Lab 2018). We are planning to adapt the related technologies into our method to increase work efficiency and minimize noninformative sampling errors.

In conclusion, the discussion by Dr. Heeringa provides a great summary of our method and strengthens it by adding the random ordering method as an alternative to the random walk method for household/participant sampling. Capitalizing on the recent rapid development in big data, we will continue our research to further improve GIS/GPS-assisted probability sampling methods, promoting survey studies.

References

Bauer J. (2016), “Biases in Random Route Surveys,” Journal of Survey Statistics and Methodology, 4, 263–287.

Connectivity Lab (2018), “Connecting the World with Better Maps: Data-Assisted Population Distribution Mapping,” Facebook, available at https://fbnewsroomus.files.wordpress.com/2016/02/population_density_final_mj2_ym_tt2113.pdf. Accessed: January 19, 2018.

Heeringa S., O’Muircheartaigh C. (2010), “Sampling Designs for Cross-Cultural and Cross-National Survey Programs,” in Survey Methods in Multinational, Multiregional and Multicultural Contexts, eds. J. A. Harkness, M. Braun, B. Edwards, T. P. Johnson, L. Lyberg, P. Mor, B. Pennel, T. W. Smith, pp. 251–268, New York: John Wiley and Sons.

Heeringa S., Zinel S. (2012), Sample Design and Procedures for Hepatitis B Immunization Surveys: A Companion to the WHO Cluster Survey Manual. Geneva, Switzerland: WHO/ivb/11.12.

Singh G., Clark B. D. (2013), “Creating a Frame: A Spatial Approach to Random Sampling of Immigrant Households in Inner City Johannesburg,” Journal of Refugee Studies, 26, 126–144.

Wang J., Stein A., Gao B., Yong G. (2012), “A Review of Spatial Sampling,” Spatial Statistics, 2, 1–14.

© Crown copyright 2018. This article contains public sector information licensed under the Open Government Licence v3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/). This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).

Publisher
Oxford University Press
Copyright
© Crown copyright 2018. This article contains public sector information licensed under the Open Government Licence v3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
ISSN
2325-0984
eISSN
2325-0992
D.O.I.
10.1093/jssam/smy011

Journal

Journal of Survey Statistics and Methodology, Oxford University Press

Published: Apr 18, 2018
