Evaluating Variance Estimators for Respondent-Driven Sampling

Abstract

Respondent-Driven Sampling (RDS) is a network-based method for sampling hard-to-reach populations that is widely used by public health agencies and researchers worldwide. Estimation of population characteristics from RDS data is challenging because the population network is unobserved, and multiple point and variance estimators have been proposed. Research evaluating these estimators has been limited and largely focused on point estimation; this analysis is the first evaluation of multiple variance estimators currently in use. We evaluated the performance of RDS variance estimators via simulations of RDS on synthetic networked populations constructed from 40 RDS surveys of injection drug users in the United States. In these simulations, average design effects (DEs) were lower and average 95% confidence interval (CI) coverage percentages were higher than suggested in previous work: typical DE range = 1–3; average 95% CI coverage = 93%. However, DE and CI coverage vary across the 40 sets of simulations, suggesting that the characteristics of a given study should be evaluated to assess estimator performance. We also found that simulation results are sensitive to whether sampling is conducted with replacement and to the approach used to create CIs. We conclude that CI coverage rates and DEs are often acceptable but not perfect and that RDS estimates are usually reliable in scenarios where RDS assumptions are met. While RDS estimation performed reasonably well, we found strong evidence that the simple random sample variance estimator and corresponding CIs significantly underestimate variance and should not be used to analyze RDS data.

1. INTRODUCTION

Respondent-driven sampling (RDS) is a network-based method for sampling populations for which a sampling frame is not available (Heckathorn 1997). Information about these "hard-to-reach" or "hidden" populations is critical for public health research with populations at high risk of acquiring HIV infection, including persons who inject drugs (PWID), men who have sex with men, and sex workers. RDS is widely used for public health surveillance of hidden populations by organizations such as the U.S. Centers for Disease Control and Prevention (CDC) (Gallagher, Sullivan, Lansky, and Onorato 2007), the Chinese Center for Disease Control and Prevention (Li, Lu, Cox, Zhao, Xia, et al. 2014), and entities funded through the President's Emergency Plan for AIDS Relief (PEPFAR) (Hladik, Barker, Ssenkusu, Opio, Tappero, et al. 2012). RDS is primarily used to estimate the prevalence of traits such as diseases and risk factors. Design-based unbiased point and variance estimates of such prevalence from survey samples require calculating each participant's probability of being sampled (the "inclusion probability"). Because a sampling frame is not available, hidden population members' inclusion probabilities cannot be calculated using standard approaches. Therefore, statistical inference from samples collected via RDS relies on models that approximate the sampling process and incorporate information about sample members' social networks and information observed during the recruitment process. Multiple evaluations of RDS point estimators and of the effects of violations of RDS assumptions have been conducted, but significantly less work has examined RDS variance estimators (Wejnert 2009; Gile and Handcock 2010; Goel and Salganik 2010; Gile 2011; Tomas and Gile 2011; Lu, Bengtsson, Britton, Camitz, Kim, et al. 2012; McCreesh, Frost, Seeley, Katongole, Tarsh, et al. 2012; Merli, Moody, Smith, Li, Weir, et al. 2015; Verdery, Mouw, Bauldry, and Mucha 2015).
Variance estimates are an essential complement to point estimates. Without good variance estimators, one cannot assess the amount of information contained in a sample and may draw invalid conclusions. In particular, statistical significance tests and confidence intervals will be misleading when the variance is under- (or over-) estimated. Additionally, understanding the variance of RDS point estimates in populations to which RDS is typically applied is important for calculating appropriate sample sizes for future studies. Past research on RDS variance estimation suggested that RDS confidence intervals provide unacceptably low coverage rates and that RDS may have extremely large design effects when applied to hidden populations of public health interest (Goel and Salganik 2010; Lu et al. 2012; Verdery et al. 2015). This study is the first systematic evaluation of the different RDS variance estimators. Our results indicate that confidence interval coverage rates are often acceptable although not perfect, and that design effects are in the range of other complex survey designs.

2. BACKGROUND

RDS begins with researchers choosing a small number (usually five to 10) of "seed" population members. The seeds are interviewed and given a small number of uniquely numbered coupons (usually three to five) with which they can recruit population members they know into the sample. Recruited population members are interviewed and given coupons, and the process is repeated until the target sample size is reached. Participants are remunerated both for completing the survey questionnaire and for each eligible population member they recruit. RDS survey questionnaires and associated biological tests provide data on many characteristics of interest. For the purposes of this article, we represent these variables of interest by a two-valued trait, with values "with trait" and "without trait." Populations sampled via RDS are connected via social network ties; we refer to the set of persons, or "nodes," and the ties connecting them as the "population network." We refer to the number of ties each person has to other members of the population as that person's "degree."

Most RDS point estimators are design-based, including the Salganik-Heckathorn (SH) (Salganik and Heckathorn 2004), Volz-Heckathorn (VH) (Volz and Heckathorn 2008), and Successive Sampling (SS) estimators (Gile 2011). The SH estimator models RDS as a Markov chain on the nodes in the population network; it is based on equating the number of network ties between population groups with different trait statuses (Salganik and Heckathorn 2004; Gile and Handcock 2010). The VH estimator uses the same Markov chain approximation to RDS and applies a modified Hansen-Hurwitz estimator calculated from respondent degrees and the trait statuses of sample members (Volz and Heckathorn 2008). The SS estimator models RDS as sampling population members proportional to degree without replacement; it applies an algorithm to estimate the mapping between a person's degree and their sampling probability and then applies a form of the Horvitz-Thompson estimator (Gile 2011). More details on these estimators are available in the supplementary materials.
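To make the point estimation step concrete, the following is a minimal sketch of the VH computation in base R. It reflects the inverse-degree weighting described above but omits the data handling and diagnostics of a full implementation such as the RDS package; the function name and example values are illustrative only.

```r
# Volz-Heckathorn prevalence estimate: an inverse-degree weighted mean.
# Under the Markov-chain approximation, inclusion probability is taken to
# be proportional to degree, so each respondent is weighted by 1/degree.
# y: 0/1 trait indicators for sample members; d: their reported degrees.
vh_estimate <- function(y, d) {
  stopifnot(length(y) == length(d), all(d > 0))
  w <- 1 / d
  sum(w * y) / sum(w)  # ratio of weighted sums
}

# Example: when trait carriers have higher degrees, they are oversampled
# under degree-proportional sampling, so the VH estimate falls below the
# naive sample mean of 0.4.
y <- c(1, 1, 0, 0, 0)
d <- c(20, 15, 5, 8, 4)
vh_estimate(y, d)  # approximately 0.17
```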
Commonly used RDS variance estimators employ a bootstrap resampling approach that approximates the RDS design (Davison and Hinkley 1997). The variance of the point estimates produced by these bootstrap resamples is computed and used to estimate the RDS variance. Two approximations currently in use are the Salganik Bootstrap ("Sal-BS") (Salganik 2006) and the Successive Sampling Bootstrap ("SS-BS") (Gile 2011). The Sal-BS is typically applied in conjunction with the SH or VH point estimators (Salganik 2006), and the SS-BS is applied in conjunction with the SS point estimator (Gile 2011). We refer to these point and variance estimator pairs as "SH/Sal-BS," "VH/Sal-BS," and "SS/SS-BS," respectively. Sal-BS is based on ordered with-replacement resampling draws from the sample, such that each subsequent node is selected from among the nodes whose recruiters have a trait status matching that of the previous node (Salganik 2006). SS-BS takes a similar approach but respects the without-replacement structure of RDS by adjusting the set of available nodes at each resampling draw based on which nodes have already been sampled (Gile 2011). This analysis evaluates each of these point and variance estimator pairs; for comparison, we also consider the case in which the RDS data are naively treated as a simple random sample (SRS) and the sample mean point estimator is used. We refer to this estimator pair as "Mean/SRS."

The variability of estimators is typically presented as a confidence interval (CI). A properly calibrated method for computing level α CIs produces intervals that capture the true population value of an estimand with probability of at least 1 − α (e.g., an α of 0.05 corresponds to a CI that includes the true population value in 95 percent of samples). CIs can be calculated from bootstrap variance estimates using a number of methods; the percentile and studentized bootstrap CI methods are most commonly used for RDS data (Efron and Tibshirani 1986). The lower and upper bounds of the CI under the percentile bootstrap method are the 100(α/2) and 100(1 − α/2) percentiles, respectively, of the bootstrap resample estimates. In contrast, the studentized bootstrap CI method calculates the standard deviation (SD) of the bootstrap resample estimates and the t-value (t) associated with the sample's degrees of freedom; it then calculates the CI as the point estimate plus or minus t*SD for the upper and lower bounds, respectively. The percentile method can generate CIs that are asymmetric about the point estimate, whereas the studentized method always produces symmetric CIs. The SH/Sal-BS and VH/Sal-BS estimator pairs have traditionally calculated CIs using the percentile method (Salganik 2006), while the SS/SS-BS estimator pair has traditionally used the studentized bootstrap method (Gile 2011). The Mean/SRS estimator pair calculates a CI based on a normal approximation to the sampling distribution.
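The two bootstrap CI constructions differ only in how they summarize the bootstrap replicates. Below is a minimal sketch in base R, assuming a vector boot_est of bootstrap point estimates, an original point estimate est, and a sample size n; the resampling scheme that generates boot_est (Sal-BS or SS-BS) is not shown, and the use of n − 1 degrees of freedom is our assumption rather than a detail from the paper.

```r
# Percentile bootstrap CI: the empirical alpha/2 and 1 - alpha/2 quantiles
# of the bootstrap estimates; may be asymmetric about the point estimate.
percentile_ci <- function(boot_est, alpha = 0.05) {
  quantile(boot_est, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Studentized bootstrap CI as described in the text: point estimate plus
# or minus t * SD of the bootstrap estimates; always symmetric.
studentized_ci <- function(est, boot_est, n, alpha = 0.05) {
  t_val <- qt(1 - alpha / 2, df = n - 1)  # df choice is an assumption here
  est + c(-1, 1) * t_val * sd(boot_est)
}

# Stand-in replicates centered near 0.10, for illustration only.
set.seed(1)
boot_est <- rbeta(1000, 8, 72)
percentile_ci(boot_est)
studentized_ci(est = 0.10, boot_est = boot_est, n = 500)
```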
3. FRAMEWORK FOR ASSESSING RDS

Evaluations of RDS point estimators have been conducted both with real RDS samples from nonhidden populations and with simulated RDS samples, but the accuracy of RDS variance estimators can only be evaluated via simulation. While it is theoretically possible to know the true value of an estimand in a target population and compare point estimates against it, the true variability of an estimator in a real population could only be established by conducting a large number of independent studies in the same population with the same structure, which is practically infeasible.

Evaluating RDS by simulation consists of three steps: (1) obtaining or creating a population network with certain characteristics, (2) simulating RDS on that network, and (3) applying RDS estimators to the trait of interest in the resulting samples. As these procedures are repeated many times, the resulting distribution of simulated estimates approaches the true sampling distribution of the estimators under the simulation conditions. Our primary results evaluate the performance of RDS uncertainty estimation based on the performance of the CIs calculated from different point/variance estimator pairs (e.g., SH/Sal-BS). An estimator pair's CI coverage is the percentage of simulations in which its CIs capture the network's true population value, which is compared to the nominal coverage of 100(1 − α) percent (e.g., a 95 percent CI should capture the true population value in 95 percent of simulations).

In addition to evaluating variance estimators, for comparison with previous research on RDS uncertainty estimation, we consider RDS design effects (DEs), a relative measure of the variability of an estimator calculated from a sample drawn with a complex sampling method (Goel and Salganik 2010; Verdery et al. 2015). We calculate the DE as the ratio of the variance of an estimator under a given sampling design to the hypothetical variance had the sample been collected via SRS of the same size from the same population. A method with a DE of 2 would require a sample size twice as large as that required by SRS to achieve the same variability for the estimate of a given trait. Typical DEs for complex surveys that did not use RDS are between 1.5 and 2, but for some variables in some studies they can reach 5 (Pettersson and Silva 2005; U.S. Census Bureau 2006). Previous research on the variance of RDS estimators has suggested that RDS DEs may be significantly larger than is typical of surveys conducted using complex sampling methods other than RDS (Goel and Salganik 2010; Lu et al. 2012; Verdery et al. 2015). While the DE of a given real-world RDS study is unknown because it cannot be calculated from the data, we can calculate the DEs for our simulations numerically. We refer to these as the "actual DEs" below. In addition to actual RDS DEs, which previous research has also calculated from simulations, the DEs estimated by RDS variance estimators (which can be calculated from a real RDS study's data) are also of interest. We refer to these as "estimated DEs" below. Previous simulation studies that compared the estimated and actual DEs for a given simulation have suggested that RDS variance estimators produce inaccurate estimated DEs (Goel and Salganik 2010; Verdery et al. 2015).
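Both performance measures reduce to simple summaries over simulation runs. A minimal sketch in base R, assuming per-simulation point estimates ests, CI bounds lo and hi, true prevalence p_true, sample size n, and population size N; the finite population correction anticipates the computation described in Section 4, and all names are illustrative.

```r
# CI coverage: the share of simulated intervals containing the true value.
coverage_rate <- function(lo, hi, p_true) {
  mean(lo <= p_true & p_true <= hi)
}

# Actual design effect: variance of the point estimates across simulations
# divided by the SRS variance of a proportion (with a finite population
# correction, following the paper's Methods section).
actual_de <- function(ests, p_true, n, N) {
  var_srs <- p_true * (1 - p_true) / n * (1 - n / N)
  var(ests) / var_srs
}
# A DE of 2 means RDS needs twice the SRS sample size for equal precision.
```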
Table 1 summarizes the findings from three previous simulation studies of RDS variance estimation and design effects. The two studies that evaluated 95 percent CI coverage reported mean or median 95 percent CI coverage rates below 70 percent. The three studies found a wide range of design effects, with means or medians ranging from 5 to 30.

Table 1. Findings from Previous Simulation Studies of RDS Variance Estimation and Design Effects

Study | Point estimator | Variance estimator | Simulation approach/CI method | Population network data used for RDS simulations | 95% CI coverage | DE result
Goel and Salganik (2010) | Volz-Heckathorn | Salganik | With replacement/percentile | 1987 attempted network census of high-risk heterosexuals in Colorado Springs, CO | 52% (median) | 11 (median; multiple traits)
Goel and Salganik (2010) | Volz-Heckathorn | Salganik | With replacement/percentile | Sample of U.S. adolescents in 7th–12th grades between 1994 and 1996 | 62% (median) | 5.9 (median; multiple traits)
Lu et al. (2012) | Volz-Heckathorn | N/A | N/A | Online social network in Sweden | N/A | 5 to 13 (multiple traits)
Verdery et al. (2015) | Volz-Heckathorn | Salganik | With replacement/studentized | Sample of U.S. adolescents in 7th–12th grades between 1994 and 1996 | 68% (mean) | 15 (mean; 3 traits)
Verdery et al. (2015) | Volz-Heckathorn | Salganik | With replacement/studentized | Multiple Facebook social networks of college students | 65% (mean) | 30 (mean; 2 traits)

4. METHODS

Evaluating RDS via simulation requires obtaining or creating a population network from which to draw samples and simulating RDS on that network. Previous studies have simulated RDS both on real and synthetic population networks. RDS is used to study hidden populations, so an RDS simulation study's population network should be as similar to real hidden population networks as possible. Unfortunately, complete data for hidden population networks are extremely rare. Complete network data are difficult and expensive to collect in any setting (Morris 2004), and these challenges are compounded among populations whose members wish to remain hidden. Hidden population network data are unavailable, so the real population networks in previous RDS simulation studies have come from a variety of sources (table 1). Two of the studies used network data from a sample of United States adolescents in the 7th through 12th grades (the "Add Health" study) (Harris, Halpern, Whitsel, Hussey, Tabor, et al. 2009; Goel and Salganik 2010; Verdery et al. 2015), and another used Facebook network data from college students collected when Facebook only permitted college students to use the service (Verdery et al. 2015). Notably, both of these population networks are embedded within schools.
In the United States, middle schools and high schools are highly structured by grade, with students typically taking classes only with others in the same grade. Colleges are less structured by grade, but they have additional structure along academic disciplines. Students in these settings often have friends outside their grades and disciplines, but their networks are strongly shaped and constrained by those institutional structures. Such institutional structures are not present for the vast majority of the hidden population networks RDS is used to sample. RDS variance is known to be strongly positively related to homophily, the extent to which networks are assortative along characteristics of their members. These school networks demonstrate strong homophily by grade, resulting in networks that may have "bottlenecks" between population subgroups (Goodreau, Kitts, and Morris 2009). As noted earlier, a key element of RDS estimators is self-reported degree. In Add Health, participants were asked to name up to five boy best friends and five girl best friends (Harris et al. 2009). The degree for a given participant was the sum of the number of persons she named and the number of times she was named by participants she did not name. In contrast, RDS study participants are typically asked to state the number of persons they know in the target population who also know them (Malekinejad, Johnston, Kendall, Kerr, Rifkin, et al. 2008), which serves as a proxy for the number of people who might recruit them into the study. Because of this difference in how degree is elicited, the degree distribution in RDS studies typically has a higher mean and higher variance than those of the Add Health school networks (Malekinejad et al. 2008; Goodreau, Kitts, and Morris 2009). In sum, these school networks are very unlike the hidden population networks through which RDS coupons are typically passed, and some of their features mean that RDS is known a priori to perform poorly in simulations based on them.

Given the differences between the available population network data and the networks of hidden populations, we based both our simulated population networks and our simulated RDS sampling process on real RDS studies. To maximize the similarity of our simulations to RDS as it occurs in the field, our simulations are designed to reflect RDS as it was used to sample PWID by the CDC's National HIV Behavioral Surveillance system (NHBS) in 2009 and 2012. NHBS sampled PWID in 20 U.S. cities in both 2009 and 2012 using a standard protocol, resulting in 40 RDS samples (Centers for Disease Control and Prevention 2012, 2015). A flowchart of our simulation methods is presented in the supplementary material.

To create the simulated population networks for our study (step 1 in the three-step process described above), we first estimated four characteristics of the PWID population in each NHBS city from each of the 40 NHBS samples: the prevalence of and homophily on a two-valued trait of public health interest, the estimated mean degree of population members, and differential activity (DA). Homophily is defined as the proportion of ties in the network between two respondents who share a trait status, relative to what would be expected by chance. DA is a measure of one group's gregariousness compared to another and is defined as the ratio of the mean degrees of population members with and without the trait.
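These two network characteristics can be computed directly when the full network is known. A minimal sketch in base R, assuming an undirected edge list of integer node ids and a 0/1 trait vector; the homophily operationalization here (observed share of same-status ties over the share expected under random mixing, ignoring degree) is a simplification of the estimands used in the paper.

```r
# Homophily and differential activity from an undirected edge list.
# edges: two-column integer matrix of node ids; trait: 0/1 vector by node id.
homophily_da <- function(edges, trait) {
  same <- trait[edges[, 1]] == trait[edges[, 2]]
  p <- mean(trait)
  # Share of same-status ties relative to random mixing (degree ignored;
  # the paper's exact definition may differ in detail).
  homophily <- mean(same) / (p^2 + (1 - p)^2)

  # Degree of each node, counting both endpoints of every undirected tie.
  deg <- tabulate(c(edges[, 1], edges[, 2]), nbins = length(trait))
  da <- mean(deg[trait == 1]) / mean(deg[trait == 0])  # with vs. without
  c(homophily = homophily, differential_activity = da)
}
```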
Summary statistics of these characteristics can be found in table 2.¹ Using each of the 40 sets of characteristics, we then simulated 1,000 networks using exponential-family random graph models (ERGMs) (Frank and Strauss 1986; Hunter and Handcock 2006; Hunter, Goodreau, and Handcock 2008; Hunter, Handcock, Butts, Goodreau, and Morris 2008; Handcock, Hunter, Butts, Goodreau, Krivitsky, et al. 2014), for a total of 40,000 networks. Each simulated network had a population size of 10,000 members.

Table 2. Summary of 40 NHBS Samples Used to Create RDS Simulations

Characteristic | Mean | SD | Median | Minimum | Maximum
Prevalence | 0.104 | 0.0653 | 0.091 | 0.018 | 0.286
Mean degree | 10.64 | 5.096 | 9.88 | 4.45 | 35.39
Homophily | 1.226 | 0.2281 | 1.19 | 0.91 | 1.99
Differential activity | 0.931 | 0.2098 | 0.92 | 0.53 | 1.44
Sample size | 519.1 | 108.85 | 539.5 | 206 | 700
Number of seeds | 8 | 3.31 | 8 | 3 | 16
Number of seeds with trait | 1.1 | 1.18 | 1 | 0 | 5
Number of seeds without trait | 6.8 | 3.21 | 7 | 1 | 16
Number of seeds missing traitᵃ | 0.13 | 0.404 | 0 | 0 | 2
% of coupons returned | 30.60% | 6.60% | 33.20% | 20.00% | 49.80%
Number of recruits = 0ᵇ | 33.90% | 7.01% | 35.50% | 21.40% | 48.00%
Number of recruits = 1ᵇ | 21.80% | 5.07% | 22.10% | 9.10% | 32.10%
Number of recruits = 2ᵇ | 17.70% | 3.54% | 18.20% | 10.00% | 25.10%
Number of recruits = 3ᵇ | 10.50% | 2.69% | 10.00% | 4.60% | 16.00%
Number of recruits = 4ᵇ,ᶜ | 1.70% | 2.00% | 0.67% | 0% | 7.70%
Number of recruits = 5ᵇ,ᶜ | 0.54% | 0.65% | 0.30% | 0% | 2.40%

ᵃ Assigned to be without trait for purposes of the sampling simulation.
ᵇ Among sample members who were given coupons.
ᶜ These numbers include six studies in which a maximum of three coupons were distributed per subject; the counts for those studies are constrained to be 0.

We designed the RDS process used in the simulations (step 2 above) to match the process observed in the NHBS samples by first measuring the following characteristics of each of the 40 NHBS samples: the sample size, the numbers of seeds with and without the trait, and the distribution of the number of recruitments by sample members. Summary statistics of these characteristics can also be found in table 2.² For each of the 1,000 networks corresponding to a given NHBS sample, we simulated one RDS sample using the RDS package in the statistical software R (Handcock, Fellows, and Gile 2015; R Core Team 2015). The simulated RDS process was implemented based on the RDS process characteristics of the corresponding NHBS sample described above; for example, a given simulated RDS sample had the same number of seeds as its corresponding NHBS sample.
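The recruitment dynamics can be made concrete with a simplified sketch of one without-replacement RDS sample on a network, written in base R. This is illustrative only: it draws each participant's number of recruits uniformly rather than matching an NHBS sample's recruitment distribution, and it omits the coupon accounting and other NHBS-matched details handled by the RDS package.

```r
# Minimal without-replacement RDS recruitment simulation.
# adj: list mapping each node id to a vector of neighbor ids;
# seeds: initial participant ids; n_target: desired sample size.
simulate_rds <- function(adj, seeds, n_target, max_recruits = 3) {
  sampled <- seeds
  queue <- seeds  # participants who still hold coupons
  while (length(sampled) < n_target && length(queue) > 0) {
    ego <- queue[1]
    queue <- queue[-1]
    # Without replacement: already-sampled contacts cannot be re-recruited.
    candidates <- setdiff(adj[[ego]], sampled)
    k <- min(sample(0:max_recruits, 1), length(candidates),
             n_target - length(sampled))
    if (k > 0) {
      recruits <- candidates[sample.int(length(candidates), k)]
      sampled <- c(sampled, recruits)
      queue <- c(queue, recruits)
    }
  }
  sampled
}
```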
Because RDS samples do not allow for repeated participation, our baseline samples were drawn without replacement. For each simulated RDS sample, we applied each of the four point/variance estimator pairs to the trait of interest (step 3 above). For each of the three RDS estimator pairs, we calculated 95 percent CIs using both the studentized and percentile bootstrap methods. Our analysis compares the coverage rates of the 95 percent CIs for the four point/variance estimator pairs and the two bootstrap CI methods when sampling was with and without replacement, where the coverage rates are calculated as the proportion of the simulations in which the CI contained the true population prevalence of the trait.

We calculate the actual DEs for our simulations numerically as the ratio of the variance of the distribution of point estimates across simulations to the SRS variance, where the SRS variance includes a finite population adjustment based on the proportion of the population that was sampled. We calculate the estimated DEs for our simulations as the ratio of the estimated variance to the SRS variance. Each actual DE is calculated from the variance of the 1,000 simulations for each population network. Each estimated DE is calculated from a specific estimator pair applied to a single sampling simulation. Because the actual DE varies in magnitude across population networks, we summarize the estimated DEs' accuracy by calculating the ratio of each simulation's estimated DE to the actual DE for that population network. We compare the actual DEs for the four point estimators and also compare the actual DEs to the DEs estimated by the RDS variance estimators.

5. RESULTS

Figure 1 presents the 95 percent CI coverage rates for the four estimator pairs for the 40 sets of RDS simulations conducted under the baseline condition of sampling without replacement and estimating the CI via the studentized bootstrap method. The horizontal axis of the figure is the nominal 95 percent CI coverage rate, and the vertical axis is the 40 simulation sets ordered from top to bottom by the SS coverage rate (the red line).³

Figure 1. Ninety-Five Percent Confidence Interval (CI) Coverage Percentages for 40 Sets of RDS Simulations (Sampling without Replacement; Studentized Bootstrap CI Method). The horizontal axis is the nominal 95 percent CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the SS coverage percentage (the red line). The left panel's horizontal axis ranges from 0 to 100 percent; the right panel's horizontal axis ranges from 80 percent to 100 percent for detail. The coverage percentages for the sample mean do not appear in the right panel.

The left panel of figure 1 displays the full range of coverage rates on the horizontal axis. The sample mean performs poorly compared to the other estimators.
Hence, the right panel omits the sample mean and displays coverage rates from 80 percent to 100 percent to allow a more detailed comparison of the coverages for the estimator pairs other than Mean/SRS. The right panel reveals that the SH/Sal-BS and VH/Sal-BS estimator pairs perform similarly to the SS/SS-BS pair for a majority of simulation sets but have considerably worse coverage rates in at least four sets of simulations. The SS/SS-BS estimator pair had higher overall coverage than the other two RDS estimator pairs: it had only one instance of coverage below 90 percent, whereas the SH/Sal-BS and VH/Sal-BS coverages were below 90 percent in five instances. The NHBS sample corresponding to the instance with SS/SS-BS coverage below 90 percent (86.8 percent) has a trait prevalence of 0.034 and the smallest sample size (n = 210) of all the NHBS samples. The four sets of simulations in which the SH/Sal-BS and VH/Sal-BS had considerably worse coverage rates (figure 1, right panel: B-01, A-08, B-19, and A-01) correspond to NHBS samples with lower differential activity and higher homophily for the trait of interest than the other samples.

Table 3 shows summary statistics across the 40 simulation sets for these RDS estimator pair coverages, along with results for the percentile bootstrap. For the SH/Sal-BS and VH/Sal-BS estimators, the studentized bootstrap performs better, with mean coverage rates 6 and 5.9 percentage points higher and median coverage rates 2 and 2.1 percentage points higher, respectively. For the SS/SS-BS, the results are very similar, with the studentized mean coverage rate 0.3 and median coverage rate 0.4 percentage points lower.

Table 3. 95% Confidence Interval Coverage Percentages for Four RDS Point and Variance Estimator Pairs by Bootstrap CI Method

Point estimator | Variance estimator | Bootstrap CI method | Mean | SD | Median | Range | Acceptable coverage %ᵃ
Sample mean | SRS variance | N/A | 67.4 | 23.8 | 74.9 | [14, 96] | 5
Salganik-Heckathorn | Salganik | Percentile | 87 | 12.8 | 91.9 | [41, 96] | 40
Salganik-Heckathorn | Salganik | Studentized bootstrap | 93 | 2.8 | 93.9 | [86, 97] | 67.5
Volz-Heckathorn | Salganik | Percentile | 87 | 12.8 | 91.8 | [41, 96] | 42.5
Volz-Heckathorn | Salganik | Studentized bootstrap | 92.9 | 3.2 | 93.9 | [82, 97] | 67.5
Successive Sampling | Successive Sampling | Percentile | 94.1 | 1.8 | 94.6 | [87, 97] | 80
Successive Sampling | Successive Sampling | Studentized bootstrap | 93.8 | 1.8 | 94.2 | [87, 96] | 75

ᵃ Acceptable coverage is calculated as the percentage of confidence intervals (CIs) with coverage between 93% and 97%, inclusive, for a given estimator pair and bootstrap CI method.

Given that the conditions varied considerably across the 40 simulation sets, summary statistics such as the mean may mask meaningful variation in the coverages.
Therefore, we calculated a summary measure of "acceptable coverage."⁴ The Mean/SRS estimator had acceptable coverage for 5 percent of CIs. The SH/Sal-BS and VH/Sal-BS with studentized bootstrap estimator pairs produced acceptable coverage for 67.5 percent of CIs, and the SS/SS-BS with percentile and studentized bootstrap CI methods produced acceptable coverage for 80 percent and 75 percent of CIs, respectively.

We conducted additional simulations to investigate the 95 percent CI coverage rates for the VH/Sal-BS in our analysis (all greater than 80 percent; see figure 1), which were higher than the VH/Sal-BS coverage rates reported in the seminal Goel and Salganik paper (medians of 52 percent and 62 percent coverage for the two samples analyzed) and the paper by Verdery and colleagues (means of 68 percent and 65 percent for the two samples analyzed) (Goel and Salganik 2010; Verdery et al. 2015). We hypothesized that the simulation of RDS sampling with replacement or the use of the percentile bootstrap CI method affected the coverage findings in those papers. Figure 2 presents the coverage rates for the VH/Sal-BS estimator pair under four conditions: sampling with replacement with percentile bootstrap CIs, sampling with replacement with studentized bootstrap CIs, sampling without replacement with percentile bootstrap CIs, and sampling without replacement with studentized bootstrap CIs. This figure shows that the simulations using without-replacement sampling and the studentized bootstrap method (purple line and triangles) consistently outperform simulations using with-replacement sampling and the percentile bootstrap (red line and circles).

Figure 2. Ninety-Five Percent Confidence Interval (CI) Coverage Percentages for 40 Sets of RDS Simulations (VH/Sal-BS Estimator Pair) by Bootstrap CI Method and Sampling with and without Replacement. The horizontal axis is the nominal 95% CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the without-replacement, studentized bootstrap condition (the purple line and triangles).

Table 4 summarizes the DEs from our RDS simulations. The first four rows of table 4 show the actual DEs for samples drawn without replacement for the sample mean, SH, VH, and SS estimators. The median DEs for the SH, VH, and SS point estimators (table 4, rows 2–4) were approximately 1.7, which is similar to the DEs observed for other complex sampling methods (Pettersson and Silva 2005; U.S. Census Bureau 2006). For both the VH and SS estimators, the maximum DE was between 6 and 6.2; the maximum DE for the SH was 95.5. In addition to its maximum DE of 95.5, the SH had three additional DEs that were much higher than expected. The DEs in these four scenarios were caused by the SH estimator failing in between two and six of the 1,000 simulation runs.
Specifically, in these cases the SH produced a trait prevalence estimate of 1 when the true prevalence was less than 0.08.⁵ The last row of table 4 shows the DEs of the VH estimator for sampling with replacement, which was the RDS simulation process used in the papers by Goel and Salganik and by Verdery and colleagues (Goel and Salganik 2010; Verdery et al. 2015). Note that for every summary statistic, the DEs are higher for the VH sampling-with-replacement condition than for the VH sampling-without-replacement condition.

Table 4. Design Effects for Four RDS Point Estimators by Sampling with or without Replacement

Point estimator (sampling method) | Range | Median | Mean | SD
Sample mean (without replacement) | [0.75, 2.64] | 1.34 | 1.42 | 0.49
Salganik-Heckathorn (without replacement) | [0.83, 95.51] | 1.72 | 7.47 | 19.96
Volz-Heckathorn (without replacement) | [0.81, 6.19] | 1.69 | 1.91 | 0.96
Successive Sampling (without replacement) | [0.83, 6.03] | 1.66 | 1.89 | 0.93
Volz-Heckathorn (with replacement)ᵃ | [1.01, 7.97] | 2.34 | 2.77 | 1.48

ᵃ Point estimator and sampling method used in Goel and Salganik 2010.

Table 5 compares the estimated and actual DEs by estimator pair and sampling method. It summarizes the performance of the DEs estimated by a given estimator pair and sampling method by comparing the distribution of estimated-to-actual DE ratios across the 40,000 simulations to a benchmark. It presents three benchmarks: estimated DEs within a factor of 1.5 (i.e., 66 percent to 150 percent) of the actual DE, within a factor of 2 (i.e., 50 percent to 200 percent), and within a factor of 3 (i.e., 33 percent to 300 percent). For each benchmark, it presents the percent of estimated DEs that were within that factor and the percent of those within the factor that were too low. For example, 78.1 percent of the SH/Sal-BS without-replacement estimated DEs were within a factor of 1.5 of the actual DE; of that 78.1 percent, 45.8 percent were too low.
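This benchmark summary is straightforward to compute from the per-simulation ratio of estimated to actual DEs. A minimal sketch in base R; the function name and the ratio vector are illustrative assumptions.

```r
# Share of estimated DEs within a multiplicative factor of the actual DE,
# and, among those within the factor, the share that underestimate it
# (ratio < 1 is the anticonservative direction).
de_benchmark <- function(ratio, factor) {
  within <- ratio >= 1 / factor & ratio <= factor
  c(pct_within  = 100 * mean(within),
    pct_too_low = 100 * mean(ratio[within] < 1))
}

# ratio <- est_de / actual_de, one value per simulation, e.g.:
# de_benchmark(ratio, 1.5); de_benchmark(ratio, 2); de_benchmark(ratio, 3)
```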
Table 5. Comparison of Estimated and Actual Design Effects by Estimator Pair and Sampling Method

Estimator pair (sampling method) | Within factor of 1.5ᵃ (%) | Too low (% of those within) | Within factor of 2ᵇ (%) | Too low (% of those within) | Within factor of 3ᶜ (%) | Too low (% of those within)
SH/Sal-BS (without replacement) | 78.1 | 45.8 | 83.7 | 46.2 | 88.3 | 47.0
VH/Sal-BS (without replacement) | 83.5 | 46.6 | 91.8 | 47.3 | 97.5 | 48.0
SS/SS-BS (without replacement) | 92.9 | 51.3 | 98.6 | 51.0 | 99.8 | 51.0
VH/Sal-BS (with replacement)* | 56.9 | 79.4 | 79.9 | 82.8 | 93.5 | 84.5

ᵃ Between 66% and 150% of the actual DE.
ᵇ Between 50% and 200% of the actual DE.
ᶜ Between 33% and 300% of the actual DE.
* Point estimator and sampling method used in Goel and Salganik 2010.

Table 5 shows that for without-replacement sampling, the pattern of estimated DE performance is consistent across all three benchmarks: the SS/SS-BS estimator pair had the highest percentage within the factor, the VH/Sal-BS the second highest, and the SH/Sal-BS the lowest. The ordering was the same for the percentage of estimates within the benchmark that were too low, with the SS/SS-BS pair having the most even distribution (percentages closest to 50 percent). This pattern reflects the SH/Sal-BS and VH/Sal-BS pairs having less accurate estimated DEs that are biased upward and the SS/SS-BS pair having more accurate estimated DEs that are approximately unbiased. For with-replacement sampling, the VH/Sal-BS estimator pair shows much lower accuracy than all three without-replacement estimator pairs at the most stringent benchmark factor. It also has a high proportion of estimated DEs that are too low, with more than 79 percent of estimated DEs lower than the actual DE.

6. DISCUSSION

Our simulations suggest that the coverage of 95 percent CIs for RDS samples is usually above 90 percent (with no coverage rates above 97 percent). This is better than past work has suggested, demonstrating that reasonably accurate RDS variance estimation is feasible and that conclusions drawn from past analyses of RDS data that applied one of these estimators may well be reasonable in scenarios where RDS assumptions are met. While the RDS estimators performed better than expected, the SRS variance estimator significantly underestimates the variability of RDS samples and provides very low coverage. Because of the complexity of RDS, it may be tempting to dispense with complicated inferential approaches and use the sample mean and the SRS variance approximation. Our results show that this approach is likely to cause significant underestimation of uncertainty and lead to misleading conclusions. We found that the SS/SS-BS estimator pair had overall higher coverage than the other two estimator pairs.
The SS/SS-BS exhibited its lowest coverage when applied to a sample with lower prevalence and a smaller sample size than the other samples. In contrast, the SH/Sal-BS and VH/Sal-BS had lower coverage for samples whose differential activity was much lower, and whose homophily was higher, than in the other samples. Note that the difference between the SS and VH estimators is a finite population adjustment that requires knowing the true size of the population, which is typically unavailable. The impact of error in the population size specified for the SS estimator is a function of the true size of the population. The impact is relative, so a given absolute error in the specified population size will matter more for smaller populations (e.g., an error of 500 in the specified population size will have more impact when the true population size is 1,000 than when it is 10,000). For large population sizes, the SS estimator approaches the VH estimator because the finite population adjustment has little impact, so using the SS estimator with an overestimated population size will pull it toward the VH estimate. Therefore, the SS will perform at least as well as the VH unless the population size is dramatically underestimated. The relationship between a population's characteristics and RDS CI coverage is complex, so the specific relationships between prevalence, sample size, and homophily and the performance of RDS estimator pairs require further investigation. More generally, a large number of population characteristics must be systematically varied in a simulation to disentangle the combinations of factors that influence RDS CI coverage.

While other work has suggested that RDS variance estimators perform poorly, our analysis suggests those results can, at least partially, be attributed to the choice of bootstrap method and the unrealistic use of with-replacement sampling in prior studies. For the SH and VH estimators, we found that using the studentized bootstrap rather than the percentile bootstrap significantly increased the percentage of CIs with acceptable coverage, from 40 to 67.5 percent and from 42.5 to 67.5 percent, respectively (table 3). Goel and Salganik's findings of low CI coverage were likely at least partially due to their use of with-replacement sampling and the percentile bootstrap CI method (see figure 2). Chernick and LaBudde (2014) studied the relative performance of studentized and percentile bootstrap CI estimates and found that in most scenarios the studentized approach is more accurate.

We also found significantly smaller DEs than Goel and Salganik, with evidence that sampling with replacement increases the DE. For example, for without-replacement sampling, both the SS/SS-BS and VH/Sal-BS estimator pairs produced actual DEs less than 3 in 92.5 percent of our conditions (37/40) and less than 2 in 62.5 percent, whereas for with-replacement sampling, the VH/Sal-BS DE was less than 3 in only 67.5 percent of our conditions, with only 30 percent less than 2. This echoes findings by Lu and colleagues and by Gile and Handcock that sampling without replacement may reduce the DEs for RDS (Gile and Handcock 2010; Lu et al. 2012). The estimated DEs were also more accurate for sampling without replacement than for sampling with replacement.
For example, for the VH/Sal-BS estimator pair, sampling without replacement produced estimated DEs within a factor of 2 of the actual DEs 91.8 percent of the time, with slightly less than half (47.3 percent) lower than the actual DE (the anticonservative direction). In contrast, for with-replacement sampling, the estimated DEs were within a factor of 2 of the actual DEs only 79.9 percent of the time, with a significant majority (82.8 percent) lower than the actual DE. Overall, the estimated DEs for the SS/SS-BS estimator pair were the most accurate: 92.9 percent within a factor of 1.5, with fewer large outliers (see the technical supplementary material for more detail). The RDS sampling process is highly complex and only partially observed in real RDS studies, so many choices about simulation design and specification must be made without reference to empirical data. Because the ultimate goal of RDS simulation studies is to understand how RDS performs in the real world, we recommend conducting RDS simulations without replacement and with parameters informed by real RDS samples to the extent possible.

This study's simulations find that RDS DEs are in the range suggested by other methodological work that did not use simulation studies (Wejnert, Pham, Krishna, Le, and DiNenno 2012). We found that simulated RDS DEs in cases chosen to approximate NHBS are usually between 1 and 3, in contrast with suggestions in past simulation work that DEs may often be greater than 10 (Goel and Salganik 2010; Verdery et al. 2015). This means that, in instances where RDS assumptions are met, RDS provides samples with statistical precision similar to that of other complex sampling methods (although with significantly less precision than simple random samples of the same populations).

We used data from a large number of real RDS studies to parameterize our simulated networks and RDS sampling process. These RDS samples were of PWID in large U.S. urban areas, so the results are likely most applicable to RDS samples drawn from large cities. Most of the largest RDS studies in the world occur in such places, including studies conducted in China and Brazil (Szwarcwald, de Souza Junior, Damacena, Junior, and Kendall 2011; Li et al. 2014). However, many RDS samples are drawn from smaller populations in less urban areas, which may have population networks with significantly different structures than those in NHBS cities (Malekinejad et al. 2008). Sampling fractions may be substantial in studies of small populations, making it important to use variance estimators that reflect sampling without replacement (which the SH and VH estimators do not). McCreesh and colleagues conducted an RDS methodological study in Uganda that is more similar to such small populations than the NHBS samples are (McCreesh et al. 2012). They found that some subpopulations underrepresented in the sample did not have correspondingly lower mean degree, which led the RDS estimators to perform poorly. However, the poor performance was also partly due to some recruiters misunderstanding which population members were eligible for the study (and should be considered for recruitment): the researchers and the local population interpreted the language used to communicate the eligibility criteria differently (McCreesh et al. 2012). This misunderstanding led to systematically biased recruitment by some sample participants.
With all sampling methods, but especially with peer-driven methods such as RDS, it is critical that researchers understand and account for the cultural norms and context of the communities they are sampling. These differences in population structure, RDS execution, and RDS estimation highlight the importance of context in understanding the applicability of RDS methodological study findings.

Our results are subject to a number of limitations. First, although the networks created for our simulations were designed with some structural characteristics similar to those of PWID networks in NHBS cities, the true structure and complexity of hidden population networks are unknown. Almost all social networks contain structure that is not observed in RDS data. For example, an outcome might vary across a city's neighborhoods, and the PWID networks in some neighborhoods may have few connections to those in other neighborhoods. The ERGMs used to create our simulated networks did not directly specify such complex network structure, as it is unclear what the correct levels of such structure should be. Note that for such network structure to strongly influence RDS estimation, it must be strongly related to the outcome (e.g., quite different prevalences of the trait across the weakly connected subgroups). Second, the characteristics we used to create the networks for our simulations were estimated from NHBS samples using RDS estimators. Therefore, the simulations are not replicates of the 40 samples collected by NHBS but are, instead, examples of networks and RDS processes similar to those observed in the NHBS samples. The results may be sensitive to our use of large networks and small sampling fractions, as in the NHBS samples. The stability of NHBS samples of PWID over time suggests that our findings are applicable to future NHBS studies of PWID. Third, our simulations implemented RDS with only a few statistical assumptions violated. Both the SH and VH point estimators assume that recruitment trees do not branch (i.e., each sample member makes exactly one recruitment) and that sampling is with replacement, neither of which was true in our simulations. Other RDS statistical assumptions were met, such as participants recruiting randomly from their sets of contacts and, for the SS estimator, the population size being known. It is known that violations of RDS point estimator assumptions decrease the accuracy of RDS point estimates (Gile and Handcock 2010; Tomas and Gile 2011; Lu et al. 2012); this is likely true for RDS variance estimators as well. Future work will examine the effects of violations of assumptions on the performance of RDS variance estimators. Fourth, our analysis did not evaluate all RDS variance estimators. Some work has proposed new point estimators accompanied by minor modifications to an existing variance estimator to incorporate the new point estimator (Lu 2013; Lu, Malmros, Liljeros, and Britton 2013). Gile and Handcock introduced an estimator that simulates RDS on a synthetic network created from characteristics of the sample data (Gile and Handcock 2015). Yamanis and colleagues proposed a modification to the Salganik bootstrap that reflects the branching structure of RDS samples (Yamanis, Merli, Neely, Tian, Moody, et al. 2013). Baraff and colleagues recently proposed a tree bootstrap in which each resample replicates recruitment trees' structures by sampling with replacement from each recruiter's set of recruits (Baraff, McCormick, and Raftery 2016).
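For intuition about that last approach, the recursive resampling idea behind the tree bootstrap can be sketched in a few lines of R. The data layout (a named list mapping each participant id to the ids of his or her recruits) is an illustrative assumption, not the authors' implementation.

```r
# Tree bootstrap sketch (after Baraff et al. 2016): resample each
# recruiter's recruits with replacement, recursing down the recruitment
# tree so that each bootstrap resample preserves the observed tree shape.
tree_resample <- function(id, recruits) {
  kids <- recruits[[as.character(id)]]
  if (is.null(kids)) return(id)  # leaf: this participant recruited no one
  picked <- kids[sample.int(length(kids), length(kids), replace = TRUE)]
  c(id, unlist(lapply(picked, tree_resample, recruits = recruits)))
}

# One full resample: redraw seeds with replacement, then recurse on each.
tree_bootstrap <- function(seeds, recruits) {
  s <- seeds[sample.int(length(seeds), length(seeds), replace = TRUE)]
  unlist(lapply(s, tree_resample, recruits = recruits))
}
```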
7. CONCLUSION

Sampling hidden populations is critical for public health surveillance and planning around the world. RDS is effective at reaching members of hidden populations that other sampling methods cannot and is inexpensive enough to be used in low-resource settings. These strengths have led to its wide use around the world for many different applications. Past research on RDS variance estimation suggested that RDS variance estimator CIs provide very low coverage rates and that RDS has higher DEs than has been assumed in the public health literature (Goel and Salganik 2010; Verdery et al. 2015). Our results indicate instead that both CI coverage rates and DEs are often acceptable but not perfect. However, researchers should evaluate whether a given study has characteristics similar to those that produced good (or poor) coverage in our simulations. Additionally, deviations from the assumed RDS sampling process or population network structures not examined in this paper may affect the CI coverage rates and DE magnitudes for a given study. RDS is used around the world to sample hidden populations that suffer from high rates of infection by HIV and other diseases. It is critical that researchers draw correct conclusions from RDS data by applying appropriate statistical techniques. We look forward to an improved understanding of RDS estimation that will better inform the policies critical to preventing and reducing the burden of disease borne by hidden populations worldwide.

Supplementary Materials

Supplementary materials are available online at http://www.oxfordjournals.org/our_journals/jssam/.

Footnotes

1. De-identified characteristics for each sample may be found in the supplementary materials.
2. De-identified characteristics for each sample may be found in the supplementary materials.
3. Sample numbers are prefixed with "A" for samples from 2009 and "B" for samples from 2012. Sample numbers were randomly assigned to cities and are consistent across the two survey years (e.g., A-01 is the same city as B-01).
4. Acceptable coverage percent is calculated as the percentage of confidence intervals (CIs) with coverage between 93 percent and 97 percent, inclusive, for a given estimator pair and bootstrap CI method.
5. We have also observed this pattern of SH estimator behavior in its implementation in the Respondent-Driven Sampling Analysis Tool v. 7.1 software (Volz, Wejnert, Cameron, Spiller, Barash, et al. 2012). It typically, but not always, occurs when "0" cells are present in the recruitment matrix (e.g., when two population subgroups do not recruit one another).

References

Baraff A. J., McCormick T. H., Raftery A. E. (2016), "Estimating Uncertainty in Respondent-Driven Sampling Using a Tree Bootstrap Method," Proceedings of the National Academy of Sciences USA, 113, 14668–14673.

Centers for Disease Control and Prevention (2012), "HIV Infection and HIV-Associated Behaviors among Injecting Drug Users—20 Cities, United States, 2009," MMWR, 61, 133–138.

Centers for Disease Control and Prevention (2015), "HIV Infection and HIV-Associated Behaviors among Persons Who Inject Drugs—20 Cities, United States, 2012," MMWR, 64(10), 270–275.

Chernick M. R., LaBudde R. A. (2014), An Introduction to Bootstrap Methods with Applications to R, New Jersey: John Wiley & Sons.

Davison A. C., Hinkley D. V. (1997), Bootstrap Methods and Their Applications, Cambridge: Cambridge University Press.
Efron B., Tibshirani R. (1986), "Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy," Statistical Science, 1, 54–75.

Frank O., Strauss D. (1986), "Markov Graphs," Journal of the American Statistical Association, 81, 832–842.

Gallagher K. M., Sullivan P., Lansky A., Onorato I. M. (2007), "Behavioral Surveillance among People at Risk for HIV Infection in the U.S.: The National HIV Behavioral Surveillance System," Public Health Reports, 122 (Suppl 1), 32–38.

Gile K. J. (2011), "Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation," Journal of the American Statistical Association, 106, 135–146.

Gile K. J., Handcock M. S. (2010), "Respondent-Driven Sampling: An Assessment of Current Methodology," Sociological Methodology, 40, 285–327.

Gile K. J., Handcock M. S. (2015), "Network Model-Assisted Inference from Respondent-Driven Sampling Data," Journal of the Royal Statistical Society: Series A (Statistics in Society), 178, 619–639.

Goel S., Salganik M. J. (2010), "Assessing Respondent-Driven Sampling," Proceedings of the National Academy of Sciences USA, 107, 6743–6747.

Goodreau S. M., Kitts J. A., Morris M. (2009), "Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks," Demography, 46, 103–125.

Handcock M. S., Fellows I. E., Gile K. J. (2015), "RDS: Respondent-Driven Sampling." https://cran.r-project.org/web/packages/RDS/citation.html.

Handcock M. S., Hunter D. R., Butts C. T., Goodreau S. M., Krivitsky P. N., Morris M. (2014), "ERGM: Fit, Simulate and Diagnose Exponential-Family Models for Networks." https://cran.r-project.org/web/packages/ergm/citation.html.

Harris K. M., Halpern C. T., Whitsel E., Hussey J., Tabor J., Entzel P., Udry J. R. (2009), "The National Longitudinal Study of Adolescent Health: Research Design." http://www.cpc.unc.edu/projects/addhealth/faqs/addhealth.

Heckathorn D. D. (1997), "Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations," Social Problems, 44, 174–199.

Hladik W., Barker J., Ssenkusu J. M., Opio A., Tappero J. W., Hakim A., Serwadda D., for the Crane Survey Group (2012), "HIV Infection among Men Who Have Sex with Men in Kampala, Uganda–A Respondent Driven Sampling Survey," PLoS One, 7, e38143.

Hunter D. R., Goodreau S. M., Handcock M. S. (2008), "Goodness of Fit of Social Network Models," Journal of the American Statistical Association, 103, 248–258.

Hunter D. R., Handcock M. S. (2006), "Inference in Curved Exponential Family Models for Networks," Journal of Computational and Graphical Statistics, 15.

Hunter D. R., Handcock M. S., Butts C. T., Goodreau S. M., Morris M. (2008), "ERGM: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks," Journal of Statistical Software, 24, 1–29.
Li X., Lu H., Cox C., Zhao Y., Xia D., Sun Y., He X., et al. (2014), “Changing the Landscape of the HIV Epidemic among MSM in China: Results from Three Consecutive Respondent-Driven Sampling Surveys from 2009 to 2011,” BioMed Research International, 2014, 10.

Lu X. (2013), “Linked Ego Networks: Improving Estimate Reliability and Validity with Respondent-Driven Sampling,” Social Networks, 35, 669–685.

Lu X., Bengtsson L., Britton T., Camitz M., Kim B. J., Thorson A., Liljeros F. (2012), “The Sensitivity of Respondent-Driven Sampling,” Journal of the Royal Statistical Society: Series A (Statistics in Society), 175, 191–216.

Lu X., Malmros J., Liljeros F., Britton T. (2013), “Respondent-Driven Sampling on Directed Networks,” Electronic Journal of Statistics, 7, 292–322.

Malekinejad M., Johnston L. G., Kendall C., Kerr L. R., Rifkin M. R., Rutherford G. W. (2008), “Using Respondent-Driven Sampling Methodology for HIV Biological and Behavioral Surveillance in International Settings: A Systematic Review,” AIDS and Behavior, 12, S105–S130.

McCreesh N., Frost S. D., Seeley J., Katongole J., Tarsh M. N., Ndunguse R., Jichi F., et al. (2012), “Evaluation of Respondent-Driven Sampling,” Epidemiology, 23, 138–147.

Merli M. G., Moody J., Smith J., Li J., Weir S., Chen X. (2015), “Challenges to Recruiting Population Representative Samples of Female Sex Workers in China Using Respondent Driven Sampling,” Social Science and Medicine, 125, 79–93.

Morris M. (2004), Network Epidemiology: A Handbook for Survey Design and Data Collection, Oxford, England: Oxford University Press.

Pettersson H., Silva P. L. (2005), “Analysis of Design Effects for Surveys in Developing Countries,” in Household Sample Surveys in Developing and Transition Countries, New York: United Nations Department of Economic and Social Affairs Statistics Division. https://unstats.un.org/unsd/HHsurveys/pdf/Household_surveys.pdf

R Core Team (2015), “R: A Language and Environment for Statistical Computing.” https://cran.r-project.org/doc/FAQ/R-FAQ.html#Citing-R

Salganik M. J. (2006), “Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling,” Journal of Urban Health, 83, i98–i112.

Salganik M. J., Heckathorn D. D. (2004), “Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling,” Sociological Methodology, 34, 193–239.

Szwarcwald C. L., de Souza Junior P. R., Damacena G. N., Junior A. B., Kendall C. (2011), “Analysis of Data Collected by RDS among Sex Workers in 10 Brazilian Cities, 2009: Estimation of the Prevalence of HIV, Variance, and Design Effect,” Journal of Acquired Immune Deficiency Syndromes, 57 (Suppl 3), S129–S135.

Tomas A., Gile K. J. (2011), “The Effect of Differential Recruitment, Non-Response and Non-Recruitment on Estimators for Respondent-Driven Sampling,” Electronic Journal of Statistics, 5, 899–934.

U.S. Census Bureau (2006), “Current Population Survey Design and Methodology Technical Paper 66.”
Verdery A. M., Mouw T., Bauldry S., Mucha P. J. (2015), “Network Structure and Biased Variance Estimation in Respondent Driven Sampling,” PLoS One, 10, e0145296.

Volz E., Heckathorn D. D. (2008), “Probability Based Estimation Theory for Respondent Driven Sampling,” Journal of Official Statistics, 24, 79.

Volz E., Wejnert C., Cameron C., Spiller M. W., Barash V., Degani I., Heckathorn D. D. (2012), “Respondent-Driven Sampling Analysis Tool (RDSat) Version 7.1.”

Wejnert C. (2009), “An Empirical Test of Respondent-Driven Sampling: Point Estimates, Variance, Degree Measures, and Out-of-Equilibrium Data,” Sociological Methodology, 39, 73–116.

Wejnert C., Pham H., Krishna N., Le B., DiNenno E. (2012), “Estimating Design Effect and Calculating Sample Size for Respondent-Driven Sampling Studies of Injection Drug Users in the United States,” AIDS and Behavior, 16, 797–806.

Yamanis T. J., Merli M. G., Neely W. W., Tian F. F., Moody J., Tu X., Gao E. (2013), “An Empirical Analysis of the Impact of Recruitment Patterns on RDS Estimates among a Socially Ordered Population of Female Sex Workers in China,” Sociological Methods and Research, 42, 392–425.

Published by Oxford University Press on behalf of the American Association for Public Opinion Research 2017. This work is written by US Government employees and is in the public domain in the US.
Abstract Respondent-Driven Sampling (RDS) is a network-based method for sampling hard-to-reach populations that is widely used by public health agencies and researchers worldwide. Estimation of population characteristics from RDS data is challenging due to the unobserved population network, and multiple point and variance estimators have been proposed. Research evaluating these estimators has been limited and largely focused on point estimation; this analysis is the first evaluation of multiple variance estimators currently in use. We evaluated the performance of RDS variance estimators via simulations of RDS on synthetic networked populations constructed from 40 RDS surveys of injection drug users in the United States. In these simulations, average design effects (DEs) were lower and average 95% confidence interval (CI) coverage percentages were higher than suggested in previous work: typical DE range = 1–3; average 95% CI coverage = 93%. However, DE and CI coverage vary across the 40 sets of simulations, suggesting that the characteristics of a given study should be evaluated to assess estimator performance. We also found that simulation results are sensitive to whether sampling is conducted with replacement and the approach used to create CIs. We conclude that CI coverage rates and DEs are often acceptable but not perfect and that RDS estimates are usually reliable in scenarios where RDS assumptions are met. While RDS estimation performed reasonably well, we found strong evidence that the simple random sample variance estimator and corresponding CIs significantly underestimate variance and should not be used to analyze RDS data. 1. INTRODUCTION Respondent‐driven sampling (RDS) is a network‐based method for sampling populations for which a sampling frame is not available (Heckathorn 1997). Information about these “hard‐to‐reach” or “hidden” populations is critical for public health research with populations at high risk of acquiring HIV infection, including persons who inject drugs (PWID), men who have sex with men, and sex workers. RDS is widely used for public health surveillance of hidden populations by organizations such as the U.S. Centers for Disease Control and Prevention (CDC) (Gallagher, Sullivan, Lansky, and Onorato 2007), the Chinese Centers for Disease Control (Li, Lu, Cox, Zhao, Xia, et al. 2014), and entities funded through the President’s Emergency Plan for AIDS Relief (PEPFAR) (Hladik, Barker, Ssenkusu, Opio, Tappero, et al. 2012). RDS is primarily used to estimate the prevalence of traits such as diseases and risk factors. Design-based unbiased point and variance estimates of such prevalence from survey samples require calculating each participant’s probability of being sampled (“inclusion probability”). Because a sampling frame is not available, hidden population members’ inclusion probabilities cannot be calculated using standard approaches. Therefore, statistical inference from samples collected via RDS relies on models approximating the sampling process that incorporate information about the sample members’ social networks and information observed during the recruitment process. Multiple current evaluations of RDS point estimators and violations of RDS assumptions have been conducted, but significantly less work has examined RDS variance estimators (Wejnert 2009; Gile and Handcock 2010; Goel and Salganik 2010; Gile 2011; Tomas and Gile 2011; Lu, Bengtsson, Britton, Camitz, Kim, et al. 2012; McCreesh, Frost, Seeley, Katongole, Tarsh, et al. 
2012; Merli, Moody, Smith, Li, Weir, et al. 2014; Verdery, Mouw, Bauldry, and Mucha 2015). Variance estimates are an essential complement to point estimates. Without good variance estimators, one cannot assess the amount of information contained in a sample and may draw invalid conclusions. In particular, statistical significance tests and confidence intervals will be misleading when the variance is under- (or over-) estimated. Additionally, understanding the variance of RDS point estimates in populations to which RDS is typically applied is important for calculating appropriate sample sizes for future studies. Past research on RDS variance estimation suggested that RDS confidence intervals provide unacceptably low coverage rates and that RDS may have extremely large design effects when applied to hidden populations of public health interest (Goel and Salganik 2010; Lu et al. 2012; Verdery et al. 2015). This study is the first systematic evaluation of the different RDS variance estimators. Our results indicate that confidence interval coverage rates are often acceptable although not perfect, and design effects are in the range of other complex survey designs. 2. BACKGROUND RDS begins with researchers choosing a small number (usually five to 10) of “seed” population members. The seeds are interviewed and given a small number of uniquely numbered coupons with which they can recruit population members they know into the sample (usually three to five). Recruited population members are interviewed and given coupons, and the process is repeated until the target sample size is reached. Participants are remunerated both for completing the survey questionnaire and for each eligible population member they recruit. RDS survey questionnaires and associated biological tests provide data on many characteristics of interest. For the purposes of this article, we represent these variables of interest by a two‐valued trait, with values “with trait” and “without trait.” Populations sampled via RDS are connected via social network ties; we refer to the set of persons, or “nodes,” and ties connecting them as the “population network.” We refer to the number of ties each person has to other members of the population as that person’s “degree.” Most RDS point estimators are design‐based, including the Salganik‐Heckathorn (SH) (Salganik and Heckathorn 2004), Volz‐Heckathorn (VH) (Volz and Heckathorn 2008), and Successive Sampling (SS) estimators (Gile 2011). The SH estimator models RDS as a Markov chain on the nodes in the population network; it is based on equating the number of network ties between population groups with different trait statuses (Salganik and Heckathorn 2004; Gile and Handcock 2010). The VH estimator uses the same Markov chain approximation to RDS and applies a modified Hansen‐Hurwitz estimator calculated from respondent degrees and the trait statuses of sample members (Volz and Heckathorn 2008). The SS estimator models RDS as sampling population members proportional to degree without replacement; it applies an algorithm to estimate the mapping between a person’s degree and his sampling probability and applies a form of the Horvitz‐Thompson estimator (Gile 2011). More details on these estimators is available in the supplementary materials. Commonly used RDS variance estimators employ a bootstrap resampling approach that approximates the RDS design (Davison and Hinkley 1997). 
The variance of the point estimates produced by these bootstrap resamples is computed and used to estimate the RDS variance. Two approximations that are currently used are the Salganik Bootstrap (“Sal‐BS”) (Salganik 2006) and the Successive Sampling Bootstrap (“SS‐BS”) (Gile 2011). The Sal‐BS is typically applied in conjunction with the SH or VH RDS point estimators (Salganik 2006), and the SS‐BS is applied in conjunction with the SS point estimator (Gile 2011). We refer to these point and variance estimator pairs as “SH/Sal‐BS,” “VH/Sal‐BS,” and “SS/SS‐BS,” respectively. Sal‐BS is based on ordered with‐replacement resampling draws from the sample, such that each subsequent node is selected from among the nodes whose recruiters have a trait status matching that of the previous node (Salganik 2006). SS‐BS takes a similar approach but considers the without‐replacement structure of RDS by adjusting the set of available nodes at each resampling draw based on which nodes had been previously sampled (Gile 2011). This analysis evaluates each of these point and variance estimate pairs; for comparison, we also consider the case when the RDS data are naively treated as a simple random sample (SRS) and the sample mean point estimator is used. We refer to this estimator pair as “Mean/SRS.” The variability of estimators is typically presented as a confidence interval (CI). A properly calibrated method for computing level α CIs produces intervals that capture the true population value for an estimand with probability of at least 1 – α (e.g., an α of 0.05 corresponds to a CI that includes the true population value in 95 percent of samples). CIs can be calculated from bootstrap variance estimates using a number of methods; the percentile and studentized bootstrap CI methods are most commonly used for RDS data (Efron and Tibshirani 1986). The lower and upper bounds of the CI under the percentile bootstrap method are the 100*(α2) and 100*(1-α2) percentiles, respectively, of the bootstrap resamples. In contrast, the studentized bootstrap CI method calculates the standard deviation (SD) of the bootstrap resample estimates and the t‐value (t) associated with the sample’s degrees of freedom; it then calculates the CI as the point estimate plus or minus t*SD for the upper and lower bounds, respectively. The percentile method can generate CIs that are asymmetric about the point estimate, whereas the studentized method always produces symmetric CIs. The SH/Sal‐BS and VH/Sal‐BS RDS estimator pairs have traditionally calculated CIs using the percentile method (Salganik 2006), while the SS/SS‐BS estimator pair has traditionally used the studentized bootstrap method (Gile 2011). The Mean/SRS estimator pair calculates a CI based on a normal approximation to the sampling distribution. 3. FRAMEWORK FOR ASSESSING RDS Evaluations of RDS point estimators have been conducted both with real RDS samples from nonhidden populations and with simulated RDS samples, but the accuracy of RDS variance estimators can only be evaluated via simulation. This is because, while it is theoretically possible to know the true value of an estimand in the target population to which to compare point estimators, it is only possible to know the true variability of an estimator in a true population by conducting a large number of independent studies in the same population with the same structure, which is practically infeasible. 
Evaluating RDS by simulation consists of three steps: (1) obtaining or creating a population network with certain characteristics, (2) simulating RDS on that network, and (3) applying RDS estimators to the trait of interest in the resulting samples. As these procedures are repeated many times, the resulting distribution of simulated estimates approaches the true sampling distribution of the estimators under the simulation conditions. Our primary results evaluate the performance of RDS uncertainty estimation based on the performance of the CIs calculated from different point/variance estimator pairs (e.g., SH/Sal‐BS). An estimator pair’s CI coverage is the percentage of simulations in which its CIs capture the network’s true population value, which is compared to the nominal coverage of 100*1-α% (e.g., a 95 percent CI should capture the true population value in 95 percent of simulations). In addition to evaluating variance estimators, for comparison with previous research on RDS uncertainty estimation, we consider RDS design effects (DEs), a relative measure of the variability of an estimator calculated from a sample drawn with a complex sampling method (Goel and Salganik 2010; Verdery et al. 2015). We calculate the DE as the ratio of the variance of an estimator from a given sampling design to the hypothetical variance if the sample had been collected using SRS on the same population. Specifically, the DE is the ratio of the RDS estimate’s variance to that under an SRS design of the same sample size. A method with a DE of 2 would require a sample size twice as large as that required by SRS to achieve the same variability for the estimate of a given trait. Typical DEs for many complex surveys that did not use RDS are between 1.5 and 2, but for some variables in some studies can range to 5 (Pettersson and Silva 2005; U.S. Census Bureau 2006). Previous research on the variance of RDS estimators has suggested that RDS DEs may be significantly larger than is typical in surveys conducted using complex sampling methods other than RDS (Goel and Salganik 2010; Lu et al. 2012; Verdery et al. 2015). While the DE of a given RDS study in the real world is unknown because it cannot be calculated from the data, we can calculate the DEs for our simulations numerically. We refer to these as the “actual DEs” below. In addition to actual RDS DEs, which previous research has also calculated based on simulations, the DEs estimated by RDS variance estimators (which can be calculated from a real RDS study’s data) are also of interest. We refer to these as “estimated DEs” below. Previous simulation studies have suggested that RDS variance estimators produce inaccurate estimated DEs when they compare the estimated and actual DEs for a given simulation (Goel and Salganik 2010; Verdery et al. 2015). Table 1 summarizes the findings from three previous simulation studies of RDS variance estimation and design effects. The two studies that evaluated 95 percent CI coverage reported mean or median 95 percent CI coverage rates below 70 percent. The three studies found a wide range of design effects, with mean or median design effects greater than 5 and ranging from 5 to 30. Table 1. 
Findings from Previous Simulation Studies of RDS Variance Estimation and Design Effects Study  Point estimator  Variance estimator  Simulation approach/CI method  Population network data used for RDS simulations  95% CI coverage  DE result  Goel and Salganik (Goel and Salganik 2010)  Volz-Heckathorn  Salganik  With replacement/percentile  1987 attempted network census of high‐risk heterosexuals in Colorado Springs, CO  52% (median)  11 (median; multiple traits)  Goel and Salganik  Volz-Heckathorn  Salganik  With replacement/percentile  Sample of U.S. adolescents in 7th–12th grades between 1994 and 1996  62% (median)  5.9 (median; multiple traits)  Lu et al. (Lu et al. 2012)  Volz-Heckathorn  N/A  N/A  Online social network in Sweden  N/A  5 to 13 (multiple traits)  Verdery et al. (Verdery et al. 2015)  Volz-Heckathorn  Salganik  With replacement/studentized  Sample of U.S. adolescents in 7th–12th grades between 1994 and 1996  68% (mean)  15 (mean; 3 traits)  Verdery et al.  Volz-Heckathorn  Salganik  With replacement/studentized  Multiple Facebook social networks of college students  65% (mean)  30 (mean; 2 traits)  Study  Point estimator  Variance estimator  Simulation approach/CI method  Population network data used for RDS simulations  95% CI coverage  DE result  Goel and Salganik (Goel and Salganik 2010)  Volz-Heckathorn  Salganik  With replacement/percentile  1987 attempted network census of high‐risk heterosexuals in Colorado Springs, CO  52% (median)  11 (median; multiple traits)  Goel and Salganik  Volz-Heckathorn  Salganik  With replacement/percentile  Sample of U.S. adolescents in 7th–12th grades between 1994 and 1996  62% (median)  5.9 (median; multiple traits)  Lu et al. (Lu et al. 2012)  Volz-Heckathorn  N/A  N/A  Online social network in Sweden  N/A  5 to 13 (multiple traits)  Verdery et al. (Verdery et al. 2015)  Volz-Heckathorn  Salganik  With replacement/studentized  Sample of U.S. adolescents in 7th–12th grades between 1994 and 1996  68% (mean)  15 (mean; 3 traits)  Verdery et al.  Volz-Heckathorn  Salganik  With replacement/studentized  Multiple Facebook social networks of college students  65% (mean)  30 (mean; 2 traits)  4. METHODS Evaluating RDS via simulation requires obtaining or creating a population network from which to draw samples and simulating RDS on that network. Previous studies have simulated RDS both on real and synthetic population networks. RDS is used to study hidden populations, so an RDS simulation study’s population network should be as similar to real hidden population networks as possible. Unfortunately, complete data for hidden population networks are extremely rare. Complete network data are difficult and expensive to collect in any setting (Morris 2004), and these challenges are compounded among populations whose members wish to remain hidden. Hidden population network data are unavailable, so the real population networks in previous RDS simulation studies have come from a variety of sources (table 1). Two of the studies used network data from a sample of United States adolescents in the 7th through 12th grades (the “Add Health” study) (Harris, Halpern, Whitsel, Hussey, Tabor, et al. 2009; Goel and Salganik 2010; Verdery et al. 2015), and another used Facebook network data from college students when Facebook only permitted college students to use the service (Verdery et al. 2015). Notably, both of these population networks are embedded within schools. 
In the United States, middle schools and high schools are highly structured by grade, with students typically taking classes only with others in the same grade. Colleges are less structured by grade, but they have additional structure along academic disciplines. Students in these settings often have friends outside their grades and disciplines, but their networks are strongly shaped and constrained by those institutional structures. Such institutional structures are not present for the vast majority of the hidden population networks RDS is used to sample. RDS variance is known to be strongly positively related to homophily, the extent to which networks are assortative along characteristics of its members. These school networks demonstrate strong homophily by grade, resulting in networks that may have “bottlenecks” between population subgroups (Goodreau, Kitts, and Morris 2009). As noted earlier, a key element of RDS estimators is self-reported degree. In Add Health, participants were asked to name up to five boy best friends and five girl best friends (Harris et al. 2009). The degree for a given participant was the sum of the number of persons she named and the number of times she was named by participants she did not name. In contrast, RDS study participants are typically asked to state the number of persons they know in the target population who also know them (Malekinejad, Johnston, Kendall, Kerr, Rifkin, et al. 2008), which serves as a proxy for the number of people who might recruit them into the study. Because of the difference in how degree is elicited between Add Health and RDS studies, the degree distribution for RDS studies typically has a higher mean and higher variance than those in the Add Health school networks (Malekinejad et al. 2008; Goodreau, Kitts, and Morris 2009). In sum, these school networks are very unlike the hidden population networks through which RDS coupons are typically passed, and some of their features mean that the known a priori that RDS will perform poorly in simulations. Given the differences between the available population network data and the networks of hidden populations, we based both our simulated population networks and our simulated RDS sampling process on real RDS studies. To maximize the similarity of our simulations to RDS as it occurs in the field, our simulations are designed to reflect RDS as it was used to sample PWID by the CDC’S National HIV Behavioral Surveillance system (NHBS) in 2009 and 2012. NHBS sampled PWID in 20 U.S. cities in both 2009 and 2012 using a standard protocol, resulting in 40 RDS samples (Centers for Disease Control and Prevention 2012, 2015). A flowchart of our simulation methods is presented in the supplementary material. To create the simulated population networks for our study (step 1 in the three-step process described above), we first estimated four characteristics of the PWID population in each NHBS city from each of the 40 NHBS samples: the prevalence and homophily for a two‐valued trait of public health interest; the estimated mean degree of population members; and differential activity (DA). Homophily is defined as the proportion of ties in the network between two respondents who share a trait status relative to what would be expected by chance. DA is a measure of one group’s gregariousness compared to another and is defined as the ratio of the mean degrees of population members with and without the trait. 
Summary statistics of these characteristics can be found in table 2.1 Using each of the 40 sets of characteristics, we then simulated 1,000 networks using exponential‐family random graph models (ERGMs) (Frank and Strauss 1986; Hunter and Handcock 2006; Hunter, Goodreau, and Handcock 2008; Hunter, Handcock, Butts, Goodreau, and Morris 2008; Handcock, Hunter, Butts, Goodreau, Krivitsky, et al. 2014), for a total of 40,000 networks. Each simulated network had a population size of 10,000 members. Table 2. Summary of 40 NHBS Samples Used to Create RDS Simulations Characteristic  Mean  SD  Median  Minimum  Maximum  Prevalence  0.104  0.0653  0.091  0.018  0.286  Mean degree  10.64  5.096  9.88  4.45  35.39  Homophily  1.226  0.2281  1.19  0.91  1.99  Differential activity  0.931  0.2098  0.92  0.53  1.44  Sample size  519.1  108.85  539.5  206  700  Number of Seeds  8  3.31  8  3  16  Number of seeds with trait  1.1  1.18  1  0  5  Number of seeds without trait  6.8  3.21  7  1  16  Number of seeds missing traita  0.13  0.404  0  0  2  % of coupons returned  30.60%  6.60%  33.20%  20.00%  49.80%  Number of recruits = 0b  33.90%  7.01%  35.50%  21.40%  48.00%  Number of recruits = 1b  21.80%  5.07%  22.10%  9.10%  32.10%  Number of recruits = 2b  17.70%  3.54%  18.20%  10.00%  25.10%  Number of recruits = 3b  10.50%  2.69%  10.00%  4.60%  16.00%  Number of recruits = 4b,c  1.70%  2.00%  0.67%  0%  7.70%  Number of recruits = 5b,c  0.54%  0.65%  0.30%  0%  2.40%  Characteristic  Mean  SD  Median  Minimum  Maximum  Prevalence  0.104  0.0653  0.091  0.018  0.286  Mean degree  10.64  5.096  9.88  4.45  35.39  Homophily  1.226  0.2281  1.19  0.91  1.99  Differential activity  0.931  0.2098  0.92  0.53  1.44  Sample size  519.1  108.85  539.5  206  700  Number of Seeds  8  3.31  8  3  16  Number of seeds with trait  1.1  1.18  1  0  5  Number of seeds without trait  6.8  3.21  7  1  16  Number of seeds missing traita  0.13  0.404  0  0  2  % of coupons returned  30.60%  6.60%  33.20%  20.00%  49.80%  Number of recruits = 0b  33.90%  7.01%  35.50%  21.40%  48.00%  Number of recruits = 1b  21.80%  5.07%  22.10%  9.10%  32.10%  Number of recruits = 2b  17.70%  3.54%  18.20%  10.00%  25.10%  Number of recruits = 3b  10.50%  2.69%  10.00%  4.60%  16.00%  Number of recruits = 4b,c  1.70%  2.00%  0.67%  0%  7.70%  Number of recruits = 5b,c  0.54%  0.65%  0.30%  0%  2.40%  a Assigned to be without trait for purposes of sampling simulation. b Among sample members who were given coupons. c These numbers include six studies where a maximum of three coupons were distributed per subject; the counts for those studies are constrained to be 0. We designed the RDS process (step #2 above) used in the simulations to match those observed in the NHBS samples by first measuring the following characteristics for each of the 40 NHBS samples: the sample size, the numbers of seeds with and without the trait, and the distribution of number of recruitments by sample members. Summary statistics of these characteristics can be found in table 2.2 For each of the 1,000 networks corresponding to a given NHBS sample, we simulated one RDS sample using the RDS package in the statistical software R (step #3 above) (Handcock, Fellows, and Gile 2015; R Core Team 2015). The simulated RDS process was implemented based on the RDS process characteristics of the NHBS sample described above; for example, a given simulated RDS sample had the same number of seeds as its corresponding NHBS sample. 
Because RDS samples do not allow for repeated participation, our baseline samples were without replacement. For each simulated RDS sample, we applied each of the four point/variance estimator pairs to the trait of interest. For each of the three RDS estimator pairs, we calculated 95 percent CIs using both the studentized and percentile bootstrap methods. Our analysis compares the coverage rates of the 95 percent CIs for the four point/variance estimator pairs and two bootstrap CI methods when sampling was with and without replacement, where the coverage rates are calculated as the proportion of the simulations in which the CI contained the true population prevalence of the trait. We calculate the actual DEs for our simulations numerically as ratio of the variance of the distribution of point estimates across simulations to the SRS variance, where the SRS variance includes a finite population adjustment based on the proportion of the population that was sampled. We calculate the estimated DEs for our simulations as the ratio of the estimated variance to the SRS variance. Each actual DE is calculated as the variance of the 1,000 simulations for each population network. Each estimated DE is calculated from a specific estimator pair applied to a single sampling simulation. Because the actual DE varies in magnitude across population networks, we summarize the estimated DEs’ accuracy by calculating the ratio of each simulation’s estimated DE to the actual DE for that population network. We compare the actual DEs for the four point estimators and also compare the actual DEs to the DEs estimated by the RDS variance estimators. 5. RESULTS Figure 1 presents the 95 percent CI coverage rates for the four estimator pairs for the 40 sets of RDS simulations conducted with the baseline condition of sampling without replacement and estimating the CI via the studentized bootstrap method. The horizontal axis of the figure is the nominal 95 percent CI coverage rate, and the vertical axis is the 40 simulation sets ordered from top to bottom by the SS coverage rate (the red line). 3 Figure 1. View largeDownload slide Ninety-Five Percent Confidence Interval (CI) Coverage Percentages for 40 Sets of RDS Simulations (Sampling without Replacement; Studentized Bootstrap CI Method). The horizontal axis is the nominal 95 percent CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the SS coverage percentage (the red line). The left panel’s horizontal axis ranges from 0 to 100 percent; the right panel’s horizontal axis ranges from 80 percent to 100 percent for detail. The coverage percentages for the sample mean do not appear in the right panel. Figure 1. View largeDownload slide Ninety-Five Percent Confidence Interval (CI) Coverage Percentages for 40 Sets of RDS Simulations (Sampling without Replacement; Studentized Bootstrap CI Method). The horizontal axis is the nominal 95 percent CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the SS coverage percentage (the red line). The left panel’s horizontal axis ranges from 0 to 100 percent; the right panel’s horizontal axis ranges from 80 percent to 100 percent for detail. The coverage percentages for the sample mean do not appear in the right panel. The left panel of figure 1 displays the full range of coverage rates on the horizontal axis. The sample mean performs poorly compared to the other estimators. 
Hence, the right panel omits the sample mean and displays coverage rates from 80 percent to 100 percent to allow more detailed comparison of the coverages for estimator pairs other than the Mean/SRS. The right panel reveals that the SH/Sal-BS and VH/Sal-BS estimates have similar performance to SS/SS-BS estimators for a majority of simulation sets, but that they have considerably worse coverage rates in at least four sets of simulations. The SS/SS‐BS estimator pair had higher overall coverage than the other two RDS estimator pairs: It only had one instance of coverage below 90 percent, whereas the SH/Sal‐BS and VH/Sal‐BS coverages were below 90 percent in five instances. The NHBS sample corresponding to the instance with SS/SS-BS coverage below 90 percent (86.8 percent) has trait prevalence of 0.034 and the smallest sample size (n = 210) of all the NHBS samples. The SH/Sal-BS and VH/Sal-BS had considerably worse coverage rates in four sets of simulations (figure 1, right panel: B-01, A-08, B-19, and A-01). The NHBS samples corresponding to these extreme cases had lower differential activity and higher homophily for the trait of interest than did the other samples. Table 3 shows summary statistics across the 40 simulation sets for these RDS estimator pair coverages along with results for the percentile bootstrap. For the SH/Sal-BS and VH/Sal-BS estimators, the studentized bootstrap performs better, with mean coverage rates 6 and 5.9 percentage points higher and median coverage rates 2 and 2.1 percentage points higher, respectively. For the SS/SS-BS, the results are very similar, with the studentized mean coverage rate 0.3 and median coverage rate 0.4 percentage points lower. Table 3. 95% Confidence Interval Coverage Percentages for Four RDS Point and Variance Estimator Pairs by Bootstrap CI Method Point estimator  Variance estimator  Bootstrap CI method  Mean  SD  Median  Range  Acceptable coverage %a  Sample mean  SRS variance  N/A  67.4  23.8  74.9  [14, 96]  5  Salganik-Heckathorn  Salganik  Percentile  87  12.8  91.9  [41, 96]  40  Salganik-Heckathorn  Salganik  Studentized bootstrap  93  2.8  93.9  [86, 97]  67.5  Volz-Heckathorn  Salganik  Percentile  87  12.8  91.8  [41, 96]  42.5  Volz-Heckathorn  Salganik  Studentized bootstrap  92.9  3.2  93.9  [82, 97]  67.5  Successive Sampling  Successive Sampling  Percentile  94.1  1.8  94.6  [87, 97]  80  Successive Sampling  Successive Sampling  Studentized bootstrap  93.8  1.8  94.2  [87, 96]  75  Point estimator  Variance estimator  Bootstrap CI method  Mean  SD  Median  Range  Acceptable coverage %a  Sample mean  SRS variance  N/A  67.4  23.8  74.9  [14, 96]  5  Salganik-Heckathorn  Salganik  Percentile  87  12.8  91.9  [41, 96]  40  Salganik-Heckathorn  Salganik  Studentized bootstrap  93  2.8  93.9  [86, 97]  67.5  Volz-Heckathorn  Salganik  Percentile  87  12.8  91.8  [41, 96]  42.5  Volz-Heckathorn  Salganik  Studentized bootstrap  92.9  3.2  93.9  [82, 97]  67.5  Successive Sampling  Successive Sampling  Percentile  94.1  1.8  94.6  [87, 97]  80  Successive Sampling  Successive Sampling  Studentized bootstrap  93.8  1.8  94.2  [87, 96]  75  aAcceptable coverage is calculated as the percentage of confidence intervals (CIs) with coverage between 93% and 97%, inclusive, for a given estimator pair and bootstrap CI method. Given that the conditions varied considerably across the 40 simulation sets, summary statistics such as the mean may mask meaningful variation in the coverages. 
Therefore, we calculated a summary measure of “acceptable coverage.”4 The Mean/SRS estimator had acceptable coverage in 5 percent of CIs. The SH/Sal‐BS and VH/Sal‐BS with studentized bootstrap estimator pairs produced acceptable coverage for 67.5 percent of CIs, and the SS/SS‐BS with percentile and studentized bootstrap CI methods produced acceptable coverages for 80 percent and 75 percent of CIs, respectively. We conducted additional simulations to investigate the 95 percent CI coverage rates for the VH/Sal-BS in our analysis (all greater than 80 percent, see figure 1, which were higher than the VH/Sal-BS coverage rates reported in the seminal Goel and Salganik paper (medians of 52 percent and 62 percent coverage for the two samples analyzed) and the paper by Verdery and colleagues (means of 68 percent and 65 percent for the two samples analyzed) (Goel and Salganik 2010; Verdery et al. 2015). We hypothesized that the simulation of RDS sampling with replacement or the use of the percentile bootstrap CI method impacted the coverage findings in those papers. Figure 2 presents the coverage rates for the VH/Sal‐BS estimator pair under four conditions: sampling with replacement with percentile bootstrap CIs, sampling with replacement with studentized bootstrap CIs, sampling without replacement with percentile bootstrap CIs, and sampling without replacement with studentized bootstrap CIs. This figure shows that the estimator applied to simulations using without-replacement sampling and the studentized bootstrap method (purple line and triangles) consistently outperform simulations using with-replacement sampling and the percentile bootstrap (red line and circles). Figure 2. View largeDownload slide Ninety-Five Percent Confidence Interval (CI) Coverage Percentages for 40 Sets of RDS Simulations (VH/Sal-BS Estimator Pair) by Bootstrap CI Method and Sampling with and without Replacement. The horizontal axis is the nominal 95% CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the without replacement, studentized bootstrap condition (the purple line and triangles). Figure 2. View largeDownload slide Ninety-Five Percent Confidence Interval (CI) Coverage Percentages for 40 Sets of RDS Simulations (VH/Sal-BS Estimator Pair) by Bootstrap CI Method and Sampling with and without Replacement. The horizontal axis is the nominal 95% CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the without replacement, studentized bootstrap condition (the purple line and triangles). Table 4 summarizes the DEs from our RDS simulations. The first four rows of table 4 show the actual DEs for samples drawn without replacement for the sample mean, SH, VH, and SS estimators. The median DEs for the SH, VH, and SS point estimators (table 4, rows 2–4) were approximately 1.7, which is similar to the DEs observed for other complex sampling methods (Pettersson and Silva 2005; U.S. Census Bureau 2006). For both the VH and SS estimators, the maximum DE was between 6 and 6.2; the maximum DE for SH was 95.5. In addition to its maximum DE of 95.5, the SH had three additional DEs that were much higher than expected. The DEs in these four scenarios were caused by the SH estimator failing for between two and six of the 1,000 simulation runs. 
Specifically, in these cases the SH produced a trait prevalence of 1 when the true prevalence was less than 0.08.5 The last row of table 4 shows the DEs of the VH estimator for sampling with replacement, which was the RDS simulation process used in the papers by Goel and Salganik and Verdery and colleagues (Goel and Salganik 2010; Verdery et al. 2015). Note that for every summary statistic, the DEs are higher for the VH sampling with replacement condition than for the VH sampling without replacement condition. Table 4. Design Effects for Four RDS Point Estimators by Sampling with or without Replacement Point estimator (sampling method)  Range  Median  Mean  SD  Sample mean (without replacement)  [0.75, 2.64]  1.34  1.42  0.49  Salganik-Heckathorn (without replacement)  [0.83, 95.51]  1.72  7.47  19.96  Volz-Heckathorn (without replacement)  [0.81, 6.19]  1.69  1.91  0.96  Successive Sampling (without replacement)  [0.83, 6.03]  1.66  1.89  0.93  Volz-Heckathorn (with replacement)a  [1.01, 7.97]  2.34  2.77  1.48  Point estimator (sampling method)  Range  Median  Mean  SD  Sample mean (without replacement)  [0.75, 2.64]  1.34  1.42  0.49  Salganik-Heckathorn (without replacement)  [0.83, 95.51]  1.72  7.47  19.96  Volz-Heckathorn (without replacement)  [0.81, 6.19]  1.69  1.91  0.96  Successive Sampling (without replacement)  [0.83, 6.03]  1.66  1.89  0.93  Volz-Heckathorn (with replacement)a  [1.01, 7.97]  2.34  2.77  1.48  a Point estimator and sampling method used in Goel and Salganik 2010. Table 5 compares the estimated and actual DEs by estimator pair and sampling method. It summarizes the performance of the DEs estimated by a given estimator pair and sampling method by comparing the distribution of estimated DE to actual DE ratios across the 40,000 simulations to a benchmark. It presents three benchmarks: estimated DEs within a factor of 1.5 (i.e., 60 percent to 150 percent) of the actual DE, a factor of 2 (i.e., 50 percent to 200 percent) of the actual DE, and a factor of 3 (i.e., 33 percent to 300 percent) of the actual DE. For each benchmark, it presents the percent of estimated DEs that were within that factor and the percent of those that were within the factor that were too low. For example, 78.1 percent of the SH/Sal-BS without replacement estimated DEs were within a factor of 1.5 of the actual DE; of that 78.1 percent, 45.8 percent were too low. Table 5. 
Comparison of Estimated and Actual Design Effects by Estimator Pair and Sampling Method Point estimator (sampling method)  Within a factor of 1.5 of the actual DEa   Within a factor of 2 of the actual DEb   Within a factor of 3 of the actual DEc   Percent  Percent of those within factor that are too low  Percent  Percent of those within factor that are too low  Percent  Percent of those within factor that are too low  SH/Sal-BS (without replacement)  78.1  45.8  83.7  46.2  88.3  47.0  VH/Sal-BS (without replacement)  83.5  46.6  91.8  47.3  97.5  48.0  SS/SS-BS (without replacement)  92.9  51.3  98.6  51.0  99.8  51.0  VH/Sal-BS (with replacement)*  56.9  79.4  79.9  82.8  93.5  84.5  Point estimator (sampling method)  Within a factor of 1.5 of the actual DEa   Within a factor of 2 of the actual DEb   Within a factor of 3 of the actual DEc   Percent  Percent of those within factor that are too low  Percent  Percent of those within factor that are too low  Percent  Percent of those within factor that are too low  SH/Sal-BS (without replacement)  78.1  45.8  83.7  46.2  88.3  47.0  VH/Sal-BS (without replacement)  83.5  46.6  91.8  47.3  97.5  48.0  SS/SS-BS (without replacement)  92.9  51.3  98.6  51.0  99.8  51.0  VH/Sal-BS (with replacement)*  56.9  79.4  79.9  82.8  93.5  84.5  a Between 66% and 150% of the actual DE. b Between 50% and 200% of the actual DE. c Between 33% and 300% of the actual DE. * Point estimator and sampling method used in Goel and Salganik 2010. Table 5 shows that for without-replacement sampling, the pattern of estimated DE performance for the estimators is consistent for all three benchmarks: the SS/SS-BS estimator pair had the highest percentage within the factor, the VH/Sal-BS had the second-highest percentage, and the SH/Sal-BS had the lowest percentage. This ordering was the same for the percentage of estimates within the benchmark that were too low, with the SS/SS-BS pair having the most even distribution (percentages closest to 50 percent). This pattern reflects the SH/Sal-BS and VH/Sal-BS pairs having less accurate estimated DEs that are biased upward and the SS/SS-BS pair having more accurate estimated DEs that are approximately unbiased. For with-replacement sampling, the VH/Sal-BS estimator pair shows much lower accuracy than all three without replacement estimators for the most stringent benchmark factor. It also has a high proportion of estimated DEs that are too low, with more than 79 percent of estimated DEs lower than the actual DE. 6. DISCUSSION Our simulations suggest that the coverage of 95 percent CIs for RDS samples is usually above 90 percent (with no coverage rates above 97 percent). This is better than past work has suggested, demonstrating that reasonably accurate RDS variance estimation is feasible and that conclusions drawn from past analyses of RDS data that applied one of these estimators may well be reasonable in scenarios where RDS assumptions are met. While the RDS estimators performed better than expected, the SRS variance estimator significantly underestimates the variability of RDS samples and provides very low coverage. Because of the complexity of RDS, it may be tempting to dispense with complicated inferential approaches and use the sample mean and SRS variance approximation. Our results show that this approach is likely to cause significant underestimation of uncertainty and lead to misleading conclusions. We found that the SS/SS‐BS estimator pair had overall higher coverage than the other two estimator pairs. 
The SS/SS-BS exhibited its lowest coverage when applied to a sample with lower prevalence and a smaller sample size than the other samples. In contrast, the SH/Sal‐BS and VH/Sal‐BS had lower coverage for samples, with levels of differential activity much lower than those of the other samples in combination with higher levels of homophily than those of the other samples. Note that the difference between the SS and VH estimators is a finite population adjustment that requires knowing the true size of the population, which is typically unavailable. The impact of error in the population size specified for the SS estimator in a given sample is a function of the true size of the population. The impact is relative, so the impact of a given absolute error in the specified population size will be larger for smaller population sizes (e.g., an error of 500 in the specified population size will have more impact when the true population size is 1,000 than when it is 10,000). For large population sizes, the SS estimator approaches the VH estimator because the finite population adjustment has little impact, so using the SS estimator with an overestimated population size will pull it toward the VH estimate. Therefore, the SS will perform at least as well as the VH unless the population size is dramatically underestimated. The relationship between a population’s characteristics and RDS CI coverage is complex, so the specific relationships between prevalence, sample size, and homophily and the performance of RDS estimator pairs require further investigation. More generally, a large number of population characteristics must be systematically varied in a simulation to disentangle the combinations of factors that influence RDS CI coverage. While other work has suggested RDS variance estimators perform poorly, our analysis suggests those results can, at least partially, be attributed to the choice of bootstrap method and unrealistic use of with-replacement sampling in prior studies. For the SH and VH estimators, we found that using the studentized bootstrap, as compared to the percentile bootstrap, significantly increased the percentage of CIs with good coverage from 40 to 67.5 and 42.5 to 67.5, respectively (table 2). Goel and Salganik’s findings of low CI coverage were likely at least partially due to their use of with‐replacement sampling and the percentile bootstrap CI method (see figure 2). Chernick and LaBudde (2014) studied the relative performance of studentized and percentile bootstrap CI estimates and found that in most scenarios the studentized approach is more accurate. We also found significantly smaller DEs than Goel and Salganik, with evidence that sampling with replacement increases the DE. For example, for without-replacement sampling, both SS/SS and VH/Sal-BS produced actual DEs less than 3 in 92.5 percent of our conditions (37/40) and 62.5 percent less than 2, whereas for with-replacement sampling the VH/Sal-BS estimator pair DE was less than 3 in only 67.5 percent of our conditions, with only 30 percent less than 2. This echoes findings by Lu and colleagues and Gile and Handcock that sampling without replacement may reduce the DEs for RDS (Gile and Handcock 2010; Lu et al. 2012). The estimated DEs were more accurate for sampling without replacement than for sampling with replacement. 
For example, for the VH/Sal-BS estimator pair sampling without replacement produced estimated DEs within a factor of 2 of the actual DEs 91.8 percent of the time, with slightly less than half (47.3 percent) being lower than the actual DE (the anticonservative direction). In contrast, for with-replacement sampling, the estimated DEs were within a factor of 2 of the actual DEs only 79.9 percent of the time, with a significant majority (82.8 percent) being lower than the actual DE. Overall, the estimated DEs for the SS/SS estimator pair were the most accurate: 92.9 percent within a factor of 1.5, with fewer large outliers (see the technical supplementary material for more detail). The RDS sampling process is highly complex and only partially observed in real RDS studies, so many choices about simulation design and specification must be made without reference to empirical data. Because the ultimate goal of RDS simulation studies is to understand how RDS performs in the real world, we recommend conducting RDS simulations without replacement and with parameters informed by real RDS samples to the extent possible. This study’s simulations find that RDS DEs are in the range suggested in other methodological work that did not use simulation studies (Wejnert, Pham, Krishna, Le, and DiNenno 2012). We found that simulated RDS DEs in cases chosen to approximate the NHBS are usually between 1 and 3, in contrast with suggestions in past simulation work that DEs may often be greater than 10 (Goel and Salganik 2010; Verdery et al. 2015). This means that, in instances where RDS assumptions are met, RDS provides samples with statistical precision similar to that of other complex sampling methods (although with significantly less precision than simple random samples of the same populations). We used data from a large number of real RDS studies to parameterize our simulated networks and RDS sampling process. These RDS samples were of PWID in large U.S. urban areas, so the results are likely most applicable to RDS samples drawn from large cities. Most of the largest RDS studies in the world occur in such places, such as studies conducted in China and Brazil (Szwarcwald, de Souza Junior, Damacena, Junior, and Kendall 2011; Li et al. 2014). However, many RDS samples are drawn from smaller populations in less urban areas, which may have population networks with significantly different structures than those in NHBS cities (Malekinejad et al. 2008). Sampling fractions may be substantial in studies of small populations, making it important to use variance estimators that reflect sampling without replacement (which the SH and VH estimators do not). McCreesh and colleagues conducted an RDS methodological study in Uganda that is more similar to such small populations than NHBS samples (McCreesh et al. 2012). They found that some subpopulations underrepresented in the sample did not have correspondingly lower mean degree, which led the RDS estimators to perform poorly. However, the poor performance of RDS estimators was also partially due to some recruiters’ misunderstanding of which population members were eligible for the study (and should be considered for recruitment) due to differences between the researchers’ and the local population’s interpretation of the language used to communicate the eligibility criteria (McCreesh et al. 2012). This misunderstanding led to systematically biased recruitment by some sample participants. 
With all sampling methods, but especially in peer-driven methods such as RDS, it is critical that researchers understand and account for the cultural norms and context of the communities they are sampling. These differences in population structure, RDS execution, and RDS estimation highlight the importance of context in understanding the applicability of RDS methodological study findings. Our results are subject to a number of limitations. First, although the networks created for our simulations were designed with some structural characteristics similar to those of PWID networks in NHBS cities, the true structure and complexity of hidden population networks is unknown. Almost all social networks contain structure that is not observed in RDS data. For example, an outcome might vary across a city’s neighborhoods, and the PWID networks in some neighborhoods may have few connections to those in other neighborhoods. The ERGM used to create our simulated networks did not directly specify such complex network structure as it is unclear what the correct levels of such structure should be. Note that for such network structure to strongly influence RDS estimation, it must be strongly related to the outcome (e.g., quite different prevalences of the trait across the weakly connected subgroups). Second, the characteristics we used to create the networks for our simulations were estimated from NHBS samples using RDS estimators. Therefore, the simulations are not replicates of the 40 samples collected by NHBS but are, instead, examples of networks and RDS processes similar to those observed in the NHBS samples. The results may be sensitive to our use of large networks and small sampling fractions as in the NHBS samples. The stability of NHBS samples of PWID over time suggests that our findings are applicable to future NHBS studies of PWID. Third, our simulations implemented RDS with only a few statistical assumptions not met. Both the SH and VH point estimators assume that recruitment trees do not branch (i.e., each sample member makes exactly one recruitment) and that sampling is with replacement, neither of which was true in our simulations. Other RDS statistical assumptions such as participants recruiting randomly from their set of contacts and, for the SS estimator, that the population size is known, were met. It is known that violations of RDS point estimator assumptions decrease the accuracy of RDS point estimates (Gile and Handcock 2010; Tomas and Gile 2011; Lu et al. 2012). This is likely true for RDS variance estimators as well. Future work will examine the effects of violations of assumptions on the performance of RDS variance estimators. Fourth, our analysis did not evaluate all RDS variance estimators. Some work has proposed new point estimators that were accompanied by minor modifications to an existing variance estimator to incorporate the new point estimator (Lu 2013; Lu, Malmros, Liljeros, and Britton 2013). Gile and Handcock introduced an estimator that simulates RDS on a synthetic network created from characteristics of the sample data (Gile and Handcock 2015). Yamanis and colleagues proposed a modification to the Salganik bootstrap that reflects the branching structure of RDS samples (Yamanis, Merli, Neely, Tian, Moody, et al. 2013). Baraff and colleagues recently proposed a tree bootstrap in which each resample replicates recruitment trees’ structures by sampling with replacement from each recruiter’s set of recruits (Baraff, McCormick, and Raftery 2016). 7. 
CONCLUSION Sampling hidden populations is critical for public health surveillance and planning around the world. RDS is effective at reaching members of hidden populations that other sampling methods cannot and is inexpensive enough to be used in low‐resource settings. These strengths have led to its wide use around the world for many different applications. Past research on RDS variance estimation suggested that RDS variance estimator CIs provide very low coverage rates and that RDS has higher DEs than has been assumed in the public health literature (Goel and Salganik 2010; Verdery et al. 2015). Our results indicate instead that both CI coverage rates and DEs are often acceptable but not perfect. However, researchers should evaluate whether a given study has characteristics similar to those found in our simulations that produced good (or poor) coverage. Additionally, deviations from the assumed RDS sampling process or population network structures not examined in this paper may impact the CI coverage rates and DE magnitudes for a given study. RDS is used around the world to sample hidden populations that suffer from high rates of infection by HIV and other diseases. It is critical that researchers draw correct conclusions from RDS data by applying appropriate statistical techniques. We look forward to an improved understanding of RDS estimation that will better inform the policies critical to preventing and reducing the burden of disease borne by hidden populations worldwide. Supplementary Materials Supplementary materials are available online at http://www.oxfordjournals.org/our_journals/jssam/. Footnotes 1 De-identified characteristics for each sample may be found in the supplementary materials. 2 De-identified characteristics for each sample may be found in the supplementary materials. 3 Sample numbers are prefixed with “A” for samples from 2009 and “B” for samples from 2012. Sample numbers were randomly assigned to cities and are consistent across the two survey years (e.g., A-01 is the same city as B-01). 4 Acceptable coverage percent is calculated as the percentage of confidence intervals (CIs) with coverage between 93 percent and 97 percent, inclusive, for a given estimator pair and bootstrap CI method. 5 We have also observed this pattern of SH estimator behavior in its implementation in the Respondent-Driven Sampling Analysis Tool v. 7.1 software (Volz, Wejnert, Cameron, Spiller, Barash, et al. 2012). It typically, but not always, occurs when “0” cells are present in the recruitment matrix (e.g., when two population subgroups do not recruit one another). References Baraff A. J., McCormick T. H., Raftery A. E. ( 2016), “ Estimating Uncertainty in Respondent-Driven Sampling Using a Tree Bootstrap Method,” Proceedings of the National Academy of Sciences U S A , 113, 14668– 14673. Google Scholar CrossRef Search ADS   Centers for Disease Control and Prevention ( 2012), “ HIV Infection and HIV-Associated Behaviors among Injecting Drug Users—20 Cities, United States, 2009,” MMWR , 61, 133– 138. PubMed  Centers for Disease Control and Prevention. ( 2015), “ HIV Infection and HIV-Associated Behaviors among Persons Who Inject Drugs—20 Cities, United States, 2012,” MMWR , 64(10), 270– 275. PubMed  Chernick M. R., LaBudde R. A. ( 2014), An Introduction to Bootstrap Methods with Applications to R , New Jersey: John Wiley & Sons. Davison A. C., Hinkley D. V. ( 1997), Bootstrap Methods and Their Applications , Cambridge: Cambridge University Press. 
References

Baraff A. J., McCormick T. H., Raftery A. E. (2016), "Estimating Uncertainty in Respondent-Driven Sampling Using a Tree Bootstrap Method," Proceedings of the National Academy of Sciences USA, 113, 14668–14673.

Centers for Disease Control and Prevention (2012), "HIV Infection and HIV-Associated Behaviors among Injecting Drug Users—20 Cities, United States, 2009," MMWR, 61, 133–138.

Centers for Disease Control and Prevention (2015), "HIV Infection and HIV-Associated Behaviors among Persons Who Inject Drugs—20 Cities, United States, 2012," MMWR, 64(10), 270–275.

Chernick M. R., LaBudde R. A. (2014), An Introduction to Bootstrap Methods with Applications to R, New Jersey: John Wiley & Sons.

Davison A. C., Hinkley D. V. (1997), Bootstrap Methods and Their Applications, Cambridge: Cambridge University Press.

Efron B., Tibshirani R. (1986), "Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy," Statistical Science, 1(1), 54–75.

Frank O., Strauss D. (1986), "Markov Graphs," Journal of the American Statistical Association, 81, 832–842.

Gallagher K. M., Sullivan P., Lansky A., Onorato I. M. (2007), "Behavioral Surveillance among People at Risk for HIV Infection in the U.S.: The National HIV Behavioral Surveillance System," Public Health Reports, 122(Suppl 1), 32–38.

Gile K. J. (2011), "Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation," Journal of the American Statistical Association, 106, 135–146.

Gile K. J., Handcock M. S. (2010), "Respondent-Driven Sampling: An Assessment of Current Methodology," Sociological Methodology, 40, 285–327.

Gile K. J., Handcock M. S. (2015), "Network Model-Assisted Inference from Respondent-Driven Sampling Data," Journal of the Royal Statistical Society: Series A (Statistics in Society), 178, 619–639.

Goel S., Salganik M. J. (2010), "Assessing Respondent-Driven Sampling," Proceedings of the National Academy of Sciences USA, 107, 6743–6747.

Goodreau S. M., Kitts J. A., Morris M. (2009), "Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks," Demography, 46, 103–125.

Handcock M. S., Fellows I. E., Gile K. J. (2015), "RDS: Respondent-Driven Sampling," https://cran.r-project.org/web/packages/RDS/citation.html.

Handcock M. S., Hunter D. R., Butts C. T., Goodreau S. M., Krivitsky P. N., Morris M. (2014), "ERGM: Fit, Simulate and Diagnose Exponential-Family Models for Networks," https://cran.r-project.org/web/packages/ergm/citation.html.

Harris K. M., Halpern C. T., Whitsel E., Hussey J., Tabor J., Entzel P., Udry J. R. (2009), "The National Longitudinal Study of Adolescent Health: Research Design," http://www.cpc.unc.edu/projects/addhealth/faqs/addhealth.

Heckathorn D. D. (1997), "Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations," Social Problems, 44, 174–199.

Hladik W., Barker J., Ssenkusu J. M., Opio A., Tappero J. W., Hakim A., Serwadda D., for the Crane Survey Group (2012), "HIV Infection among Men Who Have Sex with Men in Kampala, Uganda–a Respondent Driven Sampling Survey," PLoS One, 7, e38143.

Hunter D. R., Goodreau S. M., Handcock M. S. (2008), "Goodness of Fit of Social Network Models," Journal of the American Statistical Association, 103(481), 248–258.

Hunter D. R., Handcock M. S. (2006), "Inference in Curved Exponential Family Models for Networks," Journal of Computational and Graphical Statistics, 15.

Hunter D. R., Handcock M. S., Butts C. T., Goodreau S. M., Morris M. (2008), "ERGM: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks," Journal of Statistical Software, 24, 1–29.

Li X., Lu H., Cox C., Zhao Y., Xia D., Sun Y., He X., et al. (2014), "Changing the Landscape of the HIV Epidemic among MSM in China: Results from Three Consecutive Respondent-Driven Sampling Surveys from 2009 to 2011," BioMed Research International, 2014, 10.

Lu X. (2013), "Linked Ego Networks: Improving Estimate Reliability and Validity with Respondent-Driven Sampling," Social Networks, 35, 669–685.

Lu X., Bengtsson L., Britton T., Camitz M., Kim B. J., Thorson A., Liljeros F. (2012), "The Sensitivity of Respondent-Driven Sampling," Journal of the Royal Statistical Society: Series A (Statistics in Society), 175, 191–216.

Lu X., Malmros J., Liljeros F., Britton T. (2013), "Respondent-Driven Sampling on Directed Networks," Electronic Journal of Statistics, 7, 292–322.

Malekinejad M., Johnston L. G., Kendall C., Kerr L. R., Rifkin M. R., Rutherford G. W. (2008), "Using Respondent-Driven Sampling Methodology for HIV Biological and Behavioral Surveillance in International Settings: A Systematic Review," AIDS and Behavior, 12, S105–S130.

McCreesh N., Frost S. D., Seeley J., Katongole J., Tarsh M. N., Ndunguse R., Jichi F., et al. (2012), "Evaluation of Respondent-Driven Sampling," Epidemiology, 23, 138–147.

Merli M. G., Moody J., Smith J., Li J., Weir S., Chen X. (2015), "Challenges to Recruiting Population Representative Samples of Female Sex Workers in China Using Respondent Driven Sampling," Social Science and Medicine, 125, 79–93.

Morris M. (2004), Network Epidemiology: A Handbook for Survey Design and Data Collection, Oxford, England: Oxford University Press.

Pettersson H., Silva P. L. (2005), "Analysis of Design Effects for Surveys in Developing Countries," in Household Sample Surveys in Developing and Transition Countries, New York: United Nations Department of Economic and Social Affairs Statistics Division, https://unstats.un.org/unsd/HHsurveys/pdf/Household_surveys.pdf.

R Core Team (2015), "R: A Language and Environment for Statistical Computing," https://cran.r-project.org/doc/FAQ/R-FAQ.html#Citing-R.

Salganik M. J. (2006), "Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling," Journal of Urban Health, 83, i98–i112.

Salganik M. J., Heckathorn D. D. (2004), "Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling," Sociological Methodology, 34, 193–239.

Szwarcwald C. L., de Souza Junior P. R., Damacena G. N., Junior A. B., Kendall C. (2011), "Analysis of Data Collected by RDS among Sex Workers in 10 Brazilian Cities, 2009: Estimation of the Prevalence of HIV, Variance, and Design Effect," Journal of Acquired Immune Deficiency Syndromes, 57(Suppl 3), S129–S135.

Tomas A., Gile K. J. (2011), "The Effect of Differential Recruitment, Non-Response and Non-Recruitment on Estimators for Respondent-Driven Sampling," Electronic Journal of Statistics, 5, 899–934.

U.S. Census Bureau (2006), "Current Population Survey Design and Methodology Technical Paper 66."

Verdery A. M., Mouw T., Bauldry S., Mucha P. J. (2015), "Network Structure and Biased Variance Estimation in Respondent Driven Sampling," PLoS One, 10, e0145296.

Volz E., Heckathorn D. D. (2008), "Probability Based Estimation Theory for Respondent Driven Sampling," Journal of Official Statistics, 24, 79.

Volz E., Wejnert C., Cameron C., Spiller M. W., Barash V., Degani I., Heckathorn D. D. (2012), "Respondent-Driven Sampling Analysis Tool (RDSat) Version 7.1."

Wejnert C. (2009), "An Empirical Test of Respondent-Driven Sampling: Point Estimates, Variance, Degree Measures, and Out-of-Equilibrium Data," Sociological Methodology, 39, 73–116.

Wejnert C., Pham H., Krishna N., Le B., DiNenno E. (2012), "Estimating Design Effect and Calculating Sample Size for Respondent-Driven Sampling Studies of Injection Drug Users in the United States," AIDS and Behavior, 16, 797–806.

Yamanis T. J., Merli M. G., Neely W. W., Tian F. F., Moody J., Tu X., Gao E. (2013), "An Empirical Analysis of the Impact of Recruitment Patterns on RDS Estimates among a Socially Ordered Population of Female Sex Workers in China," Sociological Methods and Research, 42, 392–425.

Published by Oxford University Press on behalf of the American Association for Public Opinion Research 2017. This work is written by US Government employees and is in the public domain in the US.
