Objective: We sought to establish the extent of repeat participation in a large annual cross-sectional survey of people who inject drugs and assess its implications for analysis.

Results: We used “porn star names” (the name of each participant’s first pet followed by the name of the first street in which they lived) to identify repeat participation in three Australian Illicit Drug Reporting System surveys. Over 2013–2015, 2468 porn star names (96.2%) appeared only once, 88 (3.4%) twice, and nine (0.4%) in all 3 years. We measured design effects, based on the between-cluster variability for selected estimates, of 1.01–1.07 for seven key variables. These values indicate that the complex sample is (e.g.) 7% less efficient in estimating prevalence of heroin use (ever) than a simple random sample, and 1% less efficient in estimating number of heroin overdoses (ever). Porn star names are a useful means of tracking research participants longitudinally while maintaining their anonymity. Repeat participation in the Australian Illicit Drug Reporting System is low (less than 5% per annum), meaning point-prevalence and effect estimation without correction for the lack of independence in observations is unlikely to seriously affect population inference.

Keywords: Repeat participation, Cross-sectional survey, Point prevalence estimation, Design effect estimation, Population inference

Introduction

Primary health research involving complex sampling often employs inappropriate statistical approaches to inference, and often gives insufficient detail to provide methodological clarity [1, 2]. Related to this is the issue in repeated cross-sectional studies whereby pooled cross-sectional estimation in the presence of repeat responses from the same individuals can yield biased estimates and incorrect estimates of standard error if inappropriate statistical methodology is applied [2, 3]. Typically, failure to account for such lack of independence in observations or clustering will underestimate standard errors, resulting in biased inference which in turn may lead to type I error [4, 5]. The work described in this manuscript is the side product of another research project (the IDRS).

Australia’s Illicit Drug Reporting System (IDRS) is an annual survey of people who inject drugs (PWID), designed to provide nationally comparable data about patterns of injecting drug use and related harms and inform future policy and research initiatives. A tacit assumption in the field, and in analysis of IDRS data, has been that similar cohorts of PWID participate in these surveys repeatedly. Given this assumption and the potential problems associated with failure to incorporate a complex sample design into inference estimation, we sought to establish the extent of repeat participation among IDRS participants and assess the implications for reliable analysis of IDRS data.

Main Text

Methods

The IDRS involves a quantitative survey of PWID recruited from capital cities in all Australian states and territories. The same methodology has been employed since 1997. To be eligible, participants are required to have injected drugs at least monthly in the 6 months preceding interview and to have resided in the same capital city during the previous 12 months. Convenience sampling is facilitated by recruitment notices at needle and syringe programs, staff at these services advising potential participants of the research, and snowballing (i.e., the recruitment of participants’ friends and associates via word of mouth).

The IDRS requires annual ethics approval: the University of New South Wales Ethics Committee approved the national IDRS (# HC12086) in 2015. Informed consent to participate in the study was obtained from all participants.

Although the core of the IDRS questionnaire has varied little since 1997, occasional changes are made to accommodate new issues and facilitate new analyses. Since 2013, IDRS interviewers have been asking participants for their “porn star name” (PSN: the name of their first pet followed by the name of the first street in which they lived; e.g. the second author’s PSN is Sam Banyan) in order to have some ability to assess the overlap in participation across years. PSN has previously been shown to be a unique and reliable identifier. IDRS participants are asked for the information needed to create their PSN as follows: “What was your first pet’s name [if no pet, star sign]?” “What is the name of the first street you ever lived in [or can remember living in]?”

The two names given in each annual IDRS (2013–2015) were exported to separate columns in an Excel file with IDRS years in a third column, sorted alphabetically on first (pet) names, inspected for discrepancies in spelling, punctuation and capitalisation between plausibly matching names, then sorted on second (street) names and inspected again. Names occurring in two or three IDRS iterations were counted and unique IDs assigned to them.

We estimated the effect of repeat observation/participation in terms of the variance associated with parameter estimates by calculating the design effect (DEFF) based on the between-cluster variability for selected prevalence estimates (use of heroin, last 6 months and ever; use of crystal methamphetamine (‘ice’), last 6 months and ever; injected with a needle used by someone else, last month; number of heroin overdoses, ever; number of injections, last month). The DEFF represents the ratio of the variance of the complex estimator (i.e., accounting for participant clustering from repeated observations) to that assuming prevalence was estimated on truly independent observations from a simple random sample (SRS). For comparative purposes and given the convenience sampling used in the IDRS, we also estimated the sample DEFF using jackknife re-sampling variance estimation, essentially deriving the ratio of the variance from the jackknife variance estimator accounting for participant clustering to that from jackknife variance estimation assuming no repeat observations. (Jackknife variance estimation is a data-dependent estimation method (i.e., not based on normal theory) which estimates variance between point estimates from a process of iterative data re-sampling (based on the number of sample units in the sample), where in each re-sampled set of data one observation (either an individual response or a set of responses in the case of estimation accounting for participant clustering) is omitted.) Univariate sample means (proportions for dichotomous measures) were produced to estimate prevalence for each factor. Taylor-linearised standard errors were used to report 95% confidence intervals about point estimates, taking account of the lack of independence in observations. Analyses were undertaken using SPSS (version 22) and Stata (version 13.1) statistical packages.

Results

Eight hundred and eighty-six, 898 and 887 IDRS participants supplied PSNs in 2013, 2014 and 2015 respectively, giving 2565 unique names. Across the three IDRS samples, 2468 PSNs (96.2%) appeared only once, 88 (3.4%) twice (“doubles”), and nine (0.4%) in all 3 years (“triples”), giving a mean of 1.04 responses per participant. Of the 88 doubles, 79 (89.8%) occurred in consecutive years. Including triples, 29 names (1.1%) appeared in both 2013 and 2015. Forty-four PSNs in the 2014 IDRS (4.9%) were observed in 2013, and 43 names in the 2015 IDRS (4.8%) in 2014. The low incidence of repeat observations across three successive sets of IDRS participants suggests the sample is almost entirely renewed every 2 years. Table 1 shows the estimated prevalences of selected behaviours across 2013–15 and their accuracy.

Table 1 Prevalences of selected behaviours, 2013–2015: mean, 95% confidence interval (95% CI) and design effect (DEFF)

Self-reported behaviour                                              Mean(a)  95% CI         DEFF1(b)  DEFF2(c)
Used heroin, ever (n = 2565)                                         .88      (.86, .89)     1.07      1.07
Used heroin, last 6 months (n = 2565)                                .59      (.57, .61)     1.06      1.05
Used ice, ever (n = 2562)                                            .81      (.79, .82)     1.06      1.06
Used ice, last 6 months (n = 2564)                                   .61      (.59, .63)     1.04      1.04
Injected with a needle used by someone else, last month (n = 2489)   .07      (.06, .08)     1.02      1.02
No. of heroin overdoses, ever (n = 2081)                             2.3      (1.8, 2.7)     1.01      1.01
No. of injections, last month (n = 2426)                             38.3     (36.1, 40.5)   1.03      1.03

(a) Indicates proportions for binary measures and average counts for interval measures
(b) DEFF1 = ratio of the variance of the complex estimator to that from estimation assuming SRS (e.g. DEFF of 1.06 indicates the complex sample is 6% less efficient in estimating prevalence than an SRS)
(c) DEFF2 = ratio of the variance of the jackknife estimator accounting for participant clustering in response to that from jackknife estimation without accounting for clustering

Discussion

The finding of negligible overlap between IDRS samples lends support to the notion that Australian PWID ageing is a population effect rather than a sample-specific one, and means that point-prevalence and effect estimation without correction for the lack of independence in observations is unlikely to seriously affect population inference. Nonetheless, as our analysis shows, repeated cross-sectional IDRS samples do exhibit a small degree of repeat observation across periods and this does inflate standard error marginally when estimating prevalence. This research also demonstrates that using a participant-generated anonymous unique identifier is an effective means by which to identify participant clustering in repeated cross-sectional data and can be used to estimate the degree of non-independence in sampling and correct standard errors if necessary. Despite the evidence that the level of non-independence of samples is low, in light of this lack of independence, appropriate and more conservative methods of estimation of standard error (e.g. Taylor-linearised, cluster robust, jackknife or bootstrapped standard errors) should be used where possible. Furthermore, where more complex variance estimators are used in the estimation of standard error, it is important that the methodological approach be detailed comprehensively in published work in order to inform assessment of the quality of the research and to provide guidance for those with similar data.

Limitations

More than 10% of IDRS participants in each year did not supply a PSN, affecting the accuracy of our estimates to an unknown extent. Several participants reported no first pet so gave a star sign instead (resulting in PSNs such as “Cancer Unknown”), and several unusual street names (from the same city) were repeated but accompanied by a pet name in 1 year and a star sign in another, which we regarded as denoting different individuals. It is possible that these data mean we have underestimated repeat participation, but their rarity means they have only a slight effect. Conversely, some of the few combinations we assessed as identical (e.g. Satan Holmes/Homes) might have been from separate individuals, thus overestimating repetition. Careful programming, such as probability-based matching methods/algorithms (e.g. fuzzy matching, soundex code), would be needed to match misspelt names in larger datasets efficiently and to quantify the degree of error that is associated with matching.

One should also note that the IDRS is a non-probability sample, and in comparing standard errors for complex and SRS estimators, variance estimates from the SRS estimator assume random sampling from a specific population frame. However, given there was virtually no difference in DEFF estimates using data-dependent jackknife re-sampling estimation, we expect that this will have negligible effect. Furthermore, readers should note that the analyses undertaken in this research are strictly exploratory and secondary to the aims of IDRS data collection and reporting.

Abbreviations

CI: confidence interval; DEFF: design effect; IDRS: Illicit Drug Reporting System; PSN: porn star name; PWID: people who inject drugs; SRS: simple random sample.

Authors’ contributions

PA, CA and PD conceived the research. PA and CA wrote the bulk of the manuscript, with comments and approval from PD. CB extracted and supplied the necessary datasets from the national IDRS archive and commented on and approved the manuscript. PA and CA analysed and interpreted the data. All authors contributed to the final manuscript. All authors read and approved the final manuscript.

Author details

1 Burnet Institute, 85 Commercial Rd, Melbourne, VIC 3004, Australia. 2 School of Public Health and Preventive Medicine, Monash University, 99 Commercial Rd, Melbourne, VIC 3004, Australia. 3 National Drug and Alcohol Research Centre, University of New South Wales, Sydney, NSW 2052, Australia.

Acknowledgements

We gratefully acknowledge the IDRS participants’ willingness to contribute to the research and the IDRS research assistants’ efforts in recruiting and interviewing participants. The IDRS is funded by the Australian Government under the Substance Misuse Prevention and Service Improvement Grants Fund. The Burnet Institute receives support from the Victorian Operational Infrastructure Support Program. Paul Dietze holds an NHMRC Senior Research Fellowship.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to privacy concerns, but may be made available from the National Drug and Alcohol Research Centre on reasonable request. Comprehensive summary data for 2013–2015 are available at: http://www.drugtrends.org.au/reports/?p=IDRS.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The University of New South Wales Ethics Committee approved the national IDRS (# HC12086) in 2015. Informed consent to participate in the study was obtained from all participants.

Funding

The IDRS is funded by the Australian Government under the Substance Misuse Prevention and Service Improvement Grants Fund. The Burnet Institute receives support from the Victorian Operational Infrastructure Support Program. Paul Dietze holds an NHMRC Senior Research and Career Development Fellowship.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 19 June 2017 Accepted: 30 May 2018

Agius et al. BMC Res Notes (2018) 11:349

*Correspondence: email@example.com. Burnet Institute, 85 Commercial Rd, Melbourne, VIC 3004, Australia. Full list of author information is available at the end of the article.

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

References

1. Bell BA, Onwuegbuzie AJ, Ferron JM, Jiao QG, Hibbard ST, Kromrey JD. Use of design effects and sample weights in complex health survey data: a review of published articles using data from 3 commonly used adolescent health surveys. Am J Public Health. 2012;102:1399–405.
2. West BT, Sakshaug JW, Aurelien GAS. How big of a problem is analytic error in secondary analyses of survey data? PLoS ONE. 2016;11:e0158120.
3. Briesacher BA, Tjia J, Doubeni CA, Chen Y, Rao SR. Methodological issues in using multiple years of the Medicare Current Beneficiary Survey. Medicare Medicaid Res Rev. 2012;2:002.01.a04.
4. Kish L, Frankel MR. Inference from complex samples. J R Stat Soc Series B. 1974;36:1–37.
5. Sakshaug JW, West BT. Important considerations when analyzing health survey data collected using a complex sample design. Am J Public Health. 2014;104:15–6.
6. Stafford J, Breen C. Findings from the Illicit Drug Reporting System (IDRS). In: Australian drug trends 2015 no. 145. Sydney: National Drug and Alcohol Research Centre; 2016.
7. Lim MS, Bowring A, Gold J, Hellard ME. What’s your “porn star” name? A novel method of identifying research participants. Sex Transm Dis. 2011;38:150–1.
8. Tukey JW. Bias and confidence in not-quite large samples. Ann Math Stat. 1958;29:614.
9. Wolter KM. Introduction to variance estimation. New York: Springer; 1985.
10. Huber PJ. The behavior of maximum likelihood estimates under non-standard conditions. In: Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1967. p. 221–233.
11. Efron B. Bootstrap methods: another look at the jackknife. Ann Stat. 1979;7:1–26.
12. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007;4:e297.
13. Wasi N, Flaaen A. Record linkage using Stata: preprocessing, linking, and reviewing utilities. Stata J. 2015;15:672–97.
14. Stata. Stata services: Matching strings. http://www.stata.com/statalist/archive/2002-11/msg00480.html (2002) Accessed 25 Feb 2017.
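The jackknife design effect described in the Methods (DEFF2 in Table 1) can be illustrated with a small sketch. The analyses reported here were run in SPSS and Stata; the Python below is not that code, only a minimal illustration under stated assumptions: the function names `jackknife_variance` and `design_effect` are hypothetical, the estimate is an unweighted mean, and the variance is centred on the mean of the replicate estimates.

```python
from collections import defaultdict


def jackknife_variance(groups):
    """Delete-one-group jackknife variance of the overall mean.

    `groups` is a list of lists. Each inner list holds the responses that are
    omitted together in one replicate: all interviews from one participant
    when clustering is accounted for, or a single response when it is not.
    """
    g = len(groups)
    total = sum(sum(grp) for grp in groups)
    count = sum(len(grp) for grp in groups)
    # Re-estimate the mean with each group left out in turn.
    replicates = [(total - sum(grp)) / (count - len(grp)) for grp in groups]
    centre = sum(replicates) / g
    return (g - 1) / g * sum((r - centre) ** 2 for r in replicates)


def design_effect(records):
    """Ratio of clustered to unclustered jackknife variance.

    `records` is an iterable of (participant_id, value) pairs, where the
    participant_id plays the role of the PSN-derived unique ID (assumed).
    Raises ZeroDivisionError if every value is identical.
    """
    by_participant = defaultdict(list)
    for pid, value in records:
        by_participant[pid].append(value)
    clustered = list(by_participant.values())
    # Treat every response as its own unit, i.e. assume no repeat observations.
    independent = [[v] for vals in clustered for v in vals]
    return jackknife_variance(clustered) / jackknife_variance(independent)
```

With no repeat participants the two variances coincide and the ratio is 1; when a participant contributes several similar responses, the clustered variance grows and the ratio rises above 1, which is the pattern reported in Table 1.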
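The PSN matching step in the Methods was done by sorting and inspecting names in a spreadsheet, and the Limitations note that larger datasets would need programmatic matching. A rough sketch of how that might look follows. It is an assumption-laden illustration rather than the procedure used in this study: the function names are hypothetical, the similarity measure is the standard-library `difflib.SequenceMatcher` ratio rather than soundex or a probabilistic linkage model, and the 0.9 threshold is arbitrary.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalise(pet, street):
    """Lower-case a PSN, drop punctuation and collapse whitespace."""
    raw = f"{pet} {street}".casefold()
    raw = re.sub(r"[^\w\s]", "", raw)
    return " ".join(raw.split())


def tally_participation(rows):
    """Map each normalised PSN to the sorted survey years it appeared in.

    `rows` is an iterable of (year, pet_name, street_name) tuples.
    """
    years = defaultdict(set)
    for year, pet, street in rows:
        years[normalise(pet, street)].add(year)
    return {psn: sorted(ys) for psn, ys in years.items()}


def near_matches(psns, threshold=0.9):
    """Pairs of distinct PSNs similar enough to merit manual review.

    Compares every pair, so cost grows quadratically with the number of
    names; the threshold is an assumed value, not one taken from the study.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(psns), 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]
```

Exact matches after normalisation correspond to the "doubles" and "triples" counted in the Results, while pairs flagged by `near_matches` (for instance spelling variants such as Holmes/Homes) would still need a human decision on whether they denote the same person.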
BMC Research Notes – Springer Journals
Published: Jun 4, 2018