Respondent-driven sampling: theory, limitations and improvements
Background: The key purpose of sampling is to gain knowledge about a population using a small, affordable subset of selected individuals. This goal is often approached by choosing a representative sample in which each individual’s selection probability is determined from a full list of individuals in the target population. However, for many populations central to the public health sciences, such as men who have sex with men (MSM) and injecting drug users (IDUs), the selection probability of individuals cannot be determined ahead of time because no list of all individuals is available, which impairs the generalization of results from the sample to the population. Respondent-driven sampling (RDS) was developed to generate representative samples of such hard-to-reach populations with improved accessibility. It provides an automated, self-growing sampling design as well as asymptotically unbiased population estimates, which has made it the state-of-the-art sampling method in recent years for studying HIV-related key populations at risk. However, the validity of RDS estimates relies on many assumptions that are often not satisfied in practice.
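As an illustration of how RDS can yield population estimates without a sampling frame, the sketch below weights each respondent by the inverse of their self-reported degree (number of contacts in the target population). This is the inverse-degree-weighted form commonly known as the Volz-Heckathorn (RDS II) estimator; the abstract does not name a specific estimator, so its use here is an assumption, as are the function name `rds_ii_estimate` and the toy data.

```python
def rds_ii_estimate(traits, degrees):
    """Estimate the population proportion with a trait from an RDS sample.

    Assumption (not taken from the abstract): respondents are sampled with
    probability proportional to their degree, so each one is weighted by
    1 / degree. `traits` holds booleans, `degrees` holds self-reported
    network sizes, one entry per respondent.
    """
    if len(traits) != len(degrees):
        raise ValueError("traits and degrees must have the same length")
    if not degrees:
        raise ValueError("sample must not be empty")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")

    weights = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(weights, traits) if t)
    return weighted_with_trait / sum(weights)


# Hypothetical toy sample: respondents with the trait report larger networks.
sample_traits = [True, False, True, False]
sample_degrees = [10, 2, 5, 4]

estimate = rds_ii_estimate(sample_traits, sample_degrees)
naive = sum(sample_traits) / len(sample_traits)  # unweighted sample proportion

print(f"naive sample proportion:  {naive:.3f}")
print(f"inverse-degree estimate:  {estimate:.3f}")
```

In this toy sample the weighted estimate falls below the naive proportion because the respondents with the trait report more contacts and are therefore assumed to be over-represented. The correction is only as good as the assumptions behind it, for example an undirected network and accurate self-reported degrees.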
Aims: To assess the effect of violating assumptions on the performance of RDS estimators and to improve both the implementation and methodology of RDS for hard-to-reach populations of relevance to the public health sciences.
Contributions: The performance of RDS estimators is evaluated under various conditions. Results indicate that long chains initiated by diverse seeds are highly beneficial, while estimation bias is large if the network is directed or if respondents’ participation behavior (such as preferential recruitment) depends on characteristics that are correlated with study outcomes. An Internet-based RDS (WebRDS) recruiting system is developed to circumvent the limitations of physical interview-based implementations. In a study of MSM in Vietnam, the system shows its ability to sustain recruitment of respondents regardless of their location. Statistical methods are developed to generalize the RDS method from undirected networks to directed networks. The new method can function as a sensitivity analysis tool to account for uncertainty about network directedness and for error in self-reported degree data. Lastly, by integrating traditional RDS chain data with self-reported ego network data, a new estimator is developed to improve the reliability and validity of RDS. The new estimator shows not only improved precision but also strong robustness to preferential peer recruitment and to variations in network structural properties.
Conclusions: Violations of assumptions are inevitable and should be investigated thoroughly in RDS practice. Because RDS estimates have relatively high variance and are vulnerable to certain harmful conditions, such as network directedness and preferential recruitment, results from RDS studies should be interpreted with caution. Researchers are encouraged to collect ego network data during RDS implementation to improve the precision of population estimates. Despite its limited ability to generate sufficiently accurate population estimates, RDS is easy to implement and offers an improved response rate, providing an alternative avenue for understanding hard-to-access populations.
List of scientific papers
I. Lu X, Bengtsson L, Britton T, Camitz M, Kim BJ, Thorson A, Liljeros F. The sensitivity of respondent-driven sampling. Journal of the Royal Statistical Society Series A. 2012, 175: 191-216.
https://doi.org/10.1111/j.1467-985X.2011.00711.x
II. Bengtsson L, Lu X, Nguyen QC, Camitz M, Hoang NL, Liljeros F, Thorson A et al. Implementation of Web-Based Respondent-Driven Sampling among Men Who Have Sex with Men in Vietnam. PLOS ONE. 2012, 7 (11):e49417.
https://doi.org/10.1371/journal.pone.0049417
III. Lu X, Malmros J, Liljeros F, Britton T. Respondent-driven Sampling on Directed Networks. Electronic Journal of Statistics. 2013, 7: 292-322.
https://doi.org/10.1214/13-EJS772
IV. Lu X. Linked Ego Networks: Improving Estimate Reliability and Validity with Respondent-driven Sampling. 2012. arXiv: 1205.1971v2. [Submitted]
History
Defence date
2013-02-22
Department
- Department of Global Public Health
Publisher/Institution
Karolinska Institutet
Main supervisor
Liljeros, Fredrik
Publication year
2013
Thesis type
- Doctoral thesis
ISBN
978-91-7549-047-2
Number of supporting papers
4
Language
- eng