Adjustment for Biased Sampling Using NHANES Derived Propensity Weights
The Consent-to-Contact (C2C) registry at the University of California, Irvine collects data from community participants to aid recruitment to clinical research studies. Self-selection into the C2C likely introduces bias, due in part to enrollees having more years of education than the general US population. Salazar et al. (2020) recently used the C2C to examine associations of race/ethnicity with participant willingness to be contacted about research studies. To address questions about the generalizability of the estimated associations, we estimate propensity weights for self-selection into the convenience sample using data from the National Health and Nutrition Examination Survey (NHANES). We create a combined dataset of C2C and NHANES subjects and compare different approaches (logistic regression, covariate balancing propensity score, entropy balancing, and random forest) for estimating the probability of membership in C2C relative to NHANES. We propose methods to estimate the variance of parameter estimates that account for the uncertainty arising from estimating the propensity weights. Simulation studies explore the impact of propensity weight estimation on uncertainty. We demonstrate the approach by repeating the analysis of Salazar et al. with the derived propensity weights for the C2C subjects and contrast the results of the two analyses. This method can be implemented using our estweight package in R, available on GitHub.
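As a rough illustration of the logistic-regression variant described above, the following Python sketch stacks a simulated convenience sample with a simulated reference sample, models membership in the convenience sample, and converts the fitted probabilities into inverse-odds weights. It is not the estweight package and does not reproduce its interface. The data are simulated, the single covariate (years of education), the sample sizes, and all function and variable names are assumptions made for illustration, and the survey weights that accompany real NHANES data are ignored.

```python
import math
import random


def fit_logistic(xs, zs, max_iter=50, tol=1e-10):
    """Fit P(z = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z in zip(xs, zs):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = z - p          # contribution to the score
            v = p * (1.0 - p)      # contribution to the information matrix
            g0 += resid
            g1 += resid * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


rng = random.Random(0)

# Simulated years of education (assumed values, not real C2C or NHANES data).
reference = [rng.gauss(13.0, 3.0) for _ in range(2000)]    # stand-in for NHANES
convenience = [rng.gauss(15.0, 3.0) for _ in range(2000)]  # stand-in for C2C

# Stack both samples; z = 1 marks membership in the convenience sample.
xs = convenience + reference
zs = [1] * len(convenience) + [0] * len(reference)
b0, b1 = fit_logistic(xs, zs)

# Inverse-odds weights (1 - p) / p = exp(-(b0 + b1 * x)) for convenience
# subjects, so that the weighted sample resembles the reference sample.
weights = [math.exp(-(b0 + b1 * x)) for x in convenience]

ref_mean = sum(reference) / len(reference)
raw_mean = sum(convenience) / len(convenience)
weighted_mean = sum(w * x for w, x in zip(weights, convenience)) / sum(weights)

print("reference mean:           ", round(ref_mean, 2))
print("convenience mean (raw):   ", round(raw_mean, 2))
print("convenience mean (weighted):", round(weighted_mean, 2))
```

In this sketch the weighted mean of the covariate in the convenience sample should move toward the reference mean. Treating the fitted weights as known would understate uncertainty, which is the issue the variance methods described above are meant to address.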