Model Assisted Sensitivity Analyses for Hidden Bias with Binary Outcomes
Nattino, Giovanni; Lu, Bo
doi: 10.1111/biom.12919; pmid: 29992547
Summary: In medical and health sciences, observational studies are a major data source for inferring causal relationships. Unlike randomized experiments, observational studies are vulnerable to the hidden bias introduced by unmeasured confounders. The impact of unmeasured covariates on the causal effect can be assessed by conducting a sensitivity analysis. A comprehensive framework of sensitivity analyses has been developed for matching designs. Sensitivity parameters are introduced to capture the association between the missing covariates and the exposure or the outcome. With the sensitivity parameter values fixed, one can compute bounds on the p-value of a randomization test on causal effects. We propose a model assisted sensitivity analysis with binary outcomes for the general 1:k matching design, which provides results equivalent to the conventional nonparametric approach in large samples. By introducing a conditional logistic outcome model, we substantially simplify the implementation and interpretation of the sensitivity analysis. More importantly, we are able to provide a closed form representation for the set of sensitivity parameters for which the maximum p-values are non-significant. This methodology can be easily extended to matching designs with multilevel treatments. We illustrate our method using a U.S. trauma care database to examine mortality difference between trauma care levels.
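As an illustration of how fixing a sensitivity parameter yields bounds on a p-value, the following minimal sketch covers only the simplest case: 1:1 matched pairs with a binary outcome, analysed with the conventional nonparametric (McNemar-type) sign test. It is not the model assisted method proposed in the article. The function names and the parameter `gamma` (the assumed maximum odds ratio of exposure within a pair due to an unmeasured covariate) are hypothetical choices for this example.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def mcnemar_pvalue_bounds(treated_only_events, discordant_pairs, gamma):
    """Bounds on the one-sided p-value for matched pairs with binary outcomes.

    discordant_pairs: pairs in which exactly one unit had the event.
    treated_only_events: discordant pairs in which the treated unit had it.
    gamma: assumed sensitivity parameter (>= 1); gamma = 1 means no hidden bias.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    # Under hidden bias of magnitude gamma, the chance that the treated unit
    # is the one with the event in a discordant pair lies in [p_lo, p_hi].
    p_lo = 1.0 / (1.0 + gamma)
    p_hi = gamma / (1.0 + gamma)
    lower = binom_upper_tail(discordant_pairs, treated_only_events, p_lo)
    upper = binom_upper_tail(discordant_pairs, treated_only_events, p_hi)
    return lower, upper


# Usage: scan gamma to see where the worst-case p-value crosses 0.05.
# The counts (30 of 40 discordant pairs) are made up for illustration.
for g in (1.0, 1.5, 2.0, 3.0):
    lo, hi = mcnemar_pvalue_bounds(30, 40, g)
    print(f"gamma={g}: p-value in [{lo:.4g}, {hi:.4g}]")
```

The set of gamma values for which the upper bound stays above the significance level is what a sensitivity analysis reports. The article states that its model assisted approach gives this set in closed form, which the brute-force scan above does not attempt.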
Sensitivity Analysis and Power for Instrumental Variable Studies
Wang, Xuran; Jiang, Yang; Zhang, Nancy R.; Small, Dylan S.
doi: 10.1111/biom.12873; pmid: 29603714
Summary: In observational studies to estimate treatment effects, unmeasured confounding is often a concern. The instrumental variable (IV) method can control for unmeasured confounding when there is a valid IV. To be a valid IV, a variable needs to be independent of unmeasured confounders and only affect the outcome through affecting the treatment. When applying the IV method, there is often concern that a putative IV is invalid to some degree. We present an approach to sensitivity analysis for the IV method which examines the sensitivity of inferences to violations of IV validity. Specifically, we consider the case in which the association between the putative IV and the unmeasured confounders and the direct effect of the IV on the outcome are both limited in magnitude by a sensitivity parameter. Our approach is based on extending the Anderson–Rubin test and is valid regardless of the strength of the instrument. A power formula for this sensitivity analysis is presented. We illustrate its usage via examples about Mendelian randomization studies and its implications via a comparison of using rare versus common genetic variants as instruments.
A Powerful Approach to the Study of Moderate Effect Modification in Observational Studies
Lee, Kwonsang; Small, Dylan S.; Rosenbaum, Paul R.
doi: 10.1111/biom.12884; pmid: 29738603
Summary: Effect modification means the magnitude or stability of a treatment effect varies as a function of an observed covariate. Generally, larger and more stable treatment effects are insensitive to larger biases from unmeasured covariates, so a causal conclusion may be considerably firmer if this pattern is noted when it occurs. We propose a new strategy, called the submax-method, that combines exploratory and confirmatory efforts to determine whether there is stronger evidence of causality (that is, greater insensitivity to unmeasured confounding) in some subgroups of individuals. It uses the joint distribution of test statistics that split the data in various ways based on certain observed covariates. For L binary covariates, the method splits the population L times into two subpopulations, perhaps first men and women, perhaps then smokers and nonsmokers, computing a test statistic from each subpopulation, and appends the test statistic for the whole population, making 2L + 1 test statistics in total. Although L binary covariates define 2^L interaction groups, only 2L + 1 tests are performed, and at least L + 1 of these tests use at least half of the data. The submax-method achieves the highest design sensitivity and the highest Bahadur efficiency of its component tests. Moreover, the form of the test is sufficiently tractable that its large sample power may be studied analytically. The simulation suggests that the submax-method exhibits superior performance, in comparison with an approach using CART, when there is effect modification of moderate size. Using data from the NHANES I epidemiologic follow-up survey, an observational study of the effects of physical activity on survival is used to illustrate the method. The method is implemented in the package which contains the NHANES example. An online Appendix provides simulation results and further analysis of the example.
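The counting in this summary can be made concrete with a short sketch. Each of the L binary covariates splits the population once into two subgroups, giving 2L subgroup statistics, and one more statistic comes from the whole population. The full cross-classification would instead have 2^L cells. The function names and the covariate labels below are hypothetical, and the sketch only enumerates the groups. It does not compute the test statistics or their joint distribution.

```python
from itertools import product


def submax_subgroups(covariate_names):
    """List the groups tested: the whole population plus both levels
    of each binary covariate taken one at a time (2L + 1 groups)."""
    groups = [("all", None)]
    for name in covariate_names:
        groups.append((name, 0))
        groups.append((name, 1))
    return groups


def interaction_cells(covariate_names):
    """List every cell of the full cross-classification (2**L cells)."""
    return list(product([0, 1], repeat=len(covariate_names)))


# Usage with five hypothetical binary covariates.
names = ["sex", "smoker", "x3", "x4", "x5"]
print(len(submax_subgroups(names)), "tests versus",
      len(interaction_cells(names)), "interaction cells")
```

For each covariate, at least one of its two levels must contain at least half of the sample, which together with the whole-population test accounts for the L + 1 tests that use at least half of the data.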
Doubly Robust Matching Estimators for High Dimensional Confounding Adjustment
Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan; Agniel, Denis
doi: 10.1111/biom.12887; pmid: 29750844
Summary: Valid estimation of treatment effects from observational data requires proper control of confounding. If the number of covariates is large relative to the number of observations, then controlling for all available covariates is infeasible. In cases where a sparsity condition holds, variable selection or penalization can reduce the dimension of the covariate space in a manner that allows for valid estimation of treatment effects. In this article, we propose matching on both the estimated propensity score and the estimated prognostic scores when the number of covariates is large relative to the number of observations. We derive asymptotic results for the matching estimator and show that it is doubly robust in the sense that only one of the two score models need be correct to obtain a consistent estimator. We show via simulation its effectiveness in controlling for confounding and highlight its potential to address nonlinear confounding. Finally, we apply the proposed procedure to analyze the effect of gender on prescription opioid use using insurance claims data.
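To make the idea of matching on two scores concrete, here is a minimal sketch under strong simplifying assumptions. Both scores are taken as already estimated (the penalized score models are not shown), each treated unit is matched with replacement to its nearest control in Euclidean distance on the standardized scores, and the result is a simple average of treated-minus-matched-control differences. The function names, the distance metric, and the choice of estimand are assumptions of this example and are not taken from the article.

```python
from math import hypot
from statistics import mean, pstdev


def standardize(values):
    """Center and scale a list of numbers (scale 1.0 if constant)."""
    m, s = mean(values), pstdev(values)
    s = s if s > 0 else 1.0
    return [(v - m) / s for v in values]


def att_by_two_score_matching(y, treated, propensity, prognostic):
    """Average treated-minus-control difference after 1-nearest-neighbour
    matching (with replacement) on two pre-estimated scores."""
    ps = standardize(propensity)
    pg = standardize(prognostic)
    controls = [i for i, t in enumerate(treated) if not t]
    if not controls:
        raise ValueError("at least one control unit is required")
    diffs = []
    for i, t in enumerate(treated):
        if not t:
            continue
        # Nearest control in the two-dimensional standardized score space.
        j = min(controls, key=lambda c: hypot(ps[i] - ps[c], pg[i] - pg[c]))
        diffs.append(y[i] - y[j])
    return mean(diffs)


# Usage on a tiny made-up data set: two treated units, two controls.
effect = att_by_two_score_matching(
    y=[3.0, 5.0, 1.0, 3.0],
    treated=[True, True, False, False],
    propensity=[0.2, 0.8, 0.2, 0.8],
    prognostic=[1.0, 3.0, 1.0, 3.0],
)
print(effect)
```

Matching on both scores at once is what gives the estimator two chances at balance. The sketch does not demonstrate the double robustness result itself, which the article establishes asymptotically.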
Optimal Two-Stage Dynamic Treatment Regimes from a Classification Perspective with Censored Survival Data
Hager, Rebecca; Tsiatis, Anastasios A.; Davidian, Marie
doi: 10.1111/biom.12894; pmid: 29775203
Summary: Clinicians often make multiple treatment decisions at key points over the course of a patient's disease. A dynamic treatment regime is a sequence of decision rules, each mapping a patient's observed history to the set of available, feasible treatment options at each decision point, and thus formalizes this process. An optimal regime is one leading to the most beneficial outcome on average if used to select treatment for the patient population. We propose a method for estimation of an optimal regime involving two decision points when the outcome of interest is a censored survival time, which is based on maximizing a locally efficient, doubly robust, augmented inverse probability weighted estimator for average outcome over a class of regimes. By casting this optimization as a classification problem, we exploit well-studied classification techniques such as support vector machines to characterize the class of regimes and facilitate implementation via a backward iterative algorithm. Simulation studies of performance and application of the method to data from a sequential, multiple assignment randomized clinical trial in acute leukemia are presented.
Bayesian Nonparametric Generative Models for Causal Inference with Missing at Random Covariates
Roy, Jason; Lum, Kirsten J.; Zeldow, Bret; Dworkin, Jordan D.; Re, Vincent Lo; Daniels, Michael J.
doi: 10.1111/biom.12875; pmid: 29579341
Summary: We propose a general Bayesian nonparametric (BNP) approach to causal inference in the point treatment setting. The joint distribution of the observed data (outcome, treatment, and confounders) is modeled using an enriched Dirichlet process. The combination of the observed data model and causal assumptions allows us to identify any type of causal effect (differences, ratios, or quantile effects), either marginally or for subpopulations of interest. The proposed BNP model is well-suited for causal inference problems, as it does not require parametric assumptions about the distribution of confounders and naturally leads to a computationally efficient Gibbs sampling algorithm. By flexibly modeling the joint distribution, we are also able to impute (via data augmentation) values for missing covariates within the algorithm under an assumption of ignorable missingness, obviating the need to create separate imputed data sets. This approach for imputing the missing covariates has the additional advantage of guaranteeing congeniality between the imputation model and the analysis model, and because we use a BNP approach, parametric models are avoided for imputation. The performance of the method is assessed using simulation studies. The method is applied to data from a cohort study of human immunodeficiency virus/hepatitis C virus co-infected patients.
Nonparametric Estimation of Transition Probabilities for a General Progressive Multi-State Model Under Cross-Sectional Sampling
de Uña-Álvarez, Jacobo; Mandel, Micha
doi: 10.1111/biom.12874; pmid: 29603718
Summary: Nonparametric estimation of the transition probability matrix of a progressive multi-state model is considered under cross-sectional sampling. Two different estimators adapted to possibly right-censored and left-truncated data are proposed. The estimators require full retrospective information before the truncation time, which, when exploited, increases efficiency. They are obtained as differences between two survival functions constructed for sub-samples of subjects occupying specific states at a certain time point. Both estimators correct the oversampling of relatively large survival times by using the left-truncation times associated with the cross-sectional observation. Asymptotic results are established, and finite sample performance is investigated through simulations. One of the proposed estimators performs better when there is no censoring, while the second one is strongly recommended with censored data. The new estimators are applied to data on patients in intensive care units (ICUs).
Semiparametric Regression Analysis of Interval-Censored Data with Informative Dropout
Gao, Fei; Zeng, Donglin; Lin, Dan-Yu
doi: 10.1111/biom.12911; pmid: 29870067
Summary: Interval-censored data arise when the event time of interest can only be ascertained through periodic examinations. In medical studies, subjects may not complete the examination schedule for reasons related to the event of interest. In this article, we develop a semiparametric approach to adjust for such informative dropout in regression analysis of interval-censored data. Specifically, we propose a broad class of joint models, under which the event time of interest follows a transformation model with a random effect and the dropout time follows a different transformation model but with the same random effect. We consider nonparametric maximum likelihood estimation and develop an EM algorithm that involves simple and stable calculations. We prove that the resulting estimators of the regression parameters are consistent, asymptotically normal, and asymptotically efficient with a covariance matrix that can be consistently estimated through profile likelihood. In addition, we show how to consistently estimate the survival function when dropout represents voluntary withdrawal and the cumulative incidence function when dropout is an unavoidable terminal event. Furthermore, we assess the performance of the proposed numerical and inferential procedures through extensive simulation studies. Finally, we provide an application to data on the incidence of diabetes from a major epidemiological cohort study.
Pseudo and Conditional Score Approach to Joint Analysis of Current Count and Current Status Data
Wen, Chi-Chung; Chen, Yi-Hau
doi: 10.1111/biom.12880; pmid: 29665618
Summary: We develop a joint analysis approach for recurrent and nonrecurrent event processes subject to case I interval censorship, which are also known in the literature as current count and current status data, respectively. We use a shared frailty to link the recurrent and nonrecurrent event processes, while leaving the distribution of the frailty fully unspecified. Conditional on the frailty, the recurrent event is assumed to follow a nonhomogeneous Poisson process, and the mean function of the recurrent event and the survival function of the nonrecurrent event are assumed to follow some general form of semiparametric transformation models. Estimation of the models is based on the pseudo-likelihood and the conditional score techniques. The resulting estimators for the regression parameters and the unspecified baseline functions are shown to be consistent with rates of square and cubic roots of the sample size, respectively. Asymptotic normality with closed-form asymptotic variance is derived for the estimator of the regression parameters. We apply the proposed method to fracture-osteoporosis survey data to identify risk factors jointly for fracture and osteoporosis in elders, while accounting for association between the two events within a subject.
Using Survival Information in Truncation by Death Problems without the Monotonicity Assumption
Yang, Fan; Ding, Peng
doi: 10.1111/biom.12883; pmid: 29665626
Summary: In some randomized clinical trials, patients may die before the measurement time point of their outcomes. Even though randomization generates comparable treatment and control groups, the remaining survivors often differ significantly in background variables that are prognostic for the outcomes. This is called the truncation by death problem. Under the potential outcomes framework, the only well-defined causal effect on the outcome is within the subgroup of patients who would always survive under both treatment and control. Because the definition of the subgroup depends on the potential values of the survival status that could not be observed jointly, without making strong parametric assumptions, we cannot identify the causal effect of interest and consequently can only obtain bounds on it. Unfortunately, however, many bounds are too wide to be useful. We propose to use detailed survival information before and after the measurement time point of the outcomes to sharpen the bounds on the subgroup causal effect. Because survival times contain useful information about the final outcome, carefully utilizing them could improve statistical inference without imposing strong parametric assumptions. Moreover, we propose to use a copula model to relax the commonly invoked but often doubtful monotonicity assumption that the treatment extends the survival time for all patients.