Tsiatis, Anastasios A.; Davidian, Marie
doi: 10.1111/biom.13509; pmid: 34174097
The COVID‐19 pandemic due to the novel coronavirus SARS CoV‐2 has inspired remarkable breakthroughs in the development of vaccines against the virus and the launch of several phase 3 vaccine trials in Summer 2020 to evaluate vaccine efficacy (VE). Trials of vaccine candidates using mRNA delivery systems developed by Pfizer‐BioNTech and Moderna have shown substantial VEs of 94–95%, leading the US Food and Drug Administration to issue Emergency Use Authorizations and subsequent widespread administration of the vaccines. As the trials continue, a key issue is the possibility that VE may wane over time. Ethical considerations dictate that trial participants be unblinded and those randomized to placebo be offered study vaccine, leading to trial protocol amendments specifying unblinding strategies. Crossover of placebo subjects to vaccine complicates inference on waning of VE. We focus on the particular features of the Moderna trial and propose a statistical framework based on a potential outcomes formulation within which we develop methods for inference on potential waning of VE over time and estimation of VE at any postvaccination time. The framework clarifies assumptions made regarding individual‐ and population‐level phenomena and acknowledges the possibility that subjects who are more or less likely to become infected may be crossed over to vaccine differentially over time. The principles of the framework can be adapted straightforwardly to other trials.
Wang, Wei; Lu, Shou‐En; Cheng, Jerry Q.; Xie, Minge; Kostis, John B.
doi: 10.1111/biom.13469; pmid: 33847371
Multivariate failure time data are frequently analyzed using the marginal proportional hazards models and the frailty models. When the sample size is extraordinarily large, using either approach could face computational challenges. In this paper, we focus on the marginal model approach and propose a divide‐and‐combine method to analyze large‐scale multivariate failure time data. Our method is motivated by the Myocardial Infarction Data Acquisition System (MIDAS), a New Jersey statewide database that includes 73,725,160 admissions to nonfederal hospitals and emergency rooms (ERs) from 1995 to 2017. We propose to randomly divide the full data into multiple subsets and propose a weighted method to combine these estimators obtained from individual subsets using three weights. Under mild conditions, we show that the combined estimator is asymptotically equivalent to the estimator obtained from the full data as if the data were analyzed all at once. In addition, to screen out risk factors with weak signals, we propose to perform the regularized estimation on the combined estimator using its combined confidence distribution. Theoretical properties, such as consistency, oracle properties, and asymptotic equivalence between the divide‐and‐combine approach and the full data approach are studied. Performance of the proposed method is investigated using simulation studies. Our method is applied to the MIDAS data to identify risk factors related to multivariate cardiovascular‐related health outcomes.
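The divide-and-combine idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: the paper combines marginal proportional hazards estimators using its own weights, whereas this stand-in estimates a simple mean and uses inverse-variance weights. The function name `divide_and_combine` and the simulated data are assumptions made for illustration only.

```python
import random
import statistics

def divide_and_combine(data, n_subsets, seed=0):
    """Combine subset means with inverse-variance weights (illustrative stand-in)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Randomly partition the data into n_subsets chunks of roughly equal size.
    subsets = [shuffled[k::n_subsets] for k in range(n_subsets)]
    estimates, weights = [], []
    for chunk in subsets:
        est = statistics.fmean(chunk)
        # Estimated variance of the subset mean.
        var_est = statistics.variance(chunk) / len(chunk)
        estimates.append(est)
        weights.append(1.0 / var_est)
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    combined_var = 1.0 / total  # variance of the weighted combination
    return combined, combined_var

# Hypothetical simulated data: 10,000 draws from a normal distribution.
rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(10000)]
combined, combined_var = divide_and_combine(data, n_subsets=10)
full = statistics.fmean(data)  # estimate from analyzing all data at once
```

In this toy setting the combined estimate should land close to the full-data mean, which mirrors, in a much simpler model, the asymptotic equivalence the abstract describes.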
Edelmann, Dominic; Welchowski, Thomas; Benner, Axel
doi: 10.1111/biom.13470; pmid: 33847373
Distance covariance is a powerful new dependence measure that was recently introduced by Székely et al. and Székely and Rizzo. In this work, the concept of distance covariance is extended to measuring dependence between a covariate vector and a right‐censored survival endpoint by establishing an estimator based on an inverse‐probability‐of‐censoring weighted U‐statistic. The consistency of the novel estimator is derived. In a large simulation study, it is shown that induced distance covariance permutation tests show a good performance in detecting various complex associations. Applying the distance covariance permutation tests on a gene expression dataset from breast cancer patients outlines its potential for biostatistical practice.
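For readers unfamiliar with the dependence measure named in the abstract above, the following is a minimal sketch of the classical sample (squared) distance covariance for two uncensored, univariate samples. It is not the inverse-probability-of-censoring weighted U-statistic the paper develops, and the function names are hypothetical.

```python
def _centered_distances(values):
    """Double-centered matrix of pairwise absolute distances."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]  # equals column means by symmetry
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic form), uncensored data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2

# y depends on x nonlinearly; the Pearson correlation of this pair is zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
dependent = distance_covariance_sq(x, y)
constant = distance_covariance_sq(x, [1.0] * len(x))
```

The quadratic example shows why the measure is useful for complex associations: it is positive for this pair even though their linear correlation vanishes, and it is zero when one sample is constant.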
Basak, Piyali; Linero, Antonio; Sinha, Debajyoti; Lipsitz, Stuart
doi: 10.1111/biom.13478; pmid: 33864633
Popular parametric and semiparametric hazards regression models for clustered survival data are inappropriate and inadequate when the unknown effects of different covariates and clustering are complex. This calls for a flexible modeling framework to yield efficient survival prediction. Moreover, for some survival studies involving time to occurrence of some asymptomatic events, survival times are typically interval censored between consecutive clinical inspections. In this article, we propose a robust semiparametric model for clustered interval‐censored survival data under a paradigm of Bayesian ensemble learning, called soft Bayesian additive regression trees or SBART (Linero and Yang, 2018), which combines multiple sparse (soft) decision trees to attain excellent predictive accuracy. We develop a novel semiparametric hazards regression model by modeling the hazard function as a product of a parametric baseline hazard function and a nonparametric component that uses SBART to incorporate clustering, unknown functional forms of the main effects, and interaction effects of various covariates. In addition to being applicable for left‐censored, right‐censored, and interval‐censored survival data, our methodology is implemented using a data augmentation scheme which allows for existing Bayesian backfitting algorithms to be used. We illustrate the practical implementation and advantages of our method via simulation studies and an analysis of a prostate cancer surgery study where dependence on the experience and skill level of the physicians leads to clustering of survival times. We conclude by discussing our method's applicability in studies involving high‐dimensional data with complex underlying associations.
Yi, Grace Y.; He, Wenqing; Carroll, Raymond J.
doi: 10.1111/biom.13479; pmid: 33881782
Data with a huge size present great challenges in modeling, inferences, and computation. In handling big data, much attention has been directed to settings with “large p small n”, and relatively less work has been done to address problems with p and n being both large, though data with such a feature have now become more accessible than before, where p represents the number of variables and n stands for the sample size. The big volume of data does not automatically ensure good quality of inferences because a large number of unimportant variables may be collected in the process of gathering informative variables. To carry out valid statistical analysis, it is imperative to screen out noisy variables that have no predictive value for explaining the outcome variable. In this paper, we develop a screening method for handling large‐sized survival data, where the sample size n is large and the dimension p of covariates is of non‐polynomial order of the sample size n, or the so‐called NP‐dimension. We rigorously establish theoretical results for the proposed method and conduct numerical studies to assess its performance. Our research offers multiple extensions of existing work and enlarges the scope of high‐dimensional data analysis. The proposed method capitalizes on the connections among useful regression settings and offers a computationally efficient screening procedure. Our method can be applied to different situations with large‐scale data including genomic data.