Statistics

journal article

Open Access Collection

Finite mixtures in capture-recapture surveys for modelling residency patterns in marine wildlife populations

Caruso, Gianmarco;Di Loro, Pierfrancesco Alaimo;Mingione, Marco;Tardella, Luca;Pace, Daniela Silvia;Lasinio, Giovanna Jona

2023 Statistics

Abstract: In this work, the goal is to estimate the abundance of an animal population using data from capture-recapture surveys. We leverage prior knowledge about the population's structure to specify a parsimonious finite mixture model tailored to its behavioral pattern. Inference is carried out under the Bayesian framework, where we discuss suitable prior specifications that could alleviate the label-switching and non-identifiability issues affecting finite mixtures. We conduct simulation experiments to show the competitive advantage of our proposal over less specific alternatives. Finally, the proposed model is used to estimate the population size of common bottlenose dolphins at the Tiber River estuary (Mediterranean Sea), using data collected via photo-identification from 2018 to 2020. Results provide novel insights into the population's size and structure, and shed light on some of the ecological processes governing the population dynamics.

Simulations for estimation of heterogeneity variance and overall effect with constant and inverse-variance weights in meta-analysis of difference in standardized means (DSM)

Kulinskaya, Elena;Hoaglin, David C.

2023 Statistics

Abstract: When the individual studies assembled for a meta-analysis report means ($\mu_C$, $\mu_T$) for their treatment (T) and control (C) arms, but those data are on different scales or come from different instruments, the customary measure of effect is the standardized mean difference (SMD). The SMD is defined as the difference between the means in the treatment and control arms, standardized by the assumed common standard deviation, $\sigma$. However, if the variances in the two arms differ, there is no consensus on a definition of SMD. Thus, we propose a new effect measure, the difference of standardized means (DSM), defined as $\Delta = \mu_T/\sigma_T - \mu_C/\sigma_C$. The estimated DSM can easily be used as an effect measure in standard meta-analysis. For random-effects meta-analysis of DSM, we introduce new point and interval estimators of the between-studies variance ($\tau^2$) based on the $Q$ statistic with effective-sample-size weights, $Q_F$. We study, by simulation, bias and coverage of these new estimators of $\tau^2$ and related estimators of $\Delta$. For comparison, we also study bias and coverage of well-known estimators based on the $Q$ statistic with inverse-variance weights, $Q_{IV}$, such as the Mandel-Paule, DerSimonian-Laird, and restricted-maximum-likelihood estimators.
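
As a minimal sketch of the DSM definition quoted in this abstract, $\Delta = \mu_T/\sigma_T - \mu_C/\sigma_C$, the snippet below computes a plug-in estimate from raw arm data. The function name `estimated_dsm`, the use of sample means and sample standard deviations, and the toy data are assumptions made for illustration; the abstract does not say which estimator of $\Delta$ the authors use, and a bias-corrected version may be preferable in small samples.

```python
from statistics import mean, stdev


def estimated_dsm(treatment, control):
    """Plug-in estimate of Delta = mu_T / sigma_T - mu_C / sigma_C.

    Assumption: each arm's mean and standard deviation are replaced by the
    sample mean and the sample (n - 1) standard deviation.
    """
    return mean(treatment) / stdev(treatment) - mean(control) / stdev(control)


# Hypothetical toy data for one study.
treatment = [2.0, 4.0, 6.0]  # sample mean 4, sample sd 2
control = [1.0, 3.0, 5.0]    # sample mean 3, sample sd 2
dsm = estimated_dsm(treatment, control)
print(dsm)
```

Each arm is standardized by its own standard deviation, so the measure stays defined when the two arm variances differ, which is the situation the abstract highlights.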

Multivariate probability distribution for categorical and ordinal random variables

Arai, Takashi

2023 Statistics

Abstract: We propose a multivariate probability distribution for categorical and ordinal random variables. To this end, we use the Grassmann distribution in conjunction with dummy encoding of categorical and ordinal variables. To realize the co-occurrence probabilities of dummy variables required for categorical and ordinal variables, we propose a parsimonious parameterization for the Grassmann distribution that ensures the positivity of the probability distribution. As an application of the proposed distribution, we develop a factor analysis for categorical and ordinal variables and show the validity of the model using a real dataset.

A Bayesian aoristic logistic regression to model spatio-temporal crime risk under the presence of interval-censored event times

Briz-Redón, Álvaro

2023 Statistics

Abstract: From a statistical point of view, crime data present certain peculiarities that have led to a growing interest in their analysis. In particular, some property crimes frequently carry uncertainty about their exact location in time: often only a time window delimiting the occurrence of the event is available. There are different methods to deal with this type of interval-censored observation, most of them based on event time imputation. An alternative is to carry out an aoristic analysis, which assigns the same weight to each time unit included in the interval that bounds the uncertainty about the event. However, this method has its limitations. In this paper, we present a spatio-temporal model based on logistic regression that allows the analysis of crime data with temporal uncertainty, following the spirit of the aoristic method. The model is developed from a Bayesian perspective, which allows accommodating the temporal uncertainty of the observations. The model is applied to a dataset of residential burglaries recorded in Valencia, Spain. The results provided by this model are compared with those of the complete-cases model, which discards temporally uncertain events.
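
The equal-weight rule of aoristic analysis mentioned in this abstract can be sketched as follows. This illustrates only the classical aoristic weighting, not the Bayesian logistic model the paper proposes. The function names, the integer time units (hours here), the inclusive interval bounds, and the example events are all assumptions made for illustration.

```python
def aoristic_weights(start, end):
    """Give equal weight to every time unit in the window [start, end].

    Assumption: time units are integers and both bounds are inclusive.
    """
    units = range(start, end + 1)
    return {unit: 1.0 / len(units) for unit in units}


def aoristic_totals(intervals):
    """Sum the per-event weights into an aoristic total for each time unit."""
    totals = {}
    for start, end in intervals:
        for unit, weight in aoristic_weights(start, end).items():
            totals[unit] = totals.get(unit, 0.0) + weight
    return totals


# Hypothetical events, each known only to lie within a window of hours.
events = [(22, 23), (23, 23), (20, 23)]
totals = aoristic_totals(events)
print(totals)
```

Each event contributes a total weight of one, so the totals across all time units add up to the number of events.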

Statistical inference for dependent competing risks data under adaptive Type-II progressive hybrid censoring

Dutta, Subhankar;Kayal, Suchandan

2023 Statistics

Abstract: In this article, we consider statistical inference based on dependent competing risks data from the Marshall-Olkin bivariate Weibull distribution. The maximum likelihood estimates of the unknown model parameters are computed using the Newton-Raphson method under adaptive Type-II progressive hybrid censoring with partially observed failure causes. The existence and uniqueness of the maximum likelihood estimates are established. Approximate confidence intervals are constructed via the observed Fisher information matrix using the asymptotic normality of the maximum likelihood estimates. Bayes estimates and highest posterior density credible intervals are calculated under a gamma-Dirichlet prior distribution using the Markov chain Monte Carlo technique. Convergence of the Markov chain Monte Carlo samples is tested. In addition, a Monte Carlo simulation is carried out to compare the effectiveness of the proposed methods. Further, three different optimality criteria are taken into account to obtain the most effective censoring plans. Finally, a real-life data set is analyzed to illustrate the operability and applicability of the proposed methods.

Increasing the Scope as You Learn: Adaptive Bayesian Optimization in Nested Subspaces

Papenmeier, Leonard;Nardi, Luigi;Poloczek, Matthias

2023 Statistics

Abstract: Recent advances have extended the scope of Bayesian optimization (BO) to expensive-to-evaluate black-box functions with dozens of dimensions, aspiring to unlock impactful applications, for example, in the life sciences, neural architecture search, and robotics. However, a closer examination reveals that the state-of-the-art methods for high-dimensional Bayesian optimization (HDBO) suffer from degrading performance as the number of dimensions increases, or even risk failure if certain unverifiable assumptions are not met. This paper proposes BAxUS, which leverages a novel family of nested random subspaces to adapt the space it optimizes over to the problem. This ensures high performance while removing the risk of failure, which we assert via theoretical guarantees. A comprehensive evaluation demonstrates that BAxUS achieves better results than the state-of-the-art methods for a broad set of applications.

Coarse race data conceals disparities in clinical risk score performance

Movva, Rajiv;Shanmugam, Divya;Hou, Kaihua;Pathak, Priya;Guttag, John;Garg, Nikhil;Pierson, Emma

2023 Statistics

Abstract: Healthcare data in the United States often records only a patient's coarse race group: for example, both Indian and Chinese patients are typically coded as "Asian." It is unknown, however, whether this coarse coding conceals meaningful disparities in the performance of clinical risk scores across granular race groups. Here we show that it does. Using data from 418K emergency department visits, we assess clinical risk score performance disparities across granular race groups for three outcomes, five risk scores, and four performance metrics. Across outcomes and metrics, we show that there are significant granular disparities in performance within coarse race categories. In fact, variation in performance metrics within coarse groups often exceeds the variation between coarse groups. We explore why these disparities arise, finding that outcome rates, feature distributions, and the relationships between features and outcomes all vary significantly across granular race categories. Our results suggest that healthcare providers, hospital systems, and machine learning researchers should strive to collect, release, and use granular race data in place of coarse race data, and that existing analyses may significantly underestimate racial disparities in performance.

Fair Evaluation of Graph Markov Neural Networks

Lemberger, Pirmin;Saillenfest, Antoine

2023 Statistics

Abstract: Graph Markov Neural Networks (GMNN) have recently been proposed to improve regular graph neural networks (GNN) by including label dependencies into the semi-supervised node classification task. GMNNs do this in a theoretically principled way and use three kinds of information to predict labels. Just like ordinary GNNs, they use the node features and the graph structure but they moreover leverage information from the labels of neighboring nodes to improve the accuracy of their predictions. In this paper, we introduce a new dataset named WikiVitals which contains a graph of 48k mutually referred Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to the prediction accuracy of GMNN for this dataset: the content of the articles, their connections with each other and the correlations among their labels. For this purpose we adapt a method which was recently proposed for performing fair comparisons of GNN performance using an appropriate randomization over partitions and a clear separation of model selection and model assessment.

Online stochastic Newton methods for estimating the geometric median and applications

Godichon-Baggioni, Antoine;Lu, Wei

2023 Statistics

Abstract: In the context of large samples, a small number of individuals might spoil basic statistical indicators like the mean. It is difficult to detect these atypical individuals automatically, and an alternative strategy is to use robust approaches. This paper focuses on estimating the geometric median of a random variable, which is a robust indicator of central tendency. In order to deal with large samples of data arriving sequentially, online stochastic Newton algorithms for estimating the geometric median are introduced and we give their rates of convergence. Since estimates of the median and those of the Hessian matrix can be recursively updated, we also determine confidence intervals for the median in any designated direction and perform online statistical tests.

Independence testing for inhomogeneous random graphs

Song, Yukun;Priebe, Carey E.;Tang, Minh

2023 Statistics

Abstract: Testing for independence between graphs is a problem that arises naturally in social network analysis and neuroscience. In this paper, we address independence testing for inhomogeneous Erdős-Rényi random graphs on the same vertex set. We first formulate a notion of pairwise correlations between the edges of these graphs and derive a necessary condition for their detectability. We next show that the problem can exhibit a statistical vs. computational tradeoff, i.e., there are regimes for which the correlations are statistically detectable but may require algorithms whose running time is exponential in n, the number of vertices. Finally, we consider a special case of correlation testing when the graphs are sampled from a latent space model (graphon) and propose an asymptotically valid and consistent test procedure that also runs in time polynomial in n.
