The Stata Journal: Promoting Communications on Statistics and Stata

Announcement of the Stata Journal Editors' Prize 2025

Cox, Nicholas J.; Jenkins, Stephen P.

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322959

Binscatter regressions

Cattaneo, Matias D.; Crump, Richard K.; Farrell, Max H.; Feng, Yingjie

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322960

In this article, we introduce the package binsreg, which implements the binscatter methods developed by Cattaneo et al. (2024a, arXiv:2407.15276 [stat.EM]; 2024b, American Economic Review 114: 1488–1514). The package comprises seven commands: binsreg, binslogit, binsprobit, binsqreg, binstest, binspwc, and binsregselect. The first four commands implement binscatter plotting, point estimation, and uncertainty quantification (confidence intervals and confidence bands) for least-squares linear binscatter regression (binsreg) and for nonlinear binscatter regression (binslogit for logit regression, binsprobit for probit regression, and binsqreg for quantile regression). The next two commands focus on pointwise and uniform inference: binstest implements hypothesis-testing procedures for parametric specifications and for nonparametric shape restrictions of the unknown regression function, while binspwc implements multigroup pairwise statistical comparisons. The last command, binsregselect, implements data-driven number-of-bins selectors. Among many other features, the commands offer binned scatterplots and allow for covariate adjustment, weighting, clustering, and multisample analysis, which is useful when studying treatment-effect heterogeneity in randomized and observational studies.
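To make the basic idea concrete, here is a minimal Python sketch of the classic binned scatterplot that binsreg builds on. It sorts the observations by x, splits them into quantile-spaced bins of (nearly) equal size, and returns the within-bin means of x and y. The function name `binscatter` and the simulated data are illustrative assumptions. The sketch leaves out everything the article adds on top, including covariate adjustment, data-driven bin selection as in binsregselect, confidence bands, and the nonlinear and quantile variants.

```python
import random
from statistics import mean

def binscatter(x, y, nbins):
    """Classic binned scatter: sort by x, cut into nbins quantile-spaced bins
    of (nearly) equal size, and return the (mean x, mean y) pair of each bin."""
    if not 0 < nbins <= len(x):
        raise ValueError("need 1 <= nbins <= number of observations")
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for j in range(nbins):
        chunk = pairs[j * n // nbins:(j + 1) * n // nbins]
        points.append((mean(p[0] for p in chunk), mean(p[1] for p in chunk)))
    return points

# Simulated data with a nonlinear relationship y = x^2 + noise
random.seed(1)
x = [random.uniform(0, 1) for _ in range(1000)]
y = [xi ** 2 + random.gauss(0, 0.1) for xi in x]
for bx, by in binscatter(x, y, 10):
    print(f"{bx:.3f}  {by:.3f}")
```

Quantile-spaced bins keep the number of observations per bin constant, so each plotted mean rests on a similar amount of data.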

gintreg: Generalized interval regression

McDonald, James B.; Triplett, Jacob

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322961

Many important research questions involve regression models in which the dependent variable is censored or reported in intervals rather than as a numerical value. A common approach to these problems is to assume that the data follow a particular distribution (for example, a normal distribution) and then apply maximum likelihood estimation. Although this method is widely used in the literature, it can yield inconsistent estimators in the presence of either heteroskedasticity or distributional misspecification. The gintreg command is a partially adaptive maximum-likelihood estimation procedure that 1) generalizes the intreg command by relaxing the normality assumption and 2) draws from a library of flexible distributional forms. The treatment of heteroskedasticity is expanded to account for possible skewness and kurtosis. Additional options provide interaction with the estimation process, informative metrics, and visualizations. Right- and left-censored, interval, grouped, and point data can be accommodated with this method.
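For reference, the likelihood that intreg maximizes, and that gintreg generalizes, assigns each observation the probability of its reported interval under a normal model. The Python sketch below computes one observation's log-likelihood contribution under y ~ N(xb, σ²). The function name and the convention of using None for an unbounded side are assumptions made for illustration. gintreg's contribution is to replace the normal CDF and density with more flexible distributions, which this sketch does not attempt.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def loglik_contribution(lo, hi, xb, sigma):
    """Log-likelihood of one observation under y ~ N(xb, sigma^2).

    lo is None -> left-censored (y <= hi)
    hi is None -> right-censored (y >= lo)
    lo == hi   -> point observation (log density)
    otherwise  -> interval observation (lo <= y <= hi)
    """
    if lo is not None and hi is not None and lo == hi:
        return math.log(STD_NORMAL.pdf((lo - xb) / sigma) / sigma)
    upper = 1.0 if hi is None else STD_NORMAL.cdf((hi - xb) / sigma)
    lower = 0.0 if lo is None else STD_NORMAL.cdf((lo - xb) / sigma)
    return math.log(upper - lower)

# Hypothetical data: (lower bound, upper bound, linear prediction x'b)
obs = [(None, 10.0, 12.0),   # left-censored
       (20.0, 30.0, 24.0),   # interval (grouped) report
       (15.0, 15.0, 14.0),   # exact value
       (40.0, None, 35.0)]   # right-censored
total = sum(loglik_contribution(lo, hi, xb, sigma=5.0) for lo, hi, xb in obs)
print(f"log likelihood = {total:.4f}")
```

A maximum-likelihood routine would maximize this sum over the regression coefficients (through xb) and σ.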

Open Access Collection

Avoiding the eyeballing fallacy: Visualizing statistical differences between estimates using the pheatplot command

Brini, Elisa; Borgen, Solveig Topstad; Borgen, Nicolai T.

2025 The Stata Journal

doi: 10.1177/1536867x251322962

Graphical representations of coefficients and their confidence intervals are increasingly used in research presentations and publications because they are easier and quicker to read than tables. However, in coefficient plots that include several estimated coefficients, researchers often use confidence intervals to eyeball whether coefficients are statistically significantly different from each other, which results in an overly conservative test and an increased risk of type II errors. To help avoid this eyeballing fallacy, we introduce the pheatplot postestimation command, which visualizes the statistical significance of differences across estimates of categorical variables in a regression model. pheatplot efficiently compares the significance level between point estimates and helps researchers avoid making wrong assumptions about whether estimates differ. Moreover, by representing p-values as continuous measures rather than binary thresholds, it provides the flexibility to move beyond arbitrary cutoffs of statistical significance. This article offers some examples that illustrate the functionality of the pheatplot command.
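The eyeballing fallacy is easy to reproduce numerically. The Python sketch below computes two-sided p-values for every pairwise difference b_i − b_j, using the variance Var(b_i) + Var(b_j) − 2 Cov(b_i, b_j) and a normal approximation. A heatmap such as pheatplot's can display this kind of p-value matrix. However, the function name and the normal reference distribution are our assumptions, not pheatplot's documented internals. In the example, the two 95% confidence intervals overlap, yet the difference is significant at the 5% level.

```python
import math
from statistics import NormalDist

def pairwise_pvalues(b, V):
    """Two-sided z-test p-values for H0: b[i] == b[j] for every pair (i, j).
    V is the covariance matrix of the estimates, so covariances are used."""
    k = len(b)
    std_normal = NormalDist()
    p = [[1.0] * k for _ in range(k)]  # diagonal: a coefficient vs itself
    for i in range(k):
        for j in range(i + 1, k):
            se_diff = math.sqrt(V[i][i] + V[j][j] - 2 * V[i][j])
            z = (b[i] - b[j]) / se_diff
            p[i][j] = p[j][i] = 2 * (1 - std_normal.cdf(abs(z)))
    return p

# Hypothetical estimates with standard errors of 0.2 and zero covariance
b = [0.0, 0.6]
V = [[0.04, 0.0], [0.0, 0.04]]
q = NormalDist().inv_cdf(0.975)
cis = [(bi - q * math.sqrt(V[i][i]), bi + q * math.sqrt(V[i][i]))
       for i, bi in enumerate(b)]
print("95% CIs:", cis)              # [-0.39, 0.39] and [0.21, 0.99] overlap
print("p-value of difference:", pairwise_pvalues(b, V)[0][1])  # about 0.034
```

The intervals overlap because each is built from its own standard error of 0.2. The test of the difference uses the smaller standard error of the difference, sqrt(0.08) ≈ 0.28, rather than the sum 0.4 implied by comparing interval ends.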

xtevent: Estimation and visualization in the linear panel event-study design

Freyaldenhoven, Simon; Hansen, Christian B.; Pérez, Jorge Pérez; Shapiro, Jesse M.; Carreto, Constantino

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322964

Linear panel models and the “event-study plots” that often accompany them are popular tools for learning about policy effects. We introduce the xtevent package, which enables the construction of event-study plots following the suggestions in Freyaldenhoven et al. (Forthcoming, Visualization, identification, and estimation in the linear panel event-study design [Cambridge University Press]). The package implements various procedures to estimate the underlying policy effects and allows for nonbinary policy variables and estimation adjusting for preevent trends.

xtpb: The pooled Bewley estimator of long-run relationships in dynamic heterogeneous panels

Asnani, Priyanka; Chudik, Alexander; Strackman, Braden

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322965

In this article, we introduce a new command, xtpb, that implements the Chudik, Pesaran, and Smith (Forthcoming, Econometrics and Statistics, https://doi.org/10.1016/j.ecosta.2023.11.001) pooled Bewley (PB) estimator of long-run relationships in dynamic heterogeneous panel-data models. The PB estimator is based on the Bewley (1979, Economics Letters 3: 357–361) transform of the autoregressive distributed lag model, and it is applicable in a setting similar to that of the widely used pooled mean group estimator of Pesaran, Shin, and Smith (1999, Journal of the American Statistical Association 94: 621–634). Two bias-correction methods and a bootstrapping algorithm for more accurate small-sample inference robust to arbitrary cross-sectional dependence of errors are also implemented. An empirical illustration reproduces the PB estimates of the consumption function in Chudik, Pesaran, and Smith (Forthcoming).
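The algebra behind the Bewley transform can be checked in a few lines. Take a single-unit ARDL(1,1) model, y_t = a + λ y_{t−1} + b0 x_t + b1 x_{t−1} + e_t. Substituting y_{t−1} = y_t − Δy_t and x_{t−1} = x_t − Δx_t and dividing by 1 − λ gives y_t = a/(1 − λ) + θ x_t − λ/(1 − λ) Δy_t − b1/(1 − λ) Δx_t + e_t/(1 − λ). The long-run coefficient θ = (b0 + b1)/(1 − λ) therefore appears directly as the coefficient on x_t.

The Python sketch below uses made-up parameter values and no error term to confirm that the rewritten equation reproduces the series exactly. It illustrates only the transform. In the Bewley regression, Δy_t is correlated with the error, so plain least squares is not appropriate. How the PB estimator handles this, and how it pools across units, is described in the article and is not reproduced here.

```python
import math

def bewley_coefficients(a, lam, b0, b1):
    """Map ARDL(1,1) coefficients of
        y_t = a + lam*y_{t-1} + b0*x_t + b1*x_{t-1} + e_t
    to the Bewley form
        y_t = c + theta*x_t + g*dy_t + h*dx_t + e_t/(1 - lam)."""
    d = 1.0 - lam  # requires lam != 1 (a stable long-run relationship)
    return {"c": a / d, "theta": (b0 + b1) / d, "g": -lam / d, "h": -b1 / d}

# Made-up parameters; the long-run effect is theta = (0.3 + 0.1) / 0.4 = 1
a, lam, b0, b1 = 0.5, 0.6, 0.3, 0.1
coef = bewley_coefficients(a, lam, b0, b1)

# Simulate a noise-free series (e_t = 0) from the ARDL equation
T = 50
x = [math.sin(0.3 * t) + 0.05 * t for t in range(T)]
y = [0.0]
for t in range(1, T):
    y.append(a + lam * y[t - 1] + b0 * x[t] + b1 * x[t - 1])

# The Bewley form must reproduce y_t exactly, up to rounding error
max_err = max(
    abs(y[t] - (coef["c"] + coef["theta"] * x[t]
                + coef["g"] * (y[t] - y[t - 1])
                + coef["h"] * (x[t] - x[t - 1])))
    for t in range(1, T))
print(f"theta = {coef['theta']:.3f}, max discrepancy = {max_err:.1e}")
```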

Establishing reference interval bounds for censored and contaminated data

Bruun, Niels Henrik; Uldall Torp, Nanna Maria; Andersen, Stine Linding

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322968

Reference intervals are essential across the medical and environmental fields. A reference interval (for example, the 95% central prediction interval) defines the normal range of measurements for a specific physiological parameter in healthy individuals. Inappropriate reference interval bounds may occur because of censored measurements (due to instrument limitations) or contaminated data (by accidentally sampling nonhealthy individuals). To address this, we propose using the regression-on-order-statistics (ROS) method combined with an optimal Box–Cox transformation. The ROS method involves regressing Gaussian scores based on ranks from ordered noncensored Box–Cox transformed measurements. To find the optimal Box–Cox transformation, we maximize the adjusted R² when estimating the mean and standard deviation through regression of empirical Gaussian quantiles on measurements. We demonstrate how to identify contamination and introduce a new command, ros. Real-life data illustrate the effectiveness of the ROS method.
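The core ROS computation can be sketched as follows. The simplifying assumptions are ours, not the ros command's:

- a single detection limit, with all censored values occupying the lowest ranks;
- Blom plotting positions (rank − 3/8)/(n + 1/4) for the Gaussian scores;
- a grid search over the Box–Cox λ in [−2, 2].

Following the abstract, Gaussian scores are regressed on the transformed noncensored measurements. The fitted line z = β0 + β1·w therefore gives σ = 1/β1 and μ = −β0/β1. With one regressor and a fixed sample, maximizing R² is equivalent to maximizing adjusted R². The detection limit and the simulated lognormal data are hypothetical, and the contamination diagnostics the article describes are not included.

```python
import math
import random
from statistics import NormalDist, mean

def box_cox(v, lam):
    """Box-Cox transform of a positive measurement v."""
    return math.log(v) if abs(lam) < 1e-12 else (v ** lam - 1) / lam

def inv_box_cox(w, lam):
    """Inverse Box-Cox transform; fails if w lies outside its range."""
    if abs(lam) < 1e-12:
        return math.exp(w)
    base = lam * w + 1
    if base <= 0:
        raise ValueError("value outside the range of the inverse transform")
    return base ** (1 / lam)

def ros_fit(values, n_censored, lam):
    """Regress Gaussian scores z on Box-Cox transformed noncensored values w.
    Censored values are assumed to hold the lowest n_censored ranks.
    Returns (mu, sigma, r2) on the transformed scale."""
    n = len(values) + n_censored
    w = sorted(box_cox(v, lam) for v in values)
    std_normal = NormalDist()
    # Blom plotting positions for ranks n_censored+1, ..., n
    z = [std_normal.inv_cdf((n_censored + i + 1 - 0.375) / (n + 0.25))
         for i in range(len(w))]
    wbar, zbar = mean(w), mean(z)
    sww = sum((wi - wbar) ** 2 for wi in w)
    szz = sum((zi - zbar) ** 2 for zi in z)
    swz = sum((wi - wbar) * (zi - zbar) for wi, zi in zip(w, z))
    slope = swz / sww                    # z = intercept + slope * w
    intercept = zbar - slope * wbar
    r2 = swz ** 2 / (sww * szz)
    return -intercept / slope, 1 / slope, r2

def reference_interval(values, n_censored, coverage=0.95):
    """Pick lambda on a grid by maximal R^2, then back-transform mu +/- q*sigma."""
    grid = [round(-2 + 0.05 * k, 10) for k in range(81)]
    lam = max(grid, key=lambda g: ros_fit(values, n_censored, g)[2])
    mu, sigma, _ = ros_fit(values, n_censored, lam)
    q = NormalDist().inv_cdf(0.5 + coverage / 2)
    return lam, inv_box_cox(mu - q * sigma, lam), inv_box_cox(mu + q * sigma, lam)

# Hypothetical example: lognormal measurements with a detection limit of 1.5
random.seed(7)
raw = [random.lognormvariate(1.0, 0.5) for _ in range(500)]
dl = 1.5
observed = [v for v in raw if v >= dl]   # values reported numerically
n_cens = len(raw) - len(observed)        # only "below 1.5" is known for these
lam, lo, hi = reference_interval(observed, n_cens)
print(f"lambda = {lam:.2f}, 95% reference interval = [{lo:.2f}, {hi:.2f}]")
```

The lower bound can fall below the detection limit. This is the point of ROS: the fitted line extrapolates into the censored region instead of discarding it or substituting arbitrary values.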

Open Access Collection

The beyondpareto command for optimal extreme-value index estimation

König, Johannes; Schluter, Christian; Schröder, Carsten; Retter, Isabella; Beckmannshagen, Mattis

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322969

In this article, we introduce the command beyondpareto, which estimates the extreme-value index for distributions that are Pareto-like, that is, whose upper tails are regularly varying and eventually become Pareto. The estimation is based on rank-size regressions, and the threshold value for the upper-order statistics included in the final regression is determined optimally by minimizing the asymptotic mean squared error. An essential diagnostic tool for evaluating the fit of the estimated extreme-value index is the Pareto quantile–quantile plot, provided in the accompanying command pqqplot. The usefulness of our estimation approach is illustrated in several real-world examples focusing on the upper tail of German wealth and city-size distributions.
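A rank-size regression of the kind the abstract describes can be sketched in a few lines. The Python code below uses the Gabaix–Ibragimov "rank − 1/2" variant: it regresses log(rank − 1/2) on log(size) over the k largest observations. The slope then estimates −α, and the extreme-value index is ξ = 1/α. This is a swapped-in textbook variant, not necessarily the exact regression beyondpareto runs. The threshold k is also fixed by hand here rather than chosen by minimizing the asymptotic mean squared error as the command does. The simulated Pareto data are an assumption for illustration.

```python
import math
import random

def rank_size_alpha(sample, k):
    """Rank-size regression on the k largest observations:
    regress log(rank - 1/2) on log(size); the slope estimates -alpha."""
    top = sorted(sample, reverse=True)[:k]
    xs = [math.log(v) for v in top]
    ys = [math.log(r + 0.5) for r in range(k)]  # rank (r + 1) minus 1/2
    xbar, ybar = sum(xs) / k, sum(ys) / k
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return -sxy / sxx

# Simulated exact Pareto data with tail index alpha = 2 (so xi = 0.5)
random.seed(3)
data = [random.paretovariate(2.0) for _ in range(20000)]
alpha_hat = rank_size_alpha(data, k=2000)
print(f"alpha = {alpha_hat:.2f}, extreme-value index = {1 / alpha_hat:.2f}")
```

For data that are only Pareto-like, the choice of k trades bias (too many non-tail observations) against variance (too few observations). That is the trade-off the command's AMSE-based threshold addresses.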

A command to fit spatial stochastic frontier models with inefficiency spillovers

Du, Kerui; Galli, Federica; Wang, Luojia

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322970

The interdependence among decision-making units challenges the assumption of cross-sectional independence in traditional stochastic frontier models. Based on the seminal spatial Durbin specification for the frontier function, Galli (2023, Journal of the Royal Statistical Society, C ser., 72: 346–367) introduced inefficiency spillovers to measure neighborhood effects related to the inefficiency determinants. This article presents a new command, sfsd, that fits the comprehensive spatial stochastic frontier model that Galli (2023) proposed, accommodating various spatial and nonspatial specifications in both the frontier and the inefficiency equations. sfsd is the first command that includes different typologies of spatial spillovers in a stochastic frontier framework, facilitating the investigation of contemporary research topics such as agglomeration and technology diffusion at both the firm and the regional levels. The description, options, and illustrative examples for the command are outlined in this article.

Heckman sample-selection estimators under heteroskedasticity

Carlson, Alyssa H.; Zhao, Wei

2025 The Stata Journal: Promoting Communications on Statistics and Stata

doi: 10.1177/1536867x251322971

This article provides a practical guide for Stata users on the consequences of heteroskedasticity in sample-selection models. We review the properties of two Heckman sample-selection estimators, full-information maximum likelihood (FIML) and limited-information maximum likelihood (LIML), under heteroskedasticity. In this case, FIML is inconsistent, while LIML can be consistent in certain settings. For the LIML estimator under heteroskedasticity, we show that standard Stata commands are unable to produce correct standard errors, and we instead suggest the community-contributed command gtsheckman (Carlson 2022, Statistical Software Components S459109, Department of Economics, Boston College; 2024, Stata Journal 24: 687–710). Because heteroskedasticity affects the performance of these two estimators, we also offer guidance on how to test for heteroskedasticity and on the conditions needed for the LIML estimator to be consistent. Monte Carlo simulations illustrate that the suggested testing procedures perform well in terms of size and power.
