journal article
LitStream Collection
doi: 10.1177/1536867X1101100202
At the heart of many econometric models are a linear function and a normal error. Examples include the classical small-sample linear regression model and the probit, ordered probit, multinomial probit, tobit, interval regression, and truncated-distribution regression models. Because the normal distribution has a natural multidimensional generalization, such models can be combined into multiequation systems in which the errors share a multivariate normal distribution. The literature has historically focused on multistage procedures for fitting mixed models, which are more efficient computationally, if less so statistically, than maximum likelihood. Direct maximum likelihood estimation has been made more practical by faster computers and by simulated likelihood methods for estimating higher-dimensional cumulative normal distributions, such as the Geweke–Hajivassiliou–Keane algorithm (Geweke, 1989, Econometrica 57: 1317–1339; Hajivassiliou and McFadden, 1998, Econometrica 66: 863–896; Keane, 1994, Econometrica 62: 95–116). Maximum likelihood also facilitates a generalization to switching, selection, and other models in which the number and types of equations vary by observation. The Stata command cmp fits seemingly unrelated regressions models of this broad family. Its estimator is also consistent for recursive systems in which all endogenous variables appear on the right-hand sides as observed. If all the equations are structural, then estimation is full-information maximum likelihood; if only the final stage or stages are structural, then it is limited-information maximum likelihood. cmp can mimic a score of built-in and user-written Stata commands, and it is also appropriate for a panoply of models that previously were hard to estimate. Heteroskedasticity, however, can render cmp inconsistent.
This article explains the theory and implementation of cmp and of a related Mata function, ghk2(), that implements the Geweke–Hajivassiliou–Keane algorithm.
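To make the Geweke–Hajivassiliou–Keane idea concrete, here is a minimal Python sketch of the textbook GHK simulator for a multivariate normal probability P(X ≤ b) with X ~ N(0, Σ). It is not the code of cmp or ghk2(): the helper names cholesky() and ghk_probability() are hypothetical, only upper bounds are handled, and plain pseudorandom uniforms stand in for whatever draws ghk2() actually uses. Each draw factors Σ = LLᵀ and then, dimension by dimension, multiplies a weight by the normal probability of satisfying the current bound and draws the next component from the correspondingly truncated normal; the simulated probability is the average weight.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def cholesky(sigma):
    """Lower-triangular L with L times its transpose equal to sigma (plain lists)."""
    k = len(sigma)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def ghk_probability(upper, sigma, draws=10_000, seed=0):
    """Hypothetical GHK sketch: simulate P(X_1 <= upper_1, ..., X_k <= upper_k)."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    k = len(upper)
    total = 0.0
    for _ in range(draws):
        weight = 1.0
        e = [0.0] * k  # simulated standard-normal components of X = L e
        for j in range(k):
            # Upper bound on e_j implied by upper[j], given e_0 .. e_{j-1}
            bound = (upper[j] - sum(L[j][i] * e[i] for i in range(j))) / L[j][j]
            p = _STD_NORMAL.cdf(bound)
            weight *= p
            if p == 0.0:
                break  # this draw contributes nothing
            # Draw e_j from the standard normal truncated to (-inf, bound]
            q = max(rng.random() * p, 1e-300)
            e[j] = _STD_NORMAL.inv_cdf(q)
        total += weight
    return total / draws
```

For a bivariate normal with correlation 0.5 and b = (0, 0), ghk_probability([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]) should land close to the exact value 1/4 + arcsin(0.5)/(2π) = 1/3. Because every draw satisfies the bounds by construction, the weights are smooth in the parameters, which is what makes GHK attractive inside a likelihood maximizer.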
Silva, J. M. C. Santos; Tenreyro, Silvana
doi: 10.1177/1536867X1101100203
In this article, we identify and illustrate some shortcomings of the poisson command in Stata. Specifically, we point out that the command fails to check for the existence of the estimates, and we show that it is very sensitive to numerical problems. While these are serious problems that may prevent users from obtaining estimates or may even produce spurious and misleading results, we show that the informed user often has simple workarounds available for addressing these problems.
De Luca, Giuseppe; Perotti, Valeria
doi: 10.1177/1536867X1101100204
We introduce two new Stata commands for the estimation of an ordered response model with sample selection. The opsel command uses a standard maximum-likelihood approach to fit a parametric specification of the model where errors are assumed to follow a bivariate Gaussian distribution. The snpopsel command uses the semi-nonparametric approach of Gallant and Nychka (1987, Econometrica 55: 363–390) to fit a semiparametric specification of the model where the bivariate density function of the errors is approximated by a Hermite polynomial expansion. The snpopsel command extends the set of Stata routines for semi-nonparametric estimation of discrete response models. Compared with the other semi-nonparametric estimators, our routine is faster because it is programmed in Mata. In addition, we provide new postestimation routines to compute linear predictions, predicted probabilities, and marginal effects. These improvements are also extended to the set of semi-nonparametric Stata commands originally written by Stewart (2004, Stata Journal 4: 27–39) and De Luca (2008, Stata Journal 8: 190–220). The new opsel and snpopsel commands are illustrated through an empirical application on self-reported health with selectivity due to sample attrition.
Kunz, Cornelia U.; Kieser, Meinhard
doi: 10.1177/1536867X1101100205
This article describes a new Stata command called simontwostage, which calculates the critical values and sample sizes for two-stage designs for phase II oncology trials. Options are provided to determine the minimax and optimal designs proposed by Simon (1989, Controlled Clinical Trials 10: 1–10) and admissible designs described by Jung et al. (2004, Statistics in Medicine 23: 561–569). Furthermore, nonstochastic and stochastic curtailment rules can be implemented in both stages of the trial, and the properties of the curtailed designs can be examined.
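The quantities that such a command reports for a given two-stage design can be sketched in a few lines. The Python below is a hypothetical helper, not the implementation of simontwostage: it evaluates a fixed Simon design (r1/n1, r/n) at a true response rate p, returning the probability of early termination, the probability of declaring the drug promising, and the expected sample size. It does not search for minimax, optimal, or admissible designs, and it ignores curtailment.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); 0 when x < 0."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(x, n) + 1))


def simon_characteristics(r1, n1, r, n, p):
    """Operating characteristics of a Simon two-stage design at response rate p.

    Stage 1: stop for futility if <= r1 responses among n1 patients.
    Stage 2: enroll n - n1 more; the drug is rejected if total responses <= r.
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)  # probability of early termination
    # Continue past stage 1 but still finish with <= r responses in total
    fail_late = sum(
        binom_pmf(x1, n1, p) * binom_cdf(r - x1, n2, p)
        for x1 in range(r1 + 1, min(n1, r) + 1)
    )
    promising = 1.0 - pet - fail_late  # P(declare the drug promising)
    expected_n = n1 + (1.0 - pet) * n2
    return {"pet": pet, "reject_prob": promising, "expected_n": expected_n}
```

As a usage example, the design 1/10, 5/29 (often quoted as Simon's optimal design for p0 = 0.10, p1 = 0.30, α = 0.05, β = 0.20) gives, at p = 0.10, an early-termination probability of about 0.74, an expected sample size of about 15, and a probability of declaring the drug promising just under 0.05; at p = 0.30 that probability is just above 0.80. A design search would simply loop this function over candidate (r1, n1, r, n).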
doi: 10.1177/1536867X1101100206
An extension of mvmeta, my program for multivariate random-effects meta-analysis, is described. The extension handles meta-regression. Estimation methods available are restricted maximum likelihood, maximum likelihood, method of moments, and fixed effects. The program also allows a wider range of models (Riley's overall correlation model and structured between-studies covariance); better estimation (using Mata for speed and correctly allowing for missing data); and new postestimation facilities (I-squared, standard errors and confidence intervals for between-studies standard deviations and correlations, and identification of the best intervention). The program is illustrated using a multiple-treatments meta-analysis.
Tebaldi, Pietro; Bonetti, Marco; Pagano, Marcello
doi: 10.1177/1536867X1101100207
We implement the commands mstat and mtest to perform inference based on the M statistic, a statistic that can be used to compare the interpoint distance distribution across groups of observations.
The analyses are based on the study of the interpoint distances between n points in a k-dimensional setting to produce a one-dimensional real-valued test statistic. The locations are distributed in a region of the plane. When we consider all n(n − 1)/2 interpoint distances, the dependencies among them are difficult to express analytically, but their distribution is informative, and the M statistic can be built to summarize one aspect of this information.
The two commands can be used on a wide class of datasets to test the null hypothesis that two groups have the same (spatial) distribution. mstat and mtest return the exact M test statistic. Moreover, mtest executes a Monte Carlo–type permutation test, which returns the empirical p-value together with its confidence interval. This is the command to use in most situations, because the convergence of M to its asymptotic chi-squared distribution is slow.
Both commands can be used to obtain graphical output of the empirical density function of the interpoint distance distributions in the two groups and the two-dimensional map of the n observations in the plane.
The descriptions of the commands are accompanied by examples of applications with real and simulated data. We run the test on the Alt and Vach grave site dataset (Manjourides and Pagano, forthcoming, Statistics in Medicine) and reject the null hypothesis, in contradiction to other published analyses. We also show how to adapt the techniques to discrete datasets with more than one unit in each location. Finally, we report an extensive application on breast cancer data in Massachusetts; in the application, we show the compatibility of the M commands with Pisati's spmap package.
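The Monte Carlo permutation logic described for mtest can be sketched generically in Python. Because the formula for M is not given here, the sketch uses a hypothetical stand-in statistic (the absolute difference between the two groups' mean within-group interpoint distances) and a Wilson score interval for the Monte Carlo p-value; both choices are assumptions, not what mstat or mtest compute. Group labels are shuffled repeatedly, and the p-value is the share of relabelings whose statistic is at least as large as the observed one.

```python
import math
import random
from itertools import combinations


def mean_interpoint_distance(points):
    """Average Euclidean distance over all pairs of points (needs >= 2 points)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def stand_in_statistic(group_a, group_b):
    # Hypothetical stand-in for M: compares one summary of the two groups'
    # interpoint distance distributions.
    return abs(mean_interpoint_distance(group_a) - mean_interpoint_distance(group_b))


def permutation_test(group_a, group_b, reps=999, seed=1, z=1.96):
    """Monte Carlo permutation p-value with a Wilson score interval."""
    rng = random.Random(seed)
    observed = stand_in_statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        if stand_in_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    p_hat = hits / reps
    # Wilson interval for the Monte Carlo estimate of the p-value
    denom = 1 + z * z / reps
    center = (p_hat + z * z / (2 * reps)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / reps + z * z / (4 * reps * reps)) / denom
    return observed, p_hat, (max(0.0, center - half), min(1.0, center + half))
```

The interval reflects only Monte Carlo error from using a finite number of relabelings, so it narrows as reps grows; the Wilson form is used here because it stays informative even when no relabeling beats the observed statistic.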
doi: 10.1177/1536867X1101100208
The Stata 11 margins command makes it easier to estimate adjusted risk ratios, and the new robust variance option for xtpoisson, fe provides correct confidence intervals for adjusted risk ratios from matched-cohort data.
doi: 10.1177/1536867X1101100209
Generating random samples in Stata is straightforward when the distribution is uniform or normal. For other distributions, the inverse method can be used, but even then the user is limited to the built-in inverse functions. For any other distribution function, the inverse must be derived analytically, or numerical methods must be used when analytical derivation is tedious or impossible. In this article, I introduce a command that generates a random sample from any user-specified distribution function; because it relies on numerical methods, the command is very generic.
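The general idea, inverting a user-supplied distribution function numerically, can be sketched in Python. This sketch assumes a continuous, nondecreasing CDF whose support lies inside a known bracket [lo, hi], and it uses bisection as the root finder; the function names are hypothetical, and the command described in the article may use a different numerical method. Each uniform draw u is mapped to the x that solves F(x) = u.

```python
import math
import random


def numeric_inverse(cdf, u, lo, hi, tol=1e-10, max_iter=200):
    """Solve cdf(x) = u on [lo, hi] by bisection (cdf assumed nondecreasing)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid  # the solution lies to the right of mid
        else:
            hi = mid  # the solution lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def sample_from_cdf(cdf, n, lo, hi, seed=0):
    """Inverse-transform sampling: push n uniform draws through the numeric inverse."""
    rng = random.Random(seed)
    return [numeric_inverse(cdf, rng.random(), lo, hi) for _ in range(n)]
```

For example, sample_from_cdf(lambda t: 1 - math.exp(-2 * t), 1000, 0.0, 50.0) draws from an exponential distribution with rate 2, whose mean is 0.5. Bisection is slower than Newton-type methods, but it needs only CDF evaluations and always converges for a monotone CDF once the bracket contains the solution.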
doi: 10.1177/1536867X1101100210
Many problems in data management center on relating values to values in other observations, either within a dataset as a whole or within groups such as panels. This column reviews some basic Stata techniques helpful for such tasks, including the use of subscripts, summarize, by:, sum(), cond(), and egen. Several techniques exploit the fact that logical expressions yield 1 when true and 0 when false. Dividing by zero to yield missings is revealed as a surprisingly valuable device.