Creating LaTeX Documents from within Stata using Texdoc
Jann, Ben
doi: 10.1177/1536867X1601600201
I discuss the use of texdoc for creating LaTeX documents from within Stata. Specifically, texdoc provides a way to embed LaTeX code directly in a do-file and to automate the integration of results from Stata in the final document. One can use the command, for example, to assemble automatic reports, write a Stata Journal article, prepare slides for classes, or put together solutions for homework assignments.
Assessing Inequality Using Percentile Shares
Jann, Ben
doi: 10.1177/1536867X1601600202
At least since Thomas Piketty's best-selling Capital in the Twenty-First Century (2014, Cambridge, MA: The Belknap Press), percentile shares have become a popular approach for analyzing distributional inequalities. In their work on the development of top incomes, Piketty and collaborators typically report top-percentage shares, using varying percentages as thresholds (top 10%, top 1%, top 0.1%, etc.). However, analysis of percentile shares at other positions in the distribution may also be of interest. In this article, I present a new command, pshare, that estimates percentile shares from individual-level data and displays the results using histograms or stacked bar charts.
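As a rough illustration of what a percentile share is, the following Python sketch computes the share of the total held by each population group from unweighted individual-level data. It is not the pshare command and does not reproduce its estimator, standard errors, or graphs; the function name `percentile_shares` and the rule of splitting an observation proportionally when a cutoff falls inside it are assumptions made for this example.

```python
def percentile_shares(values, cutoffs):
    """Share of the total held by each percentile group.

    values  : individual-level outcomes (for example, incomes), unweighted
    cutoffs : ascending cumulative population proportions ending at 1.0,
              e.g. [0.5, 0.9, 1.0] for bottom 50%, next 40%, top 10%
    Assumption: an observation that straddles a cutoff is split
    proportionally between the two neighbouring groups.
    """
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    shares = []
    lower = 0.0
    for upper in cutoffs:
        lo, hi = lower * n, upper * n
        group_sum = 0.0
        for i, y in enumerate(ordered):
            # Observation i occupies the population interval [i, i + 1).
            overlap = min(hi, i + 1) - max(lo, i)
            if overlap > 0:
                group_sum += y * overlap
        shares.append(group_sum / total)
        lower = upper
    return shares


if __name__ == "__main__":
    incomes = [1, 2, 3, 4]
    # Bottom half versus top half of a four-person population.
    print(percentile_shares(incomes, [0.5, 1.0]))
```

Top-percentage shares of the kind reported by Piketty and collaborators correspond to the last group when the final cutoffs are, say, 0.9 and 1.0.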
Regression Models for Bivariate Count Outcomes
Xu, Xinling; Hardin, James W.
doi: 10.1177/1536867X1601600203
We present a new command, bivcnto, for fitting regression models suitable for analyzing correlated count outcomes. bivcnto allows specification of two correlated count outcomes with either two outcome-specific covariate lists or one common covariate list and fits models using a copula function approach in the general case or using specific parameterizations by Marshall and Olkin (1985, Journal of the American Statistical Association 80: 332–338) or Famoye (2010a, Journal of Applied Statistics 37: 969–981; 2010b, Statistica Neerlandica 64: 112–124). bivcnto also calculates a likelihood-ratio test comparing the joint model with estimation of two independent outcome-specific models.
Implementing Weighted-average Estimation of Substance Concentration Using Multiple Dilutions
Xu, Ying; Milligan, Paul; Remarque, Edmond J.; Cheung, Yin Bun
doi: 10.1177/1536867X1601600204
In medicine and chemistry, immunoassays are often used to measure substance concentration. These tests use an S-shaped standard curve to map the observed optical responses to the underlying concentration. The enzyme-linked immunosorbent assay is one such test that is commonly used to measure antibody concentration in vaccine and infectious disease research. The enzyme-linked immunosorbent assay and other immunoassays usually involve a series of doubling or tripling dilutions of the test samples so that some of the diluted samples fall within the near-linear range in the center of the standard curve. The dilution that falls within or is nearest to the center of the near-linear range may then be selected for statistical analysis. This common practice of using one dilution does not fully use the information from multiple dilutions and reduces accuracy. We describe a recently proposed weighted-average estimation approach for analyzing multiple-dilution data (Cheung et al. 2015, Journal of Immunological Methods 417: 115–123), and we present the new wavemid command, which carries out the approach. We also present the new command midreshape, which processes raw data in text format exported from some microplate readers into analyzable data format. We use data from an experimental study of malaria vaccine candidates to demonstrate use of the two commands.
Inference in Regression Discontinuity Designs under Local Randomization
Cattaneo, Matias D.; Titiunik, Rocío; Vazquez-Bare, Gonzalo
doi: 10.1177/1536867X1601600205
We introduce the rdlocrand package, which contains four commands to conduct finite-sample inference in regression discontinuity (RD) designs under a local randomization assumption, following the framework and methods proposed in Cattaneo, Frandsen, and Titiunik (2015, Journal of Causal Inference 3: 1–24) and Cattaneo, Titiunik, and Vazquez-Bare (2016, Working Paper, University of Michigan, http://www-personal.umich.edu/~titiunik/papers/CattaneoTitiunikVazquezBare2015_wp.pdf). Assuming a known assignment mechanism for units close to the RD cutoff, these functions implement a variety of procedures based on randomization inference techniques. First, the rdrandinf command uses randomization methods to conduct point estimation, hypothesis testing, and confidence interval estimation under different assumptions. Second, the rdwinselect command uses finite-sample methods to select a window near the cutoff where the assumption of randomized treatment assignment is most plausible. Third, the rdsensitivity command uses randomization techniques to conduct a sequence of hypothesis tests for different windows around the RD cutoff, which can be used to assess the sensitivity of the methods and to construct confidence intervals by inversion. Finally, the rdrbounds command implements Rosenbaum (2002, Observational Studies [Springer]) sensitivity bounds for the context of RD designs under local randomization. Companion R functions with the same syntax and capabilities are also provided.
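The core idea of randomization inference inside a window can be sketched in a few lines of Python. This is not the rdrandinf command: the function name `randomization_pvalue`, the difference-in-means statistic, the fixed-margins (complete randomization) assignment mechanism, and the Monte Carlo approximation of the permutation distribution are all assumptions chosen for illustration.

```python
import random


def randomization_pvalue(running, outcome, cutoff, half_width,
                         reps=2000, seed=12345):
    """Permutation test of the sharp null of no effect inside a window.

    Units with |running - cutoff| <= half_width are kept; units at or
    above the cutoff count as treated. Treatment labels are reshuffled
    with the number of treated units held fixed (an assumed mechanism).
    Returns (observed difference in means, two-sided p-value).
    """
    kept = [(r >= cutoff, y) for r, y in zip(running, outcome)
            if abs(r - cutoff) <= half_width]
    flags = [t for t, _ in kept]
    ys = [y for _, y in kept]
    n_treated = sum(flags)
    if n_treated == 0 or n_treated == len(ys):
        raise ValueError("window needs units on both sides of the cutoff")

    def diff_in_means(labels):
        treated = [y for f, y in zip(labels, ys) if f]
        control = [y for f, y in zip(labels, ys) if not f]
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = diff_in_means(flags)
    rng = random.Random(seed)
    shuffled = flags[:]
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(diff_in_means(shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / reps


if __name__ == "__main__":
    x = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
    y = [0, 1, 0, 1, 0, 10, 11, 10, 11, 10]
    print(randomization_pvalue(x, y, cutoff=0, half_width=5))
```

Repeating the test over a sequence of window widths gives the flavor of a sensitivity analysis across windows, although the package's own procedures are more elaborate than this sketch.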
Simpler Standard Errors for Two-stage Optimization Estimators
Terza, Joseph V.
doi: 10.1177/1536867X1601600206
Aiming to lessen the analytic and computational burden faced by practitioners seeking to correct the standard errors of two-stage estimators, I offer a heretofore unexploited simplification of the conventional formulation for the most commonly encountered cases in empirical application—two-stage estimators that involve maximum likelihood or pseudomaximum likelihood estimation. With the applied researcher in mind, I focus on the two-stage residual inclusion estimator designed for nonlinear regression models involving endogeneity. I demonstrate the analytics and Stata and Mata code for implementing my simplified standard-error formula by applying the two-stage residual inclusion method to the birthweight model of Mullahy (1997, Review of Economics and Statistics 79: 586–593) using his original data.
Igmobil: A Command for Intergenerational Mobility Analysis in Stata
Savegnago, Marco
doi: 10.1177/1536867X1601600207
In this article, I describe a new command, igmobil, that computes up to 20 intergenerational mobility (IGM) indices for continuous (for example, income or years of education) or discrete (for example, educational or occupational level) variables. I consider three classes of IGM indices: 1) single-stage indices, 2) indices derived from a transition matrix between parents' and children's socioeconomic status, and 3) indices based on inequality measures. Users may add a fourth class to specify any possible IGM index not included in igmobil. Standard errors and confidence intervals are calculated using a bootstrap procedure. Users can customize many aspects of the program output, including the type and dimension of the transition matrix, the parameters for some IGM indices (like the ones involving generalized entropy measures and the Atkinson index), and how standard errors and confidence intervals are calculated.
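To make the second class of indices concrete, the Python sketch below builds a row-normalized transition matrix from parents' and children's class codes and computes one well-known trace-based mobility index, (k − trace(P)) / (k − 1). It is only an illustration: the function names are hypothetical, and the abstract does not say which transition-matrix indices igmobil reports or how it defines them.

```python
def transition_matrix(parent_classes, child_classes, n_classes):
    """Row-normalized transition matrix P, where P[i][j] is the share of
    children in class j among those whose parents were in class i.
    Classes are assumed to be coded 0, 1, ..., n_classes - 1.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for p, c in zip(parent_classes, child_classes):
        counts[p][c] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no parents are left as zeros rather than dividing by 0.
        matrix.append([x / total if total else 0.0 for x in row])
    return matrix


def trace_mobility_index(matrix):
    """Trace-based index: 0 when every child stays in the parental class
    (identity matrix); larger values indicate more movement."""
    k = len(matrix)
    trace = sum(matrix[i][i] for i in range(k))
    return (k - trace) / (k - 1)


if __name__ == "__main__":
    parents = [0, 0, 1, 1]
    children = [0, 1, 1, 1]
    P = transition_matrix(parents, children, n_classes=2)
    print(P, trace_mobility_index(P))
```

A bootstrap standard error for such an index could be obtained by resampling parent-child pairs and recomputing it, which is the general kind of procedure the abstract mentions.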
Fixed Effects in Unconditional Quantile Regression
Borgen, Nicolai T.
doi: 10.1177/1536867X1601600208
Unconditional quantile regression has quickly become popular after being introduced by Firpo, Fortin, and Lemieux (2009, Econometrica 77: 953–973) and is easily implemented using the user-written command rifreg by the same authors. However, including high-dimensional fixed effects in rifreg is quite burdensome and sometimes even impossible. In this article, I show that when the number of fixed effects is large, the computational speed is massively increased by using xtreg rather than regress to fit the unconditional quantile regression models. I also introduce the xtrifreg command, which should be considered a supplement to rifreg. The xtrifreg command has many of the same features as rifreg but can be used to include a large number of fixed effects, to estimate cluster–robust standard errors, and to estimate cluster–bootstrapped standard errors.
Calculate Travel Time and Distance with OpenStreetMap Data Using the Open Source Routing Machine (OSRM)
Huber, Stephan; Rust, Christoph
doi: 10.1177/1536867X1601600209
In this article, we introduce the osrmtime command, which calculates the distance and travel time between two points using latitude and longitude information. The command uses the Open Source Routing Machine (OSRM) and OpenStreetMap to find the optimal route by car, by bicycle, or on foot. The procedure is specially built for large georeferenced datasets. The command is fast, uses the full computational capacity of a PC, allows the user to make unlimited requests, and is independent of the Internet and commercial online providers. Hence, there is no risk of the command becoming obsolete. Moreover, the results can be replicated at any time.
Panel Time Series: Review of the Methodological Evolution
Burdisso, Tamara; Sangiácomo, Máximo
doi: 10.1177/1536867X1601600210
In this article, we discuss the econometric treatment of macropanels, also known as panel time series. This new approach rejects the assumption of slope homogeneity and handles nonstationarity. It also recognizes that cross-section dependence (that is, some correlation structure in the error term between units due to unobservable common factors) squanders the efficiency gains from operating with a panel. This approach uses a new set of estimators known in the literature as common correlated effects estimators, which essentially augment the model to be fit with the cross-section averages, at each time t, of both the dependent variable and the individual-specific regressors. We present two commands developed for the evaluation and treatment of cross-section dependence.
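The augmentation step behind the common correlated effects idea can be sketched as follows in Python. This shows only how the cross-section averages are formed and attached to each unit's data for one dependent variable and one regressor in a balanced panel; it is not either of the two commands presented in the article, and the function name and data layout are assumptions for this example.

```python
def add_cross_section_averages(panel):
    """Attach the period-t cross-section averages to every observation.

    panel : dict mapping unit -> list of (y, x) tuples, one per period,
            with the same number of periods for every unit (balanced).
    Returns a dict mapping unit -> list of (y, x, y_bar_t, x_bar_t).
    """
    units = list(panel)
    n_periods = len(panel[units[0]])
    if any(len(panel[u]) != n_periods for u in units):
        raise ValueError("this sketch assumes a balanced panel")

    augmented = {u: [] for u in units}
    for t in range(n_periods):
        # Averages across units within period t.
        y_bar = sum(panel[u][t][0] for u in units) / len(units)
        x_bar = sum(panel[u][t][1] for u in units) / len(units)
        for u in units:
            y, x = panel[u][t]
            augmented[u].append((y, x, y_bar, x_bar))
    return augmented


if __name__ == "__main__":
    data = {
        "A": [(1.0, 2.0), (3.0, 4.0)],
        "B": [(3.0, 6.0), (5.0, 8.0)],
    }
    print(add_cross_section_averages(data)["A"])
```

Each unit's regression of y on x would then include y_bar_t and x_bar_t as additional regressors, which act as proxies for the unobservable common factors; allowing separate coefficients per unit is what accommodates slope heterogeneity.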