Odds Ratios and Logistic Regression: Further Examples of Their Use and Interpretation
Hailpern, Susan M.; Visintainer, Paul F.
doi: 10.1177/1536867X0300300301
pmid: N/A
Logistic regression is perhaps the most widely used method for adjustment of confounding in epidemiologic studies. Its popularity is understandable. The method can simultaneously adjust for confounders measured on different scales; it provides estimates that are clinically interpretable; and its estimates are valid in a variety of study designs with few underlying assumptions. To those of us in practice settings, however, several aspects of applying and interpreting the model can be confusing and counterintuitive. We attempt to clarify some of these points through several examples. We apply the method to a study of risk factors associated with periventricular leucomalacia and intraventricular hemorrhage in neonates. We relate the logit model to Cornfield's 2 × 2 table and discuss its application to both cohort and case–control study designs. Interpretations of odds ratios, relative risk, and β0 from the logit model are presented.
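The link between the logit model and a 2 × 2 table can be illustrated with a minimal sketch. It assumes a single binary exposure, in which case the model is saturated and the maximum likelihood estimates have a closed form: β0 is the log odds of the outcome among the unexposed, and β1 is the log of the cross-product odds ratio. The counts and the function name `logit_from_table` are invented for illustration and are not taken from the article.

```python
import math

# Hypothetical 2 x 2 table (counts invented for illustration only):
#                 outcome   no outcome
# exposed           a=30       b=70
# unexposed         c=10       d=90
a, b, c, d = 30, 70, 10, 90


def logit_from_table(a, b, c, d):
    """Closed-form logit coefficients for one binary exposure."""
    beta0 = math.log(c / d)              # log odds of outcome among unexposed
    beta1 = math.log((a * d) / (b * c))  # log odds ratio, exposed vs unexposed
    return beta0, beta1


beta0, beta1 = logit_from_table(a, b, c, d)

odds_ratio = math.exp(beta1)                   # same as (a*d)/(b*c)
relative_risk = (a / (a + b)) / (c / (c + d))  # meaningful in a cohort design
baseline_risk = 1 / (1 + math.exp(-beta0))     # inverse logit of beta0

print(f"odds ratio    = {odds_ratio:.3f}")
print(f"relative risk = {relative_risk:.3f}")
print(f"baseline risk = {baseline_risk:.3f}")
```

In a case–control design the odds ratio is still estimable, but β0 depends on the sampling fractions of cases and controls, so the baseline risk and relative risk computed this way would only be interpretable for cohort data.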
Tools for Analyzing Multiple Imputed Datasets
Carlin, John B.; Li, Ning; Greenwood, Philip; Coffey, Carolyn
doi: 10.1177/1536867X0300300302
pmid: N/A
The method of multiple imputation (MI) is used increasingly for analyzing datasets with missing observations. Two sets of tasks are required in order to implement the method: (a) generating multiple complete datasets in which missing values have been imputed by simulating from an appropriate probability distribution and (b) analyzing the multiple imputed datasets and combining complete data inferences from them to form an overall inference for parameters of interest. An increasing number of software tools are available for task (a), although this is difficult to automate, because the method of imputation should depend on the context and available covariate data. When the quantity of missing data is not great, the sensitivity of results to the imputation model may be relatively low. In this context, software tools that enable task (b) to be performed with similar ease to the analysis of a single dataset should facilitate the wider use of multiple imputation. Such tools need not only to implement techniques for inference from multiple imputed datasets but also to allow standard manipulations such as transformation and recoding of variables. In this article, we describe a set of Stata commands that we have developed for manipulating and analyzing multiple datasets.
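Task (b), combining complete-data inferences, is commonly done with Rubin's rules. The sketch below assumes that approach: the abstract does not say which combination rules the authors' commands use. The point estimates, variances, and the function name `combine_rubin` are hypothetical.

```python
import math
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)     # pooled point estimate
    w = mean(variances)         # within-imputation variance
    b = variance(estimates)     # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b     # total variance
    return q_bar, math.sqrt(t)


# Hypothetical results from m = 5 imputed datasets (invented numbers):
# one coefficient estimate and its squared standard error per dataset.
estimates = [0.52, 0.47, 0.55, 0.50, 0.46]
variances = [0.010, 0.012, 0.011, 0.009, 0.013]

q_bar, se = combine_rubin(estimates, variances)
print(f"pooled estimate = {q_bar:.3f}, pooled SE = {se:.4f}")
```

The between-imputation term is what separates this from simply averaging standard errors: it carries the extra uncertainty due to the missing data into the pooled standard error.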
Confidence Intervals and p-values for Delivery to the End User
Newson, Roger
doi: 10.1177/1536867X0300300303
pmid: N/A
Statisticians make their living producing confidence intervals and p-values. However, those in the Stata log are not ready for delivery to the end user, who usually wants to see statistical output either as a plot or as a table. This article describes a suite of programs used to convert Stata results to one or other of these forms. The eclplot package creates plots of estimates with confidence intervals, and the listtex package outputs a Stata dataset in the form of table rows that can be inserted into a plain TeX, LaTeX, HTML, or word processor table. To create a Stata dataset that can be output in these ways, we can use the parmest, dsconcat, and lincomest packages to create datasets with one observation per estimated parameter; the sencode, tostring, ingap, and reshape packages to process these datasets into a form ready to be output; and the descsave and factext packages to reconstruct, in the output dataset, categorical predictor variables represented by dummy variables in regression models.
Do-it-yourself Shuffling and the Number of Runs under Randomness
Smeeton, Nigel; Cox, Nicholas J.
doi: 10.1177/1536867X0300300304
pmid: N/A
A common class of problem in statistical science is estimating, as a benchmark, the probability of some event under randomness. For example, in a sequence of events in which several outcomes are possible and the length of the sequence and the number of outcomes of each type are known, the number of runs gives an indication of whether the outcomes are random, clustered, or alternating. This note explains and illustrates a simple method of random shuffling that is often useful. We show how the conditional probability distribution of the number of runs may be derived easily in Stata, thus yielding p-values for testing the null hypothesis that the type of outcome is random. We also compare our direct approach with that using the simulate command.
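The shuffling idea can be sketched outside Stata as follows. This is a Monte Carlo approximation written in Python, not the authors' Stata code: it repeatedly shuffles the observed sequence, which keeps the number of outcomes of each type fixed, and tabulates the number of runs. The example sequence, the seed, the number of replications, and the function names are all assumptions made for illustration.

```python
import random


def count_runs(seq):
    """Number of maximal blocks of identical adjacent outcomes."""
    if not seq:
        return 0
    return 1 + sum(1 for x, y in zip(seq, seq[1:]) if x != y)


def runs_distribution(seq, reps=10000, seed=12345):
    """Approximate the distribution of the number of runs under shuffling."""
    rng = random.Random(seed)   # private generator so results are repeatable
    work = list(seq)            # shuffle a copy, leave the input untouched
    counts = {}
    for _ in range(reps):
        rng.shuffle(work)       # conditions on the counts of each outcome type
        r = count_runs(work)
        counts[r] = counts.get(r, 0) + 1
    return {r: n / reps for r, n in sorted(counts.items())}


# Hypothetical observed sequence with three outcome types.
observed = list("AAABBBAACC")
obs_runs = count_runs(observed)
dist = runs_distribution(observed)

# Lower-tail p-value: few runs suggests clustering of like outcomes.
p_low = sum(p for r, p in dist.items() if r <= obs_runs)
print(f"observed runs = {obs_runs}, P(runs <= observed) ~ {p_low:.4f}")
```

A correspondingly large number of runs would instead point to alternation, so an upper-tail or two-sided p-value can be read off the same tabulated distribution.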
Lean Mainstream Schemes for Stata 8 Graphics
Juul, Svend
doi: 10.1177/1536867X0300300306
pmid: N/A
The new Stata 8 graphics are powerful and flexible. Now, a few months after the first release, the graphics still have some shortcomings—both in design and in the manual documenting the program—but progress is being made. The graph layout used throughout the Graphics Reference Manual has led some users to underestimate the potential of the program. This paper presents two schemes for a lean layout, conforming to the mainstream in scientific publishing.
Speaking Stata: Problems with Tables, Part I
Cox, Nicholas J.
doi: 10.1177/1536867X0300300308
pmid: N/A
Tables in some form or another are part and parcel of data management and analysis. The main general-purpose tabulation commands, tabulate, table, and tabstat, are reviewed and compared. When these do not provide a tabulation solution, one key strategy is to prepare the material for tabulation as a set of variables, after which the table itself can be presented with tabdisp or list. This is the first of two papers on this topic.