The Use of One- Versus Two-Tailed Tests to Evaluate Prevention Programs
Ringwalt, Chris; Paschall, M.J.; Gorman, Dennis; Derzon, James; Kinlaw, Alan
doi: 10.1177/0163278710388178 pmid: 21138911
Investigators have used both one- and two-tailed tests to determine the significance of findings yielded by program evaluations. While the literature on the appropriate use of each type of significance test is historically inconsistent, almost all authorities now agree that one-tailed tests are rarely (if ever) appropriate. A review of 85 published evaluations of school-based drug prevention curricula specified on the National Registry of Effective Programs and Practices revealed that 20% employed one-tailed tests and, within this subgroup, an additional 4% also employed two-tailed tests. The majority of publications either did not specify the type of statistical test employed or used some other criterion such as effect sizes or confidence intervals. Evaluators reported that they used one-tailed tests either because they stipulated the direction of expected findings in advance, or because prior evaluations of similar programs had yielded no negative results. The authors conclude that one-tailed tests should never be used because they introduce greater potential for Type I errors and create an uneven playing field when outcomes are compared across programs. The authors also conclude that the traditional threshold of significance that places α at .05 is arbitrary and obsolete, and that evaluators should consistently report the exact p values they find.
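The "uneven playing field" concern can be illustrated with a short sketch. It assumes a z statistic from a normal-theory test, which is a simplification chosen for illustration and not a method taken from the reviewed evaluations; the function names and the value z = 1.80 are hypothetical.

```python
import math


def normal_upper_tail(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_values(z: float) -> dict:
    """One-tailed (upper) and two-tailed p values for a z statistic."""
    one_tailed = normal_upper_tail(z)
    two_tailed = 2.0 * normal_upper_tail(abs(z))
    return {"one_tailed": one_tailed, "two_tailed": two_tailed}


# Hypothetical test statistic in the predicted direction.
result = p_values(1.80)
alpha = 0.05
significant_one_tailed = result["one_tailed"] < alpha
significant_two_tailed = result["two_tailed"] < alpha
```

For a statistic in the predicted direction, the one-tailed p value is half the two-tailed one, so the same data can fall below α = .05 under one test and above it under the other. Reporting the exact p value, as the authors recommend, makes the comparison explicit.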
Pseudo Cluster Randomization: Balancing the Disadvantages of Cluster and Individual Randomization
Melis, René J. F.; Teerenstra, S.; Olde Rikkert, M.G.M.; Borm, G.F.
doi: 10.1177/0163278710361925 pmid: 20457714
While designing a trial to evaluate a complex intervention, one may be confronted with the dilemma that randomization at the level of the individual patient risks contamination bias, whereas cluster randomization risks incomparability of study arms and recruitment problems. The literature provides only a few solutions to this dilemma, and these are not always feasible. As an alternative solution, we developed a new two-stage randomization method called pseudo cluster randomization. In the first stage, the clusters (e.g., recruiting physicians) are randomized into two groups: one group of clusters in which the majority of the participants (e.g., 80%) will receive the experimental treatment, and one group of clusters in which the majority will receive the control condition. In the second stage, participants are randomly assigned within clusters in the proportions determined by the first stage. This has important advantages. Compared with cluster randomization, the potential occurrence of baseline incomparability of treatment arms and poor recruitment is reduced, because the physicians who recruit the participants are unable to know in advance which treatment condition the next participant they recruit will be assigned to. Limiting the exposure of half of the physicians to the innovative intervention lowers the risk of contamination bias. When this type of contamination bias is present, pseudo cluster randomization can be more efficient than individual or cluster randomization in that a smaller number of study participants is needed to achieve a predefined power.
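The two-stage procedure can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name, the arm labels, and the use of an independent random draw per participant (which matches the stated proportion only in expectation) are assumptions made here.

```python
import random


def pseudo_cluster_randomize(clusters, majority=0.8, seed=None):
    """Two-stage allocation sketch.

    clusters: dict mapping a cluster id (e.g., a physician) to a list of
    participant ids. Returns (cluster_arm, assignment).
    """
    rng = random.Random(seed)

    # Stage 1: randomize clusters into two groups of (nearly) equal size.
    cluster_ids = list(clusters)
    rng.shuffle(cluster_ids)
    half = len(cluster_ids) // 2
    cluster_arm = {
        cid: "mostly_experimental" if i < half else "mostly_control"
        for i, cid in enumerate(cluster_ids)
    }

    # Stage 2: randomize each participant within the cluster, using the
    # probability implied by the cluster's stage-1 group.
    assignment = {}
    for cid, participants in clusters.items():
        if cluster_arm[cid] == "mostly_experimental":
            p_experimental = majority
        else:
            p_experimental = 1.0 - majority
        for pid in participants:
            draw = rng.random()
            assignment[pid] = "experimental" if draw < p_experimental else "control"
    return cluster_arm, assignment


# Hypothetical example: 4 physicians with 50 participants each.
example_clusters = {
    f"physician_{c}": [f"p{c}_{i}" for i in range(50)] for c in range(4)
}
cluster_arm, assignment = pseudo_cluster_randomize(example_clusters, seed=1)
```

Because each participant's arm is drawn only at enrollment, a recruiting physician cannot know the next allocation in advance, even though most participants in a given cluster end up in the same arm.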
Designing a Prospective Study When Randomization is Not Feasible
Linden, Ariel
doi: 10.1177/0163278710376824 pmid: 20696741
When conducting a randomized controlled trial (RCT) is unfeasible, the goal is to replicate the randomization process by creating a control group that is essentially equivalent to the treatment group on known pre-intervention characteristics and to assume that the remaining unknown characteristics will not bias the results. The strategies proposed in this article are based on the thesis that since only pre-intervention characteristics are used for adjustment, a comparable control group can be established as soon as the participant group is identified. Consequently, outcomes can be observed immediately after launching the initiative rather than waiting until study completion. The benefit is that significant treatment effects can be observed as they occur, or alternatively, the initiative can be cancelled if treatment effects are not attained by a certain time point. Although these methods can never ensure the same level of validity as an RCT, they are considered robust alternatives when randomization is impractical, and therefore a compelling study design for many commercial initiatives, such as disease management programs, benefit design changes, and pay-for-performance efforts. An obvious constraint is that treated participants must first be identified before suitable controls can be found. The preferred strategy is to enroll the entire treatment group within a narrow time frame. An alternative option is to have periodic enrollment periods with their respective treatment and control cohorts. The concept proposed in this article is intended to offer a robust alternative to the inadequate strategies currently being used in many health care settings where study findings may not be trusted, and thus decision makers remain uninformed as to whether an initiative is worth continuing or should be cancelled.
Joint Modeling of Longitudinal Data in Multiple Behavioral Change
Charnigo, Richard; Kryscio, Richard; Bardo, Michael T.; Lynam, Donald; Zimmerman, Rick S.
doi: 10.1177/0163278710392982 pmid: 21196429
Multiple behavioral change is an exciting and evolving research area, albeit one that presents analytic challenges to investigators. This manuscript considers the problem of jointly modeling trajectories for two or more possibly non-normally distributed dependent variables, such as marijuana smoking and risky sexual activity, collected longitudinally. Of particular scientific interest is applying such modeling to elucidate the nature of the interaction, if any, between an intervention and personal characteristics, such as sensation seeking and impulsivity. The authors describe three analytic approaches: generalized linear mixed modeling, group-based trajectory modeling, and latent growth curve modeling. In particular, the authors identify the strengths and weaknesses of these analytic approaches and assess their impact (or lack thereof) on the psychological and behavioral science literature. The authors also compare what investigators have been doing analytically versus what they might want to be doing in the future and discuss the implications for basic and translational research.
Assessing Costs and Potential Returns of Evidence-Based Programs for Seniors
Miller, Thomas R.; Dickerson, Justin B.; Smith, Matthew L.; Ory, Marcia G.
doi: 10.1177/0163278710393955 pmid: 21196430
The authors describe the customary tools used by health services researchers to conduct economic evaluations of health interventions. Recognizing the inherent challenges of these tools for utilization in contemporary public health practice, we recommend a practical cost-benefit analysis (PCBA) to allow public health practitioners to assess the economic merits of their existing public health programs. The PCBA estimates what health effects and corresponding medical cost avoidance would be required to support the costs associated with implementing a community-based prevention program. We apply the PCBA to evaluate a statewide evidence-based falls prevention program for seniors in Texas. We estimate a positive return on realized costs due to avoided direct and indirect medical expenses if the program averts 7 falls among 140 participants within the first year. While acknowledging the demonstrated health-related benefits of public health interventions, we provide a practical ex-post economic evaluation methodology to assess return on investment as a more simplistic yet effective alternative for public health practitioners versus contemporary analyses of health services researchers.
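The break-even reasoning behind this kind of analysis can be sketched as a single division. This is not the authors' PCBA; the function name is hypothetical and the dollar figures are invented purely for illustration, not taken from the Texas evaluation.

```python
import math


def break_even_events(program_cost: float, cost_avoided_per_event: float) -> int:
    """Smallest whole number of averted events (e.g., falls) whose avoided
    medical cost covers the cost of running the program."""
    if program_cost < 0 or cost_avoided_per_event <= 0:
        raise ValueError("costs must be non-negative and cost avoided per event positive")
    return math.ceil(program_cost / cost_avoided_per_event)


# Invented figures: a program costing 50,000 and 12,000 avoided per averted fall.
required_falls_averted = break_even_events(50_000, 12_000)  # 5
```

Any number of averted events at or above this threshold implies a non-negative return under the stated cost assumptions.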
Web-Based Application to Eliminate Five Contraindicated Multiple-Choice Question Practices
Brunnquell, Andreas; Degirmenci, Ümüt; Kreil, Sebastian; Kornhuber, Johannes; Weih, Markus
doi: 10.1177/0163278710370459 pmid: 20483716
Multiple-choice questions (MCQs) evaluate factual knowledge in medical education and have high reliability if constructed appropriately. However, many MCQs contain formal errors leading to reduced validity. The authors developed a Web application capable of recognizing and eliminating five frequent contraindicated practices in MCQs: negative stem, unfocused stem, cueing words, longest item = right item flaw, and stem/item similarities. The authors used simple string algorithms and dynamic comparisons with keywords. The system was successfully validated with a sample of approximately 800 continuous medical education (CME) questions, showing that the system automatically detects 60% of all formal didactic errors. Flaws not detected by the software can easily be avoided using quick manuals on item wording or clear instruction to the authors. The authors conclude that it is feasible to improve the quality of MCQs by designing a Web application that is capable of detecting common flaws by simple string operations.
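A rough idea of what detection "by simple string operations" might look like is sketched below for three of the five flaws. This is an illustration only and not the authors' application: the keyword lists, function names, and sample question are hypothetical, and the unfocused-stem and stem/item-similarity checks are omitted.

```python
import re

# Hypothetical keyword lists; a real tool would need curated vocabularies.
NEGATIVE_WORDS = {"not", "except"}
CUEING_WORDS = {"always", "never", "all", "none", "only"}


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def detect_flaws(stem, options, correct_index):
    """Return the names of the flaws found in one multiple-choice question."""
    flaws = []
    # Negative stem: the question itself contains a negation keyword.
    if NEGATIVE_WORDS & set(tokenize(stem)):
        flaws.append("negative stem")
    # Cueing words: any answer option contains an absolute term.
    if any(CUEING_WORDS & set(tokenize(option)) for option in options):
        flaws.append("cueing words")
    # Longest item = right item: the correct option is strictly the longest.
    lengths = [len(option) for option in options]
    longest = max(lengths)
    if lengths[correct_index] == longest and lengths.count(longest) == 1:
        flaws.append("longest item = right item")
    return flaws


sample_flaws = detect_flaws(
    "Which of the following is not a symptom of influenza?",
    ["Fever", "Cough", "A sudden and complete loss of colour vision", "Myalgia"],
    correct_index=2,
)
# sample_flaws == ["negative stem", "longest item = right item"]
```

Keyword matching of this kind is cheap but blunt, which is consistent with a tool that catches only a share of formal errors and leaves the rest to item-writing guidance.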
The Distribution of Outcomes Research Papers Across Clinical Journals
Goldsack, Jennifer; McLaughlin, Chris; Bristol, Mirar N.; Loeb, Alex; Bergey, Meredith; Sonnad, Seema S.
doi: 10.1177/0163278710394461 pmid: 21411472
This study examines the distribution of health outcomes research (HOR) studies in the clinical literature by clinical areas and journal impact factor. The authors reviewed 535 journals and divided the sample into higher and lower impact journals across four clinical areas. Mann-Whitney and Kruskal-Wallis tests were used to examine differences across four categories of outcomes research articles published, specifically the incidence of articles in higher versus lower impact journals and differences across clinical areas. All high-impact journals published more safety and quality articles than process assessment, quality of life, or cost analysis studies. The number of each type of outcomes research study published was highly variable across all clinical areas. Only arthritis and outcomes research journals showed statistically significant differences between higher versus lower impact journals. Authors may benefit from considering these differences in their clinical specialty area when deciding where to submit HOR studies.