TY - JOUR
AU - Armitage, Peter
AB - In the 1920s RA Fisher presented randomization as an essential ingredient of his approach to the design and analysis of experiments, validating significance tests. In its absence the experimenter had to rely on his judgement that the effects of biases could be discounted. Twenty years later, A Bradford Hill promulgated the random assignment of treatments in clinical trials as the only means of avoiding systematic bias between the characteristics of patients assigned to different treatments. The two approaches were complementary, Fisher appealing to statistical theory, Hill to practical needs. The two men remained on good terms throughout most of their careers.

Two great scientists, Ronald Aylmer Fisher (1890–1962) and Austin Bradford Hill (1897–1991), were influential proponents of randomized treatment assignment in comparative experiments: Fisher in the 1920s and 1930s, Hill in the 1940s and 1950s. Yet they rarely, if ever, referred to each other in their publications on this topic. The reason was not personal antagonism—they remained on good terms during this period; nor was there a generation gap—Fisher was the elder by only 7 years. Their fields of application were different, it is true, but they had this in common—that they were both deeply motivated by, and highly informed about, these applied fields. Fisher, of course, was a great geneticist and evolutionary biologist in his own right, as Sir Walter Bodmer will no doubt tell us. The reason for the lack of discourse between them was, in my view, that they inhabited quite different intellectual territories, approaching randomization from different points of view, with partially different purposes in mind. I want to look very briefly at their achievements and outlooks, and try to explain their attitudes to randomization.

Fisher was a mathematical prodigy at school and university, aided rather than hampered by severe myopia that encouraged mental rather than written work. His first mathematical publication appeared in his graduation year, 1912, but his great passion was for eugenics and this drew him to the study of statistics and genetics. In her superb biography of her father, Joan Fisher Box [1] refers to the next period as the ‘wilderness years’, as he fluctuated between school teaching and farming while continuing to produce important research work. The breakthrough came in 1919, with his appointment to Rothamsted Experimental Station, the major centre for agricultural research in the UK. During the next 15 or so years he revolutionized statistical theory and methodology. Before his time, in what JBS Haldane called the ‘pre-piscatorial era’, the theory of statistics had been extremely fragmentary, with many unsolved problems and a good deal of confusion, especially about the role of inverse probability—the assignment of probabilities to hypotheses, or what we should now call ‘Bayesian’ methods. Fisher produced a powerful theory of estimation and hypothesis testing free of such prior assumptions, backed up by penetrating mathematical results on the distributions of test statistics. For the first time it was possible to provide the research worker with a practical kit of tools for the analysis of many standard types of data. Fisher’s own book on statistical methods [2] was a hard nut to crack, but it engendered several other more approachable books during the inter-war period, such as those by Mather, Snedecor, Goulden, and Tippett.
Especially relevant for our present topic was Fisher’s work on experimental design, a topic he could be said to have invented. This was first presented in Fisher’s 1925 book [2], expanded in a 1926 paper [3], aimed at agricultural research workers, and developed more fully in his 1935 book The Design of Experiments [4]. One of its key features was the technique of random assignment of treatments or varieties to the field plots, and I shall say more about this later. Two points are worth making at this stage. Fisher always thought of design as going hand-in-hand with analysis. A design should maximize efficiency but must also provide a means of valid inference: randomization was an essential condition for the validity of Fisherian analysis. Secondly, note that the experiments considered by Fisher are the somewhat restricted class of controlled comparisons of treatments or varieties on arbitrarily variable experimental plots. The relevance of Fisher’s work to medical trials is clear, the ‘plots’ in this case being the individual patients.

Bradford Hill was the son of a distinguished physiologist, Sir Leonard Hill, and his intention to follow a medical career was thwarted by the onset of tuberculosis in 1917. He was an invalid for several years, during which he took a degree in economics by correspondence. He received help from the leading medical statistician of the time, Major Greenwood, who had worked in physiology under Hill’s father and also in statistics with Karl Pearson, the dominant figure in British statistics during the first two decades of the century. In 1927 Hill moved with Greenwood to the London School of Hygiene and Tropical Medicine and during the 1930s he researched mainly in occupational epidemiology. His renown in medical statistics started in 1937 with the publication of his textbook [5] based on a series of articles in the Lancet. This was entirely consistent with Fisherian methodology although the style was wholly different from Fisher’s, emphasizing practical snags and difficulties rather than theoretical minutiae. In this book he advocated randomization, although not initially distinguishing it from other apparently reasonable methods of avoiding allocation bias, such as alternation. He emphasized the distinction later and was able to put the method into practice in the trial of streptomycin in pulmonary tuberculosis and in the trial of pertussis vaccines, planned earlier but published later.

Let us now look in more detail at the reasons for randomization given by these two men. We should realize that Fisher was able to approach controlled experimentation with much stronger organizational support than Hill. Rothamsted had been the home of field trials for many decades, and his colleagues in the 1920s needed no persuasion of the need for simultaneous controlled comparisons. They knew better than to compare fertilizers on different fields, or to compare a particular variety one year with another variety in the previous year. The only question was how to allocate treatments to plots. By contrast, the case for controlled trials in medicine was far from secure in the late 1930s. Voices were certainly heard in praise of the pioneers of ‘fair’ comparisons, such as Lind, but others advocated historical comparisons or argued that controlled allocation ignored the uniqueness of the individual and was perhaps unethical. So Hill had to make the general case for simultaneous control as well as the more specific case for randomization.
Fisher repeatedly and consistently gave the same justification for randomization. In 1926 [3] he wrote:

    One way of making sure that a valid estimate of error will be obtained is to arrange the plots deliberately at random, so that no distinction can creep in between pairs of plots treated alike and pairs treated differently; in such a case an estimate of error, derived in the usual way from the variation of sets of plots treated alike, may be applied to test the significance of the observed difference between the averages of plots treated differently.

And again in The Design of Experiments [4] (the quotation being from p. 26 of the 6th edition, 1951):

    The purpose of randomisation ... is to guarantee the validity of the test of significance, this test being based on an estimate of error made possible by replication.

Fisher regarded randomization as the unique means by which the random error variance would be correctly estimated, thus validating the significance tests which were at the heart of his analyses. If, for instance, the agricultural experimenter tried to balance the perceived fertility of the plots receiving different treatments he would tend to overestimate the error, so that differences would actually be more significant than they appeared to be, although if he guessed wrong the effect might be in the opposite direction. Similarly, Fisher would criticize a common practice in animal experimentation, to balance the mean weights of animals in different groups. The aim to allow for relevant factors like soil fertility or animal weight was laudable, but should be approached by forming blocks of more homogeneous plots or animals and randomizing within these.

Moreover, randomization provided a partial safeguard against failure of the assumption usually made in the statistical theory, that random variation followed a normal distribution. In practice no distributions are exactly normal, and some are very far from being so, but in a certain sense the tests can be shown to be approximately correct even under non-normality. Furthermore, if the worst comes to the worst, an exact test can be carried out by considering the permutations that would have occurred to the data if all the possible random assignments had been made—a so-called ‘randomization test’ (a short code sketch is given at the end of this passage).

In 1947 [6] he again emphasized the crucial role of randomization:

    The theory of estimation presupposes a process of random sampling. All our conclusions within that theory rest on this basis; without it our tests of significance would be worthless. ... In controlled experimentation it has been found not difficult to introduce explicit and objective randomisation in such a way that the tests of significance are demonstrably correct. In other cases we must still act in faith that Nature has done the randomisation for us. ... We now recognise randomisation as a postulate necessary to the validity of our conclusions, and the modern experimenter is careful to make sure that this postulate is justified.

This passage echoes an earlier one from his book:

    Randomisation properly carried out ... relieves the experimenter from the anxiety of considering and estimating the magnitude of the innumerable causes by which his data may be disturbed. (ref. 4, section 20)

Fisher was able to relieve his own anxieties by the support given by Rothamsted to randomized experiments. Experimenters in areas other than agricultural research were less fortunate, and for most of them randomization remained an unfamiliar concept for many years.
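To make the argument concrete, here is a minimal sketch, in Python, of the kind of randomization test described above. The plot yields are invented toy numbers, not data from any experiment discussed in this paper; the point is only that, under the null hypothesis of no treatment effect, every possible assignment of the observed yields to the two treatments is equally likely, so the observed difference can be referred to the distribution of differences over all such re-assignments.

    # Exact two-sided randomization test for a two-group comparison.
    # Toy data: hypothetical plot yields, not taken from any real trial.
    from itertools import combinations

    treated = [4.8, 5.6, 6.1, 5.9]   # plots given treatment A (invented)
    control = [4.2, 5.0, 4.7, 5.3]   # plots given treatment B (invented)

    pooled = treated + control
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)

    # Enumerate every way the treatment labels could have been assigned
    # (here C(8, 4) = 70 possibilities) and recompute the mean difference.
    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_treated):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        if abs(diff) >= abs(observed) - 1e-12:   # at least as extreme as observed
            as_extreme += 1
        total += 1

    p_value = as_extreme / total   # exact randomization P-value
    print(f"observed difference = {observed:.3f}, randomization P = {p_value:.3f}")

For experiments of realistic size the complete enumeration becomes impracticable, and a large random sample of re-assignments is taken instead; the logic of the test is unchanged.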
In spite of his passionate advocacy of randomization, Fisher was not above attempting rescue operations on non-randomized data sets for which his help was sought. Indeed, he invented one of the principal methods for making adjustments to correct for known biases—the analysis of covariance. Two examples of such rescue operations come to mind. In 1931 he was consulted about the Lanarkshire milk experiment to determine the effect on child growth of milk supplements. The study, which was inadequately designed and somewhat flawed in its execution, attracted the attention of several leading statisticians, and Fisher and Bartlett [7] chipped in with a fairly cautious analysis contradicting the investigators’ apparently unjustified conclusions. Secondly, Atkins and Fisher [8] analysed data from a study in which soldiers were dosed with vitamin C until the urinary level reached saturation. Different groups showed different times to saturation, but a confounding factor had been identified that might have been responsible—namely whether they were dosed before or after breakfast.

Bradford Hill, having cut his teeth on epidemiological studies, would have seen confounding as the rule rather than the exception, and his natural tendency in a non-randomized study would have been to examine carefully all possible confounding factors. The first study of smoking and lung cancer by Doll and Hill [9] is a model for such an examination.

In his own advocacy of randomization, Hill would have been less than impressed by Fisher’s emphasis on exactness in statistical analysis. He had himself an acutely sensitive ‘feel’ for numerical data, and would tend to trust his own judgement rather than rely on formal calculations, which he would perform to satisfy the reader’s expectations rather than for his own edification. In 1952 [10] he gave three justifications for randomization:

    It ensures that neither our personal idiosyncrasies (our likes or dislikes consciously or unwittingly applied) nor our lack of balanced judgement has entered into the construction of the different treatment groups—the allocation has been outside our control and the groups are therefore unbiased; ... it removes the danger, inherent in an allocation based on personal judgement, that believing we may be biased in our judgements we endeavour to allow for that bias, to exclude it, and that in doing so we may overcompensate and by thus ‘leaning over backward’ introduce a lack of balance from the other direction; ... and, having used a random allocation, the sternest critic is unable to say when we eventually dash into print that quite probably the groups were differentially biased through our predilections or through our stupidity.

Hill follows this passage with a somewhat incautious sentence: ‘Once it has been decided that a patient is of the right type to be brought into the trial the random method of allocation removes all responsibility from the clinical observer’, which may have led Marks [11] to describe these statements as ‘naïve’. We may agree with that description in the sense that they do not depend on sophisticated theoretical reasoning, being largely a matter of common sense. But they were precisely the sort of arguments that would appeal to physicians eager to conduct reliable research. Hill would, I think, have been content with any pseudo-random method if it could be shown to avoid bias, but he realized that nothing short of randomization was likely to do that.
He remarked in a valedictory paper [12] that alternation of successive cases might have been successful if strictly adhered to, but, he wrote, ‘it’s a very large IF’. In helping to plan the early trials he became aware that alternation might be dangerous because foreknowledge of a patient’s assignment might affect the physician’s decision whether or not, or when, to enter the patient in the trial—a so-called ‘selection bias’. Similar objections applied to other deterministic methods such as the use of the last digit of the patient’s hospital number. Of course, even a random assignment must be made known only after a patient’s entry into the trial, otherwise a selection bias might still apply. As Hill [10] had emphasized, strict adherence to randomization is a sine qua non.

One of Hill’s ingenious inventions [13] was the use of restricted randomization or ‘permuted blocks’ in order to achieve near-equality in the numbers assigned to different treatments at any stage of the recruitment period. This was done by forming blocks of, say, ten successive patients and assigning equal numbers of patients randomly within each block (a short code sketch of such a scheme follows below). This might be done separately within prognostic groups or, in a multicentre study, separately for each centre. Fisher would have said that such restrictions, while admirable, should be taken into account in the analysis. Hill, I think, would have replied that the effect would be so small as to be ignorable. A similar point arises with modern methods, such as ‘minimization’, which are widely used for balancing groups for prognostic factors, where the effects on the analysis are usually ignored.

So, did Bradford Hill introduce randomization without any knowledge of Fisher’s work? Most certainly not, in my view. Hill moved in the inner circle of statisticians of the 1930s, and was a colleague of J Oscar Irwin, a keen Fisherian. He would have known about Fisher’s use of randomization in field trials and seen that it was the key to success in medical trials. Fisher, for his part, seems to have taken little interest in clinical medicine—I know of no written comment by him on clinical trials, although Hill once remarked to me that Fisher had suggested to him that randomization proportions should be altered dynamically as a function of the P-value from a significance test, so that as the difference became more significant a smaller proportion of patients received the apparently inferior treatment. Methods of dynamic allocation such as this have been studied a good deal in succeeding decades, although rarely used in practice [14]. It may be that in the 1930s Fisher thought that doctors would never accept controlled experimentation, and he may even have had ethical objections to the idea.

It is interesting to trace the warmth of the relationship between Fisher and Hill from a file of correspondence kindly made available by the curator of the Fisher Archive at the University of Adelaide. In 1929 Hill replies to a letter from Fisher, writing ‘Dear Sir’. By 1931, when Hill receives and declines an offer of a job with Fisher, they are ‘Dear Fisher’ and ‘Dear Bradford Hill’. By 1940 we have ‘My dear Fisher’ and ‘My dear Bradford Hill’, and in 1949 ‘Dear RA’ and ‘Dear Tony’. (Hill was always ‘Tony’ to his friends. His family members assert that his parents were confidently expecting a girl and had no boy’s name in readiness. Sir Leonard was given the job of combing through a book of boys’ names and got bored by the time he had reached the end of the ‘A’s. The name Austin never caught on.)
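Returning briefly to the permuted-block scheme mentioned above, a minimal sketch in Python of such restricted randomization might look as follows. The block size of ten, the two treatment labels, and the function name are illustrative assumptions rather than a reconstruction of the allocation lists actually used in Hill’s trials.

    # Permuted-block (restricted) randomization: within each block of
    # block_size successive patients, equal numbers are assigned at random
    # to each treatment, so group sizes stay nearly equal throughout
    # recruitment. Illustrative sketch only; all parameters are assumptions.
    import random

    def permuted_block_schedule(n_patients, block_size=10,
                                treatments=("A", "B"), seed=None):
        if block_size % len(treatments) != 0:
            raise ValueError("block size must be a multiple of the number of treatments")
        rng = random.Random(seed)
        schedule = []
        while len(schedule) < n_patients:
            block = list(treatments) * (block_size // len(treatments))
            rng.shuffle(block)            # random order within the block
            schedule.extend(block)
        return schedule[:n_patients]      # truncate the final, possibly partial, block

    # Example: 25 patients in blocks of ten; at no stage can the two group
    # sizes differ by more than half a block.
    print(permuted_block_schedule(25, seed=1))

In a multicentre trial, or within prognostic groups, the same routine would simply be run separately for each centre or group, as described above.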
By 1952 they were ‘My dear Ron’ and ‘My dear Tony’. But in 1958 and 1959 they exchanged cool letters about the data on inhaling from the case-control study of smoking and lung cancer. (‘Dear Fisher, I do not normally take notice of hearsay but I have recently been told on good authority that you are suggesting ...’; ‘Dear Bradford Hill, What a stuffy letter! ...’) I doubt whether they met or corresponded much after that. It is clear, though, that during the development of Hill’s ideas about clinical trials, they were on good terms. During Fisher’s Presidency of the Royal Statistical Society (1953–1954), when meetings were held at the London School of Hygiene, Fisher would frequently drop in to the Department of Medical Statistics before a meeting, to pass the time of day with Tony or with any junior members of the Department like myself who happened to be around.

In summary, then, we should dismiss any thoughts of competitive claims for priority between Fisher and Hill. Fisher was clearly the progenitor of randomization as an integral part of rigorous comparative experimentation, although the idea had already been broached by others such as CS Peirce, the 19th-century American philosopher. Hill, aided by colleagues such as Philip D’Arcy Hart and Marc Daniels in the streptomycin trial, had the ability and personality to persuade the medical profession that this was the way forward. Scientific medicine owes them both a great debt of gratitude.

I wish to thank the Special Collections Librarian, Barr Smith Library, University of Adelaide, Australia, for permission to quote from correspondence between Bradford Hill and Fisher archived in the collection of Fisher’s papers. I am also grateful to Sir Iain Chalmers and Professor Harry Marks for comments on an early draft of this paper.

References
1. Box JF. R.A. Fisher: The Life of a Scientist. New York: Wiley, 1978.
2. Fisher RA. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd, 1925.
3. Fisher RA. The arrangement of field experiments. J Ministry of Agriculture of Great Britain 1926;33:503–13.
4. Fisher RA. The Design of Experiments. Edinburgh: Oliver and Boyd, 1935.
5. Hill AB. Principles of Medical Statistics. London: The Lancet, 1937.
6. Fisher RA. Development of the theory of experimental design. Proceedings of the International Statistical Conferences 1947;3:434–39.
7. Fisher RA, Bartlett S. Pasteurised and raw milk. Nature, London 1931;127:591–92.
8. Atkins WRB, Fisher RA. The therapeutic use of vitamin C. Journal of the Royal Army Medical Corps 1943;83:251–52.
9. Doll R, Hill AB. Smoking and carcinoma of the lung. Preliminary report. BMJ 1950;ii:739–48.
10. Hill AB. The clinical trial. New Engl J Med 1952;247:113–19.
11. Marks HM. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900–1990. Cambridge, England: Cambridge University Press, 1997.
12. Hill AB. Memories of the British streptomycin trial in tuberculosis: the first randomized clinical trial. Control Clin Trials 1990;11:77–79.
13. Hill AB. The clinical trial. Br Med Bull 1951;7:278–82.
14. Armitage P. The search for optimality in clinical trials. International Statistical Review 1985;53:15–24.
© International Epidemiological Association 2003
TI - Fisher, Bradford Hill, and randomization
JF - International Journal of Epidemiology
DO - 10.1093/ije/dyg286
DA - 2003-12-01
UR - https://www.deepdyve.com/lp/oxford-university-press/fisher-bradford-hill-and-randomization-gXPwF0gFew
SP - 925
EP - 928
VL - 32
IS - 6
DP - DeepDyve
ER -