Modus Darwin Reconsidered

ABSTRACT

‘Modus Darwin’ is the name given by Elliott Sober to a form of argument that he attributes to Darwin in the Origin of Species, and to subsequent evolutionary biologists who have reasoned in the same way. In short, the argument form goes: similarity, ergo common ancestry. In this article, I review and critique Sober’s analysis of Darwin’s reasoning. I argue that modus Darwin has serious limitations that make the argument form unsuitable for supporting Darwin’s conclusions, and that Darwin did not reason in this way.

1 Introduction
2 Modus Darwin
3 Limitations of Sober’s Formal Framework
3.1 Anatomical space
3.2 Branch lengths
4 Did Darwin Use Modus Darwin?
4.1 Adaptive characters
4.2 Galapagos
5 Modus Darwin versus Phylogenetic Inference
6 Conclusion

1 Introduction

One of the central tenets of modern evolutionary biology is the shared ancestry of all extant life on Earth. Darwin’s On the Origin of Species took a big step in that direction. Darwin could address only the portion of Earth’s biota of which nineteenth-century naturalists were aware, and he could see only a short way back into the long history of life. But he argued compellingly that diverse groups of organisms had evolved each from a single ancestor species, concluding that ‘animals have descended from at most only four or five progenitors, and plants from an equal or lesser number’ (Darwin [2003], p. 484). It was a radical conclusion, yet his scientific audience was largely convinced (Bowler [1989]; Larson [2004]).¹

In a series of publications, Elliott Sober has sought to clarify and analyse Darwin’s case for common ancestry, and to generalize Darwin’s reasoning to encompass contemporary thinking about newer evidence for the hypothesis (Sober [1999], [2008], [2011]; Sober and Steel [2002], [2014]). Sober’s project is thus part exegesis, part epistemology: How does Darwin argue? And how does that argument justify common ancestry (CA)?
In answer to the first question, Sober ([1999], p. 265) attributes to Darwin the following argument form:

Similarity, ergo common ancestry. This form of argument occurs so often in Darwin’s writings that it deserves to be called modus Darwin. The finches in the Galapagos Islands are similar; hence, they descended from a common ancestor. Human beings and monkeys are similar; hence, they descended from a common ancestor. The examples are plentiful, not just in Darwin’s thought, but in evolutionary reasoning down to the present.

To address the epistemological question, Sober sets out to formalize modus Darwin with mathematical rigour, ultimately deriving the force of the argument form from the law of likelihood (explained below).

In this article, I review and critique Sober’s analysis of Darwin’s reasoning. After introducing Sober’s account, I temporarily bracket Darwin exegesis to focus on the epistemic merits of modus Darwin as Sober understands it. Here I argue that several difficulties undermine Sober’s defence of that argument form. Then I turn back to Darwin and argue against attributing to him the suspect argument form ‘similarity, ergo common ancestry’. I suggest an alternative reading of key Origin passages and offer a partial epistemological defence of Darwin’s reasoning as I see it.

2 Modus Darwin

Sober derives the normative force of modus Darwin from the Law of Likelihood (Hacking [1965]; Royall [1997]; Sober [2008]), according to which an observation supports one hypothesis over another whenever that observation is more probable supposing the one hypothesis were true, compared with supposing the other hypothesis were true. More formally, observation o favours hypothesis h1 over hypothesis h2 if and only if P(o|h1) > P(o|h2). Mapping this framework onto Darwin’s reasoning requires identifying an observation, o, and two hypotheses, h1 and h2. Similarity between two species is the observation o.
The hypothesis h1 is common ancestry (CA), which states that those two species descended from a single ancestor species. For the alternative hypothesis, h2, Sober chooses separate ancestry (SA), meaning that the two species’ lineages trace back to separate origin-of-life events. These are, however, only the rough, qualitative statements of o, h1, and h2. To evaluate the inequality P(o|h1)>P(o|h2), Sober must define o more concretely and then formally characterize h1 and h2 as stochastic (chancy) processes that can produce such outcomes with some concrete probability. Regarding the observation o, when do two species count as ‘similar’? Any two species are similar in some ways and dissimilar in others. What is the right yardstick? Sober initially sidesteps this thorny question, and begins with a simpler and more tractable observation: that two species share the same trait on a single dichotomous character. A dichotomous character is one that has just two possible states, for example, an insect might have wings or lack them, or the edge of a leaf might be smooth or serrated. (Coding morphology in terms of dichotomous characters typically masks more continuous underlying variation, but dichotomous characters are adequate in many scientific contexts, and they provide a convenient starting point for the formalization of modus Darwin.) Does the observation favour CA over SA by the law of likelihood? To generate the required conditional probabilities, Sober repurposes the idealizations and mathematical framework of contemporary phylogenetic inference, as follows: Let variables X and Y represent the two species, where each can take states {0, 1}, standing for the two possible states of the observed character. So the observation o is both species in the same state (either both 0 or both 1). 
Each hypothesis is then characterized by a schematic genealogy for the two species (Figure 1), plus a stochastic model describing how the character variables change states as they move along a line in the genealogy. (While Darwin’s primary target in the Origin was a non-evolutionary, creationist version of the SA hypothesis, Sober prefers to reconstruct modus Darwin using an SA hypothesis that allows for evolutionary change. The idea is that this choice leaves the basic form of Darwin’s reasoning intact, with the added benefit of illuminating the fundamental similarity between Darwin’s reasoning and subsequent arguments made within evolutionary theory.)

Figure 1. Schematic diagrams illustrating lineages postulated by the common ancestry and separate ancestry hypotheses.

The model of character-state evolution (applied in the same way to all solid lines in both Figure 1 schematics) works as follows: Each solid line comprises a number of time steps (the same number for each of the four lines); the variable associated with each line starts in one state or the other, and then undergoes this many time steps of evolution. At every step there is a small probability that the variable changes from its present state to the other state. (Two state-change probabilities are required: 0→1 and 1→0, which need not be equal.) The probability of changing states at any given step depends only on the current state of the variable. The longer the stretch of lineage, the greater the chance that the character variable will change states along that stretch. In which state does a character variable begin? The initial state is determined by a random draw from a probability distribution over the state space {0, 1} (that is, a coin flip, though the coin may be biased).
And here lies the only difference between the models of CA and SA: for SA, the initial states of X and Y are set by two independent draws from that distribution; whereas for CA, just one draw is required because X and Y must begin in the same state (think of this as a point just before speciation).

With CA and SA so characterized, Sober proves the following result: it is more probable that X and Y will end up in the same state at the end of the process on CA than on SA, regardless of time steps, state-change probabilities, and starting-state distribution (Sober [2008], Chapter 4).² In other words, two species found in the same state always favours CA over SA. It isn’t hard to understand intuitively why this is so. If the state-change probabilities are small relative to the number of time steps, then the most probable outcome along any branch is stasis. In this case, since CA puts the two species in the same state from the start, chances are good they will both still be in that state at the end. The probability of ending up in the same state is somewhat smaller on SA, since on that hypothesis X and Y may differ from the get-go. As the probability of change along the branches increases (due either to long lineages or high state-change probabilities), P(o|CA) and P(o|SA) converge on the same value, though P(o|CA) must always be a little bit higher. The opposite is true for species found in different states: mismatches always favour SA.

Sober goes on to extend this treatment to cover multi-state characters as well, where the variables X and Y can now take any number of states {1, 2, …, n} and correspondingly more state-change probabilities are needed: one for every possible transition from one state to another (i→j, for all i, j ∈ {1, 2, …, n}). Sober shows that, here too, X and Y in the same state at the end of the process is more probable on CA than on SA. Mismatches on multi-state characters, however, are more complicated.
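The two-state result can be checked numerically with a minimal sketch. It assumes only what the text describes: per-step change probabilities for 0→1 and 1→0, a fixed number of time steps per lineage, and a possibly biased starting draw. The names `u`, `v`, `pi1`, and both function names are illustrative choices, not Sober's notation.

```python
def prob_state_one(p1, u, v, steps):
    """Probability of being in state 1 after `steps` steps.

    p1 is the probability of starting in state 1, u is the per-step
    probability of a 0->1 change, v the per-step probability of 1->0.
    """
    for _ in range(steps):
        p1 = p1 * (1.0 - v) + (1.0 - p1) * u
    return p1


def match_probabilities(pi1, u, v, steps):
    """Return (P(match | CA), P(match | SA)) for one dichotomous character.

    pi1 is the probability that a starting draw yields state 1.
    """
    a = prob_state_one(0.0, u, v, steps)  # lineage that starts in state 0
    b = prob_state_one(1.0, u, v, steps)  # lineage that starts in state 1

    # CA: one shared starting draw, then two independent lineages.
    p_ca = (1.0 - pi1) * (a * a + (1 - a) * (1 - a)) \
        + pi1 * (b * b + (1 - b) * (1 - b))

    # SA: each lineage gets its own independent starting draw.
    m = (1.0 - pi1) * a + pi1 * b  # marginal probability of ending in state 1
    p_sa = m * m + (1 - m) * (1 - m)
    return p_ca, p_sa


# Hypothetical parameter settings, chosen only for illustration.
for pi1, u, v, steps in [(0.5, 0.01, 0.01, 50), (0.2, 0.01, 0.03, 100)]:
    p_ca, p_sa = match_probabilities(pi1, u, v, steps)
    print(pi1, u, v, steps, round(p_ca, 4), round(p_sa, 4))
```

In this sketch the gap between the two probabilities shrinks as the number of steps grows, which mirrors the convergence described in the text.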
Some mismatches will favour CA, while others will favour SA, depending on the details (Sober [2008], pp. 295–314).

Returning to the question of overall similarity, Sober considers a whole set of observations, {o1, o2, …, om}, each one comparing the two species on a different trait. Given such a set, including both matches and mismatches, which hypothesis is favoured overall? As described above, the evidential import of each individual observation, oi, is encoded by the ratio of conditional probabilities, P(oi|CA)/P(oi|SA). Supposing that the process by which each trait evolves is probabilistically independent of that governing every other trait,³ the set favours CA over SA if and only if the product of those ratios (one for each observation) is greater than one; in mathematical notation, if and only if

\[
\prod_{i=1}^{m} \frac{P(o_i \mid \mathrm{CA})}{P(o_i \mid \mathrm{SA})} > 1. \tag{1}
\]

To interpret Darwin’s geographical distribution observations (how species are distributed about the globe), Sober also develops a variant of modus Darwin that proceeds from observed geographical proximity rather than anatomical similarity; in other words, proximity, ergo common ancestry. The mathematics remains the same; all that’s required is a reinterpretation of the stochastic model of character-state evolution as a model of geographical dispersal. Consider a multi-state character with ten discrete states. The model governing how this character evolves requires a ten-by-ten matrix of transition probabilities, one for each possible transition from one state to another (Equation (2)). Allow non-zero probability only between neighbouring states (and between a state and itself). Now think of the states themselves as geographical locations along a line (islands in an archipelago, for example) rather than variants of an anatomical character. And think of state change as geographical dispersal rather than morphological evolution.
A species can disperse from one location to another only by passing through the locations in-between, hence the zeros for non-neighbouring state transitions; ‘Neutral evolution within an ordered n-state character is formally just like random dispersal across an n-island archipelago’ (Sober [2008], p. 326):

\[
\begin{pmatrix}
0.99 & 0.01 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.01 & 0.98 & 0.01 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.01 & 0.98 & 0.01 & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.01 & 0.99
\end{pmatrix}
\tag{2}
\]

A random draw from a distribution over the ten states determines where a species begins.⁴ And just as with anatomical modus Darwin, the difference between CA and SA is that CA posits one random draw (whence both species begin dispersing), while SA posits two independent draws, one for each species. The observation o is the distance between species (|X−Y|) after a period of dispersal. In this example, Sober calculates that

With ten locations, the expectation under the separate-ancestry hypothesis is that X and Y will be a bit more than three islands away from each other. If X and Y are more spatially proximate than this, then CA has the higher likelihood; if not, not. (Sober [2008], p. 326)

Sober goes on to analyse Darwin’s use of geographical distribution observations in the Origin by mapping the reinterpreted formalism onto Darwin’s (Chapter 12) discussion of the Galapagos Archipelago. I will return to the Galapagos example below.

3 Limitations of Sober’s Formal Framework

Set Darwin to one side for the moment and consider the argument form modus Darwin on its own merits. Is Sober’s mathematical argument cogent? And what does it tell us about ‘similarity, ergo common ancestry’? I will argue that two features of Sober’s formal framework sharply limit the validation that it provides for modus Darwin.

The core mathematical result underlying Sober’s analysis is that two species found in the same state for a single character always favours CA over SA.
While this conclusion is striking in its generality, it does not by itself get one very far towards applying modus Darwin to real observations. Most applications call for a continuous, or at least multi-state, treatment, where exact matches will be few and far between. And as soon as we leave behind the special case of the exact match, all of the details and parameters that Sober’s proof manages to bracket become important again. Here modus Darwin can pronounce evidential favouring verdicts only after additional assumptions fix the moving parts within the stochastic models of CA and SA. Can the right assumptions be identified in the contexts in which the inference form is supposed to operate? There is reason for worry in the cases of (i) the size of the anatomical space, and (ii) branch lengths.

3.1 Anatomical space

Suppose we compare species X and Y on a given anatomical character, and we model this character as having ten ordered states (Sober’s example, from above). Say it’s the length of a certain bone in centimetres, and the two species measure 1 cm and 4 cm, for a difference of three. Sticking with the Equation (2) transition probabilities (and the resulting equilibrium distribution for the initial states) and supposing a middling 300 time steps, this observation gives a likelihood ratio a hair above one (that is, no evidence either way). But now let’s rethink one of the modelling decisions that led to this number. Who said the range of possible character states is 1–10 cm? Perhaps the upper limit is instead five, or maybe fifteen. Figure 2 displays likelihood ratios for the same observation (and others) recalculated on the assumption that the range of possible states is 1–5 cm (light grey) and 1–15 cm (dark grey). Using the 1–5 cm space, our 3 cm observation registers as evidence favouring SA, but using the 1–15 cm space, the same observation favours CA.
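The sensitivity to the assumed range of states can be reproduced with a short sketch. It assumes the Equation (2) pattern of transition probabilities (0.01 to each neighbouring state), a uniform starting distribution (the equilibrium of that symmetric chain), and 300 time steps per lineage, as stated in the text. The function names and the parameter `p` are illustrative and are not taken from Sober.

```python
def transition_matrix(n, p=0.01):
    """Nearest-neighbour chain on n ordered states, as in Equation (2)."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            P[i][i - 1] = p
        if i < n - 1:
            P[i][i + 1] = p
        P[i][i] = 1.0 - sum(P[i])  # 0.99 at the two ends, 0.98 elsewhere
    return P


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, steps):
    """P raised to the power `steps`, by repeated squaring."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = P
    while steps:
        if steps & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        steps >>= 1
    return result


def likelihood_ratio(n, diff, steps=300, p=0.01):
    """P(|X - Y| = diff | CA) / P(|X - Y| = diff | SA) on an n-state space."""
    ends = mat_pow(transition_matrix(n, p), steps)  # ends[s][x]: start s, end x
    start = 1.0 / n                                 # uniform starting draw
    pairs = [(x, y) for x in range(n) for y in range(n) if abs(x - y) == diff]
    # CA: one shared draw s, then two independent lineages from s.
    p_ca = sum(start * ends[s][x] * ends[s][y]
               for s in range(n) for (x, y) in pairs)
    # SA: two independent draws, so the end states are independent.
    marginal = [sum(start * ends[s][x] for s in range(n)) for x in range(n)]
    p_sa = sum(marginal[x] * marginal[y] for (x, y) in pairs)
    return p_ca / p_sa


# The same observed difference of three, under three assumed state spaces.
for n in (5, 10, 15):
    print(n, round(likelihood_ratio(n, 3), 3))
```

Only the argument `n` changes between the three calls, so any change in the direction of evidential favouring comes from the choice of state space alone.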
Looking across possible observations, on the 1–5 cm space the evidence turns against CA when the observed difference between states is greater than one; in the case of 1–15 cm, the difference must be more than four.

Figure 2. Likelihood ratios P(oi|CA)/P(oi|SA) for character state observations, varying the assumed range of possible states. Ratios above one favour CA, ratios below one favour SA.

In general, positing a larger anatomical space raises the likelihood ratio P(oi|CA)/P(oi|SA), making the evidence appear more favourable to CA, while positing a smaller space lowers the ratio, pushing the needle back towards SA. This effect happens through the denominator: if the starting states of X and Y are chosen independently from a uniform distribution, then a bigger space makes larger observed differences more probable. The size of the anatomical space matters little to the numerator: if variables that begin in the same state will have typically evolved apart by, say, three units at the end of the process, it makes no difference to the outcome whether the space in which this occurs is 15 units wide or 150.

The problem is that the choice between different state spaces appears to be arbitrary. What could privilege one over another? You might think to use the range of states observed across all taxa, but surely the organisms that have evolved so far don’t exhaust all possible anatomies. If there is no sensible way of fixing the allowable character states, then Sober’s formal framework cannot, in the end, yield any defensible evidence rulings.
In other words, that framework fails to demonstrate how the law of likelihood can be brought to bear on the matter of CA versus SA, in which case the framework does little to validate the argument form modus Darwin.

3.2 Branch lengths

Continuing with the example of the ordered, ten-state character (now bracketing concerns about how to choose the range of possible states), Sober writes that the expectation under SA is that species X and Y will be observed to differ by just over three states, and that observations below this threshold favour CA over SA. But this analysis understates the dependence of the evidential favouring verdict on the stipulations that go into the model’s evolutionary mechanics: while the expectation claim is correct, it does not follow, and it is not generally true, that distances below that expectation favour CA. For trait differences of 1–3 cm, the direction of evidential favouring depends on ‘branch length’, a term from phylogenetics that refers to the probability of change along a lineage. Branch length is a function of both the number of time steps (in the solid lines of Figure 1) and the state-transition probabilities: one branch is longer than another if change is more probable along that branch, whether this is due to more time steps, or to higher transition probabilities, or a combination of the two.

To understand intuitively the dependence of evidential favouring on branch length, consider the case of branch lengths so short that any change at all is very improbable. Since CA puts the species in the same state to begin with, they will very probably still be in the same state after the period of evolution, for a trait difference of zero. In this case, observing a difference of one, two, or three will heavily favour SA.
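The dependence on branch length can be illustrated with a sketch of the same ten-state model. It assumes the Equation (2) transition probabilities and a uniform starting distribution; the step counts 30, 300, and 3000 are hypothetical values chosen for illustration and are not claimed to be the ones behind Figure 3. Function names are likewise illustrative.

```python
def transition_matrix(n=10, p=0.01):
    """Nearest-neighbour chain on n ordered states, as in Equation (2)."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            P[i][i - 1] = p
        if i < n - 1:
            P[i][i + 1] = p
        P[i][i] = 1.0 - sum(P[i])
    return P


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, steps):
    """P raised to the power `steps`, by repeated squaring."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = P
    while steps:
        if steps & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        steps >>= 1
    return result


def expected_sa_distance(n=10):
    """Expected |X - Y| when X and Y are independent and uniform on n states."""
    return sum(abs(x - y) for x in range(n) for y in range(n)) / n ** 2


def ratios_by_difference(steps, n=10, p=0.01):
    """Likelihood ratios P(d | CA) / P(d | SA) for each difference d = 0..n-1."""
    ends = mat_pow(transition_matrix(n, p), steps)
    ratios = []
    for d in range(n):
        pairs = [(x, y) for x in range(n) for y in range(n) if abs(x - y) == d]
        # CA: shared uniform starting draw s, then independent lineages.
        p_ca = sum(ends[s][x] * ends[s][y]
                   for s in range(n) for (x, y) in pairs) / n
        # SA: the uniform distribution is the equilibrium of this symmetric
        # chain, so two independent lineages end independent and uniform.
        p_sa = len(pairs) / n ** 2
        ratios.append(p_ca / p_sa)
    return ratios


print("expected SA distance:", expected_sa_distance())
for steps in (30, 300, 3000):  # hypothetical short, middling, long branches
    print(steps, [round(r, 3) for r in ratios_by_difference(steps)])
```

Comparing the rows shows where, for each assumed number of time steps, the ratio crosses one; an observed difference below the SA expectation does not by itself settle which hypothesis is favoured.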
The broader picture of dependence on branch length can be seen in Figure 3, which displays likelihood ratios for all observations (0–9), calculated on three different assumptions about the number of time steps of evolution.⁵ The shorter the time frame, the closer the two states must be for the observation to favour CA. And since the mathematics is the same for geographical dispersal, the lesson applies equally to biogeographical modus Darwin.

Figure 3. Likelihood ratios P(oi|CA)/P(oi|SA) for observed character state differences, varying the number of time steps. Ratios above one indicate evidence for CA, ratios below one indicate evidence for SA.

So using Sober’s likelihood framework to interpret similarity/proximity as evidence bearing on CA requires knowledge of branch lengths. What does this mean for modus Darwin? One thing the formal framework is meant to do is show how similarity can, in principle, be evidence for CA. Dependence on branch length does not stand in the way; it means only that the range of circumstances under which similarity really is evidence for CA is defined in part by details about branch length. But we might hope to go beyond saying that similarity can sometimes be evidence for CA, to conclude that agents in a particular context were justified in taking the similarities that they observed as evidence for CA: for example, that Darwin made a good argument, or that his scientific colleagues were swayed rationally. Of course, Darwin didn’t explicitly calculate any likelihood ratios, but the reasoning embodied in Sober’s reconstruction can also be appreciated qualitatively.
Which hypothesis fits better with the observed similarities between species X and Y: That the two started out identical, then evolved some? Or that they started with randomly chosen anatomies, then evolved some? This qualitative version of Sober’s likelihood reasoning displays the same dependence on branch length: one cannot judge without knowing something about how long the species have been evolving, and how quickly species evolve. Could a mid-nineteenth century naturalist have had sufficient grasp of the pace and timescale of evolution to justifiably argue that CA is the better fit with the observed similarities?

In Darwin’s time, insight into the timescale of biological change came from geology via paleontology. Tremendous progress was made in the eighteenth and nineteenth centuries in collating layers of sediment from sites around the world, resulting in a coherent time-ordering of geological eras and of the fossil remains carried within those layers. But the project of assigning absolute dates to geological eras (and thus fossil remains) proceeded much more slowly. The nineteenth century was characterized by competing and wildly divergent estimates of the age of the earth and its geological eras (Gohau [1990]), and by interdisciplinary jostling on the subject between biologists, geologists, and physicists (Shipley [2001]). Early nineteenth century catastrophists thought in terms of hundreds of thousands of years (Cuvier), or of millions (de Serres, Buckland). Lyell’s uniformitarian assumptions led him to posit 240 million years since the beginning of the Cambrian period, which contained the earliest known fossils at that time (Gohau [1990]). But physicists balked at the idea of a steady-state earth, and following mid-century developments in thermodynamics William Thomson (later Lord Kelvin) calculated at most one-half, and more probably one-tenth, of that time for the earth’s entire history from a molten state to its present condition (Burchfield [1975]).
Thomson’s work was influential, pushing most geologists in the late nineteenth century away from uniformitarianism and towards shorter time scales and faster geological processes (Bowler [1989], Chapter 7). Darwin followed Lyell in matters geological, and his own back-of-the-envelope calculations were even more generous than Lyell’s Cambrian estimate.⁶ Darwin had originally assumed an almost unlimited amount of time for life to evolve (Bowler [1989]; Larson [2004]), and the trend towards shorter time scales put pressure on his theory of natural selection. Thomson’s timeframe in particular was regarded by all parties as too short for Darwin’s slow, gradual process to yield the observed diversity of life. The discrepancy contributed to scepticism about natural selection and encouraged evolutionists’ explorations of alternative processes, including orthogenesis, saltationism, and Lamarckian inheritance. Though sceptical of Thomson’s results, Darwin himself gave the inheritance of acquired characteristics an ever greater role in later editions of the Origin, in part to allow for more rapid evolution (Larson [2004], Chapter 5).

Yet through all of the uncertainty and discord over the timescale, pace, and processes of evolution, naturalists grew ever more committed to CA and evolution by some mechanism or other, a hypothesis also called the ‘theory of descent’. American entomologist Vernon Kellogg ([1907], p. 3) summarized the state of play in his Darwinism Today (‘Darwinism’ referring specifically to Darwin’s theory of natural selection):

While many reputable biologists to-day strongly doubt the commonly reputed effectiveness of the Darwinian selection factors to explain descent – some, indeed, holding them to be of absolutely no species-forming value – practically no naturalists of position and recognised attainment doubt the theory of descent.
Organic evolution, that is, the descent of species, is looked on by biologists to be as proved a part of their science as gravitation is in the science of physics or chemical affinity in that of chemistry. Doubts of Darwinism are not, then, doubts of organic evolution.

This broad-brush historical narrative of the late-nineteenth and early-twentieth centuries shows increasing belief in CA driving research into the mechanisms of evolution and inheritance, while fuzzy ideas about the pace and timescale of evolution were pushed around by constraints from geology and physics. Uncertainty about branch lengths was severe and persistent, with respectable attempts to estimate the absolute age of geological eras varying by more than a factor of one hundred. This suggests that trying to interpret similarity as evidence bearing on CA by reasoning along the lines of Sober’s likelihood comparison would have made for a rather speculative exercise.

To summarize, dependence on branch length does not undermine Sober’s demonstration that similarity can sometimes be evidence for CA. But it does mean that without knowledge of branch lengths, one cannot know when this ‘sometimes’ obtains. Sober’s likelihood ratios also depend on how one specifies the space of possible character states. This poses a deeper challenge since, unlike branch length, the extent of the character space appears to be a fundamentally arbitrary mathematical stipulation. If there is no correct way of fixing the character state space, then the framework’s evidence rulings are themselves arbitrary and fail to show that similarity can sometimes be evidence for CA.

4 Did Darwin Use Modus Darwin?

So far I have raised concerns about the merits of modus Darwin as a way of reasoning. But now that the inference form looks increasingly difficult to justify, we might step back and (giving Darwin the benefit of the doubt) reconsider the attribution.
While Sober ([2008]) sees modus Darwin at work throughout Darwin’s thinking, two specific Origin passages receive special attention. In what follows, I examine these two passages and ask whether modus Darwin plays a role in the reasoning displayed there. In each case, I’ll answer ‘No’, and briefly sketch an alternative reading.

4.1 Adaptive characters

Are some similarities between species X and Y more telling than others in favour of CA? Sober ([2008], p. 297) raises the question while discussing the combined evidence from a set of observations, and references the following passage as Darwin’s answer:

On my view of characters being of real importance for classification, only in so far as they reveal descent, we can clearly understand why analogical or adaptive character, although of the utmost importance to the welfare of the being, are almost valueless to the systematist. For animals, belonging to two most distinct lines of descent, may readily become adapted to similar conditions and thus assume a close external resemblance; but such resemblances will not reveal – will rather tend to conceal their blood-relationship to their proper lines of descent. (Darwin [2003], p. 427)

Within the formal framework discussed above, Sober ([2008], pp. 297–8) shows that transition probabilities that bias both variables X and Y towards a particular state (that is, selection) give rise to smaller likelihood ratios for the observation of both species in the favoured state, compared to symmetrical transition probabilities (drift). In other words, matches on character states that are adaptive provide weaker evidence for CA over SA, just as Darwin said.

Or did he? The quoted passage comes from a section of Chapter 13 labelled ‘classification’, in which Darwin reinterprets existing taxonomic practice in light of his theory of evolution. Mid-nineteenth century taxonomic classifications used a groups-within-groups structure to represent relationships between taxa.
In essence, Darwin said that those taxonomic structures were in fact genealogical trees (now we would say ‘phylogenetic trees’), and that existing taxonomic practice amounted to a method of phylogenetic inference. To drive home the point, Darwin picked out a handful of taxonomic practices that, though widely followed, had no deep methodological justification, and he argued that those practices made sense in light of his theory of evolution and his interpretation of taxonomy. One of those poorly grounded practices was the discounting of adaptive characters.

So Darwin’s comments address the role of adaptive characters within phylogenetic systematics, where the competing hypotheses are alternative genealogical trees (Figure 4), all of which presuppose CA. Competing trees differ only with respect to which species have more recently diverged from which. In this context, SA is out of the picture, and with it modus Darwin. The fundamental mode of reasoning to which Darwin’s discussion of adaptive characters adds a caveat is not ‘similarity, ergo common ancestry’, but rather ‘greater similarity, ergo more recent ancestry’. The latter is the basic credo of phylogenetic inference (a comparatively well-researched inference problem). The adaptive characters passage does not show Darwin using modus Darwin after all.

Figure 4. Three competing genealogical hypotheses.

This is not to say that Darwin’s discussion of classification and adaptive characters doesn’t ultimately contribute to his case for CA. Darwin takes the branching, tree-like structure of his CA hypothesis to explain the groups-within-groups nature of existing taxonomic relations (Winsor [2009]), as well as (when combined with natural selection) the otherwise mysterious usefulness (for classification) of non-adaptive traits, rudimentary organs, and embryological characters (Richards [2009]).
And these explanatory feats, thinks Darwin, redound to the credit of his theory. Of course, this is only a superficial sketch of Darwin’s reasoning, not a philosophical analysis linking that reasoning to well-defined epistemic norms. Yet so long as it is descriptively accurate, we can see that modus Darwin is not invoked.

4.2 Galapagos

Darwin’s Origin discussion of the Galapagos Archipelago is the second spot where Sober maps modus Darwin onto a specific passage. Darwin’s brief discussion of the Galapagos comes at the end of two chapters devoted to the geographical distribution of species, where it serves as an illustration of the following generalization: ‘The most striking and important fact for us in regard to the inhabitants of islands, is their affinity to those of the nearest mainland, without being actually the same species’ (Darwin [2003], pp. 238–9). Darwin takes this feature of island biogeography to speak in favour of CA, and Sober reconstructs that reasoning as follows: Each Galapagos species, {X1, X2, …, Xn}, is paired with a species found on mainland South America, {Y1, Y2, …, Yn}, on the basis of close anatomical similarity (Figure 5). For each pair, the anatomical similarity of species Xi to its mainland counterpart Yi supports CA over SA, for that pair, via modus Darwin. On top of that anatomical evidence, the geographical proximity of Xi and Yi then adds further support for CA over SA for that pair, now by the geographical distribution variant of modus Darwin (Sober [2008], p. 330).

Figure 5. Schematic representation of the Galapagos, {X1, …, Xn}, and mainland South American, {Y1, …, Yn}, species featured in Sober’s reading of Darwin’s Galapagos Archipelago illustration.

I have another reading.
Darwin’s island biogeography generalization is a special case of an even more general trend that he introduces at the very beginning of his first chapter on geographical distribution, namely, that the more accessible any two geographical regions (by migration or dispersal), the more similar the inhabitants of those regions (Darwin [2003], pp. 347–50). That Darwin’s real focus is on relative proximity within groups of species (n > 2) is not obvious from his (rather cursory) treatment of the Galapagos, but it is clear looking at the few examples that he discusses in (somewhat) greater detail. His primary illustrations feature South America’s unique rodents and flightless birds. The agouti, viscacha, coypu, and capybara are each other’s closest taxonomic relations (that is, they’re more similar to each other than to anything else in the world) and they all live in nearby regions of South America. Somewhat similar, but less so (again, as judged by existing taxonomic classifications) are the beaver and muskrat, which are found much further afield in North America and Europe; hares and rabbits are even more widely dispersed. The flightless birds (greater rhea, Darwin’s rhea, emu, and ostrich) illustrate the same pattern (Darwin [2003], p. 349). Darwin’s argument goes roughly as follows: Suppose a group of species shares a branching, tree-like ancestry, and suppose the true tree is reflected (albeit imperfectly) in taxonomists’ classifications. How might this be checked against geographical distribution observations? Consider any species, together with its closest taxonomic relations plus a somewhat more distally classified species or two (similar to what we now call an ‘outgroup’). Since more recent CA leaves less time for geographical dispersal, the closest taxonomic relations should typically be found somewhere more accessible than the outgroup species. 
The observed trend with which Darwin opens his geographical distribution discussion (the more accessible the regions, the more similar the inhabitants) shows that this is generally the case. This relationship is difficult to explain on the supposition that each species was created independently, so the observations support the CA suppositions from which we began. As before with the adaptive characters passage, my alternative reading falls short of a deep epistemological analysis or evaluation of the argument. The point is that Darwin’s geographical distribution argument does not conform to modus Darwin. The step in Darwin’s reasoning that links accessibility to ancestry presupposes CA for every species pair. And while modus Darwin attends to the absolute proximity between two species—for example, ‘If X and Y are more spatially proximate than [three units away], then CA has the higher likelihood; if not, not’ (Sober [2008], p. 326)—Darwin is talking about relative proximity (X is closer to Y than to Z), with no regard for scale. Indeed, on an absolute scale the Galapagos are very inaccessible from South America, being separated by 600 miles of open ocean. Darwin’s point is that even in such cases, the general pattern of relative similarity mirroring relative proximity persists: for a given Galapagos species, the most similar species found outside the Galapagos archipelago inhabit the most accessible region, the South American mainland; less similar species are found further afield.7 So neither the adaptive characters passage nor the Galapagos example illustrates modus Darwin in action. While two false alarms don’t show that Darwin never used modus Darwin, I hope that by supplying alternative readings of the passages that Sober discusses explicitly, I have at least shifted the burden of proof.
My own judgement—which goes beyond what I can argue for here—is that (at least in the Origin) modus Darwin plays at best a minor role in Darwin’s reasoning.8 Other passages that may appear to espouse ‘similarity, ergo common ancestry’ are in my view most likely abbreviated rehearsals of Darwin’s blanket reinterpretation of biological classifications as genealogical hypotheses. They are, in other words, further instances of ‘greater similarity, ergo more recent ancestry’. I suggest it is this phylogenetic thinking—and not modus Darwin—that occurs again and again in Darwin’s reasoning, as a recurring element within various arguments that Darwin constructs in support of his theory. But it may not be quite correct to say that Darwin himself made inferences of the form ‘greater similarity, ergo more recent ancestry’. Nineteenth-century taxonomic classifications were produced by specialists with years of experience working on specific groups of organisms. Except where Darwin did this kind of work himself (for example, on barnacles), he would have relied on the work of others, those classifications becoming part and parcel of any judgements of similarity between species. Other naturalists with deeper knowledge of the taxa in question would have proceeded, in the course of constructing their classifications, roughly along the lines of ‘greater overall similarity, ergo closer taxonomic relatedness’, to which Darwin added ‘by “closer taxonomic relatedness” I think you mean more recent common ancestry’. In any case, the beginning-to-end chain of observation and reasoning that goes from in-depth knowledge of comparative anatomy to a particular genealogical tree for a given set of taxa is something to which Darwin contributes, and on which many of his arguments rely. 
5 Modus Darwin versus Phylogenetic Inference

Given Darwin’s reliance on something like the inference form ‘greater similarity, ergo more recent ancestry’, one might wonder whether this mode of reasoning founders on the same objections raised above to Sober’s defence of modus Darwin. I should therefore briefly explain why those particular objections do not apply.9 Let Sober’s probabilistic model of character-state evolution operate along the branches of the Figure 4 genealogical trees just as it has, up to this point, along the lineages in the Figure 1 genealogies. Doing so gives us stochastic models of those trees. Given the right kind of observations, these models generate likelihoods P(oi|treei), which can then be compared, just as P(oi|CA) and P(oi|SA) were previously compared, only now there is a three-way contest (between trees 1–3). The observation to which a tree assigns a probability is not a single character-state comparison, but rather a set of comparisons, one for each species pair in the mix. It’s convenient to arrange these pair-wise comparisons on a 2 × 2 table, where each cell shows the character-state difference for one species pair (Figure 6). Think of these (made up) numbers as indicating which species is more similar to which; the pair that differ by three units are the most similar, then the pair that differ by ten, then the pair that differ by thirteen. For a given table of observations, trees 1–3 can be ranked by likelihood with the highest-likelihood tree being the one best supported by the observations. (Sober’s approach to CA and SA was modelled on phylogenetic inference to begin with, so this shift to genealogical trees is just a return to the formalism’s home turf. The result is a simplified version of how likelihoods of trees are calculated within contemporary maximum-likelihood and Bayesian phylogenetic inference.)

Figure 6. Example data for calculating likelihoods of phylogenetic trees.

My first objection to Sober’s defence of modus Darwin was that the likelihood ratio P(oi|CA)/P(oi|SA) is inappropriately sensitive to how one models the space of character traits. Recall that the culprit is the quantity P(oi|SA); a bigger space allows for more divergent starting states, making observations of large character state differences more probable. In contrast, P(oi|CA) does not depend on the size of the character space—provided it is not so small that the species have already bumped into the endpoints—so there is no need to specify its size beyond ‘bigger than what evolution has so far explored’. When it comes to comparing tree versus tree, every hypothesis in the mix affirms CA, and like P(oi|CA) the likelihoods P(oi|treei) can be calculated without having to postulate a concrete range of possible character states. (The calculations below employ an infinite one-dimensional anatomical space.) So the ‘anatomical space’ objection does not apply to a likelihood-based defence of ‘greater similarity, ergo more recent common ancestry’. My second objection was that Sober’s likelihood reasoning rests on a knowledge of branch lengths that was unavailable in Darwin’s time. While branch lengths are a source of uncertainty in phylogenetic inference as well, there is an important sense in which that uncertainty is less debilitating than in the case of modus Darwin. In the likelihood contest between CA and SA, simply stretching or shrinking all branches proportionally can change the direction of evidential favouring from one hypothesis to the other—for example, an observation that appears to favour CA instead favours SA if you halve the time scale. The same is not true when comparing one tree to another, as some example calculations will illustrate.
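To make the tree-versus-tree comparison concrete, the following Python sketch shows one way such likelihoods might be computed. It is an illustration under stated assumptions, not the calculation behind the published figures. The species labels A, B, and C, their placement at positions 0, 3, and 13 (chosen only to reproduce pairwise differences of three, ten, and thirteen), the tree labels, and the function names `walk_distribution`, `prob`, `tree_likelihood`, and `rank_trees` are all hypothetical. The model assumed is a random walk on the integer line (step left 1%, step right 1%, stay put 98%), with every leaf the same number of time steps from the root and the branching placed halfway. Because the walk is symmetric and the line has no endpoints, only differences between species matter, so the likelihood of a tree with sister pair (x, y) and outgroup z can be written as a sum over the displacement u of x from the internal node.

```python
def walk_distribution(steps, p_move=0.01):
    """Displacement distribution after `steps` time steps of a lazy symmetric
    random walk (left p_move, right p_move, stay 1 - 2 * p_move).
    Index d + steps of the returned list holds P(displacement == d)."""
    stay = 1.0 - 2.0 * p_move
    dist = [1.0]  # zero steps: displacement 0 with certainty
    for _ in range(steps):
        nxt = [0.0] * (len(dist) + 2)
        for i, pr in enumerate(dist):
            nxt[i] += pr * p_move      # step left
            nxt[i + 1] += pr * stay    # stay put
            nxt[i + 2] += pr * p_move  # step right
        dist = nxt
    return dist


def prob(dist, d):
    """Probability of displacement d under a distribution from walk_distribution."""
    steps = (len(dist) - 1) // 2
    return dist[d + steps] if -steps <= d <= steps else 0.0


def tree_likelihood(pos, sisters, outgroup, total_steps):
    """Likelihood of the observed pairwise differences given a three-species tree.
    total_steps is the (even) root-to-leaf distance; branching occurs halfway."""
    half = total_steps // 2
    x, y = sisters
    d_xy = pos[x] - pos[y]
    d_xz = pos[x] - pos[outgroup]
    short = walk_distribution(half)      # internal node to each sister species
    long_ = walk_distribution(3 * half)  # internal node, up to root, down to outgroup
    # u is the displacement of x from the internal node; y's displacement is then
    # u - d_xy, and the outgroup's displacement from the node is u - d_xz.
    return sum(
        prob(short, u) * prob(short, u - d_xy) * prob(long_, u - d_xz)
        for u in range(-half, half + 1)
    )


# Hypothetical data: positions giving differences of 3, 10, and 13.
positions = {"A": 0, "B": 3, "C": 13}
trees = {
    "(A,B),C": (("A", "B"), "C"),
    "(A,C),B": (("A", "C"), "B"),
    "(B,C),A": (("B", "C"), "A"),
}


def rank_trees(total_steps):
    """Tree labels ordered from highest to lowest likelihood."""
    scores = {
        name: tree_likelihood(positions, sisters, outgroup, total_steps)
        for name, (sisters, outgroup) in trees.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for t in (20, 100, 400):
        print(t, rank_trees(t))
```

Running the sketch over several timescales gives a simple way to check whether the likelihood ranking changes when all branches are stretched or shrunk together. It varies only the overall timescale; selectively stretching individual branches is a different matter and is not explored here.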
Using the Figure 6 observations, Figure 7 displays the likelihoods P(oi|treei) for trees 1–3 (see Figure 4) over a very wide sweep of branch length assumptions.10 The important feature of Figure 7 is that the lines never cross, meaning that the ranking of hypotheses by likelihood is independent of branch length. This independence is a general feature of the inference problem, not specific to these example observations. Even very severe uncertainty about the overall timescale of evolution therefore does not undermine claims about the observations favouring one tree over another. (Though in the limit as the number of time steps grows arbitrarily large, the three likelihoods converge to the same value, meaning that evidence for one tree over another gradually weakens; see (Sober and Steel [2014]) for an in-depth look at this phenomenon.)

Figure 7. Likelihoods P(oi|treei) for three trees, over a range of branch lengths.

6 Conclusion

What is absolutely clear is that Darwin is eager to convince his readers of CA, and that some of the Origin passages where he argues most pointedly for this conclusion involve talk of ‘similarity’ or ‘resemblance’. But the structure of the arguments can be somewhat opaque. Sober sees ‘similarity, ergo common ancestry’ at work in those arguments, and launches an (informed and enlightening) investigation into the epistemology of the inference form and its relation to modern statistical inferences within evolutionary biology (Sober [1999], [2008], [2011]; Sober and Steel [2002], [2014]). My aim here has been to review and assess the argument form modus Darwin and its role in Darwin’s case for CA in the Origin. I have argued that the probabilistic justification Sober offers for modus Darwin is inadequate.
The basic form of that justification is of course sound (it is the foundation of both likelihoodist and Bayesian statistics): compare the probability of an observation supposing CA were true with the same observation’s probability supposing SA were true. But this is easier said than done. Sober picks an observation type and offers a recipe for calculating the two probabilities, but the recipe calls for some far-fetched ingredients. One of those is branch length, a perfectly legitimate scientific quantity that is routinely estimated with some confidence in modern molecular phylogenetics but could not be so estimated by Victorian naturalists. Another is the range of possible character states, a dubious notion that has no significance within evolutionary theory. Sober’s mathematical construction provides a framework for investigating and rigorously evaluating modus Darwin. I have continued to use that framework here and it has enabled the present analysis. But for the reasons just rehearsed, that construction does not yield a satisfactory justification for modus Darwin, especially not in the nineteenth-century context. In any case, it is far from clear that Darwin argued in that way. Closer inspection of the passages that motivate the attribution to Darwin reveals a different argument form, one familiar from contemporary phylogenetics: ‘greater similarity, ergo more recent ancestry’. This argument form is more defensible, both epistemically and exegetically, though it cannot replace modus Darwin as a self-contained argument for CA—indeed it presupposes that conclusion. ‘Greater similarity, ergo more recent ancestry’ describes just one step of reasoning, used by Darwin in constructing more complex arguments.

Acknowledgements

Thanks to Bengt Autzen, Matt Barker, David Baum, Malcolm Forster, Jillian Scott McIntosh, Trevor Pearce, Bill Saucier, Elena Spitzer, Michael Titelbaum, Joel Velasco, Peter Vranas, and especially Elliott Sober.
Also to audiences at Philosophy of Biology in the UK 2014, APA Pacific 2013, and ISHPSSB 2013. Three anonymous referees helped me to improve the article.

Funding

This work was supported by a National Science Foundation Graduate Research Fellowship.

Footnotes

1 The broad acceptance of CA by Darwin’s scientific audience (within a decade or two of the Origin) should not be confused with their lukewarm response to natural selection, which languished until the modern synthesis.
2 With these very minor assumptions: the starting-state distribution gives non-zero probabilities to both states; transition probabilities are strictly between zero and one; and time steps are finite.
3 While this assumption is certainly not true, it is a standard idealization in phylogenetic inference from genetic data (that is, thinking of each nucleotide site, or of each codon, as a trait).
4 The state-change probabilities (Equation (2)) determine what’s called the ‘equilibrium distribution’ of the location variable, which gives the probabilities of finding the variable in each of its ten states after (loosely speaking) infinitely many time steps. Sober uses this equilibrium distribution as the starting-state distribution—in this case, that distribution is uniform.
5 Equation (2) transition probabilities are used throughout; alternatively, one could explore dependence on branch length by fixing the number of time steps and scaling the transition probabilities—with equivalent results.
6 Darwin gives an example close to home: a large geological feature in South East England called the Weald, where relatively deep geological strata are exposed. Higher layers of known (local) thickness must have been worn away over time, and based on Darwin’s estimate of the rate of denudation (wearing down, by various means), he figures the process must have required 300 million years (Darwin [2003], pp. 285–7). All of the strata in question are well above the Cambrian layer, so compared to Lyell’s Cambrian estimate, Darwin’s 300 million is a bigger number for a small fraction of the same geological period.
7 One special absolute threshold of accessibility does play a role in Darwin’s reasoning: if it were impossible for a species or their ancestors to get from point A to point B, then species in those locations could not share CA. Darwin is thus keen to emphasize the mechanisms and ‘accidental means’ by which prima facie implausible journeys might have happened.
8 A good candidate may be Darwin’s closing comments on CA (Darwin [2003], p. 484), where he suggests, on the grounds that all species share some basic chemical and cellular similarities, that there is just one original species from which everything evolved. But he concedes that this is a flimsy argument, and doesn’t take it very seriously.
9 On phylogenetic inference more generally, see (Baum and Smith [2013]) for a non-technical overview, (Sober [1988]) for a philosophically oriented introduction, and (Felsenstein [1988]) for an early review of mathematical methods.
10 The quantity varied is the number of time steps from the root of the tree to any leaf (assumed equal on all paths in all trees); the branching in each tree takes place after half this number of steps. The state space is the integer number line and the transition probabilities are: step left 1%, step right 1%, stay put 98% (like Equation (2), only without the endpoints). Selectively stretching only certain sections of a tree, on the other hand, can upend the likelihood ranking—a genuine issue in phylogenetic inference, as rates of evolution can vary over time and between lineages.

References

Baum D. A., Smith S. D. [2013]: Tree Thinking: An Introduction to Phylogenetic Biology, Chicago, IL: Roberts and Company.
Bowler P. J. [1989]: Evolution: The History of an Idea, Berkeley, CA: University of California Press.
Burchfield J. D. [1975]: Lord Kelvin and the Age of the Earth, Chicago, IL: University of Chicago Press.
Darwin C. [2003]: On the Origin of Species: A Facsimile of the First Edition, Cambridge, MA: Harvard University Press.
Felsenstein J. [1988]: ‘Phylogenies from Molecular Sequences: Inference and Reliability’, Annual Review of Genetics, 22, pp. 521–65.
Gohau G. [1990]: A History of Geology, New Brunswick, NJ: Rutgers University Press.
Hacking I. [1965]: The Logic of Statistical Inference, Cambridge: Cambridge University Press.
Kellogg V. L. [1907]: Darwinism Today: A Discussion of Present-Day Scientific Criticism of the Darwinian Selection Theories, Together with a Brief Account of the Principal other Proposed Auxiliary and Alternative Theories of Species-Forming, London: George Bell and Sons.
Larson E. J. [2004]: Evolution: The Remarkable History of a Scientific Theory, New York: Modern Library.
Richards R. J. [2009]: ‘Classification in Darwin’s Origin’, in Ruse M., Richards R. J. (eds), The Cambridge Companion to the ‘Origin of Species’, Cambridge and New York: Cambridge University Press, pp. 173–93.
Royall R. M. [1997]: Statistical Evidence: A Likelihood Paradigm, Boca Raton, FL: Chapman and Hall.
Shipley B. C. [2001]: ‘“Had Lord Kelvin a Right?” John Perry, Natural Selection, and the Age of the Earth, 1894–1895’, in Lewis C. L. E., Knell S. J. (eds), The Age of the Earth: From 4004 BC to AD 2002, London: Geological Society of London, pp. 91–105.
Sober E. [1988]: Reconstructing the Past: Parsimony, Evolution, and Inference, Cambridge, MA: MIT Press.
Sober E. [1999]: ‘Modus Darwin’, Biology and Philosophy, 14, pp. 253–78.
Sober E. [2008]: Evidence and Evolution: The Logic Behind the Science, Cambridge: Cambridge University Press.
Sober E. [2011]: Did Darwin Write the Origin Backwards? Philosophical Essays on Darwin’s Theory, Amherst, NY: Prometheus Books.
Sober E., Steel M. [2002]: ‘Testing the Hypothesis of Common Ancestry’, Journal of Theoretical Biology, 218, pp. 395–408.
Sober E., Steel M. [2014]: ‘Time and Knowability in Evolutionary Processes’, Philosophy of Science, 81, pp. 558–79.
Winsor M. [2009]: ‘Taxonomy Was the Foundation of Darwin’s Evolution’, Taxon, 58, pp. 43–9.

© The Author 2016. Published by Oxford University Press on behalf of British Society for the Philosophy of Science. All rights reserved. For Permissions, please email: journals.permissions@oup.com
The British Journal for the Philosophy of Science, Oxford University Press

Publisher: Oxford University Press
ISSN: 0007-0882
eISSN: 1464-3537
DOI: 10.1093/bjps/axw015

Abstract

ABSTRACT ‘Modus Darwin’ is the name given by Elliott Sober to a form of argument that he attributes to Darwin in the Origin of Species, and to subsequent evolutionary biologists who have reasoned in the same way. In short, the argument form goes: similarity, ergo common ancestry. In this article, I review and critique Sober’s analysis of Darwin’s reasoning. I argue that modus Darwin has serious limitations that make the argument form unsuitable for supporting Darwin’s conclusions, and that Darwin did not reason in this way. 1 Introduction 2 Modus Darwin 3 Limitations of Sober’s Formal Framework 3.1 Anatomical space 3.2 Branch lengths 4 Did Darwin Use Modus Darwin? 4.1 Adaptive characters 4.2 Galapagos 5 Modus Darwin versus Phylogenetic Inference 6 Conclusion 1 Introduction One of the central tenants of modern evolutionary biology is the shared ancestry of all extant life on Earth. Darwin’s On the Origin of Species took a big step in that direction. Darwin could address only the portion of Earth’s biota of which nineteenth-century naturalists were aware, and he could see only a short way back into the long history of life. But he argued compellingly that diverse groups of organisms had evolved each from a single ancestor species, concluding that ‘animals have descended from at most only four or five progenitors, and plants from an equal or lesser number’ (Darwin [2003], p. 484). It was a radical conclusion, yet his scientific audience was largely convinced (Bowler [1989]; Larson [2004]).1 In a series of publications, Elliott Sober has sought to clarify and analyse Darwin’s case for common ancestry, and to generalize Darwin’s reasoning to encompass contemporary thinking about newer evidence for the hypothesis (Sober [1999], [2008], [2011]; Sober and Steel [2002], [2014]). Sober’s project is thus part exegesis, part epistemology: How does Darwin argue? And how does that argument justify common ancestry (CA)? In answer to the first question, Sober ([1999], p. 
265) attributes to Darwin the following argument form: Similarity, ergo common ancestry. This form of argument occurs so often in Darwin’s writings that it deserves to be called modus Darwin. The finches in the Galapagos Islands are similar; hence, they descended from a common ancestor. Human beings and monkeys are similar; hence, they descended from a common ancestor. The examples are plentiful, not just in Darwin’s thought, but in evolutionary reasoning down to the present. To address the epistemological question, Sober sets out to formalize modus Darwin with mathematical rigour, ultimately deriving the force of the argument form from the law of likelihood (explained below). In this article, I review and critique Sober’s analysis of Darwin’s reasoning. After introducing Sober’s account, I temporarily bracket Darwin exegesis to focus on the epistemic merits of modus Darwin as Sober understands it. Here I argue that several difficulties undermine Sober’s defence of that argument form. Then I turn back to Darwin and argue against attributing to him the suspect argument form ‘similarity, ergo common ancestry’. I suggest an alternative reading of key Origin passages and offer a partial epistemological defence of Darwin’s reasoning as I see it. 2 Modus Darwin Sober derives the normative force of modus Darwin from the Law of Likelihood (Hacking [1965]; Royall [1997]; Sober [2008]), according to which an observation supports one hypothesis over another whenever that observation is more probable supposing the one hypothesis were true, compared with supposing the other hypothesis were true. More formally, observation o favours hypothesis h1 over hypothesis h2 if and only if P(o|h1)>P(o|h2). Mapping this framework onto Darwin’s reasoning requires identifying an observation, o, and two hypotheses, h1 and h2. Similarity between two species is the observation o. 
The hypothesis h1 is common ancestry (CA), which states that those two species descended from a single ancestor species. For the alternative hypothesis, h2, Sober chooses separate ancestry (SA), meaning that the two species’ lineages trace back to separate origin-of-life events. These are, however, only the rough, qualitative statements of o, h1, and h2. To evaluate the inequality P(o|h1)>P(o|h2), Sober must define o more concretely and then formally characterize h1 and h2 as stochastic (chancy) processes that can produce such outcomes with some concrete probability. Regarding the observation o, when do two species count as ‘similar’? Any two species are similar in some ways and dissimilar in others. What is the right yardstick? Sober initially sidesteps this thorny question, and begins with a simpler and more tractable observation: that two species share the same trait on a single dichotomous character. A dichotomous character is one that has just two possible states, for example, an insect might have wings or lack them, or the edge of a leaf might be smooth or serrated. (Coding morphology in terms of dichotomous characters typically masks more continuous underlying variation, but dichotomous characters are adequate in many scientific contexts, and they provide a convenient starting point for the formalization of modus Darwin.) Does the observation favour CA over SA by the law of likelihood? To generate the required conditional probabilities, Sober repurposes the idealizations and mathematical framework of contemporary phylogenetic inference, as follows: Let variables X and Y represent the two species, where each can take states {0, 1}, standing for the two possible states of the observed character. So the observation o is both species in the same state (either both 0 or both 1). 
Each hypothesis is then characterized by a schematic genealogy for the two species (Figure 1), plus a stochastic model describing how the character variables change states as they move along a line in the genealogy. (While Darwin’s primary target in the Origin was a non-evolutionary, creationist version of the SA hypothesis, Sober prefers to reconstruct modus Darwin using an SA hypothesis that allows for evolutionary change. The idea is that this choice leaves the basic form of Darwin’s reasoning intact, with the added benefit of illuminating the fundamental similarity between Darwin’s reasoning and subsequent arguments made within evolutionary theory.) Figure 1. View largeDownload slide Schematic diagrams illustrating lineages postulated by the common ancestry and separate ancestry hypotheses. Figure 1. View largeDownload slide Schematic diagrams illustrating lineages postulated by the common ancestry and separate ancestry hypotheses. The model of character-state evolution (applied in the same way to all solid lines in both Figure 1 schematics) works as follows: Each solid line comprises a number of time steps (the same number for each of the four lines); the variable associated with each line starts in one state or the other, and then undergoes this many time steps of evolution. At every step there is a small probability that the variable changes from its present state to the other state. (Two state-change probabilities are required: 0→1 and 1→0, which need not be equal.) The probability of changing states at any given step depends only on the current state of the variable. The longer the stretch of lineage, the greater the chance that the character variable will change states along that stretch. In which state does a character variable begin? The initial state is determined by a random draw from a probability distribution over the state space {0, 1} (that is, a coin flip—though the coin may be biased). 
And here lies the only difference between the models of CA and SA: for SA, the initial states of X and Y are set by two independent draws from that distribution; whereas for CA, just one draw is required because X and Y must begin in the same state (think of this as a point just before speciation). With CA and SA so characterized, Sober proves the following result: it is more probable that X and Y will end up in the same state at the end of the process on CA than on SA—regardless of time steps, state-change probabilities, and starting-state distribution (Sober [2008], Chapter 4).2 In other words, two species found in the same state always favours CA over SA. It isn’t hard to understand intuitively why this is so. If the state-change probabilities are small relative to the number of time steps, then the most probable outcome along any branch is stasis. In this case, since CA puts the two species in the same state from the start, chances are good they will both still be in that state at the end. The probability of ending up in the same state is somewhat smaller on SA, since on that hypothesis X and Y may differ from the get-go. As the probability of change along the branches increases (due either to long lineages or high state-change probabilities), P(o|CA) and P(o|SA) converge on the same value, though p(o|CA) must always be a little bit higher. The opposite is true for species found in different states: mismatches always favour SA. Sober goes on to extend this treatment to cover multi-state characters as well, where the variables X and Y can now take any number of states {1,2, … n} and correspondingly more state-change probabilities are needed: one for every possible transition from one state to another ( i→j, for all i,j∈{1,2, … n}). Sober shows that, here too, X and Y in the same state at the end of the process is more probable on CA than on SA. Mismatches on multi-state characters, however, are more complicated. 
Some mismatches will favour CA, while others will favour SA, depending on the details (Sober [2008], pp. 295–314). Returning to the question of overall similarity, Sober considers a whole set of observations, {o1,o2, … om}, each one comparing the two species on a different trait. Given such a set, including both matches and mismatches, which hypothesis is favoured overall? As described above, the evidential import of each individual observation, oi, is encoded by the ratio of conditional probabilities, P(oi|CA)/P(oi|SA). Supposing that the process by which each trait evolves is probabilistically independent of that governing every other trait,3 the set favours CA over SA if and only if the product of those ratios (one for each observation) is greater than one—in mathematical notation, if and only if   ∏i=1mP(oi|CA)P(oi|SA)>1. (1) To interpret Darwin’s geographical distribution observations (how species are distributed about the globe), Sober also develops a variant of modus Darwin that proceeds from observed geographical proximity rather than anatomical similarity; in other words, proximity, ergo common ancestry. The mathematics remains the same; all that’s required is a reinterpretation of the stochastic model of character-state evolution as a model of geographical dispersal. Consider a multi-state character with ten discrete states. The model governing how this character evolves requires a ten-by-ten matrix of transition probabilities, one for each possible transition from one state to another (Equation (2)). Allow non-zero probability only between neighbouring states (and between a state and itself). Now think of the states themselves as geographical locations along a line (islands in an archipelago, for example) rather than variants of an anatomical character. And think of state change as geographical dispersal rather than morphological evolution. 
A species can disperse from one location to another only by passing through the locations in-between—thus the zeros for non-neighbouring state transitions; ‘Neutral evolution within an ordered n-state character is formally just like random dispersal across an n-island archipelago’ (Sober [2008], p. 326):   (0.990.01000 0 0 0 000.010.980.0100 0 0 0 0000.010.980.010 0 0 0 00⋮000000000.010.99) (2) A random draw from a distribution over the ten states determines where a species begins.4 And just as with anatomical modus Darwin, the difference between CA and SA is that CA posits one random draw (whence both species begin dispersing), while SA posits two independent draws, one for each species. The observation o is the distance between species ( |X−Y|) after a period of dispersal. In this example, Sober calculates that With ten locations, the expectation under the separate-ancestry hypothesis is that X and Y will be a bit more than three islands away from each other. If X and Y are more spatially proximate than this, then CA has the higher likelihood; if not, not. (Sober [2008], p. 326) Sober goes on analyse Darwin’s use of geographical distribution observations in the Origin by mapping the reinterpreted formalism onto Darwin’s (Chapter 12) discussion of the Galapagos Archipelago. I will return to the Galapagos example below. 3 Limitations of Sober’s Formal Framework Set Darwin to one side for the moment and consider the argument form modus Darwin on its own merits. Is Sober’s mathematical argument cogent? And what does it tell us about ‘similarity, ergo common ancestry’? I will argue that two features of Sober’s formal framework sharply limit the validation that it provides for modus Darwin. The core mathematical result underlying Sober’s analysis is that two species found in the same state for a single character always favours CA over SA. 
While this conclusion is striking in its generality, it does not by itself get one very far towards applying modus Darwin to real observations. Most applications call for a continuous, or at least multi-state, treatment, where exact matches will be few and far between. And as soon as we leave behind the special case of the exact match, all of the details and parameters that Sober’s proof manages to bracket become important again. Here modus Darwin can pronounce evidential favouring verdicts only after additional assumptions fix the moving parts within the stochastic models of CA and SA. Can the right assumptions be identified in the contexts in which the inference form is supposed to operate? There is reason for worry in the cases of (i) the size of the anatomical space, and (ii) branch lengths. 3.1 Anatomical space Suppose we compare species X and Y on a given anatomical character, and we model this character as having ten ordered states (Sober’s example, from above). Say it’s the length of a certain bone in centimetres, and the two species measure 1 cm and 4 cm, for a difference of three. Sticking with the Equation (2) transition probabilities (and the resulting equilibrium distribution for the initial states) and supposing a middling 300 time steps, this observation gives a likelihood ratio a hair above one (that is, no evidence either way). But now let’s rethink one of the modelling decisions that led to this number. Who said the range of possible character states is 1–10 cm? Perhaps the upper limit is instead five, or maybe fifteen. Figure 2 displays likelihood ratios for the same observation (and others) recalculated on the assumption that the range of possible states is 1–5 cm (light grey) and 1–15 cm (dark grey). Using the 1–5 cm space, our 3 cm observation registers as evidence favouring SA, but using the 1–15 cm space, the same observation favours CA. 
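The sensitivity to the assumed range of states can be checked numerically. The sketch below is an illustrative reconstruction under stated assumptions, not the calculation behind Figure 2: it treats the observation as the difference |X−Y| = 3, uses the Equation (2) transition probabilities with a uniform starting distribution and 300 time steps, and the function names are hypothetical.

```python
import numpy as np


def transition_matrix(n_states, p=0.01):
    """Ordered character: step to each neighbouring state with probability p."""
    P = np.zeros((n_states, n_states))
    for i in range(n_states):
        if i > 0:
            P[i, i - 1] = p
        if i < n_states - 1:
            P[i, i + 1] = p
        P[i, i] = 1.0 - P[i].sum()
    return P


def likelihood_ratio(diff, n_states, steps=300, p=0.01):
    """P(|X - Y| = diff | CA) / P(|X - Y| = diff | SA) with a uniform start."""
    P_t = np.linalg.matrix_power(transition_matrix(n_states, p), steps)
    start = np.full(n_states, 1.0 / n_states)
    # CA: one shared starting state, then two independently evolving lineages.
    joint_ca = (P_t.T * start) @ P_t
    # SA: two independent starting states, hence two independent end states.
    marginal = start @ P_t
    joint_sa = np.outer(marginal, marginal)
    i, j = np.indices((n_states, n_states))
    mask = np.abs(i - j) == diff
    return joint_ca[mask].sum() / joint_sa[mask].sum()


if __name__ == "__main__":
    for n_states in (5, 10, 15):
        print(n_states, likelihood_ratio(3, n_states))
```

Under these assumptions the ratio for a difference of three falls below one with five states, sits slightly above one with ten, and rises well above one with fifteen, which is the pattern described in the text.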
Looking across possible observations, on the 1–5 cm space the evidence turns against CA when the observed difference between states is greater than one; in the case of 1–15 cm, the difference must be more than four.

Figure 2. Likelihood ratios P(oi|CA)/P(oi|SA) for character state observations, varying the assumed range of possible states. Ratios above one favour CA, ratios below one favour SA.

In general, positing a larger anatomical space raises the likelihood ratio P(oi|CA)/P(oi|SA), making the evidence appear more favourable to CA, while positing a smaller space lowers the ratio, pushing the needle back towards SA. This effect happens through the denominator: if the starting states of X and Y are chosen independently from a uniform distribution, then a bigger space makes larger observed differences more probable. The size of the anatomical space matters little to the numerator: if variables that begin in the same state will have typically evolved apart by, say, three units at the end of the process, it makes no difference to the outcome whether the space in which this occurs is 15 units wide or 150. The problem is that the choice between different state spaces appears to be arbitrary. What could privilege one over another? You might think to use the range of states observed across all taxa, but surely the organisms that have evolved so far don’t exhaust all possible anatomies. If there is no sensible way of fixing the allowable character states, then Sober’s formal framework cannot, in the end, yield any defensible evidence rulings.
In other words, that framework fails to demonstrate how the law of likelihood can be brought to bear on the matter of CA versus SA, in which case the framework does little to validate the argument form modus Darwin. 3.2 Branch lengths Continuing with the example of the ordered, ten-state character (now bracketing concerns about how to choose the range of possible states), Sober writes that the expectation under SA is that species X and Y will be observed to differ by just over three states, and that observations below this threshold favour CA over SA. But this analysis understates the dependence of the evidential favouring verdict on the stipulations that go into the model’s evolutionary mechanics: while the expectation claim is correct, it does not follow, and it is not generally true, that distances below that expectation favour CA. For trait differences of 1–3 cm, the direction of evidential favouring depends on ‘branch length’, a term from phylogenetics that refers to the probability of change along a lineage. Branch length is a function of both the number of time steps (in the solid lines of Figure 1) and the state-transition probabilities: one branch is longer than another if change is more probable along that branch, whether this is due to more time steps, or to higher transition probabilities, or a combination of the two. To understand intuitively the dependence of evidential favouring on branch length, consider the case of branch lengths so short that any change at all is very improbable. Since CA puts the species in the same state to begin with, they will very probably still be in the same state after the period of evolution, for a trait difference of zero. In this case, observing a difference of one, two, or three will heavily favour SA. 
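The dependence on branch length can be made concrete in the same way. The following sketch again assumes the Equation (2) transition probabilities, ten ordered states and a uniform starting distribution; the three time-step counts (30, 300, 3000) are hypothetical values chosen for illustration, not necessarily those used in the article’s figures, and the function names are my own.

```python
import numpy as np


def transition_matrix(n_states=10, p=0.01):
    """Ordered character: step to each neighbouring state with probability p."""
    P = np.zeros((n_states, n_states))
    for i in range(n_states):
        if i > 0:
            P[i, i - 1] = p
        if i < n_states - 1:
            P[i, i + 1] = p
        P[i, i] = 1.0 - P[i].sum()
    return P


def likelihood_ratios(n_states=10, steps=300, p=0.01):
    """Array of P(|X-Y| = d | CA) / P(|X-Y| = d | SA) for d = 0 .. n_states - 1."""
    P_t = np.linalg.matrix_power(transition_matrix(n_states, p), steps)
    start = np.full(n_states, 1.0 / n_states)
    joint_ca = (P_t.T * start) @ P_t          # shared start, independent lineages
    marginal = start @ P_t
    joint_sa = np.outer(marginal, marginal)   # independent starts
    i, j = np.indices((n_states, n_states))
    diff = np.abs(i - j)
    return np.array([joint_ca[diff == d].sum() / joint_sa[diff == d].sum()
                     for d in range(n_states)])


if __name__ == "__main__":
    for steps in (30, 300, 3000):  # illustrative branch lengths
        ratios = likelihood_ratios(steps=steps)
        print(steps, np.round(ratios[:4], 3))  # differences 0, 1, 2, 3
```

With these settings an exact match favours CA at every branch length, whereas a difference of three favours SA on the shortest branches and favours CA (weakly) on the longer ones, so the verdict for small non-zero differences cannot be read off without a branch-length assumption.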
The broader picture of dependence on branch length can be seen in Figure 3, which displays likelihood ratios for all observations (0–9), calculated on three different assumptions about the number of time steps of evolution.5 The shorter the time frame, the closer the two states must be for the observation to favour CA. And since the mathematics is the same for geographical dispersal, the lesson applies equally to biogeographical modus Darwin.

Figure 3. Likelihood ratios P(oi|CA)/P(oi|SA) for observed character state differences, varying the number of time steps. Ratios above one indicate evidence for CA, ratios below one indicate evidence for SA.

So using Sober’s likelihood framework to interpret similarity/proximity as evidence bearing on CA requires knowledge of branch lengths. What does this mean for modus Darwin? One thing the formal framework is meant to do is show how similarity can, in principle, be evidence for CA. Dependence on branch length does not stand in the way; it means only that the range of circumstances under which similarity really is evidence for CA is defined in part by details about branch length. But we might hope to go beyond saying that similarity can sometimes be evidence for CA, to conclude that agents in a particular context were justified in taking the similarities that they observed as evidence for CA—for example, that Darwin made a good argument, or that his scientific colleagues were swayed rationally. Of course, Darwin didn’t explicitly calculate any likelihood ratios, but the reasoning embodied in Sober’s reconstruction can also be appreciated qualitatively.
Which hypothesis fits better with the observed similarities between species X and Y: That the two started out identical, then evolved some? Or that they started with randomly chosen anatomies, then evolved some? This qualitative version of Sober’s likelihood reasoning displays the same dependence on branch length: one cannot judge without knowing something about how long the species have been evolving, and how quickly species evolve. Could a mid-nineteenth-century naturalist have had sufficient grasp of the pace and timescale of evolution to justifiably argue that CA is the better fit with the observed similarities? In Darwin’s time, insight into the timescale of biological change came from geology via paleontology. Tremendous progress was made in the eighteenth and nineteenth centuries in collating layers of sediment from sites around the world, resulting in a coherent time-ordering of geological eras and of the fossil remains carried within those layers. But the project of assigning absolute dates to geological eras (and thus fossil remains) proceeded much more slowly. The nineteenth century was characterized by competing and wildly divergent estimates of the age of the earth and its geological eras (Gohau [1990]), and by interdisciplinary jostling on the subject between biologists, geologists, and physicists (Shipley [2001]). Early-nineteenth-century catastrophists thought in terms of hundreds of thousands of years (Cuvier), or of millions (de Serres, Buckland). Lyell’s uniformitarian assumptions led him to posit 240 million years since the beginning of the Cambrian period—which contained the earliest known fossils at that time (Gohau [1990]). But physicists balked at the idea of a steady-state earth, and following mid-century developments in thermodynamics William Thomson (later Lord Kelvin) calculated at most one-half—and more probably one-tenth—of that time for the earth’s entire history from a molten state to its present condition (Burchfield [1975]).
Thomson’s work was influential, pushing most geologists in the late nineteenth century away from uniformitarianism and towards shorter time scales and faster geological processes (Bowler [1989], Chapter 7). Darwin followed Lyell in matters geological, and his own back-of-the-envelope calculations were even more generous than Lyell’s Cambrian estimate.6 Darwin had originally assumed an almost unlimited amount of time for life to evolve (Bowler [1989]; Larson [2004]), and the trend towards shorter time scales put pressure on his theory of natural selection. Thomson’s timeframe in particular was regarded by all parties as too short for Darwin’s slow, gradual process to yield the observed diversity of life. The discrepancy contributed to scepticism about natural selection and encouraged evolutionists’ explorations of alternative processes, including orthogenesis, saltationism, and Lamarckian inheritance. Though sceptical of Thomson’s results, Darwin himself gave the inheritance of acquired characteristics an ever greater role in later editions of the Origin, in part to allow for more rapid evolution (Larson [2004], Chapter 5). Yet through all of the uncertainty and discord over the timescale, pace, and processes of evolution, naturalists grew ever more committed to CA and evolution by some mechanism or other, a hypothesis also called the ‘theory of descent’. American entomologist Vernon Kellogg ([1907], p. 3) summarized the state of play in his Darwinism Today (‘Darwinism’ referring specifically to Darwin’s theory of natural selection): While many reputable biologists to-day strongly doubt the commonly reputed effectiveness of the Darwinian selection factors to explain descent—some, indeed, holding them to be of absolutely no species-forming value—practically no naturalists of position and recognised attainment doubt the theory of descent.
Organic evolution, that is, the descent of species, is looked on by biologists to be as proved a part of their science as gravitation is in the science of physics or chemical affinity in that of chemistry. Doubts of Darwinism are not, then, doubts of organic evolution. This broad-brush historical narrative of the late-nineteenth and early-twentieth centuries shows increasing belief in CA driving research into the mechanisms of evolution and inheritance, while fuzzy ideas about the pace and timescale of evolution were pushed around by constraints from geology and physics. Uncertainty about branch lengths was severe and persistent, with respectable attempts to estimate the absolute age of geological eras varying by more than a factor of one hundred. This suggests that trying to interpret similarity as evidence bearing on CA by reasoning along the lines of Sober’s likelihood comparison would have made for a rather speculative exercise. To summarize, dependence on branch length does not undermine Sober’s demonstration that similarity can sometimes be evidence for CA. But it does mean that without knowledge of branch lengths, one cannot know when this ‘sometimes’ obtains. Sober’s likelihood ratios also depend on how one specifies the space of possible character states. This poses a deeper challenge since, unlike branch length, the extent of the character space appears to be a fundamentally arbitrary mathematical stipulation. If there is no correct way of fixing the character state space, then the framework’s evidence rulings are themselves arbitrary and fail to show that similarity can sometimes be evidence for CA. 4 Did Darwin Use Modus Darwin? So far I have raised concerns about the merits of modus Darwin as a way of reasoning. But now that the inference form looks increasingly difficult to justify, we might step back and (giving Darwin the benefit of the doubt) reconsider the attribution. 
While Sober ([2008]) sees modus Darwin at work throughout Darwin’s thinking, two specific Origin passages receive special attention. In what follows, I examine these two passages and ask whether modus Darwin plays a role in the reasoning displayed there. In each case, I’ll answer ‘No’, and briefly sketch an alternative reading. 4.1 Adaptive characters Are some similarities between species X and Y more telling than others in favour of CA? Sober ([2008], p. 297) raises the question while discussing the combined evidence from a set of observations, and references the following passage as Darwin’s answer: On my view of characters being of real importance for classification, only in so far as they reveal descent, we can clearly understand why analogical or adaptive character, although of the utmost importance to the welfare of the being, are almost valueless to the systematist. For animals, belonging to two most distinct lines of descent, may readily become adapted to similar conditions and thus assume a close external resemblance; but such resemblances will not reveal—will rather tend to conceal their blood-relationship to their proper lines of descent. (Darwin [2003], p. 427) Within the formal framework discussed above, Sober ([2008], pp. 297-8) shows that transition probabilities that bias both variables X and Y towards a particular state (that is, selection) give rise to smaller likelihood ratios for the observation of both species in the favoured state, compared to symmetrical transition probabilities (drift). In other words, matches on character states that are adaptive provide weaker evidence for CA over SA, just as Darwin said. Or did he? The quoted passage comes from a section of Chapter 13 labelled ‘classification’, in which Darwin reinterprets existing taxonomic practice in light of his theory of evolution. Mid-nineteenth century taxonomic classifications used a groups-within-groups structure to represent relationships between taxa. 
In essence, Darwin said that those taxonomic structures were in fact genealogical trees (now we would say ‘phylogenetic trees’), and that existing taxonomic practice amounted to a method of phylogenetic inference. To drive home the point, Darwin picked out a handful of taxonomic practices that—though widely followed—had no deep methodological justification, and he argued that those practices made sense in light of his theory of evolution and his interpretation of taxonomy. One of those poorly grounded practices was the discounting of adaptive characters. So Darwin’s comments address the role of adaptive characters within phylogenetic systematics, where the competing hypotheses are alternative genealogical trees (Figure 4), all of which presuppose CA. Competing trees differ only with respect to which species have more recently diverged from which. In this context, SA is out of the picture, and with it modus Darwin. The fundamental mode of reasoning to which Darwin’s discussion of adaptive characters adds a caveat is not ‘similarity, ergo common ancestry’, but rather ‘greater similarity, ergo more recent ancestry’. The latter is the basic credo of phylogenetic inference (a comparatively well-researched inference problem). The adaptive characters passage does not show Darwin using modus Darwin after all.

Figure 4. Three competing genealogical hypotheses.

This is not to say that Darwin’s discussion of classification and adaptive characters doesn’t ultimately contribute to his case for CA. Darwin takes the branching, tree-like structure of his CA hypothesis to explain the groups-within-groups nature of existing taxonomic relations (Winsor [2009]), as well as—when combined with natural selection—the otherwise mysterious usefulness (for classification) of non-adaptive traits, rudimentary organs, and embryological characters (Richards [2009]).
And these explanatory feats, thinks Darwin, redound to the credit of his theory. Of course, this is only a superficial sketch of Darwin’s reasoning—not a philosophical analysis linking that reasoning to well-defined epistemic norms. Yet so long as it is descriptively accurate, we can see that modus Darwin is not invoked.

4.2 Galapagos

Darwin’s Origin discussion of the Galapagos Archipelago is the second spot where Sober maps modus Darwin onto a specific passage. Darwin’s brief discussion of the Galapagos comes at the end of two chapters devoted to the geographical distribution of species, where it serves as an illustration of the following generalization: ‘The most striking and important fact for us in regard to the inhabitants of islands, is their affinity to those of the nearest mainland, without being actually the same species’ (Darwin [2003], pp. 238–9). Darwin takes this feature of island biogeography to speak in favour of CA, and Sober reconstructs that reasoning as follows: Each Galapagos species, {X1, X2, …, Xn}, is paired with a species found on mainland South America, {Y1, Y2, …, Yn}, on the basis of close anatomical similarity (Figure 5). For each pair, the anatomical similarity of species Xi to its mainland counterpart Yi supports CA over SA, for that pair, via modus Darwin. On top of that anatomical evidence, the geographical proximity of Xi and Yi then adds further support for CA over SA for that pair, now by the geographical distribution variant of modus Darwin (Sober [2008], p. 330).

Figure 5. Schematic representation of the Galapagos, {X1, …, Xn}, and mainland South American, {Y1, …, Yn}, species featured in Sober’s reading of Darwin’s Galapagos Archipelago illustration.

I have another reading.
Darwin’s island biogeography generalization is a special case of an even more general trend that he introduces at the very beginning of his first chapter on geographical distribution, namely, that the more accessible any two geographical regions (by migration or dispersal), the more similar the inhabitants of those regions (Darwin [2003], pp. 347–50). That Darwin’s real focus is on relative proximity within groups of species (n > 2) is not obvious from his (rather cursory) treatment of the Galapagos, but it is clear looking at the few examples that he discusses in (somewhat) greater detail. His primary illustrations feature South America’s unique rodents and flightless birds. The agouti, viscacha, coypu, and capybara are each other’s closest taxonomic relations (that is, they’re more similar to each other than to anything else in the world) and they all live in nearby regions of South America. Somewhat similar, but less so (again, as judged by existing taxonomic classifications) are the beaver and muskrat, which are found much further afield in North America and Europe; hares and rabbits are even more widely dispersed. The flightless birds (greater rhea, Darwin’s rhea, emu, and ostrich) illustrate the same pattern (Darwin [2003], p. 349). Darwin’s argument goes roughly as follows: Suppose a group of species shares a branching, tree-like ancestry, and suppose the true tree is reflected (albeit imperfectly) in taxonomists’ classifications. How might this be checked against geographical distribution observations? Consider any species, together with its closest taxonomic relations plus a somewhat more distally classified species or two (similar to what we now call an ‘outgroup’). Since more recent CA leaves less time for geographical dispersal, the closest taxonomic relations should typically be found somewhere more accessible than the outgroup species. 
The observed trend with which Darwin opens his geographical distribution discussion (the more accessible the regions, the more similar the inhabitants) shows that this is generally the case. This relationship is difficult to explain on the supposition that each species was created independently, so the observations support the CA suppositions from which we began. As before with the adaptive characters passage, my alternative reading falls short of a deep epistemological analysis or evaluation of the argument. The point is that Darwin’s geographical distribution argument does not conform to modus Darwin. The step in Darwin’s reasoning that links accessibility to ancestry presupposes CA for every species pair. And while modus Darwin attends to the absolute proximity between two species—for example, ‘If X and Y are more spatially proximate than [three units away], then CA has the higher likelihood; if not, not’ (Sober [2008], p. 326)—Darwin is talking about relative proximity (X is closer to Y than to Z), with no regard for scale. Indeed, on an absolute scale the Galapagos are very inaccessible from South America, being separated by 600 miles of open ocean. Darwin’s point is that even in such cases, the general pattern of relative similarity mirroring relative proximity persists: for a given Galapagos species, the most similar species found outside the Galapagos archipelago inhabit the most accessible region, the South American mainland; less similar species are found further afield.7 So neither the adaptive characters passage nor the Galapagos example illustrates modus Darwin in action. While two false alarms don’t show that Darwin never used modus Darwin, I hope that by supplying alternative readings of the passages that Sober discusses explicitly, I have at least shifted the burden of proof.
My own judgement—which goes beyond what I can argue for here—is that (at least in the Origin) modus Darwin plays at best a minor role in Darwin’s reasoning.8 Other passages that may appear to espouse ‘similarity, ergo common ancestry’ are in my view most likely abbreviated rehearsals of Darwin’s blanket reinterpretation of biological classifications as genealogical hypotheses. They are, in other words, further instances of ‘greater similarity, ergo more recent ancestry’. I suggest it is this phylogenetic thinking—and not modus Darwin—that occurs again and again in Darwin’s reasoning, as a recurring element within various arguments that Darwin constructs in support of his theory. But it may not be quite correct to say that Darwin himself made inferences of the form ‘greater similarity, ergo more recent ancestry’. Nineteenth-century taxonomic classifications were produced by specialists with years of experience working on specific groups of organisms. Except where Darwin did this kind of work himself (for example, on barnacles), he would have relied on the work of others, those classifications becoming part and parcel of any judgements of similarity between species. Other naturalists with deeper knowledge of the taxa in question would have proceeded, in the course of constructing their classifications, roughly along the lines of ‘greater overall similarity, ergo closer taxonomic relatedness’, to which Darwin added ‘by “closer taxonomic relatedness” I think you mean more recent common ancestry’. In any case, the beginning-to-end chain of observation and reasoning that goes from in-depth knowledge of comparative anatomy to a particular genealogical tree for a given set of taxa is something to which Darwin contributes, and on which many of his arguments rely. 
5 Modus Darwin versus Phylogenetic Inference

Given Darwin’s reliance on something like the inference form ‘greater similarity, ergo more recent ancestry’, one might wonder whether this mode of reasoning founders on the same objections raised above to Sober’s defence of modus Darwin. I should therefore briefly explain why those particular objections do not apply.9 Let Sober’s probabilistic model of character-state evolution operate along the branches of the Figure 4 genealogical trees just as it has, up to this point, along the lineages in the Figure 1 genealogies. Doing so gives us stochastic models of those trees. Given the right kind of observations, these models generate likelihoods P(oi|treei), which can then be compared, just as P(oi|CA) and P(oi|SA) were previously compared, only now there is a three-way contest (between trees 1–3). The observation to which a tree assigns a probability is not a single character-state comparison, but rather a set of comparisons, one for each species pair in the mix. It’s convenient to arrange these pair-wise comparisons on a 2 × 2 table, where each cell shows the character-state difference for one species pair (Figure 6). Think of these (made up) numbers as indicating which species is more similar to which; the pair that differ by three units are the most similar, then the pair that differ by ten, then the pair that differ by thirteen. For a given table of observations, trees 1–3 can be ranked by likelihood with the highest-likelihood tree being the one best supported by the observations. (Sober’s approach to CA and SA was modelled on phylogenetic inference to begin with, so this shift to genealogical trees is just a return to the formalism’s home turf. The result is a simplified version of how likelihoods of trees are calculated within contemporary maximum-likelihood and Bayesian phylogenetic inference.)

Figure 6. Example data for calculating likelihoods of phylogenetic trees.

My first objection to Sober’s defence of modus Darwin was that the likelihood ratio P(oi|CA)/P(oi|SA) is inappropriately sensitive to how one models the space of character traits. Recall that the culprit is the quantity P(oi|SA); a bigger space allows for more divergent starting states, making observations of large character state differences more probable. In contrast, P(oi|CA) does not depend on the size of the character space—provided it is not so small that the species have already bumped into the endpoints—so there is no need to specify its size beyond ‘bigger than what evolution has so far explored’. When it comes to comparing tree versus tree, every hypothesis in the mix affirms CA, and like P(oi|CA) the likelihoods P(oi|treei) can be calculated without having to postulate a concrete range of possible character states. (The calculations below employ an infinite one-dimensional anatomical space.) So the ‘anatomical space’ objection does not apply to a likelihood-based defence of ‘greater similarity, ergo more recent common ancestry’. My second objection was that Sober’s likelihood reasoning rests on a knowledge of branch lengths that was unavailable in Darwin’s time. While branch lengths are a source of uncertainty in phylogenetic inference as well, there is an important sense in which that uncertainty is less debilitating than in the case of modus Darwin. In the likelihood contest between CA and SA, simply stretching or shrinking all branches proportionally can change the direction of evidential favouring from one hypothesis to the other—for example, an observation that appears to favour CA instead favours SA if you halve the time scale. The same is not true when comparing one tree to another, as some example calculations will illustrate.
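A calculation of this kind can be sketched as follows. This is an illustrative reconstruction, not the article’s own code. It follows the random-walk settings the article gives for its tree calculations (an unbounded integer line; step left 1%, step right 1%, stay put 98%; the two sister species split halfway between root and leaves). The species labels A, B, C and their positions 0, 3, 13 are hypothetical: they are one assignment that reproduces pairwise differences of three, ten, and thirteen. The function names are likewise my own.

```python
import numpy as np

STEP = np.array([0.01, 0.98, 0.01])  # step left, stay put, step right


def displacement_pmf(steps):
    """PMF of net displacement after `steps` steps; index i <-> displacement i - steps."""
    pmf = np.array([1.0])
    for _ in range(steps):
        pmf = np.convolve(pmf, STEP)
    return pmf


def prob(pmf, d):
    """Probability of net displacement d under a PMF from displacement_pmf."""
    steps = (len(pmf) - 1) // 2
    return pmf[d + steps] if abs(d) <= steps else 0.0


def tree_likelihood(sister1, sister2, outgroup, total_steps):
    """Likelihood of the observed relative positions for the tree ((sister1, sister2), outgroup).

    The sisters share an ancestor for the first half of the root-to-leaf time;
    the outgroup evolves separately from the root. Only relative positions
    matter, since the walk is translation-invariant.
    """
    if total_steps % 2:
        raise ValueError("total_steps must be even")
    h = total_steps // 2
    k_h = displacement_pmf(h)        # one sister branch
    k_3h = displacement_pmf(3 * h)   # path from sisters' ancestor, via the root, to outgroup
    x = sister2 - sister1
    y = outgroup - sister1
    # Sum over a, the displacement of sister1 from the sisters' common ancestor.
    return sum(prob(k_h, a) * prob(k_h, a + x) * prob(k_3h, y + a)
               for a in range(-h, h + 1))


if __name__ == "__main__":
    A, B, C = 0, 3, 13  # hypothetical positions: differences of 3, 10 and 13
    for T in (100, 400, 1600):  # illustrative root-to-leaf step counts
        print(T,
              tree_likelihood(A, B, C, T),   # tree ((A,B),C)
              tree_likelihood(B, C, A, T),   # tree ((B,C),A)
              tree_likelihood(A, C, B, T))   # tree ((A,C),B)
```

In this sketch the tree that groups the most similar pair (A and B) receives the highest likelihood at each of the time scales tried, and the ordering of the three trees stays the same as all branches are stretched together.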
Using the Figure 6 observations, Figure 7 displays the likelihoods P(oi|treei) for trees 1–3 (see Figure 4) over a very wide sweep of branch length assumptions.10 The important feature of Figure 7 is that the lines never cross, meaning that the ranking of hypotheses by likelihood is independent of branch length. This independence is a general feature of the inference problem, not specific to these example observations. Even very severe uncertainty about the overall timescale of evolution therefore does not undermine claims about the observations favouring one tree over another. (Though in the limit as the number of time steps grows arbitrarily large, the three likelihoods converge to the same value, meaning that evidence for one tree over another gradually weakens; see (Sober and Steel [2014]) for an in-depth look at this phenomenon.)

Figure 7. Likelihoods P(oi|treei) for three trees, over a range of branch lengths.

6 Conclusion

What is absolutely clear is that Darwin is eager to convince his readers of CA, and that some of the Origin passages where he argues most pointedly for this conclusion involve talk of ‘similarity’ or ‘resemblance’. But the structure of the arguments can be somewhat opaque. Sober sees ‘similarity, ergo common ancestry’ at work in those arguments, and launches an (informed and enlightening) investigation into the epistemology of the inference form and its relation to modern statistical inferences within evolutionary biology (Sober [1999], [2008], [2011]; Sober and Steel [2002], [2014]). My aim here has been to review and assess the argument form modus Darwin and its role in Darwin’s case for CA in the Origin. I have argued that the probabilistic justification Sober offers for modus Darwin is inadequate.
The basic form of that justification is of course sound (it is the foundation of both likelihoodist and Bayesian statistics): compare the probability of an observation supposing CA were true with the same observation’s probability supposing SA were true. But this is easier said than done. Sober picks an observation type and offers a recipe for calculating the two probabilities, but the recipe calls for some far-fetched ingredients. One of those is branch length, a perfectly legitimate scientific quantity that is routinely estimated with some confidence in modern molecular phylogenetics but was not so estimated by Victorian naturalists. Another is the range of possible character states, a dubious notion that has no significance within evolutionary theory. Sober’s mathematical construction provides a framework for investigating and rigorously evaluating modus Darwin. I have continued to use that framework here and it has enabled the present analysis. But for the reasons just rehearsed, that construction does not yield a satisfactory justification for modus Darwin, especially not in the nineteenth-century context. In any case, it is far from clear that Darwin argued in that way. Closer inspection of the passages that motivate the attribution to Darwin reveals a different argument form, one familiar from contemporary phylogenetics: ‘greater similarity, ergo more recent ancestry’. This argument form is more defensible, both epistemically and exegetically, though it cannot replace modus Darwin as a self-contained argument for CA—indeed it presupposes that conclusion. ‘Greater similarity, ergo more recent ancestry’ describes just one step of reasoning, used by Darwin in constructing more complex arguments.

Acknowledgements

Thanks to Bengt Autzen, Matt Barker, David Baum, Malcolm Forster, Jillian Scott McIntosh, Trevor Pearce, Bill Saucier, Elena Spitzer, Michael Titelbaum, Joel Velasco, Peter Vranas, and especially Elliott Sober.
Also to audiences at Philosophy of Biology in the UK 2014, APA Pacific 2013, and ISHPSSB 2013. Three anonymous referees helped me to improve the article.

Funding

This work was supported by a National Science Foundation Graduate Research Fellowship.

Footnotes

1 The broad acceptance of CA by Darwin’s scientific audience (within a decade or two of the Origin) should not be confused with their lukewarm response to natural selection, which languished until the modern synthesis.

2 With these very minor assumptions: the starting-state distribution gives non-zero probabilities to both states; transition probabilities are strictly between zero and one; and time steps are finite.

3 While this assumption is certainly not true, it is a standard idealization in phylogenetic inference from genetic data (that is, thinking of each nucleotide site, or of each codon, as a trait).

4 The state-change probabilities (Equation (2)) determine what’s called the ‘equilibrium distribution’ of the location variable, which gives the probabilities of finding the variable in each of its ten states after (loosely speaking) infinitely many time steps. Sober uses this equilibrium distribution as the starting-state distribution; in this case, that distribution is uniform.

5 Equation (2) transition probabilities are used throughout; alternatively, one could explore dependence on branch length by fixing the number of time steps and scaling the transition probabilities, with equivalent results.

6 Darwin gives an example close to home: a large geological feature in South East England called the Weald, where relatively deep geological strata are exposed. Higher layers of known (local) thickness must have been worn away over time, and based on Darwin’s estimate of the rate of denudation (wearing down, by various means), he figures the process must have required 300 million years (Darwin [2003], pp. 285–7).
All of the strata in question are well above the Cambrian layer, so compared to Lyell’s Cambrian estimate, Darwin’s 300 million is a bigger number for a small fraction of the same geological period.

7 One special absolute threshold of accessibility does play a role in Darwin’s reasoning: if it were impossible for a species or their ancestors to get from point A to point B, then species in those locations could not share CA. Darwin is thus keen to emphasize the mechanisms and ‘accidental means’ by which prima facie implausible journeys might have happened.

8 A good candidate may be Darwin’s closing comments on CA (Darwin [2003], p. 484), where he suggests, on the grounds that all species share some basic chemical and cellular similarities, that there is just one original species from which everything evolved. But he concedes this is a flimsy argument, and doesn’t take it very seriously.

9 On phylogenetic inference more generally, see (Baum and Smith [2013]) for a non-technical overview, (Sober [1988]) for a philosophically oriented introduction, and (Felsenstein [1988]) for an early review of mathematical methods.

10 The quantity varied is the number of time steps from the root of the tree to any leaf (assumed equal on all paths in all trees); the branching in each tree takes place after half this number of steps. The state space is the integer number line and the transition probabilities are: step left 1%, step right 1%, stay put 98% (like Equation (2), only without the endpoints). Selectively stretching only certain sections of a tree, on the other hand, can upend the likelihood ranking, a genuine issue in phylogenetic inference, as rates of evolution can vary over time and between lineages.

References

Baum D. A., Smith S. D. [2013]: Tree Thinking: An Introduction to Phylogenetic Biology, Chicago, IL: Roberts and Company.
Bowler P. J. [1989]: Evolution: The History of an Idea, Berkeley, CA: University of California Press.
Burchfield J. D.
[1975]: Lord Kelvin and the Age of the Earth, Chicago, IL: University of Chicago Press.
Darwin C. [2003]: On the Origin of Species: A Facsimile of the First Edition, Cambridge, MA: Harvard University Press.
Felsenstein J. [1988]: ‘Phylogenies from Molecular Sequences: Inference and Reliability’, Annual Review of Genetics, 22, pp. 521–65.
Gohau G. [1990]: A History of Geology, New Brunswick, NJ: Rutgers University Press.
Hacking I. [1965]: The Logic of Statistical Inference, Cambridge: Cambridge University Press.
Kellogg V. L. [1907]: Darwinism Today: A Discussion of Present-Day Scientific Criticism of the Darwinian Selection Theories, Together with a Brief Account of the Principal other Proposed Auxiliary and Alternative Theories of Species-Forming, London: George Bell and Sons.
Larson E. J. [2004]: Evolution: The Remarkable History of a Scientific Theory, New York: Modern Library.
Richards R. J. [2009]: ‘Classification in Darwin’s Origin’, in Ruse M., Richards R. J. (eds), The Cambridge Companion to the ‘Origin of Species’, Cambridge and New York: Cambridge University Press, pp. 173–93.
Royall R. M. [1997]: Statistical Evidence: A Likelihood Paradigm, Boca Raton, FL: Chapman and Hall.
Shipley B. C. [2001]: ‘“Had Lord Kelvin a Right?” John Perry, Natural Selection, and the Age of the Earth, 1894–1895’, in Lewis C. L. E., Knell S. J. (eds), The Age of the Earth: From 4004 BC to AD 2002, London: Geological Society of London, pp. 91–105.
Sober E. [1988]: Reconstructing the Past: Parsimony, Evolution, and Inference, Cambridge, MA: MIT Press.
Sober E. [1999]: ‘Modus Darwin’, Biology and Philosophy, 14, pp. 253–78.
Sober E.
[2008]: Evidence and Evolution: The Logic Behind the Science, Cambridge: Cambridge University Press.
Sober E. [2011]: Did Darwin Write the Origin Backwards? Philosophical Essays on Darwin’s Theory, Amherst, NY: Prometheus Books.
Sober E., Steel M. [2002]: ‘Testing the Hypothesis of Common Ancestry’, Journal of Theoretical Biology, 218, pp. 395–408.
Sober E., Steel M. [2014]: ‘Time and Knowability in Evolutionary Processes’, Philosophy of Science, 81, pp. 558–79.
Winsor M. [2009]: ‘Taxonomy Was the Foundation of Darwin’s Evolution’, Taxon, 58, pp. 43–9.

© The Author 2016. Published by Oxford University Press on behalf of British Society for the Philosophy of Science. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal

The British Journal for the Philosophy of Science, Oxford University Press

Published: Mar 1, 2018
