Spatial Evolution of Human Dialects

Spatial Evolution of Human Dialects Selected for a Viewpoint in Physics PHYSICAL REVIEW X 7, 031008 (2017) James Burridge Department of Mathematics, University of Portsmouth, Portsmouth PO1 3HF, United Kingdom (Received 28 February 2017; revised manuscript received 9 May 2017; published 17 July 2017) The geographical pattern of human dialects is a result of history. Here, we formulate a simple spatial model of language change which shows that the final result of this historical evolution may, to some extent, be predictable. The model shows that the boundaries of language dialect regions are controlled by a length minimizing effect analogous to surface tension, mediated by variations in population density which can induce curvature, and by the shape of coastline or similar borders. The predictability of dialect regions arises because these effects will drive many complex, randomized early states toward one of a smaller number of stable final configurations. The model is able to reproduce observations and predictions of dialectologists. These include dialect continua, isogloss bundling, fanning, the wavelike spread of dialect features from cities, and the impact of human movement on the number of dialects that an area can support. The model also provides an analytical form for Séguy’s curve giving the relationship between geographical and linguistic distance, and a generalization of the curve to account for the presence of a population center. A simple modification allows us to analytically characterize the variation of language use by age in an area undergoing linguistic change. DOI: 10.1103/PhysRevX.7.031008 Subject Areas: Complex Systems, Interdisciplinary Physics, Statistical Physics I. INTRODUCTION without sharp boundaries [1–3]. Whereas an isogloss represents the extent of an individual feature, a recogniz- Over time, human societies develop systems of belief, able dialect is typically a combination of many distinctive languages, technology, and artistic forms that collectively features [1,4]. We can attempt to distinguish dialects by may be called culture. The formation of culture requires superposing many different isoglosses, but often they do individuals to have ideas, and then for others to copy them. not coincide [3], leading to ambiguous conclusions. Historically, most copying has required face-to-face inter- The first steps toward an objective, quantitative analysis action, and because most human beings tend to remain of the shapes of dialect areas were made by Séguy [5,6], localized in geographical regions that are small in com- who examined large aggregates of features, making com- parison to the world, human culture can take quite different parison between lexical distances and geographic separa- forms in different places. One aspect of culture where tions. Central to the quantitative study of dialects, called geographical distribution has been studied in great detail is dialectometry (see Ref. [7] for a recent review), is the dialect [1]. measurement of linguistic distance which, for example, can In order to visualize the spatial extent of dialects, be viewed as the smallest number of insertions, deletions, dialectologists have traditionally drawn isoglosses: lines or substitutions of language features needed to transform enclosing the domain within which a particular linguistic one segment of speech into another [8]. This “Levenshtein feature (a word, a phoneme, or an element of syntax) is distance” was originally devised to measure the difference used. However, it is not usually the case that language use between sequences [9]. Using a metric of this kind, a set of changes abruptly at an isogloss—typically there is a dialect observations can be grouped into clusters according transition zone where a mixture of alternative features is to their linguistic (as opposed to spatial) closeness [10–13]. used [1]. In fact, there is debate about whether the most The clusters then define geographical dialect areas. appropriate way to view the geographical organization of The question we address is why dialect domains have dialects is as a set of distinct areas or as a continuum particular spatial forms, and to give a quantitative answer requires a model. The question has been addressed in the past, famously (amongst dialectologists) by Trudgill and james.burridge@port.ac.uk co-workers [1,14], with his “gravity model.” According to Published by the American Physical Society under the terms of this, the strength of linguistic interaction between two the Creative Commons Attribution 4.0 International license. population centers is proportional to the product of their Further distribution of this work must maintain attribution to populations, divided by the square of the distance between the author(s) and the published article’s title, journal citation, and DOI. them. The influence of a settlement (e.g., a city) i on 2160-3308=17=7(3)=031008(27) 031008-1 Published by the American Physical Society JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) regions of aligned atoms evolve so as to minimize boundary another j is then defined to be the product of interaction strength with the ratio P =ðP þ P Þ, where P and P are length [21,22]. The human agents who interact to form i i j i j dialects behave in roughly the same way (as do some birds the population sizes of settlements i and j. These additive influence scores may then be used to predict the progress of [23]). When people speak and listen to each other, they have a linguistic change that originated in one city, by determin- a tendency to conform to the patterns of speech they hear ing the settlements over which it exerts the greatest net others using, and therefore to “align” their dialects. Since influence. It is then predicted that the change progresses people typically remain geographically localized in their from settlement to settlement in a cascade. Predictions may everyday lives, they tend to align with those nearby. This also be made regarding the combined influence of cities on local copying gives rise to dialects in the same way that neighboring nonurban areas. The model has been partially short-range atomic interactions give rise to domains in successful in predicting observed sequences of linguistic ferromagnets. However, whereas the atoms in a ferromagnet change [1,15–17], and offers some qualitative insight into are regularly spaced, human population density is variable. the most likely positions of isoglosses [14]. In this paper, We show that as a result, stable boundaries between domains we offer an alternative model, also based on population become curved lines. data, which makes use of ideas from statistical mechanics. While our interest is in the spatial distribution of Rather than starting with a postulate about the nature of linguistic forms, there are other properties of language interactions between population centers, we begin with for which parallels with the physical or natural world can be assumptions about the interactions between speakers. From usefully drawn, and corresponding mathematical methods these assumptions about small-scale behavior we derive applied. For example, the rank-frequency distribution of predictions about macroscopic behavior. This approach word use, compiled from millions of books, takes the form has the advantage of making clear the link between of a double power law [24,25], which can be explained [24] individual human interactions and population-level behav- using a novel form of the Yule process [26,27], first ior. Moreover, we are able to unambiguously define the introduced to explain the distribution of the number of dynamics of the model and make precise predictions about species in genera of flowering plants. Historical fluctua- the locations of isoglosses, the nature of transition regions tions in the relative frequency with which words are used between linguistic forms, and the most likely structure of have been shown to decay as a language ages and expands dialect domains. There are links between our approach and [25], analogous with the cooling effect produced by the agent-based models of language change [18], which expansion of a gas. Methods used to understand disorder in directly simulate the behavior of individuals. The difference physical systems (“quenched” averages) have been applied between this approach and ours lies in the fact that for us, to explain how a tendency to focus on topics controls assumptions about individual behavior lead to equations for fluctuations in the combined vocabulary of groups of texts language evolution which are macroscopic in character. [28]. A significant focus of current statistical physics These equations have considerable analytical tractability research has been on the evolution and properties of and offer a simple and intuitive picture of the large-scale networks [29], which have many diverse applications from spatial processes at play. the spread of ideas, fashions, and disease [30] to the In seeking to model the spatial distribution of language vulnerability of the internet [31]. Real networks are often beginning with the individual, we are encouraged by the formed by “preferential attachment” where new connec- fact that dialects are created through a vast number of tions are more often made to already well-connected nodes, complex interactions between millions of people. These leading to a “scale-free” (power-law) distribution of node people are analogous to atoms in the physical context, and degree. The popularity of words has been shown to evolve when very large numbers of particles interact in physical in the same way [32]; words used more in the past tend to systems, simple macroscopic laws often emerge. Despite be used more in the future. Beyond the study of word use the fact that dialects are the product of hundreds of years of and vocabulary, agent-based models such as the naming linguistic and cultural evolution [4], and thus historical game [33], used to investigate the emergence of language, events must have played a role in creating their spatial and the utterance selection model [34], used to model distribution [19], the physical analogy suggests that it may changes in language use over time, have been particularly be possible to formulate approximate statistical laws that influential. We follow the latter model by representing play a powerful role in their spatial evolution. language use using a set of discrete linguistic variables. A physical effect analogous to the formation of dialects is Spatial models motivated by concepts of statistical physics phase ordering [20].Thisoccurs, forexample,inferromag- have also been used to study the spread of crime [35] and to netic materials, where each atom attempts to align itself with devise optimal vaccination strategies to prevent disease neighbors. If the material is two dimensional (a flat sheet), this leads to the formation of a patchwork of domains where [36]. The importance of the emergence of order in social all atoms are aligned with others in the same domain, but not contexts, and connections to statistical physics, may be with those in other domains. The boundaries between these found in a wide-ranging review [37] by Castellano et al. 031008-2 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) II. SUMMARY FOR LINGUISTS implemented using only a spreadsheet (see Supplemental Material [38]), although a computer program would be A. Contents of the paper much faster. Using this, isogloss evolution can be explored The aim of this paper is to adapt the theory of phase in linguistic domains with any shape and population ordering to the study of dialects, and then to use this theory distribution. The simplicity of the scheme invites adapta- to explain aspects of their spatial structure. For those tion to include more linguistic realism (e.g., bias toward a without a particular mathematical or quantitative inclina- linguistic variant). Beyond the exploration of individual tion, the model can be simply explained: We assume that isoglosses, a line of inquiry that may be of interest to people come into linguistic contact predominantly with dialectometrists is to test our predicted forms of Séguy’s those who live within a typical travel radius of their home curve against observations. (around 10–20 km). If they live near a town or city, we assume that they experience more frequent interactions III. MODEL with people from the city than with those living outside it, Our aim is to define a model of speech copying which simply because there are many more city dwellers with incorporates as few assumptions as possible, while whom to interact. We represent dialects using a set of allowing the effect of local linguistic interaction and linguistic variables [1], and we suppose that speakers have movement to be investigated. The model has its roots in a tendency to adapt their speech over time in order to the ideas of the linguist Bloomfield [3], who believed that conform to local conventions of language use. Our model is the speech pattern of an individual constantly evolved deliberately minimal: these are our only assumptions. We through his or her life via pairwise interaction. This discover that, starting from any historical language state, microscopic view of language change led to the prediction these assumptions lead to the formation of spatial domains that the diffusion of linguistic features should follow routes where particular linguistic variants are in common use, as with the greatest density of communication. Bloomfield in Fig. 2. We find that the isoglosses that bound these defined this as the density of conversational links between domains are driven away from population centers, that they speakers accumulated over a given period of time. In our tend to reduce in curvature over time, and that they are most model, the analogy of this link density is an interaction stable when emerging perpendicular to borders of a kernel weighted by spatial variations in population distri- linguistic domain. These theoretical principles of isogloss bution. We implicitly assume that interaction is inherently evolution are explained pictorially in Figs. 3, 4, and 5, and local so that linguistic changes spread via normal contact provide a theoretical explanation for a range of observed [39], rather than via major displacements, conquests, phenomena, such as the dialects of England (Fig. 7), the or dispersion of settled communities. We are, therefore, Rhenish fan (Fig. 10), the wavelike spread of language modeling language in stable settlements, with initial con- features from cities (Figs. 12 and 16), the fact that narrow ditions set by the most recent major population upheaval. regions often have “striped” dialects (Fig. 11), and that We consider a population of speakers, each of whom has coastal indentations including rivers and estuaries often a small home neighborhood, and we introduce a population generate isogloss bundles. Our assumptions also lead to a density ρðx; yÞ giving the spatial variation of the number mathematical expression for the relationship between homes per unit area. In order to incorporate local human linguistic and geographical distance—the Séguy curve— movement within the model, we begin by defining a and a hypothesis regarding the question of when dialects Gaussian interaction kernel for each speaker: should be viewed as a spatial continuum, as opposed to distinct areas (Fig. 19). 2 2 1 Δx þ Δy ϕðΔx;ΔyÞ ≔ exp − : 2 2 2πσ 2σ B. How might a linguist make use of this work? Without using mathematics, but having understood our Note that the symbol ≔ indicates the definition of a new principles of isogloss evolution and considered the exam- quantity. Consider a speaker, Anna, whose home neighbor- ples set out in this paper, further cases may be sought where hood is centered on ðx ;y Þ. In the absence of variation in 0 0 the principles explain observations. If the principles cannot population density, ϕ is the normalized distribution of the explain a particular situation or are violated, one might seek relative positions, ðΔx;ΔyÞ, of the home neighborhoods to understand what was missing from the underlying of speakers with whom Anna regularly interacts. The assumptions, or if they were wrong. Since the assumptions constant σ, the interaction range, is a measure of the typical are so minimal, they cannot be the whole story, and a geographical distance between the neighborhoods of inter- discussion of possible missing pieces is given in Sec. VIII. acting speakers. Now suppose that density is not uniform For the mathematically inclined linguist, Appendix A sets due to the presence of a city or a sparsely populated out an elementary scheme for solving the fundamental mountainous area. In this case, while Anna is going about evolution equation on a computer. This scheme also offers a her daily life she is more likely to hold conversations with simple and intuitive understanding of the model, and can be people whose homes lie in a nearby densely populated 031008-3 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) region because these people constitute a greater proportion An intuitive understanding of this equation may be gained of the local population. To incorporate this density effect, by imagining that each speaker possesses an internal tape we define a normalized weighted interaction kernel for a recorder that records language use as they travel around the home at ðx ;y Þ: vicinity of their home. As time passes, older recordings 0 0 fade in importance to the speaker, and the variable m ϕðx − x ;y − y Þρðx; yÞ measures the historical frequency with which variable i has 0 0 kðx ;y ; x; yÞ ≔ : 0 0 been heard, accounting for the declining importance of 2 ϕðu − x ;v − y Þρðu; vÞdudv 0 0 older recordings. The rate of this decline is determined by Given any region A, the fraction of Anna’s interactions that the parameter τ, which we call memory length, and note that are with people who live in A is kðx ;y ;x;yÞdxdy. changing its value simply rescales the unit of time. We note 0 0 We distinguish between dialects by constructing a set of also that this form of memory may be seen as a determin- linguistic variables whose values vary between dialects. A istic spatial version of the discrete stochastic memory used single variable might, for example, be the pronunciation of in the utterance selection model [34,46]. On the grounds the vowel u in the words “but” and “up” [4]. In England, that speakers collect very large samples of local linguistic northerners use a long form, “boott” and “oopp,” with information, our definition does not contain terms repre- phonetic symbol [℧], and southerners use a short version, senting random sampling error. In going from Eq. (1) to (2) [∧]. Considering a single variable which we suppose has we use the saddle point method [47] to approximate the V> 1 variants, we define f ðx; y; tÞ to be the relative i spatial integral in Eq. (1) and assume that j∇ ρj=ρ is small frequency with which the ith variant of our variable is used compared to σ (that is, population changes approximately by speakers in the neighborhood of ðx; yÞ, at time t.For linearly over the length scale of human interaction). mathematical simplicity, we assume that nearby speakers To allow speakers to base their current speech on what use language in a similar way, so that f ðx; y; tÞ varies i they have heard in the past, we let f ðx; y; tÞ be a function smoothly with position. p of the set of memories ðm ;m ; …;m Þ ≕ m: i 1 2 V People speak on average 16 000 words per day [40] and can take months or years (depending on their age and f ðx; y; tÞ ≔ p ½mðx; y; tÞ: i i background) to adapt their speech to local forms [41,42]. Changing speech habits therefore involves a very large Differentiating Eq. (2) with respect to t, and rescaling the number of word exchanges, at least in the tens of thousands units of time so that one time unit is equal to one memory (comparable in magnitude to typical vocabulary size [43]). length τ, we obtain Although the rate at which individuals adapt their speech is not constant throughout life (it is particularly rapid in the ∂m ðx; y; tÞ young), adaptation has been observed even in late middle ¼ p ½mðx; y; tÞ − m ðx; y; tÞ i i age [44]. To capture the cumulative effect of linguistic ∂t interaction we make use of a forgetting curve, which þ ∇ fρðx; yÞp ½mðx; y; tÞg; ð3Þ measures the relative importance of recent interactions to 2ρðx; yÞ older ones. From a mathematical point of view, the simplest form for this curve is an exponential, and in fact there is which governs the spatial evolution of the ith alternative for some evidence from experiments involving word recall a single linguistic variable. We note that memory length no [45], which suggests that this is an appropriate choice. longer appears as a parameter. An enhanced intuitive However, we emphasize that the curve, for us, is simply a understanding of this evolution equation may be gained way to capture the fact that current speech patterns depend from its discrete counterpart, used to find computational on past interactions and that older interactions tend to be solutions, and derived in Appendix A. less important. With this in mind we make the following The simplest possible choice for p is to let speakers use definition of the memory of a speaker from the neighbor- each variant with the same frequency that they remember it hood of ðx; yÞ, for the ith variant of a variable, being used: p ½mðx; y; tÞ ¼ m ðx; y; tÞ. This produces i i “neutral evolution” [34,46,48–50], where there is no bias m ðx; y; tÞ in the evolution of each variant. Equation (3) then describes Z Z ðs−tÞ=τ e pure diffusion, and variants spread out uniformly over the ≔ kðx; y; u; vÞf ðu; v; sÞdudv ds ð1Þ system. If all linguistic variables evolved in this way we −∞ R would eventually have one spatially homogeneous mixture ðs−tÞ=τ t of grammar, pronunciation, and vocabulary. If our memory model involved a stochastic component [46], then even- −∞ tually we would expect all but one variant of each variable to disappear. Neither of these outcomes reflects the reality × f ðx; y; sÞþ ∇ fρðx; yÞf ðx; y; sÞg ds: ð2Þ i i 2ρðx; yÞ of locally distinctive forms of language. 031008-4 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) We motivate our choice for p based on two observa- tions. The first is that dialects exist. In order for this to be the case, if the ith variant of a linguistic variable has been established amongst a local population for a considerable time so that m ≈ 1, then a small amount of immigration into the region by speakers using a different variable should not normally be sufficient to change it. Mathematically, this is equivalent to the statement that the nondiffusive term, p ½mðx; y; tÞ − m ðx; y; tÞ, in our evolution equation i i [Eq. (3)] must possess a locally stable fixed point at m ¼ 1. The second observation is derived from experiments on social learning, which show that the behavior of individuals is considerably influenced by the majority opinion of those FIG. 1. Dashed line shows the function p ðmÞ ≡ f , defined by with whom they interact [51–53]. In fact, such social 1 1 Eq. (4) for the V ¼ 2 model with conformity number β ¼ 2. conformity is widely observed in the animal kingdom Dotted line shows the neutral version p ðmÞ¼ m (when β ¼ 1) 1 1 and is responsible for the formation of dialects in some for comparison. Solid line shows the function p ðmÞ − m in the 1 1 species of birds [54]. Recent experimental research into case β ¼ 2, giving the time derivative of the memory in the human social learning [53], in which individuals were absence of spatial variation. Note that the dashed line shows that allowed to make a choice, before being exposed to the when speakers’ memories contain a majority of variant 1, they opinions of a group, has revealed that the likelihood of an use this variant in a greater proportion than they recall it being individual switching their decision depends nonlinearly on used. This leads to progressively greater levels of conformity: the proportion of the group who disagree. It is an increasing more speakers using variant 1. function which climbs rapidly when the proportion exceeds 50%, also possessing an inflection for large groups. In the English, for example, language change is often initiated by context of language, such nonlinear conforming behavior the working and lower middle classes [56,57], before would mean that variants that were used more frequently spreading to other groups. Some forms of language change than others should be used with disproportionately large are driven by resistance to conformity; for example frequency in the future. A simple way to capture this “prestige dialects” (received pronunciation in the UK) behavior is to define are used to signify membership of a social elite, set apart from the common people. The desire to set oneself apart ðm Þ from others can also create reversals in language use p ðmÞ ≔ P ; ð4Þ V β ðm Þ j¼1 j amongst subsets of a population. For example, local residents of Martha’s Vineyard [58,59] reverted to an where β ≥ 1 measures the extent of conformity (non- archaic form of pronunciation in order to reaffirm local neutrality). If β ¼ 1, we have the neutral model, and for tradition in the face of invading tourists. A similar effect β > 1, the nondiffusive term in Eq. (3) has a stable fixed was observed on the island of Ocracoke in North Carolina point at m ¼ 1. According to Eq. (4), individuals dispro- [60], but in this case the reversal was temporary. As well as portionately favor the most common variants they have social factors, language use may also be determined by age, heard: they have a tendency to conform to the local majority gender, or ethnicity [61]. It is clear that reality is far more language use with β measuring the strength of this effect. In complex than our simple model, which does not make any the limit β → ∞, all speakers use only the most common of these distinctions between speakers. However, the fact (modal) dialect they have heard. An example of this func- that dialects exist is itself evidence that, in general, people tion is plotted in Fig. 1 for the V ¼ 2 model. Conforming do adapt to local speech patterns. To model every speaker behavior allows local dialects to form, as we show below. as having the same need to conform is therefore a The model we define is a coarse-grained description of reasonable first approximation to reality. It also has the real linguistic interactions which in reality are much more value of simplicity, allowing us later to determine the complex. Much of this complexity arises because there are importance of various additional levels of complexity by often many distinct class, ethnic, or age-defined social comparing how effectively our model fits empirical data networks in any given geographical region. Within each of when compared to more complex models. these subgroups the need to conform leads to similar speech patterns among members, and these patterns often, but not IV. SYNTHETIC DIALECT MAPS always, spread to other groups. Research by linguists has A. Application to Great Britain demonstrated that social factors strongly influence the uptake of particular speech patterns [55] and that language We apply our model to the island of Great Britain (GB), use is correlated with social class and identity. In American whose early inhabitants were known as Britons, and spoke 031008-5 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) Celtic languages [62]. The earliest form of English was back through hundreds of years. Since dialect evolution brought to the island by invading Germanic-speaking equation (3) depends only on relative population densities, settlers. This became Anglo Saxon (or Old English), as the current density distribution therefore serves as reason- written by Alfred, King of Wessex (849–899 A.D.), but able proxy for historical versions. We estimate that σ lies in would not be recognizable to modern speakers. It slowly the range 5 < σ < 15 km based on that fact that the changed, with external influences (notably Norman), into average distance traveled to work in GB in 2011 was the English we know today [19]. 15 km [64], whereas the average distance traveled to We seek to discover the extent to which the spatial secondary school was 5.5 km [65]. In Sec, VII, we find distribution of dialect structures that have emerged in GB that the typical width of a transition region between −1=2 can be predicted by Eq. (3). To model the evolution of linguistic variables is ≈1.8σðβ − 1Þ . For example, the individual linguistic variables we take mainland GB as our transition between northern and southern GB dialects is spatial domain, and numerically solve Eq. (3) on a grid of ≈60 km wide [1], which, if σ ¼ 10 km, gives the approxi- discrete points (Fig. 2) using an explicit Euler scheme [63] mation β ≈ 1.1. (Appendix A). The initial condition for the solution is a randomly generated spatial frequency distribution where 1. Evolution of isoglosses each grid point is assigned a randomly selected variant. By When it comes to interpreting our results, the fact that repeatedly generating initial conditions and solving the usage frequencies are continuously varying through space system, we can determine the most probable equilibrium presents a similar problem to that faced by dialectologists spatial distributions of language use. The population when trying to draw isoglosses. We resolve this by defining density ρðx; yÞ is estimated using 2011 census data [64], domain boundaries to be lines across which the modal which gives the number of inhabitants at each of the ≈1.8 × (most common) variant changes. A domain is therefore a 10 UK postcodes. A smooth density is then obtained from region throughout which a single variant is the most this by allowing the inhabitants to diffuse a short distance commonly used. We may think of domain boundaries as from the geographical center of their postcode. Despite synthetic isoglosses generated by Eq. (3). In Fig. 2,we significant overall population growth, the locations of show a series of snapshots of the evolution of domains major population centers in GB can trace their origins when there are V ¼ 3 variants. Isogloss evolution is driven by a two-dimensional form of surface tension [66]: in the absence of density variation, curved boundaries straighten out. Figure 3 illustrates why this happens faster when curvature is greater. Here, speaker L hears more of variant A, so domain B will retract in this locality. Speaker R hears more of variant B, and so domain A will retract in this region. The net effect will be to straighten the boundary, reducing its length. If a boundary forms a closed curve, then this length reduction effect can cause it to evolve toward a circular shape, and reduce in area, eventually disappearing altogether. However, this shrinking droplet effect can be arrested or reversed if the droplet surrounds a sufficiently dense population center (a city). In fact, population centers typically repel isoglosses in our model, and so have a tendency to create their own domains. An explanation of this effect is given in Fig. 4. Here, we have a FIG. 2. Evolution of the V ¼ 3 model from randomized initial condition with σ ¼ 15 km and β ¼ 1.1 at times t ∈ f1; 2; 4; 8; 16; 32g, where one time unit corresponds to one FIG. 3. The surface tension effect at domain boundaries. Blue memory length. Colors indicate which variant is most common at dots represent speakers and black circles give an approximate each position. Numerical solution implemented in C++ on grid representation of interaction ranges. In the red shaded parts of with 2-km spacing [63] (GB is ≈1000 km north to south). Each these interaction ranges, variant A is more common, and in the grid point initialized with randomly selected variant. yellow shaded parts, variant B is more common. 031008-6 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) FIG. 4. Behavior of an isogloss surrounding a densely popu- lated area. The blue dot represents a speaker on the isogloss FIG. 5. Behavior of isoglosses at an indentation in coastline or between variants A and B. Other speakers are shown as black other boundary (political or naturally occurring). Dashed isogloss dots. The circle around our speaker represents their typical range is unstable and will evolve toward the solid isogloss which of interaction. Within red shaded part of this interaction range, emerges perpendicular from the coast. We assume that both variant A is more common, and in the yellow shaded part, variant isoglosses are effectively anchored to a feature some distance B is more common. Because of variation in population density, away, opposite the boundary shown. Speakers are shown as blue they hear more of variant B (dashed lines indicate interactions) dots, and colors have the same meanings as in Fig. 3. despite the fact that a greater area (red shaded) of their interaction range lies within the domain of variant A. in the Atlantic coast of France: the Gironde estuary [1], region of high population density in which linguistic separating the langue d’oc from the langue d’oil. The fact variant B is dominant, surrounded by a low-population that bundling at such locations is predicted by our model density region where variant A is in common use. We provides the first sign of the predictive power of the surface consider the linguistic neighborhood of a speaker located tension effect. on the isogloss separating the two domains. From Fig. 4 we Having considered the evolution of a single linguistic see that although the majority of the speaker’s interaction variable, we now turn to modeling dialects. A dialect is range lies in region A, she has many more interactions with typically defined by multiple linguistic characteristics, and those in region B, and is therefore likely to adapt her speech we can capture this by combining many solutions to toward variant B, causing the isogloss to shift outward into Eq. (3). In Fig. 6, we superpose the synthetic isoglosses the low-density area. for 20 binary (V ¼ 2) linguistic variables. We see that there The most common form of stable isogloss generated by is a significant degree of bundling where many isoglosses our model is a line, typically with some population density follow similar routes across the system. Given that the induced curvature, connecting two points on the boundary initial conditions for each variable are distinct random of the system. In order to be stable, such lines must emerge frequency distributions (Fig. 2), these bundles represent perpendicular from system boundaries, and as a result they highly probable isogloss positions: many different early are attracted to indentations in coastline, as illustrated in spatial distributions lead to these at later stages of evolu- Fig. 5. In this figure, we consider two speakers located at tion. The key point here is that the final spatial structure of the points where two possible isoglosses meet the coast (or dialect domains is rather insensitive to the early history of other system boundary—a country border or a mountain the language: the effects of surface tension and population range, for example). Speaker R, on the dashed isogloss, density draw many different isoglosses toward the same hears more of variant B because the isogloss is not stable configurations. In this sense the surface tension perpendicular to the coast; it will therefore migrate upward effect is an “invisible hand” which, in the long term, can toward the apex of the coastal indentation until it reaches overpower historical population upheavals. However, we the stable form shown by the solid line. This effect can be emphasize that our model predicts only the spatial structure seen in Fig. 2, where the longest east-west isogloss has of dialects and not their particular sound; this is very much migrated so that it emerges from the largest indentation on determined by quirks of history and the initial state of the the east coast of GB. In reality this indentation, called “the system. Figure 6 also illustrates the effect of human wash,” is the site of an isogloss bundle (the coincidence of mobility on dialect structure. For a smaller interaction several isoglosses) separating northern [℧] from southern range (5 km), the structure of synthetic isogloss bundles is [∧] [67]. A similar bundle occurs at the largest indentation more complex, producing a larger number of distinct 031008-7 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) FIG. 7. Left map: Future England dialect boundaries predicted by Trudgill [4]. Right map: Future dialect boundaries predicted using k-medoids cluster analysis of 20 synthetic binary linguistic variables when σ ¼ 10 km and β ¼ 1.1 at t ¼ 150. Levenshtein distance (or “edit distance”) [9] used as distance metric. Colors, determined by Hungarian method, show mapping between FIG. 6. Superposition of the isoglosses at t ¼ 50 produced by dialect areas. Black dotted line shows north-south isogloss. 20 solutions of the V ¼ 2 model with β ¼ 1.1, each with different randomized initial conditions. For the left-hand map, σ ¼ 5 km, and for the right-hand map, σ ¼ 10 km (see video in Supple- boundaries. These aggregated data are then divided into k mental Material [38]). Background shading indicates population clusters using the k-medoids algorithm [70] (available in density with brightest orange corresponding to 7200 inhabitants the R language). The metric used for linguistic distance per km . between sample points is the Manhattan distance between the binary vectors, where the two variants are labeled 1 or regions. This effect is well documented in studies of the −1. Because we are comparing vectors which can be historical evolution of dialects which were, in the past, transformed into one another purely by substitutions more numerous and covered smaller geographical areas [4]. (1 for −1 or vice versa), rather than insertions or deletions, Within our model, this is explained by the fact that this is equivalent to the Levenshtein distance used in fluctuations in population density become relevant to dialectometry [2,9]. We find that almost identical results isogloss evolution only when they take place over a length are obtained by applying Ward’s hierarchical clustering scale that is comparable to the interaction range: Two algorithm [71] to the sample locations and subsequently human settlements could develop distinct dialects only if cutting the tree into k clusters. they were separated by a distance significantly greater than In order to compare our cluster analysis to the work of σ, otherwise they would be in regular linguistic contact. dialectologists, we consider a prediction for the future dialect areas of England (excluding Wales and Scotland) 2. Cluster analysis made by Trudgill [4], shown in the left-hand map of Figs. 7 and 8. This prediction divides the country into 13 regions, Having analyzed our model using isoglosses, we now and is the result of a systematic analysis of regional make comparison to recent work in dialectometry, where variation in speech and ongoing changes. Such sharp dialect domains have been determined using cluster analysis and by multidimensional scaling [68]. A typical clustering approach [10,12] is to construct a data set giving the frequencies of a wide range of variant pronunciations at different locations, and then to cluster these locations according to the similarity of their aggregated sets of characteristics. Resampling techniques such as bootstrap [69] may be used to generate “fictitious” data sets and improve stability. We mimic this approach by constructing a synthetic data set from 20 solutions of Eq. (3) with V ¼ 2, and each with different random initial conditions, corre- sponding to different linguistic variables. We then ran- domly select a large number (6000) of sample locations within GB and determine the modal variants for each of the FIG. 8. Left map: Future England dialect boundaries predicted 20 variables at each location. This sample size is chosen to by Trudgill [4]. Right map: Voronoi tessellation with the same be sufficiently large so that the effect of resampling is only number of cells as Trudgill’s prediction. Colors determined by to make short length scale (≪1 km) changes to cluster Hungarian algorithm. 031008-8 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) TABLE I. Metrics measuring the similarity between Trudgill’s predictions [4] for future English dialects and the predictions of our model. Metric acronyms are OL (overlap), WOL (weighted overlap), RI (Rand index), and ARI (adjusted Rand index). The Voronoi example column gives equivalent metrics for the example Voronoi tessellation in Fig. 8, and the Voronoi set column gives the mean metrics, with standard deviation, for five randomly generated Vornoi tessellations. Metric Model Voronoi example Voronoi set OL 68% 41% 45  3% WOL 82% 36% 49  11% FIG. 9. Results of k-medoids cluster analysis of 20 synthetic RI 0.91 0.84 0.83  0.01 binary linguistic variables when σ ¼ 10 km and β ¼ 1.1 at ARI 0.63 0.29 0.30  0.01 t ¼ 150. Edit distance [9] used as distance metric. k values from left to right are k ¼ 16, 22, 30. Trudgill’s prediction. The weighted overlap (WOL) divisions are a significant simplification of reality, however, weights overlapping regions in proportion to their popu- and hide many subtle smaller-scale variations. The decision lation density: it gives the probability that a randomly to define 13 regions therefore reflects a judgment on the selected inhabitant will be assigned to the same dialect zone range of language use which can be categorized as a single by both maps. From Table I we see that this probability is dialect. To allow comparison with this prediction, we high (82%) for our model, but lower for the random perform a set of cluster analyses of near-equilibrium Voronoi model. We suggest that this is a result of the fact (large t) solutions for the whole of GB, for a range of that population centers tend to repel isoglosses and, values of the number k of clusters (see Fig. 9), with the therefore, often lie at the centers of dialect domains. We aim of producing 13 within the subset of GB defined by examine this repulsion effect in more detail below. The England. The closest result is 14 clusters for 20 ≤ k ≤ 24, final two metrics are commonly used to compare cluster- with almost identical results within England for each of ings. Consider a set S of elements (spatial locations for us) these choices. Having defined our synthetic dialect regions, that has been partitioned into clusters (dialect areas) in two we apply the Hungarian method [72] to find the mapping different ways. Let us call these two partitions X and Y. The between our synthetic dialects and Trudgill’s predicted Rand index (RI) [74] is defined as the probability that, dialects, which maximizes the total area of overlap between given two randomly selected elements of S, the partitions X the two. The results are shown in Fig. 7. To provide a and Y will agree in their answer to the question: are both measure of the effectiveness of our model in matching elements in the same cluster? A disadvantage with using Trudgill’s predictions, we also define a null model, which this index to compare dialect maps is that the larger the divides the country into regions at random, independent of number of regions in the maps, the more likely it is that two population distribution and without reference to any model randomly selected spatial points will not lie in the same of speaker interaction. There are a number of models that cluster in either map. The index therefore approaches 1 as generate random tessellations of space [73], many of which the number of dialect areas grows. This problem may be are motivated by physical processes such as fracture or countered by taking account of its expected value if X and crack propagation. We exclude such physical assumption Y were picked at random, subject to having the same and thus opt for the Voronoi tessellation [73], based on the number of clusters and cluster sizes as the originals [75]. Poisson point process: the simplest of all random spatial The “adjusted Rand index” (ARI) is then defined as processes. Our null model is then a Voronoi tessellation of England (Fig. 8) using 13 points selected uniformly at RI − expected index ARI ≔ : ð5Þ random from within its borders, with dialects labeled to 1 − expected index most closely match Trudgill’s map, using the Hungarian method. The ARI ∈ ½−1; 1 measures the extent to which a cluster- Having generated our synthetic dialect maps, we now ing is a better match than random to some reference quantify the extent to which they match the predictions of clustering and is used by dialectometrists [76] in preference Trudgill. The null model, because of its lack of modeling to the original Rand index. For us, the reference clustering assumptions, will reveal the extent to which our model is is Trudgill’s predicted dialect map, and the Rand and “better than random” at matching these predictions. We adjusted Rand indices in Table I measure similarity to this offer four alternative metrics of similarity in Table I. The reference. simplest metric is overlap (OL): the percentage of land area The primary conclusion that may be drawn from the which is identified as belonging to the same dialect as indices in Table I is that by all measures our model provides 031008-9 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) a better match than the null model (indices all differ by at least 3 standard deviations, and typically many more). Of particular interest is the weighted overlap probability (WOL ¼ 82%). Isoglosses are typically repelled by pop- ulation centers, so tend to pass through regions of relatively low density. Because of this the WOL may be viewed as a measure of the effectiveness of the model at determining the centers of dialect regions and is less sensitive to small errors in isogloss construction, explaining its high value. It is important to realize also that Trudgill’s predictions may themselves be imperfect. FIG. 10. Evolution of isoglosses in a 400 × 200 system with two We now make some qualitative comments. The dotted opposing boundary indentations and unit background population line in Fig. 7 shows the location of our model’s most dense density, together with a collection of cities contributing additional north-south isogloss bundle. This is coincident with what is 2 2 population densities ρðrÞ¼ðρ − 1Þ exp f−r =ð2R Þg,where r is described by Trudgill as “one of the most important distance from city center, R ¼ 10,and ρ ¼ 4 (measuring ratio of isoglosses in England” [14] dividing those who have [℧] peak city density to background). Parameter values σ ¼ 4, β ¼ 2. in butter from those who do not. In our model, the fact that Evolution times t ¼ 10, 50, 300, 1660. See video in Supplemental this border lies where it does is a result of the surface Material for full animation [38]. tension effect which attracts many isoglosses towards the two coastal indentations at either end (see video in Supplemental Material [38]). The fact that many random- Dusseldorf, Cologne, Koblenz, and Trier. This arrangement ized initial boundary shapes evolve toward this configura- of isoglosses is known as the “Rhenish fan.” In Fig. 10,we tion, and that the configuration is seen as important by construct an artificial system with boundaries approximat- dialectologists, supports the hypothesis that surface tension ing the geographical structure of relevant parts of the is an important driver of spatial language evolution. We Dutch-German language area illustrated in Bloomfield also note that the western extremities of GB (Cornwall and [3] containing an artificial cluster of population centers northwest Scotland) support multiple synthetic dialects in representing the German cities located near the Rhine. The our model, and we suggest that this is due to a heavily system was initialized using the same randomization indented coastline and the fact that high aspect-ratio procedure used for GB, and Fig. 10 shows a superposition tongues of land are likely to be crossed by isoglosses; a of ten solutions, each with different initial conditions. In the fact predictable by analogy with continuum percolation early stages of evolution, very little pattern is discernible, [21]. The southwest peninsula has historically supported but as time progresses, the main indentation collects three dialects. isoglosses, while the cities repel them, producing a fanlike In future work, the model might be tested by comparing structure. We therefore suggest that the isogloss separation its predictions to well-researched dialect areas. On example which created the Rhenish fan may have been the result of is Netherlands. Here, dialectologists have performed a repulsion by the cities of the Rhine. cluster analysis [68] based on Levenshtein distances We next consider an example of what some physicists between field observations of 360 dialect varieties (corre- refer to as “stripe states” [21]: in finite systems that sponding to 357 geographical locations), revealing 13 experience phase ordering, and have aspect ratios greater significant geographical groupings. The extent to which than 1, boundaries between two orderings often form across a model is consistent with these groupings, accounting for the system by the shortest route (in a rectangle, joining two variability caused by finite sample size, could be tested by long sides). A collection of such boundaries forms a striped generating equivalent clusterings for multiple fictitious pattern of phase orderings. Figure 11 illustrates this effect, dialect samples of the same size. produced by Eq. (3). Our model therefore predicts such striped dialect patterns in long thin countries, and a B. Bundles, fans, stripes, and circular waves particularly striking example of the effect may be seen in the dialects of the Saami language [77]. The Saami We now illustrate a number of well-known features of dialect distributions which may be qualitatively reproduced people are indigenous to the Sámpi region (Lapland), by our model. We consider first the isogloss bundle which includes parts of Norway, Sweden, and Finland. reported by Bloomfield [3] separating “High German” Their Arctic homeland forms a curved strip with a length from “Low German.” The bundle emerged from the tip which is approximately 5 times its average width. The of an indentation of the Dutch-German speech area region is divided into ten language areas, and the bounda- (bordered to the east by Slavic languages) and ran roughly ries of all but two of these take a near-direct route between east-west before separating approximately 40 km east of the two long boundaries of the system, forming a distinctive striped pattern. Another example is the dialects of Japan, the river Rhine, and fanning out around cities such as 031008-10 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) FIG. 11. Evolution of isoglosses in a 400 × 200 rectangular system with uniform population density, starting from random- ized initial conditions. We show a superposition of 10 such solutions. Notice that all isoglosses join the two long sides of the system. Parameter values σ ¼ 4, β ¼ 2. Evolution times t ¼ 30, 90, 270, 810. FIG. 12. Evolution of a circular isogloss (initial radius 30) in a whose boundaries in many cases cross the country 200 × 200 system with unit background population density, perpendicular to its spine [78]. together with a central city contributing additional density The relationship between geographical separation and 2 2 ρðrÞ¼ðρ − 1Þ exp f−r =ð2R Þg, where r is distance from city linguistic distance (often measured using Levenshtein dis- center, R ¼ 40, and ρ ¼ 21. Parameter values σ ¼ 4, β ¼ 2. tances [2]) is typically sublinear [2,39]. The definition of Evolution times t ∈ f10; 20; 30; …; 300g. linguistic distance and its relation to geographic distance was made by Séguy [5,6], and the relationship therefore younger speakers, who are more susceptible to new forms goes by the name Séguy’s curve [39]. It has been of speech. We illustrate how this effect can be analyzed in substantially refined and tested since [2,39,79], and also Appendix B. generalized to involve travel time [80]. Séguy’s curve is not universal, however. For example, an analysis of Tuscan V. SÉGUY’S CURVE dialect data [81] reveals an unusually low correlation between phonetic and geographical distances. A more We now determine the relationship between geographi- detailed analysis reveals that there are geographically cal and linguistic distance within our model, providing an remote areas which are linguistically similar, and that analytical prediction for the form of Séguy’s curve [5,6,39]. within an approximately circular region (radius ≈40 km) For simplicity, we consider the two variant model V ¼ 2 around the main city, Florence, phonetic variation corre- and suppose that our language contains a number n of lates more strongly with geographical proximity. It is linguistic variables. At each location in space the local hypothesized [81] that this pattern marks the radial spread dialect is an n-dimensional vector of the local modal of a linguistic innovation (called “Tuscan-gorgia”). These variants which we label 1 and −1. Letting ϕðr;tÞ, where Tuscan data motivate our final example of the qualitative r ¼ðx; yÞ, be the vector field giving the distribution of behavior of our model. To illustrate how a linguistic these variants, the number of differences (the Levenshtein variable can spread outwards from a population center, distance [9]) between two dialects ϕðr ;tÞ ≕ ϕð1Þ and purely through the effects of population distribution and not ϕðr ;tÞ ≕ ϕð2Þ is ½n − ϕð1Þ · ϕð2Þ=2. The linguistic dis- necessarily driven by prestige or other forms of bias, we tance Lð1; 2Þ between two dialects may be defined [82] as simulate our model using an artificial city with Gaussian the fraction of variables that differ between them: population distribution (Fig. 12). The system is initialized with a circular isogloss, centered on the city, representing a 1 ϕð1Þ · ϕð2Þ Lð1; 2Þ ≔ 1 − : ð6Þ local linguistic innovation. Because population density is a 2 n decreasing function of the distance from the city center, Since we assume that each variant evolves independently of speakers on the isogloss hear more of the innovation than every other, the expected linguistic distance is their current speech form, allowing it to expand (as explained in Fig, 4). We see in Sec. VII that this expansion will not not necessarily continue indefinitely. Expansion lð1; 2Þ ≔ E½Lð1; 2Þ ¼ ð1 − E½ϕ ð1Þϕ ð2ÞÞ; ð7Þ i i processes such as this have also been observed in Norway [14]. In that case, the progress of new linguistic forms was where ϕ is the ith component of ϕ. To compute shown to depend on age, with changes more advanced for E½ϕ ð1Þϕ ð2Þ, we make use of the close similarity between i i 031008-11 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) Eq. (3) and the time-dependent Ginzburg-Landau equation that derivatives are limits of sums, and sums of Gaussian [20] to derive (see Appendix C) an analogue of the Allen- random variables are themselves Gaussian). However, the Cahn equation [83] giving the velocity of an isogloss at a values of the field at different spatial locations develop point in terms of the unit vector g normal to it at that point: correlations so that the joint distribution of any pair is bivariate normal. Following Bray [20], we define the ˆ ˆ ∇ · g ∇ρ · g normalized correlator v ¼ −βσ þ ð8Þ 2 ρ E½mðr ;tÞmðr ;tÞ 1 2 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi γðr ; r Þ ≔ : ð14Þ κ ∇ρ · g ˆ 1 2 2 2 E½mðr ;tÞ  E½mðr ;tÞ ¼ −βσ þ : ð9Þ 1 2 2 ρ Using the abbreviated notation γðr ; r Þ ≡ γð1; 2Þ, the 1 2 The quantity κ is the curvature of the isogloss at the point: correlator for the original field may be found by averaging in the absence of variations of population density, the over the bivariate normal distribution of mðr ;tÞ ≕ mð1Þ isogloss moves so as to reduce curvature. The second term and mðr ;tÞ ≕ mð2Þ (see, e.g., Ref. [85]): in the square brackets produces a net migration of iso- glosses towards regions of lower population density. To E½ϕ ð1Þϕ ð2ÞðtÞ¼ E½sgn½mð1Þsgn½mð2Þ ð15Þ i i compute correlation functions between the field ϕ at different locations in space, we apply the Ohta-Jasnow- −1 ¼ sin ðγð1; 2ÞÞ: ð16Þ Kawasaki (OJK) method [84], introducing smoothly vary- ing auxiliary field mðx; y; tÞ, which gives the value of the ith variant as ϕ ¼ sgnðmÞ. Note that the auxiliary field We now compute this correlator and derive a theoretical mðx; y; tÞ is distinct from the memory m ðx; y; tÞ for prediction for Séguy’s curve. the ith variant. The OJK equation, describing the time evolution of this field, adapted to include density effects, is A. Uniform population density (Appendix C) If population density is constant ρ ¼ C, then our adapted 2 OJK equation (10) reduces to OJK’s original form, which ∂m ∇ m ∇ρ:∇m ¼ βσ þ : ð10Þ has the fundamental solution ∂t 4 ρ jr−r j expf− g We introduce the fundamental solution, Gðr;t; r Þ (the 0 βσ t Gðr;t; r Þ¼ ; ð17Þ Green’s function) of Eq. (10), giving the function mðr;tÞ πβσ t subject to the initial condition mðr; 0Þ¼ δðr − r Þ. The solution for arbitrary initial conditions is then giving a normalized correlator mðr;tÞ¼ dr Gðr;t; r Þmðr ; 0Þ: ð11Þ 0 0 0 γð1; 2Þ¼ exp − ≕ γ ðrÞ: ð18Þ 2 2 2tβσ We assume that the initial condition of our system Our prediction for Séguy’s curve at time t is therefore consists of spatially uncorrelated language use, so that E½ϕ ðr ; 0Þϕ ðr ; 0Þ ¼ δ . A convenient, equivalent i 1 i 2 r r 1 2 1 2 −1 lðr; tÞ¼ 1 − sin ðγ ðrÞÞ : ð19Þ condition on the auxiliary field is to let it be Gaussian 2 π (normally) distributed mðr; 0Þ ∼ N ð0; 1Þ with correlator This curve is plotted in Fig. 13 along with simulation E½mðr ; 0Þmðr ; 0Þ ¼ δðr − r Þ: ð12Þ 1 2 1 2 results. We give the following interpretation of the curve. Starting from a randomized spatial distribution of language We can compute this correlator at later times using the use, the need for conformity generates localized regions fundamental solution G: where particular linguistic variables are in common use, and these regions are bounded by isoglosses. These regions E½mðr ;tÞmðr ;tÞ 1 2 Z expand, driven by surface tension in isoglosses, so that 0 0 0 0 0 0 from any given geographical point one would need to travel ¼ Gðr ;t;r ÞGðr ;t;r ÞE½mðr ; 0Þmðr ; 0Þdr dr 1 1 2 2 1 2 1 2 R farther in order to find a change in language use. The linguistic distance between two points therefore tends to ¼ Gðr ;t;r ÞGðr ;t;r Þdr : ð13Þ 1 0 2 0 0 decrease with time, and the curve [Eq. (19)] gives the rate of decrease as exponential. There are features of reality which The linearity of our adapted OJK equation (10) ensures that we might expect to alter this behavior. First, we assume that mðr;tÞ remains Gaussian for all time [20] (to see this, note no major population mixing or migration takes place—such 031008-12 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) The source term is ∇ρ m − ∇ · m ¼ : ð23Þ ρ rR We now view Eq. (21) as describing the mass distribution for a collection of Brownian particles which are being driven radially away from the origin. The source term is interpreted as a field that causes particles to produce −1 offspring at rate ðrRÞ as they move through it. The fundamental solution, Gðr;t; r Þ, to Eq. (21) is then the mass distribution for a very large (approaching infinite) collection of particles with total mass initially equal to one, FIG. 13. Séguy’s curve showing linguistic distance (l) versus all of which started at r . geographical distance (r). Dashed line shows Eq. (19) in the case We wish to compute the dependence of linguistic σ ¼ 4, β ¼ 2. Open and closed dots show simulated linguistic distance on geographical distance from the peak of the distances using same parameter values in a 1000 × 1000 system population density (thought of as the center of a city). We at times t ¼ 80, 160. Note that linguistic distance depends only therefore require the expectation −1=2 on the combination rt , so curves evaluated at different times collapse onto one another. E½mð0;tÞmðr;tÞ ¼ Gð0;t; r ÞGðr;t; r Þdr : ð24Þ 0 0 0 events would have the effect of resetting the initial con- ditions of the model. Our prediction is valid only during Computation of a general closed-form expression for times of stability. Second, we assume that the population is Gðr;t; r Þ is not our aim; preliminary computations in uniformly distributed in the system when in reality pop- this direction suggest that if such a form existed, its ulations are clumped and, as we have seen, population complexity would restrict its use to numerical computations centers can support their own dialects if they are large alone. Instead, we make arguments leading to a simple enough. We take some steps toward addressing this issue approximation for Séguy’s curve. We observe first that the below. In Appendix D, we briefly discuss a simple one- integrand in Eq. (24) is dominated by the region around dimensional simulation model from the dialectology liter- r ¼ 0. Numerical evidence for this is provided in Fig. 14, ature [39], which includes the same large r behavior as in where we see that the fundamental solution grows in Eq. (19) for a particular choice (quadratic) of macroscopic magnitude as r → 0. In general, the solution consists of “influence” curve. a circular plateau propagating outward from the origin plus an isolated but spreading peak also drifting away from the B. Peaked population density origin (Fig. 14). The plateau is formed once the rate of loss −1 We now consider how Séguy’s curve is modified by the of particles from the peak source region (jr j ≲ R ) though presence of a peak in population density. In order to allow analytical tractability, we consider a simple exponentially decaying peak pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 x þ y ρðx; yÞ¼ exp − ; ð20Þ where R> 1. To understand the behavior of the modified OJK equation (10), it is useful to decompose it into an advection diffusion equation plus a source term: ∂m ∇m ∇ρ ∇ρ ¼ βσ ∇ · þ m − ∇ · m : ð21Þ ∂t 4 ρ ρ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi FIG. 14. Radial cross sections (along the line y ¼ 0) through 2 2 Defining r ≔ x þ y , the average velocity field for the numerical approximations to fundamental solutions of the diffusing particle is modified OJK equation (21) with population distribution pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 Eq. (20) at time t ¼ 300. Here, r ¼ x þ y ¼ x. Parameter ∇ρ ðx; yÞ values are βσ ¼ 1 and R ¼ 10. Initial conditions are r ¼ð1; 0Þ; − ¼ : ð22Þ 0 ρ rR ð10; 0Þ; ð20; 0Þ (solid, dashed, dotted lines, respectively). 031008-13 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) advection and diffusion is equal to the rate of creation of new particles. The plateau height is determined by the particle mass which reaches the peak source region in the early stages of evolution. Because of radial drift, the only particles with a chance of doing this are those with sufficiently small Péclet number [86]: 4jr j Pe ¼ ; ð25Þ where r is their starting point (or that of their earliest ancestor if they are daughters). Values of r that lie outside a region of radius ∝ R (henceforth Péclet region) around FIG. 15. Continuous line line shows radial cross section (along the origin can therefore be ignored when computing the line y ¼ 0) through numerical solution of the modified Gð0;t; r Þ.For R ≫ 1, the peak region forms a small OJK equation (21) with population distribution Eq. (20) −4 fraction OðR Þ of the Péclet region, and particles within at time t ¼ 700, with initial condition r ¼ð0; 0Þ. Here, pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 2 the Péclet region have a probability of reaching the peak r ¼ x þ y ¼ x. Parameter values are βσ ¼ 1 and R ¼ 10. which decays exponentially with their initial distance from Data points show asymptotic analytical solution Gðjrj;t;0Þ it. The function Gð0;t; r Þ will therefore itself be sharply [Eq. (30)], with t ¼ 700, time offset t ¼ −4.4, and A ¼ 0 0 0.0109 (found by maximum likelihood). peaked within the Péclet region, around r ¼ 0, and we make the approximation Gð0;t; r Þ ≈ hδðr Þ, where h is 0 0 plateau height. Making use of this approximation in where t is a time correction which accounts for the fact that Eq. (24),wehave the propagation velocity of the plateau takes some time to settle down to its long time value of βσ =R. We verify in E½mð0;tÞmðr;tÞ ≈ hGðr;t; 0Þ: ð26Þ Fig. 15 that this is the correct asymptotic solution by comparing it to the numerical solution of Eq. (29) for To compute the variance large t. We now approximate the normalized correlator as 2 2 Gðr; t;0Þ E½m ðr;tÞ ¼ G ðr;t; r Þdr ; ð27Þ 0 0 2 γðr; tÞ ≈ : ð31Þ Gð0;t;0Þ we note that if jrj ≪ t=R, then the dominant contribution to This approximation neglects the drop in the variance of the integral comes from the plateau component of the mðr; tÞ for r ≫ t=R described by Eq. (28), which amounts solution. If jrj ≫ t=R, then the plateau will not have pffiffi to neglecting a multiplicative factor t in the large r reached r, so only the spreading peak component of the behavior of the correlator. Our approximate analytical fundamental solution will contribute. Therefore, prediction for Séguy’s curve measured radially from the qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi center of the exponentially decaying population distribu- Oð1Þ if jrj ≪ t=R E½m ðr;tÞ ≈ ð28Þ tion is therefore −1=2 Oðt Þ if jrj ≫ t=R: 1 2 Gðr; t;0Þ −1 We comment on the significance of this behavior below. To l ðr; tÞ ≔ 1 − sin : ð32Þ 2 π Gð0;t;0Þ find the form of Gðr;t; 0Þ, we note first its circular symmetry, which reduces the number of variables in the This prediction is compared to correlations in the full OJK equation to two: model (Fig. 16 and Fig. 17) by generating 100 realiza- tions of isogloss evolution over the exponential popula- 2 2 ∂m σ β ∂ m 1 4 ∂m ¼ þ − : ð29Þ tion density, each with different randomized initial ∂t 4 ∂r r R ∂r conditions. From Fig. 16 we see that as time progresses a growing region emerges around the center of the city in We seek a traveling wave solution, subject to the initial which the linguistic distance to the center is close to zero. condition mðr; 0Þ¼ δðrÞ, representing the expanding pla- An alternative visualization of this effect is given in teau, valid for large r, so that the 1=r term in Eq. (29) can be Fig. 18, which shows a superposition of the isoglosses neglected. We obtain, as t → ∞, from 20 simulation runs. As time progresses, a circular patch emerges in the center of the system, which is Rr − βσ ðt − t Þ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi devoid of isoglosses, and therefore where all speakers use Gðr; t;0Þ ∼ A erfc ; ð30Þ Rσ βðt − t Þ the same linguistic variables. Outside of this central “city 031008-14 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) FIG. 16. Dashed lines show theoretical shape of Séguy’s curve FIG. 17. Dashed lines show theoretical shape of Séguy’s curve −r=R −r=R Eq. (32) centered at peak of population density ρ ¼ e with centered at peak of population density ρ ¼ e with R ¼ 20, R ¼ 20. Curve computed using Eq. (32) when β ¼ 1.4, σ ¼ 5, and with time evolution accelerated by factor of 1.25. Curve times are t ¼ 10, 20, 30 with offset t ¼ −5.77 (maximum computed using Eq. (32) when β ¼ 1.1, σ ¼ 5, and times are likelihood estimate). Simulation points give equivalent correla- t þ 1.25t, where t ¼ 10, 20, 30 and t ¼ 4.43. Linear scaling of 0 0 tions in the full model computed from 100 independent simu- time determined by maximum likelihood fit of simulation to lations in a 400 × 400 system. analytical prediction. Simulation points give equivalent correla- tions in the full model computed from 100 independent simu- lations in a 400 × 400 system at times t ¼ 10, 20, 30. dialect,” we note that the asymptotic behavior of the complementary error function, insight into the formation of dialects in population centers and the behavior of Séguy’s curve around cities. expf−x g erfcðxÞ ∼ as x → ∞ ð33Þ We leave the development of a more sophisticated theory for future work. −1 3 together with the expansion sin ðϵÞ¼ ϵ þ Oðϵ Þ lead to the prediction that linguistic correlations fall as −cðΔrÞ e =ðΔrÞ, where c is a constant and Δr is the distance from the edge of the city dialect. This is a faster rate of decay than in the flat population density case. It appears from Fig. 16 that in reality the decay rate may be even faster than this. Further simulations reveal that the velocity with which the city dialect expands shows some systematic deviation from the prediction v ≈ βσ =R of our OJK analysis. For example, in Fig. 17 we reduce the conformity parameter to β ¼ 1.1 and we see that our theoretical predictions are in close agreement with the simulation data, provided we accelerate time by a factor of ≈1.25. The value of β ¼ 1.4 selected in Fig. 13 produces a match between predicted and observed velocity, but for larger values of β the pre- diction is an overestimate. For example, when β ¼ 1.5 with all other parameters identical, the simulated velocity in the full model is smaller than our prediction by a factor of 0.97. One possible explanation for this discrep- ancy is that the interface shape may affect the constant of proportionality in the Allen-Cahn equation (9), for example, if it did not match its constant density equi- FIG. 18. Isogloss evolution in a 400 × 400 system with V ¼ 2, librium form. We also note that OJK’s assumption of −r=R β ¼ 1.1, σ ¼ 5 at t ¼ 10, 15, 25, 35 with ρ ¼ e , where isotropy in unit normals to isoglosses, although preserved R ¼ 20 and r ¼ 0 corresponds to the center of the system. Plot is globally by the circular symmetry of our system, is lost a superposition of 20 simulations with different initial conditions. locally at the edge of the city dialect. Despite these Central peak repels isoglosses. See video in Supplemental shortcomings, the adapted OJK theory allows analytical Material for full animation [38]. 031008-15 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) range will push out linguistic change, and where two VI. DIALECT AREAS AND DIALECT CONTINUA centers both repel, we expect to see bundling where their There is debate amongst dialectologists as to the most zones of influence meet. Each city would then create its appropriate way to view the geographical variation of own well-defined dialect area. Within real cities we also see language use [1,2]. The debate arises because it is rarely subdialects spoken by particular social groups [55],but the case that dialects are perfectly divided into areas. since our model does not account for social affiliations, we Chambers and Trudgill [1] imagine the following example: cannot explicitly model this. we travel from village to village in a particular direction and In Fig. 19, we schematically illustrate examples of these notice linguistic differences (large or small) as we go. effects using three imaginary “island nations.” Nation A These differences accumulate so that eventually the local exhibits distinct dialect areas. The northernmost area is population are using a very different dialect from that of the supported both by a city, which may generate and then village we set out from. Did we cross a border dividing the repel language features, and by two indentations which dialect area of the first village from that of the second, and form a “pinch point” which will tend to collect isoglosses if so, when? Alternatively, is it a mistake to think of dialects via the boundary effect. Several other pinches exist which as organized into distinct areas; should we only think of a also collect isoglosses, creating distinct dialects. The continuum? southernmost city supports an isogloss via repulsion, which We now set out what our model can tell us about these would otherwise migrate south under the combined influ- questions. In one sense, language use in our model is ence of surface tension and the boundary effect, eventually always continuous in space. Although domains emerge disappearing. Nation B also possesses boundary indenta- where one variable is dominant, domain boundaries form tions, but the lack of pinch points reduces bundling: while transition regions in which the variants change continu- the indentations collect isoglosses, the smoother parts of ously (the width of these regions is computed in Sec. VII). the coastline allow isoglosses to attach anywhere, creating a Despite this, the boundary between two sufficiently large continuum of language use. Two city dialects do exist, single-variant domains will appear narrow compared to the however, driven by repulsion. Finally, nation C is a convex size of the domains, and in this sense the domains are well region. This is an example of a system which, without a defined and noticeable by a traveler interested in one linguistic variable. Of more interest are the observations of a traveler who pays attention to the full range of language use. To perceive a dialect boundary, this traveler must see a major change in language use over a short distance. This change must be large in comparison to other, smaller changes perceived earlier. In our model a major language change is created by crossing a large number of isoglosses over a short distance. The question then is, under what circumstances will isoglosses bundle sufficiently strongly for dialect boundaries to be noticeable? To answer this we need to recall the three effects which drive isogloss motion. First, surface tension, which tends to reduce curvature. Second, migration of isoglosses until they emerge perpendicular to a boundary such as the coast, the border of a linguistic region, a sparsely populated zone, or an estuary. Third, repulsion of isoglosses from densely populated areas. There are two major ways in which these effects can induce bundling, both of which require the essential ingredient of time and demographic stability in order for surface tension to take hold. Indented bounda- ries can collect multiple isoglosses, creating a bundle. Examples already noted include the Wash and the Severn in FIG. 19. Schematic diagram of isoglosses (dashed lines) for GB, the Gironde Estuary in France, and the historical three language areas or “island nations.” Yellow and ochre circles indentation in the Dutch-German language area marking represent cities. Nation A supports dialect areas formed by coastal the eastern end of the Rhenish fan. A major boundary boundary shape and repulsion from cities. Nation B largely indentation may not always create a bundle, however: it exhibits a continuum of variation apart from two somewhat may be that other parts of the boundary, or the presence of indistinct city dialects. Nation C has a single city dialect, but cities, creates a fanning effect. Variations in population without this city (or if the city were not sufficiently densely density can also create bundling. Dense population centers populated) it would have no linguistic variation due to its entirely which are large in comparison to the typical interaction convex boundary and evenly distributed population. 031008-16 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) city, could not support more than one dialect, and would Since V is defined by an indefinite integral, we can define 1 0 tend over time to lose isoglosses. Vð Þ ≔ 0.As x → ∞, we require that f ðxÞ → 0 and In some regions there are no dialect areas, only a fðxÞ → 1 or 0, so continuum of variation [87], and in others clear dialects exist [4]. The above examples point to some general 4 ln 2 − 1 E ¼ limVðfÞ¼ limVðfÞ¼ ðβ − 1Þ: principles. In regions with low population density, a lack f→1 f→0 of major boundary indentations, and few large cities, we might expect isoglosses to position themselves in a less The magnitude of the frequency gradient at the origin, predictable way, creating language variation which would 1 where fð0Þ¼ , is, therefore, be perceived as a continuum by a traveler. The regular rffiffiffiffiffiffi creation of new isoglosses (resetting the initial conditions pffiffiffiffiffiffiffiffiffiffiffi 4E β − 1 of the model) through linguistic innovation or demographic jf ð0Þj ¼ ≈ 0.545 : ð36Þ σ σ instability could also disrupt the ordering processes. Narrow geographical regions, or where there are major From this we see that weak non-neutrality and larger boundary indentations, or dense population centers which interaction range produce shallower gradients and therefore push out linguistic change, are particularly susceptible to wider transition regions. As β approaches one, the tran- the formation of distinct dialects. sition region becomes increasingly wide and boundaries disintegrate, destroying the surface tension effect described VII. TRANSITION REGIONS AND CURVATURE in Fig. 3. Equation (36) is verified numerically for an We now derive analytical results that characterize the “almost straight” isogloss in Fig. 20 (red dashed line). transition regions between variables and the effect of We now relate the equilibrium shape of isoglosses to population density on the curvature of dialect domain population density. Starting from our modified Allen-Cahn boundaries. equation (9) for isogloss velocity, and introducing the local To compute the gradients of transition regions, we radius of curvature R ¼ 1=κ, we see that when an isogloss consider a straight isogloss (with constant population is in equilibrium (having zero velocity) density) in equilibrium between variants 1 and 2, given by the line x ¼ 0. Because the isogloss is vertical the R ¼ − ; ð37Þ frequencies will not depend on y, so we write them f ðxÞ 2∇ρ · g and f ðxÞ and note that f ðxÞ¼ 1 − f ðxÞ, so we need only 2 1 2 consider the behavior of f ðxÞ. For notational simplicity we where g ˆ is a unit normal to the isogloss. We note a simple define f ≔ f , m ≔ m and pðmÞ ≔ p ðmÞ. The isogloss 1 1 1 alternative derivation of this result based on the dialect will form the midpoint of a transition region where the fraction F ðx; yÞ of a domain D. We define this as the frequencies change smoothly between one and zero, and where f ð0Þ measures the rate of this transition. In equilibrium, from Eq. (3) we have ∂ pðmÞ¼ m − pðmÞ: 2 00 −1 Since f ¼ pðmÞ,wehave ðσ =2Þf ¼ p ðfÞ − f. Note 00 2 that f is shorthand for ∂ fðxÞ. If non-neutrality (con- −1 formity) is small, we may replace p ðfÞ with its Taylor series about β ¼ 1, neglecting O½ðβ − 1Þ  terms: σ 1 − f 00 2 f ¼ðβ − 1Þfð1 − fÞ ln þ O½ðβ − 1Þð34Þ 2 f FIG. 20. Sequence of radial cross sections of the frequency of a dV ≕− þ O½ðβ − 1Þ : ð35Þ linguistic variable whose initial domain is concentric with a city. df Snapshots taken at times t ¼ 10; 30; 50; … starting from initial radius r ¼ 55. Model parameters σ ¼ 5, β ¼ 1.1. In this example, Here, we define a “potential” function VðfÞ, allowing the city has radius R ¼ 50 and peak density ρ ¼ 3. Vertical line c 1 us to identify Eq. (34) as Newton’s second law for the gives theoretical stable radius R ¼ 125.3 computed from motion of a particle of mass σ =2 in a potential VðfÞ, Eq. (41). Unstable radius [Eq. (40)]is R ¼ 47.2. Red dashed where x plays the role of time, so that the total energy line gives the theoretical frequency gradient in transition region 2 0 2 [Eq. (36)]. E ≔ ðσ =4Þf ðxÞ þ V(fðxÞ) is independent of x [88]. 031008-17 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) fraction of conversations that a speaker at a point ðx; yÞ has we see that our law Eq. (37) accurately predicts the stable with people whose home neighborhoods lie in D: radius produced by our evolution equation (3). VIII. DISCUSSION AND CONCLUSION F ðx; yÞ ≔ kðx; y; u; vÞdudv: ð38Þ A. Summary of results Let P ≔ ðx ;y Þ be a point on an isogloss with local radius Departing from the existing approaches of dialectology, of curvature R ≫ σ, bounding some region D. An intui- we formulate a theory of how interactions between indi- tively appealing condition for language equilibrium is that vidual speakers control how dialect regions evolve. Much the speaker at P should interact with equal numbers of of what we demonstrate is a consequence of the similarity speakers from within and without D: between dialect formation and the physical phenomenon of phase ordering. Humans tend to set down roots, and to conform to local speech patterns. These local patterns may F ðx ;y Þ¼ : ð39Þ be viewed as analogous to local crystal orderings in binary alloys [83] or magnetic domains [22]. As with phase Using the saddle point method [47] to evaluate Eq. (38), ordering, surface tension is a dominant process controlling we have the evolution of dialect regions. However, important differences arise from the fact that human populations 1 σ 1 ∇ρ · g ˆ σ are not evenly distributed through space, and the geo- F ¼ − pffiffiffiffiffiffi þ þ O : 2 2R ρ R graphical regions in which they live have irregular shapes. 2π These two effects cause many different randomized early Result Eq. (37) then follows from the equilibrium condition linguistic conditions to evolve toward a much smaller Eq. (39). From this we see that large gradients in population number of stable final states. For Great Britain we show density can produce equilibrium isoglosses with higher that an ensemble of these final states can produce predicted curvature. To test this against our full evolution equa- dialect areas which are in remarkable agreement with the tion (3), consider a circular city with radius R having a work of dialectologists. Gaussian population distribution set in a unit uniform Since language change is inherently stochastic, at small background: spatial scales we can only expect predictions of a statistical nature. At larger scales an element of deterministic pre- 2 2 2 −ðx þy Þ=2R dictability emerges. Within our model, all stochasticity ρðx; yÞ¼ 1 þðρ − 1Þe : arises from the randomization of initial conditions. The effect of this randomness is strongest in the early stages of The constant ρ ≥ 1 gives the relative density of the city language evolution, when the typical size of dialect center compared to outlying areas. Consider a circular domains is small. The superposition of isoglosses lacks isogloss which is concentric with the city, then Eq. (37) has a discernible pattern. This “tangle” of lines produces a two solutions: continuum of language variation, with spatial correlations sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi given by Séguy’s curve. As surface tension takes hold, 1=4 1 e steered by variations in population density and system R ¼ R − 2W − ; ð40Þ 1 c p 2 4ðρ − 1Þ shape, isoglosses begin to bundle and the continuum resolves into distinct dialect regions. Both long-term sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi population stability and large variations in population 1=4 1 e R ¼ R − 2W − ; ð41Þ density play an important role. Without these ingredients 2 c m 2 4ðρ − 1Þ isoglosses will not have time to evolve into smooth lines, or bundle. where W and W are two branches of Lambert’s W p m The assumptions of our model are minimal, and clearly function [89], WðzÞ, which solves z ¼ we . The two there are many additional complexities involved in lan- solutions R and R are, respectively, decreasing and 1 2 guage change which we have not captured; we discuss increasing functions of ρ . The stability of these solutions 1 below how the model might be extended to account for may be determined by noting that if F > , then D will some of these. Despite this simplicity, in addition to expand, and contract if the inequality is reversed. From this substantially reproducing Trudgill’s predictions for we are able to determine that R is stable, whereas R is not. English dialects [4], we are also able to explain the 2 1 If a dialect domain begins with R< R , then it will shrink observation of both dialect continua and more sharply and disappear under the influence of surface tension, but if bounded dialect domains. We also provide an explanation initially R>R , then the domain will expand or contract for why boundary indentations (e.g., in coastline or in the until R ¼ R . This behavior is illustrated in Fig. 20, where border between different languages) are likely to collect 031008-18 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) inherently imperfect [19], and many interactions are isoglosses [1]. We show that cities repel isoglosses, explaining the origin of the Rhenish fan [43], the wavelike between young speakers who are simultaneously assimi- spread of city language features [15], and the fact that many lating their language. In this sense the young are weakly dialect patterns are centered on large urban areas. We coupled to the current, adult speech community, and their explain why linguistic regions with high aspect ratio tend to language state is subject to fluctuations which may be develop striped dialect regions [77,78]. We compute an sufficient to overcome local conformity for long enough to analytical form for Séguy’s curve [2,5,6,39] which as yet become established. As these speakers age their linguistic has had no theoretical derivation. We also adapt this plasticity declines, older speakers die, and the change is derivation to deal with a population center. We quantify cemented. Other mechanisms include speech modifications how the width of a transition region [1] between dialects is made to demonstrate membership of a social group, or a related to the strength of conformity in individual language bias toward easier or more attractive language features. To use and the typical geographical distances over which understand mathematically the effect of innovation on individuals interact. We show how to relate the curvature of spatial evolution, we might simply allow the creation of stable isoglosses to population gradients. Finally, in new variants, and then assign them a “fitness” relative to the Appendix B we show how incorporating an age distribution existing population. into the model can quantify the “‘apparent time” [58] effect used by dialectologists to detect a linguistic change in 2. Interaction network progress. Given these findings, we suggest that the theo- By selecting a Gaussian interaction kernel, and not retical approach we present would be worth further distinguishing between different social groups, we are investigation. assuming that the social network through which language change is transmitted is only locally connected in a B. Missing pieces geographical sense but within each locality the social The model we present is deliberately minimal: it allows network is fully connected. That is, I will listen without us to see how much of what is observed can be explained by prejudice to anyone regardless of age, sex, status, or local interactions and conformity alone. This leads to a ethnicity, as long as they live near my home. Research simple unified picture with surprising explanatory power. into 21st-century human mobility [90,91] and the work of However, having chosen simplicity, we cannot then claim linguists [55,58,61] indicates that both these assumptions to provide a complete description of the processes at play. are a simplification. Human mobility patterns, and by We now describe the missing pieces, indicating what effect implication interaction kernels, exhibit heavy-tailed behav- we expect them to have, and how to include them. ior (with an exponential cutoff at large distances). In our framework, an interaction kernel of this type, combined 1. Innovation with densely populated cities would allow long-range connectivity between population centers. Long-range An important aspect of language change that is not interactions in phase-ordering phenomena can have sub- explicitly encoded within our model is innovation: the stantial effects on spatial correlations and domain sizes creation of new forms of speech. We instead assume that there are a finite set of possible linguistic variables, and for [92], and may effectively alter the spatial dimension of the each one, a finite set of alternatives, all of which are present system [93]. in varying frequencies within the initial conditions of the Further evidence for nonlocal networks is provided by model. Each alternative is equally attractive so that con- the Frisian language, spoken in the Dutch province of formity alone decides its fate. A new variant cannot Friesland. This has a “town Frisian” dialect [68], spoken in spontaneously emerge within the domain of another. The towns that are separated from each other by the Frisian model therefore evolves toward increasing order and spatial countryside, where a different dialect is spoken: town correlation. Because of the presence of population centers Frisian is “distributed.” Within the social network these this ordering process is eventually arrested creating distinct, towns are “local” (nearby) to each other. To incorporate this stable domains. If innovation were allowed, then ordering effect into our model, we must either reformulate our would be interrupted by the initialization of new features, fundamental equation (3) to describe evolution on a more and Séguy’s curve would reach a steady state where the rate general network or generalize our interaction kernel to of innovation (creating spatial disorder) balanced the rate of allow communication between cities. Further empirical ordering. evidence for nonlocal interactions is provided by hierar- For a local innovation to take hold, it must overcome chical diffusion [94], where linguistic innovations spread local conformity, realized as surface tension and the between population centers, not necessarily passing “shrinking droplet” effect. Several mechanisms might through the countryside in between. Such a process allow this to happen: for example, young speakers must motivated the creation of Trudgill’s gravity model [14]. recreate their language using input from parents, peers, and In addition, mobility data [91] show that individuals other community members. This recreation process is follow regular, repeating trajectories, introducing strong 031008-19 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) heterogeneity within the set of individual interaction As pointed out by Vitány [99], Tolstoy was, in modern kernels. Social, as well as spatial, heterogeneity may also terms, advocating the formulation of a statistical mechanics be important. For example, it has been shown theoretically of history. The work we present is an attempt to formulate [95] that the time required for two social groups to reach such a theory for the spatial history of language. Because of linguistic consensus is highly sensitive to the level of its simplicity, dealing only with copying and movement, affinity that individuals have for their own group. our model may apply more broadly to other forms of culture. 3. Linguistic space and dynamics ACKNOWLEDGMENTS By using a set of linguistic variables we are treating The author was supported by a Leverhulme Trust dialects as points in vector space. Implicit in our dynamics Research Fellowship (RF-2016-177) while carrying out are two assumptions. First, all transitions between variants this work, and is most grateful for this. The referees, from are allowed, with probabilities given only by the frequen- Statistical Physics and Dialectology, provided invaluable cies with which the variables are used. Second, the advice and their time is greatly appreciated. The author evolution of different linguistic variables are mutually would like to particularly acknowledge one quantitative independent. There are cases where this is an incomplete dialectologist reviewer whose comments had a substantial description. A notable example is chain shifting in vowel impact on the work. He is also particularly grateful to sounds [58]. Linguists represents the set of possible vowel Samia Burridge for many ideas and useful advice. sounds as points in a two-dimensional domain where closeness of the tongue to the roof of the mouth and the position of the tongue’s highest point (toward the front or APPENDIX A: DISCRETIZED EVOLUTION back of mouth) form two orthogonal coordinate directions EQUATION [96,97]. The vowel system of a language is then a set of Here, we present the computational scheme used for points in this domain. If one vowel change leaves a gap in solving our evolution equation (3). This is aimed at this system (an empty region of the domain), then other researchers having some familiarity with computer pro- vowel sounds may shift to fill this gap, producing a chain of graming, such as linguists interested in quantitative interconnected linguistic changes. Similarly, a change in approaches. It can also be implemented using only a one vowel to bring it closer to another can push it, and then spreadsheet (see Supplemental Material [38]). The discre- others, out of their positions. A famous example is the tized version of evolution equation (3) also provides a “great vowel shift” [19] in England between the 14th and greater intuitive understanding of its continuous counter- 17th centuries. Another example concerns changes that part. For simplicity, we consider the V ¼ 2 model and spread to progressively more general linguistic (as opposed define f ≔ f , m ≔ m , pðmÞ ≔ p ðmÞ, and note that 1 1 1 to geographical) contexts [81]. If we have several variables, we need only consider the evolution of m and f each signifying the presence of the change in a different because f ¼ 1 − f ¼ 1 − f. 2 1 context, then it is clear that the frequency of one variable We begin by rewriting our evolution equation (3) in can influence the frequency of another, contradicting our terms of the memory and frequency fields assumption that variables are independent. The fact that linguistic variables are sometimes depen- dent upon one another means that, within our model, p , i ∂ m ¼ f − m þ ∇ ðρfÞ; ðA1Þ 2ρ which relates the past use of some variable to the current frequency of its ith variant, via the relationship where f ðx; y; tÞ ≔ p ½mðx; y; tÞ, should sometimes depend on i i the use of other variables, and might be adapted to capture more than just conformity. f ¼ pðmÞ¼ : ðA2Þ β β m þð1 − mÞ C. Conclusion To solve Eq. (A1) on a computer, we discretize space into a We conclude by noting that a major theme of the book rectangular grid of square sites. We let the side of each grid War and Peace by Tolstoy is the idea that history is square define one unit of length. The interaction range used determined not by great individuals but rather by millions in the computer calculation should be expressed in these units. That is, if the side of a grid square is a-km long, and of small choices made by the people. the real-world interaction range is σ km, then the interaction range used in the computer should be σ ≔ σ=a. We choose To elicit the laws of history we must leave aside kings, a so that σ > 1, so speakers interact over distances greater ministers, and generals, and select for study the homo- than one square. The subscript c distinguishes the inter- geneous, infinitesimal elements which influence the masses [98]. action range measured in computer grid units from the 031008-20 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) interaction range in km. For each site we store three boundary sites simply shift attention from empty neighbor- quantities, ρ , f , m , where the subscripts i, j are ing squares to those which are part of the linguistic domain, ij ij ij consistent with the original definition of the weighted horizontal and vertical indices related to spatial position interaction kernel). Second, it ensures that spatially con- by x ¼ ia and y ¼ ja. These quantities are our approx- stant memory and frequency fields constitute a fixed point imations to the values of the fields ρ, f, m at the centers of the dynamics. Rule Eq. (A5) is an explicit scheme and as of sites, and are stored in three arrays. Intuitively, we think such its stability requires that δt be chosen sufficiently of each site as containing a group of a ρ speakers, each of ij small. In the case of zero conformity (β ¼ 1) and constant whom uses variant 1 with frequency f . The linguistic ij population density, the von Neumann stability condition domain of interest is the set of sites with nonzero [63] is δt< 1=σ . This serves as a guide to find δt population density. Within the domain, sites with one or sufficiently small for our scheme to converge. For the more nearest neighbors with zero population are referred to density fields and conformity values we use in this paper, as boundary sites, otherwise they are bulk sites. Sites which we find that δt< 1=ð4σ Þ is more than sufficient. are not part of the domain are never updated, and it is useful We conclude this section by explaining the linguistic to include a border of such sites around the edge of the meaning of the terms on the right-hand side of Eq. (A5). rectangular grid. The first term, f − m , drives conformity. If m > , this ij ij ij The scheme we present is based on approximating the 2 2 2 term is positive, driving the memory further towards m ¼1 ij Laplacian ∇ ¼ ∂ þ ∂ using a central finite difference x y 2 2 where all speakers use variant 1. Otherwise, if m < , the approximation for the derivatives ∂ and ∂ . Let g be an ij x y memory is driven towards zero where no speakers do. The arbitrary function defined at every site. We define a local second term, hρfi =hρi − f , acts to equalize speech use average at each grid point: ij ij ij in the local area. If variant 1 is used more at ði; jÞ than in the 1 surrounding squares, then this term acts to reduce its use in hgi ¼ ðg þ g þ g þ g Þ: ðA3Þ ij iþ1;j i−1;j i;jþ1 i;j−1 ði; jÞ. If variant 1 is used relatively less at ði; jÞ, the term has the opposite effect. This is just the average of the values of g at the four nearest neighbors of ði; jÞ. The Laplacian is then approximated: APPENDIX B: INCORPORATING AGE 2 INTO THE MODEL ∇ g ≈ 4ðhgi − g Þ: ðA4Þ ij ij ij In order to experimentally detect a linguistic change in This follows from the finite difference approximation progress, ideally one would like to survey the same ∂ g ≈ g − 2g þ g , the effectiveness of which population of individuals, or a representative sample of x iþ1;j ij i−1;j depends on g varying slowly between sites. From the population, at two or more points in time [1]. Such longitudinal studies may be practically difficult to carry Eq. (A4) we see that ∇ g measures the extent to which out, so linguists have made use of the assumption that g differs from the average of its neighbors. If ∇ g< 0, then speech patterns are acquired mainly in the early part of g exceeds the local average, and is less than the local people’s lives. The speech of a 50-year old today should average when the inequality is reversed. therefore reflect the speech of a 30-year old 20 years ago. It We now introduce a small discrete time step δt and write should be noted though that speech patterns can change Δm ≔ m ðt þ δtÞ − m ðtÞ for the change in the memory ij ij ij throughout life [44], although more slowly in older speak- field over the time interval ½t; t þ δt. We note also that ers. A linguistic change detected by observing different provided the grid is sufficiently fine, then at bulk sites speech patterns in the young and old is said to have been ρ ≈ hρi , so, making use of Eqs. (A1) and (A4),we have ij ij observed in apparent time [58,59]. A famous example of apparent time is the replacement of the term chesterfield hρfi ij Δm ≈ ðf − m Þþ 2σ − f δt: ðA5Þ (meaning an upholstered multiple-person seat) in Canadian ij ij ij ij hρi ij speech with the fashionable American term couch [100].In this case the use of couch was shown to decrease sigmoi- At each time step Eq. (A5) is used to update the stored dally from ≈85% amongst teenagers to ≈5% among those values of m for all sites in the linguistic domain, after ij in their eighties. The apparent time theory has been tested which the frequencies can also be updated using f ¼ ij by comparing language surveys taken at different times and pðm Þ. The quantity hρfi =hρi is the average of frequen- ij ij ij comparing predictions based on apparent time in the earlier cies at neighboring sites, weighted in proportion to their sample with the observations made in the later one [101]. populations. Using hρi rather than ρ in the denominator We note that differences between speech patterns between ij ij ensures that these weights sum to one. This serves two the generations do not always indicate a linguistic change purposes. First, it avoids the need for an additional in progress [44]. For example, the use of some speech condition at boundary sites (intuitively, speakers in forms may change systematically with age in the same way, 031008-21 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) 1 if x< 0 generation after generation, so that the community as a whole is in a stable state [55]. 1 ηðxÞ ≔ if x ¼ 0 ðB7Þ We now give a simple illustration of how age and 0 if x> 0 apparent time can be incorporated in our model. For simplicity, we consider the progress of a straight isogloss and prepare the system with the initial condition between two variants, driven by a slowly declining pop- ulation density. This density variation is equivalent to a fðx; t < t Þ¼ ηðx − ΛÞ; ðB8Þ social bias toward one variable which we call the new variant. We let fðx; tÞ be the frequency with which the new where Λ is the initial location of the isogloss. Because each (spreading) variable is used at position x and time t. Note speaker listens more to the speakers on their left, the that there is no y dependence due to symmetry. To isogloss will travel right. In the limit β → ∞ then when the distinguish between young and old we introduce an age memory μ ðx; tÞ of a speaker, with x> Λ, reaches , they density distribution αðaÞ giving the fraction of individuals will switch linguistic variables. The motion of the isogloss within the age bracket ½a ;a  as 1 2 will then take the form of a traveling wave formed from a a superposition of traveling step functions, one for each age, αðaÞda: ðB1Þ 1 ∞ fðx; tÞ¼ αðaÞη½x − vt þ ΛðaÞda; ðB9Þ Using this distribution we modify our original model to account for the fact that individuals have been exposed only with the function ΛðaÞ > 0 and the velocity v to be to the linguistic information available in their lifetime. The determined. According to Eq. (B9),at t ≔ ΛðaÞ=v,a memory of a speaker with age a is therefore defined as speaker at the origin with memory of length a will be on the Z Z point of a switching variable, so that −s=τ μ ðx; tÞ ≔ kðx; uÞfðu; t − sÞdu ds: Z Z Z −a=τ a ∞ τð1 − e Þ 0 R 1 ds 0 0 μ ð0;t Þ¼ ¼ da dyαða ÞkðyÞ a a 2 a ðB2Þ 0 0 R × η½y þ vs − ΛðaÞþ Λða Þ Z Z Z Note that as a → ∞ this definition coincides with our a ∞ ΛðaÞ−Λða Þ−vs ds 0 0 original definition Eq. (2) of memory. In the interests of ¼ da αða Þ kðyÞdy 0 0 −∞ analytical tractability, we consider the limit of small Z Z a ∞ ds memory decay rate (τ → ∞) in which case linguistic 0 0 0 ≔ αða ÞK½ΛðaÞ − Λða Þ − vsda : memory is a simple “bus stop” average over life history: 0 0 Z Z  ðB10Þ μ ðx; tÞ¼ kðx; uÞfðx; t − sÞdu ds: ðB3Þ 0 R Here, we introduce the cumulative K of the interaction kernel: We also take the limit of total conformity β → ∞ so that language is chosen according to a simple majority rule. We z KðzÞ ≔ kðyÞdy: ðB11Þ consider an exponentially decaying population density −∞ −ϵx ρðxÞ¼ e ; ðB4Þ As ϵ → 0, the isogloss velocity must also tend to zero. The quantity where ϵ ≪ 1. The weighted interaction kernel for this density is then Δða ;a Þ ≔ Λða Þ − ΛðaÞðB12Þ 1 2 1 2 2 2 1 ðu − x þ ϵσ Þ gives the distance between the step functions for speakers kðx; uÞ¼ pffiffiffiffiffiffi exp − ðB5Þ with ages a and a as jΔða ;a Þj, and this separation must 2σ 1 2 1 2 2πσ also tend to zero as ϵ → 0. We can therefore compute a series expansion for v in powers of ϵ by expanding the ≕ kðu − xÞ: ðB6Þ cumulative interaction kernel KðzÞ in Eq. (B10) about z ¼ 0 and ϵ ¼ 0. To lowest order we have Notice that the effect of the decaying population density is to shift the interaction kernel to the left so that more 1 z ϵσ attention is paid to language use on that side of the listener. pffiffiffiffiffiffi pffiffiffiffiffiffi KðzÞ¼ þ þ þ oðzÞþ oðϵÞ: ðB13Þ 2πσ 2π To compute the isogloss velocity, we define 031008-22 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) Substituting this approximation into Eq. (B10),wehave Δða; a Þ av 0 0 αða Þda − þ ϵσ ¼ 0: ðB14Þ σ 2σ It is straightforward to verify that this equation has the solution 2σ ϵ v ¼ ; ðB15Þ 0 2 ða − a Þσ ϵ Δða; a Þ¼ ; ðB16Þ a¯ FIG. 21. Thick dashed line shows theoretical frequency of new variant for age distribution Eq. (B18)—using linear approxima- tions Eq. (B15) and (B16) for vðaÞ and ΛðaÞ, and assuming where a¯ is the mean age of the population: Λð0Þ¼ 0—when a ¼ 10, A ¼ 90, σ ¼ 5, ϵ ¼ 0.1 at time t ¼ A=2 ¼ 45 (chosen so that oldest speaker at x ¼ 0 has just switched variable). For these parameter values a¯ ¼ 35.23. Ver- a¯ ≔ aαðaÞda: ðB17Þ tical dashed line is drawn at location of youngest adopter of new variable (giving width of transition region). Open circles give If the oldest speaker has age A, then the width of the frequency values using an age distribution discretized into two- year bins, computed by numerically evolving the full model. transition region is ΔðA; 0Þ¼ Aσ ϵ=a¯. We provide a concrete example using a population “pyramid” age dis- 1 h − ΔðA; aÞ tribution, cut off exponentially at low ages to account for E½q ðaÞ ¼ erfc pffiffiffi ≕ q¯ða; h; ωÞ: ðB20Þ the fact that very young speakers listen to, but do not 2 2ω influence, others. Letting a be the low age cutoff, we define An example of this distribution is illustrated in Fig. 22.In this example, the mean sample location is the center of the transition region, and we see that uptake of the new variant −a=a αðaÞ¼ ð1 − e ÞðA − aÞ; ðB18Þ exhibits the characteristic “S-shaped” age distribution seen in studies of linguistic change observed in apparent time where C is a normalizing constant. An example of the [1,58,100]. traveling wave Eq. (B9) generated by this age distribution We conclude by noting that this extension of the model is illustrated in Fig. 21. Also shown are the results of a to include different memory lengths should be seen as a toy numerical implementation of the full model with a dis- model of the effect of age on language change. The fact that cretized version of the age distribution Eq. (B18). This discretization is necessary in order to implement the model 1.0 numerically, because the memory of each age of speaker 0.8 must be individually stored. Finally, we consider the likely outcome of experimen- 0.6 tally sampling the use of language within this model. We let x be the left boundary of the transition region (the oldest 0.4 speaker at x is just about to switch variable). We then define the indicator function of the event that a speaker of 0.2 age a, located at position x, is using the new variant: 0.0 1 if x − x < ΔðA; aÞ 80 q ðaÞ ≔ ðB19Þ 0 otherwise: FIG. 22. Expected frequency Eq. (B20) with which new variant Consider a sample of speakers with home locations X, is used by speakers of different ages from a random sample. Mean normally distributed around some average position x þ h: and variance of speaker locations relative to left boundary x of 1 2 X ∼ N ðx þ h; ω Þ. The probability that a speaker of age a 0 transition region are h ¼ Δð0;AÞ and ω ¼ 1. Model parameter within this sample will use the new variant is then the values are a ¼ 10, A ¼ 90, σ ¼ 5, ϵ ¼ 0.1. For these parameter expectation of q ðaÞ over the position X: values, a ¼ 35.23. 031008-23 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) older speakers tend to take longer to change their language For constant density, the equilibrium configuration of the use is captured purely by the length of their memories. In isogloss is a straight line, so the curvature κ ≔ ∇ · g is zero, reality, the influence on a given speaker of the language and f depends only on the displacement g. Making use of they are exposed to at different stages of life will be much Eqs. (C1) and (C4), we see that the equilibrium equation for more complicated than our simple model [102].For f is in this case example, if language use were determined entirely during 2 2 early life, then the forgetting curve should be peaked during σ d f −1 ¼ p ðfÞ − f: ðC5Þ these early years—in effect, the linguistic memory should 2 dg stop “recording” once a speaker’s youth has ended. Each speaker will respond differently to what they hear, so the We now make the assumption that out of equilibrium, if forgetting curve will not be identical for every speaker. the curvature is low, and the density slowly varying, Such heterogeneity amongst speakers is straightforward to then the profile of the transition region around the isogloss incorporate, but at the cost of tractability. The advantage of takes its equilibrium form. We also recall our assumption the simple approach is to illustrate the apparent time effect in our derivation of the full evolution equation (3) that in an analytically simple way. 2 2 j∇ ρj=ρ ≪ σ . The evolution equation (C1) may then be written as APPENDIX C: ALLEN-CAHN EQUATION AND OHTA-JASNOW-KAWASAKI THEORY σ ∂f ∇ρ · ∇f 0 −1 ∂ f ¼ p ½p ðfÞ ∇ · g ˆ þ 2 ðC6Þ 2 ∂g ρ Here, we derive an analogue of the Allen-Cahn equation t [83] for the velocity of an isogloss, before deriving a ∂f ∇ · g ˆ ∇ρ · g ˆ modified Ohta-Jasnow-Kawasaki equation [84], which 2 0 −1 ¼ σ p ½p ðfÞ þ : ðC7Þ provides a simplified method for understanding the evo- ∂g 2 ρ lution of spatial correlations in the model. In the binary variant case (V ¼ 2), we have that m ¼ 1 − m , so after Making use of relation Eq. (C2), we have 1 2 defining pðm Þ ≔ p ðmÞ and f ≔ f , then f ¼ pðm Þ.In 1 1 1 1 terms of f, our evolution equation (3) may be written as ∂g κ ∇ρ · g ˆ 2 0 −1 ¼ −σ p ½p ðfÞ þ : ðC8Þ ∂t 2 ρ 0 −1 −1 2 ∂ f ¼ p ½p ðfÞ f − p ðfÞþ ∇ ðρfÞ : ðC1Þ 2ρ Since ð∂g=∂tÞ is the isogloss velocity, and at the isogloss we have f ¼ , then Following Refs. [20,83], we introduce a unit vector g, normal to the isogloss at a point P. We let g be the κ ∇ρ · g ˆ 2 displacement from P in the direction of g. At the isogloss v ¼ −σ β þ : ðC9Þ 2 ρ we have ∂f To obtain spatial correlation functions between different ∇f ¼ g; modal linguistic variables, we apply the Ohta-Jasnow- ∂g Kawasaki method [84]. We adapt the description of ∂ f ∂f OJK’s analysis given in Bray [20] to include population ∇ f ¼ þ ∇ · g: ∂g ∂g t t density effects. As we describe in Sec. V, we label the two alternatives for a particular variable as −1 and 1 and We also need the cyclic relation: introduce a smoothly varying auxiliary field mðx; y; tÞ which gives the modal (most common) variant ϕ of ∂t ∂f ∂g ¼ −1: ðC2Þ ˆ variable i as ϕ ðmÞ¼ sgnðmÞ. The unit vector g may then ∂f ∂g ∂t g t f be written as The Laplacian term in Eq. (C1) may be expanded as ∇m follows: g ¼ ; ðC10Þ j∇mj 2 2 ∇ ðρfÞ ∇ ρ ∇ρ · ∇f ¼ ∇ f þ f þ 2 ðC3Þ from which we see that the isogloss velocity is ρ ρ ρ 2 2 −∇ m þ g ˆ g ˆ ∂ ∂ m − 2ð∇ρ:∇mÞ=ρ ∂ f ∂f ∇ ρ ∇ρ · ∇f i;j i j i j ¼ þ ∇ · g þ f þ 2 : v ¼ σ β ; ∂g ∂g ρ ρ 2j∇mj t t ðC4Þ ðC11Þ 031008-24 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) where i; j ∈ fx; yg. In a reference frame attached to the inverse sine function, this behavior is not present in our interface, version of Séguy’s law [Eq. (19)]. We note also that the dynamics of Ref. [39] reduce order in the system, whereas our model leads to increasingly ordered states. dm ∂m ¼ 0 ¼ þ v · ∇m: ðC12Þ dt ∂t Since v∥∇m, then v · ∇m ¼ vj∇mj and ∂ m ¼ −vj∇mj,so [1] J. K. Chambers and P. Trudgill, Dialectology (Cambridge University Press, Cambridge, England, 1998). ∂m σ β ∇ρ:∇m [2] W. Heeringa and J. Nerbonne, Dialect Areas and Dialect ¼ ∇ m − g ˆ g ˆ ∂ ∂ m þ 2 : ðC13Þ i j i j ∂t 2 ρ Continua, Lang. Var. Change 13, 375 (2001). i;j [3] L. Bloomfield, Language (Holt, Rinehart and Winston, New York, 1933). This is the OJK equation, modified to include variable [4] P. Trudgill, The Dialects of England (Blackwell, Hoboken, population density. As OJK did, we now assume that the NJ, 1999). direction g ˆ is uniformly distributed over the system, and we [5] J. Séguy, La Relation Entre la Distance Spatiale et la replace g ˆ g ˆ with its circular mean δ , giving i j ij Distance Lexicale, Rev. Ling. Romane 35, 335 (1971). [6] J. Séguy, La Dialectomtrie Dans l’Atlas Linguistique de la 2 Gascogne, Rev. Ling. Romane 37, 1 (1973). ∂m ∇ m ∇ρ:∇m ¼ σ β þ : ðC14Þ [7] M. Wieling and J. Nerbonne, Advances in Dialectometry, ∂t 4 ρ Annu. Rev. Ling. 1, 243 (2015). [8] B. Kessler, in Proceedings of the 7th Conference of the This is our modified Ohta-Jasnow-Kawasaki equation. European Chapter of the Association for Computational Linguistics (Morgan Kaufmann Publishers Inc., San Francisco, CA, 1995), pp. 60–67. APPENDIX D: COMPARISON WITH A [9] V. Levenshtein, Binary Codes Capable of Correcting SIMULATION OF NERBONNE Deletions, Insertions and Reversals, Dokl. Akad. Nauk SSSR 163, 845 (1965). We note a link between curve Eq. (19) and a simple [10] R. G. Shackelton, Jr., Phonetic Variation in the Traditional model simulated by Nerbonne [39], but not characterized English Dialects. A Computational Analysis, J. Engl. Ling. analytically. The model consists of a line of discrete spatial 35, 30 (2007). points, with a single reference site at one end representing a [11] S. Embleton, D. Uritescu, and E. S. Wheeler, Defining city. In contrast to our model, all sites initially use an Dialect Regions with Interpretations: Advancing the identical set of N (¼ 100 in Ref. [39]) binary linguistic Multidimensional Scaling Approach, Literary Ling. variables. Evolution of language use is simulated at each Comput. 28, 13 (2013). site by repeatedly selecting a variable at random and then [12] M. Wieling and J. Nerbonne, Bipartite Spectral Graph Partitioning for Clustering Dialect Varieties and changing the state of the variable with probability . For a Detecting Their Linguistic Features, Comput. Speech site at distance r from the city, the number of repeats of this Lang. 25, 700 (2011). randomization process is defined as nðrÞ ≔ ⌊Cr ⌋, where [13] M. Wieling and J. Nerbonne, GABMAP: A Web Application the constant α measures the spatial decline of the “influ- for Dialectology, Dialectologia II, 65 (2011). ence” of the city with r. Larger values of nðrÞ imply a [14] P. Trudgill, Linguistic Change and Diffusion: Description greater level of noisy evolution and therefore a lower and Explanation in Sociolinguistic Dialect Geography, influence of the city. Linguistic distance in this model, Lang. Soc. 3, 215 (1974). after each site has received its nðrÞ updates, is given by [15] W. Wolfram and N. Schilling-Estes, Dialectology and Linguistic Diffusion,in The Handbook of Historical 1 1 Linguistics, edited by B. D. Joseph and R. D. Janda −1 nðrÞ −nðrÞ=N lðrÞ¼ ½1 − ð1 − N Þ  ≈ ½1 − e : ðD1Þ (Blackwell Publishing, Malden, 2003), pp. 713–725. 2 2 [16] W. Labov, Social Dialectology in Honour of Peter Trudgill (John Benjamins Publishing Company, Amsterdam, In Ref. [39], two values of α are tested, α ¼ 1, 2, and the 2003), pp. 9–22. quadratic case is identified as being consistent with [17] G. Bailey, T. Wikle, J. Tillery, and L. Sand, Some Patterns Trudgill’s macroscopic gravity model. However, we of Linguistic Diffusion, Lang. Var. Change 5, 359 (1993). emphasize that the two models do not make predictions [18] J. N. Stanford and L. A. Kenney, Revisiting Transmission that can be directly compared: in the microscopic model and Diffusion: An Agent-Based Model of Vowel Chain [39] no indication is given of how two centers of influence Shifts Across Large Communities, Lang. Var. Change 25, would compete. Of interest is the fact that the α ¼ 2 case 119 (2013). coincides with our prediction for large r. However, this [19] D. Crystal, The Cambridge Encyclopedia of the English value of α is rejected in Ref. [39] on the basis of its Language (Cambridge University Press, Cambridge, sigmoidal shape for small r. Because of the presence of the England, 2003). 031008-25 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) [20] A. J. Bray, Theory of Phase-Ordering Kinetics, Adv. Phys. [39] J. Nerbonne, Measuring the Diffusion of Linguistic 43, 357 (1994). Change, Phil. Trans. R. Soc. B 365, 3821 (2010). [21] K. Barros, P. L. Krapivsky, and S. Redner, Freezing into [40] M. R. Mehl, S. Vazire, N. Ramrez-Esparza, R. B. Slatcher, Stripe States in Two-Dimensional Ferromagnets and and J. W. Pennebaker, Are Women Really More Talkative Crossing Probabilities in Critical Percolation, Phys. Than Men?, Science 317, 82 (2007). [41] J. K. Chambers, Dialect Acquisition, Language 68, 673 Rev. E 80, 040101 (2009). [22] P. L. Krapivsky, S. Redner, and E. Ben-Naim, A Kinetic (1992). View of Statistical Physics (Cambridge University Press, [42] J. Siegel, Second Dialect Acquisition (Cambridge University Cambridge, England, 2010). Press, Cambridge, England, 2010). [23] J. Burridge and S. Kenney, Birdsong Dialect Patterns [43] P. Bloom, How Children Learn the Meanings of Words Explained Using Magnetic Domains, Phys. Rev. E 93, (MIT Press, Cambridge, MA, 2000). [44] G. Sankoff and H. Blondeau, Language Change Across the 062402 (2016). Lifespan: /r/ in Montreal French, Language 83, 560 [24] M. Gerlach and E. G. Altmann, Stochastic Model for the Vocabulary Growth in Natural Languages, Phys. Rev. X (2007). 3, 021006 (2013). [45] L. Averell and A Heathcote, The Form of the Forgetting [25] A. M. Petersen, J. N. Tenenbaum, S. Havlin, H. E. Stanley, Curve and the Fate of Memories, J. Math. Psychol. 55.25 and M. Perc, Languages Cool as They Expand: Allometric (2011). [46] R. A. Blythe, Neutral Evolution: A Null Model for Scaling and the Decreasing Need for New Words, Sci. Language Dynamics, Adv. Complex Syst. 15, 1150015 Rep. 2, 943 (2012). [26] J. C. Willis and G. U. Yule, Some Statistics of Evolution (2012). and Geographical Distribution in Plants and Animals, and [47] J. Mathews and R. L. Walker, Mathematical Methods of Their Signature, Nature (London) 109, 177 (1922). Physics (Addison-Wesley, Reading, MA, 1971). [27] M. E. J. Newman, Power Laws, Pareto Distributions and [48] W. Croft, Explaining Language Change: An Evolutionary Approach (Longman, Harlow, England, 2000). Zipf’s Law, Contemp. Phys. 46, 323 (2005). [49] M. Kimura, The Neutral Theory of Molecular Evolu- [28] M. Gerlach and E. G. Altmann, Scaling Laws and tion (Cambridge University Press, Cambridge, England, Fluctuations in the Statistics of Word Frequencies, New J. Phys. 16, 113010 (2014). 1983). [29] M. Newman, Networks: An Introduction, (Oxford, [50] P. A. P Moran, Random Processes in Genetics, Proc. New York, 2010). Cambridge Philos. Soc. 54, 60 (1958). [30] J. P. Gleeson, K. P. O’Sullivan, R. A. Banos, and Y. [51] S. Asch, Studies of Independence and Conformity: I. A Minority of One Against a Unanimous Majority, Psychol. Moreno, Effects of Network Structure, Competition Monogr. 70, 1 (1956). and Memory Time on Social Spreading Phenomena., [52] R. Bond and P. B. Smith, Culture and Conformity: A Meta- Phys. Rev. X 6, 021019 (2016). [31] R. Albert, H. Jeong, and A. L. Barabasi, Error and Attack Analysis of Studies Using Asch’s (1952b, 1956) Line Tolerance of Complex Networks, Nature (London) 406, Judgment Task, Psychol. Bull. 119, 111 (1996). [53] T. J. H. Morgan, L. E. Rendell, M. Ehn, W. Hoppitt, and 378 (2000). K. N. Laland, The Evolutionary Basis of Human Social [32] M. Perc, Evolution of the Most Common English Words Learning, Proc. R. Soc. B 279, 653 (2012). and Phrases Over the Centuries, J. R. Soc. Interface 9, [54] D. A. Nelson, Song Overproduction, Selective Attrition 3323 (2012). [33] V. V. Loreto, A. Baronchelli, A. Mukherjee, A. Puglisi, and and Song Dialects in the White-Crowned Sparrow, Animal F. Tria, Statistical Physics of Language Dynamics, J. Stat. Behaviour 60, 887 (2000). [55] P. Trudgill, Sociolinguistics (Penguin, London, 2000). Mech. (2011) P04006. [56] W. Labov, The Social Stratification of English in New York [34] G. J. Baxter, R. A. Blythe, W. Croft, and A. J. McKane, City (Cambridge University Press, Cambridge, England, Utterance Selection Model of Language Change, Phys. 2006). Rev. E 73, 046118 (2006). [57] A. S. Kroch, Toward a Theory of Social Dialect Variation, [35] M. R. D’Orsogna and M. Perc, Statistical Physics of Crime: A Review, Phys. Life Rev. 12, 1 (2015). Lang. Soc. 7, 17 (1978). [36] Z. Wanga, C. T. Bauch, Samit Bhattacharyya, A. [58] W. Labov, Principles of Linguistic Change (Blackwell, Malden, MA, 2001). d’Onofrio, P. Manfredi, M. Perc, N. Perra, M. Salathé, [59] W. Labov, The Social Motivation of a Sound Change, and D. Zhaol, Statistical Physics of Vaccination, Phys. Word and Information Processing 19, 273 (1963). Rep. 664, 1 (2016). [60] W. Wolfram and N. Schilling-Estes, in Sociolinguistic [37] C. Castellano, S. Fortunato, and V. Loreto, Statistical Variation: Data, Theory, and Analysis: Selected Papers Physics of Social Dynamics, Rev. Mod. Phys. 81, 591 from NWAV 23 at Stanford (CSLI, Stanford, CA, 1996), (2009). pp. 69–82. [38] See Supplemental Material at http://link.aps.org/ [61] W. Labov, Sociolinguistic Patterns (University of supplemental/10.1103/PhysRevX.7.031008 for animated Pennsylvania Press, Philadelphia, PA, 1972). versions of Fig. 6, Fig. 10, Fig. 18, and a spreadsheet [62] D. MacAulay, The Celtic Languages (Cambridge Univer- implementation of the discretized evolution equation sity Press, Cambridge, England, 2008). derived in Appendix A. 031008-26 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) [63] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. [82] J. Nerbonne and P. Kleiweg, Lexical Distance in Lamsas, Flannery, Numerical Recipes (Cambridge University Press, Comput. Humanit. 37, 339 (2003). Cambridge, England, 2007). [83] S. M. Allen and J. W. Cahn, A Microscopic Theory [64] Office for National Statistics: 2011 Census aggregate data, for Antiphase Boundary Motion and Its Application to UK Data Service, 2016, http://dx.doi.org/10.5257/census/ Antiphase Domain Coarsening, Acta Metall. 27, 1085 aggregate‑2011‑1. (1979). [65] M. Tranter, D. Robineau, and G. Goodman, National [84] T. Ohta, D. Jasnow, and K. Kawasaki, Universal Scaling in Travel Survey: 2014 Report (Department of Transport, the Motions of Random Interfaces, Phys. Rev. Lett. 49, United Kingdom, 2014), https://www.gov.uk/government/ 1223 (1982). statistics/national‑travel‑survey‑2014. [85] Y. Oono and S. Puri, Large Wave Number Features of [66] P.-G. de Gennes, F. Brochard-Wyart, and D. Quere, Form Factors for Phase Transition Kinetics, Mod. Phys. Capillarity and Wetting Phenomena (Springer-Verlag, Lett. B 02, 861 (1988). New York, 2004). [86] S. V. Patankar, Numerical Heat Transfer and Fluid Flow [67] C. Upton, The Importance of Being Janus: Midland (McGraw-Hill, New York, 1980). Speakers and the “North-South Divide”,in Middle and [87] T. Leinonen, An Acoustic Analysis of Vowel Pronunciation Modern English Corpus Linguistics, edited by M. Markus, in Swedish Dialects (Rijksuniversiteit Groningen, 2010). Y. Iyeiri, R. Heuberger, and E. Chamson (John Benjamins [88] David Morin, Introduction to Classical Mechanics Publishing Company, Amsterdam, 2012), pp. 257–268. (Cambridge University Press, Cambridge, England, 2008). [68] W. J. Heeringa, Measuring Dialect Pronunciation [89] NIST Handbook of Mathematical Functions, edited by Differences Using Levenshtein Distance (University of F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and C. W. Groningen, Groningen, 2004). Clark (Cambridge University Press, Cambridge, England, [69] A. K. Jain and R. C. Dubes, Algorithms for Clustering 2010). Data (Prentice-Hall, Englewood Cliffs, NJ, 1988. [90] D. Brockmann, L. Hufnagel, and T. Geisel, The Scaling [70] M. Maechler, P. Rousseeuw, A. Struyf, M. Hubert, and Laws of Human Travel, Nature (London) 439, 462 (2006). K. Hornik, Cluster: Cluster Analysis Basics and Exten- [91] M. C. Gonzlez, C. A. Hidalgo, and L. A. Barabsi, Under- sions. 2016. R package version 2.0.5. For new features, standing Individual Human Mobility Patterns, Nature see the “Changelog” file (in the package source), https:// (London) 453, 779 (2008). cran.r‑project.org/web/packages/cluster. [92] H. Hayakawa, T Ishihara, K. Kawanishi, and T. S. [71] B Everitt, S. Landau, M. Leese, and D. Stahl, Cluster Kobayakawa, Phase-Ordering Kinetics in Nonconserved Analysis (Wiley, Chichester, England, 2011). Scalar Systems with Long-Range Interactions, Phys. Rev. [72] H Kuhn, The Hungarian Method for the Assignment E 48, 4257 (1993). Problem, Naval research logistics quarterly 2, 83 (1955). [93] T. Blanchard, M. Picco, and M. A. Rajabpour, Influence of [73] S. N. Chiu, D. Stoyan, W. Kendall, and J. Mecke, Sto- Long-Range Interactions on the Critical Behavior of the chastic Geometry and Its Applications (Wiley, Chichester, Ising Model, Europhys. Lett. 101, 56003 (2013). England, 2013). [94] B. D. Joseph and R. D. Janda, The Handbook of Historical [74] W. M. Rand, Objective Criteria for the Evaluation of Linguistics (Blackwell, Oxford, 2003). Clustering Methods, J. Am. Stat. Assoc. 66, 846 (1971). [95] C.-M. Pop and E. Frey, Language Change in a Multiple [75] L. Hubert, Comparing Partitions, J. Classif. 2, 193 Group Society, Phys. Rev. E 88, 022814 (2013). (1985). [96] P. Roach, English Phonetics and Phonology (Cambridge [76] W. Heeringa, J. Nerbonne, and P. Kleiweg, Validating University Press, Cambridge, England, 2009). Dialect Comparison Methods,in Classification, Automa- [97] M. Davenport and S. J. Hannahs, Introducing Phonetics tion, and New Media (Springer-Verlag, Berlin, 2002), and Phonology (Routledge, Oxford, 2010). pp. 445–452. [98] Leo Tolstoy, War and Peace (Penguin, Toronto, 1869). [77] P. Sammallahti, The Saami Languages. An Introduction [99] P. M. B. Vitányi, Tolstoy’s Mathematics in War and Peace, (Dawi Girji, Karasjok, Norway, 1998). Math. Intelligence 35, 71 (2013). [78] D. R. Preston, Handbook of Perceptual Dialectology (John [100] J. K. Chambers, The Canada US Border as a Vanishing Benjamins, Amsterdam, 1999). Isogloss: The Evidence of Chesterfield, J. Engl. Ling. 23, [79] M. R. Spruit, W. Heeringa, and J. Nerbonne, Associations 155 (1995). among Linguistic Levels, Lingua 119, 1624 (2009). [101] G. Bailey, T. Wilke, J. Tillery, and L. Sand, The Apparent [80] C. Gooskens, Traveling Time as a Predictor of Linguistic Time Construct, Lang. Var. Change 3, 241 (1991). Distance, Dialectologia Geolinguistica 13, 38 (2005). [102] G. Baxter and W. Croft, Modeling Language Change [81] S. Montemagni, The Space of Tuscan Dialectal Variation. Across the Lifespan: Individual Trajectories in Community A Correlation Study, Int. J. Humanit. Arts Comput. 2, 135 Change, Lang. Var. Change 28, 129 (2016). (2008). 031008-27 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Physical Review X American Physical Society (APS)

Spatial Evolution of Human Dialects

Free
27 pages

Loading next page...
 
/lp/aps_physical/spatial-evolution-of-human-dialects-xDx00UroaG
Publisher
The American Physical Society
Copyright
Copyright © Published by the American Physical Society
eISSN
2160-3308
D.O.I.
10.1103/PhysRevX.7.031008
Publisher site
See Article on Publisher Site

Abstract

Selected for a Viewpoint in Physics PHYSICAL REVIEW X 7, 031008 (2017) James Burridge Department of Mathematics, University of Portsmouth, Portsmouth PO1 3HF, United Kingdom (Received 28 February 2017; revised manuscript received 9 May 2017; published 17 July 2017) The geographical pattern of human dialects is a result of history. Here, we formulate a simple spatial model of language change which shows that the final result of this historical evolution may, to some extent, be predictable. The model shows that the boundaries of language dialect regions are controlled by a length minimizing effect analogous to surface tension, mediated by variations in population density which can induce curvature, and by the shape of coastline or similar borders. The predictability of dialect regions arises because these effects will drive many complex, randomized early states toward one of a smaller number of stable final configurations. The model is able to reproduce observations and predictions of dialectologists. These include dialect continua, isogloss bundling, fanning, the wavelike spread of dialect features from cities, and the impact of human movement on the number of dialects that an area can support. The model also provides an analytical form for Séguy’s curve giving the relationship between geographical and linguistic distance, and a generalization of the curve to account for the presence of a population center. A simple modification allows us to analytically characterize the variation of language use by age in an area undergoing linguistic change. DOI: 10.1103/PhysRevX.7.031008 Subject Areas: Complex Systems, Interdisciplinary Physics, Statistical Physics I. INTRODUCTION without sharp boundaries [1–3]. Whereas an isogloss represents the extent of an individual feature, a recogniz- Over time, human societies develop systems of belief, able dialect is typically a combination of many distinctive languages, technology, and artistic forms that collectively features [1,4]. We can attempt to distinguish dialects by may be called culture. The formation of culture requires superposing many different isoglosses, but often they do individuals to have ideas, and then for others to copy them. not coincide [3], leading to ambiguous conclusions. Historically, most copying has required face-to-face inter- The first steps toward an objective, quantitative analysis action, and because most human beings tend to remain of the shapes of dialect areas were made by Séguy [5,6], localized in geographical regions that are small in com- who examined large aggregates of features, making com- parison to the world, human culture can take quite different parison between lexical distances and geographic separa- forms in different places. One aspect of culture where tions. Central to the quantitative study of dialects, called geographical distribution has been studied in great detail is dialectometry (see Ref. [7] for a recent review), is the dialect [1]. measurement of linguistic distance which, for example, can In order to visualize the spatial extent of dialects, be viewed as the smallest number of insertions, deletions, dialectologists have traditionally drawn isoglosses: lines or substitutions of language features needed to transform enclosing the domain within which a particular linguistic one segment of speech into another [8]. This “Levenshtein feature (a word, a phoneme, or an element of syntax) is distance” was originally devised to measure the difference used. However, it is not usually the case that language use between sequences [9]. Using a metric of this kind, a set of changes abruptly at an isogloss—typically there is a dialect observations can be grouped into clusters according transition zone where a mixture of alternative features is to their linguistic (as opposed to spatial) closeness [10–13]. used [1]. In fact, there is debate about whether the most The clusters then define geographical dialect areas. appropriate way to view the geographical organization of The question we address is why dialect domains have dialects is as a set of distinct areas or as a continuum particular spatial forms, and to give a quantitative answer requires a model. The question has been addressed in the past, famously (amongst dialectologists) by Trudgill and james.burridge@port.ac.uk co-workers [1,14], with his “gravity model.” According to Published by the American Physical Society under the terms of this, the strength of linguistic interaction between two the Creative Commons Attribution 4.0 International license. population centers is proportional to the product of their Further distribution of this work must maintain attribution to populations, divided by the square of the distance between the author(s) and the published article’s title, journal citation, and DOI. them. The influence of a settlement (e.g., a city) i on 2160-3308=17=7(3)=031008(27) 031008-1 Published by the American Physical Society JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) regions of aligned atoms evolve so as to minimize boundary another j is then defined to be the product of interaction strength with the ratio P =ðP þ P Þ, where P and P are length [21,22]. The human agents who interact to form i i j i j dialects behave in roughly the same way (as do some birds the population sizes of settlements i and j. These additive influence scores may then be used to predict the progress of [23]). When people speak and listen to each other, they have a linguistic change that originated in one city, by determin- a tendency to conform to the patterns of speech they hear ing the settlements over which it exerts the greatest net others using, and therefore to “align” their dialects. Since influence. It is then predicted that the change progresses people typically remain geographically localized in their from settlement to settlement in a cascade. Predictions may everyday lives, they tend to align with those nearby. This also be made regarding the combined influence of cities on local copying gives rise to dialects in the same way that neighboring nonurban areas. The model has been partially short-range atomic interactions give rise to domains in successful in predicting observed sequences of linguistic ferromagnets. However, whereas the atoms in a ferromagnet change [1,15–17], and offers some qualitative insight into are regularly spaced, human population density is variable. the most likely positions of isoglosses [14]. In this paper, We show that as a result, stable boundaries between domains we offer an alternative model, also based on population become curved lines. data, which makes use of ideas from statistical mechanics. While our interest is in the spatial distribution of Rather than starting with a postulate about the nature of linguistic forms, there are other properties of language interactions between population centers, we begin with for which parallels with the physical or natural world can be assumptions about the interactions between speakers. From usefully drawn, and corresponding mathematical methods these assumptions about small-scale behavior we derive applied. For example, the rank-frequency distribution of predictions about macroscopic behavior. This approach word use, compiled from millions of books, takes the form has the advantage of making clear the link between of a double power law [24,25], which can be explained [24] individual human interactions and population-level behav- using a novel form of the Yule process [26,27], first ior. Moreover, we are able to unambiguously define the introduced to explain the distribution of the number of dynamics of the model and make precise predictions about species in genera of flowering plants. Historical fluctua- the locations of isoglosses, the nature of transition regions tions in the relative frequency with which words are used between linguistic forms, and the most likely structure of have been shown to decay as a language ages and expands dialect domains. There are links between our approach and [25], analogous with the cooling effect produced by the agent-based models of language change [18], which expansion of a gas. Methods used to understand disorder in directly simulate the behavior of individuals. The difference physical systems (“quenched” averages) have been applied between this approach and ours lies in the fact that for us, to explain how a tendency to focus on topics controls assumptions about individual behavior lead to equations for fluctuations in the combined vocabulary of groups of texts language evolution which are macroscopic in character. [28]. A significant focus of current statistical physics These equations have considerable analytical tractability research has been on the evolution and properties of and offer a simple and intuitive picture of the large-scale networks [29], which have many diverse applications from spatial processes at play. the spread of ideas, fashions, and disease [30] to the In seeking to model the spatial distribution of language vulnerability of the internet [31]. Real networks are often beginning with the individual, we are encouraged by the formed by “preferential attachment” where new connec- fact that dialects are created through a vast number of tions are more often made to already well-connected nodes, complex interactions between millions of people. These leading to a “scale-free” (power-law) distribution of node people are analogous to atoms in the physical context, and degree. The popularity of words has been shown to evolve when very large numbers of particles interact in physical in the same way [32]; words used more in the past tend to systems, simple macroscopic laws often emerge. Despite be used more in the future. Beyond the study of word use the fact that dialects are the product of hundreds of years of and vocabulary, agent-based models such as the naming linguistic and cultural evolution [4], and thus historical game [33], used to investigate the emergence of language, events must have played a role in creating their spatial and the utterance selection model [34], used to model distribution [19], the physical analogy suggests that it may changes in language use over time, have been particularly be possible to formulate approximate statistical laws that influential. We follow the latter model by representing play a powerful role in their spatial evolution. language use using a set of discrete linguistic variables. A physical effect analogous to the formation of dialects is Spatial models motivated by concepts of statistical physics phase ordering [20].Thisoccurs, forexample,inferromag- have also been used to study the spread of crime [35] and to netic materials, where each atom attempts to align itself with devise optimal vaccination strategies to prevent disease neighbors. If the material is two dimensional (a flat sheet), this leads to the formation of a patchwork of domains where [36]. The importance of the emergence of order in social all atoms are aligned with others in the same domain, but not contexts, and connections to statistical physics, may be with those in other domains. The boundaries between these found in a wide-ranging review [37] by Castellano et al. 031008-2 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) II. SUMMARY FOR LINGUISTS implemented using only a spreadsheet (see Supplemental Material [38]), although a computer program would be A. Contents of the paper much faster. Using this, isogloss evolution can be explored The aim of this paper is to adapt the theory of phase in linguistic domains with any shape and population ordering to the study of dialects, and then to use this theory distribution. The simplicity of the scheme invites adapta- to explain aspects of their spatial structure. For those tion to include more linguistic realism (e.g., bias toward a without a particular mathematical or quantitative inclina- linguistic variant). Beyond the exploration of individual tion, the model can be simply explained: We assume that isoglosses, a line of inquiry that may be of interest to people come into linguistic contact predominantly with dialectometrists is to test our predicted forms of Séguy’s those who live within a typical travel radius of their home curve against observations. (around 10–20 km). If they live near a town or city, we assume that they experience more frequent interactions III. MODEL with people from the city than with those living outside it, Our aim is to define a model of speech copying which simply because there are many more city dwellers with incorporates as few assumptions as possible, while whom to interact. We represent dialects using a set of allowing the effect of local linguistic interaction and linguistic variables [1], and we suppose that speakers have movement to be investigated. The model has its roots in a tendency to adapt their speech over time in order to the ideas of the linguist Bloomfield [3], who believed that conform to local conventions of language use. Our model is the speech pattern of an individual constantly evolved deliberately minimal: these are our only assumptions. We through his or her life via pairwise interaction. This discover that, starting from any historical language state, microscopic view of language change led to the prediction these assumptions lead to the formation of spatial domains that the diffusion of linguistic features should follow routes where particular linguistic variants are in common use, as with the greatest density of communication. Bloomfield in Fig. 2. We find that the isoglosses that bound these defined this as the density of conversational links between domains are driven away from population centers, that they speakers accumulated over a given period of time. In our tend to reduce in curvature over time, and that they are most model, the analogy of this link density is an interaction stable when emerging perpendicular to borders of a kernel weighted by spatial variations in population distri- linguistic domain. These theoretical principles of isogloss bution. We implicitly assume that interaction is inherently evolution are explained pictorially in Figs. 3, 4, and 5, and local so that linguistic changes spread via normal contact provide a theoretical explanation for a range of observed [39], rather than via major displacements, conquests, phenomena, such as the dialects of England (Fig. 7), the or dispersion of settled communities. We are, therefore, Rhenish fan (Fig. 10), the wavelike spread of language modeling language in stable settlements, with initial con- features from cities (Figs. 12 and 16), the fact that narrow ditions set by the most recent major population upheaval. regions often have “striped” dialects (Fig. 11), and that We consider a population of speakers, each of whom has coastal indentations including rivers and estuaries often a small home neighborhood, and we introduce a population generate isogloss bundles. Our assumptions also lead to a density ρðx; yÞ giving the spatial variation of the number mathematical expression for the relationship between homes per unit area. In order to incorporate local human linguistic and geographical distance—the Séguy curve— movement within the model, we begin by defining a and a hypothesis regarding the question of when dialects Gaussian interaction kernel for each speaker: should be viewed as a spatial continuum, as opposed to distinct areas (Fig. 19). 2 2 1 Δx þ Δy ϕðΔx;ΔyÞ ≔ exp − : 2 2 2πσ 2σ B. How might a linguist make use of this work? Without using mathematics, but having understood our Note that the symbol ≔ indicates the definition of a new principles of isogloss evolution and considered the exam- quantity. Consider a speaker, Anna, whose home neighbor- ples set out in this paper, further cases may be sought where hood is centered on ðx ;y Þ. In the absence of variation in 0 0 the principles explain observations. If the principles cannot population density, ϕ is the normalized distribution of the explain a particular situation or are violated, one might seek relative positions, ðΔx;ΔyÞ, of the home neighborhoods to understand what was missing from the underlying of speakers with whom Anna regularly interacts. The assumptions, or if they were wrong. Since the assumptions constant σ, the interaction range, is a measure of the typical are so minimal, they cannot be the whole story, and a geographical distance between the neighborhoods of inter- discussion of possible missing pieces is given in Sec. VIII. acting speakers. Now suppose that density is not uniform For the mathematically inclined linguist, Appendix A sets due to the presence of a city or a sparsely populated out an elementary scheme for solving the fundamental mountainous area. In this case, while Anna is going about evolution equation on a computer. This scheme also offers a her daily life she is more likely to hold conversations with simple and intuitive understanding of the model, and can be people whose homes lie in a nearby densely populated 031008-3 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) region because these people constitute a greater proportion An intuitive understanding of this equation may be gained of the local population. To incorporate this density effect, by imagining that each speaker possesses an internal tape we define a normalized weighted interaction kernel for a recorder that records language use as they travel around the home at ðx ;y Þ: vicinity of their home. As time passes, older recordings 0 0 fade in importance to the speaker, and the variable m ϕðx − x ;y − y Þρðx; yÞ measures the historical frequency with which variable i has 0 0 kðx ;y ; x; yÞ ≔ : 0 0 been heard, accounting for the declining importance of 2 ϕðu − x ;v − y Þρðu; vÞdudv 0 0 older recordings. The rate of this decline is determined by Given any region A, the fraction of Anna’s interactions that the parameter τ, which we call memory length, and note that are with people who live in A is kðx ;y ;x;yÞdxdy. changing its value simply rescales the unit of time. We note 0 0 We distinguish between dialects by constructing a set of also that this form of memory may be seen as a determin- linguistic variables whose values vary between dialects. A istic spatial version of the discrete stochastic memory used single variable might, for example, be the pronunciation of in the utterance selection model [34,46]. On the grounds the vowel u in the words “but” and “up” [4]. In England, that speakers collect very large samples of local linguistic northerners use a long form, “boott” and “oopp,” with information, our definition does not contain terms repre- phonetic symbol [℧], and southerners use a short version, senting random sampling error. In going from Eq. (1) to (2) [∧]. Considering a single variable which we suppose has we use the saddle point method [47] to approximate the V> 1 variants, we define f ðx; y; tÞ to be the relative i spatial integral in Eq. (1) and assume that j∇ ρj=ρ is small frequency with which the ith variant of our variable is used compared to σ (that is, population changes approximately by speakers in the neighborhood of ðx; yÞ, at time t.For linearly over the length scale of human interaction). mathematical simplicity, we assume that nearby speakers To allow speakers to base their current speech on what use language in a similar way, so that f ðx; y; tÞ varies i they have heard in the past, we let f ðx; y; tÞ be a function smoothly with position. p of the set of memories ðm ;m ; …;m Þ ≕ m: i 1 2 V People speak on average 16 000 words per day [40] and can take months or years (depending on their age and f ðx; y; tÞ ≔ p ½mðx; y; tÞ: i i background) to adapt their speech to local forms [41,42]. Changing speech habits therefore involves a very large Differentiating Eq. (2) with respect to t, and rescaling the number of word exchanges, at least in the tens of thousands units of time so that one time unit is equal to one memory (comparable in magnitude to typical vocabulary size [43]). length τ, we obtain Although the rate at which individuals adapt their speech is not constant throughout life (it is particularly rapid in the ∂m ðx; y; tÞ young), adaptation has been observed even in late middle ¼ p ½mðx; y; tÞ − m ðx; y; tÞ i i age [44]. To capture the cumulative effect of linguistic ∂t interaction we make use of a forgetting curve, which þ ∇ fρðx; yÞp ½mðx; y; tÞg; ð3Þ measures the relative importance of recent interactions to 2ρðx; yÞ older ones. From a mathematical point of view, the simplest form for this curve is an exponential, and in fact there is which governs the spatial evolution of the ith alternative for some evidence from experiments involving word recall a single linguistic variable. We note that memory length no [45], which suggests that this is an appropriate choice. longer appears as a parameter. An enhanced intuitive However, we emphasize that the curve, for us, is simply a understanding of this evolution equation may be gained way to capture the fact that current speech patterns depend from its discrete counterpart, used to find computational on past interactions and that older interactions tend to be solutions, and derived in Appendix A. less important. With this in mind we make the following The simplest possible choice for p is to let speakers use definition of the memory of a speaker from the neighbor- each variant with the same frequency that they remember it hood of ðx; yÞ, for the ith variant of a variable, being used: p ½mðx; y; tÞ ¼ m ðx; y; tÞ. This produces i i “neutral evolution” [34,46,48–50], where there is no bias m ðx; y; tÞ in the evolution of each variant. Equation (3) then describes Z Z ðs−tÞ=τ e pure diffusion, and variants spread out uniformly over the ≔ kðx; y; u; vÞf ðu; v; sÞdudv ds ð1Þ system. If all linguistic variables evolved in this way we −∞ R would eventually have one spatially homogeneous mixture ðs−tÞ=τ t of grammar, pronunciation, and vocabulary. If our memory model involved a stochastic component [46], then even- −∞ tually we would expect all but one variant of each variable to disappear. Neither of these outcomes reflects the reality × f ðx; y; sÞþ ∇ fρðx; yÞf ðx; y; sÞg ds: ð2Þ i i 2ρðx; yÞ of locally distinctive forms of language. 031008-4 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) We motivate our choice for p based on two observa- tions. The first is that dialects exist. In order for this to be the case, if the ith variant of a linguistic variable has been established amongst a local population for a considerable time so that m ≈ 1, then a small amount of immigration into the region by speakers using a different variable should not normally be sufficient to change it. Mathematically, this is equivalent to the statement that the nondiffusive term, p ½mðx; y; tÞ − m ðx; y; tÞ, in our evolution equation i i [Eq. (3)] must possess a locally stable fixed point at m ¼ 1. The second observation is derived from experiments on social learning, which show that the behavior of individuals is considerably influenced by the majority opinion of those FIG. 1. Dashed line shows the function p ðmÞ ≡ f , defined by with whom they interact [51–53]. In fact, such social 1 1 Eq. (4) for the V ¼ 2 model with conformity number β ¼ 2. conformity is widely observed in the animal kingdom Dotted line shows the neutral version p ðmÞ¼ m (when β ¼ 1) 1 1 and is responsible for the formation of dialects in some for comparison. Solid line shows the function p ðmÞ − m in the 1 1 species of birds [54]. Recent experimental research into case β ¼ 2, giving the time derivative of the memory in the human social learning [53], in which individuals were absence of spatial variation. Note that the dashed line shows that allowed to make a choice, before being exposed to the when speakers’ memories contain a majority of variant 1, they opinions of a group, has revealed that the likelihood of an use this variant in a greater proportion than they recall it being individual switching their decision depends nonlinearly on used. This leads to progressively greater levels of conformity: the proportion of the group who disagree. It is an increasing more speakers using variant 1. function which climbs rapidly when the proportion exceeds 50%, also possessing an inflection for large groups. In the English, for example, language change is often initiated by context of language, such nonlinear conforming behavior the working and lower middle classes [56,57], before would mean that variants that were used more frequently spreading to other groups. Some forms of language change than others should be used with disproportionately large are driven by resistance to conformity; for example frequency in the future. A simple way to capture this “prestige dialects” (received pronunciation in the UK) behavior is to define are used to signify membership of a social elite, set apart from the common people. The desire to set oneself apart ðm Þ from others can also create reversals in language use p ðmÞ ≔ P ; ð4Þ V β ðm Þ j¼1 j amongst subsets of a population. For example, local residents of Martha’s Vineyard [58,59] reverted to an where β ≥ 1 measures the extent of conformity (non- archaic form of pronunciation in order to reaffirm local neutrality). If β ¼ 1, we have the neutral model, and for tradition in the face of invading tourists. A similar effect β > 1, the nondiffusive term in Eq. (3) has a stable fixed was observed on the island of Ocracoke in North Carolina point at m ¼ 1. According to Eq. (4), individuals dispro- [60], but in this case the reversal was temporary. As well as portionately favor the most common variants they have social factors, language use may also be determined by age, heard: they have a tendency to conform to the local majority gender, or ethnicity [61]. It is clear that reality is far more language use with β measuring the strength of this effect. In complex than our simple model, which does not make any the limit β → ∞, all speakers use only the most common of these distinctions between speakers. However, the fact (modal) dialect they have heard. An example of this func- that dialects exist is itself evidence that, in general, people tion is plotted in Fig. 1 for the V ¼ 2 model. Conforming do adapt to local speech patterns. To model every speaker behavior allows local dialects to form, as we show below. as having the same need to conform is therefore a The model we define is a coarse-grained description of reasonable first approximation to reality. It also has the real linguistic interactions which in reality are much more value of simplicity, allowing us later to determine the complex. Much of this complexity arises because there are importance of various additional levels of complexity by often many distinct class, ethnic, or age-defined social comparing how effectively our model fits empirical data networks in any given geographical region. Within each of when compared to more complex models. these subgroups the need to conform leads to similar speech patterns among members, and these patterns often, but not IV. SYNTHETIC DIALECT MAPS always, spread to other groups. Research by linguists has A. Application to Great Britain demonstrated that social factors strongly influence the uptake of particular speech patterns [55] and that language We apply our model to the island of Great Britain (GB), use is correlated with social class and identity. In American whose early inhabitants were known as Britons, and spoke 031008-5 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) Celtic languages [62]. The earliest form of English was back through hundreds of years. Since dialect evolution brought to the island by invading Germanic-speaking equation (3) depends only on relative population densities, settlers. This became Anglo Saxon (or Old English), as the current density distribution therefore serves as reason- written by Alfred, King of Wessex (849–899 A.D.), but able proxy for historical versions. We estimate that σ lies in would not be recognizable to modern speakers. It slowly the range 5 < σ < 15 km based on that fact that the changed, with external influences (notably Norman), into average distance traveled to work in GB in 2011 was the English we know today [19]. 15 km [64], whereas the average distance traveled to We seek to discover the extent to which the spatial secondary school was 5.5 km [65]. In Sec, VII, we find distribution of dialect structures that have emerged in GB that the typical width of a transition region between −1=2 can be predicted by Eq. (3). To model the evolution of linguistic variables is ≈1.8σðβ − 1Þ . For example, the individual linguistic variables we take mainland GB as our transition between northern and southern GB dialects is spatial domain, and numerically solve Eq. (3) on a grid of ≈60 km wide [1], which, if σ ¼ 10 km, gives the approxi- discrete points (Fig. 2) using an explicit Euler scheme [63] mation β ≈ 1.1. (Appendix A). The initial condition for the solution is a randomly generated spatial frequency distribution where 1. Evolution of isoglosses each grid point is assigned a randomly selected variant. By When it comes to interpreting our results, the fact that repeatedly generating initial conditions and solving the usage frequencies are continuously varying through space system, we can determine the most probable equilibrium presents a similar problem to that faced by dialectologists spatial distributions of language use. The population when trying to draw isoglosses. We resolve this by defining density ρðx; yÞ is estimated using 2011 census data [64], domain boundaries to be lines across which the modal which gives the number of inhabitants at each of the ≈1.8 × (most common) variant changes. A domain is therefore a 10 UK postcodes. A smooth density is then obtained from region throughout which a single variant is the most this by allowing the inhabitants to diffuse a short distance commonly used. We may think of domain boundaries as from the geographical center of their postcode. Despite synthetic isoglosses generated by Eq. (3). In Fig. 2,we significant overall population growth, the locations of show a series of snapshots of the evolution of domains major population centers in GB can trace their origins when there are V ¼ 3 variants. Isogloss evolution is driven by a two-dimensional form of surface tension [66]: in the absence of density variation, curved boundaries straighten out. Figure 3 illustrates why this happens faster when curvature is greater. Here, speaker L hears more of variant A, so domain B will retract in this locality. Speaker R hears more of variant B, and so domain A will retract in this region. The net effect will be to straighten the boundary, reducing its length. If a boundary forms a closed curve, then this length reduction effect can cause it to evolve toward a circular shape, and reduce in area, eventually disappearing altogether. However, this shrinking droplet effect can be arrested or reversed if the droplet surrounds a sufficiently dense population center (a city). In fact, population centers typically repel isoglosses in our model, and so have a tendency to create their own domains. An explanation of this effect is given in Fig. 4. Here, we have a FIG. 2. Evolution of the V ¼ 3 model from randomized initial condition with σ ¼ 15 km and β ¼ 1.1 at times t ∈ f1; 2; 4; 8; 16; 32g, where one time unit corresponds to one FIG. 3. The surface tension effect at domain boundaries. Blue memory length. Colors indicate which variant is most common at dots represent speakers and black circles give an approximate each position. Numerical solution implemented in C++ on grid representation of interaction ranges. In the red shaded parts of with 2-km spacing [63] (GB is ≈1000 km north to south). Each these interaction ranges, variant A is more common, and in the grid point initialized with randomly selected variant. yellow shaded parts, variant B is more common. 031008-6 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) FIG. 4. Behavior of an isogloss surrounding a densely popu- lated area. The blue dot represents a speaker on the isogloss FIG. 5. Behavior of isoglosses at an indentation in coastline or between variants A and B. Other speakers are shown as black other boundary (political or naturally occurring). Dashed isogloss dots. The circle around our speaker represents their typical range is unstable and will evolve toward the solid isogloss which of interaction. Within red shaded part of this interaction range, emerges perpendicular from the coast. We assume that both variant A is more common, and in the yellow shaded part, variant isoglosses are effectively anchored to a feature some distance B is more common. Because of variation in population density, away, opposite the boundary shown. Speakers are shown as blue they hear more of variant B (dashed lines indicate interactions) dots, and colors have the same meanings as in Fig. 3. despite the fact that a greater area (red shaded) of their interaction range lies within the domain of variant A. in the Atlantic coast of France: the Gironde estuary [1], region of high population density in which linguistic separating the langue d’oc from the langue d’oil. The fact variant B is dominant, surrounded by a low-population that bundling at such locations is predicted by our model density region where variant A is in common use. We provides the first sign of the predictive power of the surface consider the linguistic neighborhood of a speaker located tension effect. on the isogloss separating the two domains. From Fig. 4 we Having considered the evolution of a single linguistic see that although the majority of the speaker’s interaction variable, we now turn to modeling dialects. A dialect is range lies in region A, she has many more interactions with typically defined by multiple linguistic characteristics, and those in region B, and is therefore likely to adapt her speech we can capture this by combining many solutions to toward variant B, causing the isogloss to shift outward into Eq. (3). In Fig. 6, we superpose the synthetic isoglosses the low-density area. for 20 binary (V ¼ 2) linguistic variables. We see that there The most common form of stable isogloss generated by is a significant degree of bundling where many isoglosses our model is a line, typically with some population density follow similar routes across the system. Given that the induced curvature, connecting two points on the boundary initial conditions for each variable are distinct random of the system. In order to be stable, such lines must emerge frequency distributions (Fig. 2), these bundles represent perpendicular from system boundaries, and as a result they highly probable isogloss positions: many different early are attracted to indentations in coastline, as illustrated in spatial distributions lead to these at later stages of evolu- Fig. 5. In this figure, we consider two speakers located at tion. The key point here is that the final spatial structure of the points where two possible isoglosses meet the coast (or dialect domains is rather insensitive to the early history of other system boundary—a country border or a mountain the language: the effects of surface tension and population range, for example). Speaker R, on the dashed isogloss, density draw many different isoglosses toward the same hears more of variant B because the isogloss is not stable configurations. In this sense the surface tension perpendicular to the coast; it will therefore migrate upward effect is an “invisible hand” which, in the long term, can toward the apex of the coastal indentation until it reaches overpower historical population upheavals. However, we the stable form shown by the solid line. This effect can be emphasize that our model predicts only the spatial structure seen in Fig. 2, where the longest east-west isogloss has of dialects and not their particular sound; this is very much migrated so that it emerges from the largest indentation on determined by quirks of history and the initial state of the the east coast of GB. In reality this indentation, called “the system. Figure 6 also illustrates the effect of human wash,” is the site of an isogloss bundle (the coincidence of mobility on dialect structure. For a smaller interaction several isoglosses) separating northern [℧] from southern range (5 km), the structure of synthetic isogloss bundles is [∧] [67]. A similar bundle occurs at the largest indentation more complex, producing a larger number of distinct 031008-7 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) FIG. 7. Left map: Future England dialect boundaries predicted by Trudgill [4]. Right map: Future dialect boundaries predicted using k-medoids cluster analysis of 20 synthetic binary linguistic variables when σ ¼ 10 km and β ¼ 1.1 at t ¼ 150. Levenshtein distance (or “edit distance”) [9] used as distance metric. Colors, determined by Hungarian method, show mapping between FIG. 6. Superposition of the isoglosses at t ¼ 50 produced by dialect areas. Black dotted line shows north-south isogloss. 20 solutions of the V ¼ 2 model with β ¼ 1.1, each with different randomized initial conditions. For the left-hand map, σ ¼ 5 km, and for the right-hand map, σ ¼ 10 km (see video in Supple- boundaries. These aggregated data are then divided into k mental Material [38]). Background shading indicates population clusters using the k-medoids algorithm [70] (available in density with brightest orange corresponding to 7200 inhabitants the R language). The metric used for linguistic distance per km . between sample points is the Manhattan distance between the binary vectors, where the two variants are labeled 1 or regions. This effect is well documented in studies of the −1. Because we are comparing vectors which can be historical evolution of dialects which were, in the past, transformed into one another purely by substitutions more numerous and covered smaller geographical areas [4]. (1 for −1 or vice versa), rather than insertions or deletions, Within our model, this is explained by the fact that this is equivalent to the Levenshtein distance used in fluctuations in population density become relevant to dialectometry [2,9]. We find that almost identical results isogloss evolution only when they take place over a length are obtained by applying Ward’s hierarchical clustering scale that is comparable to the interaction range: Two algorithm [71] to the sample locations and subsequently human settlements could develop distinct dialects only if cutting the tree into k clusters. they were separated by a distance significantly greater than In order to compare our cluster analysis to the work of σ, otherwise they would be in regular linguistic contact. dialectologists, we consider a prediction for the future dialect areas of England (excluding Wales and Scotland) 2. Cluster analysis made by Trudgill [4], shown in the left-hand map of Figs. 7 and 8. This prediction divides the country into 13 regions, Having analyzed our model using isoglosses, we now and is the result of a systematic analysis of regional make comparison to recent work in dialectometry, where variation in speech and ongoing changes. Such sharp dialect domains have been determined using cluster analysis and by multidimensional scaling [68]. A typical clustering approach [10,12] is to construct a data set giving the frequencies of a wide range of variant pronunciations at different locations, and then to cluster these locations according to the similarity of their aggregated sets of characteristics. Resampling techniques such as bootstrap [69] may be used to generate “fictitious” data sets and improve stability. We mimic this approach by constructing a synthetic data set from 20 solutions of Eq. (3) with V ¼ 2, and each with different random initial conditions, corre- sponding to different linguistic variables. We then ran- domly select a large number (6000) of sample locations within GB and determine the modal variants for each of the FIG. 8. Left map: Future England dialect boundaries predicted 20 variables at each location. This sample size is chosen to by Trudgill [4]. Right map: Voronoi tessellation with the same be sufficiently large so that the effect of resampling is only number of cells as Trudgill’s prediction. Colors determined by to make short length scale (≪1 km) changes to cluster Hungarian algorithm. 031008-8 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) TABLE I. Metrics measuring the similarity between Trudgill’s predictions [4] for future English dialects and the predictions of our model. Metric acronyms are OL (overlap), WOL (weighted overlap), RI (Rand index), and ARI (adjusted Rand index). The Voronoi example column gives equivalent metrics for the example Voronoi tessellation in Fig. 8, and the Voronoi set column gives the mean metrics, with standard deviation, for five randomly generated Vornoi tessellations. Metric Model Voronoi example Voronoi set OL 68% 41% 45  3% WOL 82% 36% 49  11% FIG. 9. Results of k-medoids cluster analysis of 20 synthetic RI 0.91 0.84 0.83  0.01 binary linguistic variables when σ ¼ 10 km and β ¼ 1.1 at ARI 0.63 0.29 0.30  0.01 t ¼ 150. Edit distance [9] used as distance metric. k values from left to right are k ¼ 16, 22, 30. Trudgill’s prediction. The weighted overlap (WOL) divisions are a significant simplification of reality, however, weights overlapping regions in proportion to their popu- and hide many subtle smaller-scale variations. The decision lation density: it gives the probability that a randomly to define 13 regions therefore reflects a judgment on the selected inhabitant will be assigned to the same dialect zone range of language use which can be categorized as a single by both maps. From Table I we see that this probability is dialect. To allow comparison with this prediction, we high (82%) for our model, but lower for the random perform a set of cluster analyses of near-equilibrium Voronoi model. We suggest that this is a result of the fact (large t) solutions for the whole of GB, for a range of that population centers tend to repel isoglosses and, values of the number k of clusters (see Fig. 9), with the therefore, often lie at the centers of dialect domains. We aim of producing 13 within the subset of GB defined by examine this repulsion effect in more detail below. The England. The closest result is 14 clusters for 20 ≤ k ≤ 24, final two metrics are commonly used to compare cluster- with almost identical results within England for each of ings. Consider a set S of elements (spatial locations for us) these choices. Having defined our synthetic dialect regions, that has been partitioned into clusters (dialect areas) in two we apply the Hungarian method [72] to find the mapping different ways. Let us call these two partitions X and Y. The between our synthetic dialects and Trudgill’s predicted Rand index (RI) [74] is defined as the probability that, dialects, which maximizes the total area of overlap between given two randomly selected elements of S, the partitions X the two. The results are shown in Fig. 7. To provide a and Y will agree in their answer to the question: are both measure of the effectiveness of our model in matching elements in the same cluster? A disadvantage with using Trudgill’s predictions, we also define a null model, which this index to compare dialect maps is that the larger the divides the country into regions at random, independent of number of regions in the maps, the more likely it is that two population distribution and without reference to any model randomly selected spatial points will not lie in the same of speaker interaction. There are a number of models that cluster in either map. The index therefore approaches 1 as generate random tessellations of space [73], many of which the number of dialect areas grows. This problem may be are motivated by physical processes such as fracture or countered by taking account of its expected value if X and crack propagation. We exclude such physical assumption Y were picked at random, subject to having the same and thus opt for the Voronoi tessellation [73], based on the number of clusters and cluster sizes as the originals [75]. Poisson point process: the simplest of all random spatial The “adjusted Rand index” (ARI) is then defined as processes. Our null model is then a Voronoi tessellation of England (Fig. 8) using 13 points selected uniformly at RI − expected index ARI ≔ : ð5Þ random from within its borders, with dialects labeled to 1 − expected index most closely match Trudgill’s map, using the Hungarian method. The ARI ∈ ½−1; 1 measures the extent to which a cluster- Having generated our synthetic dialect maps, we now ing is a better match than random to some reference quantify the extent to which they match the predictions of clustering and is used by dialectometrists [76] in preference Trudgill. The null model, because of its lack of modeling to the original Rand index. For us, the reference clustering assumptions, will reveal the extent to which our model is is Trudgill’s predicted dialect map, and the Rand and “better than random” at matching these predictions. We adjusted Rand indices in Table I measure similarity to this offer four alternative metrics of similarity in Table I. The reference. simplest metric is overlap (OL): the percentage of land area The primary conclusion that may be drawn from the which is identified as belonging to the same dialect as indices in Table I is that by all measures our model provides 031008-9 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) a better match than the null model (indices all differ by at least 3 standard deviations, and typically many more). Of particular interest is the weighted overlap probability (WOL ¼ 82%). Isoglosses are typically repelled by pop- ulation centers, so tend to pass through regions of relatively low density. Because of this the WOL may be viewed as a measure of the effectiveness of the model at determining the centers of dialect regions and is less sensitive to small errors in isogloss construction, explaining its high value. It is important to realize also that Trudgill’s predictions may themselves be imperfect. FIG. 10. Evolution of isoglosses in a 400 × 200 system with two We now make some qualitative comments. The dotted opposing boundary indentations and unit background population line in Fig. 7 shows the location of our model’s most dense density, together with a collection of cities contributing additional north-south isogloss bundle. This is coincident with what is 2 2 population densities ρðrÞ¼ðρ − 1Þ exp f−r =ð2R Þg,where r is described by Trudgill as “one of the most important distance from city center, R ¼ 10,and ρ ¼ 4 (measuring ratio of isoglosses in England” [14] dividing those who have [℧] peak city density to background). Parameter values σ ¼ 4, β ¼ 2. in butter from those who do not. In our model, the fact that Evolution times t ¼ 10, 50, 300, 1660. See video in Supplemental this border lies where it does is a result of the surface Material for full animation [38]. tension effect which attracts many isoglosses towards the two coastal indentations at either end (see video in Supplemental Material [38]). The fact that many random- Dusseldorf, Cologne, Koblenz, and Trier. This arrangement ized initial boundary shapes evolve toward this configura- of isoglosses is known as the “Rhenish fan.” In Fig. 10,we tion, and that the configuration is seen as important by construct an artificial system with boundaries approximat- dialectologists, supports the hypothesis that surface tension ing the geographical structure of relevant parts of the is an important driver of spatial language evolution. We Dutch-German language area illustrated in Bloomfield also note that the western extremities of GB (Cornwall and [3] containing an artificial cluster of population centers northwest Scotland) support multiple synthetic dialects in representing the German cities located near the Rhine. The our model, and we suggest that this is due to a heavily system was initialized using the same randomization indented coastline and the fact that high aspect-ratio procedure used for GB, and Fig. 10 shows a superposition tongues of land are likely to be crossed by isoglosses; a of ten solutions, each with different initial conditions. In the fact predictable by analogy with continuum percolation early stages of evolution, very little pattern is discernible, [21]. The southwest peninsula has historically supported but as time progresses, the main indentation collects three dialects. isoglosses, while the cities repel them, producing a fanlike In future work, the model might be tested by comparing structure. We therefore suggest that the isogloss separation its predictions to well-researched dialect areas. On example which created the Rhenish fan may have been the result of is Netherlands. Here, dialectologists have performed a repulsion by the cities of the Rhine. cluster analysis [68] based on Levenshtein distances We next consider an example of what some physicists between field observations of 360 dialect varieties (corre- refer to as “stripe states” [21]: in finite systems that sponding to 357 geographical locations), revealing 13 experience phase ordering, and have aspect ratios greater significant geographical groupings. The extent to which than 1, boundaries between two orderings often form across a model is consistent with these groupings, accounting for the system by the shortest route (in a rectangle, joining two variability caused by finite sample size, could be tested by long sides). A collection of such boundaries forms a striped generating equivalent clusterings for multiple fictitious pattern of phase orderings. Figure 11 illustrates this effect, dialect samples of the same size. produced by Eq. (3). Our model therefore predicts such striped dialect patterns in long thin countries, and a B. Bundles, fans, stripes, and circular waves particularly striking example of the effect may be seen in the dialects of the Saami language [77]. The Saami We now illustrate a number of well-known features of dialect distributions which may be qualitatively reproduced people are indigenous to the Sámpi region (Lapland), by our model. We consider first the isogloss bundle which includes parts of Norway, Sweden, and Finland. reported by Bloomfield [3] separating “High German” Their Arctic homeland forms a curved strip with a length from “Low German.” The bundle emerged from the tip which is approximately 5 times its average width. The of an indentation of the Dutch-German speech area region is divided into ten language areas, and the bounda- (bordered to the east by Slavic languages) and ran roughly ries of all but two of these take a near-direct route between east-west before separating approximately 40 km east of the two long boundaries of the system, forming a distinctive striped pattern. Another example is the dialects of Japan, the river Rhine, and fanning out around cities such as 031008-10 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) FIG. 11. Evolution of isoglosses in a 400 × 200 rectangular system with uniform population density, starting from random- ized initial conditions. We show a superposition of 10 such solutions. Notice that all isoglosses join the two long sides of the system. Parameter values σ ¼ 4, β ¼ 2. Evolution times t ¼ 30, 90, 270, 810. FIG. 12. Evolution of a circular isogloss (initial radius 30) in a whose boundaries in many cases cross the country 200 × 200 system with unit background population density, perpendicular to its spine [78]. together with a central city contributing additional density The relationship between geographical separation and 2 2 ρðrÞ¼ðρ − 1Þ exp f−r =ð2R Þg, where r is distance from city linguistic distance (often measured using Levenshtein dis- center, R ¼ 40, and ρ ¼ 21. Parameter values σ ¼ 4, β ¼ 2. tances [2]) is typically sublinear [2,39]. The definition of Evolution times t ∈ f10; 20; 30; …; 300g. linguistic distance and its relation to geographic distance was made by Séguy [5,6], and the relationship therefore younger speakers, who are more susceptible to new forms goes by the name Séguy’s curve [39]. It has been of speech. We illustrate how this effect can be analyzed in substantially refined and tested since [2,39,79], and also Appendix B. generalized to involve travel time [80]. Séguy’s curve is not universal, however. For example, an analysis of Tuscan V. SÉGUY’S CURVE dialect data [81] reveals an unusually low correlation between phonetic and geographical distances. A more We now determine the relationship between geographi- detailed analysis reveals that there are geographically cal and linguistic distance within our model, providing an remote areas which are linguistically similar, and that analytical prediction for the form of Séguy’s curve [5,6,39]. within an approximately circular region (radius ≈40 km) For simplicity, we consider the two variant model V ¼ 2 around the main city, Florence, phonetic variation corre- and suppose that our language contains a number n of lates more strongly with geographical proximity. It is linguistic variables. At each location in space the local hypothesized [81] that this pattern marks the radial spread dialect is an n-dimensional vector of the local modal of a linguistic innovation (called “Tuscan-gorgia”). These variants which we label 1 and −1. Letting ϕðr;tÞ, where Tuscan data motivate our final example of the qualitative r ¼ðx; yÞ, be the vector field giving the distribution of behavior of our model. To illustrate how a linguistic these variants, the number of differences (the Levenshtein variable can spread outwards from a population center, distance [9]) between two dialects ϕðr ;tÞ ≕ ϕð1Þ and purely through the effects of population distribution and not ϕðr ;tÞ ≕ ϕð2Þ is ½n − ϕð1Þ · ϕð2Þ=2. The linguistic dis- necessarily driven by prestige or other forms of bias, we tance Lð1; 2Þ between two dialects may be defined [82] as simulate our model using an artificial city with Gaussian the fraction of variables that differ between them: population distribution (Fig. 12). The system is initialized with a circular isogloss, centered on the city, representing a 1 ϕð1Þ · ϕð2Þ Lð1; 2Þ ≔ 1 − : ð6Þ local linguistic innovation. Because population density is a 2 n decreasing function of the distance from the city center, Since we assume that each variant evolves independently of speakers on the isogloss hear more of the innovation than every other, the expected linguistic distance is their current speech form, allowing it to expand (as explained in Fig, 4). We see in Sec. VII that this expansion will not not necessarily continue indefinitely. Expansion lð1; 2Þ ≔ E½Lð1; 2Þ ¼ ð1 − E½ϕ ð1Þϕ ð2ÞÞ; ð7Þ i i processes such as this have also been observed in Norway [14]. In that case, the progress of new linguistic forms was where ϕ is the ith component of ϕ. To compute shown to depend on age, with changes more advanced for E½ϕ ð1Þϕ ð2Þ, we make use of the close similarity between i i 031008-11 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) Eq. (3) and the time-dependent Ginzburg-Landau equation that derivatives are limits of sums, and sums of Gaussian [20] to derive (see Appendix C) an analogue of the Allen- random variables are themselves Gaussian). However, the Cahn equation [83] giving the velocity of an isogloss at a values of the field at different spatial locations develop point in terms of the unit vector g normal to it at that point: correlations so that the joint distribution of any pair is bivariate normal. Following Bray [20], we define the ˆ ˆ ∇ · g ∇ρ · g normalized correlator v ¼ −βσ þ ð8Þ 2 ρ E½mðr ;tÞmðr ;tÞ 1 2 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi γðr ; r Þ ≔ : ð14Þ κ ∇ρ · g ˆ 1 2 2 2 E½mðr ;tÞ  E½mðr ;tÞ ¼ −βσ þ : ð9Þ 1 2 2 ρ Using the abbreviated notation γðr ; r Þ ≡ γð1; 2Þ, the 1 2 The quantity κ is the curvature of the isogloss at the point: correlator for the original field may be found by averaging in the absence of variations of population density, the over the bivariate normal distribution of mðr ;tÞ ≕ mð1Þ isogloss moves so as to reduce curvature. The second term and mðr ;tÞ ≕ mð2Þ (see, e.g., Ref. [85]): in the square brackets produces a net migration of iso- glosses towards regions of lower population density. To E½ϕ ð1Þϕ ð2ÞðtÞ¼ E½sgn½mð1Þsgn½mð2Þ ð15Þ i i compute correlation functions between the field ϕ at different locations in space, we apply the Ohta-Jasnow- −1 ¼ sin ðγð1; 2ÞÞ: ð16Þ Kawasaki (OJK) method [84], introducing smoothly vary- ing auxiliary field mðx; y; tÞ, which gives the value of the ith variant as ϕ ¼ sgnðmÞ. Note that the auxiliary field We now compute this correlator and derive a theoretical mðx; y; tÞ is distinct from the memory m ðx; y; tÞ for prediction for Séguy’s curve. the ith variant. The OJK equation, describing the time evolution of this field, adapted to include density effects, is A. Uniform population density (Appendix C) If population density is constant ρ ¼ C, then our adapted 2 OJK equation (10) reduces to OJK’s original form, which ∂m ∇ m ∇ρ:∇m ¼ βσ þ : ð10Þ has the fundamental solution ∂t 4 ρ jr−r j expf− g We introduce the fundamental solution, Gðr;t; r Þ (the 0 βσ t Gðr;t; r Þ¼ ; ð17Þ Green’s function) of Eq. (10), giving the function mðr;tÞ πβσ t subject to the initial condition mðr; 0Þ¼ δðr − r Þ. The solution for arbitrary initial conditions is then giving a normalized correlator mðr;tÞ¼ dr Gðr;t; r Þmðr ; 0Þ: ð11Þ 0 0 0 γð1; 2Þ¼ exp − ≕ γ ðrÞ: ð18Þ 2 2 2tβσ We assume that the initial condition of our system Our prediction for Séguy’s curve at time t is therefore consists of spatially uncorrelated language use, so that E½ϕ ðr ; 0Þϕ ðr ; 0Þ ¼ δ . A convenient, equivalent i 1 i 2 r r 1 2 1 2 −1 lðr; tÞ¼ 1 − sin ðγ ðrÞÞ : ð19Þ condition on the auxiliary field is to let it be Gaussian 2 π (normally) distributed mðr; 0Þ ∼ N ð0; 1Þ with correlator This curve is plotted in Fig. 13 along with simulation E½mðr ; 0Þmðr ; 0Þ ¼ δðr − r Þ: ð12Þ 1 2 1 2 results. We give the following interpretation of the curve. Starting from a randomized spatial distribution of language We can compute this correlator at later times using the use, the need for conformity generates localized regions fundamental solution G: where particular linguistic variables are in common use, and these regions are bounded by isoglosses. These regions E½mðr ;tÞmðr ;tÞ 1 2 Z expand, driven by surface tension in isoglosses, so that 0 0 0 0 0 0 from any given geographical point one would need to travel ¼ Gðr ;t;r ÞGðr ;t;r ÞE½mðr ; 0Þmðr ; 0Þdr dr 1 1 2 2 1 2 1 2 R farther in order to find a change in language use. The linguistic distance between two points therefore tends to ¼ Gðr ;t;r ÞGðr ;t;r Þdr : ð13Þ 1 0 2 0 0 decrease with time, and the curve [Eq. (19)] gives the rate of decrease as exponential. There are features of reality which The linearity of our adapted OJK equation (10) ensures that we might expect to alter this behavior. First, we assume that mðr;tÞ remains Gaussian for all time [20] (to see this, note no major population mixing or migration takes place—such 031008-12 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) The source term is ∇ρ m − ∇ · m ¼ : ð23Þ ρ rR We now view Eq. (21) as describing the mass distribution for a collection of Brownian particles which are being driven radially away from the origin. The source term is interpreted as a field that causes particles to produce −1 offspring at rate ðrRÞ as they move through it. The fundamental solution, Gðr;t; r Þ, to Eq. (21) is then the mass distribution for a very large (approaching infinite) collection of particles with total mass initially equal to one, FIG. 13. Séguy’s curve showing linguistic distance (l) versus all of which started at r . geographical distance (r). Dashed line shows Eq. (19) in the case We wish to compute the dependence of linguistic σ ¼ 4, β ¼ 2. Open and closed dots show simulated linguistic distance on geographical distance from the peak of the distances using same parameter values in a 1000 × 1000 system population density (thought of as the center of a city). We at times t ¼ 80, 160. Note that linguistic distance depends only therefore require the expectation −1=2 on the combination rt , so curves evaluated at different times collapse onto one another. E½mð0;tÞmðr;tÞ ¼ Gð0;t; r ÞGðr;t; r Þdr : ð24Þ 0 0 0 events would have the effect of resetting the initial con- ditions of the model. Our prediction is valid only during Computation of a general closed-form expression for times of stability. Second, we assume that the population is Gðr;t; r Þ is not our aim; preliminary computations in uniformly distributed in the system when in reality pop- this direction suggest that if such a form existed, its ulations are clumped and, as we have seen, population complexity would restrict its use to numerical computations centers can support their own dialects if they are large alone. Instead, we make arguments leading to a simple enough. We take some steps toward addressing this issue approximation for Séguy’s curve. We observe first that the below. In Appendix D, we briefly discuss a simple one- integrand in Eq. (24) is dominated by the region around dimensional simulation model from the dialectology liter- r ¼ 0. Numerical evidence for this is provided in Fig. 14, ature [39], which includes the same large r behavior as in where we see that the fundamental solution grows in Eq. (19) for a particular choice (quadratic) of macroscopic magnitude as r → 0. In general, the solution consists of “influence” curve. a circular plateau propagating outward from the origin plus an isolated but spreading peak also drifting away from the B. Peaked population density origin (Fig. 14). The plateau is formed once the rate of loss −1 We now consider how Séguy’s curve is modified by the of particles from the peak source region (jr j ≲ R ) though presence of a peak in population density. In order to allow analytical tractability, we consider a simple exponentially decaying peak pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 x þ y ρðx; yÞ¼ exp − ; ð20Þ where R> 1. To understand the behavior of the modified OJK equation (10), it is useful to decompose it into an advection diffusion equation plus a source term: ∂m ∇m ∇ρ ∇ρ ¼ βσ ∇ · þ m − ∇ · m : ð21Þ ∂t 4 ρ ρ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi FIG. 14. Radial cross sections (along the line y ¼ 0) through 2 2 Defining r ≔ x þ y , the average velocity field for the numerical approximations to fundamental solutions of the diffusing particle is modified OJK equation (21) with population distribution pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 Eq. (20) at time t ¼ 300. Here, r ¼ x þ y ¼ x. Parameter ∇ρ ðx; yÞ values are βσ ¼ 1 and R ¼ 10. Initial conditions are r ¼ð1; 0Þ; − ¼ : ð22Þ 0 ρ rR ð10; 0Þ; ð20; 0Þ (solid, dashed, dotted lines, respectively). 031008-13 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) advection and diffusion is equal to the rate of creation of new particles. The plateau height is determined by the particle mass which reaches the peak source region in the early stages of evolution. Because of radial drift, the only particles with a chance of doing this are those with sufficiently small Péclet number [86]: 4jr j Pe ¼ ; ð25Þ where r is their starting point (or that of their earliest ancestor if they are daughters). Values of r that lie outside a region of radius ∝ R (henceforth Péclet region) around FIG. 15. Continuous line line shows radial cross section (along the origin can therefore be ignored when computing the line y ¼ 0) through numerical solution of the modified Gð0;t; r Þ.For R ≫ 1, the peak region forms a small OJK equation (21) with population distribution Eq. (20) −4 fraction OðR Þ of the Péclet region, and particles within at time t ¼ 700, with initial condition r ¼ð0; 0Þ. Here, pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 2 the Péclet region have a probability of reaching the peak r ¼ x þ y ¼ x. Parameter values are βσ ¼ 1 and R ¼ 10. which decays exponentially with their initial distance from Data points show asymptotic analytical solution Gðjrj;t;0Þ it. The function Gð0;t; r Þ will therefore itself be sharply [Eq. (30)], with t ¼ 700, time offset t ¼ −4.4, and A ¼ 0 0 0.0109 (found by maximum likelihood). peaked within the Péclet region, around r ¼ 0, and we make the approximation Gð0;t; r Þ ≈ hδðr Þ, where h is 0 0 plateau height. Making use of this approximation in where t is a time correction which accounts for the fact that Eq. (24),wehave the propagation velocity of the plateau takes some time to settle down to its long time value of βσ =R. We verify in E½mð0;tÞmðr;tÞ ≈ hGðr;t; 0Þ: ð26Þ Fig. 15 that this is the correct asymptotic solution by comparing it to the numerical solution of Eq. (29) for To compute the variance large t. We now approximate the normalized correlator as 2 2 Gðr; t;0Þ E½m ðr;tÞ ¼ G ðr;t; r Þdr ; ð27Þ 0 0 2 γðr; tÞ ≈ : ð31Þ Gð0;t;0Þ we note that if jrj ≪ t=R, then the dominant contribution to This approximation neglects the drop in the variance of the integral comes from the plateau component of the mðr; tÞ for r ≫ t=R described by Eq. (28), which amounts solution. If jrj ≫ t=R, then the plateau will not have pffiffi to neglecting a multiplicative factor t in the large r reached r, so only the spreading peak component of the behavior of the correlator. Our approximate analytical fundamental solution will contribute. Therefore, prediction for Séguy’s curve measured radially from the qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi center of the exponentially decaying population distribu- Oð1Þ if jrj ≪ t=R E½m ðr;tÞ ≈ ð28Þ tion is therefore −1=2 Oðt Þ if jrj ≫ t=R: 1 2 Gðr; t;0Þ −1 We comment on the significance of this behavior below. To l ðr; tÞ ≔ 1 − sin : ð32Þ 2 π Gð0;t;0Þ find the form of Gðr;t; 0Þ, we note first its circular symmetry, which reduces the number of variables in the This prediction is compared to correlations in the full OJK equation to two: model (Fig. 16 and Fig. 17) by generating 100 realiza- tions of isogloss evolution over the exponential popula- 2 2 ∂m σ β ∂ m 1 4 ∂m ¼ þ − : ð29Þ tion density, each with different randomized initial ∂t 4 ∂r r R ∂r conditions. From Fig. 16 we see that as time progresses a growing region emerges around the center of the city in We seek a traveling wave solution, subject to the initial which the linguistic distance to the center is close to zero. condition mðr; 0Þ¼ δðrÞ, representing the expanding pla- An alternative visualization of this effect is given in teau, valid for large r, so that the 1=r term in Eq. (29) can be Fig. 18, which shows a superposition of the isoglosses neglected. We obtain, as t → ∞, from 20 simulation runs. As time progresses, a circular patch emerges in the center of the system, which is Rr − βσ ðt − t Þ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi devoid of isoglosses, and therefore where all speakers use Gðr; t;0Þ ∼ A erfc ; ð30Þ Rσ βðt − t Þ the same linguistic variables. Outside of this central “city 031008-14 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) FIG. 16. Dashed lines show theoretical shape of Séguy’s curve FIG. 17. Dashed lines show theoretical shape of Séguy’s curve −r=R −r=R Eq. (32) centered at peak of population density ρ ¼ e with centered at peak of population density ρ ¼ e with R ¼ 20, R ¼ 20. Curve computed using Eq. (32) when β ¼ 1.4, σ ¼ 5, and with time evolution accelerated by factor of 1.25. Curve times are t ¼ 10, 20, 30 with offset t ¼ −5.77 (maximum computed using Eq. (32) when β ¼ 1.1, σ ¼ 5, and times are likelihood estimate). Simulation points give equivalent correla- t þ 1.25t, where t ¼ 10, 20, 30 and t ¼ 4.43. Linear scaling of 0 0 tions in the full model computed from 100 independent simu- time determined by maximum likelihood fit of simulation to lations in a 400 × 400 system. analytical prediction. Simulation points give equivalent correla- tions in the full model computed from 100 independent simu- lations in a 400 × 400 system at times t ¼ 10, 20, 30. dialect,” we note that the asymptotic behavior of the complementary error function, insight into the formation of dialects in population centers and the behavior of Séguy’s curve around cities. expf−x g erfcðxÞ ∼ as x → ∞ ð33Þ We leave the development of a more sophisticated theory for future work. −1 3 together with the expansion sin ðϵÞ¼ ϵ þ Oðϵ Þ lead to the prediction that linguistic correlations fall as −cðΔrÞ e =ðΔrÞ, where c is a constant and Δr is the distance from the edge of the city dialect. This is a faster rate of decay than in the flat population density case. It appears from Fig. 16 that in reality the decay rate may be even faster than this. Further simulations reveal that the velocity with which the city dialect expands shows some systematic deviation from the prediction v ≈ βσ =R of our OJK analysis. For example, in Fig. 17 we reduce the conformity parameter to β ¼ 1.1 and we see that our theoretical predictions are in close agreement with the simulation data, provided we accelerate time by a factor of ≈1.25. The value of β ¼ 1.4 selected in Fig. 13 produces a match between predicted and observed velocity, but for larger values of β the pre- diction is an overestimate. For example, when β ¼ 1.5 with all other parameters identical, the simulated velocity in the full model is smaller than our prediction by a factor of 0.97. One possible explanation for this discrep- ancy is that the interface shape may affect the constant of proportionality in the Allen-Cahn equation (9), for example, if it did not match its constant density equi- FIG. 18. Isogloss evolution in a 400 × 400 system with V ¼ 2, librium form. We also note that OJK’s assumption of −r=R β ¼ 1.1, σ ¼ 5 at t ¼ 10, 15, 25, 35 with ρ ¼ e , where isotropy in unit normals to isoglosses, although preserved R ¼ 20 and r ¼ 0 corresponds to the center of the system. Plot is globally by the circular symmetry of our system, is lost a superposition of 20 simulations with different initial conditions. locally at the edge of the city dialect. Despite these Central peak repels isoglosses. See video in Supplemental shortcomings, the adapted OJK theory allows analytical Material for full animation [38]. 031008-15 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) range will push out linguistic change, and where two VI. DIALECT AREAS AND DIALECT CONTINUA centers both repel, we expect to see bundling where their There is debate amongst dialectologists as to the most zones of influence meet. Each city would then create its appropriate way to view the geographical variation of own well-defined dialect area. Within real cities we also see language use [1,2]. The debate arises because it is rarely subdialects spoken by particular social groups [55],but the case that dialects are perfectly divided into areas. since our model does not account for social affiliations, we Chambers and Trudgill [1] imagine the following example: cannot explicitly model this. we travel from village to village in a particular direction and In Fig. 19, we schematically illustrate examples of these notice linguistic differences (large or small) as we go. effects using three imaginary “island nations.” Nation A These differences accumulate so that eventually the local exhibits distinct dialect areas. The northernmost area is population are using a very different dialect from that of the supported both by a city, which may generate and then village we set out from. Did we cross a border dividing the repel language features, and by two indentations which dialect area of the first village from that of the second, and form a “pinch point” which will tend to collect isoglosses if so, when? Alternatively, is it a mistake to think of dialects via the boundary effect. Several other pinches exist which as organized into distinct areas; should we only think of a also collect isoglosses, creating distinct dialects. The continuum? southernmost city supports an isogloss via repulsion, which We now set out what our model can tell us about these would otherwise migrate south under the combined influ- questions. In one sense, language use in our model is ence of surface tension and the boundary effect, eventually always continuous in space. Although domains emerge disappearing. Nation B also possesses boundary indenta- where one variable is dominant, domain boundaries form tions, but the lack of pinch points reduces bundling: while transition regions in which the variants change continu- the indentations collect isoglosses, the smoother parts of ously (the width of these regions is computed in Sec. VII). the coastline allow isoglosses to attach anywhere, creating a Despite this, the boundary between two sufficiently large continuum of language use. Two city dialects do exist, single-variant domains will appear narrow compared to the however, driven by repulsion. Finally, nation C is a convex size of the domains, and in this sense the domains are well region. This is an example of a system which, without a defined and noticeable by a traveler interested in one linguistic variable. Of more interest are the observations of a traveler who pays attention to the full range of language use. To perceive a dialect boundary, this traveler must see a major change in language use over a short distance. This change must be large in comparison to other, smaller changes perceived earlier. In our model a major language change is created by crossing a large number of isoglosses over a short distance. The question then is, under what circumstances will isoglosses bundle sufficiently strongly for dialect boundaries to be noticeable? To answer this we need to recall the three effects which drive isogloss motion. First, surface tension, which tends to reduce curvature. Second, migration of isoglosses until they emerge perpendicular to a boundary such as the coast, the border of a linguistic region, a sparsely populated zone, or an estuary. Third, repulsion of isoglosses from densely populated areas. There are two major ways in which these effects can induce bundling, both of which require the essential ingredient of time and demographic stability in order for surface tension to take hold. Indented bounda- ries can collect multiple isoglosses, creating a bundle. Examples already noted include the Wash and the Severn in FIG. 19. Schematic diagram of isoglosses (dashed lines) for GB, the Gironde Estuary in France, and the historical three language areas or “island nations.” Yellow and ochre circles indentation in the Dutch-German language area marking represent cities. Nation A supports dialect areas formed by coastal the eastern end of the Rhenish fan. A major boundary boundary shape and repulsion from cities. Nation B largely indentation may not always create a bundle, however: it exhibits a continuum of variation apart from two somewhat may be that other parts of the boundary, or the presence of indistinct city dialects. Nation C has a single city dialect, but cities, creates a fanning effect. Variations in population without this city (or if the city were not sufficiently densely density can also create bundling. Dense population centers populated) it would have no linguistic variation due to its entirely which are large in comparison to the typical interaction convex boundary and evenly distributed population. 031008-16 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) city, could not support more than one dialect, and would Since V is defined by an indefinite integral, we can define 1 0 tend over time to lose isoglosses. Vð Þ ≔ 0.As x → ∞, we require that f ðxÞ → 0 and In some regions there are no dialect areas, only a fðxÞ → 1 or 0, so continuum of variation [87], and in others clear dialects exist [4]. The above examples point to some general 4 ln 2 − 1 E ¼ limVðfÞ¼ limVðfÞ¼ ðβ − 1Þ: principles. In regions with low population density, a lack f→1 f→0 of major boundary indentations, and few large cities, we might expect isoglosses to position themselves in a less The magnitude of the frequency gradient at the origin, predictable way, creating language variation which would 1 where fð0Þ¼ , is, therefore, be perceived as a continuum by a traveler. The regular rffiffiffiffiffiffi creation of new isoglosses (resetting the initial conditions pffiffiffiffiffiffiffiffiffiffiffi 4E β − 1 of the model) through linguistic innovation or demographic jf ð0Þj ¼ ≈ 0.545 : ð36Þ σ σ instability could also disrupt the ordering processes. Narrow geographical regions, or where there are major From this we see that weak non-neutrality and larger boundary indentations, or dense population centers which interaction range produce shallower gradients and therefore push out linguistic change, are particularly susceptible to wider transition regions. As β approaches one, the tran- the formation of distinct dialects. sition region becomes increasingly wide and boundaries disintegrate, destroying the surface tension effect described VII. TRANSITION REGIONS AND CURVATURE in Fig. 3. Equation (36) is verified numerically for an We now derive analytical results that characterize the “almost straight” isogloss in Fig. 20 (red dashed line). transition regions between variables and the effect of We now relate the equilibrium shape of isoglosses to population density on the curvature of dialect domain population density. Starting from our modified Allen-Cahn boundaries. equation (9) for isogloss velocity, and introducing the local To compute the gradients of transition regions, we radius of curvature R ¼ 1=κ, we see that when an isogloss consider a straight isogloss (with constant population is in equilibrium (having zero velocity) density) in equilibrium between variants 1 and 2, given by the line x ¼ 0. Because the isogloss is vertical the R ¼ − ; ð37Þ frequencies will not depend on y, so we write them f ðxÞ 2∇ρ · g and f ðxÞ and note that f ðxÞ¼ 1 − f ðxÞ, so we need only 2 1 2 consider the behavior of f ðxÞ. For notational simplicity we where g ˆ is a unit normal to the isogloss. We note a simple define f ≔ f , m ≔ m and pðmÞ ≔ p ðmÞ. The isogloss 1 1 1 alternative derivation of this result based on the dialect will form the midpoint of a transition region where the fraction F ðx; yÞ of a domain D. We define this as the frequencies change smoothly between one and zero, and where f ð0Þ measures the rate of this transition. In equilibrium, from Eq. (3) we have ∂ pðmÞ¼ m − pðmÞ: 2 00 −1 Since f ¼ pðmÞ,wehave ðσ =2Þf ¼ p ðfÞ − f. Note 00 2 that f is shorthand for ∂ fðxÞ. If non-neutrality (con- −1 formity) is small, we may replace p ðfÞ with its Taylor series about β ¼ 1, neglecting O½ðβ − 1Þ  terms: σ 1 − f 00 2 f ¼ðβ − 1Þfð1 − fÞ ln þ O½ðβ − 1Þð34Þ 2 f FIG. 20. Sequence of radial cross sections of the frequency of a dV ≕− þ O½ðβ − 1Þ : ð35Þ linguistic variable whose initial domain is concentric with a city. df Snapshots taken at times t ¼ 10; 30; 50; … starting from initial radius r ¼ 55. Model parameters σ ¼ 5, β ¼ 1.1. In this example, Here, we define a “potential” function VðfÞ, allowing the city has radius R ¼ 50 and peak density ρ ¼ 3. Vertical line c 1 us to identify Eq. (34) as Newton’s second law for the gives theoretical stable radius R ¼ 125.3 computed from motion of a particle of mass σ =2 in a potential VðfÞ, Eq. (41). Unstable radius [Eq. (40)]is R ¼ 47.2. Red dashed where x plays the role of time, so that the total energy line gives the theoretical frequency gradient in transition region 2 0 2 [Eq. (36)]. E ≔ ðσ =4Þf ðxÞ þ V(fðxÞ) is independent of x [88]. 031008-17 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) fraction of conversations that a speaker at a point ðx; yÞ has we see that our law Eq. (37) accurately predicts the stable with people whose home neighborhoods lie in D: radius produced by our evolution equation (3). VIII. DISCUSSION AND CONCLUSION F ðx; yÞ ≔ kðx; y; u; vÞdudv: ð38Þ A. Summary of results Let P ≔ ðx ;y Þ be a point on an isogloss with local radius Departing from the existing approaches of dialectology, of curvature R ≫ σ, bounding some region D. An intui- we formulate a theory of how interactions between indi- tively appealing condition for language equilibrium is that vidual speakers control how dialect regions evolve. Much the speaker at P should interact with equal numbers of of what we demonstrate is a consequence of the similarity speakers from within and without D: between dialect formation and the physical phenomenon of phase ordering. Humans tend to set down roots, and to conform to local speech patterns. These local patterns may F ðx ;y Þ¼ : ð39Þ be viewed as analogous to local crystal orderings in binary alloys [83] or magnetic domains [22]. As with phase Using the saddle point method [47] to evaluate Eq. (38), ordering, surface tension is a dominant process controlling we have the evolution of dialect regions. However, important differences arise from the fact that human populations 1 σ 1 ∇ρ · g ˆ σ are not evenly distributed through space, and the geo- F ¼ − pffiffiffiffiffiffi þ þ O : 2 2R ρ R graphical regions in which they live have irregular shapes. 2π These two effects cause many different randomized early Result Eq. (37) then follows from the equilibrium condition linguistic conditions to evolve toward a much smaller Eq. (39). From this we see that large gradients in population number of stable final states. For Great Britain we show density can produce equilibrium isoglosses with higher that an ensemble of these final states can produce predicted curvature. To test this against our full evolution equa- dialect areas which are in remarkable agreement with the tion (3), consider a circular city with radius R having a work of dialectologists. Gaussian population distribution set in a unit uniform Since language change is inherently stochastic, at small background: spatial scales we can only expect predictions of a statistical nature. At larger scales an element of deterministic pre- 2 2 2 −ðx þy Þ=2R dictability emerges. Within our model, all stochasticity ρðx; yÞ¼ 1 þðρ − 1Þe : arises from the randomization of initial conditions. The effect of this randomness is strongest in the early stages of The constant ρ ≥ 1 gives the relative density of the city language evolution, when the typical size of dialect center compared to outlying areas. Consider a circular domains is small. The superposition of isoglosses lacks isogloss which is concentric with the city, then Eq. (37) has a discernible pattern. This “tangle” of lines produces a two solutions: continuum of language variation, with spatial correlations sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi given by Séguy’s curve. As surface tension takes hold, 1=4 1 e steered by variations in population density and system R ¼ R − 2W − ; ð40Þ 1 c p 2 4ðρ − 1Þ shape, isoglosses begin to bundle and the continuum resolves into distinct dialect regions. Both long-term sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi population stability and large variations in population 1=4 1 e R ¼ R − 2W − ; ð41Þ density play an important role. Without these ingredients 2 c m 2 4ðρ − 1Þ isoglosses will not have time to evolve into smooth lines, or bundle. where W and W are two branches of Lambert’s W p m The assumptions of our model are minimal, and clearly function [89], WðzÞ, which solves z ¼ we . The two there are many additional complexities involved in lan- solutions R and R are, respectively, decreasing and 1 2 guage change which we have not captured; we discuss increasing functions of ρ . The stability of these solutions 1 below how the model might be extended to account for may be determined by noting that if F > , then D will some of these. Despite this simplicity, in addition to expand, and contract if the inequality is reversed. From this substantially reproducing Trudgill’s predictions for we are able to determine that R is stable, whereas R is not. English dialects [4], we are also able to explain the 2 1 If a dialect domain begins with R< R , then it will shrink observation of both dialect continua and more sharply and disappear under the influence of surface tension, but if bounded dialect domains. We also provide an explanation initially R>R , then the domain will expand or contract for why boundary indentations (e.g., in coastline or in the until R ¼ R . This behavior is illustrated in Fig. 20, where border between different languages) are likely to collect 031008-18 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) inherently imperfect [19], and many interactions are isoglosses [1]. We show that cities repel isoglosses, explaining the origin of the Rhenish fan [43], the wavelike between young speakers who are simultaneously assimi- spread of city language features [15], and the fact that many lating their language. In this sense the young are weakly dialect patterns are centered on large urban areas. We coupled to the current, adult speech community, and their explain why linguistic regions with high aspect ratio tend to language state is subject to fluctuations which may be develop striped dialect regions [77,78]. We compute an sufficient to overcome local conformity for long enough to analytical form for Séguy’s curve [2,5,6,39] which as yet become established. As these speakers age their linguistic has had no theoretical derivation. We also adapt this plasticity declines, older speakers die, and the change is derivation to deal with a population center. We quantify cemented. Other mechanisms include speech modifications how the width of a transition region [1] between dialects is made to demonstrate membership of a social group, or a related to the strength of conformity in individual language bias toward easier or more attractive language features. To use and the typical geographical distances over which understand mathematically the effect of innovation on individuals interact. We show how to relate the curvature of spatial evolution, we might simply allow the creation of stable isoglosses to population gradients. Finally, in new variants, and then assign them a “fitness” relative to the Appendix B we show how incorporating an age distribution existing population. into the model can quantify the “‘apparent time” [58] effect used by dialectologists to detect a linguistic change in 2. Interaction network progress. Given these findings, we suggest that the theo- By selecting a Gaussian interaction kernel, and not retical approach we present would be worth further distinguishing between different social groups, we are investigation. assuming that the social network through which language change is transmitted is only locally connected in a B. Missing pieces geographical sense but within each locality the social The model we present is deliberately minimal: it allows network is fully connected. That is, I will listen without us to see how much of what is observed can be explained by prejudice to anyone regardless of age, sex, status, or local interactions and conformity alone. This leads to a ethnicity, as long as they live near my home. Research simple unified picture with surprising explanatory power. into 21st-century human mobility [90,91] and the work of However, having chosen simplicity, we cannot then claim linguists [55,58,61] indicates that both these assumptions to provide a complete description of the processes at play. are a simplification. Human mobility patterns, and by We now describe the missing pieces, indicating what effect implication interaction kernels, exhibit heavy-tailed behav- we expect them to have, and how to include them. ior (with an exponential cutoff at large distances). In our framework, an interaction kernel of this type, combined 1. Innovation with densely populated cities would allow long-range connectivity between population centers. Long-range An important aspect of language change that is not interactions in phase-ordering phenomena can have sub- explicitly encoded within our model is innovation: the stantial effects on spatial correlations and domain sizes creation of new forms of speech. We instead assume that there are a finite set of possible linguistic variables, and for [92], and may effectively alter the spatial dimension of the each one, a finite set of alternatives, all of which are present system [93]. in varying frequencies within the initial conditions of the Further evidence for nonlocal networks is provided by model. Each alternative is equally attractive so that con- the Frisian language, spoken in the Dutch province of formity alone decides its fate. A new variant cannot Friesland. This has a “town Frisian” dialect [68], spoken in spontaneously emerge within the domain of another. The towns that are separated from each other by the Frisian model therefore evolves toward increasing order and spatial countryside, where a different dialect is spoken: town correlation. Because of the presence of population centers Frisian is “distributed.” Within the social network these this ordering process is eventually arrested creating distinct, towns are “local” (nearby) to each other. To incorporate this stable domains. If innovation were allowed, then ordering effect into our model, we must either reformulate our would be interrupted by the initialization of new features, fundamental equation (3) to describe evolution on a more and Séguy’s curve would reach a steady state where the rate general network or generalize our interaction kernel to of innovation (creating spatial disorder) balanced the rate of allow communication between cities. Further empirical ordering. evidence for nonlocal interactions is provided by hierar- For a local innovation to take hold, it must overcome chical diffusion [94], where linguistic innovations spread local conformity, realized as surface tension and the between population centers, not necessarily passing “shrinking droplet” effect. Several mechanisms might through the countryside in between. Such a process allow this to happen: for example, young speakers must motivated the creation of Trudgill’s gravity model [14]. recreate their language using input from parents, peers, and In addition, mobility data [91] show that individuals other community members. This recreation process is follow regular, repeating trajectories, introducing strong 031008-19 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) heterogeneity within the set of individual interaction As pointed out by Vitány [99], Tolstoy was, in modern kernels. Social, as well as spatial, heterogeneity may also terms, advocating the formulation of a statistical mechanics be important. For example, it has been shown theoretically of history. The work we present is an attempt to formulate [95] that the time required for two social groups to reach such a theory for the spatial history of language. Because of linguistic consensus is highly sensitive to the level of its simplicity, dealing only with copying and movement, affinity that individuals have for their own group. our model may apply more broadly to other forms of culture. 3. Linguistic space and dynamics ACKNOWLEDGMENTS By using a set of linguistic variables we are treating The author was supported by a Leverhulme Trust dialects as points in vector space. Implicit in our dynamics Research Fellowship (RF-2016-177) while carrying out are two assumptions. First, all transitions between variants this work, and is most grateful for this. The referees, from are allowed, with probabilities given only by the frequen- Statistical Physics and Dialectology, provided invaluable cies with which the variables are used. Second, the advice and their time is greatly appreciated. The author evolution of different linguistic variables are mutually would like to particularly acknowledge one quantitative independent. There are cases where this is an incomplete dialectologist reviewer whose comments had a substantial description. A notable example is chain shifting in vowel impact on the work. He is also particularly grateful to sounds [58]. Linguists represents the set of possible vowel Samia Burridge for many ideas and useful advice. sounds as points in a two-dimensional domain where closeness of the tongue to the roof of the mouth and the position of the tongue’s highest point (toward the front or APPENDIX A: DISCRETIZED EVOLUTION back of mouth) form two orthogonal coordinate directions EQUATION [96,97]. The vowel system of a language is then a set of Here, we present the computational scheme used for points in this domain. If one vowel change leaves a gap in solving our evolution equation (3). This is aimed at this system (an empty region of the domain), then other researchers having some familiarity with computer pro- vowel sounds may shift to fill this gap, producing a chain of graming, such as linguists interested in quantitative interconnected linguistic changes. Similarly, a change in approaches. It can also be implemented using only a one vowel to bring it closer to another can push it, and then spreadsheet (see Supplemental Material [38]). The discre- others, out of their positions. A famous example is the tized version of evolution equation (3) also provides a “great vowel shift” [19] in England between the 14th and greater intuitive understanding of its continuous counter- 17th centuries. Another example concerns changes that part. For simplicity, we consider the V ¼ 2 model and spread to progressively more general linguistic (as opposed define f ≔ f , m ≔ m , pðmÞ ≔ p ðmÞ, and note that 1 1 1 to geographical) contexts [81]. If we have several variables, we need only consider the evolution of m and f each signifying the presence of the change in a different because f ¼ 1 − f ¼ 1 − f. 2 1 context, then it is clear that the frequency of one variable We begin by rewriting our evolution equation (3) in can influence the frequency of another, contradicting our terms of the memory and frequency fields assumption that variables are independent. The fact that linguistic variables are sometimes depen- dent upon one another means that, within our model, p , i ∂ m ¼ f − m þ ∇ ðρfÞ; ðA1Þ 2ρ which relates the past use of some variable to the current frequency of its ith variant, via the relationship where f ðx; y; tÞ ≔ p ½mðx; y; tÞ, should sometimes depend on i i the use of other variables, and might be adapted to capture more than just conformity. f ¼ pðmÞ¼ : ðA2Þ β β m þð1 − mÞ C. Conclusion To solve Eq. (A1) on a computer, we discretize space into a We conclude by noting that a major theme of the book rectangular grid of square sites. We let the side of each grid War and Peace by Tolstoy is the idea that history is square define one unit of length. The interaction range used determined not by great individuals but rather by millions in the computer calculation should be expressed in these units. That is, if the side of a grid square is a-km long, and of small choices made by the people. the real-world interaction range is σ km, then the interaction range used in the computer should be σ ≔ σ=a. We choose To elicit the laws of history we must leave aside kings, a so that σ > 1, so speakers interact over distances greater ministers, and generals, and select for study the homo- than one square. The subscript c distinguishes the inter- geneous, infinitesimal elements which influence the masses [98]. action range measured in computer grid units from the 031008-20 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) interaction range in km. For each site we store three boundary sites simply shift attention from empty neighbor- quantities, ρ , f , m , where the subscripts i, j are ing squares to those which are part of the linguistic domain, ij ij ij consistent with the original definition of the weighted horizontal and vertical indices related to spatial position interaction kernel). Second, it ensures that spatially con- by x ¼ ia and y ¼ ja. These quantities are our approx- stant memory and frequency fields constitute a fixed point imations to the values of the fields ρ, f, m at the centers of the dynamics. Rule Eq. (A5) is an explicit scheme and as of sites, and are stored in three arrays. Intuitively, we think such its stability requires that δt be chosen sufficiently of each site as containing a group of a ρ speakers, each of ij small. In the case of zero conformity (β ¼ 1) and constant whom uses variant 1 with frequency f . The linguistic ij population density, the von Neumann stability condition domain of interest is the set of sites with nonzero [63] is δt< 1=σ . This serves as a guide to find δt population density. Within the domain, sites with one or sufficiently small for our scheme to converge. For the more nearest neighbors with zero population are referred to density fields and conformity values we use in this paper, as boundary sites, otherwise they are bulk sites. Sites which we find that δt< 1=ð4σ Þ is more than sufficient. are not part of the domain are never updated, and it is useful We conclude this section by explaining the linguistic to include a border of such sites around the edge of the meaning of the terms on the right-hand side of Eq. (A5). rectangular grid. The first term, f − m , drives conformity. If m > , this ij ij ij The scheme we present is based on approximating the 2 2 2 term is positive, driving the memory further towards m ¼1 ij Laplacian ∇ ¼ ∂ þ ∂ using a central finite difference x y 2 2 where all speakers use variant 1. Otherwise, if m < , the approximation for the derivatives ∂ and ∂ . Let g be an ij x y memory is driven towards zero where no speakers do. The arbitrary function defined at every site. We define a local second term, hρfi =hρi − f , acts to equalize speech use average at each grid point: ij ij ij in the local area. If variant 1 is used more at ði; jÞ than in the 1 surrounding squares, then this term acts to reduce its use in hgi ¼ ðg þ g þ g þ g Þ: ðA3Þ ij iþ1;j i−1;j i;jþ1 i;j−1 ði; jÞ. If variant 1 is used relatively less at ði; jÞ, the term has the opposite effect. This is just the average of the values of g at the four nearest neighbors of ði; jÞ. The Laplacian is then approximated: APPENDIX B: INCORPORATING AGE 2 INTO THE MODEL ∇ g ≈ 4ðhgi − g Þ: ðA4Þ ij ij ij In order to experimentally detect a linguistic change in This follows from the finite difference approximation progress, ideally one would like to survey the same ∂ g ≈ g − 2g þ g , the effectiveness of which population of individuals, or a representative sample of x iþ1;j ij i−1;j depends on g varying slowly between sites. From the population, at two or more points in time [1]. Such longitudinal studies may be practically difficult to carry Eq. (A4) we see that ∇ g measures the extent to which out, so linguists have made use of the assumption that g differs from the average of its neighbors. If ∇ g< 0, then speech patterns are acquired mainly in the early part of g exceeds the local average, and is less than the local people’s lives. The speech of a 50-year old today should average when the inequality is reversed. therefore reflect the speech of a 30-year old 20 years ago. It We now introduce a small discrete time step δt and write should be noted though that speech patterns can change Δm ≔ m ðt þ δtÞ − m ðtÞ for the change in the memory ij ij ij throughout life [44], although more slowly in older speak- field over the time interval ½t; t þ δt. We note also that ers. A linguistic change detected by observing different provided the grid is sufficiently fine, then at bulk sites speech patterns in the young and old is said to have been ρ ≈ hρi , so, making use of Eqs. (A1) and (A4),we have ij ij observed in apparent time [58,59]. A famous example of apparent time is the replacement of the term chesterfield hρfi ij Δm ≈ ðf − m Þþ 2σ − f δt: ðA5Þ (meaning an upholstered multiple-person seat) in Canadian ij ij ij ij hρi ij speech with the fashionable American term couch [100].In this case the use of couch was shown to decrease sigmoi- At each time step Eq. (A5) is used to update the stored dally from ≈85% amongst teenagers to ≈5% among those values of m for all sites in the linguistic domain, after ij in their eighties. The apparent time theory has been tested which the frequencies can also be updated using f ¼ ij by comparing language surveys taken at different times and pðm Þ. The quantity hρfi =hρi is the average of frequen- ij ij ij comparing predictions based on apparent time in the earlier cies at neighboring sites, weighted in proportion to their sample with the observations made in the later one [101]. populations. Using hρi rather than ρ in the denominator We note that differences between speech patterns between ij ij ensures that these weights sum to one. This serves two the generations do not always indicate a linguistic change purposes. First, it avoids the need for an additional in progress [44]. For example, the use of some speech condition at boundary sites (intuitively, speakers in forms may change systematically with age in the same way, 031008-21 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) 1 if x< 0 generation after generation, so that the community as a whole is in a stable state [55]. 1 ηðxÞ ≔ if x ¼ 0 ðB7Þ We now give a simple illustration of how age and 0 if x> 0 apparent time can be incorporated in our model. For simplicity, we consider the progress of a straight isogloss and prepare the system with the initial condition between two variants, driven by a slowly declining pop- ulation density. This density variation is equivalent to a fðx; t < t Þ¼ ηðx − ΛÞ; ðB8Þ social bias toward one variable which we call the new variant. We let fðx; tÞ be the frequency with which the new where Λ is the initial location of the isogloss. Because each (spreading) variable is used at position x and time t. Note speaker listens more to the speakers on their left, the that there is no y dependence due to symmetry. To isogloss will travel right. In the limit β → ∞ then when the distinguish between young and old we introduce an age memory μ ðx; tÞ of a speaker, with x> Λ, reaches , they density distribution αðaÞ giving the fraction of individuals will switch linguistic variables. The motion of the isogloss within the age bracket ½a ;a  as 1 2 will then take the form of a traveling wave formed from a a superposition of traveling step functions, one for each age, αðaÞda: ðB1Þ 1 ∞ fðx; tÞ¼ αðaÞη½x − vt þ ΛðaÞda; ðB9Þ Using this distribution we modify our original model to account for the fact that individuals have been exposed only with the function ΛðaÞ > 0 and the velocity v to be to the linguistic information available in their lifetime. The determined. According to Eq. (B9),at t ≔ ΛðaÞ=v,a memory of a speaker with age a is therefore defined as speaker at the origin with memory of length a will be on the Z Z point of a switching variable, so that −s=τ μ ðx; tÞ ≔ kðx; uÞfðu; t − sÞdu ds: Z Z Z −a=τ a ∞ τð1 − e Þ 0 R 1 ds 0 0 μ ð0;t Þ¼ ¼ da dyαða ÞkðyÞ a a 2 a ðB2Þ 0 0 R × η½y þ vs − ΛðaÞþ Λða Þ Z Z Z Note that as a → ∞ this definition coincides with our a ∞ ΛðaÞ−Λða Þ−vs ds 0 0 original definition Eq. (2) of memory. In the interests of ¼ da αða Þ kðyÞdy 0 0 −∞ analytical tractability, we consider the limit of small Z Z a ∞ ds memory decay rate (τ → ∞) in which case linguistic 0 0 0 ≔ αða ÞK½ΛðaÞ − Λða Þ − vsda : memory is a simple “bus stop” average over life history: 0 0 Z Z  ðB10Þ μ ðx; tÞ¼ kðx; uÞfðx; t − sÞdu ds: ðB3Þ 0 R Here, we introduce the cumulative K of the interaction kernel: We also take the limit of total conformity β → ∞ so that language is chosen according to a simple majority rule. We z KðzÞ ≔ kðyÞdy: ðB11Þ consider an exponentially decaying population density −∞ −ϵx ρðxÞ¼ e ; ðB4Þ As ϵ → 0, the isogloss velocity must also tend to zero. The quantity where ϵ ≪ 1. The weighted interaction kernel for this density is then Δða ;a Þ ≔ Λða Þ − ΛðaÞðB12Þ 1 2 1 2 2 2 1 ðu − x þ ϵσ Þ gives the distance between the step functions for speakers kðx; uÞ¼ pffiffiffiffiffiffi exp − ðB5Þ with ages a and a as jΔða ;a Þj, and this separation must 2σ 1 2 1 2 2πσ also tend to zero as ϵ → 0. We can therefore compute a series expansion for v in powers of ϵ by expanding the ≕ kðu − xÞ: ðB6Þ cumulative interaction kernel KðzÞ in Eq. (B10) about z ¼ 0 and ϵ ¼ 0. To lowest order we have Notice that the effect of the decaying population density is to shift the interaction kernel to the left so that more 1 z ϵσ attention is paid to language use on that side of the listener. pffiffiffiffiffiffi pffiffiffiffiffiffi KðzÞ¼ þ þ þ oðzÞþ oðϵÞ: ðB13Þ 2πσ 2π To compute the isogloss velocity, we define 031008-22 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) Substituting this approximation into Eq. (B10),wehave Δða; a Þ av 0 0 αða Þda − þ ϵσ ¼ 0: ðB14Þ σ 2σ It is straightforward to verify that this equation has the solution 2σ ϵ v ¼ ; ðB15Þ 0 2 ða − a Þσ ϵ Δða; a Þ¼ ; ðB16Þ a¯ FIG. 21. Thick dashed line shows theoretical frequency of new variant for age distribution Eq. (B18)—using linear approxima- tions Eq. (B15) and (B16) for vðaÞ and ΛðaÞ, and assuming where a¯ is the mean age of the population: Λð0Þ¼ 0—when a ¼ 10, A ¼ 90, σ ¼ 5, ϵ ¼ 0.1 at time t ¼ A=2 ¼ 45 (chosen so that oldest speaker at x ¼ 0 has just switched variable). For these parameter values a¯ ¼ 35.23. Ver- a¯ ≔ aαðaÞda: ðB17Þ tical dashed line is drawn at location of youngest adopter of new variable (giving width of transition region). Open circles give If the oldest speaker has age A, then the width of the frequency values using an age distribution discretized into two- year bins, computed by numerically evolving the full model. transition region is ΔðA; 0Þ¼ Aσ ϵ=a¯. We provide a concrete example using a population “pyramid” age dis- 1 h − ΔðA; aÞ tribution, cut off exponentially at low ages to account for E½q ðaÞ ¼ erfc pffiffiffi ≕ q¯ða; h; ωÞ: ðB20Þ the fact that very young speakers listen to, but do not 2 2ω influence, others. Letting a be the low age cutoff, we define An example of this distribution is illustrated in Fig. 22.In this example, the mean sample location is the center of the transition region, and we see that uptake of the new variant −a=a αðaÞ¼ ð1 − e ÞðA − aÞ; ðB18Þ exhibits the characteristic “S-shaped” age distribution seen in studies of linguistic change observed in apparent time where C is a normalizing constant. An example of the [1,58,100]. traveling wave Eq. (B9) generated by this age distribution We conclude by noting that this extension of the model is illustrated in Fig. 21. Also shown are the results of a to include different memory lengths should be seen as a toy numerical implementation of the full model with a dis- model of the effect of age on language change. The fact that cretized version of the age distribution Eq. (B18). This discretization is necessary in order to implement the model 1.0 numerically, because the memory of each age of speaker 0.8 must be individually stored. Finally, we consider the likely outcome of experimen- 0.6 tally sampling the use of language within this model. We let x be the left boundary of the transition region (the oldest 0.4 speaker at x is just about to switch variable). We then define the indicator function of the event that a speaker of 0.2 age a, located at position x, is using the new variant: 0.0 1 if x − x < ΔðA; aÞ 80 q ðaÞ ≔ ðB19Þ 0 otherwise: FIG. 22. Expected frequency Eq. (B20) with which new variant Consider a sample of speakers with home locations X, is used by speakers of different ages from a random sample. Mean normally distributed around some average position x þ h: and variance of speaker locations relative to left boundary x of 1 2 X ∼ N ðx þ h; ω Þ. The probability that a speaker of age a 0 transition region are h ¼ Δð0;AÞ and ω ¼ 1. Model parameter within this sample will use the new variant is then the values are a ¼ 10, A ¼ 90, σ ¼ 5, ϵ ¼ 0.1. For these parameter expectation of q ðaÞ over the position X: values, a ¼ 35.23. 031008-23 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) older speakers tend to take longer to change their language For constant density, the equilibrium configuration of the use is captured purely by the length of their memories. In isogloss is a straight line, so the curvature κ ≔ ∇ · g is zero, reality, the influence on a given speaker of the language and f depends only on the displacement g. Making use of they are exposed to at different stages of life will be much Eqs. (C1) and (C4), we see that the equilibrium equation for more complicated than our simple model [102].For f is in this case example, if language use were determined entirely during 2 2 early life, then the forgetting curve should be peaked during σ d f −1 ¼ p ðfÞ − f: ðC5Þ these early years—in effect, the linguistic memory should 2 dg stop “recording” once a speaker’s youth has ended. Each speaker will respond differently to what they hear, so the We now make the assumption that out of equilibrium, if forgetting curve will not be identical for every speaker. the curvature is low, and the density slowly varying, Such heterogeneity amongst speakers is straightforward to then the profile of the transition region around the isogloss incorporate, but at the cost of tractability. The advantage of takes its equilibrium form. We also recall our assumption the simple approach is to illustrate the apparent time effect in our derivation of the full evolution equation (3) that in an analytically simple way. 2 2 j∇ ρj=ρ ≪ σ . The evolution equation (C1) may then be written as APPENDIX C: ALLEN-CAHN EQUATION AND OHTA-JASNOW-KAWASAKI THEORY σ ∂f ∇ρ · ∇f 0 −1 ∂ f ¼ p ½p ðfÞ ∇ · g ˆ þ 2 ðC6Þ 2 ∂g ρ Here, we derive an analogue of the Allen-Cahn equation t [83] for the velocity of an isogloss, before deriving a ∂f ∇ · g ˆ ∇ρ · g ˆ modified Ohta-Jasnow-Kawasaki equation [84], which 2 0 −1 ¼ σ p ½p ðfÞ þ : ðC7Þ provides a simplified method for understanding the evo- ∂g 2 ρ lution of spatial correlations in the model. In the binary variant case (V ¼ 2), we have that m ¼ 1 − m , so after Making use of relation Eq. (C2), we have 1 2 defining pðm Þ ≔ p ðmÞ and f ≔ f , then f ¼ pðm Þ.In 1 1 1 1 terms of f, our evolution equation (3) may be written as ∂g κ ∇ρ · g ˆ 2 0 −1 ¼ −σ p ½p ðfÞ þ : ðC8Þ ∂t 2 ρ 0 −1 −1 2 ∂ f ¼ p ½p ðfÞ f − p ðfÞþ ∇ ðρfÞ : ðC1Þ 2ρ Since ð∂g=∂tÞ is the isogloss velocity, and at the isogloss we have f ¼ , then Following Refs. [20,83], we introduce a unit vector g, normal to the isogloss at a point P. We let g be the κ ∇ρ · g ˆ 2 displacement from P in the direction of g. At the isogloss v ¼ −σ β þ : ðC9Þ 2 ρ we have ∂f To obtain spatial correlation functions between different ∇f ¼ g; modal linguistic variables, we apply the Ohta-Jasnow- ∂g Kawasaki method [84]. We adapt the description of ∂ f ∂f OJK’s analysis given in Bray [20] to include population ∇ f ¼ þ ∇ · g: ∂g ∂g t t density effects. As we describe in Sec. V, we label the two alternatives for a particular variable as −1 and 1 and We also need the cyclic relation: introduce a smoothly varying auxiliary field mðx; y; tÞ which gives the modal (most common) variant ϕ of ∂t ∂f ∂g ¼ −1: ðC2Þ ˆ variable i as ϕ ðmÞ¼ sgnðmÞ. The unit vector g may then ∂f ∂g ∂t g t f be written as The Laplacian term in Eq. (C1) may be expanded as ∇m follows: g ¼ ; ðC10Þ j∇mj 2 2 ∇ ðρfÞ ∇ ρ ∇ρ · ∇f ¼ ∇ f þ f þ 2 ðC3Þ from which we see that the isogloss velocity is ρ ρ ρ 2 2 −∇ m þ g ˆ g ˆ ∂ ∂ m − 2ð∇ρ:∇mÞ=ρ ∂ f ∂f ∇ ρ ∇ρ · ∇f i;j i j i j ¼ þ ∇ · g þ f þ 2 : v ¼ σ β ; ∂g ∂g ρ ρ 2j∇mj t t ðC4Þ ðC11Þ 031008-24 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) where i; j ∈ fx; yg. In a reference frame attached to the inverse sine function, this behavior is not present in our interface, version of Séguy’s law [Eq. (19)]. We note also that the dynamics of Ref. [39] reduce order in the system, whereas our model leads to increasingly ordered states. dm ∂m ¼ 0 ¼ þ v · ∇m: ðC12Þ dt ∂t Since v∥∇m, then v · ∇m ¼ vj∇mj and ∂ m ¼ −vj∇mj,so [1] J. K. Chambers and P. Trudgill, Dialectology (Cambridge University Press, Cambridge, England, 1998). ∂m σ β ∇ρ:∇m [2] W. Heeringa and J. Nerbonne, Dialect Areas and Dialect ¼ ∇ m − g ˆ g ˆ ∂ ∂ m þ 2 : ðC13Þ i j i j ∂t 2 ρ Continua, Lang. Var. Change 13, 375 (2001). i;j [3] L. Bloomfield, Language (Holt, Rinehart and Winston, New York, 1933). This is the OJK equation, modified to include variable [4] P. Trudgill, The Dialects of England (Blackwell, Hoboken, population density. As OJK did, we now assume that the NJ, 1999). direction g ˆ is uniformly distributed over the system, and we [5] J. Séguy, La Relation Entre la Distance Spatiale et la replace g ˆ g ˆ with its circular mean δ , giving i j ij Distance Lexicale, Rev. Ling. Romane 35, 335 (1971). [6] J. Séguy, La Dialectomtrie Dans l’Atlas Linguistique de la 2 Gascogne, Rev. Ling. Romane 37, 1 (1973). ∂m ∇ m ∇ρ:∇m ¼ σ β þ : ðC14Þ [7] M. Wieling and J. Nerbonne, Advances in Dialectometry, ∂t 4 ρ Annu. Rev. Ling. 1, 243 (2015). [8] B. Kessler, in Proceedings of the 7th Conference of the This is our modified Ohta-Jasnow-Kawasaki equation. European Chapter of the Association for Computational Linguistics (Morgan Kaufmann Publishers Inc., San Francisco, CA, 1995), pp. 60–67. APPENDIX D: COMPARISON WITH A [9] V. Levenshtein, Binary Codes Capable of Correcting SIMULATION OF NERBONNE Deletions, Insertions and Reversals, Dokl. Akad. Nauk SSSR 163, 845 (1965). We note a link between curve Eq. (19) and a simple [10] R. G. Shackelton, Jr., Phonetic Variation in the Traditional model simulated by Nerbonne [39], but not characterized English Dialects. A Computational Analysis, J. Engl. Ling. analytically. The model consists of a line of discrete spatial 35, 30 (2007). points, with a single reference site at one end representing a [11] S. Embleton, D. Uritescu, and E. S. Wheeler, Defining city. In contrast to our model, all sites initially use an Dialect Regions with Interpretations: Advancing the identical set of N (¼ 100 in Ref. [39]) binary linguistic Multidimensional Scaling Approach, Literary Ling. variables. Evolution of language use is simulated at each Comput. 28, 13 (2013). site by repeatedly selecting a variable at random and then [12] M. Wieling and J. Nerbonne, Bipartite Spectral Graph Partitioning for Clustering Dialect Varieties and changing the state of the variable with probability . For a Detecting Their Linguistic Features, Comput. Speech site at distance r from the city, the number of repeats of this Lang. 25, 700 (2011). randomization process is defined as nðrÞ ≔ ⌊Cr ⌋, where [13] M. Wieling and J. Nerbonne, GABMAP: A Web Application the constant α measures the spatial decline of the “influ- for Dialectology, Dialectologia II, 65 (2011). ence” of the city with r. Larger values of nðrÞ imply a [14] P. Trudgill, Linguistic Change and Diffusion: Description greater level of noisy evolution and therefore a lower and Explanation in Sociolinguistic Dialect Geography, influence of the city. Linguistic distance in this model, Lang. Soc. 3, 215 (1974). after each site has received its nðrÞ updates, is given by [15] W. Wolfram and N. Schilling-Estes, Dialectology and Linguistic Diffusion,in The Handbook of Historical 1 1 Linguistics, edited by B. D. Joseph and R. D. Janda −1 nðrÞ −nðrÞ=N lðrÞ¼ ½1 − ð1 − N Þ  ≈ ½1 − e : ðD1Þ (Blackwell Publishing, Malden, 2003), pp. 713–725. 2 2 [16] W. Labov, Social Dialectology in Honour of Peter Trudgill (John Benjamins Publishing Company, Amsterdam, In Ref. [39], two values of α are tested, α ¼ 1, 2, and the 2003), pp. 9–22. quadratic case is identified as being consistent with [17] G. Bailey, T. Wikle, J. Tillery, and L. Sand, Some Patterns Trudgill’s macroscopic gravity model. However, we of Linguistic Diffusion, Lang. Var. Change 5, 359 (1993). emphasize that the two models do not make predictions [18] J. N. Stanford and L. A. Kenney, Revisiting Transmission that can be directly compared: in the microscopic model and Diffusion: An Agent-Based Model of Vowel Chain [39] no indication is given of how two centers of influence Shifts Across Large Communities, Lang. Var. Change 25, would compete. Of interest is the fact that the α ¼ 2 case 119 (2013). coincides with our prediction for large r. However, this [19] D. Crystal, The Cambridge Encyclopedia of the English value of α is rejected in Ref. [39] on the basis of its Language (Cambridge University Press, Cambridge, sigmoidal shape for small r. Because of the presence of the England, 2003). 031008-25 JAMES BURRIDGE PHYS. REV. X 7, 031008 (2017) [20] A. J. Bray, Theory of Phase-Ordering Kinetics, Adv. Phys. [39] J. Nerbonne, Measuring the Diffusion of Linguistic 43, 357 (1994). Change, Phil. Trans. R. Soc. B 365, 3821 (2010). [21] K. Barros, P. L. Krapivsky, and S. Redner, Freezing into [40] M. R. Mehl, S. Vazire, N. Ramrez-Esparza, R. B. Slatcher, Stripe States in Two-Dimensional Ferromagnets and and J. W. Pennebaker, Are Women Really More Talkative Crossing Probabilities in Critical Percolation, Phys. Than Men?, Science 317, 82 (2007). [41] J. K. Chambers, Dialect Acquisition, Language 68, 673 Rev. E 80, 040101 (2009). [22] P. L. Krapivsky, S. Redner, and E. Ben-Naim, A Kinetic (1992). View of Statistical Physics (Cambridge University Press, [42] J. Siegel, Second Dialect Acquisition (Cambridge University Cambridge, England, 2010). Press, Cambridge, England, 2010). [23] J. Burridge and S. Kenney, Birdsong Dialect Patterns [43] P. Bloom, How Children Learn the Meanings of Words Explained Using Magnetic Domains, Phys. Rev. E 93, (MIT Press, Cambridge, MA, 2000). [44] G. Sankoff and H. Blondeau, Language Change Across the 062402 (2016). Lifespan: /r/ in Montreal French, Language 83, 560 [24] M. Gerlach and E. G. Altmann, Stochastic Model for the Vocabulary Growth in Natural Languages, Phys. Rev. X (2007). 3, 021006 (2013). [45] L. Averell and A Heathcote, The Form of the Forgetting [25] A. M. Petersen, J. N. Tenenbaum, S. Havlin, H. E. Stanley, Curve and the Fate of Memories, J. Math. Psychol. 55.25 and M. Perc, Languages Cool as They Expand: Allometric (2011). [46] R. A. Blythe, Neutral Evolution: A Null Model for Scaling and the Decreasing Need for New Words, Sci. Language Dynamics, Adv. Complex Syst. 15, 1150015 Rep. 2, 943 (2012). [26] J. C. Willis and G. U. Yule, Some Statistics of Evolution (2012). and Geographical Distribution in Plants and Animals, and [47] J. Mathews and R. L. Walker, Mathematical Methods of Their Signature, Nature (London) 109, 177 (1922). Physics (Addison-Wesley, Reading, MA, 1971). [27] M. E. J. Newman, Power Laws, Pareto Distributions and [48] W. Croft, Explaining Language Change: An Evolutionary Approach (Longman, Harlow, England, 2000). Zipf’s Law, Contemp. Phys. 46, 323 (2005). [49] M. Kimura, The Neutral Theory of Molecular Evolu- [28] M. Gerlach and E. G. Altmann, Scaling Laws and tion (Cambridge University Press, Cambridge, England, Fluctuations in the Statistics of Word Frequencies, New J. Phys. 16, 113010 (2014). 1983). [29] M. Newman, Networks: An Introduction, (Oxford, [50] P. A. P Moran, Random Processes in Genetics, Proc. New York, 2010). Cambridge Philos. Soc. 54, 60 (1958). [30] J. P. Gleeson, K. P. O’Sullivan, R. A. Banos, and Y. [51] S. Asch, Studies of Independence and Conformity: I. A Minority of One Against a Unanimous Majority, Psychol. Moreno, Effects of Network Structure, Competition Monogr. 70, 1 (1956). and Memory Time on Social Spreading Phenomena., [52] R. Bond and P. B. Smith, Culture and Conformity: A Meta- Phys. Rev. X 6, 021019 (2016). [31] R. Albert, H. Jeong, and A. L. Barabasi, Error and Attack Analysis of Studies Using Asch’s (1952b, 1956) Line Tolerance of Complex Networks, Nature (London) 406, Judgment Task, Psychol. Bull. 119, 111 (1996). [53] T. J. H. Morgan, L. E. Rendell, M. Ehn, W. Hoppitt, and 378 (2000). K. N. Laland, The Evolutionary Basis of Human Social [32] M. Perc, Evolution of the Most Common English Words Learning, Proc. R. Soc. B 279, 653 (2012). and Phrases Over the Centuries, J. R. Soc. Interface 9, [54] D. A. Nelson, Song Overproduction, Selective Attrition 3323 (2012). [33] V. V. Loreto, A. Baronchelli, A. Mukherjee, A. Puglisi, and and Song Dialects in the White-Crowned Sparrow, Animal F. Tria, Statistical Physics of Language Dynamics, J. Stat. Behaviour 60, 887 (2000). [55] P. Trudgill, Sociolinguistics (Penguin, London, 2000). Mech. (2011) P04006. [56] W. Labov, The Social Stratification of English in New York [34] G. J. Baxter, R. A. Blythe, W. Croft, and A. J. McKane, City (Cambridge University Press, Cambridge, England, Utterance Selection Model of Language Change, Phys. 2006). Rev. E 73, 046118 (2006). [57] A. S. Kroch, Toward a Theory of Social Dialect Variation, [35] M. R. D’Orsogna and M. Perc, Statistical Physics of Crime: A Review, Phys. Life Rev. 12, 1 (2015). Lang. Soc. 7, 17 (1978). [36] Z. Wanga, C. T. Bauch, Samit Bhattacharyya, A. [58] W. Labov, Principles of Linguistic Change (Blackwell, Malden, MA, 2001). d’Onofrio, P. Manfredi, M. Perc, N. Perra, M. Salathé, [59] W. Labov, The Social Motivation of a Sound Change, and D. Zhaol, Statistical Physics of Vaccination, Phys. Word and Information Processing 19, 273 (1963). Rep. 664, 1 (2016). [60] W. Wolfram and N. Schilling-Estes, in Sociolinguistic [37] C. Castellano, S. Fortunato, and V. Loreto, Statistical Variation: Data, Theory, and Analysis: Selected Papers Physics of Social Dynamics, Rev. Mod. Phys. 81, 591 from NWAV 23 at Stanford (CSLI, Stanford, CA, 1996), (2009). pp. 69–82. [38] See Supplemental Material at http://link.aps.org/ [61] W. Labov, Sociolinguistic Patterns (University of supplemental/10.1103/PhysRevX.7.031008 for animated Pennsylvania Press, Philadelphia, PA, 1972). versions of Fig. 6, Fig. 10, Fig. 18, and a spreadsheet [62] D. MacAulay, The Celtic Languages (Cambridge Univer- implementation of the discretized evolution equation sity Press, Cambridge, England, 2008). derived in Appendix A. 031008-26 SPATIAL EVOLUTION OF HUMAN DIALECTS PHYS. REV. X 7, 031008 (2017) [63] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. [82] J. Nerbonne and P. Kleiweg, Lexical Distance in Lamsas, Flannery, Numerical Recipes (Cambridge University Press, Comput. Humanit. 37, 339 (2003). Cambridge, England, 2007). [83] S. M. Allen and J. W. Cahn, A Microscopic Theory [64] Office for National Statistics: 2011 Census aggregate data, for Antiphase Boundary Motion and Its Application to UK Data Service, 2016, http://dx.doi.org/10.5257/census/ Antiphase Domain Coarsening, Acta Metall. 27, 1085 aggregate‑2011‑1. (1979). [65] M. Tranter, D. Robineau, and G. Goodman, National [84] T. Ohta, D. Jasnow, and K. Kawasaki, Universal Scaling in Travel Survey: 2014 Report (Department of Transport, the Motions of Random Interfaces, Phys. Rev. Lett. 49, United Kingdom, 2014), https://www.gov.uk/government/ 1223 (1982). statistics/national‑travel‑survey‑2014. [85] Y. Oono and S. Puri, Large Wave Number Features of [66] P.-G. de Gennes, F. Brochard-Wyart, and D. Quere, Form Factors for Phase Transition Kinetics, Mod. Phys. Capillarity and Wetting Phenomena (Springer-Verlag, Lett. B 02, 861 (1988). New York, 2004). [86] S. V. Patankar, Numerical Heat Transfer and Fluid Flow [67] C. Upton, The Importance of Being Janus: Midland (McGraw-Hill, New York, 1980). Speakers and the “North-South Divide”,in Middle and [87] T. Leinonen, An Acoustic Analysis of Vowel Pronunciation Modern English Corpus Linguistics, edited by M. Markus, in Swedish Dialects (Rijksuniversiteit Groningen, 2010). Y. Iyeiri, R. Heuberger, and E. Chamson (John Benjamins [88] David Morin, Introduction to Classical Mechanics Publishing Company, Amsterdam, 2012), pp. 257–268. (Cambridge University Press, Cambridge, England, 2008). [68] W. J. Heeringa, Measuring Dialect Pronunciation [89] NIST Handbook of Mathematical Functions, edited by Differences Using Levenshtein Distance (University of F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and C. W. Groningen, Groningen, 2004). Clark (Cambridge University Press, Cambridge, England, [69] A. K. Jain and R. C. Dubes, Algorithms for Clustering 2010). Data (Prentice-Hall, Englewood Cliffs, NJ, 1988. [90] D. Brockmann, L. Hufnagel, and T. Geisel, The Scaling [70] M. Maechler, P. Rousseeuw, A. Struyf, M. Hubert, and Laws of Human Travel, Nature (London) 439, 462 (2006). K. Hornik, Cluster: Cluster Analysis Basics and Exten- [91] M. C. Gonzlez, C. A. Hidalgo, and L. A. Barabsi, Under- sions. 2016. R package version 2.0.5. For new features, standing Individual Human Mobility Patterns, Nature see the “Changelog” file (in the package source), https:// (London) 453, 779 (2008). cran.r‑project.org/web/packages/cluster. [92] H. Hayakawa, T Ishihara, K. Kawanishi, and T. S. [71] B Everitt, S. Landau, M. Leese, and D. Stahl, Cluster Kobayakawa, Phase-Ordering Kinetics in Nonconserved Analysis (Wiley, Chichester, England, 2011). Scalar Systems with Long-Range Interactions, Phys. Rev. [72] H Kuhn, The Hungarian Method for the Assignment E 48, 4257 (1993). Problem, Naval research logistics quarterly 2, 83 (1955). [93] T. Blanchard, M. Picco, and M. A. Rajabpour, Influence of [73] S. N. Chiu, D. Stoyan, W. Kendall, and J. Mecke, Sto- Long-Range Interactions on the Critical Behavior of the chastic Geometry and Its Applications (Wiley, Chichester, Ising Model, Europhys. Lett. 101, 56003 (2013). England, 2013). [94] B. D. Joseph and R. D. Janda, The Handbook of Historical [74] W. M. Rand, Objective Criteria for the Evaluation of Linguistics (Blackwell, Oxford, 2003). Clustering Methods, J. Am. Stat. Assoc. 66, 846 (1971). [95] C.-M. Pop and E. Frey, Language Change in a Multiple [75] L. Hubert, Comparing Partitions, J. Classif. 2, 193 Group Society, Phys. Rev. E 88, 022814 (2013). (1985). [96] P. Roach, English Phonetics and Phonology (Cambridge [76] W. Heeringa, J. Nerbonne, and P. Kleiweg, Validating University Press, Cambridge, England, 2009). Dialect Comparison Methods,in Classification, Automa- [97] M. Davenport and S. J. Hannahs, Introducing Phonetics tion, and New Media (Springer-Verlag, Berlin, 2002), and Phonology (Routledge, Oxford, 2010). pp. 445–452. [98] Leo Tolstoy, War and Peace (Penguin, Toronto, 1869). [77] P. Sammallahti, The Saami Languages. An Introduction [99] P. M. B. Vitányi, Tolstoy’s Mathematics in War and Peace, (Dawi Girji, Karasjok, Norway, 1998). Math. Intelligence 35, 71 (2013). [78] D. R. Preston, Handbook of Perceptual Dialectology (John [100] J. K. Chambers, The Canada US Border as a Vanishing Benjamins, Amsterdam, 1999). Isogloss: The Evidence of Chesterfield, J. Engl. Ling. 23, [79] M. R. Spruit, W. Heeringa, and J. Nerbonne, Associations 155 (1995). among Linguistic Levels, Lingua 119, 1624 (2009). [101] G. Bailey, T. Wilke, J. Tillery, and L. Sand, The Apparent [80] C. Gooskens, Traveling Time as a Predictor of Linguistic Time Construct, Lang. Var. Change 3, 241 (1991). Distance, Dialectologia Geolinguistica 13, 38 (2005). [102] G. Baxter and W. Croft, Modeling Language Change [81] S. Montemagni, The Space of Tuscan Dialectal Variation. Across the Lifespan: Individual Trajectories in Community A Correlation Study, Int. J. Humanit. Arts Comput. 2, 135 Change, Lang. Var. Change 28, 129 (2016). (2008). 031008-27

Journal

Physical Review XAmerican Physical Society (APS)

Published: Jul 1, 2017

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 12 million articles from more than
10,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Unlimited reading

Read as many articles as you need. Full articles with original layout, charts and figures. Read online, from anywhere.

Stay up to date

Keep up with your field with Personalized Recommendations and Follow Journals to get automatic updates.

Organize your research

It’s easy to organize your research with our built-in tools.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

Monthly Plan

  • Read unlimited articles
  • Personalized recommendations
  • No expiration
  • Print 20 pages per month
  • 20% off on PDF purchases
  • Organize your research
  • Get updates on your journals and topic searches

$49/month

Start Free Trial

14-day Free Trial

Best Deal — 39% off

Annual Plan

  • All the features of the Professional Plan, but for 39% off!
  • Billed annually
  • No expiration
  • For the normal price of 10 articles elsewhere, you get one full year of unlimited access to articles.

$588

$360/year

billed annually
Start Free Trial

14-day Free Trial