Abstract Wildflower seeds are routinely spread along highways and thoroughfares throughout North America as part of federal beautification policy, but the genetic effect of the introduction of these cultivated populations on wild populations of the same species is unknown. Interbreeding may occur between these seeded and wild populations, resulting in several possible outcomes. Here we sample 187 individuals in 12 matched pairs of neighboring wild and seeded populations of the Texas bluebonnet (Lupinus texensis), a species popular in commercially available wildflower seed mixes used by both the Texas Department of Transportation and the public. We use genotyping by sequencing to identify 11741 genome-wide single nucleotide polymorphisms, as well as a smaller number of SNPs from the chloroplast genome, to analyze population structure and genetic diversity within and between the populations. We find a striking lack of population structure both between wild and seeded populations and amongst wild populations. STRUCTURE analyses indicate that all populations are apparently panmictic. This pattern may be explained by extensive swamping of wild populations by seeded germplasm and increased dispersal of semi-domesticated seed across this species’ core native range by humans. We discuss the possible negative and positive ramifications of homogenization on the evolutionary future of this popular wildflower species. beautification, cultivation, Department of Transportation, genetic swamping, ornamental seed spreading, population structure Introduction The spread of wildflower seed for ornamental or beautification purposes along highways and thoroughfares is a common practice across North America. This practice can both introduce species to new habitats, sometimes outside of their native range, and introduce novel genetic material across the native range of a species. The Texas Department of Transportation (TXDoT) spreads approximately 30000 lbs of wildflower seed along state highways each year (TXDoT 2016). The practice of intentionally seeding wildflowers along roads and highways for ornamental purposes by TXDoT (then the Texas Highway Department) began in some counties as early as 1917, and became a state-wide practice by 1932 (Davis et al. 1994, TXDoT 2016). The federal Highway Beautification Act of 1965, strongly supported by First Lady “Lady Bird” Johnson, encouraged maintenance of roadside land, including intentional plantings of wildflowers (Highway Beautification Act of 1965). Today, TXDoT typically obtains seed from commercial wildflower seed farms in Texas and New Mexico (Buddy Hudson, TXDoT, personal communication). The genetic effect of this ornamental seed spreading, generated from cultivated stock populations, on local populations of native wildflowers is unclear. Population structure in wild populations without human assisted dispersal is expected to be shaped by local adaptation, drift due to finite population size, and gene flow (Slatkin 1987). Gene flow between wild populations is thought to occur under several general scenarios: isolation-by-distance (IBD; where drift, especially in small populations, causes populations to become increasingly differentiated with increasing distance), isolation-by-environment (IBE; where gene flow may occur between populations in similar environments more frequently than expected by IBD); “counter gradient gene flow” (where gene flow occurs between dissimilar environments, resulting in the loss of locally adapted genes to swamping); unrestricted (for example due to nearly unlimited dispersal and/or nearly random mating across the distribution); and highly restricted (for example due to habitat fragmentation and/or very limited dispersal) (Sexton et al. 2014 and citations therein). If migration and interbreeding occurs between wild and seeded populations, this may have important impacts on the genetic diversity of the species, or the viability of desirable seeded or wild populations (Ellstrand 2014). One possibility is that propagule pressure from seeded populations may swamp out wild, isolated, or locally adapted populations, decreasing within-species genetic diversity (Sexton et al. 2014; Todesco et al. 2016). For example, in rice, extensive hybridization between the domesticated crop and related wild taxa has resulted in the near extinction of several subspecies due to genetic swamping (reviewed in Ellstrand et al. 1999). Alternatively, gene flow may move in the other direction, with locally adapted populations contributing selectively advantageous alleles that allow the seeded populations to persist and/or thrive in the local habitat (a form of genetic rescue; Whiteley et al. 2015). On the Pacific coast of North America, invasive Spartina alterniflora has a geographic area doubling time of ~5.3 years, while a hybrid lineage between S. alterniflora and native S. foliosa has a doubling time of ~1.6 years, suggesting the native species genetic contribution to have important effects on fitness and population growth (Strong and Ayres 2013). Here we investigate the effect of seed spreading on the genetic diversity of wild populations of the Texas bluebonnet (Lupinus texensis), the state flower and a popular component of wildflower seed mixtures used in the state of Texas and elsewhere. Using a genotyping-by-sequencing (GBS) approach to identify genetic polymorphism in this non-model organism, we assess the degree of within-species genetic diversity and population structure within and between wild populations of L. texensis, and compare this to seeded populations from central Texas. Methods Study Species Lupinus texensis Hook. (Fabaceae), the Texas bluebonnet, is an herbaceous winter annual. The species is diploid, with a moderate genome size (1C = 1193 Mbp; Price et al. 2005), and a mixed-mating system with high outcrossing rates (Helenurm and Schaal 1996). Lupinus texensis is predominantly pollinated by Bombus pennsylvanica and Apis mellifera (Schaal 1980). Both pollen and seed dispersal distances are limited in this species. Seeds are dispersed over distances up to 4 m by the explosive dehiscence of the seed pod, though average seed dispersal distance is 0.58 m (Schaal 1980). The seed coats of lupines are well-camouflaged, and in other lupine species, predated mainly by local rodents, rather than birds (Lupinus arboreus;Maron and Simms 1997). Perhaps because of this limited dispersal, the species occurs in distinct, well separated local populations (Banks and Birky 1985). Schaal (1980) estimated gene flow in an experimental population in this species, calculated from molecular markers, to be up to 6.3 m2 with an effective population (aka “genetic neighborhood”) size of 95.4 plants. However such measurements of dispersal from a single source likely under-estimate gene flow; because the dispersal distribution in this species is leptokurtic (Schaal 1980), the long tail of the dispersal distribution will have gone undetected (Slatkin 1987; Rieseberg and Burke 2001). Early genetic work in this species indicates that L. texensis possesses relatively high nuclear genetic variation and typically low plastid variation. Average nuclear genic heterozygosity (HO) within 10 populations, estimated from direct count of five polymorphic allozyme markers, was 0.3559 (Babbel and Selander 1974). Mean genetic identity, using Nei’s (1972) index of the normalized identity (I) of genes, for these populations was (I) = 0.9569 (Babbel and Selander 1974). RFLP analysis of chloroplast DNA in this species indicates that genetic diversity is lower in the chloroplast than the nuclear genome (Banks and Birky 1985), typical of flowering plants (Gaut et al. 1993). The species is endemic to central Texas, and commonly found in Gould Ecoregion 5: Cross Timbers and Prairies and Ecoregion 7: Edwards Plateau (Figure 1; Turner 1957; Gould et al. 1960). Turner (1957) estimated the range of the species in the 1950s as occurring on calcareous soils in central Texas (stippling, Figure 1). The “bluebonnet” was selected as the state flower in 1901 (Andrews 1993). This was amended in 1971 to specifically include L. texensis, as well as 5 other lupine species native to the state (Andrews 1993). The bluebonnet species predominantly used by TXDoT is in the area sampled in this study (Gould Ecoregions 5 and 7) is L. texensis (Davis et al. 1994; Buddy Hudson, TXDoT, personal communication). Since the mid-1980s, concerted effort has gone into the development of L. texensis as a floricultural crop (Davis et al. 1994). This has resulted in the production of several cultivars through recurrent selection, including “Abbott Pink”, “Barbara Bush”, and “Texas Maroon”/“Alamo Fire” (Parsons and Davis 1993; Parsons et al. 1994; Mackay et al. 2000). Wildseed Farms (wildseedfarms.com) is the largest commercial producer of L. texensis seed, producing between 10000–90000 lbs of seed per year since 1983 (Tom Kramer, Wildseed Farms, personal communication). Approximately 90% of this seed is sold within Texas, through sales to individuals, highway departments, and organizations involved in beautification. Today the species occurs in 43% of the counties in Texas (shading, Figure 1; USDA NCRS 2017). Though Babbel and Selander (1974) note the possible impact of TXDoT activity previous to 1975 on the genetic diversity of this species, they conclude that “It is unlikely that [this species] has been genically [sic] ‘homogenized’ as a result of human activity in transporting plants or seeds.” Population Sampling In April 2013, leaf tissue was sampled from 8 individuals of L. texensis from each of 24 populations within central Texas, the historical range of the species (Figure 1, Supplementary Table S1). These populations consisted of 12 pairs, one wild and one seeded population, within 10 km of each other. Each pair was at least 20 km from any others. Wild populations were first selected from sites managed by or affiliated with the Texas Land Conservancy, Texas Parks and Wildlife, the Nature Conservancy of Texas, The Ladybird Johnson Wildflower Center, or the US Army. These sites are all unimproved and explicitly managed for conservation: to the best knowledge of the land owners and managers, these sites contained populations which are not known to have been intentionally seeded in the past. Seeded populations were then collected from nearby locations on the roadside of State or Interstate highways, based on information provided by the TXDoT office of Maintenance Field Support, Wildflower Program. Rather than strictly minimize distance between members of the population pair, we attempted to maximize confidence in population history; the chosen seeded populations are the most likely to have been seeded or reseeded by TXDoT in the recent past. In several cases we could confirm this at the seeded sites by the presence of green coir mats used by TXDoT to keep seeds in place. Figure 1. View largeDownload slide Sampled population and range map of Lupinus texensis in the state of Texas, United States of America. Neighboring population pairs indicated by points, and population type, wild or seeded, is indicated by point color. Geographic coordinates for all populations are included in Table S1. Historical distribution from the 1950s of L. texensis is indicated by stippling (Turner 1957). Current distribution at the county level is indicated by shading (USDA 2016). Figure 1. View largeDownload slide Sampled population and range map of Lupinus texensis in the state of Texas, United States of America. Neighboring population pairs indicated by points, and population type, wild or seeded, is indicated by point color. Geographic coordinates for all populations are included in Table S1. Historical distribution from the 1950s of L. texensis is indicated by stippling (Turner 1957). Current distribution at the county level is indicated by shading (USDA 2016). GBS Library Preparation Young leaves were sampled and dried on silica gel, then stored at room temperature. DNA was extracted from young leaves using a Qiagen DNeasy 96 Plant Kit (Qiagen, Valencia, CA, USA). DNA quantity was assessed using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). This DNA was used to produce one genotype-by-sequencing library containing 192 individuals. The PstI-MspI GBS library was produced using a modified version of the protocol described in Poland et al. (2012). Briefly, we treated genomic DNA with HF-PstI and MspI at 37 °C for 5 h. We ligated barcoded adaptors and barcoded common Y-shaped adaptors to digested DNA at 22 °C for 3 h. We pooled 192 ligated samples, which were then purified using the Agencourt AMPure XP system and amplified by PCR using KAPPA HiFi Hotstart master mix. Finally, we selected DNA fragments between 400–1000 bp using the Agencourt AMPure XP system for paired-end sequencing on two lanes of Illumina HiSeq 2000 paired-end 100bp platform. Sequencing runs resulted in ~394 million reads passing initial quality control at the sequencing facility, with an average quality score of 36. Chloroplast Haplotype Analysis In order to determine possible matrilineal variation, we filtered the GBS data for chloroplast-specific sequences using a technique based on Huang et al. (2014). We aligned the reads against a reference plastome sequence from Lupinus albus (Genbank NC_026681). Based on the assumption that plastomes would be present at very high frequency in leaf tissue samples, we only considered reads that were present at 1000× depth or higher. This threshold filters out nuclear paralogs of chloroplast sequences, which would confound the analysis. We performed a haplotype analysis using TCS (Clement et al. 2000) to visualize relationships between the sequenced haplotypes. SNP Calling Data from both sequencing lanes were concatenated, and the raw reads de-multiplexed. De novo assembly of a reference catalogue and genotyping was performed with dDocent v2.2.6 (Puritz et al. 2014). dDocent combines several existing software packages into a single pipeline designed specifically for paired-end GBS/RAD data and takes advantage of both forward and reverse reads for SNP discovery. The resulting variant call file (VCF) was filtered using vcftools v0.1.15 (Danecek et al. 2011) to a minimum quality score of 30, a minor allele count of 3, with a minimum genotype depth set to 3 reads, no more than 30% missing data per individual, and retain sites that were successfully genotyped in at least 95% of individuals. Complex variants (multinucleotide polymorphisms and composite insertions and substitutions) were decomposed into SNP and indel representation following Puritz et al. (2014), retaining only one biallelic SNP per locus with a minimum minor allele frequency (MAF) of 0.05. Further filtering steps were performed to remove SNPs likely to be the result of sequencing errors, paralogs, multicopy loci or artifacts of library preparation (Supplementary Methods S1). The final data set used in downstream analyses consisted of 11741 SNPs in 187 individuals, with 7 or 8 individuals from each of 24 populations (Supplementary Table S1). More stringent filtering criteria did not substantively alter the interpretation of results (Supplementary Table S2). Population Structure Using this set of filtered SNPs, we investigated population differentiation in three ways. First, we calculated FST, a measure of population divergence (Weir and Cockerham 1984; Lowe et al. 2004), between wild and seeded populations, among wild, among seeded, and between each pair of neighboring populations using the Weir and Cockerham method via vcftools v.0.1.15 (Weir and Cockerham 1984; Danecek et al. 2011). FST estimates and standard errors are reported. We also calculated overall population genetic statistics, including mean observed heterozygosity and FIS, overall and for each population, as defined in Nei (1987), implemented in the hierfstat package in R (Goudet and Jombart 2016). We estimated effective population size overall and for each population using the heterozygote excess method as implemented in NeEstimator (Do et al. 2014). We additionally used principal components analyses implemented in SNPrelate package in R (Zheng et al. 2012). PCA is widely used in population structure analysis because it is computationally efficient in handling large numbers of SNP markers (Price et al. 2006; Bryc et al. 2010; Zheng et al. 2012). In addition, 95% confidence ellipses describing the cluster for each population type using the bivariate t-distribution are presented. To detect admixture and population structure amongst bluebonnet populations, we used the Bayesian clustering algorithm implemented in STRUCTURE v.2.3.4 (Pritchard et al. 2000) and following best practices recommended in Gilbert et al. (2012) and Janes et al. (2017). We used an admixture model with correlated allele frequency and location prior information identifying seeded or wild status. Additional admixture models run without the location prior information were extremely similar (not shown). We ran three independent iterations each with K = 1–24 with a burn-in of 1000 steps, and confirmed with 10 additional iterations of K = 1–5 with a burn-in of 100000 steps. The optimal K (from K = 1 to K = 24) was found using CLUMPAK (Kopelman et al. 2015) using both the Evanno method (Evanno et al. 2005) and the Pritchard method (the k for which Pr(K=k) is highest, Pritchard et al. 2000). Results Chloroplast Haplotype Analysis The aligned dataset totaled 1199 characters, six of which were variable. There were 19 unique haplotypes present in the 192 samples. Six haplotypes represented the bulk of the individuals, 169 of the 192. These six haplotypes were approximately evenly split between seeded and wild populations, but there was no evident geographic signal in the distribution of the haplotypes between the populations (Figure 2). Figure 2. View largeDownload slide Chloroplast haplotype analysis. Haplotype network of chloroplast sequences assembled from genotyping by sequencing reads. Each labeled circle represents a unique haplotype, sized in proportion to the number of individuals with that haplotype. The circles are also shaded according to the number of seeded or wild populations that contained that haplotype. Lines represent possible inferred nucleotide changes. Small circles represent unsampled inferred haplotypes. Figure 2. View largeDownload slide Chloroplast haplotype analysis. Haplotype network of chloroplast sequences assembled from genotyping by sequencing reads. Each labeled circle represents a unique haplotype, sized in proportion to the number of individuals with that haplotype. The circles are also shaded according to the number of seeded or wild populations that contained that haplotype. Lines represent possible inferred nucleotide changes. Small circles represent unsampled inferred haplotypes. Population Structure FST values among the sampled bluebonnet populations were typically low, ranging between 0.0007 and 0.027 depending on the comparison (Table 1). Population differentiation was lowest in the wild vs. seeded population comparison, indicating near total panmixis between these 2 groups. FST was estimated at 0.018 among all wild populations, and 0.013 among all seeded populations, and 0.015 overall. Among the pairs of wild and seeded neighboring populations, the highest FST was estimated between populations 1007 and 1008 at 0.027; these populations are approximately 4.3 km apart. Mean observed heterozygosity for all individuals was estimated at 0.270 and overall FIS was −0.047. Population level FIS for most populations (19 out of 24; Table 2) was negative, indicating individuals in these populations are less related than expected under random mating. Only one population had a positive FIS, indicating individuals in this population are more related than expected under random mating (1104; Table 1). Effective population size (Ne) was estimated as “infinite” over all samples (Table 1), and ranged between 6.9 and “infinite” for each population (Table 2). The smallest effective population size was estimated for population 1006, the easternmost seeded population. Table 1. Comparison population genetic statistics for 24 populations of Lupinus texensis collected from Texas, United States of America Comparison Statistic Estimate (± SE) Overall Observed heterozygosity HO 0.270 Population differentiation FST 0.015 (± 0.0003) Inbreeding coefficient FIS −0.047 Gene diversity Ht 0.262 Within population gene diversity HS 0.258 Amount of gene diversity among samples DST 0.004 Effective population size Ne Infinite Wild vs. seeded populations FST 0.0007 (±0.0001) Among wild populations FST 0.018 (± 0.0004) Among seeded populations FST 0.013 (±0.0004) Between neighboring populations FST 1002 vs 1001 0.004 (± 0.0007) 1005 vs 1006 0.015 (± 0.0008) 1007 vs 1008 0.027 (± 0.0009) 1101 vs 1102 0.023 (± 0.0008) 1103 vs 1104 0.017 (± 0.0008) 1105 vs 1107 0.010 (± 0.0008) 1201 vs 1202 0.001 (± 0.0007) 1203 vs 1204 0.006 (± 0.0009) 1205 vs 1301 0.008 (± 0.0008) 1303 vs 1304 0.020 (± 0.0009) 1601 vs 1602 0.004 (± 0.0007) 1603 vs 1604 0.020 (± 0.0009) Comparison Statistic Estimate (± SE) Overall Observed heterozygosity HO 0.270 Population differentiation FST 0.015 (± 0.0003) Inbreeding coefficient FIS −0.047 Gene diversity Ht 0.262 Within population gene diversity HS 0.258 Amount of gene diversity among samples DST 0.004 Effective population size Ne Infinite Wild vs. seeded populations FST 0.0007 (±0.0001) Among wild populations FST 0.018 (± 0.0004) Among seeded populations FST 0.013 (±0.0004) Between neighboring populations FST 1002 vs 1001 0.004 (± 0.0007) 1005 vs 1006 0.015 (± 0.0008) 1007 vs 1008 0.027 (± 0.0009) 1101 vs 1102 0.023 (± 0.0008) 1103 vs 1104 0.017 (± 0.0008) 1105 vs 1107 0.010 (± 0.0008) 1201 vs 1202 0.001 (± 0.0007) 1203 vs 1204 0.006 (± 0.0009) 1205 vs 1301 0.008 (± 0.0008) 1303 vs 1304 0.020 (± 0.0009) 1601 vs 1602 0.004 (± 0.0007) 1603 vs 1604 0.020 (± 0.0009) In neighboring population comparisons, populations are listed in the order of wild vs seeded. See text for estimation methods. View Large Table 2. Population specific statistics for 24 populations of Lupinus texensis collected from Texas, United States of America Population ID FIS 95% CI Ne estimate (95% CI) 1001 −0.085, −0.067 15.6 (12.9–19.7) 1002 −0.014, 0.008 Infinite (142.8–Infinite) 1005 −0.052, −0.029 134.0 (44.5–Infinite) 1006 −0.040, −0.017 6.9 (6.4–7.5) 1007 −0.016, 0.002 9.4 (8.3–10.7) 1008 −0.015, 0.009 11.4 (9.9–13.5) 1101 −0.027, −0.006 10.0 (8.8–11.5) 1102 −0.014, 0.010 49.2 (28.5–189.1) 1103 −0.034, −0.015 73.1 (34.5–Infinite) 1104 0.005, 0.026 Infinite 1105 −0.071, −0.049 52.7 (29.3–279.6) 1107 −0.039, −0.017 Infinite (140.2–Infinite) 1201 −0.006, 0.017 Infinite 1202 −0.057, −0.034 Infinite 1203 −0.098, −0.077 Infinite 1204 −0.046, −0.023 Infinite 1205 −0.113, −0.094 Infinite (309.4–Infinite) 1301 −0.048, −0.030 Infinite 1303 −0.144, −0.125 21.8 (16.6–32.0) 1304 −0.116, −0.093 Infinite 1601 −0.101, −0.077 45.3 (27.5–132.7) 1602 −0.108, −0.088 11.4 (9.9–13.4) 1603 −0.058, −0.037 172.4 (45.5–Infinite) 1604 −0.052, −0.027 9.7 (8.6–11.1) Population ID FIS 95% CI Ne estimate (95% CI) 1001 −0.085, −0.067 15.6 (12.9–19.7) 1002 −0.014, 0.008 Infinite (142.8–Infinite) 1005 −0.052, −0.029 134.0 (44.5–Infinite) 1006 −0.040, −0.017 6.9 (6.4–7.5) 1007 −0.016, 0.002 9.4 (8.3–10.7) 1008 −0.015, 0.009 11.4 (9.9–13.5) 1101 −0.027, −0.006 10.0 (8.8–11.5) 1102 −0.014, 0.010 49.2 (28.5–189.1) 1103 −0.034, −0.015 73.1 (34.5–Infinite) 1104 0.005, 0.026 Infinite 1105 −0.071, −0.049 52.7 (29.3–279.6) 1107 −0.039, −0.017 Infinite (140.2–Infinite) 1201 −0.006, 0.017 Infinite 1202 −0.057, −0.034 Infinite 1203 −0.098, −0.077 Infinite 1204 −0.046, −0.023 Infinite 1205 −0.113, −0.094 Infinite (309.4–Infinite) 1301 −0.048, −0.030 Infinite 1303 −0.144, −0.125 21.8 (16.6–32.0) 1304 −0.116, −0.093 Infinite 1601 −0.101, −0.077 45.3 (27.5–132.7) 1602 −0.108, −0.088 11.4 (9.9–13.4) 1603 −0.058, −0.037 172.4 (45.5–Infinite) 1604 −0.052, −0.027 9.7 (8.6–11.1) FIS confidence interval calculated using hierfstat package in R (Goudet and Jombart 2016). Ne estimate calculated using excess heterozygosity method in NeEstimator (Do et al. 2014). View Large Principal components analysis of 11741 filtered SNPs defined the genetic variation among these 24 populations along several axes of variation. None of the top 4 principal components separated the populations into seeded and wild clusters (Figure 3, Supplementary Figure S1). The first 2 components explained only 1.5% and 1.3% of genetic variance among individuals respectively. As seen in Figure 3, 95% confidence ellipses of genetic variation between seeded and wild populations largely overlap. Figure 3. View largeDownload slide Principal components analysis of single nucleotide polymorphisms from individuals in 24 Lupinus texensis populations. Genetic differentiation determined by principal components analysis of 11,741 SNPs from 187 individuals from 24 populations of wild and seeded L. texensis. Shaded areas represent 95% confidence ellipses for each population type. The first two principal components explained 1.5% and 1.3% of variance among individuals, respectively. Figure 3. View largeDownload slide Principal components analysis of single nucleotide polymorphisms from individuals in 24 Lupinus texensis populations. Genetic differentiation determined by principal components analysis of 11,741 SNPs from 187 individuals from 24 populations of wild and seeded L. texensis. Shaded areas represent 95% confidence ellipses for each population type. The first two principal components explained 1.5% and 1.3% of variance among individuals, respectively. Clustering analyses using STRUCTURE and CLUMPAK identify little evidence of population structure in this dataset. A population number of K = 2 is the statistically most likely number of populations in this sample using both the Evanno and Pritchard methods (Supplementary Figure S2). For each individual, the proportion of ancestry from the two clusters was estimated. At K = 2, nearly every population was identified as dominated by one cluster, though having individuals of mixed ancestry between both clusters (Figure 4; 5/10, major and minor clusters very similar, see Supplementary Figure S3, Mean(LnProb) = −182000, Mean(similarity score) = 0.999). Population structure analyses did not separate seeded from wild populations, or separate populations by ecoregion (Gould et al. 1960; Figure 4). Higher levels of K (K = 3 or greater, Supplementary Figure S3) suggest population 1006, the easternmost seeded population, may be more distinct, and populations 1101, 1102, and 1103, 2 wild and 1 seeded population in the western portion of the sampled area, may cluster together. Overall these populations appear to be panmictic, independent of their seeded or wild status. Figure 4. View largeDownload slide Population structure analysis of 24 collection locations of Lupinus texensis. Each individual is represented by a vertical column; each column is divided into shaded segments proportional to their estimated ancestry in the STRUCTURE-defined genetic cluster. Populations are grouped by geographically neighboring pairs, first the wild and then the seeded member of the pair (designated by “_W” and “_S” appended to population ID), and pairs are ordered by population ID. Figure 4. View largeDownload slide Population structure analysis of 24 collection locations of Lupinus texensis. Each individual is represented by a vertical column; each column is divided into shaded segments proportional to their estimated ancestry in the STRUCTURE-defined genetic cluster. Populations are grouped by geographically neighboring pairs, first the wild and then the seeded member of the pair (designated by “_W” and “_S” appended to population ID), and pairs are ordered by population ID. Discussion Based on our analysis of 11741 genome-wide SNPs, L. texensis populations in central Texas, the historical range of this endemic species, appear to be nearly panmictic across the sampled range, with little genetic differentiation between populations recently seeded from commercial seed stock and those without recorded evidence of ever being intentionally seeded. Seeded populations are not distinct from wild populations, and wild populations are not distinct from each other. This is surprising in a species with what appears to be a fairly limited un-assisted dispersal range (Schaal 1980). The lack of agreement between the experimental measures of gene flow estimated by Schaal (1980) and indirect measures in the present study is consistent with a recent history of large-scale dispersal and major changes in population structure (Slatkin 1987). In addition to the lack of population structure, our data suggest that genetic diversity may have increased over the past 40 years. While measures of diversity at allozymes are not strictly comparable to those based on bi-allelic SNPs (Nei 1987; Pyhäjärvi et al. 2011; Lowe et al. 2004), a conversion equation was recently developed to provide a rough estimate of likely nucleotide diversity (π) from allozyme data (Ai et al. 2014). In L. texensis, Babbel and Selander’s estimate (average observed heterozygosity from allozymes, HO = 0.3559) can be approximated to π = 0.0116, versus π = HO = 0.270 from SNPs, this study. However, we cannot rule out the possibility that the conversion is unreliable in this system, possibly because of the small number of allozymes employed. Few other empirical studies of gene flow using DNA sequence data exist in natural populations of herbaceous, annual, insect-pollinated (especially bee), passively dispersed, and primarily outcrossing plant species. Those that have looked at population genetic structure under these parameters have found levels of differentiation varying from low to incipient speciation. Population structure was found to be low, with FST at neutral loci ranging between 0.03 and 0.07, within 2 subspecies of Nigella degenii (Ranuculaceae) occurring on different islands across an area of approximately 70 km2 (149 AFLP fragments; Jorgensen et al. 2006). Significant isolation-by-distance exists among Greek populations of Capsella grandiflora (Brassicaceae), with an overall FST of 0.14 among populations (16 loci; St. Onge et al. 2011). Pairwise FST ranged from 0.048 to 0.171 among 6 populations over approximately 500 km2 area of the range of Clarkia xantiana ssp. xantiana (Onagraceae) (8 loci and 4 microsatellites; Moeller et al. 2011). Adaptation to active sand dune habitat has promoted structure and perhaps incipient speciation between dune and non-dune populations of Helianthus petiolaris (Asteraceae), despite ongoing, asymmetric, gene flow across an area of approximately 30 km2 (15 microsatellite and genome-wide RADseq data; Andrew et al. 2012; Andrew et al. 2013). And finally, a history of isolation in multiple glacial refugia has resulted in strong phylogeographic structure and even reproductive isolation among populations of Campanulastrum americanum across eastern North America (Campanulaceae) (chloroplast loci and genome-wide RADseq data; Barnard-Kubow et al. 2015). Compared to these taxa, L. texensis has the lowest level of population differentiation across nearly the largest range assessed (overall FST = 0.015 over approximately 70000 km2, this study). Interestingly, population patterns for nuclear SNPs, which are biparentally inherited, and chloroplast SNPs, which are typically maternally inherited, show the same pattern: that of little existing population structure between wild and seeded—or even among wild—populations. Paternal inheritance of chloroplasts via pollen have been documented in some species in the Fabaceae (e.g. Medicago sativa; Havananda et al. 2010), but not to our knowledge in lupines. While the pattern observed here could be explained in nuclear SNPs by pollen movement via bees, observations of the same pattern among plastid SNPs suggests migration between populations via seed movement. Also, experimental evidence suggests that pollen transmission by bees is typically quite limited; <10 m2 according to Schaal (1980), although this is likely to be an under-estimate because of the “single-source” experimental design that was employed (see above). Taken together, this evidence suggests that wild and seeded populations share a common genetic pool and history. One possible explanation is that seed dispersal distances in this species have been wildly underestimated, such that populations as far as 313 km away from each other regularly exchange migrants via reseeding. The seeds of this species may be dispersed via some unknown animal vector, though this seems unlikely as the seeds are smooth and blend in well with the soil. Species, such as L. texensis, that lack obvious traits adapted to enhance dispersal have been understudied; these species may have very restricted dispersal and have problems moving about in fragmented landscapes without the involvement of people (Cousens et al. 2008). Occasional long distance dispersal by non-human means is possible (for example, via rodents predated by wide ranging birds; Hämäläinen et al. 2017), but such occasional migrants are unlikely to result in the panmictic pattern observed here. Adaptation to cultivation, as may occur to some degree during commercial seed propagation, has selected for decreased dispersal in many cultivated and domesticated species (for example, loss of shattering in cereals; Heiser 1988). If cultivation has had any effect on human dispersed L. texensis, it is most likely for decreased, rather than increased, natural dispersal. However, because of the scale and regularity of human-assisted dispersal in this species, natural dispersal mechanisms may not be necessary to produce the panmictic population structure seen here. Seeded and wild populations may be similar because cultivated and wild individuals are only weakly isolated. Given the brief history of cultivation in this species, there may not have been enough time for selection to result in differentiation between the wild and seeded populations, at least to the levels that we are able to identify with the genomic reduction approach used here (“weakly domesticated”; Dempewolf et al. 2008). Cultivated stock populations may frequently incorporate wild individuals, a regular practice at Wildseed Farms (from varying wild populations <300 km away from the farm; Tom Kramer, Wildseed Farms, personal communication). This should result in a “melting pot” of genetic diversity from the core part of the native range, blending populations and bulking up seed from blended individuals, before human dispersal of this seed across the range sampled here. Previous RFLP analysis of chloroplast diversity in this species indicated that chloroplast diversity across the core range of L. texensis was thought to be low (Banks and Birky 1985). However, our chloroplast data indicate variation is within the expected range for chloroplasts, which are known to have a slow mutation rate (Gaut et al. 1993). While a lack of differentiation between seeded and particular “wild” populations could be explained by the short period of time that L. texensis has been commercially cultivated, this does not explain why wild populations are not distinct from each other. Alternately, little structure may exist between wild and seeded populations because commercial cultivation may have resulted in little to no selection on the seeded populations. Effort is made by TXDoT to source germplasm from local (and therefore perhaps environmentally similar) commercial sources; for example, farms utilized in 2013 were approximately 200 km from their distribution sites, though in some cases commercial germplasm sources may be as far away as New Mexico (Buddy Hudson, TXDoT, personal communication). If cultivation conditions closely mimic those experienced by wild populations, it is possible, but unlikely, that little selection occurs under cultivation. Cultivation, even associated with ecological restoration efforts, often selects for traits which increase seed production in agricultural settings, such as early emergence, large plant and seed size, and high seed production (Kulpa and Leger 2013). It seems unlikely that cultivation would be able to mimic natural conditions sufficiently to prevent evolution given the logistical efforts involved in harvesting, seed storage, and scarification associated with commercial production (Davis et al. 1994; Tom Kramer, Wildseed Farms, personal communication). However, while this may explain the similarity and admixture among seeded populations or between seeded and wild populations, it would not explain the unexpected lack of structure among wild populations. Some gene flow must occur between these wild populations, possibly via the “melting pot” of weak domestication followed by human-assisted dispersal. Intentional ornamental seeding may have swamped wild populations, such that little genetic evidence of the structured, isolated, wild populations of the recent past remain within this species’ core historic range. If there were population structure among L. texensis populations in the central Texas area before the adoption of seed spreading, whether isolated-by-distance through drift or locally adapted, that structure was not detected in this dataset. To confirm this extensive swamping hypothesis, L. texensis must be sampled from areas where ornamental seeding has never occurred within reasonable dispersal distance (if any such populations exist). Alternatively, the genetic differentiation among modern wild populations could be compared to that of historical samples, such as those of L. texensis maintained by herbaria and natural history collections, collected before the state-wide adoption of wildflower seeding in the 1930s. Because our identification of populations as “wild” depends on the knowledge and memory of land managers, and the historical resources at their disposal, we may have erroneously identified some populations as wild which did in fact have some history of intentional seeding. It is also possible that some populations we identified as “seeded” were actually wild populations naturally occurring along highways. This potential misidentification would obscure evidence of differentiation between wild and seeded populations. But this is unlikely to be the case for every population included here; some of the wild populations have been under conservation management for more than 60 years or have otherwise remained unimproved since at least the 19th century. The pattern that we found (Figure 3), with total overlap between most individuals of all populations occurring in the same SNP space, does not suggest that a small number of populations were misidentified. Conclusion Thanks in part to the cultural value imbued in this species, it is currently common and abundant (Andrews 1993). However, roadside seed spreading on the scale of the TxDOT’s program would provide ample opportunities for constant introduction of cultivated germplasm into nearby wild populations, and this human-assisted abundance should have an impact on gene flow in this species. The homogenization of genetic diversity across its core range may have important implications for adaptation to environmental change in the future, potentially both positive and negative. It is possible that there is a net genetic benefit; populations fractured by drift may be historically interesting, but may lack evolutionary potential. Homogenization could increase effective population size and promote evolutionary response to climate change or other selective pressures. Given the intense level of involvement that human activity has had on the success of this species, we have perhaps selected for populations well adapted to human contexts, such as the abundant habitat available along state and interstate highways. Lupinus texensis, once endemic to central Texas, can now be found throughout the state, as well as Oklahoma, Louisiana, and Florida (USDA NCRS 2017). One possible concern, however, is that the constant introduction of seeds into locally adapted populations could result in migration load, in which maladapted alleles are introduced more quickly than they can be weeded out by natural selection (Bolnick and Nosil 2007). One-time increases in genetic variation may be beneficial, but continued large-scale introductions of cultivated seed, if migrants have lower relative fitness in a given location, would be detrimental. Our data suggest that current L. texensis populations in the species’ endemic core range experience a high degree of migration both between wild and intentionally seeded populations and even among purportedly wild populations. We suggest that this is most likely the result of the intentional spreading of cultivated seed by TXDoT and others. The historical state of population structure in this species is not clear, though allozyme studies from the 1970s suggest that populations once contained less genetic variability (Babbel and Selander 1974; Ai et al. 2014). Ultimately the fate of this species is tied to its cultural importance and the value we place on “wild” flowers. Supplementary Material Supplementary material can be found at https://academic.oup.com/jhered/. Funding This work was supported, in part, by the National Science Foundation (PRFB 1523842 to K.G.T.) and the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants Program (DG 327475 to L.H.R. and RGPIN-2014–05820 to Q.C.). Acknowledgments The population sampling in this study would not have been possible without the gracious participation of the following organizations and people: Texas Department of Transportation (Buddy Hudson), Texas Parks and Wildlife Department (Jackie Poole, Donnie Frels), Kerr Wildlife Management Area (Ryan Reitz), Mason Mountain Wildlife Management Area (Mark Mitchell), Muse Wildlife Management Area (Devin Erxleben), The Nature Conservancy of Texas (Charlotte Reemts, Brandon Crawford, Rachael Ranft, Jim Eidson), The Texas Land Conservancy (Daniel Dietz, Joyce Lucas, Bob and Dee Tusch, Susan and Elmer Rosenberger, Howard Hicks, Terry Green), the Lady Bird Johnson Wildflower Center (Phillip Schulze), Wildseed Farms (Tom Kramer, Kay Kailer), and the US Army - Fort Hood (Carla Picinich). Collection help was also provided by M.D. Turner. And every single one of them loves bluebonnets. Conflict of Interest None declared. Data Availability We have deposited the primary data underlying these analyses at the Dryad data repository: http://dx.doi.org/doi:10.5061/dryad.jh3kk. Voucher specimens for all populations are maintained by the Colorado State University Herbarium (CS). References Ai B, Kang M, Huang H. 2014. Assessment of genetic diversity in seed plants based on a uniform π criterion. Molecules . 19: 20113– 20127. Google Scholar CrossRef Search ADS PubMed Andrew RL, Kane NC, Baute GJ, Grassa CJ, Rieseberg LH. 2013. Recent nonhybrid origin of sunflower ecotypes in a novel habitat. Molecular Ecology . 22: 799– 813. Google Scholar CrossRef Search ADS PubMed Andrew RL, Ostevik KL, Ebert DP, Rieseberg LH. 2012. Adaptation with gene flow across the landscape in a dune sunflower. Molecular Ecology . 21: 2078– 2091. Google Scholar CrossRef Search ADS PubMed Andrews J. 1993. The texas bluebonnet . Austin: University of Texas Press. Babbel GR, Selander RK. 1974. Genetic variability in edaphically restricted and widespread plant species. Evolution . 28: 619– 630. Google Scholar CrossRef Search ADS PubMed Banks JA, Birky CW. 1985. Chloroplast DNA diversity is low in a wild plant, Lupinus texensis. Proceedings of the National Academy of Sciences . 82: 6950– 6954. Google Scholar CrossRef Search ADS Barnard-Kubow KB, Debban CL, Galloway LF. 2015. Multiple glacial refugia lead to genetic structuring and the potential for reproductive isolation in a herbaceous plant. American Journal of Botany . 102: 1842– 1853. Google Scholar CrossRef Search ADS PubMed Bolnick DI, Nosil P. 2007. Natural selection in populations subject to a migration load. Evolution . 61: 2229– 2243. Google Scholar CrossRef Search ADS PubMed Bryc K, Auton A, Nelson MRet al. 2010. Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proceedings of the National Academy of Sciences . 107: 786– 791. Google Scholar CrossRef Search ADS Clement M, Posada D, Crandall KA. 2000. TCS: a computer program to estimate gene genealogies. Molecular ecology . 9: 1657– 1659. Google Scholar CrossRef Search ADS PubMed Cousens R, Dytham C, Law R. 2008. Dispersal in plants: a population perspective . Oxford University Press. Google Scholar CrossRef Search ADS Danecek P, Auton A, Abecasis Get al. 2011. The variant call format and VCFtools. Bioinformatics . 27: 2156– 2158. Google Scholar CrossRef Search ADS PubMed Davis TD, George SW, Mackay WA, Parsons JM. 1994. Development of Texas bluebonnets into floricultural crops. HortScience . 29: 1110– 1211. Dempewolf H, Rieseberg LH, Cronk QC. 2008. Crop domestication in the Compositae: a family-wide trait assessment. Genetic Resources and Crop Evolution . 55: 1141– 1157. Google Scholar CrossRef Search ADS Do C, Waples RS, Peel Det al. 2014. NeEstimator v2: re-implementation of software for the estimation of contemporary effective population size (Ne) from genetic data. Molecular Ecology Resources . 14: 209– 214. Google Scholar CrossRef Search ADS PubMed Ellstrand NC. 2014. Is gene flow the most important evolutionary force in plants? American Journal of Botany . 101: 737– 753. Google Scholar CrossRef Search ADS PubMed Ellstrand NC, Prentice HC, Hancock JF. 1999. Gene flow and introgression from domesticated plants into their wild relatives. Annual Review of Ecology and Systematics . 30: 539– 563. Google Scholar CrossRef Search ADS Evanno G, Regnaut S, Goudet J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular ecology . 14: 2611– 2620. Google Scholar CrossRef Search ADS PubMed Gaut BS, Muse SV, Clegg MT. 1993. Relative rates of nucleotide substitution in the chloroplast genome. Molecular Phylogenetics and Evolution . 2: 89– 96. Google Scholar CrossRef Search ADS PubMed Gilbert KJ, Andrew RL, Bock DGet al. 2012. Recommendations for utilizing and reporting population genetic analyses: the reproducibility of genetic clustering using the program structure. Molecular Ecology . 21: 4925– 4930. Google Scholar CrossRef Search ADS PubMed Goudet J, Jombart T. 2016. hierfstat: Estimation and Tests of Hierarchical F-Statistics . Gould FW, Hoffman GO, Rechenthin CA. 1960. Vegetational areas of Texas. Texas A&M University, Texas Agricultural Experiment Station, Leaflet . 492: 1– 4. Hämäläinen A, Broadley K, Droghini Aet al. 2017. The ecological significance of secondary seed dispersal by carnivores. Ecosphere . 8: e01685. Google Scholar CrossRef Search ADS Havananda T, Brummer EC, Maureira-Butler IJ, Doyle JJ. 2010. Relationships Among Diploid Members of the Medicago sativa (Fabaceae) Species Complex Based on Chloroplast and Mitochondrial DNA Sequences. Systematic Botany . 35: 140– 150. Google Scholar CrossRef Search ADS Heiser CB. 1988. Aspects of unconscious selection and the evolution of domesticated plants. Euphytica . 37: 77– 81. Google Scholar CrossRef Search ADS Helenurm K, Schaal BA. 1996. Genetic Load, Nutrient Limitation, and Seed Production in Lupinus texensis (Fabaceae). American Journal of Botany . 83: 1585– 1595. Google Scholar CrossRef Search ADS Highway Beautification Act of 1965, Pub. L. No. 89–285, 79 Stat. 1028 (codified as amended at 23 U.S.C. § 136 (2016)). Huang DI, Hefer CA, Kolosova N, Douglas CJ, Cronk QC. 2014. Whole plastome sequencing reveals deep plastid divergence and cytonuclear discordance between closely related balsam poplars, Populus balsamifera and P. trichocarpa (Salicaceae). New Phytol . 204: 693– 703. Google Scholar CrossRef Search ADS PubMed Janes JK, Miller JM, Dupuis JRet al. 2017. The K=2 conundrum. Molecular Ecology , accepted author manuscript. doi: 10.1111/mec.14187. Jorgensen TH, Richardson DS, Andersson S. 2006. Comparative analyses of population structure in two subspecies of Nigella degenii: evidence for diversifying selection on pollen-color dimorphisms. Evolution . 60: 518– 528. Google Scholar CrossRef Search ADS PubMed Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. 2015. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Molecular ecology resources . 15: 1179– 1191. Google Scholar CrossRef Search ADS PubMed Kulpa SM, Leger EA. 2013. Strong natural selection during plant restoration favors an unexpected suite of plant traits. Evolutionary Applications . 6: 510– 523. Google Scholar CrossRef Search ADS PubMed Lowe A, Harris S, Ashton P. 2004. Ecological genetics: design, analysis, and application . Oxford, UK: Blackwell Publishing. Mackay WA, George S, Parsons JMet al. 2000. Texas Maroon’Bluebonnet. HortScience . 35: 313– 313. Maron JL, Simms EL. 1997. Effect of seed predation on seed bank size and seedling recruitment of bush lupine (Lupinus arboreus). Oecologia . 111: 76– 83. Google Scholar CrossRef Search ADS PubMed Moeller DA, Geber MA, Tiffin P. 2011. Population Genetics and the Evolution of Geographic Range Limits in an Annual Plant. The American Naturalist . 178: S44– S57. Google Scholar CrossRef Search ADS PubMed Nei M. 1972. Genetic distance between populations. American naturalist . 283– 292. Nei M. 1987. Molecular evolutionary genetics . New York: Columbia university press. Parsons JM, Davis TD. 1993. Abbott Pink’Bluebonnet (Lupinus texensis Hook.). HortScience . 28: 65– 66. Parsons JM, Davis TD, George SW, Mackay WA. 1994. Barbara Bush’Bluebonnet (Lupinus texensis Hook.). HortScience . 29: 1202– 1202. Poland JA, Brown PJ, Sorrells ME, Jannink JL. 2012. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One . 7: e32253. Google Scholar CrossRef Search ADS PubMed Price HJ, Dillon SL, Hodnett Get al. 2005. Genome evolution in the genus Sorghum (Poaceae). Annals of Botany . 95: 219– 227. Google Scholar CrossRef Search ADS PubMed Price AL, Patterson NJ, Plenge RMet al. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics . 38: 904– 909. Google Scholar CrossRef Search ADS PubMed Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics . 155: 945– 959. Google Scholar PubMed Puritz JB, Hollenbeck CM, Gold JR. 2014. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms. PeerJ . 2: e431. Google Scholar CrossRef Search ADS PubMed Pyhäjärvi T, Kujala ST, Savolainen O. 2011. Revisiting protein heterozygosity in plants—nucleotide diversity in allozyme coding genes of conifer Pinus sylvestris. Tree genetics & genomes . 7: 385– 397. Google Scholar CrossRef Search ADS Rieseberg LH, Burke JM. 2001. The biological reality of species: gene flow, selection, and collective evolution. Taxon . 50(1): 47– 67. doi:10.2307/1224511 Google Scholar CrossRef Search ADS Schaal BA. 1980. Measurement of gene flow in Lupinus texensis . Nature. 284: 450– 451. Sexton JP, Hangartner SB, Hoffmann AA. 2014. Genetic isolation by environment or distance: which pattern of gene flow is most common? Evolution . 68: 1– 15. Google Scholar CrossRef Search ADS PubMed Slatkin M. 1987. Gene flow and the geographic structure of natural populations. Science . 236: 787– 792. Google Scholar CrossRef Search ADS PubMed St Onge KR, Källman T, Slotte T, Lascoux M, Palmé AE. 2011. Contrasting demographic history and population structure in Capsella rubella and Capsella grandiflora, two closely related species with different mating systems. Mol Ecol . 20: 3306– 3320. Google Scholar CrossRef Search ADS PubMed Strong DR, Ayres DR. 2013. Ecological and Evolutionary Misadventures of Spartina. Annual Review of Ecology, Evolution, and Systematics . 44: 389– 410. Google Scholar CrossRef Search ADS Texas Department of Transportation (TXDoT). 2016. Wildflower Program. [cited 2017 January 10]. Available from: http://www.txdot.gov/inside-txdot/division/maintenance/wildflower-program.html. Todesco M, Pascual MA, Owens GLet al. 2016. Hybridization and extinction. Evolutionary Applications . 9: 892– 908. Google Scholar CrossRef Search ADS PubMed Turner BL. 1957. The chromosomal and distributional relationships of Lupinus texensis and L. subcarnosus (Leguminosae). Madroño . 13– 16. USDA NCRS. 2017. Lupinus texensis Hook. The PLANTS Database, National Plant Data Team. [cited 2017 January 9]. Available from: http://plants.usda.gov/. Weir BS, Cockerham CC. 1984. Estimating f-statistics for the analysis of population structure. Evolution . 38: 1358– 1370. Google Scholar PubMed Whiteley AR, Fitzpatrick SW, Funk WC, Tallmon DA. 2015. Genetic rescue to the rescue. Trends in Ecology & Evolution . 30: 42– 49. Google Scholar CrossRef Search ADS PubMed Zheng X, Levine D, Shen Jet al. 2012. A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics . 28: 3326– 3328. Google Scholar CrossRef Search ADS PubMed © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: firstname.lastname@example.org
Journal of Heredity – Oxford University Press
Published: Mar 1, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
All the latest content is available, no embargo periods.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud