Resolving Recent Plant Radiations: Power and Robustness of Genotyping-by-Sequencing

Resolving Recent Plant Radiations: Power and Robustness of Genotyping-by-Sequencing Abstract Disentangling species boundaries and phylogenetic relationships within recent evolutionary radiations is a challenge due to the poor morphological differentiation and low genetic divergence between species, frequently accompanied by phenotypic convergence, interspecific gene flow and incomplete lineage sorting. Here we employed a genotyping-by-sequencing (GBS) approach, in combination with morphometric analyses, to investigate a small western Mediterranean clade in the flowering plant genus Linaria that radiated in the Quaternary. After confirming the morphological and genetic distinctness of eight species, we evaluated the relative performances of concatenation and coalescent methods to resolve phylogenetic relationships. Specifically, we focused on assessing the robustness of both approaches to variations in the parameter used to estimate sequence homology (clustering threshold). Concatenation analyses suffered from strong systematic bias, as revealed by the high statistical support for multiple alternative topologies depending on clustering threshold values. By contrast, topologies produced by two coalescent-based methods (NJ$$_{\mathrm{st}}$$, SVDquartets) were robust to variations in the clustering threshold. Reticulate evolution may partly explain incongruences between NJ$$_{\mathrm{st}}$$, SVDquartets and concatenated trees. Integration of morphometric and coalescent-based phylogenetic results revealed (i) extensive morphological divergence associated with recent splits between geographically close or sympatric sister species and (ii) morphological convergence in geographically disjunct species. These patterns are particularly true for floral traits related to pollinator specialization, including nectar spur length, tube width and corolla color, suggesting pollinator-driven diversification. Given its relatively simple and inexpensive implementation, GBS is a promising technique for the phylogenetic and systematic study of recent radiations, but care must be taken to evaluate the robustness of results to variation of data assembly parameters. Coalescence, concatenation, genotyping-by-sequencing, Linaria, phylogeny, radiation, RAD-Seq, speciation Recent evolutionary radiations constitute ideal systems to investigate evolution, speciation and adaptation (Schluter 2000; Hughes and Eastwood 2006; Seehausen 2006; Valente et al. 2010; Farrington et al. 2014). To understand the causes and direction of evolutionary change, robust systematic treatments and well-resolved phylogenies are required. However, disentangling species boundaries and phylogenetic relationships within recent evolutionary radiations is a challenge due to the low genetic divergence between species, frequently accompanied by inter-specific gene flow and incomplete lineage sorting (ILS) (Shaw 2002; Shaffer and Thomson 2007). This leads to poor resolution, low support values and short branch lengths in phylogenetic trees, interpreted as signatures of rapid diversification (Valente et al. 2010; Meiklejohn et al. 2016). Although morphological diversity may be considerable, the high frequency of phenotypic convergence further confounds systematics and phylogeny, yet it has high evolutionary interest because it is potentially informative about recurrent adaptive traits and their molecular basis (e.g. Whittall et al. 2006; Muschick et al. 2012). Some authors cast doubt on the possibility of resolving recent radiations in the tree of life because of these difficulties (Vargas and Zardoya 2014), and even state-of-the-art approaches have methodological challenges to overcome (Harvey et al. 2016). In plants, classical approaches to phylogenetic analysis are based on a small number of loci, frequently nuclear ribosomal DNA (nrDNA), plastid DNA (ptDNA) and low-copy number genes (Sang 2002; Álvarez and Wendel 2003; Shaw et al. 2007). This strategy has been proven effective to resolve some recent radiations, and is frequently able to recover major clades and evolutionary patterns (Givnish et al. 2009; Guzmán et al. 2009). However, incongruence between loci that may result from hybridization and ILS, among other causes, often brings about uncertainty regarding species boundaries and species-level relationships (Blanco-Pastor et al. 2012). Under these conditions, the sequencing of a large number of genome-wide genetic markers, accompanied by thorough intraspecific sampling of individuals, is a requirement to resolve systematic and phylogenetic questions. To achieve this, several approaches based on next generation sequencing (NGS) have been proposed in recent years, including sequence capture, transcriptomics, and restriction digest-based methods (Lemmon and Lemmon 2013; McCormack et al. 2013; Harvey et al. 2016). Restriction digest-based NGS methods comprise a number of procedures, including those known as restriction-site associated DNA sequencing (RAD-Seq) and genotyping-by-sequencing (GBS) (van Orsouw et al. 2007; Baird et al. 2008; Davey et al. 2011; Elshire et al. 2011; Peterson et al. 2012; Poland et al. 2012). Despite the plethora of protocols that have been published, a NGS-based method that is universally and routinely applicable by systematic biologists is yet to be established as a replacement to conventional phylogenetic approaches. Such a method should not only be capable of providing genome-wide phylogenetic information but should also be cost- and time-effective, involve relatively simple laboratory protocols, and should not require a sequenced reference genome or previous genomic information (Lemmon and Lemmon 2013; McCormack et al. 2013). Ideally, it should also be successful with fragmented DNA available from biological collections of various ages. In addition to laboratory protocols, uncertainties also remain concerning the bioinformatic analysis of large numbers of genomic loci with potentially conflicting phylogenetic signals. The concatenated approach has frequently been favored because of its analytical speed (Wagner et al. 2013; Hipp et al. 2014). However, simulation studies show that, even when analyzing thousands of loci, concatenated analysis can produce inconsistent results in the presence of ILS (Kubatko and Degnan 2007). Species tree methods based on the multi-species coalescent approach therefore should be preferred to deal with this problem but they may be computationally intensive (reviewed by Liu et al. 2015). In addition, as for concatenation methods, species tree methods may be inconsistent in the presence of hybridization (Solís-Lemus et al. 2016). Coalescent-based phylogenetic network methods, capable of dealing with gene flow, are even more computationally intensive, but more efficient implementations of these are currently in development (Solís-Lemus and Ané 2016). Here we have analyzed species boundaries, phylogenetic relationships, and patterns of morphological evolution during speciation in a small radiation of flowering plants, the Iberian clade of Linaria subsect. Versicolores (Fernández-Mazuecos et al. 2013a). According to current taxonomic treatments, this clade comprises eight taxa endemic or sub-endemic to the Iberian Peninsula, in the western Mediterranean biodiversity hotspot (Sáez and Bernal 2009; Vigalondo et al. 2015; Blanca et al. 2017). This clade constitutes an excellent study system for the understanding of pollinator-driven floral evolution and speciation because it has high phenotypic diversity in floral traits promoting pollinator specialization, including corolla tube width, corolla color and nectar spur length (Fernández-Mazuecos et al. 2013a). Nectar spurs in particular have been considered a key evolutionary innovation, promoting speciation and morphological diversity (Kay et al. 2006; Fernández-Mazuecos and Glover 2017). However, the Iberian clade also displays a number of features that make it a challenging study case from both a systematic and phylogenetic standpoint (Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015; Vigalondo et al. 2015): (i) recent and rapid radiation, dated back to the Quaternary; (ii) potential instances of phenotypic convergence; (iii) incongruence between ptDNA and nrDNA (internal transcribed spacer, ITS) phylogenies; (iv) inconsistent results between concatenation- and coalescent-based phylogenies of combined ptDNA$$+$$ITS data; and (v) lack of monophyly of intraspecific samples and extensive sharing of ptDNA haplotypes across species. As a result, previous attempts to resolve relationships with standard phylogenetic markers were unable to recover a reliable species-level phylogeny for this clade. To generate genome-wide phylogenetic markers, we selected the genotyping-by-sequencing approach first described by Elshire et al. (2011) as it does not require previous genomic information and is relatively simple and inexpensive compared to related restriction digest-based methods. This makes it a good candidate to become a universally applicable NGS replacement for conventional phylogenetic approaches. This method has recently been successfully applied to species phylogenetics and species delimitation (Escudero et al. 2014; Nicotra et al. 2016), but questions remain concerning the analytical approach to GBS data, such as the treatment of missing data (whether loci with large proportions of missing sequences should be excluded or not; Roure et al. 2013; Huang and Knowles 2014; Eaton et al. 2016) or the relative merits of concatenated and coalescent-based approaches to infer species phylogenies (DaCosta and Sorenson 2016). Our objective was to use both morphometric data and genome-wide genetic markers generated by GBS in the Iberian clade of Linaria subsect. Versicolores to: (i) test species limits, thus achieving a robust systematic treatment; (ii) infer a well-supported species phylogeny comparing the relative performances of concatenation and coalescent-based methods; (iii) analyze the patterns of morphological evolution during speciation, particularly those potentially linked to pollinator specialization. The ultimate goal was to establish a unified approach that allows the resolution of the elusive species limits and phylogenetic relationships in a recent radiation. Materials and Methods Study System The eight taxa currently recognized in the Iberian clade of Linaria subsect. Versicolores and acronyms used hereafter in this article are listed in Table 1 (Sáez and Bernal 2009; Vigalondo et al. 2015; Blanca et al. 2017; see Supplementary Appendix S1 available on Dryad at http://dx.doi.org/10.5061/dryad.mp818). One of these taxa has been treated as either a species or a subspecies (L. salzmannii or L. viscosa subsp. spicata), while the other seven have been consistently treated as species. An additional taxon has been described, L. viscosa subsp. crassifolia (CRA) (Sutton 1988), but its taxonomic status is uncertain (Sáez and Bernal 2009). These taxa inhabit grasslands and open shrubby formations on soils developed from diverse lithologies. They are endemic to the Iberian Peninsula, except for SPA, which is also present in southern France. They are diploid species with chromosome number 2$$n$$$$=$$ 12 (Sutton 1988; Sáez and Bernal 2009) and genome sizes around 2C $$=$$ 1.1 pg (Castro et al. 2012), and they all seem to be predominantly allogamous (Valdés 1970). Table 1. Study taxa and summary of plant materials included in morphometric and GBS analyses Taxon$$^{\mathrm{a}}$$  Acronym  Number of sampled individuals, morphometry  Number of sampled individuals, GBS$$^{\mathrm{b}}$$  Number of loci (mean $$\pm$$ SD)$$^{\mathrm{c}}$$  Linaria subsect. Elegantes (2/2)  OUT  —  4/4  2130.296  Linaria subsect. Versicolores, North African clade (6/20)  NAF  —  9/9  2524.618  Linaria subsect. Versicolores, Iberian clade (8/8)  —  —  76/68  3586.1048  $$\qquad$$L. algarviana Chav.  ALG  17  10/10  3219.808  $$\qquad$$L. becerrae Blanca, Cueto & J.Fuentes  BEC  11  8/8  4464.706  $$\qquad$$L. clementei Haens.  CLE  10  9/9  4229.845  $$\qquad$$L. incarnata (Vent.) Spreng.  INC  33  10/10  3526.670  $$\qquad$$L. onubensis Pau  ONU  32  9/8  3805.668  $$\qquad$$L. salzmannii Boiss. ($$=$$L. viscosa subsp. spicata (Kunze) D.A.Sutton)  SAL  21  10/10  3801.875  $$\qquad$$L. spartea (L.) Chaz.  SPA  27  10/8  3213.1218  $$\qquad$$L. viscosa (L.) Chaz.  VIS  22  10/5  2630.1412  Taxon$$^{\mathrm{a}}$$  Acronym  Number of sampled individuals, morphometry  Number of sampled individuals, GBS$$^{\mathrm{b}}$$  Number of loci (mean $$\pm$$ SD)$$^{\mathrm{c}}$$  Linaria subsect. Elegantes (2/2)  OUT  —  4/4  2130.296  Linaria subsect. Versicolores, North African clade (6/20)  NAF  —  9/9  2524.618  Linaria subsect. Versicolores, Iberian clade (8/8)  —  —  76/68  3586.1048  $$\qquad$$L. algarviana Chav.  ALG  17  10/10  3219.808  $$\qquad$$L. becerrae Blanca, Cueto & J.Fuentes  BEC  11  8/8  4464.706  $$\qquad$$L. clementei Haens.  CLE  10  9/9  4229.845  $$\qquad$$L. incarnata (Vent.) Spreng.  INC  33  10/10  3526.670  $$\qquad$$L. onubensis Pau  ONU  32  9/8  3805.668  $$\qquad$$L. salzmannii Boiss. ($$=$$L. viscosa subsp. spicata (Kunze) D.A.Sutton)  SAL  21  10/10  3801.875  $$\qquad$$L. spartea (L.) Chaz.  SPA  27  10/8  3213.1218  $$\qquad$$L. viscosa (L.) Chaz.  VIS  22  10/5  2630.1412  Notes: The final taxonomic treatment is followed, with a previous treatment in brackets for L. salzmannii. The specimens sampled in the locus classicus of L. viscosa subsp. crassifolia (CRA) are included in L. spartea (see text). $$^{\mathrm{a}}$$Sampled/total numbers of species and subspecies for the three major clades are shown in brackets. $$^{\mathrm{b}}$$For each taxon, total number of sampled individuals/number of individuals included in phylogenetic analyses are shown. $$^{\mathrm{c}}$$For each taxon, the mean ($$\pm$$SD) number of loci recovered across individuals and pyRAD assemblies (from clustering threshold 0.80–0.95, with minimum taxon coverage of 4) is shown. Figures calculated after taxonomic reassignment of some individuals resulting from genetic structure analyses. The Iberian clade has been strongly supported as monophyletic by both concatenated and coalescent-based analyses of combined ITS and ptDNA sequences, and it is sister to a mostly North African clade of c. 20 taxa (Supplementary Fig. S1a available on Dryad; Fernández-Mazuecos et al. 2013a; Vigalondo et al. 2015). Both the Iberian and North African clades constitute Linaria subsect. Versicolores, which is in turn sister to Linaria subsect. Elegantes (two species). Both subsections make up Linaria sect. Versicolores, one of the major clades of the toadflax genus (Fernández-Mazuecos et al. 2013b). Divergence between the Iberian and North African clades has been dated back to the Pliocene or early Quaternary, and speciation of the Iberian clade is estimated to have happened in the late Quaternary, during the last million years, with high diversification rates (Supplementary Fig. S1a available on Dryad; Fernández-Mazuecos and Vargas 2011; Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015). Two major geographically-structured ptDNA lineages have been detected in the Iberian clade (Fernández-Mazuecos and Vargas 2015). Morphometric Analysis A morphometric analysis of all taxa included in the Iberian clade was performed following the methods of our previous study of INC and ONU (Vigalondo et al. 2015). A full account of methods is found in Supplementary Appendix S2 available on Dryad. Specimens were obtained from herbaria (Supplementary Appendix S1 available on Dryad) and from the authors’ collections (vouchers in BCB and MA). The selected characters included 24 quantitative and three qualitative variables (Supplementary Table S1 available on Dryad). Ten to thirty-three specimens per taxon were analyzed and one measure per character per specimen was taken. Specimens of two different flower color morphs of SAL (purple and yellow) were included, as well as three specimens collected in the locus classicus of CRA. The final morphometric data set included 166 samples. To assess the morphological affinities between specimens and taxa, principal coordinate analysis (PCoA) was conducted using Gower’s coefficient of similarity for mixed data (Gower 1971). A discriminant function analysis (DFA) was then performed to reveal the main characters contributing to the morphological differentiation of taxa and to test the assignment of specimens to taxa using a cross-validation approach. The R package rgl (Adler and Murdoch 2016) was used to represent the first three axes of both the PCoA and DFA in 3D scatter plots with 95% confidence ellipses of concentration for the eight taxa. Distributions of quantitative characters were summarized in the form of beanplots using the R package beanplot (Kampstra 2008). To evaluate the impact of sample size on the results, we repeated the PCoA and DFA analyses replacing the missing data with mean values, without obtaining significantly different results. We used the basic packages of R v3.2.5 (R Development Core Team 2016), ape (Paradis et al. 2004) and vegan (Oksanen et al. 2011) to construct the PCoA and MASS (Venables and Ripley 2002) for the DFA. Phylogenomic Analysis Specimen sampling and DNA extraction.— We sampled a total of 89 individuals of Linaria. Of these, 76 individuals represented the distribution and morphological variation of all species and subspecies of the Iberian clade of Linaria subsect. Versicolores, including specimens of the two flower color morphs of SAL and one individual of CRA (Table 1, Supplementary Table S2 and Fig. S1b available on Dryad). Thirteen individuals of additional species of Linaria sect. Versicolores were sampled to be used as the outgroup, including the two species of Linaria subsect. Elegantes and six species of the North African clade of Linaria subsect. Versicolores (Table 1, Supplementary Table S2 available on Dryad). Seventy-five individuals were sampled in the field and dried in silica gel between 2006 and 2015, while 14 individuals were obtained from generally older (1970–2006) herbarium specimens (MA, RNG). For each sampled individual, total genomic DNA was extracted from c. 20 mg of leaf tissue using a modified CTAB protocol (Doyle and Doyle 1987; Cullings 1992). DNA concentrations were quantified using a Qubit 2.0 fluorometer (Invitrogen, CA, USA) with the dsDNA Broad-Range Assay Kit. Genotyping-by-sequencing library preparation.— A GBS library was prepared using all DNA samples with the PstI-HF restriction enzyme and following previously published procedures (Elshire et al. 2011; Escudero et al. 2014; Grabowski et al. 2014) with some modifications. A full description of laboratory protocols is found in Supplemenatry Appendix S2 available on Dryad, and a summary is provided here. For each sample, 500 ng (where available) of genomic DNA were combined with 0.6 pmol of a sample specific barcode adapter and 0.6 pmol of common adapter (Supplementary Tables S2 and S3 available on Dryad). Each sample was digested with four units of PstI-HF (NEB, MA, USA) at 37ºC overnight. Adapters were ligated using 400 units of T4 DNA Ligase (NEB, MA, USA) at room temperature for 4 h. We combined 50 ng of each sample, and the pool was then purified with Agencourt AMPure XP (Beckman Coulter, CA, USA). DNA fragments were amplified for 15 PCR cycles starting from 35 ng of DNA and using NEB 2x Taq MasterMix (NEB, MA, USA). The PCR product was purified using different volume ratios of AMPure XP beads to sample, and fragment size distributions were assessed in a 2100 Bioanalyzer (Agilent Technologies, CA, USA; Supplementary Fig. S2 available on Dryad). The library with the optimal profile and concentration (AMPure to sample ratio 0.8:1, concentration 1.87 ng/$$\mu $$l, average size 595 bp) was submitted to BGI-Europe (Copenhagen, Denmark) for 100 bp HiSeq 2000 Illumina sequencing. Quality control of sequencing results was conducted in FastQC 0.11.3 (Andrews 2010). Data assembly and genetic structure.— Assembly of GBS loci was performed using the pyRAD 3.0.61 pipeline (Eaton 2014). The pyRAD procedure consisted of seven sequential steps, with parameters based on those recommended for single-end GBS data in the pyRAD documentation (http://dereneaton.com/software/pyrad/; see Supplementary Appendix S2 available on Dryad for details). Since phylogenetic results are known to be sensitive to the similarity threshold employed for within-sample and across-sample sequence clustering ($$c$$) (Takahashi et al. 2014; Mastretta-Yanes et al. 2015; Shafer et al. 2016), 16 assemblies of GBS loci were generated using a range of clustering thresholds from $$c= 0.80$$ to $$c = 0.95$$. Statistical base calling was conducted following Li et al. (2008), with a minimum depth of coverage of six and a maximum of five Ns in consensus sequences. Filtering of putative paralogs was performed by setting a maximum of two alleles (corresponding to diploid organisms) and eight heterozygous sites per consensus sequence. Loci with a minimum taxon coverage ($$m$$, minimum number of samples with data) of four and a maximum of three individuals with shared polymorphic sites (as a further filter of putative paralogs) were retained. Additional assemblies were generated for particular downstream analyses by excluding specific individuals and varying the minimum taxon coverage ($$m =$$ 10, $$m =$$ 20; see below). Henceforth, assemblies are denoted as cXmY, with X being the clustering threshold and Y being the minimum taxon coverage (Table 2). Table 2. Characteristics of assembled genotyping-by-sequencing data sets generated in pyRAD and used in phylogenetic analyses (81 individuals) Data set  Clustering threshold  Minimum taxon coverage  Number of loci  Concatenated length (bp)  Missing data, %  Number of PICs  c84m4  0.84  4  22,106  2,039,028  85.50  81,001  c84m10  0.84  10  9090  850,257  74.72  57,815  c84m20  0.84  20  3888  367,013  62.18  32,559  c85m4  0.85  4  22,586  2,079,729  85.62  80,502  c85m10  0.85  10  9203  858,302  74.81  57,033  c85m20  0.85  20  3894  366,417  62.24  31,842  c87m4  0.87  4  23,593  2,166,830  85.91  78,488  c87m10  0.87  10  9475  880,128  75.28  54,553  c87m20  0.87  20  3881  363,324  62.65  29,259  c92m4  0.92  4  29,037  2,647,482  86.90  72,514  c92m10  0.92  10  10,738  986,997  76.42  47,175  c92m20  0.92  20  3899  359,814  63.09  22,620  Data set  Clustering threshold  Minimum taxon coverage  Number of loci  Concatenated length (bp)  Missing data, %  Number of PICs  c84m4  0.84  4  22,106  2,039,028  85.50  81,001  c84m10  0.84  10  9090  850,257  74.72  57,815  c84m20  0.84  20  3888  367,013  62.18  32,559  c85m4  0.85  4  22,586  2,079,729  85.62  80,502  c85m10  0.85  10  9203  858,302  74.81  57,033  c85m20  0.85  20  3894  366,417  62.24  31,842  c87m4  0.87  4  23,593  2,166,830  85.91  78,488  c87m10  0.87  10  9475  880,128  75.28  54,553  c87m20  0.87  20  3881  363,324  62.65  29,259  c92m4  0.92  4  29,037  2,647,482  86.90  72,514  c92m10  0.92  10  10,738  986,997  76.42  47,175  c92m20  0.92  20  3899  359,814  63.09  22,620  Notes: Assembly parameters, data set sizes, completeness, and numbers of PICs are indicated. Before starting phylogenetic analyses, we tested the correspondence between morphologically defined taxa and genetic clusters by analysing unlinked single nucleotide polymorphism (SNP) matrices using three approaches: principal component analysis (PCA; Waterhouse et al. 2009), Neighbor-Net phylogenetic network estimation (Bryant and Moulton 2004) and Bayesian clustering (Pritchard et al. 2000; Corander et al. 2004) (see Supplementary Appendix S2 available on Dryad for details). As a result of these analyses, eight individuals of uncertain genetic identity, either because of their putative recent hybrid origin or low-quality sequencing results ($$>$$99% missing data in the final data sets) were excluded from downstream phylogenetic analyses. Concatenation-based phylogenies.— Maximum likelihood (ML) analyses of the 16 full-sequence concatenated data sets obtained under 16 clustering threshold values were implemented in RAxML 8.2.8 (Stamatakis 2014) using the GTRCAT substitution model (Stamatakis 2006) during tree search, followed by evaluation and optimization of the final tree under GTRGAMMA. The four Linaria subsect. Elegantes samples were set as the outgroup. Four alternative species-level topologies were found among the 16 ML trees (see Results). We selected four data sets, each yielding a distinct topology with high bootstrap support (BS) values, for further analysis and testing: c84m4, c85m4, c87m4, and c92m4. For these data sets, we additionally explored the performance of ML analyses based on concatenated SNP matrices (i.e. excluding invariant sites) using two acquisition bias corrections (Leaché et al. 2015a) implemented in RAxML (see Supplementary Appendix S2 available on Dryad for details). The four full-sequence matrices were also analyzed using Bayesian inference (BI), as implemented in ExaBayes 1.4.1 (Aberer et al. 2014), with the GTRGAMMA substitution model. To assess the sensitivity of the results to the minimum taxon coverage, assemblies generated with $$m =$$ 10 and $$m =$$ 20 and the same clustering threshold values (see Table 2) were analyzed by ML in RAxML using the same methods described above. Coalescent-based phylogenies.— It is known that the concat- enated approach may produce spurious phylogenetic results, particularly under high ILS, expected in rapid radiations. Among available gene-tree-based methods accounting for ILS, we selected NJ$$_{\mathrm{st}}$$ (Liu and Yu 2011) because of the following features: it is able to infer the species tree from unrooted gene trees (outgroup samples would be absent from many gene trees in our data set, impeding the rooting of gene trees); it can accommodate missing data; and it can handle allele data from multiple individuals per species. The four selected data sets yielding contrasting topologies in concatenated analyses (c84m4, c85m4, c87m4, and c92m4) were separately analyzed as follows. For all loci showing variability, gene trees were estimated using RAxML with the GTR$$+$$GAMMA substitution model and 100 bootstrap replicates. Fifty multilocus bootstrap replicates (Seo 2008; Mallo 2015) were generated, thus resampling nucleotides within loci, as well as loci within the data set. The NJ$$_{\mathrm{st}}$$ method was implemented on the 50 bootstrapped matrices using the R script NJstM (Mallo 2016), which relies on the phybase package (Liu and Yu 2010). A 50% majority-rule (MR) consensus tree was then built from the 50 bootstrap replicates in PAUP* 4 (Swofford 2002). Additionally, assemblies generated with higher values of $$m$$ ($$m =$$ 10 and $$m =$$ 20) and the same clustering threshold values (see Table 2) were analyzed using the same methods described above, to assess the sensitivity of the results to the minimum taxon coverage. For the same data sets analyzed in NJ$$_{\mathrm{st}}$$, we additionally implemented the quartet-based method SVDquartets (Chifman and Kubatko 2014), also accounting for ILS. This method can handle both unlinked SNP and multi-locus full-sequence data sets. Therefore, we analyzed both types of matrices generated under the same four clustering threshold values and three minimum taxon coverage values. Analyses were run in PAUP* 4 under the multispecies coalescent, evaluating all possible quartets. For each matrix, one hundred bootstrap replicates were conducted, and results were summarized in a 50% MR consensus tree. Tests of alternative topologies.— To examine if alternative tree topologies could be statistically rejected by each of the four selected concatenated data sets (c84m4, c85m4, c87m4 and c92m4), topology tests were conducted. First, for each of the four sequence matrices, constrained ML analyses were conducted in RAxML using the four alternative species-level topologies encountered in unconstrained analyses (Topologies 1–4), the topologies obtained from coalescent-based NJ$$_{\mathrm{st}}$$ (Topology 5) and SVDquartets (Topology 6) analyses and two additional topologies recovered from previous concatenated (Topology 7; Vigalondo et al. 2015) and coalescent-based (Topology 8; Fernández-Mazuecos et al. 2013a) analyses of ITS$$+$$ptDNA sequences. For each matrix, per-site log likelihoods using all constrained ML topologies were computed in RAxML under the GTR$$+$$GAMMA model, re-estimating model parameters for each tree. Then we used CONSEL v0.20 (Shimodaira and Hasegawa 2001) to calculate the $$P$$-values of topology tests, including the approximately unbiased (AU) test (Shimodaira 2002), the Shimodaira–Hasegawa (SH) test (Shimodaira and Hasegawa 1999) and the weighted SH (WSH) test (Buckley et al. 2001). To visualize differences in the numbers of loci supporting alternative topologies, we used the “partitioned RAD” approach (Hipp et al. 2014) implemented in the R package RADami (Hipp et al. 2014). The numbers of loci supporting and disfavoring each of the eight constrained topologies generated for topology tests (see above) were compared for each of the four selected data sets. A set of unique trees was generated for each locus (by pruning the original eight trees to those tips present in each locus), and the log likelihood of each unique tree for each locus was calculated in RAxML under the GTR$$+$$GAMMA model. For each locus, trees were then ranked as supported if they were within two log likelihood units of the best supported tree for that locus, or disfavored if they were within two log likelihood units of the least supported tree for that locus. Finally, for each data set, the number of loci supporting or disfavoring each of the eight alternative trees was plotted against the tree log likelihood calculated using the concatenated matrix. Hybridization analyses.— Coalescent-based species tree analyses provided two alternative topologies (see Results). We explored ancestral hybridization as a potential source of these conflicting results using two approaches: D-statistic tests (Durand et al. 2011; Eaton et al. 2015) and the pseudolikelihood-based phylogenetic network method SNaQ (Species Networks applying Quartets), which incorporates both ILS and hybridization, and employs unrooted gene trees (Solís-Lemus and Ané 2016). Four-taxon $$D$$-statistic tests (ABBA–BABA tests) were conducted in pyRAD, including heterozygous sites (Durand et al. 2011; Eaton et al. 2015). Analyses were performed on the c84m4, c85m4, c87m4, and c92m4 data sets using a set of selected individuals with good-quality sequencing, and assuming either the NJ$$_{\mathrm{st}}$$ or the SVDquartets phylogeny as the species tree (representing the major vertical inheritance pattern, MVIP). For each data set, three sets of tests were conducted exploring the potential role of hybridization on the alternative positions of CLE and SPA in the NJ$$_{\mathrm{st}}$$, SVDquartets and concatenation-based trees (see Supplementary Appendix S2 available on Dryad for details). The SNaQ method (Solís-Lemus and Ané 2016) was implemented using a small set of 27 individuals to reduce computational demand. For each of the c84m4, c85m4, c87m4, and c92m4 assemblies, ML gene trees were generated in RAxML and used as input for SNaQ analyses in PhyloNetworks 0.5.0. Since preliminary searches for an optimal network produced highly inconsistent results in independent runs, we focused on evaluating each of the NJ$$_{\mathrm{st}}$$ and SVDquartets species trees as fixed candidate topologies representing the MVIP, both without reticulation and with reticulation events (hybrid edges) accounting for incongruences with the alternative species tree topology. After optimization, the fit of trees and networks to the data was evaluated based on pseudo-deviance values, and estimated inheritance probabilities (i.e. the proportion of genes contributed by each parental population to a hybrid taxon) were visualized. Patterns of Morphological Evolution Morphometric and phylogenomic data were integrated to investigate patterns of morphological evolution during the radiation of the study clade. A phylomorphospace approach, implemented in the R package phytools (Revell 2012), was used to map the history of morphological diversification and explore the magnitude and direction of morphological changes (Sidlauskas 2008). The coalescent-based MR phylogenies obtained in NJ$$_{\mathrm{st}}$$ and SVDquartets, which provided the most robust phylogenetic hypotheses (see below), were used. After estimating branch lengths by ML (see Supplementary Appendix S2 available on Dryad for details), both trees were projected onto the multivariate morphospace defined by the first three canonical discriminant functions of our DFA of vegetative and reproductive traits, with each taxon represented by the mean values of the functions for all examined individuals. The same approach was used to project the phylogeny into the two-dimensional flower morphospace defined by the geometric morphometric study of Fernández-Mazuecos et al. (2013a), based on canonical variate (CV) analysis of landmark data. For comparison with geographic patterns of speciation, the phylogeny was also mapped onto the geographic distribution of the eight taxa in the Iberian Peninsula. In addition, we investigated shifts in particular morphological characters and instances of phenotypic convergence by conducting ancestral state reconstructions (ASRs) in Mesquite 3.04 (Maddison and Maddison 2011) using the coalescent-based species trees (NJ$$_{\mathrm{st}}$$ and SVDquartets). Maximum parsimony (MP) and ML methods were applied. Four key traits determining morphological diversity in the study clade were analyzed: habit (annual vs. perennial), inflorescence density (lax vs. dense), dominant corolla color (purple vs. yellow) and corolla shape. For corolla shape, three types based on geometric morphometric analyses were defined (Fernández-Mazuecos et al. 2013a): type I (broad corolla tube and long nectar spur), type II (broad corolla tube and short nectar spur) and type III (narrow corolla tube and long nectar spur). Results Morphometric Analysis Morphometric analyses revealed the eight taxa in the Iberian clade of Linaria subsect. Versicolores as morphologically distinct units (Fig. 1a,b, Supplementary Fig. S3 and Table S4 available on Dryad). Beanplots (Supplementary Fig. S3 available on Dryad) depicted differences between taxa in particular morphological traits, especially between CLE and other taxa for stem length, flower pedicel length, spur width, and corolla/spur length ratio. The first three coordinates of the PCoA (explaining 54.8% of variance) depicted a complex morphospace (Fig. 1a), with morphological affinities between individuals and taxa consistent with taxonomic expectations, and different degrees of morphological variation within taxa. VIS, SPA, and SAL displayed the highest overall morphological variation, while BEC, CLE and ALG displayed the smallest variation. Individuals of different species did not generally intermingle in the morphospace, despite some overlap of 95% confidence ellipses of concentration. CRA specimens were found to fall within the variation of SPA and were considered as SPA for the DFA analysis. Figure 1. View largeDownload slide Results of morphometric and genetic structure analyses of the eight taxa included in the Iberian clade of Linaria subsect. Versicolores (see Table 1 for the meaning of acronyms). a) Morphospace based on PCoA of all morphometric data (24 quantitative and 3 qualitative variables; 166 individuals); values of the first three coordinates, explaining 54.8% of variance, are represented. b) Morphospace based on DFA of quantitative variables; values of the first three canonical discriminant functions, explaining 79.5% of variation among groups, are represented. c) Principal component analysis based on the unlinked single nucleotide polymorphism (SNP) matrix obtained from the c85m4 data set (5265 SNPs; 76 individuals); values of coordinates 1, 2, and 4, explaining 72.1% of variance, are represented. In all panels, 95% confidence ellipses of concentration for each of the eight species are shown, and arrows indicate CRA individuals (ultimately assigned to SPA; see text). Figure 1. View largeDownload slide Results of morphometric and genetic structure analyses of the eight taxa included in the Iberian clade of Linaria subsect. Versicolores (see Table 1 for the meaning of acronyms). a) Morphospace based on PCoA of all morphometric data (24 quantitative and 3 qualitative variables; 166 individuals); values of the first three coordinates, explaining 54.8% of variance, are represented. b) Morphospace based on DFA of quantitative variables; values of the first three canonical discriminant functions, explaining 79.5% of variation among groups, are represented. c) Principal component analysis based on the unlinked single nucleotide polymorphism (SNP) matrix obtained from the c85m4 data set (5265 SNPs; 76 individuals); values of coordinates 1, 2, and 4, explaining 72.1% of variance, are represented. In all panels, 95% confidence ellipses of concentration for each of the eight species are shown, and arrows indicate CRA individuals (ultimately assigned to SPA; see text). The first three canonical discriminant functions of the DFA (explaining 79.5% of variation among groups) clearly discriminated CLE from two other clusters of species, one formed by INC and ONU, and the other formed by ALG, BEC, SPA, SAL and VIS (Fig. 1b; Supplementary Table S5 available on Dryad). The isolation of CLE was mainly the result of significant differences in the ratio corolla/spur length (Supplementary Table S5 and Fig. S3 available on Dryad). In the DFA, 100% of cases were correctly classified in the original species grouping, and 96.98% in the cross-validated cases, with ALG, BEC, and CLE showing 100% of correct classification (Supplementary Table S6 available on Dryad). Phylogenomic Analysis Data assembly and genetic structure.— Illumina sequencing provided c. 155 million raw reads, with a GC content of 43.05% and bases with quality $$>$$Q20 at 96.36%. Assessment in FastQC showed high quality throughout the full 100 bp length of the first reads and little signs of contamination (based on per sequence GC content). Assembly in pyRAD with a minimum taxon coverage of 4 and the full set of individuals resulted in a total number of loci between 21,283 and 36,435 and a concatenated length of 1.97–3.30 Mbp depending on the clustering threshold. Percentages of missing data were high (86.41–88.85%). The number of loci, concatenated length and percentage of missing data increased with the clustering threshold, while the number of parsimony-informative characters (PICs) decreased (Table 2). The steep increase in number of loci for the highest values of clustering threshold suggested “over-splitting,” where orthologous sequences are divided into separate loci (Harvey et al. 2015). The average number of loci per species of the study clade ranged between 2630 (VIS) and 4464 (BEC), and the number of sequenced loci per individual decreased quickly with sample age (Supplementary Fig. S4 available on Dryad), with four herbarium specimens of VIS older than 25 years producing low-quality results. After excluding problematic individuals, similar numbers of loci and percentages of missing data were found when using a minimum taxon coverage of 4. When increasing the minimum taxon coverage, the percentage of missing data improved, but the number of loci, concatenated length and number of PICs decreased dramatically (Table 2). SNP-based genetic structure analyses produced qualitatively similar results across data sets. Genetic clusters broadly corresponding with morphologically defined taxa were recovered by PCA (Fig. 1c), Neighbor-Net (Supplementary Fig. S5a available on Dryad) and Bayesian clustering (Supplementary Fig. S5b available on Dryad) analyses. The CRA individual was consistently included in the SPA cluster. Individuals of the two color morphs of SAL were consistently included in the same genetic cluster. Eight individuals with uncertain genetic affinities were excluded from further analyses, including six individuals of VIS, one of SPA and one of ONU (see Supplementary Appendix S2 available on Dryad for details). Concatenation-based phylogenies.— ML phylogenetic anal- yses of the 16 data sets supported the monophyly of the Iberian clade of Linaria subsect. Versicolores and its sister relationship to the North African clade (Fig. 2a–d, Supplementary Fig. S6 available on Dryad). The eight taxa were recovered as monophyletic groups with 100% BS, and four alternative species-level topologies were obtained (Figs 2a–d and 3). Three pairs of sister species were consistently recovered from all data sets: (BEC,SAL), (VIS,ONU) and (SPA,INC). In contrast, two species displayed two alternative phylogenetic placements each with variable BS: CLE was recovered as either sister to a clade formed by the remaining seven species (8 data sets, e.g. Fig. 2c,d) or sister to the (BEC,SAL) clade (8 data sets, e.g. Fig. 2a,b); and ALG was recovered as either sister to a ((VIS,ONU),(SPA,INC)) clade (10 data sets, e.g. Fig. 2a,d) or sister to the (SPA,INC) clade (6 data sets, e.g. Fig. 2b,c). The c84m4, c85m4, c87m4, and c92m4 data sets were selected as representatives of the four alternative topologies (Topologies 1–4) for further analysis because of their high average BS for species-level divergences. Topologies produced by ML analyses of concatenated SNP matrices fell within the range of topologies produced by full-sequence analyses (Supplementary Fig. S7; see Appendix S2 available on Dryad for details). Figure 2. View largeDownload slide Results from concatenation- and coalescent-based phylogenetic analyses of genotyping-by-sequencing data sets of the Iberian clade of Linaria subsect. Versicolores. a–d) Four alternative maximum-likelihood (ML) phylogenetic trees obtained in RAxML from concatenated analyses of data sets assembled using different clustering thresholds: a) c84m4, b) c85m4, c) c87m4, and d) c92m4 (see Table 2). Numbers above branches are percentage bootstrap values from ML analyses. Numbers below branches are posterior probabilities from Bayesian analyses in ExaBayes. e) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, after estimation of unrooted gene trees by ML in RAxML with 100 bootstrap replicates. The 50% MR consensus of 50 multilocus bootstrap replicates is shown. The same MR topology was obtained from four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; see Table 2). Numbers above branches are multilocus bootstrap support values (in percentage) from the c84m4 and c85m4 data sets. Numbers below branches are multilocus bootstrap support values from the c87m4 and c92m4 data sets. f) Coalescent-based species tree obtained using the SVDquartets method with 100 bootstrap replicates. The 50% MR consensus obtained from the same four data sets is shown. Numbers above branches are bootstrap support values from the c84m4 and c85m4 data sets. Numbers below branches are bootstrap support values from the c87m4 and c92m4 data sets. g and h) Fixed network topologies evaluated in a coalescent with hybridization framework using the SNaQ method: g) a network with the NJ$$_{\text{st}}$$ tree as major vertical inheritance pattern (MVIP) and reticulations representing incongruences with the SVDquartets tree, and h) a network with the SVDquartets tree as MVIP and reticulations representing incongruences with the NJ$$_{\text{st}}$$ tree. The four values attached to hybrid edges represent inheritance probabilities (i.e., the proportions of genes contributed by each parental population to a hybrid taxon) estimated from the c84m4, c85m4, c87m4, and c92m4 data sets, respectively. Figure 2. View largeDownload slide Results from concatenation- and coalescent-based phylogenetic analyses of genotyping-by-sequencing data sets of the Iberian clade of Linaria subsect. Versicolores. a–d) Four alternative maximum-likelihood (ML) phylogenetic trees obtained in RAxML from concatenated analyses of data sets assembled using different clustering thresholds: a) c84m4, b) c85m4, c) c87m4, and d) c92m4 (see Table 2). Numbers above branches are percentage bootstrap values from ML analyses. Numbers below branches are posterior probabilities from Bayesian analyses in ExaBayes. e) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, after estimation of unrooted gene trees by ML in RAxML with 100 bootstrap replicates. The 50% MR consensus of 50 multilocus bootstrap replicates is shown. The same MR topology was obtained from four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; see Table 2). Numbers above branches are multilocus bootstrap support values (in percentage) from the c84m4 and c85m4 data sets. Numbers below branches are multilocus bootstrap support values from the c87m4 and c92m4 data sets. f) Coalescent-based species tree obtained using the SVDquartets method with 100 bootstrap replicates. The 50% MR consensus obtained from the same four data sets is shown. Numbers above branches are bootstrap support values from the c84m4 and c85m4 data sets. Numbers below branches are bootstrap support values from the c87m4 and c92m4 data sets. g and h) Fixed network topologies evaluated in a coalescent with hybridization framework using the SNaQ method: g) a network with the NJ$$_{\text{st}}$$ tree as major vertical inheritance pattern (MVIP) and reticulations representing incongruences with the SVDquartets tree, and h) a network with the SVDquartets tree as MVIP and reticulations representing incongruences with the NJ$$_{\text{st}}$$ tree. The four values attached to hybrid edges represent inheritance probabilities (i.e., the proportions of genes contributed by each parental population to a hybrid taxon) estimated from the c84m4, c85m4, c87m4, and c92m4 data sets, respectively. BI phylogenetic analyses of the c84m4, c85m4, c87m4, and c92m4 data sets recovered MR trees topologically identical to the ML trees obtained from the same data sets, and displaying high posterior probabilities (PP $$\approx $$ 1) for all species divergences (Fig. 2a–d). When increasing the minimum taxon coverage parameter, ML analyses (Supplementary Fig. S6 available on Dryad) displayed varied topologies with a general decrease in statistical support of clades (particularly for $$m =$$ 20). Coalescent-based phylogenies.— Unlike the contrasting topologies obtained from concatenated analyses, coalescent-based NJ$$_{\mathrm{st}}$$ analyses of the c84m4, c85m4, c87m4, and c92m4 data sets produced identical MR topologies, although with variable BS values (Fig. 2e, Supplementary Fig. S8 available on Dryad). The monophyly of the Iberian clade of Linaria subsect. Versicolores (98–100% BS) and its sister relationship to the North African clade were strongly supported. Two major, geographically structured subclades were recovered within the Iberian clade (Figs 2e, 4a, and 5): a clade formed by SAL, BEC and CLE, endemic to the south-eastern Iberian Peninsula (74–94% BS); and a clade formed by VIS, ONU, ALG, SPA, and INC, mostly distributed in the western Iberian Peninsula (100% BS in all analyses). Within the south-eastern Iberian clade, SAL was recovered as sister to a (BEC, CLE) clade. This phylogenetic position of CLE as sister to BEC (94–100% BS) was not recovered from concatenated analyses. Within the western Iberian clade, two sister species pairs found in the NJ$$_{\mathrm{st}}$$ tree were also consistently recovered from concatenated analyses: (VIS, ONU) (100% BS in all analyses) and (SPA, INC) (88–100% BS). ALG was recovered as sister to the (SPA, INC) pair, one of the two alternative positions identified in concatenated analyses. When increasing the minimum taxon coverage parameter to $$m =$$ 10, we obtained the same MR topology for all data sets, although with lower BS values. For $$m =$$ 20, there was a general loss of resolution, including basal polytomies and poor BS values (Supplementary Fig. S8 available on Dryad). SVDquartets analyses produced another topology (Fig. 2f, Supplementary Fig. S8 available on Dryad) that was highly robust to variations in clustering threshold and minimum taxon coverage. The monophyly of the Iberian clade (98–100% BS for analyses with $$m =$$ 4) and its sister relationship to the North African clade were again strongly supported, but there were two differences with the NJ$$_{\mathrm{st}}$$ tree: CLE was consistently recovered as sister to a clade formed by the remaining seven species (100% BS), and SPA was found to form a clade (69–97% BS) with the (VIS, ONU) pair (75–88% BS). The same topology was obtained for all analyses with $$m =$$ 10 and three out of four analyses with $$m =$$ 20 (Supplementary Fig. S8 available on Dryad). Tests of alternative topologies.— The two topologies obtained in previous analyses of ITS$$+$$ptDNA sequences (Topologies 7, 8) were strongly rejected by all tests implemented for all analyzed GBS data sets (Table 3). The SVDquartets tree (Topology 6) was rejected by all tests for the c85m4, c87m4, and c92m4 data sets. None of the remaining five topologies (four concatenated and NJ$$_{\mathrm{st}})$$ were consistently rejected. SH tests could not reject any of these five topologies for any of the data sets, while WSH tests only rejected the NJ$$_{\mathrm{st}}$$ topology (topology 5) for the c92m4 data set. AU tests rejected the NJ$$_{\mathrm{st}}$$ topology for the c87m4 and c92m4 data sets and one of the concatenation-based topologies (topology 2) for the c92m4 data set. All remaining $$P$$-values were nonsignificant (Table 3). Table 3. Results from topology tests in CONSEL Data set  Rank  Topology  Obs.  AU  SH  WSH  c84m4  1  1  –32.5  0.841  0.98  0.985     2  4  32.5  0.466  0.825  0.758     3  2  59.2  0.262  0.752  0.511     4  3  76.7  0.244  0.663  0.57     5  5  164.2  0.102  0.398  0.251     6  6  239.1  0.061  0.207  0.195     7  8  978.7  1.00E-07$$^*$$  $$0^*$$  $$0^*$$     8  7  6270.7  2.00E-08$$^*$$  $$0^*$$  $$0^*$$  c85m4  1  2  –29.9  0.823  0.986  0.991     2  5  29.9  0.365  0.791  0.694     3  3  46.1  0.335  0.746  0.647     4  1  85.1  0.12  0.615  0.294     5  4  137  0.063  0.411  0.247     6  6  456.7  0.001$$^*$$  0.004$$^*$$  0.001$$^*$$     7  8  1055  1.00E-04$$^*$$  $$0^*$$  $$0^*$$     8  7  6398.8  4.00E-05$$^*$$  $$0^*$$  $$0^*$$  c87m4  1  3  –61.4  0.897  0.991  0.997     2  4  61.4  0.205  0.69  0.489     3  2  71.4  0.228  0.655  0.477     4  1  115.5  0.137  0.491  0.356     5  5  182.4  0.015$$^*$$  0.287  0.099     6  6  442.4  1.00E-04$$^*$$  0.005$$^*$$  2.00E-04$$^*$$     7  8  1323.9  1.00E-06$$^*$$  $$0^*$$  $$0^*$$     8  7  6279  2.00E-06$$^*$$  $$0^*$$  $$0^*$$  c92m4  1  4  –75.7  0.89  0.993  0.996     2  1  75.7  0.198  0.64  0.496     3  3  77  0.173  0.639  0.435     4  2  155.7  0.033$$^*$$  0.346  0.174     5  5  268.1  0.005$$^*$$  0.136  0.027$$^*$$     6  6  488.9  4.00E-11$$^*$$  0.004$$^*$$  $$0^*$$     7  8  1282.5  3.00E-24$$^*$$  $$0^*$$  $$0^*$$     8  7  5344.1  5.00E-08$$^*$$  $$0^*$$  $$0^*$$  Data set  Rank  Topology  Obs.  AU  SH  WSH  c84m4  1  1  –32.5  0.841  0.98  0.985     2  4  32.5  0.466  0.825  0.758     3  2  59.2  0.262  0.752  0.511     4  3  76.7  0.244  0.663  0.57     5  5  164.2  0.102  0.398  0.251     6  6  239.1  0.061  0.207  0.195     7  8  978.7  1.00E-07$$^*$$  $$0^*$$  $$0^*$$     8  7  6270.7  2.00E-08$$^*$$  $$0^*$$  $$0^*$$  c85m4  1  2  –29.9  0.823  0.986  0.991     2  5  29.9  0.365  0.791  0.694     3  3  46.1  0.335  0.746  0.647     4  1  85.1  0.12  0.615  0.294     5  4  137  0.063  0.411  0.247     6  6  456.7  0.001$$^*$$  0.004$$^*$$  0.001$$^*$$     7  8  1055  1.00E-04$$^*$$  $$0^*$$  $$0^*$$     8  7  6398.8  4.00E-05$$^*$$  $$0^*$$  $$0^*$$  c87m4  1  3  –61.4  0.897  0.991  0.997     2  4  61.4  0.205  0.69  0.489     3  2  71.4  0.228  0.655  0.477     4  1  115.5  0.137  0.491  0.356     5  5  182.4  0.015$$^*$$  0.287  0.099     6  6  442.4  1.00E-04$$^*$$  0.005$$^*$$  2.00E-04$$^*$$     7  8  1323.9  1.00E-06$$^*$$  $$0^*$$  $$0^*$$     8  7  6279  2.00E-06$$^*$$  $$0^*$$  $$0^*$$  c92m4  1  4  –75.7  0.89  0.993  0.996     2  1  75.7  0.198  0.64  0.496     3  3  77  0.173  0.639  0.435     4  2  155.7  0.033$$^*$$  0.346  0.174     5  5  268.1  0.005$$^*$$  0.136  0.027$$^*$$     6  6  488.9  4.00E-11$$^*$$  0.004$$^*$$  $$0^*$$     7  8  1282.5  3.00E-24$$^*$$  $$0^*$$  $$0^*$$     8  7  5344.1  5.00E-08$$^*$$  $$0^*$$  $$0^*$$  Notes: Four genotyping-by-sequencing data sets were used to test eight alternative topologies in a maximum likelihood framework with concatenated matrices. Tested topologies (summarized in Fig. 3) included the four alternative optimal topologies produced by concatenated analyses of the same four data sets (Topologies 1–4; Fig. 2a–d), the topologies produced by coalescent-based NJ$$_{\mathrm{st}}$$ (Topology 5; Fig. 2e), and SVDquartets (Topology 6; Fig. 2f) analyses of all four data sets and two topologies produced by previously-published analyses of internal transcribed spacer and plastid DNA sequences (Topologies 7, 8). For each data set, topologies are ranked by the test statistics (Obs.), and $$P$$-values of the approximately unbiased (AU), Shimodaira–Hasegawa (SH), and Weighted Shimodaira–Hasegawa (WSH) tests are shown. Asterisks indicate significant results ($$P $$ 0.05). Figure 3. View largeDownload slide Numbers of loci supporting and disfavoring constrained concatenated trees with eight alternative topologies (1–8, left) in four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; right; see Table 2). Topologies 1–4 are the four alternative optimal topologies respectively produced by concatenated analyses of the same four data sets (see Fig. 2a–d). Topology 5 is the one produced by coalescent-based analyses of all four data sets in NJ$$_{\text{st}}$$ (see Fig. 2e). Topology 6 is the one produced by coalescent-based analyses of all four data sets in SVDquartets (see Fig. 2f). Topologies 7 and 8 were produced by previously published concatenated (Topology 7) and coalescent-based (Topology 8) analyses of combined ITS and ptDNA sequences (Fernández-Mazuecos et al. 2013a; Vigalondo et al. 2015). For each data set, numbers of loci significantly supporting and disfavoring each tree topology are plotted against the log-likelihood of the tree when using the full concatenated matrix. Black dots represent the ML tree for each data set. White dots represent constrained trees that were significantly rejected by Shimodaira–Hasegawa tests, and grey dots represent constrained trees that were not significantly rejected (see Table 3). Figure 3. View largeDownload slide Numbers of loci supporting and disfavoring constrained concatenated trees with eight alternative topologies (1–8, left) in four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; right; see Table 2). Topologies 1–4 are the four alternative optimal topologies respectively produced by concatenated analyses of the same four data sets (see Fig. 2a–d). Topology 5 is the one produced by coalescent-based analyses of all four data sets in NJ$$_{\text{st}}$$ (see Fig. 2e). Topology 6 is the one produced by coalescent-based analyses of all four data sets in SVDquartets (see Fig. 2f). Topologies 7 and 8 were produced by previously published concatenated (Topology 7) and coalescent-based (Topology 8) analyses of combined ITS and ptDNA sequences (Fernández-Mazuecos et al. 2013a; Vigalondo et al. 2015). For each data set, numbers of loci significantly supporting and disfavoring each tree topology are plotted against the log-likelihood of the tree when using the full concatenated matrix. Black dots represent the ML tree for each data set. White dots represent constrained trees that were significantly rejected by Shimodaira–Hasegawa tests, and grey dots represent constrained trees that were not significantly rejected (see Table 3). Topologies 7 and 8 were also consistently supported by the smallest numbers of loci and disfavored by the highest numbers of loci (Fig. 3). The ML tree for each concatenated data set was also supported by the highest number and disfavored by the smallest number of loci. The remaining coalescent- and concatenation-based topologies were supported and disfavored by intermediate numbers of loci, with the NJ$$_{\mathrm{st}}$$ tree being consistently supported by more loci than the SVDquartets tree (Fig. 3). Hybridization analyses.— When assuming the NJ$$_{\mathrm{st}}$$ species tree as MVIP in $$D$$-statistic tests, we found clear evidence for an excess of shared derived alleles between BEC and SAL in three of the data sets (24–30% of significant tests depending on the data set) but weaker evidence for an excess of shared derived alleles between SPA and VIS/ONU (8–13% of significant tests) (Table 4). When assuming the SVDquartets species tree as MVIP, the tests (Table 4) found evidence for an excess of shared derived alleles between SPA and INC (24–41% of significant tests), and between BEC and CLE (18–35% of significant tests). Table 4. D-statistic tests conducted on four genotyping-by-sequencing data sets given a four-taxon tree (((P1,P2),P3),O) Data set  Test set  Assumed species tree  P1  P2  P3  O  N  Sig. (ABBA $$>$$ BABA)  Sig. (BABA $$>$$ ABBA)  c84m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  9.26  3.55     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  23.95  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  25.00  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  22.59  0.00  c85m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  8.33  2.31     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  27.16  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  24.07  0.62     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  34.63  0.00  c87m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  11.11  5.40     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  29.88  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  38.27  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  17.96  0.74  c92m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  12.65  2.01     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  5.68  1.73     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  41.36  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  18.89  0.93  Data set  Test set  Assumed species tree  P1  P2  P3  O  N  Sig. (ABBA $$>$$ BABA)  Sig. (BABA $$>$$ ABBA)  c84m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  9.26  3.55     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  23.95  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  25.00  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  22.59  0.00  c85m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  8.33  2.31     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  27.16  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  24.07  0.62     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  34.63  0.00  c87m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  11.11  5.40     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  29.88  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  38.27  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  17.96  0.74  c92m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  12.65  2.01     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  5.68  1.73     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  41.36  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  18.89  0.93  Notes: Three individuals per species of the Iberian clade and four individuals from different species of the North African clade were selected, and introgression hypotheses potentially explaining topological differences between the two species trees obtained in coalescent-based analyses (Fig. 2e,f) were tested. For each set of tests, we show the number of tests conducted using different combinations of individuals ($$N$$) and the percentage of significant tests ($$P < 0.05$$) for ABBA $$>$$BABA (potential introgression between P2 and P3) and BABA $$>$$ABBA (potential introgression between P1 and P3). Evaluation of the NJ$$_{\mathrm{st}}$$ and SVDquartets trees without reticulations in SNaQ (Table 5, Fig. 2g,h) consistently supported the NJ$$_{\mathrm{st}}$$ tree for all analyzed data sets. When including reticulations accounting for incongruences between species tree topologies, three data sets (c85m4, c87m4, and c92m4) supported the NJ$$_{\mathrm{st}}$$ topology as MVIP, while the c84m4 data set supported the SVDquartets topology (although in this case with a small difference in pseudo-deviance values). For the best supported network (with the NJ$$_{\mathrm{st}}$$ tree as MVIP), CLE was estimated to contain a 81–85% genomic contribution from a lineage sister to BEC and 15–19% from an ancestral lineage sister to the other seven species of the clade. Similarly, SPA was estimated to contain a 81–87% contribution from a lineage sister to INC and a 13–19% contribution from a lineage sister to the (VIS, ONU) clade. Table 5. Pseudo-deviance values obtained from tree and network evaluation using the SNaQ method   NJ$$_{\mathrm{st}}$$ tree without reticulations  SVDquartets tree without reticulations  NJ$$_{\mathrm{st}}$$ tree with reticulations  SVDquartets tree with reticulations  c84m4  734.72  1084.15  736.67  732.19  c85m4  758.05  1120.34  761.38  801.18  c87m4  817.74  1201.44  829.88  896.67  c92m4  1547.23  1648.7  1166.42  1406.25    NJ$$_{\mathrm{st}}$$ tree without reticulations  SVDquartets tree without reticulations  NJ$$_{\mathrm{st}}$$ tree with reticulations  SVDquartets tree with reticulations  c84m4  734.72  1084.15  736.67  732.19  c85m4  758.05  1120.34  761.38  801.18  c87m4  817.74  1201.44  829.88  896.67  c92m4  1547.23  1648.7  1166.42  1406.25  Notes: Four reduced data sets with 27 individuals were used to evaluate four fixed topologies, including the NJ$$_{\mathrm{st}}$$ and SVDquartets species tree topologies without reticulations (Fig. 2e,f), the NJ$$_{\mathrm{st}}$$ tree as major vertical inheritance pattern (MVIP) with reticulations representing incongruences with the SVDquartets tree (Fig. 2g), and the SVDquartets tree as MVIP with hybrid edges representing incongruences with the NJ$$_{\mathrm{st}}$$ tree (Fig. 2h). Lower pseudo-deviance values indicate a better fit. Patterns of Morphological Evolution The phylomorphospace based on vegetative and reproductive traits and the NJ$$_{\mathrm{st}}$$ species tree (Fig. 4b) suggested that SAL has retained morphological traits close to those of the common ancestor of the Iberian clade, while other species have experienced extensive morphological change in multiple directions. Changes of great magnitude in opposite directions of the morphospace are inferred during the divergence of pairs of sister species, including BEC/CLE, VIS/ONU and, to a lesser extent, SPA/INC. At the same time, convergence is suggested by the proximity in the morphospace of nonclosely related species pairs, such as INC/ONU, BEC/ALG and SAL/SPA. When focusing on flower shape (Fig. 4c), morphological disparity between sister species was even more striking. The ancestral flower for the clade seems to have had a corolla with a broad tube and relatively long nectar spur (Type I), a basic shape that has been retained by five species in both subclades (SAL, BEC, SPA, VIS and ALG). Speciation in the sister species pairs VIS/ONU and SPA/INC involved extensive change in flower shape along CV1, which independently gave rise to narrow-tubed, long-spurred flowers (Type III) in ONU and INC. By contrast, speciation of the BEC/CLE pair involved extensive change along CV2, producing the short-spurred, broad-tubed flowers (Type II) of CLE. The longest-spurred Type I flowers (with the lowest values of CV2) have evolved independently in BEC (sister to CLE) and ALG. Mapping of the phylogeny onto geographic ranges showed that pairs of morphologically divergent sister species display geographically close or even overlapping distributions (Fig. 5): BEC and CLE are both narrow endemics to the same small region in south-eastern Spain; VIS and ONU occur in south-western Iberia; and the distributions of SPA and INC largely overlap in the central-western Peninsula. MP reconstructions based on the NJ$$_{\mathrm{st}}$$ tree (Supplementary Fig. S9a available on Dryad) unambiguously inferred a single shift from annual to perennial habit (in CLE), and multiple shifts in inflorescence habit, dominant corolla color and corolla shape. ML reconstructions (Supplementary Fig. S9b available on Dryad) produced similar results. Figure 4. View largeDownload slide Patterns of morphological evolution in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. a) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, with dotted lines representing the alternative position of two species in the SVDquartets tree. Flowers of the eight species of the Iberian clade of Linaria subsect. Versicolores are shown (photos by J. Ramírez and M. Fernández-Mazuecos), and mean values of spur length and tube width according to Fernández-Mazuecos et al. (2013a) are plotted. b) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first three canonical discriminant functions of the morphometric analysis based on vegetative and reproductive traits. Each species is represented by the mean values of the coordinates. c) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first two canonical variates of a previously published geometric morphometric analysis of flowers (Fernández-Mazuecos et al. 2013a). Figure 4. View largeDownload slide Patterns of morphological evolution in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. a) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, with dotted lines representing the alternative position of two species in the SVDquartets tree. Flowers of the eight species of the Iberian clade of Linaria subsect. Versicolores are shown (photos by J. Ramírez and M. Fernández-Mazuecos), and mean values of spur length and tube width according to Fernández-Mazuecos et al. (2013a) are plotted. b) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first three canonical discriminant functions of the morphometric analysis based on vegetative and reproductive traits. Each species is represented by the mean values of the coordinates. c) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first two canonical variates of a previously published geometric morphometric analysis of flowers (Fernández-Mazuecos et al. 2013a). Broadly similar patterns of morphological evolution were inferred from the SVDquartets tree (Supplementary Fig. S9c–f available on Dryad), including extensive morphological change associated with speciation events, as well as phenotypic convergence. In phylomorphospace analyses, the main differences resulted from the early-diverging position of CLE in this tree. As a consequence, the extensive morphological change leading to CLE was associated with the first speciation event in the Iberian clade, instead of a more recent divergence with BEC. Discussion GBS, an Effective Method to Resolve Recent Radiations Our genotyping-by-sequencing approach proved successful in generating highly informative genome-wide genetic markers, not only across the recent (Quaternary) radiation of our 8-species study clade, but also across the whole Linaria sect. Versicolores, whose most recent common ancestor has been dated back to the late Miocene-Pliocene (Fernández-Mazuecos and Vargas 2011; Fernández-Mazuecos et al. 2013a). Regardless of the method used for phylogenetic inference, inferred basal relationships (i.e., the monophyly and sister relationship of the Iberian and North African clades of subsect. Versicolores; Figs 2 and 4a) were identical to those obtained using conventional phylogenetic markers. The two major subclades within the Iberian clade supported by coalescent-based analyses of GBS data in NJ$$_{\mathrm{st}}$$ (Fig. 4a) broadly corresponded to those already obtained (with varying support) in a coalescent-based analysis of ITS and ptDNA sequences and in a concatenated analysis of ptDNA sequences (but they were inconsistent with those obtained using ITS sequences alone) (Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015). However, apart from major subclades, previous species-level topologies found little support in our genome-wide data, and comparatively more reliable phylogenetic hypotheses were inferred herein. In addition, genetic structure analyses recovered unprecedented resolution of species boundaries, as shown by the congruence between inferred genetic clusters and morphologically-defined species (Fig. 1, Supplementary Fig. S5 available on Dryad). Therefore, despite the uncertainties remaining (possibly explained by hybridization; see below), our results show that GBS can provide unprecedented systematic and phylogenetic resolution in recent radiations where conventional markers failed, and without the need for previous genomic information (Escudero et al. 2014). Given their relatively simple and inexpensive implementation, GBS and similar methods of the GBS/RAD-Seq family may be universally and routinely applicable by systematic biologists as a NGS-based replacement to conventional phylogenetic approaches (Ree and Hipp 2015; de la Harpe et al. 2017; Hamon et al. 2017). Indeed, after DNA extraction, GBS library preparation can be carried out in 2–3 days using equipment available in most molecular systematic laboratories, and successful results may be obtained by multiplexing c.100 individuals in a single HiSeq Illumina lane (although the allowed level of multiplexing would depend on genome sizes and GC content as well as number of reads per lane). Interestingly, even though recently collected tissue is clearly preferable due to a decline in sequencing success with sample age as a result of DNA degradation, we still obtained $$>$$1000 sequenced loci with herbarium samples as old as 20–30 years (Supplementary Fig. S4 available on Dryad; but see an improved method for highly degraded DNA samples in Suchan et al. 2016). Compared to the RAD-Seq procedure (Baird et al. 2008), which has been more widely used in phylogenetic and population genetic studies in recent years, the GBS protocol of Elshire et al. (2011) has the advantage of a much simpler protocol for library preparation, including fewer purification steps (thus increasing the probability of success with low-starting amounts) and of not requiring accurate size selection of DNA fragments (therefore involving less specialized equipment). The main drawback of GBS, particularly when applied to phylogenetics, is the high proportion of missing data in the assembled data sets (Table 2). This problem is shared by RAD-Seq, although to a lesser extent thanks to size selection. Data sets with lower percentages of missing data may be obtained by increasing the minimum taxon coverage, therefore restricting the final data set to those loci that were sequenced in a high-percentage of individuals. Still, simulations show that accurate phylogenies can be obtained from data sets including vast amounts of missing data (Rubin et al. 2012). Furthermore, increasing the minimum taxon coverage may result in the loss of valuable phylogenetic information due to reduced data set size and a biased representation of the mutation spectrum across sequenced loci (Huang and Knowles 2014). This is consistent with our empirical results, in which the numbers of loci dramatically decreased when increasing the minimum taxon coverage (Table 2), resulting in a loss of phylogenetic resolution in both concatenated and coalescent-based analyses (Supplementary Figs S6 and S8 available on Dryad). Nevertheless, more complete GBS data sets may be obtained, at the same time preventing information loss, by multiplexing smaller numbers of samples per Illumina lane (thus increasing sequencing coverage) and optimizing the quality of starting DNA. Concatenation Versus Coalescent-Based Approaches to GBS Phylogenetics A concatenation approach has been widely used for the analysis of restriction digest-based phylogenomic data sets (e.g., Nadeau et al. 2013; Wagner et al. 2013; Escudero et al. 2014; Hipp et al. 2014; Cavender-Bares et al. 2015). In particular, ML in RAxML is the current standard for concatenated analysis thanks to its high efficiency with large data matrices. However, there are several well-known drawbacks of concatenated analysis (Wagner et al. 2013). Concatenation assumes a shared phylogenetic history for all sequenced genes, and therefore it is inconsistent in the presence of gene tree discordance caused by ILS and hybridization (Kubatko and Degnan 2007; Solís-Lemus et al. 2016). Accuracy of species phylogenies inferred by concatenation relies on the most frequent gene tree topology being the same as the species tree topology, but “anomaly zones” exist under certain conditions (involving short branch lengths) in which the most likely gene tree topology differs from the species tree (Degnan and Rosenberg 2006). Additionally, as phylogenomic data sets grow larger, concatenation is more likely to produce anomalously high statistical support for incorrect topologies as a result of systematic biases (Gadagkar et al. 2005; Kumar et al. 2011). Bias may result from the specification of a single substitution model, which assumes substitution rate homogeneity across the whole data set. Partitioned analysis may prevent this problem, but it may be computationally problematic with high numbers of loci. Issues with orthology determination and alignment can also result in biases (Kumar et al. 2011), which may be alleviated if a sequenced genome is available as reference for data assembly. Finally, another well-known cause of bias is long-branch attraction, which may be mitigated by a complete taxonomic sampling and by avoiding those methods that are more sensitive to it, such as parsimony (Bergsten 2005). Analyses of our GBS data sets highlight such shortcomings of concatenation. The high statistical support, as measured by both ML bootstrap and Bayesian posterior probabilities, for alternative topologies (Fig. 2a–d) is a sign of systematic bias (Kumar et al. 2011) resulting from the different clustering thresholds. This is true for both concatenated full-sequence and SNP matrices (Supplementary Figs S6 and S7 available on Dryad). Indeed, changes in the clustering threshold produce contrasting optimal topologies (with alternative phylogenetic positions for two species) as a result of significant changes in the numbers of loci supporting and disfavoring alternative trees (Fig. 3). Interestingly, in contrast to the misleading confidence suggested by BS and PP values, topology tests revealed no such certainty, as they were generally unable to reject topologies produced by alternative GBS data sets (they did reject previously-published topologies based on only two loci; Table 3). This bias potentially introduced during data assembly at the crucial steps of orthology determination casts doubts on any concatenation analysis based on a fixed clustering threshold. While methods have been proposed to optimize clustering parameters (Mastretta-Yanes et al. 2015), the significant topological changes that we found associated with small changes in the clustering threshold (as small as 1%; Fig. 2a–d) indicate that multiple analyses based on a range of values (Takahashi et al. 2014; Leaché et al. 2015b), combined with topology tests, are still necessary to evaluate if high clade support values provide a realistic measurement of confidence. In contrast to concatenation, gene-tree-based coalescent methods have been less frequently used for the inference of species trees from restriction digest-based NGS data. The short length of sequenced loci when analyzing single-end data (c. 100–200 bp) may result in poorly informative gene trees based on individual loci, which may be a problem for species tree inference (Salichos and Rokas 2013; Mirarab et al. 2016), and full probabilistic methods co-estimating gene trees and species trees (Liu 2008; Heled and Drummond 2010) are generally unable to handle large NGS data sets (Wagner et al. 2013). Unlike full probabilistic methods, summary methods that operate by combining previously estimated gene trees (e.g. Liu and Yu 2011; Mirarab et al. 2014) are computationally efficient with large data sets. Even with short loci, one of these methods (NJ$$_{\mathrm{st}})$$ allowed us to infer a resolved species tree (Figs 2e and 4a), showing that the implementation of summary methods is computationally feasible with GBS and RAD-Seq data (see also Ebel et al. 2015; DaCosta and Sorenson 2016). Moreover, the same species tree was produced by this coalescent method from data sets assembled using different clustering thresholds that produced contrasting topologies in concatenation analyses and, furthermore, the NJ$$_{\mathrm{st}}$$ topology could not be rejected by a majority of topology tests using the concatenated data sets (Table 3). The NJ$$_{\mathrm{st}}$$ topology was similar to the concatenation-based topologies, with one of the problematic species, ALG, consistently recovered at one of the two alternative positions of the concatenated analyses (although with varying support). By contrast, the other problematic species, CLE, was recovered as sister to BEC, a relationship not found in concatenated analyses but biogeographically reasonable, given the close proximity of the two species’ narrow distributions (Fig. 5). All other species remained at the phylogenetic placements found in concatenation analyses. Figure 5. View largeDownload slide Biogeographic patterns in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. The NJ$$_{\text{st}}$$ species tree is projected onto the geographic distribution of the eight species in the Iberian Peninsula. Dashed lines at the limit between L. spartea and L. viscosa indicate uncertainty on distribution ranges. Figure 5. View largeDownload slide Biogeographic patterns in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. The NJ$$_{\text{st}}$$ species tree is projected onto the geographic distribution of the eight species in the Iberian Peninsula. Dashed lines at the limit between L. spartea and L. viscosa indicate uncertainty on distribution ranges. As an alternative to gene-tree-based methods, coalescent-based approaches that bypass the inference of gene trees have been proposed (Bryant et al. 2012; Chifman and Kubatko 2014) and applied to RAD-Seq data sets (DaCosta and Sorenson 2016; Manthey et al. 2016). We tested the performance of the SVDquartets method, which infers relationships among quartets of taxa using algebraic statistics (Chifman and Kubatko 2014). While poor resolution and support values were recovered when analyzing unlinked SNP matrices (Supplementary Fig. S8 available on Dryad), analyses of full-sequence matrices produced a well-supported topology that was highly robust to variations in assembly parameters (Fig. 2f, Supplementary Fig. S8 available on Dryad), but different to the NJ$$_{\mathrm{st}}$$ topology. In the SVDquartets tree, CLE was sister to the remaining seven species, a position encountered in some concatenated analyses. SPA was related to VIS and ONU, a position only obtained for one concatenated data set (c84m20; Supplementary Fig. S6 available on Dryad), but consistent with the close proximity of these tree species in the SNP-based PCA (Fig. 1c). Robustness of phylogenetic patterns across methods and data is considered crucial in phylogenomics (Kumar et al. 2011). The constancy of the NJ$$_{\mathrm{st}}$$ and SVDquartets species trees across data sets indicates that coalescent analyses may be more robust than concatenation to systematic bias introduced during data set assembly. This result adds to the list of advantages of coalescent methods over concatenation, including their accuracy in the presence of gene tree discordance caused by ILS (Mirarab et al. 2016) and their more realistic measures of clade statistical support (Giarla and Esselstyn 2015). The question remains as to the robustness of our results to the violation of the main assumption of most coalescent methods, i.e. the absence of interspecific gene flow. Indeed, historical hybridization may be behind the contrasting topologies of the NJ$$_{\mathrm{st}}$$ and SVDquartets trees. We minimized the effect of recent hybridization by excluding individuals suggested as hybrids by genetic structure analyses. However, $$D$$-statistic tests (Table 4) still supported ancestral hybridization events during diversification of the clade, as indicated by significant asymmetries in the sharing of derived alleles between nonsister species when using either tree as the major vertical inheritance pattern. These results show that a fully bifurcating species tree is an oversimplified representation of the diversification of young clades. Simulations indicate that coalescent methods, including NJ$$_{\mathrm{st}}$$, are able to infer the correct dominant species tree in the presence of low levels of gene flow, given a sufficiently large number of genes (Solís-Lemus et al. 2016). With high levels of gene flow, coalescent methods may not be able to recover the true dominant topology, but they are still more accurate than concatenation (Solís-Lemus et al. 2016). We propose that both coalescent-based topologies (and possibly the concatenation-based topologies to an extent) obtained from our GBS data represent different aspects of a reticulate history of diversification (Fig. 4a). At the same time, we favor the NJ$$_{\mathrm{st}}$$ tree over the SVDquartets tree as the current best estimate of the major pattern of speciation based on the following lines of evidence: (i) the NJ$$_{\mathrm{st}}$$ tree generally fitted the data better in SNaQ analyses, both with and without hybridization (Table 5); (ii) the SVDquartets tree was more frequently rejected in topology tests (Table 3); and (iii) the NJ$$_{\mathrm{st}}$$ tree was supported by more gene trees (Fig. 3). In any case, we cannot minimize the potential role of homoploid hybrid speciation, already suggested for other Linaria clades (Blanco-Pastor et al. 2012; see also Nieto Feliner et al. 2017). The origin of L. clementei is particularly intriguing, as it may have involved hybridization between an early-diverging lineage and a recently derived one (probably with a higher contribution of the latter according to SNaQ analyses; Table 5; Fig. 2g). Phenotypic Divergence and Convergence during Speciation Our combination of multivariate morphometric, genetic structure, and coalescent-based phylogenetic analyses for the first time provide a well-resolved picture of systematics and patterns of morphological evolution in a Mediterranean plant lineage, the Iberian clade of Linaria subsect. Versicolores. The data support a systematic hypothesis with eight morphologically and genetically distinct species. Taxonomic notes are provided in Supplementary Appendix S1 available on Dryad. Our results are essentially consistent with previous taxonomic revisions (Sutton 1988; Sáez and Bernal 2009), except that L. salzmannii had been generally considered a subspecies of L. viscosa (L. viscosa subsp. spicata) but is now supported as a distinct species as a result of both taxa being independently derived in all phylogenetic analyses (Figs 2 and 4a; see also Blanca et al. 2017). A flower color polymorphism within L. salzmannii (Boissier 1841; Sáez and Bernal 2009) is confirmed, as shown by the morphological and genetic similarity of the yellow- and purple-flowered morphs, and L. becerrae is well supported as distinct from L. salzmannii (Blanca et al. 2017). The morphologically similar (particularly in flower traits) L. spartea and L. viscosa are established as genetically distinct and nonsister species (Supplementary Fig. S5 available on Dryad, Figs 2 and 4a), although further population genetic sampling would be required to determine the exact limits of their geographic ranges. Finally, plants collected in the locus classicus of L. viscosa subsp. crassifolia are found to fall within the morphological and genetic variation of L. spartea (Fig. 1, Supplementary Figs S5 and S3 available on Dryad). Mapping of morphological traits and geography onto the NJ$$_{\mathrm{st}}$$ phylogeny (which we consider a good estimate of major speciation events) shows a pattern in which closely related species have geographically close or overlapping distributions, but they are strikingly divergent in phenotypic traits, and potentially ecological interactions (Figs 4 and 5, Supplementary Fig. S9a,b available on Dryad). At the same time, unrelated species have evolved convergent phenotypes in different geographic regions. This pattern has been found in large scale replicated island radiations (Mahler et al. 2013), but is documented here for a smaller scale radiation in a continental setting (see Hughes and Eastwood 2006; Mittelbach and Schemske 2015). The deeper splits show a geographic pattern, reflecting the progressive colonization of the Iberian Peninsula during the Quaternary (Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015). The major phylogenetic split between south-eastern and western species coincides with a phylogeographic discontinuity found for ptDNA markers (Fernández-Mazuecos and Vargas 2015). There is no clear pattern of morphological differentiation linked to this split, as indicated by the phylomorphospace based on both vegetative and reproductive characters (Fig. 4b). In contrast, extensive morphological divergence is associated with recent splits between sister species (Fig. 4b,c, Supplementary Fig. S9a,b available on Dryad). This is particularly true for floral traits related to pollinator specialization (Fenster et al. 2004), including corolla color, corolla shape, nectar spur length and tube width. Three recent splits between sister species are revealed by the phylogeny. One of them, in the south-eastern subclade, has led to the divergent flower morphologies of L. clementei and L. becerrae, respectively displaying the shortest and one of the two longest nectar spurs in the clade (Fig. 4). In addition, L. clementei is the only perennial species in the clade, which adds to the singularity of this species (see also DFA results; Fig. 1b) and its phenotypic divergence with the annual L. becerrae. The two other recent speciation events, in the western subclade, L. viscosa—L. onubensis in southwestern Iberia and L. spartea—L. incarnata in central Iberia, also represent high degrees of phenotypic divergence. Interestingly, they have both followed nearly identical pathways of floral change in different geographic regions, including divergence in corolla color (yellow in L. viscosa and L. spartea, and purple-dominated in their sisters L. onubensis and L. incarnata), corolla tube width (broad in L. viscosa and L. spartea, narrow in L. onubensis and L. incarnata), and overall flower shape (with parallel divergence along the first axis of the geometric morphometric analysis) (Fig. 4, Supplementary Fig. S9a,b available on Dryad). The most likely floral ancestral morphology for the western subclade, and for the two pairs of sister species, corresponds to a purple corolla with broad tube and type I shape. L. algarviana is the only species in the subclade that has retained this ancestral morphology (except for its elongated nectar spur), while convergent changes have happened in the two other pairs of sister species: a yellow corolla has independently evolved in L. spartea and L. viscosa from purple-flowered ancestors, and a narrow-tubed, type III corolla has independently evolved in L. onubensis and L. incarnata from broad-tubed, Type I ancestors (Fig. 4, Supplementary Fig. S9a,b available on Dryad). Broad- and narrow-tubed flowers have been suggested as distinct evolutionary optima representing divergent strategies of pollen placement on nectar-feeding insects (Fernández-Mazuecos et al. 2013a). Other instances of phenotypic convergence in geographically disjunct species include the additional shift to a yellow from a purple-dominated corolla in most of the range of L. salzmannii, the independent evolution of dense inflorescences from lax inflorescences in the south-eastern subclade and L. viscosa, and the independent evolution of the longest nectar spurs in L. becerrae and L. algarviana (Fig. 4, Supplementary Fig. S9a,b available on Dryad). General patterns of morphological evolution are similar when analyzing the SVDquartets tree (Supplementary Fig. S9c–f available on Dryad). Even though some of the details are different, morphological divergence between close relatives and convergence between nonclose relatives is still detected, indicating that our main evolutionary inferences are robust to the uncertainty in topology and to hypothesized reticulation events. Similar results have been obtained, although with lower resolution, in the North African clade of Linaria subsect. Versicolores (Fernández-Mazuecos et al. 2013a) and in Linaria sect. Supinae (Blanco-Pastor et al. 2015), pointing to a general pattern in the genus. In summary, our results indicate a higher robustness of coalescent approaches, as compared to concatenation, for GBS-based species phylogenetics. In our study clade, combined roles of geographical and ecological isolation in speciation are suggested, together with a likely role of hybridization. Recent speciation events involve extensive divergence in key traits linked to pollinator interactions, such as the length of the nectar spur, considered a key innovation and a model system for eco-evo-devo studies of plant speciation (Fernández-Mazuecos and Glover 2017). These patterns suggest adaptive, ecological diversification, including cases of “parallel speciation” likely associated with similar isolating barriers and selective pressures in different regions. Supplementary Material Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.mp818. Acknowledgements The authors thank Matthew Dorling for technical assistance; Edwige Moyroud, Marcial Escudero, José Luis Blanco-Pastor, Irene Villa, Isabel Liberal, Yurena Arjona, Sam Brockington, Paul Grabowski, Natasha Elina and Cristina Ariani for their support and useful discussion at different stages of this study; Bruno Santos and Greg Habrych for bioinformatic assistance; the David Baulcombe lab for permission to use laboratory and bioinformatic resources; Alberto Fernández-Mazuecos and Joaquín Ramírez for fieldwork support; José Quiles, Jesús Vílchez, Lucas Gutiérrez, Enrique Sánchez-Gullón, Isabel Marques, and Belén Estébanez for plant materials and/or population locations; the MA, RNG, ABH, SEV, JAEN, BCB, COI, LISU, and SALA herbaria for plant materials; Mats Hjertson, Laurence Loze, Laurent Gautier, and Roxali Bijmoer for their support in solving taxonomic questions; and three anonymous reviewers for their helpful comments. Funding This work was supported by the Marie Curie Intra-European Fellowship LINARIA-SPECIATION (FP7-PEOPLE-2013-IEF, project reference 624396) and an Isaac Newton Trust Research Grant (Trinity College, Cambridge). References Aberer A.J., Kobert K., Stamatakis A. 2014. ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol. Biol. Evol.  31: 2553– 2556. Google Scholar CrossRef Search ADS PubMed  Adler D., Murdoch D. 2016. RGL—3D real-time visualization device system for R. Available from: http://rgl.neoscientists.org/. álvarez I., Wendel J.F. 2003. Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol.  29: 417– 434. Google Scholar CrossRef Search ADS PubMed  Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. Baird N.A., Etter P.D., Atwood T.S., Currey M.C., Shiver A.L., Lewis Z.A., Selker E.U., Cresko W.A., Johnson E.A. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. Plos One  3: e3376. Google Scholar CrossRef Search ADS PubMed  Bergsten J. 2005. A review of long-branch attraction. Cladistics  21: 163– 193. Google Scholar CrossRef Search ADS   Blanca G., Cueto M., Fuentes J. 2017. Linaria becerrae (Plantaginaceae), a new endemic species from the southern Spain, and remarks on what Linaria salzmannii is and is not. Phytotaxa  298: 261– 268. Google Scholar CrossRef Search ADS   Blanco-Pastor J.L., Ornosa C., Romero D., Liberal I.M., Gómez J.M., Vargas P. 2015. Bees explain floral variation in a recent radiation of Linaria. J. Evolution. Biol.  28: 851– 863. Google Scholar CrossRef Search ADS   Blanco-Pastor J.L., Vargas P., Pfeil B.E. 2012. Coalescent simulations reveal hybridization and incomplete lineage sorting in Mediterranean Linaria. Plos One  7: e39089. Google Scholar CrossRef Search ADS PubMed  Boissier P.E. 1841. Linaria Salzmanni. In: Voyage botanique dans le midi de l’Espagne 2(15).  Paris: Gide et Cie. p. 456– 457. Bryant D., Bouckaert R., Felsenstein J., Rosenberg N.A., RoyChoudhury A. 2012. Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol.  29: 1917– 1932. Google Scholar CrossRef Search ADS PubMed  Bryant D., Moulton V. 2004. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol.  21: 255– 265. Google Scholar CrossRef Search ADS PubMed  Buckley T.R., Simon C., Shimodaira H., Chambers G.K. 2001. Evaluating hypotheses on the origin and evolution of the New Zealand alpine cicadas (Maoricicada) using multiple-comparison tests of tree topology. Mol. Biol. Evol.  18: 223– 234. Google Scholar CrossRef Search ADS PubMed  Castro M., Castro S., Loureiro J. 2012. Genome size variation and incidence of polyploidy in Scrophulariaceae sensu lato from the Iberian Peninsula. AoB Plants  2012:pls037. Cavender-Bares J., González-Rodríguez A., Eaton D.A.R., Hipp A.A.L., Beulke A., Manos P.S. 2015. Phylogeny and biogeography of the American live oaks (Quercus subsection Virentes): a genomic and population genetics approach. Mol. Ecol.  24: 3668– 3687. Google Scholar CrossRef Search ADS PubMed  Chifman J., Kubatko L. 2014. Quartet inference from SNP data under the coalescent model. Bioinformatics  30: 3317– 3324. Google Scholar CrossRef Search ADS PubMed  Corander J., Waldmann P., Marttinen P., Sillanpää M.J. 2004. BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics  20: 2363– 2369. Google Scholar CrossRef Search ADS PubMed  Cullings K.W. 1992. Design and testing of a plant-specific PCR primer for ecological and evolutionary studies. Mol. Ecol.  1: 233– 240. Google Scholar CrossRef Search ADS   DaCosta J.M., Sorenson M.D. 2016. ddRAD-seq phylogenetics based on nucleotide, indel, and presence-absence polymorphisms: analyses of two avian genera with contrasting histories. Mol. Phylogenet. Evol.  94: 122– 135. Google Scholar CrossRef Search ADS PubMed  Davey J.W., Hohenlohe P.A., Etter P.D., Boone J.Q., Catchen J.M., Blaxter M.L. 2011. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat. Rev. Genet.  12: 499– 510. Google Scholar CrossRef Search ADS PubMed  de la Harpe M., Paris M., Karger D., Rolland J., Kessler M., Salamin N., Lexer C. 2017. Molecular ecology studies of species radiations: current research gaps, opportunities, and challenges. Mol. Ecol.  26: 2608– 2622. Google Scholar CrossRef Search ADS PubMed  Degnan J.H., Rosenberg N.A. 2006. Discordance of species trees with their most likely gene trees. PLOS Genet.  2: e68. Google Scholar CrossRef Search ADS PubMed  Doyle J.J., Doyle J.L. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull.  19: 11– 15. Durand E.Y., Patterson N., Reich D., Slatkin M. 2011. Testing for ancient admixture between closely related populations. Mol. Biol. Evol.  28: 2239– 2252. Google Scholar CrossRef Search ADS PubMed  Eaton D.A.R. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics  30: 1844– 1849. Google Scholar CrossRef Search ADS PubMed  Eaton D.A.R., Hipp A.L., González-Rodríguez A., Cavender-Bares J. 2015. Historical introgression among the American live oaks and the comparative nature of tests for introgression. Evolution  69: 2587– 2601. Google Scholar CrossRef Search ADS PubMed  Eaton D.A.R., Spriggs E.L., Park B., Donoghue M.J. 2016. Misconceptions on missing data in RAD-seq phylogenetics with a deep-scale example from flowering plants. Syst. Biol.  66: 399– 412. Ebel E.R., DaCosta J.M., Sorenson M.D., Hill R.I., Briscoe A.D., Willmott K.R., Mullen S.P. 2015. Rapid diversification associated with ecological specialization in Neotropical Adelpha butterflies. Mol. Ecol.  24: 2392– 2405. Google Scholar CrossRef Search ADS PubMed  Elshire R.J., Glaubitz J.C., Sun Q., Poland J.A., Kawamoto K., Buckler E.S., Mitchell S.E. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. Plos One  6: e19379. Google Scholar CrossRef Search ADS PubMed  Escudero M., Eaton D.A.R., Hahn M., Hipp A.L. 2014. Genotyping-by-sequencing as a tool to infer phylogeny and ancestral hybridization: a case study in Carex (Cyperaceae). Mol. Phylogenet. Evol.  79: 359– 367. Google Scholar CrossRef Search ADS PubMed  Farrington H.L., Lawson L.P., Clark C.M., Petren K. 2014. The evolutionary history of Darwin’s finches: speciation, gene flow, and introgression in a fragmented landscape. Evolution  68: 2932– 2944. Google Scholar CrossRef Search ADS PubMed  Fenster C.B., Armbruster W.S., Wilson P., Dudash M.R., Thomson J.D. 2004. Pollination syndromes and floral specialization. Annu. Rev. Ecol. Evol. Systemat.  35: 375– 403. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Blanco-Pastor J.L., Gómez J.M., Vargas P. 2013a. Corolla morphology influences diversification rates in bifid toadflaxes (Linaria sect. Versicolores). Ann. Bot.  112: 1705– 1722. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Blanco-Pastor J.L., Vargas P. 2013b. A phylogeny of toadflaxes (Linaria Mill.) based on nuclear internal transcribed spacer sequences: systematic and evolutionary consequences. Int. J. Plant Sci.  174: 234– 249. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Glover B.J. 2017. The evo-devo of plant speciation. Nat. Ecol. Evol.  1: 0110. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Vargas P. 2011. Historical isolation versus recent long-distance connections between Europe and Africa in bifid toadflaxes (Linaria sect. Versicolores). Plos One  6: e22234. Google Scholar CrossRef Search ADS PubMed  Fernández-Mazuecos M., Vargas P. 2015. Quaternary radiation of bifid toadflaxes (Linaria sect. Versicolores) in the Iberian Peninsula: low taxonomic signal but high geographic structure of plastid DNA lineages. Plant Syst. Evol.  301: 1411– 1423. Google Scholar CrossRef Search ADS   Gadagkar S.R., Rosenberg M.S., Kumar S. 2005. Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J. Exp. Zool. B Mol. Dev. Evol.  304: 64– 74. Google Scholar CrossRef Search ADS PubMed  Giarla T.C., Esselstyn J.A. 2015. The challenges of resolving a rapid, recent radiation: empirical and simulated phylogenomics of Philippine shrews. Syst. Biol.  64: 727– 740. Google Scholar CrossRef Search ADS PubMed  Givnish T.J., Millam K.C., Mast A.R., Paterson T.B., Theim T.J., Hipp A.L., Henss J.M., Smith J.F., Wood K.R., Sytsma K.J. 2009. Origin, adaptive radiation and diversification of the Hawaiian lobeliads (Asterales: Campanulaceae). Proc. Roy. Soc. Lond. B Biol. Sci.  276: 407– 416. Google Scholar CrossRef Search ADS   Gower J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics  27: 857– 874. Google Scholar CrossRef Search ADS   Grabowski P.P., Morris G.P., Casler M.D., Borevitz J.O. 2014. Population genomic variation reveals roles of history, adaptation and ploidy in switchgrass. Mol. Ecol.  23: 4059– 4073. Google Scholar CrossRef Search ADS PubMed  Guzmán B., Lledó M.D., Vargas P. 2009. Adaptive radiation in Mediterranean Cistus (Cistaceae). Plos One  4: e6362. Google Scholar CrossRef Search ADS PubMed  Hamon P., Grover C.E., Davis A.P., Rakotomalala J.J., Raharimalala N.E., Albert V.A., Sreenath H.L., Stoffelen P., Mitchell S.E., Couturon E., Hamon S., de Kochko A., Crouzillat D., Rigoreau M., Sumirat U., Akaffou S., Guyot R. 2017. Genotyping-by-sequencing provides the first well-resolved phylogeny for coffee (Coffea) and insights into the evolution of caffeine content in its species: GBS coffee phylogeny and the evolution of caffeine content. Mol. Phylogenet. Evol.  109: 351– 361. Google Scholar CrossRef Search ADS PubMed  Harvey M.G., Judy C.D., Seeholzer G.F., Maley J.M., Graves G.R., Brumfield R.T. 2015. Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species. PeerJ  3: e895. Google Scholar CrossRef Search ADS PubMed  Harvey M.G., Smith B.T., Glenn T.C., Faircloth B.C., Brumfield R.T. 2016. Sequence capture versus restriction site associated DNA sequencing for shallow systematics. Syst. Biol.  65: 910– 924. Google Scholar CrossRef Search ADS PubMed  Heled J., Drummond A.J. 2010. Bayesian inference of species trees from multilocus data. Mol. Biol. Evol.  27: 570– 580. Google Scholar CrossRef Search ADS PubMed  Hipp A.L. 2014. RADami: R package for phylogenetic analysis of RADseq data. Available from https://cran.r-project.org/web/packages/RADami/index.html. Hipp A.L., Eaton D.A.R., Cavender-Bares J., Fitzek E., Nipper R., Manos P.S. 2014. A framework phylogeny of the American oak clade based on sequenced RAD data. Plos One  9: e93975. Google Scholar CrossRef Search ADS PubMed  Huang H., Knowles L.L. 2014. Unforeseen consequences of excluding missing data from next-generation sequences: simulation study of RAD sequences. Syst. Biol.  65: 357– 365. Google Scholar CrossRef Search ADS PubMed  Hughes C., Eastwood R. 2006. Island radiation on a continental scale: exceptional rates of plant diversification after uplift of the Andes. P. Natl. Acad. Sci. USA  103: 10334– 10339. Google Scholar CrossRef Search ADS   Kampstra P. 2008. Beanplot: a boxplot alternative for visual comparison of distributions. J. Stat. Softw.  28: 1– 9. Google Scholar CrossRef Search ADS PubMed  Kay K.M., Voelckel C., Yang J.Y., Hufford K.M., Kaska D.D., Hodges S.A. 2006. Floral characters and species diversification. In: Harder L.D., Barrett S.C.H., editors. Ecology and evolution of flowers.  Oxford: Oxford University Press p. 311– 325. Kubatko L.S., Degnan J.H. 2007. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst. Biol.  56: 17– 24. Google Scholar CrossRef Search ADS PubMed  Kumar S., Filipski A.J., Battistuzzi F.U., Pond S.L.K., Tamura K. 2011. Statistics and truth in phylogenomics. Mol. Biol. Evol.  29: 457– 472. Google Scholar CrossRef Search ADS PubMed  Leaché A.D., Banbury B.L., Felsenstein J., de Oca A.N.-M., Stamatakis A. 2015a. Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies. Syst. Biol.  64: 1032– 1047. Google Scholar CrossRef Search ADS   Leaché A.D., Chavez A.S., Jones L.N., Grummer J.A., Gottscho A.D., Linkem C.W. 2015b. Phylogenomics of phrynosomatid lizards: conflicting signals from sequence capture versus restriction site associated DNA sequencing. Genome Biol. Evol.  7: 706– 719. Google Scholar CrossRef Search ADS   Lemmon E.M., Lemmon A.R. 2013. High-throughput genomic data in systematics and phylogenetics. Annu. Rev. Ecol. Evol. Systemat.  44: 99– 121. Google Scholar CrossRef Search ADS   Li H., Ruan J., Durbin R. 2008. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res.  18: 1851– 1858. Google Scholar CrossRef Search ADS PubMed  Liu L. 2008. BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics  24: 2542– 2543. Google Scholar CrossRef Search ADS PubMed  Liu L., Wu S., Yu L. 2015. Coalescent methods for estimating species trees from phylogenomic data. J. Syst. Evol.  53: 380– 390. Google Scholar CrossRef Search ADS   Liu L., Yu L. 2010. Phybase: an R package for species tree analysis. Bioinformatics  26: 962– 963. Google Scholar CrossRef Search ADS PubMed  Liu L., Yu L. 2011. Estimating species trees from unrooted gene trees. Syst. Biol.  60: 661– 667. Google Scholar CrossRef Search ADS PubMed  Maddison W.P., Maddison D.R. 2011. Mesquite: a modular system for evolutionary analysis. Available from http://mesquiteproject.org. Mahler D.L., Ingram T., Revell L.J., Losos J.B. 2013. Exceptional convergence on the macroevolutionary landscape in island lizard radiations. Science  341: 292– 295. Google Scholar CrossRef Search ADS PubMed  Mallo D. 2015. Multi-locus bootstrapping script. Available from https://github.com/adamallo/multi-locus-bootstrapping. Mallo D. 2016. NJstM. Available from https://github.com/adamallo/NJstM. Manthey J.D., Campillo L.C., Burns K.J., Moyle R.G. 2016. Comparison of target-capture and restriction-site associated DNA sequencing for phylogenomics: a test in cardinalid tanagers (Aves, Genus: Piranga). Syst. Biol.  65: 640– 650. Google Scholar CrossRef Search ADS PubMed  Mastretta-Yanes A., Arrigo N., Alvarez N., Jorgensen T.H., Piñero D., Emerson B.C. 2015. Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference. Mol. Ecol. Resour.  15: 28– 41. Google Scholar CrossRef Search ADS PubMed  McCormack J.E., Hird S.M., Zellmer A.J., Carstens B.C., Brumfield R.T. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol. Phylogenet. Evol.  66: 526– 538. Google Scholar CrossRef Search ADS PubMed  Meiklejohn K.A., Faircloth B.C., Glenn T.C., Kimball R.T., Braun E.L. 2016. Analysis of a rapid evolutionary radiation using ultraconserved elements: evidence for a bias in some multispecies coalescent methods. Syst. Biol.  65: 612– 627. Google Scholar CrossRef Search ADS PubMed  Mirarab S., Bayzid M.S., Warnow T. 2016. Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting. Syst. Biol.  65: 366– 380. Google Scholar CrossRef Search ADS PubMed  Mirarab S., Reaz R., Bayzid M.S., Zimmermann T., Swenson M.S., Warnow T. 2014. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics  30: i541– i548. Google Scholar CrossRef Search ADS PubMed  Mittelbach G.G., Schemske D.W. 2015. Ecological and evolutionary perspectives on community assembly. Trends Ecol. Evol.  30: 241– 247. Google Scholar CrossRef Search ADS PubMed  Muschick M., Indermaur A., Salzburger W. 2012. Convergent evolution within an adaptive radiation of cichlid fishes. Curr. Biol.  22: 2362– 2368. Google Scholar CrossRef Search ADS PubMed  Nadeau N.J., Martin S.H., Kozak K.M., Salazar C., Dasmahapatra K.K., Davey J.W., Baxter S.W., Blaxter M.L., Mallet J., Jiggins C.D. 2013. Genome-wide patterns of divergence and gene flow across a butterfly radiation. Mol. Ecol.  22: 814– 826. Google Scholar CrossRef Search ADS PubMed  Nicotra A.B., Chong C., Bragg J.G., Ong C.R., Aitken N.C., Chuah A., Lepschi B., Borevitz J.O. 2016. Population and phylogenomic decomposition via genotyping-by-sequencing in Australian Pelargonium. Mol. Ecol.  25: 2000– 2014. Google Scholar CrossRef Search ADS PubMed  Nieto Feliner G., álvarez I., Fuertes-Aguilar J., Heuertz M., Marques I., Moharrek F., Piñeiro R., Riina R., Rosselló J.A., Soltis P.S., Villa-Machío I. 2017. Is homoploid hybrid speciation that rare? An empiricist’s view. Heredity  118: 513– 516. Google Scholar CrossRef Search ADS PubMed  Oksanen J., Blanchet F.G., Kindt R., Legendre P., O’Hara R.B., Simpson G.L., Solymos P., M.H.H. S., Wagner H. 2011. vegan: Community Ecology Package. R package version 1.17-9.  Available from http://CRAN.R-project.org/package=vegan. Paradis E., Claude J., Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics  20: 289– 290. Google Scholar CrossRef Search ADS PubMed  Peterson B.K., Weber J.N., Kay E.H., Fisher H.S., Hoekstra H.E. 2012. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. Plos One  7: e37135. Google Scholar CrossRef Search ADS PubMed  Poland J.A., Brown P.J., Sorrells M.E., Jannink J.L. 2012. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. Plos One  7: e32253. Google Scholar CrossRef Search ADS PubMed  Pritchard J.K., Stephens M., Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics  155: 945– 959. Google Scholar PubMed  R Development Core Team. 2016. R: A language and environment for statistical computing.  Vienna: R Foundation for Statistical Computing. Ree R.H., Hipp A.L. 2015. Inferring phylogenetic history from restriction site associated DNA (RADseq). In: Hörandl E., Appelhans M.S., editors. Next-generation sequencing in plant systematics.  Königstein: Koeltz Scientific Books p. 181– 204. Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol.  3: 217– 223. Google Scholar CrossRef Search ADS   Roure B., Baurain D., Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol. Biol. Evol.  30: 197– 214. Google Scholar CrossRef Search ADS PubMed  Rubin B.E.R., Ree R.H., Moreau C.S. 2012. Inferring phylogenies from RAD sequence data. Plos One  7: e33394. Google Scholar CrossRef Search ADS PubMed  Sáez L., Bernal M. 2009. Linaria Mill. In: Castroviejo S., Herrero A., Benedí C., Rico E., Güemes J., editors. Flora iberica XIII (Plantaginaceae – Scrophulariaceae).  Madrid: CSIC p. 232– 324. Salichos L., Rokas A. 2013. Inferring ancient divergences requires genes with strong phylogenetic signals. Nature  497: 327– 331. Google Scholar CrossRef Search ADS PubMed  Sang T. 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Crit. Rev. Biochem. Mol.  37: 121– 147. Google Scholar CrossRef Search ADS   Schluter D. 2000. The ecology of adaptive radiation. Oxford: Oxford University Press. Seehausen O. 2006. African cichlid fish: a model system in adaptive radiation research. Proc. Roy. Soc. Lond. B Biol. Sci.  273: 1987– 1998. Google Scholar CrossRef Search ADS   Seo T.K. 2008. Calculating bootstrap probabilities of phylogeny using multilocus sequence data. Mol. Biol. Evol.  25: 960– 971. Google Scholar CrossRef Search ADS PubMed  Shafer A., Peart C.R., Tusso S., Maayan I., Brelsford A., Wheat C.W., Wolf J.B.W. 2016. Bioinformatic processing of RAD-seq data dramatically impacts downstream population genetic inference. Methods Ecol. Evol.  doi: 10.1111/2041-210X.12700. Shaffer H.B., Thomson R.C. 2007. Delimiting species in recent radiations. Syst. Biol.  56: 896– 906. Google Scholar CrossRef Search ADS PubMed  Shaw J., Lickey E.B., Schilling E.E., Small R.L. 2007. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am. J. Bot.  94: 275– 288. Google Scholar CrossRef Search ADS PubMed  Shaw K.L. 2002. Conflict between nuclear and mitochondrial DNA phylogenies of a recent species radiation: what mtDNA reveals and conceals about modes of speciation in Hawaiian crickets. Proc. Natl. Acad. Sci. USA  99: 16122– 16127. Google Scholar CrossRef Search ADS   Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol.  51: 492– 508. Google Scholar CrossRef Search ADS PubMed  Shimodaira H., Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol.  16: 1114– 1116. Google Scholar CrossRef Search ADS   Shimodaira H., Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics  17: 1246– 1247. Google Scholar CrossRef Search ADS PubMed  Sidlauskas B. 2008. Continuous and arrested morphological diversification in sister clades of characiform fishes: a phylomorphospace approach. Evolution  62: 3135– 3156. Google Scholar CrossRef Search ADS PubMed  Solís-Lemus C., Ané C. 2016. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. Plos Genet.  12: e1005896. Google Scholar CrossRef Search ADS PubMed  Solís-Lemus C., Yang M., Ané C. 2016. Inconsistency of species-tree methods under gene flow. Syst. Biol.  65: 843– 851. Google Scholar CrossRef Search ADS PubMed  Stamatakis A. 2006. Phylogenetic models of rate heterogeneity: a high performance computing perspective. Proceedings of the IPDPS2006, Rhodos,  Greece: IEEE. doi: 10.1109/IPDPS.2006.1639535. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics  30: 1312– 1313. Google Scholar CrossRef Search ADS PubMed  Suchan T., Pitteloud C., Gerasimova N.S., Kostikova A., Schmid S., Arrigo N., Pajkovic M., Ronikier M., Alvarez N. 2016. Hybridization capture using RAD probes (hyRAD), a new tool for performing genomic analyses on collection specimens. Plos One  11: e0151651. Google Scholar CrossRef Search ADS PubMed  Sutton D.A. 1988. A revision of the tribe Antirrhineae.  Oxford: Oxford University Press. Swofford D. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sunderland, Massachusetts: Sinauer. Takahashi T., Nagata N., Sota T. 2014. Application of RAD-based phylogenetics to complex relationships among variously related taxa in a species flock. Mol. Phylogenet. Evol.  80: 137– 144. Google Scholar CrossRef Search ADS PubMed  Valdés B. 1970. Taxonomía experimental del género Linaria IV. Reproducción sexual: autogamia y polinización intraespecífica. Bol. R. Soc. Esp. Hist. Nat. Sec. Biol.  68: 79– 89. Valente L.M., Savolainen V., Vargas P. 2010. Unparalleled rates of species diversification in Europe. Proc. Roy. Soc. Lond. B Biol. Sci.  277: 1489– 1496. Google Scholar CrossRef Search ADS   van Orsouw N.J., Hogers R.C.J., Janssen A., Yalcin F., Snoeijers S., Verstege E., Schneiders H., van der Poel H., Van Oeveren J., Verstegen H. 2007. Complexity reduction of polymorphic sequences (CRoPS™): a novel approach for large-scale polymorphism discovery in complex genomes. Plos One  2: e1172. Google Scholar CrossRef Search ADS PubMed  Vargas P., Zardoya R. 2014. The tree of life: evolution and classification of living organisms.  Sunderland, Massachusetts: Sinauer Associates Inc. Venables W.N., Ripley B.D. 2002. Modern applied statistics with S.  4th ed. New York: Springer. Google Scholar CrossRef Search ADS   Vigalondo B., Fernández-Mazuecos M., Vargas P., Sáez L. 2015. Unmasking cryptic species: morphometric and phylogenetic analyses of the Ibero-North African Linaria incarnata complex. Bot. J. Linn. Soc.  177: 395– 417. Google Scholar CrossRef Search ADS   Wagner C.E., Keller I., Wittwer S., Selz O.M., Mwaiko S., Greuter L., Sivasundar A., Seehausen O. 2013. Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Mol. Ecol.  22: 787– 798. Google Scholar CrossRef Search ADS PubMed  Waterhouse A.M., Procter J.B., Martin D.M.A., Clamp M., Barton G.J. 2009. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics  25: 1189– 1191. Google Scholar CrossRef Search ADS PubMed  Whittall J.B., Voelckel C., Kliebenstein D.J., Hodges S.A. 2006. Convergence, constraint and the role of gene expression during adaptive radiation: floral anthocyanins in Aquilegia. Mol. Ecol.  15: 4645– 4657. Google Scholar CrossRef Search ADS PubMed  © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Systematic Biology Oxford University Press

Resolving Recent Plant Radiations: Power and Robustness of Genotyping-by-Sequencing

Loading next page...
 
/lp/ou_press/resolving-recent-plant-radiations-power-and-robustness-of-genotyping-9aOdOklW0j
Publisher
Oxford University Press
Copyright
© The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1063-5157
eISSN
1076-836X
D.O.I.
10.1093/sysbio/syx062
Publisher site
See Article on Publisher Site

Abstract

Abstract Disentangling species boundaries and phylogenetic relationships within recent evolutionary radiations is a challenge due to the poor morphological differentiation and low genetic divergence between species, frequently accompanied by phenotypic convergence, interspecific gene flow and incomplete lineage sorting. Here we employed a genotyping-by-sequencing (GBS) approach, in combination with morphometric analyses, to investigate a small western Mediterranean clade in the flowering plant genus Linaria that radiated in the Quaternary. After confirming the morphological and genetic distinctness of eight species, we evaluated the relative performances of concatenation and coalescent methods to resolve phylogenetic relationships. Specifically, we focused on assessing the robustness of both approaches to variations in the parameter used to estimate sequence homology (clustering threshold). Concatenation analyses suffered from strong systematic bias, as revealed by the high statistical support for multiple alternative topologies depending on clustering threshold values. By contrast, topologies produced by two coalescent-based methods (NJ$$_{\mathrm{st}}$$, SVDquartets) were robust to variations in the clustering threshold. Reticulate evolution may partly explain incongruences between NJ$$_{\mathrm{st}}$$, SVDquartets and concatenated trees. Integration of morphometric and coalescent-based phylogenetic results revealed (i) extensive morphological divergence associated with recent splits between geographically close or sympatric sister species and (ii) morphological convergence in geographically disjunct species. These patterns are particularly true for floral traits related to pollinator specialization, including nectar spur length, tube width and corolla color, suggesting pollinator-driven diversification. Given its relatively simple and inexpensive implementation, GBS is a promising technique for the phylogenetic and systematic study of recent radiations, but care must be taken to evaluate the robustness of results to variation of data assembly parameters. Coalescence, concatenation, genotyping-by-sequencing, Linaria, phylogeny, radiation, RAD-Seq, speciation Recent evolutionary radiations constitute ideal systems to investigate evolution, speciation and adaptation (Schluter 2000; Hughes and Eastwood 2006; Seehausen 2006; Valente et al. 2010; Farrington et al. 2014). To understand the causes and direction of evolutionary change, robust systematic treatments and well-resolved phylogenies are required. However, disentangling species boundaries and phylogenetic relationships within recent evolutionary radiations is a challenge due to the low genetic divergence between species, frequently accompanied by inter-specific gene flow and incomplete lineage sorting (ILS) (Shaw 2002; Shaffer and Thomson 2007). This leads to poor resolution, low support values and short branch lengths in phylogenetic trees, interpreted as signatures of rapid diversification (Valente et al. 2010; Meiklejohn et al. 2016). Although morphological diversity may be considerable, the high frequency of phenotypic convergence further confounds systematics and phylogeny, yet it has high evolutionary interest because it is potentially informative about recurrent adaptive traits and their molecular basis (e.g. Whittall et al. 2006; Muschick et al. 2012). Some authors cast doubt on the possibility of resolving recent radiations in the tree of life because of these difficulties (Vargas and Zardoya 2014), and even state-of-the-art approaches have methodological challenges to overcome (Harvey et al. 2016). In plants, classical approaches to phylogenetic analysis are based on a small number of loci, frequently nuclear ribosomal DNA (nrDNA), plastid DNA (ptDNA) and low-copy number genes (Sang 2002; Álvarez and Wendel 2003; Shaw et al. 2007). This strategy has been proven effective to resolve some recent radiations, and is frequently able to recover major clades and evolutionary patterns (Givnish et al. 2009; Guzmán et al. 2009). However, incongruence between loci that may result from hybridization and ILS, among other causes, often brings about uncertainty regarding species boundaries and species-level relationships (Blanco-Pastor et al. 2012). Under these conditions, the sequencing of a large number of genome-wide genetic markers, accompanied by thorough intraspecific sampling of individuals, is a requirement to resolve systematic and phylogenetic questions. To achieve this, several approaches based on next generation sequencing (NGS) have been proposed in recent years, including sequence capture, transcriptomics, and restriction digest-based methods (Lemmon and Lemmon 2013; McCormack et al. 2013; Harvey et al. 2016). Restriction digest-based NGS methods comprise a number of procedures, including those known as restriction-site associated DNA sequencing (RAD-Seq) and genotyping-by-sequencing (GBS) (van Orsouw et al. 2007; Baird et al. 2008; Davey et al. 2011; Elshire et al. 2011; Peterson et al. 2012; Poland et al. 2012). Despite the plethora of protocols that have been published, a NGS-based method that is universally and routinely applicable by systematic biologists is yet to be established as a replacement to conventional phylogenetic approaches. Such a method should not only be capable of providing genome-wide phylogenetic information but should also be cost- and time-effective, involve relatively simple laboratory protocols, and should not require a sequenced reference genome or previous genomic information (Lemmon and Lemmon 2013; McCormack et al. 2013). Ideally, it should also be successful with fragmented DNA available from biological collections of various ages. In addition to laboratory protocols, uncertainties also remain concerning the bioinformatic analysis of large numbers of genomic loci with potentially conflicting phylogenetic signals. The concatenated approach has frequently been favored because of its analytical speed (Wagner et al. 2013; Hipp et al. 2014). However, simulation studies show that, even when analyzing thousands of loci, concatenated analysis can produce inconsistent results in the presence of ILS (Kubatko and Degnan 2007). Species tree methods based on the multi-species coalescent approach therefore should be preferred to deal with this problem but they may be computationally intensive (reviewed by Liu et al. 2015). In addition, as for concatenation methods, species tree methods may be inconsistent in the presence of hybridization (Solís-Lemus et al. 2016). Coalescent-based phylogenetic network methods, capable of dealing with gene flow, are even more computationally intensive, but more efficient implementations of these are currently in development (Solís-Lemus and Ané 2016). Here we have analyzed species boundaries, phylogenetic relationships, and patterns of morphological evolution during speciation in a small radiation of flowering plants, the Iberian clade of Linaria subsect. Versicolores (Fernández-Mazuecos et al. 2013a). According to current taxonomic treatments, this clade comprises eight taxa endemic or sub-endemic to the Iberian Peninsula, in the western Mediterranean biodiversity hotspot (Sáez and Bernal 2009; Vigalondo et al. 2015; Blanca et al. 2017). This clade constitutes an excellent study system for the understanding of pollinator-driven floral evolution and speciation because it has high phenotypic diversity in floral traits promoting pollinator specialization, including corolla tube width, corolla color and nectar spur length (Fernández-Mazuecos et al. 2013a). Nectar spurs in particular have been considered a key evolutionary innovation, promoting speciation and morphological diversity (Kay et al. 2006; Fernández-Mazuecos and Glover 2017). However, the Iberian clade also displays a number of features that make it a challenging study case from both a systematic and phylogenetic standpoint (Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015; Vigalondo et al. 2015): (i) recent and rapid radiation, dated back to the Quaternary; (ii) potential instances of phenotypic convergence; (iii) incongruence between ptDNA and nrDNA (internal transcribed spacer, ITS) phylogenies; (iv) inconsistent results between concatenation- and coalescent-based phylogenies of combined ptDNA$$+$$ITS data; and (v) lack of monophyly of intraspecific samples and extensive sharing of ptDNA haplotypes across species. As a result, previous attempts to resolve relationships with standard phylogenetic markers were unable to recover a reliable species-level phylogeny for this clade. To generate genome-wide phylogenetic markers, we selected the genotyping-by-sequencing approach first described by Elshire et al. (2011) as it does not require previous genomic information and is relatively simple and inexpensive compared to related restriction digest-based methods. This makes it a good candidate to become a universally applicable NGS replacement for conventional phylogenetic approaches. This method has recently been successfully applied to species phylogenetics and species delimitation (Escudero et al. 2014; Nicotra et al. 2016), but questions remain concerning the analytical approach to GBS data, such as the treatment of missing data (whether loci with large proportions of missing sequences should be excluded or not; Roure et al. 2013; Huang and Knowles 2014; Eaton et al. 2016) or the relative merits of concatenated and coalescent-based approaches to infer species phylogenies (DaCosta and Sorenson 2016). Our objective was to use both morphometric data and genome-wide genetic markers generated by GBS in the Iberian clade of Linaria subsect. Versicolores to: (i) test species limits, thus achieving a robust systematic treatment; (ii) infer a well-supported species phylogeny comparing the relative performances of concatenation and coalescent-based methods; (iii) analyze the patterns of morphological evolution during speciation, particularly those potentially linked to pollinator specialization. The ultimate goal was to establish a unified approach that allows the resolution of the elusive species limits and phylogenetic relationships in a recent radiation. Materials and Methods Study System The eight taxa currently recognized in the Iberian clade of Linaria subsect. Versicolores and acronyms used hereafter in this article are listed in Table 1 (Sáez and Bernal 2009; Vigalondo et al. 2015; Blanca et al. 2017; see Supplementary Appendix S1 available on Dryad at http://dx.doi.org/10.5061/dryad.mp818). One of these taxa has been treated as either a species or a subspecies (L. salzmannii or L. viscosa subsp. spicata), while the other seven have been consistently treated as species. An additional taxon has been described, L. viscosa subsp. crassifolia (CRA) (Sutton 1988), but its taxonomic status is uncertain (Sáez and Bernal 2009). These taxa inhabit grasslands and open shrubby formations on soils developed from diverse lithologies. They are endemic to the Iberian Peninsula, except for SPA, which is also present in southern France. They are diploid species with chromosome number 2$$n$$$$=$$ 12 (Sutton 1988; Sáez and Bernal 2009) and genome sizes around 2C $$=$$ 1.1 pg (Castro et al. 2012), and they all seem to be predominantly allogamous (Valdés 1970). Table 1. Study taxa and summary of plant materials included in morphometric and GBS analyses Taxon$$^{\mathrm{a}}$$  Acronym  Number of sampled individuals, morphometry  Number of sampled individuals, GBS$$^{\mathrm{b}}$$  Number of loci (mean $$\pm$$ SD)$$^{\mathrm{c}}$$  Linaria subsect. Elegantes (2/2)  OUT  —  4/4  2130.296  Linaria subsect. Versicolores, North African clade (6/20)  NAF  —  9/9  2524.618  Linaria subsect. Versicolores, Iberian clade (8/8)  —  —  76/68  3586.1048  $$\qquad$$L. algarviana Chav.  ALG  17  10/10  3219.808  $$\qquad$$L. becerrae Blanca, Cueto & J.Fuentes  BEC  11  8/8  4464.706  $$\qquad$$L. clementei Haens.  CLE  10  9/9  4229.845  $$\qquad$$L. incarnata (Vent.) Spreng.  INC  33  10/10  3526.670  $$\qquad$$L. onubensis Pau  ONU  32  9/8  3805.668  $$\qquad$$L. salzmannii Boiss. ($$=$$L. viscosa subsp. spicata (Kunze) D.A.Sutton)  SAL  21  10/10  3801.875  $$\qquad$$L. spartea (L.) Chaz.  SPA  27  10/8  3213.1218  $$\qquad$$L. viscosa (L.) Chaz.  VIS  22  10/5  2630.1412  Taxon$$^{\mathrm{a}}$$  Acronym  Number of sampled individuals, morphometry  Number of sampled individuals, GBS$$^{\mathrm{b}}$$  Number of loci (mean $$\pm$$ SD)$$^{\mathrm{c}}$$  Linaria subsect. Elegantes (2/2)  OUT  —  4/4  2130.296  Linaria subsect. Versicolores, North African clade (6/20)  NAF  —  9/9  2524.618  Linaria subsect. Versicolores, Iberian clade (8/8)  —  —  76/68  3586.1048  $$\qquad$$L. algarviana Chav.  ALG  17  10/10  3219.808  $$\qquad$$L. becerrae Blanca, Cueto & J.Fuentes  BEC  11  8/8  4464.706  $$\qquad$$L. clementei Haens.  CLE  10  9/9  4229.845  $$\qquad$$L. incarnata (Vent.) Spreng.  INC  33  10/10  3526.670  $$\qquad$$L. onubensis Pau  ONU  32  9/8  3805.668  $$\qquad$$L. salzmannii Boiss. ($$=$$L. viscosa subsp. spicata (Kunze) D.A.Sutton)  SAL  21  10/10  3801.875  $$\qquad$$L. spartea (L.) Chaz.  SPA  27  10/8  3213.1218  $$\qquad$$L. viscosa (L.) Chaz.  VIS  22  10/5  2630.1412  Notes: The final taxonomic treatment is followed, with a previous treatment in brackets for L. salzmannii. The specimens sampled in the locus classicus of L. viscosa subsp. crassifolia (CRA) are included in L. spartea (see text). $$^{\mathrm{a}}$$Sampled/total numbers of species and subspecies for the three major clades are shown in brackets. $$^{\mathrm{b}}$$For each taxon, total number of sampled individuals/number of individuals included in phylogenetic analyses are shown. $$^{\mathrm{c}}$$For each taxon, the mean ($$\pm$$SD) number of loci recovered across individuals and pyRAD assemblies (from clustering threshold 0.80–0.95, with minimum taxon coverage of 4) is shown. Figures calculated after taxonomic reassignment of some individuals resulting from genetic structure analyses. The Iberian clade has been strongly supported as monophyletic by both concatenated and coalescent-based analyses of combined ITS and ptDNA sequences, and it is sister to a mostly North African clade of c. 20 taxa (Supplementary Fig. S1a available on Dryad; Fernández-Mazuecos et al. 2013a; Vigalondo et al. 2015). Both the Iberian and North African clades constitute Linaria subsect. Versicolores, which is in turn sister to Linaria subsect. Elegantes (two species). Both subsections make up Linaria sect. Versicolores, one of the major clades of the toadflax genus (Fernández-Mazuecos et al. 2013b). Divergence between the Iberian and North African clades has been dated back to the Pliocene or early Quaternary, and speciation of the Iberian clade is estimated to have happened in the late Quaternary, during the last million years, with high diversification rates (Supplementary Fig. S1a available on Dryad; Fernández-Mazuecos and Vargas 2011; Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015). Two major geographically-structured ptDNA lineages have been detected in the Iberian clade (Fernández-Mazuecos and Vargas 2015). Morphometric Analysis A morphometric analysis of all taxa included in the Iberian clade was performed following the methods of our previous study of INC and ONU (Vigalondo et al. 2015). A full account of methods is found in Supplementary Appendix S2 available on Dryad. Specimens were obtained from herbaria (Supplementary Appendix S1 available on Dryad) and from the authors’ collections (vouchers in BCB and MA). The selected characters included 24 quantitative and three qualitative variables (Supplementary Table S1 available on Dryad). Ten to thirty-three specimens per taxon were analyzed and one measure per character per specimen was taken. Specimens of two different flower color morphs of SAL (purple and yellow) were included, as well as three specimens collected in the locus classicus of CRA. The final morphometric data set included 166 samples. To assess the morphological affinities between specimens and taxa, principal coordinate analysis (PCoA) was conducted using Gower’s coefficient of similarity for mixed data (Gower 1971). A discriminant function analysis (DFA) was then performed to reveal the main characters contributing to the morphological differentiation of taxa and to test the assignment of specimens to taxa using a cross-validation approach. The R package rgl (Adler and Murdoch 2016) was used to represent the first three axes of both the PCoA and DFA in 3D scatter plots with 95% confidence ellipses of concentration for the eight taxa. Distributions of quantitative characters were summarized in the form of beanplots using the R package beanplot (Kampstra 2008). To evaluate the impact of sample size on the results, we repeated the PCoA and DFA analyses replacing the missing data with mean values, without obtaining significantly different results. We used the basic packages of R v3.2.5 (R Development Core Team 2016), ape (Paradis et al. 2004) and vegan (Oksanen et al. 2011) to construct the PCoA and MASS (Venables and Ripley 2002) for the DFA. Phylogenomic Analysis Specimen sampling and DNA extraction.— We sampled a total of 89 individuals of Linaria. Of these, 76 individuals represented the distribution and morphological variation of all species and subspecies of the Iberian clade of Linaria subsect. Versicolores, including specimens of the two flower color morphs of SAL and one individual of CRA (Table 1, Supplementary Table S2 and Fig. S1b available on Dryad). Thirteen individuals of additional species of Linaria sect. Versicolores were sampled to be used as the outgroup, including the two species of Linaria subsect. Elegantes and six species of the North African clade of Linaria subsect. Versicolores (Table 1, Supplementary Table S2 available on Dryad). Seventy-five individuals were sampled in the field and dried in silica gel between 2006 and 2015, while 14 individuals were obtained from generally older (1970–2006) herbarium specimens (MA, RNG). For each sampled individual, total genomic DNA was extracted from c. 20 mg of leaf tissue using a modified CTAB protocol (Doyle and Doyle 1987; Cullings 1992). DNA concentrations were quantified using a Qubit 2.0 fluorometer (Invitrogen, CA, USA) with the dsDNA Broad-Range Assay Kit. Genotyping-by-sequencing library preparation.— A GBS library was prepared using all DNA samples with the PstI-HF restriction enzyme and following previously published procedures (Elshire et al. 2011; Escudero et al. 2014; Grabowski et al. 2014) with some modifications. A full description of laboratory protocols is found in Supplemenatry Appendix S2 available on Dryad, and a summary is provided here. For each sample, 500 ng (where available) of genomic DNA were combined with 0.6 pmol of a sample specific barcode adapter and 0.6 pmol of common adapter (Supplementary Tables S2 and S3 available on Dryad). Each sample was digested with four units of PstI-HF (NEB, MA, USA) at 37ºC overnight. Adapters were ligated using 400 units of T4 DNA Ligase (NEB, MA, USA) at room temperature for 4 h. We combined 50 ng of each sample, and the pool was then purified with Agencourt AMPure XP (Beckman Coulter, CA, USA). DNA fragments were amplified for 15 PCR cycles starting from 35 ng of DNA and using NEB 2x Taq MasterMix (NEB, MA, USA). The PCR product was purified using different volume ratios of AMPure XP beads to sample, and fragment size distributions were assessed in a 2100 Bioanalyzer (Agilent Technologies, CA, USA; Supplementary Fig. S2 available on Dryad). The library with the optimal profile and concentration (AMPure to sample ratio 0.8:1, concentration 1.87 ng/$$\mu $$l, average size 595 bp) was submitted to BGI-Europe (Copenhagen, Denmark) for 100 bp HiSeq 2000 Illumina sequencing. Quality control of sequencing results was conducted in FastQC 0.11.3 (Andrews 2010). Data assembly and genetic structure.— Assembly of GBS loci was performed using the pyRAD 3.0.61 pipeline (Eaton 2014). The pyRAD procedure consisted of seven sequential steps, with parameters based on those recommended for single-end GBS data in the pyRAD documentation (http://dereneaton.com/software/pyrad/; see Supplementary Appendix S2 available on Dryad for details). Since phylogenetic results are known to be sensitive to the similarity threshold employed for within-sample and across-sample sequence clustering ($$c$$) (Takahashi et al. 2014; Mastretta-Yanes et al. 2015; Shafer et al. 2016), 16 assemblies of GBS loci were generated using a range of clustering thresholds from $$c= 0.80$$ to $$c = 0.95$$. Statistical base calling was conducted following Li et al. (2008), with a minimum depth of coverage of six and a maximum of five Ns in consensus sequences. Filtering of putative paralogs was performed by setting a maximum of two alleles (corresponding to diploid organisms) and eight heterozygous sites per consensus sequence. Loci with a minimum taxon coverage ($$m$$, minimum number of samples with data) of four and a maximum of three individuals with shared polymorphic sites (as a further filter of putative paralogs) were retained. Additional assemblies were generated for particular downstream analyses by excluding specific individuals and varying the minimum taxon coverage ($$m =$$ 10, $$m =$$ 20; see below). Henceforth, assemblies are denoted as cXmY, with X being the clustering threshold and Y being the minimum taxon coverage (Table 2). Table 2. Characteristics of assembled genotyping-by-sequencing data sets generated in pyRAD and used in phylogenetic analyses (81 individuals) Data set  Clustering threshold  Minimum taxon coverage  Number of loci  Concatenated length (bp)  Missing data, %  Number of PICs  c84m4  0.84  4  22,106  2,039,028  85.50  81,001  c84m10  0.84  10  9090  850,257  74.72  57,815  c84m20  0.84  20  3888  367,013  62.18  32,559  c85m4  0.85  4  22,586  2,079,729  85.62  80,502  c85m10  0.85  10  9203  858,302  74.81  57,033  c85m20  0.85  20  3894  366,417  62.24  31,842  c87m4  0.87  4  23,593  2,166,830  85.91  78,488  c87m10  0.87  10  9475  880,128  75.28  54,553  c87m20  0.87  20  3881  363,324  62.65  29,259  c92m4  0.92  4  29,037  2,647,482  86.90  72,514  c92m10  0.92  10  10,738  986,997  76.42  47,175  c92m20  0.92  20  3899  359,814  63.09  22,620  Data set  Clustering threshold  Minimum taxon coverage  Number of loci  Concatenated length (bp)  Missing data, %  Number of PICs  c84m4  0.84  4  22,106  2,039,028  85.50  81,001  c84m10  0.84  10  9090  850,257  74.72  57,815  c84m20  0.84  20  3888  367,013  62.18  32,559  c85m4  0.85  4  22,586  2,079,729  85.62  80,502  c85m10  0.85  10  9203  858,302  74.81  57,033  c85m20  0.85  20  3894  366,417  62.24  31,842  c87m4  0.87  4  23,593  2,166,830  85.91  78,488  c87m10  0.87  10  9475  880,128  75.28  54,553  c87m20  0.87  20  3881  363,324  62.65  29,259  c92m4  0.92  4  29,037  2,647,482  86.90  72,514  c92m10  0.92  10  10,738  986,997  76.42  47,175  c92m20  0.92  20  3899  359,814  63.09  22,620  Notes: Assembly parameters, data set sizes, completeness, and numbers of PICs are indicated. Before starting phylogenetic analyses, we tested the correspondence between morphologically defined taxa and genetic clusters by analysing unlinked single nucleotide polymorphism (SNP) matrices using three approaches: principal component analysis (PCA; Waterhouse et al. 2009), Neighbor-Net phylogenetic network estimation (Bryant and Moulton 2004) and Bayesian clustering (Pritchard et al. 2000; Corander et al. 2004) (see Supplementary Appendix S2 available on Dryad for details). As a result of these analyses, eight individuals of uncertain genetic identity, either because of their putative recent hybrid origin or low-quality sequencing results ($$>$$99% missing data in the final data sets) were excluded from downstream phylogenetic analyses. Concatenation-based phylogenies.— Maximum likelihood (ML) analyses of the 16 full-sequence concatenated data sets obtained under 16 clustering threshold values were implemented in RAxML 8.2.8 (Stamatakis 2014) using the GTRCAT substitution model (Stamatakis 2006) during tree search, followed by evaluation and optimization of the final tree under GTRGAMMA. The four Linaria subsect. Elegantes samples were set as the outgroup. Four alternative species-level topologies were found among the 16 ML trees (see Results). We selected four data sets, each yielding a distinct topology with high bootstrap support (BS) values, for further analysis and testing: c84m4, c85m4, c87m4, and c92m4. For these data sets, we additionally explored the performance of ML analyses based on concatenated SNP matrices (i.e. excluding invariant sites) using two acquisition bias corrections (Leaché et al. 2015a) implemented in RAxML (see Supplementary Appendix S2 available on Dryad for details). The four full-sequence matrices were also analyzed using Bayesian inference (BI), as implemented in ExaBayes 1.4.1 (Aberer et al. 2014), with the GTRGAMMA substitution model. To assess the sensitivity of the results to the minimum taxon coverage, assemblies generated with $$m =$$ 10 and $$m =$$ 20 and the same clustering threshold values (see Table 2) were analyzed by ML in RAxML using the same methods described above. Coalescent-based phylogenies.— It is known that the concat- enated approach may produce spurious phylogenetic results, particularly under high ILS, expected in rapid radiations. Among available gene-tree-based methods accounting for ILS, we selected NJ$$_{\mathrm{st}}$$ (Liu and Yu 2011) because of the following features: it is able to infer the species tree from unrooted gene trees (outgroup samples would be absent from many gene trees in our data set, impeding the rooting of gene trees); it can accommodate missing data; and it can handle allele data from multiple individuals per species. The four selected data sets yielding contrasting topologies in concatenated analyses (c84m4, c85m4, c87m4, and c92m4) were separately analyzed as follows. For all loci showing variability, gene trees were estimated using RAxML with the GTR$$+$$GAMMA substitution model and 100 bootstrap replicates. Fifty multilocus bootstrap replicates (Seo 2008; Mallo 2015) were generated, thus resampling nucleotides within loci, as well as loci within the data set. The NJ$$_{\mathrm{st}}$$ method was implemented on the 50 bootstrapped matrices using the R script NJstM (Mallo 2016), which relies on the phybase package (Liu and Yu 2010). A 50% majority-rule (MR) consensus tree was then built from the 50 bootstrap replicates in PAUP* 4 (Swofford 2002). Additionally, assemblies generated with higher values of $$m$$ ($$m =$$ 10 and $$m =$$ 20) and the same clustering threshold values (see Table 2) were analyzed using the same methods described above, to assess the sensitivity of the results to the minimum taxon coverage. For the same data sets analyzed in NJ$$_{\mathrm{st}}$$, we additionally implemented the quartet-based method SVDquartets (Chifman and Kubatko 2014), also accounting for ILS. This method can handle both unlinked SNP and multi-locus full-sequence data sets. Therefore, we analyzed both types of matrices generated under the same four clustering threshold values and three minimum taxon coverage values. Analyses were run in PAUP* 4 under the multispecies coalescent, evaluating all possible quartets. For each matrix, one hundred bootstrap replicates were conducted, and results were summarized in a 50% MR consensus tree. Tests of alternative topologies.— To examine if alternative tree topologies could be statistically rejected by each of the four selected concatenated data sets (c84m4, c85m4, c87m4 and c92m4), topology tests were conducted. First, for each of the four sequence matrices, constrained ML analyses were conducted in RAxML using the four alternative species-level topologies encountered in unconstrained analyses (Topologies 1–4), the topologies obtained from coalescent-based NJ$$_{\mathrm{st}}$$ (Topology 5) and SVDquartets (Topology 6) analyses and two additional topologies recovered from previous concatenated (Topology 7; Vigalondo et al. 2015) and coalescent-based (Topology 8; Fernández-Mazuecos et al. 2013a) analyses of ITS$$+$$ptDNA sequences. For each matrix, per-site log likelihoods using all constrained ML topologies were computed in RAxML under the GTR$$+$$GAMMA model, re-estimating model parameters for each tree. Then we used CONSEL v0.20 (Shimodaira and Hasegawa 2001) to calculate the $$P$$-values of topology tests, including the approximately unbiased (AU) test (Shimodaira 2002), the Shimodaira–Hasegawa (SH) test (Shimodaira and Hasegawa 1999) and the weighted SH (WSH) test (Buckley et al. 2001). To visualize differences in the numbers of loci supporting alternative topologies, we used the “partitioned RAD” approach (Hipp et al. 2014) implemented in the R package RADami (Hipp et al. 2014). The numbers of loci supporting and disfavoring each of the eight constrained topologies generated for topology tests (see above) were compared for each of the four selected data sets. A set of unique trees was generated for each locus (by pruning the original eight trees to those tips present in each locus), and the log likelihood of each unique tree for each locus was calculated in RAxML under the GTR$$+$$GAMMA model. For each locus, trees were then ranked as supported if they were within two log likelihood units of the best supported tree for that locus, or disfavored if they were within two log likelihood units of the least supported tree for that locus. Finally, for each data set, the number of loci supporting or disfavoring each of the eight alternative trees was plotted against the tree log likelihood calculated using the concatenated matrix. Hybridization analyses.— Coalescent-based species tree analyses provided two alternative topologies (see Results). We explored ancestral hybridization as a potential source of these conflicting results using two approaches: D-statistic tests (Durand et al. 2011; Eaton et al. 2015) and the pseudolikelihood-based phylogenetic network method SNaQ (Species Networks applying Quartets), which incorporates both ILS and hybridization, and employs unrooted gene trees (Solís-Lemus and Ané 2016). Four-taxon $$D$$-statistic tests (ABBA–BABA tests) were conducted in pyRAD, including heterozygous sites (Durand et al. 2011; Eaton et al. 2015). Analyses were performed on the c84m4, c85m4, c87m4, and c92m4 data sets using a set of selected individuals with good-quality sequencing, and assuming either the NJ$$_{\mathrm{st}}$$ or the SVDquartets phylogeny as the species tree (representing the major vertical inheritance pattern, MVIP). For each data set, three sets of tests were conducted exploring the potential role of hybridization on the alternative positions of CLE and SPA in the NJ$$_{\mathrm{st}}$$, SVDquartets and concatenation-based trees (see Supplementary Appendix S2 available on Dryad for details). The SNaQ method (Solís-Lemus and Ané 2016) was implemented using a small set of 27 individuals to reduce computational demand. For each of the c84m4, c85m4, c87m4, and c92m4 assemblies, ML gene trees were generated in RAxML and used as input for SNaQ analyses in PhyloNetworks 0.5.0. Since preliminary searches for an optimal network produced highly inconsistent results in independent runs, we focused on evaluating each of the NJ$$_{\mathrm{st}}$$ and SVDquartets species trees as fixed candidate topologies representing the MVIP, both without reticulation and with reticulation events (hybrid edges) accounting for incongruences with the alternative species tree topology. After optimization, the fit of trees and networks to the data was evaluated based on pseudo-deviance values, and estimated inheritance probabilities (i.e. the proportion of genes contributed by each parental population to a hybrid taxon) were visualized. Patterns of Morphological Evolution Morphometric and phylogenomic data were integrated to investigate patterns of morphological evolution during the radiation of the study clade. A phylomorphospace approach, implemented in the R package phytools (Revell 2012), was used to map the history of morphological diversification and explore the magnitude and direction of morphological changes (Sidlauskas 2008). The coalescent-based MR phylogenies obtained in NJ$$_{\mathrm{st}}$$ and SVDquartets, which provided the most robust phylogenetic hypotheses (see below), were used. After estimating branch lengths by ML (see Supplementary Appendix S2 available on Dryad for details), both trees were projected onto the multivariate morphospace defined by the first three canonical discriminant functions of our DFA of vegetative and reproductive traits, with each taxon represented by the mean values of the functions for all examined individuals. The same approach was used to project the phylogeny into the two-dimensional flower morphospace defined by the geometric morphometric study of Fernández-Mazuecos et al. (2013a), based on canonical variate (CV) analysis of landmark data. For comparison with geographic patterns of speciation, the phylogeny was also mapped onto the geographic distribution of the eight taxa in the Iberian Peninsula. In addition, we investigated shifts in particular morphological characters and instances of phenotypic convergence by conducting ancestral state reconstructions (ASRs) in Mesquite 3.04 (Maddison and Maddison 2011) using the coalescent-based species trees (NJ$$_{\mathrm{st}}$$ and SVDquartets). Maximum parsimony (MP) and ML methods were applied. Four key traits determining morphological diversity in the study clade were analyzed: habit (annual vs. perennial), inflorescence density (lax vs. dense), dominant corolla color (purple vs. yellow) and corolla shape. For corolla shape, three types based on geometric morphometric analyses were defined (Fernández-Mazuecos et al. 2013a): type I (broad corolla tube and long nectar spur), type II (broad corolla tube and short nectar spur) and type III (narrow corolla tube and long nectar spur). Results Morphometric Analysis Morphometric analyses revealed the eight taxa in the Iberian clade of Linaria subsect. Versicolores as morphologically distinct units (Fig. 1a,b, Supplementary Fig. S3 and Table S4 available on Dryad). Beanplots (Supplementary Fig. S3 available on Dryad) depicted differences between taxa in particular morphological traits, especially between CLE and other taxa for stem length, flower pedicel length, spur width, and corolla/spur length ratio. The first three coordinates of the PCoA (explaining 54.8% of variance) depicted a complex morphospace (Fig. 1a), with morphological affinities between individuals and taxa consistent with taxonomic expectations, and different degrees of morphological variation within taxa. VIS, SPA, and SAL displayed the highest overall morphological variation, while BEC, CLE and ALG displayed the smallest variation. Individuals of different species did not generally intermingle in the morphospace, despite some overlap of 95% confidence ellipses of concentration. CRA specimens were found to fall within the variation of SPA and were considered as SPA for the DFA analysis. Figure 1. View largeDownload slide Results of morphometric and genetic structure analyses of the eight taxa included in the Iberian clade of Linaria subsect. Versicolores (see Table 1 for the meaning of acronyms). a) Morphospace based on PCoA of all morphometric data (24 quantitative and 3 qualitative variables; 166 individuals); values of the first three coordinates, explaining 54.8% of variance, are represented. b) Morphospace based on DFA of quantitative variables; values of the first three canonical discriminant functions, explaining 79.5% of variation among groups, are represented. c) Principal component analysis based on the unlinked single nucleotide polymorphism (SNP) matrix obtained from the c85m4 data set (5265 SNPs; 76 individuals); values of coordinates 1, 2, and 4, explaining 72.1% of variance, are represented. In all panels, 95% confidence ellipses of concentration for each of the eight species are shown, and arrows indicate CRA individuals (ultimately assigned to SPA; see text). Figure 1. View largeDownload slide Results of morphometric and genetic structure analyses of the eight taxa included in the Iberian clade of Linaria subsect. Versicolores (see Table 1 for the meaning of acronyms). a) Morphospace based on PCoA of all morphometric data (24 quantitative and 3 qualitative variables; 166 individuals); values of the first three coordinates, explaining 54.8% of variance, are represented. b) Morphospace based on DFA of quantitative variables; values of the first three canonical discriminant functions, explaining 79.5% of variation among groups, are represented. c) Principal component analysis based on the unlinked single nucleotide polymorphism (SNP) matrix obtained from the c85m4 data set (5265 SNPs; 76 individuals); values of coordinates 1, 2, and 4, explaining 72.1% of variance, are represented. In all panels, 95% confidence ellipses of concentration for each of the eight species are shown, and arrows indicate CRA individuals (ultimately assigned to SPA; see text). The first three canonical discriminant functions of the DFA (explaining 79.5% of variation among groups) clearly discriminated CLE from two other clusters of species, one formed by INC and ONU, and the other formed by ALG, BEC, SPA, SAL and VIS (Fig. 1b; Supplementary Table S5 available on Dryad). The isolation of CLE was mainly the result of significant differences in the ratio corolla/spur length (Supplementary Table S5 and Fig. S3 available on Dryad). In the DFA, 100% of cases were correctly classified in the original species grouping, and 96.98% in the cross-validated cases, with ALG, BEC, and CLE showing 100% of correct classification (Supplementary Table S6 available on Dryad). Phylogenomic Analysis Data assembly and genetic structure.— Illumina sequencing provided c. 155 million raw reads, with a GC content of 43.05% and bases with quality $$>$$Q20 at 96.36%. Assessment in FastQC showed high quality throughout the full 100 bp length of the first reads and little signs of contamination (based on per sequence GC content). Assembly in pyRAD with a minimum taxon coverage of 4 and the full set of individuals resulted in a total number of loci between 21,283 and 36,435 and a concatenated length of 1.97–3.30 Mbp depending on the clustering threshold. Percentages of missing data were high (86.41–88.85%). The number of loci, concatenated length and percentage of missing data increased with the clustering threshold, while the number of parsimony-informative characters (PICs) decreased (Table 2). The steep increase in number of loci for the highest values of clustering threshold suggested “over-splitting,” where orthologous sequences are divided into separate loci (Harvey et al. 2015). The average number of loci per species of the study clade ranged between 2630 (VIS) and 4464 (BEC), and the number of sequenced loci per individual decreased quickly with sample age (Supplementary Fig. S4 available on Dryad), with four herbarium specimens of VIS older than 25 years producing low-quality results. After excluding problematic individuals, similar numbers of loci and percentages of missing data were found when using a minimum taxon coverage of 4. When increasing the minimum taxon coverage, the percentage of missing data improved, but the number of loci, concatenated length and number of PICs decreased dramatically (Table 2). SNP-based genetic structure analyses produced qualitatively similar results across data sets. Genetic clusters broadly corresponding with morphologically defined taxa were recovered by PCA (Fig. 1c), Neighbor-Net (Supplementary Fig. S5a available on Dryad) and Bayesian clustering (Supplementary Fig. S5b available on Dryad) analyses. The CRA individual was consistently included in the SPA cluster. Individuals of the two color morphs of SAL were consistently included in the same genetic cluster. Eight individuals with uncertain genetic affinities were excluded from further analyses, including six individuals of VIS, one of SPA and one of ONU (see Supplementary Appendix S2 available on Dryad for details). Concatenation-based phylogenies.— ML phylogenetic anal- yses of the 16 data sets supported the monophyly of the Iberian clade of Linaria subsect. Versicolores and its sister relationship to the North African clade (Fig. 2a–d, Supplementary Fig. S6 available on Dryad). The eight taxa were recovered as monophyletic groups with 100% BS, and four alternative species-level topologies were obtained (Figs 2a–d and 3). Three pairs of sister species were consistently recovered from all data sets: (BEC,SAL), (VIS,ONU) and (SPA,INC). In contrast, two species displayed two alternative phylogenetic placements each with variable BS: CLE was recovered as either sister to a clade formed by the remaining seven species (8 data sets, e.g. Fig. 2c,d) or sister to the (BEC,SAL) clade (8 data sets, e.g. Fig. 2a,b); and ALG was recovered as either sister to a ((VIS,ONU),(SPA,INC)) clade (10 data sets, e.g. Fig. 2a,d) or sister to the (SPA,INC) clade (6 data sets, e.g. Fig. 2b,c). The c84m4, c85m4, c87m4, and c92m4 data sets were selected as representatives of the four alternative topologies (Topologies 1–4) for further analysis because of their high average BS for species-level divergences. Topologies produced by ML analyses of concatenated SNP matrices fell within the range of topologies produced by full-sequence analyses (Supplementary Fig. S7; see Appendix S2 available on Dryad for details). Figure 2. View largeDownload slide Results from concatenation- and coalescent-based phylogenetic analyses of genotyping-by-sequencing data sets of the Iberian clade of Linaria subsect. Versicolores. a–d) Four alternative maximum-likelihood (ML) phylogenetic trees obtained in RAxML from concatenated analyses of data sets assembled using different clustering thresholds: a) c84m4, b) c85m4, c) c87m4, and d) c92m4 (see Table 2). Numbers above branches are percentage bootstrap values from ML analyses. Numbers below branches are posterior probabilities from Bayesian analyses in ExaBayes. e) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, after estimation of unrooted gene trees by ML in RAxML with 100 bootstrap replicates. The 50% MR consensus of 50 multilocus bootstrap replicates is shown. The same MR topology was obtained from four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; see Table 2). Numbers above branches are multilocus bootstrap support values (in percentage) from the c84m4 and c85m4 data sets. Numbers below branches are multilocus bootstrap support values from the c87m4 and c92m4 data sets. f) Coalescent-based species tree obtained using the SVDquartets method with 100 bootstrap replicates. The 50% MR consensus obtained from the same four data sets is shown. Numbers above branches are bootstrap support values from the c84m4 and c85m4 data sets. Numbers below branches are bootstrap support values from the c87m4 and c92m4 data sets. g and h) Fixed network topologies evaluated in a coalescent with hybridization framework using the SNaQ method: g) a network with the NJ$$_{\text{st}}$$ tree as major vertical inheritance pattern (MVIP) and reticulations representing incongruences with the SVDquartets tree, and h) a network with the SVDquartets tree as MVIP and reticulations representing incongruences with the NJ$$_{\text{st}}$$ tree. The four values attached to hybrid edges represent inheritance probabilities (i.e., the proportions of genes contributed by each parental population to a hybrid taxon) estimated from the c84m4, c85m4, c87m4, and c92m4 data sets, respectively. Figure 2. View largeDownload slide Results from concatenation- and coalescent-based phylogenetic analyses of genotyping-by-sequencing data sets of the Iberian clade of Linaria subsect. Versicolores. a–d) Four alternative maximum-likelihood (ML) phylogenetic trees obtained in RAxML from concatenated analyses of data sets assembled using different clustering thresholds: a) c84m4, b) c85m4, c) c87m4, and d) c92m4 (see Table 2). Numbers above branches are percentage bootstrap values from ML analyses. Numbers below branches are posterior probabilities from Bayesian analyses in ExaBayes. e) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, after estimation of unrooted gene trees by ML in RAxML with 100 bootstrap replicates. The 50% MR consensus of 50 multilocus bootstrap replicates is shown. The same MR topology was obtained from four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; see Table 2). Numbers above branches are multilocus bootstrap support values (in percentage) from the c84m4 and c85m4 data sets. Numbers below branches are multilocus bootstrap support values from the c87m4 and c92m4 data sets. f) Coalescent-based species tree obtained using the SVDquartets method with 100 bootstrap replicates. The 50% MR consensus obtained from the same four data sets is shown. Numbers above branches are bootstrap support values from the c84m4 and c85m4 data sets. Numbers below branches are bootstrap support values from the c87m4 and c92m4 data sets. g and h) Fixed network topologies evaluated in a coalescent with hybridization framework using the SNaQ method: g) a network with the NJ$$_{\text{st}}$$ tree as major vertical inheritance pattern (MVIP) and reticulations representing incongruences with the SVDquartets tree, and h) a network with the SVDquartets tree as MVIP and reticulations representing incongruences with the NJ$$_{\text{st}}$$ tree. The four values attached to hybrid edges represent inheritance probabilities (i.e., the proportions of genes contributed by each parental population to a hybrid taxon) estimated from the c84m4, c85m4, c87m4, and c92m4 data sets, respectively. BI phylogenetic analyses of the c84m4, c85m4, c87m4, and c92m4 data sets recovered MR trees topologically identical to the ML trees obtained from the same data sets, and displaying high posterior probabilities (PP $$\approx $$ 1) for all species divergences (Fig. 2a–d). When increasing the minimum taxon coverage parameter, ML analyses (Supplementary Fig. S6 available on Dryad) displayed varied topologies with a general decrease in statistical support of clades (particularly for $$m =$$ 20). Coalescent-based phylogenies.— Unlike the contrasting topologies obtained from concatenated analyses, coalescent-based NJ$$_{\mathrm{st}}$$ analyses of the c84m4, c85m4, c87m4, and c92m4 data sets produced identical MR topologies, although with variable BS values (Fig. 2e, Supplementary Fig. S8 available on Dryad). The monophyly of the Iberian clade of Linaria subsect. Versicolores (98–100% BS) and its sister relationship to the North African clade were strongly supported. Two major, geographically structured subclades were recovered within the Iberian clade (Figs 2e, 4a, and 5): a clade formed by SAL, BEC and CLE, endemic to the south-eastern Iberian Peninsula (74–94% BS); and a clade formed by VIS, ONU, ALG, SPA, and INC, mostly distributed in the western Iberian Peninsula (100% BS in all analyses). Within the south-eastern Iberian clade, SAL was recovered as sister to a (BEC, CLE) clade. This phylogenetic position of CLE as sister to BEC (94–100% BS) was not recovered from concatenated analyses. Within the western Iberian clade, two sister species pairs found in the NJ$$_{\mathrm{st}}$$ tree were also consistently recovered from concatenated analyses: (VIS, ONU) (100% BS in all analyses) and (SPA, INC) (88–100% BS). ALG was recovered as sister to the (SPA, INC) pair, one of the two alternative positions identified in concatenated analyses. When increasing the minimum taxon coverage parameter to $$m =$$ 10, we obtained the same MR topology for all data sets, although with lower BS values. For $$m =$$ 20, there was a general loss of resolution, including basal polytomies and poor BS values (Supplementary Fig. S8 available on Dryad). SVDquartets analyses produced another topology (Fig. 2f, Supplementary Fig. S8 available on Dryad) that was highly robust to variations in clustering threshold and minimum taxon coverage. The monophyly of the Iberian clade (98–100% BS for analyses with $$m =$$ 4) and its sister relationship to the North African clade were again strongly supported, but there were two differences with the NJ$$_{\mathrm{st}}$$ tree: CLE was consistently recovered as sister to a clade formed by the remaining seven species (100% BS), and SPA was found to form a clade (69–97% BS) with the (VIS, ONU) pair (75–88% BS). The same topology was obtained for all analyses with $$m =$$ 10 and three out of four analyses with $$m =$$ 20 (Supplementary Fig. S8 available on Dryad). Tests of alternative topologies.— The two topologies obtained in previous analyses of ITS$$+$$ptDNA sequences (Topologies 7, 8) were strongly rejected by all tests implemented for all analyzed GBS data sets (Table 3). The SVDquartets tree (Topology 6) was rejected by all tests for the c85m4, c87m4, and c92m4 data sets. None of the remaining five topologies (four concatenated and NJ$$_{\mathrm{st}})$$ were consistently rejected. SH tests could not reject any of these five topologies for any of the data sets, while WSH tests only rejected the NJ$$_{\mathrm{st}}$$ topology (topology 5) for the c92m4 data set. AU tests rejected the NJ$$_{\mathrm{st}}$$ topology for the c87m4 and c92m4 data sets and one of the concatenation-based topologies (topology 2) for the c92m4 data set. All remaining $$P$$-values were nonsignificant (Table 3). Table 3. Results from topology tests in CONSEL Data set  Rank  Topology  Obs.  AU  SH  WSH  c84m4  1  1  –32.5  0.841  0.98  0.985     2  4  32.5  0.466  0.825  0.758     3  2  59.2  0.262  0.752  0.511     4  3  76.7  0.244  0.663  0.57     5  5  164.2  0.102  0.398  0.251     6  6  239.1  0.061  0.207  0.195     7  8  978.7  1.00E-07$$^*$$  $$0^*$$  $$0^*$$     8  7  6270.7  2.00E-08$$^*$$  $$0^*$$  $$0^*$$  c85m4  1  2  –29.9  0.823  0.986  0.991     2  5  29.9  0.365  0.791  0.694     3  3  46.1  0.335  0.746  0.647     4  1  85.1  0.12  0.615  0.294     5  4  137  0.063  0.411  0.247     6  6  456.7  0.001$$^*$$  0.004$$^*$$  0.001$$^*$$     7  8  1055  1.00E-04$$^*$$  $$0^*$$  $$0^*$$     8  7  6398.8  4.00E-05$$^*$$  $$0^*$$  $$0^*$$  c87m4  1  3  –61.4  0.897  0.991  0.997     2  4  61.4  0.205  0.69  0.489     3  2  71.4  0.228  0.655  0.477     4  1  115.5  0.137  0.491  0.356     5  5  182.4  0.015$$^*$$  0.287  0.099     6  6  442.4  1.00E-04$$^*$$  0.005$$^*$$  2.00E-04$$^*$$     7  8  1323.9  1.00E-06$$^*$$  $$0^*$$  $$0^*$$     8  7  6279  2.00E-06$$^*$$  $$0^*$$  $$0^*$$  c92m4  1  4  –75.7  0.89  0.993  0.996     2  1  75.7  0.198  0.64  0.496     3  3  77  0.173  0.639  0.435     4  2  155.7  0.033$$^*$$  0.346  0.174     5  5  268.1  0.005$$^*$$  0.136  0.027$$^*$$     6  6  488.9  4.00E-11$$^*$$  0.004$$^*$$  $$0^*$$     7  8  1282.5  3.00E-24$$^*$$  $$0^*$$  $$0^*$$     8  7  5344.1  5.00E-08$$^*$$  $$0^*$$  $$0^*$$  Data set  Rank  Topology  Obs.  AU  SH  WSH  c84m4  1  1  –32.5  0.841  0.98  0.985     2  4  32.5  0.466  0.825  0.758     3  2  59.2  0.262  0.752  0.511     4  3  76.7  0.244  0.663  0.57     5  5  164.2  0.102  0.398  0.251     6  6  239.1  0.061  0.207  0.195     7  8  978.7  1.00E-07$$^*$$  $$0^*$$  $$0^*$$     8  7  6270.7  2.00E-08$$^*$$  $$0^*$$  $$0^*$$  c85m4  1  2  –29.9  0.823  0.986  0.991     2  5  29.9  0.365  0.791  0.694     3  3  46.1  0.335  0.746  0.647     4  1  85.1  0.12  0.615  0.294     5  4  137  0.063  0.411  0.247     6  6  456.7  0.001$$^*$$  0.004$$^*$$  0.001$$^*$$     7  8  1055  1.00E-04$$^*$$  $$0^*$$  $$0^*$$     8  7  6398.8  4.00E-05$$^*$$  $$0^*$$  $$0^*$$  c87m4  1  3  –61.4  0.897  0.991  0.997     2  4  61.4  0.205  0.69  0.489     3  2  71.4  0.228  0.655  0.477     4  1  115.5  0.137  0.491  0.356     5  5  182.4  0.015$$^*$$  0.287  0.099     6  6  442.4  1.00E-04$$^*$$  0.005$$^*$$  2.00E-04$$^*$$     7  8  1323.9  1.00E-06$$^*$$  $$0^*$$  $$0^*$$     8  7  6279  2.00E-06$$^*$$  $$0^*$$  $$0^*$$  c92m4  1  4  –75.7  0.89  0.993  0.996     2  1  75.7  0.198  0.64  0.496     3  3  77  0.173  0.639  0.435     4  2  155.7  0.033$$^*$$  0.346  0.174     5  5  268.1  0.005$$^*$$  0.136  0.027$$^*$$     6  6  488.9  4.00E-11$$^*$$  0.004$$^*$$  $$0^*$$     7  8  1282.5  3.00E-24$$^*$$  $$0^*$$  $$0^*$$     8  7  5344.1  5.00E-08$$^*$$  $$0^*$$  $$0^*$$  Notes: Four genotyping-by-sequencing data sets were used to test eight alternative topologies in a maximum likelihood framework with concatenated matrices. Tested topologies (summarized in Fig. 3) included the four alternative optimal topologies produced by concatenated analyses of the same four data sets (Topologies 1–4; Fig. 2a–d), the topologies produced by coalescent-based NJ$$_{\mathrm{st}}$$ (Topology 5; Fig. 2e), and SVDquartets (Topology 6; Fig. 2f) analyses of all four data sets and two topologies produced by previously-published analyses of internal transcribed spacer and plastid DNA sequences (Topologies 7, 8). For each data set, topologies are ranked by the test statistics (Obs.), and $$P$$-values of the approximately unbiased (AU), Shimodaira–Hasegawa (SH), and Weighted Shimodaira–Hasegawa (WSH) tests are shown. Asterisks indicate significant results ($$P $$ 0.05). Figure 3. View largeDownload slide Numbers of loci supporting and disfavoring constrained concatenated trees with eight alternative topologies (1–8, left) in four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; right; see Table 2). Topologies 1–4 are the four alternative optimal topologies respectively produced by concatenated analyses of the same four data sets (see Fig. 2a–d). Topology 5 is the one produced by coalescent-based analyses of all four data sets in NJ$$_{\text{st}}$$ (see Fig. 2e). Topology 6 is the one produced by coalescent-based analyses of all four data sets in SVDquartets (see Fig. 2f). Topologies 7 and 8 were produced by previously published concatenated (Topology 7) and coalescent-based (Topology 8) analyses of combined ITS and ptDNA sequences (Fernández-Mazuecos et al. 2013a; Vigalondo et al. 2015). For each data set, numbers of loci significantly supporting and disfavoring each tree topology are plotted against the log-likelihood of the tree when using the full concatenated matrix. Black dots represent the ML tree for each data set. White dots represent constrained trees that were significantly rejected by Shimodaira–Hasegawa tests, and grey dots represent constrained trees that were not significantly rejected (see Table 3). Figure 3. View largeDownload slide Numbers of loci supporting and disfavoring constrained concatenated trees with eight alternative topologies (1–8, left) in four genotyping-by-sequencing data sets assembled using different clustering thresholds (c84m4, c85m4, c87m4, c92m4; right; see Table 2). Topologies 1–4 are the four alternative optimal topologies respectively produced by concatenated analyses of the same four data sets (see Fig. 2a–d). Topology 5 is the one produced by coalescent-based analyses of all four data sets in NJ$$_{\text{st}}$$ (see Fig. 2e). Topology 6 is the one produced by coalescent-based analyses of all four data sets in SVDquartets (see Fig. 2f). Topologies 7 and 8 were produced by previously published concatenated (Topology 7) and coalescent-based (Topology 8) analyses of combined ITS and ptDNA sequences (Fernández-Mazuecos et al. 2013a; Vigalondo et al. 2015). For each data set, numbers of loci significantly supporting and disfavoring each tree topology are plotted against the log-likelihood of the tree when using the full concatenated matrix. Black dots represent the ML tree for each data set. White dots represent constrained trees that were significantly rejected by Shimodaira–Hasegawa tests, and grey dots represent constrained trees that were not significantly rejected (see Table 3). Topologies 7 and 8 were also consistently supported by the smallest numbers of loci and disfavored by the highest numbers of loci (Fig. 3). The ML tree for each concatenated data set was also supported by the highest number and disfavored by the smallest number of loci. The remaining coalescent- and concatenation-based topologies were supported and disfavored by intermediate numbers of loci, with the NJ$$_{\mathrm{st}}$$ tree being consistently supported by more loci than the SVDquartets tree (Fig. 3). Hybridization analyses.— When assuming the NJ$$_{\mathrm{st}}$$ species tree as MVIP in $$D$$-statistic tests, we found clear evidence for an excess of shared derived alleles between BEC and SAL in three of the data sets (24–30% of significant tests depending on the data set) but weaker evidence for an excess of shared derived alleles between SPA and VIS/ONU (8–13% of significant tests) (Table 4). When assuming the SVDquartets species tree as MVIP, the tests (Table 4) found evidence for an excess of shared derived alleles between SPA and INC (24–41% of significant tests), and between BEC and CLE (18–35% of significant tests). Table 4. D-statistic tests conducted on four genotyping-by-sequencing data sets given a four-taxon tree (((P1,P2),P3),O) Data set  Test set  Assumed species tree  P1  P2  P3  O  N  Sig. (ABBA $$>$$ BABA)  Sig. (BABA $$>$$ ABBA)  c84m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  9.26  3.55     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  23.95  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  25.00  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  22.59  0.00  c85m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  8.33  2.31     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  27.16  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  24.07  0.62     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  34.63  0.00  c87m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  11.11  5.40     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  29.88  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  38.27  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  17.96  0.74  c92m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  12.65  2.01     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  5.68  1.73     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  41.36  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  18.89  0.93  Data set  Test set  Assumed species tree  P1  P2  P3  O  N  Sig. (ABBA $$>$$ BABA)  Sig. (BABA $$>$$ ABBA)  c84m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  9.26  3.55     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  23.95  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  25.00  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  22.59  0.00  c85m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  8.33  2.31     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  27.16  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  24.07  0.62     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  34.63  0.00  c87m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  11.11  5.40     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  29.88  0     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  38.27  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  17.96  0.74  c92m4  1  NJ$$_{\mathrm{st}}$$  ALG, INC  SPA  VIS, ONU  BEC, SAL  648  12.65  2.01     2  NJ$$_{\mathrm{st}}$$  CLE  BEC  SAL  VIS, ONU, ALG, SPA, INC  405  5.68  1.73     3  SVDquartets  VIS, ONU  SPA  INC  BEC, SAL  324  41.36  0.31     4  SVDquartets  ALG, INC, SPA, VIS, ONU  BEC  CLE  NAF  540  18.89  0.93  Notes: Three individuals per species of the Iberian clade and four individuals from different species of the North African clade were selected, and introgression hypotheses potentially explaining topological differences between the two species trees obtained in coalescent-based analyses (Fig. 2e,f) were tested. For each set of tests, we show the number of tests conducted using different combinations of individuals ($$N$$) and the percentage of significant tests ($$P < 0.05$$) for ABBA $$>$$BABA (potential introgression between P2 and P3) and BABA $$>$$ABBA (potential introgression between P1 and P3). Evaluation of the NJ$$_{\mathrm{st}}$$ and SVDquartets trees without reticulations in SNaQ (Table 5, Fig. 2g,h) consistently supported the NJ$$_{\mathrm{st}}$$ tree for all analyzed data sets. When including reticulations accounting for incongruences between species tree topologies, three data sets (c85m4, c87m4, and c92m4) supported the NJ$$_{\mathrm{st}}$$ topology as MVIP, while the c84m4 data set supported the SVDquartets topology (although in this case with a small difference in pseudo-deviance values). For the best supported network (with the NJ$$_{\mathrm{st}}$$ tree as MVIP), CLE was estimated to contain a 81–85% genomic contribution from a lineage sister to BEC and 15–19% from an ancestral lineage sister to the other seven species of the clade. Similarly, SPA was estimated to contain a 81–87% contribution from a lineage sister to INC and a 13–19% contribution from a lineage sister to the (VIS, ONU) clade. Table 5. Pseudo-deviance values obtained from tree and network evaluation using the SNaQ method   NJ$$_{\mathrm{st}}$$ tree without reticulations  SVDquartets tree without reticulations  NJ$$_{\mathrm{st}}$$ tree with reticulations  SVDquartets tree with reticulations  c84m4  734.72  1084.15  736.67  732.19  c85m4  758.05  1120.34  761.38  801.18  c87m4  817.74  1201.44  829.88  896.67  c92m4  1547.23  1648.7  1166.42  1406.25    NJ$$_{\mathrm{st}}$$ tree without reticulations  SVDquartets tree without reticulations  NJ$$_{\mathrm{st}}$$ tree with reticulations  SVDquartets tree with reticulations  c84m4  734.72  1084.15  736.67  732.19  c85m4  758.05  1120.34  761.38  801.18  c87m4  817.74  1201.44  829.88  896.67  c92m4  1547.23  1648.7  1166.42  1406.25  Notes: Four reduced data sets with 27 individuals were used to evaluate four fixed topologies, including the NJ$$_{\mathrm{st}}$$ and SVDquartets species tree topologies without reticulations (Fig. 2e,f), the NJ$$_{\mathrm{st}}$$ tree as major vertical inheritance pattern (MVIP) with reticulations representing incongruences with the SVDquartets tree (Fig. 2g), and the SVDquartets tree as MVIP with hybrid edges representing incongruences with the NJ$$_{\mathrm{st}}$$ tree (Fig. 2h). Lower pseudo-deviance values indicate a better fit. Patterns of Morphological Evolution The phylomorphospace based on vegetative and reproductive traits and the NJ$$_{\mathrm{st}}$$ species tree (Fig. 4b) suggested that SAL has retained morphological traits close to those of the common ancestor of the Iberian clade, while other species have experienced extensive morphological change in multiple directions. Changes of great magnitude in opposite directions of the morphospace are inferred during the divergence of pairs of sister species, including BEC/CLE, VIS/ONU and, to a lesser extent, SPA/INC. At the same time, convergence is suggested by the proximity in the morphospace of nonclosely related species pairs, such as INC/ONU, BEC/ALG and SAL/SPA. When focusing on flower shape (Fig. 4c), morphological disparity between sister species was even more striking. The ancestral flower for the clade seems to have had a corolla with a broad tube and relatively long nectar spur (Type I), a basic shape that has been retained by five species in both subclades (SAL, BEC, SPA, VIS and ALG). Speciation in the sister species pairs VIS/ONU and SPA/INC involved extensive change in flower shape along CV1, which independently gave rise to narrow-tubed, long-spurred flowers (Type III) in ONU and INC. By contrast, speciation of the BEC/CLE pair involved extensive change along CV2, producing the short-spurred, broad-tubed flowers (Type II) of CLE. The longest-spurred Type I flowers (with the lowest values of CV2) have evolved independently in BEC (sister to CLE) and ALG. Mapping of the phylogeny onto geographic ranges showed that pairs of morphologically divergent sister species display geographically close or even overlapping distributions (Fig. 5): BEC and CLE are both narrow endemics to the same small region in south-eastern Spain; VIS and ONU occur in south-western Iberia; and the distributions of SPA and INC largely overlap in the central-western Peninsula. MP reconstructions based on the NJ$$_{\mathrm{st}}$$ tree (Supplementary Fig. S9a available on Dryad) unambiguously inferred a single shift from annual to perennial habit (in CLE), and multiple shifts in inflorescence habit, dominant corolla color and corolla shape. ML reconstructions (Supplementary Fig. S9b available on Dryad) produced similar results. Figure 4. View largeDownload slide Patterns of morphological evolution in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. a) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, with dotted lines representing the alternative position of two species in the SVDquartets tree. Flowers of the eight species of the Iberian clade of Linaria subsect. Versicolores are shown (photos by J. Ramírez and M. Fernández-Mazuecos), and mean values of spur length and tube width according to Fernández-Mazuecos et al. (2013a) are plotted. b) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first three canonical discriminant functions of the morphometric analysis based on vegetative and reproductive traits. Each species is represented by the mean values of the coordinates. c) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first two canonical variates of a previously published geometric morphometric analysis of flowers (Fernández-Mazuecos et al. 2013a). Figure 4. View largeDownload slide Patterns of morphological evolution in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. a) Coalescent-based species tree obtained using the NJ$$_{\text{st}}$$ method, with dotted lines representing the alternative position of two species in the SVDquartets tree. Flowers of the eight species of the Iberian clade of Linaria subsect. Versicolores are shown (photos by J. Ramírez and M. Fernández-Mazuecos), and mean values of spur length and tube width according to Fernández-Mazuecos et al. (2013a) are plotted. b) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first three canonical discriminant functions of the morphometric analysis based on vegetative and reproductive traits. Each species is represented by the mean values of the coordinates. c) Phylomorphospace defined by the NJ$$_{\text{st}}$$ phylogeny and the first two canonical variates of a previously published geometric morphometric analysis of flowers (Fernández-Mazuecos et al. 2013a). Broadly similar patterns of morphological evolution were inferred from the SVDquartets tree (Supplementary Fig. S9c–f available on Dryad), including extensive morphological change associated with speciation events, as well as phenotypic convergence. In phylomorphospace analyses, the main differences resulted from the early-diverging position of CLE in this tree. As a consequence, the extensive morphological change leading to CLE was associated with the first speciation event in the Iberian clade, instead of a more recent divergence with BEC. Discussion GBS, an Effective Method to Resolve Recent Radiations Our genotyping-by-sequencing approach proved successful in generating highly informative genome-wide genetic markers, not only across the recent (Quaternary) radiation of our 8-species study clade, but also across the whole Linaria sect. Versicolores, whose most recent common ancestor has been dated back to the late Miocene-Pliocene (Fernández-Mazuecos and Vargas 2011; Fernández-Mazuecos et al. 2013a). Regardless of the method used for phylogenetic inference, inferred basal relationships (i.e., the monophyly and sister relationship of the Iberian and North African clades of subsect. Versicolores; Figs 2 and 4a) were identical to those obtained using conventional phylogenetic markers. The two major subclades within the Iberian clade supported by coalescent-based analyses of GBS data in NJ$$_{\mathrm{st}}$$ (Fig. 4a) broadly corresponded to those already obtained (with varying support) in a coalescent-based analysis of ITS and ptDNA sequences and in a concatenated analysis of ptDNA sequences (but they were inconsistent with those obtained using ITS sequences alone) (Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015). However, apart from major subclades, previous species-level topologies found little support in our genome-wide data, and comparatively more reliable phylogenetic hypotheses were inferred herein. In addition, genetic structure analyses recovered unprecedented resolution of species boundaries, as shown by the congruence between inferred genetic clusters and morphologically-defined species (Fig. 1, Supplementary Fig. S5 available on Dryad). Therefore, despite the uncertainties remaining (possibly explained by hybridization; see below), our results show that GBS can provide unprecedented systematic and phylogenetic resolution in recent radiations where conventional markers failed, and without the need for previous genomic information (Escudero et al. 2014). Given their relatively simple and inexpensive implementation, GBS and similar methods of the GBS/RAD-Seq family may be universally and routinely applicable by systematic biologists as a NGS-based replacement to conventional phylogenetic approaches (Ree and Hipp 2015; de la Harpe et al. 2017; Hamon et al. 2017). Indeed, after DNA extraction, GBS library preparation can be carried out in 2–3 days using equipment available in most molecular systematic laboratories, and successful results may be obtained by multiplexing c.100 individuals in a single HiSeq Illumina lane (although the allowed level of multiplexing would depend on genome sizes and GC content as well as number of reads per lane). Interestingly, even though recently collected tissue is clearly preferable due to a decline in sequencing success with sample age as a result of DNA degradation, we still obtained $$>$$1000 sequenced loci with herbarium samples as old as 20–30 years (Supplementary Fig. S4 available on Dryad; but see an improved method for highly degraded DNA samples in Suchan et al. 2016). Compared to the RAD-Seq procedure (Baird et al. 2008), which has been more widely used in phylogenetic and population genetic studies in recent years, the GBS protocol of Elshire et al. (2011) has the advantage of a much simpler protocol for library preparation, including fewer purification steps (thus increasing the probability of success with low-starting amounts) and of not requiring accurate size selection of DNA fragments (therefore involving less specialized equipment). The main drawback of GBS, particularly when applied to phylogenetics, is the high proportion of missing data in the assembled data sets (Table 2). This problem is shared by RAD-Seq, although to a lesser extent thanks to size selection. Data sets with lower percentages of missing data may be obtained by increasing the minimum taxon coverage, therefore restricting the final data set to those loci that were sequenced in a high-percentage of individuals. Still, simulations show that accurate phylogenies can be obtained from data sets including vast amounts of missing data (Rubin et al. 2012). Furthermore, increasing the minimum taxon coverage may result in the loss of valuable phylogenetic information due to reduced data set size and a biased representation of the mutation spectrum across sequenced loci (Huang and Knowles 2014). This is consistent with our empirical results, in which the numbers of loci dramatically decreased when increasing the minimum taxon coverage (Table 2), resulting in a loss of phylogenetic resolution in both concatenated and coalescent-based analyses (Supplementary Figs S6 and S8 available on Dryad). Nevertheless, more complete GBS data sets may be obtained, at the same time preventing information loss, by multiplexing smaller numbers of samples per Illumina lane (thus increasing sequencing coverage) and optimizing the quality of starting DNA. Concatenation Versus Coalescent-Based Approaches to GBS Phylogenetics A concatenation approach has been widely used for the analysis of restriction digest-based phylogenomic data sets (e.g., Nadeau et al. 2013; Wagner et al. 2013; Escudero et al. 2014; Hipp et al. 2014; Cavender-Bares et al. 2015). In particular, ML in RAxML is the current standard for concatenated analysis thanks to its high efficiency with large data matrices. However, there are several well-known drawbacks of concatenated analysis (Wagner et al. 2013). Concatenation assumes a shared phylogenetic history for all sequenced genes, and therefore it is inconsistent in the presence of gene tree discordance caused by ILS and hybridization (Kubatko and Degnan 2007; Solís-Lemus et al. 2016). Accuracy of species phylogenies inferred by concatenation relies on the most frequent gene tree topology being the same as the species tree topology, but “anomaly zones” exist under certain conditions (involving short branch lengths) in which the most likely gene tree topology differs from the species tree (Degnan and Rosenberg 2006). Additionally, as phylogenomic data sets grow larger, concatenation is more likely to produce anomalously high statistical support for incorrect topologies as a result of systematic biases (Gadagkar et al. 2005; Kumar et al. 2011). Bias may result from the specification of a single substitution model, which assumes substitution rate homogeneity across the whole data set. Partitioned analysis may prevent this problem, but it may be computationally problematic with high numbers of loci. Issues with orthology determination and alignment can also result in biases (Kumar et al. 2011), which may be alleviated if a sequenced genome is available as reference for data assembly. Finally, another well-known cause of bias is long-branch attraction, which may be mitigated by a complete taxonomic sampling and by avoiding those methods that are more sensitive to it, such as parsimony (Bergsten 2005). Analyses of our GBS data sets highlight such shortcomings of concatenation. The high statistical support, as measured by both ML bootstrap and Bayesian posterior probabilities, for alternative topologies (Fig. 2a–d) is a sign of systematic bias (Kumar et al. 2011) resulting from the different clustering thresholds. This is true for both concatenated full-sequence and SNP matrices (Supplementary Figs S6 and S7 available on Dryad). Indeed, changes in the clustering threshold produce contrasting optimal topologies (with alternative phylogenetic positions for two species) as a result of significant changes in the numbers of loci supporting and disfavoring alternative trees (Fig. 3). Interestingly, in contrast to the misleading confidence suggested by BS and PP values, topology tests revealed no such certainty, as they were generally unable to reject topologies produced by alternative GBS data sets (they did reject previously-published topologies based on only two loci; Table 3). This bias potentially introduced during data assembly at the crucial steps of orthology determination casts doubts on any concatenation analysis based on a fixed clustering threshold. While methods have been proposed to optimize clustering parameters (Mastretta-Yanes et al. 2015), the significant topological changes that we found associated with small changes in the clustering threshold (as small as 1%; Fig. 2a–d) indicate that multiple analyses based on a range of values (Takahashi et al. 2014; Leaché et al. 2015b), combined with topology tests, are still necessary to evaluate if high clade support values provide a realistic measurement of confidence. In contrast to concatenation, gene-tree-based coalescent methods have been less frequently used for the inference of species trees from restriction digest-based NGS data. The short length of sequenced loci when analyzing single-end data (c. 100–200 bp) may result in poorly informative gene trees based on individual loci, which may be a problem for species tree inference (Salichos and Rokas 2013; Mirarab et al. 2016), and full probabilistic methods co-estimating gene trees and species trees (Liu 2008; Heled and Drummond 2010) are generally unable to handle large NGS data sets (Wagner et al. 2013). Unlike full probabilistic methods, summary methods that operate by combining previously estimated gene trees (e.g. Liu and Yu 2011; Mirarab et al. 2014) are computationally efficient with large data sets. Even with short loci, one of these methods (NJ$$_{\mathrm{st}})$$ allowed us to infer a resolved species tree (Figs 2e and 4a), showing that the implementation of summary methods is computationally feasible with GBS and RAD-Seq data (see also Ebel et al. 2015; DaCosta and Sorenson 2016). Moreover, the same species tree was produced by this coalescent method from data sets assembled using different clustering thresholds that produced contrasting topologies in concatenation analyses and, furthermore, the NJ$$_{\mathrm{st}}$$ topology could not be rejected by a majority of topology tests using the concatenated data sets (Table 3). The NJ$$_{\mathrm{st}}$$ topology was similar to the concatenation-based topologies, with one of the problematic species, ALG, consistently recovered at one of the two alternative positions of the concatenated analyses (although with varying support). By contrast, the other problematic species, CLE, was recovered as sister to BEC, a relationship not found in concatenated analyses but biogeographically reasonable, given the close proximity of the two species’ narrow distributions (Fig. 5). All other species remained at the phylogenetic placements found in concatenation analyses. Figure 5. View largeDownload slide Biogeographic patterns in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. The NJ$$_{\text{st}}$$ species tree is projected onto the geographic distribution of the eight species in the Iberian Peninsula. Dashed lines at the limit between L. spartea and L. viscosa indicate uncertainty on distribution ranges. Figure 5. View largeDownload slide Biogeographic patterns in the Iberian clade of Linaria subsect. Versicolores based on coalescent-based phylogenetic analysis of genotyping-by-sequencing data. The NJ$$_{\text{st}}$$ species tree is projected onto the geographic distribution of the eight species in the Iberian Peninsula. Dashed lines at the limit between L. spartea and L. viscosa indicate uncertainty on distribution ranges. As an alternative to gene-tree-based methods, coalescent-based approaches that bypass the inference of gene trees have been proposed (Bryant et al. 2012; Chifman and Kubatko 2014) and applied to RAD-Seq data sets (DaCosta and Sorenson 2016; Manthey et al. 2016). We tested the performance of the SVDquartets method, which infers relationships among quartets of taxa using algebraic statistics (Chifman and Kubatko 2014). While poor resolution and support values were recovered when analyzing unlinked SNP matrices (Supplementary Fig. S8 available on Dryad), analyses of full-sequence matrices produced a well-supported topology that was highly robust to variations in assembly parameters (Fig. 2f, Supplementary Fig. S8 available on Dryad), but different to the NJ$$_{\mathrm{st}}$$ topology. In the SVDquartets tree, CLE was sister to the remaining seven species, a position encountered in some concatenated analyses. SPA was related to VIS and ONU, a position only obtained for one concatenated data set (c84m20; Supplementary Fig. S6 available on Dryad), but consistent with the close proximity of these tree species in the SNP-based PCA (Fig. 1c). Robustness of phylogenetic patterns across methods and data is considered crucial in phylogenomics (Kumar et al. 2011). The constancy of the NJ$$_{\mathrm{st}}$$ and SVDquartets species trees across data sets indicates that coalescent analyses may be more robust than concatenation to systematic bias introduced during data set assembly. This result adds to the list of advantages of coalescent methods over concatenation, including their accuracy in the presence of gene tree discordance caused by ILS (Mirarab et al. 2016) and their more realistic measures of clade statistical support (Giarla and Esselstyn 2015). The question remains as to the robustness of our results to the violation of the main assumption of most coalescent methods, i.e. the absence of interspecific gene flow. Indeed, historical hybridization may be behind the contrasting topologies of the NJ$$_{\mathrm{st}}$$ and SVDquartets trees. We minimized the effect of recent hybridization by excluding individuals suggested as hybrids by genetic structure analyses. However, $$D$$-statistic tests (Table 4) still supported ancestral hybridization events during diversification of the clade, as indicated by significant asymmetries in the sharing of derived alleles between nonsister species when using either tree as the major vertical inheritance pattern. These results show that a fully bifurcating species tree is an oversimplified representation of the diversification of young clades. Simulations indicate that coalescent methods, including NJ$$_{\mathrm{st}}$$, are able to infer the correct dominant species tree in the presence of low levels of gene flow, given a sufficiently large number of genes (Solís-Lemus et al. 2016). With high levels of gene flow, coalescent methods may not be able to recover the true dominant topology, but they are still more accurate than concatenation (Solís-Lemus et al. 2016). We propose that both coalescent-based topologies (and possibly the concatenation-based topologies to an extent) obtained from our GBS data represent different aspects of a reticulate history of diversification (Fig. 4a). At the same time, we favor the NJ$$_{\mathrm{st}}$$ tree over the SVDquartets tree as the current best estimate of the major pattern of speciation based on the following lines of evidence: (i) the NJ$$_{\mathrm{st}}$$ tree generally fitted the data better in SNaQ analyses, both with and without hybridization (Table 5); (ii) the SVDquartets tree was more frequently rejected in topology tests (Table 3); and (iii) the NJ$$_{\mathrm{st}}$$ tree was supported by more gene trees (Fig. 3). In any case, we cannot minimize the potential role of homoploid hybrid speciation, already suggested for other Linaria clades (Blanco-Pastor et al. 2012; see also Nieto Feliner et al. 2017). The origin of L. clementei is particularly intriguing, as it may have involved hybridization between an early-diverging lineage and a recently derived one (probably with a higher contribution of the latter according to SNaQ analyses; Table 5; Fig. 2g). Phenotypic Divergence and Convergence during Speciation Our combination of multivariate morphometric, genetic structure, and coalescent-based phylogenetic analyses for the first time provide a well-resolved picture of systematics and patterns of morphological evolution in a Mediterranean plant lineage, the Iberian clade of Linaria subsect. Versicolores. The data support a systematic hypothesis with eight morphologically and genetically distinct species. Taxonomic notes are provided in Supplementary Appendix S1 available on Dryad. Our results are essentially consistent with previous taxonomic revisions (Sutton 1988; Sáez and Bernal 2009), except that L. salzmannii had been generally considered a subspecies of L. viscosa (L. viscosa subsp. spicata) but is now supported as a distinct species as a result of both taxa being independently derived in all phylogenetic analyses (Figs 2 and 4a; see also Blanca et al. 2017). A flower color polymorphism within L. salzmannii (Boissier 1841; Sáez and Bernal 2009) is confirmed, as shown by the morphological and genetic similarity of the yellow- and purple-flowered morphs, and L. becerrae is well supported as distinct from L. salzmannii (Blanca et al. 2017). The morphologically similar (particularly in flower traits) L. spartea and L. viscosa are established as genetically distinct and nonsister species (Supplementary Fig. S5 available on Dryad, Figs 2 and 4a), although further population genetic sampling would be required to determine the exact limits of their geographic ranges. Finally, plants collected in the locus classicus of L. viscosa subsp. crassifolia are found to fall within the morphological and genetic variation of L. spartea (Fig. 1, Supplementary Figs S5 and S3 available on Dryad). Mapping of morphological traits and geography onto the NJ$$_{\mathrm{st}}$$ phylogeny (which we consider a good estimate of major speciation events) shows a pattern in which closely related species have geographically close or overlapping distributions, but they are strikingly divergent in phenotypic traits, and potentially ecological interactions (Figs 4 and 5, Supplementary Fig. S9a,b available on Dryad). At the same time, unrelated species have evolved convergent phenotypes in different geographic regions. This pattern has been found in large scale replicated island radiations (Mahler et al. 2013), but is documented here for a smaller scale radiation in a continental setting (see Hughes and Eastwood 2006; Mittelbach and Schemske 2015). The deeper splits show a geographic pattern, reflecting the progressive colonization of the Iberian Peninsula during the Quaternary (Fernández-Mazuecos et al. 2013a; Fernández-Mazuecos and Vargas 2015). The major phylogenetic split between south-eastern and western species coincides with a phylogeographic discontinuity found for ptDNA markers (Fernández-Mazuecos and Vargas 2015). There is no clear pattern of morphological differentiation linked to this split, as indicated by the phylomorphospace based on both vegetative and reproductive characters (Fig. 4b). In contrast, extensive morphological divergence is associated with recent splits between sister species (Fig. 4b,c, Supplementary Fig. S9a,b available on Dryad). This is particularly true for floral traits related to pollinator specialization (Fenster et al. 2004), including corolla color, corolla shape, nectar spur length and tube width. Three recent splits between sister species are revealed by the phylogeny. One of them, in the south-eastern subclade, has led to the divergent flower morphologies of L. clementei and L. becerrae, respectively displaying the shortest and one of the two longest nectar spurs in the clade (Fig. 4). In addition, L. clementei is the only perennial species in the clade, which adds to the singularity of this species (see also DFA results; Fig. 1b) and its phenotypic divergence with the annual L. becerrae. The two other recent speciation events, in the western subclade, L. viscosa—L. onubensis in southwestern Iberia and L. spartea—L. incarnata in central Iberia, also represent high degrees of phenotypic divergence. Interestingly, they have both followed nearly identical pathways of floral change in different geographic regions, including divergence in corolla color (yellow in L. viscosa and L. spartea, and purple-dominated in their sisters L. onubensis and L. incarnata), corolla tube width (broad in L. viscosa and L. spartea, narrow in L. onubensis and L. incarnata), and overall flower shape (with parallel divergence along the first axis of the geometric morphometric analysis) (Fig. 4, Supplementary Fig. S9a,b available on Dryad). The most likely floral ancestral morphology for the western subclade, and for the two pairs of sister species, corresponds to a purple corolla with broad tube and type I shape. L. algarviana is the only species in the subclade that has retained this ancestral morphology (except for its elongated nectar spur), while convergent changes have happened in the two other pairs of sister species: a yellow corolla has independently evolved in L. spartea and L. viscosa from purple-flowered ancestors, and a narrow-tubed, type III corolla has independently evolved in L. onubensis and L. incarnata from broad-tubed, Type I ancestors (Fig. 4, Supplementary Fig. S9a,b available on Dryad). Broad- and narrow-tubed flowers have been suggested as distinct evolutionary optima representing divergent strategies of pollen placement on nectar-feeding insects (Fernández-Mazuecos et al. 2013a). Other instances of phenotypic convergence in geographically disjunct species include the additional shift to a yellow from a purple-dominated corolla in most of the range of L. salzmannii, the independent evolution of dense inflorescences from lax inflorescences in the south-eastern subclade and L. viscosa, and the independent evolution of the longest nectar spurs in L. becerrae and L. algarviana (Fig. 4, Supplementary Fig. S9a,b available on Dryad). General patterns of morphological evolution are similar when analyzing the SVDquartets tree (Supplementary Fig. S9c–f available on Dryad). Even though some of the details are different, morphological divergence between close relatives and convergence between nonclose relatives is still detected, indicating that our main evolutionary inferences are robust to the uncertainty in topology and to hypothesized reticulation events. Similar results have been obtained, although with lower resolution, in the North African clade of Linaria subsect. Versicolores (Fernández-Mazuecos et al. 2013a) and in Linaria sect. Supinae (Blanco-Pastor et al. 2015), pointing to a general pattern in the genus. In summary, our results indicate a higher robustness of coalescent approaches, as compared to concatenation, for GBS-based species phylogenetics. In our study clade, combined roles of geographical and ecological isolation in speciation are suggested, together with a likely role of hybridization. Recent speciation events involve extensive divergence in key traits linked to pollinator interactions, such as the length of the nectar spur, considered a key innovation and a model system for eco-evo-devo studies of plant speciation (Fernández-Mazuecos and Glover 2017). These patterns suggest adaptive, ecological diversification, including cases of “parallel speciation” likely associated with similar isolating barriers and selective pressures in different regions. Supplementary Material Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.mp818. Acknowledgements The authors thank Matthew Dorling for technical assistance; Edwige Moyroud, Marcial Escudero, José Luis Blanco-Pastor, Irene Villa, Isabel Liberal, Yurena Arjona, Sam Brockington, Paul Grabowski, Natasha Elina and Cristina Ariani for their support and useful discussion at different stages of this study; Bruno Santos and Greg Habrych for bioinformatic assistance; the David Baulcombe lab for permission to use laboratory and bioinformatic resources; Alberto Fernández-Mazuecos and Joaquín Ramírez for fieldwork support; José Quiles, Jesús Vílchez, Lucas Gutiérrez, Enrique Sánchez-Gullón, Isabel Marques, and Belén Estébanez for plant materials and/or population locations; the MA, RNG, ABH, SEV, JAEN, BCB, COI, LISU, and SALA herbaria for plant materials; Mats Hjertson, Laurence Loze, Laurent Gautier, and Roxali Bijmoer for their support in solving taxonomic questions; and three anonymous reviewers for their helpful comments. Funding This work was supported by the Marie Curie Intra-European Fellowship LINARIA-SPECIATION (FP7-PEOPLE-2013-IEF, project reference 624396) and an Isaac Newton Trust Research Grant (Trinity College, Cambridge). References Aberer A.J., Kobert K., Stamatakis A. 2014. ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol. Biol. Evol.  31: 2553– 2556. Google Scholar CrossRef Search ADS PubMed  Adler D., Murdoch D. 2016. RGL—3D real-time visualization device system for R. Available from: http://rgl.neoscientists.org/. álvarez I., Wendel J.F. 2003. Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol.  29: 417– 434. Google Scholar CrossRef Search ADS PubMed  Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. Baird N.A., Etter P.D., Atwood T.S., Currey M.C., Shiver A.L., Lewis Z.A., Selker E.U., Cresko W.A., Johnson E.A. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. Plos One  3: e3376. Google Scholar CrossRef Search ADS PubMed  Bergsten J. 2005. A review of long-branch attraction. Cladistics  21: 163– 193. Google Scholar CrossRef Search ADS   Blanca G., Cueto M., Fuentes J. 2017. Linaria becerrae (Plantaginaceae), a new endemic species from the southern Spain, and remarks on what Linaria salzmannii is and is not. Phytotaxa  298: 261– 268. Google Scholar CrossRef Search ADS   Blanco-Pastor J.L., Ornosa C., Romero D., Liberal I.M., Gómez J.M., Vargas P. 2015. Bees explain floral variation in a recent radiation of Linaria. J. Evolution. Biol.  28: 851– 863. Google Scholar CrossRef Search ADS   Blanco-Pastor J.L., Vargas P., Pfeil B.E. 2012. Coalescent simulations reveal hybridization and incomplete lineage sorting in Mediterranean Linaria. Plos One  7: e39089. Google Scholar CrossRef Search ADS PubMed  Boissier P.E. 1841. Linaria Salzmanni. In: Voyage botanique dans le midi de l’Espagne 2(15).  Paris: Gide et Cie. p. 456– 457. Bryant D., Bouckaert R., Felsenstein J., Rosenberg N.A., RoyChoudhury A. 2012. Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol.  29: 1917– 1932. Google Scholar CrossRef Search ADS PubMed  Bryant D., Moulton V. 2004. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol.  21: 255– 265. Google Scholar CrossRef Search ADS PubMed  Buckley T.R., Simon C., Shimodaira H., Chambers G.K. 2001. Evaluating hypotheses on the origin and evolution of the New Zealand alpine cicadas (Maoricicada) using multiple-comparison tests of tree topology. Mol. Biol. Evol.  18: 223– 234. Google Scholar CrossRef Search ADS PubMed  Castro M., Castro S., Loureiro J. 2012. Genome size variation and incidence of polyploidy in Scrophulariaceae sensu lato from the Iberian Peninsula. AoB Plants  2012:pls037. Cavender-Bares J., González-Rodríguez A., Eaton D.A.R., Hipp A.A.L., Beulke A., Manos P.S. 2015. Phylogeny and biogeography of the American live oaks (Quercus subsection Virentes): a genomic and population genetics approach. Mol. Ecol.  24: 3668– 3687. Google Scholar CrossRef Search ADS PubMed  Chifman J., Kubatko L. 2014. Quartet inference from SNP data under the coalescent model. Bioinformatics  30: 3317– 3324. Google Scholar CrossRef Search ADS PubMed  Corander J., Waldmann P., Marttinen P., Sillanpää M.J. 2004. BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics  20: 2363– 2369. Google Scholar CrossRef Search ADS PubMed  Cullings K.W. 1992. Design and testing of a plant-specific PCR primer for ecological and evolutionary studies. Mol. Ecol.  1: 233– 240. Google Scholar CrossRef Search ADS   DaCosta J.M., Sorenson M.D. 2016. ddRAD-seq phylogenetics based on nucleotide, indel, and presence-absence polymorphisms: analyses of two avian genera with contrasting histories. Mol. Phylogenet. Evol.  94: 122– 135. Google Scholar CrossRef Search ADS PubMed  Davey J.W., Hohenlohe P.A., Etter P.D., Boone J.Q., Catchen J.M., Blaxter M.L. 2011. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat. Rev. Genet.  12: 499– 510. Google Scholar CrossRef Search ADS PubMed  de la Harpe M., Paris M., Karger D., Rolland J., Kessler M., Salamin N., Lexer C. 2017. Molecular ecology studies of species radiations: current research gaps, opportunities, and challenges. Mol. Ecol.  26: 2608– 2622. Google Scholar CrossRef Search ADS PubMed  Degnan J.H., Rosenberg N.A. 2006. Discordance of species trees with their most likely gene trees. PLOS Genet.  2: e68. Google Scholar CrossRef Search ADS PubMed  Doyle J.J., Doyle J.L. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull.  19: 11– 15. Durand E.Y., Patterson N., Reich D., Slatkin M. 2011. Testing for ancient admixture between closely related populations. Mol. Biol. Evol.  28: 2239– 2252. Google Scholar CrossRef Search ADS PubMed  Eaton D.A.R. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics  30: 1844– 1849. Google Scholar CrossRef Search ADS PubMed  Eaton D.A.R., Hipp A.L., González-Rodríguez A., Cavender-Bares J. 2015. Historical introgression among the American live oaks and the comparative nature of tests for introgression. Evolution  69: 2587– 2601. Google Scholar CrossRef Search ADS PubMed  Eaton D.A.R., Spriggs E.L., Park B., Donoghue M.J. 2016. Misconceptions on missing data in RAD-seq phylogenetics with a deep-scale example from flowering plants. Syst. Biol.  66: 399– 412. Ebel E.R., DaCosta J.M., Sorenson M.D., Hill R.I., Briscoe A.D., Willmott K.R., Mullen S.P. 2015. Rapid diversification associated with ecological specialization in Neotropical Adelpha butterflies. Mol. Ecol.  24: 2392– 2405. Google Scholar CrossRef Search ADS PubMed  Elshire R.J., Glaubitz J.C., Sun Q., Poland J.A., Kawamoto K., Buckler E.S., Mitchell S.E. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. Plos One  6: e19379. Google Scholar CrossRef Search ADS PubMed  Escudero M., Eaton D.A.R., Hahn M., Hipp A.L. 2014. Genotyping-by-sequencing as a tool to infer phylogeny and ancestral hybridization: a case study in Carex (Cyperaceae). Mol. Phylogenet. Evol.  79: 359– 367. Google Scholar CrossRef Search ADS PubMed  Farrington H.L., Lawson L.P., Clark C.M., Petren K. 2014. The evolutionary history of Darwin’s finches: speciation, gene flow, and introgression in a fragmented landscape. Evolution  68: 2932– 2944. Google Scholar CrossRef Search ADS PubMed  Fenster C.B., Armbruster W.S., Wilson P., Dudash M.R., Thomson J.D. 2004. Pollination syndromes and floral specialization. Annu. Rev. Ecol. Evol. Systemat.  35: 375– 403. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Blanco-Pastor J.L., Gómez J.M., Vargas P. 2013a. Corolla morphology influences diversification rates in bifid toadflaxes (Linaria sect. Versicolores). Ann. Bot.  112: 1705– 1722. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Blanco-Pastor J.L., Vargas P. 2013b. A phylogeny of toadflaxes (Linaria Mill.) based on nuclear internal transcribed spacer sequences: systematic and evolutionary consequences. Int. J. Plant Sci.  174: 234– 249. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Glover B.J. 2017. The evo-devo of plant speciation. Nat. Ecol. Evol.  1: 0110. Google Scholar CrossRef Search ADS   Fernández-Mazuecos M., Vargas P. 2011. Historical isolation versus recent long-distance connections between Europe and Africa in bifid toadflaxes (Linaria sect. Versicolores). Plos One  6: e22234. Google Scholar CrossRef Search ADS PubMed  Fernández-Mazuecos M., Vargas P. 2015. Quaternary radiation of bifid toadflaxes (Linaria sect. Versicolores) in the Iberian Peninsula: low taxonomic signal but high geographic structure of plastid DNA lineages. Plant Syst. Evol.  301: 1411– 1423. Google Scholar CrossRef Search ADS   Gadagkar S.R., Rosenberg M.S., Kumar S. 2005. Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J. Exp. Zool. B Mol. Dev. Evol.  304: 64– 74. Google Scholar CrossRef Search ADS PubMed  Giarla T.C., Esselstyn J.A. 2015. The challenges of resolving a rapid, recent radiation: empirical and simulated phylogenomics of Philippine shrews. Syst. Biol.  64: 727– 740. Google Scholar CrossRef Search ADS PubMed  Givnish T.J., Millam K.C., Mast A.R., Paterson T.B., Theim T.J., Hipp A.L., Henss J.M., Smith J.F., Wood K.R., Sytsma K.J. 2009. Origin, adaptive radiation and diversification of the Hawaiian lobeliads (Asterales: Campanulaceae). Proc. Roy. Soc. Lond. B Biol. Sci.  276: 407– 416. Google Scholar CrossRef Search ADS   Gower J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics  27: 857– 874. Google Scholar CrossRef Search ADS   Grabowski P.P., Morris G.P., Casler M.D., Borevitz J.O. 2014. Population genomic variation reveals roles of history, adaptation and ploidy in switchgrass. Mol. Ecol.  23: 4059– 4073. Google Scholar CrossRef Search ADS PubMed  Guzmán B., Lledó M.D., Vargas P. 2009. Adaptive radiation in Mediterranean Cistus (Cistaceae). Plos One  4: e6362. Google Scholar CrossRef Search ADS PubMed  Hamon P., Grover C.E., Davis A.P., Rakotomalala J.J., Raharimalala N.E., Albert V.A., Sreenath H.L., Stoffelen P., Mitchell S.E., Couturon E., Hamon S., de Kochko A., Crouzillat D., Rigoreau M., Sumirat U., Akaffou S., Guyot R. 2017. Genotyping-by-sequencing provides the first well-resolved phylogeny for coffee (Coffea) and insights into the evolution of caffeine content in its species: GBS coffee phylogeny and the evolution of caffeine content. Mol. Phylogenet. Evol.  109: 351– 361. Google Scholar CrossRef Search ADS PubMed  Harvey M.G., Judy C.D., Seeholzer G.F., Maley J.M., Graves G.R., Brumfield R.T. 2015. Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species. PeerJ  3: e895. Google Scholar CrossRef Search ADS PubMed  Harvey M.G., Smith B.T., Glenn T.C., Faircloth B.C., Brumfield R.T. 2016. Sequence capture versus restriction site associated DNA sequencing for shallow systematics. Syst. Biol.  65: 910– 924. Google Scholar CrossRef Search ADS PubMed  Heled J., Drummond A.J. 2010. Bayesian inference of species trees from multilocus data. Mol. Biol. Evol.  27: 570– 580. Google Scholar CrossRef Search ADS PubMed  Hipp A.L. 2014. RADami: R package for phylogenetic analysis of RADseq data. Available from https://cran.r-project.org/web/packages/RADami/index.html. Hipp A.L., Eaton D.A.R., Cavender-Bares J., Fitzek E., Nipper R., Manos P.S. 2014. A framework phylogeny of the American oak clade based on sequenced RAD data. Plos One  9: e93975. Google Scholar CrossRef Search ADS PubMed  Huang H., Knowles L.L. 2014. Unforeseen consequences of excluding missing data from next-generation sequences: simulation study of RAD sequences. Syst. Biol.  65: 357– 365. Google Scholar CrossRef Search ADS PubMed  Hughes C., Eastwood R. 2006. Island radiation on a continental scale: exceptional rates of plant diversification after uplift of the Andes. P. Natl. Acad. Sci. USA  103: 10334– 10339. Google Scholar CrossRef Search ADS   Kampstra P. 2008. Beanplot: a boxplot alternative for visual comparison of distributions. J. Stat. Softw.  28: 1– 9. Google Scholar CrossRef Search ADS PubMed  Kay K.M., Voelckel C., Yang J.Y., Hufford K.M., Kaska D.D., Hodges S.A. 2006. Floral characters and species diversification. In: Harder L.D., Barrett S.C.H., editors. Ecology and evolution of flowers.  Oxford: Oxford University Press p. 311– 325. Kubatko L.S., Degnan J.H. 2007. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst. Biol.  56: 17– 24. Google Scholar CrossRef Search ADS PubMed  Kumar S., Filipski A.J., Battistuzzi F.U., Pond S.L.K., Tamura K. 2011. Statistics and truth in phylogenomics. Mol. Biol. Evol.  29: 457– 472. Google Scholar CrossRef Search ADS PubMed  Leaché A.D., Banbury B.L., Felsenstein J., de Oca A.N.-M., Stamatakis A. 2015a. Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies. Syst. Biol.  64: 1032– 1047. Google Scholar CrossRef Search ADS   Leaché A.D., Chavez A.S., Jones L.N., Grummer J.A., Gottscho A.D., Linkem C.W. 2015b. Phylogenomics of phrynosomatid lizards: conflicting signals from sequence capture versus restriction site associated DNA sequencing. Genome Biol. Evol.  7: 706– 719. Google Scholar CrossRef Search ADS   Lemmon E.M., Lemmon A.R. 2013. High-throughput genomic data in systematics and phylogenetics. Annu. Rev. Ecol. Evol. Systemat.  44: 99– 121. Google Scholar CrossRef Search ADS   Li H., Ruan J., Durbin R. 2008. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res.  18: 1851– 1858. Google Scholar CrossRef Search ADS PubMed  Liu L. 2008. BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics  24: 2542– 2543. Google Scholar CrossRef Search ADS PubMed  Liu L., Wu S., Yu L. 2015. Coalescent methods for estimating species trees from phylogenomic data. J. Syst. Evol.  53: 380– 390. Google Scholar CrossRef Search ADS   Liu L., Yu L. 2010. Phybase: an R package for species tree analysis. Bioinformatics  26: 962– 963. Google Scholar CrossRef Search ADS PubMed  Liu L., Yu L. 2011. Estimating species trees from unrooted gene trees. Syst. Biol.  60: 661– 667. Google Scholar CrossRef Search ADS PubMed  Maddison W.P., Maddison D.R. 2011. Mesquite: a modular system for evolutionary analysis. Available from http://mesquiteproject.org. Mahler D.L., Ingram T., Revell L.J., Losos J.B. 2013. Exceptional convergence on the macroevolutionary landscape in island lizard radiations. Science  341: 292– 295. Google Scholar CrossRef Search ADS PubMed  Mallo D. 2015. Multi-locus bootstrapping script. Available from https://github.com/adamallo/multi-locus-bootstrapping. Mallo D. 2016. NJstM. Available from https://github.com/adamallo/NJstM. Manthey J.D., Campillo L.C., Burns K.J., Moyle R.G. 2016. Comparison of target-capture and restriction-site associated DNA sequencing for phylogenomics: a test in cardinalid tanagers (Aves, Genus: Piranga). Syst. Biol.  65: 640– 650. Google Scholar CrossRef Search ADS PubMed  Mastretta-Yanes A., Arrigo N., Alvarez N., Jorgensen T.H., Piñero D., Emerson B.C. 2015. Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference. Mol. Ecol. Resour.  15: 28– 41. Google Scholar CrossRef Search ADS PubMed  McCormack J.E., Hird S.M., Zellmer A.J., Carstens B.C., Brumfield R.T. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol. Phylogenet. Evol.  66: 526– 538. Google Scholar CrossRef Search ADS PubMed  Meiklejohn K.A., Faircloth B.C., Glenn T.C., Kimball R.T., Braun E.L. 2016. Analysis of a rapid evolutionary radiation using ultraconserved elements: evidence for a bias in some multispecies coalescent methods. Syst. Biol.  65: 612– 627. Google Scholar CrossRef Search ADS PubMed  Mirarab S., Bayzid M.S., Warnow T. 2016. Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting. Syst. Biol.  65: 366– 380. Google Scholar CrossRef Search ADS PubMed  Mirarab S., Reaz R., Bayzid M.S., Zimmermann T., Swenson M.S., Warnow T. 2014. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics  30: i541– i548. Google Scholar CrossRef Search ADS PubMed  Mittelbach G.G., Schemske D.W. 2015. Ecological and evolutionary perspectives on community assembly. Trends Ecol. Evol.  30: 241– 247. Google Scholar CrossRef Search ADS PubMed  Muschick M., Indermaur A., Salzburger W. 2012. Convergent evolution within an adaptive radiation of cichlid fishes. Curr. Biol.  22: 2362– 2368. Google Scholar CrossRef Search ADS PubMed  Nadeau N.J., Martin S.H., Kozak K.M., Salazar C., Dasmahapatra K.K., Davey J.W., Baxter S.W., Blaxter M.L., Mallet J., Jiggins C.D. 2013. Genome-wide patterns of divergence and gene flow across a butterfly radiation. Mol. Ecol.  22: 814– 826. Google Scholar CrossRef Search ADS PubMed  Nicotra A.B., Chong C., Bragg J.G., Ong C.R., Aitken N.C., Chuah A., Lepschi B., Borevitz J.O. 2016. Population and phylogenomic decomposition via genotyping-by-sequencing in Australian Pelargonium. Mol. Ecol.  25: 2000– 2014. Google Scholar CrossRef Search ADS PubMed  Nieto Feliner G., álvarez I., Fuertes-Aguilar J., Heuertz M., Marques I., Moharrek F., Piñeiro R., Riina R., Rosselló J.A., Soltis P.S., Villa-Machío I. 2017. Is homoploid hybrid speciation that rare? An empiricist’s view. Heredity  118: 513– 516. Google Scholar CrossRef Search ADS PubMed  Oksanen J., Blanchet F.G., Kindt R., Legendre P., O’Hara R.B., Simpson G.L., Solymos P., M.H.H. S., Wagner H. 2011. vegan: Community Ecology Package. R package version 1.17-9.  Available from http://CRAN.R-project.org/package=vegan. Paradis E., Claude J., Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics  20: 289– 290. Google Scholar CrossRef Search ADS PubMed  Peterson B.K., Weber J.N., Kay E.H., Fisher H.S., Hoekstra H.E. 2012. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. Plos One  7: e37135. Google Scholar CrossRef Search ADS PubMed  Poland J.A., Brown P.J., Sorrells M.E., Jannink J.L. 2012. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. Plos One  7: e32253. Google Scholar CrossRef Search ADS PubMed  Pritchard J.K., Stephens M., Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics  155: 945– 959. Google Scholar PubMed  R Development Core Team. 2016. R: A language and environment for statistical computing.  Vienna: R Foundation for Statistical Computing. Ree R.H., Hipp A.L. 2015. Inferring phylogenetic history from restriction site associated DNA (RADseq). In: Hörandl E., Appelhans M.S., editors. Next-generation sequencing in plant systematics.  Königstein: Koeltz Scientific Books p. 181– 204. Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol.  3: 217– 223. Google Scholar CrossRef Search ADS   Roure B., Baurain D., Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol. Biol. Evol.  30: 197– 214. Google Scholar CrossRef Search ADS PubMed  Rubin B.E.R., Ree R.H., Moreau C.S. 2012. Inferring phylogenies from RAD sequence data. Plos One  7: e33394. Google Scholar CrossRef Search ADS PubMed  Sáez L., Bernal M. 2009. Linaria Mill. In: Castroviejo S., Herrero A., Benedí C., Rico E., Güemes J., editors. Flora iberica XIII (Plantaginaceae – Scrophulariaceae).  Madrid: CSIC p. 232– 324. Salichos L., Rokas A. 2013. Inferring ancient divergences requires genes with strong phylogenetic signals. Nature  497: 327– 331. Google Scholar CrossRef Search ADS PubMed  Sang T. 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Crit. Rev. Biochem. Mol.  37: 121– 147. Google Scholar CrossRef Search ADS   Schluter D. 2000. The ecology of adaptive radiation. Oxford: Oxford University Press. Seehausen O. 2006. African cichlid fish: a model system in adaptive radiation research. Proc. Roy. Soc. Lond. B Biol. Sci.  273: 1987– 1998. Google Scholar CrossRef Search ADS   Seo T.K. 2008. Calculating bootstrap probabilities of phylogeny using multilocus sequence data. Mol. Biol. Evol.  25: 960– 971. Google Scholar CrossRef Search ADS PubMed  Shafer A., Peart C.R., Tusso S., Maayan I., Brelsford A., Wheat C.W., Wolf J.B.W. 2016. Bioinformatic processing of RAD-seq data dramatically impacts downstream population genetic inference. Methods Ecol. Evol.  doi: 10.1111/2041-210X.12700. Shaffer H.B., Thomson R.C. 2007. Delimiting species in recent radiations. Syst. Biol.  56: 896– 906. Google Scholar CrossRef Search ADS PubMed  Shaw J., Lickey E.B., Schilling E.E., Small R.L. 2007. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am. J. Bot.  94: 275– 288. Google Scholar CrossRef Search ADS PubMed  Shaw K.L. 2002. Conflict between nuclear and mitochondrial DNA phylogenies of a recent species radiation: what mtDNA reveals and conceals about modes of speciation in Hawaiian crickets. Proc. Natl. Acad. Sci. USA  99: 16122– 16127. Google Scholar CrossRef Search ADS   Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol.  51: 492– 508. Google Scholar CrossRef Search ADS PubMed  Shimodaira H., Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol.  16: 1114– 1116. Google Scholar CrossRef Search ADS   Shimodaira H., Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics  17: 1246– 1247. Google Scholar CrossRef Search ADS PubMed  Sidlauskas B. 2008. Continuous and arrested morphological diversification in sister clades of characiform fishes: a phylomorphospace approach. Evolution  62: 3135– 3156. Google Scholar CrossRef Search ADS PubMed  Solís-Lemus C., Ané C. 2016. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. Plos Genet.  12: e1005896. Google Scholar CrossRef Search ADS PubMed  Solís-Lemus C., Yang M., Ané C. 2016. Inconsistency of species-tree methods under gene flow. Syst. Biol.  65: 843– 851. Google Scholar CrossRef Search ADS PubMed  Stamatakis A. 2006. Phylogenetic models of rate heterogeneity: a high performance computing perspective. Proceedings of the IPDPS2006, Rhodos,  Greece: IEEE. doi: 10.1109/IPDPS.2006.1639535. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics  30: 1312– 1313. Google Scholar CrossRef Search ADS PubMed  Suchan T., Pitteloud C., Gerasimova N.S., Kostikova A., Schmid S., Arrigo N., Pajkovic M., Ronikier M., Alvarez N. 2016. Hybridization capture using RAD probes (hyRAD), a new tool for performing genomic analyses on collection specimens. Plos One  11: e0151651. Google Scholar CrossRef Search ADS PubMed  Sutton D.A. 1988. A revision of the tribe Antirrhineae.  Oxford: Oxford University Press. Swofford D. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sunderland, Massachusetts: Sinauer. Takahashi T., Nagata N., Sota T. 2014. Application of RAD-based phylogenetics to complex relationships among variously related taxa in a species flock. Mol. Phylogenet. Evol.  80: 137– 144. Google Scholar CrossRef Search ADS PubMed  Valdés B. 1970. Taxonomía experimental del género Linaria IV. Reproducción sexual: autogamia y polinización intraespecífica. Bol. R. Soc. Esp. Hist. Nat. Sec. Biol.  68: 79– 89. Valente L.M., Savolainen V., Vargas P. 2010. Unparalleled rates of species diversification in Europe. Proc. Roy. Soc. Lond. B Biol. Sci.  277: 1489– 1496. Google Scholar CrossRef Search ADS   van Orsouw N.J., Hogers R.C.J., Janssen A., Yalcin F., Snoeijers S., Verstege E., Schneiders H., van der Poel H., Van Oeveren J., Verstegen H. 2007. Complexity reduction of polymorphic sequences (CRoPS™): a novel approach for large-scale polymorphism discovery in complex genomes. Plos One  2: e1172. Google Scholar CrossRef Search ADS PubMed  Vargas P., Zardoya R. 2014. The tree of life: evolution and classification of living organisms.  Sunderland, Massachusetts: Sinauer Associates Inc. Venables W.N., Ripley B.D. 2002. Modern applied statistics with S.  4th ed. New York: Springer. Google Scholar CrossRef Search ADS   Vigalondo B., Fernández-Mazuecos M., Vargas P., Sáez L. 2015. Unmasking cryptic species: morphometric and phylogenetic analyses of the Ibero-North African Linaria incarnata complex. Bot. J. Linn. Soc.  177: 395– 417. Google Scholar CrossRef Search ADS   Wagner C.E., Keller I., Wittwer S., Selz O.M., Mwaiko S., Greuter L., Sivasundar A., Seehausen O. 2013. Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Mol. Ecol.  22: 787– 798. Google Scholar CrossRef Search ADS PubMed  Waterhouse A.M., Procter J.B., Martin D.M.A., Clamp M., Barton G.J. 2009. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics  25: 1189– 1191. Google Scholar CrossRef Search ADS PubMed  Whittall J.B., Voelckel C., Kliebenstein D.J., Hodges S.A. 2006. Convergence, constraint and the role of gene expression during adaptive radiation: floral anthocyanins in Aquilegia. Mol. Ecol.  15: 4645– 4657. Google Scholar CrossRef Search ADS PubMed  © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal

Systematic BiologyOxford University Press

Published: Mar 1, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 12 million articles from more than
10,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Unlimited reading

Read as many articles as you need. Full articles with original layout, charts and figures. Read online, from anywhere.

Stay up to date

Keep up with your field with Personalized Recommendations and Follow Journals to get automatic updates.

Organize your research

It’s easy to organize your research with our built-in tools.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

Monthly Plan

  • Read unlimited articles
  • Personalized recommendations
  • No expiration
  • Print 20 pages per month
  • 20% off on PDF purchases
  • Organize your research
  • Get updates on your journals and topic searches

$49/month

Start Free Trial

14-day Free Trial

Best Deal — 39% off

Annual Plan

  • All the features of the Professional Plan, but for 39% off!
  • Billed annually
  • No expiration
  • For the normal price of 10 articles elsewhere, you get one full year of unlimited access to articles.

$588

$360/year

billed annually
Start Free Trial

14-day Free Trial