Genomics-informed species delimitation to support morphological identification of anglewing butterflies (Lepidoptera: Nymphalidae: Polygonia)

Genomics-informed species delimitation to support morphological identification of anglewing... Abstract Species delimitation and identification are integral to virtually all biological disciplines, but are far from straightforward tasks. Taxonomy has recently focused on integrative approaches that consider multiple types of data to resolve species boundaries, yet methodologies to that end are still being developed. Here, we assess species limits in an area of wide distributional overlap between several species of anglewing butterflies in western Canada. Focusing on an area of sympatry provides a rich system to test species boundaries in the face of potential gene flow between morphologically variable yet similar species. Mitochondrial DNA and genome-wide single nucleotide polymorphisms provided clear species delimitation, although previously identified cytonuclear discordance was also apparent. Analysis of two morphological data sets, based on characters commonly used as diagnostic characters in the literature and field guides, was variably successful at separating species. Using analyses that were guided by the results of the genetic data increased successful species identification for traditionally used characters, but not for the morphological data set based on digital colour analysis. Our application of genetics to guide morphological analysis demonstrates a useful approach for implementing iterative methodologies as part of integrative taxonomy. cytonuclear discordance, genome-wide SNPs, genotyping-by-sequencing, mitochondrial phylogeny, Polygonia, species identification INTRODUCTION Species delimitation and identification are essential to virtually all biological disciplines, but are far from straightforward tasks. Morphological variation (Lumley & Sperling, 2010; McKay et al., 2014; Proshek et al., 2015), gene discordance (Maddison, 1997) and complex evolutionary histories (Dejaco et al., 2016) can obfuscate species limits, creating phylogenetic and taxonomic uncertainty. Such uncertainty is particularly undesirable in systems where incorrect delimitation of units can have large economic (e.g. Frey et al., 2013; Dumas et al., 2015), health-related (Sperling & Sperling, 2009; Müller et al., 2013) or conservation consequences (Bickford et al., 2007; Bortolus, 2008; Hedin, 2015; Proshek et al., 2015). Integrative taxonomy (Dayrat, 2005; Schlick-Steiner et al., 2010), which delimits species using a consensus of multiple data types (morphological, molecular, ecological, chemical, etc.), provides an appealing solution to cases of difficult species delimitation, particularly compared to approaches that use single data types (Dupuis, Roe & Sperling, 2012). It is open to discussion whether integrative taxonomy approaches are strictly analytically integrative and objective (e.g. with analyses that consider multiple data types simultaneously, and allowing results to be recreated) or whether they are iterative, considering data sources successively and independently to arrive at a qualitative consensus (Yeates et al., 2011). Regardless, assimilating variable data types into broadly integrative taxonomic consideration has yielded unprecedented resolution in many historically difficult groups (Lumley & Sperling, 2010; Wachter et al., 2015; Dejaco et al., 2016; Gratton et al., 2016). Genomic data are particularly powerful for resolving species boundaries due to the sheer size of genomic data sets. Reduced-representation genomic libraries generated from, for example, restriction site-associated DNA sequencing methods (RAD-seq), can, in similar timeframes, create data sets that have hundreds of times more phylogenetic information than traditional Sanger sequencing-based approaches (Cruaud et al., 2014). Such RAD-seq data sets have been used to resolve phylogenetic relationships (Wagner et al., 2013; Cruaud et al., 2014; DaCosta & Sorenson, 2016; Dupuis et al., 2017), define species limits (Leaché et al., 2014; Pante et al., 2015; Gratton et al., 2016; Razkin et al., 2016) and elucidate evolutionary history (Jones et al., 2013; Nadeau et al., 2013; Stervander et al., 2015). Although the various iterations of these methods have technical advantages and disadvantages (Andrews et al., 2016), and particular considerations for phylogenetic analysis (DaCosta & Sorenson, 2016; Dupuis et al., 2017), they have become one of the mainstay methodologies for systematic and evolutionary biology studies. Here, we assess species limits between anglewing butterflies (Polygonia Hübner, 1819) in western Canada. Of the 15 species described worldwide (Wahlberg et al., 2009), four are widespread in Alberta and British Columbia: Polygonia faunus (Edwards, 1862), P. gracilis (Grote & Robinson, 1867), P. progne (Cramer, 1775) and P. satyrus (Edwards, 1869). Widely variable yet shared morphological characteristics and similar ecologies (Nylin et al., 2015) make identification of these species challenging in the field (Bird et al., 1995; Layberry, Hall & Lafontaine, 1998; Acorn & Sheldon, 2006). Although previous phylogenetic studies have found these species to form monophyletic groups, discordance between mitochondrial and nuclear genes, morphology and ecology make for tenuous hypotheses of evolutionary relationships (Fig. 1; Wahlberg et al., 2009). Taken together, the complex situation across much of Alberta and British Columbia makes this an ideal system for an integrative taxonomic approach, a point highlighted by Wahlberg et al. (2009): ‘careful study of populations in Alberta seems warranted’. Figure 1. View largeDownload slide Alternate topologies of pertinent New World Polygonia species, generated from non-genetic, mitochondrial DNA and nuclear DNA data sets. Some data sets of similar types from different studies have been combined into a single tree, and differences in topologies between those data sets are indicated with grey text and branches. Commonly observed monophyletic groups are highlighted with grey boxes when present, and data sets generated in the current study are noted with an asterisk. Wing pattern: Nylin et al. (2001); MEB (synthesis of morphology, ecology and behaviour): Nylin et al. (2001), Wahlberg & Nylin (2003); cytochrome oxidase subunit I gene (COI): Wahlberg & Nylin (2003), Wahlberg et al. (2009); ND1: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); EF-1α: Wahlberg & Nylin (2003), Wahlberg et al. (2009); wgl: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); RpS5: Wahlberg et al. (2009). Figure 1. View largeDownload slide Alternate topologies of pertinent New World Polygonia species, generated from non-genetic, mitochondrial DNA and nuclear DNA data sets. Some data sets of similar types from different studies have been combined into a single tree, and differences in topologies between those data sets are indicated with grey text and branches. Commonly observed monophyletic groups are highlighted with grey boxes when present, and data sets generated in the current study are noted with an asterisk. Wing pattern: Nylin et al. (2001); MEB (synthesis of morphology, ecology and behaviour): Nylin et al. (2001), Wahlberg & Nylin (2003); cytochrome oxidase subunit I gene (COI): Wahlberg & Nylin (2003), Wahlberg et al. (2009); ND1: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); EF-1α: Wahlberg & Nylin (2003), Wahlberg et al. (2009); wgl: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); RpS5: Wahlberg et al. (2009). We use mitochondrial DNA (mtDNA) gene sequences, genome-wide single nucleotide polymorphisms (SNPs) generated by genotyping-by-sequencing (GBS, a form of RAD-seq) and two morphological data sets (visually and digitally scored wing characters based on commonly used field diagnostic characters) to test species limits between the four widespread species (and two subspecies of P. gracilis) in this region. Using phylogenetic and population genetic analyses, we first ask whether genetic data sets clearly separate species or whether there are signs of admixture or introgression between species. We then use multivariate analyses of the morphological data sets to test whether quantitative analysis of morphology leads to similar delimitation of species. Both genetic data sets predict distinct species limits, despite comparable cytonuclear discordance to previous studies, but only one morphological data set conforms to the genetic clusters. Given the clear species limits from genome-wide SNPs and mtDNA, we use the genetic data to inform subsequent analyses of the morphological data sets and, from these genetically guided morphological analyses, determine which morphological characters best delimit species. These results provide an iterative methodological approach to taxonomy, informing more accurate diagnoses of these species in the field. With the popularization of citizen science (Silvertown, 2009; Bonney et al., 2014), particularly in the realm of butterflies (Howard & Davis, 2009; Devictor, Whittaker & Beltrame, 2010; Larrivee et al., 2014), a framework for integrative taxonomy to guide the development of better diagnostic identification can have wide-reaching applications. MATERIAL AND METHODS Specimen collection, DNA extraction and photography Specimens were collected primarily as adults with an aerial net or bait trap (Leptraps, baited with banana) and frozen alive at −80 °C. Larvae were also collected and reared to adulthood on host clippings of the same species from which they were collected (Supporting Information, Table S1). We focused sampling in central and southern Alberta, southeast British Columbia and adjacent states, where six species/subspecies are regularly found: Polygonia faunus, P. gracilis gracilis (Grote and Robinson, 1867), P. gracilis zephyrus (Edwards, 1870), P. oreas (Edwards, 1869), P. progne and P. satyrus. Adequate sampling was achieved for five of these lineages, but we were unable to collect P. oreas, which is uncommon in the southwest corner of Alberta. Several additional specimens were included to represent other North American lineages: P. comma (Harris, 1841) and P. interrogationis (Fabricius, 1798); the latter is also occasionally found in Alberta during irruptive years (Layberry et al., 1998). We identified specimens using several field guides (Bird et al., 1995; Brock & Kaufman, 2003; Acorn & Sheldon, 2006) and the consensus opinion of multiple coauthors (CMM, JHA and FAHS), and follow the nomenclature of Pelham (2008). A total of 258 Polygonia specimens were collected [one P. comma, 106 P. faunus, 21 P. gracilis (10 P. g. gracilis and 11 P. g. zephyrus), one P. interrogationis, 71 P. progne and 58 P. satyrus] as well as seven specimens of Aglais milberti and Nymphalis spp. for use as outgroups. While we delimited subspecies for P. gracilis, we did not do so for P. faunus due to the lack of empirical evidence for those subspecies (Kodandaramaiah et al., 2012). Genomic DNA was extracted from legs and the ventral portion of the thorax using DNeasy Blood and Tissue kits (Qiagen). RNase A was used during the extraction (following Qiagen protocol), and the resulting DNA was ethanol precipitated and eluted in millipore water. DNA quality and quantity were assessed using a Nanodrop ND-1000 (Thermo Scientific) and a Qubit 1.0 dsDNA BR assay kit (Invitrogen), respectively. After DNA extraction, specimens were pinned and spread, and missing thoracic tissue was fortified with Archival Quality Neutral pH adhesive (Lineco). Some specimens lacked adequate thoracic tissue for pinning and were stored in glassine envelopes. Images of dorsal and ventral wing surfaces of all specimens were captured using a Canon Powershot A650 IS camera in macro setting, using aperture priority mode at f 22 to optimize depth of focus, and an exposure compensation of +12/3 to optimize the brightness of the images. The camera was manually white-balanced between photography sessions, with the same lighting used throughout the imaging process. The camera was mounted on a copy stand, and we maintained a consistent distance from specimen to camera lens using a modified plastazote foam pedestal. Image files were converted from JPEG to TIFF format in Adobe Photoshop to preserve resolution. All specimen information and photographs are databased in the University of Alberta E. H. Strickland Entomological Museum (accessions UASM370001-UASM370265; Supporting Information, Table S1). mtDNA sequencing and analysis The entire mitochondrial cytochrome oxidase subunit I gene (COI) was amplified in two fragments by polymerase chain reaction (PCR). We targeted the first ~650 bp using the primers LepF1 (5′-ATT CAA CCA ATC ATA AAG ATA TTG G-3′) and HCO (5′-TGA TTT TTT GGT CAC CCT GAA GTT TA-3′), and the remaining ~800 bp with JerryI (5′-CAA CAT TTA TTT TGA TTT TTT GG-3′) and PatII (5′-AGG TAA TGT ATA TTA GAC GGT ATA ACT-3′). Some samples were not successfully amplified using the latter pair, so JerryI and MilaIV (5′-AAA TTA GGA CAT TTA TTA CC-3′) were used to produce a shorter fragment. PCR cycling conditions consisted of a 5-min denaturation step at 94 °C, 35 cycles of 94 ° C for 30 s, 45 °C for 30 s and 72 °C for 2 min, and a final 5-min elongation at 74 °C. PCR products were cleaned using Exonuclease I and shrimp alkaline phosphatase (New England Biolabs) before dye-termination using BigDye sequencing pre-mix v2.1 (Life Technologies). Fragments were sequenced using the 3730 DNA analyzer (Applied Biosystems) at the Molecular Biology Service Unit at the University of Alberta. Forward and reverse COI reads were merged and manually inspected for sequencing call errors in Geneious v7.0.6 (Kearse et al., 2012). We then used Mesquite v2.75 (Maddison & Maddison, 2008) to align individual genes with the ClustalW algorithm (Thompson, Gibson & Higgins, 2003), before manually concatenating sequencing products into a single COI haplotype for every individual. COI sequences generated in our study are accessioned on GenBank (KY318516–KY318762; Supporting Information, Table S1). We also incorporated 89 COI sequences from five Polygonia taxa that were used by Wahlberg et al. (2009) (Supporting Information, Table S1). Finally, we manually trimmed the ends of the sequences and ~100 bp at the junction between the two amplified regions, both of which contained high amounts of missing data. This resulted in a final alignment of 1348 bp for 315 individuals. Parsimony-based haplotype networks were built in TCS (Clement, Posada & Crandall, 2000), using default settings. Gaps were treated as missing data to accommodate fragments that varied slightly in length. The genetic identity of these species was unambiguous (see ‘Results’), so haplotype networks were constructed for P. faunus and P. progne separately, while P. gracilis and P. satyrus were combined due to their similarity. We conducted maximum likelihood (ML) tree searches in IQ-TREE v1.4.2 (Nguyen et al., 2015) with 1000 replicates each of ultra-fast bootstrapping (Minh, Nguyen & von Haeseler, 2013) and the Shimodaira/Hasegawa approximate likelihood ratio test (SH-aLRT – Guindon et al., 2010). IQ-TREE’s model selection procedure using the Bayesian information criterion predicted a transition model (TIM2) + Γ. We conducted Bayesian inference (BI) analysis in ExaBayes v1.5 (Aberer, Kobert & Stamatakis, 2014) with four Metropolis-coupled replicates for one million generations (three heated chains), sampling every 1000 generations, and otherwise default parameters and models. We used Tracer v1.6 (Rambaut et al., 2014) to assess convergence and sampling of parameter values’ posterior distributions, ensuring that all effective sample sizes were >200, and also assessed the average SD of split frequencies and potential scale reduction factors across runs, which should approach zero and one, respectively. A consensus tree was built using TreeAnnotator (Drummond et al., 2012) after manually removing the first 25% of trees as burn-in. Both ML and Bayesian analyses of COI were conducted on a smaller data set of unique haplotypes only (172 haplotypes, including outgroups). Preliminary trees were viewed in FigTree v1.3.1 (Rambaut & Drummond, 2010). GBS library preparation, sequencing and data filtering GBS library preparation following Poland et al. (2012) was conducted at the Institut de Biologie Intégrative et des Systèmes at Université Laval. PstI and MspI restriction enzymes were used with a duplex-specific nuclease treatment (Zhulidov et al., 2004), and complexity reduction (a single C appended to the reverse primer) as described in Sonah et al. (2013). Two to eight base pair barcodes were used to identify individuals and four lanes (including individuals not included in the present study) of 96-plex; 100 bp single-end sequencing was performed on an Illumina HiSeq 2000 at McGill University’s Génome Québec Innovation Centre. Process_radtags of Stacks v1.35 (Catchen et al., 2011) was use to demultiplex and rename raw FASTQ files, allowing no ambiguities in the individual-specific barcodes. As sequencing errors are more probably at the ends of sequencing reads, 5 bp were trimmed from the 3′ end of the reads using an in-house Perl script. We then used Ustacks, Cstacks and Sstacks (Catchen et al., 2011) to assemble de novo catalogs and map reads to these catalogs. Default parameters were used for these utilities, except a minimum depth of five reads was required for initial catalog creation, and three mismatches were allowed in matching secondary reads to primary stacks. Finally, populations (Catchen et al., 2011) was used to call SNPs with a population map specifying a single population of all individuals; loci had to be present in one population and one individual to be processed, and a minimum stack depth of six was required for individuals at each locus. All Stacks utilities and associated filtering were run using Perl wrappers, which also made use of the BioPerl toolkit (Stajich et al., 2002), and these are available at https://github.com/muirheadk/GBS_analysis_pipeline. Two main types of input files were constructed from populations vcf and fasta outputs: one for phylogenetic and the second for population genetic analyses. First, we used vcftools v0.1.12 (Danecek et al., 2011) to calculate a preliminary measure of missing data per individual and removed individuals missing >50% genotype data (calculated after removing loci missing >50% data per individual); these individuals were removed from both phylogenetic and population genetic input files in the following steps. For phylogenetic analyses, we used an in-house Perl script that ingests fasta output from populations, removes loci with high missing data (here a threshold of 50% was used, but preliminary analyses showed little difference when using alternative missing data thresholds) and outputs DNA sequence alignment with missing data symbols for missing loci per individual. This alignment contained SNPs as well as their flanking sequences from the variable length loci of the de novo catalog construction and resulted in an alignment of 121 513 bp (2572 SNPs) for 248 individuals. Including flanking sequence around SNPs allows the use of more realistic models of DNA evolution, such as those that take into account invariant sites (e.g. Steel & Penny, 2000, and references therein). We also used vcftools and an in-house Perl script to construct a SNP-only alignment with the same set of loci, where SNPs were coded using IUPAC ambiguity codes (e.g. an A/G heterozygote would be coded R). For population genetic analyses, we removed loci that had >20% missing data from vcf output of populations using vcftools and used PGDSpider v2.0.1.9 (Lischer & Excoffier, 2012) to convert to structure format. We also removed outgroup specimens, leading to a final data set of 241 individuals and 961 SNPs. GBS analyses ML tree searches of the GBS alignments were conducted identically to analyses of the COI alignment. IQ-TREE’s model selection procedure selected the transversion model (TVM) + I + Γ and TVM + Γ for the 121 513 bp sequence-based alignment and the 2572 SNP-based alignment, respectively. Bayesian analyses of the GBS alignments were also conducted the same as was done for COI, except that we ran the analyses for five million generations and sampled every 2500 generations. We used two methods to assess population structure in the 961 SNP data sets. First, we estimated the number of genetic clusters (K) with STRUCTURE v2.3.4 (Pritchard, Stephens & Donnelly, 2000; Falush, Stephens & Pritchard, 2003). Ten iterations of K = 1–10 were run with the admixture model, correlated allele frequencies, and without any prior information about geographic origin or species. A burn-in of 50 000 Markov chain Monte Carlo generations preceded 150 000 generations that were summarized using CLUMPAK (Kopelman et al., 2015). We considered LnPr(X|K) (Pritchard et al., 2000) and ΔK (Evanno, Regnaut & Goudet, 2005) to select the most likely value of K and conducted hierarchical structure analysis (Pritchard & Wen, 2004; Vähä et al., 2007) with the same parameters to address substructure in the main genetic clusters. Second, discriminant analysis of principal components (DAPC – Jombart, Devillard & Balloux, 2010) was implemented the adegenet library (Jombart, 2008) in R v3.3.1 (R Core Team, 2016). We used find.clusters to estimate K with default parameters and retaining all principal components (PCs), and cross-validation (xvaldapc) to determine the optimal number of PCs to retain in the discriminant analysis (testing 100 maximum PCs and 100 replicates). Hierarchical analysis was also conducted with DAPC. Morphological character scoring Two main categories of morphological wing characters are used to delimit Polygonia species in field guides: those based on discrete or ordinal characteristics of the wings (spots, shapes, etc.) and those based on colour. We assembled a data matrix from both and, given the inherent differences in data type, conducted independent analyses for each category. First, we reviewed the literature and field guides to summarize characters commonly used to diagnose and delimit the five lineages of Polygonia considered here (Supporting Information, Table S2). From this list, we selected characters that were considered highly diagnostic, and those with states that were unique to individual species. Characters that were present in similar states in multiple species were excluded (e.g. submarginal green spots on the ventral hindwing are considered highly diagnostic of P. faunus, but similar spots, although more yellow-green and slightly smaller, are found in P. satyrus and P. progne, respectively). We selected and scored ten discrete/ordinal characters, each with between two and five character states (Fig. 2), and refer to these as ‘visually scored characters’. Figure 2. View largeDownload slide Visually scored characters on dorsal and ventral wing surfaces, and proportion of each character state across species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. Figure 2. View largeDownload slide Visually scored characters on dorsal and ventral wing surfaces, and proportion of each character state across species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. Second, we assembled a set of colour luminance characters with insight from field guides and the literature, using a digital scoring approach developed for tortricid moths by Lumley & Sperling (2010). These characters were defined by shapes on the wings, and bounded by either scale patterns or veins so that they were reliably scored. Additionally, we required that these wing areas were present across all taxa (i.e. the colour of a spot could not be measured if the spot was absent in one species). Luminance values were measured from the standardized photographs of specimens in ImageJ (Schneider, Rasband & Eliceiri, 2012) using the Colour Histogram package (Prodanov, 2010). Red, green and blue (RGB) values were measured for all pixels within each wing area and averaged to create a set of three individual characters for each wing area. In total, six wing areas were measured, creating a final matrix of 18 characters (Fig. 3), and we refer to these as ‘RGB characters’. Figure 3. View largeDownload slide Red, green and blue (RGB) characters on the dorsal and ventral wing surfaces, and the average RGB luminance values (combined red, green and blue) visualized as colour swatches for each species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. Figure 3. View largeDownload slide Red, green and blue (RGB) characters on the dorsal and ventral wing surfaces, and the average RGB luminance values (combined red, green and blue) visualized as colour swatches for each species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. All morphological character scoring was conducted by CMM. Female P. faunus and P. satyrus exhibit two morphotypes, referred to as ‘smeared’ and ‘contrasted’ in this study (see Supporting Information, Fig. S1). Smeared morph females generally lack striations and contrast on the ventral side of the wings, whereas the contrasted morph females appear striated with alternating bands of light and dark coloration, much like their conspecific males. These morphs were treated as separate taxa for morphological analysis. Morphological analysis We used eigenvector-based multivariate analyses to maximize the variance among individuals for morphological characters and visualize morphological similarity between species. Because of the inherent differences between the visually scored and RGB characters (the former were discrete/ordinal and the latter continuous), separate multivariate analyses were used for each data set. Multiple correspondence analysis (MCA) and principal component analysis (PCA) were used for the visually scored and RGB characters, respectively. In some cases, these analyses did not provide resolution of species boundaries. In an effort to increase this resolution, we also conducted discriminant correspondence analysis (DCA) and linear discriminant analysis (LDA) (for visually scored and RGB characters, respectively), which incorporate a priori defined groups into the analysis. The results of the genetic analyses were unambiguous as to the species-level identifications, so we used SNP genetic clusters as the a priori species identity of individuals. Polygonia faunus and P. satyrus were further divided into morphological types (smeared vs. contrasted morphs), and P. gracilis was divided into putative subspecies P. g. gracilis and P. g. zephyrus based on diagnostic descriptions in field guides and collection location (Bird et al., 1995; Pyle, 2002; Shepard & Guppy, 2011). All analyses were conducted using the ade4 library (Dray & Dufour, 2007) in R. Individuals containing missing data (due to wing wear, missing characters, etc.) were removed prior to the analysis, resulting in the following sample sizes: Visually scored characters: P. faunus (smeared n = 23, contrasted n = 67), P. g. gracilis (n = 6), P. g. zephyrus (n = 11), P. progne (n = 62) and P. satyrus (smeared n = 11, contrasted n = 37) and RGB characters: P. faunus (smeared n = 28, contrasted n = 67), P. g. gracilis (n = 8), P. g. zephyrus (n = 11), P. progne (n = 68) and P. satyrus (smeared n = 17, contrasted n = 38). Character states for all individuals are provided in Supporting Information, Table S1. RESULTS Mitochondrial DNA All analyses of COI (parsimony-based haplotype networks and ML and BI tree searches) gave the same broad relationships between species that have been identified previously using mtDNA (Fig. 1; Supporting Information, Fig. S2). The only exception was a divergent mitochondrial clade (G-X) found in three individuals collected for this study. This clade is positioned basally to P. gracilis, P. progne and P. satyrus, but morphological and GBS data identified these individuals as P. gracilis (both subspecies). The low number of non-synonymous mutations (two in total) in this lineage supported a mitochondrial origin, rather than a nuclear insertion of a mitochondrial gene, and repeated extraction and sequencing confirmed that these sequences were not chimeras or the result of wet lab error. Apart from this divergent lineage, all species conformed to well-supported monophyletic clades (Supporting Information, Fig. S2). Within these clades, there was little structured genetic variation, and many individuals from disparate geographic locations and ecozones shared identical haplotypes (Fig. 4). Additionally, P. g. gracilis and P. g. zephyrus were not distinct in either phylogenetic or haplotype network analyses (Supporting Information, Fig. S2). Figure 4. View largeDownload slide A, cytochrome oxidase subunit I gene (COI) sampling localities divided by general ecozones and geographic proximity. Open circles represent specimens collected for this study [for which genotyping-by-sequencing (GBS) and morphological data may be available], and closed circles represent sequences from GenBank. B, Haplotype networks constructed from COI data set for each species (the dashed line between Polygonia gracilis and P. satyrus represent the break between those two species). Circles represent individual haplotypes, circle diameter is proportional to the number of individuals for each haplotype. Small dots represent missing haplotypes, and the cross-hatched line extending to haplotype G-X represents 22 substitutions. Some major haplotypes for P. gracilis, P. progne and P. satyrus were named in this study, those for P. faunus follow Kodandaramaiah et al. (2012). Figure 4. View largeDownload slide A, cytochrome oxidase subunit I gene (COI) sampling localities divided by general ecozones and geographic proximity. Open circles represent specimens collected for this study [for which genotyping-by-sequencing (GBS) and morphological data may be available], and closed circles represent sequences from GenBank. B, Haplotype networks constructed from COI data set for each species (the dashed line between Polygonia gracilis and P. satyrus represent the break between those two species). Circles represent individual haplotypes, circle diameter is proportional to the number of individuals for each haplotype. Small dots represent missing haplotypes, and the cross-hatched line extending to haplotype G-X represents 22 substitutions. Some major haplotypes for P. gracilis, P. progne and P. satyrus were named in this study, those for P. faunus follow Kodandaramaiah et al. (2012). Genotyping-by-sequencing In total, 452.8 million reads were generated in four lanes of sequencing for the specimens included in this study. Filtering with process_radtags reduced this to 345.0 million reads, and 315.5 million reads (average 1.3 million per individual) were incorporated into the de novo assembly for SNP calling, leading to 127 661 catalog loci. All ML and BI phylogenetic analyses of filtered GBS data (including flanking sequence or SNPs only) agreed on a single set of relationships between species (Figs 1, 5; Supporting Information, Figs S3–S5). As with mtDNA, P. faunus was basal to the remaining Polygonia species, but the remainder of the tree showed a sister relationship between the clades containing P. interrogationis + (P. comma + P. satyrus) and P. gracilis + P. progne. All species were reciprocally monophyletic and little geographic structure was apparent within species (Fig. 5). Bayesian runs converged by all metrics for the sequence-based analysis; however, several effective sample size values for the SNPs only analysis were below 200; regardless, the same main topology was found for both data sets. Figure 5. View largeDownload slide A, Sampling across the zone of contact with broad ecozones indicated by different shapes and black outlines. Black lines connecting shapes indicate comparable ecozones in Figure 4 that were subdivided here to provide more resolution. B, Consensus tree from maximum likelihood (ML) analysis of 121 513 bp alignment of genotyping-by-sequencing (GBS) data (including flanking sequence). Black asterisks indicate >95% ultra-fast bootstrap + >80% Shimodaira/Hasegawa approximate likelihood ratio test (SH-aLRT) from ML analysis and >0.9 posterior probability from Bayesian inference (BI); grey asterisks indicate highly supported nodes (using the aforementioned criteria) in only one analysis (generally nodes were well-supported in ML but not with BI). Grey circles for Polygonia gracilis individuals indicate P. g. zephyrus, and individuals with the G-X mitochondrial haplotype indicated with ‘G-X’. C, STRUCTURE barplots from 961 single nucleotide polymorphisms (SNPs) for K = 4 (overall clustering) and the finest level of substructure for each main cluster (excluding P. comma and P. interrogationis); no substructure was observed within P. gracilis or P. progne. Photographs, used with permission, by Kim Davis, Mike Strangeland and Andrew Warren. Figure 5. View largeDownload slide A, Sampling across the zone of contact with broad ecozones indicated by different shapes and black outlines. Black lines connecting shapes indicate comparable ecozones in Figure 4 that were subdivided here to provide more resolution. B, Consensus tree from maximum likelihood (ML) analysis of 121 513 bp alignment of genotyping-by-sequencing (GBS) data (including flanking sequence). Black asterisks indicate >95% ultra-fast bootstrap + >80% Shimodaira/Hasegawa approximate likelihood ratio test (SH-aLRT) from ML analysis and >0.9 posterior probability from Bayesian inference (BI); grey asterisks indicate highly supported nodes (using the aforementioned criteria) in only one analysis (generally nodes were well-supported in ML but not with BI). Grey circles for Polygonia gracilis individuals indicate P. g. zephyrus, and individuals with the G-X mitochondrial haplotype indicated with ‘G-X’. C, STRUCTURE barplots from 961 single nucleotide polymorphisms (SNPs) for K = 4 (overall clustering) and the finest level of substructure for each main cluster (excluding P. comma and P. interrogationis); no substructure was observed within P. gracilis or P. progne. Photographs, used with permission, by Kim Davis, Mike Strangeland and Andrew Warren. Population genetic analysis also supported these divisions. To evaluate the overall STRUCTURE analysis, we used two methods: Ln Pr(X|K) supported values of K between three and five, and ΔK supported K = 5. However, it was apparent that K = 5 only introduced a partial cluster to P. comma and P. interrogationis (each represented by a single individual), and K = 3 simply combined P. gracilis and P. progne. For these reasons, we chose K = 4 as the optimal value of K, which corresponded to the four species considered here (barplots of all values of K provided in Supporting Information, Fig. S6). Hierarchical analysis of these four clusters identified substructure in P. faunus (K = 2) and P. satyrus (K = 3) (Fig. 5; barplots of all values of K for hierarchical analysis provided in Supporting Information, Fig. S6). Although this substructure partially corresponds to distinct clades identified with the phylogenetic analyses, there are no clear geographic or habitat-related patterns except a distinct cluster of five individuals of P. satyrus from a single population in the foothills of the Rocky Mountains (Fig. 5). DAPC corroborated the overall pattern of genetic structure, with find.clusters predicting values of K from four to six, and the first and second discriminant functions strongly separating the four species (Supporting Information, Fig. S7). Some substructure was also supported by DAPC, notably the two main clusters of P. faunus and the distinct cluster of five individuals in P. satyrus (right most cluster in P. satyrus portion of Supporting Information, Fig. S7). Individuals of P. gracilis exhibiting the divergent G-X mitochondrial type were not distinct in any of the SNP-based analyses, and there was no support for separation of P. g. gracilis and P. g. zephyrus (Fig. 5). Morphological characters The plot of visually scored characters on the first two MCA axes produced several clusters with relatively little overlap that corresponded to the species clusters identified using SNP data (Fig. 6A). MCA axis 1 separated P. progne from the rest of the species and P. satyrus from P. gracilis, and axis 2 separated P. faunus from P. satyrus and P. gracilis. The only notable overlap between these clusters was between specimens of P. faunus and P. gracilis; although some of these individuals were worn, others showed morphological similarity in relatively fresh specimens. Gradients were evident between smeared and contrasted forms of P. faunus and P. satyrus, but no clustering was apparent for subspecies of P. gracilis. Based on the relative magnitudes of the canonical weights (loadings) of the MCA, several characters (notably characters 1, 5 and 6) were equally effective for separating P. progne from the rest of the species (Table 1), and a different set (characters 9, 2, 3 and 8) separated P. gracilis and P. satyrus from P. faunus. DCA of the visually scored characters (using a priori groupings based on SNP data) also separated the same main clusters but showed more overlap between species and less distinction between smeared/contrasted morphs (Fig. 6B). DCA axis 2 mainly separated P. satyrus from the other species, while DCA axis 1 separated P. faunus from P. progne, with P. gracilis as intermediate. Character 9 was most effective at separating species along the first axis (P. faunus from P. progne), and characters 9 and 10 were most effective along the second axis for the clear separation of P. satyrus from the rest of the species (Table 1). Third axes of MCA and DCA provided no additional separation of any species clusters (Supporting Information, Fig. S8). Figure 6. View largeDownload slide Multivariate analyses of wing characters. Multiple correspondence analysis (A) and discriminant correspondence analysis (B) of visually scored characters, and principal components analysis (C) and linear discriminant analysis (D) of red, green and blue (RGB) characters. Decimal values on axis labels indicate proportion of total variation explained [multiple correspondence analysis (MCA) and principal component analysis (PCA)] and the remaining proportion of variation [discriminant correspondence analysis (DCA) and linear discriminant analysis (LDA)]. Species inclusion for each specimen was determined by membership in single nucleotide polymorphism (SNP) clusters. Figure 6. View largeDownload slide Multivariate analyses of wing characters. Multiple correspondence analysis (A) and discriminant correspondence analysis (B) of visually scored characters, and principal components analysis (C) and linear discriminant analysis (D) of red, green and blue (RGB) characters. Decimal values on axis labels indicate proportion of total variation explained [multiple correspondence analysis (MCA) and principal component analysis (PCA)] and the remaining proportion of variation [discriminant correspondence analysis (DCA) and linear discriminant analysis (LDA)]. Species inclusion for each specimen was determined by membership in single nucleotide polymorphism (SNP) clusters. Table 1. Canonical loadings for three axes from multiple correspondence analysis (MCA) and discriminant correspondence analysis (DCA) of ten visually scored characters and principal component analysis (PCA) and linear discriminant analysis (LDA) of 18 mean red, green and blue luminance values from six RGB characters Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 View Large Table 1. Canonical loadings for three axes from multiple correspondence analysis (MCA) and discriminant correspondence analysis (DCA) of ten visually scored characters and principal component analysis (PCA) and linear discriminant analysis (LDA) of 18 mean red, green and blue luminance values from six RGB characters Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 View Large PCA of the RGB characters did not separate any species (Fig. 6C). Despite the fact that the first PC axis explained a relatively large amount of variation (54.8%), all species overlapped in all major axes and relative intraspecific variation was high. LDA of the same characters (using prior groupings) resulted in more overall separation between species than in the PCA; however, overlap was still present in some individuals. Polygonia satyrus was the only species that was more-or-less completely separated, along axis 1, and aspects of characters 15 and 16 (both ventral characters) were most effective at separating this species. There was no apparent clustering of any smeared vs. contrasted morphs, nor of P. gracilis subspecies. Based on pairwise comparisons of their luminance means, P. progne and P. g. zephyrus were most different (Supporting Information, Table S3, Fig. S9). Again, the third axes of PCA and LDA provided no additional separation (Supporting Information, Fig. S8). DISCUSSION Species limits and relationships All genetic data sets and analyses supported clear species limits between the four widespread species considered here: P. faunus, P. gracilis, P. progne and P. satyrus. Within these lineages, there was little or no intraspecific geographic clustering with genome-wide SNPs or mtDNA and no signatures of admixture or introgression between species. The only geographically coherent clustering we observed was with SNP data in individuals of P. satyrus from a single population in the foothills of the Rocky Mountains, which formed a monophyletic clade in phylogenetic analyses and a distinct genetic cluster using STRUCTURE (Fig. 5). These individuals were not distinctive with COI and exhibited one of the most common mitochondrial haplotypes (Supporting Information, Fig. S2). Additionally, they were all collected from the same locality on the same day, so siblingship may easily explain their genetic similarity. The simplest explanation for the broad patterns of genetic differentiation among these four Polygonia species is that reproductive isolation between species is strong, and evolutionary forces (retained ancestral polymorphism, high levels of gene flow, etc.) are maintaining homogeneity within these lineages. At least some Polygonia species undergo two-way migrations (P. interrogationis: Layberry et al., 1998), and all Polygonia have high dispersal capability (Braschler & Hill, 2007; Burke, Fitzsimmons & Kerr, 2011). Given their longevity (Polygonia overwinter as adults, with individual lifespans of 10–12 months; Bird et al., 1995), high dispersal rates could explain the low interspecific variation we observed here, particularly in combination with high levels of retained ancestral polymorphism. Another consideration is that the geographic scale of our sampling was relatively small compared to these species’ full distributions (Layberry et al., 1998). Polygonia faunus, for example, is found from Alaska south to California and New Mexico, and across Canada and the northern United States to the Atlantic coast; Kodandaramaiah et al. (2012) found two mitochondrial haplotype groups and around five nuclear genetic clusters using microsatellites across the range of P. faunus. A similar pattern has been found in P. c-album (Linnaeus, 1758), a species found across Europe, North Africa and Asia, where microsatellites only show patterns of strong differentiation at the largest geographic scales (Kodandaramaiah et al., 2011). However, the smaller area of distributional overlap that we sampled creates an opportunity to test the genomic integrity of these species. Despite their morphological variation and similarity, we found no signatures of admixture or introgression between these species, further supporting the hypothesis of strong reproductive isolation. This lack of introgression is an interesting contrast to some other Nymphalidae, where introgression and reticulate evolution has played an important evolutionary role [e.g. Heliconius spp. (Heliconius Genome Consortium, 2012; Pardo-Diaz et al., 2012; Zhang et al., 2016)]. In keeping with the observed lack of geographic patterns within species, we found no genetic support for differentiation of the subspecies of P. gracilis: P. g. gracilis and P. g. zephyrus. There is substantial disagreement in the literature regarding these lineages, with some considering them to be separate species (Guppy & Shephard, 2001; Wahlberg et al., 2009) and others conspecifics (Layberry et al., 1998; Pelham, 2008). Morphologically, there is a continuum between P. g. gracilis-like and P. g. zephyrus-like characteristics, and while the extremes of this continuum are morphologically diagnosable, many individuals from sympatric regions fall in the middle (Wahlberg et al., 2009). Genetically, they share haplotypes for several mitochondrial and nuclear genes, as well as with other close relatives – P. progne, P. oreas, P. haroldii (Dewitz, 1877) (Wahlberg et al., 2009). Although our sample size is small for these lineages, genome-wide data support their conspecific status. Given the clear species limits between the lineages considered here, a logical step is to consider their phylogenetic relationships. However, our phylogenetic sampling is not comprehensive; we are missing five species that are thought to be interspersed among the taxa considered here [P. c-album, P. g-argenteum (Doubleday, 1848), P. haroldii, P. interposita (Staudinger, 1881) and P. oreas] and four basal species [P. c-aureum Linnaeus, 1758, P. egea (Cramer, 1775), P. gigantea (Leech, 1883) and P. undina (Grum-Grshimailo, 1890)]. For this reason, we will only briefly address phylogenetic considerations, and for more comprehensive phylogenetic discussions direct readers to Nylin et al. (2001), Wahlberg & Nylin (2003) and Wahlberg et al. (2009). Most notably, we observed similar cytonuclear discordance between mitochondrial and nuclear markers as has been observed using traditional gene sequencing approaches (Fig. 1; Nylin et al., 2001; Wahlberg & Nylin, 2003; Wahlberg et al., 2009). Discordance between relationships based on mtDNA and nuclear markers is well documented (Funk & Omland, 2003; Chan & Levin, 2005; Dupuis et al., 2012), and in Polygonia has been explained by ancestral hybridization creating shared haplotypes between species (Wahlberg et al., 2009) or infection with the maternally inherited bacterium Wolbachia (Kodandaramaiah et al., 2011, 2012). Both phenomena may have acted to obfuscate the relationships dictated by mtDNA, and we consider the true species relationships to lie closer to those predicted using more extensive nuclear markers and non-genetic data (to the extent that it can be illustrated using a bifurcating tree given the reticulate history of the group [Wahlberg et al., 2009; see Soucy, Huang & Gogarten (2015)]. Our discovery of the divergent G-X mitochondrial lineage in specimens of P. gracilis (Figs 1, 4) that are indistinguishable from other P. gracilis using SNPs further complicates the reticulate history of the group (Wahlberg et al., 2009). Whether this is the result of ancient hybridization or retention of an ancestral mitochondrial haplotype (Melo-Ferreira et al., 2014; Dupuis & Sperling, 2015; Naciri & Linder, 2015) is difficult to determine given the current sampling. It is clear that genome-wide assessments of genetic diversity with comprehensive range-wide sampling of each species will be essential to further understand these evolutionary histories. Genetic data informing morphological diagnostics For each of the morphological data sets (visually scored and RGB characters), we conducted two sets of analyses: one without prior information on species designations and one with prior information based on species identity from mtDNA and SNP data. While the visually scored characters separated species without guidance from genetic data (Fig. 6A), RGB characters did not (Fig. 6C). Interestingly, with the guidance of genetic data, the separation of species using visually scored characters actually decreased slightly (Fig. 6B), whereas delimitations using RGB characters improved but only to the point of distinguishing P. satyrus (Fig. 6D). While the respective performance of visually scored and RGB characters is likely due to the inherent information content of each of these character sets, there are also some species-specific differences. For instance, overall, P. satyrus was the most consistently separated species from the rest in analyses with prior species designations (DCA and LDA), despite P. progne having the largest degree of separation using designation-free MCA (Fig. 6A). We considered the relative loadings of characters for the axes from each analysis to infer which characters were most effectively separating, or failing to separate, species. For example, the division of the dorsal forewing spot across vein CuA2 (character 1, Fig. 2), contributed most to separation along the first axis of the MCA, which notably separated P. progne from the rest of the species (Fig. 6; Table 1). Given the prevalence of this character as a delimiting feature in the literature, this is unsurprising. However, other characters that are equally prevalent in the literature were not as diagnostic as expected, such as the shape of the silver comma (character 8). This may be because diagnostic characters in field guides are often written to allow users to distinguish between pairs of similar species, rather than all species in a group; a single character with overlapping states between P. satyrus and P. progne would not affect their species identification since they are otherwise easily distinguished. In fact, many of these characters would be diagnostic if P. faunus were removed from the taxon pool, as the variation in character states of this species was broad (Fig. 2). For example, every state of the aforementioned shape of the silver comma (character 8) was found in at least some specimens of P. faunus included in our study, several individuals did not fit neatly into any of the character states, and one specimen exhibited a marking that was upside-down. Surprisingly, we found few cases of visually scored characters being equally diagnostic across analyses (i.e. the same character contributing most to the separation of a given species in both MCA and DCA). The prominence of ventral forewing spots basal to the submarginal chevrons (character 9) was an exception to this trend, as it was the most explanatory character across axis 2 in the MCA and axis 1 in the DCA (Table 1). Neither of these axes clearly separated one species from the rest, but both displayed gradients from a large group of P. faunus to the other species, albeit with some overlap (Fig. 6). Overall, the following visually scored characters provided the most discrimination in these analyses: discal forewing spots across CuA2 (character 1), ventral forewing spots basal to the submarginal chevrons (character 9) and contrast between basal and distal halves of the ventral hindwing (character 10). Average RGB luminance measurements did not perform well in separating the species in our study. When prior species identifications were employed in an LDA, P. satyrus was the only species distinguished without substantial overlap (Fig. 6). Some characters that are frequently used in species diagnoses in the literature were informative for this separation, such as two ventral hindwing characters: the submarginal region of the ventral hindwing near the comma (character 15) and the overall colour of the ventral hind wing (character 16). Others were not as informative despite similar use in the literature, such as the shade of orange on the dorsal forewing (characters 11, 12 and 13) (Bird et al., 1995; Pyle, 2002). One character in particular, the overall colour of the ventral hind wing (character 16), encompassed a large portion of the wing with highly contrasting patterns. While we expected that averaging the luminance values across this contrast might obfuscate the overall colour of the character state, this was one of the more informative RGB characters. Qualitatively, many wing areas appear very similar in average colour across the species (Fig. 3), which may be due to relative amounts of reddish and black pigments (melanin) producing the same hue but different tones (Hovanitz, 1941). Overall, the utility of colour or luminance-based morphological characters may depend on group-specific characteristics, as they have been used to successfully delimit species in other groups (Lumley & Sperling, 2010; Watanabe, Nakazono & Ono, 2012; McKay et al., 2014). Polygonia may be a particularly difficult group for quantifying colour differences, or alternatively our focus on several species at once could be overcomplicating delimitation of these species. Although it is beyond the scope of this paper, combining quantitative methods such as those used here with species pair sampling across the morphological diversity of each species should more comprehensively address the transition from species delimitation to species identification. CONCLUSION Here, we assessed species limits in an area of widespread overlap between four species of Polygonia butterflies in western Canada. mtDNA and genome-wide SNPs provide clear species delimitation, despite well-documented cytonuclear discordance. Further, there was little evidence for population structure or subspecific distinctions. Visually and digitally scored morphological characters were variably successful for separating species, and the application of genetic data to guide morphological analyses improved species delimitation for the digitally scored data set. We identified several morphological characters that are highly informative for separating species, and more geographically comprehensive sampling is now needed to test the efficacy of the most informative diagnostic characters that we identified for field identifications of this morphologically difficult group. This application of genetic data to guide morphological analysis provides a framework for iterative approaches within broader integrative taxonomy and will undoubtedly continue to help taxonomists to resolve complicated groups. SUPPORTING INFORMATION Additional Supporting Information may be found in the online version of this article at the publisher's web-site: Table S1. Specimen collection information, mtDNA GenBank accessions and morphological data. ‘mtDNA only’ indicates specimens used from Wahlberg et al. (2009). Table S2. Diagnostic characters of Polygonia butterflies published in North American field guides, with an emphasis on western North America. If an author designated a character as being primary, or highly diagnostic, it is indicated with a grey background. DFW, dorsal forewing; DHW, dorsal hindwing; VFW, ventral forewing; VHW, ventral hindwing. Table S3. Summary of pairwise luminance mean values with non-overlapping confidence intervals in pairwise comparisons. Figure S1. Examples of smeared and contrasted morphs of P. faunus and P. satyrus used in this study. Figure S2. Consensus tree from ML analysis of unique COI haplotypes. Dots on nodes indicate high support: >80% SH-aLRT and >95% ultra-fast bootstrap in ML analysis, and >0.8 posterior probability in Bayesian analysis. Major clades are named (as in Fig. 4) in grey text, and grey circles indicate P. g. zephyrus individuals. Figure S3. Consensus tree from ML analysis of 2572 SNP data set. Node support values are SH-aLRT/ultra-fast bootstrap. Figure S4. Fifty per cent majority rule consensus from BI analysis of 121 513 bp alignment of GBS data (including flanking sequence) with posterior probability values for node support. Figure S5. Fifty per cent majority rule consensus from BI analysis of 2572 SNPs from GBS data with posterior probability values for node support. Figure S6. ΔK and LnPr(X|K) results, and STRUCTURE results for all values of K, for all individuals, P. faunus, P. satyrus, P. gracilis & P. progne, P. gracilis and P. progne. Figure S7. Discriminant analysis of principal components (DAPC) on 961 SNPs. Top left shows overall DAPC, and remaining panels are individual species coloured by localities. Polygonia gracilis did not have adequate sampling across localities to analyze by itself. Insets show relative explanatory power of individual discriminant functions. Figure S8. Multivariate analyses of wing characters, displaying third axes of each analysis. Multiple correspondence analysis (A) and discriminant correspondence analysis (B) of visually scored characters, and principal components analysis (C) and linear discriminant analysis (D) of RGB characters. Decimal values on axis labels indicate proportion of total variation explained (MCA, PCA) and the remaining proportion of variation (DCA, LDA). Species inclusion for each specimen was determined by membership in SNP clusters. Figure S9. Central tendency statistics of mean R, G and B values of six wing characters across 248 Polygonia butterflies. The upper and lower boundaries of each box are third and first quartiles, respectively; vertical span of the notches correspond to the upper and lower 95% confidence bounds; the mean is represented by black lines bisecting each box. The upper and lower fences (whiskers) indicate the maximum and minimum values in the data set, while outliers – those outside 1.5 times the interquartile range above the fourth quartile or below the first quartile – are shown as black points. ACKNOWLEDGEMENTS We are grateful to Benny Acorn, Gary Anweiler, Heather Bird, Bryan Brunet, Lina Söderlind and Janet Sperling for assistance in collecting specimens; Kurt Yakimovich for assistance with wet lab work; and Kevin Muirhead for bioinformatic assistance. Funding was provided by an Alberta Conservation Association grant to CMM (#RES0017428) and a National Science and Engineering Research Council Discovery grant to FAHS (#217174). This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada Calcul Canada (www.computecanada.ca). Data files are provided in Supporting Information, and raw data are deposited as Sequence Read Archives in the National Center for Biotechnology Information under accessions SRR5725309–SRR5725392 (BioProject PRJNA390990). REFERENCES Aberer AJ , Kobert K , Stamatakis A . 2014 . ExaBayes: massively parallel Bayesian tree inference for the whole-genome era . Molecular Biology and Evolution 31 : 2553 – 2556 . Google Scholar CrossRef Search ADS PubMed Acorn JH , Sheldon I . 2006 . Butterflies of British Columbia . Vancouver : Lone Pine Publishing . Andrews KR , Good JM , Miller MR , Luikart G , Hohenlohe PA . 2016 . Harnessing the power of RADseq for ecological and evolutionary genomics . Nature Reviews: Genetics 17 : 81 – 92 . Google Scholar CrossRef Search ADS PubMed Bickford D , Lohman DJ , Sodhi NS , Ng PK , Meier R , Winker K , Ingram KK , Das I . 2007 . Cryptic species as a window on diversity and conservation . Trends in Ecology & Evolution 22 : 148 – 155 . Google Scholar CrossRef Search ADS PubMed Bird CD , Hilchie GJ , Kondla NG , Pike EM , Sperling FAH . 1995 . Alberta butterflies . Edmonton : Provincial Museum of Alberta . Bonney R , Shirk JL , Phillips TB , Wiggins A , Ballard HL , Miller-Rushing AJ , Parrish JK . 2014 . Citizen science. Next steps for citizen science . Science 343 : 1436 – 1437 . Google Scholar CrossRef Search ADS PubMed Bortolus A . 2008 . Error cascades in the biological sciences: the unwanted consequences of using bad taxonomy in ecology . AMBIO 37 : 114 – 118 . Google Scholar CrossRef Search ADS PubMed Braschler B , Hill JK . 2007 . Role of larval host plants in the climate-driven range expansion of the butterfly Polygonia c-album . The Journal of Animal Ecology 76 : 415 – 423 . Google Scholar CrossRef Search ADS PubMed Brock JP , Kaufman K . 2003 . Kaufman field guide to butterflies of North America . Boston : Houghton Mifflin Harcourt . Burke RJ , Fitzsimmons JM , Kerr JT . 2011 . A mobility index for Canadian butterfly species based on naturalists’ knowledge . Biodiversity and Conservation 20 : 2273 – 2295 . Google Scholar CrossRef Search ADS Catchen JM , Amores A , Hohenlohe P , Cresko W , Postlethwait JH . 2011 . Stacks: building and genotyping loci de novo from short-read sequences . G3 (Bethesda, Md.) 1 : 171 – 182 . Google Scholar CrossRef Search ADS PubMed Chan KM , Levin SA . 2005 . Leaky prezygotic isolation and porous genomes: rapid introgression of maternally inherited DNA . Evolution 59 : 720 – 729 . Google Scholar CrossRef Search ADS PubMed Clement M , Posada D , Crandall KA . 2000 . TCS: a computer program to estimate gene genealogies . Molecular Ecology 9 : 1657 – 1659 . Google Scholar CrossRef Search ADS PubMed Cruaud A , Gautier M , Galan M , Foucaud J , Sauné L , Genson G , Dubois E , Nidelet S , Deuve T , Rasplus JY . 2014 . Empirical assessment of RAD sequencing for interspecific phylogeny . Molecular Biology and Evolution 31 : 1272 – 1274 . Google Scholar CrossRef Search ADS PubMed DaCosta JM , Sorenson MD . 2016 . ddRAD-seq phylogenetics based on nucleotide, indel, and presence-absence polymorphisms: analyses of two avian genera with contrasting histories . Molecular Phylogenetics and Evolution 94 : 122 – 135 . Google Scholar CrossRef Search ADS PubMed Danecek P , Auton A , Abecasis G , Albers CA , Banks E , DePristo MA , Handsaker RE , Lunter G , Marth GT , Sherry ST , McVean G , Durbin R ; 1000 Genomes Project Analysis Group . 2011 . The variant call format and VCFtools . Bioinformatics 27 : 2156 – 2158 . Google Scholar CrossRef Search ADS PubMed Dayrat B . 2005 . Towards integrative taxonomy . Biological Journal of the Linnean Society 85 : 407 – 415 . Google Scholar CrossRef Search ADS Dejaco T , Gassner M , Arthofer W , Schlick-Steiner BC , Steiner FM . 2016 . Taxonomist’s nightmare … evolutionist’s delight: an integrative approach resolves species limits in jumping bristletails despite widespread hybridization and parthenogenesis . Systematic Biology 65 : 947 – 974 . Google Scholar CrossRef Search ADS PubMed Devictor V , Whittaker RJ , Beltrame C . 2010 . Beyond scarcity: citizen science programmes as useful tools for conservation biogeography . Diversity and Distributions 16 : 354 – 362 . Google Scholar CrossRef Search ADS Dray S , Dufour A-B . 2007 . The ade4 package: implementing the duality diagram for ecologists . Journal of Statistical Software 22 : 1 – 20 . Google Scholar CrossRef Search ADS Drummond AJ , Suchard MA , Xie D , Rambaut A . 2012 . Bayesian phylogenetics with BEAUti and the BEAST 1.7 . Molecular Biology and Evolution 29 : 1969 – 1973 . Google Scholar CrossRef Search ADS PubMed Dumas P , Barbut J , Le Ru B , Silvain JF , Clamens AL , d’Alençon E , Kergoat GJ . 2015 . Phylogenetic molecular species delimitations unravel potential new species in the pest genus Spodoptera Guenée, 1852 (Lepidoptera, Noctuidae) . PLoS One 10 : e0122407 . Google Scholar CrossRef Search ADS PubMed Dupuis JR , Brunet BMT , Bird HM , Lumley LM , Fagua G , Boyle B , Levesque R , Cusson M , Powell JA , Sperling FAH . 2017 . Genome-wide SNPs resolve phylogenetic relationships in the North American spruce budworm (Choristoneura fumiferana) species complex . Molecular Phylogenetics and Evolution 111 : 158 – 168 . Google Scholar CrossRef Search ADS PubMed Dupuis JR , Roe AD , Sperling FA . 2012 . Multi-locus species delimitation in closely related animals and fungi: one marker is not enough . Molecular Ecology 21 : 4422 – 4436 . Google Scholar CrossRef Search ADS PubMed Dupuis JR , Sperling FA . 2015 . Repeated reticulate evolution in North American Papilio machaon group swallowtail butterflies . PLoS One 10 : e0141882 . Google Scholar CrossRef Search ADS PubMed Evanno G , Regnaut S , Goudet J . 2005 . Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study . Molecular Ecology 14 : 2611 – 2620 . Google Scholar CrossRef Search ADS PubMed Falush D , Stephens M , Pritchard JK . 2003 . Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies . Genetics 164 : 1567 – 1587 . Google Scholar PubMed Frey JE , Guillén L , Frey B , Samietz J , Rull J , Aluja M . 2013 . Developing diagnostic SNP panels for the identification of true fruit flies (Diptera: Tephritidae) within the limits of COI-based species delimitation . BMC Evolutionary Biology 13 : 106 . Google Scholar CrossRef Search ADS PubMed Funk DJ , Omland KE . 2003 . Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insights from animal mitochondrial DNA . Annual Review of Ecology, Evolution, and Systematics 34 : 397 – 423 . Google Scholar CrossRef Search ADS Gratton P , Trucchi E , Trasatti A , Riccarducci G , Marta S , Allegrucci G , Cesaroni D , Sbordoni V . 2016 . Testing classical species properties with contemporary data: how “bad species” in the brassy ringlets (Erebia tyndarus complex, Lepidoptera) turned good . Systematic Biology 65 : 292 – 303 . Google Scholar CrossRef Search ADS PubMed Guindon S , Dufayard JF , Lefort V , Anisimova M , Hordijk W , Gascuel O . 2010 . New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0 . Systematic Biology 59 : 307 – 321 . Google Scholar CrossRef Search ADS PubMed Guppy CS , Shephard JH . 2001 . Butterflies of British Columbia . Vancouver : University of British Columbia Press in collaboration with the Royal British Columbia Museum . Hedin M . 2015 . High-stakes species delimitation in eyeless cave spiders (Cicurina, Dictynidae, Araneae) from central Texas . Molecular Ecology 24 : 346 – 361 . Google Scholar CrossRef Search ADS PubMed Heliconius Genome Consortium . 2012 . Butterfly genome reveals promiscuous exchange of mimicry adaptations among species . Nature 487 : 94 – 98 . CrossRef Search ADS PubMed Hovanitz W . 1941 . Parallel ecogenotypical color variation in butterflies . Ecology 22 : 259 – 284 . Google Scholar CrossRef Search ADS Howard E , Davis AK . 2009 . The fall migration flyways of monarch butterflies in eastern North America revealed by citizen scientists . Journal of Insect Conservation 13 : 279 – 286 . Google Scholar CrossRef Search ADS Jombart T . 2008 . adegenet: a R package for the multivariate analysis of genetic markers . Bioinformatics 24 : 1403 – 1405 . Google Scholar CrossRef Search ADS PubMed Jombart T , Devillard S , Balloux F . 2010 . Discriminant analysis of principal components: a new method for the analysis of genetically structured populations . BMC Genetics 11 : 94 . Google Scholar CrossRef Search ADS PubMed Jones JC , Fan S , Franchini P , Schartl M , Meyer A . 2013 . The evolutionary history of Xiphophorus fish and their sexually selected sword: a genome-wide approach using restriction site-associated DNA sequencing . Molecular Ecology 22 : 2986 – 3001 . Google Scholar CrossRef Search ADS PubMed Kearse M , Moir R , Wilson A , Stones-Havas S , Cheung M , Sturrock S , Buxton S , Cooper A , Markowitz S , Duran C , Thierer T , Ashton B , Meintjes P , Drummond A . 2012 . Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data . Bioinformatics 28 : 1647 – 1649 . Google Scholar CrossRef Search ADS PubMed Kodandaramaiah U , Weingartner E , Janz N , Dalén L , Nylin S . 2011 . Population structure in relation to host-plant ecology and Wolbachia infestation in the comma butterfly . Journal of Evolutionary Biology 24 : 2173 – 2185 . Google Scholar CrossRef Search ADS PubMed Kodandaramaiah U , Weingartner E , Janz N , Leski M , Slove J , Warren A , Nylin S . 2012 . Investigating concordance among genetic data, subspecies circumscriptions and host plant use in the nymphalid butterfly Polygonia faunus . PLoS One 7 : e41058 . Google Scholar CrossRef Search ADS PubMed Kopelman NM , Mayzel J , Jakobsson M , Rosenberg NA , Mayrose I . 2015 . Clumpak: a program for identifying clustering modes and packaging population structure inferences across K . Molecular Ecology Resources 15 : 1179 – 1191 . Google Scholar CrossRef Search ADS PubMed Larrivee M , Prudic KL , McFarland K , Kerr J . 2014 . eButterfly: a citizen-based butterfly database in the biological sciences . Available at: http://www.e-butterfly.org/ Layberry RA , Hall PW , Lafontaine JD . 1998 . The butterflies of Canada . Toronto : University of Toronto Press . Google Scholar CrossRef Search ADS Leaché AD , Fujita MK , Minin VN , Bouckaert RR . 2014 . Species delimitation using genome-wide SNP data . Systematic Biology 63 : 534 – 542 . Google Scholar CrossRef Search ADS PubMed Lischer HE , Excoffier L . 2012 . PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs . Bioinformatics 28 : 298 – 299 . Google Scholar CrossRef Search ADS PubMed Lumley LM , Sperling FAH . 2010 . Integrating morphology and mitochondrial DNA for species delimitation within the spruce budworm (Choristoneura fumiferana) cryptic species complex (Lepidoptera: Tortricidae) . Systematic Entomology 35 : 416 – 428 . Google Scholar CrossRef Search ADS Maddison WP . 1997 . Gene trees in species trees . Systematic Biology 46 : 523 – 536 . Google Scholar CrossRef Search ADS Maddison WP , Maddison DR . 2008 . Mesquite: a modular system for evolutionary analysis . Evolution 62 : 1103 – 1118 . Google Scholar CrossRef Search ADS PubMed McKay BD , Mays HL Jr , Yao CT , Wan D , Higuchi H , Nishiumi I . 2014 . Incorporating color into integrative taxonomy: analysis of the varied tit (Sittiparus varius) complex in East Asia . Systematic Biology 63 : 505 – 517 . Google Scholar CrossRef Search ADS PubMed Melo-Ferreira J , Seixas FA , Cheng E , Mills LS , Alves PC . 2014 . The hidden history of the snowshoe hare, Lepus americanus: extensive mitochondrial DNA introgression inferred from multilocus genetic variation . Molecular Ecology 23 : 4617 – 4630 . Google Scholar CrossRef Search ADS PubMed Minh BQ , Nguyen MA , von Haeseler A . 2013 . Ultrafast approximation for phylogenetic bootstrap . Molecular Biology and Evolution 30 : 1188 – 1195 . Google Scholar CrossRef Search ADS PubMed Müller P , Pflüger V , Wittwer M , Ziegler D , Chandre F , Simard F , Lengeler C . 2013 . Identification of cryptic Anopheles mosquito species by molecular protein profiling . PLoS One 8 : e57486 . Google Scholar CrossRef Search ADS PubMed Naciri Y , Linder HP . 2015 . Species delimitation and relationships: The dance of the seven veils . Taxon 64 : 3 – 16 . Google Scholar CrossRef Search ADS Nadeau NJ , Martin SH , Kozak KM , Salazar C , Dasmahapatra KK , Davey JW , Baxter SW , Blaxter ML , Mallet J , Jiggins CD . 2013 . Genome-wide patterns of divergence and gene flow across a butterfly radiation . Molecular Ecology 22 : 814 – 826 . Google Scholar CrossRef Search ADS PubMed Nguyen LT , Schmidt HA , von Haeseler A , Minh BQ . 2015 . IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies . Molecular Biology and Evolution 32 : 268 – 274 . Google Scholar CrossRef Search ADS PubMed Nylin S , Nybolm K , Ronquist F , Janz N , Belicek J , Källersjo M . 2001 . Phylogeny of Polygonia, Nymphalis and related butterflies (Lepidoptera: Nymphalidae): a total-evidence analysis . Zoological Journal of the Linnean Society 132 : 441 – 448 . Google Scholar CrossRef Search ADS Nylin S , Söderlind L , Gamberale-Stille G , Audusseau H , Celorio-Mancera MDLP , Janz N , Sperling FAH . 2015 . Vestiges of an ancestral host plant: preference and performance in the butterfly Polygonia faunus and its sister species P. c-album . Ecological Entomology 40 : 307 – 315 . Google Scholar CrossRef Search ADS Pante E , Abdelkrim J , Viricel A , Gey D , France SC , Boisselier MC , Samadi S . 2015 . Use of RAD sequencing for delimiting species . Heredity 114 : 450 – 459 . Google Scholar CrossRef Search ADS PubMed Pardo-Diaz C , Salazar C , Baxter SW , Merot C , Figueiredo-Ready W , Joron M , McMillan WO , Jiggins CD . 2012 . Adaptive introgression across species boundaries in Heliconius butterflies . PLoS Genetics 8 : e1002752 . Google Scholar CrossRef Search ADS PubMed Pelham JP . 2008 . A catalogue of the butterflies of the United States and Canada, with a complete bibliography of the descriptive and systematic literature . The Journal of Research on the Lepidoptera 40 : 1 – 672 . Poland JA , Brown PJ , Sorrells ME , Jannink JL . 2012 . Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach . PLoS One 7 : e32253 . Google Scholar CrossRef Search ADS PubMed Pritchard JK , Stephens M , Donnelly P . 2000 . Inference of population structure using multilocus genotype data . Genetics 155 : 945 – 959 . Google Scholar PubMed Pritchard JK , Wen X . 2004 . Documentation for structure software: version 2 . Chicago : University of Chicago Press . Prodanov D . 2010 . Color histogram . Available at: http://imagej.nih.gov.ij/plugins/color-histogram.html Proshek B , Dupuis JR , Engberg A , Davenport K , Opler PA , Powell JA , Sperling FA . 2015 . Genetic evaluation of the evolutionary distinctness of a federally endangered butterfly, Lange’s Metalmark . BMC Evolutionary Biology 15 : 73 . Google Scholar CrossRef Search ADS PubMed Pyle RM . 2002 . The butterflies of Cascadia: a field guide to all the species of Washington, Oregon, and surrounding territories . Seattle : Seattle Audubon Society . R Core Team . 2016 . R: a language and environment for statistical computing . Vienna : R Foundation for Statistical Computing . Available at: https://www.r-project.org/ Rambaut A , Drummond AJ . 2010 . FigTree v1.4.2 . Edinburgh, UK: Institute of Evolutionary Biology, University of Edinburgh . Available at: http://tree.bio.ed.ac.uk/software/figtree Rambaut A , Suchard MA , Xie D , Drummond AJ . 2014 . Tracer v1.6 . Available at: http://beast.bio.ed.ac.uk/Tracer Razkin O , Sonet G , Breugelmans K , Madeira MJ , Gómez-Moliner BJ , Backeljau T . 2016 . Species limits, interspecific hybridization and phylogeny in the cryptic land snail complex Pyramidula: the power of RADseq data . Molecular Phylogenetics and Evolution 101 : 267 – 278 . Google Scholar CrossRef Search ADS PubMed Schlick-Steiner BC , Steiner FM , Seifert B , Stauffer C , Christian E , Crozier RH . 2010 . Integrative taxonomy: a multisource approach to exploring biodiversity . Annual Review of Entomology 55 : 421 – 438 . Google Scholar CrossRef Search ADS PubMed Schneider CA , Rasband WS , Eliceiri KW . 2012 . NIH Image to ImageJ: 25 years of image analysis . Nature Methods 9 : 671 – 675 . Google Scholar CrossRef Search ADS PubMed Shepard J , Guppy CS . 2011 . Butterflies of British Columbia: including Western Alberta, Southern Yukon, the Alaska Panhandle, Washington, Northern Oregon, Northern Idaho, and Northwestern Montana . Vancouver : University of British Columbia Press . Silvertown J . 2009 . A new dawn for citizen science . Trends in Ecology & Evolution 24 : 467 – 471 . Google Scholar CrossRef Search ADS PubMed Sonah H , Bastien M , Iquira E , Tardivel A , Légaré G , Boyle B , Normandeau É , Laroche J , Larose S , Jean M , Belzile F . 2013 . An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping . PLoS One 8 : e54603 . Google Scholar CrossRef Search ADS PubMed Soucy SM , Huang J , Gogarten JP . 2015 . Horizontal gene transfer: building the web of life . Nature Reviews: Genetics 16 : 472 – 482 . Google Scholar CrossRef Search ADS PubMed Sperling JLH , Sperling FAH . 2009 . Lyme borreliosis in Canada: biological diversity and diagnostic complexity from an entomological perspective . The Canadian Entomologist 141 : 521 – 549 . Google Scholar CrossRef Search ADS Stajich JE , Block D , Boulez K , Brenner SE , Chervitz SA , Dagdigian C , Fuellen G , Gilbert JG , Korf I , Lapp H , Lehväslaiho H , Matsalla C , Mungall CJ , Osborne BI , Pocock MR , Schattner P , Senger M , Stein LD , Stupka E , Wilkinson MD , Birney E . 2002 . The BioPerl toolkit: Perl modules for the life sciences . Genome Research 12 : 1611 – 1618 . Google Scholar CrossRef Search ADS PubMed Steel M , Penny D . 2000 . Parsimony, likelihood, and the role of models in molecular phylogenetics . Molecular Biology and Evolution 17 : 839 – 850 . Google Scholar CrossRef Search ADS PubMed Stervander M , Illera JC , Kvist L , Barbosa P , Keehnen NP , Pruisscher P , Bensch S , Hansson B . 2015 . Disentangling the complex evolutionary history of the Western Palearctic blue tits (Cyanistes spp.) – phylogenomic analyses suggest radiation by multiple colonization events and subsequent isolation . Molecular Ecology 24 : 2477 – 2494 . Google Scholar CrossRef Search ADS PubMed Thompson JD , Gibson TJ , Higgins DG . 2003 . Multiple sequence alignment using ClustalW and ClustalX . Current Protocols in Bioinformatics 2 : 1 – 22 . Vähä JP , Erkinaro J , Niemelä E , Primmer CR . 2007 . Life-history and habitat features influence the within-river genetic structure of Atlantic salmon . Molecular Ecology 16 : 2638 – 2654 . Google Scholar CrossRef Search ADS PubMed Wachter GA , Muster C , Arthofer W , Raspotnig G , Föttinger P , Komposch C , Steiner FM , Schlick-Steiner BC . 2015 . Taking the discovery approach in integrative taxonomy: decrypting a complex of narrow-endemic Alpine harvestmen (Opiliones: Phalangiidae: Megabunus) . Molecular Ecology 24 : 863 – 889 . Google Scholar CrossRef Search ADS PubMed Wagner CE , Keller I , Wittwer S , Selz OM , Mwaiko S , Greuter L , Sivasundar A , Seehausen O . 2013 . Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation . Molecular Ecology 22 : 787 – 798 . Google Scholar CrossRef Search ADS PubMed Wahlberg N , Nylin S . 2003 . Morphology versus molecules: resolution of the positions of Nymphalis, Polygonia, and related genera (Lepidoptera: Nymphalidae) . Cladistics 19 : 213 – 223 . Google Scholar CrossRef Search ADS Wahlberg N , Weingartner E , Warren AD , Nylin S . 2009 . Timing major conflict between mitochondrial and nuclear genes in species relationships of Polygonia butterflies (Nymphalidae: Nymphalini) . BMC Evolutionary Biology 9 : 92 . Google Scholar CrossRef Search ADS PubMed Watanabe K , Nakazono T , Ono Y . 2012 . Morphology evolution and molecular phylogeny of Pestalotiopsis (Coelomycetes) based on ITS2 secondary structure . Mycoscience 53 : 227 – 237 . Google Scholar CrossRef Search ADS Yeates DK , Seago A , Nelson L , Cameron SL , Joseph LEO , Trueman JWH . 2011 . Integrative taxonomy, or iterative taxonomy ? Systematic Entomology 36 : 209 – 217 . Google Scholar CrossRef Search ADS Zhang W , Dasmahapatra KK , Mallet J , Moreira GR , Kronforst MR . 2016 . Genome-wide introgression among distantly related Heliconius butterfly species . Genome Biology 17 : 25 . Google Scholar CrossRef Search ADS PubMed Zhulidov PA , Bogdanova EA , Shcheglov AS , Vagner LL , Khaspekov GL , Kozhemyako VB , Matz MV , Meleshkevitch E , Moroz LL , Lukyanov SA , Shagin DA . 2004 . Simple cDNA normalization using Kamchatka crab duplex-specific nuclease . Nucleic Acids Research 32 : e37 . Google Scholar CrossRef Search ADS PubMed © 2017 The Linnean Society of London, Zoological Journal of the Linnean Society This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Zoological Journal of the Linnean Society Oxford University Press

Genomics-informed species delimitation to support morphological identification of anglewing butterflies (Lepidoptera: Nymphalidae: Polygonia)

Loading next page...
 
/lp/ou_press/genomics-informed-species-delimitation-to-support-morphological-VUNId7RVuR
Publisher
The Linnean Society of London
Copyright
© 2017 The Linnean Society of London, Zoological Journal of the Linnean Society
ISSN
0024-4082
eISSN
1096-3642
D.O.I.
10.1093/zoolinnean/zlx081
Publisher site
See Article on Publisher Site

Abstract

Abstract Species delimitation and identification are integral to virtually all biological disciplines, but are far from straightforward tasks. Taxonomy has recently focused on integrative approaches that consider multiple types of data to resolve species boundaries, yet methodologies to that end are still being developed. Here, we assess species limits in an area of wide distributional overlap between several species of anglewing butterflies in western Canada. Focusing on an area of sympatry provides a rich system to test species boundaries in the face of potential gene flow between morphologically variable yet similar species. Mitochondrial DNA and genome-wide single nucleotide polymorphisms provided clear species delimitation, although previously identified cytonuclear discordance was also apparent. Analysis of two morphological data sets, based on characters commonly used as diagnostic characters in the literature and field guides, was variably successful at separating species. Using analyses that were guided by the results of the genetic data increased successful species identification for traditionally used characters, but not for the morphological data set based on digital colour analysis. Our application of genetics to guide morphological analysis demonstrates a useful approach for implementing iterative methodologies as part of integrative taxonomy. cytonuclear discordance, genome-wide SNPs, genotyping-by-sequencing, mitochondrial phylogeny, Polygonia, species identification INTRODUCTION Species delimitation and identification are essential to virtually all biological disciplines, but are far from straightforward tasks. Morphological variation (Lumley & Sperling, 2010; McKay et al., 2014; Proshek et al., 2015), gene discordance (Maddison, 1997) and complex evolutionary histories (Dejaco et al., 2016) can obfuscate species limits, creating phylogenetic and taxonomic uncertainty. Such uncertainty is particularly undesirable in systems where incorrect delimitation of units can have large economic (e.g. Frey et al., 2013; Dumas et al., 2015), health-related (Sperling & Sperling, 2009; Müller et al., 2013) or conservation consequences (Bickford et al., 2007; Bortolus, 2008; Hedin, 2015; Proshek et al., 2015). Integrative taxonomy (Dayrat, 2005; Schlick-Steiner et al., 2010), which delimits species using a consensus of multiple data types (morphological, molecular, ecological, chemical, etc.), provides an appealing solution to cases of difficult species delimitation, particularly compared to approaches that use single data types (Dupuis, Roe & Sperling, 2012). It is open to discussion whether integrative taxonomy approaches are strictly analytically integrative and objective (e.g. with analyses that consider multiple data types simultaneously, and allowing results to be recreated) or whether they are iterative, considering data sources successively and independently to arrive at a qualitative consensus (Yeates et al., 2011). Regardless, assimilating variable data types into broadly integrative taxonomic consideration has yielded unprecedented resolution in many historically difficult groups (Lumley & Sperling, 2010; Wachter et al., 2015; Dejaco et al., 2016; Gratton et al., 2016). Genomic data are particularly powerful for resolving species boundaries due to the sheer size of genomic data sets. Reduced-representation genomic libraries generated from, for example, restriction site-associated DNA sequencing methods (RAD-seq), can, in similar timeframes, create data sets that have hundreds of times more phylogenetic information than traditional Sanger sequencing-based approaches (Cruaud et al., 2014). Such RAD-seq data sets have been used to resolve phylogenetic relationships (Wagner et al., 2013; Cruaud et al., 2014; DaCosta & Sorenson, 2016; Dupuis et al., 2017), define species limits (Leaché et al., 2014; Pante et al., 2015; Gratton et al., 2016; Razkin et al., 2016) and elucidate evolutionary history (Jones et al., 2013; Nadeau et al., 2013; Stervander et al., 2015). Although the various iterations of these methods have technical advantages and disadvantages (Andrews et al., 2016), and particular considerations for phylogenetic analysis (DaCosta & Sorenson, 2016; Dupuis et al., 2017), they have become one of the mainstay methodologies for systematic and evolutionary biology studies. Here, we assess species limits between anglewing butterflies (Polygonia Hübner, 1819) in western Canada. Of the 15 species described worldwide (Wahlberg et al., 2009), four are widespread in Alberta and British Columbia: Polygonia faunus (Edwards, 1862), P. gracilis (Grote & Robinson, 1867), P. progne (Cramer, 1775) and P. satyrus (Edwards, 1869). Widely variable yet shared morphological characteristics and similar ecologies (Nylin et al., 2015) make identification of these species challenging in the field (Bird et al., 1995; Layberry, Hall & Lafontaine, 1998; Acorn & Sheldon, 2006). Although previous phylogenetic studies have found these species to form monophyletic groups, discordance between mitochondrial and nuclear genes, morphology and ecology make for tenuous hypotheses of evolutionary relationships (Fig. 1; Wahlberg et al., 2009). Taken together, the complex situation across much of Alberta and British Columbia makes this an ideal system for an integrative taxonomic approach, a point highlighted by Wahlberg et al. (2009): ‘careful study of populations in Alberta seems warranted’. Figure 1. View largeDownload slide Alternate topologies of pertinent New World Polygonia species, generated from non-genetic, mitochondrial DNA and nuclear DNA data sets. Some data sets of similar types from different studies have been combined into a single tree, and differences in topologies between those data sets are indicated with grey text and branches. Commonly observed monophyletic groups are highlighted with grey boxes when present, and data sets generated in the current study are noted with an asterisk. Wing pattern: Nylin et al. (2001); MEB (synthesis of morphology, ecology and behaviour): Nylin et al. (2001), Wahlberg & Nylin (2003); cytochrome oxidase subunit I gene (COI): Wahlberg & Nylin (2003), Wahlberg et al. (2009); ND1: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); EF-1α: Wahlberg & Nylin (2003), Wahlberg et al. (2009); wgl: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); RpS5: Wahlberg et al. (2009). Figure 1. View largeDownload slide Alternate topologies of pertinent New World Polygonia species, generated from non-genetic, mitochondrial DNA and nuclear DNA data sets. Some data sets of similar types from different studies have been combined into a single tree, and differences in topologies between those data sets are indicated with grey text and branches. Commonly observed monophyletic groups are highlighted with grey boxes when present, and data sets generated in the current study are noted with an asterisk. Wing pattern: Nylin et al. (2001); MEB (synthesis of morphology, ecology and behaviour): Nylin et al. (2001), Wahlberg & Nylin (2003); cytochrome oxidase subunit I gene (COI): Wahlberg & Nylin (2003), Wahlberg et al. (2009); ND1: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); EF-1α: Wahlberg & Nylin (2003), Wahlberg et al. (2009); wgl: Nylin et al. (2001), Wahlberg & Nylin (2003), Wahlberg et al. (2009); RpS5: Wahlberg et al. (2009). We use mitochondrial DNA (mtDNA) gene sequences, genome-wide single nucleotide polymorphisms (SNPs) generated by genotyping-by-sequencing (GBS, a form of RAD-seq) and two morphological data sets (visually and digitally scored wing characters based on commonly used field diagnostic characters) to test species limits between the four widespread species (and two subspecies of P. gracilis) in this region. Using phylogenetic and population genetic analyses, we first ask whether genetic data sets clearly separate species or whether there are signs of admixture or introgression between species. We then use multivariate analyses of the morphological data sets to test whether quantitative analysis of morphology leads to similar delimitation of species. Both genetic data sets predict distinct species limits, despite comparable cytonuclear discordance to previous studies, but only one morphological data set conforms to the genetic clusters. Given the clear species limits from genome-wide SNPs and mtDNA, we use the genetic data to inform subsequent analyses of the morphological data sets and, from these genetically guided morphological analyses, determine which morphological characters best delimit species. These results provide an iterative methodological approach to taxonomy, informing more accurate diagnoses of these species in the field. With the popularization of citizen science (Silvertown, 2009; Bonney et al., 2014), particularly in the realm of butterflies (Howard & Davis, 2009; Devictor, Whittaker & Beltrame, 2010; Larrivee et al., 2014), a framework for integrative taxonomy to guide the development of better diagnostic identification can have wide-reaching applications. MATERIAL AND METHODS Specimen collection, DNA extraction and photography Specimens were collected primarily as adults with an aerial net or bait trap (Leptraps, baited with banana) and frozen alive at −80 °C. Larvae were also collected and reared to adulthood on host clippings of the same species from which they were collected (Supporting Information, Table S1). We focused sampling in central and southern Alberta, southeast British Columbia and adjacent states, where six species/subspecies are regularly found: Polygonia faunus, P. gracilis gracilis (Grote and Robinson, 1867), P. gracilis zephyrus (Edwards, 1870), P. oreas (Edwards, 1869), P. progne and P. satyrus. Adequate sampling was achieved for five of these lineages, but we were unable to collect P. oreas, which is uncommon in the southwest corner of Alberta. Several additional specimens were included to represent other North American lineages: P. comma (Harris, 1841) and P. interrogationis (Fabricius, 1798); the latter is also occasionally found in Alberta during irruptive years (Layberry et al., 1998). We identified specimens using several field guides (Bird et al., 1995; Brock & Kaufman, 2003; Acorn & Sheldon, 2006) and the consensus opinion of multiple coauthors (CMM, JHA and FAHS), and follow the nomenclature of Pelham (2008). A total of 258 Polygonia specimens were collected [one P. comma, 106 P. faunus, 21 P. gracilis (10 P. g. gracilis and 11 P. g. zephyrus), one P. interrogationis, 71 P. progne and 58 P. satyrus] as well as seven specimens of Aglais milberti and Nymphalis spp. for use as outgroups. While we delimited subspecies for P. gracilis, we did not do so for P. faunus due to the lack of empirical evidence for those subspecies (Kodandaramaiah et al., 2012). Genomic DNA was extracted from legs and the ventral portion of the thorax using DNeasy Blood and Tissue kits (Qiagen). RNase A was used during the extraction (following Qiagen protocol), and the resulting DNA was ethanol precipitated and eluted in millipore water. DNA quality and quantity were assessed using a Nanodrop ND-1000 (Thermo Scientific) and a Qubit 1.0 dsDNA BR assay kit (Invitrogen), respectively. After DNA extraction, specimens were pinned and spread, and missing thoracic tissue was fortified with Archival Quality Neutral pH adhesive (Lineco). Some specimens lacked adequate thoracic tissue for pinning and were stored in glassine envelopes. Images of dorsal and ventral wing surfaces of all specimens were captured using a Canon Powershot A650 IS camera in macro setting, using aperture priority mode at f 22 to optimize depth of focus, and an exposure compensation of +12/3 to optimize the brightness of the images. The camera was manually white-balanced between photography sessions, with the same lighting used throughout the imaging process. The camera was mounted on a copy stand, and we maintained a consistent distance from specimen to camera lens using a modified plastazote foam pedestal. Image files were converted from JPEG to TIFF format in Adobe Photoshop to preserve resolution. All specimen information and photographs are databased in the University of Alberta E. H. Strickland Entomological Museum (accessions UASM370001-UASM370265; Supporting Information, Table S1). mtDNA sequencing and analysis The entire mitochondrial cytochrome oxidase subunit I gene (COI) was amplified in two fragments by polymerase chain reaction (PCR). We targeted the first ~650 bp using the primers LepF1 (5′-ATT CAA CCA ATC ATA AAG ATA TTG G-3′) and HCO (5′-TGA TTT TTT GGT CAC CCT GAA GTT TA-3′), and the remaining ~800 bp with JerryI (5′-CAA CAT TTA TTT TGA TTT TTT GG-3′) and PatII (5′-AGG TAA TGT ATA TTA GAC GGT ATA ACT-3′). Some samples were not successfully amplified using the latter pair, so JerryI and MilaIV (5′-AAA TTA GGA CAT TTA TTA CC-3′) were used to produce a shorter fragment. PCR cycling conditions consisted of a 5-min denaturation step at 94 °C, 35 cycles of 94 ° C for 30 s, 45 °C for 30 s and 72 °C for 2 min, and a final 5-min elongation at 74 °C. PCR products were cleaned using Exonuclease I and shrimp alkaline phosphatase (New England Biolabs) before dye-termination using BigDye sequencing pre-mix v2.1 (Life Technologies). Fragments were sequenced using the 3730 DNA analyzer (Applied Biosystems) at the Molecular Biology Service Unit at the University of Alberta. Forward and reverse COI reads were merged and manually inspected for sequencing call errors in Geneious v7.0.6 (Kearse et al., 2012). We then used Mesquite v2.75 (Maddison & Maddison, 2008) to align individual genes with the ClustalW algorithm (Thompson, Gibson & Higgins, 2003), before manually concatenating sequencing products into a single COI haplotype for every individual. COI sequences generated in our study are accessioned on GenBank (KY318516–KY318762; Supporting Information, Table S1). We also incorporated 89 COI sequences from five Polygonia taxa that were used by Wahlberg et al. (2009) (Supporting Information, Table S1). Finally, we manually trimmed the ends of the sequences and ~100 bp at the junction between the two amplified regions, both of which contained high amounts of missing data. This resulted in a final alignment of 1348 bp for 315 individuals. Parsimony-based haplotype networks were built in TCS (Clement, Posada & Crandall, 2000), using default settings. Gaps were treated as missing data to accommodate fragments that varied slightly in length. The genetic identity of these species was unambiguous (see ‘Results’), so haplotype networks were constructed for P. faunus and P. progne separately, while P. gracilis and P. satyrus were combined due to their similarity. We conducted maximum likelihood (ML) tree searches in IQ-TREE v1.4.2 (Nguyen et al., 2015) with 1000 replicates each of ultra-fast bootstrapping (Minh, Nguyen & von Haeseler, 2013) and the Shimodaira/Hasegawa approximate likelihood ratio test (SH-aLRT – Guindon et al., 2010). IQ-TREE’s model selection procedure using the Bayesian information criterion predicted a transition model (TIM2) + Γ. We conducted Bayesian inference (BI) analysis in ExaBayes v1.5 (Aberer, Kobert & Stamatakis, 2014) with four Metropolis-coupled replicates for one million generations (three heated chains), sampling every 1000 generations, and otherwise default parameters and models. We used Tracer v1.6 (Rambaut et al., 2014) to assess convergence and sampling of parameter values’ posterior distributions, ensuring that all effective sample sizes were >200, and also assessed the average SD of split frequencies and potential scale reduction factors across runs, which should approach zero and one, respectively. A consensus tree was built using TreeAnnotator (Drummond et al., 2012) after manually removing the first 25% of trees as burn-in. Both ML and Bayesian analyses of COI were conducted on a smaller data set of unique haplotypes only (172 haplotypes, including outgroups). Preliminary trees were viewed in FigTree v1.3.1 (Rambaut & Drummond, 2010). GBS library preparation, sequencing and data filtering GBS library preparation following Poland et al. (2012) was conducted at the Institut de Biologie Intégrative et des Systèmes at Université Laval. PstI and MspI restriction enzymes were used with a duplex-specific nuclease treatment (Zhulidov et al., 2004), and complexity reduction (a single C appended to the reverse primer) as described in Sonah et al. (2013). Two to eight base pair barcodes were used to identify individuals and four lanes (including individuals not included in the present study) of 96-plex; 100 bp single-end sequencing was performed on an Illumina HiSeq 2000 at McGill University’s Génome Québec Innovation Centre. Process_radtags of Stacks v1.35 (Catchen et al., 2011) was use to demultiplex and rename raw FASTQ files, allowing no ambiguities in the individual-specific barcodes. As sequencing errors are more probably at the ends of sequencing reads, 5 bp were trimmed from the 3′ end of the reads using an in-house Perl script. We then used Ustacks, Cstacks and Sstacks (Catchen et al., 2011) to assemble de novo catalogs and map reads to these catalogs. Default parameters were used for these utilities, except a minimum depth of five reads was required for initial catalog creation, and three mismatches were allowed in matching secondary reads to primary stacks. Finally, populations (Catchen et al., 2011) was used to call SNPs with a population map specifying a single population of all individuals; loci had to be present in one population and one individual to be processed, and a minimum stack depth of six was required for individuals at each locus. All Stacks utilities and associated filtering were run using Perl wrappers, which also made use of the BioPerl toolkit (Stajich et al., 2002), and these are available at https://github.com/muirheadk/GBS_analysis_pipeline. Two main types of input files were constructed from populations vcf and fasta outputs: one for phylogenetic and the second for population genetic analyses. First, we used vcftools v0.1.12 (Danecek et al., 2011) to calculate a preliminary measure of missing data per individual and removed individuals missing >50% genotype data (calculated after removing loci missing >50% data per individual); these individuals were removed from both phylogenetic and population genetic input files in the following steps. For phylogenetic analyses, we used an in-house Perl script that ingests fasta output from populations, removes loci with high missing data (here a threshold of 50% was used, but preliminary analyses showed little difference when using alternative missing data thresholds) and outputs DNA sequence alignment with missing data symbols for missing loci per individual. This alignment contained SNPs as well as their flanking sequences from the variable length loci of the de novo catalog construction and resulted in an alignment of 121 513 bp (2572 SNPs) for 248 individuals. Including flanking sequence around SNPs allows the use of more realistic models of DNA evolution, such as those that take into account invariant sites (e.g. Steel & Penny, 2000, and references therein). We also used vcftools and an in-house Perl script to construct a SNP-only alignment with the same set of loci, where SNPs were coded using IUPAC ambiguity codes (e.g. an A/G heterozygote would be coded R). For population genetic analyses, we removed loci that had >20% missing data from vcf output of populations using vcftools and used PGDSpider v2.0.1.9 (Lischer & Excoffier, 2012) to convert to structure format. We also removed outgroup specimens, leading to a final data set of 241 individuals and 961 SNPs. GBS analyses ML tree searches of the GBS alignments were conducted identically to analyses of the COI alignment. IQ-TREE’s model selection procedure selected the transversion model (TVM) + I + Γ and TVM + Γ for the 121 513 bp sequence-based alignment and the 2572 SNP-based alignment, respectively. Bayesian analyses of the GBS alignments were also conducted the same as was done for COI, except that we ran the analyses for five million generations and sampled every 2500 generations. We used two methods to assess population structure in the 961 SNP data sets. First, we estimated the number of genetic clusters (K) with STRUCTURE v2.3.4 (Pritchard, Stephens & Donnelly, 2000; Falush, Stephens & Pritchard, 2003). Ten iterations of K = 1–10 were run with the admixture model, correlated allele frequencies, and without any prior information about geographic origin or species. A burn-in of 50 000 Markov chain Monte Carlo generations preceded 150 000 generations that were summarized using CLUMPAK (Kopelman et al., 2015). We considered LnPr(X|K) (Pritchard et al., 2000) and ΔK (Evanno, Regnaut & Goudet, 2005) to select the most likely value of K and conducted hierarchical structure analysis (Pritchard & Wen, 2004; Vähä et al., 2007) with the same parameters to address substructure in the main genetic clusters. Second, discriminant analysis of principal components (DAPC – Jombart, Devillard & Balloux, 2010) was implemented the adegenet library (Jombart, 2008) in R v3.3.1 (R Core Team, 2016). We used find.clusters to estimate K with default parameters and retaining all principal components (PCs), and cross-validation (xvaldapc) to determine the optimal number of PCs to retain in the discriminant analysis (testing 100 maximum PCs and 100 replicates). Hierarchical analysis was also conducted with DAPC. Morphological character scoring Two main categories of morphological wing characters are used to delimit Polygonia species in field guides: those based on discrete or ordinal characteristics of the wings (spots, shapes, etc.) and those based on colour. We assembled a data matrix from both and, given the inherent differences in data type, conducted independent analyses for each category. First, we reviewed the literature and field guides to summarize characters commonly used to diagnose and delimit the five lineages of Polygonia considered here (Supporting Information, Table S2). From this list, we selected characters that were considered highly diagnostic, and those with states that were unique to individual species. Characters that were present in similar states in multiple species were excluded (e.g. submarginal green spots on the ventral hindwing are considered highly diagnostic of P. faunus, but similar spots, although more yellow-green and slightly smaller, are found in P. satyrus and P. progne, respectively). We selected and scored ten discrete/ordinal characters, each with between two and five character states (Fig. 2), and refer to these as ‘visually scored characters’. Figure 2. View largeDownload slide Visually scored characters on dorsal and ventral wing surfaces, and proportion of each character state across species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. Figure 2. View largeDownload slide Visually scored characters on dorsal and ventral wing surfaces, and proportion of each character state across species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. Second, we assembled a set of colour luminance characters with insight from field guides and the literature, using a digital scoring approach developed for tortricid moths by Lumley & Sperling (2010). These characters were defined by shapes on the wings, and bounded by either scale patterns or veins so that they were reliably scored. Additionally, we required that these wing areas were present across all taxa (i.e. the colour of a spot could not be measured if the spot was absent in one species). Luminance values were measured from the standardized photographs of specimens in ImageJ (Schneider, Rasband & Eliceiri, 2012) using the Colour Histogram package (Prodanov, 2010). Red, green and blue (RGB) values were measured for all pixels within each wing area and averaged to create a set of three individual characters for each wing area. In total, six wing areas were measured, creating a final matrix of 18 characters (Fig. 3), and we refer to these as ‘RGB characters’. Figure 3. View largeDownload slide Red, green and blue (RGB) characters on the dorsal and ventral wing surfaces, and the average RGB luminance values (combined red, green and blue) visualized as colour swatches for each species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. Figure 3. View largeDownload slide Red, green and blue (RGB) characters on the dorsal and ventral wing surfaces, and the average RGB luminance values (combined red, green and blue) visualized as colour swatches for each species – (c) and (s) denote contrasted and smeared forms of Polygonia faunus and P. satyrus, respectively. All morphological character scoring was conducted by CMM. Female P. faunus and P. satyrus exhibit two morphotypes, referred to as ‘smeared’ and ‘contrasted’ in this study (see Supporting Information, Fig. S1). Smeared morph females generally lack striations and contrast on the ventral side of the wings, whereas the contrasted morph females appear striated with alternating bands of light and dark coloration, much like their conspecific males. These morphs were treated as separate taxa for morphological analysis. Morphological analysis We used eigenvector-based multivariate analyses to maximize the variance among individuals for morphological characters and visualize morphological similarity between species. Because of the inherent differences between the visually scored and RGB characters (the former were discrete/ordinal and the latter continuous), separate multivariate analyses were used for each data set. Multiple correspondence analysis (MCA) and principal component analysis (PCA) were used for the visually scored and RGB characters, respectively. In some cases, these analyses did not provide resolution of species boundaries. In an effort to increase this resolution, we also conducted discriminant correspondence analysis (DCA) and linear discriminant analysis (LDA) (for visually scored and RGB characters, respectively), which incorporate a priori defined groups into the analysis. The results of the genetic analyses were unambiguous as to the species-level identifications, so we used SNP genetic clusters as the a priori species identity of individuals. Polygonia faunus and P. satyrus were further divided into morphological types (smeared vs. contrasted morphs), and P. gracilis was divided into putative subspecies P. g. gracilis and P. g. zephyrus based on diagnostic descriptions in field guides and collection location (Bird et al., 1995; Pyle, 2002; Shepard & Guppy, 2011). All analyses were conducted using the ade4 library (Dray & Dufour, 2007) in R. Individuals containing missing data (due to wing wear, missing characters, etc.) were removed prior to the analysis, resulting in the following sample sizes: Visually scored characters: P. faunus (smeared n = 23, contrasted n = 67), P. g. gracilis (n = 6), P. g. zephyrus (n = 11), P. progne (n = 62) and P. satyrus (smeared n = 11, contrasted n = 37) and RGB characters: P. faunus (smeared n = 28, contrasted n = 67), P. g. gracilis (n = 8), P. g. zephyrus (n = 11), P. progne (n = 68) and P. satyrus (smeared n = 17, contrasted n = 38). Character states for all individuals are provided in Supporting Information, Table S1. RESULTS Mitochondrial DNA All analyses of COI (parsimony-based haplotype networks and ML and BI tree searches) gave the same broad relationships between species that have been identified previously using mtDNA (Fig. 1; Supporting Information, Fig. S2). The only exception was a divergent mitochondrial clade (G-X) found in three individuals collected for this study. This clade is positioned basally to P. gracilis, P. progne and P. satyrus, but morphological and GBS data identified these individuals as P. gracilis (both subspecies). The low number of non-synonymous mutations (two in total) in this lineage supported a mitochondrial origin, rather than a nuclear insertion of a mitochondrial gene, and repeated extraction and sequencing confirmed that these sequences were not chimeras or the result of wet lab error. Apart from this divergent lineage, all species conformed to well-supported monophyletic clades (Supporting Information, Fig. S2). Within these clades, there was little structured genetic variation, and many individuals from disparate geographic locations and ecozones shared identical haplotypes (Fig. 4). Additionally, P. g. gracilis and P. g. zephyrus were not distinct in either phylogenetic or haplotype network analyses (Supporting Information, Fig. S2). Figure 4. View largeDownload slide A, cytochrome oxidase subunit I gene (COI) sampling localities divided by general ecozones and geographic proximity. Open circles represent specimens collected for this study [for which genotyping-by-sequencing (GBS) and morphological data may be available], and closed circles represent sequences from GenBank. B, Haplotype networks constructed from COI data set for each species (the dashed line between Polygonia gracilis and P. satyrus represent the break between those two species). Circles represent individual haplotypes, circle diameter is proportional to the number of individuals for each haplotype. Small dots represent missing haplotypes, and the cross-hatched line extending to haplotype G-X represents 22 substitutions. Some major haplotypes for P. gracilis, P. progne and P. satyrus were named in this study, those for P. faunus follow Kodandaramaiah et al. (2012). Figure 4. View largeDownload slide A, cytochrome oxidase subunit I gene (COI) sampling localities divided by general ecozones and geographic proximity. Open circles represent specimens collected for this study [for which genotyping-by-sequencing (GBS) and morphological data may be available], and closed circles represent sequences from GenBank. B, Haplotype networks constructed from COI data set for each species (the dashed line between Polygonia gracilis and P. satyrus represent the break between those two species). Circles represent individual haplotypes, circle diameter is proportional to the number of individuals for each haplotype. Small dots represent missing haplotypes, and the cross-hatched line extending to haplotype G-X represents 22 substitutions. Some major haplotypes for P. gracilis, P. progne and P. satyrus were named in this study, those for P. faunus follow Kodandaramaiah et al. (2012). Genotyping-by-sequencing In total, 452.8 million reads were generated in four lanes of sequencing for the specimens included in this study. Filtering with process_radtags reduced this to 345.0 million reads, and 315.5 million reads (average 1.3 million per individual) were incorporated into the de novo assembly for SNP calling, leading to 127 661 catalog loci. All ML and BI phylogenetic analyses of filtered GBS data (including flanking sequence or SNPs only) agreed on a single set of relationships between species (Figs 1, 5; Supporting Information, Figs S3–S5). As with mtDNA, P. faunus was basal to the remaining Polygonia species, but the remainder of the tree showed a sister relationship between the clades containing P. interrogationis + (P. comma + P. satyrus) and P. gracilis + P. progne. All species were reciprocally monophyletic and little geographic structure was apparent within species (Fig. 5). Bayesian runs converged by all metrics for the sequence-based analysis; however, several effective sample size values for the SNPs only analysis were below 200; regardless, the same main topology was found for both data sets. Figure 5. View largeDownload slide A, Sampling across the zone of contact with broad ecozones indicated by different shapes and black outlines. Black lines connecting shapes indicate comparable ecozones in Figure 4 that were subdivided here to provide more resolution. B, Consensus tree from maximum likelihood (ML) analysis of 121 513 bp alignment of genotyping-by-sequencing (GBS) data (including flanking sequence). Black asterisks indicate >95% ultra-fast bootstrap + >80% Shimodaira/Hasegawa approximate likelihood ratio test (SH-aLRT) from ML analysis and >0.9 posterior probability from Bayesian inference (BI); grey asterisks indicate highly supported nodes (using the aforementioned criteria) in only one analysis (generally nodes were well-supported in ML but not with BI). Grey circles for Polygonia gracilis individuals indicate P. g. zephyrus, and individuals with the G-X mitochondrial haplotype indicated with ‘G-X’. C, STRUCTURE barplots from 961 single nucleotide polymorphisms (SNPs) for K = 4 (overall clustering) and the finest level of substructure for each main cluster (excluding P. comma and P. interrogationis); no substructure was observed within P. gracilis or P. progne. Photographs, used with permission, by Kim Davis, Mike Strangeland and Andrew Warren. Figure 5. View largeDownload slide A, Sampling across the zone of contact with broad ecozones indicated by different shapes and black outlines. Black lines connecting shapes indicate comparable ecozones in Figure 4 that were subdivided here to provide more resolution. B, Consensus tree from maximum likelihood (ML) analysis of 121 513 bp alignment of genotyping-by-sequencing (GBS) data (including flanking sequence). Black asterisks indicate >95% ultra-fast bootstrap + >80% Shimodaira/Hasegawa approximate likelihood ratio test (SH-aLRT) from ML analysis and >0.9 posterior probability from Bayesian inference (BI); grey asterisks indicate highly supported nodes (using the aforementioned criteria) in only one analysis (generally nodes were well-supported in ML but not with BI). Grey circles for Polygonia gracilis individuals indicate P. g. zephyrus, and individuals with the G-X mitochondrial haplotype indicated with ‘G-X’. C, STRUCTURE barplots from 961 single nucleotide polymorphisms (SNPs) for K = 4 (overall clustering) and the finest level of substructure for each main cluster (excluding P. comma and P. interrogationis); no substructure was observed within P. gracilis or P. progne. Photographs, used with permission, by Kim Davis, Mike Strangeland and Andrew Warren. Population genetic analysis also supported these divisions. To evaluate the overall STRUCTURE analysis, we used two methods: Ln Pr(X|K) supported values of K between three and five, and ΔK supported K = 5. However, it was apparent that K = 5 only introduced a partial cluster to P. comma and P. interrogationis (each represented by a single individual), and K = 3 simply combined P. gracilis and P. progne. For these reasons, we chose K = 4 as the optimal value of K, which corresponded to the four species considered here (barplots of all values of K provided in Supporting Information, Fig. S6). Hierarchical analysis of these four clusters identified substructure in P. faunus (K = 2) and P. satyrus (K = 3) (Fig. 5; barplots of all values of K for hierarchical analysis provided in Supporting Information, Fig. S6). Although this substructure partially corresponds to distinct clades identified with the phylogenetic analyses, there are no clear geographic or habitat-related patterns except a distinct cluster of five individuals of P. satyrus from a single population in the foothills of the Rocky Mountains (Fig. 5). DAPC corroborated the overall pattern of genetic structure, with find.clusters predicting values of K from four to six, and the first and second discriminant functions strongly separating the four species (Supporting Information, Fig. S7). Some substructure was also supported by DAPC, notably the two main clusters of P. faunus and the distinct cluster of five individuals in P. satyrus (right most cluster in P. satyrus portion of Supporting Information, Fig. S7). Individuals of P. gracilis exhibiting the divergent G-X mitochondrial type were not distinct in any of the SNP-based analyses, and there was no support for separation of P. g. gracilis and P. g. zephyrus (Fig. 5). Morphological characters The plot of visually scored characters on the first two MCA axes produced several clusters with relatively little overlap that corresponded to the species clusters identified using SNP data (Fig. 6A). MCA axis 1 separated P. progne from the rest of the species and P. satyrus from P. gracilis, and axis 2 separated P. faunus from P. satyrus and P. gracilis. The only notable overlap between these clusters was between specimens of P. faunus and P. gracilis; although some of these individuals were worn, others showed morphological similarity in relatively fresh specimens. Gradients were evident between smeared and contrasted forms of P. faunus and P. satyrus, but no clustering was apparent for subspecies of P. gracilis. Based on the relative magnitudes of the canonical weights (loadings) of the MCA, several characters (notably characters 1, 5 and 6) were equally effective for separating P. progne from the rest of the species (Table 1), and a different set (characters 9, 2, 3 and 8) separated P. gracilis and P. satyrus from P. faunus. DCA of the visually scored characters (using a priori groupings based on SNP data) also separated the same main clusters but showed more overlap between species and less distinction between smeared/contrasted morphs (Fig. 6B). DCA axis 2 mainly separated P. satyrus from the other species, while DCA axis 1 separated P. faunus from P. progne, with P. gracilis as intermediate. Character 9 was most effective at separating species along the first axis (P. faunus from P. progne), and characters 9 and 10 were most effective along the second axis for the clear separation of P. satyrus from the rest of the species (Table 1). Third axes of MCA and DCA provided no additional separation of any species clusters (Supporting Information, Fig. S8). Figure 6. View largeDownload slide Multivariate analyses of wing characters. Multiple correspondence analysis (A) and discriminant correspondence analysis (B) of visually scored characters, and principal components analysis (C) and linear discriminant analysis (D) of red, green and blue (RGB) characters. Decimal values on axis labels indicate proportion of total variation explained [multiple correspondence analysis (MCA) and principal component analysis (PCA)] and the remaining proportion of variation [discriminant correspondence analysis (DCA) and linear discriminant analysis (LDA)]. Species inclusion for each specimen was determined by membership in single nucleotide polymorphism (SNP) clusters. Figure 6. View largeDownload slide Multivariate analyses of wing characters. Multiple correspondence analysis (A) and discriminant correspondence analysis (B) of visually scored characters, and principal components analysis (C) and linear discriminant analysis (D) of red, green and blue (RGB) characters. Decimal values on axis labels indicate proportion of total variation explained [multiple correspondence analysis (MCA) and principal component analysis (PCA)] and the remaining proportion of variation [discriminant correspondence analysis (DCA) and linear discriminant analysis (LDA)]. Species inclusion for each specimen was determined by membership in single nucleotide polymorphism (SNP) clusters. Table 1. Canonical loadings for three axes from multiple correspondence analysis (MCA) and discriminant correspondence analysis (DCA) of ten visually scored characters and principal component analysis (PCA) and linear discriminant analysis (LDA) of 18 mean red, green and blue luminance values from six RGB characters Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 View Large Table 1. Canonical loadings for three axes from multiple correspondence analysis (MCA) and discriminant correspondence analysis (DCA) of ten visually scored characters and principal component analysis (PCA) and linear discriminant analysis (LDA) of 18 mean red, green and blue luminance values from six RGB characters Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 Char. MCA1 MCA2 MCA3 DCA1 DCA2 DCA3 1 0.807 0.295 0.095 −2.146 −6.032 −0.398 2 0.190 0.528 0.006 4.870 −6.046 8.963 3 0.149 0.512 0.073 7.210 −4.617 −5.972 4 0.526 0.006 0.004 −1.540 8.894 9.558 5 0.776 0.270 0.219 −1.729 −7.054 7.608 6 0.774 0.030 0.003 −14.391 4.310 8.249 7 0.002 0.134 0.426 −9.374 1.571 −42.971 8 0.706 0.506 0.516 −0.203 6.762 4.765 9 0.199 0.659 0.009 22.324 −15.677 3.823 10 0.607 0.002 0.090 −1.859 16.252 8.434 PCA1 PCA2 PCA3 LDA1 LDA2 LDA3 11R −0.158 0.408 0.017 0.047 0.137 0.012 11G −0.193 0.251 0.259 0.324 0.418 −0.528 11B −0.158 −0.176 0.307 −0.141 −0.350 0.416 12R −0.224 0.374 −0.156 0.047 −0.427 −0.196 12G −0.265 0.219 0.161 −0.398 0.135 0.932 12B −0.232 −0.029 0.392 0.201 0.052 −0.408 13R −0.222 0.361 −0.128 0.107 0.198 −0.435 13G −0.270 0.177 0.132 0.221 −0.239 0.556 13B −0.237 −0.043 0.356 −0.027 −0.046 0.077 14R −0.247 0.069 −0.276 0.261 −0.057 −0.109 14G −0.267 −0.054 −0.147 −0.452 −0.360 −0.754 14B −0.209 −0.136 0.310 0.168 0.250 0.151 15R −0.265 −0.072 −0.342 0.323 −0.660 0.419 15G −0.268 −0.235 −0.241 −1.163 0.685 1.668 15B −0.217 −0.372 −0.049 0.614 −1.024 −2.617 16R −0.275 −0.064 −0.269 −0.630 1.105 −1.276 16G −0.268 −0.230 −0.130 0.001 −0.154 0.673 16B −0.219 −0.324 0.118 0.564 −0.204 0.994 View Large PCA of the RGB characters did not separate any species (Fig. 6C). Despite the fact that the first PC axis explained a relatively large amount of variation (54.8%), all species overlapped in all major axes and relative intraspecific variation was high. LDA of the same characters (using prior groupings) resulted in more overall separation between species than in the PCA; however, overlap was still present in some individuals. Polygonia satyrus was the only species that was more-or-less completely separated, along axis 1, and aspects of characters 15 and 16 (both ventral characters) were most effective at separating this species. There was no apparent clustering of any smeared vs. contrasted morphs, nor of P. gracilis subspecies. Based on pairwise comparisons of their luminance means, P. progne and P. g. zephyrus were most different (Supporting Information, Table S3, Fig. S9). Again, the third axes of PCA and LDA provided no additional separation (Supporting Information, Fig. S8). DISCUSSION Species limits and relationships All genetic data sets and analyses supported clear species limits between the four widespread species considered here: P. faunus, P. gracilis, P. progne and P. satyrus. Within these lineages, there was little or no intraspecific geographic clustering with genome-wide SNPs or mtDNA and no signatures of admixture or introgression between species. The only geographically coherent clustering we observed was with SNP data in individuals of P. satyrus from a single population in the foothills of the Rocky Mountains, which formed a monophyletic clade in phylogenetic analyses and a distinct genetic cluster using STRUCTURE (Fig. 5). These individuals were not distinctive with COI and exhibited one of the most common mitochondrial haplotypes (Supporting Information, Fig. S2). Additionally, they were all collected from the same locality on the same day, so siblingship may easily explain their genetic similarity. The simplest explanation for the broad patterns of genetic differentiation among these four Polygonia species is that reproductive isolation between species is strong, and evolutionary forces (retained ancestral polymorphism, high levels of gene flow, etc.) are maintaining homogeneity within these lineages. At least some Polygonia species undergo two-way migrations (P. interrogationis: Layberry et al., 1998), and all Polygonia have high dispersal capability (Braschler & Hill, 2007; Burke, Fitzsimmons & Kerr, 2011). Given their longevity (Polygonia overwinter as adults, with individual lifespans of 10–12 months; Bird et al., 1995), high dispersal rates could explain the low interspecific variation we observed here, particularly in combination with high levels of retained ancestral polymorphism. Another consideration is that the geographic scale of our sampling was relatively small compared to these species’ full distributions (Layberry et al., 1998). Polygonia faunus, for example, is found from Alaska south to California and New Mexico, and across Canada and the northern United States to the Atlantic coast; Kodandaramaiah et al. (2012) found two mitochondrial haplotype groups and around five nuclear genetic clusters using microsatellites across the range of P. faunus. A similar pattern has been found in P. c-album (Linnaeus, 1758), a species found across Europe, North Africa and Asia, where microsatellites only show patterns of strong differentiation at the largest geographic scales (Kodandaramaiah et al., 2011). However, the smaller area of distributional overlap that we sampled creates an opportunity to test the genomic integrity of these species. Despite their morphological variation and similarity, we found no signatures of admixture or introgression between these species, further supporting the hypothesis of strong reproductive isolation. This lack of introgression is an interesting contrast to some other Nymphalidae, where introgression and reticulate evolution has played an important evolutionary role [e.g. Heliconius spp. (Heliconius Genome Consortium, 2012; Pardo-Diaz et al., 2012; Zhang et al., 2016)]. In keeping with the observed lack of geographic patterns within species, we found no genetic support for differentiation of the subspecies of P. gracilis: P. g. gracilis and P. g. zephyrus. There is substantial disagreement in the literature regarding these lineages, with some considering them to be separate species (Guppy & Shephard, 2001; Wahlberg et al., 2009) and others conspecifics (Layberry et al., 1998; Pelham, 2008). Morphologically, there is a continuum between P. g. gracilis-like and P. g. zephyrus-like characteristics, and while the extremes of this continuum are morphologically diagnosable, many individuals from sympatric regions fall in the middle (Wahlberg et al., 2009). Genetically, they share haplotypes for several mitochondrial and nuclear genes, as well as with other close relatives – P. progne, P. oreas, P. haroldii (Dewitz, 1877) (Wahlberg et al., 2009). Although our sample size is small for these lineages, genome-wide data support their conspecific status. Given the clear species limits between the lineages considered here, a logical step is to consider their phylogenetic relationships. However, our phylogenetic sampling is not comprehensive; we are missing five species that are thought to be interspersed among the taxa considered here [P. c-album, P. g-argenteum (Doubleday, 1848), P. haroldii, P. interposita (Staudinger, 1881) and P. oreas] and four basal species [P. c-aureum Linnaeus, 1758, P. egea (Cramer, 1775), P. gigantea (Leech, 1883) and P. undina (Grum-Grshimailo, 1890)]. For this reason, we will only briefly address phylogenetic considerations, and for more comprehensive phylogenetic discussions direct readers to Nylin et al. (2001), Wahlberg & Nylin (2003) and Wahlberg et al. (2009). Most notably, we observed similar cytonuclear discordance between mitochondrial and nuclear markers as has been observed using traditional gene sequencing approaches (Fig. 1; Nylin et al., 2001; Wahlberg & Nylin, 2003; Wahlberg et al., 2009). Discordance between relationships based on mtDNA and nuclear markers is well documented (Funk & Omland, 2003; Chan & Levin, 2005; Dupuis et al., 2012), and in Polygonia has been explained by ancestral hybridization creating shared haplotypes between species (Wahlberg et al., 2009) or infection with the maternally inherited bacterium Wolbachia (Kodandaramaiah et al., 2011, 2012). Both phenomena may have acted to obfuscate the relationships dictated by mtDNA, and we consider the true species relationships to lie closer to those predicted using more extensive nuclear markers and non-genetic data (to the extent that it can be illustrated using a bifurcating tree given the reticulate history of the group [Wahlberg et al., 2009; see Soucy, Huang & Gogarten (2015)]. Our discovery of the divergent G-X mitochondrial lineage in specimens of P. gracilis (Figs 1, 4) that are indistinguishable from other P. gracilis using SNPs further complicates the reticulate history of the group (Wahlberg et al., 2009). Whether this is the result of ancient hybridization or retention of an ancestral mitochondrial haplotype (Melo-Ferreira et al., 2014; Dupuis & Sperling, 2015; Naciri & Linder, 2015) is difficult to determine given the current sampling. It is clear that genome-wide assessments of genetic diversity with comprehensive range-wide sampling of each species will be essential to further understand these evolutionary histories. Genetic data informing morphological diagnostics For each of the morphological data sets (visually scored and RGB characters), we conducted two sets of analyses: one without prior information on species designations and one with prior information based on species identity from mtDNA and SNP data. While the visually scored characters separated species without guidance from genetic data (Fig. 6A), RGB characters did not (Fig. 6C). Interestingly, with the guidance of genetic data, the separation of species using visually scored characters actually decreased slightly (Fig. 6B), whereas delimitations using RGB characters improved but only to the point of distinguishing P. satyrus (Fig. 6D). While the respective performance of visually scored and RGB characters is likely due to the inherent information content of each of these character sets, there are also some species-specific differences. For instance, overall, P. satyrus was the most consistently separated species from the rest in analyses with prior species designations (DCA and LDA), despite P. progne having the largest degree of separation using designation-free MCA (Fig. 6A). We considered the relative loadings of characters for the axes from each analysis to infer which characters were most effectively separating, or failing to separate, species. For example, the division of the dorsal forewing spot across vein CuA2 (character 1, Fig. 2), contributed most to separation along the first axis of the MCA, which notably separated P. progne from the rest of the species (Fig. 6; Table 1). Given the prevalence of this character as a delimiting feature in the literature, this is unsurprising. However, other characters that are equally prevalent in the literature were not as diagnostic as expected, such as the shape of the silver comma (character 8). This may be because diagnostic characters in field guides are often written to allow users to distinguish between pairs of similar species, rather than all species in a group; a single character with overlapping states between P. satyrus and P. progne would not affect their species identification since they are otherwise easily distinguished. In fact, many of these characters would be diagnostic if P. faunus were removed from the taxon pool, as the variation in character states of this species was broad (Fig. 2). For example, every state of the aforementioned shape of the silver comma (character 8) was found in at least some specimens of P. faunus included in our study, several individuals did not fit neatly into any of the character states, and one specimen exhibited a marking that was upside-down. Surprisingly, we found few cases of visually scored characters being equally diagnostic across analyses (i.e. the same character contributing most to the separation of a given species in both MCA and DCA). The prominence of ventral forewing spots basal to the submarginal chevrons (character 9) was an exception to this trend, as it was the most explanatory character across axis 2 in the MCA and axis 1 in the DCA (Table 1). Neither of these axes clearly separated one species from the rest, but both displayed gradients from a large group of P. faunus to the other species, albeit with some overlap (Fig. 6). Overall, the following visually scored characters provided the most discrimination in these analyses: discal forewing spots across CuA2 (character 1), ventral forewing spots basal to the submarginal chevrons (character 9) and contrast between basal and distal halves of the ventral hindwing (character 10). Average RGB luminance measurements did not perform well in separating the species in our study. When prior species identifications were employed in an LDA, P. satyrus was the only species distinguished without substantial overlap (Fig. 6). Some characters that are frequently used in species diagnoses in the literature were informative for this separation, such as two ventral hindwing characters: the submarginal region of the ventral hindwing near the comma (character 15) and the overall colour of the ventral hind wing (character 16). Others were not as informative despite similar use in the literature, such as the shade of orange on the dorsal forewing (characters 11, 12 and 13) (Bird et al., 1995; Pyle, 2002). One character in particular, the overall colour of the ventral hind wing (character 16), encompassed a large portion of the wing with highly contrasting patterns. While we expected that averaging the luminance values across this contrast might obfuscate the overall colour of the character state, this was one of the more informative RGB characters. Qualitatively, many wing areas appear very similar in average colour across the species (Fig. 3), which may be due to relative amounts of reddish and black pigments (melanin) producing the same hue but different tones (Hovanitz, 1941). Overall, the utility of colour or luminance-based morphological characters may depend on group-specific characteristics, as they have been used to successfully delimit species in other groups (Lumley & Sperling, 2010; Watanabe, Nakazono & Ono, 2012; McKay et al., 2014). Polygonia may be a particularly difficult group for quantifying colour differences, or alternatively our focus on several species at once could be overcomplicating delimitation of these species. Although it is beyond the scope of this paper, combining quantitative methods such as those used here with species pair sampling across the morphological diversity of each species should more comprehensively address the transition from species delimitation to species identification. CONCLUSION Here, we assessed species limits in an area of widespread overlap between four species of Polygonia butterflies in western Canada. mtDNA and genome-wide SNPs provide clear species delimitation, despite well-documented cytonuclear discordance. Further, there was little evidence for population structure or subspecific distinctions. Visually and digitally scored morphological characters were variably successful for separating species, and the application of genetic data to guide morphological analyses improved species delimitation for the digitally scored data set. We identified several morphological characters that are highly informative for separating species, and more geographically comprehensive sampling is now needed to test the efficacy of the most informative diagnostic characters that we identified for field identifications of this morphologically difficult group. This application of genetic data to guide morphological analysis provides a framework for iterative approaches within broader integrative taxonomy and will undoubtedly continue to help taxonomists to resolve complicated groups. SUPPORTING INFORMATION Additional Supporting Information may be found in the online version of this article at the publisher's web-site: Table S1. Specimen collection information, mtDNA GenBank accessions and morphological data. ‘mtDNA only’ indicates specimens used from Wahlberg et al. (2009). Table S2. Diagnostic characters of Polygonia butterflies published in North American field guides, with an emphasis on western North America. If an author designated a character as being primary, or highly diagnostic, it is indicated with a grey background. DFW, dorsal forewing; DHW, dorsal hindwing; VFW, ventral forewing; VHW, ventral hindwing. Table S3. Summary of pairwise luminance mean values with non-overlapping confidence intervals in pairwise comparisons. Figure S1. Examples of smeared and contrasted morphs of P. faunus and P. satyrus used in this study. Figure S2. Consensus tree from ML analysis of unique COI haplotypes. Dots on nodes indicate high support: >80% SH-aLRT and >95% ultra-fast bootstrap in ML analysis, and >0.8 posterior probability in Bayesian analysis. Major clades are named (as in Fig. 4) in grey text, and grey circles indicate P. g. zephyrus individuals. Figure S3. Consensus tree from ML analysis of 2572 SNP data set. Node support values are SH-aLRT/ultra-fast bootstrap. Figure S4. Fifty per cent majority rule consensus from BI analysis of 121 513 bp alignment of GBS data (including flanking sequence) with posterior probability values for node support. Figure S5. Fifty per cent majority rule consensus from BI analysis of 2572 SNPs from GBS data with posterior probability values for node support. Figure S6. ΔK and LnPr(X|K) results, and STRUCTURE results for all values of K, for all individuals, P. faunus, P. satyrus, P. gracilis & P. progne, P. gracilis and P. progne. Figure S7. Discriminant analysis of principal components (DAPC) on 961 SNPs. Top left shows overall DAPC, and remaining panels are individual species coloured by localities. Polygonia gracilis did not have adequate sampling across localities to analyze by itself. Insets show relative explanatory power of individual discriminant functions. Figure S8. Multivariate analyses of wing characters, displaying third axes of each analysis. Multiple correspondence analysis (A) and discriminant correspondence analysis (B) of visually scored characters, and principal components analysis (C) and linear discriminant analysis (D) of RGB characters. Decimal values on axis labels indicate proportion of total variation explained (MCA, PCA) and the remaining proportion of variation (DCA, LDA). Species inclusion for each specimen was determined by membership in SNP clusters. Figure S9. Central tendency statistics of mean R, G and B values of six wing characters across 248 Polygonia butterflies. The upper and lower boundaries of each box are third and first quartiles, respectively; vertical span of the notches correspond to the upper and lower 95% confidence bounds; the mean is represented by black lines bisecting each box. The upper and lower fences (whiskers) indicate the maximum and minimum values in the data set, while outliers – those outside 1.5 times the interquartile range above the fourth quartile or below the first quartile – are shown as black points. ACKNOWLEDGEMENTS We are grateful to Benny Acorn, Gary Anweiler, Heather Bird, Bryan Brunet, Lina Söderlind and Janet Sperling for assistance in collecting specimens; Kurt Yakimovich for assistance with wet lab work; and Kevin Muirhead for bioinformatic assistance. Funding was provided by an Alberta Conservation Association grant to CMM (#RES0017428) and a National Science and Engineering Research Council Discovery grant to FAHS (#217174). This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada Calcul Canada (www.computecanada.ca). Data files are provided in Supporting Information, and raw data are deposited as Sequence Read Archives in the National Center for Biotechnology Information under accessions SRR5725309–SRR5725392 (BioProject PRJNA390990). REFERENCES Aberer AJ , Kobert K , Stamatakis A . 2014 . ExaBayes: massively parallel Bayesian tree inference for the whole-genome era . Molecular Biology and Evolution 31 : 2553 – 2556 . Google Scholar CrossRef Search ADS PubMed Acorn JH , Sheldon I . 2006 . Butterflies of British Columbia . Vancouver : Lone Pine Publishing . Andrews KR , Good JM , Miller MR , Luikart G , Hohenlohe PA . 2016 . Harnessing the power of RADseq for ecological and evolutionary genomics . Nature Reviews: Genetics 17 : 81 – 92 . Google Scholar CrossRef Search ADS PubMed Bickford D , Lohman DJ , Sodhi NS , Ng PK , Meier R , Winker K , Ingram KK , Das I . 2007 . Cryptic species as a window on diversity and conservation . Trends in Ecology & Evolution 22 : 148 – 155 . Google Scholar CrossRef Search ADS PubMed Bird CD , Hilchie GJ , Kondla NG , Pike EM , Sperling FAH . 1995 . Alberta butterflies . Edmonton : Provincial Museum of Alberta . Bonney R , Shirk JL , Phillips TB , Wiggins A , Ballard HL , Miller-Rushing AJ , Parrish JK . 2014 . Citizen science. Next steps for citizen science . Science 343 : 1436 – 1437 . Google Scholar CrossRef Search ADS PubMed Bortolus A . 2008 . Error cascades in the biological sciences: the unwanted consequences of using bad taxonomy in ecology . AMBIO 37 : 114 – 118 . Google Scholar CrossRef Search ADS PubMed Braschler B , Hill JK . 2007 . Role of larval host plants in the climate-driven range expansion of the butterfly Polygonia c-album . The Journal of Animal Ecology 76 : 415 – 423 . Google Scholar CrossRef Search ADS PubMed Brock JP , Kaufman K . 2003 . Kaufman field guide to butterflies of North America . Boston : Houghton Mifflin Harcourt . Burke RJ , Fitzsimmons JM , Kerr JT . 2011 . A mobility index for Canadian butterfly species based on naturalists’ knowledge . Biodiversity and Conservation 20 : 2273 – 2295 . Google Scholar CrossRef Search ADS Catchen JM , Amores A , Hohenlohe P , Cresko W , Postlethwait JH . 2011 . Stacks: building and genotyping loci de novo from short-read sequences . G3 (Bethesda, Md.) 1 : 171 – 182 . Google Scholar CrossRef Search ADS PubMed Chan KM , Levin SA . 2005 . Leaky prezygotic isolation and porous genomes: rapid introgression of maternally inherited DNA . Evolution 59 : 720 – 729 . Google Scholar CrossRef Search ADS PubMed Clement M , Posada D , Crandall KA . 2000 . TCS: a computer program to estimate gene genealogies . Molecular Ecology 9 : 1657 – 1659 . Google Scholar CrossRef Search ADS PubMed Cruaud A , Gautier M , Galan M , Foucaud J , Sauné L , Genson G , Dubois E , Nidelet S , Deuve T , Rasplus JY . 2014 . Empirical assessment of RAD sequencing for interspecific phylogeny . Molecular Biology and Evolution 31 : 1272 – 1274 . Google Scholar CrossRef Search ADS PubMed DaCosta JM , Sorenson MD . 2016 . ddRAD-seq phylogenetics based on nucleotide, indel, and presence-absence polymorphisms: analyses of two avian genera with contrasting histories . Molecular Phylogenetics and Evolution 94 : 122 – 135 . Google Scholar CrossRef Search ADS PubMed Danecek P , Auton A , Abecasis G , Albers CA , Banks E , DePristo MA , Handsaker RE , Lunter G , Marth GT , Sherry ST , McVean G , Durbin R ; 1000 Genomes Project Analysis Group . 2011 . The variant call format and VCFtools . Bioinformatics 27 : 2156 – 2158 . Google Scholar CrossRef Search ADS PubMed Dayrat B . 2005 . Towards integrative taxonomy . Biological Journal of the Linnean Society 85 : 407 – 415 . Google Scholar CrossRef Search ADS Dejaco T , Gassner M , Arthofer W , Schlick-Steiner BC , Steiner FM . 2016 . Taxonomist’s nightmare … evolutionist’s delight: an integrative approach resolves species limits in jumping bristletails despite widespread hybridization and parthenogenesis . Systematic Biology 65 : 947 – 974 . Google Scholar CrossRef Search ADS PubMed Devictor V , Whittaker RJ , Beltrame C . 2010 . Beyond scarcity: citizen science programmes as useful tools for conservation biogeography . Diversity and Distributions 16 : 354 – 362 . Google Scholar CrossRef Search ADS Dray S , Dufour A-B . 2007 . The ade4 package: implementing the duality diagram for ecologists . Journal of Statistical Software 22 : 1 – 20 . Google Scholar CrossRef Search ADS Drummond AJ , Suchard MA , Xie D , Rambaut A . 2012 . Bayesian phylogenetics with BEAUti and the BEAST 1.7 . Molecular Biology and Evolution 29 : 1969 – 1973 . Google Scholar CrossRef Search ADS PubMed Dumas P , Barbut J , Le Ru B , Silvain JF , Clamens AL , d’Alençon E , Kergoat GJ . 2015 . Phylogenetic molecular species delimitations unravel potential new species in the pest genus Spodoptera Guenée, 1852 (Lepidoptera, Noctuidae) . PLoS One 10 : e0122407 . Google Scholar CrossRef Search ADS PubMed Dupuis JR , Brunet BMT , Bird HM , Lumley LM , Fagua G , Boyle B , Levesque R , Cusson M , Powell JA , Sperling FAH . 2017 . Genome-wide SNPs resolve phylogenetic relationships in the North American spruce budworm (Choristoneura fumiferana) species complex . Molecular Phylogenetics and Evolution 111 : 158 – 168 . Google Scholar CrossRef Search ADS PubMed Dupuis JR , Roe AD , Sperling FA . 2012 . Multi-locus species delimitation in closely related animals and fungi: one marker is not enough . Molecular Ecology 21 : 4422 – 4436 . Google Scholar CrossRef Search ADS PubMed Dupuis JR , Sperling FA . 2015 . Repeated reticulate evolution in North American Papilio machaon group swallowtail butterflies . PLoS One 10 : e0141882 . Google Scholar CrossRef Search ADS PubMed Evanno G , Regnaut S , Goudet J . 2005 . Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study . Molecular Ecology 14 : 2611 – 2620 . Google Scholar CrossRef Search ADS PubMed Falush D , Stephens M , Pritchard JK . 2003 . Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies . Genetics 164 : 1567 – 1587 . Google Scholar PubMed Frey JE , Guillén L , Frey B , Samietz J , Rull J , Aluja M . 2013 . Developing diagnostic SNP panels for the identification of true fruit flies (Diptera: Tephritidae) within the limits of COI-based species delimitation . BMC Evolutionary Biology 13 : 106 . Google Scholar CrossRef Search ADS PubMed Funk DJ , Omland KE . 2003 . Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insights from animal mitochondrial DNA . Annual Review of Ecology, Evolution, and Systematics 34 : 397 – 423 . Google Scholar CrossRef Search ADS Gratton P , Trucchi E , Trasatti A , Riccarducci G , Marta S , Allegrucci G , Cesaroni D , Sbordoni V . 2016 . Testing classical species properties with contemporary data: how “bad species” in the brassy ringlets (Erebia tyndarus complex, Lepidoptera) turned good . Systematic Biology 65 : 292 – 303 . Google Scholar CrossRef Search ADS PubMed Guindon S , Dufayard JF , Lefort V , Anisimova M , Hordijk W , Gascuel O . 2010 . New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0 . Systematic Biology 59 : 307 – 321 . Google Scholar CrossRef Search ADS PubMed Guppy CS , Shephard JH . 2001 . Butterflies of British Columbia . Vancouver : University of British Columbia Press in collaboration with the Royal British Columbia Museum . Hedin M . 2015 . High-stakes species delimitation in eyeless cave spiders (Cicurina, Dictynidae, Araneae) from central Texas . Molecular Ecology 24 : 346 – 361 . Google Scholar CrossRef Search ADS PubMed Heliconius Genome Consortium . 2012 . Butterfly genome reveals promiscuous exchange of mimicry adaptations among species . Nature 487 : 94 – 98 . CrossRef Search ADS PubMed Hovanitz W . 1941 . Parallel ecogenotypical color variation in butterflies . Ecology 22 : 259 – 284 . Google Scholar CrossRef Search ADS Howard E , Davis AK . 2009 . The fall migration flyways of monarch butterflies in eastern North America revealed by citizen scientists . Journal of Insect Conservation 13 : 279 – 286 . Google Scholar CrossRef Search ADS Jombart T . 2008 . adegenet: a R package for the multivariate analysis of genetic markers . Bioinformatics 24 : 1403 – 1405 . Google Scholar CrossRef Search ADS PubMed Jombart T , Devillard S , Balloux F . 2010 . Discriminant analysis of principal components: a new method for the analysis of genetically structured populations . BMC Genetics 11 : 94 . Google Scholar CrossRef Search ADS PubMed Jones JC , Fan S , Franchini P , Schartl M , Meyer A . 2013 . The evolutionary history of Xiphophorus fish and their sexually selected sword: a genome-wide approach using restriction site-associated DNA sequencing . Molecular Ecology 22 : 2986 – 3001 . Google Scholar CrossRef Search ADS PubMed Kearse M , Moir R , Wilson A , Stones-Havas S , Cheung M , Sturrock S , Buxton S , Cooper A , Markowitz S , Duran C , Thierer T , Ashton B , Meintjes P , Drummond A . 2012 . Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data . Bioinformatics 28 : 1647 – 1649 . Google Scholar CrossRef Search ADS PubMed Kodandaramaiah U , Weingartner E , Janz N , Dalén L , Nylin S . 2011 . Population structure in relation to host-plant ecology and Wolbachia infestation in the comma butterfly . Journal of Evolutionary Biology 24 : 2173 – 2185 . Google Scholar CrossRef Search ADS PubMed Kodandaramaiah U , Weingartner E , Janz N , Leski M , Slove J , Warren A , Nylin S . 2012 . Investigating concordance among genetic data, subspecies circumscriptions and host plant use in the nymphalid butterfly Polygonia faunus . PLoS One 7 : e41058 . Google Scholar CrossRef Search ADS PubMed Kopelman NM , Mayzel J , Jakobsson M , Rosenberg NA , Mayrose I . 2015 . Clumpak: a program for identifying clustering modes and packaging population structure inferences across K . Molecular Ecology Resources 15 : 1179 – 1191 . Google Scholar CrossRef Search ADS PubMed Larrivee M , Prudic KL , McFarland K , Kerr J . 2014 . eButterfly: a citizen-based butterfly database in the biological sciences . Available at: http://www.e-butterfly.org/ Layberry RA , Hall PW , Lafontaine JD . 1998 . The butterflies of Canada . Toronto : University of Toronto Press . Google Scholar CrossRef Search ADS Leaché AD , Fujita MK , Minin VN , Bouckaert RR . 2014 . Species delimitation using genome-wide SNP data . Systematic Biology 63 : 534 – 542 . Google Scholar CrossRef Search ADS PubMed Lischer HE , Excoffier L . 2012 . PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs . Bioinformatics 28 : 298 – 299 . Google Scholar CrossRef Search ADS PubMed Lumley LM , Sperling FAH . 2010 . Integrating morphology and mitochondrial DNA for species delimitation within the spruce budworm (Choristoneura fumiferana) cryptic species complex (Lepidoptera: Tortricidae) . Systematic Entomology 35 : 416 – 428 . Google Scholar CrossRef Search ADS Maddison WP . 1997 . Gene trees in species trees . Systematic Biology 46 : 523 – 536 . Google Scholar CrossRef Search ADS Maddison WP , Maddison DR . 2008 . Mesquite: a modular system for evolutionary analysis . Evolution 62 : 1103 – 1118 . Google Scholar CrossRef Search ADS PubMed McKay BD , Mays HL Jr , Yao CT , Wan D , Higuchi H , Nishiumi I . 2014 . Incorporating color into integrative taxonomy: analysis of the varied tit (Sittiparus varius) complex in East Asia . Systematic Biology 63 : 505 – 517 . Google Scholar CrossRef Search ADS PubMed Melo-Ferreira J , Seixas FA , Cheng E , Mills LS , Alves PC . 2014 . The hidden history of the snowshoe hare, Lepus americanus: extensive mitochondrial DNA introgression inferred from multilocus genetic variation . Molecular Ecology 23 : 4617 – 4630 . Google Scholar CrossRef Search ADS PubMed Minh BQ , Nguyen MA , von Haeseler A . 2013 . Ultrafast approximation for phylogenetic bootstrap . Molecular Biology and Evolution 30 : 1188 – 1195 . Google Scholar CrossRef Search ADS PubMed Müller P , Pflüger V , Wittwer M , Ziegler D , Chandre F , Simard F , Lengeler C . 2013 . Identification of cryptic Anopheles mosquito species by molecular protein profiling . PLoS One 8 : e57486 . Google Scholar CrossRef Search ADS PubMed Naciri Y , Linder HP . 2015 . Species delimitation and relationships: The dance of the seven veils . Taxon 64 : 3 – 16 . Google Scholar CrossRef Search ADS Nadeau NJ , Martin SH , Kozak KM , Salazar C , Dasmahapatra KK , Davey JW , Baxter SW , Blaxter ML , Mallet J , Jiggins CD . 2013 . Genome-wide patterns of divergence and gene flow across a butterfly radiation . Molecular Ecology 22 : 814 – 826 . Google Scholar CrossRef Search ADS PubMed Nguyen LT , Schmidt HA , von Haeseler A , Minh BQ . 2015 . IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies . Molecular Biology and Evolution 32 : 268 – 274 . Google Scholar CrossRef Search ADS PubMed Nylin S , Nybolm K , Ronquist F , Janz N , Belicek J , Källersjo M . 2001 . Phylogeny of Polygonia, Nymphalis and related butterflies (Lepidoptera: Nymphalidae): a total-evidence analysis . Zoological Journal of the Linnean Society 132 : 441 – 448 . Google Scholar CrossRef Search ADS Nylin S , Söderlind L , Gamberale-Stille G , Audusseau H , Celorio-Mancera MDLP , Janz N , Sperling FAH . 2015 . Vestiges of an ancestral host plant: preference and performance in the butterfly Polygonia faunus and its sister species P. c-album . Ecological Entomology 40 : 307 – 315 . Google Scholar CrossRef Search ADS Pante E , Abdelkrim J , Viricel A , Gey D , France SC , Boisselier MC , Samadi S . 2015 . Use of RAD sequencing for delimiting species . Heredity 114 : 450 – 459 . Google Scholar CrossRef Search ADS PubMed Pardo-Diaz C , Salazar C , Baxter SW , Merot C , Figueiredo-Ready W , Joron M , McMillan WO , Jiggins CD . 2012 . Adaptive introgression across species boundaries in Heliconius butterflies . PLoS Genetics 8 : e1002752 . Google Scholar CrossRef Search ADS PubMed Pelham JP . 2008 . A catalogue of the butterflies of the United States and Canada, with a complete bibliography of the descriptive and systematic literature . The Journal of Research on the Lepidoptera 40 : 1 – 672 . Poland JA , Brown PJ , Sorrells ME , Jannink JL . 2012 . Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach . PLoS One 7 : e32253 . Google Scholar CrossRef Search ADS PubMed Pritchard JK , Stephens M , Donnelly P . 2000 . Inference of population structure using multilocus genotype data . Genetics 155 : 945 – 959 . Google Scholar PubMed Pritchard JK , Wen X . 2004 . Documentation for structure software: version 2 . Chicago : University of Chicago Press . Prodanov D . 2010 . Color histogram . Available at: http://imagej.nih.gov.ij/plugins/color-histogram.html Proshek B , Dupuis JR , Engberg A , Davenport K , Opler PA , Powell JA , Sperling FA . 2015 . Genetic evaluation of the evolutionary distinctness of a federally endangered butterfly, Lange’s Metalmark . BMC Evolutionary Biology 15 : 73 . Google Scholar CrossRef Search ADS PubMed Pyle RM . 2002 . The butterflies of Cascadia: a field guide to all the species of Washington, Oregon, and surrounding territories . Seattle : Seattle Audubon Society . R Core Team . 2016 . R: a language and environment for statistical computing . Vienna : R Foundation for Statistical Computing . Available at: https://www.r-project.org/ Rambaut A , Drummond AJ . 2010 . FigTree v1.4.2 . Edinburgh, UK: Institute of Evolutionary Biology, University of Edinburgh . Available at: http://tree.bio.ed.ac.uk/software/figtree Rambaut A , Suchard MA , Xie D , Drummond AJ . 2014 . Tracer v1.6 . Available at: http://beast.bio.ed.ac.uk/Tracer Razkin O , Sonet G , Breugelmans K , Madeira MJ , Gómez-Moliner BJ , Backeljau T . 2016 . Species limits, interspecific hybridization and phylogeny in the cryptic land snail complex Pyramidula: the power of RADseq data . Molecular Phylogenetics and Evolution 101 : 267 – 278 . Google Scholar CrossRef Search ADS PubMed Schlick-Steiner BC , Steiner FM , Seifert B , Stauffer C , Christian E , Crozier RH . 2010 . Integrative taxonomy: a multisource approach to exploring biodiversity . Annual Review of Entomology 55 : 421 – 438 . Google Scholar CrossRef Search ADS PubMed Schneider CA , Rasband WS , Eliceiri KW . 2012 . NIH Image to ImageJ: 25 years of image analysis . Nature Methods 9 : 671 – 675 . Google Scholar CrossRef Search ADS PubMed Shepard J , Guppy CS . 2011 . Butterflies of British Columbia: including Western Alberta, Southern Yukon, the Alaska Panhandle, Washington, Northern Oregon, Northern Idaho, and Northwestern Montana . Vancouver : University of British Columbia Press . Silvertown J . 2009 . A new dawn for citizen science . Trends in Ecology & Evolution 24 : 467 – 471 . Google Scholar CrossRef Search ADS PubMed Sonah H , Bastien M , Iquira E , Tardivel A , Légaré G , Boyle B , Normandeau É , Laroche J , Larose S , Jean M , Belzile F . 2013 . An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping . PLoS One 8 : e54603 . Google Scholar CrossRef Search ADS PubMed Soucy SM , Huang J , Gogarten JP . 2015 . Horizontal gene transfer: building the web of life . Nature Reviews: Genetics 16 : 472 – 482 . Google Scholar CrossRef Search ADS PubMed Sperling JLH , Sperling FAH . 2009 . Lyme borreliosis in Canada: biological diversity and diagnostic complexity from an entomological perspective . The Canadian Entomologist 141 : 521 – 549 . Google Scholar CrossRef Search ADS Stajich JE , Block D , Boulez K , Brenner SE , Chervitz SA , Dagdigian C , Fuellen G , Gilbert JG , Korf I , Lapp H , Lehväslaiho H , Matsalla C , Mungall CJ , Osborne BI , Pocock MR , Schattner P , Senger M , Stein LD , Stupka E , Wilkinson MD , Birney E . 2002 . The BioPerl toolkit: Perl modules for the life sciences . Genome Research 12 : 1611 – 1618 . Google Scholar CrossRef Search ADS PubMed Steel M , Penny D . 2000 . Parsimony, likelihood, and the role of models in molecular phylogenetics . Molecular Biology and Evolution 17 : 839 – 850 . Google Scholar CrossRef Search ADS PubMed Stervander M , Illera JC , Kvist L , Barbosa P , Keehnen NP , Pruisscher P , Bensch S , Hansson B . 2015 . Disentangling the complex evolutionary history of the Western Palearctic blue tits (Cyanistes spp.) – phylogenomic analyses suggest radiation by multiple colonization events and subsequent isolation . Molecular Ecology 24 : 2477 – 2494 . Google Scholar CrossRef Search ADS PubMed Thompson JD , Gibson TJ , Higgins DG . 2003 . Multiple sequence alignment using ClustalW and ClustalX . Current Protocols in Bioinformatics 2 : 1 – 22 . Vähä JP , Erkinaro J , Niemelä E , Primmer CR . 2007 . Life-history and habitat features influence the within-river genetic structure of Atlantic salmon . Molecular Ecology 16 : 2638 – 2654 . Google Scholar CrossRef Search ADS PubMed Wachter GA , Muster C , Arthofer W , Raspotnig G , Föttinger P , Komposch C , Steiner FM , Schlick-Steiner BC . 2015 . Taking the discovery approach in integrative taxonomy: decrypting a complex of narrow-endemic Alpine harvestmen (Opiliones: Phalangiidae: Megabunus) . Molecular Ecology 24 : 863 – 889 . Google Scholar CrossRef Search ADS PubMed Wagner CE , Keller I , Wittwer S , Selz OM , Mwaiko S , Greuter L , Sivasundar A , Seehausen O . 2013 . Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation . Molecular Ecology 22 : 787 – 798 . Google Scholar CrossRef Search ADS PubMed Wahlberg N , Nylin S . 2003 . Morphology versus molecules: resolution of the positions of Nymphalis, Polygonia, and related genera (Lepidoptera: Nymphalidae) . Cladistics 19 : 213 – 223 . Google Scholar CrossRef Search ADS Wahlberg N , Weingartner E , Warren AD , Nylin S . 2009 . Timing major conflict between mitochondrial and nuclear genes in species relationships of Polygonia butterflies (Nymphalidae: Nymphalini) . BMC Evolutionary Biology 9 : 92 . Google Scholar CrossRef Search ADS PubMed Watanabe K , Nakazono T , Ono Y . 2012 . Morphology evolution and molecular phylogeny of Pestalotiopsis (Coelomycetes) based on ITS2 secondary structure . Mycoscience 53 : 227 – 237 . Google Scholar CrossRef Search ADS Yeates DK , Seago A , Nelson L , Cameron SL , Joseph LEO , Trueman JWH . 2011 . Integrative taxonomy, or iterative taxonomy ? Systematic Entomology 36 : 209 – 217 . Google Scholar CrossRef Search ADS Zhang W , Dasmahapatra KK , Mallet J , Moreira GR , Kronforst MR . 2016 . Genome-wide introgression among distantly related Heliconius butterfly species . Genome Biology 17 : 25 . Google Scholar CrossRef Search ADS PubMed Zhulidov PA , Bogdanova EA , Shcheglov AS , Vagner LL , Khaspekov GL , Kozhemyako VB , Matz MV , Meleshkevitch E , Moroz LL , Lukyanov SA , Shagin DA . 2004 . Simple cDNA normalization using Kamchatka crab duplex-specific nuclease . Nucleic Acids Research 32 : e37 . Google Scholar CrossRef Search ADS PubMed © 2017 The Linnean Society of London, Zoological Journal of the Linnean Society This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Zoological Journal of the Linnean SocietyOxford University Press

Published: Nov 13, 2017

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off