An efficient approach for the development of genome-specific markers in allohexaploid wheat (Triticum aestivum L.) and its application in the construction of high-density linkage maps of the D genome

In common wheat, the development of genotyping platforms has been hampered by the large size of the genome, its highly repetitive elements and its allohexaploid nature. However, recent advances in sequencing technology provide opportunities to resolve these difficulties. Using next-generation sequencing and gene-targeting sequence capture, 12,551 nucleotide polymorphisms were detected in the common wheat varieties ‘Hatsumochi’ and ‘Kitahonami’ and were assigned to chromosome arms using International Wheat Genome Sequencing Consortium survey sequences. Because the number of markers for D genome chromosomes in commercially available wheat single nucleotide polymorphism arrays is insufficient, we developed markers using a genome-specific amplicon sequencing strategy. Approximately 80% of the designed primers successfully amplified D genome-specific products, suggesting that by concentrating on a specific subgenome, we were able to design successful markers as efficiently as could be done in a diploid species. The newly developed markers were uniformly distributed across the D genome and greatly extended the total coverage. Polymorphisms were surveyed in six varieties, and 31,542 polymorphic sites were detected, of which 5,986 were potential marker sites in the D genome. The marker development and genotyping strategies are cost effective, robust and flexible and may enhance multi-sample studies in the post-genomic era in wheat.

Key words: Triticum aestivum, allohexaploid, next-generation sequencing, amplicon sequencing, wheat D genome

© The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018

Genome-specific amplicon sequencing in common wheat (G. Ishikawa et al.)

1. Introduction

High-throughput genotyping platforms are essential tools for various genetic studies that involve genetic mapping, genome-wide association studies, phylogenetic analyses, marker-assisted selection (MAS) and genomic selection.[1,2] Recent advances in sequencing technologies have greatly facilitated the discovery of polymorphisms in the whole genome. The 9K iSelect[3] and 90K iSelect[4] single-nucleotide polymorphism (SNP) arrays have been developed for allohexaploid wheat (Triticum aestivum L.) by transcriptome sequencing using next-generation sequencing (NGS) technology. These arrays comprise many gene-based SNPs that allow an individual plant to be genotyped at all sites simultaneously and tend to be robust marker platforms. Therefore, these arrays have been widely used for germplasm characterization and quantitative trait locus (QTL) mapping.[5–12]

However, array-based markers are inflexible and have a relatively high per-sample cost. The number of polymorphisms detected by the markers on an array chip tends to substantially decrease in lines that are genetically distant from those used to design the array. When the 9K iSelect array, which was designed based on SNP information of American and Australian varieties, is used in Japanese varieties, a low polymorphic rate is detected, particularly in varieties grown in the southern regions (latitude 33–37°N). In this region, the wheat harvest season overlaps with the rainy season, and mainly domestic genetic resources have been used to breed varieties with resistance to pre-harvest sprouting and Fusarium head blight. On the other hand, in northern regions from Hokkaido (42–45°N) to Tohoku (37–41°N), there is less rain during the harvest season and several North American varieties were introduced to improve flour quality. In many cases, the polymorphisms detected in American and Australian varieties are not conserved in Japanese varieties. Furthermore, when Axiom HD wheat genotyping arrays (Affymetrix, Santa Clara, CA, USA) were used in Japanese materials, only 3.1% of the markers were categorized as ‘PolyHighResolution’, which defines markers with good cluster resolution and at least two examples of the minor allele (unpublished). However, a different set of markers in this array were of high quality and were more polymorphic in Japanese materials than in samples from varieties processed by the WISP (http://www.wheatisp.org/ (5 February 2018, date last accessed)) in the design of the array. Thus, the usefulness of array-based markers largely depends on the source of the polymorphisms. Additionally, genotype calls remain complicated because of the polyploid nature of wheat, and allele data should be interpreted with caution. These disadvantages are deleterious for MAS in wheat breeding programs because a cost-effective genotyping platform with a rapid turnaround time, low per-sample cost and very low rate of missing data is required for efficient MAS. Furthermore, MAS usually relies on a set of specific markers for specific QTLs and genes using a medium throughput system, rather than random genes or markers.

Available array platforms are limited in their application to the D genome, because the genetic coverage of the D genome is highly inadequate.[14] This tendency is also observed using other whole-genome genotyping platforms, such as DArT and genotyping by sequencing (GBS), which also have comparatively fewer D genome markers.[15–17] Marcussen et al. reported that the origin of the D genome was 1–2 million years later than that of the A and B genomes.[18] Furthermore, the D subgenome of hexaploid wheat was established by polyploidization 0.4 million years later than the other subgenomes. The shorter time for accumulation of nucleotide polymorphisms induced by natural mutations might influence the scarcity of markers in the D genome. Therefore, to conduct studies focused on chromosomes in the D genome, new markers must be developed. However, the discovery of polymorphisms in wheat is hampered by its large genome size (16 Gb) and high repeat content (approximately 80%). Although the cost of sequencing is decreasing, sequencing the whole genome remains prohibitively expensive, particularly in species with large genomes. Henry et al. reported that exome capture combined with NGS can be used to successfully and efficiently detect ethyl methanesulfonate-induced mutations. These authors compared the cost effectiveness among plant species with various genome sizes and concluded that this approach is increasingly advantageous as the genome size increases. Therefore, wheat is a suitable material for sequence capture in terms of cost effectiveness. Furthermore, this technique is useful in controlling the distribution of targets along chromosomes.

Genotyping by multiplexing amplicon sequencing (GBMAS) (Lab protocols of Schnable Lab., Iowa State University: http://schnablelab.plantgenomics.iastate.edu/resources/protocols/ (5 February 2018, date last accessed)) and genome-tagged amplification (GTA) are new genotyping platforms that combine multiplexed PCR and multiplexed samples using bar codes with NGS. These methods appear to be suitable for MAS due to higher flexibility in choosing combinations of marker number and sample number. However, designing PCR primers for wheat is hindered by the close homology of the three genomes (A, B and D) and the high sequence similarity among the genes and gene family members. Recently, the International Wheat Genome Sequencing Consortium (IWGSC) offered a solution for distinguishing genomes using the chromosome-arm sorting technique. Using the differences among homoeologous sequences, genome-specific primers that span target polymorphic sites can be designed. Genotyping by amplicon sequencing using genome-specific amplicons could be highly beneficial for simplifying genotype calls and achieving robust analysis for MAS.

The objective of this study is to develop an efficient approach for locating polymorphisms across the wheat genomes using the advantages of a sequence capture technique. To evaluate this approach, we designed D genome-specific primers and constructed linkage maps using multi-sample genotyping by amplicon sequencing.

2. Materials and methods

2.1. Plant materials and DNA extraction

Genomic DNA was extracted from leaves of Triticum aestivum cv. ‘Kitahonami’ and cv. ‘Hatsumochi’ using a DNeasy Plant Mini kit (Qiagen, Hilden, Germany). ‘Kitahonami’ is a winter wheat variety adapted to Hokkaido and has superior properties, such as high yield, high flour yield and high noodle-making quality. In contrast, ‘Hatsumochi’ was the first registered waxy (glutinous) variety and is adapted to the Kanto region (Central Japan). Diversity analyses indicated that modern Japanese varieties were separated into three groups corresponding to adaptation regions, and these two varieties fall into separate groups[13,17]; thus, we expected to observe a reasonable level of polymorphism between them. To construct the genetic map, a population of 94 recombinant inbred lines (RILs) was developed by crossing ‘Hatsumochi’ and ‘Kitahonami’ then using the single-seed descent method to advance to the F generation. The total genomic DNA was extracted from the leaves of the RILs using the automated DNA extracting machine PI-50a (Kurabo Industries Ltd., Osaka, Japan).

2.2. Capture probe design and preparation of libraries

A workflow of sequence capture, polymorphism detection and designing genome-specific primers is described in Fig. 1. To design the capture probes, we selected 8,892 wheat expressed sequence tag (EST) and full-length cDNA sequences from NCBI (https://www.ncbi.nlm.nih.gov/ (5 February 2018, date last accessed)) and TriFLDB (Supplementary Table S1). Of these, 4,895 EST sequences were selected earlier during our design of PCR-based Landmark Unique Gene (PLUG) markers and were known to be evenly distributed across the wheat chromosomes. Most of the remaining 3,997 sequences were EST sequences derived from mapped 9K iSelect probes showing polymorphisms in Japanese varieties, and additional sequences to fill in gaps in coverage were based on bin-mapped wheat ESTs that were selected based on the barley physical map. Using the syntenic relationships among rice, barley and wheat, the selected sequences were estimated to be evenly distributed across the seven Triticeae chromosomes (Supplementary Table S1). All sequences were compared with one another, and only one representative homoeologous copy of each gene was selected for probe design.

Figure 1. A workflow of sequence capture, polymorphism detection and designing genome-specific primers for amplicon sequencing. HTM: Hatsumochi, KH: Kitahonami, SurvSeq: IWGSC survey sequence.

DNA capture of genomic DNA of ‘Hatsumochi’ and ‘Kitahonami’ was performed with a SeqCap EZ Developer Kit (Roche Diagnostics, Basel, Switzerland) following the manufacturer’s protocol. In total, 1,000 ng of each DNA sample was sheared using a Covaris LE220 (Covaris Inc., Woburn, MA, USA) focused ultrasonicator to fragments that averaged 600 bp. An NEBNext Quick DNA Library Prep Master Mix set for 454 (New England Biolabs Inc., Beverly, MA, USA) was used to construct two genomic libraries. These libraries were amplified and fractionated from 500 to 700 bp before hybridization using a SeqCap EZ reagent Kit. To avoid hybridization with repetitive regions in the wheat genome, 1,500 ng of each amplified genomic library was mixed with 10 µL of SeqCap EZ reagent Kit in the presence of 10 µg PCE (plant capture enhancer, Roche Diagnostics) in a 0.2-mL thin-wall tube.

2.3. Detection of nucleotide polymorphisms between ‘Hatsumochi’ and ‘Kitahonami’

The enriched genomic DNAs were sequenced using the next-generation sequencer GS FLX plus (Roche Diagnostics). Reads were mapped to the wheat survey sequences using 454 Sequencing System Software 2.7 (option: -mi 98). The detected polymorphisms against the survey sequences, including SNPs and Indels, were supported with at least three reads in each variety. From them, polymorphic sites between ‘Hatsumochi’ and ‘Kitahonami’ were extracted. The predicted effects of the polymorphisms were analysed using SnpEff with IWGSC ver. 2.2 annotation data (https://wheat-urgi.versailles.inra.fr/Seq-Repository/Genes-annotations (5 February 2018, date last accessed)).

2.4. Design of genome-specific primers

The flanking sequences of target polymorphism sites were obtained and blasted against the IWGSC survey sequences using PSI-BLAST, and three homoeologous sequences in terms of their chromosomal locations were identified. The three sequences were aligned by MAFFT ver. 6.864 with default parameters and imported into an in-house Java pipeline. This pipeline requires an alignment file of the homoeologous sequences in a target region. Based on the polymorphisms among the genomes, the pipeline automatically detects genome-specific primer pairs of 18–22 nucleotides at an amplicon length of approximately 300 bp. The primers contained nucleotides specific to the target sequence at the first and/or second positions from their 3′ end. The forward and reverse primers for the first PCR were tailed with the common sequence tags CS1 (5′-ACACTGACGACATGGTTCTACA-3′) and CS2 (5′-TACGGTAGCAGAGACTTGGTCT-3′), respectively, to allow for the addition of adapters during the second round of PCR according to the protocol of the Access Array for Ion Torrent PGM Sequencing System (Fluidigm Corporation, San Francisco, CA, USA).

2.5. Library preparation and amplicon sequencing

The reaction mix for the first PCR contained a total volume of 10 µL consisting of 1× Multiplex PCR Master Mix (Qiagen), a primer mix (described below) and 20–50 ng of template DNA. The primer mix contained 48 sets of randomly selected locus-specific primers, with CS1 or CS2 tails attached. The first PCR program consisted of a denaturation step at 95 °C for 15 min, followed by 32 cycles of 30 s at 94 °C, 90 s at 60 °C and 1 min at 72 °C, and a final extension for 10 min at 72 °C. The product from the first PCR of each sample was diluted 100-fold with sterilized distilled water, and 2 µL of the diluted product was used as the template for the second PCR. To perform bidirectional amplicon tagging in the second PCR, a forward fusion primer containing Ion A adapter, barcode, and the CS1 sequence and a reverse fusion primer containing Ion P1 adapter and the CS2 sequence were used with a portion of the template, while a second portion was amplified with an A-adaptor–barcode–CS2 and P1 adaptor–CS1 primer combination.

The 10-µL second PCR mix contained 1× Multiplex PCR Master Mix (Qiagen) and 400 nM forward and reverse fusion primers, and PCR was run using the following profile: 15 min at 95 °C, followed by 15 cycles of 30 s at 94 °C, 90 s at 60 °C and 1 min at 72 °C, and a final extension step of 10 min at 72 °C. All second PCR products were mixed in equivalent volumes (2 µL of PCR product per sample). The pooled product was purified using Agencourt AMPure XP Reagent beads (Beckman-Coulter, Fullerton, CA, USA) as follows: 12 µL of pooled products, 24 µL of TE buffer and 36 µL of well-mixed AMPure XP beads were vortexed. After a 10-min incubation at room temperature, the sample was placed onto a magnetic separator for 1 min, and the supernatant was discarded. The beads with sample attached were washed twice with 180 µL of freshly prepared 70% ethanol. Finally, the purified PCR products were suspended in 40 µL of low TE buffer. The quality of the amplicon library was assessed using an Agilent 2100 Bioanalyzer, and a high sensitivity kit (Agilent Technologies, Santa Clara, CA, USA) was used to define the region covering all PCR library peaks (300–450 bp). The purified library was quantified using a Qubit dsDNA HS assay kit (Thermo Fisher Scientific), with dilution to a concentration of 5 pM. Sequencing was performed using an Ion Torrent PGM system with an Ion PGM 400 sequencing kit and 318 chips (Thermo Fisher Scientific). A schematic diagram of the experimental procedure is shown in Supplementary Fig. S1.

2.6. Data processing and genotype calling

The removal of the Ion Torrent sequencing adaptor sites and demultiplexing of the barcodes to separate the different samples were automatically performed by Torrent Suite ver. 5 (Thermo Fisher Scientific). Further analysis was conducted using a custom Java script. First, the CS1 and CS2 tail sequences were removed. Second, the sequences were aligned to reference sequences using a BLAST-like alignment tool. The reference sequences consisted of the sequences internal to the primers of each marker. Third, based on the alignment results of each sample-marker combination, the total number of aligned reads containing A, C, G and T bases, null alleles or missing values (NaN) and nucleotide deletions (“–”) in the base position of interest were counted. Fourth, the alleles were defined according to the bases with the highest read counts. The minimum read count was set to 5. A heterozygous site (H) was called when two alleles each had more than 40% of the total reads. Finally, the results of the genotype calls were summarized and exported into a tabular form.

2.7. Construction of the linkage map

The wheat PstI(TaqI) v.3 DArT array (Diversity Array Technology Pty Ltd., Bruce, Australia) and 9K SNP array were used to genotype 94 RILs. Publicly available SSR markers (GrainGenes 2.0) and functional markers of the Wx-A1, Wx-D1 and Pina-D1 genes were also used. The data obtained using DArT, 9K, SSR and the newly developed markers (TARC) were merged, and polymorphic markers between parents were extracted. Poor-quality genotypes and genotypes with more than 10% missing data or segregation distortion were removed. Because redundant markers are completely correlated or identical and cannot provide additional information, only one of a redundant set of markers was used in map construction. MapDisto 1.7.5 was used to identify linkage groups. Mapped markers corresponding to those on the hexaploid wheat consensus map[3,15,32] were used as anchors to assign each linkage group to a particular chromosome and to orient linkage groups on short and long chromosome arms.

2.8. Polymorphism survey using multiple genotypes

To increase the resources for nucleotide polymorphism detection, four additional varieties, namely ‘Shunyou’, ‘Tohoku224’, ‘Kinuhime’ and ‘Yumechikara’, were subjected to capture sequencing. The first three varieties are adapted to the Kanto and Tohoku regions (Central and Northeastern Japan), while the ‘Yumechikara’ variety is adapted to the Hokkaido region. The procedures used for DNA extraction, library preparation, sequencing by GS FLX plus (Roche Diagnostics) and mapping to the reference sequences were identical to those described above. To identify highly polymorphic sites, we performed pair-wise comparisons among the six varieties used. The detected polymorphisms, including SNPs and Indels, were supported with at least one read as either “Reference” or “SNP” in a given variety. If reads supported both “Reference” and “SNP”, we identified that variety as heterozygous at that site.

3. Results

3.1. Detection of polymorphisms between ‘Hatsumochi’ and ‘Kitahonami’

Using the next-generation sequencer GS FLX plus, 1,114,867 (536,055,413 bp) and 1,304,168 (615,634,987 bp) reads were obtained from the enriched genomic DNA of ‘Hatsumochi’ and ‘Kitahonami’, respectively. The average lengths of the sequences were 481 bp for ‘Hatsumochi’ and 472 bp for ‘Kitahonami’. Based on the criteria described above, 12,551 nucleotide polymorphisms were detected between these two varieties (Supplementary Table S2). Using the survey sequences, we localized these polymorphisms to each chromosome or chromosome arm. As shown in Table 1, the number of polymorphisms varied among the chromosomes and genomes. The number of polymorphisms on the D genome was approximately one-sixth and one-fifth of those on the B and A genomes, respectively. Relatively fewer polymorphisms were observed in the Group 4 chromosomes than in the other groups. The cumulative numbers of polymorphic sites according to the gene models are shown in Fig. 2. Most polymorphisms were found in genic regions, including both the 500-bp upstream and downstream regions (Fig. 2, Supplementary Table S2). In the intron regions, 5,220 polymorphisms were detected, meaning the intron regions carried the highest percentage (41.6%) of the total polymorphisms.

Table 1. Number of polymorphisms between ‘Hatsumochi’ and ‘Kitahonami’ in each wheat chromosome arm

Genome  Arm   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7   Total
A       S     293       409       137       160       150       287       490       5,293
A       L     718       651       529       218       513       292       446
B       S     174       519       1,197 a   123       241       428       151       6,316
B       L     648       929       -         398       723       512       273
D       S     62        143       23        13        16        63        64        942
D       L     56        126       86        17        146       75        52
Total         1,951     2,777     1,972     929       1,789     1,657     1,476     12,551

a Total number of short and long arms.

Figure 2. Cumulative number of polymorphic sites between ‘Hatsumochi’ and ‘Kitahonami’ in each genic region. The IWGSC gene models were used to define the genic regions.

3.2. Validation of polymorphisms by amplicon sequencing

Three hundred and ninety-six D genome-specific primers were designed using the primer-picking pipeline described above. The details regarding the primers are provided in Supplementary Table S3. Preliminary analysis of primers using gel electrophoresis of PCR products indicated that single PCR products with the expected fragment sizes were obtained using 380 of the 396 primer sets. Multiple products or a lack of bands were observed using the remaining primer sets. The PCR products from the 380 successful primer sets were mixed and sequenced using GS FLX plus. In total, 442,564 (average length 322.7 bp) and 640,147 (average length 327.2 bp) reads were obtained from the amplicons of ‘Hatsumochi’ and ‘Kitahonami’, respectively. The sequences derived from 312 markers were mapped to reference sequences without interference from off-target or homoeologous sequences (Table 2). Mixtures of homoeologous sequences were observed using 44 markers, which were classified according to the degree of the mixture as follows: low (<30%), medium (30–50%) and high (>50%). The alleles in markers with a low level of mixture could automatically be defined; in contrast, manual examination of the data was required to define alleles in markers with medium and high levels of mixture. The polymorphisms detected with 334 of the markers were consistent with those obtained using sequence capture (Supplementary Table S3). Thirty-six markers failed to validate the polymorphisms because of mixtures of homoeologous sequences, while 17 markers showed low read coverage. Nine markers did not show polymorphisms at the target sites.

Table 2. Classification of markers based on the mapping results of the amplicon sequencing

                                Validation by parents          Genotype of RILs
Class   Description             No. of markers   % in total    No. of markers   % in total
1       Genome specific         312              78.8          247              62.4
2       Mix off-target          6                1.5           11               2.8
3       Mix low (<30%)          16               4.0           34               8.6
4       Mix medium (30–50%)     21               5.3           43               10.9
5       Mix high (>50%)         7                1.8           9                2.3
6       Indel                   2                0.5           15               3.8
7       Other a                 11               2.8           24               6.1
8       Low or no reads         21               5.3           13               3.3
Total                           396                            396

a Markers with both mixtures of sequences and indels.

3.3. Genotyping by amplicon sequencing in 96 multiplexed samples

The ‘Hatsumochi’/‘Kitahonami’ RILs were genotyped using an Ion PGM (Thermo Fisher Scientific) according to the procedure described above. The variations in the number of reads per sample and per marker are shown in Fig. 3. The number of reads per sample was relatively stable, and the average read count was 34,873. However, the variation in the number of reads per marker was high. The difference between the most and least sequenced markers was more than 150-fold. The markers with large product sizes tended to show a low number of reads (data not shown). Based on mapping to the reference sequences, we classified markers into eight classes (Table 2). The alleles in markers in classes 1–3 could automatically be defined, while those in markers in classes 4–6 had to be defined manually. Class 7 indicates markers with mixtures of sequences and indels, and class 8 represents no or quite low reads. The genotypes of 96 samples using 359 markers were obtained, and missing values accounted for only 1.2% (392 of 33,746 data points).

Figure 3. Variations in the number of reads per sample (a) and per marker (b). Error bars in the lower figure indicate standard deviations among 96 samples.

3.4. Extension of D genome maps using the newly developed markers

By combining the data obtained using the newly developed markers (defined as TARC markers) with those obtained using DArT, 9K array and SSR, genotyping data for 3,956 markers were obtained (Supplementary Table S4). According to the grouping of redundant markers, 1,408 markers were considered non-redundant and were used for map construction using MapDisto software. After filtering by redundancy, the number of DArT and 9K markers was reduced to 42.9% and 25.9% of the initial number, respectively. In contrast, of the 359 TARC markers, 261 (72.7%) remained after filtering. Using the default settings of MapDisto, 32 linkage groups were extracted from the 1,399 markers. Nine markers were unassigned. Three linkage groups were not assigned to chromosomes because of the lack of sequence information for the markers. The linkage map covered 3,994.7 cM, with an average chromosome length of 189.6 cM, and the lengths of the individual chromosomes varied from 95.3 cM (6D) to 275.4 cM (3B). Chromosomes 1A, 2A, 3A, 5D, 7B and 7D consisted of two linkage groups, and chromosome 6D consisted of three linkage groups. Only maps of the D genome are illustrated in Fig. 4. The complete set of linkage maps is shown in Supplementary Fig. S2. The number of loci per chromosome varied from 29 (4D) to 112 (2B), with an average of 66 loci per chromosome. Marker density ranged from a minimum of 0.21 markers per cM for chromosome 4D to a maximum of 0.61 for chromosome 1A, and the overall marker density was 0.36 markers per cM. The D genome linkage maps were compared with previous maps without the TARC markers. The total length of the D genome increased from 878.4 cM to 1333.1 cM, and the average distance between markers changed from 1.45 to 2.09 cM. In comparison, the A genome map length was 1222.7 cM with 2.58 cM between markers on average and the B genome 1425.4 cM with 2.80 cM between markers. The TARC markers were distributed across the seven D genome chromosomes and significantly filled the previously observed gaps (Fig. 4). Furthermore, the TARC markers successfully extended the long arms of 1D, 2D, 4D and 6D and the short arms of 2D and 4D.

3.5. Detection of polymorphisms among the six varieties

Using the GS FLX plus, 8,253,381 reads were obtained from the six varieties. Of these reads, 5,473,010 reads were uniquely aligned to the IWGSC survey sequences. Using the criteria described above, 31,542 polymorphic site candidates were detected (Supplementary Table S5). The pair-wise comparisons among the six varieties are shown in Table 3. The number of polymorphisms among the varieties ranged from 9,491 (‘Tohoku224’ vs. ‘Kinuhime’) to 19,926 (‘Hatsumochi’ vs. ‘Yumechikara’) and averaged 15,498 sites. Based on the mapping of the results to the IWGSC survey sequences, 5,986 polymorphic sites were predicted to be on the D genome and, therefore, can be considered potential markers for the D genome (Supplementary Table S5).

4. Discussion

In allohexaploid wheat, marker development is hampered by a large genome size, a high proportion of highly repetitive sequence (>80%), and the presence of three different genomes in which corresponding genes share a high level of sequence similarity. Because of these characteristics of the wheat genome, nucleotide polymorphisms have been obtained primarily using transcript sequences[33,34]

[Figure 4: linkage maps of the D genome chromosomes; graphic not reproduced. Panels: 1D (179.1 cM), 2D (214.5 cM), 3D (229.9 cM), 4D (139.1 cM), 5D (215.8 cM), 6D (95.3 cM) and 7D (242.3 cM). Each panel gives inter-marker distances in cM and locus names (tarc, snp, wPt, SSR and gene-based loci such as pina-d1 and wx-d1); some loci are flagged with asterisks.]
wmc443 4.9 (1378) tarc0354 (444) tarc0128 (641) tarc0204 2.6 (975) wPt-731949 * 200 (976) tarc0259 1.1 7.0 (977) tarc0231 2.3 9.9 0.6 (978) tarc0262 0.6 (1379) tarc0333 13.3 (979) wPt-666937 1.1 (1380) wPt-5674 0.6 (642) tarc0176 0.6 (980) wPt-1197 0.6 (1381) wPt-665229 (981) tarc0238 1.2 (1382) wPt-663777 1.2 (445) tarc0124 * (982) tarc0270 (1383) tarc0357 1.9 1.7 11.5 2.3 (983) tarc0242 1.7 (1384) wPt-664155 (984) tarc0246 1.8 (1385) tarc0332 2.3 (985) tarc0244 (1386) wPt-664136 0.6 1.3 220 0.5 (643) tarc0180 (986) tarc0272 3.3 (1387) wPt-663958 2.3 (644) barc71 (987) 90K41284 (1388) wPt-663948 0.5 0.6 0.5 (645) tarc0172 1.1 (988) tarc0254 14.5 0.5 (646) wPt-732918 (989) tarc0260 0.6 (647) tarc0182 (990) snp700 1.1 1.7 0.5 (648) snp1847 0.5 (991) wPt-0596 2. (649) tarc0197 (992) snp6189 (1389) tarc0349 4 0.5 (650) wPt-731378 0.5 (993) tarc0250 7.2 (994) tarc0239 1.1 (995) wPt-667413 0.5 (1390) wPt-732048 1.8 0.5 (996) snp7071 240 (1391) snp2226 (997) snp6409 0.5 (998) tarc0271 5.6 (1392) tarc0348 ** Figure 4. Linkage maps of the D genome using the ‘Hatsumochi’/‘Kitahonami’ RILs. Newly developed markers are prefixed with ‘tarc’. Box on the left of each chromosome indicates the putative position of the centromere. Table 3. Number of polymorphic site candidates detected in pair-wise comparisons of six wheat varieties Kitahonami Yumechikara Tohoku224 Hatsumochi Kinuhime Shunyou Kitahonami — 15,638 16,934 16,800 16,670 15,774 Yumechikara — 18,526 19,926 19,216 18,219 Tohoku224 — 12,518 9,491 13,542 Hatsumochi — 11,561 14,008 Kinuhime — 13,661 Shunyou — 16,17 or restriction-site flanking sequences. Although these methods (Fig. 4), and we were able to successfully develop markers in regions have greatly contributed to increasing the number of markers, they that were not covered by commercially available marker platforms, do not provide a method of controlling the chromosomal locations including DArT and 9K arrays. 
Because of the large genome size of of the detected polymorphisms. Therefore, although the number of wheat (16 Gb), the extraction of reliable polymorphic sites supported polymorphisms is high, a significant bias occurs in the distribution of by sufficient read depth was believed to require a high-performance polymorphisms across the genome. Thus, targeted development of sequencer. However, in this study, using the GS FLX plus (Roche di- markers is required to saturate linkage maps. In this study, gene- agnostics), we obtained only 536 and 616 Mb sequences from the ge- enriched libraries were prepared using custom capture probes. To de- nomic DNA of ‘Hatsumochi’ and ‘Kitahonami’, respectively, sign the probes, we used positional information from sources such as indicating that less than 4% of the genome was sequenced in each the consensus linkage map (International Triticeae Mapping variety. Despite this low coverage, we detected 12,551 polymorphic 32 25 35 Initiative, ITMI), bin-mapped ESTs, PLUG markers and the sites with an average read depth of 5.4 and 5.9 in ‘Hatsumochi’ and barley physical map. Through this process, the map length in the D ‘Kitahonami’, respectively (Supplementary Table S2). These observa- genome increased dramatically from 878.4 cM to 1333.1 cM tions indicated that the SeqCap EZ reagent kit (Roche Diagnostics) Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 324 Genome-specific amplicon sequencing in common wheat enriched the target sites by approximately 160-fold compared with various genetic studies, such as association mapping and genomic whole genome analysis and greatly contributed to the effectiveness of selection in breeding programs. polymorphism detection. 
In reference to the gene models of the Because the existing SNP arrays are substantially deficient in the IWGSC survey sequences, most polymorphic sites were located number of D genome markers, we developed markers for the D ge- near genes, indicating that we successfully eliminated repetitive se- nome to demonstrate the effectiveness of our approach. The poly- quences and effectively enriched the targets. The elimination of repet- morphism rate of the D genome is lower than that of the other two itive elements is important for genome mapping because these genomes, which is attributed to the evolutionary history of the D ge- elements usually distort the mapping results. In this study, more than nome. According to recent studies, interploidy natural hybridization 40% of the polymorphic sites were in intron regions, and thus could and subsequent introgression played a significant role in the diversifi- not be found using transcriptome analyses. Because intron regions cation of common wheat, and therefore, the D genome had fewer have a higher level of polymorphisms than exon regions, enriched ge- opportunities to exchange genetic material than the A and B ge- nomic sequencing is a beneficial approach for developing gene- nomes. Although the polymorphic rate is low, comparable numbers related markers. Furthermore, based on SnpEff analysis, 60 and of important genes and QTLs on the D genome are described in the 1,630 polymorphisms were predicted to have high and moderate ef- catalog of gene symbols for wheat. Recent genome-wide associa- fects on the corresponding gene functions, respectively tion studies for pre-harvest sprouting using Chinese wheat land- 40 41 (Supplementary Table S2). Therefore, the strategy used in this study races and European winter wheats reported QTLs on 1D, 3D effectively identified polymorphisms that could potentially be related and 5D chromosomes. Additionally, resistant genes against wheat to agronomically important traits. 
yellow mosaic virus and soil-borne wheat mosaic virus were found 6,42 In allohexaploid wheat, the presence of three homoeologous ge- on chromosomes 2D and 5D, respectively. For the fine mapping nomes poses a challenge in SNP detection. Although a difference in of these agronomically important genes or QTLs, the number of degree is observed, many markers on the commercially available markers must be increased around the regions of interest. From the SNP arrays are affected by interference from the other two homoeol- polymorphism survey using six varieties, around 6,000 polymorphic ogous sequences. In this study, the genome-specific primers were sites were detected on the D genome, and some sites have been suc- highly beneficial for defining alleles. Approximately 80% of the cessfully used to develop markers (data not shown). Therefore, the marker loci were successfully genotyped using the locus-specific ap- polymorphic information obtained in this study is a useful resource proach when we validated the polymorphisms using the two parental for the further development of markers across the genome. varieties (Table 2). When we used the RIL populations, more than In this study, we proposed efficient strategies for the detection of 70% of the markers were grouped into classes 1–3, indicating that nucleotide polymorphisms among varieties of interest and the design these markers could be genotyped similarly to diploid species. Based of locus-specific primers to achieve robust high-throughput genotyp- on these results, our strategy of designing genome-specific primers ing. The IWGSC’s continuous effort to obtain the first reference se- was effective and demonstrated the importance of obtaining a priori quence of the spring wheat variety ‘Chinese Spring’ is opening the knowledge of the polymorphisms among genomes by comparing the post-genomic era (http://www.wheatgenome.org/ (5 February 2018, homoeologous sequences of interest. date last accessed)). 
In this era, the motivation for collecting genomic The variations in the read number among samples and markers resources using in-house materials is likely to increase. By comparing were investigated (Fig. 3). In this study, we did not equilibrate the the polymorphic sites found in this study with probe sequences in multiplex PCR samples from individual wells before mixing samples. publicly available SNP arrays, at least two-thirds of the total sites did Despite this simplification of the process, the read numbers among not show any similarity with those probes in BLASTn searches the samples were relatively uniform, except for one outlier sample (Supplementary Tables S2 and S5). The high percentage of new poly- that had an extremely low number of reads. Therefore, the protocol morphic sites indicates the importance of polymorphism surveys us- used in this study is beneficial for processing many samples. In con- ing materials of interest. In addition to the number of polymorphic trast, the read numbers varied among the markers, with the read sites, the polymorphic frequencies among materials are also impor- number ranging from nearly zero to more than 200 per sample. tant. Compared with the existing SNP arrays, the polymorphic sites Markers with longer amplicons tended to have fewer reads (data not among the six varieties in this study provided highly polymorphic shown); however, other factors, such as the annealing efficiencies of markers among Japanese materials, particularly among varieties the primers, also affected the read numbers. Because the multiplex from Central and Western Japan (data not shown). To date, the levels of the samples and markers were determined according to the germplasm used in genomic studies has been limited to the leading minimum number of reads required for each marker, a uniform dis- varieties in developed countries. However, for specific traits, such as tribution of read numbers among the markers is important. 
Further resistance to disease or abiotic stress, many sources of germplasm optimization of the multiplex PCR conditions will be beneficial in from around the world remain unexamined. Because the method de- minimizing missing genotype values. scribed in this study is less expensive, more flexible and more reliable Because amplicon sequencing primers are typically designed to than previous methods, this method is suitable for pan-genome stud- flank the site of interest, other polymorphisms in addition to the ies that must process many haplotypes. site of interest can sometimes be identified. When applying the method described in this paper to breeding or genetic analysis, the Data Availability detection of new polymorphisms that are independent of the target sites could provide additional haplotype information for the sam- All sequences analysed in the present study were deposited into the ples of interest, and such haplotype information substantially im- DDBJ/GenBank/MMBL database with accession numbers proves power in the detection of marker-trait associations. DRA006270. Sample indices are prefixed to sequence names as follows: Informative markers that contain more than two polymorphic sites ‘Kitahonami’, H7U3MUP; ‘Hatsumochi’, H7YSFDH; ‘Shunyou’, could reduce the marker number required for an investigation, IY4LCZI; ‘Kinuhime’, IZHNKMJ; ‘Tohoku224’, IZO1F6G; and thereby reducing the cost for genotyping, and help in performing ‘Yumechikara’, HKAWHI. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 G. Ishikawa et al. 325 novel two-enzyme genotyping-by-sequencing approach, PLoS One, 7, Acknowledgements e32253. This study was supported by a grant from the Ministry of Agriculture, 17. Kobayashi, F., Tanaka, T., Kanamori, H., Wu, J., Katayose, Y. and Forestry, and Fisheries of Japan (Genomics-based Technology for Agricultural Handa, H. 
2016, Characterization of a mini core collection of Japanese Improvement, NGB-1002 and NGB-1007). wheat varieties using single-nucleotide polymorphisms generated by genotyping-by-sequencing, Breed Sci., 66, 213–25. 18. Marcussen, T., Sandve, S. R., Heier, L., et al. 2014, Ancient hybridiza- Conflict of interest tions among the ancestral genomes of bread wheat, Science, 345, None declared. 19. Brenchley, R., Spannagl, M., Pfeifer, M., et al. 2012, Analysis of the bread wheat genome using whole-genome shotgun sequencing, Nature, 491, 705–10. Supplementary data 20. Henry, I. M., Nagalakshmi, U., Lieberman, M. C., et al. 2014, Efficient Supplementary data are available at DNARES online. genome-wide detection and cataloging of EMS-induced mutations using exome capture and next-generation sequencing, Plant Cell, 26, 1382–97. 21. Bernardo, A., Wang, S., St Amand, P. and Bai, G. 2015, Using next gener- References ation sequencing for multiplexed trait-linked markers in wheat, PLoS One, 10, e0143890. 1. Gupta, P. K., Rustgi, S. and Mir, R. R. 2008, Array-based 22. IWGSC 2014, A chromosome-based draft sequence of the hexaploid high-throughput DNA markers for crop improvement, Heredity, 101, bread wheat (Triticum aestivum) genome, Science, 345, 1251788. 5–18. 23. Mochida, K., Yoshida, T., Sakurai, T., Ogihara, Y. and Shinozaki, K. 2. Xu, Y. B., Lu, Y. L., Xie, C. X., Gao, S. B., Wan, J. M. and Prasanna, B. 2009, TriFLDB: a database of clustered full-length coding sequences from M. 2012, Whole-genome strategies for marker-assisted plant breeding, Triticeae with applications to comparative grass genomics, Plant Physiol., Mol. Breed., 29, 833–54. 150, 1135–46. 3. Cavanagh, C. R., Chao, S., Wang, S., et al. 2013, Genome-wide compara- 24. Ishikawa, G., Yonemaru, J., Saito, M. and Nakamura, T. 
2007, PCR-based tive diversity uncovers multiple targets of selection for improvement in landmark unique gene (PLUG) markers effectively assign homoeologous hexaploid wheat landraces and cultivars, Proc. Natl. Acad. Sci. USA., wheat genes to A, B and D genomes, BMC Genomics, 8,135. 110, 8057–62. 25. Qi, L. L., Echalier, B., Chao, S., et al. 2004, A chromosome bin map of 4. Wang, S., Wong, D., Forrest, K., et al. 2014, Characterization of poly- 16,000 expressed sequence tag loci and distribution of genes among the ploid wheat genomic diversity using a high-density 90,000 single nucleo- three genomes of polyploid wheat, Genetics, 168, 701–12. tide polymorphism array, Plant Biotechnol. J., 12, 787–96. 26. Mayer, K. F., Waugh, R., Brown, J. W., et al. 2012, A physical, genetic 5. Iehisa, J. C., Ohno, R., Kimura, T., et al. 2014, A high-density genetic and functional sequence assembly of the barley genome, Nature, 491, map with array-based markers facilitates structural and quantitative trait 711–6. locus analyses of the common wheat genome, DNA Res., 21, 555–67. 27. Cingolani, P., Platts, A., Wang le, L., et al. 2012, A program for annotat- 6. Liu, S., Yang, X., Zhang, D., Bai, G., Chao, S. and Bockus, W. 2014, ing and predicting the effects of single nucleotide polymorphisms, SnpEff: Genome-wide association analysis identified SNPs closely linked to a gene SNPs in the genome of Drosophila melanogaster strain w1118; iso, Fly, 6, resistant to soil-borne wheat mosaic virus, Theor. Appl. Genet., 127, 80–92. 1039–47. 28. Altschul, S. F., Madden, T. L., Schaffer, A. A., et al. 1997, Gapped 7. Naruoka, Y., Garland-Campbell, K. A. and Carter, A. H. 2015, BLAST and PSI-BLAST: a new generation of protein database search pro- Genome-wide association mapping for stripe rust (Puccinia striiformis F. grams, Nucleic Acids Res., 25, 3389–402. sp. tritici) in US Pacific Northwest winter wheat (Triticum aestivum L.), 29. Katoh, K. and Toh, H. 
2008, Recent developments in the MAFFT multi- Theor. Appl. Genet., 128, 1083–101. ple sequence alignment program, Brief Bioinform., 9, 286–98. 8. Wu, Q. H., Chen, Y. X., Zhou, S. H., et al. 2015, High-density genetic 30. Liu, Y., He, Z., Appels, R. and Xia, X. 2012, Functional markers in linkage map construction and QTL mapping of grain shape and size in wheat: current status and future prospects, Theor. Appl. Genet., 125, the wheat population Yanda1817 x Beinong6, PLoS One, 10, e0118144. 1–10. 9. Bulli, P., Zhang, J., Chao, S., Chen, X. and Pumphrey, M. 2016, Genetic 31. Lorieux, M. 2012, MapDisto: fast and efficient computation of genetic architecture of resistance to stripe rust in a global winter wheat germ- linkage maps, Mol. Breed., 30, 1231–5. plasm collection, G3-Genes Genomes Genet., 6, 2237–53. 32. Sorrells, M. E., Gustafson, J. P., Somers, D., et al. 2011, Reconstruction 10. Lin, M., Zhang, D., Liu, S., et al. 2016, Genome-wide association analysis of the synthetic W7984 x Opata M85 wheat reference population, on pre-harvest sprouting resistance and grain color in U.S. winter wheat, Genome, 54, 875–82. BMC Genomics, 17, 794. 33. Allen, A. M., Barker, G. L., Berry, S. T., et al. 2011, Transcript-specific, 11. Yu, L. X., Chao, S., Singh, R. P. and Sorrells, M. E. 2017, Identification single-nucleotide polymorphism discovery and linkage analysis in hexa- and validation of single nucleotide polymorphic markers linked to Ug99 ploid bread wheat (Triticum aestivum L.), Plant Biotechnol. J., 9, stem rust resistance in spring wheat, PLoS One, 12, e0171963. 1086–99. 12. Chao, S. M., Dubcovsky, J., Dvorak, J., et al. 2010, Population- and 34. Akhunov, E. D., Akhunova, A. R., Anderson, O. D., et al. 2010, genome-specific patterns of linkage disequilibrium and SNP variation in Nucleotide diversity maps reveal variation in diversity among wheat ge- spring and winter wheat (Triticum aestivum L.), BMC Genomics, 11,727. nomes and chromosomes, BMC Genomics, 11, 702. 13. 
Ishikawa, G., Nakamura, K., Ito, H., et al. 2014, Association mapping 35. Ishikawa, G., Nakamura, T., Ashida, T., et al. 2009, Localization of an- and validation of QTLs for flour yield in the soft winter wheat variety chor loci representing five hundred annotated rice genes to wheat chromo- Kitahonami, PLoS One, 9, e111337. somes using PLUG markers, Theor. Appl. Genet., 118, 499–514. 14. Zhai, H., Feng, Z., Li, J., et al. 2016, QTL analysis of spike morphological 36. Akhunov, E., Nicolet, C. and Dvorak, J. 2009, Single nucleotide polymor- traits and plant height in winter wheat (Triticum aestivum L.) using a phism genotyping in polyploid wheat with the Illumina GoldenGate assay, high-density SNP and SSR-based linkage map, Front. Plant Sci., 7, 1617. Theor. Appl. Genet., 119, 507–17. 15. Huang, B. E., George, A. W., Forrest, K. L., et al. 2012, A multiparent ad- 37. Lu, Y. L., Xu, J., Yuan, Z. M., et al. 2012, Comparative LD mapping us- vanced generation inter-cross population for genetic analysis in wheat, ing single SNPs and haplotypes identifies QTL for plant height and bio- Plant Biotechnol. J., 10, 826–39. mass as secondary traits of drought tolerance in maize, Mol. Breeding, 30, 16. Poland, J. A., Brown, P. J., Sorrells, M. E. and Jannink, J. L. 2012, 407–18. Development of high-density genetic maps for barley and wheat using a Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 326 Genome-specific amplicon sequencing in common wheat 38. Matsuoka, Y. 2011, Evolution of polyploid Triticum wheats under culti- 41. Albrecht, T., Oberforster, M., Kempf, H., et al. 2015, Genome-wide asso- vation: the role of domestication, natural hybridization and allopolyploid ciation mapping of preharvest sprouting resistance in a diversity panel of speciation in their diversification, Plant Cell Physiol., 52, 750–64. European winter wheats, J. Appl. Genet., 56, 277–85. 39. McIntosh, R. 
A., Yamazaki, Y., Dubcovsky, J., et al. 2013, Catalogue of 42. Nishio, Z., Kojima, H., Hayata, A., et al. 2010, Mapping a gene confer- gene symbols for wheat. https://shigen.nig.ac.jp/wheat/komugi/genes/ ring resistance to Wheat yellow mosaic virus in European winter wheat symbolClassList.jsp (5 February 2018, date last accessed) cultivar ‘Ibis’ (Triticum aestivum L.), Euphytica, 176, 223–9. 40. Zhou, Y., Tang, H., Cheng, M. P., et al. 2017, Genome-wide association 43. Mayer, K. F., Taudien, S., Martis, M., et al. 2009, Gene content and study for pre-harvest sprouting resistance in a large germplasm collection virtual gene order of barley chromosome 1H, Plant Physiol., 151, of Chinese wheat landraces, Front. Plant Sci., 8, 401. 496–505. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png DNA Research Oxford University Press
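Note on the enrichment estimate in the Discussion. The Discussion reports that less than 4% of the genome was sequenced per variety and that capture enriched the targets roughly 160-fold, but it does not spell out the arithmetic. The sketch below is one plausible reading, not necessarily the authors' calculation: it assumes that enrichment is the observed mean read depth at polymorphic sites divided by the depth expected if the same amount of sequence were spread uniformly over the 16 Gb genome. The function names are illustrative only.

```python
# Back-of-the-envelope check of the "<4% of the genome" and "~160-fold" figures.
# Assumption (not stated in the paper): fold enrichment = observed mean depth at
# polymorphic sites / depth expected from uniform whole-genome sequencing.

GENOME_SIZE_MB = 16_000  # wheat genome size (16 Gb), as given in the text

# variety -> (sequence obtained in Mb, mean read depth at polymorphic sites)
SAMPLES = {
    "Hatsumochi": (536, 5.4),
    "Kitahonami": (616, 5.9),
}


def genome_fraction(sequenced_mb, genome_mb=GENOME_SIZE_MB):
    """Sequenced bases as a fraction of genome size.

    Under uniform whole-genome sequencing this is also the expected depth.
    """
    return sequenced_mb / genome_mb


def fold_enrichment(sequenced_mb, observed_depth, genome_mb=GENOME_SIZE_MB):
    """Observed depth at target sites relative to the uniform expectation."""
    return observed_depth / genome_fraction(sequenced_mb, genome_mb)


for variety, (mb, depth) in SAMPLES.items():
    frac = genome_fraction(mb)
    fold = fold_enrichment(mb, depth)
    print(f"{variety}: {frac:.2%} of genome, about {fold:.0f}-fold enrichment")
```

Under this assumption the two varieties give about 161-fold and 153-fold, which is consistent with the "approximately 160-fold" quoted in the text. A different definition of enrichment (for example, on-target read fraction relative to target size) would give a different number.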

An efficient approach for the development of genome-specific markers in allohexaploid wheat (Triticum aestivum L.) and its application in the construction of high-density linkage maps of the D genome

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
ISSN
1340-2838
eISSN
1756-1663
D.O.I.
10.1093/dnares/dsy004

Using the syntenic relationships among rice, barley against the survey sequences, including SNPs and Indels, were sup- and wheat, the selected sequences were estimated to be evenly dis- ported with at least three reads in each variety. From them, polymor- tributed across the seven Triticeae chromosomes (Supplementary phic sites between ‘Hatsumochi’ and ‘Kitahonami’ were extracted. Table S1). All sequences were compared with one another, and only The predicted effects of the polymorphisms were analysed using one representative homoeologous copy of each gene was selected for SnpEff with IWGSC ver. 2.2 annotation data (https://wheat-urgi. probe design. DNA capture of genomic DNA of ‘Hatsumochi’ and versailles.inra.fr/Seq-Repository/Genes-annotations (5 February ‘Kitahonami’ was performed with a SeqCap EZ Developer Kit 2018, date last accessed)). (Roche Diagnostics, Basel, Switzerland) following the manufacturer’s protocol. In total, 1,000 ng of each DNA sample was sheared using a 2.4. Design of genome-specific primers Covaris LE220 (Covaris Inc., Woburn, MA, USA) focused ultrasoni- The flanking sequences of target polymorphism sites were obtained cator to fragments that averaged 600 bp. An NEBNext Quick DNA and blasted against the IWGSC survey sequences using PSI- Library Prep Master Mix set for 454 (New England Biolabs Inc., BLAST, and three homoeologous sequences in terms of their chro- Beverly, MA, USA) was used to construct two genomic libraries. mosomal locations were identified. The three sequences were aligned These libraries were amplified and fractionated from 500 to 700 bp by MAFFT ver. 6.864 with a default parameters and imported into before hybridization using a SeqCap EZ reagent Kit. To avoid an in-house Java pipeline. 
This pipeline requires an alignment file of Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 320 Genome-specific amplicon sequencing in common wheat the homoeologous sequences in a target region. Based on the poly- script. First, the CS1 and CS2 tail sequences were removed. Second, morphisms among the genomes, the pipeline automatically detects the sequences were aligned to reference sequences using a BLAST- genome-specific primer pairs of 18–22 nucleotides at an amplicon like alignment tool. The reference sequences consisted of internal se- length of approximately 300 bp. The primers contained nucleotides quences in the primers of each marker. Third, based on the alignment specific to the target sequence at the first and/or second positions results of each sample-marker combination, the total number of from their 3 end. The forward and reverse primers for the first PCR aligned reads containing A, C, G and T bases, null alleles or missing were tailed with the common sequence tags CS1 (5 - values (NaN) and nucleotide deletions (“–”) in the base position of 0 0 ACACTGACGACATGGTTCTACA-3 ) and CS2 (5 -TACGGTAGC interest were counted. Fourth, the alleles were defined according to AGAGACTTGGTCT-3 ), respectively, to allow for the addition of the bases with the highest read counts. The minimum read count was adapters during the second round of PCR according to the protocol set to 5. Heterozygous site (H) was called when two alleles each had of the Access Array for Ion Torrent PGM Sequencing System more than 40% of the total reads. Finally, the results of the genotype (Fluidigm Corporation, San Francisco, CA, USA). calls were summarized and exported into a tabular form. 2.5. Library preparation and amplicon sequencing 2.7. 
Construction of the linkage map The reaction mix for the first PCR contained a total volume of 10 mL The wheat PstI(TaqI) v.3 DArT array (Diversity Array Technology consisting of 1 Multiplex PCR Master Mix (Qiagen), a primer mix Pty Ltd., Bruce, Australia) and 9K SNP array were used to genotype (described below) and 20–50 ng of template DNA. The primer mix 94 RILs. Publicly available SSR markers (GrainGenes 2.0) and func- contained 48 sets of randomly selected locus-specific primers, with tional markers of the Wx-A1, Wx-D1 and Pina-D1 genes were CS1 or CS2 tails attached. The first PCR program consisted of a de- also used. The data obtained using DArT, 9K, SSR and the newly de- naturation step at 95 C for 15 min, followed by 32 cycles of 30 s at veloped markers (TARC) were merged, and polymorphic markers 94 C, 90 s at 60 C and 1 min at 72 C, and a final extension for between parents were extracted. Poor-quality genotypes and geno- 10 min at 72 C. The product from the first PCR of each sample was types with more than 10% missing data or segregation distortion diluted 100-fold with sterilized distilled water, and 2 mL of the di- were removed. Because redundant markers are completely correlated luted product was used as the template for the second PCR. To per- or identical and cannot provide additional information, only one of form bidirectional amplicon tagging in the second PCR, a forward a redundant set of markers was used in map construction. MapDisto fusion primer containing Ion A adapter, barcode, and the CS1 se- 1.7.5 was used to identify linkage groups. 
Mapped markers corre- 3,15,32 quence and a reverse fusion primer containing Ion P1 adapter and sponding to those on the hexaploid wheat consensus map were the CS2 sequence were used with a portion of the template, while a used as anchors to assign each linkage group to a particular chromo- second portion was amplified with an A-adaptor–barcode–CS2 and some and to orient linkage groups on short and long chromosome P1 adaptor–CS1 primer combination. arms. The 10-mL second PCR mix contained 1 Multiplex PCR Master Mix (Qiagen) and 400 nM forward and reverse fusion primers, and 2.8. Polymorphism survey using multiple genotypes PCR was run using the following profile: 15 min at 95 C, followed To increase the resources for nucleotide polymorphism detection, by 15 cycles of 30 s at 94 C, 90 s at 60 C and 1 min at 72 C, and a four additional varieties, namely ‘Shunyou’, ‘Tohoku224’, final extension step of 10 min at 72 C. All second PCR products ‘Kinuhime’ and ‘Yumechikara’, were subjected to capture sequenc- were mixed in equivalent volumes (2 mL of PCR product per sample). ing. The first three varieties are adopted to the Kanto and Tohoku re- The pooled product was purified using Agencourt AMPure XP gions (Central and Northeastern Japan), while the ‘Yumechikara’ Reagent beads (Beckman-Coulter, Fullerton, CA, USA) as follows: variety is adopted to the Hokkaido region. The procedures used for 12 mL of pooled products, 24 mL of TE buffer and 36 mL of well- DNA extraction, library preparation, sequencing by GS FLX plus mixed AMPure XP beads were vortexed. After a 10-min incubation (Roche Diagnostics) and mapping to the reference sequences were at room temperature, the sample was placed onto a magnetic separa- identical to those described above. To identify highly polymorphic tor for 1 min, and the supernatant was discarded. The beads with sites, we performed pair-wise comparisons among the six varieties sample attached were washed twice with 180 mL of freshly prepared used. 
The detected polymorphisms, including SNPs and Indels, were 70% ethanol. Finally, the purified PCR products were suspended in supported with at least one read as either “Reference” or “SNP” in a 40 mL of low TE buffer. The quality of the amplicon library was as- given variety. If reads supported both “Reference” and “SNP”, we sessed using an Agilent 2100 Bioanalyzer, and a high sensitivity kit identified that variety as heterozygous at that site. (Agilent Technologies, Santa Clara, CA, USA) was used to define the region covering all PCR library peaks (300–450 bp). The purified li- brary was quantified using a Qubit dsDNA HS assay kit (Thermo 3. Results Fisher Scientific), with dilution to a concentration of 5 pM. Sequencing was performed using an Ion Torrent PGM system with 3.1. Detection of polymorphisms between an Ion PGM 400 sequencing kit and 318 chips (Thermo Fisher ‘Hatsumochi’ and ‘Kitahonami’ Scientific). A schematic diagram of the experimental procedure is Using the next-generation sequencer GS FLX plus, 1,114,867 shown in Supplementary Fig. S1. (536,055,413 bp) and 1,304,168 (615,634,987 bp) reads were ob- tained from the enriched genomic DNA of ‘Hatsumochi’ and 2.6. Data processing and genotype calling ‘Kitahonami’, respectively. The average lengths of the sequences The removal of the Ion Torrent sequencing adaptor sites and demul- were 481 bp for ‘Hatsumochi’ and 472 bp for ‘Kitahonami’. Based tiplexing of the barcodes to separate the different samples were on the criteria described above, 12,551 nucleotide polymorphisms automatically performed by Torrent Suite ver. 5 (Thermo Fisher were detected between these two varieties (Supplementary Table S2). Scientific). Further analysis was conducted using a custom Java Using the survey sequences, we localized these polymorphisms to Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 G. Ishikawa et al. 321 Table 1. 
Number of polymorphisms between ‘Hatsumochi’ and ‘Kitahonami’ in each wheat chromosome arm Group Genome Arm 1 2 3 4 5 6 7 Total A S 293 409 137 160 150 287 490 5,293 L 718 651 529 218 513 292 446 B S 174 519 1,197 123 241 428 151 6,316 L 648 929 398 723 512 273 D S 62 143 23 13 16 63 64 942 Figure 2. Cumulative number of polymorphic sites between ‘Hatsumochi’ L 56 126 86 17 146 75 52 and ‘Kitahonami’ in each genic region. The IWGSC gene models were used Total 1,951 2,777 1,972 929 1,789 1,657 1,476 12,551 to define the genic regions. Total number of short and long arms. each chromosome or chromosome arm. As shown in Table 1, the number of polymorphisms varied among the chromosomes and ge- per maker are shown in Fig. 3. The number of reads per sample was nomes. The number of polymorphisms on the D genome were ap- relatively stable, and the average read count was 34,873. However, proximately one-sixth and one-fifth of those on the B and A the variation in the number of reads per marker was high. The differ- genomes, respectively. Relatively fewer polymorphisms were ob- ence between the most and least sequenced markers was more than served in the Group 4 chromosomes than in the other groups. The 150-fold. The markers with large product sizes tended to show a low cumulative numbers of polymorphic sites according to the gene mod- number of reads (data not shown). Based on mapping to the refer- els are shown in Fig. 2. Most polymorphisms were found in genic re- ence sequences, we classified markers into eight classes (Table 2). gions, including both the 500-bp upstream and downstream regions The alleles in markers in classes 1–3 could automatically be defined, (Fig. 2, Supplementary Table S2). In the intron regions, 5,220 poly- while those in markers in classes 4–6 had to be defined manually. 
morphisms were detected, meaning the intron regions carried the Class 7 indicates markers with mixtures of sequences and indels, and highest percentage (41.6%) of the total polymorphisms. class 8 represents no or quite low reads. The genotypes of 96 samples using 359 markers were obtained, and missing values accounted for 3.2. Validation of polymorphisms by amplicon only 1.2% (392 of 33,746 data points). sequencing Three hundred and ninety-six D genome-specific primers were de- 3.4. Extension of D genome maps using the newly signed using the primer-picking pipeline described above. The details developed markers regarding the primers are provided in Supplementary Table S3. By combining the data obtained using the newly developed markers Preliminary analysis of primers using gel electrophoresis of PCR (defined as TARC markers) with those obtained using DArT, 9K ar- products indicated that single PCR products with the expected frag- ray and SSR, genotyping data for 3,956 markers was obtained ment sizes were obtained using 380 of the 396 primer sets. Multiple (Supplementary Table S4). According to the grouping of redundant products or a lack of bands were observed using the remaining pri- markers, 1,408 markers were considered non-redundant and were mer sets. The PCR products from the 380 successful primer sets were used for map construction using MapDisto software. After filtering mixed and sequenced using GS FLX plus. In total, 442,564 (average by redundancy, the number of DArT and 9K markers was reduced length 322.7 bp) and 640,147 (average length 327.2 bp) reads were to 42.9% and 25.9% of the initial number, respectively. In contrast, obtained from the amplicons of ‘Hatsumochi’ and ‘Kitahonami’, re- of the 359 TARC markers, 261 (72.7%) remained after filtering. spectively. 
The sequences derived from 312 markers were mapped to Using the default settings of MapDisto, 32 linkage groups were ex- reference sequences without interference from off-target or homoeol- tracted from the 1,399 markers. Nine markers were unassigned. ogous sequences (Table 2). Mixtures of homoeologous sequences Three linkage groups were not assigned to chromosomes because of were observed using 44 markers, which were classified according to the lack of sequence information for the markers. The linkage map the degree of the mixture as follows: low (<30%), medium (30– covered 3,994.7 cM, with an average chromosome length of 189.6 50%) and high (>50%). The alleles in markers with a low level of cM, and the lengths of the individual chromosomes varied from 95.3 mixture could automatically be defined; in contrast, manual exami- cM (6D) to 275.4 cM (3B). Chromosomes 1A, 2A, 3A, 5D, 7B and nation of the data was required to define alleles in markers with me- 7D consisted of two linkage groups, and chromosome 6D consisted dium and high levels of mixture. The polymorphisms detected with of three linkage groups. Only maps of the D genome are illustrated 334 of the markers were consistent with those obtained using se- in Fig. 4. The complete set of linkage maps is shown in quence capture (Supplementary Table S3). Thirty-six markers failed Supplementary Fig. S2. The number of loci per chromosome varied to validate the polymorphisms because of mixtures of homoeologous from 29 (4D) to 112 (2B), with an average of 66 loci per chromo- sequences, while 17 markers showed low read coverage. Nine some. Using a maximum of 0.61 for chromosome 1A and a mini- markers did not show polymorphisms at the target sites. mum of 0.21 for chromosome 4D, the overall marker density was 0.36 markers per cM. The D genome linkage maps were compared 3.3. Genotyping by amplicon sequencing in 96 with previous maps without the TARC markers. 
The total length of multiplexed samples the D genome increased from 878.4 cM to 1333.1 cM, and the aver- The ‘Hatsumochi’/‘Kitahonami’ RILs were genotyped using an Ion age distance between markers changed from 1.45 to 2.09 cM. In PGM (Thermo Fisher Scientific) according to the procedure de- comparison, the A genome map length was 1222.7 cM with 2.58 cM scribed above. The variations in the number of reads per sample and between markers on average and the B genome 1425.4 cM with Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 322 Genome-specific amplicon sequencing in common wheat Table 2. Classification of markers based on the mapping results of the amplicon sequencing Validation by parents Genotype of RILs Class Description No. of markers % in total No. of markers % in total 1 Genome specific 312 78.8 247 62.4 2 Mix off-target 6 1.5 11 2.8 3 Mix low (<30%) 16 4.0 34 8.6 4 Mix medium (30–50%) 21 5.3 43 10.9 5 Mix high (>50%) 7 1.8 9 2.3 6 Indel 2 0.5 15 3.8 7 Other 11 2.8 24 6.1 8 Low or no reads 21 5.3 13 3.3 Total 396 396 Markers with both mixtures of sequences and indels. Figure 3. Variations in the number of reads per sample (a) and per marker (b). Error bars in the lower figure indicate standard deviations among 96 samples. 2.80 cM between markers. The TARC markers were distributed varieties ranged from 9,491 (‘Tohoku224’ vs. ‘Kinuhime’) to 19,926 across the seven wheat chromosomes and significantly filled the pre- (‘Hatsumochi’ vs. ‘Yumechikara’) and averaged 15,498 sites. Based viously observed gaps (Fig. 4). Furthermore, the TARC markers suc- on the mapping of the results to the IWGSC survey sequences, 5,986 cessfully extended the long arms of 1D, 2D, 4D and 6D and the polymorphic sites were predicted to be on the D genome and, there- short arms of 2D and 4D. fore, can be considered potential markers for the D genome (Supplementary Table S5). 3.5. 
Detection of polymorphisms among the six 4. Discussion varieties Using the GS FLX plus, 8,253,381 reads were obtained from the six In allohexaploid wheat, marker development is hampered by a large varieties. Of these reads, 5,473,010 reads were uniquely aligned to genome size, a high proportion of highly repetitive sequence the IWGSC survey sequences. Using the criteria described above, (>80%), and the presence of three different genomes in which corre- 31,542 polymorphic site candidates were detected (Supplementary sponding genes share a high level of sequence similarity. Because of Table S5). The pair-wise comparisons among the six varieties are these characteristics of the wheat genome, nucleotide polymorphisms 33,34 shown in Table 3. The number of polymorphisms among the have been obtained primarily using transcript sequences Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/3/317/4898127 by Ed 'DeepDyve' Gillespie user on 26 June 2018 G. Ishikawa et al. 323 5D 6D 7D 4D 1D 2D 3D (215.8cM) (95.3cM) (242.3cM) (139.1cM) (179.1cM) (214.5cM) (229.9cM) cM cM Loci cM Loci cM Loci cM Loci cM Loci cM Loci cM Loci (585) wPt-666738 * (1317) tarc0393 0 0.6 (136) tarc0075 (378) tarc0147 (927) pina-d1 (1143) snp4552 1.7 0.5 (744) tarc0025 (1318) wPt-1080 1.1 0.5 (137) snp4716 3.5 (379) tarc0167 3.5 (1319) wPt-664368 0.5 (138) wmc336 (380) tarc0170 (745) tarc0029 7.6 7.7 0.6 2.3 (1320) wPt-743708 (139) tarc0082 12.4 0.6 1.1 (746) tarc0018 (1321) tarc0373 6.8 0.5 (928) tarc0280 0.6 3.5 (140) snp5372 1.1 (1144) tarc0320 (1322) tarc0395 (141) snp7797 (929) cfd18a (1145) tarc0318 1.7 0.5 0.5 (1323) tarc0359 (381) snp989 (586) wPt-742238 (1146) tarc0324 0.6 2.9 (142) tarc0083 0.5 (1324) tarc0396 3.5 1.1 1.1 (143) tarc0076 3.7 15.2 11.5 1.1 (1147) wPt-664719 1.1 (382) snp760 (1325) wPt-744889 (144) tarc0066 1.2 (587) wPt-1630 * (1148) tarc0312 0.5 0.6 (383) snp927 0.6 (1326) wPt-663918 (588) snp8164 * 1.1 0.6 (145) tarc0070 3.1 (1149) wPt-3879 6.7 (1327) 
wPt-743999 20 (146) tarc0069 (589) wPt-667495 (930) gwm190 4.8 2.9 (1328) snp3747 3.3 1.1 (747) snp3798 0.6 6.8 (147) tarc0081 (384) wPt-733227 6.1 (1329) wPt-744388 (590) wPt-733267 (748) tarc0021 0.6 0.5 (148) wPt-664824 2.9 2.5 1.1 0.6 (1150) tarc0316 (1330) tarc0362 (149) tarc0064 * (385) tarc0158 (591) wPt-669255 * 2.2 (749) tarc0019 (931) wPt-4295 (1151) tarc0321 0.5 0.5 0.6 0.6 (1331) tarc0394 (592) wPt-666676 * (750) snp6277 4.1 14.3 0.5 (150) tarc0074 0.6 0.5 5.6 (1152) tarc0314 (751) tarc0028 (1332) tarc0366 (151) wPt-666174 9.9 (593) wPt-740957 1.7 1.1 (932) snp3470 * 0.5 0.6 (1153) tarc0330 (594) wPt-741202 * 1.1 (752) tarc0022 (933) wPt-3931 * 0.5 (152) wPt-732102 0.5 0.5 2.9 (595) tarc0212 * (753) tarc0012 (1154) tarc0327 1.1 (153) wPt-9664 0.5 1.1 (934) snp8360 (386) tarc0140 2.3 12.4 (154) tarc0071 0.5 (596) wPt-730651 0.6 (754) wPt-5809 (1155) tarc0323 0.5 0.5 1.2 (387) snp4012 (597) wPt-1516 * (755) tarc0002 8.2 (155) wPt-671990 0.5 0.6 14.3 4.8 (388) wPt-666987 ** (756) tarc0014 * 1.7 (156) 90K12311 1.9 (598) wPt-742220 * 1.1 40 11.0 0.5 (1156) tarc0302 0.5 (389) tarc0160 (599) wPt-740629 2.8 (757) cfd23 (1333) wPt-671748 6.7 (157) wPt-731920 2.3 (1157) tarc0289 0.5 (390) snp7273 (600) wPt-740710 (758) tarc0011 3.5 3.7 1.7 (158) wPt-665814 0.5 (391) snp6452 0.5 (1158) tarc0297 (1334) tarc0371 (159) tarc0058 4.7 (759) tarc0003 (935) tarc0284 13.3 2.3 0.5 (1159) tarc0296 (760) tarc0005 5.6 11.4 (936) tarc0285 (1160) tarc0307 9.0 1.1 5.4 (937) tarc0282 1.1 (1335) wmc698 (938) tarc0286 (1161) tarc0287 19.6 (761) wmc331 1.1 1.7 (160) cfd72 (392) tarc0162 (1162) tarc0308 (939) tarc0281 2.3 7.1 5.5 1.7 (1163) snp619 (940) tarc0261 3.5 9.0 (762) tarc0004 (941) tarc0229 4.1 (1336) wx-d1 9.7 3.6 (1164) tarc0298 4.2 (942) tarc0253 1.1 2.4 60 (763) tarc0008 (943) tarc0233 (1165) wPt-743699 0.5 (393) snp209 (601) tarc0221 (161) tarc0035 4.8 11.7 2.3 (394) tarc0145 2.9 (162) snp5698 1.1 1.7 (602) tarc0226 7.1 (944) snp3429 1.1 (395) tarc0149 2.3 (163) gwm458 
(603) tarc0216 2.9 3.0 (396) tarc0138 0.6 (764) wPt-667477 (1337) tarc0377 (164) tarc0043 (397) tarc0142 (604) wPt-6066 1.1 0.6 (605) wPt-741446 3.5 (398) tarc0156 3.1 (1166) tarc0295 (399) tarc0155 * (606) tarc0218 1.1 3.8 (607) tarc0217 19.6 4.1 (400) tarc0157 1.1 15.7 (608) wPt-740930 1.1 (401) tarc0139 19.0 (402) wPt-8330 22.1 1.1 0.6 (403) tarc0148 17.6 12.8 (404) wPt-9963 1.2 (165) tarc0049 0.5 (945) tarc0269 0.5 (405) tarc0154 2.9 3.6 (946) tarc0243 (166) tarc0050 1.1 (406) tarc0099 (765) tarc0015 (609) tarc0223 (947) tarc0278 2.9 (407) snp2961 0.6 1.1 2.9 (1167) tarc0292 * (1338) tarc0386 (167) tarc0048 2.9 (610) tarc0228 1.1 1.7 1.1 (408) tarc0127 (948) snp6439 2.9 (168) tarc0040 (611) tarc0214 1.7 (1339) snp1373 1.1 (409) tarc0103 2.9 (949) tarc0230 0.5 (1168) snp6939 0.5 6.1 (169) tarc0038 (612) wPt-732092 14.7 4.1 (1169) tarc0300 1.1 0.6 (410) tarc0086 0.6 1.1 (170) tarc0052 (613) tarc0220 (1170) wPt-731816 3.9 (411) wPt-665342 1.7 (950) tarc0276 0.5 (1340) tarc0372 (412) wPt-2644 (614) tarc0227 2.8 (1171) tarc0304 2.9 0.6 2.2 6.2 (1341) tarc0381 100 11.5 (615) tarc0219 (1172) snp1925 1.7 3.2 (413) wPt-664520 2.2 (766) cfd84 0.5 (1342) tarc0392 2.9 (414) tarc0133 2.2 (616) tnac1248 (951) gwm174 2.3 (1173) snp4056 1.7 (1343) snp1247 (617) tarc0210 * (1174) tarc0294 0.5 10.7 (415) tarc0125 0.5 1.1 (171) wPt-734081 (618) snp7468 * (1344) tarc0387 (416) tarc0087 * 0.5 3.9 (172) tarc0056 1.7 10.0 (1345) tarc0388 ** 2.9 (417) snp4789 * (619) tarc0200 17.1 (173) wPt-732556 1.1 0.5 0.6 (620) tarc0175 * (418) tarc0117 0.5 6.8 0.5 (174) tarc0045 3.5 21.9 (419) tarc0110 ** 0.5 (621) wPt-732185 (952) snp1681 (175) snp3547 1.7 4.1 (622) tarc0187 0.5 (1346) tarc0378 4.7 2.8 (953) tarc0245 (176) tarc0033 (623) tarc0195 3.5 (1347) tarc0368 1.1 (1175) tarc0299 4.3 (420) wPt-666518 (954) tarc0237 0.5 1.1 (624) tarc0209 1.1 (1176) tarc0301 0.6 (1348) tarc0380 120 0.6 (421) tarc0116 (625) tarc0184 4.2 5.4 (1349) tarc0390 (422) tarc0136 6.1 4.8 (626) tarc0171 * 1.1 (767) 
wPt-1347 (955) snp4274 8.2 (627) tarc0183 (423) cfd233 9.2 (768) wPt-666310 21.7 1.1 1.7 (769) wPt-669158 (424) wPt-2544 2.3 9.1 (1350) tarc0379 0.6 (628) tarc0211 (770) tarc0009 2.8 (629) tarc0193 0.5 (1351) tarc0370 10.4 7.6 (956) tarc0263 (1352) tarc0361 8.0 2.3 4.1 (1353) tarc0383 1.7 (771) tarc0007 (957) tarc0234 (177) tarc0063 (425) tarc0100 2.2 2.2 (1354) tarc0369 2.3 (630) tarc0196 (772) tarc0017 (178) tarc0059 1.7 (1355) tarc0374 1.1 4.7 5.4 1.7 (631) tarc0208 (1356) tarc0385 140 (179) tarc0036 2.3 (426) tarc0101 (958) tarc0236 2.2 (1357) tarc0365 8.2 1.7 (959) tarc0235 (1358) tarc0376 4.2 3.5 (1359) tarc0391 16.0 12.0 (632) tarc0189 *** 1.1 (960) snp678 (961) tarc0257 4.8 5.4 0.5 (1360) tarc0375 (427) tarc0106 11.3 (962) tarc0247 3.5 (1361) snp6350 2.9 2.8 0.5 (180) tarc0061 (1362) tarc0364 * 0.6 (428) wPt-732270 (963) tarc0275 1.1 (181) tarc0047 (1363) snp2208 * 1.1 (429) wPt-732603 1.1 (633) tarc0206 (182) wPt-729773 6.9 3.0 7.6 1.1 (1364) tarc0334 160 6.1 (634) tarc0199 (1365) tarc0360 * 1.1 0.5 (183) tarc0062 (1366) tarc0350 1.1 (430) tarc0114 (635) tarc0207 (964) tarc0241 ** 1.1 1.7 1.1 (184) tarc0031 (431) tarc0091 6.1 (1367) gwm437 * 0.5 5.7 1.7 (185) tarc0039 (432) tarc0130 (1368) tarc0345 * 4.7 0.5 1.1 (636) tarc0177 (433) tarc0120 2.3 (965) snp5970 ** 2.2 (1369) tarc0353 * (186) tarc0032 ** 0.5 2.4 (434) tarc0105 (637) gwm314 (966) tarc0265 * (1370) tarc0343 0.5 0.5 6.1 (1371) snp2273 (435) tarc0089 1.7 0.5 (436) tarc0104 8.3 1.1 (1372) snp1902 (187) tarc0051 * 0.5 (437) tarc0093 (1373) tarc0339 1.1 8.2 (438) tarc0094 0.6 (638) wPt-667098 ** 0.5 (1374) tarc0346 0.5 180 (439) tarc0098 (639) wPt-734315 *** (1375) tarc0336 10.8 2.2 (967) snp1431 6.2 (440) tarc0119 2.9 (1376) gdm67 1.1 (441) tarc0097 11.0 1.1 (968) tarc0249 17.1 5.5 (188) tarc0053 * (969) wPt-740860 2.8 (1377) wPt-4555 * 0.6 (970) tarc0279 7.0 0.6 (442) tarc0088 (640) tarc0181 * (971) tarc0255 (443) tarc0131 0.6 (972) tarc0273 (189) tarc0054 1.1 8.5 9.0 0.6 (973) tarc0264 (974) 
Figure 4. Linkage maps of the D genome using the ‘Hatsumochi’/‘Kitahonami’ RILs. Newly developed markers are prefixed with ‘tarc’. Box on the left of each chromosome indicates the putative position of the centromere.

Table 3. Number of polymorphic site candidates detected in pair-wise comparisons of six wheat varieties

              Kitahonami  Yumechikara  Tohoku224  Hatsumochi  Kinuhime  Shunyou
Kitahonami    -           15,638       16,934     16,800      16,670    15,774
Yumechikara               -            18,526     19,926      19,216    18,219
Tohoku224                              -          12,518      9,491     13,542
Hatsumochi                                        -           11,561    14,008
Kinuhime                                                      -         13,661
Shunyou                                                                 -

16,17 or restriction-site flanking sequences. Although these methods have greatly contributed to increasing the number of markers, they do not provide a method of controlling the chromosomal locations of the detected polymorphisms. Therefore, although the number of polymorphisms is high, a significant bias occurs in the distribution of polymorphisms across the genome. Thus, targeted development of markers is required to saturate linkage maps. In this study, gene-enriched libraries were prepared using custom capture probes. To design the probes, we used positional information from sources such as the consensus linkage map (International Triticeae Mapping Initiative, ITMI),32 bin-mapped ESTs,25 PLUG markers35 and the barley physical map. Through this process, the map length in the D genome increased dramatically from 878.4 cM to 1333.1 cM (Fig. 4), and we were able to successfully develop markers in regions that were not covered by commercially available marker platforms, including DArT and 9K arrays.

Because of the large genome size of wheat (16 Gb), the extraction of reliable polymorphic sites supported by sufficient read depth was believed to require a high-performance sequencer. However, in this study, using the GS FLX plus (Roche Diagnostics), we obtained only 536 and 616 Mb sequences from the genomic DNA of ‘Hatsumochi’ and ‘Kitahonami’, respectively, indicating that less than 4% of the genome was sequenced in each variety. Despite this low coverage, we detected 12,551 polymorphic sites with an average read depth of 5.4 and 5.9 in ‘Hatsumochi’ and ‘Kitahonami’, respectively (Supplementary Table S2). These observations indicated that the SeqCap EZ reagent kit (Roche Diagnostics) enriched the target sites by approximately 160-fold compared with whole genome analysis and greatly contributed to the effectiveness of polymorphism detection.

In reference to the gene models of the IWGSC survey sequences, most polymorphic sites were located near genes, indicating that we successfully eliminated repetitive sequences and effectively enriched the targets. The elimination of repetitive elements is important for genome mapping because these elements usually distort the mapping results. In this study, more than 40% of the polymorphic sites were in intron regions, and thus could not be found using transcriptome analyses. Because intron regions have a higher level of polymorphisms than exon regions, enriched genomic sequencing is a beneficial approach for developing gene-related markers. Furthermore, based on SnpEff analysis, 60 and 1,630 polymorphisms were predicted to have high and moderate effects on the corresponding gene functions, respectively (Supplementary Table S2). Therefore, the strategy used in this study effectively identified polymorphisms that could potentially be related to agronomically important traits.

In allohexaploid wheat, the presence of three homoeologous genomes poses a challenge in SNP detection. Although a difference in degree is observed, many markers on the commercially available SNP arrays are affected by interference from the other two homoeologous sequences. In this study, the genome-specific primers were highly beneficial for defining alleles. Approximately 80% of the marker loci were successfully genotyped using the locus-specific approach when we validated the polymorphisms using the two parental varieties (Table 2). When we used the RIL populations, more than 70% of the markers were grouped into classes 1–3, indicating that these markers could be genotyped similarly to diploid species. Based on these results, our strategy of designing genome-specific primers was effective and demonstrated the importance of obtaining a priori knowledge of the polymorphisms among genomes by comparing the homoeologous sequences of interest.

The variations in the read number among samples and markers were investigated (Fig. 3). In this study, we did not equilibrate the multiplex PCR samples from individual wells before mixing samples. Despite this simplification of the process, the read numbers among the samples were relatively uniform, except for one outlier sample that had an extremely low number of reads. Therefore, the protocol used in this study is beneficial for processing many samples. In contrast, the read numbers varied among the markers, with the read number ranging from nearly zero to more than 200 per sample. Markers with longer amplicons tended to have fewer reads (data not shown); however, other factors, such as the annealing efficiencies of the primers, also affected the read numbers. Because the multiplex levels of the samples and markers were determined according to the minimum number of reads required for each marker, a uniform distribution of read numbers among the markers is important. Further optimization of the multiplex PCR conditions will be beneficial in minimizing missing genotype values.

Because amplicon sequencing primers are typically designed to flank the site of interest, other polymorphisms in addition to the site of interest can sometimes be identified. When applying the method described in this paper to breeding or genetic analysis, the detection of new polymorphisms that are independent of the target sites could provide additional haplotype information for the samples of interest, and such haplotype information substantially improves power in the detection of marker-trait associations. Informative markers that contain more than two polymorphic sites could reduce the marker number required for an investigation, thereby reducing the cost for genotyping, and help in performing various genetic studies, such as association mapping and genomic selection in breeding programs.

Because the existing SNP arrays are substantially deficient in the number of D genome markers, we developed markers for the D genome to demonstrate the effectiveness of our approach. The polymorphism rate of the D genome is lower than that of the other two genomes, which is attributed to the evolutionary history of the D genome. According to recent studies, interploidy natural hybridization and subsequent introgression played a significant role in the diversification of common wheat, and therefore, the D genome had fewer opportunities to exchange genetic material than the A and B genomes. Although the polymorphic rate is low, comparable numbers of important genes and QTLs on the D genome are described in the catalog of gene symbols for wheat. Recent genome-wide association studies for pre-harvest sprouting using Chinese wheat landraces40 and European winter wheats41 reported QTLs on 1D, 3D and 5D chromosomes. Additionally, resistance genes against wheat yellow mosaic virus and soil-borne wheat mosaic virus were found on chromosomes 2D and 5D, respectively.6,42 For the fine mapping of these agronomically important genes or QTLs, the number of markers must be increased around the regions of interest. From the polymorphism survey using six varieties, around 6,000 polymorphic sites were detected on the D genome, and some sites have been successfully used to develop markers (data not shown). Therefore, the polymorphic information obtained in this study is a useful resource for the further development of markers across the genome.

In this study, we proposed efficient strategies for the detection of nucleotide polymorphisms among varieties of interest and the design of locus-specific primers to achieve robust high-throughput genotyping. The IWGSC’s continuous effort to obtain the first reference sequence of the spring wheat variety ‘Chinese Spring’ is opening the post-genomic era (http://www.wheatgenome.org/ (5 February 2018, date last accessed)). In this era, the motivation for collecting genomic resources using in-house materials is likely to increase. By comparing the polymorphic sites found in this study with probe sequences in publicly available SNP arrays, at least two-thirds of the total sites did not show any similarity with those probes in BLASTn searches (Supplementary Tables S2 and S5). The high percentage of new polymorphic sites indicates the importance of polymorphism surveys using materials of interest. In addition to the number of polymorphic sites, the polymorphic frequencies among materials are also important. Compared with the existing SNP arrays, the polymorphic sites among the six varieties in this study provided highly polymorphic markers among Japanese materials, particularly among varieties from Central and Western Japan (data not shown). To date, the germplasm used in genomic studies has been limited to the leading varieties in developed countries. However, for specific traits, such as resistance to disease or abiotic stress, many sources of germplasm from around the world remain unexamined. Because the method described in this study is less expensive, more flexible and more reliable than previous methods, this method is suitable for pan-genome studies that must process many haplotypes.

Data Availability

All sequences analysed in the present study were deposited into the DDBJ/GenBank/EMBL database with accession number DRA006270. Sample indices are prefixed to sequence names as follows: ‘Kitahonami’, H7U3MUP; ‘Hatsumochi’, H7YSFDH; ‘Shunyou’, IY4LCZI; ‘Kinuhime’, IZHNKMJ; ‘Tohoku224’, IZO1F6G; and ‘Yumechikara’, HKAWHI.

Acknowledgements

This study was supported by a grant from the Ministry of Agriculture, Forestry, and Fisheries of Japan (Genomics-based Technology for Agricultural Improvement, NGB-1002 and NGB-1007).

Conflict of interest

None declared.

Supplementary data

Supplementary data are available at DNARES online.

References

1. Gupta, P. K., Rustgi, S. and Mir, R. R. 2008, Array-based high-throughput DNA markers for crop improvement, Heredity, 101, 5–18.
2. Xu, Y. B., Lu, Y. L., Xie, C. X., Gao, S. B., Wan, J. M. and Prasanna, B. M. 2012, Whole-genome strategies for marker-assisted plant breeding, Mol. Breed., 29, 833–54.
3. Cavanagh, C. R., Chao, S., Wang, S., et al. 2013, Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars, Proc. Natl. Acad. Sci. USA., 110, 8057–62.
4. Wang, S., Wong, D., Forrest, K., et al. 2014, Characterization of polyploid wheat genomic diversity using a high-density 90,000 single nucleotide polymorphism array, Plant Biotechnol. J., 12, 787–96.
5. Iehisa, J. C., Ohno, R., Kimura, T., et al. 2014, A high-density genetic map with array-based markers facilitates structural and quantitative trait locus analyses of the common wheat genome, DNA Res., 21, 555–67.
6. Liu, S., Yang, X., Zhang, D., Bai, G., Chao, S. and Bockus, W. 2014, Genome-wide association analysis identified SNPs closely linked to a gene resistant to soil-borne wheat mosaic virus, Theor. Appl. Genet., 127, 1039–47.
7. Naruoka, Y., Garland-Campbell, K. A. and Carter, A. H. 2015, Genome-wide association mapping for stripe rust (Puccinia striiformis F. sp. tritici) in US Pacific Northwest winter wheat (Triticum aestivum L.), Theor. Appl. Genet., 128, 1083–101.
8. Wu, Q. H., Chen, Y. X., Zhou, S. H., et al. 2015, High-density genetic linkage map construction and QTL mapping of grain shape and size in the wheat population Yanda1817 x Beinong6, PLoS One, 10, e0118144.
9. Bulli, P., Zhang, J., Chao, S., Chen, X. and Pumphrey, M. 2016, Genetic architecture of resistance to stripe rust in a global winter wheat germplasm collection, G3-Genes Genomes Genet., 6, 2237–53.
10. Lin, M., Zhang, D., Liu, S., et al. 2016, Genome-wide association analysis on pre-harvest sprouting resistance and grain color in U.S. winter wheat, BMC Genomics, 17, 794.
11. Yu, L. X., Chao, S., Singh, R. P. and Sorrells, M. E. 2017, Identification and validation of single nucleotide polymorphic markers linked to Ug99 stem rust resistance in spring wheat, PLoS One, 12, e0171963.
12. Chao, S. M., Dubcovsky, J., Dvorak, J., et al. 2010, Population- and genome-specific patterns of linkage disequilibrium and SNP variation in spring and winter wheat (Triticum aestivum L.), BMC Genomics, 11, 727.
13. Ishikawa, G., Nakamura, K., Ito, H., et al. 2014, Association mapping and validation of QTLs for flour yield in the soft winter wheat variety Kitahonami, PLoS One, 9, e111337.
14. Zhai, H., Feng, Z., Li, J., et al. 2016, QTL analysis of spike morphological traits and plant height in winter wheat (Triticum aestivum L.) using a high-density SNP and SSR-based linkage map, Front. Plant Sci., 7, 1617.
15. Huang, B. E., George, A. W., Forrest, K. L., et al. 2012, A multiparent advanced generation inter-cross population for genetic analysis in wheat, Plant Biotechnol. J., 10, 826–39.
16. Poland, J. A., Brown, P. J., Sorrells, M. E. and Jannink, J. L. 2012, Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach, PLoS One, 7, e32253.
17. Kobayashi, F., Tanaka, T., Kanamori, H., Wu, J., Katayose, Y. and Handa, H. 2016, Characterization of a mini core collection of Japanese wheat varieties using single-nucleotide polymorphisms generated by genotyping-by-sequencing, Breed Sci., 66, 213–25.
18. Marcussen, T., Sandve, S. R., Heier, L., et al. 2014, Ancient hybridizations among the ancestral genomes of bread wheat, Science, 345.
19. Brenchley, R., Spannagl, M., Pfeifer, M., et al. 2012, Analysis of the bread wheat genome using whole-genome shotgun sequencing, Nature, 491, 705–10.
20. Henry, I. M., Nagalakshmi, U., Lieberman, M. C., et al. 2014, Efficient genome-wide detection and cataloging of EMS-induced mutations using exome capture and next-generation sequencing, Plant Cell, 26, 1382–97.
21. Bernardo, A., Wang, S., St Amand, P. and Bai, G. 2015, Using next generation sequencing for multiplexed trait-linked markers in wheat, PLoS One, 10, e0143890.
22. IWGSC 2014, A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome, Science, 345, 1251788.
23. Mochida, K., Yoshida, T., Sakurai, T., Ogihara, Y. and Shinozaki, K. 2009, TriFLDB: a database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics, Plant Physiol., 150, 1135–46.
24. Ishikawa, G., Yonemaru, J., Saito, M. and Nakamura, T. 2007, PCR-based landmark unique gene (PLUG) markers effectively assign homoeologous wheat genes to A, B and D genomes, BMC Genomics, 8, 135.
25. Qi, L. L., Echalier, B., Chao, S., et al. 2004, A chromosome bin map of 16,000 expressed sequence tag loci and distribution of genes among the three genomes of polyploid wheat, Genetics, 168, 701–12.
26. Mayer, K. F., Waugh, R., Brown, J. W., et al. 2012, A physical, genetic and functional sequence assembly of the barley genome, Nature, 491, 711–6.
27. Cingolani, P., Platts, A., Wang le, L., et al. 2012, A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso, Fly, 6, 80–92.
28. Altschul, S. F., Madden, T. L., Schaffer, A. A., et al. 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, 3389–402.
29. Katoh, K. and Toh, H. 2008, Recent developments in the MAFFT multiple sequence alignment program, Brief Bioinform., 9, 286–98.
30. Liu, Y., He, Z., Appels, R. and Xia, X. 2012, Functional markers in wheat: current status and future prospects, Theor. Appl. Genet., 125, 1–10.
31. Lorieux, M. 2012, MapDisto: fast and efficient computation of genetic linkage maps, Mol. Breed., 30, 1231–5.
32. Sorrells, M. E., Gustafson, J. P., Somers, D., et al. 2011, Reconstruction of the synthetic W7984 x Opata M85 wheat reference population, Genome, 54, 875–82.
33. Allen, A. M., Barker, G. L., Berry, S. T., et al. 2011, Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.), Plant Biotechnol. J., 9, 1086–99.
34. Akhunov, E. D., Akhunova, A. R., Anderson, O. D., et al. 2010, Nucleotide diversity maps reveal variation in diversity among wheat genomes and chromosomes, BMC Genomics, 11, 702.
35. Ishikawa, G., Nakamura, T., Ashida, T., et al. 2009, Localization of anchor loci representing five hundred annotated rice genes to wheat chromosomes using PLUG markers, Theor. Appl. Genet., 118, 499–514.
36. Akhunov, E., Nicolet, C. and Dvorak, J. 2009, Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay, Theor. Appl. Genet., 119, 507–17.
37. Lu, Y. L., Xu, J., Yuan, Z. M., et al. 2012, Comparative LD mapping using single SNPs and haplotypes identifies QTL for plant height and biomass as secondary traits of drought tolerance in maize, Mol. Breed., 30, 407–18.
38. Matsuoka, Y. 2011, Evolution of polyploid Triticum wheats under cultivation: the role of domestication, natural hybridization and allopolyploid speciation in their diversification, Plant Cell Physiol., 52, 750–64.
39. McIntosh, R. A., Yamazaki, Y., Dubcovsky, J., et al. 2013, Catalogue of gene symbols for wheat. https://shigen.nig.ac.jp/wheat/komugi/genes/symbolClassList.jsp (5 February 2018, date last accessed).
40. Zhou, Y., Tang, H., Cheng, M. P., et al. 2017, Genome-wide association study for pre-harvest sprouting resistance in a large germplasm collection of Chinese wheat landraces, Front. Plant Sci., 8, 401.
41. Albrecht, T., Oberforster, M., Kempf, H., et al. 2015, Genome-wide association mapping of preharvest sprouting resistance in a diversity panel of European winter wheats, J. Appl. Genet., 56, 277–85.
42. Nishio, Z., Kojima, H., Hayata, A., et al. 2010, Mapping a gene conferring resistance to Wheat yellow mosaic virus in European winter wheat cultivar ‘Ibis’ (Triticum aestivum L.), Euphytica, 176, 223–9.
43. Mayer, K. F., Taudien, S., Martis, M., et al. 2009, Gene content and virtual gene order of barley chromosome 1H, Plant Physiol., 151, 496–505.

Journal

DNA Research, Oxford University Press

Published: Feb 21, 2018
