Association genetics studies on frost tolerance in wheat (Triticum aestivum L.) reveal new highly conserved amino acid substitutions in CBF-A3, CBF-A15, VRN3 and PPD1 genes

Association genetics studies on frost tolerance in wheat (Triticum aestivum L.) reveal new highly... Background: Understanding the genetic basis of frost tolerance (FT) in wheat (Triticum aestivum L.) is essential for preventing yield losses caused by frost due to cellular damage, dehydration and reduced metabolism. FT is a complex trait regulated by a number of genes and several gene families. Availability of the wheat genomic sequence opens new opportunities for exploring candidate genes diversity for FT. Therefore, the objectives of this study were to identity SNPs and insertion-deletion (indels) in genes known to be involved in frost tolerance and to perform association genetics analysis of respective SNPs and indels on FT. Results: Here we report on the sequence analysis of 19 candidate genes for FT in wheat assembled using the Chinese Spring IWGSC RefSeq v1.0. Out of these, the tandem duplicated C-repeat binding factors (CBF), i.e. CBF-A3, CBF-A5, CBF-A10, CBF-A13, CBF-A14, CBF-A15, CBF-A18, the vernalisation response gene VRN-A1, VRN-B3, the photoperiod response genes PPD-B1 and PPD-D1 revealed association to FT in 235 wheat cultivars. Within six genes (CBF-A3, CBF-A15, VRN-A1, VRN-B3, PPD-B1 and PPD-D1) amino acid (AA) substitutions in important protein domains were identified. The amino acid substitution effect in VRN-A1 on FT was confirmed and new AA substitutions in CBF-A3, CBF-A15, VRN-B3, PPD-B1 and PPD-D1 located at highly conserved sites were detected. Since these results rely on phenotypic data obtained at five locations in 2 years, detection of significant associations of FT to AA changes in CBF-A3, CBF-A15, VRN-A1, VRN-B3, PPD-B1 and PPD-D1 may be exploited in marker assisted breeding for frost tolerance in winter wheat. Conclusions: A set of 65 primer pairs for the genes mentioned above from a previous study was BLASTed against the IWGSC RefSeq resulting in the identification of 39 primer combinations covering the full length of 19 genes. This work demonstrates the usefulness of the IWGSC RefSeq in specific primer development for highly conserved gene families in hexaploid wheat and, that a candidate gene association genetics approach based on the sequence data is an efficient tool to identify new alleles of genes important for the response to abiotic stress in wheat. Keywords: Triticum aestivum L., Frost tolerance (FT), Candidate genes, Association studies, SNP, Indel, Haplotypes, CBF, VRN, PPD1 * Correspondence: dragan.perovic@julius-kuehn.de Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Erwin-Baur-Str. 27, 06484 Quedlinburg, Saxony-Anhalt, Germany Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Babben et al. BMC Genomics (2018) 19:409 Page 2 of 24 Background resources, extensive phenotypic data and various genomic Wheat (Triticum aestivum L.) is on the world wide level arrays [13]. The chromosomal sequence information is the crop grown on the largest acreage and is of prime available from the IWGSC [14]. All mentioned data- importance for human nutrition and animal feed. bases are useful for the identification of homologous Worldwide production of 729 million tones in 2014, chromosome sequences in bread wheat. Summarising, a ranks wheat globally the third most important crop, next lot of sequence information of wheat sorted chromo- to maize (Zea mays L.) and rice (Oryza sativa L.). Pro- some arms [15–17], T. urartu [18]and Ae. tauschii [19] duction of wheat in Europe was 249.13 mt with an aver- have been published during the past few years and are age yield of 4.25 t/ha in 2014. At the same time in North integrated in the public databases mentioned above. America 84.68 mt of wheat were harvested with an aver- Today, the current version of the Chinese Spring age yield of 2.994 t/ha, while in Asia the average yield IWGSC RefSeq v1.0 [14] facilitates the exploitation of was 3.095 t/ha resulting in a production of 315.71 mt the wheat sequence in basic science as well as in [1]. In North America, North and Eastern Europe and applied wheat breeding. Russia wheat production is exposed to low temperature and frost damage occurs frequently. Depending of the Association analysis time of sowing wheat is known as winter or spring Genome-wide association studies (GWAS) are a power- wheat, respectively, which differ in the vernalisation ful tool to identify genomic regions and candidate genes requirement and frost hardiness. Winter wheat varieties involved in FT. High density genotyping arrays i.e. the require an extended exposure to cold temperatures, Illumina 90 K chip [8], the Affymetrix 820 K [20] and typically 4 to 8 weeks at less than 4 °C, to induce flower- the breeding 35 K axiom [21] arrays enable the deter- ing (vernalization) while spring wheat varieties do not mination of the genetic structure of complex traits by have such a requirement [2]. Vernalization requirements GWAS. However, they are limited in the identification keep these plants safely dormant over winter resulting in of new alleles involved in FT. One approach that allows a longer vegetation period and higher yields of winter mining of novel alleles is re-sequencing of candidate wheat. genes followed by a candidate gene association genetics study [22]. Wheat genome Bread wheat is an allohexaploid species (2n = 6× = 42) Cold stress signalling and transcriptional regulation with an AABBDD genome derived from two independ- Frost tolerance (FT) is a complex biological process in- ent hybridisations [3–5]. This complex genome of about volving at least two main pathways and many additional 17 Giga-base pairs (Gbp) has a repeat content of ap- processes encompassing a large number of genes. The proximately 80% [6] with a gene density of on average main pathway is frost response, whereas flowering as the one gene per 87 to 184 Kilo-base pairs (Kbp) [7]. The second pathway involves vernalisation. Plant cells can acquisition of this large genome sequence became feas- perceive cold stress through the membrane rigidification 2+ ible since the introduction of next Generation Sequen- effect. This leads to Ca influx into the cytosol and 2+ cing (NGS) technologies. these characteristic Ca signatures are detected by Nowadays, the efforts of the International Wheat calcium binding proteins (CBPs) [23, 24]. The induction Genome Sequencing Consortium (IWGSC), in the of cold response genes starts with the activation of development of a physical map and a reference sequence so-called inducer of CBF expression (ICE) genes. These facilitates many downstream applications, i.e. develop- genes are MYC-type basic helix–loop–helix transcrip- ment of high throughput genotyping platforms [8], effi- tion factors which bind on MYC recognition sites of cient development of genome specific primers [9], C-repeat binding factor (CBF) promoters and conse- exome sequencing in large genebank collections [10] etc. quently activate the expression of these genes [25]. The The wheat reference sequence, survey sequence and CBF transcription factors are members of the APE- physical map are available at many public data bases. TALA2/Ethylene response element binding protein The CerealsDB web page, created by members of the (AP2/EREBP) family of DNA-binding proteins [26, 27]. Functional Genomics Group at the University of Bristol The AP2/EREBP DNA-binding protein domain comprises [11], includes online resources of genomic information, a structure with three β-strands and one α-helix [28–30]. i.e. varietal SNPs, DArT markers, and EST sequences all Furthermore, PKK/RPAGRxKFxETRHP and DSAWR mo- linked to a draft genome sequence of the cultivar Chin- tifs are present, which are typical features of CBF proteins ese Spring [12]. The Unité de Recherche Génomique [31]. The CBF transcription factors bind to the C-repeat/ Info (URGI) web portal includes datasets such as dehydration-responsive element (CRT/DRE) and induce chromosome survey sequences, reference sequences, the expression of cold-responsive/late embryogenesis– physical maps, genetic maps, polymorphisms, genetic abundant (COR/LEA)genes [32–34]. The CRT/DRE Babben et al. BMC Genomics (2018) 19:409 Page 3 of 24 element is a highly conserved CCGAC sequence in the [47, 61]. Knox et al. [62]analysedthe FR-A 2locus promoter of cold- and dehydration-responsive genes [35]. of diploid Triticcum monococcum (A genome is very The flowering pathway is involved in FT because it similar to the A genome of hexaploid wheat) and contains vernalisation (VRN) and photoperiod response identified three CBFs (CBF12, CBF14 and CBF15) (PPD) genes that contributed to low temperature accli- highly associated with FT. Also Vagujfalvi et al. [59] matisation [36, 37]. The PPD1 is a member of the PRR identified CBF14 and CBF15 as FT associated in Triticum (pseudo response regulator) protein family and interacts monococcum and Soltesz et al. [63] confirmed this for Tri- with CONSTANS (CO) [38]. This family possesses a ticum aestivum. Additionally, Kocsy et al. [64] identified pseudo-receiver domain and a CONSTANS motif three genes, i.e. Tacr7 (Triticum aestivum cold-regulated [39, 40]. Downstream the VRN genes are localised, 7), Cab (calcium-binding EF-hand family protein-like) and i.e. VRN1, VRN2 and VRN3. VRN1 encodes a Dem (Defective embryo and meristems) being differen- MADS-box transcription factor, VRN2 is similar to a tially expressed during cold hardening in wheat. In putative zinc finger and a CCT domain and VRN3 addition, on the transcriptome level FT signaling is much encodes a RAF kinase inhibitor like protein [41–45]. more complex. Hundreds to thousands of wheat genes Both pathways, the flowering and the cold response were identified to be significantly up- or downregulated pathway, are connected by the interaction of VRN1 and under low temperature [34, 65–69]. CBF genes. For example, the VRN1 gene is able to re- The aim of this study was therefore to (i) identity duce the transcript levels of CBFs and COR genes under SNPs and indels (insertion-deletion) in genes known to long day conditions [36]. be involved in frost tolerance in wheat and (ii) to conduct a candidate gene based association genetics ap- Current knowledge about frost tolerance of wheat proach to get information on the effect of respective Two major FT loci, frost resistance 1 (FR1) and frost re- SNPs and indels on FT. sistance 2 (FR2), were identified on the long arm of chromosome 5A of wheat [46, 47]. Zhao et al. [48] de- Methods scribed an additional FT Quantitative Trait Locus (QTL) Plant material and DNA extraction on chromosome 5B in wheat germplasm from central A diverse set of 235 bread wheat genotypes was used for Europe. Due to the importance of cold acclimatisation in PCR amplification, amplicon sequencing and association winter and spring wheat, the locus FR1 was physically genetics studies. One hundred and seventy-nine and genetically mapped [49, 50]. However, it is not clear cultivars, 48 lines and 8 doubled haploid (DH) lines whether FR1 is an independent gene or is based on a originating from 28 countries from five continents pleiotropic effect of VRN1 [36, 51]. As a consequence of (Additional file 1: Table S1) were analysed. The associ- the presence of the A, B and D genome, there are three ation panel was selected based on pre-existing know- VRN1 homologous genes (VRN-A1, VRN-B1 and ledge regarding the reaction to growing conditions VRN-D1) not having the same impact on vernalisation. during winter time, i.e. high latitude and continental Wheat plants with a dominant VRN-A1 allele are spring European winter wheat collections as well as Russian type and do not need vernalisation for flowering, while and North American cultivars. Furthermore, the core the dominant VRN-B1 and VRND1 alleles result also in collection of the Institute of Field and Vegetable Crops spring habit, but they are weaker than VRN-A1 [52]. (IFVCNS), Novi Sad, Serbia [70] and parental lines of The difference among the spring (dominant VRN-A1 Western European hybrid breeding programs were included. alleles) and winter varieties (recessive vrn-A1 alleles) is For the physical mapping of PCR amplicons to chromo- based on a C/T single nucleotide polymorphism (SNP) somes and chromosome segments, 21 nulli-tetrasomic lines in the fourth exon of the VRN-A1 gene. The winter (NT-lines) and 46 deletion-lines were used [9, 71, 72]. The varieties carrying an ambiguous nucleotide Y (C/T) are DNA was extracted at the three leaf stage according to Stein more frost tolerant than genotypes carrying the VRN-B1 et al. [73]. or VRN-D1 allele which both confer higher frost tolerance than spring varieties carrying a C at the re- Field experiments and phenotypic data analysis spective SNP in VRN-A1 [2, 53–56]. The field experiments were performed in five environ- Plenty of studies identified the FR-A2 locus on ments during 2012 (Gatersleben, Germany; Ranzin, chromosome 5A as the most important locus involved Germany; Puskin, Russia; Roshchinskiy, Russia; Novosibirsk, in FT in wheat [57–59]. The FR-A2 locus comprises at Russia) and 2013 (Gatersleben, Germany; Ranzin, Germany; minimum 11 CBF genes and is located approximately Puskin, Russia; Roshchinskiy, Russia) and one in 2014 30 cM proximal to VRN1 [46, 47, 60]. Two independent (Novosibirsk, Russia). All 235 genotypes were tested in studies illustrate that CBF-A3, which is located in the Gatersleben, Ranzin, Pushkin, and Novosibirsk in a random FR-A2 locus, plays an important role in wheat FT design in double rows and two replications per genotype. Babben et al. BMC Genomics (2018) 19:409 Page 4 of 24 The trial in Roshchinskiy was conducted as a miniplot Tartu, Estonia), 0.2 mM dNTPs (Fermentas, St. Leon-Rot, (2.5m ) trial with one replication. FT was evaluated as win- Germany) and 0.25 pmol primers (Microsynth, Balgach, ter survival in percent (%), i.e. the survival of plants per Switzerland) or 0.4 U MyTaq™ DNA Polymerase, 1 x My genotype and plot was measured as a quantitative trait (%) Taq Reaction Buffer B (that comprised 1 mM dNTPs and ranging from 0% (all dead) to 100% (all alive) after winter. 3 mM MgCl ) (BIOLINE, Luckenwalde, Germany) and To take into account the diversity of the environments 0.25 pmol primers. The PCR fragment amplification was with respect to climatic conditions, a co-variable com- conducted in a GeneAmp® PCR System 9700 (Applied prising the number of days with an average air Biosystems, Darmstadt, Germany) using various PCR pro- temperature under − 15 °C in the period from December files (Additional file 2: Table S2). PCR fragments were sep- 1st to April 30th of each year was calculated. The correl- arated by agarose gel electrophoresis and analysed using ation coefficient r [74] was calculated between the the imaging system Gel Doc™ XR and the Quantity One® co-variable and FT. Principal component analysis (PCA) 1-D analysis software (4.6.2) (Bio-Rad, Hercules, USA) was used to get information on the influence of the en- and subsequently sequenced by the company Microsynth vironment on FT. Correlation coefficient r and PCA AG (Balgach, Switzerland) using the Sanger sequencing were calculated with the JMP® Genomics 5.1 software method [75]. (SAS, Cary, USA). Data measured at the same location in different years were treated as independent. All data Detection of polymorphisms (SNPs/indels) and haplotypes sets that exhibit a deviation described by the second The sequencing raw data were edited using Sequencer component of PCA analysis, indicate an atypical trait 5.1 (Gene Codes Corporation, Ann Arbor, USA). Next, value putatively caused by secondary environmental fac- the adjusted sequences of each primer pair were aligned tors and were discarded. Furthermore, the coefficient of by using the Multiple Alignment tool Fast Fourier variation (CV; standard deviation divided by arithmetic Transform (MAFFT) [76]. MAFFT parameters were set average) was calculated as well as the variance of each as default. The polymorphisms between the 235 geno- environment and year. The data sets with a very low CV types were detected automatically via a small multiple (under 0.15) and/or variance (under 150), were classified sequence alignment (MSA) script (Additional file 3: Data as non-representative and were discarded. After editing S1) in the free software Java™. Parameters for the poly- of the field data, the significance of the influence by en- morphism detection were as follows: polymorphisms be- vironment and genotype was tested by using the analysis tween defined bases (A, T, C or G) and ambiguous of covariance (ANCOVA) and a general linear model nucleotides (N) were ignored.. The detected SNPs were (GLM) procedure. Based on this, the Least-Squares used for the identification of haplotypes and compo- means (LS means) were calculated. For all of these ana- nents of haplotypes for each candidate gene according lyses the SAS® 9.4 software (SAS, Cary, USA) was used. to the position in promoter, exon, intron or in the 3` un- translated region (UTR). PCR amplification, fragment analysis and re-sequencing Primer development and testing was conducted accord- Population structure and kinship calculation ing to Babben et al. [9]. Furthermore, the primer assign- In order to account for population structure effects in ment was verified based on BLASTs of mRNA and association studies, the population structure was esti- genomic regions of close relatives against the Chinese mated based on 249 SNPs, chosen according to the map Spring reference assembly v1.0 using NCBI Megablast. location and even distribution along the 21 wheat chro- Subsequently, 5 kb upstream and downstream regions mosomes [8, 77]. Population structure of wheat acces- were extracted based on the BLAST results and the ana- sions was assessed using STRUCTURE v 2.3.3, which is lysis of the nulli-tertasomic (NT) lines [9]. Primers were based on a Bayesian model-based clustering algorithm aligned to these genomic regions using a free shift align- that incorporates admixture and allele correlation ment with affine gap costs (gap opening = 5, gap elong- models to account for genetic material exchange in pop- ation = 0.01, match = − 1). If ambiguities were detected ulations, resulting in shared ancestry [78]. Five inde- at the beginning or the end of exons, primers were pendent runs were performed setting the number of manually modified to match the consensus dinucleotides populations (k) from 1 to 10, burn in time and Markov of splice sites, GT and AG. PCR amplification was Chain Monte Carlo (MCMC) replication number both performed in 20 μl reaction volume (RV) by using two to 100,000. The k-value was determined by ln P(D) in polymerases, i.e. FIREPol® DNA polymerase (Solis STRUCTURE output and an ad hoc statistic Δk based BioDyne, Tartu, Estonia) and MyTaq™ DNA polymerase on the rate of change in ln P(D) between successive (BIOLINE, Luckenwalde, Germany). The master mix for k-values [79]. Wheat lines with probabilities ≥0.5 were a single PCR reaction comprised 0.4 U FIREPol® DNA assigned to corresponding clusters. Lines with probabil- Polymerase, 1 x Buffer B, 2.5 mM MgCl (Solis BioDyne, ities < 0.5 were assigned to a mixed group. The 2 Babben et al. BMC Genomics (2018) 19:409 Page 5 of 24 population structure plot was constructed by using The alignment of initial and homologous AA sequences STRUCTURE PLOT [80] and the Principal coordinate was performed using CLC and MAFFT. The last step analysis (PCoA) by using the software package DARwin was the identification of domains and motifs and the [81]. The kinship (K) matrix was calculated on a modi- prediction of the protein structure via RaptorX. For the fied Roger’s distance [82–84] by using R version 3.2.1 quality check of the RaptorX results, the uGDT-value, free software [85]. The Roger’s distance was calculated uSeqId-value and P-value were taken into account. A − 3 as follows: uGDT > 50, uSeqId > 30 and a P-value less than 1*10 are indicators for good quality modelling. Nucleotid di- vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi vergence rates (dN/dS) between the identified haplotypes m nj XX t 2 pffiffiffiffiffiffiffi and reference sequences of Triticum aestivum and re- d ¼ ðp −q Þ ð1Þ ij ij 2m i¼1 j¼1 lated species were analysed using a web-based HyPhy application [93, 94]. Haplotype and reference sequences where pij and qij are allele frequencies of the jth allele at were used to generate sequence alignments by applying the ith locus, n number of alleles at the ith locus and m the L-INS-i option in MAFFT [95]. To obtain the best number of loci. fitting substitution model, the model test application in MEGA-CC was used [96]. The reconstruction of the Association genetics analysis phylogenetic tree was done with maximum likelihood al- SNP and indel association genetics analysis gorithm and 500 bootstraps in MEGA-CC (using the The SNP/indel association analysis was performed with corresponding model). The resulting protein alignment a set of 235 genotypes by using TASSEL 5.0.9 [86, 87]. and the corresponding nucleotide sequences were used Only the SNPs/indels with minor allele frequencies to compute codon alignments with Pal2Nal [97]. The (MAF) > 5% were taken into consideration for analysis. codon alignments and the phylogenetic tree were used Furthermore, population structure, kinship matrix and to compute dN/dS for each variable site using the SLAC phenotypic LS means were included in association stud- method in HyPhy [94]. ies applying the mixed linear model (MLM) algorithm. The threshold of statistically significant effects was set to In silico promoter analysis 1.30 –log of P (P-value of 0.05) according to Li et al. Identification of promoter regions and regulatory sites [22, 77] who used this threshold for the analysis of can- was performed using the internet databases SOGO from didate genes at chromosomal regions with high linkage the National Institute of Agrobiological Sciences [98, 99] disequilibrium (LD. The LD was calculated via TASSEL and Softberry [100, 101]. 5.0.9 by using the full matrix LD type method with 8064 comparisons after MAF selection. Results Phenotypic data analysis Haplotype association genetics analysis Phenotypic data of five field locations were obtained The haplotype association genetics study was performed during 2 years (Fig. 1 and Additional file 4: Figure S1). by using TASSEL 4.1.20 [86, 87]. The parameters, calcu- These phenotypic data sets and the established lation and significance threshold were the same as used co-variable (number of days under − 15 °C) are only for the SNP and indel association analysis. The same ap- weakly correlated (r = 0.24), indicating that all pheno- plies to the LD calculation for the genotype groups from typic data are not suitable to be used for FT analysis. All Europe, North America and Asia including Australia. the FT scores were transformed using arcsine and Cox-Box (data not shown), but transformations did not Sequence analysis result in any improvement. A scatter plot illustrates that The translation of associated gene sequences into amino the phenotypic data obtained in Pushkin_2013 and acid (AA) sequences, homologue AA sequence search, Novosibirsk_2014 are not due to FT (Additional file 4: AA alignments, identification of protein domains and Figure S1). A reason for this may be the very few days motifs as well as prediction of secondary protein struc- under − 15 °C at Pushkin in 2013 (10 days) but a rather tures were performed using the following software: CLC low winter hardiness (mean winter survival of 16.2%) Main Workbench 7.6 (CLC Bio, Aarhus, Denmark), due to missing snow coverage; while in Novosibirsk in MAFFT, free software Jalview [88], NCBI (National 2014 a very high number of days under − 15 °C (41 days) Center for Biotechnology) protein BLAST [89] and but less frost damage was observed due to continuous RaptorX [90–92]. The nucleotide sequences were trans- snow coverage (mean winter survival of 82.6%). The lated into AA sequences via CLC following BLASTn PCA calculation shows that the environments Ran- against the NCBI protein database for homologous AA zin_2013, Gatersleben_2013 and Novosibirsk_2014 are sequence identification. Parameters were set as default. separated from the other environments described by the Babben et al. BMC Genomics (2018) 19:409 Page 6 of 24 Fig. 1 Phenotypic variation at five field locations during two experimental years. Center lines show the medians; box limits indicate the 25th and 75th percentiles as determined by R software; whiskers extend to 5th and 95th percentiles, outliers are represented by dots. n > 203 second component (Additional file 5: Figure S2). This further analysis because results are not really related to indicates that at these locations in the respective years FT. Out of the data sets Ranzin_2012, Gatersleben_2012, phenotypic data are influenced by additional factors and Pushkin_2012, Novosibirsk_2012, Roshchinskiy_2012 are not entirely due to differences in FT. An additional and Roshchinskiy_2013 the LS means were calculated characteristic of phenotypic data is the CV and variance (Additional file 1: Table S1). (in %) calculation of each environment per year. High CVs (over 0.15) and/or variances (over 150) represent a good phenotypic distribution of all 235 genotypes within Candidate gene polymorphisms the field data sets. Ranzin_2013, Gatersleben_2013 and A set of 65 specific primer pairs from the previous Novosibirsk_2014 show very low CVs between 0.08 and study [9] was BLASTed against the IWGSC RefSeq 0.15 and variances between 54.3 and 149.4. The environ- allowing the identification of 39 primer combinations ment Pushkin_2013 shows a low variance with 51 but a that cover the full length of 19 genes and their struc- high CV value of 0.44. These values resulted from a very tural analyses. An optimised set of 39 primer pairs low winter survival with a low standard deviation was used for PCR amplification, amplicon sequencing (Table 1). In conclusion of the phenotypic data analysis, and association genetics studies. No exact position environments Ranzin_2013, Gatersleben_2013, Push- could be determined for the first forward primer of kin_2013 and Novosibirsk_2014 were excluded from PPD-B1 andthe thirdforwardprimer of VRN-D2, Table 1 Variance and coefficient of variation of frost tolerance data Location/Year Size Missing data Mean frost tolerance in % Standard deviation Variance Coefficient of variation Pushkin_2012 235 0 82.40 13.88 192.52 0.17 Pushkin_2013 235 0 16.16 7.14 51.01 0.44 Novosibirsk_2012 235 0 45.60 16.46 271.05 0.36 Novosibirsk_2014 235 0 82.58 12.22 149.42 0.15 Ranzin_2012 235 0 95.20 13.31 166.12 0.14 Ranzin_2013 235 0 97.67 8.19 54.27 0.08 Roshchinskyiy_2012 235 32 69.33 22.64 512.53 0.33 Roshchinskiy_2013 235 4 56.55 13.68 187.24 0.24 Gatersleben_2012 235 0 85.89 22.14 490.01 0.26 Gatersleben_2013 235 0 97.18 9.12 83.21 0.09 Babben et al. BMC Genomics (2018) 19:409 Page 7 of 24 since BLASTN revealed seven and five matches, re- Population structure and kinship spectively. For all other primers an exact position was Genetic profiles were obtained by using the ILLUMINA determined. In addition, an unassigned scaffold was Infinium iSelect 90 k wheat chip. These data were kindly identified for PPD-B1 located on chromosome 2B provided by the Dr. A. Börner from IPK Gatersleben and (Additional file 6: Figure S3). The total re-sequenced will be used for GWAS analysis in an additional paper. length of candidate genes was 13.3 Kbp coding DNA As an outcome of the STRUCTURE analysis based on sequence (CDS) and 43.76 Kbp genomic lengths, with 249 SNPs covering the whole genome, k = 3 was the a CDS/genomic length ratio of 0.30. In total, a se- most probable number of sub-populations mostly quence length of 33.5 Kbp was analysed. In our set of corresponding to the origin of genotypes. The 235 genotypes, the 15 candidate genes Cab, CBF-A3, neighbour-joining tree revealed three sub-populations CBF-A5, CBF-A10, CBF-A13, CBF-A14, CBF-A15, i.e. North American, Russian, and North and Central CBF-A18, Tacr7, VRN-B3, VRN-A1, VRN-B1, VRN-D1, European genotypes. This population structure resem- PPD-B1 and PPD-D1 were polymorphic and four bled a tight association to the origin of the genotypes (CBF-D1, Dhn1, VRN2 and Dem)(Table 2)were analysed. In the first sub-population, accessions from monomorphic. In total 254 polymorphic sites, i.e. 221 European countries, subdivided into two groups, i.e. SNPs and 33 indels were identified. The SNP number North and Central European genotypes, are included. per gene ranged from 0 to 97 and the indel number Accessions from Canada, Mexico and USA were pre- from 0 to 12. Over all genes, 42 polymorphic sites in dominantly in the second sub-population, and the third promoter regions, 64 in introns, 25 in 3` UTRs and sub-population contained predominantly accessions 123 in exons were identified. Out of the 254 poly- from Russia and Kazakhstan. The population structure is morphic sites, 131 were located in non-coding re- shown in Fig. 3 and Additional file 7: Figure S4. The gions, and 54 synonymous and 69 non-synonymous STRUCTURE membership coefficients and the modified polymorphic sites were identified (Fig. 2). The num- Roger’s distance revealed a high degree of admixture in a ber of haplotypes ranged between two and six and large number of accessions. Therefore, many accessions the haplotype diversity (Hd) between 0.07 and 0.68 could not be classified to main groups, because their (Table 3). genomes represent a mixture of the main groups. This Table 2 List of analysed FT candidate genes, sequence length, number of specific PCR fragments and detected mutations Gene Chr. position Number of specific Sequence length CDS length Gene length CDS/gene Detected Number of PCR fragments in bp in bp in bp length ratio mutations polymorphic sites CBF-D1 5D 1 709 639 639 1.00 no 0 CBF-A3 5A 1 790 741 741 1.00 yes 4 CBF-A5 7A 1 1027 633 633 1.00 yes 6 CBF-A10 5A 1 867 720 720 1.00 yes 2 CBF-A13 5A 1 855 720 720 1.00 yes 6 CBF-A14 5A 2 610/748 639 639 1.00 yes 6 CBF-A15 5A 1 786 726 726 1.00 yes 7 CBF-A18 6A 1 1045 738 738 1.00 yes 37 Dhn1 5D 1 936 420 638 0.66 no 0 VRN-A1 5A 4 1039/1097/621/770 735 11,414 0.06 yes 15 VRN-B1 5B 5 600/817/662/467/1077 735 5498 0.13 yes 24 VRN-D1 5D 4 814/1359/1429/705 735 11,550 0.06 yes 2 VRN-D2 4D 2 976/534 639 1650 0.39 no 0 VRN-B3 7B 2 1602/994 534 1258 0.42 yes 1 Cab 5A 1 728 n.a. n.a. n.a. yes 28 Dem 6B/6D 1 616 n.a. n.a. n.a. no 0 Tacr7 2B 1 765 n.a. n.a. n.a. yes 3 PPD-B1 2B 6 1600/954/927/571/773/378 1995 3053 0.65 yes 5 PPD-D1 2D 3 726/1094/966 1983 3141 0.63 yes 109 total 39 34,034 13,332 43,758 0.30 255 Chr. chromosome, bp base pairs, CDS coding DNA sequence, n.a. not available PPD-D1 PPD-B1 VRN-D1 VRN-B1 VRN-A1 Vrn3 Tacr7 CBF-A18 CBF-A15 CBF-A14 CBF-A13 CBF-A10 CBF-A5 CBF-A3 Cab Babben et al. BMC Genomics (2018) 19:409 Page 8 of 24 Polymorphic sites synonymous non-synonymous non-coding Candidate genes Fig. 2 Number of synonymous, non-synonymous and non-coding mutations per candidate gene may be due to germplasm exchange and it use in breed- region of CBF-A5, CBF-A13 and CBF-A14.The ing programs. remaining 17 SNPs and four indels were located in exon regions of five CBFs, PPD-D1, VRN-A1 and SNP/indel association analysis VRN-B3. 11 of these SNPs are non-synonymous. Six The association analysis was performed using 254 associated genes were located on wheat chromosome polymorphic sites identified in 15 candidate genes. 5A and one each on chromosome 2D, 7A and 7B. After MAF selection, 58 polymorphic sites, i.e. 46 Allelic effects of significantly associated polymorphic SNPs and 12 indels, from 13 candidate genes (Cab, sites on FT ranged from 0.11% to 15.01 (Fig. 4, CBF-A3, CBF-A5, CBF-A10, CBF-A13, CBF-A14, Table 4, Additional file 8:Table S3). With an LD of CBF-A15, Tacr7, VRN-B3, VRN-A1, VRN-D1, PPD-B1 r = 0.92 to 1, the statistical significances of associ- and PPD-D1) were included in further analysis. In ated SNPs and indels from the five CBF genes on the SNP/indel association, 27 statistically significant chromosome 5A (CBF-A3, CBF-A10, CBF-A13, (P < 0.05) polymorphic sites (21 SNPs and six indels) CBF-A14 and CBF-A15)isveryhigh. TheLDplotof in six candidate genes were identified (Table 4). Four all used SNPs/indels for association calculation is SNPs and two indels were located in the promoter showninFig. 5. Table 3 Polymorphic candidate genes, location of polymorphisms, amino acid change and haplotype diversity Gene Chr. Acces-sions Poly-morphic SNPs in-dels Promotor Intron Exon 3` UTR Non- Non-syno- Syno-nymous Haplo-types Hd sites coding nymous CBF-A3 5A 235 4 4 0 0 0 4 0 0 3 1 2 0.30 CBF-A5 7A 235 6 5 1 1 0 5 0 1 3 2 4 0.54 CBF-A10 5A 235 2 2 0 0 2 0 0 1 1 2 0.30 CBF-A13 5A 235 6 4 2 1 0 5 0 2 4 0 3 0.30 CBF-A14 5A 235 6 5 1 5 0 1 0 5 0 1 3 0.30 CBF-A15 5A 235 7 6 1 0 0 7 0 0 5 2 2 0.30 CBF-A18 6A 235 37 34 3 1 0 25 11 12 12 13 3 0.20 VRN-A1 5A 235 15 10 5 7 6 2 0 13 2 0 6 0.19 VRN-B1 5B 235 24 21 3 20 4 0 0 24 0 0 5 0.07 VRN-D1 5D 235 2 2 0 0 1 0 1 2 0 0 3 0.14 VRN-B3 7B 235 1 1 0 0 0 1 0 0 1 0 3 0.50 Cab 5A 235 28 24 4 5 0 12 11 16 12 0 6 0.64 Tacr7 2B 235 3 2 1 0 0 1 2 2 1 0 3 0.40 PPD-B1 2B 235 5 5 0 1 0 4 0 1 3 1 6 0.37 PPD-D1 2D 235 109 97 12 1 53 55 0 54 22 33 6 0.68 total 255 222 33 42 64 124 25 132 69 54 58 Chr. chromosome, SNP single nucleotide polymorphism, indel insertion-deletion, UTR untranslated region, Hd Haplotype Diversity Number of polymorphic sites Babben et al. BMC Genomics (2018) 19:409 Page 9 of 24 Fig. 3 Population structure of 235 wheat cultivars based on 249 SNPs. Each individual is represented by a single vertical line that is partitioned into Q colored segments (Q = 3) in the x-axis. The y-axis illustrated the Q-value Linkage disequilibrium (LD) and germplasm origin CBF-A10, CBF-A13, CBF-A14, CBF-A15, VRN-A1), 6A Out of 235 studied genotypes, 116 originate from (CBF-A18) and 7A (CBF-A5). The allelic effects of asso- Europe, 38 from North America and 81 from Asia. After ciated haplotypes for winter survival ranged from 0.68% MAF selection, 51 polymorphic sites were identified in to 33.95 (Fig. 6, Table 5, Additional file 8: Table S3). the genotypes from Europe, 88 in those derived from North America and 25 in Asian genotypes. All three dif- In silico sequence analysis ferent sub-populations show that the CBF genes on C-repeat binding factors (CBFs) chromosome 5A are in a very high LD, and are sepa- To validate and annotate obtained sequences of candi- rated from other 5A located genes, i.e. VRN-A1 and date genes, an in silico analysis was performed. NCBI Cab. All non-5A candidate genes show a low LD to each protein BLASTX of CBF AA sequences results in nine other in all sub-populations. homologues from five related and four unrelated species (Additional file 9: Table S4). For the CBFs (CBF-A3, Haplotype association analysis CBF-A5, CBF-A10, CBF-A13, CBF-A14, CBF-A15, A set of 44 haplotypes and components of haplotypes CBF-A18) that showed a significant association to FT, from 15 candidate genes were identified. After MAF se- the AP2/EREBP transcription factor domain was identi- lection, 32 haplotypes (six promoter haplotypes, 12 exon fied, which is predicted to encode a protein structure haplotypes, one intron haplotype, three 3` UTR haplo- with three β-strands and one α-helix (2gcc). The param- types and 10 whole gene haplotypes) in 14 candidate eters of protein structure modelling are shown in Table 6. genes (Cab, CBF-A3, CBF-A5, CBF-A10, CBF-A13, The CBF-A13_1 haplotype protein shows a low P-value − 4 CBF-A14, CBF-A15, CBF-A18, Tacr7, VRN-B3, VRN-A1, of 4.79 × 10 , and a high uGDT of 51, and a uSeqId of VRN-D1, PPD-B1 and PPD-D1) were used for haplotype 15 extent. Protein structure of CBF-A13_2 could not be analysis. In the haplotype association analysis, 22 haplo- predicted. In all remaining cases protein structure types (five promoter haplotypes, nine exon haplotypes, models were of high quality. Additionally, the PKK/ one intron haplotype and seven whole gene haplotypes) RPAGRxKFxETRHP and DSAWR motifs were identified, in ten candidate genes revealed significant associations which are typical features of CBF proteins. The AA se- (P < 0.05) (Table 5). CBF-A18 is the only exon haplotype quences between two CBF-A3 haplotypes revealed three which is not significantly associated to FT. The same ap- AA substitutions. The CBF-A3_SNP2 [C/G] on CDS site plies for promoter haplotypes of CBF-A5, CBF-A13, 263 leads to an Ala/Gly change, CBF-A3_SNP3 [A/G] CBF-A14, PPD-B1 and PPD-D1, intron haplotype of on CDS site 367 to a Ser/Gly change and CBF-A3_SNP4 PPD-D1 and the whole gene haplotypes of CBF-A5, [C/T] on CDS site 407 to a Ser/Phe change. The AA CBF-A13, CBF-A14, CBF-A18, PPD-B1, PPD-D1 and change of CBF-A3_SNP2 is located in the α helix of the VRN-A1. The associated genes are located on chromo- AP2 domain. The CBF-A3_SNP3 and CBF-A3_SNP4 is some 2B (PPD-B1), 2D (PPD-D1), 5A (CBF-A3, located upstream of the identified domains/motifs. The Babben et al. BMC Genomics (2018) 19:409 Page 10 of 24 Table 4 Statistics of significantly associated SNPs and indels Gene/Polymorphism name CDS and promotor site P-value -Log of P Polymorphism Effect in % Observations CBF-A3_SNP1 222 6.28E-10 9.20 A C 15.01 0.00 193 42 CBF-A3_SNP2 263 6.28E-10 9.20 C G 15.01 0.00 193 42 CBF-A3_SNP3 367 6.28E-10 9.20 A G 15.01 0.00 193 42 CBF-A3_SNP4 407 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A5_indel1 −83 6.00E-3 2.22 C – −4.52 0.00 141 84 CBF-A10_SNP1 471 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A10_SNP2 518 6.28E-10 9.20 G C 15.01 0.00 193 42 CBF-A13_SNP1 −11 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A13_indel3 19 6.28E-10 9.20 T – 15.01 0.00 193 42 CBF-A13_SNP2 92 6.28E-10 9.20 G A 15.01 0.00 193 42 CBF-A13_indel2 133 6.28E-10 9.20 – C 15.01 0.00 193 42 134 6.28E-10 9.20 – G 15.01 0.00 193 42 135 6.28E-10 9.20 – T 15.01 0.00 193 42 136 6.28E-10 9.20 – G 15.01 0.00 193 42 137 6.28E-10 9.20 – C 15.01 0.00 193 42 138 6.28E-10 9.20 – G 15.01 0.00 193 42 139 6.28E-10 9.20 – G 15.01 0.00 193 42 140 6.28E-10 9.20 – C A 15.22 0.00 6.75 193 41 1 141 6.28E-10 9.20 – G 15.01 0.00 193 42 142 6.28E-10 9.20 – C 15.01 0.00 193 42 143 6.28E-10 9.20 – A 15.01 0.00 193 42 144 6.28E-10 9.20 – G 15.01 0.00 193 42 145 6.28E-10 9.20 – G 15.01 0.00 193 42 146 6.28E-10 9.20 – G 15.01 0.00 193 42 147 6.28E-10 9.20 – G 15.01 0.00 193 42 148 6.28E-10 9.20 – C 15.01 0.00 193 42 149 6.28E-10 9.20 – A 15.01 0.00 193 42 150 6.28E-10 9.20 – A 15.01 0.00 193 42 151 6.28E-10 9.20 – C 15.01 0.00 193 42 152 6.28E-10 9.20 – G 15.01 0.00 193 42 153 6.28E-10 9.20 – C 15.01 0.00 193 42 154 6.28E-10 9.20 – G 15.01 0.00 193 42 155 4.00E-10 9.40 – G 15.01 0.00 193 42 156 6.28E-10 9.20 – G 15.01 0.00 193 42 157 6.28E-10 9.20 – G 15.01 0.00 193 42 158 6.28E-10 9.20 – C 15.01 0.00 193 42 159 6.28E-10 9.20 – G 15.01 0.00 193 42 160 6.28E-10 9.20 – G 15.01 0.00 193 42 161 6.28E-10 9.20 – T 15.01 0.00 193 42 162 6.28E-10 9.20 – G 15.01 0.00 193 42 163 6.28E-10 9.20 – G 15.01 0.00 193 42 164 6.28E-10 9.20 – G 15.01 0.00 193 42 CBF-A13_SNP3 294 6.28E-10 9.20 C A 15.01 0.00 193 42 Babben et al. BMC Genomics (2018) 19:409 Page 11 of 24 Table 4 Statistics of significantly associated SNPs and indels (Continued) Gene/Polymorphism name CDS and promotor site P-value -Log of P Polymorphism Effect in % Observations CBF-A14_indel1 −506 6.28E-10 9.20 G – 15.01 0.00 193 42 − 505 6.28E-10 9.20 T – 15.01 0.00 193 42 − 504 6.28E-10 9.20 G – 15.01 0.00 193 42 −503 6.28E-10 9.20 A – 15.01 0.00 193 42 − 502 6.28E-10 9.20 G – 15.01 0.00 193 42 −501 6.28E-10 9.20 T – 15.01 0.00 193 42 −500 6.28E-10 9.20 G – 15.01 0.00 193 42 − 499 6.28E-10 9.20 T – 15.01 0.00 193 42 −498 6.28E-10 9.20 G – 15.01 0.00 193 42 − 497 6.28E-10 9.20 A – 15.01 0.00 193 42 −496 6.28E-10 9.20 G – 15.01 0.00 193 42 − 495 6.28E-10 9.20 T – 15.01 0.00 193 42 CBF-A14_SNP1 − 449 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A14_SNP2 − 158 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A14_SNP3 −53 6.57E-10 9.18 C G 15.01 0.00 193 42 CBF-A14_SNP4 576 6.28E-10 9.20 G A 15.01 0.00 193 42 CBF-A15_SNP1 84 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A15_SNP2 243 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A15_SNP3 293 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A15_SNP4 397 6.28E-10 9.20 G A 15.01 0.00 193 42 CBF-A15_indel1 503 6.28E-10 9.20 T – 15.01 0.00 193 42 504 6.28E-10 9.20 C – 15.01 0.00 193 42 505 6.28E-10 9.20 G – 15.01 0.00 193 42 506 6.28E-10 9.20 T – 15.01 0.00 193 42 507 6.28E-10 9.20 C – 15.01 0.00 193 42 508 6.28E-10 9.20 G – 15.01 0.00 193 42 509 6.28E-10 9.20 T – 15.01 0.00 193 42 510 6.28E-10 9.20 C – 15.01 0.00 193 42 511 6.28E-10 9.20 G – 15.01 0.00 193 42 CBF-A15_SNP5 694 6.28E-10 9.20 A G 15.01 0.00 193 42 CBF-A15_SNP6 716 6.28E-10 9.20 C G 15.01 0.00 193 42 PPD-D1_indel1 1266 3.77E-2 1.42 – C 3.96 0.00 68 166 1267 3.77E-2 1.42 – G 3.96 0.00 68 166 1268 3.77E-2 1.42 – T 3.96 0.00 68 166 1269 3.77E-2 1.42 – C 3.96 0.00 68 166 1270 3.77E-2 1.42 – G 3.96 0.00 68 166 VRN-A1_SNP1 349 3.03E-4 3.52 C T Y 0.00 1.35 0.11 24 2 209 VRN-B3_SNP1 19 3.76E-2 1.43 C G 3.05 0.00 105 128 CDS coding DNA sequence, P probability, Log logarithm number of genotypes per allele AA site of CBF-A3_SNP2 is significant for negative se- His/Arg change which are located in the β strand 3 or lection (Fig. 7). The AA haplotypes of CBF-A5 show the α helix of the AP2 domain, respectively. Addition- three AA changes, which are not significantly associated ally, CBF-A5_SNP4 site underlies negative selection in the SNP/indel analysis. In contrast CBF-A5_SNP2 [C/ (Additional file 10: Figure S5). The AA haplotypes of A] resulted in an Arg/Ser and CBF-A5_SNP4 [A/G] in a CBF-A10 show an AA change of Gly/Ala based on Babben et al. BMC Genomics (2018) 19:409 Page 12 of 24 All five AA substitutions regions show no positive or negative selection (Fig. 8a, Additional file 13: Figure S8). The CBF-A14 AA haplotypes do not result in AA changes (Additional file 14: Figure S9). The CBF-A18 haplotypes carry 14 AA substitutions. However, none of these is associated with FT or underlies negative or posi- tive selection (Additional file 15: Figure S10). Vernalisation genes (VRN-A1 and VRN-B3) Nine homologue AA sequences of VRN-A1 and VRN-B3 were identified (Additional file 9: Table S4). The RaptorX analysis of VRN-A1 identified protein structures from MADS-box/MEF2 (myocyte enhancer factor 2) domain (3kov) and MEF2A (1egw), which are located in the same conserved domain. In addition, the keratin-like (K) domain (4ox0) was identified. The complete complex was described by Theissen et al. [102] Fig. 4 Manhattan plot of SNPs/indels in candidate genes for FT. The and Kaufmann et al. [103] with a MADS DNA binding -log10 (P-values) from the association analysis are plotted against (M) domain, a type II transcription factor containing the the respective candidate gene. The red horizontal line indicates the Intervening (I) domain, a Keratin-like coiled-coil (K) significance threshold at P < 0.05. For better visualisation, the domain and a C-terminal (C) domain (MIKC-type). The successive genes are shown in alternating black and green colours parameters of protein structure modelling are shown in Table 6. The VRN-A1_SNP1 [C/T/Y] on CDS site 349 CBF-A10_SNP2 [G/C] on CDS site 518. This substitu- generated an AA change of Leu/Phe in the K domain tion is upstream of the identified domains/motifs and at between α helix three and α helix four. Furthermore, in a site not showing positive or negative selection the region of this AA substitution, no positive or nega- (Additional file 11: Figure S6). The haplotypes of tive selection was identified (Fig. 8b, Additional file 16: CBF-A13 show completely different AA sequences. The Figure S11). single bp deletion on CDS site 19 of CBF-A13_indel re- The association study revealed that VRN-B3 is sig- sults in a frame shift that changes every AA from the nificantly associated with FT only in the SNP/indel seventh AA onwards. Furthermore, a stop codon at pos- analysis. The protein prediction analysis identified an ition 74 AAs was detected. The haplotype one shows a Hd3A protein (3axy), which is a mobile flowering sig- 32 bp deletion at CDS site 133 to 164 (CBF-A13_indel2). nal in rice (Table 6). This flowering time family pro- This deletion is localised in the AP2 domain and causes tein contains four α helices, seven β strands and one a stop codon after 135 AAs. In summary, haplotype one segment B [104]. The VRN-B3_SNP1 [C/G] on CDS of CBF-A13 shows a higher similarity to homologue AA site 19 generates an AA change of His/Asp directly sequences of related species and the AP2 domain than before the start of the α helix 1. Furthermore, this haplotype two. Due to the short AA sequences of both site is subject to significant negative selection (Fig. 8c, CBF-A13 haplotypes and low similarity to homologue Additional file 17: Figure S12). AA sequences, no assumptions about positive or negative selection can be made (Additional file 12: Figure S7). Photoperiod response genes (PPD-B1 and PPD-D1) The AA analysis of two CBF-A15 haplotypes revealed The association study revealed that the gene PPD-B1 is four AA changes and one AA indel. A nine-nucleotide significantly associated to FT only in the haplotype ana- insertion within the coding region (sites 503 to 511) re- lysis and PPD-D1 in both analyses. The NCBI protein sulted in three additional Ser residues in the haplotype BLASTX of the PPD-B1 and PPD-D1 AA haplotype se- with CBF_indel. The AA change Ala/Val caused by quences showed nine homologues (Additional file 9: CBF-A15_SNP3 [C/T] on CDS site 293 is located in the Table S4). On the basis of the protein BLAST analysis, AP2 domain. The CBF-A15_SNP4 [G/A] on CDS site the Pseudo-Receiver domain and the CONSTANS motif 397 resulted in an Ala/Thr AA change. The last AA of the pseudo-response regulator (PRR) protein family changes are Ser/Gly caused by CBF-A15_SNP5 [A/G] on were identified [39, 40]. In the RaptorX analysis the pro- CDS site 694 and Ser/Trp caused by CBF-A15_SNP6 tein models of a putative response regulator domain [C/G] on CDS site 716. The CBF-A15_SNP4, (3t6k) and response regulator receiver domain (3jte) CBF-A15_SNP5, CBF-A15_SNP6 and CBF-A15_inde1 were identified (Table 6). These domains contain five α are located upstream of the identified domains/motifs. helices and five β strands. The PPD-B1 haplotype AA Babben et al. BMC Genomics (2018) 19:409 Page 13 of 24 Fig. 5 LD plot of all used SNPs and indels for association analysis. On the left side and at the top the genes and chromosomes with polymorphic sites are shown. Each coloured square in the below triangle represents the intensity of LD expressed by P-values for each pairwise comparison between polymorphic sites. Each coloured square in the top triangle represents the intensity of LD expressed by r for each pairwise comparison between polymorphic sites. On the right side the legend for r values and P-values are shown exhibited three AA changes. The PPD-B1_SNP2 [A/G] other AA changes between haplotype two and three did on CDS site 304 generated an AA change of Arg/Gly not show significant association in the SNP/indel associ- and PPD-B1_SNP3 [A/G] on CDS site 368 generated an ation study (Additional file 19: Figure S14). AA change of Asn/Asp. Both are located in the Pseudo-Receiver domain. The PPD-B1_SNP2 is located In silico promoter analysis in the α helix 3 and PPD-B1_SNP3 between β strand 4 The significantly associated CBF-A5_inde1 [C/−], which and α helix 4. The third AA change from Asp to Asn is is located 83 bp upstream of the start codon, entails no caused by PPD-B1_SNP5 [G/A] on CDS site 623. All changes on the promoter regulatory sites and has no in- three AA substitution sites show no significant positive fluence on the transcription of the CBF-A5 gene. Also, or negative selection (Fig. 8d, Additional file 18: Figure the associated polymorphic sites CBF-A13_SNP1 of the S13). The associated PPD-D1_indel1 on CDS site 1266 CBF-A13 promotor and the sites CBF-A14_indel1, to 1270 bp produced a stop codon on AA position 470. CBF-A14_SNP1, CBF-A14_SNP2 and CBF-A14_SNP3 Therefore, haplotype one has no CONSTANS motif. All of the CBF-A14 promoter show no regulatory site Babben et al. BMC Genomics (2018) 19:409 Page 14 of 24 Table 5 Statistics of haplotypes significantly associated to FT Haplotype name Chr. P-value -Log of P Haplotype FT effect of haplotypes in % Observations CBF-A3 ex. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A5 pro. 7A 7.17E-3 2.14 1 2 4.44 0.00 82 143 CBF-A5 ex. 7A 1.58E-2 1.80 1 2 3 19.51 18.09 0.00 204 28 3 CBF-A5 7A 5.16E-4 3.29 1 2 3 4 24.18 19.29 18.88 0.00 55 27 140 3 CBF-A10 ex. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A13 ex. 5A 1.57E-10 9.80 1 2 3 15.22 0.00 8.47 193 41 1 CBF-A13 pro. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A13 5A 1.57E-10 9.80 1 2 3 15.22 0.00 8.47 193 41 1 CBF-A14 pro. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A14 ex. 5A 1.90E-10 9.72 1 2 3 17.01 2.11 0.00 193 40 2 CBF-A14 5A 1.90E-10 9.72 1 2 3 17.01 2.11 0.00 193 40 2 CBF-A15 ex. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A18 6A 6.88E-3 2.16 1 2 3 9.32 6.66 0.00 209 9 17 PPD-B1 pro. 2B 4.23E-2 1.37 1 2 5.50 0.00 213 22 PPD-B1 ex. 2B 3.69E-7 6.43 1 2 3 4 18.97 0.00 28.28 33.39 3 5 207 20 PPD-B1 2B 1.31E-7 6.88 1 2 3 4 5 19.03 0.00 33.95 29.07 22.63 3 5 20 185 22 PPD-D1 pro. 2D 6.95E-3 2.16 1 2 3 14.38 10.75 0.00 141 80 4 PPD-D1 in. 2D 3.93E-2 1.41 1 2 3 18.66 19.56 0.00 162 69 3 PPD-D1 ex. 2D 7.05E-3 2.15 1 2 3 21.80 18.18 0.00 68 163 3 PPD-D1 2D 3.78E-3 2.42 1 2 3 4 5 6 21.47 3.77 19.28 16.13 0.00 20.55 68 3 69 90 3 1 VRN-A1 ex. 5A 2.57E-4 3.59 1 2 3 4 0.00 0.68 13.12 11.89 9 15 2 209 VRN-A1 5A 2.87E-3 2.54 1 2 3 4 5 6 14.13 15.90 2.26 15.30 0.00 8.90 207 2 13 2 6 1 Chr. chromosome, ex. exon, pro. Promotor, in. intron, P probability, Log logarithm, FT frost tolerance number of genotypes per allele changes which are indicative for a modification of the gene transcription. Discussion After hybridisation, domestication and breeding activities have shaped the genome of bread wheat and reduced the level of genetic diversity which is nowadays the major limiting factor in breeding of cultivars resistant to biotic and abiotic stresses [105]. Therefore, sequencing of candidate genes in large collections is a tool to rediscover hidden genetic diversity and use this in breeding. LD and diversity The genotype group from Asia shows a very small diver- sity with 25 polymorphic sites out of 81 genotypes. That implies that these genotypes are very close to each other based on the analyzed genes. The polymorphic sites of Fig. 6 Manhattan plot of haplotypes based on FT candidate genes. the different chromosomes show a high LD among The -log10 (P-values) from the association analysis are plotted themselves but not between the different chromosomes. against candidate gene. The red horizontal line indicates the significance threshold at P < 0.05. The squares show the promotor, Consequently, the polymorphic sites which are in a high triangle the exon, circles the intron, diamonds the 3’ UTR and stars LD are inherited together. In contrast, genotypes from the whole haplotypes. For better visualisation, the successive genes North America show a very high diversity with 88 poly- are shown in alternating black and green colours morphic sites in 38 genotypes. That implies that these Babben et al. BMC Genomics (2018) 19:409 Page 15 of 24 Table 6 Protein structure modelling of associated genes Protein haplotypes P-value uGDT (GDT) uSeqId (SeqId) Score Domain/motif Literature −4 CBF-A3_1 3.03E 71 (62) 30 (26) 51 2gcc (AP2) [28] −4 CBF-A3_2 1.52E 71 (61) 29 (25) 51 2gcc (AP2) [28] −4 CBF-A5_1 3.22E 69 (67) 30 (29) 51 2gcc (AP2) [28] −4 CBF-A5_2 5.75E 56 (54) 30 (29) 54 2gcc (AP2) [28] −4 CBF-A5_3 9.71E 57 (56) 29 (28) 56 2gcc (AP2) [28] −4 CBF-A10_1 2.78E 70 (66) 31 (29) 51 2gcc (AP2) [28] −4 CBF-A10_2 2.88E 71 (67) 31 (29) 50 2gcc (AP2) [28] −4 CBF-A13_1 4.79E 51 (38) 15 (11) 36 2gcc (AP2) [28] CBF-A13_2 n.a. n.a. n.a. n.a. n.a. n.a. −4 CBF-A14_1 1.97E 71 (69) 29 (28) 52 2gcc (AP2) [28] −4 CBF-A14_2 1.97E 71 (69) 29 (28) 52 2gcc (AP2) [28] −4 CBF-A15_1 3.80E 70 (66) 29 (27) 50 2gcc (AP2) [28] −4 CBF-A15_2 2.95E 70 (70) 29 (29) 49 2gcc (AP2) [28] −3 CBF-A18_1 1.33E 55 (49) 29 (26) 52 2gcc (AP2) [28] − 4 CBF-A18_2 8.72E 55 (49) 29 (26) 54 2gcc (AP2) [28] − 4 PPD-B1 1.24E 105(62) 32(19) 130 3t6k (putative response regulator domain) n.a. −5 8.74E 101(60) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-B2 1.21E 104(61) 32(19) 129 3t6k (putative response regulator domain) n.a. −5 8.12E 101(59) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-B3 1.06E 103(60) 31(18) 129 3t6k (putative response regulator domain) n.a. −5 7.21E 102(59) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-B4 1.28E 104(61) 32(19) 130 3t6k (putative response regulator domain) n.a. −5 8.98E 102(59) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-D1_1 1.05E 104 (47) 32 (15) 131 3t6k (putative response regulator domain) n.a. −5 7.11E 102 (46) 27 (12) 136 3jte (response regulator receiver domain) [111] −4 PPD-D1_2 1.14E 105 (63) 32 (19) 130 3t6k (putative response regulator domain) n.a. −4 PPD-D1_3 1.16E 104(62) 32 (19) 129 3t6k (putative response regulator domain) n.a. −5 8.07E 101 (60) 27 (16) 134 3jte (response regulator receiver domain) [111] − 3 VRN-A1_1 8.02E 84 (83) 39 (39) 84 4ox0 (K domain) [110] −4 1.02E 78 (103) 38 (50) 53 3kov (MADS domain/I domain) [112] −4 1.30E 75 (98) 38 (50) 52 1egw (MADS domain/I domain) [113] −3 VRN-A1_2 8.47E 83 (82) 39 (39) 84 4ox0 (K domain) [110] −5 7.90E 73 (103) 38 (50) 54 3kov (MADS domain/I domain) [112] −5 9.11E 74 (98) 38 (50) 53 1egw (MADS domain/I domain) [113] −3 VRN-A1_3 7.95E 84 (83) 38 (38) 84 4ox0 (K domain) [110] −5 7.06E 78 (103) 38 (50) 54 3kov (MADS domain/I domain) [112] −4 1.08E 75 (99) 38 (50) 53 1egw (MADS domain/I domain) [113] VRN-B3_1 7.92E- 154 (87) 147 (83) 166 3axy (Hd3A protein) [104] VRN-B3_2 1.66E- 154 (87) 148 (84) 168 3axy (Hd3A protein) [104] P: probability, uGDT: unnormalized GDT (Global Distance Test) score, GDT: calculated as uGDT divided by the protein (or domain) length and multiplied by a 100, uSeqID: number of identical residues in the alignment, SeqID: uSeqID normalized by the protein (or domain) sequence length and multiplied by 100, n.a.: not available Babben et al. BMC Genomics (2018) 19:409 Page 16 of 24 Fig. 7 Amino acid alignment and nucleotide divergence rates (dN/dS) of CBF-A3 gene and homologous amino acid sequences. Shown are alignments of two haplotype AA sequences of CBF-A3 and nine homologue plant AA sequences. The numbers above the alignment indicate the sites of AAs. The black line above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the AP2 domain, the black arrows the β-strands and the black spirals the α-helices. The red arrows label the AA changes based on significantly associated SNPs or indels. The plot below the alignment shows the nucleotide divergence rates (dN/dS). The black line describes the dN/dS ratio. The red dots between alignment and plot indicate sites with significant negative selection (P < 0.05) genotypes are very distant to each other based on the 7A and 6A and do not belong to the FR-A2 locus. Add- analyzed genes. The gene cluster of CBFs (CBF-A3, itionally, all the SNPs identified in CBFs on chromosome CBF-A10, CBF-A13, CBF-A14 and CBF-A15)on 5A are in a very high LD (r = 0.92 to 1), indicating chromosome 5A shows a very high LD of r = 1. Conse- localisation at the same chromosomal region without or quently, this gene cluster has not been divided by mei- very rare recombination. This result confirms previous otic events but separated from Cab also located on knowledge of strong linkage of CBF genes and empha- chromosome 5A. An intermediate diversity was detected sises the importance of the FR-A2 locus in frost in genotypes from Europe with 51 polymorphic sites in response. On the other hand, results reveal that not all 116 genotypes. The LD analysis shows a very high LD members of the CBF family are involved in FT [50, 108]. within three blocks located on chromosome 5A but not Based on the occurrence of only two haplotypes of the between. Consequently the gene cluster (CBF-A3, CBF genes and the fact that they are very closely linked, CBF-A10, CBF-A13, CBF-A14 and CBF-A15), Cab and genotypes can be divided into two groups. Group one VRN-A1 are inherited independently. with 193 genotypes which shows 15% better winter sur- vival in comparison to group two (about 42 genotypes) Association study (Table 4). Therefore, breeding efforts in combining two FT is a highly complex and important trait of winter FT haplotypes are highly desirable towards creating elite wheat that is usually studied using QTL and expres- cultivars exhibiting high FT. sion profiling approaches [47, 106, 107]. In this study The CBF-A3_SNP2 [C/G] which is significantly − 10 we are presenting, to our best knowledge, the first associated with FT (P =6.28×10 ) results in Ala/ large scale candidate gene based association analysis Gly AA change. This AA is localised in the α helix of of FT in wheat. the AP2 domain and shows a high conservation of Accordingly, we identified significantly associated Ala in the AA alignment with homologues. Only the polymorphisms (SNPs/indels) in eleven studied genes as AA haplotype two had a Gly which corresponds to a well as respective haplotypes. Eight of these were de- significant reduction in winter survival of 15.01% in tected in both approaches (CBF-A3, CBF-A5, CBF-A10, the SNP/indel association study. Also haplotype one CBF-A13, CBF-A14, CBF-A15, PPD-D1 and VRN-A1). shows a 15.01% better winter survival compared to Out of the seven CBF genes, which are members of a haplotype two with Gly in the haplotype association large gene family that were investigated in this study, six study (Tables 4 and 5). Allen et al. [28] describe that revealed FT association in the SNP/indel and haplotype Ala contributes to the stabilisation of the protein method. Associated polymorphisms at the five candidate structure by its hydrophobic side chain. Through the genes CBF-A3, CBF-A10, CBF-A13, CBF-A14 and AA change from Ala to Gly it is possible that the Gly CBF-A15 are located at the FR-A2 locus [46, 47]on of AA haplotype two loses or impairs the functional- wheat chromosome 5A. The other two CBF members, ity as a transcription factor. Additionally, the dN/dS i.e. CBF-A5 and CBF-A18, are located on chromosomes ratio analysis shows that the CBF-A3_SNP2 site Babben et al. BMC Genomics (2018) 19:409 Page 17 of 24 ab cd Fig. 8 Nucleotide divergence rates (dN/dS) of CBF-A15, VRN-A1, VRN-B3 and PPD-B1 genes and homologue amino acid sequences. a Nucleotide divergence rates (dN/dS) between the identified haplotypes of CBF-A15, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicates the PKK/RPAGRxKFxETRHP motif and DSAWR, the red line the AP2 domain. The black arrows and clamp label the position of AA changes based on significantly associated SNPs and indels. The black line shows the dN/dS ratio. The red dots illustrate sites underlying significant negative selection (P < 0.05). b Nucleotide divergence rates (dN/dS) between the identified haplotypes of VRN-A1, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicates the MADS box and K domain, the red line the I and C domain. The black arrow labels the position of AA change based on significantly associated SNPs. The black line describes the dN/dS ratio. The red dots indicate sites with significant negative selection (P < 0.05). c Illustrated are nucleotide divergence rates (dN/dS) between the identified haplotypes of VRN-B3, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicatesthe segment B, the red line the flowering time (FT)-family protein. The black arrow labels the position of AA change based on significant associated SNPs. The black line describes the dN/dS ratio. The red dots illustrate sites with significant negative selection (P < 0.05). d Nucleotide divergence rates (dN/dS) between the identified haplotypes of PPD-B1, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicates the Pseudo-Reciver domain, the red line the CONSTANS motif. The black arrows label the position of AA changes significantly associated SNPs. The black line describes the dN/dS ratio. The red dots indicate sites with significant negative selection (P < 0.05) underlies negative selection (Fig. 7), reflecting the to haplotype one respectively two. Due to the fact that association effects. In detail, haplotype two with an the exon haplotype three consists only of three geno- Gly on CBF-A3_SNP2 site shows lower winter sur- types it was not identified in the SNP/indel association vival of 15.01% compared to haplotype one. analysis (Table 5, Additional file 10: Figure S5). The SNP/indel association study of CBF-A5 exon The significantly associated SNPs and indels of SNPs revealed no significant association. In contrast, the CBF-A13 may disorganise the protein structure very exon haplotype is significantly associated with a reduced strongly. The one bp CBF-A13_indel1 of haplotype two winter survival of 19.51% respectively 18.01% of exon changed the complete AA sequences from the seventh haplotype three (Ser on site CBF-A5_SNP2) compared AA onwards and a stop codon is present after 74 AAs. Babben et al. BMC Genomics (2018) 19:409 Page 18 of 24 Since only six AAs are identical to the reference protein conclusion that SNP CBF-A3_SNP2 is the most inter- sequence, a complete loss of function may be assumed. esting CBF allele for FT improvement and at the The 32 bp deletion (CBF-A13_indel2) of haplotype one same time CBF-A15_SNP3 is the second most im- also generates a frame shift from AA 45 on and a stop portant one. Further details remain to be revealed by codon after 135 AAs. Hence, the AA haplotype two ex- investigations in the future such as complementation hibits no similarity and the AA haplotype one shows a of promising alleles in spring or winter wheat var- very low uSeqId of 15 but an appropriate P-value of ieties or protein functionality analysis of these alleles. − 4 4.79 × 10 compared to the AP2 domain. Therefore, it Identification of the VRN-A1_SNP1 [C/T/Y] revealed is most likely that both proteins are non-functional. that the genotypes with the base T and the genotypes However, haplotype one shows about 6.20% higher win- with the ambiguous nucleotide Y increase FT by 1.35% ter survival in the SNP/indel association study and in respectively 0.11% in comparison to the genotypes with the haplotype association study (9.80%) compared to the base C. The exon haplotype association study shows haplotype two. Consequently, it is possible that the pro- stronger effects. In detail, haplotype three (T) and haplo- teins of AA haplotype one have an increased transcrip- type four (Y) increase FT by 12.44% respectively 11.21%, tion factor efficiency compared to the proteins of AA in comparison to haplotype two (C). Chen et al. [53] and haplotype two. On the other hand the association may Eagles et al. [55] also identified the C and Y allele and be due to the very high LD within the FR-A2 locus. Diaz et al. [109] and Zhu et al. [107] described that the Regarding the two AA haplotypes of CBF-A15 an AA C allele is associated with a lower VRN-A1 copy number change in the AP2 domain from Ala to Val was identified and the Y allele with a higher copy number. In addition, which matched to the statistically significant associated Zhu et al. [107] described that an increase of the − 10 CBF-A15_SNP3 [C/T] with a P-value of 6.28 × 10 .Ala VRN-A1 copy number is associated with improved FT is conserved at this position at the nine homologue AA se- among the FR-A2-T allele of the CBF12 and CBF15 quences. The haplotype two which comprises Val in the genes but not among the FR-A2-S which are reflected in AA sequence shows 15.01% less winter survival in the our haplotypes one and two, respectively. However, the SNP/indel association study and 15.01% in the haplotype VRN-A1_SNP1 generates an AA change on a Leu con- association study. That indicates a functional loss, al- served side in the K-domain of a MIKC-type transcrip- though Ala as well as Val possess a hydrophobic side chain tion factor between α helix three and α helix four to a and are very similar in molecule size. A functional loss Phe. Puranik et al. [110] showed that this Leu stabilises due to this AA change is unlikely. On the other hand, no the kink region between α helix three and α helix four statistically significant positive or negative selection at this by extensive intra-molecule hydrophobic interactions of site was detected. multiple Leu residues. Both AA have hydrophobic side Only the whole gene haplotype of CBF-A18 is signifi- chains but Phe with its benzene ring strongly differs cantly associated to FT. Due to the MAF selection, no from Leu for its steric requirements. Therefore, it is pos- association study could be performed for all other haplo- sible that the Phe increases the angle between both α type components and polymorphic sites of CBF-A18 helices due to its bulkiness and the attachment to the (Table 5). target sequence of the transcription factor is improved. All other SNPs/indels of the seven CBFs, which re- Additionally the VRN-A1_SNP1 underlies no negative or sulted in AA changes, may be involved in FT but they positive selection. are not located in the highly conserved AP2 domain. The VRN-B3 gene is significantly associated to FT only But, the CBF-A3_SNP4, which is located upstream of in the SNP/indel association study. The polymorphism − 2 the AP2 domain, shows conserved AA sites. This may VRN-B3_SNP1 [C/G] shows a P-value of 3.76 × 10 . play an important regulatory role in FT. The associated This SNP generates a His/Asp AA change. The geno- polymorphisms of CBF-A5 and CBF-A14 are located types with His show a higher winter survival of 3.05% within the promoter and the in silico promoter analysis compared to those containing Asp (Table 4). Addition- of these haplotypes shows no promoter region differ- ally this site shows a high conservation of Asp regarding ences which explain modifications in gene transcription. all nine homologue AA sequences and underlies signifi- This study revealed significantly associated SNPs/ cant negative selection. All this data indicates that indels and haplotypes in seven CBF genes. Out of the VRN-B3 and respective homologous play a role in FT. associated polymorphisms two SNPs were identified, The polymorphisms of the PPD-B1 gene merely show whichresultedinanAA changeinthe highly con- significance in the haplotype association study. The exon served AP2 domain of the CBF-A3 and CBF-A15 haplotype three shows 5.11% better winter survival than protein. Both CBF genes were identified as important haplotype four. The difference between both haplotypes FT genes in Triticum monococcum and Triticum is the PPD-B1_SNP2 [A/G] which generates an Arg/Gly aestivum [47, 59, 61–63]. All this leads to the change in the α helix 3 of the Pseudo-Receiver domain. Babben et al. BMC Genomics (2018) 19:409 Page 19 of 24 The Arg is highly conserved for grasses [39, 40]. There- (PPD-B1_SNP2, PPD-B1_SNP3 and PPD-B1_SNP5) fore, it is possible that the Gly from haplotype three has (Additional file 18: Figure S13). a positive effect on FT. Another AA change in the The associated indel of PPD-D1 has an effect of 3.96% Pseudo-Receiver domain from Asn to Asp is caused by in the SNP/indel association study (Table 4). Haplotype the PPD-B1_SNP3 [A/G]. Asp at this position is highly one (with a deletion) shows 3.62% better winter survival conserved. Only haplotype one and the homologue AA compared to haplotype two in the haplotype association of Triticum aestivum show Asn resulting in a 9.25% study (Table 5). As a result of this deletion, a stop codon decreased winter survival in comparison to haplotype on AA position 470 occurs and the CONSTANS motif is four. The haplotype two, which is associated with missing. The consequence is that the PPD-D1 protein PPD-B1_SNP5 [G/A], and an Asp/Asn AA change shows cannot interact with the CO protein and the flowering 28.28% less winter survival in comparison to haplotype control pathway is interrupted [38]. Furthermore, the four. The most tolerant haplotype four originates from the interaction between the flowering time and FT pathway Asia group. The position of this AA is 51 AAs down- is disturbed (Additional file 19: Figure S14). stream of the Pseudo-Receiver domain and is slightly This study demonstrated polymorphisms significantly conserved. Association of the haplotypes one (three obser- associated with FT and the importance of the AA vations) and two (five observations) are based on very few changes of seven CBF gene family members and the observation and therefore the results should be interpreted VRN-A1, VRN-B3, PPD-B1 and PPD-D1 genes (Fig. 9). with caution. The dN/dS analysis revealed no significant These results may be used to design highly frost tolerant negative or positive selection for all three AA substitutions wheat cultivars via gene engineering or classical Fig. 9 Workflow for identifying associations for frost tolerance in wheat Babben et al. BMC Genomics (2018) 19:409 Page 20 of 24 breeding. To achieve this, six associated genes (CBF-A3, Additional file 7: Figure S4. Principal coordinate analysis of 235 wheat CBF-A15, VRN-A1, VRN-B3, PPD-B1 and PPD-D1) have cultivars. Three sub-populations based on geographical origin were shown in three colors. Blue, red and green indicate the cultivars from to be combined, employing the alleles which show the Europe, North America and Asia, respectively. The black dots indicatethe strongest positive effect in FT. We suggest a wheat culti- mixture gemplasm from three sub-populations. (PDF 42 kb) var with the haplotype one of both CBF genes, haplotype Additional file 8: Table S3. Polymorphic sites and haplotypes of three of PPD-B1, haplotype one of PPD-D1, haplotype association study and their PH-values and FT effects. (XLSX 14 kb) three of VRN-A1 and VRN-B3_SNP1 of VRN-B3 to Additional file 9: Table S4. Identities and e values of FT associated AA sequences and homologous AA sequences. (XLSX 21 kb) create a genotype with the theoretically highest FT Additional file 10: Figure S5. Amino acid alignment and nucleotide regarding the investigated genes. Additional candidate divergence rates (dN/dS) of CBF-A5 and nine homologous amino acid gene based association genetics studies in the field of FT sequences. Shownare alignments of two haplotype AA sequences of should focus on the COR (cold-regulated) genes and CBF-A5 and nine homologous plant AA sequences. The numbers above the alignment indicate the sites of AAs. The black line above the align- proteins. ment illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the AP2 domain, the black arrows the β-strands and the black spiral the Conclusion α-helix. The description of red arrows, black line und red dots is accord- ing to Fig. 6. (PDF 54 kb) In FT research and especially with respect to genome Additional file 11: Figure S6. Amino acid alignment and nucleotide based breeding, it is important to identify genes and al- divergence rates (dN/dS) of CBF-A10 gene and nine homologous amino leles involved in cold response. Based on the phenotypic acid sequences. Alignments of two haplotype AA sequences of CBF-A10 data from German and Russian field trials, this study and nine homologous plant AA sequences. The numbers above the alignment indicate the sites of AAs. The black line above the alignment illustrates the importance of the CBF genes of the FR-A2 illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the locus, VRN-A1, VRN-B3, PPD-B1 and PPD-D1 for FT of AP2 domain, the black arrows the β-strands and the black spiral the α- wheat. Polymorphisms in CBF-A3, CBF-A5, CBF-A10, helix. The description of red arrow, black line and red dots is according to Fig. 6. (PDF 53 kb) CBF-A13, CBF-A14, CBF-A15, CBF-A18, VRN-A1, Additional file 12: Figure S7. Amino acid alignment and nucleotide VRN-B3, PPD-B1 and PPD-D1 were identified which are divergence rates (dN/dS) of CBF-A13 gene and nine homologous amino significantly associated with FT. Besides this, we acid sequences. Illustrated are alignments of two haplotype AA sequences of CBF-A13 and nine homologous plant AA sequences. The detected significantly associated polymorphisms and numbers above the alignment illustrate the sites of AAs. The black line consequential AA substitutions in important cold above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR response protein domains. The results of this study motif, the red line the AP2 domain, the black arrows the β-strands and the black spiral the α-helix. The description of black line and red dots is demonstrated that the candidate based association gen- according to Fig. 6. (PDF 53 kb) etics approach is a very useful method to identify alleles Additional file 13: Figure S8. Amino acid alignment and nucleotide positively contributing to frost tolerance in hexaploid divergence rates (dN/dS) of CBF-A15 gene and nine homologous amino wheat. The FT associated SNPs/indels and haplotypes acid sequences. Illustrated are alignments of two haplotype AA identified may be used for developing diagnostic markers sequences of CBF-A15 and nine homologous plant AA sequences. The numbers above the alignment illustrate the sites of AAs. The black line for marker assisted selection (MAS) for frost tolerance above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR in wheat. motif, the red line the AP2 domain, the black arrows the β-strands and the black spiral the α-helix. The description of red arrows, black line and red dots is according to Fig. 6. (PDF 52 kb) Additional files Additional file 14: Figure S9. Amino acid alignment and nucleotide divergence rates (dN/dS) of CBF-A14 gene and nine homologous amino Additional file 1: Table S1. Plant material, origin, LS means of winter acid sequences. Illustrated are alignments of two haplotype AA sequences survival [%] and winter survival [%] of respective environments. (XLSX 34 kb) of CBF-A14 and nine homologous plant AA sequences. The numbers above Additional file 2: Table S2. PCR and sequencing primers. Shcherban the alignment illustrate the sites of AAs. The black line above the alignment 2 3 4 et al., 2012; Miller et al., 2006; Vagujfalvi et al., 2005; Yan et al., 2004; illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the AP2 5 6 7 8 Beales et al., 2007; Keilwagen et al., 2014; Knox et al., 2008; Babben et domain, the black arrows the β-strands and the black spiral the α-helix. The 9 10 al., 2015; Seki et al., 2011; Li et al., 2011b (XLSX 19 kb) description of black line and red dots is according to Fig. 6.(PDF51 kb) Additional file 3: Data S1. Java script used to extract differences from a Additional file 15: Figure S10. Amino acid alignment and nucleotide multiple sequence alignment (MSA). (TXT 5 kb) divergence rates (dN/dS) of CBF-A18 gene and nine homologous amino Additional file 4: Figure S1. Scatter plot of mean winter survival and acid sequences. Illustrated are alignments of two haplotype AA number of days under − 15 °C from ten environments. (PDF 24 kb) sequences of CBF-A18 and nine homologous plant AA sequences. The numbers above the alignment illustrate the sites of AAs. The black line Additional file 5: Figure S2. PCA plot of mean winter survival and above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR number of days under − 15 °C from ten environments. (PDF 52 kb) motif, the red line the AP2 domain, the black arrows the β-strands and Additional file 6: Figure S3. Localization of 19 candidate genes and the black spiral the α-helix. The description of black line und red dots is corresponding primers in the Chinese Spring Reference assembly v1.0. according to Fig. 6. (PDF 54 kb) Next to the gene names, the chromosome number and physical position Additional file 16: Figure S11. Amino acid alignment and nucleotide of the Chinese Spring reference assembly v1.0 are shown in brackets. For divergence rates (dN/dS) of VRN-A1 gene and nine homologous amino each gene the structure is illustrated below. The black lines indicate the acid sequences. Illustrated are alignments of three haplotype AA genomic sequences, the gray boxes the exons, the black arrows sequences of VRN-A1 nine homologous plant AA sequences. The rightward the forward primers and the black arrows leftwards the reverse numbers above the alignment illustrate the sites of AAs. The black line primers. (PDF 23 kb) Babben et al. BMC Genomics (2018) 19:409 Page 21 of 24 cultivars were deposited to GenBank under accession numbers MH264428- above the alignment illustrates the I domain and C domain, the red line MH264501 to be available in the NCBI Nucleotide database. the MADS box domain and K domain, the black arrows the β-strands and the spirals the α-helices. The description of red arrow, black line and red Authors’ contributions dots is according to Fig. 6. (PDF 79 kb) Conceived and designed the experiments: DP, FO and MK. Provided the Additional file 17: Figure S12. Amino acid alignment and nucleotide experimental material: AB, MK, TP and YC. Performed the experiments: SB, divergence rates (dN/dS) of VRN-B3 gene and nine homologous amino FA, YC, TP, AB and MK. Analysed the data: SB, JK, TB, ES, PJ, SET and DP. acid sequences. Illustrated are alignments of three haplotype AA Wrote the paper: SB, DP and FO. Study design, subject recruitment and sequences of VRN-B3 nine homologous plant AA sequences. The sample preparation: DP, MK and FO. Data interpretation: SB, JK, TB, ES, PJ, numbers above the alignment illustrate the sites of AAs. The black line SET, PK and DP. All authors read and approved the final manuscript. above the alignment illustrates the segment B, the red line the flowering time (FT)-family protein, the black arrows the β-strands and the spirals Ethics approval and consent to participate the α-helices. The description of red arrow, black line and red dots is The material used in this study consisted mostly of registered varieties a/o according to Fig. 6. (PDF 51 kb) lines publicly accessible. For all the accessions analysed no permissions or Additional file 18: Figure S13. Amino acid alignment and nucleotide licenses were needed. divergence rates (dN/dS) of PPD-B1 gene and nine homologous amino acid sequences. Illustrated are alignments of four haplotype AA Competing interests sequences of PPD-B1 and nine homologous plant AA sequences. The The authors declare that they have no competing interests. numbers above the alignment illustrate the sites of AAs. The red line above the alignment illustrates the Pseuodo Receiver domain and COSTANS motif. The description of red arrows, black line and red Publisher’sNote dots is according to Fig. 6. (PDF 8583 kb) Springer Nature remains neutral with regard to jurisdictional claims in Additional file 19: Figure S14. AA alignment and nucleotide published maps and institutional affiliations. divergence rates (dN/dS) of PPD-D1 gene and nine homologous amino acid sequences. Illustrated are alignments of four haplotype AA Author details sequences of PPD-D1 and nine homologous plant AA sequences. The Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, numbers above the alignment illustrate the sites of AAs. The red line Institute for Resistance Research and Stress Tolerance, Erwin-Baur-Str. 27, above the alignment illustrates the Pseudo Receiver domain and 06484 Quedlinburg, Saxony-Anhalt, Germany. Martin Luther University COSTANS motif. The description of black line and red dots is according to Halle-Wittenberg (MLU), Institute of Agricultural and Nutritional Sciences, Fig. 6.(PDF 7539 kb) Betty-Heimann-Str. 5, 06120 Halle (Saale), Saxony-Anhalt, Germany. Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, Institute for Biosafety in Plant Biotechnology, Erwin-Baur-Str. 27, 06484 Quedlinburg, Abbreviations Saxony-Anhalt, Germany. Deutsche Saatveredelung AG (DSV), Weißenburger AA: Amino acid; ANCOVA: Analysis of covariance; AP2/EREBP: APETALA2/ Str. 5, 59557 Lippstadt, Nordrhein-Westfalen, Germany. Leibniz Institute of Ethylene response element binding protein; BLAST: Basic local alignment Plant Genetics and Crop Plant Research (IPK), Resources Genetics and search tool; CAB: Calcium-binding EF-hand family protein-like; CBF: C-repeat Reproduction, Correnstraße 3, 06466 Seeland OT Gatersleben, Saxony-Anhalt, binding factor; CDS: Coding DNA sequence; CO: CONSTANS; COR/LEA: Cold- Germany. Max Planck Institute for Biology of Ageing, Joseph-Stelzmann-Str. responsive/late embryogenesis–abundant; CRT/DRE: C-repeat/dehydration- 9B, 50931 Cologne, Nordrhein-Westfalen, Germany. Agrophysical Research responsive element; CV: Coefficient of variation; DEM: Defective embryo and Institute (AFI), Grazhdanskii prosp. 14, 195220 St. Petersburg, Russia. Institute meristems; DH: Doubled haploid; DN/DS: Nucleotid divergence rate; FR: Frost of Cytology and Genetics of Siberian Branch of the Russian Academy of resistance; FT: Frost tolerance; GBP: Giga-base pairs; GLM: General linear Sciences, Prospekt Lavrentyeva 10, 630090 Novosibirsk, Russia. Saaten-Union model; GWAS: Genome-wide association study; HD: Haplotype diversity; Biotec GmbH, Hovedisser Str. 94, 33818 Leopoldshoehe, Nordrhein-Westfalen, ICE: Inducer of CBF expression; IFVCNS: Institute of Field and Vegetable Germany. Martin Luther University Halle-Wittenberg (MLU), Institute of Crops; INDEL: Insertion-deletion; IWGSC: International wheat genome Agricultural and Nutritional Sciences, Betty-Heimann-Str. 3, 06120 Halle sequencing consortium; KBP: Kilo-base pairs; LD: Linkage disequilibrium; (Saale), Saxony-Anhalt, Germany. LS: Least-Squares; M: Million; MAF: Minor allele frequency; MAFFT: Multiple Alignment tool Fast Fourier Transform; MCMC: Markov Chain Monte Carlo; Received: 15 December 2017 Accepted: 14 May 2018 MEF: Myocyte enhancer factor; MLM: Mixed linear model; MSA: Multiple sequence alignment; NGS: Next generation sequencing; NT: Nulli-tetrasomic; PCA: Principal component analysis; PCOA: Principal coordinate analysis; References PPD: Photoperiod; PRR: Pseudo-response regulator; QTL: Quantitative trait 1. Food and Agriculture Organization of the United Nations. http://www.fao. locus; RV: Reaction volume; SNP: Single nucleotide polymorphism; T: Tonne; org/home/en/ (2014). Accessed 03 Feb 2015. TACR7: Triticum aestivum cold-regulated 7; URGI: Unité de Recherche 2. Koemel JE, Guenzi AC, Anderson JA, Smith EL. Cold hardiness of wheat Génomique Info; UTR: Untranslated region K: number of populations; near-isogenic lines differing in vernalization alleles. Theor Appl Genet. 2004; VRN: Vernalisation 109:839–46. 3. Dvorak J, Akhunov ED. Tempos of gene locus deletions and duplications and their relationship to recombination rate during diploid and polyploid Acknowledgements evolution in the aegilops-triticum alliance. Genetics. 2005;171:323–32. The authors thank the International Wheat Genome Sequencing Consortium 4. Huang S, Sirikhachornkit A, Su XJ, Faris J, Gill B, Haselkorn R, Gornicki P. for pre-publication access to IWGSC RefSeq v1.0, and Katy Niedung for excellent Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate technical assistance. The authors thank to Majda Valjavec-Gratian from NCBI for kinase of the Triticum/Aegilops complex and the evolutionary history of help during preparation of sequences for submission to GenBank. polyploid wheat. Proc Natl Acad Sci U S A. 2002;99:8133–8. 5. Peng JHH, Sun DF, Nevo E. Domestication evolution, genetics and Funding genomics in wheat. Mol Breed. 2011;28:281–301. The authors thank the German Federal Ministry of Education and Research 6. Smith DB, Flavell RB. Characterization of wheat genome by renaturation (BMBF) for funding the project FROWHEAT (0315953). The field experiments in kinetics. Chromosoma. 1975;50:223–42. Novosibirsk were partially supported by ICG budget project #0324–2016-0001. 7. Choulet F, Wicker T, Rustenholz C, et al. Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and Availability of data and materials Transposable element spaces. Plant Cell. 2010;22:1686–701. All data presented in this study, as well as GenBank accessions and cultivars will 8. Wang S, Wong D, Forrest K, Allen A, Chao S, Huang BE, Maccaferri M, Salvi S, be publicly available. Sequences of individual haplotypes from representative Milner SG, Cattivelli L, et al. Characterization of polyploid wheat genomic Babben et al. BMC Genomics (2018) 19:409 Page 22 of 24 diversity using a high-density 90 000 single nucleotide polymorphism array. 31. Jaglo KR, Kleff S, Amundsen KL, Zhang X, Haake V, Zhang JZ, Deits T, Plant Biotechnol J. 2014;12(6):787–96. Thomashow MF. Components of the Arabidopsis C-repeat/dehydration- 9. Babben S, Perovic D, Koch M, Odon F. An efficient approach for the responsive element binding factore cold-response pathway are development of locus specific primers in bread wheat (Triticum aestivum L.) conserved in Brassica napus and other plant species. Plant Physiol. and its application to re-sequencing of genes involved in frost tolerance. 2001;127:910–7. PLoS One. 2015;10:e0142746. 32. Allagulova CR, Gimalov FR, Shakirova FM, Vakhitov VA. The plant dehydrins. Structure and putative functions. Biochemistry-Moscow. 2003;68:945–51. 10. Jordan KW, Wang S, Lun Y, et al. A haplotype map of allohexaploid wheat reveals distinct patterns of selection on homoeologous genomes. Genome 33. Close TJ. Dehydrins. A commonality in the response of plants to Biol. 2015;16:48. dehydration and low temperature. Physiol Plant. 1997;100:291–6. 11. School of Biological Sciences. University of Bristol. 2012. http://www. 34. Winfield MO, Lu CG, Wilson ID, Coghill JA, Edwards KJ. Plant responses to cerealsdb.uk.net. Accessed 01 Mar 2012. cold. Transcriptome analysis of wheat. Plant Biotechnol J. 2010;8:749–71. 12. Wilkinson P, Winfield M, Barker G, Allen A, Burridge A, Coghill J, Edwards K. 35. Gilmour SJ, Zarka DG, Stockinger EJ, Salazar MP, Houghton JM, Thomashow CerealsDB 2.0. An integrated resource for plant breeders and scientists. BMC MF. Low temperature regulation of the Arabidopsis CBF family of AP2 Bioinformatics. 2012;13:219. transcriptional activators as an early step in cold-induced COR gene 13. Unité de Recherche Génomique Info. Centre de Versailles (2013). http:// expression. Plant J. 1998;16:433–42. wheat-urgi.versailles.inra.fr. Accessed 20 Feb 2013. 36. Dhillon T, Pearce SP, Stockinger EJ, Distelfeld A, Li CX, Knox AK, Vashegyi I, Vagujfalvi A, Galiba G, Dubcovsky J. Regulation of freezing tolerance and 14. Unité de Recherche Génomique Info. Centre de Versailles (2013). https:// flowering in temperate cereals. The VRN-1 connection. Plant Physiol. 2010; wheat-urgi.versailles.inra.fr/Seq-Repository/. Accessed 20 Feb 2013. 153:1846–58. 15. Brenchley R, Spannagl M, Pfeifer M, et al. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature. 2012;491:705–10. 37. Kato K, Yamagata H. Method for evaluation of chilling requirement and 16. The International Wheat Genome Sequencing Consortium. A chromosome- narrow-sense earliness of wheat cultivars. Jpn J Breed. 1988;38:172–86. based draft sequence of the hexaploid bread wheat (Triticum aestivum) 38. Turner A, Beales J, Faure S, Dunford RP, Laurie DA. The pseudo-response genome. Science. 2014;345:1251788. regulator Ppd-H1 provides adaptation to photoperiod in barley. Science. 17. Raats D, Frenkel Z, Krugman T, et al. The physical map of wheat 2005;310:1031–4. chromosome 1BS provides insights into its gene space organization and 39. Matsushika A, Makino S, Kojima M, Mizuno T. Circadian waves of expression evolution. Genome Biol. 2013;14:R138. of the APRR1/TOC1 family of pseudo-response regulators in Arabidopsis thaliana. Insight into the plant circadian clock. Plant Cell Physiol. 2000;41: 18. Ling HQ, Zhao S, Liu D, et al. Draft genome of the wheat A-genome 1002–12. progenitor Triticum urartu. Nature. 2013;496:87–90. 40. Strayer C, Oyama T, Schultz TF, Raman R, Somers DE, Mas P, Panda S, Kreps 19. Jia JZ, Zhao SC, Kong XY, et al. Aegilops tauschii draft genome sequence JA, Kay SA. Cloning of the Arabidopsis clock cone TOC1, an autoregulatory reveals a gene repertoire for wheat adaptation. Nature. 2013;496:91–5. response regulator homolog. Science. 2000;289:768–71. 20. Winfield MO, Allen AM, Burridge AJ, Barker GLA, et al. High-density SNP genotyping array for hexaploid wheat and its secondary and tertiary gene 41. Danyluk J, Kane NA, Breton G, Limin AE, Fowler DB, Sarhan F. TaVRT-1, a pool. Plant Biotechnol J. 2016;14(5):1195–206. putative transcription factor associated with vegetative to reproductive transition in cereals. Plant Physiol. 2003;132:1849–60. 21. Allen AM, Winfield MO, Burridge AJ, et al. Characterization of a wheat 42. Trevaskis B, Bagnall DJ, Ellis MH, Peacock WJ, Dennis ES. MADS box genes breeders’ Array suitable for high-throughput SNP genotyping of global accessions of hexaploid bread wheat (Triticum aestivum). Plant Biotechnol J. control vernalization-induced flowering in cereals. Proc Natl Acad Sci U S A. 2003;100:13099–104. 2016;15(3):390–401. 43. Yan L, Loukoianov A, Tranquilli G, Helguera M, Fahima T, Dubcovsky J. 22. Li Y, Boeck A, Haseneyer G, Korzun V, Wilde P, Schoen CC, Ankerst DP, Bauer Positional cloning of the wheat vernalization gene VRN1. Proc Natl Acad Sci E. Association analysis of frost tolerance in rye using candidate genes and U S A. 2003;100:6263–8. phenotypic data from controlled, semi-controlled, and field phenotyping platforms. BMC Plant Biol. 2011;11:146. 44. Yan L, Loukoianov A, Blechl A, Tranquilli G, Ramakrishna W, SanMiguel P, 23. Orvar BL, Sangwan V, Omann F, Dhindsa RS. Early steps in cold sensing by Bennetzen JL, Echenique V, Dubcovsky J. The wheat VRN2 gene is a flowering plant cells. The role of actin cytoskeleton and membrane fluidity. Plant J. repressor down-regulated by Vernalization. Science. 2004;303:1640–4. 2000;23:785–94. 45. Yan L, Fu D, Li C, Blechl A, Tranquilli G, Bonafede M, Sanchez A, Valarik M, 24. Sangwan V, Foulds I, Singh J, Dhindsa RS. Cold-activation of Brassica napus Yasuda S, Dubcovsky J. The wheat and barley vernalization gene VRN3 is an BN115 promoter is mediated by structural changes in membranes and orthologue of FT. Proc Natl Acad Sci U S A. 2006;103:19581–6. cytoskeleton, and requires Ca2+ influx. Plant J. 2001;27:1–12. 46. Francia E, Rizza F, Cattivelli L, Stanca AM, Galiba G, Toth B, Hayes PM, Skinner JS, Pecchioni N. Two loci on chromosome 5H determine low- 25. Chinnusamy V, Ohta M, Kanrar S, Lee BH, Hong X, Agarwal M, Zhu JK. ICE1. temperature tolerance in a ‘Nure’ (winter) x ‘Tremois’ (spring) barley map. A regulator of cold-induced transcriptome and freezing tolerance in Theor Appl Genet. 2004;108:670–80. Arabidopsis. Genes Dev. 2003;17:1043–54. 47. Vagujfalvi A, Galiba G, Cattivelli L, Dubcovsky J. The cold-regulated 26. Liu Q, Kasuga M, Sakuma Y, Abe H, Miura S, Yamaguchi-Shinozaki K, transcriptional activator Cbf3 is linked to the frost-tolerance locus Fr-A2 on Shinozaki K. Two transcription factors, DREB1 and DREB2, with an EREBP/ wheat chromosome 5A. Mol Gen Genomics. 2003;269:60–7. AP2 DNA binding domain separate two cellular signal transduction pathways in drought- and low-temperature-responsive gene expression, 48. Zhao YS, Gowda M, Wurschum T, et al. Dissecting the genetic architecture of respectively, in Arabidopsis. Plant Cell. 1998;10:1391–406. frost tolerance in central European winter wheat. J Exp Bot. 2013;64:4453–60. 27. Stockinger EJ, Gilmour SJ, Thomashow MF. Arabidopsis thaliana CBF1 49. Galiba G, Quarrie SA, Sutka J, Morgounov A, Snape JW. RFLP mapping of encodes an AP2 domain-containing transcriptional activator that binds to the vernalization (vrn1) and frost-resistance (fr1) genes on chromosome 5a the C-repeat/DRE, a cis-acting DNA regulatory element that stimulates of wheat. Theor Appl Genet. 1995;90:1174–9. transcription in response to low temperature and water deficit. Proc Natl 50. Sutka J, Galiba G, Vagujfalvi A, Gill BS, Snape JW. Physical mapping of the Acad Sci U S A. 1997;94:1035–40. Vrn-A1 and Fr1 genes on chromosome 5A of wheat using deletion lines. 28. Allen MD, Yamasaki K, Ohme-Takagi M, Tateno M, Suzuki M. A novel Theor Appl Genet. 1999;99:199–202. mode of DNA recognition by a beta-sheet revealed by the solution 51. Stockinger EJ, Skinner JS, Gardner KG, Francia E, Pecchioni N. Expression structure of the GCC-box binding domain in complex with DNA. EMBO J. levels of barley Cbf genes at the frost resistance-H2 locus are dependent 1998;17:5484–96. upon alleles at Fr-H1 and Fr-H2. Plant J. 2007;51:308–21. 29. Dietz KJ, Vogel MO, Viehhauser A. AP2/EREBP transcription factors are part 52. Santra DK, Santra M, Allan RE, Campbell KG, Kidwell KK. Genetic and of gene regulatory networks and integrate metabolic, hormonal and molecular characterization of Vernalization genes Vrn-A1, Vrn-B1, and Vrn- environmental signals in stress acclimation and retrograde signalling. D1 in spring wheat germplasm from the Pacific northwest region of the Protoplasma. 2010;245:3–14. USA. Plant Breed. 2009;128:576–84. 30. Peng YL, Wang YS, Cheng H, Sun CC, Wu P, Wang LY, Fei J. Characterization 53. Chen YH, Carver BF, Wang SW, Zhang FQ, Yan LL. Genetic loci associated and expression analysis of three CBF/DREB1 transcriptional factor genes with stem elongation and winter dormancy release in wheat. Theor Appl from mangrove Avicennia marina. Aquat Toxicol. 2013;140:68–76. Genet. 2009;118:881–9. Babben et al. BMC Genomics (2018) 19:409 Page 23 of 24 54. Chen Y, Carver BF, Wang S, Cao S, Yan L. Genetic regulation of developmental 77. Li YL, Haseneyer G, Schon CC, Ankerst D, Korzun V, Wilde P, Bauer E. High phases in winter wheat. Mol Breed. 2010;26:573–82. levels of nucleotide diversity and fast decline of linkage disequilibrium in 55. Eagles HA, Cane K, Trevaskis B. Veery wheats carry an allele of Vrn-A1 that has rye (Secale cereale L.) genes involved in frost response. BMC Plant Biol. implications for freezing tolerance in winter wheats. Plant Breed. 2011;130:413–8. 2011;11:6. 78. Pritchard JK, Stephens M, Donnelly P. Inference of population structure 56. Reddy L, Allan RE, Campbell KAG. Evaluation of cold hardiness in two sets using multilocus genotype data. Genetics. 2000;155:945–59. of near-isogenic lines of wheat (Triticum aestivum) with polymorphic 79. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of vernalization alleles. Plant Breed. 2006;125:448–56. individuals using the software STRUCTURE. A simulation study. Mol Ecol. 57. Baga M, Chodaparambil SV, Limin AE, Pecar M, Fowler DB, Chibbar RN. 2005;14:2611–20. Identification of quantitative trait loci and associated candidate genes for low-temperature tolerance in cold-hardy winter wheat. Funct Integr 80. Ramasamy RK, Ramasamy S, Bindroo BB, Naik VG. STRUCTURE PLOT: a Genomics. 2007;7:53–68. program for drawing elegant STRUCTURE bar plots in user friendly interface. 58. Motomura Y, Kobayashi F, Iehisa JCM, Takumi S. A major quantitative trait SpringerPlus. 2014;3:431. locus for cold-responsive gene expression is linked to frost-resistance gene 81. Perrier X, Jacquemoud-Collet JP. DARwin software. 2006, http://darwin.cirad.fr/. Fr-A2 in common wheat. Breed Sci. 2013;63:58–67. 82. Goodman MM, Stuber CW. Races of maize. 6. Isozyme variation among 59. Vagujfalvi A, Aprile A, Miller A, Dubcovsky J, Delugu G, Galiba G, Cattivelli L. races of maize in Bolivia. Maydica. 1987;28:169–87. The expression of several Cbf genes at the Fr-A2 locus is linked to frost 83. Reif JC, Melchinger AE, Frisch M. Genetical and mathematical properties of resistance in wheat. Mol Gen Genomics. 2005;274:506–14. similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Sci. 2005;45:1–7. 60. Miller AK, Galiba G, Dubcovsky J. A cluster of 11 CBF transcription factors is located at the frost tolerance locus Fr-a(m)2 in Triticum monococcum. Mol 84. Wright S. Evolution and the genetics of populations. A treatise in four Gen Genomics. 2006;275:193–203. volumes. Vol. 4, variability within and among natural population. Chicago, 61. Sutton F, Chen DG, Ge XJ, Kenefick D. Cbf genes of the Fr-A2 allele are London: University of Chicago Press; 1978. p. 91. differentially regulated between long-term cold acclimated crown tissue of 85. The R Project for Statistical Computing. R software. https://www.r-project.org. freeze-resistant and -susceptible, winter wheat mutant lines. BMC Plant Biol. 86. Buckler Lab for Maize Genetics and Diversity. TASSEL software. http://www. 2009;9:34. maizegenetics.net/tassel. 62. Knox AK, Li CX, Vagujfalvi A, Galilba G, Stockinger EJ, Dubcovsky J. 87. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. Identification of candidate CBF genes for the frost tolerance locus Fr-a(m)2 TASSEL. Software for association mapping of complex traits in diverse in Triticum monococcum. Plant Mol Biol. 2008;67:257–70. samples. Bioinformatics. 2007;23:2633–5. 63. Soltesz A, Smedley M, Vashegyi I, Galiba G, Harwood W, Vagujfalvi A. 88. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Transgenic barley lines prove the involvement of TaCBF14 and TaCBF15 version 2-a multiple sequence alignment editor and analysis workbench. in the cold acclimation process and in frost tolerance. J Exp Bot. 2013; Bioinformatics. 2009;25:1189–91. 64:1849–62. 89. National Center for Biotechnology Information. U.S. National Library of 64. Kocsy G, Athmer B, Perovic D, Himmelbach A, Szucs A, Vashegyi I, Schweizer Medicine, Rockville pike. 2017. https://blast.ncbi.nlm.nih.gov/Blast.cgi. P, Galiba G, Stein N. Regulation of gene expression by chromosome 5A Accessed 12 May 2017. during cold hardening in wheat. Mol Gen Genomics. 2010;283:351–63. 90. RaptorX: a Web Portal for Protein Structure and Function Prediction. 65. Gulick P, Drouin S, Yu Z, Poisson G, Monroy AF, Sarhan F. Transcriptome Computational Institute at the University of Chicago, Chicago. 2017. http:// comparison of winter and spring wheat responding to low temperature. raptorx.uchicago.edu/StructurePrediction/predict/. Accessed 15 May 2017. Genome. 2005;48:913–23. 91. Kaellberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, Xu J. Template-based 66. Monroy AF, Dryanova A, Malette B, Oren DH, Farajalla MR, Liu W, Danyluk J, protein structure modeling using the RaptorX web server. Nat Protoc. 2012; Ubayasena LWC, Kane K, Scoles GJ, Sarhan F, Gulick PJ. Regulatory gene 7:1511–22. candidates and gene expression analysis of cold acclimation in winter and 92. Peng J, Xu J. RaptorX. Exploiting structure information for protein alignment spring wheat. Plant Mol Biol. 2007;64:409–23. by statistical inference. Proteins Struct Funct Bioinf. 2011;79:161–71. 67. Laudencia-Chingcuanco D, Ganeshan S, You F, Fowler B, Chibbar R, 93. Delpot W, Poon AF, Frost SDW, Kosakovsky Pond SL. Datamonkey 2010: a Anderson O. Genome-wide gene expression analysis supports a suite of phylogenetic analysis tools for evolutionary biology. Bioinfomatics. developmental model of low temperature tolerance gene regulation in 2010;26:2455–7. wheat (Triticum aestivum L.). BMC Genomics. 2011;12:299. 94. Kosakovsky Pond SL, Frost SDW, Muse SV. HyPhy: hypothesis testing using 68. Ganeshan S, Sharma P, Young L, Kumar A, Fowler DB, Chibbar RN. phylogenies. Bioinfomatics. 2005;21:676–9. Contrasting cDNA-AFLP profiles between crown and leaf tissues of cold- 95. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in acclimated wheat plants indicate differing regulatory circuitries for low accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–8. temperature tolerance. Plant Mol Biol. 2011;75:379–98. 96. Kumar S, Stecher G, Peterson D, Tamura K. MEGA-CC: computing of 69. Winfield MO, Lu C, Wilson ID, Coghill JA, Edwards KJ. Cold- and light- molecular evolutionary genetics analysis program for automated and induced changes in the transcriptome of wheat leading to phase transition iterative data analysis. Bioinformatics. 2012;28:2685–6. from vegetative to reproductive growth. BMC Plant Biol. 2009;9:55. 97. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein 70. Neumann K, Kobiljski B, Denčić S, Varshney RK, Börner A. Genome-wide sequence alignments into the corresponding codon alignments. Nucleic association mapping: a case study in bread wheat (Triticum aestivum L.). Acids Res. 2006;34:W609–12. Mol Breed. 2011;27(1):37–58. 98. SOGO Genome Information Database System for Innovation of Crop and 71. Endo TR, Gill BS. The deletion stocks of common wheat. J Hered. 1996; Livestock Production. Institute of Crop Science NARO, Tsukuba. 2017. 87:295–307. https://sogo.dna.affrc.go.jp/cgi-bin/sogo.cgi?lang=en. Accessed 09 Jul 2017. 72. Sears ER. Nullisomic-tetrasomic combinations in hexaploid wheat. In: Riley R, 99. Higo K, Ugawa Y, Iwamoto M, Korenaga T. Plant cis-acting regulatory DNA Lewis KR, editors. Chromosome manipulations and plant genetics. New elements (PLACE) database: 1999. Nucleic Acids Res. 1999;27:297–300. York: Springer Science+Business Media; 1966. p. 29–45. 100. Softberry. Softberry Inc., Mount Kisco. 2017. http://www.softberry.com/berry. 73. Stein N, Herren G, Keller B. A new DNA extraction method for high- phtml?topic=ann2_ann3&no_menu=on. Accessed 09 Jul 2017. throughput marker analysis in a large-genome species such as Triticum 101. Solovyev V, Shahmuradov I, Salamov A. Identification of promoter regions aestivum. Plant Breed. 2001;120:354–6. and regulatory sites. In: Ladunga I, editor. Computational biology of 74. Pearson K. On certain errors with regard to multiple correlation occasionally transcription factor binding. Totowa: Springer Science+Business Media; made by those who have not adequately studied this subject. Biometrika. 2010. p. 57–83. 1914;10:181. 102. Theissen G, Kim JT, Saedler H. Classification and phylogeny of the MADS- 75. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating box multigene family suggest defined roles of MADS-box gene subfamilies inhibitors. Proc Natl Acad Sci U S A. 1977;74:5463–7. in the morphological evolution of eukaryotes. J Mol Evol. 1996;43:484–516. 76. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT. A novel method for rapid 103. Kaufmann K, Melzer R, Theissen G. MIKC-type MADS-domain proteins. multiple sequence alignment based on fast Fourier transform. Nucleic Acids Structural modularity, protein interactions and network evolution in land Res. 2002;30:3059–66. plants. Gene. 2005;347:183–98. Babben et al. BMC Genomics (2018) 19:409 Page 24 of 24 104. Taoka K, Ohki I, Tsuji H, et al. 14-3-3 proteins act as intracellular receptors for rice Hd3a florigen. Nature. 2011;476:332–5. 105. Tanksley SD, McCouch SR. Seed banks and molecular maps. Unlocking genetic potential from the wild. Science. 1997;277:1063–6. 106. Knox AK, Dhillon T, Cheng HM, Tondelli A, Pecchioni N, Stockinger EJ. CBF gene copy number variation at frost Resistance-2 is associated with levels of freezing tolerance in temperate-climate cereals. Theor Appl Genet. 2010;121:21–35. 107. Zhu J, Pearce S, Burke A, See D, Skinner D, Dubcovsky J, Garland-Campbell K. Copy number and haplotype variation at the VRN-A1 and central FR-A2 loci are associated with frost tolerance in hexaploid wheat. Theor Appl Genet. 2014;127:1183–97. 108. Campoli C, Matus-Cadiz MA, Pozniak CJ, Cattivelli L, Fowler DB. Comparative expression of Cbf genes in the Triticeae under different acclimation induction temperatures. Mol Gen Genomics. 2009;282:141–52. 109. Diaz A, Zikhali M, Turner AS, Isaac P, Laurie DA. Copy number variation affecting the photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum). PLoS One. 2012;7:e33234. 110. Puranik S, Acajjaoui S, Conn S, et al. Structural basis for the oligomerization of the MADS domain transcription factor SEPALLATA3 in Arabidopsis. Plant Cell. 2014;26:3603–15. 111. Bachhawat P, Stock AM. Crystal structures of the receiver domain of the response regulator PhoP from Escherichia coli in the absence and presence of the phosphoryl analog Beryllofluoride. J Bacteriol. 2007;189:5987–95. 112. Wu Y, Dey R, Han A, Jayathilaka N, Philips M, Ye J, Chen L. Structure of the MADS-box/MEF2 domain of MEF2A bound to DNA and its implication for Myocardin recruitment. J Mol Biol. 2010;397:520–33. 113. Santelli E, Richmond TJ. Crystal structure of MEF2A core bound to DNA at 1. 5 angstrom resolution. J Mol Biol. 2000;297:437–49. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Genomics Springer Journals

Association genetics studies on frost tolerance in wheat (Triticum aestivum L.) reveal new highly conserved amino acid substitutions in CBF-A3, CBF-A15, VRN3 and PPD1 genes

Free
24 pages

Loading next page...
 
/lp/springer_journal/association-genetics-studies-on-frost-tolerance-in-wheat-triticum-66GcZXXyBh
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s).
Subject
Life Sciences; Life Sciences, general; Microarrays; Proteomics; Animal Genetics and Genomics; Microbial Genetics and Genomics; Plant Genetics and Genomics
eISSN
1471-2164
D.O.I.
10.1186/s12864-018-4795-6
Publisher site
See Article on Publisher Site

Abstract

Background: Understanding the genetic basis of frost tolerance (FT) in wheat (Triticum aestivum L.) is essential for preventing yield losses caused by frost due to cellular damage, dehydration and reduced metabolism. FT is a complex trait regulated by a number of genes and several gene families. Availability of the wheat genomic sequence opens new opportunities for exploring candidate genes diversity for FT. Therefore, the objectives of this study were to identity SNPs and insertion-deletion (indels) in genes known to be involved in frost tolerance and to perform association genetics analysis of respective SNPs and indels on FT. Results: Here we report on the sequence analysis of 19 candidate genes for FT in wheat assembled using the Chinese Spring IWGSC RefSeq v1.0. Out of these, the tandem duplicated C-repeat binding factors (CBF), i.e. CBF-A3, CBF-A5, CBF-A10, CBF-A13, CBF-A14, CBF-A15, CBF-A18, the vernalisation response gene VRN-A1, VRN-B3, the photoperiod response genes PPD-B1 and PPD-D1 revealed association to FT in 235 wheat cultivars. Within six genes (CBF-A3, CBF-A15, VRN-A1, VRN-B3, PPD-B1 and PPD-D1) amino acid (AA) substitutions in important protein domains were identified. The amino acid substitution effect in VRN-A1 on FT was confirmed and new AA substitutions in CBF-A3, CBF-A15, VRN-B3, PPD-B1 and PPD-D1 located at highly conserved sites were detected. Since these results rely on phenotypic data obtained at five locations in 2 years, detection of significant associations of FT to AA changes in CBF-A3, CBF-A15, VRN-A1, VRN-B3, PPD-B1 and PPD-D1 may be exploited in marker assisted breeding for frost tolerance in winter wheat. Conclusions: A set of 65 primer pairs for the genes mentioned above from a previous study was BLASTed against the IWGSC RefSeq resulting in the identification of 39 primer combinations covering the full length of 19 genes. This work demonstrates the usefulness of the IWGSC RefSeq in specific primer development for highly conserved gene families in hexaploid wheat and, that a candidate gene association genetics approach based on the sequence data is an efficient tool to identify new alleles of genes important for the response to abiotic stress in wheat. Keywords: Triticum aestivum L., Frost tolerance (FT), Candidate genes, Association studies, SNP, Indel, Haplotypes, CBF, VRN, PPD1 * Correspondence: dragan.perovic@julius-kuehn.de Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Erwin-Baur-Str. 27, 06484 Quedlinburg, Saxony-Anhalt, Germany Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Babben et al. BMC Genomics (2018) 19:409 Page 2 of 24 Background resources, extensive phenotypic data and various genomic Wheat (Triticum aestivum L.) is on the world wide level arrays [13]. The chromosomal sequence information is the crop grown on the largest acreage and is of prime available from the IWGSC [14]. All mentioned data- importance for human nutrition and animal feed. bases are useful for the identification of homologous Worldwide production of 729 million tones in 2014, chromosome sequences in bread wheat. Summarising, a ranks wheat globally the third most important crop, next lot of sequence information of wheat sorted chromo- to maize (Zea mays L.) and rice (Oryza sativa L.). Pro- some arms [15–17], T. urartu [18]and Ae. tauschii [19] duction of wheat in Europe was 249.13 mt with an aver- have been published during the past few years and are age yield of 4.25 t/ha in 2014. At the same time in North integrated in the public databases mentioned above. America 84.68 mt of wheat were harvested with an aver- Today, the current version of the Chinese Spring age yield of 2.994 t/ha, while in Asia the average yield IWGSC RefSeq v1.0 [14] facilitates the exploitation of was 3.095 t/ha resulting in a production of 315.71 mt the wheat sequence in basic science as well as in [1]. In North America, North and Eastern Europe and applied wheat breeding. Russia wheat production is exposed to low temperature and frost damage occurs frequently. Depending of the Association analysis time of sowing wheat is known as winter or spring Genome-wide association studies (GWAS) are a power- wheat, respectively, which differ in the vernalisation ful tool to identify genomic regions and candidate genes requirement and frost hardiness. Winter wheat varieties involved in FT. High density genotyping arrays i.e. the require an extended exposure to cold temperatures, Illumina 90 K chip [8], the Affymetrix 820 K [20] and typically 4 to 8 weeks at less than 4 °C, to induce flower- the breeding 35 K axiom [21] arrays enable the deter- ing (vernalization) while spring wheat varieties do not mination of the genetic structure of complex traits by have such a requirement [2]. Vernalization requirements GWAS. However, they are limited in the identification keep these plants safely dormant over winter resulting in of new alleles involved in FT. One approach that allows a longer vegetation period and higher yields of winter mining of novel alleles is re-sequencing of candidate wheat. genes followed by a candidate gene association genetics study [22]. Wheat genome Bread wheat is an allohexaploid species (2n = 6× = 42) Cold stress signalling and transcriptional regulation with an AABBDD genome derived from two independ- Frost tolerance (FT) is a complex biological process in- ent hybridisations [3–5]. This complex genome of about volving at least two main pathways and many additional 17 Giga-base pairs (Gbp) has a repeat content of ap- processes encompassing a large number of genes. The proximately 80% [6] with a gene density of on average main pathway is frost response, whereas flowering as the one gene per 87 to 184 Kilo-base pairs (Kbp) [7]. The second pathway involves vernalisation. Plant cells can acquisition of this large genome sequence became feas- perceive cold stress through the membrane rigidification 2+ ible since the introduction of next Generation Sequen- effect. This leads to Ca influx into the cytosol and 2+ cing (NGS) technologies. these characteristic Ca signatures are detected by Nowadays, the efforts of the International Wheat calcium binding proteins (CBPs) [23, 24]. The induction Genome Sequencing Consortium (IWGSC), in the of cold response genes starts with the activation of development of a physical map and a reference sequence so-called inducer of CBF expression (ICE) genes. These facilitates many downstream applications, i.e. develop- genes are MYC-type basic helix–loop–helix transcrip- ment of high throughput genotyping platforms [8], effi- tion factors which bind on MYC recognition sites of cient development of genome specific primers [9], C-repeat binding factor (CBF) promoters and conse- exome sequencing in large genebank collections [10] etc. quently activate the expression of these genes [25]. The The wheat reference sequence, survey sequence and CBF transcription factors are members of the APE- physical map are available at many public data bases. TALA2/Ethylene response element binding protein The CerealsDB web page, created by members of the (AP2/EREBP) family of DNA-binding proteins [26, 27]. Functional Genomics Group at the University of Bristol The AP2/EREBP DNA-binding protein domain comprises [11], includes online resources of genomic information, a structure with three β-strands and one α-helix [28–30]. i.e. varietal SNPs, DArT markers, and EST sequences all Furthermore, PKK/RPAGRxKFxETRHP and DSAWR mo- linked to a draft genome sequence of the cultivar Chin- tifs are present, which are typical features of CBF proteins ese Spring [12]. The Unité de Recherche Génomique [31]. The CBF transcription factors bind to the C-repeat/ Info (URGI) web portal includes datasets such as dehydration-responsive element (CRT/DRE) and induce chromosome survey sequences, reference sequences, the expression of cold-responsive/late embryogenesis– physical maps, genetic maps, polymorphisms, genetic abundant (COR/LEA)genes [32–34]. The CRT/DRE Babben et al. BMC Genomics (2018) 19:409 Page 3 of 24 element is a highly conserved CCGAC sequence in the [47, 61]. Knox et al. [62]analysedthe FR-A 2locus promoter of cold- and dehydration-responsive genes [35]. of diploid Triticcum monococcum (A genome is very The flowering pathway is involved in FT because it similar to the A genome of hexaploid wheat) and contains vernalisation (VRN) and photoperiod response identified three CBFs (CBF12, CBF14 and CBF15) (PPD) genes that contributed to low temperature accli- highly associated with FT. Also Vagujfalvi et al. [59] matisation [36, 37]. The PPD1 is a member of the PRR identified CBF14 and CBF15 as FT associated in Triticum (pseudo response regulator) protein family and interacts monococcum and Soltesz et al. [63] confirmed this for Tri- with CONSTANS (CO) [38]. This family possesses a ticum aestivum. Additionally, Kocsy et al. [64] identified pseudo-receiver domain and a CONSTANS motif three genes, i.e. Tacr7 (Triticum aestivum cold-regulated [39, 40]. Downstream the VRN genes are localised, 7), Cab (calcium-binding EF-hand family protein-like) and i.e. VRN1, VRN2 and VRN3. VRN1 encodes a Dem (Defective embryo and meristems) being differen- MADS-box transcription factor, VRN2 is similar to a tially expressed during cold hardening in wheat. In putative zinc finger and a CCT domain and VRN3 addition, on the transcriptome level FT signaling is much encodes a RAF kinase inhibitor like protein [41–45]. more complex. Hundreds to thousands of wheat genes Both pathways, the flowering and the cold response were identified to be significantly up- or downregulated pathway, are connected by the interaction of VRN1 and under low temperature [34, 65–69]. CBF genes. For example, the VRN1 gene is able to re- The aim of this study was therefore to (i) identity duce the transcript levels of CBFs and COR genes under SNPs and indels (insertion-deletion) in genes known to long day conditions [36]. be involved in frost tolerance in wheat and (ii) to conduct a candidate gene based association genetics ap- Current knowledge about frost tolerance of wheat proach to get information on the effect of respective Two major FT loci, frost resistance 1 (FR1) and frost re- SNPs and indels on FT. sistance 2 (FR2), were identified on the long arm of chromosome 5A of wheat [46, 47]. Zhao et al. [48] de- Methods scribed an additional FT Quantitative Trait Locus (QTL) Plant material and DNA extraction on chromosome 5B in wheat germplasm from central A diverse set of 235 bread wheat genotypes was used for Europe. Due to the importance of cold acclimatisation in PCR amplification, amplicon sequencing and association winter and spring wheat, the locus FR1 was physically genetics studies. One hundred and seventy-nine and genetically mapped [49, 50]. However, it is not clear cultivars, 48 lines and 8 doubled haploid (DH) lines whether FR1 is an independent gene or is based on a originating from 28 countries from five continents pleiotropic effect of VRN1 [36, 51]. As a consequence of (Additional file 1: Table S1) were analysed. The associ- the presence of the A, B and D genome, there are three ation panel was selected based on pre-existing know- VRN1 homologous genes (VRN-A1, VRN-B1 and ledge regarding the reaction to growing conditions VRN-D1) not having the same impact on vernalisation. during winter time, i.e. high latitude and continental Wheat plants with a dominant VRN-A1 allele are spring European winter wheat collections as well as Russian type and do not need vernalisation for flowering, while and North American cultivars. Furthermore, the core the dominant VRN-B1 and VRND1 alleles result also in collection of the Institute of Field and Vegetable Crops spring habit, but they are weaker than VRN-A1 [52]. (IFVCNS), Novi Sad, Serbia [70] and parental lines of The difference among the spring (dominant VRN-A1 Western European hybrid breeding programs were included. alleles) and winter varieties (recessive vrn-A1 alleles) is For the physical mapping of PCR amplicons to chromo- based on a C/T single nucleotide polymorphism (SNP) somes and chromosome segments, 21 nulli-tetrasomic lines in the fourth exon of the VRN-A1 gene. The winter (NT-lines) and 46 deletion-lines were used [9, 71, 72]. The varieties carrying an ambiguous nucleotide Y (C/T) are DNA was extracted at the three leaf stage according to Stein more frost tolerant than genotypes carrying the VRN-B1 et al. [73]. or VRN-D1 allele which both confer higher frost tolerance than spring varieties carrying a C at the re- Field experiments and phenotypic data analysis spective SNP in VRN-A1 [2, 53–56]. The field experiments were performed in five environ- Plenty of studies identified the FR-A2 locus on ments during 2012 (Gatersleben, Germany; Ranzin, chromosome 5A as the most important locus involved Germany; Puskin, Russia; Roshchinskiy, Russia; Novosibirsk, in FT in wheat [57–59]. The FR-A2 locus comprises at Russia) and 2013 (Gatersleben, Germany; Ranzin, Germany; minimum 11 CBF genes and is located approximately Puskin, Russia; Roshchinskiy, Russia) and one in 2014 30 cM proximal to VRN1 [46, 47, 60]. Two independent (Novosibirsk, Russia). All 235 genotypes were tested in studies illustrate that CBF-A3, which is located in the Gatersleben, Ranzin, Pushkin, and Novosibirsk in a random FR-A2 locus, plays an important role in wheat FT design in double rows and two replications per genotype. Babben et al. BMC Genomics (2018) 19:409 Page 4 of 24 The trial in Roshchinskiy was conducted as a miniplot Tartu, Estonia), 0.2 mM dNTPs (Fermentas, St. Leon-Rot, (2.5m ) trial with one replication. FT was evaluated as win- Germany) and 0.25 pmol primers (Microsynth, Balgach, ter survival in percent (%), i.e. the survival of plants per Switzerland) or 0.4 U MyTaq™ DNA Polymerase, 1 x My genotype and plot was measured as a quantitative trait (%) Taq Reaction Buffer B (that comprised 1 mM dNTPs and ranging from 0% (all dead) to 100% (all alive) after winter. 3 mM MgCl ) (BIOLINE, Luckenwalde, Germany) and To take into account the diversity of the environments 0.25 pmol primers. The PCR fragment amplification was with respect to climatic conditions, a co-variable com- conducted in a GeneAmp® PCR System 9700 (Applied prising the number of days with an average air Biosystems, Darmstadt, Germany) using various PCR pro- temperature under − 15 °C in the period from December files (Additional file 2: Table S2). PCR fragments were sep- 1st to April 30th of each year was calculated. The correl- arated by agarose gel electrophoresis and analysed using ation coefficient r [74] was calculated between the the imaging system Gel Doc™ XR and the Quantity One® co-variable and FT. Principal component analysis (PCA) 1-D analysis software (4.6.2) (Bio-Rad, Hercules, USA) was used to get information on the influence of the en- and subsequently sequenced by the company Microsynth vironment on FT. Correlation coefficient r and PCA AG (Balgach, Switzerland) using the Sanger sequencing were calculated with the JMP® Genomics 5.1 software method [75]. (SAS, Cary, USA). Data measured at the same location in different years were treated as independent. All data Detection of polymorphisms (SNPs/indels) and haplotypes sets that exhibit a deviation described by the second The sequencing raw data were edited using Sequencer component of PCA analysis, indicate an atypical trait 5.1 (Gene Codes Corporation, Ann Arbor, USA). Next, value putatively caused by secondary environmental fac- the adjusted sequences of each primer pair were aligned tors and were discarded. Furthermore, the coefficient of by using the Multiple Alignment tool Fast Fourier variation (CV; standard deviation divided by arithmetic Transform (MAFFT) [76]. MAFFT parameters were set average) was calculated as well as the variance of each as default. The polymorphisms between the 235 geno- environment and year. The data sets with a very low CV types were detected automatically via a small multiple (under 0.15) and/or variance (under 150), were classified sequence alignment (MSA) script (Additional file 3: Data as non-representative and were discarded. After editing S1) in the free software Java™. Parameters for the poly- of the field data, the significance of the influence by en- morphism detection were as follows: polymorphisms be- vironment and genotype was tested by using the analysis tween defined bases (A, T, C or G) and ambiguous of covariance (ANCOVA) and a general linear model nucleotides (N) were ignored.. The detected SNPs were (GLM) procedure. Based on this, the Least-Squares used for the identification of haplotypes and compo- means (LS means) were calculated. For all of these ana- nents of haplotypes for each candidate gene according lyses the SAS® 9.4 software (SAS, Cary, USA) was used. to the position in promoter, exon, intron or in the 3` un- translated region (UTR). PCR amplification, fragment analysis and re-sequencing Primer development and testing was conducted accord- Population structure and kinship calculation ing to Babben et al. [9]. Furthermore, the primer assign- In order to account for population structure effects in ment was verified based on BLASTs of mRNA and association studies, the population structure was esti- genomic regions of close relatives against the Chinese mated based on 249 SNPs, chosen according to the map Spring reference assembly v1.0 using NCBI Megablast. location and even distribution along the 21 wheat chro- Subsequently, 5 kb upstream and downstream regions mosomes [8, 77]. Population structure of wheat acces- were extracted based on the BLAST results and the ana- sions was assessed using STRUCTURE v 2.3.3, which is lysis of the nulli-tertasomic (NT) lines [9]. Primers were based on a Bayesian model-based clustering algorithm aligned to these genomic regions using a free shift align- that incorporates admixture and allele correlation ment with affine gap costs (gap opening = 5, gap elong- models to account for genetic material exchange in pop- ation = 0.01, match = − 1). If ambiguities were detected ulations, resulting in shared ancestry [78]. Five inde- at the beginning or the end of exons, primers were pendent runs were performed setting the number of manually modified to match the consensus dinucleotides populations (k) from 1 to 10, burn in time and Markov of splice sites, GT and AG. PCR amplification was Chain Monte Carlo (MCMC) replication number both performed in 20 μl reaction volume (RV) by using two to 100,000. The k-value was determined by ln P(D) in polymerases, i.e. FIREPol® DNA polymerase (Solis STRUCTURE output and an ad hoc statistic Δk based BioDyne, Tartu, Estonia) and MyTaq™ DNA polymerase on the rate of change in ln P(D) between successive (BIOLINE, Luckenwalde, Germany). The master mix for k-values [79]. Wheat lines with probabilities ≥0.5 were a single PCR reaction comprised 0.4 U FIREPol® DNA assigned to corresponding clusters. Lines with probabil- Polymerase, 1 x Buffer B, 2.5 mM MgCl (Solis BioDyne, ities < 0.5 were assigned to a mixed group. The 2 Babben et al. BMC Genomics (2018) 19:409 Page 5 of 24 population structure plot was constructed by using The alignment of initial and homologous AA sequences STRUCTURE PLOT [80] and the Principal coordinate was performed using CLC and MAFFT. The last step analysis (PCoA) by using the software package DARwin was the identification of domains and motifs and the [81]. The kinship (K) matrix was calculated on a modi- prediction of the protein structure via RaptorX. For the fied Roger’s distance [82–84] by using R version 3.2.1 quality check of the RaptorX results, the uGDT-value, free software [85]. The Roger’s distance was calculated uSeqId-value and P-value were taken into account. A − 3 as follows: uGDT > 50, uSeqId > 30 and a P-value less than 1*10 are indicators for good quality modelling. Nucleotid di- vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi vergence rates (dN/dS) between the identified haplotypes m nj XX t 2 pffiffiffiffiffiffiffi and reference sequences of Triticum aestivum and re- d ¼ ðp −q Þ ð1Þ ij ij 2m i¼1 j¼1 lated species were analysed using a web-based HyPhy application [93, 94]. Haplotype and reference sequences where pij and qij are allele frequencies of the jth allele at were used to generate sequence alignments by applying the ith locus, n number of alleles at the ith locus and m the L-INS-i option in MAFFT [95]. To obtain the best number of loci. fitting substitution model, the model test application in MEGA-CC was used [96]. The reconstruction of the Association genetics analysis phylogenetic tree was done with maximum likelihood al- SNP and indel association genetics analysis gorithm and 500 bootstraps in MEGA-CC (using the The SNP/indel association analysis was performed with corresponding model). The resulting protein alignment a set of 235 genotypes by using TASSEL 5.0.9 [86, 87]. and the corresponding nucleotide sequences were used Only the SNPs/indels with minor allele frequencies to compute codon alignments with Pal2Nal [97]. The (MAF) > 5% were taken into consideration for analysis. codon alignments and the phylogenetic tree were used Furthermore, population structure, kinship matrix and to compute dN/dS for each variable site using the SLAC phenotypic LS means were included in association stud- method in HyPhy [94]. ies applying the mixed linear model (MLM) algorithm. The threshold of statistically significant effects was set to In silico promoter analysis 1.30 –log of P (P-value of 0.05) according to Li et al. Identification of promoter regions and regulatory sites [22, 77] who used this threshold for the analysis of can- was performed using the internet databases SOGO from didate genes at chromosomal regions with high linkage the National Institute of Agrobiological Sciences [98, 99] disequilibrium (LD. The LD was calculated via TASSEL and Softberry [100, 101]. 5.0.9 by using the full matrix LD type method with 8064 comparisons after MAF selection. Results Phenotypic data analysis Haplotype association genetics analysis Phenotypic data of five field locations were obtained The haplotype association genetics study was performed during 2 years (Fig. 1 and Additional file 4: Figure S1). by using TASSEL 4.1.20 [86, 87]. The parameters, calcu- These phenotypic data sets and the established lation and significance threshold were the same as used co-variable (number of days under − 15 °C) are only for the SNP and indel association analysis. The same ap- weakly correlated (r = 0.24), indicating that all pheno- plies to the LD calculation for the genotype groups from typic data are not suitable to be used for FT analysis. All Europe, North America and Asia including Australia. the FT scores were transformed using arcsine and Cox-Box (data not shown), but transformations did not Sequence analysis result in any improvement. A scatter plot illustrates that The translation of associated gene sequences into amino the phenotypic data obtained in Pushkin_2013 and acid (AA) sequences, homologue AA sequence search, Novosibirsk_2014 are not due to FT (Additional file 4: AA alignments, identification of protein domains and Figure S1). A reason for this may be the very few days motifs as well as prediction of secondary protein struc- under − 15 °C at Pushkin in 2013 (10 days) but a rather tures were performed using the following software: CLC low winter hardiness (mean winter survival of 16.2%) Main Workbench 7.6 (CLC Bio, Aarhus, Denmark), due to missing snow coverage; while in Novosibirsk in MAFFT, free software Jalview [88], NCBI (National 2014 a very high number of days under − 15 °C (41 days) Center for Biotechnology) protein BLAST [89] and but less frost damage was observed due to continuous RaptorX [90–92]. The nucleotide sequences were trans- snow coverage (mean winter survival of 82.6%). The lated into AA sequences via CLC following BLASTn PCA calculation shows that the environments Ran- against the NCBI protein database for homologous AA zin_2013, Gatersleben_2013 and Novosibirsk_2014 are sequence identification. Parameters were set as default. separated from the other environments described by the Babben et al. BMC Genomics (2018) 19:409 Page 6 of 24 Fig. 1 Phenotypic variation at five field locations during two experimental years. Center lines show the medians; box limits indicate the 25th and 75th percentiles as determined by R software; whiskers extend to 5th and 95th percentiles, outliers are represented by dots. n > 203 second component (Additional file 5: Figure S2). This further analysis because results are not really related to indicates that at these locations in the respective years FT. Out of the data sets Ranzin_2012, Gatersleben_2012, phenotypic data are influenced by additional factors and Pushkin_2012, Novosibirsk_2012, Roshchinskiy_2012 are not entirely due to differences in FT. An additional and Roshchinskiy_2013 the LS means were calculated characteristic of phenotypic data is the CV and variance (Additional file 1: Table S1). (in %) calculation of each environment per year. High CVs (over 0.15) and/or variances (over 150) represent a good phenotypic distribution of all 235 genotypes within Candidate gene polymorphisms the field data sets. Ranzin_2013, Gatersleben_2013 and A set of 65 specific primer pairs from the previous Novosibirsk_2014 show very low CVs between 0.08 and study [9] was BLASTed against the IWGSC RefSeq 0.15 and variances between 54.3 and 149.4. The environ- allowing the identification of 39 primer combinations ment Pushkin_2013 shows a low variance with 51 but a that cover the full length of 19 genes and their struc- high CV value of 0.44. These values resulted from a very tural analyses. An optimised set of 39 primer pairs low winter survival with a low standard deviation was used for PCR amplification, amplicon sequencing (Table 1). In conclusion of the phenotypic data analysis, and association genetics studies. No exact position environments Ranzin_2013, Gatersleben_2013, Push- could be determined for the first forward primer of kin_2013 and Novosibirsk_2014 were excluded from PPD-B1 andthe thirdforwardprimer of VRN-D2, Table 1 Variance and coefficient of variation of frost tolerance data Location/Year Size Missing data Mean frost tolerance in % Standard deviation Variance Coefficient of variation Pushkin_2012 235 0 82.40 13.88 192.52 0.17 Pushkin_2013 235 0 16.16 7.14 51.01 0.44 Novosibirsk_2012 235 0 45.60 16.46 271.05 0.36 Novosibirsk_2014 235 0 82.58 12.22 149.42 0.15 Ranzin_2012 235 0 95.20 13.31 166.12 0.14 Ranzin_2013 235 0 97.67 8.19 54.27 0.08 Roshchinskyiy_2012 235 32 69.33 22.64 512.53 0.33 Roshchinskiy_2013 235 4 56.55 13.68 187.24 0.24 Gatersleben_2012 235 0 85.89 22.14 490.01 0.26 Gatersleben_2013 235 0 97.18 9.12 83.21 0.09 Babben et al. BMC Genomics (2018) 19:409 Page 7 of 24 since BLASTN revealed seven and five matches, re- Population structure and kinship spectively. For all other primers an exact position was Genetic profiles were obtained by using the ILLUMINA determined. In addition, an unassigned scaffold was Infinium iSelect 90 k wheat chip. These data were kindly identified for PPD-B1 located on chromosome 2B provided by the Dr. A. Börner from IPK Gatersleben and (Additional file 6: Figure S3). The total re-sequenced will be used for GWAS analysis in an additional paper. length of candidate genes was 13.3 Kbp coding DNA As an outcome of the STRUCTURE analysis based on sequence (CDS) and 43.76 Kbp genomic lengths, with 249 SNPs covering the whole genome, k = 3 was the a CDS/genomic length ratio of 0.30. In total, a se- most probable number of sub-populations mostly quence length of 33.5 Kbp was analysed. In our set of corresponding to the origin of genotypes. The 235 genotypes, the 15 candidate genes Cab, CBF-A3, neighbour-joining tree revealed three sub-populations CBF-A5, CBF-A10, CBF-A13, CBF-A14, CBF-A15, i.e. North American, Russian, and North and Central CBF-A18, Tacr7, VRN-B3, VRN-A1, VRN-B1, VRN-D1, European genotypes. This population structure resem- PPD-B1 and PPD-D1 were polymorphic and four bled a tight association to the origin of the genotypes (CBF-D1, Dhn1, VRN2 and Dem)(Table 2)were analysed. In the first sub-population, accessions from monomorphic. In total 254 polymorphic sites, i.e. 221 European countries, subdivided into two groups, i.e. SNPs and 33 indels were identified. The SNP number North and Central European genotypes, are included. per gene ranged from 0 to 97 and the indel number Accessions from Canada, Mexico and USA were pre- from 0 to 12. Over all genes, 42 polymorphic sites in dominantly in the second sub-population, and the third promoter regions, 64 in introns, 25 in 3` UTRs and sub-population contained predominantly accessions 123 in exons were identified. Out of the 254 poly- from Russia and Kazakhstan. The population structure is morphic sites, 131 were located in non-coding re- shown in Fig. 3 and Additional file 7: Figure S4. The gions, and 54 synonymous and 69 non-synonymous STRUCTURE membership coefficients and the modified polymorphic sites were identified (Fig. 2). The num- Roger’s distance revealed a high degree of admixture in a ber of haplotypes ranged between two and six and large number of accessions. Therefore, many accessions the haplotype diversity (Hd) between 0.07 and 0.68 could not be classified to main groups, because their (Table 3). genomes represent a mixture of the main groups. This Table 2 List of analysed FT candidate genes, sequence length, number of specific PCR fragments and detected mutations Gene Chr. position Number of specific Sequence length CDS length Gene length CDS/gene Detected Number of PCR fragments in bp in bp in bp length ratio mutations polymorphic sites CBF-D1 5D 1 709 639 639 1.00 no 0 CBF-A3 5A 1 790 741 741 1.00 yes 4 CBF-A5 7A 1 1027 633 633 1.00 yes 6 CBF-A10 5A 1 867 720 720 1.00 yes 2 CBF-A13 5A 1 855 720 720 1.00 yes 6 CBF-A14 5A 2 610/748 639 639 1.00 yes 6 CBF-A15 5A 1 786 726 726 1.00 yes 7 CBF-A18 6A 1 1045 738 738 1.00 yes 37 Dhn1 5D 1 936 420 638 0.66 no 0 VRN-A1 5A 4 1039/1097/621/770 735 11,414 0.06 yes 15 VRN-B1 5B 5 600/817/662/467/1077 735 5498 0.13 yes 24 VRN-D1 5D 4 814/1359/1429/705 735 11,550 0.06 yes 2 VRN-D2 4D 2 976/534 639 1650 0.39 no 0 VRN-B3 7B 2 1602/994 534 1258 0.42 yes 1 Cab 5A 1 728 n.a. n.a. n.a. yes 28 Dem 6B/6D 1 616 n.a. n.a. n.a. no 0 Tacr7 2B 1 765 n.a. n.a. n.a. yes 3 PPD-B1 2B 6 1600/954/927/571/773/378 1995 3053 0.65 yes 5 PPD-D1 2D 3 726/1094/966 1983 3141 0.63 yes 109 total 39 34,034 13,332 43,758 0.30 255 Chr. chromosome, bp base pairs, CDS coding DNA sequence, n.a. not available PPD-D1 PPD-B1 VRN-D1 VRN-B1 VRN-A1 Vrn3 Tacr7 CBF-A18 CBF-A15 CBF-A14 CBF-A13 CBF-A10 CBF-A5 CBF-A3 Cab Babben et al. BMC Genomics (2018) 19:409 Page 8 of 24 Polymorphic sites synonymous non-synonymous non-coding Candidate genes Fig. 2 Number of synonymous, non-synonymous and non-coding mutations per candidate gene may be due to germplasm exchange and it use in breed- region of CBF-A5, CBF-A13 and CBF-A14.The ing programs. remaining 17 SNPs and four indels were located in exon regions of five CBFs, PPD-D1, VRN-A1 and SNP/indel association analysis VRN-B3. 11 of these SNPs are non-synonymous. Six The association analysis was performed using 254 associated genes were located on wheat chromosome polymorphic sites identified in 15 candidate genes. 5A and one each on chromosome 2D, 7A and 7B. After MAF selection, 58 polymorphic sites, i.e. 46 Allelic effects of significantly associated polymorphic SNPs and 12 indels, from 13 candidate genes (Cab, sites on FT ranged from 0.11% to 15.01 (Fig. 4, CBF-A3, CBF-A5, CBF-A10, CBF-A13, CBF-A14, Table 4, Additional file 8:Table S3). With an LD of CBF-A15, Tacr7, VRN-B3, VRN-A1, VRN-D1, PPD-B1 r = 0.92 to 1, the statistical significances of associ- and PPD-D1) were included in further analysis. In ated SNPs and indels from the five CBF genes on the SNP/indel association, 27 statistically significant chromosome 5A (CBF-A3, CBF-A10, CBF-A13, (P < 0.05) polymorphic sites (21 SNPs and six indels) CBF-A14 and CBF-A15)isveryhigh. TheLDplotof in six candidate genes were identified (Table 4). Four all used SNPs/indels for association calculation is SNPs and two indels were located in the promoter showninFig. 5. Table 3 Polymorphic candidate genes, location of polymorphisms, amino acid change and haplotype diversity Gene Chr. Acces-sions Poly-morphic SNPs in-dels Promotor Intron Exon 3` UTR Non- Non-syno- Syno-nymous Haplo-types Hd sites coding nymous CBF-A3 5A 235 4 4 0 0 0 4 0 0 3 1 2 0.30 CBF-A5 7A 235 6 5 1 1 0 5 0 1 3 2 4 0.54 CBF-A10 5A 235 2 2 0 0 2 0 0 1 1 2 0.30 CBF-A13 5A 235 6 4 2 1 0 5 0 2 4 0 3 0.30 CBF-A14 5A 235 6 5 1 5 0 1 0 5 0 1 3 0.30 CBF-A15 5A 235 7 6 1 0 0 7 0 0 5 2 2 0.30 CBF-A18 6A 235 37 34 3 1 0 25 11 12 12 13 3 0.20 VRN-A1 5A 235 15 10 5 7 6 2 0 13 2 0 6 0.19 VRN-B1 5B 235 24 21 3 20 4 0 0 24 0 0 5 0.07 VRN-D1 5D 235 2 2 0 0 1 0 1 2 0 0 3 0.14 VRN-B3 7B 235 1 1 0 0 0 1 0 0 1 0 3 0.50 Cab 5A 235 28 24 4 5 0 12 11 16 12 0 6 0.64 Tacr7 2B 235 3 2 1 0 0 1 2 2 1 0 3 0.40 PPD-B1 2B 235 5 5 0 1 0 4 0 1 3 1 6 0.37 PPD-D1 2D 235 109 97 12 1 53 55 0 54 22 33 6 0.68 total 255 222 33 42 64 124 25 132 69 54 58 Chr. chromosome, SNP single nucleotide polymorphism, indel insertion-deletion, UTR untranslated region, Hd Haplotype Diversity Number of polymorphic sites Babben et al. BMC Genomics (2018) 19:409 Page 9 of 24 Fig. 3 Population structure of 235 wheat cultivars based on 249 SNPs. Each individual is represented by a single vertical line that is partitioned into Q colored segments (Q = 3) in the x-axis. The y-axis illustrated the Q-value Linkage disequilibrium (LD) and germplasm origin CBF-A10, CBF-A13, CBF-A14, CBF-A15, VRN-A1), 6A Out of 235 studied genotypes, 116 originate from (CBF-A18) and 7A (CBF-A5). The allelic effects of asso- Europe, 38 from North America and 81 from Asia. After ciated haplotypes for winter survival ranged from 0.68% MAF selection, 51 polymorphic sites were identified in to 33.95 (Fig. 6, Table 5, Additional file 8: Table S3). the genotypes from Europe, 88 in those derived from North America and 25 in Asian genotypes. All three dif- In silico sequence analysis ferent sub-populations show that the CBF genes on C-repeat binding factors (CBFs) chromosome 5A are in a very high LD, and are sepa- To validate and annotate obtained sequences of candi- rated from other 5A located genes, i.e. VRN-A1 and date genes, an in silico analysis was performed. NCBI Cab. All non-5A candidate genes show a low LD to each protein BLASTX of CBF AA sequences results in nine other in all sub-populations. homologues from five related and four unrelated species (Additional file 9: Table S4). For the CBFs (CBF-A3, Haplotype association analysis CBF-A5, CBF-A10, CBF-A13, CBF-A14, CBF-A15, A set of 44 haplotypes and components of haplotypes CBF-A18) that showed a significant association to FT, from 15 candidate genes were identified. After MAF se- the AP2/EREBP transcription factor domain was identi- lection, 32 haplotypes (six promoter haplotypes, 12 exon fied, which is predicted to encode a protein structure haplotypes, one intron haplotype, three 3` UTR haplo- with three β-strands and one α-helix (2gcc). The param- types and 10 whole gene haplotypes) in 14 candidate eters of protein structure modelling are shown in Table 6. genes (Cab, CBF-A3, CBF-A5, CBF-A10, CBF-A13, The CBF-A13_1 haplotype protein shows a low P-value − 4 CBF-A14, CBF-A15, CBF-A18, Tacr7, VRN-B3, VRN-A1, of 4.79 × 10 , and a high uGDT of 51, and a uSeqId of VRN-D1, PPD-B1 and PPD-D1) were used for haplotype 15 extent. Protein structure of CBF-A13_2 could not be analysis. In the haplotype association analysis, 22 haplo- predicted. In all remaining cases protein structure types (five promoter haplotypes, nine exon haplotypes, models were of high quality. Additionally, the PKK/ one intron haplotype and seven whole gene haplotypes) RPAGRxKFxETRHP and DSAWR motifs were identified, in ten candidate genes revealed significant associations which are typical features of CBF proteins. The AA se- (P < 0.05) (Table 5). CBF-A18 is the only exon haplotype quences between two CBF-A3 haplotypes revealed three which is not significantly associated to FT. The same ap- AA substitutions. The CBF-A3_SNP2 [C/G] on CDS site plies for promoter haplotypes of CBF-A5, CBF-A13, 263 leads to an Ala/Gly change, CBF-A3_SNP3 [A/G] CBF-A14, PPD-B1 and PPD-D1, intron haplotype of on CDS site 367 to a Ser/Gly change and CBF-A3_SNP4 PPD-D1 and the whole gene haplotypes of CBF-A5, [C/T] on CDS site 407 to a Ser/Phe change. The AA CBF-A13, CBF-A14, CBF-A18, PPD-B1, PPD-D1 and change of CBF-A3_SNP2 is located in the α helix of the VRN-A1. The associated genes are located on chromo- AP2 domain. The CBF-A3_SNP3 and CBF-A3_SNP4 is some 2B (PPD-B1), 2D (PPD-D1), 5A (CBF-A3, located upstream of the identified domains/motifs. The Babben et al. BMC Genomics (2018) 19:409 Page 10 of 24 Table 4 Statistics of significantly associated SNPs and indels Gene/Polymorphism name CDS and promotor site P-value -Log of P Polymorphism Effect in % Observations CBF-A3_SNP1 222 6.28E-10 9.20 A C 15.01 0.00 193 42 CBF-A3_SNP2 263 6.28E-10 9.20 C G 15.01 0.00 193 42 CBF-A3_SNP3 367 6.28E-10 9.20 A G 15.01 0.00 193 42 CBF-A3_SNP4 407 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A5_indel1 −83 6.00E-3 2.22 C – −4.52 0.00 141 84 CBF-A10_SNP1 471 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A10_SNP2 518 6.28E-10 9.20 G C 15.01 0.00 193 42 CBF-A13_SNP1 −11 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A13_indel3 19 6.28E-10 9.20 T – 15.01 0.00 193 42 CBF-A13_SNP2 92 6.28E-10 9.20 G A 15.01 0.00 193 42 CBF-A13_indel2 133 6.28E-10 9.20 – C 15.01 0.00 193 42 134 6.28E-10 9.20 – G 15.01 0.00 193 42 135 6.28E-10 9.20 – T 15.01 0.00 193 42 136 6.28E-10 9.20 – G 15.01 0.00 193 42 137 6.28E-10 9.20 – C 15.01 0.00 193 42 138 6.28E-10 9.20 – G 15.01 0.00 193 42 139 6.28E-10 9.20 – G 15.01 0.00 193 42 140 6.28E-10 9.20 – C A 15.22 0.00 6.75 193 41 1 141 6.28E-10 9.20 – G 15.01 0.00 193 42 142 6.28E-10 9.20 – C 15.01 0.00 193 42 143 6.28E-10 9.20 – A 15.01 0.00 193 42 144 6.28E-10 9.20 – G 15.01 0.00 193 42 145 6.28E-10 9.20 – G 15.01 0.00 193 42 146 6.28E-10 9.20 – G 15.01 0.00 193 42 147 6.28E-10 9.20 – G 15.01 0.00 193 42 148 6.28E-10 9.20 – C 15.01 0.00 193 42 149 6.28E-10 9.20 – A 15.01 0.00 193 42 150 6.28E-10 9.20 – A 15.01 0.00 193 42 151 6.28E-10 9.20 – C 15.01 0.00 193 42 152 6.28E-10 9.20 – G 15.01 0.00 193 42 153 6.28E-10 9.20 – C 15.01 0.00 193 42 154 6.28E-10 9.20 – G 15.01 0.00 193 42 155 4.00E-10 9.40 – G 15.01 0.00 193 42 156 6.28E-10 9.20 – G 15.01 0.00 193 42 157 6.28E-10 9.20 – G 15.01 0.00 193 42 158 6.28E-10 9.20 – C 15.01 0.00 193 42 159 6.28E-10 9.20 – G 15.01 0.00 193 42 160 6.28E-10 9.20 – G 15.01 0.00 193 42 161 6.28E-10 9.20 – T 15.01 0.00 193 42 162 6.28E-10 9.20 – G 15.01 0.00 193 42 163 6.28E-10 9.20 – G 15.01 0.00 193 42 164 6.28E-10 9.20 – G 15.01 0.00 193 42 CBF-A13_SNP3 294 6.28E-10 9.20 C A 15.01 0.00 193 42 Babben et al. BMC Genomics (2018) 19:409 Page 11 of 24 Table 4 Statistics of significantly associated SNPs and indels (Continued) Gene/Polymorphism name CDS and promotor site P-value -Log of P Polymorphism Effect in % Observations CBF-A14_indel1 −506 6.28E-10 9.20 G – 15.01 0.00 193 42 − 505 6.28E-10 9.20 T – 15.01 0.00 193 42 − 504 6.28E-10 9.20 G – 15.01 0.00 193 42 −503 6.28E-10 9.20 A – 15.01 0.00 193 42 − 502 6.28E-10 9.20 G – 15.01 0.00 193 42 −501 6.28E-10 9.20 T – 15.01 0.00 193 42 −500 6.28E-10 9.20 G – 15.01 0.00 193 42 − 499 6.28E-10 9.20 T – 15.01 0.00 193 42 −498 6.28E-10 9.20 G – 15.01 0.00 193 42 − 497 6.28E-10 9.20 A – 15.01 0.00 193 42 −496 6.28E-10 9.20 G – 15.01 0.00 193 42 − 495 6.28E-10 9.20 T – 15.01 0.00 193 42 CBF-A14_SNP1 − 449 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A14_SNP2 − 158 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A14_SNP3 −53 6.57E-10 9.18 C G 15.01 0.00 193 42 CBF-A14_SNP4 576 6.28E-10 9.20 G A 15.01 0.00 193 42 CBF-A15_SNP1 84 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A15_SNP2 243 6.28E-10 9.20 T C 15.01 0.00 193 42 CBF-A15_SNP3 293 6.28E-10 9.20 C T 15.01 0.00 193 42 CBF-A15_SNP4 397 6.28E-10 9.20 G A 15.01 0.00 193 42 CBF-A15_indel1 503 6.28E-10 9.20 T – 15.01 0.00 193 42 504 6.28E-10 9.20 C – 15.01 0.00 193 42 505 6.28E-10 9.20 G – 15.01 0.00 193 42 506 6.28E-10 9.20 T – 15.01 0.00 193 42 507 6.28E-10 9.20 C – 15.01 0.00 193 42 508 6.28E-10 9.20 G – 15.01 0.00 193 42 509 6.28E-10 9.20 T – 15.01 0.00 193 42 510 6.28E-10 9.20 C – 15.01 0.00 193 42 511 6.28E-10 9.20 G – 15.01 0.00 193 42 CBF-A15_SNP5 694 6.28E-10 9.20 A G 15.01 0.00 193 42 CBF-A15_SNP6 716 6.28E-10 9.20 C G 15.01 0.00 193 42 PPD-D1_indel1 1266 3.77E-2 1.42 – C 3.96 0.00 68 166 1267 3.77E-2 1.42 – G 3.96 0.00 68 166 1268 3.77E-2 1.42 – T 3.96 0.00 68 166 1269 3.77E-2 1.42 – C 3.96 0.00 68 166 1270 3.77E-2 1.42 – G 3.96 0.00 68 166 VRN-A1_SNP1 349 3.03E-4 3.52 C T Y 0.00 1.35 0.11 24 2 209 VRN-B3_SNP1 19 3.76E-2 1.43 C G 3.05 0.00 105 128 CDS coding DNA sequence, P probability, Log logarithm number of genotypes per allele AA site of CBF-A3_SNP2 is significant for negative se- His/Arg change which are located in the β strand 3 or lection (Fig. 7). The AA haplotypes of CBF-A5 show the α helix of the AP2 domain, respectively. Addition- three AA changes, which are not significantly associated ally, CBF-A5_SNP4 site underlies negative selection in the SNP/indel analysis. In contrast CBF-A5_SNP2 [C/ (Additional file 10: Figure S5). The AA haplotypes of A] resulted in an Arg/Ser and CBF-A5_SNP4 [A/G] in a CBF-A10 show an AA change of Gly/Ala based on Babben et al. BMC Genomics (2018) 19:409 Page 12 of 24 All five AA substitutions regions show no positive or negative selection (Fig. 8a, Additional file 13: Figure S8). The CBF-A14 AA haplotypes do not result in AA changes (Additional file 14: Figure S9). The CBF-A18 haplotypes carry 14 AA substitutions. However, none of these is associated with FT or underlies negative or posi- tive selection (Additional file 15: Figure S10). Vernalisation genes (VRN-A1 and VRN-B3) Nine homologue AA sequences of VRN-A1 and VRN-B3 were identified (Additional file 9: Table S4). The RaptorX analysis of VRN-A1 identified protein structures from MADS-box/MEF2 (myocyte enhancer factor 2) domain (3kov) and MEF2A (1egw), which are located in the same conserved domain. In addition, the keratin-like (K) domain (4ox0) was identified. The complete complex was described by Theissen et al. [102] Fig. 4 Manhattan plot of SNPs/indels in candidate genes for FT. The and Kaufmann et al. [103] with a MADS DNA binding -log10 (P-values) from the association analysis are plotted against (M) domain, a type II transcription factor containing the the respective candidate gene. The red horizontal line indicates the Intervening (I) domain, a Keratin-like coiled-coil (K) significance threshold at P < 0.05. For better visualisation, the domain and a C-terminal (C) domain (MIKC-type). The successive genes are shown in alternating black and green colours parameters of protein structure modelling are shown in Table 6. The VRN-A1_SNP1 [C/T/Y] on CDS site 349 CBF-A10_SNP2 [G/C] on CDS site 518. This substitu- generated an AA change of Leu/Phe in the K domain tion is upstream of the identified domains/motifs and at between α helix three and α helix four. Furthermore, in a site not showing positive or negative selection the region of this AA substitution, no positive or nega- (Additional file 11: Figure S6). The haplotypes of tive selection was identified (Fig. 8b, Additional file 16: CBF-A13 show completely different AA sequences. The Figure S11). single bp deletion on CDS site 19 of CBF-A13_indel re- The association study revealed that VRN-B3 is sig- sults in a frame shift that changes every AA from the nificantly associated with FT only in the SNP/indel seventh AA onwards. Furthermore, a stop codon at pos- analysis. The protein prediction analysis identified an ition 74 AAs was detected. The haplotype one shows a Hd3A protein (3axy), which is a mobile flowering sig- 32 bp deletion at CDS site 133 to 164 (CBF-A13_indel2). nal in rice (Table 6). This flowering time family pro- This deletion is localised in the AP2 domain and causes tein contains four α helices, seven β strands and one a stop codon after 135 AAs. In summary, haplotype one segment B [104]. The VRN-B3_SNP1 [C/G] on CDS of CBF-A13 shows a higher similarity to homologue AA site 19 generates an AA change of His/Asp directly sequences of related species and the AP2 domain than before the start of the α helix 1. Furthermore, this haplotype two. Due to the short AA sequences of both site is subject to significant negative selection (Fig. 8c, CBF-A13 haplotypes and low similarity to homologue Additional file 17: Figure S12). AA sequences, no assumptions about positive or negative selection can be made (Additional file 12: Figure S7). Photoperiod response genes (PPD-B1 and PPD-D1) The AA analysis of two CBF-A15 haplotypes revealed The association study revealed that the gene PPD-B1 is four AA changes and one AA indel. A nine-nucleotide significantly associated to FT only in the haplotype ana- insertion within the coding region (sites 503 to 511) re- lysis and PPD-D1 in both analyses. The NCBI protein sulted in three additional Ser residues in the haplotype BLASTX of the PPD-B1 and PPD-D1 AA haplotype se- with CBF_indel. The AA change Ala/Val caused by quences showed nine homologues (Additional file 9: CBF-A15_SNP3 [C/T] on CDS site 293 is located in the Table S4). On the basis of the protein BLAST analysis, AP2 domain. The CBF-A15_SNP4 [G/A] on CDS site the Pseudo-Receiver domain and the CONSTANS motif 397 resulted in an Ala/Thr AA change. The last AA of the pseudo-response regulator (PRR) protein family changes are Ser/Gly caused by CBF-A15_SNP5 [A/G] on were identified [39, 40]. In the RaptorX analysis the pro- CDS site 694 and Ser/Trp caused by CBF-A15_SNP6 tein models of a putative response regulator domain [C/G] on CDS site 716. The CBF-A15_SNP4, (3t6k) and response regulator receiver domain (3jte) CBF-A15_SNP5, CBF-A15_SNP6 and CBF-A15_inde1 were identified (Table 6). These domains contain five α are located upstream of the identified domains/motifs. helices and five β strands. The PPD-B1 haplotype AA Babben et al. BMC Genomics (2018) 19:409 Page 13 of 24 Fig. 5 LD plot of all used SNPs and indels for association analysis. On the left side and at the top the genes and chromosomes with polymorphic sites are shown. Each coloured square in the below triangle represents the intensity of LD expressed by P-values for each pairwise comparison between polymorphic sites. Each coloured square in the top triangle represents the intensity of LD expressed by r for each pairwise comparison between polymorphic sites. On the right side the legend for r values and P-values are shown exhibited three AA changes. The PPD-B1_SNP2 [A/G] other AA changes between haplotype two and three did on CDS site 304 generated an AA change of Arg/Gly not show significant association in the SNP/indel associ- and PPD-B1_SNP3 [A/G] on CDS site 368 generated an ation study (Additional file 19: Figure S14). AA change of Asn/Asp. Both are located in the Pseudo-Receiver domain. The PPD-B1_SNP2 is located In silico promoter analysis in the α helix 3 and PPD-B1_SNP3 between β strand 4 The significantly associated CBF-A5_inde1 [C/−], which and α helix 4. The third AA change from Asp to Asn is is located 83 bp upstream of the start codon, entails no caused by PPD-B1_SNP5 [G/A] on CDS site 623. All changes on the promoter regulatory sites and has no in- three AA substitution sites show no significant positive fluence on the transcription of the CBF-A5 gene. Also, or negative selection (Fig. 8d, Additional file 18: Figure the associated polymorphic sites CBF-A13_SNP1 of the S13). The associated PPD-D1_indel1 on CDS site 1266 CBF-A13 promotor and the sites CBF-A14_indel1, to 1270 bp produced a stop codon on AA position 470. CBF-A14_SNP1, CBF-A14_SNP2 and CBF-A14_SNP3 Therefore, haplotype one has no CONSTANS motif. All of the CBF-A14 promoter show no regulatory site Babben et al. BMC Genomics (2018) 19:409 Page 14 of 24 Table 5 Statistics of haplotypes significantly associated to FT Haplotype name Chr. P-value -Log of P Haplotype FT effect of haplotypes in % Observations CBF-A3 ex. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A5 pro. 7A 7.17E-3 2.14 1 2 4.44 0.00 82 143 CBF-A5 ex. 7A 1.58E-2 1.80 1 2 3 19.51 18.09 0.00 204 28 3 CBF-A5 7A 5.16E-4 3.29 1 2 3 4 24.18 19.29 18.88 0.00 55 27 140 3 CBF-A10 ex. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A13 ex. 5A 1.57E-10 9.80 1 2 3 15.22 0.00 8.47 193 41 1 CBF-A13 pro. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A13 5A 1.57E-10 9.80 1 2 3 15.22 0.00 8.47 193 41 1 CBF-A14 pro. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A14 ex. 5A 1.90E-10 9.72 1 2 3 17.01 2.11 0.00 193 40 2 CBF-A14 5A 1.90E-10 9.72 1 2 3 17.01 2.11 0.00 193 40 2 CBF-A15 ex. 5A 2.37E-11 10.63 1 2 15.01 0.00 193 42 CBF-A18 6A 6.88E-3 2.16 1 2 3 9.32 6.66 0.00 209 9 17 PPD-B1 pro. 2B 4.23E-2 1.37 1 2 5.50 0.00 213 22 PPD-B1 ex. 2B 3.69E-7 6.43 1 2 3 4 18.97 0.00 28.28 33.39 3 5 207 20 PPD-B1 2B 1.31E-7 6.88 1 2 3 4 5 19.03 0.00 33.95 29.07 22.63 3 5 20 185 22 PPD-D1 pro. 2D 6.95E-3 2.16 1 2 3 14.38 10.75 0.00 141 80 4 PPD-D1 in. 2D 3.93E-2 1.41 1 2 3 18.66 19.56 0.00 162 69 3 PPD-D1 ex. 2D 7.05E-3 2.15 1 2 3 21.80 18.18 0.00 68 163 3 PPD-D1 2D 3.78E-3 2.42 1 2 3 4 5 6 21.47 3.77 19.28 16.13 0.00 20.55 68 3 69 90 3 1 VRN-A1 ex. 5A 2.57E-4 3.59 1 2 3 4 0.00 0.68 13.12 11.89 9 15 2 209 VRN-A1 5A 2.87E-3 2.54 1 2 3 4 5 6 14.13 15.90 2.26 15.30 0.00 8.90 207 2 13 2 6 1 Chr. chromosome, ex. exon, pro. Promotor, in. intron, P probability, Log logarithm, FT frost tolerance number of genotypes per allele changes which are indicative for a modification of the gene transcription. Discussion After hybridisation, domestication and breeding activities have shaped the genome of bread wheat and reduced the level of genetic diversity which is nowadays the major limiting factor in breeding of cultivars resistant to biotic and abiotic stresses [105]. Therefore, sequencing of candidate genes in large collections is a tool to rediscover hidden genetic diversity and use this in breeding. LD and diversity The genotype group from Asia shows a very small diver- sity with 25 polymorphic sites out of 81 genotypes. That implies that these genotypes are very close to each other based on the analyzed genes. The polymorphic sites of Fig. 6 Manhattan plot of haplotypes based on FT candidate genes. the different chromosomes show a high LD among The -log10 (P-values) from the association analysis are plotted themselves but not between the different chromosomes. against candidate gene. The red horizontal line indicates the significance threshold at P < 0.05. The squares show the promotor, Consequently, the polymorphic sites which are in a high triangle the exon, circles the intron, diamonds the 3’ UTR and stars LD are inherited together. In contrast, genotypes from the whole haplotypes. For better visualisation, the successive genes North America show a very high diversity with 88 poly- are shown in alternating black and green colours morphic sites in 38 genotypes. That implies that these Babben et al. BMC Genomics (2018) 19:409 Page 15 of 24 Table 6 Protein structure modelling of associated genes Protein haplotypes P-value uGDT (GDT) uSeqId (SeqId) Score Domain/motif Literature −4 CBF-A3_1 3.03E 71 (62) 30 (26) 51 2gcc (AP2) [28] −4 CBF-A3_2 1.52E 71 (61) 29 (25) 51 2gcc (AP2) [28] −4 CBF-A5_1 3.22E 69 (67) 30 (29) 51 2gcc (AP2) [28] −4 CBF-A5_2 5.75E 56 (54) 30 (29) 54 2gcc (AP2) [28] −4 CBF-A5_3 9.71E 57 (56) 29 (28) 56 2gcc (AP2) [28] −4 CBF-A10_1 2.78E 70 (66) 31 (29) 51 2gcc (AP2) [28] −4 CBF-A10_2 2.88E 71 (67) 31 (29) 50 2gcc (AP2) [28] −4 CBF-A13_1 4.79E 51 (38) 15 (11) 36 2gcc (AP2) [28] CBF-A13_2 n.a. n.a. n.a. n.a. n.a. n.a. −4 CBF-A14_1 1.97E 71 (69) 29 (28) 52 2gcc (AP2) [28] −4 CBF-A14_2 1.97E 71 (69) 29 (28) 52 2gcc (AP2) [28] −4 CBF-A15_1 3.80E 70 (66) 29 (27) 50 2gcc (AP2) [28] −4 CBF-A15_2 2.95E 70 (70) 29 (29) 49 2gcc (AP2) [28] −3 CBF-A18_1 1.33E 55 (49) 29 (26) 52 2gcc (AP2) [28] − 4 CBF-A18_2 8.72E 55 (49) 29 (26) 54 2gcc (AP2) [28] − 4 PPD-B1 1.24E 105(62) 32(19) 130 3t6k (putative response regulator domain) n.a. −5 8.74E 101(60) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-B2 1.21E 104(61) 32(19) 129 3t6k (putative response regulator domain) n.a. −5 8.12E 101(59) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-B3 1.06E 103(60) 31(18) 129 3t6k (putative response regulator domain) n.a. −5 7.21E 102(59) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-B4 1.28E 104(61) 32(19) 130 3t6k (putative response regulator domain) n.a. −5 8.98E 102(59) 27(16) 134 3jte (response regulator receiver domain) [111] −4 PPD-D1_1 1.05E 104 (47) 32 (15) 131 3t6k (putative response regulator domain) n.a. −5 7.11E 102 (46) 27 (12) 136 3jte (response regulator receiver domain) [111] −4 PPD-D1_2 1.14E 105 (63) 32 (19) 130 3t6k (putative response regulator domain) n.a. −4 PPD-D1_3 1.16E 104(62) 32 (19) 129 3t6k (putative response regulator domain) n.a. −5 8.07E 101 (60) 27 (16) 134 3jte (response regulator receiver domain) [111] − 3 VRN-A1_1 8.02E 84 (83) 39 (39) 84 4ox0 (K domain) [110] −4 1.02E 78 (103) 38 (50) 53 3kov (MADS domain/I domain) [112] −4 1.30E 75 (98) 38 (50) 52 1egw (MADS domain/I domain) [113] −3 VRN-A1_2 8.47E 83 (82) 39 (39) 84 4ox0 (K domain) [110] −5 7.90E 73 (103) 38 (50) 54 3kov (MADS domain/I domain) [112] −5 9.11E 74 (98) 38 (50) 53 1egw (MADS domain/I domain) [113] −3 VRN-A1_3 7.95E 84 (83) 38 (38) 84 4ox0 (K domain) [110] −5 7.06E 78 (103) 38 (50) 54 3kov (MADS domain/I domain) [112] −4 1.08E 75 (99) 38 (50) 53 1egw (MADS domain/I domain) [113] VRN-B3_1 7.92E- 154 (87) 147 (83) 166 3axy (Hd3A protein) [104] VRN-B3_2 1.66E- 154 (87) 148 (84) 168 3axy (Hd3A protein) [104] P: probability, uGDT: unnormalized GDT (Global Distance Test) score, GDT: calculated as uGDT divided by the protein (or domain) length and multiplied by a 100, uSeqID: number of identical residues in the alignment, SeqID: uSeqID normalized by the protein (or domain) sequence length and multiplied by 100, n.a.: not available Babben et al. BMC Genomics (2018) 19:409 Page 16 of 24 Fig. 7 Amino acid alignment and nucleotide divergence rates (dN/dS) of CBF-A3 gene and homologous amino acid sequences. Shown are alignments of two haplotype AA sequences of CBF-A3 and nine homologue plant AA sequences. The numbers above the alignment indicate the sites of AAs. The black line above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the AP2 domain, the black arrows the β-strands and the black spirals the α-helices. The red arrows label the AA changes based on significantly associated SNPs or indels. The plot below the alignment shows the nucleotide divergence rates (dN/dS). The black line describes the dN/dS ratio. The red dots between alignment and plot indicate sites with significant negative selection (P < 0.05) genotypes are very distant to each other based on the 7A and 6A and do not belong to the FR-A2 locus. Add- analyzed genes. The gene cluster of CBFs (CBF-A3, itionally, all the SNPs identified in CBFs on chromosome CBF-A10, CBF-A13, CBF-A14 and CBF-A15)on 5A are in a very high LD (r = 0.92 to 1), indicating chromosome 5A shows a very high LD of r = 1. Conse- localisation at the same chromosomal region without or quently, this gene cluster has not been divided by mei- very rare recombination. This result confirms previous otic events but separated from Cab also located on knowledge of strong linkage of CBF genes and empha- chromosome 5A. An intermediate diversity was detected sises the importance of the FR-A2 locus in frost in genotypes from Europe with 51 polymorphic sites in response. On the other hand, results reveal that not all 116 genotypes. The LD analysis shows a very high LD members of the CBF family are involved in FT [50, 108]. within three blocks located on chromosome 5A but not Based on the occurrence of only two haplotypes of the between. Consequently the gene cluster (CBF-A3, CBF genes and the fact that they are very closely linked, CBF-A10, CBF-A13, CBF-A14 and CBF-A15), Cab and genotypes can be divided into two groups. Group one VRN-A1 are inherited independently. with 193 genotypes which shows 15% better winter sur- vival in comparison to group two (about 42 genotypes) Association study (Table 4). Therefore, breeding efforts in combining two FT is a highly complex and important trait of winter FT haplotypes are highly desirable towards creating elite wheat that is usually studied using QTL and expres- cultivars exhibiting high FT. sion profiling approaches [47, 106, 107]. In this study The CBF-A3_SNP2 [C/G] which is significantly − 10 we are presenting, to our best knowledge, the first associated with FT (P =6.28×10 ) results in Ala/ large scale candidate gene based association analysis Gly AA change. This AA is localised in the α helix of of FT in wheat. the AP2 domain and shows a high conservation of Accordingly, we identified significantly associated Ala in the AA alignment with homologues. Only the polymorphisms (SNPs/indels) in eleven studied genes as AA haplotype two had a Gly which corresponds to a well as respective haplotypes. Eight of these were de- significant reduction in winter survival of 15.01% in tected in both approaches (CBF-A3, CBF-A5, CBF-A10, the SNP/indel association study. Also haplotype one CBF-A13, CBF-A14, CBF-A15, PPD-D1 and VRN-A1). shows a 15.01% better winter survival compared to Out of the seven CBF genes, which are members of a haplotype two with Gly in the haplotype association large gene family that were investigated in this study, six study (Tables 4 and 5). Allen et al. [28] describe that revealed FT association in the SNP/indel and haplotype Ala contributes to the stabilisation of the protein method. Associated polymorphisms at the five candidate structure by its hydrophobic side chain. Through the genes CBF-A3, CBF-A10, CBF-A13, CBF-A14 and AA change from Ala to Gly it is possible that the Gly CBF-A15 are located at the FR-A2 locus [46, 47]on of AA haplotype two loses or impairs the functional- wheat chromosome 5A. The other two CBF members, ity as a transcription factor. Additionally, the dN/dS i.e. CBF-A5 and CBF-A18, are located on chromosomes ratio analysis shows that the CBF-A3_SNP2 site Babben et al. BMC Genomics (2018) 19:409 Page 17 of 24 ab cd Fig. 8 Nucleotide divergence rates (dN/dS) of CBF-A15, VRN-A1, VRN-B3 and PPD-B1 genes and homologue amino acid sequences. a Nucleotide divergence rates (dN/dS) between the identified haplotypes of CBF-A15, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicates the PKK/RPAGRxKFxETRHP motif and DSAWR, the red line the AP2 domain. The black arrows and clamp label the position of AA changes based on significantly associated SNPs and indels. The black line shows the dN/dS ratio. The red dots illustrate sites underlying significant negative selection (P < 0.05). b Nucleotide divergence rates (dN/dS) between the identified haplotypes of VRN-A1, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicates the MADS box and K domain, the red line the I and C domain. The black arrow labels the position of AA change based on significantly associated SNPs. The black line describes the dN/dS ratio. The red dots indicate sites with significant negative selection (P < 0.05). c Illustrated are nucleotide divergence rates (dN/dS) between the identified haplotypes of VRN-B3, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicatesthe segment B, the red line the flowering time (FT)-family protein. The black arrow labels the position of AA change based on significant associated SNPs. The black line describes the dN/dS ratio. The red dots illustrate sites with significant negative selection (P < 0.05). d Nucleotide divergence rates (dN/dS) between the identified haplotypes of PPD-B1, reference sequence of T. aestivum and eight homologous plant AA sequences. The black line on top indicates the Pseudo-Reciver domain, the red line the CONSTANS motif. The black arrows label the position of AA changes significantly associated SNPs. The black line describes the dN/dS ratio. The red dots indicate sites with significant negative selection (P < 0.05) underlies negative selection (Fig. 7), reflecting the to haplotype one respectively two. Due to the fact that association effects. In detail, haplotype two with an the exon haplotype three consists only of three geno- Gly on CBF-A3_SNP2 site shows lower winter sur- types it was not identified in the SNP/indel association vival of 15.01% compared to haplotype one. analysis (Table 5, Additional file 10: Figure S5). The SNP/indel association study of CBF-A5 exon The significantly associated SNPs and indels of SNPs revealed no significant association. In contrast, the CBF-A13 may disorganise the protein structure very exon haplotype is significantly associated with a reduced strongly. The one bp CBF-A13_indel1 of haplotype two winter survival of 19.51% respectively 18.01% of exon changed the complete AA sequences from the seventh haplotype three (Ser on site CBF-A5_SNP2) compared AA onwards and a stop codon is present after 74 AAs. Babben et al. BMC Genomics (2018) 19:409 Page 18 of 24 Since only six AAs are identical to the reference protein conclusion that SNP CBF-A3_SNP2 is the most inter- sequence, a complete loss of function may be assumed. esting CBF allele for FT improvement and at the The 32 bp deletion (CBF-A13_indel2) of haplotype one same time CBF-A15_SNP3 is the second most im- also generates a frame shift from AA 45 on and a stop portant one. Further details remain to be revealed by codon after 135 AAs. Hence, the AA haplotype two ex- investigations in the future such as complementation hibits no similarity and the AA haplotype one shows a of promising alleles in spring or winter wheat var- very low uSeqId of 15 but an appropriate P-value of ieties or protein functionality analysis of these alleles. − 4 4.79 × 10 compared to the AP2 domain. Therefore, it Identification of the VRN-A1_SNP1 [C/T/Y] revealed is most likely that both proteins are non-functional. that the genotypes with the base T and the genotypes However, haplotype one shows about 6.20% higher win- with the ambiguous nucleotide Y increase FT by 1.35% ter survival in the SNP/indel association study and in respectively 0.11% in comparison to the genotypes with the haplotype association study (9.80%) compared to the base C. The exon haplotype association study shows haplotype two. Consequently, it is possible that the pro- stronger effects. In detail, haplotype three (T) and haplo- teins of AA haplotype one have an increased transcrip- type four (Y) increase FT by 12.44% respectively 11.21%, tion factor efficiency compared to the proteins of AA in comparison to haplotype two (C). Chen et al. [53] and haplotype two. On the other hand the association may Eagles et al. [55] also identified the C and Y allele and be due to the very high LD within the FR-A2 locus. Diaz et al. [109] and Zhu et al. [107] described that the Regarding the two AA haplotypes of CBF-A15 an AA C allele is associated with a lower VRN-A1 copy number change in the AP2 domain from Ala to Val was identified and the Y allele with a higher copy number. In addition, which matched to the statistically significant associated Zhu et al. [107] described that an increase of the − 10 CBF-A15_SNP3 [C/T] with a P-value of 6.28 × 10 .Ala VRN-A1 copy number is associated with improved FT is conserved at this position at the nine homologue AA se- among the FR-A2-T allele of the CBF12 and CBF15 quences. The haplotype two which comprises Val in the genes but not among the FR-A2-S which are reflected in AA sequence shows 15.01% less winter survival in the our haplotypes one and two, respectively. However, the SNP/indel association study and 15.01% in the haplotype VRN-A1_SNP1 generates an AA change on a Leu con- association study. That indicates a functional loss, al- served side in the K-domain of a MIKC-type transcrip- though Ala as well as Val possess a hydrophobic side chain tion factor between α helix three and α helix four to a and are very similar in molecule size. A functional loss Phe. Puranik et al. [110] showed that this Leu stabilises due to this AA change is unlikely. On the other hand, no the kink region between α helix three and α helix four statistically significant positive or negative selection at this by extensive intra-molecule hydrophobic interactions of site was detected. multiple Leu residues. Both AA have hydrophobic side Only the whole gene haplotype of CBF-A18 is signifi- chains but Phe with its benzene ring strongly differs cantly associated to FT. Due to the MAF selection, no from Leu for its steric requirements. Therefore, it is pos- association study could be performed for all other haplo- sible that the Phe increases the angle between both α type components and polymorphic sites of CBF-A18 helices due to its bulkiness and the attachment to the (Table 5). target sequence of the transcription factor is improved. All other SNPs/indels of the seven CBFs, which re- Additionally the VRN-A1_SNP1 underlies no negative or sulted in AA changes, may be involved in FT but they positive selection. are not located in the highly conserved AP2 domain. The VRN-B3 gene is significantly associated to FT only But, the CBF-A3_SNP4, which is located upstream of in the SNP/indel association study. The polymorphism − 2 the AP2 domain, shows conserved AA sites. This may VRN-B3_SNP1 [C/G] shows a P-value of 3.76 × 10 . play an important regulatory role in FT. The associated This SNP generates a His/Asp AA change. The geno- polymorphisms of CBF-A5 and CBF-A14 are located types with His show a higher winter survival of 3.05% within the promoter and the in silico promoter analysis compared to those containing Asp (Table 4). Addition- of these haplotypes shows no promoter region differ- ally this site shows a high conservation of Asp regarding ences which explain modifications in gene transcription. all nine homologue AA sequences and underlies signifi- This study revealed significantly associated SNPs/ cant negative selection. All this data indicates that indels and haplotypes in seven CBF genes. Out of the VRN-B3 and respective homologous play a role in FT. associated polymorphisms two SNPs were identified, The polymorphisms of the PPD-B1 gene merely show whichresultedinanAA changeinthe highly con- significance in the haplotype association study. The exon served AP2 domain of the CBF-A3 and CBF-A15 haplotype three shows 5.11% better winter survival than protein. Both CBF genes were identified as important haplotype four. The difference between both haplotypes FT genes in Triticum monococcum and Triticum is the PPD-B1_SNP2 [A/G] which generates an Arg/Gly aestivum [47, 59, 61–63]. All this leads to the change in the α helix 3 of the Pseudo-Receiver domain. Babben et al. BMC Genomics (2018) 19:409 Page 19 of 24 The Arg is highly conserved for grasses [39, 40]. There- (PPD-B1_SNP2, PPD-B1_SNP3 and PPD-B1_SNP5) fore, it is possible that the Gly from haplotype three has (Additional file 18: Figure S13). a positive effect on FT. Another AA change in the The associated indel of PPD-D1 has an effect of 3.96% Pseudo-Receiver domain from Asn to Asp is caused by in the SNP/indel association study (Table 4). Haplotype the PPD-B1_SNP3 [A/G]. Asp at this position is highly one (with a deletion) shows 3.62% better winter survival conserved. Only haplotype one and the homologue AA compared to haplotype two in the haplotype association of Triticum aestivum show Asn resulting in a 9.25% study (Table 5). As a result of this deletion, a stop codon decreased winter survival in comparison to haplotype on AA position 470 occurs and the CONSTANS motif is four. The haplotype two, which is associated with missing. The consequence is that the PPD-D1 protein PPD-B1_SNP5 [G/A], and an Asp/Asn AA change shows cannot interact with the CO protein and the flowering 28.28% less winter survival in comparison to haplotype control pathway is interrupted [38]. Furthermore, the four. The most tolerant haplotype four originates from the interaction between the flowering time and FT pathway Asia group. The position of this AA is 51 AAs down- is disturbed (Additional file 19: Figure S14). stream of the Pseudo-Receiver domain and is slightly This study demonstrated polymorphisms significantly conserved. Association of the haplotypes one (three obser- associated with FT and the importance of the AA vations) and two (five observations) are based on very few changes of seven CBF gene family members and the observation and therefore the results should be interpreted VRN-A1, VRN-B3, PPD-B1 and PPD-D1 genes (Fig. 9). with caution. The dN/dS analysis revealed no significant These results may be used to design highly frost tolerant negative or positive selection for all three AA substitutions wheat cultivars via gene engineering or classical Fig. 9 Workflow for identifying associations for frost tolerance in wheat Babben et al. BMC Genomics (2018) 19:409 Page 20 of 24 breeding. To achieve this, six associated genes (CBF-A3, Additional file 7: Figure S4. Principal coordinate analysis of 235 wheat CBF-A15, VRN-A1, VRN-B3, PPD-B1 and PPD-D1) have cultivars. Three sub-populations based on geographical origin were shown in three colors. Blue, red and green indicate the cultivars from to be combined, employing the alleles which show the Europe, North America and Asia, respectively. The black dots indicatethe strongest positive effect in FT. We suggest a wheat culti- mixture gemplasm from three sub-populations. (PDF 42 kb) var with the haplotype one of both CBF genes, haplotype Additional file 8: Table S3. Polymorphic sites and haplotypes of three of PPD-B1, haplotype one of PPD-D1, haplotype association study and their PH-values and FT effects. (XLSX 14 kb) three of VRN-A1 and VRN-B3_SNP1 of VRN-B3 to Additional file 9: Table S4. Identities and e values of FT associated AA sequences and homologous AA sequences. (XLSX 21 kb) create a genotype with the theoretically highest FT Additional file 10: Figure S5. Amino acid alignment and nucleotide regarding the investigated genes. Additional candidate divergence rates (dN/dS) of CBF-A5 and nine homologous amino acid gene based association genetics studies in the field of FT sequences. Shownare alignments of two haplotype AA sequences of should focus on the COR (cold-regulated) genes and CBF-A5 and nine homologous plant AA sequences. The numbers above the alignment indicate the sites of AAs. The black line above the align- proteins. ment illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the AP2 domain, the black arrows the β-strands and the black spiral the Conclusion α-helix. The description of red arrows, black line und red dots is accord- ing to Fig. 6. (PDF 54 kb) In FT research and especially with respect to genome Additional file 11: Figure S6. Amino acid alignment and nucleotide based breeding, it is important to identify genes and al- divergence rates (dN/dS) of CBF-A10 gene and nine homologous amino leles involved in cold response. Based on the phenotypic acid sequences. Alignments of two haplotype AA sequences of CBF-A10 data from German and Russian field trials, this study and nine homologous plant AA sequences. The numbers above the alignment indicate the sites of AAs. The black line above the alignment illustrates the importance of the CBF genes of the FR-A2 illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the locus, VRN-A1, VRN-B3, PPD-B1 and PPD-D1 for FT of AP2 domain, the black arrows the β-strands and the black spiral the α- wheat. Polymorphisms in CBF-A3, CBF-A5, CBF-A10, helix. The description of red arrow, black line and red dots is according to Fig. 6. (PDF 53 kb) CBF-A13, CBF-A14, CBF-A15, CBF-A18, VRN-A1, Additional file 12: Figure S7. Amino acid alignment and nucleotide VRN-B3, PPD-B1 and PPD-D1 were identified which are divergence rates (dN/dS) of CBF-A13 gene and nine homologous amino significantly associated with FT. Besides this, we acid sequences. Illustrated are alignments of two haplotype AA sequences of CBF-A13 and nine homologous plant AA sequences. The detected significantly associated polymorphisms and numbers above the alignment illustrate the sites of AAs. The black line consequential AA substitutions in important cold above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR response protein domains. The results of this study motif, the red line the AP2 domain, the black arrows the β-strands and the black spiral the α-helix. The description of black line and red dots is demonstrated that the candidate based association gen- according to Fig. 6. (PDF 53 kb) etics approach is a very useful method to identify alleles Additional file 13: Figure S8. Amino acid alignment and nucleotide positively contributing to frost tolerance in hexaploid divergence rates (dN/dS) of CBF-A15 gene and nine homologous amino wheat. The FT associated SNPs/indels and haplotypes acid sequences. Illustrated are alignments of two haplotype AA identified may be used for developing diagnostic markers sequences of CBF-A15 and nine homologous plant AA sequences. The numbers above the alignment illustrate the sites of AAs. The black line for marker assisted selection (MAS) for frost tolerance above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR in wheat. motif, the red line the AP2 domain, the black arrows the β-strands and the black spiral the α-helix. The description of red arrows, black line and red dots is according to Fig. 6. (PDF 52 kb) Additional files Additional file 14: Figure S9. Amino acid alignment and nucleotide divergence rates (dN/dS) of CBF-A14 gene and nine homologous amino Additional file 1: Table S1. Plant material, origin, LS means of winter acid sequences. Illustrated are alignments of two haplotype AA sequences survival [%] and winter survival [%] of respective environments. (XLSX 34 kb) of CBF-A14 and nine homologous plant AA sequences. The numbers above Additional file 2: Table S2. PCR and sequencing primers. Shcherban the alignment illustrate the sites of AAs. The black line above the alignment 2 3 4 et al., 2012; Miller et al., 2006; Vagujfalvi et al., 2005; Yan et al., 2004; illustrates the PKK/RPAGRxKFxETRHP and DSAWR motif, the red line the AP2 5 6 7 8 Beales et al., 2007; Keilwagen et al., 2014; Knox et al., 2008; Babben et domain, the black arrows the β-strands and the black spiral the α-helix. The 9 10 al., 2015; Seki et al., 2011; Li et al., 2011b (XLSX 19 kb) description of black line and red dots is according to Fig. 6.(PDF51 kb) Additional file 3: Data S1. Java script used to extract differences from a Additional file 15: Figure S10. Amino acid alignment and nucleotide multiple sequence alignment (MSA). (TXT 5 kb) divergence rates (dN/dS) of CBF-A18 gene and nine homologous amino Additional file 4: Figure S1. Scatter plot of mean winter survival and acid sequences. Illustrated are alignments of two haplotype AA number of days under − 15 °C from ten environments. (PDF 24 kb) sequences of CBF-A18 and nine homologous plant AA sequences. The numbers above the alignment illustrate the sites of AAs. The black line Additional file 5: Figure S2. PCA plot of mean winter survival and above the alignment illustrates the PKK/RPAGRxKFxETRHP and DSAWR number of days under − 15 °C from ten environments. (PDF 52 kb) motif, the red line the AP2 domain, the black arrows the β-strands and Additional file 6: Figure S3. Localization of 19 candidate genes and the black spiral the α-helix. The description of black line und red dots is corresponding primers in the Chinese Spring Reference assembly v1.0. according to Fig. 6. (PDF 54 kb) Next to the gene names, the chromosome number and physical position Additional file 16: Figure S11. Amino acid alignment and nucleotide of the Chinese Spring reference assembly v1.0 are shown in brackets. For divergence rates (dN/dS) of VRN-A1 gene and nine homologous amino each gene the structure is illustrated below. The black lines indicate the acid sequences. Illustrated are alignments of three haplotype AA genomic sequences, the gray boxes the exons, the black arrows sequences of VRN-A1 nine homologous plant AA sequences. The rightward the forward primers and the black arrows leftwards the reverse numbers above the alignment illustrate the sites of AAs. The black line primers. (PDF 23 kb) Babben et al. BMC Genomics (2018) 19:409 Page 21 of 24 cultivars were deposited to GenBank under accession numbers MH264428- above the alignment illustrates the I domain and C domain, the red line MH264501 to be available in the NCBI Nucleotide database. the MADS box domain and K domain, the black arrows the β-strands and the spirals the α-helices. The description of red arrow, black line and red Authors’ contributions dots is according to Fig. 6. (PDF 79 kb) Conceived and designed the experiments: DP, FO and MK. Provided the Additional file 17: Figure S12. Amino acid alignment and nucleotide experimental material: AB, MK, TP and YC. Performed the experiments: SB, divergence rates (dN/dS) of VRN-B3 gene and nine homologous amino FA, YC, TP, AB and MK. Analysed the data: SB, JK, TB, ES, PJ, SET and DP. acid sequences. Illustrated are alignments of three haplotype AA Wrote the paper: SB, DP and FO. Study design, subject recruitment and sequences of VRN-B3 nine homologous plant AA sequences. The sample preparation: DP, MK and FO. Data interpretation: SB, JK, TB, ES, PJ, numbers above the alignment illustrate the sites of AAs. The black line SET, PK and DP. All authors read and approved the final manuscript. above the alignment illustrates the segment B, the red line the flowering time (FT)-family protein, the black arrows the β-strands and the spirals Ethics approval and consent to participate the α-helices. The description of red arrow, black line and red dots is The material used in this study consisted mostly of registered varieties a/o according to Fig. 6. (PDF 51 kb) lines publicly accessible. For all the accessions analysed no permissions or Additional file 18: Figure S13. Amino acid alignment and nucleotide licenses were needed. divergence rates (dN/dS) of PPD-B1 gene and nine homologous amino acid sequences. Illustrated are alignments of four haplotype AA Competing interests sequences of PPD-B1 and nine homologous plant AA sequences. The The authors declare that they have no competing interests. numbers above the alignment illustrate the sites of AAs. The red line above the alignment illustrates the Pseuodo Receiver domain and COSTANS motif. The description of red arrows, black line and red Publisher’sNote dots is according to Fig. 6. (PDF 8583 kb) Springer Nature remains neutral with regard to jurisdictional claims in Additional file 19: Figure S14. AA alignment and nucleotide published maps and institutional affiliations. divergence rates (dN/dS) of PPD-D1 gene and nine homologous amino acid sequences. Illustrated are alignments of four haplotype AA Author details sequences of PPD-D1 and nine homologous plant AA sequences. The Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, numbers above the alignment illustrate the sites of AAs. The red line Institute for Resistance Research and Stress Tolerance, Erwin-Baur-Str. 27, above the alignment illustrates the Pseudo Receiver domain and 06484 Quedlinburg, Saxony-Anhalt, Germany. Martin Luther University COSTANS motif. The description of black line and red dots is according to Halle-Wittenberg (MLU), Institute of Agricultural and Nutritional Sciences, Fig. 6.(PDF 7539 kb) Betty-Heimann-Str. 5, 06120 Halle (Saale), Saxony-Anhalt, Germany. Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, Institute for Biosafety in Plant Biotechnology, Erwin-Baur-Str. 27, 06484 Quedlinburg, Abbreviations Saxony-Anhalt, Germany. Deutsche Saatveredelung AG (DSV), Weißenburger AA: Amino acid; ANCOVA: Analysis of covariance; AP2/EREBP: APETALA2/ Str. 5, 59557 Lippstadt, Nordrhein-Westfalen, Germany. Leibniz Institute of Ethylene response element binding protein; BLAST: Basic local alignment Plant Genetics and Crop Plant Research (IPK), Resources Genetics and search tool; CAB: Calcium-binding EF-hand family protein-like; CBF: C-repeat Reproduction, Correnstraße 3, 06466 Seeland OT Gatersleben, Saxony-Anhalt, binding factor; CDS: Coding DNA sequence; CO: CONSTANS; COR/LEA: Cold- Germany. Max Planck Institute for Biology of Ageing, Joseph-Stelzmann-Str. responsive/late embryogenesis–abundant; CRT/DRE: C-repeat/dehydration- 9B, 50931 Cologne, Nordrhein-Westfalen, Germany. Agrophysical Research responsive element; CV: Coefficient of variation; DEM: Defective embryo and Institute (AFI), Grazhdanskii prosp. 14, 195220 St. Petersburg, Russia. Institute meristems; DH: Doubled haploid; DN/DS: Nucleotid divergence rate; FR: Frost of Cytology and Genetics of Siberian Branch of the Russian Academy of resistance; FT: Frost tolerance; GBP: Giga-base pairs; GLM: General linear Sciences, Prospekt Lavrentyeva 10, 630090 Novosibirsk, Russia. Saaten-Union model; GWAS: Genome-wide association study; HD: Haplotype diversity; Biotec GmbH, Hovedisser Str. 94, 33818 Leopoldshoehe, Nordrhein-Westfalen, ICE: Inducer of CBF expression; IFVCNS: Institute of Field and Vegetable Germany. Martin Luther University Halle-Wittenberg (MLU), Institute of Crops; INDEL: Insertion-deletion; IWGSC: International wheat genome Agricultural and Nutritional Sciences, Betty-Heimann-Str. 3, 06120 Halle sequencing consortium; KBP: Kilo-base pairs; LD: Linkage disequilibrium; (Saale), Saxony-Anhalt, Germany. LS: Least-Squares; M: Million; MAF: Minor allele frequency; MAFFT: Multiple Alignment tool Fast Fourier Transform; MCMC: Markov Chain Monte Carlo; Received: 15 December 2017 Accepted: 14 May 2018 MEF: Myocyte enhancer factor; MLM: Mixed linear model; MSA: Multiple sequence alignment; NGS: Next generation sequencing; NT: Nulli-tetrasomic; PCA: Principal component analysis; PCOA: Principal coordinate analysis; References PPD: Photoperiod; PRR: Pseudo-response regulator; QTL: Quantitative trait 1. Food and Agriculture Organization of the United Nations. http://www.fao. locus; RV: Reaction volume; SNP: Single nucleotide polymorphism; T: Tonne; org/home/en/ (2014). Accessed 03 Feb 2015. TACR7: Triticum aestivum cold-regulated 7; URGI: Unité de Recherche 2. Koemel JE, Guenzi AC, Anderson JA, Smith EL. Cold hardiness of wheat Génomique Info; UTR: Untranslated region K: number of populations; near-isogenic lines differing in vernalization alleles. Theor Appl Genet. 2004; VRN: Vernalisation 109:839–46. 3. Dvorak J, Akhunov ED. Tempos of gene locus deletions and duplications and their relationship to recombination rate during diploid and polyploid Acknowledgements evolution in the aegilops-triticum alliance. Genetics. 2005;171:323–32. The authors thank the International Wheat Genome Sequencing Consortium 4. Huang S, Sirikhachornkit A, Su XJ, Faris J, Gill B, Haselkorn R, Gornicki P. for pre-publication access to IWGSC RefSeq v1.0, and Katy Niedung for excellent Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate technical assistance. The authors thank to Majda Valjavec-Gratian from NCBI for kinase of the Triticum/Aegilops complex and the evolutionary history of help during preparation of sequences for submission to GenBank. polyploid wheat. Proc Natl Acad Sci U S A. 2002;99:8133–8. 5. Peng JHH, Sun DF, Nevo E. Domestication evolution, genetics and Funding genomics in wheat. Mol Breed. 2011;28:281–301. The authors thank the German Federal Ministry of Education and Research 6. Smith DB, Flavell RB. Characterization of wheat genome by renaturation (BMBF) for funding the project FROWHEAT (0315953). The field experiments in kinetics. Chromosoma. 1975;50:223–42. Novosibirsk were partially supported by ICG budget project #0324–2016-0001. 7. Choulet F, Wicker T, Rustenholz C, et al. Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and Availability of data and materials Transposable element spaces. Plant Cell. 2010;22:1686–701. All data presented in this study, as well as GenBank accessions and cultivars will 8. Wang S, Wong D, Forrest K, Allen A, Chao S, Huang BE, Maccaferri M, Salvi S, be publicly available. Sequences of individual haplotypes from representative Milner SG, Cattivelli L, et al. Characterization of polyploid wheat genomic Babben et al. BMC Genomics (2018) 19:409 Page 22 of 24 diversity using a high-density 90 000 single nucleotide polymorphism array. 31. Jaglo KR, Kleff S, Amundsen KL, Zhang X, Haake V, Zhang JZ, Deits T, Plant Biotechnol J. 2014;12(6):787–96. Thomashow MF. Components of the Arabidopsis C-repeat/dehydration- 9. Babben S, Perovic D, Koch M, Odon F. An efficient approach for the responsive element binding factore cold-response pathway are development of locus specific primers in bread wheat (Triticum aestivum L.) conserved in Brassica napus and other plant species. Plant Physiol. and its application to re-sequencing of genes involved in frost tolerance. 2001;127:910–7. PLoS One. 2015;10:e0142746. 32. Allagulova CR, Gimalov FR, Shakirova FM, Vakhitov VA. The plant dehydrins. Structure and putative functions. Biochemistry-Moscow. 2003;68:945–51. 10. Jordan KW, Wang S, Lun Y, et al. A haplotype map of allohexaploid wheat reveals distinct patterns of selection on homoeologous genomes. Genome 33. Close TJ. Dehydrins. A commonality in the response of plants to Biol. 2015;16:48. dehydration and low temperature. Physiol Plant. 1997;100:291–6. 11. School of Biological Sciences. University of Bristol. 2012. http://www. 34. Winfield MO, Lu CG, Wilson ID, Coghill JA, Edwards KJ. Plant responses to cerealsdb.uk.net. Accessed 01 Mar 2012. cold. Transcriptome analysis of wheat. Plant Biotechnol J. 2010;8:749–71. 12. Wilkinson P, Winfield M, Barker G, Allen A, Burridge A, Coghill J, Edwards K. 35. Gilmour SJ, Zarka DG, Stockinger EJ, Salazar MP, Houghton JM, Thomashow CerealsDB 2.0. An integrated resource for plant breeders and scientists. BMC MF. Low temperature regulation of the Arabidopsis CBF family of AP2 Bioinformatics. 2012;13:219. transcriptional activators as an early step in cold-induced COR gene 13. Unité de Recherche Génomique Info. Centre de Versailles (2013). http:// expression. Plant J. 1998;16:433–42. wheat-urgi.versailles.inra.fr. Accessed 20 Feb 2013. 36. Dhillon T, Pearce SP, Stockinger EJ, Distelfeld A, Li CX, Knox AK, Vashegyi I, Vagujfalvi A, Galiba G, Dubcovsky J. Regulation of freezing tolerance and 14. Unité de Recherche Génomique Info. Centre de Versailles (2013). https:// flowering in temperate cereals. The VRN-1 connection. Plant Physiol. 2010; wheat-urgi.versailles.inra.fr/Seq-Repository/. Accessed 20 Feb 2013. 153:1846–58. 15. Brenchley R, Spannagl M, Pfeifer M, et al. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature. 2012;491:705–10. 37. Kato K, Yamagata H. Method for evaluation of chilling requirement and 16. The International Wheat Genome Sequencing Consortium. A chromosome- narrow-sense earliness of wheat cultivars. Jpn J Breed. 1988;38:172–86. based draft sequence of the hexaploid bread wheat (Triticum aestivum) 38. Turner A, Beales J, Faure S, Dunford RP, Laurie DA. The pseudo-response genome. Science. 2014;345:1251788. regulator Ppd-H1 provides adaptation to photoperiod in barley. Science. 17. Raats D, Frenkel Z, Krugman T, et al. The physical map of wheat 2005;310:1031–4. chromosome 1BS provides insights into its gene space organization and 39. Matsushika A, Makino S, Kojima M, Mizuno T. Circadian waves of expression evolution. Genome Biol. 2013;14:R138. of the APRR1/TOC1 family of pseudo-response regulators in Arabidopsis thaliana. Insight into the plant circadian clock. Plant Cell Physiol. 2000;41: 18. Ling HQ, Zhao S, Liu D, et al. Draft genome of the wheat A-genome 1002–12. progenitor Triticum urartu. Nature. 2013;496:87–90. 40. Strayer C, Oyama T, Schultz TF, Raman R, Somers DE, Mas P, Panda S, Kreps 19. Jia JZ, Zhao SC, Kong XY, et al. Aegilops tauschii draft genome sequence JA, Kay SA. Cloning of the Arabidopsis clock cone TOC1, an autoregulatory reveals a gene repertoire for wheat adaptation. Nature. 2013;496:91–5. response regulator homolog. Science. 2000;289:768–71. 20. Winfield MO, Allen AM, Burridge AJ, Barker GLA, et al. High-density SNP genotyping array for hexaploid wheat and its secondary and tertiary gene 41. Danyluk J, Kane NA, Breton G, Limin AE, Fowler DB, Sarhan F. TaVRT-1, a pool. Plant Biotechnol J. 2016;14(5):1195–206. putative transcription factor associated with vegetative to reproductive transition in cereals. Plant Physiol. 2003;132:1849–60. 21. Allen AM, Winfield MO, Burridge AJ, et al. Characterization of a wheat 42. Trevaskis B, Bagnall DJ, Ellis MH, Peacock WJ, Dennis ES. MADS box genes breeders’ Array suitable for high-throughput SNP genotyping of global accessions of hexaploid bread wheat (Triticum aestivum). Plant Biotechnol J. control vernalization-induced flowering in cereals. Proc Natl Acad Sci U S A. 2003;100:13099–104. 2016;15(3):390–401. 43. Yan L, Loukoianov A, Tranquilli G, Helguera M, Fahima T, Dubcovsky J. 22. Li Y, Boeck A, Haseneyer G, Korzun V, Wilde P, Schoen CC, Ankerst DP, Bauer Positional cloning of the wheat vernalization gene VRN1. Proc Natl Acad Sci E. Association analysis of frost tolerance in rye using candidate genes and U S A. 2003;100:6263–8. phenotypic data from controlled, semi-controlled, and field phenotyping platforms. BMC Plant Biol. 2011;11:146. 44. Yan L, Loukoianov A, Blechl A, Tranquilli G, Ramakrishna W, SanMiguel P, 23. Orvar BL, Sangwan V, Omann F, Dhindsa RS. Early steps in cold sensing by Bennetzen JL, Echenique V, Dubcovsky J. The wheat VRN2 gene is a flowering plant cells. The role of actin cytoskeleton and membrane fluidity. Plant J. repressor down-regulated by Vernalization. Science. 2004;303:1640–4. 2000;23:785–94. 45. Yan L, Fu D, Li C, Blechl A, Tranquilli G, Bonafede M, Sanchez A, Valarik M, 24. Sangwan V, Foulds I, Singh J, Dhindsa RS. Cold-activation of Brassica napus Yasuda S, Dubcovsky J. The wheat and barley vernalization gene VRN3 is an BN115 promoter is mediated by structural changes in membranes and orthologue of FT. Proc Natl Acad Sci U S A. 2006;103:19581–6. cytoskeleton, and requires Ca2+ influx. Plant J. 2001;27:1–12. 46. Francia E, Rizza F, Cattivelli L, Stanca AM, Galiba G, Toth B, Hayes PM, Skinner JS, Pecchioni N. Two loci on chromosome 5H determine low- 25. Chinnusamy V, Ohta M, Kanrar S, Lee BH, Hong X, Agarwal M, Zhu JK. ICE1. temperature tolerance in a ‘Nure’ (winter) x ‘Tremois’ (spring) barley map. A regulator of cold-induced transcriptome and freezing tolerance in Theor Appl Genet. 2004;108:670–80. Arabidopsis. Genes Dev. 2003;17:1043–54. 47. Vagujfalvi A, Galiba G, Cattivelli L, Dubcovsky J. The cold-regulated 26. Liu Q, Kasuga M, Sakuma Y, Abe H, Miura S, Yamaguchi-Shinozaki K, transcriptional activator Cbf3 is linked to the frost-tolerance locus Fr-A2 on Shinozaki K. Two transcription factors, DREB1 and DREB2, with an EREBP/ wheat chromosome 5A. Mol Gen Genomics. 2003;269:60–7. AP2 DNA binding domain separate two cellular signal transduction pathways in drought- and low-temperature-responsive gene expression, 48. Zhao YS, Gowda M, Wurschum T, et al. Dissecting the genetic architecture of respectively, in Arabidopsis. Plant Cell. 1998;10:1391–406. frost tolerance in central European winter wheat. J Exp Bot. 2013;64:4453–60. 27. Stockinger EJ, Gilmour SJ, Thomashow MF. Arabidopsis thaliana CBF1 49. Galiba G, Quarrie SA, Sutka J, Morgounov A, Snape JW. RFLP mapping of encodes an AP2 domain-containing transcriptional activator that binds to the vernalization (vrn1) and frost-resistance (fr1) genes on chromosome 5a the C-repeat/DRE, a cis-acting DNA regulatory element that stimulates of wheat. Theor Appl Genet. 1995;90:1174–9. transcription in response to low temperature and water deficit. Proc Natl 50. Sutka J, Galiba G, Vagujfalvi A, Gill BS, Snape JW. Physical mapping of the Acad Sci U S A. 1997;94:1035–40. Vrn-A1 and Fr1 genes on chromosome 5A of wheat using deletion lines. 28. Allen MD, Yamasaki K, Ohme-Takagi M, Tateno M, Suzuki M. A novel Theor Appl Genet. 1999;99:199–202. mode of DNA recognition by a beta-sheet revealed by the solution 51. Stockinger EJ, Skinner JS, Gardner KG, Francia E, Pecchioni N. Expression structure of the GCC-box binding domain in complex with DNA. EMBO J. levels of barley Cbf genes at the frost resistance-H2 locus are dependent 1998;17:5484–96. upon alleles at Fr-H1 and Fr-H2. Plant J. 2007;51:308–21. 29. Dietz KJ, Vogel MO, Viehhauser A. AP2/EREBP transcription factors are part 52. Santra DK, Santra M, Allan RE, Campbell KG, Kidwell KK. Genetic and of gene regulatory networks and integrate metabolic, hormonal and molecular characterization of Vernalization genes Vrn-A1, Vrn-B1, and Vrn- environmental signals in stress acclimation and retrograde signalling. D1 in spring wheat germplasm from the Pacific northwest region of the Protoplasma. 2010;245:3–14. USA. Plant Breed. 2009;128:576–84. 30. Peng YL, Wang YS, Cheng H, Sun CC, Wu P, Wang LY, Fei J. Characterization 53. Chen YH, Carver BF, Wang SW, Zhang FQ, Yan LL. Genetic loci associated and expression analysis of three CBF/DREB1 transcriptional factor genes with stem elongation and winter dormancy release in wheat. Theor Appl from mangrove Avicennia marina. Aquat Toxicol. 2013;140:68–76. Genet. 2009;118:881–9. Babben et al. BMC Genomics (2018) 19:409 Page 23 of 24 54. Chen Y, Carver BF, Wang S, Cao S, Yan L. Genetic regulation of developmental 77. Li YL, Haseneyer G, Schon CC, Ankerst D, Korzun V, Wilde P, Bauer E. High phases in winter wheat. Mol Breed. 2010;26:573–82. levels of nucleotide diversity and fast decline of linkage disequilibrium in 55. Eagles HA, Cane K, Trevaskis B. Veery wheats carry an allele of Vrn-A1 that has rye (Secale cereale L.) genes involved in frost response. BMC Plant Biol. implications for freezing tolerance in winter wheats. Plant Breed. 2011;130:413–8. 2011;11:6. 78. Pritchard JK, Stephens M, Donnelly P. Inference of population structure 56. Reddy L, Allan RE, Campbell KAG. Evaluation of cold hardiness in two sets using multilocus genotype data. Genetics. 2000;155:945–59. of near-isogenic lines of wheat (Triticum aestivum) with polymorphic 79. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of vernalization alleles. Plant Breed. 2006;125:448–56. individuals using the software STRUCTURE. A simulation study. Mol Ecol. 57. Baga M, Chodaparambil SV, Limin AE, Pecar M, Fowler DB, Chibbar RN. 2005;14:2611–20. Identification of quantitative trait loci and associated candidate genes for low-temperature tolerance in cold-hardy winter wheat. Funct Integr 80. Ramasamy RK, Ramasamy S, Bindroo BB, Naik VG. STRUCTURE PLOT: a Genomics. 2007;7:53–68. program for drawing elegant STRUCTURE bar plots in user friendly interface. 58. Motomura Y, Kobayashi F, Iehisa JCM, Takumi S. A major quantitative trait SpringerPlus. 2014;3:431. locus for cold-responsive gene expression is linked to frost-resistance gene 81. Perrier X, Jacquemoud-Collet JP. DARwin software. 2006, http://darwin.cirad.fr/. Fr-A2 in common wheat. Breed Sci. 2013;63:58–67. 82. Goodman MM, Stuber CW. Races of maize. 6. Isozyme variation among 59. Vagujfalvi A, Aprile A, Miller A, Dubcovsky J, Delugu G, Galiba G, Cattivelli L. races of maize in Bolivia. Maydica. 1987;28:169–87. The expression of several Cbf genes at the Fr-A2 locus is linked to frost 83. Reif JC, Melchinger AE, Frisch M. Genetical and mathematical properties of resistance in wheat. Mol Gen Genomics. 2005;274:506–14. similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Sci. 2005;45:1–7. 60. Miller AK, Galiba G, Dubcovsky J. A cluster of 11 CBF transcription factors is located at the frost tolerance locus Fr-a(m)2 in Triticum monococcum. Mol 84. Wright S. Evolution and the genetics of populations. A treatise in four Gen Genomics. 2006;275:193–203. volumes. Vol. 4, variability within and among natural population. Chicago, 61. Sutton F, Chen DG, Ge XJ, Kenefick D. Cbf genes of the Fr-A2 allele are London: University of Chicago Press; 1978. p. 91. differentially regulated between long-term cold acclimated crown tissue of 85. The R Project for Statistical Computing. R software. https://www.r-project.org. freeze-resistant and -susceptible, winter wheat mutant lines. BMC Plant Biol. 86. Buckler Lab for Maize Genetics and Diversity. TASSEL software. http://www. 2009;9:34. maizegenetics.net/tassel. 62. Knox AK, Li CX, Vagujfalvi A, Galilba G, Stockinger EJ, Dubcovsky J. 87. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. Identification of candidate CBF genes for the frost tolerance locus Fr-a(m)2 TASSEL. Software for association mapping of complex traits in diverse in Triticum monococcum. Plant Mol Biol. 2008;67:257–70. samples. Bioinformatics. 2007;23:2633–5. 63. Soltesz A, Smedley M, Vashegyi I, Galiba G, Harwood W, Vagujfalvi A. 88. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Transgenic barley lines prove the involvement of TaCBF14 and TaCBF15 version 2-a multiple sequence alignment editor and analysis workbench. in the cold acclimation process and in frost tolerance. J Exp Bot. 2013; Bioinformatics. 2009;25:1189–91. 64:1849–62. 89. National Center for Biotechnology Information. U.S. National Library of 64. Kocsy G, Athmer B, Perovic D, Himmelbach A, Szucs A, Vashegyi I, Schweizer Medicine, Rockville pike. 2017. https://blast.ncbi.nlm.nih.gov/Blast.cgi. P, Galiba G, Stein N. Regulation of gene expression by chromosome 5A Accessed 12 May 2017. during cold hardening in wheat. Mol Gen Genomics. 2010;283:351–63. 90. RaptorX: a Web Portal for Protein Structure and Function Prediction. 65. Gulick P, Drouin S, Yu Z, Poisson G, Monroy AF, Sarhan F. Transcriptome Computational Institute at the University of Chicago, Chicago. 2017. http:// comparison of winter and spring wheat responding to low temperature. raptorx.uchicago.edu/StructurePrediction/predict/. Accessed 15 May 2017. Genome. 2005;48:913–23. 91. Kaellberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, Xu J. Template-based 66. Monroy AF, Dryanova A, Malette B, Oren DH, Farajalla MR, Liu W, Danyluk J, protein structure modeling using the RaptorX web server. Nat Protoc. 2012; Ubayasena LWC, Kane K, Scoles GJ, Sarhan F, Gulick PJ. Regulatory gene 7:1511–22. candidates and gene expression analysis of cold acclimation in winter and 92. Peng J, Xu J. RaptorX. Exploiting structure information for protein alignment spring wheat. Plant Mol Biol. 2007;64:409–23. by statistical inference. Proteins Struct Funct Bioinf. 2011;79:161–71. 67. Laudencia-Chingcuanco D, Ganeshan S, You F, Fowler B, Chibbar R, 93. Delpot W, Poon AF, Frost SDW, Kosakovsky Pond SL. Datamonkey 2010: a Anderson O. Genome-wide gene expression analysis supports a suite of phylogenetic analysis tools for evolutionary biology. Bioinfomatics. developmental model of low temperature tolerance gene regulation in 2010;26:2455–7. wheat (Triticum aestivum L.). BMC Genomics. 2011;12:299. 94. Kosakovsky Pond SL, Frost SDW, Muse SV. HyPhy: hypothesis testing using 68. Ganeshan S, Sharma P, Young L, Kumar A, Fowler DB, Chibbar RN. phylogenies. Bioinfomatics. 2005;21:676–9. Contrasting cDNA-AFLP profiles between crown and leaf tissues of cold- 95. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in acclimated wheat plants indicate differing regulatory circuitries for low accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–8. temperature tolerance. Plant Mol Biol. 2011;75:379–98. 96. Kumar S, Stecher G, Peterson D, Tamura K. MEGA-CC: computing of 69. Winfield MO, Lu C, Wilson ID, Coghill JA, Edwards KJ. Cold- and light- molecular evolutionary genetics analysis program for automated and induced changes in the transcriptome of wheat leading to phase transition iterative data analysis. Bioinformatics. 2012;28:2685–6. from vegetative to reproductive growth. BMC Plant Biol. 2009;9:55. 97. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein 70. Neumann K, Kobiljski B, Denčić S, Varshney RK, Börner A. Genome-wide sequence alignments into the corresponding codon alignments. Nucleic association mapping: a case study in bread wheat (Triticum aestivum L.). Acids Res. 2006;34:W609–12. Mol Breed. 2011;27(1):37–58. 98. SOGO Genome Information Database System for Innovation of Crop and 71. Endo TR, Gill BS. The deletion stocks of common wheat. J Hered. 1996; Livestock Production. Institute of Crop Science NARO, Tsukuba. 2017. 87:295–307. https://sogo.dna.affrc.go.jp/cgi-bin/sogo.cgi?lang=en. Accessed 09 Jul 2017. 72. Sears ER. Nullisomic-tetrasomic combinations in hexaploid wheat. In: Riley R, 99. Higo K, Ugawa Y, Iwamoto M, Korenaga T. Plant cis-acting regulatory DNA Lewis KR, editors. Chromosome manipulations and plant genetics. New elements (PLACE) database: 1999. Nucleic Acids Res. 1999;27:297–300. York: Springer Science+Business Media; 1966. p. 29–45. 100. Softberry. Softberry Inc., Mount Kisco. 2017. http://www.softberry.com/berry. 73. Stein N, Herren G, Keller B. A new DNA extraction method for high- phtml?topic=ann2_ann3&no_menu=on. Accessed 09 Jul 2017. throughput marker analysis in a large-genome species such as Triticum 101. Solovyev V, Shahmuradov I, Salamov A. Identification of promoter regions aestivum. Plant Breed. 2001;120:354–6. and regulatory sites. In: Ladunga I, editor. Computational biology of 74. Pearson K. On certain errors with regard to multiple correlation occasionally transcription factor binding. Totowa: Springer Science+Business Media; made by those who have not adequately studied this subject. Biometrika. 2010. p. 57–83. 1914;10:181. 102. Theissen G, Kim JT, Saedler H. Classification and phylogeny of the MADS- 75. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating box multigene family suggest defined roles of MADS-box gene subfamilies inhibitors. Proc Natl Acad Sci U S A. 1977;74:5463–7. in the morphological evolution of eukaryotes. J Mol Evol. 1996;43:484–516. 76. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT. A novel method for rapid 103. Kaufmann K, Melzer R, Theissen G. MIKC-type MADS-domain proteins. multiple sequence alignment based on fast Fourier transform. Nucleic Acids Structural modularity, protein interactions and network evolution in land Res. 2002;30:3059–66. plants. Gene. 2005;347:183–98. Babben et al. BMC Genomics (2018) 19:409 Page 24 of 24 104. Taoka K, Ohki I, Tsuji H, et al. 14-3-3 proteins act as intracellular receptors for rice Hd3a florigen. Nature. 2011;476:332–5. 105. Tanksley SD, McCouch SR. Seed banks and molecular maps. Unlocking genetic potential from the wild. Science. 1997;277:1063–6. 106. Knox AK, Dhillon T, Cheng HM, Tondelli A, Pecchioni N, Stockinger EJ. CBF gene copy number variation at frost Resistance-2 is associated with levels of freezing tolerance in temperate-climate cereals. Theor Appl Genet. 2010;121:21–35. 107. Zhu J, Pearce S, Burke A, See D, Skinner D, Dubcovsky J, Garland-Campbell K. Copy number and haplotype variation at the VRN-A1 and central FR-A2 loci are associated with frost tolerance in hexaploid wheat. Theor Appl Genet. 2014;127:1183–97. 108. Campoli C, Matus-Cadiz MA, Pozniak CJ, Cattivelli L, Fowler DB. Comparative expression of Cbf genes in the Triticeae under different acclimation induction temperatures. Mol Gen Genomics. 2009;282:141–52. 109. Diaz A, Zikhali M, Turner AS, Isaac P, Laurie DA. Copy number variation affecting the photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum). PLoS One. 2012;7:e33234. 110. Puranik S, Acajjaoui S, Conn S, et al. Structural basis for the oligomerization of the MADS domain transcription factor SEPALLATA3 in Arabidopsis. Plant Cell. 2014;26:3603–15. 111. Bachhawat P, Stock AM. Crystal structures of the receiver domain of the response regulator PhoP from Escherichia coli in the absence and presence of the phosphoryl analog Beryllofluoride. J Bacteriol. 2007;189:5987–95. 112. Wu Y, Dey R, Han A, Jayathilaka N, Philips M, Ye J, Chen L. Structure of the MADS-box/MEF2 domain of MEF2A bound to DNA and its implication for Myocardin recruitment. J Mol Biol. 2010;397:520–33. 113. Santelli E, Richmond TJ. Crystal structure of MEF2A core bound to DNA at 1. 5 angstrom resolution. J Mol Biol. 2000;297:437–49.

Journal

BMC GenomicsSpringer Journals

Published: May 29, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off