The effects of repeated whole genome duplication events on the evolution of cytokinin signaling pathway

The effects of repeated whole genome duplication events on the evolution of cytokinin signaling... Background: It is thought that after whole-genome duplications (WGDs), a large fraction of the duplicated gene copies is lost over time while few duplicates are retained. Which factors promote survival or death of a duplicate remains unclear and the underlying mechanisms are poorly understood. According to the model of gene dosage balance, genes encoding interacting proteins are predicted to be preferentially co-retained after WGDs. Among these are genes encoding proteins involved in complexes or in signal transduction. Results: We have investigated the way that repeated WGDs during land plant evolution have affected cytokinin signaling to study patterns of gene duplicability and co-retention in this important signal transduction pathway. Through the integration of phylogenetic analyses with comparisons of genome collinearity, we have found that signal input mediated by cytokinin receptors proved to be highly conserved over long evolutionary time-scales, with receptors showing predominantly gene loss after repeated WGDs. However, the downstream elements, e,g. response regulators, were mainly retained after WGDs and thereby formed gene families in most plant lineages. Conclusions: Gene dosage balance between the interacting components indicated by co-retention after WGDs seems to play a minor role in the evolution of cytokinin signaling pathway. Overall, core genes of cytokinin signaling show a highly heterogeneous pattern of gene retention after WGD, reflecting complex relationships between the various factors that shape the long-term fate of a duplicated gene. Keywords: Gene duplication, Whole-genome duplication, Gene retention, Gene dosage balance, Cytokinin signaling, Signal transduction, Evolution Background Studies of WGD events and their effect on core angio- Duplications of individual genes and whole genomes are sperm genes (i.e., gene families shared by all angiosperm a dominant feature of plant evolution and have been de- species) showed a generalized pattern of “gene duplic- tected in all land plant lineages [1–4]. Gene duplication ability”, meaning the ability of genes to be retained fol- is assumed to be a stochastic process and a common fate lowing WGD. Three categories could be defined: a) of a duplicate is its loss [5, 6]. However, the retention of “singleton” genes: the majority of core genes occur as duplicate genes seems to be biased toward certain func- single copies and are functionally involved in the main- tional classes of genes [7–9]. Another factor that seems tenance of genome integrity; b) “multicopy” genes: genes to influence the long-term survival of a duplicated gene remain in a duplicated state throughout time and are is the mode of duplication as genes which are predomin- functionally biased toward signaling, transport, and me- antly retained differ between whole-genome duplications tabolism; and c) “intermediate” genes: these genes show (WGDs) and small-scale duplication events [10–14]. a pattern of prolonged duplicate retention spanning sev- eral tens of millions of years following WGD but appear eventually to return to singleton status. This later group * Correspondence: ekaltenegger@bot.uni-kiel.de; aheyl@adelphi.edu Department Biochemical Ecology and Molecular Evolution, Botanical (intermediate genes) is enriched for genes that are Institute, Christian-Albrechts-University, Kiel, Germany involved in development, growth, and regulation of tran- Institute of Applied Genetics, Freie Universität Berlin, Berlin, Germany scription [9]. Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 2 of 19 For the categories b) and c), gene dosage balance after WGD include signal transduction genes in diverse (GDB) theory is discussed to be a major driver of gene species such as banana [21], Arabidopsiss [22], and the retention after WGD [9]. Basically, the GDB theory ciliate Paramecium tetraurelia [23]. Here, we have states that, for many genes whose products participate studied the pattern of gene retention and loss of the in protein complexes, the stoichiometry among interact- individual components of core cytokinin signaling after ing gene products (i.e., proteins) must be maintained repeated WGDs during land plant evolution to test [10, 15–18]. Thus, according to GDB, dosage-balance- whether a bias exists in the gene duplicability of the in- sensitive genes are predicted to be co-retained after a dividual components and to explore whether GDB can WGD event. These genes are also predicted to continu- explain the observed pattern. Cytokinins are plant hor- ously experience purifying selection after duplication mones that play pivotal roles in plant development and leading to prolonged retention. This prolonged retention its response to changes in the environment [24]. Various accompanied by the gradual circumvention of dosage- studies have indicated that the cytokinin signaling sys- balance-constraints may increase the possibility that du- tem was established in early divergent land plants, and plicate genes diversify (sub- or neofunctionalization) and even some Charophyceae green algae have been found become permanently preserved [17, 19]. Genes in the to encode family members of all four components of this “multicopy” group may have been retained – at least ini- signaling pathway [25–27]. Thus, cytokinin signaling is tially – because of dosage balance constraints. The an ideal model system for studying the way that the in- “intermediate” group of gene families can be explained dependent and repeated WGDs during land plant evolu- by a scenario of dosage balance that wears off over time, tion have affected the evolution of the individual leading to prolonged preservation but ultimate loss of components of a signaling pathway. duplicates [9]. The core signaling of the phytohormone cytokinin is GDB has been extended to informational pathways mediated via a variant of the two-component signaling (e.g., signal tansduction) [20], in agreement with the ob- system [28] (Fig. 1a). The cytokinin molecules are servation that preferentially retained gene categories perceived by binding to the Cyclases/Histidine kinases Fig. 1 Cytokinin signaling and repeated polyploidy events during land plant evolution. a Schematic representation of core cytokinin signaling. Cytokinin receptors perceive cytokinins, autophosphorylate and transmit the signal via HPTs to RRAs and RRBs. Pseudo-HPTs may compete for phosphotransfer with HPTs. RRAs are induced by cytokinins and function as negative regulators to form feedback regulatory loops. RRBs encode DNA-binding transcription factors that mediate cytokinin-dependent transcriptional activation [24]. b Repeated WGDs and WGTs during land plant evolution and sampling strategy [4, 21, 37, 40, 41, 43, 44, 50, 95, 101, 102]. The figure illustrates the phylogenetic tree topology for land plants [93, 94, 96, 97]. Klebsormidium flaccidum is placed on the basal lineage of current land plants marking the transition from the aquatic to the terrestrial life form [25]. Ancestral polyploidy events in seed plants and angiosperms are indicated by symbols and were inferred from the literature, given in the key. Gray boxes mark the 14 core species chosen for this study of comparative analyses of cytokinin signaling (Table 1). For all depicted species/lineages, genes encoding CHKs were identified and their evolutionary history was reconstructed. Additionally, the evolutionary history of HPTs, RRAs, and RRBs from species labeled with * was reconstructed Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 3 of 19 Associated Sensing Extracellular (CHASE) domain of a Table 1 Copy numbers of the cytokinin signaling components membrane-bound hybrid histidine kinase (CHASE do- Species CHKs HPTs (AHPT/PHPT) RRAs RRBs main containing histidine kinase, CHK) that serves as Arabidopsis lyrata 3 6 (5/1) 10 15 receptor [29, 30]. The binding of the hormone leads to Arabidopsis thaliana 3 6 (5/1) 10 15 the autophosphorylation of the histidine kinase domain. Brassica oleracea 4 8 (7/1) 15 19 After an intramolecular phosphotransfer to the c- Brassica rapa 4 8 (7/1) 20 20 terminal response regulator domain of the receptor, the Nelumbo nucifera 4 8 (6/2) 7 11 signal is transferred to histidine phosphotransfer pro- teins (HPTs). These proteins have been shown to shuttle Musa acuminata 8 10 (8/2) 14 14 between the cytoplasm and the nucleus [31]. The HPTs Oryza sativa Japonica Group 4 5 (2/3) 13 10 can be divided into enzymatically active and inactive Zea mays 7 5 (4/1) 12 9 orthologs (pseudo-HPTs). The pseudo-HPTs lack a Amborella trichopoda 2 4 (3/1) 4 6 conserved histidine residue that acts as a phosphoryl- Picea abies 2 7 (5/2) 12 7 ation site and negatively interfere with pathway activity Pinus taeda 3 3 (3/0) 13 9 [32, 33]. HPTs can phosphorylate the response regulator domain of various response regulators. In cytokinin sig- Selaginella moellendorffii 2 2 (2/0) 3 10 naling, two types of response regulators have been Physcomitrella patens 11 2 (2/0) 7 5 shown to be important: i) the type-B response regulators Klebsormidium flaccidum 9 1 (1/0) 3 1 (RRB), which are Myb type transcription factors that, Abbreviations: AHPT authentic His-containing phosphotransfer protein, PHPT upon phosphorylation, initiate the transcription of their pseudo His-containing phosphotransfer protein, which lacks the conserved His without EHD1 copies: OrysatHPT25 OsRR27 and OrysatHPT24 OsRR30, target genes, and ii) the type-A response regulators including B-IV group, B-V group excluded (classification according to Tsai et (RRA), which are transcriptionally regulated by the RRB al. [45]) [34] and have been shown to be negative regulators of the cytokinin signaling pathway [35]. Comparison of copy numbers of the various cytokinin The study presented here reveals that the individual signaling components among the investigated species components of cytokinin signaling were duplicated and Sequences encoding the four components of the cytoki- retained independently of each other. Although the cyto- nin signaling pathway were identified in the core species kinin signaling pathway expanded mainly via WGD and categorized as bona fide CHKs, HPTs, RRAs, or events, the observed pattern of gene duplicability and RRBs. The copy numbers of the identified components the pattern of co-retention after WGDs does not correl- varied between species and also between the different ate with the predictions of GDB. Instead, downstream protein families (Table 1). The number of cytokinin re- elements of the pathway show a trend towards higher ceptors was relatively stable between species, ranging gene duplicability compared with upstream elements. from two to four copies across most land plants. Excep- tions found were M. acuminata, Zea mays, Physcomi- trella patens, and K. flaccidum with eight, seven, eleven, Results and nine receptor genes, respectively. In contrast, for Repeated WGDs during land plant evolution provide the HPTs, a steady increase in copy number was detected background to study the evolutionary patterns of the during land plant evolution starting from two HPTs in cytokinin signaling components the moss P. patens to eight and 10 in Brassica species In order to study the evolutionary pattern of the individ- and Musa acuminata, respectively. In the gymnosperm ual components of cytokinin signaling after whole gen- Picea abies, seven HPT copies were identified, in com- ome duplications, plant species were chosen for further parison with three HPT copies in the closely related analysis to cover the major meso- and paleopolyploidy Pinus taeda. Another noteworthy trend was the emer- events reported in land plant evolution [4, 36–38] gence of pseudo HPTs in P. abies and in angiosperms (Fig. 1b). Furthermore, to allow the identification of all with copy numbers ranging from one to two, with the members of the four protein families involved in cytokinin exception of rice for which three pseudo HPTs were signaling pathway the availability of a large dataset, e.g., a identified. fully sequenced genome or transcriptome, was another RRA and RRB copy number increased steadily during criterion to select species. Thus, this study focused on 14 land plant evolution to form middle size gene families in “core” plant species (Table 1,Fig. 1b) for comparative ana- dicots and monocots. Furthermore, with a few excep- lyses of cytokinin signaling. Beginning with Klebsormidium tions, the number of RRAs and RRBs in flowering plant flaccidum as a representative of the Charophyceae, the species were found to be roughly equal (Table 1). algae lineage that gave rise to land plants, the whole These differences in the copy number between the four spectrum of land plants was covered. gene families involved in cytokinin signaling indicated that Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 4 of 19 the individual components experienced different evolu- studied (Additional file 2: Table S2). The results are tionary pressures that influenced their duplicability or summarized in Table 2 and information on colinearity is rather their retention after WGD. included in the reconciled CHK tree (Fig. 2). All reconciled trees supported two ancient duplication Reconstruction of cytokinin receptor evolution events: i) one duplication before the split of gymnosperm The complete sequence of cytokinin receptors included and angiosperms giving rise to the ancestor of the AHK4 four protein domains (PFAM domains of CHASE, clade and the ancestor of the AHK2/AHK3 clade (Fig. 2, HisKinaseA, HATPase, response regulator receiver) as Additional file 3: Figure S3) which coincides with the an- alignable regions, which covered in total 466 amino cient ζ WGD event [37]; ii) a second duplication event be- acids. To reconstruct CHK evolution during land plant fore the radiation of angiosperms giving rise to the AHK2 evolution, genes encoding CHKs of the 14 above- and AHK3 clades which coincides with a well-established mentioned “core species” (Table 1, Fig. 1) were analyzed. ε WGD event at the basis of angiosperm evolution [37]. A Additional species were sampled to improve the phylo- third ancient duplication event is predicted within the genetic reconstructions (Fig. 1). Thus, the final dataset monocot clade of AHK4. The different reconciled trees included CHKs from 51 plant species ranging from support either a correlation with the described commeli- mosses, lycophytes, ferns, and gymnosperms up to flow- nid specific τ WGD [40](Fig. 2a) or the grass specific σ ering plants (Additional file 1: Table S1). To test the ro- WGD [41](Fig. 2b). Lineage specific duplications of CHKs bustness of the tree topology, trees based on different are supported by the reconciled trees in Z. mays, Gossy- substitution models (nucleotide, codon, and protein sub- pium raimondii,and M. acuminata that correlate with stitution models) were calculated and compared. Fur- lineage specific WGDs. Exactely these species all have a thermore, maximum likelihood (ML) and Bayesian CHK copy number greater than four (see Table 1). reconstruction were performed. While genomic comparisons did not provide further In the resulting trees, the main branching pattern evidence that the above-described ancient duplications was highly similar. Robinson-Foulds-(RF)-distances could be traced back to the discussed WGD events, these (Additional file 2: Table S2) between pairs of trees comparisons could clearly show that the lineage specific showed that less than 25% of branches were dissimi- WGD in Z. mays and G. raimondii were involved in the lar for most pairwise comparisons. Codon models fit- copy number increase in these species. For example, in ted the data best according to a model test with Z. mays, the paralogous CHK pairs (ZeamayCHK01/ ModelOMatic [39]. Overall, phylogenetic signal in the CHK02, ZeamayCHK03/CHK04, ZeamayCHK05/CHK06) dataset was sufficient, and tree topology was robust. are located in collinear or syntenic regions (Fig. 2; Table 2; All reconstructed trees supported the presence of three Additional file 2: Table S2). Furthermore, the estimated di- major clades within angiosperm CHKs (Additional file 3: vergence time of the collinear regions (i.e., the paralogous Figure S1 and S2). These clades were named according to blocks) based on pairwise synonymous nucleotide substi- the three well-characterized cytokinin receptors of A. tution rates (Ks distances) vary between 0.15 and 0.29 and thaliana, which were located within the clades (the correlate well with the Ks distance characteristic for gene AHK2, the AHK3, and the AHK4 clade, respectively). The pairs created by the maize-lineage specific WGD [42, 43]. more basal branches of the phylogenetic tree reconstruc- Interestingly, only three of the four duplicates resulting tions were generally less supported. Furthermore, the from this WGD were retained (Fig. 2). In G. raimondii, positioning of the CHKs from gymnosperms was incon- genomic organization and Ks distances also supports the sistent in the various tree reconstructions. However, CHKs lineage-specific gene-expansion of CHKs via the five- to of the early diverging land plants (lycophytes and bryo- six-fold ploidy increase over ~ 60 Mya (Fig. 2; Table 2), phytes) fell in all tree reconstructions reproducibly outside characterized by Ks distances ~ 0.5 [44]. In M. acuminata the above-mentioned groups of the land plant CHKs. At however, which possesses eight CHK-encoding gene cop- the very base of the tree, we found sequences encoding ies and experienced three rounds of WGD in the range of CHKs of the algae K. flaccidum and the moss P. patens. 65 to 100 Mya [21], intragenomic comparisons only sup- port an origin of the paralogs MusacuCHK15/CHK5 Ancient duplications of cytokinin receptors (Table 2) through one of these events. Besides, one pair of To analyze the ancient duplication events during cytoki- tandem duplicates, MusacuCHK01/CHK03, has been nin receptor evolution in more detail, the above- identified within an ultracontig of the M. acuminata described set of gene trees (ML; Mr. Bayes, codon, genome. One further duplication event, which led to a protein, nucleotides trees) were reconciled with a species copy number increase in Brassica sp., could be identified tree that reflected the commonly accepted evolution of as a large scale duplication. The paralogs BrarapCHK04/ land plants (Fig. 1). Furthermore, genomic organization CHK03 are located within an collinear intragenomic concerning location in collinear or syntenic blocks was region (Fig. 2, Additional file 2: Table S2). Based on Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 5 of 19 Table 2 Gene colinearity of the cytokinin signaling components Table 2 Gene colinearity of the cytokinin signaling components a b (Continued) Gene pair Multiplicon Anchor points Ks/median Ks a b Gene pair Multiplicon Anchor points Ks/median Ks BrarapCHK04/CHK03 340,248 5 1.03/1.05 AralyrRRA02/RRA04 36,506 75 1.1/0.91 Syntenic region AralyrRRA05/RRA07 60,452 30 0.96/0.88 ZeamayCHK03/CHK04 11,281 135 0.19/0.26 AralyrRRA01/RRA03 31,061 179 0.62/0.88 ZeamayCHK05/CHK06 10,125 315 0.15/0.29 AralyrRRA09/RRA10 35,707 80 0.87/0.82 GosraiCHK02/CHK06 59,516 31 0.39/0.48 ArathaRRA03/RRA04 34,065 96 0.83/0.91 GosraiCHK02/CHK04 65,400 27 0.34/0.52 ArathaRRA05/RRA06 56,388 33 1.18/0.96 GosraiCHK04/CHK06 36,049 78 0.38/0.60 ArathaRRA03/RRA04 34,327 93 0.77/0.96 GosraiCHK03/CHK07 36,049 78 0.47/0.60 ArathaRRA07/RRA08 30,310 263 1.18/0.9 GosraiCHK03/CHK08 125,670 13 0.47/0.69 ArathaRRA09/RRA11 30,310 263 0.74/0.9 GosraiCHK07/CHK08 148,631 11 0.43/0.46 ZeamayRRA05/RRA08 11,007 162 1.26/0.26 GosraiCHK01/CHK05 98,488 17 0.39/0.57 ZeamayRRA10/RRA12 33,006 7 1.87/2.23 MusacuCHK05/CHK15 11,287 133 0.37/0.54 ZeamayRRA11/RRA12 11,559 116 0.40/0.33 BrarapHPT01/HPT03 52,943 36 1.12/1.24 MusacuRRA06/RRA07 37,422 6 0.98/0.66 BrarapHPT01/HPT02 296,202 6 0.93/1.34 MusacuRRA07/RRA08 35,224 7 1.45/1.16 BrarapHPT02/HPT03 45,950 44 0.32/0.38 MusacuRRA11/RRA14 16,973 19 0.50/0.50 BrarapHPT07/HPT08 29,877 334 0.63/0.37 MusacuRRA11/RRA13 35,273 7 0.73/0.64 ArathaHPT01/HPT02 99,381 17 0.50/1.19 MusacuRRA03/RRA20 15,306 25 0.74/0.60 MusacuHPT06/HPT09 17,637 17 0.36/0.66 MusacuRRA01/RRA02 25,508 9 0.41/0.41 MusacuHPT07/HPT09 39,756 6 0.36/0.85 MusacuRRA02/RRA04 16,323 21 0.53/0.72 ZeamayHPT01/HPT02 23,667 10 0.12/0.28 BrarapRRB41/RRB42 33,554 102 0.33/0.34 OrsaJaHPT01/HPT02 20,328 13 0.83/1.16 BrarapRRB32/RRB33 30,151 290 0.36/0.37 OrsaJaHPT03/HPT04 13,603 40 1.02/1.15 BrarapRRB37/RRB39 40,489 58 0.48/0.42 BrarapRRA02/RRA12 163,623 10 0.38/0.35 BrarapRRB47/RRB48 122,822 14 0.38/0.33 BrarapRRA02/RRA07 65,158 27 0.47/0.37 BrarapRRB59/RRB60 44,640 47 0.56/0.39 BrarapRRA07/RRA12 31,421 160 0.30/0.39 AralyrRRB18/RRB16 70,367 25 1.07/0.92 BrarapRRA07/RRA11 49,477 40 0.87/1.05 AralyrRRB25/RRB24 105,708 16 0.93/0.78 BrarapRRA04/RRA06 151,125 22 0.84/1.13 AralyrRRB37/RRB25 234,790 7 6.90/6.90 BrarapRRA06/RRA10 56,429 33 0.29/0.39 ArathaRRB22/RRB19 56,277 33 1.31/0.82 BrarapRRA08/RRA09 30,151 290 0.43/0.37 ArathaRRB23/RRB24 111,168 15 0.95/1.19 BrarapRRA01/RRA03 31,812 143 0.41/0.36 ArathaRRB23/RRB45 268,141 6 8.89/4.45 BrarapRRA17/RRA13 82,131 21 1.01/1.01 OrsaJaRRB21/RRB27 12,291 73 1.34/1.34 BrarapRRA17/RRA20 101,049 18 0.92/1.02 OrsaJaRRB21/RRB27 13,539 41 0.98/1.10 BrarapRRA17/RRA21 29,311 402 0.35/0.37 ZeamayRRB24/RRB25 14,587 30 1.30/1.79 BrarapRRA21/RRA13 65,663 27 0.11/1.03 ZeamayRRB21/RRB23 10,224 280 0.12/0.25 BrarapRRA21/RRA22 39,677 61 1.20/1.20 ZeamayRRB18/RRB27 10,125 315 0.14/0.29 BrarapRRA13/RRA22 30,346 258 0.25/0.36 MusacuRRB30/RRB33 17,165 18 0.40/0.56 BrarapRRA13/RRA20 31,080 178 0.27/0.38 MusacuRRB30/RRB36 17,906 17 0.49/0.55 BrarapRRA14/RRA15 61,084 30 0.79/1.19 MusacuRRB39/RRB42 16,390 21 0.47/0.72 BrarapRRA14/RRA18 30,346 258 0.26/0.36 MusacuRRB53/RRB57 17,546 17 0.53/0.53 BrarapRRA15/RRA18 40,545 58 1.05/1.11 colinear gene pairs BrarapRRA15/RRA22 350,553 5 16.41/2.30 Ks distance of the gene pair given in column 1/median Ks distance of the multiplicon OrsaJaRRA06/RRA07 10,605 211 1.30/1.30 OrsaJaRRA08/RRA09 12,741 56 1.47/1.33 OrsaJaRRA01/RRA02 10,348 252 0.02/0.1 Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 6 of 19 Fig. 2 (See legend on next page.) Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 7 of 19 (See figure on previous page.) Fig. 2 Reconciled tree of CHK encoding sequences. a Reconciled Bayesian tree of CHK encoding sequences (codon substitution model). Gene tree reconciliation includes rearrangement of branches with a support of less than 70% (posterior probability). Branch support of > 70% is given in the tree. Original Mr. Bayes tree see. Additional file 3: Figure S2. The optimization criterion for reconciliation was the number of duplications and losses. The three dominant clades of CHKs are indicated. Duplicates inferred to be lost are labeled (*LOST) and are given in gray. Of note, most duplicates resulting from WGDs that were lost thereafter were not detected. Thus, WGD events that occurred during CHK evolution are indicated with empty symbols (see key) on the corresponding branches, if resulting duplicates were not retained. If, according to phylogenetic inference plus syntenie/collinearity relationships, a copy resulting from a WGD duplication was retained, duplication nodes are labeled with the corresponding filled symbols, and Ks distances between the gene pairs are given. Furthermore, the median Ks, and the number of gene pairs of the total collinear region is given in brackets. Further duplications inferred from phylogenetic analysis are labeled with a black dot, and if phylogenetic inference indicates that a WGD is involved, the node is also labeled with an empty symbol. Additionally, tandem duplicates of M. acuminata are specified and labeled with the symbol #. Possible CHK pseudogenes from M. acuminata and Z. mays are given in italic. b Alternative tree topology within the AHK4 clade (monocots) of the reconciled maximum likelihood tree (codon substitution) phylogenetic reconstruction and Ks distances, they origi- amplification. According to the reconciled tree, HPTs nated from the Brassicaceae-wide α WGD. experienced further lineage-specific amplification in Concerning the functionality of the above mentioned mono- and dicots through WGDs. For example, three of additional CHK copies in Z. mays, G. raimondii and M. the five HPTs from A. thaliana (AHP2, AHP3 and acuminata, in silico analyses of transmembrane helices AHP5) are found in a clade specific for dicots (HPT predict only three of the eight MusacuCHK copies clade 1, Fig. 3a) and most likely originated by the Brassi- (MusacuCHK01, CHK06 andCHK15) and five of the caceae-specific α and β WGD (Fig. 3a, Additional file 2: seven ZeamayCHK copies (ZeamayCHK03–07) encode Table S2). Gene colinearity of AHP2 and AHP3 support functional receptors while all CHKs from G. raimondii this phylogeny based reconstruction of WGD events are predicted to be functional. Additionally, for Zea- (Table 2, Fig. 2). In contrast, the other two HPTs from mayCHK01/CHK02/CHK04 secretory signal peptides A. thaliana that group with monocot HPTs were not that targets its passenger protein for translocation across amplified by the Brassicaceae-specific WGDs, meaning the endoplasmic reticulum membrane in eukaryotes are that the resulting duplicates have been lost (Fig. 3a,HPT predicted. clade 2 and 3). The monocot HPTs increased either via In summary, given the scenario of ancient origin of the pancereal σ or ρ WGD (Fig. 3a, HPT clade 2) which three CHK copies together with the repeated WGDs that is supported by colinearity of OsAHP1/OsAHP2 and occurred in the subsequent radiation of angiosperm spe- OsPHP1/OsPHP3. However, with the limitations of Ks cies, this indicates that the most common fate of CHK distance based dating in mind, it is not possible to duplicates resulting from WGDs is gene loss. Only in clearly identify which of the two WGD was involved some species that experienced rather recent polyploidi- (Table 2, Fig. 3). The duplicates resulting from the meso- zation events (Ks values 0.1–0.4) have several duplicates polyploidy events in the lineage towards Brassica and Z. been retained. mays were predominantly lost, except the paralogous pairs BrarapHPT02/HPT03, BrarapHPT07/HPT08, and HPT copy number increased through palaeo- and ZeamayHPT01/HPT02 (Fig. 3b). All three pairs are mesopolyploidy events located in colinear regions with Ks distances similar to We analyzed and reconstructed the evolution of HPT- the duplicates of the lineage specific polyploidy events. encoding genes analogous to the CHKs by combining HPTs in M. acuminata showed similar to CHKs a highly phylogenetic tree reconstruction, gene tree-species tree species-specific evolutionary pattern and included reconciliation, and gene synteny analyses to obtain tandem duplications and/or possibly retroposition as evi- predictions of duplication events and their timing. denced by the observation that MusacuHPT02/HPT04 Compared with the analyses of CHKs, a reduced dataset are located on one chromosome in close vicinity including only angiosperm species was used as the align- (chromosome 7) and MusacuHPT02/HPT03/HPT04 able region of the HPTs covering only the 85 amino completely lack introns. acids of the PFAM HPT domain contained limited Concerning the evolution of pseudo-HPTs and thus the phylogenetic signal. The HPT copy number in the evolution of a new regulatory component in cytokinin analyzed species ranged from five to 10 (Table 1). signaling, the phylogenetic reconstructions support that The reconciled trees supported two different topolo- they have emerged at least twice independently. The gies with either two or three repeated duplications duplicability of monocot pseudo-HPTs (OsPHP1/OsPHP3, before the split of mono- and dicots (Fig. 3a and see above) differed from the dicot pseudo-HPTs as the Additional file 3: Figure S4) indicating that ζ and ε former were amplified via a lineage-specific WGD event (ρ WGD may have been involved in ancestral HPT and/or σ) while the latter were not amplified. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 8 of 19 Fig. 3 Reconciled tree of HPT encoding sequences. Pseudo HPTs without the canonical histidine are marked with a gray box. a Reconciled Bayesian HPT tree (codon substitution model). Figure style is analogous to that of Fig. 2 concerning color code and the labeling of duplication events. Gene tree reconciliation includes the rearrangement of branches with a support less than 90% (posterior probability). Branch support of > 90% is given in the tree. The three dominant clades of HPTs and tandem duplicates are indicated. b Alternative maximum likelihood tree topology of HPT clade 3, supporting the proposal that the gene pair BrarapHPT8/HPT7 originated by the α’ WGT In summary, the most dominant fate of HPT dupli- but individual clades were consistent between the two cates that arose from WGD events was again gene loss. reconstructions (Additional file 3: Figure S5 and S6). However, some HPTs copies were retained after the β, α, These branches are highlighted in Fig. 5. α, ’ and ρ WGD events which in comparison did not led Although similar numbers of RRAs and RRBs exist in to a similar copy number increase in the upstream flowering plants (Table 1), the in-depth phylogenetic acting CHKs. analyses show that their evolutionary pattern is different. For RRAs, the reconciled trees support a scenario of constant copy number increase via repeated paleao- and Response regulator families expand continuously via mesopolyploidy events. The RRAs of monocots and di- repeated WGD cots form two main groups indicating that their last The evolution and duplication pattern of RRAs and common ancestor possessed two RRA copies, which RRBs of mono- and dicots was reconstructed separately might have arisen from the ancient ζ or ε WGD. Dupli- by using the same approach as that for CHKs and HPTs. cation and differentiation of RRAs then occurred inde- The 112-amino-acid PFAM response regulator receiver pendently during monocot and dicot radiation leading to (RR) domain was used as alignable region. However, the the observed two main clades and the increase in copy tree signal of RRBs was poor indicated by a relative high number in both plant groups. For example, the phylo- percentage of low branch supports. Thus, the threshold genetic analyses indicate that β and γ WGD were in- for the branch support for gene tree reconciliation in the volved in the expansion of RRAs within clade 1 (Fig. 4). RRB trees was lowered to > 0.75 for SH-like branch Phylogeny and collinearity suggest an origin of four par- support. Furthermore, RRB trees reconstructed with alogous RRA pairs in Arabidopsis thaliana (ARR6/ maximum likelihood and Bayesian inference were incon- ARR5, ARR15/ARR7, ARR8/ARR9, ARR17/ARR16) via gruent with regard to the basal branching pattern and the α WGD event. Thus, RRA duplicates were more also the placing of some highly diverged RRB sequences likely to be retained compared to the CHKs and HPTs from M. acuminata (MusacuRRB38, RRB40, RRB43), evolution. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 9 of 19 Fig. 4 Reconciled maximum likelihood tree of RRA encoding sequences. Figure style is analogous to that in Fig. 2 concerning color code and the labeling of duplication events. Gene tree reconciliation includes the rearrangement of branches with a support less than 85% (SH-like branch support). Branch support of > 85% is given in the tree. The two dominant clades of RRAs are indicated Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 10 of 19 On the other hand, RRBs show a more heterogeneous did not give rise to land plants [47]. However, their func- pattern of evolution. Some clades share sequences from tion in these green algae is unclear. Thus, we are monocots and dicots (Fig. 5, clades w - z), suggesting tempted to speculate that the CHKs might have been re- that the last common ancestor possessed at least four cruited specifically in the Charophyceae lineage to serve RRB copies. Of note, within three of these clades, which as a new “plug-in” for the established HPT-RRB net- include A. thaliana PRR2, ARR11, and ARR14 (clade w, work. As has recently been proposed, the reorganization y, z), WGD events rarely led to an increase of the copy of gene regulatory network architecture is a major factor number. However, individual RRB copies, such as the A. underlying evolution, and phytohormonal pathways thaliana ARR10/ARR12 and ARR2/ARR1 gene pairs, might be used to redirect such gene regulatory networks originated via WGD events. Compared with RRAs, amp- to fulfill new functions [48]. lification of RRBs via WGD is patchier. Moreover, RRBs show one more noticeable feature: some gene copies in Cytokinin signaling pathway expanded via WGD events O. sativa subspec. Japonica [45] and B. rapa originated The applied approach of integrating phylogenetic ana- from tandem duplication. lyses with comparisons of genome collinearity, which To summarize, RRAs and RRBs differ in their evolu- has previously been successfully applied to identify an- tionary pattern, although both gene families show a high cient WGD in monocots [40], indicates that many of gene duplicability. RRAs exhibit a clear trend towards the duplications that have affected cytokinin signaling gene retention instead of gene loss after WGD. RRBs components are genome-wide or other large-scale du- may have been amplified by additional duplication plications. Also for ethylene signaling in M. acuminata, mechanisms, e.g., tandem duplication. However, the WGD has been shown to be the dominant duplication reconstruction of RRBs evolution is in general vaguer, mode [49]. However, a few exceptions can be found to suggesting that they might comprise a functionally this common trend. In O. sativum subspec. Japonica, heterogeneous group of sequences. the RRBs OsPRR11, OsPRR12, OsRR29, and OsRR28 originated by tandem duplications (Fig. 5), consistent Discussion with repeated unequal cross-overs [45]. In M. acumi- Genes encoding cytokinin receptors were recruited early nata, the genes encoding MusacuHPT02/HPT04 have in land plant evolution for cytokinin signaling been identified as tandem duplicates. Both copies are The results of our phylogenetic reconstructions of CHK, intronless. MusacuHPT03 is closely related to this pair HPT, RRA, and RRB evolution are in solid agreement of HPTs and also lacks introns. One can speculate that with similar, albeit smaller-scale studies [25, 26, 45, 46] retroposition was involved in the origin of these three and support an early origin of cytokinin signaling at the copies; this would represent yet another mechanism for base of land plant evolution, and functional cytokinin gene duplication affecting members of the cytokinin signaling perhaps is even present in K. flaccidum. How- signaling pathway. ever, the CHKs of K. flaccidum, which form a monophy- In order to estimate the timing of WGD events that letic clade at the basis of the CHK gene trees, group affected cytokinin signaling, we have used Ks distances next to a group of CHKs from the early diverging land between paralogous pairs located in collinear intrage- plant P. patens recently identified by Gruhn et al. [2]. Al- nomic blocks. These distances only offer rough dates though a heterologously expressed member of this sub- that have to be taken with caution because of variable family (PpCHK4) has previously been shown to be able rates among the different gene families [40] and between to bind cytokinin hormones and to translate this binding species [50, 51]. Nevertheless, the WGDs identified as into a cellular response in a dose-dependent manner affecting cytokinin signaling by using this approach are [46], in P. patens a group of three further CHKs evolved in good agreement with those in previous studies. A which clade with the classic cytokinin receptors from genome alignment spanning major Poaceae lineages sup- the other land plants [46]. Genetic and biochemical ex- ports the amplification of the cytokinin signaling compo- periments have shown that these three classic cytokinin nents through WGD events in this lineage [51]. receptors are necessary to mediate the cytokinin re- Furthermore, a phylogenomic analysis of ancestral poly- sponse in this moss [27]. One can speculate at this point ploidy events indicates that RRAs and CHKs were amp- that these three copies were recruited to function specif- lified before the split of basal angiosperms (Aristolochia, ically in cytokinin signaling and have evolved from an- Liriodendron, Nuphar, and Amborella), most likely via cestors forming the very basal group of CHKs from K. the ε WGD [37]. flaccidum and P. patens, whose biological function needs to be investigated. Patterns of GDB in cytokinin signaling Interestingly, HPTs and RRBs are also present in Of special interest in this study has been the testing of Chlorophyceae algae, the branch of the green algae that whether the cytokinin signaling components show a Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 11 of 19 Fig. 5 Reconciled maximum likelihood tree of RRB encoding sequences. Figure style is analogous to that of Fig. 2 concerning color code and the labeling of duplication events. Gene tree reconciliation includes the rearrangement of branches with a support less than 70% (SH-like branch support). Branching pattern in Bayesian RRB tree reconstructions differed for basal branches from maximum likelihood tree reconstructions. Groups that are consistently reconstructed in Bayesian and maximum likelihood trees are marked with gray boxes. Branch support of > 70% (SH-like branch support/posterior probability) is given in the tree Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 12 of 19 signature of GDB. Many previous studies have implied trend exists in that downstream elements in cytokinin that GDB preferentially “acts” on multiprotein com- signaling are more likely to be retained while the up- plexes and signal transduction/regulatory networks [15, stream elements tend to be lost (Fig. 6b). This pattern 17, 20]. The central assumption is: if a pathway, whose is in agreement to the trend that upstream genes in a components are linked in a dosage-sensitive relationship, biochemical pathwayevolve moreslowlythandown- is duplicated as a whole via a WGD event, the relative stream genes [52, 53], which might be explained by dosage between genes will be preserved, and the dupli- network characteristics, as most likely the rate- or cated dosage-sensitive genes will be preferentially flux-controlling elements on which natural selection retained. Thus, “interacting” genes are bound to be co- preferentially acts are the upstream elements in a retained over evolutionary time [15]. In core cytokinin pathway [54]. signaling, interacting genes are CHKs, HPTs, RRAs and Focusing on the upstream part – the CHKs, which RRBs (Fig. 1a) and if GDB is a dominating force, co- channel the signal into cytokinin signaling – we found retention after WGD of these components is predicted relatively simple orthologous relationships, and flower- (Fig. 6a), leading to a concerted increase of all interact- ing plants have a similar repertoire of CHKs [45, 46]. ing genes. According to our data, however, a different Whereas CHKs were amplified before the split of Fig. 6 Schematic illustration of the discussed fate of core cytokinin signaling after WGD events. a Gene dosage balance (GDB) between interacting components of core cytokinin signaling predicts co-retention of the encoding genes after WGD leading to a concerted increase of upstream and downstream elements. b Upstream genes, i.e. CHKs that mediate signal input, are duplication resistant compared to downstream element leading to a unequal increase of the individual components of the signaling pathway. In this scenario, perceiving of the signal is conserved but the output can be individually regulated by diverse regulators. c GDB between antagonistic HPTs and pseudo-HPT predicts co-retention. Unequal loss could result in a shift towards either the activator or repressor, depending on which gene is lost Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 13 of 19 angiosperms possibly via WGD events, the subsequent strongly differ from HPTs. For example, the ancestor of repeated rounds of palaeopolyploidy events during flow- AHP6 and its orthologs in Brassica experienced accord- ering plant evolution did not affect copy number of ing to the reconstructed phylogeny up to three WGD CHKs, meaning that the resulting duplicates were lost. events (α, β, γ) but the resulting duplicates have not This pattern of an unique amplification at the trunk of a been retained (Fig. 3). However, the corresponding an- tree with very few or no subsequent additions of gene tagonists (AHP2, AHP3, AHP5) were partly amplified copies is consistent with the phenomenon of “frozen du- via the α and β WGDs. Thus, in Arabidopsis and plications” described by Makarova et al. [55] as evolu- Brassica, either the pseudo-HPTs experienced unequal tionarily stable paralogous clusters that are not further loss after WGDs (Fig. 6c), or their function as repressor amplified. Only the more recent mesopolyploidy events evolved only after the last common WGD. In rice, both led to a copy number increase of CHKs in G. raimondii, the pseudo- and the authentic-HPT gene family show a Z. mays, and possibly also in M. acuminata. In the two copy number increase due to polyploidy events, however latter species however, several of the additional copies most likely different, pancereal WGD events were in- contain extra sequence motifs like transmembrane heli- volved. Thus, no pattern of co-retention between ces or secretory signal peptides possibly rendering them pseudo-HPT and authentic HPTs is observed. non-functional in cytokinin signaling. One can speculate, In general, lineage-specific gene family expansions that these copies are in the process of pseudogenization seem to be one of the principal means of adaptation and and eventually will be purged from the genome, which one of the most important sources of organizational and may also be the long-term fate for the additional CHK regulatory diversity in crown-group eukaryotes [57]. In copies in G. raimondii. cytokinin signaling, expansion is mostly found for down- HPTs, RRAs, and RRBs displayed overall a higher de- stream elements. This amplification might allow the gree of lineage-specific expansion compared with that of regulatory fine tuning (subfunctionalization) or the re- CHKs. However, the in-depth phylogenetic analyses wiring of signal output to execute novel functionality. showed that the copy number increase of these compo- Furthermore, no unique pattern of co-retention after nents can be traced back to different polyploidy events WGDs, indicative of GDB among the encoded proteins, meaning that the downstream parts were not amplified was identified in cytokinin signaling. Possible explana- in a concerted way. Especially striking is the different tions for this result are: i) an important prerequisite for evolutionary pattern of RRAs and RRBs. While RRAs GDB to manifest is that the duplicates are co-expressed. show a continuous increase in copy number via WGDs, While immediately after a WGD event paralogs show an RRBs are much more heterogenous. But, RRBs comprise identical expression pattern as coding as well as regula- in general a more heterogenous group of genes. Most tory sequences have been duplicated, expression pattern but not all of the RRB genes encode an additional Myb subsequently can change. Indeed, expression divergence domain and some RRBs lack the conserved phosphoryl- between paralogs seems to be the rule rather than the ation site. Functional differentiation between these clas- exception [58] and represents an important way of sub- ses is not well understood and thus, the present analyses functionalization between duplicates. Furthermore, gene might guide further functional studies. dosage balance can be compensated by regulatory Another interesting aspect of the cytokinin signaling changes [18]. To further test the potential role of GDB are the pseudo-HPTs. GDB is especially suggested to be in the evolution of cytokinin signaling, detailed co- a driving force between proteins with opposing actions, expression studies of the interacting components such as enzymatic or transcriptional activators and in- (Fig. 6a), the antagonistic interactions partners (Fig. 6c) hibitors within a pathway, and might shape the duplic- as well as their duplicated copies in polyploids of differ- ability of the encoding genes [15]. Within the cytokinin ent age are necessary. GDB predicts that interaction signaling pathway the HPTs and the pseudo-HPTs, partners should show a correlated expression level. which lack the conserved His residue, are discussed to Thus, after a WGD, an unequal loss of one of the interac- be antagonistic players and thus predicted to be co- tions partners needs to be compensated according to the retained after WGD (Fig. 6c). Of the six HPTs from A. GDB. For several model plants, online tools to study thaliana, only AHP6 is a pseudo-HPT [24, 32]. It nega- expression pattern are available to get first insights in the tively interferes with pathway activity, most likely by dynamic of duplication and expression pattern, e.g. the competing with AHP1–5 for interaction with the phos- Bio-Analytic Resource for Plant Biology (bar.utoronto.ca). phorelay machinery [31, 56]. Whereas in A. thaliana And ii) a central assumption of GDB, namely that gene only one pseudo-HPT (AHP6) has been identified, this expression is directly correlated to copy number variation group is especially large in rice with three copies (CNV), may be not valid. For example, CNV in humans (OsPHP1, PHP2, and PHP3) (Fig. 3). Regarding duplic- only partly account for the differences in expression be- ability, or co-retention after WGD, pseudo-HPTs tween individuals, whereas a large portion of the variance Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 14 of 19 must stem from other sources [35]. Furthermore, most retention of the various components of cytokinin signaling genes present in CNVs in Drosophila melangolaster show and the diverse modes of duplication as besides WGD also no evidence of increased or diminished transcription [59]. tandem duplication and possibly retroposition was found. Thus, this central assumption of GDB needs to be further Obviously, various mechanisms at diverse levels of inter- studied. action act in shaping the evolution of this signal transduc- However, while GDB may be of minor importance to tion pathway, all of which require further experimental co-stabilize duplicates of cytokinin signaling in the long exploration. Detailed mechanistic studies of specific candi- run, in M. acuminata, two antagonistic components of date genes, e.g., young paralogs in neopolyploid species, ethylene signaling have been shown to be co-retained including analyses of gene expression, gene function (in after the three lineage-specific WGD events indicating vivo and in vitro), and protein interactions will allow to that GDB shaped their evolution [49]. Together, these gain a more complete picture of the forces that shape the results reflect the complexity of molecular mechanisms fate of a duplicated gene. that shape gene duplicability. The fixation and the reten- tion of duplicated genes in plant genomes seem to be Methods context-dependent events, and relevant factors include Identification of cytokinin receptor encoding sequences intrinsic properties, such as gene function, and the en- Sequences encoding putative or confirmed cytokinin re- vironment in which the duplication occurred [60, 61]. ceptors were obtained from the genomes and transcrip- Several specific gene features including gene ontology tomes of selected species (Additional file 1: Table S1, (GO-slim classification), sequence-related features (gene Additional file 4: alignment 1) by screening the online and protein sizes, the GC content in the third codon platforms “Phytozome”, “Ensemble”, and “1 K database” position, protein domain size), expression-related fea- [66–68] by using the Basic Local Alignment Search Tool tures (level of expression and biotic and abiotic respon- (BLAST). Species were chosen to cover the evolution of siveness), and conservation-related features (omega, the cytokinin signaling during land plant evolution by using ratio of relative fixation rates of synonymous and nonsy- the charophyte alga Klebsormidium flaccidum [25] nonymous mutations, as an indicator of selective pres- as an outgroup. For each species, we performed an itera- sure), have been shown to influence gene retention after tive BLASTN search with the sequences encoding the WGD [56]. Another important feature of proteins effect- Cyclase Histidine kinase Associated Sensory Extracellu- ing the duplicability of the encoding genes might be lar (CHASE) domain of cytokinin receptors of the model their tendency to form symmetrical homomers. Duplica- plant Arabidopsis thaliana as query sequences tion of a gene that encodes a homomeric protein can (NCBI accession numbers AHK4 NM_201667, AHK2 lead to the phenomenon of paralog interference, which BT002530, AHK3 NM_102494). The CHASE domain is basically predicts a functional link between paralogous the ligand binding domain of cytokinin receptors [29]. genes via interactions of the encoded proteins in multi- Subsequently we used every identified sequence again as meric complexes and one important outcome of paralog query to search for further sequences with homology to interference is a prolonged retention timer after duplica- cytokinin receptors in the respective species. Sequences tion [62]. Examples of paralog interference has been de- were named as CHASE-domain containing His kinase scribed for MADS box transcriptional regulators in the (CHK) according to the common nomenclature [69] and yeast Kluyveromyces lactis [63] and steroid receptors numbered serially. For the simple and rapid identifica- [64], both form dimers. Of note, also cytokinin receptors tion of the species, the first three initials of the genus are also thought to function as dimers [28, 65] and para- and of the species name are given, e.g., Aratha for log interference could influence their duplicability. A Arabidopsis thaliana (Additional file 1: Table S1). prolonged retention of CHKs in Z. mays compared to To differentiate between Oryza sativa ssp. japonia HPTs, RRAs and RRBs (compare Figs. 2, 3, 4 and 5) and Oryza sativa ssp. indica,the first two initials could be an indicator but further studies of protein in- of the genus, species, and subspecies name are given. Fur- teractions are necessary. thermore, we added the annotation of well-characterized cytokinin receptors from Arabidopsis thaliana [70] Conclusions and Oryza sativa ssp. japonica [71]. If different The observation of the non-random loss of genes follow- splicing variants of a gene were identified, only the longest ing WGD has stimulated much discussion regarding the variant was used for subsequent analyses. The individual molecular mechanisms that influence these outcomes. sequences of this sequence collection were analyzed re- The pattern of gene retention after WGDs within the garding their protein domain structure by using the PFAM cytokinin signaling pathway fits best to the model stating database [72]. Sequences that comprised the following that downstream genes in a pathway evolve faster. How- four PFAM protein domains were identified as functional ever, most striking is the heterogeneous pattern of gene cytokinin receptors: i) CHASE domain (PF03924); ii) Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 15 of 19 Response regulator receiver domain (PF00072); iii)Histi- excluded from the analyses. The RRAs were clearly dis- dine kinase-, DNA gyrase B-, HSP90-like ATPase domain tinguished as they formed a well-supported monophlye- (PF02518); and iv) His Kinase A (phospho-acceptor) do- tic clade, even in phylogenetic analysis with diverse main (PF00512). As we are interested in the retention of species. The same is true for RRCs, which clearly group functional genes after WGDs, stringent selection criteria with the response regulator receiver domain from cyto- were applied, and sequences that lacked more than 50% in kinin receptors [46]. The PRRs were identified by their one of the four domains were rejected. The resulting data- phylogenetic position and by an additional characteristic set of 166 sequences was used for phylogenetic analyses. protein motif: the Constans/Constans-like/TOC1 (CCT) Genes encoding cytokinin receptors were also analyzed domain [46]. The remaining response regulators were concerning the presence of transmembrane helices with designated as RRBs. This group is heterogeneous con- the “TMHMM Server v. 2.0” [73] and the presence of sig- cerning its phylogenetic composition and includes re- nal peptides with the SignalP [74] and Phobius [75]. sponse regulators that have an additional myeloblastosis (Myb)-related DNA binding domain [46] but also Identification of HPT and RR genes response regulators that lack this domain as well as To investigate the evolutionary pattern of the whole response regulators that have been annotated as pseudo- cytokinin signaling pathway, we additionally identified response regulators by Suzuki et al. [79] based on alter- HPT and RR encoding genes in a subset of species that ations in the conserved phosphorylation site (the DDK covered flowering plants (Additional file 4: alignment 2– motif) [80]. The biological roles played by latter, nonca- 4). We followed the same strategy as that described nonical members are not clear but based on the agreed above, performing an iterative BLASTN search starting nomenclature for cytokinin signaling components, they with sequences encoding HPT and RR domains from were included in the RRB group in this study [69]. When Arabidopsis thaliana [69]. We also used the PFAM data- experimental evidence was available, the information base [72] to analyze the protein domain structure of the was included in the analysis. Thus, based on work by identified sequences. Sequences that lacked more than Tsai and colleagues [45], OsRR27, OsRR31, OsRR32, 15% of RR (PF00072)- and HPT (PF01627)-PFAM do- OsRR30, and OsRR33 were shown not to function in mains were rejected. For phylogenetic analysis, only the cytokinin signaling, and thus, these sequences were ex- conserved regions encoding the HPT and RR PFAM do- cluded from our analyses. Furthermore, sequences that main were used as the alignable region. HPT encoding contained an additional EHD1 domain were excluded. sequences were further classified as authentic His- containing phosphotransfer proteins if they contained a conserved histidine residue, and sequences that lacked Phylogenetic analyses this site were classified as pseudo His-containing phos- To reconstruct the process and pattern of evolution of photransfer proteins [76]. Moreover, response regulators the cytokinin signaling pathway during land plant evolu- were further categorized into type-A, type-B, type-C, tion with emphasis on WGD events, the following data- and clock related- and pseudo-response regulators and sets were assembled from the sequence collection for named as RRA, RRB, RRC, and PRR, respectively [69]. phylogenetic analyses: i) sequences encoding cytokinin We assigned the sequences of our dataset to these cat- receptors of land plants ranging from Klebsormidium egories based on phylogenetic analysis as previously sug- flaccidum to monocots and dicots, and from monocots gested [69]. We used the well-characterized set of RRs and dicots: ii) sequences encoding HPTs, iii) sequences from A. thaliana as a reference (see [38]) in these phylo- encoding RRAs, and iv) sequences encoding RRBs. The genetic analyses and compared the individual species- respective sequences were aligned with the MAFFT mul- specific sets of RRs with this reference. For the more tiple sequence alignment algorithm (MSA) [79] based on divergent species investigated in this study (P. abies, P codons by using the GUIDANCE web server [81], which taeda, S. moellendorffii, and K. flaccidum), A trichopoda evaluates the confidence of the MSA. Based on the was additionally included in the comparative phylogen- GUIDANCE confidence scores, unreliable columns were etic analyses to classify RRs. To perform these analyses, removed (threshold 0.93). We used ModelOMatic [39], we calculated multiple sequence alignments with the which allows comparisons of nucleotide, amino acid, MUSCLE algorithm [77] implemented in MEGA Soft- and codon models, to identify the best substitution ware package, version 6.06 [78] and calculated Max- model for subsequent tree reconstruction. For recon- imum Likelihood trees with MEGA based on the structing the evolution of cytokinin receptors, the ro- General Time Reversible (GTR) substitution model with bustness of the reconstructed tree topology was tested a Gamma distribution to model evolutionary rate differ- by comparing phylogenetic trees based on nucleotide ences among sites (5 categories) with 100 bootstrap rep- and codon substitution models. When reconstructing licates. Sites with less than 95% site coverage were the evolution of HPTs, RRAs, and RRBs we focused on Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 16 of 19 the best substitution model predicted by ModelOMatic. nucleotide substitution rates, and ω. Exponential priors In general, maximum likelihood trees based on the gen- were used for the shape parameter of the gamma distribu- eral time reversible (GTR) nucleotide substitution model tion of rate variation. Branch lengths were unconstrained. were calculated with RaxML [82], and the CAT approxi- The MCMC chain was sampled every 500 generations mation of rate heterogeneity was used to model rate het- with the burn-in set to 25% from the cold chain. Conver- erogeneity. RaxML involves a rapid hill climbing gence diagnostics were calculated every 5000 generations algorithm to search for the best tree, with the subtree and analyses were continued until the average standard pruning re-grafting (SPR) starting on a randomized max- deviation of split frequencies reached 0.01. For all trees, imum parsimony tree. The initial rearrangement settings cutoffs for the branch support were selected according to of SPR and the number of rate categories for the CAT the tree signal, also for the subsequent gene tree reconcili- approximation were optimized through comparative ation. Trees with a high percentage of low support were analysis by using various settings. Based on the original interpreted to contain low phylogenetic signal and to be alignment, 20 tree inferences with the optimized settings less robust. To compare the tree topologies calculated were performed to find the best maximum likelihood based on different substitution models and the different tree according to the final likelihoods. Branch support algorithms (Mr. Bayes and maximum likelihood), we de- was determined by 100 bootstrap repeats and mapped termined Robinson-Foulds distances (RF distance) [89]in onto the best tree of the 20 inferences. R, package phangorn 2.4.0, and normalized them by divid- To calculate maximum likelihood trees under codon ing by the maximal possible distance [90]. RF-distance models, we used CodonPhyML [83]. We compared three matrix is provided as Additional file 5: Table S3. For illus- codon models, namely i) the Goldman and Yang model tration, reconstructed RRB cladograms based on max- [84], ii) the Muse and Gaut model [85], and iii)the YAP imum likelihood (codonphyML, codon substitution model model [86], all of which differ in their instantaneous sub- YAP CF3x4) and Mr. Bayes (codon substitution model) stitution rates between codons. The stationary frequency are compared with R package dendextend 1.7.0 [91]. The of codons and the transition-transversion ratio were esti- tangelgram is provided as Additional file 3: Figure S7. mated by maximum likelihood. The ratio of synonymous and nonsynonymous substitution rates was modeled as being constant over sites (M0 model). Site-rate variation Gene tree reconciliation was drawn from a discrete gamma distribution with four We used Notung [92] for exploring alternate hypotheses classes. Starting from an initial tree built by using the about duplication events of the cytokinin signaling BioNJ algorithm, nearest neighbor interchange was used genes. Both rooting and a rearrangement tool were used to search for the best tree topology. Branch support was to minimize the overall Duplication/Loss score of a gene assessed by using SH-aLRT statistics, which is a conserva- tree. A cladogram reflecting land plant evolution was tive test for branch support, comparable to standard boot- built based on the work of the Angiosperm Phylogeny strap [87]. The tree with the highest log likelihood was Group [93] and Zou et al. [94] for relationships between selected for further analyses. In addition to the maximum rice species. The most basal eudicot is Nelumbo nucifera likelihood approach, we used Bayesian inference to recon- [95]. Amborella trichopoda is sister to all extant angio- struct phylogenetic trees (Mr. Bayes version 3.2 software; sperms [96]. Most basal land lineages are represented by [88]). The following substitution models were used: i)the the bryophyte Physcomitrella patens [97] and Klebsormi- GTR model of DNA substitution with among-site vari- dium flaccidium marking the transition of aquatic to ter- ation drawn from a gamma distribution, ii) the Jones restrial life [25]. This cladogram (Fig. 1) was used as a fixed-rate model of amino acid substitution, which was species tree for gene tree reconciliation. chosen via the model jumping option in Mr. Bayes, iii)the implemented codon model, which is based on the GY and MG codon models, to reconstruct phylogenetic trees. In Comparative genomics (gene collinearity) the case of the codon model, the ratio of synonymous and We used the PLAZA 3.0 online database [98] to study nonsynonymous substitution rates (ω) was again modeled the genomic organization of the cytokinin signaling as being constant over sites. The posterior probabilities of genes. In the PLAZA database, collinear and syntenic re- the phylogenetic tree model were estimated as part of the gions within and between genomes are pre-computed by Bayesian analyses by using Markov chain Monte Carlo using i-ADHoRe [99]. An intra-species comparison of sampling with Metropolis coupling running four chains in the whole genome (WGDotplot) reports all collinear re- two simultaneous analyses. The analyses were run with gions, i.e., duplicated blocks, found within a genome. uniform prior distributions for tree topology. A flat The age of the paralogs is provided based on pairwise Dirichlet distribution (1.0) was used for stationary fre- synonymous substitution rate (Ks distances) calculated quencies of nucleotides, codons, and amino acids, with PAML [100]. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 17 of 19 Additional files Received: 29 September 2017 Accepted: 14 March 2018 Additional file 1: List of analysed species and used data source, and list of the analysed CHK encoding sequences. (XLSX 46 kb) References 1. Adams KL, Wendel JF. Polyploidy and genome evolution in plants. Curr Additional file 2: Collinear regions, Ks distances. (XLSX 357 kb) Opin Plant Biol. 2005;8:135–41. Additional file 3: Figure S1. ML Tree CHKs 1, Figure S2. Mr. Bayes 2. Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg CHKs, Figure S3. Reconciled ML tree CHKs, Figure S4. Reconciled LH. The frequency of polyploid speciation in vascular plants. Proc Natl Acad ML tree HPTs, Figure S5. ML tree RRBs, Figure S6. Mr. Bayes tree RRBs, Sci U S A. 2009;106:13875–9. Figure S7. Cladogram comparison RRBs. (PDF 2407 kb) 3. Mayrose I, Zhan SH, Rothfels CJ, Magnuson-Ford K, Barker MS, Rieseberg LH, Additional file 4: Supplemental alignment of #CHKs, #HPTs, #RRAs and et al. Recently formed polyploid plants diversify at lower rates. Science. #RRBs. (FASTA 770 kb) 2011;333:1257. 4. Vanneste K, Baele G, Maere S, de Peer YV. Analysis of 41 plant genomes Additional file 5: RF distances between phylogenetic trees of CHKs, supports a wave of successful genome duplications in association with the HPTs, RRAs and RRBs. (XLSX 15 kb) cretaceous–Paleogene boundary. Genome Res. 2014;24:1334–47. 5. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–5. Abbreviations 6. Karev GP, Wolf YI, Berezovskaya FS, Koonin EV. Gene family evolution: an in- CCT domain: Constans/Constans-like/TOC1 domain; CHASE: Cyclases/ depth theoretical and simulation analysis of non-linear birth-death- Histidine kinases Associated Sensing Extracellular; CHK: CHASE domain innovation models. BMC Evol Biol. 2004;4:32. containing histidine kinase; GDB: Gene dosage balance; GTR: General time 7. De Smet R, Adams KL, Vandepoele K, Van Montagu MCE, Maere S, Van de reversible (substitution model); HPT: Histidine phosphotransfer protein; Peer Y. Convergent gene loss following gene and genome duplications ML: Maximum likelihood; Myb domain: Myeloblastosis related DNA-binding creates single-copy families in flowering plants. Proc Natl Acad Sci U S A. domain; PRR: Pseudo-response regulator; RF: Distance robinson-foulds dis- 2013;110:2898–903. tance; RRA: Type-A response regulator; RRB: Type-B response regulator; 8. Geiser C, Mandáková T, Arrigo N, Lysak MA, Parisod C. Repeated whole- WGD: Whole genome duplication; WGT: Whole genome triplication genome duplication, karyotype reshuffling, and biased retention of stress- responding genes in buckler mustard. Plant Cell. 2016;28:17–27. 9. Li Z, Defoort J, Tasdighian S, Maere S, de Peer YV, Smet RD. Gene Acknowledgments duplicability of core genes is highly consistent across all angiosperms. Plant We are grateful to Prof. Dietrich Ober and to Prof. Lawrence Hobbie for Cell. 2016;28:326–44. discussing the manuscript. We also thank Anne Kupczok of the 10. Papp B, Pál C, Hurst LD. Dosage sensitivity and the evolution of gene bioinformatics support group of the University of Kiel for her advice with the families in yeast. Nature. 2003;424:194–7. phylogenetic analyses. 11. Maere S, Bodt SD, Raes J, Casneuf T, Montagu MV, Kuiper M, et al. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. Funding 2005;102:5454–9. The project was funded by the Dahlem Center of Plant Sciences and the 12. Freeling M. Bias in plant gene content following different sorts of University of Kiel (promotion program for women in science). duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol. 2009;60:433–53. 13. Huminiecki L, Heldin CH. 2R and remodeling of vertebrate signal Availability of data and materials transduction engine. BMC Biol. 2010;8:146. The data sets supporting the conclusions of this article are included within 14. Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, the article (and its Additional files). Strauss SH, et al. Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. Genome Res. 2012;22:95–105. 15. Veitia RA, Bottani S, Birchler JA. Cellular reactions to gene dosage Authors’ contributions imbalance: genomic, transcriptomic and proteomic effects. Trends Genet. EK conducted sequence searches and data analysis, interpreted the data and 2008;24:390–7. wrote the manuscript. SL identified and analyzed sequences. AH conceived 16. Edger PP, Pires JC. Gene and genome duplications: the impact of dosage- of the project, interpreted the data, and wrote the manuscript. All authors sensitivity on the fate of nuclear genes. Chromosom Res. 2009;17:699–717. read and approved the final version. 17. Birchler JA, Veitia RA. Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines. Proc Natl Acad Sci U S A. 2012;109: 14746–53. Ethics approval and consent to participate 18. Veitia RA, Bottani S, Birchler JA. Gene dosage effects: nonlinearities, genetic Not applicable. interactions, and dosage compensation. Trends Genet. 2013;29:385–93. 19. Conant GC, Birchler JA, Pires JC. Dosage, duplication, and diploidization: Consent for publication clarifying the interplay of multiple models for duplicate gene evolution over Not applicable. time. Curr Opin Plant Biol. 2014;19:91–8. 20. Veitia RA. Gene dosage balance: deletions, duplications and dominance. Trends Genet. 2005;21:33–5. Competing interests 21. D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, et al. The authors declare that they have no competing interests. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–7. 22. Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by Publisher’sNote polyploidy during Arabidopsis evolution. Plant Cell. 2004;16:1679–91. Springer Nature remains neutral with regard to jurisdictional claims in 23. Aury J-M, Jaillon O, Duret L, Noel B, Jubin C, Porcel BM, et al. Global trends published maps and institutional affiliations. of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 2006;444:171–8. Author details 24. Hwang I, Sheen J, Müller B. Cytokinin signaling networks. Annu Rev Plant Department Biochemical Ecology and Molecular Evolution, Botanical Biol. 2012;63:353–80. Institute, Christian-Albrechts-University, Kiel, Germany. Institute of Applied 25. Hori K, Maruyama F, Fujisawa T, Togashi T, Yamamoto N, Seo M, et al. Genetics, Freie Universität Berlin, Berlin, Germany. Biology Department, Klebsormidium flaccidum genome reveals primary factors for plant terrestrial Adelphi University, Garden City, USA. adaptation. Nat Commun. 2014;5:3978. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 18 of 19 26. Wang C, Liu Y, Li S-S, Han G-Z. Insights into the origin and evolution of 51. Wang X, Wang J, Jin D, Guo H, Lee T-H, Liu T, et al. Genome alignment plant hormone signaling machinery. Plant Physiol. 2015;167:872–86. spanning major poaceae lineages reveals heterogeneous evolutionary rates 27. von Schwartzenberg K, Lindner A-C, Gruhn N, Šimura J, Novák O, and alters inferred dates for key evolutionary events. Mol Plant. 2015;8:885–98. Strnad M, et al. CHASE domain-containing receptors play an essential 52. Rausher MD, Miller RE, Tiffin P. Patterns of evolutionary rate variation among role in the cytokinin response of the moss Physcomitrella patens.J Exp genes of the anthocyanin biosynthetic pathway. Mol Biol Evol. 1999;16:266–74. Bot. 2016;67:667–79. 53. Lu Y, Rausher MD. Evolutionary rate variation in anthocyanin pathway 28. Heyl A, Schmülling T. Cytokinin signal perception and transduction. Curr genes. Mol Biol Evol. 2003;20:1844–53. Opin Plant Biol. 2003;6:480–8. 54. Olson-Manning CF, Lee C-R, Rausher MD, Mitchell-Olds T. Evolution of flux 29. Heyl A, Wulfetange K, Pils B, Nielsen N, Romanov GA, Schmülling T. control in the glucosinolate pathway in arabidopsis thaliana. Mol Biol Evol. Evolutionary proteomics identifies amino acids essential for ligand- 2013;30:14–23. binding of the cytokinin receptor CHASE domain. BMC Evol Biol. 55. Makarova KS, Wolf YI, Mekhedov SL, Mirkin BG, Koonin EV. Ancestral 2007;7:62. paralogs and pseudoparalogs and their role in the emergence of the 30. Hothorn M, Dabi T, Chory J. Structural basis for cytokinin recognition by eukaryotic cell. Nucleic Acids Res. 2005;33:4626–38. Arabidopsis thaliana histidine kinase 4. Nat Chem Biol. 2011;7:766–8. 56. Moghe GD, Hufnagel DE, Tang H, Xiao Y, Dworkin I, Town CD, et al. 31. Punwani JA, Hutchison CE, Schaller GE, Kieber JJ. The subcellular distribution Consequences of whole-genome triplication as revealed by comparative of the Arabidopsis histidine phosphotransfer proteins is independent of genomic analyses of the wild radish raphanus raphanistrum and three other cytokinin signaling. Plant J. 2010;62:473–82. brassicaceae species. Plant Cell. 2014;26:1925–37. 32. Mähönen AP, Bishopp A, Higuchi M, Nieminen KM, Kinoshita K, 57. Lespinet O, Wolf YI, Koonin EV, Aravind L. The role of lineage-specific gene Törmäkangas K, et al. Cytokinin signaling and its inhibitor AHP6 regulate cell family expansion in the evolution of eukaryotes. Genome Res. 2002;12:1048–59. fate during vascular development. Science. 2006;311:94–8. 58. Renny-Byfield S, Gallagher JP, Grover CE, Szadkowski E, Page JT, Udall JA, et 33. Ruszkowski M, Brzezinski K, Jedrzejczak R, Dauter M, Dauter Z, Sikorski M, et al. Ancient gene duplicates in gossypium (cotton) exhibit near-complete al. M. Truncatula histidine-containing phosphotransfer protein: structural expression divergence. Genome Biol Evol. 2014;6:559–71. and biochemical insights into cytokinin transduction pathway in plants. 59. Zhou J, Lemos B, Dopman EB, Hartl DL. Copy-number variation: the balance FEBS J. 2013;280:3709–20. between gene dosage and expression in drosophila melanogaster. Genome 34. Hwang I, Sheen J. Two-component circuitry in arabidopsis cytokinin signal Biol Evol. 2011;3:1014–24. transduction. Nature. 2001;413:383–9. 60. Hudson CM, Puckett EE, Bekaert M, Pires JC, Conant GC. Selection for higher 35. TO JPC, Haberer G, Ferreira FJ, Deruère J, Mason MG, Schaller GE, et al. gene copy number after different types of plant gene duplications. Type-a arabidopsis response regulators are partially redundant negative Genome Biol Evol. 2011;3:1369–80. regulators of cytokinin signaling. Plant Cell. 2004;16:658–71. 61. Rody HVS, Baute GJ, Rieseberg LH, Oliveira LO. Both mechanism and age of 36. Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K. The flowering duplications contribute to biased gene retention patterns in plants. BMC world: a tale of duplications. Trends Plant Sci. 2009;14:680–8. Genomics. 2017;18:46. 37. Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, 62. Kaltenegger E, Ober D. Paralogue interference affects the dynamics after et al. Ancestral polyploidy in seed plants and angiosperms. Nature. 2011; gene duplication. Trends Plant Sci. 2015;20:814–21. 473:97–100. 63. Baker CR, Hanson-Smith V, Johnson AD. Following gene duplication, 38. Mandáková T, Joly S, Krzywinski M, Mummenhoff K, Lysak MA. Fast paralog interference constrains transcriptional circuit evolution. Science. diploidization in close mesopolyploid relatives of arabidopsis. Plant Cell. 2013;342:104–8. 2010;22:2277–90. 64. Bridgham JT, Brown JE, Rodríguez-Marí A, Catchen JM, Thornton JW. 39. Whelan S, Allen JE, Blackburne BP, Talavera D. ModelOMatic: fast and Evolution of a new function by degenerative mutation in cephalochordate automated model selection between RY, nucleotide, amino acid, and steroid receptors. PLoS Genet. 2008;4:e1000191. codon substitution models. Syst Biol. 2015;64:42–55. 65. Keshishian EA, Rashotte AM. Plant cytokinin signalling. Essays Biochem. 40. Jiao Y, Li J, Tang H, Paterson AH. Integrated syntenic and phylogenomic 2015;58:13–27. analyses reveal an ancient genome duplication in monocots. Plant Cell. 66. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, et al. 2014;26:2792–802. Phytozome: a comparative platform for green plant genomics. Nucleic 41. Tang H, Bowers JE, Wang X, Paterson AH. Angiosperm genome Acids Res. 2012;40:D1178–86. comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad 67. Matasci N, Hung L-H, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, et al. Data Sci U S A. 2010;107:472–7. access for the 1000 plants (1KP) project. Giga Science. 2014;3:17. 42. Blanc G, Wolfe KH. Widespread Paleopolyploidy in Model plant species inferred 68. Yates A, Akanni W, Amode MR, Barrell D, Billis K, Carvalho-Silva D, et al. from age distributions of duplicate genes. Plant Cell. 2004;16:1667–78. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–6. 43. Estep MC, McKain MR, Diaz DV, Zhong J, Hodge JG, Hodkinson TR, et al. 69 Heyl A, Brault M, Frugier F, Kuderova A, Lindner A-C, Motyka V, et al. Allopolyploidy, diversification, and the miocene grassland expansion. Proc Nomenclature for members of the two-component signaling pathway of Natl Acad Sci U S A. 2014;111:15149–54. plants. Plant Physiol. 2013;161:1063–5. 44. Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, et al. 70. Suzuki T, Miwa K, Ishikawa K, Yamada H, Aiba H, Mizuno T. The arabidopsis Repeated polyploidization of gossypium genomes and the evolution of sensor his-kinase, AHk4, can respond to cytokinins. Plant Cell Physiol. 2001; spinnable cotton fibres. Nature. 2012;492:423–7. 42:107–13. 45. Tsai Y-C, Weir NR, Hill K, Zhang W, Kim HJ, Shiu S-H, et al. Characterization 71. Schaller GE, Doi K, Hwang I, Kieber JJ, Khurana JP, Kurata N, et al. of genes involved in cytokinin signaling and metabolism from rice. Plant Nomenclature for two-component signaling elements of rice. Plant Physiol. Physiol. 2012;158:1666–84. 2007;143:555–7. 46. Gruhn N, Halawa M, Snel B, Seidl MF, Heyl A. A subfamily of putative 72. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. cytokinin receptors is revealed by an analysis of the evolution of the two- Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–30. component signaling system of plants. Plant Physiol. 2014;165:227–37. 73. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting 47. Pils B, Heyl A. Unraveling the evolution of cytokinin signaling. Plant Physiol. transmembrane protein topology with a hidden markov model: application 2009;151:782–91. to complete genomes. J Mol Biol. 2001;305:567–80. 48. Pires ND, Yi K, Breuninger H, Catarino B, Menand B, Dolan L. Recruitment 74. Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating and remodeling of an ancient gene regulatory network during land plant signal peptides from transmembrane regions. Nat Methods. 2011;8:785–6. evolution. Proc Natl Acad Sci U S A. 2013;110:9571–6. 75. Käll L, Krogh A, Sonnhammer ELL. Advantages of combined transmembrane 49. Jourda C, Cardi C, Mbéguié-A-Mbéguié D, Bocs S, Garsmeur O, D’Hont A, et topology and signal peptide prediction–the phobius web server. Nucleic al. Expansion of banana (Musa acuminata) gene families involved in Acids Res. 2007;35:W429–32. ethylene biosynthesis and signalling after lineage-specific whole-genome 76. Suzuki T, Sakurai K, Imamura A, Nakamura A, Ueguchi C, Mizuno T. Compilation duplications. New Phytol. 2014;202:986–1000. and characterization of histidine-containing phosphotransmitters implicated in 50. Smith SA, Donoghue MJ. Rates of molecular evolution are linked to life his-to-asp phosphorelay in plants: AHP signal transducers of Arabidopsis history in flowering plants. Science. 2008;322:86–9. thaliana. Biosci Biotechnol Biochem. 2000;64:2486–9. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 19 of 19 77. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7. 78. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9. 79. Urao T, Yakubov B, Yamaguchi-Shinozaki K, Shinozaki K. Stress-responsive expression of genes for two-component response regulator-like proteins in arabidopsis thaliana. FEBS Lett. 1998;427:175–8. 80. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res. 2002;30:3059–66. 81. Sela I, Ashkenazy H, Katoh K, Pupko T. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 2015;43:W7–14. 82. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post- analysis of large phylogenies. Bioinformatics. 2014;30:1312–3. 83. Gil M, Zanetti MS, Zoller S, Anisimova M. CodonPhyML: fast maximum likelihood phylogeny estimation under codon substitution models. Mol Biol Evol. 2013;30:1270–80. 84. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–36. 85. Muse SV, Gaut BS. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 1994;11:715–24. 86. Yap VB, Lindsay H, Easteal S, Huttley G. Estimates of the effect of natural selection on protein-coding content. Mol Biol Evol. 2010;27:726–34. 87. Anisimova M, Gil M, Dufayard J-F, Dessimoz C, Gascuel O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst Biol. 2011;60:685–99. 88. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61:539–42. 89. Robinson DF, Foulds LR. Comparison of phylogenetic trees. Math Biosci. 1981;53:131–47. 90. Schliep K, Potts AJ, Morrison DA, Grimm GW. Intertwining phylogenetic trees and networks. Methods Ecol Evol. 2017;8:1212–20. 91. Galili T. Dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics. 2015;31:3718–20. 92. Stolzer M, Lai H, Xu M, Sathaye D, Vernot B, Durand D. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinforma Oxf Engl. 2012;28:i409–15. 93. The Angiosperm Phylogeny Group. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc. 2016;181:1–20. 94. Zou X-H, Zhang F-M, Zhang J-G, Zang L-L, Tang L, Wang J, et al. Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol. 2008;9:R49. 95. Ming R, VanBuren R, Liu Y, Yang M, Han Y, Li L-T, et al. Genome of the long- living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 2013;14:R41. 96. Albert VA, Barbazuk WB, dePamphilis CW, Der JP, Leebens-Mack J, Ma H, et al. The Amborella genome and the evolution of flowering plants. Science. 2013;342:1467. 97. Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, et al. The physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008;319:64–9. 98. Proost S, Van Bel M, Vaneechoutte D, Van de Peer Y, Inzé D, Mueller-Roeber B, et al. PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 2014;43:D974–81. 99. Proost S, Fostier J, De Witte D, Dhoedt B, Demeester P, Van de Peer Y, et al. Submit your next manuscript to BioMed Central I-ADHoRe 3.0—fast and sensitive detection of genomic homology in and we will help you at every step: extremely large data sets. Nucleic Acids Res. 2012;40:e11. 100. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol • We accept pre-submission inquiries Evol. 2007;24:1586–91. � Our selector tool helps you to find the most relevant journal 101. Kagale S, Robinson SJ, Nixon J, Xiao R, Huebert T, Condie J, et al. Polyploid evolution of the brassicaceae during the cenozoic era. Plant Cell. 2014;26: � We provide round the clock customer support 2777–91. � Convenient online submission 102. Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH. Unraveling � Thorough peer review ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 2008;18:1944–54. � Inclusion in PubMed and all major indexing services � Maximum visibility for your research Submit your manuscript at www.biomedcentral.com/submit http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Evolutionary Biology Springer Journals

The effects of repeated whole genome duplication events on the evolution of cytokinin signaling pathway

Free
19 pages

Loading next page...
 
/lp/springer_journal/the-effects-of-repeated-whole-genome-duplication-events-on-the-6OJQ1AwcIr
Publisher
BioMed Central
Copyright
Copyright © 2018 by The Author(s).
Subject
Life Sciences; Evolutionary Biology; Animal Systematics/Taxonomy/Biogeography; Entomology; Genetics and Population Dynamics; Life Sciences, general
eISSN
1471-2148
D.O.I.
10.1186/s12862-018-1153-x
Publisher site
See Article on Publisher Site

Abstract

Background: It is thought that after whole-genome duplications (WGDs), a large fraction of the duplicated gene copies is lost over time while few duplicates are retained. Which factors promote survival or death of a duplicate remains unclear and the underlying mechanisms are poorly understood. According to the model of gene dosage balance, genes encoding interacting proteins are predicted to be preferentially co-retained after WGDs. Among these are genes encoding proteins involved in complexes or in signal transduction. Results: We have investigated the way that repeated WGDs during land plant evolution have affected cytokinin signaling to study patterns of gene duplicability and co-retention in this important signal transduction pathway. Through the integration of phylogenetic analyses with comparisons of genome collinearity, we have found that signal input mediated by cytokinin receptors proved to be highly conserved over long evolutionary time-scales, with receptors showing predominantly gene loss after repeated WGDs. However, the downstream elements, e,g. response regulators, were mainly retained after WGDs and thereby formed gene families in most plant lineages. Conclusions: Gene dosage balance between the interacting components indicated by co-retention after WGDs seems to play a minor role in the evolution of cytokinin signaling pathway. Overall, core genes of cytokinin signaling show a highly heterogeneous pattern of gene retention after WGD, reflecting complex relationships between the various factors that shape the long-term fate of a duplicated gene. Keywords: Gene duplication, Whole-genome duplication, Gene retention, Gene dosage balance, Cytokinin signaling, Signal transduction, Evolution Background Studies of WGD events and their effect on core angio- Duplications of individual genes and whole genomes are sperm genes (i.e., gene families shared by all angiosperm a dominant feature of plant evolution and have been de- species) showed a generalized pattern of “gene duplic- tected in all land plant lineages [1–4]. Gene duplication ability”, meaning the ability of genes to be retained fol- is assumed to be a stochastic process and a common fate lowing WGD. Three categories could be defined: a) of a duplicate is its loss [5, 6]. However, the retention of “singleton” genes: the majority of core genes occur as duplicate genes seems to be biased toward certain func- single copies and are functionally involved in the main- tional classes of genes [7–9]. Another factor that seems tenance of genome integrity; b) “multicopy” genes: genes to influence the long-term survival of a duplicated gene remain in a duplicated state throughout time and are is the mode of duplication as genes which are predomin- functionally biased toward signaling, transport, and me- antly retained differ between whole-genome duplications tabolism; and c) “intermediate” genes: these genes show (WGDs) and small-scale duplication events [10–14]. a pattern of prolonged duplicate retention spanning sev- eral tens of millions of years following WGD but appear eventually to return to singleton status. This later group * Correspondence: ekaltenegger@bot.uni-kiel.de; aheyl@adelphi.edu Department Biochemical Ecology and Molecular Evolution, Botanical (intermediate genes) is enriched for genes that are Institute, Christian-Albrechts-University, Kiel, Germany involved in development, growth, and regulation of tran- Institute of Applied Genetics, Freie Universität Berlin, Berlin, Germany scription [9]. Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 2 of 19 For the categories b) and c), gene dosage balance after WGD include signal transduction genes in diverse (GDB) theory is discussed to be a major driver of gene species such as banana [21], Arabidopsiss [22], and the retention after WGD [9]. Basically, the GDB theory ciliate Paramecium tetraurelia [23]. Here, we have states that, for many genes whose products participate studied the pattern of gene retention and loss of the in protein complexes, the stoichiometry among interact- individual components of core cytokinin signaling after ing gene products (i.e., proteins) must be maintained repeated WGDs during land plant evolution to test [10, 15–18]. Thus, according to GDB, dosage-balance- whether a bias exists in the gene duplicability of the in- sensitive genes are predicted to be co-retained after a dividual components and to explore whether GDB can WGD event. These genes are also predicted to continu- explain the observed pattern. Cytokinins are plant hor- ously experience purifying selection after duplication mones that play pivotal roles in plant development and leading to prolonged retention. This prolonged retention its response to changes in the environment [24]. Various accompanied by the gradual circumvention of dosage- studies have indicated that the cytokinin signaling sys- balance-constraints may increase the possibility that du- tem was established in early divergent land plants, and plicate genes diversify (sub- or neofunctionalization) and even some Charophyceae green algae have been found become permanently preserved [17, 19]. Genes in the to encode family members of all four components of this “multicopy” group may have been retained – at least ini- signaling pathway [25–27]. Thus, cytokinin signaling is tially – because of dosage balance constraints. The an ideal model system for studying the way that the in- “intermediate” group of gene families can be explained dependent and repeated WGDs during land plant evolu- by a scenario of dosage balance that wears off over time, tion have affected the evolution of the individual leading to prolonged preservation but ultimate loss of components of a signaling pathway. duplicates [9]. The core signaling of the phytohormone cytokinin is GDB has been extended to informational pathways mediated via a variant of the two-component signaling (e.g., signal tansduction) [20], in agreement with the ob- system [28] (Fig. 1a). The cytokinin molecules are servation that preferentially retained gene categories perceived by binding to the Cyclases/Histidine kinases Fig. 1 Cytokinin signaling and repeated polyploidy events during land plant evolution. a Schematic representation of core cytokinin signaling. Cytokinin receptors perceive cytokinins, autophosphorylate and transmit the signal via HPTs to RRAs and RRBs. Pseudo-HPTs may compete for phosphotransfer with HPTs. RRAs are induced by cytokinins and function as negative regulators to form feedback regulatory loops. RRBs encode DNA-binding transcription factors that mediate cytokinin-dependent transcriptional activation [24]. b Repeated WGDs and WGTs during land plant evolution and sampling strategy [4, 21, 37, 40, 41, 43, 44, 50, 95, 101, 102]. The figure illustrates the phylogenetic tree topology for land plants [93, 94, 96, 97]. Klebsormidium flaccidum is placed on the basal lineage of current land plants marking the transition from the aquatic to the terrestrial life form [25]. Ancestral polyploidy events in seed plants and angiosperms are indicated by symbols and were inferred from the literature, given in the key. Gray boxes mark the 14 core species chosen for this study of comparative analyses of cytokinin signaling (Table 1). For all depicted species/lineages, genes encoding CHKs were identified and their evolutionary history was reconstructed. Additionally, the evolutionary history of HPTs, RRAs, and RRBs from species labeled with * was reconstructed Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 3 of 19 Associated Sensing Extracellular (CHASE) domain of a Table 1 Copy numbers of the cytokinin signaling components membrane-bound hybrid histidine kinase (CHASE do- Species CHKs HPTs (AHPT/PHPT) RRAs RRBs main containing histidine kinase, CHK) that serves as Arabidopsis lyrata 3 6 (5/1) 10 15 receptor [29, 30]. The binding of the hormone leads to Arabidopsis thaliana 3 6 (5/1) 10 15 the autophosphorylation of the histidine kinase domain. Brassica oleracea 4 8 (7/1) 15 19 After an intramolecular phosphotransfer to the c- Brassica rapa 4 8 (7/1) 20 20 terminal response regulator domain of the receptor, the Nelumbo nucifera 4 8 (6/2) 7 11 signal is transferred to histidine phosphotransfer pro- teins (HPTs). These proteins have been shown to shuttle Musa acuminata 8 10 (8/2) 14 14 between the cytoplasm and the nucleus [31]. The HPTs Oryza sativa Japonica Group 4 5 (2/3) 13 10 can be divided into enzymatically active and inactive Zea mays 7 5 (4/1) 12 9 orthologs (pseudo-HPTs). The pseudo-HPTs lack a Amborella trichopoda 2 4 (3/1) 4 6 conserved histidine residue that acts as a phosphoryl- Picea abies 2 7 (5/2) 12 7 ation site and negatively interfere with pathway activity Pinus taeda 3 3 (3/0) 13 9 [32, 33]. HPTs can phosphorylate the response regulator domain of various response regulators. In cytokinin sig- Selaginella moellendorffii 2 2 (2/0) 3 10 naling, two types of response regulators have been Physcomitrella patens 11 2 (2/0) 7 5 shown to be important: i) the type-B response regulators Klebsormidium flaccidum 9 1 (1/0) 3 1 (RRB), which are Myb type transcription factors that, Abbreviations: AHPT authentic His-containing phosphotransfer protein, PHPT upon phosphorylation, initiate the transcription of their pseudo His-containing phosphotransfer protein, which lacks the conserved His without EHD1 copies: OrysatHPT25 OsRR27 and OrysatHPT24 OsRR30, target genes, and ii) the type-A response regulators including B-IV group, B-V group excluded (classification according to Tsai et (RRA), which are transcriptionally regulated by the RRB al. [45]) [34] and have been shown to be negative regulators of the cytokinin signaling pathway [35]. Comparison of copy numbers of the various cytokinin The study presented here reveals that the individual signaling components among the investigated species components of cytokinin signaling were duplicated and Sequences encoding the four components of the cytoki- retained independently of each other. Although the cyto- nin signaling pathway were identified in the core species kinin signaling pathway expanded mainly via WGD and categorized as bona fide CHKs, HPTs, RRAs, or events, the observed pattern of gene duplicability and RRBs. The copy numbers of the identified components the pattern of co-retention after WGDs does not correl- varied between species and also between the different ate with the predictions of GDB. Instead, downstream protein families (Table 1). The number of cytokinin re- elements of the pathway show a trend towards higher ceptors was relatively stable between species, ranging gene duplicability compared with upstream elements. from two to four copies across most land plants. Excep- tions found were M. acuminata, Zea mays, Physcomi- trella patens, and K. flaccidum with eight, seven, eleven, Results and nine receptor genes, respectively. In contrast, for Repeated WGDs during land plant evolution provide the HPTs, a steady increase in copy number was detected background to study the evolutionary patterns of the during land plant evolution starting from two HPTs in cytokinin signaling components the moss P. patens to eight and 10 in Brassica species In order to study the evolutionary pattern of the individ- and Musa acuminata, respectively. In the gymnosperm ual components of cytokinin signaling after whole gen- Picea abies, seven HPT copies were identified, in com- ome duplications, plant species were chosen for further parison with three HPT copies in the closely related analysis to cover the major meso- and paleopolyploidy Pinus taeda. Another noteworthy trend was the emer- events reported in land plant evolution [4, 36–38] gence of pseudo HPTs in P. abies and in angiosperms (Fig. 1b). Furthermore, to allow the identification of all with copy numbers ranging from one to two, with the members of the four protein families involved in cytokinin exception of rice for which three pseudo HPTs were signaling pathway the availability of a large dataset, e.g., a identified. fully sequenced genome or transcriptome, was another RRA and RRB copy number increased steadily during criterion to select species. Thus, this study focused on 14 land plant evolution to form middle size gene families in “core” plant species (Table 1,Fig. 1b) for comparative ana- dicots and monocots. Furthermore, with a few excep- lyses of cytokinin signaling. Beginning with Klebsormidium tions, the number of RRAs and RRBs in flowering plant flaccidum as a representative of the Charophyceae, the species were found to be roughly equal (Table 1). algae lineage that gave rise to land plants, the whole These differences in the copy number between the four spectrum of land plants was covered. gene families involved in cytokinin signaling indicated that Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 4 of 19 the individual components experienced different evolu- studied (Additional file 2: Table S2). The results are tionary pressures that influenced their duplicability or summarized in Table 2 and information on colinearity is rather their retention after WGD. included in the reconciled CHK tree (Fig. 2). All reconciled trees supported two ancient duplication Reconstruction of cytokinin receptor evolution events: i) one duplication before the split of gymnosperm The complete sequence of cytokinin receptors included and angiosperms giving rise to the ancestor of the AHK4 four protein domains (PFAM domains of CHASE, clade and the ancestor of the AHK2/AHK3 clade (Fig. 2, HisKinaseA, HATPase, response regulator receiver) as Additional file 3: Figure S3) which coincides with the an- alignable regions, which covered in total 466 amino cient ζ WGD event [37]; ii) a second duplication event be- acids. To reconstruct CHK evolution during land plant fore the radiation of angiosperms giving rise to the AHK2 evolution, genes encoding CHKs of the 14 above- and AHK3 clades which coincides with a well-established mentioned “core species” (Table 1, Fig. 1) were analyzed. ε WGD event at the basis of angiosperm evolution [37]. A Additional species were sampled to improve the phylo- third ancient duplication event is predicted within the genetic reconstructions (Fig. 1). Thus, the final dataset monocot clade of AHK4. The different reconciled trees included CHKs from 51 plant species ranging from support either a correlation with the described commeli- mosses, lycophytes, ferns, and gymnosperms up to flow- nid specific τ WGD [40](Fig. 2a) or the grass specific σ ering plants (Additional file 1: Table S1). To test the ro- WGD [41](Fig. 2b). Lineage specific duplications of CHKs bustness of the tree topology, trees based on different are supported by the reconciled trees in Z. mays, Gossy- substitution models (nucleotide, codon, and protein sub- pium raimondii,and M. acuminata that correlate with stitution models) were calculated and compared. Fur- lineage specific WGDs. Exactely these species all have a thermore, maximum likelihood (ML) and Bayesian CHK copy number greater than four (see Table 1). reconstruction were performed. While genomic comparisons did not provide further In the resulting trees, the main branching pattern evidence that the above-described ancient duplications was highly similar. Robinson-Foulds-(RF)-distances could be traced back to the discussed WGD events, these (Additional file 2: Table S2) between pairs of trees comparisons could clearly show that the lineage specific showed that less than 25% of branches were dissimi- WGD in Z. mays and G. raimondii were involved in the lar for most pairwise comparisons. Codon models fit- copy number increase in these species. For example, in ted the data best according to a model test with Z. mays, the paralogous CHK pairs (ZeamayCHK01/ ModelOMatic [39]. Overall, phylogenetic signal in the CHK02, ZeamayCHK03/CHK04, ZeamayCHK05/CHK06) dataset was sufficient, and tree topology was robust. are located in collinear or syntenic regions (Fig. 2; Table 2; All reconstructed trees supported the presence of three Additional file 2: Table S2). Furthermore, the estimated di- major clades within angiosperm CHKs (Additional file 3: vergence time of the collinear regions (i.e., the paralogous Figure S1 and S2). These clades were named according to blocks) based on pairwise synonymous nucleotide substi- the three well-characterized cytokinin receptors of A. tution rates (Ks distances) vary between 0.15 and 0.29 and thaliana, which were located within the clades (the correlate well with the Ks distance characteristic for gene AHK2, the AHK3, and the AHK4 clade, respectively). The pairs created by the maize-lineage specific WGD [42, 43]. more basal branches of the phylogenetic tree reconstruc- Interestingly, only three of the four duplicates resulting tions were generally less supported. Furthermore, the from this WGD were retained (Fig. 2). In G. raimondii, positioning of the CHKs from gymnosperms was incon- genomic organization and Ks distances also supports the sistent in the various tree reconstructions. However, CHKs lineage-specific gene-expansion of CHKs via the five- to of the early diverging land plants (lycophytes and bryo- six-fold ploidy increase over ~ 60 Mya (Fig. 2; Table 2), phytes) fell in all tree reconstructions reproducibly outside characterized by Ks distances ~ 0.5 [44]. In M. acuminata the above-mentioned groups of the land plant CHKs. At however, which possesses eight CHK-encoding gene cop- the very base of the tree, we found sequences encoding ies and experienced three rounds of WGD in the range of CHKs of the algae K. flaccidum and the moss P. patens. 65 to 100 Mya [21], intragenomic comparisons only sup- port an origin of the paralogs MusacuCHK15/CHK5 Ancient duplications of cytokinin receptors (Table 2) through one of these events. Besides, one pair of To analyze the ancient duplication events during cytoki- tandem duplicates, MusacuCHK01/CHK03, has been nin receptor evolution in more detail, the above- identified within an ultracontig of the M. acuminata described set of gene trees (ML; Mr. Bayes, codon, genome. One further duplication event, which led to a protein, nucleotides trees) were reconciled with a species copy number increase in Brassica sp., could be identified tree that reflected the commonly accepted evolution of as a large scale duplication. The paralogs BrarapCHK04/ land plants (Fig. 1). Furthermore, genomic organization CHK03 are located within an collinear intragenomic concerning location in collinear or syntenic blocks was region (Fig. 2, Additional file 2: Table S2). Based on Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 5 of 19 Table 2 Gene colinearity of the cytokinin signaling components Table 2 Gene colinearity of the cytokinin signaling components a b (Continued) Gene pair Multiplicon Anchor points Ks/median Ks a b Gene pair Multiplicon Anchor points Ks/median Ks BrarapCHK04/CHK03 340,248 5 1.03/1.05 AralyrRRA02/RRA04 36,506 75 1.1/0.91 Syntenic region AralyrRRA05/RRA07 60,452 30 0.96/0.88 ZeamayCHK03/CHK04 11,281 135 0.19/0.26 AralyrRRA01/RRA03 31,061 179 0.62/0.88 ZeamayCHK05/CHK06 10,125 315 0.15/0.29 AralyrRRA09/RRA10 35,707 80 0.87/0.82 GosraiCHK02/CHK06 59,516 31 0.39/0.48 ArathaRRA03/RRA04 34,065 96 0.83/0.91 GosraiCHK02/CHK04 65,400 27 0.34/0.52 ArathaRRA05/RRA06 56,388 33 1.18/0.96 GosraiCHK04/CHK06 36,049 78 0.38/0.60 ArathaRRA03/RRA04 34,327 93 0.77/0.96 GosraiCHK03/CHK07 36,049 78 0.47/0.60 ArathaRRA07/RRA08 30,310 263 1.18/0.9 GosraiCHK03/CHK08 125,670 13 0.47/0.69 ArathaRRA09/RRA11 30,310 263 0.74/0.9 GosraiCHK07/CHK08 148,631 11 0.43/0.46 ZeamayRRA05/RRA08 11,007 162 1.26/0.26 GosraiCHK01/CHK05 98,488 17 0.39/0.57 ZeamayRRA10/RRA12 33,006 7 1.87/2.23 MusacuCHK05/CHK15 11,287 133 0.37/0.54 ZeamayRRA11/RRA12 11,559 116 0.40/0.33 BrarapHPT01/HPT03 52,943 36 1.12/1.24 MusacuRRA06/RRA07 37,422 6 0.98/0.66 BrarapHPT01/HPT02 296,202 6 0.93/1.34 MusacuRRA07/RRA08 35,224 7 1.45/1.16 BrarapHPT02/HPT03 45,950 44 0.32/0.38 MusacuRRA11/RRA14 16,973 19 0.50/0.50 BrarapHPT07/HPT08 29,877 334 0.63/0.37 MusacuRRA11/RRA13 35,273 7 0.73/0.64 ArathaHPT01/HPT02 99,381 17 0.50/1.19 MusacuRRA03/RRA20 15,306 25 0.74/0.60 MusacuHPT06/HPT09 17,637 17 0.36/0.66 MusacuRRA01/RRA02 25,508 9 0.41/0.41 MusacuHPT07/HPT09 39,756 6 0.36/0.85 MusacuRRA02/RRA04 16,323 21 0.53/0.72 ZeamayHPT01/HPT02 23,667 10 0.12/0.28 BrarapRRB41/RRB42 33,554 102 0.33/0.34 OrsaJaHPT01/HPT02 20,328 13 0.83/1.16 BrarapRRB32/RRB33 30,151 290 0.36/0.37 OrsaJaHPT03/HPT04 13,603 40 1.02/1.15 BrarapRRB37/RRB39 40,489 58 0.48/0.42 BrarapRRA02/RRA12 163,623 10 0.38/0.35 BrarapRRB47/RRB48 122,822 14 0.38/0.33 BrarapRRA02/RRA07 65,158 27 0.47/0.37 BrarapRRB59/RRB60 44,640 47 0.56/0.39 BrarapRRA07/RRA12 31,421 160 0.30/0.39 AralyrRRB18/RRB16 70,367 25 1.07/0.92 BrarapRRA07/RRA11 49,477 40 0.87/1.05 AralyrRRB25/RRB24 105,708 16 0.93/0.78 BrarapRRA04/RRA06 151,125 22 0.84/1.13 AralyrRRB37/RRB25 234,790 7 6.90/6.90 BrarapRRA06/RRA10 56,429 33 0.29/0.39 ArathaRRB22/RRB19 56,277 33 1.31/0.82 BrarapRRA08/RRA09 30,151 290 0.43/0.37 ArathaRRB23/RRB24 111,168 15 0.95/1.19 BrarapRRA01/RRA03 31,812 143 0.41/0.36 ArathaRRB23/RRB45 268,141 6 8.89/4.45 BrarapRRA17/RRA13 82,131 21 1.01/1.01 OrsaJaRRB21/RRB27 12,291 73 1.34/1.34 BrarapRRA17/RRA20 101,049 18 0.92/1.02 OrsaJaRRB21/RRB27 13,539 41 0.98/1.10 BrarapRRA17/RRA21 29,311 402 0.35/0.37 ZeamayRRB24/RRB25 14,587 30 1.30/1.79 BrarapRRA21/RRA13 65,663 27 0.11/1.03 ZeamayRRB21/RRB23 10,224 280 0.12/0.25 BrarapRRA21/RRA22 39,677 61 1.20/1.20 ZeamayRRB18/RRB27 10,125 315 0.14/0.29 BrarapRRA13/RRA22 30,346 258 0.25/0.36 MusacuRRB30/RRB33 17,165 18 0.40/0.56 BrarapRRA13/RRA20 31,080 178 0.27/0.38 MusacuRRB30/RRB36 17,906 17 0.49/0.55 BrarapRRA14/RRA15 61,084 30 0.79/1.19 MusacuRRB39/RRB42 16,390 21 0.47/0.72 BrarapRRA14/RRA18 30,346 258 0.26/0.36 MusacuRRB53/RRB57 17,546 17 0.53/0.53 BrarapRRA15/RRA18 40,545 58 1.05/1.11 colinear gene pairs BrarapRRA15/RRA22 350,553 5 16.41/2.30 Ks distance of the gene pair given in column 1/median Ks distance of the multiplicon OrsaJaRRA06/RRA07 10,605 211 1.30/1.30 OrsaJaRRA08/RRA09 12,741 56 1.47/1.33 OrsaJaRRA01/RRA02 10,348 252 0.02/0.1 Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 6 of 19 Fig. 2 (See legend on next page.) Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 7 of 19 (See figure on previous page.) Fig. 2 Reconciled tree of CHK encoding sequences. a Reconciled Bayesian tree of CHK encoding sequences (codon substitution model). Gene tree reconciliation includes rearrangement of branches with a support of less than 70% (posterior probability). Branch support of > 70% is given in the tree. Original Mr. Bayes tree see. Additional file 3: Figure S2. The optimization criterion for reconciliation was the number of duplications and losses. The three dominant clades of CHKs are indicated. Duplicates inferred to be lost are labeled (*LOST) and are given in gray. Of note, most duplicates resulting from WGDs that were lost thereafter were not detected. Thus, WGD events that occurred during CHK evolution are indicated with empty symbols (see key) on the corresponding branches, if resulting duplicates were not retained. If, according to phylogenetic inference plus syntenie/collinearity relationships, a copy resulting from a WGD duplication was retained, duplication nodes are labeled with the corresponding filled symbols, and Ks distances between the gene pairs are given. Furthermore, the median Ks, and the number of gene pairs of the total collinear region is given in brackets. Further duplications inferred from phylogenetic analysis are labeled with a black dot, and if phylogenetic inference indicates that a WGD is involved, the node is also labeled with an empty symbol. Additionally, tandem duplicates of M. acuminata are specified and labeled with the symbol #. Possible CHK pseudogenes from M. acuminata and Z. mays are given in italic. b Alternative tree topology within the AHK4 clade (monocots) of the reconciled maximum likelihood tree (codon substitution) phylogenetic reconstruction and Ks distances, they origi- amplification. According to the reconciled tree, HPTs nated from the Brassicaceae-wide α WGD. experienced further lineage-specific amplification in Concerning the functionality of the above mentioned mono- and dicots through WGDs. For example, three of additional CHK copies in Z. mays, G. raimondii and M. the five HPTs from A. thaliana (AHP2, AHP3 and acuminata, in silico analyses of transmembrane helices AHP5) are found in a clade specific for dicots (HPT predict only three of the eight MusacuCHK copies clade 1, Fig. 3a) and most likely originated by the Brassi- (MusacuCHK01, CHK06 andCHK15) and five of the caceae-specific α and β WGD (Fig. 3a, Additional file 2: seven ZeamayCHK copies (ZeamayCHK03–07) encode Table S2). Gene colinearity of AHP2 and AHP3 support functional receptors while all CHKs from G. raimondii this phylogeny based reconstruction of WGD events are predicted to be functional. Additionally, for Zea- (Table 2, Fig. 2). In contrast, the other two HPTs from mayCHK01/CHK02/CHK04 secretory signal peptides A. thaliana that group with monocot HPTs were not that targets its passenger protein for translocation across amplified by the Brassicaceae-specific WGDs, meaning the endoplasmic reticulum membrane in eukaryotes are that the resulting duplicates have been lost (Fig. 3a,HPT predicted. clade 2 and 3). The monocot HPTs increased either via In summary, given the scenario of ancient origin of the pancereal σ or ρ WGD (Fig. 3a, HPT clade 2) which three CHK copies together with the repeated WGDs that is supported by colinearity of OsAHP1/OsAHP2 and occurred in the subsequent radiation of angiosperm spe- OsPHP1/OsPHP3. However, with the limitations of Ks cies, this indicates that the most common fate of CHK distance based dating in mind, it is not possible to duplicates resulting from WGDs is gene loss. Only in clearly identify which of the two WGD was involved some species that experienced rather recent polyploidi- (Table 2, Fig. 3). The duplicates resulting from the meso- zation events (Ks values 0.1–0.4) have several duplicates polyploidy events in the lineage towards Brassica and Z. been retained. mays were predominantly lost, except the paralogous pairs BrarapHPT02/HPT03, BrarapHPT07/HPT08, and HPT copy number increased through palaeo- and ZeamayHPT01/HPT02 (Fig. 3b). All three pairs are mesopolyploidy events located in colinear regions with Ks distances similar to We analyzed and reconstructed the evolution of HPT- the duplicates of the lineage specific polyploidy events. encoding genes analogous to the CHKs by combining HPTs in M. acuminata showed similar to CHKs a highly phylogenetic tree reconstruction, gene tree-species tree species-specific evolutionary pattern and included reconciliation, and gene synteny analyses to obtain tandem duplications and/or possibly retroposition as evi- predictions of duplication events and their timing. denced by the observation that MusacuHPT02/HPT04 Compared with the analyses of CHKs, a reduced dataset are located on one chromosome in close vicinity including only angiosperm species was used as the align- (chromosome 7) and MusacuHPT02/HPT03/HPT04 able region of the HPTs covering only the 85 amino completely lack introns. acids of the PFAM HPT domain contained limited Concerning the evolution of pseudo-HPTs and thus the phylogenetic signal. The HPT copy number in the evolution of a new regulatory component in cytokinin analyzed species ranged from five to 10 (Table 1). signaling, the phylogenetic reconstructions support that The reconciled trees supported two different topolo- they have emerged at least twice independently. The gies with either two or three repeated duplications duplicability of monocot pseudo-HPTs (OsPHP1/OsPHP3, before the split of mono- and dicots (Fig. 3a and see above) differed from the dicot pseudo-HPTs as the Additional file 3: Figure S4) indicating that ζ and ε former were amplified via a lineage-specific WGD event (ρ WGD may have been involved in ancestral HPT and/or σ) while the latter were not amplified. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 8 of 19 Fig. 3 Reconciled tree of HPT encoding sequences. Pseudo HPTs without the canonical histidine are marked with a gray box. a Reconciled Bayesian HPT tree (codon substitution model). Figure style is analogous to that of Fig. 2 concerning color code and the labeling of duplication events. Gene tree reconciliation includes the rearrangement of branches with a support less than 90% (posterior probability). Branch support of > 90% is given in the tree. The three dominant clades of HPTs and tandem duplicates are indicated. b Alternative maximum likelihood tree topology of HPT clade 3, supporting the proposal that the gene pair BrarapHPT8/HPT7 originated by the α’ WGT In summary, the most dominant fate of HPT dupli- but individual clades were consistent between the two cates that arose from WGD events was again gene loss. reconstructions (Additional file 3: Figure S5 and S6). However, some HPTs copies were retained after the β, α, These branches are highlighted in Fig. 5. α, ’ and ρ WGD events which in comparison did not led Although similar numbers of RRAs and RRBs exist in to a similar copy number increase in the upstream flowering plants (Table 1), the in-depth phylogenetic acting CHKs. analyses show that their evolutionary pattern is different. For RRAs, the reconciled trees support a scenario of constant copy number increase via repeated paleao- and Response regulator families expand continuously via mesopolyploidy events. The RRAs of monocots and di- repeated WGD cots form two main groups indicating that their last The evolution and duplication pattern of RRAs and common ancestor possessed two RRA copies, which RRBs of mono- and dicots was reconstructed separately might have arisen from the ancient ζ or ε WGD. Dupli- by using the same approach as that for CHKs and HPTs. cation and differentiation of RRAs then occurred inde- The 112-amino-acid PFAM response regulator receiver pendently during monocot and dicot radiation leading to (RR) domain was used as alignable region. However, the the observed two main clades and the increase in copy tree signal of RRBs was poor indicated by a relative high number in both plant groups. For example, the phylo- percentage of low branch supports. Thus, the threshold genetic analyses indicate that β and γ WGD were in- for the branch support for gene tree reconciliation in the volved in the expansion of RRAs within clade 1 (Fig. 4). RRB trees was lowered to > 0.75 for SH-like branch Phylogeny and collinearity suggest an origin of four par- support. Furthermore, RRB trees reconstructed with alogous RRA pairs in Arabidopsis thaliana (ARR6/ maximum likelihood and Bayesian inference were incon- ARR5, ARR15/ARR7, ARR8/ARR9, ARR17/ARR16) via gruent with regard to the basal branching pattern and the α WGD event. Thus, RRA duplicates were more also the placing of some highly diverged RRB sequences likely to be retained compared to the CHKs and HPTs from M. acuminata (MusacuRRB38, RRB40, RRB43), evolution. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 9 of 19 Fig. 4 Reconciled maximum likelihood tree of RRA encoding sequences. Figure style is analogous to that in Fig. 2 concerning color code and the labeling of duplication events. Gene tree reconciliation includes the rearrangement of branches with a support less than 85% (SH-like branch support). Branch support of > 85% is given in the tree. The two dominant clades of RRAs are indicated Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 10 of 19 On the other hand, RRBs show a more heterogeneous did not give rise to land plants [47]. However, their func- pattern of evolution. Some clades share sequences from tion in these green algae is unclear. Thus, we are monocots and dicots (Fig. 5, clades w - z), suggesting tempted to speculate that the CHKs might have been re- that the last common ancestor possessed at least four cruited specifically in the Charophyceae lineage to serve RRB copies. Of note, within three of these clades, which as a new “plug-in” for the established HPT-RRB net- include A. thaliana PRR2, ARR11, and ARR14 (clade w, work. As has recently been proposed, the reorganization y, z), WGD events rarely led to an increase of the copy of gene regulatory network architecture is a major factor number. However, individual RRB copies, such as the A. underlying evolution, and phytohormonal pathways thaliana ARR10/ARR12 and ARR2/ARR1 gene pairs, might be used to redirect such gene regulatory networks originated via WGD events. Compared with RRAs, amp- to fulfill new functions [48]. lification of RRBs via WGD is patchier. Moreover, RRBs show one more noticeable feature: some gene copies in Cytokinin signaling pathway expanded via WGD events O. sativa subspec. Japonica [45] and B. rapa originated The applied approach of integrating phylogenetic ana- from tandem duplication. lyses with comparisons of genome collinearity, which To summarize, RRAs and RRBs differ in their evolu- has previously been successfully applied to identify an- tionary pattern, although both gene families show a high cient WGD in monocots [40], indicates that many of gene duplicability. RRAs exhibit a clear trend towards the duplications that have affected cytokinin signaling gene retention instead of gene loss after WGD. RRBs components are genome-wide or other large-scale du- may have been amplified by additional duplication plications. Also for ethylene signaling in M. acuminata, mechanisms, e.g., tandem duplication. However, the WGD has been shown to be the dominant duplication reconstruction of RRBs evolution is in general vaguer, mode [49]. However, a few exceptions can be found to suggesting that they might comprise a functionally this common trend. In O. sativum subspec. Japonica, heterogeneous group of sequences. the RRBs OsPRR11, OsPRR12, OsRR29, and OsRR28 originated by tandem duplications (Fig. 5), consistent Discussion with repeated unequal cross-overs [45]. In M. acumi- Genes encoding cytokinin receptors were recruited early nata, the genes encoding MusacuHPT02/HPT04 have in land plant evolution for cytokinin signaling been identified as tandem duplicates. Both copies are The results of our phylogenetic reconstructions of CHK, intronless. MusacuHPT03 is closely related to this pair HPT, RRA, and RRB evolution are in solid agreement of HPTs and also lacks introns. One can speculate that with similar, albeit smaller-scale studies [25, 26, 45, 46] retroposition was involved in the origin of these three and support an early origin of cytokinin signaling at the copies; this would represent yet another mechanism for base of land plant evolution, and functional cytokinin gene duplication affecting members of the cytokinin signaling perhaps is even present in K. flaccidum. How- signaling pathway. ever, the CHKs of K. flaccidum, which form a monophy- In order to estimate the timing of WGD events that letic clade at the basis of the CHK gene trees, group affected cytokinin signaling, we have used Ks distances next to a group of CHKs from the early diverging land between paralogous pairs located in collinear intrage- plant P. patens recently identified by Gruhn et al. [2]. Al- nomic blocks. These distances only offer rough dates though a heterologously expressed member of this sub- that have to be taken with caution because of variable family (PpCHK4) has previously been shown to be able rates among the different gene families [40] and between to bind cytokinin hormones and to translate this binding species [50, 51]. Nevertheless, the WGDs identified as into a cellular response in a dose-dependent manner affecting cytokinin signaling by using this approach are [46], in P. patens a group of three further CHKs evolved in good agreement with those in previous studies. A which clade with the classic cytokinin receptors from genome alignment spanning major Poaceae lineages sup- the other land plants [46]. Genetic and biochemical ex- ports the amplification of the cytokinin signaling compo- periments have shown that these three classic cytokinin nents through WGD events in this lineage [51]. receptors are necessary to mediate the cytokinin re- Furthermore, a phylogenomic analysis of ancestral poly- sponse in this moss [27]. One can speculate at this point ploidy events indicates that RRAs and CHKs were amp- that these three copies were recruited to function specif- lified before the split of basal angiosperms (Aristolochia, ically in cytokinin signaling and have evolved from an- Liriodendron, Nuphar, and Amborella), most likely via cestors forming the very basal group of CHKs from K. the ε WGD [37]. flaccidum and P. patens, whose biological function needs to be investigated. Patterns of GDB in cytokinin signaling Interestingly, HPTs and RRBs are also present in Of special interest in this study has been the testing of Chlorophyceae algae, the branch of the green algae that whether the cytokinin signaling components show a Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 11 of 19 Fig. 5 Reconciled maximum likelihood tree of RRB encoding sequences. Figure style is analogous to that of Fig. 2 concerning color code and the labeling of duplication events. Gene tree reconciliation includes the rearrangement of branches with a support less than 70% (SH-like branch support). Branching pattern in Bayesian RRB tree reconstructions differed for basal branches from maximum likelihood tree reconstructions. Groups that are consistently reconstructed in Bayesian and maximum likelihood trees are marked with gray boxes. Branch support of > 70% (SH-like branch support/posterior probability) is given in the tree Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 12 of 19 signature of GDB. Many previous studies have implied trend exists in that downstream elements in cytokinin that GDB preferentially “acts” on multiprotein com- signaling are more likely to be retained while the up- plexes and signal transduction/regulatory networks [15, stream elements tend to be lost (Fig. 6b). This pattern 17, 20]. The central assumption is: if a pathway, whose is in agreement to the trend that upstream genes in a components are linked in a dosage-sensitive relationship, biochemical pathwayevolve moreslowlythandown- is duplicated as a whole via a WGD event, the relative stream genes [52, 53], which might be explained by dosage between genes will be preserved, and the dupli- network characteristics, as most likely the rate- or cated dosage-sensitive genes will be preferentially flux-controlling elements on which natural selection retained. Thus, “interacting” genes are bound to be co- preferentially acts are the upstream elements in a retained over evolutionary time [15]. In core cytokinin pathway [54]. signaling, interacting genes are CHKs, HPTs, RRAs and Focusing on the upstream part – the CHKs, which RRBs (Fig. 1a) and if GDB is a dominating force, co- channel the signal into cytokinin signaling – we found retention after WGD of these components is predicted relatively simple orthologous relationships, and flower- (Fig. 6a), leading to a concerted increase of all interact- ing plants have a similar repertoire of CHKs [45, 46]. ing genes. According to our data, however, a different Whereas CHKs were amplified before the split of Fig. 6 Schematic illustration of the discussed fate of core cytokinin signaling after WGD events. a Gene dosage balance (GDB) between interacting components of core cytokinin signaling predicts co-retention of the encoding genes after WGD leading to a concerted increase of upstream and downstream elements. b Upstream genes, i.e. CHKs that mediate signal input, are duplication resistant compared to downstream element leading to a unequal increase of the individual components of the signaling pathway. In this scenario, perceiving of the signal is conserved but the output can be individually regulated by diverse regulators. c GDB between antagonistic HPTs and pseudo-HPT predicts co-retention. Unequal loss could result in a shift towards either the activator or repressor, depending on which gene is lost Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 13 of 19 angiosperms possibly via WGD events, the subsequent strongly differ from HPTs. For example, the ancestor of repeated rounds of palaeopolyploidy events during flow- AHP6 and its orthologs in Brassica experienced accord- ering plant evolution did not affect copy number of ing to the reconstructed phylogeny up to three WGD CHKs, meaning that the resulting duplicates were lost. events (α, β, γ) but the resulting duplicates have not This pattern of an unique amplification at the trunk of a been retained (Fig. 3). However, the corresponding an- tree with very few or no subsequent additions of gene tagonists (AHP2, AHP3, AHP5) were partly amplified copies is consistent with the phenomenon of “frozen du- via the α and β WGDs. Thus, in Arabidopsis and plications” described by Makarova et al. [55] as evolu- Brassica, either the pseudo-HPTs experienced unequal tionarily stable paralogous clusters that are not further loss after WGDs (Fig. 6c), or their function as repressor amplified. Only the more recent mesopolyploidy events evolved only after the last common WGD. In rice, both led to a copy number increase of CHKs in G. raimondii, the pseudo- and the authentic-HPT gene family show a Z. mays, and possibly also in M. acuminata. In the two copy number increase due to polyploidy events, however latter species however, several of the additional copies most likely different, pancereal WGD events were in- contain extra sequence motifs like transmembrane heli- volved. Thus, no pattern of co-retention between ces or secretory signal peptides possibly rendering them pseudo-HPT and authentic HPTs is observed. non-functional in cytokinin signaling. One can speculate, In general, lineage-specific gene family expansions that these copies are in the process of pseudogenization seem to be one of the principal means of adaptation and and eventually will be purged from the genome, which one of the most important sources of organizational and may also be the long-term fate for the additional CHK regulatory diversity in crown-group eukaryotes [57]. In copies in G. raimondii. cytokinin signaling, expansion is mostly found for down- HPTs, RRAs, and RRBs displayed overall a higher de- stream elements. This amplification might allow the gree of lineage-specific expansion compared with that of regulatory fine tuning (subfunctionalization) or the re- CHKs. However, the in-depth phylogenetic analyses wiring of signal output to execute novel functionality. showed that the copy number increase of these compo- Furthermore, no unique pattern of co-retention after nents can be traced back to different polyploidy events WGDs, indicative of GDB among the encoded proteins, meaning that the downstream parts were not amplified was identified in cytokinin signaling. Possible explana- in a concerted way. Especially striking is the different tions for this result are: i) an important prerequisite for evolutionary pattern of RRAs and RRBs. While RRAs GDB to manifest is that the duplicates are co-expressed. show a continuous increase in copy number via WGDs, While immediately after a WGD event paralogs show an RRBs are much more heterogenous. But, RRBs comprise identical expression pattern as coding as well as regula- in general a more heterogenous group of genes. Most tory sequences have been duplicated, expression pattern but not all of the RRB genes encode an additional Myb subsequently can change. Indeed, expression divergence domain and some RRBs lack the conserved phosphoryl- between paralogs seems to be the rule rather than the ation site. Functional differentiation between these clas- exception [58] and represents an important way of sub- ses is not well understood and thus, the present analyses functionalization between duplicates. Furthermore, gene might guide further functional studies. dosage balance can be compensated by regulatory Another interesting aspect of the cytokinin signaling changes [18]. To further test the potential role of GDB are the pseudo-HPTs. GDB is especially suggested to be in the evolution of cytokinin signaling, detailed co- a driving force between proteins with opposing actions, expression studies of the interacting components such as enzymatic or transcriptional activators and in- (Fig. 6a), the antagonistic interactions partners (Fig. 6c) hibitors within a pathway, and might shape the duplic- as well as their duplicated copies in polyploids of differ- ability of the encoding genes [15]. Within the cytokinin ent age are necessary. GDB predicts that interaction signaling pathway the HPTs and the pseudo-HPTs, partners should show a correlated expression level. which lack the conserved His residue, are discussed to Thus, after a WGD, an unequal loss of one of the interac- be antagonistic players and thus predicted to be co- tions partners needs to be compensated according to the retained after WGD (Fig. 6c). Of the six HPTs from A. GDB. For several model plants, online tools to study thaliana, only AHP6 is a pseudo-HPT [24, 32]. It nega- expression pattern are available to get first insights in the tively interferes with pathway activity, most likely by dynamic of duplication and expression pattern, e.g. the competing with AHP1–5 for interaction with the phos- Bio-Analytic Resource for Plant Biology (bar.utoronto.ca). phorelay machinery [31, 56]. Whereas in A. thaliana And ii) a central assumption of GDB, namely that gene only one pseudo-HPT (AHP6) has been identified, this expression is directly correlated to copy number variation group is especially large in rice with three copies (CNV), may be not valid. For example, CNV in humans (OsPHP1, PHP2, and PHP3) (Fig. 3). Regarding duplic- only partly account for the differences in expression be- ability, or co-retention after WGD, pseudo-HPTs tween individuals, whereas a large portion of the variance Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 14 of 19 must stem from other sources [35]. Furthermore, most retention of the various components of cytokinin signaling genes present in CNVs in Drosophila melangolaster show and the diverse modes of duplication as besides WGD also no evidence of increased or diminished transcription [59]. tandem duplication and possibly retroposition was found. Thus, this central assumption of GDB needs to be further Obviously, various mechanisms at diverse levels of inter- studied. action act in shaping the evolution of this signal transduc- However, while GDB may be of minor importance to tion pathway, all of which require further experimental co-stabilize duplicates of cytokinin signaling in the long exploration. Detailed mechanistic studies of specific candi- run, in M. acuminata, two antagonistic components of date genes, e.g., young paralogs in neopolyploid species, ethylene signaling have been shown to be co-retained including analyses of gene expression, gene function (in after the three lineage-specific WGD events indicating vivo and in vitro), and protein interactions will allow to that GDB shaped their evolution [49]. Together, these gain a more complete picture of the forces that shape the results reflect the complexity of molecular mechanisms fate of a duplicated gene. that shape gene duplicability. The fixation and the reten- tion of duplicated genes in plant genomes seem to be Methods context-dependent events, and relevant factors include Identification of cytokinin receptor encoding sequences intrinsic properties, such as gene function, and the en- Sequences encoding putative or confirmed cytokinin re- vironment in which the duplication occurred [60, 61]. ceptors were obtained from the genomes and transcrip- Several specific gene features including gene ontology tomes of selected species (Additional file 1: Table S1, (GO-slim classification), sequence-related features (gene Additional file 4: alignment 1) by screening the online and protein sizes, the GC content in the third codon platforms “Phytozome”, “Ensemble”, and “1 K database” position, protein domain size), expression-related fea- [66–68] by using the Basic Local Alignment Search Tool tures (level of expression and biotic and abiotic respon- (BLAST). Species were chosen to cover the evolution of siveness), and conservation-related features (omega, the cytokinin signaling during land plant evolution by using ratio of relative fixation rates of synonymous and nonsy- the charophyte alga Klebsormidium flaccidum [25] nonymous mutations, as an indicator of selective pres- as an outgroup. For each species, we performed an itera- sure), have been shown to influence gene retention after tive BLASTN search with the sequences encoding the WGD [56]. Another important feature of proteins effect- Cyclase Histidine kinase Associated Sensory Extracellu- ing the duplicability of the encoding genes might be lar (CHASE) domain of cytokinin receptors of the model their tendency to form symmetrical homomers. Duplica- plant Arabidopsis thaliana as query sequences tion of a gene that encodes a homomeric protein can (NCBI accession numbers AHK4 NM_201667, AHK2 lead to the phenomenon of paralog interference, which BT002530, AHK3 NM_102494). The CHASE domain is basically predicts a functional link between paralogous the ligand binding domain of cytokinin receptors [29]. genes via interactions of the encoded proteins in multi- Subsequently we used every identified sequence again as meric complexes and one important outcome of paralog query to search for further sequences with homology to interference is a prolonged retention timer after duplica- cytokinin receptors in the respective species. Sequences tion [62]. Examples of paralog interference has been de- were named as CHASE-domain containing His kinase scribed for MADS box transcriptional regulators in the (CHK) according to the common nomenclature [69] and yeast Kluyveromyces lactis [63] and steroid receptors numbered serially. For the simple and rapid identifica- [64], both form dimers. Of note, also cytokinin receptors tion of the species, the first three initials of the genus are also thought to function as dimers [28, 65] and para- and of the species name are given, e.g., Aratha for log interference could influence their duplicability. A Arabidopsis thaliana (Additional file 1: Table S1). prolonged retention of CHKs in Z. mays compared to To differentiate between Oryza sativa ssp. japonia HPTs, RRAs and RRBs (compare Figs. 2, 3, 4 and 5) and Oryza sativa ssp. indica,the first two initials could be an indicator but further studies of protein in- of the genus, species, and subspecies name are given. Fur- teractions are necessary. thermore, we added the annotation of well-characterized cytokinin receptors from Arabidopsis thaliana [70] Conclusions and Oryza sativa ssp. japonica [71]. If different The observation of the non-random loss of genes follow- splicing variants of a gene were identified, only the longest ing WGD has stimulated much discussion regarding the variant was used for subsequent analyses. The individual molecular mechanisms that influence these outcomes. sequences of this sequence collection were analyzed re- The pattern of gene retention after WGDs within the garding their protein domain structure by using the PFAM cytokinin signaling pathway fits best to the model stating database [72]. Sequences that comprised the following that downstream genes in a pathway evolve faster. How- four PFAM protein domains were identified as functional ever, most striking is the heterogeneous pattern of gene cytokinin receptors: i) CHASE domain (PF03924); ii) Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 15 of 19 Response regulator receiver domain (PF00072); iii)Histi- excluded from the analyses. The RRAs were clearly dis- dine kinase-, DNA gyrase B-, HSP90-like ATPase domain tinguished as they formed a well-supported monophlye- (PF02518); and iv) His Kinase A (phospho-acceptor) do- tic clade, even in phylogenetic analysis with diverse main (PF00512). As we are interested in the retention of species. The same is true for RRCs, which clearly group functional genes after WGDs, stringent selection criteria with the response regulator receiver domain from cyto- were applied, and sequences that lacked more than 50% in kinin receptors [46]. The PRRs were identified by their one of the four domains were rejected. The resulting data- phylogenetic position and by an additional characteristic set of 166 sequences was used for phylogenetic analyses. protein motif: the Constans/Constans-like/TOC1 (CCT) Genes encoding cytokinin receptors were also analyzed domain [46]. The remaining response regulators were concerning the presence of transmembrane helices with designated as RRBs. This group is heterogeneous con- the “TMHMM Server v. 2.0” [73] and the presence of sig- cerning its phylogenetic composition and includes re- nal peptides with the SignalP [74] and Phobius [75]. sponse regulators that have an additional myeloblastosis (Myb)-related DNA binding domain [46] but also Identification of HPT and RR genes response regulators that lack this domain as well as To investigate the evolutionary pattern of the whole response regulators that have been annotated as pseudo- cytokinin signaling pathway, we additionally identified response regulators by Suzuki et al. [79] based on alter- HPT and RR encoding genes in a subset of species that ations in the conserved phosphorylation site (the DDK covered flowering plants (Additional file 4: alignment 2– motif) [80]. The biological roles played by latter, nonca- 4). We followed the same strategy as that described nonical members are not clear but based on the agreed above, performing an iterative BLASTN search starting nomenclature for cytokinin signaling components, they with sequences encoding HPT and RR domains from were included in the RRB group in this study [69]. When Arabidopsis thaliana [69]. We also used the PFAM data- experimental evidence was available, the information base [72] to analyze the protein domain structure of the was included in the analysis. Thus, based on work by identified sequences. Sequences that lacked more than Tsai and colleagues [45], OsRR27, OsRR31, OsRR32, 15% of RR (PF00072)- and HPT (PF01627)-PFAM do- OsRR30, and OsRR33 were shown not to function in mains were rejected. For phylogenetic analysis, only the cytokinin signaling, and thus, these sequences were ex- conserved regions encoding the HPT and RR PFAM do- cluded from our analyses. Furthermore, sequences that main were used as the alignable region. HPT encoding contained an additional EHD1 domain were excluded. sequences were further classified as authentic His- containing phosphotransfer proteins if they contained a conserved histidine residue, and sequences that lacked Phylogenetic analyses this site were classified as pseudo His-containing phos- To reconstruct the process and pattern of evolution of photransfer proteins [76]. Moreover, response regulators the cytokinin signaling pathway during land plant evolu- were further categorized into type-A, type-B, type-C, tion with emphasis on WGD events, the following data- and clock related- and pseudo-response regulators and sets were assembled from the sequence collection for named as RRA, RRB, RRC, and PRR, respectively [69]. phylogenetic analyses: i) sequences encoding cytokinin We assigned the sequences of our dataset to these cat- receptors of land plants ranging from Klebsormidium egories based on phylogenetic analysis as previously sug- flaccidum to monocots and dicots, and from monocots gested [69]. We used the well-characterized set of RRs and dicots: ii) sequences encoding HPTs, iii) sequences from A. thaliana as a reference (see [38]) in these phylo- encoding RRAs, and iv) sequences encoding RRBs. The genetic analyses and compared the individual species- respective sequences were aligned with the MAFFT mul- specific sets of RRs with this reference. For the more tiple sequence alignment algorithm (MSA) [79] based on divergent species investigated in this study (P. abies, P codons by using the GUIDANCE web server [81], which taeda, S. moellendorffii, and K. flaccidum), A trichopoda evaluates the confidence of the MSA. Based on the was additionally included in the comparative phylogen- GUIDANCE confidence scores, unreliable columns were etic analyses to classify RRs. To perform these analyses, removed (threshold 0.93). We used ModelOMatic [39], we calculated multiple sequence alignments with the which allows comparisons of nucleotide, amino acid, MUSCLE algorithm [77] implemented in MEGA Soft- and codon models, to identify the best substitution ware package, version 6.06 [78] and calculated Max- model for subsequent tree reconstruction. For recon- imum Likelihood trees with MEGA based on the structing the evolution of cytokinin receptors, the ro- General Time Reversible (GTR) substitution model with bustness of the reconstructed tree topology was tested a Gamma distribution to model evolutionary rate differ- by comparing phylogenetic trees based on nucleotide ences among sites (5 categories) with 100 bootstrap rep- and codon substitution models. When reconstructing licates. Sites with less than 95% site coverage were the evolution of HPTs, RRAs, and RRBs we focused on Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 16 of 19 the best substitution model predicted by ModelOMatic. nucleotide substitution rates, and ω. Exponential priors In general, maximum likelihood trees based on the gen- were used for the shape parameter of the gamma distribu- eral time reversible (GTR) nucleotide substitution model tion of rate variation. Branch lengths were unconstrained. were calculated with RaxML [82], and the CAT approxi- The MCMC chain was sampled every 500 generations mation of rate heterogeneity was used to model rate het- with the burn-in set to 25% from the cold chain. Conver- erogeneity. RaxML involves a rapid hill climbing gence diagnostics were calculated every 5000 generations algorithm to search for the best tree, with the subtree and analyses were continued until the average standard pruning re-grafting (SPR) starting on a randomized max- deviation of split frequencies reached 0.01. For all trees, imum parsimony tree. The initial rearrangement settings cutoffs for the branch support were selected according to of SPR and the number of rate categories for the CAT the tree signal, also for the subsequent gene tree reconcili- approximation were optimized through comparative ation. Trees with a high percentage of low support were analysis by using various settings. Based on the original interpreted to contain low phylogenetic signal and to be alignment, 20 tree inferences with the optimized settings less robust. To compare the tree topologies calculated were performed to find the best maximum likelihood based on different substitution models and the different tree according to the final likelihoods. Branch support algorithms (Mr. Bayes and maximum likelihood), we de- was determined by 100 bootstrap repeats and mapped termined Robinson-Foulds distances (RF distance) [89]in onto the best tree of the 20 inferences. R, package phangorn 2.4.0, and normalized them by divid- To calculate maximum likelihood trees under codon ing by the maximal possible distance [90]. RF-distance models, we used CodonPhyML [83]. We compared three matrix is provided as Additional file 5: Table S3. For illus- codon models, namely i) the Goldman and Yang model tration, reconstructed RRB cladograms based on max- [84], ii) the Muse and Gaut model [85], and iii)the YAP imum likelihood (codonphyML, codon substitution model model [86], all of which differ in their instantaneous sub- YAP CF3x4) and Mr. Bayes (codon substitution model) stitution rates between codons. The stationary frequency are compared with R package dendextend 1.7.0 [91]. The of codons and the transition-transversion ratio were esti- tangelgram is provided as Additional file 3: Figure S7. mated by maximum likelihood. The ratio of synonymous and nonsynonymous substitution rates was modeled as being constant over sites (M0 model). Site-rate variation Gene tree reconciliation was drawn from a discrete gamma distribution with four We used Notung [92] for exploring alternate hypotheses classes. Starting from an initial tree built by using the about duplication events of the cytokinin signaling BioNJ algorithm, nearest neighbor interchange was used genes. Both rooting and a rearrangement tool were used to search for the best tree topology. Branch support was to minimize the overall Duplication/Loss score of a gene assessed by using SH-aLRT statistics, which is a conserva- tree. A cladogram reflecting land plant evolution was tive test for branch support, comparable to standard boot- built based on the work of the Angiosperm Phylogeny strap [87]. The tree with the highest log likelihood was Group [93] and Zou et al. [94] for relationships between selected for further analyses. In addition to the maximum rice species. The most basal eudicot is Nelumbo nucifera likelihood approach, we used Bayesian inference to recon- [95]. Amborella trichopoda is sister to all extant angio- struct phylogenetic trees (Mr. Bayes version 3.2 software; sperms [96]. Most basal land lineages are represented by [88]). The following substitution models were used: i)the the bryophyte Physcomitrella patens [97] and Klebsormi- GTR model of DNA substitution with among-site vari- dium flaccidium marking the transition of aquatic to ter- ation drawn from a gamma distribution, ii) the Jones restrial life [25]. This cladogram (Fig. 1) was used as a fixed-rate model of amino acid substitution, which was species tree for gene tree reconciliation. chosen via the model jumping option in Mr. Bayes, iii)the implemented codon model, which is based on the GY and MG codon models, to reconstruct phylogenetic trees. In Comparative genomics (gene collinearity) the case of the codon model, the ratio of synonymous and We used the PLAZA 3.0 online database [98] to study nonsynonymous substitution rates (ω) was again modeled the genomic organization of the cytokinin signaling as being constant over sites. The posterior probabilities of genes. In the PLAZA database, collinear and syntenic re- the phylogenetic tree model were estimated as part of the gions within and between genomes are pre-computed by Bayesian analyses by using Markov chain Monte Carlo using i-ADHoRe [99]. An intra-species comparison of sampling with Metropolis coupling running four chains in the whole genome (WGDotplot) reports all collinear re- two simultaneous analyses. The analyses were run with gions, i.e., duplicated blocks, found within a genome. uniform prior distributions for tree topology. A flat The age of the paralogs is provided based on pairwise Dirichlet distribution (1.0) was used for stationary fre- synonymous substitution rate (Ks distances) calculated quencies of nucleotides, codons, and amino acids, with PAML [100]. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 17 of 19 Additional files Received: 29 September 2017 Accepted: 14 March 2018 Additional file 1: List of analysed species and used data source, and list of the analysed CHK encoding sequences. (XLSX 46 kb) References 1. Adams KL, Wendel JF. Polyploidy and genome evolution in plants. Curr Additional file 2: Collinear regions, Ks distances. (XLSX 357 kb) Opin Plant Biol. 2005;8:135–41. Additional file 3: Figure S1. ML Tree CHKs 1, Figure S2. Mr. Bayes 2. Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg CHKs, Figure S3. Reconciled ML tree CHKs, Figure S4. Reconciled LH. The frequency of polyploid speciation in vascular plants. Proc Natl Acad ML tree HPTs, Figure S5. ML tree RRBs, Figure S6. Mr. Bayes tree RRBs, Sci U S A. 2009;106:13875–9. Figure S7. Cladogram comparison RRBs. (PDF 2407 kb) 3. Mayrose I, Zhan SH, Rothfels CJ, Magnuson-Ford K, Barker MS, Rieseberg LH, Additional file 4: Supplemental alignment of #CHKs, #HPTs, #RRAs and et al. Recently formed polyploid plants diversify at lower rates. Science. #RRBs. (FASTA 770 kb) 2011;333:1257. 4. Vanneste K, Baele G, Maere S, de Peer YV. Analysis of 41 plant genomes Additional file 5: RF distances between phylogenetic trees of CHKs, supports a wave of successful genome duplications in association with the HPTs, RRAs and RRBs. (XLSX 15 kb) cretaceous–Paleogene boundary. Genome Res. 2014;24:1334–47. 5. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–5. Abbreviations 6. Karev GP, Wolf YI, Berezovskaya FS, Koonin EV. Gene family evolution: an in- CCT domain: Constans/Constans-like/TOC1 domain; CHASE: Cyclases/ depth theoretical and simulation analysis of non-linear birth-death- Histidine kinases Associated Sensing Extracellular; CHK: CHASE domain innovation models. BMC Evol Biol. 2004;4:32. containing histidine kinase; GDB: Gene dosage balance; GTR: General time 7. De Smet R, Adams KL, Vandepoele K, Van Montagu MCE, Maere S, Van de reversible (substitution model); HPT: Histidine phosphotransfer protein; Peer Y. Convergent gene loss following gene and genome duplications ML: Maximum likelihood; Myb domain: Myeloblastosis related DNA-binding creates single-copy families in flowering plants. Proc Natl Acad Sci U S A. domain; PRR: Pseudo-response regulator; RF: Distance robinson-foulds dis- 2013;110:2898–903. tance; RRA: Type-A response regulator; RRB: Type-B response regulator; 8. Geiser C, Mandáková T, Arrigo N, Lysak MA, Parisod C. Repeated whole- WGD: Whole genome duplication; WGT: Whole genome triplication genome duplication, karyotype reshuffling, and biased retention of stress- responding genes in buckler mustard. Plant Cell. 2016;28:17–27. 9. Li Z, Defoort J, Tasdighian S, Maere S, de Peer YV, Smet RD. Gene Acknowledgments duplicability of core genes is highly consistent across all angiosperms. Plant We are grateful to Prof. Dietrich Ober and to Prof. Lawrence Hobbie for Cell. 2016;28:326–44. discussing the manuscript. We also thank Anne Kupczok of the 10. Papp B, Pál C, Hurst LD. Dosage sensitivity and the evolution of gene bioinformatics support group of the University of Kiel for her advice with the families in yeast. Nature. 2003;424:194–7. phylogenetic analyses. 11. Maere S, Bodt SD, Raes J, Casneuf T, Montagu MV, Kuiper M, et al. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. Funding 2005;102:5454–9. The project was funded by the Dahlem Center of Plant Sciences and the 12. Freeling M. Bias in plant gene content following different sorts of University of Kiel (promotion program for women in science). duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol. 2009;60:433–53. 13. Huminiecki L, Heldin CH. 2R and remodeling of vertebrate signal Availability of data and materials transduction engine. BMC Biol. 2010;8:146. The data sets supporting the conclusions of this article are included within 14. Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, the article (and its Additional files). Strauss SH, et al. Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. Genome Res. 2012;22:95–105. 15. Veitia RA, Bottani S, Birchler JA. Cellular reactions to gene dosage Authors’ contributions imbalance: genomic, transcriptomic and proteomic effects. Trends Genet. EK conducted sequence searches and data analysis, interpreted the data and 2008;24:390–7. wrote the manuscript. SL identified and analyzed sequences. AH conceived 16. Edger PP, Pires JC. Gene and genome duplications: the impact of dosage- of the project, interpreted the data, and wrote the manuscript. All authors sensitivity on the fate of nuclear genes. Chromosom Res. 2009;17:699–717. read and approved the final version. 17. Birchler JA, Veitia RA. Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines. Proc Natl Acad Sci U S A. 2012;109: 14746–53. Ethics approval and consent to participate 18. Veitia RA, Bottani S, Birchler JA. Gene dosage effects: nonlinearities, genetic Not applicable. interactions, and dosage compensation. Trends Genet. 2013;29:385–93. 19. Conant GC, Birchler JA, Pires JC. Dosage, duplication, and diploidization: Consent for publication clarifying the interplay of multiple models for duplicate gene evolution over Not applicable. time. Curr Opin Plant Biol. 2014;19:91–8. 20. Veitia RA. Gene dosage balance: deletions, duplications and dominance. Trends Genet. 2005;21:33–5. Competing interests 21. D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, et al. The authors declare that they have no competing interests. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–7. 22. Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by Publisher’sNote polyploidy during Arabidopsis evolution. Plant Cell. 2004;16:1679–91. Springer Nature remains neutral with regard to jurisdictional claims in 23. Aury J-M, Jaillon O, Duret L, Noel B, Jubin C, Porcel BM, et al. Global trends published maps and institutional affiliations. of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 2006;444:171–8. Author details 24. Hwang I, Sheen J, Müller B. Cytokinin signaling networks. Annu Rev Plant Department Biochemical Ecology and Molecular Evolution, Botanical Biol. 2012;63:353–80. Institute, Christian-Albrechts-University, Kiel, Germany. Institute of Applied 25. Hori K, Maruyama F, Fujisawa T, Togashi T, Yamamoto N, Seo M, et al. Genetics, Freie Universität Berlin, Berlin, Germany. Biology Department, Klebsormidium flaccidum genome reveals primary factors for plant terrestrial Adelphi University, Garden City, USA. adaptation. Nat Commun. 2014;5:3978. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 18 of 19 26. Wang C, Liu Y, Li S-S, Han G-Z. Insights into the origin and evolution of 51. Wang X, Wang J, Jin D, Guo H, Lee T-H, Liu T, et al. Genome alignment plant hormone signaling machinery. Plant Physiol. 2015;167:872–86. spanning major poaceae lineages reveals heterogeneous evolutionary rates 27. von Schwartzenberg K, Lindner A-C, Gruhn N, Šimura J, Novák O, and alters inferred dates for key evolutionary events. Mol Plant. 2015;8:885–98. Strnad M, et al. CHASE domain-containing receptors play an essential 52. Rausher MD, Miller RE, Tiffin P. Patterns of evolutionary rate variation among role in the cytokinin response of the moss Physcomitrella patens.J Exp genes of the anthocyanin biosynthetic pathway. Mol Biol Evol. 1999;16:266–74. Bot. 2016;67:667–79. 53. Lu Y, Rausher MD. Evolutionary rate variation in anthocyanin pathway 28. Heyl A, Schmülling T. Cytokinin signal perception and transduction. Curr genes. Mol Biol Evol. 2003;20:1844–53. Opin Plant Biol. 2003;6:480–8. 54. Olson-Manning CF, Lee C-R, Rausher MD, Mitchell-Olds T. Evolution of flux 29. Heyl A, Wulfetange K, Pils B, Nielsen N, Romanov GA, Schmülling T. control in the glucosinolate pathway in arabidopsis thaliana. Mol Biol Evol. Evolutionary proteomics identifies amino acids essential for ligand- 2013;30:14–23. binding of the cytokinin receptor CHASE domain. BMC Evol Biol. 55. Makarova KS, Wolf YI, Mekhedov SL, Mirkin BG, Koonin EV. Ancestral 2007;7:62. paralogs and pseudoparalogs and their role in the emergence of the 30. Hothorn M, Dabi T, Chory J. Structural basis for cytokinin recognition by eukaryotic cell. Nucleic Acids Res. 2005;33:4626–38. Arabidopsis thaliana histidine kinase 4. Nat Chem Biol. 2011;7:766–8. 56. Moghe GD, Hufnagel DE, Tang H, Xiao Y, Dworkin I, Town CD, et al. 31. Punwani JA, Hutchison CE, Schaller GE, Kieber JJ. The subcellular distribution Consequences of whole-genome triplication as revealed by comparative of the Arabidopsis histidine phosphotransfer proteins is independent of genomic analyses of the wild radish raphanus raphanistrum and three other cytokinin signaling. Plant J. 2010;62:473–82. brassicaceae species. Plant Cell. 2014;26:1925–37. 32. Mähönen AP, Bishopp A, Higuchi M, Nieminen KM, Kinoshita K, 57. Lespinet O, Wolf YI, Koonin EV, Aravind L. The role of lineage-specific gene Törmäkangas K, et al. Cytokinin signaling and its inhibitor AHP6 regulate cell family expansion in the evolution of eukaryotes. Genome Res. 2002;12:1048–59. fate during vascular development. Science. 2006;311:94–8. 58. Renny-Byfield S, Gallagher JP, Grover CE, Szadkowski E, Page JT, Udall JA, et 33. Ruszkowski M, Brzezinski K, Jedrzejczak R, Dauter M, Dauter Z, Sikorski M, et al. Ancient gene duplicates in gossypium (cotton) exhibit near-complete al. M. Truncatula histidine-containing phosphotransfer protein: structural expression divergence. Genome Biol Evol. 2014;6:559–71. and biochemical insights into cytokinin transduction pathway in plants. 59. Zhou J, Lemos B, Dopman EB, Hartl DL. Copy-number variation: the balance FEBS J. 2013;280:3709–20. between gene dosage and expression in drosophila melanogaster. Genome 34. Hwang I, Sheen J. Two-component circuitry in arabidopsis cytokinin signal Biol Evol. 2011;3:1014–24. transduction. Nature. 2001;413:383–9. 60. Hudson CM, Puckett EE, Bekaert M, Pires JC, Conant GC. Selection for higher 35. TO JPC, Haberer G, Ferreira FJ, Deruère J, Mason MG, Schaller GE, et al. gene copy number after different types of plant gene duplications. Type-a arabidopsis response regulators are partially redundant negative Genome Biol Evol. 2011;3:1369–80. regulators of cytokinin signaling. Plant Cell. 2004;16:658–71. 61. Rody HVS, Baute GJ, Rieseberg LH, Oliveira LO. Both mechanism and age of 36. Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K. The flowering duplications contribute to biased gene retention patterns in plants. BMC world: a tale of duplications. Trends Plant Sci. 2009;14:680–8. Genomics. 2017;18:46. 37. Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, 62. Kaltenegger E, Ober D. Paralogue interference affects the dynamics after et al. Ancestral polyploidy in seed plants and angiosperms. Nature. 2011; gene duplication. Trends Plant Sci. 2015;20:814–21. 473:97–100. 63. Baker CR, Hanson-Smith V, Johnson AD. Following gene duplication, 38. Mandáková T, Joly S, Krzywinski M, Mummenhoff K, Lysak MA. Fast paralog interference constrains transcriptional circuit evolution. Science. diploidization in close mesopolyploid relatives of arabidopsis. Plant Cell. 2013;342:104–8. 2010;22:2277–90. 64. Bridgham JT, Brown JE, Rodríguez-Marí A, Catchen JM, Thornton JW. 39. Whelan S, Allen JE, Blackburne BP, Talavera D. ModelOMatic: fast and Evolution of a new function by degenerative mutation in cephalochordate automated model selection between RY, nucleotide, amino acid, and steroid receptors. PLoS Genet. 2008;4:e1000191. codon substitution models. Syst Biol. 2015;64:42–55. 65. Keshishian EA, Rashotte AM. Plant cytokinin signalling. Essays Biochem. 40. Jiao Y, Li J, Tang H, Paterson AH. Integrated syntenic and phylogenomic 2015;58:13–27. analyses reveal an ancient genome duplication in monocots. Plant Cell. 66. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, et al. 2014;26:2792–802. Phytozome: a comparative platform for green plant genomics. Nucleic 41. Tang H, Bowers JE, Wang X, Paterson AH. Angiosperm genome Acids Res. 2012;40:D1178–86. comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad 67. Matasci N, Hung L-H, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, et al. Data Sci U S A. 2010;107:472–7. access for the 1000 plants (1KP) project. Giga Science. 2014;3:17. 42. Blanc G, Wolfe KH. Widespread Paleopolyploidy in Model plant species inferred 68. Yates A, Akanni W, Amode MR, Barrell D, Billis K, Carvalho-Silva D, et al. from age distributions of duplicate genes. Plant Cell. 2004;16:1667–78. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–6. 43. Estep MC, McKain MR, Diaz DV, Zhong J, Hodge JG, Hodkinson TR, et al. 69 Heyl A, Brault M, Frugier F, Kuderova A, Lindner A-C, Motyka V, et al. Allopolyploidy, diversification, and the miocene grassland expansion. Proc Nomenclature for members of the two-component signaling pathway of Natl Acad Sci U S A. 2014;111:15149–54. plants. Plant Physiol. 2013;161:1063–5. 44. Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, et al. 70. Suzuki T, Miwa K, Ishikawa K, Yamada H, Aiba H, Mizuno T. The arabidopsis Repeated polyploidization of gossypium genomes and the evolution of sensor his-kinase, AHk4, can respond to cytokinins. Plant Cell Physiol. 2001; spinnable cotton fibres. Nature. 2012;492:423–7. 42:107–13. 45. Tsai Y-C, Weir NR, Hill K, Zhang W, Kim HJ, Shiu S-H, et al. Characterization 71. Schaller GE, Doi K, Hwang I, Kieber JJ, Khurana JP, Kurata N, et al. of genes involved in cytokinin signaling and metabolism from rice. Plant Nomenclature for two-component signaling elements of rice. Plant Physiol. Physiol. 2012;158:1666–84. 2007;143:555–7. 46. Gruhn N, Halawa M, Snel B, Seidl MF, Heyl A. A subfamily of putative 72. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. cytokinin receptors is revealed by an analysis of the evolution of the two- Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–30. component signaling system of plants. Plant Physiol. 2014;165:227–37. 73. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting 47. Pils B, Heyl A. Unraveling the evolution of cytokinin signaling. Plant Physiol. transmembrane protein topology with a hidden markov model: application 2009;151:782–91. to complete genomes. J Mol Biol. 2001;305:567–80. 48. Pires ND, Yi K, Breuninger H, Catarino B, Menand B, Dolan L. Recruitment 74. Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating and remodeling of an ancient gene regulatory network during land plant signal peptides from transmembrane regions. Nat Methods. 2011;8:785–6. evolution. Proc Natl Acad Sci U S A. 2013;110:9571–6. 75. Käll L, Krogh A, Sonnhammer ELL. Advantages of combined transmembrane 49. Jourda C, Cardi C, Mbéguié-A-Mbéguié D, Bocs S, Garsmeur O, D’Hont A, et topology and signal peptide prediction–the phobius web server. Nucleic al. Expansion of banana (Musa acuminata) gene families involved in Acids Res. 2007;35:W429–32. ethylene biosynthesis and signalling after lineage-specific whole-genome 76. Suzuki T, Sakurai K, Imamura A, Nakamura A, Ueguchi C, Mizuno T. Compilation duplications. New Phytol. 2014;202:986–1000. and characterization of histidine-containing phosphotransmitters implicated in 50. Smith SA, Donoghue MJ. Rates of molecular evolution are linked to life his-to-asp phosphorelay in plants: AHP signal transducers of Arabidopsis history in flowering plants. Science. 2008;322:86–9. thaliana. Biosci Biotechnol Biochem. 2000;64:2486–9. Kaltenegger et al. BMC Evolutionary Biology (2018) 18:76 Page 19 of 19 77. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7. 78. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9. 79. Urao T, Yakubov B, Yamaguchi-Shinozaki K, Shinozaki K. Stress-responsive expression of genes for two-component response regulator-like proteins in arabidopsis thaliana. FEBS Lett. 1998;427:175–8. 80. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res. 2002;30:3059–66. 81. Sela I, Ashkenazy H, Katoh K, Pupko T. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 2015;43:W7–14. 82. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post- analysis of large phylogenies. Bioinformatics. 2014;30:1312–3. 83. Gil M, Zanetti MS, Zoller S, Anisimova M. CodonPhyML: fast maximum likelihood phylogeny estimation under codon substitution models. Mol Biol Evol. 2013;30:1270–80. 84. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–36. 85. Muse SV, Gaut BS. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 1994;11:715–24. 86. Yap VB, Lindsay H, Easteal S, Huttley G. Estimates of the effect of natural selection on protein-coding content. Mol Biol Evol. 2010;27:726–34. 87. Anisimova M, Gil M, Dufayard J-F, Dessimoz C, Gascuel O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst Biol. 2011;60:685–99. 88. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61:539–42. 89. Robinson DF, Foulds LR. Comparison of phylogenetic trees. Math Biosci. 1981;53:131–47. 90. Schliep K, Potts AJ, Morrison DA, Grimm GW. Intertwining phylogenetic trees and networks. Methods Ecol Evol. 2017;8:1212–20. 91. Galili T. Dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics. 2015;31:3718–20. 92. Stolzer M, Lai H, Xu M, Sathaye D, Vernot B, Durand D. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinforma Oxf Engl. 2012;28:i409–15. 93. The Angiosperm Phylogeny Group. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc. 2016;181:1–20. 94. Zou X-H, Zhang F-M, Zhang J-G, Zang L-L, Tang L, Wang J, et al. Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol. 2008;9:R49. 95. Ming R, VanBuren R, Liu Y, Yang M, Han Y, Li L-T, et al. Genome of the long- living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 2013;14:R41. 96. Albert VA, Barbazuk WB, dePamphilis CW, Der JP, Leebens-Mack J, Ma H, et al. The Amborella genome and the evolution of flowering plants. Science. 2013;342:1467. 97. Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, et al. The physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008;319:64–9. 98. Proost S, Van Bel M, Vaneechoutte D, Van de Peer Y, Inzé D, Mueller-Roeber B, et al. PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 2014;43:D974–81. 99. Proost S, Fostier J, De Witte D, Dhoedt B, Demeester P, Van de Peer Y, et al. Submit your next manuscript to BioMed Central I-ADHoRe 3.0—fast and sensitive detection of genomic homology in and we will help you at every step: extremely large data sets. Nucleic Acids Res. 2012;40:e11. 100. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol • We accept pre-submission inquiries Evol. 2007;24:1586–91. � Our selector tool helps you to find the most relevant journal 101. Kagale S, Robinson SJ, Nixon J, Xiao R, Huebert T, Condie J, et al. Polyploid evolution of the brassicaceae during the cenozoic era. Plant Cell. 2014;26: � We provide round the clock customer support 2777–91. � Convenient online submission 102. Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH. Unraveling � Thorough peer review ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 2008;18:1944–54. � Inclusion in PubMed and all major indexing services � Maximum visibility for your research Submit your manuscript at www.biomedcentral.com/submit

Journal

BMC Evolutionary BiologySpringer Journals

Published: May 29, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off