ARTICLE DOI: 10.1038/s41467-018-04628-4 OPEN The evolution of the temporal program of genome replication 1 1 1 1 2 2 Nicolas Agier , Stéphane Delmas , Qing Zhang , Aubin Fleiss , Yan Jaszczyszyn , Erwin van Dijk , 2 1 1,3 1 Claude Thermes , Martin Weigt , Marco Cosentino-Lagomarsino & Gilles Fischer Genome replication is highly regulated in time and space, but the rules governing the remodeling of these programs during evolution remain largely unknown. We generated genome-wide replication timing proﬁles for ten Lachancea yeasts, covering a continuous evolutionary range from closely related to more divergent species. We show that replication programs primarily evolve through a highly dynamic evolutionary renewal of the cohort of active replication origins. We found that gained origins appear with low activity yet become more efﬁcient and ﬁre earlier as they evolutionarily age. By contrast, origins that are lost comprise the complete range of ﬁring strength. Additionally, they preferentially occur in close vicinity to strong origins. Interestingly, despite high evolutionary turnover, active replication origins remain regularly spaced along chromosomes in all species, suggesting that origin distribution is optimized to limit large inter-origin intervals. We propose a model on the evolutionary birth, death, and conservation of active replication origins. 1 2 Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005 Paris, France. Institute for Integrative Biology of the Cell, UMR9198, CNRS CEA Univ Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette Cedex, France. IFOM (FIRC Institute of Molecular Oncology), Via Adamello 16, 20139 Milano MI, Italy. Correspondence and requests for materials should be addressed to G.F. (email: gilles.ﬁscher@sorbonne-universite.fr) NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications 1 | | | 1234567890():,; ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 o ensure the completion of genome doubling before cell existence of conserved sequence elements necessary for origin division, eukaryotic chromosomes initiate DNA replication function in other yeast genomes as well as a conserved role for Tfrom multiple sites, termed replication origins. Mammalian centromeres and telomeres in deﬁning early and late origin ﬁring, 25,26,28–30 and yeast genomes initiate replication at hundreds and thousands respectively . Comparative analysis of replication timing of active origins respectively, that are selected from a larger pool also revealed that at short evolutionary distances, between Sac- 1–3 of possible autonomously replicating sequences (ARS) .In charomyces species, most active origins remained conserved both yeast, active origins are distributed throughout the genome, in location and in activation time, resulting in an important mainly at non-transcribed and nucleosome-depleted sequences conservation in the temporal order of genome replication .On and comprise a speciﬁc DNA motif called ARS consensus the contrary, comparisons between more distantly related species sequence (ACS) which is bound by the Origin Recognition revealed that the conservation of the temporal organization of 4–6 Complex . Yet only a subset of the pre-determined replication replication was restricted to speciﬁc genetic elements such as initiation loci will be activated by different cells in a population. centromeres and histone genes that are among the ﬁrst regions to DNA combing and ﬂuorescence microscopy conﬁrmed the sto- replicate and telomeres that are among the last . Only a small 7–10 chastic activation of origins at the individual cell level . proportion of replication origins (5−30%) are conserved in Nevertheless, within populations, a speciﬁc temporal program of position between S. cerevisiae, L. waltii, L. kluyveri or K. lac- 3,23,28 genome replication emerges and can be recapitulated by aver- tis . Such a level of reprogramming contrasts with the global aging the heterogeneous replication kinetics of a large number of conservation observed between Saccharomyces genomes and 7,11 cells . This can be achieved experimentally by time-course precludes any chance to identify the selective forces responsible measurements of replication progression during S-phase in a cell for the conservation, gain and loss of replication origins over population, using microarray hybridization or high throughput evolutionary time. sequencing . Mathematical modeling of such replication timing To overcome these limitations, we focus on the continuous data gives access to the stochastic ﬁring components of individual evolutionary range covered by the genus Lachancea, from closely 10,12–14 31 origins . First, the efﬁciency describes the probability of related to more diverged species .We ﬁrst characterize the activation for each replication origin, which represents the pro- genome replication dynamics and origin usage at the population portion of cells in a population in which the origin actively level in ten Lachancea species. This unique dataset allows us to 15,16 ﬁres . Second, each origin has an intrinsic strength called infer all events of origin gains and losses since these species characteristic ﬁring time, which represent its time, early to late, of diverged from their last common ancestor. We then correlate the activation during S-phase in the absence of interfering neigh- functional properties of replication origins from equivalent evo- 17,18 boring origins . Firing time and efﬁciency both reﬂect the lutionary ages, such as their chromosomal location, ﬁring time probability of origin ﬁring, either per unit of time or over the and efﬁciency to reveal new rules that govern the birth, death and entire S-phase respectively. Consequently, timing and efﬁciency conservation of active replication origins during evolution. 16,19,20 are correlated , the fundamental difference being that efﬁ- ciency incorporates the effect of passive replication from forks originating from different origins while ﬁring rate is an origin- Results speciﬁc quantity. Temporal programs of genome replication in Lachancea.We Early microarray-based studies in Saccharomyces cerevisiae measured the temporal programs of genome replication by established the ﬁrst genome-wide replication program providing assessing DNA copy number change during S-phase, for ten both replication timing information for all genomic sequences Lachancea species with high-quality genome assemblies, in order 18,21 31–33 and replication origin location along chromosomes . Since to determine the mean replication time, called Trep (Sup- then, temporal programs of genome replication have been plementary Fig. 1 and Supplementary Data 1). Cells were syn- established for many eukaryotic genomes including three closely chronized by physical discrimination of G1 cells and related Saccharomyces sensu stricto species and nine more synchronously released into S-phase. Time-point samples were distantly related species (Candida glabrata, Naumovozyma cas- taken during S-phase until cells reached the G2 phase and DNA tellii, Tetrapisispora blattae, Zygosaccharomyces rouxii, Kluyver- samples were analyzed using Illumina deep sequencing . We also omyces lactis, Lachancea waltii, Lachancea kluyveri, Pichia performed a Marker Frequency Analysis (MFA), consisting in 3,23–27 Pastoris and Candida albicans) . An early comparative measuring replication dynamics directly from an exponentially genomics study among closely related Saccharomyces species growing cell population and compared our results to published 3,23 showed that phylogenetic conservation can be used to determine proﬁles for L. waltii and L. kluyveri . For each species, the Trep the genome-wide location of replication origins in S. cerevisiae . and MFA proﬁles were highly correlated, indicating good Additional comparative genomic approaches later conﬁrmed the reproducibility (Supplementary Fig. 2). In addition, we inferred Table 1 Replication proﬁle features Species Genome size Nb ORI ORI every … Median replication time (min) Average S-phase duration (min) Centromere Telomere Histone L. fantastica 11.3 Mb 256 44 kb 6 ± 2 29 ± 7 3 ± 3 39 L. meyersii 11.3 Mb 237 48 kb 2 ± 2 17 ± 5 1 ± 1 24 L. dasiensis 10.7 Mb 234 46 kb 4 ± 2 24 ± 6 3 ± 1 36 L. nothofagi 11.3 Mb 241 47 kb 6 ± 4 28 ± 9 3 ± 1 42 L. waltii 10.2 Mb 225 46 kb 9 ± 3 24 ± 11 7 ± 1 41 L. thermotolerans 10.4 Mb 211 49 kb 2 ± 2 16 ± 5 1 ± 1 29 L. mirantina 10.1 Mb 202 50 kb 7 ± 3 35 ± 16 1 ± 1 48 L. fermentati 10.3 Mb 214 48 kb 2 ± 4 42 ± 4 1 ± 1 47 L. cidri 10.1 Mb 200 51 kb 4 ± 3 25 ± 4 2 ± 1 41 L. kluyveri 11.3 Mb 244 46 kb 1 ± 1 26 ± 6 1 ± 1 32 2 NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications | | | NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 ARTICLE a b 1 = 0.812 0.0 S. cerevisiae S. paradoxus 0.5 S. arboricolus 4 S. uvarum 1.0 L. fantastica 0.0 0.5 1.0 L. meyersii 2 = 0.738 L. dasiensis 0.0 L. nothofagi L. thermotolerans 0.5 L. waltii 1.0 L. mirantina 0.0 0.5 1.0 L. fermentati = 0.413 L. cidri 0.0 L. kluyveri C. albicans 0.5 0.08 1.0 0.0 0.5 1.0 1.0 = 0.320 0.8 0.0 0.6 0.5 0.4 1.0 0.0 0.5 1.0 0.2 = –0.011 0.0 0.0 0.5 –0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.0 Phylogenetic distance 0.0 0.5 1.0 Chromosomal coordinates (Mb) Fig. 1 Evolution of replication timing proﬁles in yeast. a Phylogeny of 15 yeast species inferred from a maximum likelihood analysis of a concatenated alignment of 510 protein families. The circled numbers to the right of the tree relate to the pairwise species comparisons presented in b. b Synteny-based projections of replication timing proﬁles between pairs of species. We projected the replication timing of one species onto the chromosomal coordinates of a second species, using synteny conservation between the two genomes. The color coding is identical to a. The rho value above each proﬁle indicates the genome-wide Spearman’s rank correlation coefﬁcient between the replication timing programs of the two compared species. Vertical gray bars indicate the position of synteny breakpoints between genomes. Comparisons 1, 4, and 5 represent the projections of S. uvarum, L. fantastica, and C. albicans timing data, respectively, onto the replication proﬁle of S. cerevisiae chromosome XII. The gray rectangle symbolizes the rDNA locus in this chromosome. The comparisons 2 and 3 represent the projections of L. fantastica and L. kluyveri timing data, respectively, onto the replication proﬁle of L. meyersii chromosome 0D. c Correlation between replication proﬁle conservation and phylogenetic distance. Spearman coefﬁcients on the y-axis correspond to the synteny-based projections of genome replication timing proﬁles between pairs of species as illustrated in b. Red, blue, purple, and black dot series correspond to intra- Saccharomyces, intra-Lachancea, Saccharomyces vs Lachancea and C. albicans vs all other species comparisons, respectively. The gray boxplots show the Spearman correlation values of a null model where, for each pairwise comparison, we applied a random offset to the coordinates of the syntenic genes in one of the two compared genomes. Whiskers are deﬁned by selecting the values within the range of the 75th and 25th percentile plus or minus 1.5 x (75th percentile – 25th percentile) respectively, and at the maximum distance from the median. The offset was re-deﬁned 100 times in both directions and correlations were calculated for each combination of one offset proﬁle from one genome and one original proﬁle from the other genome the location of replication origins along the chromosomes by However, we observed a constant origin density along chromo- applying a peak calling method and stringently deﬁned active somes of one origin every 47 kb (Supplementary Fig. 3a−c). replication origins only if peaks were co-detected in both proﬁles Additionally, origins are more regularly spaced than what would (Supplementary Fig. 2 and Supplementary Data 1). be expected by chance (Supplementary Fig. 3d), as previously We identiﬁed 2264 active replication origins in the ten described for S. cerevisiae, K. lactis, L. kluyveri, and L. waltii .We genomes (Supplementary Data 1). These genomes undoubtedly also found common features shared between all ten replication contain more weak origins with efﬁciencies too low to be detected programs, such as the spatial alternation between early and late by our approach. The total number of active origins per genome replicating large chromosomal regions, with the exception of the 23,30 varies from 200 in L. cidri to 256 in L. fantastica (Table 1). left arm of chromosome C in L. kluyveri , as well as both early NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications 3 | | | Spearman correlation () Replication timing (0 = early, 1 = late) ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 27 2 replicating centromeres and histone genes and late replicating coefﬁcients linearly anti-correlate with phylogenetic distances (R −16 telomeres (Supplementary Fig. 1 and Table 1). = 0.88, P < 2.2×10 ), showing that replication timing linearly evolves alongside protein divergence (Fig. 1c). This result indi- cates that the genus Lachancea is an ideal candidate to investigate Continuous evolutionary range of genome replication proﬁles. the causes behind the progressive reprogramming of genome We calculated the Spearman’s rho for replication timing pro- replication during evolution. grams between pairs of species to determine their degree of conservation (see Methods). We included in the analysis our new Lachancea replication timing data and those previously published Local impact of genome rearrangements on proﬁle evolution. 22 25 in Saccharomyces species , and Candida albicans (Fig. 1a, b). We ﬁrst determined that the number of syntenic homologs, We found high degrees of replication proﬁle correlation between considered as orthologous genes hereafter, used to project the all sensu stricto Saccharomyces species, varying from 0.79 to 0.86, replication timing of one genome onto another was globally as previously reported . On the contrary, all pairwise compar- constant, ruling out the possibility that the anti-correlation isons involving species from different genera (Saccharomyces, observed in Fig. 1c resulted from a bias in synteny detection Lachancea or Candida) show little to no degree of correlation (Fig. 2a). By contrast, the number of synteny blocks increases preventing any reliable comparative study. Interestingly, com- with phylogenetic distance (Fig. 2a) due to the accumulation of 31,37,38 parisons within the genus Lachancea revealed that the Spearman’s genome rearrangements during evolution . Importantly, the rho stagger from highly conserved replication proﬁles (ρ = 0.74) anticorrelation slopes based on the local coefﬁcients both near to more variable programs (ρ = 0.41), with all intermediate levels breakpoints and within conserved synteny blocks are similar to of correlation in-between (Supplementary Fig. 4). These that of the global trend, revealing that the local effects of genome a b 350 0.8 [ ] 0.6 [ ] 0.4 Global rho No synteny break 0.2 50 Synteny break 0 0 0 0 0.5 1.0 1.5 2.0 0 0.5 1.0 1.5 2.0 Phylogenetic distance Phylogenetic distance cd 0.4 R = 0.0991 200 R = 0.4535 0.3 0.2 0.1 0 50 0 0.5 1.0 1.5 2.0 0 0.5 1.0 1.5 2.0 Phylogenetic distance Phylogenetic distance Fig. 2 Evolution of genomic properties in the genus Lachancea. In all panels, each point corresponds to a pairwise comparison between two Lachancea species. The values of the property indicated on the y-axis are plotted as a function of the phylogenetic divergence between all pairs of Lachancea species as deﬁned in ref. . a The number of syntenic homologs and synteny blocks are represented by black and open circles, respectively. b Comparison of global and local correlation coefﬁcients between pairs of genome replication timing proﬁles. The global coefﬁcient plot (black circles) is identical to that of the 2 −16 blue dots in Fig. 1c(R = 0.868, P < 2.2×10 ). Open squares and gray triangles represent the local correlation coefﬁcients averaged over the same 2 −6 number of regions of ﬁve genes that are either located 25 genes away from the synteny breakpoints (R = 0.502, P = 2.24×10 ) or directly ﬂank the 2 −6 breakpoints (R = 0.477, P = 1.6×10 ), respectively. c Origin ﬁring time differences between conserved replication origins in all pairs of species. The y- axis corresponds to the normalized replication timing difference between orthologous origins. For each species, the average replication timing data (presented in Supplementary Fig. 1) were normalized between 0 and 1. The distributions of differences in normalized activation timing between orthologous origins are shown for all pairwise comparisons. The black dots indicate the median values for pairwise comparisons and the gray bars show the standard 2 −07 deviations. d Relative conservation of active replication origins in the different pairs of Lachancea genomes (R = 0.45, P = 3.9×10 ). The y-axis corresponds to the number of active origins found to be conserved between each pair of species 4 NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications | | | Median delta replication timing # Orthologous genes (x1000) # Synteny blocks # Origins in family Spearman correlation () NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 ARTICLE rearrangements are not sufﬁcient to explain the evolution of the replication programs did not result from a progressive change in replication programs (Fig. 2b). However, local Spearman’s rho origin activation times. However, we found a negative correlation were on average weaker around synteny breakpoints than within between the number of conserved replication origins within pairs conserved synteny regions, suggesting that genome rearrange- of species and their phylogenetic distances (Fig. 2d). Given that ments have a local impact on the evolution of replication proﬁles phylogenetic distances also correlate with replication timing (Fig. 2b). Yet, part of this decay could be due to a technical Spearman’s rho (Fig. 1c), it results in the numbers of conserved component because projected proﬁles right at the edge of synteny origins correlating with the conservation of the replication timing 2 −09 blocks are necessarily discontinuous. Furthermore, if this was the proﬁles (R = 0.57, P = 2.5×10 ). These results suggest that the sole component responsible for the correlation decay, the coef- appearance and disappearance of active replication origins would ﬁcients should increase abruptly in the next windows to reach the be the dominant process for shaping replication proﬁles during level found in conserved synteny regions. However, we found a evolution. gradual increase of the correlation coefﬁcients with increasing distances from the breakpoints in the range of 5−20 genes indicating the implication of a biological component in the Dynamic evolutionary turnover of active replication origins. observed discontinuity of the proﬁle at breakpoints (Supple- We reconstructed the evolutionary history of active replication mentary Fig. 5). This biological component could be the con- origins along the branches of the phylogenetic tree under a birth- sequence of fusing regions with different replication timing. death evolutionary model using L. kluyveri as the outgroup spe- cies (see Methods). We identiﬁed 1310 origins clustered in 220 families in the nine other Lachancea genomes that were vertically Turnover of active origins drives timing program evolution. inherited from the last common ancestor of the clade designated We next sought whether the number of replication origins, their as L.A2 (Fig. 3a). We will refer to them as ancestral origins differential ﬁring times, or their conservation levels could have hereafter. Extant genomes comprise on average 68% of ancestral driven the evolution of replication timing programs. First, we origins (Fig. 3b), varying from 59 to 83% (Supplementary Fig. 9). found no correlation between replication proﬁle conservation and Only 37 out of these 220 ancestral families were faithfully the raw number of active origins in the genomes (Supplementary transmitted from L.A2 without any subsequent loss of origins Fig. 6). Second, we constructed families of orthologous replication along the branches of the tree. This means that 83% of the origins based on synteny conservation (see Methods), to test ancestral families underwent at least one event of origin gain or whether replication program evolution could have resulted from loss, demonstrating that the evolutionary turnover of the cohort differential conservation of active replication origins. After ﬁl- of active replication origins is very dynamic (Fig. 3b). We denote tering 96 subtelomeric origins, we obtained 374 multi-species here a gain of a new active origin as the event of de novo origin families comprising 1956 origins and 212 species-speciﬁc emergence. Alternatively, it could also correspond to the increase singleton origins (Supplementary Figs. 7 and 8). We found that in efﬁciency of an origin undetectable in other species. Our the differences in origin activity, measured by peak height on deﬁnition of an origin loss is the inactivation of a previously replication timing proﬁles, between orthologous origins con- active origin at the chromosomal level, regardless of its capacity served in different species did not correlate with their phyloge- to sustain the autonomous replication of a plasmid (ARS activity). netic distances. The range of median timing differences between We also cannot rule out that an origin loss would correspond orthologous origins remains between 10 and 22% despite varying either to the reduction of its chromosomal activity to a level phylogenetic distances between compared species (Fig. 2c). This below the sensitivity of the experiment or to the rise of an earlier absence of correlation suggests that the evolution of the origin nearby, resulting in passive replication of the considered ab 46|19 Origin gains|losses LAFA 100 18|20 14.5% gained in b9 44|34 external branches 32.2% LAME b8 13|10 Lachancea specific 51|47 17.7% gained in 12|12 LADA origins 211 75 internal branches b7 40|31 LANO 15|21 b5 32|41 LATH 9|19 14|17 50 b6 40|30 b4 LAWA 67.8% ancestral 44|76 LAMI L.A2 origins b2 17|22 LAFE 16|37 L.A1 199 b3 b1 28|41 LACI 16.2% shared by all species LAKL 0 0.2 Fig. 3 Evolution of the repertoire of active replication origins in the genus Lachancea. a The phylogeny of the ten Lachancea species is taken from ref. . Abbreviations of the species names: LAFA: L. fantastica, LAME: L. meyersii, LADA: L. dasiensis, LANO: L. nothofagi, LATH: L. thermotolerans, LAWA: L. waltii, LAMI: L. mirantina, LAFE: L. fermentati, LACI: L. cidri and LAKL: L. kluyveri. LAKL was used as the outgroup species; therefore evolutionary events that occurred on both the LAKL and the b2 branches (dotted lines) could not be retraced. Internal branches, labeled b3 to b9, and terminal branches are drawn in black and blue, respectively. The number of origin gains (in green) and losses (in red) were estimated for each branch of the tree under a birth-death evolutionary model. The inferred number of active replication origins in the ancestral genomes is indicated next to the corresponding node of the tree. b The histogram indicates the proportion of active origins that were vertically inherited from the L.A2 ancestor (in gray), gained on internal branches of the tree (in black) and gained on terminal branches (in blue). The proportion of ancestral origins retained in all nine genomes is indicated by the open frame NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications 5 | | | Proportion of origin in current species (%) ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 C Nearest G Nearest L Nearest Expected Expected Expected Observed Observed Observed *** *** *** C Nearest G Nearest L Nearest 1.0 1.0 Next to 0.8 0.8 lost ORI Next to lost ORI 0.6 0.6 0.4 0.4 0.2 0.2 0 0 C G *** *** d e 1.0 1.0 0.8 0.8 b3 b6 LATH LACI b4 LAFE LAMI LAWA 0.6 0.6 b5 LADA b7 0.4 0.4 LAME b9 LAFA LANO b8 0.2 0.2 0 0 1 239 4 5 6 7 810 0.0 0.2 0.4 0.6 0.8 # Species in families Divergence from L.A2 origin. We located on the phylogenetic tree 477 losses and 439 origins represent 32% of all active replication origins in the nine gains since the nine Lachancea species diverged from L.A2 Lachancea genomes today. (Fig. 3a). These conservative ﬁgures exclude 94 events that The number of origin losses per branch, and to a lesser extent occurred on the two most internal branches of the tree (b2 and L. the number of gains, signiﬁcantly correlate with branch length, kluyveri), for which it was not possible to discriminate an origin suggesting that the accumulation of point mutations could have loss on one branch from an origin gain on the other. The 439 gain had an impact on the dynamics of the set of active origins events resulted in 623 Lachancea-speciﬁc origins because a single (Supplementary Fig. 10a). Additionally origin losses and gains gain event occurring on an internal branch of the tree results in also correlate with the number of inversions, translocations, and several origins present in different descendant species. These duplications per branch (Supplementary Fig. 10b, c). Although 6 NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications | | | Origin efficiency Origin efficiency Distance to nearest origin (x10 bp) Origin efficiency Origin efficiency L.A2 NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 ARTICLE there is no obvious causal relationship between these observa- origins are preferentially lost when they are in the close vicinity of tions, they reveal that protein divergence, chromosome archi- a newborn origin. tecture, and replication programs evolve in a coordinated Next we compared the efﬁciencies between the three origin manner. categories and found that gained and to a lesser extent lost origins have lower efﬁciencies than conserved (Fig. 4b). We also found that the efﬁciencies of the nearest origins directly ﬂanking lost Origin losses and gains are linked both in time and space.To origins are higher than those ﬂanking both gained and conserved. further characterize the evolutionary dynamics of active replica- At ﬁrst this may seem paradoxical because it suggests that an tion origins, we compared the physical and functional properties origin would be preferentially lost when it is ﬂanked by a gained between conserved, lost, and gained origins. To ensure we used an and highly efﬁcient origin, while gained origins have in general appropriate proxy of the properties at the time they were gained the lowest efﬁciencies of all three categories (Fig. 4b). However, or lost, we focused on the most recent events that occurred along we found that the subset of both gained and conserved origins in the Lachancea phylogeny comparing sister species within the close vicinity to lost origins include the most efﬁcient origins three most closely related pairs (L. fantastica/L. meyersii, L. within their respective category (Fig. 4c). thermotolerans/L. waltii and L. fermentati/L. cidri, see blue shaded Therefore, a clear picture of the evolutionary dynamics of area in Fig. 3a). After ﬁltering for dubious cases of losses and replication origins emerges from these analyses, where origins are gains (see Methods), this data set comprises 997 conserved, 151 preferentially lost when located close to a newborn origin gained, and 151 lost origins. As physical properties, we studied emerging with a high ﬁring efﬁciency. Similar results were the distance between each origin and its centromere, closest tel- obtained with characteristic ﬁring times, as late ﬁring origins are omere, closest genome rearrangement breakpoint, and closest preferentially lost near early ﬁring origins that have emerged neighboring origin. As functional properties, we considered efﬁ- recently on the same phylogenetic branch (Supplementary ciency and characteristic ﬁring time of each individual origin and Fig. 12). that of its closest neighboring origin. We derived the efﬁciency and ﬁring time of each individual origin in the ten Lachancea genomes by ﬁtting a stochastic mathematical model to our Mechanisms of origin losses. To get information on the relative replication timing data (see Methods). For lost origins, we rates at which conserved and gained origins would be lost, we inferred their characteristics based on the corresponding features looked at their type (conserved or gained) in ancestral branches of their orthologous origins in the sister genomes. b3, b6, and b9 (Fig. 3a), i.e. immediately prior to their loss in Firstly, we found no difference in the origin-centromere and terminal branches. For the 79 cases of losses that occurred next to origin-telomere distances between the conserved, gained, and lost a conserved origin, we found that the origin lost in the terminal origins, showing that their relative chromosomal location did not branches was systematically a conserved origin in the ancestral inﬂuence their evolutionary fate (Supplementary Fig. 11a). species, suggesting that old origins are more likely to be lost than Secondly, using 74 rearrangements that occurred along the six new origins (P = 0.05, Supplementary Fig. 13). terminal branches we found no clear association with replica- To further characterize the mechanisms by which origins are tion origins, neither in the ﬁrst 5 kb that directly ﬂank the lost, we compared our set of 30 origin losses in the L. waltii breakpoints nor further away (Supplementary Fig. 11b). Similarly, branch to an independent dataset previously generated in this for each origin category (conserved, gained or lost), we found no species . The authors identiﬁed 194 ARS, of which 156 were association with the breakpoints (Supplementary Fig. 11c). All shown to function as chromosomal replication origins during S- these results suggest that chromosomal rearrangements play little phase . We found that 21 of our losses did not overlap with any role, if any, in the evolutionary dynamics of replication origins ARS and 4 corresponded to an ARS but were devoid of and conﬁrm our previous ﬁndings that replication origins and chromosomal activity. The ﬁve remaining cases of origin losses synteny breakpoints do not signiﬁcantly collocate in L. kluyveri in our study matched chromosomally active origins in the and L. waltii genomes . Lastly, we found that origin losses and to previous dataset (17%, by comparison 79% of the 156 active a lesser extent gains tend to occur closer to their nearest origin origins were also detected in our experiments). Therefore, a than conserved origins (median distances of 13 kb and 24 kb vs minority of our loss events would correspond to low efﬁciency 36 kb, respectively, Fig. 4a), indicating that origins tend to be lost origins that escaped detection because the proximity of a highly when they are in close vicinity to another replication origin. active origin. The remaining 83% correspond to true origin losses Moreover, we found a strong physical association between lost that could occur through the loss of ARS activity. In order to and newly gained origins (Fig. 4a). These results show that, within experimentally conﬁrm this result, we performed an ARS assay the same phylogenetic branch of the tree, active replication for 16 loci of the L. thermotolerans genome corresponding to Fig. 4 Physical and functional properties of lost, gained, and conserved replication origins. The original dataset comprises 1073 conserved, 207 gained, and −03 187 lost origins; however, after ﬁltering for dubious cases the number of total gain and loss is both 151 events. In all panels, *** and * represent P< 10 −02 and P< 5×10 , respectively, using a chi-square two-sample test. a Distribution of the distance separating Conserved (C), Gain (G), and Lost (L) origins from their nearest origins. The pie charts show the expected, i.e. the theoretical percentages if the nearest origins were randomly sampled in the population of origins and the observed proportions of C, G, and L nearest origins for the three categories. b Distribution of the efﬁciencies of C, G, and L origins and of their nearest origins. c Split distributions of the efﬁciencies of C and G origins based on the category of their nearest origin. d Distribution of the ﬁring efﬁciencies as a function of the number of Lachancea species comprised into the families of orthologous origins (R = 0.891 on median values, −05 P = 4.1×10 ). For boxplots in a–d, whiskers are deﬁned by selecting the values within the range of the 75th and 25th percentile plus or minus 1.5 x (75th percentile – 25th percentile) respectively, and at the maximum distance from the median. e Correlation between origin efﬁciency and origin age for 546 2 −03 origin families comprising 220 ancestral and 326 Lachancea-speciﬁc families (R = 0.44, P = 1.3×10 ). Each dot represents the median efﬁciency of all the replication origins that appeared in a given branch of the phylogenetic tree. The x-axis represents the total branch length between L.A2 (the last common ancestor of the nine species, indicated in light gray) and the branch of appearance of the new origins, placing the most ancestral origins on the left and the youngest on the right of the plot. Vertical error bars represent the standard deviation of origin efﬁciencies and horizontal bars represent the span of the branch lengths based on the phylogenetic tree in Fig. 3 NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications 7 | | | Evolutionary time ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 experimentally deﬁned active ARS in the L. waltii genome . Four Discussion of them are deﬁned here as conserved between the two genomes Our study is the ﬁrst detailed reconstruction of the evolutionary while the remaining 12 correspond to origin losses in L. history of genome replication in eukaryotes. Using the model thermotolerans. We found that all four conserved origins show yeast genus Lachancea, which exhibit a continuous evolutionary a clear ARS activity (Supplementary Fig. 14). Interestingly, the range from closely related to more divergent genomes, we cap- conserved origin of L. thermotolerans with the lowest ARS activity tured all intermediate states between highly conserved and sig- is also the least efﬁcient and latest ﬁring origin. By contrast, only niﬁcantly reprogrammed temporal orders of genome replication. 4 of the 12 loci corresponding to lost origins show ARS activity The replication-timing program evolves coordinately with protein (Supplementary Fig. 14). The remaining eight loci have lost their sequence and chromosome architecture (Fig. 1c and Supple- ARS activity. We then tested whether the loss of ARS activity mentary Fig. 10). Previous data suggested that replication origins could be due to mutations affecting the ACS or its B1 tend to colocalize with synteny breakpoints between yeast species 2,5,39 23,28,41,42 element . We identiﬁed 75 syntenic origins between L. more distantly related than Lachancea species . However, thermotolerans and L. waltii origins with known ACS . Using we surprisingly discovered that the accumulation of chromosomal MEME , we searched for the L. thermotolerans ACS and rearrangements did not drive the evolution of the replication identiﬁed a motif of 36 nucleotides highly similar to the ACS and B1 element of both L. waltii and S. cerevisiae; therefore, we deﬁned it as the L. thermotolerans ACS (Supplementary Fig. 15). Early Using the same approach, we searched for the ACS across the other species, we found a motif highly similar to the L. waltii/L. thermotolerans ACS in L. fantastica and L. nothofagi, a more degenerate motif in L. meyersii and L. dasiensis and we failed to detect any motif in the four more distantly related species (Supplementary Fig. 15). Interestingly, the ACS of L. thermo- Late tolerans was identiﬁed in the four conserved origins tested for ARS activity. However, this ACS was found in only 6 of the 12 tested lost origins, 3 of the 6 displaying ARS activity. Moreover, the logo resulting from the four conserved origins was highly similar to the ACS while the one derived from the six lost origins was slightly less conserved. For only the three lost origins without any ARS activity, the ACS motif is unrecognizable (Supplemen- tary Fig. 16). These results suggest that the loss of the ARS activity is related to the degeneration of the ACS. G Evolutionary age impacts active replication origin features.We found that origin efﬁciencies positively correlate with the number of species represented in the origin families (Fig. 4d), suggesting a relationship with the evolutionary age of the origins. All indivi- dual lineages contributed to this signal, as we observed that the L G efﬁciency of species-speciﬁc origins is signiﬁcantly lower than that of ubiquitous origins (Supplementary Fig. 17a). Species- speciﬁc origins also ﬁre later than conserved origins (Supple- mentary Fig. 17b). The relationship between efﬁciency and origin family size did not result from a lack of precision in origin location for weak and isolated origins as compared to strong and ubiquitous origins (Supplementary Fig. 17c, d). The number of species per family might not be necessarily the best proxy for the evolutionary age of the origins as a family comprising four species could be older than a family that comprises ﬁve if the former underwent more origin losses than the latter. In addition, species- speciﬁc origins could in fact have different ages because of the variation in length between the terminal branches of the tree. In order to remove these confounding factors, we used our recon- Loss-first Gain-first struction of replication origin history (Fig. 3a) to assign a phy- logenetic age to each origin family, corresponding to the Fig. 5 Evolutionary model of the temporal program of genome replication. a cumulated branch lengths between L.A2 and its branch of origi- The diagram illustrates ﬁve principles governing the dynamics of replication nation in the phylogenetic tree. We found a correlation between origins at large evolutionary scale: (i) chromosomally active replication the phylogenetic age and the efﬁciency of the origin families origins are continuously gained (G) and lost (L) during evolution, (ii) (Fig. 4e). Both the ancestral and oldest Lachancea-speciﬁc origins, conserved origins (C) are more ancestral and have on average stronger originating from L.A2 and the b3/b4 branches respectively, ﬁre activity than younger origins, (iii) newly gained origins have on average low more efﬁciently on average than the youngest origins that were activity and their strength increases over evolutionary times, (iv) conserved gained on the small terminal branches leading to L fantastica and origins are preferentially lost and (v) origin loss and gain of strong origins L. meyersii (Fig. 4e). The same was observed with characteristic occur in a direct chromosomal vicinity. b The two pathways called "Loss- ﬁring times (Supplementary Fig. 18). Therefore these results show ﬁrst" and "Gain-ﬁrst" describe, at shorter evolutionary distances, the that origin activity correlates with the evolutionary age of the coordinated gain of a strong origin with the loss of a strong (left) or weak replication origins. (right) origin 8 NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications | | | Replication time NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 ARTICLE program (Fig. 2b and Supplementary Fig. 11b, c). Here, we found in terms of physical distance along the chromosome and time of that the dominant process in replication proﬁle evolution is the appearance, i.e. in the same branch of the phylogenetic tree highly dynamic evolutionary renewal of the cohort of replication (Fig. 4a−c and Supplementary Fig. 12b). Similarly, conserved origins across species (Fig. 2d). The evolutionary dynamics of origins that ﬂank lost origins are also on average the most efﬁcient replication origins in Lachancea far exceeds that of chromosome of all conserved origins (Fig. 4b, c). rearrangements with 394 recent events of origin gains and losses Altogether, the above observations ﬁt within a two-pathway compared to only 74 translocations and inversions that reached model of origin gain and loss coordination during evolution that ﬁxation on the terminal branches of the tree (Fig. 3a). The small we call Loss-ﬁrst and Gain-ﬁrst. In the Loss-ﬁrst pathway, the proportion, 16.2%, of ancestral origins belonging to families that main cause explaining the gain of a strong active origin would be remained conserved across all Lachancea genomes also illustrates the initial loss, nearby along the chromosome, of a strong origin, the high evolutionary turnover of active replication origins. These generating a large region devoid of initiation zone (Fig. 5b). results are in marked contrast with the high conservation in the Conversely, in the Gain-ﬁrst pathway, the main cause explaining location and activation time of chromosomally active origins the evolutionary loss of a weaker replication origin would be the between much more closely related species from the Sacchar- appearance nearby of a stronger replication origin (Fig. 5b). In a omyces sensu stricto complex . In total, we characterized 1010 nutshell, our model relies on ﬁve simple principles (Fig. 5): (i) gain and loss events since the Lachancea species diverged from chromosomally active replication origins are continuously gained their last common ancestor. The chromosomally active replica- and lost during evolution, (ii) conserved origins have on average tion origins detected by our genome-wide timing survey only stronger activity than younger origins, (iii) newly gained origins correspond to a subset of a pool of possible ARS because cells, in have on average low activity and their strength increases over order to overcome potentially irreversible double fork stalling evolutionary time, (iv) conserved origins are preferentially lost 1–3,43–45 events, license many more origins than they use . For and (v) origin loss and gain of strong origins occur in a direct instance, in S. cerevisiae, the experimental deletion of all efﬁcient chromosomal vicinity. It is intriguing to speculate on the phy- replication origins from entire chromosomes only causes a mar- siological principles and constraints responsible for the evolu- ginal mitotic instability because dormant origins become active tionary tradeoffs leading to these principles. Interestingly, despite 46,47 and contribute to the replication of the genome . The evolu- their high evolutionary turnover, chromosomally active replica- tionary dynamics of replication origins uncovered here is not tion origins remain more regularly spaced than expected in all ten directly comparable to this ARS modularity because time scales Lachancea genomes since they diverged from their last common are different. ancestor approximately 80 million years ago (Supplementary However, we cannot rule out that the evolutionary gains of new Fig. 3d). Regular spacing of replication origins along chromo- 36,48–50 chromosomally active origins could in fact correspond to the somes was also reported for other yeast species . These activation of dormant ARS or to the increase in efﬁciency of a ﬁndings suggest that origin distribution has been optimized to weak origin. Concerning origin losses, we identiﬁed two categories limit large inter-origin distances where irreversible double fork differing by their ARS activity. We observed that the origin losses stall events are more likely . If natural selection acts to keep that concomitantly lost their ARS activity were associated with an active replication origins regularly spaced along chromosomes, it important degeneration of the ACS while origin losses that means that there may be a cost for both low and high origin retained ARS activity show greater ACS conservation (Supple- density. Alternatively, since there is some rate of loss of origins, mentary Fig. 16). The origin loss, in the former case, could result simply having no beneﬁt for high density would ensure that over time origins in high-density regions are lost. from mutations acting in cis in the ACS or its ﬂanking B1 element that would impair their recognition by ORC thereby abolishing In addition, even if chromosomal rearrangements play no the ARS activity . In the latter case, it could result from mutations direct role in the evolutionary dynamics of origins, they can acting in trans and affecting neighboring gene expression and/or change the distance between origins, thereby creating in some chromatin states and thereby altering ORC binding in the chro- instances larger origin-free regions. Therefore origins would need mosomal context only. The key result of our study is that we to be able to evolve at least as quickly as chromosomes rearrange. uncovered several principles that governed the gain, loss and However, in the absence of any chromosomal rearrangements, conservation of replication origins during genome evolution the molecular determinants that drive the evolutionary loss and (Fig. 5a). Firstly, new chromosomally active replication origins are gain of active replication origins and therefore the remodeling of continuously gained and lost during genome evolution (Fig. 3 and the temporal program of genome replication remain so far Supplementary Figs. 9 and 10). Secondly, the activity of replication unidentiﬁed. origins depends on their evolutionary age. New origins gained on terminal branches of the tree emerge with globally low efﬁciencies Methods and late ﬁring times (Fig. 4b and Supplementary Fig. 12a), con- Yeast strains and growth conditions. All yeast strains and growth conditions are served origins that appeared on internal branches have inter- summarized in Supplementary Table 1. For all Lachancea species, excluding L. mediate activity and ancestral origins are the strongest (Fig. 4e and nothofagi and L. fantastica, cells were grown at 30 °C in YPD broth (BD-Difco). L. nothofagi and L. fantastica were grown at 24 °C since, at 30 °C, the former does not Supplementary Fig. 18b). These ﬁndings suggest that origin grow and the latter aggregates. All time-course experiments were performed at 23 ° activity becomes stronger while they age over long evolutionary C in YPD. periods. Alternatively, natural selection would rapidly purge the genomes of low efﬁciency replication origins and select for the few DNA sample preparation. The experimental and analytical methods used for the new origins that emerged with high efﬁciency. However, this time course experiments are fully described in ref. . Brieﬂy, G1 cells were isolated hypothesis seems less likely because, even though lost origins have from an asynchronous cell culture using centrifugal elutriation, and then grown at on average intermediate activity relative to conserved and gained 23 °C in YPD. Time-point samples were taken regularly until the cells reached the origins, they also cover the entire range of efﬁciency (Fig. 4b). G2 phase. Progression of the cells through the cell cycle was monitored using ﬂow cytometry, as described in ref. . Samples covering the whole S-phase were selected Thirdly, we found that origins lost in the terminal branches of the and DNA was extracted using the genomic-tip 20/G isolation kit (Qiagen). For the tree systematically corresponded to conserved origins, suggesting MFA experiments, cells were grown in YPD at the appropriate temperature. The that old origins are more likely to be lost than new origins. Finally, exponential (Expo) and stationary phase (Stat) samples were collected after 4 and the subset of gained origins corresponding to the most efﬁcient 30 h of growth, respectively. DNA was extracted using the genomic-tip 20/G iso- and earliest ﬁring are physically associated with lost origins, both lation kit (Qiagen). NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications 9 | | | ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 Deep sequencing. For each sample, a minimum of 300 ng of genomic DNA was correlation coefﬁcient (rho) is calculated on replication timing for all orthologous sequenced as 50 bp single reads using Illumina technology (Single-End, 50 bp). A pairs. As a null model, the same correlation was calculated after applying an offset minimum of 10 million and 15 million reads per sample were used for the time- on orthologous coordinates for one of the two species of the comparison. The offset course experiment and the MFA experiment, respectively. To avoid differential was re-deﬁned 100 times and correlations were calculated for each combination of PCR biases between samples in the time-course experiment, all multiplexed an offset proﬁle from one species and an original proﬁle from the other species. libraries were pooled before PCR ampliﬁcation as described in ref. . Libraries were de-multiplexed and adaptator sequences were removed from the reads. Construction of replication origin families. We used synteny conservation 31–33 Sequences were remapped to the reference genomes using BWA (0.59) and between the two protein coding genes that ﬂank each origin position to construct allowing no mismatch and no gap. Mapped reads were subsequently ﬁltered to families of orthologous active origins. Because of the poor synteny conservation in keep only unique match and high-quality mapping scores (MAPQ > 37, i.e. base subtelomeres, we excluded the 96 subtelomeric origins that were interstitially call accuracy >99.98%). located between the telomeres and the ﬁrst synteny blocks comprising at least ﬁve syntenic genes. We constructed origin families for the remaining 2168 internal Mean replication time and MFA proﬁles. Mean replication times were calculated origins, representing 96% of the total number of origins (Supplementary Fig. 7a). from time-course experiment sequences. Reads were counted in 500 bp non- For all pairwise comparisons between two Lachancea species, we ﬁrst projected the overlapping windows, along the genome. Changes in DNA copy number were position of each replication origin of one genome onto the chromosomal coordi- measured by calculating the ratio of the number of sequences between S-phase nates of a second genome based on the synteny conservation of its two ﬂanking samples and the reference sample, corresponding to G1 or G2 phase. For each time coding genes, and reciprocally from the second genome to the ﬁrst one. Each point, the median of the S/G1 or S/G2 ratio was adjusted to correspond to the DNA projection was associated with its nearest resident origin and then, two origins were content measured by ﬂow cytometry during S-phase progression. Subsequently, deﬁned as conserved between two species when the projected and the resident ratios were re-scaled between 1 and 2 and for each window, the time where the origins were located at most two syntenic genes apart, in both directions (delta = 2, scaled ratio equaled 1.5 was deﬁned as the Trep. Finally, mean replication times Supplementary Fig. 7b). The resulting 3730 pairs of conserved orthologous origins were obtained by smoothing the data with a loess regression (see ref. for details). were subsequently clustered into origin families which simply correspond to the For the MFA experiments, reads were counted in all 500 bp non-overlapping assembly of all pairs of orthologous origins into connected components, such as windows, along the genome. Windows where the number of sequences was deﬁned those illustrated in Supplementary Fig. 7c. Active replication origins distributed as an outlier (> or <1.5 times interquartile spaces) were ﬁltered out. The MFA ratio into 374 multi-species origin families comprising 1956 origins (90%) and 212 is calculated by dividing, for each window, the number of sequences from the Expo species-speciﬁc singleton origins (10%, Supplementary Fig. 7a). sample by the number of sequences from the Stat sample as described in ref. . To assess the quality of our origin families, we generated a null model where the The MFA proﬁle is obtained by smoothing the data with a loess regression. All positions of the origins were randomized, according to the following constraints: timing data are available in the Supplementary Data 1. randomized inter-origin distances must preserve the same distribution than the observed inter-origin distances. This property is particularly important, Identiﬁcation of replication origins. The mean replication times and MFA pro- since actual origins are more regularly spaced than randomly distributed ﬁles were plotted as a function of the chromosomal coordinates. The ﬁrst derivative origins (Supplementary Fig. 3d); of these curves was estimated by calculating the slope of each coordinate x of the ● the position of the ﬁrst origin on each chromosome must be located between window i along the genome, using the following formula (y – y )/(x – x ), as i i − 1 i i − 1 the beginning of the chromosome and two-times the position of the ﬁrst actual described in ref. . The second derivative was then estimated from the ﬁrst origin; derivative values using the same method and plotted as a function of the chro- the relative proportions of intra- and inter-genic origins must be conserved by mosomal coordinates. Both peaks and shoulders from the original mean replication the randomization procedure. time and MFA curves appear as clear peaks on the second derivative plot. The In particular, the ﬁrst two constraints guarantee that the average number of chromosomal location of these peaks was determined where the slope curves origins is globally conserved, even if it may slightly ﬂuctuate between samples corresponding to the third derivatives intersect 0 from negative to positive value as drawn randomly from the null model. described in ref. .We deﬁned peak coordinates as the location of active repli- Thereafter, for each pairwise comparison, origin positions were randomized 100 cation origins only when peaks were co-detected in both the mean replication time times in the ﬁrst genome and projected on the second genome, and reciprocally. and the MFA second derivatives proﬁles. Then, for each replication origin location, Randomized origin families were then constructed with the same procedure as for we compared its two chromosomal coordinates issued from the two curves to the real origin families. We used the randomized families to deﬁne the optimal estimate the precision of our experiments in calling origin locations. We found a number of syntenic genes allowed between two conserved origins. The threshold of median and a mean difference between the two peak coordinates of 4.0 and 4.9 kb, Delta = 2 syntenic genes was determined after testing all values between 0 and 6 respectively. All origin locations are available in the Supplementary Data 1. intervening genes and looking for the value that maximized the differences between real origin families and null model. We found that Delta = 2 was the value that Modelization of replication kinetics. We employed a 1D nucleation-growth jointly (i) maximized the number of conserved origins in the real versus the mathematical model of stochastic replication kinetics similar to models available in random dataset, (ii) maximized the number of families comprising ten origins (one 12,13 14 the literature and detailed in ref. , using constant probability of origin acti- per species) in real vs random dataset comparisons and (iii) minimized the number vation (γ = 0). Empirical parameters (origin ﬁring rates and replication fork speed) of families with more than ten members (Supplementary Fig. 7d). Finally, we also were inferred through ﬁtting experimental data on DNA copy number as a found that Delta = 2 limited the proportion of families with more than one origin function of position and time with the model (Supplementary Fig. 19). The initial per species (1.9% for Delta = 2 vs 8% for Delta = 3). ﬁring rates were obtained from the slope of the ﬁrst time points in the replication To check that our methodology indeed captured a true evolutionary signal proﬁles. The ﬁts are performed using the full replication time-course data, by corresponding to the orthology relationship between replication origins, we minimizing the distance between the replication timing proﬁles in the model and in compared the distributions of the number of origins per family between the real the experimental data, by an iterative updating method with adaptive steps . The and the random dataset and found that they were clearly different from what is objective function was deﬁned as the average of squared differences of the expected from a null model (Supplementary Fig. 8a). Moreover, we constructed experimental and theoretical replication proﬁle. We used the positions of repli- two phylogenetic trees based on the composition of the real origin families in the cation origins as deﬁned in the section above and kept them ﬁxed in the ﬁts. The ten Lachancea species. First, we performed hierarchical clustering implemented in model ﬁt allows the direct estimation of the origin ﬁring rates. Characteristic ﬁring MEV (http://www.tm4.org./#/welcome) based on the phylostratigraphic patterns of times correspond to the inverse of rates and therefore are independent from the origin families. Second, we built a distance matrix representing the proportion of conserved origins between any two pairs of species and generated an NJ tree, using interference from nearby origins. Once the model parameters (origin rates, repli- cation fork speed) are ﬁxed, the computational model was run to simulate S phases Phylip version 3.695 . The two resulting tree topologies are very similar to the of single cells. For each origin, we determined in how many simulation runs it was topology of the reference species tree based on the concatenation of 3598 actively or passively replicated. The fraction of simulation runs where the origin orthologous protein sequences , with only few bipartitions being different was actively replicated is our estimation of the origin efﬁciency. This procedure is (Supplementary Fig. 8b). identical to the one deﬁned by Nieduszynski and coworkers . Inference of replication origin history. Based on the composition of the origin Comparison of temporal programs of genome replication. The identiﬁcation of families, the evolutionary history of origin conservation, gain and loss was inferred orthologous genes and the construction of synteny blocks were computed with the using the Gloome online program . As input, we used simpliﬁed phyletic patterns SynChro algorithm for all pairwise combinations between the ten Lachancea limited to the presence (1) or absence (0) of an origin family within Lachancea species, the four Saccharomyces and the C. albicans species. Genome annotations species. We used an evolutionary model where the probability of gain and loss is were downloaded from GRYC (http://gryc.inra.fr), saccharomycessensustricto equal across all sites. We tolerated more than one possible creation event per (http://www.saccharomycessensustricto.org), and CGD (http://www. family. The parsimony cost of the gains was set to 2. Other parameters were set to candidagenome.org). We used our mean replication timing proﬁles for the default values. Lachancea species and the previously published replication data for the Sacchar- To compare the physical and functional properties between conserved, lost, and 22,25 omyces species and C. albicans . For each pairwise combination, a Spearman’s gained replication origins, we only focused on the three most closely related pairs 10 NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications | | | NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 ARTICLE of species (L. fantastica/L. meyersii, L. thermotolerans/L. waltii, and L. fermentati/L. 14. Zhang, Q., Bassetti, F., Gherardi, M. & Cosentino Lagomarsino, M. Cell-to-cell cidri). This resulted in an initial dataset of 1280 cases (886 conserved origins, 207, variability and robustness in S-phase duration from genome replication gains and 187 losses). We then ﬁltered from this dataset the dubious cases of losses kinetics. Nucleic Acids Res. 45, 8190–8198 (2017). and gains that correspond to origins that were detected by only one of the two 15. Friedman, K. L., Brewer, B. J. & Fangman, W. L. Replication proﬁle of methods (MFA or mean replication time). This resulted in a ﬁnal dataset composed Saccharomyces cerevisiae chromosome VI. Genes Cells 2, 667–678 (1997). of 1148 cases (846 conserved origins, 151 gains, and 151 losses). 16. Yamashita, M. et al. The efﬁciency and timing of initiation of replication of multiple replicons of Saccharomyces cerevisiae chromosome VI. Genes Cells 2, ARS assay. Using the Gibson assembly method , we constructed 16 plasmids 655–665 (1997). containing a conserved (4 cases) or lost origin (12 cases). These 16 loci were 17. Ferguson, B. M., Brewer, B. J., Reynolds, A. E. & Fangman, W. L. A yeast selected because they showed strict synteny conservation with previously published origin of replication is activated late in S phase. Cell 65, 507–515 (1991). chromosomally active L. waltii ARS . For each ARS, the exact location of the ACS 18. Raghuraman, M. K. et al. Replication dynamics of the yeast genome. Science in L. waltii was used to identify the corresponding syntenic intergene in L. ther- 294, 115–121 (2001). motolerans. These intergenes were ampliﬁed by PCR using the oligonucleotides 19. Aparicio, O. M. Location, location, location: it’s all in the timing for described in Supplementary Table 2. We used as a backbone the pRS41K plasmid replication origins. Genes Dev. 27, 117–128 (2013). where the native S. cerevisiae centromere was replaced by the centromere of L. 20. Rhind, N. DNA replication timing: random thoughts about origin ﬁring. Nat. thermotolerans chromosome 0C. The S. cerevisiae replication origin originally Cell Biol. 8, 1313–1316 (2006). present in pRS41K was substituted in each plasmid by one of the 16 L. thermo- 21. Yabuki, N., Terashima, H. & Kitada, K. Mapping of early ﬁring origins on a tolerans PCR fragments. Plasmids were transformed into L. thermotolerans cells replication proﬁle of budding yeast. Genes Cells 7, 781–789 (2002). using the LiAc/PEG method . Cells were plated on YPD with G418 (200 µg/mL) 22. Muller, C. A. & Nieduszynski, C. A. Conservation of replication timing reveals and growth for 4 days at 30 °C and then the number of colonies and their size were global and local regulation of replication origin activity. Genome Res. 22, measured. 1953–1962 (2012). 23. Agier, N., Romano, O. M., Touzain, F., Lagomarsino, M. C. & Fischer, G. The ACS detection. We selected 123 chromosomally active origins in L. waltii that spatio-temporal program of replication in the genome of Lachancea kluyveri. corresponded to known ARS that were previously published . We then selected in Genome Biol. Evol. 5, 370–388 (2013). the genomes of all other Lachancea species, the replication origins that were 24. Descorps-Declere, S. et al. Genome-wide replication landscape of Candida conserved in families with these 123 origins. We used the known ACS position in glabrata. BMC Biol. 13, 69 (2015). L. waltii to identify the corresponding gene/intergene strictly conserved in synteny 25. Koren, A. et al. Epigenetically-inherited centromere and neocentromere DNA in the other species. Based on these datasets, we use MEME (zoops, size 9-40, e- replicates earliest in S-phase. PLoS Genet. 6, e1001068 (2010). value threshold 10) to search for motives, both in L. waltii, as a control, and in the 26. Liachko, I. et al. GC-rich DNA elements enable replication origin activity in other Lachancea species. The number of occurrences containing the detected motif the methylotrophic yeast Pichia pastoris. PLoS Genet. 10, e1004169 (2014). −4 40 was then evaluated using MAST (e-value threshold 10 ) . 27. Muller, C. A. & Nieduszynski, C. A. DNA replication timing inﬂuences gene expression level. J. Cell Biol. 216, 1907–1914 (2017). 28. Liachko, I. et al. A comprehensive genome-wide map of autonomously Data availability. The fastq ﬁles are deposited in the Sequence Read Archive with replicating sequences in a naive genome. PLoS Genet. 6, e1000946 (2010). the project number SRP111158 under accessions ranging from SRR5807795 to 29. Liachko, I. et al. Novel features of ARS selection in budding yeast Lachancea SRR5807891. kluyveri. BMC Genom. 12, 633 (2011). 30. Payen, C. et al. Unusual composition of a yeast chromosome arm is associated Received: 18 October 2017 Accepted: 8 May 2018 with its delayed replication. Genome Res. 19, 1710–1721 (2009). 31. Vakirlis, N. et al. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus. Genome Res. 26, 918–932 (2016). 32. Kellis, M., Birren, B. W. & Lander, E. S. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, References 617–624 (2004). 1. Mechali, M. Eukaryotic DNA replication origins: many choices for 33. Souciet, J. L. et al. Comparative genomics of protoploid Saccharomycetaceae. appropriate answers. Nat. Rev. Mol. Cell Biol. 11, 728–738 (2010). Genome Res. 19, 1696–1709 (2009). 2. Nieduszynski, C. A., Knox, Y. & Donaldson, A. D. Genome-wide identiﬁcation 34. Agier, N. & Fischer, G. A versatile procedure to generate genome-wide of replication origins in yeast by comparative genomics. Genes Dev. 20, spatiotemporal program of replication in yeast species. Methods Mol. Biol. 1874–1879 (2006). 1361, 247–264 (2016). 3. Di Rienzi, S. C. et al. Maintaining replication origins in the face of genomic 35. Muller, C. A. et al. The dynamics of genome replication using deep change. Genome Res. 22, 1940–1952 (2012). sequencing. Nucleic Acids Res. 42, e3 (2014). 4. Rodriguez, J., Lee, L., Lynch, B. & Tsukiyama, T. Nucleosome occupancy as a 36. Newman, T. J., Mamun, M. A., Nieduszynski, C. A. & Blow, J. J. Replisome novel chromatin parameter for replication origin functions. Genome Res. 27, stall events have shaped the distribution of replication origins in the genomes 269–277 (2017). of yeasts. Nucleic Acids Res. 41, 9705–9718 (2013). 5. Eaton, M. L., Galani, K., Kang, S., Bell, S. P. & MacAlpine, D. M. Conserved 37. Drillon, G. & Fischer, G. Comparative study on synteny between yeasts and nucleosome positioning deﬁnes replication origins. Genes Dev. 24, 748–753 vertebrates. C. R. Biol. 334, 629–638 (2011). (2010). 38. Fischer, G., Rocha, E. P., Brunet, F., Vergassola, M. & Dujon, B. Highly 6. Hoggard, T., Shor, E., Muller, C. A., Nieduszynski, C. A. & Fox, C. A. A link variable rates of genome rearrangements between Hemiascomycetous yeast between ORC-origin binding mechanisms and origin activation time revealed lineages. PLoS Genet. 2, e32 (2006). in budding yeast. PLoS Genet. 9, e1003798 (2013). 39. Theis, J. F. & Newlon, C. S. The ARS309 chromosomal replicator of 7. Czajkowsky, D. M., Liu, J., Hamlin, J. L. & Shao, Z. DNA combing reveals Saccharomyces cerevisiae depends on an exceptional ARS consensus sequence. intrinsic temporal disorder in the replication of yeast chromosome VI. J. Mol. Proc. Natl. Acad. Sci. USA 94, 10786–10791 (1997). Biol. 375,12–19 (2008). 40. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. 8. Patel, P. K., Arcangioli, B., Baker, S. P., Bensimon, A. & Rhind, N. DNA Nucleic Acids Res. 37, W202–W208 (2009). replication origins ﬁre stochastically in ﬁssion yeast. Mol. Biol. Cell 17, 41. Di Rienzi, S. C., Collingwood, D., Raghuraman, M. K. & Brewer, B. J. Fragile 308–316 (2006). genomic sites are associated with origins of replication. Genome Biol. Evol. 1, 9. Saner, N. et al. Stochastic association of neighboring replicons creates 350–363 (2009). replication factories in budding yeast. J. Cell Biol. 202, 1001–1012 (2013). 42. Gordon, J. L., Byrne, K. P. & Wolfe, K. H. Additions, losses, and 10. Hawkins, M. et al. High-resolution replication proﬁles deﬁne the stochastic rearrangements on the evolutionary route from a reconstructed ancestor to the nature of genome replication initiation and termination. Cell Rep. 5, modern Saccharomyces cerevisiae genome. PLoS Genet. 5, e1000485 (2009). 1132–1141 (2013). 43. Blow, J. J., Ge, X. Q. & Jackson, D. A. How dormant origins promote complete 11. Gilbert, D. M. Evaluating genome-scale approaches to eukaryotic DNA genome replication. Trends Biochem. Sci. 36, 405–414 (2011). replication. Nat. Rev. Genet. 11, 673–684 (2010). 44. Woodward, A. M. et al. Excess Mcm2-7 license dormant origins of replication 12. Yang, S. C., Rhind, N. & Bechhoefer, J. Modeling genome-wide replication that can be used under conditions of replicative stress. J. Cell Biol. 173, kinetics reveals a mechanism for regulation of replication timing. Mol. Syst. 673–683 (2006). Biol. 6, 404 (2010). 45. Ge, X. Q., Jackson, D. A. & Blow, J. J. Dormant origins licensed by excess 13. de Moura, A. P., Retkute, R., Hawkins, M. & Nieduszynski, C. A. Mcm2-7 are required for human cells to survive replicative stress. Genes Dev. Mathematical modelling of whole chromosome replication. Nucleic Acids Res. 21, 3331–3341 (2007). 38, 5623–5633 (2010). NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications 11 | | | ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04628-4 46. Dershowitz, A. et al. Linear derivatives of Saccharomyces cerevisiae Author contributions chromosome III can be maintained in the absence of autonomously G.F., N.A., and S.D. wrote the paper. M.C.-L. and M.W. contributed to the manuscript. replicating sequence elements. Mol. Cell. Biol. 27, 4652–4663 (2007). G.F. supervised the project. G.F. and N.A. designed the experiments. N.A. and S.D. 47. Bogenschutz, N. L., Rodriguez, J. & Tsukiyama, T. Initiation of DNA performed the experiments. N.A. performed the analyses. M.W. and A.F. contributed to replication from non-canonical sites on an origin-depleted chromosome. PLoS the analyses. Q.Z. and M.C.-L. performed the mathematical modeling. Y.J., E.v.D. and ONE 9, e114545 (2014). C.T. performed the sequencing. 48. Karschau, J., Blow, J. J. & de Moura, A. P. Optimal placement of origins for DNA replication. Phys. Rev. Lett. 108, 058101 (2012). Additional information 49. Xu, J. et al. Genome-wide identiﬁcation and characterization of replication Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467- origins by deep sequencing. Genome Biol. 13, R27 (2012). 018-04628-4. 50. Musialek, M. W. & Rybaczek, D. Behavior of replication origins in eukaryota —spatio-temporal dynamics of licensing and ﬁring. Cell Cycle 14, 2251–2264 Competing interests: The authors declare no competing interests. (2015). 51. Drillon, G., Carbone, A. & Fischer, G. SynChro: a fast and easy tool to Reprints and permission information is available online at http://npg.nature.com/ reconstruct and visualize synteny blocks along eukaryotic chromosomes. PLoS reprintsandpermissions/ ONE 9, e92621 (2014). 52. Felsenstein, J. P. H. Y. L. I. P. (Phylogeny Inference Package) version 3.6. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in (Distributed by the author, Department of Genome Sciences, University of published maps and institutional afﬁliations. Washington, Seattle., 2005). 53. Cohen, O., Ashkenazy, H., Belinky, F., Huchon, D. & Pupko, T. GLOOME: gain loss mapping engine. Bioinformatics 26, 2914–2915 (2010). 54. Fu, C., Donovan, W. P., Shikapwashya-Hasser, O., Ye, X. & Cole, R. H. Hot Open Access This article is licensed under a Creative Commons Fusion: an efﬁcient method to clone multiple DNA fragments as well as Attribution 4.0 International License, which permits use, sharing, inverted repeats without ligase. PLoS ONE 9, e115318 (2014). adaptation, distribution and reproduction in any medium or format, as long as you give 55. Gietz, R. D. & Schiestl, R. H. High-efﬁciency yeast transformation using the appropriate credit to the original author(s) and the source, provide a link to the Creative LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2,31–34 (2007). Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory Acknowledgements regulation or exceeds the permitted use, you will need to obtain permission directly from This work was supported by the Agence Nationale de la Recherche (GB-3G, ANR-10- the copyright holder. To view a copy of this license, visit http://creativecommons.org/ BLAN-1606 and Phenovar, ANR-16-CE12-0019). We thank our colleagues Gianni Liti, licenses/by/4.0/. Bertrand Llorente, Carolin Müller, Conrad Nieduszynski, and Samuel O’Donnell for fruitful discussions and constructive suggestions. © The Author(s) 2018 12 NATURE COMMUNICATIONS (2018) 9:2199 DOI: 10.1038/s41467-018-04628-4 www.nature.com/naturecommunications | | |
Nature Communications – Springer Journals
Published: Jun 6, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
All the latest content is available, no embargo periods.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud