Key message
Optimal cross selection increases long-term genetic gain of two-part programs with rapid recurrent genomic selection. It achieves this by optimising efficiency of converting genetic diversity into genetic gain through reducing the loss of genetic diversity and reducing the drop of genomic prediction accuracy with rapid cycling.

Abstract
This study evaluates optimal cross selection to balance selection and maintenance of genetic diversity in two-part plant breeding programs with rapid recurrent genomic selection. The two-part program reorganises a conventional breeding program into a population improvement component with recurrent genomic selection to increase the mean value of germplasm and a product development component with standard methods to develop new lines. Rapid recurrent genomic selection has a large potential, but is challenging due to genotyping costs or genetic drift. Here we simulate a wheat breeding program for 20 years and compare optimal cross selection against truncation selection in the population improvement component with one to six cycles per year. With truncation selection we crossed a small or a large number of parents. With optimal cross selection we jointly optimised selection, maintenance of genetic diversity, and cross allocation with AlphaMate program. The results show that the two-part program with optimal cross selection delivered the largest genetic gain that increased with the increasing number of cycles. With four cycles per year optimal cross selection had 78% (15%) higher long-term genetic gain than truncation selection with a small (large) number of parents. Higher genetic gain was achieved through higher efficiency of converting genetic diversity into genetic gain; optimal cross selection quadrupled (doubled) efficiency of truncation selection with a small (large) number of parents.
Optimal cross selection also reduced the drop of genomic selection accuracy due to the drift between training and prediction populations. In conclusion optimal cross selection enables optimal management and exploitation of population improvement germplasm in two-part programs.

Communicated by Jochen Reif.
Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s00122-018-3125-3) contains supplementary material, which is available to authorized users.
* Gregor Gorjanc, email@example.com
The Roslin Institute and Royal (Dick) School of Veterinary Studies, Easter Bush Research Centre, University of Edinburgh, Midlothian EH25 9RG, UK
Theoretical and Applied Genetics (2018) 131:1953–1966

Introduction

In this study we evaluate optimal cross selection to balance selection and maintenance of genetic diversity in two-part plant breeding programs with rapid recurrent genomic selection. Plant breeding programs that produce inbred lines have two concurrent goals: (1) identifying new varieties or hybrid parents and (2) identifying parents for subsequent breeding cycles. We recently proposed a two-part program that uses genomic selection to separately address these goals (Gaynor et al. 2017; Hickey et al. 2017a). The two-part program reorganises a conventional program into two distinct components: a product development component that develops and screens inbred lines with established breeding methods and a population improvement component that increases the population mean with rapid cycles of recurrent genomic selection. Simulations showed that the two-part program has the potential to deliver about 2.5 times larger genetic gain than a conventional program for the same investment (Gaynor et al. 2017).

The larger genetic gain from the two-part program is primarily driven by rapid recurrent genomic selection in the population improvement component. In a conventional program a cycle of “recurrent” selection may take 4 to 5 years to complete. The two-part program enables rapid recurrent selection with several cycles per year, because population improvement and product development components operate independently of each other. For example, Gaynor et al. (2017) simulated two cycles of population improvement per year, which reduced cycle time eightfold compared to the conventional program. Cycle time can be decreased even further with intensive use of greenhouses and speed breeding (Christopher et al. 2015; Hickey et al. 2017b; Watson et al. 2018). Factoring this potential into the breeder’s equation suggests that the large genetic gain in Gaynor et al. (2017) could be increased even more with more than two cycles per year.

To ensure large genetic gain a population improvement manager must simultaneously consider several factors: most notably number of cycles, size of the population, number of parents, genomic prediction accuracy, maintenance of genetic diversity, and costs. Performing more cycles can increase genetic gain per year, but it also increases costs incurred by genotyping many selection candidates and other operating costs. To control costs the manager is likely to reduce population size with increasing number of cycles. In an unpublished analysis (reproduced in this study), we observed that increasing the number of cycles above the two used in Gaynor et al. (2017) increased genetic gain in the first years, as expected, but eventually led to a lower long-term genetic gain than with two cycles. Inspection of the results indicated that genetic diversity was depleted faster with increased number of cycles.

We hypothesise that to achieve large long-term genetic gain from the two-part program with rapid recurrent genomic selection we need to balance selection and maintenance of genetic diversity. To test this we simulated a two-part program that uses truncation selection or optimal cross selection to manage population improvement germplasm. The optimal cross selection is a combination of optimal contribution selection and cross allocation. The optimal contribution selection optimises contributions of selection candidates to the next generation such that expected benefit and risks are balanced (Woolliams et al. 2015). A common way to achieve this balance is to maximise genetic gain at a predefined rate of population inbreeding (coancestry) through penalising selection of individuals that are too closely related (Wray and Goddard 1994; Meuwissen 1997). This penalisation controls the rate at which genetic diversity is lost due to drift and selection. Well-managed breeding programs balance this loss by maintaining sufficiently large effective population size so that standing genetic diversity and newly generated genetic diversity due to mutation (and possibly migration) sustain long-term genetic gains (Hill 2016). The optimal contribution selection assumes that contributions will be randomly paired, including selfing. An extension that delivers a practical crossing plan is to jointly optimise contributions and cross allocations (Kinghorn et al. 2009; Kinghorn 2011). These methods are established in animal breeding (for a review, see Woolliams et al. (2015)) and are increasingly common in plant breeding (Cowling et al. 2016; Akdemir and Sánchez 2016; De Beukelaer et al. 2017; Lin et al. 2017).

The aim of this study was to evaluate the potential of optimal cross selection to balance selection and maintenance of genetic diversity in a two-part program with rapid recurrent genomic selection. We evaluated the potential with a long-term simulation of conventional and two-part breeding programs. The two-part programs used different numbers of cycles, different selection methods, and different resources for genomic selection. The results show that optimal cross selection delivered the largest long-term genetic gain under all scenarios. This was achieved by optimising the efficiency of converting genetic diversity into genetic gain with the increasing number of recurrent selection cycles. With four cycles per year optimal cross selection had 15–78% higher genetic gain and 2–4 times higher efficiency than truncation selection.

Materials and methods

Breeding programs

We used simulations of entire breeding programs to compare different selection methods under different scenarios. Detailed description of simulated breeding programs and scenarios is available in Supplementary material 1. In summary, we initiated a virtual wheat breeding program for a polygenic trait and ran it for 20 years (burn-in) with a conventional program based on phenotypic selection. After the burn-in we evaluated different programs under equalised costs for another 20 years. The evaluated programs were: (1) conventional program with phenotypic selection (Conv), (2) conventional program with genomic selection at the preliminary trial stage (ConvP), (3) conventional program with genomic selection at the headrow stage (ConvH), and (4) two-part program with recurrent genomic selection (TwoPart). While the conventional program performs population improvement and product development concurrently, the two-part program splits these two activities into two separate, but connected, components (Fig. 1). The population improvement component is based on rapid recurrent genomic selection to increase population mean, while the product development component is based on standard breeding methods (including field trials) to develop inbred lines. A by-product of field trials is a training set of genotyped and phenotyped individuals, which is used to retrain a genomic selection model. Because the two-part program uses rapid cycling, we use doubled-haploid lines to speed up the conventional program and the product development component.

Fig. 1 Scheme of breeding strategies (the conventional strategy is based on the product development component that implicitly also performs population improvement, while the two-part strategy includes an explicit population improvement component with recurrent selection; the dashed line indicates initialisation of the population improvement component; N1 and N2 correspond to the number of lines in Table 1)

A challenge with the two-part program is to balance selection and maintenance of genetic diversity in the population improvement. This is particularly challenging with several cycles of recurrent genomic selection, because the breeder needs to handle increasing genotyping costs. Assume that the population improvement component is based on 64 crosses from 32 to 128 parents that give rise to 640 selection candidates. With a fixed genotyping budget, we can implement one cycle of this scheme or several cycles with proportionately reduced numbers, as shown in Table 1. Rapid cycling is appealing in terms of genetic gain, but challenging in terms of maintaining genetic diversity. We have evaluated how these two aspects are balanced with: (1) truncation selection of a small number of parents (TwoPartTS), (2) truncation selection of a large number of parents (TwoPartTS+), or (3) optimal cross selection (TwoPartOCS). In the scenario with a small/large number of parents we selected a minimal/maximal possible number of parents for a given number of cycles per year (Min/Max in Table 1). These two-part programs were compared with one to six recurrent selection cycles per year and under constrained or unconstrained costs. With unconstrained costs, the number of crosses was 64 with 640 selection candidates per cycle irrespective of the number of cycles. The scenarios with unconstrained costs are likely unrealistic, but we have included them to demonstrate the potential genetic gain with higher investment and to demonstrate the potential of optimal cross and truncation selection under the different settings.

Table 1 Per cycle characteristics of the population improvement component by number of recurrent selection cycles per year (number of crosses per cycle, number of selection candidates per cycle, and minimum or maximum number of parents used per cycle)

# Cycles   # Crosses   # Candidates   # Parents Min   # Parents Max
1          64          640            32              128
2          32          320            16              64
3          22          214            12              44
4          16          160            8               32
5          13          128            8               26
6          11          107            6               22

We repeated the entire simulation ten times and report average and confidence intervals. For simulation of breeding programs and genomic selection we used the AlphaSimR R package (Gaynor et al.) available at www.alphagenes.roslin.ed.ac.uk/AlphaSimR. For optimal cross selection, we used the AlphaMate Fortran program (Gorjanc and Hickey 2018) available at www.alphagenes.roslin.ed.ac.uk/AlphaMate.

Genomic prediction

The training data set for genomic prediction was initiated with genotype and phenotype data collected in the last 3 years of the burn-in (3120 lines). The data set was further enlarged every year with new trial phenotype and genotype data (1000 lines). We used the standard ridge regression model with heterogeneous error variance to account for different levels of replication in trials collected at different stages of a breeding program (Endelman 2011).

Optimal cross selection

Optimal cross selection delivers a crossing plan that maximises genetic gain in the next generation under constraints. Constraints could be: loss of genetic diversity (commonly measured with the rate of coancestry), number of parents, and minimum/maximum number of crosses per parent. For example, in our simulation a parent could contribute from 1 to 4 crosses and crosses had to be made between individuals in male and female pools. We implemented optimal cross selection in the program AlphaMate, which uses an evolutionary optimisation algorithm (Storn and Price 1997). Inputs for the program are: (1) a list of selection candidates with breeding values ($\mathbf{a}$) and gender pool information, (2) coancestry matrix ($\mathbf{C}$), and (3) a specification file with constraints. For breeding values we used genomic predictions. To construct the coancestry matrix we estimated coancestry for each pair of individuals as the proportion of marker alleles that are identical by state: $\mathbf{C} = \frac{1}{2}\left(1 + \frac{1}{n_m}\mathbf{X}\mathbf{X}^T\right)$, where $\mathbf{X} = \mathbf{M} - 1$ and $\mathbf{M}$ is an $n_i \times n_m$ matrix of $n_m$ marker genotypes (coded as 0, 1, or 2) of $n_i$ individuals. Given the inputs and a crossing plan proposed by the evolutionary algorithm, the program calculates expected genetic gain as $\bar{a} = \mathbf{x}^T\mathbf{a}$ and group coancestry (expected inbreeding of the next generation) as $\bar{c} = \mathbf{x}^T\mathbf{C}\mathbf{x}$, where $\mathbf{x} = \frac{1}{2n_c}\mathbf{n}$, $\mathbf{n}$ is a vector of integer contributions (0, 1, 2, 3, or 4), and $n_c$ is the number of crosses. The contributions ($\mathbf{x}$) and their pairing (crossing plan) are unknown parameters and optimised with the evolutionary algorithm. Following Kinghorn (2011) we operationalise balance between genetic gain and coancestry via “penalty degrees” between the maximal genetic gain solution and the targeted solution under constraints. Specifically, the maximal genetic gain solution is obtained by setting penalty to 0°, while the minimal loss of genetic diversity is obtained by setting penalty to 90°. For each scenario we ran optimal cross selection with a range of penalty degrees (1°, 5°, 10°, …, 85°).

Comparison

Programs were compared in terms of genetic gain, genomic prediction accuracy, genetic diversity, and efficiency of converting genetic diversity into genetic gain. To enable comparison between conventional and two-part programs we report the metrics on doubled-haploid lines, prior to headrow selection (Fig. 1). In the two-part program there are two sets of doubled-haploid lines (Fig. 1), which we summarised jointly. We also report the metrics on selection candidates of the population improvement component in Supplementary material 2.

We measured genetic gain as average true genetic values that were standardised to mean zero and unit standard deviation in year 20. We measured accuracy of genomic prediction by correlation between predicted and true genetic values.

We measured genetic diversity with genetic standard deviation, genic standard deviation, number of times population ran out of genetic diversity as measured by marker genotypes, and effective population size. We calculated genetic standard deviation as standard deviation of standardised true genetic values. We calculated genic standard deviation as $\sigma_\alpha = \sqrt{2\sum_{i=1}^{n_q} p_i(1 - p_i)\alpha_i^2}$ ($n_q$ is the number of causal loci and $p_i$ and $\alpha_i$ are, respectively, allele frequency and allele substitution effect at the $i$-th causal locus) and expressed it relative to the observed value in year 20. Genic standard deviation enables comparison of different stages across different programs. For example, doubled-haploid (inbred) lines in the product development component have larger genetic variance than outbred plants in the population improvement component, while their genic variances are comparable because they depend only on population allele frequencies. We calculated effective population size from the rate of coancestry, $N_e = 1/(2\Delta C)$. Following the formula for change of genetic variance over time as a function of the rate of coancestry, $\sigma^2_{\alpha_{t+1}} = \sigma^2_{\alpha_t}(1 - \Delta C)$ (Wright 1949), we estimated $\Delta C$ with log-link gamma regression of genic variance on year using function glm() in R (R Development Core Team 2017). Log-link gamma regression assumes that expected value at time $t + 1$ is $E(\sigma^2_\alpha \mid t + 1) = E(\sigma^2_\alpha \mid t)\exp(\beta)$ (McCullagh and Nelder 1989), which gives $\Delta C = 1 - \exp(\beta)$. Since we used genic variance for the estimation of effective population size, the estimate refers to causal loci and not whole genome or neutral loci.

We measured efficiency of converting genetic diversity into genetic gain by regressing achieved genetic gain ($y_t = (\mu_{a_t} - \mu_{a_{20}})/\sigma_{a_{20}}$) on lost genetic diversity ($x_t = 1 - \sigma_{\alpha_t}/\sigma_{\alpha_{20}}$), i.e. $y_t = a + b x_t + e_t$, where $b$ is efficiency. For example, with the starting point of $(y_{20}, x_{20}) = (0, 0)$ and a final point of $(y_{40}, x_{40}) = (10, 0.4)$, a breeding program converted 0.4 standard deviation of genetic diversity into genetic gain of 10 standard deviations, an efficiency factor of 25 = 10/0.4. In some scenarios, particularly with truncation selection in the two-part program, we noticed large changes in the “gain-diversity plane” in the first and last generations. For this reason we estimated efficiency with robust regression using function rlm() in R (Venables and Ripley 2002). In addition to using robust regression we removed repeated values of genetic gain and genetic diversity when a breeding program reached selection limit.

Results

Overall the results show that the two-part program with optimal cross selection delivered the largest long-term genetic gain and that this gain increased with the increasing number of recurrent selection cycles per year. This was achieved by optimising efficiency of converting genetic diversity into genetic gain, which the two-part program with truncation selection cannot achieve. The extra efficiency from the optimisation was due to the reduced loss of genetic diversity and the reduced drop of genomic prediction accuracy with the increasing number of recurrent selection cycles. With four cycles per year optimal cross selection had 15–78% higher genetic gain and 2–4 times higher efficiency than truncation selection.

In the following we structure the results in four parts. First, we present the effect of the number of cycles of recurrent selection on long-term genetic gain and efficiency of the two-part programs. Second, we present the 20-year trajectory of breeding programs through the plane of genetic mean and genic standard deviation. Third, we present the change of genomic prediction accuracy over time. Fourth, we present the relationship between realised effective population size and long-term genetic gain and efficiency. The two-part program results in the second, third, and fourth sections of the results are presented only for four cycles of recurrent selection per year. Unless specified explicitly, the results for the two-part program with optimal cross selection are given for penalty degrees that gave the highest long-term genetic gain.

Effect of the number of cycles on long-term genetic gain

Optimal cross selection delivered the highest long-term genetic gains. The gain increased with the increased number of cycles of recurrent selection irrespective of cost constraints. This is shown in Fig. 2, which plots genetic mean after 20 years of selection against the number of cycles of recurrent selection per year in the two-part program. For comparison genetic gain of conventional programs is also shown. The conventional program with phenotypic selection had the smallest genetic gain (5.7), followed by the two conventional programs with genomic selection (8.2 and 10.5). The two-part programs had generally larger genetic gains than conventional programs, but they varied considerably, and there were interactions between selection method, number of cycles of recurrent selection per year, and cost constraints.

Fig. 2 Genetic mean of doubled-haploid lines after 20 years of selection against the number of recurrent selection cycles per year in the two-part program by selection method and cost constraints (mean and 95% confidence interval). Conventional programs did not use recurrent selection, but are shown for comparison. Labels denote average penalty degree of optimal cross selection that delivered the highest long-term gain

Under constrained costs optimal cross selection delivered the highest long-term genetic gain, which increased with the increasing number of cycles: 11.5 with one cycle, 14.5 with two cycles, 15.5 with four cycles, and 16.1 with six cycles. To achieve increased genetic gain with the increasing number of cycles, penalty degrees had to increase as well: on average, 14° with one cycle, 24° with two cycles, 40° with four cycles, and 49° with six cycles. Genetic gain with truncation selection of a large number of parents initially increased with increasing number of cycles (up to 14.1 with three cycles per year), but then decreased. With six cycles per year it reached a level comparable to what it achieved with just one cycle per year, which was also a comparable level of genetic gain to that achieved by the conventional program with genomic selection in headrows. Genetic gain with truncation selection of a small number of parents increased from one to two cycles per year (from 11.5 to 12.8) and decreased thereafter. With six cycles per year this method had almost as low genetic gain as the conventional program with phenotypic selection.

Under unconstrained costs truncation selection of a large number of parents and optimal cross selection delivered the largest long-term genetic gains and this increased with increasing number of cycles: 11.5 with one cycle, 15.0 with two cycles, 18.2 with four cycles, and 19.6 with six cycles. To achieve these genetic gains penalty degrees had to increase, but less than under constrained costs. Truncation selection of a small number of parents again increased genetic gain only when number of cycles was increased from one to two and gradually decreased with additional cycles, but at a slower rate than under constrained costs.

Effect of the number of cycles on efficiency

Optimal cross selection had the highest efficiency of converting genetic diversity into genetic gain amongst the two-part programs. This is shown in Fig. 3, which plots efficiency against the number of recurrent selection cycles per year in the two-part program. For comparison efficiency of conventional programs is also shown. These had an efficiency of 66.1 for the conventional program with phenotypic selection, 46.8 for the conventional program with genomic selection in preliminary trials, and 31.5 for the conventional program with genomic selection in headrows. Efficiency of the two-part programs interacted with the selection method, number of recurrent selection cycles per year, and cost constraints.

Fig. 3 Efficiency against the number of recurrent selection cycles per year in the two-part program by selection method and cost constraints (mean and 95% confidence interval). Conventional programs did not use recurrent selection, but are shown for comparison. Labels denote average penalty degree of optimal cross selection that delivered the highest long-term gain

Under constrained costs optimal cross selection had the highest efficiency of two-part programs: 48.2 with one cycle and around 40.0 with more than one cycle. Truncation selection of a large number of parents had an efficiency of 39.0 with one cycle, which decreased down to 9.9 with six cycles. Truncation selection of a small number of parents had an efficiency of 26.6 with one cycle, which decreased to 10.0 already with three cycles.

Under unconstrained costs optimal cross selection had the highest efficiency of the two-part programs. It also maintained a comparable level of efficiency to the conventional program with genomic selection in preliminary trials irrespective of the number of cycles. Efficiency of the truncation selection of a large and small number of parents decreased with the increasing number of cycles, but less than with constrained costs.

Gain-diversity trajectory

The two-part program with optimal cross selection delivered the largest genetic gain of all breeding programs and conserved the most genetic diversity of the two-part programs. This is shown in Fig. 4, which plots the 20-year trajectory of evaluated breeding programs through the plane of genetic mean and genic standard deviation. The two-part programs were run with four cycles of recurrent selection. Separate trends of genetic mean, genic standard deviation, and genetic standard deviation against year are available in Supplementary material 3 (Fig S2.1, Fig S2.2, and Fig S2.3). The slope of change in genetic mean on change in genic standard deviation quantifies the efficiency of converting genetic diversity into genetic gain.

Fig. 4 Change of genetic mean and genic standard deviation of doubled-haploid lines over years of selection by breeding program and cost constraints. Individual replicates are shown by thin lines and a mean regression with a time-trend arrow. The two-part programs used four recurrent selection cycles per year

The two-part program with optimal cross selection had the best balance between the genetic gain achieved and genetic diversity lost irrespective of cost constraints. With four cycles of recurrent selection per year it achieved a genetic gain of 15.5 for a loss of 0.38 units of genic standard deviation (an efficiency factor of 41) under constrained costs and a genetic gain of 18.2 for a loss of 0.37 units of genic standard deviation (an efficiency factor of 49) under unconstrained costs. This efficiency was comparable to efficiency of the conventional program with genomic selection in preliminary trials, but with about two times larger genetic gain. The conventional program with phenotypic selection had larger efficiency (66), but about 2.5 times lower genetic gain. The two-part programs with truncation selection had a worse balance between genetic gain achieved and genetic diversity lost, in particular when a small number of parents were used.

Accuracy of genomic prediction

Optimal cross selection maintained accuracy of genomic prediction better than truncation selection. This is shown in Fig. 5, which plots accuracy of genomic prediction in doubled-haploid lines (top) and population improvement component (bottom) over 20 years. The two-part programs were run with four cycles of recurrent selection. The conventional programs with genomic selection had slowly increasing accuracy over the years due to increasing genomic selection training set. The two-part programs had nominally higher accuracy than conventional programs due to breeding program structure, i.e. doubled-haploid lines originated from the population improvement component and the product development component. This structure caused a rapid initial increase in accuracies as the two-part programs started. However, soon after the initial increase, accuracies started to decrease under constrained costs, in particular for the truncation selection of a small number of parents, while optimal cross selection and truncation selection of a large number of parents maintained accuracy. Under unconstrained costs, accuracies decreased only with truncation selection of a small number of parents, while optimal cross selection maintained nominally higher accuracy than truncation selection of a large number of parents.

Fig. 5 Accuracy of genomic prediction in doubled-haploid lines (top) and population improvement component (bottom) over 20 years of selection by breeding program and cost constraints (mean and 95% confidence interval). The two-part programs used four recurrent selection cycles per year

Accuracies were lower in the population improvement component due to absence of breeding program structure. They were also more dynamic due to several cycles of recurrent selection per year and only one retraining of genomic selection model per year with newly added training data from the product development component. Optimal cross selection maintained higher accuracy than truncation selection with much less variability than truncation selection, in particular under constrained costs.

Relationship with effective population size

The realised effective population size of different breeding programs was nonlinearly related to genetic gain achieved in 20 years and linearly related to efficiency. This is shown in Fig. 6, which plots both genetic mean after 20 years of selection and efficiency against realised effective population size. The two-part programs were run with four cycles of recurrent selection. Genetic mean increased sharply with increasing effective population size up to around 10 and decreased thereafter. Efficiency increased linearly with effective population size over all breeding programs as well as within programs. The conventional programs had on average an effective population size of 60.5 with phenotypic selection, 27.8 with genomic selection in preliminary trials, and 14.2 with genomic selection in headrows. The two-part programs with truncation selection had small effective population sizes: 2.6 with a small number of parents under constrained costs and 3.5 under unconstrained costs and 3.6 with a large number of parents under constrained costs and 7.2 under unconstrained costs. The two-part program with optimal cross selection had a large range of effective population sizes as controlled by penalty degrees. Largest genetic gain with optimal cross selection under constrained (unconstrained) costs was achieved with 40° (25°), which resulted in effective population size of 10.8 (11.3).

Fig. 6 Genetic mean after 20 years of selection and efficiency against realised effective population size by breeding program and cost constraints. The two-part programs used four recurrent selection cycles per year. Results for the optimal cross selection are shown for all evaluated penalty degrees (1°, 5°, 10°, …, 85°)

Discussion

The results show that the two-part program with optimal cross selection delivered the largest long-term genetic gain by optimising efficiency of converting genetic diversity into genetic gain. This highlights five topics for discussion, specifically: (1) balance between selection and maintenance of genetic diversity, (2) maintenance of genomic prediction accuracy, (3) effective population size and long-term genetic gain, (4) practical implementation in self-pollinating crops, and (5) open questions.

Balance between selection and maintenance of genetic diversity

This study is an extension of our previous study (Gaynor et al. 2017), where we proposed a two-part breeding program for implementation of recurrent genomic selection. The key component in the two-part program is population improvement, which uses one or more cycles of recurrent genomic selection per year to rapidly increase the population mean. This improved germplasm is in turn used as parents of crosses in the product development component from which new lines are developed. Specifically, we created DH lines from the improved germplasm and potentially used them as parents after the headrow stage. Our previous study (Gaynor et al. 2017) assumed two cycles of population improvement per year, which delivered about 2.5 times more genetic gain than the conventional program with phenotypic selection. The main driver of this genetic gain is shortening of the breeding cycle with genomic selection, and there is scope for even shorter breeding cycle time by more aggressive use of greenhouses and speed breeding in the population improvement part (Christopher et al. 2015; Hickey et al. 2017b; Watson et al. 2018).

In the present study we show that a more aggressive implementation of the two-part program, achieved through even shorter breeding cycle times, must manage the exploitation of genetic diversity. Preliminary analyses following the Gaynor et al. (2017) study indicated that increasing the number of cycles above two delivered larger genetic gain in the short term, but not in the long term. This is due to the requirement to decrease the per-generation population size to maintain equal operating cost, which results in faster depletion of genetic diversity. A simple method to avoid fast depletion of genetic diversity is to use a sufficiently large number of parents with equalised contributions (Wright 1949). The present study assessed this simple method by comparing truncation selection of a small and a large number of parents. Increasing the number of parents delivered competitive genetic gain, but only up to three recurrent selection cycles per year.

The two-part program with optimal cross selection can deliver higher long-term genetic gain than with truncation selection by optimising the efficiency of turning genetic diversity into genetic gain. While truncation selection of a large number of parents was successful in delivering higher long-term genetic gain than truncation selection of a small number of parents, it still rapidly reduced genetic diversity, which limited long-term genetic gain. This was particularly evident under constrained costs, but would also have eventually happened under unconstrained costs.

[…] crosses to maintain genetic diversity. However, the systematic, yet practical, approach of optimal cross selection formalises breeding actions and indicates decisions that a breeder might not consider.

Use of a tool like optimal cross selection is important in the two-part program, because managing outbred germplasm in the population improvement component is different to managing germplasm of inbred lines. In particular, differences between the outbred genotypes are less pronounced and there is very limited amount of phenotypic data, if any, that breeders would use for selection and crossing amongst them. An example that shows the flexibility of the optimal cross selection is the observed trend of cyclical deviations in genetic mean and genic standard deviation in the population improvement component (Figs. S2.1 and S2.2). Those deviations were due to using some parents from the product development component in an optimised crossing plan for the population improvement component. Although these parents had lower genetic
Optimal cross merit than the best population improvement candidates, selection was able to overcome rapid loss of genetic diversity they had sufficiently high merit and low coancestry with through penalising the selection of parents that were too them. Optimal cross selection automatically exploited this related, which in turn enabled larger long-term genetic gain. situation to balance selection and maintenance of genetic These two results combined show that optimal cross selec- diversity. The pattern of deviations is cyclical because we tion optimises the efficiency of converting genetic diversity designed the simulation such that product development into genetic gain than truncation selection. lines were considered for use in the population improve- It was interesting to observe that the two-part program ment component only once a year. There is, however, no with optimal cross selection in population improvement reason for this limitation; i.e. optimal cross selection can had comparable efficiency to the conventional program with design crossing plans that utilise any set of individuals at genomic selection in preliminary trials, yet it had about dou- any time. ble the genetic gain. A further interesting observation was Finding a balance between selection and maintenance of that the conventional program with phenotypic selection genetic diversity is challenging, but the presented method had the highest efficiency of turning genetic diversity into provides an intuitive and practical approach. Since breed- genetic gain. Both of these observations are in line with the ing programs compete for market share, they have to select selection theory. Namely, long-term genetic gain is a func- intensively, sometimes also at the expense of genetic diver- tion of how well the within-family component of a breeding sity. While breeders can boost genetic diversity by inte- value, i.e. 
the Mendelian sampling term, is estimated (see grating other germplasm, this can be challenging for vari- Woolliams et al. 2015 and references therein). The conven- ous reasons including cost. Therefore, methods to optimise tional program with phenotypic evaluation or genomic selec- efficiency of converting genetic diversity into genetic gain tion in preliminary trials provides high accuracy of the Men- are desired. The approach with penalty degrees used in this delian sampling term. However, the high efficiency of these study, due to Kinghorn (2011), is intuitive and practical. two conventional programs was not due to a large genetic Namely, setting penalty degrees to 45° weighs selection gain, but instead due to a small loss of genetic diversity for and maintenance of genetic diversity equally, while set- the genetic gain that was achieved. The two-part program ting penalty degrees to 0° ignores maintenance of genetic achieved higher genetic gain, because it had much shorter diversity, which is equivalent to truncation selection. breeding cycle than the conventional programs despite lower Clearly, breeding programs are interested in small pen- accuracy of the Mendelian sampling term. alty degrees. However, as the results show, this depends Optimal cross selection provides further advantages on the factors such as population size. Under constrained than just balance between selection and maintenance of costs the optimal degrees that maximised genetic gain over genetic diversity. Comparison of optimal cross selection 20 years of selection were about 15° with one cycle of 640 against truncation selection is in a sense extreme, because selection candidates, about 25° with two cycles of 320 breeders do not perform truncation selection blindly. In selection candidates per cycle, up to 45° with six cycles practice breeders balance selection of parents from several of 107 selection candidates per cycle. 
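The role of penalty degrees can be illustrated with a minimal sketch. It assumes, purely for illustration, that the angle is mapped to a pair of weights on the unit circle (cosine for selection, sine for diversity) and that each candidate crossing plan has already been summarised by a standardised gain and a standardised diversity value, both on a 0 to 1 scale. The function names, the weighting rule, and the three example plans are hypothetical and are not taken from AlphaMate or from Kinghorn (2011); the sketch only reproduces the two properties stated above, namely that 0° ignores diversity and 45° weighs both objectives equally.

```python
import math


def penalty_weights(degrees):
    """Map penalty degrees in [0, 90] to (selection, diversity) weights.

    Assumed illustrative mapping: a point on the unit circle, so that
    0 degrees gives (1, 0) and 45 degrees gives two equal weights.
    """
    if not 0.0 <= degrees <= 90.0:
        raise ValueError("penalty degrees must lie in [0, 90]")
    theta = math.radians(degrees)
    return math.cos(theta), math.sin(theta)


def pick_plan(plans, degrees):
    """Return the name of the plan with the highest weighted score.

    plans: list of (name, gain_std, diversity_std) tuples, where both
    values are assumed to be standardised to the range [0, 1].
    """
    w_sel, w_div = penalty_weights(degrees)
    best = max(plans, key=lambda p: w_sel * p[1] + w_div * p[2])
    return best[0]


# Hypothetical crossing plans: (name, standardised gain, standardised diversity)
plans = [
    ("truncation-like", 1.0, 0.0),
    ("balanced", 0.8, 0.7),
    ("diversity-first", 0.2, 1.0),
]

for deg in (0, 45, 90):
    print(deg, pick_plan(plans, deg))
```

With these made-up plans, 0° selects the plan with maximal gain regardless of diversity, which mirrors truncation selection, while larger angles shift the choice towards plans that retain more diversity. A real optimiser searches over contributions and cross allocations rather than picking from a short list.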
Maintenance of genomic prediction accuracy

The efficacy of the two-part program depends crucially on the level of genomic prediction accuracy in the population improvement part. In this study the initial training set for genomic selection consisted of 3120 genotypes with associated yield trial data collected in the product development component. This set was expanded every year by adding 1000 new genotypes with trial data, which in general ensured a high level of genomic prediction accuracy for both the conventional and two-part programs. Note that the accuracy did not increase indefinitely with the increasing training set, because of the changing population over time. Further, this training set was not sufficient to maintain accuracy over the 20 years when truncation selection with a small number of parents was used, in particular under constrained costs. The failure to maintain accuracy in that case can be attributed to the too rapidly increasing genetic distance (drift) between training and prediction sets, which is a well-known property of genomic selection (Pszczola et al. 2012; Clark et al. 2012; Hickey et al. 2014; Scutari et al. 2016; Michel et al. 2016). Proper management of genetic diversity constrained drift between product development and population improvement components. Constraining drift in turn reduced the drop of genomic prediction accuracy in cycles of population improvement for which the genomic selection model had not been retrained. This was partially achieved with truncation selection of a larger number of parents, but optimal cross selection reduced the drop of accuracy even further. Similarly, Eynard et al. (2017) found that optimal contribution selection provided a good balance between maintaining genetic gain, genetic diversity, and accuracy in a breeding program with recurrent genomic selection.

Effective population size and long-term genetic gain

In this study we compared different breeding programs over a 20-year period and referred to these results as long term. While 20 years is a long-term period from the practical perspective of a breeder, it is not long term from a population/quantitative genetics perspective. This is evident from the observed strong nonlinear relationship between effective population size and genetic gain after 20 years. Namely, the theory predicts a positive linear relationship between effective population size and long-term response to selection for a polygenic trait (Robertson 1960), even in the presence of epistasis (Paixão and Barton 2016). Therefore, the observed highest genetic gain with effective population size of about 10 suggests that the evaluated period is rather short to medium term. The efficiency had on the other hand a positive linear relationship with effective population size, suggesting that this metric gives a better indication of the true long-term genetic gain. In fact, efficiency measures how much genetic gain (in units of initial genetic standard deviation) can be achieved by exhausting all genetic diversity. The two-part program with optimal cross selection can be set up such that it delivers either the highest genetic gain after 20 years of selection or the highest efficiency (true long-term genetic gain), though the balance between selection and maintenance of genetic diversity has to be different for the two objectives. Given that breeding programs compete for market share, the hope is that tools like optimal cross selection help breeders to balance intensive selection and maintenance of genetic diversity, while mutation generates new genetic diversity to sustain long-term breeding.

Practical implementation in self-pollinating crops

This study assumed a breeding program that can perform several breeding cycles per year. Following our previous work (Gaynor et al. 2017), we simulated a breeding program of a self-pollinating crop such as wheat. While speed breeding protocols are continually improved (e.g. Christopher et al. 2015; Hickey et al. 2017b; Watson et al. 2018), the explored number of cycles per year (from one to six) should be put into the context of a particular crop. For example, speed breeding has achieved six cycles per year in spring wheat, but the number of cycles in winter wheat would be less due to the requirement for vernalisation. Logistical barriers relating to genotyping may further limit the number of achievable cycles per year.

An additional assumption was that the population improvement component can be easily implemented. Our previous study assumed the use of a hybridising agent to induce male sterility and open pollination with pollen from untreated plants (Gaynor et al. 2017). Optimal contribution selection without cross allocation (Meuwissen 1997) might be applied in such a system by using pollen from different individuals that is proportional to their optimised contributions. Here we opted for a manual crossing system based on either truncation selection or optimal cross selection of parents to develop a method that can be used with both approaches. Whichever approach we use, recurrent genomic selection is constrained by the amount of seed per plant, because this imposes a limit on selection intensity. A way to bypass this limit is to increase the amount of seed with selfing. In the context of genomic selection this has been termed the Cross-Self-Select method in comparison with the Cross-Select method used on F1 seed (Bernardo 2010). We have compared these two methods (see Supplementary material 3) and observed that exposing more genetic diversity with the Cross-Self-Select method enabled higher long-term genetic gain at comparable costs and time than with the Cross-Select method, while the genetic diversity trends were comparable. The difference in long-term genetic gain between the two methods was about 10% for optimal cross selection and truncation selection of a large number of parents and about 25% for truncation selection of a small number of parents. This is expected, because genetic diversity was limiting with the latter program and exposing more genetic diversity through selfing had a bigger effect. It is up to a breeder to choose between exploiting a larger number of cycles with the Cross-Select method or a larger variance with the Cross-Self-Select method. Costs can be challenging when genotyping a large number of candidates with the Cross-Self-Select method, though this can be mitigated by imputation and/or genotyping-by-sequencing (Hickey et al. 2015; Jacobson et al. 2015; Gorjanc et al. 2017a, b).

Open questions

While the presented two-part program with optimal cross selection delivered larger long-term genetic gain and a more efficient breeding program, there is room for further improvement. We initially expected a larger difference in long-term genetic gain between optimal cross selection and truncation selection. There are at least two reasons for the small difference between the two selection methods. First, the simulation encompassed a whole breeding program with a sizeable initial genetic variance that did not limit selection for the first few years, which means that maintenance of genetic diversity was not important initially. Had we extended the simulation period, the difference would have been larger, but even further removed from today. That said, it is unknown where on the trajectory of exhausting genetic variance many breeding programs actually are. Perhaps they are as we simulated or perhaps they are less or further along the trajectory. Secondly, it is unclear how to optimally maintain genetic diversity, specifically which genetic diversity should be preserved and which discarded. In this study we operationally measured genetic diversity in the optimal cross selection with the identity-by-state-based coancestry, which measures genome-wide diversity, but is agnostic to traits under selection. Perhaps coancestry should include information about which alleles are more desired so that the focus is on avoiding the loss of these alleles and not any alleles. This is a subject of our future research.

An open question is the importance of non-additive genetic variance. We have simulated an additive model, albeit with a large number of causal loci. Inclusion of non-additive effects would make the simulation model more realistic, but it is unclear how to model these effects in a realistic and practical way. Non-additive effects would likely reduce prediction accuracy. However, even with the additive model and a sizeable training set we have observed a strong reduction in prediction accuracy for the population improvement component when predictions were based on the non-updated genomic selection model. This reduced accuracy confirms that even under the additive model genetic relationships between training and prediction individuals play an important role in addition to linkage between causal and marker loci (Habier et al. 2007; Jannink et al. 2010; Hickey et al. 2014). Non-additive effects are likely to make this dependency even stronger. A positive observation in relation to this is that optimal cross selection constrained loss of genetic diversity in the rapidly cycling population improvement component based on genomic predictions (Figs. S2.2 and S2.3). This implies that optimal cross selection constrained the drift between training set (product development component) and prediction set (population improvement component) under the simulated additive model, which would likely be even more beneficial in the case of significant non-additive effects.

A related open question is the level of genomic prediction accuracies observed in this study compared to empirical studies (e.g. Michel et al. 2016). In addition to non-additive genetic effects, a further reason for the difference is genotype-by-environment and genotype-by-year effects, which we did not include in this study. In our previous study (Gaynor et al. 2017) we included these additional effects and found that while the overall level of accuracies dropped, the comparison between different breeding schemes gave similar conclusions.

Conclusions

We evaluated the use of optimal cross selection to balance selection and maintenance of genetic diversity in a two-part plant breeding program with rapid recurrent genomic selection. Optimal cross selection delivered higher long-term genetic gain than truncation selection. It achieved this by optimising efficiency of converting genetic diversity into genetic gain through reducing the loss of genetic diversity and reducing the drop of genomic prediction accuracy with rapid cycling. With four cycles per year optimal cross selection had 15–78% higher genetic gain and 2–4 times higher efficiency than truncation selection. Our results suggest that breeders should consider the use of optimal cross selection to assist in optimally managing the maintenance and exploitation of their germplasm.

Author contribution statement GG and JH conceived the study. RCG developed the initial plant breeding program simulation and contributed to study design. GG extended the simulation, implemented optimal cross selection, performed the analyses, and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements The authors acknowledge the financial support from the BBSRC ISPG to The Roslin Institute BBS/E/D/30002275, from Grant Nos. BB/N015339/1, BB/L020467/1, BB/M009254/1. This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) (http://www.ecdf.ed.ac.uk).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Akdemir D, Sánchez JI (2016) Efficient breeding by genomic mating. Front Genet. https://doi.org/10.3389/fgene.2016.00210

Bernardo R (2010) Genomewide selection with minimal crossing in self-pollinated crops. Crop Sci 50:624–627. https://doi.org/10.2135/cropsci2009.05.0250

Christopher J, Richard C, Chenu K et al (2015) Integrating rapid phenotyping and speed breeding to improve stay-green and root adaptation of wheat in changing, water-limited, Australian environments. Procedia Environ Sci 29:175–176. https://doi.org/10.1016/j.proenv.2015.07.246

Clark SA, Hickey JM, Daetwyler HD, van der Werf JH (2012) The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet Sel Evol 44:4. https://doi.org/10.1186/1297-9686-44-4

Cowling WA, Li L, Siddique KHM et al (2016) Evolving gene banks: improving diverse populations of crop and exotic germplasm with optimal contribution selection. J Exp Bot. https://doi.org/10.1093/jxb/erw406

De Beukelaer H, Badke Y, Fack V, De Meyer G (2017) Moving beyond managing realized genomic relationship in long-term genomic selection. Genetics. https://doi.org/10.1534/genetics.116.194449

Endelman JB (2011) Ridge regression and other kernels for genomic selection with R Package rrBLUP. Plant Genome 4:250–255. https://doi.org/10.3835/plantgenome2011.08.0024

Eynard SE, Croiseau P, Laloë D et al (2017) Which individuals to choose to update the reference population? Minimizing the loss of genetic diversity in animal genomic selection programs. G3 Bethesda Md. https://doi.org/10.1534/g3.117.1117

Gaynor RC, Gorjanc G, Wilson DL et al. AlphaSimR: An R Package for Breeding Program Simulations. Manuscr Prep

Gaynor RC, Gorjanc G, Bentley AR et al (2017) A two-part strategy for using genomic selection to develop inbred lines. Crop Sci 57:2372–2386. https://doi.org/10.2135/cropsci2016.09.0742

Gorjanc G, Hickey JM (2018) AlphaMate: a program for optimising selection, maintenance of diversity, and mate allocation in breeding programs. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty375

Gorjanc G, Battagin M, Dumasy J-F et al (2017a) Prospects for cost-effective genomic selection via accurate within-family imputation. Crop Sci 57:216. https://doi.org/10.2135/cropsci2016.06.0526

Gorjanc G, Dumasy J-F, Gonen S et al (2017b) Potential of low-coverage genotyping-by-sequencing and imputation for cost-effective genomic selection in biparental segregating populations. Crop Sci 57:1404–1420. https://doi.org/10.2135/cropsci2016.08.0675

Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397. https://doi.org/10.1534/genetics.107.081190

Hickey JM, Dreisigacker S, Crossa J et al (2014) Evaluation of genomic selection training population designs and genotyping strategies in plant breeding programs using simulation. Crop Sci 54:1476–1488. https://doi.org/10.2135/cropsci2013.03.0195

Hickey JM, Gorjanc G, Varshney RK, Nettelblad C (2015) Imputation of single nucleotide polymorphism genotypes in biparental, backcross, and topcross populations with a hidden Markov model. Crop Sci 55:1934–1946. https://doi.org/10.2135/cropsci2014.09.0648

Hickey JM, Chiurugwi T, Mackay I et al (2017a) Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nat Genet 49:1297. https://doi.org/10.1038/ng.3920

Hickey LT, Germán SE, Pereyra SA et al (2017b) Speed breeding for multiple disease resistance in barley. Euphytica 213:64. https://doi.org/10.1007/s10681-016-1803-2

Hill WG (2016) Is continued genetic improvement of livestock sustainable? Genetics 202:877–881. https://doi.org/10.1534/genetics.115.186650

Jacobson A, Lian L, Zhong S, Bernardo R (2015) Marker imputation before genomewide selection in biparental maize populations. Plant Genome 8:9. https://doi.org/10.3835/plantgenome2014.10.0078

Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 9:166–177. https://doi.org/10.1093/bfgp/elq001

Kinghorn BP (2011) An algorithm for efficient constrained mate selection. Genet Sel Evol 43:4. https://doi.org/10.1186/1297-9686-43-4

Kinghorn BP, Banks R, Gondro C et al (2009) Strategies to exploit genetic variation while maintaining diversity. In: van der Werf J, Graser H-U, Frankham R, Gondro C (eds) Adaptation and fitness in animal populations. Springer, Dordrecht, pp 191–200

Lin Z, Shi F, Hayes BJ, Daetwyler HD (2017) Mitigation of inbreeding while preserving genetic gain in genomic breeding programs for outbred plants. Theor Appl Genet 130:969–980. https://doi.org/10.1007/s00122-017-2863-y

McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. CRC Press, Boca Raton

Meuwissen THE (1997) Maximizing the response of selection with a predefined rate of inbreeding. J Anim Sci 75:934–940. https://doi.org/10.2527/1997.754934x

Michel S, Ametz C, Gungor H et al (2016) Genomic selection across multiple breeding cycles in applied bread wheat breeding. Theor Appl Genet 129:1179–1189. https://doi.org/10.1007/s00122-016-2694-2

Paixão T, Barton NH (2016) The effect of gene interactions on the long-term response to selection. Proc Natl Acad Sci USA 113:4422–4427. https://doi.org/10.1073/pnas.1518830113

Pszczola M, Strabel T, Mulder HA, Calus MPL (2012) Reliability of direct genomic values for animals with different relationships within and to the reference population. J Dairy Sci 95:389–400. https://doi.org/10.3168/jds.2011-4338

R Development Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

Robertson A (1960) A theory of limits in artificial selection. Proc R Soc Lond B 153:234–249. https://doi.org/10.1098/rspb.1960.0099

Scutari M, Mackay I, Balding D (2016) Using genetic distance to infer the accuracy of genomic prediction. PLoS Genet 12:e1006288. https://doi.org/10.1371/journal.pgen.1006288

Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11:341–359. https://doi.org/10.1023/A:1008202821328

Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York

Watson A, Ghosh S, Williams MJ et al (2018) Speed breeding is a powerful tool to accelerate crop research and breeding. Nat Plants. https://doi.org/10.1038/s41477-017-0083-8

Woolliams JA, Berg P, Dagnachew BS, Meuwissen THE (2015) Genetic contributions and their optimization. J Anim Breed Genet 132:89–99. https://doi.org/10.1111/jbg.12148

Wray NR, Goddard ME (1994) Increasing long-term response to selection. Genet Sel Evol 26:431. https://doi.org/10.1186/1297-9686-26-5-431

Wright S (1949) The genetical structure of populations. Ann Eugen 15:323–354. https://doi.org/10.1111/j.1469-1809.1949.tb02451.x
TAG Theoretical and Applied Genetics – Springer Journals
Published: Jun 6, 2018