Background: Owing to advances in high-throughput technology, single nucleotide polymorphisms (SNPs) are now routinely incorporated, along with phenotypic information, into genetic evaluation. However, this approach often fails to achieve high accuracy for some complex traits. SNP markers alone may be insufficient to predict these traits because of the missing heritability caused by other genetic variations such as microsatellites and copy number variations (CNVs), which have been shown to affect disease and complex traits in humans and other species.

Results: In this study, CNVs were included in a SNP-based genomic selection framework. A Nellore cattle dataset consisting of 2230 animals genotyped on the BovineHD SNP array was used, and nine weight and carcass traits were analyzed. Six models were implemented and compared based on their prediction accuracy. For comparison, three models including only SNPs were implemented: 1) a BayesA model, 2) a Bayesian mixture model (BayesB), and 3) a GBLUP model without polygenic effects. The other three models incorporated both SNPs and CNVs: 4) a Bayesian model similar to BayesA (BayesA+CNV), 5) a Bayesian mixture model (BayesB+CNV), and 6) GBLUP with CNVs modeled as a covariable (GBLUP+CNV). Prediction accuracies were assessed as Pearson's correlation between de-regressed EBVs (dEBVs) and direct genomic values (DGVs) in the validation dataset. For BayesA, BayesB and GBLUP, accuracy ranged from 0.12 to 0.62 across the nine traits. A minimal increase in prediction accuracy was observed for some traits when CNVs were included in the model (BayesA+CNV, BayesB+CNV, GBLUP+CNV).

Conclusions: This study presents the first genomic prediction study integrating CNVs and SNPs in livestock. Combining CNV and SNP marker information proved to be beneficial for genomic prediction of some traits in Nellore cattle.

Keywords: Genomic selection, Complex trait, CNV, SNP, Nellore cattle

Background

Genomic prediction is the estimation of breeding values using genetic variations such as single nucleotide polymorphisms (SNPs). Ideally, breeding values would be predicted as the sum of the effects of all inherited quantitative trait nucleotides (QTNs). As QTNs are not known in practice, genome-wide SNP markers have been proposed as surrogates to indirectly capture the effects of causal variants [1, 2]. However, due to incomplete linkage disequilibrium (LD) with other variants [3–7], SNP markers may fail to capture all the effects of variants causing missing heritability or phenotypic deviations; thus, genomic estimated breeding values (GEBVs) based on SNPs may represent only a component of the true breeding value (TBV). Missing heritability has been defined as the proportion of genetic variation that is not accounted for by SNPs but is predicted to be present given the heritability of the trait. Another possibility is that genetic effects are due not to common SNPs but to other kinds of genetic variants, such as microsatellites and copy number variations (CNVs) [9, 10].

* Correspondence: email@example.com; George.Liu@ars.usda.gov
Departamento de Medicina Veterinária Preventiva e Reprodução Animal, Faculdade de Ciências Agrárias e Veterinárias, UNESP - Univ Estadual Paulista, Jaboticabal, SP 14884-900, Brazil
Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD 20705, USA
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Hay et al. BMC Genomics (2018) 19:441

In the last ten years, attention has been drawn to CNVs, as they are deemed to impact phenotypes. CNVs are structural variations larger than 50 bp in the form of insertions, deletions, duplications, inversions and translocations [11, 12]. For example, a number of studies indicate that chromosomal translocations and subsequent duplications of the KIT gene are involved in several distinct cattle coat phenotypes [13, 14], suggesting that different modifications of the KIT gene can influence coat color in cattle.

Given the ubiquity of immunity-related genes that coincide with CNVs, there are likely many more immunity traits that are influenced by CNVs. Antimicrobial peptides (AMPs) represent a class of copy number variable genes within livestock species that function as part of the innate immune response to pathogens. The β-defensin class of AMPs appears to be copy number variable in several livestock species, most notably in cattle [15, 16]. The lingual antimicrobial peptide (LAP) and tracheal antimicrobial peptide (TAP) genes share a high degree of sequence homology with β-defensins, but these AMPs are exclusive to cattle. Additionally, the BSP30A gene, which is an important salivary AMP, was found to be highly copy number variable within cattle of different breeds. Finally, cathelicidin-type AMPs such as CATHL4 and PGN3 have been identified as highly variable among pig and cattle individuals, respectively. MHC gene family members have frequently been found to be copy number variable in livestock species. A duplication of the CIITA gene, which encodes a trans-activator of the MHC class II receptor, was found in cattle that had resistance to ingested nematodes. In addition, studies on the loss of copy number of MHC class II genes within other species have revealed increased susceptibility of those species to pathogens and cancers, such as the Tasmanian devil facial tumor epidemic. This serves as a warning to all animal breeders, as a loss of diversity at this locus due to improperly managed selective breeding or imposed population bottlenecks could increase the susceptibility of their herds to epidemics. Several other classes of immunity-related gene families have also been identified as copy number variable in livestock species. Expansion and contraction of the workshop class I (WC1) gene family has been identified in cattle [15, 16]. WC1 genes are unique to the cattle, sheep, and pig genomes, and encode pattern recognition receptors expressed on γδ-T cells.

The two 1000 human genome structural variation (SV) papers reported multiple types of CNVs, including deletions, tandem duplications, novel sequence insertions and mobile element insertions [11, 23]. Mills et al. studied four CNV formation mechanisms by examining the breakpoint junction sequence: (1) NAHR, or non-allelic homologous recombination, associated with homologous sequence; (2) NH, or "non-homologous" rearrangements without sequence similarity, including NHEJ (non-homologous end-joining) and MMBIR (microhomology-mediated break-induced replication); (3) VNTR (the shrinking or expansion of a variable number of tandem repeats, often involving simple sequences, by slippage); and (4) MEI (mobile element insertions, including transposition and retrotransposition of common repeats). Within these CNV types, deletions are often mediated by NAHR and are the easiest to detect, genotype and validate. Deletions have therefore been studied extensively; 2/3 of reported events were deletions and almost all (98.8%) validations were on deletions. The unique advantages of deletions are that their locations and allele types are well defined and easy to assess. For a single deletion, its location is restricted to the allele's locus and can be easily derived. Its genotypes can normally be only one of three types: no deletion (0, 0), heterozygous deletion (−1, 0) and homozygous deletion (−1, −1).

By comparing deletion genotypes with genotypes of nearby SNPs, Mills et al. found, consistent with earlier studies [3–7], that 81% of common deletions had one or more SNPs with which they were strongly correlated. This suggests that many mapped deletions will be identifiable through tagging SNPs. However, a fifth of the genotyped deletions were not tagged by HapMap SNPs, implying that these CNVs should be genotyped directly. In our cattle study, we observed a similar result: 75% of simple deletions displayed LD with SNPs while the remaining 25% did not, suggesting that these events are not tagged by the BovineHD SNPs. Similarly, Handsaker et al. used whole genome sequence data to detect and impute CNVs and found that most common deletions and biallelic duplications were well imputed, whereas the imputation accuracy for common multi-allelic CNVs (mCNVs), especially duplications with three or more segregating alleles, was lower. Additionally, the LD properties of complex SVs (e.g., mCNVs such as tandem duplications or novel sequence insertions) have not yet been fully ascertained, because methods for genotyping such CNVs with high accuracy have only just emerged and have been reported and applied only for human data [24, 25].

CNVs can function either as causal variants or as tagging markers. A human study found that CNVs captured around 18% of the total variation in gene expression in cultured lymphocytes. Furthermore, studies have revealed several CNVs with effects on economically important livestock traits such as milk production and residual feed intake in Holstein cows and disease resistance in Angus cattle [7, 27–29]. As CNVs have been shown to affect gene structure and dosage, they may have drastic effects on phenotypes, altering gene regulation and exposing recessive alleles [7, 28, 30–33]. Considering the critical role of CNVs in complex traits, genomic prediction integrating both SNPs and CNVs may offer novel insights for elucidating complex traits and understanding the missing heritability. However, in the last decade, nearly all genomic prediction in farm animals has been conducted based only on SNPs, using GBLUP and Bayesian methods. Up to now, there have been no reports of the joint use of SNP and CNV genotypes in genomic prediction in livestock [1, 34, 35]. We recently published a CNV-based study of growth traits using high-density SNP microarray data in Bos indicus cattle, in which we detected 17 CNVs significantly associated with seven growth traits. The objectives of this study were to integrate CNVs (deletions and biallelic duplications) with SNPs into genomic evaluation using GBLUP and Bayesian methods and to investigate their impact on genomic prediction accuracy.

Methods

Phenotypic data

Estimated breeding values (EBVs) were based on Best Linear Unbiased Predictor (BLUP) estimates of single-trait animal models obtained from routine genetic evaluations using performance and pedigree data from the database (available at: http://www.gensys.com.br/home/show_page.php?id=701). Phenotypes used to fit the models comprised records from 542,918 animals born between 1985 and 2011 and raised in 243 grazing-based herds. The evaluated traits included birth weight (BW), post weaning gain (PWG), weaning gain (WG), carcass conformation at weaning (CW), muscling at weaning (MW), carcass finishing precocity at weaning (PW), carcass conformation at yearling (CY), muscling at yearling (MY) and carcass finishing precocity at yearling (PY). Conformation, finishing precocity and muscling traits (CPM) were based on recorded visual scores assigned on a discrete ordered scale, relative to the animals of the same management group (for a more detailed description of the traits, see Neves et al. 2014). For each trait, only EBVs of animals whose accuracy (i.e., square root of reliability, calculated based on prediction error variance estimates) was > 0.50 were analyzed. The number of animals used in the study and the heritability of the traits analyzed are presented in Table 1.

Table 1 Number of animals and heritabilities of traits analyzed

Trait   N      h²
BW      2058   0.37
CW      2032   0.25
CY      1979   0.31
MW      2032   0.26
MY      1979   0.30
PW      1982   0.25
PWG     1990   0.33
PY      1979   0.31
WG      2052   0.26

In this study, genomic prediction analysis was carried out using de-regressed EBVs (dEBVs) instead of EBVs as the response variable, in order to remove any bias due to double counting of phenotypic and pedigree information. De-regression of EBVs followed a previously proposed approach that removes parent average (PA) effects and also accounts for heterogeneous variances. To test the performance of the proposed models, the dataset was randomly split into two datasets, with 2/3 of the data used for training and 1/3 for validation, and the analysis was replicated five times.

SNP genotyping and quality control

A total of 2230 Nellore animals (Bos indicus) were genotyped for 777,962 SNP markers with the Illumina BovineHD BeadChip assay. This data builds on previously published studies [37, 39]. The quality control step consisted of excluding SNP markers with minor allele frequency less than 0.02, SNPs with call rate (CR_SNP) < 0.98, and SNPs with Fisher's exact test P-value for Hardy-Weinberg Equilibrium (HWE) < 1 × 10^−5.

CNV segmentation and genotyping

The multivariate CNV calling approach of Golden Helix SVS 8.3.0 (Golden Helix Inc., Bozeman, MT, USA) was used to detect common CNV events, because other traditional CNV discovery methods are designed not to find common CNVs but to report more CNVs. In total, 992,350 CNVs were detected, as described previously. By merging all the segments, 445 non-redundant CNV events were identified in the 2230 samples. After filtering out CNVs over 5 Mb and CNVs with frequency < 0.45% (i.e., appearing in fewer than 10 samples), a total of 231 high-confidence CNVs, ranging from 894 bp to 4,855,088 bp, were retained and used in further analysis.

After visual inspection of the histograms of segment mean intensities (LRR), all 231 CNVs were assigned to one of 2 categories: CNV events with simple and distinct genotype clusters, or CNVs with multiallelic and complex genotype clusters. Deletions and biallelic duplications can be genotyped if the clusters representing different genotypes are sufficiently distinct. Based on this classification and event frequency, three different CNV subsets were tested and used in genomic prediction analyses: 1) common deletions (n = 55) with frequency > 5%; 2) all deletions (n = 72); and 3) all deletions and biallelic duplications (n = 173) (Additional file 1: Table S1).

Statistical analysis

The first three models, which were used to estimate DGVs considering SNP effects only, were: a Bayesian regression model (BayesA), a mixture Bayesian model (BayesB) and a GBLUP model without polygenic effects (GBLUP). All three models accounted for additive effects only.

The first approach to combining SNP marker information and CNV information (BayesA+CNV) is described below. In this approach we assume that both SNP effects and CNV effects contribute to the genetic variance. The effects of the variants are modeled as follows:

$$y_i = \mu + \sum_{k=1}^{p} x_{ik} b_k + \sum_{l=1}^{q} z_{il} g_l + e_i \qquad (1)$$

where $y_i$ is the pseudo-phenotype (dEBV) for animal i, $\mu$ is the overall mean, $x_{ik}$ is the SNP marker genotype for animal i at locus k (k = 1, 2, …, p), coded as the number of copies of the minor allele, $b_k$ is the kth SNP effect, $z_{il}$ is the genotype of animal i at CNV l (l = 1, 2, …, q), $g_l$ is the lth CNV effect and $e_i$ is the residual term.

For CNV effects, a flat prior was assumed, since the number of CNVs is several fold smaller than the number of observations, thereby allowing the data to drive the inferences on CNV effects.

The second approach to incorporating SNP markers and CNVs is a mixture model (BayesB+CNV) similar to the model in eq. (1), except that the SNP effects part becomes $\sum_{k=1}^{p} x_{ik} b_k I_k$, where $x_{ik}$ is the genotype of the kth marker, coded as the number of copies of the minor allele, $b_k$ is the effect of marker k, and $I_k$ is an indicator variable that is equal to 1 if the kth marker has a non-zero effect on the trait and 0 otherwise. A binomial distribution with known probability π = 0.01 was assumed for $I_k$. In contrast to SNPs, a mixture distribution was not assumed for CNVs, since the number of CNVs is small.

The third approach is GBLUP with CNVs modeled as covariates, which can be described as:

$$y = Xb + Za + e \qquad (2)$$

where y is the vector of dEBVs, X is the matrix of CNV covariates, coded as −1, 0, 1 for neutral, loss and gain states respectively, b is the vector of fixed CNV effects, a is the vector of random animal additive effects and e is the vector of residual terms. The direct genomic value (DGV) was calculated as:

$$\mathrm{DGV} = Xb + Z\hat{u} \qquad (3)$$

where DGV is the vector of direct genomic values, X is the matrix of CNV covariates, b is the vector of CNV effects, Z is the matrix of genotypes and $\hat{u}$ is the vector of estimated SNP effects.

The models adopted in this study were compared using the following criteria: Pearson's correlation between dEBV and DGV, the mean squared error of prediction (MSE), and the regression slope of dEBVs on DGVs for animals in the validation dataset, the last of which was used to test the degree of inflation or deflation of the genomic predictions.

A Gibbs sampler was used with a chain of 90,000 iterations for each parameter, with a burn-in period of 10,000 iterations and a sampling interval of 100 iterations. Convergence testing was performed for all parameters, including SNP effects, following the methods of Geweke (1992) and Heidelberger and Welch (1983), and visual analysis of trace plots was also performed using the Bayesian Output Analysis program.

Results and discussion

CNV detection

Out of 231 CNVs, 95 (41.13%) were pure deletions. Among the remaining CNVs, only 12 (5.19%) had duplication frequency > 5%, and the other 124 had duplication frequency < 5%. Based on CNV classification and event frequency (Methods), three different CNV subsets were tested and used in genomic prediction analyses: 1) common deletions (n = 55) with frequency > 5%; 2) all deletions (n = 72); and 3) all deletions and biallelic duplications (n = 173) (Additional file 1: Table S1).

As we described previously, most deletion calls reported in this study by the multivariate option of SVS were either no deletion or homozygous deletion, with only a handful of heterozygous deletions. A similar two-state pattern was found for biallelic duplications: the event was either absent or present. These results indicate that deletions and biallelic duplications could be accurately genotyped with defined genomic coordinates and mainly 2 states (with or without the deletion or duplication), which is similar to the behavior of common SNPs. As demonstrated previously for human and cattle CNVs [4, 36, 43], the assumed additive model was largely satisfied when deletions and biallelic duplications were included in genetic prediction.

Genomic prediction

Different methods of incorporating CNVs into genomic evaluation were compared based on their prediction accuracies. Prediction accuracies, computed as Pearson's correlation between DGV and dEBV and averaged over 5 replicates, are shown in Table 2 for all nine traits using SNP markers. The accuracy for BW was 0.21 using BayesA and dropped to 0.17 and 0.20 using BayesB and GBLUP, respectively. For MW, higher accuracy was seen using GBLUP (0.40) than with BayesA and BayesB, which had accuracies of 0.36 and 0.34, respectively. The highest prediction accuracy was observed for PY using the BayesB model (0.62). On average, the GBLUP model resulted in slightly higher genomic prediction accuracies than BayesA and BayesB. The genomic prediction accuracy results of the three models differed from trait to trait, as displayed in Fig. 1.

Table 2 Pearson's correlations between dEBVs and DGVs of 9 traits for different models using SNP markers only and combining SNP and CNV information

        BayesA         BayesA+CNV
Trait   SNPs (a)       del (b)        All del (c)    All (d)
BW      0.21 (±0.03)   0.22 (±0.02)   0.21 (±0.04)   0.24 (±0.02)
CW      0.12 (±0.01)   0.10 (±0.01)   0.10 (±0.01)   0.10 (±0.03)
CY      0.23 (±0.03)   0.23 (±0.04)   0.22 (±0.03)   0.20 (±0.02)
MW      0.36 (±0.01)   0.34 (±0.01)   0.33 (±0.02)   0.36 (±0.01)
MY      0.54 (±0.05)   0.53 (±0.06)   0.50 (±0.03)   0.53 (±0.04)
PW      0.38 (±0.02)   0.36 (±0.03)   0.34 (±0.05)   0.36 (±0.01)
PWG     0.27 (±0.02)   0.24 (±0.04)   0.23 (±0.03)   0.23 (±0.03)
PY      0.58 (±0.04)   0.58 (±0.03)   0.58 (±0.04)   0.58 (±0.04)
WG      0.28 (±0.05)   0.22 (±0.04)   0.20 (±0.03)   0.21 (±0.05)

        BayesB         BayesB+CNV
Trait   SNPs (a)       del (b)        All del (c)    All (d)
BW      0.17 (±0.02)   0.23 (±0.02)   0.23 (±0.03)   0.22 (±0.06)
CW      0.15 (±0.01)   0.15 (±0.04)   0.13 (±0.04)   0.14 (±0.02)
CY      0.22 (±0.03)   0.19 (±0.02)   0.19 (±0.05)   0.19 (±0.02)
MW      0.34 (±0.01)   0.39 (±0.02)   0.38 (±0.03)   0.38 (±0.03)
MY      0.51 (±0.04)   0.56 (±0.02)   0.54 (±0.04)   0.54 (±0.04)
PW      0.37 (±0.03)   0.38 (±0.01)   0.34 (±0.04)   0.40 (±0.02)
PWG     0.30 (±0.01)   0.26 (±0.02)   0.26 (±0.03)   0.26 (±0.04)
PY      0.62 (±0.02)   0.57 (±0.02)   0.56 (±0.03)   0.58 (±0.05)
WG      0.30 (±0.03)   0.21 (±0.03)   0.20 (±0.03)   0.22 (±0.06)

        GBLUP          GBLUP+CNV
Trait   SNPs (a)       del (b)        All del (c)    All (d)
BW      0.20 (±0.03)   0.20 (±0.02)   0.20 (±0.02)   0.20 (±0.04)
CW      0.15 (±0.04)   0.16 (±0.01)   0.14 (±0.04)   0.14 (±0.05)
CY      0.24 (±0.01)   0.22 (±0.01)   0.21 (±0.02)   0.22 (±0.03)
MW      0.40 (±0.02)   0.39 (±0.01)   0.39 (±0.04)   0.40 (±0.04)
MY      0.54 (±0.03)   0.52 (±0.04)   0.50 (±0.02)   0.55 (±0.02)
PW      0.38 (±0.04)   0.36 (±0.03)   0.32 (±0.03)   0.36 (±0.05)
PWG     0.30 (±0.01)   0.26 (±0.03)   0.24 (±0.02)   0.27 (±0.02)
PY      0.57 (±0.03)   0.59 (±0.04)   0.59 (±0.04)   0.59 (±0.04)
WG      0.30 (±0.01)   0.22 (±0.04)   0.22 (±0.05)   0.22 (±0.03)

(a) Only SNPs
(b) SNPs and only common deletions with frequency greater than 5% were included in the model (55 CNVs)
(c) SNPs and all deletions were included in the model (72 CNVs)
(d) SNPs and all deletions and biallelic duplications (173 CNVs)

Fig. 1 Prediction accuracies calculated as Pearson's correlations between direct genomic values (DGVs) and dEBVs of animals in the validation data sets using BayesA, BayesB and GBLUP

The prediction accuracies obtained when integrating CNVs are also presented in Table 2. Three different CNV subsets were tested (common deletions with frequency greater than 5%, all deletions, and all deletions and biallelic duplications). A small increase in prediction accuracy was seen for BW for all models across all three scenarios. The highest increase was observed for the BayesB+CNV model using all deletions (0.23 vs. 0.17). Further, prediction accuracy slightly increased for the MW, MY and PW traits. Using the BayesB model, the prediction accuracy for MW was 0.34; when CNVs were included (BayesB+CNV), the accuracy increased to 0.39, 0.38 and 0.38 for common deletions with frequency greater than 5%, all deletions, and all deletions and biallelic duplications, respectively.

A decrease in accuracy was also observed for some traits when incorporating CNVs in the prediction. Accuracy for CW decreased from 0.12 using BayesA to 0.10 using BayesA+CNV. The largest decrease in prediction accuracy was seen for WG using the BayesB+CNV model. This decrease could be due to redundancy, with the information carried by the CNVs already being captured by SNP markers. On average, using common deletion CNVs with frequency greater than 5% resulted in higher accuracies. Furthermore, GBLUP+CNV slightly outperformed BayesA+CNV and BayesB+CNV. This gain in accuracy was observed when including CNVs in a GBLUP type of approach.
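The comparison criteria defined in Methods (Pearson's correlation between dEBV and DGV, the MSE, and the slope b1 of the regression of dEBVs on DGVs), together with the random 2/3 and 1/3 split, can be illustrated with a minimal sketch. It assumes NumPy is available; the function names `random_split` and `validation_metrics` and the toy numbers are hypothetical and are not taken from the software used in the study.

```python
import numpy as np


def random_split(n_animals, train_fraction=2 / 3, seed=0):
    """Randomly split animal indices into training and validation sets.

    Hypothetical helper: the study reports a random 2/3 vs 1/3 split,
    replicated five times, but not how it was implemented.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_animals)
    n_train = int(round(train_fraction * n_animals))
    return order[:n_train], order[n_train:]


def validation_metrics(debv, dgv):
    """Return (accuracy, MSE, b1) for validation animals.

    accuracy: Pearson's correlation between dEBV and DGV
    MSE:      mean squared difference between dEBV and DGV
    b1:       slope of the regression of dEBV on DGV,
              cov(dEBV, DGV) / var(DGV); values far from 1
              indicate inflated or deflated predictions
    """
    debv = np.asarray(debv, dtype=float)
    dgv = np.asarray(dgv, dtype=float)
    accuracy = np.corrcoef(debv, dgv)[0, 1]
    mse = np.mean((debv - dgv) ** 2)
    b1 = np.cov(debv, dgv, ddof=1)[0, 1] / np.var(dgv, ddof=1)
    return accuracy, mse, b1


if __name__ == "__main__":
    # Toy data only: simulated values, not results from the study.
    rng = np.random.default_rng(42)
    true_dgv = rng.normal(size=300)
    toy_debv = 0.9 * true_dgv + rng.normal(scale=1.5, size=300)
    train_idx, valid_idx = random_split(300, seed=1)
    acc, mse, b1 = validation_metrics(toy_debv[valid_idx], true_dgv[valid_idx])
    print(f"accuracy={acc:.2f}  MSE={mse:.2f}  b1={b1:.2f}")
```

Repeating the split with five different seeds and averaging the three metrics would mirror the five replicates described in Methods, under the same assumptions.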
A plausible explanation for the behavior seen with the GBLUP type of approach is that the genomic relationship matrix G used in GBLUP does not capture all the genetic variation; including CNVs as covariates may therefore explain part of the missing genetic variance and thus improve the prediction accuracy.

Evaluating the models in this study using the MSE criterion (Table 3), we found that the goodness-of-fit did not improve when CNVs were included in the model (BayesA+CNV, BayesB+CNV and GBLUP+CNV); on average, MSE was higher for these models.

Table 3 Mean squared error (MSE) of genomic predictions of different models using all deletions and biallelic duplications (173 CNVs)

Trait   BayesA   BayesA+CNV   BayesB   BayesB+CNV   GBLUP   GBLUP+CNV
BW      0.87     0.86         0.89     0.85         0.88    0.89
CW      0.10     0.14         0.11     0.12         0.12    0.12
CY      0.18     0.21         0.20     0.23         0.19    0.21
MW      0.29     0.29         0.30     0.26         0.24    0.25
MY      0.28     0.32         0.26     0.22         0.27    0.22
PW      0.08     0.11         0.10     0.07         0.09    0.11
PWG     25.76    25.45        24.32    24.49        23.76   24.38
PY      0.16     0.16         0.14     0.15         0.15    0.11
WG      18.32    20.85        20.14    20.76        14.60   16.68

In order to measure the degree of inflation or deflation of direct genomic breeding values (DGV), the slope of the regression (b1) of dEBVs on DGV was evaluated. Table 4 shows the estimates of b1 for all nine traits. The BayesA and BayesA+CNV models resulted in inflated estimates compared to the other models. On average, GBLUP performed the best in terms of scale.

Table 4 Inflation estimates (b1) of genomic prediction of 9 traits using different models using all deletions and biallelic duplications (173 CNVs)

b1(dEBV, DGV)
Trait   BayesA   BayesA+CNV   BayesB   BayesB+CNV   GBLUP   GBLUP+CNV
BW      1.06     0.96         0.78     0.95         1.04    0.93
CW      1.78     1.52         1.19     1.23         0.90    1.09
CY      1.82     1.74         1.44     1.31         1.12    1.18
MW      1.56     1.39         1.10     0.90         0.94    0.96
MY      1.28     1.33         1.15     1.09         1.02    1.15
PW      1.37     1.41         1.21     1.12         1.12    1.22
PWG     0.84     0.91         0.90     0.88         0.89    0.92
PY      1.24     1.19         1.23     1.14         1.11    1.09
WG      0.83     0.92         0.86     0.82         0.90    0.92

A study using the same dataset revealed genetic stratification among the samples. Population stratification could potentially affect the resulting genomic prediction accuracies; however, a random cross-validation approach was adopted in this study so that the impact of stratification was minimized. In general, the prediction accuracy of the DGV for most traits using only SNPs was in concordance with the results reported in the literature for the Nellore cattle breed. For example, the previously reported prediction accuracy for BW using GBLUP was 0.24, whereas it was 0.20 here (Table 2). Additionally, although genomic prediction accuracies were computed using dEBVs as the response variable, the results should not change greatly if other unbiased measures of true genetic value (e.g., average corrected performances (YD) or DYD for bulls) were used instead of dEBVs.

Conclusions

In this study, including copy number variation information in genomic selection proved to be beneficial for some traits. However, the impact of CNVs varied from model to model and from trait to trait, and a universal model is yet to be developed. The small increase in prediction accuracy seen when integrating CNVs could be due to their function either as causal genes or as tagging markers. This might help in the prediction of complex traits and explain part of the missing heritability that SNP markers fail to capture. Future efforts are warranted to better utilize CNV information in genomic evaluation methods.

Additional file

Additional file 1: Table S1. Detailed information on the CNVs detected with high confidence. (XLSX 37 kb)

Acknowledgements

Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The USDA is an equal opportunity provider and employer.

Funding

This work was supported in part by AFRI grant numbers 2011–67015-30183 and 2013–67015-20951 from USDA NIFA and United States - Israel Binational Agricultural Research and Development (BARD) Fund Award number US-4997-17. The funders had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials

Since the genotyping data belong to Brazilian AI semen providers, they are available upon request (requires a signed Material Transfer Agreement for exclusive research purposes).

Authors' contributions

Experimental design: EHAH, GEL. Sample collection and genotyping: HHRN, RC, JFG. Result interpretation: YTU, LX, YZ, DMB. Manuscript preparation: EHAH, YTU, LM, JFG, GEL. All authors read and approved the final manuscript.

Ethics approval and consent to participate

Specific approval from an Animal Care and Use Committee was not obtained for this study, as analyses were performed with data previously generated from samples previously collected as part of commercial testing procedures.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT 59301, USA.
Departamento de Medicina Veterinária Preventiva e Reprodução Animal, Faculdade de Ciências Agrárias e Veterinárias, UNESP - Univ Estadual Paulista, Jaboticabal, SP 14884-900, Brazil.
Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing 100193, China.
College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Agricultural Molecular Biology, Yangling, Shaanxi 712100, China.
Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, UNESP - Univ Estadual Paulista, Jaboticabal, SP 14884-900, Brazil.
Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD 20705, USA.
Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
Departamento de Apoio, Produção e Saúde Animal, Faculdade de Medicina Veterinária de Araçatuba, UNESP – Univ Estadual Paulista, Araçatuba, SP 16050-680, Brazil.

Received: 18 July 2017 Accepted: 14 May 2018

References

1. Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.
2. Xu S. Estimating polygenic effects using markers of the entire genome. Genetics. 2003;163:789–801.
3. Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet. 2008;40:1253–60.
4. Wheeler E, Huang N, Bochukova EG, Keogh JM, Lindsay S, Garg S, et al. Genome-wide SNP and CNV analysis identifies common and low-frequency variants associated with severe early-onset obesity. Nat Genet. 2013;45:513–7.
5. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet. 2008;40:1166–74.
6. Wineinger NE, Pajewski NM, Tiwari HK. A method to assess linkage disequilibrium between CNVs and SNPs inside copy number variable regions. Front Genet. 2011;2:17.
7. Xu L, Cole JB, Bickhart DM, Hou Y, Song J, VanRaden PM, et al. Genome wide CNV analysis reveals additional variants associated with milk production traits in Holsteins. BMC Genomics. 2014;15:683.
8. Taylor JF. Implementation and accuracy of genomic selection. Aquaculture. 2014;Suppl 1:S8–14.
9. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010;11:446–50.
10. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–53.
11. Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, et al. Mapping copy number variation by population-scale genome sequencing. Nature. 2011;470:59–65.
12. Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Carter NP, et al. Challenges and standards in integrating surveys of structural variation. Nat Genet. 2007;39:S7–15.
13. Durkin K, Coppieters W, Drogemuller C, Ahariz N, Cambisano N, Druet T, et al. Serial translocation by means of circular intermediates underlies colour sidedness in cattle. Nature. 2012;482:81–4.
14. Brenig B, Beck J, Floren C, Bornemann-Kolatzki K, Wiedemann I, Hennecke S, et al. Molecular genetics of coat colour variations in white Galloway and White Park cattle. Anim Genet. 2013;44:450–3.
15. Liu GE, Hou Y, Zhu B, Cardone MF, Jiang L, Cellamare A, et al. Analysis of copy number variations among diverse cattle breeds. Genome Res. 2010;20:693–703.
16. Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, et al. Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res. 2012;22:778–90.
17. Bickhart DM, Xu L, Hutchison JL, Cole JB, Null DJ, Schroeder SG, et al. Diversity and population-genetic properties of copy number variations and multicopy genes in cattle. DNA Res. 2016;23:253–62.
18. Paudel Y, Madsen O, Megens HJ, Frantz LA, Bosse M, Bastiaansen JW, et al. Evolutionary dynamics of copy number variation in pig genomes in the context of adaptation and domestication. BMC Genomics. 2013;14:449.
19. Liu GE, Brown T, Hebert DA, Cardone MF, Hou YL, Choudhary RK, et al. Initial analysis of copy number variations in cattle selected for resistance or
22. Herzig CT, Baldwin CL. Genomic organization and classification of the bovine WC1 genes and expression by peripheral blood gamma delta T cells. BMC Genomics. 2009;10:191.
23. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75–81.
24. Handsaker RE, Van Doren V, Berman JR, Genovese G, Kashin S, Boettger LM, et al. Large multiallelic copy number variations in humans. Nat Genet. 2015;47:296–303.
25. Hehir-Kwa JY, Marschall T, Kloosterman WP, Francioli LC, Baaijens JA, Dijkstra LJ, et al. A high-quality human reference panel reveals the complexity and distribution of genomic structural variants. Nat Commun. 2016;7:12989.
26. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007;315:848–53.
27. Hou Y, Liu GE, Bickhart DM, Matukumalli LK, Li C, Song J, et al. Genomic regions showing copy number variations associate with resistance or susceptibility to gastrointestinal nematodes in Angus cattle. Funct Integr Genomics. 2011;12:81–92.
28. Hou Y, Bickhart DM, Chung H, Hutchison JL, Norman HD, Connor EE, et al. Analysis of copy number variations in Holstein cows identify potential mechanisms contributing to differences in residual feed intake. Funct Integr Genomics. 2012;12:717–23.
29. Xu L, Hou Y, Bickhart DM, Song J, Van Tassell CP, Sonstegard TS, et al. A genome-wide survey reveals a deletion polymorphism associated with resistance to gastrointestinal nematodes in Angus cattle. Funct Integr Genomics. 2014;14:333–9.
30. Orozco LD, Cokus SJ, Ghazalpour A, Ingram-Drake L, Wang S, van Nas A, et al. Copy number variation influences gene expression and metabolic traits in mice. Hum Mol Genet. 2009;18:4118–29.
31. Zhang F, Gu W, Hurles ME, Lupski JR. Copy number variation in human health, disease, and evolution. Annu Rev Genomics Hum Genet. 2009;10:451–81.
32. Henrichsen CN, Chaignat E, Reymond A. Copy number variants, diseases and gene expression. Hum Mol Genet. 2009;18:R1–8.
33. Gamazon ER, Stranger BE. The impact of human copy number variation on gene expression. Brief Funct Genomics. 2015;14:352–7.
34. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive genetic variability and the Bayesian alphabet. Genetics. 2009;183:347–63.
35. Habier D, Fernando RL, Garrick DJ. Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics. 2013;194:597–607.
36. Zhou Y, Utsunomiya YT, Xu L, Hay EH, Bickhart DM, Alexandre PA, et al. Genome-wide CNV analysis reveals variants associated with growth traits in Bos indicus. BMC Genomics. 2016;17:419.
37. Neves HH, Carvalheiro R, O'Brien AM, Utsunomiya YT, do Carmo AS, Schenkel FS, et al. Accuracy of genomic predictions in Bos indicus (Nellore) cattle. Genet Sel Evol. 2014;46:17.
38. Garrick DJ, Taylor JF, Fernando RL. Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet Sel Evol. 2009;41:55.
39. Carvalheiro R, Boison SA, Neves HH, Sargolzaei M, Schenkel FS, Utsunomiya YT, et al. Accuracy of genotype imputation in Nelore cattle. Genet Sel Evol. 2014;46:69.
40. Geweke J. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics 4. Oxford: Oxford University Press; 1992. p. 169–93.
41. Heidelberger P, Welch PD. Simulation run length control in the presence of an initial transient. Opns Res. 1983;31:1144.
42. Xu L, Hou Y, Bickhart DM, Zhou Y, Hay EH, Song J, et al. Population-genetic properties of differentiated copy number variations in cattle. Sci Rep. 2016;6:23161.
43. Xu L, Bickhart DM, Cole JB, Schroeder SG, Song J, Tassell CP, et al. Genomic signatures reveal new evidences for selection of important traits in domestic cattle. Mol Biol Evol. 2015;32:711–25.
44. Utsunomiya YT, do Carmo AS, Carvalheiro R, Neves HH, Matos MC, Zavarez LB, et al.
Genome-wide association study for birth weight in Nellore cattle susceptibility to intestinal nematodes. Mamm Genome. 2011;22:111–21. points to previously described orthologous genes affecting human and 20. Cheng Y, Stuart A, Morris K, Taylor R, Siddle H, Deakin J, et al. Antigen- bovine height. BMC Genet. 2013;14:52. presenting genes and genomic copy number variations in the Tasmanian 45. Silva RM, Fragomeni BO, Lourenco DA, Magalhaes AF, Irano N, Carvalheiro R, devil MHC. BMC Genomics. 2012;13:87. et al. Accuracies of genomic prediction of feed efficiency traits using 21. Eimes JA, Bollmer JL, Whittingham LA, Johnson JA, VAN Oosterhout C, Dunn PO. different prediction and validation methods in an experimental Nelore Rapid loss of MHC class II variation in a bottlenecked population is explained by cattle population. J Anim Sci. 2016;94:3613–23. drift and loss of copy number variation. J Evol Biol. 2011;24:1847–56.
– Springer Journals
Published: Jun 5, 2018