Background: The number of studies of Copy Number Variation in cattle has increased in recent years. This has been prompted by the increased availability of data on polymorphisms and their relationship with phenotypes. In addition, livestock species are good models for some human phenotypes. In the present study, we described the landscape of CNV driven genetic variation in a large population of 146 individuals representing 13 cattle breeds, using whole genome DNA sequence. Results: A highly significant variation among all individuals and within each breed was observed in the number of −15 −15 duplications (P <10 ) and in the number of deletions (P <10 ). We also observed significant differences between breeds for duplication (P = 0.01932) and deletion (P = 0.01006) counts. The same variation CNV length - inter-individual −15 −15 and inter-breed differences were significant for duplications (P <10 ) and deletions (P <10 ). Moreover, breed- specific variants were identified, with the largest proportion of breed-specific duplications (9.57%) found for Fleckvieh and breed-specific deletions found for Brown Swiss (5.00%). Such breed-specific CNVs were predominantly located in intragenic regions, however in Simmental, one deletion present in five individuals was found in the coding sequence of a novel gene ENSBTAG00000000688 on chromosome 18. In Brown Swiss, Norwegian Red and Simmental breed- specific deletions were located within KIT and MC1R genes, which are responsible for a coat colour. The functional annotation of coding regions underlying the breed-specific CNVs showed that in Norwegian Red, Guernsey, and Simmental significantly under- and overrepresented GO terms were related to chemical stimulus involved in sensory perception of smell and the KEGG pathways for olfactory transduction. In addition, specifically for the Norwegian Red breed, the dopaminergic synapse KEGG pathway was significantly enriched within deleted parts of the genome. Conclusions: The CNV landscape in Bos taurus genome revealed by this study was highly complex, with inter-breed differences, but also a significant variation within breeds. The former, may explain some of the phenotypic differences among analysed breeds, and the latter contributes to within-breed variation available for selection. Keywords: Copy number variation, Cattle, Genetic diversity, Next-generation sequencing Background phenotypes [6–11] and provide a source of genetic variation. The analysis of Copy Number Variation (CNV) has been It has been found that CNVs often occur in gene-rich carried out in many species including humans [1, 2], regions and are associated with phenotypic variation as well mice  and cattle [4, 5]. CNVs are structural polymor- as disease susceptibility [12, 13]. In livestock, pigmentation, phisms, including deletions, insertions and duplications. coat colour, body size, olfaction, immune response, patho- CNVs in genes and regulatory regions potentially impact gen and parasite resistance, lipid and protein metabolism, feed efficiency, fertility and milk production have been foundtobeaffectedbyCNVs[10, 12, 14, 15]. * Correspondence: firstname.lastname@example.org CNV were originally detected by approaches such as Biostatistics group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland Comparative Genomic Hybridization (CGH), array-based National Research Institute of Animal Production, Krakowska 1, 32-083 Comparative Genomic Hybridization (aCGH), quantitative Balice, Poland Polymerase Chain Reaction (qPCR), or using SNP arrays. Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Mielczarek et al. BMC Genomics (2018) 19:410 Page 2 of 13 So far in cattle CNVs have been detected using SNP array the overlap of CNVs detected between studies is very [11, 15–21] while a few studies have used the comparative low . genomic hybridization approach [22, 23]. However, both In this study, we used a full genome sequence data for methods suffer from low accuracy of CNV location and 146 individuals representing 13 cattle breeds and merged CNV length estimation, and are not able to detect CNVs two algorithms for NGS-based CNV detection. Our goal along the entire genome sequence. The qPCR method has was to describe the CNV genomic landscape in cattle and not been applied on a genome-wide scale and is typically assess the degree of within- and between-breed variability used to explore targeted regions e.g. to validate putative in the CVN length and number. CNVs found using other methods . Recent advances in the next generation sequencing (NGS) technology provide Results a more accurate approach to identifying not only common, The landscape of copy number variation in Bos taurus but also rare CNVs, at a base-pair resolution . Studies The number of CNV variants identified varied considerably based on NGS have facilitated the discovery of smaller, amongthe 146individuals, ranging between12and 11,704 previously unknown, CNVs . There have been several (1343 ± 1086) for duplications and between none and 3960 studies focusing on CNVs in Bos taurus at the population (1708 ± 700) for deletions. In addition, CNV lengths were level conducted using NGS [4, 5, 10, 25], however, little also variable, by ranging from 200 bp to 4,992,800 bp is known about their population-wise distribution and (31,018 ± 169,307) for duplications and from 200 bp to their potential impact on phenotypes in cattle. Moreover, 4,536,800 bp (10,836 ± 53,724) for deletions (Fig. 1). Fig. 1 The graphical representation of the number of duplications (a) and deletions (b) per bull and the length of duplications (c) and deletions (d) observed in the whole validated data set Mielczarek et al. BMC Genomics (2018) 19:410 Page 3 of 13 Functional annotation of variants using Sequence gene (EBD, ENSBTAG00000033545), and other two Ontology, showed that, 29.49% of duplications and 32.08% introns were duplicated in the serine/threonine-protein of deletions overlapped with genes. The 20 most common kinase gene (PAK3, ENSBTAG00000015670) on BTX. duplications, shared by 74–117 bulls representing all All of the 14 remaining common duplications were breeds and the 20 most common deletions, shared by located between genes. The only genic region among the 117–140 bulls representing all breeds, were examined in most common deletions occurred in an intron within the detail. Among the most common duplications, there were uncharacterized gene on BTA6 (ENSBTAG00000035764). two duplicated non-protein expressed coding regions. The 19 other common deletions were located between One, located on BTA21, included a transcript of a small genes. However, it is worth mentioning that a partial nuclear RNA gene (SNORD116, ENSBTAG00000048121) deletion of MC1R (melanocortin 1 receptor gene) exon. and the other, located on BTA28, was a part of the 5S Activation of this gene results in black coat color, ribosomal RNA gene (5SrRNA, ENSBTAG00000045518). whereas loss of function causes red coat color . The These two transcripts were classified as having high func- deletion was identified in Brown Swiss, Norwegian Red tional impact. Furthermore, a protein coding region of inter- and Simmental individuals, which are red breeds. A list of feron alpha-inducible protein 27 (ENSBTAG00000003152) the 20 most common CNVs, with information on their on BTA21 was duplicated with potential impact on gene genomic location and overlaps with polymorphisms from function assigned by the Sequence Ontology. This gene other studies is provided in the Tables 1 and 2. may be involved in regulation of protein export from the nucleus, but it is not well characterized for the cattle Inter-individual and inter-breed variation genome.A duplicationonBTA8includedan intron of Most of CNVs, comprising 84.85% of duplications and the rho-related BTB domain-containing protein 2 gene 77.22% of deletions, were identified in only one bull. (RHOBTB2, ENSBTAG00000031916), one duplication There were no identical CNVs, defined as polymorphisms on BTA27 included an intron of the enteric beta-defensin with exactly the same breakpoint positions, which were Table 1 The most common duplications in the whole dataset BTA begin end genomic location overlapping with the DGVa 2 136,813,001 136,815,100 intergenic 4 28,200,301 28,203,500 intergenic 8 70,883,001 70,885,000 intron of the ENSBTAG00000031916 gene (RHOBTB2)  (2) 8 74,685,001 74,687,800 intergenic 9 53,617,901 53,621,800 intergenic 18 50,944,801 50,948,100 intergenic  (4),  (1),  (1) 21 2,128,101 2,130,400 non coding transcript exon of the ENSBTAG00000048121 gene (SNORD116) 21 59,331,801 59,334,500 coding sequence variant and intron of  (1) ENSBTAG00000003152 gene 27 5,516,501 5,519,500 intron of the ENSBTAG00000033545 gene  (5),  (1) 27 28,539,101 28,543,700 intergenic  (2) 27 28,543,901 28,548,300 intergenic  (2) 27 28,548,501 28,552,600 intergenic  (3) 27 28,878,101 28,881,600 intergenic 28 1,893,701 1,895,100 transcript amplification in the  (4) ENSBTAG00000045518 gene (5S rRNA) X 36,208,701 36,209,700 intergenic  (1) X 36,260,901 36,262,400 intergenic  (1) X 36,673,801 36,676,800 intergenic  (2) X 64,480,501 64,481,800 intron of the ENSBTAG00000015670 gene (PAK3)  (2),  (1) X 64,504,801 64,512,100 intron of the ENSBTAG00000015670 gene (PAK3)  (1) X 138,259,801 138,320,600 intergenic  (1) The list of the 20 most common duplications detected in this study. Genomic locations were determined by the VEP program. The last column shows the number of duplications found in other studies available under the DGVa database Mielczarek et al. BMC Genomics (2018) 19:410 Page 4 of 13 Table 2 The most common deletions in the whole dataset BTA begin end genomic location overlapping with the DGVa 2 136,815,101 136,816,200 intergenic  (1) 2 136,942,201 136,943,800 intergenic  (1) 6 5,358,201 5,360,200 intergenic  (6) 6 5,897,301 5,899,100 intergenic  (10),  (1) 6 5,903,601 5,904,300 intergenic  (10),  (1) 6 6,218,501 6,219,600 intron of the ENSBTAG00000035764 gene  (7),  (1) 6 6,548,401 6,549,400 intergenic  (8) 7 34,622,901 34,623,700 intergenic  (1),  (1) 8 39,388,901 39,389,500 intergenic  (1) 8 62,206,601 62,207,700 intergenic 14 292,501 294,900 upstream gene variant of ENSBTAG00000046822  (20) (U6 spliceosomal RNA) 14 322,901 325,800 upstream gene variant of  (24) ENSBTAG00000045988 (5S rRNA) 14 389,001 391,100 downstream gene variant of  (26) ENSBTAG00000045780 (5S rRNA) 16 7,825,301 7,826,200 intergenic  (2) 17 50,668,301 50,670,100 intergenic  (2) 21 2,020,201 2,022,100 upstream gene variant of  (1) ENSBTAG00000046925 (5S rRNA) 21 2,025,201 2,026,700 intergenic  (1) X 35,728,601 35,730,000 intergenic X 53,961,901 53,963,800 intergenic X 54,097,401 54,098,700 intergenic  (3) The list of the 20 most common deletions detected in this study. Genomic locations were determined by the VEP program. The last column shows the number of deletions found in other studies available under the DGVa database observed in each of 146 genomes. The most frequent of the genome containing deletions or duplications among duplication overlapped among 117 bulls and the most individuals within breeds was significantly different (tests −12 common deletion was found in 140 bulls. A highly resulting in p-values P <0.1 ∙ 10 ). significant variation among all 146 individuals was observed Functional annotation performed for CNVs separately in the number of duplications and in the number of within each breed showed that the fraction of duplica- −15 deletions (both with P <10 ). Deletions and duplications tions assigned to gene regions markedly differed between −15 were distributed both, within-breeds (with P <10 ) breeds and ranged from 29.56% (Simmental) to 58.61% and between-breeds (P = 0.01932 for duplications and (Fleckvieh) (Fig. 2). The fraction of deletions ranged from P = 0.01006 for deletions). The inter-individual variation in 36.21% (Guernsey) to 44.71% (Brown Swiss) for gene length of CNVs was highly significant for duplications regions (Fig. 3). −15 −15 (P <10 )and deletions(P <10 ), which was due to both, significant within-breed and between-breed vari- Breed-specific CNVs ation. The average length of duplications was highest in Variants present only in one breed have a potential to Norwegian Red (76,931.9 bp) and lowest in Simmental contribute to genetic differences between them. Due to (13,905.71 bp), which also showed the highest within-breed still relatively small sizes of breed-specific data sets in −94 variation (P =3.02 ∙ 10 ). The average length of deletions this and previous NGS based studies an unequivocal varied between 7409 bp in Guernsey and 12,564 bp in declaration of a CNV being specific for only one breed is Fleckvieh and was therefore much lower than for duplica- not possible. In the present study, breed-specific variants tions. The highest within-breed variation in deletion length, were defined as CNVs shared by at least two bulls within a −192 expressed by P =1.23 ∙ 10 was found in Norwegian given breed and absent in the other breeds. The percent Red. A graphical representation of duplication lengths of breed-specific CNVs was the lowest in Simmental is provided in Additional file 1: Figure S1 and deletions (1.74% of duplications and 1.31% for deletions), while the lengths in Additional file 2: Figure S2. The percentage most distinct breeds were Brown Swiss with 5.00% of the Mielczarek et al. BMC Genomics (2018) 19:410 Page 5 of 13 Norwegian Red and significantly overrepresented in Guernsey and Simmental breeds. For Fleckvieh, neither biological process, nor molecular function was significantly under- or overrepresented GO terms found. The overall count of significantly underrepresented and overrepre- sented GO terms was highest in Simmental. Underrepre- sented GO terms were mainly related to cell management (e.g. organelle organization, cell differentiation, cellular response to organic substance, regulation of cell prolif- eration) while overrepresented GO terms were mainly related to immune response (e.g. immunoglobulin pro- duction, autophagy and antigen processing and presenta- tion of peptide antigen via MHC class I). Norwegian Red was the breed in which the breed-specific deletions were most significantly underrepresented (e.g. natural killer cell mediated cytotoxicity, immunoglobulin production) and Fig. 2 The percent of duplications falling into non-genic and gene overrepresented (e.g. small molecule metabolic process, regions. BSW represents Brown Swiss, FLV Fleckvieh, GUE Guernsey, response to cytokine, RNA processing, translation) GO RED Norwegian Red and SIM Simmental breed terms. A common feature of breed-specific deletions was their significant overrepresentation in the “natural killer cell lectin-like receptor binding” ontology (GO:0046703). breed specific duplications while Fleckvieh had 9.57% In the context of KEGG pathways, the olfactory trans- of the breed specific deletions (Fig. 4). Interestingly, we duction pathway (bta04740) was significantly enriched −5 found that the part of the KIT (the Hardy-Zuckerman 4 among duplicated genes in Guernsey (P =7.80 ∙ 10 ) −22 feline sarcoma viral oncogene homolog) gene, which and Simmental (P =7.01 ∙ 10 ), while the same pathway explains a considerable proportion of the variation in (bta04740, P = 0.0063) together with dopaminergic synapse pigmentation pattern , was deleted in five Brown (bta04728, P = 0.03674) pathway were significantly enriched Swiss individuals and was present in all four remaining among deleted genes in Norwegian Red breed. breeds which have a characteristic spotted phenotype. The most common breed-specific duplications were Functional annotation of breed-specific duplications shared by ten bulls in Brown Swiss (20.83%), seven bulls in showed that the same GO term “detection of chemical Norwegian Red (36.84%), six bulls in Fleckvieh (20.00%), stimulus involved in sensory perception of smell” six bulls in Guernsey (30.00%) and five in Simmental (GO:0050911) was significantly underrepresented in (31.25%). The most common breed-specific deletions were present in 23 individuals of the Brown Swiss breed (47.92%), 11 Norwegian Red individuals (57.90%), ten Guernsey (50.00%) and five Simmentals (31.25%). The genomic annotation of the ten most common duplications and eight deletions within each breed were investigated further. Seven duplications were in intergenic regions and three duplications were located in introns or upstream gene regions (Table 3). In the case of the deletions, five were annotated in intergenic regions, two in introns or upstream gene regions and only one overlapped with a cod- ing sequence. The latter was on the BTA18 and incorpo- rated the exonic sequence of the ENSBTAG00000000688 gene, in which protein product is not well characterized in a mammal genome (Table 4). This gene has been reported to be involved in the regulation of transcription in humans by . The deletion identified in the present study, spanning this gene region was found in five bulls belonging to the Simmental breed, and it is also present in the Database of Fig. 3 The percent of deletions falling into non-genic and gene Genomic Variants . regions. BSW represents Brown Swiss, FLV Fleckvieh, GUE Guernsey, The most common breed-specific CNVs overlapping with RED Norwegian Red and SIM Simmental breed QTL represented six phenotypic groups: reproduction, milk, Mielczarek et al. BMC Genomics (2018) 19:410 Page 6 of 13 Fig. 4 The percent of breed specific deletions/duplications (detected in at least two bulls belonging to the same breed). BSW represents Brown Swiss, FLV Fleckvieh, GUE Guernsey, RED Norwegian Red and SIM Simmental breed production, exterior, meat and carcass as well as health. In and Simmental specific duplications overlapped with QTL the case of duplications, QTL falling into meat and carcass related to milk yield. Interestingly, Simmental specific trait class were found in all breeds, except Norwegian Red. duplication fell into all phenotypic groups, but deletion For the latter breed duplications occurred in only two QTL, overlapped only with body weight. Breed specific deletions for calving index and length of productive life. Fleckvieh in QTL related to body weight were found in all analysed Table 3 The most common breed specific duplications breed # bulls sharing duplication BTA begin end genomic location overlapping with the DGVa BSW 10 5 74,078,801 74,086,100 intergenic BSW 10 14 64,001 89,100 intergenic  (1),  (3),  (1) FLV 6 17 72,899,301 72,924,700 intron of the  (1),  (3),  (1),  (1) ENSBTAG00000031160 gene GUE 6 5 114,221,601 114,225,800 intergenic GUE 6 8 56,717,001 56,730,400 intergenic GUE 6 12 73,428,801 73,437,300 intergenic  (28) GUE 6 25 19,009,101 19,013,400 intron of the ENSBTAG00000018560 (DNAH3) gene RED 7 9 88,596,301 88,599,700 intron of the  (2),  (1) ENSBTAG00000015935 (IYD) gene RED 7 X 36,034,701 36,036,900 intergenic  (1) SIM 5 10 24,513,701 24,528,400 intergenic  (14),  (1) The list of the most common duplications detected within each breed. Genomic locations were determined by the VEP program. The last column shows the number of duplications found in other studies available under the DGVa database Mielczarek et al. BMC Genomics (2018) 19:410 Page 7 of 13 Table 4 The most common breed specific deletions breed # bulls sharing deletion BTA begin end genomic location overlapping with the DGVa BSW 23 5 23,616,701 23,623,400 intergenic FLV 5 12 76,499,501 76,514,300 intergenic  (11) FLV 5 16 23,946,401 23,947,000 intergenic FLV 5 18 63,804,801 63,808,200 upstream gene variant of ENSBTAG00000000688 FLV 5 28 7,026,301 7,027,000 intron of the ENSBTAG00000020361 (SLC35F3) gene GUE 10 2 55,348,801 55,371,300 intergenic RED 11 17 25,081,301 25,083,200 intergenic SIM 5 18 63,800,101 63,806,400 start lost, coding sequence, 5’ UTR,  (1) intron of the ENSBTAG00000000688 gene The list of the most common deletions detected within each breed. Genomic locations were determined by the VEP program. The last column shows the number of deletions found in other studies available under the DGVa database breeds. Breed-specific deletions were also found in QTL Genomic landscape of CNVs for milk yield as well as meat and carcass classes in all The total number of putative CNVs identified in this breeds except the Simmental breed. study was 445,791 (196,241 duplications and 249,550 deletions) with, on average, 3053 CNVs (1344 duplica- Discussion tions, 1709 deletions) per bull. In contrast,  reported The present study investigated the occurrence of CNVs in 520 CNVs for one bull, while 790 CNVs fortwo 13 breeds of domestic cattle, focussing on inter-individual animals. Furthermore, detected 6811deletions for and inter-breed levels of variation in length, number and 32 animals, while  only 547 deletions and 410 function of the variants. duplications for 62 bulls. The number of CNVs in this study was higher which may be explained by the bigger CNV dataset sample size and that most of CNVs were specific for Although algorithms for CNV detection have improved only one animal. Most studies report that deletions are recently and are based on improved data provided by more common than duplications. A possible biological the next generation sequencing (NGS), the number of false explanation for this is that a non-allelic homologous positive CNV calls are still high [29, 30]. The problem with recombination, one of the major sources of CNVs, reliable detection of CNVs has been discussed by , who generates more deleted than duplicated regions . In compared CNVs detected for the same individual using the present study, the excess of deletions may also be three different methods (NGS, oligonucleotide array, CGH explained by the CNV detection algorithm used, which array). They observed that there was only a 23% overlap in applies more stringent criteria for calling duplications, the CNVs detected. Other authors have also observed a as these are susceptible to the systematic read mapping low correlation among CNV detected within and among bias caused by unknown regions in the reference genome studies [5, 10] which is caused by technical aspects . The length of CNVs reported in different studies also such as different sample sizes, differences in breeds differs considerably. In our study, the minimum reported studied, detection platforms used (array-based vs. NGS) CNV length was constrained by the 200 bp, cut of set in and CNV detection algorithms. the software. The largest CNVs reported are much longer Because of this low reproducibility in CNV detection, than CNVs reported by other authors: a maximum CNV it is important that data is carefully edited and results length 28 kbp in  and 129,9 kbp in  in comparison validated. In the present study the raw output was rigor- with 4993 kbp for duplications and 4537 kbp for deletions ously edited by discarding CVN variants outside the reported in this study. These differences are probably a length range 50 bp - 5,000,000 bp. CNVs longer than result of the different CNV detection software and five Mbp were classified as artefacts of the alignment validation methods used. Previous results have reported process. The validated dataset retained only 30.28% of that CNVs comprise between 1.74 to 10% of the bovine duplications and 11.50% of deletions initially identified genome [10–12, 25]. in the raw output. It is worth noting that 44% of duplica- tions and deletions detected in the present study fell within Functional annotation or overlapped with CNVs present in the DGVa (https:// CNVs often include functional elements of the genome, www.ebi.ac.uk/dgva) and therefore can be considered as such as genes or regulatory sequences, and thus have a validated. potential to affect phenotypes [6–11]. In the present Mielczarek et al. BMC Genomics (2018) 19:410 Page 8 of 13 study, 29% of duplications and 32% of deletions were breeds having a characteristic spotting phenotype. assigned to SO terms corresponding to gene regions. Contrarily,  observed a duplication nearby segment However, among the 20 most common deletions only of the KIT gene resulting of serial translocation leading one was located within an intronic part of a gene. to differential skin color pigmentation in Brown Swiss Whereas seven of the 20 most common duplications animals. This particular duplication located on BTA6 were in two non-protein coding expressed regions, one was not found in this study for Brown Swiss population. was within a protein coding region and four were within However, we observed an overlapping duplication in one introns. This suggests that deletion events in coding Simmental genome. This founding also overlapped with regions are less evolutionary accepted than duplications. the CNV gain detected by  where bulls representing the Deletions may have a greater impact on phenotype by seven most popular breeds in the United States (including interrupting gene products and causing loss of their Simmental) were investigated. On the other hand, follow- biological functions . ing  study we also observed the duplication on BTA29 in one Brown Swiss genome which were reported in the Inter-individual and inter-breed variation context of color sidedness in cattle. What’s interesting, In this study, a highly significant variation was observed we also detected that MC1R (melanocortin 1 receptor), both in the number and length of duplications and in the whose permanent activation results in black coat colour, number and length of deletions among the 146 animals. whereas loss of function mutations causes red coat colour An inter-individual, breed-independent component was in different cattle breeds , was partly deleted in Brown identified. However, most of the CNVs, comprising Swiss, Norwegian Red and Simmental individuals. 84.85% of duplications and 77.22% of deletions, were Although many breed-specific GO terms and KEGG found in only one bull. A similar proportion was also pathways were identified, we have no recognized any observed by , where 61% of all CNVs were specific to systematic pattern of inter-breed differences. Nevertheless, only one animal. CNVs, with exactly the same breakpoints olfactory receptors genes were reported to be duplicated among all 146 individuals, were not observed in our within the bovine genome suggesting that they may be dataset. Considering CNVs which are common to all indi- under strong selection for newly evolving functions . viduals, it is important to bear in mind that such CNVs This was confirmed here by significantly under – and might be an artefact arising from the animal used to create overrepresented GO terms related to chemical stimulus the reference bovine genome , or artefacts resulting involved in sensory perception of smell in Guernsey, and from assembly problems . The proportion of CNVs Simmental Norwegian Red breeds. located in gene regions differed between breeds. Although, as expected, most of CNVs were located in non-genic Conclusions regions, for the Fleckvieh breed the percent of duplica- Structural genomic variations, especially long deletions tions was higher in genes than in non-genic regions. and duplications, are a common feature in the bovine Fleckvieh also differed from other breeds in as much as it genome. Compared to SNPs and indels, CNVs show a contained a higher proportion of breed-specific duplica- greater inter-individual variability. In the present study a tions. Those duplications seem to reflect the selection large proportion of the variants identified were individual history. Since a large number of duplications, especially specific and are likely to contribute to phenotypic differ- duplications of coding sequences, enhances organism ences between individuals. The diversity of the olfactory genetic diversity by allowing to gain new function by gene family, where several CNVs were identified, reveals duplicated genes . Such diversity may have been the possible role of these structural variants in driving promoted for Fleckvieh as it has always been selected as a functional evolution. While the impact of point mutations, dual purpose breed. Also  observed a high haplotype which are predominantly located in gene promoters acts diversity of Fleckvieh as compared to Simmental, Brown in regulation of expression levels , the impact of struc- Swiss and Spanish cattle. Moreover, the diversity is tural duplications may be in the formation of new genes reflected by a large effective population size estimated by . Also in the present study we observed that common  and being approximately 3 times higher than for the duplications were more often located in genic regions Holstein breed. than common deletions. It is widely known that CNV type polymorphisms may cause differences in the coat color in cattle [26, 27, 37]. Methods In this study we observed that the part of the KIT Material (the Hardy-Zuckerman 4 feline sarcoma viral oncogene Whole genome DNA sequences were generated as homolog) gene which explains a considerable proportion described in . In brief: DNA was isolated from blood of the variation in patterned pigmentation was samples of 155 bulls using a DNA Isolation System, then deleted in Brown Swiss and was present in four remaining libraries were generated from 1 μgofgenomic DNAusing Mielczarek et al. BMC Genomics (2018) 19:410 Page 9 of 13 the Illumina TruseqDNA PCR, and sequenced on the detection was carried out for 146 bulls. CNV were detected IlluminaHiSeq2000 with a 100 cycles of paired-end with the CNVnator and the Pindelprograms. sequencing module using the Truseq SBS kit v3. All The read-depth (RD) algorithm implemented in the animals were selected and sequenced within the frame CNVnator software is based on the comparison of of the Gene2Farm project and represented 13 breeds: genome coverage and assumes that regions with coverage Brown Swiss (48), Fleckvieh (31), Norwegian Red (26), different from the genome average correspond to CNVs Guernsey (20), Simmental (16), Parda de la Montaña . The Pindel program is based on the split-read (SR) (4), Pezzata Rossa Italiana (3), Avileña (2), Bruna Italiana approach, which uses paired-end reads features for CNV (1), Albera (1), Rubia Gallega (1), Toro de Lidia (1) and detection. CNVs detected by the CNVnator software, Pirenaica (1). The total number of raw reads obtained for longer than 5,000,000 bp were discarded as were CNVs a single bull varied between 83,423,880 (a Norwegian Red detected by Pindel which were outside the length range of bull) and 763,594,929 (a Brown Swiss bull). The number 50 bp - 5,000,000 bp. The consensus set was then created, of reads per individual was shown on Additional file 3: using the output of the CNVnator as a baseline data set Figure S3 The length of single read was 101 bp and and each variant, which was also detected by the Pindel the corresponding insert size was 350 bp. Data were software was classified as validated. This validated dataset paired-end type and the average quality of reads per was compared to CNVs available in the Database of bull ranged from 28.11 to 36.69. Genomic Variants archive (DGVa). Only CNVs classified as the gain (duplications) or loss (deletions) of DNA frag- CNV detection and annotation pipeline ment, which is consistent with the CNVnator output, were Annotated CNVs were performed using the following used and other variants available in the database e.g. steps, described in detail below: (i) an alignment to the assigned as “inversions” were excluded. The breakpoint reference bovine genome, (ii) data processing after align- position accuracy implemented in CNVnator was 100 bp, ment, (iii) CNV detection, (iv) validation of CNVs, and therefore, for all comparisons, breakpoint positions within (v) CNVs annotation (Fig. 5). BWA-MEM software  the range 100 bp up- or downstream, were considered as was used to align reads against the UMD 3.1  refer- the same. CNVs were annotated using the Variant Effect ence bovine genome. Post alignment processing was Predictor software  and classified as genic or non-genic done using a collection of tools from the Picard (http:// (defined as described in the Additional file 4:Table S1). broadinstitute.github.io/picard/) and the SAMtools pack- Predicted consequences of deletions or duplications were ages . This step included converting a SAM format assigned according to the Sequence Ontology (SO) classifi- to a BAM format, merging BAM files, sorting reads, cation  for the 20 most common duplications and the removing identical duplicates, and sequence indexing. 20 most common deletions identified in the whole dataset, The average coverage per individual was calculated by as well as for the most common breed-specific duplications using the following formula: and deletions (shared by at least two individuals within a breed). Breed specific CNVs were subjected to enrichment analysis of underlying GO terms [49, 50] and KEGG path- i¼1 coverage ¼ ; ð1Þ ways using the Kobas software [51, 52]. The most common breed specific CNVs were also compared with QTL from the AnimalQTLdb (www.animalgenome.org/). Breed spe- where N denoted the total number of aligned reads, r cific CNVs were analysed for the five most numerous was length in bp of i-th read and d the length of the breeds: Brown Swiss, Guernsey, Fleckvieh, Simmental and reference genome (2697.56 Mb). This value was used to Norwegian Red. exclude individuals with an average genome coverage below seven from downstream analyses. As a consequence, Testing inter-individual and inter-breed variation in CNVs nine individuals (seven Norwegian Red, one Fleckvieh and Inter-individual and the inter-breed variation in the one Parda de la Montaña) were discarded (Fig. 6). In order number of variants was tested separately for duplications to control the alignment process (i) the percent of all and deletions using: aligned reads and (ii) the percent of properly paired reads m 2 (aligned to the same chromosome with the reasonable ðÞ O −E 2 2 χ ¼ χ ; ð2Þ m−1 insert size and oriented towards each other) were deter- i¼1 mined. Because the percent of all aligned reads was fairly high (86.87% in one Brown Swiss bull and from 96.01 to 99.92% for the others) as well as the percent of properly where O denotes the number of duplications/deletions paired reads (from 80.62 to 99.14%.) we did not exclude for i-th individual, E is the average number of deletions/ any other animal from the analysis. Therefore, the CNV duplications identified in the whole dataset and m denotes Mielczarek et al. BMC Genomics (2018) 19:410 Page 10 of 13 Fig. 5 The workflow including CNV detection and annotation pipeline used in this study. White boxes represent particular processes names, while blue boxes represent software. Consensus CNV set constructing and statistical analysis were implemented in self-written scripts the number of bulls. For Brown Swiss, Guernsey, Fleckvieh, R denotes the sum of ranks of the deletion/duplication Simmental and Norwegian Red the χ test was used count in i-th breed. within-breed, where E represents a breed-specific average The null hypothesis that lengths of deletions/duplica- number of deletions/duplications. tions are normally distributed was tested using the In order to test the variability in the number of deletions/ Shapiro-Wilk test: duplications among breeds the Kurskal-Wallis test was performed: aðÞ n X −X i ðÞ n−iþ1 :n i:n i¼1 W ¼ ; ð4Þ m 2 12 R n i 2 X −X H ¼ −3ðÞ m þ 1 χ ð3Þ i¼1 n−1 nnðÞ þ 1 k i¼1 where a represents a constant from Shapiro-Wilk tables, where k is the number of individuals representing i-th n denotes the number of CNVs, X is the length of i-th i i : n breed, and n ¼ k , m is the number of breeds and variant in the sorted vector of variants length. i¼1 Mielczarek et al. BMC Genomics (2018) 19:410 Page 11 of 13 Fig. 6 The average genome coverage per individual. Bulls excluded from further analysis are below the red horizontal line. BSW represents Brown Swiss, FLV Fleckvieh, GUE Guernsey, RED Norwegian Red and SIM Simmental breed. The “other” category contains individuals belonging to breeds such as Parda de la Montaña (4 bulls), Pezzata Rossa Italiana (3), Avileña (2), Bruna Italiana (1), Albera (1), Rubia Gallega (1), Toro de Lidia (1) and Pirenaica (1) As CNV lengths did not follow a normal distribution, each breed were subjected to Bonferroni correction for in order to test whether the distribution of CNV lengths multiple testing. The statistical analysis was performed in is the same for all individuals a Kruskal–Wallis test was R package . Inter-individual within-breed variation applied as in equation (3) but variables were denoted as and inter-breed variation was tested for the Brown Swiss, follows: k was the number of duplications/deletions for Guernsey, Fleckvieh, Simmental and Norwegian Red i-th bull, and k ¼ k , m was the number of bulls breeds. i¼1 and R denoted the sum of ranks for deletion/duplication length corresponding to i-th bull. The same test was Additional files applied to check whether variability in the length of deletions/duplications between breeds exists. Additional file 1: Figure S1. The length of duplications found within The difference in the percentage of genome covered by each breed. BSW represents Brown Swiss, FLV Fleckvieh, GUE Guernsey, CNVs was tested between individuals within-breed with RED Norwegian Red and SIM Simmental breed. (TIFF 22406 kb) the null hypothesis that for each bull the same percentage Additional file 2: Figure S2. The length of deletions found within each breed. BSW represents Brown Swiss, FLV Fleckvieh, GUE Guernsey, of the genome is covered by deletions /duplications. The RED Norwegian Red and SIM Simmental breed. (TIFF 22494 kb) hypothesis was tested using the multiple proportion test: Additional file 3: Figure S3. The number of reads per individual (in millions). (TIFF 21489 kb) l 2 Additional file 4: Table S1. SO terms classified in two, more general d∙ðÞ p −p l i¼1 i F ¼ ∙ ; ð5Þ groups as the non-genic and genic regions. (XLSX 8 kb) l−1 p ∙ðÞ 1−p i¼1 i i Abbreviations where p denotes the observed percentage of the genome aCGH: Array-based comparative genomic hybridization; BAM: Binary version of the i-th individual covered by CNVs, p denotes the of a SAM file; BTA: Bos taurus autosome; BTX: Bos taurus X chromosome; mean of p , d is the length of the reference genome, and BWA: Burrows-Wheeler Aligner; CGH: Comparative genomic hybridization; CNV: Copy number variation; NGS: Next generation sequencing; l is the number of animals representing a given breed. PCR: Polymerase chain reaction; qPCR: Quantitative polymerase chain reaction; Under the null hypothesis, this test statistic follows the RD: Read-depth; SAM: Sequence alignment/map file format; SNP: Single F(l − 1, t) distribution, where t→ ∞.Nominal P-values for nucleotide polymorphism; SR: Split-read; VEP: Variant effect predictor Mielczarek et al. BMC Genomics (2018) 19:410 Page 12 of 13 Acknowledgements 7. Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, et al. Mapping We acknowledge Poznan Supercomputing and Networking Centre for copy number variation by population-scale genome sequencing. Nature. hosting the large computations. Genomic data was provided by ANAPRI, 2011;470(7332):59–65. ANARB, FEAGAS, Qualitas and ZD. Project was also supported by the 8. Liu GE, Bickhart DM. Copy number variation in the cattle genome. Funct Wroclaw Centre of Biotechnology, programme the Leading National Integr Genomics. 2012;12(4):609–24. Research Centre (KNOW) for years 2014-2018. 9. Bickhart DM, Liu GE. The challenges and importance of structural variation detection in livestock. Front Genet. 2014;5:37. Funding 10. Shin DH, Lee HJ, Cho S, Kim HJ, Jae Hwang Y, Lee CK, et al. Deleted copy The research was funded by the European Union’s Seventh Framework number variation of Hanwoo and Holstein using next generation Program for research, technological development and demonstration under sequencing at the population level. BMC Genomics. 2014;15:240. grant agreement 289592 - Gene2Farm and from the Polish National Science 11. Sasaki S, Watanabe T, Nishimura S, Sugimoto Y. Genome-wide identification Centre (grant 2014/15/N/NZ9/03914). of copy number variation using high-density single-nucleotide polymorphism array in Japanese black cattle. BMC Genet. 2016;17:26. Availability of data and materials 12. Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, et All genomic data was part of the Gene2farm European Project al. Copy number variation of individual cattle genomes using next- (www.gene2farm.eu) within the framework of the European Union’s Seventh generation sequencing. Genome Res. 2012;22(4):778–90. Framework Program for research, technological development and 13. Choi JW, Lee KT, Liao X, Stothard P, An HS, Ahn S, et al. Genome-wide copy demonstration under grant agreement 289,592. Data belongs to ANAPRI, number variation in Hanwoo, black Angus, and Holstein cattle. Mamm ANARB, FEAGAS, Qualitas and ZuchtData and are available on request and Genome. 2013;24:151–63. upon individual agreements. Data request should be addressed to the 14. Santana MH, Junior GA, Cesar AS, Freua MC, da Costa Gomes R, da Luz E, Gene2farm project coordinator E. L. Nicolazzi (email@example.com). Silva S, et al. Copy number variations and genome-wide associations reveal putative genes and metabolic pathways involved with the feed conversion Authors’ contributions ratio in beef cattle. J Appl Genet. 2016;4:495–504. MM and JS designed the study. MM performed CNV detection, annotation 15. Xu L, Hou Y, Bickhart DM, Zhou Y, Hay el HA, Song J, et al. Population- and part of the statistical analyses. MF suggested and performed the genetic properties of differentiated copy number variations in cattle. Sci statistical analyses. MM, JS and MF wrote the draft of manuscript. ELN and Rep. 2016;23(6):23161. JLW contributed to the concept of the study and improved the manuscript. 16. Bae JS, Cheong HS, Kim LH, NamGung S, Park TJ, Chun JY, et al. All authors read and approved the final manuscript. Identification of copy number variations and common deletion polymorphisms in cattle. BMC Genomics. 2010;11:232. Ethics approval and consent to participate 17. Hou Y, Liu GE, Bickhart DM, Cardone MF, Wang K, Kim ES, et al. Real data was provided by animal breeding companies such as ANAPRI, ANARB, Genomic characteristics of cattle copy number variations. BMC FEAGAS, and ZuchtData within the framework of the Gene2farm European Genomics. 2011;12:127. Project (www.gene2farm.eu). Therefore, data recording followed the 18. Hou Y, Bickhart DM, Hvinden ML, Li C, Song J, Boichard DA, et al. Fine International Committee for Animal Recording (ICAR) approved guidelines. mapping of copy number variations on two cattle genome assemblies using high density SNP array. BMC Genomics. 2012;13:376. Competing interests 19. Cicconardi F, Chillemi G, Tramontano A, Marchitelli C, Valentini A, Ajmone- The authors declare that they have no competing interests. Marsan P, Nardone A. Massive screening of copy number population-scale variation in Bos taurus genome. BMC Genomics. 2013;14:124. 20. Zhang Q, Ma Y, Wang X, Zhang Y, Zhao X. Identification of copy number Publisher’sNote variations in Qinchuan cattle using BovineHD genotyping Beadchip array. Springer Nature remains neutral with regard to jurisdictional claims in Mol Gen Genomics. 2015;290(1):319–27. published maps and institutional affiliations. 21. Gurgul A, Jasielczuk I, Szmatoła T, Pawlina K, Ząbek T, Żukowski K, Bugno- Poniewierska M. Genome-wide characteristics of copy number variation in Author details polish Holstein and polish red cattle using SNP genotyping assay. Genetica. Biostatistics group, Department of Genetics, Wroclaw University of 2015;143(2):145–55. Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland. 22. Fadista J, Thomsen B, Holm LE, Bendixen C. Copy number variation in the National Research Institute of Animal Production, Krakowska 1, 32-083 bovine genome. BMC Genomics. 2010;6(11):284. Balice, Poland. Council on Dairy Cattle Breeding (CDCB), 4201 Northview Dr, 23. Liu GE, Hou Y, Zhu B, Cardone MF, Jiang L, Cellamare A, et al. Analysis of Bowie, MD 20716, USA. Davies Research Centre, University of Adelaide, copy number variations among diverse cattle breeds. Genome Res. 2010; School of Animal and Veterinary Sciences, Roseworthy, SA 5371, Australia. 20(5):693–703. 24. Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and Received: 25 January 2018 Accepted: 22 May 2018 genotyping. Nat Rev Genet. 2011;12:363–76. 25. Boussaha M, Esquerré D, Barbieri J, Djari A, Pinton A, Letaief R, et al. Genome-wide study of structural variants in bovine Holstein, References Montbéliarde and Normande dairy breeds. PLoS One. PLoS One. 2015; 1. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, et al. Origins 10(8):e0135931. and functional impact of copy number variation in the human genome. 26. Qanbari S, Pausch H, Jansen S, Somel M, Strom TM, et al. Classic selective Nature. 2010;464:704–12. sweeps revealed by massive sequencing in cattle. PLoS Genet. 2014;10(2): 2. Handsaker RE, Van Doren V, Berman JR, Genovese G, Kashin S, Boettger LM, e1004148. McCarroll SA. Large multiallelic copy number variations in humans. Nat 27. Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME. Genetic Genet. 2015;47:296–303. architecture of complex traits and accuracy of genomic prediction: coat 3. Locke MEO, Milojevic M, Eitutis ST, Patel N, Wishart AE, Daley M, Hill KA. colour, milk-fat percentage, and type in Holstein cattle as contrasting model Genomic copy number variation in Mus musculus. BMC Genomics. 2015; traits. Georges M, ed PLoS Genet. 2010;6(9):e1001139. 16(1):497. 28. Li J, Wang Y, Fan X, Mo X, Wang Z, Li Y, et al. ZNF307, a novel zinc finger 4. Bickhart DM, Xu L, Hutchison JL, Cole JB, Null DJ, Schroeder SG, et al. gene suppresses p53 and p21 pathway. Biochem Biophys Res Commun. Diversity and population-genetic properties of copy number variations and 2007;363(4):895–900. multicopy genes in cattle. DNA Res. 2016;3(3):253–62. 5. Keel BN, Keele JW, Snelling WM. Genome-wide copy number variation in 29. Meacham F, Boffelli D, Dhahbi J, Martin DK, Singer M, Pachter L. the bovine genome detected using low coverage sequence of popular Identification and correction of systematic error in high-throughput beef breeds. Anim Genet. 2017;48(2):141–50. sequence data. BMC Bioinf. 2011;12:45. 6. Zhang F, Gu W, Hurles ME, Lupski JR. Copy number variation in human health, 30. Li H. Towards better understanding of artifacts in variant calling from high- disease, and evolution. Annu Rev Genomics Hum Genet. 2009;10:451–81. coverage samples. Bioinformatics. 2014;15;30(20):2843–51. Mielczarek et al. BMC Genomics (2018) 19:410 Page 13 of 13 31. Zhan B, Fadista J, Thomsen B, Hedegaard J, Panitz F, Bendixen C. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping. BMC Genomics. 2011;14;12:557. 32. Stothard P, Choi JW, Basu U, Sumner-Thomson JM, Meng Y, Liao X, Moore SS. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery. BMC Genomics. 2011;12:559. 33. Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, Blayney ML, et al. Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nat Genet. 2008;40:90–5. 34. Chain FJJ, Feulner PGD, Panchal M, Eizaguirre C, Samonte IE, Kalbe M, et al. Extensive copy-number variation of young genes across stickleback populations. PLoS Genet. 2014;10(12):e1004830. 35. Sánchez-Molano E, Tsiokos D, Chatziplis D, Jorjani H, Degano L, Diaz C, et al. A practical approach to detect ancestral haplotypes in livestock populations. BMC Genet. 2016;17:91. 36. Boitard S, Rodríguez W, Jay F, Mona S, Austerlitz F. Inferring population size history from large samples of genome-wide molecular data - an approximate Bayesian computation approach. PLoS Genet. 2016;12(3):e1005877. 37. Durkin K, Coppieters W, Drögemüller C, Ahariz N, Cambisano N, Druet T, et al. Serial translocation by means of circular intermediates underlies colour sidedness in cattle. Nature. 2012;482:81–4. 38. Ignatieva EV, Levitsky VG, Yudin NS, Moshkin MP, Kolchanov NA. Genetic basis of olfactory cognition: extremely high level of DNA sequence polymorphism in promoter regions of the human olfactory receptor genes revealed using the 1000 genomes project dataset. Front Psychol. 2014;5:247. 39. Niimura Y, Nei M. Evolution of olfactory receptor genes in the human genome. Proc Natl Acad Sci U S A. 2003;100(21):12235–40. 40. Szyda J, Frąszczak M, Mielczarek M, Giannico R, Minozzi G, Nicolazzi EL, et al. The assessment of inter-individual variation of whole-genome DNA sequence in 32 cows. Mamm Genome. 2015;28(11):658–65. 41. Li H, Durbin R. Fast and accurate short read alignment with burrows- wheeler transform. Bioinformatics. 2009;25:1754–60. 42. Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, et al. A whole- genome assembly of the domestic cow, Bos taurus. Genome Biol. 2009;10: R42. 43. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009; 25:2078–9. 44. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21:974–84. 45. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25(21):2865–71. 46. Medvedev P, Stanciu M, Brudno M. Computational methods for discovering structural variation with next-generation sequencing. Nat Methods. 2009;6: S13–20. 47. McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Enssembl API and SNP effect predictor. Bioinformatics. 2010;26(16):2069–70. 48. Eilbeck K, Lewis SE, Mungall JC, Yandell M, Stein L, Durbin R, Ashburner M. The sequence ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6:R44. 49. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2011;25: 25–9. 50. The Gene Ontology Consortium. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 2017; 4;45(D1):D331–8. 51. Xie C, Mao X, Huang J, Ding Y, Wu J, Dong S, et al. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res. 2011;39:W316–22. 52. Wu J, Mao X, Cai T, Luo J, Wei L. KOBAS server: a web-based platform for automated annotation and pathway identification. Nucleic Acids Res. 2006; 34:W720–4. 53. R Development Core team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; 2013. https://cran.r- project.org/doc/FAQ/R-FAQ.html. 54. Hou Y, Liu GE, Bickhart DM, Matukumalli LK, Li C, Song J, et al. Genomic regions showing copy number variations associate with resistance or susceptibility to gastrointestinal nematodes in Angus cattle. Funct Integr Genomics. 2012;12(1):81–92.
– Springer Journals
Published: May 29, 2018