Genome-wide mapping of large deletions and their population-genetic properties in dairy cattle

Genome-wide mapping of large deletions and their population-genetic properties in dairy cattle Large genomic deletions are potential candidate for loss-of-function, which could be lethal as homozygote. Analysing whole genome data of 175 cattle, we report 8,480 large deletions (199 bp–773 KB) with an overall false discovery rate of 8.8%; 82% of which are novel compared with deletions in the dbVar database. Breakpoint sequence analyses revealed that majority (24 of 29 tested) of the deletions contain microhomology/homology at breakpoint, and therefore, most likely generated by microhomology-mediated end joining. We observed higher differentia- tion among breeds for deletions in some genic-regions, such as ABCA12, TTC1, VWA3B, TSHR, DST/BPAG1, and CD1D. The genes overlapping deletions are on average evolutionarily less con- served compared with known mouse lethal genes (P-value ¼ 2.3  10 ). We report 167 natural gene knockouts in cattle that are apparently nonessential as live homozygote individuals are observed. These genes are functionally enriched for immunoglobulin domains, olfactory recep- 22 22 6 tors, and MHC classes (FDR ¼ 2.06  10 , 2.06  10 , 7.01  10 , respectively). We also demonstrate that deletions are enriched for health and fertility related quantitative trait loci 10 11 (2-and 1.5-fold enrichment, Fisher’s P-value ¼ 8.91  10 and 7.4  10 , respectively). Finally, we identified and confirmed the breakpoint of a 525 KB deletion on Chr23:12,291,761- 12,817,087 (overlapping BTBD9, GLO1 and DNAH8), causing stillbirth in Nordic Red Cattle. Key words: dairy cattle, structural variants, whole genome sequence, population genetics, loss-of-function 1. Introduction like milk and protein yield. An estimated yearly loss of $10.74 Embryonic lethality has become a challenge to cattle breeders, espe- million is attributed to known recessive lethals in four dairy cattle cially for dairy cattle where a limited number of bulls were exten- breeds from USA only, where Holstein accounts for 70% of the to- sively used in breeding for fast genetic progress in economic traits tal losses, followed by Jersey, Brown Swiss, and Ayrshire. Hence, V The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 journals.permissions@oup.com 49 by Ed 'DeepDyve' Gillespie user on 16 March 2018 50 Genomic deletions in dairy cattle understanding the genomic architecture of cattle populations is im- software to produce BAM files for subsequent variant calling. In portant, now more than ever, for optimizing genetic gain while con- 1KBGP, SNPs and indels were called using ‘SAMtools 0.1.18 mpi- 26 27 straining negative impact of deleterious mutations responsible for leup’ software, while ‘GATK v1.6’ software was used for Nordic 20 21 genetic defects and inbreeding depression. WGS data (detailed method in and, respectively). For all the Unlike single nucleotide polymorphism (SNP) and small insertion or analysis, bovine genome assembly ‘UMD3.1’ was used as the refer- deletion (indel), structural variants (SVs), i.e. DNA alterations larger ence genome. than 50 base pairs (bp) that include insertions, deletions, duplications, inversions, and translocations, are the least explored polymorphisms in 2.3. Discovery and genotyping of deletions cattle. SVs contribute substantially to phenotypic variations and have a SVs can be detected from NGS data based on sequence signatures wide-spectrum of impact ranging from beneficial to lethal in both hu- such as (discordant) read-pair (RP), split-read (SP), and read-depth 3,4 5 mans and animals. The phenotypic impact of SVs in cattle is well evi- (RD), as well as de novo assembly of reads. However, approaches dent from numerous studies. For example, Xu et al. showed that a based on only one sequence signature could be constrained by high combination of SNPs with SVs could explain additional genetic variance false discovery rate (FDR), hence we employed a population scale 7 8 underlying milk production traits, while Charlier et al., Schutz et al., SV detection method called ‘Genome STRucture in Populations and Kadri et al., showed the lethal effect of large deletions in dairy cat- (Genome STRiP)’ —which leverages technical (e.g. RP and RD sig- tle. Furthermore, a 525 KB deletion on chromosome 23 is reported to nals) and population-level sequence features (e.g. coherence around be associated with stillbirth in Nordic Red Cattle. shared alleles, and heterogeneity of evidentiary sequences in different Earlier SV studies on cattle were mostly SNP-array based, such as genomes) for accurate discovery of deletions, and determines geno- array-comparative genomic hybridization, 50 K BovineSNP50 type (allelic state) of each locus from RD using a Gaussian mixture 12 13 BeadChip or 777 K BovineHD BeadChip (BovineHD chip) model. based. But, using these approaches a substantial portion of the ge- nome could not be explored and breakpoint resolution is still an is- 14 2.3.1. Genome STRiP sue. However, whole-genome sequence (WGS) based techniques For deletion discovery and genotyping ‘Genome STRiP’ software version could improve resolution as well as power to capture SVs in a wide 14 2.00.1678 was used. Following the documentation, we built a custom size and frequency spectrum. For example, majority of the novel 15,16 17 reference metadata bundle for cattle samples that includes alignability SVs in humans and mouse were detected using WGS mask, copy-number mask (CN2 mask), ploidy map, gender map. approaches. Besides, breakpoint sequences could also be assembled 18 Alignability mask represents sites on the reference genome that are with high accuracy from sequencing reads, which are necessary for 19 uniquely alignable by sequence read of a certain length (readLength). elucidating the mechanisms underlying SV formation. Our WGS data was a mixture of different ‘Illumina’ paired-end reads In the advent of next-generation sequencing (NGS) techniques, hun- rangingfrom90to101bp (Q1 ¼ 90, median ¼ 100, and Q3 ¼ 100), dreds of cattle (bull or cow) genomes were re-sequenced in collaborative 20 hence genome alignability mask was prepared with readLength value of initiative such as 1000 Bull Genomes Project (1KBGP) (and other inde- 21,22 90 using ‘ComputeGenomeMask’ utility from Genome STRiP. Copy- pendent projects, e.g. ) to build a comprehensive database of sequence number mask (CN2 mask), i.e. regions on the reference genome unlikely variants, mainly SNPs and indels. This NGS data provides a unique op- 23,24 to be copy-number variable in most individuals, was produced for the portunity to study SVs in cattle. However, few studies utilized these bovine assembly UMD3.1 excluding sex chromosome X, unplaced con- (and/or other) NGS resources so far for studying SVs in cattle. tigs, and repeat sequences (retrieved from RepeatMasker track of UCSC Therefore, in this study we scanned the WGSs of 175 cattle from Table Browser, accessed on 4 July 2016). three dairy breeds, namely Holstein, Jersey, and Nordic Red Cattle, to discover large deletions segregating in the population, and analyse 2.3.2. Pre-processing, deletion discovery and their population-genetic properties. In particular, we focused on understating the population diversity, stratification, and plausible genotyping functional effects. We also explored the probable mechanisms of SV We ran the preprocessing Queue script (dry run) to emit all the com- formation for a set of breakpoint-resolved deletions. mands, prepared bash scripts to run in Portable Batch System job scheduler, and executed these commands proving 175 BAM files (one for each sample) as input. 2. Materials and methods Large deletions (100 bp  size  1 MB) were discovered and fil- tered using SVDiscovery Queue script. Discovered sites were filtered 2.1. Animal samples and ethics (default filters) if (i) the site contained too high or too low read This study was performed on WGS of 175 dairy cattle from three pileup, (ii) RPs spacing was inconsistent with a single segregating de- breeds, e.g. 67 Holstein, 27 Jersey, and 81 Nordic Red Cattle. The letion, (iii) RD and RP evidences were inconsistent across samples, sample included 7 Holstein cows and 168 bulls from these three 20 (iv) RD differences were not significant, and (v) RP evidence was breeds144 animals from Run 5 of 1KBGP and 31 animals from 21 thinly distributed across samples (Genome STRiP Tutorial—GATK Nordic sequence data. Genome sequences were generated using Workshop 2013, http://software.broadinstitute.org/software/genome Illumina paired-end sequencing to an average coverage of 10-fold. strip/workshop-presentations, accessed on 26 August 2016). Here, we did not include any experimentation on animals and only dealt All passed sites were genotyped by SVGenotyper with default pa- with analysis-ready WGS data; hence, no ethical approval was required. rameters. Genotyped deletion calls were then filtered based on fol- lowing criteria, e.g. (i) sites with excess number of heterozygote calls 2.2. Sequence alignment to reference genome and (inbreeding coefficient 0.15), (ii) non-variant site based on geno- SNPs/indels calling type likelihood (parameter: non-variance score  13.0), (iii) sites Raw sequencing reads were filtered and ‘FASTQ’ files were aligned with too low or too high RD (parameter: 0.5  GSM1  2.0), to bovine reference genome assembly ‘UMD3.1’ using ‘BWA’ (iv) sites with less than 30% uniquely alignable bases, (v) potential Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 51 duplicate of another site (parameter: duplicateOverlapThreshold 0.5 2.5. Analysis of population genetic properties and duplicateScoreThreshold  0.0), (vi) start/end position of a dele- The population genetic properties of deletions, among the three tion call within 150 bp of assembly gap, (vii) all samples homozygous breeds, were studied in terms of population diversity, population struc- for reference allele (95% CI), and (viii) sites with  10% missing ture, and population differentiation. Population diversity was calcu- genotype. lated using ‘VariantsPerSampleAnnotator’ from ‘Genome STRiP’ software, which provides distribution of variants across samples and populations. We performed principal component analysis (PCA) using 2.4. Validation of deletions PLINK (v1.90p) software to distinguish three cattle breeds (details 2.4.1. Validation using 777K BovineHD BeadChip in Supplementary Material). We calculated V —a population strati- ST intensity data fication measure of SVs (highly correlated with Wright’s fixation in- We validated deletion calls using 777K BovineHD BeadChip dex, F ), for each deletion locus using variant allele frequency ST (Illumina, San Diego, CA, USA) intensity data on 26 Holstein sam- (VAF) and genotypes from pairwise comparison of one breed with the ples that were both WGS and 777K chip typed. We calculated FDR rest, such as Holstein vs Jersey þ Nordic Red Cattle, and vice versa. for the deletion call-set using IntensityRankSum (IRS) test imple- mented in Genome STRiP. Intensity file was prepared from raw chip 2.6. Functional annotation and enrichment analysis intensity data following the guideline for IRS test. Overall FDR for Functional annotation of deletions were performed using ‘Variant the call-set was calculated as two times the fraction of sites with IRS Effect Predictor (VEP-87)’ software, and enrichment of protein do- P-value  0.5 (i.e. sites with IRS P-value  0.5 to the sites with valid 36 37 38 mains (InterPro and Pfam ) and pathways (KEGG ) were 15,28 P-value). Details of IRS test could be found in. analysed using ‘STRING-v10 database’. Selective constraints on genes were measured from the ratio of non- synonymous (dN) to synonymous (dS) substitution rate, i.e. dN/dS ratio, 2.4.2. Validation by targeted assembly of breakpoint between cow-mouse 1-to-1 orthologues downloaded from Ensembl Targeted iterative graph routing assembler (TIGRA-0.4.3) soft- database (release 87, last accessed on 21 February 2017) using ware was used, with default parameters, for assembling deletion BioMart. Here we analysed whether dN/dS of genes overlapping dele- breakpoint sequences from a set of randomly selected deletions along tions are higher (i.e. less constrained) than that of mouse lethal genes with three previously known deletions segregating in the study popu- (from Dickinson et al. ) using Wilcoxson test. Reported causal genes lations. TIGRA extracted all reads mapped to 500 bp upstream and for cattle were also retrieved from OMIA database (http://omia.angis. 50 bp downstream of start coordinate, and 50 bp upstream and org.au/, last accessed on 10 May 2016) for dN/dS comparison. 500 bp downstream of end coordinate of a given deletion; and reads We retrieved cattle quantitative trait loci (QTL) from QTLdb data- were then assembled iteratively using de Bruijn graph assembler with base (release 31; accessed on 6 January 2017); autosomal QTL from multiple k-mers (e.g. 15 bp followed by 25 bp). We aligned the as- Holstein, Jersey, Nordic Red Cattle and Ayrshire, associated to any of sembled contigs to UMD3.1 using Cow BLAT Search from UCSC the six trait classes, e.g. ‘Reproduction’, ‘Milk’, ‘Production’, Genome Browser (https://genome.ucsc.edu/cgi-bin/hgBlat) to visual- ‘Exterior’, ‘Meat and Carcass’, and ‘Health’, were considered for QTL ize and infer breakpoints from the alignments. enrichment analysis. We calculated fold enrichment for a trait, such as for Health related QTL: (No. of Health QTL on Deletions = Total QTL on deletions) / (Total Health QTL = Total QTL in the dataset), 2.4.3. Validation by PCR and amplicon sequencing and statistical significance using ‘Fisher’s exact test’ (two sided). We validated a previously reported 525 KB deletion segregating in Nordic Red Cattle using PCR and amplicon sequencing. Genomic DNA was extracted as described previously by Miller 2.7. Data manipulation, visualization, and statistical et al. from semen sample of two bulls carrying the deletion and analysis two non-carriers. The PCR reaction was done with the All statistical analyses and plots were generated in RStudio soft- DyNAzyme II DNA Polymerase (Thermo Fisher, MA, US) in a 30 44 45 ware running R software version 3.3.2, unless mentioned other- ml volume of 1 PCR buffer, 0.2 mM dNTPs, 10 pmol primer mix wise. BEDTools (v2.26.0) software is used for identifying the 0 0 (forward primer: 5 - AAGCCACCACAATGAGAAGC -3 and re- overlap between deletion calls and other genomic features, such as, 0 0 verse primer: 5 - TTTGGGGTAGGAGAAGTAGGG -3 )and ‘UMD3.1’ assembly gaps (from ‘UCSC Table Browser’), CNVs from 50 ng of genomic DNA. The cycling conditions were the following: 47 7,9,10 dbVar database , three known deletions from , QTL from (i) an initial denaturation at 95 C for 3 min, (ii) 35 cycles of 30 s QTLdb. VCFtools (v0.1.15) software and PLINK (v1.90p) software denaturation (94 C), 30 s hybridization (65.2 C), 30 s elongation were used for analysing the VCF file. (72 C), and a final 3 min elongation (72 C). PCR products were separated on a 2% agarose gel, purified and directly sequenced us- ing the BigDye Terminator Cycle Sequencing Kit (Applied 3. Results and discussion Biosystems, CA, US). Electrophoresis of sequencing reactions 3.1. Discovery and genotyping of deletions was performed on ‘3500xL Genetic Analyzers’ (Applied Biosystems, CA, US), and sequences were visualized with Deletion discovery and genotyping were carried out using Genome Sequencher 5.4.6 (Gene Codes Corporation, MI, USA). A 977 bp STRiP. After filtering, we report 8,480 large deletions with genotypes control amplification, with a primer pair within the deletion (for- in 67 Holsten, 27 Jersey, and 81 Nordic Red Cattle. The deletion 0 0 ward primer: 5 - CCCAATGCAAAATCACAAAA -3 and reverse size ranged from 199 bp to 773 KB with a mean of 4.5 KB (median ¼ 0 0 primer: 5 - CCAGAAAAGCTACACTTGAACTGA -3 ), was per- 1 KB), which is approximately 10 times smaller compared with 184 formed using the same reaction conditions as above except hybrid- deletion-CNVs (mean ¼ 44.5 KB, median ¼ 7.7 KB) reported in a re- ization was performed at 59.8 C. cent 777K BovineHD BeadChip (BovineHD chip) based study, Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 52 Genomic deletions in dairy cattle Figure 1. Number of ascertained deletions relative to variant allele count. Here, VAF is expressed in terms of variant allele count. Deletions down to an allele count of 1 (VAF ¼ 0.0026 and 0.0032, in cattle and humans, respectively) are also represented here. Human deletion calls by Mills et al. were downloaded from 1K Genomes Project FTP server (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/paper_data_sets/companion_papers/mapping_structural_variation). Table 1. FDR estimates of Genome STRiP’ deletion calls using reflecting the resolution of our sequence-based calls. Only 18% of 777K BovineHD BeadChip intensity data the deletion calls have overlap (1 bp) with previously reported bo- a b c vine deletions (or CNV-loss) in the dbVar database (accessed on 27 Array-probe Overlap A B FDR January 2017), while remaining 82% are novel. However, 72% of One-probe 497 28 11.3% our deletion regions remained unique when compared with all CNVs >1 array-probe 206 3 2.9% (gain or loss) and copy number variable regions in the database. >2 array-probe 113 1 1.8% Interestingly, majority (80%) of these overlapping regions are from Overall 703 31 8.8% an earlier WGS-based study, where genome sequences of 27 Holstein, 17 Montbe ´ liarde, and 18 Normande bulls were analysed. A, No. of sites with P-value. Nonetheless, we were able to broaden the accessible deletion size- B, No. of sites with P-value  0.5 range, more importantly towards smaller one unascertainable by FDR estimates were based on ‘Wilcoxson rank sum test’ using BovineHD chip intensity data of 26 Holstein animals. FDR was calculated usual SNP-array based approaches. We also report high quality ge- as (B/A 3 2 3 100). notypes for all the 8,480 deletions. Apparently, there are more low frequency variants than that of high frequency one, and the fre- quency distribution is very similar to humans (Fig. 1). Previous NGS-based studies on cattle were mostly limited to SV discovery, while copy-number states (genotypes) were inferred using 3.2.1. BovineHD chip intensity 24 23 BovineHD chip, or custom SNPs array. However, in this study We used 777K BovineHD chip intensity data of 26 Holstein animals, we estimated the copy-number at each deletion locus (per sample) both chip-typed and sequenced, to validate the deletion calls using from RD within the region using a constrained Gaussian mixture Genome STRiP’ IRS test. We had partial power to investigate all de- model with three classes, e.g. copy-number zero (i.e. homozygous de- letions due to the sparsity of the array-probes (one probe per letion), one (i.e. heterozygous deletion) and two (i.e. homozygous 3.5 KB), and were underpowered to accurately verify small dele- reference). It is known from human studies that majority of the (bi- tions (e.g. overlapping one-probe). Furthermore, we could only test a 48,49 allelic) common SVs segregate on specific SNP haplotypes, deletion for which at least one of the 26 samples had non-reference 28,50 which could be imputed with high accuracy. Thus, this approach allele. Therefore, an estimate of FDR for the deletion call-set is pro- has the potential, albeit with large reference, for accurate haplotype vided here from the overall P-value distribution. In this approach, we phasing and imputation of SVs to large cohorts of low-density chip- were able to interrogate 8.3% of the total call, majority of which typed animals with no additional cost. contain a single array-probe within the region (Table 1 and Supplementary Table S1). We found that deletions overlapping only 3.2. Validation of deletions one array-probe had higher FDR (11.3%) compared with two or We validated the results using three approaches: (i) using BovineHD more probes. And finally, we showed that our deletion call-set had chip intensity data, (ii) breakpoint assembly and alignment, and (iii) an overall FDR of 8.8%, which is within our chosen threshold of PCR þ sequencing of amplicons. FDR  10%. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 53 1,229 and 1,196, in Nordic Red Cattle, Holstein, and Jersey, respec- 3.2.2. Targeted breakpoint assembly tively (Fig. 4a). In contrast to heterozygosity, Jersey animals showed We next validated three randomly chosen set of ten-deletions each by highest levels of deletion-homozygosity, followed by Holstein assembling breakpoint sequences using TIGRA :10 deletions (Fig. 4b). Similar estimates of genetic diversity were also reported for 500 bp, 10 deletions > 500 bp but  1 KB, and 10 deletions with VAF these breeds in a SNP heterozygosity and runs-of-homozygosity anal- 0.10. Out of the thirty, we successfully resolved breakpoints of 26 ysis—where Jersey exhibited lowest (genome-wide) average nucleo- deletions (87% success rate) using a combination of TIGRA and tide diversity (and higher number/size of runs-of-homozygosity) BLAT search (Supplementary Table S2). Additionally, we assembled followed by Holstein and Nordic Red Cattle. These differences breakpoints of three previously reported deletions that were also segre- could be understood from the current effective population size (N ) gating in our study populations, such as a 662 KB deletiononchro- of these breeds, e.g. N of Jersey, Holstein, and Nordic Red Cattle mosome 12 encompassing RNASEH2B, GUCY1B2,and FAM124A, are 73, 99, and 106, respectively ; this entailed higher diversity in a 3 KB deletion on chromosome 21 encompassing FANCI,and a Nordic Red Cattle, and Holstein, than in Jersey. From singletons es- 525 KB deletion on chromosome 23 encompassing BTBD9, GLO1, timate it is also evident that Nordic Red Cattle has more rare dele- and DNAH8. Breakpoints of the former two deletions were previously 7,9 tions compared with Holstein and Jersey (Fig. 4c); partly could be reported in, which exactly matched with our predicted breakpoint due to incorrect ascertainment of rare ones. sequences (Supplementary Figs S1a–c and S2a–c). Although for the later, we resolved breakpoint sequences in this study (Fig. 2a and b). Overall, the success rate of our deletion-breakpoint assembly was bet- 3.3.2. Principal component analysis ter than the reported success rate of TIGRA on similar sized read- We performed PCA using the deletion genotypes of the samples. length. And Genome STRiP’s breakpoint predictions were on aver- Around 6 K deletions with VAF between 0.02 and 0.90 were used in age within 20 bp of the validated breakpoint, which is within the tool’s the analysis. For comparison, we also performed PCA on 168 K bi- reported estimate of 1–20 bp. allelic SNPs randomly selected from 29 autosomes of the same indi- viduals. The first two principal components (PCs) from both deletion and SNP-based PCA clearly distinguished the three breeds, 3.2.3. PCR and amplicon sequencing and jointly explained 20 and 16.2% of the variance, respectively We then experimentally validated breakpoints for ‘Chr23:12,291,761- (Fig. 5a and b). In addition, PC3 and PC4 recapitulated substructures 12,817,087’ deletion, previously reported to be associated with still- within Nordic Red cattle (Supplementary Fig. S3), and first five PCs birth in Nordic Red Cattle. Four animals were used for PCR valida- cumulatively explained 33.6% (with the deletions) and 28.4% (with tion: two carriers and two non-carriers. The breakpoint spanning PCR the SNPs) of the variance (Supplementary Fig. S4). Our deletion re- products of 359 bp were only observed in the carrier animals, while no sults agree with the known population structure of the three breeds. amplicon was seen for non-carriers (Fig. 3a). The 359 bp amplicon Similar population structure (and substructure within Nordic Red was then sequenced, and exact breakpoint sequences were observed Cattle) has been reported using genome-wide SNPs. Nordic Red (Fig. 3b), thus confirming the breakpoint for this deletion. Cattle from Denmark showed closer relationship with the Holstein in our WGS samples (Supplementary Fig. S5). This is consistent with 3.3. Population genetic properties of deletions the known history of Holstein interrogation in Danish Red cattle, as 3.3.1. Population diversity previously reported based on imputed WGS SNP analysis on a larger We explored population diversity among the three dairy cattle breeds sample. It is also known from admixture analysis that genomes of from per-individual deletion-heterozygosity and homozygosity. We Nordic Red Cattle are a mosaic of multiple ancestral populations, found that individuals from Nordic Red Cattle exhibits 3.5 and i.e. more ancestral components in Nordic Red Cattle than in 52,54 6.4% higher deletion-heterozygosity than in Holstein and Jersey, re- Holstein and Jersey ; our deletion-based PCA largely corroborate spectively. Median numbers of heterozygote-deletion were 1,272, that. Figure 2. A 525-KB deletion on chromosome 23 discovered using Genome STRiP (a), and resolved breakpoint sequences from TIGRA and BLAT search (b). Shaded bases are a 5-bp microhomology at breakpoint junction. (This figure was drawn and modified using Inkscape version 0.91.) Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 54 Genomic deletions in dairy cattle Figure 3. Experimental validation of the 525KB deletion on chromosome 23. (a) PCR amplification across (left) and within the deletion (right) for two carrier (D/þ) and two homozygous wild-type (þ/þ) animals. Water, negative control; M, molecular weight marker (GeneRuler 100bp DNA ladder, Fermentas). (b) Sequence trace of the 359bp amplicon bridging the breakpoint. Shaded bases are a 5-bp microhomology at breakpoint junction. (This figure was drawn and modified using Inkscape version 0.91.) Chr20:26,812,159-26,812,834 (in Holstein and Jersey) overlap QTL as- 3.3.3. V analysis ST 59,60 sociated with calving traits, Chr20:45,816,245-45,820,519 (in We analysed population stratification in terms of V , a measure ST Holstein & Nordic Red Cattle) with meat and carcass trait, and highly correlated with Wright’s fixation index (F ), to identify popu- ST Chr23:49,778,653-49,782,567 (in Holstein and Nordic Red Cattle) lation differentiation. We calculated V for each deletion pairwise ST with body weight. We also identified a highly differentiated fertility as- amongst the breeds (e.g. Holstein vs Jersey þ Nordic Red Cattle) from 63,64 sociated gene DST/BPAG1 (Chr23:3,486,232-3,486,603) in VAFs. We identified 158 highly stratified deletions (pairwise V ST Nordic Red Cattle. One differentially selected deletion (V ¼ 0.28) of ST mean þ 4 S.D.) among the breeds (Fig. 6). Around 27% of these dele- chromosome 3 (Chr3:12,141,822-12,170,916) overlapping ENSBTAG tions overlap genic elements, i.e. exons, introns, or (upstream/down- 00000047776 and ENSBTAG00000024960 genes (human orthologue stream) untranslated regions (UTRs), and remaining 73% are intergenic CD1D), drawn our attention (though it marginally failed our selection variants (Supplementary Tables S3–S5). There were eleven sites shared threshold); Holsteins exhibited VAF of 24.63% (have both homozygous between Holstein and Nordic Red Cattle, two sites between Holstein and heterozygous deletion), while it is mostly homozygous for the and Jersey, and one site between Nordic Red Cattle and Jersey. Among reference allele in Nordic Red cattle (VAF ¼ 0.62%) and Jersey these sites were gene variants, such as ABCA12 (Chr2:103,682,772- (VAF ¼ 0.0%). The CD1D gene has known function in host immune 103,684,297 in Holstein & Jersey) associated with growth and develop- 65,66 55,56 response and parasite resistance, and also reported differentially ex- ment, TTC1 (Chr7:73,725,513-73,725,918 in Holstein & Nordic pressed post intra-mammary infection. Majority of these stratified de- Red Cattle) with cold tolerance, VWA3B (Chr11:3,521,329- letion regions are novel compared with previous CNV studies in cattle, 3,522,551 in Holstein & Nordic Red Cattle) with milk glycosylated and therefore are interesting targets to investigate large deletions under- kappa-casein percentage, and were intergenic variants, such as going genetic drift or artificial selection. Chr15:41,393,393-41,393,780 (in Jersey and Nordic Red Cattle) and Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 55 Figure 4. Population diversity. (a) Heterozygous deletions per genome. (b) Homozygous deletions per genome. (c) Singletons per genome. Only high-confi- dence genotype calls are included. The y-axis in (a–c) represents the number of heterozygous, homozygous, and singleton deletions per genome, respectively. HOL, Holstein; JER, Jersey; RDC, Nordic Red Cattle. Figure 5. PCA depicting three dairy cattle breeds. The analysis is based on (a) 6 K deletions (0.02 < VAF < 0.9), and (b) 168 K bi-allelic SNPs randomly se- lected from 29 bovine autosomes. First two PCs from deletions and SNPs are plotted here; jointly explained 20 and 16.2% of the variance, respectively. HOL, Holstein; JER, Jersey; RDC, Nordic Red Cattle. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 56 Genomic deletions in dairy cattle Figure 6. Population stratification based on V (a measure of differentiation for SV, highly correlated with Wright’s fixation index, F ). Horizontal dash line in- ST ST dicates highly stratified deletion regions (V  Mean þ 4 S.D.). Highly stratified genic-deletions, e.g. overlapping exons, introns, or UTRs, are highlighted with ST HGNC gene symbol. lethal genes (P-value 2.3  10 ; one-sided Wilcoxon test), and thus 3.4. Functional impact of deletions are evolutionarily less conserved. This is consistent with the rate of We annotated all the deletions using Variant Effect Predictor (Ensembl evolution seen in essential and non-essential genes—where mutations 87). Around 71% (6,019 SVs) variants were intergenic and remaining in essential genes were under strong purifying selection and thus 29% (2,461 SVs) overlapped genic elements, such as exons, introns, and evolved slowly (low dN/dS ratio), while non-essential genes were un- UTR. On average, high frequency gene disrupting deletions were some- der relaxed selection, and hence, evolved faster (high dN/dS ratio). what depleted compared with intergenic variants (VAF > intergenic Nonetheless, robustness of these processes is also evident in the evolu- VAF , P-value ¼ 0.04; one-sided Wilcoxon test). Furthermore, we genic tion of human essential genes. Interestingly, 77% human essential observed many common genic deletions. These genes are relatively less genes could even be traced back to pre-metazoans. conserved, and majority has multiple paralogs (discussed later). However, deletions on known essential genes were only observed as het- erozygote with relatively low VAF (<3%), and generally were private to 3.4.2. Nonessential genes in cattle a specific breed. For example, FANCI deletions (cause brachyspina ) In total, we found 5,000 deletions for which at least one individual were only observed in Holstein, and RNASEH2B deletions (cause em- was homozygous. In the set, we analysed homozygous deletions in bryonic lethality ) in Nordic Red Cattle. genes to find natural gene knockouts. We found 167 deleted genes (transcript-ablation or complete deletion) corresponding to 115 inde- 3.4.1. Selective constraints on genes overlapping pendent deletions that are apparently nonessential based on the oc- deletions currences of live homozygote individuals. This is 45% more than The relative abundance of high-frequency genic and intergenic variants the previous report. Nonetheless, we found 44% fewer genes indicate that majority of these intersected genes are non-essential, and compared with in humans (240 nonessential genes), which could thus did not affect the viability or fecundity of the carriers. To test this be due to the differences in sample size (175 vs 2,504 individuals) hypothesis, we analysed the selective constraints between deleted genes and study populations (3 vs 26 populations in human). (overlap of any genic element) and known mouse lethal genes (from Among these genes, 83% (139 genes) are protein-coding, 12% Dickinson et al. )interms of dN/dS ratio of cow-mouse 1-to-1 ortho- pseudogenes, and the rest are different types of small RNAs logues (Fig. 7). Here, high dN/dS values indicate low selective con- (Supplementary Table S6). Most of these genes belong to multigenic straints on genes, and low value indicates high constraints. We found families and are not highly conserved (median cow-mouse dN/dS of that genes in deletions have significantly higher dN/dS ratios than 0.17 vs OMIA genes dN/dS of 0.11; Supplementary Fig. S6), as Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 57 Figure 7. Difference between dN/dS ratios of mouse-lethal and deletion-overlapped genes in cattle. Cow genes for which one-to-one mouse orthologues avail- able were considered for a one-sided Wilcoxon rank-sum test. Mouse lethal genes are from Dickinson et al. expected for homozygous deletion. Moreover, this set of genes are Table 2. Enrichment of QTL on deletions functionally enriched in immunoglobulin domains, olfactory receptors, 22 22 6 and MHC classes (FDR ¼ 2.06  10 ,2.06  10 ,7.01  10 , Trait classes Fold enrichment P value (Fisher’s test) respectively), along with other related domains (Supplementary Tables Health 2 8.91  10 S7–S9). Similar functional enrichment of nonessential genes was also Reproduction 1.5 7.4  10 seen in humans. Olfactory receptor related genes are well known for Milk 0.8 2.45  10 extensive gains and losses in mammalian evolution. And population Exterior 0.5 1.85  10 specific copy-number variations of olfactory receptor genes were also Production 0.5 0.002 71 72 reported in human (deletions) and cattle (gains). Nevertheless, this Meat and Carcass 0.5 0.058 is the first report, to our knowledge, of homozygous deletion of olfac- a 42 Trait classes are from cattleQTLdb. QTL from autosomes of Holsteins, tory receptor genes in cattle. Jersey, Nordic Red Cattle, and Ayrshires were considered for Fisher’s exact test (two-sided). 3.4.3. QTL enrichment We next explored the enrichment (or depletion) of QTL on deleted regions (at least 1 bp overlap with deletion). We retrieved 24 K au- to no sequence homology, and thus could be characterized by micro- tosomal QTL from QTLdb reported to be associated with any of the homologies or simple blunt ends at the breakpoint junction. six trait classes, e.g. ‘Health’, ‘Reproduction’, ‘Milk’, ‘Exterior’, Breakpoint information is crucial for understanding the mechanism, ‘Production’, and ‘Meat and Carcass’. The association of deletions and therefore, we analysed 29 breakpoint resolved deletions from our with diseases, fitness or fertility related traits is well evident. Hence, validation set. We found that 24 of 29 deletions contain microhomology we suspected enrichment of fitness and fertility related traits for our ranging from 2 to 31 bp at the breakpoint, andtwo of whichalsocon- deletions. As expected, health (2-fold) and reproduction (1.5-fold) re- tain insertions (Supplementary Table S2). In addition, four deletions ex- lated QTL were significantly enriched, while other trait classes were hibited non-reference insertion at breakpoint junctions, and one deletion highly depleted (Table 2). Higher enrichment of health related QTL with no apparent homology. However, the number of breakpoint se- could be driven by immune-system genes, which were also highly en- quences analysed here were not a robust representation of our deletion riched in our dataset (discussed earlier). call-set (<0.5% deletions), though selected randomly (for validation), we were able to demonstrate that majority of deletions contain microho- 3.5. Deletion formation mechanisms moloy at breakpoint, followed by few insertions, and rarely with no ho- Finally we explored the probable mechanisms of deletion formation. mology. Our results largely agree with the trend reported for large There are two key mechanisms of SVs formation (for detail see re- deletions in humans, e.g. 70.8% deletions exhibited microhomology/ho- 19,73 view ); for example, recurrent SVs often result from ‘non-allelic 16 mology and 16.1% insertions at the breakpoint. homologous recombination’ between large low-copy repeats (LCRs), and thus, contain extensive sequence homology provided by LCRs, 3.6. Limitations such as segmental duplicates, at the flanking regions. In contrast, non-recurrent SVs often form either by ‘microhomology-mediated This study only focused on identifying deletions in cattle because of end joining’ or ‘non-homologus end joining’, which requires limited their potential relevance to loss-of-function and embryonic lethality. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 58 Genomic deletions in dairy cattle 2. Cole, J.B., Null, D.J. and VanRaden, P.M. 2016, Phenotypic and genetic However, we had limited success to identify small deletions, such as effects of recessive haplotypes on yield, longevity, and fertility. J Dairy <200 bp due to reduced sensitivity of the SV caller. It is also not a Sci., 99, 7274–88. comprehensive list of deletions for these samples, since we could 3. Weischenfeldt, J., Symmons, O., Spitz, F. and Korbel, J. O. 2013, have missed many true deletions due to sensitivity, coverage, or strin- Phenotypic impact of genomic structural variation: insights from and for gent filtering (among other reasons). Furthermore, the short read human disease. Nat. Rev. Genet., 14, 125–138. length (100 bp) in our WGS dataset also made it difficult to resolve 4. Zarrei, M., MacDonald, J.R., Merico, D. and Scherer, S.W. 2015, A copy breakpoints from regions of long repeats. number variation map of the human genome. Nat Rev Genet, 16, 172–183. 3.7. Conclusions 5. Bickhart, D.M. and Liu, G.E. 2014, The challenges and importance of structural variation detection in livestock. Front. Genet., 5, 37. Loss-of-function variants are responsible for a substantial yearly- 6. Xu, L., Cole, J.B., Bickhart, D.M., et al. 2014, Genome wide CNV analy- economic loss in dairy industry, where a limited number of elite sires sis reveals additional variants associated with milk production traits in are in extensive use for rapid genetic gains. Mapping of such variants Holsteins. BMC Genomics, 15, 683. is essential for effective breeding planning and genomic selection. Here 7. Charlier, C., Agerholm, J. S., Coppieters, W., et al. 2012, A deletion in the we showed an NGS-based analytical framework suitable for bovine FANCI gene compromises fertility by causing fetal death and bra- population-scale mapping of large deletions in cattle, leveraging the chyspina. PLoS One, 7, e43085. available WGSs. Here we described population-genetic, functional, 8. Schutz, E., Wehrhahn, C., Wanjek, M., et al. 2016, The holstein friesian and evolutionary properties of discovered deletions. We identified and lethal haplotype 5 (HH5) results from a complete deletion of TBF1M and confirmed a 525 KB deletion on chromosome 23, causing stillbirth cholesterol deficiency (CDH) from an ERV-(LTR) insertion into the cod- ing region of APOB. PLoS One, 11, e0154602. in Nordic Red Cattle. We demonstrated that Nordic Red Cattle had 9. Kadri, N.K., Sahana, G., Charlier, C., et al. 2014, A 660-Kb deletion with higher population diversity than Holstein and Jersey, and deletion- antagonistic effects on fertility and milk production segregates at high fre- genotype could recapitulate genetic structure of these breeds. Natural quency in Nordic Red cattle: additional evidence for the common occur- gene knockouts are enriched for immune-related and olfactory recep- rence of balancing selection in livestock. PLoS Genet., 10, e1004049. tor genes. We also showed that deletions are significantly enriched for 10. Sahana, G., Iso-Touru, T., Wu, X., et al. 2016, A 0.5-Mbp deletion on bo- health and fertility related QTL, while depleted for production related vine chromosome 23 is a strong candidate for stillbirth in Nordic Red cat- QTL. Our population genetic and functional analysis showed promise tle. Genet. Sel. Evol., 48, 35. for inclusion of SVs in genomic studies in dairy cattle. This deletion 11. Liu, G. E., Hou, Y., Zhu, B., et al. 2010, Analysis of copy number varia- catalog will facilitate discovery, genotyping, and imputation of dele- tions among diverse cattle breeds. Genome Res, 20, 693–703. tions in large cohorts of animals, and subsequent studies for gene map- 12. Hou, Y., Liu, G. E., Bickhart, D. M., et al. 2011, Genomic characteristics of cattle copy number variations. BMC Genomics, 12, 127. ping and genomic prediction of breeding values. 13. Xu, L., Hou, Y., Bickhart, D. M., et al. 2016, Population-genetic proper- ties of differentiated copy number variations in cattle. Sci. Rep., 6, 23161. Acknowledgements 14. Alkan, C., Coe, B. P. and Eichler, E. E. 2011, Genome structural variation discovery and genotyping. Nat. Rev. Genet., 12, 363–376. Md Mesbah-Uddin benefited from a joint grant from the European Commission 15. Sudmant, P. H., Rausch, T., Gardner, E. J., et al. 2015, An integrated within the framework of the Erasmus-Mundus joint doctorate ‘EGS-ABG’. map of structural variation in 2,504 human genomes. Nature, 526, 75–81. 16. Mills, R. E., Walter, K., Stewart, C., et al. 2011, Mapping copy number Funding variation by population-scale genome sequencing. Nature, 470, 59–65. This research was supported by the Center for Genomic Selection in Animals and 17. Yalcin, B., Wong, K., Agam, A., et al. 2011, Sequence-based characteriza- Plants (GenSAP) funded by Innovation Fund Denmark (grant 0603-00519B). tion of structural variation in the mouse genome. Nature, 477, 326–329. 18. Chen, K., Chen, L., Fan, X., Wallis, J., Ding, L. and Weinstock, G. 2014, TIGRA: a targeted iterative graph routing assembler for breakpoint as- Data availability sembly. Genome Res., 24, 310–317. 19. Carvalho, C. M. and Lupski, J. R. 2016, Mechanisms underlying structural All relevant results are within the paper and its Supplementary data files. VCF variant formation in genomic disorders. Nat Rev Genet, 17,224–238. file with deletion calls could be found at https://github.com/MMesbahU/ 20. Daetwyler, H. D., Capitan, A., Pausch, H., et al. 2014, Whole-genome se- Deletions_in_cattle. WGSs of 44 samples (out of 175) are available from the quencing of 234 bulls facilitates mapping of monogenic and complex NCBI Sequence Read Archive (project accession numbers SRP039339 and traits in cattle. Nat. Genet., 46, 858–865. SRP065105). Among the 175 samples, 144 are from Run 6 of 1KBGP. Rest of 21. Brondum, R. F., Guldbrandtsen, B., Sahana, G., Lund, M. S. and Su, G. data are available only upon agreement with the commercial breeding organiza- 2014, Strategies for imputation to whole genome sequence using a single tion and should be requested directly from the senior author (G.S.: goutam.saha- or multi-breed reference population in cattle. BMC Genomics, 15, 728. na@mbg.au.dk) or the Center Director (M.S.L.: mogens.lund@mbg.au.dk). 22. Jansen, S., Aigner, B., Pausch, H., et al. 2013, Assessment of the genomic Conflict of interest: None declared. variation in a cattle population by re-sequencing of key animals at low to medium coverage. BMC Genomics, 14, 446. 23. Boussaha, M., Esquerre, D., Barbieri, J., et al. 2015, Genome-wide study of structural variants in bovine holstein, montbeliarde and normande Supplementary data dairy breeds. PLoS One, 10, e0135931. Supplementary data are available at DNARES online. 24. Chen, L., Chamberlain, A. J., Reich, C. M., Daetwyler, H. D. and Hayes, B. J. 2017, Detection and validation of structural variations in bovine whole-genome sequence data. Genet. Select. Evol., 49, 13. References 25. Li, H. and Durbin, R. 2009, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760. 1. Charlier, C., Li, W., Harland, C., et al. 2016, NGS-based reverse genetic 26. Li, H., Handsaker, B., Wysoker, A., et al. 2009, The Sequence screen for common embryonic lethal mutations compromising fertility in Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–2079. livestock. Genome Res., 26, 1333–1341. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 59 27. McKenna, A., Hanna, M., Banks, E., et al. 2010, The Genome Analysis 53. Mao, X., Sahana, G., De Koning, D. J. and Guldbrandtsen, B. 2016, Toolkit: a MapReduce framework for analyzing next-generation DNA se- Genome-wide association studies of growth traits in three dairy cattle quencing data. Genome Res., 20, 1297–1303. breeds using whole-genome sequence data. J. Anim. Sci., 94, 1426–37. 28. Handsaker, R. E., Korn, J. M., Nemesh, J. and McCarroll, S. A. 2011, 54. Brondum, R. F., Rius-Vilarrasa, E., Stranden, I., et al. 2011, Reliabilities Discovery and genotyping of genome structural polymorphism by se- of genomic prediction using combined reference data of the Nordic Red quencing on a population scale. Nat. Genet., 43, 269–276. dairy cattle populations. J. Dairy Sci., 94, 4700–4707. 29. Karolchik, D., Hinrichs, A. S., Furey, T. S., et al. 2004, The UCSC Table 55. Xu, L., Bickhart, D. M., Cole, J. B., et al. 2015, Genomic signatures reveal Browser data retrieval tool. Nucleic Acids Res. 32, D493–496. new evidences for selection of important traits in domestic cattle. Mol. 30. Kent, W. J. 2002, BLAT–the BLAST-like alignment tool. Genome Res., Biol. Evol., 32, 711–25. 12, 656–664. 56. Cole, J. B., Waurich, B., Wensch-Dorendorf, M., Bickhart, D. M. and 31. Miller, S. A., Dykes, D. D. and Polesky, H. F. 1988, A simple salting out Swalve, H. H. 2014, A genome-wide association study of calf birth weight procedure for extracting DNA from human nucleated cells. Nucleic Acids in Holstein cattle using single nucleotide polymorphisms and phenotypes Res., 16, 1215. predicted from auxiliary traits. J Dairy Sci, 97, 3156–3172. 32. Purcell, S., Neale, B., Todd-Brown, K., et al. 2007, PLINK: a tool set for 57. Howard, J. T., Kachman, S. D., Snelling, W. M., et al. 2014, Beef cattle whole-genome association and population-based linkage analyses. Am. J. body temperature during climatic stress: a genome-wide association study. Hum. Genet., 81, 559–575. Int. J. Biometeorol., 58, 1665–1672. 33. Redon, R., Ishikawa, S., Fitch, K. R., et al. 2006, Global variation in copy 58. Buitenhuis, B., Poulsen, N. A., Gebreyesus, G. and Larsen, L. B. 2016, number in the human genome. Nature, 444, 444–454. Estimation of genetic parameters and detection of chromosomal regions 34. Wright, S. 1931, Evolution in Mendelian Populations. Genetics, 16, affecting the major milk proteins and their post translational modifica- 97–159. tions in Danish Holstein and Danish Jersey cattle. BMC Genet., 17, 114. 35. McLaren, W., Gil, L., Hunt, S. E., et al. 2016, The Ensembl Variant Effect 59. McClure, M. C., Morsci, N. S., Schnabel, R. D., et al. 2010, A genome Predictor. Genome Biol., 17, 122. scan for quantitative trait loci influencing carcass, post-natal growth and 36. Finn, R. D., Attwood, T. K., Babbitt, P. C., et al. 2017, InterPro in reproductive traits in commercial Angus cattle. Anim. Genet., 41, 2017-beyond protein family and domain annotations. Nucleic Acids Res., 597–607. 45, D190–D199. 60. Sahana, G., Guldbrandtsen, B. and Lund, M. S. 2011, Genome-wide asso- 37. Finn, R. D., Coggill, P., Eberhardt, R. Y., et al. 2016, The Pfam protein ciation study for calving traits in Danish and Swedish Holstein cattle. J. families database: towards a more sustainable future. Nucleic Acids Res., Dairy Sci., 94, 479–486. 44, D279–285. 61. McClure, M. C., Ramey, H. R., Rolf, M. M., et al. 2012, Genome-wide 38. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. and Morishima, K. association analysis for quantitative trait loci influencing Warner-Bratzler 2017, KEGG: new perspectives on genomes, pathways, diseases and shear force in five taurine cattle breeds. Anim. Genet., 43, 662–673. drugs. Nucleic Acids Res., 45, D353–61. 62. Snelling, W. M., Allan, M. F., Keele, J. W., et al. 2010, Genome-wide as- 39. Szklarczyk, D., Franceschini, A., Wyder, S., et al. 2015, STRING v10: sociation study of growth in crossbred beef cattle. J. Anim. Sci., 88, protein-protein interaction networks, integrated over the tree of life. 837–848. Nucleic Acids Res., 43, D447–52. 63. Cole, J. B., Wiggans, G. R., Ma, L., et al. 2011, Genome-wide association 40. Yates, A., Akanni, W., Amode, M. R., et al. 2016, Ensembl 2016. Nucleic analysis of thirty one production, health, reproduction and body confor- Acids Res., 44, D710–6. mation traits in contemporary U.S. Holstein cows. BMC Genomics, 12, 41. Kinsella, R. J., Kahari, A., Haider, S., et al. 2011, Ensembl BioMarts: a 408. hub for data retrieval across taxonomic space. Database (Oxford), 2011, 64. Lobago, F., Gustafsson, H., Bekana, M., Beckers, J. F. and Kindahl, H. bar030. 2006, Clinical features and hormonal profiles of cloprostenol-induced 42. Dickinson, M. E., Flenniken, A. M., Ji, X., et al. 2016, High-throughput early abortions in heifers monitored by ultrasonography. Acta Vet. discovery of novel developmental phenotypes. Nature, 537, 508–14. Scand., 48, 23. 43. Hu, Z. L., Park, C. A. and Reecy, J. M. 2016, Developmental progress and 65. Sandri, M., Stefanon, B. and Loor, J. J. 2015, Transcriptome profiles of current status of the Animal QTLdb. Nucleic Acids Res., 44, D827–33. whole blood in Italian Holstein and Italian Simmental lactating cows di- 44. RStudio Team 2016, RStudio: integrated development environment for R. verging for genetic merit for milk protein. J. Dairy Sci., 98, 6119–6127. RStudio, Inc.: Boston, MA. 66. Araujo, R. N., Padilha, T., Zarlenga, D., et al. 2009, Use of a candidate 45. R Core Team 2016, R: A language and environment for statistical com- gene array to delineate gene expression patterns in cattle selected for resis- puting. R Foundation for Statistical Computing: Vienna, Austria. tance or susceptibility to intestinal nematodes. Vet. Parasitol., 162, 46. Quinlan, A. R. and Hall, I. M. 2010, BEDTools: a flexible suite of utilities 106–115. for comparing genomic features. Bioinformatics, 26, 841–842. 67. Fang, L., Sahana, G., Su, G., et al. 2017, Integrating sequence-based 47. Lappalainen, I., Lopez, J., Skipper, L., et al. 2013, DbVar and DGVa: GWAS and RNA-seq provides novel insights into the genetic basis of mas- public archives for genomic structural variation. Nucleic Acids Res., 41, titis and milk production in dairy cattle. Sci. Rep., 7, 45560. D936–941. 68. Hurst, L. D. and Smith, N. G. 1999, Do essential genes evolve slowly? 48. McCarroll, S. A., Kuruvilla, F. G., Korn, J. M., et al. 2008, Integrated de- Curr. Biol., 9, 747–50. tection and population-genetic analysis of SNPs and copy number varia- 69. Blomen, V. A., Majek, P., Jae, L. T., et al. 2015, Gene essentiality and syn- tion. Nat. Genet., 40, 1166–74. thetic lethality in haploid human cells. Science, 350, 1092–6. 49. Conrad, D. F., Pinto, D., Redon, R., et al. 2010, Origins and functional 70. Niimura, Y. and Nei, M. 2007, Extensive gains and losses of olfactory re- impact of copy number variation in the human genome. Nature, 464, ceptor genes in mammalian evolution. PLoS One, 2, e708. 704–12. 71. Van Ziffle, J., Yang, W. and Chehab, F. F. 2011, Homozygous deletion of 50. Handsaker, R. E., Van Doren, V., Berman, J. R., et al. 2015, Large multi- six olfactory receptor genes in a subset of individuals with allelic copy number variations in humans. Nat. Genet., 47, 296–303. Beta-thalassemia. PLoS One, 6, e17327. 51. Zhang, Q., Guldbrandtsen, B., Bosse, M., Lund, M. S. and Sahana, G. 72. Lee, K., Nguyen, D. T., Choi, M., et al. 2013, Analysis of cattle olfactory 2015, Runs of homozygosity and distribution of functional variants in the subgenome: the first detail study on the characteristics of the complete cattle genome. BMC Genomics, 16, 542. olfactory receptor repertoire of a ruminant. BMC Genomics, 14, 596. 52. Bovine HapMap, C., Gibbs, R. A., Taylor, J. F., et al. 2009, 73. Hastings, P. J., Lupski, J. R., Rosenberg, S. M. and Ira, G. 2009, Genome-wide survey of SNP variation uncovers the genetic structure of Mechanisms of change in gene copy number. Nat. Rev. Genet., 10, cattle breeds. Science, 324, 528–32. 551–64. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png DNA Research Oxford University Press

Genome-wide mapping of large deletions and their population-genetic properties in dairy cattle

Free
11 pages

Loading next page...
 
/lp/ou_press/genome-wide-mapping-of-large-deletions-and-their-population-genetic-pRT0tmtGd0
Publisher
Oxford University Press
Copyright
© The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
ISSN
1340-2838
eISSN
1756-1663
D.O.I.
10.1093/dnares/dsx037
Publisher site
See Article on Publisher Site

Abstract

Large genomic deletions are potential candidate for loss-of-function, which could be lethal as homozygote. Analysing whole genome data of 175 cattle, we report 8,480 large deletions (199 bp–773 KB) with an overall false discovery rate of 8.8%; 82% of which are novel compared with deletions in the dbVar database. Breakpoint sequence analyses revealed that majority (24 of 29 tested) of the deletions contain microhomology/homology at breakpoint, and therefore, most likely generated by microhomology-mediated end joining. We observed higher differentia- tion among breeds for deletions in some genic-regions, such as ABCA12, TTC1, VWA3B, TSHR, DST/BPAG1, and CD1D. The genes overlapping deletions are on average evolutionarily less con- served compared with known mouse lethal genes (P-value ¼ 2.3  10 ). We report 167 natural gene knockouts in cattle that are apparently nonessential as live homozygote individuals are observed. These genes are functionally enriched for immunoglobulin domains, olfactory recep- 22 22 6 tors, and MHC classes (FDR ¼ 2.06  10 , 2.06  10 , 7.01  10 , respectively). We also demonstrate that deletions are enriched for health and fertility related quantitative trait loci 10 11 (2-and 1.5-fold enrichment, Fisher’s P-value ¼ 8.91  10 and 7.4  10 , respectively). Finally, we identified and confirmed the breakpoint of a 525 KB deletion on Chr23:12,291,761- 12,817,087 (overlapping BTBD9, GLO1 and DNAH8), causing stillbirth in Nordic Red Cattle. Key words: dairy cattle, structural variants, whole genome sequence, population genetics, loss-of-function 1. Introduction like milk and protein yield. An estimated yearly loss of $10.74 Embryonic lethality has become a challenge to cattle breeders, espe- million is attributed to known recessive lethals in four dairy cattle cially for dairy cattle where a limited number of bulls were exten- breeds from USA only, where Holstein accounts for 70% of the to- sively used in breeding for fast genetic progress in economic traits tal losses, followed by Jersey, Brown Swiss, and Ayrshire. Hence, V The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 journals.permissions@oup.com 49 by Ed 'DeepDyve' Gillespie user on 16 March 2018 50 Genomic deletions in dairy cattle understanding the genomic architecture of cattle populations is im- software to produce BAM files for subsequent variant calling. In portant, now more than ever, for optimizing genetic gain while con- 1KBGP, SNPs and indels were called using ‘SAMtools 0.1.18 mpi- 26 27 straining negative impact of deleterious mutations responsible for leup’ software, while ‘GATK v1.6’ software was used for Nordic 20 21 genetic defects and inbreeding depression. WGS data (detailed method in and, respectively). For all the Unlike single nucleotide polymorphism (SNP) and small insertion or analysis, bovine genome assembly ‘UMD3.1’ was used as the refer- deletion (indel), structural variants (SVs), i.e. DNA alterations larger ence genome. than 50 base pairs (bp) that include insertions, deletions, duplications, inversions, and translocations, are the least explored polymorphisms in 2.3. Discovery and genotyping of deletions cattle. SVs contribute substantially to phenotypic variations and have a SVs can be detected from NGS data based on sequence signatures wide-spectrum of impact ranging from beneficial to lethal in both hu- such as (discordant) read-pair (RP), split-read (SP), and read-depth 3,4 5 mans and animals. The phenotypic impact of SVs in cattle is well evi- (RD), as well as de novo assembly of reads. However, approaches dent from numerous studies. For example, Xu et al. showed that a based on only one sequence signature could be constrained by high combination of SNPs with SVs could explain additional genetic variance false discovery rate (FDR), hence we employed a population scale 7 8 underlying milk production traits, while Charlier et al., Schutz et al., SV detection method called ‘Genome STRucture in Populations and Kadri et al., showed the lethal effect of large deletions in dairy cat- (Genome STRiP)’ —which leverages technical (e.g. RP and RD sig- tle. Furthermore, a 525 KB deletion on chromosome 23 is reported to nals) and population-level sequence features (e.g. coherence around be associated with stillbirth in Nordic Red Cattle. shared alleles, and heterogeneity of evidentiary sequences in different Earlier SV studies on cattle were mostly SNP-array based, such as genomes) for accurate discovery of deletions, and determines geno- array-comparative genomic hybridization, 50 K BovineSNP50 type (allelic state) of each locus from RD using a Gaussian mixture 12 13 BeadChip or 777 K BovineHD BeadChip (BovineHD chip) model. based. But, using these approaches a substantial portion of the ge- nome could not be explored and breakpoint resolution is still an is- 14 2.3.1. Genome STRiP sue. However, whole-genome sequence (WGS) based techniques For deletion discovery and genotyping ‘Genome STRiP’ software version could improve resolution as well as power to capture SVs in a wide 14 2.00.1678 was used. Following the documentation, we built a custom size and frequency spectrum. For example, majority of the novel 15,16 17 reference metadata bundle for cattle samples that includes alignability SVs in humans and mouse were detected using WGS mask, copy-number mask (CN2 mask), ploidy map, gender map. approaches. Besides, breakpoint sequences could also be assembled 18 Alignability mask represents sites on the reference genome that are with high accuracy from sequencing reads, which are necessary for 19 uniquely alignable by sequence read of a certain length (readLength). elucidating the mechanisms underlying SV formation. Our WGS data was a mixture of different ‘Illumina’ paired-end reads In the advent of next-generation sequencing (NGS) techniques, hun- rangingfrom90to101bp (Q1 ¼ 90, median ¼ 100, and Q3 ¼ 100), dreds of cattle (bull or cow) genomes were re-sequenced in collaborative 20 hence genome alignability mask was prepared with readLength value of initiative such as 1000 Bull Genomes Project (1KBGP) (and other inde- 21,22 90 using ‘ComputeGenomeMask’ utility from Genome STRiP. Copy- pendent projects, e.g. ) to build a comprehensive database of sequence number mask (CN2 mask), i.e. regions on the reference genome unlikely variants, mainly SNPs and indels. This NGS data provides a unique op- 23,24 to be copy-number variable in most individuals, was produced for the portunity to study SVs in cattle. However, few studies utilized these bovine assembly UMD3.1 excluding sex chromosome X, unplaced con- (and/or other) NGS resources so far for studying SVs in cattle. tigs, and repeat sequences (retrieved from RepeatMasker track of UCSC Therefore, in this study we scanned the WGSs of 175 cattle from Table Browser, accessed on 4 July 2016). three dairy breeds, namely Holstein, Jersey, and Nordic Red Cattle, to discover large deletions segregating in the population, and analyse 2.3.2. Pre-processing, deletion discovery and their population-genetic properties. In particular, we focused on understating the population diversity, stratification, and plausible genotyping functional effects. We also explored the probable mechanisms of SV We ran the preprocessing Queue script (dry run) to emit all the com- formation for a set of breakpoint-resolved deletions. mands, prepared bash scripts to run in Portable Batch System job scheduler, and executed these commands proving 175 BAM files (one for each sample) as input. 2. Materials and methods Large deletions (100 bp  size  1 MB) were discovered and fil- tered using SVDiscovery Queue script. Discovered sites were filtered 2.1. Animal samples and ethics (default filters) if (i) the site contained too high or too low read This study was performed on WGS of 175 dairy cattle from three pileup, (ii) RPs spacing was inconsistent with a single segregating de- breeds, e.g. 67 Holstein, 27 Jersey, and 81 Nordic Red Cattle. The letion, (iii) RD and RP evidences were inconsistent across samples, sample included 7 Holstein cows and 168 bulls from these three 20 (iv) RD differences were not significant, and (v) RP evidence was breeds144 animals from Run 5 of 1KBGP and 31 animals from 21 thinly distributed across samples (Genome STRiP Tutorial—GATK Nordic sequence data. Genome sequences were generated using Workshop 2013, http://software.broadinstitute.org/software/genome Illumina paired-end sequencing to an average coverage of 10-fold. strip/workshop-presentations, accessed on 26 August 2016). Here, we did not include any experimentation on animals and only dealt All passed sites were genotyped by SVGenotyper with default pa- with analysis-ready WGS data; hence, no ethical approval was required. rameters. Genotyped deletion calls were then filtered based on fol- lowing criteria, e.g. (i) sites with excess number of heterozygote calls 2.2. Sequence alignment to reference genome and (inbreeding coefficient 0.15), (ii) non-variant site based on geno- SNPs/indels calling type likelihood (parameter: non-variance score  13.0), (iii) sites Raw sequencing reads were filtered and ‘FASTQ’ files were aligned with too low or too high RD (parameter: 0.5  GSM1  2.0), to bovine reference genome assembly ‘UMD3.1’ using ‘BWA’ (iv) sites with less than 30% uniquely alignable bases, (v) potential Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 51 duplicate of another site (parameter: duplicateOverlapThreshold 0.5 2.5. Analysis of population genetic properties and duplicateScoreThreshold  0.0), (vi) start/end position of a dele- The population genetic properties of deletions, among the three tion call within 150 bp of assembly gap, (vii) all samples homozygous breeds, were studied in terms of population diversity, population struc- for reference allele (95% CI), and (viii) sites with  10% missing ture, and population differentiation. Population diversity was calcu- genotype. lated using ‘VariantsPerSampleAnnotator’ from ‘Genome STRiP’ software, which provides distribution of variants across samples and populations. We performed principal component analysis (PCA) using 2.4. Validation of deletions PLINK (v1.90p) software to distinguish three cattle breeds (details 2.4.1. Validation using 777K BovineHD BeadChip in Supplementary Material). We calculated V —a population strati- ST intensity data fication measure of SVs (highly correlated with Wright’s fixation in- We validated deletion calls using 777K BovineHD BeadChip dex, F ), for each deletion locus using variant allele frequency ST (Illumina, San Diego, CA, USA) intensity data on 26 Holstein sam- (VAF) and genotypes from pairwise comparison of one breed with the ples that were both WGS and 777K chip typed. We calculated FDR rest, such as Holstein vs Jersey þ Nordic Red Cattle, and vice versa. for the deletion call-set using IntensityRankSum (IRS) test imple- mented in Genome STRiP. Intensity file was prepared from raw chip 2.6. Functional annotation and enrichment analysis intensity data following the guideline for IRS test. Overall FDR for Functional annotation of deletions were performed using ‘Variant the call-set was calculated as two times the fraction of sites with IRS Effect Predictor (VEP-87)’ software, and enrichment of protein do- P-value  0.5 (i.e. sites with IRS P-value  0.5 to the sites with valid 36 37 38 mains (InterPro and Pfam ) and pathways (KEGG ) were 15,28 P-value). Details of IRS test could be found in. analysed using ‘STRING-v10 database’. Selective constraints on genes were measured from the ratio of non- synonymous (dN) to synonymous (dS) substitution rate, i.e. dN/dS ratio, 2.4.2. Validation by targeted assembly of breakpoint between cow-mouse 1-to-1 orthologues downloaded from Ensembl Targeted iterative graph routing assembler (TIGRA-0.4.3) soft- database (release 87, last accessed on 21 February 2017) using ware was used, with default parameters, for assembling deletion BioMart. Here we analysed whether dN/dS of genes overlapping dele- breakpoint sequences from a set of randomly selected deletions along tions are higher (i.e. less constrained) than that of mouse lethal genes with three previously known deletions segregating in the study popu- (from Dickinson et al. ) using Wilcoxson test. Reported causal genes lations. TIGRA extracted all reads mapped to 500 bp upstream and for cattle were also retrieved from OMIA database (http://omia.angis. 50 bp downstream of start coordinate, and 50 bp upstream and org.au/, last accessed on 10 May 2016) for dN/dS comparison. 500 bp downstream of end coordinate of a given deletion; and reads We retrieved cattle quantitative trait loci (QTL) from QTLdb data- were then assembled iteratively using de Bruijn graph assembler with base (release 31; accessed on 6 January 2017); autosomal QTL from multiple k-mers (e.g. 15 bp followed by 25 bp). We aligned the as- Holstein, Jersey, Nordic Red Cattle and Ayrshire, associated to any of sembled contigs to UMD3.1 using Cow BLAT Search from UCSC the six trait classes, e.g. ‘Reproduction’, ‘Milk’, ‘Production’, Genome Browser (https://genome.ucsc.edu/cgi-bin/hgBlat) to visual- ‘Exterior’, ‘Meat and Carcass’, and ‘Health’, were considered for QTL ize and infer breakpoints from the alignments. enrichment analysis. We calculated fold enrichment for a trait, such as for Health related QTL: (No. of Health QTL on Deletions = Total QTL on deletions) / (Total Health QTL = Total QTL in the dataset), 2.4.3. Validation by PCR and amplicon sequencing and statistical significance using ‘Fisher’s exact test’ (two sided). We validated a previously reported 525 KB deletion segregating in Nordic Red Cattle using PCR and amplicon sequencing. Genomic DNA was extracted as described previously by Miller 2.7. Data manipulation, visualization, and statistical et al. from semen sample of two bulls carrying the deletion and analysis two non-carriers. The PCR reaction was done with the All statistical analyses and plots were generated in RStudio soft- DyNAzyme II DNA Polymerase (Thermo Fisher, MA, US) in a 30 44 45 ware running R software version 3.3.2, unless mentioned other- ml volume of 1 PCR buffer, 0.2 mM dNTPs, 10 pmol primer mix wise. BEDTools (v2.26.0) software is used for identifying the 0 0 (forward primer: 5 - AAGCCACCACAATGAGAAGC -3 and re- overlap between deletion calls and other genomic features, such as, 0 0 verse primer: 5 - TTTGGGGTAGGAGAAGTAGGG -3 )and ‘UMD3.1’ assembly gaps (from ‘UCSC Table Browser’), CNVs from 50 ng of genomic DNA. The cycling conditions were the following: 47 7,9,10 dbVar database , three known deletions from , QTL from (i) an initial denaturation at 95 C for 3 min, (ii) 35 cycles of 30 s QTLdb. VCFtools (v0.1.15) software and PLINK (v1.90p) software denaturation (94 C), 30 s hybridization (65.2 C), 30 s elongation were used for analysing the VCF file. (72 C), and a final 3 min elongation (72 C). PCR products were separated on a 2% agarose gel, purified and directly sequenced us- ing the BigDye Terminator Cycle Sequencing Kit (Applied 3. Results and discussion Biosystems, CA, US). Electrophoresis of sequencing reactions 3.1. Discovery and genotyping of deletions was performed on ‘3500xL Genetic Analyzers’ (Applied Biosystems, CA, US), and sequences were visualized with Deletion discovery and genotyping were carried out using Genome Sequencher 5.4.6 (Gene Codes Corporation, MI, USA). A 977 bp STRiP. After filtering, we report 8,480 large deletions with genotypes control amplification, with a primer pair within the deletion (for- in 67 Holsten, 27 Jersey, and 81 Nordic Red Cattle. The deletion 0 0 ward primer: 5 - CCCAATGCAAAATCACAAAA -3 and reverse size ranged from 199 bp to 773 KB with a mean of 4.5 KB (median ¼ 0 0 primer: 5 - CCAGAAAAGCTACACTTGAACTGA -3 ), was per- 1 KB), which is approximately 10 times smaller compared with 184 formed using the same reaction conditions as above except hybrid- deletion-CNVs (mean ¼ 44.5 KB, median ¼ 7.7 KB) reported in a re- ization was performed at 59.8 C. cent 777K BovineHD BeadChip (BovineHD chip) based study, Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 52 Genomic deletions in dairy cattle Figure 1. Number of ascertained deletions relative to variant allele count. Here, VAF is expressed in terms of variant allele count. Deletions down to an allele count of 1 (VAF ¼ 0.0026 and 0.0032, in cattle and humans, respectively) are also represented here. Human deletion calls by Mills et al. were downloaded from 1K Genomes Project FTP server (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/paper_data_sets/companion_papers/mapping_structural_variation). Table 1. FDR estimates of Genome STRiP’ deletion calls using reflecting the resolution of our sequence-based calls. Only 18% of 777K BovineHD BeadChip intensity data the deletion calls have overlap (1 bp) with previously reported bo- a b c vine deletions (or CNV-loss) in the dbVar database (accessed on 27 Array-probe Overlap A B FDR January 2017), while remaining 82% are novel. However, 72% of One-probe 497 28 11.3% our deletion regions remained unique when compared with all CNVs >1 array-probe 206 3 2.9% (gain or loss) and copy number variable regions in the database. >2 array-probe 113 1 1.8% Interestingly, majority (80%) of these overlapping regions are from Overall 703 31 8.8% an earlier WGS-based study, where genome sequences of 27 Holstein, 17 Montbe ´ liarde, and 18 Normande bulls were analysed. A, No. of sites with P-value. Nonetheless, we were able to broaden the accessible deletion size- B, No. of sites with P-value  0.5 range, more importantly towards smaller one unascertainable by FDR estimates were based on ‘Wilcoxson rank sum test’ using BovineHD chip intensity data of 26 Holstein animals. FDR was calculated usual SNP-array based approaches. We also report high quality ge- as (B/A 3 2 3 100). notypes for all the 8,480 deletions. Apparently, there are more low frequency variants than that of high frequency one, and the fre- quency distribution is very similar to humans (Fig. 1). Previous NGS-based studies on cattle were mostly limited to SV discovery, while copy-number states (genotypes) were inferred using 3.2.1. BovineHD chip intensity 24 23 BovineHD chip, or custom SNPs array. However, in this study We used 777K BovineHD chip intensity data of 26 Holstein animals, we estimated the copy-number at each deletion locus (per sample) both chip-typed and sequenced, to validate the deletion calls using from RD within the region using a constrained Gaussian mixture Genome STRiP’ IRS test. We had partial power to investigate all de- model with three classes, e.g. copy-number zero (i.e. homozygous de- letions due to the sparsity of the array-probes (one probe per letion), one (i.e. heterozygous deletion) and two (i.e. homozygous 3.5 KB), and were underpowered to accurately verify small dele- reference). It is known from human studies that majority of the (bi- tions (e.g. overlapping one-probe). Furthermore, we could only test a 48,49 allelic) common SVs segregate on specific SNP haplotypes, deletion for which at least one of the 26 samples had non-reference 28,50 which could be imputed with high accuracy. Thus, this approach allele. Therefore, an estimate of FDR for the deletion call-set is pro- has the potential, albeit with large reference, for accurate haplotype vided here from the overall P-value distribution. In this approach, we phasing and imputation of SVs to large cohorts of low-density chip- were able to interrogate 8.3% of the total call, majority of which typed animals with no additional cost. contain a single array-probe within the region (Table 1 and Supplementary Table S1). We found that deletions overlapping only 3.2. Validation of deletions one array-probe had higher FDR (11.3%) compared with two or We validated the results using three approaches: (i) using BovineHD more probes. And finally, we showed that our deletion call-set had chip intensity data, (ii) breakpoint assembly and alignment, and (iii) an overall FDR of 8.8%, which is within our chosen threshold of PCR þ sequencing of amplicons. FDR  10%. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 53 1,229 and 1,196, in Nordic Red Cattle, Holstein, and Jersey, respec- 3.2.2. Targeted breakpoint assembly tively (Fig. 4a). In contrast to heterozygosity, Jersey animals showed We next validated three randomly chosen set of ten-deletions each by highest levels of deletion-homozygosity, followed by Holstein assembling breakpoint sequences using TIGRA :10 deletions (Fig. 4b). Similar estimates of genetic diversity were also reported for 500 bp, 10 deletions > 500 bp but  1 KB, and 10 deletions with VAF these breeds in a SNP heterozygosity and runs-of-homozygosity anal- 0.10. Out of the thirty, we successfully resolved breakpoints of 26 ysis—where Jersey exhibited lowest (genome-wide) average nucleo- deletions (87% success rate) using a combination of TIGRA and tide diversity (and higher number/size of runs-of-homozygosity) BLAT search (Supplementary Table S2). Additionally, we assembled followed by Holstein and Nordic Red Cattle. These differences breakpoints of three previously reported deletions that were also segre- could be understood from the current effective population size (N ) gating in our study populations, such as a 662 KB deletiononchro- of these breeds, e.g. N of Jersey, Holstein, and Nordic Red Cattle mosome 12 encompassing RNASEH2B, GUCY1B2,and FAM124A, are 73, 99, and 106, respectively ; this entailed higher diversity in a 3 KB deletion on chromosome 21 encompassing FANCI,and a Nordic Red Cattle, and Holstein, than in Jersey. From singletons es- 525 KB deletion on chromosome 23 encompassing BTBD9, GLO1, timate it is also evident that Nordic Red Cattle has more rare dele- and DNAH8. Breakpoints of the former two deletions were previously 7,9 tions compared with Holstein and Jersey (Fig. 4c); partly could be reported in, which exactly matched with our predicted breakpoint due to incorrect ascertainment of rare ones. sequences (Supplementary Figs S1a–c and S2a–c). Although for the later, we resolved breakpoint sequences in this study (Fig. 2a and b). Overall, the success rate of our deletion-breakpoint assembly was bet- 3.3.2. Principal component analysis ter than the reported success rate of TIGRA on similar sized read- We performed PCA using the deletion genotypes of the samples. length. And Genome STRiP’s breakpoint predictions were on aver- Around 6 K deletions with VAF between 0.02 and 0.90 were used in age within 20 bp of the validated breakpoint, which is within the tool’s the analysis. For comparison, we also performed PCA on 168 K bi- reported estimate of 1–20 bp. allelic SNPs randomly selected from 29 autosomes of the same indi- viduals. The first two principal components (PCs) from both deletion and SNP-based PCA clearly distinguished the three breeds, 3.2.3. PCR and amplicon sequencing and jointly explained 20 and 16.2% of the variance, respectively We then experimentally validated breakpoints for ‘Chr23:12,291,761- (Fig. 5a and b). In addition, PC3 and PC4 recapitulated substructures 12,817,087’ deletion, previously reported to be associated with still- within Nordic Red cattle (Supplementary Fig. S3), and first five PCs birth in Nordic Red Cattle. Four animals were used for PCR valida- cumulatively explained 33.6% (with the deletions) and 28.4% (with tion: two carriers and two non-carriers. The breakpoint spanning PCR the SNPs) of the variance (Supplementary Fig. S4). Our deletion re- products of 359 bp were only observed in the carrier animals, while no sults agree with the known population structure of the three breeds. amplicon was seen for non-carriers (Fig. 3a). The 359 bp amplicon Similar population structure (and substructure within Nordic Red was then sequenced, and exact breakpoint sequences were observed Cattle) has been reported using genome-wide SNPs. Nordic Red (Fig. 3b), thus confirming the breakpoint for this deletion. Cattle from Denmark showed closer relationship with the Holstein in our WGS samples (Supplementary Fig. S5). This is consistent with 3.3. Population genetic properties of deletions the known history of Holstein interrogation in Danish Red cattle, as 3.3.1. Population diversity previously reported based on imputed WGS SNP analysis on a larger We explored population diversity among the three dairy cattle breeds sample. It is also known from admixture analysis that genomes of from per-individual deletion-heterozygosity and homozygosity. We Nordic Red Cattle are a mosaic of multiple ancestral populations, found that individuals from Nordic Red Cattle exhibits 3.5 and i.e. more ancestral components in Nordic Red Cattle than in 52,54 6.4% higher deletion-heterozygosity than in Holstein and Jersey, re- Holstein and Jersey ; our deletion-based PCA largely corroborate spectively. Median numbers of heterozygote-deletion were 1,272, that. Figure 2. A 525-KB deletion on chromosome 23 discovered using Genome STRiP (a), and resolved breakpoint sequences from TIGRA and BLAT search (b). Shaded bases are a 5-bp microhomology at breakpoint junction. (This figure was drawn and modified using Inkscape version 0.91.) Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 54 Genomic deletions in dairy cattle Figure 3. Experimental validation of the 525KB deletion on chromosome 23. (a) PCR amplification across (left) and within the deletion (right) for two carrier (D/þ) and two homozygous wild-type (þ/þ) animals. Water, negative control; M, molecular weight marker (GeneRuler 100bp DNA ladder, Fermentas). (b) Sequence trace of the 359bp amplicon bridging the breakpoint. Shaded bases are a 5-bp microhomology at breakpoint junction. (This figure was drawn and modified using Inkscape version 0.91.) Chr20:26,812,159-26,812,834 (in Holstein and Jersey) overlap QTL as- 3.3.3. V analysis ST 59,60 sociated with calving traits, Chr20:45,816,245-45,820,519 (in We analysed population stratification in terms of V , a measure ST Holstein & Nordic Red Cattle) with meat and carcass trait, and highly correlated with Wright’s fixation index (F ), to identify popu- ST Chr23:49,778,653-49,782,567 (in Holstein and Nordic Red Cattle) lation differentiation. We calculated V for each deletion pairwise ST with body weight. We also identified a highly differentiated fertility as- amongst the breeds (e.g. Holstein vs Jersey þ Nordic Red Cattle) from 63,64 sociated gene DST/BPAG1 (Chr23:3,486,232-3,486,603) in VAFs. We identified 158 highly stratified deletions (pairwise V ST Nordic Red Cattle. One differentially selected deletion (V ¼ 0.28) of ST mean þ 4 S.D.) among the breeds (Fig. 6). Around 27% of these dele- chromosome 3 (Chr3:12,141,822-12,170,916) overlapping ENSBTAG tions overlap genic elements, i.e. exons, introns, or (upstream/down- 00000047776 and ENSBTAG00000024960 genes (human orthologue stream) untranslated regions (UTRs), and remaining 73% are intergenic CD1D), drawn our attention (though it marginally failed our selection variants (Supplementary Tables S3–S5). There were eleven sites shared threshold); Holsteins exhibited VAF of 24.63% (have both homozygous between Holstein and Nordic Red Cattle, two sites between Holstein and heterozygous deletion), while it is mostly homozygous for the and Jersey, and one site between Nordic Red Cattle and Jersey. Among reference allele in Nordic Red cattle (VAF ¼ 0.62%) and Jersey these sites were gene variants, such as ABCA12 (Chr2:103,682,772- (VAF ¼ 0.0%). The CD1D gene has known function in host immune 103,684,297 in Holstein & Jersey) associated with growth and develop- 65,66 55,56 response and parasite resistance, and also reported differentially ex- ment, TTC1 (Chr7:73,725,513-73,725,918 in Holstein & Nordic pressed post intra-mammary infection. Majority of these stratified de- Red Cattle) with cold tolerance, VWA3B (Chr11:3,521,329- letion regions are novel compared with previous CNV studies in cattle, 3,522,551 in Holstein & Nordic Red Cattle) with milk glycosylated and therefore are interesting targets to investigate large deletions under- kappa-casein percentage, and were intergenic variants, such as going genetic drift or artificial selection. Chr15:41,393,393-41,393,780 (in Jersey and Nordic Red Cattle) and Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 55 Figure 4. Population diversity. (a) Heterozygous deletions per genome. (b) Homozygous deletions per genome. (c) Singletons per genome. Only high-confi- dence genotype calls are included. The y-axis in (a–c) represents the number of heterozygous, homozygous, and singleton deletions per genome, respectively. HOL, Holstein; JER, Jersey; RDC, Nordic Red Cattle. Figure 5. PCA depicting three dairy cattle breeds. The analysis is based on (a) 6 K deletions (0.02 < VAF < 0.9), and (b) 168 K bi-allelic SNPs randomly se- lected from 29 bovine autosomes. First two PCs from deletions and SNPs are plotted here; jointly explained 20 and 16.2% of the variance, respectively. HOL, Holstein; JER, Jersey; RDC, Nordic Red Cattle. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 56 Genomic deletions in dairy cattle Figure 6. Population stratification based on V (a measure of differentiation for SV, highly correlated with Wright’s fixation index, F ). Horizontal dash line in- ST ST dicates highly stratified deletion regions (V  Mean þ 4 S.D.). Highly stratified genic-deletions, e.g. overlapping exons, introns, or UTRs, are highlighted with ST HGNC gene symbol. lethal genes (P-value 2.3  10 ; one-sided Wilcoxon test), and thus 3.4. Functional impact of deletions are evolutionarily less conserved. This is consistent with the rate of We annotated all the deletions using Variant Effect Predictor (Ensembl evolution seen in essential and non-essential genes—where mutations 87). Around 71% (6,019 SVs) variants were intergenic and remaining in essential genes were under strong purifying selection and thus 29% (2,461 SVs) overlapped genic elements, such as exons, introns, and evolved slowly (low dN/dS ratio), while non-essential genes were un- UTR. On average, high frequency gene disrupting deletions were some- der relaxed selection, and hence, evolved faster (high dN/dS ratio). what depleted compared with intergenic variants (VAF > intergenic Nonetheless, robustness of these processes is also evident in the evolu- VAF , P-value ¼ 0.04; one-sided Wilcoxon test). Furthermore, we genic tion of human essential genes. Interestingly, 77% human essential observed many common genic deletions. These genes are relatively less genes could even be traced back to pre-metazoans. conserved, and majority has multiple paralogs (discussed later). However, deletions on known essential genes were only observed as het- erozygote with relatively low VAF (<3%), and generally were private to 3.4.2. Nonessential genes in cattle a specific breed. For example, FANCI deletions (cause brachyspina ) In total, we found 5,000 deletions for which at least one individual were only observed in Holstein, and RNASEH2B deletions (cause em- was homozygous. In the set, we analysed homozygous deletions in bryonic lethality ) in Nordic Red Cattle. genes to find natural gene knockouts. We found 167 deleted genes (transcript-ablation or complete deletion) corresponding to 115 inde- 3.4.1. Selective constraints on genes overlapping pendent deletions that are apparently nonessential based on the oc- deletions currences of live homozygote individuals. This is 45% more than The relative abundance of high-frequency genic and intergenic variants the previous report. Nonetheless, we found 44% fewer genes indicate that majority of these intersected genes are non-essential, and compared with in humans (240 nonessential genes), which could thus did not affect the viability or fecundity of the carriers. To test this be due to the differences in sample size (175 vs 2,504 individuals) hypothesis, we analysed the selective constraints between deleted genes and study populations (3 vs 26 populations in human). (overlap of any genic element) and known mouse lethal genes (from Among these genes, 83% (139 genes) are protein-coding, 12% Dickinson et al. )interms of dN/dS ratio of cow-mouse 1-to-1 ortho- pseudogenes, and the rest are different types of small RNAs logues (Fig. 7). Here, high dN/dS values indicate low selective con- (Supplementary Table S6). Most of these genes belong to multigenic straints on genes, and low value indicates high constraints. We found families and are not highly conserved (median cow-mouse dN/dS of that genes in deletions have significantly higher dN/dS ratios than 0.17 vs OMIA genes dN/dS of 0.11; Supplementary Fig. S6), as Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 57 Figure 7. Difference between dN/dS ratios of mouse-lethal and deletion-overlapped genes in cattle. Cow genes for which one-to-one mouse orthologues avail- able were considered for a one-sided Wilcoxon rank-sum test. Mouse lethal genes are from Dickinson et al. expected for homozygous deletion. Moreover, this set of genes are Table 2. Enrichment of QTL on deletions functionally enriched in immunoglobulin domains, olfactory receptors, 22 22 6 and MHC classes (FDR ¼ 2.06  10 ,2.06  10 ,7.01  10 , Trait classes Fold enrichment P value (Fisher’s test) respectively), along with other related domains (Supplementary Tables Health 2 8.91  10 S7–S9). Similar functional enrichment of nonessential genes was also Reproduction 1.5 7.4  10 seen in humans. Olfactory receptor related genes are well known for Milk 0.8 2.45  10 extensive gains and losses in mammalian evolution. And population Exterior 0.5 1.85  10 specific copy-number variations of olfactory receptor genes were also Production 0.5 0.002 71 72 reported in human (deletions) and cattle (gains). Nevertheless, this Meat and Carcass 0.5 0.058 is the first report, to our knowledge, of homozygous deletion of olfac- a 42 Trait classes are from cattleQTLdb. QTL from autosomes of Holsteins, tory receptor genes in cattle. Jersey, Nordic Red Cattle, and Ayrshires were considered for Fisher’s exact test (two-sided). 3.4.3. QTL enrichment We next explored the enrichment (or depletion) of QTL on deleted regions (at least 1 bp overlap with deletion). We retrieved 24 K au- to no sequence homology, and thus could be characterized by micro- tosomal QTL from QTLdb reported to be associated with any of the homologies or simple blunt ends at the breakpoint junction. six trait classes, e.g. ‘Health’, ‘Reproduction’, ‘Milk’, ‘Exterior’, Breakpoint information is crucial for understanding the mechanism, ‘Production’, and ‘Meat and Carcass’. The association of deletions and therefore, we analysed 29 breakpoint resolved deletions from our with diseases, fitness or fertility related traits is well evident. Hence, validation set. We found that 24 of 29 deletions contain microhomology we suspected enrichment of fitness and fertility related traits for our ranging from 2 to 31 bp at the breakpoint, andtwo of whichalsocon- deletions. As expected, health (2-fold) and reproduction (1.5-fold) re- tain insertions (Supplementary Table S2). In addition, four deletions ex- lated QTL were significantly enriched, while other trait classes were hibited non-reference insertion at breakpoint junctions, and one deletion highly depleted (Table 2). Higher enrichment of health related QTL with no apparent homology. However, the number of breakpoint se- could be driven by immune-system genes, which were also highly en- quences analysed here were not a robust representation of our deletion riched in our dataset (discussed earlier). call-set (<0.5% deletions), though selected randomly (for validation), we were able to demonstrate that majority of deletions contain microho- 3.5. Deletion formation mechanisms moloy at breakpoint, followed by few insertions, and rarely with no ho- Finally we explored the probable mechanisms of deletion formation. mology. Our results largely agree with the trend reported for large There are two key mechanisms of SVs formation (for detail see re- deletions in humans, e.g. 70.8% deletions exhibited microhomology/ho- 19,73 view ); for example, recurrent SVs often result from ‘non-allelic 16 mology and 16.1% insertions at the breakpoint. homologous recombination’ between large low-copy repeats (LCRs), and thus, contain extensive sequence homology provided by LCRs, 3.6. Limitations such as segmental duplicates, at the flanking regions. In contrast, non-recurrent SVs often form either by ‘microhomology-mediated This study only focused on identifying deletions in cattle because of end joining’ or ‘non-homologus end joining’, which requires limited their potential relevance to loss-of-function and embryonic lethality. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 58 Genomic deletions in dairy cattle 2. Cole, J.B., Null, D.J. and VanRaden, P.M. 2016, Phenotypic and genetic However, we had limited success to identify small deletions, such as effects of recessive haplotypes on yield, longevity, and fertility. J Dairy <200 bp due to reduced sensitivity of the SV caller. It is also not a Sci., 99, 7274–88. comprehensive list of deletions for these samples, since we could 3. Weischenfeldt, J., Symmons, O., Spitz, F. and Korbel, J. O. 2013, have missed many true deletions due to sensitivity, coverage, or strin- Phenotypic impact of genomic structural variation: insights from and for gent filtering (among other reasons). Furthermore, the short read human disease. Nat. Rev. Genet., 14, 125–138. length (100 bp) in our WGS dataset also made it difficult to resolve 4. Zarrei, M., MacDonald, J.R., Merico, D. and Scherer, S.W. 2015, A copy breakpoints from regions of long repeats. number variation map of the human genome. Nat Rev Genet, 16, 172–183. 3.7. Conclusions 5. Bickhart, D.M. and Liu, G.E. 2014, The challenges and importance of structural variation detection in livestock. Front. Genet., 5, 37. Loss-of-function variants are responsible for a substantial yearly- 6. Xu, L., Cole, J.B., Bickhart, D.M., et al. 2014, Genome wide CNV analy- economic loss in dairy industry, where a limited number of elite sires sis reveals additional variants associated with milk production traits in are in extensive use for rapid genetic gains. Mapping of such variants Holsteins. BMC Genomics, 15, 683. is essential for effective breeding planning and genomic selection. Here 7. Charlier, C., Agerholm, J. S., Coppieters, W., et al. 2012, A deletion in the we showed an NGS-based analytical framework suitable for bovine FANCI gene compromises fertility by causing fetal death and bra- population-scale mapping of large deletions in cattle, leveraging the chyspina. PLoS One, 7, e43085. available WGSs. Here we described population-genetic, functional, 8. Schutz, E., Wehrhahn, C., Wanjek, M., et al. 2016, The holstein friesian and evolutionary properties of discovered deletions. We identified and lethal haplotype 5 (HH5) results from a complete deletion of TBF1M and confirmed a 525 KB deletion on chromosome 23, causing stillbirth cholesterol deficiency (CDH) from an ERV-(LTR) insertion into the cod- ing region of APOB. PLoS One, 11, e0154602. in Nordic Red Cattle. We demonstrated that Nordic Red Cattle had 9. Kadri, N.K., Sahana, G., Charlier, C., et al. 2014, A 660-Kb deletion with higher population diversity than Holstein and Jersey, and deletion- antagonistic effects on fertility and milk production segregates at high fre- genotype could recapitulate genetic structure of these breeds. Natural quency in Nordic Red cattle: additional evidence for the common occur- gene knockouts are enriched for immune-related and olfactory recep- rence of balancing selection in livestock. PLoS Genet., 10, e1004049. tor genes. We also showed that deletions are significantly enriched for 10. Sahana, G., Iso-Touru, T., Wu, X., et al. 2016, A 0.5-Mbp deletion on bo- health and fertility related QTL, while depleted for production related vine chromosome 23 is a strong candidate for stillbirth in Nordic Red cat- QTL. Our population genetic and functional analysis showed promise tle. Genet. Sel. Evol., 48, 35. for inclusion of SVs in genomic studies in dairy cattle. This deletion 11. Liu, G. E., Hou, Y., Zhu, B., et al. 2010, Analysis of copy number varia- catalog will facilitate discovery, genotyping, and imputation of dele- tions among diverse cattle breeds. Genome Res, 20, 693–703. tions in large cohorts of animals, and subsequent studies for gene map- 12. Hou, Y., Liu, G. E., Bickhart, D. M., et al. 2011, Genomic characteristics of cattle copy number variations. BMC Genomics, 12, 127. ping and genomic prediction of breeding values. 13. Xu, L., Hou, Y., Bickhart, D. M., et al. 2016, Population-genetic proper- ties of differentiated copy number variations in cattle. Sci. Rep., 6, 23161. Acknowledgements 14. Alkan, C., Coe, B. P. and Eichler, E. E. 2011, Genome structural variation discovery and genotyping. Nat. Rev. Genet., 12, 363–376. Md Mesbah-Uddin benefited from a joint grant from the European Commission 15. Sudmant, P. H., Rausch, T., Gardner, E. J., et al. 2015, An integrated within the framework of the Erasmus-Mundus joint doctorate ‘EGS-ABG’. map of structural variation in 2,504 human genomes. Nature, 526, 75–81. 16. Mills, R. E., Walter, K., Stewart, C., et al. 2011, Mapping copy number Funding variation by population-scale genome sequencing. Nature, 470, 59–65. This research was supported by the Center for Genomic Selection in Animals and 17. Yalcin, B., Wong, K., Agam, A., et al. 2011, Sequence-based characteriza- Plants (GenSAP) funded by Innovation Fund Denmark (grant 0603-00519B). tion of structural variation in the mouse genome. Nature, 477, 326–329. 18. Chen, K., Chen, L., Fan, X., Wallis, J., Ding, L. and Weinstock, G. 2014, TIGRA: a targeted iterative graph routing assembler for breakpoint as- Data availability sembly. Genome Res., 24, 310–317. 19. Carvalho, C. M. and Lupski, J. R. 2016, Mechanisms underlying structural All relevant results are within the paper and its Supplementary data files. VCF variant formation in genomic disorders. Nat Rev Genet, 17,224–238. file with deletion calls could be found at https://github.com/MMesbahU/ 20. Daetwyler, H. D., Capitan, A., Pausch, H., et al. 2014, Whole-genome se- Deletions_in_cattle. WGSs of 44 samples (out of 175) are available from the quencing of 234 bulls facilitates mapping of monogenic and complex NCBI Sequence Read Archive (project accession numbers SRP039339 and traits in cattle. Nat. Genet., 46, 858–865. SRP065105). Among the 175 samples, 144 are from Run 6 of 1KBGP. Rest of 21. Brondum, R. F., Guldbrandtsen, B., Sahana, G., Lund, M. S. and Su, G. data are available only upon agreement with the commercial breeding organiza- 2014, Strategies for imputation to whole genome sequence using a single tion and should be requested directly from the senior author (G.S.: goutam.saha- or multi-breed reference population in cattle. BMC Genomics, 15, 728. na@mbg.au.dk) or the Center Director (M.S.L.: mogens.lund@mbg.au.dk). 22. Jansen, S., Aigner, B., Pausch, H., et al. 2013, Assessment of the genomic Conflict of interest: None declared. variation in a cattle population by re-sequencing of key animals at low to medium coverage. BMC Genomics, 14, 446. 23. Boussaha, M., Esquerre, D., Barbieri, J., et al. 2015, Genome-wide study of structural variants in bovine holstein, montbeliarde and normande Supplementary data dairy breeds. PLoS One, 10, e0135931. Supplementary data are available at DNARES online. 24. Chen, L., Chamberlain, A. J., Reich, C. M., Daetwyler, H. D. and Hayes, B. J. 2017, Detection and validation of structural variations in bovine whole-genome sequence data. Genet. Select. Evol., 49, 13. References 25. Li, H. and Durbin, R. 2009, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760. 1. Charlier, C., Li, W., Harland, C., et al. 2016, NGS-based reverse genetic 26. Li, H., Handsaker, B., Wysoker, A., et al. 2009, The Sequence screen for common embryonic lethal mutations compromising fertility in Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–2079. livestock. Genome Res., 26, 1333–1341. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 M. Mesbah-Uddin et al. 59 27. McKenna, A., Hanna, M., Banks, E., et al. 2010, The Genome Analysis 53. Mao, X., Sahana, G., De Koning, D. J. and Guldbrandtsen, B. 2016, Toolkit: a MapReduce framework for analyzing next-generation DNA se- Genome-wide association studies of growth traits in three dairy cattle quencing data. Genome Res., 20, 1297–1303. breeds using whole-genome sequence data. J. Anim. Sci., 94, 1426–37. 28. Handsaker, R. E., Korn, J. M., Nemesh, J. and McCarroll, S. A. 2011, 54. Brondum, R. F., Rius-Vilarrasa, E., Stranden, I., et al. 2011, Reliabilities Discovery and genotyping of genome structural polymorphism by se- of genomic prediction using combined reference data of the Nordic Red quencing on a population scale. Nat. Genet., 43, 269–276. dairy cattle populations. J. Dairy Sci., 94, 4700–4707. 29. Karolchik, D., Hinrichs, A. S., Furey, T. S., et al. 2004, The UCSC Table 55. Xu, L., Bickhart, D. M., Cole, J. B., et al. 2015, Genomic signatures reveal Browser data retrieval tool. Nucleic Acids Res. 32, D493–496. new evidences for selection of important traits in domestic cattle. Mol. 30. Kent, W. J. 2002, BLAT–the BLAST-like alignment tool. Genome Res., Biol. Evol., 32, 711–25. 12, 656–664. 56. Cole, J. B., Waurich, B., Wensch-Dorendorf, M., Bickhart, D. M. and 31. Miller, S. A., Dykes, D. D. and Polesky, H. F. 1988, A simple salting out Swalve, H. H. 2014, A genome-wide association study of calf birth weight procedure for extracting DNA from human nucleated cells. Nucleic Acids in Holstein cattle using single nucleotide polymorphisms and phenotypes Res., 16, 1215. predicted from auxiliary traits. J Dairy Sci, 97, 3156–3172. 32. Purcell, S., Neale, B., Todd-Brown, K., et al. 2007, PLINK: a tool set for 57. Howard, J. T., Kachman, S. D., Snelling, W. M., et al. 2014, Beef cattle whole-genome association and population-based linkage analyses. Am. J. body temperature during climatic stress: a genome-wide association study. Hum. Genet., 81, 559–575. Int. J. Biometeorol., 58, 1665–1672. 33. Redon, R., Ishikawa, S., Fitch, K. R., et al. 2006, Global variation in copy 58. Buitenhuis, B., Poulsen, N. A., Gebreyesus, G. and Larsen, L. B. 2016, number in the human genome. Nature, 444, 444–454. Estimation of genetic parameters and detection of chromosomal regions 34. Wright, S. 1931, Evolution in Mendelian Populations. Genetics, 16, affecting the major milk proteins and their post translational modifica- 97–159. tions in Danish Holstein and Danish Jersey cattle. BMC Genet., 17, 114. 35. McLaren, W., Gil, L., Hunt, S. E., et al. 2016, The Ensembl Variant Effect 59. McClure, M. C., Morsci, N. S., Schnabel, R. D., et al. 2010, A genome Predictor. Genome Biol., 17, 122. scan for quantitative trait loci influencing carcass, post-natal growth and 36. Finn, R. D., Attwood, T. K., Babbitt, P. C., et al. 2017, InterPro in reproductive traits in commercial Angus cattle. Anim. Genet., 41, 2017-beyond protein family and domain annotations. Nucleic Acids Res., 597–607. 45, D190–D199. 60. Sahana, G., Guldbrandtsen, B. and Lund, M. S. 2011, Genome-wide asso- 37. Finn, R. D., Coggill, P., Eberhardt, R. Y., et al. 2016, The Pfam protein ciation study for calving traits in Danish and Swedish Holstein cattle. J. families database: towards a more sustainable future. Nucleic Acids Res., Dairy Sci., 94, 479–486. 44, D279–285. 61. McClure, M. C., Ramey, H. R., Rolf, M. M., et al. 2012, Genome-wide 38. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. and Morishima, K. association analysis for quantitative trait loci influencing Warner-Bratzler 2017, KEGG: new perspectives on genomes, pathways, diseases and shear force in five taurine cattle breeds. Anim. Genet., 43, 662–673. drugs. Nucleic Acids Res., 45, D353–61. 62. Snelling, W. M., Allan, M. F., Keele, J. W., et al. 2010, Genome-wide as- 39. Szklarczyk, D., Franceschini, A., Wyder, S., et al. 2015, STRING v10: sociation study of growth in crossbred beef cattle. J. Anim. Sci., 88, protein-protein interaction networks, integrated over the tree of life. 837–848. Nucleic Acids Res., 43, D447–52. 63. Cole, J. B., Wiggans, G. R., Ma, L., et al. 2011, Genome-wide association 40. Yates, A., Akanni, W., Amode, M. R., et al. 2016, Ensembl 2016. Nucleic analysis of thirty one production, health, reproduction and body confor- Acids Res., 44, D710–6. mation traits in contemporary U.S. Holstein cows. BMC Genomics, 12, 41. Kinsella, R. J., Kahari, A., Haider, S., et al. 2011, Ensembl BioMarts: a 408. hub for data retrieval across taxonomic space. Database (Oxford), 2011, 64. Lobago, F., Gustafsson, H., Bekana, M., Beckers, J. F. and Kindahl, H. bar030. 2006, Clinical features and hormonal profiles of cloprostenol-induced 42. Dickinson, M. E., Flenniken, A. M., Ji, X., et al. 2016, High-throughput early abortions in heifers monitored by ultrasonography. Acta Vet. discovery of novel developmental phenotypes. Nature, 537, 508–14. Scand., 48, 23. 43. Hu, Z. L., Park, C. A. and Reecy, J. M. 2016, Developmental progress and 65. Sandri, M., Stefanon, B. and Loor, J. J. 2015, Transcriptome profiles of current status of the Animal QTLdb. Nucleic Acids Res., 44, D827–33. whole blood in Italian Holstein and Italian Simmental lactating cows di- 44. RStudio Team 2016, RStudio: integrated development environment for R. verging for genetic merit for milk protein. J. Dairy Sci., 98, 6119–6127. RStudio, Inc.: Boston, MA. 66. Araujo, R. N., Padilha, T., Zarlenga, D., et al. 2009, Use of a candidate 45. R Core Team 2016, R: A language and environment for statistical com- gene array to delineate gene expression patterns in cattle selected for resis- puting. R Foundation for Statistical Computing: Vienna, Austria. tance or susceptibility to intestinal nematodes. Vet. Parasitol., 162, 46. Quinlan, A. R. and Hall, I. M. 2010, BEDTools: a flexible suite of utilities 106–115. for comparing genomic features. Bioinformatics, 26, 841–842. 67. Fang, L., Sahana, G., Su, G., et al. 2017, Integrating sequence-based 47. Lappalainen, I., Lopez, J., Skipper, L., et al. 2013, DbVar and DGVa: GWAS and RNA-seq provides novel insights into the genetic basis of mas- public archives for genomic structural variation. Nucleic Acids Res., 41, titis and milk production in dairy cattle. Sci. Rep., 7, 45560. D936–941. 68. Hurst, L. D. and Smith, N. G. 1999, Do essential genes evolve slowly? 48. McCarroll, S. A., Kuruvilla, F. G., Korn, J. M., et al. 2008, Integrated de- Curr. Biol., 9, 747–50. tection and population-genetic analysis of SNPs and copy number varia- 69. Blomen, V. A., Majek, P., Jae, L. T., et al. 2015, Gene essentiality and syn- tion. Nat. Genet., 40, 1166–74. thetic lethality in haploid human cells. Science, 350, 1092–6. 49. Conrad, D. F., Pinto, D., Redon, R., et al. 2010, Origins and functional 70. Niimura, Y. and Nei, M. 2007, Extensive gains and losses of olfactory re- impact of copy number variation in the human genome. Nature, 464, ceptor genes in mammalian evolution. PLoS One, 2, e708. 704–12. 71. Van Ziffle, J., Yang, W. and Chehab, F. F. 2011, Homozygous deletion of 50. Handsaker, R. E., Van Doren, V., Berman, J. R., et al. 2015, Large multi- six olfactory receptor genes in a subset of individuals with allelic copy number variations in humans. Nat. Genet., 47, 296–303. Beta-thalassemia. PLoS One, 6, e17327. 51. Zhang, Q., Guldbrandtsen, B., Bosse, M., Lund, M. S. and Sahana, G. 72. Lee, K., Nguyen, D. T., Choi, M., et al. 2013, Analysis of cattle olfactory 2015, Runs of homozygosity and distribution of functional variants in the subgenome: the first detail study on the characteristics of the complete cattle genome. BMC Genomics, 16, 542. olfactory receptor repertoire of a ruminant. BMC Genomics, 14, 596. 52. Bovine HapMap, C., Gibbs, R. A., Taylor, J. F., et al. 2009, 73. Hastings, P. J., Lupski, J. R., Rosenberg, S. M. and Ira, G. 2009, Genome-wide survey of SNP variation uncovers the genetic structure of Mechanisms of change in gene copy number. Nat. Rev. Genet., 10, cattle breeds. Science, 324, 528–32. 551–64. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/49/4107864 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Journal

DNA ResearchOxford University Press

Published: Feb 1, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off