Abstract Mice have been widely used as a model organism to investigate human gene–phenotype relationships based on a conjecture that orthologous genes generally perform similar functions and are associated with similar phenotypes. However, phenotypes associated with orthologous genes often turn out to be quite different between human and mouse. Herein, we devised a method to quantitatively compare phenotypes annotations associated with mouse models and human. Using semantic similarity comparisons, we identified orthologous genes with different phenotype annotations, of which the similarity score is on a par with that of random gene pairs. Analysis of sequence evolution and transcriptomic changes revealed that orthologous genes with phenotypic differences are correlated with changes in noncoding regulatory elements and tissue-specific expression profiles rather than changes in protein-coding sequences. To map accurate gene–phenotype relationships using model organisms, we propose that careful consideration of the evolutionary divergence of noncoding regulatory elements and transcriptomic profiles is essential. gene–phenotype relationship, mouse model, orthologous gene, noncoding regulatory element, transcriptomic profile Introduction Systematic genetic screening of model organisms has become a crucial tool for understanding human gene–phenotype relationships (White et al. 2013; Sun et al. 2016). Model organisms have been used to reveal phenotypic outcomes related to the manipulation of human orthologous genes, since their genetic backgrounds can be controlled through the breeding of isogenic lines (Aitman et al. 2011; Schughart et al. 2013). In particular, mice have been widely used as a model organism for probing the detailed molecular mechanisms underlying a disease that cannot be studied directly in human subjects due to ethical reasons. Mice engage in rapid reproduction, have a short life span, and are relatively easy to handle and manipulate at the molecular level compared with other mammals (Schofield et al. 2012). However, phenotypic differences in orthologous genes between human and mouse impose challenges when studying human phenotypes using mouse genetic approaches. Human phenotyping based on mouse genetics has been performed based on “orthology–function conjecture,” an assumption that orthologous genes generally perform conserved functions across species and are thus associated with similar phenotypes (Tatusov et al. 1997; Gabaldon and Koonin 2013). On the basis of this conjecture, phenotypes of mouse genes have been catalogued to interpret the phenotypes of human orthologous genes (Eppig et al. 2015). However, experiments have shown that the phenotypes associated with orthologous genes often differ (Kim et al. 2012). Specifically, knockout or gene manipulation using mouse models often do not present the expected phenotype observed in human subjects. For example, mutations of human SPTLC2 were reportedly associated with hereditary sensory and autonomic neuropathy type 1, a human genetic disorder of the neurological system (Rotthier et al. 2010). However, mutation of mouse SPTLC2 only shows abnormalities in liver physiology from the problem of circulating phospholipid (Hornemann et al. 2009). These phenotypic differences might be caused by evolutionary differences in the human and mouse genomes as a result of the ∼100 Ma divergence since the last common ancestor of these species, and possibly to changes in reproduction rate and body size (Benton and Donoghue 2007). However, it is unclear 1) which orthologous genes have undergone significant phenotypic changes between human and mouse, and 2) which types of molecular evolution are able to explain the phenotypic differences associated with orthologous gene. A quantitative semantic comparison of phenotype annotations across species is required to identify orthologous genes that underwent significant phenotypic changes. Existing resources for phenotype mapping allow systematic comparison of gene–phenotype relationships between human and mouse. Specifically, numerous methods for gene–phenotype mappings are available, including systematic phenotyping of mouse mutants (White et al. 2013; Eppig et al. 2015), and the association of human disease phenotypes and genes that have been investigated by various biomedical studies on patient cohorts, aided by genome-wide association studies (Welter et al. 2014; Amberger et al. 2015). Recent advances in ontology and text-mining techniques have helped to organize semantic representation of phenotype annotations, which is crucial for quantitative phenotype comparisons, so that it is now possible to connect the homologous or synonymous annotation of phenotype ontologies (Mungall et al. 2012; Smedley et al. 2013; Robinson and Webber 2014; Deans et al. 2015). In the present study, we devised a method for the systematic quantification of phenotypic differences in human and mouse orthologous genes. Specifically, we developed a phenotype similarity (PS) scoring method to build a statistical framework for the semantic comparison of phenotype ontologies associated with human and mouse orthologous genes. The PS score allowed us to quantify the statistical significance of similar or dissimilar phenotypes associated with orthologous gene pairs without potential bias from incomplete or excessive gene-phenotyping resources. A large number of orthologous genes yielded different phenotype annotations in human and mouse. Moreover, the PS score correctly captured phenotypic differences in a reference gene set that was previously linked to different phenotypes in human and mouse (Liao and Zhang 2008). Furthermore, we investigated molecular evolutionary features in an attempt to explain phenotypic differences between orthologous genes. We analyzed sequence divergence of orthologous genes and discovered that phenotypic differences were correlated with divergence of noncoding regulatory elements, whereas sequence divergence features of protein-coding regions were not a strong indicator of phenotypic differences. Recently, advances in comparative genomic and transcriptomic analyses revealed that divergence of noncoding regulatory elements of orthologous genes potentially contribute to changes in gene–phenotype relationships across species (Carroll 2008; Forrest et al. 2014; Yue et al. 2014). Consistently, from our transcriptome profile analysis, we confirmed that changes in gene expression triggered by regulatory divergence could explain phenotypic differences between orthologous gene pairs. Our results suggest careful consideration of evolutionary divergence of noncoding regulatory elements and transcriptome profiling is necessary to accurately and reliably map human gene–phenotype relationships using model organisms. Results Quantitative Phenotype Comparison of Human and Mouse Orthologous Genes We devised the PS score, a quantitative measure for comparing phenotype annotations associated with human and mouse orthologous genes (fig. 1; see Materials and Methods for details). Briefly, the PS score assesses the statistical significance of phenotypic similarity based on a comparison of the semantic similarity scores of orthologous gene pairs to those of random gene pairs. Semantic similarity scores were calculated by comparing human phenotype ontology (HPO) and mammalian phenotype ontology (MPO) (fig. 1a), adapted from phenotype comparison of disease genes and mouse models (PhenoDigm), providing pairwise similarity of HPO and MPO terms based on lexical matching or bridging with cross-species ontologies of anatomical and phenotypic quality annotations (Smedley et al. 2013; Eppig et al. 2015; Groza et al. 2015). Gene–phenotype relationships in mice were compiled from the mouse genome informatics (MGI) database (Eppig et al. 2015). The Online Mendelian inheritance in man (OMIM) database (Amberger et al. 2015) and HPO (Groza et al. 2015) were used to compile human gene–disease phenotype relationships (fig. 1b). In gene–phenotype relationships, a gene is usually associated with multiple phenotype annotations due to pleiotropic effects. Therefore, to build an unbiased semantic comparison from multiple annotations, best scoring matches for each phenotype annotation were selected and averaged to calculate the semantic similarity score (supplementary fig. S1, Supplementary Material online). Quantitative phenotype comparisons were performed on 2,142 one-to-one orthologous gene pairs with associated phenotype annotations in both species (fig. 1c). A total of 3,139 human genes associated with diseases were obtained using OMIM, and 7,546 mouse genes associated with phenotype annotations were obtained using MGI. Fig. 1. View largeDownload slide Phenotype similarity (PS) score of orthologous human and mouse genes. (a) Comparison of phenotype terms between human and mouse genes. Orange and cyan colored balls indicate human and mammalian phenotype ontology terms, respectively. (b) Human and mouse gene–phenotype relationships were displayed based on ontology mapping. (c) Orthologous genes and phenotype ontology mapping of human and mouse genes. (d) A comparison of PS scores calculated from orthologous and random gene pairs. The PS score distribution of random gene pairs is indicated by grey bars. (e) Example of high PS score gene (HPG), PLN, displayed using the semantic similarity score matrix. (f) Example of a low PS score gene (LPG), SPTLC2, displayed using the semantic similarity score matrix. Fig. 1. View largeDownload slide Phenotype similarity (PS) score of orthologous human and mouse genes. (a) Comparison of phenotype terms between human and mouse genes. Orange and cyan colored balls indicate human and mammalian phenotype ontology terms, respectively. (b) Human and mouse gene–phenotype relationships were displayed based on ontology mapping. (c) Orthologous genes and phenotype ontology mapping of human and mouse genes. (d) A comparison of PS scores calculated from orthologous and random gene pairs. The PS score distribution of random gene pairs is indicated by grey bars. (e) Example of high PS score gene (HPG), PLN, displayed using the semantic similarity score matrix. (f) Example of a low PS score gene (LPG), SPTLC2, displayed using the semantic similarity score matrix. The phenotypic similarity of human and mouse orthologous genes was assessed by statistical significance of the PS scores based on orthology–function conjecture. We compared semantic similarity scores with random expectation by shuffling of orthologous relationships having a similar number of ontology terms (supplementary figs. S2–S4, Supplementary Material online; see Materials and Methods). The PS score removes the bias of the semantic similarity score toward a greater number of phenotype ontology terms, possibly due to the higher random chance of best scoring matches when comparing a greater number of terms. We observed that the average and standard deviation of semantic similarity scores were affected by the number of HPO or MPO terms. However, when we converted them to the PS score, bias from the number of phenotype terms was eliminated (supplementary fig. S3, Supplementary Material online). Consequently, we compiled 642 HPGs, 642 LPGs, and 858 MPGs based on the PS score (fig. 1d). The High phenotype-similarity gene (HPG) category was applicable when the PS score of a gene was ranked in the top 30% of high phenotypic similarity among orthologous genes (P ≤ 0.0133). Conversely, the Low phenotype-similarity gene (LPG) category was used when the PS score was ranked in the bottom 30% of low phenotypic similarity, and therefore not significantly different random gene pairs (P ≥ 0.2486). The Moderate phenotype-similarity gene (MPG) category applied when the PS score was ambiguous and mapped neither to LPG nor HPG. Lists of HPGs/LPGs/MPGs and their detailed semantic similarity comparisons of each orthologous gene are provided in the Supplementary Material (supplementary data S1, Supplementary Material online) and our companion site (https://sbi.postech.ac.kr/w/PS). Validation of PS Score To confirm that the PS score correctly captured phenotypic differences between orthologous gene pairs, we analyzed the PS scores of genes, manually verified to display phenotypic differences for lethality according to literature curation from a different study (Liao and Zhang 2008); null mutation of human orthologous genes cause disease symptoms culminating in death before puberty, whereas knock-out of mouse orthologous genes did not result in lethal phenotypes (supplementary table S1, Supplementary Material online). Specifically, human and mouse orthologous gene pairs with the verification of phenotypic differences displayed significantly low PS scores compared with the average PS scores of orthologous gene pairs verified to have similar phenotypes for lethality in Liao et al. (P = 0.011, Mann–Whitney U Test; supplementary fig. S5, Supplementary Material online, verified gene set of similar phenotypes was presented in supplementary table S2, Supplementary Material online). For example, DMPK, one of the verified genes with phenotypic difference, has a low PS score (0.57). Indeed, mutation of DMPK is associated with severe congenital symptoms of myotonic dystrophy type-1 (ICD: 359.21) (Boucher et al. 1995). However, similar phenotypes were not observed in equivalent DMPK mutants in mice (Reddy et al. 1996; van den Broek et al. 2002). In the semantic similarity matrix of DMPK, the severe congenital symptoms present in the fetus were mapped in HPO terms, such as decreased fetal movement, feeding difficulties in infancy, cerebral atrophy, and respiratory distress. However, similar terms were not found in the MPO terms (search DMPK at https://sbi.postech.ac.kr/w/PS). Overall, low PS scores of the verified genes indicated that our method for phenotypic comparison of mouse and human genes correctly captured the phenotypic differences between orthologous genes. Whether mutation types found in human and mouse are different and affect the PS scoring and classification of LPGs and HPGs is an important point. Specifically, for missense mutations (low impact) in human, mouse and human phenotypes would be different because most mouse mutations are derived from knock-out studies in which gene deletion could have a high impact on gene function. To answer this question, we divided orthologous genes into three groups based on mutation type; missense (low impact), truncating (high impact), and others (see Materials and Methods). We found that mutation type did not have a significant influence on the classification of LPGs and HPGs. Genes with different mutation types showed a similar distribution of PS scores (supplementary fig. S6, Supplementary Material online; missense vs. truncating, P = 0.259; missense vs. other, P = 0.390; truncating vs. other, P = 0.401; Mann–Whitney U Test), suggesting that PS scores were not significantly affected by particular mutation types. Indeed, LPGs and HPGs included a similar proportion of mutation types in missense, truncating, and other groups (P = 0.113, Chi-square Test; supplementary table S3, Supplementary Material online). We also confirmed that mouse mutation types did not have a substantial influence on the classification of LPGs and HPGs since only small number of genes were investigated by hypomorphic mutations (N = 7 for LPGs, N = 5 for HPGs; see Materials and Methods). Most mouse mutations were assayed by knock-out experiments (N = 610 for LPGs, N = 590 for HPGs). Orthologous Genes Cover a Wide Range of PS Scores Orthologous genes displayed significantly higher PS scores than random gene pairs (P < 10−308, one-sample Kolmogorov–Smirnov Test; fig. 1d) which could be easily explained by orthology–function conjecture (Gabaldon and Koonin 2013). The conjecture postulates that orthologous genes tend to have the same functions and possibly share similar phenotypes because their sequences and functions have been evolutionarily conserved since speciation from the last common ancestor (Tatusov et al. 1997). For example, the phenotypes of the HPG cardiac phospholamban (PLN) and its mouse orthologous gene (Pln) are quite similar, and have a high PS score of 4.94 (fig. 1e). The PLN protein is known function in calcium homeostasis in heart muscle (Ceholski et al. 2012). In human, PLN is associated with cardiomyopathy (ICD: 425.4), of which phenotypes include ventricular arrhythmia/hypertrophy, atrial fibrillation and reduced systolic function (Schmitt et al. 2003). In mouse, Pln is also associated with abnormal ventricle pressure, cardiac muscle contractility/relaxation and myocardial fiber physiology (Schmitt et al. 2003). Thus, a high PS score suggests that the mouse phenotypes of orthologous genes are useful for investigating human diseases with similar phenotypes, and the mouse model successfully captures the human disease phenotype in this case. In contrast, orthologous genes often display dissimilar phenotypes in human and mouse. For example, the phenotypes of Serine palmitoyltransferase 2 (SPTLC2) and its mouse ortholog (Sptlc2) were very different (PS score = −0.41, fig. 1f). The phenotypes of human patients that have mutations in SPTLC2 are associated with problems in the neurological system, but mouse Sptlc2 mutations were instead associated with problems in liver physiology and phospholipid circulations. In human, SPTLC2 is linked to hereditary sensory and autonomic neuropathy type 1 (ICD: 356.2), of which phenotypes include sensorimotor neuropathy, distal sensory loss of all modalities or impairment (Rotthier et al. 2010). In contrast, Sptlc2 in mouse is linked to phenotypes related to sphingomyelin and cholesterol circulations (Hojjati et al. 2005). Sequence Divergence in Protein-Coding Genes Cannot Explain Phenotypic Differences between Orthologous Genes We found that most LPGs have quite different phenotype annotations in human and mouse. One might expect that sequence divergence would affect phenotypic differences of LPGs because orthologous genes usually have high sequence similarity, but genetic drift between species triggered the divergence of coding regions, changing the phenotypes between species. However, we found that sequence divergence of protein-coding regions could not significantly explain phenotypic differences between orthologous gene pairs (fig. 2 and supplementary fig. S7, Supplementary Material online). To quantify the sequence divergence of orthologous genes, we aligned pairwise coding sequences and calculated the percentage of aligned amino acids (i.e., the sequence identity). When we analyzed the association between sequence identity and the PS score of orthologous gene pairs, sequence identity of protein-coding regions could not significantly explain the variance of PS scores (β = −1.87 × 10−3 and P = 0.487, Generalized Linear Model; supplementary fig. S7a, Supplementary Material online, see Materials and Methods). Likewise, we found that there were no significant differences in sequence identity among LPGs, MPGs, and HPGs (P = 0.997, one-way ANOVA for ranks, P = 0.475, post hoc test between LPG and HPG, Mann–Whitney U Test; fig. 2a). Fig. 2. View largeDownload slide Correlation between sequence divergence and PS score. Differences in (a) sequence identity, (b) sequence length, (c) evolutionary rate, (d) changes in structural disorder, and (e) conservation of secondary structural elements between LPGs and HPGs. Examples of (f) LPG and (g) HPG are displayed on the left and right side of the correlation, respectively. Fig. 2. View largeDownload slide Correlation between sequence divergence and PS score. Differences in (a) sequence identity, (b) sequence length, (c) evolutionary rate, (d) changes in structural disorder, and (e) conservation of secondary structural elements between LPGs and HPGs. Examples of (f) LPG and (g) HPG are displayed on the left and right side of the correlation, respectively. For example, the phenotypes linked to LPG Wolframin (WFS1) are annotated differently between human and mouse, and the PS score is low (0.20; fig. 2f), even though orthologous WFS1 sequences are relatively highly conserved between human and mouse (sequence identity = 87%). For reference, the average sequence identity of all human and mouse orthologues is 85%. In human, WFS1 mutations are associated with Wolfram syndrome (OMIM: 116400) and deafness (OMIM: 600965) which display heterogeneous phenotypes in genitourinary, cardiovascular, nervous, endocrine and sensory organs, such as testicular atrophy, cardiomyopathy, schizophrenia, glucose intolerance, and sensorineural hearing impairment (Inoue et al. 1998; Cryns et al. 2003). However, mutation of mouse Wfs1 does not result in these abnormalities, although mouse mutants display several phenotypes that are similar to those in humans, such as impaired glucose tolerance and hyperactivity (Riggs et al. 2005; Kato et al. 2008). The HPG fibrocystin (PKHD1) yielded very similar phenotype annotations between human and mouse, with a high PS score of 3.52 (fig. 2g). PKHD1 mouse and human orthologs share 82% sequence identity. Human PKHD1 is associated with polycystic kidney and hepatic diseases (OMIM: 263200), and has been linked to a receptor protein that regulates centrosome duplication, as well as collecting duct and biliary differentiation (Bergmann et al. 2004). The phenotypes of PKHD1 mutations in both human and mouse orthologs are associated with abnormalities in the liver, kidney and pancreas, resulting in a high PS score (Garcia-Gonzalez et al. 2007; Woollard et al. 2007). We investigated additional sequence features of the coding regions of orthologous gene pairs to probe phenotype differences, but this failed to explain differences in LPGs and HPGs. For example, differences in gene length and evolutionary rate were not significant between LPGs and HPGs (fig. 2b and c). Insertion or deletion of disordered regions and secondary structure divergence could affect the structure and/or physicochemical properties of proteins and change promiscuous molecular interactions, which might alter protein function and hence the phenotypes of orthologous genes (Kim et al. 2008; Siltberg-Liberles 2011; Mosca et al. 2012). Therefore, we analyzed changes in disordered regions and secondary structure conservation in LPGs and HPGs, but did not identify significant differences (fig. 2d and e). Likewise, the additional sequence features could not significantly explain the variance of the PS score (supplementary fig. S7b–e, Supplementary Material online). Taken together, our results suggest that sequence features of coding regions are unable to explain phenotypic differences between human and mouse orthologous genes. We therefore investigated which types of molecular evolutionary features could explain the phenotypic differences (discussed below). Divergence of Noncoding Regulatory Sequences Is Correlated with Phenotypic Differences Phenotypic differences may be the result of changes in noncoding regulatory sequences, which are frequently observed within mammalian species (Villar et al. 2014; Young et al. 2015). Therefore, we compared the noncoding regulatory sequences of LPGs and HPGs, and found that LPGs included a greater proportion of significant sequence differences than HPGs. To calculate sequence differences, a conservation score was assigned to each nucleotide in the human genome based on the whole-genome alignment of seven mammalian species (see Materials and Methods). Sequences near the transcription start site (TSS) regions of LPGs tend to be less conserved than those of HPGs. LPGs displayed significantly lower nucleotide conservation scores within -250 bp and +50 bp from the TSS, regions that potentially contain proximal, core, and downstream promoter elements, than MPGs and HPGs (P = 6.90 × 10−4, One-way ANOVA on ranks, P = 6.00 × 10−6, for post hoc test between LPG and HPG, Mann–Whitney U Test; fig. 3a;Core et al. 2008). In particular, differences in nucleotide conservation between LPGs and HPGs were significantly greater near the TSS, indicating that they could be relevant to changes in transcriptional regulation (fig. 3b). Moreover, the nucleotide conservation score could significantly explain the variance of the PS score (β = 0.490 and P = 9.37 × 10−5, Generalized Linear Model; supplementary fig. S8, Supplementary Material online). Fig. 3. View largeDownload slide Comparison of the conservation of noncoding regulatory sequences and elements between LPGs and HPGs. (a) Average nucleotide conservation scores within −250 bp and +50 bp from the TSS compared between LPGs and HPGs. (b) Base-to-base mapping of nucleotide conservation scores of LPGs and HPGs. Upstream regions of (c) Wfs1 (d) Pkhd1 genes visualized using the UCSC Genome browsers for mouse (NCBI37/mm9) and human (GRCh37/hg19) genome assembly. Histone modification marks for Wfs1 and Pkhd1 were quantified from the cerebellum and liver, respectively. Histone modification marks indicate the enrichment signal from ChIP-seq data in mouse and human using ENCODE (Yue et al. 2014). Chain alignment shows the chromosome alignment of human (hg19) and mouse genomes derived using BLASTz (Schwartz et al. 2003). Fig. 3. View largeDownload slide Comparison of the conservation of noncoding regulatory sequences and elements between LPGs and HPGs. (a) Average nucleotide conservation scores within −250 bp and +50 bp from the TSS compared between LPGs and HPGs. (b) Base-to-base mapping of nucleotide conservation scores of LPGs and HPGs. Upstream regions of (c) Wfs1 (d) Pkhd1 genes visualized using the UCSC Genome browsers for mouse (NCBI37/mm9) and human (GRCh37/hg19) genome assembly. Histone modification marks for Wfs1 and Pkhd1 were quantified from the cerebellum and liver, respectively. Histone modification marks indicate the enrichment signal from ChIP-seq data in mouse and human using ENCODE (Yue et al. 2014). Chain alignment shows the chromosome alignment of human (hg19) and mouse genomes derived using BLASTz (Schwartz et al. 2003). The emergence of species-specific regulatory elements would affect the divergence of noncoding regulatory sequences across species. We confirmed that species-specific regulatory elements were more frequently found in LPGs than HPGs. Species-specific or conserved regulatory elements were identified from sequence alignments of putative functional DNA elements of human and mouse genes predicted from ChIP-seq assays of histone modifications and transcription factor binding (see Materials and Methods; Yue et al. 2014). Specifically, LPGs have significantly more species-specific regulatory elements such as enhancers, transcription factor binding sites (TFBS) than HPGs (P = 2.32 × 10−4 in enhancers, P = 3.53 × 10−7 in TFBS, P = 0.074 in promoters, Fisher-exact Test; table 1). In contrast, HPGs contained a greater number of conserved regulatory elements than LPGs. Table 1. Conservation of cis-Regulatory Elements in LPGs and HPGs. Functional elements LPG HPG Significance P Enhancers Species-specific 28.4% (259/912) 22.3% (186/833) 2.32 × 10−4 Conserved 40.6% (370/912) 49.6% (413/833) TF-binding sites Species-specific 17.5% (200/1, 143) 10.3% (173/1, 828) 3.53 × 10−7 Conserved 66.2% (757/1, 143) 76.2% (1, 436/1, 828) Promoters Species-specific 4.4% (27/621) 2.4% (13/535) 0.074 Conserved 81.2% (504/621) 86.9% (465/535) Functional elements LPG HPG Significance P Enhancers Species-specific 28.4% (259/912) 22.3% (186/833) 2.32 × 10−4 Conserved 40.6% (370/912) 49.6% (413/833) TF-binding sites Species-specific 17.5% (200/1, 143) 10.3% (173/1, 828) 3.53 × 10−7 Conserved 66.2% (757/1, 143) 76.2% (1, 436/1, 828) Promoters Species-specific 4.4% (27/621) 2.4% (13/535) 0.074 Conserved 81.2% (504/621) 86.9% (465/535) Note.—The proportions of species-specific versus conserved promoters, enhancers and transcription factor binding sites were compared among LPGs and HPGs. Significance P-values were calculated by Fisher-exact test. Table 1. Conservation of cis-Regulatory Elements in LPGs and HPGs. Functional elements LPG HPG Significance P Enhancers Species-specific 28.4% (259/912) 22.3% (186/833) 2.32 × 10−4 Conserved 40.6% (370/912) 49.6% (413/833) TF-binding sites Species-specific 17.5% (200/1, 143) 10.3% (173/1, 828) 3.53 × 10−7 Conserved 66.2% (757/1, 143) 76.2% (1, 436/1, 828) Promoters Species-specific 4.4% (27/621) 2.4% (13/535) 0.074 Conserved 81.2% (504/621) 86.9% (465/535) Functional elements LPG HPG Significance P Enhancers Species-specific 28.4% (259/912) 22.3% (186/833) 2.32 × 10−4 Conserved 40.6% (370/912) 49.6% (413/833) TF-binding sites Species-specific 17.5% (200/1, 143) 10.3% (173/1, 828) 3.53 × 10−7 Conserved 66.2% (757/1, 143) 76.2% (1, 436/1, 828) Promoters Species-specific 4.4% (27/621) 2.4% (13/535) 0.074 Conserved 81.2% (504/621) 86.9% (465/535) Note.—The proportions of species-specific versus conserved promoters, enhancers and transcription factor binding sites were compared among LPGs and HPGs. Significance P-values were calculated by Fisher-exact test. Species-specific regulatory elements could explain the existence of orthologous genes with very low PS scores, which could not be accounted for by the divergence of coding regions. For example, the LPG WFS1 has a species-specific enhancer candidate in its upstream region in the mouse ortholog Wfs1. Specifically, histone modification marks H3K4me1/H3K27a on the mouse chromosome, which are hallmarks of enhancer activity signal, are not aligned with those on the human chromosome (Chr5: 37,354,900–37,355,900; fig. 3c; see Materials and Methods). On the other hand, the HPG PKHD1 has a conserved enhancer candidate in the 5′-UTR in the first exon of the mouse ortholog. Specifically, the histone modification marks are present in a region conserved in human and mouse chromosomes (Chr1: 20,048,050–20,049,050; fig. 3d and supplementary fig. S9, Supplementary Material online). Note that the aligned human chromosomal region (Chr6: 51,483,093–51,484,474) also includes histone modification marks in human liver, implying the conservation of enhancer sequences in human and mouse. One might ask that regulatory elements located far from protein-coding sequences could be associated with phenotypic differences. Thus, we analyzed the conserved noncoding sequences (CNS) which potentially include the distal regulatory elements posited in the intra or intergenic regions (Takahashi and Saitou 2012; Babarinde and Saitou 2016). We found that CNS throughout the mammalian species were more strongly associated with HPGs than LPGs (supplementary fig. S10a, Supplementary Material online). In contrast, primate- and rodent-specific CNSs tend to have stronger association with LPGs than HPGs (supplementary fig. S10b and c, Supplementary Material online). Taken together, our results suggest that the diverse regulatory elements, closely or distantly placed from genes, could affect the phenotypic differences of human and mouse orthologous genes. Transcriptomic Divergence Explains Phenotypic Differences between Orthologous Genes Changes in gene regulatory sequences could trigger divergence in transcription across species that might repurpose functionally conserved proteins in different cell types or tissues, and consequently change the phenotypes (Carroll 2008). Therefore, we analyzed transcriptional divergence in orthologous genes with phenotypic differences and discovered that, in many cases, phenotypic differences can be significantly explained by transcriptome divergence across species (fig. 4a and supplementary fig. S11, Supplementary Material online; β = 0.272 and P = 7.81 × 10−4, Generalized Linear Model). Specifically, expression divergence was higher in LPGs than HPGs and MPGs (P = 0.010, One-way ANOVA on ranks, P = 2.66 × 10−3, for post hoc test between LPG and HPG, Mann–Whitney U Test; fig. 4a). This trend was consistently observed using various methods for calculating expression divergence (supplementary fig. S12, Supplementary Material online). The correlation between expression divergence and PS score was not merely due to gene expression changes in particular tissues, but rather to changes in overall tissue expression (supplementary fig. S13, Supplementary Material online). To calculate expression divergence for orthologous gene pairs, we calculated the linear correlation and nonlinear dependence of gene expression profiles between human/mouse homologous tissues (see Materials and Methods and supplementary data S2, Supplementary Material online). Fig. 4. View largeDownload slide Expression conservation and PS scores. (a) Expression conservation among LPG/MPG/HPGs was compared using Pearson correlation coefficients (PCC) calculated from 13 and 21 tissues using ENCODE and FANTOM5 transcriptome databases (Forrest et al. 2014; Yue et al. 2014), respectively. Comparison of tissue-specific expression profiles of (b) WFS1 and (c) PKHD1 between human and mouse transcriptomes. Fig. 4. View largeDownload slide Expression conservation and PS scores. (a) Expression conservation among LPG/MPG/HPGs was compared using Pearson correlation coefficients (PCC) calculated from 13 and 21 tissues using ENCODE and FANTOM5 transcriptome databases (Forrest et al. 2014; Yue et al. 2014), respectively. Comparison of tissue-specific expression profiles of (b) WFS1 and (c) PKHD1 between human and mouse transcriptomes. Transcriptomic divergence is also associated with the divergence of noncoding regulatory elements. For example, a phenotypic difference between WFS1 and the mouse ortholog could be explained by divergence of regulatory elements (fig. 3c). The conservation of WFS1 expression is lower than that of the average orthologous gene pair (ρ = −0.10 and the average ρ of all orthologous genes is 0.38; fig. 4b). For example, we found that expression of WFS1 differed among many human and mouse tissues including heart and testis. Meanwhile, a similar phenotype between human and mouse PKHD1 mutations could be explained by the conservation of regulatory elements (fig. 3d). Conservation of the expression of PKHD1 was also higher than that of the average of all orthologous gene pairs (ρ = 0.63; fig. 4c). Specifically, tissue-specific expression of PKHD1 is conserved across human/mouse liver, kidney, and pancreas, which are associated with lesions in polycystic kidney and hepatic diseases. Furthermore, among all one-to-one human/mouse orthologs, genes with species-specific regulatory elements were more likely to be correlated with transcriptome divergence than those without species-specific regulatory elements (supplementary fig. S14, Supplementary Material online). Taken together, our results suggest that transcriptomic divergence is correlated with noncoding regulatory divergence, and can explain the phenotypic differences of LPGs. Tissue-specificity, gene essentiality, and associated molecular functions could have the potential to influence the identification of mechanisms underlying phenotypic differences, because genes that differ in these factors could have different modes of sequence and expression evolution. Tissue specificity was associated with phenotypic differences, however, expression divergence was still correlated with phenotypic differences when the tissue specificity contribution was taken into account and corrected for (supplementary fig. S15a, Supplementary Material online). LPGs tend to be more broadly expressed than HPGs (supplementary fig. S15b, Supplementary Material online), consistent with the previous report that expression divergence is driven by a subset of housekeeping genes rather than tissue-specific genes (Movahedi et al. 2011; Lin et al. 2014; Breschi et al. 2016). Moreover, particular molecular functions were associated with phenotypic differences. Specifically, LPG genes were enriched in metabolic processes and immune responses (supplementary fig. S15c, Supplementary Material online), and these functions are similar to those associated with the divergence of cis-regulatory elements and transcriptomes between human and mouse (Yue et al. 2014). In contrast, gene essentiality was not significantly associated with phenotypic differences (supplementary fig. S15d, Supplementary Material online). Taken together, the association of phenotypic differences with tissue specificity or particular molecular functions could support the correlation between expression evolution and phenotypic difference, because the two factors are known to be associated with expression evolution. Discussion Phenotypic differences between human and mouse orthologous genes are often considered as a failure of mouse models and are rarely reported in certain cases (Bakker et al. 2013; Chandrasekera and Pippin 2013; Seok et al. 2013). Thus, it is still unclear which orthologous genes have undergone significant phenotypic changes and what types of molecular evolutionary events have contributed to the phenotypic differences. In the present study, we established for the first time an integrated approach for genotype–phenotype comparison to reveal diverse molecular evolutionary events that likely contribute to phenotype differences resulting from differences between human and mouse genomes. We devised a quantitative PS scoring system (fig. 1) to map genome-scale phenotype comparisons to analyze the concordance or discrepancy in human disease and mouse phenotypes. We discovered that phenotypic discrepancies were partly explained by divergence in noncoding regulatory elements and transcriptomic profiles rather than differences in protein-coding sequences (figs. 2–4). PS scores provide a statistical assessment of the significance of phenotypic similarity compared with the expected model based on orthology–function conjecture (fig. 1). However, we wondered whether the accuracy of PS scores is affected by the incompleteness of current genotype–phenotype mapping. Thus, we confirmed that calculation of PS scores was robust when removing 15% of genotype–phenotype mappings (Pearson ρ = 0.95), or when using older versions of the mapping databases (ρ = 0.91), thereby simulating the incompleteness of the mapping (supplementary fig. S16, Supplementary Material online). We also confirmed that our data set for phenotype comparison was not biased toward a particular phenotype class, and demonstrated that it represented current gene–phenotype maps of human and mouse (supplementary fig. S17, Supplementary Material online). Specifically, the proportion of genes with HPO and MPO classes in our data set was highly correlated with the proportion in OMIM and MGI databases, respectively (ρ = 1.00 in OMIM, ρ = 0.98 in MGI), indicating that the composition of phenotype terms in our data set provides a good representation of those in the complete gene sets. Taken together, our phenomics approach using the PS score could be applicable for the comparison of current gene–phenotype relationships between the two species. We found that regulatory divergence was associated with phenotypic differences in human and mouse orthologous genes (fig. 3). The frequent emergence of phenotypic differences might result from evolutionary divergence between species rather than random mistakes in genotype–phenotype comparisons. It was reported that changes in gene regulatory sequences could be a source of phenotypic evolution (Carroll 2008; Indjeian et al. 2016). Groups of genes are often regulated together and their functions affected by switch mutations in regulatory genes (Babu et al. 2004; Voordeckers et al. 2015). Indeed, we found that genes located in the same pathway or functional module tended to show concurrent phenotypic differences, which is consistent with previous reports implying that phenotypic changes may occur in a modular manner (Ryan et al. 2013; Han et al. 2015; Kachroo et al. 2015). For example, from the LPG analysis, 13 out of 17 Fanconi anaemia pathway (FAP) genes with similar PS scores had significant phenotypic differences (supplementary table S4, Supplementary Material online). Indeed, mutations of mouse FAP genes do not display the developmental abnormalities that are frequently observed in human FA patients (Bakker et al. 2013). Our results provide evidence that comparative transcriptomic data analysis should be conducted to identify in which tissues mouse gene expression is a suitable model of human diseases, and indicate the need to optimize the careful use of animal models. Our results showed that, in many cases, changes in tissue-specific expression profiles across species are able to explain phenotypic differences between human and mouse orthologous genes (fig. 4). Recent comparative transcriptome data revealed that many human/mouse orthologous genes have different transcriptome profiles in different tissues or cell types (Forrest et al. 2014; Yue et al. 2014). Comparison of transcriptome data can be invaluable for identifying which tissues or cell types are connected to phenotypic changes or the conservation of a particular gene between human and model organisms (Breschi et al. 2017) since transcriptomic changes are regarded as an “intermediate molecular phenotype” and a consequence of genetic variation reflecting the current state of a system (i.e., a tissue, organ, or species) (Burga and Lehner 2013). Phenologs, orthologous genes with phenotypes that are identical across species, are helpful when investigating the molecular mechanisms underlying human disease phenotypes through mouse genetic approaches (McGary et al. 2010; McWhite et al. 2015). However, discovery of phenologs is difficult since frequent phenotypic differences are observed between orthologous genes, and mapping of human and mouse genotype–phenotype relationships is often incomplete. Thus, we investigated tissue-specific expression conservation in all human and mouse orthologous genes using the FANTOM and ENCODE databases (supplementary data S2, Supplementary Material online and see our companion site; https://sbi.postech.ac.kr/w/PS). Orthologous genes with high expression conservation are likely to be useful for identifying putative phenologs. In our analysis, divergence of protein-coding sequences could not explain phenotypic differences between human and mouse orthologous genes (fig. 2). According to the neutral theory of molecular evolution, sequence divergence of genes could result from the fixation of many random neutral mutations, as well as the positive selection of beneficial mutations, that affect the adaptive evolution of phenotypes (Orr 2005). Also, due to the high complexity of genotype–phenotype mapping, phenotypic differences could only be explained by the integration of diverse molecular evolutionary features obtained from cross-species comparisons of multi-omics approaches, including molecular evolutionary features related to noncoding or coding sequences (Maher 2012; Breschi et al. 2017). Thus, further studies on diverse molecular evolutionary events are necessary to better understand phenotypic differences between species. We anticipate that our cross-species comparison of genotype–phenotype maps combined with the use of data from multi-omics approaches will prove to be a valuable resource, including as a reference set for training and constructing computational models that can explain the complex nature of phenotypic differences through molecular evolutionary features (Breschi et al. 2017). Materials and Methods Calculation of PS Scores Compilation of Gene–Phenotype Relationships in Human and Mouse Gene–phenotype relationships in human were compiled from the OMIM database. OMIM provides manually curated relationship data between human Mendelian disorders and genetic variants (Amberger et al. 2015). However, annotation of disorders in OMIM tends not to include semantic information of symptoms or phenotypes because disorders are commonly named after their identifier. Therefore, human disorders were mapped using HPO, providing standard phenotype terminology and a collection of disease-phenotype annotations (Groza et al. 2015). Consequently, we compiled relationships between 2,380 genes and 6,506 HPO terms. Gene–phenotype relationships in mouse were compiled from the MGI database. MGI provides gene–phenotype relationship data obtained from mouse knockout experiments and phenotyping (Eppig et al. 2015). In the MGI database, phenotype annotations are already organized into MPO terms (Smith and Eppig 2012). Therefore, we gathered relationships between 5,737 genes and 7,839 MPO terms directly from the MGI database. Detection of Human and Mouse Orthologous Gene Pairs Orthologous relationships between human and mouse genes were predicted by sequence homology. Human and mouse genomes were curated from the National Center for Biotechnology Information NCBI36 and NCBIM36 databases. Mouse orthologs (including sequences) of human genes were extracted using EnsemblCompara Gene Trees (Vilella et al. 2008), which was downloaded in 2017 via Ensembl Biomart (http://www.ensembl.org/biomart/martview/). Consequently, one-to-one orthologs of 17,007 genes were detected between human and mouse. Semantic Similarity Scores of Multiple Terms Associated with Orthologous Genes Human and mouse phenotypes of 2,142 one-to-one orthologous genes were compared with PhenoDigm (Smedley et al. 2013). Specifically, we used pairwise ontology-alignment scores between HPO and MPO terms, quantified by the geometric mean of the Jaccard Index and Information Content (IC) values adapted from PhenoDigm. The Jaccard Index represents the co-occurrence of attributes associated with the term pair. Attributes of terms bridge the two different ontologies and are represented by the ontologies of cross-species anatomy and phenotypic quality description, alongside the lexical matching of HPO and MPO terms. IC indicates the significance of the co-occurrence involved in a more specific or granular term. The semantic similarity (S) of multiple terms associated with an orthologous gene pair was calculated as presented in supplementary figure S1, Supplementary Material online. To calculate overall semantic similarity between multiple terms, pairwise ontology-alignment scores from best matches were averaged. Because of the abundant pleiotropy in the gene–phenotype map of human and mouse, best scoring matches were useful for calculating the semantic similarity score among multiple ontology terms associated with various tissue or organs. Statistical Assessment of Semantic Similarity Based on Orthology–Function Conjecture To assess the statistical significance of the semantic similarity, we compared the semantic similarity score with the calculation of random gene pairs, which are created by the shuffling of orthologous relationships. To construct random gene pairs, we generated available combinations of gene pairs (∼235 million) between 3,139 human genes and 7,546 mouse genes with phenotype annotations. However, the semantic similarity score, represented by best scoring matches, is not a fair comparison among orthologous genes with different numbers of associated phenotype ontology terms. Specifically, the higher the number of terms associated with a particular gene, the more random is the chance that the MPO–HPO term pair with high a pairwise ontology-alignment score is likely to be selected as a best scoring match, leading to bias towards the number of terms. For a fair comparison without bias, we normalized semantic similarity scores based on the number of phenotype ontology terms from both HPO and MPO. To normalize the semantic similarity, we calculated a Z-score, a deviation from the mean of the semantic similarity score distribution calculated from random gene pairs with a similar number of phenotype annotations to the observed gene pair (supplementary fig. S2, Supplementary Material online). Specifically, we separated the semantic similarity distribution of random gene pairs based on the number of associated HPO and MPO terms. To calculate the PS score of a particular gene, an appropriate random distribution was chosen that corresponds to the number of HPO and MPO terms associated with the gene. When we converted these to PS scores, bias from the number of phenotype terms was eliminated (supplementary fig. S3, Supplementary Material online). We confirmed that bias was not from ascertainment bias or underestimation due to excessive or incomplete phenotyping, respectively. Specifically, we investigated the dependency on term number using random models by shuffling the three layers affecting the comparison of the biased gene–phenotype mapping, namely 1) gene pairs of orthologous relationships, 2) MPO/HPO term IDs, and 3) gene–phenotype relationships (supplementary fig. S4a, Supplementary Material online). Despite the various ways of shuffling, the random expected scores were still dependent on the number of MPO terms (supplementary fig. S4b, Supplementary Material online). However, bias was reduced when we chose the average value to score the similarity of each term (supplementary fig. S4c, Supplementary Material online), indicating that it could arise from the method of best scoring matches. Nevertheless, best scoring matches are necessary for comparing multiple terms with a complex gene–phenotype map with abundant pleiotropy. Thus, we calculated the PS score using two steps; normalization of the number of terms, after calculating best scoring matches as semantic similarity. Impact of Human Disease-Associated Mutations and Mouse Mutation Types Human disease-associated mutations are classified by their mutational impact. Missense mutations include insertion/deletion of single amino acids. Truncating mutations include changes in start/stop position, frameshifts, and insertion/deletion of exon chunks or whole genes. Other mutations include changes in copy number and cis-regulatory and noncoding variations, for which the impact on gene function can be difficult to predict. Mutations conducted in mouse experiments were classified into two types: 1) null/knock-out mutations, 2) hypomorphic, knockdown or modified isoform alleles that are not a complete loss of gene but drastic mutations likely to reduce the gene functionality. Statistical Model Explaining the PS Scores To statistically determine how strongly a particular molecular evolutionary feature explains the PS score, we performed generalized linear regression analysis based on the normal distribution of the PS score with the following probability density function. The relationship between the linear predictor (Xβ) and the mean of the distribution function (μ) is provided by the “identical” link function (Xβ = μ), where X is an independent variable representing a single feature of molecular evolution, β is the coefficient of the linear predictor, and μ is the mean of the dependent variable representing the PS score. To calculate the statistical model, we used StatsModels 0.8.0 (http://statsmodels.org) implemented in the Python package. Parameters Related to the Evolution of Protein-Coding Sequences Sequence Identity and Differences in Gene Length To calculate sequence identity, pairwise sequence alignment was computed using the needle program of the European molecular biology open software suite (EMBOSS; Rice et al. 2000). A representative coding sequence of a particular gene was chosen from the canonical peptide sequence translated from a longest and most abundant transcript. Human and mouse genomes were curated from the National Center for Biotechnology Information NCBI36 and NCBIM36 databases. Evolutionary Rate To calculate the evolutionary rate, dN/dS values of human and mouse genes were computed. Mouse orthologs (including sequences) of human genes were extracted using BioMart, adapted from EnsemblCompara Gene Trees (Vilella et al. 2008). Sequence alignment between human and mouse genes was carried out through phylogenetic analysis (www.ensembl.org/info/docs/compara/homology_method.html). The dS and dN values of human and mouse orthologs were obtained from BioMart and estimated by the likelihood method. Changes in Structural Disorder To calculate changes in structural disorder between orthologous genes, we applied the sequence-based disorder predictor (IUPred; Dosztányi et al. 2005) to human and mouse canonical protein sequences. To reduce noise from disorder prediction, at least 30 consecutive residues with a disorder score >0.5 were used to define disordered residues. Changes in structural disorder were calculated by dividing the number of disordered residues found in orthologous genes with a high number of disordered residues by the number of disordered residues. Conservation of Secondary Structural Elements To calculate the conservation of secondary structural elements, the secondary structure alignment method (SSEA) was employed (Fontana et al. 2005). Specifically, we used the fraction of aligned secondary structural elements as an indicator of conservation. Secondary structures of proteins were predicted by PSIPRED (Jones 1999). Analysis of the Conservation of Noncoding Regulatory Elements Nucleotide Conservation Scores To calculate the conservation of regions near TSS, a nucleotide conservation score was computed for a window 250 bp upstream and 50 bp downstream of the TSS of each gene that could contain proximal, core and downstream promoter elements (Core et al. 2008). The TSS of a particular gene is defined as the first 5′ base of the gene sequence that is deposited in ENSEMBL genome annotation system (Aken et al. 2016). Specifically, the candidate TSS were predicted by Eponine based on computational prediction using sequence pattern near the known TSS. PhastCons scores (Siepel et al. 2005) in this window were averaged from the bigwig file using the bwtool (Pohl and Beato 2014) and the average was taken as a measure of the nucleotide sequence conservation near the TSS. The bigwig file was calculated from multiple alignments of six mammalian genomes to the human genome (hg38), and was provided by the University of California Santa Cruz (UCSC) genome browser (hg38.phastCons7way.bw). Identification of Species-Specific/Conserved Regulatory Elements Associated with Genes We gathered a nonredundant set of multiple enhancers or promoters associated with a certain group of genes (e.g., LPG) to calculate the fraction of species-specific/conserved regulatory elements, noting that more than one element could be associated with a particular gene. To identify species-specific/conserved regulatory elements, one-to-one mapping of human and mouse cis-bases was performed using reciprocal-chained BLASTz alignments (Schwartz et al. 2003) as described previously (Yue et al. 2014). BnMapper (https://bitbucket.org/james_taylor/bx-python/wiki/bnMapper) was used to align mouse cis-regulatory elements in the human genome. Specifically, one-to-one mapping of chains between mouse (mm9) and human (hg19) genomes (mm9.hg19.rBest.chain) was used to identify the orthologous sequence of cis-regulatory elements. To identify species (mouse)-specific elements, one-to-many orthologous sequences, that were found from one-to-many mapping of the chains (mm9ToHg19.over.chain), were excluded from all candidate regulatory elements. To map the regulatory elements into the human and mouse chromosome, we used the interval of promoters, enhancers and TFBS from human and mouse ENCODE databases (Yue et al. 2014). Specifically, we downloaded the interval of candidate promoter and enhancer elements from mouse ENCODE projects (http://promoter.bx.psu.edu/ENCODE/download.html). Promoters and enhancers were predicted based on histone modification and RNA polymerase II (polII) binding of 23 mouse tissue/cell types. The candidate of promoters was predicted based on computational models trained by H3K4me3 or polII binding signals of known TSS sites. The candidate of enhancers were predicted based on three histone marks, H3K4me1 or H3K27ac outside of promoter regions, using a random-forest based algorithm for enhancer identification from chromatin state (RFECS; Rajagopal et al. 2013). To determine TFBS, we downloaded the interval of predicted TFBS from mouse ENCODE projects (http://mouseencode.org/publications/mcp00/). To associate genes and regulatory elements from the ENCODE data set, we assigned specific enhancers, TFBS or promoters to the genes when elements were localized near the TSS of the canonical transcript of gene (<10 kb for enhancers, < 3 kb for promoters and TFBS). We downloaded canonical transcripts of human and mouse genes from UCSC genome browser. Identification of CNS We used the genomic coordinate of “CNS throughout mammalian species” and “lineage-specific CNS,” either provided in Babarinde et al. and Takahashi et al., respectively (Takahashi and Saitou 2012; Babarinde and Saitou 2016). CNS throughout mammalian species was retrieved based on the pairwise homology searches with chicken as query against the four-mammalian species (human, mouse, dog and cattle). Extraction of primate- and rodent-specific CNSs was based on the human–marmoset and the mouse–rat alignments, respectively, followed by the exclusion of vertebrate homologous regions. In primate-specific CNSs, sequences that were not conserved in other primate species (rhesus macaque, orangutan, and chimpanzee) were removed. A CNS was considered to be associated with a particular gene, when the CNS is overlapped with the intragenic region (e.g., protein-coding region, intron or UTR) of the gene or is the closest to the TSS of the gene. Evolution of Expression Parameters Calculation of Expression Conservation of Orthologous Genes To calculate the expression conservation of orthologous genes, differential expression between human and mouse homologous tissues were quantified from comparative transcriptome data (Forrest et al. 2014). To quantify the expression conservation of a particular gene, we calculated the linear correlation and nonlinear dependence of gene expression profiles between human/mouse homologous tissues. ENCODE and FANTOM were used to quantify the expression of human and mouse orthologous genes across tissues, where samples of both human and mouse homologous tissues are available in the database (13 and 21 tissues of ENCODE and FANTOM). We downloaded the RNA-seq quantification data of FANTOM from Expression Atlas database (Petryszak et al. 2016), which were carried out using iRAP (Fonseca et al. 2014). For comparison of RNA-seq quantification between human and mouse samples, we additionally subjected the FPKM values to quantile normalization. Linear correlation and nonlinear dependence of gene expression profiles were calculated using the Pearson-correlation coefficient (ρ) and normalized point-wise mutual information (nPMI). The PCC of human and mouse expression profiles for human–mouse orthologous pair i was calculated using the expression (Wang et al. 2010), ρi=∑t=1nSHi,tSMi,t-∑t=1nSHi,t∑t=1nSMi,tn(∑t=1nSHi,t2-∑t=1nSHi,t2n)(∑t=1nSMi,t2-∑t=1nSMi,t2n), where SH(i,t) and SM(i,t) are the expression level (FPKM) of gene i in human and mouse tissue t, respectively, and n is the total number of homologous tissues used for measuring the expression profiles. The nPMI value of human and mouse expression profiles for human–mouse orthologous pair i was calculated using the expression, nPMIi=log[PH(i)PM(i)]log[PMH(i)]-1.0, where PH(i) and PM(i) denote the proportion of human (H) and mouse (M) tissues in which gene i of the corresponding species is expressed with a FPKM value >1.0, respectively. PHM(i) denotes the proportion of human and mouse homologous tissue pairs in which orthologous pair i are both expressed with a FPKM value >1.0. Calculation of Tissue-Wide Expression Similarity between Homologous Tissues across Species To calculate the contribution of a particular tissue to the correlation between transcriptome conservation and PS score, we calculated the nPMI value of the expression profile of a particular gene set between homologous tissue pairs and compared the two PMIs calculated from LPGs and HPGs. The nPMI value between gene set G of a particular homologous tissue pair t was calculated using the expression, nPMI(t,G)=log[PHG(t)PMG(t)]log[PHMG(t)]-1.0, where PHG(t) and PMG(t) denote the proportion of genes in G that are expressed in human and mouse tissues of t with a FPKM value >1.0, respectively. PHMG(t)denotes the proportion of genes in G for which human and mouse orthologous are simultaneously expressed in homologous tissue t with FPKM values >1.0. To test the statistical significance of tissue-wide expression similarity, we calculated the Z-score as a measure of the deviation of the observed value nPMI(t,G) from the distribution of expression similarities calculated using a random gene set containing the same number of genes in G. Supplementary Material Supplementary data are available at Molecular Biology and Evolution online. Acknowledgments We thank all the members of the Kim laboratory for helpful discussion. This work was supported in part by Korean National Research Foundation grant (2018R1A2B6002657 and 2017M3C9A60472625) and Korea Institute of Marine Science & Technology grant (D11510215H480000140). We would also like to thank Dr Naruya Saitou, Dr Isaac Adeyemi Babarinde, and Dr Mahoko Takahashi for providing CNS data adopted in this study. Author Contributions S.K.H. and S.K. conceived and designed the experiments. S.K.H. and D.K. performed the experiments. S.K.H., D.K., H.L., S.K., and I.K. analyzed the data. S.K.H., D.K., I.K., and S.K. wrote the paper. References Aitman TJ , Boone C , Churchill GA , Hengartner MO , Mackay TFC , Stemple DL. 2011 . The future of model organisms in human disease research . Nat Rev Genet . 12 8 : 575 – 582 . Google Scholar CrossRef Search ADS PubMed Aken BL , Ayling S , Barrell D , Clarke L , Curwen V , Fairley S , Fernandez Banet J , Billis K , García Girón C , Hourlier T et al. , . 2016 . The Ensembl gene annotation system . Database 2016 : baw093. Google Scholar CrossRef Search ADS PubMed Amberger JS , Bocchini CA , Schiettecatte F , Scott AF , Hamosh A. 2015 . OMIM.org: online Mendelian Inheritance in Man (OMIM(R)), an online catalog of human genes and genetic disorders . Nucleic Acids Res . 43 : D789 – D798 . Google Scholar CrossRef Search ADS PubMed Babarinde IA , Saitou N. 2016 . Genomic locations of conserved noncoding sequences and their proximal protein-coding genes in mammalian expression dynamics . Mol Biol Evol . 33 7 : 1807. Google Scholar CrossRef Search ADS PubMed Babu MM , Luscombe NM , Aravind L , Gerstein M , Teichmann SA. 2004 . Structure and evolution of transcriptional regulatory networks . Curr Opin Struct Biol . 14 3 : 283 – 291 . Google Scholar CrossRef Search ADS PubMed Bakker ST , de Winter JP , te Riele H. 2013 . Learning from a paradox: recent insights into Fanconi anaemia through studying mouse models . Dis Model Mech . 6 1 : 40 – 47 . Google Scholar CrossRef Search ADS PubMed Benton MJ , Donoghue PCJ. 2007 . Paleontological evidence to date the tree of life . Mol Biol Evol . 24 1 : 26 – 53 . Google Scholar CrossRef Search ADS PubMed Bergmann C , Senderek J , Küpper F , Schneider F , Dornia C , Windelen E , Eggermann T , Rudnik-Schöneborn S , Kirfel J , Furu L et al. , . 2004 . PKHD1 mutations in autosomal recessive polycystic kidney disease (ARPKD) . Hum Mutat . 23 5 : 453 – 463 . Google Scholar CrossRef Search ADS PubMed Boucher CA , King SK , Carey N , Krahe R , Winchester CL , Rahman S , Creavin T , Mehji P , Bailey MES , Chartier FL et al. , . 1995 . A novel homeodomain-encoding gene is associated with a large CpG island interrupted by the myotonic dystrophy unstable (CTG)n repeat . Hum Mol Genet . 4 10 : 1919 – 1925 . Google Scholar CrossRef Search ADS PubMed Breschi A , Djebali S , Gillis J , Pervouchine DD , Dobin A , Davis CA , Gingeras TR , Guigó R. 2016 . Gene-specific patterns of expression variation across organs and species . Genome Biol . 17 1 : 151. Google Scholar CrossRef Search ADS PubMed Breschi A , Gingeras TR , Guigó R. 2017 . Comparative transcriptomics in human and mouse . Nat Rev Genet . 18 7 : 425 – 440 . Google Scholar CrossRef Search ADS PubMed Burga A , Lehner B. 2013 . Predicting phenotypic variation from genotypes, phenotypes and a combination of the two . Curr Opin Biotechnol . 24 4 : 803 – 809 . Google Scholar CrossRef Search ADS PubMed Carroll SB. 2008 . Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution . Cell 134 1 : 25 – 36 . Google Scholar CrossRef Search ADS PubMed Ceholski DK , Trieber CA , Holmes CFB , Young HS. 2012 . Lethal, hereditary mutants of phospholamban elude phosphorylation by protein kinase A . J Biol Chem . 287 32 : 26596 – 26605 . Google Scholar CrossRef Search ADS PubMed Chandrasekera PC , Pippin JJ. 2013 . Of rodents and men: species-specific glucose regulation and type 2 diabetes research . Altex 31 : 157 – 176 . Google Scholar CrossRef Search ADS Core LJ , Waterfall JJ , Lis JT. 2008 . Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters . Science (80-) 322 5909 : 1845 – 1848 . Google Scholar CrossRef Search ADS Cryns K , Sivakumaran TA , Van den Ouweland JMW , Pennings RJE , Cremers CWRJ , Flothmann K , Young TL , Smith RJH , Lesperance MM , Van Camp G. 2003 . Mutational spectrum of the WFS1 gene in Wolfram syndrome, nonsyndromic hearing impairment, diabetes mellitus, and psychiatric disease . Hum Mutat . 22 4 : 275 – 287 . Google Scholar CrossRef Search ADS PubMed Deans AR , Lewis SE , Huala E , Anzaldo SS , Ashburner M , Balhoff JP , Blackburn DC , Blake JA , Burleigh JG , Chanet B et al. , . 2015 . Finding our way through phenotypes . PLoS Biol . 13 : e1002033 . Google Scholar CrossRef Search ADS PubMed Dosztányi Z , Csizmok V , Tompa P , Simon I. 2005 . IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content . Bioinformatics 21 16 : 3433 – 3434 . Google Scholar CrossRef Search ADS PubMed Eppig JT , Blake JA , Bult CJ , Kadin JA , Richardson JE. 2015 . The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease . Nucleic Acids Res. 43 ( Database issue ): D726 – D736 . Google Scholar CrossRef Search ADS PubMed Fonseca NA , Petryszak R , Marioni J , Brazma A. 2014 . iRAP – an integrated RNA-seq Analysis Pipeline . bioRxiv : 5991 . Fontana P , Bindewald E , Toppo S , Velasco R , Valle G , Tosatto SCE. 2005 . The SSEA server for protein secondary structure alignment . Bioinformatics 21 3 : 393 – 395 . Google Scholar CrossRef Search ADS PubMed Forrest ARR , Kawaji H , Rehli M , Kenneth Baillie J , de Hoon MJL , Haberle V , Lassmann T , Kulakovskiy IV , Lizio M , Itoh M et al. , . 2014 . A promoter-level mammalian expression atlas . Nature 507 7493 : 462 – 470 . Google Scholar CrossRef Search ADS PubMed Gabaldon T , Koonin E. 2013 . Functional and evolutionary implications of gene orthology . Nat Rev Genet . 14 5 : 360 – 366 . Google Scholar CrossRef Search ADS PubMed Garcia-Gonzalez MA , Menezes LF , Piontek KB , Kaimori J , Huso DL , Watnick T , Onuchic LF , Guay-Woodford LM , Germino GG. 2007 . Genetic interaction studies link autosomal dominant and recessive polycystic kidney disease in a common pathway . Hum Mol Genet . 16 16 : 1940 – 1950 . Google Scholar CrossRef Search ADS PubMed Groza T , Köhler S , Moldenhauer D , Vasilevsky N , Baynam G , Zemojtel T , Schriml LM , Kibbe WA , Schofield PN , Beck T et al. , . 2015 . The human phenotype ontology: semantic unification of common and rare disease . Am J Hum Genet . 97 1 : 111 – 124 . Google Scholar CrossRef Search ADS PubMed Han SK , Kim I , Hwang J , Kim S. 2015 . Network modules of the cross-species genotype–phenotype map reflect the clinical severity of human diseases . PLOS ONE. 10 : e0136300 . Google Scholar CrossRef Search ADS PubMed Hojjati MR , Li Z , Jiang XC. 2005 . Serine palmitoyl-CoA transferase (SPT) deficiency and sphingolipid levels in mice . Biochim Biophys Acta-Mol Cell Biol Lipids. 1737 1 : 44 – 51 . Google Scholar CrossRef Search ADS Hornemann T , Penno A , Richard S , Nicholson G , Van Dijk FS , Rotthier A , Timmerman V , Von Eckardstein A. 2009 . A systematic comparison of all mutations in hereditary sensory neuropathy type I (HSAN I) reveals that the G387A mutation is not disease associated . Neurogenetics 10 2 : 135 – 143 . Google Scholar CrossRef Search ADS PubMed Indjeian VB , Kingman GA , Jones FC , Guenther CA , Grimwood J , Schmutz J , Myers RM , Kingsley DM. 2016 . Evolving new skeletal traits by cis-regulatory changes in bone morphogenetic proteins . Cell 164 ( 1–2 ): 45 – 56 . Google Scholar CrossRef Search ADS PubMed Inoue H , Tanizawa Y , Wasson J , Behn P , Kalidas K , Bernal-Mizrachi E , Mueckler M , Marshall H , Donis-Keller H , Crock P et al. , . 1998 . A gene encoding a transmembrane protein is mutated in patients with diabetes mellitus and optic atrophy (Wolfram syndrome) . Nat Genet . 20 2 : 143 – 148 . Google Scholar CrossRef Search ADS PubMed Jones DT. 1999 . Protein secondary structure prediction based on position-specific scoring matrices . J Mol Biol . 292 2 : 195 – 202 . Google Scholar CrossRef Search ADS PubMed Kachroo AH , Laurent JM , Yellman CM , Meyer AG , Wilke CO , Marcotte EM. 2015 . Systematic humanization of yeast genes reveals conserved functions and genetic modularity . Science (80-) 348 6237 : 921 – 925 . Google Scholar CrossRef Search ADS Kato T , Ishiwata M , Yamada K , Kasahara T , Kakiuchi C , Iwamoto K , Kawamura K , Ishihara H , Oka Y. 2008 . Behavioral and gene expression analyses of Wfs1 knockout mice as a possible animal model of mood disorder . Neurosci Res . 61 2 : 143 – 158 . Google Scholar CrossRef Search ADS PubMed Kim J , Kim I , Han SK , Bowie JU , Kim S. 2012 . Network rewiring is an important mechanism of gene essentiality change . Sci Rep . 2 1 : 1 – 7 . Kim PM , Sboner A , Xia Y , Gerstein M. 2008 . The role of disorder in interaction networks: a structural analysis . Mol Syst Biol . 4 : 179. Google Scholar CrossRef Search ADS PubMed Liao B-Y , Zhang J. 2008 . Null mutations in human and mouse orthologs frequently result in different phenotypes . Proc Natl Acad Sci U S A. 105 19 : 6987 – 6992 . Google Scholar CrossRef Search ADS PubMed Lin S , Lin Y , Nery JR , Urich MA , Breschi A , Davis CA , Dobin A , Zaleski C , Beer MA , Chapman WC et al. , . 2014 . Comparison of the transcriptional landscapes between human and mouse tissues . Proc Natl Acad Sci U S A . 111 48 : 17224 – 17229 . Google Scholar CrossRef Search ADS PubMed Maher B. 2012 . ENCODE: the human encyclopaedia . Nature 489 7414 : 46 – 48 . Google Scholar CrossRef Search ADS PubMed McGary KL , Park TJ , Woods JO , Cha HJ , Wallingford JB , Marcotte EM. 2010 . Systematic discovery of nonobvious human disease models through orthologous phenotypes . Proc Natl Acad Sci U S A . 107 14 : 6544 – 6549 . Google Scholar CrossRef Search ADS PubMed McWhite CD , Liebeskind BJ , Marcotte EM. 2015 . Applications of comparative evolution to human disease genetics . Curr Opin Genet Dev . 35 : 16 – 24 . Google Scholar CrossRef Search ADS PubMed Mosca R , Pache RA , Aloy P. 2012 . The role of structural disorder in the rewiring of protein interactions through evolution . Mol Cell Proteomics. 11 7 : M111.014969. Google Scholar CrossRef Search ADS PubMed Movahedi S , Van de Peer Y , Vandepoele K. 2011 . Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice . Plant Physiol . 156 3 : 1316 – 1330 . Google Scholar CrossRef Search ADS PubMed Mungall CJ , Torniai C , Gkoutos GV , Lewis SE , Haendel MA. 2012 . Uberon, an integrative multi-species anatomy ontology . Genome Biol . 13 1 : R5. Google Scholar CrossRef Search ADS PubMed Orr HA. 2005 . The genetic theory of adaptation: a brief history . Nat Rev Genet . 6 2 : 119 – 127 . Google Scholar CrossRef Search ADS PubMed Petryszak R , Keays M , Tang YA , Fonseca NA , Barrera E , Burdett T , Füllgrabe A , Fuentes AMP , Jupp S , Koskinen S et al. , . 2016 . Expression Atlas update – an integrated database of gene and protein expression in humans, animals and plants . Nucleic Acids Res. 44 ( D1 ): D746 – D752 . Google Scholar CrossRef Search ADS PubMed Pohl A , Beato M. 2014 . bwtool: a tool for bigWig files . Bioinformatics 30 11 : 1618 – 1619 . Google Scholar CrossRef Search ADS PubMed Rajagopal N , Xie W , Li Y , Wagner U , Wang W , Stamatoyannopoulos J , Ernst J , Kellis M , Ren B. 2013 . RFECS: a random-forest based algorithm for enhancer identification from chromatin state . PLoS Comput Biol . 9 3 : e1002968. Google Scholar CrossRef Search ADS PubMed Reddy S , Smith DBJ , Rich MM , Leferovich JM , Reilly P , Davis BM , Tran K , Rayburn H , Bronson R , Cros D et al. , . 1996 . Mice lacking the myotonic dystrophy protein kinase develop a late onset progressive myopathy . Nat Genet. 13 3 : 325 – 335 . Google Scholar CrossRef Search ADS PubMed Rice P , Longden I , Bleasby A. 2000 . EMBOSS: the European molecular biology open software suite . Trends Genet . 16 6 : 276 – 277 . Google Scholar CrossRef Search ADS PubMed Riggs AC , Bernal-Mizrachi E , Ohsugi M , Wasson J , Fatrai S , Welling C , Murray J , Schmidt RE , Herrera PL , Permutt MA. 2005 . Mice conditionally lacking the Wolfram gene in pancreatic islet beta cells exhibit diabetes as a result of enhanced endoplasmic reticulum stress and apoptosis . Diabetologia 48 11 : 2313 – 2321 . Google Scholar CrossRef Search ADS PubMed Robinson PN , Webber C. 2014 . Phenotype ontologies and cross-species analysis for translational research . PLoS Genet . 10 4 : e1004268. Google Scholar CrossRef Search ADS PubMed Rotthier A , Auer-Grumbach M , Janssens K , Baets J , Penno A , Almeida-Souza L , Van Hoof K , Jacobs A , De Vriendt E , Schlotter-Weigel B et al. , . 2010 . Mutations in the SPTLC2 subunit of serine palmitoyltransferase cause hereditary sensory and autonomic neuropathy type I . Am J Hum Genet . 87 4 : 513 – 522 . Google Scholar CrossRef Search ADS PubMed Ryan CJ , Krogan NJ , Cunningham P , Cagney G. 2013 . All or nothing: protein complexes flip essentiality between distantly related eukaryotes . Genome Biol Evol . 5 6 : 1049 – 1059 . Google Scholar CrossRef Search ADS PubMed Schmitt JP , Kamisago M , Asahi M , Li GH , Ahmad F , Mende U , Kranias EG , MacLennan DH , Seidman JG , Seidman CE. 2003 . Dilated cardiomyopathy and heart failure caused by a mutation in phospholamban . Science (80-) 299 5611 : 1410 – 1413 . Google Scholar CrossRef Search ADS Schofield PN , Hoehndorf R , Gkoutos GV. 2012 . Mouse genetic and phenotypic resources for human genetics . Hum Mutat . 33 5 : 826 – 836 . Google Scholar CrossRef Search ADS PubMed Schughart K , Libert C , Kas MJ. 2013 . Controlling complexity: the clinical relevance of mouse complex genetics . Eur J Hum Genet . 21 11 : 1191 – 1196 . Google Scholar CrossRef Search ADS PubMed Schwartz S , Kent WJ , Smit A , Zhang Z , Baertsch R , Hardison RC , Haussler D , Miller W. 2003 . Human–mouse alignments with BLASTZ . Genome Res . 13 1 : 103 – 107 . Google Scholar CrossRef Search ADS PubMed Seok J , Warren HS , Cuenca AG , Mindrinos MN , Baker HV , Xu W , Richards DR , McDonald-Smith GP , Gao H , Hennessy L et al. , . 2013 . Genomic responses in mouse models poorly mimic human inflammatory diseases . Proc Natl Acad Sci U S A . 110 9 : 3507 – 3512 . Google Scholar CrossRef Search ADS PubMed Siepel A , Bejerano G , Pedersen JS , Hinrichs AS , Hou M , Rosenbloom K , Clawson H , Spieth J , Hillier LW , Richards S et al. , . 2005 . Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes . Genome Res . 15 8 : 1034 – 1050 . Google Scholar CrossRef Search ADS PubMed Siltberg-Liberles J. 2011 . Evolution of structurally disordered proteins promotes neostructuralization . Mol Biol Evol . 28 1 : 59 – 62 . Google Scholar CrossRef Search ADS PubMed Smedley D , Oellrich A , Köhler S , Ruef B , Westerfield M , Robinson P , Lewis S , Mungall C. 2013 . PhenoDigm: analyzing curated annotations to associate animal models with human diseases . Database (Oxford) 2013 : bat025. Google Scholar CrossRef Search ADS PubMed Smith CL , Eppig JT. 2012 . The Mammalian Phenotype Ontology as a unifying standard for experimental and high-throughput phenotyping data . Mamm Genome. 23 ( 9–10 ): 653 – 668 . Google Scholar CrossRef Search ADS PubMed Sun S , Yang F , Tan G , Costanzo M , Oughtred R , Hirschman J , Theesfeld CL , Bansal P , Sahni N , Yi S et al. , . 2016 . An extended set of yeast-based functional assays accurately identifies human disease mutations . Genome Res . 26 5 : 670 – 680 . Google Scholar CrossRef Search ADS PubMed Takahashi M , Saitou N. 2012 . Identification and characterization of lineage-specific highly conserved noncoding sequences in mammalian genomes . Genome Biol Evol . 4 5 : 641 – 657 . Google Scholar CrossRef Search ADS PubMed Tatusov RL , Koonin EV , Lipman DJ. 1997 . A genomic perspective on protein families . Science (80-) 278 5338 : 631 – 637 . Google Scholar CrossRef Search ADS van den Broek WJAA , Nelen MR , Wansink DG , Coerwinkel MM , te Riele H , Groenen PJTA , Wieringa B. 2002 . Somatic expansion behaviour of the (CTG)n repeat in myotonic dystrophy knock-in mice is differentially affected by Msh3 and Msh6 mismatch-repair proteins . Hum Mol Genet . 11 2 : 191 – 198 . Google Scholar CrossRef Search ADS PubMed Vilella AJ , Severin J , Ureta-Vidal A , Heng L , Durbin R , Birney E. 2008 . EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates . Genome Res . 19 2 : 327 – 335 . Google Scholar CrossRef Search ADS PubMed Villar D , Flicek P , Odom DT. 2014 . Evolution of transcription factor binding in metazoans—mechanisms and functional implications . Nat Rev Genet . 15 4 : 221 – 233 . Google Scholar CrossRef Search ADS PubMed Voordeckers K , Pougach K , Verstrepen KJ. 2015 . How do regulatory networks evolve and expand throughout evolution? Curr Opin Biotechnol . 34 : 180 – 188 . Google Scholar CrossRef Search ADS PubMed Wang Y , Robbins KR , Rekaya R. 2010 . Comparison of computational models for assessing conservation of gene expression across species . PLoS ONE. 5 10 : e13239 – e13236 . Google Scholar CrossRef Search ADS PubMed Welter D , MacArthur J , Morales J , Burdett T , Hall P , Junkins H , Klemm A , Flicek P , Manolio T , Hindorff L et al. , . 2014 . The NHGRI GWAS Catalog, a curated resource of SNP-trait associations . Nucleic Acids Res. 42 ( Database issue ): D1001 – D1006 . Google Scholar CrossRef Search ADS PubMed White JK , Gerdin A-K , Karp NA , Ryder E , Buljan M , Bussell JN , Salisbury J , Clare S , Ingham NJ , Podrini C et al. , . 2013 . Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes . Cell 154 2 : 452 – 464 . Google Scholar CrossRef Search ADS PubMed Woollard JR , Punyashtiti R , Richardson S , Masyuk TV , Whelan S , Huang BQ , Lager DJ , vanDeursen J , Torres VE , Gattone VH et al. , . 2007 . A mouse model of autosomal recessive polycystic kidney disease with biliary duct and proximal tubule dilatation . Kidney Int . 72 3 : 328 – 336 . Google Scholar CrossRef Search ADS PubMed Young RS , Hayashizaki Y , Andersson R , Sandelin A , Kawaji H , Itoh M , Lassmann T , Carninci P , Bickmore WA , Forrest AR et al. , . 2015 . The frequent evolutionary birth and death of functional promoters in mouse and human . Genome Res . 25 10 : 1546 – 1557 . Google Scholar CrossRef Search ADS PubMed Yue F , Cheng Y , Breschi A , Vierstra J , Wu W , Ryba T , Sandstrom R , Ma Z , Davis C , Pope BD et al. , . 2014 . A comparative encyclopedia of DNA elements in the mouse genome . Nature 515 7527 : 355 – 364 . Google Scholar CrossRef Search ADS PubMed © The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: firstname.lastname@example.org This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Molecular Biology and Evolution – Oxford University Press
Published: Apr 24, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.
Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.
All the latest content is available, no embargo periods.
“Hi guys, I cannot tell you how much I love this resource. Incredible. I really believe you've hit the nail on the head with this site in regards to solving the research-purchase issue.”Daniel C.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud
“I must say, @deepdyve is a fabulous solution to the independent researcher's problem of #access to #information.”@deepthiw
“My last article couldn't be possible without the platform @deepdyve that makes journal papers cheaper.”@JoseServera