Edgetic perturbation models of human inherited disorders

Introduction

Decades of research into human Mendelian disorders have led to the discovery of a large number of disease‐associated allelic variants. Most disease‐causing mutations are thought to confer radical changes to proteins (Wang and Moult, 2001; Botstein and Risch, 2003; Yue, 2005; Subramanian and Kumar, 2006). Consequently, genotype‐to‐phenotype relationships in human genetic disorders are often modeled as: ‘mutation in gene X leads to loss of gene product X, which leads to disease A’. A single ‘gene‐loss’ model seems pertinent for many diseases (Botstein and Risch, 2003). However, this model cannot be fully reconciled with the increasingly appreciated prevalence of complex genotype‐to‐phenotype associations for even ‘simple’ Mendelian disorders (Goh, 2007), particularly cases in which: (i) a single gene can be associated with multiple disorders (allelic heterogeneity), (ii) a single disorder can be caused by mutations in any one of several genes (locus heterogeneity), (iii) only a subset of individuals carrying a mutation are affected by the disease (incomplete penetrance), or (iv) not all individuals with a given mutation are affected equally (variable expressivity). More complex models to interpret genotype‐to‐phenotype relationships would probably improve the understanding of human disease.

Genes and gene products function not in isolation but as components of complex networks of macromolecules (DNA, RNA, or proteins) and metabolites linked through biochemical or physical interactions. In ‘interactome’ network models, the macromolecules and metabolites are often represented as ‘nodes’ and the interactions between them as ‘edges’. Cellular networks seem to exhibit systems properties underlying phenotypic variations (Goh, 2007). Here we propose network‐perturbation models to explain molecular dysfunctions underlying human disease.
We hypothesize that distinct mutations causing distinct molecular defects to proteins may lead to distinct perturbations of cellular networks, giving rise to distinct phenotypic outcomes (Figure 1A). Truncations close to the start of an open‐reading frame, or mutations that grossly destabilize a protein structure, can be modeled as removing a protein node from the network (‘node removal’). Alternatively, single amino‐acid substitutions that affect specific binding sites, or truncations that preserve certain domains of a protein, may give rise to partially functional gene products with specific changes in distinct biophysical or biochemical interaction(s) (edge‐specific genetic perturbations, or ‘edgetic’ perturbations; Figure 1B).

Node removal versus edgetic perturbation models of network changes underlying phenotypic alterations. (A) Schematic illustration of pleiotropic phenotypic outcomes resulting from distinct network perturbations upon complete loss of gene product (node removal, blue box) versus perturbation of specific molecular interactions (edgetic perturbation, red box). Solid lines between two nodes represent preserved interactions and dashed lines represent perturbed interactions. Edges are generally biophysical interactions, but could also be biochemical interactions. (B) Schematic illustration of distinct ‘truncating’ versus ‘in‐frame’ mutations causing distinct molecular defects in proteins leading to distinct node removal versus edgetic perturbation.

Edgetic network perturbations provide alternative molecular explanations for protein dysfunction in addition to gene loss. Taking advantage of the large number of known disease‐causing allelic variations in human Mendelian disorders, we investigated how such mutations may cause complete loss of gene products or, alternatively, cause specific loss or gain of distinct molecular interaction(s).
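The distinction between the two perturbation models can be made concrete on a toy network. The following is a minimal sketch, not the authors' implementation: the protein names (`X`, `P1`, `P2`, `P3`) and the helper functions are hypothetical and chosen only for illustration.

```python
# Toy interactome stored as an adjacency map.
# All protein names and helper names here are hypothetical.

def make_network(edges):
    """Build an undirected network from (protein, protein) pairs."""
    net = {}
    for a, b in edges:
        net.setdefault(a, set()).add(b)
        net.setdefault(b, set()).add(a)
    return net

def remove_node(net, node):
    """Node removal: drop the protein and every interaction it takes part in."""
    return {p: nbrs - {node} for p, nbrs in net.items() if p != node}

def remove_edge(net, a, b):
    """Edgetic perturbation: drop one interaction but keep both proteins."""
    out = {p: set(nbrs) for p, nbrs in net.items()}
    out[a].discard(b)
    out[b].discard(a)
    return out

wild_type = make_network([("X", "P1"), ("X", "P2"), ("X", "P3"), ("P1", "P2")])
null_allele = remove_node(wild_type, "X")            # X and all of its edges are gone
edgetic_allele = remove_edge(wild_type, "X", "P3")   # only the X-P3 edge is lost
```

In this sketch the null allele leaves `P3` without any partner, whereas the edgetic allele keeps `X` connected to `P1` and `P2`, so the two perturbed networks differ even though the same gene is affected.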
We further tested edgetic perturbation models in cases in which a single gene is associated with multiple disorders. Together, both experimental and computational evidence support edgetic perturbation models in human inherited disorders. Edgetic perturbations probably underlie many complex genotype‐to‐phenotype relationships. Results Global distribution of disease‐causing mutations To investigate possibly differing network perturbations in human inherited disorders, we examined ∼50 000 Mendelian disease‐causing alleles, affecting over 1900 protein‐coding genes, altogether associated with more than 2000 human disorders available in the Human Gene Mutation Database (HGMD) ( Stenson , 2003 ). We differentiated all disease alleles into two subsets probably causing different molecular defects to proteins. The first subset (‘truncating’ alleles) comprises all mutations that lead to the synthesis of truncated gene products, including nonsense mutations, out‐of‐frame insertions or deletions, or defective splicing. The second subset (‘in‐frame’ alleles) comprises mutations that probably give rise to nearly full‐length gene products, including missense mutations and in‐frame insertions or deletions. Over 50% (27 919/52 491) of Mendelian alleles in HGMD correspond to ‘in‐frame’ alleles ( Figure 2A ). Our hypothesis is that ‘truncating’ and ‘in‐frame’ alleles probably cause distinct molecular defects in proteins, and are thus enriched in distinct node removal or edgetic perturbations, respectively. This hypothesis is based on the assumption that ‘truncating’ alleles are less prone to produce stably folded proteins than ‘in‐frame’ alleles. Although exceptions may apply, our hypothesis predicts that ‘truncating’ versus ‘in‐frame’ alleles may distribute differently among diseases involving distinct node removal versus edgetic perturbations. Global patterns of disease mutations in human genetic disorders. 
( A ) Subdivision of ‘truncating’ versus ‘in‐frame’ mutations in Human Gene Mutation Database (HGMD) ( Stenson , 2003 ). ( B ) Schematic illustration of distinct node removal versus edgetic perturbation models in disease with autosomal recessive versus autosomal dominant inheritance. ( C ) Distribution of autosomal recessive and dominant disease with respect to the associated ‘in‐frame’ mutations. Mutations in each gene associated with each mode of inheritance are grouped as one trait. Each data point represents the fraction of autosomal recessive (blue bar) or autosomal dominant (red bar) traits that have a fraction of ‘in‐frame’ mutations no less than the value on the x ‐axis. Statistical significance of the observed difference between distributions is assessed by Mann–Whitney U test ( P <9.2 × 10 −12 ). The number of traits, genes, diseases and total mutations in each bin are provided in Supplementary Table 1 . ( D ) Average fraction of ‘in‐frame’ mutations associated with autosomal dominant disease in transcription factors and structural proteins. P ‐value assessed by Mann–Whitney U test of the observed difference is shown. Given that, with the exception of haploinsufficiency, many established molecular explanations for dominance entail production of a mutated protein that interferes in some way with the function of the product of the normal allele, autosomal dominant disease should be more frequently associated with edgetic perturbation than node removal ( Figure 2B ). To test the hypothesis that ‘truncating’ versus ‘in‐frame’ alleles are enriched in distinct node removal versus edgetic perturbations, respectively, we retrieved the inheritance information, by manual curation, for each HGMD‐annotated phenotype from the Online Mendelian Inheritance in Man (OMIM) database ( Hamosh , 2005 ). ‘Truncating’ versus ‘in‐frame’ alleles distribute differently among autosomal dominant and autosomal recessive traits. 
Among genes affected solely by ‘in‐frame’ mutations, the proportion of autosomal dominant diseases is ∼10‐fold higher than that of autosomal recessive diseases ( Figure 2C ). This trend holds even after removing all human predicted orthologs of essential genes from the analysis ( Supplementary Figure S1 ). We next examined whether distinct distribution of ‘truncating’ versus ‘in‐frame’ alleles can also be found among autosomal dominant traits that are probably caused by different molecular mechanisms. Mutations in cytoskeleton proteins frequently cause dominant‐negative effects, in which incorporation of expressed abnormal molecules into multimeric assemblies of structural proteins disrupts the integrity and function of the complex ( Wilkie, 1994 ). In contrast, germline mutations in transcription factors are more frequently associated with haploinsufficiency ( Wilkie, 1994 ; Seidman and Seidman, 2002 ) probably because of insufficient activity or production of the remaining wild‐type allele in heterozygotes. Consistent with this distinction, a significantly higher fraction of ‘in‐frame’ mutations was found for autosomal dominant Mendelian disorders associated with structural proteins than with transcription factors ( Figure 2D ). Distinct global distributions of ‘truncating’ versus ‘in‐frame’ mutations among diseases with distinct modes of inheritance, and in proteins probably associated with distinct molecular mechanisms of dominance, support our hypothesis that ‘truncating’ versus ‘in‐frame’ alleles are probably enriched in distinct node removal versus edgetic perturbations, respectively. The distinctions observed between autosomal dominant and autosomal recessive mutations may be more pronounced if haploinsufficiency could be separated overall from dominant‐negative and other molecular mechanisms of dominance, but such information is currently unavailable at the global level. 
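The per-trait fraction of ‘in‐frame’ mutations and the cumulative distribution plotted in Figure 2C can be sketched as follows. This is an illustrative sketch under stated assumptions: the record layout, the mutation-type labels, and the function names are hypothetical, while the grouping of mutation types and the threshold of five mutations per trait follow the description in Materials and methods.

```python
from collections import defaultdict

# Hypothetical labels standing in for the HGMD mutation types.
IN_FRAME = {"missense", "in_frame_indel"}
TRUNCATING = {"nonsense", "splicing", "frameshift_indel"}

def in_frame_fractions(records, min_mutations=5):
    """Return {(gene, inheritance): fraction of in-frame mutations}.

    records: iterable of (gene, inheritance, mutation_type) tuples.
    Traits with fewer than min_mutations classified mutations are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # trait -> [in_frame, total]
    for gene, inheritance, mtype in records:
        if mtype not in IN_FRAME and mtype not in TRUNCATING:
            continue  # ignore mutation types outside the two subsets
        trait = counts[(gene, inheritance)]
        trait[0] += 1 if mtype in IN_FRAME else 0
        trait[1] += 1
    return {t: i / n for t, (i, n) in counts.items() if n >= min_mutations}

def fraction_of_traits_at_least(fractions, threshold):
    """Share of traits whose in-frame fraction is no less than threshold."""
    values = list(fractions.values())
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)

# Made-up example records, not data from the study.
records = (
    [("GENE_A", "AD", "missense")] * 4 + [("GENE_A", "AD", "nonsense")]
    + [("GENE_B", "AR", "missense")] + [("GENE_B", "AR", "nonsense")] * 4
    + [("GENE_C", "AR", "missense")] * 2   # too few mutations, filtered out
)
fractions = in_frame_fractions(records)
```

Evaluating `fraction_of_traits_at_least` separately for dominant and recessive traits over a grid of thresholds gives the two curves that are then compared with a rank-based test such as the Mann–Whitney U test.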
Distinguishing edgetic perturbation from node removal For a proof‐of‐principle analysis of allele‐specific network perturbations by disease proteins, we used an integrated experimental approach to characterize binary protein interaction defects of disease‐causing mutant alleles. Our approach includes (i) Gateway recombinational cloning of mutations by PCR‐based site‐directed mutagenesis ( Suzuki , 2005 ), (ii) high‐throughput mapping of binary protein–protein interactions ( Rual , 2005 ), (iii) high‐throughput characterization of protein–protein interaction defects of all cloned disease‐causing mutant proteins, and (iv) integration of network perturbations by disease‐causing mutations with structural or functional information of disease proteins. We selected disease proteins that have: (i) multiple mutations annotated in HGMD ( Stenson , 2003 ), (ii) wild‐type clones available in our human ORFeome collection, hORFeome 3.1 ( Lamesch , 2007 ), (iii) structural information available in Protein Data Bank (PDB, http://www.rcsb.org/pdb ), and (iv) two or more interactions reported in our previous binary human interactome map ( Rual , 2005 ). We also requested that at least one of the observed interactions by yeast two‐hybrid (Y2H) analysis be supported by functional characterization in the literature. Given these criteria, we could apply our allele‐profiling platform to one autosomal recessive disease protein (CBS), and to three autosomal dominant disease proteins with likely dominant‐negative (ACTG1), abnormal activation (CDK4), or haploinsufficiency (PRKAR1A) molecular defects ( Figure 3A ). We included one additional autosomal recessive disease protein (HGD) that meets all criteria except that no protein–protein interaction data were available ( Figure 3A ). We carried out a genome‐wide Y2H screen against a set of ∼8100 human open‐reading frames ( Rual , 2005 ), and identified three interactions for wild‐type HGD. 
We cloned disease‐causing mutants annotated in HGMD for these five proteins and profiled each mutant against the corresponding wild‐type interactors. Profiling allele‐specific interaction defects of disease‐causing mutant proteins. ( A ) Schematic illustration of selection of disease proteins for proof‐of‐principle analysis of binary protein interaction defects of disease‐causing mutant proteins. ( B ) Interpreted network perturbations for each allele comparing to corresponding wild‐type proteins. Missing lines represent lost protein interactions. Dashed lines represent reduced protein interactions. Color codes for distinct network perturbations are indicated at the top panel. Profiling interaction defects of 29 alleles associated with five distinct genetic disorders revealed three classes of interaction‐defective alleles ( Supplementary information and Figure 3B ): (i) five alleles that behaved as null, eliminating all interactions, (ii) 16 edgetic alleles that lost specific interaction(s) while retaining other interactions, and (iii) eight alleles that behaved as ‘pseudo‐wild‐type’, retaining all currently available protein–protein interactions tested here. Null‐like alleles were observed only for two autosomal recessive disease proteins (CBS and HGD) and in a supposed case of dominant haploinsufficiency (PRKAR1A), consistent with differing network perturbations in diseases associated with distinct modes of inheritance ( Figure 2B ). We propose that many disease‐causing alleles scoring as pseudo‐wild‐type in the assay described here might still be true edgetic alleles. Further analysis with additional physical and biochemical interactors using additional assays should eventually settle that question. We related Y2H interaction profiles of each mutant to structural properties of disease proteins ( Supplementary information and Supplementary Figure S2–6 ). 
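The three classes of alleles described above amount to comparing the set of interactions retained by a mutant with the set observed for the wild-type protein. A minimal sketch, assuming each profile is reduced to a set of partner names (the function name and partner labels are hypothetical, and reduced or gained interactions are not modeled):

```python
def classify_allele(wild_type_partners, mutant_partners):
    """Classify an allele from its binary interaction profile.

    'null'             : no wild-type interaction is retained
    'edgetic'          : some, but not all, wild-type interactions are retained
    'pseudo-wild-type' : every tested wild-type interaction is retained
    """
    wt = set(wild_type_partners)
    if not wt:
        raise ValueError("at least one wild-type interaction is required")
    retained = wt & set(mutant_partners)
    if not retained:
        return "null"
    if retained == wt:
        return "pseudo-wild-type"
    return "edgetic"

# Hypothetical partners A, B and C of a wild-type disease protein.
wt_partners = {"A", "B", "C"}
profile_classes = {
    "allele_1": classify_allele(wt_partners, set()),
    "allele_2": classify_allele(wt_partners, {"A", "C"}),
    "allele_3": classify_allele(wt_partners, {"A", "B", "C"}),
}
```

Because the classification depends on which partners have been tested, an allele labeled pseudo-wild-type here could still turn out to be edgetic once further interactors are assayed, as noted in the text.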
Grossly disruptive mutations tend to affect buried residues of the protein, whereas mutations leading to loss or gain of specific interaction(s) tend to lie on the surface. Edgetic perturbation of some disease alleles revealed diverse molecular mechanisms of protein dysfunction ( Supplementary information ). Complex allele‐specific perturbations were also found to be associated with phenotypic variability among patients, such as their response to specific treatments ( Supplementary information for CBS). Structural analyses of disease‐causing mutations To further investigate the extent to which mutations found in human genetic disorders may grossly disrupt proteins or cause alterations in specific biochemical or biophysical interaction(s), we examined available three‐dimensional structures of all disease proteins. As grossly disruptive mutations versus mutations leading to loss or gain of specific interaction(s) probably distribute differently on protein structures ( Figure 4A ), we divided missense disease‐causing mutations into three non‐redundant categories: buried residues (<5% of surface accessible to water), exposed residues (⩾30% of surface accessible to water), and residues with intermediate exposure (5–30% of surface accessible to water). Among all 3664 affected residues in 236 proteins for which three‐dimensional X‐ray structures are available, about one‐third of the mutated residues are buried, whereas another one‐third are exposed, probably representing complete loss of gene products versus loss or gain of specific molecular interaction(s), respectively ( Supplementary Figure S7 ). Consistent with differing network perturbations in disease with distinct modes of inheritance ( Figure 2B ), autosomal dominant versus autosomal recessive disease mutations exhibit significant separation with respect to their solvent‐accessible surface areas ( P <3 × 10 −10 ; Figure 4B ). 
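The three exposure categories can be written as a small classifier over the relative solvent-accessible surface area. This is a sketch only: the function names are hypothetical, the thresholds are those given in the text, and the averaging over monomers follows the Materials and methods description for multimeric structures.

```python
from statistics import mean

def exposure_class(relative_asa_percent):
    """Assign a residue to one of the three non-redundant exposure categories."""
    if relative_asa_percent < 5:
        return "buried"        # <5% of surface accessible to water
    if relative_asa_percent >= 30:
        return "exposed"       # >=30% of surface accessible to water
    return "intermediate"      # 5-30% of surface accessible to water

def residue_accessibility(values_per_monomer):
    """Average the values obtained for the same residue in each monomer."""
    return mean(values_per_monomer)

# Hypothetical residue seen in two monomers of a dimer structure.
residue_class = exposure_class(residue_accessibility([2.0, 6.0]))
```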
About 40% of mutated residues in autosomal dominant disease are exposed (with relative solvent‐accessible surface areas ⩾30%), whereas only 27% of mutated residues in autosomal recessive disease fall in the same category ( Figure 4B ). Structural analyses of disease‐causing mutations in HGMD. ( A ) Schematic illustration of distinct positions of missense mutations in the three‐dimensional structure of a given protein probably causing node removal versus edgetic perturbation. ( B ) Distribution of accessible residues among mutations associated with autosomal recessive diseases (blue bar) and with autosomal dominant ones (red bar). ( C ) Schematic illustration of distinct positions of ‘truncating’ mutations with respect to protein domains probably causing node removal versus edgetic perturbation. ( D ) Distribution of ‘truncating’ mutations in Pfam domains. Fold enrichment higher than one means that Pfam domains contain more mutations than expected at random, whereas enrichment between zero and one means that Pfam domains are depleted in mutations. P ‐values assess the significance of the observed fold enrichment. Allele‐specific perturbations observed in PRKAR1A ( Supplementary Figure S6 ) indicate that interaction‐specific perturbation by truncations is also possible. As ‘truncating’ alleles outside of protein domains may preserve function of certain domains, giving rise to interaction‐specific perturbations ( Figure 4C ), we determined the distribution of ‘truncating’ mutations in Pfam domains ( Finn , 2006 ). Although disease‐causing ‘truncating’ mutations seem to exhibit a random distribution with respect to Pfam domains (enrichment: 1.0, P =0.2), ‘truncating’ mutations in autosomal dominant disease are slightly depleted in Pfam domains, whereas ‘truncating’ mutations in autosomal recessive disease are slightly enriched in Pfam domains ( Figure 4D ). 
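The fold enrichment of mutations in Pfam domains compares the observed fraction of mutations falling inside domains with the fraction of the protein sequence that the domains cover. The sketch below illustrates that ratio. The text does not name the statistical test behind the reported P‐values, so the exact binomial tail used here is an assumption, and the counts are made up.

```python
from math import comb

def fold_enrichment(n_in_domain, n_total, domain_fraction):
    """Observed fraction of mutations in domains over the expected fraction.

    domain_fraction: share of the sequence covered by Pfam-A domains (0-1).
    Values above one indicate enrichment; values between zero and one
    indicate depletion.
    """
    return (n_in_domain / n_total) / domain_fraction

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p). Assumed test, not taken from the text."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Made-up example: 30 of 100 truncating mutations fall in domains that
# cover 40% of the sequence, giving a fold enrichment of about 0.75.
example_fold = fold_enrichment(30, 100, 0.40)
# Depletion is assessed with the lower tail, P(X <= 30) = 1 - P(X >= 31).
example_p_depletion = 1 - binomial_upper_tail(31, 100, 0.40)
```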
This finding is consistent with the hypothesis that different ‘truncating’ mutations may cause distinct node removal versus edgetic perturbations giving rise to disease with distinct modes of inheritance. In agreement with distinct molecular mechanisms of dominance ( Figure 2B ), we found a depletion of autosomal dominant ‘truncating’ mutations in Pfam domains for structural proteins against an enrichment for transcription factors ( Figure 4D ), probably associated with dominant‐negative effects versus haploinsufficiency, respectively. Node removal versus edgetic perturbation in complex gene‐disease associations The complex patterns of disease mutations noted so far indicate that a substantial fraction of causative alleles in human genetic disorders may cause edgetic perturbations rather than node removal. Distinct network perturbation models, leading to distinct phenotypic outcomes ( Figure 1 ), predict that ‘truncating’ versus ‘in‐frame’ alleles for a given gene product might cause different diseases ( Figure 5A ). We therefore examined 142 genes associated with two or more diseases for which at least five distinct alleles have been reported for each disease. Among 278 disease pairs, each associated with a single one of these 142 genes, we found 88 pairs (∼30%) for which the proportion of ‘in‐frame’ versus ‘truncating’ mutations is significantly different between the two diseases ( P <0.05; Figure 5B and Supplementary Table 2 ). A noteworthy example involves the four types (I, II, III, and IV) of osteogenesis imperfecta (OI) with COL1A1 ‘in‐frame’ mutations causing strikingly more severe phenotypes (in type II, III, or IV) than ‘truncating’ mutations involved in type I ( Hamosh , 2005 ; Figure 5B ). Distinct node removal versus edgetic perturbation underlying pleiotropy. ( A ) Schematic illustration of distinct ‘truncating’ versus ‘in‐frame’ mutations in a single gene product causing distinct network perturbations giving rise to distinct disorders. 
(B) Analysis of ‘in‐frame’ mutations found in genes associated with multiple diseases. Each dot represents the fractions of ‘in‐frame’ mutations for a pair of distinct diseases associated with a common gene. The x‐axis represents the smaller fraction of ‘in‐frame’ mutations in each pair and the y‐axis represents the larger fraction. Pairs of diseases with significantly different fractions of ‘in‐frame’ mutations are represented by red dots (P<0.05). Statistically indistinguishable pairs are represented in black. Three gray arrows point to the three disease pairs corresponding to type I versus type II, III or IV osteogenesis imperfecta, with ‘in‐frame’ mutation fractions of 0.19 versus 0.93, 0.83 and 0.75, respectively.

Among 34 genes that are linked to both autosomal dominant and autosomal recessive disorders, the fraction of ‘in‐frame’ versus ‘truncating’ mutations per gene is significantly higher for autosomal dominant mutations than for autosomal recessive ones (Supplementary Figure S8). This finding further supports our hypothesis that distinct ‘in‐frame’ versus ‘truncating’ mutations probably cause distinct network perturbations giving rise to disease with distinct modes of inheritance (Figure 2). Edgetic interaction profiles of CBS and PRKAR1A mutant proteins (Figure 3) revealed possible connections between allele‐specific interaction defects and differential treatment responses or phenotypic severity among patients (Supplementary information). In addition to clinical variability, edgetic perturbation models also predict that distinct edgetic perturbations for a given gene product might cause phenotypically distinguishable disorders (Figure 6A). We used predicted Pfam domains as surrogates for functional protein domains (Sammut, 2008), assuming that ‘in‐frame’ mutations located in different Pfam domains probably alter protein functions differently.
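The comparison of ‘in‐frame’ versus ‘truncating’ proportions between two diseases caused by the same gene (Figure 5B) is a test on a 2 × 2 table of counts. The text reports P<0.05 without naming the test, so the two-sided Fisher's exact test below is an assumption, and the counts are hypothetical values chosen only to mirror the fractions 0.19 and 0.93 quoted for osteogenesis imperfecta.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are the two diseases; columns are 'in-frame' and 'truncating' counts.
    Sums the probabilities of all tables that are no more likely than the
    observed one, with row and column totals held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in-frame mutations in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )

# Hypothetical counts: disease 1 has 19 in-frame and 81 truncating alleles,
# disease 2 has 93 in-frame and 7 truncating alleles.
p_value = fisher_exact_two_sided(19, 81, 93, 7)
significantly_different = p_value < 0.05
```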
Among 169 genes associated with two or more diseases and encoding proteins containing at least two Pfam domains, 77 had significant enrichment of ‘in‐frame’ mutations in Pfam domains ( P <0.05). There were nine proteins with at least two Pfam domains significantly enriched with ‘in‐frame’ mutations ( P <0.05). For each of the nine proteins, we found a striking pattern of near mutual exclusivity, whereby different Pfam domains seem to be specifically affected in distinct disorders ( Figure 6B and Supplementary Table 3 ). A compelling example is TP63 ( van Bokhoven and Brunner, 2002 ) in which two clinically distinct developmental disorders, ectrodactyly ectodermal dysplasia (EEC) and ankyloblepharon ectodermal dysplasia (AEC), are caused by mutations in two separate domains, one predicted to bind DNA and the other to mediate protein–protein interaction(s) ( Figure 6B ). Current information on protein functional domains is incomplete, thus limiting the resolution for distinguishing phenotypes and genotypes. With more detailed structural and biochemical information available, more such allele‐specific edgetic phenotype‐to‐genotype correlations should be uncovered. Distinct edgetic perturbations underlying pleiotropy. ( A ) Schematic illustration of distinct ‘in‐frame’ alleles in a single gene product causing distinct network perturbations giving rise to distinct disorders. ( B ) Enrichment of ‘in‐frame’ mutations causing different disorders in different Pfam domains. Color intensity of Pfam domains represents fold enrichment of each disease associated ‘in‐frame’ mutations ( P <0.05). Vertical lines below corresponding Pfam domains mark disease‐causing ‘in‐frame’ mutations in TP63 . 
Discussion

Commonalities among disease mutations have been discerned: they tend to occur at highly conserved positions and to confer radical changes to proteins (Wang and Moult, 2001; Botstein and Risch, 2003; Yue, 2005; Subramanian and Kumar, 2006). Disease mutations are nevertheless more complex than these trends suggest, and that complexity should not be overlooked. Here we uncovered both experimental and computational evidence that strongly supports distinct network perturbations in human Mendelian disorders resulting from complete loss of gene products (node removal) or specific alterations in distinct molecular interaction(s) (edgetic perturbation), respectively (Figures 2, 3 and 4). Distinct edgetic network perturbations probably underlie many complex genotype‐to‐phenotype relationships in human genetic disorders (Figures 5 and 6), supporting the idea that edgetic perturbation versus node removal may confer fundamentally different functional consequences.

Edgetic network perturbation models focus on specific alterations in distinct molecular interactions. Although the ‘node‐centered’ gene knockout or knockdown approaches are convenient and useful in determining effects of gross disruption of proteins in model organisms, an ‘edge‐centered’ allele‐profiling approach, as carried out here and elsewhere (Dreze, in press), dissects the dynamics and complexities of biological systems, in which different interactions may occur independently, and in which a single protein may carry out different functions with different partners or in different biological contexts. Edgetic alleles with suboptimal but largely preserved molecular interactions may become insufficient when expressed at reduced levels or may become less stable. Such properties of edgetic alleles may be regulated by other genetic or environmental factors.
In this regard, functional characterization of edgetic alleles may help explain phenotypic variations among patients, such as incomplete penetrance or variable expressivity, as well as differential clinical treatment responses (e.g. CBS alleles, Supplementary information ). In addition, edgetic network perturbation models might improve our understanding of why and how disease alleles have disseminated in human populations. Just as high‐throughput sequencing technologies are revolutionizing genotyping platforms, and as functional genomics and proteomics are becoming increasingly able to characterize gene products resulting from whole genome sequencing and gene prediction, functional characterizations of genetic variations may be applied at large‐scale to characterize mutations with uncertain pathological consequences. We considered the effects of disease‐causing mutations on physical protein–protein interactions, perturbation of which has emerged as a characteristic shared by many disease mutations ( Ye , 2006 ; Hsu , 2007 ; Schuster‐Böckler and Bateman, 2008 ). Complete understanding of network perturbations in disease would require comprehensive analysis of disease mutant proteins by integration of data available from multiple functional assays. First, the current interactome network derived from Y2H analysis is probably incomplete. Many biologically relevant interactors remain to be tested and many may not be recovered by Y2H alone or by any other single protein interaction assay ( Braun , 2009 ; Venkatesan , 2009 ). Second, Y2H detects binary protein interactions. A positive Y2H readout does not necessarily warrant proper protein complex assembly in vivo . In oligomer assembly, multiple interaction surfaces of the monomer may be utilized. Mutant alleles that disrupt one but not all interaction surfaces may show positive interaction in the Y2H analysis, but may still affect proper oligomerization. Third, Y2H is not quantitative. 
Subtle alterations in the affinity of protein–protein interactions, which are undetectable by Y2H, may confer phenotypic changes. Finally, disease mutations may affect protein functions by altering biochemical activities or protein–DNA or protein–RNA interactions. Disease‐associated alleles may also gain new interactions, which is another important potential mechanism for pathogenicity. Gain‐of‐interaction alleles may be discovered by screening for new interactions specific for an individual mutant. Although we can assay only known edges at any given moment, as more physical and biochemical interactions become identified with time, deeper edgetic profiling will become possible. The pilot step taken here will reach its full potential when applied at genome or proteome scale, with the results integrated into extensive molecular networks. Materials and methods Database annotation The lists of genes and associated phenotypes were downloaded from HGMD website ( Stenson , 2003 ) (June 2006). The corresponding gene IDs were retrieved from Entrez Gene ( Maglott , 2005 ) (June 2006). By manual annotation we linked phenotypes associated with each mutation, as annotated in HGMD, to the corresponding disease in the OMIM database ( Hamosh , 2005 ). The resulting list contains 2269 gene‐to‐OMIM disease ID entries associated with 48 774 distinct mutations. We carried out all analyses on the resulting gene–OMIM disease associations. We obtained the inheritance information for the corresponding disease available in OMIM and separated mutations associated with autosomal dominant or autosomal recessive inheritance. A total of 1777 gene‐to‐OMIM disease entries, which involve 1281 genes, 1466 OMIM disease IDs and 35 154 mutations, are associated with either autosomal dominant or autosomal recessive inheritance. 
Fraction of ‘in‐frame’ mutations

We grouped missense mutations and small in‐frame insertions, deletions and indels (types of mutations as defined in HGMD) as ‘in‐frame’ mutations, and nonsense mutations, splicing mutations and small out‐of‐frame insertions, deletions and indels as ‘truncating’ mutations. We calculated the fraction of ‘in‐frame’ mutations as the number of ‘in‐frame’ mutations divided by the total number of mutations in each gene for each mode of inheritance (Figure 2C and D and Supplementary Figures S1 and S8) or for each disease (Figure 5B). To minimize the possibility of any existing trend being obscured by genes with few mutations, we limited our analysis to genes that have five or more mutations associated with each mode of inheritance (Figure 2C and D and Supplementary Figures S1 and S8) or each disease (Figure 5B). Essential human genes were estimated from the orthologs of mouse (Goh, 2007), fly, worm and yeast essential genes. Fly essential genes were extracted from Flybase (Wilson, 2008b; phenotype class: ‘lethal’), yeast essential genes from SGD (Ball, 2000; phenotype: ‘inviable’), and worm essential genes from RNAiDB (Gunsalus, 2004; phenotypes: ‘lethal’, ‘embryonic lethal’, ‘larval lethal’ and ‘adult lethal’).

Profiling interaction defects of mutant proteins

Disease mutant clones were generated by PCR mutagenesis essentially as described previously (Suzuki, 2005). Forward and reverse internal primers used are listed in Supplementary Table 4. All sequence‐confirmed Entry clones of mutant alleles were transferred individually by Gateway recombinational cloning into both pDB‐dest and pAD‐dest‐CYH destination vectors, generating DB–ORF allele and AD–ORF allele fusions (Rual, 2005). To test against wild‐type interactors, the DB–ORF and AD–ORF clones for CBS, HGD, ACTG1, CDK4 and PRKAR1A mutant proteins were transformed into MATα MaV203 or MATa MaV103 yeast strains, respectively.
Each interaction pair was tested for growth on SC‐His+3AT (synthetic medium without leucine, tryptophan and histidine, containing 20 mM 3‐amino‐1,2,4‐triazole) plates to confirm GAL1::HIS3 transcriptional activity, on yeast extract–peptone–dextrose (YPD) medium to determine GAL1::lacZ transcriptional activity using a β‐galactosidase filter assay, and on SC‐Ura plates (synthetic medium without leucine, tryptophan and uracil) to determine SPAL10::URA3 transcriptional activity. Y2H reporters were scored by comparison with a set of Y2H control strains that contain plasmids expressing pairs of proteins with a spectrum of interaction strengths (Supplementary Figure S9). Activation of at least two of the three reporter genes was taken as a positive interaction. Interaction pairs showing fewer than two positive reporters were scored as ‘−’. Interaction pairs showing the same number of positive reporters as the corresponding wild type were scored as ‘+’. Interactions that lost expression of one reporter but still showed expression of the other two reporters were scored as ‘R’.

For immunoblotting, yeast cells with AD–ORF fusions were cultured overnight at 30°C in synthetic medium without tryptophan and then grown in YPD medium to mid‐exponential phase. Cells were collected, treated with 150 mM NaOH on ice for 15 min and then lysed in 0.8% SDS buffer (0.024 M Tris–HCl (pH 6.8), 10% glycerol, 0.04% bromophenol blue and 0.4% 2‐mercaptoethanol) for 5 min at 95°C. Whole cell lysates were cleared by centrifugation at 14 000 g. The resulting supernatants were separated on NuPAGE acrylamide gels (Invitrogen) and electrophoretically transferred onto a PVDF membrane (Invitrogen). AD fusion proteins were detected by standard immunoblotting techniques using a rabbit anti‐GAL4 (activation domain) antibody (Sigma) as the primary antibody.
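The Y2H reporter-scoring rules given earlier in this section can be expressed as a short function. This is a minimal sketch with a hypothetical function name. It assumes that each interaction is summarized by its number of positive reporters out of three, and that a mutant activating more reporters than the wild type (a case not covered by the stated rules) is scored as ‘+’.

```python
N_REPORTERS = 3  # GAL1::HIS3, GAL1::lacZ and SPAL10::URA3

def score_interaction(mutant_reporters, wild_type_reporters):
    """Score a mutant interaction relative to the wild-type interaction.

    Returns '-' (fewer than two positive reporters), '+' (as many positive
    reporters as the wild type) or 'R' (one reporter lost, two still positive).
    The ASCII '-' stands in for the minus sign used in the text.
    """
    for n in (mutant_reporters, wild_type_reporters):
        if not 0 <= n <= N_REPORTERS:
            raise ValueError("reporter counts must lie between 0 and 3")
    if wild_type_reporters < 2:
        raise ValueError("the wild-type pair must itself be a positive interaction")
    if mutant_reporters < 2:
        return "-"
    if mutant_reporters >= wild_type_reporters:
        return "+"  # assumption: more reporters than wild type also counts as '+'
    return "R"

example_scores = [score_interaction(m, 3) for m in (0, 1, 2, 3)]
```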
For comparison with experimental data, the following structures were used: 1JBQ for CBS (Meier, 2001), 1EYB and 1EY2 for HGD (Titus, 2000), 2BTF (Schutt, 1993), 1HLU (Chik, 1996) and 2OAN (Lassing, 2007) for bovine β‐actin, 2W9F, 2W9Z, 2W96, 2W99 (Day, 2009) for CDK4, and 1G3N (Jeffrey, 2000) for CDK6–CDKN2C complex. Figures of tertiary structures were generated with PyMol (http://www.pymol.org). The relative solvent‐accessible surface areas (%ASAs) were calculated with PSAIA (Mihel, 2008).

Structural analyses

Protein structures were downloaded from the Protein Data Bank website (PDB, http://www.rcsb.org/pdb). Removal of redundant structures was achieved using the PISCES server (Wang and Dunbrack, 2005) with the following criteria: X‐ray structures only; no structure with Cα only; resolution ⩽3 Å; R‐factor ⩽0.3; sequence length between 40 and 10 000 amino acids; and maximum 90% sequence identity between similar PDB structures. This filtering collected 249 non‐redundant protein structures corresponding to 236 genes in HGMD. To repair residual mismatches between the residue numbering in PDB files and in HGMD, PDB sequences were aligned against their corresponding cDNA sequences in HGMD using CLUSTALW (Chenna, 2003). The relative accessibility of over 91 000 residues in all 249 structures was calculated using PSAIA (Mihel, 2008). With multimers, accessibility was computed for all monomers considered independently and the multiple values obtained for the same residue were averaged. Among the 3664 residues affected by missense mutations, 1590 and 1045 were associated with autosomal recessive and autosomal dominant diseases, respectively.

Pfam domain assignment

Pfam domains (Pfam‐A family only) were computed for cDNA sequences provided by HGMD, using InterProScan version 4.3 (http://www.ebi.ac.uk/Tools/InterProScan/).
Missense, nonsense, in‐frame and out‐of‐frame small insertions, deletions, and indels were then mapped onto the cDNA sequences and Pfam domains, generating a dataset containing 1348 genes with at least one Pfam‐A domain and 34 964 associated mutations. Among them, a total of 10 904 ‘truncating’ mutations were used for the analysis shown in Figure 4D, including 6212 associated with autosomal dominant diseases and 4692 associated with autosomal recessive diseases. For each mutation type, the number of mutations falling inside versus outside any Pfam‐A domain of the respective protein was compared with the fraction of the protein sequence covered by Pfam‐A domains.

Transcription factors and structural proteins

Information on genes encoding transcription factors was obtained from Gene Ontology (Harris, 2004) annotations (948 genes with the GO term of ‘transcription factor activity’) and predictions in the transcription factor database (DNA Binding Domain, DBD; Wilson, 2008a; 1467 genes). A total of 1697 human transcription factor genes were retrieved. Among them, 82 genes associated with autosomal dominant diseases that have at least one mutation in HGMD were used for Pfam analysis (Figure 4D), and 56 genes with five mutations or more were used for analysis of ‘in‐frame’ mutations (Figure 2D). Structural protein coding genes were retrieved from Gene Ontology annotations of ‘cytoskeleton’ (992 genes). Among them, 72 genes with at least one mutation in HGMD were used for Pfam analysis (Figure 4D), and 47 genes with five mutations or more were used for analysis of ‘in‐frame’ mutations (Figure 2D). DBD and Gene Ontology data were downloaded in March 2008.

Statistical analysis

Error bars represent the s.e.m. values.
The significance of the observed difference in the distributions of ‘in‐frame’ versus ‘truncating’ mutations in autosomal dominant and autosomal recessive disease, of the greater proportion of ‘in‐frame’ mutations in structural proteins than in transcription factors, and of the greater accessibility of residues mutated in autosomal dominant versus autosomal recessive diseases was evaluated using the non‐parametric Mann–Whitney U test. Enrichments of disease alleles in Pfam domains were determined using the odds ratio, and their significance using Fisher's exact test. A fold enrichment higher than one means Pfam domains contain more mutations than expected at random, whereas an enrichment between zero and one means a depletion in mutations. The differences between proportions of ‘in‐frame’ mutations in each pair of diseases associated with the same gene were assessed by Fisher's exact test. All statistics were computed using R (http://www.r‐project.org/).

Acknowledgements

We thank all members of the Vidal Lab and the Center for Cancer Systems Biology (CCSB), Dr Patricia K Donahoe and Dr Roseann Mulloy for helpful suggestions; Ines M Pinto for help with experiments. This study was supported by the Ellison Foundation and the WM Keck Foundation (MV), NIH grants R01‐HG001715 from NHGRI (MV and F Roth), U01‐CA105423 (PI, S Orkin, project leader, MV), U54‐CA112952 (PI, J Nevins, subcontract, MV) and R33‐CA132073 (MV) from NCI, and by Institute Sponsored Research funds from the Dana‐Farber Cancer Institute Strategic Initiative awarded to CCSB. KV was supported by an NIH NRSA training grant fellowship (T32‐CA09361). BC was supported by the Belgian Program on Interuniversity Attraction Poles initiated by the Federal Office for Scientific, Technical and Cultural Affairs (IAP P6/19 PROFUSA). MV and RB are ‘Honorary Research Associate’ and ‘Research Director’ from the Fonds de la Recherche Scientifique (FRS‐FNRS, French Community of Belgium), respectively.
Conflict of Interest

The authors declare that they have no conflict of interest.

Molecular Systems Biology, Wiley



Publisher
Wiley
Copyright
Copyright © 2013 Wiley Periodicals, Inc
eISSN
1744-4292
DOI
10.1038/msb.2009.80
pmid
19888216

We hypothesize that distinct mutations causing distinct molecular defects to proteins may lead to distinct perturbations of cellular networks, giving rise to distinct phenotypic outcomes ( Figure 1A ). Truncations close to the start of an open‐reading frame, or mutations that grossly destabilize a protein structure, can be modeled as removing a protein node from the network (‘node removal’). Alternatively, single amino‐acid substitutions that affect specific binding sites, or truncations that preserve certain domains of a protein, may give rise to partially functional gene products with specific changes in distinct biophysical or biochemical interaction(s) ( edge ‐specific gene tic perturbation or ‘edgetic’ perturbations; Figure 1B ). Node removal versus edgetic perturbation models of network changes underlying phenotypic alterations. ( A ) Schematic illustration of pleiotropic phenotypic outcomes resulting from distinct network perturbations upon complete loss of gene product (node removal, blue box) versus perturbation of specific molecular interactions (edgetic perturbation, red box). Solid lines between two nodes represent preserved interactions and dashed lines represent perturbed interactions. Edges are generally biophysical interactions, but could also be biochemical interactions. ( B ) Schematic illustration of distinct ‘truncating’ versus ‘in‐frame’ mutations causing distinct molecular defects in proteins leading to distinct node removal versus edgetic perturbation. Edgetic network perturbations provide alternative molecular explanations for protein dysfunction in addition to gene loss. Taking advantage of the large number of known disease‐causing allelic variations in human Mendelian disorders, we investigated how such mutations may cause complete loss of gene products or, alternatively, cause specific loss or gain of distinct molecular interaction(s). 
We further tested edgetic perturbation models in cases in which a single gene is associated with multiple disorders. Together, both experimental and computational evidence support edgetic perturbation models in human inherited disorders. Edgetic perturbations probably underlie many complex genotype‐to‐phenotype relationships. Results Global distribution of disease‐causing mutations To investigate possibly differing network perturbations in human inherited disorders, we examined ∼50 000 Mendelian disease‐causing alleles, affecting over 1900 protein‐coding genes, altogether associated with more than 2000 human disorders available in the Human Gene Mutation Database (HGMD) ( Stenson , 2003 ). We differentiated all disease alleles into two subsets probably causing different molecular defects to proteins. The first subset (‘truncating’ alleles) comprises all mutations that lead to the synthesis of truncated gene products, including nonsense mutations, out‐of‐frame insertions or deletions, or defective splicing. The second subset (‘in‐frame’ alleles) comprises mutations that probably give rise to nearly full‐length gene products, including missense mutations and in‐frame insertions or deletions. Over 50% (27 919/52 491) of Mendelian alleles in HGMD correspond to ‘in‐frame’ alleles ( Figure 2A ). Our hypothesis is that ‘truncating’ and ‘in‐frame’ alleles probably cause distinct molecular defects in proteins, and are thus enriched in distinct node removal or edgetic perturbations, respectively. This hypothesis is based on the assumption that ‘truncating’ alleles are less prone to produce stably folded proteins than ‘in‐frame’ alleles. Although exceptions may apply, our hypothesis predicts that ‘truncating’ versus ‘in‐frame’ alleles may distribute differently among diseases involving distinct node removal versus edgetic perturbations. Global patterns of disease mutations in human genetic disorders. 
( A ) Subdivision of ‘truncating’ versus ‘in‐frame’ mutations in Human Gene Mutation Database (HGMD) ( Stenson , 2003 ). ( B ) Schematic illustration of distinct node removal versus edgetic perturbation models in disease with autosomal recessive versus autosomal dominant inheritance. ( C ) Distribution of autosomal recessive and dominant disease with respect to the associated ‘in‐frame’ mutations. Mutations in each gene associated with each mode of inheritance are grouped as one trait. Each data point represents the fraction of autosomal recessive (blue bar) or autosomal dominant (red bar) traits that have a fraction of ‘in‐frame’ mutations no less than the value on the x ‐axis. Statistical significance of the observed difference between distributions is assessed by Mann–Whitney U test ( P <9.2 × 10 −12 ). The number of traits, genes, diseases and total mutations in each bin are provided in Supplementary Table 1 . ( D ) Average fraction of ‘in‐frame’ mutations associated with autosomal dominant disease in transcription factors and structural proteins. P ‐value assessed by Mann–Whitney U test of the observed difference is shown. Given that, with the exception of haploinsufficiency, many established molecular explanations for dominance entail production of a mutated protein that interferes in some way with the function of the product of the normal allele, autosomal dominant disease should be more frequently associated with edgetic perturbation than node removal ( Figure 2B ). To test the hypothesis that ‘truncating’ versus ‘in‐frame’ alleles are enriched in distinct node removal versus edgetic perturbations, respectively, we retrieved the inheritance information, by manual curation, for each HGMD‐annotated phenotype from the Online Mendelian Inheritance in Man (OMIM) database ( Hamosh , 2005 ). ‘Truncating’ versus ‘in‐frame’ alleles distribute differently among autosomal dominant and autosomal recessive traits. 
Among genes affected solely by ‘in‐frame’ mutations, the proportion of autosomal dominant diseases is ∼10‐fold higher than that of autosomal recessive diseases ( Figure 2C ). This trend holds even after removing all human predicted orthologs of essential genes from the analysis ( Supplementary Figure S1 ). We next examined whether distinct distribution of ‘truncating’ versus ‘in‐frame’ alleles can also be found among autosomal dominant traits that are probably caused by different molecular mechanisms. Mutations in cytoskeleton proteins frequently cause dominant‐negative effects, in which incorporation of expressed abnormal molecules into multimeric assemblies of structural proteins disrupts the integrity and function of the complex ( Wilkie, 1994 ). In contrast, germline mutations in transcription factors are more frequently associated with haploinsufficiency ( Wilkie, 1994 ; Seidman and Seidman, 2002 ) probably because of insufficient activity or production of the remaining wild‐type allele in heterozygotes. Consistent with this distinction, a significantly higher fraction of ‘in‐frame’ mutations was found for autosomal dominant Mendelian disorders associated with structural proteins than with transcription factors ( Figure 2D ). Distinct global distributions of ‘truncating’ versus ‘in‐frame’ mutations among diseases with distinct modes of inheritance, and in proteins probably associated with distinct molecular mechanisms of dominance, support our hypothesis that ‘truncating’ versus ‘in‐frame’ alleles are probably enriched in distinct node removal versus edgetic perturbations, respectively. The distinctions observed between autosomal dominant and autosomal recessive mutations may be more pronounced if haploinsufficiency could be separated overall from dominant‐negative and other molecular mechanisms of dominance, but such information is currently unavailable at the global level. 
Distinguishing edgetic perturbation from node removal For a proof‐of‐principle analysis of allele‐specific network perturbations by disease proteins, we used an integrated experimental approach to characterize binary protein interaction defects of disease‐causing mutant alleles. Our approach includes (i) Gateway recombinational cloning of mutations by PCR‐based site‐directed mutagenesis ( Suzuki , 2005 ), (ii) high‐throughput mapping of binary protein–protein interactions ( Rual , 2005 ), (iii) high‐throughput characterization of protein–protein interaction defects of all cloned disease‐causing mutant proteins, and (iv) integration of network perturbations by disease‐causing mutations with structural or functional information of disease proteins. We selected disease proteins that have: (i) multiple mutations annotated in HGMD ( Stenson , 2003 ), (ii) wild‐type clones available in our human ORFeome collection, hORFeome 3.1 ( Lamesch , 2007 ), (iii) structural information available in Protein Data Bank (PDB, http://www.rcsb.org/pdb ), and (iv) two or more interactions reported in our previous binary human interactome map ( Rual , 2005 ). We also requested that at least one of the observed interactions by yeast two‐hybrid (Y2H) analysis be supported by functional characterization in the literature. Given these criteria, we could apply our allele‐profiling platform to one autosomal recessive disease protein (CBS), and to three autosomal dominant disease proteins with likely dominant‐negative (ACTG1), abnormal activation (CDK4), or haploinsufficiency (PRKAR1A) molecular defects ( Figure 3A ). We included one additional autosomal recessive disease protein (HGD) that meets all criteria except that no protein–protein interaction data were available ( Figure 3A ). We carried out a genome‐wide Y2H screen against a set of ∼8100 human open‐reading frames ( Rual , 2005 ), and identified three interactions for wild‐type HGD. 
We cloned disease‐causing mutants annotated in HGMD for these five proteins and profiled each mutant against the corresponding wild‐type interactors. Profiling allele‐specific interaction defects of disease‐causing mutant proteins. ( A ) Schematic illustration of selection of disease proteins for proof‐of‐principle analysis of binary protein interaction defects of disease‐causing mutant proteins. ( B ) Interpreted network perturbations for each allele comparing to corresponding wild‐type proteins. Missing lines represent lost protein interactions. Dashed lines represent reduced protein interactions. Color codes for distinct network perturbations are indicated at the top panel. Profiling interaction defects of 29 alleles associated with five distinct genetic disorders revealed three classes of interaction‐defective alleles ( Supplementary information and Figure 3B ): (i) five alleles that behaved as null, eliminating all interactions, (ii) 16 edgetic alleles that lost specific interaction(s) while retaining other interactions, and (iii) eight alleles that behaved as ‘pseudo‐wild‐type’, retaining all currently available protein–protein interactions tested here. Null‐like alleles were observed only for two autosomal recessive disease proteins (CBS and HGD) and in a supposed case of dominant haploinsufficiency (PRKAR1A), consistent with differing network perturbations in diseases associated with distinct modes of inheritance ( Figure 2B ). We propose that many disease‐causing alleles scoring as pseudo‐wild‐type in the assay described here might still be true edgetic alleles. Further analysis with additional physical and biochemical interactors using additional assays should eventually settle that question. We related Y2H interaction profiles of each mutant to structural properties of disease proteins ( Supplementary information and Supplementary Figure S2–6 ). 
Grossly disruptive mutations tend to affect buried residues of the protein, whereas mutations leading to loss or gain of specific interaction(s) tend to lie on the surface. Edgetic perturbation of some disease alleles revealed diverse molecular mechanisms of protein dysfunction ( Supplementary information ). Complex allele‐specific perturbations were also found to be associated with phenotypic variability among patients, such as their response to specific treatments ( Supplementary information for CBS). Structural analyses of disease‐causing mutations To further investigate the extent to which mutations found in human genetic disorders may grossly disrupt proteins or cause alterations in specific biochemical or biophysical interaction(s), we examined available three‐dimensional structures of all disease proteins. As grossly disruptive mutations versus mutations leading to loss or gain of specific interaction(s) probably distribute differently on protein structures ( Figure 4A ), we divided missense disease‐causing mutations into three non‐redundant categories: buried residues (<5% of surface accessible to water), exposed residues (⩾30% of surface accessible to water), and residues with intermediate exposure (5–30% of surface accessible to water). Among all 3664 affected residues in 236 proteins for which three‐dimensional X‐ray structures are available, about one‐third of the mutated residues are buried, whereas another one‐third are exposed, probably representing complete loss of gene products versus loss or gain of specific molecular interaction(s), respectively ( Supplementary Figure S7 ). Consistent with differing network perturbations in disease with distinct modes of inheritance ( Figure 2B ), autosomal dominant versus autosomal recessive disease mutations exhibit significant separation with respect to their solvent‐accessible surface areas ( P <3 × 10 −10 ; Figure 4B ). 
About 40% of mutated residues in autosomal dominant disease are exposed (with relative solvent‐accessible surface areas ⩾30%), whereas only 27% of mutated residues in autosomal recessive disease fall in the same category ( Figure 4B ). Structural analyses of disease‐causing mutations in HGMD. ( A ) Schematic illustration of distinct positions of missense mutations in the three‐dimensional structure of a given protein probably causing node removal versus edgetic perturbation. ( B ) Distribution of accessible residues among mutations associated with autosomal recessive diseases (blue bar) and with autosomal dominant ones (red bar). ( C ) Schematic illustration of distinct positions of ‘truncating’ mutations with respect to protein domains probably causing node removal versus edgetic perturbation. ( D ) Distribution of ‘truncating’ mutations in Pfam domains. Fold enrichment higher than one means that Pfam domains contain more mutations than expected at random, whereas enrichment between zero and one means that Pfam domains are depleted in mutations. P ‐values assess the significance of the observed fold enrichment. Allele‐specific perturbations observed in PRKAR1A ( Supplementary Figure S6 ) indicate that interaction‐specific perturbation by truncations is also possible. As ‘truncating’ alleles outside of protein domains may preserve function of certain domains, giving rise to interaction‐specific perturbations ( Figure 4C ), we determined the distribution of ‘truncating’ mutations in Pfam domains ( Finn , 2006 ). Although disease‐causing ‘truncating’ mutations seem to exhibit a random distribution with respect to Pfam domains (enrichment: 1.0, P =0.2), ‘truncating’ mutations in autosomal dominant disease are slightly depleted in Pfam domains, whereas ‘truncating’ mutations in autosomal recessive disease are slightly enriched in Pfam domains ( Figure 4D ). 
This finding is consistent with the hypothesis that different ‘truncating’ mutations may cause distinct node removal versus edgetic perturbations giving rise to disease with distinct modes of inheritance. In agreement with distinct molecular mechanisms of dominance ( Figure 2B ), we found a depletion of autosomal dominant ‘truncating’ mutations in Pfam domains for structural proteins against an enrichment for transcription factors ( Figure 4D ), probably associated with dominant‐negative effects versus haploinsufficiency, respectively. Node removal versus edgetic perturbation in complex gene‐disease associations The complex patterns of disease mutations noted so far indicate that a substantial fraction of causative alleles in human genetic disorders may cause edgetic perturbations rather than node removal. Distinct network perturbation models, leading to distinct phenotypic outcomes ( Figure 1 ), predict that ‘truncating’ versus ‘in‐frame’ alleles for a given gene product might cause different diseases ( Figure 5A ). We therefore examined 142 genes associated with two or more diseases for which at least five distinct alleles have been reported for each disease. Among 278 disease pairs, each associated with a single one of these 142 genes, we found 88 pairs (∼30%) for which the proportion of ‘in‐frame’ versus ‘truncating’ mutations is significantly different between the two diseases ( P <0.05; Figure 5B and Supplementary Table 2 ). A noteworthy example involves the four types (I, II, III, and IV) of osteogenesis imperfecta (OI) with COL1A1 ‘in‐frame’ mutations causing strikingly more severe phenotypes (in type II, III, or IV) than ‘truncating’ mutations involved in type I ( Hamosh , 2005 ; Figure 5B ). Distinct node removal versus edgetic perturbation underlying pleiotropy. ( A ) Schematic illustration of distinct ‘truncating’ versus ‘in‐frame’ mutations in a single gene product causing distinct network perturbations giving rise to distinct disorders. 
( B ) Analysis of ‘in‐frame’ mutations found in genes associated with multiple diseases. Each dot represents the fraction of ‘in‐frame’ mutations of a pair of distinct diseases associated with a common gene. x ‐axis represents the smaller fraction of ‘in‐frame’ mutation in each pair and y ‐axis represents the larger fraction. Significantly different fractions of ‘in‐frame’ mutation between each pair of diseases are represented by red dots ( P <0.05). Statistically indistinguishable pairs are represented in black. Three gray arrows pointing to three disease pairs corresponding to Type I and Type II, III or IV Osteogenesis Imperfecta, with ‘in‐frame’ mutation fraction of 0.19 and 0.93, 0.83, 0.75 respectively. Among 34 genes that are linked to both autosomal dominant and autosomal recessive disorders, the fraction of ‘in‐frame’ versus ‘truncating’ mutations per gene is significantly higher for autosomal dominant mutations than for autosomal recessive ones ( Supplementary Figure S8 ). This finding further supports our hypothesis that distinct ‘in‐frame’ versus ‘truncating’ mutations probably cause distinct network perturbations giving rise to disease with distinct modes of inheritance ( Figure 2 ). Edgetic interaction profiles of CBS and PRKAR1A mutant proteins ( Figure 3 ) revealed possible connections between allele‐specific interaction defects and differential treatment responses or phenotypic severity among patients ( Supplementary information ). In addition to clinical variability, edgetic perturbation models also predict that distinct edgetic perturbations for a given gene product might cause phenotypically distinguishable disorders ( Figure 6A ). We used predicted Pfam domains as surrogates for functional protein domains ( Sammut , 2008 ), assuming that ‘in‐frame’ mutations located in different Pfam domains probably alter protein functions differently. 
Among 169 genes associated with two or more diseases and encoding proteins containing at least two Pfam domains, 77 had significant enrichment of ‘in‐frame’ mutations in Pfam domains ( P <0.05). There were nine proteins with at least two Pfam domains significantly enriched with ‘in‐frame’ mutations ( P <0.05). For each of the nine proteins, we found a striking pattern of near mutual exclusivity, whereby different Pfam domains seem to be specifically affected in distinct disorders ( Figure 6B and Supplementary Table 3 ). A compelling example is TP63 ( van Bokhoven and Brunner, 2002 ) in which two clinically distinct developmental disorders, ectrodactyly ectodermal dysplasia (EEC) and ankyloblepharon ectodermal dysplasia (AEC), are caused by mutations in two separate domains, one predicted to bind DNA and the other to mediate protein–protein interaction(s) ( Figure 6B ). Current information on protein functional domains is incomplete, thus limiting the resolution for distinguishing phenotypes and genotypes. With more detailed structural and biochemical information available, more such allele‐specific edgetic phenotype‐to‐genotype correlations should be uncovered. Distinct edgetic perturbations underlying pleiotropy. ( A ) Schematic illustration of distinct ‘in‐frame’ alleles in a single gene product causing distinct network perturbations giving rise to distinct disorders. ( B ) Enrichment of ‘in‐frame’ mutations causing different disorders in different Pfam domains. Color intensity of Pfam domains represents fold enrichment of each disease associated ‘in‐frame’ mutations ( P <0.05). Vertical lines below corresponding Pfam domains mark disease‐causing ‘in‐frame’ mutations in TP63 . 
Discussion There are commonalities behind disease mutations that have been discerned, such as disease mutations tend to present at highly conserved regions and to confer radical changes to proteins ( Wang and Moult, 2001 ; Botstein and Risch, 2003 ; Yue , 2005 ; Subramanian and Kumar, 2006 ), but there are more complexities to disease mutations and these should not be overlooked. Here we uncovered both experimental and computational evidences that strongly support distinct network perturbations in human Mendelian disorders resulting from complete loss of gene products (node removal) or specific alterations in distinct molecular interaction(s) (edgetic perturbation), respectively ( Figures 2 , 3 and 4 ). Distinct edgetic network perturbations probably underlie many complex genotype‐to‐phenotype relationships in human genetic disorders ( Figures 5 and 6 ) supporting the idea that edgetic perturbation versus node removal may confer fundamentally different functional consequences. Edgetic network perturbation models focus on specific alterations in distinct molecular interactions. Although the ‘node‐centered’ gene knockout or knockdown approaches are convenient and useful in determining effects of gross disruption of proteins in model organisms, an ‘edge‐centered’ allele‐profiling approach, as carried out here and elsewhere ( Dreze , in press ), dissects the dynamics and complexities of biological systems, in which different interactions may occur independently, and in which a single protein may carry out different functions with different partners or in different biological contexts. Edgetic alleles with suboptimal but largely preserved molecular interactions may become insufficient when expressed at reduced levels or may become less stable. Such properties of edgetic alleles may be regulated by other genetic or environmental factors. 
In this regard, functional characterization of edgetic alleles may help explain phenotypic variations among patients, such as incomplete penetrance or variable expressivity, as well as differential clinical treatment responses (e.g. CBS alleles, Supplementary information ). In addition, edgetic network perturbation models might improve our understanding of why and how disease alleles have disseminated in human populations. Just as high‐throughput sequencing technologies are revolutionizing genotyping platforms, and as functional genomics and proteomics are becoming increasingly able to characterize gene products resulting from whole genome sequencing and gene prediction, functional characterizations of genetic variations may be applied at large‐scale to characterize mutations with uncertain pathological consequences. We considered the effects of disease‐causing mutations on physical protein–protein interactions, perturbation of which has emerged as a characteristic shared by many disease mutations ( Ye , 2006 ; Hsu , 2007 ; Schuster‐Böckler and Bateman, 2008 ). Complete understanding of network perturbations in disease would require comprehensive analysis of disease mutant proteins by integration of data available from multiple functional assays. First, the current interactome network derived from Y2H analysis is probably incomplete. Many biologically relevant interactors remain to be tested and many may not be recovered by Y2H alone or by any other single protein interaction assay ( Braun , 2009 ; Venkatesan , 2009 ). Second, Y2H detects binary protein interactions. A positive Y2H readout does not necessarily warrant proper protein complex assembly in vivo . In oligomer assembly, multiple interaction surfaces of the monomer may be utilized. Mutant alleles that disrupt one but not all interaction surfaces may show positive interaction in the Y2H analysis, but may still affect proper oligomerization. Third, Y2H is not quantitative. 
Subtle alterations in the affinity of protein–protein interactions, which are undetectable by Y2H, may confer phenotypic changes. Finally, disease mutations may affect protein functions by altering biochemical activities or protein–DNA or protein–RNA interactions.

Disease‐associated alleles may also gain new interactions, which is another important potential mechanism for pathogenicity. Gain‐of‐interaction alleles may be discovered by screening for new interactions specific for an individual mutant. Although we can assay only known edges at any given moment, deeper edgetic profiling will become possible as more physical and biochemical interactions are identified over time. The pilot step taken here will reach its full potential when applied at genome or proteome scale, with the results integrated into extensive molecular networks.

Materials and methods

Database annotation

The lists of genes and associated phenotypes were downloaded from the HGMD website (Stenson et al, 2003) (June 2006). The corresponding gene IDs were retrieved from Entrez Gene (Maglott et al, 2005) (June 2006). By manual annotation we linked the phenotypes associated with each mutation, as annotated in HGMD, to the corresponding disease in the OMIM database (Hamosh et al, 2005). The resulting list contains 2269 gene‐to‐OMIM disease ID entries associated with 48 774 distinct mutations. We carried out all analyses on the resulting gene–OMIM disease associations. We obtained the inheritance information available in OMIM for the corresponding disease and separated mutations associated with autosomal dominant or autosomal recessive inheritance. A total of 1777 gene‐to‐OMIM disease entries, which involve 1281 genes, 1466 OMIM disease IDs and 35 154 mutations, are associated with either autosomal dominant or autosomal recessive inheritance.
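To illustrate how an annotated mutation list of this kind might be summarized, the following Python sketch tallies mutations per gene and inheritance mode and computes the fraction of ‘in‐frame’ mutations, using the grouping and the five‐mutation cutoff defined in the next subsection. The record layout, the mutation‐type labels and the function name are assumptions made for illustration; they are not HGMD field names, and the sketch is not the procedure used in the study.

```python
from collections import defaultdict

# Hypothetical mutation-type labels (assumed for this sketch, not HGMD field values).
# The two groups follow the 'in-frame' versus 'truncating' definition in the text.
IN_FRAME = {"missense", "inframe_insertion", "inframe_deletion", "inframe_indel"}
TRUNCATING = {"nonsense", "splicing", "frameshift_insertion",
              "frameshift_deletion", "frameshift_indel"}


def in_frame_fractions(records, min_mutations=5):
    """Return {(gene, inheritance): fraction of in-frame mutations}.

    records: iterable of (gene, inheritance, mutation_type) tuples.
    Groups with fewer than min_mutations classified mutations are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [in-frame count, total count]
    for gene, inheritance, mutation_type in records:
        if mutation_type not in IN_FRAME and mutation_type not in TRUNCATING:
            continue  # ignore mutation classes outside the two groups
        tally = counts[(gene, inheritance)]
        tally[1] += 1
        if mutation_type in IN_FRAME:
            tally[0] += 1
    return {key: n_in / total
            for key, (n_in, total) in counts.items()
            if total >= min_mutations}
```

Keying the tally on the (gene, inheritance) pair keeps dominant and recessive alleles of the same gene separate, which is what a per‐inheritance comparison requires.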
Fraction of ‘in‐frame’ mutations

We grouped missense mutations and small in‐frame insertions, deletions and indels (types of mutations as defined in HGMD) as ‘in‐frame’ mutations, whereas we grouped nonsense mutations, splicing mutations and small out‐of‐frame insertions, deletions and indels as ‘truncating’ mutations. We calculated the fraction of ‘in‐frame’ mutations as the number of ‘in‐frame’ mutations divided by the total number of mutations in each gene for each mode of inheritance (Figure 2C and D and Supplementary Figures S1 and S8) or for each disease (Figure 5B). To minimize the possibility of any existing trend being obscured by genes with few mutations, we limited our analysis to genes that have five or more mutations associated with each mode of inheritance (Figure 2C and D and Supplementary Figures S1 and S8) or each disease (Figure 5B). Essential human genes were estimated from the orthologs of mouse (Goh et al, 2007), fly, worm and yeast essential genes. Fly essential genes were extracted from Flybase (Wilson et al, 2008b; phenotype class: ‘lethal’), yeast essential genes from SGD (Ball et al, 2000; phenotype: ‘inviable’), and worm essential genes from RNAiDB (Gunsalus et al, 2004; phenotypes: ‘lethal’, ‘embryonic lethal’, ‘larval lethal’ and ‘adult lethal’).

Profiling interaction defects of mutant proteins

Disease mutant clones were generated by PCR mutagenesis essentially as described previously (Suzuki et al, 2005). The forward and reverse internal primers used are listed in Supplementary Table 4. All sequence‐confirmed Entry clones of mutant alleles were transferred individually by Gateway recombinational cloning into both pDB‐dest and pAD‐dest‐CYH destination vectors, generating DB–ORF allele and AD–ORF allele fusions (Rual et al, 2005). To test against wild‐type interactors, the DB–ORF and AD–ORF clones for CBS, HGD, ACTG1, CDK4 and PRKAR1A mutant proteins were transformed into MATα MaV203 or MATa MaV103 yeast strains, respectively.
Each interaction pair was tested for growth on SC‐His+3AT plates (synthetic medium without leucine, tryptophan and histidine, containing 20 mM 3‐amino‐1,2,4‐triazole) to confirm GAL1::HIS3 transcriptional activity, on yeast extract–peptone–dextrose (YPD) medium to determine GAL1::lacZ transcriptional activity using a β‐galactosidase filter assay, and on SC‐Ura plates (synthetic medium without leucine, tryptophan and uracil) to determine SPAL10::URA3 transcriptional activity. Y2H reporters were scored by comparison with a set of Y2H control strains that contain plasmids expressing pairs of proteins with a spectrum of interaction strengths (Supplementary Figure S9). Activation of at least two of the three reporter genes was taken as a positive interaction. Interaction pairs showing fewer than two positive reporters were scored as ‘−’. Interaction pairs showing the same number of positive reporters as the corresponding wild type were scored as ‘+’. Interactions that lost expression of one reporter but still showed expression of the other two reporters were scored as ‘R’.

For immunoblotting, yeast cells with AD–ORF fusions were cultured overnight at 30°C in synthetic medium without tryptophan and then grown in YPD medium to mid‐exponential phase. Cells were collected, treated with 150 mM NaOH on ice for 15 min and then lysed in 0.8% SDS buffer (0.024 M Tris–HCl (pH 6.8), 10% glycerol, 0.04% bromophenol blue and 0.4% 2‐mercaptoethanol) for 5 min at 95°C. Whole cell lysates were cleared by centrifugation at 14 000 g. The resulting supernatants were separated on NuPAGE acrylamide gels (Invitrogen) and electrophoretically transferred onto a PVDF membrane (Invitrogen). AD fusion proteins were detected by standard immunoblotting techniques using an anti‐GAL4 (activation domain) antibody produced in rabbit (Sigma) as the primary antibody.
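The three‐reporter scoring rules given earlier in this subsection can be expressed compactly. The following Python sketch assumes that each interaction pair is summarized as three booleans indicating activation of the HIS3, lacZ and URA3 reporters; the function name, the plain‐ASCII score symbols and the handling of cases that the text does not cover are illustrative assumptions.

```python
def score_interaction(mutant, wild_type):
    """Score a mutant interaction pair against its wild-type counterpart.

    mutant, wild_type: 3-tuples of booleans for (HIS3, lacZ, URA3) activation.
    Returns '-', '+' or 'R' following the rules in the text, or None for a
    combination the text does not describe (assumption of this sketch).
    """
    n_mut, n_wt = sum(mutant), sum(wild_type)
    if n_wt < 2:
        # The wild-type pair itself must count as a positive interaction.
        raise ValueError("wild-type pair is not a positive interaction")
    if n_mut < 2:
        return "-"   # fewer than two positive reporters: interaction lost
    if n_mut == n_wt:
        return "+"   # same number of positive reporters as wild type
    if n_mut == n_wt - 1:
        return "R"   # one reporter lost, the other two still expressed
    return None      # mutant shows more reporters than wild type


# Example: wild type activates all three reporters, mutant loses lacZ.
print(score_interaction((True, False, True), (True, True, True)))
```

Returning `None` for a mutant that activates more reporters than the wild type keeps the sketch from assigning a score that the scoring scheme does not define.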
For comparison with experimental data, the following structures were used: 1JBQ for CBS (Meier et al, 2001), 1EYB and 1EY2 for HGD (Titus et al, 2000), 2BTF (Schutt et al, 1993), 1HLU (Chik et al, 1996) and 2OAN (Lassing et al, 2007) for bovine β‐actin, 2W9F, 2W9Z, 2W96 and 2W99 (Day et al, 2009) for CDK4, and 1G3N (Jeffrey et al, 2000) for the CDK6–CDKN2C complex. Figures of tertiary structures were generated with PyMol (http://www.pymol.org). The relative solvent‐accessible surface areas (%ASAs) were calculated with PSAIA (Mihel et al, 2008).

Structural analyses

Protein structures were downloaded from the Protein Data Bank website (PDB, http://www.rcsb.org/pdb). Redundant structures were removed using the PISCES server (Wang and Dunbrack, 2005) with the following criteria: X‐ray structures only; no structure with Cα only; resolution ⩽3 Å; R‐factor ⩽0.3; sequence length between 40 and 10 000 amino acids; and a maximum of 90% sequence identity between similar PDB structures. This filtering yielded 249 non‐redundant protein structures corresponding to 236 genes in HGMD. To repair residual mismatches between the residue numbering in PDB files and in HGMD, PDB sequences were aligned against their corresponding cDNA sequences in HGMD using CLUSTALW (Chenna et al, 2003). The relative accessibility of over 91 000 residues in all 249 structures was calculated using PSAIA (Mihel et al, 2008). For multimers, accessibility was computed for each monomer considered independently, and the multiple values obtained for the same residue were averaged. Among the 3664 residues affected by missense mutations, 1590 and 1045 were associated with autosomal recessive and autosomal dominant diseases, respectively.

Pfam domain assignment

Pfam domains (Pfam‐A family only) were computed for cDNA sequences provided by HGMD, using InterProScan version 4.3 (http://www.ebi.ac.uk/Tools/InterProScan/).
Missense, nonsense, in‐frame and out‐of‐frame small insertions, deletions and indels were then mapped onto the cDNA sequences and Pfam domains, generating a dataset containing 1348 genes with at least one Pfam‐A domain and 34 964 associated mutations. Among them, a total of 10 904 ‘truncating’ mutations were used for the analysis shown in Figure 4D, including 6212 associated with autosomal dominant diseases and 4692 associated with autosomal recessive diseases. Statistics were generated on the sum of a particular mutation type that fell either into or out of any Pfam‐A domain in its respective protein versus the total fraction of the Pfam‐A domain sequences in the protein sequence.

Transcription factors and structural proteins

Information on genes encoding transcription factors was obtained from Gene Ontology (Harris et al, 2004) annotations (948 genes with the GO term ‘transcription factor activity’) and predictions in the transcription factor database (DNA Binding Domain, DBD; Wilson et al, 2008a; 1467 genes). A total of 1697 human transcription factor genes were retrieved. Among them, 82 genes associated with autosomal dominant diseases that have at least one mutation in HGMD were used for Pfam analysis (Figure 4D), and 56 genes with five mutations or more were used for analysis of ‘in‐frame’ mutations (Figure 2D). Structural protein coding genes were retrieved from Gene Ontology annotations of ‘cytoskeleton’ (992 genes). Among them, 72 genes with at least one mutation in HGMD were used for Pfam analysis (Figure 4D), and 47 genes with five mutations or more were used for analysis of ‘in‐frame’ mutations (Figure 2D). DBD and Gene Ontology data were downloaded in March 2008.

Statistical analysis

Error bars represent the s.e.m. values.
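The Pfam tallies described in the Pfam domain assignment subsection (mutations inside versus outside Pfam‐A domains, set against the share of sequence covered by domains) lend themselves to a 2×2 table. The following Python sketch computes an odds ratio and a two‐sided Fisher's exact test P‐value for such a table using only the standard library. The layout of the table (mutations in the first row, residues in the second) and the function names are assumptions of this sketch; the text does not specify how the table was constructed, and the study itself used R.

```python
from math import comb


def odds_ratio(mut_in, mut_out, res_in, res_out):
    """Odds of a mutation lying in a domain relative to the odds for a residue.

    Assumed table layout: [[mut_in, mut_out], [res_in, res_out]].
    Raises ZeroDivisionError if mut_out or res_in is zero.
    """
    return (mut_in * res_out) / (mut_out * res_in)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    p_value = sum(q for q in map(prob, range(lowest, highest + 1))
                  if q <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)
```

Under this layout, an odds ratio above one corresponds to domains carrying more mutations than their share of the sequence would suggest, and a value between zero and one corresponds to a depletion.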
The significance of the observed difference in the distributions of ‘in‐frame’ versus ‘truncating’ mutations in autosomal dominant and autosomal recessive disease, of the greater proportion of ‘in‐frame’ mutations in structural proteins than in transcription factors, and of the greater accessibility of residues mutated in autosomal dominant versus autosomal recessive diseases was evaluated using the non‐parametric Mann–Whitney U test. Enrichments of disease alleles in Pfam domains were determined using odds ratios, and their significance using Fisher's exact test. A fold enrichment higher than one means that Pfam domains contain more mutations than expected at random, whereas an enrichment between zero and one means a depletion in mutations. The differences between proportions of ‘in‐frame’ mutations in each pair of diseases associated with the same gene were assessed by Fisher's exact test. All statistics were computed using R (http://www.r-project.org/).

Acknowledgements

We thank all members of the Vidal Lab and the Center for Cancer Systems Biology (CCSB), Dr Patricia K Donahoe and Dr Roseann Mulloy for helpful suggestions, and Ines M Pinto for help with experiments. This study was supported by the Ellison Foundation and the WM Keck Foundation (MV), NIH grants R01‐HG001715 from NHGRI (MV and F Roth), U01‐CA105423 (PI, S Orkin, project leader, MV), U54‐CA112952 (PI, J Nevins, subcontract, MV) and R33‐CA132073 (MV) from NCI, and by Institute Sponsored Research funds from the Dana‐Farber Cancer Institute Strategic Initiative awarded to CCSB. KV was supported by an NIH NRSA training grant fellowship (T32‐CA09361). BC was supported by the Belgian Program on Interuniversity Attraction Poles initiated by the Federal Office for Scientific, Technical and Cultural Affairs (IAP P6/19 PROFUSA). MV and RB are ‘Honorary Research Associate’ and ‘Research Director’ from the Fonds de la Recherche Scientifique (FRS‐FNRS, French Community of Belgium), respectively.
Conflict of Interest

The authors declare that they have no conflict of interest.

Journal

Molecular Systems Biology, Wiley

Published: Jan 1, 2009
