The Extraordinary Evolutionary History of the Reticuloendotheliosis Viruses
doi: 10.1371/journal.pbio.1001642; pmid: 24013706
Introduction
The reticuloendotheliosis viruses (REVs) comprise several closely related amphotropic retroviruses (family Retroviridae) isolated from birds [1]. The prototypic REV isolate was obtained from a turkey in 1957 [2]. Subsequently, REV infections have been reported in a diverse range of gamebirds (order Galliformes) and waterfowl (order Anseriformes). Infection is associated with a range of disease syndromes, including anemia, immunosuppression, neoplasia, runting, and feathering abnormalities called “nakanuke.” The etiology of REV infection remains enigmatic: although antibodies to REV are widespread in poultry, REV outbreaks occur only sporadically and are relatively rare [3].
All retroviruses replicate their genomes via a DNA intermediate, referred to as a “provirus,” that is integrated into the nuclear DNA of the host cell. Occasionally, infection of germ cells allows retroviral proviruses to enter the host germline, so that they can be vertically inherited as host alleles, called endogenous retroviruses (ERVs) [4], a proportion of which eventually become fixed in the germline. These ancestral retrovirus sequences represent retroviral “fossils” [5],[6], and as such they support “paleovirological” investigations that seek to address the long-term, macroevolutionary history of interaction between hosts and retroviruses [7],[8].
In a previous study, phylogenetic analysis of retroviral polymerase (pol) gene sequences revealed that REV groups robustly within the Gammaretrovirus genus, and is closely related to an ERV in the genome of the short-beaked echidna (Tachyglossus aculeatus), an egg-laying mammal found only in Australia and New Guinea [9]. This discovery reinforced the conclusions of earlier serological studies, which proposed that REVs originated in mammals [10].
Curiously, sequences derived from REV have also been identified in the genomes of two large DNA viruses that naturally infect birds: fowlpox virus (FWPV), a poxvirus [11],[12] (family Poxviridae), and gallid herpesvirus 2 (GHV-2), a herpesvirus (family Herpesviridae). FWPV infects poultry and wild birds throughout the world, and causes a mild-to-severe, slow-developing disease (avian pox) characterized by the formation of proliferative external lesions (dry pox), and diphtheritic lesions in the digestive and respiratory tracts (wet pox) [13]. GHV-2 is the causative agent of Marek's disease, a highly contagious disease of chickens and other galliform birds that is associated with a wide range of clinical syndromes, including neoplasia and paralysis [14]. Clinical disease is not always apparent in infected birds, but mortality rates in susceptible flocks can be very high [14]. Contamination of both FWPV and Marek's disease vaccines with replication-competent REV, leading to outbreaks of REV infection, has been reported on numerous distinct occasions [3],[15]. However, only remnant REV sequences, incapable of expressing retrovirus, have been identified in GHV-2 and FWPV vaccine strains (typically a “solo LTR” derived from the long terminal repeat (LTR) regions that flank the provirus) [12],[16]. By contrast, FWPV field strains containing near full-length REV proviruses appear to circulate naturally in unvaccinated birds [12],[16]–[19]. Recently, a field strain of GHV-2 containing a novel REV LTR insertion was reported [20].
In this study, we used a combination of PCR-based and in silico screening to explore the origin and evolutionary history of the REV lineage, and to investigate the processes linking exogenous REV isolates with endogenous REV-related sequences in virus and animal genomes.
Results
Paleovirological History of the REV Lineage
To investigate the deeper origins of the REVs, we screened avian and mammalian genome sequence databases (Table S1) for ERV sequences closely related to REV (Table 1). Screening of 42 mammalian genomes identified numerous ERV loci that disclosed highly significant similarity to one or more REV coding domains, but none that matched closely to REV across the entire coding region of the genome. We found that all mammalian ERVs exhibiting a high degree of sequence similarity to REV in the gag-pol domain exhibited no such similarity in env, and vice versa. This presumably reflects the recombinant genome structure of REV [21],[22], which comprises a Gammaretrovirus gag-pol domain fused to an env domain that is more commonly associated with the Betaretrovirus genus (although it also occurs in some other Gammaretroviruses, also considered to be recombinants [23]). No ERV loci closely related to REV were detected in avian genomes. We did identify numerous avian ERVs that disclosed weak similarity to REV in pol (30–40% amino acid identity). However, phylogenetic analysis revealed these ERVs to be derived from ancient, highly degenerated ERV lineages that were clearly distinct from modern Gammaretroviruses (Figure 1).
Figure 1. Evolutionary relationships among the RT genes of exogenous Gammaretroviruses and related ERVs. Shaded boxes indicate taxa that are known to occur as exogenous retroviruses. Brackets to the right indicate major lineages (note: an integrated taxonomy of exogenous retroviruses and ERVs has yet to be established by the International Committee on Taxonomy of Viruses, and the groupings shown here are propositional). Associations of retrovirus groups and individual retroviral taxa with avian and mammalian hosts are indicated, as shown in the key. The phylogeny shown was constructed using neighbor joining (NJ) and a multiple sequence alignment spanning 140 amino acid residues in the reverse transcriptase protein (RT), and is midpoint rooted for display purposes. To obtain putative protein sequences for ERVs, frameshifting indels were inferred and removed, and the resulting nucleotide sequence was conceptually translated. Asterisks indicate clades with bootstrap support >90% in both NJ and maximum likelihood (ML) trees, based on 1,000 bootstrap replicates. The scale bar indicates evolutionary distance in substitutions per site. Table S2 provides details of all the ERVs and exogenous retrovirus taxa shown in the phylogeny. https://doi.org/10.1371/journal.pbio.1001642.g001
Table 1. Distribution of REV-related sequences in vertebrate and virus genomes. https://doi.org/10.1371/journal.pbio.1001642.t001
In phylogenies based on reverse transcriptase (RT), avian REV isolates cluster tightly with a previously described ERV sequence derived from the short-beaked echidna genome [9]. During a polymerase chain reaction (PCR)–based investigation of ERV diversity in Malagasy mammals, we serendipitously identified additional ERV RT sequences that grouped within this clade, in the genomes of two Malagasy carnivore species: the ring-tailed mongoose (Galidia elegans) and the narrow-striped mongoose (Mungotictis decemlineata). We recovered near-complete proviral genome sequences for all three REV-related ERVs (hereafter referred to as echidna-ERV, Galidia-ERV, and Mungotictis-ERV) (Figure 2a), revealing that they exhibit similarity to REV throughout the entire internal coding region of the genome. Crucially, echidna-ERV, Galidia-ERV, and Mungotictis-ERV grouped robustly with REV isolates in phylogenies constructed using both the pol and env coding domains (Figure 3a and 3b), establishing that they share a common, recombinant ancestor with these viruses.
Thus, ERVs belonging to the REV lineage do occur in the genomic fossil record of mammals, but as with certain other retrovirus groups, such as foamy viruses and lentiviruses, they are relatively rare [24],[25].
Figure 2. Genomic and paleovirological characteristics of REV-related retroviruses. The schematic in panel (a) shows the genome structure of REV and SNV, a near full-length REV insertion in the FWPV genome, and the mammalian ERVs Echidna-ERV and Galidia-ERV. Percentage sequence identity to SNV, at the amino acid level, is shown for the putative Gag, Pol, and Env polyproteins of Echidna-ERV and Galidia-ERV. Proviral coding regions that disclose homology to Gammaretroviruses are shown in green, whereas those that disclose homology to Betaretroviruses are shown in blue. ORFs flanking the REV insertion in FWPV are in yellow. Panel (b) summarizes the genomic data used to estimate the minimum age of REV-related ERV insertions in Malagasy carnivore genomes. A time-scaled Carnivora phylogeny (based on Nyakatura et al. [27]) is shown on the left, with Malagasy carnivores shaded. A corresponding schematic on the right shows the genomic locus at which an orthologous ERV insertion was identified in a subset of Malagasy carnivores. Boxes represent the env gene (blue) and 3′ LTR sequences (green = U3; dark grey = R; light grey = U5). The adjacent black line represents flanking genomic DNA, spanning 238 nucleotides, obtained from the narrow-striped mongoose (Mungotictis decemlineata) and ring-tailed mongoose (Galidia elegans) genomes in our study, and aligned to a homologous genomic region (lacking a proviral insertion) in the cat (Felis catus), dog (Canis familiaris), and ferret (Mustela furo) genomes. An orthologous ERV insertion was detected in M. decemlineata and G. elegans genomes, but not in the more distantly related fossa (Cryptoprocta ferox), indicating that germline invasion occurred between 18 and 8 Ma. Genetic data indicate that all Malagasy carnivores are derived from a single founder population that colonized Madagascar ∼19 Ma [26]; thus, invasion of the Malagasy carnivore germline occurred in Madagascar. The nucleotide sequence alignment on which the schematic in panel (b) is based is shown in Figure S1. Abbreviations: RV, retrovirus; Kb, kilobases; ORF, open reading frame; PBS, primer binding site; Pro, proline; Thr, threonine; LTR, long terminal repeat; U3, unique three prime region; R, repeat region; U5, unique five prime region; RT, reverse transcriptase; SU, surface protein; TM, transmembrane protein; M.dec, Mungotictis decemlineata; G.ele, Galidia elegans. https://doi.org/10.1371/journal.pbio.1001642.g002
Figure 3. Contrasting phylogenetic relationships of pol and env genes found in REV-related retroviruses. Panels (a) and (b) show ML phylogenies constructed from alignments of Gamma- and Betaretrovirus protein sequences. The phylogeny in panel (a) was constructed from an alignment spanning 157 residues of the RT protein encoded by pol, whereas the phylogeny in panel (b) was constructed from an alignment spanning 153 residues of the TM domain in the polypeptide encoded by env. Asterisks on internal nodes indicate ML bootstrap support >95% (based on 1,000 bootstrap replicates). Asterisks beside taxa names indicate ERV families identified in this study. Open triangles indicate ERV lineages for which env genes were not identified. Scale bars indicate evolutionary distance in substitutions per site. Brackets to the right indicate genus designations, and viruses previously identified as Gamma- and Betaretrovirus (γ-β) recombinants. Table S2 provides details of all the ERVs and exogenous retrovirus taxa shown in the phylogeny. Abbreviations: RV, retrovirus; MoMLV, Moloney murine leukemia virus; FeLV, feline leukemia virus; GaLV, gibbon ape leukemia virus; KoRV, koala retrovirus; BAEV, baboon endogenous virus; SMRV, squirrel monkey retrovirus; TvERV, Trichosurus vulpecula endogenous retrovirus; JSRV, Jaagsiekte sheep retrovirus; SRV, simian retrovirus; MMTV, mouse mammary tumor virus. https://doi.org/10.1371/journal.pbio.1001642.g003
PCR results suggested that all three ERVs are present at low copy number (1–2 proviruses) in their host species. Along with other factors, such as the relatively short length of LTRs, this precluded the confident use of molecular clock-based approaches to date the echidna-ERV, Galidia-ERV, and Mungotictis-ERV insertions. Notably, however, internal coding regions in all three ERVs were relatively intact (although echidna-ERV has a large deletion in the region of the env gene encoding the surface (SU) glycoprotein (Figure 2a)). Using a ligation-mediated PCR method, we recovered matching flanking insertion sites for Galidia-ERV and Mungotictis-ERV, confirming that REV-like viruses occur as orthologous insertions in distinct Malagasy mongoose species. This finding indicates that REV-like viruses entered the germline of Malagasy mammals prior to the divergence of Galidia and Mungotictis ∼8 million years ago (Ma) [26],[27] (Figure 2b). REV-related ERVs were not detected in the more distantly related fossa (Cryptoprocta ferox). Together these findings establish that the entire REV lineage, including both mammalian and avian isolates, derives from a common founder that was generated by recombination, and circulated among mammals during the Miocene Epoch (∼23–5 Ma).
Origin of the Avian REVs
Given that the REV lineage clearly originates in mammals, we decided to investigate the origins of the avian REVs in greater detail. We reviewed all published reports of REV outbreaks, and sought to obtain any available archived samples (Table 2).
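The orthology-based dating used for the Malagasy carnivore ERVs rests on a simple bracketing argument: an insertion shared at the same locus by two species must predate their divergence, and its absence from a more distant relative suggests that it postdates that earlier split. The Python sketch below illustrates this reasoning only. The function name, the data layout, and the ∼18 Ma value for the fossa split (taken from the upper bound quoted in the Figure 2 legend) are assumptions made for illustration, not part of the study's methods.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Pairwise divergence times in millions of years (Ma).
# The ~8 Ma Galidia/Mungotictis split is quoted in the text; the ~18 Ma value
# for the fossa is an ASSUMPTION based on the bound given in the Figure 2 legend.
DIVERGENCE_MA: Dict[FrozenSet[str], float] = {
    frozenset({"Galidia", "Mungotictis"}): 8.0,
    frozenset({"Galidia", "Cryptoprocta"}): 18.0,
    frozenset({"Mungotictis", "Cryptoprocta"}): 18.0,
}


def insertion_age_bounds(
    carriers: Iterable[str],
    non_carriers: Iterable[str],
    divergence_ma: Dict[FrozenSet[str], float],
) -> Tuple[float, Optional[float]]:
    """Return (minimum_age, maximum_age) in Ma for an orthologous insertion.

    minimum_age: the deepest split among taxa that share the insertion.
    maximum_age: the most recent split between a carrier and a non-carrier,
                 or None when no non-carrier is supplied.
    """
    carriers = list(carriers)
    non_carriers = list(non_carriers)
    if not carriers:
        raise ValueError("at least one carrier taxon is required")

    # Lower bound: the insertion must predate every split among carriers.
    min_age = max(
        (divergence_ma[frozenset(pair)] for pair in combinations(carriers, 2)),
        default=0.0,
    )

    # Upper bound: absence in a relative suggests (but does not prove, since
    # loss or non-detection is possible) insertion after that lineage split off.
    splits = [
        divergence_ma[frozenset((c, n))] for c in carriers for n in non_carriers
    ]
    max_age = min(splits) if splits else None
    return min_age, max_age


bounds = insertion_age_bounds(
    ["Galidia", "Mungotictis"], ["Cryptoprocta"], DIVERGENCE_MA
)
print(f"Insertion bracketed between {bounds[1]:g} and {bounds[0]:g} Ma")
```

The upper bound is the weaker of the two, because a negative PCR result in the fossa could also reflect loss of the locus or primer mismatch.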
Among REV isolates that had not previously been sequenced, we were able to obtain only one, duck infectious anemia virus (DIAV), for which samples had not all been exhausted or destroyed. We sequenced the complete DIAV genome and constructed ML phylogenies using all available REV sequence data. Phylogenies were constructed using alignment partitions representing (i) a conserved region of the pol coding domain (Figure 4a), (ii) the complete internal coding region of the viral genome (Figure 4b), and (iii) LTR sequences (Figure 5).
Figure 4. Phylogenetic relationships of REV coding regions. ML phylogenies constructed using (a) an alignment spanning residues 183–481 of the Pol polyprotein (DIAV coordinates) and containing REV and mammalian gammaretrovirus sequences and (b) a nucleotide alignment of the entire internal coding region of full-length avian isolates. The tree in panel (b) indicates the number of strain-specific, nonsynonymous mutations estimated to have occurred in the nucleocapsid (NC), capsid (CA), matrix (MA), protease (PR), RT, RNase-H (RH), integrase (IN), surface (SU), and TM genes of the exogenous isolate HA9901. Asterisks on internal nodes indicate ML bootstrap support >95%. All trees are midpoint rooted for display purposes. Scale bars indicate evolutionary distance in substitutions per site. Taxa labels include sequence accession numbers, and in panel (b) two-letter ISO country codes enclosed by brackets indicating the country of sampling. Further details of REV sequences included in these trees can be found in Table S4. https://doi.org/10.1371/journal.pbio.1001642.g004
Figure 5. Phylogenetic relationships of REV LTR sequences. ML phylogeny constructed using an alignment of REV LTR sequences. Asterisks on internal nodes indicate ML bootstrap support >95%. The phylogeny is midpoint rooted for display purposes. The scale bar indicates evolutionary distance in substitutions per site. Taxa labels include accession numbers and two-letter ISO country codes enclosed by brackets indicating country of sampling. Where appropriate, FWPV and GHV-2 strain designations are shown in bold. Further details of REV sequences used in the tree, and an alignment figure highlighting lineage-specific LTR indels, can be found in Table S4 and Figure S2, respectively. https://doi.org/10.1371/journal.pbio.1001642.g005
Table 2. REV publication timeline. https://doi.org/10.1371/journal.pbio.1001642.t002
All three phylogenies consistently disclosed three major lineages. The first comprised spleen necrosis virus (SNV) and DIAV. Both of these viruses were isolated from ducks that were experimentally infected with Plasmodium lophurae (SNV in 1959 [28] and DIAV in 1972 [29]). The report describing the isolation of DIAV concluded that P. lophurae stocks were the source of infection, and demonstrated contamination of stocks in five different laboratories. Sequencing revealed that SNV and DIAV are highly related (∼98% nucleotide identity), despite being isolated 13 years apart, establishing that contaminated stocks have been the source of multiple outbreaks of retroviral infection in P. lophurae–infected ducks, dating back as far as 1959, and likely earlier [28]–[32]. A second clade comprised the REV insertion in FWPV and exogenous REV isolates obtained independently in different countries, including the prototypic REV isolates obtained in the United States [2]. In addition, LTR phylogenies revealed this clade to include insertions present in two distinct GHV-2 strains: an attenuated lab strain (RM1 [33]), and a field strain (GX0101 [20]).
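The ∼98% figure quoted for SNV and DIAV is a pairwise nucleotide identity. As a hedged illustration of how such a value can be computed from two aligned sequences, the Python sketch below counts matching columns and skips any column in which either sequence has a gap. The function name, the gap-handling convention, and the toy sequences are assumptions for illustration; they are not the SNV or DIAV data, and this is not necessarily the procedure the authors used.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical positions between two aligned sequences.

    Columns where either sequence carries a gap are excluded from the
    comparison (one common convention; others count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical, for demonstration only).
toy_a = "ATGGCC-TTAGC"
toy_b = "ATGGCCATTGGC"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over ungapped columns")
```

Whether gapped columns are skipped or scored as mismatches changes the result, so the convention should be stated alongside any reported identity.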
Virus in this clade exhibit remarkably little genetic variation overall, despite having apparently been maintained in the avian population for at least 50 years [2]. It thus appears unlikely that the exogenous REV isolates in this clade are spreading primarily through horizontal transmission of infectious retrovirus (since this would be expected to generate greater nucleotide sequence diversity among isolates). Instead, phylogenies suggest that exogenous retroviruses are being expressed from a stable FWPV-REV vector that circulates among domestic and wild birds. This would explain why antibodies to REV are widespread in poultry (see Table S5), and why REV infections occur not only in association with contaminated vaccines, but also in wild birds and unvaccinated commercial flocks [3]. Revealingly, several reports describe FWPV or undiagnosed pox-like infections occurring in bird populations shortly prior to the occurrence of REV outbreaks [2],[34]–[38]. The third clade comprised the exogenous REV isolate HA9901, from China [39], as well as LTR sequences obtained from the JM-Hi3 strain of GHV-2, and a REV plasmid (pREVA6 [40]). This clade is robustly supported in bootstrapped phylogenies, and the presence of unique, shared indels in LTRs provides further evidence of common ancestry (Figure S2). These observations establish that HA9901 shares a common history with pREVA6, which can ultimately be traced back to the prototypic REV specimen [2],[41]. Interestingly, HA9901 has acquired numerous nonsynonymous mutations, consistent with ongoing replication as an exogenous retrovirus (Figure 4b). Paleovirological History of the REV Lineage To investigate the deeper origins of the REVs, we screened avian and mammalian genome sequence databases (Table S1) for ERV sequences closely related to REV (Table 1). 
Screening of 42 mammalian genomes identified numerous ERV loci that disclosed highly significant similarity to one or more REV coding domains, but none that matched closely to REV across the entire coding region of the genome. We found that all mammalian ERVs exhibiting a high degree of sequence similarity to REV in the gag-pol domain exhibited no such similarity in env, and vice versa. This can be assumed to reflect the recombinant genome structure of REV [21],[22], comprising a Gammaretrovirus gag-pol domain fused to an env domain that is more commonly associated with the Betaretrovirus genus (although it also occurs in some other Gammaretroviruses, also considered to be recombinants [23]). No ERV loci closely related to REV were detected in avian genomes. We did identify numerous avian ERVs that disclosed weak similarity to REV in pol (30–40% amino acid identity). However, phylogenetic analysis revealed these ERVs to be derived from ancient, highly degenerated ERV lineages that were clearly distinct from modern Gammaretroviruses (Figure 1). Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 1. Evolutionary relationships among the RT genes of exogenous Gammaretroviruses and related ERVs. Shaded boxes indicate taxa that are known to occur as exogenous retroviruses. Brackets to the right indicate major lineages (note: an integrated taxonomy of exogenous and ERVs has yet to be established by the International Committee on Taxonomy of Viruses, and the groupings shown here are propositional). Associations of retrovirus groups and individual retroviral taxa with avian and mammalian hosts are indicated, as shown in the key. The phylogeny shown was constructed using NJ and a multiple sequence alignment spanning 140 amino acid residues in the reverse transcriptase protein (RT), and is midpoint rooted for display purposes. 
To obtain putative protein sequences for ERVs, frameshifting indels were inferred and removed, and the resulting nucleotide sequence was conceptually translated. Asterisks indicate clades with bootstrap support >90% in both NJ and maximum likelihood (ML) trees, based on 1,000 bootstrap replicates. The scale bar indicates evolutionary distance in substitutions per site. Table S2 provides details of all the ERVs and exogenous retrovirus taxa shown in the phylogeny. https://doi.org/10.1371/journal.pbio.1001642.g001 Download: PPT PowerPoint slide PNG larger image TIFF original image Table 1. Distribution of REV-related sequences in vertebrate and virus genomes. https://doi.org/10.1371/journal.pbio.1001642.t001 In phylogenies based on reverse transcriptase (RT), avian REV isolates cluster tightly with a previously described ERV sequence derived from the short-beaked echidna genome [9]. During a polymerase chain reaction (PCR)–based investigation of ERV diversity in Malagasy mammals, we serendipitously identified additional ERV RT sequences that grouped within this clade, in the genomes of two Malagasy carnivore species: the ring-tailed mongoose (Galidia elegans) and the narrow-striped mongoose (Mungotictis decemlineata). We recovered near complete proviral genome sequences for all three REV-related ERVs (hereafter referred to as echidna-ERV, Galidia-ERV, and Mungotictis-ERV) (Figure 2a), revealing that they exhibit similarity to REV throughout the entire internal coding region of the genome. Crucially, echidna-ERV, Galidia-ERV, and Mungotictis-ERV grouped robustly with REV isolates in phylogenies constructed using both the pol and env coding domains (Figure 3a and 3b), establishing that they share a common, recombinant ancestor with these viruses. Thus, ERVs belonging to the REV-lineage do occur in the genomic fossil record of mammals, but as with certain other retrovirus groups, such as foamy viruses and lentiviruses, they are relatively rare [24],[25]. 
Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 2. Genomic and paleovirological characteristics of REV-related retroviruses. The schematic in panel (a) shows the genome structure of REV and SNV, a near full-length REV insertion in the FWPV genome, and the mammalian ERVs Echidna-ERV and Galidia-ERV. Percentage sequence identity to SNV, at the amino acid level, is shown for the putative Gag, Pol, and Env polyproteins of Echidna-ERV and Galidia-ERV. Proviral coding regions that disclose homology to Gammaretroviruses are shown in green, whereas those that disclose homology to Betaretroviruses are shown in blue. ORFs flanking the REV insertion in FWPV are in yellow. Panel (b) summarizes the genomic data used to estimate the minimum age of REV-related ERV insertions in Malagasy carnivore genomes. A time-scaled Carnivora phylogeny (based on Nyakatura et al. [27]) is shown on the left, with Malagasy carnivores shaded. A corresponding schematic on the right shows the genomic locus at which an orthologous ERV insertion was identified in a subset of Malagasy carnivores. Boxes represent the env gene (blue) and 3′ LTR sequences (green = U3; dark grey = R; light grey = U5). The adjacent black line represents flanking genomic DNA, spanning 238 nucleotides, obtained from the striped mongoose (Mungotictis decemlineata) and ring-tailed mongoose (Galidia elegans) genomes in our study, and aligned to a homologous genomic region (lacking a proviral insertion) in the cat (Felis catus), dog (Canis familiaris), and ferret (Mustela furo) genomes. An orthologous ERV insertion was detected in M. decemlineata and G. elegans genomes, but not in the more distantly related Fossa (Cryptoprocta ferox), indicating that germline invasion occurred between 18 and 8 Ma. Genetic data indicate that all Malagasy carnivores are derived from a single founder population that colonized Madagascar ∼19 Ma [26]; thus, invasion of the Malagasy carnivore germline occurred in Madagascar. 
The nucleotide sequence alignment on which the schematic in panel (b) is based on is shown in Figure S1. Abbreviations: RV, retrovirus; Kb, Kilobases; ORF, open reading frame; PBS, primer binding site; Pro, proline; Thr, threonine; LTR, long terminal repeat; U3, unique three prime region; R, repeat region; U5, Unique five prime region; RT, reverse transcriptase; SU, surface protein; TM, transmembrane protein; M.dec, Mungotictis decemlineata; G.ele, Galidia elegans. https://doi.org/10.1371/journal.pbio.1001642.g002 Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 3. Contrasting phylogenetic relationships of pol and env genes found in REV-related retroviruses. Panels (a) and (b) show ML phylogenies constructed from alignments of Gamma- and Betaretrovirus protein sequences. The phylogeny in panel (a) was constructed from an alignment spanning 157 residues of the RT protein encoded by pol, whereas the phylogeny in panel (b) was constructed from an alignment spanning 153 residues of the TM domain in the polypeptide encoded by env. Asterisks on internal nodes indicate ML bootstrap support >95%, (based on 1,000 bootstrap replicates). Asterisks beside taxa names indicate ERV families identified in this study. Open triangles indicate ERV lineages for which env genes were not identified. Scale bars indicate evolutionary distance in substitutions per site. Brackets to the right indicate genus designations, and viruses previously identified as Gamma- and Betaretrovirus (γ-β) recombinants. Table S2 provides details of all the ERVs and exogenous retrovirus taxa shown in the phylogeny. Abbreviations: RV, retrovirus; MoMLV, Moloney murine leukemia virus; FeLV, feline leukemia virus; GaLV, gibbon ape leukemia virus; KoRV, koala retrovirus; BAEV, baboon endogenous virus; SMRV, squirrel monkey retrovirus; TvERV, Trichosurus vulpecula endogenous retrovirus; JSRV, Jaagsiekte sheep retrovirus; SRV, simian retrovirus; MMTV, mouse mammary tumor virus. 
https://doi.org/10.1371/journal.pbio.1001642.g003 PCR results suggested that all three ERVs are low copy number (1–2 proviruses) in their host species. Along with other factors, such as the relatively short length of LTRs, this precluded the confident use of molecular clock-based approaches to date the echidna-ERV, Galidia-ERV, and Mungotictis-ERV insertions. Notably, however, internal coding regions in all three ERVs were relatively intact (although echidna-ERV has a large deletion in region of the env gene encoding the surface (SU) glycoprotein (Figure 2a)). Using a ligation-mediated PCR method, we recovered matching flanking insertion sites for Galidia-ERV and Mungotictis-ERV, confirming that REV-like viruses occur as orthologous insertions in distinct Malagasy mongoose species. This finding indicates that REV-like viruses entered the germline Malagasy mammals prior to the divergence of Galidia and Mungotictis ∼8 million years ago (Ma) [26],[27] (Figure 2b). REV-related ERVs were not detected in the more distantly related fossa (Cryptoprocta ferox). Together these findings establish that the entire REV lineage—including both mammalian and avian isolates—derives from a common founder that was generated by recombination, and circulated among mammals during the Miocene Epoch (∼23–5 Ma). Origin of the Avian REVs Given that the REV lineage clearly originates in mammals, we decided to investigate the origins of the avian REVs in greater detail. We reviewed all published reports of REV outbreaks, and sought to obtain any available archived samples (Table 2). Among REV isolates that had not previously been sequenced, we were only able to obtain one (duck infectious anemia virus (DIAV)), for which all samples had not been exhausted or destroyed. We sequenced the complete DIAV genome and constructed ML phylogenies using all available REV sequence data. 
Phylogenies were constructed using alignment partitions representing (i) a conserved region of the pol coding domain (Figure 4a), (ii) the complete internal coding region of the viral genome (Figure 4b), and (iii) LTR sequences (Figure 5). Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 4. Phylogenetic relationships of REV coding regions. ML phylogenies constructed using (a) an alignment spanning residues 183–481 of the Pol polyprotein (DIAV coordinates) and containing REV and mammalian gammaretroviruses sequences and (b) a nucleotide alignment of the entire internal coding region of full-length avian isolates. The tree in panel (b) indicates the number of strain-specific, nonsynonymous mutations estimated to have occurred in the nucleocapsid (NC), capsid (CA), matrix (MA), protease (PR), RT, RNase-H (RH), integrase (IN), surface (SU), and TM genes of the exogenous isolate HA9901. Asterisks on internal nodes indicate ML bootstrap support >95%. All trees are midpoint rooted for display purposes. Scale bars indicate evolutionary distance in substitutions per site. Taxa labels include sequence accession numbers, and in panel (b) two-letter ISO country codes enclosed by brackets indicating the country of sampling. Further details of REV sequences included in these trees can be found in Table S4. https://doi.org/10.1371/journal.pbio.1001642.g004 Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 5. Phylogenetic relationships of REV LTR sequences. ML phylogenies constructed using an alignment of REV LTR sequences. Asterisks on internal nodes indicate ML bootstrap support >95%. The phylogeny is midpoint rooted for display purposes. Scale bars indicate evolutionary distance in substitutions per site. Taxa labels include two-letter ISO country codes indicating the country of sampling.. Taxa labels include accession numbers and two-letter ISO country codes enclosed by brackets indicating country of sampling. 
Where appropriate, FWPV and GHV-2 strain designations are shown in bold. Further details of REV sequences used in the tree, and an alignment figure highlighting lineage-specific LTR indels, can be found in Table S4 and Figure S2, respectively. https://doi.org/10.1371/journal.pbio.1001642.g005 Download: PPT PowerPoint slide PNG larger image TIFF original image Table 2. REV publication timeline. https://doi.org/10.1371/journal.pbio.1001642.t002 All three phylogenies consistently disclosed three major lineages. The first was comprised of spleen necrosis virus (SNV) and DIAV. Both these viruses were isolated from ducks that were experimentally infected with Plasmodium lophurae (SNV in 1959 [28] and DIAV in 1972 [29]). The report describing the isolation of DIAV concluded that P. lophurae stocks were the source of infection, and demonstrated contamination of stocks in five different laboratories. Sequencing revealed that SNV and DIAV are highly related (∼98% nucleotide identity), despite being isolated 13 years apart, establishing that contaminated stocks have been the source of multiple outbreaks of retroviral infection in P. lophurae–infected ducks, dating back as far as 1959, and likely earlier [28]–[32]. A second clade comprised the REV insertion in FWPV and exogenous REV isolates obtained independently in different countries, including the prototypic REV isolates isolated in the United States [2]. In addition, LTR phylogenies revealed this clade to include insertions present in two distinct GHV-2 strains: an attenuated lab strain (RM1 [33]), and a field strain (GX0101 [20]). Virus in this clade exhibit remarkably little genetic variation overall, despite having apparently been maintained in the avian population for at least 50 years [2]. 
It thus appears unlikely that the exogenous REV isolates in this clade are spreading primarily through horizontal transmission of infectious retrovirus (since this would be expected to generate greater nucleotide sequence diversity among isolates). Instead, phylogenies suggest that exogenous retroviruses are being expressed from a stable FWPV-REV vector that circulates among domestic and wild birds. This would explain why antibodies to REV are widespread in poultry (see Table S5), and why REV infections occur not only in association with contaminated vaccines, but also in wild birds and unvaccinated commercial flocks [3]. Revealingly, several reports describe FWPV or undiagnosed pox-like infections occurring in bird populations shortly prior to the occurrence of REV outbreaks [2],[34]–[38]. The third clade comprised the exogenous REV isolate HA9901, from China [39], as well as LTR sequences obtained from the JM-Hi3 strain of GHV-2, and a REV plasmid (pREVA6 [40]). This clade is robustly supported in bootstrapped phylogenies, and the presence of unique, shared indels in LTRs provides further evidence of common ancestry (Figure S2). These observations establish that HA9901 shares a common history with pREVA6, which can ultimately be traced back to the prototypic REV specimen [2],[41]. Interestingly, HA9901 has acquired numerous nonsynonymous mutations, consistent with ongoing replication as an exogenous retrovirus (Figure 4b).

Discussion The data presented in this study unequivocally demonstrate that REVs derive from a retrovirus that circulated in ancestral mammals, and originated through recombination more than 8 Ma. Furthermore, the extremely low genetic diversity observed among all avian REV isolates and sequences indicates a very recent origin for REV in birds (Figures 4 and 5). In previous studies it has generally been assumed that the REVs are a group of bona fide avian retroviruses that circulate in wild bird populations.
However, phylogenetic evidence indicates that successful transmission of retroviruses, poxviruses, and herpesviruses across host classes is extremely rare, if indeed it occurs at all [9],[42],[43]. While such “long-distance” transmission events, leading to productive virus replication in the new host, likely do occur at an appreciable frequency for these viruses (particularly, for example, when the recipient host is immunocompromised), unless the transmitted virus is able to spread efficiently from host to host in the new species, these instances will typically represent evolutionary dead-ends [44]. Since REVs clearly originate in mammals, and all avian REVs are highly related, the entire avian REV lineage almost certainly derives from a single founder. Phylogenies rooted on mammalian REVs unambiguously place the SNV/DIAV lineage in a basal position relative to the FWPV-REV and HA9901 clades (Figure 4a). This is most readily reconciled with a scenario wherein REVs originated in P. lophurae experiments, and subsequently inserted into the FWPV and GHV-2 genomes (Figure 6). Importantly, this hypothesis of REV origin and evolution is not only consistent with the REV phylogeny, but also with the entire recorded history of REV-associated disease (Figure 7, Table 2), accounting for the disappearance of the SNV/DIAV lineage since the 1980s (when P. lophurae stocks were exhausted; see below), and the limited genetic diversity observed among all avian REVs (since relatively few virus replication cycles would be expected to separate all isolates). Moreover, this scenario accounts for the anomaly of retroviral interclass transmission, because it occurs in an experimental context wherein a pathogen (P. lophurae) is being deliberately adapted to a foreign host species.

Figure 6. Interclass transmission and the origin of REV.
A schematic showing the three possible scenarios via which the ancestor of REV could have crossed from mammals (class Mammalia) into birds (class Aves), assuming a maximum of one inter-class transmission (ICT) event in total. For each of the three scenarios shown, the phylogenetic relationships between REV isolates that would be expected to arise as a result are indicated (all phylogenies are rooted on the mammalian ancestor of avian REVs). A REV founder strain could conceivably have been transmitted from mammals to birds after first inserting into the genome of FWPV (panel a) or GHV-2 (panel b). However, only a scenario in which the SNV and DIAV lineage was established first (panel c), as would be expected to occur if P. lophurae contamination enabled the iatrogenic emergence of virus, is compatible with the relationships observed in rooted phylogenetic trees (see Figure 4a). Abbreviations: FWPV, fowlpox virus; GHV-2, gallid herpesvirus 2. https://doi.org/10.1371/journal.pbio.1001642.g006

Figure 7. A hypothesis of REV origin and evolution. A schematic representation of REV evolutionary history is shown, summarizing our hypothesis regarding the origin and evolution of the three major avian REV lineages (SNV/DIAV, REV/FWPV-REV, and HA9901) from a mammalian retrovirus ancestor that originated in the Cenozoic Era. REV-associated events (i.e., outbreaks of REV-associated disease, isolation of new REV strains, or identification of REV-containing DNA virus strains) reported in the literature have been mapped onto this schematic, as indicated in the key. Numbers shown above key symbols refer to Table 2, where details of the associated publication or report can be found. The broken scale bar shows time in years A.D. to the right of the break and Ma to the left of the break.
A shaded background region indicates the time window for invasion of the FWPV genome following iatrogenic introduction into poultry (assuming that reports of REV sequences in FWPV vaccine strains lyophilized in 1949 [52] are accurate). Abbreviations: REV, reticuloendotheliosis virus; SNV, spleen necrosis virus; DIAV, duck infectious anemia virus; FWPV, fowlpox virus; GHV-2, gallid herpesvirus 2; FWPV-REV, fowlpox virus with REV insertion. https://doi.org/10.1371/journal.pbio.1001642.g007

P. lophurae has only been isolated once, in June 1937, in the New York Zoological Park (now Bronx Zoo), by Lowell T. Coggeshall. Coggeshall, who was then working for the Rockefeller Foundation, was searching for a parasite that could serve as an experimental model system for malaria research. In 1935, Émile Brumpt of the Pasteur Institute had identified Plasmodium gallinaceum, a parasite causing malarial disease in poultry, during an excursion to Ceylon (now Sri Lanka) [45]. However, P. gallinaceum could not be introduced to the United States due to strict quarantine regulations against importation of poultry pathogens [46]. Reasoning that other avian species from the same geographic region might harbor a similar parasite, Coggeshall screened some Southeast Asian bird species that had been introduced to the New York Zoological Park in the 1920s by ornithologist Lee Saunders Crandall [47]. This led to the identification of a plasmodium in the blood of a Borneo firebacked pheasant (Lophura igniti igniti), which proved transmissible to very young chickens [48]. Stocks of this parasite, designated Plasmodium lophurae, were maintained by serial passage in chicken, duck, and turkey chicks, with 25 passages reported as of 1938 [48]. Published reports suggest that contaminating virus was present from an early stage; a 1941 study of P. lophurae noted that anemia in infected animals appeared to be decoupled from parasite replication, indicating the presence of a second infectious agent [49].
A study a few years later confirmed the presence of an additional “filterable agent” (the cause of a lethal anemia) in P. lophurae–infected poultry [30], and in 1959 William Trager identified this agent as SNV [28]. Subsequently, SNV-like viruses were isolated from P. lophurae–infected ducks on multiple distinct occasions (Table 2) [1],[31]. The role of P. lophurae stocks as a source of infection appears to have gone unappreciated prior to the isolation of DIAV in 1972 [29]. But while the associated study concluded that “DIAV has been an unrecognized companion of P. lophurae for many years,” the assumption remained that the contaminating virus was a natural pathogen of ducks. Research on P. lophurae effectively ceased in the 1980s, when stocks could no longer be replenished. The organism has never subsequently been identified, and thus remains an enigma in many respects. Expeditions to Borneo have been mounted with the express purpose of obtaining further isolates, but these failed to identify the parasite in populations of wild birds [46]. Since P. lophurae stocks ran out, no further viruses belonging to the SNV/DIAV lineage have been isolated, consistent with the hypothesis that contaminated stocks were the principal reservoir of infection for these viruses. It remains unclear whether the progenitors of avian REVs were present in the animal from which P. lophurae was originally obtained or were introduced from an external source during serial passage. However, since none of the mammalian species that might be considered likely sources of contamination in a lab environment (i.e., mouse, rat, rabbit, guinea pig) appear to harbor truly REV-like viruses in exogenous or endogenous forms, whereas more exotic mammalian species do, cross-species transmission or contamination within the setting of the zoological park is an attractive hypothesis.
Notably, we have identified REV-related ERVs in mammalian groups (Malagasy carnivores and Australian monotremes) that inhabit highly distinct and relatively isolated biogeographic regions, separated from one another by large expanses of ocean. This suggests that infection has been widespread in the past and that chiropteran (bat) vectors were likely at least partly involved in the spread of virus. It also remains unclear precisely when and how the REV insertions in the FWPV and GHV-2 genomes were generated. REV could presumably have spread from birds experimentally infected with P. lophurae and into the wider environment either before or after inserting into a DNA virus vector. Notably, research on malaria was prioritized in the United States during World War II, and P. lophurae stocks were distributed to laboratories throughout the country (see Table S6) for experimental vaccine and drug research. During this period the poultry industry was scaling rapidly, and the first avian virus vaccines were being commercially developed (including live FWPV vaccines, based on attenuated virus strains grown in embryonated eggs [50],[51]). REV sequences have been reported in FWPV vaccines lyophilized in 1949 [52], suggesting that insertion had already occurred by this time. Unfortunately, however, this inference is subject to some uncertainty, since it is based solely on PCR from a single archived sample, and no lyophilized material remains for study (Table 2). The creation of Marek's disease vaccines became a priority in the United States during the 1950s, in response to devastating outbreaks of an apparently new, acute form of the disease [41]. However, effective vaccines were not produced until after the first avian cell culture systems were established in the 1960s. These in vitro systems were key to the eventual development of vaccines based on (i) attenuated GHV-2 strains and (ii) the closely related herpesvirus of turkeys (HVT).
Both of these vaccines were later discovered to be contaminated with REV. In previous studies it has generally been assumed that REV insertions into the GHV-2 genome originated in the distant evolutionary past [3],[11],[12] (although it is recognized that at least some were generated recently during in vitro attenuation [53],[54]). By contrast, our data suggest that all REV insertions into GHV-2 have been generated recently. In the 1960s and 1970s REV provided an experimental model for retrovirologists [55], and was sometimes used to transform avian cells [56]. Thus it is likely that the emergence of avian cell culture systems was accompanied by the spread of REV as a contaminant. Interestingly, dissemination of REV genetic material appears to be ongoing; REV is apparently being maintained as an insertion in naturally circulating FWPV-REV, and field strains of GHV-2 containing novel REV LTR insertions have recently been reported [19],[20]. Furthermore, we show that the recently described exogenous REV isolate HA9901 [39] shares a common history with REV plasmid pREVA6, which was in turn derived from the original tissue sample from which prototypic REV strains were isolated [40] (Figure 5, Figure S2). Thus it appears that in China, REV-contaminated materials may have given rise to independently circulating infectious retrovirus. The processes driving REV dissemination warrant further exploration, as does the potential role of co-opted REV sequences in altering the in vivo properties of FWPV and/or GHV-2.

In conclusion, historical, phylogenetic, and paleovirological evidence supports a scenario wherein REVs originated as mammalian retroviruses that were iatrogenically introduced into avian hosts, and subsequently integrated into the FWPV and GHV-2 genomes, generating recombinant DNA viruses that now circulate in wild birds and poultry.
These data provide the first evidence that horizontal gene transfer between virus families can expand the impact of iatrogenic transmission events, raising questions about the potential, unintended impacts of live, recombinant vector vaccines. Broader surveillance of viral genetic diversity should be prioritized, so that the unintended consequences of experimental procedures on viral ecology and evolution can be better assessed and limited.

Materials and Methods

Screening in Silico Perl scripts and the BLAST+ program suite were used to perform in silico screening of sequence databanks for sequences homologous to REV. We screened complete and low coverage whole genome sequence data representing 10 avian and 42 mammalian species (Table S1) and all poxvirus and herpesvirus-derived sequence data available in GenBank as of July 1, 2012. The noncoding nucleotide sequences (LTR and leader) and translated open reading frames (ORFs) (Gag, Pol, Env) of REV (FJ439119.1) were used as “probes” for in silico screening. Sequences that matched probes with high statistical significance (i.e., expect (e) values below 0.001) were extracted and compared to a library of reference retroviral genomes (see Table S2), again using BLAST. The results of this “reciprocal” BLAST were examined, and the phylogenetic relationships of ERV loci that disclosed higher similarity to REV than to any other retroviral reference were investigated using the neighbor joining (NJ) algorithm implemented in PAUP [57]. NJ trees revealed that among all the ERV loci identified by screening, ERVs in the European hedgehog (Erinaceus europaeus) and cape hyrax (Procavia capensis) genomes were most closely related to REV in the gag-pol and env genes, respectively (unpublished data). The median reciprocal BLAST bit score for these two subsets of ERVs was used to establish a threshold bit score for discriminating REV-related coding sequences from those of other ERVs.
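The screening logic described above (an expect-value filter, a reciprocal comparison against a reference library, and a median-based bit score threshold) can be illustrated with a minimal Python sketch. It is not the authors' Perl pipeline: the record layout, the function names, the locus and reference labels, and all numeric values are hypothetical, and treating 0.001 as an upper bound on the expect value is an assumption.

```python
from statistics import median

# Assumed cutoff: the text gives 0.001; treating it as an upper bound is an assumption.
E_VALUE_CUTOFF = 1e-3


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep probe hits whose expect value falls below the cutoff."""
    return [h for h in hits if h["evalue"] < cutoff]


def best_reference(reciprocal_hits):
    """Name of the reference genome with the highest bit score for one locus."""
    return max(reciprocal_hits, key=lambda h: h["bitscore"])["reference"]


def rev_like_loci(reciprocal, target="REV"):
    """Loci whose best reciprocal match is the target reference."""
    return [locus for locus, hits in reciprocal.items()
            if hits and best_reference(hits) == target]


def bitscore_threshold(scores):
    """Median bit score of a calibration subset, used as the threshold."""
    return median(scores)


# Hypothetical records (illustrative values only, not data from the study).
probe_hits = [
    {"locus": "locusA", "evalue": 1e-20},
    {"locus": "locusB", "evalue": 0.5},
]
reciprocal = {
    "locusA": [{"reference": "REV", "bitscore": 210.0},
               {"reference": "MLV", "bitscore": 150.0}],
    "locusC": [{"reference": "MLV", "bitscore": 300.0},
               {"reference": "REV", "bitscore": 120.0}],
}

kept = significant_hits(probe_hits)
rev_like = rev_like_loci(reciprocal)
threshold = bitscore_threshold([180.0, 210.0, 150.0])
```

In this sketch a locus counts as REV-like only when REV outscores every other reference in the reciprocal search, which mirrors the "higher similarity to REV than to any other retroviral reference" criterion; the median is taken over whichever calibration subset is chosen (in the text, the hedgehog and hyrax ERVs).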
Tissue Samples, Virus, and Cell Culture Frozen tissue samples from Malagasy carnivores (Cryptoprocta ferox, Galidia elegans, Mungotictis decemlineata) were obtained from the American Museum of Natural History's cryogenic collection. Frozen spleen tissue samples were obtained from a deceased echidna (Tachyglossus aculeatus) at the Philadelphia Zoo. Chicken embryonic fibroblasts, SL-29 cells (ATCC#: CRL1590), were maintained in DMEM medium (Life Technologies) supplemented with 5% fetal bovine serum, 5% tryptose phosphate broth, penicillin (100 U/ml), and streptomycin (100 mg/ml). An aliquot of 400 µl of DIAV (ATCC #: VR775) was inoculated onto 30% confluent SL-29 cells in six-well plates. Media was changed after 2 days, and cells were allowed to grow for a total of 5 days. After 5 days, cells were harvested and genomic DNA was extracted.

PCR and Sequencing Genomic DNA was extracted from tissue samples and SL-29 cells using the AllPrep dual DNA/RNA extraction kit (QIAGEN). Initial PCR amplification of endogenous retroviral fragments was performed using PCR primers (Integrated DNA Technologies) directed against two highly conserved motifs in retroviral protease (PR) and RT proteins. After initial sequencing of this genomic region, a combination of gene-specific primers and degenerate primers was used to amplify the remaining regions of the REV genomes found in Galidia elegans, Mungotictis decemlineata, and Tachyglossus aculeatus. LTR regions and genomic insertion sites were amplified and cloned by ligation-mediated PCR, using the GenomeWalker Universal kit (Clontech). For complete genome sequencing of DIAV, primers were based on equivalent targets in REV and SNV and were used to amplify multiple overlapping regions of the DIAV genome. A list of primer sequences, the genomes on which they were used, and their coordinates (based on alignment to the DIAV genome) can be found in Table S3.
Basic PCR conditions were used for almost all reactions (denaturation at 95°C for 2 min, followed by 30 cycles of 94°C for 15 s, 55°C for 30 s, and 68°C for 1 min, final elongation for 10 min), although annealing temperatures and elongation times varied depending on the primers used (details available on request). For all reactions, gel-resolved amplicons were excised from 1% agarose gels and purified using the Qiaquick kit (QIAGEN) before TA cloning into pCR2.1 (Life Technologies, La Jolla, CA) and sequencing. All sequence analysis was performed by the GeneWiz commercial sequencing facility (GeneWiz, South Plainfield, NJ). Sequences obtained in this study have been submitted to GenBank under the following accession numbers: DIAV (KF313137), Echidna-ERV (KF313136), and Galidia-ERV (KF313135).

Sequence Data and Phylogenetic Analyses Retroviral “pan-genus” phylogenies were constructed from an alignment of the highly conserved RT and transmembrane (TM) peptides. Sequences derived from the retroviral reference library (Table S2) were included, as well as a selection of the best matching, uncharacterized ERVs from in silico and PCR screening. For both genes, ProtTest was used to select the best fitting amino acid substitution matrix from a range of 96 different combinations of models and rate heterogeneity parameters, based on the Akaike information criterion (AIC) [58]. The best fitting model for RT was rtREV [59], with gamma distributed rate heterogeneity (rtREV+Γ); for TM it was HIVw [60]. Phylogenetic investigation of within-REV variation was conducted using both peptide and nucleotide sequence data. We obtained all published REV sequence data from GenBank. Sequences shorter than 100 bp were excluded. The location and year of sampling, and host species associations, were extracted from the GenBank file or from an associated publication (Table S4). All sequences were profile aligned to a full genome reference (SNV; DQ003951.1).
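The AIC-based model selection mentioned above for ProtTest can be illustrated with a short Python sketch. It uses the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood, and picks the candidate with the lowest score. The function names, model labels, log-likelihoods, and parameter counts below are hypothetical placeholders, not values from the study, and the sketch does not reproduce ProtTest itself.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the model name with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
candidate_fits = {
    "JTT": (-10250.0, 40),
    "JTT+I": (-10200.0, 41),
    "JTT+G": (-10205.0, 41),
}

scores = {name: aic(*fit) for name, fit in candidate_fits.items()}
selected = best_model(candidate_fits)
```

With these placeholder numbers the extra parameter in "JTT+I" costs 2 AIC units but improves the likelihood term by 100, so it is preferred over plain "JTT"; the same trade-off between fit and parameter count drives selection across all candidate combinations.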
So that the phylogenetic relationships of all available sequences could be investigated, phylogenies were constructed for a range of alignment partitions: (i) complete genome, (ii) LTRs, (iii) gag, (iv) pol, and (v) env. Each partition was examined for evidence of recombination using GARD [61] and SplitsTree [62]. One full genome sequence (GQ375848.1) appeared to be recombinant and was subsequently removed from our dataset. We used ProtTest and ModelTest to select the best fitting amino acid and/or nucleotide substitution matrices for each alignment partition. The best fitting model for all nucleotide alignments was the general time reversible model [63] with a proportion of invariable sites and a gamma-shaped rate variation across sites (GTR+I+G). The best fitting models for amino acid alignments were: Gag, JTT+I; Pol, FLU+G; Env, JTT. The ML phylogeny was constructed using RAxML [64], with 1,000 nonparametric bootstrap replicates. A REV ancestral sequence was reconstructed using PAML [65].

Literature Review To systematically review REV-related literature, electronic searches of PubMed/Medline, JSTOR, Mendeley, Scopus, Web of Science, and WorldCat were conducted in July 2012. Keywords used to search databases were “Reticuloendotheliosis Virus,” “Duck Infectious Anemia Virus,” “Spleen Necrosis Virus,” and “Chick Sync[i/y]tial Virus.” We restricted our search to papers with titles and abstracts available in English. The following data were searched for in texts: year of virus isolation, virus association, origin of isolation, animal status, secondary disease association, place of isolation, and the methods of isolation or detection. A completed PRISMA checklist and flow diagram for this systematic literature review can be found in Text S1.

Supporting Information Figure S1. A nucleotide alignment of orthologous ERV insertion sites in the Galidia elegans (Galidia-ERV-1,2) and Mungotictis decemlineata (Mungotictis-ERV-1,2) genomes. The alignment illustrated spans the 3′LTR and 3′ end of env of the orthologous REV-related ERV insertion in these two species and 238 bp of flanking genomic DNA (shown in grey). Flanking DNA is shown aligned to a homologous genomic sequence identified in the cat (Felis catus), dog (Canis familiaris), panda (Ailuropoda melanoleuca), and ferret (Mustela furo) genomes. https://doi.org/10.1371/journal.pbio.1001642.s001 (PDF) Figure S2.
An alignment of REV LTR sequences, showing the presence of unique shared indels (insertions and deletions) that support the monophyletic relationship of the three sequences highlighted in gray, which include (i) the HA9901 strain of REV, (ii) REV plasmid (pREVA6), and (iii) a REV LTR insertion present in the JM-Hi3 strain of GHV-2. Shared indels are indicated by boxes. https://doi.org/10.1371/journal.pbio.1001642.s002 (PDF) Table S1. Avian and mammalian whole genome sequences screened for REV-related ERVs. https://doi.org/10.1371/journal.pbio.1001642.s003 (DOCX) Table S2. Retroviral reference sequences used in the study. Annotated reference sequences representing newly described ERVs have been made available online (http://saturn.adarc.org/paleo/). https://doi.org/10.1371/journal.pbio.1001642.s004 (DOCX) Table S3. Primer Coordinates. aThe coordinates of primers are shown, based on alignment to DIAV reference sequence (Accession Number KF313137). Abbreviations: F, forward; R, reverse; REV, reticuloendotheliosis virus; LTR, long terminal repeat. https://doi.org/10.1371/journal.pbio.1001642.s005 (DOCX) Table S4. Abbreviations: REV, reticuloendotheliosis virus; FWPV-REV, fowlpox virus with REV insertion; GHV-2-REV, gallid herpesvirus type 2 with REV insertion; LTR, long terminal repeat. https://doi.org/10.1371/journal.pbio.1001642.s006 (DOCX) Table S5. aState or prefecture/two-letter ISO country code. https://doi.org/10.1371/journal.pbio.1001642.s007 (DOC) Table S6. Asterisks indicate studies sponsored by the US Office of Scientific Research and Development (OSRD). https://doi.org/10.1371/journal.pbio.1001642.s008 (DOC) Text S1. PRISMA checklist and flow diagram. https://doi.org/10.1371/journal.pbio.1001642.s009 (PDF) Acknowledgments We thank the American Museum of Natural History and Philadelphia Zoo for providing tissue samples, and Eric Delwart, Karel A. Schat, Irwin W. Sherman, Michael Skinner, Greg Towers, Michael Tristem, and Robin Weiss for helpful discussions.
Par-1 Regulates Tissue Growth by Influencing Hippo Phosphorylation Status and Hippo-Salvador Association doi: 10.1371/journal.pbio.1001620 pmid: 23940457
Introduction The control of organ size, which requires the delicate coordination of cell growth, cell proliferation, and cell death, is a fascinating biological process. The identification of the Hippo (Hpo) signaling pathway has shed some light on this biological phenomenon. The Hpo pathway has emerged as an evolutionarily conserved pathway that controls organ size during animal development. It regulates tissue growth by balancing cell proliferation and apoptosis and has also been implicated in stem cell maintenance, tissue homeostasis, and repair [1]–[4]. In addition, it has been reported to play a role in cell contact-dependent growth inhibition [5]. Accumulating evidence has suggested that mutations and malfunctions of the components of the Hpo pathway result in a wide range of human cancers and diseases [6]. The Hpo pathway can be divided into three parts: upstream regulatory inputs, core kinase cassette, and downstream transcriptional output [2]. Core to the Hpo pathway is the kinase cascade, which acts sequentially to inhibit the nuclear translocation and activity of the growth-promoting transcriptional coactivator Yorkie (Yki) or Yap/TAZ (mammalian homologue of Drosophila Yki) [7]. The core kinase cascade of the Hpo pathway consists of four tumor suppressor proteins, including two kinases, the serine/threonine Ste20-like kinase Hpo or its mammalian homologues MST1/2 [8]–[12], and the nuclear Dbf-2-related (NDR) family kinase Warts (Wts) or its mammalian homologues LATS1/2 [13],[14], and the scaffold proteins of the kinases, Salvador (Sav) [15],[16] and Mob as tumor suppressors (Mats) [17]. Hpo phosphorylates and activates Wts via the formation of a complex with Sav [15],[18],[19]. Wts functions in a complex with Mats to restrict the nuclear translocation of Yki by phosphorylating Yki at multiple sites [7],[17],[19],[20]. 
In the absence of suppression from the Hpo pathway, Yki associates with transcription factors, primarily Scalloped (Sd) [20]–[22] and other factors, including Homothorax [23], Teashirt, and Mad [24], in the nucleus to promote proliferation and to inhibit apoptosis by inducing the expression of target genes, such as bantam, cyclinE, and diap1 [25]. In contrast to the clear linear relationship between the core kinase cassette and the downstream transcriptional output, this pathway is regulated by multiple upstream regulatory branches, such as the Merlin-Expanded(Ex)-Kibra complex [26]–[29], Fat (ft) and Dachsous [30]–[33], Crumbs and the Lgl-Scrib-Dlg complex [34],[35], and the most recently identified, Echinoid and Tao-1 [36]–[38]. Several recent proteome-wide phosphorylation studies, which have uncovered a large number of previously unknown phosphorylation events in Hpo signaling [39],[40], have indicated the involvement of many as-yet-unidentified participants in the Hpo pathway. To identify novel pathway modulators, we performed a gain-of-function EP screen and identified Par-1 as a novel Hpo pathway regulator. Par-1 is a multifunctional serine/threonine kinase containing an N-terminal conserved catalytic domain, a ubiquitin-associated (UBA) domain adjacent to the catalytic domain, and a kinase-associated domain 1 (KA1 domain) within the last 40 amino acids [41]. Par-1 plays a major role in anterior/posterior (A/P) axis formation and germline determinant polarization, and it also regulates diverse cellular processes, including microtubule dynamics and neuronal polarity [42]–[47]. Although Par-1 is involved in multiple cellular processes, little is known regarding the function of Par-1 in disease and tumor formation. 
Hyper-phosphorylation of the microtubule-associated protein Tau by microtubule affinity regulating kinase (MARK), the homolog of Drosophila Par-1 [48], which is activated by upstream kinases, such as LKB1 [49] and Tao-1 [50], results in microtubule depolymerization and abnormal aggregation of Tau in Alzheimer disease. Additionally, MARK4 has been reported to be involved in hepatocellular carcinogenesis and gliomagenesis [51],[52]. In this study, we identified Par-1 as a negative regulator of the Hpo kinase complex. We found that overexpression of Par-1 drove tissue overgrowth and upregulated the expression of Hpo pathway-responsive genes. Moreover, knockdown of Par-1 blocked tissue growth and downregulated the expression of Hpo pathway-responsive genes. We demonstrated that par-1 functioned downstream of ex and ft but upstream of hpo and sav. We also provided evidence that Par-1 associated with the Hpo-Sav complex and regulated the phosphorylation of Hpo at Ser30 to regulate Hpo activity. Furthermore, we found that Par-1 promoted the dissociation of Sav from the Hpo-Sav complex, eventually resulting in Sav dephosphorylation and destabilization. Thus, these results identified Par-1 as a novel regulator of the Hpo signaling pathway and supported a model by which Par-1 regulates Hpo phosphorylation and Hpo-Sav association to control organ growth. Results Par-1 EP Lines Promote Growth via Hpo Signaling To identify novel candidates of the Hpo pathway, we performed an overexpression screen in which flies carrying GMR-Gal4 and UAS-Yki (referred to as GMR-Yki) were crossed with a collection of EP lines. Overexpression of UAS-Yki posterior to the morphogenetic furrow (MF) under the control of the GMR-Gal4 driver (GMR-Yki) resulted in enlarged eyes (compare Figure 1A′ with 1A), providing a sensitive background for a genetic modifier screen [53]. Each EP line was crossed with GMR-Yki flies, and the F1 progeny was screened for an increase in eye size. 
From more than 10,000 EP lines, we identified numerous lines that enhanced the overgrowth phenotype induced by Yki overexpression. We then analyzed the UAS element insertion sites of these lines and found that the insertion sites of L[484], L[507], and F[727] were all within the 5′ UTR region of the par-1 gene (Figure 1C). Although these three EP lines did not display an overgrowth phenotype when driven by GMR-Gal4 in Drosophila eyes (compare Figure 1B with 1A, and unpublished data), these lines dramatically enhanced the GMR-Yki-induced overgrowth phenotype (compare Figure 1B′ with 1A′, and unpublished data). In addition, the expression of these lines driven by the wing-specific Gal4 driver MS1096 produced enlarged adult wings (Figure 1D–1D″), indicating that the candidate genes expressed in these lines may play a role in organ size control. To determine whether the UAS element of these lines regulated par-1 gene expression, real-time PCR analysis was performed for the L[484] line. The mRNA level of par-1 was significantly upregulated when the L[484] line was crossed with MS1096, whereas the mRNA levels of the genes neighboring par-1, mei-W68 and hpo, showed slight or no change (Figure 1E), suggesting that ectopic Par-1 expression could be responsible for the tissue overgrowth phenotype that we observed. Figure 1. Par-1 P-element insertion lines enlarge organ size and promote Hpo pathway-responsive gene expression. (A–B′) An EP line L[484] enhanced the Yki gain-of-function induced phenotype. Side views of D. melanogaster wild-type eyes (A), eyes expressing L[484] (B), eyes expressing Yki (A′), or eyes co-expressing L[484] and Yki (B′), driven by GMR-Gal4. (C) Schematic representation of the par-1 gene locus and P-element insertion sites. The insertion sites of the L[484], L[507], and F[727] EP lines are at the 5′ UTR region of the par-1 gene (marked by orange arrows). 
The par-1 locus is located between mei-W68 and hpo. (D–D″) L[484] promotes adult fly wing growth. Control adult fly wings (D) and wings expressing L[484] (D′) with the wing-specific driver MS1096. The red dashed line indicates the size of the control wing. The relative wing size was quantified using an unpaired t-test (D″). The results represent the mean ± SEM. ***p<0.001 (n>6) for each genotype. (E) Expression of L[484] significantly increased the mRNA levels of Par-1. To detect the level of Par-1 and the transcripts of its neighboring genes, a real-time PCR analysis was performed. All of the results were expressed as the mean ± SEM. *p<0.05, **p<0.01. (F–G′) L[484] promotes expanded gene expression. Wild-type D. melanogaster third-instar larval wing discs (F) or wing discs expressing L[484] (G–G′) with hh-Gal4 were immunostained to demonstrate the expression of Cubitus interruptus (Ci) (red) and Ex-LacZ (EX-Z) (green). Ci marked the anterior compartment (A-compartment). The arrows indicate the P-compartment. (H–H″) L[484] elevates diap1 gene expression. Wing discs containing flip-out clones expressing L[484] with act>CD2>Gal4 were immunostained to demonstrate the expression of CD2 (red) and diap1-lacZ (green). Cells expressing L[484] were indicated by the lack of CD2 expression (indicated by arrows). https://doi.org/10.1371/journal.pbio.1001620.g001 To determine whether overexpression of L[484] promoted tissue growth via Hpo signaling, the L[484] line was expressed under the control of the hh-Gal4 driver, which drives gene expression in the posterior compartment (P-compartment). As shown in Figure 1G–1G′, ex-lacZ (EX-Z), an enhancer trap for ex [27], was increased in the P-compartment of the wing imaginal disc, suggesting an inhibition of Hpo signaling. Furthermore, the Hpo downstream marker diap1-lacZ was also upregulated in the flip-out clones expressing L[484] (Figure 1H–1H″). 
Briefly, these observations suggested that the expression of the L[484] line promoted tissue growth via Hpo signaling by controlling the expression of Par-1. Overexpression of Par-1 Inactivates Hpo Signaling to Induce Tissue Growth in a Kinase-Dependent Manner To verify the functional relationship between Par-1 and the Hpo pathway, a dual luciferase assay, which reflected Sd-Yki transcriptional activity [20], was performed. As shown in Figure 2A, in S2 cells, coexpression of Yki and Sd activated the luciferase reporter gene (3×Sd2-Luc), which was greatly promoted by Par-1, indicating that Par-1 enhanced the activity of the Sd-Yki transcriptional complex in vitro. To further determine the functional relationship between Par-1 and the Hpo pathway in vivo, Myc tagged Par-1 transgenic flies were generated. Consistent with the results in Figure 1A–1B′, overexpression of two copies of UAS-Myc-Par-1, using the GMR-Gal4 driver (referred to as GMR/2*Myc-Par-1), resulted in rough eyes without a discernible overgrowth (compare Figure 2B′ with 2B), while coexpression of UAS-Myc-Par-1 with GMR-Yki enhanced the overgrowth phenotype caused by GMR-Yki (compare Figure 2C′ with 2C). Although GMR/2*Myc-Par-1 did not induce a discernible overgrowth phenotype in the eyes (Figure 2B′), the expression of two copies of UAS-Myc-Par-1, using the MS1096 driver (referred to as MS1096/2*Myc-Par-1), resulted in enlarged wings and caused a wing bending-down phenotype, which indicated an expansion of the wing (Figure 2D′″ and compare Figure 2D′ with 2D and 2E′ with 2E). We also found that the relative P-compartment area of the wings expressing UAS-Par-1, using the hh-Gal4 driver, was increased (Figure S1A–S1B). We then examined whether overexpression of Par-1 affected the expression of Hpo pathway-responsive genes. 
We found that flip-out clones expressing UAS-Myc-Par-1 in the wing imaginal discs showed upregulated expression of EX-Z (Figure 2F–2F′), diap1-lacZ (Figure 2H–2H′) and diap1-GFP3.5 (a diap1 enhancer element reporter [20], Figure S1C–S1C″), suggesting compromised Hpo signaling activity. Taken together, these results suggested that overexpression of Par-1 promoted tissue overgrowth by inhibiting Hpo pathway activity. Figure 2. Overexpression of Par-1 triggers tissue overgrowth and inactivates Hpo signaling in a kinase-dependent manner. (A) Par-1 enhances the transcriptional activity of the Yki-Sd complex in vitro. S2 cells were transfected with the indicated constructs and the luciferase reporter genes. 48 h after transfection, the cell lysates were harvested and subjected to a dual luciferase assay. Note that Par-1 activates the 3×Sd2-Luc reporter compared with the control. All of these data were represented as the mean ± SEM. **p<0.01, ***p<0.001. (B–C″) Par-1, but not Par-1-KD, synergizes with Yki to trigger tissue overgrowth. Side view of D. melanogaster adult eyes: wild-type (B); eyes expressing two copies of UAS-Par-1 (B′), two copies of UAS-Par-1-KD (B″) or UAS-Yki (C); or eyes co-expressing UAS-Yki and two copies of UAS-Par-1 (C′) or UAS-Yki and two copies of UAS-Par-1-KD (C″), driven by GMR-Gal4. (D–E″) Par-1, but not Par-1-KD, induces Drosophila wing overgrowth. Dorsal view (D–D″) or side view (E–E″) of the control wings (D, E), wings expressing two copies of UAS-Par-1 (D′, E′), or wings expressing two copies of UAS-Par-1-KD (D″, E″), with MS1096. The red dashed line indicated the size of the control wings. The relative wing size was quantified using the unpaired t-test (D′″). The results represented the mean ± SEM. *** means p<0.001 (n>6) for each genotype. Note that the adult wings were bent down in flies that overexpressed Par-1. 
This phenotype was not observed in the flies that overexpressed Par-1-KD. (F–I′) Par-1, but not Par-1-KD, promotes the Hpo pathway-responsive gene expression. Drosophila discs containing flip-out clones expressing UAS-Myc-Par-1 or UAS-Myc-Par-1-KD driven by act>CD2>Gal4 were dissected and immunostained with the indicated antibodies. Cells expressing UAS-Myc-Par-1 or UAS-Myc-Par-1-KD transgenes were labeled by Myc tag (indicated by arrows). Note the upregulation of diap1 and ex transcription via ectopic Par-1 expression. Par-1-KD was incapable of inducing diap1 and ex expression. https://doi.org/10.1371/journal.pbio.1001620.g002 Considering that the Ser/Thr kinase activity of Par-1 was important for its function in polarity regulation, we speculated that the function of Par-1 in Hpo signaling might also be dependent on its kinase activity. To examine this hypothesis, we first constructed a kinase-dead form of Par-1 (Par-1-KD), which contained the T408A and S412A mutations. Par-1, containing these two mutations, was thought to be a kinase inactive mutant because the activation loop of the catalytic domain was disrupted [54]. This was confirmed by an in vitro kinase assay in which the kinase activity of Par-1-KD was completely abolished (Figure S1D). Unlike the phenotype observed with the expression of two copies of the Par-1 transgenes, the expression of two copies of the Par-1-KD transgenes had no obvious effect on either eye growth or wing growth and did not dramatically enhance the overgrowth phenotype induced by Yki overexpression (compare Figure 2B″, 2C″ with 2B–2B′, 2C–2C′ and 2D″, 2E″ with 2D–2D′, 2E–2E′). Importantly, to exclude the possibility that the functional variation between Par-1 and Par-1-KD was due to a low expression level of Par-1-KD, we compared the overexpressed Par-1 and Par-1-KD protein levels in both the eye and wing imaginal discs using direct Western blot analysis. 
We found that the overexpression level of Par-1-KD was higher than that of Par-1, verifying that the functional variation did not result from a low Par-1-KD expression level (Figure S1E–S1F). Furthermore, Par-1-KD failed to elevate Ex and Diap1 expression in flip-out clones (Figure 2G–2G′ and 2I–2I′). Taken together, these observations demonstrated that overexpression of Par-1 promoted tissue overgrowth by promoting the activity of the Sd-Yki complex and upregulating the expression of Hpo pathway-responsive genes in a kinase-dependent fashion. Loss of Par-1 Inhibits Tissue Growth by Downregulating Hpo Pathway Targets To determine whether Par-1 is necessary for normal growth, we examined the effect of the loss-of-function of Par-1 on Hpo signaling. Expression of UAS-Par-1-RNAi under the control of eyeless-Gal4 (ey-Gal4) or MS1096 reduced adult eye and wing sizes (Figure 3A–3B″), suggesting that Par-1 activity was required for normal eye and wing development. Par-1 RNAi efficiency was also confirmed by in vivo staining, in which Par-1-RNAi transgenes were expressed under the control of hh-Gal4. As shown in Figure S2B–S2B′, endogenous Par-1 protein levels were efficiently knocked down by expressing Par-1-RNAi in the P-compartment. In addition, shrinkage of the P-compartment was also observed (Figure S2B–S2B′). To eliminate the concern regarding Par-1 RNAi off-target effects, a second line of Par-1 RNAi (Par-1-RNAi-2), which targets a different region of Par-1, was generated. Par-1-RNAi-2 also efficiently knocked down endogenous Par-1 expression (Figure S2C–S2C′) and restricted wing growth when expressed by MS1096 (Figure S2D–S2D′). Furthermore, the expression of Par-1-RNAi by GMR-Gal4 resulted in the detection of caspase-3 in its active (cleaved) form (Figure 3D–3D′), indicating that loss of Par-1 restricted tissue growth by inducing apoptotic cell death. Figure 3. 
Inactivation of Par-1 reduces organ size and downregulates the Hpo pathway-responsive genes. (A–B″) Inactivation of Par-1 restricts organ growth. Side view of adult fly eyes: wild-type (A) or eyes expressing Par-1 RNAi (A′) under the control of eyeless-Gal4. Dorsal view of adult wings: wild type (B) or wings expressing Par-1 RNAi (B′) under the control of MS1096. The relative wing size was quantified using an unpaired t-test (B″). The results represented the mean ± SEM. *** means p<0.001 (n>6) for each genotype. (C–C′) Knockout of par-1 restricts cell growth. Drosophila third-instar larval eye discs containing par-1w3 clones were dissected. par-1 mutant clones were marked by the loss of GFP expression, while their twin spots were marked by increased GFP expression. The representative par-1 mutant clone and its twin spot were indicated by the white dashed line and the red dashed line, respectively (C). The total area of the par-1 mutant clones or their twin spots within one eye disc was calculated (C′). The results represented the mean ± SEM. *** means p<0.001 (n>6) for each genotype. (D–D′) Par-1 knockdown initiates apoptosis. Drosophila third-instar larval control eye discs (D) or discs expressing Par-1 RNAi (D′) under the control of GMR-Gal4 were immunostained with a cleaved-caspase-3 antibody. Note that caspase-3 was cleaved to its active form when Par-1 RNAi was expressed. (E–H′) Par-1 knockdown suppresses the expression of the Hpo pathway-responsive genes. Wing discs expressing Par-1 RNAi with hh-Gal4 (E–E′, H–H′) or with act>CD2>Gal4 (F–F′) were immunostained to demonstrate the protein expression levels of EX-Z (E–E′), mic32-GFP (F–F′), and DIAP1 (G–H′). Cells expressing Par-1 RNAi were labeled by the lack of either Ci or CD2 expression. Note that the expression of EX-Z and DIAP1 was inhibited, whereas the levels of mic32-GFP were increased. The arrows indicate the P-compartment or clone regions. 
(I–J″) Loss of par-1 disrupts the expression of Hpo pathway-responsive genes. D. melanogaster third-instar larval eye discs containing par-1W3 clones were dissected and examined to determine the expression of diap1-lacZ (I–I″) and bantam-lacZ (J–J″). par-1 mutant clones were marked by the loss of GFP expression. Note the downregulation of diap1-lacZ and bantam-lacZ in the absence of Par-1. The clone regions are indicated by arrows. https://doi.org/10.1371/journal.pbio.1001620.g003 We further tested whether knockdown of Par-1 resulted in downregulation of Hpo pathway-responsive genes. Expression of either UAS-Par-1-RNAi or UAS-Par-1-RNAi-2 by hh-Gal4 resulted in diminished levels of EX-Z, DIAP1, and diap1-GFP3.5 and a reduced P-compartment size (Figures 3E–3E′, 3H–3H′, and S2E–S2E′, S2F–S2F″). Consistent with these results, a bantam sensor (mic32-GFP) signal was upregulated in Par-1-RNAi flip-out clones (Figure 3F–3F′) or wing discs expressing Par-1-RNAi-2 by hh-Gal4 (Figure S2G–S2G″), suggesting a restriction of microRNA bantam expression by knockdown of Par-1. Thus, this evidence suggested that the inactivation of Par-1 resulted in abnormal growth by antagonizing the expression of Hpo-responsive genes. To further strengthen this conclusion, the expression of Hpo-responsive genes was examined in par-1w3 mosaic clones. As shown in Figure 3I–3I″ and 3J–3J″, in par-1 null clones, the diap1 transcriptional level was reduced (Figure 3I–3I″), and bantam-lacZ was decreased (Figure 3J–3J″). Importantly, the size of the par-1 null clones was significantly reduced compared to their twin spots (Figure 3C–3C′), indicating a proliferation disadvantage for par-1 null clones. Taken together, these observations demonstrated that par-1 was essential for normal growth and that perturbation of Par-1 expression resulted in growth suppression and apoptosis by stimulating the Hpo pathway. 
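The clone and twin-spot comparison above rests on two routine calculations, the mean ± SEM of each group and an unpaired t statistic. A minimal sketch of those calculations follows, assuming Student's equal-variance form of the unpaired test (the text does not state which variant was used for Figure 3C′). The area values are made-up placeholders in arbitrary units, not measurements from this study, and the function and variable names are illustrative.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one sample."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def unpaired_t(a, b):
    """Student's unpaired t statistic and degrees of freedom.

    Assumes equal variances in both groups (an assumption of this sketch).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df


# Hypothetical placeholder areas (arbitrary units), one value per eye disc.
mutant_clone_area = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0]
twin_spot_area = [5.0, 6.0, 7.0, 6.0, 5.0, 7.0, 6.0]

for label, sample in (
    ("mutant clones", mutant_clone_area),
    ("twin spots", twin_spot_area),
):
    m, s = mean_sem(sample)
    print(f"{label}: {m:.2f} ± {s:.2f} (n={len(sample)})")

t_stat, dof = unpaired_t(mutant_clone_area, twin_spot_area)
print(f"t = {t_stat:.2f}, df = {dof}")
```

A p-value would then be read from the t distribution with the reported degrees of freedom, for example using a statistics package.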
par-1 Acts Downstream of ex and fat but Upstream of hpo in the Hpo Pathway Given the findings presented in the previous section, we next determined the functional relationship between Par-1 and Hpo pathway components by identifying the genetic interactions of Par-1 in the Hpo pathway. We first examined whether the function of Par-1 was dependent on the activity of the Sd-Yki transcriptional complex because this complex was the main downstream effector of the Hpo pathway. Although expression of two copies of UAS-Myc-Par-1 in flip-out clones increased diap1-lacZ (Figure S3A–S3A′), no increase in diap1-lacZ was detected when UAS-Yki-RNAi was coexpressed (Figure S3B–S3B′). Coexpression of UAS-Sd-RNAi suppressed the overgrowth phenotype induced by Par-1 overexpression in Drosophila wings (Figure S3C–S3C′). In addition, the elevated levels of diap1 transcription caused by Par-1 overexpression were reverted by coexpression of Sd RNAi (Figure S3D–S3F′). Furthermore, ectopic Yki expression reverted downregulated DIAP1 levels and the shrunken P-compartment phenotype induced by the expression of Par-1 RNAi (Figure 4A–4B′). These results indicated that par-1 functioned upstream of the Sd-Yki transcription complex in the Hpo pathway. To strengthen this conclusion, the levels of phosphorylated Yki, which reflected Hpo/Wts activity, were examined. As expected, Par-1, but not Par-1-KD, reduced phosphorylated Yki levels (Figure 4C). In addition, Par-1 also inhibited the Hpo/Wts signaling-induced Yki mobility shift (Figure 5I, lanes 2–5). These findings suggested that Par-1 functioned upstream of yki to affect the activity of the Sd-Yki transcription complex. We next examined the genetic epistasis between Par-1 and the upstream components of the Hpo pathway. 
We found that elevated DIAP1 levels and the enlarged P-compartment size (Figure 4D–4D′), resulting from ex RNAi expression by hh-Gal4, were suppressed by coexpression of the Par-1 RNAi transgene (Figure 4E–4E′), suggesting that Par-1 functioned downstream of ex. In addition, coexpression of Par-1 RNAi suppressed ex RNAi-induced wing overgrowth (Figure S3G–S3G′″). Furthermore, Par-1 RNAi also suppressed Ft RNAi-induced wing overgrowth (Figure S3H–S3H′″). These observations supported the notion that par-1 functioned downstream of or in parallel to ex and ft. Figure 4. Par-1 functionally interacts with the components of the Hpo pathway. (A–B′) Loss of Par-1 induced a phenotype that was suppressed by Yki overexpression. The protein levels of DIAP1 in the wing discs expressing UAS-Par-1 RNAi (A–A′) or coexpressing UAS-Yki and UAS-Par-1 RNAi (B–B′) with hh-Gal4 were detected. Note that Yki overexpression overcame the inhibitory effect of Par-1 RNAi on diap1 expression. The arrows indicate the P-compartment. (C) Par-1 inhibits Yki phosphorylation. S2 cells were transfected with the indicated constructs. Phosphorylated Yki was detected using the p-Yki antibody, which recognizes the phosphorylated site of Yki at Ser168. (D–E′) ex functions upstream of par-1 in the Hpo pathway. Wing discs expressing UAS-ex RNAi (D–D′) or coexpressing UAS-ex RNAi and UAS-Par-1 RNAi (E–E′) with hh-Gal4 were subjected to immunostaining. The transgene expression regions were marked by the lack of Ci (red) staining and are indicated by arrows. Note that ex RNAi expression resulted in an enlarged P-compartment and increased expression of diap1-GFP 3.5, whereas coexpression with Par-1 RNAi fully suppressed these phenotypes. (F–F′″) wts functions downstream of par-1 in the Hpo pathway. Clones were generated using the MARCM system. 
The genotypes were the following: ey-flp, Ubi-Gal4, UAS-GFP; FRT82B/FRT82B Gal80 (F), ey-flp, Ubi-Gal4, UAS-GFP; Par-1-RNAi; FRT82B/FRT82B Gal80 (F′), ey-flp, Ubi-Gal4, UAS-GFP; Par-1-RNAi; FRT82B wtslatsX1/FRT82B Gal80 (F″), and ey-flp, Ubi-Gal4, UAS-GFP; FRT82B wtslatsX1/FRT82B Gal80 (F′″). Note that Par-1 RNAi reduced the clone size, whereas wtslatsX1 rescued this phenotype. (G) Quantification of the relative clone size. The relative clone size was calculated as the GFP area divided by the entire disc area. All of these data were expressed as the mean ± SEM. **p<0.01, ***p<0.001. n>5, for each group. (H) Par-1 modulates Wts phosphorylation status. S2 cells were transfected with the indicated plasmids. The cell lysates were harvested and subjected to Western blot analysis. Note that the phosphorylation shift of Wts mediated by Hpo/Sav/Merlin/Tao-1 was partially blocked by Par-1 expression. The shifted Wts bands are indicated by the small circles. (I–I″) par-1 functions upstream of hpo in the Hpo pathway. Clones were generated using the MARCM system. The genotypes were the following: eyflp, ubi-Gal4, UAS-GFP; FRT42D/FRT42D Gal80 (I), eyflp, ubi-Gal4, UAS-GFP; FRT42D hpoBF33/FRT42D Gal80 (I′), eyflp, ubi-Gal4, UAS-GFP; FRT42D hpoBF33/FRT42D Gal80; Par-1-RNAi (I″). Note that the Hpo null clones caused tumorous growth, whereas Par-1 RNAi could not rescue this phenotype. https://doi.org/10.1371/journal.pbio.1001620.g004 Figure 5. Par-1 interacts with Hpo-Sav and regulates the phosphorylation of Hpo at Ser30. (A–B) Par-1 interacts with Hpo and Sav in vitro. S2 cells were transfected with HA-tagged full-length or truncated Par-1 and Hpo (A) or Sav (B) constructs. The cell lysates were immunoprecipitated, followed by Western blot analysis with the indicated antibodies. 
Note that weak binding (indicated by asterisks) between full-length Par-1/Par-1-C and Hpo and Sav was detected, whereas the N-terminal truncation of Par-1, which contained the kinase domain, showed a much stronger interaction signal. (C) Par-1 induces a phosphorylation shift of Hpo-KD in vitro. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to phosphorylation mobility shift assays. Note the phosphorylation shift of Hpo-KD in the presence of Par-1. Phos-tag was used to enhance the phosphorylation shift (see Materials and Methods for further details). (D) Par-1 regulates phosphorylation of Hpo-KD at Ser30 in S2 cells. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to a phosphorylation mobility shift assay. The Hpo Ser30 site was mutated to an alanine. Note that the Hpo(S30A) mutant did not shift in the presence of Par-1. (E–F) Par-1 induces the phosphorylation of Hpo-KD at Ser30 in S2 cells. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to Western blot analyses. Note that the phospho Hpo(Ser30) antibody could only detect Par-1-induced phosphorylation in the Hpo-KD samples but not in the Hpo(Ser30) mutant samples. The asterisks indicate non-specific bands. Lambda-PP indicates λ-phosphatase. (G) Par-1 inhibits Hpo(Thr195) phosphorylation. S2 cells were transfected with the indicated constructs. The cell lysates were immunoprecipitated, followed by Western blot analyses to detect p-Hpo(Thr195) levels. Note that Par-1 inhibited Hpo(Thr195) phosphorylation in a kinase-dependent manner, whereas the Hpo(S30A) mutant could not be inhibited. (H) Quantification of p-Hpo(Thr195) levels. p-Hpo(Thr195) levels were quantified using densitometry. The results were expressed as the mean ± SEM from three independent experiments. *p<0.05. (I) Hpo(S30A) results in a higher phosphorylation shift of Yki. S2 cells were transfected with the indicated constructs. 
The cell lysates were subjected to a phosphorylation mobility shift assay. Note that the phosphorylation shift of Yki was enhanced in the presence of Hpo(S30A) and that the Hpo(S30A) mutant was resistant to Par-1 induced Yki dephosphorylation. (J–K) Hpo(S30A) shows enhanced activity compared with wild-type Hpo in vivo. Control wings (J) or wings expressing UAS-Hpo (J′) or UAS-Hpo(S30A) (J″) with C765 are shown. The relative wing size was quantified using an unpaired t-test (K). The results represented the mean ± SEM. *p<0.05, **p<0.01, ***p<0.001 (n>6) for each genotype. Note that the Hpo(S30A) flies exhibited smaller wings than the Hpo flies. https://doi.org/10.1371/journal.pbio.1001620.g005 We then determined whether Par-1 regulated the Hpo pathway via Wts, which phosphorylates Yki at Ser168 to retain Yki in the cytoplasm [19]. By generating Par-1 RNAi clones in eye discs using the mosaic analysis with a repressible cell marker (MARCM) technique, we found that the size of Par-1 RNAi clones was extremely small compared to that of the control clones (compare Figure 4F′ with 4F), indicating adverse development of the Par-1 RNAi clones. Strikingly, we found that knockout of wts rescued the adverse developmental phenotypes of the Par-1 RNAi clones and that the size of the wts mutant clones expressing Par-1 RNAi was comparable to that of the wts clones rather than that of the Par-1 RNAi clones (Figure 4G and compare Figure 4F″ with 4F′″). These findings indicated that par-1 functioned upstream of wts. Because Wts was activated and phosphorylated by upstream components of the Hpo pathway, we then tested whether Par-1 regulated Wts phosphorylation. Par-1, but not Par-1-KD, reduced the mobility shift of Wts phosphorylation in the presence of Hpo/Sav/Merlin/Tao-1 (Figure 4H), suggesting that par-1 functioned upstream of wts and regulated Wts phosphorylation in a kinase-dependent manner. 
We then determined whether Par-1-regulated Hpo signaling was dependent on the activity of Hpo, which restricted tissue growth by phosphorylating Wts. By generating hpo mutant clones in Drosophila compound eyes using the MARCM system, we found that the ablation of Hpo resulted in tumor outgrowth (compare Figure 4I′ with Figure 4I). Furthermore, we found that Par-1 RNAi was incapable of reverting the growth advantage of hpo null clones (compare Figure 4I″ with Figure 4I′), indicating that Par-1 functioned upstream of hpo to regulate Hpo signaling. Par-1 Interacts with Hpo-Sav and Regulates the Phosphorylation of Hpo at Ser30 Given that Par-1 functions upstream of hpo (Figure 4), we speculated that Par-1 modulated the function of the Hpo-Sav complex. To test our hypothesis, we first examined whether Par-1 bound to the Hpo-Sav complex using a co-immunoprecipitation assay. As expected, full-length Par-1 interacted with both Hpo and Sav, although these associations were weak (Figure 5A–5B). In addition, we also found that the HA-tagged N-terminal fragment of Par-1 (Par-1-N, Figure S4A) had a strong association with both Hpo and Sav, whereas only a weak interaction between the Par-1 C-terminal fragment (Par-1-C, Figure S4A) and Hpo/Sav was detected (Figure 5A–5B). Importantly, the interaction between Par-1 and Hpo/Sav was specific because neither the full-length nor the N-terminal fragment of Par-1 co-immunoprecipitated with Merlin (unpublished data), Wts, or Mats (Figure S4B). On the basis of the previous results that Par-1 functioned in a kinase-dependent manner in Hpo signaling, we performed a phosphorylation shift experiment using a Phos-tag gel to determine whether Par-1 affected the phosphorylation of Hpo and Sav in vitro. Phos-tag is a phosphate-binding compound that, when incorporated into polyacrylamide gels, results in an exaggerated mobility shift for phosphorylated proteins, which is dependent upon the degree of phosphorylation [18],[55]. 
We observed that Flag-Hpo exhibited a mobility shift when cotransfected with Par-1, but not Par-1-KD, in S2 cells (Figure S5A). However, such a mobility shift was not detected for Sav when it was cotransfected with either Par-1 or Par-1-KD (Figure 6B, compare lanes 3–4 with lane 1). To remove the effect of Hpo auto-phosphorylation, a kinase-dead form of Hpo (Hpo-KD) was also tested in the phosphorylation shift experiment. As shown in Figure 5C, Par-1 also induced Hpo-KD to generate a phosphorylation mobility shift. These results suggested that Par-1 specifically induced Hpo phosphorylation in S2 cells. Figure 6. Par-1 disrupts the association of the Hpo-Sav complex in a kinase-dependent manner. (A) Par-1 could destabilize Hpo-induced accumulation of Sav in a kinase-dependent manner. S2 cells were transfected with the indicated constructs followed by a Western blot analysis. CFP served as a loading control. (B) Par-1 inhibits the phosphorylation of Sav induced by Hpo. The mobility shift assay was employed. The loading volume was adjusted according to the total Sav protein level. (C) Par-1 disrupts the interaction of the Hpo-Sav complex. S2 cells were transfected with the indicated constructs followed by immunoprecipitation to test whether the interaction between Sav and endogenous Hpo was affected by Par-1. Note that less Sav interacted with Hpo in the presence of Par-1. (D–D″) Par-1 RNAi is incapable of inhibiting sav mutant-induced adult eye overgrowth. The clones were generated using the MARCM system. The genotypes are as follows: eyflp, ubi-Gal4, UAS-GFP; FRT82B/FRT82B Gal80 (D), eyflp, ubi-Gal4, UAS-GFP; FRT82B SavSH13/FRT82B Gal80 (D′), and eyflp, ubi-Gal4, UAS-GFP; Par-1-RNAi; FRT82B SavSH13/FRT82B Gal80 (D″). (E) The proposed mechanism of Par-1 regulation of the Hpo pathway (see text for further detail). 
https://doi.org/10.1371/journal.pbio.1001620.g006 To identify which sites on Hpo were affected by Par-1, we cotransfected Flag tagged Hpo with HA tagged Par-1 in S2 cells. Flag-Hpo protein was then immunoprecipitated and separated using SDS-PAGE and harvested for a semi-quantitative mass spectrometric (MS) analysis (details are described in the Materials and Methods). According to the MS results, we identified four potential phosphorylation sites that were affected by Par-1 expression, S30, S66, Y365, and T615. We then generated different Hpo variants by individually mutating these candidate sites and then examined how these mutations affected Par-1-induced Hpo phosphorylation. Interestingly, only Hpo (S30A) failed to generate mobility shifts in a direct Western blot analysis compared to wild-type Hpo protein (Figure S5C), while other Hpo mutations, T615A, S66A, and Y365E, did not affect the mobility shift upon Par-1 induction (Figure S5B–S5C). Moreover, we also found that Hpo (S30A)-KD could not be shifted by Par-1 (Figure 5D). These results suggested that Par-1 may regulate the phosphorylation of Hpo at Ser30. To verify our prediction, an antibody that specifically recognized phosphorylated Hpo at Ser30 was generated. As shown in Figures 5E and S5D, Par-1 but not Par-1-KD was able to induce Hpo phosphorylation at Ser30. To validate the phospho-Hpo S30 antibody, a phosphatase treatment was applied. Upon phosphatase treatment, the p-Hpo(S30) band was undetectable (Figure 5E, compare lane 5 with lane 2). To further test the specificity of this antibody, we transfected Hpo(S30) mutants with Par-1. As shown in Figure 5F and in Figure S5E, in the presence of Par-1, Hpo(S30A) could not be detected by the p-Hpo(Ser 30) antibody, whereas wild-type Hpo(S30) was detected. These findings indicated that Par-1 regulated the phosphorylation of Hpo at Ser30. 
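The site-mapping logic in the paragraph above (mutate each candidate site individually, then ask which mutation abolishes the Par-1-induced mobility shift) can be written as a simple elimination step. The following is only an illustrative sketch: the dictionary encodes the qualitative outcomes reported in the text for Figure S5B–S5C, and the names `shift_retained` and `required_sites` are hypothetical, not part of the study's methods.

```python
# Qualitative mobility-shift outcomes as reported in the text
# (True = Par-1 still induces a shift of the Hpo band).
# The boolean encoding is an illustrative simplification.
shift_retained = {
    "WT": True,
    "S30A": False,
    "S66A": True,
    "Y365E": True,
    "T615A": True,
}


def required_sites(results):
    """Return the sites whose mutation abolishes the shift seen in wild type."""
    if not results.get("WT", False):
        raise ValueError("wild-type control must show the shift")
    # "S30A" -> "S30": drop the substituted residue to recover the site name.
    return sorted(
        mutant[:-1]
        for mutant, retained in results.items()
        if mutant != "WT" and not retained
    )


print(required_sites(shift_retained))
```

Only the site whose mutation removes the shift is returned, which mirrors the inference that Ser30 is the Par-1-regulated site; the phospho-specific antibody experiments described above then test that inference directly.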
Par-1-Induced Hpo Ser30 Phosphorylation Regulates Hpo Activity

In recent proteome-wide phosphorylation studies using Drosophila embryos [40], it was suggested that Hpo was phosphorylated at Ser30 in vivo, indicating an important role for the Ser30 site in the regulation of Hpo activity. To determine the biological significance of Hpo phosphorylation at Ser30 induced by Par-1, we first examined whether the Ser30 phosphorylation state affected Hpo phosphorylation at Thr195, which is important for Hpo activation. As shown in Figure 5G–5H, Par-1, but not Par-1-KD, significantly inhibited Hpo phosphorylation levels at Thr195, whereas this inhibitory effect was abolished when the Ser30 site was mutated. More importantly, phosphorylation at Thr195 was slightly elevated when Ser30 was mutated into an alanine (Figure 5G, compare lane 4 with lane 1, and Figure 5H). These findings suggested that Par-1 regulated Hpo activity by antagonizing phosphorylation at the Thr195 site through Ser30 phosphorylation. It has been reported that the Hpo Thr195 site is not only auto-phosphorylated but also phosphorylated by Tao-1 [37],[38], which is a partner of Par-1 in the regulation of microtubule dynamics. Thus, we asked whether Par-1-induced phosphorylation at Ser30 also affected Tao-1-mediated phosphorylation at Thr195. As shown in Figure S5F, consistent with the results shown in Figure 5G–5H, Par-1 suppressed Tao-1-mediated phosphorylation at Thr195. The antagonistic effect of Par-1 and Tao-1 on Hpo phosphorylation at Thr195 motivated the examination of the interrelationship of Par-1 and Tao-1 in the Hpo pathway. We found that Tao-1 disrupted the Par-1-induced phosphorylation mobility shift of Hpo-KD (Figure S5F), suggesting that the function of Par-1 in the Hpo pathway was modulated by upstream signaling. Because Hpo(S30A) demonstrated a higher activity compared to the Hpo wild type, we examined whether Hpo(S30A) exerted its inhibitory effect on Yki. 
We found that Hpo(S30A) resulted in a dramatic Yki mobility shift, whereas the Hpo wild type resulted in a moderate phosphorylation of Yki (Figure 5I, compare lane 2 with lane 6). Furthermore, we found that co-transfection of the Hpo(S30A) mutant blocked Par-1-induced Yki dephosphorylation (Figure 5I, compare lane 3 with lane 7), which further confirmed our conclusion that Par-1 modulated Hpo activity by regulating Hpo phosphorylation at Ser30. To further investigate the role of Par-1-induced Hpo phosphorylation at Ser30 in Hpo activity regulation, we generated transgenic flies of attB-UAS-Hpo variants at the 75B1 attP locus, which ensured equal expression of different forms of the Hpo mutant. Because the flies were grown at room temperature, overexpression of Hpo under the control of GMR-Gal4, Ci-Gal4, and hh-Gal4 resulted in adult lethality; therefore, we selected weak drivers, such as C765, which induced gene expression moderately in the Drosophila wing to compare the activity of Hpo variants. Consistent with our in vitro studies, Hpo(S30A) mutants exhibited much smaller wings compared to wild-type Hpo flies (Figure 5J–5K), indicating that Hpo(S30A) variants demonstrated a higher activity than wild-type Hpo in vivo. To further strengthen this conclusion, we used an inducible Ci-Gal4, which was controlled by a temperature-sensitive Gal80, to drive Hpo expression in order to exclude the early lethality. Because some progeny were viable when Hpo expression was induced after pupa formation, we then compared the activity of Hpo variants by measuring the wing size of the survivors and calculated the mortality rate. We found that survivors with Hpo expression maintained a relatively normal wing size (Figure S5G–S5H), while survivors with Hpo(S30A) expression demonstrated much smaller wings compared to controls (Figure S5G–S5H). Meanwhile, Hpo(S30A) flies exhibited a higher mortality rate compared to wild-type Hpo flies (Figure S5I). 
These findings suggested that the activity of Hpo(S30A) mutants was higher compared to wild-type Hpo. Taken together, on the basis of the above biochemical and in vivo evidence, we speculated that Par-1 inhibited Hpo activity via the regulation of Hpo phosphorylation at the Ser30 site.

Par-1 Inhibits Hpo-Sav Association in a Kinase-Dependent Manner

We observed that Par-1 blocked Hpo-induced Sav stabilization in a kinase-dependent manner (Figure S5C, and Figure 6A, compare lane 3 with lane 2). Thus, we examined whether Par-1-induced Sav destabilization was dependent on the phosphorylation of the Hpo Ser30 site. Interestingly, we found that the stabilization of Sav by Hpo was not affected by Par-1-induced Hpo phosphorylation at Ser30 because Hpo(S30A) mutants were still able to stabilize Sav, and this stabilization could be reversed by Par-1 (Figure S6A). In addition to the change in Sav protein levels, Par-1 also decreased the phosphorylation status of Sav, which was induced by Hpo in a kinase-dependent manner (Figure 6B, compare lanes 5–6 with lane 2). Given that full-length Par-1 weakly interacted with both Hpo and Sav (Figure 5A–5B), we then determined whether Par-1 affected the association between Hpo and Sav. As shown in Figure 6C, Par-1, but not Par-1-KD, impaired the association of endogenous Hpo and transfected Sav, suggesting that the interaction between Hpo and Sav was disrupted by Par-1 overexpression. To mimic the disruption of the Hpo-Sav complex, we ablated sav using the MARCM system. We found that Par-1 RNAi was incapable of inhibiting sav mutant-induced clone growth (Figure S6B–S6C) and adult eye overgrowth (Figure 6D–6D″), suggesting that Par-1 functioned upstream of sav. These observations supported the notion that Par-1 kinase activity was important to restrain the function of the Hpo-Sav complex. 
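The epistasis reasoning used for sav above, and for hpo earlier in the text, follows a standard rule for a linear regulatory pathway: when two perturbations with opposite phenotypes are combined, the gene whose single-mutant phenotype persists is epistatic and is conventionally placed downstream. The sketch below illustrates that rule only. The phenotype labels ("undergrowth", "overgrowth") are simplified stand-ins for the clone and eye phenotypes described in the text, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EpistasisTest:
    knockdown: str        # gene targeted by RNAi (here Par-1)
    mutant: str           # gene mutated in the same clones
    knockdown_alone: str  # simplified phenotype of the RNAi alone
    mutant_alone: str     # simplified phenotype of the mutant alone
    combined: str         # simplified phenotype of the combination


def epistatic_gene(test: EpistasisTest) -> Optional[str]:
    """Return the gene whose single phenotype the combination resembles.

    Returns None when the two single phenotypes are identical or the
    combined phenotype matches neither, because no order can be inferred.
    """
    if test.knockdown_alone == test.mutant_alone:
        return None
    if test.combined == test.mutant_alone:
        return test.mutant
    if test.combined == test.knockdown_alone:
        return test.knockdown
    return None


# Simplified encoding of two experiments described in the text.
tests = [
    EpistasisTest("par-1", "hpo", "undergrowth", "overgrowth", "overgrowth"),
    EpistasisTest("par-1", "sav", "undergrowth", "overgrowth", "overgrowth"),
]

for t in tests:
    print(f"{t.knockdown} RNAi + {t.mutant} mutant: epistatic gene = {epistatic_gene(t)}")
```

In both encoded cases the mutant phenotype persists, so hpo and sav are epistatic to par-1, which is consistent with the interpretation that Par-1 acts upstream of both. The rule assumes a simple linear pathway and cannot by itself distinguish an upstream position from a parallel input.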
On the basis of the evidence provided above, we propose a model of how Par-1 restricts the activity of the Hpo pathway (Figure 6E). Par-1 regulates Hpo phosphorylation at Ser30 to modulate the Hpo kinase activity; simultaneously, Par-1 also promotes the dissociation of Sav from Hpo, resulting in the dephosphorylation and destabilization of Sav, thereby repressing the function of the Hpo-Sav complex.

The Mammalian Homologue of Par-1, MAP/MARK, Regulates the Hpo Pathway

An evolutionarily conserved function of Par-1 in regulating microtubule dynamics has been reported [41]. To determine whether the function of Par-1 in the Hpo pathway is conserved, two human homologues of Par-1, MARK1 and MARK4, were cloned. The Gal4-tead4 reporter [56] was used to examine the effect of MARK1 and MARK4 on the mammalian Hpo pathway. As expected, both MARK1 and MARK4, but not the kinase-dead form of MARK4, activated the transcriptional co-activator activity of YAP (Figures 7A and S7A). Since MARK4 activated YAP more than MARK1, we investigated whether MARK4 affected YAP phosphorylation. Indeed, the phosphorylation levels of YAP were significantly decreased upon MARK4 overexpression (Figure 7B), indicating that MARK inhibited Hpo signaling in mammals. Finally, we investigated whether MARK also resulted in MST (human homologue of Hpo) phosphorylation. We found that both MARK4 and MARK1 induced mobility shifts of MST2 (Figures 7C and S7B). Taken together, these findings suggested that the inhibitory function of Par-1 on Hpo signaling was conserved from Drosophila to mammals. Our study suggested that Par-1 may have a procarcinogenic role because its hyperactivation in Drosophila is sufficient to induce tissue overgrowth and, in mammalian cells, its homologue MARK is sufficient to activate YAP. Interestingly, MARK4 has been suggested to play a role in hepatocellular carcinogenesis and gliomagenesis [51],[52]. However, whether MARK1 is also involved in carcinogenesis is largely unknown. 
To address this question, we characterized the mutation and expression of MARK1 in different cancer samples using the COSMIC and GEO databases. Although no mutation or deletion of MARK1 has been reported in the tumors that were surveyed, the transcription levels of MARK1 showed significant upregulation in squamous lung cancer samples and during the progression of prostate cancer (Figure 7D–7E). Taken together, these findings suggested that Par-1 was a potential oncogene and that its regulatory role in Hpo signaling could be conserved.

Figure 7. MARK, the mammalian homologue of Par-1, regulates the Hpo pathway. (A) MARK4 activates YAP transcriptional activity. The indicated plasmids were co-transfected with the 5XUAS-luc reporter, Gal4-TEAD4, Flag-YAP, and CMV-β-galactosidase construct into HEK293T cells. Luciferase activity was measured and normalized against β-galactosidase activity. (B) Ectopic expression of MARK4 inhibits YAP phosphorylation. HEK293T cells were transfected with the indicated constructs, and YAP was immunoprecipitated using the anti-Flag antibody. YAP phosphorylation was detected using Western blot analysis with a phospho-YAP specific antibody and by determining its mobility on a Phos-tag-containing SDS-PAGE gel. (C) MARK4 induces the phosphorylation of MST2. HEK293T cells were transfected with the indicated constructs, and MST2 phosphorylation was analyzed by electrophoresis on a Phos-tag-containing gel and through Western blotting with an anti-Flag antibody. (D) MARK1 expression is significantly upregulated in squamous lung cancer biopsy specimens compared with matched control specimens. The expression profiling data were downloaded from the GEO dataset GDS1312. These data were expressed as the mean ± SEM and were analyzed using the paired t-test. *p<0.05. (E) Prostate cancer progression is accompanied by an increased expression of MARK1. 
PIN, prostatic intraepithelial neoplasia; PCA, localized prostate cancer; MET, metastatic prostate cancer. The expression data were obtained from GEO dataset GDS3289. All of the results were expressed as the mean ± SEM. ***p<0.001. https://doi.org/10.1371/journal.pbio.1001620.g007

Par-1 EP Lines Promote Growth via Hpo Signaling

To identify novel candidates of the Hpo pathway, we performed an overexpression screen in which flies carrying GMR-Gal4 and UAS-Yki (referred to as GMR-Yki) were crossed with a collection of EP lines. Overexpression of UAS-Yki posterior to the morphogenetic furrow (MF) under the control of the GMR-Gal4 driver (GMR-Yki) resulted in enlarged eyes (compare Figure 1A′ with 1A), providing a sensitive background for a genetic modifier screen [53]. Each EP line was crossed with GMR-Yki flies, and the F1 progeny were screened for an increase in eye size. From more than 10,000 EP lines, we identified numerous lines that enhanced the overgrowth phenotype induced by Yki overexpression. We then analyzed the UAS element insertion sites of these lines and found that the insertion sites of L[484], L[507], and F[727] were all within the 5′ UTR region of the par-1 gene (Figure 1C). Although these three EP lines did not display an overgrowth phenotype when driven by GMR-Gal4 in Drosophila eyes (compare Figure 1B with 1A, and unpublished data), these lines dramatically enhanced the GMR-Yki-induced overgrowth phenotype (compare Figure 1B′ with 1A′, and unpublished data). In addition, the expression of these lines driven by the wing-specific Gal4 driver MS1096 produced enlarged adult wings (Figure 1D–1D″), indicating that the candidate genes expressed in these lines may play a role in organ size control. To determine whether the UAS element of these lines regulated par-1 gene expression, real-time PCR analysis was performed for the L[484] line. 
The mRNA level of par-1 was significantly upregulated when the L[484] line was crossed with MS1096, whereas the mRNA levels of the neighboring genes mei-W68 and hpo showed slight or no change (Figure 1E), suggesting that ectopic Par-1 expression could be responsible for the tissue overgrowth phenotype that we observed.

Figure 1. Par-1 P-element insertion lines enlarge organ size and promote Hpo pathway-responsive gene expression. (A–B′) An EP line L[484] enhanced the Yki gain-of-function induced phenotype. Side views of D. melanogaster wild-type eyes (A), eyes expressing L[484] (B), eyes expressing Yki (A′), or eyes co-expressing L[484] and Yki (B′), driven by GMR-Gal4. (C) Schematic representation of the par-1 gene locus and P-element insertion sites. The insertion sites of the L[484], L[507], and F[727] EP lines are at the 5′ UTR region of the par-1 gene (marked by orange arrows). The par-1 locus is located between mei-W68 and hpo. (D–D″) L[484] promotes adult fly wing growth. Control adult fly wings (D) and wings expressing L[484] (D′) with the wing-specific driver MS1096. The red dashed line indicates the size of the control wing. The relative wing size was quantified using an unpaired t-test (D″). The results represent the mean ± SEM. ***p<0.001 (n>6) for each genotype. (E) Expression of L[484] significantly increased the mRNA levels of Par-1. To detect the level of Par-1 and the transcripts of its neighboring genes, a real-time PCR analysis was performed. All of the results were expressed as the mean ± SEM. *p<0.05, **p<0.01. (F–G′) L[484] promotes expanded (ex) gene expression. Wild-type D. melanogaster third-instar larval wing discs (F) or wing discs expressing L[484] (G–G′) with hh-Gal4 were immunostained to demonstrate the expression of Cubitus (Ci) (red) and Ex-LacZ (EX-Z) (green). Ci marked the anterior compartment (A-compartment). The arrows indicate the P-compartment. 
(H–H″) L[484] elevates diap1 gene expression. Wing discs containing flip-out clones expressing L[484] with act>CD2>Gal4 were immunostained to demonstrate the expression of CD2 (red) and diap1-lacZ (green). Cells expressing L[484] were indicated by the lack of CD2 expression (indicated by arrows). https://doi.org/10.1371/journal.pbio.1001620.g001 To determine whether overexpression of L[484] promoted tissue growth via Hpo signaling, the L[484] line was expressed under the control of the hh-Gal4 driver, which drives gene expression in the posterior compartment (P-compartment). As shown in Figure 1G–1G′, ex-lacZ (EX-Z), an enhancer trap for ex [27], was increased in the P-compartment of the wing imaginal disc, suggesting an inhibition of Hpo signaling. Furthermore, the Hpo downstream marker diap1-lacZ was also upregulated in the flip-out clones expressing L[484] (Figure 1H–1H″). Briefly, these observations suggested that the expression of the L[484] line promoted tissue growth via Hpo signaling by controlling the expression of Par-1. Overexpression of Par-1 Inactivates Hpo Signaling to Induce Tissue Growth in a Kinase-Dependent Manner To verify the functional relationship between Par-1 and the Hpo pathway, a dual luciferase assay, which reflected Sd-Yki transcriptional activity [20], was performed. As shown in Figure 2A, in S2 cells, coexpression of Yki and Sd activated the luciferase reporter gene (3×Sd2-Luc), which was greatly promoted by Par-1, indicating that Par-1 enhanced the activity of the Sd-Yki transcriptional complex in vitro. To further determine the functional relationship between Par-1 and the Hpo pathway in vivo, Myc tagged Par-1 transgenic flies were generated. 
Consistent with the results in Figure 1A–1B′, overexpression of two copies of UAS-Myc-Par-1 using the GMR-Gal4 driver (referred to as GMR/2*Myc-Par-1) resulted in rough eyes without a discernible overgrowth (compare Figure 2B′ with 2B), while coexpression of UAS-Myc-Par-1 with GMR-Yki enhanced the overgrowth phenotype caused by GMR-Yki (compare Figure 2C′ with 2C). Although GMR/2*Myc-Par-1 did not induce a discernible overgrowth phenotype in the eyes (Figure 2B′), the expression of two copies of UAS-Myc-Par-1 using the MS1096 driver (referred to as MS1096/2*Myc-Par-1) resulted in enlarged wings and caused a wing bending-down phenotype, which indicated an expansion of the wing (Figure 2D′″ and compare Figure 2D′ with 2D and 2E′ with 2E). We also found that the relative P-compartment area of the wings expressing UAS-Par-1 using the hh-Gal4 driver was increased (Figure S1A–S1B). We then examined whether overexpression of Par-1 affected the expression of Hpo pathway-responsive genes. We found that flip-out clones expressing UAS-Myc-Par-1 in the wing imaginal discs showed upregulated expression of EX-Z (Figure 2F–2F′), diap1-lacZ (Figure 2H–2H′) and diap1-GFP3.5 (a diap1 enhancer element reporter [20], Figure S1C–S1C″), suggesting compromised Hpo signaling activity. Taken together, these results suggested that overexpression of Par-1 promoted tissue overgrowth by inhibiting Hpo pathway activity.

Figure 2. Overexpression of Par-1 triggers tissue overgrowth and inactivates Hpo signaling in a kinase-dependent manner. (A) Par-1 enhances the transcriptional activity of the Yki-Sd complex in vitro. S2 cells were transfected with the indicated constructs and the luciferase reporter genes. 48 h after transfection, the cell lysates were harvested and subjected to a dual luciferase assay. Note that Par-1 activates the 3×Sd2-Luc reporter compared with the control. 
All of these data were represented as the mean ± SEM. **p<0.01. ***p<0.001. (B–C″) Par-1, but not Par-1-KD, synergizes with Yki to trigger tissue overgrowth. Side view of D. melanogaster adult eyes: wild-type (B); eyes expressing two copies of UAS-Par-1 (B′), two copies of UAS-Par-1-KD (B″) or UAS-Yki (C); or eyes co-expressing UAS-Yki and two copies of UAS-Par-1 (C′) or UAS-Yki and two copies of UAS-Par-1-KD (C″), driven by GMR-Gal4. (D–E″) Par-1, but not Par-1-KD, induces Drosophila wing overgrowth. Dorsal view (D–D″) or side view (E–E″) of the control wings (D, E), wings expressing two copies of UAS-Par-1 (D′, E′), or wings expressing two copies of UAS-Par-1-KD (D″, E″), with MS1096. The red dashed line indicated the size of the control wings. The relative wing size was quantified using the unpaired t-test (D′″). The results represented the mean ± SEM. *** means p<0.001 (n>6) for each genotype. Note that the adult wings were bent down in flies that overexpressed Par-1. This phenotype was not observed in the flies that overexpressed Par-1-KD. (F–I′) Par-1, but not Par-1-KD, promotes Hpo pathway-responsive gene expression. Drosophila discs containing flip-out clones expressing UAS-Myc-Par-1 or UAS-Myc-Par-1-KD driven by act>CD2>Gal4 were dissected and immunostained with the indicated antibodies. Cells expressing UAS-Myc-Par-1 or UAS-Myc-Par-1-KD transgenes were labeled by Myc tag (indicated by arrows). Note the upregulation of diap1 and ex transcription via ectopic Par-1 expression. Par-1-KD was incapable of inducing diap1 and ex expression. https://doi.org/10.1371/journal.pbio.1001620.g002

Considering that the Ser/Thr kinase activity of Par-1 was important for its function in polarity regulation, we speculated that the function of Par-1 in Hpo signaling might also be dependent on its kinase activity. To examine this hypothesis, we first constructed a kinase-dead form of Par-1 (Par-1-KD), which contained the T408A and S412A mutations. 
Par-1, containing these two mutations, was thought to be a kinase inactive mutant because the activation loop of the catalytic domain was disrupted [54]. This was confirmed by an in vitro kinase assay in which the kinase activity of Par-1-KD was completely abolished (Figure S1D). Unlike the phenotype observed with the expression of two copies of the Par-1 transgenes, the expression of two copies of the Par-1-KD transgenes had no obvious effect on either eye growth or wing growth and did not dramatically enhance the overgrowth phenotype induced by Yki overexpression (compare Figure 2B″, 2C″ with 2B–2B′, 2C–2C′ and 2D″, 2E″ with 2D–2D′, 2E–2E′). Importantly, to exclude the possibility that the functional variation between Par-1 and Par-1-KD was due to a low expression level of Par-1-KD, we compared the overexpressed Par-1 and Par-1-KD protein levels in both the eye and wing imaginal discs using direct Western blot analysis. We found that the overexpression level of Par-1-KD was higher compared to Par-1, verifying that the functional variation did not result from a low Par-1-KD expression level (Figure S1E–S1F). Furthermore, Par-1-KD failed to elevate Ex and Diap1 expression in flip-out clones (Figure 2G–2G′ and 2I–2I′). Taken together, these observations demonstrated that overexpression of Par-1 promoted tissue overgrowth by promoting the activity of the Sd-Yki complex and upregulating the expression of Hpo pathway-responsive genes in a kinase-dependent fashion. Loss of Par-1 Inhibits Tissue Growth by Downregulating Hpo Pathway Targets To determine whether Par-1 is necessary for normal growth, we examined the effect of the loss-of-function of Par-1 on Hpo signaling. By expressing UAS-Par-1-RNAi under the control of eyeless-Gal4 (ey-Gal4) or MS1096, adult eye/wing sizes were reduced (Figure 3A–3B″), suggesting that Par-1 (activity) was required for normal eye and wing development. 
Par-1 RNAi efficiency was also confirmed by in vivo staining, in which Par-1-RNAi transgenes were expressed under the control of hh-Gal4. As shown in Figure S2B–S2B′, endogenous Par-1 protein levels were efficiently knocked down by expressing Par-1-RNAi in the P-compartment. In addition, shrinkage of the P-compartment was also observed (Figure S2B–S2B′). To eliminate the concern regarding Par-1 RNAi off-target effects, a second line of Par-1 RNAi (Par-1-RNAi-2), which targets a different region of Par-1, was generated. Par-1-RNAi-2 also efficiently knocked down endogenous Par-1 expression (Figure S2C–S2C′) and restricted wing growth when expressed by MS1096 (Figure S2D–S2D′). Furthermore, the expression of Par-1-RNAi by GMR-Gal4 resulted in the detection of caspase-3 in its active (cleaved) form (Figure 3D–3D′), indicating that the loss of Par-1 restricted tissue growth at least in part by inducing apoptotic cell death.

Figure 3. Inactivation of Par-1 reduces organ size and downregulates the Hpo pathway-responsive genes. (A–B″) Inactivation of Par-1 restricts organ growth. Side view of adult fly eyes: wild-type (A) or eyes expressing Par-1 RNAi (A′) under the control of eyeless-Gal4. Dorsal view of adult wings: wild type (B) or wings expressing Par-1 RNAi (B′) under the control of MS1096. The relative wing size was quantified using an unpaired t-test (B″). The results represented the mean ± SEM. *** means p<0.001 (n>6) for each genotype. (C–C′) Knockout of par-1 restricts cell growth. Drosophila third-instar larval eye discs containing par-1w3 clones were dissected. par-1 mutant clones were marked by the loss of GFP expression, while their twin spots were marked by increased GFP expression. The representative par-1 mutant clone and its twin spot were separately indicated by the white dashed line and red dashed line (C). 
The total area of the par-1 mutant clones or their twin spots within one eye disc were calculated (C′). The results represented the mean ± SEM. *** means p<0.001 (n>6) for each genotype. (D–D′) Par-1 knockdown initiates apoptosis. Drosophila third-instar larval control eye discs (D) or discs expressing Par-1 RNAi (D′) under the control of GMR-Gal4 were immunostained with a cleaved-caspase-3 antibody. Note that caspase-3 was cleaved to its active form when Par-1 RNAi was expressed. (E–H′) Par-1 knockdown suppresses the expression of the Hpo pathway-responsive genes. Wing discs expressing Par-1 RNAi with hh-Gal4 (E–E′, H–H′) or with act>CD2>Gal4 (F–F′) were immunostained to demonstrate the protein expression levels of EX-Z (E–E′), mic32-GFP (F–F′), and DIAP1 (G–H′). Cells expressing Par-1 RNAi were labeled either by the lack of Ci or CD2 expression. Note that the expression of EX-Z and DIAP1 was inhibited, whereas the levels of mic32-GFP were increased. The arrows indicate the P-compartment or clone regions. (I–J″) Loss of par-1 disrupts the expression of Hpo pathway-responsive genes. D. melanogaster third-instar larval eye discs containing par-1W3 clones were dissected and examined to determine the expression of diap1-lacZ (I–I″) and bantam-lacZ (J–J″). par-1 mutant clones were marked by the loss of GFP expression. Note the downregulation of diap1-lacZ and bantam-lacZ in the absence of Par-1. The clone regions are indicated by arrows. https://doi.org/10.1371/journal.pbio.1001620.g003 We further tested whether knockdown of Par-1 resulted in downregulation of Hpo pathway-responsive genes. Expression of either UAS-Par-1-RNAi or UAS-Par-1-RNAi-2 by hh-Gal4 resulted in diminished levels of EX-Z, DIAP1, and diap1-GFP3.5 and a reduced P-compartment size (Figures 3E–3E′, 3H–3H′, and S2E–S2E′, S2F–S2F″). 
Consistent with these results, a bantam sensor (mic32-GFP) signal was upregulated in Par-1-RNAi flip-out clones (Figure 3F–3F′) or wing discs expressing Par-1-RNAi-2 by hh-Gal4 (Figure S2G–S2G″), suggesting a restriction of microRNA bantam expression by knockdown of Par-1. Thus, this evidence suggested that the inactivation of Par-1 resulted in abnormal growth by antagonizing the expression of Hpo-responsive genes. To further strengthen this conclusion, the expression of Hpo-responsive genes was examined in par-1w3 mosaic clones. As shown in Figure 3I–3I″ and 3J–3J″, in par-1 null clones, the diap1 transcriptional level was reduced (Figure 3I–3I″), and bantam-lacZ was decreased (Figure 3J–3J″). Importantly, the size of the par-1 null clones was significantly reduced compared to their twin spots (Figure 3C–3C′), indicating a proliferation disadvantage for par-1 null clones. Taken together, these observations demonstrated that par-1 was essential for normal growth and that perturbation of Par-1 expression resulted in growth suppression and apoptosis by stimulating the Hpo pathway. par-1 Acts Downstream of ex and fat but Upstream of hpo in the Hpo Pathway Given the findings presented in the previous section, we next determined the functional relationship between Par-1 and Hpo pathway components by identifying the genetic interactions of Par-1 in the Hpo pathway. We first examined whether the function of Par-1 was dependent on the activity of the Sd-Yki transcriptional complex because this complex was the main downstream effector of the Hpo pathway. Although expression of two copies of UAS-Myc-Par-1 in flip-out clones increased diap1-lacZ (Figure S3A–S3A′), no increase in diap1-lacZ was detected when UAS-Yki-RNAi was coexpressed (Figure S3B–S3B′). Coexpression of UAS-Sd-RNAi suppressed the overgrowth phenotype induced by Par-1 overexpression in Drosophila wings (Figure S3C–S3C′). 
In addition, the elevated levels of diap1 transcription caused by Par-1 overexpression were reverted by coexpression of Sd RNAi (Figure S3D–S3F′). Furthermore, ectopic Yki expression reverted downregulated DIAP1 levels and the shrunken P-compartment phenotype induced by the expression of Par-1 RNAi (Figure 4A–4B′). These results indicated that par-1 functioned upstream of the Sd-Yki transcription complex in the Hpo pathway. To strengthen this conclusion, the levels of phosphorylated Yki, which reflected Hpo/Wts activity, were examined. As expected, Par-1, but not Par-1-KD, reduced phosphorylated Yki levels (Figure 4C). In addition, Par-1 also inhibited the Hpo/Wts signaling-induced Yki mobility shift (Figure 5I, lanes 2–5). These findings suggested that Par-1 functioned upstream of yki to affect the activity of the Sd-Yki transcription complex. We next examined the genetic epistasis between Par-1 and the upstream components of the Hpo pathway. We found that elevated DIAP1 levels and the enlarged P-compartment size (Figure 4D–4D′), resulting from ex RNAi expression by hh-Gal4, were suppressed by coexpression of the Par-1 RNAi transgene (Figure 4E–4E′), suggesting that Par-1 functioned downstream of ex. In addition, coexpression of Par-1 RNAi suppressed ex RNAi-induced wing overgrowth (Figure S3G–S3G′″). Furthermore, Par-1 RNAi also suppressed Ft RNAi-induced wing overgrowth (Figure S3H–S3H′″). These observations supported the notion that par-1 functioned downstream of or in parallel to ex and ft. Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 4. Par-1 functionally interacts with the components of the Hpo pathway. (A–B′) Loss of Par-1 induced a phenotype that was suppressed by Yki overexpression. The protein levels of DIAP1 in the wing discs expressing UAS-Par-1 RNAi (A–A′) or coexpressing UAS-Yki and UAS-Par-1 RNAi (B–B′) with hh-Gal4 were detected. 
Note that Yki overexpression overcame the inhibitory effect of Par-1 RNAi on diap1 expression. The arrows indicate the P-compartment. (C) Par-1 inhibits Yki phosphorylation. S2 cells were transfected with the indicated constructs. Phosphorylated Yki was detected using the p-Yki antibody, which recognizes the phosphorylated site of Yki at Ser168. (D–E′) ex functions upstream of par-1 in the Hpo pathway. Wing discs expressing UAS-ex RNAi (D–D′) or coexpressing UAS-ex RNAi and UAS-Par-1 RNAi (E–E′) with hh-Gal4 were subjected to immunostaining. The transgene expression regions were marked by the lack of Ci (red) staining and are indicated by arrows. Note that ex RNAi expression resulted in an enlarged P-compartment and increased expression of diap1-GFP 3.5, whereas coexpression with Par-1 RNAi fully suppressed these phenotypes. (F–F′″) wts functions downstream of par-1 in the Hpo pathway. Clones were generated using the MARCM system. The genotypes were the following: ey-flp, Ubi-Gal4, UAS-GFP; FRT82B/FRT82B Gal80 (F), ey-flp, Ubi-Gal4, UAS-GFP; Par-1-RNAi; FRT82B/FRT82B Gal80 (F′), ey-flp, Ubi-Gal4, UAS-GFP; Par-1-RNAi; FRT82B wtslatsX1/FRT82B Gal80 (F″), and ey-flp, Ubi-Gal4, UAS-GFP; FRT82B wtslatsX1/FRT82B Gal80 (F′″). Note that Par-1 RNAi reduced the clone size, whereas wtslatsX1 rescued this phenotype. (G) Quantification of the relative clone size. The relative clone size was calculated as the GFP area divided by the entire disc area. All of these data were expressed as the mean ± SEM. **p<0.01. **p<0.001. n>5, for each group. (H) Par-1 modulates Wts phosphorylation status. S2 cells were transfected with the indicated plasmids. The cell lysates were harvested and followed by Western blot analysis. Note that the phosphorylation shift of Wts mediated by Hpo/Sav/Merlin/Tao-1 was partially blocked by Par-1 expression. The shifted Wts bands are indicated by the small circles. (I–I″) par-1 functions upstream of hpo in the Hpo pathway. 
Clones were generated using the MARCM system. The genotypes were the following: eyflp, ubi-Gal4, UAS-GFP; FRT42D/FRT42D Gal80 (I), eyflp, ubi-Gal4, UAS-GFP; FRT42D hpoBF33/FRT42D Gal80 (I′), eyflp, ubi-Gal4, UAS-GFP; FRT42D hpoBF33/FRT42D Gal80; Par-1-RNAi (I″). Note that the Hpo null clones caused tumorous growth, whereas Par-1 RNAi could not rescue this phenotype. https://doi.org/10.1371/journal.pbio.1001620.g004

Figure 5. Par-1 interacts with Hpo-Sav and regulates the phosphorylation of Hpo at Ser30. (A–B) Par-1 interacts with Hpo and Sav in vitro. S2 cells were transfected with HA-tagged full-length or truncated Par-1 and Hpo (A) or Sav (B) constructs. The cell lysates were immunoprecipitated, followed by Western blot analysis with the indicated antibodies. Note that weak binding (indicated by asterisks) between full-length Par-1/Par-1-C and Hpo and Sav was detected, whereas the N-terminal truncation of Par-1, which contained the kinase domain, showed a much stronger interaction signal. (C) Par-1 induces a phosphorylation shift of Hpo-KD in vitro. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to phosphorylation mobility shift assays. Note the phosphorylation shift of Hpo-KD in the presence of Par-1. Phos-tag was used to enhance the phosphorylation shift (see Materials and Methods for further details). (D) Par-1 regulates phosphorylation of Hpo-KD at Ser30 in S2 cells. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to a phosphorylation mobility shift assay. The Hpo Ser30 site was mutated to an alanine. Note that the Hpo(S30A) mutant did not shift in the presence of Par-1. (E–F) Par-1 induces the phosphorylation of Hpo-KD at Ser30 in S2 cells. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to Western blot analyses. 
Note that the phospho Hpo(Ser30) antibody could only detect Par-1-induced phosphorylation in the Hpo-KD samples but not in the Hpo(Ser30) mutant samples. The asterisks indicate non-specific bands. Lambda-PP indicates λ-phosphatase. (G) Par-1 inhibits Hpo(Thr195) phosphorylation. S2 cells were transfected with the indicated constructs. The cell lysates were immunoprecipitated, followed by Western blot analyses to detect p-Hpo(Thr195) levels. Note that Par-1 inhibited Hpo(Thr195) phosphorylation in a kinase-dependent manner, whereas the Hpo(S30A) mutant could not be inhibited. (H) Quantification of p-Hpo(Thr195) levels. p-Hpo(Thr195) levels were quantified using densitometry. The results were expressed as the mean ± SEM from three independent experiments. *p<0.05. (I) Hpo(S30A) results in a higher phosphorylation shift of Yki. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to a phosphorylation mobility shift assay. Note that the phosphorylation shift of Yki was enhanced in the presence of Hpo(S30A) and that the Hpo(S30A) mutant was resistant to Par-1-induced Yki dephosphorylation. (J–K) Hpo(S30A) shows enhanced activity compared with wild-type Hpo in vivo. Control wings (J) or wings expressing UAS-Hpo (J′) or UAS-Hpo(S30A) (J″) with C765 are shown. The relative wing size was quantified and compared using an unpaired t-test (K). The results represented the mean ± SEM. *p<0.05, **p<0.01, ***p<0.001 (n>6) for each genotype. Note that the Hpo(S30A) flies exhibited smaller wings than the Hpo flies. https://doi.org/10.1371/journal.pbio.1001620.g005

We then determined whether Par-1 regulated the Hpo pathway via Wts, which phosphorylates Yki at Ser168 to retain Yki in the cytoplasm [19]. 
By generating Par-1 RNAi clones in eye discs using the mosaic analysis with a repressible cell marker (MARCM) technique, we found that the Par-1 RNAi clones were extremely small compared to the control clones (compare Figure 4F′ with 4F), indicating that development of the Par-1 RNAi clones was impaired. Strikingly, we found that knockout of wts rescued the adverse developmental phenotypes of the Par-1 RNAi clones and that the size of the wts mutant clones expressing Par-1 RNAi was more comparable to that of the wts clones than to that of the Par-1 RNAi clones (Figure 4G and compare Figure 4F″ with 4F′″). These findings indicated that par-1 functioned upstream of wts. Because Wts was activated and phosphorylated by upstream components of the Hpo pathway, we then tested whether Par-1 regulated Wts phosphorylation. Par-1, but not Par-1-KD, reduced the phosphorylation-dependent mobility shift of Wts in the presence of Hpo/Sav/Merlin/Tao-1 (Figure 4H), suggesting that par-1 functioned upstream of wts and regulated Wts phosphorylation in a kinase-dependent manner. We then determined whether Par-1-regulated Hpo signaling was dependent on the activity of Hpo, which restricted tissue growth by phosphorylating Wts. By generating hpo mutant clones in Drosophila compound eyes using the MARCM system, we found that the ablation of Hpo resulted in tumor outgrowth (compare Figure 4I′ with Figure 4I). Furthermore, we found that Par-1 RNAi was incapable of reverting the growth advantage of hpo null clones (compare Figure 4I″ with Figure 4I′), indicating that Par-1 functioned upstream of hpo to regulate Hpo signaling.

Par-1 Interacts with Hpo-Sav and Regulates the Phosphorylation of Hpo at Ser30

Given that Par-1 functions upstream of hpo (Figure 4), we speculated that Par-1 modulated the function of the Hpo-Sav complex. To test our hypothesis, we first examined whether Par-1 bound to the Hpo-Sav complex using a co-immunoprecipitation assay. 
As expected, full-length Par-1 interacted with both Hpo and Sav, although these associations were weak (Figure 5A–5B). In addition, we found that the HA-tagged N-terminal fragment of Par-1 (Par-1-N, Figure S4A) associated strongly with both Hpo and Sav, whereas only a weak interaction between the Par-1 C-terminal fragment (Par-1-C, Figure S4A) and Hpo/Sav was detected (Figure 5A–5B). Importantly, the interaction between Par-1 and Hpo/Sav was specific because neither the full-length nor the N-terminal fragment of Par-1 co-immunoprecipitated with Merlin (unpublished data), Wts, or Mats (Figure S4B). Because the previous results showed that Par-1 functioned in a kinase-dependent manner in Hpo signaling, we performed a phosphorylation shift experiment using a Phos-tag gel to determine whether Par-1 affected the phosphorylation of Hpo and Sav in vitro. Phos-tag is a phosphate-binding compound that, when incorporated into polyacrylamide gels, results in an exaggerated mobility shift for phosphorylated proteins, which is dependent upon the degree of phosphorylation [18],[55]. We observed that Flag-Hpo exhibited a mobility shift when cotransfected with Par-1, but not with Par-1-KD, in S2 cells (Figure S5A). However, such a mobility shift was not detected for Sav when it was cotransfected with either Par-1 or Par-1-KD (Figure 6B, compare lanes 3–4 with lane 1). To remove the effect of Hpo auto-phosphorylation, a kinase-dead form of Hpo (Hpo-KD) was also tested in the phosphorylation shift experiment. As shown in Figure 5C, Par-1 also induced a phosphorylation mobility shift of Hpo-KD. These results suggested that Par-1 specifically induced Hpo phosphorylation in S2 cells.

Figure 6. Par-1 disrupts the association of the Hpo-Sav complex in a kinase-dependent manner. (A) Par-1 could destabilize Hpo-induced accumulation of Sav in a kinase-dependent manner. 
S2 cells were transfected with the indicated constructs followed by a Western blot analysis. CFP served as a loading control. (B) Par-1 inhibits the phosphorylation of Sav induced by Hpo. The mobility shift assay was employed. The loading volume was adjusted according to the total Sav protein level. (C) Par-1 disrupts the interaction of the Hpo-Sav complex. S2 cells were transfected with the indicated constructs followed by immunoprecipitation to test whether the interaction between Sav and endogenous Hpo was affected by Par-1. Note that less Sav interacted with Hpo in the presence of Par-1. (D–D″) Par-1 RNAi is incapable of inhibiting sav mutant-induced adult eye overgrowth. The clones were generated using the MARCM system. The genotypes are as follows: eyflp, ubi-Gal4, UAS-GFP; FRT82B/FRT82B Gal80 (D), eyflp, ubi-Gal4, UAS-GFP; FRT82B SavSH13/FRT82B Gal80 (D′), and eyflp, ubi-Gal4, UAS-GFP; Par-1-RNAi; FRT82B SavSH13/FRT82B Gal80 (D″). (E) The proposed mechanism of Par-1 regulation of the Hpo pathway (see text for further detail). https://doi.org/10.1371/journal.pbio.1001620.g006

To identify which sites on Hpo were affected by Par-1, we cotransfected Flag-tagged Hpo with HA-tagged Par-1 in S2 cells. The Flag-Hpo protein was then immunoprecipitated, separated using SDS-PAGE, and harvested for a semi-quantitative mass spectrometric (MS) analysis (details are described in the Materials and Methods). From the MS results, we identified four potential phosphorylation sites that were affected by Par-1 expression: S30, S66, Y365, and T615. We then generated different Hpo variants by individually mutating these candidate sites and examined how these mutations affected Par-1-induced Hpo phosphorylation. 
Interestingly, only Hpo(S30A) failed to generate mobility shifts in a direct Western blot analysis compared to wild-type Hpo protein (Figure S5C), while the other Hpo mutations, T615A, S66A, and Y365E, did not affect the mobility shift upon Par-1 induction (Figure S5B–S5C). Moreover, we also found that Hpo(S30A)-KD could not be shifted by Par-1 (Figure 5D). These results suggested that Par-1 may regulate the phosphorylation of Hpo at Ser30. To verify our prediction, an antibody that specifically recognized Hpo phosphorylated at Ser30 was generated. As shown in Figures 5E and S5D, Par-1, but not Par-1-KD, was able to induce Hpo phosphorylation at Ser30. To validate the phospho-Hpo S30 antibody, a phosphatase treatment was applied. Upon phosphatase treatment, the p-Hpo(S30) band was undetectable (Figure 5E, compare lane 5 with lane 2). To further test the specificity of this antibody, we cotransfected Hpo(S30) mutants with Par-1. As shown in Figure 5F and in Figure S5E, in the presence of Par-1, Hpo(S30A) could not be detected by the p-Hpo(Ser30) antibody, whereas wild-type Hpo(S30) was detected. These findings indicated that Par-1 regulated the phosphorylation of Hpo at Ser30.

Par-1-Induced Hpo Ser30 Phosphorylation Regulates Hpo Activity

Recent proteome-wide phosphorylation studies using Drosophila embryos [40] suggested that Hpo was phosphorylated at Ser30 in vivo, indicating an important role for the Ser30 site in the regulation of Hpo activity. To determine the biological significance of Hpo phosphorylation at Ser30 induced by Par-1, we first examined whether the Ser30 phosphorylation state affected Hpo phosphorylation at Thr195, which was important for Hpo activation. As shown in Figure 5G–5H, Par-1, but not Par-1-KD, significantly inhibited Hpo phosphorylation levels at Thr195, whereas this inhibitory effect was abolished when the Ser30 site was mutated. 
More importantly, phosphorylation at Thr195 was slightly elevated when Ser30 was mutated into an alanine (Figure 5G, compare lane 4 with lane 1, and Figure 5H). These findings suggested that Par-1 regulated Hpo activity by regulating Ser30 phosphorylation, which antagonized phosphorylation at the Thr195 site. It has been reported that the Hpo Thr195 site is not only auto-phosphorylated but also phosphorylated by Tao-1 [37],[38], which is a partner of Par-1 in the regulation of microtubule dynamics. Thus, we asked whether Par-1-induced phosphorylation at Ser30 also affected Tao-1-mediated phosphorylation at Thr195. As shown in Figure S5F, consistent with the results shown in Figure 5G–5H, Par-1 suppressed Tao-1-mediated phosphorylation at Thr195. The antagonistic effect of Par-1 and Tao-1 on Hpo phosphorylation at Thr195 motivated us to examine the interrelationship of Par-1 and Tao-1 in the Hpo pathway. We found that Tao-1 disrupted the Par-1-induced phosphorylation mobility shift of Hpo-KD (Figure S5F), suggesting that the function of Par-1 in the Hpo pathway was modulated by upstream signaling. Because Hpo(S30A) demonstrated a higher activity compared to wild-type Hpo, we examined whether Hpo(S30A) exerted its inhibitory effect on Yki. We found that Hpo(S30A) resulted in a dramatic Yki mobility shift, whereas wild-type Hpo resulted in a moderate phosphorylation of Yki (Figure 5I, compare lane 2 with lane 6). Furthermore, we found that co-transfection of the Hpo(S30A) mutant blocked Par-1-induced Yki dephosphorylation (Figure 5I, compare lane 3 with lane 7), which further supported our conclusion that Par-1 modulated Hpo activity by regulating Hpo phosphorylation at Ser30. To further investigate the role of Par-1-induced Hpo phosphorylation at Ser30 in Hpo activity regulation, we generated transgenic flies carrying attB-UAS-Hpo variants at the 75B1 attP locus, which ensured equal expression of the different Hpo variants. 
Because the flies were grown at room temperature, overexpression of Hpo under the control of GMR-Gal4, Ci-Gal4, and hh-Gal4 resulted in adult lethality; therefore, we selected weak drivers, such as C765, which induced gene expression moderately in the Drosophila wing, to compare the activity of Hpo variants. Consistent with our in vitro studies, Hpo(S30A) mutants exhibited much smaller wings compared to wild-type Hpo flies (Figure 5J–5K), indicating that Hpo(S30A) variants demonstrated a higher activity than wild-type Hpo in vivo. To further strengthen this conclusion, we used an inducible Ci-Gal4, which was controlled by a temperature-sensitive Gal80, to drive Hpo expression and thereby avoid the early lethality. Because some progeny were viable when Hpo expression was induced after pupa formation, we then compared the activity of Hpo variants by measuring the wing size of the survivors and calculating the mortality rate. We found that survivors with Hpo expression maintained a relatively normal wing size, while survivors with Hpo(S30A) expression demonstrated much smaller wings compared to controls (Figure S5G–S5H). Meanwhile, Hpo(S30A) flies exhibited a higher mortality rate compared to wild-type Hpo flies (Figure S5I). These findings suggested that the activity of Hpo(S30A) mutants was higher than that of wild-type Hpo. Taken together, on the basis of the above biochemical and in vivo evidence, we speculated that Par-1 inhibited Hpo activity via the regulation of Hpo phosphorylation at the Ser30 site.

Par-1 Inhibits Hpo-Sav Association in a Kinase-Dependent Manner

We observed that Par-1 blocked Hpo-induced Sav stabilization in a kinase-dependent manner (Figure S5C, and Figure 6A, compare lane 3 with lane 2). Thus, we examined whether Par-1-induced Sav destabilization was dependent on the phosphorylation of the Hpo Ser30 site. 
Interestingly, we found that the stabilization of Sav by Hpo was not affected by Par-1-induced Hpo phosphorylation at Ser30 because Hpo(S30A) mutants were still able to stabilize Sav, and this stabilization could be reversed by Par-1 (Figure S6A). In addition to the change in Sav protein levels, Par-1 also decreased the Hpo-induced phosphorylation of Sav in a kinase-dependent manner (Figure 6B, compare lanes 5–6 with lane 2). Given that full-length Par-1 weakly interacted with both Hpo and Sav (Figure 5A–5B), we then determined whether Par-1 affected the association between Hpo and Sav. As shown in Figure 6C, Par-1, but not Par-1-KD, impaired the association of endogenous Hpo and transfected Sav, suggesting that the interaction between Hpo and Sav was disrupted by Par-1 overexpression. To mimic the disruption of the Hpo-Sav complex, we ablated sav using the MARCM system. We found that Par-1 RNAi was incapable of inhibiting sav mutant-induced clone growth (Figure S6B–S6C) and adult eye overgrowth (Figure 6D–6D″), suggesting that Par-1 functioned upstream of sav. These observations supported the notion that Par-1 kinase activity was important to restrain the function of the Hpo-Sav complex. On the basis of the evidence provided above, we propose a model of how Par-1 restricts the activity of the Hpo pathway (Figure 6E). Par-1 regulates Hpo phosphorylation at Ser30 to modulate the Hpo kinase activity; simultaneously, Par-1 also promotes the dissociation of Sav from Hpo, resulting in the dephosphorylation and destabilization of Sav, thereby repressing the function of the Hpo-Sav complex.

The Mammalian Homologue of Par-1, MAP/MARK, Regulates the Hpo Pathway

An evolutionarily conserved function of Par-1 in regulating microtubule dynamics has been reported [41]. To determine whether the function of Par-1 in the Hpo pathway is conserved, the human homologues of Par-1, MARK1 and MARK4, were cloned. 
The Gal4-TEAD4 reporter [56] was used to examine the effect of MARK1 and MARK4 on the mammalian Hpo pathway. As expected, both MARK1 and MARK4, but not the kinase-dead form of MARK4, activated the YAP transcription co-activator activity (Figures 7A and S7A). Since MARK4 activated YAP more than MARK1, we investigated whether MARK4 affected YAP phosphorylation. Indeed, the phosphorylation levels of YAP were significantly decreased upon MARK4 overexpression (Figure 7B), indicating that MARK inhibited Hpo signaling in mammals. Finally, we investigated whether MARK also induced phosphorylation of MST (the human homologue of Hpo). We found that both MARK4 and MARK1 induced mobility shifts of MST2 (Figures 7C and S7B). Taken together, these findings suggested that the inhibitory function of Par-1 on Hpo signaling was conserved from Drosophila to mammals. Our study suggested that Par-1 may have a procarcinogenic role because its hyperactivation in Drosophila is sufficient to induce tissue overgrowth, and in mammals, Par-1 is sufficient to activate YAP. Interestingly, MARK4 has been suggested to play a role in hepatocellular carcinogenesis and gliomagenesis [51],[52]. However, whether MARK1 is also involved in carcinogenesis is largely unknown. To address this question, we characterized the mutation and expression of MARK1 in different cancer samples using the COSMIC and GEO databases. Although no mutation or deletion of MARK1 has been reported in the tumors that were surveyed, the transcription levels of MARK1 showed significant upregulation in squamous lung cancer samples and during the progression of prostate cancer (Figure 7D–7E). Taken together, these findings suggested that Par-1 was a potential oncogene and that its regulatory role in Hpo signaling could be conserved.

Figure 7. MARK, the mammalian homologue of Par-1, regulates the Hpo pathway. (A) MARK4 activates YAP transcriptional activity. 
The indicated plasmids were co-transfected with the 5XUAS-luc reporter, Gal4-TEAD4, Flag-YAP, and CMV-β-galactosidase construct into HEK293T cells. Luciferase activity was measured and normalized against β-galactosidase activity. (B) Ectopic expression of MARK4 inhibits YAP phosphorylation. HEK293T cells were transfected with the indicated constructs, and YAP was immunoprecipitated using the anti-Flag antibody. YAP phosphorylation was detected using Western blot analysis with a phospho-YAP specific antibody and by determining its mobility on a Phos-tag-containing SDS-PAGE gel. (C) MARK4 induces the phosphorylation of MST2. HEK293T cells were transfected with the indicated constructs, and MST2 phosphorylation was analyzed by electrophoresis on a Phos-tag-containing gel and through Western blotting with an anti-Flag antibody. (D) MARK1 expression is significantly upregulated in squamous lung cancer biopsy specimens compared with matched control specimens. The expression profiling data were downloaded from the GEO dataset GDS1312. These data were expressed as the mean ± SEM and were analyzed using the paired t-test. *p<0.05. (E) Prostate cancer progression is accompanied by an increased expression of MARK1. PIN, prostatic intraepithelial neoplasia; PCA, localized prostate cancer; MET, metastatic prostate cancer. The expression data were obtained from GEO dataset GDS3289. All of the results were expressed as the mean ± SEM. ***p<0.001. https://doi.org/10.1371/journal.pbio.1001620.g007

Discussion

The Hpo signaling pathway has emerged as a conserved pathway that controls tissue growth and balances tissue homeostasis via the regulation of the downstream Sd-Yki transcription complex. Despite the importance of this pathway in development and carcinogenesis [2],[6], many unknown regulators of the Hpo pathway remain to be identified. Here, we identified Par-1 as one such Hpo pathway regulator via a genetic overexpression screen using Drosophila EP lines. 
In this study, we demonstrated that Par-1 was essential for the restriction of Hpo signaling. We also demonstrated that overexpression of Par-1 promoted tissue growth via the inhibition of the Hpo pathway, whereas loss of Par-1 promoted Hpo signaling to suppress growth and induce apoptosis. Using the Drosophila eye and wing imaginal discs as well as cultured cells, we provide the first genetic and biochemical evidence for a function of Par-1 in the Hpo pathway. Although the conserved function of Hpo has been well studied, the regulatory mechanism of its kinase activity is still largely obscure. Currently, the regulation of Hpo kinase activity is believed to depend mainly on autophosphorylation, which alters the phosphorylation status of the Thr195 site [37],[53],[57]. However, whether the uncharacterized phosphorylation events of Hpo, which have been identified in several recent proteome-wide phosphorylation studies [39],[40], contribute to the regulation of Hpo activity is still unknown. By studying the mechanism underlying Par-1 function in Hpo signaling, we demonstrated that Par-1 induced Hpo phosphorylation at Ser30 and that this led to the regulation of Hpo kinase activity. Although we have extensively studied how Par-1 regulates the Hpo pathway in this study, several unresolved questions remain. The interaction between Par-1 and Hpo/Sav may be tightly regulated because full-length Par-1 only weakly interacted with Hpo/Sav, unlike the N-terminal fragment of Par-1 (Figure 5A–5B). However, the triggering signal for Par-1 to interact with Hpo/Sav is still unknown. It has been reported that Par-1 was activated by Tao-1 and LKB1 [49],[50]. In this study, we established that Par-1 antagonized Tao-1 in Hpo signaling, and interestingly, in Drosophila, the antagonistic relationship between Par-1 and Tao-1 in microtubule regulation has been previously reported [57]–[59]. Thus, it is unlikely that Tao-1 functions as the trigger. 
We then investigated whether LKB1 functioned as an activator of Par-1 in Hpo signaling by expressing the LKB1 transgene in different organs. Unlike Par-1, ectopic LKB1 expression limited both wing and eye growth (unpublished data), indicating that LKB1 was also not the trigger. We have shown that Par-1 and Tao-1 exhibited opposing effects on Hpo signaling (Figure S5F). Given that Tao-1 and Par-1 were partners that regulated microtubule dynamics via the phosphorylation of Tau [48], Tau may have a function in Hpo signaling. To investigate this hypothesis, we employed genetic and biochemical studies and found that Tau RNAi failed to suppress the expression of Hpo pathway-responsive genes (Figure S8A–S8B). In addition, Tau did not trigger Hpo phosphorylation and Sav dissociation in vitro (Figure S8C–S8D), indicating that Par-1 regulated Hpo signaling independent of Tau. Interestingly, it has been previously suggested that Par-1 did not regulate Tau activity in Drosophila [46], indicating an evolutionary difference between Par-1 and Tau function. We have provided evidence that Par-1 regulated Hpo signaling via the phosphorylation of Hpo or the destruction of the Hpo/Sav complex. Because Par-1 is a well-known polarity regulator and polarity components, such as Crumbs and Lgl, have been shown to be involved in the Hpo signaling pathway [35], Par-1 may regulate Hpo signaling via a polarity complex, or its activity might be regulated via a polarity complex. Indeed, the localization of Crumbs and Patj was affected by Par-1 expression (unpublished data). Thus, further studies on polarity complexes and Hpo signaling will help elucidate this problem.

Materials and Methods

Cloning, Mutants, Transgenes, and Drosophila Genetics

Par-1 fragments were amplified from the BDGP DGC Clone (number RE47050) using PCR. The sequence of full-length Par-1 used in this study was the same as Par-1-N1S (accession number NP_001014542),
which has been previously described [43]. The Par-1 kinase-dead mutant was generated by converting the conserved Ser412 and Thr408 into alanine at the activation loop of the Par-1 catalytic domain. All of the primers used in this study are available upon request. The EP library for the screen was a gift from Jianming Chen (Third Institute of Oceanography, State Oceanic Administration, China). Par-1w3 is a null allele from the St Johnston lab, and FRT/FLP-mediated mitotic recombination was used to generate mutant clones, as previously described [20]. The genotypes used to generate the clones were the following: FRTG13 Par-1w3/eyflp; FRTG13 ubi-GFP. The Par-1 RNAi fly line was purchased from the Bloomington Drosophila Stock Center (stock number 32410). To generate the Par-1 RNAi-2 transgenic fly, an artificial microRNA method was adopted, which was reported to efficiently silence gene expression [60]. Briefly, we designed two hairpin oligos that targeted the 1227–1247 base pair and the 1551–1571 base pair regions of Par-1-N1S (accession number NM_001014542). Next, these two hairpin oligos were ligated to create a tandem hairpin RNA. The complete construction procedure can be found in the supplementary information of Wang et al.'s study [61]. The following transgenes were used in this study: bantam-lacZ (a gift from Wei Du, The University of Chicago), UAS-ex-RNAi (V22994, VDRC), UAS-fat-RNAi (V9396, VDRC), and UAS-Tau-RNAi (V25024, VDRC). Other stocks included: bantam sensor mic32-GFP, ex-lacZ, diap1-GFP3.5, hh-Gal4, GMR-Gal4, MS1096, act > CD2 > Gal4, eyless-Gal4, diap1-lacZ, UAS-Yki, UAS-Yki-RNAi, UAS-Sd-RNAi, hpoBF33, wtslatsX1, and SavSH13, which have all been previously described [20],[53]. The generation of the transgenes at the attP locus has been previously described [20]. Unless otherwise indicated, all of the flies were cultured at 25°C. 
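The two artificial microRNA target regions quoted above (1227–1247 and 1551–1571) can be checked with a short script. The following is a minimal sketch, not the authors' design procedure: it assumes the coordinates are 1-based and inclusive, and the helper names `target_window` and `reverse_complement` are hypothetical. The transcript sequence itself is not reproduced here and would have to be supplied by the reader.

```python
# Illustrative only: coordinate handling for the two target regions named in
# the text. Assumption: positions are 1-based and inclusive.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Target regions as quoted in the Methods text.
TARGET_WINDOWS = [(1227, 1247), (1551, 1571)]


def target_window(seq, start, end):
    """Return the subsequence from start to end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("window lies outside the sequence")
    return seq[start - 1:end]


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def window_lengths(windows):
    """Length in nucleotides of each inclusive window."""
    return [end - start + 1 for start, end in windows]


if __name__ == "__main__":
    # Both quoted regions span the same number of nucleotides.
    print(window_lengths(TARGET_WINDOWS))
```

Under the inclusive-coordinate assumption, both regions are 21 nucleotides long, which matches the usual length of a mature microRNA guide strand.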
Cell Culture, Transfection, Immunoprecipitation, Western Blot Analysis, Immunostaining, and the Luciferase Reporter Assay

S2 cells were maintained in Drosophila Schneider's Medium (Invitrogen) supplemented with 10% heat-inactivated fetal bovine serum, 100 U/ml of penicillin, and 100 mg/ml of streptomycin. The cells were incubated at 25°C in a humidified air atmosphere. Plasmid transfection was performed using Lipofectamine (Invitrogen), according to the manufacturer's instructions. For all of the transfection experiments, a ubiquitin-Gal4 construct was co-transfected with the pUAST expression vectors. The procedures for the immunoprecipitation, Western blotting, and immunostaining analyses were previously described [20],[53]. The following antibodies were used in the immunoprecipitation or Western blot analyses: rabbit anti-Hpo antibody [53], rabbit anti-Phospho Hpo (Thr195) antibody (Cell Signaling Technology), mouse anti-Flag antibody (Sigma), mouse anti-Myc antibody (Santa Cruz), mouse anti-V5 antibody (Invitrogen), and mouse anti-GFP/CFP antibody (Santa Cruz). The antibodies used in the immunostaining experiments were the following: rabbit anti-Par-1 antibody (a gift from the Montell lab, Lerner Research Institute), mouse anti-DIAP1 (a gift from Bruce A. Hay, California Institute of Technology), rat anti-cubitus interruptus (Ci) antibody (Developmental Studies Hybridoma Bank, DSHB), rabbit anti-lacZ antibody (Invitrogen), mouse anti-CD2 antibody (Invitrogen), and rabbit anti-cleaved caspase-3 antibody (Cell Signaling Technology). The rabbit anti-Phospho Hpo(Ser30) antibody was generated by Abgent. For the luciferase reporter assay, the 3×Sd2-Luc reporter has been previously described [20]. The luciferase assay was performed using the Dual Luciferase Assay System (Promega).

Phosphorylation Mobility Shift Assay

For the phosphorylation mobility shift assays, Phos-Tag AAL-107 (FMS Laboratory) was introduced to enlarge the mobility shift. 
The operating procedure was performed according to the manufacturer's instructions. For all of the mobility shift assays, the protein samples were run on an SDS-PAGE gel at a low voltage. According to the molecular weight of the protein, a 6% resolving gel was used for the Wts mobility shift assays and an 8% resolving gel for the Sav and Hpo mobility shift assays.

In Vitro Kinase Assay

Immunoprecipitated cell lysates or purified protein were directly incubated in 20–40 µl of kinase assay buffer (250 mM HEPES, pH 7.4, 0.2 mM EDTA, 1% glycerol, 150 mM NaCl, and 10 mM MgCl2). The reactions were initiated by the addition of an ATP mixture (2 µl 1 mM ATP, 0.2 µl [γ-32P] ATP [10 mCi/ml]) and then incubated at 30°C for 30 min. The reactions were terminated by the addition of an SDS sample buffer. Next, the samples were boiled for 5 min at 100°C followed by SDS-PAGE and autoradiography.

Mapping Phosphorylation Sites by Mass Spectrometry

S2 cells were transfected with Flag-Hpo or cotransfected with Flag-Hpo and Par-1. At 48 h after transfection, the cells were harvested and then lysed. SDS-PAGE and Colloidal Blue staining (Invitrogen, LC6025) were then performed on the protein samples. The Hpo protein was cut from the gel and sent to the Protein Center, SIBCB for mass spectrometric analysis. A detailed procedure of the mass spectrometric analysis may be obtained from the Protein Center, SIBCB. The candidate sites were identified by the increased phosphorylation abundance in the cells cotransfected with Flag-Hpo and Par-1 versus the cells transfected with Flag-Hpo alone.

Statistical Analysis

All of the data in this study were expressed as the mean ± standard error of the mean (SEM) and were analyzed using Student's t test in R 2.9.0. The results were considered statistically significant if p<0.05.
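The quantities named in the Statistical Analysis paragraph (mean ± SEM and Student's t test) can be written out as a short sketch. The study reports using R 2.9.0; the Python below is an illustrative substitute rather than the authors' script, and the function names and the example measurements are hypothetical. It computes the unpaired, equal-variance t statistic and its degrees of freedom. Converting these to a p value requires the t distribution, which is left to a statistics package.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # SEM = sample standard deviation / sqrt(n)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def student_t(group_a, group_b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, df


if __name__ == "__main__":
    # Hypothetical relative wing sizes, for illustration only.
    control = [1.00, 0.98, 1.03, 1.01, 0.99, 1.02, 0.97]
    variant = [0.81, 0.85, 0.79, 0.83, 0.80, 0.84, 0.82]
    print("control mean, SEM:", mean_sem(control))
    print("variant mean, SEM:", mean_sem(variant))
    print("t, df:", student_t(control, variant))
```

The pooled-variance form is the classical Student's test; the paired t test used for the matched specimens in Figure 7D would instead be computed on the per-pair differences.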
The sequence of full-length Par-1 used in this study was the same as Par-1-N1S (accession number NP_001014542). which has been previously described [43]. The Par-1 kinase-dead mutant was generated by converting the conserved Ser412 and Thr408 into alanine at the activation loop of the Par-1 catalytic domain. All of the primers used in this study are available upon request. The EP library for the screen was a gift from Jianming Chen (Third Institute of Oceanography, State Oceanic Administration, China). Par-1w3 is a null allele from the St Johnston lab, and FRT/FLP-mediated mitotic recombination was used to generate mutant clones, as previously described [20]. The genotypes used to generate the clones were the following: FRTG13 Par-1w3/eyflp; FRTG13 ubi-GFP. Par-1 RNAi fly was purchased from the Bloomington Drosophila Stock Center (stock number 32410). To generate the Par-1 RNAi-2 transgenic fly, an artificial microRNA method was adopted, which was reported to efficiently silence gene expression [60]. Briefly, we designed two hairpin oligos that were targeted to the 1227–1247 base pair and the 1551–1571 base pair regions of Par-1-N1S (accession number NM_001014542). Next, these two hairpin oligos were ligated to create a tandem hairpin RNA. The procedure for this complete construction can be found in the supplementary information of Wang et al.'s study [61]. The following transgenes were used in this study: bantam-lacZ (a gift from Wei Du, The University of Chicago), UAS-ex-RNAi (V22994, VDRC), UAS-fat-RNAi (V9396, VDRC), and UAS-Tau-RNAi (V25024, VDRC). Other stocks included: bantam sensor mic32-GFP, ex-lacZ, diap1-GFP3.5, hh-Gal4, GMR-Gal4, MS1096, act > CD2 > Gal4, eyless-Gal4, diap1-lacZ, UAS-Yki, UAS-Yki-RNAi, UAS-Sd-RNAi, hpoBF33, wtslatsX1, and SavSH13, which have all been previously described [20],[53]. The generation of the transgenes at the attP locus has been previously described [20]. Unless otherwise indicated, all of the flies were cultured at 25°C. 
Cell Culture, Transfection, Immunoprecipitation, Western Blot Analysis, Immunostaining, and the Luciferase Reporter Assay S2 cells were maintained in Drosophila Schneider's Medium (Invitrogen) supplemented with 10% heat-inactivated fetal bovine serum, 100 U/ml of penicillin, and 100 mg/ml of streptomycin. The cells were incubated at 25°C in a humidified air atmosphere. Plasmid transfection was performed using Lipofectamine (Invitrogen), according to the manufacturer's instructions. For all of the transfection experiments, a ubiquitin-Gal4 construct was co-transfected with the pUAST expression vectors. The procedures for the immunoprecipitation, Western blotting, and immunostaining analyses were previously described [20],[53]. The following antibodies were used in the immunoprecipitation or Western blot analyses: rabbit anti-Hpo antibody [53], rabbit anti-Phospho Hpo (Thr195) antibody (Cell Signaling Technology), mouse anti-Flag antibody (Sigma), mouse anti-Myc antibody (Santa Cruz), mouse anti-V5 antibody (Invitrogen), and mouse anti-GFP/CFP antibody (Santa Cruz). The antibodies used in the immunostaining experiments were the following: rabbit anti-Par-1 antibody (a gift from the Montell lab, Lerner Research Institute), mouse anti-DIAP1 (a gift from Bruce A. Hay, California Institute of Technology), rat anti-cubitus interruptus (Ci) antibody (Developmental Studies Hybridoma Bank, DSHB), rabbit anti-lacZ antibody (Invitrogen), mouse anti-CD2 antibody (Invitrogen), and rabbit anti-cleaved caspase-3 antibody (Cell Signaling Technology). The rabbit anti-Phospho Hpo(Ser30) antibody was generated by Abgent. For the luciferase reporter assay, the 3×Sd2-Luc reporter has been previously described [20]. The luciferase assay was performed using the Dual Luciferase Assay System (Promega). Phosphorylation Mobility Shift Assay For the phosphorylation mobility shift assays, Phos-Tag AAL-107 (FMS Laboratory) was introduced to enlarge the mobility shift. 
Supporting Information Figure S1. (A–A′) Drosophila wings of wild type (A) or wings expressing Par-1 (A′) with hh-Gal4. The posterior compartments were indicated by a pseudo-gray color. 
(B) Quantification of the relative P-compartment area of the wings. The results were calculated as the area of the P-compartment divided by the entire wing area. The results represent the mean ± SEM. *p<0.05 (n>6 for each genotype). (C–C″) Overexpression of Par-1 promotes the transcription of diap1. Cells expressing UAS-Par-1 were labeled by the lack of CD2 expression (indicated by arrows). Note the upregulation of diap1 transcription via ectopic Par-1 expression. (D) Inability of Par-1-KD to autophosphorylate. Myc-tagged Par-1 or Par-1-KD was immunoprecipitated and subjected to an in vitro kinase assay. (E–F) Western blot analysis of extracts from third-instar larval eye discs (E) and wing discs (F) to show the expression level of Par-1 and Par-1-KD. https://doi.org/10.1371/journal.pbio.1001620.s001 (TIF) Figure S2. (A–C′) Wild-type wing discs (A–A′) or wing discs expressing Par-1 RNAi (B–B′) or Par-1 RNAi-2 (C–C′) with hh-Gal4 were immunostained with anti-Ci (red) and anti-Par-1 (green) to detect RNAi efficiency. The arrows indicate the P-compartment. Both RNAi lines could efficiently knock down the endogenous Par-1 protein. (D–D′) Adult wings of wild type (D) or wings expressing Par-1 RNAi-2 (D′) with MS1096. Note the reduced organ size induced by Par-1 RNAi-2. (E–G′) Wing discs expressing Par-1 RNAi-2 in the P-compartment with hh-Gal4 were immunostained to demonstrate the expression of ex-LacZ (E–E′), diap1-GFP3.5 (F–F″), and bantam sensor mic32-GFP (G–G″). Note that Par-1 RNAi-2 downregulated the expression of Hpo-responsive genes. The arrows indicate the P-compartment. https://doi.org/10.1371/journal.pbio.1001620.s002 (TIF) Figure S3. (A–B′) The Par-1 gain-of-function phenotype is blocked by Yki RNAi. UAS-2*Myc-Par-1 (A–A′) and UAS-2*Myc-Par-1; UAS-Yki RNAi (B–B′) were expressed under the control of act>CD2>Gal4 to detect changes in diap1-LacZ. Cells expressing the indicated transgenes were marked by Myc tag (indicated by arrows). 
Note that the upregulation of diap1-LacZ induced by ectopic Par-1 was completely suppressed by Yki RNAi. (C–F′) Par-1 is functionally dependent on Sd in the Hpo pathway. Adult wings expressing UAS-2*Myc-Par-1; UAS-Sd RNAi (C′) showed a phenotype similar to that of Sd RNAi (C). Furthermore, wild-type wing discs (D–D′), wing discs expressing UAS-2*Myc-Par-1 (E–E′) or UAS-Myc-Par-1; UAS-Sd RNAi (F–F′) in the P-compartment were immunostained to demonstrate the expression of diap1-GFP3.5. The P-compartment was marked by the loss of Ci or Myc tag (red) and is indicated by arrows. Note that coexpression of Sd RNAi reversed the upregulation of diap1-GFP3.5 induced by Par-1. (G–G′″) Drosophila wings of the indicated genotypes are shown. Note that the enlarged wing size induced by ex RNAi was reversed by Par-1 RNAi. (H–H′″) Drosophila wings of the indicated genotypes are shown. Note that Par-1 RNAi reduced wing size even in the fat-RNAi condition. https://doi.org/10.1371/journal.pbio.1001620.s003 (TIF) Figure S4. (A) A schematic representation of the Par-1 full-length structure or its truncated forms. (B) Immunoprecipitation between Par-1-N, Wts and Mats. S2 cells were transfected with the indicated constructs followed by co-immunoprecipitation. Note that Wts and Mats were unable to interact with the N-terminal region of Par-1. https://doi.org/10.1371/journal.pbio.1001620.s004 (TIF) Figure S5. (A) Par-1 regulates Hpo phosphorylation in vitro. S2 cells were transfected with the indicated constructs. Cell lysates were subjected to the phosphorylation mobility shift assay. Note the phosphorylation shift of Hpo in the presence of Par-1 but not with Par-1 KD. (B–C) Hpo(S30A) mutants blocked the Par-1-induced Hpo phosphorylation shift but Hpo(T615A), Hpo(S66A) and Hpo(Y365E) did not. (D–E) Par-1 induces Ser30 phosphorylation of Hpo in S2 cells. S2 cells were transfected with the indicated constructs. The cell lysates were subjected to a Western blot analysis. 
Note that the phospho Hpo(Ser30) antibody could only detect Par-1-induced phosphorylation of Hpo but not of Hpo(S30A). (F) Antagonism between Par-1 and Tao-1 regulates Hpo phosphorylation. S2 cells were transfected with the indicated plasmids, and the cell lysates were subjected to a direct Western blot analysis or a phosphorylation mobility shift assay. Note that Tao-1 partially inhibited the Par-1-induced Hpo phosphorylation mobility shift, while Par-1 inhibited Tao-1-induced Hpo Thr195 phosphorylation. (G–I) Hpo(S30A) showed enhanced activity compared with wild-type Hpo in vivo. Adult wings of wild-type (G), wings expressing Hpo (G′), and wings expressing Hpo(S30A) (G″) under the control of tub-Gal80ts; Ci-Gal4. Note that the Hpo Ser30 mutant induced smaller wings (H) and a higher mortality rate (I) compared to wild-type Hpo. The relative wing size (H) was quantified using an unpaired t-test. The results represent the mean ± SEM. **p<0.01 (n>6 for each genotype). The percentage of lethal flies (I) was calculated by dividing the number of lethal pupae by the total number of pupae. To induce Hpo expression, fly progeny were transferred to a 29°C incubator at different developmental stages. https://doi.org/10.1371/journal.pbio.1001620.s005 (TIF) Figure S6. (A) Par-1 destabilizes Hpo-induced Sav accumulation independent of Par-1 phosphorylation at the Hpo Ser30 site. S2 cells were transfected with the indicated constructs followed by a Western blot analysis. Note that both wild-type Hpo and Hpo(S30A) could stabilize Sav. In addition, stabilized Sav can be destabilized by Par-1 overexpression. (B) par-1 functions upstream of sav in the Hpo pathway. Clones were generated using the MARCM system. The following genotypes were used: ey-flp, Ubi-Gal4, UAS-GFP; FRT82B SavSH13/FRT82B Gal80 (left panel), and eyflp, ubiGal4, UAS-GFP; Par-1-RNAi; FRT82B SavSH13/FRT82B Gal80 (right panel). (C) Quantification of the relative clone size. 
The relative clone size was calculated as the GFP area divided by the entire disc area. All of these data are expressed as the mean ± SEM. **p<0.01. **p<0.001. n>5, for each group. https://doi.org/10.1371/journal.pbio.1001620.s006 (TIF) Figure S7. (A) MARK1 significantly enhances the transcriptional activity of Yap. HEK293T cells were transfected with the indicated plasmids, and the cell lysates were directly subjected to a dual luciferase reporter assay. Note that MARK1 synergized with Yap to promote the transcriptional activity of TEAD. (B) MARK1 induces an MST2 phosphorylation mobility shift. HEK293T cells were transfected with the indicated plasmids and cell lysates were directly subjected to a phosphorylation mobility shift assay. Note that MST2 showed a shifted band upon MARK1 coexpression. https://doi.org/10.1371/journal.pbio.1001620.s007 (TIF) Figure S8. (A–B′) Wing discs expressing Tau RNAi (A–B′) with hh-Gal4 were immunostained with anti-Ci (red) and anti-DIAP1 (green, A–A′) to demonstrate the expression level of DIAP1 (A–A′) and mic32-GFP (B–B′). Note that the expression of Tau RNAi failed to affect Hpo pathway-responsive gene expression. The arrows indicate the P-compartment. (C–D) S2 cells were transfected with the indicated constructs followed by Western blot analyses. Note that Tau affects neither the phosphorylation status of Sav (C) nor the mobility shift of Hpo (D). https://doi.org/10.1371/journal.pbio.1001620.s008 (TIF) Acknowledgments We would like to thank Yingzi Yang, Wei Du, Dangsheng Li, and Jinqiu Zhou for their critical reading of the manuscript and Wei Du, Daniel St. Johnston, Jackie Hall, Denise Montell, Bruce A. Hay, Jocelyn A. McDonald, and Jianming Chen for supplying various reagents. We also thank the Bloomington and Vienna Stock Centers and DSHB and DGRC (supported by NIH grant OD010949-10) for fly stocks.
Bursting with Randomness: A Simple Model for Stochastic Control of Gene Expression doi: 10.1371/journal.pbio.1001622 pmid: 23940459
If a gene's promoter were a light switch, you'd probably call an electrician. That's because rather than simply turning on and off in a limited and predictable way, many genes—whose expression is controlled by their promoters—are expressed in bursts, with expression fluctuating randomly over time. Analyzing the nucleosome configuration of single gene molecules by electron microscopy. https://doi.org/10.1371/journal.pbio.1001622.g001 What is the molecular basis of this transcriptional bursting? Previous work has suggested, but not directly demonstrated, that gene promoters can assume alternative structural configurations, raising the possibility that the promoter might randomly flip among these alternatives. Transcription requires that the polymerase machinery gain access to the promoter, and that access can be inhibited when DNA is tightly wrapped around histone proteins to form nucleosomes, the basic protein–DNA unit of structure within chromosomes. Researchers have speculated that these alternative promoter configurations might be linked to the number and distribution of nucleosomes within the promoter, but there has been no proof. The best way to test that hypothesis is by direct observation, and that is what Christopher Brown, Hinrich Boeger, and colleagues set out to do. Their work confirmed that the promoter of PHO5, a well-studied yeast gene, does indeed adopt alternative configurations based on nucleosome distribution. And they went further, examining the probability distribution of those nucleosomes and constructing a mechanistic model that accounts for that distribution, and hence the phenomenon of transcriptional bursting. The key to their study was to chemically “freeze” the promoter regions of multiple PHO5 genes, and then use the electron microscope to determine the precise configuration of each. 
The gene's promoter includes three nucleosome binding sites, and they were able to tabulate the relative frequency of each of the eight possible binding states (i.e., occupied-occupied-occupied, occupied-occupied-empty, etc.). Next, they built a mathematical probability model to test various assumptions about the transitions among the binding states. They distinguished three kinds of transitions: assembly of a nucleosome on a binding site, disassembly and removal, or sliding of a nucleosome from one site to another. They found that assembly and disassembly alone accounted for the number of observed nucleosomes, but that sliding was needed to account for their distribution. In particular, it appeared that sliding out of the middle position toward the ends was common. Further, the best fit with the data came when disassembly at the first position occurred only when the second position was unoccupied. Since transcription is maximally repressed in the fully occupied state and maximally permitted in the fully unoccupied state, the stochastic transitions among these and the intermediate states suggest that transcription of the PHO5 gene occurs in random bursts, the authors concluded. Further experimentation and modeling allowed them to predict that binding states in which the middle position was unoccupied were the most conducive to transcription. But the transcription rate was not maximal in all conducive states; other, as-yet unidentified factors presumably play a role in determining the degree of transcription for a given conducive state. The model the authors developed provides a structural basis for transcriptional bursting consistent with a large body of data. While other models might also explain that data, none do so as simply and with as few “working parts” as theirs—generally a sign that a model is pointing in the right direction. 
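The kind of model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' model: the rate constants (`k_asm`, `k_dis`, `k_slide`), the function names, and the treatment of sliding as a symmetric move into any adjacent empty site are all hypothetical choices made here. The optional `gated_first_site` rule mirrors the best-fit condition mentioned in the text, that disassembly at the first position occurs only when the second position is unoccupied. The sketch enumerates the eight binding states and iterates a uniformised Markov chain to estimate how often each configuration occurs in the long run.

```python
from itertools import product

SITES = 3  # nucleosome positions in the promoter


def transitions(state, k_asm, k_dis, k_slide, gated_first_site):
    """Yield (next_state, rate) pairs for one promoter configuration.

    `state` is a tuple of 0/1 flags, one per site (1 = nucleosome present).
    All rates are hypothetical illustration values, not fitted parameters.
    """
    for i, occupied in enumerate(state):
        flipped = state[:i] + (1 - occupied,) + state[i + 1:]
        if not occupied:
            yield flipped, k_asm  # assembly on an empty site
        elif not (gated_first_site and i == 0 and state[1]):
            yield flipped, k_dis  # disassembly (first site may be gated by the second)
    for i in range(SITES - 1):
        if state[i] != state[i + 1]:
            # sliding: a nucleosome moves into the adjacent empty site
            swapped = list(state)
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            yield tuple(swapped), k_slide


def stationary_distribution(k_asm=1.0, k_dis=1.0, k_slide=0.5,
                            gated_first_site=True, tol=1e-12, max_iter=100_000):
    """Return {state: long-run probability} for the eight configurations."""
    states = list(product((0, 1), repeat=SITES))
    moves = {s: list(transitions(s, k_asm, k_dis, k_slide, gated_first_site))
             for s in states}
    # Uniformisation: rescale rates into jump probabilities of a discrete chain.
    scale = 1.1 * max(sum(rate for _, rate in m) for m in moves.values())
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        new = dict.fromkeys(states, 0.0)
        for s, p in pi.items():
            stay = 1.0
            for target, rate in moves[s]:
                new[target] += p * rate / scale
                stay -= rate / scale
            new[s] += p * stay
        if max(abs(new[s] - pi[s]) for s in states) < tol:
            return new
        pi = new
    return pi


if __name__ == "__main__":
    for state, prob in sorted(stationary_distribution().items()):
        print(state, round(prob, 4))
```

Comparing the output with `gated_first_site=True` and `False`, or with `k_slide=0`, gives a rough feel for how each transition type reshapes the distribution over configurations, which is the kind of comparison the text describes the authors making against their electron microscopy counts.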
Brown CR, Mao C, Falkovskaia E, Jurica MS, Boeger H (2013) Linking Stochastic Fluctuations in Chromatin Structure and Gene Expression. doi:10.1371/journal.pbio.1001621
A Bacteriophage Tailspike Domain Promotes Self-Cleavage of a Human Membrane-Bound Transcription Factor, the Myelin Regulatory Factor MYRF doi: 10.1371/journal.pbio.1001624 pmid: 23966832
Introduction Membrane-bound transcription factors (MBTFs) are a remarkable class of transcription factors that are initially generated as integral membrane proteins. Upon relevant cues, they undergo proteolytic processing, releasing the transcription factor domain from the membrane and allowing it to translocate to the nucleus to control gene expression. Two different broad mechanisms of MBTF proteolytic activation have been observed to date. One class of MBTFs is proteolytically activated by regulated ubiquitin/proteasome-dependent processing (RUP) and includes transcription factors that control membrane fluidity in budding yeast (SPT23 and MGA2) and a fission yeast hypoxic transcription factor (Sre1) [1]–[2]. The second class is activated via regulated intramembrane proteolysis (RIP) and includes sterol regulatory element-binding proteins (SREBPs) [3]–[4], activating transcription factor 6 (ATF6) [5]–[7], and the developmental regulator Notch [8]–[10]. RIP-dependent activation of MBTFs typically requires additional proteases that act outside of the membrane. For example, when cellular cholesterol levels decrease, SREBPs are transported to the Golgi apparatus, where they are cleaved by Site-1 protease, whose active site is located in the lumen of the Golgi. Cleavage by Site-1 protease allows the subsequent intramembrane proteolysis by Site-2 protease [4]. Similarly, following accumulation of misfolded proteins in the endoplasmic reticulum (ER), ATF6 translocates to the Golgi and is proteolyzed sequentially by Site-1 and Site-2 proteases [5],[7]. Recently, many basic leucine zipper proteins homologous to ATF6 have been discovered and appear to play important roles in tissue-specific unfolded protein responses [11]–[12]. Within the human genome, an early genome-wide computational screen suggested the existence of six MBTFs [13]. 
Since then, the number of characterized DNA-binding domains has increased significantly [14], and prediction methods for the membrane topology of proteins have improved dramatically [15]–[16], which led us to revisit the search for human MBTFs. We found that C11orf9, the largely uncharacterized human ortholog of mouse Myrf (a key transcriptional regulator of oligodendrocyte (OL) maturation and CNS myelination [17]), was strongly predicted to encode an MBTF. C11orf9 (hereafter referred to as MYRF [CCDS ID: 31579 and RefSeq ID: NP_037411]) and its orthologs were predicted to have a domain homologous to the DNA-binding domain of the yeast transcription factor Ndt80 [18] as well as a single transmembrane (TM) segment. However, by using algorithms capable of recognizing extremely distant sequence homology, we also observed that MYRF and its orthologs harbor an intramolecular chaperone domain shared with bacteriophage endosialidases [19]–[20], the tailspike proteins essential for bacteriophages to infect bacteria encapsulated with polysaccharides. While homology between bacteriophage and eukaryotic genes, and even horizontal gene transfer between the two, is not unprecedented, it is nonetheless rare, and in general the mechanism by which the transferred genes act is quite different. For example, the GG domain is found in both bacteriophage tail fibers and FAM3 cytokines [21]. In addition, the large nuclear and cytoplasmic viruses, such as Mimivirus, appear to have chimeric origins that include bacteriophages [22]. The tailspike proteins are known to trimerize and to self-process. This raised the hypothesis that this domain in eukaryotes might contribute to a novel method for the formation and function of an MBTF. Indeed, the intramolecular chaperone domain of MYRF facilitates its homo-oligomerization and proteolytically processes it into two halves. 
The N-terminal trimer, containing the DNA-binding domain, is released from the ER membrane and moves to the nucleus, where it exerts transcriptional effects. Proper processing and translocation of the MYRF N-terminal trimer then contributes to the maturation of OLs. The C-terminal homo-oligomer, containing the TM domain, remains in the ER. These findings not only demonstrate an extraordinary link between a possible endosymbiont or commensal bacteriophage and eukaryotic development, but reveal a novel cleavage mechanism for MBTFs. Results Full-Length MYRF Is First Generated as a Type-II Membrane Protein Myrf (the mouse ortholog of MYRF) was previously reported to encode a nuclear protein, based on immunofluorescence (IF) microscopy with an N-terminally Myc-tagged construct [17]. However, TOPCONS [15], a state-of-the-art membrane topology prediction program, predicts both MYRF and Myrf to be type-II membrane proteins (Figure S1A). Notably, we identified well-conserved nuclear localization signals (NLSs) in the N-terminus (K245KRK248 and K482KGK485) and potential N-linked glycosylation sites in the C-terminus (Figure 1A). Figure 1. Full-length MYRF is generated as a membrane protein. (A) Predicted sequence features of MYRF and sequence diagrams of various MYRF constructs used for IF microscopy. Stars in blue indicate predicted NLSs at K245KRK248 and K482KGK485. (B) IF images of GFP-MYRF, MYRF-GFP, MYRFΔTM-GFP, and MYRF-1:756-GFP in HeLa cells. (C) IF image of 3F-MYRF-GFP in HeLa cells. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g001 To determine the precise localization of MYRF in the cell, we expressed epitope-tagged MYRF constructs in HeLa cells. Green fluorescent protein (GFP) tagged to the N-terminus of MYRF (GFP-MYRF, Figure 1A) localized to the nucleus, in agreement with the previous study on Myrf (Figure 1B). 
However, when GFP was tagged to the C-terminus of MYRF (MYRF-GFP, Figure 1A), the GFP signal co-localized with calnexin (CLX), an ER marker (Figure 1B). A doubly-tagged protein, 3F-MYRF-GFP (Figure 1A), resolved this apparent dichotomy: The FLAG tag at the N-terminus exhibited a nuclear signal, whereas the C-terminal GFP signal co-localized with the ER (Figure 1C). In order to test if the predicted TM domain mediated the ER localization of the C-terminus of MYRF, we deleted the TM domain from the C-terminally GFP-tagged construct (MYRFΔTM-GFP, Figure 1A). MYRFΔTM-GFP localized to the nucleus of HeLa cells (Figure 1B), confirming the role of the predicted TM domain for ER localization. Similarly, a C-terminally GFP-tagged mutant truncated before the predicted TM domain at L756 (MYRF-1:756-GFP, Figure 1A) also localized to the nucleus (Figure 1B). Control experiments were consistent when using alternate epitope tags (FLAG tag; Figure S1B) and cell lines (CG4 cells, a rat OL cell line that may be used as a model for early OL differentiation [23]–[31]; Figure S1C and S1D). Thus, these localization patterns appear to be intrinsic features of MYRF and not artifacts of the particular tags or cells used. The microscopy suggested that MYRF is processed in cells, which was further confirmed by Western blot of 3F-MYRF (Figure 2B). The majority of the protein was cleaved into a ∼90 kDa N-terminal fragment from the full length of ∼160 kDa. The latter was further verified by comparing 5M-MYRF-3F protein expressed in cells to that expressed from an in vitro translation system (the in vitro reaction mixture immunoprecipitated with FLAG antibodies and blotted with anti-Myc antibodies) (Figure 2C). Figure 2. Full-length MYRF is a type-II membrane protein. (A) Predicted sequence features of MYRF and sequence diagrams of various MYRF constructs used for experiments. 
(B) Western blot of HeLa cells transfected with pcDNA3 and 3F-MYRF. (C) The top band of HeLa cells that were transfected with 5M-MYRF-3F has the same electrophoretic mobility as full-length protein products for the same construct that were obtained with an in vitro translation system. (D) Full-length forms of MYRF consist of two closely spaced bands that represent glycosylated and unglycosylated full-length MYRF, respectively (indicated by the two arrows). (E) HeLa cells transfected with 3F-MYRF were disrupted using a Dounce-type homogenizer, and then centrifuged at 200× g for 5 min to obtain a supernatant fraction. The supernatant was mixed with 0.1 volume of each of the following chemicals: 5 M NaCl, 1 M Na2CO3 (pH 11), and 10% SDS. After incubation for 20 min at room temperature, mixtures were centrifuged at 20,000× g for 15 min at 4°C to separate supernatant (S) from pellet (P). Calnexin, a known integral membrane protein, served as a control. (F) Membrane topology of GFP-MYRF-3F and 3F-MYRF-L690A-GFP in HeLa cells. When cell membranes were selectively permeabilized by digitonin, FLAG IF signals of GFP-MYRF-3F could not be detected, indicating that the C-terminus of MYRF is located within the ER lumen. In contrast, FLAG IF signals of 3F-MYRF-L690A-GFP were robustly detected even when cell membranes were selectively permeabilized by digitonin, indicating that the N-terminus of full-length MYRF is located on the cytoplasmic side of ER membranes. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g002 The top band representing full-length MYRF was observed to consist of two closely spaced bands (Figure 2D, arrows), with the upper and lower bands potentially representing glycosylated and unglycosylated full-length MYRF, respectively. Upon MG132 treatment, the lower band became as dominant as the upper one. 
This suggested either that MG132 treatment alters the degradation of MYRF or that MG132—an inducer of ER stress that decreases glycosylation efficiency [32]—inhibits the glycosylation of full-length MYRF, leading to the accumulation of unglycosylated full-length MYRF. Consistent with the latter possibility, tunicamycin treatment reversed the ratio between the upper and lower bands, with the lower one now dominating (Figure 2D). We note that the 120 kDa isoform (Figure 2B) is most likely a degradation intermediate, as it was inconsistently observed and disappeared upon treatment with MG132 (Figure S2A). Fractionation of HeLa cells transfected with 3F-MYRF revealed that full-length MYRF could be extracted from membranes by treatment with the detergent SDS, but not with high salt or alkaline pH (Figure 2E), similar to the control protein calnexin, a known integral membrane protein. Thus, the fluorescence microscopy, TM domain mutagenesis, glycosylation analysis, and biochemical fractionation data all demonstrated that full-length MYRF is an integral membrane protein. Finally, we determined the membrane topology of full-length MYRF by treating cells with digitonin, which selectively permeabilizes the plasma membrane but not organelle membranes (Figure S2B) [33]. When the plasma membrane of HeLa cells expressing GFP-MYRF-3F was selectively permeabilized by digitonin, FLAG IF signals could not be detected, in contrast to a strong signal when membranes were indiscriminately permeabilized by Triton X-100 (Figure 2F), suggesting that the C-terminus of MYRF is oriented to the ER lumen. Additional tests with a point mutant (L690A, detailed below) that blocks the generation of the 90 kDa isoform from full-length MYRF enabled us to probe the subcellular location of the N-terminus of full-length MYRF. 
FLAG IF signals were detected for 3F-MYRF-L690A-GFP when cell membranes were selectively permeabilized with digitonin (Figure 2F), indicating that the N-terminus of full-length MYRF is located on the cytoplasmic side of ER membranes. Thus, MYRF is synthesized as a type-II membrane protein and processed into N-terminal and C-terminal portions, localized in the nucleus and on the ER membrane, respectively. MYRF Harbors the Intramolecular Chaperone Domain of Bacteriophage Endosialidases In the course of analyzing the MYRF sequence, we discovered distant but significant homology (16% sequence identity and E-value = 3.1×10−18, as measured by HHpred [34]) between the portion of MYRF that lies between its DNA-binding and TM domains and the intramolecular chaperone domain found in bacteriophage endosialidases, proteins that constitute the tailspikes of many bacteriophages (Figure S3A) [19],[35]. The intramolecular chaperone domain, which we have dubbed an ICA (Intramolecular Chaperone Auto-processing) domain, plays two roles in the maturation of bacteriophage endosialidases. First, the ICA domain facilitates the protein's folding and trimerization [19],[35]. It then functions as a “folding sensor” and auto-cleaves itself away from the bacteriophage endosialidase [20]. A multiple sequence alignment of MYRF and its orthologs indicated that the ICA domain is a strictly conserved feature (Figure S3D). Further, a multiple sequence alignment of only the ICA domains from eukaryotes, a bacterium, and a phage revealed the absolute conservation of S578 and K583 (following the MYRF numbering, Figure 3A). In bacteriophage endosialidases, the serine and lysine residues equivalent to MYRF S578 and K583 form a catalytic dyad for the auto-cleavage reaction [20]. 
The correct positioning of these catalytic residues, along with an arginine residue that stabilizes the oxyanion during the peptide bond breakage, is thought to be achieved only upon folding and trimerization of bacteriophage endosialidases [20], enabling the ICA domain to function as a folding sensor. We thus asked if the ICA domain might nonetheless still serve—in a radically altered context as compared to viral tailspikes—as a folding sensor and protease to activate MYRF. Figure 3. The ICA domain autonomously mediates the proteolytic processing of MYRF. (A) Multiple sequence alignment of the ICA domains from eukaryotes, a bacterium, and a phage, generated with ClustalW [54]. Strictly conserved residues are shown in red. The numbering system is based on MYRF. (B) The auto-processing mechanism for MYRF postulated based on the ICA domain and its known properties. (C) Western blots of HeLa cells transfected with various MYRF constructs, showing the effects of mutations in the ICA domain on the proteolytic processing of MYRF. (D) IF image of 3F-MYRF-S578A and 3F-MYRF-K583A in HeLa cells. (E) The amino acid sequence of MYRF (residues N567-R692) was mapped onto the crystal structure of an ICA domain (PDB ID: 3GW6) using the alignment shown in Figure S3A. In the zoomed active site are shown two key catalytic residues (S578 and K583, both belonging to the same subunit) in stick model and two strictly conserved residues (V670 of one subunit and G626 of a different subunit) in space filling model. Shown below are L683, I687, and L690 that were predicted to form a leucine zipper. For visual clarity, clipped images were generated when deemed necessary. (F) (Left) Western blot showing that the proteolytic processing of MYRF is independent of its membrane insertion. MYRF-1:756 is a mutant truncated before the TM domain at L756. 
(Middle) Western blot showing the proteolytic processing of MYRF-319:708 in HeLa cells and E. coli. (Right) Western blot showing the normal processing of full-length MYRF in budding yeast. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g003 The ICA Domain Autonomously Mediates the Proteolytic Processing of Full-Length MYRF Based on the conservation of the ICA domain, including its catalytic residues, we hypothesized that, once generated as a type-II membrane protein, the ICA domain could potentially facilitate the folding and trimerization of full-length MYRF and then proteolytically process it into two independent trimers (Figure 3B). The N-terminal trimer, containing the DNA-binding domain, might then be released from membranes and enter the nucleus to regulate transcription, while the C-terminal trimer, comprising residues S578-D1111, would remain in the ER membrane. To test this hypothesis, we mutated the ICA domain of MYRF and assayed the effects on the proteolytic processing of MYRF. Deletions involving the ICA domain (Δ662–752 and Δ538–617) blocked normal processing of MYRF (Figure 3C). Likewise, mutation of the putative catalytic residues S578 and K583 to alanine (3F-MYRF-S578A and 3F-MYRF-K583A) also blocked proteolytic processing (Figure 3C). The FLAG tag at the N-terminus of these mutant constructs remained in the ER membrane of HeLa cells (Figure 3D), demonstrating that the DNA-binding domain of MYRF is retained in the membrane when auto-processing is blocked. We next asked whether additional residues shown to be important for the function of phage ICA domains are also important for the function of the MYRF ICA domain. As the N912 and G956 residues in the ICA domain of bacteriophage K1F endosialidase are essential for the function of the ICA domain [19], mutation of their corresponding residues in MYRF to alanine (3F-MYRF-D579A and 3F-MYRF-G626A) markedly reduced the proteolytic processing of MYRF (Figure 3C). 
As an additional control, we expressed a truncated form of MYRF that terminates at P577 (3F-MYRF-1:577), corresponding to the expected N-terminal fragment generated from auto-processing (Figure 3B), and confirmed that it has the same electrophoretic mobility as the processed N-terminal fragment of 3F-MYRF (Figure 3C). Taken together, these results support the hypothesis that the ICA domain mediates the proteolytic processing of MYRF, in a manner similar to bacteriophage endosialidases. To further investigate the role of the ICA domain in the processing of MYRF, we mapped the amino acid sequence of the MYRF ICA domain (residues N567–R692) onto the crystal structure of the ICA domain of bacteriophage K1F endosialidase (PDB accession code: 3GW6 [20]) (Figure S3A). The homology-derived structure predicted that L683, I687, and L690 of MYRF form a leucine zipper (Figure 3E). Because the leucine zipper appears integral to the trimeric structure of the MYRF ICA domain, we reasoned that its disruption would destabilize the trimer and consequently interfere with proteolytic processing. Site-directed mutagenesis confirmed that the leucine zipper is indeed required for MYRF processing (Figure 3C). IF microscopy of 3F-MYRF-L683A and 3F-MYRF-L690A in HeLa cells confirmed that their localizations matched the catalytic residue mutants (Figure S3B). In contrast, the structure suggested that K596, S599, and L679 would not be essential to either catalytic or structural roles, and all were predicted to face the exterior of the protein (Figure S3C). Consistent with this prediction, mutating each of these residues to alanine (3F-MYRF-K596A, 3F-MYRF-S599A, and 3F-MYRF-L679A) did not affect MYRF processing (Figure 3C). 
These results confirm that the ICA domain is indeed responsible for the proteolytic processing of MYRF, and that the mechanism of proteolysis is conserved between animals and bacteriophages, in spite of a complete alteration of neighboring protein domains and overall protein function. The ICA domain is known to function autonomously to proteolyze bacteriophage endosialidases. We therefore asked whether the processing of MYRF was similarly autonomous, testing two specific hypotheses. First, we examined whether the proteolytic processing of MYRF was independent of membrane integration. As shown in Figure 3F, a construct (3F-MYRF-1:756) that was truncated before the TM domain at L756 was normally processed in HeLa cells, but processing was blocked when the catalytic residue S578 was changed to alanine (3F-MYRF-1:756-S578A). Second, we asked whether MYRF is normally processed in heterologous systems, which would support a fully autonomous event. To address this hypothesis, we expressed MYRF in E. coli and yeast cells. Due to the difficulty of expressing full-length MYRF in E. coli, we worked with a truncation construct (MYRF-319:708) that only comprises the DNA-binding and ICA domains of MYRF. This construct was normally processed in HeLa cells (Figure 3F), and its processing was blocked when important residues were mutated to alanine (3F-MYRF-319:708-S578A, 3F-MYRF-319:708-K583A, and 3F-MYRF-319:708-L683A). Figure 3F shows that MYRF-319:708 behaved in the same manner in E. coli, and similarly, full-length MYRF was normally processed in budding yeast (Figure 3F). Taken together, these results indicate that the ICA domain autonomously functions in the proteolytic processing of MYRF.

The N-Terminal Trimer, Formed by the ICA Domain, Translocates to the Nucleus Aided by Two NLSs

The ICA domain is known to induce the trimerization of bacteriophage endosialidases as part of its intramolecular chaperone activity [36].
Given the central role of the ICA domain in MYRF auto-processing, we next asked whether it was also promoting trimerization in this context. We first used co-immunoprecipitation experiments of differentially tagged constructs in order to assay homo-oligomerization of the N-terminal fragment generated by the auto-processing of MYRF. As shown in Figure 4B, the N-terminal fragment of N-terminally 5xMyc-tagged MYRF (5M-MYRF; Figure 4A) did not bind beads coated with FLAG antibodies. However, when co-transfected with 3F-MYRF (Figure 4A), the N-terminal fragment of 5M-MYRF robustly bound the FLAG beads, confirming homo-oligomerization of the N-terminal fragment of MYRF. To determine the stoichiometry of the homo-oligomer, we employed size exclusion chromatography, which indicated that the N-terminal fragment from the auto-processing of MYRF-319:708 (a construct that contains only the DNA-binding and ICA domains of MYRF) forms a trimer (Figure S4). MrfA, a Dictyostelium ortholog of MYRF, has also been suggested to bind DNA as a trimer in vivo [37].

Figure 4. The N-terminal trimer is formed by the ICA domain and enters the nucleus. (A) Predicted sequence features of MYRF and sequence diagrams of various MYRF constructs used for experiments. (B) Western blots showing co-immunoprecipitation results for the MYRF constructs. “Input” was incubated with FLAG antibody-coated beads and then spun down to separate “Sup” from “Bead” fractions. The failure of MYRF-1:577 to homo-oligomerize demonstrated the importance of the ICA domain for the N-terminal trimer formation. (C) When the NLSs (NLS1 and NLS2) were deleted, the nuclear translocation of the N-terminal trimer was partially blocked. Scale bar, 20 µm. https://doi.org/10.1371/journal.pbio.1001624.g004

Given that the MYRF N-terminal fragment forms a trimer, we assayed the state of the C-terminal fragment.
We could not directly assay its trimeric state due to expression and purification issues in E. coli; instead, we asked if the C-terminal fragment of MYRF also homo-oligomerized using co-immunoprecipitation. As predicted, the C-terminal fragment of MYRF-6M (Figure 4A) did not bind FLAG beads (Figure 4B). However, co-transfection with MYRF-3F (Figure 4A) induced binding (Figure 4B), confirming that the C-terminal fragment generated from auto-processing is also a homo-oligomer. Because the bacteriophage ICA domain is known to exist autonomously as a trimer [19]–[20] and the ICA domain is part of the C-terminal fragment of MYRF, and we have confirmed the trimeric state of the N-terminal fragment, we expect the MYRF C-terminal fragment to also exist as a trimer. Because the ICA domain is known to induce trimerization, we expected that a truncated construct encoding only the N-terminal fragment (MYRF-1:577; Figure 4A) should fail to trimerize. When expressed alone in HeLa cells, 5M-MYRF-1:577 did not bind FLAG beads (Figure 4B). When co-transfected with 3F-MYRF-1:577, it still did not bind (Figure 4B), even though co-transfected 3F-MYRF-1:577 robustly bound the FLAG beads (unpublished data). Thus, an intact ICA domain is essential for the formation of the N-terminal trimer. As bacteriophage ICA domains require the completion of folding and trimerization as a prerequisite to the auto-cleavage reaction [20],[36], we suspected that full-length MYRF should also homo-oligomerize. A test of full-length MYRF, obtained by using a catalytic residue mutant (K583A), confirmed its homo-oligomerization (Figure 4B). Likewise, full-length MYRF obtained by a leucine zipper mutant (L683A) still homo-oligomerized (Figure 4B), in spite of defective auto-processing (Figure 3C). Thus, auto-processing of MYRF apparently requires both trimerization and proper formation of the leucine zipper that includes L683. 
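The fragment boundaries implied by the constructs discussed so far can be tabulated with a short Python sketch. It assumes only what the text states: cleavage separates P577 from S578, and full-length MYRF ends at D1111. The names `expected_fragments`, `Fragments`, and `CLEAVAGE_AFTER` are hypothetical, and the sketch considers residue coordinates alone; it does not model whether the ICA domain is intact, folded, or trimerized.

```python
from typing import NamedTuple, Optional, Tuple

# Per the text: the N-terminal fragment ends at P577 and the
# C-terminal fragment begins at S578.
CLEAVAGE_AFTER = 577


class Fragments(NamedTuple):
    n_terminal: Tuple[int, int]            # (first residue, last residue)
    c_terminal: Optional[Tuple[int, int]]  # None if the construct is not cleaved


def expected_fragments(start: int, end: int) -> Fragments:
    """Residue ranges expected if a construct spanning start..end is auto-processed.

    Coordinates only (an illustrative assumption): a construct that does not
    span the P577/S578 bond is returned unchanged.
    """
    if not (start <= CLEAVAGE_AFTER < end):
        return Fragments((start, end), None)
    return Fragments((start, CLEAVAGE_AFTER), (CLEAVAGE_AFTER + 1, end))


# Construct boundaries taken from the construct names used in the text.
constructs = {
    "MYRF (full length)": (1, 1111),
    "MYRF-1:756": (1, 756),
    "MYRF-319:708": (319, 708),
    "MYRF-1:577": (1, 577),
}

for name, (first, last) in constructs.items():
    print(name, expected_fragments(first, last))
```

Under these assumptions, MYRF-1:577 is the only construct listed that does not span the cleavage site, which is consistent with its use as a stand-in for the processed N-terminal fragment.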
Notably, the N-terminal fragment generated from the auto-processing of MYRF-1:756 (Figure 4A) also formed a homo-oligomer (Figure 4B), consistent with functional autonomy of the ICA domain. Upon auto-processing, the N-terminal trimer translocates to the nucleus. To test the roles of the predicted NLSs (K245KRK248 and K482KGK485) in nuclear translocation, we examined the effects of deleting the NLSs on subcellular localization. When either single NLS was deleted (3F-MYRFΔNLS1 or 3F-MYRFΔNLS2), nuclear translocation of the N-terminal trimer was only partially blocked (Figure 4C). Deletion of both NLSs blocked MYRF nuclear translocation to a greater extent (Figure 4C), indicating that both NLSs contribute to the nuclear translocation of the N-terminal trimer.

Auto-Processing Is Essential for the Transcriptional Activity of MYRF

Once generated as a type-II membrane protein, MYRF is auto-processed into two independent fragments (Figure 3B). The N-terminal trimer enters the nucleus where it is likely to function as a transcription factor, while the C-terminal homo-oligomer remains in the ER, where its function is unknown. In order to assay the transcriptional roles of MYRF, we first identified transcriptional targets of MYRF by performing next-generation RNA sequencing of HeLa cells that were transfected with either wild-type MYRF or a catalytic residue mutant. Among genes differentially expressed between the two samples (Table S1), we confirmed Endothelin 2 (Edn2) as a transcriptional target of MYRF in HeLa cells, and thus could use its expression levels measured by quantitative real-time polymerase-chain-reaction (qRT-PCR) as a readout of the transcriptional activity of various MYRF constructs. Using this assay, we confirmed that auto-processing of MYRF is required for the transcriptional activity of MYRF. Figure 5A shows that the expression level of Edn2 was about 20-fold higher when HeLa cells were transfected with MYRF compared to the empty vector control (pcDNA3).
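To make the arithmetic behind a statement such as "about 20-fold higher" concrete, the following Python sketch normalizes replicate expression values to the mean of an empty-vector control and reports the mean and standard error of the mean (SEM). The replicate values are invented for illustration and are not data from this study; the function names are hypothetical, and the normalization scheme is an assumption, since the quantification procedure is not described in this section.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate SEM")
    # SEM = sample standard deviation / sqrt(number of replicates)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def fold_change(sample, control):
    """Express each sample replicate relative to the mean of the control replicates."""
    control_mean = statistics.mean(control)
    return [value / control_mean for value in sample]


# Hypothetical relative Edn2 expression (arbitrary units); NOT measured data.
empty_vector = [0.9, 1.0, 1.1]
myrf = [18.0, 20.0, 22.0]

fc_mean, fc_sem = mean_sem(fold_change(myrf, empty_vector))
print(f"Edn2 fold change over empty vector: {fc_mean:.1f} ± {fc_sem:.1f} (mean ± SEM)")
```

The same `mean_sem` helper would apply to any replicate-based readout reported as mean ± SEM, such as the fraction of transfected cells expressing a marker.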
The transcriptional activation of Edn2 by MYRF is most likely due to the direct binding of MYRF to DNA because mutation of R445, a strictly conserved residue essential for the direct binding of MrfA to DNA [37], to alanine ablated its transcriptional effects (Figure 5A). Mutation of R445 to alanine did not affect the auto-processing and localization of MYRF (Figure S5). Blocking the auto-processing of MYRF, by mutating either catalytic residues (S578A and K583A) or a structurally important residue (L683A), abrogated the transcriptional effects of MYRF on Edn2 (Figure 5A).

Figure 5. Auto-processing is essential for the functions of MYRF. (A) The transcriptional activity of various MYRF constructs was estimated by their ability to activate the transcription of Edn2 in HeLa cells. Values are means ± SEM. (B) Examples of transfected CG4 cells that matured to express MBP or O1. (C) Quantification of the proportion of transfected CG4 cells expressing MBP or O1. Values are means ± SEM. *p<0.05, **p<0.01, and ***p<0.001. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g005

To test whether the N-terminal trimer is sufficient for this transcriptional activation, we assayed a construct truncated before the TM domain at L756 (MYRF-1:756) for transcriptional activity. MYRF-1:756 properly homo-oligomerizes (Figure 4B) and is normally processed (Figure 3F). Figure 5A shows that MYRF-1:756 is as competent as full-length MYRF in activating the transcription of Edn2, demonstrating that the N-terminal trimer is sufficient for the transcriptional activity of MYRF; notably, the C-terminal homo-oligomer does not significantly contribute to this activity. Although homo-trimeric transcription factors are not common, there is a well-known precedent, heat shock factor 1 [38].
Trimerization also appears to be necessary for full transcriptional effects, as we observed a construct directly encoding the N-terminal fragment of MYRF (MYRF-1:577, Figure 4A) to be only partially functional (Figure 5A); this construct fails to form a trimer (Figure 4B). On the other hand, our observation that MYRF-1:577 is still partially functional is in excellent agreement with a recent demonstration that monomeric MrfA can still bind DNA in vitro, although it appears to function as a trimer in vivo [37].

Auto-Processing Is Essential for MYRF to Promote OL Maturation

Given the well-characterized role of Myrf (the mouse ortholog) in OL maturation [17], we examined the functional consequences of MYRF auto-processing on the maturation of CG4 cells. Although the CG4 cell line is widely understood not to be a good model of myelination, it may be used as a model for early OL differentiation. We counted the fraction of transfected CG4 cells that had matured to express myelin basic protein (MBP) or O1, two known OL maturation markers (Figure 5B). MYRF significantly promoted the maturation of CG4 cells: 15% of transfected CG4 cells matured to express MBP when transfected with a vector containing both MYRF and GFP (MYRF, Figure 5C), as compared to less than 2% of cells transfected with a control vector expressing only GFP (IRES-GFP). Consistent with the qRT-PCR analysis, the mutation of R445 to alanine abrogated the effects of MYRF on CG4 cell maturation (MYRF-R445A, Figure 5C), suggesting that MYRF directly binds DNA to activate transcription for OL maturation. Auto-processing mutants of MYRF (MYRF-S578A and MYRF-K583A) similarly failed to promote CG4 cell maturation (Figure 5C), indicating that correct processing is required. To test if the N-terminal trimer generated by auto-processing is sufficient for OL maturation, we employed a construct truncated before the TM domain at L756 (MYRF-1:756, Figure 4A).
Notably, MYRF-1:756 was much less competent compared to wild-type MYRF in promoting maturation (Figure 5C), in spite of being normally processed (Figure 3F), homo-oligomerizing (Figure 4B), and activating Edn2 expression similarly to wild-type MYRF (Figure 5A) in HeLa cells, suggesting a potential role for the C-terminal domain in OL maturation. Overall, our data confirm that auto-processing is essential for MYRF to promote OL maturation. They also suggest that the maturation of OLs might require both the transcription factor function of the N-terminal trimer and the unknown function of the C-terminal homo-oligomer in the ER.

Full-Length MYRF Is First Generated as a Type-II Membrane Protein

Myrf (the mouse ortholog of MYRF) was previously reported to encode a nuclear protein, based on immunofluorescence (IF) microscopy with an N-terminally Myc-tagged construct [17]. However, TOPCONS [15], a state-of-the-art membrane topology prediction program, predicts both MYRF and Myrf to be type-II membrane proteins (Figure S1A). Notably, we identified well-conserved nuclear localization signals (NLSs) in the N-terminus (K245KRK248 and K482KGK485) and potential N-linked glycosylation sites in the C-terminus (Figure 1A).

Figure 1. Full-length MYRF is generated as a membrane protein. (A) Predicted sequence features of MYRF and sequence diagrams of various MYRF constructs used for IF microscopy. Stars in blue indicate predicted NLSs at K245KRK248 and K482KGK485. (B) IF images of GFP-MYRF, MYRF-GFP, MYRFΔTM-GFP, and MYRF-1:756-GFP in HeLa cells. (C) IF image of 3F-MYRF-GFP in HeLa cells. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g001

To determine the precise localization of MYRF in the cell, we expressed epitope-tagged MYRF constructs in HeLa cells.
Green fluorescent protein (GFP) tagged to the N-terminus of MYRF (GFP-MYRF, Figure 1A) localized to the nucleus, in agreement with the previous study on Myrf (Figure 1B). However, when GFP was tagged to the C-terminus of MYRF (MYRF-GFP, Figure 1A), the GFP signal co-localized with calnexin (CLX), an ER marker (Figure 1B). A doubly-tagged protein, 3F-MYRF-GFP (Figure 1A), resolved this apparent dichotomy: The FLAG tag at the N-terminus exhibited a nuclear signal, whereas the C-terminal GFP signal co-localized with the ER (Figure 1C). In order to test if the predicted TM domain mediated the ER localization of the C-terminus of MYRF, we deleted the TM domain from the C-terminally GFP-tagged construct (MYRFΔTM-GFP, Figure 1A). MYRFΔTM-GFP localized to the nucleus of HeLa cells (Figure 1B), confirming the role of the predicted TM domain for ER localization. Similarly, a C-terminally GFP-tagged mutant truncated before the predicted TM domain at L756 (MYRF-1:756-GFP, Figure 1A) also localized to the nucleus (Figure 1B). Control experiments were consistent when using alternate epitope tags (FLAG tag; Figure S1B) and cell lines (CG4 cells, a rat OL cell line that may be used as a model for early OL differentiation [23]–[31]; Figure S1C and S1D). Thus, these localization patterns appear to be intrinsic features of MYRF and not artifacts of the particular tags or cells used. The microscopy suggested that MYRF is processed in cells, which was further confirmed by Western blot of 3F-MYRF (Figure 2B). The majority of the protein was cleaved into a ∼90 kDa N-terminal fragment from the full length of ∼160 kDa. The latter was further verified by comparing 5M-MYRF-3F protein expressed in cells to that expressed from an in vitro translation system (the in vitro reaction mixture immunoprecipitated with FLAG antibodies and blotted with anti-Myc antibodies) (Figure 2C).

Figure 2.
Full-length MYRF is a type-II membrane protein. (A) Predicted sequence features of MYRF and sequence diagrams of various MYRF constructs used for experiments. (B) Western blot of HeLa cells transfected with pcDNA3 and 3F-MYRF. (C) The top band of HeLa cells that were transfected with 5M-MYRF-3F has the same electrophoretic mobility as full-length protein products for the same construct that were obtained with an in vitro translation system. (D) Full-length forms of MYRF consist of two closely spaced bands that represent glycosylated and unglycosylated full-length MYRF, respectively (indicated by the two arrows). (E) HeLa cells transfected with 3F-MYRF were disrupted using a Dounce-type homogenizer, and then centrifuged at 200× g for 5 min to obtain a supernatant fraction. It was mixed with 0.1 volume of each of the following chemicals: 5 M NaCl, 1 M Na2CO3 (pH 11), and 10% SDS. After incubation for 20 min at room temperature, mixtures were centrifuged at 20,000× g for 15 min at 4°C to separate supernatant (S) from pellet (P). Calnexin, a known integral membrane protein, served as a control. (F) Membrane topology of GFP-MYRF-3F and 3F-MYRF-L690A-GFP in HeLa cells. When cell membranes were selectively permeabilized by digitonin, FLAG IF signals of GFP-MYRF-3F could not be detected, indicating that the C-terminus of MYRF is located within the ER lumen. In contrast, FLAG IF signals of 3F-MYRF-L690A-GFP were robustly detected even when cell membranes were selectively permeabilized by digitonin, indicating that the N-terminus of full-length MYRF is located on the cytoplasmic side of ER membranes. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g002

The top band representing full-length MYRF was observed to consist of two closely spaced bands (Figure 2D, arrows), with the upper and lower bands potentially representing glycosylated and unglycosylated full-length MYRF, respectively. Upon MG132 treatment, the lower band became as dominant as the upper one.
This suggested either that MG132 treatment alters the degradation of MYRF or that MG132—an inducer of ER stress that decreases glycosylation efficiency [32]—inhibits the glycosylation of full-length MYRF, leading to the accumulation of unglycosylated full-length MYRF. Consistent with the latter possibility, tunicamycin treatment reversed the ratio between the upper and lower bands, with the lower one now dominating (Figure 2D). We note that the 120 kDa isoform (Figure 2B) is most likely a degradation intermediate, as it was inconsistently observed and disappeared upon treatment with MG132 (Figure S2A). Fractionation of HeLa cells transfected with 3F-MYRF revealed that full-length MYRF could be extracted from membranes by treatment with the detergent SDS, but not with high salt or alkaline pH (Figure 2E), similar to the control protein calnexin, a known integral membrane protein. Thus, the fluorescence microscopy, TM domain mutagenesis, glycosylation analysis, and biochemical fractionation data all demonstrated that full-length MYRF is an integral membrane protein. Finally, we determined the membrane topology of full-length MYRF by treating cells with digitonin, which selectively permeabilizes the plasma membrane but not organelle membranes (Figure S2B) [33]. When the plasma membrane of HeLa cells expressing GFP-MYRF-3F was selectively permeabilized by digitonin, FLAG IF signals could not be detected, in contrast to a strong signal when membranes were indiscriminately permeabilized by Triton X-100 (Figure 2F), suggesting that the C-terminus of MYRF is oriented to the ER lumen. Additional tests with a point mutant (L690A, detailed below) that blocks the generation of the 90 kDa isoform from full-length MYRF enabled us to probe the subcellular location of the N-terminus of full-length MYRF. 
FLAG IF signals were detected for 3F-MYRF-L690A-GFP when cell membranes were selectively permeabilized with digitonin (Figure 2F), indicating that the N-terminus of full-length MYRF is located on the cytoplasmic side of ER membranes. Thus, MYRF is synthesized as a type-II membrane protein and processed into N-terminal and C-terminal portions, localized in the nucleus and on the ER membrane, respectively.

MYRF Harbors the Intramolecular Chaperone Domain of Bacteriophage Endosialidases

In the course of analyzing the MYRF sequence, we discovered distant but significant homology (16% sequence identity and E-value = 3.1×10⁻¹⁸, as measured by HHpred [34]) between the portion of MYRF that lies between its DNA-binding and TM domains and the intramolecular chaperone domain found in bacteriophage endosialidases, proteins that constitute the tailspikes of many bacteriophages (Figure S3A) [19],[35]. The intramolecular chaperone domain, which we have dubbed an ICA (Intramolecular Chaperone Auto-processing) domain, plays two roles in the maturation of bacteriophage endosialidases. The ICA domain facilitates the protein's folding and trimerization [19],[35]. It then functions as a “folding sensor” and auto-cleaves itself away from the bacteriophage endosialidase [20]. A multiple sequence alignment of MYRF and its orthologs indicated that the ICA domain is a strictly conserved feature (Figure S3D). Further, a multiple sequence alignment of only the ICA domains from eukaryotes, a bacterium, and a phage revealed the absolute conservation of S578 and K583 (following the MYRF numbering, Figure 3A). In bacteriophage endosialidases, the serine and lysine residues equivalent to MYRF S578 and K583 form a catalytic dyad for the auto-cleavage reaction [20].
The correct positioning of these catalytic residues, along with an arginine residue that stabilizes the oxyanion during the peptide bond breakage, is thought to be achieved only upon folding and trimerization of bacteriophage endosialidases [20], enabling the ICA domain to function as a folding sensor. We thus asked if the ICA domain might nonetheless still serve—in a radically altered context as compared to viral tailspikes—as a folding sensor and protease to activate MYRF. Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 3. The ICA domain autonomously mediates the proteolytic processing of MYRF. (A) Multiple sequence alignment of the ICA domains from eukaryotes, a bacterium, and a phage, generated with ClustalW [54]. Strictly conserved residues are shown in red. The numbering system is based on MYRF. (B) The auto-processing mechanism for MYRF postulated based on the ICA domain and its known properties. (C) Western blots of HeLa cells transfected with various MYRF constructs, showing the effects of mutations in the ICA domain on the proteolytic processing of MYRF. (D) IF image of 3F-MYRF-S578A and 3F-MYRF-K583A in HeLa cells. (E) The amino acid sequence of MYRF (residues N567-R692) was mapped onto the crystal structure of an ICA domain (PDB ID: 3GW6) using the alignment shown in Figure S3A. In the zoomed active site are shown two key catalytic residues (S578 and K583, both belonging to the same subunit) in stick model and two strictly conserved residues (V670 of one subunit and G626 of a different subunit) in space filling model. Shown below are L683, I687, and L690 that were predicted to form a leucine zipper. For visual clarity, clipped images were generated when deemed necessary. (F) (Left) Western blot showing that the proteolytic processing of MYRF is independent of its membrane insertion. MYRF-1:756 is a mutant truncated before the TM domain at L756. 
(Middle) Western blot showing the proteolytic processing of MYRF-319:708 in HeLa cells and E. coli. (Right) Western blot showing the normal processing of full-length MYRF in budding yeast. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g003 The ICA Domain Autonomously Mediates the Proteolytic Processing of Full-Length MYRF Based on the conservation of the ICA domain, including its catalytic residues, we hypothesized that, once generated as a type-II membrane protein, the ICA domain could potentially facilitate the folding and trimerization of full-length MYRF and then proteolytically process it into two independent trimers (Figure 3B). The N-terminal trimer, containing the DNA-binding domain, might then be released from membranes and enter the nucleus to regulate transcription, while the C-terminal trimer, comprising residues S578-D1111, would remain in the ER membrane. To test this hypothesis, we mutated the ICA domain of MYRF and assayed the effects on the proteolytic processing of MYRF. Deletions involving the ICA domain (Δ662–752 and Δ538–617) blocked normal processing of MYRF (Figure 3C). Likewise, mutation of the putative catalytic residues S578 and K583 to alanine (3F-MYRF-S578A and 3F-MYRF-K583A) also blocked proteolytic processing (Figure 3C). The FLAG tag at the N-terminus of these mutant constructs remained in the ER membrane of HeLa cells (Figure 3D), demonstrating that the DNA-binding domain of MYRF is retained in the membrane when auto-processing is blocked. We next asked whether additional residues shown to be important for the function of phage ICA domains are also important for the function of the MYRF ICA domain. As the N912 and G956 residues in the ICA domain of bacteriophage K1F endosialidase are essential for the function of the ICA domain [19], mutation of their corresponding residues in MYRF to alanine (3F-MYRF-D579A and 3F-MYRF-G626A) markedly reduced the proteolytic processing of MYRF (Figure 3C). 
As an additional control, we expressed a truncated form of MYRF that terminates at P577 (3F-MYRF-1:577), corresponding to the expected N-terminal fragment generated from auto-processing (Figure 3B), and confirmed that it has the same electrophoretic mobility as the processed N-terminal fragment of 3F-MYRF (Figure 3C). Taken together, these results support the hypothesis that the ICA domain mediates the proteolytic processing of MYRF, in a manner similar to bacteriophage endosialidases. To further investigate the role of the ICA domain in the processing of MYRF, we mapped the amino acid sequence of the MYRF ICA domain (residues N567–R692) onto the crystal structure of the ICA domain of bacteriophage K1F endosialidase (PDB accession code: 3GW6 [20]) (Figure S3A). The homology-derived structure predicted that L683, I687, and L690 of MYRF form a leucine zipper (Figure 3E). Because the leucine zipper appears integral to the trimeric structure of the MYRF ICA domain, we reasoned that its disruption would destabilize the trimer and consequently interfere with proteolytic processing. Site-directed mutagenesis confirmed that the leucine zipper is indeed required for MYRF processing (Figure 3C). IF microscopy of 3F-MYRF-L683A and 3F-MYRF-L690A in HeLa cells confirmed that their localizations matched the catalytic residue mutants (Figure S3B). In contrast, the structure suggested that K596, S599, and L679 would not be essential to either catalytic or structural roles, and all were predicted to face the exterior of the protein (Figure S3C). Consistent with this prediction, mutating each of these residues to alanine (3F-MYRF-K596A, 3F-MYRF-S599A, and 3F-MYRF-L679A) did not affect MYRF processing (Figure 3C). 
These results confirm that the ICA domain is indeed responsible for the proteolytic processing of MYRF, and that the mechanism of proteolysis is conserved between animals and bacteriophages, in spite of a complete alteration of neighboring protein domains and overall protein function. The ICA domain is known to function autonomously to proteolyze bacteriophage endosialidases. We therefore asked whether the processing of MYRF was similarly autonomous, testing two specific hypotheses. First, we examined whether the proteolytic processing of MYRF was independent of membrane integration. As shown in Figure 3F, a construct (3F-MYRF-1:756) that was truncated before the TM domain at L756 was normally processed in HeLa cells, but processing was blocked when the catalytic residue S578 was changed to alanine (3F-MYRF-1:756-S578A). Second, we asked whether MYRF is normally processed in heterologous systems, which would support a fully autonomous event. To address this hypothesis, we expressed MYRF in E. coli and yeast cells. Due to the difficulty of expressing full-length MYRF in E. coli, we worked with a truncation construct (MYRF-319:708) that only comprises the DNA-binding and ICA domains of MYRF. This construct was normally processed in HeLa cells (Figure 3F), and its processing was blocked when important residues were mutated to alanine (3F-MYRF-319:708-S578A, 3F-MYRF-319:708-K583A, and 3F-MYRF-319:708-L683A). Figure 3F shows that MYRF-319:708 behaved in the same manner in E. coli, and similarly, full-length MYRF was normally processed in budding yeast (Figure 3F). Taken together, these results indicate that the ICA domain autonomously functions in the proteolytic processing of MYRF. The N-Terminal Trimer, Formed by the ICA Domain, Translocates to the Nucleus Aided by Two NLSs The ICA domain is known to induce the trimerization of bacteriophage endosialidases as part of its intramolecular chaperone activity [36]. 
Given the central role of the ICA domain in MYRF auto-processing, we next asked whether it was also promoting trimerization in this context. We first used co-immunoprecipitation experiments of differentially tagged constructs in order to assay homo-oligomerization of the N-terminal fragment generated by the auto-processing of MYRF. As shown in Figure 4B, the N-terminal fragment of N-terminally 5xMyc-tagged MYRF (5M-MYRF; Figure 4A) did not bind beads coated with FLAG antibodies. However, when co-transfected with 3F-MYRF (Figure 4A), the N-terminal fragment of 5M-MYRF robustly bound the FLAG beads, confirming homo-oligomerization of the N-terminal fragment of MYRF. To measure the nature of the homo-oligomer, we employed size exclusion chromatography, which indicated that the N-terminal fragment from the auto-processing of MYRF-319:708 (a construct which only contains the DNA-binding and ICA domains of MYRF) forms a trimer (Figure S4). MrfA, a Dictyostelium ortholog of MYRF, has also been suggested to bind DNA as a trimer in vivo [37]. Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 4. The N-terminal trimer is formed by the ICA domain and enters the nucleus. (A) Predicted sequence features of MYRF and sequence diagrams of various MYRF constructs used for experiments. (B) Western blots showing co-immunoprecipitation results for the MYRF constructs. “Input” was incubated with FLAG antibody-coated beads and then spun down to separate “Sup” from “Bead” fractions. The failure of MYRF-1:577 to homo-oligomerize demonstrated the importance of the ICA domain for the N-terminal trimer formation. (C) When the NLSs (NLS1 and NLS2) were deleted, the nuclear translocation of the N-terminal trimer was partially blocked. Scale bar, 20 µm. https://doi.org/10.1371/journal.pbio.1001624.g004 Given that the MYRF N-terminal fragment forms a trimer, we assayed the state of the C-terminal fragment. 
We could not directly assay its trimeric state due to expression and purification issues in E. coli; instead, we asked if the C-terminal fragment of MYRF also homo-oligomerized using co-immunoprecipitation. As predicted, the C-terminal fragment of MYRF-6M (Figure 4A) did not bind FLAG beads (Figure 4B). However, co-transfection with MYRF-3F (Figure 4A) induced binding (Figure 4B), confirming that the C-terminal fragment generated from auto-processing is also a homo-oligomer. Because the bacteriophage ICA domain is known to exist autonomously as a trimer [19]–[20] and the ICA domain is part of the C-terminal fragment of MYRF, and we have confirmed the trimeric state of the N-terminal fragment, we expect the MYRF C-terminal fragment to also exist as a trimer. Because the ICA domain is known to induce trimerization, we expected that a truncated construct encoding only the N-terminal fragment (MYRF-1:577; Figure 4A) should fail to trimerize. When expressed alone in HeLa cells, 5M-MYRF-1:577 did not bind FLAG beads (Figure 4B). When co-transfected with 3F-MYRF-1:577, it still did not bind (Figure 4B), even though co-transfected 3F-MYRF-1:577 robustly bound the FLAG beads (unpublished data). Thus, an intact ICA domain is essential for the formation of the N-terminal trimer. As bacteriophage ICA domains require the completion of folding and trimerization as a prerequisite to the auto-cleavage reaction [20],[36], we suspected that full-length MYRF should also homo-oligomerize. A test of full-length MYRF, obtained by using a catalytic residue mutant (K583A), confirmed its homo-oligomerization (Figure 4B). Likewise, full-length MYRF obtained by a leucine zipper mutant (L683A) still homo-oligomerized (Figure 4B), in spite of defective auto-processing (Figure 3C). Thus, auto-processing of MYRF apparently requires both trimerization and proper formation of the leucine zipper that includes L683. 
Notably, the N-terminal fragment generated from the auto-processing of MYRF-1:756 (Figure 4A) also formed a homo-oligomer (Figure 4B), consistent with functional autonomy of the ICA domain. Upon auto-processing, the N-terminal trimer translocates to the nucleus. To test the roles of the predicted NLSs (K245KRK248 and K482KGK485) in nuclear translocation, we examined the effects of deleting the NLSs on subcellular localization. When either single NLS was deleted (3F-MYRFΔNLS1 or 3F-MYRFΔNLS2), nuclear translocation of the N-terminal trimer was only partially blocked (Figure 4C). Deletion of both NLSs blocked MYRF nuclear translocation to a greater extent (Figure 4C), indicating that both NLSs contribute to the nuclear translocation of the N-terminal trimer. Auto-Processing Is Essential for the Transcriptional Activity of MYRF Once generated as a type-II membrane protein, MYRF is auto-processed into two independent fragments (Figure 3B). The N-terminal trimer enters the nucleus where it is likely to function as a transcription factor, while the C-terminal homo-oligomer remains in the ER, where its function is unknown. In order to assay the transcriptional roles of MYRF, we first identified transcriptional targets of MYRF by performing next-generation RNA sequencing of HeLa cells that were transfected with wild-type MYRF and a catalytic residue mutant. Among genes differentially expressed between the two samples (Table S1), we confirmed Endothelin 2 (Edn2) as a transcriptional target of MYRF in HeLa cells, and thus could use its expression levels measured by quantitative real-time polymerase-chain-reaction (qRT-PCR) as a readout of the transcriptional activity of various MYRF constructs. Using this assay, we confirmed that auto-processing of MYRF is required for the transcriptional activity of MYRF. Figure 5A shows that the expression level of Edn2 was about 20-fold higher when HeLa cells were transfected with MYRF compared to the empty vector control (pcDNA3). 
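As an illustration of how a qRT-PCR readout such as the roughly 20-fold Edn2 induction can be expressed, the following minimal Python sketch computes a relative fold change with the comparative Ct (2^-ΔΔCt) method. The text does not state which quantification method or reference gene was used, so the method, the reference gene, the function name, and all Ct values below are assumptions chosen purely for illustration.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Assumption: both primer pairs amplify with ~100% efficiency, so one
    cycle corresponds to a two-fold difference in template.
    """
    # Normalize the target gene to the reference gene within each condition.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized sample to the normalized control.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (not taken from the paper): the target gene crosses
# threshold 4 cycles earlier in the sample than in the empty-vector control,
# while the reference gene is unchanged.
example = fold_change(24.0, 18.0, 28.0, 18.0)
print(example)
```

A lower Ct in the transfected sample than in the control, at equal reference-gene Ct, yields a fold change above 1; equal ΔCt values yield exactly 1.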
The transcriptional activation of Edn2 by MYRF is most likely due to the direct binding of MYRF to DNA because mutation of R445, a strictly conserved residue essential for the direct binding of MrfA to DNA [37], to alanine ablated its transcriptional effects (Figure 5A). Mutation of R445 to alanine did not affect the auto-processing and localization of MYRF (Figure S5). Blocking the auto-processing of MYRF, by mutating either catalytic residues (S578A and K583A) or a structurally important residue (L683A), abrogated the transcriptional effects of MYRF on Edn2 (Figure 5A). Figure 5. Auto-processing is essential for the functions of MYRF. (A) The transcriptional activity of various MYRF constructs was estimated by their ability to activate the transcription of Edn2 in HeLa cells. Values are means ± SEM. (B) Examples of transfected CG4 cells that matured to express MBP or O1. (C) Quantification of the proportion of transfected CG4 cells expressing MBP or O1. Values are means ± SEM. *p<0.05, **p<0.01, and ***p<0.001. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.g005 To test whether the N-terminal trimer is sufficient for this transcriptional activation, we assayed a construct truncated before the TM domain at L756 (MYRF-1:756) for transcriptional activity. MYRF-1:756 properly homo-oligomerizes (Figure 4B) and is normally processed (Figure 3F). Figure 5A shows that MYRF-1:756 is as competent as full-length MYRF in activating the transcription of Edn2, demonstrating that the N-terminal trimer is sufficient for the transcriptional activity of MYRF; notably, the C-terminal homo-oligomer does not significantly contribute to this activity. Although homo-trimeric transcription factors are not common, there is a well-known precedent, heat shock factor 1 [38]. 
Trimerization also appears to be necessary for full transcriptional effects, as we observed a construct directly encoding the N-terminal fragment of MYRF (MYRF-1:577, Figure 4A) to be only partially functional (Figure 5A); this construct fails to form a trimer (Figure 4B). On the other hand, our observation that MYRF-1:577 is still partially functional is in excellent agreement with a recent demonstration that monomeric MrfA can still bind DNA in vitro, although it appears to function as a trimer in vivo [37]. Auto-Processing Is Essential for MYRF to Promote OL Maturation Given the well-characterized role of Myrf (the mouse ortholog) in OL maturation [17], we examined the functional consequences of MYRF auto-processing on the maturation of CG4 cells. Although the CG4 cell line is widely understood not to be a good model of myelination, it may be used as a model for early OL differentiation. We counted the fraction of transfected CG4 cells that had matured to express myelin basic protein (MBP) or O1, two known OL maturation markers (Figure 5B). MYRF significantly promoted the maturation of CG4 cells: 15% of transfected CG4 cells matured to express MBP when transfected with a vector containing both MYRF and GFP (MYRF, Figure 5C), as compared to less than 2% of cells transfected with a control vector expressing only GFP (IRES-GFP). Consistent with the RT-PCR analysis, the mutation of R445 to alanine abrogated the effects of MYRF on CG4 cell maturation (MYRF-R445A, Figure 5C), suggesting that MYRF directly binds DNA to activate transcription for OL maturation. Auto-processing mutants of MYRF (MYRF-S578A and MYRF-K583A) similarly failed to promote CG4 cell maturation (Figure 5C), indicating that correct processing is required. To test if the N-terminal trimer generated by auto-processing is sufficient for OL maturation, we employed a construct truncated before the TM domain at L756 (MYRF-1:756, Figure 4A). 
Notably, MYRF-1:756 was much less competent compared to wild-type MYRF in promoting maturation (Figure 5C), in spite of being normally processed (Figure 3F), homo-oligomerizing (Figure 4B), and activating Edn2 expression similarly to wild-type MYRF (Figure 5A) in HeLa cells, suggesting a potential role for the C-terminal domain in OL maturation. Overall, our data confirm that auto-processing is essential for MYRF to promote OL maturation. They also suggest that the maturation of OLs might require both the transcription factor function of the N-terminal trimer and the unknown function of the C-terminal homo-oligomer in the ER. Discussion RIP- and RUP-activated MBTFs are widely observed across organisms, spanning both eukaryotes and prokaryotes. MBTFs are recognized as increasingly common regulatory mechanisms in plants, with many plant MBTFs playing important roles in stress responses and development [39]–[43]. NTM1 (NAC with TM motif1), for example, regulates cell division and growth in Arabidopsis [44]. New examples of MBTFs have also been identified for bacteria [45]–[49]. In fact, ToxR of Vibrio cholerae was the first known MBTF [50], although it is still unclear whether ToxR requires a proteolytic activation step to exert transcriptional effects. Thus, it is likely that many MBTFs remain to be found, and an open question is what other activation mechanisms may be employed. MYRF reveals one such previously unknown activation mechanism. MYRF Is a Founding Member of a New Family of MBTFs That Are Auto-Processed Into Two Independent Trimers by an ICA Domain We show that MYRF is a MBTF that is auto-processed by its ICA domain into two independent homo-oligomers (Figure 6B). The N-terminal trimer, containing a largely disordered protein segment and the Ndt80 DNA-binding domain, is released from the membrane and translocates to the nucleus to regulate gene expression. 
The disordered N-terminal protein segment presumably functions as a transactivation domain because partial deletions in this region render MYRF nonfunctional in terms of its transcriptional activation of Edn2, although auto-processing and localization are not affected (unpublished data). The N-terminal trimer is both necessary and sufficient for the transcriptional activity of MYRF. The C-terminal homo-oligomer remains in the ER, and may perform an important function there. Our functional assays show that auto-processing is essential for MYRF both to activate transcription and to promote OL maturation. Figure 6. The ICA domain catalyzes trimerization-dependent auto-proteolysis in entirely distinct protein and cellular contexts. (A) In K1F bacteriophage, the C-terminal ICA domain within each tailspike endosialidase auto-catalytically removes itself following tailspike trimerization, guiding maturation of the six tailspikes surrounding the phage tail (shown for phage K1E, adapted from [55]). (B) Once generated as a type-II membrane protein, the ICA domain is thought to induce the trimerization of MYRF, upon which it cleaves itself, generating two independent trimers. The N-terminal trimer translocates to the nucleus and activates the transcription of myelin genes by direct DNA binding. The transcriptional role of the N-terminal trimer serves to promote the terminal differentiation of OLs, likely aided by an as-yet-unknown function of the C-terminal trimer that remains in the ER. https://doi.org/10.1371/journal.pbio.1001624.g006 Importantly, this mechanism is probably conserved across all MYRF family members that include MYRFL (a paralog of MYRF), as all members—found across the animal kingdom, including vertebrates, insects, nematodes, amoeba, and tunicates—are characterized by the domain arrangement shown in Figure 3B. 
Residues shown to be essential for the direct DNA-binding of the Ndt80 DNA-binding domain [18], including MYRF R445 and R469, are strictly conserved across the family, as are key residues of the ICA domain, such as S578, D579, K583, G626, and I687, and the presence of the leucine zipper. The ICA domain is invariantly followed by a relatively well-conserved TM domain and a poorly conserved ER lumenal domain. Does the ICA Domain Function as a Chaperone? As an intramolecular chaperone, the ICA domain is known to facilitate the trimerization and folding of protein sequences that lie N-terminal to it [36], and it has been best characterized for bacteriophage endosialidases [19]–[20]. Without the ICA domain, the endosialidases apparently fail to fold properly, let alone form a trimer. For the N-terminal fragment of MYRF, however, the role of the ICA domain seems limited to trimerization. Even though MYRF-1:577 failed to form a trimer (Figure 4B), it did activate the transcription of Edn2, albeit to a lesser degree compared to its trimeric counterpart. Also, the presumably monomeric form of MrfA bound DNA [37]. These results suggest that while the ICA domain has a clear role in trimerization and auto-catalysis, its role in the proper folding of the N-terminal fragment of MYRF may be less critical than for the folding of bacteriophage endosialidases. Consistent with this, sequence alignments show that an extended loop of the bacteriophage K1F ICA domain that comprises T977–H1026, shown to be essential for the folding of bacteriophage endosialidases [20], is missing in most other ICA domains, including those of MYRF and its orthologs (Figure S3A). From a structural perspective, the ICA domain achieves trimerization by chaperoning a triple β-helix fold [36]. 
Triple β-helix folds are often associated with very stable trimeric structures such as viral tailspike or fiber proteins, and the bacteriophage K1F endosialidase—whose trimerization and folding are mediated by its ICA domain—is known to form a trimeric structure that is resistant to SDS [19]. An interesting question is whether the N-terminal trimer of MYRF also involves a triple β-helix fold. Figure S3D shows that the region encompassing residues W536–P577, where a triple β-helix fold is expected, is indeed well conserved. In PQN-47, a mutation of the glycine residue equivalent to G566 of MYRF renders PQN-47 nonfunctional [51]. The evidence therefore suggests that the N-terminal trimer of MYRF could in principle maintain its trimeric state by having a triple β-helix fold at its C-terminus. Why Are MYRF and Its Orthologs Membrane-Bound? The auto-processing of MYRF appears to be constitutive, which stands in stark contrast to the highly regulated processing of such factors as SREBP [4], ATF6 [5], Notch [9], and STP23 [1]. In fact, we observed normal processing of MYRF in all the systems that we tried, including HEK293 cells, fibroblasts, human umbilical vein endothelial cells, and frog embryos (unpublished data), in addition to the budding yeast and E. coli. Nevertheless, we do not exclude the possibility that the auto-processing of MYRF can in principle be regulated by some mechanism. The apparently constitutive processing of MYRF presents a puzzle, as our data suggest that the processing is not a key step by which MYRF might be externally regulated. Indeed, Myrf (the mouse ortholog) has been shown to be regulated mainly at the transcriptional level [17]. Moreover, another recent study indicates that Myrf is continuously needed throughout adulthood to maintain myelin, consistent with a constitutive process [52]. Why, then, are MYRF and its orthologs MBTFs? 
While we have demonstrated that the mechanism clearly supports the proper assembly of the N-terminal transcription factor trimer, we speculate that this might additionally represent a mechanism by which the generation of two functionally independent trimers can be mandatorily coupled to coordinate regulation of nuclear and ER processes. Several pieces of circumstantial evidence support this speculation: First, PQN-47, the C. elegans ortholog of MYRF, has recently been implicated in the regulation of molting—a process involving extensive secretion [51]. Notably, an intact ER lumenal domain, localized outside the nucleus, was shown to be critical for PQN-47's regulation of molting. Second, our CG4 maturation assay shows that the N-terminal trimer alone is not as competent as full-length MYRF in promoting the maturation of CG4 cells, suggesting that the unknown function of the C-terminal homo-oligomer may be as essential for OL maturation as the transcriptional function of the N-terminal trimer. Notably, the physiological processes and genes to which MYRF and its orthologs have been linked all involve the secretory pathway. EcmA (an endogenous target gene of MrfA [37]) is a secreted protein. Myelination, in which Myrf plays an essential role, places heavy demands upon the secretory pathway, as does molting, for which PQN-47 is critical [51]. Finally, a meta-analysis of microarray data in the Gene Expression Omnibus database [53] indicates that MYRF is significantly expressed in secretory tissues including stomach and lung (unpublished data). We speculate that in the context of OL differentiation, while the N-terminal trimer acts in the nucleus to stimulate production of myelin components, the C-terminal homo-oligomer either coordinates their orderly passage through the secretory pathway or functions as part of the UPR pathway to prepare the ER for the increased flux of myelin components. 
In the future, it will be interesting to explore whether MYRF is truly a dual-functional protein and what function the C-terminal homo-oligomer performs in the ER for OL maturation. Reconciling Conflicting Reports on Myrf and Its Orthologs In fact, the recognition of this protein family as MBTFs serves to reconcile apparently contradictory, but likely correct, findings in the prior literature. The report on Myrf ascribed its role to a master transcriptional regulator for OL maturation and CNS myelination [17]. A subsequent study on PQN-47 questioned this role for Myrf, mainly because a PQN-47::GFP translational fusion protein localized outside the nucleus in C. elegans [51]. Based on molting phenotypes, the PQN-47 study concluded that PQN-47 (and Myrf, by implication) might play an important role in the secretory pathway. On the other hand, a recent study in Dictyostelium showed that the DNA-binding domain of MrfA endogenously localizes to the nucleus and binds DNA directly, supporting the conclusion of the report on Myrf [37]. Our finding that MYRF is a MBTF that is auto-processed into two independent homo-oligomers entirely resolves these seemingly conflicting reports: Depending on the location of the epitope tag and whether auto-processing is blocked or not, MYRF and its orthologs can exhibit either nuclear or ER localization, as appropriate; each of these prior studies is consistent with this interpretation. Taken together, MYRF and its orthologs represent a new class of MBTFs that require auto-processing to function in gene transcription and likely also play important roles beyond transcription, including in secretion. 
Materials and Methods Constructs, Cell Culture, and Transient Transfection The MYRF cDNA was purchased from Open Biosystems (the 1111-amino-acid-long isoform [CCDS ID: 31579 and RefSeq ID: NP_037411]). HeLa cells were cultured in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum. Cells were maintained in a humidified 5% CO2 incubator at 37°C. Transient transfection was performed using either FuGENE HD or Lipofectamine 2000. Immunoprecipitation Cells grown on 150 mm culture dishes were rinsed once with PBS, and 500 µL of 2× Cell Lysis Buffer (Cell Signaling) was directly added to the cell layer. Cell lysates were sonicated and then spun down at 14,000× g for 10 min at 4°C. The supernatant was mixed with FLAG antibody-coated beads (Sigma) and incubated for 2 h at 4°C on a rotating plate. The mix was spun down at 7,500× g for 30 s to separate supernatant (“Sup”) from pellet (“Bead”) fractions. Immunoblotting Cells were rinsed once with PBS and then lysed directly in wells with 1× Laemmli Sample Buffer (Bio-Rad). Cell lysates were boiled at 95°C for 5 min. Upon SDS-PAGE, the proteins were transferred to PVDF and probed with primary and horseradish peroxidase (HRP)-conjugated secondary antibodies. The following dilutions were used for immunoblotting: mouse anti-FLAG (1∶1000, Sigma), rabbit anti-c-Myc (1∶250, Santa Cruz Biotechnology), goat anti-actin (1∶400, Santa Cruz Biotechnology), rabbit anti-calnexin (1∶400, Santa Cruz Biotechnology), goat anti-calnexin (1∶400, Santa Cruz Biotechnology), goat anti-mouse HRP-conjugated (1∶10,000, Santa Cruz Biotechnology), mouse anti-FLAG HRP-conjugated (1∶1,000, Sigma), and mouse anti-c-Myc HRP-conjugated (1∶400, Santa Cruz Biotechnology). 
Immunofluorescence Cells cultured in glass bottom six-well plates (In Vitro Scientific) were fixed with 4% formaldehyde, permeabilized in cold 100% methanol or 0.1% Triton X-100, blocked with 1% BSA in 1× PBS with 0.05% Tween, and incubated with primary antibody diluted in blocking buffer at 4°C overnight, followed by incubation with fluorochrome-conjugated secondary antibody. Nuclei were stained with Hoechst 33342 (Invitrogen). Fluorescence was visualized with a Nikon Eclipse TE2000-E fitted with a Plan Apo VC 100×/1.40 oil objective and a digital camera (Cascade II 512; Photometrics) controlled by the NIS Elements software (AR 3.0). To selectively permeabilize plasma and ER membranes, 25 µg/ml digitonin was used to treat cells for 5 min on ice, followed by fixation with 4% formaldehyde. Rhodamine-conjugated donkey anti-goat IgG was from Santa Cruz Biotechnology. Alexa Fluor 594 goat anti-mouse or rabbit IgG and Alexa Fluor 488 goat anti-rabbit IgG were from Invitrogen. Protein Expression and Purification in E. coli The truncated MYRF (MYRF-319:708) was inserted into pET52b (Novagen) between BamHI and SacI to generate pET52b-StrepII-MYRF-319:708-10xHis. This plasmid was transformed into BL21 Star (DE3) pLysS E. coli (Invitrogen). Cells were cultured at 37°C to OD600 nm 0.4–0.6 in LB and protein expression was induced by 0.5 mM IPTG at 16°C for 16–18 h. Cells were collected and lysed by sonication in lysis buffer (20 mM Tris pH 8.0, 500 mM NaCl, 10 mM β-mercaptoethanol, 10% glycerol). The lysate was clarified by centrifugation at 15,000× g for 30 min at 4°C and the supernatant was loaded onto a Ni-NTA column (Qiagen). The flow-through was loaded onto a Strep-Tactin chromatography column (IBA) to affinity purify the N-terminal fragment of MYRF-319:708 according to the manufacturer's purification protocol. The eluted protein was dialyzed in dialysis buffer (20 mM HEPES pH 7.5, 50 mM NaCl, 10 mM β-mercaptoethanol, 10% glycerol) and concentrated to about 2 mg/ml. 
The concentrated protein was analyzed by Superdex 200 10/300 GL gel filtration column chromatography. qRT-PCR RNA was extracted from HeLa cells using Trizol (Invitrogen). cDNA was generated using SuperScript First-Strand Synthesis System for RT-PCR (Invitrogen). qRT-PCR was performed using PowerSYBR Green PCR Master Mix (Invitrogen) and ABI ViiA 7 Real-Time PCR System. Primer sequences are available in Table S2. CG4 Maturation Assay CG4 cells were maintained in GM [70% of CG4 growth medium (Dulbecco's modified Eagle's medium, 5 µg/ml transferrin, 100 µM putrescine, 20 nM progesterone, 30 nM selenium, 10 ng/ml biotin, and 5 µg/ml insulin) supplemented with 30% of the same medium conditioned by B104 cells]. For maturation assays, CG4 cells were plated on glass bottom six-well plates coated with poly-L-ornithine. 0.4 µg of plasmid DNA was transfected using Lipofectamine 2000 (Invitrogen) for 4 h. After transfection, GM was replaced by DM (CG4 growth medium supplemented with 1% FBS, 40 ng/ml triiodothyronine). CG4 cells were maintained in DM for 4 d before immunostaining for cell counting. Primary antibodies used were 1∶500 mouse anti-O1 (Millipore) and 1∶500 rat-anti-MBP (Millipore). For each sample, cells of at least 50 random fields were counted in a blind fashion. 
Supporting Information Figure S1. Control IF experiments confirmed that MYRF is generated as a membrane protein. (A) Membrane topology prediction results for MYRF (left) and Myrf (right) from the TOPCONS server [15]. (B) IF images of 3F-MYRF, MYRF-3F, and MYRFΔTM-3F in HeLa cells. (C) IF images of GFP-MYRF and MYRF-GFP in CG4 cells. (D) IF image of 3F-MYRF-GFP in CG4 cells. Scale bars, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.s001 (TIF) Figure S2. Disappearance of the middle band at ∼120 kDa upon MG132 treatment (A) and control experiments for selective membrane permeation with digitonin (B). (A) HeLa cells were transfected with 3F-MYRF and then treated with MG132, a proteasome inhibitor. The middle band disappeared upon MG132 treatment, suggesting that it represents a proteasome degradation intermediate, presumably caused by overexpression. This possibility was corroborated by a control experiment with p60Tth, the NFκB p105 construct whose processing is known to be mediated by the proteasome [56]. (B) Control experiments testing the selective permeation of the plasma membrane by digitonin. When cells were selectively permeated by digitonin, a calnexin antibody targeting an epitope inside the ER lumen did not yield IF signals. Yet when cells were indiscriminately permeated by Triton X-100, it gave strong IF signals. An antibody targeting an epitope in the cytoplasmic segment of calnexin gave IF signals for both digitonin and Triton X-100. Scale bar, 10 µm. 
https://doi.org/10.1371/journal.pbio.1001624.s002 (TIF) Figure S3. The ICA domain autonomously mediates the proteolytic processing of MYRF. (A) Sequence alignment between the ICA domain of bacteriophage K1F endosialidase and the portion of MYRF that lies between its DNA-binding and TM domains, as generated by the HHpred server [34]. (B) L683 and L690 were predicted to form a leucine zipper. Mutation of these residues to alanine disrupted the processing of MYRF. IF images confirmed their exclusion from the nucleus. (C) Mapping of the amino acid sequence of MYRF onto the ICA domain of bacteriophage K1F endosialidase (PDB ID: 3GW6), based on the sequence alignment shown in panel A, indicated the positions that K596, S599, and L679 of MYRF would occupy. Since these three residues were all predicted to point outward, they were not expected to be critical for either catalytic or structural roles. (D) Multiple sequence alignment of MYRF and its orthologs generated by ClustalW [54]. Shown are the DNA-binding domain (blue broken double line), the ICA domain (red broken double line), and the TM domain (black broken double line). Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.s003 (TIF) Figure S4. The N-terminal fragment of MYRF forms a trimer. MYRF-319:708 was expressed in E. coli with N-terminal StrepII tag and C-terminal His tag. The N-terminal fragment StrepII-MYRF-319:577 from the auto-processing of StrepII-MYRF-319:708-10xHis was purified by Strep-Tactin affinity chromatography followed by size exclusion chromatography. The elution profiles of molecular weight standards and StrepII-MYRF-319:577 are shown in panels A and B, respectively. The peak for StrepII-MYRF-319:577 was confirmed by SDS-PAGE (the insert in panel B). The molecular weight of StrepII-MYRF-319:577 was determined to be 89 kDa by comparison with a standard curve. Theoretical molecular weights for a monomer, dimer, and trimer are 30 kDa, 60 kDa, and 90 kDa, respectively. 
https://doi.org/10.1371/journal.pbio.1001624.s004 (TIF) Figure S5. R445A mutation does not affect the proteolytic processing and localization of MYRF. (A) Western blot showed that 3F-MYRF-R445A is normally processed. (B) IF images showed that the N-terminal fragment of 3F-MYRF-R445A is localized in the nucleus. Scale bar, 10 µm. https://doi.org/10.1371/journal.pbio.1001624.s005 (TIF) Table S1. Ten most differentially expressed genes between HeLa cells that were transfected with wild-type MYRF and the catalytic mutant S578A. https://doi.org/10.1371/journal.pbio.1001624.s006 (DOCX) Table S2. Primer sequences for qRT-PCR. https://doi.org/10.1371/journal.pbio.1001624.s007 (DOCX) Acknowledgments We are grateful to Dr. Lynn Hudson, Dr. Jo Ann Berndt, and Dr. David Schubert for providing us with cell lines and detailed protocols. We gratefully acknowledge Paul Tesar for assistance and reagents for OL culture, and thank Kimberly Raab-Graham, Jeff Gross, Seema Agarwala, Vishy Iyer, Jessie Zhang, and Andrew Ellington for scientific discussion and interactions. We thank Mengmeng Zhang, Bum-Kyu Lee, and Jiwoon Lee for their experimental help; the UT Microscopy and Imaging Facility for assistance with confocal microscopy; and Angel Syrett and Marianna Grenadier for assistance with illustrations.
Hybrid T-Helper Cells: Stabilizing the Moderate Center in a Polarized Systemdoi: 10.1371/journal.pbio.1001632pmid: 23976879
The diversity of cell types in the metazoan body arises through a hierarchical cascade of binary branching in the cells' developmental path [1]. Starting in the omnipotent fertilized egg cell, which faces the first “either-or” choice between the extra-embryonic and the inner cell mass lineage [2] (Figure 1A), such binary branching of lineages is seen throughout the development of virtually all tissues. Figure 1. Binary cell-fate decisions in development. Examples of polarization of cell phenotypes at developmental branch points, for the first cell-fate decision in the zygote (A) and for the lymphoid lineages (B). Each binary lineage branching is typically governed by a toggle switch, in which the fate-determining transcription factors often also auto-activate, giving rise to the mutual-inhibition/self-activation (MISA) circuit. In reality these circuits are interconnected to large gene regulatory networks wherein some factors are reused at more than one level of the branching hierarchy. In the Th1–Th2 branching, cross-inhibition between the two lineages as well as self-activation is also mediated by well-characterized external interactions, embodied by the two lineage-characterizing cytokines IFNγ and IL-4 that are also involved in cell proliferation control. https://doi.org/10.1371/journal.pbio.1001632.g001 In the adaptive immune system, the common lymphoid progenitor (CLP) has two major lineage options, B lymphocytes and T lymphocytes (Figure 1B). T cells further split into cytotoxic T cells (CTL), identified by the CD8 cell surface marker, and T-helper (Th) cells, which express CD4 instead. This reciprocal surface marker expression is exploited by biologists for the physical separation of these two functionally distinct types of T cells. 
Not surprisingly, among the highly versatile Th cells a further binary and functionally significant subdivision was discovered in 1986 by Mosmann and Coffman [3],[4]; based on cytokine expression profiles, one can distinguish between Th1 and Th2 cells, which are, roughly speaking, in charge of complementary aspects of host defense [5],[6]. While other Th lineages are now distinguished, the Th1–Th2 dualism results from a tightly controlled “either-or” decision, which is critical because mounting an inappropriate or excessive response against a pathogen can result either in blunted immune defense or in autoimmunity. Binary branching of cell fates that polarizes the cell phenotype constitutes a natural dichotomy, i.e., the two alternative options are disjoint and mutually exclusive (Figure 1A). How does this polarization arise? Why is black or white prevalent but gray so rare? Conrad Waddington already noted that “intermediates” between discrete phenotypes are rare [7]. But now, in defiance of the ubiquity of such natural polarization of cell lineages, three groups report the existence of a gray-zone “hybrid” Th1/Th2 state that has features of both Th1 and Th2 cells and, importantly, is very stable. Moreover, it does not appear to represent the common metastable precursor, and has a distinct biological function [8]–[10]. How is the polarization overcome to produce such a stable intermediate? Since one cannot understand “gray” without a preexisting, internalized notion of “black” and “white,” let us take a step back and examine how phenotype polarization so readily and reliably arises in the first place. The emergence of two stable states within one system is one of the first cases of the use of nonlinear dynamical systems theory [11] to predict cell-fate control by a molecular network [12]. This example has now been elevated to a classical paradigm, a gene circuit popularly known as the “toggle switch” [13] (see Figure 2A). 
Thus a gene circuit consisting of two mutually repressing genes X and Y can toggle between these two steady states that are stable (attractors and valleys in Figure 2)—A (where X is highly expressed and Y is suppressed, X>>Y) and B (with the reciprocal pattern, X<<Y). The mathematical description (ordinary differential equations [11]) that maps the mutual repression circuit into such bistable behavior dictates that the intermediate state C (X = Y), although a steady state, is unstable (hilltop in Figure 2A). Figure 2. From gene circuit architecture to cell-fate behavior. The theory of dynamical systems predicts the repertoire of cell behavior. The dynamics can be precisely visualized as a “quasi-potential landscape” in which each position represents a state S (an expression configuration of the two genes X, Y). The bottom panels show a cross section of the landscape along a diagonal that cuts through the attractors. Here, steady states are represented by “flat” regions that experience no driving force. Stable steady states (attractor states) are the lowest point in valleys (potential wells) and unstable steady states are “hilltops.” Orange dashed lines depict attractor boundaries. The quasi-potential or elevation U(S) at each S reflects the “relative stability,” in terms of the probability for state transitions (represented by the height of uphill climb needed to exit an attractor) [16]. The dynamical behavior (manifest in the shape of landscape) is determined by the architecture of the circuit and by the strength and modality of the interactions (model parameters) and gene expression noise. The bistable “toggle”-switch circuit (Panel A) has two stable attractor states A and B, characterized by reciprocal expression of X and Y, (X>>Y or Y>>X, respectively), whereas the hybrid state C (X = Y) is by necessity an unstable steady state. 
In the tristable “MISA” circuit (Panel B), the central state C is locally stable. Its relative stability depends on, e.g., the strength of the self-activation loops, which may be the basis for its stabilization in the Th1/Th2 hybrid state. The small insets of landscapes depict examples of their modification by changing the parameters of interactions (“parameter space”). Note here that the central hybrid state can also be modeled as a monostable constellation as done by Antebi et al. [10]. https://doi.org/10.1371/journal.pbio.1001632.g002 In 1948, Max Delbrück first formulated this principle of cross-inhibition in terms of ordinary differential equations to explain differentiation into two states in a two-metabolite system [14]. Monod and Jacob proposed the same concept in the early 1960s for a two-gene regulatory circuit that is essentially today's “toggle switch” (Figure 2A) [15]. The list of pairs of mutually antagonistic cell-fate regulators that both govern binary lineage branching into two “sister lineages” and regulate their respective effector genes is growing [16]. In the Th1–Th2 dichotomy, the two antagonistic transcription factors are T-bet (promoting Th1) and GATA3 (Th2) [6],[17]. But reality is more complex: binary branching decision points are connected to the larger tree of development, and gene circuits are connected to the complex genome-spanning regulatory network [18]. Yet the presence of mutual suppression switches shines through. Systematic genome-wide analyses of transcriptomes in individual cell types have revealed reciprocal expression for many pairs of regulatory factors in sister cell lineages, and many of them indeed form bidirectional interaction circuits [19]. The binary decision between two fate options implies an undecided decider—a status obviously embodied by the common multipotential precursor of the two sister cell lineages. 
It is intuitive and plausible that this common precursor is in the intermediate state C, which naturally exhibits “promiscuous” co-expression of the two lineage-specific factors (Figure 2), as first proposed by Tariq Enver [20]. Analysis of gene expression patterns of common precursor cells indeed has provided evidence for such multi-lineage promiscuity [21]–[24]. Simultaneous presence of both antagonistic master regulators in the same cell is thus a hallmark of the precursor of the two respective sister lineages [1]. There is one problem with the promiscuous coexistence of the two opposing regulators X and Y. According to the mathematical model of the bistable toggle switch, the central, promiscuous state C (X = Y) is dynamically highly unstable (Figure 2A)—at odds with biological reality where the promiscuous multipotent stem or precursor cells are discernible cell types that can be isolated and thus display some finite degree of stability despite their notorious propensity to “differentiate away” when not kept in their natural stem cell niche. How can the unstable intermediate state be partially stabilized? Here again, the dynamical systems formalism helps us to understand how the structure of a regulatory circuitry maps into its behavioral repertoire. In addition to the mutual repression between the antagonistic transcription factors, in many toggle-switch circuits the regulators are also capable of (indirect or direct) auto-stimulation, e.g., they may bind to and activate their own promoters [1],[12]. Mathematical modeling shows that self-activation can convert the central state from an unstable steady state to a stable steady state, thus creating “tristable” behavior [23],[25],[26]. The existence of stable steady states (attractors) for a given gene circuit architecture depends both on the structure of its “wiring diagram” and on the parameters in the model that represent the strength of the regulatory interactions (Figure 2). 
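The effect of self-activation on the central state can be illustrated with a minimal numerical sketch. The equations used here are one commonly used symmetric form of a two-gene circuit with Hill-type kinetics; the functional form, the parameter names (a, b, k, S, n), and all numeric values are assumptions chosen for illustration and are not taken from the cited models. Setting a = 0 gives the plain mutual-repression toggle switch, and a > 0 adds self-activation.

```python
# Illustrative two-gene circuit (assumed form, not the cited authors' model):
#   dX/dt = a * X^n / (S^n + X^n) + b * S^n / (S^n + Y^n) - k * X
#   dY/dt = a * Y^n / (S^n + Y^n) + b * S^n / (S^n + X^n) - k * Y
# a = strength of self-activation, b = strength of the repressible
# production term, k = first-order decay rate. All values are hypothetical.

def hill_activation(z, S=0.5, n=4):
    """Increasing Hill function: a gene activating its own expression."""
    return z**n / (S**n + z**n)

def hill_repression(z, S=0.5, n=4):
    """Decreasing Hill function: repression by the opposing gene."""
    return S**n / (S**n + z**n)

def rhs(x, y, a, b=1.0, k=1.0):
    """Right-hand side of the two coupled rate equations."""
    dx = a * hill_activation(x) + b * hill_repression(y) - k * x
    dy = a * hill_activation(y) + b * hill_repression(x) - k * y
    return dx, dy

def settle(x, y, a, dt=0.01, steps=20000):
    """Integrate with explicit Euler steps and return the final (X, Y)."""
    for _ in range(steps):
        dx, dy = rhs(x, y, a)
        x, y = x + dt * dx, y + dt * dy
    return x, y

if __name__ == "__main__":
    # Toggle switch (no self-activation): start just off the diagonal X = Y.
    print("toggle, start near X = Y:", settle(0.5, 0.55, a=0.0))
    # With self-activation: start just off the diagonal, then strongly biased.
    print("MISA, start near X = Y:  ", settle(1.0, 1.1, a=1.0))
    print("MISA, start biased to X: ", settle(1.8, 0.1, a=1.0))
```

With these illustrative parameters, a trajectory started near X = Y moves away from the diagonal when a = 0 but returns to it when a = 1, while a strongly biased starting point still settles into an X>>Y state. Under these assumptions, the same wiring therefore supports two attractors without self-activation and three with it.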
Mathematical analysis of such tristable mutual-inhibition+self-activation circuits (hereafter, MISA) suggests that for a wide range of parameter values the central state C exists as a stable or at least metastable state [23],[25],[26]. Thus (partial) stabilization of the central state is readily achieved. In other words, tristability per se is a robust phenomenon in the space of possible circuit architectures. Moreover, increasing the strength of autoregulation increases the relative stability [16] of the central attractor state C [23],[27]. Yet, despite metastability, the central hybrid state is not long-lived—just sufficient to support the undecided precursor state. Multipotent cells quickly make decisions if their state is not actively maintained by their niche. This is why the description of a robust, persistent Th1/Th2 hybrid state by three groups is intriguing [8]–[10]. The existence of a hybrid “Th0” phenotype in activated T cells had been suggested early on based on cytokine profiles [28],[29]. But since cytokine profiles are known to exhibit noisy cell-to-cell variation, the Th0 state remained the subject of debate. More recent work has suggested that virus-specific Th2 cells could be “reprogrammed” to T-bet+/GATA3+ double-positive cells that also exhibit Th1 functionality upon challenge in the context of a Th1-promoting viral infection [30]. Now, the direct generation of Th1/Th2 hybrid cells from naïve T cells and their long-term stability is reported in this issue. The cellular coexistence of T-bet and GATA3, which is more consistent than the co-expression of the subtype-defining cytokines, is documented in a number of ways: at the level of mRNA and protein, and at the functional level. Single-cell resolution analysis was used to unambiguously demonstrate that coexistence of Th1 and Th2 characteristics is due to a “true” (cell-intrinsic) Th1/Th2 hybrid phenotype as opposed to resulting from a mixture of Th1 and Th2 cells. 
The new results also confirm the stochastic manner in which individual cells produce the lineage-specific cytokines. Such “irregular” cytokine expression in subsets of cells [31], once attributed to incomplete polarization or unknown subsets, is now placed in the new context of the stable coexistence of their regulators, T-bet and GATA3, and of the well-recognized stochasticity of gene expression. In the first report, Fang et al. [8] used fluorescent in situ hybridization to quantitate the transcripts of T-bet and GATA3 and demonstrated that under in vitro nonpolarizing Th cell–activating conditions, Th cells co-expressed both transcripts at high levels in individual cells. This departure from the canonical tristability model, where promiscuous expression typically occurs at intermediate levels [23],[25],[26], may be achieved by adjusting the interaction parameters and the circuit diagram. In the Th1–Th2 dichotomy, the antagonism also takes place at the level of extracellular communication: The Th1 cytokine IFNγ suppresses the production of Th2 cells, and the Th2-secreted cytokine IL4 suppresses generation of Th1 cells [5],[6]. In the second report, Antebi et al. [10] demonstrate that mixed stimulation with different ratios of IFNγ and IL4 can tune the Th1/Th2 cell state across a large continuum of hybrid Th1/Th2 cells, consistent with a broad regime in which the central state (which in their model is in the monostable regime) is stable. But is there a biological function for the hybrid state? If the previous two reports relied on spontaneous, unbiased, or (ambivalently) biased conditions for Th activation in vitro, in the third report Peine et al. [9] describe the direct generation of a robust hybrid Th1/Th2 state following in vivo challenge with parasites (which typically elicit Th2 responses). 
Intriguingly, the Th1/Th2 hybrid cells were stable over an extended period of time (and maintained in the memory T cell state), exhibited immune responses characteristic of both Th1- and Th2-mediated inflammation, and mitigated the immunopathology associated with a pure Th1 or Th2 response. Peine et al. also showed that the Th1/Th2 hybrid is unlikely to be a precursor state because it robustly resisted conversion to Th1 or Th2 cells with IFNγ, IFN-α/β, and IL-12, or with IL-4, respectively. Here the absence of polarization is literally a moderation of extremes, a compromise at the center, which helps to avoid damage from an overt immune response mediated by Th1 or Th2 cells. Did selection for moderation promote evolution of a stable central state? Why regulatory pathways are wired the way they are is a profound problem in systems biology and evolution [32]. Although it is tempting to credit natural selection for tinkering with a circuit's wiring diagram to optimally serve its purpose [33],[34], it is more likely that selection uses those network structures that readily emerge from the random cis/trans region shuffling during genome evolution [35]–[37] and that happen to be associated with a desirable functionality. Natural selection would then only deserve credit for the fine-tuning. As mentioned above, simply adding autoactivation to the toggle switch to obtain a MISA circuit stabilizes the intermediate state. There are other variants of the bistable toggle switch that may have a stable, central hybrid state. For instance, circuits in which one of the cross-inhibitory interactions is mediated by miRNA may also create tristability [38]. Moreover, models that consider gene expression noise, known to produce accumulation of cells in states not predicted by the deterministic (noise-free) model (Figure 2) [39], can also explain an intermediate state—even in the absence of self-activation [40]. 
Other models that consider more molecular details (transcription factor binding/unbinding, translation) can, for certain parameter values, produce multiple, asymmetric intermediate states and even continuously degenerate hybrid states [41],[42]. In the case of the Th1/Th2 hybrids, the external, cytokine-based MISA circuitry mediated by IFNγ and IL4 (Figure 1B) also functions at the level of cell survival and proliferation control, and may thereby contribute to stabilization of a subpopulation of cells in the hybrid state. If the stable intermediate state is easily generated in minimal networks, why is it not more frequently encountered in developmental dichotomies governed by MISA circuits, instead being limited to transient metastable precursors? Perhaps there is no need for moderation of extremes in the case of progenitor cells that do not have a major biological effector function in the mature tissue—unlike the immune cells' rapidly deployed defense activities that are often double-edged swords. By contrast, in development the diversification of cell phenotypes is the common objective. Hence robust polarization into clear-cut lineages, rather than moderation in the gray zone, is desired.
Effects of Diet on Resource Utilization by a Model Human Gut Microbiota Containing Bacteroides cellulosilyticus WH2, a Symbiont with an Extensive Glycobiomedoi: 10.1371/journal.pbio.1001637pmid: 23976882
Introduction A growing body of evidence indicates that the tens of trillions of microbial cells that inhabit our gastrointestinal tracts extend our biological capabilities in important ways. Microbial enzymes process many compounds that would otherwise pass through our intestines unaltered [1], and cases of particular nutrient substrates favoring the growth of particular taxa are being reported [2]–[5]. Changes in diet are therefore expected to lead to changes in the composition and function of the microbiota [6]–[10]. However, our understanding of diet–microbiota interactions at a mechanistic level is still in its infancy. The absence of a complete catalog of the microbial strains and associated genome sequences that comprise a given microbiota complicates efforts to describe how particular dietary substrates influence individual taxa, how taxa cooperate/compete to utilize nutrients, and how these many interactions in aggregate lead to emergent host phenotypes. Gnotobiotic mice colonized with defined consortia of sequenced human gut microbes, on the other hand, provide an in vivo model of the microbiota in which the identity of all taxa and genes comprising the system are known. Within these assemblages, expressed mRNAs and proteins can be attributed to their genome, gene, and species of origin, and findings of interest can be pursued in follow-up in vitro or in vivo experiments. These systems also afford an opportunity to tightly control experimental variables to a degree not possible in human studies and have proven useful in studying microbial invasion, microbe–microbe interactions, and the metabolic roles of key ecological guilds [11]–[15]. Studies aiming to better understand community-level assembly, resilience, and adaptation are therefore likely to benefit from a focus on such defined systems. 
However, the limited taxonomic and functional representation within artificial communities of modest complexity requires that caution be exercised when extrapolating results to more complex, naturally occurring gut communities (see Prospectus). Culture-independent surveys of the healthy adult gut microbiota consistently conclude that it is composed primarily of members of two bacterial phyla, the Bacteroidetes and Firmicutes [16]–[21]. The dominance of these two bacterial phyla suggests that their representatives in the human gut are exquisitely adapted to its dynamic conditions, which include a constantly evolving nutrient environment. Members of the genus Bacteroides are known to be adept at utilizing both plant- and host-derived polysaccharides [22]. Comparisons of available Bacteroides genomes with those from other gut species indicate that the former are enriched in genes involved in the acquisition and metabolism of various glycans, including glycoside hydrolases (GHs) and polysaccharide lyases (PLs), as well as linked environmental sensors that control their expression (e.g., hybrid two-component systems, extracytoplasmic function (ECF) sigma factors and anti-sigma factors). Many of these genes are organized into polysaccharide utilization loci (PULs) that are distributed throughout the genome [23],[24]. Recent studies have focused on better understanding the evolution, specificity, and regulation of PULs in the genomes of species like Bacteroides thetaiotaomicron and Bacteroides ovatus [25],[26]. Little is known, however, about the metabolic strategies adopted by multiple competing species in more complex communities, how dietary changes lead to reconfigurations in community structure through changes in individual species, or whether dietary context influences which genes dominant species rely on to remain competitive with other microbes, including those genes that are components of PULs. 
Here, we adopt a multifaceted approach to study an artificial community in gnotobiotic mice fed changing diets in order to better understand (i) the process by which such a community reconfigures itself structurally in response to changes in host diet; (ii) how aggregate community function, as judged by the metatranscriptome and metaproteome, is impacted when host diet is altered; (iii) how the metabolic strategies of its individual component microbes change, if at all, when the nutrient milieu is dramatically altered, with an emphasis on one prominent but understudied member of the human gut Bacteroides; and (iv) whether a microbe's metabolic versatility/flexibility correlates with competitive advantage in an assemblage containing related and unrelated species. Results and Discussion Sequencing the Bacteroides cellulosilyticus WH2 Genome Though at least eight complete and 68 draft genomes of Bacteroides spp. are currently available [27], there are numerous examples of distinct clades within this genus where little genomic information exists. To further explore the genome space of one such clade, we obtained a human fecal isolate whose four 16S rRNA gene sequences indicate a close relationship to Bacteroides cellulosilyticus (Figure S1A,B). The genome of this isolate, which we have designated B. cellulosilyticus WH2, was sequenced deeply, yielding a high-quality draft assembly (23 contigs with an N50 value of 798,728 bp; total length of all contigs in the assembly, 7.1 Mb; Table S1). Annotation of its 5,244 predicted protein-coding genes using the carbohydrate active enzyme (CAZy) database [28] revealed an extraordinary complement of 503 CAZymes comprising 373 GHs, 23 PLs, 28 carbohydrate esterases (CEs), and 84 glycosyltransferases (GTs) (see Table S2 for all annotated genes in the B. cellulosilyticus WH2 genome predicted to have relevance to carbohydrate metabolism). 
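For readers unfamiliar with the N50 statistic quoted for the draft assembly, the following sketch shows how it is conventionally computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly length. The function name and the toy contig lengths are hypothetical and are not the WH2 contigs.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembly length.
    """
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")

if __name__ == "__main__":
    # Hypothetical toy contig lengths in bp (NOT the WH2 assembly).
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]
    print("N50 of toy assembly:", n50(toy_contigs))
```

A high N50 relative to the total assembly length, as reported above, indicates that most of the genome is contained in a small number of long contigs.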
One distinguishing feature of gut Bacteroides genomes is the substantial number of CAZymes they encode relative to those of other intestinal bacteria [29]. The B. cellulosilyticus WH2 CAZome is enriched in a number of GH families even when compared with prominent representatives of the gut Bacteroidetes (Figure S2A). When we expanded this comparison to include all 86 Bacteroidetes in the CAZy database, we found that the B. cellulosilyticus WH2 genome had the greatest number of genes for 19 different GH families, as well as genes from two GH families that had not previously been observed within a Bacteroidetes genome (Figure S2B). Altogether, B. cellulosilyticus WH2 has more GH genes at its disposal than any other Bacteroidetes species analyzed to date. In Bacteroides spp., CAZymes are often located within PULs [30]. At a minimum, a typical PUL harbors a pair of genes with significant homology to the susC and susD genes of the starch utilization system (Sus) in B. thetaiotaomicron [30]–[32]. Other genes encoding enzymes capable of liberating oligo- and monosaccharides from a larger polysaccharide are also frequently present. The susC- and susD-like genes of these loci encode the proteins that comprise the main outer membrane binding and transport apparatus and thus represent key elements of these systems. A search of the B. cellulosilyticus WH2 genome for genes with strong homology to the susC- and susD-like genes in B. thetaiotaomicron VPI-5482 revealed an unprecedented number of susC/D pairs (a total of 118). Studies of other prominent Bacteroides spp. have found that the evolutionary expansion of these genes has played an important role in endowing the Bacteroides with the ability to degrade a wide range of host- and plant-derived polysaccharides [25],[33]. Analysis of deeply sampled adult human gut microbiota datasets indicates that B. 
cellulosilyticus strains are common, colonizing approximately 77% of 124 adult Europeans characterized in one study [18] and 62% of 139 individuals living in the United States examined in another survey [20]. We hypothesized that the apparent success of B. cellulosilyticus in the gut is derived in part from its substantial arsenal of genes involved in carbohydrate utilization. Measuring Changes in the Structural Configuration of a 12-Member Model Microbiota in Response to a Dietary Perturbation To test the fitness of B. cellulosilyticus WH2 in relation to other prominent gut symbionts, and the importance of diet on its fitness, we carried out an experiment in gnotobiotic mice (experiment 1, “E1,” Figure S3). Two groups of 10–12-wk-old male germ-free C57BL/6J animals were moved to individual cages within gnotobiotic isolators (n = 7 animals/group). At day zero, each animal was colonized by oral gavage with an artificial community comprising 12 human gut bacterial species (Figure 1A, Table S3). Each species chosen for inclusion in this microbial assemblage met four criteria: (i) it was a member of one of three bacterial phyla routinely found in the human gut (i.e., Bacteroidetes, Firmicutes, or Actinobacteria), (ii) it was identified as a prominent member of the human gut microbiota in previous culture-independent surveys, (iii) it could be grown in the laboratory, and (iv) its genome had been sequenced to at least a high-quality draft level. Species were also selected for their functional attributes (as judged by their annotated gene content) in an effort to create an artificial community that was somewhat representative of a more complex human microbiota. 
For example, although more than half of the species in the assemblage were Bacteroidetes predicted to excel at the breakdown of polysaccharides, several were also prominent inhabitants of the human gut that are thought to have limited carbohydrate utilization capabilities (e.g., Firmicutes from Clostridium cluster XIVa). Some attributes for the 12 strains included in the artificial community are provided in Table S4.

Figure 1. COPRO-Seq analysis of the structure of a 12-member artificial human gut microbial community as a function of diet and time. (A) The 12 bacterial species comprising the artificial community. (B) Principal coordinates analysis (PCoA) was applied to relative abundance data generated by COPRO-Seq from two experiments (E1, E2), each spanning 6 wk. Following colonization (day 0), mice were switched between two different diets at 2-wk intervals as described in Figure S3. COPRO-Seq data from E1 and E2 were ordinated in the same multidimensional space. For clarity, only data from E2 are shown here (for the E1 PCoA plot, see Figure S5A). Red/blue, feces; pink/cyan, cecal contents. (C) Proportional abundance data from E1 illustrating the impact of diet on fecal levels of a diet-sensitive strain with higher representation on HF/HS chow (B. caccae), a diet-sensitive strain with higher representation on LF/HPP chow (B. ovatus), a diet-insensitive strain with no obvious diet preference (B. thetaiotaomicron), and a diet-sensitive strain with a preference for the LF/HPP diet that also achieves a high level of representation on the HF/HS diet (B. cellulosilyticus WH2). Mean values ± SEM are shown. Plots illustrating changes in abundance over time for all species in both experiments are provided in Figure S4C.
https://doi.org/10.1371/journal.pbio.1001637.g001

For 2 wk, each treatment group was fed a standard low-fat/high-plant polysaccharide (LF/HPP) mouse chow, or a “Western”-like diet where calories are largely derived from fat, starch, and simple sugars (high-fat/high-sugar (HF/HS)) [12]. Over the course of 6 wk, diets were changed twice at 2-wk intervals, such that each group began and ended on the same diet, with an intervening 2-wk period during which the other diet was administered (Figure S3). Using fecal DNA as a proxy for microbial biomass, the plant polysaccharide-rich LF/HPP diet supported 2- to 3-fold more total bacterial growth (primary productivity) despite its lower caloric density (3.7 kcal/g versus 4.5 kcal/g for the HF/HS diet; Figure S4A). The HF/HS diet contains carbohydrates that are easily metabolized and absorbed in the proximal intestine (sucrose, corn starch, and maltodextrin), with cellulose being the one exception (4% of the diet by weight versus 46.3% for the other carbohydrate sources). Thus, in mice fed the HF/HS diet, diet-derived simple sugars are likely to be rare in the distal gut where the vast majority of gut microbes reside; this may provide an advantage to those bacteria capable of utilizing other carbon sources (e.g., proteins/oligopeptides, host glycans). In mice fed the LF/HPP diet, on the other hand, plant polysaccharides that are indigestible by the host should provide a plentiful source of energy for saccharolytic members of the artificial community. To evaluate the impact of each initial diet and subsequent diet switch on the structural configuration of the artificial community, we performed shotgun sequencing (community profiling by sequencing; COPRO-Seq) [11] of DNA isolated from fecal samples collected throughout the course of the experiment, as well as cecal contents collected at sacrifice.
The relative abundances of the species in each sample (defined by the number of sequencing reads that could be unambiguously assigned to each microbial genome after adjusting for genome uniqueness) were subjected to ordination by principal coordinates analysis (PCoA) (Figure S5A). As expected, diet was found to be the predominant explanatory variable for observed variance (see separation along principal coordinate 1, “PC1,” which accounts for 52% of variance). The overall structure of the artificial community achieved quasi-equilibrium before the midpoint of the first diet phase, as evidenced by the lack of any significant movement along PC1 after day five. A structural reconfiguration also took place over the course of ∼5 d following transition to the second diet phase. Notably, the two treatment groups underwent a near-perfect inversion in their positions along PC1 after the first diet switch; the artificial community in animals switched from a LF/HPP to HF/HS diet took on a structure like that which arose by the end of the first diet phase in animals consuming the HF/HS diet, and vice versa. The second diet switch from phase 2 to 3 resulted in a similar movement along PC1 in the opposite direction, indicating a reversion of the artificial community's configuration to its originally assembled structure in each treatment group. These results, in addition to demonstrating the significant impact of these two diets on the structure of this 12-member artificial human gut community, also suggest that an assemblage of this size is capable of demonstrating resilience in the face of substantial diet perturbations. The assembly process and observed diet-induced reconfigurations also proved to be highly reproducible as evidenced by COPRO-Seq results from a replication of E1 (experiment 2, “E2”). In this follow-up experiment, fecal samples were collected more frequently than in E1, providing a dataset with improved temporal resolution. 
Ordination of E2 COPRO-Seq data by PCoA showed that (i) for each treatment group in E2, the artificial community assembles in a manner similar to its counterpart in E1; (ii) structural reconfigurations in response to diet occur with the same timing as in E1; and (iii) the quasi-equilibria achieved during each diet phase are highly similar between experiments for each treatment group (compare Figures 1B and S5A). As in E1, cecal data for each E2 treatment group overlap with their corresponding fecal samples, and DNA yields from E2 fecal samples vary substantially as a function of host diet (Figure S4B). COPRO-Seq provides precise measurements of the proportional abundance of each member species present in the artificial community. Data collected in both E1 and E2 (Table S5) revealed significant differences between members in terms of the maximum abundance levels they achieved, the rates at which their abundance levels were impacted by diet shifts, and the degree to which each species demonstrated a preference for one diet over another (Figure S4C). Changes in each species' abundance over time replicated well across animals in each treatment group, suggesting the assembly process and diet-induced reconfigurations occur in an orderly, rules-based fashion and with minimal stochasticity in this artificial community. A species' relative abundance immediately after colonization (i.e., 24 h after gavage/day 1) was, in general, a poor predictor of its abundance at the end of the first diet phase (i.e., day 13) (E1 R2 = 0.23; E2 R2 = 0.27), suggesting that early dominance of the founder population was not strongly tied to relative success in the assembly process. In mice initially fed a HF/HS diet, four Bacteroides spp. (Bacteroides caccae, B. cellulosilyticus WH2, B. thetaiotaomicron, and Bacteroides vulgatus) each achieved a relative abundance of ≥10% by the end of the first diet phase (day 13 postgavage), with B. 
caccae attaining the highest levels (37.1±4.9% and 34.2±5.5%; group mean ± SD in E1 and E2, respectively). In animals fed the plant polysaccharide-rich LF/HPP chow during the first diet phase, B. cellulosilyticus WH2 was dominant, achieving levels of 37.1±2.0% (E1) and 41.6±3.9% (E2) by day 13. B. thetaiotaomicron and B. vulgatus also attained relative abundances of >10%. Changes in diet often resulted in rapid, dramatic changes in a species' proportional representation. Because the dynamic range of abundance values observed when comparing multiple species was substantial (lowest, Dorea longicatena (<0.003%); highest, B. caccae (55.0%)), comparing diet responses on a common scale using raw abundance values was challenging. To represent these changes in a way that scaled absolute increases/decreases in relative abundance to the range observed for each strain, we also normalized each species' representation within the artificial community at each time-point to the maximum proportional abundance each microbe achieved across all time-points within each mouse. Plotting the resulting measure of abundance (percentage of maximum achieved; PoMA) over time demonstrates which microbes are strongly responsive to diet (experience significant swings in PoMA value following a diet switch) and which are relatively diet-insensitive (experience only modest or no significant change in PoMA value following a diet switch). Heatmap visualization of E1 PoMA values (Figure S5B) indicated that those microbes with a preference for a particular diet in one animal treatment group also tended to demonstrate the same diet preference in the other. Likewise, diet insensitivity was also consistent across treatment groups; diet-insensitive microbes were insensitive regardless of the order in which diets were introduced. Of the diet-sensitive taxa, those showing the most striking responses were B. caccae and B. 
ovatus, which strongly preferred the “Western”-like HF/HS diet and the polysaccharide-rich LF/HPP diet, respectively (Figures 1C and S4C). Among the diet-insensitive taxa, B. thetaiotaomicron showed the most stability in its representation (Figures 1C and S4C), consistent with its reputation as a versatile forager. Paradoxically, B. cellulosilyticus WH2 was both diet-sensitive and highly fit on its less-preferred diet; although this strain clearly achieved higher levels of representation in animals fed the LF/HPP diet, it also maintained strong levels of representation in animals fed the HF/HS diet (Figures 1C and S4C). When taking into account the abundance data for all 12 artificial community members, proportional representation at the end of the first diet phase (i.e., day 13) was a good predictor of representation at the end of the third diet phase (i.e., day 42) (E1 R2 = 0.77; E2 R2 = 0.84), suggesting that the intervening dietary perturbation had little effect on the ultimate outcomes for most species within this assemblage. However, one very low-abundance strain (D. longicatena) achieved significantly different maximum percentage abundances across the two treatment groups in each experiment, suggesting that steady-state levels of this strain may have been impacted by diet history. In mice initially fed the LF/HPP diet, D. longicatena was found to persist throughout the experiment at low levels on both diet regimens. In mice initially fed the HF/HS diet, D. longicatena dropped below the limit of detection before the end of the first diet phase, was undetectable by the end of the second diet phase, and remained undetectable throughout the rest of the time course. This interesting example raises the possibility that for some species, irreversible hysteresis effects may play a significant role in determining the likelihood that they will persist within a gut over long periods of time. 
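The percentage of maximum achieved (PoMA) normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the abundance values are hypothetical, and the input is taken to be one species' relative abundance (in percent) at successive time-points within a single mouse.

```python
def poma(abundances):
    """Scale one species' abundances within one mouse to its own maximum.

    abundances: relative abundance (%) at each time-point.
    Returns the percentage of maximum achieved (PoMA) at each time-point.
    """
    peak = max(abundances)
    if peak == 0:
        # Species never detected in this mouse: report zeros rather than divide by zero.
        return [0.0] * len(abundances)
    return [100.0 * a / peak for a in abundances]


# Hypothetical time course: a high-abundance and a low-abundance strain
# with the same relative response to diet.
abundant_strain = [10.0, 40.0, 20.0]
rare_strain = [0.5, 2.0, 1.0]
print(poma(abundant_strain))
print(poma(rare_strain))
```

Because each strain is scaled to its own maximum, the two hypothetical strains above yield identical PoMA profiles despite a 20-fold difference in raw abundance, which is what allows diet responses to be compared on a common scale.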
The Cecal Metatranscriptome Sampled at the Time of Sacrifice These diet-induced reconfigurations in the structure of the artificial community led us to examine the degree to which its members were modifying their metabolic strategies. To establish an initial baseline static view of expression data for each microbe on each diet, we developed a custom GeneChip whose probe sets were designed to target 46,851 of the 48,023 known or predicted protein-coding genes within our artificial human gut microbiome (see Materials and Methods). Total RNA was collected from the cecal contents of each animal in E1 at the time of sacrifice and hybridized to this GeneChip. The total number of genes whose expression was detectable on each diet was remarkably similar (14,929 and 14,594 detected in the LF/HPP→HF/HS→LF/HPP and HF/HS→LF/HPP→HF/HS treatment groups, respectively). A total of 11,373 genes (24.3%) were expressed on both diets (Figure S6A), while 2,003 (4.3%) were differentially expressed to a statistically significant degree, including 161 (6.1%) of the 2,640 genes in the microbiome encoding proteins with CAZy-recognized domains. Figure S6B illustrates the fraction of the community-level CAZome and several species-level CAZomes expressed on each diet (see Table S6 for a comprehensive list of all genes, organized by species and fold-change in expression, whose cecal expression was detectable on each diet and all genes whose expression was significantly different when comparing data from each treatment group). Among taxa demonstrating obvious diet preferences (as judged by relative abundance data), B. caccae and B. cellulosilyticus WH2 provided examples of CAZy-level responses to diet change that were different in several respects. Our observations regarding the carbohydrate utilization capabilities and preferences of B. caccae are summarized in Text S1. However, our ability to evaluate shifts in B. 
caccae's metabolic strategy in the gut was limited by its very low abundance in animals fed LF/HPP chow (i.e., our mRNA and subsequent protein assays were often not sensitive enough to exhaustively sample B. caccae's metatranscriptome and metaproteome). In contrast, the abundance of B. cellulosilyticus WH2, which favored the LF/HPP diet, remained high enough on both diets to allow for a comprehensive analysis of its expressed genes and proteins. This advantage, along with the exceptional carbohydrate utilization machinery encoded within the genome of this organism, encouraged us to focus on further dissecting the responses of B. cellulosilyticus WH2 to diet changes. Detailed inspection of the expressed B. cellulosilyticus WH2 CAZome (503 CAZymes in total) provided an initial view of this microbe's sophisticated carbohydrate utilization strategy. A comparison of the top decile of expressed CAZymes on each diet disclosed many shared elements between the two lists, spanning many different CAZy families, with just over half of the 50 most expressed enzymes on the plant polysaccharide-rich LF/HPP chow also occurring in the list of most highly expressed enzymes on the sucrose-, corn starch-, and maltodextrin-rich HF/HS diet (Figure 2A). Twenty-five of the 50 most expressed CAZymes on the LF/HPP diet were significantly up-regulated compared to the HF/HS diet; of these, seven were members of the GH43 family (Figure 2B). The GH43 family consists of enzymes with activities required for the breakdown of plant-derived polysaccharides such as hemicellulose and pectin. Inspection of the enzyme commission (EC) annotations for the most up-regulated GH43 genes shows that they encode xylan 1,4-β-xylosidases (EC 3.2.1.37), arabinan endo-1,5-α-L-arabinosidases (EC 3.2.1.99), and α-L-arabinofuranosidases (EC 3.2.1.55). 
The GH10 family, which is currently comprised exclusively of endo-xylanases (EC 3.2.1.8, EC 3.2.1.32), was also well represented among this set of 25 genes, with four of the seven putative GH10 genes in the B. cellulosilyticus WH2 genome making the list. Strikingly, of the 45 predicted genes with putative GH43 domains in the B. cellulosilyticus WH2 genome, none were up-regulated on the “Western”-style HF/HS diet.

Figure 2. B. cellulosilyticus WH2 CAZyme expression in mice fed different diets. (A) Overview of the 50 most highly expressed B. cellulosilyticus WH2 CAZymes (GHs, GTs, PLs, and CEs) for samples from each diet treatment group. List position denotes the rank order of gene expression for each treatment group, with higher expression levels situated at the top of each list. Genes common to both lists are identified by a connecting line, with the slope of the line indicating the degree to which a CAZyme's prioritized expression is increased/decreased from one diet to the other. CAZy families in bold, colored letters highlight those list entries found to be significantly up-regulated relative to the alternative diet (i.e., a CAZyme with a bold green family designation was up-regulated on the LF/HPP diet; a bold orange family name implies a gene was up-regulated significantly on the HF/HS diet). Statistically significant fold-changes between diets are denoted in the “F.C.” column (nonsignificant fold-changes are omitted for clarity). (B) Breakdown by CAZy family of the top 10% most expressed CAZymes on each diet whose expression was also found to be significantly higher on one diet than the other. Note that for each diet, the family with the greatest number of up-regulated genes was also exclusively up-regulated on that diet (LF/HPP, GH43; HF/HS, GH13).
In total, 25 genes representative of 27 families and 12 genes representative of 13 families are shown for the LF/HPP and HF/HS diets, respectively.
https://doi.org/10.1371/journal.pbio.1001637.g002

The most highly expressed B. cellulosilyticus WH2 CAZyme on the plant polysaccharide-rich chow (which was also highly expressed on the HF/HS chow) was BWH2_1228, a putative α-galactosidase from the GH36 family. These enzymes, which are not expressed by humans in the stomach or intestine, cleave terminal galactose residues from the nonreducing ends of raffinose family oligosaccharides (RFOs, including raffinose, stachyose, and verbascose), galacto(gluco)mannans, galactolipids, and glycoproteins. RFOs, which are well represented in cereal grains consumed by humans, are expected to be abundant in the LF/HPP diet given its ingredients (e.g., soybean meal), but potential substrates in the HF/HS diet are less obvious, possibly implicating a host glycolipid or glycoprotein target. Surface glycans in the intestinal epithelium of rodents are decorated with terminal fucose residues [34] as well as terminal sialic acid and sulfate [35]. Hydrolysis of the α-2 linkage connecting terminal fucose residues to the galactose-rich extended core is thought to be catalyzed in large part by GH95 and GH29 enzymes [36]. The B. cellulosilyticus WH2 genome is replete with putative GH95 and GH29 genes (total of 12 and 9, respectively), but only a few (BWH2_1350/2142/3154/3818) were expressed in vivo on at least one diet, and their expression was low relative to many other CAZymes (see Table S6). Cleavage of terminal sialic acids present in host mucins by bacteria is usually carried out by GH33 family enzymes. B. cellulosilyticus WH2 has two GH33 genes that are expressed on either one diet (BWH2_3822, HF/HS) or both diets (BWH2_4650), but neither is highly expressed relative to other B. cellulosilyticus WH2 CAZymes. Therefore, utilization of host glycans by B.
cellulosilyticus WH2, if it occurs, likely requires partnerships with other members of the artificial community that express GH29/95/33 enzymes (see Table S6 for a list of members that express these enzymes in a diet-independent and/or diet-specific fashion). Among the 50 most highly expressed B. cellulosilyticus WH2 CAZymes, 12 were significantly up-regulated on the HF/HS diet compared to the LF/HPP diet, with members of family GH13 being most prevalent. While the enzymatic activities and substrate specificities of GH13 family members are varied, most relate to the hydrolysis of substrates comprising chains of glucose subunits, including amylose (one of the two components of starch) and maltodextrin, both prominent ingredients in the HF/HS diet. GeneChip-based profiling of the E1 cecal communities provided a snapshot of the metatranscriptome on the final day of the final diet phase in each treatment group. The analysis of B. cellulosilyticus WH2 CAZyme expression suggested that this strain achieves a “generalist” lifestyle not by relying on substrates that are present at all times (e.g., host mucins), but rather by modifying its resource utilization strategy to effectively compete with other microbes for diet-derived polysaccharides that are not metabolized by the host. Community-Level Analysis of Diet-Induced Changes in Microbial Gene Expression To develop a more complete understanding of the dynamic changes that occur in gene expression over time and throughout the artificial community following diet perturbations, we performed microbial RNA-Seq analyses using feces obtained at select time-points from mice in the LF/HPP→HF/HS→LF/HPP treatment group of E2 (Figure S3). We began with a “top-down” analysis in which every RNA-Seq read count from every gene in the artificial microbiome was binned based on the functional annotation of the gene from which it was derived, regardless of its species of origin. 
In this case, the functional annotation used as the binning variable was the predicted EC number for a gene's encoded protein product. Expecting that some changes might occur rapidly, while others might require days or weeks, we searched for significant differences between the terminal time-points of the first two diet phases (i.e., points at which the model human gut microbiota had been allowed 13 d to acclimate to each diet). The 157 significant changes we identified were subjected to hierarchical clustering by EC number to determine which functional responses occurred with similar kinetics. The results revealed that in contrast to the rapid, diet-induced structural reconfigurations observed in this artificial community, community-level changes in microbial gene expression occurred with highly variable timing that differed from function to function. These changes were dominated by EC numbers associated with enzymatic reactions relevant to carbohydrate and amino acid metabolism (see Table S7 for a summary of all significant changes observed, including aggregate expression values for each functional bin (EC number) at each time-point). Significant responses could be divided into one of three groups: “rapid” responses were those where the representation of EC numbers in the transcriptome increased/decreased dramatically within 1–2 d of a diet switch; “gradual” responses were those where the representation of EC numbers changed notably, but slowly, between the two diet transition points; and “delayed” responses were those where significant change did not occur until the end of a diet phase (Figure 3, Table S7). EC numbers associated with reactions important in carbohydrate metabolism and transport were distributed across all three of these response types for each of the two diets. 
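The “top-down” binning step described above, in which read counts are pooled by EC number regardless of species of origin, can be sketched as follows. This is an illustrative outline only: the function name, gene identifiers, EC assignments, and counts are hypothetical, and the normalization and statistical testing used in the actual analysis are not shown.

```python
from collections import defaultdict


def bin_counts_by_ec(gene_counts, gene_to_ec):
    """Collapse per-gene read counts into per-EC-number totals.

    gene_counts: dict mapping gene ID -> raw read count in one sample.
    gene_to_ec:  dict mapping gene ID -> predicted EC number (genes
                 without an EC annotation are simply absent).
    Species of origin is deliberately ignored.
    """
    bins = defaultdict(int)
    for gene, count in gene_counts.items():
        ec = gene_to_ec.get(gene)
        if ec is None:
            continue  # unannotated genes do not contribute to any bin
        bins[ec] += count
    return dict(bins)


# Hypothetical genes from two species sharing one EC annotation.
gene_to_ec = {"spA_0001": "3.2.1.8", "spB_0042": "3.2.1.8", "spA_0002": "3.2.1.37"}
gene_counts = {"spA_0001": 120, "spB_0042": 30, "spA_0002": 15, "spB_0099": 7}
print(bin_counts_by_ec(gene_counts, gene_to_ec))
```

Pooling in this way gives a community-wide view of each function, at the cost of discarding information about which member contributes the reads in each bin.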
Nearly all genes encoding proteins with EC numbers related to amino acid metabolism that were significantly up-regulated on HF/HS chow binned into the “rapid” or “gradual” groups, suggesting this diet put immediate pressure on the artificial microbial community to increase its repertoire of expressed amino acid biosynthesis and degradation genes. Genes with assigned EC numbers involved in amino acid metabolism that were significantly up-regulated on the other, polysaccharide-rich, LF/HPP diet were spread more evenly across these three response types (Figure 3).

Figure 3. Top-down analysis of fecal microbiome RNA expression in mice receiving oscillating diets. The fecal metatranscriptomes of four animals in the LF/HPP→HF/HS→LF/HPP treatment group of E2 were analyzed using microbial RNA-Seq at seven time-points to evaluate the temporal progression of changes in expressed microbial community functions triggered by a change in diet. After aligning reads to genes in the defined artificial human gut microbiome, raw counts were collapsed by the functional annotation (EC number) of the gene from which the corresponding reads originated. Total counts for each EC number in each sample were normalized, and any EC numbers demonstrating a statistically significant difference in their representation in the metatranscriptome between the final days of the first two diet phases were identified using a model based on the negative binomial distribution [57]. Normalized expression values for 157 significant EC numbers (out of 1,021 total tested) were log-transformed, mean-centered, and subjected to hierarchical clustering, followed by heatmap visualization. “Rapid” responses are those where expression increased/decreased dramatically within 1–2 d of a diet switch. “Gradual” responses are those where expression changed notably, but slowly, between the two diet transition points.
“Delayed” responses are those where significant expression changes did not occur until the end of a diet phase. EC numbers specifying enzymatic reactions relevant to carbohydrate metabolism and/or transport are denoted by purple markers, while those with relevance to amino acid metabolism are indicated using orange markers. A full breakdown of all significant responses over time and the outputs of the statistical tests performed are provided in Table S7.
https://doi.org/10.1371/journal.pbio.1001637.g003

Careful inspection of our top-down analysis results and a complementary “bottom-up” analysis in which normalization was performed at the level of individual species, rather than at the community level, allowed us to identify other important responses that would have gone undetected were it not for the fact that we were dealing with a defined assemblage of microbes where all of the genes in component members' genomes were known. For example, an assessment of the representation of EC 3.2.1.8 (endo-1,4-β-xylanase) within the metatranscriptome before and after the first diet switch (LF/HPP→HF/HS) initially suggested that this activity was reduced to a statistically significant degree as a result of the first diet perturbation (day 13 versus day 27; Mann–Whitney U test, p = 0.03; Figure S7A). Aggregation by species of all sequencing read counts assignable to mRNAs encoding proteins with this EC number revealed that over 99% of the contributions to this functional bin originated from B. cellulosilyticus WH2 (note the similarity in a comparison of Figure S7A and Figure S7B), implying that the community-level response and the response of this Bacteroides species were virtually one and the same. A tally of all sequencing reads assignable to B.
cellulosilyticus WH2 at each time-point disclosed that although this strain maintains high proportional representation in the artificial community throughout each diet oscillation period (range, 10.3–42.5% and 11.6–43.3% for E1 and E2, respectively), its contribution to the metatranscriptome is substantially decreased during the HF/HS diet phase (Figure S7C). This dramatic reduction in the extent to which B. cellulosilyticus WH2 contributes to the metatranscriptome in HF/HS-fed mice “masks” the significant up-regulation of EC 3.2.1.8 that occurs within the B. cellulosilyticus WH2 transcriptome following the first diet shift (day 13 versus day 27; Mann–Whitney U test, p = 0.03; Figure S7D). A further breakdown of endo-1,4-β-xylanase up-regulation in B. cellulosilyticus WH2 when mice are switched to the HF/HS diet reveals that most of this response is driven by two genes, BWH2_4068 and BWH2_4072 (Figure S7E). Our realization that we were unable to correctly infer the direction of one of the most significant diet-induced gene expression changes in the second most abundant strain in the artificial community when inspecting functional responses at the community level provides a strong argument for expanding the use of microbial assemblages comprised exclusively of sequenced species in studies of the gut microbiota. This should allow the contributions of individual species to community activity to be evaluated in a rigorous way that is not possible with microbial communities of unknown or poorly defined gene composition. High-Resolution Profiling of the Cecal Metaproteome Sampled at the Time of Sacrifice In principle, protein measurements can provide a more direct readout of expressed community functions than an RNA-level analysis, and thus a deeper understanding of community members' interactions with one another and with their habitat [37],[38]. 
For these reasons and others, much work has been dedicated to applying shotgun proteomics techniques to microbial ecosystems in various environments [39],[40]. Though these efforts have provided illustrations of significant methodological advances, they have been limited by the complexity of the metaproteomes studied and by the difficulties this complexity creates when attempting to assign peptide identities uniquely to proteins of specific taxa. Recognizing that a metaproteomics analysis of our artificial community would not be subject to such uncertainty given its fully defined microbiome and thus fully defined theoretical proteome, we subjected cecal samples from two mice from each diet treatment group in E1 (n = 4 total) to high-performance liquid chromatography-tandem mass spectrometry (LC-MS/MS; see Materials and Methods). We had three goals: (i) to evaluate how our ability to assign peptide-spectrum matches (PSMs) to particular proteins within a theoretical metaproteome is affected by the presence of close homologs within the same species and within other, closely related species; (ii) to test the limits of our ability to characterize protein expression across different species given the substantial dynamic range we documented in microbial species abundance; and (iii) to collect semiquantitative peptide/protein data that might validate and enrich our understanding of functional responses identified at the mRNA level, particularly with respect to the niche (profession) of CAZyme-rich B. cellulosilyticus WH2. Given the evolutionary relatedness of the strains involved, we expected that some fraction of observed PSMs from each sample would be of ambiguous origin due to nonunique peptides shared between species' proteomes. To assess which species might be most affected by this phenomenon when characterizing the metaproteome on different diets, we catalogued each strain's theoretical peptidome using an in silico tryptic digest. 
This simulated digest took into account both the potential for missed tryptic cleavages and the peptide mass range that could be detected using our methods. The results (Figure S8A, Table S8) demonstrated that for an artificial community of modest complexity, the proportion of peptides within each strain's theoretical peptidome that are “unique” (i.e., assignable to a single protein within the theoretical metaproteome) varies substantially from species to species, even among those that are closely related. We found the lone representative of the Actinobacteria in the artificial community, Collinsella aerofaciens, to have the highest proportion of unique peptides (94.2%), while B. caccae had the lowest (63.0%). Interestingly, there was not a strong correlation between the fraction of a species' peptides that were unique and the total number of unique peptides that species contributed to the theoretical peptidome. For example, C. aerofaciens (2,367 predicted protein-coding genes) contributed only 81,894 (1.5%) unique peptides, the lowest of any artificial community member evaluated, despite having a proteome composed of mostly unique peptides. On the other hand, B. cellulosilyticus WH2 (5,244 predicted protein-coding genes) contributed 241,473 (4.5%) unique peptides, the highest of any member despite a high fraction of nonunique peptides (18.4%) within its theoretical peptidome. The evolutionary relatedness of the Bacteroides components of the artificial community appeared to negatively affect our ability to assign their peptides to specific proteins; their six theoretical peptidomes had the six lowest uniqueness levels. However, their greater number of proteins and peptides relative to the Firmicutes and Actinobacteria more than compensated for this deficiency; over 60% of unique peptides within the unique theoretical metaproteome were contributed by the Bacteroides. 
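The in silico digest and the notion of a “unique” peptide described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the default of two missed cleavages, the "no cleavage before proline" rule, and the use of peptide length as a stand-in for the detectable mass range are all assumptions made for illustration.

```python
from collections import defaultdict


def tryptic_digest(sequence, missed_cleavages=2, min_len=6, max_len=40):
    """Return the set of tryptic peptides for one protein sequence.

    Assumptions (not taken from the paper): cleavage occurs after K or R
    unless the next residue is P, and peptide length (min_len..max_len)
    stands in for the detectable peptide mass range.
    """
    # Cleavage boundaries, always including both protein termini.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = set()
    for start in range(len(sites) - 1):
        for n_missed in range(missed_cleavages + 1):
            end = start + 1 + n_missed
            if end >= len(sites):
                break
            peptide = sequence[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


def unique_fraction_by_strain(proteomes, **digest_kwargs):
    """Fraction of each strain's peptides that map to exactly one protein.

    proteomes: {strain: {protein_id: sequence}} (hypothetical input layout).
    A peptide counts as unique only if a single protein in the whole
    theoretical metaproteome can produce it.
    """
    owners = defaultdict(set)  # peptide -> {(strain, protein_id), ...}
    for strain, proteins in proteomes.items():
        for protein_id, sequence in proteins.items():
            for peptide in tryptic_digest(sequence, **digest_kwargs):
                owners[peptide].add((strain, protein_id))

    fractions = {}
    for strain in proteomes:
        strain_peptides = [
            p for p, who in owners.items() if any(s == strain for s, _ in who)
        ]
        unique = [p for p in strain_peptides if len(owners[p]) == 1]
        fractions[strain] = (
            len(unique) / len(strain_peptides) if strain_peptides else 0.0
        )
    return fractions
```

Under this definition, a strain with many close relatives in the community can have a low unique fraction yet still contribute a large absolute number of unique peptides if its proteome is large, which is the pattern reported for the Bacteroides.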
We also found that the proportion of PSMs uniquely assignable to a single protein within the metaproteome varied significantly by function, suggesting that some classes of proteins can be traced back to specific microbes more readily than others. For example, when considering all theoretical peptides that could be derived from the proteome of a particular bacterial species, those from proteins with roles in categories with high expected levels of functional conservation (e.g., translation and nucleotide metabolism) were on average deemed unique more often than those from proteins with roles in functions we might expect to be less conserved (e.g., glycan biosynthesis and metabolism) (see Table S8 for a summary of how peptide uniqueness varied across different KEGG categories and pathways, and across different species in the experiment). However, even in KEGG categories and pathways with high expected levels of functional conservation, the vast majority of peptides were found to be unique when a particular species was not closely related to other members of the artificial community. Next, we determined the average number of proteins that could be experimentally identified in our samples for each microbial species within each treatment group in E1. The results (Figure S8B, Table S9) illustrate two important conclusions. First, although equal concentrations of total protein were evaluated for each sample, slightly less than twice as many total microbial proteins were identified in samples from the LF/HPP-fed mice as those from mice fed the HF/HS diet (4,659 versus 2,777, respectively). While there are a number of possible explanations, both this finding and the higher number of mouse proteins detected in samples from HF/HS-fed animals are consistent with the results of our fecal DNA analysis, which indicated that the HF/HS diet supports lower levels of gut microbial biomass than the LF/HPP diet (Figure S4A,B). 
Second, a breakdown of all detected microbial proteins by species of origin (Figure S8B) revealed that the degree to which we could inspect protein expression for a given species was dictated largely by its relative abundance and the diet to which it was exposed. Our ability to detect many of B. cellulosilyticus WH2's expressed transcripts and proteins in samples from both diet treatment groups allowed us to determine how well RNA and protein data for an abundant, active member of the artificial community might correlate. These data also allowed us to evaluate whether or not the types of genes considered might influence the degree of correlation between these two datasets. We first performed a spectral count-based correlation analysis on the diet-induced, log-transformed, average fold-differences in expression for all B. cellulosilyticus WH2 genes that were detectable at both the RNA and protein level for both diets. The results revealed a moderate degree of linear correlation between RNA and protein observations (Figure S8C, black plot; r = 0.53). However, subsequent division of these genes into functionally related subsets, which were each subjected to their own correlation analysis, revealed striking differences in the degree to which RNA-level and protein-level expression changes agreed with one another. For example, diet-induced changes in mRNA expression for genes involved in translation showed virtually no correlation with changes measured at the protein level (Figure S8C, red plot; r = 0.03). Correlations for other categories of B. cellulosilyticus WH2 genes, such as those involved in energy metabolism (Figure S8C, green plot; r = 0.36) and amino acid metabolism (Figure S8C, orange plot; r = 0.48), were also poorer than the correlation for the complete set of detectable genes. 
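The per-category correlation analysis described above can be sketched as follows. This is a hedged illustration and not the authors' code: the log base, the pseudocount, the minimum group size, and the input layout (one record per gene with precomputed log fold-differences) are assumptions.

```python
import math


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """Log2 ratio of two average expression values.

    The pseudocount (an assumption) avoids division by zero for genes
    with no signal on one diet.
    """
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_category(genes, min_genes=3):
    """Correlate RNA and protein log fold-differences overall and per category.

    genes: list of dicts with hypothetical keys 'category', 'rna_lfc',
    and 'protein_lfc'. Categories with fewer than min_genes are skipped.
    """
    groups = {"all": list(genes)}
    for gene in genes:
        groups.setdefault(gene["category"], []).append(gene)

    results = {}
    for name, members in groups.items():
        if len(members) < min_genes:
            continue
        results[name] = pearson_r(
            [m["rna_lfc"] for m in members],
            [m["protein_lfc"] for m in members],
        )
    return results
```

Splitting the gene set this way makes it visible when an overall moderate correlation hides categories that agree well and others that barely agree at all.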
In contrast, the correlation for the 110 genes with predicted involvement in carbohydrate metabolism was quite strong (Figure S8C, blue plot; r = 0.69), and was in fact the best correlation identified for any functional category of genes considered. The significant range of correlations observed in different categories of genes suggests that the degree to which RNA-based analyses provide an accurate picture of microbial adaptation to environmental perturbation may be strongly impacted by the functional classification of the genes involved. Additionally, these data further emphasize the need for enhanced dynamic range metaproteome measurements and better bioinformatic methods to assign/bin unique and nonunique peptides so that deeper and more thorough surveys of the microbial protein landscape can be performed and evaluated alongside more robust transcriptional datasets. Identifying Two Diet-Inducible, Xylanase-Containing PULs Whose Genetic Disruption Results in Diet-Specific Loss of Fitness Several of the most highly expressed and diet-sensitive B. cellulosilyticus WH2 genes in this study fell within two putative PULs. One PUL (BWH2_4044–55) contains 12 ORFs that include a dual susC/D cassette, three putative xylanases assigned to CAZy families GH8 and GH10, a putative multifunctional acetyl xylan esterase/α-L-fucosidase, and a putative hybrid two-component system regulator (Figure 4A). Gene expression within this PUL was markedly higher in mice consuming the plant polysaccharide-rich LF/HPP diet at both the mRNA and protein level. Our mRNA-level analysis disclosed that BWH2_4047 was the most highly expressed B. cellulosilyticus WH2 susD homolog on this diet. Likewise, BWH2_4046/4, the two susC-like genes within this PUL, were the 2nd and 4th most highly expressed B. cellulosilyticus WH2 susC-like genes in LF/HPP-fed animals, and exhibited expression level reductions of 99.5% and 93% in animals consuming the HF/HS diet. 
The same LF/HPP diet bias was observed for other genes within this PUL (Figures 2A and 4B) but not for neighboring genes. The same trends were obvious and amplified when we quantified protein expression (Figure 4C). In mice fed LF/HPP chow, only three B. cellulosilyticus WH2 SusC-like proteins had higher protein levels than BWH2_4044/6, and only two SusD-like proteins had higher levels than BWH2_4045/7. Strikingly, we were unable to detect a single peptide from 9 of the 12 proteins in this PUL in samples obtained from mice fed the HF/HS diet, emphasizing the strong diet specificity of this locus. Figure 4. Two xylanase-containing B. cellulosilyticus WH2 PULs demonstrating strong diet-specific expression patterns in vivo. (A) The PUL spanning BWH2_4044–55 includes a four-gene cassette comprising two consecutive susC/D pairs, multiple genes encoding GHs and CEs, and a gene encoding a putative hybrid two-component system (HTCS) presumed to play a role in the regulation of this locus. GH10 enzymes are endo-xylanases (most often endo-β-1,4-xylanases), while some GH5 and GH8 enzymes are also known to have endo- or exo-xylanase activity. CE6 enzymes are acetyl xylan esterases, as are some members of the CE1 family. A second PUL spanning BWH2_4072–6 contains a susC/D cassette, an endo-xylanase with dual GH10 modules as well as dual carbohydrate (xylan) binding modules (CBM22), a hypothetical protein of unknown function, and a putative HTCS. (B) Heatmap visualization of GeneChip expression data for BWH2_4044–55 and BWH2_4072–6 showing marked up-regulation of these putative PULs when mice are fed either a plant polysaccharide-rich LF/HPP diet or a diet high in fat and simple sugar (HF/HS), respectively. Data are from cecal contents harvested from mice at the endpoint of experiment E1. 
(C) Mass spectrometry-based quantitation of the abundance of all cecal proteins from the BWH2_4044–55 and BWH2_4072–6 PULs that were detectable in the same material used for GeneChip quantitation in panel (B). Bars represent results (mean ± SEM) from two technical runs per sample. For each MS run, the spectral counts for each protein were normalized against the total number of B. cellulosilyticus WH2 spectra acquired. (D) Comparison of in vivo PUL gene expression as measured by RNA-Seq (top) and the degree to which disruption of each gene within each PUL by a transposon impacts the fitness of B. cellulosilyticus WH2 on each diet, as measured by insertion sequencing (INSeq, bottom). For the lower plots, fitness measurements were calculated by dividing a mutant's representation (normalized sequencing counts) within the fecal output population by its representation within an input population that was introduced into germ-free animals via a single oral gavage along with other members of the artificial community. For cases in which no instances of a particular mutant could be measured in the fecal output (resulting in a fitness calculation denominator of zero), data are plotted as “<0.01” and are drawn without error bars. https://doi.org/10.1371/journal.pbio.1001637.g004 A second PUL in the B. cellulosilyticus WH2 genome composed of a susC/D-like pair (BWH2_4074/5), a putative hybrid two-component system regulator (BWH2_4076), and a xylanase (GH10) with dual carbohydrate binding module domains (CBM22) (BWH2_4072) (Figure 4A) demonstrated a strong but opposite diet bias, in this case exhibiting significantly higher expression in animals consuming the HF/HS “Western”-like diet. Our mRNA-level analysis showed that this xylanase was the second most highly expressed B. cellulosilyticus WH2 CAZyme in animals consuming this diet (Figure 2A). 
As with the previously described PUL, shotgun metaproteomics validated the transcriptional analysis (Figure 4B,C); with the exception of the gene encoding the PUL's presumed transcriptional regulator (BWH2_4076), diet specificity was substantial, with protein-level fold changes ranging from 10–33 across the locus (Table S10). Recent work by Cann and co-workers has done much to advance our understanding of the regulation and metabolic role of xylan utilization system gene clusters in xylanolytic members of the Bacteroidetes, particularly within the genus Prevotella [41]. The “core” gene cluster of the prototypical xylan utilization system they described consists of two tandem repeats of susC/susD homologs (xusA/B/C/D), a downstream hypothetical gene (xusE) and a GH10 endoxylanase (xyn10C). The 12-gene PUL identified in our study (BWH2_4044–55) appears to contain the only instance of this core gene cluster within the B. cellulosilyticus WH2 genome, suggesting that this PUL, induced during consumption of a plant polysaccharide-rich diet, is likely to be the primary xylan utilization system within this organism. A recent study characterizing the carbohydrate utilization capabilities of B. ovatus ATCC 8483 also identified two PULs involved in xylan utilization (BACOVA_04385–94, BACOVA_03417–50) whose gene configurations differ from those described in Prevotella spp. [25]. Interestingly, the five proteins encoded by the smaller xylanase-containing PUL described above (BWH2_4072–6) are homologous to the products of the last five genes in BACOVA_4385–94 (i.e., BACOVA_4390–4). The order of these five genes in these two loci is also identical. The similarities and differences observed when comparing the putative xylan utilization systems encoded within the genomes of different Bacteroidetes illustrate how its members may have evolved differentiated strategies for utilizing hemicelluloses like xylan. 
Having established that expression of BWH2_4044–55 and BWH2_4072–6 is strongly dictated by diet, we next sought to determine if these PULs are required by B. cellulosilyticus WH2 for fitness in vivo. A follow-up study was performed in which mice were fed either a LF/HPP or HF/HS diet after being colonized with an artificial community similar to the one used in E1 and E2 (see Materials and Methods). The wild-type B. cellulosilyticus WH2 strain used in our previous experiments was replaced with a transposon mutant library consisting of over 90,000 distinct transposon insertion mutants in 91.5% of all predicted ORFs (average of 13.9 distinct insertion mutants per ORF). The library was constructed using methods similar to those reported by Goodman et al. ([42]; see Materials and Methods) so that the relative proportion of each insertion mutant in both the input (orally gavaged) and output (fecal) populations could be determined using insertion sequencing (INSeq). The INSeq results revealed clear, diet-specific losses of fitness when components of these loci were disrupted (Figure 4D). Additionally, as observed in E1 and E2, expression of each PUL was strongly biased by diet, with genes BWH2_4072–5 demonstrating up-regulation on the HF/HS diet and BWH2_4044–55 on the LF/HPP diet. The extent to which a gene's disruption impacted the fitness of B. cellulosilyticus WH2 on one diet or the other correlated well with whether or not that gene was highly expressed on a given diet. For example, four of the five most highly expressed genes in the BWH2_4044–55 locus were the four genes shown to be most crucial for fitness on the LF/HPP diet. Of these four genes, three were susC or susD homologs (the fourth was the putative endo-1,4-β-xylanase thought to constitute the last element of the xylan utilization system core). 
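The INSeq fitness measure used here (a mutant's normalized representation in the fecal output divided by its normalized representation in the gavaged input, as described in the Figure 4D legend) can be sketched in Python. This is a simplified illustration under stated assumptions: read counts are normalized to simple proportions of the library total, and the function names and the returned `(ratio, detected)` layout are hypothetical.

```python
def normalize_counts(counts):
    """Convert raw per-mutant read counts to proportions of the library total.

    Proportional normalization is an assumption; the paper states only
    that normalized sequencing counts were used.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {mutant: n / total for mutant, n in counts.items()}


def fitness_ratios(input_counts, output_counts):
    """Return {mutant: (output/input ratio, detected_in_output)}.

    Mutants absent from the output get a ratio of 0.0 and detected=False,
    corresponding to the cases plotted as "<0.01" in the figure.
    Mutants with no input reads are skipped because no ratio is defined.
    """
    input_frac = normalize_counts(input_counts)
    output_frac = normalize_counts(output_counts)

    ratios = {}
    for mutant, in_frac in input_frac.items():
        if in_frac == 0:
            continue
        out_frac = output_frac.get(mutant, 0.0)
        ratios[mutant] = (out_frac / in_frac, out_frac > 0)
    return ratios
```

A ratio near 1 suggests that the insertion is roughly neutral on that diet, while values well below 1 indicate a diet-specific fitness cost for the disrupted gene.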
Though the fitness cost of disrupting genes within BWH2_4044–55 varied from gene to gene, disruption of any one component of the BWH2_4072–6 PUL had serious consequences for B. cellulosilyticus WH2 in animals fed the HF/HS diet. This could suggest that while disruption of some components of the BWH2_4044–55 locus can be rescued by similar or redundant functions elsewhere in the genome, the same is not true for BWH2_4072–5. Notably, disruption of BWH2_4076, which is predicted to encode a hybrid two-component regulatory system, had negative consequences on either diet tested, indicating that regulation of this locus is crucial even when the PUL is not actively expressed. While many genes outside of these two PULs were also found to be important for the in vivo fitness of B. cellulosilyticus WH2, those within these PULs were among the most essential to diet-specific fitness, suggesting that these loci are central to the metabolic lifestyle of B. cellulosilyticus WH2 in the gut. Characterizing the Carbohydrate Utilization Capabilities of B. cellulosilyticus WH2 and B. caccae The results described in the preceding section indicate that B. cellulosilyticus WH2 prioritizes xylan as a nutrient source in the gut and that it tightly regulates the expression of its xylan utilization machinery. Moreover, the extraordinary number of putative CAZymes and PULs within the B. cellulosilyticus WH2 genome suggests that it is capable of growing on carbohydrates with diverse structures and varying degrees of polymerization. To characterize its carbohydrate utilization capabilities, we defined its growth in minimal medium (MM) supplemented with one of 46 different carbohydrates [25]. Three independent growths, each consisting of two technical replications, yielded a total of six growth curves for each substrate. Of the 46 substrates tested, B. 
cellulosilyticus WH2 grew on 39 (Table S11); they encompassed numerous pectins (6 of 6), hemicelluloses/β-glucans (8 of 8), starches/fructans/α-glucans (6 of 6), and simple sugars (14 of 15), as well as host-derived glycans (4 of 7) and one cellooligosaccharide (cellobiose). The seven substrates that did not support growth included three esoteric carbohydrates (carrageenan, porphyran, and alginic acid), the simple sugar N-acetylneuraminic acid, two host glycans (keratan sulfate and mucin O-glycans), and fungal cell wall-derived α-mannan. B. cellulosilyticus WH2 clearly grew more robustly on some carbohydrates than others. Excluding simple sugars, fastest growth was achieved on dextran (0.099±0.048 OD600 units/h), laminarin (0.095±0.014), pectic galactan (0.088±0.018), pullulan (0.088±0.026), and amylopectin (0.085±0.003). Although one study has reported that the type strain of B. cellulosilyticus degrades cellulose [43], the WH2 strain failed to demonstrate any growth on MM plus cellulose (specifically, Solka-Floc 200 FCC from International Fiber Corp.) after 5 d. Maximum cell density was achieved with amylopectin (1.17±0.02 OD600 units), dextran (1.12±0.20), cellobiose (1.09±0.08), laminarin (1.08±0.08), and xyloglucan (0.99±0.04). Total B. cellulosilyticus WH2 growth (i.e., maximum cell density achieved) on host-derived glycans was typically very poor, with only two substrates achieving total growth above 0.2 OD600 units (chondroitin sulfate, 0.50±0.04; glycogen, 0.99±0.02). The disparity between total growth on plant polysaccharides versus host-derived glycans, including O-glycans that are prevalent in host mucin, indicates a preference for diet-derived saccharides, consistent with our in vivo mRNA and protein expression data. We also determined how the growth rate of B. cellulosilyticus WH2 on these substrates compared to the growth rates for other prominent gut Bacteroides spp. After subjecting B. caccae to the same phenotypic characterization as B. 
cellulosilyticus WH2, we combined our measurements for these two strains with previously published measurements for B. thetaiotaomicron and B. ovatus [25]. The results underscored the competitive growth advantage B. cellulosilyticus WH2 likely enjoys when foraging for polysaccharides in the intestinal lumen. For example, of the eight hemicelluloses and β-glucans tested in our carbohydrate panel, B. cellulosilyticus WH2 grew fastest on six while B. ovatus grew fastest on two (Table S11). B. caccae and B. thetaiotaomicron, on the other hand, failed to grow on any of these substrates. Across all the carbohydrates for which data are available for all four species, B. cellulosilyticus WH2 grew fastest on the greatest number of polysaccharides (11 of 26) and tied with B. caccae for the greatest number of monosaccharides (6 of 15). B. thetaiotaomicron and B. caccae did, however, outperform the other two Bacteroides tested with respect to utilization of host glycans in vitro, demonstrating superior growth rates on four of five substrates tested (Table S11). B. cellulosilyticus WH2's rapid growth to high densities on xylan, arabinoxylan, and xyloglucan, as well as xylose, arabinose, and galactose, is noteworthy given our prediction that two of its most tightly regulated, highly expressed PULs appear to be involved in the utilization of xylan, arabinoxylan, or some closely related polysaccharide. To identify specific mono- and/or polysaccharides capable of triggering the activation of these two PULs, as well as the 111 other putative PULs within the B. cellulosilyticus WH2 genome, we used RNA-Seq to characterize its transcriptional profile at mid-log phase in MM (Table S12) plus one of 16 simple sugars or one of 15 complex sugars (Table S13) (see Materials and Methods; n = 2–3 cultures/substrate; 5.2–14.0 million raw Illumina HiSeq reads generated for each of the 90 transcriptomes). After mapping each read to the B. 
cellulosilyticus WH2 reference gene set, counts were normalized using DESeq to allow for direct comparisons across samples and conditions. Hierarchical clustering of the normalized dataset resulted in a well-ordered dendrogram in which samples clustered almost perfectly by the carbohydrate on which B. cellulosilyticus WH2 was grown (Figure 5A). The consistency of this clustering illustrates that (i) technical replicates within each condition exhibit strong correlations with one another, suggesting any differences between cultures in a treatment group (e.g., small differences in density or growth phase) had at most minor effects on aggregate gene expression, and (ii) growth on different carbohydrates results in distinct, substrate-specific gene expression signals capable of driving highly discriminatory differences between treatment groups. The application of rigorous bootstrapping to our hierarchical clustering results also revealed several cases of higher level clusters in which strong confidence was achieved. These dendrogram nodes (illustrated as white circles) indicate sets of growth conditions that yield gene expression patterns more like each other than like the patterns observed for other substrates. Two notable examples were xylan/arabinoxylan (which are structurally related and share the same xylan backbone) and L-fucose/L-rhamnose (which are known to be metabolized via parallel pathways in E. coli [44]). Figure 5. In vitro microbial RNA-Seq profiling of B. cellulosilyticus WH2 during growth on different carbohydrates. (A) Hierarchical clustering of the gene expression profiles of 90 cultures grown in minimal medium supplemented with one of 31 simple or complex sugars (n = 2–3 replicates per condition). Circles at dendrogram branch points identify clusters with strong bootstrapping support (>95%; 10,000 repetitions). 
Solid circles denote clusters comprising only replicates from a single treatment group/carbohydrate, while open circles denote higher level clusters comprising samples from multiple treatment groups. Colored rectangles indicate the type of carbohydrate on which the samples within each cluster were grown. (B) Unclustered heatmap representation of fold-changes in gene expression relative to growth on minimal medium plus glucose (MM-Glc) for 60 of the 236 paired susC- and susD-like genes identified within the B. cellulosilyticus WH2 genome (for a full list of all paired and unpaired susC and susD homologs, see Table S2). Data shown are limited to those genes whose expression on at least one of the 31 carbohydrates tested demonstrated a >100-fold increase relative to growth on MM-Glc for at least one of the replicates within the treatment group. Yellow boxes denote areas of the map where both genes in a susC/D pair were up-regulated >100-fold for at least two of the replicates in a treatment group and where the average up-regulation for each gene in the pair was >100-fold across all replicates of the treatment group. Two sets of columns to the right of the heatmap indicate PULs that were detectably expressed at the mRNA level (left set of columns) and/or protein level (right set of columns) in experiment 1 (E1). Red and black circles indicate that both genes in a susC/D pair were consistently expressed on a particular diet, as determined by GeneChip analysis of cecal RNA (≥5 of 7 animals assayed) or LC-MS/MS analysis of cecal protein (2 of 2 animals assayed). In both cases, a red circle denotes significantly higher expression on one diet compared to the other. https://doi.org/10.1371/journal.pbio.1001637.g005 Importantly, these findings suggested that by considering in vitro profiling data alongside in vivo expression data from the artificial community, it might be possible to identify the particular carbohydrates to which B. 
cellulosilyticus WH2 is exposed and responding within its gut environment. To explore this concept further, we compared expression of each gene in each condition to its expression on our control treatment, MM plus glucose (MM-Glc). The results revealed a dynamic PUL activation network in which some PULs were activated by a single substrate, some were activated by multiple substrates, and some were transcriptionally silent across all conditions tested. Of the 118 putative susC/D pairs in B. cellulosilyticus WH2 that we have used as markers of PULs, 30 were dramatically activated on one or more of the substrates tested; in these cases, both the susC- and susD-like genes in the cassette were up-regulated an average of >100-fold relative to MM-Glc across all technical replicates (Figure 5B). At least one susC/D activation signature was identified for every one of the 17 oligosaccharides and polysaccharides and for six of the 13 monosaccharides tested (Table S14). The lack of carbohydrate-specific PUL activation events for some monosaccharides (fructose, galactose, glucuronic acid, sucrose, and xylose) was expected, given that these loci are primarily dedicated to polysaccharide acquisition. Further inspection of gene expression outside of PULs disclosed that B. cellulosilyticus WH2 prioritizes use of its non-PUL-associated carbohydrate machinery, such as putative phosphotransferase system (PTS) components and monosaccharide permeases, when grown on these monosaccharides (Table S14). Several carbohydrates activated the expression of multiple PULs. Growth on water-soluble xylan and wheat arabinoxylan produced significant up-regulation of five susC/D-like pairs (BWH2_0865/6, 0867/8, 4044/5, 4046/7, and 4074/5). No other substrate tested activated as many loci within the genome, again hinting at the importance of xylan and arabinoxylan to this strain's metabolic strategy in vivo. 
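The activation criterion applied to susC/D pairs (both genes up-regulated an average of >100-fold relative to MM-Glc across replicates) can be expressed as a short Python sketch. It assumes that normalized counts are already available; the input layout, the pseudocount, and the function name are illustrative assumptions, not the study's actual analysis code.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def activated_pairs(expr, pairs, control="MM-Glc", threshold=100.0,
                    pseudocount=1.0):
    """Flag susC/D pairs activated on each carbohydrate.

    expr:  {condition: {gene: [normalized count per replicate]}}
           (hypothetical layout)
    pairs: list of (susC_gene, susD_gene) identifiers
    A pair is called activated on a condition when the average
    fold-change of BOTH genes versus the control mean exceeds the
    threshold. The pseudocount (an assumption) guards against division
    by zero for genes that are silent on the control.
    """
    activated = {}
    for condition, genes in expr.items():
        if condition == control:
            continue
        hits = []
        for susc, susd in pairs:
            pair_ok = True
            for gene in (susc, susd):
                baseline = mean(expr[control][gene]) + pseudocount
                folds = [(v + pseudocount) / baseline for v in genes[gene]]
                if mean(folds) <= threshold:
                    pair_ok = False
            if pair_ok:
                hits.append((susc, susd))
        activated[condition] = hits
    return activated
```

Requiring both members of a pair to pass the threshold keeps a single strongly induced gene from marking an entire locus as activated.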
Cecal expression data from E1 showed that 15 of these activated PULs were expressed in vivo on one or both of the diets tested (see circles to the right of the heatmap in Figure 5B). In mice fed the polysaccharide-rich LF/HPP chow, B. cellulosilyticus WH2 up-regulates three susC/D pairs (BWH2_2717/8, 4044/5, 4046/7) whose expression is activated in vitro by arabinan and xylan/arabinoxylan. The three most significantly up-regulated susC/D pairs (BWH2_1736/7, 2514/5, 4074/5) in mice fed the HF/HS diet rich in sugar, corn starch, and maltodextrin are activated in vitro by amylopectin, ribose, and xylan/arabinoxylan, respectively. All three PULs identified as being up-regulated at the RNA level in LF/HPP-fed mice were also found to be up-regulated at the protein level (Figure 5B). Two of the three PULs up-regulated at the mRNA level in HF/HS-fed mice were up-regulated at the protein level as well. The presence of an amylopectin-activated PUL among these two loci is noteworthy, given the significant amount of starch present in the HF/HS diet. The up-regulation of four other PULs in HF/HS-fed animals was only evident in our LC-MS/MS data, reinforcing the notion that protein data both complement and supplement mRNA data when profiling microbes of interest. Two of the five susC/D pairs activated by xylan/arabinoxylan form the four-gene cassette in the previously discussed PUL comprising BWH2_4044–55 that is activated in mice fed the plant polysaccharide-rich chow (see Figure 4A). Another one of the five is the susC/D pair found in the PUL comprising BWH2_4072–6 that is activated in mice fed the HF/HS “Western”-like chow (see Figure 4A). Thus, we have identified a pair of putative PULs in close proximity to one another on the B. 
cellulosilyticus WH2 genome that encode CAZymes with similar predicted functions, are subject to near-identical levels of specific activation by the same two polysaccharides (i.e., xylan, arabinoxylan) in vitro, but are discordantly regulated in vivo in a diet-specific manner. The highly expressed nature of these PULs in the diet environment where they are active, their shared emphasis on xylan/arabinoxylan utilization, and their tight regulation indicate that they are likely to be important for the in vivo success of this organism in the two nutrient environments tested. However, the reasons for their discordant regulation are unclear. One possibility is that in addition to being activated by xylan/arabinoxylan and related polysaccharides, these loci are also subject to repression by other substrates present in the lumen of the gut, and this repression is sufficient to block activation. Alternatively, the specific activators of each PUL may be molecular moieties shared by both xylan and arabinoxylan that do not co-occur in the lumenal environment when mice are fed the diets tested in this study. Prospectus Elucidating generalizable “rules” for how microbiota operate under different environmental conditions is a substantial challenge. As our appreciation for the importance of the gut microbiota in human health and well-being grows, so too does our need to develop such rules using tractable experimental models of the gut ecosystem that allow us to move back and forth between in vivo and ex vivo analyses, using one to inform the other. Here, we have demonstrated the extent to which high-resolution DNA-, mRNA-, and protein-level analyses can be applied (and integrated) to study an artificial community of sequenced human gut microbes colonizing gnotobiotic mice. 
Our efforts have focused on characterizing community-level and species-level adaptation to dietary change over time and “leveraging” results obtained from in vitro assessments of individual species' responses to a panel of purified carbohydrates to deduce glycan exposures and consumption strategies in vivo. This experimental paradigm could be applied to any number of questions related to microbe–microbe, environment–microbe, and host–microbe interactions, including, for example, the metabolic fate of particular nutrients of interest (metabolic flux experiments), microbial succession, and biotransformations of xenobiotics. Studying artificial human gut microbial communities in gnotobiotic mice also allows us to evaluate the technical limitations of current molecular approaches for characterizing native communities. For example, the structure of an artificial community can be evaluated over time at low cost using short read shotgun DNA sequencing data mapped to all microbial genomes within the community (COPRO-Seq). This allows for a much greater depth of sequencing coverage (i.e., more sensitivity) and much less ambiguity in the assignment of reads to particular taxa than traditional 16S rRNA gene-based sequencing. Short read cDNA sequences transcribed from total microbial community RNA can also often be assigned to the exact species and gene from which they were derived, and the same is also often true for peptides derived from particular bacterial proteins. However, substantial dynamic range in species/transcript/protein abundance within any microbiota, defined or otherwise, imposes limits on our ability to characterize the least abundant elements of these systems. The effort to obtain a more complete understanding of the operations and behaviors of minor components of the microbiota is an area deserving of significant attention, given known examples of low-abundance taxa that play key roles within their larger communities and in host physiology [2],[45]. 
Developing such an understanding requires methods and assays that are collectively capable of assessing the structure and function of a microbiota at multiple levels of resolution. The need for high sensitivity and specificity in these approaches will become increasingly relevant as we transition towards experiments involving defined communities of even greater complexity, including bacterial culture collections prepared from the fecal microbiota of humans [46]. We anticipate that the study of sequenced culture collections transplanted to gnotobiotic mice will be instrumental in determining the degree to which physiologic or pathologic host phenotypes can be ascribed to the microbiota as well as specific constituent taxa. The recent development of a low-error 16S ribosomal RNA amplicon sequencing method (LEA-Seq) and the application of this method to the fecal microbiota of 37 healthy adults followed for up to 5 years indicated that individuals in this cohort contained 195±48 bacterial strains representing 101±27 species [47]. Furthermore, stability follows a power-law function, suggesting that once acquired, most gut strains in a person are present for decades. New advances in the culturing of fastidious gut microbes may one day allow us to capture most (or all) of the taxonomic and functional diversity present within an individual's fecal microbiota as a clonally arrayed, sequenced culture collection, providing a perfectly representative and defined experimental model of their gut community. In the meantime, first-generation artificial communities of modest complexity such as the one described here offer a way of studying many questions related to the microbiota. However, the limited complexity and composition of our 12-species artificial community, and the way in which it was assembled in germ-free mice, make it an imperfect model of more complex human microbiota. 
Native microbial communities, for example, are subject to the influence of variables that are notably absent from this system, such as intraspecies genetic variability and exogenous microbial inputs. There are also taxa (e.g., Proteobacteria, Bifidobacteria) and microbial guilds (e.g., butyrate producers) typical of human gut communities that are absent from our defined assemblage that could be used to augment this system in order to improve our understanding of how their presence/absence influences a microbiota's response to diet and a spectrum of other variables of interest. These future attempts to systematically increase complexity should reveal what trends, patterns, and trajectories observed in artificial assemblages such as the one reported here map or do not map onto natural communities. Finally, one of the greatest advantages of studying defined assemblages in mice is that they afford us the ability to interrogate the biology of key bacterial species in a focused manner. The artificial community we used in our experiments included B. cellulosilyticus WH2, a species that warrants further study as a model gut symbiont given its exceptional carbohydrate utilization capabilities, its apparent fitness advantage over many other previously characterized gut symbionts, and its genetic tractability. This genetic tractability should facilitate future experiments in which transposon mutant libraries are screened in vivo as one component of a larger artificial community in order to identify this strain's most important fitness determinants under a wide variety of dietary conditions. Identifying the genetic elements that allow B. cellulosilyticus to persist at the relatively high levels observed, regardless of diet, should provide microbiologists and synthetic biologists with new “standard biological parts” that will aid them in developing the next generation of prebiotics, probiotics, and synbiotics. 
Sequencing the Bacteroides cellulosilyticus WH2 Genome

Though at least eight complete and 68 draft genomes of Bacteroides spp. are currently available [27], there are numerous examples of distinct clades within this genus where little genomic information exists. To further explore the genome space of one such clade, we obtained a human fecal isolate whose four 16S rRNA gene sequences indicate a close relationship to Bacteroides cellulosilyticus (Figure S1A,B). The genome of this isolate, which we have designated B. cellulosilyticus WH2, was sequenced deeply, yielding a high-quality draft assembly (23 contigs with an N50 value of 798,728 bp; total length of all contigs in the assembly, 7.1 Mb; Table S1). Annotation of its 5,244 predicted protein-coding genes using the carbohydrate active enzyme (CAZy) database [28] revealed an extraordinary complement of 503 CAZymes comprising 373 GHs, 23 PLs, 28 carbohydrate esterases (CEs), and 84 glycosyltransferases (GTs) (see Table S2 for all annotated genes in the B. cellulosilyticus WH2 genome predicted to have relevance to carbohydrate metabolism). One distinguishing feature of gut Bacteroides genomes is the substantial number of CAZymes they encode relative to those of other intestinal bacteria [29]. The B. cellulosilyticus WH2 CAZome is enriched in a number of GH families even when compared with prominent representatives of the gut Bacteroidetes (Figure S2A). When we expanded this comparison to include all 86 Bacteroidetes in the CAZy database, we found that the B. cellulosilyticus WH2 genome had the greatest number of genes for 19 different GH families, as well as genes from two GH families that had not previously been observed within a Bacteroidetes genome (Figure S2B). Altogether, B. cellulosilyticus WH2 has more GH genes at its disposal than any other Bacteroidetes species analyzed to date. In Bacteroides spp., CAZymes are often located within PULs [30]. 
At a minimum, a typical PUL harbors a pair of genes with significant homology to the susC and susD genes of the starch utilization system (Sus) in B. thetaiotaomicron [30]–[32]. Other genes encoding enzymes capable of liberating oligo- and monosaccharides from a larger polysaccharide are also frequently present. The susC- and susD-like genes of these loci encode the proteins that comprise the main outer membrane binding and transport apparatus and thus represent key elements of these systems. A search of the B. cellulosilyticus WH2 genome for genes with strong homology to the susC- and susD-like genes in B. thetaiotaomicron VPI-5482 revealed an unprecedented number of susC/D pairs (a total of 118). Studies of other prominent Bacteroides spp. have found that the evolutionary expansion of these genes has played an important role in endowing the Bacteroides with the ability to degrade a wide range of host- and plant-derived polysaccharides [25],[33]. Analysis of deeply sampled adult human gut microbiota datasets indicates that B. cellulosilyticus strains are common, colonizing approximately 77% of 124 adult Europeans characterized in one study [18] and 62% of 139 individuals living in the United States examined in another survey [20]. We hypothesized that the apparent success of B. cellulosilyticus in the gut is derived in part from its substantial arsenal of genes involved in carbohydrate utilization.

Measuring Changes in the Structural Configuration of a 12-Member Model Microbiota in Response to a Dietary Perturbation

To test the fitness of B. cellulosilyticus WH2 in relation to other prominent gut symbionts, and the importance of diet on its fitness, we carried out an experiment in gnotobiotic mice (experiment 1, “E1,” Figure S3). Two groups of 10–12-wk-old male germ-free C57BL/6J animals were moved to individual cages within gnotobiotic isolators (n = 7 animals/group). 
At day zero, each animal was colonized by oral gavage with an artificial community comprising 12 human gut bacterial species (Figure 1A, Table S3). Each species chosen for inclusion in this microbial assemblage met four criteria: (i) it was a member of one of three bacterial phyla routinely found in the human gut (i.e., Bacteroidetes, Firmicutes, or Actinobacteria), (ii) it was identified as a prominent member of the human gut microbiota in previous culture-independent surveys, (iii) it could be grown in the laboratory, and (iv) its genome had been sequenced to at least a high-quality draft level. Species were also selected for their functional attributes (as judged by their annotated gene content) in an effort to create an artificial community that was somewhat representative of a more complex human microbiota. For example, although more than half of the species in the assemblage were Bacteroidetes predicted to excel at the breakdown of polysaccharides, several were also prominent inhabitants of the human gut that are thought to have limited carbohydrate utilization capabilities (e.g., Firmicutes from Clostridium cluster XIVa). Some attributes for the 12 strains included in the artificial community are provided in Table S4.

Figure 1. COPRO-Seq analysis of the structure of a 12-member artificial human gut microbial community as a function of diet and time. (A) The 12 bacterial species comprising the artificial community. (B) Principal coordinates analysis (PCoA) was applied to relative abundance data generated by COPRO-Seq from two experiments (E1, E2), each spanning 6 wk. Following colonization (day 0), mice were switched between two different diets at 2-wk intervals as described in Figure S3. COPRO-Seq data from E1 and E2 were ordinated in the same multidimensional space. For clarity, only data from E2 are shown here (for the E1 PCoA plot, see Figure S5A). 
Red/blue, feces; pink/cyan, cecal contents. (C) Proportional abundance data from E1 illustrating the impact of diet on fecal levels of a diet-sensitive strain with higher representation on HF/HS chow (B. caccae), a diet-sensitive strain with higher representation on LF/HPP chow (B. ovatus), a diet-insensitive strain with no obvious diet preference (B. thetaiotaomicron), and a diet-sensitive strain with a preference for the LF/HPP diet that also achieves a high level of representation on the HF/HS diet (B. cellulosilyticus WH2). Mean values ± SEM are shown. Plots illustrating changes in abundance over time for all species in both experiments are provided in Figure S4C. https://doi.org/10.1371/journal.pbio.1001637.g001

For 2 wk, each treatment group was fed either a standard low-fat/high-plant polysaccharide (LF/HPP) mouse chow or a “Western”-like diet where calories are largely derived from fat, starch, and simple sugars (high-fat/high-sugar (HF/HS)) [12]. Over the course of 6 wk, diets were changed twice at 2-wk intervals, such that each group began and ended on the same diet, with an intervening 2-wk period during which the other diet was administered (Figure S3). Using fecal DNA as a proxy for microbial biomass, we found that the plant polysaccharide-rich LF/HPP diet supported 2- to 3-fold more total bacterial growth (primary productivity) despite its lower caloric density (3.7 kcal/g versus 4.5 kcal/g for the HF/HS diet; Figure S4A). The HF/HS diet contains carbohydrates that are easily metabolized and absorbed in the proximal intestine (sucrose, corn starch, and maltodextrin), with cellulose being the one exception (4% of the diet by weight versus 46.3% for the other carbohydrate sources). Thus, in mice fed the HF/HS diet, diet-derived simple sugars are likely to be rare in the distal gut where the vast majority of gut microbes reside; this may provide an advantage to those bacteria capable of utilizing other carbon sources (e.g., proteins/oligopeptides, host glycans). 
In mice fed the LF/HPP diet, on the other hand, plant polysaccharides that are indigestible by the host should provide a plentiful source of energy for saccharolytic members of the artificial community. To evaluate the impact of each initial diet and subsequent diet switch on the structural configuration of the artificial community, we performed shotgun sequencing (community profiling by sequencing; COPRO-Seq) [11] of DNA isolated from fecal samples collected throughout the course of the experiment, as well as cecal contents collected at sacrifice. The relative abundances of the species in each sample (defined by the number of sequencing reads that could be unambiguously assigned to each microbial genome after adjusting for genome uniqueness) were subjected to ordination by principal coordinates analysis (PCoA) (Figure S5A). As expected, diet was found to be the predominant explanatory variable for observed variance (see separation along principal coordinate 1, “PC1,” which accounts for 52% of variance). The overall structure of the artificial community achieved quasi-equilibrium before the midpoint of the first diet phase, as evidenced by the lack of any significant movement along PC1 after day five. A structural reconfiguration also took place over the course of ∼5 d following transition to the second diet phase. Notably, the two treatment groups underwent a near-perfect inversion in their positions along PC1 after the first diet switch; the artificial community in animals switched from a LF/HPP to HF/HS diet took on a structure like that which arose by the end of the first diet phase in animals consuming the HF/HS diet, and vice versa. The second diet switch from phase 2 to 3 resulted in a similar movement along PC1 in the opposite direction, indicating a reversion of the artificial community's configuration to its originally assembled structure in each treatment group. 
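As a rough illustration of how relative abundances of this kind can be derived, the sketch below converts per-genome counts of unambiguously assigned reads into percentages after scaling each count by a genome "uniqueness" factor. The function name, the species labels, the read counts, and the specific form of the adjustment (dividing each count by the fraction of that genome able to receive unambiguous reads) are all assumptions made for illustration; the text does not spell out the exact correction applied in COPRO-Seq.

```python
def relative_abundance(unique_read_counts, unique_fraction):
    """Turn unambiguous read counts into relative abundances (percent).

    unique_read_counts: dict species -> reads assigned unambiguously to that genome.
    unique_fraction:    dict species -> fraction (0, 1] of the genome that can
                        receive unambiguous reads (hypothetical adjustment factor).
    """
    adjusted = {}
    for species, count in unique_read_counts.items():
        frac = unique_fraction[species]
        if not 0 < frac <= 1:
            raise ValueError(f"uniqueness fraction for {species} must be in (0, 1]")
        # Scale up counts for genomes in which fewer positions are uniquely mappable.
        adjusted[species] = count / frac
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads were assigned to any genome")
    return {species: 100.0 * value / total for species, value in adjusted.items()}


# Invented counts and uniqueness fractions, for illustration only.
counts = {"species A": 3000, "species B": 1000, "species C": 1000}
fractions = {"species A": 1.0, "species B": 0.5, "species C": 1.0}
abundance = relative_abundance(counts, fractions)
```

With these invented inputs, species B is credited with twice its raw count because only half of its genome is treated as uniquely mappable, so the three species end up at one half, one third, and one sixth of the community.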
These results, in addition to demonstrating the significant impact of these two diets on the structure of this 12-member artificial human gut community, also suggest that an assemblage of this size is capable of demonstrating resilience in the face of substantial diet perturbations. The assembly process and observed diet-induced reconfigurations also proved to be highly reproducible as evidenced by COPRO-Seq results from a replication of E1 (experiment 2, “E2”). In this follow-up experiment, fecal samples were collected more frequently than in E1, providing a dataset with improved temporal resolution. Ordination of E2 COPRO-Seq data by PCoA showed that (i) for each treatment group in E2, the artificial community assembles in a manner similar to its counterpart in E1; (ii) structural reconfigurations in response to diet occur with the same timing as in E1; and (iii) the quasi-equilibria achieved during each diet phase are highly similar between experiments for each treatment group (compare Figures 1B and S5A). As in E1, cecal data for each E2 treatment group overlap with their corresponding fecal samples, and DNA yields from E2 fecal samples vary substantially as a function of host diet (Figure S4B). COPRO-Seq provides precise measurements of the proportional abundance of each member species present in the artificial community. Data collected in both E1 and E2 (Table S5) revealed significant differences between members in terms of the maximum abundance levels they achieved, the rates at which their abundance levels were impacted by diet shifts, and the degree to which each species demonstrated a preference for one diet over another (Figure S4C). Changes in each species' abundance over time replicated well across animals in each treatment group, suggesting the assembly process and diet-induced reconfigurations occur in an orderly, rules-based fashion and with minimal stochasticity in this artificial community. 
A species' relative abundance immediately after colonization (i.e., 24 h after gavage/day 1) was, in general, a poor predictor of its abundance at the end of the first diet phase (i.e., day 13) (E1 R² = 0.23; E2 R² = 0.27), suggesting that early dominance of the founder population was not strongly tied to relative success in the assembly process. In mice initially fed a HF/HS diet, four Bacteroides spp. (Bacteroides caccae, B. cellulosilyticus WH2, B. thetaiotaomicron, and Bacteroides vulgatus) each achieved a relative abundance of ≥10% by the end of the first diet phase (day 13 postgavage), with B. caccae attaining the highest levels (37.1±4.9% and 34.2±5.5%; group mean ± SD in E1 and E2, respectively). In animals fed the plant polysaccharide-rich LF/HPP chow during the first diet phase, B. cellulosilyticus WH2 was dominant, achieving levels of 37.1±2.0% (E1) and 41.6±3.9% (E2) by day 13. B. thetaiotaomicron and B. vulgatus also attained relative abundances of >10%. Changes in diet often resulted in rapid, dramatic changes in a species' proportional representation. Because the dynamic range of abundance values observed when comparing multiple species was substantial (lowest, Dorea longicatena (<0.003%); highest, B. caccae (55.0%)), comparing diet responses on a common scale using raw abundance values was challenging. To represent these changes in a way that scaled absolute increases/decreases in relative abundance to the range observed for each strain, we also normalized each species' representation within the artificial community at each time-point to the maximum proportional abundance each microbe achieved across all time-points within each mouse. 
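The per-mouse normalization just described (percentage of maximum achieved, PoMA) can be sketched in a few lines. The function name, the species labels, and the abundance values below are hypothetical and chosen only to show the arithmetic; this is a minimal sketch of the stated definition, not the authors' code.

```python
def percent_of_max(abundance_by_day):
    """Normalize one mouse's abundance time series to each species' own maximum.

    abundance_by_day: dict mapping day -> {species: percent abundance}.
    Returns the same shape, with each value expressed as a percentage of the
    highest abundance that species reached at any time-point in this mouse.
    """
    species = {s for values in abundance_by_day.values() for s in values}
    peak = {
        s: max(values.get(s, 0.0) for values in abundance_by_day.values())
        for s in species
    }
    return {
        day: {
            # A species never detected in this mouse is reported as 0.
            s: (100.0 * values.get(s, 0.0) / peak[s]) if peak[s] > 0 else 0.0
            for s in species
        }
        for day, values in abundance_by_day.items()
    }


# Invented values for a single mouse (percent of community), for illustration only.
mouse = {
    13: {"abundant species": 2.0, "rare species": 0.002},
    27: {"abundant species": 40.0, "rare species": 0.001},
    42: {"abundant species": 4.0, "rare species": 0.002},
}
poma = percent_of_max(mouse)
```

In this toy example the two species differ in raw abundance by several orders of magnitude, yet after normalization both span a common 0 to 100 scale, which is what makes their diet responses directly comparable.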
Plotting the resulting measure of abundance (percentage of maximum achieved; PoMA) over time demonstrates which microbes are strongly responsive to diet (experience significant swings in PoMA value following a diet switch) and which are relatively diet-insensitive (experience only modest or no significant change in PoMA value following a diet switch). Heatmap visualization of E1 PoMA values (Figure S5B) indicated that those microbes with a preference for a particular diet in one animal treatment group also tended to demonstrate the same diet preference in the other. Likewise, diet insensitivity was also consistent across treatment groups; diet-insensitive microbes were insensitive regardless of the order in which diets were introduced. Of the diet-sensitive taxa, those showing the most striking responses were B. caccae and B. ovatus, which strongly preferred the “Western”-like HF/HS diet and the polysaccharide-rich LF/HPP diet, respectively (Figures 1C and S4C). Among the diet-insensitive taxa, B. thetaiotaomicron showed the most stability in its representation (Figures 1C and S4C), consistent with its reputation as a versatile forager. Paradoxically, B. cellulosilyticus WH2 was both diet-sensitive and highly fit on its less-preferred diet; although this strain clearly achieved higher levels of representation in animals fed the LF/HPP diet, it also maintained strong levels of representation in animals fed the HF/HS diet (Figures 1C and S4C). When taking into account the abundance data for all 12 artificial community members, proportional representation at the end of the first diet phase (i.e., day 13) was a good predictor of representation at the end of the third diet phase (i.e., day 42) (E1 R² = 0.77; E2 R² = 0.84), suggesting that the intervening dietary perturbation had little effect on the ultimate outcomes for most species within this assemblage. However, one very low-abundance strain (D. 
longicatena) achieved significantly different maximum percentage abundances across the two treatment groups in each experiment, suggesting that steady-state levels of this strain may have been impacted by diet history. In mice initially fed the LF/HPP diet, D. longicatena was found to persist throughout the experiment at low levels on both diet regimens. In mice initially fed the HF/HS diet, D. longicatena dropped below the limit of detection before the end of the first diet phase, was undetectable by the end of the second diet phase, and remained undetectable throughout the rest of the time course. This interesting example raises the possibility that for some species, irreversible hysteresis effects may play a significant role in determining the likelihood that they will persist within a gut over long periods of time.

The Cecal Metatranscriptome Sampled at the Time of Sacrifice

These diet-induced reconfigurations in the structure of the artificial community led us to examine the degree to which its members were modifying their metabolic strategies. To establish an initial baseline static view of expression data for each microbe on each diet, we developed a custom GeneChip whose probe sets were designed to target 46,851 of the 48,023 known or predicted protein-coding genes within our artificial human gut microbiome (see Materials and Methods). Total RNA was collected from the cecal contents of each animal in E1 at the time of sacrifice and hybridized to this GeneChip. The total number of genes whose expression was detectable on each diet was remarkably similar (14,929 and 14,594 detected in the LF/HPP→HF/HS→LF/HPP and HF/HS→LF/HPP→HF/HS treatment groups, respectively). A total of 11,373 genes (24.3%) were expressed on both diets (Figure S6A), while 2,003 (4.3%) were differentially expressed to a statistically significant degree, including 161 (6.1%) of the 2,640 genes in the microbiome encoding proteins with CAZy-recognized domains. 
Figure S6B illustrates the fraction of the community-level CAZome and several species-level CAZomes expressed on each diet (see Table S6 for a comprehensive list of all genes, organized by species and fold-change in expression, whose cecal expression was detectable on each diet and all genes whose expression was significantly different when comparing data from each treatment group). Among taxa demonstrating obvious diet preferences (as judged by relative abundance data), B. caccae and B. cellulosilyticus WH2 provided examples of CAZy-level responses to diet change that were different in several respects. Our observations regarding the carbohydrate utilization capabilities and preferences of B. caccae are summarized in Text S1. However, our ability to evaluate shifts in B. caccae's metabolic strategy in the gut was limited by its very low abundance in animals fed LF/HPP chow (i.e., our mRNA and subsequent protein assays were often not sensitive enough to exhaustively sample B. caccae's metatranscriptome and metaproteome). In contrast, the abundance of B. cellulosilyticus WH2, which favored the LF/HPP diet, remained high enough on both diets to allow for a comprehensive analysis of its expressed genes and proteins. This advantage, along with the exceptional carbohydrate utilization machinery encoded within the genome of this organism, encouraged us to focus on further dissecting the responses of B. cellulosilyticus WH2 to diet changes. Detailed inspection of the expressed B. cellulosilyticus WH2 CAZome (503 CAZymes in total) provided an initial view of this microbe's sophisticated carbohydrate utilization strategy. 
A comparison of the top decile of expressed CAZymes on each diet disclosed many shared elements between the two lists, spanning many different CAZy families, with just over half of the 50 most expressed enzymes on the plant polysaccharide-rich LF/HPP chow also occurring in the list of most highly expressed enzymes on the sucrose-, corn starch-, and maltodextrin-rich HF/HS diet (Figure 2A). Twenty-five of the 50 most expressed CAZymes on the LF/HPP diet were significantly up-regulated compared to the HF/HS diet; of these, seven were members of the GH43 family (Figure 2B). The GH43 family consists of enzymes with activities required for the breakdown of plant-derived polysaccharides such as hemicellulose and pectin. Inspection of the enzyme commission (EC) annotations for the most up-regulated GH43 genes shows that they encode xylan 1,4-β-xylosidases (EC 3.2.1.37), arabinan endo-1,5-α-L-arabinosidases (EC 3.2.1.99), and α-L-arabinofuranosidases (EC 3.2.1.55). The GH10 family, which is currently comprised exclusively of endo-xylanases (EC 3.2.1.8, EC 3.2.1.32), was also well represented among this set of 25 genes, with four of the seven putative GH10 genes in the B. cellulosilyticus WH2 genome making the list. Strikingly, of the 45 predicted genes with putative GH43 domains in the B. cellulosilyticus WH2 genome, none were up-regulated on the “Western”-style HF/HS diet.

Figure 2. B. cellulosilyticus WH2 CAZyme expression in mice fed different diets. (A) Overview of the 50 most highly expressed B. cellulosilyticus WH2 CAZymes (GHs, GTs, PLs, and CEs) for samples from each diet treatment group. List position denotes the rank order of gene expression for each treatment group, with higher expression levels situated at the top of each list. 
Genes common to both lists are identified by a connecting line, with the slope of the line indicating the degree to which a CAZyme's prioritized expression is increased/decreased from one diet to the other. CAZy families in bold, colored letters highlight those list entries found to be significantly up-regulated relative to the alternative diet (i.e., a CAZyme with a bold green family designation was up-regulated on the LF/HPP diet; a bold orange family name implies a gene was up-regulated significantly on the HF/HS diet). Statistically significant fold-changes between diets are denoted in the “F.C.” column (nonsignificant fold-changes are omitted for clarity). (B) Breakdown by CAZy family of the top 10% most expressed CAZymes on each diet whose expression was also found to be significantly higher on one diet than the other. Note that for each diet, the family with the greatest number of up-regulated genes was also exclusively up-regulated on that diet (LF/HPP, GH43; HF/HS, GH13). In total, 25 genes representative of 27 families and 12 genes representative of 13 families are shown for the LF/HPP and HF/HS diets, respectively. https://doi.org/10.1371/journal.pbio.1001637.g002

The most highly expressed B. cellulosilyticus WH2 CAZyme on the plant polysaccharide-rich chow (which was also highly expressed on the HF/HS chow) was BWH2_1228, a putative α-galactosidase from the GH36 family. These enzymes, which are not expressed by humans in the stomach or intestine, cleave terminal galactose residues from the nonreducing ends of raffinose family oligosaccharides (RFOs, including raffinose, stachyose, and verbascose), galacto(gluco)mannans, galactolipids, and glycoproteins. RFOs, which are well represented in cereal grains consumed by humans, are expected to be abundant in the LF/HPP diet given its ingredients (e.g., soybean meal), but potential substrates in the HF/HS diet are less obvious, possibly implicating a host glycolipid or glycoprotein target. 
Surface glycans in the intestinal epithelium of rodents are decorated with terminal fucose residues [34] as well as terminal sialic acid and sulfate [35]. Hydrolysis of the α-2 linkage connecting terminal fucose residues to the galactose-rich extended core is thought to be catalyzed in large part by GH95 and GH29 enzymes [36]. The B. cellulosilyticus WH2 genome is replete with putative GH95 and GH29 genes (total of 12 and 9, respectively), but only a few (BWH2_1350/2142/3154/3818) were expressed in vivo on at least one diet, and their expression was low relative to many other CAZymes (see Table S6). Cleavage of terminal sialic acids present in host mucins by bacteria is usually carried out by GH33 family enzymes. B. cellulosilyticus WH2 has two GH33 genes that are expressed on either one diet (BWH2_3822, HF/HS) or both diets (BWH2_4650), but neither is highly expressed relative to other B. cellulosilyticus WH2 CAZymes. Therefore, utilization of host glycans by B. cellulosilyticus WH2, if it occurs, likely requires partnerships with other members of the artificial community that express GH29/95/33 enzymes (see Table S6 for a list of members that express these enzymes in a diet-independent and/or diet-specific fashion). Among the 50 most highly expressed B. cellulosilyticus WH2 CAZymes, 12 were significantly up-regulated on the HF/HS diet compared to the LF/HPP diet, with members of family GH13 being most prevalent. While the enzymatic activities and substrate specificities of GH13 family members are varied, most relate to the hydrolysis of substrates comprising chains of glucose subunits, including amylose (one of the two components of starch) and maltodextrin, both prominent ingredients in the HF/HS diet. GeneChip-based profiling of the E1 cecal communities provided a snapshot of the metatranscriptome on the final day of the final diet phase in each treatment group. The analysis of B. 
cellulosilyticus WH2 CAZyme expression suggested that this strain achieves a “generalist” lifestyle not by relying on substrates that are present at all times (e.g., host mucins), but rather by modifying its resource utilization strategy to effectively compete with other microbes for diet-derived polysaccharides that are not metabolized by the host.

Community-Level Analysis of Diet-Induced Changes in Microbial Gene Expression

To develop a more complete understanding of the dynamic changes that occur in gene expression over time and throughout the artificial community following diet perturbations, we performed microbial RNA-Seq analyses using feces obtained at select time-points from mice in the LF/HPP→HF/HS→LF/HPP treatment group of E2 (Figure S3). We began with a “top-down” analysis in which every RNA-Seq read count from every gene in the artificial microbiome was binned based on the functional annotation of the gene from which it was derived, regardless of its species of origin. In this case, the functional annotation used as the binning variable was the predicted EC number for a gene's encoded protein product. Expecting that some changes might occur rapidly, while others might require days or weeks, we searched for significant differences between the terminal time-points of the first two diet phases (i.e., points at which the model human gut microbiota had been allowed 13 d to acclimate to each diet). The 157 significant changes we identified were subjected to hierarchical clustering by EC number to determine which functional responses occurred with similar kinetics. The results revealed that in contrast to the rapid, diet-induced structural reconfigurations observed in this artificial community, community-level changes in microbial gene expression occurred with highly variable timing that differed from function to function. 
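The binning step at the heart of this top-down analysis can be illustrated with a short sketch that collapses per-gene read counts into per-EC-number totals without regard to species of origin. The gene identifiers and counts are invented, and the assumption that each gene carries at most one EC number is a simplification made for this example; the EC numbers themselves (3.2.1.8 and 3.2.1.37) are taken from the text.

```python
from collections import defaultdict


def collapse_by_ec(gene_counts, gene_to_ec):
    """Sum raw read counts over all genes sharing the same EC number.

    gene_counts: dict gene id -> raw read count in one sample.
    gene_to_ec:  dict gene id -> EC number string (genes without an entry
                 have no EC annotation and are left out of every bin).
    """
    ec_counts = defaultdict(int)
    for gene, count in gene_counts.items():
        ec = gene_to_ec.get(gene)
        if ec is None:
            continue  # unannotated gene: contributes to no functional bin
        ec_counts[ec] += count
    return dict(ec_counts)


# Hypothetical genes from two species; identifiers are placeholders.
gene_to_ec = {
    "sp1_g1": "3.2.1.8",
    "sp1_g2": "3.2.1.37",
    "sp2_g1": "3.2.1.8",
}
sample_counts = {"sp1_g1": 10, "sp1_g2": 7, "sp2_g1": 5, "sp2_g2": 99}
binned = collapse_by_ec(sample_counts, gene_to_ec)
```

Here the two genes annotated as EC 3.2.1.8 are pooled into a single bin even though they belong to different species, while the unannotated gene `sp2_g2` is dropped.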
These changes were dominated by EC numbers associated with enzymatic reactions relevant to carbohydrate and amino acid metabolism (see Table S7 for a summary of all significant changes observed, including aggregate expression values for each functional bin (EC number) at each time-point). Significant responses could be divided into one of three groups: “rapid” responses were those where the representation of EC numbers in the transcriptome increased/decreased dramatically within 1–2 d of a diet switch; “gradual” responses were those where the representation of EC numbers changed notably, but slowly, between the two diet transition points; and “delayed” responses were those where significant change did not occur until the end of a diet phase (Figure 3, Table S7). EC numbers associated with reactions important in carbohydrate metabolism and transport were distributed across all three of these response types for each of the two diets. Nearly all genes encoding proteins with EC numbers related to amino acid metabolism that were significantly up-regulated on HF/HS chow binned into the “rapid” or “gradual” groups, suggesting this diet put immediate pressure on the artificial microbial community to increase its repertoire of expressed amino acid biosynthesis and degradation genes. Genes with assigned EC numbers involved in amino acid metabolism that were significantly up-regulated on the other, polysaccharide-rich, LF/HPP diet were spread more evenly across these three response types (Figure 3).

Figure 3. Top-down analysis of fecal microbiome RNA expression in mice receiving oscillating diets. The fecal metatranscriptomes of four animals in the LF/HPP→HF/HS→LF/HPP treatment group of E2 were analyzed using microbial RNA-Seq at seven time-points to evaluate the temporal progression of changes in expressed microbial community functions triggered by a change in diet. 
After aligning reads to genes in the defined artificial human gut microbiome, raw counts were collapsed by the functional annotation (EC number) of the gene from which the corresponding reads originated. Total counts for each EC number in each sample were normalized, and any EC numbers demonstrating a statistically significant difference in their representation in the metatranscriptome between the final days of the first two diet phases were identified using a model based on the negative binomial distribution [57]. Normalized expression values for 157 significant EC numbers (out of 1,021 total tested) were log-transformed, mean-centered, and subjected to hierarchical clustering, followed by heatmap visualization. “Rapid” responses are those where expression increased/decreased dramatically within 1–2 d of a diet switch. “Gradual” responses are those where expression changed notably, but slowly, between the two diet transition points. “Delayed” responses are those where significant expression changes did not occur until the end of a diet phase. EC numbers specifying enzymatic reactions relevant to carbohydrate metabolism and/or transport are denoted by purple markers, while those with relevance to amino acid metabolism are indicated using orange markers. A full breakdown of all significant responses over time and the outputs of the statistical tests performed are provided in Table S7. https://doi.org/10.1371/journal.pbio.1001637.g003

Careful inspection of our top-down analysis results and a complementary “bottom-up” analysis in which normalization was performed at the level of individual species, rather than at the community level, allowed us to identify other important responses that would have gone undetected were it not for the fact that we were dealing with a defined assemblage of microbes where all of the genes in component members' genomes were known. 
For example, an assessment of the representation of EC 3.2.1.8 (endo-1,4-β-xylanase) within the metatranscriptome before and after the first diet switch (LF/HPP→HF/HS) initially suggested that this activity was reduced to a statistically significant degree as a result of the first diet perturbation (day 13 versus day 27; Mann–Whitney U test, p = 0.03; Figure S7A). Aggregation by species of all sequencing read counts assignable to mRNAs encoding proteins with this EC number revealed that over 99% of the contributions to this functional bin originated from B. cellulosilyticus WH2 (note the similarity in a comparison of Figure S7A and Figure S7B), implying that the community-level response and the response of this Bacteroides species were virtually one and the same. A tally of all sequencing reads assignable to B. cellulosilyticus WH2 at each time-point disclosed that although this strain maintains high proportional representation in the artificial community throughout each diet oscillation period (range, 10.3–42.5% and 11.6–43.3% for E1 and E2, respectively), its contribution to the metatranscriptome is substantially decreased during the HF/HS diet phase (Figure S7C). This dramatic reduction in the extent to which B. cellulosilyticus WH2 contributes to the metatranscriptome in HF/HS-fed mice “masks” the significant up-regulation of EC 3.2.1.8 that occurs within the B. cellulosilyticus WH2 transcriptome following the first diet shift (day 13 versus day 27; Mann–Whitney U test, p = 0.03; Figure S7D). A further breakdown of endo-1,4-β-xylanase up-regulation in B. cellulosilyticus WH2 when mice are switched to the HF/HS diet reveals that most of this response is driven by two genes, BWH2_4068 and BWH2_4072 (Figure S7E). 
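The masking effect described above can be illustrated with a minimal Python sketch. The read counts are invented for illustration and are not data from this study, and the function names are hypothetical. The sketch only shows that a function's share of the whole metatranscriptome can fall while its share of one species' transcriptome rises, provided that species' overall contribution to the metatranscriptome shrinks.

```python
def community_share(ec_reads, total_community_reads):
    """Fraction of all metatranscriptome reads assigned to one EC number."""
    return ec_reads / total_community_reads


def species_share(ec_reads_from_species, total_species_reads):
    """Fraction of one species' transcriptome assigned to one EC number."""
    return ec_reads_from_species / total_species_reads


# Hypothetical counts for one EC number contributed by a single species.
# "before" = end of the first diet phase, "after" = end of the second.
samples = {
    "before": {"ec": 4_000, "species_total": 400_000, "community_total": 1_000_000},
    "after": {"ec": 1_500, "species_total": 50_000, "community_total": 1_000_000},
}

# Top-down view: normalize against every read in the community.
top_down = {
    phase: community_share(s["ec"], s["community_total"])
    for phase, s in samples.items()
}

# Bottom-up view: normalize against reads from the contributing species only.
bottom_up = {
    phase: species_share(s["ec"], s["species_total"])
    for phase, s in samples.items()
}

for phase in ("before", "after"):
    print(phase, top_down[phase], bottom_up[phase])
```

With these made-up numbers the community-level representation drops while the species-level representation increases, which is the pattern reported for EC 3.2.1.8 in Figure S7.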
Our realization that we were unable to correctly infer the direction of one of the most significant diet-induced gene expression changes in the second most abundant strain in the artificial community when inspecting functional responses at the community level provides a strong argument for expanding the use of microbial assemblages composed exclusively of sequenced species in studies of the gut microbiota. This should allow the contributions of individual species to community activity to be evaluated in a rigorous way that is not possible with microbial communities of unknown or poorly defined gene composition.

High-Resolution Profiling of the Cecal Metaproteome Sampled at the Time of Sacrifice

In principle, protein measurements can provide a more direct readout of expressed community functions than an RNA-level analysis, and thus a deeper understanding of community members' interactions with one another and with their habitat [37],[38]. For these reasons and others, much work has been dedicated to applying shotgun proteomics techniques to microbial ecosystems in various environments [39],[40]. Though these efforts have provided illustrations of significant methodological advances, they have been limited by the complexity of the metaproteomes studied and by the difficulties this complexity creates when attempting to assign peptide identities uniquely to proteins of specific taxa. Recognizing that a metaproteomics analysis of our artificial community would not be subject to such uncertainty given its fully defined microbiome and thus fully defined theoretical proteome, we subjected cecal samples from two mice from each diet treatment group in E1 (n = 4 total) to high-performance liquid chromatography-tandem mass spectrometry (LC-MS/MS; see Materials and Methods). 
We had three goals: (i) to evaluate how our ability to assign peptide-spectrum matches (PSMs) to particular proteins within a theoretical metaproteome is affected by the presence of close homologs within the same species and within other, closely related species; (ii) to test the limits of our ability to characterize protein expression across different species given the substantial dynamic range we documented in microbial species abundance; and (iii) to collect semiquantitative peptide/protein data that might validate and enrich our understanding of functional responses identified at the mRNA level, particularly with respect to the niche (profession) of CAZyme-rich B. cellulosilyticus WH2. Given the evolutionary relatedness of the strains involved, we expected that some fraction of observed PSMs from each sample would be of ambiguous origin due to nonunique peptides shared between species' proteomes. To assess which species might be most affected by this phenomenon when characterizing the metaproteome on different diets, we catalogued each strain's theoretical peptidome using an in silico tryptic digest. This simulated digest took into account both the potential for missed tryptic cleavages and the peptide mass range that could be detected using our methods. The results (Figure S8A, Table S8) demonstrated that for an artificial community of modest complexity, the proportion of peptides within each strain's theoretical peptidome that are “unique” (i.e., assignable to a single protein within the theoretical metaproteome) varies substantially from species to species, even among those that are closely related. We found the lone representative of the Actinobacteria in the artificial community, Collinsella aerofaciens, to have the highest proportion of unique peptides (94.2%), while B. caccae had the lowest (63.0%). 
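The idea behind such an in silico digest can be sketched in a few lines of Python. This is a simplified illustration rather than the pipeline used in this study. It assumes the common trypsin rule (cleave after K or R unless the next residue is P), and it uses a peptide length window as a stand-in for the detectable mass range. The function names, default parameters, and toy sequences are all hypothetical.

```python
import re
from collections import defaultdict


def tryptic_peptides(protein, max_missed=2, min_len=6, max_len=40):
    """Return the set of tryptic peptides for one protein sequence.

    Assumed rule: cleave after K or R unless the next residue is P.
    The length window is a proxy for a detectable mass range.
    """
    # Zero-width split after every K/R that is not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for start in range(len(fragments)):
        # Join 1..max_missed+1 adjacent fragments to model missed cleavages.
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end > len(fragments):
                break
            peptide = "".join(fragments[start:end])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


def unique_fraction_by_species(proteomes, **digest_kwargs):
    """proteomes: {species: {protein_id: sequence}}.

    A peptide counts as unique if it maps to exactly one protein in the
    whole theoretical metaproteome. Returns, for each species, the
    fraction of its distinct peptides that are unique.
    """
    owners = defaultdict(set)       # peptide -> {(species, protein_id)}
    per_species = defaultdict(set)  # species -> {peptide}
    for species, proteins in proteomes.items():
        for protein_id, sequence in proteins.items():
            for peptide in tryptic_peptides(sequence, **digest_kwargs):
                owners[peptide].add((species, protein_id))
                per_species[species].add(peptide)
    return {
        species: sum(len(owners[p]) == 1 for p in peptides) / len(peptides)
        for species, peptides in per_species.items()
        if peptides
    }


# Toy metaproteome: species A and B share one identical protein.
toy = {
    "A": {"a1": "AAAAAK", "a2": "CCCCCK"},
    "B": {"b1": "AAAAAK"},
}
print(unique_fraction_by_species(toy, max_missed=0, min_len=1, max_len=50))
```

In the toy case, half of species A's peptides are unique and none of species B's are, mirroring how close relatives in a community lower each other's peptide uniqueness.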
Interestingly, there was not a strong correlation between the fraction of a species' peptides that were unique and the total number of unique peptides that species contributed to the theoretical peptidome. For example, C. aerofaciens (2,367 predicted protein-coding genes) contributed only 81,894 (1.5%) unique peptides, the lowest of any artificial community member evaluated, despite having a proteome composed of mostly unique peptides. On the other hand, B. cellulosilyticus WH2 (5,244 predicted protein-coding genes) contributed 241,473 (4.5%) unique peptides, the highest of any member despite a high fraction of nonunique peptides (18.4%) within its theoretical peptidome. The evolutionary relatedness of the Bacteroides components of the artificial community appeared to negatively affect our ability to assign their peptides to specific proteins; their six theoretical peptidomes had the six lowest uniqueness levels. However, their greater number of proteins and peptides relative to the Firmicutes and Actinobacteria more than compensated for this deficiency; over 60% of unique peptides within the unique theoretical metaproteome were contributed by the Bacteroides. We also found that the proportion of PSMs uniquely assignable to a single protein within the metaproteome varied significantly by function, suggesting that some classes of proteins can be traced back to specific microbes more readily than others. 
For example, when considering all theoretical peptides that could be derived from the proteome of a particular bacterial species, those from proteins with roles in categories with high expected levels of functional conservation (e.g., translation and nucleotide metabolism) were on average deemed unique more often than those from proteins with roles in functions we might expect to be less conserved (e.g., glycan biosynthesis and metabolism) (see Table S8 for a summary of how peptide uniqueness varied across different KEGG categories and pathways, and across different species in the experiment). However, even in KEGG categories and pathways with high expected levels of functional conservation, the vast majority of peptides were found to be unique when a particular species was not closely related to other members of the artificial community. Next, we determined the average number of proteins that could be experimentally identified in our samples for each microbial species within each treatment group in E1. The results (Figure S8B, Table S9) illustrate two important conclusions. First, although equal concentrations of total protein were evaluated for each sample, slightly less than twice as many total microbial proteins were identified in samples from the LF/HPP-fed mice as those from mice fed the HF/HS diet (4,659 versus 2,777, respectively). While there are a number of possible explanations, both this finding and the higher number of mouse proteins detected in samples from HF/HS-fed animals are consistent with the results of our fecal DNA analysis, which indicated that the HF/HS diet supports lower levels of gut microbial biomass than the LF/HPP diet (Figure S4A,B). Second, a breakdown of all detected microbial proteins by species of origin (Figure S8B) revealed that the degree to which we could inspect protein expression for a given species was dictated largely by its relative abundance and the diet to which it was exposed. Our ability to detect many of B. 
cellulosilyticus WH2's expressed transcripts and proteins in samples from both diet treatment groups allowed us to determine how well RNA and protein data for an abundant, active member of the artificial community might correlate. These data also allowed us to evaluate whether or not the types of genes considered might influence the degree of correlation between these two datasets. We first performed a spectral count-based correlation analysis on the diet-induced, log-transformed, average fold-differences in expression for all B. cellulosilyticus WH2 genes that were detectable at both the RNA and protein level for both diets. The results revealed a moderate degree of linear correlation between RNA and protein observations (Figure S8C, black plot; r = 0.53). However, subsequent division of these genes into functionally related subsets, which were each subjected to their own correlation analysis, revealed striking differences in the degree to which RNA-level and protein-level expression changes agreed with one another. For example, diet-induced changes in mRNA expression for genes involved in translation showed virtually no correlation with changes measured at the protein level (Figure S8C, red plot; r = 0.03). Correlations for other categories of B. cellulosilyticus WH2 genes, such as those involved in energy metabolism (Figure S8C, green plot; r = 0.36) and amino acid metabolism (Figure S8C, orange plot; r = 0.48), were also poorer than the correlation for the complete set of detectable genes. In contrast, the correlation for the 110 genes with predicted involvement in carbohydrate metabolism was quite strong (Figure S8C, blue plot; r = 0.69), and was in fact the best correlation identified for any functional category of genes considered. 
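A per-category correlation analysis of this kind can be sketched as follows. The sketch assumes a base-2 logarithm, a pseudocount of 1 to avoid division by zero, and a Pearson coefficient; the source specifies only that fold-differences were log-transformed and that the correlation was linear. The field names and the minimum of three genes per category are hypothetical choices made for illustration.

```python
import math


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 of (a / b) with a pseudocount added to both values (assumed)."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_category(genes, min_genes=3):
    """genes: iterable of dicts with keys
    'category', 'rna_lf', 'rna_hf', 'prot_lf', 'prot_hf'
    (average expression on each diet at the RNA and protein level).

    Returns {category: r}, plus an 'all' entry covering every gene.
    """
    points = {"all": []}
    for g in genes:
        x = log2_fold_change(g["rna_lf"], g["rna_hf"])
        y = log2_fold_change(g["prot_lf"], g["prot_hf"])
        points["all"].append((x, y))
        points.setdefault(g["category"], []).append((x, y))
    return {
        category: pearson_r(*zip(*pairs))
        for category, pairs in points.items()
        if len(pairs) >= min_genes
    }
```

Splitting the gene set before computing r, as in `correlation_by_category`, is what exposes the differences between functional categories that a single pooled coefficient would hide.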
The significant range of correlations observed in different categories of genes suggests that the degree to which RNA-based analyses provide an accurate picture of microbial adaptation to environmental perturbation may be strongly impacted by the functional classification of the genes involved. Additionally, these data further emphasize the need for enhanced dynamic range metaproteome measurements and better bioinformatic methods to assign/bin unique and nonunique peptides so that deeper and more thorough surveys of the microbial protein landscape can be performed and evaluated alongside more robust transcriptional datasets.

Identifying Two Diet-Inducible, Xylanase-Containing PULs Whose Genetic Disruption Results in Diet-Specific Loss of Fitness

Several of the most highly expressed and diet-sensitive B. cellulosilyticus WH2 genes in this study fell within two putative PULs. One PUL (BWH2_4044–55) contains 12 ORFs that include a dual susC/D cassette, three putative xylanases assigned to CAZy families GH8 and GH10, a putative multifunctional acetyl xylan esterase/α-L-fucosidase, and a putative hybrid two-component system regulator (Figure 4A). Gene expression within this PUL was markedly higher in mice consuming the plant polysaccharide-rich LF/HPP diet at both the mRNA and protein level. Our mRNA-level analysis disclosed that BWH2_4047 was the most highly expressed B. cellulosilyticus WH2 susD homolog on this diet. Likewise, BWH2_4046/4, the two susC-like genes within this PUL, were the 2nd and 4th most highly expressed B. cellulosilyticus WH2 susC-like genes in LF/HPP-fed animals, and exhibited expression level reductions of 99.5% and 93% in animals consuming the HF/HS diet. The same LF/HPP diet bias was observed for other genes within this PUL (Figures 2A and 4B) but not for neighboring genes. The same trends were obvious and amplified when we quantified protein expression (Figure 4C). In mice fed LF/HPP chow, only three B. 
cellulosilyticus WH2 SusC-like proteins had higher protein levels than BWH2_4044/6, and only two SusD-like proteins had higher levels than BWH2_4045/7. Strikingly, we were unable to detect a single peptide from 9 of the 12 proteins in this PUL in samples obtained from mice fed the HF/HS diet, emphasizing the strong diet specificity of this locus.

Figure 4. Two xylanase-containing B. cellulosilyticus WH2 PULs demonstrating strong diet-specific expression patterns in vivo. (A) The PUL spanning BWH2_4044–55 includes a four-gene cassette comprising two consecutive susC/D pairs, multiple genes encoding GHs and CEs, and a gene encoding a putative hybrid two-component system (HTCS) presumed to play a role in the regulation of this locus. GH10 enzymes are endo-xylanases (most often endo-β-1,4-xylanases), while some GH5 and GH8 enzymes are also known to have endo- or exo-xylanase activity. CE6 enzymes are acetyl xylan esterases, as are some members of the CE1 family. A second PUL spanning BWH2_4072–6 contains a susC/D cassette, an endo-xylanase with dual GH10 modules as well as dual carbohydrate (xylan) binding modules (CBM22), a hypothetical protein of unknown function, and a putative HTCS. (B) Heatmap visualization of GeneChip expression data for BWH2_4044–55 and BWH2_4072–6 showing marked up-regulation of these putative PULs when mice are fed either a plant polysaccharide-rich LF/HPP diet or a diet high in fat and simple sugar (HF/HS), respectively. Data are from cecal contents harvested from mice at the endpoint of experiment E1. (C) Mass spectrometry-based quantitation of the abundance of all cecal proteins from the BWH2_4044–55 and BWH2_4072–6 PULs that were detectable in the same material used for GeneChip quantitation in panel (B). Bars represent results (mean ± SEM) from two technical runs per sample. 
For each MS run, the spectral counts for each protein were normalized against the total number of B. cellulosilyticus WH2 spectra acquired. (D) Comparison of in vivo PUL gene expression as measured by RNA-Seq (top) and the degree to which disruption of each gene within each PUL by a transposon impacts the fitness of B. cellulosilyticus WH2 on each diet, as measured by insertion sequencing (INSeq, bottom). For the lower plots, fitness measurements were calculated by dividing a mutant's representation (normalized sequencing counts) within the fecal output population by its representation within an input population that was introduced into germ-free animals via a single oral gavage along with other members of the artificial community. For cases in which no instances of a particular mutant could be measured in the fecal output (resulting in a fitness calculation numerator of zero), data are plotted as “<0.01” and are drawn without error bars. https://doi.org/10.1371/journal.pbio.1001637.g004

A second PUL in the B. cellulosilyticus WH2 genome composed of a susC/D-like pair (BWH2_4074/5), a putative hybrid two-component system regulator (BWH2_4076), and a xylanase (GH10) with dual carbohydrate binding module domains (CBM22) (BWH2_4072) (Figure 4A) demonstrated a strong but opposite diet bias, in this case exhibiting significantly higher expression in animals consuming the HF/HS “Western”-like diet. Our mRNA-level analysis showed that this xylanase was the second most highly expressed B. cellulosilyticus WH2 CAZyme in animals consuming this diet (Figure 2A). As with the previously described PUL, shotgun metaproteomics validated the transcriptional analysis (Figure 4B,C); with the exception of the gene encoding the PUL's presumed transcriptional regulator (BWH2_4076), diet specificity was substantial, with protein-level fold changes ranging from 10–33 across the locus (Table S10). 
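The fitness measurement defined in the Figure 4D legend (a mutant's normalized representation in the fecal output divided by its normalized representation in the gavaged input) can be sketched in Python as below. Normalizing by each library's total read count is an assumption made for illustration, and the function names and toy counts are hypothetical rather than taken from the study.

```python
def to_fractions(counts):
    """Convert raw per-mutant read counts into fractions of the library."""
    total = sum(counts.values())
    return {mutant: count / total for mutant, count in counts.items()}


def fitness_ratios(input_counts, output_counts):
    """Return {mutant: output_fraction / input_fraction}.

    Mutants absent from the output get a ratio of 0.0. Mutants with no
    reads in the input are skipped because the ratio is undefined.
    """
    input_frac = to_fractions(input_counts)
    output_frac = to_fractions(output_counts)
    return {
        mutant: output_frac.get(mutant, 0.0) / frac
        for mutant, frac in input_frac.items()
        if frac > 0
    }


def plot_label(ratio, floor=0.01):
    """Label a ratio for plotting; undetected mutants are shown as '<floor'."""
    return f"<{floor}" if ratio == 0 else f"{ratio:.2f}"


# Toy example with three insertion mutants.
input_counts = {"m1": 100, "m2": 100, "m3": 200}
output_counts = {"m1": 300, "m2": 100}  # m3 was not recovered from feces

ratios = fitness_ratios(input_counts, output_counts)
labels = {mutant: plot_label(r) for mutant, r in ratios.items()}
print(labels)
```

A ratio above 1 indicates that a mutant expanded relative to the input library, a ratio below 1 indicates that it was depleted, and a mutant never observed in the output is reported with the plotting floor rather than as a literal zero.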
Recent work by Cann and co-workers has done much to advance our understanding of the regulation and metabolic role of xylan utilization system gene clusters in xylanolytic members of the Bacteroidetes, particularly within the genus Prevotella [41]. The “core” gene cluster of the prototypical xylan utilization system they described consists of two tandem repeats of susC/susD homologs (xusA/B/C/D), a downstream hypothetical gene (xusE) and a GH10 endoxylanase (xyn10C). The 12-gene PUL identified in our study (BWH2_4044–55) appears to contain the only instance of this core gene cluster within the B. cellulosilyticus WH2 genome, suggesting that this PUL, induced during consumption of a plant polysaccharide-rich diet, is likely to be the primary xylan utilization system within this organism. A recent study characterizing the carbohydrate utilization capabilities of B. ovatus ATCC 8483 also identified two PULs involved in xylan utilization (BACOVA_04385–94, BACOVA_03417–50) whose gene configurations differ from those described in Prevotella spp. [25]. Interestingly, the five proteins encoded by the smaller xylanase-containing PUL described above (BWH2_4072–6) are homologous to the products of the last five genes in BACOVA_04385–94 (i.e., BACOVA_04390–4). The order of these five genes in these two loci is also identical. The similarities and differences observed when comparing the putative xylan utilization systems encoded within the genomes of different Bacteroidetes illustrate how its members may have evolved differentiated strategies for utilizing hemicelluloses like xylan. Having established that expression of BWH2_4044–55 and BWH2_4072–6 is strongly dictated by diet, we next sought to determine if these PULs are required by B. cellulosilyticus WH2 for fitness in vivo. A follow-up study was performed in which mice were fed either a LF/HPP or HF/HS diet after being colonized with an artificial community similar to the one used in E1 and E2 (see Materials and Methods). 
The wild-type B. cellulosilyticus WH2 strain used in our previous experiments was replaced with a transposon mutant library consisting of over 90,000 distinct transposon insertion mutants in 91.5% of all predicted ORFs (average of 13.9 distinct insertion mutants per ORF). The library was constructed using methods similar to those reported by Goodman et al. ([42]; see Materials and Methods) so that the relative proportion of each insertion mutant in both the input (orally gavaged) and output (fecal) populations could be determined using insertion sequencing (INSeq). The INSeq results revealed clear, diet-specific losses of fitness when components of these loci were disrupted (Figure 4D). Additionally, as observed in E1 and E2, expression of each PUL was strongly biased by diet, with genes BWH2_4072–5 demonstrating up-regulation on the HF/HS diet and BWH2_4044–55 on the LF/HPP diet. The extent to which a gene's disruption impacted the fitness of B. cellulosilyticus WH2 on one diet or the other correlated well with whether or not that gene was highly expressed on a given diet. For example, four of the five most highly expressed genes in the BWH2_4044–55 locus were the four genes shown to be most crucial for fitness on the LF/HPP diet. Of these four genes, three were susC or susD homologs (the fourth was the putative endo-1,4-β-xylanase thought to constitute the last element of the xylan utilization system core). Though the fitness cost of disrupting genes within BWH2_4044–55 varied from gene to gene, disruption of any one component of the BWH2_4072–6 PUL had serious consequences for B. cellulosilyticus WH2 in animals fed the HF/HS diet. This could suggest that while disruption of some components of the BWH2_4044–55 locus can be rescued by similar or redundant functions elsewhere in the genome, the same is not true for BWH2_4072–5. 
Notably, disruption of BWH2_4076, which is predicted to encode a hybrid two-component regulatory system, had negative consequences on either diet tested, indicating that regulation of this locus is crucial even when the PUL is not actively expressed. While many genes outside of these two PULs were also found to be important for the in vivo fitness of B. cellulosilyticus WH2, those within these PULs were among the most essential to diet-specific fitness, suggesting that these loci are central to the metabolic lifestyle of B. cellulosilyticus WH2 in the gut.

Characterizing the Carbohydrate Utilization Capabilities of B. cellulosilyticus WH2 and B. caccae

The results described in the preceding section indicate that B. cellulosilyticus WH2 prioritizes xylan as a nutrient source in the gut and that it tightly regulates the expression of its xylan utilization machinery. Moreover, the extraordinary number of putative CAZymes and PULs within the B. cellulosilyticus WH2 genome suggests that it is capable of growing on carbohydrates with diverse structures and varying degrees of polymerization. To characterize its carbohydrate utilization capabilities, we defined its growth in minimal medium (MM) supplemented with one of 46 different carbohydrates [25]. Three independent growths, each consisting of two technical replications, yielded a total of six growth curves for each substrate. Of the 46 substrates tested, B. cellulosilyticus WH2 grew on 39 (Table S11); they encompassed numerous pectins (6 of 6), hemicelluloses/β-glucans (8 of 8), starches/fructans/α-glucans (6 of 6), and simple sugars (14 of 15), as well as host-derived glycans (4 of 7) and one cellooligosaccharide (cellobiose). The seven substrates that did not support growth included three esoteric carbohydrates (carrageenan, porphyran, and alginic acid), the simple sugar N-acetylneuraminic acid, two host glycans (keratan sulfate and mucin O-glycans), and fungal cell wall-derived α-mannan. B. 
cellulosilyticus WH2 clearly grew more robustly on some carbohydrates than others. Excluding simple sugars, fastest growth was achieved on dextran (0.099±0.048 OD600 units/h), laminarin (0.095±0.014), pectic galactan (0.088±0.018), pullulan (0.088±0.026), and amylopectin (0.085±0.003). Although one study has reported that the type strain of B. cellulosilyticus degrades cellulose [43], the WH2 strain failed to demonstrate any growth on MM plus cellulose (specifically, Solka-Floc 200 FCC from International Fiber Corp.) after 5 d. Maximum cell density was achieved with amylopectin (1.17±0.02 OD600 units), dextran (1.12±0.20), cellobiose (1.09±0.08), laminarin (1.08±0.08), and xyloglucan (0.99±0.04). Total B. cellulosilyticus WH2 growth (i.e., maximum cell density achieved) on host-derived glycans was typically very poor, with only two substrates achieving total growth above 0.2 OD600 units (chondroitin sulfate, 0.50±0.04; glycogen, 0.99±0.02). The disparity between total growth on plant polysaccharides versus host-derived glycans, including O-glycans that are prevalent in host mucin, indicates a preference for diet-derived saccharides, consistent with our in vivo mRNA and protein expression data. We also determined how the growth rate of B. cellulosilyticus WH2 on these substrates compared to the growth rates for other prominent gut Bacteroides spp. After subjecting B. caccae to the same phenotypic characterization as B. cellulosilyticus WH2, we combined our measurements for these two strains with previously published measurements for B. thetaiotaomicron and B. ovatus [25]. The results underscored the competitive growth advantage B. cellulosilyticus WH2 likely enjoys when foraging for polysaccharides in the intestinal lumen. For example, of the eight hemicelluloses and β-glucans tested in our carbohydrate panel, B. cellulosilyticus WH2 grew fastest on six while B. ovatus grew fastest on two (Table S11). B. caccae and B. 
thetaiotaomicron, on the other hand, failed to grow on any of these substrates. Across all the carbohydrates for which data are available for all four species, B. cellulosilyticus WH2 grew fastest on the greatest number of polysaccharides (11 of 26) and tied with B. caccae for the greatest number of monosaccharides (6 of 15). B. thetaiotaomicron and B. caccae did, however, outperform the other two Bacteroides tested with respect to utilization of host glycans in vitro, demonstrating superior growth rates on four of five substrates tested (Table S11). B. cellulosilyticus WH2's rapid growth to high densities on xylan, arabinoxylan, and xyloglucan, as well as xylose, arabinose, and galactose, is noteworthy given our prediction that two of its most tightly regulated, highly expressed PULs appear to be involved in the utilization of xylan, arabinoxylan, or some closely related polysaccharide. To identify specific mono- and/or polysaccharides capable of triggering the activation of these two PULs, as well as the 111 other putative PULs within the B. cellulosilyticus WH2 genome, we used RNA-Seq to characterize its transcriptional profile at mid-log phase in MM (Table S12) plus one of 16 simple sugars or one of 15 complex sugars (Table S13) (see Materials and Methods; n = 2–3 cultures/substrate; 5.2–14.0 million raw Illumina HiSeq reads generated for each of the 90 transcriptomes). After mapping each read to the B. cellulosilyticus WH2 reference gene set, counts were normalized using DESeq to allow for direct comparisons across samples and conditions. Hierarchical clustering of the normalized dataset resulted in a well-ordered dendrogram in which samples clustered almost perfectly by the carbohydrate on which B. cellulosilyticus WH2 was grown (Figure 5A). 
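The normalization step can be illustrated with a short Python sketch of the median-of-ratios size-factor idea that underlies DESeq. This is a pure-Python illustration and not the DESeq package itself; the function names and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: {sample: [raw count for gene 0, gene 1, ...]}.

    For each sample, the size factor is the median, over genes, of the
    ratio between that sample's count and the gene's geometric mean
    across all samples. Genes with a zero count in any sample are skipped.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Toy data: sample "b" was sequenced exactly twice as deeply as sample "a".
toy_counts = {"a": [10, 20, 30], "b": [20, 40, 60]}
print(size_factors(toy_counts))
print(normalize(toy_counts))
```

In the toy case the two samples differ only in sequencing depth, so their normalized counts coincide. Real transcriptomes would then differ only where expression itself differs, which is what makes the subsequent clustering by carbohydrate meaningful.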
The consistency of this clustering illustrates that (i) technical replicates within each condition exhibit strong correlations with one another, suggesting any differences between cultures in a treatment group (e.g., small differences in density or growth phase) had at best minor effects on aggregate gene expression, and (ii) growth on different carbohydrates results in distinct, substrate-specific gene expression signals capable of driving highly discriminatory differences between treatment groups. The application of rigorous bootstrapping to our hierarchical clustering results also revealed several cases of higher level clusters in which strong confidence was achieved. These dendrogram nodes (illustrated as white circles) indicate sets of growth conditions that yield gene expression patterns more like each other than like the patterns observed for other substrates. Two notable examples were xylan/arabinoxylan (which are structurally related and share the same xylan backbone) and L-fucose/L-rhamnose (which are known to be metabolized via parallel pathways in E. coli [44]).

Figure 5. In vitro microbial RNA-Seq profiling of B. cellulosilyticus WH2 during growth on different carbohydrates. (A) Hierarchical clustering of the gene expression profiles of 90 cultures grown in minimal medium supplemented with one of 31 simple or complex sugars (n = 2–3 replicates per condition). Circles at dendrogram branch points identify clusters with strong bootstrapping support (>95%; 10,000 repetitions). Solid circles denote clusters comprising only replicates from a single treatment group/carbohydrate, while open circles denote higher level clusters comprising samples from multiple treatment groups. Colored rectangles indicate the type of carbohydrate on which the samples within each cluster were grown. 
(B) Unclustered heatmap representation of fold-changes in gene expression relative to growth on minimal medium plus glucose (MM-Glc) for 60 of the 236 paired susC- and susD-like genes identified within the B. cellulosilyticus WH2 genome (for a full list of all paired and unpaired susC and susD homologs, see Table S2). Data shown are limited to those genes whose expression on at least one of the 31 carbohydrates tested demonstrated a >100-fold increase relative to growth on MM-Glc for at least one of the replicates within the treatment group. Yellow boxes denote areas of the map where both genes in a susC/D pair were up-regulated >100-fold for at least two of the replicates in a treatment group and where the average up-regulation for each gene in the pair was >100-fold across all replicates of the treatment group. Two sets of columns to the right of the heatmap indicate PULs that were detectably expressed at the mRNA level (left set of columns) and/or protein level (right set of columns) in experiment 1 (E1). Red and black circles indicate that both genes in a susC/D pair were consistently expressed on a particular diet, as determined by GeneChip analysis of cecal RNA (≥5 of 7 animals assayed) or LC-MS/MS analysis of cecal protein (2 of 2 animals assayed). In both cases, a red circle denotes significantly higher expression on one diet compared to the other. https://doi.org/10.1371/journal.pbio.1001637.g005

Importantly, these findings suggested that by considering in vitro profiling data alongside in vivo expression data from the artificial community, it might be possible to identify the particular carbohydrates to which B. cellulosilyticus WH2 is exposed and responding within its gut environment. To explore this concept further, we compared expression of each gene in each condition to its expression on our control treatment, MM plus glucose (MM-Glc). 
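The activation criterion stated in the Figure 5B legend can be written as a small Python predicate. This sketch rests on two assumptions that the legend does not settle: that "both genes up-regulated in at least two replicates" refers to the same replicates for the two genes, and that the average is an arithmetic mean of per-replicate fold-changes. The function name and the example values are hypothetical.

```python
def pair_activated(susc_fold, susd_fold, threshold=100.0, min_replicates=2):
    """Decide whether a susC/D pair counts as activated on one substrate.

    susc_fold, susd_fold: per-replicate fold-changes relative to MM-Glc,
    listed in the same replicate order for both genes.
    """
    # Replicates in which both genes exceed the threshold (assumed reading).
    both_up = sum(
        1 for c, d in zip(susc_fold, susd_fold) if c > threshold and d > threshold
    )
    # Arithmetic mean across all replicates (assumed averaging method).
    mean_c = sum(susc_fold) / len(susc_fold)
    mean_d = sum(susd_fold) / len(susd_fold)
    return both_up >= min_replicates and mean_c > threshold and mean_d > threshold


# Hypothetical fold-changes for three replicate cultures.
print(pair_activated([150, 200, 120], [300, 110, 90]))   # two replicates pass
print(pair_activated([150, 50, 60], [300, 400, 500]))    # only one replicate passes
print(pair_activated([101, 101, 10], [500, 500, 500]))   # susC mean is too low
```

Requiring both the replicate count and the per-gene average to clear the threshold guards against calling a PUL activated on the strength of a single outlying culture.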
The results revealed a dynamic PUL activation network in which some PULs were activated by a single substrate, some were activated by multiple substrates, and some were transcriptionally silent across all conditions tested. Of the 118 putative susC/D pairs in B. cellulosilyticus WH2 that we have used as markers of PULs, 30 were dramatically activated on one or more of the substrates tested; in these cases, both the susC- and susD-like genes in the cassette were up-regulated an average of >100-fold relative to MM-Glc across all technical replicates (Figure 5B). At least one susC/D activation signature was identified for every one of the 17 oligosaccharides and polysaccharides and for six of the 13 monosaccharides tested (Table S14). The lack of carbohydrate-specific PUL activation events for some monosaccharides (fructose, galactose, glucuronic acid, sucrose, and xylose) was expected, given that these loci are primarily dedicated to polysaccharide acquisition. Further inspection of gene expression outside of PULs disclosed that B. cellulosilyticus WH2 prioritizes use of its non-PUL-associated carbohydrate machinery, such as putative phosphotransferase system (PTS) components and monosaccharide permeases, when grown on these monosaccharides (Table S14). Several carbohydrates activated the expression of multiple PULs. Growth on water-soluble xylan and wheat arabinoxylan produced significant up-regulation of five susC/D-like pairs (BWH2_0865/6, 0867/8, 4044/5, 4046/7, and 4074/5). No other substrate tested activated as many loci within the genome, again hinting at the importance of xylan and arabinoxylan to this strain's metabolic strategy in vivo. Cecal expression data from E1 showed that 15 of these activated PULs were expressed in vivo on one or both of the diets tested (see circles to the right of the heatmap in Figure 5B). In mice fed the polysaccharide-rich LF/HPP chow, B. 
cellulosilyticus WH2 up-regulates three susC/D pairs (BWH2_2717/8, 4044/5, 4046/7) whose expression is activated in vitro by arabinan and xylan/arabinoxylan. The three most significantly up-regulated susC/D pairs (BWH2_1736/7, 2514/5, 4074/5) in mice fed the HF/HS diet rich in sugar, corn starch, and maltodextrin are activated in vitro by amylopectin, ribose, and xylan/arabinoxylan, respectively. All three PULs identified as being up-regulated at the RNA level in LF/HPP-fed mice were also found to be up-regulated at the protein level (Figure 5B). Two of the three PULs up-regulated at the mRNA level in HF/HS-fed mice were up-regulated at the protein level as well. The presence of an amylopectin-activated PUL among these two loci is noteworthy, given the significant amount of starch present in the HF/HS diet. The up-regulation of four other PULs in HF/HS-fed animals was only evident in our LC-MS/MS data, reinforcing the notion that protein data both complement and supplement mRNA data when profiling microbes of interest. Two of the five susC/D pairs activated by xylan/arabinoxylan form the four-gene cassette in the previously discussed PUL comprising BWH2_4044–55 that is activated in mice fed the plant polysaccharide-rich chow (see Figure 4A). Another one of the five is the susC/D pair found in the PUL comprising BWH2_4072–6 that is activated in mice fed the HF/HS “Western”-like chow (see Figure 4A). Thus, we have identified a pair of putative PULs in close proximity to one another on the B. cellulosilyticus WH2 genome that encode CAZymes with similar predicted functions, are subject to near-identical levels of specific activation by the same two polysaccharides (i.e., xylan, arabinoxylan) in vitro, but are discordantly regulated in vivo in a diet-specific manner. 
The highly expressed nature of these PULs in the diet environment where they are active, their shared emphasis on xylan/arabinoxylan utilization, and their tight regulation indicate that they are likely to be important for the in vivo success of this organism in the two nutrient environments tested. However, the reasons for their discordant regulation are unclear. One possibility is that in addition to being activated by xylan/arabinoxylan and related polysaccharides, these loci are also subject to repression by other substrates present in the lumen of the gut, and this repression is sufficient to block activation. Alternatively, the specific activators of each PUL may be molecular moieties shared by both xylan and arabinoxylan that do not co-occur in the lumenal environment when mice are fed the diets tested in this study.

Prospectus

Elucidating generalizable “rules” for how microbiota operate under different environmental conditions is a substantial challenge. As our appreciation for the importance of the gut microbiota in human health and well-being grows, so too does our need to develop such rules using tractable experimental models of the gut ecosystem that allow us to move back and forth between in vivo and ex vivo analyses, using one to inform the other. Here, we have demonstrated the extent to which high-resolution DNA-, mRNA-, and protein-level analyses can be applied (and integrated) to study an artificial community of sequenced human gut microbes colonizing gnotobiotic mice. Our efforts have focused on characterizing community-level and species-level adaptation to dietary change over time and “leveraging” results obtained from in vitro assessments of individual species' responses to a panel of purified carbohydrates to deduce glycan exposures and consumption strategies in vivo. 
This experimental paradigm could be applied to any number of questions related to microbe–microbe, environment–microbe, and host–microbe interactions, including, for example, the metabolic fate of particular nutrients of interest (metabolic flux experiments), microbial succession, and biotransformations of xenobiotics. Studying artificial human gut microbial communities in gnotobiotic mice also allows us to evaluate the technical limitations of current molecular approaches for characterizing native communities. For example, the structure of an artificial community can be evaluated over time at low cost using short read shotgun DNA sequencing data mapped to all microbial genomes within the community (COPRO-Seq). This allows for a much greater depth of sequencing coverage (i.e., more sensitivity) and much less ambiguity in the assignment of reads to particular taxa than traditional 16S rRNA gene-based sequencing. Short read cDNA sequences transcribed from total microbial community RNA can also often be assigned to the exact species and gene from which they were derived, and the same is also often true for peptides derived from particular bacterial proteins. However, substantial dynamic range in species/transcript/protein abundance within any microbiota, defined or otherwise, imposes limits on our ability to characterize the least abundant elements of these systems. The effort to obtain a more complete understanding of the operations and behaviors of minor components of the microbiota is an area deserving of significant attention, given known examples of low-abundance taxa that play key roles within their larger communities and in host physiology [2],[45]. Developing such an understanding requires methods and assays that are collectively capable of assessing the structure and function of a microbiota at multiple levels of resolution. 
The need for high sensitivity and specificity in these approaches will become increasingly relevant as we transition towards experiments involving defined communities of even greater complexity, including bacterial culture collections prepared from the fecal microbiota of humans [46]. We anticipate that the study of sequenced culture collections transplanted to gnotobiotic mice will be instrumental in determining the degree to which physiologic or pathologic host phenotypes can be ascribed to the microbiota as well as specific constituent taxa. The recent development of a low-error 16S ribosomal RNA amplicon sequencing method (LEA-Seq) and the application of this method to the fecal microbiota of 37 healthy adults followed for up to 5 years indicated that individuals in this cohort contained 195±48 bacterial strains representing 101±27 species [47]. Furthermore, stability follows a power-law function, suggesting that once acquired, most gut strains in a person are present for decades. New advances in the culturing of fastidious gut microbes may one day allow us to capture most (or all) of the taxonomic and functional diversity present within an individual's fecal microbiota as a clonally arrayed, sequenced culture collection, providing a perfectly representative and defined experimental model of their gut community. In the meantime, first-generation artificial communities of modest complexity such as the one described here offer a way of studying many questions related to the microbiota. However, the limited complexity and composition of our 12-species artificial community, and the way in which it was assembled in germ-free mice, make it an imperfect model of more complex human microbiota. Native microbial communities, for example, are subject to the influence of variables that are notably absent from this system, such as intraspecies genetic variability and exogenous microbial inputs. 
There are also taxa (e.g., Proteobacteria, Bifidobacteria) and microbial guilds (e.g., butyrate producers) typical of human gut communities that are absent from our defined assemblage that could be used to augment this system in order to improve our understanding of how their presence/absence influences a microbiota's response to diet and a spectrum of other variables of interest. These future attempts to systematically increase complexity should reveal what trends, patterns, and trajectories observed in artificial assemblages such as the one reported here map or do not map onto natural communities. Finally, one of the greatest advantages of studying defined assemblages in mice is that they afford us the ability to interrogate the biology of key bacterial species in a focused manner. The artificial community we used in our experiments included B. cellulosilyticus WH2, a species that warrants further study as a model gut symbiont given its exceptional carbohydrate utilization capabilities, its apparent fitness advantage over many other previously characterized gut symbionts, and its genetic tractability. This genetic tractability should facilitate future experiments in which transposon mutant libraries are screened in vivo as one component of a larger artificial community in order to identify this strain's most important fitness determinants under a wide variety of dietary conditions. Identifying the genetic elements that allow B. cellulosilyticus to persist at the relatively high levels observed, regardless of diet, should provide microbiologists and synthetic biologists with new “standard biological parts” that will aid them in developing the next generation of prebiotics, probiotics, and synbiotics.

Materials and Methods

Ethics Statement

All experiments involving mice used protocols approved by the Washington University Animal Studies Committee in accordance with guidelines set forth by the American Veterinary Medical Association. 
Trained veterinarians from the Washington University Division of Comparative Medicine supervised all experiments. The laboratory animal program at Washington University is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care International (AAALAC).

B. cellulosilyticus WH2 Genome Sequencing

A strain of B. cellulosilyticus designated “WH2” (see Figure S1A,B) was isolated from a human fecal sample during an iteration of the Microbial Diversity Summer Course overseen by A. Salyers (University of Illinois, Urbana-Champaign) at the Marine Biological Laboratory (Woods Hole, MA). The genome of this isolate was sequenced using a combination of long-read and short-read technologies, yielding 51,819 plasmid and fosmid end reads (library insert sizes: 3.9, 4.9, 6.0, 8.0, and 40 kb; ABI 3730 platform), 333,883 unpaired 454 reads (FLX+ and XL+ chemistry), and 10 million unpaired Illumina reads (HiSeq; 42 nt read length). A hybrid assembly was constructed using MIRA v3.4.0 (method, de novo; type, genome; quality grade, accurate) with default settings [48],[49]. Gene calling was performed using the YACOP metatool [50]. Additionally, the four ribosomal RNA (rRNA) operons within the B. cellulosilyticus WH2 genome were sequenced individually to ensure high sequence accuracy in these difficult-to-assemble regions. Further details for the B. cellulosilyticus WH2 assembly are provided in Table S1.

Bacterial Strains

Details regarding the 12 bacterial strains used in this study are provided in Table S4. Cells were grown in supplemented TYG (TYGS; [42]) at 37°C under anaerobic conditions in a Coy anaerobic chamber (atmosphere: 75% N2, 20% CO2, 5% H2). After reaching stationary phase, cells were pelleted by centrifugation and resuspended in TYGS medium supplemented with 20% glycerol. Individual aliquots containing 400–800 µL of each cell suspension were stored at −80°C in 1.8 mL borosilicate glass vials with aluminum crimp tops. 
The identity of each species was verified prior to its use in experiments by extracting DNA from a frozen aliquot of cells, amplifying the 16S rRNA gene by PCR using primers 8F/27F (AGAGTTTGATCCTGGCTCAG; [51]) and 1391R (GACGGGCGGTGWGTRCA; [52]), sequencing the entire amplicon with an ABI 3730 capillary sequencer (Retrogen, Inc.), and comparing the assembled 16S rRNA gene sequence to the known reference sequence.

Preparation of Strains for Oral Gavage

Details regarding the construction of each inoculum are provided in Table S3. The inocula used to gavage germ-free mice in each experiment were prepared either directly from frozen stocks (experiment 1, E1) or from a combination of frozen stocks and overnight cultures (experiment 2, E2). The recoverable cell density for each batch of frozen stocks used in inoculum preparation was determined prior to pooling, while the same values for overnight cultures were calculated after pooling. To do so, an aliquot of cells from each overnight culture or set of frozen stocks was used to prepare a 10-fold dilution series in phosphate-buffered saline (PBS), and each dilution series was plated on brain-heart-infusion (BHI; BD Difco) agar supplemented with 10% (v/v) defibrinated horse blood (Colorado Serum Co.). Plates were grown for up to 3 d at 37°C under anaerobic conditions in a Coy chamber, colonies were counted, and the number of colony-forming units per milliliter (CFUs/mL) was calculated. The volume of each cell suspension included in the final inoculum was normalized by its known or estimated viable cell concentration in an effort to ensure that no species received an early advantage during establishment of the artificial community in the germ-free animals. Total CFUs per gavage were estimated at 8.0×10⁷ and 4.2×10⁸ for experiments E1 and E2, respectively.

Mice

Experiments were performed using protocols approved by the animal studies committee of the Washington University School of Medicine. 
For each experiment, two groups of 10–12-wk-old male germ-free C57BL/6J mice were maintained in flexible film gnotobiotic isolators under a strict 12 h light cycle, during which time they received sterilized food and water ad libitum. Animals were fasted for 4 h prior to gavage with 500 µL of a cell suspension inoculum containing the 12 sequenced, human gut-derived bacterial symbionts. After gavage, animals were maintained in separate cages throughout the course of the experiment. Fresh fecal pellets were periodically collected directly into screw-cap sample tubes that were immediately frozen in liquid nitrogen. At the time of sacrifice, the contents of each animal's cecum were divided into thirds and snap-frozen in liquid nitrogen for later use in DNA, RNA, and total protein isolations. Diets Animals were subjected to dietary oscillations comprising three consecutive phases of 2 wk each (see Figure S3). Prior to inoculation, germ-free mice were maintained on a standard autoclaved chow diet low in fat and rich in plant polysaccharides (LF/HPP, B&K rat and mouse autoclavable chow #73780000, Zeigler Bros, Inc). Three days prior to inoculation, one group of germ-free animals was switched to a sterile “Western”-like chow high in fat and simple sugars (HF/HS, Harlan Teklad TD96132), while the other continued to receive LF/HPP chow. After gavage, each group of animals was maintained on its respective diet for 2 wk, after which each treatment group was switched to the alternative diet. Two weeks later, the mice were switched back to their original starting diet and were retained on this diet until the time of sacrifice. DNA and RNA Extraction DNA and RNA were extracted from fecal pellets and cecal contents as previously described [11]. 
Community Profiling by Sequencing (COPRO-Seq)

COPRO-Seq measurements of the proportional representation of all species present in each fecal/cecal sample analyzed were performed as previously described [11] using short-read (36 nt) data collected from an Illumina sequencer (data were generated using a combination of the Genome Analyzer I, Genome Analyzer II, and Genome Analyzer IIx platforms). After demultiplexing each barcoded pool, reads were trimmed to 25 bp and aligned to the reference genomes. An abundance threshold cutoff of 0.003% was set for determining an artificial community member's presence/absence, based on the proportion of reads from each experiment that were found to spuriously align to distractor reference genomes of bacterial species not included in this study. Normalized counts for each bacterial species in each sample were used to calculate a simple intrasample percentage. In order to make changes in abundance over time more easily comparable between species with significantly different relative abundances, these percentages were also in some cases normalized by the maximum abundance (%) observed for a given species across all time-points from a given animal. This transformation resulted in a value referred to as the percentage of maximum achieved (“PoMA”) that was used to evaluate which species were most/least responsive to dietary interventions.

Ordination of COPRO-Seq Data Using QIIME

COPRO-Seq proportional abundance data were subjected to ordination using scripts found in QIIME v1.5.0-dev [53]. Data from both E1 and E2 were combined to generate a single tab-delimited table conforming to QIIME's early (pre-v1.4.0-dev) OTU table format. This pseudo-OTU table was subsequently converted into a BIOM-formatted table object that was used as the input for beta_diversity.py to calculate the pairwise distances between all samples using a Hellinger metric. PCoA calculations were performed using principal_coordinates.py. 
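The abundance calculations described for COPRO-Seq (intrasample percentage, the 0.003% presence cutoff, and PoMA) and the Hellinger metric used for ordination can be illustrated with a short Python sketch. This is not the COPRO-Seq or QIIME code: the function names and example values are hypothetical, and the Hellinger distance is written in one common form (Euclidean distance between square-root-transformed proportions), which may differ from the QIIME implementation by a constant factor.

```python
import math

PRESENCE_CUTOFF_PCT = 0.003  # abundance threshold stated in the text


def intrasample_percentages(counts):
    """Convert normalized per-species counts for one sample into percentages."""
    total = sum(counts.values())
    return {species: 100.0 * n / total for species, n in counts.items()}


def is_present(percentage, cutoff=PRESENCE_CUTOFF_PCT):
    """Call a species present if its abundance (%) reaches the cutoff.

    Assumption: whether the cutoff itself counts as present is not stated in the text.
    """
    return percentage >= cutoff


def percent_of_max_achieved(series):
    """PoMA: each time-point's abundance (%) relative to the species' maximum in one animal."""
    peak = max(series)
    return [100.0 * value / peak for value in series]


def hellinger_distance(sample_a, sample_b):
    """Euclidean distance between square-root-transformed proportions.

    Assumption: some definitions additionally scale the result by 1/sqrt(2).
    """
    species = set(sample_a) | set(sample_b)
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    squared = 0.0
    for s in species:
        p = math.sqrt(sample_a.get(s, 0.0) / total_a)
        q = math.sqrt(sample_b.get(s, 0.0) / total_b)
        squared += (p - q) ** 2
    return math.sqrt(squared)


# Hypothetical abundance (%) of one species across three time-points in one animal.
print(percent_of_max_achieved([2.0, 4.0, 1.0]))  # [50.0, 100.0, 25.0]
```

PoMA rescales every species to the same 0–100 range within an animal, which is why it helps compare the diet responsiveness of species whose absolute abundances differ by orders of magnitude.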
These coordinates and sample metadata were passed to make_3d_plots.py to generate PCoA plots. Plots shown are visualized using v2.21 of the KiNG software package [54]. Metatranscriptomics GeneChip. A custom Affymetrix GeneChip (“SynComm1”) with perfect match/mismatch (PM/MM) probe sets targeting 97.6% of the predicted protein-coding genes within the genomes of the 12 bacterial species in this study (plus three additional species not included in the model human gut microbiota) was designed and manufactured in collaboration with the Affymetrix chip design team. Control probes targeting intergenic regions from each genome were also tiled onto the array to allow detection of any contaminating gDNA. Hybridizations were carried out with 0.9–5.1 µg cDNA using the manufacturer's recommended protocols. Details regarding the design of this GeneChip are deposited under Gene Expression Omnibus (GEO) accession GPL9803. Custom mask files were generated for each species on the GeneChip for the purpose of performing data normalization one species at a time. Normalization of raw intensity values was carried out in Affymetrix Microarray Suite (MAS) v5.0. MAS output was exported to Excel where advanced filtering was used to identify those probe sets called present in at least five of seven cecal RNA samples in at least one diet tested. Data from probe sets that did not meet these criteria (i.e., genes that were not expressed on either condition) were not included in subsequent analyses. Normalized, filtered data were evaluated using the Cyber-T web server [55] to identify differentially expressed genes. Genes were generally considered significantly differentially expressed in cases meeting the following three criteria: p < 0.01, PPDE(<p) ≥ 0.99, and |fold-change|≥ 2. Microbial RNA-Seq. 
Methods for extracting total microbial RNA from mouse feces and cecal contents, depleting small RNAs (e.g., tRNA) and ribosomal RNA (5S, 16S, and 23S rRNA), and for converting depleted RNA to double-stranded cDNA were described previously [14]. Illumina libraries were prepared [11] from 26 fecal samples obtained from the second diet oscillation experiment (four animals, 6–7 time-points surveyed per animal), using 500 ng of input double-stranded cDNA/sample/library. RNA-Seq reads were aligned to the reference genomes using the SSAHA2 aligner [56]. Normalization of the resulting raw counts was performed using the DESeq package in R [57]. Raw counts derived from the metatranscriptome were normalized either at the community level (i.e., counts from all genes were included in the same table during normalization) for purposes of looking at community-level representation of functions (ECs) of interest, or at the species level (i.e., counts from each species were independently normalized) for purposes of looking at gene expression changes within individual species. Data adjustment (logarithmic transformation) and hierarchical clustering were performed using Cluster 3.0 [58] and GENE-E. Heatmap visualizations of expression data were prepared using JavaTreeview [59] and Microsoft Excel. The B. cellulosilyticus WH2 in vitro gene expression dendrogram presented was prepared using GENE-E. Bootstrap probabilities at each edge of the dendrogram were calculated using the “pvclust” package in R (10,000 replications). Clusters with bootstrap p values >0.95 were considered strongly supported and statistically significant. Metaproteomics Sample Preparation. Cecal contents were collected from four mice and solubilized in 1 mL SDS lysis buffer (4% w/v SDS, 100 mM Tris·HCl (pH 8.0), 10 mM dithiothreitol (DTT)), lysed mechanically by sonication, incubated at 95°C for 5 min, and centrifuged at 21,000× g. 
Crude protein extracts were precipitated using 100% trichloroacetic acid (TCA), pelleted by centrifugation, and washed with ice-cold acetone to remove lipids and excess SDS. Protein precipitates were resolubilized in 500 µL of 8 M urea and 100 mM Tris·HCl (pH 8.0), reduced by incubation in DTT (final concentration of 10 mM) for 1 h at room temperature, and sonicated in an ice water bath (Branson (model SSE-1) sonicator; 20% amp; 2 min total (cycles of 5 s on, 10 s off)). An aliquot of each protein extract was quantified using a bicinchoninic acid (BCA)-based protein assay kit (Pierce). Protein samples (1 mg) were subsequently diluted with 100 mM Tris·HCl and 10 mM CaCl2 (pH 8.0) to a final urea concentration below 4 M. Proteolytic digestions were initiated with sequencing grade trypsin (1/100, w/w; Promega) and incubated overnight at room temperature. A second aliquot of trypsin was added (1/100) after the reactions were diluted with 100 mM Tris·HCl (pH 8.0) to a final urea concentration below 2 M. After incubation for 4 h at room temperature, samples were reduced by incubation in 10 mM DTT for 1 h at room temperature. Finally, the peptides were acidified (protonated) in 200 mM NaCl and 0.1% formic acid, filtered, and concentrated with a 10 kDa molecular weight cutoff spin column (Sartorius). LC-MS/MS Data Collection. The peptide mixture from each mouse was analyzed in technical duplicate via two-dimensional liquid chromatography (LC)-MS/MS on a hybrid LTQ Orbitrap Velos mass spectrometer (Thermo Fisher Scientific). Peptides (∼100 µL per sample) were separated using a split phase 2D (strong-cation exchange (SCX) and C18 reverse phase (RP))-LC column over a 12-step gradient for each run. All MS analyses were performed in positive ion mode. Mass spectral data were acquired using Xcalibur (v2.0.7) in data-dependent acquisition mode for each chromatographic separation (22 h run). 
One precursor MS scan was acquired in the Orbitrap at 30K resolution followed by 10 data-dependent MS/MS scans (m/z 400–1,700) at 35% normalized collision energy with dynamic exclusion enabled at a repeat count of 1. MS/MS spectra were searched with SEQUEST (v.27; [60]) using the following settings: enzyme type = trypsin; precursor ion mass tolerance = 3.0 Da; fragment mass tolerance = 0.5 Da; fully tryptic peptides and those resulting from up to four missed cleavages only. All datasets were filtered with DTASelect (v1.9; [61]) using the following parameters: Xcorrs of 1.8, 2.5, and 3.5 for singly, doubly, and triply charged precursor ions; DeltCN ≥ 0.08; ≥2 fully tryptic peptides per protein. A custom-built FASTA target-decoy database [62],[63] was generated and searched with SEQUEST at a peptide-level false positive rate (FPR) estimated at ≤0.5%. The database contained theoretical proteomes predicted from the genomes of the 12 bacterial species characterized in this study (see Tables S4 and S8), some diet components (e.g., rice and yeast), and common contaminants (e.g., keratins). Three additional theoretical bacterial proteomes predicted from the genomes of Eubacterium rectale, Faecalibacterium prausnitzii, and Ruminococcus torques were included as distractors (negative controls) that were not expected to be present in any of the samples analyzed. An in silico tryptically digested protein sequence database was also used to generate a theoretical peptidome of unique peptides within a mass range of 600–4,890 Da and ≤1 miscleavages. Analysis of Proteomic Datasets. Spectral counts for each protein were normalized by either the total number of spectra collected for all species in a sample (normalization by community, “NBC”), or by the total number of spectra collected for all proteins from a given species (normalization by species, “NBS”). p values for each protein were calculated using the Mann–Whitney U test. 
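The two spectral-count normalization schemes described above (NBC and NBS) differ only in the denominator. The Python sketch below illustrates that difference. The data layout (counts keyed by species and protein) and all names and values are hypothetical, and any additional scaling applied in the study's own pipeline is not represented.

```python
from collections import defaultdict


def normalize_by_community(spectral_counts):
    """NBC: divide each protein's count by the total spectra for all species in the sample."""
    total = sum(spectral_counts.values())
    return {key: count / total for key, count in spectral_counts.items()}


def normalize_by_species(spectral_counts):
    """NBS: divide each protein's count by the total spectra for its own species."""
    per_species = defaultdict(int)
    for (species, _protein), count in spectral_counts.items():
        per_species[species] += count
    return {
        (species, protein): count / per_species[species]
        for (species, protein), count in spectral_counts.items()
    }


# Hypothetical spectral counts for one sample, keyed by (species, protein).
sample = {
    ("species_A", "protein_1"): 30,
    ("species_A", "protein_2"): 10,
    ("species_B", "protein_3"): 60,
}
nbc = normalize_by_community(sample)
nbs = normalize_by_species(sample)
print(nbc[("species_A", "protein_1")])  # 0.3  (30 of 100 spectra in the sample)
print(nbs[("species_A", "protein_1")])  # 0.75 (30 of 40 spectra from species_A)
```

NBC values reflect a protein's share of the whole metaproteome and therefore shift when a species' abundance changes, whereas NBS values factor out species abundance and are better suited to asking how one organism reallocates its own proteome.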
To correct for multiple comparisons, q values were calculated using an optimized false discovery rate (FDR) approach with the “qvalue” package in BioConductor. Regardless of the normalization strategy employed, p and q values were only calculated for proteins with at least three valid runs, where a valid run was one with more than five spectral counts. In NBC data, p and q values were calculated for all proteins within the model metaproteome. In NBS data, p and q values were calculated for each species-specific set of proteins. Differences in spectral counts between treatment groups (diets) were calculated using group medians. A protein was designated as “UP”-regulated if both its p and q values were less than 0.05 and the spectral count difference between treatment groups was greater than 5. The same criteria were applied in the opposite direction for proteins labeled as “DOWN.” For proteins labeled “NULL,” there was insufficient evidence to report any significant difference between the two treatment groups. Finally, a protein was considered detected or “present” in a sample if at least four (raw) spectral counts were assigned to that protein when aggregating the results from the two runs (technical replications) performed on the sample. Phenotypic Screen for the Growth of Bacteroides spp. on Various Carbohydrates The ability of B. cellulosilyticus WH2 and B. caccae ATCC 43185 to grow on a panel of 47 simple and complex carbohydrates was evaluated using a phenotypic array whose composition has been previously described [25]. Growth measurements were collected in duplicate (two wells per substrate) over the course of 3 d at 37°C under anaerobic conditions. A total of three independent experiments were performed for each species tested (n = 6 growth profiles/substrate/species). Total growth (Atot) was calculated from each growth curve as the difference between the maximum and minimum optical densities (OD600) observed (i.e., Amax−Amin). 
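The total-growth definition above, together with the growth-rate definition that follows it (Atot/(tmax−tmin)), can be written as a small Python sketch. The function names and the example growth curve are hypothetical. Returning 0.0 for a completely flat curve is an assumption added here to avoid division by zero.

```python
def total_growth(od600):
    """A_tot: difference between the maximum and minimum OD600 observed."""
    return max(od600) - min(od600)


def growth_rate(times_h, od600):
    """A_tot / (t_max - t_min), using the time-points at which A_max and A_min were read."""
    i_max = od600.index(max(od600))
    i_min = od600.index(min(od600))
    if times_h[i_max] == times_h[i_min]:
        return 0.0  # flat curve: no growth detected (assumption, see lead-in)
    return (od600[i_max] - od600[i_min]) / (times_h[i_max] - times_h[i_min])


# Hypothetical growth curve: time in hours and the corresponding OD600 readings.
times_h = [0.0, 5.0, 10.0, 20.0]
od600 = [0.125, 0.25, 0.75, 1.125]

print(total_growth(od600))          # 1.0
print(growth_rate(times_h, od600))  # 0.05 (OD600 units per hour)
```

Because the rate uses only the two extreme readings, it is an average over the interval between them and not a maximum specific growth rate.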
Growth rates were calculated as total growth divided by time (Atot/(tmax−tmin)), where tmax and tmin correspond to the time-points at which Amax and Amin, respectively, were collected. Consolidated statistics from all six replicates for each of the 47 conditions tested for each species are provided in Table S11. Profiling B. cellulosilyticus WH2 Gene Expression During Growth in Defined MM Containing Various Carbohydrates RNA-Seq. To characterize the impact of select mono- and polysaccharides on the in vitro gene expression of B. cellulosilyticus WH2, cells were cultured in MM supplemented with one of 31 distinct carbohydrates (for the formulation of MM and a list of the carbohydrates used as growth substrates, see Tables S12 and S13). After recovery from a frozen stock on BHI blood agar, a single colony was picked and inoculated into 5 mL of MM containing 5 mg/mL glucose (MM-Glc). Anaerobic conditions were generated within each individual culture tube using a previously described method [64] with the following modifications: (i) the cotton plug was lit and extinguished before being pushed below the lip of the culture tube, and (ii) 200 µL of saturated sodium bicarbonate was combined with 200 µL 35% (w/v) pyrogallate solution on top of the cotton plug before a bare rubber stopper was used to seal the tube. Cultures were grown overnight at 37°C. Twenty microliters of this “starter” culture were subsequently inoculated into a series of “acclimatization” cultures, each containing 5 mL of MM plus one of the 31 carbohydrates to be tested (5 mg/mL final concentration), and anaerobic culturing was carried out as above. 
This second round of culturing served two purposes: (i) it ensured cells were acclimated to growth on their new carbohydrate substrate prior to the inoculation of the final cultures that were harvested for RNA, and (ii) it provided an opportunity to obtain OD600 measurements indicating, for each carbohydrate, the range of optical densities corresponding to B. cellulosilyticus WH2's logarithmic phase of growth. Finally, 50 µL of each “acclimatization” culture were inoculated into triplicate 10 mL volumes of medium of the same composition, and the 93 “harvest” cultures were grown anaerobically at 37°C. At mid-log phase, 5 mL of cells were immediately preserved in Qiagen RNAprotect Bacteria Reagent according to the manufacturer's instructions. Cells were then pelleted, RNAprotect reagent was poured off, and the bacteria were stored at −80°C. After thawing, while still cold, each bacterial cell pellet was combined with 500 µL Buffer B (200 mM NaCl, 20 mM EDTA), 210 µL of 20% SDS, and 500 µL of acid phenol∶chloroform∶isoamyl alcohol (125∶24∶1, pH 4.5). The pellet was resuspended by manual manipulation with a pipette tip and transferred to a 2 mL screwcap tube containing acid-washed glass beads (Sigma, 212–300 µm diameter). Tubes were placed on ice, bead-beaten for 2 min at room temperature (BioSpec Mini-Beadbeater-8; set to “homogenize”), placed on ice, and bead-beaten for an additional 2 min, after which time RNA was extracted as described above for fecal and cecal samples.

Identification of Diet-Specific Fitness Determinants within the B. cellulosilyticus WH2 Genome Using Insertion Sequencing (INSeq)

Whole genome transposon mutagenesis of B. cellulosilyticus WH2 was performed using protocols originally developed for B. thetaiotaomicron [42],[46], with some modifications. Initial attempts to transform B. cellulosilyticus WH2 with the pSAM_Bt construct reported by Goodman et al. 
yielded very low numbers of antibiotic-resistant clones, which we attributed to poor recognition of one or more promoters in the mutagenesis plasmid. Replacement of the promoter driving expression of the transposon's erythromycin resistance gene (ermG) with the promoter for the gene encoding EF-Tu in B. cellulosilyticus WH2 (BWH2_3183) dramatically improved the number of resistant clones recovered after transformation. The resulting library consisted of 93,458 distinct isogenic mutants, with each mutant strain containing a single randomly inserted modified mariner transposon. Of all predicted ORFs, 91.5% had insertions covering the first 80% of each gene (mean, 13.9 distinct insertion mutants per ORF). At 11 wk of age, male germ-free C57BL/6J mice (individually caged) were fed either a diet low in fat and rich in plant polysaccharides (LF/HPP) or high in fat and simple sugars (HF/HS). After a week on their experimental diet, animals received a single gavage containing the B. cellulosilyticus WH2 transposon library and 14 other species of bacteria (i.e., this artificial community consisted of the 12 species listed in Figure 1A, plus B. thetaiotaomicron 7330, E. rectale ATCC 33656, and Clostridium symbiosum ATCC 14940). After 16 d, fecal pellets were collected, and total fecal DNA was extracted. 500 ng of each fecal DNA extraction was diluted in 15 µL of TE buffer and digested with MmeI (4 U, New England Biolabs) in a 20 µL reaction supplemented with 10 pmoles of 12 bp DNA containing an MmeI restriction site (to improve the efficiency of restriction enzyme digestion) [42]. The reaction was incubated for 1 h at 37°C and then terminated (80°C for 20 min). MmeI-digested DNA was subsequently purified using 125 µL of AMPure beads (after washing the beads once with 100 µL of sizing solution (1.2 M NaCl and 8.4% PEG 8000)). The digested DNA was added to the beads and the solution incubated at room temperature for 5 min. 
Beads were pelleted with a magnetic particle collector (MPC), washed twice (each time using a mixture composed of 20 µL TE buffer (pH 7.0) and 100 µL sizing solution, with bead recovery via MPC after each wash), followed by two ethanol washes (180 µL 70% ethanol/wash) and air-drying for 10 min. Samples were resuspended in 18 µL TE buffer (pH 7.0), and DNA was removed after pelleting beads with the MPC. Ligation of adapters was performed in a 20 µL reaction that contained 16 µL of purified DNA, 1 µL of T4 Ligase (2000 U/µL; NEB), 2 µL 10× ligase buffer, and 10 pmol of barcoded adapter (incubation for 1 h at 16°C). Ligations were subsequently diluted with TE buffer (pH 7.0) to a final volume of 50 µL, mixed with 60 µL of AMPure beads, and incubated at room temperature for 5 min. Beads with bound DNA were pelleted using the MPC and washed twice with 70% ethanol as above. After allowing the ethanol to evaporate for 10 min, 35 µL of nuclease-free water was added, and the mixture was incubated at room temperature for 2 min before collecting the beads with the MPC. Enrichment PCR was performed in a final volume of 50 µL using 32 µL of the cleaned up sample DNA, 10 µL 10× Pfx amplification buffer (Invitrogen), 2 µL 10 mM dNTPs, 0.5 µL 50 mM MgSO4, 2 µL of 5 µM amplification primers (forward primer: 5′CAAGCAGAAGACGGCATACG3′, reverse primer: 5′AATGATACGGCGACCACCGAACACTCTTTCCCTACACGA3′), and 1.5 µL Pfx polymerase (2.5 U/µL; Invitrogen) (cycling conditions: denaturation at 94°C for 15 s; annealing at 65°C for 1 min; extension at 68°C for 30 s; total of 22 cycles). The 134 bp PCR product from each reaction was purified (4% MetaPhor gel; MinElute Gel Extraction Kit (Qiagen)) in a final volume of 20 µL and was quantified (Qubit, dsDNA HS Assay Kit; Invitrogen). Reaction products were then combined in equimolar amounts into a pool that was subsequently adjusted to 10 nM and sequenced (Illumina HiSeq 2000 instrument). 
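The equimolar pooling step rests on converting each Qubit mass concentration into a molar concentration. A minimal sketch of that arithmetic follows. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA and uses the 134 bp product length given above; the function names, the default of 100 fmol per library, and the example concentrations are hypothetical and are not taken from the study.

```python
AVG_BP_MASS = 660.0  # g/mol per bp of dsDNA; a standard approximation, assumed here


def ng_per_ul_to_nM(conc_ng_per_ul, length_bp=134):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    # 1 ng/uL = 1e-3 g/L; dividing by g/mol gives mol/L; multiplying by 1e9 gives nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(concs_ng_per_ul, fmol_each=100.0, length_bp=134):
    """Volume (uL) of each library that contributes the same number of femtomoles."""
    # 1 nM equals 1 fmol/uL, so volume = fmol / nM.
    return [fmol_each / ng_per_ul_to_nM(c, length_bp) for c in concs_ng_per_ul]


if __name__ == "__main__":
    # Hypothetical Qubit readings (ng/uL) for three purified PCR products.
    readings = [8.844, 4.422, 2.0]
    for conc, vol in zip(readings, equimolar_volumes(readings)):
        print(f"{conc:6.3f} ng/uL = {ng_per_ul_to_nM(conc):7.2f} nM; pool {vol:5.2f} uL")
```

Under these assumptions a more dilute library simply contributes a proportionally larger volume, and the resulting pool can then be diluted to the 10 nM loading concentration.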
Data Deposition All short read Illumina data used for COPRO-Seq and RNA-Seq analyses, GeneChip data, and genome sequencing/assembly data are available through GEO SuperSeries GSE48537 and NCBI BioProject ID PRJNA183545. The draft genome assembly for B. cellulosilyticus WH2 has been deposited at DDBJ/EMBL/GenBank under accession number ATFI00000000. Raw MS data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.7fj1k. Ethics Statement All experiments involving mice used protocols approved by the Washington University Animal Studies Committee in accordance with guidelines set forth by the American Veterinary Medical Association. Trained veterinarians from the Washington University Division of Comparative Medicine supervised all experiments. The laboratory animal program at Washington University is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care International (AAALAC). B. cellulosilyticus WH2 Genome Sequencing A strain of B. cellulosilyticus designated “WH2” (see Figure S1A,B) was isolated from a human fecal sample during an iteration of the Microbial Diversity Summer Course overseen by A. Salyers (University of Illinois, Urbana-Champaign) at the Marine Biological Laboratory (Woods Hole, MA). The genome of this isolate was sequenced using a combination of long-read and short-read technologies, yielding 51,819 plasmid and fosmid end reads (library insert sizes: 3.9, 4.9, 6.0, 8.0, and 40 kb; ABI 3730 platform), 333,883 unpaired 454 reads (FLX+ and XL+ chemistry), and 10 million unpaired Illumina reads (HiSeq; 42 nt read length). A hybrid assembly was constructed using MIRA v3.4.0 (method, de novo; type, genome; quality grade, accurate) with default settings [48],[49]. Gene calling was performed using the YACOP metatool [50]. Additionally, the four ribosomal RNA (rRNA) operons within the B. 
cellulosilyticus WH2 genome were sequenced individually to ensure high sequence accuracy in these difficult-to-assemble regions. Further details for the B. cellulosilyticus WH2 assembly are provided in Table S1. Bacterial Strains Details regarding the 12 bacterial strains used in this study are provided in Table S4. Cells were grown in supplemented TYG (TYGS; [42]) at 37°C under anaerobic conditions in a Coy anaerobic chamber (atmosphere: 75% N2, 20% H2, 5% CO2). After reaching stationary phase, cells were pelleted by centrifugation and resuspended in TYGS medium supplemented with 20% glycerol. Individual aliquots containing 400–800 µL of each cell suspension were stored at −80°C in 1.8 mL borosilicate glass vials with aluminum crimp tops. The identity of each species was verified prior to its use in experiments by extracting DNA from a frozen aliquot of cells, amplifying the 16S rRNA gene by PCR using primers 8F/27F (AGAGTTTGATCCTGGCTCAG; [51]) and 1391R (GACGGGCGGTGWGTRCA; [52]), sequencing the entire amplicon with an ABI 3730 capillary sequencer (Retrogen, Inc.), and comparing the assembled 16S rRNA gene sequence to the known reference sequence. Preparation of Strains for Oral Gavage Details regarding the construction of each inoculum are provided in Table S3. The inocula used to gavage germ-free mice in each experiment were prepared either directly from frozen stocks (experiment 1, E1) or from a combination of frozen stocks and overnight cultures (experiment 2, E2). The recoverable cell density for each batch of frozen stocks used in inoculum preparation was determined prior to pooling, while the same values for overnight cultures were calculated after pooling. 
To do so, an aliquot of cells from each overnight culture or set of frozen stocks was used to prepare a 10-fold dilution series in phosphate-buffered saline (PBS), and each dilution series was plated on brain-heart-infusion (BHI; BD Difco) agar supplemented with 10% (v/v) defibrinated horse blood (Colorado Serum Co.). Plates were incubated for up to 3 d at 37°C under anaerobic conditions in a Coy chamber, colonies were counted, and the number of colony-forming units per milliliter (CFUs/mL) was calculated. The volume of each cell suspension included in the final inoculum was normalized by its known or estimated viable cell concentration in an effort to ensure that no species received an early advantage during establishment of the artificial community in the germ-free animals. Total CFUs per gavage were estimated at 8.0×10⁷ and 4.2×10⁸ for experiments E1 and E2, respectively. Mice Experiments were performed using protocols approved by the animal studies committee of the Washington University School of Medicine. For each experiment, two groups of 10–12-wk-old male germ-free C57BL/6J mice were maintained in flexible film gnotobiotic isolators under a strict 12 h light cycle, during which time they received sterilized food and water ad libitum. Animals were fasted for 4 h prior to gavage with 500 µL of a cell suspension inoculum containing the 12 sequenced, human gut-derived bacterial symbionts. After gavage, animals were maintained in separate cages throughout the course of the experiment. Fresh fecal pellets were periodically collected directly into screw-cap sample tubes that were immediately frozen in liquid nitrogen. At the time of sacrifice, the contents of each animal's cecum were divided into thirds and snap-frozen in liquid nitrogen for later use in DNA, RNA, and total protein isolations. Diets Animals were subjected to dietary oscillations comprising three consecutive phases of 2 wk each (see Figure S3).
Prior to inoculation, germ-free mice were maintained on a standard autoclaved chow diet low in fat and rich in plant polysaccharides (LF/HPP, B&K rat and mouse autoclavable chow #73780000, Zeigler Bros, Inc). Three days prior to inoculation, one group of germ-free animals was switched to a sterile “Western”-like chow high in fat and simple sugars (HF/HS, Harlan Teklad TD96132), while the other continued to receive LF/HPP chow. After gavage, each group of animals was maintained on its respective diet for 2 wk, after which each treatment group was switched to the alternative diet. Two weeks later, the mice were switched back to their original starting diet and were retained on this diet until the time of sacrifice. DNA and RNA Extraction DNA and RNA were extracted from fecal pellets and cecal contents as previously described [11]. Community Profiling by Sequencing (COPRO-Seq) COPRO-Seq measurements of the proportional representation of all species present in each fecal/cecal sample analyzed were performed as previously described [11] using short-read (36 nt) data collected from an Illumina sequencer (data were generated using a combination of the Genome Analyzer I, Genome Analyzer II, and Genome Analyzer IIx platforms). After demultiplexing each barcoded pool, reads were trimmed to 25 bp and aligned to the reference genomes. An abundance threshold cutoff of 0.003% was set for determining an artificial community member's presence/absence, based on the proportion of reads from each experiment that were found to spuriously align to distractor reference genomes of bacterial species not included in this study. Normalized counts for each bacterial species in each sample were used to calculate a simple intrasample percentage.
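The intrasample percentage and the presence/absence call described above can be sketched as follows. This is an illustrative outline only: the function names, the species labels, and the example counts are hypothetical, and treating an abundance exactly equal to the 0.003% cutoff as "present" is an assumption, since the text does not say whether the comparison is inclusive.

```python
PRESENCE_THRESHOLD_PCT = 0.003  # abundance cutoff (%) stated in the text


def intrasample_percentages(normalized_counts):
    """Map each species to its share (%) of the sample's total normalized counts."""
    total = sum(normalized_counts.values())
    if total == 0:
        return {species: 0.0 for species in normalized_counts}
    return {species: 100.0 * n / total for species, n in normalized_counts.items()}


def present_species(percentages, threshold=PRESENCE_THRESHOLD_PCT):
    """Species whose abundance meets the cutoff (inclusive comparison is an assumption)."""
    return {species for species, pct in percentages.items() if pct >= threshold}


if __name__ == "__main__":
    # Hypothetical normalized counts for one fecal sample.
    sample = {"species_a": 999_980, "species_b": 20}
    pct = intrasample_percentages(sample)
    print(pct)
    print(sorted(present_species(pct)))
```

In this hypothetical sample, the second species falls at 0.002% and would therefore be scored as absent.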
In order to make changes in abundance over time more easily comparable between species with significantly different relative abundances, these percentages were also in some cases normalized by the maximum abundance (%) observed for a given species across all time-points from a given animal. This transformation resulted in a value referred to as the percentage of maximum achieved (“PoMA”) that was used to evaluate which species were most/least responsive to dietary interventions. Ordination of COPRO-Seq Data Using QIIME COPRO-Seq proportional abundance data were subjected to ordination using scripts found in QIIME v1.5.0-dev [53]. Data from both E1 and E2 were combined to generate a single tab-delimited table conforming to QIIME's early (pre-v1.4.0-dev) OTU table format. This pseudo-OTU table was subsequently converted into a BIOM-formatted table object that was used as the input for beta_diversity.py to calculate the pairwise distances between all samples using a Hellinger metric. PCoA calculations were performed using principal_coordinates.py. These coordinates and sample metadata were passed to make_3d_plots.py to generate PCoA plots. Plots shown are visualized using v2.21 of the KiNG software package [54]. Metatranscriptomics GeneChip. A custom Affymetrix GeneChip (“SynComm1”) with perfect match/mismatch (PM/MM) probe sets targeting 97.6% of the predicted protein-coding genes within the genomes of the 12 bacterial species in this study (plus three additional species not included in the model human gut microbiota) was designed and manufactured in collaboration with the Affymetrix chip design team. Control probes targeting intergenic regions from each genome were also tiled onto the array to allow detection of any contaminating gDNA. Hybridizations were carried out with 0.9–5.1 µg cDNA using the manufacturer's recommended protocols. Details regarding the design of this GeneChip are deposited under Gene Expression Omnibus (GEO) accession GPL9803. 
Custom mask files were generated for each species on the GeneChip for the purpose of performing data normalization one species at a time. Normalization of raw intensity values was carried out in Affymetrix Microarray Suite (MAS) v5.0. MAS output was exported to Excel where advanced filtering was used to identify those probe sets called present in at least five of seven cecal RNA samples in at least one diet tested. Data from probe sets that did not meet these criteria (i.e., genes that were not expressed on either condition) were not included in subsequent analyses. Normalized, filtered data were evaluated using the Cyber-T web server [55] to identify differentially expressed genes. Genes were generally considered significantly differentially expressed in cases meeting the following three criteria: p < 0.01, PPDE(<p) ≥ 0.99, and |fold-change|≥ 2. Microbial RNA-Seq. Methods for extracting total microbial RNA from mouse feces and cecal contents, depleting small RNAs (e.g., tRNA) and ribosomal RNA (5S, 16S, and 23S rRNA), and for converting depleted RNA to double-stranded cDNA were described previously [14]. Illumina libraries were prepared [11] from 26 fecal samples obtained from the second diet oscillation experiment (four animals, 6–7 time-points surveyed per animal), using 500 ng of input double-stranded cDNA/sample/library. RNA-Seq reads were aligned to the reference genomes using the SSAHA2 aligner [56]. Normalization of the resulting raw counts was performed using the DESeq package in R [57]. Raw counts derived from the metatranscriptome were normalized either at the community level (i.e., counts from all genes were included in the same table during normalization) for purposes of looking at community-level representation of functions (ECs) of interest, or at the species level (i.e., counts from each species were independently normalized) for purposes of looking at gene expression changes within individual species. 
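The distinction between community-level and species-level normalization can be illustrated with a short sketch. It assumes median-of-ratios size factors of the kind DESeq estimates, reimplemented here in plain Python for illustration; it is not the DESeq API, and the gene identifiers, species labels, and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: {gene: [raw count per sample]} -> one size factor per sample."""
    n_samples = len(next(iter(counts.values())))
    # Only genes with nonzero counts in every sample have a usable geometric mean.
    usable = [row for row in counts.values() if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    geo_means = [math.exp(sum(math.log(c) for c in row) / n_samples) for row in usable]
    return [
        median(row[j] / g for row, g in zip(usable, geo_means))
        for j in range(n_samples)
    ]


def normalize_community(counts):
    """Community level: all genes share one set of size factors."""
    sf = size_factors(counts)
    return {gene: [c / s for c, s in zip(row, sf)] for gene, row in counts.items()}


def normalize_by_species(counts, species_of):
    """Species level: each species' genes are normalized independently."""
    out = {}
    for species in set(species_of.values()):
        subset = {g: row for g, row in counts.items() if species_of[g] == species}
        out.update(normalize_community(subset))
    return out


if __name__ == "__main__":
    # Hypothetical two-sample table: species A doubles in abundance, species B is flat.
    counts = {"a1": [10, 20], "a2": [30, 60], "b1": [8, 8], "b2": [4, 4]}
    species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    print(normalize_community(counts))
    print(normalize_by_species(counts, species_of))
```

In this toy table, species-level normalization removes the twofold change in species A's overall abundance, so its genes appear unchanged within the species, whereas community-level normalization preserves that shift in community-wide representation.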
Data adjustment (logarithmic transformation) and hierarchical clustering were performed using Cluster 3.0 [58] and GENE-E. Heatmap visualizations of expression data were prepared using JavaTreeview [59] and Microsoft Excel. The B. cellulosilyticus WH2 in vitro gene expression dendrogram presented was prepared using GENE-E. Bootstrap probabilities at each edge of the dendrogram were calculated using the “pvclust” package in R (10,000 replications). Clusters with bootstrap p values >0.95 were considered strongly supported and statistically significant. Metaproteomics Sample Preparation.
Cecal contents were collected from four mice and solubilized in 1 mL SDS lysis buffer (4% w/v SDS, 100 mM Tris·HCl (pH 8.0), 10 mM dithiothreitol (DTT)), lysed mechanically by sonication, incubated at 95°C for 5 min, and centrifuged at 21,000× g. Crude protein extracts were precipitated using 100% trichloroacetic acid (TCA), pelleted by centrifugation, and washed with ice-cold acetone to remove lipids and excess SDS. Protein precipitates were resolubilized in 500 µL of 8 M urea and 100 mM Tris·HCl (pH 8.0), reduced by incubation in DTT (final concentration of 10 mM) for 1 h at room temperature, and sonicated in an ice water bath (Branson (model SSE-1) sonicator; 20% amp; 2 min total (cycles of 5 s on, 10 s off)). An aliquot of each protein extract was quantified using a bicinchoninic acid (BCA)-based protein assay kit (Pierce). Protein samples (1 mg) were subsequently diluted with 100 mM Tris·HCl and 10 mM CaCl2 (pH 8.0) to a final urea concentration below 4 M. Proteolytic digestions were initiated with sequencing grade trypsin (1/100, w/w; Promega) and incubated overnight at room temperature. A second aliquot of trypsin was added (1/100) after the reactions were diluted with 100 mM Tris·HCl (pH 8.0) to a final urea concentration below 2 M. After incubation for 4 h at room temperature, samples were reduced by incubation in 10 mM DTT for 1 h at room temperature. Finally, the peptides were acidified (protonated) in 200 mM NaCl and 0.1% formic acid, filtered, and concentrated with a 10 kDa molecular weight cutoff spin column (Sartorius). LC-MS/MS Data Collection. The peptide mixture from each mouse was analyzed in technical duplicate via two-dimensional liquid chromatography (LC)-MS/MS on a hybrid LTQ Orbitrap Velos mass spectrometer (Thermo Fisher Scientific). Peptides (∼100 µL per sample) were separated using a split phase 2D (strong-cation exchange (SCX) and C18 reverse phase (RP))-LC column over a 12-step gradient for each run. 
All MS analyses were performed in positive ion mode. Mass spectral data were acquired using Xcalibur (v2.0.7) in data-dependent acquisition mode for each chromatographic separation (22 h run). One precursor MS scan was acquired in the Orbitrap at 30K resolution followed by 10 data-dependent MS/MS scans (m/z 400–1,700) at 35% normalized collision energy with dynamic exclusion enabled at a repeat count of 1. MS/MS spectra were searched with SEQUEST (v.27; [60]) using the following settings: enzyme type = trypsin; precursor ion mass tolerance = 3.0 Da; fragment mass tolerance = 0.5 Da; fully tryptic peptides and those resulting from up to four missed cleavages only. All datasets were filtered with DTASelect (v1.9; [61]) using the following parameters: Xcorrs of 1.8, 2.5, and 3.5 for singly, doubly, and triply charged precursor ions; DeltCN ≥ 0.08; ≥2 fully tryptic peptides per protein. A custom-built FASTA target-decoy database [62],[63] was generated and searched with SEQUEST at a peptide-level false positive rate (FPR) estimated at ≤0.5%. The database contained theoretical proteomes predicted from the genomes of the 12 bacterial species characterized in this study (see Tables S4 and S8), some diet components (e.g., rice and yeast), and common contaminants (e.g., keratins). Three additional theoretical bacterial proteomes predicted from the genomes of Eubacterium rectale, Faecalibacterium prausnitzii, and Ruminococcus torques were included as distractors (negative controls) that were not expected to be present in any of the samples analyzed. An in silico tryptically digested protein sequence database was also used to generate a theoretical peptidome of unique peptides within a mass range of 600–4,890 Da and ≤1 miscleavages. Analysis of Proteomic Datasets. 
Spectral counts for each protein were normalized by either the total number of spectra collected for all species in a sample (normalization by community, “NBC”), or by the total number of spectra collected for all proteins from a given species (normalization by species, “NBS”). p values for each protein were calculated using the Mann–Whitney U test. To correct for multiple comparisons, q values were calculated using an optimized false discovery rate (FDR) approach with the “qvalue” package in BioConductor. Regardless of the normalization strategy employed, p and q values were only calculated for proteins with at least three valid runs, where a valid run was one with more than five spectral counts. In NBC data, p and q values were calculated for all proteins within the model metaproteome. In NBS data, p and q values were calculated for each species-specific set of proteins. Differences in spectral counts between treatment groups (diets) were calculated using group medians. A protein was designated as “UP”-regulated if both its p and q values were less than 0.05 and the spectral count difference between treatment groups was greater than 5. The same criteria were applied in the opposite direction for proteins labeled as “DOWN.” For proteins labeled “NULL,” there was insufficient evidence to report any significant difference between the two treatment groups. Finally, a protein was considered detected or “present” in a sample if at least four (raw) spectral counts were assigned to that protein when aggregating the results from the two runs (technical replications) performed on the sample.
Phenotypic Screen for the Growth of Bacteroides spp. on Various Carbohydrates The ability of B. cellulosilyticus WH2 and B. caccae ATCC 43185 to grow on a panel of 47 simple and complex carbohydrates was evaluated using a phenotypic array whose composition has been previously described [25]. Growth measurements were collected in duplicate (two wells per substrate) over the course of 3 d at 37°C under anaerobic conditions. A total of three independent experiments were performed for each species tested (n = 6 growth profiles/substrate/species). Total growth (Atot) was calculated from each growth curve as the difference between the maximum and minimum optical densities (OD600) observed (i.e., Amax−Amin).
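Total growth as defined above can be computed directly from an OD600 time series, and the same extremes also yield the growth rate, which this section defines as Atot/(tmax−tmin). The sketch below is illustrative only: the function name and the example readings are hypothetical, and taking the first occurrence of a tied maximum or minimum, as well as returning a rate of zero for a flat curve, are assumptions not stated in the text.

```python
def growth_metrics(times_h, od600):
    """Return (Atot, rate) for one growth curve.

    Atot = Amax - Amin; rate = Atot / (tmax - tmin), where tmax and tmin are the
    time-points at which Amax and Amin were observed.
    """
    if len(times_h) != len(od600) or not od600:
        raise ValueError("times_h and od600 must be non-empty and of equal length")
    # First occurrence wins when several readings tie (an assumption).
    i_max = max(range(len(od600)), key=od600.__getitem__)
    i_min = min(range(len(od600)), key=od600.__getitem__)
    a_tot = od600[i_max] - od600[i_min]
    dt = times_h[i_max] - times_h[i_min]
    # A flat curve has Amax == Amin at the same index; report zero rate (an assumption).
    rate = a_tot / dt if dt != 0 else 0.0
    return a_tot, rate


if __name__ == "__main__":
    # Hypothetical readings from one well over 36 h.
    times = [0, 12, 24, 36]
    ods = [0.05, 0.25, 0.85, 0.80]
    a_tot, rate = growth_metrics(times, ods)
    print(f"Atot = {a_tot:.2f} OD units; rate = {rate:.4f} OD units/h")
```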
Growth rates were calculated as total growth divided by time (Atot/(tmax−tmin)), where tmax and tmin correspond to the time-points at which Amax and Amin, respectively, were collected. Consolidated statistics from all six replicates for each of the 47 conditions tested for each species are provided in Table S11. Profiling B. cellulosilyticus WH2 Gene Expression During Growth in Defined MM Containing Various Carbohydrates RNA-Seq. To characterize the impact of select mono- and polysaccharides on the in vitro gene expression of B. cellulosilyticus WH2, cells were cultured in MM supplemented with one of 31 distinct carbohydrates (for the formulation of MM and a list of the carbohydrates used as growth substrates, see Tables S12 and S13). After recovery from a frozen stock on BHI blood agar, a single colony was picked and inoculated into 5 mL of MM containing 5 mg/mL glucose (MM-Glc). Anaerobic conditions were generated within each individual culture tube using a previously described method [64] with the following modifications: (i) the cotton plug was lit and extinguished before being pushed below the lip of the culture tube, and (ii) 200 µL of saturated sodium bicarbonate was combined with 200 µL 35% (w/v) pyrogallate solution on top of the cotton plug before a bare rubber stopper was used to seal the tube. Cultures were grown overnight at 37°C. Twenty microliters of this “starter” culture were subsequently inoculated into a series of “acclimatization” cultures, each containing 5 mL of MM plus one of the 31 carbohydrates to be tested (5 mg/mL final concentration), and anaerobic culturing was carried out as above. 
This second round of culturing served two purposes: (i) it ensured cells were acclimated to growth on their new carbohydrate substrate prior to the inoculation of the final cultures that were harvested for RNA, and (ii) it provided an opportunity to obtain OD600 measurements indicating, for each carbohydrate, the range of optical densities corresponding to B. cellulosilyticus WH2's logarithmic phase of growth. Finally, 50 µL of each “acclimatization” culture were inoculated into triplicate 10 mL volumes of medium of the same composition, and the 90 “harvest” cultures were grown anaerobically at 37°C. At mid-log phase, 5 mL of cells were immediately preserved in Qiagen RNAprotect Bacteria Reagent according to the manufacturer's instructions. Cells were then pelleted, RNAprotect reagent was poured off, and the bacteria were stored at −80°C. After thawing, while still cold, each bacterial cell pellet was combined with 500 µL Buffer B (200 mM NaCl, 20 mM EDTA), 210 µL of 20% SDS, and 500 µL of acid phenol∶chloroform∶isoamyl alcohol (125∶24∶1, pH 4.5). The pellet was resuspended by manual manipulation with a pipette tip and transferred to a 2 mL screwcap tube containing acid-washed glass beads (Sigma, 212–300 µm diameter). Tubes were placed on ice, bead-beaten for 2 min at room temperature (BioSpec Mini-Beadbeater-8; set to “homogenize”), placed on ice, and bead-beaten for an additional 2 min, after which time RNA was extracted as described above for fecal and cecal samples.
Identification of Diet-Specific Fitness Determinants within the B. cellulosilyticus WH2 Genome Using Insertion Sequencing (INSeq) Whole genome transposon mutagenesis of B. cellulosilyticus WH2 was performed using protocols originally developed for B. thetaiotaomicron [42],[46], with some modifications. Initial attempts to transform B. cellulosilyticus WH2 with the pSAM_Bt construct reported by Goodman et al. yielded very low numbers of antibiotic-resistant clones, which we attributed to poor recognition of one or more promoters in the mutagenesis plasmid. Replacement of the promoter driving expression of the transposon's erythromycin resistance gene (ermG) with the promoter for the gene encoding EF-Tu in B. cellulosilyticus WH2 (BWH2_3183) dramatically improved the number of resistant clones recovered after transformation. The resulting library consisted of 93,458 distinct isogenic mutants, with each mutant strain containing a single randomly inserted modified mariner transposon. Of all predicted ORFs, 91.5% had insertions covering the first 80% of each gene (mean, 13.9 distinct insertion mutants per ORF). At 11 wk of age, male germ-free C57BL/6J mice (individually caged) were fed either a diet low in fat and rich in plant polysaccharides (LF/HPP) or high in fat and simple sugars (HF/HS). After a week on their experimental diet, animals received a single gavage containing the B. cellulosilyticus WH2 transposon library and 14 other species of bacteria (i.e., this artificial community consisted of the 12 species listed in Figure 1A, plus B. thetaiotaomicron 7330, E. rectale ATCC 33656, and Clostridium symbiosum ATCC 14940). After 16 d, fecal pellets were collected, and total fecal DNA was extracted. 
500 ng of each fecal DNA extraction was diluted in 15 µL of TE buffer and digested with MmeI (4 U, New England Biolabs) in a 20 µL reaction supplemented with 10 pmoles of 12 bp DNA containing an MmeI restriction site (to improve the efficiency of restriction enzyme digestion) [42]. The reaction was incubated for 1 h at 37°C and then terminated (80°C for 20 min). MmeI-digested DNA was subsequently purified using 125 µL of AMPure beads (after washing the beads once with 100 µL of sizing solution (1.2 M NaCl and 8.4% PEG 8000)). The digested DNA was added to the beads and the solution incubated at room temperature for 5 min. Beads were pelleted with a magnetic particle collector (MPC), washed twice (each time using a mixture composed of 20 µL TE buffer (pH 7.0) and 100 µL sizing solution, with bead recovery via MPC after each wash), followed by two ethanol washes (180 µL 70% ethanol/wash) and air-drying for 10 min. Samples were resuspended in 18 µL TE buffer (pH 7.0), and DNA was removed after pelleting beads with the MPC. Ligation of adapters was performed in a 20 µL reaction that contained 16 µL of purified DNA, 1 µL of T4 Ligase (2000 U/µL; NEB), 2 µL 10× ligase buffer, and 10 pmol of barcoded adapter (incubation for 1 h at 16°C). Ligations were subsequently diluted with TE buffer (pH 7.0) to a final volume of 50 µL, mixed with 60 µL of AMPure beads, and incubated at room temperature for 5 min. Beads with bound DNA were pelleted using the MPC and washed twice with 70% ethanol as above. After allowing the ethanol to evaporate for 10 min, 35 µL of nuclease-free water was added, and the mixture was incubated at room temperature for 2 min before collecting the beads with the MPC. 
Enrichment PCR was performed in a final volume of 50 µL using 32 µL of the cleaned up sample DNA, 10 µL 10× Pfx amplification buffer (Invitrogen), 2 µL 10 mM dNTPs, 0.5 µL 50 mM MgSO4, 2 µL of 5 µM amplification primers (forward primer: 5′CAAGCAGAAGACGGCATACG3′, reverse primer: 5′AATGATACGGCGACCACCGAACACTCTTTCCCTACACGA3′), and 1.5 µL Pfx polymerase (2.5 U/µL; Invitrogen) (cycling conditions: denaturation at 94°C for 15 s; annealing at 65°C for 1 min; extension at 68°C for 30 s; total of 22 cycles). The 134 bp PCR product from each reaction was purified (4% MetaPhor gel; MinElute Gel Extraction Kit (Qiagen)) in a final volume of 20 µL and was quantified (Qubit, dsDNA HS Assay Kit; Invitrogen). Reaction products were then combined in equimolar amounts into a pool that was subsequently adjusted to 10 nM and sequenced (Illumina HiSeq 2000 instrument). Data Deposition All short read Illumina data used for COPRO-Seq and RNA-Seq analyses, GeneChip data, and genome sequencing/assembly data are available through GEO SuperSeries GSE48537 and NCBI BioProject ID PRJNA183545. The draft genome assembly for B. cellulosilyticus WH2 has been deposited at DDBJ/EMBL/GenBank under accession number ATFI00000000. Raw MS data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.7fj1k. Supporting Information Figure S1. Phylogenetic relatedness of B. cellulosilyticus WH2 to other Bacteroides species. (A) Near full-length 16S rRNA gene sequences from the B. cellulosilyticus WH2 isolate, its closest relatives (two strains of Bacteroides xylanisolvens, three strains of Bacteroides intestinalis, and the type strain of B. cellulosilyticus), and Parabacteroides distasonis (the latter was included as an outgroup) were aligned against the SILVA SEED using the SINA aligner [65]. 
The 5′ and 3′ ends of the resulting multiple sequence alignment were trimmed to remove ragged edges, and the final alignment was used to construct an approximately maximum-likelihood phylogenetic tree using FastTree v2.1.4 [66]. Sequences in the trimmed alignment used to generate the tree shown correspond to bases 22–1498 of the Escherichia coli 16S rRNA gene [67]. Parenthetical identifiers indicate the locus tag (for B. cellulosilyticus WH2, whose genome contains four copies of the 16S rRNA gene) or GenBank accession number (for all other strains) of each sequence included in the phylogenetic analysis. (B) Identity matrix summarizing the pairwise similarities (as % nucleotide sequence identity) for all 16S rRNA gene sequences used to construct the tree shown in panel (A). https://doi.org/10.1371/journal.pbio.1001637.s001 (EPS) Figure S2. Representation of all putative GH families identified in the B. cellulosilyticus WH2 genome compared to their representation in other sequenced Bacteroidetes species. Enumeration of the GH repertoire of B. cellulosilyticus WH2 relative to (A) the six other Bacteroidetes species included in the artificial microbial community described in Figure 1A, and (B) the 86 Bacteroidetes currently annotated in the CAZy database. GH numbers in red signify CAZy families whose representation is greater in B. cellulosilyticus WH2 than in any of the other Bacteroidetes to which it is being compared. An asterisk following a GH family number indicates that genes encoding proteins from that family were found exclusively in the B. cellulosilyticus WH2 genome. In (B), GH family numbers are ordered from left to right and from top to bottom by their average representation within the 87 Bacteroidetes genomes included in the analysis. https://doi.org/10.1371/journal.pbio.1001637.s002 (TIF) Figure S3. Design and sampling schedule for experiments E1 and E2. 
In each experiment, two groups of C57BL/6J germ-free mice were gavaged at 10–12 wk of age with a 12-member artificial human gut microbial community (the day of gavage, referred to as day 0, is denoted by a large black arrow). Over time, animals were fed diets low in fat and high in plant polysaccharides (LF/HPP, bold green) or high in fat and simple sugar (HF/HS, bold orange) in alternating fashion. Fecal pellets and cecal contents were collected as indicated for profiling community membership and gene expression (sample types are denoted by a circle's color and the methods applied to each sample are indicated in parentheses within the sample key). Values shown along the time course indicate the number of days since gavage of the artificial community into germ-free animals. https://doi.org/10.1371/journal.pbio.1001637.s003 (EPS) Figure S4. COPRO-Seq analysis of the proportional representation of component taxa in the 12-member artificial community as a function of time after colonization of gnotobiotic mice and the diet they were consuming. (A) Average DNA yields from fecal and cecal samples collected from each treatment group in experiment E1. (B) DNA yields from samples collected in experiment E2. (C) COPRO-Seq quantitation of the 12 bacterial species comprising the assemblage used to colonize germ-free mice in experiments E1 and E2. Vertical dashed lines at days 14 and 28 denote time-points at which diets were switched. Panels (A–C) share a common key, provided in the upper right. Circles and triangles denote samples from experiments E1 and E2, respectively. Cecal sample data points (obtained at sacrifice on day 42 of the experiment) are plotted as for fecal sample data, but with inverted colors (i.e., colored outline, solid black fill). For all panels (A–C), data shown are mean ± SEM. https://doi.org/10.1371/journal.pbio.1001637.s004 (EPS) Figure S5. 
Further COPRO-Seq analysis of the relative abundance of components of the 12-member bacterial community as a function of diet and time. (A) Plot of the ordination results for experiment 1 (E1) from the PCoA described in Figure 1B. COPRO-Seq data from E1 and E2 were ordinated in the same multidimensional space. For clarity, only data from E1 are shown (for the E2 PCoA plot, see Figure 1B). Color code: red/blue, feces; pink/cyan, cecal contents. (B) Heatmap representation of the relative abundance data from E1 normalized to each species' maximum across all time-points within a given animal (“Percentage of maximum achieved (PoMA)”). Each heatmap cell denotes the mean for one treatment group (n = 7 animals), and each treatment group is shown as its own heatmap. https://doi.org/10.1371/journal.pbio.1001637.s005 (EPS) Figure S6. GeneChip profiling of the cecal metatranscriptome in mice fed different diets. (A) Venn diagram illustrating the number of bacterial genes whose expression was scored as “present” (i.e., detectable in ≥5 of 7 animals), only in mice that were consuming the plant polysaccharide-rich LF/HPP diet, only in mice that were consuming a “Western”-like HF/HS diet, or in both groups. (B) Overview of the diet-specificity of CAZyme gene expression in the 12-member model microbiota and in four prominent taxa that maintained a proportional representation in the cecal microbiota that was >5% on each diet. https://doi.org/10.1371/journal.pbio.1001637.s006 (EPS) Figure S7. Dissecting the in vivo expression of EC 3.2.1.8 (endo-1,4-β-xylanase). (A) Gene expression in E2 fecal samples was evaluated by microbial RNA-Seq. 
After data from all 12 species in the model human gut microbiome were binned by EC number annotation and normalized (i.e., data were “community-normalized” at the level of ECs), a significant decrease in the representation of EC 3.2.1.8 in the metatranscriptome was observed when comparing the final time-point of the first diet phase (day 13, LF/HPP diet) and the final time-point of the second diet phase (day 27, HF/HS diet) (Mann–Whitney U test, p = 0.03). (B) Transcribed B. cellulosilyticus WH2 genes account for >99% of community-normalized RNA-Seq counts assignable to EC 3.2.1.8 (note how counts at the community level in panel (A) compare to those attributable to B. cellulosilyticus WH2 in panel (B)). Thus, B. cellulosilyticus WH2 essentially dictates the degree to which expressed endo-1,4-β-xylanase genes are represented within the metatranscriptome. (C) B. cellulosilyticus WH2 contributes a greater number of community-normalized RNA-Seq counts to the metatranscriptome in LF/HPP-fed mice than in HF/HS-fed animals. (D) When B. cellulosilyticus WH2 gene expression data are normalized independently of data from other taxa (i.e., when data are “species-normalized”), statistically significant increases in the representation of EC 3.2.1.8 within the B. cellulosilyticus WH2 transcriptome become apparent in HF/HS-fed mice. (E) Breakdown of the total species-normalized counts in panel (D) by the B. cellulosilyticus WH2 gene from which they were derived. For all panels (A–E), mean values ± SEM are shown. Means for all panels were calculated from data from four animals at each time-point, except day 26 (n = 2). In each of the first four panels (A–D), the differences between day 13 and day 27 were deemed statistically significant by Mann–Whitney U test (p = 0.03 for each of the four tests performed). https://doi.org/10.1371/journal.pbio.1001637.s007 (EPS) Figure S8. 
Shotgun metaproteomic analysis of cecal samples from gnotobiotic mice colonized with the 12-member artificial community. (A) Each species' theoretical proteome was subjected to in silico trypsinization (see Materials and Methods). Of the resulting peptides, those specific to a single protein within our database of all predicted proteins encoded by the genomes of the 12 assemblage members, the mouse, and three bacterial “distractors” (E. rectale, F. prausnitzii, and R. torques) were classified as “unique,” while all others were considered “nonunique.” The “unique” fraction of a species' predicted peptides indicates how many can be unambiguously traced back to a single protein of origin if detected by LC-MS/MS. (B) Comparison of the average relative cecal abundance of each assemblage member (dark gray bars) with the percentage of proteins within its theoretical proteome that were detected by LC-MS/MS (red bars), and the percentage of all genes within its genome whose expression was detected using our custom GeneChip (light gray bars). Data shown are mean values ± SEM. (C) Scatter plots illustrating the Pearson correlation coefficient (r) between log-transformed averages of diet-specific fold-differences in expression as determined by GeneChip assay (RNA, x-axis) and LC-MS/MS (protein, y-axis) in E1. Data points within the black scatter plot represent the 448 B. cellulosilyticus WH2 genes for which reliable quantitative data could be obtained for animals in both diet treatment groups for both the GeneChip and LC-MS/MS assays (i.e., any gene for which a signal could not be detected on at least one diet treatment in at least one assay was excluded). 
Scatter plots in color represent the results of correlation analyses performed on subsets of genes within the black plot whose KEGG annotations fall within particular functional categories, including “Translation” (r = 0.03, 59 genes), “Energy metabolism” (r = 0.36, 58 genes), “Amino acid metabolism” (r = 0.48, 67 genes), and “Carbohydrate metabolism” (r = 0.69, 110 genes). For both (B) and (C), n = 2 mice per treatment group (4 mice total). https://doi.org/10.1371/journal.pbio.1001637.s008 (EPS) Table S1. Sequencing statistics for B. cellulosilyticus WH2. https://doi.org/10.1371/journal.pbio.1001637.s009 (XLSB) Table S2. B. cellulosilyticus WH2 genome features with relevance to carbohydrate metabolism. https://doi.org/10.1371/journal.pbio.1001637.s010 (XLSB) Table S3. Composition of the 12-member artificial community inoculated by oral gavage into germ-free animals. https://doi.org/10.1371/journal.pbio.1001637.s011 (XLSB) Table S4. Bacterial strains included in this study. https://doi.org/10.1371/journal.pbio.1001637.s012 (XLSB) Table S5. COPRO-Seq quantitation of the relative abundances of artificial community members over time. https://doi.org/10.1371/journal.pbio.1001637.s013 (XLSB) Table S6. GeneChip measurements of cecal gene expression for the 12 bacterial species comprising the artificial human gut microbial community studied in experiment E1. https://doi.org/10.1371/journal.pbio.1001637.s014 (XLSB) Table S7. List of EC numbers whose representation within the fecal metatranscriptome is significantly impacted by diet. https://doi.org/10.1371/journal.pbio.1001637.s015 (XLSB) Table S8. Summary of theoretical peptidome statistics. https://doi.org/10.1371/journal.pbio.1001637.s016 (XLSB) Table S9. Number of proteins detected within each cecal sample for each species in our custom SEQUEST database. https://doi.org/10.1371/journal.pbio.1001637.s017 (XLSB) Table S10. Raw and normalized MS/MS spectral counts for detectable proteins in E1 cecal samples. 
https://doi.org/10.1371/journal.pbio.1001637.s018 (XLSB) Table S11. Growth of B. cellulosilyticus WH2 and B. caccae on a panel of structurally diverse carbohydrates. https://doi.org/10.1371/journal.pbio.1001637.s019 (XLSB) Table S12. Preparation of minimal medium for in vitro gene expression profiling of B. cellulosilyticus WH2. https://doi.org/10.1371/journal.pbio.1001637.s020 (XLSB) Table S13. Carbohydrate substrates tested during in vitro gene expression profiling of B. cellulosilyticus WH2. https://doi.org/10.1371/journal.pbio.1001637.s021 (XLSB) Table S14. RNA-Seq gene expression values for B. cellulosilyticus WH2 grown in vitro on 31 simple and complex saccharides. https://doi.org/10.1371/journal.pbio.1001637.s022 (XLSB) Text S1. Supplementary results. https://doi.org/10.1371/journal.pbio.1001637.s023 (DOCX) Acknowledgments We thank Maria Karlsson, David O'Donnell, Sabrina Wagoner, and Su Deng for superb technical assistance, as well as Nathan VerBerkmoes, Alejandro Reyes, and Federico Rey for helpful suggestions throughout the course of this study.
Comparative Sex Chromosome Genomics in Snakes: Differentiation, Evolutionary Strata, and Lack of Global Dosage Compensation doi: 10.1371/journal.pbio.1001643 pmid: 24015111
Author summary Sex chromosomes have evolved from non-sex-determining chromosomes (autosomes) multiple times throughout the tree of life. In snakes, females are the heterogametic sex, in that they have two different sex chromosomes, Z and W, while males have two Z chromosomes. However, while in some snake species (e.g., boids) the Z and W chromosomes look identical (“homomorphic”), in others (e.g., vipers) they are very different in size and structure (“heteromorphic”); yet other species (e.g., colubrids) appear intermediate between these two states. This diversity makes snakes ideally suited for studying the evolutionary dynamics of sex chromosome differentiation. Here we sequence the genomes of three snake species (boa, rattlesnake, garter snake) that display varying levels of sex chromosome heteromorphism and perform a comparative genome analysis. This allows us to establish important principles in the biology of snake sex chromosomes, including the identification of evolutionarily distinct regions along the snake sex chromosomes that reflect the loss of recombination at different time points, higher mutation rates in male snakes, and faster evolution of protein-coding genes on snake Z chromosomes. In contrast to conclusions drawn from previous cytogenetic data, we show that the sex chromosomes of colubrid snakes are completely heteromorphic at the DNA sequence level, and that recombination along the sex chromosomes was already abolished in a common ancestor of colubrids and vipers. Finally, we also show that snakes have not evolved chromosome-wide mechanisms to compensate for reduced gene expression of the sex chromosomes in the heterogametic sex. Introduction Heteromorphic sex chromosomes are derived from ordinary autosomes [1],[2]. The acquisition of a sex-determining gene initiates sex chromosome evolution, and in many lineages, initially identical homomorphic sex chromosomes differentiated into heteromorphic sex chromosomes [1],[2]. 
For homomorphic sex chromosomes to evolve independently and differentiate, recombination needs to be abolished between the homomorphic proto-sex chromosomes. Sexually antagonistic alleles that accumulate on proto-sex chromosomes are thought to select for a suppression of recombination between homomorphic sex chromosomes [2]–[4]. A lack of recombination, over parts or all of the length of the sex chromosome that is limited to one sex only (the Y in species with male heterogamety, or the W in species with female heterogamety) results in degeneration of the non-recombining region. That is, the non-recombining portion of the Y or W chromosome loses most of the genes that were ancestrally present on the proto-sex chromosome [4]. Differentiation of homomorphic sex chromosomes resulting in heteromorphic sex chromosomes is a by-product of degeneration of non-recombining regions along the Y or W chromosome, and can encompass the entire chromosome, or just parts of it. Many familiar sex chromosome systems, such as those of humans or Drosophila, have highly differentiated X and Y chromosomes, exhibiting few signatures of their evolutionary history [4]. Some species groups, however, contain taxa at various stages in this transition from morphologically identical to fully differentiated sex chromosomes. Why some species abolish recombination along their sex chromosomes and acquire heteromorphic XY or ZW chromosomes, while others maintain homomorphic sex chromosomes, is not entirely clear. In principle, this could be due to a lack of sexually antagonistic mutations in some species, or the resolution of conflict imposed by sexually antagonistic mutations by evolving sex-specific or sex-biased expression [5]. All snakes exhibit genetic sex determination, with females being the heterogametic sex, but cytogenetic studies suggest that tremendous variation exists among taxa with regards to the level of W degeneration [6],[7]. 
It was in fact comparative cytogenetic work in snakes that led Ohno to first propose that vertebrate sex chromosomes are derived from autosomes, and snakes show the entire continuum of sex chromosome differentiation [6],[7]. Boidae and Pythonidae, which diverged from the remaining snakes about 100 MY ago [8], have homomorphic sex chromosomes; that is, the Z and W chromosomes appear undifferentiated at the cytological level. In contrast, W chromosomes appear highly degenerated and heterochromatic in snakes belonging to the Elapidae and the Viperidae. Colubrid snakes, which diverged from vipers about 50 MY ago [8], are at an intermediate stage of sex-chromosome evolution and often have moderately differentiated Z and W chromosome karyotypes [6],[7],[9],[10]. Thus, snakes are an excellent model system for studying the evolutionary processes driving sex-chromosome differentiation in vertebrates. In addition, snakes provide an independent example of a female-heterogametic system. As in birds, female snakes contain the sex-limited chromosome, and while ZW systems mimic XY species in many respects, such as in the degeneration of the non-recombining W or Y chromosome, they appear to differ in other aspects [11],[12]. In particular, while XY species generally have global dosage compensation mechanisms that result in equal expression of X-linked genes in males and females, all ZW species studied to date lack chromosome-wide dosage compensation [13]–[17]. The generality of these patterns, and the evolutionary reasons for this different behavior of XY versus ZW systems, however, are not fully understood [12],[18]. Thus, while snakes promise to reveal important insights into sex chromosome differentiation, the lack of genomic resources has limited progress. Here we sequence and perform a comparative genome analysis of three snake species displaying varying levels of sex chromosome heteromorphism to investigate sex chromosome differentiation in this clade. 
In boas (Boa constrictor, family Boidae), the Z and W are mostly homomorphic and show low levels of differentiation, while pygmy rattlesnakes (Sistrurus miliarius, family Viperidae) have highly differentiated sex chromosomes with little to no homology remaining between the Z and W [6],[7],[9],[19]. Garter snakes (genus Thamnophis, family Colubridae) contain species with both homomorphic and heteromorphic sex chromosomes [20], and we sequenced the common garter snake (Thamnophis elegans), which is reported to have homomorphic sex chromosomes based on cytological evidence [20]. In addition, we obtain transcriptomes from boa and pygmy rattlesnake, to test for dosage compensation in snakes. Comparing the transcriptomes between male and female Boidae allows us to establish baseline levels of sex-biased transcription at homomorphic sex chromosomes, to test for the absence or presence of dosage compensation at heteromorphic sex chromosomes in Viperidae. Results Genome and Transcriptome Assemblies in Snakes We sequenced the genome of a single male and single female of the common garter snake (T. elegans, family Colubridae) and pygmy rattlesnake (S. miliarius, family Viperidae), as well as a single female boa (B. constrictor, family Boidae; publicly available male boa reads and scaffolds were obtained from the Assemblathon website, see Methods). Pygmy rattlesnake and garter snake scaffolds were assembled using SOAPdenovo (Table S1). We performed RNA-seq on a single male and female individual of both boa and pygmy rattlesnake to study patterns of gene expression (see Methods). A SOAPdenovo assembly yielded 297,551 transcripts for boa and 142,100 for pygmy rattlesnake, of which 43,977 and 37,667, respectively, mapped to protein-coding genes of the lizard Anolis carolinensis, the closest reptile relative with a completely sequenced and annotated genome. 
After removal of redundant transcripts and concatenation of transcripts corresponding to different parts of the same Anolis gene, our final sample consisted of 10,793 boa genes and 11,939 pygmy rattlesnake genes. Identification of Z-Linked Sequences in Snakes Karyologically, snakes are highly conserved with a preponderance of species with 2n = 36 (16 macro- and 20 microchromosomes). All snakes show genetic sex determination, with females being ZW. The sex chromosomes of snakes were shown to be homologous in different families [9],[10], and correspond to chromosome 6 of the butterfly lizard Leiolepis reevesii rubritaeniata [21]. Karyotypes and synteny have been well conserved during 280 million years of reptile evolution; for example, 19 out of 22 anchored chicken chromosomes are syntenic to a single Anolis chromosome over their entire length [22]. Thus, we used chromosomal information from Anolis as a proxy to anchor genes along chromosomes in snakes. We used genomic coverage of de novo assembled scaffolds in males versus females to identify sex chromosomes of snakes. Specifically, Z-linked regions with a degenerate W homolog should only have half the genomic coverage in females relative to males, while autosomal regions and undifferentiated sex-linked regions (pseudoautosomal regions [PARs]) should have equal genomic coverage in both sexes [14]. Genomic scaffolds from all three species were assigned to the chromosomes of Anolis, and male and female genomic reads were mapped to the scaffolds to estimate their male and female coverage. This coverage analysis reveals that snake scaffolds homologous to chromosomes 1–5 of Anolis have similar coverage in males and females in all three species (Figure 1). Chromosome 6 of Anolis corresponds to the Z chromosome of pygmy rattlesnake and garter snake, as it shows an almost 2-fold reduction in female to male coverage relative to the other chromosomes in this species (Figure 1). 
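The coverage comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming that a mean read depth per scaffold and per sex is already available; the function name, the dictionary layout, and the toy depths are hypothetical and are not taken from the study. Each sex is normalized by its own median depth on scaffolds assigned to Anolis chromosomes 1–5 (as in the Figure 1 legend), so a Z-linked scaffold with a degenerate W homolog is expected near a log2 value of −1 in females and near 0 in males.

```python
import math
from statistics import median

# Anolis chromosomes treated as the autosomal baseline
AUTOSOMES = {"1", "2", "3", "4", "5"}

def normalized_log2_coverage(scaffolds):
    """Return per-scaffold log2 coverage, normalized within each sex.

    scaffolds: list of dicts with keys 'chrom' (assigned Anolis chromosome),
    'female' and 'male' (mean read depth; must be greater than 0).
    """
    # Median autosomal depth, computed separately for each sex
    baseline = {
        sex: median(s[sex] for s in scaffolds if s["chrom"] in AUTOSOMES)
        for sex in ("female", "male")
    }
    return [
        {
            "chrom": s["chrom"],
            "female": math.log2(s["female"] / baseline["female"]),
            "male": math.log2(s["male"] / baseline["male"]),
        }
        for s in scaffolds
    ]

# Hypothetical depths: the sexes were sequenced to different depths,
# and the chromosome 6 scaffold is hemizygous in the female.
toy = [
    {"chrom": "1", "female": 18.0, "male": 27.0},
    {"chrom": "2", "female": 20.0, "male": 30.0},
    {"chrom": "3", "female": 22.0, "male": 33.0},
    {"chrom": "6", "female": 10.0, "male": 30.0},
]
z_scaffold = normalized_log2_coverage(toy)[-1]
print(z_scaffold)
```

Normalizing within each sex, rather than comparing raw depths, keeps differences in overall sequencing depth between the male and female libraries from being mistaken for hemizygosity.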
No such reduction is observed in boa, confirming that the Z and W are homomorphic even at the sequence level in this species. Thus, the coverage analysis shows that, at the nucleotide level, both pygmy rattlesnake and garter snake have fully differentiated sex chromosomes (Figure 1); we do not identify any segments along the Z chromosome with similar coverage in the two sexes, as would be expected for a PAR. Note that we cannot rule out the presence of very small PARs, or that the PAR is derived from a region not homologous to Anolis chromosome 6. On the other hand, the sex chromosomes of boa appear entirely homomorphic, and we detect no regions of differentiation along the Z, indicating that the boa sex chromosome is indeed recombining over almost all of its length (see Discussion). Figure 1. Normalized read coverage depth for female (red) and male (blue) scaffolds ordered along the Anolis genome for (A) boa, (B) garter snake, and (C) pygmy rattlesnake. The points show the normalized log2 coverage for each scaffold and the lines represent a smoothing spline drawn along the chromosome. Coverage was normalized by dividing the coverage for each scaffold-by-sex combination by the median coverage of all scaffolds in chromosomes 1–5 in that sex, resulting in a median log2 coverage score for autosomes of 0. Under this normalization, hemizygous sequences are expected to have a median log2 coverage of −1. Scaffolds were mapped to the Anolis macrochromosomes on the basis of the location of their gene content. The phylogenetic relationship between the species investigated is shown. Boids split from the other two groups about 100 MY ago, while colubrids and viperids diverged about 50 MY ago [8]. The snake photographs are used under a Creative Commons Attribution 2.5 Generic license (CC BY 2.5). 
Credits for the photographs are as follows: Nick Turland for the western pygmy rattlesnake (http://www.flickr.com/photos/nturland/1436776818/); Guilherme Jófili for the Boa constrictor (http://www.flickr.com/photos/gjofili/5005623645/); Steve Jurvetson for the coastal garter snake (http://www.flickr.com/photos/jurvetson/825514494/). Additional permission to publish the western pygmy rattlesnake image was granted by Nick Turland. https://doi.org/10.1371/journal.pbio.1001643.g001 Verification of Z-linkage and Conservation of Synteny Two independent lines of evidence confirm that we correctly assign sex-linkage of our genomic scaffolds. First, SNP patterns in our RNA-seq sample corroborate that homologues of Anolis chromosome 6 genes are Z-linked in snakes (the higher coverage of the RNA-seq data for many genes makes SNP inferences more reliable than using our genomic data). If the W is fully degenerated, as in pygmy rattlesnake, SNP patterns are expected to differ between males and females on the Z but should be similar on autosomes. Female individuals carry only one copy of the Z, and should therefore harbor no polymorphism at Z-linked genes, whereas ZZ male individuals can exhibit variation at Z-linked loci. Consistent with this, over 25% of all pygmy rattlesnake genes assigned to chromosomes 1 to 5 have at least one SNP in both the male and female sample. For Z-linked genes (chromosome 6), this proportion is 29% for the male, but only 2% for the female (7 genes out of 247; see Figure S1), confirming that overall we are classifying sex-linked genes correctly. The seven genes with female SNPs are not adjacent to each other on the Z, and thus they are unlikely to be derived from an undetected PAR. Instead, remaining heterozygosity could be due to several factors, including sequencing or mapping errors, mapping of W-derived reads (see next section), undetected paralogs, or movement of these genes to an autosome. 
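The SNP-based check on Z-linkage described above amounts to comparing, for each sex and chromosome class, the fraction of genes carrying at least one SNP. The sketch below shows that summary on invented per-gene SNP counts; the function name and all counts are hypothetical and do not reproduce the study's data or pipeline. With a degenerated W, the Z-linked fraction in the female should fall far below both the male Z-linked fraction and the autosomal fractions.

```python
def fraction_with_snp(snp_counts):
    """Fraction of genes with at least one SNP.

    snp_counts: per-gene SNP counts for one sex and one chromosome class.
    """
    if not snp_counts:
        raise ValueError("no genes supplied")
    return sum(1 for n in snp_counts if n > 0) / len(snp_counts)

# Hypothetical per-gene SNP counts (ten genes per class, for illustration only)
toy_counts = {
    ("Z", "female"): [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    ("Z", "male"): [0, 2, 0, 1, 0, 0, 3, 0, 0, 1],
    ("autosome", "female"): [1, 0, 0, 2, 0, 0, 0, 1, 0, 0],
    ("autosome", "male"): [0, 1, 0, 0, 3, 0, 0, 0, 1, 0],
}
fractions = {key: fraction_with_snp(counts) for key, counts in toy_counts.items()}

for (chrom_class, sex), value in fractions.items():
    print(f"{chrom_class:8s} {sex:6s} {value:.2f}")
```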
In boa, the W and Z are homomorphic; thus, most Z-linked genes have a homologous copy on the W and are therefore expected to show normal levels of polymorphism in the female. This is indeed what we observe, with female SNP counts of genes on the boa Z well within the range of the autosomes (Figure S1). Second, we mapped 11 known Z-linked cDNA clones of the rat snake Elaphe quadrivirgata [9] to the Anolis genome, and nine of them mapped to chromosome 6 of Anolis (the remaining two mapped to unmapped scaffolds; Figure S2). On the other hand, only one of the remaining non-sex-linked clones of rat snake mapped to chromosome 6 while 92 mapped to other chromosomes or to unmapped regions (Table S2). This strongly suggests that the Z chromosome of snakes is fully homologous to the Anolis chromosome 6. The 11 Z-linked markers were found to be collinear in three snake species investigated by FISH mapping (with representatives from each of Boidae, Viperidae, and Colubridae [9]), and we found the same order of these markers along chromosome 6 of Anolis (Figure S2). Thus, the overall structure of chromosome 6 of Anolis and the Z of snakes appears conserved, suggesting that there have been no large-scale chromosomal rearrangements between snakes and lizards. In addition, we also compared synteny of the genome assemblies of Anolis and boa, the snake species with the best-assembled genome, to detect chromosomal rearrangements at a smaller scale. Indeed, we find that micro-synteny is also remarkably conserved between these two species; almost all of the boa scaffolds are co-linear with Anolis (Figure S3), and we find evidence for only two small inversions on the Z chromosome of boa. Thus, all genes in our snake data mapping to chromosome 6 of Anolis were assigned as Z-linked, while genes mapping to chromosomes 1 to 5 were used as autosomal control genes. 
In addition to conserved chromosomal homology, we also expect that the relative locations of genes along the Z chromosome are, to a large extent, conserved between species (Figure S2).

Identification of W-Linked Sequences in Snakes

W-linked sequences are present only in females, and we searched for genomic scaffolds with female-specific read coverage to identify putative W-derived scaffolds (see Methods). In order to gain a chromosome-wide perspective on conservation of putatively W-linked sequences, we examined female-specific genomic scaffolds, regardless of whether they contained coding regions (putative W-linked coding sequences are discussed in the section "Recombination Suppression Predates Viperidae–Colubridae Divergence" below). We selected female-specific scaffolds that mapped to scaffolds homologous to Anolis chromosomes 1–6 with more than 150 aligned nucleotides as a candidate set enriched for W-derived sequences. If such putative W-linked sequences are derived from genomic regions of the ancestral autosome that formed the sex chromosomes in snakes, we expect them to be enriched for sequences homologous to inferred Z-linked scaffolds (i.e., scaffolds homologous to Anolis chromosome 6). Indeed, we find this to be the case: while the Z chromosome homolog in Anolis accounts for only 7.6% of the macrochromosomal genome, we find that 49% of the candidate W-scaffolds with homologs in our assembly map to Z-linked sequences in pygmy rattlesnake and 35% in garter snake (p<1×10⁻¹⁵; Figures 2 (histograms) and S4). Thus, W-linked candidate sequences in snake species with heteromorphic sex chromosomes are significantly enriched for homologs residing on the Z chromosome (4.5-fold for garter snake, 6.4-fold for pygmy rattlesnake). No such excess is seen for the homomorphic sex chromosomes of boa, and female-biased scaffolds map randomly across the genome, as expected for background noise mapping.
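The enrichment comparison described above can be illustrated with a short calculation. The Python sketch below computes a fold enrichment from the rounded percentages quoted in the text and an exact one-sided binomial tail probability. The scaffold counts (49 of 100 and 7 of 100) are hypothetical placeholders, because the text reports proportions rather than counts, and the one-sided test is an assumption made for illustration rather than the procedure used in the study.

```python
from math import comb

def fold_enrichment(observed_fraction, expected_fraction):
    """Ratio of the observed to the expected fraction of candidate scaffolds."""
    return observed_fraction / expected_fraction

def binomial_upper_tail(successes, trials, p):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, i) * p**i * (1.0 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )

# Share of the macrochromosomal genome on the Z homolog (from the text).
EXPECTED_Z_FRACTION = 0.076

# 49% of pygmy rattlesnake W-candidates map to Z-linked sequences (from the text).
rattlesnake_fold = fold_enrichment(0.49, EXPECTED_Z_FRACTION)

# Hypothetical counts: 100 candidate scaffolds, 49 of them mapping to the Z.
p_enriched = binomial_upper_tail(49, 100, EXPECTED_Z_FRACTION)

# Hypothetical background-like case: 7 of 100 candidates mapping to the Z.
p_background = binomial_upper_tail(7, 100, EXPECTED_Z_FRACTION)

print(round(rattlesnake_fold, 1), p_enriched < 1e-15, p_background > 0.05)
```

Because the inputs are rounded percentages, fold values computed this way only approximate those reported; the reported figures presumably rest on unrounded scaffold counts.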
Only 6.5% of W-candidate scaffolds of boa map to Anolis chromosome 6, which is not statistically different from its proportion of euchromatin in the assembly, 7.6% (binomial test, p = 0.63).

Figure 2. Mapping of the best female-specific W-candidate scaffolds of (A) boa, (B) garter snake, and (C) pygmy rattlesnake to the Anolis macrochromosomes. The histograms (left) show the number of candidate W scaffolds mapped to the six major chromosomes of Anolis, with the green bar highlighting the Anolis homolog to the snake Z chromosome. The W candidates homologous to Z-linked scaffolds (which account for 7.6% of the genome) make up 49% and 35% of all female-biased scaffolds in pygmy rattlesnake and garter snake, respectively. This is a 6.4-fold excess in pygmy rattlesnake, and a 4.5-fold excess in garter snake, over random mapping based on chromosome size. In contrast, in boa, only 6.5% of scaffolds map to chromosome 6, which does not differ from random mapping on the basis of chromosome size. The right panel shows color-coded mapping density of W-candidates along the Anolis macrochromosomes. The density of W-candidates is not uniform across the Z chromosome in either pygmy rattlesnake or garter snake (p<0.0001 for mean nearest neighbor distances). The data in this figure are from all female-biased candidate W-linked scaffolds and will thus contain both non-coding scaffolds and scaffolds containing protein-coding genes. https://doi.org/10.1371/journal.pbio.1001643.g002

Evolutionary Strata on Snake Sex Chromosomes

Our mapping of these W-candidate genomic scaffolds from each species to their genomic scaffolds anchored along the Anolis genome also allowed us to test if W-linked candidate scaffolds are randomly distributed along the Z chromosome, or if they cluster in certain genomic regions.
Clustering of W-candidates would suggest that different regions of the Z chromosome have different evolutionary histories, that is, snake Z chromosomes might possess evolutionary strata, as observed in other taxa [23]–[25]. In particular, regions of the Z chromosome with a higher density of W scaffolds and higher similarity between Z and W sequences are indicative of segments along the sex chromosomes that abolished recombination more recently than ones that lack W homologs [23]. Figure 2 shows our mapping of candidate W-linked scaffolds against autosomal and Z-linked scaffolds in each species. We find that the density of W-scaffolds is not uniform across the Z for either pygmy rattlesnake or garter snake (p<0.0001; Figure S4). In both pygmy rattlesnake and garter snake, we identify at least two (and possibly three) evolutionary strata across the Z chromosomes that differ in their density of W paralogs, and their degree of nucleotide conservation between the Z and W (Figures 2 and 3). Enrichment of W scaffolds on the Z and identification of strata are robust to the cutoff used to select W-candidates (Figures S5 and S6).

Figure 3. Evolutionary strata and sequence conservation between the pygmy rattlesnake and garter snake W-candidate scaffolds mapped along the Z chromosome. The middle three tracks show the position of candidate W sequences along the Z chromosome in garter snake (top) and pygmy rattlesnake (bottom), and their overlap (center). The top and bottom plots show nucleotide identity between Z-W gametologs (grey dots), and the median (red line) inferred for each of the putative strata for garter snake and pygmy rattlesnake (blue boxes; the y range of the boxes represents the interquartile range of the identity values in each region). The gray shaded region represents identity below 30%, indicating low quality mapping.
As in Figure 2, the data in this figure contain both non-coding and coding scaffolds. https://doi.org/10.1371/journal.pbio.1001643.g003

Limited Z-W homology does not allow us to precisely determine the boundaries of strata or their age, but the overall architecture of strata looks similar in pygmy rattlesnake and garter snake. In particular, in both species the oldest stratum encompasses the middle of the chromosome (from roughly 2×10⁷ to 5×10⁷ bp), with a lower density of homologous sequences between the Z and W, most of which match at levels that are no different than chance (Figure 3). The distal regions on the Z show a much higher density of W-linked candidate regions compared to the center segment (Figure 3; p<<0.0001 for both garter snake and pygmy rattlesnake), indicating that the middle of the W chromosome is largely degenerated whereas the distal regions maintain substantial homology. A lower mapping density of the first 20 Mb might be indicative of a slightly older stratum in both pygmy rattlesnake and garter snake than the last 30 Mb (see Figures 2 and 3), but divergence within those regions is comparable (46% versus 42% identity for garter snake and 53% versus 51% for pygmy rattlesnake, neither comparison significantly different). Whether the two distal regions represent the same stratum or different strata remains unresolved, and will likely require high quality sequencing of both sexes of multiple species among colubrids and viperids. Thus, our data clearly reveal the presence of at least two, and possibly three, evolutionary strata on snake sex chromosomes (Figures 3 and S6), and their similar architecture suggests that their formation may have pre-dated the split between Viperidae and Colubridae roughly 50 MY ago.

Recombination Suppression Predates Viperidae–Colubridae Divergence

Three lines of evidence indicate that the strata identified in the genomic scaffolds above were indeed formed prior to the divergence between the two species groups.
Mapping Z-linked genes to putative W-linked sequence allowed us to retrieve W-gametologs for 55 of these genes in pygmy rattlesnake and 29 in garter snake (see Methods). The average synonymous sequence divergence between the Z-W gametologs exceeds the median synonymous divergence between garter snake and pygmy rattlesnake (0.28 versus 0.20, p = 0.076 with a two-tailed Wilcoxon test for pygmy rattlesnake; 0.32 versus 0.20, p = 0.004 for garter snake; Figure S7; Tables S3, S4, and S5), suggesting that recombination between the sex chromosomes ceased before the species split. In addition, if the W chromosomes had diverged completely independently in the two lineages, limited overlap between W-linked sequences (coding and non-coding) of garter snake and pygmy rattlesnake would be expected, but we find an excess of regions along the W to be shared between species (Figures 3 and S8; p<0.0001 for all comparisons, via Monte Carlo resampling of observed numbers of scaffolds based on locations of all scaffolds). Similarly, we find more protein-coding genes shared by the W-chromosomes of both species than expected by chance (15 observed versus 1.9 expected, p<0.001 with a goodness of fit Chi-square test). Although the ratio of observed to expected shared genes is highest in the oldest stratum (Figure S9, due to the presence of a single common gene in this part of the W-chromosomes of both species), it is above 1 for each interval (from 4.6 to 14). This suggests that many W-linked genes had already degenerated in the ancestor of pygmy rattlesnake and garter snake, and that the W had started to reach its equilibrium gene content at the time the two species split, especially in the oldest stratum where recombination was first abolished. Finally, we compared phylogenetic trees for ten regions along the sex chromosomes for which we had homologous sequences from both the pygmy rattlesnake and garter snake Z and W chromosomes.
If recombination suppression between the Z and W chromosomes preceded species divergence, we would expect the sequences to cluster by chromosome (i.e., W pygmy rattlesnake with W garter snake, and Z with Z). If pygmy rattlesnake and garter snake abolished recombination at some (or all) of the strata independently, the sequences should cluster by species. We find that for each gene tree constructed, the sequences cluster by chromosome and not species, suggesting that all strata on the sex chromosomes of Viperidae and Colubridae were formed prior to their species divergence (Figure S10).

Transcribed W-Genes

In addition to identifying 55 W-candidate genes from the pygmy rattlesnake genome assembly, we also assembled the female pygmy rattlesnake RNA-seq reads de novo, and searched this assembly for W sequences, based on female-limited expression and female-biased sequence coverage. Briefly, we mapped the pygmy rattlesnake male and female DNA-seq and RNA-seq reads to the transcriptome using Bowtie2 to estimate, for each transcript, the male and female coverage and expression level. While comparisons between female and male liver are unlikely to yield many genes with female-specific functions (as expected on the W chromosome), we find 12 pygmy rattlesnake transcripts with female to male coverage ratio below 0.1 and female-specific expression (nine after excluding transcripts derived from parasites), and W-linkage of six of these candidates was confirmed by female-specific PCR products (Figure S11). The list of confirmed and putative W-linked transcripts is provided in Table S6. Of the six confirmed W-derived transcripts, three consist of repetitive sequences or sequences derived from transposable elements, in agreement with the idea that degeneration of W/Y-chromosomes is often driven by the accumulation of active transposable elements.
Although they were expressed at high enough levels to be detected in the female transcriptome, these repetitive/TE sequences had overall lower expression levels (mean fragments per kilobase of transcript per million mapped reads [FPKM]: 6.8) than the three other W-linked transcripts (mean FPKM: 13.4), providing some support for a functional role of the latter. Two of these non-repetitive transcripts have significant similarity to known vertebrate genes (ube2m and 28S ribosomal RNA gene, with tblastx E-values of 3.00e–11 and 1.00e–71), while the third transcript does not map to any known gene or contain any open reading frames (but has a relatively high level of expression at 14.2). Combining the DNA- and RNA-based approaches, we identify 61 potential genes on the W of pygmy rattlesnake (six from the transcriptome, and 55 from female-specific genomic scaffolds that map to chromosome 6/Z genes) and 29 on the W of garter snake (15 of which overlap between pygmy rattlesnake and garter snake). Thus, the W chromosome in both species contains considerably fewer genes than its former homolog, the Z chromosome (we detect 61 W-linked genes versus 712 Z-linked genes in pygmy rattlesnake, and 29 W-linked genes versus 723 Z-linked genes in garter snake).

Molecular Evolution of Snake Sex Chromosomes

Sex chromosomes are subject to unique evolutionary forces compared to autosomes [26]. Sex-biased transmission and hemizygosity of the Z in females can impact the efficacy of natural selection on the Z relative to autosomes (faster Z evolution). On one hand, selection can be more efficient at incorporating adaptive recessive mutations on the hemizygous Z-chromosome, resulting in increased rates of adaptive evolution [27],[28]. Alternatively, the Z can also experience more genetic drift due to a smaller effective population size relative to autosomes, resulting in an accumulation of slightly deleterious mutations [29].
In addition, Z-chromosomes are more often transmitted through males than are autosomes, and might be subject to higher mutation rates (male-driven evolution). That is, the many more cell divisions in spermatogenesis than in oogenesis can result in more mutations during DNA replication and thereby skew mutation rates between chromosomes [31]–[33]. Since synonymous sites are generally expected to evolve neutrally in vertebrates, rates of synonymous divergence (Ks) can be used as a proxy to infer mutation rate differences (i.e., male-driven evolution), while rates of amino-acid evolution (Ka) allow inferences regarding the efficacy of natural selection, and this has been used extensively to detect faster-X/Z evolution in a variety of species.

Higher Rates of Protein and Synonymous Site Evolution on the Z

The traditional approach to detect faster-X/Z evolution has been to compare rates of evolution of Z/X-linked genes to those of autosomal genes between a pair of species. Using pygmy rattlesnake and garter snake, this approach suggests that the Z is prone to accelerated rates of both synonymous (Ks autosome = 0.17 versus Ks Z = 0.19; see Tables S3 and S7) and non-synonymous (Ka autosome = 0.032 versus Ka Z = 0.037) evolution. However, the results of such pairwise comparisons are highly dependent on the gene content of each chromosome and can yield inconsistent results [34]. In snakes, the same set of genes is present on a differentiated Z (in pygmy rattlesnake and garter snake) and in a pseudoautosomal region (in boa), which is not expected to be subject to faster-Z or male-driven evolution. Under faster Z and male-driven evolution, we therefore expect this set of Z-linked genes to show an accelerated rate of divergence in the lineages with heteromorphic sex chromosomes (pygmy rattlesnake and garter snake) relative to the homomorphic lineage (boa).
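The male-driven evolution argument can be made quantitative. The Python sketch below assumes a ZW system in which a Z chromosome spends two-thirds of its time in males and an autosome spends half (a standard population-genetic assumption that this passage does not spell out), and that synonymous sites are neutral. Under those assumptions the Z-to-autosome ratio of synonymous rates is 2(2α + 1)/(3(α + 1)), where α is the male-to-female mutation rate ratio, and inverting this gives α = (3 KsZ − 2 KsA)/(4 KsA − 3 KsZ). The function names are illustrative.

```python
def z_to_autosome_ratio(alpha):
    """Expected KsZ / KsA for a male-to-female mutation rate ratio alpha.

    Assumes the Z spends 2/3 of its time in males and autosomes 1/2.
    """
    return 2.0 * (2.0 * alpha + 1.0) / (3.0 * (alpha + 1.0))

def alpha_from_ks(ks_z, ks_a):
    """Invert the relationship above to estimate alpha from KsZ and KsA."""
    denominator = 4.0 * ks_a - 3.0 * ks_z
    if denominator <= 0:
        # KsZ / KsA at or above 4/3 is incompatible with any finite alpha.
        raise ValueError("KsZ/KsA must be below 4/3 for a finite alpha")
    return (3.0 * ks_z - 2.0 * ks_a) / denominator

# Rounded synonymous rates quoted in the text (Z = 0.19, autosomes = 0.17).
alpha_rounded = alpha_from_ks(0.19, 0.17)
print(round(alpha_rounded, 2))
```

With the rounded values quoted above the sketch returns an α of about 2.1. Because the denominator is small, the estimate is sensitive to rounding of the Ks values, so this number is only indicative.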
We calculated pairwise rates of synonymous (Ks) and amino acid (Ka) evolution, as well as their ratio (Ka/Ks), at protein-coding genes between Anolis and the different snake species. To test if they were consistent with increased rates of divergence on the Z in pygmy rattlesnake and garter snake compared with boa, we compared the following ratios (K stands for Ka, Ks or Ka/Ks depending on the analysis; see Figure 4) on the Z versus the autosomes: K(garter snake-Anolis)/K(boa-Anolis), K(pygmy rattlesnake-Anolis)/K(boa-Anolis), and K(garter snake-Anolis)/K(pygmy rattlesnake-Anolis). Under faster-Z and male-driven evolution, both K(pygmy rattlesnake-Anolis) and K(garter snake-Anolis) are expected to be increased on the Z, so that the ratio between the two may not differ significantly between the Z and the autosomes. K(pygmy rattlesnake-Anolis)/K(boa-Anolis) and K(garter snake-Anolis)/K(boa-Anolis), on the other hand, are expected to be increased on the Z relative to the autosomes because boa is not under faster-Z/male-driven evolution (but garter snake and pygmy rattlesnake presumably are).

Figure 4. Molecular evolution of snake Z chromosomes and autosomes at synonymous sites (Ks) and non-synonymous sites (Ka), and their ratio (Ka/Ks). For each gene, we calculated the Anolis-boa, Anolis-pygmy rattlesnake, and Anolis-garter snake rates of synonymous and non-synonymous evolution. To detect branch-specific differences, we obtained for each gene the ratios of these evolutionary rates between the different snake species pairs (pygmy rattlesnake/boa, garter snake/boa, and garter snake/pygmy rattlesnake), and plotted them for each macrochromosome. Please note that in the figure, “Pygmy Rattlesnake/Boa” refers to the ratio: pygmy-Anolis divergence/boa-Anolis divergence, and so on. Significant differences between the Z-chromosome and the autosomes are marked with asterisks (***, p<0.001; NS, non-significant).
https://doi.org/10.1371/journal.pbio.1001643.g004

Overall, comparisons involving pygmy rattlesnake or garter snake yield slightly higher Ka and Ks values than the boa-Anolis comparison; however, this difference is significantly larger for the Z-chromosome at the Ka, Ks, and Ka/Ks level, in agreement with faster-Z and male-driven evolution (Figure 4). Pygmy-Anolis and garter-Anolis comparisons, on the other hand, yield similar values for all chromosomes, as expected if both are under faster-Z evolution. The elevated Ka and Ks values on the Z suggest a general increase in mutation rates for Z-linked genes in males (i.e., male-driven evolution). The strength of male-driven evolution can be assessed by comparing the rates of synonymous evolution on the Z versus the autosomes: α, the male-to-female ratio of mutation rates, is equal to (3 × KsZ − 2 × KsA)/(4 × KsA − 3 × KsZ), where KsA and KsZ are the rates of synonymous evolution on the autosomes and the Z, respectively, and assuming that synonymous sites are generally neutral. Using data from pygmy rattlesnake and garter snake, we estimate that α is ∼1.8. Importantly, a significantly increased Ka/Ks ratio also supports faster evolution of the protein sequences of Z-linked genes, above the increase due to mutation rate differences (Figure 4). In summary, we find that snake genes on differentiated Z-chromosomes undergo accelerated rates of evolution relative to their homologs on the pseudoautosomal region, both at synonymous sites (consistent with different mutation rates between the sexes) and at protein sequences (likely due to mutational effects, but also due to differential selective effects on female-haploid chromosomes).

Expression Analysis of Z-Linked Genes in Boa and Rattlesnake

As W-chromosomes lack recombination and degenerate, Z chromosomes may evolve dosage compensation [35].
However, while taxa with male heterogamety generally equalize expression of X-linked genes between the sexes, all species investigated to date in which females are the heterogametic sex appear to lack chromosome-wide dosage compensation [12],[35]. To test for dosage compensation in pygmy rattlesnakes, we assayed gene expression in males and females for Z-linked and autosomal genes. One challenge of studying dosage compensation is that the expectations regarding levels of expression on the Z/X versus the autosomes in females and males are often unclear (see Discussion). Expression in boa provides a clear expectation for snakes, as, in the absence of W degeneration, the Z must have largely maintained the expression levels and biases of the ancestral autosome 6. Figure 5 shows that there is no reduction of expression in either male or female boa for this chromosome (p = 0.54, Wilcoxon test), and that the female to male ratio of expression is about 1 for all chromosomes. This provides clear base-line levels of absolute and sex-biased expression when testing for dosage compensation in snake species with heteromorphic sex chromosomes, as any expression bias observed on the Z must have arisen during or after the degeneration of the W.

Figure 5. Log2 of expression in female, log2 of expression in male, and log2 of female over male expression, for the different macrochromosomes of boa (A) and pygmy rattlesnake (B). FPKM values were obtained for each gene using Cufflinks. Genes were assigned to different chromosomes according to their location in the Anolis genome. (C) log2 of female over male expression along chromosome 6 of Anolis (Z of snakes), using a sliding window size of 30 genes. https://doi.org/10.1371/journal.pbio.1001643.g005

In pygmy rattlesnake, chromosomes 1–5 have a female to male expression ratio of approximately 1, resembling patterns of gene expression in boas.
The Z chromosome (chromosome 6), however, shows a clear reduction in levels of gene expression in females (p-value = 5.32e–11, Wilcoxon test), resulting in highly male-biased expression of this chromosome in pygmy rattlesnakes (the median of log2(F/M) is equal to −0.71 for the Z, corresponding to a 1.6-fold reduction, versus 0.05 for the autosomes, p-value<2.2e–16). While all genes with FPKM >0 were included in this analysis, the patterns are robust to changes in expression cutoffs (Figure S12), and a similar reduction is observed for genes with FPKM >10 in both males and females. This expression analysis thus provides clear evidence of a lack of global, chromosome-wide dosage compensation in ZW snakes with heteromorphic sex chromosomes. The female Z does not show a full 2-fold reduction of expression, which is likely to be at least partly associated with a general buffering mechanism for gene expression (i.e., a single gene dose does not result in halving of gene transcript), as observed in aneuploid Drosophila and human autosomal genes and on the uncompensated regions of the single female chicken Z [12]. In addition, snakes may show local dosage compensation, as found in birds [36], with some Z-linked genes being dosage-compensated. One possibility is that dosage compensation is restricted to segments of the chromosome, as has been suggested for the chicken Z chromosome [37]. A sliding-window analysis of sex-biased expression in pygmy rattlesnakes, with genes mapped according to their position on the Anolis chromosome 6, reveals some localized variation in female to male expression ratio along the chromosome, but no segment that is fully compensated (Figure 5). Whether this reflects the biology of the chromosome, or whether compensated chromosomal fragments in pygmy rattlesnake are simply masked by a large number of rearrangements since the snake-lizard split, will require the analysis of a fully assembled genomic sequence. 
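The relationship between the log2 female-to-male expression ratio and the fold reduction quoted above (a median of −0.71 corresponding to roughly 1.6-fold) can be sketched as follows. The gene identifiers and FPKM values are hypothetical and serve only to show the calculation; only the median of −0.71 is taken from the text.

```python
import math
from statistics import median

def log2_female_to_male(fpkm_female, fpkm_male):
    """Per-gene log2(F/M), keeping genes expressed (FPKM > 0) in both sexes."""
    return {
        gene: math.log2(fpkm_female[gene] / fpkm_male[gene])
        for gene in fpkm_female
        if gene in fpkm_male and fpkm_female[gene] > 0 and fpkm_male[gene] > 0
    }

def fold_reduction(log2_ratio):
    """Convert a log2(F/M) value into the fold reduction of female expression."""
    return 2.0 ** (-log2_ratio)

# Hypothetical FPKM values for three Z-linked genes.
female = {"gene_a": 10.0, "gene_b": 5.0, "gene_c": 6.0}
male = {"gene_a": 10.0, "gene_b": 10.0, "gene_c": 12.0}

ratios = log2_female_to_male(female, male)
median_ratio = median(ratios.values())

# Fold reduction implied by the median log2(F/M) reported in the text.
reported_fold = fold_reduction(-0.71)
print(median_ratio, round(reported_fold, 1))
```

A fully uncompensated, unbuffered Z would give a log2 ratio of −1 (a 2-fold reduction), while a fully compensated gene would sit at 0, which is the value taken by `gene_a` in this toy input.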
While this is not yet possible for pygmy rattlesnake, a genomic assembly for boa is available (http://assemblathon.org/pages/download-data). Our analysis of synteny of genes on the largest boa scaffolds shows that there is a limited number of rearrangements between snakes and lizards, suggesting that rearrangements are unlikely to account for the patterns of female to male expression in pygmy rattlesnake (Figure S13). A more likely possibility is therefore that only some dosage-sensitive genes acquire dosage compensation, and that they do so on a gene-by-gene basis, similar to what is observed in birds and silkworm [12]. Consistent with this, the expression ratio histogram for Z-linked genes in pygmy rattlesnakes shows a shoulder of genes with a log2(F/M) expression ratio of 0 (Figure 5B), in agreement with about 18% of all genes present on the Z being locally compensated. These dosage compensated genes show no clear clustering along the Z (Figure S14), as expected if specific genes become compensated individually on the Z.

Genome and Transcriptome Assemblies in Snakes

We sequenced the genome of a single male and single female of the common garter snake (T. elegans, family Colubridae) and pygmy rattlesnake (S. miliarius, family Viperidae), as well as a single female boa (B. constrictor, family Boidae; publicly available male boa reads and scaffolds were obtained from the Assemblathon website, see Methods). Pygmy rattlesnake and garter snake scaffolds were assembled using SOAPdenovo (Table S1). We performed RNA-seq on a single male and female individual of both boa and pygmy rattlesnake to study patterns of gene expression (see Methods). A SOAPdenovo assembly yielded 297,551 transcripts for boa and 142,100 for pygmy rattlesnake, of which 43,977 and 37,667, respectively, mapped to protein-coding genes of the lizard Anolis carolinensis, the closest reptile relative with a completely sequenced and annotated genome.
After removal of redundant transcripts and concatenation of transcripts corresponding to different parts of the same Anolis gene, our final sample consisted of 10,793 boa genes and 11,939 pygmy rattlesnake genes.

Identification of Z-Linked Sequences in Snakes

Karyologically, snakes are highly conserved with a preponderance of species with 2n = 36 (16 macro- and 20 microchromosomes). All snakes show genetic sex determination, with females being ZW. The sex chromosomes of snakes were shown to be homologous in different families [9],[10], and correspond to chromosome 6 of the butterfly lizard Leiolepis reevesii rubritaeniata [21]. Karyotypes and synteny have been well conserved during 280 million years of reptile evolution; for example, 19 out of 22 anchored chicken chromosomes are syntenic to a single Anolis chromosome over their entire length [22]. Thus, we used chromosomal information from Anolis as a proxy to anchor genes along chromosomes in snakes. We used genomic coverage of de novo assembled scaffolds in males versus females to identify sex chromosomes of snakes. Specifically, Z-linked regions with a degenerate W homolog should only have half the genomic coverage in females relative to males, while autosomal regions and undifferentiated sex-linked regions (pseudoautosomal regions [PARs]) should have equal genomic coverage in both sexes [14]. Genomic scaffolds from all three species were assigned to the chromosomes of Anolis, and male and female genomic reads were mapped to the scaffolds to estimate their male and female coverage. This coverage analysis reveals that snake scaffolds homologous to chromosomes 1–5 of Anolis have similar coverage in males and females in all three species (Figure 1). Chromosome 6 of Anolis corresponds to the Z chromosome of pygmy rattlesnake and garter snake, as it shows an almost 2-fold reduction in female to male coverage relative to the other chromosomes in these species (Figure 1).
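The coverage logic described above can be sketched in a few lines of Python. The scaffold names and read depths below are hypothetical. The normalization (dividing each scaffold's depth by the median depth of scaffolds assigned to chromosomes 1–5 within the same sex, then taking log2) follows the description in the Figure 1 legend, under which autosomal scaffolds centre on 0 and hemizygous Z-linked scaffolds in females are expected near −1.

```python
import math
from statistics import median

# Anolis chromosomes treated as autosomal controls in the text.
AUTOSOMES = {"1", "2", "3", "4", "5"}

def normalized_log2_coverage(depth_by_scaffold, chromosome_by_scaffold):
    """Normalize one sex's scaffold depths by the autosomal median, in log2."""
    autosomal_depths = [
        depth
        for scaffold, depth in depth_by_scaffold.items()
        if chromosome_by_scaffold[scaffold] in AUTOSOMES
    ]
    reference = median(autosomal_depths)
    return {
        scaffold: math.log2(depth / reference)
        for scaffold, depth in depth_by_scaffold.items()
        if depth > 0  # log2 is undefined for scaffolds with no coverage
    }

# Hypothetical scaffolds: three autosomal, one on the Z homolog (chromosome 6).
chromosome = {"scaf_a": "1", "scaf_b": "3", "scaf_c": "5", "scaf_z": "6"}
female_depth = {"scaf_a": 30.0, "scaf_b": 32.0, "scaf_c": 28.0, "scaf_z": 15.0}
male_depth = {"scaf_a": 40.0, "scaf_b": 42.0, "scaf_c": 38.0, "scaf_z": 40.0}

female_log2 = normalized_log2_coverage(female_depth, chromosome)
male_log2 = normalized_log2_coverage(male_depth, chromosome)

# A differentiated Z gives about -1 in females and about 0 in males.
print(female_log2["scaf_z"], male_log2["scaf_z"])
```

Normalizing within each sex removes differences in overall sequencing depth between the male and female libraries, so the two sexes can be compared on the same scale.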
No such reduction is observed in boa, confirming that the Z and W are homomorphic even at the sequence level in this species. Thus, the coverage analysis shows that, at the nucleotide level, both pygmy rattlesnake and garter snake have fully differentiated sex chromosomes (Figure 1); we do not identify any segments along the Z chromosome with similar coverage in the two sexes, as would be expected for a PAR. Note that we cannot rule out the presence of very small PARs, or that the PAR is derived from a region not homologous to Anolis chromosome 6. On the other hand, the sex chromosomes of boa appear entirely homomorphic, and we detect no regions of differentiation along the Z, indicating that the boa sex chromosome is indeed recombining over almost all of its length (see Discussion). Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 1. Normalized read coverage depth for female (red) and male (blue) scaffolds ordered along the Anolis genome for (A) boa, (B) garter snake, and (C) pygmy rattlesnake. The points show the normalized log2 coverage for each scaffold and the lines represent a smoothing spline drawn along the chromosome. Coverage was normalized by dividing the coverage for each scaffold-by-sex combination by the median coverage of all scaffolds in chromosomes 1–5 in that sex, resulting in a median log2 coverage score for autosomes of 0. Under this normalization, hemizygous sequences are expected to have a median log2 coverage of −1. Scaffolds were mapped to the Anolis macrochromosomes on the basis of the location of their gene content. The phylogenetic relationship between the species investigated is shown. Boids split from the other two groups about 100 MY ago, while colubrids and viperids diverged about 50 MY ago [8]. The snake photographs are used under a Creative Commons Attribution 2.5 Generic license (CC BY 2.5). 
Credit for the photographs are as follows: Nick Turland for the western pygmy rattlesnake (http://www.flickr.com/photos/nturland/1436776818/); Guilherme Jófili for the Boa constrictor (http://www.flickr.com/photos/gjofili/5005623645/); Steve Jurvetson for the coastal garter snake (http://www.flickr.com/photos/jurvetson/825514494/). Additional permission to publish the western pygmy rattlesnake image was granted by Nick Turland. https://doi.org/10.1371/journal.pbio.1001643.g001 Verification of Z-linkage and Conservation of Synteny Two independent lines of evidence confirm that we correctly assign sex-linkage of our genomic scaffolds. First, SNP patterns in our RNA-seq sample corroborate that homologues of Anolis chromosome 6 genes are Z-linked in snakes (the higher coverage of the RNA-seq data for many genes makes SNP inferences more reliable than using our genomic data). If the W is fully degenerated, as in pygmy rattlesnake, SNP patterns are expected to differ between males and females on the Z but should be similar on autosomes. Female individuals carry only one copy of the Z, and should therefore harbor no polymorphism at Z-linked genes, whereas ZZ male individuals can exhibit variation at Z-linked loci. Consistent with this, over 25% of all pygmy rattlesnake genes assigned to chromosomes 1 to 5 have at least one SNP in both the male and female sample. For Z-linked genes (chromosome 6), this proportion is 29% for the male, but only 2% for the female (7 genes out of 247; see Figure S1), confirming that overall we are classifying sex-linked genes correctly. The seven genes with female SNPs are not adjacent to each other on the Z, and thus they are unlikely to be derived from an undetected PAR. Instead, remaining heterozygosity could be due to several factors, including sequencing or mapping errors, mapping of W-derived reads (see next section), undetected paralogs, or movement of these genes to an autosome. 
In boa, the W and Z are homomorphic; thus, most Z-linked genes have a homologous copy on the W and are therefore expected to show normal levels of polymorphism in the female. This is indeed what we observe, with female SNP counts of genes on the boa Z well within the range of the autosomes (Figure S1). Second, we mapped 11 known Z-linked cDNA clones of the rat snake Elaphe quadrivirgata [9] to the Anolis genome, and nine of them mapped to chromosome 6 of Anolis (the remaining two mapped to unmapped scaffolds; Figure S2). On the other hand, only one of the remaining non-sex-linked clones of rat snake mapped to chromosome 6 while 92 mapped to other chromosomes or to unmapped regions (Table S2). This strongly suggests that the Z chromosome of snakes is fully homologous to the Anolis chromosome 6. The 11 Z-linked markers were found to be collinear in three snake species investigated by FISH mapping (with representatives from each of Boidae, Viperidae, and Colubridae [9]), and we found the same order of these markers along chromosome 6 of Anolis (Figure S2). Thus, the overall structure of chromosome 6 of Anolis and the Z of snakes appears conserved, suggesting that there have been no large-scale chromosomal rearrangements between snakes and lizards. In addition, we also compared synteny of the genome assemblies of Anolis and boa, the snake species with the best-assembled genome, to detect chromosomal rearrangements at a smaller scale. Indeed, we find that micro-synteny is also remarkably conserved between these two species; almost all of the boa scaffolds are co-linear with Anolis (Figure S3), and we find evidence for only two small inversions on the Z chromosome of boa. Thus, all genes in our snake data mapping to chromosome 6 of Anolis were assigned as Z-linked, while genes mapping to chromosomes 1 to 5 were used as autosomal control genes. 
In addition to conserved chromosomal homology, we also expect that the relative locations of genes along the Z chromosome are, to a large extent, conserved between species (Figure S2).

Identification of W-Linked Sequences in Snakes

W-linked sequences are present only in females, and we searched for genomic scaffolds with female-specific read coverage to identify putative W-derived scaffolds (see Methods). In order to gain a chromosome-wide perspective on conservation of putatively W-linked sequences, we examined female-specific genomic scaffolds, regardless of whether they contained coding regions (putative W-linked coding sequences are discussed in the section Recombination Suppression Predates Viperidae–Colubridae Divergence below). We selected female-specific scaffolds that mapped to scaffolds homologous to Anolis chromosomes 1–6 with more than 150 aligned nucleotides as a candidate set enriched for W-derived sequences. If such putative W-linked sequences are derived from genomic regions of the ancestral autosome that formed the sex chromosomes in snakes, we expect them to be enriched for sequences homologous to inferred Z-linked scaffolds (i.e., scaffolds homologous to Anolis chromosome 6). Indeed, we find this to be the case: while the Z chromosome homolog in Anolis accounts for only 7.6% of the macrochromosomal genome, we find that 49% of the candidate W-scaffolds with homologs in our assembly map to Z-linked sequences in pygmy rattlesnake and 35% in garter snake (p < 1×10^−15; Figures 2 (histograms) and S4). Thus, W-linked candidate sequences in snake species with heteromorphic sex chromosomes are significantly enriched for homologs residing on the Z chromosome (4.5-fold for garter snake, 6.4-fold for pygmy rattlesnake). No such excess is seen for the homomorphic sex chromosomes of boa, and female-biased scaffolds map randomly across the genome, as expected for background noise mapping.
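As a rough illustration of the enrichment reasoning above, the following Python sketch computes a fold enrichment and an exact one-sided binomial tail probability. The scaffold counts (98 of 200) are hypothetical, chosen only to match the reported 49% for pygmy rattlesnake. The text does not state which test produced the reported p-value, so the binomial tail used here is an assumption, and the function names are illustrative.

```python
import math


def fold_enrichment(observed_fraction, expected_fraction):
    """Observed share of W-candidates on the Z divided by the share expected by size."""
    return observed_fraction / expected_fraction


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1 - p)
        )
        total += math.exp(log_pmf)
    return total


# Share of the macrochromosomal genome homologous to the Z (from the text).
expected_share = 0.076
# Hypothetical counts, not reported in the text: 98 of 200 candidates on the Z.
n_candidates = 200
n_on_z = 98

print("fold enrichment:", fold_enrichment(n_on_z / n_candidates, expected_share))
print("P(X >= observed):", binomial_upper_tail(n_on_z, n_candidates, expected_share))
```

With real data, the counts would be the number of W-candidate scaffolds with a mapped homolog and the number of those mapping to chromosome 6 homologs.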
Only 6.5% of W-candidate scaffolds of boa map to Anolis chromosome 6, which is not statistically different from that chromosome's 7.6% share of the euchromatin in the assembly (binomial test, p = 0.63).

Figure 2. Mapping of the best female-specific W-candidate scaffolds of (A) boa, (B) garter snake, and (C) pygmy rattlesnake to the Anolis macrochromosomes. The histograms (left) show the number of candidate W scaffolds mapped to the six major chromosomes of Anolis, with the green bar highlighting the Anolis homolog to the snake Z chromosome. The W candidates homologous to Z-linked scaffolds (the Z homolog accounts for 7.6% of the genome) make up 49% and 35% of all female-biased scaffolds in pygmy rattlesnake and garter snake, respectively. This is a 6.4-fold excess in pygmy rattlesnake, and a 4.5-fold excess in garter snake, over random mapping based on chromosome size. In contrast, in boa, only 6.5% of scaffolds map to chromosome 6, which does not differ from random mapping on the basis of chromosome size. The right panel shows color-coded mapping density of W-candidates along the Anolis macrochromosomes. The density of W-candidates is not uniform across the Z chromosome in either pygmy rattlesnake or garter snake (p < 0.0001 for mean nearest neighbor distances). The data in this figure are from all female-biased candidate W-linked scaffolds and thus include both non-coding scaffolds and scaffolds containing protein-coding genes. https://doi.org/10.1371/journal.pbio.1001643.g002

Evolutionary Strata on Snake Sex Chromosomes

Our mapping of these W-candidate genomic scaffolds from each species to their genomic scaffolds anchored along the Anolis genome also allowed us to test whether W-linked candidate scaffolds are randomly distributed along the Z chromosome, or whether they cluster in certain genomic regions.
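One way to frame the test for non-random placement of W-candidates along the Z is a Monte Carlo comparison of mean nearest-neighbor distances. The Python sketch below is a minimal version under stated assumptions: the null model places the same number of candidates uniformly along the chromosome (the null actually used is not specified in the text), and the positions, chromosome length, and function names are hypothetical.

```python
import random


def mean_nearest_neighbor_distance(positions):
    """Mean distance from each position to its closest neighbor along a chromosome."""
    pts = sorted(positions)
    if len(pts) < 2:
        raise ValueError("need at least two positions")
    total = 0.0
    for i, p in enumerate(pts):
        left = p - pts[i - 1] if i > 0 else float("inf")
        right = pts[i + 1] - p if i < len(pts) - 1 else float("inf")
        total += min(left, right)
    return total / len(pts)


def clustering_p_value(positions, chrom_length, n_sim=2000, seed=0):
    """One-sided Monte Carlo p-value for clustering.

    Counts how often uniformly placed candidates (assumed null) are at least
    as tightly packed as the observed ones.
    """
    rng = random.Random(seed)
    observed = mean_nearest_neighbor_distance(positions)
    n = len(positions)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.uniform(0, chrom_length) for _ in range(n)]
        if mean_nearest_neighbor_distance(simulated) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical example: 40 candidates packed into one end of an 80 Mb chromosome.
clustered = [1_000_000 + 10_000 * i for i in range(40)]
print(clustering_p_value(clustered, 80_000_000))
```

A small p-value indicates that candidates sit closer together than expected under uniform placement, which is the pattern expected if strata differ in their density of W homologs.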
Clustering of W-candidates would suggest that different regions of the Z chromosome have different evolutionary histories, that is, snake Z chromosomes might possess evolutionary strata, as observed in other taxa [23]–[25]. In particular, regions of the Z chromosome with a higher density of W scaffolds and higher similarity between Z and W sequences are indicative of segments along the sex chromosomes that abolished recombination more recently than ones that lack W homologs [23]. Figure 2 shows our mapping of candidate W-linked scaffolds against autosomal and Z-linked scaffolds in each species. We find that the density of W-scaffolds is not uniform across the Z for either pygmy rattlesnake or garter snake (p < 0.0001; Figure S4). In both pygmy rattlesnake and garter snake, we identify at least two (and possibly three) evolutionary strata across the Z chromosomes that differ in their density of W paralogs, and their degree of nucleotide conservation between the Z and W (Figures 2 and 3). Enrichment of W scaffolds on the Z and identification of strata is robust to the cutoff used to select W-candidates (Figures S5 and S6).

Figure 3. Evolutionary strata and sequence conservation between the pygmy rattlesnake and garter snake W-candidate scaffolds mapped along the Z chromosome. The middle three tracks show the position of candidate W sequences along the Z chromosome in garter snake (top) and pygmy rattlesnake (bottom), and their overlap (center). The top and bottom plots show nucleotide identity between Z-W gametologs (grey dots), and the median (red line) inferred for each of the putative strata for garter snake and pygmy rattlesnake (blue boxes; the y range of the boxes represents the interquartile range of the identity values in each region). The gray shaded region represents identity below 30%, indicating low quality mapping.
As in Figure 2, the data in this figure contain both non-coding and coding scaffolds. https://doi.org/10.1371/journal.pbio.1001643.g003

Limited Z-W homology does not allow us to precisely determine the boundaries of strata or their age, but the overall architecture of strata looks similar in pygmy rattlesnake and garter snake. In particular, in both species the oldest stratum encompasses the middle of the chromosome (from roughly 2×10^7 to 5×10^7 bp), with a lower density of homologous sequences between the Z and W, most of which match at levels that are no different from chance (Figure 3). The distal regions on the Z show a much higher density of W-linked candidate regions compared to the center segment (Figure 3; p << 0.0001 for both garter snake and pygmy rattlesnake), indicating that the middle of the W chromosome is largely degenerated whereas the distal regions maintain substantial homology. A lower mapping density of the first 20 Mb might be indicative of a slightly older stratum in both pygmy rattlesnake and garter snake than the last 30 Mb (see Figures 2 and 3), but divergence within those regions is comparable (46% versus 42% identity for garter snake and 53% versus 51% for pygmy rattlesnake, neither comparison significantly different). Whether the two distal regions represent the same stratum or different strata remains unresolved, and will likely require high quality sequencing of both sexes of multiple species among Colubrids and Viperids. Thus, our data clearly reveal the presence of at least two, and possibly three, evolutionary strata on snake sex chromosomes (Figures 3 and S6), and their similar architecture suggests that their formation may have pre-dated the split between Viperidae and Colubridae roughly 50 MY ago.

Recombination Suppression Predates Viperidae–Colubridae Divergence

Three lines of evidence support the conclusion that the strata identified in the genomic scaffolds above were indeed formed prior to the divergence between the two species groups.
First, mapping Z-linked genes to putative W-linked sequence allowed us to retrieve W-gametologs for 55 of these genes in pygmy rattlesnake and 29 in garter snake (see Methods). The average synonymous sequence divergence between the Z-W gametologs exceeds the median synonymous divergence between garter snake and pygmy rattlesnake (0.28 versus 0.20, p = 0.076 with a two-tailed Wilcoxon test for pygmy rattlesnake; 0.32 versus 0.20, p = 0.004 for garter snake; Figure S7; Tables S3, S4, and S5), suggesting that recombination between the sex chromosomes ceased before the species split. In addition, if the W chromosomes had diverged completely independently in the two lineages, limited overlap between W-linked sequences (coding and non-coding) between garter snake and pygmy rattlesnake would be expected, but we find an excess of regions along the W to be shared between species (Figures 3 and S8; p < 0.0001 for all comparisons, via Monte Carlo resampling of observed numbers of scaffolds based on locations of all scaffolds). Similarly, we find more protein-coding genes shared by the W-chromosomes of both species than expected by chance (15 observed versus 1.9 expected, p < 0.001 with a goodness of fit Chi-square test). Although the ratio of observed to expected shared genes is highest in the oldest stratum (Figure S9, due to the presence of a single common gene in this part of the W-chromosomes of both species), it is above 1 for each interval (from 4.6 to 14). This suggests that many W-linked genes had already degenerated in the ancestor of pygmy rattlesnake and garter snake, and that the W had started to reach its equilibrium gene content at the time the two species split, especially in the oldest stratum where recombination was first abolished. Finally, we compared phylogenetic trees for ten regions along the sex chromosomes for which we had homologous sequences from both the pygmy rattlesnake and garter snake Z and W chromosomes.
If recombination suppression between the Z and W chromosomes preceded species divergence, we would expect the sequences to cluster by chromosome (i.e., W pygmy rattlesnake with W garter snake, and Z with Z). If pygmy rattlesnake and garter snake abolished recombination at some (or all) of the strata independently, the sequences should cluster by species. We find that for each gene tree constructed, the sequences cluster by chromosome and not species, suggesting that all strata on the sex chromosomes of Viperidae and Colubridae were formed prior to their species divergence (Figure S10).

Transcribed W-Genes

In addition to identifying 55 W-candidate genes from the pygmy rattlesnake genome assembly, we also assembled the female pygmy rattlesnake RNA-seq reads de novo, and searched this assembly for W sequences, based on female-limited expression and female-biased sequence coverage. Briefly, we mapped the pygmy rattlesnake male and female DNA-seq and RNA-seq reads to the transcriptome using Bowtie2 to estimate, for each transcript, the male and female coverage and expression level. While comparisons between female and male liver are unlikely to yield many genes with female-specific functions (as expected on the W chromosome), we find 12 pygmy rattlesnake transcripts with a male to female coverage ratio below 0.1 and female-specific expression (nine after excluding transcripts derived from parasites), and W-linkage of six of these candidates was confirmed by female-specific PCR products (Figure S11). The list of confirmed and putative W-linked transcripts is provided in Table S6. Of the six confirmed W-derived transcripts, three consist of repetitive sequences or sequences derived from transposable elements, in agreement with the idea that degeneration of W/Y-chromosomes is often driven by the accumulation of active transposable elements.
Although they were expressed at high enough levels to be detected in the female transcriptome, these repetitive/TE sequences had overall lower expression levels (mean fragments per kilobase of transcript per million mapped reads [FPKM]: 6.8) than the three other W-linked transcripts (mean FPKM: 13.4), providing some support for a functional role of the latter. Two of these non-repetitive transcripts have significant similarity to known vertebrate genes (ube2m and a 28S ribosomal RNA gene, with tblastx E-values of 3.00e–11 and 1.00e–71), while the third transcript does not map to any known gene or contain any open reading frames (but has a relatively high level of expression at 14.2). Combining the DNA and RNA-based approach, we identify 61 potential genes on the W of pygmy rattlesnake (six from the transcriptome, and 55 from female-specific genomic scaffolds that map to chromosome 6/Z genes) and 29 on the W of garter snake (15 of which overlap between pygmy rattlesnake and garter snake). Thus, the W chromosome in both species contains considerably fewer genes than its former homolog, the Z chromosome (we detect 61 W-linked genes versus 712 Z-linked genes in pygmy rattlesnake, and 29 W-linked genes versus 723 Z-linked genes in garter snake).

Molecular Evolution of Snake Sex Chromosomes

Sex chromosomes are subject to unique evolutionary forces compared to autosomes [26]. Sex-biased transmission and hemizygosity of the Z in females can impact the efficacy of natural selection on the Z relative to autosomes (faster Z evolution). On one hand, selection can be more efficient at incorporating adaptive recessive mutations on the hemizygous Z-chromosome, resulting in increased rates of adaptive evolution [27],[28]. Alternatively, the Z can also experience more genetic drift due to a smaller effective population size relative to autosomes, resulting in an accumulation of slightly deleterious mutations [29].
In addition, Z-chromosomes are more often transmitted through males than are autosomes, and might be subject to higher mutation rates (male-driven evolution). That is, the many more cell divisions in spermatogenesis than in oogenesis can result in more mutations during DNA replication and thereby skew mutation rates between chromosomes [31]–[33]. Since synonymous sites are generally expected to evolve neutrally in vertebrates, rates of synonymous divergence (Ks) can be used as a proxy to infer mutation rate differences (i.e., male-driven evolution), while rates of amino-acid evolution (Ka) allow inferences regarding the efficacy of natural selection, and this has been used extensively to detect faster-X/Z evolution in a variety of species.

Higher Rates of Protein and Synonymous Site Evolution on the Z

The traditional approach to detect faster-X/Z evolution has been to compare rates of evolution of Z/X-linked genes to those of autosomal genes between a pair of species. Using pygmy rattlesnake and garter snake, this approach suggests that the Z is prone to accelerated rates of both synonymous (Ks autosome = 0.17 versus Ks Z = 0.19; see Tables S3 and S7) and non-synonymous (Ka autosome = 0.032 versus Ka Z = 0.037) evolution. However, the results of such pairwise comparisons are highly dependent on the gene content of each chromosome and can yield inconsistent results [34]. In snakes, the same set of genes is present on a differentiated Z (in pygmy rattlesnake and garter snake) and in a pseudoautosomal region (in boa), which is not expected to be subject to faster-Z or male-driven evolution. Under faster Z and male-driven evolution, we therefore expect this set of Z-linked genes to show an accelerated rate of divergence in the lineages with heteromorphic sex chromosomes (pygmy rattlesnake and garter snake) relative to the homomorphic lineage (boa).
We calculated pairwise rates of synonymous (Ks) and amino acid (Ka) evolution, as well as their ratio (Ka/Ks), at protein-coding genes between Anolis and the different snake species. To test if they were consistent with increased rates of divergence on the Z in pygmy rattlesnake and garter snake compared with boa, we compared the following ratios (K stands for Ka, Ks or Ka/Ks depending on the analysis; see Figure 4) on the Z versus the autosomes: K(garter snake-Anolis)/K(boa-Anolis), K(pygmy rattlesnake-Anolis)/K(boa-Anolis), and K(garter snake-Anolis)/K(pygmy rattlesnake-Anolis). Under faster-Z and male-driven evolution, both K(pygmy rattlesnake-Anolis) and K(garter snake-Anolis) are expected to be increased on the Z, so that the ratio between the two may not differ significantly between the Z and the autosomes. K(pygmy rattlesnake-Anolis)/K(boa-Anolis) and K(garter snake-Anolis)/K(boa-Anolis), on the other hand, are expected to be increased on the Z relative to the autosomes because boa is not under faster-Z/male-driven evolution (but garter snake and pygmy rattlesnake presumably are).

Figure 4. Molecular evolution of snake Z chromosomes and autosomes at synonymous sites (Ks) and non-synonymous sites (Ka), and their ratio (Ka/Ks). For each gene, we calculated the Anolis-boa, Anolis-pygmy rattlesnake, and Anolis-garter snake rates of synonymous and non-synonymous evolution. To detect branch-specific differences, we obtained for each gene the ratios of these evolutionary rates between the different snake species pairs (pygmy rattlesnake/boa, garter snake/boa, and garter snake/pygmy rattlesnake), and plotted them for each macrochromosome. Please note that in the figure, “Pygmy Rattlesnake/Boa” refers to the ratio: pygmy-Anolis divergence/boa-Anolis divergence, and so on. Significant differences between the Z-chromosome and the autosomes are marked with asterisks (***, p < 0.001; NS, non-significant).
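The per-gene ratios described in the Figure 4 legend can be sketched in a few lines of Python. This is only an illustration: the gene names and divergence values are invented, the function names are hypothetical, and the summary by median is an assumption, since the legend does not specify how the ratios were summarized or tested.

```python
from statistics import median


def branch_ratios(k_snake, k_boa):
    """Per-gene ratio K(snake-Anolis) / K(boa-Anolis).

    Genes missing from the boa comparison, or with a boa value of zero,
    are skipped because the ratio is undefined.
    """
    ratios = {}
    for gene, k in k_snake.items():
        kb = k_boa.get(gene)
        if kb:
            ratios[gene] = k / kb
    return ratios


def median_by_class(ratios, chromosome_of, z_chromosome="6"):
    """Median ratio for Z-linked genes (Anolis chromosome 6) and for autosomal genes."""
    z = [r for g, r in ratios.items() if chromosome_of[g] == z_chromosome]
    autosomal = [r for g, r in ratios.items() if chromosome_of[g] != z_chromosome]
    return median(z), median(autosomal)


# Invented Ks values for four genes (not from the paper).
ks_pygmy = {"geneA": 0.60, "geneB": 0.50, "geneC": 0.44, "geneD": 0.40}
ks_boa = {"geneA": 0.50, "geneB": 0.40, "geneC": 0.44, "geneD": 0.40}
chromosome_of = {"geneA": "6", "geneB": "6", "geneC": "1", "geneD": "3"}

z_median, a_median = median_by_class(branch_ratios(ks_pygmy, ks_boa), chromosome_of)
print("Z median ratio:", z_median, "autosomal median ratio:", a_median)
```

Under faster-Z or male-driven evolution, the Z median for a snake/boa ratio is expected to exceed the autosomal median, whereas the garter snake/pygmy rattlesnake ratio should be similar on both.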
https://doi.org/10.1371/journal.pbio.1001643.g004

Overall, comparisons involving pygmy rattlesnake or garter snake yield slightly higher Ka and Ks values than the boa-Anolis comparison; however, this difference is significantly larger for the Z-chromosome at the Ka, Ks, and Ka/Ks level, in agreement with faster-Z and male-driven evolution (Figure 4). Pygmy-Anolis and garter-Anolis comparisons, on the other hand, yield similar values for all chromosomes, as expected if both are under faster-Z evolution. The elevated Ka and Ks values on the Z suggest a general increase in mutation rates for Z-linked genes in males (i.e., male-driven evolution). The strength of male-driven evolution can be assessed by comparing the rates of synonymous evolution on the Z versus the autosomes. Because the Z spends two-thirds of its time in males whereas autosomes spend half, KsZ/KsA = (2/3)(2α + 1)/(α + 1), so that α, the male to female ratio of mutation, is equal to (3·KsZ − 2·KsA)/(4·KsA − 3·KsZ) (where KsA and KsZ are the rates of synonymous evolution on the autosomes and the Z, respectively, and assuming that synonymous sites are generally neutral). Using data from pygmy rattlesnake and garter snake, we estimate that α is ∼1.8. Importantly, a significantly increased Ka/Ks ratio also supports faster evolution of the protein sequences of Z-linked genes, above the increase due to mutation rate differences (Figure 4). In summary, we find that snake genes on differentiated Z-chromosomes undergo accelerated rates of evolution relative to their homologs on the pseudoautosomal region, both at synonymous sites (consistent with different mutation rates between the sexes) and at protein sequences (likely due to mutational effects, but also due to differential selective effects on female-haploid chromosomes).

Expression Analysis of Z-Linked Genes in Boa and Rattlesnake

As W-chromosomes lack recombination and degenerate, Z chromosomes may evolve dosage compensation [35].
However, while taxa with male heterogamety generally equalize expression of X-linked genes between the sexes, all species investigated to date in which females are the heterogametic sex appear to lack chromosome-wide dosage compensation [12],[35]. To test for dosage compensation in pygmy rattlesnakes, we assayed gene expression in males and females for Z-linked and autosomal genes. One challenge of studying dosage compensation is that the expectations regarding levels of expression on the Z/X versus the autosomes in females and males are often unclear (see Discussion). Expression in boa provides a clear expectation for snakes, as, in the absence of W degeneration, the Z must have largely maintained the expression levels and biases of the ancestral autosome 6. Figure 5 shows that there is no reduction of expression in either male or female boa for this chromosome (p = 0.54, Wilcoxon test), and that the female to male ratio of expression is about 1 for all chromosomes. This provides clear base-line levels of absolute and sex-biased expression when testing for dosage compensation in snake species with heteromorphic sex chromosomes, as any expression bias observed on the Z must have arisen during or after the degeneration of the W.

Figure 5. Log2 of expression in female, log2 of expression in male, and log2 of female over male expression, for the different macrochromosomes of boa (A) and pygmy rattlesnake (B). FPKM values were obtained for each gene using Cufflinks. Genes were assigned to different chromosomes according to their location in the Anolis genome. (C) log2 of female over male expression along chromosome 6 of Anolis (Z of snakes), using a sliding window size of 30 genes. https://doi.org/10.1371/journal.pbio.1001643.g005

In pygmy rattlesnake, chromosomes 1–5 have a female to male expression ratio of approximately 1, resembling patterns of gene expression in boas.
The Z chromosome (chromosome 6), however, shows a clear reduction in levels of gene expression in females (p-value = 5.32e–11, Wilcoxon test), resulting in highly male-biased expression of this chromosome in pygmy rattlesnakes (the median of log2(F/M) is equal to −0.71 for the Z, corresponding to a 1.6-fold reduction, versus 0.05 for the autosomes, p-value<2.2e–16). While all genes with FPKM >0 were included in this analysis, the patterns are robust to changes in expression cutoffs (Figure S12), and a similar reduction is observed for genes with FPKM >10 in both males and females. This expression analysis thus provides clear evidence of a lack of global, chromosome-wide dosage compensation in ZW snakes with heteromorphic sex chromosomes. The female Z does not show a full 2-fold reduction of expression, which is likely to be at least partly associated with a general buffering mechanism for gene expression (i.e., a single gene dose does not result in halving of gene transcript), as observed in aneuploid Drosophila and human autosomal genes and on the uncompensated regions of the single female chicken Z [12]. In addition, snakes may show local dosage compensation, as found in birds [36], with some Z-linked genes being dosage-compensated. One possibility is that dosage compensation is restricted to segments of the chromosome, as has been suggested for the chicken Z chromosome [37]. A sliding-window analysis of sex-biased expression in pygmy rattlesnakes, with genes mapped according to their position on the Anolis chromosome 6, reveals some localized variation in female to male expression ratio along the chromosome, but no segment that is fully compensated (Figure 5). Whether this reflects the biology of the chromosome, or whether compensated chromosomal fragments in pygmy rattlesnake are simply masked by a large number of rearrangements since the snake-lizard split, will require the analysis of a fully assembled genomic sequence. 
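The quantities used in this analysis, the log2 female-to-male expression ratio and its sliding-window summary along the chromosome, can be illustrated with a short Python sketch. The function names are hypothetical, the FPKM values are invented, and summarizing each window by its median is an assumption (the text specifies only a window of 30 genes).

```python
import math
from statistics import median


def log2_ratio(female_fpkm, male_fpkm):
    """log2 of female over male expression for one gene (both values must be > 0)."""
    return math.log2(female_fpkm / male_fpkm)


def fold_reduction(log2_fm):
    """Convert a log2(F/M) value into the fold reduction of female expression."""
    return 2 ** (-log2_fm)


def sliding_median(values, window=30):
    """Median of each run of `window` consecutive genes, in chromosomal order."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of genes")
    return [median(values[i:i + window]) for i in range(len(values) - window + 1)]


# Median log2(F/M) reported for the pygmy rattlesnake Z.
print("fold reduction:", round(fold_reduction(-0.71), 2))

# Invented FPKM pairs (female, male) for genes ordered along the chromosome.
fpkm_pairs = [(5.0, 10.0), (8.0, 8.0), (3.0, 5.0), (6.0, 9.0), (4.0, 8.0)]
ratios = [log2_ratio(f, m) for f, m in fpkm_pairs]
print(sliding_median(ratios, window=3))
```

A fully compensated segment would show window values near 0, whereas an uncompensated hemizygous segment would approach −1 in the absence of any buffering.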
While this is not yet possible for pygmy rattlesnake, a genomic assembly for boa is available (http://assemblathon.org/pages/download-data). Our analysis of synteny of genes on the largest boa scaffolds shows that there is a limited number of rearrangements between snakes and lizards, suggesting that this is unlikely to account for the patterns of female to male expression in pygmy rattlesnake (Figure S13). A more likely possibility is therefore that only some dosage-sensitive genes acquire dosage compensation, and that they do so on a gene-by-gene basis, similar to what is observed in birds and silkworm [12]. Consistent with this, the expression ratio histogram for Z-linked genes in pygmy rattlesnakes shows a shoulder of genes with a log2(F/M) expression ratio of 0 (Figure 5B), in agreement with about 18% of all genes present on the Z being locally compensated. These dosage compensated genes show no clear clustering along the Z (Figure S14), as expected if specific genes become compensated individually on the Z.

Discussion

Sex Chromosome Differentiation in Snakes

We compare the genomes of three snake species to study sex chromosomes at different stages of differentiation. Our analysis reveals that boas have entirely homomorphic sex chromosomes with no detectable differentiated segment present on the Z. Several reasons could have prevented us from detecting an existing sex-determining region on the Z/W: first, since coverage data is inherently noisy, a single, small, female-specific region (or a small region of reduced female coverage, if part of the W is degenerated) may not be distinguishable from a false positive. That the Boidae sex-determining region is small is evidenced by the viability of WW boa females produced by facultative parthenogenesis (in snakes with differentiated sex chromosomes, WW females are inviable and only ZZ males or ZW females are thought to result from parthenogenesis) [38],[39].
Another possibility is that the W acquired a sex-determining region from another chromosome, or that part of another chromosome of Anolis is fused to the Z/W of snakes. Finally, many genomic scaffolds of Anolis remain unmapped, and one of them could correspond to the part of chromosome 6 that has evolved into the sex-determining region of snakes. In contrast to boa, the sex chromosomes of both pygmy rattlesnake and garter snake are completely heteromorphic, and we detect no pseudoautosomal regions in either species (with the caveat that, once again, the PAR could be derived from a region not homologous to chromosome 6 of Anolis). Thus, the species investigated appear to represent extreme ends in the process of sex chromosome differentiation, despite cytological data that show little differentiation between the Z and W chromosome of garter snakes [20]. This suggests that one needs to be cautious in the interpretation of cytological data, and many more species that appear to lack sex chromosomes at the cytological level may in fact have differentiated sex chromosomes.

Old Strata on Snake Sex Chromosomes

We identify dozens of W-candidate sequences in snakes whose closest paralogs in the genome are located on the Z, suggesting that W-sequences, to a large extent, consist of degenerated sequences present ancestrally on the autosomal precursor of the sex chromosomes, and not of recent additions to the W chromosome. The spatial distribution of W-scaffolds along the Z and identities of conserved Z-W sequences indicate the presence of evolutionary strata along the snake sex chromosomes. More recently formed strata show a higher density of W-linked sequences and more similarity with their Z-linked gametologs, compared to older strata. We identify at least two (and possibly three) strata in both pygmy rattlesnakes and garter snakes, and their overall architecture and age distributions are similar in both species.
Sequence divergence at Z-W gametologs exceeds that of Z-linked genes between pygmy rattlesnake and garter snake, and comparison of the gene content of W-linked sequences in pygmy rattlesnake and garter snake shows an excess of shared genes between the species. Phylogenetic analysis reveals that W-linked sequences between the two species are more similar to each other than they are to their Z-linked gametologs within a species, indicating that the W-linked regions stopped recombining with their Z gametologs before the species split between Viperidae and Colubridae. All of these data strongly suggest that recombination between the sex chromosomes of pygmy rattlesnake and garter snake was abolished before these two lineages diverged, and all strata were established in an ancestor of Viperidae and Colubridae. Thus, our data are inconsistent with the view that sex chromosomes of snakes have evolved restricted recombination independently in different lineages. Instead, the sex chromosomes of both Viperidae and Colubridae stopped recombining before the two families split and have therefore been non-recombining for over 50 MY, and their W-chromosomes are expected to be globally degenerate at the sequence level in both lineages.

Cytogenetics Versus Degeneration at the Molecular Level

The inferred variation in heteromorphism of sex chromosomes in different snake families from cytological analysis may be driven by general differences in their chromosomal morphology, such as the differential accumulation of repetitive sequences and heterochromatin on the W of different species (which can lead to dramatic differences in size and appearance, as often observed between Y chromosomes of closely related species [40]), rather than reflecting their level of divergence at the DNA sequence level. Similarly, the perceived cytological similarity between the W and Z of some species may stem from small pockets of local identity on otherwise degenerated Ws, as in the case of garter snake.
A recent study localizing Z-linked clones using FISH mapping to the chromosomes of the Burmese python (Pythonidae), the Japanese rat snake (Colubridae), and the habu (Viperidae) found that while all 11 Z-linked clones mapped to both the Z and W chromosomes of the Burmese python, only three (RAB5A, CTNNB1, WAC) mapped to the W of the rat snake, and none to the W of the habu [9], supporting the cytological view that Colubridae are at a transitional stage of sex chromosome differentiation and that Viperidae have a fully degenerate W chromosome. We searched our W-candidates in pygmy rattlesnake and garter snake for the three genes detected on the rat snake W, and did not detect a W-homolog of RAB5A in either species, found a W-homolog for WAC in both pygmy rattlesnake and garter snake, and identified a W-homolog for CTNNB1 only in pygmy rattlesnake. Thus, while cytological data failed to identify these W-candidate genes in the Viperidae investigated [9], our genome analysis clearly provides evidence for their presence on the W of pygmy rattlesnake, suggesting that the previous observations reflected species-specific gene loss on the W (or difficulties of localizing hybridisation signal on a more heterochromatic chromosome), rather than a general difference in the gene content of the W chromosome of Colubridae and Viperidae.

Faster Evolution of Snake Z Chromosomes

We find evidence for both male-driven evolution and faster Z in snakes, similar to patterns observed in birds [29],[32]. Faster Z evolution can be an adaptive or non-adaptive process. On one hand, faster Z evolution may result from increased rates of fixation of recessive beneficial mutations, as Z-linked recessive alleles are directly exposed to selection in females [27].
On the other hand, Z chromosomes might be subject to increased rates of genetic drift [41], because in female-heterogametic taxa the effective population size of the Z can be greatly reduced relative to that of the autosomes, if sexual selection is stronger in males than females [42]. This can lead to an increase in the accumulation of weakly deleterious amino-acid mutations on the Z chromosome [30],[42],[43]. We can use SNP densities in our RNA-seq data to compare effective population sizes between chromosomes. We find that SNP densities are similar for Z-linked and autosomal genes in boas (Z/A ratio: 1.01 in males), but significantly lower in male pygmy rattlesnakes (Z/A ratio: 0.64), reduced below the neutral expectation of 0.75. This is consistent with the polygynous mating system reported in many snakes that can greatly decrease the effective population size of the Z, and suggests that increased drift is likely a main contributor to faster Z in this clade. The difference in rates of molecular evolution for Z-linked genes relative to the autosomes is thus more likely due to neutral and non-adaptive processes in snakes, similar to what has been observed in birds [41]. Lack of Global Dosage Compensation in Snakes Our transcriptome analysis provides clear evidence that snakes with heteromorphic sex chromosomes lack global dosage compensation. This result is robust to using different cut-off levels of gene expression, and therefore not an artifact of incorporating lowly expressed genes or transcriptional noise in our analysis [44]. The comparison with boa, which has homomorphic sex chromosomes, allows us to rule out ancestrally lower expression of the Z in females, or an ancestral excess of male-biased genes on the Z. The lack of such a comparison has previously limited our understanding of the evolution of dosage compensation in other clades (but see [45],[46]). 
For instance, the X-chromosome of the flour beetle Tribolium castaneum is over-expressed in females relative to both the autosomes and the male X [47]. This has been interpreted as a primitive form of dosage compensation, achieved by over-expressing the X in both sexes, but could alternatively be understood as a lack of dosage compensation of an ancestrally over-expressed X chromosome. Likewise, Z-linked expression is reduced relative to the autosomes in both sexes in the Lepidoptera Bombyx mori [48] and, since the Z is not male-biased in expression, this has been interpreted as evidence for dosage compensation [36]. However, assuming similar ancestral expression of the Z and autosomes, this implies that dosage compensation evolved in this clade by a reduction of Z-linked expression in males, a paradoxical explanation if dosage compensation arises primarily to re-establish Z:autosome gene balance in females. Alternatively, this reduced expression may reflect an ancestral bias that predates sex linkage. Potential sex-biases in expression levels on the Z/X have generally been understood as a lack of dosage compensation, but could also reflect ancestral sex-biased expression of the Z/X, a possibility that cannot be excluded without assaying ancestral gene expression [45],[46]. The contrast between homomorphic and heteromorphic sex chromosomes in snakes eliminates all these confounding possibilities. The similar expression levels observed for the homomorphic sex chromosome in male and female boas imply that Z-linked gene dose was indeed reduced in female pygmy rattlesnakes following the differentiation of their sex chromosomes, supporting the view that the male-biased expression often observed on Z-chromosomes likely stems from a lack of chromosome-wide dosage compensation. In contrast, most (but not all) species with old heteromorphic XY-systems have evolved mechanisms to counter-balance gene dose differences at sex-linked genes between males and females. 
XY species that show clear evidence of independent evolution of dosage compensation include marsupials [45], Drosophila [44], Anopheles [49], and Caenorhabditis elegans [44]. Placental mammals inactivate one of their X-chromosomes in females and thus show equal expression of X-linked genes in both sexes [44],[45],[50]. However, most genes on the X are not hyper-expressed relative to ancestral expression levels [45],[46],[51], and instead, some autosomal genes in expression networks with X-linked genes may have been down-regulated in placental mammals to restore gene balance between X-linked genes and autosomes. XY monotremes, in contrast, lack chromosome-wide dosage compensation [45]. Silene latifolia, a plant with young XY chromosomes that are only partially degenerated, shows partial dosage compensation [52]. Thus, many—but not all—XY species have evolved dosage compensation. Yet, all species investigated to date in which females are the heterogametic sex appear to lack global dosage compensation [12],[35]. This is true in three broad taxonomic groups with independently evolved ZW chromosomes: birds [13],[16],[53],[54], butterflies [17],[55],[56] (but see [48]), and Schistostomes (a trematode [14]). Snakes with heteromorphic sex chromosomes provide yet another example of a ZW species that lacks a chromosome-wide dosage compensation mechanism. Dosage Compensation and Differences in XY Versus ZW Systems The reason for this consistent difference in male versus female heterogametic systems in whether dosage compensation evolved or not is unclear but several hypotheses could explain the general lack of dosage compensation in female heterogametic species. One possibility is that females are more robust to differences in gene dose than males, and simply do not require compensatory mechanisms [12]. However, it is unclear why this should be the case in such diverse species groups, such as snakes, birds, butterflies, and schistostomes. 
Alternatively, if the male-biased transmission of the Z has led to a male-specific gene content on this chromosome, dosage compensation in ZW females may in fact be deleterious. The chicken Z chromosome consists of multiple strata that have been sex-linked for different amounts of time [24],[57], and genes that are located in the older evolutionary strata of the Z show more male-biased expression than genes in younger strata [58]. This demonstrates that the gene content of the Z chromosome is becoming more male-biased over time. Why the X, which is similarly expected to become enriched for female-biased genes, should acquire global dosage compensation in XY taxa is, again, not entirely clear, although theory predicts that female-heterogametic systems may be more prone to accumulating sexually antagonistic mutations on the Z than the X in XY species (reviewed in [59]). While female-heterogametic species lack global dosage compensation, many Z-linked genes are expressed at similar levels in females and males, suggesting that dosage-sensitive genes are up-regulated in females in a gene-specific manner [36]. The propensity to evolve chromosome-wide versus gene-specific mechanisms of dosage compensation could also depend on the number of genes that need compensation simultaneously during the process of Y or W degeneration. Specifically, if Y chromosomes degenerate much quicker than W chromosomes, then several genes simultaneously on the X might require dosage compensation and chromosome-wide mechanisms might evolve [18],[35]. In contrast, if W chromosome degeneration proceeds at a much lower pace, there might only be a single dose-sensitive gene at a time on the Z that requires compensation, and gene-specific dosage compensation might evolve. Higher mutation rates in males, and more opportunity for sexually antagonistic or sex-specific selection in males might result in faster gene decay for Y chromosomes relative to the W, but this has not been tested yet. 
Finally, it has been suggested that chromosomes that carry genes that are dosage-insensitive may be predisposed to becoming sex chromosomes [60]. Interestingly, in our boa expression analysis, chromosome 6 has the largest standard deviation of FPKM in both male and female (p-value<2.2e–16 for F-tests between chromosome 6 and any of the other macrochromosomes; see Dataset S1). This may reflect an ancestral deficit of tightly regulated genes on this chromosome, although it is possible that the increased variance in gene expression is instead a consequence of its sex-linkage. Some support for the idea that the initial set of genes on the proto-sex chromosomes determines the emergence of compensation mechanisms comes from comparison of platypus and chicken; both have homologous sex chromosomes and neither has dosage compensation [45]. In fact, platypus is to date the only species with a non-dosage-compensated XY system, further strengthening this hypothesis. Another possibility for this apparent difference in XY versus ZW systems is that patterns of sex chromosome dosage compensation might mainly depend on potentially fortuitous events that arose soon after sex chromosome differentiation (e.g., recruitment of MSL complex in Drosophila), which determine the evolution of subsequent mechanisms to equalize expression between the sexes [45]. To distinguish between these hypotheses, one needs to characterize dozens of independently evolved male- and female-heterogametic species with a variety of life-history and population parameters. The application of next-generation sequencing techniques in non-model species to identify sex-linked genes and sex-specific gene expression, as done here, provides a powerful framework to make this feasible. Sex Chromosome Differentiation in Snakes We compare the genomes of three snake species to study sex chromosomes at different stages of differentiation. 
Our analysis reveals that boas have entirely homomorphic sex chromosomes with no detectable differentiated segment present on the Z. Several reasons could have prevented us from detecting an existing sex-determining region on the Z/W: first, since coverage data is inherently noisy, a single, small, female-specific region (or a small region of reduced female coverage, if part of the W is degenerated) may not be distinguishable from a false positive. That the Boidae sex-determining region is small is evidenced by the viability of WW boa females produced by facultative parthenogenesis (in snakes with differentiated sex chromosomes, WW females are inviable and only ZZ males or ZW females are thought to result from parthenogenesis) [38],[39]. Another possibility is that the W acquired a sex-determining region from another chromosome, or that part of another chromosome of Anolis is fused to the Z/W of snakes. Finally, many genomic scaffolds of Anolis remain unmapped, and one of them could correspond to the part of chromosome 6 that has evolved into the sex-determining region of snakes. In contrast to boa, the sex chromosomes of both pygmy rattlesnake and garter snake are completely heteromorphic, and we detect no pseudoautosomal regions in either species (with the caveat that, once again, the PAR could be derived from a region not homologous to chromosome 6 of Anolis). Thus, the species investigated appear to represent extreme ends in the process of sex chromosome differentiation, despite cytological data that show little differentiation between the Z and W chromosome of garter snakes [20]. This suggests that one needs to be cautious in the interpretation of cytological data, and many more species that appear to lack sex chromosomes at the cytological level may in fact have differentiated sex chromosomes. 
Old Strata on Snake Sex Chromosomes Further, we identify dozens of W-candidate sequences in snakes whose closest paralogs in the genome are located on the Z, suggesting that W-sequences, to a large extent, consist of degenerated sequences present ancestrally on the autosomal precursor of the sex chromosomes, and not of recent additions to the W chromosome. The spatial distribution of W-scaffolds along the Z and identities of conserved Z-W sequences indicate the presence of evolutionary strata along the snake sex chromosomes. More recently formed strata show a higher density of W-linked sequences and more similarity with their Z-linked gametologs, compared to older strata. We identify at least two (and possibly three) strata in both pygmy rattlesnakes and garter snakes, and their overall architecture and age distributions are similar in both species. Sequence divergence at Z-W gametologs exceeds that of Z-linked genes between pygmy rattlesnake and garter snake, and comparison of the gene content of W-linked sequences in pygmy rattlesnake and garter snake shows an excess of shared genes between the species. Phylogenetic analysis reveals that W-linked sequences between the two species are more similar to each other than they are to their Z-linked gametologs within a species, indicating that the W-linked regions stopped recombining with their Z gametologs before the species split between Viperidae and Colubridae. All of these data strongly suggest that recombination between the sex chromosomes of pygmy rattlesnake and garter snake was abolished before these two lineages diverged, and all strata were established in an ancestor of Viperidae and Colubridae. Thus, our data are inconsistent with the view that sex chromosomes of snakes have evolved restricted recombination independently in different lineages. 
Instead, sex chromosomes of both Viperidae and Colubridae have been non-recombining for over 50 MY before their split, and their W-chromosomes are expected to be globally degenerate at the sequence level in both lineages. Cytogenetics Versus Degeneration at the Molecular Level The inferred variation in heteromorphism of sex chromosomes in different snake families from cytological analysis may be driven by general differences in their chromosomal morphology, such as the differential accumulation of repetitive sequences and heterochromatin on the W of different species (which can lead to dramatic differences in size and appearance, as often observed between Y chromosomes of closely related species [40]), rather than reflecting their level of divergence at the DNA sequence level. Similarly, the perceived cytological similarity between the W and Z of some species may stem from small pockets of local identity on otherwise degenerated Ws, as in the case of garter snake. A recent study that used FISH to map Z-linked clones to the chromosomes of the Burmese python (Pythonidae), the Japanese rat snake (Colubridae), and the habu (Viperidae) found that while all 11 Z-linked clones mapped to both the Z and W chromosomes of the Burmese python, only three (RAB5A, CTNNB1, WAC) mapped to the W of the rat snake, and none to the W of the habu [9], supporting the cytological view that Colubridae are at a transitional stage of sex chromosome differentiation and that Viperidae have a fully degenerate W chromosome. We searched our W-candidates in pygmy rattlesnake and garter snake for the three genes detected on the rat snake W: we did not detect a W-homolog of RAB5A in either species, found a W-homolog of WAC in both pygmy rattlesnake and garter snake, and identified a W-homolog of CTNNB1 only in pygmy rattlesnake. 
Thus, while cytological data failed to identify these W-candidate genes in the Viperidae investigated [9], our genome analysis clearly provides evidence for their presence on the W of pygmy rattlesnake, suggesting that the previous observations reflected species-specific gene loss on the W (or difficulties of localizing hybridisation signal on a more heterochromatic chromosome), rather than a general difference in the gene content of the W chromosome of Colubridae and Viperidae. Faster Evolution of Snake Z Chromosomes We find evidence for both male-driven evolution, and faster Z in snakes, similar to patterns observed in birds [29],[32]. Faster Z evolution can be an adaptive or non-adaptive process. On one hand, faster Z evolution may result from increased rates of fixation of recessive beneficial mutations, as Z-linked recessive alleles are directly exposed to selection in females [27]. On the other hand, Z chromosomes might be subject to increased rates of genetic drift [41], because in female-heterogametic taxa the effective population size of the Z can be greatly reduced relative to that of the autosomes, if sexual selection is stronger in males than females [42]. This can lead to an increase in the accumulation of weakly deleterious amino-acid mutations on the Z chromosome [30],[42],[43]. We can use SNP densities in our RNA-seq data to compare effective population sizes between chromosomes. We find that SNP densities are similar for Z-linked and autosomal genes in boas (Z/A ratio: 1.01 in males), but significantly lower in male pygmy rattlesnakes (Z/A ratio: 0.64), reduced below the neutral expectation of 0.75. This is consistent with the polygynous mating system reported in many snakes that can greatly decrease the effective population size of the Z, and suggests that increased drift is likely a main contributor to faster Z in this clade. 
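The neutral expectation of 0.75 quoted above follows from the standard effective population size formulas for N_m breeding males and N_f breeding females: N_eA = 4 N_m N_f / (N_m + N_f) for autosomes and, because males carry two Z copies and females one, N_eZ = 9 N_m N_f / (2 N_m + 4 N_f) for the Z. The short Python sketch below is illustrative only and is not part of the original analysis; the function names are hypothetical. It also assumes that SNP density is proportional to effective population size, that is, equal mutation rates on the Z and the autosomes, which male-driven evolution may violate.

```python
def z_to_autosome_ne_ratio(n_males: float, n_females: float) -> float:
    """Ratio N_eZ / N_eA for a ZW system with the given numbers of breeders."""
    ne_autosome = 4 * n_males * n_females / (n_males + n_females)
    ne_z = 9 * n_males * n_females / (2 * n_males + 4 * n_females)
    return ne_z / ne_autosome


def breeding_sex_ratio_for(ratio: float) -> float:
    """Invert the relation above: return r = N_m / N_f that yields `ratio`.

    From ratio = 9 (1 + r) / (8 (2 + r)); only defined for 9/16 < ratio < 9/8.
    """
    if not 9 / 16 < ratio < 9 / 8:
        raise ValueError("ratio must lie strictly between 9/16 and 9/8")
    return (16 * ratio - 9) / (9 - 8 * ratio)


if __name__ == "__main__":
    print(z_to_autosome_ne_ratio(100, 100))   # equal numbers of breeders
    print(z_to_autosome_ne_ratio(1, 1000))    # approaches 9/16 with very few breeding males
    print(breeding_sex_ratio_for(0.64))       # male:female breeder ratio implied by Z/A = 0.64
```

Under these simplifying assumptions the ratio can fall no lower than 9/16 when very few males breed, and a Z/A value of 0.64 would correspond to roughly one breeding male for every three breeding females. This is only a back-of-the-envelope reading of the reported ratio, not an estimate made in the study.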
The difference in rates of molecular evolution for Z-linked genes relative to the autosomes is thus more likely due to neutral and non-adaptive processes in snakes, similar to what has been observed in birds [41]. Lack of Global Dosage Compensation in Snakes Our transcriptome analysis provides clear evidence that snakes with heteromorphic sex chromosomes lack global dosage compensation. This result is robust to using different cut-off levels of gene expression, and therefore not an artifact of incorporating lowly expressed genes or transcriptional noise in our analysis [44]. The comparison with boa, which has homomorphic sex chromosomes, allows us to rule out ancestrally lower expression of the Z in females, or an ancestral excess of male-biased genes on the Z. The lack of such a comparison has previously limited our understanding of the evolution of dosage compensation in other clades (but see [45],[46]). For instance, the X-chromosome of the flour beetle Tribolium castaneum is over-expressed in females relative to both the autosomes and the male X [47]. This has been interpreted as a primitive form of dosage compensation, achieved by over-expressing the X in both sexes, but could alternatively be understood as a lack of dosage compensation of an ancestrally over-expressed X chromosome. Likewise, Z-linked expression is reduced relative to the autosomes in both sexes in the Lepidoptera Bombyx mori [48] and, since the Z is not male-biased in expression, this has been interpreted as evidence for dosage compensation [36]. However, assuming similar ancestral expression of the Z and autosomes, this implies that dosage compensation evolved in this clade by a reduction of Z-linked expression in males, a paradoxical explanation if dosage compensation arises primarily to re-establish Z:autosome gene balance in females. Alternatively, this reduced expression may reflect an ancestral bias that predates sex linkage. 
Potential sex-biases in expression levels on the Z/X have generally been understood as a lack of dosage compensation, but could also reflect ancestral sex-biased expression of the Z/X, a possibility that cannot be excluded without assaying ancestral gene expression [45],[46]. The contrast between homomorphic and heteromorphic sex chromosomes in snakes eliminates all these confounding possibilities. The similar expression levels observed for the homomorphic sex chromosome in male and female boas imply that Z-linked gene dose was indeed reduced in female pygmy rattlesnakes following the differentiation of their sex chromosomes, supporting the view that the male-biased expression often observed on Z-chromosomes likely stems from a lack of chromosome-wide dosage compensation. In contrast, most (but not all) species with old heteromorphic XY-systems have evolved mechanisms to counter-balance gene dose differences at sex-linked genes between males and females. XY species that show clear evidence of independent evolution of dosage compensation include marsupials [45], Drosophila [44], Anopheles [49], and Caenorhabditis elegans [44]. Placental mammals inactivate one of their X-chromosomes in females and thus show equal expression of X-linked genes in both sexes [44],[45],[50]. However, most genes on the X are not hyper-expressed relative to ancestral expression levels [45],[46],[51], and instead, some autosomal genes in expression networks with X-linked genes may have been down-regulated in placental mammals to restore gene balance between X-linked genes and autosomes. XY monotremes, in contrast, lack chromosome-wide dosage compensation [45]. Silene latifolia, a plant with young XY chromosomes that are only partially degenerated, shows partial dosage compensation [52]. Thus, many—but not all—XY species have evolved dosage compensation. Yet, all species investigated to date in which females are the heterogametic sex appear to lack global dosage compensation [12],[35]. 
This is true in three broad taxonomic groups with independently evolved ZW chromosomes: birds [13],[16],[53],[54], butterflies [17],[55],[56] (but see [48]), and schistosomes (trematodes [14]). Snakes with heteromorphic sex chromosomes provide yet another example of a ZW species that lacks a chromosome-wide dosage compensation mechanism. Dosage Compensation and Differences in XY Versus ZW Systems The reason for this consistent difference between male and female heterogametic systems in whether dosage compensation evolved is unclear, but several hypotheses could explain the general lack of dosage compensation in female heterogametic species. One possibility is that females are more robust to differences in gene dose than males, and simply do not require compensatory mechanisms [12]. However, it is unclear why this should be the case in groups as diverse as snakes, birds, butterflies, and schistosomes. Alternatively, if the male-biased transmission of the Z has led to a male-specific gene content on this chromosome, dosage compensation in ZW females may in fact be deleterious. The chicken Z chromosome consists of multiple strata that have been sex-linked for different amounts of time [24],[57], and genes that are located in the older evolutionary strata of the Z show more male-biased expression than genes in younger strata [58]. This demonstrates that the gene content of the Z chromosome is becoming more male-biased over time. Why the X, which is similarly expected to become enriched for female-biased genes, should acquire global dosage compensation in XY taxa is, again, not entirely clear, although theory predicts that female-heterogametic systems may be more prone to accumulating sexually antagonistic mutations on the Z than the X in XY species (reviewed in [59]). 
While female-heterogametic species lack global dosage compensation, many Z-linked genes are expressed at similar levels in females and males, suggesting that dosage-sensitive genes are up-regulated in females in a gene-specific manner [36]. The propensity to evolve chromosome-wide versus gene-specific mechanisms of dosage compensation could also depend on the number of genes that need compensation simultaneously during the process of Y or W degeneration. Specifically, if Y chromosomes degenerate much quicker than W chromosomes, then several genes simultaneously on the X might require dosage compensation and chromosome-wide mechanisms might evolve [18],[35]. In contrast, if W chromosome degeneration proceeds at a much lower pace, there might only be a single dose-sensitive gene at a time on the Z that requires compensation, and gene-specific dosage compensation might evolve. Higher mutation rates in males, and more opportunity for sexually antagonistic or sex-specific selection in males might result in faster gene decay for Y chromosomes relative to the W, but this has not been tested yet. Finally, it has been suggested that chromosomes that carry genes that are dosage-insensitive may be predisposed to becoming sex chromosomes [60]. Interestingly, in our boa expression analysis, chromosome 6 has the largest standard deviation of FPKM in both male and female (p-value<2.2e–16 for F-tests between chromosome 6 and any of the other macrochromosomes; see Dataset S1). This may reflect an ancestral deficit of tightly regulated genes on this chromosome, although it is possible that the increased variance in gene expression is instead a consequence of its sex-linkage. Some support for the idea that the initial set of genes on the proto-sex chromosomes determines the emergence of compensation mechanisms comes from comparison of platypus and chicken; both have homologous sex chromosomes and neither has dosage compensation [45]. 
In fact, platypus is to date the only species with a non-dosage-compensated XY system, further strengthening this hypothesis. Another possibility for this apparent difference in XY versus ZW systems is that patterns of sex chromosome dosage compensation might mainly depend on potentially fortuitous events that arose soon after sex chromosome differentiation (e.g., recruitment of the MSL complex in Drosophila), which determine the evolution of subsequent mechanisms to equalize expression between the sexes [45]. To distinguish between these hypotheses, one needs to characterize dozens of independently evolved male- and female-heterogametic species with a variety of life-history and population parameters. The application of next-generation sequencing techniques in non-model species to identify sex-linked genes and sex-specific gene expression, as done here, provides a powerful framework to make this feasible. Materials and Methods Samples We obtained genome sequences from B. constrictor (Boidae), Florida pygmy rattlesnake S. miliarius barbouri (Viperidae), and garter snake T. elegans (Colubridae) and transcriptomes from B. constrictor (blood) and S. miliarius barbouri (liver). B. constrictor female blood was provided by Monica Albe at UC Berkeley and used for DNA and RNA isolation. DNA and RNA from B. constrictor male blood were provided by Mark Stenglein at UC San Francisco. S. miliarius livers from a male and a female were collected by Bob Walton and immediately placed in RNAlater, shipped on dry ice and stored at −80°C until RNA and DNA isolation. T. elegans male and female livers were provided by the Museum of Vertebrate Zoology at UC Berkeley and used for DNA isolation. See Table S8 for an overview of the number of RNA-seq and DNA-seq reads generated. All DNA/RNA-seq reads are deposited at http://www.ncbi.nlm.nih.gov/sra under the bioproject accession numbers SRP026493 (genome data) and SRP026494 (transcriptome data). 
Genome Data and Assembly Illumina paired-end reads from a male B. constrictor were downloaded from the Assemblathon project website (http://assemblathon.org/). For all other samples (male and female pygmy rattlesnake, male and female garter snake, female boa), DNA was extracted from each sample with a Qiagen DNeasy Blood and Tissue kit, following the manufacturer's protocol. Paired-end library preparation and sequencing were performed at the Beijing Genomics Institute. All reads were trimmed before further analysis. The garter snake and pygmy rattlesnake genomes were each assembled from reads derived from separate male and female libraries using SOAPdenovo with a K-mer size of 31. The B. constrictor genome assembly 6C of the Assemblathon project was downloaded from http://assemblathon.org/. RNA-seq and Transcriptome Assembly We performed RNA-seq in a male and a female liver of S. miliarius barbouri, and in male and female blood of B. constrictor. RNA was extracted with the Qiagen RNeasy kit according to the manufacturer's protocol. Library preparation and sequencing were performed at the Beijing Genomics Institute (en.genomics.cn). A. carolinensis CDS were downloaded from Ensembl release 67 (ftp://ftp.ensembl.org/) and, for each gene, only the longest CDS was kept. For each snake species, male and female reads were pooled, trimmed and assembled using SOAPdenovo [61] with a K-mer value of 31. Gapcloser was run to further improve the assembly. The resulting transcripts were mapped against A. carolinensis CDS sequences, using Blat [62] with a translated query and database. For each transcript, only the match with the highest score was kept. When transcripts overlapped on A. carolinensis genes by more than 20 bp, only the transcript with the best alignment score was kept. When the overlap was shorter than 20 bp, or when transcripts mapped to different parts of the gene, their sequences were concatenated. 
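The overlap rule described above for transcripts mapping to the same A. carolinensis gene can be sketched as follows. This is a minimal illustration in Python, not the authors' script: the `Hit` record, the greedy best-score-first order, and the decision to treat an overlap of exactly 20 bp as acceptable are all assumptions, since the text does not specify them.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One transcript aligned to an Anolis CDS (hypothetical record layout)."""
    gene_start: int   # 0-based start of the match on the Anolis CDS
    gene_end: int     # end of the match on the Anolis CDS (exclusive)
    score: float      # alignment score of the match
    sequence: str     # transcript sequence


MAX_OVERLAP_BP = 20  # overlaps longer than this are treated as conflicts


def overlap(a: Hit, b: Hit) -> int:
    """Length in bp of the overlap between two hits on the gene (0 if none)."""
    return max(0, min(a.gene_end, b.gene_end) - max(a.gene_start, b.gene_start))


def merge_hits(hits: List[Hit]) -> str:
    """Keep the best-scoring hit among conflicting ones, concatenate the rest."""
    kept: List[Hit] = []
    # Visit hits from best to worst score so that conflicts favour the best hit.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(overlap(hit, other) <= MAX_OVERLAP_BP for other in kept):
            kept.append(hit)
    # Concatenate the retained transcripts in gene order (overlaps are not trimmed).
    kept.sort(key=lambda h: h.gene_start)
    return "".join(h.sequence for h in kept)


if __name__ == "__main__":
    example = [
        Hit(0, 100, 500.0, "AAA"),
        Hit(50, 150, 300.0, "CCC"),   # overlaps the first hit by 50 bp: dropped
        Hit(90, 200, 200.0, "GGG"),   # overlaps the first hit by 10 bp: kept
    ]
    print(merge_hits(example))
```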
The location of these genes on Release 2 of the Anolis genome (http://www.broadinstitute.org/ftp/pub/assemblies/reptiles/lizard/AnoCar2.0/) was used to map the snake genes, and genes mapping to chromosome 6 of Anolis were classified as Z-linked. Chromosome Coordinate Assignment Anolis genes were mapped to genomic scaffolds for both pygmy rattlesnake and garter snake using translated blat searches between Anolis CDS sequences (Ensembl) and de novo snake contigs. In cases where more than one Anolis gene mapped to a snake scaffold, the consensus Anolis chromosome was used. Gene coordinates and strandedness from the consensus chromosome were used to position and orient the snake scaffolds. Genes from the consensus chromosome were discarded if they were more than 1 Mb from their nearest neighbor in the group. Strandedness of the contig relative to the Anolis chromosome was determined by the consensus of colinearity of genes in the cluster. When only one gene mapped to a contig, its position and orientation were used to position and orient the snake scaffold. The procedure for ordering and orienting boa scaffolds was the same as above. However, because the boa genome has substantially fewer sequencing gaps than the other snake genomes, we were able to construct boa pseudochromosomes directly by concatenating the genomic scaffolds rather than merely translating coordinates from Anolis as we did for garter snake and pygmy rattlesnake. Coverage Analysis across Chromosomes For all snake species, male and female reads were aligned separately to their respective genomes using BWA [63]. Read coverage per scaffold was calculated with bamtools [64]. 
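As an illustration of the Chromosome Coordinate Assignment procedure described above, the following Python sketch places one scaffold from the Anolis genes that map to it. It is a hedged reconstruction, not the authors' code: the `GeneHit` record, the use of the smallest retained Anolis coordinate as the scaffold position, the majority vote over adjacent gene pairs as the measure of colinearity, and the handling of ties are all assumptions made for the example.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class GeneHit(NamedTuple):
    """An Anolis gene mapped to a snake scaffold (hypothetical record layout)."""
    chrom: str         # Anolis chromosome of the gene
    anolis_pos: int    # gene position on the Anolis chromosome (bp)
    scaffold_pos: int  # position of the match on the snake scaffold (bp)
    same_strand: bool  # True if gene and match are on the same strand


MAX_NEIGHBOR_DIST = 1_000_000  # 1 Mb cutoff from the text


def place_scaffold(hits: List[GeneHit]) -> Optional[Tuple[str, int, str]]:
    """Return (chromosome, position, orientation) for a scaffold, or None."""
    if not hits:
        return None
    # Consensus chromosome: the one supported by most genes (ties are arbitrary).
    consensus = Counter(h.chrom for h in hits).most_common(1)[0][0]
    group = [h for h in hits if h.chrom == consensus]
    if len(group) > 1:
        # Drop genes lying more than 1 Mb from their nearest neighbor in the group.
        group = [
            h for h in group
            if min(abs(h.anolis_pos - o.anolis_pos) for o in group if o is not h)
            <= MAX_NEIGHBOR_DIST
        ]
    if not group:
        return None
    position = min(h.anolis_pos for h in group)  # assumed anchoring convention
    if len(group) == 1:
        # A single gene: use its own strand to orient the scaffold.
        return consensus, position, "+" if group[0].same_strand else "-"
    # Several genes: orient by colinearity of gene order on scaffold vs. Anolis.
    ordered = sorted(group, key=lambda h: h.scaffold_pos)
    steps = [b.anolis_pos - a.anolis_pos for a, b in zip(ordered, ordered[1:])]
    forward = sum(1 for s in steps if s > 0)
    reverse = sum(1 for s in steps if s < 0)
    return consensus, position, "+" if forward >= reverse else "-"


if __name__ == "__main__":
    hits = [
        GeneHit("6", 1_000_000, 500, True),
        GeneHit("6", 1_200_000, 300, True),
        GeneHit("6", 1_500_000, 100, True),
        GeneHit("2", 5_000_000, 700, True),   # minority chromosome: ignored
        GeneHit("6", 9_000_000, 900, True),   # more than 1 Mb from neighbors: dropped
    ]
    print(place_scaffold(hits))
```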
bamtools was used to count the number of read pairs mapping to each scaffold such that: (1) if both reads from a pair mapped to a scaffold, that read pair was counted only once; (2) if only a single read from a pair mapped, it was also counted once, regardless of whether it was the first or second member of the pair. Thus, each sequenced molecule was counted only once. SNP Analysis Female and male RNA-seq reads were mapped back to the assembled transcriptome using Bowtie [65] with default parameters. Profile files were obtained from the resulting Bowtie map files using Bow2Pro (http://guanine.evolbio.mpg.de/), and only sites with coverage over 10 were kept for further analysis. A SNP was called at a site when two bases each had a frequency over 0.3 times the site coverage (genes with no sites with sufficient coverage were excluded). Mapping of Z-Linked and Autosomal Ratsnake Markers The sequence of each E. quadrivirgata marker used in [9] was downloaded from NCBI (http://www.ncbi.nlm.nih.gov/) and mapped to Release 2 of the Anolis genome using blat with a translated query and database. For each marker, only the location with the best mapping score was kept. Identifying W Candidate Scaffolds from Genomic Data In order to search for W-linked sequences, three approaches were used. First, we identified W-linked scaffolds, both coding and noncoding, in all three genome assemblies to assess the global level of differentiation and homology between the Z and the W. Second, we searched for W-linked genes in the genomic data in pygmy rattlesnake and garter snake to investigate gene conservation between the Z and the W, and between the species. Finally, we identified putatively expressed W-genes from the transcriptome of pygmy rattlesnake. For the global identification of W-derived scaffolds, all scaffolds from the de novo genomic assemblies showing evidence for high female coverage were identified as W-candidates. 
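The SNP-calling rule given in the SNP Analysis paragraph above (coverage over 10, and two bases each above 0.3 times the site coverage) can be written as a small Python sketch. The input format, a dictionary of base counts per site, and the per-gene density defined as SNPs per usable site are assumptions for illustration; the actual profile files produced by Bow2Pro may be organised differently.

```python
from typing import Dict, Iterable, Optional

MIN_COVERAGE = 10          # sites must have coverage strictly over this value
MIN_ALLELE_FRACTION = 0.3  # each of the two bases must exceed this fraction


def is_snp(base_counts: Dict[str, int]) -> Optional[bool]:
    """Classify one site; return None if coverage is too low to call."""
    coverage = sum(base_counts.values())
    if coverage <= MIN_COVERAGE:
        return None
    supported = [
        base for base, count in base_counts.items()
        if count > MIN_ALLELE_FRACTION * coverage
    ]
    return len(supported) >= 2


def snp_density(sites: Iterable[Dict[str, int]]) -> Optional[float]:
    """Fraction of usable sites called as SNPs; None if no site is usable."""
    calls = [call for call in (is_snp(site) for site in sites) if call is not None]
    if not calls:
        return None  # mirrors the exclusion of genes without sufficient coverage
    return sum(calls) / len(calls)


if __name__ == "__main__":
    gene_sites = [
        {"A": 6, "G": 6},    # two well-supported bases: SNP
        {"A": 11, "G": 1},   # minor base below the 0.3 threshold: not a SNP
        {"A": 5, "G": 5},    # coverage of exactly 10: not usable
    ]
    print(snp_density(gene_sites))
```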
As it happens, none of these female-biased W-candidates were successfully mapped using Anolis genes, likely because of the poorer assembly of W-linked sequences due to their reduced coverage and potentially increased repetitive content, and overall poor conservation of W sequences. As a result, we sought to map W-candidates by searching for their Z-linked homologs, with which the W-candidates would have shared a common ancestor far more recently than with the Anolis homologs. Therefore, we homology-searched our W-candidates using lastz and UCSC's axt/chain/net tools [66],[67] against all de novo scaffolds that were already successfully mapped to Anolis chromosomes as described above (see “Chromosome Coordinate Assignment”). High female coverage was defined with reference to the range of female coverage observed in autosomes (scaffolds homologous to Anolis chromosomes 1–5) and was determined by selecting scaffolds for which the female proportion of coverage was statistically higher than Q3 + 1.5× IQR on autosomes. We did not initially require female-specific read coverage in order to allow detection of incompletely degenerated W-candidate scaffolds. Note that while the homologs of the W-candidate scaffolds were mapped along the Anolis genome based on gene content (see “Chromosome Coordinate Assignment” above), they also contain noncoding sequence. These data are the basis for Figures 2 and 3. Identifying W Candidate Genes from Genomic Data For the genomic search for W-genes (performed in both garter snake and pygmy rattlesnake), male and female genomic reads were mapped to the de novo genome assemblies using Bowtie2 with default parameters, and female and male read counts were estimated from the resulting alignments (after removal of alignments with mismatches). Scaffolds with a female read count above 20 and a male read count below 2 were considered as putatively W-linked and kept for further analysis. 
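The high-female-coverage criterion defined above (female proportion of coverage above Q3 + 1.5× IQR of the autosomal distribution) can be sketched in Python as follows. This is only an illustration under stated assumptions: the female proportion is taken to be female reads divided by female plus male reads, the quartiles use the "inclusive" method of the standard library, and a plain greater-than comparison stands in for whatever statistical test the authors applied. All function names are hypothetical.

```python
import statistics


def female_proportion(female_reads, male_reads):
    """Assumed definition: female reads / (female + male reads)."""
    total = female_reads + male_reads
    return female_reads / total if total else 0.0


def upper_fence(values):
    """Q3 + 1.5 x IQR of the given values (quartile method is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def w_candidates(coverage, autosomal):
    """Return scaffolds whose female proportion exceeds the autosomal fence.

    coverage:  dict of scaffold -> (female_reads, male_reads)
    autosomal: set of scaffold names homologous to Anolis chromosomes 1-5
    """
    props = {s: female_proportion(f, m) for s, (f, m) in coverage.items()}
    fence = upper_fence([props[s] for s in autosomal])
    return sorted(s for s, p in props.items() if p > fence)
```

Deriving the fence from autosomal scaffolds only means that the reference distribution is not influenced by the sex-linked scaffolds being tested.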
Anolis coding sequences were mapped against these putative W-scaffolds using blat with a translated query and database. As expected, the majority of genes mapping to the putative W-scaffolds are located on Anolis chromosome 6/snake Z chromosome (55 Z-linked versus 19 from chromosomes 1–5 in pygmy rattlesnake, 29 versus 23 in garter snake; 42 and 34 were located in unmapped regions in pygmy rattlesnake and garter snake, respectively), and these Z-homologs were classified as putative W-linked genes. The resulting putative W-linked genes were further mapped to their Z-homolog (see “Estimation of Ka, Ks, and Ka/Ks” for how the snake CDS were obtained). For ZW gene pairs with a blat mapping score higher than 100, the coding sequence was extracted manually by aligning both the Z- and W-linked copies of the gene to the corresponding Anolis gene with tblastn, and keeping only parts of the sequence for which the protein alignment was well conserved. KaKs_calculator was then run on these ZW gene pairs to estimate their synonymous and non-synonymous rates of evolution (Table S4). For estimates of Ks between Z/W paralogs using other models of sequence evolution, see Table S5. Identifying W Candidate Scaffolds from the Transcriptome For the identification of putatively expressed W-genes from the transcriptome (performed only in pygmy rattlesnake), female and male RNA-seq and DNA-seq reads were mapped to the de novo transcriptome assembly (for this analysis, the output of the SOAPdenovo/GapCloser assembly of female RNA-seq reads was used without further processing other than selecting transcripts above 300 bps) using Bowtie2 with default parameters. Cufflinks was used to estimate FPKM values from the RNA-seq alignments, and male and female genomic read counts were obtained from the DNA-seq alignments for each transcript (after removal of alignments with mismatches). 
Transcripts with male over female read counts below 0.1, total female read counts above 30, and female-specific expression were classified as putative W-linked sequences. PCR primers were designed for 9 of these transcripts, and W-linkage confirmed when they yielded female-specific bands in a standard PCR (see Figure S11 for details on the PCR analysis). Estimation of Ka, Ks, and Ka/Ks Coding sequences were extracted from the pygmy rattlesnake, garter snake and boa genome assemblies according to their homology to Anolis genes. Specifically, we mapped each Anolis CDS (see “Transcriptome Assembly”) to the snake genomes using blat with a translated query and database. In order to avoid the inclusion of paralogs, we used a reciprocal best hit approach: for each Anolis gene, only the genomic location with the largest mapping score was kept; similarly, when two Anolis genes mapped to the same snake genomic region (with an overlap longer than 20 bps), only the gene with the highest score was kept. These tables of reciprocal best hits were used to extract the putative gene loci from the genomic sequences of the three snake species with a perl script. Genewise was then run on these loci with default parameters, and the corresponding Anolis proteins as query, to retrieve the respective coding sequences. Only genes for which the resulting coding sequence encoded a protein longer than 40 amino acids with no stop codons in all three snake species were kept for further analysis. In order to obtain alignments based on the protein sequences, we translated the coding sequences from all three snake species and aligned the translated peptide sequences, as well as the corresponding Anolis protein, with Muscle [68]. Gblocks [69] was run with default parameters to remove gaps and regions of low conservation from these protein alignments, resulting in shortened protein sequences of equal sizes in all the species. 
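The best-hit filtering described above (one best genomic location per Anolis gene, then one gene per snake region when hits overlap by more than 20 bps) can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' Perl script: hits are assumed to be already parsed from blat output into records with half-open coordinates, ties are broken arbitrarily, and the `Hit` type and function names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str       # Anolis gene identifier
    scaffold: str   # snake scaffold identifier
    start: int      # half-open interval [start, end) on the scaffold (assumed)
    end: int
    score: int      # blat mapping score


MAX_OVERLAP = 20  # hits overlapping by more than 20 bps compete (from the text)


def overlap(a, b):
    """Number of overlapping bases between two hits (0 if on different scaffolds)."""
    if a.scaffold != b.scaffold:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def best_hits(hits):
    """Keep the top-scoring location per gene, then the top-scoring gene per region."""
    # Step 1: for each Anolis gene, keep only its highest-scoring location.
    best = {}
    for h in hits:
        if h.gene not in best or h.score > best[h.gene].score:
            best[h.gene] = h
    # Step 2: visit hits from highest to lowest score; drop a hit if it overlaps
    # an already accepted hit on the same scaffold by more than MAX_OVERLAP.
    kept = []
    for h in sorted(best.values(), key=lambda x: x.score, reverse=True):
        if all(overlap(h, k) <= MAX_OVERLAP for k in kept):
            kept.append(h)
    return kept
```

Processing hits in descending score order is one simple way to ensure that, within any overlapping group, the gene with the highest score is the one retained.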
These proteins were then used as queries for Genewise to extract the corresponding DNA sequences from each of the snake and lizard species. Since the final DNA sequences from the snake and lizard species correspond exactly to their aligned protein sequences (after gap removal), they can simply be concatenated for each gene into one final DNA alignment. In a few cases Genewise yielded DNA sequences of different sizes for the different species; these were removed from the analysis. The final sample consists of 9553 genes. KaKs_calculator [70] was run with the Nei-Gojobori model to estimate pairwise Ka and Ks values for each gene between all species pairs (Dataset S2). Expression Analysis Female and male reads were mapped back to the assembled transcriptome separately using Bowtie2 [71] with default parameters. FPKM values were estimated using Cufflinks [72] with a single GTF file per species that assigned a single transcript spanning the whole sequence to each gene (Dataset S1). Samples We obtained genome sequences from B. constrictor (Boidae), Florida pygmy rattlesnake S. miliarius barbouri (Viperidae), and garter snake T. elegans (Colubridae) and transcriptomes from B. constrictor (blood) and S. miliarius barbouri (liver). B. constrictor female blood was provided by Monica Albe at UC Berkeley and used for DNA and RNA isolation. DNA and RNA from B. constrictor male blood was provided by Mark Stenglein at UC San Francisco. S. miliarius livers from a male and a female were collected by Bob Walton and immediately placed in RNAlater, shipped on dry ice and stored at −80°C until RNA and DNA isolation. T. elegans male and female livers were provided by the Museum of Vertebrate Zoology at UC Berkeley and used for DNA isolation. See Table S8 for an overview of the number of RNA-seq and DNA-seq reads generated. 
All DNA/RNA-seq reads are deposited at http://www.ncbi.nlm.nih.gov/sra under the bioproject accession numbers SRP026493 (genome data) and SRP026494 (transcriptome data). Genome Data and Assembly Illumina paired-end reads from a male B. constrictor were downloaded from the Assemblathon project website (http://assemblathon.org/). For all other samples (male and female pygmy rattlesnake, male and female garter snake, female boa), DNA was extracted from each sample with a Qiagen DNeasy Blood and Tissue kit, following the manufacturer's protocol. Paired-end library preparation and sequencing were performed at the Beijing Genomics Institute. All reads were trimmed before further analysis. The garter snake and pygmy rattlesnake genomes were each assembled from reads derived from separate male and female libraries using SOAPdenovo with a K-mer size of 31. The B. constrictor genome assembly 6C of the Assemblathon project was downloaded from http://assemblathon.org/. RNA-seq and Transcriptome Assembly We performed RNA-seq in male and female liver of S. miliarius barbouri, and in male and female blood of B. constrictor. RNA was extracted with the Qiagen RNeasy kit according to the manufacturer's protocol. Library preparation and sequencing were performed at the Beijing Genomics Institute (en.genomics.cn). A. carolinensis CDS were downloaded from Ensembl release 67 (ftp://ftp.ensembl.org/) and, for each gene, only the longest CDS was kept. For each snake species, male and female reads were pooled, trimmed and assembled using SOAPdenovo [61] with a K-mer value of 31. GapCloser was run to further improve the assembly. The resulting transcripts were mapped against A. carolinensis CDS sequences, using Blat [62] with a translated query and database. For each transcript, only the match with the highest score was kept. When transcripts overlapped on A. carolinensis genes by more than 20 bps, only the transcript with the best alignment score was kept. 
When the overlap was shorter than 20 bps, or when transcripts mapped to different parts of the gene, their sequences were concatenated. 
Supporting Information Dataset S1. FPKM value and location in the Anolis genome of all transcripts analyzed in boa and pygmy rattlesnake, list of all genes (and location in the Anolis genome) detected in the genomes of boa, pygmy rattlesnake, and garter snake, and list of all W-candidate genes detected in pygmy rattlesnake, and garter snake. https://doi.org/10.1371/journal.pbio.1001643.s001 (XLSX) Dataset S2. Molecular evolution of Z-linked genes in snakes. Ka, Ks, and Ka/Ks values for boa-Anolis, pygmy rattlesnake-Anolis and garter snake-Anolis comparisons are given for each gene. All divergence values were obtained with the KaKs_calculator (http://code.google.com/p/kaks-calculator/) with the Nei-Gojobori model. https://doi.org/10.1371/journal.pbio.1001643.s002 (XLSX) Figure S1. SNP count for each boa and pygmy rattlesnake gene along the genome. 
Boa (A) and pygmy rattlesnake (B) genes were mapped according to their location in the Anolis genome, and, for each gene, the total number of SNPs was plotted. (C) The proportion of genes with SNPs for each macrochromosome. https://doi.org/10.1371/journal.pbio.1001643.s003 (PDF) Figure S2. Comparative cytogenetic maps of sex chromosomes of snakes (picture redrawn from [9]) and Anolis (ideogram drawn for Python molurus). The location of 11 genes mapped in three snake species and their position along Anolis chromosome 6 is shown. https://doi.org/10.1371/journal.pbio.1001643.s004 (PDF) Figure S3. Dot-plot between Anolis chromosomes and boa pseudochromosomes. Boa Z scaffolds are ordered and oriented by finding the consensus order and orientation of blat hits between Anolis genes and boa scaffolds from Assemblathon 2 and tiling the sequences in the appropriate order and orientation. If the breakpoints of inversions or other structural rearrangements map within scaffolds, this will be seen as off-diagonal dots. Given the high-quality assembly of boa (the concatenated boa chromosome exhibits the following assembly statistics: N50 = 1,855 Kb; N90 = 574 Kb; and N95 = 345 Kb), we have high power for finding rearrangements present within euchromatin. This figure shows evidence for two inversions on the Z. https://doi.org/10.1371/journal.pbio.1001643.s005 (PDF) Figure S4. Histogram of female-specific scaffolds mapping along the chromosomes of boa (A), garter snake (B), and pygmy rattlesnake (C). https://doi.org/10.1371/journal.pbio.1001643.s006 (PDF) Figure S5. Mapping of candidate female scaffolds to the genome of boa, garter snake, and pygmy rattlesnake with different stringency mapping parameters than in Figure 2. (A) High stringency mapping, requiring that percent identity be above 30% and number of aligned nucleotides be at least 200. 
The W candidates homologous to Z-linked scaffolds make up 60% and 41% of all female-biased scaffolds in pygmy rattlesnake and garter snake genomes, respectively. The proportion of boa scaffolds (14%) does not differ significantly from random mapping (binomial test, p = 0.42). (B) Low stringency mapping, requiring only that number of aligned nucleotides be at least 100. The W candidates homologous to Z-linked scaffolds make up 50% and 34% of all female-biased scaffolds in pygmy rattlesnake and garter snake genomes, respectively. The proportion of boa scaffolds (6.8%) does not differ significantly from random mapping (binomial test, p = 0.72). https://doi.org/10.1371/journal.pbio.1001643.s007 (PDF) Figure S6. Evolutionary strata and sequence conservation between the pygmy rattlesnake and garter snake W-candidate scaffolds mapped along the Z chromosome. Legend like Figure 3, but shown are the data for the lower/higher stringency mapping as in Figure S5. (A) High stringency mapping, requiring that percent identity be above 30% and number of aligned nucleotides be at least 200. Median Z-W identity for the distal left and right strata are 55% and 56% for pygmy rattlesnake and 53% and 51% for garter snake, respectively. (B) Low stringency mapping, requiring only that number of aligned nucleotides be at least 100. Median Z-W identity for the distal left and right strata are 55% and 55% for pygmy rattlesnake and 51% and 48% for garter snake, respectively. https://doi.org/10.1371/journal.pbio.1001643.s008 (PDF) Figure S7. Synonymous divergence between Z-W gametologs along the Z chromosome of garter snake and pygmy rattlesnake. The grey line shows median synonymous divergence for autosomal loci, and the red line shows median divergence for Z-linked loci. https://doi.org/10.1371/journal.pbio.1001643.s009 (PDF) Figure S8. Comparison of location of female-specific scaffolds in pygmy rattlesnake (top) and garter snakes (bottom). Regions of overlap are indicated in the middle. 
The mappings following Figures 2 and 3 are shown. https://doi.org/10.1371/journal.pbio.1001643.s010 (PDF) Figure S9. Gene conservation on the W of garter snake and pygmy rattlesnake. Anolis chromosome 6 (Z of snakes) was divided into three bins of equal sizes (45589–26937164, 26937164–53828738, 53828738–80720312, where 45589 and 80720312 are the location of the first and last gene along the chromosome). For each bin, the total number of genes was compared to the number of genes detected on putative W-linked scaffolds of garter snake, pygmy rattlesnake, or of both species (A). The number of genes detected on putative W-linked scaffolds of both species is higher than expected randomly (15 in total versus 1.9 expected, p<0.001 with a goodness of fit Chi-square test, where the expected proportion of overlapping genes is simply the proportion of chromosome 6/Z genes found on the pygmy W times the proportion of chromosome 6/Z genes found on the garter snake W) for all intervals (B). https://doi.org/10.1371/journal.pbio.1001643.s011 (PDF) Figure S10. Gene trees of Z and W gametologs from pygmy rattlesnake and garter snake. For some genes, we also added outgroup sequence information from Anolis or boa. https://doi.org/10.1371/journal.pbio.1001643.s012 (PDF) Figure S11. PCR confirmation of W-linkage of six female-specific transcripts in pygmy rattlesnake. Primers were designed to amplify fragments of putative W-linked scaffolds, while primers designed to amplify fragments of two autosomal sequences were used as a control. For each set of primers, standard PCR was performed with either male or female DNA as template (from the same samples used for the genomic sequencing), and an annealing temperature of 58°C. W-linkage was confirmed by the appearance of female-specific bands. https://doi.org/10.1371/journal.pbio.1001643.s013 (PDF) Figure S12. 
Log2 of expression in female and male, and log2 of female over male expression, for the different macrochromosomes of pygmy rattlesnake using different cutoff values. Only genes with FPKM values over 0 (upper panel), 1 (middle panel), or 10 (lower panel) were used in the analysis. https://doi.org/10.1371/journal.pbio.1001643.s014 (PDF) Figure S13. Synteny of genes between Anolis chromosomes and the eight largest boa scaffolds that mapped to macrochromosomes. Anolis CDS sequences were mapped to the boa genomic scaffolds using blat. The corresponding location of each gene on the eight largest scaffolds that mapped to Anolis macrochromosomes was plotted against their location on the corresponding Anolis chromosome. https://doi.org/10.1371/journal.pbio.1001643.s015 (PDF) Figure S14. Location of all 431 Z-linked pygmy rattlesnake transcripts (grey) and of the 80 genes with log2(F/M expression) = 0±0.3 (red, corresponds approximately to 0.8<F/M<1.2) along chromosome 6 (Z). https://doi.org/10.1371/journal.pbio.1001643.s016 (PDF) Table S1. Assembly statistics for boa, pygmy rattlesnake, and garter snake genomes. https://doi.org/10.1371/journal.pbio.1001643.s017 (DOCX) Table S2. Mapping of known Z-linked and autosomal markers of rat snake to the Anolis genome. https://doi.org/10.1371/journal.pbio.1001643.s018 (DOCX) Table S3. Average pairwise synonymous and non-synonymous divergence between all the species used in this study. https://doi.org/10.1371/journal.pbio.1001643.s019 (DOCX) Table S4. Divergence at synonymous and nonsynonymous sites between Z-W gametologs. https://doi.org/10.1371/journal.pbio.1001643.s020 (DOCX) Table S5. Synonymous divergence Ks between ZW paralogs using three other models to estimate Ks. https://doi.org/10.1371/journal.pbio.1001643.s021 (DOCX) Table S6. List of confirmed and putative W-linked transcripts in pygmy rattlesnake. 
Female and male RNA-seq and DNA-seq reads were mapped to the de novo transcriptome using Bowtie2, and the resulting alignments were used to estimate expression levels (FPKM, with Cufflinks) and genomic male and female read counts. Transcripts with male/female read counts below 0.1 and female-specific expression were classified as putative W-derived sequences (only transcripts with at least 30 genomic female reads were considered in this analysis). The PCR confirmation of W-linkage of some of the transcripts is described in Figure S11. https://doi.org/10.1371/journal.pbio.1001643.s022 (DOCX) Table S7. Detecting male-driven and faster-Z evolution using three other models to estimate Ka and Ks. Median Ka, Ks, and Ka/Ks values for boa-Anolis, pygmy rattlesnake-Anolis and garter snake-Anolis comparisons are given for autosomal macrochromosomes and for chromosome 6/Z. All estimates of rates of evolution were obtained using KaKs_calculator. https://doi.org/10.1371/journal.pbio.1001643.s023 (DOCX) Table S8. Number of RNA-seq and DNA-seq reads generated for/used in this analysis. https://doi.org/10.1371/journal.pbio.1001643.s024 (DOCX) Acknowledgments We thank Monica Albe, Mark Stenglein, and Bob Walton for providing samples.
Myelination Borrows a Trick from Phage doi: 10.1371/journal.pbio.1001626 pmid: 23966834
The 20,000–30,000 proteins encoded by our genomes all had to come from somewhere originally, and Nature is a great recycler. Although some sections of proteins have arisen recently by co-option of previously noncoding sequence, looking at most protein structures gives you a distinct feeling of déjà vu. Time and again handy functional modules appear in novel guises, doing sort of the same job, but in a different context. Two papers just published in PLOS Biology showcase a particularly spectacular example of such repurposing, where part of a protein from the tail of a bacteriophage (a bug's bug) reappears in a transcription factor that has a starring role in the electrical insulation of our nervous system. The same protein domain (ICA—green circle) mediates trimerization and self-cleavage in (A) the tail of a bacteriophage and (B) a transcription factor that drives insulation of nerves in the human brain. Image credit: Li, et al. https://doi.org/10.1371/journal.pbio.1001626.g001 Transcription factors generally regulate genes by directly recognizing specific short sequence motifs in nearby regions of the genome. Because the genome is stored in the nucleus, the vast majority of transcription factors either reside in the nucleus, or commute in from the cytoplasm in response to a regulatory cue. However, there's an exclusive little coterie of transcription factors that start life partially embedded in a cellular membrane, and need to be liberated by proteolytic cleavage before they can set off to work in the nucleus. In most cases studied so far, this cleavage event is the lynchpin of the entire regulatory pathway—the SREBP proteins, for instance, are cleaved when cholesterol levels drop too low; ATF6 remains membrane-bound until misfolded proteins start to accumulate in the endoplasmic reticulum. 
MYRF was identified in 2009 as a potential member of this select group, with what seemed to be the properties of both a transcription factor and a membrane protein—a DNA-binding domain reminiscent of yeast meiosis regulator Ndt80, plus a long run of oily amino acids characteristic of membrane-spanning anchors. MYRF is needed for the early steps of electrically insulating the axons of the neurons in our brain so that we can transmit nerve signals further, faster, and more efficiently. This process, known as myelination, is carried out by cells called oligodendrocytes that swaddle the delicate naked axons in fatty insulating sheaths. In the absence of MYRF, myelination stalls, with disastrous consequences for the axons. Two papers now published in PLOS Biology—one from Helena Bujalka, Matthias Koenning, Ben Emery, and colleagues, and one from Zhihua Li, Yungki Park, and Edward Marcotte—reveal that MYRF is a self-cleaving, membrane-dwelling transcription factor with an intriguing pedigree. In each study, the crucial step was the recognition of a further patch of protein homology, just next to the yeast transcription factor look-alike. Amazingly, this region resembles the intramolecular chaperone autoprocessing (ICA) domain of bacteriophage tailspike proteins. Bacteriophages look rather like the Apollo lunar landing module, and the tailspikes are the bits that mediate touchdown on the bacterial surface. The ICA domain is known to help the tailspike protein bunch together in threes (to “trimerize”) and to self-cleave to form an active enzyme that snips away at the bacterium's cell wall. Both groups then show conclusively that the ICA domain does exactly the same job in MYRF—it's essential for MYRF to trimerize and to cleave itself. The mechanism is presumably very similar, as mutating amino acids known to be important for phage protein self-cleavage or trimerization also affect the mammalian counterpart. 
The research groups then demonstrate that the cleavage is in turn needed to liberate the front half of the MYRF protein to make its way to the nucleus, leaving the back half behind at the endoplasmic reticulum. Once in the nucleus, the business end of MYRF seeks out sites in the genome that contain a simple seven-base-pair DNA sequence motif. These MYRF-recognized sites are preferentially located near genes known to be important for oligodendrocytes, the cells that myelinate neurons in the brain. MYRF-bound regions containing these motifs function as powerful enhancers of transcription in an oligodendrocyte-like cell line and in primary oligodendrocytes, and mutation of either the DNA motif or the Ndt80-like region of MYRF wipes out this effect. Together, these two papers present compelling evidence that MYRF starts life in the endoplasmic reticulum, cleaves itself from a membrane-bound stalk, and shuttles to the nucleus, where it collaborates with other transcription factors such as Sox10 and Olig2 to initiate a program of oligodendrocyte development and nerve axon insulation. Myelin is a vertebrate invention, though some other animals have come up with similar nervous system upgrades. But MYRF's intriguing story isn't limited to myelination: the eukaryotic tree of life is littered with MYRF relatives that share this crucial constellation of an Ndt80-like DNA-binding domain, a phage tailspike self-cleaving ICA domain, and a transmembrane tether. Sequence databases reveal that all vertebrates have two such proteins, and representatives are found in fruit flies, nematodes, sea anemones, and even the amoeba-like slime mold Dictyostelium. Some organisms with MYRF look-alikes lack nervous systems, let alone myelination, so these proteins can most likely be put to a range of uses.
An obvious puzzle is that MYRF seems to go to a great deal of trouble to undergo this liberation from the membrane, yet both research groups find that cleavage is constitutive (i.e., unconditional)—with other membrane-bound transcription factors the whole point of the cleavage is that it's exquisitely regulated, dependent on cellular conditions. Is this characteristic of MYRF merely because the conditions needed for cleavage are ever present in the experiments used by the authors? It's interesting that the nematode MYRF-related protein, pqn-47, seems to live in the endoplasmic reticulum—maybe its conditions for cleavage aren't yet met, and it's sitting there poised for action. And the slime mold version, MrfA, regulates an important developmental step in forming a stalk-like structure, presumably a time-critical process. Perhaps MYRF cleavage is conditional; we just don't know what that condition is yet. The cute aspect of all of this, however, is MYRF's evolutionary journey, and the authors speculate as to whether this is a case of horizontal gene transfer. A parsimonious interpretation is that perhaps an ancient interaction between a phage particle and a single-celled eukaryote led to the insertion of part of a tailspike protein gene into a eukaryote transcription factor gene, and that the resulting (conditional?) self-cleavage was useful enough to be retained for the next billion years. Now that gene's descendants are telling slime molds when to form stalks and helping our brains to work fast enough to read papers about MYRF's role in myelination. Li Z, Park Y, Marcotte EM (2013) A Bacteriophage Tailspike Domain Promotes Self-Cleavage of a Human Membrane-Bound Transcription Factor, the Myelin Regulatory Factor MYRF. 10.1371/journal.pbio.1001624
The CK2 Kinase Stabilizes CLOCK and Represses Its Activity in the Drosophila Circadian Oscillator
doi: 10.1371/journal.pbio.1001645 pmid: 24013921
Introduction Circadian oscillations of gene expression, physiology, and behavior are found in a wide range of organisms. They are governed by temporally regulated feedback loops in which transcription factors activate the expression of their own inhibitors. In the Drosophila circadian oscillator, the CLOCK (CLK) and CYCLE (CYC) bHLH-PAS domain transcription factors activate expression of the period (per) and timeless (tim) genes at the end of the day. The delayed accumulation of PER and TIM and their transfer to the nucleus leads to transcriptional repression of CLK/CYC during the late night. The repression phase is also shaped by other repressors/activators such as CLOCKWORK ORANGE (CWO) and KAYAK-α [1]. Subsequent degradation of PER and TIM repressors in the morning allows transcription to resume towards the evening [2],[3]. Controlled phosphorylation, ubiquitylation, and proteasome-dependent degradation of PER and TIM set the timing of their delayed accumulation and clearance. The PER protein is phosphorylated by the DOUBLETIME (DBT, CK1δ/ε), CK2, and NEMO kinases and polyubiquitylated by the SCFSlimb ubiquitin ligase complex [4]–[12]. TIM associates with PER, preventing its degradation, but TIM itself is subjected to phosphorylation and subsequent breakdown. TIM phosphorylation involves the CK2 and SHAGGY (SGG, GSK-3) kinases and TIM degradation also depends on SCFSlimb and a CULLIN-3-based ubiquitin ligase complex [7],[8],[13]–[15]. Phosphatase activity counterbalances the effects of the aforementioned and probably also of other kinases: PP2A regulates PER abundance, while PP1 targets both PER and TIM [16],[17]. CLK phosphorylation cycles with a peak in the morning and a minimum in the early night [18]–[21]. Similarly, CLK immunoreactivity in head extracts or brain tissue seems to oscillate in phase with its phosphorylation [21]–[23], although harsh extraction liberates chromatin-bound CLK, which results in relatively constant CLK levels [20],[24],[25]. 
Whether oscillations of CLK immunoreactivity in neurons reflect rhythmic changes of total CLK protein amount is still unclear [23],[26]. Due to the cyclic regulation of CLK as opposed to constitutive expression of CYC, the CLK protein appears to represent the key rhythmic component of the circadian activator in Drosophila [27]. CLK DNA-binding and transcriptional activity show a robust oscillation with an evening peak that is associated with the rapid increase of per and tim mRNA levels [20],[22]. The release of CLK from DNA goes hand in hand with its hyperphosphorylation, which depends on both PER and DBT [19],[20],[22]. Since kinase activity of DBT does not seem to be required for hyperphosphorylation, it was proposed that DBT acts as an interface for the recruitment of other kinases into a complex with CLK [28]. The PER kinase NEMO destabilizes CLK in vivo and might thus be a CLK kinase [29]. CLK transcriptional activity in cultured cells is affected by calcium/calmodulin-dependent kinase II and mitogen-activated protein kinase [30]. Ubiquitylation is also involved in the regulation of CLK and BMAL1, the CYC ortholog in mammals [31],[32]. In Drosophila, USP8 was recently reported to decrease CLK activity by deubiquitylation [25]. The CK2 kinase has a key function in the clockwork of various organisms [33]. In Neurospora, CK1 and CK2 phosphorylate both the White Collar Complex (WCC) transcriptional activator as well as its inhibitor FREQUENCY (FRQ) to control their activity, subcellular localization, and stability [34]–[36]. In mammals, CK2 and CK1 destabilize PER2, although phosphorylation at specific CK2 target sites stabilizes the protein [37],[38]. The CK2 holoenzyme is formed by a tetrameric complex consisting of two catalytic (α) and two interacting regulatory (β) subunits [39]. The β subunits stabilize the α subunits that possess constitutive kinase activity. 
Phosphorylation of most substrates is enhanced by CK2β, while some substrates are more efficiently phosphorylated by free CK2α in the absence of CK2β [40]. In Drosophila, CK2α and CK2β affect PER and TIM abundance and subcellular localization, which correlates with a direct phosphorylation of both proteins by the CK2 holoenzyme in vitro [8],[9],[41]–[43]. The dominant-negative CK2αTik mutation strongly increases TIM stability even in the absence of PER, supporting TIM as the main target of CK2 [15]. The CK2αTik protein overexpression induces hyperphosphorylation of TIM that could be explained by enhanced phosphorylation or reduced dephosphorylation of TIM by other kinases and phosphatases [15]. Since the identity of the kinases involved in the control of CLK phosphorylation remains unclear, we asked whether CK2 plays a role in the phosphorylation and regulation of CLK. Our results indicate that inhibition of CK2α activity strongly increases CLK degradation, whereas CK2β does not affect CLK stability. The CK2 holoenzyme is recruited onto PER, TIM, and CLK mainly during late night, inducing CLK hyperphosphorylation in vivo and CK2 phosphorylates CLK in vitro. Specific CLK activity is increased in dominant-negative CK2αTik-expressing flies indicating repression of CLK by CK2α. Our findings define, to our knowledge, the first bona fide kinase of Drosophila CLK that plays a role in its degradation and hyperphosphorylation. The unstable but strongly active CLK acquired by CK2α inhibition joins the club of other circadian transcription factors with similar properties such as the WCC complex in Neurospora. Results CK2α Activity Promotes CLK Protein Phosphorylation and Stability A putative role of CK2 in CLK regulation was first addressed by analyzing head extracts of flies expressing a dominant-negative version of the CK2α catalytic subunit. 
As previously reported [15],[42], w;tim-gal4;UAS-CkIIαTik flies (hereafter tim>Tik flies) were behaviorally arrhythmic (Table 1) and displayed weak and strongly delayed PER and TIM oscillations, with high levels of mildly phosphorylated PER and highly phosphorylated TIM (Figure 1A). As CLK efficiently binds to DNA in the evening, the estimation of CLK levels through the circadian cycle is affected by extraction conditions. In sonicated head extracts, CLK protein has been shown to stay at constant levels, in contrast to a robust cycle of its phosphorylation [19],[20]. However, the existence of oscillations in CLK levels remains debated [22]–[25]. In our hands, sonicated extracts of control flies showed weak cycling of CLK levels, although peak time was rather variable between experiments (Figures 1A and S1A). Nonsonicated extracts always showed CLK levels cycling with a trough in the evening (Figure S1B). Importantly, both sonicated and nonsonicated extracts of tim>Tik flies showed very low CLK levels with reduced phosphorylation on the first day of constant darkness (DD) (Figures 1A and S1A and S1B). In order to better estimate CLK levels in tim>Tik flies, sonicated extracts were treated with λ protein phosphatase (Figure S1C). Again, a strong decrease of unphosphorylated CLK abundance was observed in tim>Tik animals. Moreover, Clk mRNA levels were about 1.5-fold higher in tim>Tik flies than in controls (Figure 1B), indicating that low CLK protein levels are not a consequence of reduced Clk expression. Consequently, the protein/mRNA ratio for CLK decreased to approximately 10% in tim>Tik (Figure S1D). Immunolabeling of whole-mount brains of tim>Tik flies also supported a strong reduction of CLK levels in the small ventral lateral neurons (s-LNvs) (Figure 1C), with no change in its nuclear-only localization (not shown).
Figure 1. CK2α inhibition triggers CLK degradation.
(A, D) Western blot of sonicated head extracts from flies collected at DD1. Time (h) is indicated as CT. Gray and black bars represent subjective day and subjective night, respectively. A Coomassie Blue (CB) stained band in the size range of CLK is used as a loading control for blots run on 4% gels. Brackets indicate hypo- and hyperphosphorylated forms of CLK. At least two independent experiments were performed for each blot. (A) Two copies of tim-gal4 and two copies of UAS-CkIIαTik transgene were used for the experimental genotype. (B) Quantitative RT-PCR measurements of Clk mRNA levels in heads of flies collected at DD1. Results were averaged from at least four independent experiments. Error bars indicate the s.e.m. Averaged values were normalized to the CT2 control averaged value set to 100. (C) Quantification of CLK immunofluorescence in the PDF-expressing s-LNvs. Fluorescence index is given in arbitrary units. Error bars indicate s.e.m. (D) Flies were entrained and collected at 29°C. One copy of tim-gal4 and two copies of the CkIIα RNAi construct were used for the experimental genotype. https://doi.org/10.1371/journal.pbio.1001645.g001
Table 1. Locomotor activity rhythms in constant darkness of flies. https://doi.org/10.1371/journal.pbio.1001645.t001
To independently analyze the effect of decreasing CK2α activity on CLK, a UAS-CkIIα RNA interference (RNAi) construct was expressed under tim-gal4 control. Adult flies were kept at 29°C to increase Gal4-dependent expression. CLK in sonicated head extracts of w;tim-gal4/+;UAS-CkIIα-RNAi (tim>CkIIα-RNAi) flies showed a similar phenotype to that of tim>Tik flies, with reduced phosphorylation and levels throughout the cycle (Figure 1D). Dampened and delayed PER and TIM oscillations were observed with increased protein levels during the day. In support of its specificity, the induction of RNAi reduced CK2α protein levels (Figure S1E).
Although high mortality of tim>CkIIα-RNAi flies after long incubation at high temperature prevented the assessment of their locomotor activity rhythms at 29°C, they displayed long period rhythms at 25°C (Table 1). Similar long period rhythms are observed in heterozygous w;tim-gal4/+;UAS-CkIIαTik/+ flies (Table 1), as previously reported [42]. CK2α Stabilizes CLK in the Absence of PER and TIM TIM was described as the likely primary target of CK2α in the circadian clock, effects elicited on PER being only secondary [15]. We thus asked whether TIM was required for CK2α effects on CLK, by comparing the profile of CLK protein in sonicated head extracts of tim01 and tim01 tim>Tik flies. In the absence of TIM, the CK2αTik protein induced a prominent reduction in CLK phosphorylation and a significant decrease of protein levels (Figure 2A). Furthermore, Clk mRNA levels were about four times higher in tim0 tim>Tik compared to tim0, supporting a strong degradation of the CLK protein in tim0 tim>Tik flies (Figure 2A). Since a PER/DBT complex controls CLK phosphorylation [20],[28], we asked whether PER was required for CLK modifications by CK2α, even in the absence of TIM. Effects of the CK2αTik protein were thus analyzed in per0 tim0 double mutants, where CLK appeared minimally phosphorylated in a CkIIα+ background (Figures 2B and S2A). CLK levels and phosphorylation were further diminished in the presence of the CK2αTik protein (Figures 2B and S2A). Since the CK2αTik protein overproduction increased Clk mRNA levels by about twofold in per0 tim0 double mutants, the CLK protein/mRNA ratio was reduced just as in tim0 mutants (Figure 2B). A similar decrease of CLK protein levels was observed in per0 tim>Tik flies (Figure S2B) despite increased Clk mRNA levels, suggesting that CLK protein was again strongly destabilized in the absence of PER. These observations reveal that CK2α stabilizes CLK in the absence of PER and/or TIM. 
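The CLK protein/Clk mRNA ratios reported in these experiments come down to simple arithmetic: each measurement is expressed as a percentage of its control, and the protein percentage is divided by the mRNA percentage. A minimal Python sketch of that normalization follows. The function name and the input numbers are illustrative assumptions (picked to match the roughly 1.5-fold mRNA increase and roughly 10% ratio quoted earlier for tim>Tik flies); they are not measured values, and this is not the authors' analysis code.

```python
def normalized_ratio(protein, protein_ctrl, mrna, mrna_ctrl):
    """Return the protein/mRNA ratio as a percentage of the control ratio."""
    if min(protein_ctrl, mrna, mrna_ctrl) <= 0:
        raise ValueError("control values and mRNA level must be positive")
    protein_pct = 100.0 * protein / protein_ctrl  # protein as % of control
    mrna_pct = 100.0 * mrna / mrna_ctrl           # mRNA as % of control
    return 100.0 * protein_pct / mrna_pct         # control ratio equals 100


# Hypothetical inputs: protein at 15% of control, mRNA at 150% of control.
ratio = normalized_ratio(protein=15.0, protein_ctrl=100.0,
                         mrna=150.0, mrna_ctrl=100.0)
print(round(ratio, 1))
```

With these assumed inputs the ratio comes out at about 10% of control, which illustrates why a fall in protein combined with a rise in mRNA points to a change in protein stability rather than in expression.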
Since CLK phosphorylation is further decreased by CK2αTik expression, CK2α is important for the PER/TIM-independent minimal phosphorylation program of CLK.
Figure 2. PER and TIM-independent degradation of CLK in CK2αTik overexpressing flies. (A, B) Western blot of sonicated head extracts from flies collected at DD1. A CB stained band in the size range of CLK is used as a loading control. At least two independent experiments were performed for each blot. (A) Comparison of CLK protein in tim>Tik and control flies in tim0 background. (Left) Comparison between tim>Tik and controls in tim0 background for PER and CLK at CT2. w; tim0 tim-gal4 (tim0 +) and w; tim0 tim-gal4; UAS-CkIIαTik (tim0 CkIIαTik) were used. a and b are different protein extracts from the same genotype at the same time point. We loaded 100 µg of extracts. (Middle) Quantitative RT-PCR measurements of Clk mRNA levels in heads of flies collected at DD1. Results are means of pooled values from two time points (CT2 and 14) with at least two independent samples for each time point. Error bars indicate s.e.m. Average values were normalized to the mean of the control (w; tim0 tim-gal4) set to 100. Previous analysis of separate values at CT2 and CT14 indicated that they were similar (Table S1) justifying their common treatment (see above). (Right) CLK protein/Clk mRNA ratio calculated from mean CT2–CT14 values of Western blot quantification and quantitative RT-PCR data. Ratios were normalized to the control (w; tim0; tim-gal4) set to 100. Abbreviations as in (A). (B) Comparison of CLK protein in tim>Tik and control flies in per0 tim0 background.
(Left) Genotypes: per0 w; tim0 (per0 tim0) and per0 w; tim0 tim-gal4; UAS-CkIIαTik (per0 tim0 CkIIαTik) as well as w;;ClkJrk. a, b, and c are different protein extracts from the same genotype at the same time point. (Middle) Quantitative RT-PCR measurements of Clk mRNA levels in head extracts of flies collected at CT2. Mean values +/− s.e.m. from at least three independent experiments are shown with the per0 tim0 control set to 100. (Right) CLK protein/Clk mRNA ratio calculated with mean values of Western blot quantification and quantitative RT-PCR data at CT2. Ratios were normalized to the control (per0 tim0) set to 100. (C) Cycloheximide chase of CLK degradation in the presence of CK2α overexpression. per and tim dsRNA was applied to S2 cells prior to transfection. We transfected 1 µg pAc-Clk-V5/His6 with or without 3 µg of the FMO02931 CK2α expression vector. Transfections were split in four equal volumes for the degradation assay. “h in CHX” indicates the hours for which respective cells were incubated in cycloheximide to stop protein synthesis. Immunoreactivity against MHC (myosin heavy chain) was used as loading control. Anti-V5, anti-CK2α, and anti-HA were used to reveal CLK and CK2α, respectively. (Left) Western blots for CLK alone (CLK) and CLK cotransfected with CK2α (CLK+ CK2α). (Right) CK2α expression from the FMO02931 plasmid after 1 d of induction. (Bottom) Average degradation profiles from three independent experiments (as shown on the left). Error bars represent s.e.m. https://doi.org/10.1371/journal.pbio.1001645.g002 The up-regulation of Clk mRNA levels in tim>Tik flies suggested that CK2α could repress Clk transcription. To test whether Clk transcription was affected in the tim>Tik genotype, Clk pre-mRNA levels were estimated. They were not increased in tim>Tik flies compared to controls, although a reduced antiphasic cycling was observed at DD1 (Figure S2C). 
The antiphasic oscillation was reminiscent of PER and TIM oscillations persisting in these flies (see Figure 1A). The increase of mature Clk mRNA levels in tim>Tik flies thus seems not to be the consequence of higher Clk gene transcription and rather supports a posttranscriptional control of Clk mRNA by CK2α. In agreement with a posttranscriptional control, the VRI and PDP1 regulators of Clk transcription were not affected in per0 tim>Tik flies (Figure S2D). Finally, since the transcriptional regulation of the cryptochrome (cry) gene is similar to the one of the Clk gene [44], we tested cry mRNA levels in per0 tim>Tik flies. No increase of cry mRNA levels was observed in the presence of the CK2αTik protein (Figure S2E), supporting a specific control of Clk mRNA levels by CK2α. The data from tim>Tik flies strongly suggested that CK2α controls CLK stability independently from PER and TIM. To obtain direct evidence for this, CLK degradation kinetics were analyzed in a cycloheximide (CHX) chase-based assay in Drosophila Schneider 2 (S2) cells. Since transfected V5-tagged CLK induced both per and tim expression in S2 cells in our hands, we used RNAi against per and tim to eliminate any effect of PER and TIM proteins. After blocking protein synthesis with CHX, CLK showed robust degradation during the following 9 h (Figure 2C). When FLAG-HA-tagged CK2α was co-expressed, CLK degradation proceeded very slowly. The increase of CK2α levels by exogenous expression was rather limited in these conditions, indicating that a small increase in total CK2α protein can have substantial effects on CLK degradation. These results confirm the in vivo observations and strongly support a role for CK2α in the inhibition of CLK breakdown. CK2β Does Not Influence CLK Stability Since inhibition of CK2α affected CLK stability and phosphorylation, we asked whether CK2β knockdown would have similar effects. Pdf-gal4 UAS-CkIIβ-RNAi/+ flies have been reported to display long period rhythms [45]. 
Driving two CkIIβ-RNAi transgenes under the control of tim-gal4 (hereafter tim>CkIIβ-RNAi flies, see Materials and Methods) induced behavioral arrhythmicity (Table 1). The specificity of the CkIIβ RNAi was first behaviorally assessed by rescue experiments involving CkIIβ RNAi under the control of the strong PDF+ cell driver gal1118 [46] and the co-expression of different CK2β isoforms. The strongly altered behavior of w;; gal1118/UAS-CkIIβ-RNAi could be rescued by overexpression of the VIIa, VIIb, and VIIc CK2β isoforms (see [47]) (Table 1). Western blots against CK2β revealed a reduction in two isoforms in tim>CkIIβ-RNAi animals, while a third isoform remained unaffected (Figure S3A). TIM and PER cycling was profoundly altered in head extracts of tim>CkIIβ-RNAi flies at DD1 (Figures 3A and S3B). In contrast, CLK oscillations were only slightly affected. In particular, tim>CkIIβ-RNAi flies did not show the pronounced decrease in CLK levels that was observed in tim>Tik flies. Furthermore, CK2β depletion in a per0 background did not result in a marked reduction of CLK phosphorylation or quantity (Figure 3B). Since equivalent levels of Clk mRNA were observed in per0 flies with or without CkIIβ RNAi expression, their protein/mRNA ratios were identical (Figure 3C and D), in contrast to tim>Tik flies. Similarly, CLK was not affected when CkIIβ RNAi was expressed in a tim0 background (not shown). In conclusion, although CK2α and CK2β proteins similarly affect TIM and PER accumulation and phosphorylation, the CK2β subunit does not seem to be required for CK2α to control CLK degradation and phosphorylation.
Figure 3. CK2β does not contribute to the inhibition of CLK degradation. (A, B) Western blot of nonsonicated head extracts from flies collected at DD1. Gray and black bars represent subjective day and subjective night, respectively. A CB stained band in the size range of CLK is used as a loading control.
Brackets indicate hypo- and hyperphosphorylated forms of CLK. At least two independent experiments were performed for each blot. (A) Comparison between tim > CkIIβ RNAi (w; tim-gal4/106845; 32377/+) and tim-gal4/+ controls in a per+ background, for TIM, PER, and CLK proteins. (B) Comparison between tim > CkIIβ RNAi and tim-gal4/+ controls in a per0 background, for TIM and CLK. a and b are different protein extracts from the same genotype at the same time point. (C) Quantitative RT-PCR measurements of Clk mRNA levels in heads of flies collected at DD1. Results shown are means of pooled values from two time points (CT2 and 14, which gave similar values) with two independent samples for each time point. Error bars indicate s.e.m. Average values were normalized to the mean control (per0 w; tim-gal4/+) values set to 100. (D) CLK protein/Clk mRNA ratio calculated from mean CT2–CT14 values of Western blot (B) and quantitative RT-PCR (C) data. Ratios were normalized to the control (per0 w; tim-gal4/+) ratios set to 100. https://doi.org/10.1371/journal.pbio.1001645.g003 CK2α Preferentially Forms Complexes with Highly Phosphorylated CLK in the Morning The strong effects of CK2α inhibition on CLK suggested that the two proteins might physically interact. Flies expressing a FLAG-tagged CK2α protein under tim-gal4 control displayed strong behavioral rhythms with a 1 h period lengthening (Table 1). Anti-FLAG immunoprecipitation experiments were performed from FLAG-CK2α-expressing fly head extracts at different circadian times and showed co-immunoprecipitation of TIM, PER, and CLK mostly at the end of the subjective night and in the subjective morning when these proteins are mainly hyperphosphorylated (Figure 4A). Although relatively abundant medium-phosphorylated clock proteins were observed in the extracts at CT16, they were poorly co-immunoprecipitated with CK2α. 
The CK2α subunit thus appears to preferentially make complexes with highly phosphorylated forms of TIM, PER, and CLK. Flies expressing FLAG-tagged CK2β were also behaviorally rhythmic with a slightly lengthened period (Table 1), and FLAG-CK2β expression could rescue the severe period lengthening induced by CkIIβ RNAi (Table 1), indicating that the tagged protein was functional. Similarly to CK2α, CK2β was found to be associated with hyperphosphorylated TIM, PER, and CLK in the late subjective night and in the subjective morning, whereas only small amounts of these proteins were co-immunoprecipitated at other circadian times (Figure 4B). The results thus suggest that CK2 holoenzyme is involved in the hyperphosphorylation of CLK, PER, and TIM in the late night/morning part of the cycle.
Figure 4. PER, TIM, and CLK are found in protein complexes containing CK2. Anti-FLAG immunoprecipitation (IP) from nonsonicated head extracts of flies collected at DD1. FLAG-CK2α or FLAG-CK2β was immunoprecipitated from 1 mg of total protein extracts from heads and 50% of the precipitate was subjected to Western blot analysis. We loaded 50 µg of head extracts as input controls. IgG LC indicates the immunoglobulin G light chain used for the precipitation that is well detected by the anti-mouse secondary antibody. Gray and black bars represent subjective day and subjective night, respectively. Experiments were performed at least twice. (A, Left) FLAG-CK2α was immunoprecipitated from tim >FLAG-CkIIα flies and tim-gal4/+ negative controls (control) in a per+ background. IP of CK2α (FLAG) and co-IP of CK2β, TIM, PER, and CLK were visualized by immunoblotting. An asterisk (*) marks a nonspecific band recognized by the anti-PER antibody. (Right) TIM, PER, and CLK immunoblots of the corresponding inputs on per+ background. The time “CT10” indicates a mixed population of flies harvested at CT8 and CT12.
A CB stained band in the size range of CLK is used as a loading control. TIM was run on a 3–8% Tris-Acetate gel. (B, Left) FLAG-CK2β was immunoprecipitated from tim > FLAG-CkIIβ flies and tim-gal4/+ negative controls (control) in a per+ background. IP of CK2β (with anti-CK2β) and co-IP of CK2α, TIM, PER, and CLK were visualized by immunoblotting. (Right) TIM, PER, and CLK immunoblots of the corresponding inputs. A CB stained band in the size range of CLK is used as a loading control. (C) FLAG-CK2α or FLAG-CK2β was immunoprecipitated from tim >FLAG-CkIIα (α) flies and tim-gal4/+ negative controls (C) in a per0 background. co-IP of CLK was visualized by immunoblotting. Input samples and immunoprecipitates were run on the same gel for C and α. (D) Image showing PDF (green), CK2α (magenta), and CLK (blue) fluorescent immunolabeling in small PDF+ LNv-s of an adult fly at ZT3. The fourth square is a composite picture of the three stainings. Single optical planes are shown taken by confocal microscopy. https://doi.org/10.1371/journal.pbio.1001645.g004
Since CK2α strongly influences CLK stability in the absence of PER, anti-FLAG immunoprecipitations were also done in per0 tim>FLAG-CkIIα flies (Figure 4C). Minute amounts of hypophosphorylated CLK were co-immunoprecipitated in per0 extracts, nevertheless indicating that CLK-CK2α complexes may exist in the absence of PER. Conversely, CK2β did not co-immunoprecipitate with CLK in a per0 background (Figure 4C). The poor detection of CK2α–CLK complexes in the absence of PER suggested a very labile interaction between the two proteins or indirect PER-independent effects of CK2α on CLK. CK2 subunits preferentially associate with clock proteins at times when those are present in the nucleus. CK2α was, however, described to localize to the cytoplasm of LNv-s [8]. We therefore set out to investigate whether CK2α could be present in the nucleus of LNv-s as well.
Whole-mount adult brains were stained with an anti-CK2α antibody together with anti-PDF and anti-CLK. PDF is known to be exclusively cytoplasmic [48], while CLK is almost completely nuclear in our hands (see also [26]). Although CK2α predominantly localized to the cytoplasm of s-LNv-s, a fine cloud of CK2α staining co-localized with CLK to the nucleus (Figure 4D). CK2α Increases CLK Phosphorylation in a PER-Dependent Manner To further decipher the function of CK2α in CLK phosphorylation, CLK protein was analyzed in flies overexpressing wild-type CK2α. As previously reported [41], CK2α overexpression induced a modest lengthening of the behavioral period (Table 1). w;tim-gal4/UAS-CkIIα (tim>CkIIα) flies showed subtle changes of PER and TIM oscillations with a slightly delayed degradation of the TIM (CT 4–8) and PER (CT 8) proteins during daytime at DD1 (Figures 5A and S4A–B). CLK levels in CK2α overexpressing head extracts were higher at CT0 and lower at CT12 compared to controls, but overall protein levels were not significantly affected (Figure 5A). In contrast, CLK phosphorylation was strongly altered, with forms always more phosphorylated than the wild-type minimal phosphorylation that is observed at CT12 (Figure 5A). CLK phosphorylation was not increased by CK2α overexpression in a per0 background (Figure 5B), indicating that CLK hyperphosphorylation by CK2α required PER. The results thus support a PER-dependent hyperphosphorylation of CLK by CK2α, whereas CLK hypophosphorylation and stability appear to be mostly controlled by a PER-independent CK2α function.
Figure 5. CK2α overexpression induces CLK hyperphosphorylation in the presence of PER. (A, B) Western blot of nonsonicated head extracts from flies collected at DD1. Samples were run on 3–8% Tris-Acetate gels in order to better resolve hyperphosphorylated CLK and TIM forms.
Gray and black bars represent subjective day and subjective night, respectively. At least two independent experiments were performed for each blot. (A, Top) Comparison between tim > CkIIα and tim-gal4/+ controls in a per+ background, for TIM, PER, and CLK proteins. (Bottom) Two independent experiments as above were quantified for CLK abundance, and the mean values are plotted. Error bars stand for the difference of the respective values from each experiment and their mean. The value of w; tim-gal4/+ at CT0 was normalized to 100. (B) Comparison between per0; tim > CkIIα and per0; tim-gal4/+ controls for CLK. (C) CK2α phosphorylates CLK in vitro. (Top) Wild-type CLK was translated with an N-terminal 6-histidine fusion tag in vitro, affinity purified either in the absence or presence of PER or TIM, and subjected to phosphorylation assays by incubation with γ–32P-ATP either in the absence (−) or presence (+) of CK2α. Intensity of incorporated 32P-phosphate into CLK (32P) was analyzed by autoradiography and total CLK protein levels (CLK) were determined by Western blot analysis. The arrow indicates the position of phosphorylated CLK. (Bottom) Quantification of CLK-incorporated 32P-phosphate after normalization towards total CLK protein levels. Average CLK phosphorylation from at least three experiments ± s.e.m. is shown in the figure with wild-type CLK set to 100. https://doi.org/10.1371/journal.pbio.1001645.g005
The CLK phosphorylation defects in flies with altered CK2α functions and the presence of CLK-CK2α/β complexes suggested that CK2 might directly phosphorylate CLK. We thus asked whether the CK2 holoenzyme could phosphorylate CLK in vitro. Indeed, CLK was phosphorylated by CK2, and the presence of PER increased CLK phosphorylation by about twofold (Figure S4C). Addition of TIM protein did not affect the CK2-dependent phosphorylation of CLK.
When only the CK2α catalytic subunit was used for the in vitro assay, CLK was phosphorylated with a similar efficiency and showed the same PER-mediated facilitation of its phosphorylation (Figure 5C). This confirms the in vivo results indicating that at least some of the CK2α effects on CLK phosphorylation do not require CK2β, and supports a direct phosphorylation of CLK by CK2α. CK2α Decreases CLK Transcriptional Activity The strong influence of CK2α on CLK phosphorylation and stability suggests that CLK-dependent transcription could be affected in flies with altered CK2α activity. As previously reported [42], intermediate levels of per and tim mRNAs were observed in tim>Tik flies (Figure 6A). Download: PPT PowerPoint slide PNG larger image TIFF original image Figure 6. CK2α decreases CLK transcription factor activity. (A–D) Quantitative RT-PCR measurements of per, pre–per, tim, and pre–tim mRNA levels in heads of flies collected at DD1. Error bars indicate s.e.m. (A) per and tim mRNA levels in tim>Tik and control flies. Values were normalized to the maximum value (control at CT12) set to 100. Mean mRNA levels +/− s.e.m. from at least three independent experiments are shown. (B) Quantitative RT-PCR measurements of per and tim pre-mRNA levels in heads of tim >Tik and control flies collected at DD1. Average values from three independent experiments were normalized to the control (w; tim-gal4) mean value at CT12 set to 100. Error bars represent s.e.m. (C) Relative qPCR values from Figure 6B were averaged from the four indicated time points (CT0, CT6, CT12, and CT18) for each genotype and for each pre-messenger and the mean value was plotted. 100 stands for the highest pre-mRNA expression in the respective genotype at CT12. (D) per and tim mRNA levels in tim > Tik and controls in a per0 or tim0 background. Results are means of pooled values from two time points (CT2 and 14). Values were normalized to the corresponding controls set to 100. 
Previous analysis of separate values at CT2 and CT14 indicated that they were similar (Table S1), justifying their common treatment (see above). Average results from at least three independent experiments are shown. (E) Luciferase activity assay in the presence or absence of CK2α overexpression. S2 cells were transfected, harvested, and measured as described in Materials and Methods. We transfected 5 ng pAc-Clk-V5, 10 ng p3x69-luc, and 10 ng pAc-Renilla luciferase with or without 5 or 15 ng of the FMO02931 CK2α expression vector. Mean luciferase activity +/− s.e.m. of at least four different samples from two independent experiments are shown. CLK-dependent luciferase expression in the absence of CK2α co-expression was set to 100. 1× CK2α indicates 5 ng FMO02931, and 3× CK2α stands for 15 ng FMO02931. Student's t test (unpaired, two-tailed) was applied to the respective groups, * p = 0.0315, ** p = 0.00118. https://doi.org/10.1371/journal.pbio.1001645.g006 per and tim pre-mRNA levels were measured to more directly estimate CLK transcriptional activity. Average nonoscillating levels of pre–per and pre–tim were observed in tim>Tik flies (Figure 6B,C), despite the very reduced amounts of CLK protein (see Figure 1A). It thus suggested that CLK transcriptional activity was strongly increased when CK2α activity was diminished. Flies expressing the CK2αTik protein in a per0, tim0, or per0 tim0 double mutant background revealed no statistically significant changes in per and tim mRNA levels compared to wild-type CK2α controls (Figures 6D and S5A). However, CLK protein levels were reduced to 25–50% in CK2αTik expressing flies (see Figures 2 and S2), suggesting that specific CLK activity was still increased. Effects of CK2α on the transcriptional activity of CLK thus appears to be at least partly independent of PER and TIM. per and tim mRNA levels were also measured in tim>CkIIβ-RNAi flies and showed intermediate levels compared to controls (Figure S5B). 
Since CLK quantity is unaffected by tim>CkIIβ-RNAi (Figure 3A), CK2ß does not strongly modify CLK transcriptional activity. Finally, CLK-dependent transcription was tested by CLK-induced reporter gene expression in S2 cells. On a CLK-binding synthetic minimal enhancer composed of three per-derived E-boxes, CLK-dependent transcription was decreased in a dose-dependent manner by CK2α co-expression (Figure 6E). Since CLK quantity was increased in the presence of CK2α overexpression (see Figure 2C), the transcriptional decrease could hardly be a consequence of lower CLK levels. CK2α Activity Promotes CLK Protein Phosphorylation and Stability A putative role of CK2 in CLK regulation was first addressed by analyzing head extracts of flies expressing a dominant-negative version of the CK2α catalytic subunit. As previously reported [15],[42], w;tim-gal4;UAS-CkIIαTik flies (hereafter tim>Tik flies) were behaviorally arrhythmic (Table 1) and displayed weak and strongly delayed PER and TIM oscillations, with high levels of mildly phosphorylated PER and highly phosphorylated TIM (Figure 1A). As CLK efficiently binds to DNA in the evening, the estimation of CLK levels through the circadian cycle is affected by extraction conditions. In sonicated head extracts, CLK protein has been shown to stay at constant levels, in contrast to a robust cycle of its phosphorylation [19],[20]. However, the existence of oscillations in CLK levels remains discussed [22]–[25]. In our hands, sonicated extracts of control flies showed weak cycling of CLK levels, although peak time was rather variable between experiments (Figures 1A and S1A). Nonsonicated extracts always showed CLK levels cycling with a trough in the evening (Figure S1B). Importantly, both sonicated and nonsonicated extracts of tim>Tik flies showed very low CLK levels with reduced phosphorylation on the first day of constant darkness (DD) (Figures 1A and S1A and S1B). 
In order to better estimate CLK levels in tim>Tik flies, sonicated extracts were treated with λ protein phosphatase (Figure S1C). Again, a strong decrease of unphosphorylated CLK abundance was observed in tim>Tik animals. Moreover, Clk mRNA levels were about 1.5-fold higher in tim>Tik flies than in controls (Figure 1B), indicating that low CLK protein levels are not a consequence of reduced Clk expression. Consequently, the protein/mRNA ratio for CLK decreased to approximately 10% in tim>Tik (Figure S1D). Immunolabeling of whole-mount brains of tim>Tik flies also supported a strong reduction of CLK levels in the small ventral lateral neurons (s-LNvs) (Figure 1C), with no change in its nuclear-only localization (not shown).
Figure 1. CK2α inhibition triggers CLK degradation. (A, D) Western blot of sonicated head extracts from flies collected at DD1. Time (h) is indicated as CT. Gray and black bars represent subjective day and subjective night, respectively. A Coomassie Blue (CB) stained band in the size range of CLK is used as a loading control for blots run on 4% gels. Brackets indicate hypo- and hyperphosphorylated forms of CLK. At least two independent experiments were performed for each blot. (A) Two copies of tim-gal4 and two copies of UAS-CkIIαTik transgene were used for the experimental genotype. (B) Quantitative RT-PCR measurements of Clk mRNA levels in heads of flies collected at DD1. Results were averaged from at least four independent experiments. Error bars indicate the s.e.m. Averaged values were normalized to the CT2 control averaged value set to 100. (C) Quantification of CLK immunofluorescence in the PDF-expressing s-LNvs. Fluorescence index is given in arbitrary units. Error bars indicate s.e.m. (D) Flies were entrained and collected at 29°C. One copy of tim-gal4 and two copies of the CkIIα RNAi construct were used for the experimental genotype.
https://doi.org/10.1371/journal.pbio.1001645.g001
Table 1. Locomotor activity rhythms in constant darkness of flies. https://doi.org/10.1371/journal.pbio.1001645.t001
To independently analyze the effect of decreasing CK2α activity on CLK, a UAS-CkIIα RNA interference (RNAi) construct was expressed under tim-gal4 control. Adult flies were kept at 29°C to increase Gal4-dependent expression. CLK in sonicated head extracts of w;tim-gal4/+;UAS-CkIIα-RNAi (tim>CkIIα-RNAi) flies showed a similar phenotype to that of tim>Tik flies, with reduced phosphorylation and levels throughout the cycle (Figure 1D). Dampened and delayed PER and TIM oscillations were observed with increased protein levels during the day. In support of its specificity, the induction of RNAi reduced CK2α protein levels (Figure S1E). Although high mortality of tim>CkIIα-RNAi flies after long incubation at high temperature prevented the assessment of their locomotor activity rhythms at 29°C, they displayed long period rhythms at 25°C (Table 1). Similar long period rhythms are observed in heterozygous w;tim-gal4/+;UAS-CkIIαTik/+ flies (Table 1), as previously reported [42].
CK2α Stabilizes CLK in the Absence of PER and TIM
TIM was described as the likely primary target of CK2α in the circadian clock, effects elicited on PER being only secondary [15]. We thus asked whether TIM was required for CK2α effects on CLK, by comparing the profile of CLK protein in sonicated head extracts of tim01 and tim01 tim>Tik flies. In the absence of TIM, the CK2αTik protein induced a prominent reduction in CLK phosphorylation and a significant decrease of protein levels (Figure 2A). Furthermore, Clk mRNA levels were about four times higher in tim0 tim>Tik compared to tim0, supporting a strong degradation of the CLK protein in tim0 tim>Tik flies (Figure 2A).
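The protein/mRNA comparison used here can be made explicit with a small numerical sketch. The Python snippet below is an illustration only and is not part of the original analysis: the function names and all replicate values are hypothetical, chosen so that protein drops by half while mRNA rises fourfold. It normalizes each mean to the control set to 100, as described in the figure legends, and then takes the protein/mRNA ratio.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean of replicate measurements."""
    return stdev(values) / sqrt(len(values))


def normalize_to_control(sample_mean, control_mean):
    """Express a mean relative to the control mean, with the control set to 100."""
    return 100.0 * sample_mean / control_mean


def protein_mrna_ratio(protein_norm, mrna_norm):
    """Protein/mRNA ratio; the control (100/100) corresponds to 100."""
    return 100.0 * protein_norm / mrna_norm


# Hypothetical replicate values (arbitrary units), NOT data from the study.
control_protein = [1.0, 1.2, 0.8]
mutant_protein = [0.5, 0.6, 0.4]
control_mrna = [1.0, 0.9, 1.1]
mutant_mrna = [4.0, 3.6, 4.4]

protein_norm = normalize_to_control(mean(mutant_protein), mean(control_protein))
mrna_norm = normalize_to_control(mean(mutant_mrna), mean(control_mrna))
ratio = protein_mrna_ratio(protein_norm, mrna_norm)

print(f"protein: {protein_norm:.1f}, mRNA: {mrna_norm:.1f}, ratio: {ratio:.1f}")
print(f"s.e.m. of control protein replicates: {sem(control_protein):.3f}")
```

With these made-up inputs, a halved protein level combined with a fourfold mRNA increase gives a normalized ratio of 12.5, which shows why a moderate drop in protein can still indicate strong destabilization once the mRNA increase is taken into account.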
Since a PER/DBT complex controls CLK phosphorylation [20],[28], we asked whether PER was required for CLK modifications by CK2α, even in the absence of TIM. Effects of the CK2αTik protein were thus analyzed in per0 tim0 double mutants, where CLK appeared minimally phosphorylated in a CkIIα+ background (Figures 2B and S2A). CLK levels and phosphorylation were further diminished in the presence of the CK2αTik protein (Figures 2B and S2A). Since overproduction of the CK2αTik protein increased Clk mRNA levels by about twofold in per0 tim0 double mutants, the CLK protein/mRNA ratio was reduced just as in tim0 mutants (Figure 2B). A similar decrease of CLK protein levels was observed in per0 tim>Tik flies (Figure S2B) despite increased Clk mRNA levels, suggesting that CLK protein was again strongly destabilized in the absence of PER. These observations reveal that CK2α stabilizes CLK in the absence of PER and/or TIM. Since CLK phosphorylation is further decreased by CK2αTik expression, CK2α is important for the PER/TIM-independent minimal phosphorylation program of CLK.
Figure 2. PER and TIM-independent degradation of CLK in CK2αTik overexpressing flies. (A, B) Western blot of sonicated head extracts from flies collected at DD1. A CB stained band in the size range of CLK is used as a loading control. At least two independent experiments were performed for each blot. (A) Comparison of CLK protein in tim>Tik and control flies in tim0 background. (Left) Comparison between tim>Tik and controls in tim0 background for PER and CLK at CT2. w; tim0 tim-gal4 (tim0 +) and w; tim0 tim-gal4; UAS-CkIIαTik (tim0 CkIIαTik) were used. a and b are different protein extracts from the same genotype at the same time point. We loaded 100 µg of extracts. (Middle) Quantitative RT-PCR measurements of Clk mRNA levels in heads of flies collected at DD1.
Results are means of pooled values from two time points (CT2 and 14) with at least two independent samples for each time point. Error bars indicate s.e.m. Average values were normalized to the mean of the control (w; tim0 tim-gal4) set to 100. Previous analysis of separate values at CT2 and CT14 indicated that they were similar (Table S1), justifying their common treatment (see above). (Right) CLK protein/Clk mRNA ratio calculated from mean CT2–CT14 values of Western blot quantification and quantitative RT-PCR data. Ratios were normalized to the control (w; tim0; tim-gal4) set to 100. Abbreviations as in (A). (B) Comparison of CLK protein in tim>Tik and control flies in per0 tim0 background. (Left) Genotypes: per0 w; tim0 (per0 tim0) and per0 w; tim0 tim-gal4; UAS-CkIIαTik (per0 tim0 CkIIαTik) as well as w;;ClkJrk. a, b, and c are different protein extracts from the same genotype at the same time point. (Middle) Quantitative RT-PCR measurements of Clk mRNA levels in head extracts of flies collected at CT2. Mean values +/− s.e.m. from at least three independent experiments are shown with the per0 tim0 control set to 100. (Right) CLK protein/Clk mRNA ratio calculated with mean values of Western blot quantification and quantitative RT-PCR data at CT2. Ratios were normalized to the control (per0 tim0) set to 100. (C) Cycloheximide chase of CLK degradation in the presence of CK2α overexpression. per and tim dsRNAs were applied to S2 cells prior to transfection. We transfected 1 µg pAc-Clk-V5/His6 with or without 3 µg of the FMO02931 CK2α expression vector. Transfections were split in four equal volumes for the degradation assay. “h in CHX” indicates the hours for which respective cells were incubated in cycloheximide to stop protein synthesis.
Immunoreactivity against MHC (myosin heavy chain) was used as loading control. Anti-V5, anti-CK2α, and anti-HA were used to reveal CLK and CK2α, respectively. (Left) Western blots for CLK alone (CLK) and CLK cotransfected with CK2α (CLK+CK2α). (Right) CK2α expression from the FMO02931 plasmid after 1 d of induction. (Bottom) Average degradation profiles from three independent experiments (as shown on the left). Error bars represent s.e.m. https://doi.org/10.1371/journal.pbio.1001645.g002
The up-regulation of Clk mRNA levels in tim>Tik flies suggested that CK2α could repress Clk transcription. To test whether Clk transcription was affected in the tim>Tik genotype, Clk pre-mRNA levels were estimated. They were not increased in tim>Tik flies compared to controls, although a reduced antiphasic cycling was observed at DD1 (Figure S2C). The antiphasic oscillation was reminiscent of PER and TIM oscillations persisting in these flies (see Figure 1A). The increase of mature Clk mRNA levels in tim>Tik flies thus seems not to be the consequence of higher Clk gene transcription and rather supports a posttranscriptional control of Clk mRNA by CK2α. In agreement with a posttranscriptional control, the VRI and PDP1 regulators of Clk transcription were not affected in per0 tim>Tik flies (Figure S2D). Finally, since the transcriptional regulation of the cryptochrome (cry) gene is similar to that of the Clk gene [44], we tested cry mRNA levels in per0 tim>Tik flies. No increase of cry mRNA levels was observed in the presence of the CK2αTik protein (Figure S2E), supporting a specific control of Clk mRNA levels by CK2α.
The data from tim>Tik flies strongly suggested that CK2α controls CLK stability independently of PER and TIM. To obtain direct evidence for this, CLK degradation kinetics were analyzed in a cycloheximide (CHX) chase-based assay in Drosophila Schneider 2 (S2) cells.
Since transfected V5-tagged CLK induced both per and tim expression in S2 cells in our hands, we used RNAi against per and tim to eliminate any effect of PER and TIM proteins. After blocking protein synthesis with CHX, CLK showed robust degradation during the following 9 h (Figure 2C). When FLAG-HA-tagged CK2α was co-expressed, CLK degradation proceeded very slowly. The increase of CK2α levels by exogenous expression was rather limited in these conditions, indicating that a small increase in total CK2α protein can have substantial effects on CLK degradation. These results confirm the in vivo observations and strongly support a role for CK2α in the inhibition of CLK breakdown.
CK2β Does Not Influence CLK Stability
Since inhibition of CK2α affected CLK stability and phosphorylation, we asked whether CK2β knockdown would have similar effects. Pdf-gal4 UAS-CkIIβ-RNAi/+ flies have been reported to display long period rhythms [45]. Driving two CkIIβ-RNAi transgenes under the control of tim-gal4 (hereafter tim>CkIIβ-RNAi flies, see Materials and Methods) induced behavioral arrhythmicity (Table 1). The specificity of the CkIIβ RNAi was first behaviorally assessed by rescue experiments involving CkIIβ RNAi under the control of the strong PDF+ cell driver gal1118 [46] and the co-expression of different CK2β isoforms. The strongly altered behavior of w;; gal1118/UAS-CkIIβ-RNAi could be rescued by overexpression of the VIIa, VIIb, and VIIc CK2β isoforms (see [47]) (Table 1). Western blots against CK2β revealed a reduction in two isoforms in tim>CkIIβ-RNAi animals, while a third isoform remained unaffected (Figure S3A). TIM and PER cycling was profoundly altered in head extracts of tim>CkIIβ-RNAi flies at DD1 (Figures 3A and S3B). In contrast, CLK oscillations were only slightly affected. In particular, tim>CkIIβ-RNAi flies did not show the pronounced decrease in CLK levels that was observed in tim>Tik flies.
Furthermore, CK2β depletion in a per0 background did not result in a marked reduction of CLK phosphorylation or quantity (Figure 3B). Since equivalent levels of Clk mRNA were observed in per0 flies with or without CkIIβ RNAi expression, their protein/mRNA ratios were identical (Figure 3C and D), in contrast to tim>Tik flies. Similarly, CLK was not affected when CkIIβ RNAi was expressed in a tim0 background (not shown). In conclusion, although CK2α and CK2β proteins similarly affect TIM and PER accumulation and phosphorylation, the CK2β subunit does not seem to be required for CK2α to control CLK degradation and phosphorylation.
Figure 3. CK2β does not contribute to the inhibition of CLK degradation. (A, B) Western blot of nonsonicated head extracts from flies collected at DD1. Gray and black bars represent subjective day and subjective night, respectively. A CB stained band in the size range of CLK is used as a loading control. Brackets indicate hypo- and hyperphosphorylated forms of CLK. At least two independent experiments were performed for each blot. (A) Comparison between tim>CkIIβ RNAi (w; tim-gal4/106845; 32377/+) and tim-gal4/+ controls in a per+ background, for TIM, PER, and CLK proteins. (B) Comparison between tim>CkIIβ RNAi and tim-gal4/+ controls in a per0 background, for TIM and CLK. a and b are different protein extracts from the same genotype at the same time point. (C) Quantitative RT-PCR measurements of Clk mRNA levels in heads of flies collected at DD1. Results shown are means of pooled values from two time points (CT2 and 14, which gave similar values) with two independent samples for each time point. Error bars indicate s.e.m. Average values were normalized to the mean control (per0 w; tim-gal4/+) values set to 100. (D) CLK protein/Clk mRNA ratio calculated from mean CT2–CT14 values of Western blot (B) and quantitative RT-PCR (C) data.
Ratios were normalized to the control (per0 w; tim-gal4/+) ratios set to 100. https://doi.org/10.1371/journal.pbio.1001645.g003
CK2α Preferentially Forms Complexes with Highly Phosphorylated CLK in the Morning
The strong effects of CK2α inhibition on CLK suggested that the two proteins might physically interact. Flies expressing a FLAG-tagged CK2α protein under tim-gal4 control displayed strong behavioral rhythms with a 1 h period lengthening (Table 1). Anti-FLAG immunoprecipitation experiments were performed from FLAG-CK2α-expressing fly head extracts at different circadian times and showed co-immunoprecipitation of TIM, PER, and CLK mostly at the end of the subjective night and in the subjective morning, when these proteins are mainly hyperphosphorylated (Figure 4A). Although relatively abundant medium-phosphorylated clock proteins were observed in the extracts at CT16, they were poorly co-immunoprecipitated with CK2α. The CK2α subunit thus appears to preferentially form complexes with highly phosphorylated forms of TIM, PER, and CLK. Flies expressing FLAG-tagged CK2β were also behaviorally rhythmic with a slightly lengthened period (Table 1), and FLAG-CK2β expression could rescue the severe period lengthening induced by CkIIβ RNAi (Table 1), indicating that the tagged protein was functional. Similarly to CK2α, CK2β was found to be associated with hyperphosphorylated TIM, PER, and CLK in the late subjective night and in the subjective morning, whereas only small amounts of these proteins were co-immunoprecipitated at other circadian times (Figure 4B). The results thus suggest that the CK2 holoenzyme is involved in the hyperphosphorylation of CLK, PER, and TIM in the late night/morning part of the cycle.
Figure 4. PER, TIM, and CLK are found in protein complexes containing CK2. Anti-FLAG immunoprecipitation (IP) from nonsonicated head extracts of flies collected at DD1.
FLAG-CK2α or FLAG-CK2β was immunoprecipitated from 1 mg of total protein extracts from heads and 50% of the precipitate was subjected to Western blot analysis. We loaded 50 µg of head extracts as input controls. IgG LC indicates the immunoglobulin G light chain used for the precipitation that is well detected by the anti-mouse secondary antibody. Gray and black bars represent subjective day and subjective night, respectively. Experiments were performed at least twice. (A, Left) FLAG-CK2α was immunoprecipitated from tim>FLAG-CkIIα flies and tim-gal4/+ negative controls (control) in a per+ background. IP of CK2α (FLAG) and co-IP of CK2β, TIM, PER, and CLK were visualized by immunoblotting. The asterisk (*) marks a nonspecific band recognized by the anti-PER antibody. (Right) TIM, PER, and CLK immunoblots of the corresponding inputs on per+ background. The time “CT10” indicates a mixed population of flies harvested at CT8 and CT12. A CB stained band in the size range of CLK is used as a loading control. TIM was run on a 3–8% Tris-Acetate gel. (B, Left) FLAG-CK2β was immunoprecipitated from tim>FLAG-CkIIβ flies and tim-gal4/+ negative controls (control) in a per+ background. IP of CK2β (with anti-CK2β) and co-IP of CK2α, TIM, PER, and CLK were visualized by immunoblotting. (Right) TIM, PER, and CLK immunoblots of the corresponding inputs. A CB stained band in the size range of CLK is used as a loading control. (C) FLAG-CK2α or FLAG-CK2β was immunoprecipitated from tim>FLAG-CkIIα (α) flies and tim-gal4/+ negative controls (C) in a per0 background. co-IP of CLK was visualized by immunoblotting. Input samples and immunoprecipitates were run on the same gel for C and α. (D) Image showing PDF (green), CK2α (magenta), and CLK (blue) fluorescent immunolabeling in small PDF+ LNv-s of an adult fly at ZT3. The fourth square is a composite picture of the three stainings. Single optical planes are shown taken by confocal microscopy.
https://doi.org/10.1371/journal.pbio.1001645.g004
Since CK2α strongly influences CLK stability in the absence of PER, anti-FLAG immunoprecipitations were also done in per0 tim>FLAG-CkIIα flies (Figure 4C). Minute amounts of hypophosphorylated CLK were co-immunoprecipitated in per0 extracts, nevertheless indicating that CLK-CK2α complexes may exist in the absence of PER. Conversely, CK2β did not co-immunoprecipitate with CLK in a per0 background (Figure 4C). The poor detection of CK2α–CLK complexes in the absence of PER suggested a very labile interaction between the two proteins or indirect PER-independent effects of CK2α on CLK.
CK2 subunits preferentially associate with clock proteins at times when those are present in the nucleus. CK2α was, however, described to localize to the cytoplasm of LNv-s [8]. We therefore set out to investigate whether CK2α could be present in the nucleus of LNv-s as well. Whole-mount adult brains were stained with an anti-CK2α antibody together with anti-PDF and anti-CLK. PDF is known to be exclusively cytoplasmic [48], while CLK is almost completely nuclear in our hands (see also [26]). Although CK2α predominantly localized to the cytoplasm of s-LNv-s, a fine cloud of CK2α staining co-localized with CLK to the nucleus (Figure 4D).
CK2α Increases CLK Phosphorylation in a PER-Dependent Manner
To further decipher the function of CK2α in CLK phosphorylation, CLK protein was analyzed in flies overexpressing wild-type CK2α. As previously reported [41], CK2α overexpression induced a modest lengthening of the behavioral period (Table 1). w;tim-gal4/UAS-CkIIα (tim>CkIIα) flies showed subtle changes of PER and TIM oscillations with a slightly delayed degradation of the TIM (CT 4–8) and PER (CT 8) proteins during daytime at DD1 (Figures 5A and S4A–B). CLK levels in CK2α overexpressing head extracts were higher at CT0 and lower at CT12 compared to controls, but overall protein levels were not significantly affected (Figure 5A).
In contrast, CLK phosphorylation was strongly altered, with forms always more phosphorylated than the wild-type minimal phosphorylation that is observed at CT12 (Figure 5A). CLK phosphorylation was not increased by CK2α overexpression in a per0 background (Figure 5B), indicating that CLK hyperphosphorylation by CK2α required PER. The results thus support a PER-dependent hyperphosphorylation of CLK by CK2α, whereas CLK hypophosphorylation and stability appear to be mostly controlled by a PER-independent CK2α function.
Figure 5. CK2α overexpression induces CLK hyperphosphorylation in the presence of PER. (A, B) Western blot of nonsonicated head extracts from flies collected at DD1. Samples were run on 3–8% Tris-Acetate gels in order to better resolve hyperphosphorylated CLK and TIM forms. Gray and black bars represent subjective day and subjective night, respectively. At least two independent experiments were performed for each blot. (A, Top) Comparison between tim>CkIIα and tim-gal4/+ controls in a per+ background, for TIM, PER, and CLK proteins. (Bottom) Two independent experiments as above were quantified for CLK abundance, and the mean values are plotted. Error bars stand for the difference of the respective values from each experiment and their mean. The value of w; tim-gal4/+ at CT0 was normalized to 100. (B) Comparison between per0; tim>CkIIα and per0; tim-gal4/+ controls for CLK. (C) CK2α phosphorylates CLK in vitro. (Top) Wild-type CLK was translated with an N-terminal 6-histidine fusion tag in vitro, affinity purified either in the absence or presence of PER or TIM, and subjected to phosphorylation assays by incubation with γ–32P-ATP either in the absence (−) or presence (+) of CK2α. Intensity of incorporated 32P-phosphate into CLK (32P) was analyzed by autoradiography and total CLK protein levels (CLK) were determined by Western blot analysis.
The arrow indicates the position of phosphorylated CLK. (Bottom) Quantification of CLK-incorporated 32P-phosphate after normalization to total CLK protein levels. Average CLK phosphorylation from at least three experiments ± s.e.m. is shown, with wild-type CLK set to 100. https://doi.org/10.1371/journal.pbio.1001645.g005
The CLK phosphorylation defects in flies with altered CK2α functions and the presence of CLK-CK2α/β complexes suggested that CK2 might directly phosphorylate CLK. We thus asked whether the CK2 holoenzyme could phosphorylate CLK in vitro. Indeed, CLK was phosphorylated by CK2, and the presence of PER increased CLK phosphorylation by about twofold (Figure S4C). Addition of TIM protein did not affect the CK2-dependent phosphorylation of CLK. When only the CK2α catalytic subunit was used for the in vitro assay, CLK was phosphorylated with a similar efficiency and showed the same PER-mediated facilitation of its phosphorylation (Figure 5C). This confirms the in vivo results indicating that at least some of the CK2α effects on CLK phosphorylation do not require CK2β, and supports a direct phosphorylation of CLK by CK2α.
CK2α Decreases CLK Transcriptional Activity
The strong influence of CK2α on CLK phosphorylation and stability suggests that CLK-dependent transcription could be affected in flies with altered CK2α activity. As previously reported [42], intermediate levels of per and tim mRNAs were observed in tim>Tik flies (Figure 6A).
Figure 6. CK2α decreases CLK transcription factor activity. (A–D) Quantitative RT-PCR measurements of per, pre–per, tim, and pre–tim mRNA levels in heads of flies collected at DD1. Error bars indicate s.e.m. (A) per and tim mRNA levels in tim>Tik and control flies. Values were normalized to the maximum value (control at CT12) set to 100. Mean mRNA levels +/− s.e.m. from at least three independent experiments are shown.
(B) Quantitative RT-PCR measurements of per and tim pre-mRNA levels in heads of tim>Tik and control flies collected at DD1. Average values from three independent experiments were normalized to the control (w; tim-gal4) mean value at CT12 set to 100. Error bars represent s.e.m. (C) Relative qPCR values from Figure 6B were averaged from the four indicated time points (CT0, CT6, CT12, and CT18) for each genotype and for each pre-messenger and the mean value was plotted. 100 stands for the highest pre-mRNA expression in the respective genotype at CT12. (D) per and tim mRNA levels in tim>Tik and controls in a per0 or tim0 background. Results are means of pooled values from two time points (CT2 and 14). Values were normalized to the corresponding controls set to 100. Previous analysis of separate values at CT2 and CT14 indicated that they were similar (Table S1), justifying their common treatment (see above). Average results from at least three independent experiments are shown. (E) Luciferase activity assay in the presence or absence of CK2α overexpression. S2 cells were transfected, harvested, and measured as described in Materials and Methods. We transfected 5 ng pAc-Clk-V5, 10 ng p3x69-luc, and 10 ng pAc-Renilla luciferase with or without 5 or 15 ng of the FMO02931 CK2α expression vector. Mean luciferase activity +/− s.e.m. of at least four different samples from two independent experiments is shown. CLK-dependent luciferase expression in the absence of CK2α co-expression was set to 100. 1× CK2α indicates 5 ng FMO02931, and 3× CK2α stands for 15 ng FMO02931. Student's t test (unpaired, two-tailed) was applied to the respective groups, * p = 0.0315, ** p = 0.00118. https://doi.org/10.1371/journal.pbio.1001645.g006
per and tim pre-mRNA levels were measured to more directly estimate CLK transcriptional activity.
Average nonoscillating levels of pre–per and pre–tim were observed in tim>Tik flies (Figure 6B,C), despite the very reduced amounts of CLK protein (see Figure 1A). This suggested that CLK transcriptional activity was strongly increased when CK2α activity was diminished. Flies expressing the CK2αTik protein in a per0, tim0, or per0 tim0 double mutant background revealed no statistically significant changes in per and tim mRNA levels compared to wild-type CK2α controls (Figures 6D and S5A). However, CLK protein levels were reduced to 25–50% in CK2αTik expressing flies (see Figures 2 and S2), suggesting that specific CLK activity was still increased. Effects of CK2α on the transcriptional activity of CLK thus appear to be at least partly independent of PER and TIM. per and tim mRNA levels were also measured in tim>CkIIβ-RNAi flies and showed intermediate levels compared to controls (Figure S5B). Since CLK quantity is unaffected by tim>CkIIβ-RNAi (Figure 3A), CK2β does not strongly modify CLK transcriptional activity.
Finally, CLK-dependent transcription was tested by CLK-induced reporter gene expression in S2 cells. On a CLK-binding synthetic minimal enhancer composed of three per-derived E-boxes, CLK-dependent transcription was decreased in a dose-dependent manner by CK2α co-expression (Figure 6E). Since CLK quantity was increased in the presence of CK2α overexpression (see Figure 2C), the transcriptional decrease could hardly be a consequence of lower CLK levels.
Discussion
Temporally controlled phosphorylation of clock proteins is a key feature of the transcriptional-translational negative feedback loop underlying the Drosophila circadian clock. Although the CLK activator shows robust oscillations of its phosphorylation levels, the phosphorylation mechanisms and how they affect CLK function remain largely unknown. Our study aimed at determining whether CK2 was involved in the control of CLK phosphorylation and how it would affect CLK circadian function.
Overexpression of the CK2αTik dominant-negative enzyme or RNA interference against CkIIα substantially reduced CLK phosphorylation as well as protein levels. In accordance with the in vivo observations, co-transfection of CK2α with CLK in S2 cells increased CLK stability. This supports a function for CK2α in CLK stabilization. High CLK target gene transcription was induced by the CK2αTik protein despite the low-level accumulation of CLK, indicating that CK2α decreased the expression of CLK targets. This was further corroborated in the luciferase activity assay in S2 cells where overexpression of CK2α inhibited CLK activity. Furthermore, CK2α associated with hyperphosphorylated forms of CLK in the morning and it was able to directly phosphorylate the CLK protein in vitro. Effects of CK2α on CLK stability and activity did not require PER or TIM, but CLK phosphorylation by CK2α involves both PER-independent and PER-dependent functions. The results suggest that direct phosphorylation by CK2α stabilizes CLK and diminishes its transcriptional activity. CLK protein levels but not Clk mRNA levels are low in tim>Tik and tim> CkIIα-RNAi flies, indicating that CK2 specifically affects CLK protein levels. Although a role of CK2α in CLK protein synthesis cannot be completely excluded, three sets of experimental data support a posttranslational action of CK2α on CLK. First, CK2α associates with CLK, PER, and TIM in protein complexes. Second, CK2α affects CLK phosphorylation state in vivo and is able to phosphorylate CLK directly in vitro. Third, CK2α stabilizes CLK even after protein synthesis blockage with CHX. Importantly, tim>Tik flies show reduced CLK phosphorylation, as predicted from kinase inhibition. This is in contrast with the effects of CkIIαTik on PER and TIM, for which highly phosphorylated forms of the proteins accumulate, although PER phosphorylation remains lower than the highest state in the wild-type [8],[15],[41],[42]. 
Nevertheless, CK2 is able to phosphorylate PER and TIM in vitro [8],[41],[49]. The CkIIαTik mutation affects TIM phosphorylation in the absence of PER, whereas TIM is required to observe effects of CkIIαTik on PER [15]. TIM was thus proposed to be a direct target of CK2 that drives CK2-dependent modification of PER [15]. One might expect that TIM or PER relays the effects of CK2 on CLK. This is not supported by the strong effect of CkIIαTik on the CLK protein in per01, tim01, or per01 tim01 double mutants. However, PER strongly influences CLK phosphorylation by CK2. First, overexpression of wild-type CK2α induces CLK hyperphosphorylation in per+ but not in per0 flies. Second, PER enhances in vitro phosphorylation of CLK by CK2α. Finally, the abundance of CLK/CK2 complexes observed in head extracts is much lower in per0 mutants. In comparison to per0, wild-type flies accumulate many more CLK/CK2α complexes, particularly in the morning when PER and CLK are abundant and hyperphosphorylated. The PER–CLK interaction is the strongest in the morning and the weakest in the early evening when PER is highly degraded. It seems unlikely that this temporal pattern of CLK interactions with PER is strongly altered in the FLAG-CK2α overexpressing animals used for the immunoprecipitation, since they show behavioral and molecular rhythms similar to wild-type flies. CLK/CK2α complexes strongly decrease after CT4 when high levels of CLK but not PER remain, suggesting that CLK/CK2α interactions follow phosphorylated PER abundance. PER hence could drive a large fraction of CLK/CK2α interactions, with PER-free CLK being a weaker CK2 substrate. Since PER interacts in the late night with CLK species that no longer bind chromatin [22], this suggests that CLK/PER/CK2α complexes are mostly unbound to DNA. PER/DBT-dependent phosphorylation marks CLK for degradation [19],[20]. Although the NEMO kinase destabilizes CLK [29], whether it acts as a PER/DBT-dependent CLK kinase is not known.
Our results indicate that inhibiting CK2α activity increases CLK breakdown, whereas overexpressing CK2α induces accumulation of highly phosphorylated CLK. CK2α thus appears to have opposite effects on CLK stability, compared to DBT and NEMO. Since both CK1 (DBT) and CK2 show a preferential association with CLK in the morning, they might counteract each other to control CLK degradation and recycling for the next transcription cycle. Interestingly, a kinase complex that includes CK1 promotes the SUPERNUMERARY LIMBS (SLMB)-dependent proteolysis of the CUBITUS INTERRUPTUS (CI) transcription factor, whereas CK2 stabilizes CI by preventing its ubiquitylation [50],[51]. As opposed to tim>Tik flies, flies expressing CkIIβ RNAi did not show significantly decreased CLK levels. In addition, CK2α and the CK2 holoenzyme are both able to phosphorylate Drosophila CLK in vitro. Our data suggest that CLK, in contrast to PER and TIM, might be a substrate of CK2α alone rather than a substrate of the CK2 holoenzyme in vivo. Several studies suggest that CK2α and β do not act synergistically on a handful of substrates [52] or even play antagonistic roles, with CK2β inhibiting CK2α-dependent phosphorylation of some target proteins (see [40]). In mammals, CK2α phosphorylates BMAL1 more efficiently than the CK2 holoenzyme does, and can also phosphorylate CLK [53]. As previously reported [15],[42], tim>Tik flies showed intermediate levels of per and tim transcripts. We corroborated the involvement of transcription in this phenomenon by determining per and tim pre-mRNA profiles. It has been proposed that the high levels of hyperphosphorylated TIM in tim>Tik would prevent normal PER-dependent transcriptional repression [15].
However, the fact that flies expressing the CK2αTik protein in a per0, tim0, or per0 tim0 double mutant background have only half dose (or less) of CLK but as high levels of per and tim transcripts as the CkIIα+ controls supports an additional PER/TIM-independent transcriptional function of CK2. Importantly, the small amount of remaining CLK protein in the late night in per+ tim>Tik flies drives similarly high pre–per expression as much more CLK in the wild-type (see Figure 6B). That also undermines CK2's involvement only in PER/TIM repressor function during CLK-mediated transcription. A likely explanation is that the low-level hypophosphorylated CLK is extremely active in flies with reduced CK2α activity. In line with the in vivo results, the luciferase activity assay in cultured S2 cells uncovered a dose-dependent repression of CLK activity by the CK2α subunit on a minimal enhancer-promoter element. CK2α thus appears to control CLK-dependent transcription by increasing PER/TIM repressing capacity and jointly decreasing CLK activity by some other mechanism. CK2β supports TIM-dependent repression (Figure S5B), but may not contribute to the PER/TIM-independent control of CLK activity by CK2α. Since deubiquitylation of CLK by USP8 decreases its activity [25], it will be interesting to investigate whether CK2α phosphorylation affects CLK ubiquitylation. In the Neurospora circadian transcriptional feedback loop, the FRQ repressor recruits CK1 and CK2 to promote phosphorylation of the WCC activator complex resulting in the inhibition of its transcriptional activity [35],[54]–[57]. Reactivation of WCC occurs through its dephosphorylation by phosphatases such as PP2A [54],[56]. WCC is destabilized when turning active and gets stabilized as soon as it resumes a transcriptionally inactive state [56],[57]. This is reminiscent of our finding about the role of CK2 in CLK activity regulation. 
Recently, BMAL1 and CLK were also shown to be “Kamikaze” activators in mammals in that their activity was dependent on proteasome function: highly unstable CLK and BMAL1 were the most active, while proteasome inhibition resulted in long-lived but less potent activators [31],[32],[58]. Our findings indicate that CK2 might be a key player in such a mechanism, by promoting CLK stability and decreasing its activity. It remains to be seen how CK2 and DBT-dependent kinase activities interact on CLK to set CLK transcriptional activity to a proper phase in the circadian cycle. Materials and Methods Fly Stocks and Constructs Drosophila melanogaster stocks were maintained on a 12 h∶12 h LD cycle on standard corn meal-yeast-agar medium at 25°C. ClkJrk is a dominant allele of Clk, which results in a truncated and highly unstable CLK protein [59]. per01 [60], tim01 [61], w;tim-gal4-62 [62], w;;gal1118 [46], per01w;;13.2(per(Δ)-HA10His) F21 [63], yw;;P{UAS-CkIIα.Tik} T1 [15], yw;P{UAS-CkIIα.L} 35 [41], w;UAS-FLAG-CkIIα [51], and lines carrying UAS transgenes encoding each of the five CK2β isoforms [64] have been previously described. In the adult brain, the gal1118 driver line is expressed in the small and large LNv-s in addition to a few nonclock cells [46]. UAS-RNAi flies against CkIIβ (stocks 32377 and 106845) and CkIIα (stock 17520 R-2) are described in http://stockcenter.vdrc.at/control/main and http://www.shigen.nig.ac.jp/fly/nigfly/index.jsp, respectively. Both CkIIβ RNAi lines (32377 and 106845) were induced in all the experiments using CkIIβ RNAi unless otherwise indicated. The UAS-FLAG-CkIIβ construct was made by inserting a FLAG-CK2β coding segment (kindly provided by A. Bidwai, West Virginia University) into the pUAST vector, and w;UAS-FLAG-CkIIβ transgenic flies were generated by standard procedures.
For in vitro phosphorylation assays, Clk constructs with a 6-histidine fusion tag as well as per and tim were expressed from an SP6 promoter incorporated in a pAc-5.1 vector, as described previously [30]. The FMO02931 expression plasmid was obtained from the Drosophila Genomics Resource Center (DGRC). It contains the full CkIIα ORF tagged C-terminally with FLAG and HA and driven by the metallothionein promoter. We verified the CkIIα ORF and the promoter region by sequencing. Behavioral Analysis Behavioral assays for locomotor activity rhythms were carried out with 1- to 5-d-old males at 25°C in Drosophila activity monitors (TriKinetics). Illumination was provided by standard white fluorescent low-energy bulbs. Light intensity at fly level was in the range of 300–1000 µW/cm². Flies were first entrained to 12 h∶12 h light-dark (LD) cycles for 4 d and then transferred to constant darkness (DD). Activity data were analyzed from the second to the ninth day in DD. Data analysis was done with the FaasX 1.16 software that is derived from the Brandeis Rhythm Package (see [65]) and is freely available upon request (Apple Mac OSX only). Rhythmic flies were defined by χ² periodogram analysis of an 8-d dataset with the following criteria (filter ON): power ≥20, width ≥1.5 h, with no selection on period value. Power and width represent the height and width of the periodogram peak, respectively, and give the significance of the calculated period. Genotypes with a reduced number of rhythmic flies (<50%), low power (<50), and high s.e.m. of the period (>1) are considered arrhythmic. Experiments were reproduced two or three times with very similar results. Protein Sample Preparation, Phosphatase Treatment, Sonication, and Western Blotting We entrained 1- to 5-d-old flies to 12 h∶12 h LD cycles for 4 d and transferred them to DD (CT0 is 12 h after the last lights-OFF). Flies were collected on dry ice during the first day of DD (CT0–24).
We homogenized 30–60 heads on ice in a modified RBS buffer [20]: 10 mM HEPES pH 7.5, 5 mM Tris-HCl pH 7.5, 50 mM KCl, 10% glycerol, 2 mM EDTA, 1% Triton X-100, 0.4% NP-40, 1 mM DTT, Complete Mini Protease Inhibitor Cocktail Tablet (Roche), Phosphatase Inhibitor Cocktail 2 and 3 (Sigma-Aldrich), and 20 mM β-glycerophosphate (3–4 µl buffer/head). A Brinkmann Heidolph Mechanical Overhead Stirrer RZR1 was used for the homogenization. After 1 min of extraction, tubes were incubated on ice for 30 min, then homogenized again for another minute. If sonication was included after this step, samples were sonicated on ice with a Vibracell ultrasonic processor (Bioblock Scientific) at 4 W output for 5×10 s with 1 s breaks. Following Bradford protein concentration measurement (BioRad), supernatants were used for polyacrylamide gel electrophoresis. When supernatants were treated with λ protein phosphatase, 1,600 units of λ protein phosphatase (New England Biolabs) and 1 mM MnCl2 were added to sonicated extracts prepared in phosphatase inhibitor-free buffer [10 mM HEPES pH 7.5, 100 mM KCl, 0.1 mM EDTA, 5% glycerol, 0.1% Triton X-100, 5 mM DTT, and EDTA-free Complete Mini Protease Inhibitor Cocktail Tablet (Roche)] and subsequently incubated for 30 min at 30°C. The reaction was stopped by adding 1× NuPAGE LDS sample buffer (Life Technologies) and 500 mM DTT, followed by incubation for 10 min at 70°C. We loaded 50 µg total protein on Novex 4% Tris-Glycine precast gels (Life Technologies) for PER, TIM, and CLK immunoblotting, unless otherwise indicated. When indicated, NuPAGE Novex 3–8% Tris-Acetate gels were used for TIM and CLK immunoblotting for a better resolution of hyperphosphorylated forms. Samples (50 µg) for FLAG, CK2α, and CK2β immunoblots were run on NuPAGE Novex 4–12% Bis-Tris precast gels (Life Technologies). Electrophoresis and blotting were done according to the manufacturer's instructions except for a 3 h running time for Tris-Acetate and a 2 h running time for Tris-Glycine gels.
Equal loading was verified by Ponceau S staining on blotting membranes, which were blocked in 5% nonfat dry milk in TBST (Tris-Buffered Saline with 0.1% Tween-20) for 1 h at 25°C and then incubated with the primary antibody overnight at 4°C. The following primary antibodies were used diluted in 5% milk in TBST: rabbit anti-V5 (Sigma-Aldrich V8137, Lot 019K4827) at 1∶4,000, rabbit anti-myosin heavy chain (MHC, kind gift of Roger E. Karess, Institut Jacques Monod, Paris) at 1∶400,000, rabbit anti-CK2α (Abcam ab81435) at 1∶1,000, mouse anti-CK2β (Calbiochem 6D5 218712) at 1∶1,000, rat anti-TIM [7] at 1∶2,000, goat anti-CLK (Santa Cruz Biotechnology sc27070) at 1∶1,000, rabbit anti-PER [66] at 1∶10,000, guinea pig anti-VRILLE [44] at 1∶5,000, and guinea pig anti-PDP1ε [67] at 1∶5,000. For immunoblotting of anti-FLAG immunoprecipitations, guinea pig GP90 anti-CLK [18] at 1∶1,000 was used, since a nonspecific IgG-derived band was revealed with the sc27070 anti-CLK on the immunoprecipitates. Membranes were washed three times for 10 min, then the HRP-conjugated secondary antibodies (Santa Cruz Biotechnology) were added diluted in 5% milk in TBST: goat anti-rabbit (1∶10,000), goat anti-rat (1∶20,000), goat anti-mouse (1∶20,000), donkey anti-goat (1∶10,000), and goat anti-guinea pig (1∶10,000). In the case of anti-CK2β, TrueBlot ULTRA Anti-Mouse IgG-HRP (eBioscience, 1∶2,000) was used as a secondary antibody to circumvent problems resulting from primary antibody light chain detection after immunoprecipitation. Blots were revealed with the Amersham ECL Plus reagent (GE Healthcare). SimplyBlue SafeStain (Life Technologies) was used to stain membranes after blotting. Images were quantified with the NIH ImageJ (1.43 k) software after background subtraction. Calculations were done and histograms were generated with Microsoft Excel.
Immunoprecipitation Fly head extracts were prepared as described above, except that HE buffer [20 mM HEPES pH 7.5, 150 mM KCl, 0.1 mM EDTA, 0.1% NP-40, 5% glycerol, Complete Mini Protease Inhibitor Cocktail Tablet (Roche), Phosphatase Inhibitor Cocktail 2 and 3 (Sigma-Aldrich) and 20 mM β-glycerophosphate] was used for the homogenization. We incubated 1 or 2 mg total protein overnight at 4°C with either 25 µl EZview Red anti-FLAG M2 Affinity Gel (Sigma-Aldrich) or 25 µl Protein G Sepharose (Pierce) mixed with 10 µl of anti-CLK antibody (Santa Cruz Biotechnology SC27069). Following three 10 min washes, bound complexes were eluted in 1× NuPAGE LDS sample buffer (Life Technologies) without DTT for 10 min at 70°C. Supernatants were complemented with DTT (500 mM) and reduced for 10 min at 70°C. Cell Culture Experiments Drosophila Schneider 2 (S2) cells [kind gift of Anne Plessis (Institut Jacques Monod, Paris)] were maintained in SFX-Insect Medium (HyClone) supplemented with 10% fetal bovine serum (Sigma-Aldrich) and 1% penicillin-streptomycin solution (Sigma-Aldrich) as previously described [68]. Complementary single-stranded RNA-s were in vitro transcribed from purified PCR templates containing the T7 RNA polymerase promoter site on both ends, using the MEGAscript T7 Kit (Life Technologies). Reactions were purified with the MEGAclear Kit (Life Technologies) and precipitated in ethanol/sodium acetate for concentration followed by resuspension in 40 µl H2O and annealing of the two strands (30 min at 65°C and slow cooling to room temperature). RNA quality and quantity were assessed by spectrophotometry and agarose gel electrophoresis. Primers for per target sequence amplification by PCR were: TTAATACGACTCACTATAGGGAGAAAGGAGGACAGCTTCTGCTGC and TTAATACGACTCACTATAGGGAGAGATATGATCCCGGTGGCCGTG and for tim were: TTAATACGACTCACTATAGGGAGACTGGTTACTAGCAACTCCGCA and TTAATACGACTCACTATAGGGAGAGCAGGATATTTCTCAGCAGCA. pAc-Clk-V5/His6 [10], pAc-Renilla luciferase (kind gift of M.
Rosbash), and p3x69-luc (containing three copies of per E-box as enhancer element [10],[69]) have been described previously. Transient transfection was performed with Effectene (Qiagen) using plasmids purified with the Plasmid Midi Kit (Qiagen). DNA quantities were equalized for transfection by addition of empty pAc vector. The induction of CkIIα under the control of the metallothionein promoter was achieved by adding 500 µM CuSO4 to the cells 1 d after transfection. For luciferase activity assays, 10⁶ cells were seeded in six-well plates, left to proliferate in serum-free medium for 48 h, transfected in serum-free medium, supplemented with serum and antibiotics 4 h later, and harvested 48 h posttransfection. Cells were washed in PBS and lysed on plate with Passive Lysis Buffer according to the Dual-Luciferase Reporter Assay System manual (Promega). Lysates were cleared by centrifugation at 4°C and 10 µl of supernatant was measured for firefly and Renilla luciferase activities with the Dual-Luciferase Reporter Assay System (Promega) on a Mithras LB 940 luminometer (Berthold Technologies). Firefly luciferase activities were normalized to corresponding Renilla luciferase activities to control for transfection efficiency and protein concentration. Experiments were performed in duplicate or quadruplicate and repeated at least twice. For degradation assays, cells were seeded in 60 mm dishes (2.5×10⁶ cells/dish) and treated with per and tim dsRNA (37.5 µg) in serum-free medium for 48 h followed by transfections in medium containing serum and antibiotics. One day posttransfection, cells were split in four equal volumes and seeded in 12-well plates followed by induction with CuSO4. One day after induction, cycloheximide (CHX, Sigma-Aldrich) was added to each well at a final concentration of 0.58 mM, and cells were harvested 0, 3, 6, and 9 h after the beginning of CHX treatment.
After harvest, cells were centrifuged for 5 min at 2,000 g at 20°C, washed once with PBS, and pellets were frozen at −80°C until extraction. Protein extraction was achieved by lysing cells in 40 µl of HE buffer (described above) supplemented with 0.5% Triton X-100 by means of pipetting and vortexing. After centrifugation for 10 min at 14,000 rpm at 4°C, supernatants were subjected to Bradford assay. We used 20 µg protein for polyacrylamide gel electrophoresis. Blots were revealed with anti-V5 for CLK and with anti-MHC as a loading control. Both blots were quantified by ImageJ. V5 reactivity was normalized to MHC reactivity for each sample, which was used for the calculations that are plotted in Figure 2E. In Vitro Phosphorylation Assays In vitro transcription/translation and phosphorylation reactions were carried out as described previously [30], with the following differences: CLK protein with an N-terminal 6-histidine fusion tag as well as PER and TIM were expressed in the TNT SP6-Quick Coupled High Yield Wheat Germ expression system (Promega) for 2 h at 25°C with the addition of 0.2 mM staurosporine to block phosphorylation. Subsequently CLK protein was precipitated with 20 µl nickel-nitrilotriacetate (Ni-NTA) agarose for 90 min at 4°C either with or without prior addition of PER or TIM expressing lysates. Affinity purified CLK protein with or without co-precipitated PER or TIM was subjected to on-bead phosphorylation reactions by human casein kinase II holoenzyme (New England Biolabs) or recombinant human CK2α subunit (KinaseDetect, DK-5792 Aarslev, Denmark) in 50 µl phosphorylation buffer (20 mM Tris-HCl, pH 7.5, 50 mM KCl, 10 mM MgCl2) with 0.5 µCi/µl γ–32P-ATP at 30°C for the holoenzyme and at 37°C for CK2α. The amount of CLK-incorporated 32P-phosphate was quantified by autoradiography and densitometry after SDS-PAGE electrophoresis and blotting to nitrocellulose membrane.
The intensity of the 32P-signal was normalized by total CLK protein level, as quantified by Western blot analysis. Quantitative RT-PCR Total RNA was prepared from adult heads (about 35) using the Promega SV Total RNA Isolation System. It was quantified using the Nanodrop ND-1000 spectrophotometer, and the integrity of the RNA was verified using the Agilent 2100 bioanalyser with the eukaryote total RNA Nano assay. RNA was treated with rDNase (NucleoSpin RNA Kit, Macherey-Nagel) in solution after RNA isolation to ensure optimal conditions for pre-mRNA detection. One µg of total RNA was reverse-transcribed in a 50 µl final reaction in the presence of 0.4 µM oligodT(15) or random hexamer primers (for detection of pre-mRNA-s), 8 mM dNTP, 40 units of RNasin, and 400 units of M-MLV RTase H-minus (Promega), for 3 h at 37°C. Quantitative PCR was performed with a Roche LightCycler (mRNA-s) or an Applied Biosystems 7900HT Fast Real-Time PCR System (pre-mRNA-s) using the SYBR green detection protocol of the manufacturer. We mixed 3 µl of a 25× diluted cDNA (or 1 ng/µl) with FastStart DNA MasterPLUS SYBR green I mix with 500 nM of each primer, and the reaction mix was loaded on the capillaries and submitted to 40 cycles of PCR (95°C/15 s; 60°C/10 s; 72°C/20 s for the LightCycler and 50°C 2 min; 95°C/20 s; [95°C/1 s–60°C/25 s]×40 for the ABI instrument), followed by a melting cycle in order to analyze the melting curve of the PCR products. A negative control without reverse transcriptase was included to verify the absence of genomic DNA contaminants. Primers (see Table S2) were defined within exons (for mRNA-s) or in one intron and one exon (for pre-mRNA-s) using the PrimerSelect program of the Lasergene software (DNAStar). BLAST searches were performed to confirm gene specificity and the absence of multilocus matching at the primer site.
The amplification efficiencies of primers were calculated from the slopes of the standard curves obtained by a 10-fold dilution series of 4, with all experimental points falling within this range. The efficiency of the q-PCR amplifications for all pairs of primers is indicated in the table. Amplification specificity for each q-PCR reaction was confirmed by dissociation curve analysis. Determined Ct values (see Table S2) were then used for quantification, with the tubulin gene as reference. Each sample measurement was made at least in duplicate (technical replicate). Immunolabeling of Adult Brains Experiments were done on whole-mounted adult brains as previously described [46]. Primary antibodies were rabbit anti-PER [66] at 1∶15,000, guinea pig GP47 anti-CLK [26] at 1∶15,000, mouse anti-PDF (Developmental Studies Hybridoma Bank) at 1∶50,000, and rabbit anti-CK2α (Abcam, ab81435) at 1∶100. Secondary goat antibodies (Life Technologies) were Alexa 647- or Alexa 594-conjugated anti-rabbit at 1∶5,000, Alexa 488- or Alexa 647-conjugated anti-guinea pig at 1∶2,000, and Alexa 594- or Alexa 488-conjugated anti-mouse at 1∶2,000. Fluorescence signals were analyzed with a Zeiss AxioImager Z1 microscope with an ApoTome structured illumination module and an AxioCam MRm digital camera. Images for subcellular localization of CK2α were acquired with a Zeiss LSM-700 confocal microscope. Fluorescence intensity of individual cells was quantified from digital images of single focal planes with the NIH ImageJ software. We calculated a fluorescence index, I = 100(S−B)/B, which gives the fluorescence percentage above background, where S (signal) is the fluorescence intensity and B (background) is the average intensity of the region adjacent to the positive cell. Index values were then averaged for the four PDF-positive s-LNv cells of 12–20 brain hemispheres for each time point.
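The fluorescence index defined above can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline: the function names and the example intensities are hypothetical, and the code only assumes that each cell contributes one signal value S and one adjacent-background value B, as stated in the text.

```python
from statistics import mean


def fluorescence_index(signal: float, background: float) -> float:
    """Return I = 100 * (S - B) / B, the percentage of fluorescence above background."""
    if background <= 0:
        raise ValueError("background intensity must be positive")
    return 100.0 * (signal - background) / background


def mean_index(cells):
    """Average the index over (signal, background) pairs, one pair per cell."""
    return mean(fluorescence_index(s, b) for s, b in cells)


# Hypothetical intensities (arbitrary units) for four s-LNv cells of one hemisphere.
example_cells = [(150.0, 100.0), (180.0, 120.0), (90.0, 60.0), (200.0, 100.0)]
print(mean_index(example_cells))
```

Because the index is a ratio to the local background, it is insensitive to uniform changes in image brightness, which is presumably why it is averaged across cells and hemispheres rather than using raw intensities.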
Fly Stocks and Constructs Drosophila melanogaster stocks were maintained on a 12 h∶12 h LD cycle on standard corn meal-yeast-agar medium at 25°C. ClkJrk is a dominant allele of Clk, which results in a truncated and highly unstable CLK protein [59]. per01 [60], tim01 [61], w;tim-gal4-62 [62], w;;gal1118 [46], per01w;;13.2(per(Δ)-HA10His) F21 [63], yw;;P{UAS-CkIIα.Tik} T1 [15], yw;P{UAS-CkIIα.L} 35 [41], w;UAS-FLAG-CkIIα [51], and lines carrying UAS transgenes encoding each of the five CK2β isoforms [64] have been previously described. The gal1118 driver line in the adult brain is expressed in the small and large LNv-s in addition to some few nonclock cells [46]. UAS-RNAi flies against CkIIβ (stocks 32377 and 106845) and CkIIα (stock 17520 R-2) are described in http://stockcenter.vdrc.at/control/main and http://www.shigen.nig.ac.jp/fly/nigfly/index.jsp, respectively. Both CkIIβ RNAi lines (32377 and 106845) were induced in all the experiments using CkIIβ RNAi except specifically indicated. The UAS-FLAG-CkIIβ construct was made by inserting a FLAG-CK2ß coding segment (kindly provided by A. Bidwai, West Virginia University) into the pUAST vector, and w;UAS-FLAG-CkIIβ transgenic flies were generated by standard procedures. For in vitro phosphorylation assays, Clk constructs with a 6-histidine fusion tag as well as per and tim were expressed from a SP6 promoter incorporated in a pAc-5.1 vector, as described previously [30]. The FMO02931 expression plasmid was obtained from the Drosophila Genomics Resource Center (DGRC). It contains the full CkIIα ORF tagged C-terminally with FLAG and HA and driven by the metallothionein promoter. We verified the CkIIα ORF and the promoter region by sequencing. Behavioral Analysis Behavioral assays for locomotor activity rhythms were carried out with 1- to 5-d-old males at 25°C in Drosophila activity monitors (TriKinetics). Illumination was provided by standard white fluorescent low-energy bulbs. 
Light intensity at fly level was in the range of 300–1000 µW/cm2. Flies were first entrained to 12 h∶12 h light-dark (LD) cycles for 4 d and then transferred to constant darkness (DD). Activity data were analyzed from the second to the ninth day in DD. Data analysis was done with the FaasX 1.16 software that is derived from the Brandeis Rhythm Package (see [65]) and is freely available upon request (Apple Mac OSX only). Rhythmic flies were defined by χ2 periodogram analysis of an 8-d dataset with the following criteria (filter ON): power ≥20, width ≥1.5 h, with no selection on period value. Power and width represent the height and width of the periodogram peak, respectively, and give the significance of the calculated period. Genotypes with a reduced number of rhythmic flies (<50%), low power (<50), and high s.e.m. of the period (>1) are considered arrhythmic. Experiments were reproduced two or three times with very similar results. Protein Sample Preparation, Phosphatase Treatment, Sonication, and Western Blotting We entrained 1 to 5-d-old flies to 12 h∶12 h LD cycles for 4 d and transferred to DD (CT0 is 12 h after the last lights-OFF). Flies were collected on dry ice during the first day of DD (CT0–24). We homogenized 30–60 heads on ice in a modified RBS buffer [20]: 10 mM HEPES pH 7.5, 5 mM Tris-HCl pH 7.5, 50 mM KCl, 10% glycerol, 2 mM EDTA, 1% Triton X-100, 0.4% NP-40, 1 mM DTT, Complete Mini Protease Inhibitor Cocktail Tablet (Roche), Phosphatase Inhibitor Cocktail 2 and 3 (Sigma-Aldrich), and 20 mM β-glycerophosphate (3–4 µl buffer/head). A Brinkmann Heidolph Mechanical Overhead Stirrer RZR1 was used for the homogenization. After 1 min of extraction, tubes were incubated in ice for 30 min, then homogenized again for another minute. If sonication was included after this step, samples were sonicated on ice with a Vibracell ultrasonic processor (Bioblock Scientific) at 4W output for 5×10 s with 1 s breaks. 
Following Bradford protein concentration measurement (BioRad), supernatants were used for polyacrylamide gel electrophoresis. When supernatants were treated with λ protein phosphatase, 1,600 units of λ protein phosphatase (New England Biolabs) and 1 mM MnCl2 were added to sonicated extracts prepared in phosphatase inhibitor-free buffer [10 mM HEPES pH 7.5, 100 mM KCl, 0.1 mM EDTA, 5% glycerol, 0.1% Triton X-100, 5 mM DTT, and EDTA-free Complete Mini Protease Inhibitor Cocktail Tablet (Roche)] and subsequently incubated for 30 min at 30°C. Reaction was stopped by adding 1× NuPAGE LDS sample buffer (Life Technologies), 500 mM DTT, and incubation for 10 min at 70°C. We loaded 50 µg total protein on Novex 4% Tris-Glycine precast gels (Life Technologies) for PER, TIM, and CLK immunoblotting, except specifically indicated. When indicated, NuPAGE Novex 3–8% Tris-Acetate gels were used for TIM and CLK immunoblotting for a better resolution of hyperphosphorylated forms. Samples (50 µg) for FLAG, CK2α, and CK2β immunoblots were run on NuPAGE Novex 4–12% Bis-Tris precast gels (Life Technologies). Electrophoresis and blotting were done according to the manufacturer's instructions except for a 3 h running time for Tris-Acetate and a 2 h running time for Tris-Glycine gels. Equal loading was verified by Ponceau S staining on blotting membranes, which were blocked in 5% nonfat dry milk in TBST (Tris-Buffered Saline with 0.1% Tween-20) for 1 h at 25°C and then incubated with the primary antibody overnight at 4°C. The following primary antibodies were used diluted in 5% milk in TBST: rabbit anti-V5 (Sigma-Aldrich V8137, Lot 019K4827) at 1∶4,000, rabbit anti-myosin heavy chain (MHC, kind gift of Roger E. 
Karess, Institut Jacques Monod, Paris) at 1∶400,000, rabbit anti-CK2α (Abcam ab81435) at 1∶1,000, mouse anti-CK2β (Calbiochem 6D5 218712) at 1∶1,000, rat anti-TIM [7] at 1∶2,000, goat anti-CLK (Santa Cruz Biotechnology sc27070) at 1∶1000, rabbit anti-PER [66] at 1∶10,000, guinea pig anti-VRILLE [44] at 1∶5,000, and guinea pig anti-PDP1ε [67] at 1∶5,000. For immunoblotting of anti-FLAG immunoprecipitations, guinea pig GP90 anti-CLK [18] at 1∶1,000 was used since an aspecific IgG-derived band revealed with the SC27070 anti-CLK on the immunoprecipitates. Membranes were washed three times for 10 min, then the HRP-conjugated secondary antibodies (Santa Cruz Biotechnology) were added diluted in 5% milk in TBST: goat anti-rabbit (1∶10,000), goat anti-rat (1∶20,000), goat anti-mouse (1∶20,000), donkey anti-goat (1∶10,000), and goat anti-guinea pig (1∶10,000). In the case of anti-CK2β, TrueBlot ULTRA Anti-Mouse IgG-HRP (eBioscience, 1∶2,000) was used as a secondary antibody to circumvent problems resulting from primary antibody light chain detection after immunoprecipitation. Blots were revealed with the Amersham ECL Plus reagent (GE Healthcare). SimplyBlue SafeStain (Life Technologies) was used to stain membranes after blotting. Images were quantified with the NIH ImageJ (1.43 k) software after background subtraction. Calculations were done and histograms were generated with Microsoft Excel. Immunoprecipitation Fly head extracts were prepared as described above, except that HE buffer [20 mM HEPES pH 7.5, 150 mM KCl, 0.1 mM EDTA, 0.1% NP-40, 5% glycerol, Complete Mini Protease Inhibitor Cocktail Tablet (Roche), Phosphatase Inhibitor Cocktail 2 and 3 (Sigma-Aldrich) and 20 mM β-glycerophosphate] was used for the homogenization. We incubated 1 or 2 mg total protein overnight at 4°C with either 25 µl EZview Red anti-FLAG M2 Affinity Gel (Sigma-Aldrich) or 25 µl Protein G Sepharose (Pierce) mixed with 10 µl of anti-CLK antibody (Santa Cruz Biotechnology SC27069). 
Following three 10 min washes, bound complexes were eluted in 1× NuPAGE LDS sample buffer (Life Technologies) without DTT for 10 min at 70°C. Supernatants were complemented with DTT (500 mM) and reduced for 10 min at 70°C. Cell Culture Experiments Drosophila Schneider 2 (S2) cells [kind gift of Anne Plessis (Institut Jacques Monod, Paris)] were maintained in SFX-Insect Medium (HyClone) supplemented with 10% fetal bovine serum (Sigma-Aldrich) and 1% penicillin-streptomycin solution (Sigma-Aldrich) as previously described [68]. Complementary single-stranded RNAs were in vitro transcribed from purified PCR templates containing the T7 RNA polymerase promoter site on both ends, using the MEGAscript T7 Kit (Life Technologies). Reactions were purified with the MEGAclear Kit (Life Technologies) and precipitated in ethanol/sodium acetate for concentration followed by resuspension in 40 µl H2O and annealing of the two strands (30 min at 65°C and slow cooling to room temperature). RNA quality and quantity were assessed by spectrophotometry and agarose gel electrophoresis. Primers for per target sequence amplification by PCR were: TTAATACGACTCACTATAGGGAGAAAGGAGGACAGCTTCTGCTGC and TTAATACGACTCACTATAGGGAGAGATATGATCCCGGTGGCCGTG and for tim were: TTAATACGACTCACTATAGGGAGACTGGTTACTAGCAACTCCGCA and TTAATACGACTCACTATAGGGAGAGCAGGATATTTCTCAGCAGCA. pAc-Clk-V5/His6 [10], pAc-Renilla luciferase (kind gift of M. Rosbash), and p3x69-luc (containing three copies of per E-box as enhancer element [10],[69]) have been described previously. Transient transfection was performed with Effectene (Qiagen) using plasmids purified with the Plasmid Midi Kit (Qiagen). DNA quantities were equalized for transfection by addition of empty pAc vector. The induction of CkIIα under the control of metallothionein promoter was achieved by adding 500 µM CuSO4 to the cells 1 d after transfection. 
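The four dsRNA template primers listed above all begin with the same 24-nt stretch, which contains the T7 promoter. The following is a minimal Python sketch of how that shared prefix can be checked and stripped to recover the gene-specific part of each primer. The tim primers are written here as continuous sequences, the exact boundary between the shared prefix and the gene-specific part is an assumption inferred from the common prefix, and the names `T7_PREFIX`, `PRIMERS`, and `gene_specific_part` are hypothetical.

```python
# Shared 5' prefix of the four primers. It contains the minimal T7
# promoter TAATACGACTCACTATAGGG; where the prefix ends is an assumption
# inferred from the sequence common to all four primers.
T7_PREFIX = "TTAATACGACTCACTATAGGGAGA"

# Primer sequences as given in the text (tim primers written as
# continuous sequences). The dictionary keys are hypothetical labels.
PRIMERS = {
    "per_1": "TTAATACGACTCACTATAGGGAGAAAGGAGGACAGCTTCTGCTGC",
    "per_2": "TTAATACGACTCACTATAGGGAGAGATATGATCCCGGTGGCCGTG",
    "tim_1": "TTAATACGACTCACTATAGGGAGACTGGTTACTAGCAACTCCGCA",
    "tim_2": "TTAATACGACTCACTATAGGGAGAGCAGGATATTTCTCAGCAGCA",
}


def gene_specific_part(primer: str) -> str:
    """Return the part of a primer that follows the shared T7 prefix."""
    if not primer.startswith(T7_PREFIX):
        raise ValueError("primer does not start with the expected T7 prefix")
    return primer[len(T7_PREFIX):]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, gene_specific_part(seq))
```

A check of this kind is mainly useful for catching transcription errors in primer lists, since a primer lacking the promoter prefix would not yield a template that can be transcribed from both ends.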
For luciferase activity assays, 10⁶ cells were seeded in six-well plates, allowed to proliferate in serum-free medium for 48 h, transfected in serum-free medium, supplemented with serum and antibiotics 4 h later, and harvested 48 h posttransfection. Cells were washed in PBS and lysed on the plate with Passive Lysis Buffer according to the Dual-Luciferase Reporter Assay System manual (Promega). Lysates were cleared by centrifugation at 4°C and 10 µl of supernatant was measured for firefly and Renilla luciferase activities with the Dual-Luciferase Reporter Assay System (Promega) on a Mithras LB 940 luminometer (Berthold Technologies). Firefly luciferase activities were normalized to corresponding Renilla luciferase activities to control for transfection efficiency and protein concentration. Experiments were performed in duplicate or quadruplicate and repeated at least twice. For degradation assays, cells were seeded in 60 mm dishes (2.5×10⁶ cells/dish) and treated with per and tim dsRNA (37.5 µg) in serum-free medium for 48 h followed by transfections in medium containing serum and antibiotics. One day posttransfection, cells were split into four equal volumes and seeded in 12-well plates followed by induction with CuSO4. One day after induction, cycloheximide (CHX, Sigma-Aldrich) was added to each well at a final concentration of 0.58 mM, and cells were harvested 0, 3, 6, and 9 h after the beginning of CHX treatment. After harvest, cells were centrifuged for 5 min at 2,000 g at 20°C, washed once with PBS, and pellets were frozen at −80°C until extraction. Protein extraction was achieved by lysing cells in 40 µl of HE buffer (described above) supplemented with 0.5% Triton X-100 by means of pipetting and vortexing. After centrifugation for 10 min at 14,000 rpm at 4°C, supernatants were subjected to Bradford assay. We used 20 µg protein for polyacrylamide gel electrophoresis. Blots were revealed with anti-V5 for CLK and with anti-MHC as a loading control. 
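The normalization of firefly to Renilla luciferase activity described above for the luciferase assays amounts to a per-well ratio that is then averaged over replicate wells. The following is a minimal Python sketch of that calculation. The function names are hypothetical and the readings are made-up values used only for illustration, not data from the study.

```python
from statistics import mean


def normalized_activity(firefly: float, renilla: float) -> float:
    """Firefly signal divided by the Renilla signal of the same well."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def mean_normalized_activity(wells):
    """Average the firefly/Renilla ratio over replicate wells.

    `wells` is a sequence of (firefly, renilla) readings.
    """
    return mean(normalized_activity(f, r) for f, r in wells)


if __name__ == "__main__":
    # Illustrative readings only (hypothetical duplicate wells).
    duplicate_wells = [(1200.0, 300.0), (1000.0, 200.0)]
    print(mean_normalized_activity(duplicate_wells))
```

Taking the ratio within each well before averaging keeps well-to-well differences in transfection efficiency from inflating the variance of the reporter signal.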
Both blots were quantified by ImageJ. V5 reactivity was normalized to MHC reactivity for each sample, and the normalized values were used for the calculations plotted in Figure 2E. In Vitro Phosphorylation Assays In vitro transcription/translation and phosphorylation reactions were carried out as described previously [30], with the following differences: CLK protein with an N-terminal 6-histidine fusion tag as well as PER and TIM were expressed in the TNT SP6-Quick Coupled High Yield Wheat Germ expression system (Promega) for 2 h at 25°C with the addition of 0.2 mM staurosporine to block phosphorylation. Subsequently CLK protein was precipitated with 20 µl nickel-nitrilotriacetate (Ni-NTA) agarose for 90 min at 4°C either with or without prior addition of PER or TIM expressing lysates. Affinity purified CLK protein with or without co-precipitated PER or TIM was subjected to on-bead phosphorylation reactions by human casein kinase II holoenzyme (New England Biolabs) or recombinant human CK2α subunit (KinaseDetect, DK-5792 Aarslev, Denmark) in 50 µl phosphorylation buffer (20 mM Tris-HCl, pH 7.5, 50 mM KCl, 10 mM MgCl2) with 0.5 µCi/µl γ–32P-ATP at 30°C for the holoenzyme and at 37°C for CK2α. The amount of CLK-incorporated 32P-phosphate was quantified by autoradiography and densitometry after SDS-PAGE electrophoresis and blotting to nitrocellulose membrane. The intensity of the 32P-signal was normalized by total CLK protein level, as quantified by Western blot analysis. Quantitative RT-PCR Total RNA was prepared from adult heads (about 35) using the Promega SV Total RNA Isolation System. It was quantified using the Nanodrop ND-1000 spectrophotometer, and the integrity of the RNA was verified using the Agilent 2100 bioanalyser with the eukaryote total RNA Nano assay. RNA was treated with rDNase (NucleoSpin RNA Kit, Macherey-Nagel) in solution after RNA isolation to ensure optimal conditions for pre-mRNA detection. 
One µg of total RNA was reverse-transcribed in a 50 µl final reaction in the presence of 0.4 µM oligodT(15) or random hexamer primers (for detection of pre-mRNAs), 8 mM dNTP, 40 units of RNasin, and 400 units of M-MLV RTase H-minus (Promega), for 3 h at 37°C. Quantitative PCR was performed with a Roche LightCycler (mRNAs) or an Applied Biosystems 7900HT Fast Real-Time PCR System (pre-mRNAs) using the SYBR green detection protocol of the manufacturer. We mixed 3 µl of a 25× diluted cDNA (or 1 ng/µl) with FastStart DNA MasterPLUS SYBR green I mix with 500 nM of each primer, and the reaction mix was loaded on the capillaries and subjected to 40 cycles of PCR (95°C/15 s; 60°C/10 s; 72°C/20 s for the LightCycler and 50°C 2 min; 95°C/20 s; [95°C/1 s–60°C/25 s]×40 for the ABI instrument), followed by a melting cycle in order to analyze the melting curve of the PCR products. A negative control without reverse transcriptase was included to verify the absence of genomic DNA contaminants. Primers (see Table S2) were defined within exons (for mRNAs) or in one intron and one exon (for pre-mRNAs) using the PrimerSelect program of the Lasergene software (DNAStar). BLAST searches were performed to confirm gene specificity and the absence of multilocus matching at the primer site. The amplification efficiencies of primers were calculated from the slopes of the standard curves obtained by a 10-fold dilution series of 4, with all experimental points falling within this range. The efficiency of the q-PCR amplifications for all pairs of primers is indicated in the table. Amplification specificity for each q-PCR reaction was confirmed by dissociation curve analysis. Determined Ct values (see Table S2) were then used for quantification, with the tubulin gene as reference. Each sample measurement was made at least in duplicate (technical replicate). Immunolabeling of Adult Brains Experiments were done on whole-mounted adult brains as previously described [46]. 
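For the quantitative RT-PCR analysis described above, the amplification efficiency E (the fold increase in DNA per cycle, 1<E<2) is conventionally derived from the slope of the standard curve (Ct plotted against log10 of template concentration) as E = 10^(−1/slope). The text does not state the exact formulas used, so the following Python sketch is an assumption based on the standard efficiency-corrected approach with a single reference gene. All function names are hypothetical and the numbers are illustrative, not data from the study.

```python
import math


def standard_curve_slope(log10_conc, ct):
    """Least-squares slope of Ct versus log10(template concentration)."""
    n = len(ct)
    mean_x = sum(log10_conc) / n
    mean_y = sum(ct) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, ct))
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    return sxy / sxx


def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency E = 10^(-1/slope); E = 2 is perfect doubling."""
    return 10 ** (-1.0 / slope)


def relative_quantity(ct_target, e_target, ct_ref, e_ref):
    """Target abundance relative to the reference gene (assumed formula).

    Each gene's starting amount is taken as proportional to E^(-Ct).
    """
    return (e_target ** -ct_target) / (e_ref ** -ct_ref)


if __name__ == "__main__":
    # Idealized 10-fold dilution series with perfect doubling per cycle.
    dilutions = [0.0, -1.0, -2.0, -3.0]
    cts = [20.0 + math.log2(10) * k for k in range(4)]
    e = efficiency_from_slope(standard_curve_slope(dilutions, cts))
    print(e, relative_quantity(22.0, e, 20.0, e))
```

Using a measured efficiency for each primer pair, rather than assuming perfect doubling, avoids systematic bias when the target and reference genes amplify at different rates.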
Primary antibodies were rabbit anti-PER [66] at 1∶15,000, guinea pig GP47 anti-CLK [26] at 1∶15,000, mouse anti-PDF (Developmental Studies Hybridoma Bank) at 1∶50,000, and rabbit anti-CK2α (Abcam, ab81435) at 1∶100. Secondary goat antibodies (Life Technologies) were Alexa 647- or Alexa 594-conjugated anti-rabbit at 1∶5,000, Alexa 488- or Alexa 647-conjugated anti-guinea pig at 1∶2,000, and Alexa 594- or Alexa 488-conjugated anti-mouse at 1∶2,000. Fluorescence signals were analyzed with a Zeiss AxioImager Z1 microscope with an ApoTome structured illumination module and an AxioCam MRm digital camera. Images for subcellular localization of CK2α were acquired with a Zeiss LSM-700 confocal microscope. Fluorescence intensity of individual cells was quantified from digital images of single focal planes with the NIH ImageJ software. We calculated a fluorescence index, I = 100(S − B)/B, which gives the fluorescence percentage above background, where S (signal) is the fluorescence intensity and B (background) is the average intensity of the region adjacent to the positive cell. Index values were then averaged for the four PDF-positive s-LNv cells of 12–20 brain hemispheres for each time point. Supporting Information Figure S1. CLK degradation is accelerated in tim > Tik flies. (A–E) Western blot of head extracts from flies collected at DD1. Time (h) is indicated as CT. Gray and black bars represent subjective day and subjective night, respectively. A CB stained band in the size range of CLK is used as a loading control for blots run on 4% gels. Brackets indicate hypo- and hyperphosphorylated forms of CLK. At least two independent experiments were performed for each blot. (A) Western blot of CLK protein in sonicated extracts of the indicated genotypes. Two representative examples are shown in addition to Figure 1A. (B) Western blot of CLK, PER, and TIM proteins as in (A) but from nonsonicated extracts of the indicated genotypes. 
(C) Sonicated extracts from the indicated genotypes were treated with or without λ protein phosphatase at the respective temperatures, and CLK protein was detected by Western blot. (D, Left) Western blot of CLK protein in nonsonicated head extracts from the indicated genotypes collected on the first day of constant darkness. CLK protein is shown on the immunoblot. A CB stained band in the size range of CLK is used as a loading control. (Right) CLK protein/Clk mRNA ratio of the indicated genotypes. Values from quantification of CLK bands of the left panel were divided with the values of RT-qPCR from Figure 1B. w; tim-gal4 at CT2 was set to 100. (E) CkIIα RNAi decreases CK2α protein abundance. Samples were run on a 4–12% Bis-Tris gel. Anti-CK2α primary antibody was used for the blot. One copy of tim-gal4 and two copies of the CkIIα RNAi construct were used for the experimental genotype. https://doi.org/10.1371/journal.pbio.1001645.s001 (PDF) Figure S2. CK2 affects CLK in a per0 background and impacts on Clk expression posttranscriptionally. (A, B, D) Western blot of head extracts from flies collected at DD1. Time (h) is indicated as CT. A CB stained band in the size range of CLK is used as a loading control for blots run on 4% gels. Brackets indicate hypo- and hyperphosphorylated forms of CLK. At least two independent experiments were performed for each blot. (A) Comparison of CLK phosphorylation states between per+tim+ sonicated extracts and their per0 tim0 counterparts. We loaded 100 µg protein. a, b, and c are different protein extracts from the same genotype at the same time point. w; tim-gal4/+(tim-gal4/+), per0 w; tim0 (per0 tim0), and per0 w; tim0 tim-gal4; UAS-CkIIαTik (per0 tim0 CkIIαTik) were used. (B, Left) Comparison between tim>Tik and controls in a per0 background for TIM and CLK [per0w; tim-gal4 (per0 +), per0w; tim-gal4; UAS-CkIIαTik (per0 CkIIαTik)]. a and b are different nonsonicated protein extracts from the same genotype at the same time point. 
We loaded 100 µg of extracts. Extracts were run on a 3–8% Tris-Acetate gel for TIM. (Middle) Quantitative RT-PCR measurements of Clk mRNA levels in heads of flies collected at DD1. Results are means of pooled values from two time points (CT2 and 14) with at least two independent samples for each time point. Error bars indicate s.e.m. Average values were normalized to the mean of the control (per0 w; tim-gal4) set to 100. Previous analysis of separate values at CT2 and CT14 indicated that they were similar (Table S1), justifying their common treatment (see above). (Right) CLK protein/Clk mRNA ratio of the indicated genotypes. Values from quantification of CLK bands of the left panel were divided with the values of RT-qPCR from the middle panel. per0w; tim-gal4 was set to 100. (C) Quantitative RT-PCR measurements of Clk pre-mRNA levels in head extracts of flies collected at DD1. Average values from three independent experiments were normalized to the mean of the control (w; tim-gal4) at CT0 set to 100. Error bars represent s.e.m. (D) Comparison between tim>Tik and controls in a per0 background for the VRI and PDP1ε proteins. Samples were resolved on a 3–8% Tris-Acetate gel. per0w; tim-gal4 (per0 +) and per0w; tim-gal4; UAS-CkIIαTik (per0 CkIIαTik) were used. (E) Quantitative RT-PCR measurements of cry mRNA levels in heads of flies collected at DD1. Results are means of pooled values from two time points (CT2 and 14, which gave similar values) with at least two independent samples for each time point. Error bars indicate s.e.m. Average values were normalized to the control (per0 w; tim-gal4) average values set to 100. https://doi.org/10.1371/journal.pbio.1001645.s002 (PDF) Figure S3. CkIIβ RNAi decreases CK2β protein abundance and causes PER and TIM accumulation. (A) Western blot of CK2β. Extracts of tim> CkIIβ RNAi (w; tim-gal4/106845; 32377/+) and tim-gal4/+ controls in a per+ background were run on a 4–12% Bis-Tris gel. 
VIIa, d, and c indicate different isoforms of CK2β [64]. (B) Quantification of CLK, PER, and TIM signal intensity on Western blots in tim> CkIIβ RNAi and tim-gal4/+ controls at six time points of DD1. Two independent experiments were quantified. Error bars stand for the difference of the respective values from each experiment and their mean. The intensities were normalized to the signal of a CB stained band. The highest intensity signal in w;tim-gal4/+ was set to 100. https://doi.org/10.1371/journal.pbio.1001645.s003 (PDF) Figure S4. CK2α overexpression induces a delay in TIM oscillation. (A) Western blot of nonsonicated head extracts from flies collected at DD1. A CB stained band in the size range of CLK is used as a loading control. tim > CkIIα and tim-gal4/+ controls are compared for TIM and PER. (B) Quantification of PER and TIM signal intensity from the Western blot in (A). The highest intensity signal in w;tim-gal4/+ was set to 100. (C) CK2 phosphorylates CLK in vitro. (Top) Wild-type CLK was translated with an N-terminal 6-histidine fusion tag in vitro, affinity purified either in the absence or presence of PER and TIM, and subjected to phosphorylation assays by incubation with γ–32P-ATP either in the absence (−) or presence (+) of CK2. Intensity of incorporated 32P-phosphate into CLK (32P) was analyzed by autoradiography, and total CLK protein levels (CLK) were determined by Western blot analysis. (Bottom) Quantification of CLK-incorporated 32P-phosphate after normalization toward total CLK protein levels. Average CLK phosphorylation from at least three experiments ± s.e.m. are shown in the figure with wild-type CLK set to 100. https://doi.org/10.1371/journal.pbio.1001645.s004 (PDF) Figure S5. per and tim transcription in tim > Tik and tim > CkIIβ RNAi animals. (A) Quantitative RT-PCR measurements of per and tim mRNA levels in head extracts of flies collected at CT2. tim>Tik and controls are compared in a per0 tim0 background. Mean mRNA levels +/− s.e.m. 
from at least three independent experiments are shown. Average values were normalized to the control mean (per0 tim0) set to 100. Genotypes: per0 w; tim0 (per0 tim0) and per0 w; tim0 tim-gal4; UAS-CkIIαTik (per0 tim0 CkIIαTik). (B) Quantitative RT-PCR measurements of per and tim mRNA levels in tim > CkIIβ-RNAi and control flies. Values were normalized to the maximum value (control at CT12) set to 100. Mean mRNA levels +/− s.e.m. from at least three independent experiments are shown. https://doi.org/10.1371/journal.pbio.1001645.s005 (PDF) Table S1. Mean values of quantitative RT-PCR results with associated s.e.m. Relative values of mRNA abundance measured by quantitative RT-PCR (see Materials and Methods) are indicated for samples on per0 or tim0 background collected at CT2 and CT14. Mean levels are normalized to the highest value in the control genotype (per0w;tim-gal4 or w;tim0 tim-gal4;) set to 100. The number of independent samples for each time point is shown in Figure 2 and Figure 6. https://doi.org/10.1371/journal.pbio.1001645.s006 (DOCX) Table S2. qPCR primer specifications. *This pair of primers was used in Figure 6B and Figure S2C. **These two pairs of primers were used in Figure 6B. Efficiency (E), DNA concentration ratio between cycles n+1 and n (1<E<2). R2, coefficient of determination of the calibration curve. Ct, Cycle threshold. https://doi.org/10.1371/journal.pbio.1001645.s007 (DOCX) Acknowledgments We thank F.R. Jackson for his generous gift of unpublished UAS-FLAG-CkIIβ flies and I. Edery, P.E. Hardin, J. Jia, R. Karess, M. Rosbash, R. Stanewsky, and the Drosophila Genomics Resource Center (Bloomington, U.S.A.) for fly stocks, antibodies, and other materials. Fly stocks were also provided by the Bloomington Drosophila stock center (United States), the Vienna Drosophila RNAi Center (Austria), and the National Institute of Genetics fly stock center (Mishima, Japan). We are grateful to A. Plessis, J. Menet, and especially A. 
Lamouroux for help with S2 cells, as well as Eric Jacquet for expert advice on quantitative RT-PCR. We also thank E. Chélot and M. Krecsmarik for help with microscopy, C. Vias for dissections, M. Boudinot for the FaasX software, and the members of the Rouyer lab for critical reading of the manuscript.
Molecular Composition and Ultrastructure of the Caveolar Coat Complex doi: 10.1371/journal.pbio.1001640 pmid: 24013648
Introduction Caveolae, plasma membrane invaginations with a diameter of about 50–80 nm and a characteristic flask-like shape, were first identified 60 y ago by electron microscopy [1],[2]. They are found in many different cell types, and are particularly abundant in endothelial cells, adipocytes, and muscle cells [3]–[5]. Caveolin 1 is the major integral membrane protein in caveolae, and is essential for their formation in nonmuscle cells [6]–[8]. There are two further caveolin proteins. Caveolin 2 has the same distribution as caveolin 1, and hetero-oligomerises with this protein, but is not essential for forming caveolae [9],[10]. Caveolin 3 is only expressed in striated muscle and is required for making caveolae in this tissue [11]. Mice lacking caveolin 1 have multiple phenotypes including hyperglycaemia, lipidosis, and changes in endothelial permeability [8],[12]–, and humans with a loss of function mutation in caveolin 1 are severely lipodystrophic [16]. The molecular and cellular causes of these phenotypes are not completely understood, but caveolae have been proposed to act in a variety of ways, including as endocytic vesicles, as mechanoprotective or mechanosensing membrane reservoirs, as regulators of lipid transport, and as scaffolds for signaling events [3]–[5],[17],[18]. Caveolins were long believed to be the sole protein component of caveolae, and they clearly have a central role in biogenesis of these structures. Direct evidence for this is provided by experiments showing that expression of caveolin in bacteria is sufficient to generate caveolae-like membrane vesicles [19]. Recently, however, the list of caveolar components has been considerably expanded, with the identification of cavin proteins [5],[20]–[26], EHD2 [27]–[29], and pacsin 2 [29],[30]. This implies that caveolar biogenesis and function involves a complex set of proteins, but how these proteins assemble physically and spatially to generate caveolae has yet to be fully elucidated. 
The cavins (cavin 1, 2, 3, and 4) localise to caveolae, and are important for their formation and dynamics [5],[20]–[26]. It should be noted that the cavin nomenclature is not the same as the standard gene names for this family (cavin 1, PTRF; cavin 2, SDPR; cavin 3, PRKCDBP; cavin 4, MURC; Cavin 3 is also frequently referred to as SRBC [5]). Cavin 1 is expressed in all cell types that express caveolins, and is essential for making caveolae in vivo [24],[31]. Phenotypes of cavin 1 knockout mice resemble those of caveolin 1 caveolin 3 double knockout mice, implying a central role for cavin 1 in caveolar biogenesis [14],[31],[32]. In contrast to cavin 1, expression of cavins 2, 3, and 4 is more cell and tissue-specific, with cavin 4 only being expressed in striated muscle [22],[33]. Intriguingly, cavin 2 is required for morphogenesis of caveolae in the endothelia of some tissues but not others, and cavin 3 appears to be dispensable for forming caveolae [34]. Cavins are present in large complexes that can be detected on sucrose gradients, and the apparent size of these complexes differs between tissues [34],[35]. Cavins can be co-immunoprecipitated with each other, and cavins 1 and 2 interact directly [21]. Overexpression of cavin 2 distorts and elongates caveolae, while cavin 3 depletion reduces intracellular transport of caveolar vesicles [20],[21]. These data suggest that in vivo caveolae may contain different complements of cavin proteins, and that cavins 2 and 3 may regulate caveolar function and dynamics in a cell-type-specific manner. How different complements of cavins can be incorporated into morphologically uniform caveolae remains unclear. Caveolae contain an estimated 140–180 caveolin molecules [36] and oligomerisation of caveolins is likely to be critical for caveolae formation. Oligomerisation is cholesterol-dependent, and occurs initially in the trans-Golgi network, resulting in 8S complexes with an estimated 14–16 caveolin molecules [10],[35],[37]. 
Upon vesicular transport to the plasma membrane, such caveolin oligomers somehow assemble into higher order oligomeric complexes, which are likely to constitute a key structural unit of the caveolar coat [38]. Cavin proteins first co-localise with caveolins after delivery to the plasma membrane, but the nature of this association is not clear, as cavins and caveolins do not co-fractionate on sucrose gradients of detergent-solubilised cell lysates [35]. Moreover, whether the additional caveolar proteins EHD2 and pacsin 2, both of which are likely to regulate caveolar function or dynamics in some way [27]–[30], associate with caveolins and/or cavins directly or with other determinants within caveolar membranes is unknown. Electron microscopy (EM) techniques have been used to try and ascertain whether there is a protein coat that surrounds caveolae, and to determine its organisation. Platinum and chromium coating of plasma membrane fragments suggests the presence of spiral ridges or striations on the bulb that can be detected by scanning EM [6],[39]. A similar distribution of densities is seen after high-pressure freezing and freeze-substitution [40], and in conventionally stained ultrathin sections periodic local maxima in electron density are observed around the caveolar membrane [41]. However, the organisation of such protein densities as well as the actual shape of caveolae is contingent on the fixation method used [42], and so the relationship between the striations observed by scanning EM and the protein densities seen by transmission EM is not clear. It has been suggested that the caveolar neck forms a separate domain distinct from the caveolar bulb, but neither the identity of the protein components around the bulb nor those around the neck are fully defined [41],[43]. Although some studies report that caveolins are found all around the caveolar bulb [44],[45], others report a more restricted distribution to the sides or neck of caveolae [46]. 
Inherent limitations of immuno-labeling, including the possibility that epitopes may not be equally accessible all over the caveolar surface and the reduced spatial resolution provided by combining primary and secondary antibodies, mean that it has been hard to address this issue unequivocally [43],[47]. Finally, the subcaveolar distribution of more recently identified components of caveolae, such as the cavins and EHD2, has yet to be addressed. In the work reported here we have addressed fundamental questions central to an understanding of the protein machinery responsible for generating caveolae. We determine the identity and biochemical properties of the complexes into which caveolins and cavins assemble. We find that cavins and caveolins, but not EHD2 and pacsin 2, are found in a specific 80S complex, which we term the caveolar coat complex. Both immuno-EM and EM labeling with MiniSOG fusion proteins [48] show that this unitary complex does indeed localize all around the caveolar bulb. EHD2 defines a spatially and biochemically distinct domain at the neck of caveolae, and is not required for formation of the coat complex. These data provide conceptual advances in our understanding of how caveolae are generated. Results Caveolins and Cavins Assemble Into an 80S Caveolar Coat Complex Caveolin 1 has been reported to exist as a labile high molecular weight complex of about 70S, as determined by velocity gradient centrifugation [35],[43]. Such oligomeric complexes of caveolin 1 are readily lost upon extraction of cells with detergents known to fully solubilize membranes rich in cholesterol and sphingolipids, and the apparent size of caveolin 1 complexes is highly sensitive to the nature of the detergent used for solubilisation (Figure S1) [35]. We looked for ways to stabilise complexes containing caveolin prior to cell lysis. 
These experiments were carried out in a clonal HeLa cell line stably expressing caveolin-1-GFP at about 20% of the level of endogenous caveolin 1 (Figure S2A). We found that cross-linking of live HeLa cells with the membrane permeable and reversible cross-linker DSP (dithiobis(succinimidylpropionate)) efficiently and reproducibly stabilised a high molecular weight complex containing caveolin 1 (Figure 1A). Upon cross-linking, caveolin 1 was found almost exclusively in a single sharp peak in fractions 8–10 of 10–40% sucrose velocity gradients (Figure 1A and Figure S1). This was the case even when cells were lysed in 2% w/v (70 mM) octyl-glucoside (OG), a condition where without cross-linker most caveolin 1 is found in the top four fractions of the gradient (Figure S1) [35]. Figure 1. Caveolins and cavins assemble into an 80S complex. (A) HeLa cells were cross-linked with DSP and solubilised in 1% Triton X-100/1% octyl glucoside. Lysates were fractionated on 10–40% sucrose gradients, followed by Western blotting of gradient fractions 1–12 using antibodies against caveolin 1, cavin 1, GFP, clathrin heavy chain, or flotillin 2. Gradients were prepared from cells expressing caveolin-1-GFP (top panel) or flotillin-2-GFP as a control (middle panel). The bottom panel shows a gradient from flotillin-2-GFP expressing cells that had not been cross-linked prior to cell lysis. The high molecular weight (HMW) peak 8–10 of caveolin 1 and cavin 1 is boxed. The distribution of molecular weight protein standards in the gradients is indicated on the bottom. (B) Caveolin-1-GFP or flotillin-2-GFP (to provide a control) was immuno-isolated from pooled gradient fractions 8–10, indicated “HMW” in (A). A control immuno-isolation was also performed from fractions 8–10 of cells expressing GFP. Eluted proteins were visualized by silver staining. Cavin 1, caveolin-1-GFP, cavin 3, and caveolins are indicated. 
The bands specific to the flotillin-2-GFP immunoprecipitation, marked with asterisks (*) are flotillin-2-GFP, and endogenous flotillins 1 and 2. (C) The caveolin 1 HMW complex was analyzed by LC-MS/MS, and the graph shows the number of peptides for each protein identified. Proteins are ranked left to right by number of peptides identified. (D) Western blots of the starting material (In) and unbound material (Un) after immuno-isolation of caveolin-1-GFP, flotillin-2-GFP, or GFP from pooled HMW fractions 8–10. Membranes were probed with anti-GFP or anti-cavin 1 antibodies. https://doi.org/10.1371/journal.pbio.1001640.g001 In the presence of cross-linker, cavin 1 co-fractionated precisely with caveolin 1 (Figure 1A), whereas without cross-linking cavin 1 was found in a broad peak in the centre of the gradient (centred on fractions 6/7, about 60S; [35]) and did not co-fractionate with caveolin 1 (Figure 1A). Using the profile of cellular 80S ribosomes and purified 60S ribosomal subunits as a reference, we estimated that the cross-linked high molecular weight caveolin and cavin complexes have a sedimentation rate of about 80S (Figure S2B). 80S caveolin complexes were also detected in control cell lines expressing flotillin-2-GFP or GFP alone, and so were not dependent on the presence of caveolin-1-GFP (Figure 1A). Caveolin-1-GFP had the same distribution as endogenous caveolin 1 in the gradient (Figure 1A). Cross-linking did not change the distribution of caveolin 1 or cavin 1 when studied by immunofluorescence, and did not alter the appearance or brightness of either protein as they co-localised in puncta that are likely to correspond to individual caveolae (Figure S3A and S3B). Therefore, cross-linking with DSP does not itself induce redistribution of cavins or caveolin 1 within cells before lysis. 
These results suggest that cross-linking of live cells with DSP stabilises caveolin/cavin interactions that are otherwise lost during solubilisation with detergents, and thereby allows identification of a large 80S complex containing cavins and caveolins. To determine the protein composition of the 80S complex, and to confirm that co-fractionation of caveolin 1 and cavin 1 after cross-linking indeed reflects co-assembly of both proteins into the same complex, caveolin-1-GFP was immuno-isolated from pooled fractions 8–10 (HMW, high molecular weight fractions) using magnetic anti-GFP beads (Figure 1B). Immuno-isolation from the same fractions of gradients of cell lysates from flotillin-2-GFP or GFP expressing cells served as controls. The complexes were washed extensively and eluted with a pH shift. Eluates were reduced with DTT to disassemble DSP cross-links, and analysed by SDS-PAGE and silver staining. This revealed the presence of four major bands specific to the caveolin-1-GFP immunoprecipitate (Figure 1B). Western blotting (Figure S1C), and excision of the relevant bands for analysis by mass spectrometry, both confirmed that these correspond to caveolin-1-GFP, cavin 1, cavin 3, and caveolins 1 and 2. Tandem mass spectrometry of the immuno-precipitated 80S complex identified all of the above proteins, and these were the only abundant proteins detected. Cavin 2 was also detected in the complex, though at significantly lower levels (Figure 1C, File S1). Western blotting of the isolated complex confirmed that all caveolar proteins co-purified specifically with caveolin-1-GFP, and not with affinity-purified flotillin-2-GFP complexes or mock purifications from GFP control cells (Figure S2C). Cavin 1 yielded the most tryptic peptides identified by mass spectrometry analysis of isolated 80S complexes, and was the strongest band on silver-stained gels (Figure 1B and 1C, File S1). 
In addition, immuno-isolation of caveolin-1-GFP from the HMW fractions caused cavin 1 to be efficiently depleted from these fractions, showing that the large majority of cellular cavin 1 is present in complexes with caveolin 1 (Figure 1D). Together these data show that caveolins 1 and 2 and cavins 1, 2, and 3 assemble into an 80S complex, and that cavin 1 is a major component of this complex. Moreover, we can state that there are no further abundant protein components in the isolated complex. We hereafter refer to this complex as the caveolar coat complex. There Is a Single Species of Caveolar Coat Complex To study the role of cavins 1, 2, and 3 in the assembly of the caveolar coat complex, we generated separate HeLa cell lines stably expressing each protein with a C-terminal TEV-GFP-10×His tag (from now on referred to as cavin-1, -2, or -3-GFP). All cavin fusion proteins localised to caveolae by light microscopy (Figure S4A, Table S1), and cavins 1, 2, and 3 co-localised extensively with each other (Figure S4B). Fusion proteins were expressed at low levels, with cavin-1-GFP being expressed at about 20% of endogenous cavin 1 (Figure S4C). Endogenous cavin 2 is difficult to detect in HeLa cells with available antibodies (although it is present, albeit at low levels, as it is detected by mass spectrometry, Figure 1E and File S1), so cavin-2-GFP is likely to be present at significantly higher levels than endogenous cavin 2 in the cavin-2-GFP cell line. The expression of endogenous cavin 3 was specifically down-regulated in several independent cavin-3-GFP-expressing clonal cell lines, resulting in cell lines expressing cavin-3-GFP instead of cavin 3. In the cell line used for the experiments presented here, expression of cavin-3-GFP was similar to that of cavin 3 in control cells (Figure S4C). We analysed the distribution of the three cavin-GFP constructs in gradients from cells that had been cross-linked with DSP prior to cell lysis. 
Upon cross-linking, all cavin-GFP fusion proteins, endogenous cavin 1 and cavin 3, as well as caveolin 1 co-fractionated in the 80S fractions 8–10 (Figure 2A and 2B). Cavin-2-GFP exhibited an additional minor peak in fraction 3, and its expression led to the dissociation of small amounts of cavin 1 from the caveolar coat complex into fractions 5–7. In contrast, cavin-1-GFP and cavin-3-GFP were exclusively found in the caveolar coat complex. These data show that the cavin-GFP fusion proteins are incorporated into the caveolar coat complex just like endogenous cavins. To check that the coat complex does not reflect association or cross-linking of the cavin proteins after lysis, HeLa cell lines stably transfected with either cavin-3-GFP or cavin-3-mCherry were grown in the same dish, cross-linked, and lysed. Subsequent immunoisolation with anti-GFP antibodies yielded complexes devoid of cavin-3-mCherry (Figure S3C), arguing that cavin complexes do not form after cell lysis.
Figure 2. There is a single species of caveolar coat complex. (A) Cross-linked and detergent solubilised (1% Triton X-100/1% octyl glucoside) HeLa cell extracts were fractionated by velocity centrifugation (10–40% sucrose), followed by Western blotting of fractions 1–12. Gradients were prepared from cells expressing either GFP (the control cells), cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP. Membranes were probed with antibodies against caveolin 1, cavin 1, cavin 3, or GFP. The high molecular weight (HMW) peak of caveolin 1 and cavins is boxed. The strong bands in the cavin 3 blot (marked with an asterisk (*); fractions 1–4) are nonspecific: they are also observed in cavin 3 knockout cells and in HeLa cells depleted of cavin 3 by siRNA (see Figure 5A). (B) Quantification of the distribution of cavin-1, -2, and -3-GFP, cavin 1, and caveolin 1 in velocity gradients as shown in (A). 
Relative protein amounts were determined by densitometry of Western blots. The data are expressed as an average percentage of protein in each fraction calculated from four independent experiments. GFP-expressing cells served as the control. (C) Cavin-1, -2, and -3-GFP were immuno-isolated from pooled gradient HMW fractions 8–10. Control immuno-isolations were performed from pooled fractions 8–10 of cells expressing GFP or flotillin-2-GFP. Eluted proteins were analysed by Western blotting using antibodies against GFP, cavin 1, caveolin 1, EHD2, and pacsin 2. Note the absence of EHD2 and pacsin 2 from the caveolar coat complex. (D) HMW complexes immuno-isolated from cross-linked HeLa cells expressing caveolin-1-GFP, or cavin-1, -2, or -3-GFP were analyzed by LC-MS/MS. The graph shows the number of peptides identified from caveolin 1 and cavin proteins in each complex. Other proteins identified by mass spectrometry are not shown. https://doi.org/10.1371/journal.pbio.1001640.g002
We asked whether the composition and stoichiometry of the complex is the same whichever cavin is used to isolate it. Western blotting of the complex, immuno-isolated separately from pooled fractions 8–10 (HMW fractions) of the gradients from each cavin-GFP cell line, showed that the isolated complexes are indeed indistinguishable with respect to the relative amounts of cavin 1 and caveolin 1 present in each complex (Figure 2C). To further demonstrate that the additional caveolar proteins pacsin 2 and EHD2 [27]–[30] are excluded from the coat complex, we carried out Western blotting of the isolated complex from each cavin-GFP cell line. As predicted by the mass spectrometry data (File S1), pacsin 2 and EHD2 could not be detected in isolated caveolar coat complexes (Figure 2C). Tandem mass spectrometry was used to further characterise the caveolar coat complex isolated from each cavin-GFP and the caveolin-1-GFP cell line. 
All of the core caveolin and cavin proteins were identified, and in line with our Western blotting data, peptides corresponding to cavin 1, cavin 3, and caveolin 1 were found in approximately equal numbers in all four immuno-isolates (Figure 2D). Cavin 1 peptides were slightly more abundant in the cavin-1-GFP cell line, and cavin 2 peptides were notably more abundant in the cavin-2-GFP cell line, consistent with overexpression of this latter fusion protein. These mass spectrometry data, coupled with the Western blot analysis in Figure 2A and 2C, show that the composition of the caveolar coat complex is constant whichever component is used for immuno-isolation, which in turn implies that the caveolar coat complex represents one specific species of macromolecular assembly.
The Caveolar Coat Complex Contains Cavins and Caveolins at Defined Stoichiometry, but Cavin 2 and Cavin 3 Compete for Binding Sites
We sought to determine the stoichiometry of the components of the caveolar coat complex. To this end, the complex was directly isolated from lysates of cross-linked cells, again using immuno-precipitation of either cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP from the relevant cell lines. Isolated complexes were separated by SDS-PAGE, and stained with the quantitative protein dye Sypro Ruby. Cavin-1-GFP, cavin-2-GFP, cavin-3-GFP, cavin 1, cavin 3, and caveolin 1 (and caveolin 2, which is not well resolved from caveolin 1) were clearly visible in such gels, with little background from contaminating proteins (Figure 3A). We used densitometric gel scans of each immuno-precipitate from at least six separate gels and experiments to measure the relative amount of each component present. Molecular weights calculated from amino acid sequence were used to derive an estimate of the relative molar ratios between exogenously expressed cavin-GFPs, endogenous cavin 1 and cavin 3, and endogenous caveolin for all immuno-isolates (Figure 3B–E). 
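The conversion from stained band intensity to a relative molar ratio can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes that the Sypro Ruby signal is proportional to protein mass, and the function name, band intensities, and molecular weights are hypothetical placeholders, not values from the study.

```python
def molar_ratio(intensity_a, mw_a_kda, intensity_b, mw_b_kda):
    """Return moles of protein A per mole of protein B.

    Assumes band intensity is proportional to protein mass, so that
    intensity / molecular weight is proportional to the number of moles.
    """
    moles_a = intensity_a / mw_a_kda
    moles_b = intensity_b / mw_b_kda
    return moles_a / moles_b


# Hypothetical numbers chosen only to illustrate the arithmetic:
# protein A: intensity 100 (arbitrary units), 40 kDa
# protein B: intensity 200 (arbitrary units), 20 kDa
ratio = molar_ratio(100.0, 40.0, 200.0, 20.0)
print(f"A : B = 1 : {1 / ratio:.1f}")
```

With these placeholder inputs, protein B carries twice the stain but half the mass per molecule, so there are four molecules of B for each molecule of A. Real measurements would be averaged over replicate gels before ratios are compared.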
Firstly, we found that both cavin-2-GFP and cavin-3-GFP are present in the complex at a molar ratio of slightly less than 1∶3 with cavin 1 (Figure 3B), so cavin 1 is clearly the most abundant of the three cavins. Secondly, we calculated a molar ratio of total cavin 1 to total caveolin of 1∶4 (Figure 3C); that is, one cavin 1 may bind to four caveolin molecules (the analysis does not discriminate between caveolin 1 and caveolin 2). This ratio was constant whichever cavin was used for immuno-isolation of the complex. Thirdly, the ratio of the total amount of cavin (i.e., cavin 1+cavin 2+cavin 3) to caveolin was also constant whichever cavin was used for immuno-isolation (Figure 3D). This suggests that there is a fixed number of binding sites in the complex.
Figure 3. The caveolar coat complex has a defined stoichiometry. (A) Cells expressing GFP, cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP were cross-linked and lysed in 1% Triton X-100/1% octyl glucoside, followed by immuno-isolation of the complexes using anti-GFP antibodies. Eluted proteins were separated by SDS-PAGE and stained with Sypro Ruby. (B–E) Quantification of the relative molar ratios between cavin-1, -2, and -3-GFP, cavin 1, cavin 3, and caveolins in the caveolar coat complex. Shown are the means ± standard error determined from 6–8 independent experiments. Student's t test was used to determine significant differences between samples. (F) Confocal microscopy to show levels of cavin 1, detected by indirect immunofluorescence, and cavin-3-GFP, in cells overexpressing high levels of cavin-2-mCherry. Cavin-3-GFP is stably transfected in these cells. Cavin-2-mCherry is shown in blue in the overlay panel, and the same cells are outlined in the other panels. (G) Quantification of fluorescence intensity from cells as in (F). Cavin 1 and cavin-3-GFP intensities are normalised so that the mean signal in cells without cavin-2-mCherry is always 1. 
Data from a single experiment are shown, each data point representing a single cell. The experiment was repeated three times with equivalent results. (H and I) These represent essentially the same experiments as in (F) and (G), except that here the effect of overexpressing cavin-3-mCherry in cells stably transfected with cavin-2-GFP was examined. https://doi.org/10.1371/journal.pbio.1001640.g003 If the ratio between total cavin and caveolin in the caveolar coat complex is fixed, then one would predict that overexpression of one cavin may reduce the abundance of another cavin within the complex. Indeed, the amount of cavin 3 present when the complex was immuno-isolated from cells overexpressing cavin-2-GFP was significantly reduced compared to that observed when cavin-1-GFP or cavin-3-GFP was used for immuno-isolation (Figure 3E). This implies that cavin 2 and cavin 3 may compete for binding sites within the coat complex. In order to test this, we examined the effects of overexpressing cavin-2-mCherry and cavin-3-mCherry at high levels, using immunofluorescence to assay whether overexpression perturbs the distribution of other cavins. Overexpressing cavin-2-mCherry caused a loss of cavin 3 from caveolar puncta without perturbing the distribution of cavin 1 (Figure 3F and 3G), and overexpressing cavin-3-mCherry caused loss of cavin 2, again without altering the distribution of cavin 1 (Figure 3H and 3I). Therefore, cavin 2 can displace cavin 3 from the caveolar coat complex, and vice versa. Variation in the relative amounts of cavin 2 and cavin 3 occurs between tissues in vivo [22],[34], so this competition is likely to have physiological relevance. The combined data imply that the single species of caveolar coat complex is composed of a defined number of cavins and caveolins. The core interaction between cavin 1 and caveolin occurs with a stoichiometry of around 1 cavin 1∶4 caveolin molecules. 
Changes in relative abundance imply that cavin 2 and cavin 3 compete for a defined number of binding sites within this single type of large 80S complex. Cavins 2 and 3 Form Distinct Subcomplexes with Cavin 1 and Caveolin 1 Given the above, we reasoned that the 80S caveolar coat complex might be constructed from specific cavin and caveolin subcomplexes. Partially disassembled coat complexes could yield additional information on the nature of such subcomplexes. To pursue this possibility, we quantified the distribution of each cavin fusion protein in sucrose velocity gradients from the appropriate cell lines after lysis with 1% Triton X-100 without prior cross-linking. We observed a bimodal distribution for caveolin 1 in all cell lines, with a minor peak in fraction 3 and a major peak in fraction 7 (Figure 4A and 4B). We suggest that this latter caveolin 1 peak is the 70S species identified previously [35]. Cavin-1-GFP co-fractionated with endogenous cavin 1, as expected, and formed complexes of about 60S [35]. Cavin-1-GFP expression had no effect on the distribution of endogenous cavin 1 and caveolin 1, as compared to control cells expressing GFP alone. Interestingly, however, cavin-2-GFP peaked in fraction 3, whilst cavin-3-GFP peaked in fractions 6 and 7. Moreover, expression of cavin-2-GFP resulted in a shift of endogenous cavin 1 towards low molecular weight fractions, while cavin-3-GFP expression caused cavin 1 to shift towards high molecular weight fractions. This implies that cavin 2 and cavin 3 form distinct subcomplexes with cavin 1, with the former being smaller or less stable in detergent than the latter. Affinity purification of cavin-GFP complexes from gradient fractions 3–5 or 6–8 confirmed this idea (Figure 4C). Cavin-2-GFP was much more abundant in the low molecular weight pool 3–5, while the amounts of cavin-1-GFP and cavin-3-GFP isolated from the two pools were approximately equal. 
In addition, while all cavin-GFP molecules co-immunoprecipitated endogenous cavin 1, interactions with caveolin 1 were only observed in fractions 6–8, and much more caveolin 1 associated with cavin-1-GFP and cavin-3-GFP than with cavin-2-GFP. We conclude that cavin 2 and cavin 3 form separate subcomplexes with cavin 1 that are distinct in terms of size and/or stability, as well as their affinity for caveolin 1.
Figure 4. Cavin 2 and cavin 3 form distinct subcomplexes with cavin 1 and caveolin 1. (A) Gradient fractions (10–40% sucrose) of non-cross-linked and detergent solubilised (1% Triton X-100) HeLa cell extracts were analysed by Western blotting. Gradients were prepared from cells expressing either GFP (as control cells), cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP. Membranes were probed with antibodies against caveolin 1, cavin 1, or GFP. Pooled fractions 3–5 and 6–8 used for immuno-isolation shown in (C) are boxed. (B) Quantification of the distribution of cavin-1, -2, and -3-GFP, cavin 1, and caveolin 1 in velocity gradients as shown in (A). Relative protein amounts were determined by densitometry of Western blots. The data are expressed as an average percentage of protein in each fraction calculated from four independent experiments. (C) Cavin-1, -2, and -3-GFP, flotillin-2-GFP, caveolin-1-GFP, or GFP were immuno-isolated from gradient fractions 3–5 or 6–8 as shown in (A). Eluted proteins were analysed by Western blotting using antibodies against GFP, cavin 1, or caveolin 1. (D) Caveolar coat complexes immuno-isolated from cross-linked HeLa cells stably expressing cavin-2-GFP were incubated with increasing concentrations of DTT to partially reduce cross-links and analyzed by Western blotting using anti-caveolin 1 or anti-cavin 1 antibodies. Monomeric and oligomeric species of caveolin 1 and cavin 1 are indicated. Note the 180 kDa cavin 1 trimer. 
https://doi.org/10.1371/journal.pbio.1001640.g004
If cavin 2 and 3 do indeed form separate subcomplexes with cavin 1, then one would predict that cavin 1 may form complexes with cavin 2 or 3 even in the absence of caveolin 1, and that cavin 2 and 3 should not co-precipitate unless cavin 1 is present. We carried out experiments to test these predictions. There is a marked reduction in the expression of cavin 2 and cavin 3 in cavin 1 or caveolin 1 gene knockout mice and cell lines [23],[24],[31],[34], so we transiently transfected plasmids for overexpressing cavins 1, 2, or 3 as mCherry or GFP fusion proteins into mouse embryonic fibroblasts (MEFs) from cavin 1 and caveolin 1 knockout mice [7],[31],[34]. Western blotting of cell lysates from the transfected cells showed that the fusion proteins of all three cavins could be detected, although expression of cavin-3-mCherry was very low unless cavin-1-GFP was also present (Figure S5A). In caveolin 1 knockout MEFs, immuno-isolation of cavin-1-GFP co-precipitated cavin-2-mCherry and cavin-3-mCherry (Figure S5A). This is consistent with previous studies showing that cavins form high molecular weight complexes in the absence of caveolin 1, and that cavins 1 and 2 bind to each other directly in vitro [21],[22],[34]. In order to ascertain whether cavins 2 and 3 can interact without cavin 1, we compared co-precipitation of cavin-2-GFP and cavin-3-mCherry in control and cavin 1 knockout MEFs (Figure S5B). In control MEFs co-precipitation was detected, but this was lost when cavin 1 was absent, arguing that cavin 2 and cavin 3 do not interact directly, and so do indeed form separate subcomplexes with cavin 1. To further identify protein–protein interactions within the caveolar coat complex, we used Western blotting to look for interactions between the isolated components after immunoisolation of cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP followed by titration of DTT to partially dissociate cross-links. 
Complexes purified by isolation of cavin-2-GFP are shown in Figure 4D, and immunoisolation of all three cavins is shown in Figure S6. We found that the disassembly of cross-linked complexes with titration of DTT is precisely the same whichever cavin is used for immuno-isolation, providing additional confirmation that there is one single type of caveolar coat complex. Under nonreducing conditions (without DTT), around 50% of the total caveolin 1 in the caveolar coat complex was found in oligomers of about 350 kDa and more. The remainder of the caveolin 1 was present mostly as either monomers or dimers (Figure 4D). Upon titration of DTT, high molecular weight oligomeric forms of caveolin 1 were reduced to a distinct caveolin oligomer of about 350–400 kDa, which was stable in up to 10 mM DTT and even clearly identifiable under fully reducing conditions (not shown). More minor cross-linked species consistent with the presence of caveolin 1 oligomers increasing in size from 2 to 8 caveolin 1 monomers were not stable in DTT. This suggests that the 350–400 kDa oligomeric form of caveolin 1 is a major component of the caveolar coat complex [49],[50]. We then analyzed cavin 1 (Figure 4D and Figure S6). In nonreduced samples, the large majority of cavin 1 was cross-linked into oligomers of about 180 kDa and more. Monomeric cavin 1 (55 kDa) and a minor cavin 1 species of about 85 kDa were also observed. Interestingly, progressive addition of DTT revealed a relatively stable oligomeric form of cavin 1 of about 180 kDa, a size indicative of a cavin 1 trimer. This form of cavin 1 was found in all immunoisolates and was stable in up to 10 mM DTT (Figure 4D and Figure S6). 
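As a rough check on the trimer assignment, the apparent size of the stable cavin 1 species can be divided by the monomer size quoted above. The sketch below performs only that arithmetic; the function name is hypothetical, and because cross-linked species may migrate anomalously on SDS-PAGE, the result should be read as an estimate and not as a measured copy number.

```python
def estimated_copies(apparent_kda, monomer_kda):
    """Apparent oligomer size divided by monomer size (both in kDa)."""
    return apparent_kda / monomer_kda


# Values taken from the text: stable cavin 1 species of about 180 kDa,
# monomeric cavin 1 of 55 kDa.
copies = estimated_copies(180.0, 55.0)   # about 3.3
nearest_integer = round(copies)          # 3, consistent with a trimer
print(f"estimated copies: {copies:.2f}, nearest integer: {nearest_integer}")
```

The non-integer excess over three monomers is within what one would expect from approximate gel sizing, which is why the text describes the size as indicative of a trimer and does not claim an exact mass.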
Altogether, combining the data on disassembly of the non-cross-linked caveolar coat complex in detergent, co-precipitation in the absence of cavin 1, and partial reduction of cross-links yields specific conclusions: Subcomplexes containing cavin 1 and cavin 2 can be separated from subcomplexes containing cavin 1 and cavin 3, and cavins 2 and 3 do not enter the same complex unless cavin 1 is also present. Partial reduction of cross-links shows that cavin 1 forms a relatively stable trimer, and this trimer is likely to be a core element of the caveolar coat complex.
The Ratio Between Cavin 1 and Caveolin 1 in the Coat Complex Is Independent of Cavin 2, Cavin 3, and EHD2 Expression
To characterise the role of cavins 2 and 3 in the caveolar coat complex in more detail, we used siRNAs to deplete these proteins from the cavin-1-GFP cell line. In parallel, we also used siRNAs targeting EHD2, to ask whether this protein controls the assembly of the complex, and siRNAs targeting flotillin proteins as a negative control [51]. Depletion of all targeted proteins was highly efficient, as judged by Western blotting (Figure 5A and Figure S7A). Velocity gradient centrifugation showed that 80S complexes were clearly still present in all siRNA-treated cells, and the large majority of cavin 1, cavin 3, and caveolin 1 were still found in the HMW, 80S, fractions 8–10 (Figure 5B and Figure S7B). Lack of cavin 3 caused a marginal destabilisation of the 80S complex, as in this case some cavin 1 and caveolin 1 were detected in lower density fractions in longer exposures of the relevant Western blots (LMW pool, Figure 5B and Figure S7B). In these longer exposures the cavin 1 trimer described above is detected, even though these samples were run under fully reducing conditions. 
Most importantly, however, quantification of the ratio between the amounts of total cavin 1 and caveolin 1 present in the high molecular weight fractions (i.e., in the caveolar coat complex) confirmed that this was unchanged by any of the siRNA treatments (Figure 5C), which shows that the stoichiometry of a core interaction between cavin 1 and caveolin 1 is independent of the presence of cavins 2 or 3.
Figure 5. A core complex containing cavin 1 and caveolin 1 is independent of cavin 2, cavin 3, and EHD2 expression. (A) Western blotting of cell lysates from siRNA-treated cells. Blots were probed with the antibodies indicated. * indicates the nonspecific band obtained with anti-cavin-3 antibodies. (B) Western blotting of fractions 3 to 5 (LMW pool) and 8 to 10 (HMW pool: these fractions contain the caveolar coat complex) of sucrose velocity gradients analyzing the distribution of cavins and caveolin 1 from cells treated with the siRNAs shown. Blots of the entire gradients are shown in Figure S7B. (C) Quantification of the ratio between the amount of cavin 1 and caveolin 1 in fractions 8 to 10 (HMW pool: these fractions contain the caveolar coat complex) of sucrose velocity gradients as shown in the blots in (B). Quantification is from densitometric scans of Western blots, and is based on three separate experiments; bars are SEM. In each experiment, all values were normalized so that the intensity of relevant bands in control flotillin 1 and 2 siRNA treated samples was 1. https://doi.org/10.1371/journal.pbio.1001640.g005
Likewise, depletion of EHD2 had no detectable effect on the formation of the caveolar coat complex, having no effect on the behaviour of the complex in velocity gradients or on the relative amounts of cavin and caveolin proteins present (Figure 5B and 5C, Figure S7). EHD2, therefore, is not only absent from the complex but also does not regulate its formation. 
Given that EHD2 controls the dynamics and plasma membrane association of caveolae [27],[28], this implies that the caveolar coat complex is the same whether caveolae are continuous with the plasma membrane or form intracellular membrane vesicles.
Cavins 1, 2, and 3, and Caveolin 1 Are Uniformly Distributed Around the Caveolar Bulb
If the caveolar coat complex does indeed coat caveolae, then it should be found all around the caveolar bulb. We aimed to determine the localisation of the complex within caveolae at high spatial resolution. Firstly, the cell lines expressing caveolin-1-GFP, cavin-1-GFP, cavin-2-GFP, and cavin-3-GFP described above were studied by immuno-electron microscopy. Pre-embedding labeling using anti-GFP antibodies and nanogold-conjugated secondary antibodies, followed by silver enhancement, allowed highly specific labeling of caveolae (Figure 6A). We acquired images of more than 50 caveolae stained with gold particles per cell line, and superimposed both the membrane profiles and the position of gold particles for each case (Figure 6B and Figure S8). The aggregated images revealed that all of the cavin-GFP fusions, as well as caveolin-1-GFP, localised around the caveolar bulb, with no discernible bias towards the membrane proximal or distal region. In order to check that endogenous proteins have the same distribution as GFP fusions, we carried out immuno-labeling of untransfected cells with caveolin 1 and cavin 1 antibodies. The distribution of endogenous caveolin 1 and cavin 1 determined using this approach was indistinguishable from that of caveolin-1-GFP and cavin-1-GFP (Figure 6C). These data argue that caveolar coat complexes are distributed all around the membrane bulb of caveolae.
Figure 6. Cavins and caveolins are uniformly distributed around the caveolar bulb. 
(A) HeLa cell lines stably transfected with either caveolin-1-GFP, or cavin-1, -2, or -3-GFP were processed for immuno-electron microscopy, using anti-GFP primary antibodies, nanogold-conjugated secondary antibodies, and silver enhancement. Representative transmission EM images show specific immuno-labeling of caveolin-1-GFP. (B) Membrane profiles and the localisation of gold particles (as shown in A) from about 50 caveolae per cell line as shown were superimposed; labeling was with anti-GFP antibodies as in (A). (C) HeLa cells were processed for immuno-electron microscopy using anti-caveolin 1 or cavin 1 primary antibodies, nanogold-conjugated secondary antibodies, and silver enhancement. Images represent superimposition of around 50 caveolae. (D) Membrane profiles and the localisation of gold particles (as shown in A) from about 50 caveolae from GFP-EHD2 expressing cells were superimposed; labeling was with anti-GFP antibodies as in (A). https://doi.org/10.1371/journal.pbio.1001640.g006
Our biochemical data show that EHD2 is not present in the caveolar coat complex. To determine the distribution of EHD2 within caveolae, we used cells expressing GFP-EHD2 and immuno-labeling as above (Figure 6D). In clear contrast to the caveolar coat complexes, GFP-EHD2 was enriched around the neck of caveolae (Figure 6D). Therefore, caveolae are likely to have separate sets of proteins coating the bulb and neck regions, and these different distributions can be resolved by immuno-electron microscopy. Immuno-labeling has inherent limitations, including the possibility that epitopes may not be uniformly accessible or may only tolerate weak fixation, and the fact that complexes of primary and secondary antibodies reduce spatial resolution compared to the fine structural details otherwise delivered by electron microscopy. We aimed to directly visualise caveolar coat complexes by electron microscopy in situ. 
MiniSOG (for Mini Singlet Oxygen Generator) is a relatively small (106 amino acids) fluorescent flavoprotein that efficiently generates reactive oxygen species upon illumination with blue light. Local production of reactive oxygen species can be used to convert diaminobenzidine (DAB) into an osmiophilic electron-dense polymer. This allows proteins to be localised by EM at high spatial resolution [48],[52]. We generated separate cell lines expressing cavins 1, 2, and 3 as MiniSOG-mCherry fusion proteins (henceforth referred to as cavin-MiniSOG). In order to facilitate analysis of multiple caveolae, we used retinal pigment epithelial (RPE) cells, where caveolae polarise to the rear of the cell, as seen in other polarized cells such as fibroblasts (Figure S9B) [41],[53]. This yields defined regions of the cell that are very rich in caveolae. Cavin fusion proteins were expressed at low levels relative to the endogenous proteins, and localised to caveolae in a manner indistinguishable from the endogenous proteins by light microscopy (Figure S9A, Table S1). Photooxidation of glutaraldehyde-fixed cavin-MiniSOG expressing RPE cells in the presence of DAB resulted in the deposition of a brown reaction product (Figure 7A). Correlative electron micrographs of plastic embedded and osmium-stained sections revealed a high density of caveolae in such regions (Figure 7B). Caveolae were strongly labeled with an electron-dense stain, regardless of which cavin-MiniSOG was used for photooxidisation (Figure 7B and 7C). The staining was highly specific, as caveolar membranes from adjacent cells not expressing cavin-MiniSOG were not stained, the stain was restricted to regions of the cell enriched in caveolae, and the only cellular membranes stained were caveolae (Figure 7 and Figure S10). 
MiniSOG-generated stain from all three cavin fusion proteins was distributed right around the caveolar bulb, and was present at the same density at the lateral sides and at the apex of the bulb (Figure 7D). This is consistent with the immuno-electron microscopy presented in Figure 6, and again implies that caveolar coat complexes are present all around the caveolar bulb.
Figure 7. MiniSOG labeling shows that cavins are distributed around the caveolar bulb, with periodic density maxima. RPE cells stably transfected with cavin-1, -2, or -3-MiniSOG-mCherry were photooxidized and processed for correlative light and electron microscopy. (A) Representative light microscopy images showing an RPE cell expressing cavin-3-MiniSOG-mCherry before (left) and after photooxidation (middle), and after osmification and plastic embedding (right). (B) Correlative electron micrograph of the photooxidized region shown in (A). The bottom electron micrograph shows the boxed region shown in (B), acquired at higher magnification. (C) Representative transmission electron micrographs of RPE cells stably expressing cavin-1-MiniSOG-mCherry (top) and cavin-2-MiniSOG-mCherry (bottom). (D) Representative electron micrographs for all three cavin-MiniSOG-mCherry showing caveolae visibly open to the outside of the cell. Note that MiniSOG label is found all around the caveolar bulb. Asterisks highlight background contrast produced by osmium stain in adjacent cells. (E) Representative electron micrographs of photooxidized regions of RPE cells stably transfected with cavin-1, -2, or -3-MiniSOG-mCherry. Areas showing regular spacing between densities produced by MiniSOG are boxed and shown enlarged on the bottom. (F) Line scans illustrating periodic density maxima as shown in (E). (G) Quantification of the distances between density maxima produced by cavin-MiniSOG-mCherry (n = 176). 
https://doi.org/10.1371/journal.pbio.1001640.g007 Ultrastructure of the Caveolar Coat We acquired high-resolution micrographs to study the ultrastructural properties of the caveolar coat complex labeled using cavin-MiniSOG. In thin sections, the label was clearly not continuously distributed along the caveolar membrane, but rather formed a punctate, sometimes spike-like coat. This was observed for all cavin-MiniSOG proteins (Figure 7E). In some caveolae, individual puncta formed periodic densities with approximately regular spacing (Figure 7E and 7F). The spacing of periodic densities around the bulb was not measurably different whether cavin-1-MiniSOG, cavin-2-MiniSOG, or cavin-3-MiniSOG were used (Figure 7F). Quantification of the spacing of these local increases in density revealed a periodicity of 10–16 nm. The shortest distance we were able to measure in electron micrographs of thin sections was around 8 nm (Figure 7G). This shows that caveolar coat complexes form local densities on caveolar membranes with an apparent spacing of 10–16 nm and that MiniSOG labeling allows proteins to be localized with low nanometer precision. The fact that we could observe regular spacing between densities is suggestive of regularity in the coat. In order to reveal the organisation of the coat in three dimensions, dual-tilt tomograms were recorded from representative regions. Tomography confirmed that cavin-MiniSOG labeling extends all around the bulb (Figure 8A and Movie S1), and that caveolar coat complexes form periodic density maxima (Figure 8B and 8C). In order to estimate the degree of resolution of the MiniSOG label in our tomograms, line scans through individual densities were performed. Line scans perpendicular to the membrane showed that densities peaked sharply and exhibited a half maximum width of about 8–10 nm (Figure 8B). 
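The two line-scan measurements described above, spacing between density maxima and width at half maximum of a single density, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis actually used: the function names are hypothetical, the profile is assumed to be a plain list of intensities sampled at a known pixel size, and no smoothing or noise handling is included.

```python
def local_maxima(profile):
    """Indices of strict local maxima in a 1-D intensity profile."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] > profile[i + 1]]


def peak_spacings_nm(profile, pixel_size_nm):
    """Distances (nm) between consecutive local maxima along a line scan."""
    peaks = local_maxima(profile)
    return [(b - a) * pixel_size_nm for a, b in zip(peaks, peaks[1:])]


def fwhm_nm(profile, pixel_size_nm):
    """Full width at half maximum (nm) of a single-peaked profile.

    Assumes the profile falls to or below half maximum on both sides of
    the peak; crossings are located by linear interpolation.
    """
    base, top = min(profile), max(profile)
    half = base + (top - base) / 2
    idx = profile.index(top)

    def crossing(step):
        # Walk away from the peak until the signal is at or below half maximum.
        i = idx
        while 0 < i < len(profile) - 1 and profile[i] > half:
            i += step
        inner = i - step  # neighbouring sample on the peak side
        y_out, y_in = profile[i], profile[inner]
        if y_in == y_out:
            return float(i)
        # Interpolate between the outer sample and its inner neighbour.
        return i + (half - y_out) / (y_in - y_out) * (inner - i)

    return (crossing(+1) - crossing(-1)) * pixel_size_nm


# Synthetic example: three maxima 10 pixels apart at 1 nm per pixel.
scan = [0.0] * 25
for position in (2, 12, 22):
    scan[position] = 1.0
print(peak_spacings_nm(scan, 1.0))

# Synthetic triangular peak sampled at 1 nm per pixel.
print(fwhm_nm([0, 1, 2, 3, 4, 3, 2, 1, 0], 1.0))
```

Real micrograph line scans are noisy, so a practical analysis would smooth the profile or require a minimum peak prominence before measuring spacings; those steps are omitted here to keep the arithmetic visible.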
Line scans along caveolar membranes resolved densities separated by about 10 nm, confirming our previous data on thin sections (Figure 8C: compare to 7F). We carried out three-dimensional reconstructions of regions of the caveolar surface where such periodic density maxima were well resolved (Figure 8D). These reconstructions revealed both local maxima and apparent linear striations within the coat (Figure 8D and Movie S2). In some regions, the density maxima had a regular, lattice-like distribution, suggesting that the distribution of MiniSOG reflects an underlying higher-order lattice organisation of the caveolar coat. The three-dimensional reconstructions also reinforced the firm conclusion that the MiniSOG label, and hence the caveolar coat complex, are found all around the caveolar bulb without specific enrichment in the sides, apex, or neck of caveolae.
Figure 8. Tomographic reconstruction of the caveolar coat stained with cavin-3-MiniSOG. Tomograms were recorded from representative regions of photooxidized RPE cells stably transfected with cavin-3-MiniSOG-mCherry. (A) x/y projection of a typical tomogram acquired at 150 kV, 18k at a pixel size of 0.5 nm. The backprojection shown was binned by 2, resulting in a pixel size of 1 nm. (B and C) Line scans through density maxima observed in x/y projections. (B) Line scans were performed perpendicular to the membrane and the relative intensity plotted against distance, including error bars (n = 11). (C) Representative line scan along the membrane showing regular spacing between local maxima. Note that distances of about 8–10 nm can be resolved. (D) Three-dimensional representation of the caveolar coat stained with cavin-3-MiniSOG, for two separate caveolae. 
From left to right: x/y backprojections of representative caveolae; surfaces of the caveolae shown on the left; maximum intensity projection of the boxed area shown in column 2; surface rendering of the boxed area shown in column 2. https://doi.org/10.1371/journal.pbio.1001640.g008 Caveolins and Cavins Assemble Into an 80S Caveolar Coat Complex Caveolin 1 has been reported to exist as a labile high molecular weight complex of about 70S, as determined by velocity gradient centrifugation [35],[43]. Such oligomeric complexes of caveolin 1 are readily lost upon extraction of cells with detergents known to fully solubilize membranes rich in cholesterol and sphingolipids, and the apparent size of caveolin 1 complexes is highly sensitive to the nature of the detergent used for solubilisation (Figure S1) [35]. We looked for ways to stabilise complexes containing caveolin prior to cell lysis. These experiments were carried out in a clonal HeLa cell line stably expressing caveolin-1-GFP at about 20% of the level of endogenous caveolin 1 (Figure S2A). We found that cross-linking of live HeLa cells with the membrane permeable and reversible cross-linker DSP (dithiobis(succinimidylpropionate)) efficiently and reproducibly stabilised a high molecular weight complex containing caveolin 1 (Figure 1A). Upon cross-linking, caveolin 1 was found almost exclusively in a single sharp peak in fractions 8–10 of 10–40% sucrose velocity gradients (Figure 1A and Figure S1). This was the case even when cells were lysed in 2% w/v (70 mM) octyl-glucoside (OG), a condition where without cross-linker most caveolin 1 is found in the top four fractions of the gradient (Figure S1) [35]. Figure 1. Caveolins and cavins assemble into an 80S complex. (A) HeLa cells were cross-linked with DSP and solubilised in 1% Triton X-100/1% octyl glucoside. 
Lysates were fractionated on 10–40% sucrose gradients, followed by Western blotting of gradient fractions 1–12 using antibodies against caveolin 1, cavin 1, GFP, clathrin heavy chain, or flotillin 2. Gradients were prepared from cells expressing caveolin-1-GFP (top panel) or flotillin-2-GFP as a control (middle panel). The bottom panel shows a gradient from flotillin-2-GFP expressing cells that had not been cross-linked prior to cell lysis. The high molecular weight (HMW) peak 8–10 of caveolin 1 and cavin 1 is boxed. The distribution of molecular weight protein standards in the gradients is indicated on the bottom. (B) Caveolin-1-GFP or flotillin-2-GFP (to provide a control) was immuno-isolated from pooled gradient fractions 8–10, indicated “HMW” in (A). A control immuno-isolation was also performed from fractions 8–10 of cells expressing GFP. Eluted proteins were visualized by silver staining. Cavin 1, caveolin-1-GFP, cavin 3, and caveolins are indicated. The bands specific to the flotillin-2-GFP immunoprecipitation, marked with asterisks (*) are flotillin-2-GFP, and endogenous flotillins 1 and 2. (C) The caveolin 1 HMW complex was analyzed by LC-MS/MS, and the graph shows the number of peptides for each protein identified. Proteins are ranked left to right by number of peptides identified. (D) Western blots of the starting material (In) and unbound material (Un) after immuno-isolation of caveolin-1-GFP, flotillin-2-GFP, or GFP from pooled HMW fractions 8–10. Membranes were probed with anti-GFP or anti-cavin 1 antibodies. https://doi.org/10.1371/journal.pbio.1001640.g001 In the presence of cross-linker, cavin 1 co-fractionated precisely with caveolin 1 (Figure 1A), whereas without cross-linking cavin 1 was found in a broad peak in the centre of the gradient (centred on fractions 6/7, about 60S; [35]) and did not co-fractionate with caveolin 1 (Figure 1A). 
Using the profile of cellular 80S ribosomes and purified 60S ribosomal subunits as a reference, we estimated that the cross-linked high molecular weight caveolin and cavin complexes have a sedimentation rate of about 80S (Figure S2B). 80S caveolin complexes were also detected in control cell lines expressing flotillin-2-GFP or GFP alone, and so were not dependent on the presence of caveolin-1-GFP (Figure 1A). Caveolin-1-GFP had the same distribution as endogenous caveolin 1 in the gradient (Figure 1A). Cross-linking did not change the distribution of caveolin 1 or cavin 1 when studied by immunofluorescence, and did not alter the appearance or brightness of either protein as they co-localised in puncta that are likely to correspond to individual caveolae (Figure S3A and S3B). Therefore, cross-linking with DSP does not itself induce redistribution of cavins or caveolin 1 within cells before lysis. These results suggest that cross-linking of live cells with DSP stabilises caveolin/cavin interactions that are otherwise lost during solubilisation with detergents, and thereby allows identification of a large 80S complex containing cavins and caveolins. To determine the protein composition of the 80S complex, and to confirm that co-fractionation of caveolin 1 and cavin 1 after cross-linking indeed reflects co-assembly of both proteins into the same complex, caveolin-1-GFP was immuno-isolated from pooled fractions 8–10 (HMW, high molecular weight fractions) using magnetic anti-GFP beads (Figure 1B). Immuno-isolation from the same fractions of gradients of cell lysates from flotillin-2-GFP or GFP expressing cells served as controls. The complexes were washed extensively and eluted with a pH shift. Eluates were reduced with DTT to disassemble DSP cross-links, and analysed by SDS-PAGE and silver staining. This revealed the presence of four major bands specific to the caveolin-1-GFP immunoprecipitate (Figure 1B). 
Western blotting (Figure S1C), and excision of the relevant bands for analysis by mass spectrometry, both confirmed that these correspond to caveolin-1-GFP, cavin 1, cavin 3, and caveolins 1 and 2. Tandem mass spectrometry of the immuno-precipitated 80S complex identified all of the above proteins, and these were the only abundant proteins detected. Cavin 2 was also detected in the complex, though at significantly lower levels (Figure 1C, File S1). Western blotting of the isolated complex confirmed that all caveolar proteins co-purified specifically with caveolin-1-GFP, and not with affinity-purified flotillin-2-GFP complexes or mock purifications from GFP control cells (Figure S2C). Cavin 1 yielded the most tryptic peptides identified by mass spectrometry analysis of isolated 80S complexes, and was the strongest band on silver-stained gels (Figure 1B and 1C, File S1). In addition, immuno-isolation of caveolin-1-GFP from the HMW fractions caused cavin 1 to be efficiently depleted from these fractions, showing that the large majority of cellular cavin 1 is present in complexes with caveolin 1 (Figure 1D). Together these data show that caveolins 1 and 2 and cavins 1, 2, and 3 assemble into an 80S complex, and that cavin 1 is a major component of this complex. Moreover, we can state that there are no further abundant protein components in the isolated complex. We hereafter refer to this complex as the caveolar coat complex. There Is a Single Species of Caveolar Coat Complex To study the role of cavins 1, 2, and 3 in the assembly of the caveolar coat complex, we generated separate HeLa cell lines stably expressing each protein with a C-terminal TEV-GFP-10×His tag (from now on referred to as cavin-1, -2, or -3-GFP). All cavin fusion proteins localised to caveolae by light microscopy (Figure S4A, Table S1), and cavins 1, 2, and 3 co-localised extensively with each other (Figure S4B). 
Fusion proteins were expressed at low levels, with cavin-1-GFP being expressed at about 20% of endogenous cavin 1 (Figure S4C). Endogenous cavin 2 is difficult to detect in HeLa cells with available antibodies (although it is present, albeit at low levels, as it is detected by mass spectrometry, Figure 1C and File S1), so cavin-2-GFP is likely to be present at significantly higher levels than endogenous cavin 2 in the cavin-2-GFP cell line. The expression of endogenous cavin 3 was specifically down-regulated in several independent cavin-3-GFP-expressing clonal cell lines, resulting in cell lines expressing cavin-3-GFP instead of cavin 3. In the cell line used for the experiments presented here, expression of cavin-3-GFP was similar to that of cavin 3 in control cells (Figure S4C). We analysed the distribution of the three cavin-GFP constructs in gradients from cells that had been cross-linked with DSP prior to cell lysis. Upon cross-linking, all cavin-GFP fusion proteins, endogenous cavin 1 and cavin 3, as well as caveolin 1 co-fractionated in the 80S fractions 8–10 (Figure 2A and 2B). Cavin-2-GFP exhibited an additional minor peak in fraction 3, and its expression led to the dissociation of small amounts of cavin 1 from the caveolar coat complex into fractions 5–7. In contrast, cavin-1-GFP and cavin-3-GFP were exclusively found in the caveolar coat complex. These data show that the cavin-GFP fusion proteins are incorporated into the caveolar coat complex just like endogenous cavins. To check that the coat complex does not reflect association or cross-linking of the cavin proteins after lysis, HeLa cell lines stably transfected with either cavin-3-GFP or cavin-3-mCherry were grown in the same dish, cross-linked, and lysed. Subsequent immunoisolation with anti-GFP antibodies yielded complexes devoid of cavin-3-mCherry (Figure S3C), arguing that cavin complexes do not form after cell lysis. Figure 2. 
There is a single species of caveolar coat complex. (A) Cross-linked and detergent solubilised (1% Triton X-100/1% octyl glucoside) HeLa cell extracts were fractionated by velocity centrifugation (10–40% sucrose), followed by Western blotting of fractions 1–12. Gradients were prepared from cells expressing either GFP (the control cells), cavin 1-GFP, cavin 2-GFP, or cavin 3-GFP. Membranes were probed with antibodies against caveolin 1, cavin 1, cavin 3, or GFP. The high molecular weight (HMW) peak of caveolin 1 and cavins is boxed. The strong bands in the cavin 3 blot (marked with an asterisk (*); fractions 1–4) are nonspecific: they are also observed in cavin 3 knockout cells and in HeLa cells depleted of cavin 3 by siRNA (see Figure 5A). (B) Quantification of the distribution of cavin-1, -2, and -3-GFP, cavin 1, and caveolin 1 in velocity gradients as shown in (A). Relative protein amounts were determined by densitometry of Western blots. The data are expressed as an average percentage of protein in each fraction calculated from four independent experiments. GFP-expressing cells served as the control. (C) Cavin-1, -2, and -3-GFP were immuno-isolated from pooled gradient HMW fractions 8–10. Control immuno-isolations were performed from pooled fractions 8–10 of cells expressing GFP or flotillin-2-GFP. Eluted proteins were analysed by Western blotting using antibodies against GFP, cavin 1, caveolin 1, EHD2, and pacsin 2. Note the absence of EHD2 and pacsin 2 from the caveolar coat complex. (D) HMW complexes immuno-isolated from cross-linked HeLa cells expressing caveolin-1-GFP, or cavin-1, -2, or -3-GFP were analyzed by LC-MS/MS. The graph shows the number of peptides identified from caveolin 1 and cavin proteins in each complex. Other proteins identified by mass spectrometry are not shown. https://doi.org/10.1371/journal.pbio.1001640.g002 We asked whether the composition and stoichiometry of the complex are the same whichever cavin is used to isolate it. 
Western blotting of the complex, immuno-isolated from pooled fractions 8–10 (HMW fractions) of the gradients separately, from each cavin-GFP cell line showed that the isolated complexes are indeed indistinguishable with respect to the relative amounts of cavin 1 and caveolin 1 present in each complex (Figure 2C). To further demonstrate that the additional caveolar proteins pacsin 2 and EHD2 [27]–[30] are excluded from the coat complex, we carried out Western blotting of the isolated complex from each cavin-GFP cell line. As predicted by the mass spectrometry data (File S1), pacsin 2 and EHD2 could not be detected in isolated caveolar coat complexes (Figure 2C). Tandem mass spectrometry was used to further characterise the caveolar coat complex isolated from each cavin-GFP and the caveolin-1-GFP cell line. All of the core caveolin and cavin proteins were identified, and in line with our Western blotting data, peptides corresponding to cavin 1, cavin 3, and caveolin 1 were found in approximately equal numbers in all four immuno-isolates (Figure 2D). Cavin 1 peptides were slightly more abundant in the cavin-1-GFP cell line, and cavin 2 peptides were notably more abundant in the cavin-2-GFP cell line—consistent with overexpression of this latter fusion protein. These mass spectrometry data, coupled with the Western blot analysis in Figure 2A and 2C, show that the composition of the caveolar coat complex is constant whichever component is used for immuno-isolation, which in turn implies that the caveolar coat complex represents one specific species of macromolecular assembly. The Caveolar Coat Complex Contains Cavins and Caveolins at Defined Stoichiometry, but Cavin 2 and Cavin 3 Compete for Binding Sites We sought to determine the stoichiometry of the components of the caveolar coat complex. 
To this end, the complex was directly isolated from lysates of cross-linked cells, again using immuno-precipitation of either cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP from the relevant cell lines. Isolated complexes were separated by SDS-PAGE, and stained with the quantitative protein dye Sypro Ruby. Cavin-1-GFP, cavin-2-GFP, cavin-3-GFP, cavin 1, cavin 3, and caveolin 1 (and caveolin 2, which is not well resolved from caveolin 1) were clearly visible in such gels, with little background from contaminating proteins (Figure 3A). We used densitometric gel scans of each immuno-precipitate from at least six separate gels and experiments to measure the relative amount of each component present. Molecular weights calculated from amino acid sequence were used to derive an estimate of the relative molar ratios between exogenously expressed cavin-GFPs, endogenous cavin 1 and cavin 3, and endogenous caveolin for all immuno-isolates (Figure 3B–E). Firstly, we found that both cavin-2-GFP and cavin-3-GFP are present in the complex at a molar ratio of slightly less than 1∶3 with cavin 1 (Figure 3B), so cavin 1 is clearly the most abundant of the three cavins. Secondly, we calculated a molar ratio of total cavin 1 to total caveolin of 1∶4 (Figure 3C); that is, one cavin 1 may bind to four caveolin molecules (the analysis does not discriminate between caveolin 1 and caveolin 2). This ratio was constant whichever cavin was used for immuno-isolation of the complex. Thirdly, the ratio of the total amount of cavin (i.e., cavin 1+cavin 2+cavin 3) to caveolin was also constant whichever cavin was used for immuno-isolation (Figure 3D). This suggests that there is a fixed number of binding sites in the complex. Figure 3. The caveolar coat complex has a defined stoichiometry. 
(A) Cells expressing GFP, cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP were cross-linked and lysed in 1% Triton X-100/1% octyl glucoside, followed by immuno-isolation of the complexes using anti-GFP antibodies. Eluted proteins were separated by SDS-PAGE and stained with Sypro Ruby. (B–E) Quantification of the relative molar ratios between cavin-1, -2, and -3-GFP, cavin 1, cavin 3, and caveolins in the caveolar coat complex. Shown are the means ± standard error determined from 6–8 independent experiments. Student's t test was used to determine significant differences between samples. (F) Confocal microscopy to show levels of cavin 1, detected by indirect immunofluorescence, and cavin-3-GFP, in cells overexpressing high levels of cavin-2-mCherry. Cavin-3-GFP is stably transfected in these cells. Cavin-2-mCherry is shown in blue in the overlay panel, and the same cells are outlined in the other panels. (G) Quantification of fluorescence intensity from cells as in (F). Cavin 1 and cavin-3-GFP intensities are normalised so that the mean signal in cells without cavin-2-mCherry is always 1. Data from a single experiment are shown, each data point representing a single cell. The experiment was repeated three times with equivalent results. (H and I) These represent essentially the same experiments as in (F) and (G), except that here the effect of overexpressing cavin-3-mCherry in cells stably transfected with cavin-2-GFP was examined. https://doi.org/10.1371/journal.pbio.1001640.g003 If the ratio between total cavin and caveolin in the caveolar coat complex is fixed, then one would predict that overexpression of one cavin may reduce the abundance of another cavin within the complex. Indeed, the amount of cavin 3 present when the complex was immuno-isolated from cells overexpressing cavin-2-GFP was significantly reduced compared to that observed when cavin-1-GFP or cavin-3-GFP was used for immuno-isolation (Figure 3E). 
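The molar ratios quantified in Figure 3B–E rest on a simple conversion: if stain intensity is taken to be proportional to protein mass, dividing each band intensity by the molecular weight of the protein gives a relative molar amount. A minimal sketch of that arithmetic is given here; the function names and every numerical value are hypothetical placeholders, not data or code from this study.

```python
import math

def molar_ratio(intensity_a, mw_a, intensity_b, mw_b):
    """Moles of A per mole of B, assuming band intensity is proportional to mass."""
    return (intensity_a / mw_a) / (intensity_b / mw_b)

def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, math.sqrt(variance / n)

# Hypothetical band intensities (arbitrary units) and molecular weights (kDa).
ratio = molar_ratio(intensity_a=50.0, mw_a=40.0, intensity_b=100.0, mw_b=20.0)
# ratio is 0.25 mol of A per mol of B, i.e. a 1:4 molar ratio

# Hypothetical ratios from three repeat experiments.
mean, sem = mean_and_sem([0.20, 0.25, 0.30])
```

Because a smaller protein contributes less mass per molecule, equal band intensities do not imply equal copy numbers, which is why the molecular-weight correction matters when comparing cavins with the much smaller caveolins.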
The reduced amount of cavin 3 in complexes isolated via cavin-2-GFP implies that cavin 2 and cavin 3 may compete for binding sites within the coat complex. To test this, we examined the effects of overexpressing cavin-2-mCherry and cavin-3-mCherry at high levels, using immunofluorescence to assay whether overexpression perturbs the distribution of other cavins. Overexpressing cavin-2-mCherry caused a loss of cavin 3 from caveolar puncta without perturbing the distribution of cavin 1 (Figure 3F and 3G), and overexpressing cavin-3-mCherry caused loss of cavin 2, again without altering the distribution of cavin 1 (Figure 3H and 3I). Therefore, cavin 2 can displace cavin 3 from the caveolar coat complex, and vice versa. Variation in the relative amounts of cavin 2 and cavin 3 occurs between tissues in vivo [22],[34], so this competition is likely to have physiological relevance. The combined data imply that the single species of caveolar coat complex is composed of a defined number of cavins and caveolins. The core interaction between cavin 1 and caveolin occurs with a stoichiometry of around 1 cavin 1∶4 caveolin molecules. Changes in relative abundance imply that cavin 2 and cavin 3 compete for a defined number of binding sites within this single type of large 80S complex. Cavins 2 and 3 Form Distinct Subcomplexes with Cavin 1 and Caveolin 1 Given the above, we reasoned that the 80S caveolar coat complex might be constructed from specific cavin and caveolin subcomplexes. Partially disassembled coat complexes could yield additional information on the nature of such subcomplexes. To pursue this possibility, we quantified the distribution of each cavin fusion protein in sucrose velocity gradients from the appropriate cell lines after lysis with 1% Triton X-100 without prior cross-linking. We observed a bimodal distribution for caveolin 1 in all cell lines, with a minor peak in fraction 3 and a major peak in fraction 7 (Figure 4A and 4B). 
We suggest that this latter caveolin 1 peak is the 70S species identified previously [35]. Cavin-1-GFP co-fractionated with endogenous cavin 1, as expected, and formed complexes of about 60S [35]. Cavin-1-GFP expression had no effect on the distribution of endogenous cavin 1 and caveolin 1, as compared to control cells expressing GFP alone. Interestingly, however, cavin-2-GFP peaked in fraction 3, whilst cavin-3-GFP peaked in fractions 6 and 7. Moreover, expression of cavin-2-GFP resulted in a shift of endogenous cavin 1 towards low molecular weight fractions, while cavin-3-GFP expression caused cavin 1 to shift towards high molecular weight fractions. This implies that cavin 2 and cavin 3 form distinct subcomplexes with cavin 1, with the former being smaller or less stable in detergent than the latter. Affinity purification of cavin-GFP complexes from gradient fractions 3–5 or 6–8 confirmed this idea (Figure 4C). Cavin-2-GFP was much more abundant in the low molecular weight pool 3–5, while the amounts of cavin-1-GFP and cavin-3-GFP isolated from the two pools were approximately equal. In addition, while all cavin-GFP molecules co-immunoprecipitated endogenous cavin 1, interactions with caveolin 1 were only observed in fractions 6–8, and much more caveolin 1 associated with cavin-1-GFP and cavin-3-GFP than with cavin-2-GFP. We conclude that cavin 2 and cavin 3 form separate subcomplexes with cavin 1 that are distinct in terms of size and/or stability, as well as their affinity for caveolin 1. Figure 4. Cavin 2 and cavin 3 form distinct subcomplexes with cavin 1 and caveolin 1. (A) Gradient fractions (10–40% sucrose) of non-cross-linked and detergent solubilised (1% Triton X-100) HeLa cell extracts were analysed by Western blotting. Gradients were prepared from cells expressing either GFP (as control cells), cavin 1-GFP, cavin 2-GFP, or cavin 3-GFP. 
Membranes were probed with antibodies against caveolin 1, cavin 1, or GFP. Pooled fractions 3–5 and 6–8 used for immuno-isolation shown in (C) are boxed. (B) Quantification of the distribution of cavin 1, 2, and-3-GFP, cavin 1, and caveolin-1 in velocity gradients as shown in (A). Relative protein amounts were determined by densitometry of Western blots. The data are expressed as an average percentage of protein in each fraction calculated from four independent experiments. (C) Cavin 1, 2, and -3-GFP, flotillin-2-GFP, caveolin-1-GFP, or GFP were immuno-isolated from gradient fractions 3–5 or 6–8 as shown in (A). Eluted proteins were analysed by Western blotting using antibodies against GFP, cavin 1, or caveolin 1. (D) Caveolar coat complexes immuno-isolated from cross-linked HeLa cells stably expressing cavin-2-GFP were incubated with increasing concentrations of DTT to partially reduce crosslinks and analyzed by Western blotting using anti-caveolin 1 or anti-cavin 1 antibodies. Monomeric and oligomeric species of caveolin 1 and cavin 1 are indicated. Note the 180 kDa cavin 1 trimer. https://doi.org/10.1371/journal.pbio.1001640.g004 If cavin 2 and 3 do indeed form separate subcomplexes with cavin 1, then one would predict that cavin 1 may form complexes with cavin 2 or 3 even in the absence of caveolin 1, and that cavin 2 and 3 should not co-precipitate unless cavin 1 is present. We carried out experiments to test these predictions. There is a marked reduction in the expression of cavin 2 and cavin 3 in cavin 1 or caveolin 1 gene knockout mice and cell lines [23],[24],[31],[34], so we transiently transfected plasmids for overexpressing cavins 1, 2, or 3 as mCherry or GFP fusion proteins into mouse embryonic fibroblasts (MEFs) from cavin 1 and caveolin 1 knockout mice [7],[31],[34]. 
Western blotting of cell lysates from the transfected cells showed that the fusion proteins of all three cavins could be detected, although expression of cavin-3-mCherry was very low unless cavin-1-GFP was also present (Figure S5A). In caveolin 1 knockout MEFs, immuno-isolation of cavin-1-GFP co-precipitated cavin-2-mCherry and cavin-3-mCherry (Figure S5A). This is consistent with previous studies showing that cavins form high molecular weight complexes in the absence of caveolin 1, and that cavins 1 and 2 bind to each other directly in vitro [21],[22],[34]. In order to ascertain whether cavins 2 and 3 can interact without cavin 1, we compared co-precipitation of cavin-2-GFP and cavin-3-mCherry in control and cavin 1 knockout MEFs (Figure S5B). In control MEFs co-precipitation was detected, but this was lost when cavin 1 was absent, arguing that cavin 2 and cavin 3 do not interact directly, and so do indeed form separate subcomplexes with cavin 1. To further identify protein–protein interactions within the caveolar coat complex, we used Western blotting to look for interactions between the isolated components after immunoisolation of cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP followed by titration of DTT to partially dissociate cross-links. Complexes purified by isolation of cavin-2-GFP are shown in Figure 4D, and immunoisolation of all three cavins is shown in Figure S6. We found that the disassembly of cross-linked complexes with titration of DTT is precisely the same whichever cavin is used for immuno-isolation, providing additional confirmation that there is one single type of caveolar coat complex. Under nonreducing conditions (without DTT), around 50% of the total caveolin 1 in the caveolar coat complex was found in oligomers of about 350 kDa and more. The remainder of the caveolin 1 was present mostly as either monomers or dimers (Figure 4D). 
Upon titration of DTT, high molecular weight oligomeric forms of caveolin 1 were reduced to a distinct caveolin oligomer of about 350–400 kDa, which was stable in up to 10 mM DTT and even clearly identifiable under fully reducing conditions (not shown). More minor cross-linked species consistent with the presence of caveolin 1 oligomers increasing in size from 2 to 8 caveolin 1 monomers were not stable in DTT. This suggests that the 350–400 kDa oligomeric form of caveolin 1 is a major component of the caveolar coat complex [49],[50]. We then analyzed cavin 1 (Figure 4D and Figure S6). In nonreduced samples, the large majority of cavin 1 was cross-linked into oligomers of about 180 kDa and more. Monomeric cavin 1 (55 kDa) and a minor cavin 1 species of about 85 kDa were also observed. Interestingly, progressive addition of DTT revealed a relatively stable oligomeric form of cavin 1 of about 180 kDa, a size indicative of a cavin 1 trimer. This form of cavin 1 was found in all immunoisolates and was stable in up to 10 mM DTT (Figure 4D and Figure S6). Altogether, combining the data on disassembly of the non-cross-linked caveolar coat complex in detergent, co-precipitation in the absence of cavin 1, and partial reduction of cross-links yields specific conclusions: Subcomplexes containing cavin 1 and cavin 2 can be separated from subcomplexes containing cavin 1 and cavin 3, and cavins 2 and 3 do not enter the same complex unless cavin 1 is also present. Partial reduction of cross-links shows that cavin 1 forms a relatively stable trimer, and this trimer is likely to be a core element of the caveolar coat complex. The Ratio Between Cavin 1 and Caveolin 1 in the Coat Complex Is Independent of Cavin 2, Cavin 3, and EHD2 Expression So as to characterise the role of cavins 2 and 3 in the caveolar coat complex in more detail, we used siRNAs to deplete these proteins from the cavin-1-GFP cell line. 
In parallel, we also used siRNAs targeting EHD2, to ask whether this protein controls the assembly of the complex, and siRNAs targeting flotillin proteins as a negative control [51]. Depletion of all targeted proteins was highly efficient, as judged by Western blotting (Figure 5A and Figure S7A). Velocity gradient centrifugation showed that 80S complexes were clearly still present in all siRNA-treated cells, and the large majority of cavin 1, cavin 3, and caveolin 1 were still found in the HMW, 80S, fractions 8–10 (Figure 5B and Figure S7B). Lack of cavin 3 caused a marginal destabilisation of the 80S complex, as in this case some cavin 1 and caveolin 1 were detected in lower density fractions in longer exposures of the relevant Western blots (LMW pool, Figure 5B and Figure S7B). In these longer exposures the cavin 1 trimer described above is detected, even though these samples were run under fully reducing conditions. Most importantly, however, quantification of the ratio between the amounts of total cavin 1 and caveolin 1 present in the high molecular weight fractions (i.e., in the caveolar coat complex) confirmed that this was unchanged by any of the siRNA treatments (Figure 5C), which shows that the stoichiometry of a core interaction between cavin 1 and caveolin 1 is independent of the presence of cavins 2 or 3. Figure 5. A core complex containing cavin 1 and caveolin 1 is independent of cavin 2, cavin 3, and EHD2 expression. (A) Western blotting of cell lysates from siRNA-treated cells. Blots were probed with the antibodies indicated. * indicates the nonspecific band obtained with anti-cavin-3 antibodies. (B) Western blotting of fractions 3 to 5 (LMW pool) and 8 to 10 (HMW pool: these fractions contain the caveolar coat complex) of sucrose velocity gradients analyzing the distribution of cavins and caveolin 1 from cells treated with the siRNAs shown. 
Blots of the entire gradients are shown in Figure S7B. (C) Quantification of the ratio between the amount of cavin 1 and caveolin 1 in fractions 8 to 10 (HMW pool: these fractions contain the caveolar coat complex) of sucrose velocity gradients as shown in the blots in (B). Quantification is from densitometric scans of Western blots, and is based on three separate experiments; bars are SEM. In each experiment, all values were normalized so that the intensity of relevant bands in control flotillin 1 and 2 siRNA treated samples was 1. https://doi.org/10.1371/journal.pbio.1001640.g005 Likewise, depletion of EHD2 had no detectable effect on the formation of the caveolar coat complex, having no effect on the behaviour of the complex in velocity gradients or on the relative amounts of cavin and caveolin proteins present (Figure 5B and 5C, Figure S7). EHD2 is therefore neither present in the complex nor a regulator of its formation. Given that EHD2 controls the dynamics and plasma membrane association of caveolae [27],[28], this implies that the caveolar coat complex is the same whether caveolae are continuous with the plasma membrane or form intracellular membrane vesicles. Cavins 1, 2, and 3, and Caveolin 1 Are Uniformly Distributed Around the Caveolar Bulb If the caveolar coat complex does indeed coat caveolae, then it should be found all around the caveolar bulb. We aimed to determine the localisation of the complex within caveolae at high spatial resolution. Firstly, the cell lines expressing caveolin-1-GFP, cavin-1-GFP, cavin-2-GFP, and cavin-3-GFP described above were studied by immuno-electron microscopy. Pre-embedding labeling using anti-GFP antibodies and nanogold-conjugated secondary antibodies, followed by silver enhancement, allowed highly specific labeling of caveolae (Figure 6A). 
We acquired images of more than 50 caveolae stained with gold particles per cell line, and superimposed both the membrane profiles and the position of gold particles for each case (Figure 6B and Figure S8). The aggregated images revealed that all of the cavin-GFP fusions, as well as caveolin-1-GFP, localised around the caveolar bulb, with no discernible bias towards the membrane proximal or distal region. In order to check that endogenous proteins have the same distribution as GFP fusions, we carried out immuno-labeling of untransfected cells with caveolin 1 and cavin 1 antibodies. The distribution of endogenous caveolin 1 and cavin 1 determined using this approach was indistinguishable from that of caveolin-1-GFP and cavin-1-GFP (Figure 6C). These data argue that caveolar coat complexes are distributed all around the membrane bulb of caveolae. Figure 6. Cavins and caveolins are uniformly distributed around the caveolar bulb. (A) HeLa cell lines stably transfected with either caveolin-1-GFP, cavin-1, -2, or -3-GFP were processed for immuno-electron microscopy, using anti-GFP primary antibodies, nanogold-conjugated secondary antibodies, and silver enhancement. Representative transmission EM images showing specific immuno-labeling of caveolin-1-GFP. (B) Membrane profiles and the localisation of gold particles (as shown in A) from about 50 caveolae per cell line as shown were superimposed; labeling was with anti-GFP antibodies as in (A). (C) HeLa cells were processed for immuno-electron microscopy using anti-caveolin 1 or cavin 1 primary antibodies, nanogold-conjugated secondary antibodies, and silver enhancement. Images represent superimposition of around 50 caveolae. (D) Membrane profiles and the localisation of gold particles (as shown in A) from about 50 caveolae from GFP-EHD2 expressing cells were superimposed; labeling was with anti-GFP antibodies as in (A). 
https://doi.org/10.1371/journal.pbio.1001640.g006
Our biochemical data show that EHD2 is not present in the caveolar coat complex. To determine the distribution of EHD2 within caveolae, we used cells expressing GFP-EHD2 and immuno-labeling as above (Figure 6D). In clear contrast to the caveolar coat complexes, GFP-EHD2 was enriched around the neck of caveolae (Figure 6D). Therefore, caveolae are likely to have separate sets of proteins coating the bulb and neck regions, and these different distributions can be resolved by immuno-electron microscopy.
Immuno-labeling has inherent limitations, including the possibility that epitopes may not be uniformly accessible or may only tolerate weak fixation, and the fact that complexes of primary and secondary antibodies reduce spatial resolution compared to the fine structural details otherwise delivered by electron microscopy. We therefore aimed to directly visualise caveolar coat complexes by electron microscopy in situ. MiniSOG (for Mini Singlet Oxygen Generator) is a relatively small (106 amino acids) fluorescent flavoprotein that efficiently generates reactive oxygen species upon illumination with blue light. Local production of reactive oxygen species can be used to convert diaminobenzidine (DAB) into an osmiophilic electron-dense polymer. This allows proteins to be localised by EM at high spatial resolution [48],[52]. We generated separate cell lines expressing cavins 1, 2, and 3 as MiniSOG-mCherry fusion proteins (henceforth referred to as cavin-MiniSOG). In order to facilitate analysis of multiple caveolae, we used retinal pigment epithelial (RPE) cells, where caveolae polarise to the rear of the cell, as seen in other polarized cells such as fibroblasts (Figure S9B) [41],[53]. This yields defined regions of the cell that are very rich in caveolae.
Cavin fusion proteins were expressed at low levels relative to the endogenous proteins, and localised to caveolae in a manner indistinguishable from the endogenous proteins by light microscopy (Figure S9A, Table S1). Photooxidation of glutaraldehyde-fixed cavin-MiniSOG expressing RPE cells in the presence of DAB resulted in the deposition of a brown reaction product (Figure 7A). Correlative electron micrographs of plastic embedded and osmium-stained sections revealed a high density of caveolae in such regions (Figure 7B). Caveolae were strongly labeled with an electron-dense stain, regardless of which cavin-MiniSOG was used for photooxidation (Figure 7B and 7C). The staining was highly specific, as caveolar membranes from adjacent cells not expressing cavin-MiniSOG were not stained, the stain was restricted to regions of the cell enriched in caveolae, and the only cellular membranes stained were caveolae (Figure 7 and Figure S10). MiniSOG-generated stain from all three cavin fusion proteins was distributed right around the caveolar bulb, and was present at the same density at the lateral sides and at the apex of the bulb (Figure 7D). This is consistent with the immuno-electron microscopy presented in Figure 6, and again implies that caveolar coat complexes are present all around the caveolar bulb.
Figure 7. MiniSOG labeling shows that cavins are distributed around the caveolar bulb, with periodic density maxima. RPE cells stably transfected with cavin 1, 2, or 3-MiniSOG-mCherry were photooxidized and processed for correlative light and electron microscopy. (A) Representative light microscopy images showing an RPE cell expressing cavin-3-MiniSOG-mCherry before (left) and after photooxidation (middle), and after osmification and plastic embedding (right). (B) Correlative electron micrograph of the photooxidized region shown in (A).
The bottom electron micrograph shows the boxed region shown in (B), acquired at higher magnification. (C) Representative transmission electron micrographs of RPE cells stably expressing cavin-1-MiniSOG-mCherry (top) and cavin-2-MiniSOG-mCherry (bottom). (D) Representative electron micrographs for all three cavin-MiniSOG-mCherry showing caveolae visibly open to the outside of the cell. Note that MiniSOG label is found all around the caveolar bulb. Asterisks highlight background contrast produced by osmium stain in adjacent cells. (E) Representative electron micrographs of photooxidized regions of RPE cells stably transfected with cavin 1, 2, or 3-MiniSOG-mCherry. Areas showing regular spacing between densities produced by MiniSOG are boxed and shown enlarged on the bottom. (F) Line scans illustrating periodic density maxima as shown in (E). (G) Quantification of the distances between density maxima produced by cavin-MiniSOG-mCherry (n = 176). https://doi.org/10.1371/journal.pbio.1001640.g007 Ultrastructure of the Caveolar Coat We acquired high-resolution micrographs to study the ultrastructural properties of the caveolar coat complex labeled using cavin-MiniSOG. In thin sections, the label was clearly not continuously distributed along the caveolar membrane, but rather formed a punctate, sometimes spike-like coat. This was observed for all cavin-MiniSOG proteins (Figure 7E). In some caveolae, individual puncta formed periodic densities with approximately regular spacing (Figure 7E and 7F). The spacing of periodic densities around the bulb was not measurably different whether cavin-1-MiniSOG, cavin-2-MiniSOG, or cavin-3-MiniSOG were used (Figure 7F). Quantification of the spacing of these local increases in density revealed a periodicity of 10–16 nm. The shortest distance we were able to measure in electron micrographs of thin sections was around 8 nm (Figure 7G). 
This shows that caveolar coat complexes form local densities on caveolar membranes with an apparent spacing of 10–16 nm and that MiniSOG labeling allows proteins to be localized with low nanometer precision. The fact that we could observe regular spacing between densities is suggestive of regularity in the coat. In order to reveal the organisation of the coat in three dimensions, dual-tilt tomograms were recorded from representative regions. Tomography confirmed that cavin-MiniSOG labeling extends all around the bulb (Figure 8A and Movie S1), and that caveolar coat complexes form periodic density maxima (Figure 8B and 8C). In order to estimate the degree of resolution of the MiniSOG label in our tomograms, line scans through individual densities were performed. Line scans perpendicular to the membrane showed that densities peaked sharply and exhibited a half maximum width of about 8–10 nm (Figure 8B). Line scans along caveolar membranes resolved densities separated by about 10 nm, confirming our previous data on thin sections (Figure 8C: compare to 7F). We carried out three-dimensional reconstructions of regions of the caveolar surface where such periodic density maxima were well resolved (Figure 8D). These reconstructions revealed both local maxima and apparent linear striations within the coat (Figure 8D and Movie S2). In some regions, the density maxima had a regular, lattice-like distribution, suggesting that the distribution of MiniSOG reflects an underlying higher-order lattice organisation of the caveolar coat. The three-dimensional reconstructions also reinforced the firm conclusion that the MiniSOG label, and hence the caveolar coat complex, are found all around the caveolar bulb without specific enrichment in the sides, apex, or neck of caveolae.
Figure 8. Tomographic reconstruction of the caveolar coat stained with cavin-3-MiniSOG.
Tomograms were recorded from representative regions of photooxidized RPE cells stably transfected with cavin-3-miniSOG-mCherry. (A) x/y projection of a typical tomogram acquired at 150 kV, 18k at a pixel size of 0.5 nm. The backprojection shown was binned by 2, resulting in a pixel size of 1 nm. (B and C) Line scans through density maxima observed in x/y projections. (B) Line scans were performed perpendicular to the membrane and the relative intensity plotted against distance, including error bars (n = 11). (C) Representative line scan along the membrane showing regular spacing between local maxima. Note that distances of about 8–10 nm can be resolved. (D) Three-dimensional representation of the caveolar coat stained with cavin-3-MiniSOG, for two separate caveolae. From left to right: x/y backprojections of representative caveolae; surfaces of the caveolae shown on the left; maximum intensity projection of the boxed area shown in column 2; surface rendering of the boxed area shown in column 2. https://doi.org/10.1371/journal.pbio.1001640.g008 Discussion We show that caveolins and cavins can be purified as a single species of protein complex that excludes EHD2 and pacsin 2. We term this complex the caveolar coat complex. Purification of the caveolar coat complex, and a comprehensive quantitative analysis of its composition, allows us to put forward a model of the basic stoichiometry. Cavin 1 is a core component of the complex, and several independent experiments provide evidence for cavin 1 forming a trimer: Firstly, we identified a 180 kDa cavin 1 species, a size compatible with a trimer, which was relatively stable. Secondly, cavin 2 and cavin 3 interact with cavin 1 at a molar ratio of about 1∶3, suggesting that one cavin 2 or cavin 3 molecule associates with the cavin 1 trimer. Thirdly, cavin 1 is predicted to form a three-stranded coiled coil via its N-terminal domain (http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi) [54]. 
We therefore suggest that the core of the coat complex is composed of a cavin 1 trimer. For each cavin 1 molecule in the complex, there are likely to be 4 caveolins. Both in our experiments and previously [49],[50],[55], an SDS-resistant oligomeric state of caveolin 1 has been detected, with an apparent molecular mass of 350–400 kDa. Although this seems slightly too large to be the 12 caveolins predicted by our stoichiometric measurements and the presence of cavin 1 trimers, it is possible that SDS-resistant complexes run anomalously on SDS-PAGE due to pronounced secondary or tertiary structure. Our data imply that the basic unit of 1× cavin 2 or 3∶3× cavin 1∶12× caveolin must assemble into larger multimers to generate the 80S caveolar coat complex. Whether the 80S complex represents all of the coat present on an individual caveolar bulb or an intermediate level of structural organisation is not yet clear. Previous studies have shown that cavins and caveolin 1 in detergent-solubilised lysates fractionate differently on gradients and do not efficiently co-precipitate, which could be interpreted as arguing for lack of interaction in intact cells [22],[35]. Although our cross-linking data strongly suggest that cavin 1 and caveolin 1 do in fact interact, it will be important to demonstrate this directly. Higher resolution structural information on cavins and caveolins individually and in complexes is required. This may allow elucidation of the precise nature of the molecular contacts made between cavin 1 and caveolin 1. Nevertheless, it is clear that a core complex containing cavin 1 and caveolin 1 is not dependent on the presence of cavins 2 or 3, as the ratio between cavin 1 and caveolin 1 in this complex does not change when cavin 2 or cavin 3 are not present. Biochemical and imaging experiments argue that cavin 2 and cavin 3 compete for binding to the cavin 1 trimer, as cavin 2 can displace cavin 3 from the complex and vice versa.
This implies that, when cavin 2 and cavin 3 are both present in the same cell, the 80S caveolar coat complex will contain both cavin 1 trimers bound to cavin 2, and cavin 1 trimers bound to cavin 3. The observations that cavin 1 will co-precipitate cavin 2 or cavin 3 even in the absence of caveolin 1, but cavins 2 and 3 do not co-precipitate in the absence of cavin 1, are consistent with this. We hypothesise that it is the balance between cavin 2 and cavin 3 that confers additional structural and functional properties on the coat complex. Either overexpression of cavin 2 or siRNA-mediated knockdown of cavin 3 caused a slight but measurable dissociation of cavin 1 and caveolin 1 from the coat complex. Moreover, cavin 2 and cavin 3 formed separate complexes with cavin 1 under noncrosslinking conditions, with the former being smaller or less stable than the latter. These combined data suggest that shifting the ratio between cavin 2 and cavin 3 within the 80S complex towards having relatively more cavin 2 could make the complex less stable, and vice versa. This might be a means to modulate caveolar functions in different tissues, as the ratio between cavin 2 and cavin 3 varies in vivo [22],[34], and the presence of cavin 2 is associated with apparently smaller cavin complexes in vivo [34]. Therefore, the findings presented here correlate with, and imply molecular explanations for, the in vivo data. Further in vivo experiments will be needed to address the functional and physiological consequences of the variation of the cavin 2 and cavin 3 complement within the caveolar coat complex. Both MiniSOG-tagged cavins and immuno-EM show that the caveolar coat complex is found all around the caveolar bulb, while the caveolar neck is likely to contain additional proteins including EHD2. 
Notably, our data do not provide any evidence for EHD2 making direct contact with the caveolar coat complex [27],[28], leading to the concept that the neck region constitutes a separate subdomain within caveolae that is distinct from the rest of the bulb [41]. The observation that siRNAs that efficiently target expression of EHD2 have no effect on the size or composition of the caveolar coat complex provides additional evidence that EHD2 has a separate role within caveolae. The previous finding that EHD2 controls the plasma membrane association and dynamics of caveolae [27],[28] leads to the additional conclusion that the coat complex is likely to be the same whether caveolae exist as characteristic flask-shaped invaginations of the plasma membrane, or as intracellular membrane vesicles. The production of reactive oxygen species by MiniSOG, and consequent deposition of osmiophilic DAB polymer, provides a highly specific label for electron microscopy [48], and our data highlight its utility as a probe for cell biological structures. Local maxima in density produced by MiniSOG are likely to reflect increased local concentration of the tag and hence the tagged protein. Our images suggest that diffusion of reactive oxygen and DAB product away from the MiniSOG is limited, as line scans across the consequent electron density reveal that periodic changes in density over a distance scale of 10 nm can be clearly resolved. Nevertheless, diffusion of singlet oxygen or DAB reaction product is likely to provide a limit to the resolution of MiniSOG generated label. The ultimate resolution achievable by the MiniSOG molecular contrasting system remains to be determined, but structural details measuring a few nanometers have been observed [48]. We show that the caveolar coat complex generates a lattice on the surface of caveolae that can be detected using MiniSOG fusions, as local density maxima in thin sections and in tomographic reconstructions of the caveolar surface. 
We present evidence from both TEM on thin sections and 3D tomography that the coat is regular, with a periodic spacing of 10–15 nm. Ridges or striations can be observed, as shown in Figure 8D, and some regions of the surface show regular arrays of maxima. Our data, however, do not completely resolve the high-resolution internal geometry of this lattice. It is possible that regions containing less well defined densities and occasional gaps in the lattice are due to local cellular factors restricting or enhancing diffusion of the MiniSOG-produced electron dense stain. It is also possible that the caveolar coat is not as well ordered or arrayed as, for example, clathrin-coated pits. Nevertheless, the key point is that the coat represents a single type of complex coating the caveolar bulb, rather than previously plausible alternative possibilities such as cavins and caveolins being present in different complexes with different distributions. The observation of local maxima in MiniSOG density with a spacing of around 10–15 nm agrees very well with local density maxima around the caveolar bulb in ultrathin plastic sections prepared from samples stained conventionally [41], and with the spacing of striations on the surface of caveolae revealed by platinum coating of membrane fragments [6],[43]. It is therefore likely that the caveolar coat complex described here is responsible for previously reported, but molecularly undefined, ultrastructural features on the surface of caveolae. Our data, revealing the identity, basic stoichiometry, and distribution of this unitary complex around the caveolar bulb, open the way for further structural characterisation of the protein machinery for generating caveolae. 
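The spacing between density maxima discussed above is read off line scans taken along the caveolar membrane. The following is a minimal Python sketch of such a measurement, run on a synthetic intensity profile rather than on the authors' data. The function names, the 1 nm sampling (chosen to match the binned pixel size given in the Figure 8 legend), the 12 nm test period, and the simple local-maximum criterion are all assumptions made for illustration; the paper does not describe its own peak-finding procedure.

```python
import math


def local_maxima(profile, min_height):
    """Return indices of interior samples that are higher than the left
    neighbour, not lower than the right neighbour, and at least min_height."""
    peaks = []
    for i in range(1, len(profile) - 1):
        rises = profile[i] > profile[i - 1]
        falls_or_flat = profile[i] >= profile[i + 1]
        if rises and falls_or_flat and profile[i] >= min_height:
            peaks.append(i)
    return peaks


def peak_spacings_nm(profile, pixel_size_nm, min_height):
    """Distances (in nm) between consecutive local maxima of a line scan."""
    peaks = local_maxima(profile, min_height)
    return [(b - a) * pixel_size_nm for a, b in zip(peaks, peaks[1:])]


# Synthetic line scan (assumption): a cosine with a 12 nm period,
# sampled every 1 nm over 60 nm. This stands in for real densitometry.
pixel_size_nm = 1.0
period_nm = 12.0
profile = [
    0.5 + 0.5 * math.cos(2 * math.pi * x * pixel_size_nm / period_nm)
    for x in range(61)
]

spacings = peak_spacings_nm(profile, pixel_size_nm, min_height=0.8)
mean_spacing_nm = sum(spacings) / len(spacings)
print(spacings, mean_spacing_nm)
```

On real micrographs the profile would be noisy, so some smoothing and a minimum peak separation would probably be needed before spacings in the 10–16 nm range could be estimated reliably.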
Materials and Methods Antibodies The following antibodies were used: Mouse anti-GFP (Roche, 11814460001), rabbit anti-PTRF (cavin 1) (Abcam, ab48824), rabbit anti-SRBC (PRKCDBP; cavin 3) (Abcam, ab83913), goat anti-SDPR (cavin 2) (R&D Systems, AF5759), goat anti-EHD2 (Abcam, ab23935), rabbit anti-Caveolin 1 (BD, 610060), mouse anti-flotillin-1 (BD, 610821), mouse anti-flotillin-2 (BD, 610384), mouse anti-clathrin heavy chain (×11), rabbit anti-GFP antibody (Abcam, ab6556), and rabbit anti-RFP (MBL, PM005). Protein Constructs and Cell Lines Constructs for human PTRF-mCherry (cavin 1), human SDPR-mCherry (cavin 2), and human SRBC-mCherry (cavin 3) have been described [21]. To generate TEV-GFP-10×His fusion constructs, a TEV-GFP-10×His cassette was inserted into the BamHI/NotI sites of pClontech N1. The respective cDNAs were inserted in frame. To generate MiniSOG-mCherry cavin constructs, MiniSOG cDNA was inserted into pClontech N1 via BamHI/AgeI. Constructs were transfected into HeLa and RPE cells using FugeneHD (Promega) according to the manufacturer's recommendations. Clonal HeLa cell lines expressing cavin-TEV-GFP-10×His proteins were produced from single cell clones and cultured in DMEM, 10% FCS, penicillin/streptomycin supplemented with 0.4 mg/ml G418 (Sigma). RPE cells expressing cavin-MiniSOG-mCherry were selected by FACS and cultured in 50% DMEM/50% F12 medium, 10% FCS, 5 mM glutamine, penicillin/streptomycin, supplemented with 0.4 mg/ml G418. Crosslinking, Cell Lysis, and Velocity Gradient Centrifugation For crosslinking studies, semiconfluent cultures of HeLa cells were washed twice with ice-cold PBS and incubated on ice with 1.2 mM DSP (Pierce) in ice-cold PBS for 1 h. A 100× DSP stock solution was prepared in DMSO fresh prior to use. After 1 h, DSP was quenched by addition of 1 M Tris pH 7.4 to a final concentration of 100 mM for 15 min. 
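The DSP and Tris concentrations quoted above follow from ordinary dilution arithmetic. A small Python sketch is given here as a convenience check only. The 10 ml working volume and the function names are assumptions for illustration and are not part of the protocol, and the calculation assumes that the Tris stock is added directly to the DSP solution already on the cells.

```python
def stock_concentration(final_conc, fold):
    """Concentration of an N-fold stock that dilutes to final_conc."""
    return final_conc * fold


def volume_to_add(stock_conc, final_conc, start_volume):
    """Volume of stock to add to start_volume so that the mixture reaches
    final_conc. Derived from stock_conc * v = final_conc * (start_volume + v)."""
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below stock concentration")
    return start_volume * final_conc / (stock_conc - final_conc)


# A 100x stock for a 1.2 mM working concentration of DSP.
dsp_stock_mM = stock_concentration(1.2, 100)

# Quenching: 1 M (1000 mM) Tris added to reach 100 mM final.
# The 10 ml starting volume is a hypothetical example value.
tris_ml = volume_to_add(stock_conc=1000.0, final_conc=100.0, start_volume=10.0)

print(round(dsp_stock_mM, 3), "mM DSP stock")
print(round(tris_ml, 3), "ml of 1 M Tris per 10 ml")
```

Under these assumptions the stock is 120 mM DSP, and roughly one ninth of the starting volume of 1 M Tris brings the mixture to 100 mM.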
Cells were briefly rinsed in 100 mM Tris pH 8 and immediately scraped into lysis buffer (LB): 50 mM Tris pH 8, 300 mM NaCl, 5 mM EDTA, protease inhibitor cocktail (Roche). Dependent on the experiment, either 1% (v/v) Triton X-100, 2% (w/v) octyl-glucoside (OG), or a combination of 1% Triton X-100/1% OG were added to LB. Cell lysates were incubated on ice for 30 min and spun at 14,000 rpm in a table top centrifuge for 30 min at 4°C, followed by a second centrifugation for 10 min. Lysates were added atop a linear 10–40% (w/v) sucrose gradient prepared in LB plus 0.2% Triton X-100. Gradients were spun in a SW40Ti rotor at 37,000 rpm for 6 h at 4°C. Twelve 1 ml fractions were collected from the bottom of the gradient by tube puncture. For Western blotting, equal volumes (usually 250 µl) of each fraction were precipitated with MeOH/Chloroform. The pellet was dissolved in 1×LDS loading buffer (Invitrogen) and boiled for 2 min. Proteins were separated on NuPAGE 4–20% Tris/Glycine or 4–12% Bis/Tris gels (Invitrogen) and blotted onto PVDF membranes (Millipore). Immunoisolation For immunoisolation of GFP-tagged proteins, magnetic anti-GFP microbeads, and μcolumns (Miltenyi Biotech) were used. For immunoisolation of the HMW complex from sucrose gradients, fractions 8–10 were pooled (total 3 ml) and incubated with 20 µl anti-GFP beads for 2–4 h at 4°C rotating. Alternatively, 1 ml of total cell lysate was incubated with 10 µl anti-GFP beads. Lysates were applied to μcolumns and washed eight times with 2 ml of LB/1% Triton X-100 at room temperature. A final wash was performed with LB without detergent added. Protein complexes were eluted from the column with 140 µl 0.1 M TEA, pH 11.8, immediately neutralized by addition of 70 µl 1 M Tris, pH 7.4, and subjected to tandem mass spectrometry. Alternatively, protein complexes were eluted with elution buffer (Miltenyi Biotech) and separated by SDS-PAGE. 
Stoichiometric Analysis of the Caveolar Coat Complex
Immunoisolates from total cell lysates were separated on 4–12% NuPAGE Bis/Tris gels. Gels were washed twice in distilled H2O for 5 min each and stained with the fluorescent protein dye SYPRO Ruby (Lonza) for 1 h at room temperature. Gels were washed several times in distilled H2O and the fluorescence scanned on a Chemidoc XRS+ Molecular Imager. The intensities of protein bands corresponding to cavin-GFP fusion proteins, cavin 1, cavin 3, and caveolins were determined using Image Lab software. Bands corresponding to alpha and beta caveolin 1 as well as caveolin 2 could not be resolved clearly and were thus quantified as one band. All values were corrected by subtracting background fluorescence from the same molecular weight regions of GFP control samples. To calculate relative molar ratios between the protein components of the complex, the following molecular masses were used: cavin-1-GFP, 80 kDa; cavin-2-GFP, 90 kDa; cavin-3-GFP, 65 kDa; cavin 1, 55 kDa; cavin 3, 36 kDa; caveolin, 20 kDa. For documentation, data were exported to GraphPad Prism.
siRNA Transfection of HeLa Cells
On-target Plus SMART pool siRNAs against human cavin 2, human cavin 3, and human EHD2 were from Thermo Scientific (L-015910, L-016416, and L-016660, respectively). siRNAs against human flotillin 1 and flotillin 2 (J-010636-05, J-010636-06, J-003666-09, J-003666-10) were pooled and used as a control throughout. Cavin-1-TEV-GFP-10×HIS HeLa cell lines were transfected at 30% confluency using Oligofectamine (Invitrogen) and a total of 100 nM siRNA per transfection. Four to five days posttransfection, cells were cross-linked with 1.2 mM DSP as described above and then lysed in LB/1% OG/1% Triton X-100. Cell lysates were cleared by centrifugation at 14,000 rpm for 30 min and loaded atop 10–40% sucrose gradients. Gradients were spun as described above.
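The molar-ratio calculation described under Stoichiometric Analysis can be sketched as dividing each background-corrected band intensity by the molecular mass of the protein, on the assumption (not stated explicitly in the text) that the SYPRO Ruby signal scales with protein mass. The Python sketch below uses the molecular masses listed in that section. The intensity and background values are invented placeholders chosen so that the output reproduces the reported 1∶3∶12 ratio; they are not measured data, and the function name is hypothetical.

```python
# Molecular masses in kDa, as listed in the Methods.
MASS_KDA = {
    "cavin-1-GFP": 80.0,
    "cavin-2-GFP": 90.0,
    "cavin-3-GFP": 65.0,
    "cavin 1": 55.0,
    "cavin 3": 36.0,
    "caveolin": 20.0,
}


def relative_molar_ratios(intensity, background, reference):
    """Convert band intensities to molar amounts relative to one protein.

    Each intensity is background-corrected, divided by the protein's mass
    (assumes stain signal is proportional to mass), then normalised so that
    the reference protein has a value of 1.
    """
    moles = {}
    for name, raw in intensity.items():
        corrected = raw - background.get(name, 0.0)
        moles[name] = corrected / MASS_KDA[name]
    ref = moles[reference]
    return {name: value / ref for name, value in moles.items()}


# Placeholder intensities (arbitrary units, hypothetical).
intensity = {"cavin 1": 1750.0, "cavin 3": 460.0, "caveolin": 2500.0}
background = {"cavin 1": 100.0, "cavin 3": 100.0, "caveolin": 100.0}

ratios = relative_molar_ratios(intensity, background, reference="cavin 3")

# Mass of twelve caveolins using the same 20 kDa value.
caveolin_12mer_kda = 12 * MASS_KDA["caveolin"]

print(ratios)
print(caveolin_12mer_kda, "kDa")
```

With the same masses, twelve caveolins total 240 kDa, which is the figure the Discussion sets against the 350–400 kDa SDS-resistant caveolin 1 species.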
To quantify the relative amounts of cavin 1 and caveolin 1 in the low molecular weight (LMW; fractions 3–5) and high molecular weight fractions (HMW; fractions 8–10), equal volumes of each fraction were pooled, MeOH/chloroform precipitated, and analysed by Western blotting. The ratio of caveolin 1 to cavin 1 in the HMW pool 8–10 was calculated from three independent experiments, using densitometry in ImageJ.
Immunoprecipitation from MEFs
Wild-type MEFs, caveolin 1 −/− MEFs, or cavin 1 −/− MEFs [34] were co-transfected with equal amounts of plasmid DNA (pDNA) using electroporation. Different combinations of the following constructs were used: cavin-1-TEV-GFP-10×HIS, cavin-2-TEV-GFP-10×HIS, cavin-3-miniSOG-mCherry, cavin-2-miniSOG-mCherry. Per co-transfection, 1×10^6 cells were transfected with 2.5 µg of each pDNA. Twenty-four hours posttransfection, cells were cross-linked with 3 mM DSP as described above and lysed in LB/1% OG/1% Triton X-100. Lysates were cleared by centrifugation at 14,000 rpm and incubated with 10 µl anti-GFP microbeads (Miltenyi Biotec) for 2 h at 4°C. Immunoprecipitates were washed five times with LB/1% Triton X-100 and eluted with 60 µl elution buffer (Miltenyi Biotec). Equal volumes were analysed by Western blotting.
Light Microscopy
HeLa or RPE cells were fixed in 4% paraformaldehyde in PBS pH 7.4 for 10 min and stained with primary antibodies overnight in PBS, 3% FCS, 0.2% saponin. Cells were washed with PBS and incubated in secondary antibodies for 1 h. TIRF microscopy was carried out using an Olympus IX71. Confocal micrographs were captured on a ZEISS 510 LSM using standard filter sets. Co-localisation was quantified using the Pearson correlation coefficient, as implemented in the “Colocalization Finder” plugin for ImageJ (http://rsbweb.nih.gov/ij/plugins/colocalization-finder.html).
Immuno-Labeling
HeLa cells were grown on glass bottom petri dishes (MatTek) and fixed in 4% paraformaldehyde in 0.1 M phosphate buffer pH 7.4 overnight at 4°C.
After several buffer washes, followed by inactivation of reactive aldehyde groups using 0.1% sodium borohydride in phosphate buffer for 15 min, cells were permeabilised using 0.03% saponin in 20 mM phosphate buffer, 150 mM sodium chloride for 30 min. Cells were incubated in normal goat serum (Aurion) for 40 min prior to incubation in rabbit anti-GFP antibody (Abcam) used at 1∶800 for 4.5 h at room temperature. After thorough washing, cells were incubated with a 1∶200 dilution of goat anti-rabbit ultrasmall gold (Aurion) overnight at 4°C. After washing, cells were fixed with 2% glutaraldehyde in 0.1 M phosphate buffer for 30 min and washed with distilled water, followed by silver enhancement of gold using R-Gent SE-EM (Aurion) reagents. Cells were then postfixed with 0.5% osmium tetroxide in 0.1 M phosphate buffer on ice for 15 min. Cells were then dehydrated in an ascending ethanol series and embedded in CY212 resin. Ultrathin sections were stained with saturated aqueous uranyl acetate and Reynolds lead citrate and examined using a Philips 208 EM operated at 80 kV.
MiniSOG Photooxidation and Electron Microscopy
For photooxidation, cells cultured in MatTek glass bottom dishes were fixed at room temperature with 2% glutaraldehyde (EM grade, EMS Corp.), 2.5 mM CaCl2 in 0.1 M cacodylate buffer pH 7.4 (CB), and immediately transferred onto ice for 1 h. Cells were rinsed five times with ice-cold CB and blocked with 50 mM glycine, 10 mM potassium cyanide, and 10 mM aminotriazole in CB for 15 min on ice. Cells were washed five times with CB and transferred onto a cooled stage on a Leica SPE II confocal microscope. A freshly prepared solution of 0.5 mg/ml diaminobenzidine (DAB, Sigma) in CB was added to the cells. Areas of interest were photooxidized by illumination with blue light, using a 150W xenon lamp, a standard FITC filter set, and a 63× objective NA 1.3. After about 3–4 min, a brownish precipitate formed in place of the fluorescence.
Cells were removed from the stage, washed five times with CB, and poststained with 1% osmium tetroxide in CB for 30 min at room temperature. Cells were washed five times with water, followed by dehydration in 20, 50, 70, 90, and 100% EtOH. Cells were infiltrated with Durcupan ACM resin (EMS Corp.). Photooxidized areas were sawed out of the dish and sectioned. For 2D transmission electron microscopy (TEM), 80 nm sections were cut. Electron micrographs were recorded at 80 or 120 kV on a FEI T12 TEM. Images were recorded with SerialEM software and a 2k×2k Gatan CCD camera.
3D Tomography
Three-dimensional electron tomography was carried out on 250–300 nm sections at 150 or 300 kV using a FEI Titan TEM. Sections were carbon-coated, glow-discharged, and dipped into a solution of 0.1% BSA and 5 nm colloidal gold particles. Dual tilt series were recorded at ±60° with 1° intervals and a pixel size of 0.5 nm (at 18k). Images were captured using a 4k×4k Gatan Ultrascan 4000 camera. Reconstruction was accomplished using a combination of the IMOD [56] and TxBR [57] reconstruction packages. Rough alignment of the two tilt series was done with the IMOD software package, and fine alignment and reconstruction were done using the TxBR package.
To generate TEV-GFP-10×His fusion constructs, a TEV-GFP-10×His cassette was inserted into the BamHI/NotI sites of pClontech N1. The respective cDNAs were inserted in frame. To generate MiniSOG-mCherry cavin constructs, MiniSOG cDNA was inserted into pClontech N1 via BamHI/AgeI. Constructs were transfected into HeLa and RPE cells using FugeneHD (Promega) according to the manufacturer's recommendations. Clonal HeLa cell lines expressing cavin-TEV-GFP-10×His proteins were produced from single cell clones and cultured in DMEM, 10% FCS, penicillin/streptomycin supplemented with 0.4 mg/ml G418 (Sigma). RPE cells expressing cavin-MiniSOG-mCherry were selected by FACS and cultured in 50% DMEM/50% F12 medium, 10% FCS, 5 mM glutamine, penicillin/streptomycin, supplemented with 0.4 mg/ml G418. Crosslinking, Cell Lysis, and Velocity Gradient Centrifugation For crosslinking studies, semiconfluent cultures of HeLa cells were washed twice with ice-cold PBS and incubated on ice with 1.2 mM DSP (Pierce) in ice-cold PBS for 1 h. A 100× DSP stock solution was prepared in DMSO fresh prior to use. After 1 h, DSP was quenched by addition of 1 M Tris pH 7.4 to a final concentration of 100 mM for 15 min. Cells were briefly rinsed in 100 mM Tris pH 8 and immediately scraped into lysis buffer (LB): 50 mM Tris pH 8, 300 mM NaCl, 5 mM EDTA, protease inhibitor cocktail (Roche). Dependent on the experiment, either 1% (v/v) Triton X-100, 2% (w/v) octyl-glucoside (OG), or a combination of 1% Triton X-100/1% OG were added to LB. Cell lysates were incubated on ice for 30 min and spun at 14,000 rpm in a table top centrifuge for 30 min at 4°C, followed by a second centrifugation for 10 min. Lysates were added atop a linear 10–40% (w/v) sucrose gradient prepared in LB plus 0.2% Triton X-100. Gradients were spun in a SW40Ti rotor at 37,000 rpm for 6 h at 4°C. Twelve 1 ml fractions were collected from the bottom of the gradient by tube puncture. 
For Western blotting, equal volumes (usually 250 µl) of each fraction were precipitated with MeOH/Chloroform. The pellet was dissolved in 1×LDS loading buffer (Invitrogen) and boiled for 2 min. Proteins were separated on NuPAGE 4–20% Tris/Glycine or 4–12% Bis/Tris gels (Invitrogen) and blotted onto PVDF membranes (Millipore). Immunoisolation For immunoisolation of GFP-tagged proteins, magnetic anti-GFP microbeads, and μcolumns (Miltenyi Biotech) were used. For immunoisolation of the HMW complex from sucrose gradients, fractions 8–10 were pooled (total 3 ml) and incubated with 20 µl anti-GFP beads for 2–4 h at 4°C rotating. Alternatively, 1 ml of total cell lysate was incubated with 10 µl anti-GFP beads. Lysates were applied to μcolumns and washed eight times with 2 ml of LB/1% Triton X-100 at room temperature. A final wash was performed with LB without detergent added. Protein complexes were eluted from the column with 140 µl 0.1 M TEA, pH 11.8, immediately neutralized by addition of 70 µl 1 M Tris, pH 7.4, and subjected to tandem mass spectrometry. Alternatively, protein complexes were eluted with elution buffer (Miltenyi Biotech) and separated by SDS-PAGE. Stoichiometric Analysis of the Caveolar Coat Complex Immunoisolates from total cell lysates were separated on 4–12% NuPAGE Bis/Tris gels. Gels were washed twice in distilled H20 for 5 min each and stained with the fluorescent protein dye SYPRO RUBY (Lonza) for 1 h at room temperature. Gels were washed several times in distilled H20 and the fluorescence scanned on a Chemidoc XRS+ Molecular Imager. The intensities of protein bands corresponding to cavin-GFP fusion proteins, cavin 1, cavin 3, and caveolins were determined using Image Lad software. Bands corresponding to alpha and beta caveolin 1 as well as caveolin 2 could not be resolved clearly and were thus quantified as one band. All values were corrected for by subtracting background fluorescence from the same molecular weight regions of GFP control samples. 
To calculate relative molar ratios between the protein components of the complex, the following molecular masses were used: cavin-1-GFP, 80 kDa; cavin-2-GFP, 90 kDa; cavin-3-GFP, 65 kDa; cavin 1, 55 kDa; cavin 3, 36 kDa; caveolin, 20 kDa. For documentation, data were exported to GraphPad Prism.

siRNA Transfection of HeLa Cells
ON-TARGETplus SMARTpool siRNAs against human cavin 2, human cavin 3, and human EHD2 were from Thermo Scientific (L-015910, L-016416, and L-016660, respectively). siRNAs against human flotillin 1 and flotillin 2 (J-010636-05, J-010636-06, J-003666-09, J-003666-10) were pooled and used as a control throughout. Cavin-1-TEV-GFP-10×HIS HeLa cell lines were transfected at 30% confluency using Oligofectamine (Invitrogen) and a total of 100 nM siRNA per transfection. Four to five days posttransfection, cells were cross-linked with 1.2 mM DSP as described above and then lysed in LB/1% OG/1% Triton X-100. Cell lysates were cleared by centrifugation at 14,000 rpm for 30 min and loaded atop 10–40% sucrose gradients. Gradients were spun as described above. To quantify the relative amounts of cavin 1 and caveolin 1 in the low molecular weight (LMW; fractions 3–5) and high molecular weight (HMW; fractions 8–10) fractions, equal volumes of each fraction were pooled, MeOH/chloroform precipitated, and analysed by Western blotting. The ratio of caveolin 1 to cavin 1 in the HMW pool (fractions 8–10) was calculated from three independent experiments, using densitometry in ImageJ.

Immunoprecipitation from MEFs
Wild-type MEFs, caveolin 1 −/− MEFs, or cavin 1 −/− MEFs [34] were co-transfected with equal amounts of pDNA by electroporation. Different combinations of the following constructs were used: cavin-1-TEV-GFP-10×HIS, cavin-2-TEV-GFP-10×HIS, cavin-3-miniSOG-mCherry, cavin-2-miniSOG-mCherry. Per co-transfection, 1×10⁶ cells were transfected with 2.5 µg of each pDNA.
Twenty-four hours posttransfection, cells were cross-linked with 3 mM DSP as described above and lysed in LB/1% OG/1% Triton X-100. Lysates were cleared by centrifugation at 14,000 rpm and incubated with 10 µl anti-GFP microbeads (Miltenyi Biotec) for 2 h at 4°C. Immunoprecipitates were washed five times with LB/1% Triton X-100 and eluted with 60 µl elution buffer (Miltenyi Biotec). Equal volumes were analysed by Western blotting.

Light Microscopy
HeLa or RPE cells were fixed in 4% paraformaldehyde in PBS pH 7.4 for 10 min and stained with primary antibodies overnight in PBS, 3% FCS, 0.2% saponin. Cells were washed with PBS and incubated in secondary antibodies for 1 h. TIRF microscopy was carried out using an Olympus IX71. Confocal micrographs were captured on a ZEISS 510 LSM using standard filter sets. Co-localisation was quantified using the Pearson correlation coefficient, as implemented in the “Colocalization Finder” plugin for ImageJ (http://rsbweb.nih.gov/ij/plugins/colocalization-finder.html).

Immuno-Labeling
HeLa cells were grown on glass bottom petri dishes (MatTek) and fixed in 4% paraformaldehyde in 0.1 M phosphate buffer pH 7.4 overnight at 4°C. After several buffer washes, followed by inactivation of reactive aldehyde groups using 0.1% sodium borohydride in phosphate buffer for 15 min, cells were permeabilised using 0.03% saponin in 20 mM phosphate buffer, 150 mM sodium chloride for 30 min. Cells were incubated in normal goat serum (Aurion) for 40 min prior to incubation in rabbit anti-GFP antibody (Abcam) used at 1∶800 for 4.5 h at room temperature. After thorough washing, cells were incubated with a 1∶200 dilution of goat anti-rabbit ultrasmall gold (Aurion) overnight at 4°C. After washing, cells were fixed with 2% glutaraldehyde in 0.1 M phosphate buffer for 30 min and washed with distilled water, followed by silver enhancement of gold using R-Gent SE-EM (Aurion) reagents.
Cells were then postfixed with 0.5% osmium tetroxide in 0.1 M phosphate buffer on ice for 15 min. Cells were then dehydrated in an ascending ethanol series and embedded in CY212 resin. Ultrathin sections were stained with saturated aqueous uranyl acetate and Reynolds lead citrate and examined using a Philips 208 EM operated at 80 kV.

MiniSOG Photooxidation and Electron Microscopy
For photooxidation, cells cultured in MatTek glass bottom dishes were fixed at room temperature with 2% glutaraldehyde (EM grade, EMS Corp.), 2.5 mM CaCl2 in 0.1 M cacodylate buffer pH 7.4 (CB), and immediately transferred onto ice for 1 h. Cells were rinsed five times with ice-cold CB and blocked with 50 mM glycine, 10 mM potassium cyanide, and 10 mM aminotriazole in CB for 15 min on ice. Cells were washed five times with CB and transferred onto a cooled stage on a Leica SPE II confocal microscope. A freshly prepared solution of 0.5 mg/ml diaminobenzidine (DAB, Sigma) in CB was added to the cells. Areas of interest were photooxidized by illumination with blue light, using a 150 W xenon lamp, a standard FITC filter set, and a 63× objective (NA 1.3). After about 3–4 min, a brownish precipitate formed in place of the fluorescence. Cells were removed from the stage, washed five times with CB, and poststained with 1% osmium tetroxide in CB for 30 min at room temperature. Cells were washed five times with water, followed by dehydration in 20, 50, 70, 90, and 100% EtOH. Cells were infiltrated in Durcupan ACM resin (EMS Corp.). Photooxidized areas were sawed out of the dish and sectioned. For 2D transmission electron microscopy (TEM), 80 nm sections were cut. Electron micrographs were recorded at 80 or 120 kV on a FEI T12 TEM. Images were recorded with SerialEM software and a 2k×2k Gatan CCD camera.

3D Tomography
Three-dimensional electron tomography was carried out on 250–300 nm sections at 150 or 300 kV using a FEI Titan TEM.
Sections were carbon-coated, glow-discharged, and dipped into a solution of 0.1% BSA and 5 nm colloidal gold particles. Dual tilt series were recorded at ±60° with 1° intervals and a pixel size of 0.5 nm (at 18k). Images were captured using a 4k×4k Gatan Ultrascan 4000 camera. Reconstruction was accomplished using a combination of the IMOD [56] and TxBR [57] reconstruction packages. Rough alignment of the two tilt series was done with the IMOD software package, and fine alignment and reconstruction were done using the TxBR package.

Supporting Information

Figure S1. Cavin/caveolin complexes are sensitive to detergent, but can be stabilized by crosslinking. HeLa cells, either cross-linked with DSP or left untreated, were solubilised in either 0.5% Triton X-100 (A), 1% Triton X-100 (B), or 2% octyl glucoside (C). Lysates were fractionated on 10–40% sucrose gradients, followed by Western blotting of gradient fractions 1–12 using antibodies against caveolin 1 or cavin 1. Without cross-linking, oligomeric complexes of caveolin 1 are sensitive to the detergent used for solubilisation. Full dissociation of caveolin oligomers was achieved with 2% octyl glucoside (C, top) or a combination of 1% Triton X-100/1% octyl glucoside (Figure 1A). Note that cross-linking stabilises a high molecular weight (HMW) complex of both caveolin 1 and cavin 1 in all detergents tested. https://doi.org/10.1371/journal.pbio.1001640.s001 (TIF)

Figure S2. Isolation of the caveolar coat complex using HeLa cells stably transfected with caveolin-1-GFP. (A) Western blots of lysates from HeLa cells stably transfected with caveolin-1-GFP, flotillin-2-GFP, or GFP. Membranes were probed with anti-GFP, anti-flotillin 2, or anti-caveolin 1 antibodies. Note that caveolin-1-GFP is expressed at low levels relative to endogenous caveolin 1, and that there is no detectable proteolysis of caveolin-1-GFP. (B) Coomassie-stained protein gel of gradient fractions 1–12 prepared from DSP-cross-linked HeLa cells.
The HMW peak of fractions 8–10 that contain caveolin 1 and cavin 1 is boxed. Fractions 7–9 are rich in ribosomal proteins (dashed box). The 60S peak obtained for purified 60S ribosomal subunit is indicated. (C) HeLa cells stably transfected with caveolin-1-GFP, flotillin-2-GFP, or GFP were cross-linked with DSP and lysed in 1% Triton X-100/1% octyl glucoside. Lysates were fractionated on 10–40% sucrose gradients and pooled HMW fractions 8–10 were used for immuno-isolation of the caveolar coat complex. Immuno-isolates were probed with anti-caveolin 1, anti-cavin 1, anti-cavin 3, or anti-flotillin 1 antibodies. https://doi.org/10.1371/journal.pbio.1001640.s002 (TIF)

Figure S3. Chemical crosslinking does not perturb the distribution of caveolin 1 or cavins. (A) Confocal images of cells stably expressing cavin-1-GFP, fixed, and stained with caveolin 1 antibodies after incubation for 1 h in PBS, 1% DMSO (−DSP) or PBS, 1% DMSO, 1.2 mM DSP (+DSP). Bars are 20 µm. (B) Cells treated as in (A), but imaged using total internal reflection fluorescence (TIRF) microscopy. Bars are 5 µm. (C) Cells stably transfected with either cavin-3-GFP or cavin-3-MiniSOG-mCherry were plated out either separately, or mixed together at a 1∶1 ratio in the same dish. They were cross-linked with DSP, lysed, and immuno-precipitated with anti-GFP antibodies. The lysates and immuno-precipitates were analysed by Western blotting with the indicated antibodies. Note the absence of cavin-3-MiniSOG-mCherry in all immuno-precipitates. https://doi.org/10.1371/journal.pbio.1001640.s003 (TIF)

Figure S4. Cavin 1, 2, and 3 with a C-terminal TEV-GFP-10×His tag localise correctly and are expressed at low levels. (A) HeLa cells stably transfected with cavin 1, 2, or 3-TEV-GFP-10×His were aldehyde-fixed, stained with anti-caveolin 1 antibodies, and analysed by TIRF microscopy.
(B) HeLa cells stably transfected with cavin-3-TEV-GFP-10×His were transfected with a plasmid expressing cavin-2-mCherry, fixed, and stained with anti-cavin-1 antibodies, before analysis by confocal microscopy. The lower panels show a zoomed-in view of the box in the main panels. Bar is 5 µm. (C) Western blots of lysates from HeLa cells stably transfected with cavin 1, 2, or 3-TEV-GFP-10×His, caveolin-1-GFP, flotillin-2-GFP, or GFP, probed with antibodies against cavin 3 and cavin 1. Note that the expression of endogenous cavin 3 is markedly and specifically reduced in cells stably transfected with cavin-3-TEV-GFP-10×His, and that the expression level of GFP-tagged cavin 3 is comparable to the level of endogenous cavin 3 in the other cell lines. ? indicate nonspecific bands observed using the cavin 3 antibody. These bands are still present in cells lacking the gene for cavin 3 (not shown). https://doi.org/10.1371/journal.pbio.1001640.s004 (TIF)

Figure S5. Cavin 1 co-precipitates cavin 2 and cavin 3 in the absence of caveolin 1, but cavin 2 and cavin 3 do not co-precipitate without cavin 1. (A) Embryonic fibroblasts from congenic control and caveolin 1 knockout (KO) mice were transfected with plasmids expressing the constructs denoted by x in each lane (cavin-2-mCh is cavin-2-MiniSOG-mCherry; cavin-3-mCh is cavin-3-MiniSOG-mCherry). Cells were cross-linked with DSP, lysed, and immuno-precipitated with anti-GFP antibodies. The lysates and immunoprecipitates were analysed by Western blotting with the antibodies indicated. * indicates a background band detected by the anti-cavin-3 antibody in cell lysates. (B) Embryonic fibroblasts from congenic control and cavin 1 knockout (KO) mice were transfected with plasmids expressing cavin-2-GFP and cavin-3-MiniSOG-mCherry (shown as cavin-3-mCh). Cells were cross-linked with DSP, lysed, and immuno-precipitated with anti-GFP antibodies.
The lysates and immunoprecipitates were analysed by Western blotting with the antibodies indicated. * indicates a background band detected by the anti-cavin-3 antibody in cell lysates, while < indicates cross-reaction between the anti-cavin-3 antibody or secondary antibody and immunoglobulin heavy chains present in the immunoprecipitates. https://doi.org/10.1371/journal.pbio.1001640.s005 (TIF)

Figure S6. Partial reduction of cross-linked caveolar coat complexes provides evidence for subcomplexes. (A and B) HeLa cells stably transfected with caveolin-1-GFP or cavin 1, 2, or 3-GFP were cross-linked with 1.2 mM DSP and lysed in 1% Triton X-100/1% octyl glucoside. The caveolar coat complex was immuno-isolated from each cell lysate using anti-GFP antibodies. Immuno-isolates were incubated with 0, 1, or 2 mM DTT for 15 min at 37°C, boiled for 2 min, and separated by 4–20% SDS-PAGE. Western blots were performed with antibodies against cavin 1 (A) or caveolin 1 (B). (A) Cavin 1 forms a stable trimer. Under nonreducing conditions (no DTT), most cavin 1 is found in oligomeric forms. The cavin 1 trimer (180 kDa), which is stable at 2 mM DTT, is indicated (see also Figure 4D). Note that cavin-1-GFP (indicated by an asterisk) runs slightly above an 80 kDa band of cavin 1. (B) Caveolin 1 forms a stable 350–400 kDa complex, indicated as the caveolin 1 oligomer, and a stable dimer. Note that expression of caveolin-1-GFP alters the molecular weights of oligomeric forms of caveolin 1. https://doi.org/10.1371/journal.pbio.1001640.s006 (TIF)

Figure S7. Sucrose velocity gradients analysing the caveolar coat complex in cells treated with siRNAs to knock down cavin 2, cavin 3, and EHD2 expression. (A) Cells stably transfected with cavin-1-GFP, cavin-2-GFP, or cavin-3-GFP were transfected with siRNAs to knock down expression of flotillin 1 and 2, cavin 2, or cavin 3.
As endogenous cavin 2 cannot be detected with available antibodies in HeLa cells, knockdown of the stably expressed cavin-2-GFP provides a way of confirming that the cavin 2 siRNAs function efficiently. Efficiency of knockdown was assessed by Western blotting of cell lysates with the antibodies indicated under each panel. * denotes a background band detected with the anti-cavin-3 antibody: note that this band is not altered by cavin 3 siRNA treatment, while the specific cavin 3 band disappears. (B) Cells stably transfected with cavin-1-GFP were transfected with siRNAs to knock down expression of flotillin 1 and 2, cavin 2, cavin 3, or EHD2 as indicated. Cells were cross-linked with DSP before lysis and analysis of the caveolar coat complex on sucrose gradients as described above. Fractions from the gradients were analysed by Western blotting with the antibodies indicated. * denotes a background band detected with the anti-cavin-3 antibody. Fractions 3–5 (Low Molecular Weight) and 8–10 (High Molecular Weight) were pooled for quantitative side-by-side analysis of total protein levels as shown in Figures 5B and 5C. https://doi.org/10.1371/journal.pbio.1001640.s007 (TIF)

Figure S8. Representative images of caveolae immuno-labelled with anti-GFP, anti-caveolin-1, and anti-cavin-1 antibodies. HeLa cell lines stably transfected with either cavin 1, 2, or 3-GFP were processed for immuno-electron microscopy, using pre-embedding labeling with anti-GFP primary antibodies, nanogold-conjugated secondary antibodies, and silver enhancement. Untransfected cells were processed in the same way, but were labelled with anti-caveolin-1 or anti-cavin-1 antibodies as shown. https://doi.org/10.1371/journal.pbio.1001640.s008 (TIF)

Figure S9. Cavin 1, 2, and 3 with a C-terminal MiniSOG-mCherry tag localise correctly. (A) HeLa cells transfected with cavin 1, 2, or 3-MiniSOG-mCherry were aldehyde-fixed, stained with anti-caveolin-1 antibodies, and analysed by TIRF microscopy.
(B) RPE cells stably transfected with cavin 1, 2, or 3-MiniSOG-mCherry were aldehyde-fixed and inspected by confocal microscopy. Note the pronounced polarization of cavins at the cell rear. https://doi.org/10.1371/journal.pbio.1001640.s009 (TIF)

Figure S10. Cavin-MiniSOG proteins specifically label caveolar membranes. Transmission electron micrograph of a photooxidized and osmium tetroxide-stained RPE cell stably transfected with cavin-2-MiniSOG-mCherry. Boxed regions 1 and 2 show caveolar membranes strongly labeled with an electron-dense stain. Boxed region 3 shows small vesicles, apparently clathrin-coated, which are not stained with an electron-dense deposit. Note that mitochondria and putative endosomal membranes (asterisks) are not labeled. https://doi.org/10.1371/journal.pbio.1001640.s010 (TIF)

File S1. Excel file containing peptides detected by mass spectrometry of caveolin-1-GFP immunoprecipitates. https://doi.org/10.1371/journal.pbio.1001640.s011 (XLSX)

Movie S1. 3D electron tomography of the caveolar coat visualised with cavin-3-miniSOG. A sequence of z slices through a 150–200 nm section is shown. The tomogram was recorded at 150 kV at 18k with a pixel size of 0.5 nm. Images are bin 2 back-projections. https://doi.org/10.1371/journal.pbio.1001640.s012 (MOV)

Movie S2. 3D electron tomography of the caveolar coat visualised with cavin-3-miniSOG. Shown is a maximum intensity projection of a subregion shown in Movie S1. https://doi.org/10.1371/journal.pbio.1001640.s013 (MOV)

Table S1. Pearson correlation coefficients describing co-localisation between cavin-GFP or cavin-MiniSOG constructs and caveolin 1. Also included are comparisons of cavin-1-GFP and flotillin 1, where there is no co-localisation visible by eye, and cavin-1-GFP and cavin 1 antibody staining, where there should be maximal co-localisation.
https://doi.org/10.1371/journal.pbio.1001640.s014 (DOCX)

Acknowledgments
We are grateful to Ari Helenius, ETH Zurich, for providing the caveolin-1-GFP cell line; to Andy Finch, MRC-LMB, for providing purified 60S ribosomal subunit; to Carsten Gram Hansen, MRC-LMB, for preparing MEFs; and to Sebastian Phan, NCMIR, UCSD, for his assistance in collecting multiple tilt tomograms and applying TxBR tools to produce high resolution reconstructions from these data.