The methylomes of six bacteria

Nucleic Acids Research, 2012, Vol. 40, No. 22, 11450–11462
Published online 2 October 2012
doi:10.1093/nar/gks891

Iain A. Murray¹, Tyson A. Clark², Richard D. Morgan¹, Matthew Boitano², Brian P. Anton¹, Khai Luong², Alexey Fomenkov¹, Stephen W. Turner², Jonas Korlach²,* and Richard J. Roberts¹,*

¹New England Biolabs, 240 County Road, Ipswich, MA 01938 and ²Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025, USA

Received August 1, 2012; Revised August 31, 2012; Accepted September 3, 2012

*To whom correspondence should be addressed. Tel: +978 380 7405; Fax: +978 380 7406; Email: [email protected]
Correspondence may also be addressed to Jonas Korlach. Tel: +650 521 8006; Fax: +650 323 9420; Email: jkorlach@pacificbiosciences.com

© The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Six bacterial genomes, Geobacter metallireducens GS-15, Chromohalobacter salexigens, Vibrio breoganii 1C-10, Bacillus cereus ATCC 10987, Campylobacter jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168, all of which had previously been sequenced using other platforms, were re-sequenced using single-molecule, real-time (SMRT) sequencing specifically to analyze their methylomes. In every case a number of new N⁶-methyladenine (ᵐ⁶A) and N⁴-methylcytosine (ᵐ⁴C) methylation patterns were discovered and the DNA methyltransferases (MTases) responsible for those methylation patterns were assigned. In 15 cases, it was possible to match MTase genes with MTase recognition sequences without further sub-cloning. Two Type I restriction systems required sub-cloning to differentiate their recognition sequences, while four MTase genes that were not expressed in the native organism were sub-cloned to test for viability and recognition sequences. Two of these proved active. No attempt was made to detect 5-methylcytosine (ᵐ⁵C) recognition motifs from the SMRT sequencing data because this modification produces weaker signals using current methods. However, all predicted ᵐ⁶A and ᵐ⁴C MTases were detected unambiguously. This study shows that the addition of SMRT sequencing to traditional sequencing approaches gives a wealth of useful functional information about a genome, showing not only which MTase genes are active but also revealing their recognition sequences.

INTRODUCTION

We are becoming accustomed to the ever-increasing speed and reduced cost with which DNA can be sequenced. However, what is often lost in this frenzy of sequencing is the fact that DNA consists of more than just four bases. In eukaryotes, we have known for a long time about the epigenetic role of 5-methylcytosine (ᵐ⁵C), sometimes called the fifth base, and more recently it has been found that 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine are also present (1–4). However, two more modified bases, N⁶-methyladenine (ᵐ⁶A) and N⁴-methylcytosine (ᵐ⁴C), are also common in bacterial genomes, where they function as components of restriction–modification (RM) systems (5). Until recently, these have usually been ignored because of the lack of simple methods to determine their locations. However, with the advent of single-molecule, real-time (SMRT) sequencing (6–8), it has suddenly become possible to detect these modified bases as a part of the routine sequencing procedure.

The methylated bases that are found in bacterial and archaeal genomes serve important functions as part of RM systems, where they protect the host chromosome against the otherwise deleterious action of the partner restriction enzyme(s), which are needed to destroy unwanted incoming transmissible DNA elements such as phages (9). However, in some cases these methyltransferases (MTases) also serve regulatory roles, as with the Dam MTase of Escherichia coli, which introduces ᵐ⁶A residues that play a key role in DNA repair and also have important effects during the initiation of replication (10). Several studies have also implicated MTases in regulating gene expression, phase variation and pathogenicity (11,12). Given the many DNA MTases that are typically found in prokaryotic genomes, it seems likely that they will have hitherto undocumented effects aside from their key role in RM systems. To date, there has been no genome-wide assessment of the extent of DNA methylation by known MTases such as E. coli Dam (10) and Dcm (13) or the cell cycle MTase, CcrM, of Caulobacter crescentus (14). It is not known if their methylation specificities are as precise as the customary recognition sequences suggest or whether the enzymes are promiscuous. This is particularly interesting to know for RM systems as there are no obvious selective constraints on MTase specificity provided that the core recognition sequence of the restriction enzyme is fully modified.

Recently, we have shown that by cloning an individual MTase gene into a plasmid and propagating it in an otherwise methylation-deficient strain of E. coli, it is easily possible through SMRT sequencing to detect all of the bases modified on the plasmid (15). Precise recognition sequences were convincingly demonstrated and mostly matched that of the cognate restriction enzyme when the MTase was part of an RM system. However, some promiscuous methylation was observed, with the Dam gene of E. coli being a particularly striking example. There was one caveat to this interpretation though: because the MTase genes in that study were cloned on a multi-copy plasmid (50–200 copies per cell), it could be that the observed promiscuity arose because of over-expression.

Given that the results for the plasmids were very clear, it seemed that it might be possible to perform a direct analysis of bacterial genomes using the SMRT sequencing method and thus obtain an accurate estimate of the extent of methylation in the native organism. By then comparing a bioinformatic analysis of the RM systems with the direct measurement of just what was methylated, it should be possible to assign recognition sequences to individual MTase genes. Of particular interest in this sort of analysis are the Type I and Type III RM systems, which have generally been very difficult to analyze by previous, more tedious techniques (16). In both of these kinds of systems, the specificity comes from a single subunit of the enzyme: the S subunit of the Type I enzymes and the M subunit of the Type III enzymes (16). Thus, it seemed likely that recognition sequences for both types of MTases could be discovered relatively easily. To demonstrate the feasibility of this approach, we chose initially to analyze six genomes with relatively few RM systems before moving on to more complicated cases.

MATERIALS AND METHODS

Vibrio breoganii 1C-10 DNA was a gift from Martin Polz, MIT. Campylobacter jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168 DNAs were a gift from Stuart Thompson, Medical College of Georgia.

SMRT sequencing

SMRTbell template libraries were prepared as previously described (15,17). Briefly, genomic DNA samples were sheared to an average size of 800 bp via adaptive focused acoustics (Covaris; Woburn, MA, USA), end repaired and ligated to hairpin adapters. Incompletely formed SMRTbell templates were digested with a combination of Exonuclease III (New England Biolabs; Ipswich, MA, USA) and Exonuclease VII (Affymetrix; Cleveland, OH, USA). SMRT sequencing was carried out on the PacBio RS (Pacific Biosciences; Menlo Park, CA, USA) using standard protocols for small insert SMRTbell libraries. Sequencing reads were processed and mapped to the respective reference sequences using the BLASR mapper (http://www.pacbiodevnet.com/SMRT-Analysis/Algorithms/BLASR) and the Pacific Biosciences' SMRTAnalysis pipeline (http://www.pacbiodevnet.com/SMRT-Analysis/Software/SMRT-Pipe) using the standard mapping protocol. Interpulse durations were measured as previously described (7) and processed as described (15) for all pulses aligned to each position in the reference sequence. To identify modified positions, we used Pacific Biosciences' SMRTPortal analysis platform, v. 1.3.1, which uses an in silico kinetic reference and a t-test based kinetic score detection of modified base positions (details are available at http://www.pacb.com/pdf/TN_Detecting_DNA_Base_Modifications.pdf).

MTase target sequence motifs were identified by selecting the top 1000 kinetic hits and subjecting a ±20 base window around the detected base to MEME-ChIP (18). To measure the extent of methylation for each motif in a genome, a kinetic score threshold was chosen such that 1% of the detected signals were not assigned to any MTase recognition motifs (5% for B. cereus to accommodate the lower signal intensities for ᵐ⁴C). We subjected this 1% population of sequence contexts to another round of MEME-ChIP analysis to confirm the absence of any additional consensus motifs. We observed no accumulation of motifs that harbored similarities to the identified active motifs. All kinetic data files have been deposited in GEO (accession number GSE40133) (19) (http://www.ncbi.nlm.nih.gov/geo/summary/).
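The motif-discovery preprocessing described in the SMRT sequencing methods (top-scoring kinetic hits, ±20 base context windows, and a score cutoff tied to the fraction of hits left unassigned) might be sketched in Python as below. This is a minimal illustration under stated assumptions, not the SMRT Portal or MEME-ChIP implementation: the function names, the `(position, strand, score)` hit tuples, the 0-based coordinates and the exact rule used to pick the cutoff are all hypothetical choices made for the example.

```python
import re

# IUPAC degenerate nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (IUPAC-aware)."""
    return seq.translate(_COMPLEMENT)[::-1]


def context_windows(reference, hits, top_n=1000, flank=20):
    """Cut a +/-flank window around each of the top_n highest-scoring hits.

    hits: iterable of (position, strand, score), position 0-based on the
    reference, strand "+" or "-". (Hypothetical input format.)
    Windows for "-" hits are reverse-complemented, so the detected base
    always sits at index `flank`. Hits too close to either end are skipped.
    """
    ranked = sorted(hits, key=lambda h: h[2], reverse=True)[:top_n]
    windows = []
    for pos, strand, _score in ranked:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(reference):
            continue
        window = reference[start:end]
        windows.append(window if strand == "+" else reverse_complement(window))
    return windows


def matches_motif(window, motif, modified_index, flank=20):
    """True if `motif` (IUPAC) fits the window with its modified base,
    at `modified_index` within the motif, on the window centre."""
    start = flank - modified_index
    if start < 0 or start + len(motif) > len(window):
        return False
    pattern = "".join(IUPAC[base] for base in motif)
    return re.fullmatch(pattern, window[start:start + len(motif)]) is not None


def score_threshold(scored_hits, max_unassigned=0.01):
    """Lowest score cutoff at which the fraction of hits (score >= cutoff)
    not assigned to any motif stays <= max_unassigned.

    scored_hits: iterable of (score, assigned_to_motif). Returns None if no
    cutoff satisfies the bound. (One assumed reading of the 1% rule.)
    """
    ranked = sorted(scored_hits, key=lambda h: h[0], reverse=True)
    cutoff, unassigned = None, 0
    for count, (score, assigned) in enumerate(ranked, start=1):
        unassigned += not assigned
        # Only evaluate once all hits sharing this score have been counted.
        last_of_tie = count == len(ranked) or ranked[count][0] != score
        if last_of_tie and unassigned / count <= max_unassigned:
            cutoff = score
    return cutoff
```

In this sketch a window is tested against a candidate motif with, for example, `matches_motif(window, "GATCC", 1)`, where the second argument is the motif in IUPAC notation and the third is the index of the methylated base within it. Reverse-complementing minus-strand windows keeps the detected base at the window centre, so a single motif definition can be applied to hits from both strands.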
Materials

All restriction endonucleases (REases) except Eco147I (Fermentas; Glen Burnie, MD, USA), Phusion-HF DNA polymerase, Antarctic Phosphatase, T4-DNA ligase and E. coli competent cells were from New England Biolabs Inc. (Ipswich, MA, USA). Synthetic oligonucleotides were purchased from Integrated DNA Technologies (Coralville, IA, USA). Geobacter metallireducens GS-15 ATCC 53774 DNA, Chromohalobacter salexigens DSM 3043 DNA and Bacillus cereus ATCC 10987 DNA were obtained from the culture collections indicated.

Bioinformatic analysis

The SEQWARE computer resource was used to identify RM system genes from the complete genome sequences of G. metallireducens GS-15 (GenBank numbers CP000148 and CP000149), C. salexigens (GenBank number CP000285), B. cereus (GenBank numbers AE017194 and AE017195), C. jejuni subsp. jejuni 81-176 (GenBank numbers CP000538, CP000549 and CP000550), C. jejuni NCTC 11168 (GenBank number AL111168) and V. breoganii 1C-10 (GenBank number AKXW00000000). Software modules combined with internal databases constitute the SEQWARE resource. New sequence data are scanned locally for homologs of already identified and annotated RM systems in REBASE (5). Sequence similarity from BLAST searches, the presence of predictive functional motifs (20,21) and genomic context are the basic indicators of potential new RM system components. Heuristic rules, derived from knowledge about the gene structure of RM systems, are also applied to refine the hits. Attempts are made to avoid false hits caused by strong sequence similarity of RNA and protein MTases or hits based solely on non-specific domains of RM enzymes, such as helicase or chromatin remodeling domains. SEQWARE then localizes motifs and domains, assigns probable recognition specificities, classifies accepted hits and marks Pfam relationships. All candidates are then inspected manually before being assigned as part of an RM system. The results are entered into REBASE (5).

MTase cloning

Selected MTase genes were amplified from bacterial genomic DNA with Phusion-HF DNA polymerase and cloned into the plasmid pRRS as described previously (15). Gene-specific oligonucleotide primers used for PCR are described in Supplementary Table S1. When no suitable sites were present elsewhere in the construct, restriction sites diagnostic for the predicted methylation pattern were incorporated into the 3′-end oligonucleotides. The presence or absence of specific methylation was determined by digesting the constructs with appropriate restriction enzymes. Host strains used for cloning included E. coli ER2796 (22) and E. coli ER2683 (23).

The Csal_1401 and Gmet_0255 genes were cloned into the plasmid pRRS using the Gibson assembly technique (24). The pRRS vector was PCR amplified using primers pRRS srbs for and pRRS rev. The MTase genes were amplified using primers having 5′ tails that overlap with the ends of the amplified pRRS vector (Supplementary Table S1). PCR amplified DNAs were purified over a Qiagen spin column. A total of 0.1 pmol vector was combined with 0.3 pmol MTase gene insert in a 20 µl 1× Gibson assembly reaction (New England Biolabs) and incubated at 50°C for 1 h. A total of 2 µl of this assembled construct was transformed into 50 µl chemically competent E. coli ER2796 cells and plated on LB-ampicillin plates at 37°C overnight.

RESULTS

We analyzed six bacterial strains, all of which had relatively few predicted RM systems and several of which had some experimental data already available. Three of these strains, G. metallireducens GS-15, C. salexigens and V. breoganii 1C-10, had never been tested for active MTases previously, while three other strains, B. cereus ATCC 10987, C. jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168, were all known to contain several active MTases (25–27). In each case there were Type I or Type III RM systems for which no information was available about either their activity or recognition sequences.

We analyzed each genome using SEQWARE and made predictions about the RM systems that were present, including REase and MTase genes and recognition sequences when a gene showed high similarity to a biochemically characterized gene of known recognition sequence. These predictions are summarized for all RM system components in Supplementary Table S2. Each genome was then subjected to SMRT sequencing and the methylated bases identified by their kinetic signatures (7). These were then aligned and clustered to identify the motifs that constituted the consensus recognition sequences for the MTases. These experimental results were then matched with the bioinformatic predictions. Several factors helped in this matching, such as the fact that all known Type III MTases and most Type IIG systems only methylate one strand of their recognition sequence. Type I systems have bipartite recognition sequences in which two short motifs (3–5 nt long) are separated by 5–8 non-specific nucleotides. A well-known example is the EcoKI RM system that recognizes 5′-Aᵐ⁶ACNNNNNNGT̲GC-3′ (28). Methylation takes place as indicated (the underlined T indicates that the A residue on the complementary strand is methylated). It should be noted that because ᵐ⁵C generates a weak and somewhat diffuse SMRT signal (7), no attempt was made in any of these whole genome analyses to identify the positions of ᵐ⁵C. Rather, where appropriate these MTase genes were cloned and analyzed separately as was done previously (15).

Geobacter metallireducens GS-15

Geobacter metallireducens strain GS-15, first isolated from freshwater sediment, is capable of reducing iron, manganese, uranium and other metals and thus represents an interesting target for bioremediation of groundwater contaminants (29). The genome sequence of this organism, which grows at 30°C, was originally determined by the Joint Genome Institute (JGI) (GenBank numbers CP000148 and CP000149). Bioinformatic analysis indicated that there should be two MTases associated with Type II RM systems and one with a Type III system (Supplementary Table S2). Two active MTases were detected based on the SMRT sequencing analysis (Figure 1; Supplementary Figures S1a and S2a). Figure 1a shows kinetic signals for both DNA strands for a section of the genome containing three instances of detected regions containing methylated template bases, two of which are limited to one of the two DNA strands and the other encompassing methylation on both DNA strands. Genome-wide analysis of all template positions (Figure 1b) revealed a population of A bases that clearly separated from the background of all other template positions. Motif analysis (see 'Materials and Methods' section) resulted in the identification of two MTase specificities: 5′-Gᵐ⁶ATCC-3′ and 5′-TCCᵐ⁶AGG-3′ (Figure 1c). The extent of methylation across the genome was determined by considering 29 166 positions detected as methylated, corresponding to >99% of all hits matching a motif (Figure 1b; see 'Materials and Methods' section). Greater than 98% of all genomic positions matching these MTase specificities were detected as methylated (Figure 1d).

Figure 1. Methylome determination of G. metallireducens GS-15. (a) Example trace of kinetic variation, showing three instances of methylated sequence regions. (b) Scatter plot of sequencing coverage and kinetic score for all genomic positions. The colors indicate the bases as shown in the upper left of the panel. The cutoff for detected genomic positions is indicated by the dashed line. (c) MTase specificities determined from the genomic positions detected as methylated. They are highlighted as gray boxes in the example trace (a). (d) Summary of detected methylated positions across the genome.

Of the two Type II systems, one (Gmet_3140) showed great similarity to known MTases recognizing 5′-GGATC-3′, including M.EacI (30) and M.AlwI (5). In all cases, the MTase is itself a fusion of two MTase domains, one recognizing 5′-GGATC-3′ and forming 5′-GGᵐ⁶ATC-3′ and the other recognizing the complementary strand and forming 5′-Gᵐ⁶ATCC-3′. The new MTase identified here is called M.GmeI and its corresponding REase encoded by Gmet_3138 is called GmeIP, since it is not known if it is active. Interestingly, Gmet_3138 shows great similarity to the known restriction enzyme genes EacI (30) and AlwI (5), but unlike the latter two genes, which are immediately adjacent to their respective MTase genes, the genes for M.GmeI and GmeIP are separated by an open reading frame encoding a protein of 333 amino acids, which is homologous to a protein in the same location in G. metallireducens RCH3, but has much less similarity to other proteins in GenBank. However, the next closest homolog is a 509 amino acid protein in Syntrophothermus lipocalidus DSM 12680, which also sits next to an MTase gene, but one of different recognition specificity (5′-ACCTGC-3′).

The other Type II MTase (Gmet_0255) contained the typical motifs associated with an ᵐ⁵C DNA MTase, but its recognition sequence could not be predicted as the variable region showed no great similarity to other ᵐ⁵C MTases of known specificity. This MTase was cloned and tested for its ability to incorporate ³H-methyl groups into DNA using labeled S-adenosylmethionine as substrate, but was found to be inactive. Similarly, no DNA methylation was observed by SMRT sequencing of the plasmid containing the cloned gene (data not shown). Either this MTase is inactive or it could be an RNA MTase.

The Type III MTase (Gmet_0676) clearly recognizes 5′-TCCᵐ⁶AGG-3′ and modifies the A residue as indicated. It is named M.GmeII. As with all known Type III enzymes, only one strand is modified. It too has a corresponding REase gene as the adjacent ORF (Gmet_0675), but it is not known if it is active.

During our analysis, we found that there appeared to be a deletion in the genomic DNA we obtained from the ATCC relative to the reference genome, as we observed no sequencing coverage between positions 2 446 610 and 2 588 100. This region is flanked by two transposase genes. This deletion has also been observed by Dr Derek Lovley (unpublished data).

Chromohalobacter salexigens

Chromohalobacter salexigens is a moderate halophile that is tolerant to various salt environments and allows other organisms (e.g. Salmonella) to exist in environments they would otherwise not be able to cope with. The genome sequence of this organism, which grows at 37°C, was originally determined by the JGI (31). Bioinformatic analysis of the genome indicated that there should be one Type I system and two Type II systems (Supplementary Table S2). The recognition sequence of the Type I system could not be predicted since the specificity subunit (Csal_0086), which determines the recognition sequence, showed no similarity to any well-characterized system. Of the Type II systems, one (Csal_1368) was predicted to recognize 5′-GATC-3′ since it showed significant similarity to several well-characterized 5′-GATC-3′ MTases. However, the recognition sequence of the second Type II MTase (Csal_1401), which appears to be encoded on a prophage, could not be predicted. It was suspected that this might not be active in the genome as frequently prophage-encoded genes are transcriptionally inactive until such time as the prophage is excised (32).

The results of whole genome SMRT sequencing analysis are shown in Figure 2 and demonstrate that the putative GATC MTase is expressed, methylates the adenine residues on both strands to form ᵐ⁶A, but actually recognizes the more specific sequence, 5′-RGATCY-3′, although methylation seems not to be complete during normal growth. This MTase is called M.CsaI. The specificity was very strict as the number of hits observed for 5′-NGATCN-3′, but not conforming to 5′-RGATCY-3′, was 0 (Supplementary Figure S3). The Type I system is very well defined and recognizes the usual bipartite sequence pattern recognized by Type I enzymes, but this particular recognition sequence 5′-CCAC(N) CTC-3′ has not been reported previously (5). As usual for Type I systems, the MTase, M.CsaII, acts on the single adenine residue in each DNA strand forming ᵐ⁶A. The putative prophage-encoded MTase appears not to be expressed.

That the 5′-RGᵐ⁶ATCY-3′ signal is due to expression of Csal_1368 and is not a combination of expression of both Type II ORFs was tested by cloning Csal_1401 separately in the methylation deficient E. coli strain ER2796 (22). The resulting clone showed that the MTase was non-specific and methylated most, but not all, A residues in the plasmid (Supplementary Figure S4). Motif analysis indicated the following specificity rules for this relatively non-specific MTase: 5′-ᵐ⁶AB-3′ and 5′-Sᵐ⁶AAM-3′ (>96% of all hits with a kinetic score >100 fell into these motifs; B = not A; S = G or C; M = A or C).

Figure 2. Methylome determination of C. salexigens. (a and b) Example traces of kinetic variation, showing two instances of methylated positions. (c) MTase specificities determined from the genomic positions detected as methylated. (d) Summary of detected methylated positions across the genome.

Vibrio breoganii 1C-10

Vibrio breoganii is a non-motile, alginolytic, marine bacterium. Strain 1C-10 was isolated from large suspended particles (likely macroalgal detritus) during analysis of resource partitioning of Vibrionaceae populations (33,34). Bioinformatic analysis suggested that this strain contained two Type I RM systems and both proved to be active, methylating the sequence motifs 5′-AGHᵐ⁶A(N)₇TGAC-3′ and 5′-CTᵐ⁶AG(N)₆RTAA-3′, respectively (Figure 3; Supplementary Figures S1c and S2c). Bioinformatics alone could not resolve which system recognized which sequence and so the M and adjacent S genes of the two systems were cloned as pairs. The S1.VbrIP gene is about half the length of a typical S subunit and was not tested for activity. The resulting plasmids were tested for resistance to HindIII and ScaI to test for methylation by M.VbrI and M.VbrII, respectively (Supplementary Figure S5). The partial protection against HindIII is expected for an MTase, M.VbrI, forming 5′-AGCᵐ⁶AAGCTTAATGAC-3′ as the resulting hemi-methylated HindIII site does not completely inhibit cleavage (5). In a parallel experiment, methylation by M.VbrII gave complete protection against ScaI at the sequence 5′-CTᵐ⁶AGTACTCCATA-3′ as expected (5). These assignments were confirmed by SMRT sequencing of the plasmids containing individual MTase-expressing clones (Supplementary Figure S6).

Again from bioinformatic analysis, there were two Type II MTases present. The first, M.VbrDam, was a close homolog of the M.EcoKDam MTase of E. coli (35) and indeed the genome was methylated at essentially all GATC sites as predicted (Figure 3). The second MTase was enigmatic and while a very weak signal (192 out of the 305 unassigned hits) that could be interpreted as Cᵐ⁴CA was found by sequencing, this seemed unlikely to be the recognition sequence since very few genomic positions harboring this motif had strong kinetic signals. Consistent with this hypothesis, no modified sites were detected upon cloning this gene into a plasmid and analysis by SMRT sequencing (data not shown), indicating that this MTase gene is inactive. The weak CCA signals are more likely the result of phosphorothioated nucleotides which have been detected in this bacterium by bulk methods [(36); T. A. Clark and J. Korlach, unpublished data].

Figure 3. Methylome determination of V. breoganii 1C-10. (a–c) Example traces of kinetic variation, showing instances of the detected methylated motifs. (d) MTase specificities determined from the genomic positions detected as methylated. (e) Summary of detected methylated positions across the genome.

Campylobacter jejuni subsp. jejuni 81-176

Campylobacter jejuni is a Gram-negative bacterium native to the digestive tract of poultry and other bird species and is one of the most common causes of human gastroenteritis. The genome sequence of this organism had been determined some time ago (D. Fouts and K. Nelson, unpublished data; GenBank numbers CP000538, CP000549 and CP000550). Bioinformatic analysis suggested the presence of two Type I RM systems and four Type II systems, several of which had close homologs in C. jejuni NCTC 11168 (Supplementary Table S2). One gene, CJJ81176_0240, was 99% identical to the characterized gene for M.CjeNI, which was reported to recognize 5′-GAATTC-3′ (26). However, when examining the genomic methylation through SMRT sequencing, it was clear that the gene in this strain, coding for M.CjeFI, recognized the more degenerate sequence 5′-RAᵐ⁶ATTY-3′ (Figure 4); the same proved true for M.CjeNI (see below and Figure 5). Another MTase gene, CJJ81176_1454, was extremely similar to a gene in C. jejuni NCTC 11168 that was reported to encode an active 5′-GATC-3′ MTase (27). However, in neither of the two Campylobacter strains was such an active MTase detected. Furthermore, the gene in question shows more similarity to the RNA MTase RsmD than to other DNA MTases. We conclude that this gene is not able to methylate DNA and its true activity may require further biochemical investigation. Two additional MTase genes appear to be part of Type IIG RM systems in which sequence specificity, methylation and restriction are all carried out by the same polypeptide. One recognizes the sequence 5′-GGRCA-3′ and modifies the terminal A residue, while the other recognizes the sequence 5′-GCAAGG-3′ and modifies the second A residue (Figure 4). As with many other Type IIG enzymes, only one strand of the DNA is methylated. To decide which gene was which, we noticed that CJJ81176_0713 is very similar to Cj0690c in C. jejuni NCTC 11168, which recognizes the related sequence 5′-GKAAYG-3′ (see below). Thus, we assigned CJJ81176_0713 as the gene encoding RM.CjeFIII forming 5′-GCAᵐ⁶AGG-3′ and CJJ81176_0068 as the gene encoding RM.CjeFV forming 5′-GGRCᵐ⁶A-3′ (Figure 5). These assignments were confirmed by cloning the individual ORFs and testing the clones for protection from appropriate REases (Supplementary Figure S7). These results are summarized in Table 1.

Finally, the two Type I systems are both active with one forming 5′-CAᵐ⁶AYN ACT-3′ and the other forming 5′-TAᵐ⁶AYN TGC-3′. Since only the second of these modifications is present in C. jejuni NCTC 11168, it can be safely concluded that the specificity subunit, CJJ81176_1536, which has a close homolog in that strain, recognizes 5′-TAAYN TGC-3′ and the specificity subunit, CJJ81176_0777, recognizes 5′-CAAYN ACT-3′. In both cases, methylation results in the second A residue being modified as shown in Figure 4.

Figure 4. Methylome determination of C. jejuni 81-176. (a–e) Example traces of kinetic variation, showing instances of the detected methylated motifs. (f) MTase specificities determined from the genomic positions detected as methylated. (g) Summary of detected methylated positions across the genome.

Campylobacter jejuni NCTC 11168

This strain (37) codes for one Type I RM system and four Type II systems. The Type I system is essentially identical with the CjeFIV system in C. jejuni subsp. jejuni 81-176 and forms 5′-TAᵐ⁶AYN TGC-3′ (CjeNIV) (Figure 5). Two of the Type II systems, M.CjeNI and RM.CjeNII, had previously been characterized [26; J.M.B. Vitor et al., unpublished data (5)]. However, as noted earlier, M.CjeNI recognizes 5′-RAATTY-3′ (Figure 5) rather than 5′-GAATTC-3′ as had been reported (26). RM.CjeNII is a Type IIG system and recognizes 5′-GAGN GT-3′ and is now shown to methylate both A residues on the two strands. Another Type II MTase is encoded by Cj0690c and is a Type IIG enzyme that forms 5′-GKAᵐ⁶AYG-3′, methylating the second A residue (Figure 5). This gene was cloned in E. coli and found to produce active endonuclease recognizing 5′-GKAAYG-3′ and cutting 19/17 downstream. From the bioinformatic analysis, one additional gene, Cj0031, plus the adjacent gene, Cj0032, looks like a Type IIG enzyme containing a frameshift. The complete gene would be 99% identical to the gene for RM.CjeFV, which recognizes 5′-GGRCA-3′. However, no such modification is found in the genome, confirming that the frameshift is real and that this frameshifted gene produces no active MTase. SMRT sequencing data confirmed the presence of the frameshift.

Figure 5. Methylome determination of C. jejuni NCTC 11168. (a–d) Example traces of kinetic variation, showing instances of the detected methylated motifs. (e) MTase specificities determined from the genomic positions detected as methylated. (f) Summary of detected methylated positions across the genome.

Bacillus cereus ATCC 10987

This bacterium was originally isolated from spoiled cheese and belongs to the same genetic subgroup as Bacillus anthracis (38). The RM systems in B. cereus ATCC 10987 had previously been examined by Xu et al. (25), who determined recognition sequences for four Type II and III REases and one orphan MTase by traditional methods. However, the sites of methylation for the Type II and III MTases were not determined and several other MTases were not examined, including that in the Type I system (BCE_0839-BCE_0842) and a Type II MTase (BCE_0392) that was reported to be inactive (25). However, when we cloned this MTase and checked its activity, it was clearly a promiscuous ᵐ⁶A MTase, which we have now named M.BceSVII (Supplementary Figure S9 and Table 1).

Our main goal was to characterize the Type I system and also ascertain the sites of methylation by the MTases not addressed in the previous study. The Type I system, now called BceSVI, was clearly active and recognized the sequence 5′-TAᵐ⁶AGN T̲GG-3′, where again the underlined T indicates ᵐ⁶A on the complementary strand (Figure 6; Supplementary Figures S1f and S2f). This system is a little unusual in that it contains two M subunits. Because we did not clone the individual components of this system, we cannot say whether one or both M subunits are active. The sites of modification of the three other Type II MTases are indicated in Table 1, while the Type III MTase, which had been identified earlier by cloning, is shown to be completely active in the genome.

The previously identified Type II REase BceSIII recognizes an asymmetric sequence, 5′-ACGGC-3′, and requires two MTases for protection, both of which are ᵐ⁴C MTases. These form 5′-Aᵐ⁴CGGC-3′ in the strand shown and 5′-Gᵐ⁴CCGT-3′ in the complementary strand (Figure 6b). To show which MTase recognizes which strand, we cloned the two MTase genes independently and checked for their ability to protect against appropriate REases (Supplementary Figure S8). From this analysis, we can conclude that M1.BceSIII forms 5′-Aᵐ⁴CGGC-3′ and M2.BceSIII forms 5′-Gᵐ⁴CCGT-3′. It is important to note that while cloning the individual MTase genes showed five to be active, only four seem to be active in the genome. M.BceSV, a multi-specific MTase characterized in the previous study by cloning and overexpression (25), is encoded on a prophage and does not show detectable activity in the native host genome. In addition to the ᵐ⁶A and ᵐ⁴C MTases mentioned earlier, our analysis indicated two more motifs that are likely modified by one or more of the predicted ᵐ⁵C MTases in the B. cereus genome, as 179 of the 524 unassigned hits fell into two categories. These motifs were 5′-Gᵐ⁵CWGC-3′ and 5′-GGWCᵐ⁵C-3′, which are consistent with recognition specificity predictions for BCE_0365 and BCE_4605 (Supplementary Table S2). The kinetic signals for ᵐ⁵C are subtle in that with the kinetic score cutoff used, we detect only 138 5′-GCWGC-3′ (out of 15416 in the genome) and 41 5′-GGWCC-3′ (out of 5460) sites. We are currently exploring methods of enhancing the kinetic signature of ᵐ⁵C during SMRT sequencing.

Figure 6. Methylome determination of B. cereus ATCC 10987. (a–c) Example traces of kinetic variation, showing instances of the detected methylated motifs. (d) MTase specificities determined from the genomic positions detected as methylated. (e) Summary of detected methylated positions across the genome.

DISCUSSION

The results presented in this article and summarized in Table 1 represent one of the first times that it has been possible to examine the complete methylation pattern of a bacterial genome. For the MTases studied in this article, seven are components of Type I RM systems and have six different recognition sequences, all of which are new. Two Type III systems were found with one new recognition sequence. Two MTases were part of traditional Type II systems although we did not test whether the REase was active. Four Type IIG REases, which contain both MTase and REase activity in a single polypeptide chain, were found, all with new specificities. It should be noted that two of these, RM.CjeFIII and RM.CjeNIII, show very high sequence similarity and yet recognize different sequences (5′-GCAᵐ⁶AGG-3′ and 5′-GKAᵐ⁶AYG-3′, respectively). Thus, this finding represents another family of Type IIG restriction enzymes that resemble the MmeI family, where a few simple changes in critical base recognition elements cause changes in specificity (39). This again emphasizes the need for caution when transferring annotation from one characterized protein to another (40). The

Table 1. Bioinformatic predictions and experimental results for all MTase genes

                 Bioinformatic predictions             Experimental results
ORF #            Type       Gene     Prediction        Name        Rec. Seq.

Geobacter metallireducens GS-15
Gmet_0255        Type II    M (5)    ?                 inactive
Gmet_3140        Type II    M        GGATC             M.GmeI      GGᵐ⁶ATC
Gmet_0676        Type III   M        ?                 M.GmeII     TCCᵐ⁶AGG

Chromohalobacter salexigens
Csal_0084        Type I     M        ?                 M.CsaII     CCᵐ⁶ACN CTC
Csal_1368        Type II    M        GATC              M.CsaI      RGᵐ⁶ATCY
Csal_1401        Type II    M        ?                 M.CsaIII    ᵐ⁶AB + Sᵐ⁶AAM

Vibrio breoganii 1C-10
ORF_51A          Type I     M        ?                 M.VbrI      AGHᵐ⁶AN₇TGAC
ORF_9B           Type I     M        ?                 M.VbrII     CTᵐ⁶AGN₆RTAA
ORF_50B          Type II    M        GATC              M.VbrIII    Gᵐ⁶ATC
ORF_5C           Type II    M        ?                 inactive

Campylobacter jejuni 81-176
CJJ81176_0776    Type I     M        ?                 M.CjeFII    CAᵐ⁶AYN ACT
CJJ81176_1539    Type I     M        ?                 M.CjeFIV    TAᵐ⁶AYN TGC
CJJ81176_0068    Type II    RM       ?
RM.CjeFV GGRC A a m6 CJJ81176_0240 Type II M GAATTC M.CjeFI RA ATTY m6 CJJ81176_0713 Type II RM ? RM.CjeFIII GCA AGG Campylobacter jejuni NCTC 11168 m6 Cj1553c Type I M ? M.CjeNIV TA AYN TGC m6 Cj0208 Type II M GAATTC M.CjeNI RA ATTY m6 Cj0690c Type II RM ? RM.CjeNIII GKA AYG m6 Cj1051c Type II RM GAGN GT RM.CjeNII G AGN GT 5 5 Bacillus cereus ATCC 10987 BCE_0839 Type I M ? M1.BceSVIP ? m6 BCE_0841 Type I M ? M2.BceSVIP TA AGN TTG BCE_0365 Type II M (5) GCAGC M.BceSIV N/D m6 BCE_0392 Type II M ? M.BceSVII promiscuous A BCE_0393 Type II M (5) Many M.BceSV N/D BCE_4605 Type II M (5) GGWCC M.BceSII N/D m4 BCE_5606 Type II M ACGGC M1.BceSIII A CGGC m4 BCE_5607 Type II M ACGGC M2.BceSIII G CCGT m6 m6 BCE_1018 Type III M CGA AG M.BceSI CGA AG Italicized genes characterized previously; red text indicates new information or revision. Recognition sequences representations use the standard abbreviations. (Eur. J. Biochem., 150, 1–5, 1985) to represent ambiguity: R = G or A, Y = C or T, M = A or C, K = G or T, S = G or C, W = A or T, = not A (C or G or T), D = not C (A or G or T), H = not G (A or C or T), V = not T (A or C or G), N = A or C or G or T. m5 N/D = not detected ( C assignments were not attempted). indicates incorrect result obtained previously. 0 0 0 0 0 0 0 0 5 -GGCC-3 /5 -GCNGC-3 /5 -CCGG-3 /5 -GGNCC-3 are all recognized. composition of an amino acid change can be critical with very few off-target events noted. Of course, much if it occurs at a residue belonging to a DNA sequence greater coverage would be required to detect very rare recognition element. Two orphan MTases, M.CsaIII and off-site effects and so some degree of promiscuity cannot M.BceSVII, were found to be active when cloned, but be ruled out. However, the apparent promiscuity that was m6 inactive in the genome. Both are promiscuous A observed in our earlier work (15) using MTase genes MTases and both occur on prophage elements suggesting cloned in high copy number plasmids was not apparent. 
that they may play a protective role during phage infection. We consider the ‘true’ MTases specificity to be reflected in 0 0 Finally, two solitary 5 -GATC-3 MTases were shown to be the modification patterns seen when they are expressed in active. It should be noted that when examining complete their genomic context. Thus, based on the current genome sequences for MTases, some of the genes may be findings, we would have to conclude that in general it inactive because of mutation, while others may be inactive seems likely that most MTases show essentially identical due to transcriptional silencing as is often found when the specificity to their cognate REases, a result that was not genes are present as part of a prophage. In the latter case completely expected since there are no obvious constraints cloning can reveal methylation activity, permitting on their specificity. complete characterization as found earlier (15). Previously, it had been found that Type III MTases One of the striking features of the results from the only methylate a single strand of their recognition current analysis is that the recognition sequences of all sequence and that holds true here. Similarly, most MTases found to be active showed fairly strict specificity characterized Type IIG enzymes methylate just a single Nucleic Acids Research, 2012, Vol. 40, No. 22 11461 strand although several do not, including RM.CjeNII as SUPPLEMENTARY DATA described here. Nevertheless, this can be very helpful when Supplementary Data are available at NAR Online: trying to match recognition sequences found by Supplementary Tables 1 and 2 and Supplementary sequencing with the genes responsible for each consensus Figures 1–9. sequence. Another useful feature is that all known Type I restriction systems seem to possess split recognition se- quences, which can help in distinguishing them when ACKNOWLEDGEMENTS matching genes and consensus sequences. Nevertheless, We thank Martin Polz for the gift of V. 
breoganii 1C-10 if two Type I systems are present as in V. breoganii DNA and providing its DNA sequence prior to publica- 1C-10, it was essential to clone out the individual tion, and to Stuart Thompson for the gift of C. jejuni systems so that specificity and genes could be properly subsp. jejuni 81-176 and C. jejuni NCTC 11168 DNAs. matched. Note that because of the mechanism of methy- Dr Derek Lovley graciously informed us of his results lation it is only the M and S subunits that need to be with G. metallireducens prior to publication. We would cloned to permit assembly of a functional MTase (16). also like to thank P. Marks and K. Spittle for data In the case of the Type II RM system BceSIII, because analysis help and sample preparation advice, respectively. of the asymmetric nature of the recognition sequence, two We appreciate the careful reading of the manuscript and independent MTases are required to methylate each useful comments by Dr Geoff Wilson. strand of the sequence. While SMRT sequencing can easily find the locations of each methyl group, it was ne- cessary to clone out the two MTase genes separately in FUNDING order to assign strand specificity to each one. M.GmeI New England Biolabs; Pacific BioSciences and NIH grants also recognizes an asymmetric sequence, but in this case, [1RC2GM092602 to J.K. and R44 GM100560 to R.J.R.]. the two M genes are fused. At the present time, we have Funding for open access charge: New England Biolabs’ relatively little information about strand specificity of internal funds. MTases, because it has proven difficult to determine spe- cificity experimentally. As more data accumulate using the Conflict of interest statement. T.A.C., M.B., K.L., S.T. and kinds of analyses that we present here, it should become J.K. 
are full-time employees at Pacific Biosciences, a much easier in the future to make accurate bioinformatic company commercializing SMRT sequencing tech- predictions about recognition sequences and specificity for nologies. I.A.M., R.D.M., A.F., B.P.A. and R.J.R. are MTases in newly sequenced genomes. full-time employees of New England Biolabs, a company Despite the recognized importance of methylation for that sells research reagents such as DNA MTases. understanding fundamental microbiological processes, microbe adaptability and disease pathogenicity (11,12), in the past, there has not been a great deal of research into the REFERENCES methylation patterns of bacterial genomes, largely because 1. Kumar,S., Cheng,X., Klimasauskas,S., Mi,S., Posfai,J., of the difficulty of obtaining suitable data. One area where Roberts,R.J. and Wilson,G.G. (1994) The DNA (cytosine-5) knowledge about the methylome is very important relates methyltransferases. Nucleic Acids Res., 22, 1–10. to studies trying to transform DNA into strains that con- 2. Tahiliani,M., Koh,K.P., Shen,Y., Pastor,W.A., Bandukwala,H., tain one or more RM systems and which vastly reduce Brudno,Y., Agarwal,S., Iyer,L.M., Liu,D.R., Aravind,L. et al. (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine transformation efficiencies. In some cases, these barriers in mammalian DNA by MLL partner TETI. Science, 324, have been overcome by premethylating the DNA or by 930–935. removing the RM systems from strains (41,42). One 3. Kriaucionis,S. and Heintz,N. (2009) The nuclear DNA base problem with the latter approach is that removal of methy- 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science, 324, 929–930. lation systems may fundamentally change the biology of 4. Ito,S., Shen,L., Dai,Q., Wu,S.C., Collins,L.B., Swenberg,J.A., the organism under study. With the kind of analysis He,C. and Zhang,Y. 
(2011) Tet proteins can convert provided here, the RM systems likely to cause problems 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. with transformation can be easily spotted and appropriate Science, 333, 1300–1303. 5. Roberts,R.J., Vincze,T., Posfai,J. and Macelis,D. (2010) measures taken. Thus, the MTases necessary for protection REBASE—a database for DNA restriction and modification: can be identified and if needed intermediate cloning hosts enzymes, genes and genomes. Nucleic Acids Res., 38, D234–D236. carrying suitable complements of MTase genes can be 6. Eid,J., Fehr,A., Gray,J., Luong,K., Lyle,J., Otto,G., Peluso,P., prepared. Rank,D., Baybayan,P., Bettman,B. et al. (2009) Real-time DNA sequencing from single polymerase molecules. Science, 323, In summary, the results provided here show that SMRT 133–138. sequencing can provide functional information about active 7. Flusberg,B.A., Webster,D.R., Lee,J.H., Travers,K.J., MTases present in genomes and can decipher their recogni- Olivares,E.C., Clark,T.A., Korlach,J. and Turner,S.W. (2010) tion sequences, a task that used to be time-consuming to a Direct detection of DNA methylation during single-molecule, point where it was not usually carried out. This, combined real-time sequencing. Nat. Methods, 7, 461–465. 8. Korlach,J. and Turner,S.W. (2012) Going beyond five bases in with the long reads provided by this technology can be an DNA sequencing. Curr. Opin. Struct. Biol., 22, 251–261. excellent adjunct to current high-throughput sequencing 9. Roberts,R.J. and Halford,S.E. (1993) Type II restriction enzymes. platforms, in that sequence assembly is facilitated and In: Linn,S.M., Lloyd,R.S. and Roberts,R.J. (eds), Nucleases. Cold gene function is reliably documented. Spring Harbor Laboratory Press, Cold Spring Harbor, pp. 35–88. 11462 Nucleic Acids Research, 2012, Vol. 40, No. 22 10. Marinus,M.G. and Casadesus,J. (2009) Roles of DNA adenine in regulating virulence characteristics. J. 
Bacteriol., 190, methylation in host-pathogen interactions: mismatch repair, 6524–6529. transcriptional regulation, and more. FEMS Microbiol. Rev., 33, 28. Kan,N.C., Lautenberger,J.A., Edgell,M.H. and 488–503. Hutchison,C.A. III (1979) The nucleotide sequence recognized by 11. Srikhanta,Y.N., Fox,K.L. and Jennings,M.P. (2010) The the Escherichia coli K12 restriction and modification enzymes. J. phasevarion: phase variation of type III DNA methyltransferases Mol. Biol., 130, 191–209. controls coordinated switching in multiple genes. Nat. Rev. 29. Lovley,D.R. and Phillips,E.J.P. (1988) Novel mode of microbial Microbiol., 8, 196–206. energy-metabolism - organic-carbon oxidation coupled to 12. Casadesu´ s,J. and Low,D. (2006) Epigenetic gene regulation in the dissimilatory reduction of iron or manganese. (1988). Appl. bacterial world. Microbiol. Mol. Biol. Rev., 70, 830–856. Environ. Microbiol., 54, 1472–1480. 13. Bhagwat,A.S. and McClelland,M. (1992) DNA mismatch 30. Graentzdoerffer,A., Lindenstrauss,U., Pich,A. and Andreesen,J.R. correction by very short patch repair may have altered the (2002) New DNA-methyltransferase M.EacI, useful for protecting abundance of oligonucleotides in the E. coli genome. Nucleic double stranded DNA against cleavage by restriction enzymes, Acids Res., 20, 1663–1668. derived from Eubacterium acidaminophilum, German Patent Office 14. Reisenauer,A., Kahng,L.S., McCollum,S. and Shapiro,L. (1999) DE 10060526. Bacterial DNA methylation: a cell cycle regulator? J. Bacteriol., 31. Copeland,A., Lucas,S., Copeland,A., Lucas,S., Lapidus,A., 181, 5135–5139. Barry,K., Detter,J.C., Glavina del Rio,T., Hammon,N., Israni,S. 15. Clark,T.A., Murray,I.A., Morgan,R.D., Kislyuk,A.O., et al. (2011) Complete genome sequence of the halophilic and Spittle,K.E., Boitano,M., Fomenkov,A., Roberts,R.J. and highly halotolerant Chromohalobacter salexigens type strain Korlach,J. (2012) Characterization of DNA methyltransferase (1H11T). Stand. Genomic Sci., 5, 379–388. 
specificities using single-molecule, real-time DNA sequencing. 32. Ventura,M., Canchaya,C., Bernini,V., Altermann,E., Nucleic Acids Res., 40, e29. Barrangou,R., McGrath,S., Claesson,M.J., Li,Y., Leahy,S., 16. Bickle,T.A. (1993) The ATP-dependent restriction enzymes. Walker,C.D. et al. (2006) Comparative genomics and In: Linn,S.M., Lloyd,R.S. and Roberts,R.J. (eds), Nucleases. Cold transcriptional analysis of prophages identified in the genomes of Spring Harbor Laboratory Press, Cold Spring Harbor, Lactobacillus gasseri, Lactobacillus salivarius and Lactobacillus pp. 89–109. casei. Appl. Environ. Microbiol., 72, 3130–3146. 17. Travers,K.J., Chin,C.S., Rank,D.R., Eid,J.S. and Turner,S.W. 33. Hunt,D.E., David,L.A., Gevers,D., Preheim,S.P., Alm,E.J. and (2010) A flexible and efficient template format for circular Polz,M.F. (2008) Resource partitioning and sympatric consensus sequencing and SNP detection. Nucleic Acids Res., 38, differentiation among closely related bacterioplankton. Science, e159. 320, 1081–1085. 18. Machanick,P. and Bailey,T.L. (2011) MEME-ChIP: motif analysis 34. Preheim,S.P., Timberlake,S. and Polz,M.F. (2011) Merging of large DNA datasets. Bioinformatics, 27, 1696–1697. taxonomy with ecological population prediction in a case study of 19. Sayers,E.W., Barrett,T., Benson,D.A., Bolton,E., Bryant,S.H., Vibrionaceae. Appl. Environ. Microbiol., 77, 7195–7206. Canese,K., Chetvernin,V., Church,D.M., Dicuccio,M., Federhen,S. 35. Brooks,J.E., Blumenthal,R.M. and Gingeras,T.R. (1983) The et al. (2012) Database resources of the National Center for isolation and characterization of the Escherichia coli DNA Biotechnology Information. Nucleic Acids Res., 40, D13–D25. adenine methylase (dam) gene. Nucleic Acids Res., 11, 837–851. 20. Posfai,J., Bhagwat,A.S., Posfai,G. and Roberts,R.J. (1989) 36. Wang,L., Chen,S. and Deng,Z. (2012) Phosphorothioation: an Predictive motifs derived from cytosine methyltransferases. 
Nucleic unusual post-replicative modification on the dna backbone. Acids Res., 17, 2421–2435. In: Seligmann,H. (ed.), DNA Replication-Current Advances, 21. Klimasauskas,S., Timinskas,A., Menkevicius,S., Butkiene,D., New York: InTech, Chapter 3. pp. 57–74. Butkus,V. and Janulaitis,A. (1989) Sequence motifs characteristic 37. Parkhill,J., Wren,B.W., Mungall,K., Ketley,J.M., Churcher,C., of DNA[cytosine-N4]methylases: similarity to adenine and Basham,D., Chillingworth,T., Davies,R.M., Feltwell,T. and cytosine-C5 DNA-methylases. Nucleic Acids Res., 17, 9823–9832. Holroyd,S. (2000) The genome sequence of the food-borne 22. Kong,H., Lin,L.F., Porter,N., Stickel,S., Byrd,D., Posfai,J. and pathogen Campylobacter jejuni reveals hypervariable sequences. Roberts,R.J. (2000) Functional analysis of putative Nature, 403, 665–668. restriction-modification system genes in the Helicobacter pylori 38. Rasko,D.A., Ravel,J., Økstad,O.A., Helgason,E., Cer,R.Z., J99 genome. Nucleic Acids Res., 28, 3216–3223. Jiang,L., Shores,K.A., Fouts,D.E., Tourasse,N.J. and 23. Sibley,M.H. and Raleigh,E.A. (2004) Cassette-like variation of Angiuoli,S.V. (2004) The genome sequence of Bacillus cereus restriction enzyme genes in Escherichia coli C and relatives. ATCC 10987 reveals metabolic adaptations and a large plasmid Nucleic Acids Res., 32, 522–534. related to Bacillus anthracis pXO1. Nucleic Acids Res., 32, 24. Gibson,D.G., Young,L., Chuang,R.Y., Venter,J.C., 977–988. Hutchison,C.A. III and Smith,H.O. (2009) Enzymatic assembly of 39. Morgan,R.D. and Luyten,Y.A. (2009) Rational engineering of DNA molecules up to several hundred kilobases. Nat. Methods, type II restriction endonuclease DNA binding and cleavage 6, 343–345. specificity. Nucleic Acids Res., 37, 5222–5233. 25. Xu,S.Y., Nugent,R.L., Kasamkattil,J., Fomenkov,A., Gupta,Y., 40. Roberts,R.J., Chang,Y.C., Hu,Z., Rachlin,J.N., Anton,B.P., Aggarwal,A., Wang,X., Li,Z., Zheng,Y. and Morgan,R. 
(2012) Pokrzywa,R.M., Choi,H.P., Faller,L.L., Guleria,J. and Characterization of Type II and III restriction-modification Housman,G. (2011) COMBREX: a project to accelerate the systems from Bacillus cereus strains ATCC10987 and functional annotation of prokaryotic genomes. Nucleic Acids Res., ATCC14579. J. Bacteriol., 194, 49–60. 39, D11–D14. 26. Takata,T., Wassenaar,T.M., Xu,Q. and Blaser,M.J. (2002) The 41. Donahue,J.P., Israel,D.A., Peek,R.M., Blaser,M.J. and gene product of Campylobacter jejuni gene Cj0208 is a DNA Miller,G.G. (2000) Overcoming the restriction barrier to plasmid methyltransferase with specificity for GAATTC. Abstr. Gen. Meet. transformation of Helicobacter pylori. Mol. Microbiol., 37, Am. Soc. Microbiol., 102, 164. 1066–1074. 27. Kim,J.S., Li,J.Q., Barnes,I.H.A., Baltzegar,D.A., Pajaniappan,M., 42. Dong,H., Zhang,Y., Dai,Z. and Li,Y. (2010) Engineering Cullen,T.W., Trent,M.S., Burns,C.M. and Thompson,S.A. (2008) Clostridium strain to accept unmethylated DNA. PLoS One, 5, Role of the Campylobacter jejuni cj1461 DNA methyltransferase e9038. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nucleic Acids Research Oxford University Press
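As an illustration of how the ambiguity codes used for the recognition sequences in Table 1 can be applied, and of how small the detected m5C fractions reported for B. cereus are, the following is a minimal sketch in Python. It is not the authors' pipeline: the function names (`reverse_complement`, `motif_regex`, `count_sites`, `detected_fraction`) are hypothetical, and counting each strand separately is an assumption, since the text does not state how the genome-wide totals were tallied.

```python
import re

# IUPAC ambiguity codes as listed in the Table 1 footnote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYMKSWBDHVN", "TGCAYRKMSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence or degenerate motif."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif into a regex.

    A lookahead is used so that overlapping sites are all reported.
    """
    body = "".join(f"[{IUPAC[base]}]" for base in motif.upper())
    return re.compile(f"(?=({body}))")


def count_sites(sequence: str, motif: str, both_strands: bool = True) -> int:
    """Count motif occurrences; each strand is counted separately (assumption)."""
    pattern = motif_regex(motif)
    seq = sequence.upper()
    total = len(pattern.findall(seq))
    if both_strands:
        total += len(pattern.findall(reverse_complement(seq)))
    return total


def detected_fraction(detected: int, total: int) -> float:
    """Fraction of motif sites that passed the kinetic score cutoff."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


if __name__ == "__main__":
    # Counts quoted in the text for B. cereus ATCC 10987.
    print(f"GCWGC: {detected_fraction(138, 15416):.2%}")
    print(f"GGWCC: {detected_fraction(41, 5460):.2%}")
    # The asymmetric BceSIII site reads GCCGT on the complementary strand.
    print(reverse_complement("ACGGC"))
```

With the counts quoted above, the detected fractions are a little under 1% for each m5C motif, which is in line with the statement that the m5C kinetic signal is subtle at the cutoff used.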


References (49)

Publisher
Oxford University Press
Copyright
The Author(s) 2012. Published by Oxford University Press.
ISSN
0305-1048
eISSN
1362-4962
DOI
10.1093/nar/gks891
pmid
23034806

Abstract

11450–11462 Nucleic Acids Research, 2012, Vol. 40, No. 22 Published online 2 October 2012 doi:10.1093/nar/gks891 1 2 1 2 Iain A. Murray , Tyson A. Clark , Richard D. Morgan , Matthew Boitano , 1 2 1 2 Brian P. Anton , Khai Luong , Alexey Fomenkov , Stephen W. Turner , 2, 1, Jonas Korlach * and Richard J. Roberts * 1 2 New England Biolabs, 240 County Road, Ipswich, MA 01938 and Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025, USA Received August 1, 2012; Revised August 31, 2012; Accepted September 3, 2012 ABSTRACT INTRODUCTION We are becoming accustomed to the ever-increasing speed Six bacterial genomes, Geobacter metallireducens and reduced cost with which DNA can be sequenced. GS-15, Chromohalobacter salexigens, Vibrio However, what is often lost in this frenzy of sequencing breoganii 1C-10, Bacillus cereus ATCC 10987, is the fact that DNA consists of more than just four bases. Campylobacter jejuni subsp. jejuni 81-176 and In eukaryotes, we have known for a long time about the C. jejuni NCTC 11168, all of which had previously m5 epigenetic role of 5-methylcytosine ( C), sometimes been sequenced using other platforms were called the fifth base, and more recently it has been found re-sequenced using single-molecule, real-time that 5-hydroxymethylcytosine, 5-formylcytosine and (SMRT) sequencing specifically to analyze their 5-carboxylcytosine are also present (1–4). However, two 6 m6 methylomes. In every case a number of new more modified bases, N -methyladenine ( A) and 6 m6 4 4 m4 N -methyladenine ( A) and N -methylcytosine N -methylcytosine ( C), are also common in bacterial m4 genomes, where they function as components of restric- ( C) methylation patterns were discovered and tion–modification (RM) systems (5). Until recently, these the DNA methyltransferases (MTases) responsible have usually been ignored because of the lack of simple for those methylation patterns were assigned. In methods to determine their locations. 
However, with the 15 cases, it was possible to match MTase genes advent of single-molecule, real-time (SMRT) sequencing with MTase recognition sequences without further (6–8), it has suddenly become possible to detect these sub-cloning. Two Type I restriction systems modified bases as a part of the routine sequencing required sub-cloning to differentiate their recogni- procedure. tion sequences, while four MTase genes that were The methylated bases that are found in bacterial and not expressed in the native organism were archaeal genomes serve important functions as part of sub-cloned to test for viability and recognition RM systems, where they protect the host chromosome sequences. Two of these proved active. No against the otherwise deleterious action of the partner restriction enzyme(s), which are needed to destroy attempt was made to detect 5-methylcytosine m5 unwanted incoming transmissible DNA elements such as ( C) recognition motifs from the SMRT phages (9). However, in some cases these methyl- sequencing data because this modification transferases (MTases) also serve regulatory roles as with produces weaker signals using current methods. m6 m6 m4 the Dam MTase of Escherichia coli, which introduces A However, all predicted A and C MTases were residues that play a key role in DNA repair and also have detected unambiguously. This study shows that important effects during the initiation of replication (10). the addition of SMRT sequencing to traditional Several studies have also implicated MTases in regulating sequencing approaches gives a wealth of useful gene expression, phase variation and pathogenicity functional information about a genome showing (11,12). Given the many DNA MTases that are typically not only which MTase genes are active but also re- found in prokaryotic genomes, it seems likely that they vealing their recognition sequences. will have hitherto undocumented effects aside from their *To whom correspondence should be addressed. 
Tel: +978 380 7405; Fax: +978 380 7406; Email: [email protected] Correspondence may also be addressed to Jonas Korlach. Tel: +650 521 8006; Fax: +650 323 9420; Email: jkorlach@pacificbiosciences.com The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2012, Vol. 40, No. 22 11451 key role in RM systems. To date, there has been no the culture collections indicated. Vibrio breoganii 1C-10 genome-wide assessment of the extent of DNA methyla- DNA was a gift from Martin Polz, MIT. Campylobacter tion by known MTases such as E. coli Dam (10) and Dcm jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168 (13) or the cell cycle MTase, CcrM, of Caulobacter DNAs were a gift from Stuart Thompson, Medical crescentus (14). It is not known if their methylation College of Georgia. specificities are as precise as the customary recognition sequences suggest or whether the enzymes are promiscu- SMRT sequencing ous. This is particularly interesting to know for RM SMRTbell template libraries were prepared as previously systems as there are no obvious selective constraints on described (15,17). Briefly, genomic DNA samples were MTase specificity provided that the core recognition sheared to an average size of 800 bp via adaptive sequence of the restriction enzyme is fully modified. focused acoustics (Covaris; Woburn, MA, USA), end re- Recently, we have shown that by cloning an individual paired and ligated to hairpin adapters. Incompletely MTase gene into a plasmid and propagating it in an other- formed SMRTbell templates were digested with a combin- wise methylation-deficient strain of E. 
coli, it is easily ation of Exonuclease III (New England Biolabs; Ipswich, possible through SMRT sequencing to detect all of the MA, USA) and Exonuclease VII (Affymetrix; Cleveland, bases modified on the plasmid (15). Precise recognition OH, USA). SMRT sequencing was carried out on the sequences were convincingly demonstrated and mostly PacBioRS (Pacific Biosciences; Menlo Park, CA, USA) matched that of the cognate restriction enzyme when the using standard protocols for small insert SMRTbell MTase was part of an RM system. However, some pro- libraries. Sequencing reads were processed and mapped miscuous methylation was observed, with the Dam gene of to the respective reference sequences using the BLASR E. coli being a particularly striking example. There was mapper (http://www.pacbiodevnet.com/SMRT-Analysis/ one caveat to this interpretation though: because the Algorithms/BLASR) and the Pacific Biosciences’ MTase genes in that study were cloned on a multi-copy SMRTAnalysis pipeline (http://www.pacbiodevnet.com/ number plasmid (50–200 copies per cell), it could be that SMRT-Analysis/Software/SMRT-Pipe) using the the observed promiscuity arose because of over- standard mapping protocol. Interpulse durations were expression. measured as previously described (7) and processed as Given that the results for the plasmids were very clear, it described (15) for all pulses aligned to each position in seemed that it might be possible to perform a direct the reference sequence. To identify modified positions, analysis of bacterial genomes using the SMRTsequencing we used Pacific Biosciences’ SMRTPortal analysis method and thus obtain an accurate estimate of the extent platform, v. 1.3.1, which uses an in silico kinetic reference of methylation in the native organism. 
By then, comparing and a t-test based kinetic score detection of modified base a bioinformatic analysis of the RM systems with the direct positions (details are available at http://www.pacb.com/ measurement of just what was methylated, it should be pdf/TN_Detecting_DNA_Base_Modifications.pdf). possible to assign recognition sequences to individual MTase target sequence motifs were identified by select- MTase genes. Of particular interest in this sort of ing the top 1000 kinetic hits and subjecting a ±20 base analysis are the Type I and Type III RM systems, which window around the detected base to MEME-ChIP (18). have generally been very difficult to analyze by previous, To measure the extent of methylation for each motif in a more tedious techniques (16). In both of these kinds of genome, a kinetic score threshold was chosen such that systems, the specificity comes from a single subunit of 1% of the detected signals were not assigned to any the enzyme—the S subunit of the Type I enzymes and MTase recognition motifs (5% for B. cereus to accommo- the M subunit of the Type III enzymes (16). Thus, it m4 C). We subjected date for the lower signal intensities for seemed likely that recognition sequences for both types this 1% population of sequence context to another round of MTases could be discovered relatively easily. To dem- of MEME-ChIP analysis to confirm the absence of any onstrate the feasibility of this approach, we chose initially additional consensus motifs. We observed no accumula- to analyze six genomes with relatively few RM systems tion of motifs that harbored similarities to the identified before moving on to more complicated cases. active motifs. All kinetic data files have been deposited in GEO (accession numbers GSE40133) (19) (http://www .ncbi.nlm.nih.gov/geo/summary/). 
MATERIALS AND METHODS

Materials

All restriction endonucleases (REases) except Eco147I (Fermentas; Glen Burnie, MD, USA), Phusion-HF DNA polymerase, Antarctic Phosphatase, T4-DNA ligase and E. coli competent cells were from New England Biolabs Inc. (Ipswich, MA, USA). Synthetic oligonucleotides were purchased from Integrated DNA Technologies (Coralville, IA, USA). Geobacter metallireducens GS-15 ATCC 53774 DNA, Chromohalobacter salexigens DSM 3043 DNA and Bacillus cereus ATCC 10987 DNA were obtained from

Bioinformatic analysis

The SEQWARE computer resource was used to identify RM system genes from the complete genome sequences of G. metallireducens GS-15 (GenBank numbers CP000148 and CP000149), C. salexigens (GenBank number CP000285), B. cereus (GenBank numbers AE017194 and AE017195), C. jejuni subsp. jejuni 81-176 (GenBank numbers CP000538, CP000549 and CP000550), C. jejuni NCTC 11168 (GenBank number AL111168) and V. breoganii 1C-10 (GenBank number AKXW00000000). Software modules combined with internal databases constitute the SEQWARE resource. New sequence data are scanned locally for homologs of already identified and annotated RM systems in REBASE (5). Sequence similarity from BLAST searches, the presence of predictive functional motifs (20,21) and genomic context are the basic indicators of potential new RM system components. Heuristic rules, derived from knowledge about the gene structure of RM systems, are also applied to refine the hits. Attempts are made to avoid false hits caused by strong sequence similarity of RNA and protein MTases or hits based solely on non-specific domains of RM enzymes, such as helicase or chromatin remodeling domains. SEQWARE then localizes motifs and domains, assigns probable recognition specificities, classifies accepted hits and marks Pfam relationships. All candidates are then inspected manually before being assigned as part of an RM system. The results are entered into REBASE (5).

MTase cloning

Selected MTase genes were amplified from bacterial genomic DNA with Phusion-HF DNA polymerase and cloned into the plasmid pRRS as described previously (15). Gene-specific oligonucleotide primers used for PCR are described in Supplementary Table S1. When no suitable sites were present elsewhere in the construct, restriction sites diagnostic for the predicted methylation pattern were incorporated into the 3′-end oligonucleotides. The presence or absence of specific methylation was determined by digesting the constructs with appropriate restriction enzymes. Host strains used for cloning included E. coli ER2796 (22) and E. coli ER2683 (23).

The Csal_1401 and Gmet_0255 genes were cloned into the plasmid pRRS using the Gibson assembly technique (24). The pRRS vector was PCR amplified using primers pRRS srbs for and pRRS rev. The MTase genes were amplified using primers having 5′ tails that overlap with the ends of the amplified pRRS vector (Supplementary Table S1). PCR amplified DNAs were purified over a Qiagen spin column. A total of 0.1 pmol vector was combined with 0.3 pmol MTase gene insert in a 20 μl 1× Gibson assembly reaction (New England Biolabs) and incubated at 50°C for 1 h. A total of 2 μl of this assembled construct was transformed into 50 μl chemically competent E. coli ER2796 cells and plated on LB-ampicillin plates at 37°C overnight.

RESULTS

We analyzed six bacterial strains, all of which had relatively few predicted RM systems and several of which had some experimental data already available. Three of these strains, G. metallireducens GS-15, C. salexigens and V. breoganii 1C-10, had never been tested for active MTases previously, while three other strains, B. cereus ATCC 10987, C. jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168, were all known to contain several active MTases (25–27). In each case there were Type I or Type III RM systems for which no information was available about either their activity or recognition sequences.

We analyzed each genome using SEQWARE and made predictions about the RM systems that were present, including REase and MTase genes and, when a gene showed high similarity to a biochemically characterized gene of known recognition sequence, recognition sequences. These predictions are summarized for all RM system components in Supplementary Table S2. Each genome was then subjected to SMRT sequencing and the methylated bases identified by their kinetic signatures (7). These were then aligned and clustered to identify the motifs that constituted the consensus recognition sequences for the MTases. These experimental results were then matched with the bioinformatic predictions. Several factors helped in this matching, such as the fact that all known Type III MTases and most Type IIG systems only methylate one strand of their recognition sequence. Type I systems have bipartite recognition sequences in which two short motifs (3–5 nt long) are separated by 5–8 non-specific nucleotides. A well-known example is the EcoKI RM system that recognizes 5′-Am6ACNNNNNNGTGC-3′ (28). Methylation takes place as indicated (the underlined T, in GTGC, indicates that the A residue on the complementary strand is methylated). It should be noted that because m5C generates a weak and somewhat diffuse SMRT signal (7), no attempt was made in any of these whole-genome analyses to identify the positions of m5C. Rather, where appropriate these MTase genes were cloned and analyzed separately as was done previously (15).

Geobacter metallireducens GS-15

Geobacter metallireducens strain GS-15, first isolated from freshwater sediment, is capable of reducing iron, manganese, uranium and other metals and thus represents an interesting target for bioremediation of groundwater contaminants (29). The genome sequence of this organism, which grows at 30°C, was originally determined by the Joint Genome Institute (JGI) (GenBank numbers CP000148 and CP000149). Bioinformatic analysis indicated that there should be two MTases associated with Type II RM systems and one with a Type III system (Supplementary Table S2). Two active MTases were detected based on the SMRT sequencing analysis (Figure 1; Supplementary Figures S1a and S2a).

Figure 1. Methylome determination of G. metallireducens GS-15. (a) Example trace of kinetic variation, showing three instances of methylated sequence regions. (b) Scatter plot of sequencing coverage and kinetic score for all genomic positions. The colors indicate the bases as shown in the upper left of the panel. The cutoff for detected genomic positions is indicated by the dashed line. (c) MTase specificities determined from the genomic positions detected as methylated.

Figure 1a shows kinetic signals for both DNA strands for a section of the genome containing three instances of detected regions containing methylated template bases, two of which are limited to one of the two DNA strands and the other encompassing methylation on both DNA strands. Genome-wide analysis of all template positions (Figure 1b) revealed a population of A bases that clearly separated from the background of all other template positions. Motif analysis (see ‘Materials and Methods’ section) resulted in the identification of two MTase specificities: 5′-Gm6ATCC-3′ and 5′-TCCm6AGG-3′ (Figure 1c). The extent of methylation across the genome was determined by considering 29 166 positions detected as methylated, corresponding to >99% of all hits matching a motif (Figure 1b; see ‘Materials and Methods’ section). Greater than 98% of all genomic
positions matching these MTase specificities were detected as methylated (Figure 1d).

Figure 1 (continued). The MTase specificities in (c) are highlighted as gray boxes in the example trace (a). (d) Summary of detected methylated positions across the genome.

Of the two Type II systems, one (Gmet_3140) showed great similarity to known MTases recognizing 5′-GGATC-3′, including M.EacI (30) and M.AlwI (5). In all cases, the MTase is itself a fusion of two MTase domains, one recognizing 5′-GGATC-3′ and forming 5′-GGm6ATC-3′ and the other recognizing the complementary strand and forming 5′-Gm6ATCC-3′. The new MTase identified here is called M.GmeI and its corresponding REase encoded by Gmet_3138 is called GmeIP, since it is not known if it is active. Interestingly, Gmet_3138 shows great similarity to the known restriction enzyme genes EacI (30) and AlwI (5), but unlike the latter two genes, which are immediately adjacent to their respective MTase genes, the genes for M.GmeI and GmeIP are separated by an open reading frame encoding a protein of 333 amino acids, which is homologous to a protein in the same location in G. metallireducens RCH3, but has much less similarity to other proteins in GenBank. However, the next closest homolog is a 509 amino acid protein in Syntrophothermus lipocalidus DSM 12680, which also sits next to an MTase gene, but one of different recognition specificity (5′-ACCTGC-3′).

The other Type II MTase (Gmet_0255) contained the typical motifs associated with an m5C DNA MTase, but its recognition sequence could not be predicted as the variable region showed no great similarity to other m5C MTases of known specificity. This MTase was cloned and tested for its ability to incorporate ³H-methyl groups into DNA using labeled S-adenosylmethionine as substrate, but was found to be inactive. Similarly, no DNA methylation was observed by SMRT sequencing of the plasmid containing the cloned gene (data not shown). Either this MTase is inactive or it could be an RNA MTase.

The Type III MTase (Gmet_0676) clearly recognizes 5′-TCCm6AGG-3′ and modifies the A residue as indicated. It is named M.GmeII. As with all known Type III enzymes, only one strand is modified. It too has a corresponding REase gene as the adjacent ORF (Gmet_0675), but it is not known if it is active.

During our analysis, we found that there appeared to be a deletion in the genomic DNA we obtained from the ATCC relative to the reference genome, as we observed no sequencing coverage between positions 2 446 610 and 2 588 100. This region is flanked by two transposase genes. This deletion has also been observed by Dr Derek Lovley (unpublished data).

Chromohalobacter salexigens

Chromohalobacter salexigens is a moderate halophile that is tolerant to various salt environments and allows other organisms (e.g. Salmonella) to exist in environments they would otherwise not be able to cope with. The genome sequence of this organism, which grows at 37°C, was originally determined by the JGI (31). Bioinformatic analysis of the genome indicated that there should be one Type I system and two Type II systems (Supplementary Table S2). The recognition sequence of the Type I system could not be predicted since the specificity subunit (Csal_0086), which determines the recognition sequence, showed no similarity to any well-characterized system. Of the Type II systems, one (Csal_1368) was predicted to recognize 5′-GATC-3′ since it showed significant similarity to several well-characterized 5′-GATC-3′ MTases. However, the recognition sequence of the second Type II MTase (Csal_1401), which appears to be encoded on a prophage, could not be predicted. It was suspected that this might not be active in the genome, as prophage-encoded genes are frequently transcriptionally inactive until such time as the prophage is excised (32).

The results of whole genome SMRT sequencing analysis are shown in Figure 2 and demonstrate that the putative GATC MTase is expressed and methylates the adenine residues on both strands to form m6A, but actually recognizes the more specific sequence 5′-RGATCY-3′, although methylation seems not to be complete during normal growth. This MTase is called M.CsaI. The specificity was very strict, as the number of hits observed for 5′-NGATCN-3′ but not conforming to 5′-RGATCY-3′ was 0 (Supplementary Figure S3). The Type I system is very well defined and recognizes the usual bipartite sequence pattern recognized by Type I enzymes, but this particular recognition sequence, 5′-CCAC(N)CTC-3′, has not been reported previously (5). As usual for Type I systems, the MTase, M.CsaII, acts on the single adenine residue in each DNA strand, forming m6A. The putative prophage-encoded MTase appears not to be expressed.

That the 5′-RGm6ATCY-3′ signal is due to expression of Csal_1368 and is not a combination of expression of both Type II ORFs was tested by cloning Csal_1401 separately in the methylation deficient E. coli strain ER2796 (22). The resulting clone showed that the MTase was non-specific and methylated most, but not all, A residues in the plasmid (Supplementary Figure S4). Motif analysis indicated the following specificity rules for this relatively non-specific MTase: 5′-m6AB-3′ and 5′-Sm6AAM-3′ (>96% of all hits with a kinetic score >100 fell into these motifs; B = not A; S = G or C; M = A or C).

Figure 2. Methylome determination of C. salexigens. (a and b) Example traces of kinetic variation, showing two instances of methylated positions. (c) MTase specificities determined from the genomic positions detected as methylated. (d) Summary of detected methylated positions across the genome.

Vibrio breoganii 1C-10

Vibrio breoganii is a non-motile, alginolytic, marine bacterium. Strain 1C-10 was isolated from large suspended particles (likely macroalgal detritus) during analysis of resource partitioning of Vibrionaceae populations (33,34). Bioinformatic analysis suggested that this strain contained two Type I RM systems and both proved to be active, methylating the sequence motifs 5′-AGHm6A(N)7TGAC-3′ and 5′-CTm6AG(N)6RTAA-3′, respectively (Figure 3; Supplementary Figures S1c and S2c). Bioinformatics alone could not resolve which system recognized which sequence and so the M and adjacent S genes of the two systems were cloned as pairs. The S1.VbrIP gene is about half the length of a typical S subunit and was not tested for activity. The resulting plasmids were tested for resistance to HindIII and ScaI to test for methylation by M.VbrI and M.VbrII, respectively (Supplementary Figure S5). The partial protection against HindIII is expected for an MTase, M.VbrI, forming 5′-AGCm6AAGCTTAATGAC-3′, as the resulting hemi-methylated HindIII site does not completely inhibit cleavage (5). In a parallel experiment, methylation by M.VbrII gave complete protection against ScaI at the sequence 5′-CTm6AGTACTCCATA-3′ as expected (5). These assignments were confirmed by SMRT sequencing of the plasmids containing individual MTase-expressing clones (Supplementary Figure S6).

Figure 3. Methylome determination of V. breoganii 1C-10. (a–c) Example traces of kinetic variation, showing instances of the detected methylated motifs. (d) MTase specificities determined from the genomic positions detected as methylated.

Again from bioinformatic analysis, there were two Type II MTases present. The first, M.VbrDam, was a close homolog of the M.EcoKDam MTase of E. coli (35) and indeed the genome was methylated at essentially all
GATC sites as predicted (Figure 3). The second MTase was enigmatic and while a very weak signal (192 out of the 305 unassigned hits) that could be interpreted as Cm4CA was found by sequencing, this seemed unlikely to be the recognition sequence since very few genomic positions harboring this motif had strong kinetic signals. Consistent with this hypothesis, no modified sites were detected upon cloning this gene into a plasmid and analysis by SMRT sequencing (data not shown), indicating that this MTase gene is inactive. The weak CCA signals are more likely the result of phosphorothioated nucleotides, which have been detected in this bacterium by bulk methods [(36); T. A. Clark and J. Korlach, unpublished data].

Figure 3 (continued). (e) Summary of detected methylated positions across the genome.

Campylobacter jejuni subsp. jejuni 81-176

Campylobacter jejuni is a Gram-negative bacterium native to the digestive tract of poultry and other bird species and is one of the most common causes of human gastroenteritis. The genome sequence of this organism had been determined some time ago (D. Fouts and K. Nelson, unpublished data; GenBank numbers CP000538, CP000549 and CP000550). Bioinformatic analysis suggested the presence of two Type I RM systems and four Type II systems, several of which had close homologs in C. jejuni NCTC 11168 (Supplementary Table S2). One gene, CJJ81176_0240, was 99% identical to the characterized gene for M.CjeNI, which was reported to recognize 5′-GAATTC-3′ (26). However, when examining the genomic methylation through SMRT sequencing, it was clear that the gene in this strain, coding for M.CjeFI, recognized the more degenerate sequence 5′-RAm6ATTY-3′ (Figure 4); the same proved true for M.CjeNI (see below and Figure 5).

Another MTase gene, CJJ81176_1454, was extremely similar to a gene in C. jejuni NCTC 11168 that was reported to encode an active 5′-GATC-3′ MTase (27). However, in neither of the two Campylobacter strains was such an active MTase detected. Furthermore, the gene in question shows more similarity to the RNA MTase RsmD than to other DNA MTases. We conclude that this gene is not able to methylate DNA and its true activity may require further biochemical investigation. Two additional MTase genes appear to be part of Type IIG RM systems in which sequence specificity, methylation and restriction are all carried out by the same polypeptide. One recognizes the sequence 5′-GGRCA-3′ and modifies the terminal A residue, while the other recognizes the sequence 5′-GCAAGG-3′ and modifies the second A residue (Figure 4). As with many other Type IIG enzymes, only one strand of the DNA is methylated. To decide which gene was which, we noticed that CJJ81176_0713 is very similar to Cj0690c in C. jejuni NCTC 11168, which recognizes the related sequence 5′-GKAAYG-3′ (see below). Thus, we assigned CJJ81176_0713 as the gene encoding RM.CjeFIII forming 5′-GCAm6AGG-3′ and CJJ81176_0068 as the gene encoding RM.CjeFV forming 5′-GGRCm6A-3′ (Figure 5). These assignments were confirmed by cloning the individual ORFs and testing the clones for protection from appropriate REases (Supplementary Figure S7). These results are summarized in Table 1.

Finally, the two Type I systems are both active, with one forming 5′-CAm6AYNACT-3′ and the other forming 5′-TAm6AYNTGC-3′. Since only the second of these modifications is present in C. jejuni NCTC 11168, it can be safely concluded that the specificity subunit CJJ81176_1536, which has a close homolog in that strain, recognizes 5′-TAAYNTGC-3′ and the specificity subunit CJJ81176_0777 recognizes 5′-CAAYNACT-3′. In both cases, methylation results in the second A residue being modified as shown in Figure 4.

Figure 4. Methylome determination of C. jejuni 81-176. (a–e) Example traces of kinetic variation, showing instances of the detected methylated motifs. (f) MTase specificities determined from the genomic positions detected as methylated. (g) Summary of detected methylated positions across the genome.

Campylobacter jejuni NCTC 11168

This strain (37) codes for one Type I RM system and four Type II systems. The Type I system is essentially identical with the CjeFIV system in C. jejuni subsp. jejuni 81-176 and forms 5′-TAm6AYNTGC-3′ (CjeNIV) (Figure 5). Two of the Type II systems, M.CjeNI and RM.CjeNII, had previously been characterized [26; J.M.B. Vitor et al., unpublished data (5)]. However, as noted earlier, M.CjeNI recognizes 5′-RAATTY-3′ (Figure 5) rather than 5′-GAATTC-3′ as had been reported (26). RM.CjeNII is a Type IIG system that recognizes 5′-GAGN5GT-3′ and is now shown to methylate both A residues on the two strands. Another Type II MTase is encoded by Cj0690c and is a Type IIG enzyme that forms 5′-GKAm6AYG-3′, methylating the second A residue (Figure 5). This gene was cloned in E. coli and found to produce active endonuclease recognizing 5′-GKAAYG-3′ and cutting 19/17 downstream. From the bioinformatic analysis, one additional gene, Cj0031, plus the adjacent gene, Cj0032, looks like a Type IIG enzyme containing a frameshift. The complete gene would be 99% identical to the gene for RM.CjeFV, which recognizes 5′-GGRCA-3′. However, no such modification is found in the genome, confirming that the frameshift is real and that this frameshifted gene produces no active MTase. SMRT sequencing data confirmed the presence of the frameshift.

Figure 5. Methylome determination of C. jejuni NCTC 11168. (a–d) Example traces of kinetic variation, showing instances of the detected methylated motifs. (e) MTase specificities determined from the genomic positions detected as methylated. (f) Summary of detected methylated positions across the genome.

Bacillus cereus ATCC 10987

This bacterium was originally isolated from spoiled cheese and belongs to the same genetic subgroup as Bacillus anthracis (38). The RM systems in B. cereus ATCC 10987 had previously been examined by Xu et al. (25), who determined recognition sequences for four Type II and III REases and one orphan MTase by traditional methods. However, the sites of methylation for the Type II and III MTases were not determined and several other MTases were not examined, including that in the Type I system (BCE_0839-BCE_0842) and a Type II MTase (BCE_0392) that was reported to be inactive (25). However, when we cloned this MTase and checked its activity, it was clearly a promiscuous m6A MTase, which we have now named M.BceSVII (Supplementary Figure S9 and Table 1).

Our main goal was to characterize the Type I system and also ascertain the sites of methylation by the MTases not addressed in the previous study. The Type I system, now called BceSVI, was clearly active and recognized the sequence 5′-TAm6AGNTGG-3′, where again the T (underlined in the original, in the 3′ half of the motif) indicates m6A on the complementary strand (Figure 6; Supplementary Figures S1f and S2f). This system is a little unusual in that it contains two M subunits. Because we did not clone the individual components of this system, we cannot say whether one or both M subunits are active. The sites of modification of the three other Type II MTases are indicated in Table 1, while the Type III MTase, which had been identified earlier by cloning, is shown to be completely active in the genome.

The previously identified Type II REase BceSIII recognizes an asymmetric sequence, 5′-ACGGC-3′, and requires two MTases for protection, both of which are m4C MTases. These form 5′-Am4CGGC-3′ in the strand shown and 5′-Gm4CCGT-3′ in the complementary strand (Figure 6b). To show which MTase recognizes which strand, we cloned the two MTase genes independently and checked for their ability to protect against appropriate REases (Supplementary Figure S8). From this analysis, we can conclude that M1.BceSIII forms 5′-Am4CGGC-3′ and M2.BceSIII forms 5′-Gm4CCGT-3′. It is important to note that while cloning the individual MTase genes showed five to be active, only four seem to be active in the genome. M.BceSV, a multi-specific MTase characterized in the previous study by cloning and overexpression (25), is encoded on a prophage and does not show detectable activity in the native host genome. In addition to the m6A and m4C MTases mentioned earlier, our analysis indicated two more motifs that are likely modified by one or more of the predicted m5C MTases in the B. cereus genome, as 179 of the 524 unassigned hits fell into two categories. These motifs were 5′-Gm5CWGC-3′ and 5′-GGWCm5C-3′, which are consistent with recognition specificity predictions for BCE_0365 and BCE_4605 (Supplementary Table S2). The kinetic signals for m5C are subtle in that with the kinetic score cutoff used, we detect only 138 5′-GCWGC-3′ (out of 15416 in the genome) and 41 5′-GGWCC-3′ (out of 5460) sites. We are currently exploring methods of enhancing the kinetic signature of m5C during SMRT sequencing.

Figure 6. Methylome determination of B. cereus ATCC 10987. (a–c) Example traces of kinetic variation, showing instances of the detected methylated motifs. (d) MTase specificities determined from the genomic positions detected as methylated. (e) Summary of detected methylated positions across the genome.

DISCUSSION

The results presented in this article and summarized in Table 1 represent one of the first times that it has been possible to examine the complete methylation pattern of a bacterial genome. For the MTases studied in this article, seven are components of Type I RM systems and have six different recognition sequences, all of which are new. Two Type III systems were found with one new recognition sequence. Two MTases were part of traditional Type II systems although we did not test whether the REase was active. Four Type IIG REases, which contain both MTase and REase activity in a single polypeptide chain, were found, all with new specificities. It should be noted that two of these, RM.CjeFIII and RM.CjeNIII, show very high sequence similarity and yet recognize different sequences (5′-GCAm6AGG-3′ and 5′-GKAm6AYG-3′, respectively). Thus, this finding represents another family of Type IIG restriction enzymes that resemble the MmeI family, where a few simple changes in critical base recognition elements cause changes in specificity (39). This again emphasizes the need for caution when transferring annotation from one characterized protein to another (40). The composition of an amino acid change can be critical if it occurs at a residue belonging to a DNA sequence recognition element. Two orphan MTases, M.CsaIII and M.BceSVII, were found to be active when cloned, but inactive in the genome. Both are promiscuous m6A MTases and both occur on prophage elements, suggesting that they may play a protective role during phage infection. Finally, two solitary 5′-GATC-3′ MTases were shown to be active. It should be noted that when examining complete genome sequences for MTases, some of the genes may be inactive because of mutation, while others may be inactive due to transcriptional silencing, as is often found when the genes are present as part of a prophage. In the latter case cloning can reveal methylation activity, permitting complete characterization as found earlier (15).

Table 1. Bioinformatic predictions and experimental results for all MTase genes

Columns: ORF # | Type | Gene | Prediction (bioinformatic predictions) | Name | Rec. Seq. (experimental results)

Geobacter metallireducens GS-15
Gmet_0255 | Type II | M (5) | ? | | inactive
Gmet_3140 | Type II | M | GGATC | M.GmeI | GGm6ATC
Gmet_0676 | Type III | M | ? | M.GmeII | TCCm6AGG

Chromohalobacter salexigens
Csal_0084 | Type I | M | ? | M.CsaII | CCm6ACNCTC
Csal_1368 | Type II | M | GATC | M.CsaI | RGm6ATCY
Csal_1401 | Type II | M | ? | M.CsaIII | m6AB + Sm6AAM

Vibrio breoganii 1C-10
ORF_51A | Type I | M | ? | M.VbrI | AGHm6AN7TGAC
ORF_9B | Type I | M | ? | M.VbrII | CTm6AGN6RTAA
ORF_50B | Type II | M | GATC | M.VbrIII | Gm6ATC
ORF_5C | Type II | M | ? | | inactive

Campylobacter jejuni 81-176
CJJ81176_0776 | Type I | M | ? | M.CjeFII | CAm6AYNACT
CJJ81176_1539 | Type I | M | ? | M.CjeFIV | TAm6AYNTGC
CJJ81176_0068 | Type II | RM | ? | RM.CjeFV | GGRCm6A
CJJ81176_0240 | Type II | M | GAATTC (a) | M.CjeFI | RAm6ATTY
CJJ81176_0713 | Type II | RM | ? | RM.CjeFIII | GCAm6AGG

Campylobacter jejuni NCTC 11168
Cj1553c | Type I | M | ? | M.CjeNIV | TAm6AYNTGC
Cj0208 | Type II | M | GAATTC | M.CjeNI | RAm6ATTY
Cj0690c | Type II | RM | ? | RM.CjeNIII | GKAm6AYG
Cj1051c | Type II | RM | GAGN5GT | RM.CjeNII | Gm6AGN5GT

Bacillus cereus ATCC 10987
BCE_0839 | Type I | M | ? | M1.BceSVIP | ?
BCE_0841 | Type I | M | ? | M2.BceSVIP | TAm6AGNTTG
BCE_0365 | Type II | M (5) | GCAGC | M.BceSIV | N/D
BCE_0392 | Type II | M | ? | M.BceSVII | promiscuous m6A
BCE_0393 | Type II | M (5) | Many | M.BceSV | N/D
BCE_4605 | Type II | M (5) | GGWCC | M.BceSII | N/D
BCE_5606 | Type II | M | ACGGC | M1.BceSIII | Am4CGGC
BCE_5607 | Type II | M | ACGGC | M2.BceSIII | Gm4CCGT
BCE_1018 | Type III | M | CGAm6AG | M.BceSI | CGAm6AG

Italicized genes characterized previously; red text indicates new information or revision. Recognition sequence representations use the standard abbreviations (Eur. J. Biochem., 150, 1–5, 1985) to represent ambiguity: R = G or A, Y = C or T, M = A or C, K = G or T, S = G or C, W = A or T, B = not A (C or G or T), D = not C (A or G or T), H = not G (A or C or T), V = not T (A or C or G), N = A or C or G or T. N/D = not detected (m5C assignments were not attempted). (a) indicates incorrect result obtained previously. 5′-GGCC-3′/5′-GCNGC-3′/5′-CCGG-3′/5′-GGNCC-3′ are all recognized.

One of the striking features of the results from the current analysis is that the recognition sequences of all MTases found to be active showed fairly strict specificity with very few off-target events noted. Of course, much greater coverage would be required to detect very rare off-site effects and so some degree of promiscuity cannot be ruled out. However, the apparent promiscuity that was observed in our earlier work (15) using MTase genes cloned in high copy number plasmids was not apparent. We consider the ‘true’ MTase specificity to be reflected in the modification patterns seen when they are expressed in their genomic context. Thus, based on the current findings, we would have to conclude that in general it seems likely that most MTases show essentially identical specificity to their cognate REases, a result that was not completely expected since there are no obvious constraints on their specificity.

Previously, it had been found that Type III MTases only methylate a single strand of their recognition sequence and that holds true here. Similarly, most characterized Type IIG enzymes methylate just a single strand although several do not, including RM.CjeNII as described here. Nevertheless, this can be very helpful when trying to match recognition sequences found by sequencing with the genes responsible for each consensus sequence. Another useful feature is that all known Type I restriction systems seem to possess split recognition sequences, which can help in distinguishing them when matching genes and consensus sequences. Nevertheless, if two Type I systems are present, as in V. breoganii 1C-10, it was essential to clone out the individual systems so that specificity and genes could be properly matched. Note that because of the mechanism of methylation it is only the M and S subunits that need to be cloned to permit assembly of a functional MTase (16).

In the case of the Type II RM system BceSIII, because of the asymmetric nature of the recognition sequence, two independent MTases are required to methylate each strand of the sequence. While SMRT sequencing can easily find the locations of each methyl group, it was necessary to clone out the two MTase genes separately in order to assign strand specificity to each one. M.GmeI also recognizes an asymmetric sequence, but in this case, the two M genes are fused. At the present time, we have relatively little information about strand specificity of MTases, because it has proven difficult to determine specificity experimentally. As more data accumulate using the kinds of analyses that we present here, it should become much easier in the future to make accurate bioinformatic predictions about recognition sequences and specificity for MTases in newly sequenced genomes.

Despite the recognized importance of methylation for understanding fundamental microbiological processes, microbe adaptability and disease pathogenicity (11,12), in the past, there has not been a great deal of research into the methylation patterns of bacterial genomes, largely because of the difficulty of obtaining suitable data. One area where knowledge about the methylome is very important relates to studies trying to transform DNA into strains that contain one or more RM systems and which vastly reduce transformation efficiencies. In some cases, these barriers have been overcome by premethylating the DNA or by removing the RM systems from strains (41,42). One problem with the latter approach is that removal of methylation systems may fundamentally change the biology of the organism under study. With the kind of analysis provided here, the RM systems likely to cause problems with transformation can be easily spotted and appropriate measures taken. Thus, the MTases necessary for protection can be identified and, if needed, intermediate cloning hosts carrying suitable complements of MTase genes can be prepared.

In summary, the results provided here show that SMRT sequencing can provide functional information about active MTases present in genomes and can decipher their recognition sequences, a task that used to be time-consuming to a point where it was not usually carried out. This, combined with the long reads provided by this technology, can be an excellent adjunct to current high-throughput sequencing platforms, in that sequence assembly is facilitated and gene function is reliably documented.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1 and 2 and Supplementary Figures 1–9.

ACKNOWLEDGEMENTS

We thank Martin Polz for the gift of V. breoganii 1C-10 DNA and providing its DNA sequence prior to publication, and Stuart Thompson for the gift of C. jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168 DNAs. Dr Derek Lovley graciously informed us of his results with G. metallireducens prior to publication. We would also like to thank P. Marks and K. Spittle for data analysis help and sample preparation advice, respectively. We appreciate the careful reading of the manuscript and useful comments by Dr Geoff Wilson.

FUNDING

New England Biolabs; Pacific BioSciences and NIH grants [1RC2GM092602 to J.K. and R44 GM100560 to R.J.R.]. Funding for open access charge: New England Biolabs’ internal funds.

Conflict of interest statement. T.A.C., M.B., K.L., S.T. and J.K. are full-time employees at Pacific Biosciences, a company commercializing SMRT sequencing technologies. I.A.M., R.D.M., A.F., B.P.A. and R.J.R. are full-time employees of New England Biolabs, a company that sells research reagents such as DNA MTases.

REFERENCES

1. Kumar,S., Cheng,X., Klimasauskas,S., Mi,S., Posfai,J., Roberts,R.J. and Wilson,G.G. (1994) The DNA (cytosine-5) methyltransferases. Nucleic Acids Res., 22, 1–10.
2. Tahiliani,M., Koh,K.P., Shen,Y., Pastor,W.A., Bandukwala,H., Brudno,Y., Agarwal,S., Iyer,L.M., Liu,D.R., Aravind,L. et al. (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science, 324, 930–935.
3. Kriaucionis,S. and Heintz,N. (2009) The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science, 324, 929–930.
4. Ito,S., Shen,L., Dai,Q., Wu,S.C., Collins,L.B., Swenberg,J.A., He,C. and Zhang,Y. (2011) Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science, 333, 1300–1303.
5. Roberts,R.J., Vincze,T., Posfai,J. and Macelis,D. (2010) REBASE--a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res., 38, D234–D236.
6. Eid,J., Fehr,A., Gray,J., Luong,K., Lyle,J., Otto,G., Peluso,P., Rank,D., Baybayan,P., Bettman,B. et al. (2009) Real-time DNA sequencing from single polymerase molecules. Science, 323, 133–138.
7. Flusberg,B.A., Webster,D.R., Lee,J.H., Travers,K.J., Olivares,E.C., Clark,T.A., Korlach,J. and Turner,S.W. (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods, 7, 461–465.
8. Korlach,J. and Turner,S.W. (2012) Going beyond five bases in DNA sequencing. Curr. Opin. Struct. Biol., 22, 251–261.
9. Roberts,R.J. and Halford,S.E. (1993) Type II restriction enzymes. In: Linn,S.M., Lloyd,R.S. and Roberts,R.J. (eds), Nucleases. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp. 35–88.
10. Marinus,M.G. and Casadesus,J. (2009) Roles of DNA adenine methylation in host-pathogen interactions: mismatch repair, transcriptional regulation, and more. FEMS Microbiol. Rev., 33, 488–503.
11. Srikhanta,Y.N., Fox,K.L. and Jennings,M.P. (2010) The phasevarion: phase variation of type III DNA methyltransferases controls coordinated switching in multiple genes. Nat. Rev. Microbiol., 8, 196–206.
12. Casadesús,J. and Low,D. (2006) Epigenetic gene regulation in the bacterial world. Microbiol. Mol. Biol. Rev., 70, 830–856.
13. Bhagwat,A.S. and McClelland,M. (1992) DNA mismatch correction by very short patch repair may have altered the abundance of oligonucleotides in the E. coli genome. Nucleic Acids Res., 20, 1663–1668.
14. Reisenauer,A., Kahng,L.S., McCollum,S. and Shapiro,L. (1999) Bacterial DNA methylation: a cell cycle regulator? J. Bacteriol., 181, 5135–5139.
15. Clark,T.A., Murray,I.A., Morgan,R.D., Kislyuk,A.O., Spittle,K.E., Boitano,M., Fomenkov,A., Roberts,R.J. and Korlach,J. (2012) Characterization of DNA methyltransferase
in regulating virulence characteristics. J. Bacteriol., 190, 6524–6529.
28. Kan,N.C., Lautenberger,J.A., Edgell,M.H. and Hutchison,C.A. III (1979) The nucleotide sequence recognized by the Escherichia coli K12 restriction and modification enzymes. J. Mol. Biol., 130, 191–209.
29. Lovley,D.R. and Phillips,E.J.P. (1988) Novel mode of microbial energy-metabolism - organic-carbon oxidation coupled to dissimilatory reduction of iron or manganese. Appl. Environ. Microbiol., 54, 1472–1480.
30. Graentzdoerffer,A., Lindenstrauss,U., Pich,A. and Andreesen,J.R. (2002) New DNA-methyltransferase M.EacI, useful for protecting double stranded DNA against cleavage by restriction enzymes, derived from Eubacterium acidaminophilum, German Patent Office DE 10060526.
31. Copeland,A., Lucas,S., Lapidus,A., Barry,K., Detter,J.C., Glavina del Rio,T., Hammon,N., Israni,S. et al. (2011) Complete genome sequence of the halophilic and highly halotolerant Chromohalobacter salexigens type strain (1H11T). Stand. Genomic Sci., 5, 379–388.
specificities using single-molecule, real-time DNA sequencing. 32. Ventura,M., Canchaya,C., Bernini,V., Altermann,E., Nucleic Acids Res., 40, e29. Barrangou,R., McGrath,S., Claesson,M.J., Li,Y., Leahy,S., 16. Bickle,T.A. (1993) The ATP-dependent restriction enzymes. Walker,C.D. et al. (2006) Comparative genomics and In: Linn,S.M., Lloyd,R.S. and Roberts,R.J. (eds), Nucleases. Cold transcriptional analysis of prophages identified in the genomes of Spring Harbor Laboratory Press, Cold Spring Harbor, Lactobacillus gasseri, Lactobacillus salivarius and Lactobacillus pp. 89–109. casei. Appl. Environ. Microbiol., 72, 3130–3146. 17. Travers,K.J., Chin,C.S., Rank,D.R., Eid,J.S. and Turner,S.W. 33. Hunt,D.E., David,L.A., Gevers,D., Preheim,S.P., Alm,E.J. and (2010) A flexible and efficient template format for circular Polz,M.F. (2008) Resource partitioning and sympatric consensus sequencing and SNP detection. Nucleic Acids Res., 38, differentiation among closely related bacterioplankton. Science, e159. 320, 1081–1085. 18. Machanick,P. and Bailey,T.L. (2011) MEME-ChIP: motif analysis 34. Preheim,S.P., Timberlake,S. and Polz,M.F. (2011) Merging of large DNA datasets. Bioinformatics, 27, 1696–1697. taxonomy with ecological population prediction in a case study of 19. Sayers,E.W., Barrett,T., Benson,D.A., Bolton,E., Bryant,S.H., Vibrionaceae. Appl. Environ. Microbiol., 77, 7195–7206. Canese,K., Chetvernin,V., Church,D.M., Dicuccio,M., Federhen,S. 35. Brooks,J.E., Blumenthal,R.M. and Gingeras,T.R. (1983) The et al. (2012) Database resources of the National Center for isolation and characterization of the Escherichia coli DNA Biotechnology Information. Nucleic Acids Res., 40, D13–D25. adenine methylase (dam) gene. Nucleic Acids Res., 11, 837–851. 20. Posfai,J., Bhagwat,A.S., Posfai,G. and Roberts,R.J. (1989) 36. Wang,L., Chen,S. and Deng,Z. (2012) Phosphorothioation: an Predictive motifs derived from cytosine methyltransferases. 
Nucleic unusual post-replicative modification on the dna backbone. Acids Res., 17, 2421–2435. In: Seligmann,H. (ed.), DNA Replication-Current Advances, 21. Klimasauskas,S., Timinskas,A., Menkevicius,S., Butkiene,D., New York: InTech, Chapter 3. pp. 57–74. Butkus,V. and Janulaitis,A. (1989) Sequence motifs characteristic 37. Parkhill,J., Wren,B.W., Mungall,K., Ketley,J.M., Churcher,C., of DNA[cytosine-N4]methylases: similarity to adenine and Basham,D., Chillingworth,T., Davies,R.M., Feltwell,T. and cytosine-C5 DNA-methylases. Nucleic Acids Res., 17, 9823–9832. Holroyd,S. (2000) The genome sequence of the food-borne 22. Kong,H., Lin,L.F., Porter,N., Stickel,S., Byrd,D., Posfai,J. and pathogen Campylobacter jejuni reveals hypervariable sequences. Roberts,R.J. (2000) Functional analysis of putative Nature, 403, 665–668. restriction-modification system genes in the Helicobacter pylori 38. Rasko,D.A., Ravel,J., Økstad,O.A., Helgason,E., Cer,R.Z., J99 genome. Nucleic Acids Res., 28, 3216–3223. Jiang,L., Shores,K.A., Fouts,D.E., Tourasse,N.J. and 23. Sibley,M.H. and Raleigh,E.A. (2004) Cassette-like variation of Angiuoli,S.V. (2004) The genome sequence of Bacillus cereus restriction enzyme genes in Escherichia coli C and relatives. ATCC 10987 reveals metabolic adaptations and a large plasmid Nucleic Acids Res., 32, 522–534. related to Bacillus anthracis pXO1. Nucleic Acids Res., 32, 24. Gibson,D.G., Young,L., Chuang,R.Y., Venter,J.C., 977–988. Hutchison,C.A. III and Smith,H.O. (2009) Enzymatic assembly of 39. Morgan,R.D. and Luyten,Y.A. (2009) Rational engineering of DNA molecules up to several hundred kilobases. Nat. Methods, type II restriction endonuclease DNA binding and cleavage 6, 343–345. specificity. Nucleic Acids Res., 37, 5222–5233. 25. Xu,S.Y., Nugent,R.L., Kasamkattil,J., Fomenkov,A., Gupta,Y., 40. Roberts,R.J., Chang,Y.C., Hu,Z., Rachlin,J.N., Anton,B.P., Aggarwal,A., Wang,X., Li,Z., Zheng,Y. and Morgan,R. 
(2012) Pokrzywa,R.M., Choi,H.P., Faller,L.L., Guleria,J. and Characterization of Type II and III restriction-modification Housman,G. (2011) COMBREX: a project to accelerate the systems from Bacillus cereus strains ATCC10987 and functional annotation of prokaryotic genomes. Nucleic Acids Res., ATCC14579. J. Bacteriol., 194, 49–60. 39, D11–D14. 26. Takata,T., Wassenaar,T.M., Xu,Q. and Blaser,M.J. (2002) The 41. Donahue,J.P., Israel,D.A., Peek,R.M., Blaser,M.J. and gene product of Campylobacter jejuni gene Cj0208 is a DNA Miller,G.G. (2000) Overcoming the restriction barrier to plasmid methyltransferase with specificity for GAATTC. Abstr. Gen. Meet. transformation of Helicobacter pylori. Mol. Microbiol., 37, Am. Soc. Microbiol., 102, 164. 1066–1074. 27. Kim,J.S., Li,J.Q., Barnes,I.H.A., Baltzegar,D.A., Pajaniappan,M., 42. Dong,H., Zhang,Y., Dai,Z. and Li,Y. (2010) Engineering Cullen,T.W., Trent,M.S., Burns,C.M. and Thompson,S.A. (2008) Clostridium strain to accept unmethylated DNA. PLoS One, 5, Role of the Campylobacter jejuni cj1461 DNA methyltransferase e9038.

Journal

Nucleic Acids Research, Oxford University Press

Published: Dec 2, 2012
