C. Seidman, K. Struhl (2001)
Introduction of plasmid DNA into cells. Current Protocols in Protein Science, Appendix 4
O. Sang-jin, Kim Young-Chang, Park Young-Woo, Min So-Young, Kim In-sook, Kang Hyen-Sam (1987)
Complete nucleotide sequence of the penicillin G acylase gene and the flanking regions, and its expression in Escherichia coli. Gene, 56
Michael McClelland, K. Sanderson, J. Spieth, S. Clifton, P. Latreille, L. Courtney, S. Porwollik, Johar Ali, M. Dante, Feiyu Du, Shunfang Hou, Daniel Layman, S. Leonard, Christine Nguyen, Kelsi Scott, Andrea Holmes, Neenu Grewal, E. Mulvaney, E. Ryan, Hui Sun, L. Florea, W. Miller, T. Stoneking, M. Nhan, R. Waterston, R. Wilson (2001)
Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature, 413
R. Milkman (1973)
Electrophoretic Variation in Escherichia coli from Natural Sources. Science, 182
R. Milkman (1999)
Gene Transfer in Escherichia coli
R. Charlebois (1999)
Organization of the Prokaryotic Genome
Robert Juhala, M. Ford, R. Duda, Anthony Youlton, G. Hatfull, R. Hendrix (2000)
Genomic sequences of bacteriophages HK97 and HK022: pervasive genetic mosaicism in the lambdoid bacteriophages. Journal of Molecular Biology, 299(1)
S. Lacks, B. Mannarelli, S. Springhorn, B. Greenberg (1986)
Genetic basis of the complementary DpnI and DpnII restriction systems of S. pneumoniae: an intercellular cassette mechanism. Cell, 46
A. Daniel, F. Fuller-Pace, Diana Legge, N. Murray (1988)
Distribution and diversity of hsd genes in Escherichia coli and other enteric bacteria. Journal of Bacteriology, 170
G. Schumacher, D. Sizmann, H. Haug, P. Buckel, A. Böck (1986)
Penicillin acylase from E. coli: unique gene-protein relation. Nucleic Acids Research, 14(14)
S. Lacks, B. Greenberg (1977)
Complementary specificity of restriction endonucleases of Diplococcus pneumoniae with respect to DNA methylation. Journal of Molecular Biology, 114(1)
A. Nobusato, I. Uchiyama, I. Kobayashi (2000)
Diversity of restriction-modification gene homologues in Helicobacter pylori. Gene, 259(1-2)
L. Worth, S. Clark, M. Radman, P. Modrich (1994)
Mismatch repair proteins MutS and MutL inhibit RecA-catalyzed strand transfer between diverged DNAs. Proceedings of the National Academy of Sciences of the United States of America, 91
K. Nelson, R. Selander (1992)
Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. Journal of Bacteriology, 174
W. Arber, D. Wauters-Willems (1964)
Host specificity of DNA produced by Escherichia coli. Molecular and General Genetics MGG, 108
H. Boyer (1964)
Genetic control of restriction and modification in Escherichia coli. Journal of Bacteriology, 88
G. Olsen, C. Woese, R. Overbeek (1996)
The winds of (evolutionary) change: breathing new life into microbiology. Journal of Bacteriology, 176(1)
A. Stoltzfus, J. Leslie, R. Milkman (1988)
Molecular evolution of the Escherichia coli chromosome. I. Analysis of structure and natural variation in a previously uncharacterized region between trp and tonB. Genetics, 120(2)
N. Murray (2002)
2001 Fred Griffith review lecture. Immigration control of DNA in bacteria: self versus non-self. Microbiology, 148(Pt 1)
R. Welch, V. Burland, Guy Plunkett, P. Redford, P. Roesch, D. Rasko, E. Buckles, S. Liou, A. Boutin, J. Hackett, D. Stroud, G. Mayhew, D. Rose, S. Zhou, D. Schwartz, N. Perna, H. Mobley, M. Donnenberg, F. Blattner (2002)
Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America, 99
J. Parkhill, G. Dougan, K. James, N. Thomson, D. Pickard, J. Wain, C. Churcher, K. Mungall, S. Bentley, M. Holden, M. Sebaihia, S. Baker, D. Basham, K. Brooks, T. Chillingworth, P. Connerton, A. Cronin, P. Davis, R. Davies, L. Dowd, N. White, J. Farrar, T. Feltwell, N. Hamlin, A. Haque, T. Hien, S. Holroyd, K. Jagels, A. Krogh, T. Larsen, S. Leather, S. Moule, P. O'Gaora, C. Parry, M. Quail, Kim Rutherford, M. Simmonds, J. Skelton, K. Stevens, S. Whitehead, B. Barrell (2001)
Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature, 413
K. Rudd (1998)
Linkage Map of Escherichia coli K-12, Edition 10: The Physical Map. Microbiology and Molecular Biology Reviews, 62
D. Roper, T. Fawcett, R. Cooper (1993)
The Escherichia coli C homoprotocatechuate degradative operon: hpc gene order, direction of transcription and control of expression. Molecular and General Genetics MGG, 237
E. Koonin, A. Mushegian, K. Rudd (1996)
Sequencing and analysis of bacterial genomes. Current Biology, 6
M. Berlyn (1998)
Linkage Map of Escherichia coli K-12, Edition 10: The Traditional Map. Microbiology and Molecular Biology Reviews, 62
L. Lewis, G. Harlow, L. Gregg-Jolly, D. Mount (1994)
Identification of high affinity binding sites for LexA which define new DNA damage-inducible genes in Escherichia coli. Journal of Molecular Biology, 241(4)
V. Barcus, A. Titheradge, N. Murray (1995)
The diversity of alleles at the hsd locus in natural populations of Escherichia coli. Genetics, 140(4)
W. Arber, D. Wauters-Willems (1970)
Host specificity of DNA produced by Escherichia coli. XII. The two restriction and modification systems of strain 15T-. Molecular and General Genetics MGG, 108(3)
A. Campbell (1992)
Chromosomal insertion sites for phages and plasmids. Journal of Bacteriology, 174
E. Boyd, K. Nelson, F. Wang, T. Whittam, R. Selander (1994)
Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proceedings of the National Academy of Sciences of the United States of America, 91
Q. Xu, R. Morgan, R. Roberts, M. Blaser (2000)
Identification of type II restriction and modification systems in Helicobacter pylori reveals their substantial diversity among strains. Proceedings of the National Academy of Sciences of the United States of America, 97(17)
L. Bullas, C. Colson, B. Neufeld (1980)
Deoxyribonucleic acid restriction and modification systems in Salmonella: chromosomally located systems of different serotypes. Journal of Bacteriology, 141
C. Seidman, K. Struhl (2000)
Introduction of Plasmid DNA into Cells. Current Protocols in Neuroscience, 11
W. Arber, D. Dussoix (1966)
Host specificity of DNA produced by Escherichia coli. 9. Host-controlled modification of bacteriophage fd. Journal of Molecular Biology, 20(3)
R. Aras, A. Small, T. Ando, M. Blaser (2002)
Helicobacter pylori interstrain restriction-modification diversity prevents genome subversion by chromosomal DNA from competing strains. Nucleic Acids Research, 30(24)
C. Rappleye, J. Roth (1997)
Transposition without transposase: a spontaneous mutation in bacteria. Journal of Bacteriology, 179
N. Saunders, Lori Snyder (2002)
The minimal mobile element. Microbiology, 148(Pt 12)
R. Hall, C. Collis (1995)
Mobile gene cassettes and integrons: capture and spread of genes by site-specific recombination. Molecular Microbiology, 15
P. Sharp, J. Kelleher, A. Daniel, G. Cowan, N. Murray (1992)
Roles of selection and recombination in the evolution of type I restriction-modification systems in enterobacteria. Proceedings of the National Academy of Sciences of the United States of America, 89
K. Kusano, K. Sakagami, T. Yokochi, T. Naito, Y. Tokinaga, E. Ueda, I. Kobayashi (1997)
A new type of illegitimate recombination is dependent on restriction and homologous interaction. Journal of Bacteriology, 179
E. Raleigh, R. Trimarchi, H. Revel (1989)
Genetic and physical mapping of the mcrA (rglA) and mcrB (rglB) loci of Escherichia coli K-12. Genetics, 122(2)
T. Bickle, D. Krüger (1993)
Biology of DNA restriction. Microbiological Reviews, 57(2)
(1989)
Molecular Cloning: A Laboratory Manual. 2nd Edn
S. Radoja, O. Francetic, N. Stojićević, I. Moric, V. Glišin, M. Konstantinovic (1999)
DNA region responsible for transcriptional regulation of the Escherichia coli penicillin amidase (pac) gene by CRP and PAA. Genetic Analysis: Biomolecular Engineering, 15(6)
L. Geer, M. Domrachev, D. Lipman, S. Bryant (2002)
CDART: protein homology by domain architecture. Genome Research, 12(10)
N. Murray, A. Daniel, G. Cowan, P. Sharp (1993)
Conservation of motifs within the unusually variable polypeptide sequences of type I restriction and modification enzymes. Molecular Microbiology, 9
F. Blattner, Guy Plunkett, C. Bloch, N. Perna, V. Burland, M. Riley, J. Collado-Vides, J. Glasner, C. Rode, G. Mayhew, J. Gregor, N. Davis, H. Kirkpatrick, M. Goeden, D. Rose, B. Mau, Y. Shao (1997)
The complete genome sequence of Escherichia coli K-12. Science, 277(5331)
P. Reeves (1993)
Evolution of Salmonella O antigen variation by interspecific gene transfer on a large scale. Trends in Genetics: TIG, 9(1)
M. Pedulla, M. Ford, Jennifer Houtz, Tharun Karthikeyan, Curtis Wadsworth, John Lewis, D. Jacobs-Sera, Jacob Falbo, Joseph Gross, Nicholas Pannunzio, W. Brucker, Vanaja Kumar, Jayasankar Kandasamy, L. Keenan, S. Bardarov, J. Kriakov, J. Lawrence, W. Jacobs, R. Hendrix, G. Hatfull (2003)
Origins of Highly Mosaic Mycobacteriophage Genomes. Cell, 113
(1995)
Barriers to recombination: restriction
F. Wang, T. Whittam, R. Selander (1997)
Evolutionary genetics of the isocitrate dehydrogenase gene (icd) in Escherichia coli and Salmonella enterica. Journal of Bacteriology, 179
R. Milkman, E. Jaeger, Ryan McBride (2003)
Molecular evolution of the Escherichia coli chromosome. VI. Two regions of high effective recombination. Genetics, 163(2)
I. Kobayashi (2001)
Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Research, 29(18)
L. Florea, C. Riemer, S. Schwartz, Z. Zhang, N. Stojanovic, W. Miller, Michael McClelland (2000)
Web-based visualization tools for bacterial genome alignments. Nucleic Acids Research, 28(18)
N. Perna, Guy Plunkett, V. Burland, B. Mau, J. Glasner, D. Rose, G. Mayhew, P. Evans, J. Gregor, H. Kirkpatrick, G. Pósfai, J. Hackett, Sara Klink, A. Boutin, Y. Shao, Leslie Miller, Erik Grotbeck, N. Davis, A. Lim, E. Dimalanta, K. Potamousis, Jennifer Apodaca, T. Anantharaman, Jieyi Lin, G. Yen, D. Schwartz, R. Welch, F. Blattner (2001)
Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature, 409
J. Sambrook, E. Fritsch, T. Maniatis (2001)
Molecular Cloning: A Laboratory Manual
M. Prieto, E. Díaz, J. García (1996)
Molecular characterization of the 4-hydroxyphenylacetate catabolic pathway of Escherichia coli W: engineering a mobile aromatic degradative cluster. Journal of Bacteriology, 178
D. Liu, P. Reeves (1994)
Presence of different O antigen forms in three isolates of one clone of Escherichia coli. Genetics, 138(1)
A. Campbell, S. Schneider, B. Song (2004)
Lambdoid phages as elements of bacterial genomes (integrase/phage21/Escherichia coli K-12/icd gene). Genetica, 86
N. Murray (2000)
Type I Restriction Systems: Sophisticated Molecular Machines (a Legacy of Bertani and Weigle). Microbiology and Molecular Biology Reviews, 64
H. Ochman, R. Selander (1984)
Standard reference strains of Escherichia coli from natural populations. Journal of Bacteriology, 157
A. Mira, H. Ochman (2002)
Gene location and bacterial sequence divergence. Molecular Biology and Evolution, 19(8)
A. Roa, J. García (1999)
New insights into the regulation of the pac gene from Escherichia coli W ATCC 11105. FEMS Microbiology Letters, 177(1)
M. Biserčić, J. Feutrier, P. Reeves (1991)
Nucleotide sequences of the gnd genes from nine natural isolates of Escherichia coli: evidence of intragenic recombination as a contributing factor in the evolution of the polymorphic gnd locus. Journal of Bacteriology, 173
F. Barre, B. Søballe, B. Michel, M. Aroyo, Malcolm Robertson, D. Sherratt (2001)
Circles: The replication-recombination-chromosome segregation connection. Proceedings of the National Academy of Sciences of the United States of America, 98
A. Titheradge, D. Ternent, N. Murray (1997)
A third family of allelic hsd genes in Salmonella enterica: sequence comparisons with related proteins identify conserved regions implicated in restriction of DNA. Molecular Microbiology, 23
J. Lawrence, H. Ochman (1998)
Molecular archaeology of the Escherichia coli genome. Proceedings of the National Academy of Sciences of the United States of America, 95(16)
Michael McClelland, L. Florea, K. Sanderson, S. Clifton, J. Parkhill, C. Churcher, G. Dougan, R. Wilson, W. Miller (2000)
Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Research, 28(24)
William Wood (1966)
Host specificity of DNA produced by Escherichia coli: bacterial mutations affecting the restriction and modification of DNA. Journal of Molecular Biology, 16(1)
S. Altschul, Thomas Madden, A. Schäffer, Jinghui Zhang, Zheng Zhang, W. Miller, D. Lipman (1997)
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17)
H. Kong, Lee-Fong Lin, Nicole Porter, S. Stickel, D. Byrd, J. Posfai, R. Roberts (2000)
Functional analysis of putative restriction-modification system genes in the Helicobacter pylori J99 genome. Nucleic Acids Research, 28(17)
J. Gossen, J. Vijg (1988)
E. coli C: a convenient host strain for rescue of highly methylated DNA. Nucleic Acids Research, 16(19)
G. Bertani, J. Weigle (1953)
Host controlled variation in bacterial viruses. Journal of Bacteriology, 65
E. Raleigh (1992)
Organization and function of the mcrBC genes of Escherichia coli K-12. Molecular Microbiology, 6
R. Overbeek, M. Fonstein, M. D'Souza, G. Pusch, N. Maltsev (1999)
The use of gene clusters to infer functional coupling. Proceedings of the National Academy of Sciences of the United States of America, 96(6)
B. Levin (1988)
Frequency-dependent selection in bacterial populations. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 319(1196)
F. Valle, G. Gosset, B. Tenorio, G. Oliver, F. Bolivar (1986)
Characterization of the regulatory region of the Escherichia coli penicillin acylase structural gene. Gene, 50(1-3)
K. Dybvig, R. Sitaraman, C. French (1998)
A family of phase-variable restriction enzymes with differing specificities generated by high-frequency gene rearrangements. Proceedings of the National Academy of Sciences of the United States of America, 95(23)
Published online January 26, 2004
Nucleic Acids Research, 2004, Vol. 32, No. 2, 522–534
DOI: 10.1093/nar/gkh194

Cassette-like variation of restriction enzyme genes in Escherichia coli C and relatives

Marion H. Sibley and Elisabeth A. Raleigh*
New England Biolabs, Beverly, MA 01915, USA

Received September 29, 2003; Revised and Accepted December 8, 2003
DDBJ/EMBL/GenBank accession no. AY392450

*To whom correspondence should be addressed. Tel: +1 978 927 5054; Fax: +1 978 921 1350; Email: [email protected]
Nucleic Acids Research, Vol. 32 No. 2 © Oxford University Press 2004; all rights reserved

ABSTRACT

A surprising result of comparative bacterial genomics has been the large amount of DNA found to be present in one strain but not in another of the same species. We examine in detail one location where gene content varies extensively, the restriction cluster in Escherichia coli. This region is designated the Immigration Control Region (ICR) for the density and variability of restriction functions found there. To better define the boundaries of this variable locus, we determined the sequence of the region from a restrictionless strain, E.coli C. Here we compare the 13.7 kb E.coli C sequence spanning the site of the ICR with corresponding sequences from five E.coli strains and Salmonella typhimurium LT2. To discuss this variation, we adopt the term 'framework' to refer to genes that are stable components of genomes within related lineages, while 'migratory' genes are transient inhabitants of the genome. Strikingly, seven different migratory DNA segments, encoding different sets of genes and gene fragments, alternatively occupy a single well-defined location in the seven strains examined. The flanking framework genes, yjiS and yjiA, display approximately normal patterns of conservation. The patterns observed are consistent with the action of a site-specific recombinase. Since no nearby gene codes for a likely recombinase of known families, such a recombinase must be of a new family or unlinked.

INTRODUCTION

The Immigration Control Region (ICR) was defined in Escherichia coli K-12 (1) as a locus at 98.6 min specifying three different restriction systems within 14 kb of DNA. This cluster includes the hsdR, hsdM and hsdS genes encoding the type I system EcoKI, as well as the methylation-dependent restriction system genes mcrB, mcrC and mrr. Type I restriction genes resident here (linked to serB and thr) are known to be highly variable in specificity, both within E.coli (2–4) and among enteric bacteria (5,6). Some strains apparently lack restriction-modification (RM) systems at this locus. E.coli C has been a reference 'restrictionless' strain for many years (7,8).

Existing clues to the sequence evolution of the ICR of E.coli suggested a model of replaceable cassettes at a single location. These might be similar to the cassettes of integrons, which can be exchanged by means of site-specific recombinases (9); or to alternative prophages at an attachment site, as with phage 21 and the e14 excisable element (10); or they could resemble the DpnI/DpnII exchangeable cassettes found in Streptococcus pneumoniae (11,12), which are exchanged by means of homologous recombination in flanking DNA.

Three lines of evidence suggested that replaceable cassettes might occupy this location. First, many enteric strains have at least one restriction gene in this genetic location, but the restriction specificities are quite variable. At least 16 different specificities can be identified in 37 isolates of the ECOR collection (4,13), a strain set designed to represent the diversity of the E.coli species worldwide (14,15). This variability is found not only in E.coli but also among different genera of enteric bacteria (6).

Second, despite the residence of similar function (restriction) at the same genetic location in different strains, the DNA sequences determining that function are often highly divergent. The hsd genes have been grouped into four families (IA, IB, IC, ID) based on sequence similarity, functional complementation, and antibody cross-reactivity within but not between families (reviewed in 16–18). To appreciate the level of divergence between families, Sharp et al. (19) used divergence within type IA modification subunits (HsdM) of E.coli and Salmonella as a clock and suggested that IA and IB families were so divergent that the difference might distinguish bacterial phyla [roughly, comparing purple sulfur bacteria and spirochaetes (20)]. Despite this high level of divergence, members of three of the four families are found at the same location in E.coli, genetically linked to the thr locus (4). Analysis of four members of the IA family suggested that lateral transfer of hsd sequence has occurred (19).

The third line of evidence suggesting a cassette-swap model is the presence of conserved (or at least hybridizable) sequence flanking the locus in strains with divergent type I systems, by Southern blot (6). This suggests that these diverged genes occupy the same sequence environment, and that transposition or other site-independent mechanisms did not mediate acquisition.

The modification-dependent restriction genes flanking the type I systems have been less well-studied, but limited evidence suggests lateral transfer has affected these genes as well. The mrr gene of Salmonella typhimurium LT2 is relatively divergent from that of E.coli K-12 (71% DNA identity).
Typical homologous genes serving the same function (orthologs) display DNA identity of ~84% in E.coli–Salmonella comparisons (21). The mcrB and mcrC genes are entirely absent from many E.coli strains (4,6).

In addition to a high frequency of segmental replacement, the region as a whole appears to be subject to higher levels of homologous recombination than is typical of most of the chromosome. Milkman and colleagues (22,23) designated the region containing the ICR as one of two 'hypervariable regions' in the E.coli genome. These regions show signs that homologous recombination of horizontally transferred DNA contributed significantly to the evolutionary history of the genes resident there. The other such region contains the O-antigen gene complex near 45 min (24,25). Taking another approach, Lawrence and Ochman (26) classify all three restriction systems, hsd, mcrBC and mrr, as laterally transferred genes using criteria of GC content and codon usage.

To clarify the boundaries of this putative cassette locus and to gain insight into the mechanisms of variation, we sought to obtain the sequence of the region from a strain thought to lack any 'cassette' at this location. E.coli C is such a strain by three criteria. The first is the absence of a classical RM system; it has long been used as a permissive host in the study of type I enzymes (7), and no restriction locus genetically linked to thr is identifiable (27). The second is the absence of modification-dependent restriction systems (8), two of which flank the type I system in K-12. The third is the absence of hybridizing sequence over much of the region by Southern blot analysis (6).

It is convenient at this point to enunciate the idea of 'framework genes' and 'migratory genes'. 'Framework genes' are defined here as genes that remain constant in genomes of close relatives: they remain in the same order with respect to each other and in approximately the same chromosomal location, but may be locally separated by segments that come and go among strains. This idea has had several antecedents: Welch et al. (28) used 'framework', 'backbone' and 'conserved synteny' to refer to such segments. Perna et al. (29) used 'backbone', while others have referred to 'collinear regions' (30,31). We imagine that within a cell lineage, such genes diverge in sequence in concert, or are exchanged by homologous recombination. Such recombination depends on preservation of a high degree of DNA sequence identity [>96%; (32)]. The default assumption is that each copy of such a gene in the lineage is descended from a common ancestor of that gene in the same lineage. Such genes are termed 'orthologous' (33).

In contrast, we propose the term 'migratory genes' to refer to genes in sequence segments that come and go en bloc within a species or other taxon. Elsewhere, such segments have been called 'loops' or 'islands' (30), or 'lineage-specific segments' (29). In principle, these could be transposable elements; genes acquired by 'illegitimate recombination' (34) in an undirected fashion; or genes associated with prophages or other elements that mediate site-specific recombination at particular locations. Site-specific recombination requires particular short sequences (attachment sites) to be present on both recombining partners, and mediates rearrangement events with precise borders. Rearrangements can include insertion of one molecule into another, deletion between two such sites, or inversion between them. Such attachment sites would presumably occur at loci permissive for DNA acquisition, while insertions that disrupt other genes would be purged from the population by selection. Migratory genes found at the same locus in two cell lineages may be totally unrelated. We show below that these ideas accurately reflect the pattern of sequence variation observed in the vicinity of the ICR.

We find that the ICR is occupied by a variety of migratory genes in different strains, all flanked precisely by the same framework genes, consistent with acquisition by site-specific recombination. We also find evidence for reshuffling of these migratory genes by homologous recombination.

MATERIALS AND METHODS

Strains of E.coli and growth conditions

The wild-type E.coli strain C (AC3121 = CGSC 3121) was obtained from the E.coli Genetic Stock Center. The wild-type E.coli strain W (ATCC 11105) was obtained from the American Type Culture Collection. The K-12 strain ER2683 [fhuA2 glnV44 e14- rfbD1? relA1? endA1 spoT1? thi-1 Δ(mcrC-mrr)114::IS10 Δ(lacI-lacA)200/F′ proAB lacI ΔlacZM15 zzf::miniTn10 (KanR)], used for transformations, was constructed in this laboratory from an MM294 background strain. All strains were grown in Luria-Bertani medium at 37°C.

General DNA methods

Qiagen kits were used for purification of chromosomal and plasmid DNAs according to the manufacturer's instructions. All enzymes were from New England Biolabs. DNA amplification by PCR, restriction endonuclease digestions, DNA ligations and agarose gel electrophoresis were performed according to standard protocols (35). Transformation of E.coli with plasmid DNA was done using electroporation (36).

Size-fractionated genomic DNA library from E.coli C

A size-fractionated E.coli C genomic DNA library was prepared by digesting purified chromosomal DNA with EcoRI, followed by electrophoresis on an agarose gel, and elution of DNA running at about 5 kb using a Qiagen gel extraction kit. A library of these fragments was ligated to EcoRI-digested, dephosphorylated pBR322. Following electroporation into ER2683, 10 pools of about 300 colonies each were prepared, and plasmid minipreps were made from each pool. These pools comprised the size-selected library. Glycerol stocks were also prepared from 1 ml of each wash.

Screening of library for ICR flanking regions by PCR

We identified a pair of primers that amplify a region in the yjiY gene (downstream of mrr in E.coli K-12) from both K-12 and C genomic DNA. These yjiY primers were used to screen each of the 10 plasmid pools of the size-selected E.coli C DNA genomic library, and one of the 10 (pool 9) showed the expected 1022 bp PCR product. The strain ER2683 was transformed with the pool 9 plasmid DNA and 136 resulting colonies were patched on to an LB plate with ampicillin. Cells scraped from the patches were combined to make secondary pools of 13 patches each. Plasmid DNA was prepared from each secondary pool, and the yjiY PCR repeated. The pool containing patches 27–39 was positive. Cells from each of these individual patches were scraped from the plate, individual plasmid preps made, and the yjiY PCR again repeated. Colony 38 had a positive PCR and cells from the 38 patch remaining on the plate were scraped off and an overnight culture grown to isolate a stock of this plasmid DNA. The plasmid of colony 38 was designated pMS3.

To sequence the ICR flanking region upstream of mrr in E.coli C, PCR primers in yjiS and yjiA were used to screen the size-selected library. The plasmid pool from plate 8 of the size-selected E.coli C library had the correct sized product from the yjiS and yjiA primers. The ER2683 strain was transformed with this prep and 208 resulting colonies were patched on to LB ampicillin. Successive yjiS-yjiA PCRs were performed as described above and the plasmid DNA from one final positive colony was designated pMS4.

Additional PCR fragments

Because of two closely spaced EcoRI sites in the K-12 sequence (193 bp apart), we expected a corresponding gap between the ends of the pMS3 and pMS4 sequences. Primers were used to amplify a PCR product across this gap, one primer in the Z5950 coding sequence (pMS4) and one primer in yjiX (pMS3) from E.coli C genomic DNA.

Genes hpaC and hpaB are not known from E.coli C; hpaI is known from C (37). To join sequence information revealed in this study with information from known E.coli C sequence, a PCR product was generated using primers in hpaC and hpaI with E.coli C chromosomal DNA as template; E.coli K-12 and E.coli W chromosomal DNA were used as negative and positive controls, respectively. A PCR product of about 5.5 kb was generated from these primers from E.coli C DNA. As expected, this product was also generated from W DNA and no product was generated from K-12 DNA. The PCR product from C DNA was cloned into the pPCR-Script Amp SK (+) vector (Stratagene). The sequencing of the hpaC-hpaI PCR product yielded a 5.5 kb sequence containing sequence similar to the E.coli W sequence spanning hpaC through hpaI, including the entirety of hpaB, hpaA and hpaX (GenBank Z37980).

Sequencing

The inserts of both pMS3 and pMS4 and the cloned hpaC–hpaI PCR product were sequenced using as templates random insertions generated with the GPS-1 Genome Priming System (New England Biolabs). Sequencing primers were Primers S and N of the GPS kit. The PCR product joining pMS3 to pMS4 was directly sequenced using the primers employed for the original amplification. Sequence was assembled using AutoAssembler (Applied Biosystems).

Dyad element detection

Sequence files were created containing 70 bp of sequence from each strain, joining 40 bp to the left of the yjiS stop with 30 bp to the right of the yjiA stop. The GCG program LINEUP created a consensus sequence. STEMLOOP was then used to predict dyad symmetry elements, with parameters of: minimum stem, 6; minimum bonds/stem, 12; maximum loop 20. The dyad shown in Figure 5 was the best of seven stems found.

Other genome sequence files and methods

Sequence files used were: E.coli K-12, complete genome NC_000913.1, nucleotides 4567731–4590882 (38); E.coli O157:H7, complete genome NC_002655.2, nucleotides 5463295–5481664 (29); E.coli CFT073 complete genome NC_004431, nucleotides 5159888–5178596 (28); E.coli W, M15950.1, AF109125.1, M17609.1 and Z37980.2 (39–41); and S.typhimurium LT2, complete genome NC_003197, nucleotides 4774451–4791743 (30). The E.coli C sequence files Z47799, AF036583, X81446, x55200, X53666, x81322, x75028 and s56952 (37) were used to join our sequence to the rest of the hpc (hpa) operon. Sequences for CFT073 and K1 were kindly provided by Guy Plunkett in advance of publication. Sequence of the pathogenic strain K1 will be published elsewhere. Protein family annotation in Table 1 employed the Linkout facility of the Entrez web site at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Database/index.html), and the Conserved Domain Database there (42). For correlation purposes, the genetic and restriction maps of E.coli may be consulted (43,44).

Assembly of the E.coli W contig

GenBank files M15950.1 Penicillin acylase [ECOPAC; (45)], AF109125.1 Penicillin amidase precursor regulatory region (46), M17609.1 Penicillin G acylase, complete [ECOPGA; (39)] and Z37980.2 [EC4HPADNA; hpa genes (40)] combined to form the sequence across the E.coli W version of the ICR as far as tsr. The last 21 base pairs of ECOPGA were similar to yjiS of K-12. To verify connection of this contig to yjiR, primers were designed in pac and yjiR based on known sequence in W and K-12. End sequencing of this product confirmed linkage of pac and yjiR. The sequence used in Figure 4 was obtained from a composite of three sequences such that all the sequence was obtained from two sources. The entire region is present in AF109125 (bp 85–373). Two corrections to the AF109125 sequence were made based on other available sequences, since this allowed improved alignment in Figure 4. Discrepancies were: AF109125 contained a three base insertion ('TCG' bp 198–91 of that file) relative to our yjiS end sequence; second, AF109125 contained an insertion of G at position 277 of that sequence relative to ECOPAC (no G before position 21). Sequence in Figure 3 is contained in ECOPGA only and was accepted as presented. Linkage to EC4HPADNA rests on a 35 bp overlap between the end of ECOPGA and the end of EC4HPADNA and the consistency of this linkage with the stable framework of the E.coli genomes.

RESULTS

Escherichia coli C ICR sequence

The region of the ICR in the E.coli genome, including the genes yjiS through yjiA (Fig. 1), is spanned by an 18 kb EcoRI fragment in K-12 but only by a 5.1 kb EcoRI fragment in C (6). We cloned and sequenced the 5 kb EcoRI fragment identified by Daniel et al. (6) by Southern blot, using a PCR-directed search strategy to identify suitable clones in a size-selected clone library. The adjacent 6.7 kb was obtained by bootstrap PCR and cloning of segments with similarity to the adjoining region in the K-12 genome, also identified by PCR. Sequencing employed a mixture of vector-primed end sequencing, transposon insertion sequencing and gap closure by sequencing of PCR products. This sequence was deposited in GenBank (accession number AY392450).

Figure 1. Genetic structure surrounding the ICR in E.coli. Conserved framework genes (striped boxes, approximately to scale) encode hypothetical proteins of unknown function, between 98.4 and 98.9 min on the K-12 chromosome. The variable region (dashed line) found between yjiS and yjiA contains distinct gene sets in different strains. A part of this region has been designated the Immigration Control Region (ICR) in E.coli K-12.

Table 1. Genes spanning the 'empty' ICR in E.coli C

Nucleotides | Gene | Distribution (a) | Functional information (b)
5–1417 | yjiR | All strains except S.typhimurium. CFT073 and K1 have only a 77 nt fragment from the middle of the gene; W sequence is unfinished | COG1167.1, ARO8. Transcriptional regulator containing a DNA-binding HTH domain and an aminotransferase domain (MocR family)
1594–1758 | yjiS | All strains. CFT073 and K1 have only an 86-nt fragment from the 3′ end of the gene including the stop codon; this fragment is fused to the yjiR fragment | COG5457.1. Uncharacterized conserved small protein, function unknown
1807–2370 | hsdR (fragment with upstream sequence) | CFT073 and O157:H7. Fragment includes the 5′ 565 nt of gene with start codon | Type IB restriction endonuclease fragment. DNA sequence is 98% identical with EcoA restriction enzyme of E.coli 15T- (57)
2648–4300 | Z5950 | CFT073 and O157:H7 | pfam03235.8, DUF262. Protein of unknown function
4369–5223 | yjiA | All strains | pfam02492, CobW
5336–5539 | yjiX | All strains | COG2879.1. Uncharacterized small protein
5670–7835 | yjiY | All strains | COG1966.1, CstA. Carbon starvation protein, predicted membrane protein
8213–8725 | hpaC (e) | W and S.typhimurium (f) | 4-hydroxyphenylacetate (4-HPA) hydroxylase, reductase component
8743–10305 | hpaB (e) | W and S.typhimurium (f) | 4-hydroxyphenylacetate (4-HPA) hydroxylase, oxygenase component
10556–11449 | hpaA (e) | W and S.typhimurium (f) | Transcription regulator
11459–12835 | hpaX (e) | W and S.typhimurium (f) | Transmembrane facilitator
13010–13753 (end of sequenced region) | hpaI (e) (also known as hpcH in E.coli C) | W and S.typhimurium (f) | 2,4-dihydroxy-hept-2-ene-1,7-dioic acid aldolase

(a) Distribution among the strains of the set considered in this paper.
(b) Information from Conserved Domain Database [see Methods (42)] or Prieto et al. (40).
(c) Designation is from E.coli K-12 sequence.
(d) Designation is from E.coli CFT073 sequence; the Z5950 sequence found in C and CFT073 represents a fusion of the coding sequences designated Z4949 and Z5950 in O157:H7.
(e) Designation from E.coli W sequence; part of the 4-HPA degradative gene cluster of 11 genes, completely sequenced in E.coli W (40).
(f) Gene is found elsewhere in the chromosome of S.typhimurium.

uninterrupted in frame relative to the database version adopted as a name, except that the start codon and first six amino acids of Z5950 differ between C and the source CFT073.

As required by the sequencing strategy, one border of this region comprises two framework genes, yjiR and yjiS. The next three genes are candidate migratory genes: they are not found at all in K-12, but are found in other strains (see below). First is a fragment of an hsdR gene similar to a type IB family restriction enzyme, EcoA. EcoA is found genetically in this region in E.coli 15T-. We note that the 565 nucleotides encode an N-terminal fragment of hsdR that would not form a functional endonuclease. This sequence also would not have been present on the EcoA DNA probe used by Daniel et al. (6). Our result is thus consistent with the observed lack of restriction activity or hybridization. An ORF of unknown function is next. This is also found in E.coli CFT073, designated Z5950 (28); in O157:H7 a frameshift breaks this ORF into two parts, Z5949 and Z5950 (29). Similarity to K-12 then resumes, with more framework genes of unknown function found to the right of the ICR in K-12: yjiA, yjiX and yjiY.
At the right end of our E.coli C sequence are ®ve more candidate migratory genes (Table 1). Again, these are not Genes present in E.coli C found in K-12 but are found at this sequence location in another strain, this time in E.coli W (40). These genes, hpaC, Table 1 lists genes identi®ed in our 13.7 kb segment, inferred hpaB, hpaA, hpaX and hpaI (hpcH), form the left end of a from DNA sequence similarity found in GenBank (release 10-gene cluster found between yjiY and tsr in E.coli W. The 138) using nBLAST (47). Genes are listed in clockwise order with respect to the K-12 sequence, and have been given the gene cluster enables E.coli W to degrade 4-hydroxyphenyla- name of the best match to another E.coli strain. All genes were cetic acid. E.coli C also carries the other genes of this cluster: 526 Nucleic Acids Research, 2004, Vol. 32, No. 2 Figure 2. Gene content of the ICR in six strains of E.coli and in Salmonella enterica sv typhimurium LT2. Boxes (not to scale) represent gene coding sequence, even if the gene is truncated. Similarity was assessed at the DNA level; if similarity was >65%, genes were judged orthologous, and named according to the earliest genome sequence. Boxes of the same color represent orthologous blocks of genes that appear to be moving together. Orange boxes: type IA restriction genes (hsdRMS) and mrr. Blue boxes: type IB restriction genes (hsdRMS) and two ORFs of unknown function, Z5949 and Z5950; these are fused in C and CFT073 with a 10 amino acid linker relative to the two in O157:H7. Yellow boxes: yjiW. Green boxes: unknowns Z5943 and Z5944. White boxes: genes unique in this dataset. These are: yjiT, yjiU and mcrD, all of unknown function, in K-12; restriction genes mcrB and mcrC; penicillin acylase (pac) in W; and unknowns STM4522, 4528 and 4529 in S.typhimurium. they were designated hpc (homoprotocatechuate degradation) K1, and S.typhimurium LT2 (see Materials and Methods for by Roper et al. (37). 
Our sequence overlaps that of Roper et al., references). For all seven strains, the same framework genes and we were able to construct a contiguous sequence as far as are present in the same order, interrupted by a variable region tsr by connecting those sequences with ours. This second in the same location as the ICR of E.coli K-12 (Fig. 1). In all locus for migratory genes will not be discussed further. strains, the framework genes yjiS and yjiA constitute the left and right borders, respectively, of the ICR collection of A coarse-grained comparison: patterns of gene migratory genes. occurrence in the variable region between yjiS and yjiA Figure 2 shows the migratory genes contained within the We compared our E.coli C ICR sequence with sequences variable region in the seven strains (not to scale). ORFs similar around this region of the chromosome from six other strains of at the DNA level [>65% identity (48)] are assumed to be enteric bacteria. In addition to the well-studied E.coli labora- orthologous and given names annotated in the earliest genome tory strain K-12, these strains include another E.coli labora- sequence. The number of genes present ranges from one tory strain (W), the E.coli pathogens CFT073, O157:H7 and (E.coli W) to 10 (E.coli K-12). In all, 45 ORFs representing 20 Nucleic Acids Research, 2004, Vol. 32, No. 2 527 Table 2. 
Percent nucleotide identity of framework genes around the ICR in the seven strains (E.coli and S.typhimurium LT2)

Strain comparison | yjiS (165 nt) | yjiA (854 nt) | yjiX (203 nt) | yjiY (2166 nt)
K1/CFT073 | 100 | 98 | 100 | 99
C/O157:H7 | 97 | 98 | 100 | 98
C/CFT073 | 85 | 98 | 100 | 98
C/K1 | 85 | 98 | 100 | 98
K-12/O157:H7 | 93 | 97 | 100 | 98
K-12/K1 | 87 | 97 | 100 | 99
CFT073/O157:H7 | 86 | 97 | 100 | 98
K-12/CFT073 | 87 | 96 | 100 | 99
C/K-12 | 93 | 96 | 100 | 98
C/W | 95 | 96 | 99 | 98
K-12/W | 93 | 95 | 99 | 98
K-12/S.typhimurium | 68 | 87 | 91 | 88
C/S.typhimurium | 67 | 86 | 91 | 88
K1/S.typhimurium | 75 | 85 | 91 | 88

Comparisons with E.coli K1 and CFT073 were done using the 87 nt C-terminal fragment of yjiS found in these strains.

Table 3. Percent nucleotide identity of migratory genes found in the variable locus between yjiS and yjiA

Strain comparison | Z5943 (882 nt) | Z5944 (1164 nt) | yjiW (399 nt) | hsdS (1367 nt) | hsdM (1588 nt) | hsdR (3562 nt) | mrr (914 nt) | Z5950 (1653 nt)
K1/CFT073 | 99 | 97 | 96 | Div (a) | Div (a) | Div (a) | – | –
C/O157:H7 | – | – | – | – | – | 99 (c) | – | 99 (b)
C/CFT073 | – | – | – | – | – | 99 | – | 99
K-12/O157:H7 | – | – | 94 | Div (a) | Div (a) | Div (a) | – | –
K-12/K1 | – | – | 94 | Patchy | 97 | 97 | 99 | –
CFT073/O157:H7 | 98 | 98 | 100 | Patchy (d) | 99 | 98 | – | 99 (b)
K-12/CFT073 | – | – | 94 | Div (a) | Div (a) | Div (a) | – | –
K-12/S.typhimurium | – | – | 79 | Patchy | 89 | 83 | 71 | –
K1/S.typhimurium | – | – | 81 | Patchy | 89 | 84 | 71 | –

(a) Div = highly diverged. Comparisons between Type IA and Type IB hsd genes were not possible using this procedure because the genes are too divergent.
(b) Comparisons with O157:H7 were done using a segment covering both Z5949 and Z5950; the Z5950 sequence found in C and CFT073 represents a fusion of the coding sequences designated Z5949 and Z5950 in O157:H7.
(c) Comparisons with E.coli C were done using the 565-nt N-terminal fragment of hsdR (Type IB) found in this strain.
(d) In comparisons within Type IA or Type IB hsdS genes, the alignment procedure yielded 85–96% identity within the carboxy conserved region, 40–50% identity within the carboxy target recognition domain (TRD), 94–97% identity within the central conserved region, 40–50% identity within the amino TRD, and 91–97% identity within the amino conserved region.

different genes are found here. Ten of these can be assigned a function. Collectively, RM genes comprise 21 of the 43 ORFs, and nine of the 10 assignable functions, consistent with the designation 'ICR'. Six of the seven strains have at least one RM gene or gene fragment in this region. The RM systems are highly variable: two of the four type I families are represented, most likely representing five different specificities (see below); modification-dependent enzymes of two unrelated sorts are also present. The remaining gene with assignable function is penicillin acylase, in E.coli W. The remaining 10 genes (21 ORFs) are of unknown function.

Some groups of genes migrate together (Fig. 2). The first group comprises the type IA restriction genes hsdS, hsdM, hsdR and mrr (orange boxes) found in three strains: K-12, S.typhimurium and K1. Second are the type IB restriction genes, also hsdS, hsdM and hsdR, and two ORFs of unknown function, Z5949 and Z5950 (dark blue boxes), which are found in O157:H7, CFT073 and C. Third, yjiW (yellow box), a gene of unknown function that may be SOS-regulated (49), appears in all five of the strains with complete type I systems. Fourth, Z5943 and Z5944 (green boxes) are found together in the three E.coli pathogens. The strain-specific genes not found anywhere else in this dataset (white boxes) include: the K-12 genes yjiT, yjiU, mcrD, mcrC and mcrB; the Salmonella ORFs STM4522 (on the left) and STM4528 and 4529 (between mrr and yjiA); and penicillin acylase [pac (50)], the sole gene found in place of the ICR region in E.coli W.

Intermediate scale view: DNA similarity patterns surrounding the ICR

The pattern of gene relationships described above is even more confusing than we had anticipated. To get a sharper sense of gene relationships, we derived PIP indices [percentage of aligned bases identical; as in Florea et al. (51)] for DNA segments spanning the ICR and its flanks, from yjiS to yjiY. E.coli C was compared with six of the other strains, K-12 with five, K1 with four, and O157 with CFT073. The results are shown in Tables 2 (for framework genes) and 3 (for migratory genes).

The levels of DNA identity in the framework genes are generally consistent with expectation (Table 2). In general, E.coli genes diverge at up to 5% of base pairs (52,53) (>95% identical), while E.coli–Salmonella divergence is 10–20% (53). For yjiX and yjiY, we observe divergence of up to 2% within E.coli and 9–12% between E.coli and Salmonella. Divergence is higher in the framework genes yjiS and yjiA, immediately adjacent to the variable locus: up to 15% for E.coli comparisons, 13–33% for E.coli–Salmonella. In all pairwise comparisons, yjiS is least conserved and yjiX is most conserved among the four framework genes. This may reflect higher levels of homologous exchange immediately around the variable locus, mediated by selection acting on the genes found there (23,54,55). Below (next section) we discuss evidence for homologous exchange in the generation of the set of strains under consideration.

The pattern of sequence variation between similar migratory genes (Table 3) is consistent with previous studies. The type IB hsdRM genes of O157 and CFT073 are highly similar to each other (98–99% nucleotide identity) and so highly divergent from the type IA hsdRM genes of K-12 that meaningful alignments cannot be made. This divergence is in line with earlier characterization of these families (56). The hsdS genes of the two pathogens show the patchy similarity expected for hsdS genes encoding different sequence specificity: conserved regions of the protein can be recognized as segments with 85–97% DNA identity, while the inferred DNA-binding regions are <50% identical (56). The K-12/Salmonella comparison has been made before (19,57).

A fine-scale view: borders of the ICR migratory elements

Figures 3 and 4 display nucleotide sequence alignments of the seven strains at the right (yjiA) and left (yjiS) borders of the ICR. These alignments display sequence differences in different colors: O157:H7 was taken as the reference sequence (black letters) to minimize the amount of color. Differences from this are coded in color hierarchically. Those differences found in K1 or CFT073 are in blue wherever they occur; additional differences found in C are in green; then further differences found in Salmonella are in red; then additional differences found in W are in yellow; and remaining differences found in K-12 are in pink.

Three properties of these border sequences are evident. First, extreme sequence divergence begins abruptly after the stop codon of the flanking framework gene (to the left of the yjiA stop codon in Fig. 3; to the right of the yjiS stop codon in Fig. 4), visually recognized as a high level of color variation.

Figure 3. DNA sequence alignments at the right (yjiA) border of the ICR. O157:H7 was taken as the reference sequence (black letters) to minimize the amount of color. Differences from this are coded in color hierarchically, and the main goal is to highlight the sudden transition to diversity at the yjiA stop codon. Where K1 differs from O157:H7, the K1 allele is blue (even in other strains); additional differences found in C are in green; further differences found in S.typhimurium are in red; then differences found in W are in yellow and finally remaining differences in K-12 are in pink.

Figure 4. DNA sequence alignments at the left (yjiS) border of the ICR.
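As a rough illustration of the percentage-identity (PIP) values reported in Tables 2 and 3, the following Python sketch computes the percentage of aligned bases that are identical between two sequences that have already been aligned. It is not the procedure of Florea et al. (51): the function name `percent_identity`, the use of '-' as the gap character and the decision to skip gapped columns are assumptions made for this example, and the toy sequences are invented.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of aligned (non-gap) columns with identical bases.

    Assumption for this sketch: columns where either sequence has a gap
    are not counted as aligned. Other tools may treat gaps differently.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    aligned = 0    # columns where both sequences carry a base
    identical = 0  # aligned columns where the two bases match

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1

    if aligned == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / aligned


if __name__ == "__main__":
    # Invented toy alignment: 7 of 8 aligned bases are identical.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Under these assumptions, the divergence figures quoted in the text correspond to 100 minus the value returned.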
In Figure 4, O157:H7 was taken as the reference sequence (black letters) to minimize the amount of color. Differences from this are coded in color hierarchically as in Figure 3. The sequence of the hsdR fragment found in C is boxed.

Second, families of sequences can nevertheless be identified, and at the right border these are congruent with families identified by patterns of shared genes seen in Figure 2. At the right border, O157:H7, CFT073 and E.coli C form one family (reference sequence, black), K1 and K-12 form another (blue highlighted changes); while S.typhimurium (red) and E.coli W (yellow) are each unique (Fig. 3). This is consistent with gene groupings: O157:H7, CFT073 and C all share Z5950 immediately to the left of yjiA, K1 and K-12 share mrr, W carries pac, and Salmonella carries STM4528 and STM4529 interposed between mrr and yjiA. Sequence families are so dissimilar that they appear to be unrelated. This suggests that four independent DNA acquisition events may have occurred at this location, immediately adjacent to yjiA.

Third, the family structure observed at the right border is almost totally erased at the left (yjiS) border (Fig. 4). As at the right border, sequences associated with four different categories of genes are found adjacent to the border (Z5943, yjiT, STM4522 and penicillin acylase), but deletion and homologous recombination have obscured relationships, and two novel sequences generate further variation.

K1 appears to be a recombinant between strains similar to K-12 and CFT073. Markers flanking the recombination interval are a short deletion and the different hsd family genes. The deletion joins the yjiR and yjiS coding sequences in CFT073 and K1 (Fig. 4). (Note that these genes are coded on opposite strands, so no fusion protein will be expressed.) CFT073 and K1 also share similar sequence to the right of the yjiS stop codon, the only two strains to do so. These two are thus grouped together at the left border, although they were not grouped at the right border. Inspection of Figure 2 suggests that a recombination event between strains similar to K-12 and CFT073, occurring within yjiW, could generate the gene configuration found in K1. The CFT073-like ancestor must have already carried the deletion, since K1 carries this as well.

This proposed sequence of events in CFT073 and K1 does not account for all the variation at the yjiS border, however. Two anomalies are evident in comparing Figures 2 and 4.

First, the pattern of shared genes seen next to yjiS (Fig. 2) would suggest that O157:H7 should belong with CFT073 and K1, yet O157:H7 carries 57 bp of unique sequence between the yjiS stop codon and common sequence to the left (upstream) of Z5943. Identity among the three strains begins abruptly after this 57 bp, 36 bp from the Z5943 start codon. In effect, O157:H7 has suffered a short replacement event, compared with K1 and CFT073.

The second anomaly concerns E.coli C. From Figure 2, we might have proposed that a simple deletion, occurring in a strain like one of the pathogens, had joined yjiS with the middle of hsdR. However, in Figure 4 we see that in E.coli C, 47 bp of unique sequence lies between the yjiS stop codon and the hsdR fragment. This sequence is unlike either that of O157:H7 or that shared by CFT073 and K1. Possibly some still-unknown cassette was acquired at some time to occupy the position of Z5943/4, followed by a deletion event. This brings the number of cassette acquisitions to five; in any case, some event occurred at this border to generate novel sequence.

In summary, there must have been six different events at the left border, and four different events at the right border. This suggests an active mechanism for generating sequence variation precisely at this locus. This mechanism is such that six of seven strains carry 50–60 base pairs of unique intergenic sequence at the yjiS border, even when genes immediately to the right of that unique sequence are shared. The border sequence (adjacent to the yjiS stop) also appears to form a barrier to sequence rearrangement, since deletion on the left (joining yjiR and yjiS in K1 and CFT073) and both deletion (in C) and replacement (in O157:H7) on the right do not cross this edge. Features of the right and left border sequences are considered in the Discussion.

Phylogenetic distribution of yjiAXY and adjacent sequences

We consider below (Discussion) the possibility that site-specific recombination mediates acquisition of elements of the ICR. Recombinases that mediate such events are usually found immediately adjacent to their sites of action. Therefore, the distribution and environment of the contiguous framework genes yjiA, yjiX and yjiY (Fig. 1) were considered in light of this possibility.

These framework genes are present in all complete E.coli, Shigella and Salmonella genomes but are absent from more distant enteric species: for example from the two complete Yersinia pestis genomes, the two complete Buchnera aphidicola genomes, or the partially completed Klebsiella pneumoniae sequences. Nevertheless, strongly conserved homologs of these genes are found in the distant non-enteric relative Pseudomonas aeruginosa (PA4604, PA4605 and PA4606). They are present in conserved order and orientation, and are >50% identical at the amino acid level. The conservation of organization may indicate a functional relationship among these genes (58).

Interestingly, all five of the sequenced Pseudomonas have yjiA, but the sequences immediately downstream differ (Table 4); the five ORFs at this location are unrelated. This situation would be expected if the gene cluster acts as a recombinase at an attachment site downstream of yjiA.

Table 4. YjiAXY homologs and ORFs C-terminal to (left of) YjiA in Pseudomonas species

Organism | ORF C-terminal to YjiA homolog and best hit to GenBank database | yjiA homolog (amino acid identity to E.coli K-12) | yjiX homolog (amino acid identity to E.coli K-12) | yjiY homolog (amino acid identity to E.coli K-12)
Pseudomonas fluorescens PfO-1 | ZP_00086222: Best hit to AAN67866, 'transcriptional regulator, GntR family' of Pseudomonas putida, 110/214 identical (51%) | ZP_00086223: 171/313 identical (54%) | ZP_00086224: 43/60 identical (71%) | ZP_00086225: 484/708 identical (68%)
Pseudomonas aeruginosa PAO1 | AAG07991: Best hit to NMB0780, hypothetical protein of Neisseria meningitidis, 63/117 identical (53%) | AAG07992: 169/318 identical (53%) | AAG07993: 45/67 identical (68%) | AAG07994: 489/708 identical (69%)
Pseudomonas putida KT2440 | AAN70211: Best hit to ZP_00103628, 'methylenetetrahydrofolate reductase domain protein' of Desulfitobacterium hafniense, 127/468 identical (27%) | AAN70212: 168/320 identical (52%) | AAN70213: 46/65 identical (70%) | AAN70214: 482/681 identical (70%)
Pseudomonas syringae pv. tomato str. DC3000 | AAO58081: Best hit to AA036475, 'membrane associated protein' of Clostridium tetani, 56/187 identical (29%) | AAO58082: 167/321 identical (52%) | AAO58083: 46/65 identical (70%) | AAO58084: 449/636 identical (70%)
Pseudomonas syringae pv. syringae B728a | ZP_00126198: Best hit to AAO58078, serine hydroxymethyltransferase of Pseudomonas syringae pv. tomato str. DC3000, 410/417 identical (98%) | ZP_00126197: 164/319 identical (51%) | ZP_00126196: 46/65 identical (70%) | ZP_00126195: 479/707 identical (67%)

DISCUSSION

Based on our sequence comparisons, the borders of the ICR are the yjiS and yjiA genes. A variety of migratory genes and gene fragments lie between these genes in the different strains. Between these border genes, the variability of the ICR begins immediately adjacent to their convergent stop codons. The identifiable genes found at the ICR have the common feature that they mediate interaction with external challenges of a sporadic nature: restriction systems (or remnants) to cope with invading DNA, or an enzyme for antibiotic modification (penicillin acylase). It is appropriate that genes with a sporadic distribution should respond to sporadic challenges. Below we consider some mechanistic models for generating this variation.

Figure 5. Dyad symmetry element in DNA sequence around the proposed empty 'attachment site'. Dyad symmetry elements can serve as binding sites or sites of action for proteins. Conserved sequence to the left of the yjiS stop codon was joined with sequence to the right of the yjiA stop codon for each strain, and aligned. Stop codons are boxed. The consensus sequence convention is: capital, all sequences identical; lower case, a majority of sequences identical. Mismatches to the consensus are lettered in red. Nucleotides involved in an interrupted dyad symmetry element are marked with a vertical line; mismatches within the dyad with dots.

Mechanism of segmental replacements at the ICR

Any mechanism proposed for segmental replacements at the ICR must account for the abrupt transition from similarity to dissimilarity at the borders (Figs 3 and 4). In addition, horizontal transfer must play a role, since divergence among the hsd genes of the E.coli strains is greater than that between those of K-12 and S.typhimurium.

Site-specific recombinase (cassette) model. Our preferred model invokes an attachment site for site-specific recombination. It is known that unrelated phages may carry related integrases and can use the same attachment site [e.g. phage 21 and e14; (59)]. Thus, the inserted sequence can be totally dissimilar except for the integration machinery. Such a model would allow the inserted segments to diverge over long periods, sufficient even to allow divergence of the families of type I restriction enzymes, while conserving the sequences required for recombinase action in the host genome by selection for function of the flanking genes. The precision of the borders is immediately accounted for. Cassettes of genes would be propagated to new hosts only if they retained sequences compatible with recombinase specificity. A pool of genes would then evolve that circulates specifically at this locus.

Recombinase candidate. The problem with this model is that for all site-specific mechanisms known to mediate DNA acquisition, the gene for the recombinase is closely associated with the recombining segment [this is not true for XerCD, which mediates resolution of circular dimers at unlinked sites of action; XerCD action is highly regulated and has not been reported to mediate sequence acquisition (60)]. For phages (or transposon Tn7 in site-specific mode), the gene(s) is (are) contained within the mobile segment, while in the case of integron integrases, the recombinase gene lies immediately outside the mobile segment. In the present example, no integrase/transposase candidate is immediately identifiable by sequence similarity. However, the neighboring gene cluster yjiAXY might code for a novel recombinase of episodic distribution. Consistent with this proposal, this cluster is found immediately adjacent to a variable locus in the distantly related genus Pseudomonas (Table 4). Interestingly, it is missing from several organisms more closely related to E.coli, suggesting that the cluster itself, as well as the cassettes adjacent, may be mobile, or at any rate dispensable.

Possible attachment site. Dyad symmetry elements (inverted repeats) are often prominent components of the DNA sites at which integrases act, although they can sometimes be difficult to detect (59). The alignment of sequences obtained by joining left and right borders of the ICR in the seven strains reveals such an element (Fig. 5), identified using the program STEMLOOP (see Materials and Methods). This joined sequence would be the presumed empty attachment site if these segments were prophage-like or cassette acquisitions as in integrons. A 13 bp dyad symmetry element (with two mismatches) surrounds the joined stop codons, interrupted by 5 bp including the yjiS stop. Even though no lambda-integrase-related gene is present, the dyad symmetry could still be important for a novel site-specific recombinase. We note further that the structure of the occupied sites is not that expected from a lambda-like integrase enzyme. Occupied sites should conserve the dyad structure at each border; that is, Figures 3 and 4 should also exhibit the presence of dyads or sequences related to them, especially in any consensus sequence.

Illegitimate recombination/transposition model. An alternative model would invoke random insertion events introducing new genes, for example by non-homologous end-joining [possibly stimulated by the action of a restriction enzyme (61)], or by transposition. Purifying selection against those insertions in unfavorable locations would eliminate most events. In the present case, the problem is to account for the apparently precise nature of the insertion process.

Illegitimate recombination has been invoked to explain the tight packing of related and unrelated genes in phage genomes. In a comparison of the genomic sequences of 14 mycobacteriophage, Pedulla et al. (62) also find a mosaic structure with segments separated by sharp transitions in sequence similarity. As in our bacterial ICR sequences, they find that most mosaic boundaries in these phage sequences are located at or very near gene boundaries, but of the >500 boundaries found, three disrupted coding sequence. Pedulla et al. invoked an earlier suggestion (63) that such diversity must be generated by a combination of illegitimate and homologous recombination and mutational drift. In their model, recombination can occur virtually anywhere along the phage sequence, but the only phages that survive to be analyzed have undergone recombination events that do not diminish the function of any phage protein on which natural selection acts.

In the case under study, this seems an unsatisfactory model. In E.coli K-12, the entire region can be deleted [Raleigh et al. (64) and unpublished observations], so selective constraints are presumably less than is the case for the streamlined phage genomes. Thus, the insertion locations would be expected to vary somewhat in different events. Moreover, the model does not account for the high frequency of highly divergent restriction genes at this location. At least six different events must have occurred among the seven strains, at least at the left (yjiS) border (see Results: A fine-scale view).

Differential deletion model. Differential deletion from a large original set of 'migratory genes' could in principle account for what we see. Selection for function of the framework genes would be required. This model would require an initial set of 20 different genes. Many of the deletion events would have had to occur in a highly precise fashion to generate the sharp borders observed. Trials on paper (not shown) suggest that 11 different deletion events and (contrary to the model) two small insertion events (of 47 and 57 bp, in E.coli C and O157:H7) would be required to generate the observed pattern. Even more genes and events would be required to generate the novel joints of C and O157:H7 by deletion only. Of those 13 events, eight (including both insertions) would involve one of the two borders in an event with a precise endpoint (five on the left and three on the right; not shown). Deletions would have had to bring the type I genes adjacent to yjiW in two different ways, with the type IA and type IB systems separately eliminated. This mechanism seems excessively baroque, and does not readily account for any of the properties of the locus.

RM variation in other taxa

Variation of RM systems within populations has been a topic of interest for other taxa also. In Mycoplasma genitalium, site-specific recombinase action enables reassortment of endogenous Type I specificity domains, such that any cell in the population may express one or two of at least eight possible specificities (65). Horizontal transfer is not needed for this.

In Helicobacter pylori, very large numbers of RM sequences are found in each isolate, with around two dozen candidate systems in each strain (66,67), scattered around the chromosome. Many of the type II system candidates turn out to be inactive or only partially active (68), with non-overlapping sets of four systems active in the two strains examined (67). Reactivation of inactive alleles by horizontal transfer and homologous recombination has been invoked as a regulatory mechanism (69). No locus has been described that contains multiple alternative specificities.

A closer analog of our situation is a variable restriction locus in Neisseria, found between two conserved genes, pheS and pheT (70). The variable region contains between zero and six genes, six of the nine strains contain RM-related genes, and divergence begins immediately after the start and stop codons of the flanking genes pheS and pheT. Saunders and Snyder invoke homologous recombination as a mechanism for reassortment of the contents of the variable sequences (70), much as we see with K1, an apparent recombinant between CFT073-like and K-12-like ancestors. There as here, however, homologous recombination could not account for the insertion of diverse sequence at precise locations. Neither dyad symmetry element nor candidate recombinase is identifiable in Neisseria.

Why is there so much variation?

The nature of selective pressures on restriction enzyme function has been considerably debated (see references in 13,17,71). All models for the evolution of these genes proceed from the diversity observed. Three general models have been proposed: that RM systems protect the host from invading DNA (phages or plasmids); that they enable the host to promote or regulate recombination (a sexual function); or that they have selfish features that promote their own spread. In all cases, there must be horizontal DNA transfer events to provide a selection substrate.

The site-specific recombinase model could account for the high frequency of restriction loci at this site, while not excluding that other functions, also advantageous when rare (72), might acquire the ability to circulate here (e.g. the pac gene). This could occur when such genes acquire sequence determinants that allow interaction with the putative recombinase.

Variable loci in enteric bacteria

Insertion/deletion variation has been extensively discussed in reports on enteric genome sequences, but most discussion has focused on the gene content of the variable islands, not the mechanism of acquisition. Phage-like islands or transposase homologues near island junctions have been noted in some cases, but these are present in only a fraction of examples. Clearly, some mechanisms remain to be discovered.

ACKNOWLEDGEMENTS

We thank Noreen Murray, Rich Roberts, Roger Milkman and Romualdas Vaisvila for extensive discussions and intellectual support during conception of this paper. Guy Plunkett shared sequence of the E.coli pathogens in advance of publication. Rob Edwards was sufficiently critical that we re-examined the sequence environment surrounding the proposed attachment site. We thank Don Comb and New England Biolabs for support.

REFERENCES

1. Raleigh,E.A. (1992) Organization and function of the mcrBC genes of E.coli K-12. Mol. Microbiol., 6, 1079–1086.
2. Boyer,H.W. (1964) Genetic control of restriction and modification in Escherichia coli. J. Bacteriol., 88, 1652–1660.
3. Arber,W. and Wauters-Willems,D. (1970) Host specificity of DNA produced by Escherichia coli. XII. The two restriction and modification systems of strain 15T-. Mol. Gen. Genet., 108, 203–217.
4. Barcus,V.A., Titheradge,A.J.B. and Murray,N.E. (1995) The diversity of alleles at the hsd locus in natural populations of Escherichia coli. Genetics, 140, 1187–1197.
5. Bullas,L.R., Colson,C. and Neufeld,B. (1980) Deoxyribonucleic acid restriction and modification systems in Salmonella: chromosomally located systems of different serotypes. J. Bacteriol., 141, 275–292.
6. Daniel,A.S., Fuller-Pace,F.V., Legge,D.M. and Murray,N.E. (1988) Distribution and diversity of hsd genes in Escherichia coli and other enteric bacteria. J. Bacteriol., 170, 1775–1782.
7. Bertani,G. and Weigle,J.J. (1953) Host-controlled variation in bacterial viruses. J. Bacteriol., 65, 113–121.
8. Gossen,J.A. and Vijg,J. (1988) E.coli C: a convenient host strain for rescue of highly methylated DNA. Nucleic Acids Res., 16, 9343.
9. Hall,R.M. and Collis,C.M. (1995) Mobile gene cassettes and integrons: capture and spread of genes by site-specific recombination. Mol. Microbiol., 15, 593–600.
10. Campbell,A., Schneider,S.J. and Song,B. (1992) Lambdoid phages as elements of bacterial genomes. Genetica, 86, 259–267.
33. Koonin,E.V., Mushegian,A.R. and Rudd,K.E. (1996) Sequencing and analysis of bacterial genomes. Curr. Biol., 6, 404–416.
34. Rappeley,C.A. and Roth,J.R. (1997) Transposition without transposase: a spontaneous mutation in bacteria. J. Bacteriol., 179, 2047–2052.
35. Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning: A Laboratory Manual. 2nd Edn.
11. Lacks,S. and Greenberg,B. (1977) Complementary specificity of
Cold Spring Harbor Laboratory Press, restriction endonucleases of Diplococcus pneumoniae with respect to Cold Spring Harbor, NY. DNA methylation. J. Mol. Biol., 114, 153±168. 36. Siedman,C.E., Struhl,K. and Jessen,T. (1997) Introduction of plasmid 12. Lacks,S.A., Mannarelli,B.M., Springhorn,S.S. and Greenberg,B. (1986) DNA into cells. In Ausubel,F.M., Kingston,R.B., Moore,D.D., Genetic basis of the complementary DpnI and DpnII restriction systems Seidman,J.G., Smith,J.A. and Struhl,K. (eds), Current Protocols in of S. pneumoniae: an intercellular cassette mechanism. Cell, 46, Molecular Biology, pp. 1.8.1±1.8.10. 993±1000. 37. Roper,D.I., Fawcett,T. and Cooper,R.A. (1993) The Escherichia coli C 13. Murray,N.E. (2002) Immigration control of DNA in bacteria: self versus homoprotocatechuate degradative operon: hpc gene order, direction of non-self. Microbiology, 148, 3±20. transcription and control of expression. Mol. Gen. Genet., 237, 241±250. 14. Ochman,H. and Selander,R.K. (1984) Standard reference strains of 38. Blattner,F.R., Plunkett,G., Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Escherichia coli from natural populations. J. Bacteriol., 157, 690±693. Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) 15. Milkman,R. (1973) Electrophoretic variation in Escherichia coli from The complete genome sequence of Escherichia coli K-12. Science, 277, natural sources. Science, 182, 1024±1026. 1433±1462. 16. Barcus,V.A. and Murray,N.E. (1995) Barriers to recombination: 39. Oh,S.J., Kim,Y.C., Park,Y.W., Min,S.Y., Kim,I.S. and Kang,H.S. (1987) restriction. Soc. Gen. Microbiol. Symp., 52, 31±58. Complete nucleotide sequence of the penicillin G acylase gene and the 17. Bickle,R.A. and Kru È ger,D.H. (1993) Biology of DNA restriction. ¯anking regions and its expression in Escherichia coli. Gene, 56, 87±97. Microbiol. Rev., 57, 434±450. 40. Prieto,M.A., Diaz,E. and Garcia,J.L. (1996) Molecular characterization 18. Titheradge,A.J., Ternent,D. and Murray,N.E. 
(1996) A third family of of the 4-hydroxyphenylacetate catabolic pathway of Escherichia coli W: allelic hsd genes in Salmonella enterica: sequence comparisons with engineering a mobile aromatic degradative cluster. J. Bacteriol., 178, 111±120. related proteins identify conserved regions implicated in restriction of DNA. Mol. Microbiol., 22, 437±447. 41. Roa,A. and Garcia,J.L. (1999) New insights into the regulation of the pac 19. Sharp,P.M., Kelleher,J.E., Daniel,A.S., Cowan,G.M. and Murray,N.E. gene from Escherichia coli W ATCC 11105. FEMS Microbiol. Lett., (1992) Roles of selection and recombination in the evolution of type I 177, 7±14. restriction-modi®cation systems in enterobacteria. Proc. Natl Acad. Sci. 42. Geer,L.Y., Domrachev,M., Lipman,D.J. and Bryant,S.H. (2002) USA, 89, 9836±9840. CDART: protein homology by domain architecture. Genome Res., 12, 1619±1623. 20. Olsen,G.J., Woese,C.R. and Overbeek,R. (1994) The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol., 43. Rudd,K.E. (1998) Linkage map of Escherichia coli K-12, edition 10: the 176, 1±6. physical map. Microbiol. Mol. Biol. Rev., 62, 985±1019. 21. McClelland,M., Florea,L., Sanderson,K., Clifton,S.W., Parkhill,J., 44. Berlyn,M.K. (1998) Linkage map of Escherichia coli K-12, edition 10: Churcher,C., Dougan,G., Wilson,R.K. and Miller,W. (2000) Comparison the traditional map. Microbiol. Mol. Biol. Rev., 62, 814±984. of the Escherichia coli K-12 genome with sampled genomes of a 45. Valle,F., Gosset,G., Tenorio,B., Oliver,G. and Bolivar,F. (1986) Klebsiella pneumoniae and three salmonella enterica serovars, Characterization of the regulatory region of the Escherichia coli Typhimurium, Typhi and Paratyphi. Nucleic Acids Res., 28, 4974±4986. penicillin acylase structural gene. Gene, 50, 119±122. 22. Milkman,R. (1999) Gene transfer in Escherichia coli. In Charlebois,R.L. 46. Radoja,S., Francetic,O., Stojicevic,N., Moric,I., Glisin,V. 
and (ed.), Organization of the Prokaryotic Genome. American Society for Konstantinovic,M. (1999) DNA region responsible for transcriptional Microbiology, Washington, DC, pp. 291±309. regulation of the Escherichia coli penicillin amidase (pac) gene by CRP 23. Milkman,R., Jaeger,E. and McBride,R.D. (2003) Molecular evolution of and PAA. Genet. Anal., 15, 235±238. the Escherichia coli chromosome. VI. Two regions of high effective 47. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., recombination. Genetics, 163, 475±483. Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a 24. Reeves,P. (1993) Evolution of Salmonella O antigen variation by new generation of protein database search programs. Nucleic Acids Res., interspeci®c gene transfer on a large scale. Trends Genet., 9, 17±22. 25, 3389±3402. 25. Liu,D. and Reeves,P.R. (1994) Presence of different O antigen forms in 48. Mira,A. and Ochman,H. (2002) Gene location and bacterial sequence three isolates of one clone of Escherichia coli. Genetics, 138, 6±10. divergence. Mol. Biol. Evol., 19, 1350±1358. 26. Lawrence,J.G. and Ochman,H. (1998) Molecular archaeology of the 49. Lewis,L.K., Harlow,G.R., Gregg-Jolly,L.A. and Mount,D.W. (1994) Escherichia coli genome. Proc. Natl Acad. Sci. USA, 95, 9413±9417. Identi®cation of high af®nity binding sites for LexA which de®ne new 27. Wood,W.B. (1966) Host speci®city of DNA produced by Escherichia DNA damage-inducible genes in Escherichia coli. J. Mol. Biol., 241, coli: bacterial mutations affecting the restriction and modi®cation of 507±523. DNA. J. Mol. Biol., 16, 118±133. 50. Schumacher,G., Sizmann,D., Haug,H., Buckel,P. and Bock,A. (1986) 28. Welch,R.A., Burland,V., Plunkett,G.,3rd, Redford,P., Roesch,P., Penicillin acylase from E.coli: unique gene-protein relation. Nucleic Rasko,D., Buckles,E.L., Liou,S.R., Boutin,A., Hackett,J. et al. (2002) Acids Res., 14, 5713±5727. Extensive mosaic structure revealed by the complete genome sequence of 51. 
Florea,L., Riemer,C., Schwartz,S., Zhang,Z., Stojanovic,N., Miller,W. uropathogenic Escherichia coli. Proc. Natl Acad. Sci. USA, 99, and McClelland,M. (2000) Web-based visualization tools for bacterial 17020±17024. genome alignments. Nucleic Acids Res., 28, 3486±3496. 29. Perna,N.T., Plunkett,G.,3rd, Burland,V., Mau,B., Glasner,J.D., Rose,D.J., 52. Wang,F.S., Whittam,T.S. and Selander,R.K. (1997) Evolutionary Mayhew,G.F., Evans,P.S., Gregor,J., Kirkpatrick,H.A. et al. (2001) genetics of the isocitrate dehydrogenase gene (icd) in Escherichia coli Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. and Salmonella enterica. J. Bacteriol., 179, 6551±6559. Nature, 409, 529±533. 53. Nelson,K. and Selander,R.K. (1992) Evolutionary genetics of the proline 30. McClelland,M., Sanderson,K.E., Spieth,J., Clifton,S.W., Latreille,P., permease gene (putP) and the control region of the proline utilization Courtney,L., Porwollik,S., Ali,J., Dante,M., Du,F. et al. (2001) Complete operon in populations of Salmonella and Escherichia coli. J. Bacteriol., genome sequence of Salmonella enterica serovar Typhimurium LT2. 174, 6886±6895. Nature, 413, 852±856. 54. Boyd,E.F., Nelson,K., Wang,F.S., Whittam,T.S. and Selander,R.K. 31. Parkhill,J., Dougan,G., James,K.D., Thomson,N.R., Pickard,D., Wain,J., (1994) Molecular genetic basis of allelic polymorphism in malate Churcher,C., Mungall,K.L., Bentley,S.D., Holden,M.T. et al. (2001) dehydrogenase (mdh) in natural populations of Escherichia coli and Complete genome sequence of a multiple drug resistant Salmonella Salmonella enterica. Proc. Natl Acad. Sci. USA, 91, 1280±1284. enterica serovar Typhi CT18. Nature, 413, 848±852. 55. Bisercic,M., Feutrier,J.Y. and Reeves,P.R. (1991) Nucleotide sequences 32. Worth,L.,Jr, Clark,S., Radman,M. and Modrich,P. 
(1994) Mismatch of the gnd genes from nine natural isolates of Escherichia coli: evidence repair proteins MutS and MutL inhibit RecA-catalyzed strand transfer of intragenic recombination as a contributing factor in the evolution of between diverged DNAs. Proc. Natl Acad. Sci. USA, 91, 3238±3241. the polymorphic gnd locus. J. Bacteriol., 173, 3894±3900. 534 Nucleic Acids Research, 2004, Vol. 32, No. 2 56. Murray,N.E. (2000) Type I restriction systems: sophisticated molecular 65. Dybvig,K., Sitaraman,R. and French,C.T. (1998) A family of phase- machines (a legacy of Bertani and Weigle). Microbiol. Mol. Biol. Rev., variable restriction enzymes with differing speci®cities generated by 64, 412±434. high-frequency gene rearrangements. Proc. Natl Acad. Sci. USA, 95, 57. Murray,N.E., Daniel,A.S., Cowan,G.M. and Sharp,P.M. (1993) 13923±13928. Conservation of motifs within the unusually variable polypeptide 66. Nobusato,A., Uchiyama,I. and Kobayashi,I. (2000) Diversity of sequences of type I restriction and modi®cation enzymes. Mol. restriction-modi®cation gene homologues in Helicobacter pylori. Gene, Microbiol., 9, 133±143. 259, 89±98. 58. Overbeek,R., Fonstein,M., D'Souza,M., Pusch,G.D. and Maltsev,N. 67. Xu,Q., Morgan,R.D., Roberts,R.J. and Blaser,M.J. (2000) Identi®cation (1999) The use of gene clusters to infer functional coupling. Proc. Natl of type II restriction and modi®cation systems in Helicobacter pylori Acad. Sci. USA, 96, 2896±2901. reveals their substantial diversity among strains. Proc. Natl Acad. Sci. 59. Campbell,A.M. (1992) Chromosomal insertion sites for phages and USA, 97, 9671±9676. plasmids. J. Bacteriol., 174, 7495±7499. 68. Kong,H., Lin,L.F., Porter,N., Stickel,S., Byrd,D., Posfai,J. and 60. Barre,F.X., Soballe,B., Michel,B., Aroyo,M., Robertson,M. and Roberts,R.J. (2000) Functional analysis of putative restriction- Sherratt,D. (2001) Circles: the replication-recombination-chromosome modi®cation system genes in the Helicobacter pylori J99 genome. 
segregation connection. Proc. Natl Acad. Sci. USA, 98, 8189±8195. Nucleic Acids Res., 28, 3216±3223. 61. Kusano,K., Sakagami,K., Yokochi,T., Naito,T., Tokinaga,Y., Ueda,E. 69. Aras,R.A., Small,A.J., Ando,T. and Blaser,M.J. (2002) Helicobacter and Kobayashi,I. (1997) A new type of illegitimate recombination is pylori interstrain restriction-modi®cation diversity prevents genome dependent on restriction and homologous interaction. J. Bacteriol., 179, subversion by chromosomal DNA from competing strains. Nucleic Acids 5380±5390. Res., 30, 5391±5397. 62. Pedulla,M.L., Ford,M.E., Houtz,J.M., Karthikeyan,T., Wadsworth,C., 70. Saunders,N.J. and Snyder,L.A. (2002) The minimal mobile element. Lewis,J.A., Jacobs-Sera,D., Falbo,J., Gross,J., Pannunzio,N.R. et al. Microbiology, 148, 3756±3760. (2003) Origins of highly mosaic mycobacteriophage genomes. Cell, 113, 71. Kobayashi,I. (2001) Behavior of restriction-modi®cation systems as 171±182. sel®sh mobile elements and their impact on genome evolution. 63. Juhala,R.J., Ford,M.E., Duda,R.L., Youlton,A., Hatfull,G.F. and Nucleic Acids Res., 29, 3742±3756. Hendrix,R.W. (2000) Genomic sequences of bacteriophages HK97 and 72. Levin,B.R. (1988) Frequency-dependent selection in bacterial HK022: pervasive genetic mosaicism in the lambdoid bacteriophages. populations. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 319, J. Mol. Biol., 299, 27±51. 459±472. 64. Raleigh,E.A., Trimarchi,R. and Revel,H. (1989) Genetic and physical mapping of the mcrA (rglA) and mcrB (rglB) loci of Escherichia coli K-12. Genetics, 122, 279±296.
Nucleic Acids Research – Oxford University Press
Published: Jan 16, 2004