Evan Mann, Matthew S. Kimber, Chris Whitfield

Abstract
The structures of bacterial cell surface glycans are remarkably diverse. In spite of this diversity, the general strategies used for their assembly are limited. In one of the major processes, found in both Gram-positive and Gram-negative bacteria, the glycan is polymerized in the cytoplasm on a polyprenol lipid carrier and exported from the cytoplasm by an ATP-binding cassette (ABC) transporter. The ABC transporter actively participates in determining the chain length of the glycan substrate, which impacts functional properties of the glycoconjugate products. A subset of these systems employs an additional, elaborate glycan-capping strategy that dictates the size distribution of the products. The hallmarks of prototypical capped glycan systems are a chain-terminating enzyme possessing a coiled-coil molecular ruler and an ABC transporter possessing a carbohydrate-binding module that recognizes the glycan cap. To date, detailed investigations have been limited to a small number of prototypes; here, we used our current understanding of these processes to conduct a bioinformatics census of other examples in available genome sequences. This study not only revealed additional instances of existing terminators but also predicted new chemistries, as well as systems that diverge from the established prototypes. These analyses enable some new functional hypotheses and offer a roadmap for future research.

Keywords: ABC transporter, bacterial polysaccharides, carbohydrate-binding module, coiled coil, glycan modifications

Introduction

Bacterial cells display complex carbohydrate-containing structures on their surfaces. The glycans include capsular polysaccharides (CPSs) and the N/O-linked glycans of glycoproteins, as well as the lipopolysaccharide (LPS) O-antigen polysaccharide (OPS), which is confined to Gram-negative organisms, and teichoic acids and secondary cell-wall polymers (SCWPs) in Gram-positive bacteria (Whitfield et al. 2017). These glycans play prominent roles in many important biological processes, including biofilm formation, host–pathogen and host–symbiont interactions, and protection against mammalian immune defenses. They are also major contributors to microbial diversity, which is shaped by a variety of factors, including the host immune response (in pathogens) and bacteriophages that exploit glycans as receptors (Mostowy and Holt 2018).

Despite the structural diversity evident in their surface glycans, bacteria rely on a few well-conserved strategies to assemble and export these molecules. The three main assembly processes are largely defined by the exporter family employed to translocate finished products (or their repeat-unit building blocks) across the cytoplasmic membrane. These exporters are synthase proteins, which couple synthesis and export (Keenleyside and Whitfield 1996; Whitney and Howell 2013; McNamara et al. 2015); multidrug/oligosaccharidyl-lipid/polysaccharide family flippases (e.g., MurJ; Zheng et al. 2018; Kuk et al. 2019); or ATP-binding cassette (ABC) transporters. The latter class of transporters is the focus of this review.

Biosynthesis of OPS substrates for glyco-ABC transporters

We have exploited LPS OPS from Escherichia coli and Klebsiella pneumoniae as model systems to elucidate the factors involved in the recognition and export of glycan substrates by ABC transporters. LPS is a complex glycolipid that is conceptually divided into three structural regions: lipid A, a core oligosaccharide (core OS), and (in many cases) an OPS (Raetz and Whitfield 2002; Whitfield and Trent 2014). OPS is the most variable constituent of LPS (>180 structural O-serotypes in E. coli isolates alone), owing to different combinations of monosaccharides, linkage types, and nonsugar modifications. The lipid A-core OS and OPS components of LPS follow independent biosynthesis and export pathways and are joined together in the periplasm.
The completed LPS molecule is then translocated to the cell surface by a transenvelope (Lpt) multiprotein machine (Okuda et al. 2016). Many OPS assembly systems involve an ABC exporter, and they provide tractable models for studying general functional aspects of this family of glyco-exporters because of their known glycan structures, defined genetic loci, established (in some cases) biochemical activities required for synthesis, and experimental tools to evaluate transport in vivo. Comparable ABC transporter-dependent processes occur for the N/O-linked glycans attached to S-layer proteins (SLGs), for SCWPs or teichoic acids, and for some Gram-negative CPSs, so the outcomes of OPS investigations can be widely applicable.

Currently, two subgroups of ABC transporter-dependent OPS biosynthesis systems have been described, differing primarily in the mechanism of glycan engagement (Greenfield and Whitfield 2012; Liston et al. 2017). The simpler, “classical” ABC transporter-dependent OPS export model is represented by K. pneumoniae serotype O2a (Kos et al. 2009; Figure 1A). All OPS are assembled using undecaprenyl (Und)-linked intermediates, and O2a OPS assembly requires three dedicated glycosyltransferase (GT) enzymes (WbbMNO), which form a membrane-associated complex (Kos and Whitfield 2010). This system generates a wide distribution of OPS chain lengths, ranging from a couple of repeat units to >20 (Whitfield et al. 1997); this distribution is dictated by the relative activities of the polymerizing GTs and the ABC transporter (Kos et al. 2009).

Fig. 1 Biosynthesis of prototype OPSs. The genetic loci and lipid-linked OPS intermediates from K. pneumoniae O2a (A) and E. coli O9a (B) are shown. OPS glycans exported via an ABC transporter (encoded by wzm-wzt) are synthesized at the cytosol–inner membrane interface on an Und-PP-GlcNAc “primer,” which is generated by WecA, a phosphoglycosyl transferase from the enterobacterial common antigen synthesis pathway (Raetz and Whitfield 2002).
Specific GTs add an “adaptor” domain, creating an appropriate acceptor for polymerizing enzymes (possessing multiple GT catalytic sites), which extend the repeat-unit domain. The polymerases are WbbM in K. pneumoniae O2a (A) and WbdA in E. coli O9a (B). In O9a, WbdD is the chain terminator possessing kinase and methyltransferase domains. The symbol representations for sugar residues used in this and subsequent figures follow the established convention (Whitfield et al. 2017).

In contrast, E. coli serotype O9a is the founding model for the more complex “chain-terminating” OPS assembly strategy. This model incorporates additional intricate mechanisms (absent in O2a), which impose stricter control over the chain-length distribution of the OPS products and confer on the ABC exporter specificity for the glycan substrate. The dual-domain GT polymerase, WbdAO9a (Figure 1B), operates in a distributive (rather than processive) manner, in which the nonreducing terminus is released from the enzyme after each step (Greenfield et al. 2012b; Liston et al. 2015). The chain length of O9a OPS is dictated by the chain terminator (WbdD), independently of the ABC transporter. The C-terminus of WbdDO9a contains a membrane-anchoring amphipathic helix adjacent to a region that recruits WbdAO9a into an active heterocomplex (Figure 2A) (Clarke et al. 2009; Liston et al. 2015). The N-terminus of WbdDO9a possesses kinase and methyltransferase catalytic sites, and the addition of a phosphomethyl moiety to the nonreducing terminal mannose (Man) residue blocks further OPS extension (Clarke et al. 2004; Clarke et al. 2011; Hagelueken et al. 2012). A narrow range of OPS lengths (9–18 repeat units) is produced, and this range is influenced by two parameters: the stoichiometry of WbdAO9a and WbdDO9a (King et al. 2014) and a coiled-coil molecular ruler provided by WbdDO9a. The ruler physically separates the chain-terminating catalytic sites from the membrane, ensuring that OPS termination cannot occur before a minimum length is achieved (Hagelueken et al. 2015). The principles of the O9a model are conserved in E. coli O8, where the WbdAO8 polymerase generates a poly-Man glycan with a repeat-unit structure different from that of O9a, which is then capped with a methyl group. The cognate WbdDO8 protein possesses a coiled-coil region and a methyltransferase catalytic site (Clarke et al. 2004; Greenfield et al. 2012a).

K. pneumoniae O12 exploits a different organization of catalytic sites to achieve the same goal (Figure 2B). The nonreducing terminal rhamnose (Rha) residue of the O12 OPS is capped by a single β-(2–3)-linked 3-deoxy-D-manno-oct-2-ulosonic acid (Kdo) residue (Vinogradov et al. 2002). A trifunctional enzyme (WbbB) contains two C-terminal GTs (belonging to CAZy families GT102 and GT103), which are responsible for polymerization of the O12 OPS chain (Figure 2C; Williams et al. 2017); these are separated from the N-terminal terminating β-Kdo GT (GT99) (Ovchinnikova et al. 2016) by a coiled-coil molecular ruler.
In these systems, only OPS that is capped at its nonreducing terminus is a substrate for the ABC transporter, and engagement of the glycan terminus with a specific carbohydrate-binding module (CBM) is a prerequisite for export.

Fig. 2 Models for chain extension and termination in E. coli O9a and K. pneumoniae O12. The figure shows the working models for the membrane-associated polymerization–termination complexes from E. coli O9a (A) and K. pneumoniae O12 (B). (C) Schematic cartoons of the domains in the key proteins. In E. coli O9a (A), WbdA and WbdD form a membrane-anchored heterocomplex; WbdD forms a trimer. When the OPS reaches a minimum length, defined by a coiled-coil molecular ruler in WbdD, it is capped with methylphosphate by WbdD at the site of chain elongation. K. pneumoniae O12 WbbB (B) incorporates chain extension (WbbB-C) and termination (WbbB-N) into a single protein, with a coiled-coil molecular ruler between the domains. WbbB is thought to form a dimer.
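The interplay between a minimum-length ruler and terminator availability described above can be illustrated with a deliberately simplified simulation. This is a toy sketch under stated assumptions, not a model taken from the cited studies: each polymerase encounter adds one repeat unit (a distributive step), the terminator cannot reach the chain end until the chain spans at least `min_len` repeat units (standing in for the coiled-coil ruler), and thereafter each encounter caps the chain with probability `p_cap`, a hypothetical stand-in for terminator availability. The names `simulate_chain`, `length_distribution`, `min_len` and `p_cap` are illustrative only.

```python
import random
from statistics import mean


def simulate_chain(min_len: int, p_cap: float, rng: random.Random) -> int:
    """Return the length (in repeat units) at which one chain is capped.

    Toy assumptions (hypothetical, not from the source studies):
    - each polymerase encounter adds one repeat unit, and the chain end is
      released after every step (distributive extension);
    - the terminator cannot reach the chain end until the chain spans at
      least `min_len` units (the coiled-coil 'ruler');
    - once in reach, each encounter caps the chain with probability
      `p_cap` instead of extending it.
    """
    if min_len < 1 or not 0.0 < p_cap <= 1.0:
        raise ValueError("need min_len >= 1 and 0 < p_cap <= 1")
    length = 1
    while True:
        if length >= min_len and rng.random() < p_cap:
            return length  # capped: the chain can no longer be extended
        length += 1  # extended by one repeat unit


def length_distribution(min_len: int, p_cap: float,
                        n: int = 5000, seed: int = 0) -> list[int]:
    """Simulate n independent chains with a reproducible random seed."""
    rng = random.Random(seed)
    return [simulate_chain(min_len, p_cap, rng) for _ in range(n)]


if __name__ == "__main__":
    for p in (0.2, 0.6):  # low vs high terminator availability (hypothetical)
        lengths = length_distribution(min_len=9, p_cap=p)
        print(f"p_cap={p}: min={min(lengths)}, "
              f"mean={mean(lengths):.1f}, max={max(lengths)}")
```

In this sketch the ruler guarantees that no chain is capped below `min_len`, and a larger `p_cap` concentrates lengths near that minimum. The simulated distribution has a geometric tail rather than the upper bound observed for O9a (9–18 repeat units), so it captures only the lower-bound behavior; the dependence on `p_cap` is an assumption of the toy model.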
Architecture of glyco-ABC transporters

Members of the ABC transporter superfamily are found in all kingdoms of life, where they are responsible for the active transport of various substrates (reviewed in Locher 2016). ABC transporters possess a core architecture containing two transmembrane domains (TMDs) that form the translocation pathway and two nucleotide-binding domains (NBDs) that convert the chemical energy of ATP into mechanical energy to drive transport. NBDs possess conserved motifs (e.g., the Walker A and B motifs and the ABC transporter signature sequence), which contribute essential residues for ATP binding and hydrolysis. Most ABC transporters operate via an alternating access mechanism, whereby access to a central substrate-binding site within the TMDs alternates between opposite faces of the membrane. MsbA, which is responsible for export of the LPS lipid A-core glycolipid, offers a good example. LPS enters MsbA from the inner leaflet of the cytoplasmic membrane and binds deep within a pocket that extends into the outer leaflet (Mi et al. 2017). The hydrophobic binding site accommodates the LPS acyl chains, while the core OS largely remains solvent exposed. ATP binding by the NBDs drives switching of the TMDs to an outward-facing orientation, releasing the LPS to the outer leaflet.

Alternating access mechanisms are conceptually difficult to reconcile with export substrates composed of (up to) hundreds of sugars attached to long-chain polyisoprenoid lipids; these transporters instead require more complex strategies. This is highlighted by X-ray crystal structures of the PglK and Wzm-Wzt exporters (Figure 3A). Campylobacter jejuni PglK flips an Und-PP-heptasaccharide substrate for protein N-glycosylation in the periplasm and relies on an “outward-only” mechanism in which the TMDs open only to the extracytosolic membrane face (Perez et al. 2015).
In this mechanism, the lipid component is proposed to remain within the lipid bilayer, associated with an external helix located on each TMD (Figure 3B). Interestingly, MurJ also possesses a lateral gate, leading to speculation that this is a conserved strategy for transport of polyprenol-linked oligo/polysaccharides (Kuk et al. 2019). In PglK, the pyrophosphate group of the substrate is electrostatically attracted into the electropositive lumen of outward-facing PglK, which supports transport of the glycan through the channel. The homologous Helicobacter pylori OPS flippase (Wzk) shares ~37% sequence identity with PglK (E-value = 1e−115). PglK can functionally replace Wzk in export of high-molecular-weight H. pylori OPS in genetic complementation experiments (Hug et al. 2010), demonstrating that the PglK mechanism is also compatible with the export of long polysaccharide chains. The molecular details of how this is achieved are unclear and require direct investigation.

Fig. 3 Structure and mechanism of Und-PP-glycan exporters. (A) Crystal structures of C. jejuni PglK (PDB id 5C73) and Aquifex aeolicus VF5 Wzm-Wzt (PDB id 6AN7). PglK lipid-anchoring helices are indicated in red. Wzm protomers are colored blue and Wzt protomers are colored red. The approximate position of the membrane is indicated. The view of Wzm-Wzt from the periplasm reveals a continuous channel extending through the membrane. (B) PglK uses an “outward-only” mechanism to flip an Und-PP-heptasaccharide substrate. The pink star represents ADP and the black star represents ATP. (C) During the Wzm-Wzt export mechanism, the OPS is proposed to extend fully through the channel. The periplasmic exit site of Wzm is plugged with a lipid when the transporter is disengaged from polysaccharide.

PglK and Wzk are half transporters, composed of two identical subunits that each contain a TMD and an NBD, whereas most OPS ABC transporters are heterotetramers composed of two TMD (Wzm) and two NBD (Wzt) proteins. Currently, the only reported Wzm-Wzt structure is the Aquifex aeolicus VF5 complex, which revealed a striking continuous channel extending through the membrane-embedded Wzm dimer (Bi et al. 2018) (Figure 3A). A lipid plug is predicted to close the pore when substrate is not in the channel (Caffalette et al. 2019), as illustrated in the cartoon model shown in Figure 3C. The channel is wide enough to accommodate a linear polysaccharide and is lined with aromatic residues, which are commonly involved in polysaccharide binding through π-stacking interactions. The A. aeolicus and E. coli O9a Wzm-Wzt transporters share a novel cytosolic gate helix (contributed by Wzt), located close to the channel entrance, that is essential for function (Bi et al. 2018). This helix forms part of an electropositive pocket that likely attracts the Und-PP component of the substrate. Comparison of the apo- and ATP-bound structures of Wzm-WztAa revealed structural rearrangements near the cytosolic entrance during ATP turnover that could provide the mechanical force necessary to “push” the polysaccharide substrate through the channel in a stepwise manner (Figure 3C).
Sequence-level comparisons of SLG and teichoic acid exporters reveal conserved structural features (including the gate helix) and suggest a mechanism similar to that of OPS export (Liston et al. 2017; Bi et al. 2018).

CBMs confer exporter specificity

The K. pneumoniae O2a Wzm-Wzt ABC transporter does not impart specificity for the repeat-unit structure of the OPS substrate (Kos et al. 2009; Mann et al. 2016), and it is speculated that some (or all) of the conserved Und-PP-N-acetylglucosamine (GlcNAc) portion of the biosynthetic intermediate is involved in transporter engagement (Figure 3C). However, because export is obligately coupled to chain elongation in this system (Kos et al. 2009), we cannot exclude an alternative (but less likely) strategy in which a GT component of the biosynthetic machinery engages Wzm-Wzt without direct recognition of the substrate. The same lack of specificity for the glycan repeat-unit structure is observed with the ABC transporter for Und-linked wall teichoic acid intermediates (Schirner et al. 2011; Brown et al. 2013) and with a family of Gram-negative CPS transporters that are thought to recognize a conserved glycosylated glycerophospholipid at the reducing terminus (Whitfield 2006; Willis and Whitfield 2013).

In contrast, systems that operate with a chain-terminating mechanism possess glycan export specificity that is conferred by a CBM. The CBM is appended to the C-terminus of Wzt (Wzt-C; absent in Wzt from K. pneumoniae O2a) (Figure 4A and B). Exchanging only the CBM alters the transporter’s substrate specificity to match the OPS of the CBM source (Cuthbertson et al. 2005). Transport of the OPS in E. coli O9a and K. pneumoniae O12 requires both the CBM (Cuthbertson et al. 2007; Mann et al. 2016) and the cognate chain-terminating modification (Cuthbertson et al. 2005; Cuthbertson et al. 2007).
The terminal modification confers overall chain-length control by ensuring that substrates have been appropriately modified (after achieving a prescribed length) prior to export. These significant differences in substrate engagement are reflected in the observation that, unlike in K. pneumoniae O2a, polymerization and export are not obligately coupled (Kos et al. 2009).

Fig. 4 Structures of Wzt-C CBM representatives. (A) Cartoon representation of the E. coli O9a CBM (PDB id 2R5O) with one chain displayed in rainbow and the other in gray. The inset panel displays the putative binding site in the O9a CBM; the labeled residues are essential for in vitro LPS binding and in vivo OPS export. (B) A structural alignment (r.m.s.d. 2.0 Å) of Wzt-CO9a (yellow) and Wzt-CO12 (green; PDB id 5HNO). (C) Schematic diagrams showing the organization of Wzt proteins and sequence comparisons. The NBDs (amino acids 1 to ~250) contain essential conserved ABC transporter motifs. The CBMs, ~200 amino acids in length, confer substrate specificity and are less conserved than the NBDs. Values indicate the % identity (% similarity) shared by each domain and the equivalent domain in WztO9a.

CBMs are typically appended to glycan-active enzymes to potentiate prolonged interactions with their substrates (Boraston et al. 2004). WztO9a-C contains an immunoglobulin-like fold with back-to-back antiparallel β-sheets, commonly seen in CBMs, and forms a tight dimer through β-strand exchange (Figure 4A) (Cuthbertson et al. 2005; Cuthbertson et al. 2007). The WztO9a-C binding interface maps to a groove on the face of the four-stranded β-sheet and is critically dependent on aromatic and charged residues, which are presumed to interact with the polysaccharide through π-stacking with the sugar rings and through polar–electrostatic interactions, respectively. The overall architecture is conserved in WztO12-C, despite low primary sequence similarity and a substrate-binding mode that relies on polar and electrostatic side chains (Mann et al. 2016). At the sequence level, A. aeolicus Wzm-Wzt resembles E. coli O9a Wzm-Wzt; its putative CBM was removed to facilitate crystallization. Based on the position of the C-terminus of the crystallized WztAa-N, the CBM is predicted to be located at a membrane-distal position, with glycan binding occurring between the NBD and the CBM (Bi et al. 2018).

The precise role of Wzt-C in the Wzm-Wzt transport process remains poorly understood. In the current model, the CBM is predicted to direct substrate to a position that facilitates its diffusion into the transport pathway. However, in vitro ATP hydrolysis by Wzt is stimulated by the CBM (without polysaccharide substrate) (Bi et al. 2018), suggesting that the CBM may also play an active role in regulating ATP binding or hydrolysis, like the (non-CBM) regulatory domains appended to the NBDs of some ABC importers (Bordignon et al. 2010; Yang and Rees 2015). Interestingly, functional in vivo export is possible when the Wzt NBD and CBM are expressed as separate polypeptides (Cuthbertson et al. 2007; Mann et al. 2016), so there remains much to learn about the contribution of the CBM to export and its interaction with the central transporter.

Systems with known terminated glycans reveal conserved features in the assembly machinery

The prototype systems described above suggest guiding principles for understanding exporters associated with chain-terminating mechanisms. We therefore set out to examine the broader applicability of these conserved concepts, beginning with a survey of systems in which a terminal capping residue has been reported in a glycan structure and sequence data are available for a corresponding wzt-containing gene locus (Table I). This is currently a very small subset of known bacterial glycan structures, reflecting both the difficulty of assigning terminal modifications and the fact that they have not been directly investigated in most studies. For each of the identified polysaccharide synthesis gene clusters, translated ORF sequences surrounding the Wzt homolog were analyzed with COILS from the Swiss Institute of Bioinformatics (Parry 1982; Lupas et al. 1991; Lupas 1996) to identify proteins with coiled-coil regions that might serve as a chain-length-restricting spacer domain. Consistent with the models above, most of the relevant gene clusters revealed Wzt homologs with C-terminal domains and total protein lengths ranging from 395 to 469 amino acids (compared with 246 amino acids for Wzt from K. pneumoniae O2a, which possesses only the core NBD; Figure 4C). Definitive assignment of CBM activity requires direct biochemical investigation, but the predicted structures of all of the C-terminal domains do resemble the CBMs from E. coli O9a and K. pneumoniae O12: Phyre2 (Kelley et al. 2015) models them as immunoglobulin-like folds with confidence (data not shown). However, to our surprise, ORFs encoding proteins with predicted coiled coils were identified in only some examples.
Other representatives lacked either the CBM or the coiled-coil protein, and these suggest variations on the existing export models (see below).

Table I. Polysaccharides with known structures containing nonreducing terminal modifications that are exported by ABC transporters. Structures were obtained from the bacterial carbohydrate structure database (http://www.csdb.glycoscience.ru/bacterial/main.html) (Toukach and Egorova 2016). The features of the Wzt proteins and of the proteins containing coiled-coil domains are shown. The symbol representations for sugar residues used in this and subsequent tables follow the established convention (Whitfield et al. 2017).

Organism | Glycan | Terminator | GenBank ID | Wzt ID | Wzt length | Terminator ID | Terminator domain organization | Biosynthesis model | References
A. thermoaerophilus L420-91T | SLG | Me-3- | AY442352.1 | AAS55714.1 | 408 | AAS55715.1 | CC – MeTase (PF08241) | IIa | Schäffer et al. 1999
B. bronchiseptica MO149 | OPS | Me-4- | HE965807.11 | CCJ56882.1 | 454 | ND | ND | IIb | Vinogradov et al. 2010
B. mallei BM2308, ATCC 23344 | OPS | Me-3- | NC_006348.1 | YP_103572.1 | 465 | ND | ND | IIb | Heiss et al. 2013
E. coli O8 | OPS | Me-3- | AB010150.1 | BAA28325.1 | 404 | BAA28326.1 | MeTase (PF13489) – CC | IIa | Jansson et al. 1985
E. coli O9a | OPS | Me-P-3- | AB010293.1 | BAA28332 | 431 | AFQ31610.1 | MeTase (PF13847) – Kinase (PF00069) – CC | IIa | Parolis et al. 1986; Clarke et al. 2004; Kubler-Kielb et al. 2012
G. stearothermophilus NRS 2004/3a | SLG | Me-2- | AF328862.2 | AAR99607.1 | 409 | WsaE (AAR99608.1) | MeTase (PF08241) – CC – GT (PF00535) – GT (PF00535) | IIa | Schäffer et al. 2002; Steiner et al. 2008
G. tepidamans GS5-97T | SLG | | AY883421.5 | ABM68319.1 | 395 | WsbH (ABM68321.1) | GT (SSF53756) – CC | IIa | Kählig et al. 2005
K. pneumoniae O12 | OPS | | AF097519.2 | AAN06493.1 | 440 | AAN06494 | GT99 (PF05159) – CC – GT102 – GT103 | IIa | Vinogradov et al. 2002; Williams et al. 2017
K. pneumoniae O4 | OPS | | LT174605.1 | CZQ25323.1 | 469 | CZQ25322.1 | ND – CC | IIa | Vinogradov et al. 2002
R. etli CE3 | OPS | | CP000133.1 | ABC89568.1 | 443 | Unknown | ND | IIc | Forsberg et al. 2000; Lerouge et al. 2001; Ojeda et al. 2010
V. cholerae O1 Ogawa | OPS | Me-2- | KC152957.1 | AGA82302.1 | 271 | RfbT (AGA82308.1) | MeTase (PF05575) | Ib | Redmond 1979; Stroeher et al. 1992; Hisatsune et al. 1993; Ito et al. 1994
*CC = coiled coil; MeTase = methyltransferase; PF = Pfam family.

Where Pfam domains were identified in the coiled-coil-containing ORFs (using InterProScan; El-Gebali et al. 2019; Mitchell et al. 2019), they typically predicted an activity (e.g., methyltransferase or GT) entirely consistent with the known chain-terminating residue.
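The length contrast noted above (395 to 469 amino acids for Wzt homologs with C-terminal domains, versus 246 amino acids for the NBD-only Wzt of K. pneumoniae O2a) suggests a crude first-pass filter. The sketch below applies such a filter to the Wzt lengths listed in Table I. The 300-residue cutoff and the names `WZT_LENGTHS`, `EXTENSION_CUTOFF` and `has_c_terminal_extension` are hypothetical choices for illustration, and a length test alone cannot establish CBM activity, which still requires biochemical investigation.

```python
# Wzt lengths (amino acids) as listed in Table I.
WZT_LENGTHS = {
    "A. thermoaerophilus L420-91T": 408,
    "B. bronchiseptica MO149": 454,
    "B. mallei BM2308, ATCC 23344": 465,
    "E. coli O8": 404,
    "E. coli O9a": 431,
    "G. stearothermophilus NRS 2004/3a": 409,
    "G. tepidamans GS5-97T": 395,
    "K. pneumoniae O12": 440,
    "K. pneumoniae O4": 469,
    "R. etli CE3": 443,
    "V. cholerae O1 Ogawa": 271,
}

CORE_NBD_LENGTH = 246   # K. pneumoniae O2a Wzt, NBD only (from the text)
EXTENSION_CUTOFF = 300  # hypothetical threshold, well above the NBD-only length


def has_c_terminal_extension(length: int, cutoff: int = EXTENSION_CUTOFF) -> bool:
    """Flag a Wzt homolog as likely carrying a C-terminal (CBM-like) domain."""
    return length >= cutoff


extended = sorted(org for org, n in WZT_LENGTHS.items()
                  if has_c_terminal_extension(n))
core_only = sorted(org for org, n in WZT_LENGTHS.items()
                   if not has_c_terminal_extension(n))

if __name__ == "__main__":
    print("Extended Wzt:", ", ".join(extended))
    print("NBD-only Wzt:", ", ".join(core_only))
    # Approximate size of the extension relative to the O2a core NBD.
    for org in extended:
        print(f"{org}: ~{WZT_LENGTHS[org] - CORE_NBD_LENGTH} extra residues")
```

With these values, only V. cholerae O1 Ogawa (271 amino acids, the one entry assigned to biosynthesis model Ib in Table I) falls below the cutoff; the remaining extensions, roughly 149 to 223 residues beyond the O2a core, are comparable in size to the ~200-amino-acid CBMs described for Figure 4.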
Most of these examples are OPSs (including the established prototypes), but SLGs from glycosylated paracrystalline surface protein arrays in the Bacillaceae were also identified; similarities in the biosynthesis of OPSs and SLGs have been recognized before (reviewed in Schäffer and Messner 2017). For example, the SLG from Geobacillus stearothermophilus NRS 2004/3a consists of poly-l-Rha terminated with a 2-O-methyl group (Schäffer et al. 2002). In this system, WsaE is a multidomain enzyme reminiscent of K. pneumoniae O12 WbbB: two C-terminal GT domains participate in polymerization (together with another GT) and are separated from the N-terminal methyltransferase by a coiled-coil region (Steiner et al. 2008). The SLG from Anoxybacillus tepidamans GS5-97T is unusual in being capped with GlcNAc and N-acetylmuramic acid (MurNAc) on C-2 and C-3 of the final Rha, respectively (Kählig et al. 2005; Zayni et al. 2007). The synthetic pathway for this glycan has not been elucidated, but one ORF in the gene locus predicts a single-domain GT (WsbH) with a coiled coil, consistent with it serving as a terminator protein. The precise activity of WsbH cannot be assigned definitively from sequence data, but GlcNAc at the C-2 position directly obstructs addition of the next glycose.

Can bioinformatics data predict the identity of novel chain-terminating residues?

In principle, any chemical group could serve as a chain-terminating residue, either by being added at the position used for GT-catalyzed extension or by sterically inhibiting acceptor recognition by the polymerase. These terminal residues could include glycan substituents found in other (non-terminating) contexts, such as acetate, alanine, sulfate, pyruvate, carbamate, and phosphate, or any sugar whose addition breaks the glycan repeat-unit pattern.
To search for novel terminating residues, we therefore screened databases for gene clusters containing an ORF predicted to encode an extended NBD protein and then investigated the predicted activities of any corresponding coiled-coil-containing proteins. The ABC transporter superfamily (PF00005) constitutes the single largest Pfam family, while the E. coli O9a CBM is a member of the Wzt_C protein family (PF14524). Pfam identifies the members of a family using a hidden Markov model (HMM) generated with HMMER3 (http://hmmer.org/) (Sonnhammer et al. 1998; El-Gebali et al. 2019). Because this approach does not rely on annotations of neighboring genes, it can identify Wzt_C domains in incomplete sequence data or where annotations are erroneous or outdated. However, it cannot unequivocally identify proteins involved in glycan export without additional information about the local genetic context. Conversely, some proteins expected to belong to the Wzt_C family are not identified. For example, B. bronchiseptica MO149 Wzt is not annotated as a family member, yet it clearly possesses a C-terminal extension that models as a CBM, and the corresponding OPS possesses a terminal methyl residue (Table I). This omission may reflect limitations of the HMM used to assign Wzt_C family members. Given this kind of uncertainty, we developed a bioinformatic approach to identify additional instances of chain-terminating modifications that exploits the known correlates in protein architecture and gene organization, rather than searching directly for homologs of the Wzt_C domain. The initial step was to identify genes in the NCBI nucleotide database encoding Wzt homologs with C-terminal domains, on the premise that the CBM should (in most cases) correlate with a cognate terminator.
The full-length E. coli O9a Wzt protein sequence was used as the query because the NBD sequence is more highly conserved than the CBM. A tBlastn search (Altschul et al. 1990) was performed against the non-redundant nucleotide sequence database with an E-value cutoff of 1e−20. A custom script employing the BioPython package (Cock et al. 2009) then located the NBD-encoding ORF within the GenBank record for each tBlastn hit, using the position of the high-scoring segment pair (the script is available at https://github.com/WhitfieldLab/Wzt-C). To identify candidate Wzt homologs possessing C-terminal domains, sequences were selected using two criteria: (i) a length of 325 to 550 amino acids (i.e., longer than K. pneumoniae O2a Wzt [246 amino acids] but shorter than the MsbA half-transporter [582 amino acids]) and (ii) location of the corresponding gene within 7500 bp of, and in the same chromosomal orientation as, one or more genes annotated as encoding GTs (annotations containing “glyco,” “manno,” etc.) or enzymes involved in nucleotide sugar donor synthesis. ORFs surrounding wzt were manually examined for predicted gene products possessing coiled-coil motifs using COILS, and candidates were then subjected to domain prediction with InterProScan. In most cases, a coiled-coil protein possessing one or more established Pfam domains could be assigned, allowing the chemical identity of the nonreducing terminal modification to be predicted. This approach identified several Wzt homologs with extended C-terminal domains that are associated with polysaccharides of known repeat-unit structure but unknown terminal modification status (Table II). Some of these Wzt proteins were recognized as components of candidate glyco-exporters in a previous phylogenetic analysis, but their partner coiled-coil proteins were not recognized at the time (Cuthbertson et al. 2010).
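The two selection criteria described above can be expressed as a simple filter. The Python sketch below is illustrative only and is not the authors' script (that script is available from the GitHub repository cited above). The `Gene` and `Candidate` record types, the keyword tuple, and the reading of "within 7500 bp" as the intergenic gap between the two genes are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Substrings used to flag neighbouring genes as glycan-related. The text lists
# "glyco" and "manno" followed by "etc.", so this tuple is an assumption; terms
# for nucleotide-sugar synthesis enzymes would need to be added in practice.
GLYCAN_KEYWORDS = ("glyco", "manno")


@dataclass
class Gene:
    start: int    # 1-based start coordinate on the contig
    end: int      # 1-based end coordinate (inclusive)
    strand: int   # +1 or -1
    product: str  # annotated product name


@dataclass
class Candidate:
    protein_length: int  # Wzt homolog length in amino acids
    gene: Gene           # gene encoding the Wzt homolog
    neighbours: list = field(default_factory=list)  # other genes on the record


def gap_bp(a, b):
    """Intergenic distance in bp between two genes (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def passes_criteria(c, min_len=325, max_len=550, window_bp=7500,
                    keywords=GLYCAN_KEYWORDS):
    # Criterion (i): long enough to carry a C-terminal domain, shorter than MsbA.
    if not min_len <= c.protein_length <= max_len:
        return False
    # Criterion (ii): a glycan-related gene nearby, in the same orientation.
    # Assumption: "within 7500 bp" is read as the gap between the two genes.
    for n in c.neighbours:
        if (n.strand == c.gene.strand
                and gap_bp(c.gene, n) <= window_bp
                and any(k in n.product.lower() for k in keywords)):
            return True
    return False


# Hypothetical example records.
wzt = Gene(1000, 2300, +1, "ABC transporter ATP-binding protein")
gt = Gene(2400, 3500, +1, "Glycosyltransferase family 2")
print(passes_criteria(Candidate(433, wzt, [gt])))  # True
print(passes_criteria(Candidate(246, wzt, [gt])))  # False: too short
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the candidate list is to the 325 to 550 amino acid window or to the 7500 bp neighbourhood.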
The majority of these proteins offered additional examples of the established nonreducing terminal modifications. For example, the E. coli O99 antigen backbone is composed of poly-d-Rha. The O99 gene cluster resembles that of O9a and includes homologs of WbdA (WejI) and WbdD (WejH); WejH has obvious methyltransferase and kinase domains (44% identity, 62% similarity to E. coli O9a WbdD, E = 0.0) (Perepelov et al. 2009). The bioinformatics data strongly suggest that the O99 glycan is terminated with a methylphosphate moiety, like the O9a prototype. In this dataset, a methyl group was the most commonly predicted terminator. Its prevalence could reflect biases in the current genomic and structural data, the ready cellular availability of the methyl donor, or some favorable biophysical property of the methyl group that is important for function. Some entries provided candidate terminators with new activities that have not yet been validated experimentally. In B. vietnamiensis G4, the putative modular chain-length regulation protein possesses a pair of C-terminal GT modules, separated by a coiled coil from an N-terminal pyruvyltransferase candidate, suggesting a terminal pyruvate (Table II). Ketal pyruvate groups are common side-chain substituents in extracellular polysaccharides (EPSs), including E. coli colanic acid and Xanthomonas campestris xanthan gum (Ielpi et al. 1981; Marzocca et al. 1991; Stevenson et al. 1996). This modification is also found in some Gram-positive SCWPs, such as the glycan from Paenibacillus alvei (Hager et al. 2018). The P. alvei pyruvyltransferase (CsaB) shares 28% identity (E = 2e−26) with the B. vietnamiensis G4 putative pyruvyltransferase domain, suggesting a similar function. Phosphoenolpyruvate (PEP) is the donor substrate for pyruvyltransferases. Like S-adenosylmethionine, ATP, and CMP-Kdo (the activated precursors for methyl, phosphate, and Kdo residues, respectively), PEP is generated by “housekeeping” enzymes.
As a result, only a specialized transferase enzyme is required to generate a terminal modification.

Table II Polysaccharides with known repeat unit structures, predicted to have nonreducing terminal modifications, based on the association with ABC transporters possessing Wzt_C domains and proteins containing coiled-coil structure

[Repeat-unit structures, drawn as symbols in the original table, are not reproduced.]

Organism | Glycan | GenBank ID | Wzt ID | Wzt length (aa) | Terminator ID | Terminator domain organization | References
Methyl phosphate terminator
E. coli O99 | OPS | FJ940773.1 | ACV53836 | 433 | WejH (ACV53837.1) | MeTase (PF13847)–Kinase (PS50011)–CC | (Perepelov et al. 2009)
X. campestris pv. campestris 8004 | OPS | AF204145.1 | AAK53481.1 | 428 | AAK53480.1 | MeTase (PF13847)–Kinase (PS50011)–CC | (Molinaro et al. 2003)
Methyl terminator
B. phytofirmans PsJN | OPS | CP001052.1 | ACD15306.1 | 493 | ACD15307.1 | MeTase (SSF53335)–CC–GT (PF13439)–GT (PF00534) | (Silipo et al. 2008)
P. aeruginosa O15 ATCC 33362 | OPS | LJZI01000136.1 | KRV02538.1 | 465 | KRV02539.1 | MeTase (PF08241)–MeTase (PF13847)–CC | (Lam et al. 2011)
P. aeruginosa O17 ATCC 33364 | OPS | LJZK01000094.1 | KRV20894.1 | 460 | KRV20895.1 | MeTase (PF13847)–GT (PF13692)–GT (PF00535) | (Lam et al. 2011)
P. putida FERM P-18867, S16, W619 | OPS | NC_010501.1 | WP_012313299.1 | 402 | WP_012313300 | CC–MeTase (PF13489) | (Knirel et al. 2002)
P. syringae pv. tomato DC3000 | OPS | NC_004578.1 | NP_790909.1 | 454 | NP_790908.1 | MeTase (PF08242)–CC–GT (PF00535)–GT (PF00535) | (Knirel et al. 1998)
Sugar terminator
A. actinomycetemcomitans serotype e | OPS | AB030032.1 | BAA82537.1 | 398 | BAA82538.1 | GT (SSF53756)–CC–GT (PF00535)–GT (SSF53756) | (Perry et al. 1996; Yoshida et al. 1999)
A. thermoaerophilus DSM 10115/G+ | SLG | AF324836.3 | AAS49125.1 | 435 | WsdE (AAK27848.2) | CC–GT (PF00535) | (Novotny et al. 2004)
S. marcescens O4 | OPS | AF038816.1 | AAC00182.1 | 441 | WbbK (AAC00183.1) | β-Kdo Tase (PF05159)–CC–GT (PF00535)–GT (SSF53756) | (Oxley and Wilkinson 1988)
Unknown terminator
B. vietnamiensis G4 | OPS | CP000614.1 | ABO53824.1 | 454 | ABO53823.1 | PvTase (PF04230)–CC–GT–GT (PF00535) | (Gaur and Wilkinson 2006)
B. vietnamiensis G4 | OPS | CP000614.1 | ABO56202.1 | 461 | ND | ND | (Gaur and Wilkinson 2006)
C. fetus ssp. fetus 82–40 serotype A | OPS | NC_008599.1 | WP_002848818.1 | 394 | WP_011731843.1 | ND | (Perez-Perez et al. 1986; Senchenkova et al. 1996)
P. aeruginosa PAO1 CPA | OPS | NC_002516.2 | NP_254137.1 | 421 | ND | ND | (Arsenault et al. 1994)
R. tropici CIAT 899 | OPS | NC_020059.1 | WP_015341688.1 | 460 | ND | ND | (Gil-Serrano et al. 1995)

Many systems are identified solely by sequence data (i.e., no information is available for the glycan structures), but the sheer number of sequenced genomes offers a powerful resource for discovering candidate novel termination chemistries (Table III). These genomes revealed further putative pyruvate terminators, as well as sulfate and phosphate terminators (the latter without the additional methylation found in methylphosphate). Sulfate and phosphate are common side-chain modifications in carbohydrate-containing structures, such as bacterial EPSs (Poli et al. 2010). The predicted chain-terminating carbamoyltransferases from Herbaspirillum rubrisubalbicans M1 and Rhodoplanes sp. Z2-YC6860 each share >28% sequence identity (E < 1e−36) with the NodU and NolO carbamoyltransferases involved in the biosynthesis of nodulation (Nod) factors in Rhizobium sp. NGR234 (Jabbouri et al. 1995; Jabbouri et al. 1998). Rhizobia secrete Nod factors to induce symbiotic nodule formation on the roots of legumes. Nod factors are acylated oligosaccharides composed of β-1,4-linked GlcNAc that vary in the number, variety, and position of a range of substituents, including nonreducing terminal N-methyl, O-acetyl, and O-carbamoyl groups (D’Haeze and Holsters 2002).
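Proposing a terminal chemistry from InterProScan output, as done for the entries in Tables I–IV, is essentially a lookup from domain accessions to activities. The Python sketch below is a hypothetical helper: it hard-codes only the accession-to-activity assignments that appear in the tables of this review (including the SUPERFAMILY, PROSITE, and CDD identifiers the tables use) and flags whether a coiled coil ("CC") is present. It is not a validated classifier, and a matching domain alone does not demonstrate a chain-terminating role.

```python
import re

# Accession -> terminal chemistry, taken only from the domain assignments in
# Tables I-IV of this review (Pfam PF, SUPERFAMILY SSF, PROSITE PS, CDD cd).
# GT domains are omitted because they are assigned to polymerization.
DOMAIN_TO_CHEMISTRY = {
    "PF08241": "methyl", "PF08242": "methyl", "PF13489": "methyl",
    "PF13847": "methyl", "PF13578": "methyl", "PF05575": "methyl",
    "PF05050": "methyl", "SSF53335": "methyl",
    "PF00069": "phosphate", "PS50011": "phosphate", "PF01633": "phosphate",
    "PF04230": "pyruvate",
    "PF13469": "sulfate", "PF00685": "sulfate",
    "PF02543": "carbamate", "PF16861": "carbamate",
    "PF05159": "Kdo",
    "cd04647": "O-acetyl",
}

ACCESSION = re.compile(r"\b(?:PF|PS|SSF|cd)\d{5}\b")
# Domains may be separated by a hyphen, an en dash, or an em dash.
SEPARATOR = re.compile("[\u2013\u2014-]")


def predict_terminator(domain_organization):
    """Return (set of predicted chemistries, has_coiled_coil) for a domain string."""
    accessions = ACCESSION.findall(domain_organization)
    chemistries = {DOMAIN_TO_CHEMISTRY[a] for a in accessions
                   if a in DOMAIN_TO_CHEMISTRY}
    segments = [s.strip() for s in SEPARATOR.split(domain_organization)]
    return chemistries, "CC" in segments


chem, cc = predict_terminator("PvTase (PF04230)-CC-GT-GT (PF00535)")
print(sorted(chem), cc)  # ['pyruvate'] True
```

A methyltransferase plus a kinase domain returns both "methyl" and "phosphate", which corresponds to the methylphosphate terminators of E. coli O9a and O99.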
Table III Newly discovered candidates for putative coiled-coil termination activities

Organism | GenBank ID | Wzt ID | Wzt length (aa) | Terminator ID | Terminator domain organization
Pyruvate terminator
Pseudomonas protegens FDAARGOS_307 | CP022097.2 | ASE24426.1 | 424 | ASE24427.1 | PvTase (PF04230)–CC
Sulfate terminator
Nitrosococcus halophilus Nc 4 | CP001798.1 | ADE15671.1 | 411 | ADE15675.1 | Sulfotransferase (PF13469)–CC–GT (PF00535)–MeTase (PF13489)
O-acetyl/Kdo terminator
Pseudomonas koreensis P19E3 | CP027477.1 | AVX90388.1 | 419 | AVX90387.1 | Acetyltransferase (cd04647)–βKdoTase (PF05159)–CC–GT (PF00535)–GT (PF00535)
Carbamate terminator
Herbaspirillum rubrisubalbicans M1 | CP013737.1 | ALU87684.1 | 429 | ALU87686.1 | Carbamoyltransferase (PF02543, PF16861)–CC–GT (PF00535)
Rhodoplanes sp. Z2-YC6860 | CP007440.1 | AMN43749.1 | 419 | AMN43744.1 | Carbamoyltransferase (PF02543, PF16861)–CC
Phosphate terminator
Paenibacillus polymyxa ATCC 15970 | CP011420.1 | APQ62228.1 | 433 | APQ61431.1 | Kinase (PF01633)–CC
Methyl terminator
Sebaldella termitidis ATCC 33386 | CP001739.1 | ACZ09262.1 | 415 | ACZ09261.1 | MeTase (PF13578)–CC–GT (PF00535)–GT (PF00535)–GT (PF01075)
Thiocystis violascens DSM 198 | CP003154.1 | AFL74597.1 | 455 | AFL74598.1 | MeTase (PF13489)–GT (PF00535)–CC–GT (PF00535)–GT (PF13692)
Enterococcus mundtii CRL35 | CP025473.1 | AZP92413.1 | 406 | AZP92414.1 | MeTase (PF13489)–CC
Pseudomonas fluorescens Pf0-1 | CP000094.2 | ABA75785.1 | 438 | ABA75784.1 | MeTase (PF08241)–CC
Clostridium saccharobutylicum DSM 13864 | CP006721.1 | AGX45163.1 | 419 | AGX45162.1 | MeTase (PF13578)–CC

Many of the identified putative glycan-capping proteins display modular organizations with one or more predicted GT domain(s), in addition to a transferase for a nonglycose residue (Tables II and III). By analogy with the established models, these proteins are candidates for enzymes participating in both termination and polymerization (alone or with additional proteins), and they show diverse modular formats. For example, the putative terminator protein from Sebaldella termitidis ATCC 33386 (ACZ09261.1) possesses three predicted C-terminal GT domains, separated from the N-terminal methyltransferase by a coiled coil. In the candidate protein from Thiocystis violascens DSM 198 (AFL74598.1), an N-terminal methyltransferase-GT pair is separated by a coiled-coil region from two GT domains.

Phylogenetic analysis of Wzt_C domains

We next aimed to examine phylogenetic relationships among the Wzt_C family members identified by Pfam within the NCBI sequence database. For these analyses, 6971 Wzt_C sequences were trimmed to include only the Wzt_C domain, in order to eliminate the confounding effects arising from the highly conserved NBD sequences. The phylogenetic tree is shown in Figure 5.
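Trimming each sequence to its Wzt_C domain needs only the domain boundaries reported by the HMM search. A minimal Python sketch, assuming one domain hit per sequence and 1-based, inclusive start/end coordinates (for example, the envelope coordinates in an HMMER `--domtblout` table) that have already been collected per sequence; the function names and the example record are hypothetical, and this is not the pipeline used to build Figure 5.

```python
def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}; identifier = first header word."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def trim_to_domain(records, domain_coords):
    """Keep only the domain region of each sequence.

    domain_coords maps identifier -> (start, end), 1-based and inclusive.
    Sequences without coordinates are dropped from the output.
    """
    trimmed = {}
    for name, (start, end) in domain_coords.items():
        if name in records:
            trimmed[name] = records[name][start - 1:end]
    return trimmed


def to_fasta(records, width=60):
    """Format {identifier: sequence} as FASTA text wrapped at `width` residues."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical record: "NBD" residues 1-8, C-terminal domain residues 9-12.
fasta = ">wzt_example NBD plus C-terminal domain\nMAAAKKKK\nLLLL\n"
domains = trim_to_domain(read_fasta(fasta), {"wzt_example": (9, 12)})
print(to_fasta(domains), end="")  # prints ">wzt_example" then "LLLL"
```

The trimmed FASTA output can then be passed to a multiple-sequence aligner before tree inference, so that the conserved NBD no longer dominates the alignment.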
We hypothesized that CBMs recognizing the same terminator would cluster together. This is the case for the three CBMs (E. coli O9a, E. coli O99, and X. campestris pv. campestris 8004) proposed to recognize a methylphosphate terminus. However, rather than being confined to a single clade, CBMs recognizing methyl-terminated glycans are distributed throughout the tree. One possible explanation is that a CBM recognizing a methyl-terminated glycan arose early in the evolution of the Wzt_C family and was the progenitor of most Wzt_C family members; as terminator diversity arose, so too did the corresponding CBM lineages.

Fig. 5 Unrooted maximum likelihood phylogenetic tree of Wzt_C sequences. The tree contains Wzt_C sequences in NCBI identified by a Pfam-produced HMM (https://pfam.xfam.org/family/PF14524#tabview=tab3). From these, a maximum likelihood tree was calculated using RAxML on the CIPRES gateway (Miller et al. 2010; Stamatakis 2014) and visualized using the Interactive Tree of Life webserver (Letunic and Bork 2019). The tree shown is the best-scoring tree from 100 bootstrap replicates; branch lengths represent the number of expected substitutions per site. Colored nodes mark representative Wzt_C sequences with identified or proposed terminators, which are listed in Tables I–IV.

Without additional confirmed terminators to label the tree more comprehensively, we cannot estimate the extent of terminator diversity from the tree (for example, by counting clades). It is also unclear how closely Wzt_C family members need to be related to form “monofunctional” groups. We predict that the binding grooves of the known CBM structures from E. coli O9a and K. pneumoniae O12 can accommodate one or more repeat-unit residues in addition to the terminating component, and these residues may also contribute to interactions with the CBM. Consequently, a small structural change to the CBM-binding pocket may be sufficient to switch its specificity from a methyl terminus to a sugar terminus. In this context, monophyletic groups of Wzt_C family members could correspond more closely to the sugars and linkages immediately preceding the terminator than to the termination chemistry alone.

Variations on the theme suggest novel chain-length determination and export strategies

Collectively, the experimental and bioinformatics data presented here suggest that polysaccharide engagement by the Wzm-Wzt ABC transporter complex can occur via different approaches, which we summarize in hypothetical models (Figure 6). These models are intended to stimulate debate and further research; they are a work in progress, not a finalized nomenclature. In all of the models, we propose that the primary substrate-binding site is the electropositive pocket near the Wzt gate helix, which is proposed to bind the Und-PP head group based on the Wzm-Wzt structure from A. aeolicus. Type I polysaccharide transporters lack a CBM and are divided into the classical system (type Ia), represented by K. pneumoniae O2a, and examples in which the glycan possesses a modified terminus that plays no role in export (type Ib). Type Ib is represented by Vibrio cholerae O1 OPS, which is capped by a nonreducing terminal 2-O-methyl group in serotype Ogawa (Hisatsune et al. 1993) (Table I). However, the terminal methyl group is dispensable for export, and mutations in the gene encoding the methyltransferase (wbeT, formerly rfbT) differentiate serotype Ogawa from the nonmethylated serotype Inaba OPS (Stroeher et al. 1992). Despite the lack of apparent impact of the terminal residue on chain-length distribution and export (Manning et al. 1986), terminal methylation has an important biological impact: serotype conversion of Ogawa to Inaba is likely a central factor in the reinfection of previously immune individuals by V. cholerae O1 (Alam et al. 2016). It is currently unknown whether the V. cholerae transporter is also capable of exporting diverse glycan backbone structures (like K. pneumoniae O2a). The distribution of chain lengths in type I-like systems tends to be quite broad, although in K. pneumoniae O2a, LPS molecules containing shorter chains (2–5 repeat units) are absent from the profile (Kos and Whitfield 2010). Whether this is due to the relative rates of chain extension and export or to a physical limitation determined by the geometry of the complex warrants investigation.

Fig. 6 Proposed models of glycan termination/chain-length regulation and substrate engagement by Wzm-Wzt.

We use “type II” to cover those systems that possess a Wzt protein with a C-terminal CBM recognizing the terminated glycan substrate and hence conferring glycan specificity.
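The contrast between broad chain-length distributions and the narrower ranges associated with chain-terminating systems can be illustrated with a deliberately simple simulation. In this toy Python model (an illustration only, not a model proposed in this review), chains grow one repeat unit at a time and a terminator caps each chain with a fixed probability per step; a molecular ruler is represented crudely as a minimum length below which the terminator cannot act. The parameter values are arbitrary assumptions.

```python
import random
import statistics


def simulate_chain_lengths(n_chains, p_cap, min_len=1, max_len=200, seed=0):
    """Toy model of chain extension with stochastic capping.

    n_chains -- number of glycan chains to simulate
    p_cap    -- probability that the terminator caps a chain after each added unit
    min_len  -- capping cannot occur below this length; a crude stand-in for a
                molecular ruler (min_len=1 means no ruler)
    max_len  -- safety limit on chain length
    seed     -- seed for a private random generator, for reproducible runs
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_chains):
        length = 0
        while length < max_len:
            length += 1  # polymerase adds one repeat unit
            if length >= min_len and rng.random() < p_cap:
                break  # terminator acts; the capped chain is not extended further
        lengths.append(length)
    return lengths


# Freely diffusing terminator: may act at any chain length.
no_ruler = simulate_chain_lengths(10_000, p_cap=0.05)
# Ruler-constrained terminator: cannot reach chains shorter than 20 units.
ruler = simulate_chain_lengths(10_000, p_cap=0.8, min_len=20)

for label, data in (("no ruler", no_ruler), ("ruler", ruler)):
    print(f"{label}: mean {statistics.mean(data):.1f}, "
          f"SD {statistics.pstdev(data):.1f}, range {min(data)}-{max(data)}")
```

With these settings both populations average about 20 repeat units, but chains capped without a ruler follow a broad, roughly geometric distribution, whereas ruler-constrained chains cluster tightly just above `min_len`.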
To the extent that they have been studied, systems that incorporate chain-termination processes have a more limited range of product chain lengths, and CBMs can ensure that the biosynthetic controls are translated into the final product on the cell surface. In the prevalent type IIa variants, chain-terminating proteins possessing coiled-coil molecular rulers chemically alter the nonreducing terminus of the glycan. Examples include the biochemically characterized prototypes, E. coli O9a and K. pneumoniae O12, which differ in the organization of their catalytic domains but follow the same overall strategy using a membrane-anchored terminating protein. Type IIb systems possess the same spectrum of catalytic components, but their terminating enzymes lack a coiled-coil molecular ruler. The terminally 3-O-methylated OPS from Bordetella bronchiseptica MO149 (Table I) offers one example; WbmOO (CCJ56881.1) was proposed to be the terminating methyltransferase (PF08241) based on similarity to E. coli O8 WbdD (29% identity over 108 amino acids) (Vinogradov et al. 2010). Other type IIb examples include the glycans produced by Burkholderia mallei BM2308 and B. pseudomallei RR2808, which also have terminal methyl groups (Table I). Both gene loci predict an extended Wzt and two methyltransferases, consistent with the terminator identity. The second methyltransferase presumably adds the side-chain substituent, but neither enzyme is predicted to contain coiled coils. P. aeruginosa Common Polysaccharide Antigen (CPA) may offer another version of the type IIb model. This system includes a Wzt_C domain appended to Wzt, and chemical analysis of this poly-d-Rha glycan revealed small amounts of 3-O-methyl-substituted rhamnose (Arsenault et al. 1994), but definitive evidence for a chain-terminating methyl group is not available.
Two cyclic-di-GMP-regulated enzymes (WarAB, encoded by genes not linked to the CPA biosynthesis locus) are proposed to provide kinase and methyltransferase activities to cap CPA and regulate chain length of the product (McCarthy et al. 2017). WarAB interact with CPA GTs, but neither possesses a coiled-coil domain. Structural data for the glycan are essential to confirm the biochemical interpretations. The type IIb examples raise the question of whether chain length can be effectively controlled by a chain-terminating enzyme lacking a coiled-coil molecular ruler. A freely diffusing enzyme would likely add the terminator purely stochastically but any structural arrangement that prevents the terminating enzyme from closely approaching the membrane where the glycan is anchored could, in principle, suffice as a molecular ruler. In this scenario, chain-length determination could be achieved by structural elements other than coiled coils in the terminating enzyme, or chain length could be measured by the relative positioning of catalytic sites in a protein complex where the complex itself provides the appropriate geometry. We also propose the existence of type IIc systems where a Wzt-C domain is utilized to recognize the polymeric repeat of a glycan independent of any termination event. Rhizobium etli CE3 (Table I) offers a candidate example of a type IIc system. The nonreducing terminus of the OPS is decorated with several O-methyl groups (Forsberg et al. 2000) and the biosynthesis locus encodes an ABC transporter with a putative CBM (Lerouge et al. 2001; Cuthbertson et al. 2010). The gene locus encodes four predicted methyltransferases and nine GTs with no evident coiled-coil domains (Lerouge et al. 2003; Ojeda et al. 2013). In this system, the extent of main-chain and terminal O-methylation is modulated by environmental factors (Noel et al. 2004; D’Haeze et al. 
2007), and the genes responsible for addition of the nonreducing terminal fucose (wreB) and O-methyl groups (wreA, wreD, wreF) are apparently dispensable for OPS production and export (Ojeda et al. 2010). Despite lacking the hallmark features of chain-length regulation, the machinery produces a precise product containing five repeat units per glycan chain. Without a termination enzyme, polymer length cannot be controlled by the spacing between that enzyme and the membrane; nevertheless, this arrangement could still selectively export short saccharides if a specific residue from within the repeat consistently appears at the nonreducing terminus and provides the recognition signal. In this context, there are several examples of dual-activity polymerases in which one active site appears to act more slowly than the other in vitro, generating products that always contain the same terminal linkage/residue (Ovchinnikova et al. 2016; Kelly et al. 2019; Doyle et al. 2019). We speculate that very short chain lengths are an intrinsic property of type IIc systems, because the glycan is captured and exported as soon as it is long enough to reach the CBM. Finally, type IId transporters are variants in which a candidate capping enzyme is fused to the CBM rather than existing as a separate protein (Table IV). An NBD-Wzt_C-sulfotransferase ORF was reported previously in Geobacter uraniireducens strain Rf4 (Cuthbertson et al. 2010), but the expanded collection of genomes includes examples of comparable NBD modularity involving sulfotransferases, methyltransferases, GTs, pyruvyltransferases, acetyltransferases and cholinetransferases. The biochemical roles of these activities are currently uncertain, but many of the associated genetic loci do not encode coiled-coil domain proteins, raising the possibility that the transporter plays a direct role in a capping mechanism.
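The type IIb question raised above (whether a terminating enzyme without a coiled-coil ruler can still control chain length) and the type IIc idea of capture as soon as a chain reaches the CBM can be illustrated with a toy simulation. This sketch is purely illustrative and is not a model taken from the studies cited here. It assumes that chains grow one repeat unit at a time, that a freely diffusing terminator caps a chain with the same probability after every added unit, and that a ruler-like geometry simply makes capping impossible until the chain reaches a threshold length. The function names, parameters (`p_cap`, `ruler_len`) and values are hypothetical.

```python
import random
import statistics


def stochastic_terminator(p_cap, rng, max_len=500):
    """Hypothetical freely diffusing terminator: after each repeat unit is
    added, the chain is capped with the same probability p_cap, regardless of
    its length. This yields a broad (geometric) length distribution."""
    length = 1  # start from a single repeat unit
    while length < max_len and rng.random() >= p_cap:
        length += 1  # not capped this round, so another unit is added
    return length


def ruler_terminator(ruler_len, p_cap, rng, max_len=500):
    """Hypothetical ruler-constrained terminator: the capping site cannot reach
    chains shorter than ruler_len, so no chain is capped before that length;
    beyond it, capping occurs with probability p_cap per added unit."""
    length = ruler_len  # chains grow uncapped up to the ruler length
    while length < max_len and rng.random() >= p_cap:
        length += 1
    return length


def summarize(lengths):
    """Return the mean and population standard deviation of chain lengths."""
    return statistics.mean(lengths), statistics.pstdev(lengths)


rng = random.Random(42)  # fixed seed so runs are reproducible
n_chains = 10_000

# Parameters chosen (arbitrarily) so that both models give a mean near 21 units.
free = [stochastic_terminator(1 / 21, rng) for _ in range(n_chains)]
ruled = [ruler_terminator(20, 0.5, rng) for _ in range(n_chains)]
# p_cap = 1.0 takes every chain as soon as it reaches the threshold, loosely
# mimicking the "exported as soon as it reaches the CBM" idea for type IIc.
captured = [ruler_terminator(5, 1.0, rng) for _ in range(n_chains)]

for name, lengths in [("stochastic", free), ("ruler", ruled), ("capture", captured)]:
    mean, sd = summarize(lengths)
    print(f"{name:>10}: mean {mean:5.1f} repeat units, SD {sd:5.1f}")
```

With these assumed settings, the stochastic model should spread chain lengths over a wide range (its standard deviation is roughly as large as its mean), whereas the ruler model concentrates them just above the threshold and the capture case returns a single length. The point is only that any geometric constraint on when capping can occur, not a coiled coil specifically, narrows the distribution.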
Information on these systems is limited to genome sequence data, but they raise interesting questions, ranging from whether the catalytic domains attached to the NBDs truly dictate termination to whether integration of the CBM and terminating enzyme allows unique behaviors to emerge in these systems.

Table IV Wzt homologs with both CBMs and additional C-terminal modules with predicted enzymatic activities
Organism | GenBank ID | Wzt ID | Wzt organization | Terminator ID | Terminator domain organization
Oscillatoriales cyanobacterium USR001 | MBRE01000079.1 | OCQ93928.1 | NBD (PF00005)–Wzt_C (PF14524)–Sulfotransferase (PF00685) | ND | ND
Tychonema bourrellyi FEM_GT703 | NXIB02000039.1 | PHX55816.1 | NBD (PF00005)–Wzt_C (PF14524)–Sulfotransferase (PF00685) | ND | ND
Geobacter uraniireducens (strain Rf4) | CP000698.1 | ABQ27964.1 | NBD (PF00005)–Wzt_C (PF14524)–Sulfotransferase (PF00685) | ABQ27960.1 | MeTase (PF13489)–CC
Oscillatoria nigro-viridis PCC 7112 | CP003614.1 | AFZ07876.1 | NBD (PF00005)–Wzt_C (PF14524)–Sulfotransferase (PF00685) | ND | ND
Acidovorax sp. (strain JS42) | CP000539.1 | ABM40795.1 | NBD (PF00005)–Wzt_C (PF14524)–GT (PF13692) | ND | ND
Polynucleobacter sp. VK13 | FWXJ01000009.1 | SMC60958.1 | NBD (PF00005)–Wzt_C (PF14524)–MeTase (PF05050) | SMC60939.1 | MeTase (PF05050)–CC–GT (PF00534)–GT (PF00534)
Roseiflexus sp. (strain RS-1) | CP000686.1 | ABQ92433.1 | NBD (PF00005)–Wzt_C (PF14524)–MeTase (PF08241)–MeTase (PF13649) | ND | ND
Hyphomicrobium nitrativorans NL23 | CP006912.1 | AHB47207.1 | NBD (PF00005)–Wzt_C (PF14524)–MeTase (PF13489) | ND | ND
Clostridium clostridioforme CAG:132 | CBDY010000463.1 | CDB64836.1 | NBD (PF00005)–Wzt_C (PF14524)–Cholinetransferase (PF04991) | ND | ND
Comamonadaceae bacterium PBBC1 | NKIP01000013.1 | OYU11157.1 | NBD (PF00005)–Wzt_C (PF14524)–Acetyltransferase (PF13527) | ND | ND
Pseudomonas fluorescens F113 | CP003150.1 | AEV62088.1 | NBD (PF00005)–Wzt_C (PF14524)–PvTase (PF04230) | AEV62086.1 | CC–GT (PF13692)–GT (PF00535)

A small number of Wzt_C sequences identified by Pfam appear to be putative CBMs without a cognate NBD. At first glance, this could suggest natural examples where the NBD and CBM domains operate in trans. Wzt domains can function in vivo when expressed as separate domains in the E. coli O9a/O8 and K. pneumoniae O12 prototype systems (Cuthbertson et al. 2007; Mann et al. 2016), although this format has not been observed in native systems. However, closer examination of sequences involving stand-alone CBMs revealed that many of these are most likely annotation errors where an obvious NBD ORF is present but was not initially assigned. Others reflect unassembled shotgun sequences, in which the read likely does not contain the complete gene sequence, so the situation remains uncertain. One example appears to be bona fide. The sequence from Firmicutes bacterium CAG:95 (GenBank ID CBKF010000177.1) contains adjacent wzt (NBD; Protein ID CDF08094.1) and wzt_C (Protein ID CDF08093.1) genes in the same orientation but separated by a short intergenic region (38 bp). These are therefore genuinely separate reading frames rather than the result of an unrecognized stop codon in wzt. A wzm homolog precedes this wzt, and a gene encoding a predicted coiled coil-containing GT (Protein ID CDF08090.1) is found downstream of wzt.
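The census logic behind Table IV and the stand-alone CBM cases can be sketched as simple rules over predicted domain architectures. This code is a hypothetical illustration, not the authors' pipeline. It assumes that each protein has already been annotated with an ordered list of domain labels (NBD for Pfam PF00005, Wzt_C for PF14524, CC for a predicted coiled coil, and enzyme labels as in Table IV). It then applies the rule of thumb discussed in this review that a CBM plus a coiled-coil protein in the same locus points to a terminal modification. The function names, labels and example loci are assumptions. The loci include only the proteins named in Table IV or the text, so they are deliberately incomplete, and the CC/GT order for CDF08090.1 is not known from the text.

```python
NBD = "NBD"          # ABC transporter nucleotide-binding domain (Pfam PF00005)
CBM = "Wzt_C"        # Wzt C-terminal carbohydrate-binding module (Pfam PF14524)
COILED_COIL = "CC"   # predicted coiled coil (assumed to come from a separate predictor)


def classify_protein(domains):
    """Assign a coarse category to one protein from its N- to C-terminal
    list of domain labels (hypothetical categories)."""
    if CBM not in domains:
        return "NBD without CBM" if NBD in domains else "other"
    if NBD not in domains:
        return "stand-alone CBM"  # check annotation/assembly before trusting
    extra = domains[domains.index(CBM) + 1:]  # modules C-terminal to the CBM
    if extra:
        return "NBD-CBM fused to " + "+".join(extra)  # type IId candidate
    return "NBD-CBM"


def predict_capped_glycan(locus):
    """Heuristic: a CBM and a coiled-coil protein in the same locus suggest
    that a terminal (capping) residue should be looked for."""
    has_cbm = any(CBM in domains for domains in locus.values())
    has_cc = any(COILED_COIL in domains for domains in locus.values())
    return has_cbm and has_cc


# Example loci built from Table IV and the Firmicutes CAG:95 case in the text.
loci = {
    "Polynucleobacter sp. VK13": {
        "SMC60958.1": ["NBD", "Wzt_C", "MeTase"],
        "SMC60939.1": ["MeTase", "CC", "GT", "GT"],
    },
    "Roseiflexus sp. RS-1": {
        "ABQ92433.1": ["NBD", "Wzt_C", "MeTase", "MeTase"],
    },
    "Firmicutes bacterium CAG:95": {
        "CDF08094.1": ["NBD"],
        "CDF08093.1": ["Wzt_C"],
        "CDF08090.1": ["CC", "GT"],  # domain order assumed
    },
}

for organism, locus in loci.items():
    print(organism, "| capped glycan predicted:", predict_capped_glycan(locus))
    for protein_id, domains in locus.items():
        print("   ", protein_id, "->", classify_protein(domains))
```

Under these rules, the Polynucleobacter and Firmicutes CAG:95 loci are flagged for a terminal modification. The Roseiflexus entry, which lists no terminator in Table IV, is not flagged; there, the fused methyltransferase modules on the transporter itself would be the candidate capping activity, as suggested for type IId systems.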
Conclusion
The bioinformatic analyses presented here suggest that Wzm–Wzt exporters incorporating CBMs can accommodate a diverse set of biochemical termination strategies, as well as several distinct approaches to regulating chain length. Analyses of poorly investigated systems predict novel functional features, and we hope that these will stimulate further investigation by the research community. First, our results indicate substantial chemical diversity in polysaccharide-terminating moieties, and further variations may emerge as the sequence databases expand. We also identified new multimodular enzymes with novel functionalities that may participate in polymerization and/or termination. All of these provide targets for biochemical investigations that will deepen our understanding of the principles that underpin glycan assembly. In all cases, definitive interpretation begins with (and depends on) integrating sequence data for the relevant glycan biosynthesis loci with complete structures of the glycans. Fortunately, these elements are increasingly being pursued in parallel in the contemporary literature. One key implication is that future polysaccharide structure determinations should take special care to identify low-abundance substitutions at the nonreducing terminus. Depending on the residues and linkages, some may be lost during glycan purification steps (e.g., acid hydrolysis) or overlooked during structural assignment. Genetic data that predict a CBM and (in many cases) a protein with a coiled-coil motif should serve as useful indicators that such terminal residues are likely present.
The molecular machinery involved in bacterial glycan formation is remarkably efficient. Although the precise number of active complexes per cell is uncertain, numbers in the range of hundreds have been proposed for translocation of completed LPS, and these complexes collectively export a calculated 70,000 molecules/minute to sustain growth and division of E.
coli (reviewed in Whitfield and Trent 2014). In wild-type bacteria, where a major proportion of total LPS molecules contain OPS, a similar number of OPS assembly complexes is anticipated. Understanding the factors that underpin the efficient synthesis and export of glycans depends critically on further structural biology initiatives. At a basic level, a Wzm–Wzt structure that includes a CBM would provide the insight needed to design experiments that unravel the processes involved in the initiation of transport. Another priority is a catalogue of structures of biosynthesis enzyme–transporter complexes from the leading prototypes described above. These architectures (particularly the relative positions of the essential catalytic sites) will provide a foundation for understanding the relationship between system design constraints and the challenges of coordinating synthesis and export of these long-chain glycans.
The evolution of the glycan modification and transport machinery could be driven by the need to optimize the capacity of glycans to perform key functions, such as protection against the host innate immune response. For example, OPS chain length is an important feature in defending against complement-mediated killing (Miajlovic and Smith 2014). However, the machinery described here is not confined to pathogens; involvement in other roles, such as masking of conserved bacteriophage receptors, could offer a broad competitive advantage. The currently known systems balance the distribution of chain lengths against the number of chains produced, providing extensive surface coverage with glycans of appropriate chain length. This may serve to limit resource investment in polysaccharide synthesis beyond the biologically important length. These assembly processes also have practical importance in vaccine development.
Polysaccharides are generally conjugated to a carrier protein to elicit a T-cell-dependent immune response (Rappuoli 2018). Historically, this has been accomplished by chemical crosslinking, but an emerging strategy employs bacterial protein glycosylation machinery in recombinant E. coli to generate glycoproteins of choice (Natarajan et al. 2018). Export and optimal glycan chain length are important considerations and can be modulated using molecular biology approaches, provided that the essential machinery is understood.
Unresolved questions
What are the arrangements of terminating enzymes?
Is a CBM always needed with a coiled-coil molecular ruler?
What are the biological implications of the various terminator chemistries and chain-length regulation formats?
Funding
Discovery Grants (RGPIN-2015-04622 to C.W. and RGPIN-2015-0405 to M.S.K.) from the Natural Sciences and Engineering Research Council of Canada. C.W. holds a Canada Research Chair, and E.M. is a recipient of an Alexander Graham Bell Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council.
Contributions
E.M. collected and analyzed bioinformatic data. All authors contributed to interpretation of the results and manuscript preparation.
Conflict of interest statement
None declared.
Abbreviations
ABC, ATP-binding cassette; CBM, carbohydrate-binding module; CPA, common polysaccharide antigen; CPS, capsular polysaccharide; EPS, exopolysaccharide; GlcNAc, N-acetylglucosamine; GT, glycosyltransferase; HMM, hidden Markov model; Kdo, 3-deoxy-d-manno-oct-2-ulosonic acid; LPS, lipopolysaccharide; Man, mannose; NBD, nucleotide-binding domain; Nod, nodulation; OPS, O antigen; OS, oligosaccharide; Rha, rhamnose; SCWP, secondary cell-wall polymer; SLG, S-layer glycan; TMD, transmembrane domain; Und, undecaprenyl
References
Alam MT, Ray SS, Chun CN et al. 2016. Major shift of toxigenic V. cholerae O1 from Ogawa to Inaba serotype isolated from clinical and environmental samples in Haiti.
PLoS Negl Trop Dis. 10: e0005045.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215: 403–410.
Arsenault TL, MacLean DB, Zou W, Szarek WA. 1994. Smith-degradative studies on the polysaccharide portion of A-band lipopolysaccharide from a mutant (AK1401) of Pseudomonas aeruginosa strain PAO1. Can J Chem. 72: 1376–1382.
Bi Y, Mann E, Whitfield C, Zimmer J. 2018. Architecture of a channel-forming O-antigen polysaccharide ABC transporter. Nature. 553: 361–365.
Boraston AB, Bolam BN, Gilbert HJ, Davies GJ. 2004. Carbohydrate-binding modules: Fine-tuning polysaccharide recognition. Biochem J. 382: 769–781.
Bordignon E, Grote M, Schneider E. 2010. The maltose ATP-binding cassette transporter in the 21st century-towards a structural dynamic perspective on its mode of action. Mol Microbiol. 77: 1354–1366.
Brown S, Santa Maria JP, Walker S. 2013. Wall teichoic acids of gram-positive bacteria. Annu Rev Microbiol. 67: 313–336.
Caffalette CA, Corey RA, Sansom MSP, Stansfeld PJ, Zimmer J. 2019. A lipid gating mechanism for the channel-forming O antigen ABC transporter. Nat Commun. 10: 824.
Clarke BR, Cuthbertson L, Whitfield C. 2004. Nonreducing terminal modifications determine the chain length of polymannose O antigens of Escherichia coli and couple chain termination to polymer export via an ATP-binding cassette transporter. J Biol Chem. 279: 35709–35718.
Clarke BR, Greenfield LK, Bouwman C, Whitfield C. 2009. Coordination of polymerization, chain termination, and export in assembly of the Escherichia coli lipopolysaccharide O9a antigen in an ATP-binding cassette transporter-dependent pathway. J Biol Chem. 284: 30662–30672.
Clarke BR, Richards MR, Greenfield LK, Hou D, Lowary TL, Whitfield C. 2011. In vitro reconstruction of the chain termination reaction in biosynthesis of the Escherichia coli O9a O-polysaccharide: The chain-length regulator, WbdD, catalyzes the addition of methyl phosphate to the non-reducing terminus of the growing. J Biol Chem. 286: 41391–41401.
Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. 2009. Biopython: Freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 25: 1422–1423.
Cuthbertson L, Kimber MS, Whitfield C. 2007. Substrate binding by a bacterial ABC transporter involved in polysaccharide export. Proc Natl Acad Sci. 104: 19529–19534.
Cuthbertson L, Kos V, Whitfield C. 2010. ABC transporters involved in export of cell surface glycoconjugates. Microbiol Mol Biol Rev. 74: 341–362.
Cuthbertson L, Powers J, Whitfield C. 2005. The C-terminal domain of the nucleotide-binding domain protein Wzt determines substrate specificity in the ATP-binding cassette transporter for the lipopolysaccharide O-antigens in Escherichia coli serotypes O8 and O9a. J Biol Chem. 280: 30310–30319.
D’Haeze W, Holsters M. 2002.
Nod factor structures, responses, and perception during initiation of nodule development. Glycobiology. 12: 79R–105R.
D’Haeze W, Leoff C, Freshour G, Noel KD, Carlson RW. 2007. Rhizobium etli CE3 bacteroid lipopolysaccharides are structurally similar but not identical to those produced by cultured CE3 bacteria. J Biol Chem. 282: 17101–17113.
Doyle L, Ovchinnikova OG, Myler K et al. 2019. Biosynthesis of a conserved glycolipid anchor for gram-negative bacterial capsules. Nat Chem Biol. 15: 632–640.
El-Gebali S, Mistry J, Bateman A et al. 2019. The Pfam protein families database in 2019. Nucleic Acids Res. 47: D427–D432.
Forsberg LS, Bhat UR, Carlson RW. 2000. Structural characterization of the O-antigenic polysaccharide of the lipopolysaccharide from Rhizobium etli strain CE3. J Biol Chem. 275: 18851–18863.
Gaur D, Wilkinson SG. 2006. Lipopolysaccharide from Burkholderia vietnamiensis strain LMG 6999 contains two polymers identical to those present in the reference strain for Burkholderia cepacia serogroup O4. FEMS Microbiol Lett. 157: 183–188.
Gil-Serrano AM, González-Jiménez I, Tejero Mateo P et al. 1995. Structural analysis of the O-antigen of the lipopolysaccharide of Rhizobium tropici CIAT899. Carbohydr Res. 275: 285–294.
Greenfield LK, Richards MR, Li J, Wakarchuk WW, Lowary TL, Whitfield C. 2012a. Biosynthesis of the polymannose lipopolysaccharide O-antigens from Escherichia coli serotypes O8 and O9a requires a unique combination of single- and multiple-active site mannosyltransferases. J Biol Chem. 287: 35078–35091.
Greenfield LK, Richards MR, Vinogradov E, Wakarchuk WW, Lowary TL, Whitfield C. 2012b. Domain organization of the polymerizing mannosyltransferases involved in synthesis of the Escherichia coli O8 and O9a lipopolysaccharide O-antigens. J Biol Chem. 287: 38135–38149.
Greenfield LK, Whitfield C. 2012. Synthesis of lipopolysaccharide O-antigens by ABC transporter-dependent pathways. Carbohydr Res. 356: 12–24.
Hagelueken G, Clarke BR, Huang H et al. 2015. A coiled-coil domain acts as a molecular ruler to regulate O-antigen chain length in lipopolysaccharide. Nat Struct Mol Biol. 22: 50–56.
Hagelueken G, Huang H, Clarke BR, Lebl T, Whitfield C, Naismith JH. 2012. Structure of WbdD: A bifunctional kinase and methyltransferase that regulates the chain length of the O antigen in Escherichia coli O9a. Mol Microbiol. 86: 730–742.
Hager FF, López-Guzmán A, Krauter S et al. 2018. Functional characterization of enzymatic steps involved in pyruvylation of bacterial secondary cell wall polymer fragments. Front Microbiol. 9: 1356.
Heiss C, Burtnick MN, Roberts RA, Black I, Azadi P, Brett PJ. 2013. Revised structures for the predominant O-polysaccharides expressed by Burkholderia pseudomallei and Burkholderia mallei. Carbohydr Res. 381: 6–11.
Hisatsune K, Kondo S, Isshiki Y, Iguchi T, Haishima Y. 1993. Occurrence of 2-O-methyl-N-(3-deoxy-L-glycero-tetronyl)-D-perosamine (4-amino-4,6-dideoxy-D-manno-pyranose) in lipopolysaccharide from Ogawa but not from Inaba O forms of O1 Vibrio cholerae. Biochem Biophys Res Commun. 190: 302–307.
Hug I, Couturier MR, Rooker MM, Taylor DE, Stein M, Feldman MF. 2010. Helicobacter pylori lipopolysaccharide is synthesized via a novel pathway with an evolutionary connection to protein N-glycosylation. PLoS Pathog. 6: e1000819.
Ielpi L, Couso RO, Dankert MA. 1981. Xanthan gum biosynthesis pyruvic acid acetal residues are transferred from phosphoenolpyruvate to the pentasaccharide-P-P-lipid. Biochem Biophys Res Commun. 102: 1400–1408.
Ito T, Higuchi T, Hirobe M, Hiramatsu K, Yokota T. 1994. Identification of a novel sugar, 4-amino-4,6-dideoxy-2-O-methylmannose in the lipopolysaccharide of Vibrio cholerae O1 serotype Ogawa. Carbohydr Res. 256: 113–128.
Jabbouri S, Fellay R, Talmont F et al. 1995. Involvement of nodS in N-methylation and nodU in 6-O-carbamoylation of Rhizobium sp. NGR234 nod factors. J Biol Chem. 270: 22968–22973.
Jabbouri S, Relić B, Hanin M et al. 1998. nolO and noeI (HsnIII) of Rhizobium sp. NGR234 are involved in 3-O-carbamoylation and 2-O-methylation of nod factors. J Biol Chem. 273: 12047–12055.
Jansson PE, Lönngren J, Widmalm G et al. 1985. Structural studies of the O-antigen polysaccharides of Klebsiella O5 and Escherichia coli O8. Carbohydr Res. 145: 59–66.
Kählig H, Kolarich D, Zayni S et al. 2005. N-acetylmuramic acid as capping element of α-D-fucose-containing S-layer glycoprotein glycans from Geobacillus tepidamans GS5-97T. J Biol Chem. 280: 20292–20299.
Keenleyside WJ, Whitfield C. 1996.
A novel pathway for O-polysaccharide biosynthesis in Salmonella enterica serovar Borreze. J Biol Chem. 271: 28581–28592.
Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. 2015. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc. 10: 845–858.
Kelly SD, Clarke BR, Ovchinnikova OG et al. 2019. Klebsiella pneumoniae O1 and O2ac antigens provide prototypes for an unusual strategy for polysaccharide antigen diversification. J Biol Chem. 294: 10863–10876.
King JD, Berry S, Clarke BR, Morris RJ, Whitfield C. 2014. Lipopolysaccharide O antigen size distribution is determined by a chain extension complex of variable stoichiometry in Escherichia coli O9a. Proc Natl Acad Sci. 111: 6407–6412.
Knirel YA, Ovod VV, Zdorovenko GM, Gvozdyak RI, Krohn KJ. 1998. Structure of the O polysaccharide and immunochemical relationships between the lipopolysaccharides of Pseudomonas syringae pathovar tomato and pathovar maculicola. Eur J Biochem. 258: 657–661.
Knirel YA, Shashkov AS, Senchenkova SN, Ajiki Y, Fukuoka S. 2002. Structure of the O-polysaccharide of Pseudomonas putida FERM P-18867. Carbohydr Res. 337: 1589–1591.
Kos V, Cuthbertson L, Whitfield C. 2009. The Klebsiella pneumoniae O2a antigen defines a second mechanism for O antigen ATP-binding cassette transporters. J Biol Chem. 284: 2947–2956.
Kos V, Whitfield C. 2010. A membrane-located glycosyltransferase complex required for biosynthesis of the D-Galactan I lipopolysaccharide O antigen in Klebsiella pneumoniae. J Biol Chem. 285: 19668–19678.
Kubler-Kielb J, Whitfield C, Katzenellenbogen E, Vinogradov E. 2012. Identification of the methyl phosphate substituent at the non-reducing terminal mannose residue of the O-specific polysaccharides of Klebsiella pneumoniae O3, Hafnia alvei PCM 1223 and Escherichia coli O9/O9a LPS. Carbohydr Res. 347: 186–188.
Kuk ACY, Hao A, Guan Z, Lee S. 2019. Visualizing conformation transitions of the lipid II flippase MurJ. Nat Commun. 10: 1736.
Lam JS, Taylor VL, Islam ST, Hao Y, Kocíncová D. 2011. Genetic and functional diversity of Pseudomonas aeruginosa lipopolysaccharide. Front Microbiol. 2: 118.
Lerouge I, Laeremans T, Verreth C et al. 2001. Identification of an ATP-binding cassette transporter for export of the O-antigen across the inner membrane in Rhizobium etli based on the genetic, functional, and structural analysis of an LPS mutant deficient in O-antigen. J Biol Chem. 276: 17190–17198.
Lerouge I, Verreth C, Michiels J et al. 2003. Three genes encoding for putative methyl- and acetyltransferases map adjacent to the wzm and wzt genes and are essential for O-antigen biosynthesis in Rhizobium etli CE3. Mol Plant Microbe Interact. 16: 1085–1093.
Letunic I, Bork P. 2019. Interactive tree of life (iTOL) v4: Recent updates and new developments. Nucleic Acids Res. 44: W242–W245.
Liston SD, Clarke BR, Greenfield LK, Richards MR, Lowary TL, Whitfield C. 2015. Domain interactions control complex formation and polymerase specificity in the biosynthesis of the Escherichia coli O9a antigen. J Biol Chem. 290: 1075–1085.
Liston SD, Mann E, Whitfield C. 2017. Glycolipid substrates for ABC transporters required for the assembly of bacterial cell-envelope and cell-surface glycoconjugates. Biochim Biophys Acta-Mol Cell Biol Lipids. 1862: 1394–1403.
Locher KP. 2016. Mechanistic diversity in ATP-binding cassette (ABC) transporters. Nat Struct Mol Biol. 23: 487–493.
Lupas A. 1996. Prediction and analysis of coiled-coil structures. Methods Enzymol. 266: 513–525.
Lupas A, van Dyke M, Stock J. 1991. Predicting coiled coils from protein sequences. Science. 252: 1162–1164.
Mann E, Mallette E, Clarke BR, Kimber MS, Whitfield C. 2016. The Klebsiella pneumoniae O12 ATP-binding cassette (ABC) transporter recognizes the terminal residue of its O-antigen polysaccharide substrate. J Biol Chem. 291: 9748–9761.
Manning PA, Heuzenroeder MW, Yeadon J, Leavesley DI, Reeves PR, Rowley D. 1986. Molecular cloning and expression in Escherichia coli K-12 of the O antigens of the Inaba and Ogawa serotypes of the Vibrio cholerae O1 lipopolysaccharides and their potential for vaccine development. Infect Immun. 53: 272–277.
Marzocca MP, Harding NE, Petroni EA, Cleary JM, Ielpi L. 1991. Location and cloning of the ketal pyruvate transferase gene of Xanthomonas campestris. J Bacteriol. 173: 7519–7524.
McCarthy RR, Mazon-Moya MJ, Moscoso JA et al. 2017. Cyclic-di-GMP regulates lipopolysaccharide modification and contributes to Pseudomonas aeruginosa immune evasion. Nat Microbiol. 2: 17027.
McNamara JT, Morgan JLW, Zimmer J. 2015. A molecular description of cellulose biosynthesis. Annu Rev Biochem. 84: 895–921.
Mi W, Li Y, Yoon SH, Ernst RK, Walz T, Liao M. 2017. Structural basis of MsbA-mediated lipopolysaccharide transport. Nature. 549: 233–237.
Miajlovic H, Smith SG. 2014. Bacterial self-defence: How Escherichia coli evades serum killing. FEMS Microbiol Lett. 354: 1–9.
Miller MA, Pfeiffer W, Schwartz T. 2010. Creating the CIPRES science gateway for inference of large phylogenetic trees. In: 2010 gateway computing environments workshop, GCE 2010, Piscataway: IEEE. p. 1–8.
Mitchell AL, Attwood TK, Babbitt PC et al. 2019. InterPro in 2019: Improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 47: D351–D360.
Molinaro A, Silipo A, Lanzetta R, Newman M, Dow JM, Parrilli M. 2003. Structural elucidation of the O-chain of the lipopolysaccharide from Xanthomonas campestris strain 8004. Carbohydr Res. 338: 277–281.
Mostowy RJ, Holt KE. 2018. Diversity-generating machines: Genetics of bacterial sugar-coating. Trends Microbiol. 26: 1008–1021.
Natarajan A, Jaroentomeechai T, Li M, Glasscock CJ, DeLisa MP. 2018. Metabolic engineering of glycoprotein biosynthesis in bacteria. Emerg Top Life Sci. 2: 419–432.
Noel KD, Box JM, Bonne VJ. 2004. 2-O-methylation of fucosyl residues of a rhizobial lipopolysaccharide is increased in response to host exudate and is eliminated in a symbiotically defective mutant.
Appl Environ Microbiol. 70 : 1537 – 1544 . Google Scholar Crossref Search ADS PubMed WorldCat Novotny R , Pfoestl A , Messner P , Schäffer C . 2004 . Genetic organization of chromosomal S-layer glycan biosynthesis loci of Bacillaceae . Glycoconj J. 20 : 435 – 447 . Google Scholar Crossref Search ADS PubMed WorldCat Ojeda KJ , Box JM , Noel KD . 2010 . Genetic basis for Rhizobium etli CE3 O-antigen O-methylated residues that vary according to growth conditions . J Bacteriol. 192 : 679 – 690 . Google Scholar Crossref Search ADS PubMed WorldCat Ojeda KJ , Simonds L , Noel KD . 2013 . Roles of predicted glycosyltransferases in the biosynthesis of the Rhizobium etli CE3 O antigen . J Bacteriol. 195 : 1949 – 1958 . Google Scholar Crossref Search ADS PubMed WorldCat Okuda S , Sherman DJ , Silhavy TJ , Ruiz N , Kahne D . 2016 . Lipopolysaccharide transport and assembly at the outer membrane: The PEZ model . Nat Rev Microbiol. 14 : 337 – 345 . Google Scholar Crossref Search ADS PubMed WorldCat Ovchinnikova OG , Mallette E , Koizumi A , Lowary TL , Kimber MS , Whitfield C . 2016 . Bacterial β-Kdo glycosyltransferases represent a new glycosyltransferase family (GT99) . Proc Natl Acad Sci. 113 : E3120 – E3129 . Google Scholar Crossref Search ADS WorldCat Oxley D , Wilkinson SG . 1988 . Structural studies of glucorhamnans isolated from the lipopolysaccharides of reference strains for Serratia marcescens serogroups O4 and O7, and of an O14 strain . Carbohydr Res. 175 : 111 – 117 . Google Scholar Crossref Search ADS PubMed WorldCat Parolis LA , Parolis H , Dutton GG . 1986 . Structural studies of the O-antigen polysaccharide of Escherichia coli O9a . Carbohydr Res. 155 : 272 – 276 . Google Scholar Crossref Search ADS PubMed WorldCat Parry DA . 1982 . Coiled-coils in alpha-helix-containing proteins: Analysis of the residue types within the heptad repeat and the use of these data in the prediction of coiled-coils in other proteins . Biosci Rep. 2 : 1017 – 1024 . 
Google Scholar Crossref Search ADS PubMed WorldCat Perepelov AV , Li D , Liu B et al. 2009 . Structural and genetic characterization of Escherichia coli O99 antigen . FEMS Immunol Med Microbiol. 57 : 80 – 87 . Google Scholar Crossref Search ADS PubMed WorldCat Perez-Perez GI , Blaser MJ , Bryner JH . 1986 . Lipopolysaccharide structures of Campylobacter fetus are related to heat-stable serogroups . Infect Immun. 51 : 209 – 212 . Google Scholar PubMed WorldCat Perez C , Gerber S , Boilevin J et al. 2015 . Structure and mechanism of an active lipid-linked oligosaccharide flippase . Nature. 524 : 433 – 438 . Google Scholar Crossref Search ADS PubMed WorldCat Perry MB , Maclean LM , Brisson J , Wilson ME . 1996 . Structures of the antigenic O-polysaccharides of lipopolysaccharides produced by Actinobacillus actinomycetemcomitans serotypes a, c, d and e . Eur J Biochem. 242 : 682 – 688 . Google Scholar Crossref Search ADS PubMed WorldCat Poli A , Anzelmo G , Nicolaus B . 2010 . Bacterial exopolysaccharides from extreme marine habitats: Production, characterization and biological activities . Mar Drugs. 8 : 1779 – 1802 . Google Scholar Crossref Search ADS PubMed WorldCat Raetz CRH , Whitfield C . 2002 . Lipopolysaccharide endotoxins . Annu Rev Biochem. 71 : 635 – 700 . Google Scholar Crossref Search ADS PubMed WorldCat Rappuoli R . 2018 . Glycoconjugate vaccines: Principles and mechanisms . Sci Transl Med . 10 : eaat4615 . Google Scholar Crossref Search ADS PubMed WorldCat Redmond JW . 1979 . The structure of the O-antigenic side chain of the lipopolysaccharide of Vibrio cholerae 569B (Inaba) . Biochim Biophys Acta. 584 : 346 – 352 . Google Scholar Crossref Search ADS PubMed WorldCat Schäffer C , Messner P . 2017 . Emerging facets of prokaryotic glycosylation (M Pohlschroder, Ed.) . FEMS Microbiol Rev. 41 : 49 – 91 . Google Scholar Crossref Search ADS PubMed WorldCat Schäffer C , Müller N , Christian R et al. 1999 . 
Complete glycan structure of the S-layer glycoprotein of Aneurinibacillus thermoaerophilus GS4-97 . Glycobiology. 9 : 407 – 414 . Google Scholar Crossref Search ADS PubMed WorldCat Schäffer C , Wugeditsch T , Kählig H , Scheberl A , Zayni S , Messner P . 2002 . The surface layer (S-layer) glycoprotein of Geobacillus stearothermophilus NRS 2004/3a. Analysis of its glycosylation . J Biol Chem. 277 : 6230 – 6239 . Google Scholar Crossref Search ADS PubMed WorldCat Schirner K , Stone LK , Walker S . 2011 . ABC transporters required for export of wall teichoic acids do not discriminate between different main chain polymers . ACS Chem Biol. 6 : 407 – 412 . Google Scholar Crossref Search ADS PubMed WorldCat Senchenkova SN , Shashkov AS , Knirel YA , McGovern JJ , Moran AP . 1996 . The O-specific polysaccharide chain of Campylobacter fetus serotype B lipopolysaccharide is a D-rhamnan terminated with 3-O-methyl-D-rhamnose (D-acofriose) . Eur J Biochem. 239 : 434 – 438 . Google Scholar Crossref Search ADS PubMed WorldCat Silipo A , Ierano T , Lanzetta R , Molinaro A , Parrilli M . 2008 . The structure of the O-chain polysaccharide from the gram-negative endophytic bacterium Burkholderia phytofirmans strain PsJN . European J Org Chem. 2008 : 2303 – 2308 . Google Scholar Crossref Search ADS WorldCat Sonnhammer E , Eddy SR , Birney E , Bateman A , Durbin R . 1998 . Pfam: Multiple sequence alignments and HMM-profiles of protein domains . Nucleic Acids Res. 26 : 320 – 322 . Google Scholar Crossref Search ADS PubMed WorldCat Stamatakis A . 2014 . RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies . Bioinformatics. 30 : 1312 – 1313 . Google Scholar Crossref Search ADS PubMed WorldCat Steiner K , Novotny R , Werz DB et al. 2008 . Molecular basis of S-layer glycoprotein glycan biosynthesis in Geobacillus stearothermophilus . J Biol Chem. 283 : 21120 – 21133 . 
Google Scholar Crossref Search ADS PubMed WorldCat Stevenson G , Andrianopoulos K , Hobbs M , Reeves PR . 1996 . Organization of the Escherichia coli K-12 gene cluster responsible for production of the extracellular polysaccharide colanic acid . J Bacteriol. 178 : 4885 – 4893 . Google Scholar Crossref Search ADS PubMed WorldCat Stroeher UH , Karageorgos LE , Morona R , Manning PA . 1992 . Serotype conversion in vibrio cholerae O1 . Proc Natl Acad Sci. 89 : 2566 – 2570 . Google Scholar Crossref Search ADS WorldCat Toukach PV , Egorova KS . 2016 . Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts . Nucleic Acids Res. 44 : D1229 – D1236 . Google Scholar Crossref Search ADS PubMed WorldCat Vinogradov E , Frirdich E , MacLean LL et al. 2002 . Structures of lipopolysaccharides from Klebsiella pneumoniae. Elucidation of the structure of the linkage region between core and polysaccharide O chain and identification of the residues at the non-reducing termini of the O chains . J Biol Chem . 277 : 25070 – 25081 . Google Scholar Crossref Search ADS PubMed WorldCat Vinogradov E , King JD , Pathak AK , Harvill ET , Preston A . 2010 . Antigenic variation among Bordetella: Bordetella bronchiseptica strain MO149 expresses a novel O chain that is poorly immunogenic . J Biol Chem. 285 : 26869 – 26877 . Google Scholar Crossref Search ADS PubMed WorldCat Whitfield C . 2006 . Biosynthesis and assembly of capsular polysaccharides in Escherichia coli . Annu Rev Biochem . 75 : 39 – 68 . Google Scholar Crossref Search ADS PubMed WorldCat Whitfield C , Amor PA , Koplin R . 1997 . Modulation of the surface architecture of gram-negative bacteria by the action of surface polymer: Lipid A-core ligase and by determinants of polymer chain length . Mol Microbiol. 23 : 629 – 638 . Google Scholar Crossref Search ADS PubMed WorldCat Whitfield C , Szymanski CM , Aebi M . 2017 . Eubacteria. In: Varki A , Cummings RD , Esko JD et al. , editors. 
Essentials of glycobiology . 3rd ed. Cold Spring Harbor (NY) : Cold Spring Harbor Laboratory Press . Google Preview WorldCat COPAC Whitfield C , Trent MS . 2014 . Biosynthesis and export of bacterial lipopolysaccharides . Annu Rev Biochem. 83 : 99 – 128 . Google Scholar Crossref Search ADS PubMed WorldCat Whitney JC , Howell PL . 2013 . Synthase-dependent exopolysaccharide secretion in gram-negative bacteria . Trends Microbiol. 21 : 63 – 72 . Google Scholar Crossref Search ADS PubMed WorldCat Williams DM , Ovchinnikova OG , Koizumi A et al. 2017 . Single polysaccharide assembly protein that integrates polymerization, termination, and chain-length quality control . Proc Natl Acad Sci. 114 : E1215 – E1223 . Google Scholar Crossref Search ADS WorldCat Willis LM , Whitfield C . 2013 . Structure, biosynthesis, and function of bacterial capsular polysaccharides synthesized by ABC transporter-dependent pathways . Carbohydr Res. 378 : 35 – 44 . Google Scholar Crossref Search ADS PubMed WorldCat Yang JG , Rees DC . 2015 . The allosteric regulatory mechanism of the Escherichia coli MetNI methionine ATP binding cassette (ABC) transporter . J Biol Chem. 290 : 9135 – 9140 . Google Scholar Crossref Search ADS PubMed WorldCat Yoshida Y , Nakano Y , Suzuki N , Nakao H , Yamashita Y , Koga T . 1999 . Genetic analysis of the gene cluster responsible for synthesis of serotype e-specific polysaccharide antigen in Actinobacillus actinomycetemcomitans . Biochim Biophys Acta-Gene Struct Expr. 1489 : 457 – 461 . Google Scholar Crossref Search ADS WorldCat Zayni S , Steiner K , Pföstl A et al. 2007 . The dTDP-4-dehydro-6-deoxyglucose reductase encoding fcd gene is part of the surface layer glycoprotein glycosylation gene cluster of Geobacillus tepidamans GS5-97T . Glycobiology. 17 : 433 – 443 . Google Scholar Crossref Search ADS PubMed WorldCat Zheng S , Sham L , Rubino FA et al. 2018 . Structure and mutagenic analysis of the lipid II flippase MurJ from Escherichia coli . 
Proc Natl Acad Sci. 115 : 6709 – 6714 . Google Scholar Crossref Search ADS WorldCat © The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) TI - Bioinformatics analysis of diversity in bacterial glycan chain-termination chemistry and organization of carbohydrate-binding modules linked to ABC transporters JF - Glycobiology DO - 10.1093/glycob/cwz066 DA - 2019-11-20 UR - https://www.deepdyve.com/lp/oxford-university-press/bioinformatics-analysis-of-diversity-in-bacterial-glycan-chain-Ld6H6KE4vg SP - 822 VL - 29 IS - 12 DP - DeepDyve ER -