Abstract Background Multiresistance in Gram-negative bacteria is often due to acquisition of several different antibiotic resistance genes, each associated with a different mobile genetic element, that tend to cluster together in complex conglomerations. Accurate, consistent annotation of resistance genes, the boundaries and fragments of mobile elements, and signatures of insertion, such as DR, facilitates comparative analysis of complex multiresistance regions and plasmids to better understand their evolution and how resistance genes spread. Objectives To extend the Repository of Antibiotic resistance Cassettes (RAC) web site, which includes a database of ‘features’, and the Attacca automatic DNA annotation system, to encompass additional resistance genes and all types of associated mobile elements. Methods Antibiotic resistance genes and mobile elements were added to RAC, from existing registries where possible. Attacca grammars were extended to accommodate the expanded database, to allow overlapping features to be annotated and to identify and annotate features such as composite transposons and DR. Results The Multiple Antibiotic Resistance Annotator (MARA) database includes antibiotic resistance genes and selected mobile elements from Gram-negative bacteria, distinguishing important variants. Sequences can be submitted to the MARA web site for annotation. A list of positions and orientations of annotated features, indicating those that are truncated, DR and potential composite transposons is provided for each sequence, as well as a diagram showing annotated features approximately to scale. Conclusions The MARA web site (http://mara.spokade.com) provides a comprehensive database for mobile antibiotic resistance in Gram-negative bacteria and accurately annotates resistance genes and associated mobile elements in submitted sequences to facilitate comparative analysis. 
Introduction Antibiotic resistance is an increasing global health problem exacerbated by the lack of new antibiotics. The focus is moving from Gram-positive bacteria, e.g. MRSA, to Gram-negative bacteria, particularly Enterobacteriaceae.1 MDR (simultaneous resistance to antibiotics of several classes) in the latter is often due to acquisition of several resistance genes that have been captured from various source organisms and that travel as part of different mobile genetic elements. The smallest of these mobile elements are gene cassettes, carrying an attC recombination site and, typically, a single ORF, often a resistance gene. Site-specific recombination between attC and the attI site in an integron, catalysed by the integron-encoded IntI integrase, results in capture and expression of cassette-borne genes.2,3 IS typically contain little more than a transposase (tnp) gene, generally taking up almost the entire length of the element, flanked by short terminal IR, designated IRL (left) and IRR (right) relative to the direction of tnp transcription.4 A pair of the same IS may capture intervening resistance gene(s) in a composite transposon. For some IS, for example as recently shown for IS26,5,6 a single copy can move adjacent resistance genes. ISEcp1 (and a few related IS) captures segments adjacent to IRR using alternative downstream sequences7 (IRalt here) as part of ‘transposition units’ (formerly abbreviated as TU,8 but TPU is used here to avoid confusion with IS26 ‘translocatable units’6). ISCR elements are believed to use a rolling circle transposition mechanism proceeding from oriIS (downstream of the rolling circle replicase gene; rcr) and replication through the terIS end allows capture of an adjacent region.9 Unit transposons (Tn) of the Tn3 family (including the Tn21 subgroup10) are typically bounded by 38 bp IR, contain transposase (tnpA) and resolvase (tnpR) genes, a resolution (res) site and resistance or other gene(s), e.g. 
for mercury resistance.11 Tn402-type elements have 25 bp IR, contain transposition (tniABQ) and resolvase (tniR) genes separated by res, may carry a class 1 integron (giving elements designated class 1 In/Tn here) and target the res site of Tn21-like transposons.12 Many of these mobile elements create flanking DR (also known as target site duplications, TSD) on insertion. Different mobile elements and associated resistance genes can move horizontally between bacteria, including different species, on conjugative plasmids. These typically consist of a ‘backbone’ encoding plasmid functions (e.g. replication, stability/maintenance, conjugation) into which accessory regions are inserted. Insertions that do not affect plasmid functions may act as ‘founder elements’ for further non-disruptive insertions, often leading to formation of complex multiresistance regions (MRR). Annotation and comparative analysis of MRR to understand their evolution and relationships is complicated by insertions of one mobile element inside another, deletions and rearrangements.8 Current non-specialized annotation software generally focuses on identifying potential genes from homology to those of known function. Several existing resources (recently summarized13) are designed to identify resistance genes, including ResFinder,14 CARD15 and ARG-ANNOT.16 SRST2 identifies resistance genes in short-read data17 and SSTAR is a stand-alone tool using other databases.18 ISfinder19 provides a database and BLAST tools for identifying IS (and some transposons). A Transposon Registry20 (now at http://www.lstmed.ac.uk/services/the-transposon-registry) lists and assigns Tn numbers and provides links to sequences. VRprofile21 uses these and other databases to detect resistance and/or virulence gene clusters and some mobile elements (not transposons). IntegronFinder22 finds attC sites and integron components (including genes in cassettes, but without specific names). 
None of these tools provides accurate annotation of both resistance genes and all relevant types of mobile element. We developed Attacca, which uses a ‘feature’ database (FDB), BLAST searches and computational grammars, to consistently and accurately annotate features and identify patterns.23–25 This enabled a survey of gene cassettes26 and discovery of new cassettes.27 Attacca was made freely available through the Repository of Antibiotic resistance Cassettes (RAC; http://rac.aihi.mq.edu.au/rac/) but limited to annotation of gene cassettes and integron features.28 We have extended the FDB and the Attacca grammars to create the Multiple Antibiotic Resistance Annotator (MARA; http://mara.spokade.com) resource for annotation of complex MRR in Enterobacteriaceae.
Methods
The MARA FDB
Each entry in the FDB includes a unique name and a unique feature ID number (FID; allowing updates if necessary, e.g. to reflect changes in resistance gene nomenclature). The International Nucleotide Sequence Database Collaboration (INSDC) accession number of an exemplar (generally the first reported example, unless possible sequence errors or other problems have been identified), the start and end positions of the feature and the exemplar sequence are also included. Each entry also has a % identity match criterion (usually set at ∼98%). Another field may be used to give additional ‘constraints’ for annotation (e.g. a specified nucleotide must be present at a certain position, the sequence must translate to a specific amino acid sequence28) and/or specify DR length, if required. Additional information (e.g. previous/alternative names) may be included in a ‘Notes’ field. bla genes were initially compiled from http://www.lahey.org/Studies/ but updated from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA313047 and qnr genes from http://www.lahey.org/qnrStudies/. 
In both cases allele numbers are assigned from protein sequences (for qnr genes, the second of two possible ATG start codons is used29). There is currently no dedicated nomenclature resource for genes encoding aminoglycoside modifying enzymes (AME) and two different naming schemes exist. Both specify the type of modification (aac, acetylation; aad/ant, adenylylation; aph, phosphorylation). ‘Shaw’30 names indicate the position modified (e.g. 3, 6′), phenotype (e.g. I, II) and variants [e.g. a, b; as in aac(3)-IIa] and are generally used in MARA if they are better established or less confusing. ‘Novick’31 names indicate the position modified (e.g. 6′ = A, 3 = C), assign numbers to different genes (e.g. aacA4) and are generally used for cassette-borne genes, following published numbers26 and coordinating with INTEGRALL (http://integrall.bio.ua.pt/).32 The proposed naming scheme for 16S rRNA methylase genes33 is used (the associated web site is no longer available). Selected tetracycline and macrolide resistance genes were added with reference to http://faculty.washington.edu/marilynr/, which uses <79% amino acid identity as a cut-off for a new name.34 In MARA, additional lower-case letters may be used to distinguish variants that are 80 to ∼98% identical. Selected IS commonly associated with resistance genes were added to the FDB from ISfinder (https://www-is.biotoul.fr). Isoforms (defined in ISfinder as >95% nucleotide identity and/or >98% Tnp protein identity to a known IS) may be distinguished as separate features in MARA by additional lower-case letters, if differences appear important (e.g. IS1294b35). IR of IS, usually shorter than the length cut-off used in MARA, are not generally included as features. ISCR elements are currently not included in ISfinder, so they were compiled from published information and available sequences (an ISCR webpage is available but does not seem to have been updated recently). 
Defined/published TPU were also included in the FDB (but any Tn numbers given to them are not used). Selected transposons commonly associated with resistance genes and MRR in Enterobacteriaceae (mostly Tn3/Tn21 and Tn402 families) were compiled from the literature and/or available sequences. Their IR (38 bp or 25 bp, respectively) are included as features (set at 100% identity). Group II introns may be found inserted in cassette arrays and plasmids and those found in Enterobacteriaceae were added from the Database for Bacterial Group II introns (http://webapps2.ucalgary.ca/∼groupii/).36 Extending the Attacca annotation system The Attacca lexical recognizer uses BLASTn37 to identify occurrences of any feature from the FDB in a sequence and annotate them if the level of identity meets the identity match criterion. Partial features (i.e. if the sequence starts/ends within the feature or because of truncation by another feature), will be annotated if at least 19 bp of either end or 32 bp from the middle are present (to avoid spurious annotation of short regions). Attacca was modified to allow annotation of overlapping features, such as overlapping ORFs or genes within a transposon, even if only a fragment is present. If a DR length is specified for an IS or transposon feature, Attacca determines whether identical sequences of this length lie immediately adjacent to each end of the feature. If so, a pair of DR are annotated. If not and the feature is an IS and the DR length is >3 bp Attacca determines whether a directly oriented matching sequence of the specified length lies adjacent to another complete copy of the same IS. If so, these DR and a ‘Composite Transposon’, encompassing the two IS and the intervening region, will be annotated. Results and discussion MARA lists The ‘Browse lists’ tab in MARA gives free access to lists of most features included in the FDB and details of the exemplar (but note that annotations in the INSDC entry may not always be correct). 
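The partial-feature and DR rules described under 'Extending the Attacca annotation system' can be illustrated with a short sketch. This is not the Attacca implementation: the `Hit` class, the function names and the 0-based, end-exclusive coordinates are assumptions made for illustration, and the composite check assumes that the matching sequences lie at the outer ends of the two IS copies. Only the thresholds (19 bp, 32 bp, DR length >3 bp) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """A complete feature located on a sequence (hypothetical structure)."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def keep_partial(feature_len: int, m_start: int, m_end: int) -> bool:
    """Apply the length rule for partial features.

    m_start/m_end give the matched interval in feature coordinates.
    A fragment reaching either end needs >= 19 bp; an internal one >= 32 bp.
    """
    length = m_end - m_start
    touches_end = m_start == 0 or m_end == feature_len
    return length >= (19 if touches_end else 32)


def flanking_dr(seq: str, hit: Hit, dr_len: int) -> Optional[str]:
    """Return the DR if identical dr_len-mers lie immediately beside both ends."""
    if dr_len <= 0 or hit.start - dr_len < 0 or hit.end + dr_len > len(seq):
        return None
    left = seq[hit.start - dr_len:hit.start]
    right = seq[hit.end:hit.end + dr_len]
    return left if left == right else None


def composite_dr(seq: str, a: Hit, b: Hit, dr_len: int) -> Optional[str]:
    """Look for a DR spanning two complete copies of the same IS.

    Assumption: the matching sequences flank the outer ends of the pair,
    so the two IS plus the intervening region form the candidate
    composite transposon. Only attempted when dr_len > 3.
    """
    if a.name != b.name or dr_len <= 3:
        return None
    first, second = sorted((a, b), key=lambda h: h.start)
    if first.start - dr_len < 0 or second.end + dr_len > len(seq):
        return None
    left = seq[first.start - dr_len:first.start]
    right = seq[second.end:second.end + dr_len]
    return left if left == right else None


# Toy example: two copies of a made-up 10 bp "IS" sharing an outer 5 bp DR.
dr, is_seq = "GATTC", "ACGTACGTAC"
toy = "TT" + dr + is_seq + "CCCCCC" + is_seq + dr + "GG"
copy1, copy2 = Hit("ISx", 7, 17), Hit("ISx", 23, 33)
single = flanking_dr(toy, copy1, 5)            # no DR around copy1 alone
composite = composite_dr(toy, copy1, copy2, 5)  # DR found across the pair
```

In this toy sequence neither IS copy has its own pair of DR, but the pair as a whole does, which is the situation that would be reported as a potential composite transposon.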
Details of gene cassettes remain available in RAC while MARA lists the resistance genes within them and ‘in gene cassette’ is indicated under Notes (or ‘assumed to be in gene cassette’ if only the gene sequence is available but closely related genes are known to be cassette-borne). Resistance genes are grouped by antibiotic class, with bla gene lists divided into classes A–D. Relationships between genes (e.g. groupings of bla genes such as blaCTX-M, blaIMP and blaVIM variants) may be indicated in Notes. Amino acid changes are noted for some bla groups, including TEM and SHV variants, where information on whether they are considered broad-spectrum (originally used for TEM and SHV variants that gave resistance to both penicillins and first-generation cephalosporins;38 ‘BSBL’), ESBL or inhibitor resistant (‘IRT’ for TEM; ‘IR’ for SHV), if available (Lahey web site/publications) is also included. blaGES variants demonstrated to confer carbapenem resistance are indicated. Allele numbering schemes for mobile resistance genes derived from intrinsic chromosomal genes of Enterobacteriaceae (e.g. blaSHV from Klebsiella pneumoniae,39,ampC genes40) generally do not distinguish variants found on the chromosome of different strains of the same species from mobilized variants that may be the result of mutations subsequent to capture. Evidence (e.g. from searches of GenBank) for a particular variant being mobilized or not may be listed in Notes (e.g. blaSHV genes found on plasmids or outside K. pneumoniae) and this information will be updated periodically. Two versions of the ‘-cr’ variant of aac(6′)-Ib/aacA4, conferring low-level fluoroquinolone resistance, are listed under quinolone resistance genes as aacA4-crA and aacA4-crC. 
A list of IS included in the MARA FDB is not provided, as ISfinder19 gives more extensive information, but there are lists of ISCR elements (with the definitions of the terIS ends used in MARA explained in corresponding Notes for each), TPU, unit transposons and Group II introns. Annotating sequences with MARA Access to the MARA annotation service requires registration (free for non-commercial use) using a valid e-mail address. The ‘Annotate new sequence’ page enables a nucleotide sequence to be pasted in or uploaded as a FASTA file. Each submitted sequence needs to be given a short description (for user reference) and the sequence type indicated. Indicating plasmid or chromosome and/or the species is optional. Submitted sequences initially appear under ‘My sequences’ (a private workspace for each user) as ‘in progress’ and the annotation can be accessed once the status changes to ‘annotated’. Interpreting MARA annotations MARA text annotations (Figure 1a) show the position and name of each feature, with an arrow indicating orientation. This is the direction of transcription of genes or of the transposase gene (tnp, tnpA or tniA; i.e. IRL→IRR) for IS and transposons and the reverse transcriptase for Group II introns. Partial copies of features are indicated by # against their name and the arrow is dashed at the truncated end(s). The positions and sequences of any DR are also indicated. A region flanked by two copies of the same IS (or closely related IS, as specified in the FDB) and DR will be annotated as a composite transposon (e.g. Figure 1a, bounded by two copies of IS1R). Other columns list the FID, the feature type (R gene, IS, etc.) and Notes. The latter include the length of any IR and/or DR for IS, generally taken from ISfinder and using the same format (i.e. IR = 18/23 indicates that IRR only matches IRL at 18 positions of the 23 bp IR). Figure 1. 
Example of MARA (a) text annotations and (b) annotation diagram of part of plasmid R100 (also called NR1; GenBank accession no. AP000342 with position 1 reset as the start of the repA2 gene). In part (a) positions of annotated features are listed, their orientations are shown by arrows and truncated features are indicated by #, with the arrow dashed at the truncated end(s), DR sequences are given and any potential composite transposons are also indicated. In part (b) gaps >50 bp are indicated by dashed red lines and the length in bp given. Gene features (catA1, sul1) are shown by arrows, gene cassettes (aadA1a) by pale blue boxes, the CS of integrons as orange boxes and IS (e.g. IS1) as white block arrows labelled with the IS number/name, with the pointed end indicating IRR. Unit transposons (Tn21, Tn402) are shown as boxes of different colours and their IR are shown as flags, with the flat side at the outer boundary of the transposon. Truncated features (e.g. 3′-CS) are shown with a jagged edge on the truncated side(s). DR (flanking IRi and IRt of the integron, IRL and IRR of Tn21, different copies of IS1) are shown as ‘lollipops’ of the same colour. Here, a class 1 In/Tn (In2), carrying the aadA1a cassette, IS1326 and IS1353, is inserted in Tn21, which is itself inside an IS1-mediated composite (Tn9-like) transposon carrying catA1 with several pairs of DR. This figure appears in colour in the online version of JAC and in black and white in the print version of JAC.
Diagrams (Figure 1b) show annotated features to scale, with gaps >50 bp indicated by dashed red lines and the length in bp. Gene features are shown by arrows, as in text annotations. Like RAC, MARA annotates complete gene cassettes (pale blue boxes, Figure 1b, aadA1a) rather than the genes that they carry. The conserved segments of integrons are shown as orange boxes (Figure 1b, 5′-CS and 3′-CS). ‘In’ numbers, indicating different cassette arrays, are not used in MARA (but can be obtained from INTEGRALL32). IS are shown as white block arrows labelled with the IS number/name, with the pointed end indicating IRR (Figure 1b, e.g. IS1R) and Group II introns are similar but in grey. 
Unit transposons are shown as boxes, with common ones assigned their own colour, and their IR (complete or truncated) are shown as flags, with the flat side at the outer boundary of the transposon (Figure 1b, Tn21). Other partial features are shown with a jagged edge on the truncated side(s) (Figure 1b, e.g. 3′-CS). DR are shown as ‘lollipops’ of the same colour (Figure 1b, e.g. flanking Tn21 and the class 1 In/Tn). PNG versions of diagrams (e.g. for use in publications, but please cite this paper) can be obtained by clicking ‘Download image annotation’ or by using ‘Save Image As…’. For resistance gene families with well-defined nomenclature systems (e.g. bla genes named on amino acid sequence, or the different named ‘frameworks’41 of blaTEM-1, such as blaTEM-1a and blaTEM-1b), each variant in the FDB will be specifically annotated by Attacca if complete. Genes encoding variants that have not yet been assigned a number or truncated copies of genes will be indicated by general names, e.g. blaTEM (Figure 2a).
Figure 2. Examples of MARA annotations. Features are shown as described in Figure 1. (a) A blaTEM gene that is correctly annotated (top), corresponds to an unnamed blaTEM variant (middle) or is truncated (bottom). (b) A Tn1331-like element, showing annotation of a Tn1/2/3-like fragment. (c) Tn10 carrying tet(B), illustrating an annotation of a named composite transposon. (d) Tn6029, illustrating a region with multiple IS26 and fragments of Tn5393. (e) A hybrid Tn21/Tn1696 transposon with IS1111-family elements inserted in both IR, flanked by DR. (f) An example of an IS1111-attC element, ISPa21e, and an annotation gap in the well-defined 5′-CS (left) that was identified as an IS and added to the FDB (with a temporary name) before reannotation (right). (g) Annotations of Tn1722 (top left), Tn1721 (top right) and a more complex structure containing parts of Tn1722/1721. (h) An example of a spurious DR, where the 5 bp beyond IRR of ISEcp1 (as well as those beyond IRalt of the TPU) happen to match those preceding IRL. (i) Sequence with possible assembly errors. Truncation of the leftmost IS26 is explained by the sequence starting within it, but the IS26 fragments within Tn2 are not explained by truncation by other elements. The pair of IS26 fragments to the right of sul2 are oppositely oriented and the sum of the sizes of the rightmost pair of IS26 fragments (143 bp) is much less than a complete copy of IS26 (820 bp), suggesting incorrect joining of contigs in each case. This figure appears in colour in the online version of JAC and in black and white in the print version of JAC.
Tn1, Tn2 and Tn3 are closely related and apparently hybrids of one another, originally defined as carrying blaTEM-2, blaTEM-1b and blaTEM-1a, respectively.42 A Tn2 variant (9 differences) carrying blaTEM-1c43 and a relatively common Tn1 variant (10 differences) carrying blaTEM-1f are designated Tn2c and Tn1f, respectively, in MARA. Each of these transposons will be annotated correctly if intact, as will blaTEM genes within them, but correctly annotating hybrids and variants of these transposons44 is not always possible. Fragments of these transposons that are too short to be correctly distinguished as Tn1, Tn2 or Tn3 may be annotated as Tn1/2/3 and further manual comparison with exemplar sequences may be required (Figure 2b). Some named composite transposons will be indicated, if complete (e.g. Figure 2c, Tn10). Multiple copies of IS26 are often found separating regions that include different resistance genes (e.g. Figure 2d), and a single copy of IS26 can move adjacent regions6 or invert the region between incoming and existing copies.45 This needs to be borne in mind when reporting analysis of sequences with IS26, as a ‘Composite Transposon’ may have been created by insertion of a single copy of IS26 and an adjacent segment into an existing copy of IS26 already flanked by DR. Fragments of the Tn3-family element Tn5393, carrying the strAB genes, are common components of MRR (Figure 2d). 
Two of the most common members of the Tn21-subfamily that carry class 1 In/Tn are Tn21 and Tn1696.46 Although closely related transposons without a class 1 In/Tn have been identified (Tn5060 and Tn5036, respectively), the names Tn21 and Tn1696 are used in MARA as they are more familiar. Tn numbers have been assigned to some elements with a Tn21 backbone but different cassette arrays in the class 1 In/Tn (e.g. Tn260310) and some hybrid transposons (e.g. tnp and mer regions matching two different transposons, Figure 2e). MARA annotates the ‘base’ transposon, as this is more useful in comparative analysis, and any DR flanking hybrid transposons should be annotated (Figure 2e). Tn numbers for these overall structures and for any novel transposons can be obtained from the Transposon Registry. The 38 bp IR of Tn21-subfamily elements are the targets for certain IS1111 family elements (indicated in Notes), which have sub-terminal IR and do not create DR.47 The ‘outer’ part of an IR interrupted by an IS1111-type element is shown as a complete flag in diagrams while the ‘inner’ part is too short to be annotated (Figure 2e). DR flanking IR interrupted by IS1111-family elements are annotated (Figure 2e). Annotated fragments of 38 bp IR that are not adjacent to IS1111-family elements may indicate the boundaries of novel Tn3-family transposons. Other IS1111 family elements target the attC sites of gene cassettes (e.g. Figure 2f, ISPa21e).48 The Tn21-subfamily element Tn1721 consists of Tn1722 (tnpA, tnpR and mcp, encoding a proposed methyl-accepting chemotaxis protein, flanked by a 38 bp IR) and an additional region with the tet(A) determinant, a partial duplication of tnpA and an extra 38 bp IR (Figure 2g).11 Tn1722 can move independently and will be annotated if complete, but fragments of either transposon will be annotated as Tn1721#. Some hybrid Tn402-like elements (with or without class 1 In), have also been given numbers, but again the component parts will be annotated. 
TPU included in the FDB will be annotated if complete, along with any DR, but any Tn numbers assigned to them are not indicated. Diagrams show ISEcp1 and the adjacent region up to IRalt (Figure 3a and b). If the TPU is interrupted, for example by another IS, neither it nor any DR will be annotated (Figure 3c). Using MARA annotations to remove the interruption (and one copy of any DR) and re-annotating (Figure 3d) should identify any DR. Some TPU result from capture of an additional segment adjacent to another TPU8 (Figure 3e). These ‘composite’ TPU are generally not included in the FDB and will not be annotated as TPU nor any DR found, but annotation of the boundary of the adjacent segment may help to identify DR.
Figure 3. Examples of annotations of ISEcp1 transposition units (TPU). (a) and (b) show a 3.078 kb TPU carrying blaCMY-2 (GenBank accession no. CP012929) and a 2.961 kb TPU carrying blaOXA-181 (AB972272), respectively. Each is flanked by 5 bp DR that were annotated by MARA (DR sequences added from text annotations). (c) The same 3.078 kb TPU as in (a) with blaCMY-23 (a variant of blaCMY-2 with one nucleotide difference giving one amino acid difference) and ISEcp1 interrupted by IS5 with 4 bp DR (HG941718). (d) Removing IS5 and one copy of the repeated TTAA and re-annotating with MARA identifies the DR. (e) No DR are annotated by Attacca in JN205800 or JQ996150, as an adjacent segment (a fragment of the ereA3 cassette or ISKpn14, respectively) has been captured creating a new TPU. Attacca annotation of the boundary of this segment enables identification of 5 bp DR. This figure appears in colour in the online version of JAC and in black and white in the print version of JAC.
Some ISCR3 family elements49 appear to be hybrids of one another.8 Those assigned an ISCR number should be correctly annotated when intact, but fragments may be annotated as part of a different ISCR3-like element or as ‘ISCR3-like.’ Two different, defined fragments of ISCR27, associated with the blaNDM-1 gene, are included as separate features. Evaluation, limitations and use of MARA annotations During development, correct annotation of exemplar features was verified. Comparison of MARA with ResFinder and SSTAR using sequences in the SSTAR paper18 (Tables S1 and S2, available as Supplementary data at JAC Online) shows that MARA also found all of the resistance genes detected by these programs and provides additional information. There is currently no ‘gold standard’ for annotation of MRR components against which the performance of Attacca can be fully evaluated, but Figure 2(b–f) illustrates the successful annotation of different combinations of features. MARA is best suited to analysing MRR in completely assembled plasmid or genome sequences from Enterobacteriaceae (e.g. 
from long-read data) and may not correctly/precisely annotate sequences that contain many errors or truncated features in minimally assembled sequences. A whole bacterial genome (up to seven megabases) can be submitted for annotation, or contigs from short-read data. As MARA will annotate fragments of any repeats that are found at the ends of contigs (e.g. IS and which end is present) annotations provide information that can be used to design PCR strategies to confirm links between contigs. The start point of circular plasmid sequences may also need to be repositioned (ideally outside any MRR) to give the most informative annotations. Only features that are included in the FDB will be identified by MARA, so novel mobile elements and resistance genes (or variants that fall outside the percentage identity cut-off used, usually ∼98%, but this may be optimized for each feature) will not be annotated. Gaps in MRR may need to be analysed further to identify novel features, e.g. a search of ISfinder with the 1585 bp gap in Figure 2(f) (left-hand diagram) suggests an IS1182-like element flanked by a 4 bp DR that was then added to the FDB (right-hand diagram). Similarly, closely spaced fragments of the same, or related, IS may suggest sequence errors or a variant that is too different to be annotated using the percentage identity cut-off in the FDB. DR annotations may occasionally be spurious, if the same short sequences happen to be present by coincidence at the ends of an IS/transposon (Figure 2h). This should be obvious if different features are annotated flanking the inserted element, but it may also be possible to verify DR by joining the sequences flanking the insertion, minus one copy of the DR, and using this in a BLASTn search for uninterrupted versions of the sequence. 
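The manual DR check described above (joining the sequences flanking the insertion, minus one copy of the DR) can be prepared with a few lines of code before running the BLASTn search. The following is a minimal sketch, not part of MARA or Attacca: the function name, the `flank` parameter and the 0-based, end-exclusive coordinates are assumptions made for illustration.

```python
def reconstruct_target(seq: str, start: int, end: int,
                       dr_len: int, flank: int = 50) -> str:
    """Rebuild the presumed uninterrupted target site around an insertion.

    seq[start:end] is the inserted element (IS or transposon). The two
    putative DR copies are expected at seq[start - dr_len:start] and
    seq[end:end + dr_len]. The result keeps the left flank with its DR
    copy and drops the DR copy on the right.
    """
    if start - dr_len < 0 or end + dr_len > len(seq):
        raise ValueError("DR would extend beyond the sequence")
    left_dr = seq[start - dr_len:start]
    right_dr = seq[end:end + dr_len]
    if left_dr != right_dr:
        raise ValueError("sequences flanking the element are not identical")
    left = seq[max(0, start - dr_len - flank):start]  # flank + one DR copy
    right = seq[end + dr_len:end + dr_len + flank]    # flank after second DR
    return left + right


# Toy example: an 8 bp "insertion" flanked by the 4 bp repeat TTAA.
toy = "AAAC" + "TTAA" + "GGGGGGGG" + "TTAA" + "CGCG"
query = reconstruct_target(toy, start=8, end=16, dr_len=4, flank=4)
```

The returned string can then be used as the query in a BLASTn search; a hit to an uninterrupted sequence containing a single copy of the repeat supports the DR being genuine rather than coincidental.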
Partial copies of features that cannot be explained by, for example, the sequence starting/ending within them or truncation by another mobile element (Figure 2i) may suggest incorrect assembly that needs checking.

Maintaining the MARA FDB

The MARA FDB will be regularly updated from available web sites and publications, but users are also encouraged to submit new resistance genes or mobile elements for inclusion. In addition to the conditions listed for RAC that suggest a novel cassette,28 manual review by MARA curators will also be triggered by annotation of an unnumbered variant of a known gene (e.g. Figure 2a) or a gap in another feature that could be a novel mobile element (Figure 2f). The submitter would then be contacted to suggest, for example, that the sequence could be analysed using ISfinder and any novel IS submitted for naming and inclusion in the ISfinder database. New features, named by existing nomenclature registries where possible, can also be submitted by notifying MARA curators. Any new feature submitted to the FDB will initially be used only to annotate sequences submitted by that user, until they give explicit written permission for MARA administrators to move it to the public FDB or a description of the feature appears in, for example, GenBank, ISfinder or a journal article.

Conclusions

The MARA web site provides access to the Attacca automatic annotation engine, allowing users to easily, quickly and accurately annotate MRR in DNA sequences from Enterobacteriaceae. Many features are also already relevant to Acinetobacter and Pseudomonas and we are expanding the FDB to better annotate MRR and resistance islands in these species.
Acknowledgements

Part of this work was presented at the Fifty-third Interscience Conference on Antimicrobial Agents and Chemotherapy, Denver, CO, USA, 2013 (Poster P1721) and at the First ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens, Washington, DC, USA, 2015 (Poster 4). We would like to thank Enrico Coiera and Jon Iredell for supporting this project, August Gilg, Stefan Haunsberger and Andreas Huettl for their work on the diagram generation software, Vitaliy Kim for his work on the sequence submission interface and Kaitlin Tagg for comments on the manuscript prior to submission.

Funding

This work was supported by GNT1001021 (Centre for Research Excellence in Critical Infection) from the Australian National Health and Medical Research Council (NHMRC), Spokade Pty Ltd and the Australian Institute of Health Innovation at Macquarie University.

Transparency declarations

G. T. is a director and major shareholder of Spokade Pty Ltd, which owns and maintains Attacca. S. R. P.: none to declare.

Supplementary data

Tables S1 and S2 are available as Supplementary data at JAC Online.

References

1 Boucher HW, Talbot GH, Bradley JS et al. Bad bugs, no drugs: no ESKAPE! An update from the Infectious Diseases Society of America. Clin Infect Dis 2009; 48: 1–12.
2 Hall RM, Stokes HW. Integrons: novel DNA elements which capture genes by site-specific recombination. Genetica 1993; 90: 115–32.
3 Escudero JA, Loot C, Nivina A et al. The integron: adaptation on demand. Microbiol Spectr 2015; 3: MDNA3-0019-2014.
4 Siguier P, Gourbeyre E, Varani A et al. Everyman’s guide to bacterial insertion sequences. Microbiol Spectr 2015; 3: MDNA3-0030-2014.
5 Harmer CJ, Hall RM. IS26-mediated formation of transposons carrying antibiotic resistance genes. mSphere 2016; 1: e00038–16.
6 Harmer CJ, Moran RA, Hall RM. Movement of IS26-associated antibiotic resistance genes occurs via a translocatable unit that includes a single IS26 and preferentially inserts adjacent to another IS26. mBio 2014; 5: e01801–14.
7 Poirel L, Decousser JW, Nordmann P. Insertion sequence ISEcp1B is involved in expression and mobilization of a blaCTX-M β-lactamase gene. Antimicrob Agents Chemother 2003; 47: 2938–45.
8 Partridge SR. Analysis of antibiotic resistance regions in Gram-negative bacteria. FEMS Microbiol Rev 2011; 35: 820–55.
9 Toleman MA, Bennett PM, Walsh TR. ISCR elements: novel gene-capturing systems of the 21st century? Microbiol Mol Biol Rev 2006; 70: 296–316.
10 Liebert CA, Hall RM, Summers AO. Transposon Tn21, flagship of the floating genome. Microbiol Mol Biol Rev 1999; 63: 507–22.
11 Grinsted J, de la Cruz F, Schmitt R. The Tn21 subgroup of bacterial transposable elements. Plasmid 1990; 24: 163–89.
12 Minakhina S, Kholodii G, Mindlin S et al. Tn5053 family transposons are res site hunters sensing plasmidal res sites occupied by cognate resolvases. Mol Microbiol 1999; 33: 1059–68.
13 Xavier BB, Das AJ, Cochrane G et al. Consolidating and exploring antibiotic resistance gene data resources. J Clin Microbiol 2016; 54: 851–9.
14 Zankari E, Hasman H, Cosentino S et al. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother 2012; 67: 2640–4.
15 McArthur AG, Waglechner N, Nizam F et al. The comprehensive antibiotic resistance database. Antimicrob Agents Chemother 2013; 57: 3348–57.
16 Gupta SK, Padmanabhan BR, Diene SM et al. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob Agents Chemother 2014; 58: 212–20.
17 Inouye M, Dashnow H, Raven LA et al. SRST2: rapid genomic surveillance for public health and hospital microbiology labs. Genome Med 2014; 6: 90.
18 de Man TJ, Limbago BM. SSTAR, a stand-alone easy-to-use antimicrobial resistance gene predictor. mSphere 2016; 1: e00050–15.
19 Siguier P, Perochon J, Lestrade L et al. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 2006; 34: D32–6.
20 Roberts AP, Chandler M, Courvalin P et al. Revised nomenclature for transposable genetic elements. Plasmid 2008; 60: 167–73.
21 Li J, Tai C, Deng Z et al. VRprofile: gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria. Brief Bioinform 2017; doi:10.1093/bib/bbw141.
22 Cury J, Jové T, Touchon M et al. Identification and analysis of integrons and cassette arrays in bacterial genomes. Nucleic Acids Res 2016; 44: 4539–50.
23 Schaeffer J, Held A, Tsafnat G. Computational grammars for interrogation of genomes. In: Sintchenko V, ed. Infectious Diseases Bioinformatics. New York, NY, USA: Springer, 2010; 263–78.
24 Tsafnat G, Schaeffer J, Clayphan A et al. Computational inference of grammars for larger-than-gene structures from annotated gene sequences. Bioinformatics 2011; 27: 791–6.
25 Tsafnat G, Coiera E, Partridge SR et al. Context-driven discovery of gene cassettes in mobile integrons using a computational grammar. BMC Bioinformatics 2009; 10: 281.
26 Partridge SR, Tsafnat G, Coiera E et al. Gene cassettes and cassette arrays in mobile resistance integrons. FEMS Microbiol Rev 2009; 33: 757–84.
27 Partridge SR, Tsafnat G. A novel gene cassette potentially conferring resistance to aminoglycosides. Antimicrob Agents Chemother 2012; 56: 4566–7.
28 Tsafnat G, Copty J, Partridge SR. RAC: repository of antibiotic resistance cassettes. Database 2011; 2011: bar054.
29 Jacoby G, Cattoir V, Hooper D et al. qnr gene nomenclature. Antimicrob Agents Chemother 2008; 52: 2297–9.
30 Shaw KJ, Rather PN, Hare RS et al. Molecular genetics of aminoglycoside resistance genes and familial relationships of the aminoglycoside-modifying enzymes. Microbiol Rev 1993; 57: 138–63.
31 Novick RP, Clowes RC, Cohen SN et al. Uniform nomenclature for bacterial plasmids: a proposal. Bacteriol Rev 1976; 40: 168–89.
32 Moura A, Soares M, Pereira C et al. INTEGRALL: a database and search engine for integrons, integrases and gene cassettes. Bioinformatics 2009; 25: 1096–8.
33 Doi Y, Wachino J, Arakawa Y. Nomenclature of plasmid-mediated 16S rRNA methylases responsible for panaminoglycoside resistance. Antimicrob Agents Chemother 2008; 52: 2287–8.
34 Warburton PJ, Roberts AP. Comment on: resistance gene naming and numbering: is it a new gene or not? J Antimicrob Chemother 2017; 72: 634–7.
35 Tagg KA, Iredell JR, Partridge SR. Complete sequencing of IncI1 sequence type 2 plasmid pJIE512b indicates mobilization of blaCMY-2 from an IncA/C plasmid. Antimicrob Agents Chemother 2014; 58: 4949–52.
36 Candales MA, Duong A, Hood KS et al. Database for bacterial group II introns. Nucleic Acids Res 2012; 40: D187–90.
37 Altschul SF, Gish W, Miller W et al. Basic local alignment search tool. J Mol Biol 1990; 215: 403–10.
38 Livermore DM. Defining an extended-spectrum β-lactamase. Clin Microbiol Infect 2008; 14 Suppl 1: 3–10.
39 Ford PJ, Avison MB. Evolutionary mapping of the SHV β-lactamase and evidence for two separate IS26-dependent blaSHV mobilization events from the Klebsiella pneumoniae chromosome. J Antimicrob Chemother 2004; 54: 69–75.
40 Jacoby GA. AmpC β-lactamases. Clin Microbiol Rev 2009; 22: 161–82.
41 Leflon-Guibout V, Heym B, Nicolas-Chanoine M. Updated sequence information and proposed nomenclature for blaTEM genes and their promoters. Antimicrob Agents Chemother 2000; 44: 3232–4.
42 Partridge SR, Hall RM. Evolution of transposons containing blaTEM genes. Antimicrob Agents Chemother 2005; 49: 1267–8.
43 Bailey J, Pinyon J, Abnantham S et al. Distribution of the blaTEM gene and blaTEM-containing transposons in commensal Escherichia coli. J Antimicrob Chemother 2011; 66: 745–51.
44 Partridge SR. What’s in a name? ISSwi1 corresponds to transposons related to Tn2 and Tn3. mBio 2015; 6: e01344–15.
45 He S, Hickman AB, Varani AM et al. Insertion sequence IS26 reorganizes plasmids in clinically isolated multidrug-resistant bacteria by replicative transposition. mBio 2015; 6: e00762–15.
46 Partridge SR, Brown HJ, Stokes HW et al. Transposons Tn1696 and Tn21 and their integrons In4 and In2 have independent origins. Antimicrob Agents Chemother 2001; 45: 1263–70.
47 Partridge SR, Hall RM. The IS1111 family members IS4321 and IS5075 have subterminal inverted repeats and target the terminal inverted repeats of Tn21 family transposons. J Bacteriol 2003; 185: 6371–84.
48 Tetu SG, Holmes AJ. A family of insertion sequences that impacts integrons by specific targeting of gene cassette recombination sites, the IS1111-attC group. J Bacteriol 2008; 190: 4959–70.
49 Toleman MA, Walsh TR. Evolution of the ISCR3 group of ISCR elements. Antimicrob Agents Chemother 2008; 52: 3789–91.

© The Author(s) 2018. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please email: email@example.com.
Journal of Antimicrobial Chemotherapy – Oxford University Press
Published: Apr 1, 2018