Background: Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of sequenced genomes, we are confronted with an overwhelming number of predicted clusters. To guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about the diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary.

Results: Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis, showed that within this highly diverse genus four major lineages could be distinguished, which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study of the locations of biosynthetic gene clusters in the genomes of Amycolatopsis strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome, thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes.

Conclusions: Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites.
Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites. Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.

Keywords: Amycolatopsis, Genome mining, Comparative genomics, Biosynthetic gene cluster, Gene cluster family, Secondary metabolite diversity, Phylogeny, Natural products, Evolution

* Correspondence: email@example.com
Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Microbiology/Biotechnology, University of Tübingen, Tübingen, Germany
German Centre for Infection Research (DZIF), Partner Site Tübingen, Tübingen, Germany
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Adamek et al. BMC Genomics (2018) 19:426

Background
The value of bacterial secondary metabolites for medical applications, as pharmaceuticals, especially anti-infectives, but also for industrial use is indisputable [1, 2]. Furthermore, the demand for the discovery of novel compounds for medical applications is urgent, especially in the light of the increasing antibiotic resistance to drugs currently in use. To facilitate the discovery of novel compounds, bacterial genome sequences are screened for genome regions that are likely to code for the production of secondary metabolites. This bioinformatics approach is the first important step in the genome mining pipeline that is necessary to guide the discovery of novel compounds [4, 5]. The secondary metabolite machinery of bacteria is mainly organized into several diverse clusters, called biosynthetic gene clusters (BGCs), which contain biosynthesis genes in close physical proximity. BGCs encoding closely related biosynthetic pathways that produce highly similar chemical compounds are grouped under the term gene cluster families (GCFs). Polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) gene clusters encode huge megasynthases that produce natural products by a multimodular assembly line in a series of chemical condensation reactions. Other notable classes include ribosomally synthesized and post-translationally modified peptides (RiPPs) and terpenes [7, 8].

Recent comparative genomics approaches have shown that the potential for bacteria to produce secondary metabolites is much more promising than previously thought, as many actinobacterial genomes harbor 20–29 BGCs on average. With the currently available tools, detection of putative BGCs is fast and simple. It is now feasible to detect thousands of putative BGCs. To guide the discovery of the most promising novel compounds, it is important to understand the distribution patterns of BGCs. Therefore, knowledge about the diversity, environmental distribution and phylogenetic relationships of BGCs in the context of their environmental function is paramount.

In contrast to primary metabolites, bacterial secondary metabolites are not necessary for the immediate survival of the bacterium, but are important for adaptation, as well as for fitness advantages in specific natural habitats. Early hypotheses suggested that bacteria mainly produce secondary metabolites with antibiotic activity for defense purposes; more recent studies show that these secondary metabolites also play a key role as signaling molecules [11, 12]. Furthermore, they have been shown to be involved in complex mutualistic relationships in their specific environment. Yet, the complex functions of secondary metabolites in their natural environment remain poorly understood.

Previous approaches to characterize secondary metabolite gene clusters used different methods to sort BGCs into related GCFs [14–16]. It was shown that, on the one hand, BGC distribution was correlated with species phylogeny, while on the other hand the vast BGC diversity could not be explained by vertical evolution. Furthermore, distinct taxa, or even distinct species, show remarkable differences in their BGCs. This leaves open questions concerning the main mechanisms for secondary metabolite evolution. Because of these taxonomic differences, it is necessary to characterize many different bacterial genera in order to evaluate the diversity of BGCs and the mechanisms leading to their diversification. This knowledge should help us to predict where to seek novel secondary metabolites, and to estimate if the search for novel producers should be based on phylogeny, geography or on specific microenvironments. Classifying GCFs enables us to further prioritize BGCs with respect to their novelty and to predict their structural scaffolds.

In this work, we focus on the actinomycete genus Amycolatopsis as a model system for an in-depth study of secondary metabolite gene clusters harbored by this genus. As of 2017, 69 different Amycolatopsis species have been validly named. A total of 41 genome sequences representing 28 different Amycolatopsis species are publicly available as complete or draft genome sequences. Amycolatopsis strains are ubiquitously distributed and have been isolated foremost from soil, but also from aquatic habitats, rock surfaces, and from clinical sources [18–23]. Only four Amycolatopsis species are known to have pathogenic properties [24, 25].

Amycolatopsis is already valued as a producer of the commercially used vancomycin and other glycopeptide antibiotics as well as for the production of the ansamycin rifamycin. Other compounds with antibacterial, antifungal or antiviral properties that have been derived from Amycolatopsis strains are quartromycin, octacosamicin, chelocardin, kigamicin and the macrotermycins A-D.

To explore the full potential of Amycolatopsis strains for the synthesis of secondary metabolites, we performed a comprehensive analysis of the secondary metabolite gene clusters in Amycolatopsis. We were able to elucidate the phylogenetic patterns in which biosynthetic gene clusters evolve and to reveal the huge genetic potential of members of this taxon to produce novel secondary metabolites.

Results
In order to characterize and compare members of the genus Amycolatopsis and to establish their potential for biosynthesis of secondary metabolites, we used 43 Amycolatopsis genome sequences for a comparative genomics approach. In total, 41 of the 43 strains were derived from public databases and two strains, Amycolatopsis sp. H5 and Amycolatopsis sp. KNN50.9b, were newly sequenced. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession NMUL00000000 (H5) and NMUK00000000 (KNN50.9b). The version described in this paper is version NMUL01000000 for Amycolatopsis sp. H5 and version NMUK01000000 for Amycolatopsis sp. KNN50.9b. Basic data for the newly sequenced strains are given in the supplementary material (Additional file 2: Table S1).

Characterization of the genus Amycolatopsis
To assess relationships between the sequenced Amycolatopsis strains we performed a multi locus sequence analysis (MLSA). Based on the concatenation of 7 housekeeping genes (atpD, clpB, gapA, gyrB, nuoD, pyrH, rpoB), a maximum likelihood phylogenetic tree was generated for all of the 43 Amycolatopsis strains (Fig. 1a); Nocardia farcinica IFM10152 and Streptomyces avermitilis MA-4680 were used as outgroups. We were able to distinguish four major phylogenetic lineages containing the majority of the Amycolatopsis strains, from here on referred to as A, B, C and D. Six strains, namely A. halophila YIM 93223, A. marina CGMCC 4.3568, A. nigrescens DSM 44992, A. sacchari DSM 44468, A. taiwanensis DSM 45107, and A. xylanica CPCC 202699, formed distinct single-membered clades. It was not possible to detect any significant relationships between the phylogeny of Amycolatopsis strains and their origin (Additional file 3: Table S2). Members from the same phylogenetic clade were isolated from various geographic regions across the world. The majority of strains were isolated from diverse soils; the marine isolate A. marina CGMCC 4.3568 and the salt-lake isolate A. halophila YIM 93223 did not cluster with any of the soil strains.

Discrepancies were observed in the assignment of strains delineated as Amycolatopsis orientalis. Among the strains in group A is the industrial vancomycin producer A. orientalis HCCB10007, which is placed at a significant distance from the A. orientalis type strain DSM 40040T. Furthermore, A. orientalis DSM 46075 and DSM 43388 fell into clade C, even further away from the A. orientalis type strain. When comparing the MLSA tree with a 16S rRNA tree based on sequences derived from genomic

Fig. 1 Amycolatopsis phylogeny, core-/pan-genome and average nucleotide identity. a) Maximum likelihood tree based on an MLSA (concatenated sequences of atpD, clpB, gapA, gyrB, nuoD, pyrH and rpoB) of 43 members of the genus Amycolatopsis. Bootstrap values were calculated from 500 bootstrap repetitions.
b) Flower diagram representing the core-, accessory- and pan-genome of the Amycolatopsis strains. c) Heatmap displaying relationships between Amycolatopsis strains based on ANIm values

data (Additional file 1: Figure S1), similar discrepancies could be seen. A. orientalis HCCB10007 clusters in close proximity to A. japonica DSM 44213, but not with the A. orientalis type strain DSM 40040. A. orientalis DSM 46075 and DSM 43388 cluster with group C strains as in the MLSA tree. However, in the 16S rRNA tree it could be clearly seen that the phylogenetic resolution is too low to distinguish Amycolatopsis strains on a species level. One problem here is that most Amycolatopsis strains have multiple, in some cases different, copies of the 16S rRNA gene. While the four clades (A-D) were basically the same in the 16S rRNA tree as in the MLSA tree, in some cases the multiple 16S rRNA copies of a strain did not cluster together. This could be seen, for example, for A. orientalis B-37, which clusters among multiple copies of A. lurida 16S rRNA genes, for A. decaplanina, which clusters with different copies of A. keratiniphila subsp. nogabecina, and for A. sacchari, which clusters among A. sulphurea genes (Additional file 1: Figure S1).

In order to assess the genome similarity amongst the Amycolatopsis strains, a pan-genome analysis was performed using the BPGA analysis tool. To reduce any bias conferred by the 6 closely related and highly similar A. mediterranei genomes, only the A. mediterranei S699 genome was used as a reference for A. mediterranei. The pan-genome analysis revealed a core genome of 1212 genes with an accessory genome of 27,483 genes and 33,342 unique genes (Fig. 1b). The core-pan plot (Additional file 1: Figure S2) shows that the pan-genome is likely to be extended if more genomes are added to the analysis, hence the pan-genome is considered to be "open". The core genome curve levels off, therefore the addition of more genomes to the analysis will probably not change the core genome size significantly. The COG (Clusters of Orthologous Groups) analysis (Additional file 1: Figure S3) for core, accessory and unique genes revealed that the majority of the core genes are involved in translation and ribosomal structure biogenesis. Core, accessory and unique genes are all similarly involved in transcription and amino acid transport and metabolism. A remarkable number of unique and accessory genes are involved in the biosynthesis of secondary metabolites and in transport and catabolism. The majority of genes could only be linked to some general functions or to no function at all.

As group D strains and A. taiwanensis and A. halophila were clustering apart from the majority of the strains, we suspected they might represent novel taxa, distinct from the genus Amycolatopsis. Consequently, the average nucleotide identity based on MUMmer (ANIm) to distinguish strains at species level, and the percentage of conserved proteins (POCP) to distinguish strains at genus level, were calculated for all vs. all strains. The results, displayed as a heatmap (Fig. 1c), show that within the phylogenetic subgroups the strains have ANIm values of 89.8–96.8% (group A), 88.7–99.9% (group B), 85.3–99.1% (group C) and 84.4–96.5% (group D). For the strains that do not cluster with any of the larger phylogenetic groups, the ANIm values with the other strains ranged from 83.7–84.4% (A. nigrescens), 83.5–85.0% (A. xylanica), 83.6–86% (A. marina) and 83.0–84.0% (A. halophila). Comparing these values to the average ANI observed within other bacterial genera shows that all Amycolatopsis strains are within the average boundaries specified for a bacterial genus, hence their assignment to the genus Amycolatopsis is supported. Results of the POCP analysis (Additional file 4: Table S3) further confirm that, except for A. halophila, all of the Amycolatopsis strains have at least 50% conserved proteins and therefore belong to the same genus, while A. halophila might be considered a different genus.

Amycolatopsis biosynthetic gene clusters - diversity and phylogenetic affiliation
To study the potential of the strains to produce secondary metabolites, all of the Amycolatopsis genomes were screened for candidate BGCs using the secondary metabolite identification pipeline antiSMASH. Because the estimation of precise cluster boundaries is a critical step when computationally comparing BGCs, all of the clusters detected with antiSMASH were manually curated. A detailed overview of the distribution of BGCs with respect to their phylogenetic affiliation is given in Additional file 1: Figure S4.

In general, strains from the phylogenetic groups A and B have a higher number of BGCs (A: on average 37 BGCs, range 34–45 BGCs; B: on average 34 BGCs, range 28–41 BGCs) than strains from group C (on average 30 BGCs, range 22–38 BGCs). Within group D the lowest number of BGCs (on average 18 BGCs, range 14–20 BGCs) was identified. The genomes of A. sacchari and A. taiwanensis, which are distantly related to group D, have 16 and 18 BGCs respectively. The strains from the isolated aqueous and saline environments harbor only 22 BGCs (A. marina) and 14 BGCs (A. halophila). In contrast, 43 and 41 BGCs were found in the genomes of the A. xylanica and A. nigrescens strains. When comparing the BGC representatives for the different phylogenetic clades, it can be seen that strains from groups A and B have remarkably high numbers of PKS and NRPS genes compared to the group C and D strains. The number of RiPP, terpene and other BGCs is fairly constant over the different phylogenetic subgroups, though the genome of the A. halophila strain lacks terpene BGCs. Overall, each strain added to the analysis contributed on average 6–7 new BGCs.

The relationship between the BGCs of each Amycolatopsis strain was assessed by manually sorting the identified BGCs into GCFs, according to cluster architecture and BLAST similarity. A concise overview of the sorting rationale is given in Additional file 1: Figure S5. Overall 442 GCFs were distinguished, the majority of which were either PKS or NRPS. It is possible to distinguish between common GCFs (present in four or more strains), rare GCFs (present in 2–3 strains) and unique GCFs (present in only one strain).

The distribution of GCFs amongst the members of the genus Amycolatopsis is visualized in Fig. 2 as a presence/absence map. It can be seen that Amycolatopsis strains with a high similarity in their BGC presence/absence patterns cluster together in the dendrogram. The patterns in the distribution of GCFs correlate in the main with the species phylogeny. Comparing the BGCs and their phylogenetic affiliation, it can be seen that the common GCFs are usually present in all members of their phylogenetic clade and rarely cluster outside of their phylogenetic subgroups. The common PKS, NRPS and PKS/NRPS-hybrid clusters, as well as some of the RiPP families, are mainly represented. Four terpene cluster families, one RiPP family and several clusters from the "others" category were present in the genomes of the majority of the Amycolatopsis strains. Additional file 1: Figure S6 shows the frequency of GCFs within the genus Amycolatopsis in detail. When comparing the distribution of GCFs, the conserved GCFs only account for a small proportion of the biosynthetic pathway diversity in Amycolatopsis: only 33% are rare or common GCFs. A vast number of GCFs are represented by only a single member (67% unique GCFs). The number of unique GCFs exceeds that of the common and rare GCFs by a factor of two. These numbers emphasize the huge potential for strain-specific diversification.

We also used a computational method to group BGCs into GCFs and visualized them by genetic networking. The resultant groups follow the similarity of their Pfam domains in each cluster, as previously noted by Cimermancic et al. Using the Jaccard index and the domain duplication index (DDI) as distance metrics, a genetic network showing an all vs. all comparison of the Amycolatopsis BGCs was generated (Fig. 3a). The same color code as for the BGC presence/absence map was used to distinguish between the BGC classes. Most of the delineated GCFs corresponded to our previously defined GCFs. In Fig. 3a the BGCs that were previously linked to a specific secondary metabolite are highlighted. This encompasses the NRPS biosynthesis clusters encoding the albachelin- and amychelin-like siderophores, and the glycopeptide class of antibiotics. Furthermore, the polyketide clusters for rifamycin, ECO-0501, chelocardin and the macrotermycins are shown. The vast majority of strains harbored a 2-methylisoborneol encoding terpene BGC. All Amycolatopsis strains harbored the same ectoine BGC, which was excluded from further analyses because it should be considered as a primary metabolite. An example in which the automatically calculated GCFs differed from the manually sorted ones is shown in Additional file 1: Figure S7.

To distinguish novel BGCs from known BGCs we used gene clusters deposited at the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) database as a reference, which at the date of publication contained 1297 annotated BGCs of known compounds. A genetic network of all of the MIBiG BGCs together with all of

Fig. 2 Presence/absence of GCFs in Amycolatopsis strains. Each column in the map stands for a gene cluster family, each row stands for an Amycolatopsis strain, ordered according to the phylogeny in Fig. 1a.
The presence of a GCF member in a strain is highlighted by a color code according to its class: PKSs – orange, PKS/NRPS-hybrids – light blue, NRPSs – dark green, RiPPs – yellow, Terpenes – purple, all other identified BGC classes – dark blue. For each class the GCFs are sorted by abundance, from high to low abundance. The absence of the respective GCF member is shown in grey. The dendrogram (UPGMA clustering with Dice similarity coefficient) is derived from a similarity matrix containing information on the presence/absence of BGCs

Fig. 3 Genetic network and rarefaction curves of Amycolatopsis BGCs. Color codes are respective for gene cluster type (a) or phylogeny (b). A node stands for a specific BGC, while the length of the edges represents their relation, expressed through the Jaccard index value (threshold 0.65). (c) Rarefaction curves representing the BGC richness of the four phylogenetic subgroups. 1. albachelin-like NRPS and similar clusters (see Additional file 1: Figure S7), 2. 2-methylisoborneol, 3. glycopeptides, 4. rifamycin, 5. ECO-0501, 6. macrotermycin-like PKS clusters, 7. octacosamicin, 8. chelocardin

the Amycolatopsis BGCs was created, using the Cimermancic index (Additional file 1: Figure S8). It was possible to distinguish 1149 clusters, 388 of which were only found in the genomes of the Amycolatopsis strains, 742 were MIBiG only, and 19 consisted of Amycolatopsis and MIBiG clusters. Of the 388 Amycolatopsis-only clusters, 275 were singletons. These results provide further evidence of the huge diversity of Amycolatopsis BGCs and the immense potential this genus has for the detection of novel secondary metabolites.

To estimate further relationships between the Amycolatopsis phylogenetic groups and the GCFs we used a different color code for the nodes in the gene cluster network, according to the strains' phylogenetic affiliation (Fig. 3b). Of the 70 network clusters of common GCFs, 31 were specific for one phylogenetic group, 17 had members from two phylogenetic lineages, and 22 contained members of three or more different phylogenetic lineages. For the families with only two or three members, the numbers are too low to draw conclusions concerning the distribution of phylogenetic groups. The majority of the A. halophila, A. nigrescens, A. taiwanensis and A. xylanica BGCs remained singletons, while about half of the BGCs from A. marina clustered in several of the larger groups with mixed phylogeny. Some A. sacchari BGCs clustered with group D strains.

To assess BGC richness for a phylogenetic group, a rarefaction curve representing the abundance of BGCs per strain is shown (Fig. 3c); a steep slope of the curve indicates that it is likely that more novel BGCs will be discovered if more strains are sampled. A steep slope can be seen for all four phylogenetic groups, although that for group D is much lower. Therefore, we would expect that maximum diversity will be reached when sampling only a few more strains from group D. It can be concluded that new members of all of the phylogenetic groups have the potential to harbor yet undiscovered biosynthetic pathways. Plotting the relative number of BGCs per strain against the genome size (Additional file 1: Figure S9) revealed that phylogenetic clades A and B not only have the largest genomes but also harbor the highest number of BGCs. Members of clade C have comparably large genomes, but fewer BGCs, while clade D strains have the smallest genomes and the lowest BGC numbers. Taken together, the most promising phylogenetic groups for genome mining are represented by the clade A and B strains, as well as by the A. nigrescens and A. xylanica strains.

BGC locations on the Amycolatopsis genomes
The relative positions of the BGCs on the genomes can provide additional information about gene transfer, rearrangements and relationships of the BGCs. As all of the A. mediterranei strains showed the same BGCs in the same location, this species is only represented by A. mediterranei strain S699 in the subsequent analyses. Since only 11 out of the 38 Amycolatopsis genomes were in a complete state or available as draft genome with only one scaffold, we assembled the draft genomes with multiple contigs as linearized pseudocontigs. For most of the complete genomes and the pseudocontigs, synteny with the respective reference strain of their phylogenetic group is given. For the A. japonica and A. lurida genomes, large-scale rearrangements were observed that affected the position of the BGCs.

The position of each BGC was annotated on the complete genomes and pseudocontigs of all of the Amycolatopsis strains. Figure 4 shows the relative position of all common GCFs (with four or more members). Different patterns can be observed with respect to the distribution of BGCs throughout the Amycolatopsis genomes and pseudocontigs. Not only is the presence/absence of BGCs correlated with the phylogeny, but the location of most of the common BGCs is conserved within phylogenetic groups. This can be seen, for example, for "Lantipeptide BGC-1" and "Terpene BGC-6", which is always neighboring the "Other BGC-6" clusters (highlighted as grey squares in Fig. 4). For other GCFs the position on the genome is not fixed; examples are highlighted as grey circles in the figure. This is seen best for PKS/NRPS BGC-4, which is distributed throughout phylogenetic clades A and B and is also present in the genome of A. marina. Another example of a BGC with a variable position is NRPS BGC-14, which is present in some members of phylogenetic clades A, B and C. Finally, an example of the huge diversity of BGCs, with respect to their locations on the genome and their phylogeny, are the NRPS BGC-10 clusters, which are members of the glycopeptide family (highlighted with yellow stars in Fig. 4). All of the strains from the phylogenetic clade A and two strains from group B harbor the glycopeptide BGC in different locations on the genome. For A. japonica and A. lurida it can be speculated that the different locations on their genomes are due to genome rearrangements. The presence of the glycopeptide BGCs in the group B genomes of A. balhimycina and in the genomes of Amycolatopsis sp. H5 clearly indicates that these clusters have been acquired by horizontal gene transfer (HGT).

Taken together, the common BGCs tend to be located in a broad central area on the genome, opposite to the replication origin oriC, located upstream from the dnaA gene. These patterns can also be observed when all of the BGCs are taken into account. Additional file 1: Figure S10 shows the position of all of the BGCs on each of the linearized genomes and pseudocontigs.

Figure 5 shows the relative position for gene cluster types, such as terpenes, NRPS, and lantipeptides, on a circular genome model. This relative position is expressed as downstream distance (%) from oriC. For the majority of cluster types the distribution is denser around a region opposite to the replication origin, while the regions flanking the replication origin tend to have fewer clusters. Exceptions from these patterns are represented by the lantipeptides, lassopeptides, aryl-polyenes and indoles, where about half of the clusters are located in a region near to the replication origin.

To finally compare BGC location with overall genome conservation within the phylogenetic groups, conserved regions and hypervariable regions were identified using a PARSNP core genome alignment. Because of the large genetic differences between the Amycolatopsis strains, it was not possible to detect genomic islands, although core regions and hypervariable regions were observed. It can be seen that the more closely related the strains, the smaller the hypervariable regions. It can be seen from Additional file 1: Figure S11 that for the majority of BGCs the location also corresponds with the hypervariable regions of the genome.

Discussion
Actinobacterial genome sequences have a much higher potential for the production of secondary metabolites than previously thought [35, 36]. With recent advances in bioinformatic search algorithms, it is possible to identify novel biosynthesis pathways based on predictions drawn from bioinformatics, and thereby guide the discovery of novel compounds. Nevertheless, little is known about the variety and the evolutionary interconnections between secondary metabolite gene clusters and species' phylogeny. Doroghazi and Metcalf were able to portray the huge diversity of secondary metabolites in different actinomycete genera, but it is also apparent that the genomes of a single bacterial genus can harbor a wealth of undiscovered secondary metabolites [14, 39]. In order to study the diversity and relationships of secondary metabolites we focused on the genus Amycolatopsis, which is already known to produce valuable secondary metabolites, and to harbor a yet unknown potential for the discovery of new natural products.

To draw a comprehensive picture of the phylogenetic relations between the sequenced members of the genus Amycolatopsis, an MLSA approach based on seven common housekeeping genes was used. At the 16S rRNA

Fig. 4 The relative location of common BGCs on linearized genomes and pseudocontigs of Amycolatopsis.
Examples of cluster families conserved in a phylogenetic group that also share the same location are highlighted with gray squares. Examples of cluster families with a random distribution pattern are highlighted with gray circles. The glycopeptide clusters, as an example of a cluster family with unusual distribution patterns, are highlighted with yellow stars

level the similarity between strains is around 97% or higher, hence discrimination based only on 16S rRNA data does not clearly identify relationships among members of the genus. In contrast, using MLSA, four major Amycolatopsis clades were detected. Furthermore, four isolates each formed a separate phylogenetic branch. By phylogenetic analysis based on 16S rRNA and an actinobacterial conserved gene, Tang et al. delineated three types of Amycolatopsis strains: the mesophilic and moderately thermophilic A. orientalis clade (AOS), the mesophilic A. taiwanensis clade (ATS), and the thermophilic A. methanolica subclade (AMS). In our study we were able to further distinguish members of the AOS clade in three different phylogenetic subclades (clade A, B and C). The AMS is represented by Amycolatopsis group D, and the ATS clade only by A. taiwanensis. ANIm values underpinned these results, as ANI values within the subgroups were much higher than between them. ANI values below the 95% threshold are commonly used for species delineation. On this basis, strains previously classified as A. orientalis HCCB10007, DSM 43388 and DSM 46075 were shown to be misclassified. No information regarding the original method of classification was available for A. orientalis DSM 43388 and DSM 46075. A. orientalis HCCB10007 was derived from the strain A. orientalis ATCC 43491 through physical and chemical mutagenesis.

Fig. 5 Relative location and density of all BGCs on the circular Amycolatopsis genomes.
a) Relative location of Amycolatopsis BGCs expressed as downstream distance (0.00–1.00) to the replication origin oriC (=0.00). b) BGC density on certain areas of the circular Amycolatopsis genome (Total). c) BGC density on certain areas of the circular Amycolatopsis genome (main BGC classes) . This strain has originally been classified as Streptomy- emphasize the need to set new standards for the taxo- ces orientalis, and has since been renamed twice (Nocardia nomic classification of bacterial strains using genome se- orientalis and Amycolatopsis orientalis)[20, 44]. Conse- quences . quently, we agree with the previous suggestion by Jeong et The majority of the Amycolatopsis strains were isolated al. that stains DSM 46075 and DSM 43388 belong to novel from different soil types, but no correlation was found Amycolatopsis species , while further studies are needed between their geographic distribution and phylogenetic to establish if strain HCCB10007 belongs to the species A. relationships though the aquatic isolates, A. halophila and keratiniphila. A. marina, did not cluster with the soil isolates. Tan et al. Furthermore, POCP analysis showed that A. halophila,  investigated the phylogenetic diversity of different which was first classified based on 16S rRNA sequencing Amycolatopsis strains isolated from the same geographical , might represent a novel genus. In their study, and ecological habitat based on 16S rRNA sequencing. evaluating the thresholds to define a novel genus based and showed that at the same site the strains fell into on the POCP values, Qin et al. suggested to consider the several phylogenetic groups which corresponded to the genome size for prokaryotic taxonomy . A. halophila four phylogenetic subclades found in this study. Taken YIM 93223 also has a much smaller genome than other together these results suggest that there is no correlation Amycolatopsis strains. 
Therefore, there is need to reevalu- between geography and phylogeny for Amycolatopsis soil ate the taxonomic status of this strain. Our results further isolates though phylogenetic diversity can be found in Adamek et al. BMC Genomics (2018) 19:426 Page 10 of 15 small, geographically close regions. The four Amycolatopsis highly dependent on the bacterial genus [16, 38]. It is clear sublineages are ubiquitously distributed and hence are not from this study that the capacity of members of the genus the consequence of adaption to a specific geographical Amycolatopsis to produce diverse secondary metabolites is region. In contrast, too little data are available to draw comparable to that of the genera Mycobacterium and conclusions about the distribution of the aquatic isolates. Streptomyces . Further, no correlation was found between the geographic When taking a closer look at the potential of Amycola- distribution of strains and that of their BGCs, though a topsis strains to synthesize secondary metabolites different correlation was found between the species’ phylogeny and trends are apparent in the diversity and distribution of the distribution of BGCs. Therefore, it can be concluded BGCs: I) Some BGCs were found in members of all four that taxonomy is a more important indicator of BGC distri- of the subgroups. These BGCs mainly encoded ectoines, bution than geographic origin. This phenomenon has also non-NRPS derived siderophores, terpenes and RiPPs; no been observed with the marine actinobacterium Salinispora PKS or NPRS clusters fell into this grouping. These BGCs . In general, these data support the view that geograph- probably play a universal role in the metabolism of Amy- ically distant but ecologically similar habitats share overlap- colatopsis, and therefore might be seen as core-secondary ping gene pools. . The rarefaction curves for all of the metabolite clusters. II) In contrast, a correlation with the phylogenetic groups (Fig. 
3c) showed that sampling more subgroup phylogeny was shown for most of the common Amycolatopsis genomes, will lead to the discovery of novel BGCs. These clusters have most likely been acquired BGCs even if the sampling was restricted to the same geo- through HGT in an ancestor strain, and have been graphic regions and soil types. retained throughout speciation. III) The extensive range Core−/pan-genome analysis revealed that members of of unique BGCs observed accounted for 67% of the di- the genus Amycolatopsis shared a core genome of 1212 verse Amycolatopsis GCFs and seemed to be derived from genes and a pan genome of 27,483 accessory and 33,342 recent HGT events. These clusters might be retained, if unique genes. So far only few core−/pan-genome studies they enhance the ability of strains to colonize ecological have been carried out for actinobacteria with comparably niches, or might be lost, and/or replaced if no such advan- large genomes (5–10 Mb). A study on 17 Streptomyces tage is realized . species revealed a core genome of 2018 genes, with 11,743 Two previous studies on the diversity of secondary in the accessory genome, and 20,831 in the unique gen- metabolites within actinobacterial taxa gave contradictory ome  while another one on 31 Streptomyces species results on the relationship between phylogeny and diversity revealed 2048 core genes, 9806 accessory and 17,840 of BGCs. Doroghazi et al., found that in 860 actinobacterial unique genes . Similarly, a comparative genomic ana- genomes BGC diversity for PKS and NRPS genes correlated lysis of 17 species of the genus Nocardiopsis revealed a with phylogeny at the species level thereby revealing the core genome of 1993 genes and a pan genome of over importance of secondary metabolites for speciation . In 22,000 genes . To identify and compare ortholog clus- contrast, Cimermancic et al. 
reported that the highest BGC ters, these studies used the pan genome analysis pipeline diversity was at the tips of phylogenetic trees, indicating PGAP . A second analysis using PGAP with 37 that their diversification is phylogeny independent . Amycoaltopsis genomes showed very similar results, albeit BGC diversity in the present study reflects both of these different exact numbers (Additional file 1: Figure S12). The trends suggesting that vertical gene transfer might be the core/pan-genome difference between both methods can be most important driver for the maintenance of common explained by leaving out A. nigrescens from the analysis and BGCs while recent HGT events independent of phylogeny, by the fact that the original NCBI annotations had to be as seen as through the singletons and, phylogenetically used to prepare the input data for PGAP. Both analyses re- independent cluster families might lead to further diversifi- veal a very small core genome compared to other studies. cation. The tendency of phylogenetically related BGCs to It is likely that this discrepancy results from the higher be located at the same position in the genomes of Amycola- number of genomes compared in our study, which usually topsis supports the hypothesis that these BGCs may have results in a lower core genome and shows the diversity of arisen from the same ancestral strain. At the same time the the genus. observation that BGCs which belong to the same cluster The Amycolatopsis pan-genome is quite large and is family are present in distinctly related strains is in line with still considered as “open”. This shows that members of their distribution by HGT. the genus have an extensive adaptive capacity. 
The COG Previous studies on the diversity and evolution of Sali- analysis (Additional file 1:FigureS3) showed that a nispora BGCs showed that a number of BGCs was fixed major part of the accessory and unique genes of the over globally distributed populations , though the Amycolatopsis strains are involved in secondary metab- highest diversity of Salinispora BGCs by far were derived olite biosynthesis and transport. Previous studies suggested from unique BGCs, on average 1–2 were found even that the diversity of secondary metabolites in bacteria is within highly conserved species . Adamek et al. BMC Genomics (2018) 19:426 Page 11 of 15 Similar observations to those outlined above can be Conclusions made for Amycolatopsis where BGC diversity is derived A comparative analysis of the genus Amycolatopsis and mostly from singleton BGCs. As Amycolatopsis strains its’ biosynthetic potential revealed a highly variable gene are not as closely related to one another as Salinispora content. All of the Amycolatopsis strains showed a small strains, an average of 6–7novel BGCs tend to be core-genome, but had a huge pan-genome indicating a present in new species though BGC fixation beyond great potential for the production of secondary metabo- the species level was observed within the phylogenetic lites. We were able to distinguish four phylogenetic subli- subgroups. neages within the genus Amycolatopsis, and four strains The majority of BGCs in Amycolatopsis genomes tend that formed distinct lineages in the phylogenetic tree. to be located in a region opposite the core region sur- When comparing the phylogenetic resolution with the rounding the origin of replication. This suggests that the potential of Amycolatopsis strains to produce secondary acquisition of BGCs via HGT occurs preferentially in metabolites an extensive diversity of BGCs was seen, most non-core regions of the genome. The distinction between of which comes from clusters unique to the genus. 
Hori- core- and non-core-regions has previously been proposed zontal and vertical gene transfer seem equally important for the genomes of A. mediterranei U32 , A. orientalis to drive and maintain the diversity of secondary metabo- HCCB10007 and A. methanolica 239 , where re- lites. Among the vertically inherited clusters, a few extend gions with a lower density of coding genes were observed across several phylogenetic lineages but most are specific and considered to be non-core-regions. In general, these for individual lineages. The observation that really novel regions correspond with the regions of high BGC diversity clusters acquired through HGT were detected shows that observed in the present study although the proposed related biosynthetic pathways can be transferred to unre- variable regions are larger than the non-core-regions lated strains through this mechanism. Further, it is evident proposed for strains U32, HCCB10007 and 239. A similar that novel BGCs are mainly, but not exclusively incorpo- phenomenon has been observed for Streptomyces where a rated into non-core hypervariable regions opposite the core region in the linear chromosome around the replica- replication origin on the circular Amycolatopsis genomes. tion origin is conserved, while the arms of the chromo- some display a high variability and contain the majority of Methods species specific sequences . In this same study, it was Amycolatopsis genomes also reported that the more phylogenetically distant the All of the Amycolatopsis genome sequences available in strains, the greater the size of the variable region. In the December 2016 at the National Center for Biotechnol- present study, it was found that within the closely related ogy Information (NCBI) database  and the DOE subgroups (groups A, B and D) the size of the hypervari- Joint Genome Institute -Integrated Microbial Genomes able region opposite the dnaA gene is smaller than in the & Microbiomes (JGI-IMG) database , were used. 
distantly related subgroup (group C). All in all, our study Draft genomes that consisted of more than 300 contigs is in agreement with the hypothesis that BGCs are located and sequences from single cell genomic approaches were mainly in the non-core region, probably because inser- omitted due to quality issues. tions in essential gene clusters would in most cases prove For the sequencing of the Amycolatopsis sp. H5 and to be lethal for the organism . However, the fact KNN 50.9b genomes, sequencing libraries were prepared that some BGCs, such as these coding for lantipeptides, are by applying Illumina TruSeq DNA PCR-Free Library mainly located in the core region shows that BGC-location Preparation Kits with a target insert size of 550 bp. Sub- is not exclusively found in the hypervariable regions in- sequent paired-end sequencing was performed on an dicating that insertions in core regions are not neces- Illumina HiSeq 1500 System (Illumina, San Diego, CA, sary lethal. USA) using HiSeq Reagent v3 Kits (Illumina, San Diego, In the present study it was not possible, as is the case CA, USA). Read length was 2× 250 bp. Base calling was of themorehighly conserved genus Salinispora , to performed with an in-house software platform . To detect precise genomic islands, given the extreme gen- assemble the resultant reads, the gsAssembler software etic variation and small core genome though hypervari- (Newbler) v2.8 was used. The genome sequence was able regions were evident within the genetic subgroups. submitted to the NCBI Prokaryotic Gene Annotation These hypervariable regions corresponded with the ma- Pipeline for annotation. jority of BGCs, but showed no consistent structural similarities, as corresponding flanking regions, or con- Comparative analysis of Amycolatopsis strains served mobile elements. 
To establish whether a “path- To elucidate the phylogenetic relationships between the way swapping” mechanism, as evident for Salinispora Amycolatopsis strains a multilocus sequence typing ap- , is also true for Amycolatopsis, a larger number of proach based on the concatenation of seven housekeep- more closely related strains needs to be analyzed. ing genes atpD, clpB, gapA, gyrB, nuoD, pyrH and rpoB Adamek et al. BMC Genomics (2018) 19:426 Page 12 of 15 was used. The single gene sequences were aligned using domains with the same modular position in the different ClustalW, embedded in MEGA6.0 software , trimmed clusters were compared. Clusters where the majority of with respect to the reading frame and subsequently KS and/or C domains shared a BLAST identity over 80% concatenated with the FaBox Fasta Alignment Joiner . were considered to belong to the same GCF. Results A maximum likelihood tree was generated using the were collected in a presence/absence matrix, with 1 repre- Tamura-Nei Model with NNI (Nearest Neighbor Inter- senting the presence and 0 the absence of a GCF member change) and 500 bootstrap replications was calculated in each of the Amycolatopsis strains. Hierarchical clus- with MEGA6.0 software. ter analysis using the DICE coefficient with UPGMA Core−/pan-genome analysis was performed using the (Unweighted Pair Group Method with Arithmetic mean) Bacterial Pan Genome Analysis (BPGA) tool . To was performed with PAST . Comparison of the Amy- avoid bias derived from different annotations all of the colatopsis phylogenetic tree with the BGC-dendrogram genome sequences were newly annotated using PROKKA was performed with Dendroscope v3.5.7, using the Tangle- 1.2 with default settings . As all six of the A. mediter- gram algorithm . ranei genomes were highly similar A. mediterranei S699 For genetic networking, the Pfam-domains of each was taken to represent the species to avoid bias. 
Ortholo- BGC were identified using HMMER 3.1b2  with the gous genes were identified with the USEARCH algorithm respective Hidden Markov Models (HMM) obtained  using a threshold of 0.5. Variations of the similarity from the Pfam database . A similarity index based threshold to 0.3, 0.4, 0.6 an 0.7 did not significantly alter on the absence or presence of Pfam domains was used the results, therefore the default threshold of 0.5 was to delineate BGC similarity, as previously described by chosen. Core−/pan-genome plots were calculated over Lin et al.  with the modifications of Cimermancic et 500 iterations. For comparative purposes an additional al. . A similarity threshold of 0.65 was chosen, because core−/pan-genome analysis was performed using the pan it best reflected the manually determined GCFs. The genome analysis pipeline PGAP . Runs were per- threshold was evaluated manually, as the threshold values formed using default settings under the MP and GF mode of 0.5  and 0.8  described in previous publications of PGAP. were not found to be suitable to distinguish between the To resolve the relationship of Amycolatopsis strains on Amycolatopsis BGCs. The resulting similarity matrix was the genus and species level the percentage of conserved visualized with Cytoscape 3.4.0 . proteins (POCP) was calculated as previously described Rarefaction curves displaying the relative BGC richness , and the Average Nucleotide Identity based on the for each phylogenetic group were calculated from the MUMmer algorithm (ANIm) was calculated with JSpecies BGC presence/absence matrix using EstimateS . using the default settings . Graphical visualization of ANIm values was implemented with R version 3.3.3 . 
BGC location BGC and GCF identification To schematically display the relative positions of the The biosynthetic gene clusters of all of the Amycolatopsis common BGC clusters on the Amycolatopsis genomes, strains were identified using antiSMASH 3.0 with default the approach previously described by Ziemert et al.  settings . Identified clusters were compared using was used. First, the draft genomes were assembled as MultiGeneBlast . Cluster boundaries were determined pseudocontigs on the phylogenetically closest complete as previously described  and clusters were manually genome as a reference using CONTIGuator v2.7 . trimmed using Artemis . The circular genomes were linearized, using the dnaA Assigning gene clusters to GCFs was based on manual gene as the start for each linearized pseudocontig. If inspection of the antiSMASH output files, a comparison necessary, the reverse complement sequence was used with multigeneblast and sequence comparison of KS and for genome alignment. Second, the position of the re- C domains was achieved using BLAST  and NaPDoS spective BGCs on the complete genomes and on the . The following criteria had to be met for BGC clus- pseudocontigs was annotated using geneious R9.1.6 ters to be assigned to the same gene cluster family: I) . Finally, the complete genomes and pseudocontigs The gene clusters had to have a similar architecture, II) were normalized in length to visually distinguish between The majority of genes included in the cluster needed to the relative position of the BGCs on the genomes and have the same function, but not necessarily in the same pseudocontigs. The contigs were aligned to the closest re- order. III) The majority of genes in the genome needed lated complete genome within the same phylogenetic to have a BLAST similarity of at least 50% identity over group. The circular genomes were linearized and normal- an 80% coverage rate. IV) For modular PKS, NRPS and ized in length. 
An overview of the of complete genome their hybrid clusters a BLAST similarity of the respective and pseudo contig synteny is shown in Additional file 1: KS and C domains was considered. Hence, KS and C Figure S13. Adamek et al. BMC Genomics (2018) 19:426 Page 13 of 15 To distinguish conserved regions from hypervariable This Whole Genome Shotgun project has been deposited at DDBJ/ ENA/GenBank under the accession NMUL00000000 (H5) and regions on the Amycolatopsis genomes and pseudocontigs, NMUK00000000 (KNN50.9b). The version described in this paper is andtoidentifygenomerearrangements, theHarvest toolkit version NMUL01000000 for Amycolatopsis sp. H5 and version containing the Parsnp v1.2 tool for core genome alignment NMUK01000000 for Amycolatopsis sp. KNN50.9b. and Gingr 1.2 for visualization was used . Due to the Authors’ contributions small core genome of Amycolatopsis, a core genome align- MAd carried out the comparative genomics analyses and wrote the paper. MAl ment for all of the strains was not feasible hence, core gen- wrote the bioinformatics scripts for the genomic analyses, and supported MAd with the comparative genomics analyses. HSA contributed the BGC density ome alignment for the phylogenetic subgroups that shared analysis and visualization with R. MG and ATB were responsible for overseeing 85% ANIm was performed. This excluded the genome the isolation and characterization of the novel Amycolatopsis sp. strains H5 and sequences of A. halophila, A. marina, A. nigrescens, A. sac- KNN50.9b. DW performed the POCP analysis. DW, AW and JK were responsible for sequencing the novel Amycolatopsis strains, and NZ guided the research chari, A. taiwanensis and A. xylanica from this analysis. and edited the manuscript. All authors read and approved the manuscript. BGC density plots were created with R version 3.3.3 . Thereby, the genome was divided into 8 regions, and density Ethics approval and consent to participate Not applicable. 
plots were built showing the abundance of BGCs in each re- gion, for each cluster type and for all cluster types in total. Competing interests The authors declare that they have no competing interests. Additional files Publisher’sNote Springer Nature remains neutral with regard to jurisdictional claims in Additional file 1: Supplementary Figures. Figure S1-S13 (PDF 6871 kb) published maps and institutional affiliations. Additional file 2: Table S1. Sequencing statistics for Amycolatopsis sp. H5 and Amycolatopsis sp. KNN50.9b. (DOCX 42 kb) Author details Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Additional file 3: Table S2. Basic features of Amycolatopsis genomes. Microbiology/Biotechnology, University of Tübingen, Tübingen, Germany. (DOCX 110 kb) German Centre for Infection Research (DZIF), Partner Site Tübingen, Additional file 4: Table S3. POCP analysis. (XLSX 20 kb) Tübingen, Germany. School of Biology, Newcastle University, Ridley Building 2, Newcastle upon Tyne NE1 7RU, UK. School of Biosciences, University of Kent, Canterbury CT2 7NJ, UK. Universität Bielefeld, Center for Biotechnology Abbreviations (CeBiTec), Bielefeld, Germany. AMS: Thermophilic Amycolatopsis methanolica subclade; ANI: Average Nucleotide Identity; AOS: Mesophilic/moderately thermophilic Amycolatopsis Received: 28 August 2017 Accepted: 21 May 2018 orientalis subclade; ATS: Thermophilic Amycolatopsis taiwanensis subclade; BGC: Biosynthetic gene cluster; GCF: Gene cluster family; HGT: Horizontal gene transfer; MIBiG: Minimum Information about a Biosynthetic Gene References cluster; NRPS: Non-ribosomal peptide synthetase; PKS: Polyketide synthase; 1. Katz L, Baltz RH. Natural product discovery: past, present, and future. J Ind POCP: Percentage of Conserved Proteins; RiPP: Ribosomally synthesized and Microbiol Biotechnol. 2016;43(2–3):155–76. post-translationally modified peptides 2. Newman DJ, Cragg GM. Natural products as sources of new drugs from 1981 to 2014. 
J Nat Prod. 2016;79(3):629–61. Acknowledgements 3. Spellberg B. The future of antibiotics. Crit Care. 2014;18(3):228. The authors would like to thank Timo Niedermeyer for establishing the 4. Medema MH, Fischbach MA. Computational approaches to natural product collaboration that secured the novel Amycolatopsis strains. Strains KNN 50.9b discovery. Nat Chem Biol. 2015;11(9):639–48. and H5 were isolated and characterized by Kanungnid Busarakam and 5. Ziemert N, Alanjary M, Weber T. The evolution of genome mining in Hamidah Idris of Newcastle University, UK. The authors are grateful to Evi microbes - a review. Nat Prod Rep. 2016;33(8):988–1005. Stegmann for valuable discussions. 6. Weissman KJ. The structural biology of biosynthetic megaenzymes. Nat Chem Biol. 2015;11(9):660–70. 7. Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, Camarero Funding JA, Campopiano DJ, Challis GL, Clardy J, et al. Ribosomally synthesized and This work was supported by the DZIF TTU 9.704. Alan T Bull and Michael post-translationally modified peptide natural products: overview and Goodfellow are grateful to the Leverhulme Trust for the award of Emeritus recommendations for a universal nomenclature. Nat Prod Rep. 2013;30(1):108–60. Fellowships. The bioinformatics support of the BMBF funded project 8. Daum M, Herrmann S, Wilkinson B, Bechthold A. Genes and enzymes ‘Bielefeld-Gießen Center for Microbial Bioinformatics – BiGi (Grant number involved in bacterial isoprenoid biosynthesis. Curr Opin Chem Biol. 031A533)’ within the German Network for Bioinformatics Infrastructure 2009;13(2):180–8. (de.NBI) is gratefully acknowledged. The funding bodies had no role in the 9. Baltz RH. Gifted microbes for genome mining and natural product design of the study, the preparation of the manuscript and the collection, discovery. J Ind Microbiol Biotechnol. 2017;44(4–5):573–88. analysis, and interpretation of data. 10. 
Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach MA, Muller R, Wohlleben W, et al. antiSMASH 3.0-a comprehensive resource Availability of data and materials for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. Data from the following public databases was used for the analyses: 2015;43(W1):W237–43. 11. Traxler MF, Kolter R. Natural products in soil microbe interactions and Complete and draft genomes from the Joint Genome Institute (JGI) evolution. Nat Prod Rep. 2015;32(7):956–70. genome portal http://genome.jgi.doe.gov/ 12. Davies J. Specialized microbial metabolites: functions and origins. J Antibiot Complete and draft genomes from the national (NCBI) assembly (Tokyo). 2013;66(7):361–4. database https://www.ncbi.nlm.nih.gov/assembly/ 13. Florez LV, Scherlach K, Gaube P, Ross C, Sitte E, Hermes C, Rodrigues A, Reference biosynthetic gene clusters from the Minimum Information Hertweck C, Kaltenpoth M. Antibiotic-producing symbionts dynamically about a Biosynthetic Gene Cluster Database (MIBiG) http:// transition between plant pathogenicity and insect-defensive mutualism. mibig.secondarymetabolites.org/ Nat Commun. 2017;8:15172. Adamek et al. BMC Genomics (2018) 19:426 Page 14 of 15 14. Ziemert N, Lechner A, Wietz M, Millan-Aguinaga N, Chavarria KL, Jensen PR. 35. Bentley SD, Chater KF, Cerdeno-Tarraga AM, Challis GL, Thomson NR, James Diversity and evolution of secondary metabolism in the marine actinomycete KD, Harris DE, Quail MA, Kieser H, Harper D, et al. Complete genome genus Salinispora. Proc Natl Acad Sci U S A. 2014;111(12):E1130–9. sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature. 15. Doroghazi JR, Albright JC, Goering AW, Ju KS, Haines RR, Tchalukov KA, Labeda 2002;417(6885):141–7. DP, Kelleher NL, Metcalf WW. A roadmap for natural product discovery based 36. Ikeda H, Ishikawa J, Hanamoto A, Shinose M, Kikuchi H, Shiba T, Sakaki Y, on large-scale genomics and metabolomics. Nat Chem Biol. 
2014;10(11):963–8. Hattori M, Omura S. Complete genome sequence and comparative analysis of the industrial microorganism Streptomyces avermitilis. Nat Biotechnol. 16. Cimermancic P, Medema MH, Claesen J, Kurita K, Wieland Brown LC, 2003;21(5):526–31. Mavrommatis K, Pati A, Godfrey PA, Koehrsen M, Clardy J, et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic 37. Jensen PR. Natural products and the gene cluster revolution. Trends gene clusters. Cell. 2014;158(2):412–21. Microbiol. 2016;24(12):968–77. 17. LPSN - List of procaryotic names with standing in nomenclature, Acessed 38. Doroghazi JR, Metcalf WW. Comparative genomics of actinomycetes with a May, 2017 [http://www.bacterio.net/amycolatopsis.html]. focus on natural product biosynthetic genes. BMC Genomics. 2013;14:611. 18. Bian J, Li Y, Wang J, Song FH, Liu M, Dai HQ, Ren B, Gao H, Hu X, Liu ZH, et 39. Komaki H, Ichikawa N, Oguchi A, Hamada M, Tamura T, Fujita N. Genome-based al. Amycolatopsis marina sp. nov., an actinomycete isolated from an ocean analysis of non-ribosomal peptide synthetase and type-I polyketide synthase gene sediment. Int J Syst Evol Microbiol. 2009;59(Pt 3):477–81. clusters in all type strains of the genus Herbidospora. BMC Res Notes. 2015;8:548. 19. Carlsohn MR,Groth I, TanGY, SchutzeB,Saluz HP,Munder T,YangJ, 40. Huang JR, Ming H, Li S, Zhao ZL, Meng XL, Zhang JX, Tang Z, Li WJ, Nie GX. Wink J, Goodfellow M. Amycolatopsis saalfeldensis sp. nov.,anovel Amycolatopsis xuchangensis sp. nov. and Amycolatopsis jiguanensis sp. nov., actinomycete isolated from a medieval alum slate mine. Int J Syst Evol isolated from soil. Antonie Van Leeuwenhoek. 2016;109(11):1423–31. Microbiol. 2007;57(Pt 7):1640–6. 41. Tang B, Xie F, Zhao W, Wang J, Dai S, Zheng H, Ding X, Cen X, Liu H, Yu Y, 20. Lechevalier MP, Prauser H, Labeda DP, Ruan J-S. Two new genera of et al. 
BMC Genomics – Springer Journals
Published: Jun 1, 2018