TY - JOUR AU - Pucker, Boas AB - Background Carotenoids and flavonoids are two important classes of plant pigments. Colorful flowers are often pigmented by one or both of these pigments. Flavonoids (anthocyanins) are known for blue, purple, red, and orange colors, while carotenoids can provide yellow, orange, and red pigmentation. Biological roles of carotenoids include photoprotection [1, 2], light capture [3], coloration of plant structures [4–6], and hormone precursors [7]. Therefore, carotenoids are considered part of the primary metabolism. In contrast, flavonoids are specialized plant metabolites that contribute to stress resilience and play important roles in the reproduction of many plants [8–10]. The biosynthesis of flavonoid aglycones [11, 12], the decoration with sugar and acyl groups [13–15], and the intracellular transport of flavonoids [16–18] have been reviewed elsewhere. The core of the flavonoid biosynthesis pathway is well conserved across land plant species, whereas the decorating reactions are often species-specific. Glycosylation and acylation modify the properties of flavonoids, but specific properties depend on the added sugar moiety or acyl group. Glycosylation can enhance the anthocyanin stability [19–21] and be required for intracellular flavonoid transport [17, 18]. Acylation can also boost the stability of anthocyanins [22]. Applications of anthocyanins as food colorants rely on a high stability. Biotechnological applications involving flavonoids are generally facilitated by knowledge about the genes involved in their biosynthesis. For example, such insights are crucial for the production of flavonoids in heterologous systems [23]. The flavonoid biosynthesis pathway is well studied in several model organisms and crops like Arabidopsis thaliana [18, 20], Petunia hybrida [11, 24, 25], Antirrhinum majus [26, 27], Vitis vinifera [28–30], and Zea mays [20, 31]. An ongoing challenge is the efficient transfer of this knowledge to other plant species. In fact, the functional annotation transfer challenge goes far beyond knowledge about the flavonoid biosynthesis. Plant genome sequences are becoming available with an increasing pace [32, 33], but the functional annotation is lacking behind. Automatic approaches for biological interpretation of these data sets are crucial to exploit the full value of genomic resources. There are several comprehensive databases collecting and sharing knowledge about plant metabolism, e.g. The Plant Metabolic Network [34], Plant Reactome [35], and MetaCyc [36]. Tools like InterProScan5 [37], Blast2GO [38], KEGG Automatic Annotation Server [39], and Mercator4 [40] allow the assignment of certain annotation terms to input sequences. While these tools are able to annotate genome-scale data sets, they are less specific with respect to the assigned annotation. Researchers looking for a specific pathway could benefit from tools focusing on their pathway of interest. Here, we present an updated approach for the automatic annotation of the flavonoid biosynthesis genes in any land plant species of interest based on prior knowledge about the pathway. We also demonstrate that this approach can be applied to other pathways by investigating the carotenoid biosynthesis genes in multiple land plant species. A web server makes this automatic annotation workflow available to the research community. Materials and methods Validation of KIPEs3 results through manual inspection A validation of KIPEs3 was based on Arabidopsis thaliana Nd-1 v1.0 [41], Capsella grandiflora v1.1 [42], Daucus carota v2.0 [43], and Dioscorea dumetorum v1.0 [44]. Identified candidates were manually inspected in a phylogenetic tree constructed by FastTree2 [45] based on alignments generated with MAFFT v7 [46]. Run time analysis of KIPEs3 The average run time of KIPEs3 per data set was assessed using a previously defined set of 121 plant species [47]. All these sequence files were retrieved from Phytozome [48]. The analysis was performed on coding sequences to represent transcriptome assemblies. Performance of KIPEs3 with FastTree v2.1.10 [45] was analyzed using default parameters. Run time and the number of sequences in the analyzed data sets were documented. Run times were measured for analyses with the construction of phylogenetic trees for all final candidate genes. Comparison of KIPEs3 predictions against annotation databases The accuracy of KIPEs3 predictions was compared against MetaCyc [49], Plant Reactome [35], and Plant Metabolic Network (PMN) [34]. FLS was selected as a representative gene of the flavonoid biosynthesis and LYC-b was selected as a representative gene of the carotenoid biosynthesis. The compared species were determined by overlap between the databases with respect to the covered species. The usage of the same genome sequence and annotation for a species across different databases was important. FLS annotations in Arabidopsis thaliana, Cicer arietinum, and Citrus sinensis were compared. The LYC-b annotations in A. thaliana, Solanum lycopersicum, and Daucus carota were compared. The candidates were searched using the Enzyme Commission numbers, EC 1.14.20.6 for FLS and EC 5.5.1.19 for LYC-b in Metacyc and PMN. In Plant Reactome, the genes were searched using the terms “flavonol synthase” and “lycopene beta cyclase” in the search bar. KIPEs3 v0.35 was run as a command-line tool providing flavonoid biosynthesis baits, residues, and the predicted peptide sequences of the species to identify FLS candidates in each species. The LYC-b candidates were annotated in another run based on the carotenoid biosynthesis bait sequences, residues, and the predicted peptide sequences of the respective species. The FLS and LYC-b candidates listed in the three databases and identified by KIPEs3 were aligned using MAFFT v7 [46] to discriminate functional proteins and the products of pseudogenes by identifying FLS and LYC-b specific motifs and residues, respectively. A phylogenetic tree of the LYC-b candidates was generated using FastTree v2.1 [45]. Gene expression analysis Gene expression was investigated in all plant species that were used for comparison of KIPEs3 against MetaCyc, Plant Reactome, and Plant Metabolic Network. As basis for the gene expression served the A. thaliana TAIR10 sequence combined with the Araport11 annotation [50], Cicer arietinum v1.0 (phytozome 492) [51], Citrus sinensis v1.1 (phytozome 154) [52], Solanum lycopersicum (phytozome 514 ITAG3.2) [53], and Daucus carota v2.0 (phytozome 388) [43]. All available RNA-seq data sets were retrieved from the Sequence Read Archive (SRA) via fastq-dump [54] (S1 File). Validation of KIPEs3 results through manual inspection A validation of KIPEs3 was based on Arabidopsis thaliana Nd-1 v1.0 [41], Capsella grandiflora v1.1 [42], Daucus carota v2.0 [43], and Dioscorea dumetorum v1.0 [44]. Identified candidates were manually inspected in a phylogenetic tree constructed by FastTree2 [45] based on alignments generated with MAFFT v7 [46]. Run time analysis of KIPEs3 The average run time of KIPEs3 per data set was assessed using a previously defined set of 121 plant species [47]. All these sequence files were retrieved from Phytozome [48]. The analysis was performed on coding sequences to represent transcriptome assemblies. Performance of KIPEs3 with FastTree v2.1.10 [45] was analyzed using default parameters. Run time and the number of sequences in the analyzed data sets were documented. Run times were measured for analyses with the construction of phylogenetic trees for all final candidate genes. Comparison of KIPEs3 predictions against annotation databases The accuracy of KIPEs3 predictions was compared against MetaCyc [49], Plant Reactome [35], and Plant Metabolic Network (PMN) [34]. FLS was selected as a representative gene of the flavonoid biosynthesis and LYC-b was selected as a representative gene of the carotenoid biosynthesis. The compared species were determined by overlap between the databases with respect to the covered species. The usage of the same genome sequence and annotation for a species across different databases was important. FLS annotations in Arabidopsis thaliana, Cicer arietinum, and Citrus sinensis were compared. The LYC-b annotations in A. thaliana, Solanum lycopersicum, and Daucus carota were compared. The candidates were searched using the Enzyme Commission numbers, EC 1.14.20.6 for FLS and EC 5.5.1.19 for LYC-b in Metacyc and PMN. In Plant Reactome, the genes were searched using the terms “flavonol synthase” and “lycopene beta cyclase” in the search bar. KIPEs3 v0.35 was run as a command-line tool providing flavonoid biosynthesis baits, residues, and the predicted peptide sequences of the species to identify FLS candidates in each species. The LYC-b candidates were annotated in another run based on the carotenoid biosynthesis bait sequences, residues, and the predicted peptide sequences of the respective species. The FLS and LYC-b candidates listed in the three databases and identified by KIPEs3 were aligned using MAFFT v7 [46] to discriminate functional proteins and the products of pseudogenes by identifying FLS and LYC-b specific motifs and residues, respectively. A phylogenetic tree of the LYC-b candidates was generated using FastTree v2.1 [45]. Gene expression analysis Gene expression was investigated in all plant species that were used for comparison of KIPEs3 against MetaCyc, Plant Reactome, and Plant Metabolic Network. As basis for the gene expression served the A. thaliana TAIR10 sequence combined with the Araport11 annotation [50], Cicer arietinum v1.0 (phytozome 492) [51], Citrus sinensis v1.1 (phytozome 154) [52], Solanum lycopersicum (phytozome 514 ITAG3.2) [53], and Daucus carota v2.0 (phytozome 388) [43]. All available RNA-seq data sets were retrieved from the Sequence Read Archive (SRA) via fastq-dump [54] (S1 File). Results and discussion KIPEs was developed for the automatic identification and annotation of genes/enzymes in the flavonoid biosynthesis of a plant species. Orthologs of well characterized sequences are identified as initial candidates in the species of interest. Next, these candidates are screened for functionally important amino acid residues which enables a prediction about the functionality or non-functionality of a candidate. KIPEs enables the analysis of large numbers of sequence datasets with a constant quality. Extended bait sequence collection The initial KIPEs version was exclusively focused on annotating the biosynthesis of flavonoid aglycones [12]. Now, KIPEs3 is able to identify additional candidates that are relevant for the decoration of flavonoid aglycones. The basis for the identification of candidates in new data sets are bait sequences that are previously characterized sequences from other species. The initial KIPEs bait sequence collection was extended by adding anthocyanidin rhamnosyltransferase (UGT79B2 and UGT79B3), anthocyanin 5-O-glucosyltransferase (UGT75C1), flavonoid 3-O-glucosyltransferase (UGT78D1 and UGT78D2), UGT72L1, UGT80B1/TT15, anthocyanin acyl transferase (AAT), anthocyanin O-methyl transferase (OMT), BAHD acyltransferase, glutathione S-transferase (GST)-like ligandins/TT19, TT10, anthocyanin transporting ABC transporter, anthocyanin and PA precursor transporting MATE, P3A-ATPase AHA10/TT13, and GFS9/TT9 (Table 1). Download: PPT PowerPoint slide PNG larger image TIFF original image Table 1. Flavonoid biosynthesis steps in the KIPEs3 repertoire. Steps added during the development of KIPEs3 are indicated with an asterisk next to the label. Steps of the core flavonoid aglycone biosynthesis were part of the initial KIPEs release [12]. Required bait sequences and related information are available via GitHub [55]. https://doi.org/10.1371/journal.pone.0294342.t001 Major technical updates The initial Python2 implementation of KIPEs [12] was converted into a Python3 implementation. Syntax changes and the replacement of modules were an integral part of this transformation process. This technical upgrade ensures that more users are able to run KIPEs without installing an outdated Python version. It is also important for long term preservation of KIPEs and to avoid security issues. While upgrading the code to the latest Python version, we also added new features. The established KIPEs components form the basis of KIPEs3 starting with the identification of orthologs for all bait sequences in the new species of interest (Fig 1). Each step in the pathway (enzyme/gene) is represented by one FASTA file containing numerous bait sequences from different plant species. Each collection of bait sequences is used to identify orthologs, i.e. most likely candidates, in the species of interest. This approach is based on the established assumption that orthologs are likely to have the same function (see [12] for details). This identification of orthologs is now based on the widely used Python module dendropy [73]. However, KIPEs3 is not restricted to finding orthologs, but checks a larger set of homologs for their overall sequence similarity and the presence of functionally relevant amino acids. The function of an enzyme is not determined by overall sequence similarity, but can depend on a few essential amino acid residues. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 1. Simplified illustration of central KIPEs3 steps and functions that are included in the standalone and in the online version. Transcriptome or peptide sequences are supplied by the user. Initial candidates are identified by BLAST and checked through global alignments, phylogenetic trees, and the inspection of functionally relevant amino acid residues. Phylogenetic trees are constructed for all final candidates. The workflow illustration is an extended version of the one published with the initial KIPEs version[12]. https://doi.org/10.1371/journal.pone.0294342.g001 A new option allows users to provide expression data that are subjected to an all-vs-all co-expression analysis among the identified candidates. It is generally expected that genes associated with following or preceding steps of the same branch in a biosynthesis pathway show a similar expression pattern, i.e. co-expression with other genes in this pathway branch. A strong co-expression value can further boost the reliability of identified candidates. KIPEs3 will produce a file that can serve as Cytoscape [74] input to visualize the gene expression network (Fig 2). This co-expression feature can also help to identify the most important candidate if multiple close paralogs exist for a given function. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 2. Co-expression network inferred by KIPEs3 and visualized by Cytoscape. Flavonol biosynthesis genes in Malus domestica (A) and Manihot esculenta (B) were analyzed based on co-expression. The anthocyanin biosynthesis genes were studied in Musa acuminata (C) and Theobroma cacao (D). https://doi.org/10.1371/journal.pone.0294342.g002 Many KIPEs users expressed an interest in phylogenetic trees that place the identified candidates in the context of previously characterized homologous sequences. These trees can help to explore the relationships of multiple candidates to previously characterized bait or reference sequences. KIPEs3 can automatically construct trees of the identified candidates and the corresponding bait sequences for all steps in the pathway. This enables a manual inspection without the need for additional phylogenetic analyses. KIPEs3 generates files in the Newick format which can be visualized and modified for publication with established tools like FigTree or iTOL [75]. KIPEs is able to start with coding sequences or with peptide sequences. It is even possible to provide a genome sequence, but performing the gene prediction with dedicated tools is recommended for most applications. Generally, it is best to provide peptide sequences as this speeds up the analysis. Users define the type of input sequences that they are providing. Since this information might not always be accurate, an option was added to check if the provided sequences comprise amino acids or nucleotides. A warning is shown if the result of this analysis contradicts the sequence type set by the user. However, it remains unfeasible to distinguish a collection of coding sequences from a highly fragmented genome sequence assembly based on short reads. Therefore, it remains the users’ responsibility to set the correct sequence type. Given the power of modern hardware, the computational resources required for running KIPEs are neglectable. A single core and less than 2GB of memory are already sufficient for the analysis. The entire analysis is completed within minutes, but the precise run time depends on the specific settings and size of the analyzed data set (Fig 3). Especially large data sets can cause run times of about one hour, but most plant species can be analyzed in less than 20 minutes (Fig 3A). This run time analysis was performed on Phytozome data sets of a wide range of plant species to simulate a realistic application of KIPEs3. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 3. KIPEs3 run time analysis based on 121 Phytozome data sets (A) revealed that the run time is depending on the number of sequences in the input data set (B). https://doi.org/10.1371/journal.pone.0294342.g003 Comparison of KIPEs3 to other databases MetaCyc, Plant Metabolic Network (PMN), and Plant Reactome are comprehensive databases describing the pathways, enzymes, and substrate compounds. While MetaCyc contains information about all domains of life, PMN and Plant Reactome are specific to plants and green algae. KIPEs3 is a specialized tool which provides individual gene level annotation for a user defined biosynthesis pathway in any species. MetaCyc contains only genes with experimentally determined functions in the represented pathways. PMN comprises 126 species-specific datasets and houses PlantCyc which covers over 500 additional plant species. PlantCyc harbors experimentally supported and predicted gene functions. The predicted enzymes are classified using the Ensemble Enzyme Prediction Pipeline (E2P2) software and RPSD v4.2 [76]. E2P2 uses BLAST [77] and PRIAM [78] to assign functions using sequence similarity to previously-known genes sourced from MetaCyc, BRENDA [79], and SwissProt [80] among others. The genome sequences used for this annotation are selected on the basis of at least 75% BUSCO (Benchmarking Universal Single-Copy Orthologs) completeness [34]. Plant Reactome uses Oryza sativa as a reference species to predict enzymes in 120 other species currently available in the database. Enzyme prediction in a species is based on orthologs of that enzyme in rice. Enzyme prediction in KIPEs3 is more stringent as it starts with ortholog identification based on multiple experimentally identified genes defined in literature (bait sequences). Ortholog identification is followed by an in-depth sequence similarity analysis including a check of the presence of functionally relevant amino acids. Only those candidates which pass all these steps are reported as most likely functional. Moreover, KIPEs3 can also identify similar candidates that do not fulfill all selection criteria. Contrary to most databases, researchers can use their own assembled dataset when running KIPEs3. This option enables the functional annotation of new genome sequences to identify a particular pathway of interest, given some prior knowledge about the pathway. Some features of these databases compared to KIPEs are summarized in Table 2. Download: PPT PowerPoint slide PNG larger image TIFF original image Table 2. Comparison of KIPEs3 against established pathway databases. https://doi.org/10.1371/journal.pone.0294342.t002 Functional annotation performance of KIPEs3 compared to other databases We compared the annotation results for FLS and LYC-b in terms of correctness of the annotation. Functional annotation correctness of a candidate can be assessed (a) in terms of correct amino acid residues required for a functional enzyme, (b) in terms of its position in a phylogenetic tree with characterized orthologous sequences, and (c) in terms of expression pattern. A higher gene expression is usually suggesting functional relevance of a given candidate. Co-expression with other members of the same biosynthesis pathways is also helpful in identifying the most promising candidates. We followed a combined approach for LYC-b and FLS annotation. The annotation correctness was assessed by the co-expression of a given candidate with other genes of the flavonoid or carotenoid biosynthesis, respectively. The presence of functionally relevant amino acid residues served as the final and deciding factor for declaring a candidate gene as “functional”. Lycopene β cyclase (LYC-b) is one of the key enzymes in the carotenoid biosynthesis pathway and catalyzes the β-cyclization of both ends of lycopene to produce β-lycopene. LYC-b is known to have conserved domains such as “dinucleotide binding motif”, “Cyclase motif I and II”, “Conserved region β-LYC”, and a “β cyclase motif” [81]. Since studies about the amino acid residues responsible for catalytic sites are still lacking, a phylogenetic tree of LYC-b candidates was constructed to identify likely functional orthologs. The carotenoid genes, neoxanthin synthase (NSY) and lycopene ε-cyclase (LYC-ε), were taken as an outgroup due to their high structure similarity to LYC-b [82] (S2 and S3 Files). FLS is a key enzyme of the flavonol biosynthesis pathway and catalyzes the oxidation of dihydroflavonols to flavonols. FLS belongs to the 2-oxoglutarate dependent dioxygenase (2-ODD) superfamily. They are characterized by conserved 2-oxoglutarate binding residues (Arg287 and Ser289) and conserved ferrous iron-binding residues (His221, Asp223, and His277) [83]. A functional FLS also possesses residues responsible for proper folding of 2-ODD polypeptide (Gly68, His75, Pro207, and Gly261) [84]. FLS specific motifs, “PxxxIRxxxEQP” and “SxxTxLVP”, distinguish FLS from other plant 2-ODDs like F3H, ANS, and FNS II (S4 File). We used two metrics (specificity and accuracy) to characterize the annotation performance of the different tools and databases. Specificity describes how well the database/tool can annotate a functional enzyme. It is calculated as the ratio of the number of correct candidates predicted by the database/tool to the total number of functional candidates present in that species. Accuracy denotes the ability of a database/tool to discriminate between functional and non-functional candidates. It is calculated as the ratio of correct predictions to the total predictions made by a database/tool for a given species and a gene in a pathway. Finally, the annotation performance is calculated as the average sum of specificity and accuracy. For LYC-b annotation, we found that the performance of Plant Reactome and KIPEs was 100% in all three species, Arabidopsis, tomato, and carrot. MetaCyc showed 100% annotation performance for Arabidopsis, 75% for tomato, and dropped to 0% for carrot as it did not annotate any LYC-b gene in carrot. Despite the high specificity (100%) of Plant Metabolic Network, the performance was between 65–75% for all three species due to low accuracy. (Fig 4(A)–4(F)). KIPEs3 maintained its 100% performance in all three species, Arabidopsis, chickpea, and orange, used for FLS annotation as well. Like KIPEs3, MetaCyc provides a correct Arabidopsis FLS annotation. However, MetaCyc does not report any FLS in chickpea and orange, while KIPEs3 identified candidates in both species. Plant Reactome was neither specific nor accurate in FLS annotation and failed to annotate any candidate in any of the above species. Plant Metabolic Network showed low performance (~0–60%) again due to low accuracy (Fig 4 (G)–4(I)). Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 4. Functional annotation tool comparison using the LYC-b gene from the carotenoid biosynthesis pathway and the FLS gene from the flavonoid biosynthesis pathway. Comparison of LYC-b annotation (a-f) between KIPEs and three databases. The number of annotated (red dashed line) and correctly annotated (black solid line) LYC-b genes is displayed per tool/database for Arabidopsis thaliana (a), Solanum lycopersicum (b), and Daucus carota (c). The corresponding gene expression (TPM) of all candidate genes was investigated to differentiate between genes and potential pseudogenes (d-f) for the three species, respectively. Functional gene IDs are written in boldface. Comparison of FLS annotation (g-l) between KIPEs and three databases. The number of annotated (red dashed line) and correctly annotated (black solid line) FLS genes is displayed per tool/database for Arabidopsis thaliana (g), Cicer arietinum (h), and Citrus sinensis (i). The corresponding gene expression (TPM) of all candidate genes was investigated to differentiate between genes and potential pseudogenes (j-l) for the three species, respectively. Functional gene IDs are written in boldface. https://doi.org/10.1371/journal.pone.0294342.g004 Plant Metabolic Network exhibited the least accuracy which is likely due to the function prediction approach using sequence similarity alone which can lead to inaccurate prediction. Highly similar enzymes belonging to the same family might have similar sequences, but may have different functions in vivo. The performance of Plant Reactome was also low in case of FLS annotation emphasizing the downfall of using only rice in curation and prediction, as genes in species distant from rice will be predicted less reliably and genes missing in rice cannot be predicted in other species. MetaCyc contains only experimentally evidenced genes, hence has high accuracy but low specificity. This also leads to annotation of a limited number of genes and species. In summary, KIPEs3 showed the best performance for functional annotation of genes having both high accuracy as well as specificity. This makes KIPEs3 a good tool for annotating novel genome sequences with focus on a particular pathway with a substantial amount of prior knowledge about the involved genes in other species. KIPEs3 is available as an online service KIPEs3 is provided as an online service hosted on a webserver at TU Braunschweig [85]. Users can upload their FASTA files containing coding sequences or peptide sequences of a species. Bait sequence sets are stored on the server and can be directly used without the need to be manually uploaded by the user. Optional parameters are pre-configured to allow an automatic analysis with default settings, but can also be modified by the user through the web interface if necessary. Users may opt-in to receive email notifications once their analysis is finished and the results are available. Alternatively, users may choose to store a cookie to allow their browser to keep track of the progress. The archive provided for download includes all result files that users would get when running KIPEs3 locally. The web server is implemented in Python3 using the Django web framework v4 [86]. Whenever a user uploads a file and submits a new job through the form on the website, the form data is sent to the server via an HTTP POST request and the file is temporarily stored on the server machine. The job is placed into a queue and executed as soon as there are enough free resources (disk space, memory, and CPU cores) available. KIPEs3 and all its dependencies are installed on the server machine. The user is redirected to a page with a console showing the command that is executed, and the program output is displayed in real time using XMLHttpRequests. All user supplied data are deleted from the server 48 hours after the job has been finished and the results have been downloaded. Analysis of the flavonoid biosynthesis in A. thaliana Nd-1, Capsella rubella, and Dioscorea dumetorum First, a technical check was performed by analyzing the flavonoid biosynthesis in the A. thaliana accession Niederzenz-1 (Nd-1). It is expected that orthologs of all characterized genes of the reference accession Columbia-0 (Col-0) are also present in Nd-1, while the sequences are slightly different from the bait sequences. All genes known in Col-0 were also discovered in Nd-1 (S2 File). Next, Capsella grandiflora was analyzed with KIPEs3 to show that the identification of flavonoid biosynthesis genes works also in species that are not represented in most of the bait sequence sets like A. thaliana. Genes present in A. thaliana were also discovered in the close relative C. grandiflora (S3 File). To demonstrate the potential of KIPEs3 to annotate the flavonoid biosynthesis in a distant relative of A. thaliana, the flavonoid biosynthesis of Dioscorea dumetorum, which belongs to the monocots, was analyzed (S5 File). The new flavonoid biosynthesis pathway steps added to the repertoire of KIPEs3 include the diverse and promiscuous decoration enzymes like glycosyltransferases, acyltransferases, and methyltransferases. Since some of these steps might be species-specific, it is possible that some steps are missing in a given plant species of interest. This requires specific attention when interpreting the KIPEs3 results to avoid false positives. While the aglycon biosynthesis is highly conserved and catalyzed by well characterized enzymes, flavonoid decorating enzymes are less studied and show a high level of promiscuity [87–89]. This poses a challenge for the identification through sequence similarity. Little is known about the functionally relevant amino acids in the flavonoid decorating enzymes. In addition, this extension of the KIPEs3 flavonoid biosynthesis bait sequence sets is no longer restricted to enzymes, but also includes flavonoid transporters. They do not have an active center that could be composed of functionally relevant amino acids. This turns the reliable identification of transport-associated proteins into a remaining challenge. Analysis of the carotenoid biosynthesis A novel set of bait sequences was compiled to enable the annotation of the carotenoid biosynthesis with KIPEs3 (Table 3). The carotenoid biosynthesis is well studied in many plant species including A. thaliana [90]. The set of compiled bait and reference sequences was evaluated by running the annotation workflow on A. thaliana Nd-1, Daucus carota, C. rubella, and D. dumetorum (Fig 5; S3 File). Daucus carota carotenoid biosynthesis genes previously described in the literature [91] were recovered by KIPEs3 (S6 File). This supports the accuracy of the candidate gene identification. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 5. Carotenoid biosynthesis genes in Arabidopsis thaliana Nd-1. The carotenoid biosynthesis comprises a linear part (A) and a branching part (B). https://doi.org/10.1371/journal.pone.0294342.g005 Download: PPT PowerPoint slide PNG larger image TIFF original image Table 3. Carotenoid biosynthesis genes. 1 most likely non-functional 2 BCH duplication after monocot-dicot split. https://doi.org/10.1371/journal.pone.0294342.t003 Transcriptional regulation of biosynthesis pathways Many plant biosynthesis pathways are regulated at the transcriptional level by members of large transcription factor families. Examples are the R2R3-MYBs and bHLHs that form complexes involved in the control of the flavonoid biosynthesis [106]. As we demonstrated previously, it is possible to analyze large gene families like MYBs and bHLHs with KIPEs [12]. However, KIPEs3 is optimized for the analysis of pathways that comprise genes with low copy numbers. Most structural genes in these pathways are part of a small gene family. Large gene families like the P450 cytochromes (CYPs) or 2-oxoglutarate dioxygenases (2-ODDs) comprise multiple smaller subfamilies with distinct biological functions. Only the individual subfamilies are relevant for the analysis with KIPEs3. Ideally, the application of KIPEs3 should be restricted to the annotation of structural genes in a biosynthesis. While it is possible to investigate large transcription factor families, dedicated tools will perform better. Members of the MYB and bHLH gene families can be annotated by specific tools [47]. General phylogeny-based approaches can be deployed to study other involved transcription factor families [107, 108]. Extended bait sequence collection The initial KIPEs version was exclusively focused on annotating the biosynthesis of flavonoid aglycones [12]. Now, KIPEs3 is able to identify additional candidates that are relevant for the decoration of flavonoid aglycones. The basis for the identification of candidates in new data sets are bait sequences that are previously characterized sequences from other species. The initial KIPEs bait sequence collection was extended by adding anthocyanidin rhamnosyltransferase (UGT79B2 and UGT79B3), anthocyanin 5-O-glucosyltransferase (UGT75C1), flavonoid 3-O-glucosyltransferase (UGT78D1 and UGT78D2), UGT72L1, UGT80B1/TT15, anthocyanin acyl transferase (AAT), anthocyanin O-methyl transferase (OMT), BAHD acyltransferase, glutathione S-transferase (GST)-like ligandins/TT19, TT10, anthocyanin transporting ABC transporter, anthocyanin and PA precursor transporting MATE, P3A-ATPase AHA10/TT13, and GFS9/TT9 (Table 1). Download: PPT PowerPoint slide PNG larger image TIFF original image Table 1. Flavonoid biosynthesis steps in the KIPEs3 repertoire. Steps added during the development of KIPEs3 are indicated with an asterisk next to the label. Steps of the core flavonoid aglycone biosynthesis were part of the initial KIPEs release [12]. Required bait sequences and related information are available via GitHub [55]. https://doi.org/10.1371/journal.pone.0294342.t001 Major technical updates The initial Python2 implementation of KIPEs [12] was converted into a Python3 implementation. Syntax changes and the replacement of modules were an integral part of this transformation process. This technical upgrade ensures that more users are able to run KIPEs without installing an outdated Python version. It is also important for long term preservation of KIPEs and to avoid security issues. While upgrading the code to the latest Python version, we also added new features. The established KIPEs components form the basis of KIPEs3 starting with the identification of orthologs for all bait sequences in the new species of interest (Fig 1). Each step in the pathway (enzyme/gene) is represented by one FASTA file containing numerous bait sequences from different plant species. Each collection of bait sequences is used to identify orthologs, i.e. most likely candidates, in the species of interest. This approach is based on the established assumption that orthologs are likely to have the same function (see [12] for details). This identification of orthologs is now based on the widely used Python module dendropy [73]. However, KIPEs3 is not restricted to finding orthologs, but checks a larger set of homologs for their overall sequence similarity and the presence of functionally relevant amino acids. The function of an enzyme is not determined by overall sequence similarity, but can depend on a few essential amino acid residues. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 1. Simplified illustration of central KIPEs3 steps and functions that are included in the standalone and in the online version. Transcriptome or peptide sequences are supplied by the user. Initial candidates are identified by BLAST and checked through global alignments, phylogenetic trees, and the inspection of functionally relevant amino acid residues. Phylogenetic trees are constructed for all final candidates. The workflow illustration is an extended version of the one published with the initial KIPEs version[12]. https://doi.org/10.1371/journal.pone.0294342.g001 A new option allows users to provide expression data that are subjected to an all-vs-all co-expression analysis among the identified candidates. It is generally expected that genes associated with following or preceding steps of the same branch in a biosynthesis pathway show a similar expression pattern, i.e. co-expression with other genes in this pathway branch. A strong co-expression value can further boost the reliability of identified candidates. KIPEs3 will produce a file that can serve as Cytoscape [74] input to visualize the gene expression network (Fig 2). This co-expression feature can also help to identify the most important candidate if multiple close paralogs exist for a given function. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 2. Co-expression network inferred by KIPEs3 and visualized by Cytoscape. Flavonol biosynthesis genes in Malus domestica (A) and Manihot esculenta (B) were analyzed based on co-expression. The anthocyanin biosynthesis genes were studied in Musa acuminata (C) and Theobroma cacao (D). https://doi.org/10.1371/journal.pone.0294342.g002 Many KIPEs users expressed an interest in phylogenetic trees that place the identified candidates in the context of previously characterized homologous sequences. These trees can help to explore the relationships of multiple candidates to previously characterized bait or reference sequences. KIPEs3 can automatically construct trees of the identified candidates and the corresponding bait sequences for all steps in the pathway. This enables a manual inspection without the need for additional phylogenetic analyses. KIPEs3 generates files in the Newick format which can be visualized and modified for publication with established tools like FigTree or iTOL [75]. KIPEs is able to start with coding sequences or with peptide sequences. It is even possible to provide a genome sequence, but performing the gene prediction with dedicated tools is recommended for most applications. Generally, it is best to provide peptide sequences as this speeds up the analysis. Users define the type of input sequences that they are providing. Since this information might not always be accurate, an option was added to check if the provided sequences comprise amino acids or nucleotides. A warning is shown if the result of this analysis contradicts the sequence type set by the user. However, it remains unfeasible to distinguish a collection of coding sequences from a highly fragmented genome sequence assembly based on short reads. Therefore, it remains the users’ responsibility to set the correct sequence type. Given the power of modern hardware, the computational resources required for running KIPEs are neglectable. A single core and less than 2GB of memory are already sufficient for the analysis. The entire analysis is completed within minutes, but the precise run time depends on the specific settings and size of the analyzed data set (Fig 3). Especially large data sets can cause run times of about one hour, but most plant species can be analyzed in less than 20 minutes (Fig 3A). This run time analysis was performed on Phytozome data sets of a wide range of plant species to simulate a realistic application of KIPEs3. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 3. KIPEs3 run time analysis based on 121 Phytozome data sets (A) revealed that the run time is depending on the number of sequences in the input data set (B). https://doi.org/10.1371/journal.pone.0294342.g003 Comparison of KIPEs3 to other databases MetaCyc, Plant Metabolic Network (PMN), and Plant Reactome are comprehensive databases describing the pathways, enzymes, and substrate compounds. While MetaCyc contains information about all domains of life, PMN and Plant Reactome are specific to plants and green algae. KIPEs3 is a specialized tool which provides individual gene level annotation for a user defined biosynthesis pathway in any species. MetaCyc contains only genes with experimentally determined functions in the represented pathways. PMN comprises 126 species-specific datasets and houses PlantCyc which covers over 500 additional plant species. PlantCyc harbors experimentally supported and predicted gene functions. The predicted enzymes are classified using the Ensemble Enzyme Prediction Pipeline (E2P2) software and RPSD v4.2 [76]. E2P2 uses BLAST [77] and PRIAM [78] to assign functions using sequence similarity to previously-known genes sourced from MetaCyc, BRENDA [79], and SwissProt [80] among others. The genome sequences used for this annotation are selected on the basis of at least 75% BUSCO (Benchmarking Universal Single-Copy Orthologs) completeness [34]. Plant Reactome uses Oryza sativa as a reference species to predict enzymes in 120 other species currently available in the database. Enzyme prediction in a species is based on orthologs of that enzyme in rice. Enzyme prediction in KIPEs3 is more stringent as it starts with ortholog identification based on multiple experimentally identified genes defined in literature (bait sequences). Ortholog identification is followed by an in-depth sequence similarity analysis including a check of the presence of functionally relevant amino acids. Only those candidates which pass all these steps are reported as most likely functional. Moreover, KIPEs3 can also identify similar candidates that do not fulfill all selection criteria. Contrary to most databases, researchers can use their own assembled dataset when running KIPEs3. This option enables the functional annotation of new genome sequences to identify a particular pathway of interest, given some prior knowledge about the pathway. Some features of these databases compared to KIPEs are summarized in Table 2. Download: PPT PowerPoint slide PNG larger image TIFF original image Table 2. Comparison of KIPEs3 against established pathway databases. https://doi.org/10.1371/journal.pone.0294342.t002 Functional annotation performance of KIPEs3 compared to other databases We compared the annotation results for FLS and LYC-b in terms of correctness of the annotation. Functional annotation correctness of a candidate can be assessed (a) in terms of correct amino acid residues required for a functional enzyme, (b) in terms of its position in a phylogenetic tree with characterized orthologous sequences, and (c) in terms of expression pattern. A higher gene expression is usually suggesting functional relevance of a given candidate. Co-expression with other members of the same biosynthesis pathways is also helpful in identifying the most promising candidates. We followed a combined approach for LYC-b and FLS annotation. The annotation correctness was assessed by the co-expression of a given candidate with other genes of the flavonoid or carotenoid biosynthesis, respectively. The presence of functionally relevant amino acid residues served as the final and deciding factor for declaring a candidate gene as “functional”. Lycopene β cyclase (LYC-b) is one of the key enzymes in the carotenoid biosynthesis pathway and catalyzes the β-cyclization of both ends of lycopene to produce β-lycopene. LYC-b is known to have conserved domains such as “dinucleotide binding motif”, “Cyclase motif I and II”, “Conserved region β-LYC”, and a “β cyclase motif” [81]. Since studies about the amino acid residues responsible for catalytic sites are still lacking, a phylogenetic tree of LYC-b candidates was constructed to identify likely functional orthologs. The carotenoid genes, neoxanthin synthase (NSY) and lycopene ε-cyclase (LYC-ε), were taken as an outgroup due to their high structure similarity to LYC-b [82] (S2 and S3 Files). FLS is a key enzyme of the flavonol biosynthesis pathway and catalyzes the oxidation of dihydroflavonols to flavonols. FLS belongs to the 2-oxoglutarate dependent dioxygenase (2-ODD) superfamily. They are characterized by conserved 2-oxoglutarate binding residues (Arg287 and Ser289) and conserved ferrous iron-binding residues (His221, Asp223, and His277) [83]. A functional FLS also possesses residues responsible for proper folding of 2-ODD polypeptide (Gly68, His75, Pro207, and Gly261) [84]. FLS specific motifs, “PxxxIRxxxEQP” and “SxxTxLVP”, distinguish FLS from other plant 2-ODDs like F3H, ANS, and FNS II (S4 File). We used two metrics (specificity and accuracy) to characterize the annotation performance of the different tools and databases. Specificity describes how well the database/tool can annotate a functional enzyme. It is calculated as the ratio of the number of correct candidates predicted by the database/tool to the total number of functional candidates present in that species. Accuracy denotes the ability of a database/tool to discriminate between functional and non-functional candidates. It is calculated as the ratio of correct predictions to the total predictions made by a database/tool for a given species and a gene in a pathway. Finally, the annotation performance is calculated as the average sum of specificity and accuracy. For LYC-b annotation, we found that the performance of Plant Reactome and KIPEs was 100% in all three species, Arabidopsis, tomato, and carrot. MetaCyc showed 100% annotation performance for Arabidopsis, 75% for tomato, and dropped to 0% for carrot as it did not annotate any LYC-b gene in carrot. Despite the high specificity (100%) of Plant Metabolic Network, the performance was between 65–75% for all three species due to low accuracy. (Fig 4(A)–4(F)). KIPEs3 maintained its 100% performance in all three species, Arabidopsis, chickpea, and orange, used for FLS annotation as well. Like KIPEs3, MetaCyc provides a correct Arabidopsis FLS annotation. However, MetaCyc does not report any FLS in chickpea and orange, while KIPEs3 identified candidates in both species. Plant Reactome was neither specific nor accurate in FLS annotation and failed to annotate any candidate in any of the above species. Plant Metabolic Network showed low performance (~0–60%) again due to low accuracy (Fig 4 (G)–4(I)). Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 4. Functional annotation tool comparison using the LYC-b gene from the carotenoid biosynthesis pathway and the FLS gene from the flavonoid biosynthesis pathway. Comparison of LYC-b annotation (a-f) between KIPEs and three databases. The number of annotated (red dashed line) and correctly annotated (black solid line) LYC-b genes is displayed per tool/database for Arabidopsis thaliana (a), Solanum lycopersicum (b), and Daucus carota (c). The corresponding gene expression (TPM) of all candidate genes was investigated to differentiate between genes and potential pseudogenes (d-f) for the three species, respectively. Functional gene IDs are written in boldface. Comparison of FLS annotation (g-l) between KIPEs and three databases. The number of annotated (red dashed line) and correctly annotated (black solid line) FLS genes is displayed per tool/database for Arabidopsis thaliana (g), Cicer arietinum (h), and Citrus sinensis (i). The corresponding gene expression (TPM) of all candidate genes was investigated to differentiate between genes and potential pseudogenes (j-l) for the three species, respectively. Functional gene IDs are written in boldface. https://doi.org/10.1371/journal.pone.0294342.g004 Plant Metabolic Network exhibited the least accuracy which is likely due to the function prediction approach using sequence similarity alone which can lead to inaccurate prediction. Highly similar enzymes belonging to the same family might have similar sequences, but may have different functions in vivo. The performance of Plant Reactome was also low in case of FLS annotation emphasizing the downfall of using only rice in curation and prediction, as genes in species distant from rice will be predicted less reliably and genes missing in rice cannot be predicted in other species. MetaCyc contains only experimentally evidenced genes, hence has high accuracy but low specificity. This also leads to annotation of a limited number of genes and species. In summary, KIPEs3 showed the best performance for functional annotation of genes having both high accuracy as well as specificity. This makes KIPEs3 a good tool for annotating novel genome sequences with focus on a particular pathway with a substantial amount of prior knowledge about the involved genes in other species. KIPEs3 is available as an online service KIPEs3 is provided as an online service hosted on a webserver at TU Braunschweig [85]. Users can upload their FASTA files containing coding sequences or peptide sequences of a species. Bait sequence sets are stored on the server and can be directly used without the need to be manually uploaded by the user. Optional parameters are pre-configured to allow an automatic analysis with default settings, but can also be modified by the user through the web interface if necessary. Users may opt-in to receive email notifications once their analysis is finished and the results are available. Alternatively, users may choose to store a cookie to allow their browser to keep track of the progress. The archive provided for download includes all result files that users would get when running KIPEs3 locally. The web server is implemented in Python3 using the Django web framework v4 [86]. Whenever a user uploads a file and submits a new job through the form on the website, the form data is sent to the server via an HTTP POST request and the file is temporarily stored on the server machine. The job is placed into a queue and executed as soon as there are enough free resources (disk space, memory, and CPU cores) available. KIPEs3 and all its dependencies are installed on the server machine. The user is redirected to a page with a console showing the command that is executed, and the program output is displayed in real time using XMLHttpRequests. All user supplied data are deleted from the server 48 hours after the job has been finished and the results have been downloaded. Analysis of the flavonoid biosynthesis in A. thaliana Nd-1, Capsella rubella, and Dioscorea dumetorum First, a technical check was performed by analyzing the flavonoid biosynthesis in the A. thaliana accession Niederzenz-1 (Nd-1). It is expected that orthologs of all characterized genes of the reference accession Columbia-0 (Col-0) are also present in Nd-1, while the sequences are slightly different from the bait sequences. All genes known in Col-0 were also discovered in Nd-1 (S2 File). Next, Capsella grandiflora was analyzed with KIPEs3 to show that the identification of flavonoid biosynthesis genes works also in species that are not represented in most of the bait sequence sets like A. thaliana. Genes present in A. thaliana were also discovered in the close relative C. grandiflora (S3 File). To demonstrate the potential of KIPEs3 to annotate the flavonoid biosynthesis in a distant relative of A. thaliana, the flavonoid biosynthesis of Dioscorea dumetorum, which belongs to the monocots, was analyzed (S5 File). The new flavonoid biosynthesis pathway steps added to the repertoire of KIPEs3 include the diverse and promiscuous decoration enzymes like glycosyltransferases, acyltransferases, and methyltransferases. Since some of these steps might be species-specific, it is possible that some steps are missing in a given plant species of interest. This requires specific attention when interpreting the KIPEs3 results to avoid false positives. While the aglycon biosynthesis is highly conserved and catalyzed by well characterized enzymes, flavonoid decorating enzymes are less studied and show a high level of promiscuity [87–89]. This poses a challenge for the identification through sequence similarity. Little is known about the functionally relevant amino acids in the flavonoid decorating enzymes. In addition, this extension of the KIPEs3 flavonoid biosynthesis bait sequence sets is no longer restricted to enzymes, but also includes flavonoid transporters. They do not have an active center that could be composed of functionally relevant amino acids. This turns the reliable identification of transport-associated proteins into a remaining challenge. Analysis of the carotenoid biosynthesis A novel set of bait sequences was compiled to enable the annotation of the carotenoid biosynthesis with KIPEs3 (Table 3). The carotenoid biosynthesis is well studied in many plant species including A. thaliana [90]. The set of compiled bait and reference sequences was evaluated by running the annotation workflow on A. thaliana Nd-1, Daucus carota, C. rubella, and D. dumetorum (Fig 5; S3 File). Daucus carota carotenoid biosynthesis genes previously described in the literature [91] were recovered by KIPEs3 (S6 File). This supports the accuracy of the candidate gene identification. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 5. Carotenoid biosynthesis genes in Arabidopsis thaliana Nd-1. The carotenoid biosynthesis comprises a linear part (A) and a branching part (B). https://doi.org/10.1371/journal.pone.0294342.g005 Download: PPT PowerPoint slide PNG larger image TIFF original image Table 3. Carotenoid biosynthesis genes. 1 most likely non-functional 2 BCH duplication after monocot-dicot split. https://doi.org/10.1371/journal.pone.0294342.t003 Transcriptional regulation of biosynthesis pathways Many plant biosynthesis pathways are regulated at the transcriptional level by members of large transcription factor families. Examples are the R2R3-MYBs and bHLHs that form complexes involved in the control of the flavonoid biosynthesis [106]. As we demonstrated previously, it is possible to analyze large gene families like MYBs and bHLHs with KIPEs [12]. However, KIPEs3 is optimized for the analysis of pathways that comprise genes with low copy numbers. Most structural genes in these pathways are part of a small gene family. Large gene families like the P450 cytochromes (CYPs) or 2-oxoglutarate dioxygenases (2-ODDs) comprise multiple smaller subfamilies with distinct biological functions. Only the individual subfamilies are relevant for the analysis with KIPEs3. Ideally, the application of KIPEs3 should be restricted to the annotation of structural genes in a biosynthesis. While it is possible to investigate large transcription factor families, dedicated tools will perform better. Members of the MYB and bHLH gene families can be annotated by specific tools [47]. General phylogeny-based approaches can be deployed to study other involved transcription factor families [107, 108]. Conclusions While the generation of high-quality genome sequences is turning into a routine task, the structural and especially the functional annotation of these sequences is an increasing challenge. The ability to annotate biosynthesis pathways beyond the flavonoid biosynthesis positions KIPEs3 as a versatile platform for functional annotation in projects focused on specialized metabolites. We demonstrated the capabilities of KIPEs3 by comparing the annotation of FLS and LYC-b in three species against the results of other popular tools and databases. We showed that the use of broad taxonomic range bait sequences and functionally relevant amino acids allows KIPEs3 to recover more correct annotations per pathway than other tools that rely on general sequence similarity. KIPEs3 can easily annotate novel genome sequences with focus on a particular pathway while this is not possible with other annotation tools. Although we focused on plant metabolism in this study, KIPEs3 could also be applied for the investigation of non-plant organisms like fungi or bacteria. A sufficient amount of prior knowledge about the pathway in other species is the only requirement. The automatic annotation of biosynthesis pathways paves the way for comparative genomic studies that are looking for species-specific differences or intraspecific variation. Supporting information S1 File. Accession numbers of the analyzed RNA-seq datasets. https://doi.org/10.1371/journal.pone.0294342.s001 (TXT) S2 File. Multiple sequence alignment of the LYC-b candidates in three species (Arabidopsis thaliana, Solanum lycopersicum, and Daucus carota) annotated by KIPEs3, MetaCyc, Plant Reactome, and Plant Metabolic Network. The alignment was generated using MAFFTv7. Characteristic regions of plant LCY-bs are indicated above the sequences: Dinucleotide binding site, Cyclase motifs (CM) I and II, and Charged region. Domains described as essential for LCY-b activity are shown as LCY-b CAD (Catalytic Activity Domain). The functional LYC-b on the basis of presence of conserved amino acid residues are highlighted in blue color. https://doi.org/10.1371/journal.pone.0294342.s002 (PDF) S3 File. Phylogenetic tree of the LYC-b candidates in the three species (Arabidopsis thaliana, Solanum lycopersicum, and Dacus carota). The highly similar sequences of Neoxanthin synthase (NSY) and Lycopene-e-cylase (LYC-e) were included as outgroup (highlighted in gray). The functional LYC-b clade was highlighted in the phylogenetic tree. https://doi.org/10.1371/journal.pone.0294342.s003 (PDF) S4 File. Multiple sequence alignment of the FLS in three species (Arabidopsis thaliana, Cicer arietinum, and Citrus sinensis) annotated by KIPEs3, MetaCyc, Plant Reactome, and Plant Metabolic Network. The alignment was generated using MAFFTv7. The two black boxes highlight the FLS-specific motifs “PxxxIRxxxEQP” and “SxxTxLVP.” Amino acid residues responsible for binding ferrous iron (H221, D223, and H277) and 2-oxoglutarate (R287 and S289) are marked with black and gray arrows, respectively. The residues known to be involved in the proper folding of the 2-ODD polypeptide are marked with asterisks (G68, H75, P207, and G261). The functional FLS on the basis of the presence of conserved amino acid residues are highlighted in red-yellow color. https://doi.org/10.1371/journal.pone.0294342.s004 (PDF) S5 File. Results of the flavonoid biosynthesis analysis in Arabidopsis thaliana Nd-1, Capsella rubella, and Dioscorea dumetorum. https://doi.org/10.1371/journal.pone.0294342.s005 (TXT) S6 File. Results of the carotenoid biosynthesis analysis in A. thaliana Nd-1, Daucus carota, C. rubella, and D. dumetorum. https://doi.org/10.1371/journal.pone.0294342.s006 (TXT) Acknowledgments Many thanks to the German network for bioinformatics infrastructure (de.NBI, grant 031A533A) and the Bioinformatics Resource Facility (BRF) at the Center for Biotechnology (CeBiTec) at Bielefeld University for providing an environment to perform the computational analyses. bioRender.com was used to construct some of the figures. TI - KIPEs3: Automatic annotation of biosynthesis pathways JF - PLoS ONE DO - 10.1371/journal.pone.0294342 DA - 2023-11-16 UR - https://www.deepdyve.com/lp/public-library-of-science-plos-journal/kipes3-automatic-annotation-of-biosynthesis-pathways-3CFliKfnsj SP - e0294342 VL - 18 IS - 11 DP - DeepDyve ER -