Background: Thinning supplies of natural resources increase attention to sustainable microbial production of bio-based fuels. The strain Clostridium beijerinckii NRRL B-598 is a relatively well-described butanol producer regarding its genotype and phenotype under various conditions. However, a link between these two levels, lying in the description of the gene regulation mechanisms, is missing for this strain, due to the lack of transcriptomic data. Results: In this paper, we present a transcription profile of the strain over the whole fermentation using an RNA-Seq dataset covering six time-points with the current highest dynamic range among solventogenic clostridia. We investigated the accuracy of the genome sequence and particular genome elements, including pseudogenes and prophages. While some pseudogenes were highly expressed, all three identified prophages remained silent. Furthermore, we identified major changes in the transcriptional activity of genes using differential expression analysis between adjacent time-points. We identified functional groups of these significantly regulated genes and together with fermentation and cultivation kinetics captured using liquid chromatography and flow cytometry, we identified basic changes in the metabolism of the strain during fermentation. Interestingly, C. beijerinckii NRRL B-598 demonstrated different behavior in comparison with the closely related strain C. beijerinckii NCIMB 8052 in the latter phases of cultivation. Conclusions: We provided a complex analysis of the C. beijerinckii NRRL B-598 fermentation profile using several technologies, including RNA-Seq. We described the changes in the global metabolism of the strain and confirmed the uniqueness of its behavior. The whole experiment demonstrated a good reproducibility. Therefore, we will be able to repeat the experiment under selected conditions in order to investigate particular metabolic changes and signaling pathways suitable for following targeted engineering. Keywords: Clostridium beijerinckii NRRL B-598, RNA-Seq transcriptome, ABE fermentation Background ethanol (ABE) is currently of great interest . Solvento- While a less costly petroleum refinery still represents the genic Clostridia are widely studied for their ability to main source of fuels and chemicals, limited natural re- produce biofuels from biomass in ABE fermentation . sources and nature protection have increased attention Unfortunately, different genera or even strains of these to sustainable production of bio-based products. These rod-shaped, gram-positive anaerobes show substantial trends make biorefinery the future lucrative producer of differences in phenotypic traits, i.e. the ability to utilize renewable fuels and chemicals. Especially, the microbial different substrates and to produce different substances. production of solvents such as acetone, butanol, and Thus, the findings acquired using model organisms such as C. acetobutylicum ATCC 824 , C. pasteurianum DSM 525 , or C. beijerinckii NCIMB 8052  cannot * Correspondence: email@example.com be applied in general. Fortunately, thanks to a massive Department of Biomedical Engineering, Brno University of Technology, reduction in sequencing costs, a wide range of complete Technicka 12, 616 00 Brno, Czechia Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Sedlar et al. BMC Genomics (2018) 19:415 Page 2 of 13 or at least draft genomes of solventogenic Clostridia are metabolites formation with acid production in the first now available. These include various strains of C. aceto- period followed by solvents formation (see Fig. 1a). Six butylicum, C. aurantibutyricum, C. beijerinckii, C. diolis, time-points (T1–T6) were selected for RNA-Seq analysis C. felsineum, C. pasteurianum, C. puniceum, C. roseum, to cover all metabolic stages within a period of 23 h. The C. saccharobutylicum, and C. saccharoperbutylacetoni- latter stages were not analyzed due to a high percentage of cum . C. beijerinckii strains, utilizing a wider range of dead and lysing cells (Fig. 1b) causing an insufficient qual- substrates for solvent production seem to be the most ity of RNA samples for RNA-Seq. Individual sampling robust, i.e. able to endure a wide range of environmental points were selected based on the fermentation pattern, conditions, among these . which was monitored on-line as changes in a pH course However, the knowledge of the genomic sequence itself (Fig. 1c). The first sample was collected after an approxi- does not provide any information regarding the gene regu- mate five-fold increase in optical cell density (Fig. 1d) lation, which is crucial to improvements of the strains for while a sharp decrease in pH occurred, so only acidogenic, industrial application. The study of gene expression is non-sporulating and mostly motile cells were expected to therefore irreplaceable in genome engineering. Current be present in the sample. The second time-point was pro- whole transcriptome sequencing technology, referred to posed to cover a transient physiological state between as RNA-Seq, allows the study of transcription on a acidogenesis and solventogenesis, which was indicated by genome-wide scale with an unlimited dynamic range, a pH breakpoint and corresponded to the highest concen- compared to the older microarrays, which only enabled tration of acids in media along with the onset of solvent researchers to track preselected genes . In this paper, formation. No cell-thickening or pre-spore formation was we present transcriptome dynamics during the cultivation observed at this stage. The third sample set was with- of the promising butanol producer, C. beijerinckii NRRL drawn during the period of the most progressive rise in B-598  (formerly misidentified as C. pasteurianum pH, suggesting a high rate of reutilization of the acids, to- NRRL B-598 ) as a result of RNA-Seq profiling. Until gether with solvent formation. Granulose accumulation now, only the transcription of six selected genes involved and early phases of sporulation were observed at this stage in sporulation and solvent production was studied for this (see Additional file 1). The second pH breakpoint was strain using RT-qPCR, yet the study supported the theory covered by the fourth sample, where the rise in pH ceased that solventogenesis is not regulated in the same way in and pH again started to decline, indicating a change in all solventogenic clostridia . Here, we further investi- metabolism. However, there was no apparent increase in gate the specifics of the strain C. beijerinckii NRRL B-598. the production of acids in the fermentation data. The The obtained transcriptome data includes the whole life remaining two samples were taken at the regular cycle of the strain and therefore covers changes in metab- time-intervals, in order to cover all stages of ABE fermen- olism, i.e. acidogenesis, solventogenesis and their transi- tation as well as the sporulation cycle. Overall culture fit- tion state. Together with the sporulation cycle and other ness and spore formation was monitored by flow significant events such as changing motility and adapta- cytometry (FC) and the combined staining of cell culture tion to acid/solvent stress, the whole fermentation process by membrane disruption and enzyme activity indicators: is reflected in this dataset. Flow cytometry, combined with propidium iodide (PI) and carboxyfluorescein diacetate fluorescent staining , has enabled insights into popula- (CFDA), respectively. A relatively high amount of tion heterogeneity and HPLC analysis of metabolites/sub- double-stained cells was present in the culture at all strate; plus, growth curve data has allowed us to better stages. A previous study by Kolek et al.  considered interpret the biological meaning. Moreover, the RNA-Seq these double-stained cells as an active population consist- technology has allowed us to study not only the temporal ing of cell doublets and sporulating cells; therefore, only transcription of any gene but also to explore the accuracy PI-positive cells were counted as dead cells. The staining of the current genome annotation. Compared to the tran- pattern of the Clostridium culture at different time-points scription profiling of the strain C. beijerinckii NCIMB revealed dynamic changes in proportion of active cells 8052, we reached a dynamic range that was approximately within the first 13 h, with a detectable drop at the period 10 times higher. To increase the robustness and validity of with the lowest pH (the sixth hour), thus supporting the the experiment, each of the time-points was represented presumption that cells are highly-stressed by the presence by three biological replicates rather, than verification using of organic acids together with a low pH (when values qPCR . slightly below pH 5 were reached). After the 13th hour, viability gradually decreased and during the 23rd hour the Results first mature spores, released from mother cells, were ob- Cultivation and fermentation kinetics served. The FC data provided a better insight into viability The fermentation profile of C. beijerinckii NRRL changes compared to sole OD measurements, according B-598 showed a typical two-stage course of to which the culture kept on growing steadily until the Sedlar et al. BMC Genomics (2018) 19:415 Page 3 of 13 Fig. 1 Cultivation and fermentation characteristics of Clostridium beijerinckii NRRL B-598. (a) The concentration of glucose, solvents and acids during ABE fermentation. (b) Flow cytometry – the distribution of cells within the population according to their fluorescence pattern for combined staining using PI and CFDA. (c) pH curve for respective cultivation. (d) Cell growth measured as optical density at 600 nm. Values represent the mean of the biological replicates and error bars represent the standard deviations. Time-points (T1–T6) for samples subjected to RNA expression analysis are indicated by red vertical dotted lines and/or by red text labels 18th hour. The only noticeable changes in the OD and hour was reached at the very beginning. Surprisingly, measurements are the two slowdowns during the after a decrease in the acid/solvent switch, the glu- acidogenesis/solventogenesis transient states. The FC cose consumption increased again and accompanied data clearly shows that culture viability had already the T3–T4 transition state with the highest number started to decline at around the 13th hour, which cor- of regulated genes. responds to the apparent decrease in the number of regulated genes from that time. Mapping statistics A proportion of viable cells determined by FC was The whole dataset covered three series of six samples used to calculate the specific glucose consumption rate (six time-points), in which each series represented an in- relating only to the active portion of clostridium culture dependent biological replicate (A, B, and C). Although (see Table 1). The amount of glucose consumed per time series A consisted of reads that were 50 bp long and and biomass unit could help to elucidate the differences in series B and C consisted of reads that were 75 bp long, expressions of glycolysis-related genes. The highest num- the whole series could be processed in the same way. ber of 5.16 g of utilized glucose per gram of active biomass The quality assessment after the first preprocessing steps (demultiplexing, quality trimming, and adapter trim- ming) confirmed an overall high-quality of sequences Table 1 Specific rate of glucose utilization between time-points (average Phred score Q ≈ 35) and no adapter content. chosen for RNA-seq analysis The only following sequence-filtering step was the re- Samples Time interval (h) Specific glucose consumption -1a − 1 moval of the remaining residual rRNA contamination, rate (g.g .h ) even after the rRNA depletion. The rRNA depletion was T1-T2 3.5–6.0 5.16 performed prior to the library construction and the T2-T3 6.0–8.5 2.20 non-captured rRNAs were apparent from the high GC T3-T4 8.5–13.0 2.71 content in some reads. The amount of non-rRNA reads T4-T5 13.0–18.0 2.50 ranged from 7.3 to 20.5 million (see Fig. 2a). Subsequently, T5-T6 18.0–23.0 1.59 we mapped the cleansed reads to the C. beijerinckii NRRL Values were calculated for the concentration of viable cells B-598 genome. Most reads mapped to the genome Sedlar et al. BMC Genomics (2018) 19:415 Page 4 of 13 Fig. 2 Quality of RNA-Seq reads. (a) The total number of reads in particular samples. The color of stacked bars distinguishes between non-rRNA and rRNA reads. (b) Mapping statistics of reads – percentages of uniquely mapped, multi-mapped, and unmapped non-rRNA reads unambiguously, regardless of their different length in rep- Pseudogenes licates A and B, C (see Fig. 2b). Nevertheless, in order to Due to the high number of pseudogenes with detect- cover the expression of duplicated genes that were present able expression, we decided to further investigate in the C. beijerinckii NRRL B-598 genome, the reads map- their coverage by RNA-Seq reads. Only a single ping to multiple loci were also included in the gene ex- pseudogene remained completely silent when ambigu- pression analysis (see Table 2). However, the contribution ously mapping reads were used, while 184 pseudo- of such reads was down-weighted in the expression ana- genes had RPKM > 1 (Reads Per Kilobase per Milion lysis depending on the number of times they mapped to mapped reads) in all six time-points. Using only the genome, so the sum of the total number of reads uniquely mapped reads, eight pseudogenes remained stayed intact. completely silent and 178 were transcribed in every The reads mapping to more genomic objects were also time-point. Although the number of transcribed pseudo- weighted. Such a phenomenon is caused by overlapping genes remained almost the same across the six genes. In the current RefSeq genome (NZ_CP011966.2), time-points, levels of their expression seemed to rise over 285 out of the 5230 genes predicted by NCBI PGAP  time. While pseudogenes formed approximately 2.8% of overlapped by at least one codon and another 66 neigh- C. beijerinckii NRRL B-598 genome, only 0.47% of all boring genes had no space between them. Although none reads in T1 mapped to pseudogenes. However, this num- of the 198 pseudogenes overlapped with another pseudo- ber continuously rose over the time according to the gene, 18 pseudogenes overlapped with genes directly and linear model %mapped =0,1115 ∙ time - 0.0629 (with the another 73 pseudogenes were at a distance from genes regression value 0.9575), resulting in 2.83% of reads to be that could be covered by a single read. These reasons mapped onto pseudogenes in T6. caused single read mapping onto two genomic objects. At To further analyze the activity of pseudogenes, we de- the same time, the transcriptome assembly contained cided to evaluate the coverage of pseudogenes through fewer transcripts compared to the number of genomic the use of transcripts assembled from all the reads in elements with detectable transcription (precisely 4837 our dataset. The accuracy of mapping transcripts to the transcripts vs. 5418 genomic elements) because the over- genome is higher thanks to their length (1057 bp on lapping and nearby genes, e.g. those in the same operon, average). The results are summarized in Table 3. were covered by a single transcript. Due to this fact, tran- There are 24 pseudogenes that were not covered by any scripts could not have been used to resolve overlapping transcript. These were probably completely silent (see genes. On the other hand, their mapping to the genome Additional file 2). The second group consisted of 78 pseu- helped to confirm or disprove transcriptional activity of dogenes that were not covered in their whole length. In pseudogenes and prophages. most cases, there were only short overlaps with transcripts Table 2 Transcriptional activity of genes and pseudogenes Sample T1 (3.5 h) T2 (6 h) T3 (8.5 h) T4 (13 h) T5 (18 h) T6 (23 h) Total No. of genes with RPKM>1 5055 (4981) 5101 (5026) 5162 (5100) 5197 (5139) 5198 (5133) 5193 (5128) 5219 (5158) No. of pseudogenes with RPKM>1 188 (179) 186 (179) 190 (184) 196 (190) 195 (188) 194 (187) 197 (190) 4 4 4 4 4 4 4 Max. expression (RPKM) 4.0∙10 3.4∙10 3.4∙10 3.4∙10 3.4∙10 4.0∙10 4.0∙10 Values in brackets apply to uniquely mapped reads only Sedlar et al. BMC Genomics (2018) 19:415 Page 5 of 13 Table 3 Coverage of pseudogenes by transcripts Not covered Partly covered Fully covered, overlapped transcripts Fully covered, single transcript Frameshifted 7 45 10 33 Missing start and/or stop 15 23 3 37 Internal stop 2 6 0 8 Combined issues 0 4 3 2 Total 24 78 16 80 of active genes neighboring these pseudogenes. In some t-Distributed Stochastic Neighbor Embedding (t-SNE) cases, only part of a transcript was mapped to a pseudo-  dimensionality reduction method on the normalized gene sequence, suggesting that these are silenced duplica- expression data. This final 2D representation showed tions of an active gene. Although genes in the third group that replicates (A, B, and C) were similar to each other were fully covered, this coverage consisted of two or more at particular sampling times (T1–T6), while replicates overlapping transcripts. Therefore, the transcription in sequenced using Illumina HiSeq (A) were slightly more both groups (partly covered and fully covered by overlap- distant to samples from Illumina NextSeq (B and C), see ping transcripts) was highly questionable. On the contrary, Fig. 3b. Overall, samples were divided into two clusters. pseudogenes within the fourth group were fully covered While one cluster contained samples corresponding to by unique transcripts. This group consisted of pseudo- the initial phase of fermentation (up to 8.5th hour), the genes that were transcribed and active genes that were other cluster consisted of samples from the later fermen- possibly misidentified as pseudogenes due to errors in the tation phase (from 13th up to 23rd hour). genome assembly. In comparison with their transcripts, 23 out of 80 pseudogenes (see Additional file 3)in this Differential expression group were missing one nucleotide in homopolymers. We explored differential expression of all genes and This could have been caused by previous sequencing er- pseudogenes with detectable transcription among adja- rors, as Roche 454 in combination with PacBio were used cent time-points, in order to analyze changes in the for the genome assembly. Nevertheless, insertion of these transcription of particular genes over the whole fermen- nucleotides was not detected in all reads mapping to these tation process (see Fig. 4). In total, transcription of 2260 positions; the figure ranged from 60% to almost 100%. annotated genomic objects, forming more than 41.5% of all protein-coding elements, was regulated during the Transcription profiles and reproducibility fermentation process when the criterion of adjusted Only 11 genes were not transcribed at any of the six sam- p-value < 0.05 (Benjamini-Hochberg correction) was ap- pling points. Moreover, seven out of those 11 genes were plied. While 474 genes were regulated more than once, related to 16S rRNA and these reads were filtered before only 31 of them were regulated more than three times. mapping. Therefore, only four genes (X276_RS15615, The single gene X276_RS14155 (PTS maltose trans- X276_RS24570, X276_RS24585, X276_RS26445) demon- porter subunit IIBC) was regulated four times. The ma- strated no transcripts. On the other hand, 5024 genes out jority of differentially expressed genes were covered by of all 5219 transcribed genes (RPKM> 1) had detectable at least 100 reads after the normalization of expression transcription at all time-points. Nevertheless, it is difficult data (see Additional file 5). In total 3168 genes had no to decide whether the expression of genes with low RPKM statistically significant regulations among adjacent values has biological meaning, due to a high bio- time-points and formed potential housekeeping genes. logical noise. Analysis using assembled transcripts is The complete results of the differential expression ana- complicated, because most transcripts cover more lysis, including log2fold changes and adjusted p-values, than one gene and transcripts overlap. Transcription are available in Additional file 6. on a genome-wide scale (see Additional file 4)shows A major change was detected between the third and the a novel pattern. While the transcriptional profiles fourth time-point when 1582 genes were regulated. While from the first three time-points (T1, T2, and T3) cor- 835 out of these genes were up-regulated, 714 were respond to the transcription of the C. beijerinckii up-regulated only between these two time-points (see NCIMB 8052 genome , the latter profiles do not. Fig. 4b). Similarly, 666 out of the 747 down-regulated Reproducibility of the experiment was verified using genes were down-regulated uniquely between T3 and three biological replicates and by checking the expres- T4 (see Fig. 4c). However, some of the uniquely sion of six selected genes whose transcription profiles up-regulated genes were down-regulated between another were observed during a previous study by Kolek et al. couple of time points and some of the uniquely  (see Fig. 3a). The samples were visualized using the down-regulated genes were up-regulated during another Sedlar et al. BMC Genomics (2018) 19:415 Page 6 of 13 Fig. 3 Analysis of the transcriptome reproducibility. (a) Transcription profiles of six selected genes visualized on the heatmap using a Z-score related to an average expression of each gene. (b) 2D representation of the normalized expression data after dimensionality reduction by t-SNE to compare the samples collected at the six time-points (T1–T6) coded by different colors. Each point represented a sample with a text label based on the biological replicate (A, B, and C) and the time-point from which it originated (T1–T6) transition. Therefore, the total number of uniquely Transcription of phage DNA regulated genes between the T3 and T4 time-points was We searched the C. beijerinckii NRRL B-598 genome for 1174. Every pair of adjacent time-points had uniquely reg- phage sequences and found three prophages (see ulated genes except for the last T5–T6 transition, when Table 4). While two of these regions were relatively short regulation of only six already regulated genes was de- and phages were incomplete, the other phage was intact tected. Nevertheless, previously up-regulated genes and consisted of 35 genes coding known phage proteins X276_RS05345 (hypothetical protein) and X276_RS24350 and six hypothetical protein-coding regions. (butyrate kinase) were down-regulated between these later The expression within the first phage region corre- time-points. Both up-regulated genes during this transi- sponding to an incomplete phage was low (averaging tion, X276_RS08605 (tryptophan synthase subunit beta) RPKM = 47) with only two genes differentially expressed and X276_RS18605 (DUF4179 domain-containing pro- during T3–T4 change. Six genes were carried by a posi- tein), also had detectable growth in transcription between tive and four by a negative strand. Only four genes were previous time-points and were covered by more than fully covered by transcripts mapping to the region. The 1000 and 2000 reads, respectively. transcription within the third phage region covering the Fig. 4 Differential expression analysis. Venn diagrams showing the number of (a) all-regulated, (b) up-regulated, and (c) down-regulated genes between adjacent time-points Sedlar et al. BMC Genomics (2018) 19:415 Page 7 of 13 Table 4 Phage DNA within the C. beijerinckii NRRL B-598 genome Region Position Length (bp) Status Total no. of proteins No. of phage proteins 1 996,985..1006473 9488 incomplete 10 8 2 2,920,342..2960012 39,670 intact 41 35 3 4,005,361..4018720 13,357 incomplete 17 15 other incomplete phage was more active with average Although pseudogenes, in bacteria defined as ‘genes si- RPKM = 86, but none of the genes were differentially lenced by one or more deleterious mutations’ , could expressed during the fermentation. All genes were car- still be transcribed , their number in C. beijerinckii ried by a negative strand and 14 out of the 17 genes NRRL B-598 was rather high. For example, the reference were covered by a single transcript, including one pseudo- sequence for the closely related strain C. beijerinckii gene (X276_RS17860) with a missing stop codon. The NCIMB 8052  (NC_009617.1) contained only 112 only region containing intact prophage consisted of 38 pseudogenes predicted by NCBI PGAP. While the num- genes and three pseudogenes with a missing stop codon, ber of pseudogenes with an incomplete coding region or carried by a positive strand. The whole region began with those containing internal stop was comparable for both a pseudogene and had low transcription (averaging strains, the number of pseudogenes with frameshift was RPKM = 21). Although six genes had statistically signifi- almost twice as high in C. beijerinckii NRRL B-598 gen- cant differential expressions between T3 and T4, only ome. Although the high number of frameshifted genes short transcripts mapped to the region and only partly could indicate an extraordinary number of frameshifted covered the genes. Thus, the phage remained silent. duplicates of genes, all 23 indels were detected in homo- polymers. Therefore, such pseudogenes could also be Discussion misannotated genes due to pyrosequencing errors  The fermentation data presented in Fig. 1 comply with that were not filtered out using PacBio RSII sequencing standard results usually achieved by using the same TYA used for the complete genome assembly . Neverthe- cultivation medium [11, 12]. Deeper insight into the less, 50 bp and 75 bp long reads were too short to population is enabled by combination of double fluores- distinguish between a frameshifted duplicate and an as- cent staining and flow cytometry. Value of flow cytome- sembly error as no indels were present in 100% of reads try had already been confirmed for C. acetobutylicum mapping to ambiguous positions. Eventually, the activity [16, 17]. Cytometric data enabled the calculation of a of some pseudogenes was supported in differential specific rate of glucose consumption related to metabol- expression analysis, by high log2foldchange, excessing a ically active cells in the population during different time value of three. periods of the cultivation, together with information The transcriptome of C. beijerinckii NRRL B-598 had about the overall culture condition. never been studied before so no correlation to the older The high proportion of reads that mapped to the gen- dataset could be carried out. However, the transcription ome in particular samples unambiguously, suggested a of the six selected genes under the same cultivation con- good quality of RNA-Seq data and successful alignment ditions was monitored using qRT-PCR in study of C. even for shorter 50 bp reads in replicates A. Although beijerinckii NRRL B-598 and its mutant strain overex- we presumed that utilization of longer 75 bp reads in pressing sporulation initiation factor spo0A . In the replicates B and C could reach even higher percentage mentioned study by Kolek et al. , an increase in ex- of unique mapping, the proportion remained similar (see pression was observed in mid-cultivation for spoIIE and Fig. 2b). Nevertheless, the number of genes with detect- sigG and in the second part of cultivation for spoVD. This able transcription slightly differed when reads mapping corresponded to the results of this study (see Fig. 3a). to multiple loci were used. Although high sequencing Moreover, the expression profiles of the remaining genes depth and rRNA depletion brought a noise to RNA-Seq also showed the same pattern. Butyrate kinase (buk, , in our case, this bias was caused by duplicated X276_RS1200) transcription was maximal at the begin- genes rather than being a sequencing issue . To pre- ning of the cultivation, decreased in time, and rose slightly vent omitting transcription of duplicated genes and at the end of cultivation. The expression of ald and spo0A pseudogenes, we decided to include multi-mapping increased in the first third of cultivation and for ald also reads into the analysis. The majority of reads mapped to at the end of cultivation. Moreover, the reproducibility of the genome without any mismatches and support an the experiment was supported by utilization of three bio- overall high quality of the genome assembly. Neverthe- logical replicates and their high similarity in the sampling less, 23 indels were detected in regions of frameshifted points visualized using tSNE in Fig. 3. The tSNE coordi- pseudogenes. nates were obtained by comparing distances among Sedlar et al. BMC Genomics (2018) 19:415 Page 8 of 13 samples in the original high-dimensional space, i.e. The massive change in the gene expression, which can distances from the normalized expression profiles to the be spotted in Fig. 4, was surprisingly not associated with distances of the samples in the reduced space, i.e. the visu- the acidogenesis/solventogenesis switch that occurred alized points. The position of the samples in the 2D space earlier, mainly between the T2 and T3 time-points, nei- was then optimized until the samples with similar expres- ther with the sporulation initiation. Regarding the COG sion profiles were placed close to each other and samples assignment of 45 abovementioned genes to group J with very different expression profiles were at a further (translation), it might be possible that at least a part of distance from each other. Two main clusters, distinguish- these genes corresponded with spore coat formation ing samples from the first and the second half of the genes. Clostridial sporulation typically lasts 8–12 h and experiment, were present. While the similarity of the rep- therefore the T4 time-point might have coincided with licates from the first cluster was supported mainly by the stage IV or V of a sporulation cycle in which formation first coordinate tSNE1, the similarity in the other cluster of spore coat proteins occurred . In addition to the was supported by the second coordinated tSNE2. coat proteins, a need for specific protein complexes in- Wang et al.  observed similar clustering of volved in spore structures assemblies could be respon- RNA-Seq samples of C. beijerinckii NCIMB 8052, in sible for the increased protein formation demand. which the first cluster was represented by samples from Further transition between T4 and T5 could also show exponential and transition phases and the other by sam- an entry to the irreversible phase of sporulation, in ples from a stationary phase. On the other hand, tran- which two independent gene regulations were estab- scription profiles of C. beijerinckii NCIMB 8052  and lished in the mother cell and pre-spore and sporulation C. beijerinckii NRRL B-598 (see Additional file 4) on the must be completed. Overall culture attenuation after T4 genome-wide scale were different, especially in the later is apparent from both a decrease of specific glucose con- phase of cultivation. This could have been caused by sumption (Table 1) and from cytometric data that con- structural reorganizations in the genomes of both strains firmed the gradual increase in the proportion of inactive or by differences in gene regulatory mechanisms. Due to cells. An opposite phenomenon was observed between the high similarity of both genomes (see Additional file 7), T3 and T4. An increase in the specific rate of glucose the latter seemed more relevant. The explanation for dif- consumption, corresponding to highly regulated genes ferences in transcription profiles of C. beijerinckii NRRL coding for COG functional group C (energy production B-598 and C. beijerinckii NCIMB 8052 in the later and conversion) (see Additional file 8), was detected phases could lie in the different phenotypic behavior of together with an apparently improved viability. both strains at this stage. Although strain NCIMB 8052 Even though the massive change between T3 and T4 ceased growing together with the start of solventogenesis was obvious, searching within COG categories (see [5, 13], strain NRRL B-598 continued growing until ap- Additional file 8) does not provide unambiguous clarifi- proximately half way through the solventogenic phase cation for this phenomenon. Mostly the same categories (see Fig. 1d). Another apparent difference was an in- of regulated genes could be found between adjacent creased number of mature spores formed by the NCIMB time-points within the first 13 h of cultivation with both 8052 strain under similar cultivation conditions . down- and up-regulated representatives. After the 13th The genome of C. beijerinckii NRRL B-598 contained hour COG D and COG L related to cell cycle control two housekeeping regions with stable high level of tran- and replication respectively were not differentially scription activity that were not present in C. beijerinckii expressed which was fully consistent with the decrease NCIMB 8052 genome. This high activity was caused by in cell growth and declining culture viability supporting genes transcribing into cell wall binding proteins, in the a hypothesis of the switch of a highly proliferating cul- first region by the gene X276_RS24890 with average ture into a new strategy, securing genus preservation via RPKM 2.4∙10 , while in the second region by the gene ensuring a complete sporulation process. Simultaneously X276_RS25120 with average RPKM 1.8∙10 . The most COG F for nucleotide metabolism transport are noticeable change in the transcription on the genome up-regulated within the first two compared time sets wide scale was captured between T3 and T4 time-points and down-regulated in the latter two. These findings when the highest number of differentially expressed were comparable to the transcriptional profile of C. acet- genes was detected. Increased activity was visible espe- obutylicum  unlike the category J which was in C. cially within the region spanning the position from acetobutylicum down-regulated in the stationary phase. 176,588 to 208,581 containing 45 genes whose average The same applied to the motility related genes (COG N) 3 3 expression in RPKM rose from 1.9∙10 to 3.0∙10 . that were in our study more down-regulated even within Thirty-seven out of those genes code proteins belonged the first measured interval and up-regulated in latter to the Clusters of Orthologous Groups of proteins stages between T4 and T5. This might seem confusing (COG) functional group J associated with translation. as solventogenic clostridia are known to be motile within Sedlar et al. BMC Genomics (2018) 19:415 Page 9 of 13 the exponential and acidogenic stage  and after the with the strain C. beijerinckii NCIMB 8052 as the switch to solventogenesis, motility is generally lost. C. prophages are responsible for genome rearrangements beijerinckii NRRL B-598 possessed such a change in mo- and inversions . Even though the complete prophage tility as well but the first sample point T1 was already contained six differentially expressed genes between T3 characterized by highly motile cells and therefore a de- and T4, their average transcription was very low suggest- crease in related genes expression copied the phenotypic ing false positive detection. Due to the absence of tran- profile. On the other hand, an increase in the latter scripts mapping to the prophage regions, all these three stages is probably the result of culture phenotype regions seemed to be silent. During industrial cultivations desynchronization when all the cell types are again in the South Africa , there were several events mapped present, including motile cells. The predominant upreg- in which bacteriophages caused total collapse or reduction ulation of COG O (post translational modification, pro- of solvents production due to lytic or lysogenic cycles, tein turnover, chaperone function) between later stages respectively. Therefore, the detected prophages deserve might relate to cell stress response to increasing solvent further experimental investigation. concentrations . Furthermore, some cells within the whole population might have undergone a massive change in energy me- Conclusions tabolism and solvent production, which is associated Although the strain C. beijerinckii NRRL B-598 is a with the switch of different genes in the period of transi- promising butanol producer, we lack a precise descrip- tion between T3 and T4 time-points. The solvent forma- tion of mechanisms within its fermentation metabolism, tion and acidogenesis/solventogenesis switch are usually which prevent us from further modifications of the explained as a stress response induced by accumulation strain for industrial applications. Moreover, these mech- of acids in the cultivation medium and pH decrease. anisms seems to be unique and different from other Low pH could cause depletion of ATP pool in cells be- clostridia, including a closely related strain C. beijerinckii cause of active transport of H ions across cell mem- NCIMB 8052. In this study, we provided a complex ana- brane. To prevent this event and to ensure population lysis of its fermentation profile using HLPC, FC, and survival, some cells initiated sporulation, while other RNA-Seq technologies. Six time-points were selected to cells began converting acids into solvents. However, the study its transcription profile, while the whole experiment whole population situation was no longer critical at was repeated in order to get three biological replicates (A, time-point T4 and the lower concentration of acids in B, and C) for each time-point. This allowed us to verify the medium might have induced another metabolic the reproducibility of the experiment and to gather the change, this time associated with the direct formation of RNA-Seq dataset with the currently highest dynamic butanol/acetone from glucose. As this pathway gener- range available among solventogenic clostridia. We ated only a half of ATP in comparison with acidogenesis, analyzed the latest RefSeq annotation of the genome and its overall rate was probably higher. However, a signifi- confirmed its high accuracy. Nevertheless, through the cant advantage of the reduced risk of low pH out- analysis of single nucleotide variants, several putative weighed this discomfort. Moreover, this hypothesis was missing nucleotides were found within the regions of supported by metabolites formations, glucose consump- frameshifted pseudogenes. Transcription regulations tion, and pH profile (see Fig. 1a, c) and by an increase in identified by differential expression analysis of adjacent specific glucose consumption. More than 20 years ago, time-points showed the greatest changes between T3 and Dürre et al.  envisaged for C. acetobutylicum that T4 time-points. Surprisingly, this change was not directly different genes are probably involved in early and late connected to the acidogenic/solventogenic change, nor solventogenesis. Population heterogeneity reflected by the sporulation initiation but rather to a massive change FC and fluorescent staining (Fig. 1b) supports the in the energy metabolism and solvent production in a part hypothesis that not all cells in the population exhibit the of cell population as we discuss based on auxiliary HLPC same phenotype to cope with changing unfavorable and FC data. living conditions. The population might rather choose Furthermore, we discovered three prophage regions the bet hedging strategy  to enable at least some within the genome, which demonstrated low or no cells from the population to survive. transcription activity. Nevertheless, these regions are Many bacterial genomes contain prophages or at least important for further experimental investigation. The their remnants. Although they may represent large frac- experimental design and the gathered data proved good tion of the strain-specific DNA sequences , the strain reproducibility, therefore, repeating the experiment C. beijerinckii NRRL B-598 contained only three pro- under different conditions will also allow us to explore phage regions while only one was complete. This could gene regulatory mechanisms and signaling pathways be the reason for a high genome sequence similarity within the strain. Sedlar et al. BMC Genomics (2018) 19:415 Page 10 of 13 Methods Microscopy, fluorescent staining, and flow cytometry Bacterial culture and fermentation experiment Phase contrast microscopy (Olympus BX51; Olympus) The strain C. beijerinckii NRRL B-598 was maintained in with × 400 and × 1000 magnifications was used to deter- a form of spore suspension. TYA broth, prepared ac- mine the morphological status of cells. Population viability cording to Kolek et al. (2017) , containing: 50 g/l and heterogeneity was evaluated using flow cytometry glucose, 6 g/l tryptone (Sigma Aldrich), 2 g/l yeast ex- (BD Accuri C6) in combination with fluorescent staining. tract (Merck), 3 g/l ammonium acetate, 0.5 g/l KH PO , A combination of propidium iodide PI (Sigma Aldrich) 2 4 0.3 g/l MgSO ∙7H O, and 0.01 g/l FeSO , was used for and carboxyfluorescein diacetate CFDA (Sigma Aldrich) 4 2 4 the fermentation experiment. Multiforce 1 l bioreactors was employed for the differentiation of active and dam- (Infors HT) with 630 ml TYA broth and agitation at aged cells and detection of spores according to Kolek et al. 200 rpm were used for batch cultivation of the strain at (2016) . 37 °C. Oxygen was removed from bioreactors by bub- bling with N prior to fermentation. pH was adjusted to RNA isolation and sequencing 6.3 by 10% NaOH and all bioreactors were inoculated Cell samples for isolation of total RNA were collected with 70 ml of inoculum that was cultured previously in from 3 ml of culture broth (OD 0.9–1.0) by centrifu- an anaerobic chamber overnight (Concept 400; Ruskinn gation at 10000 rpm for two minutes, washed with Technology) under an anaerobic atmosphere (90% N , RNase free water and cell pellets were immediately 10% H ). The whole experiment was repeated during stored at − 70 °C. RNA from the cell pellet was isolated different weeks to obtain three biological replicates. using High Pure RNA Isolation Kit (Roche). Isolated Samples were taken at specific times and processed for total RNA was stored frozen at − 70 °C. The total RNA cell concentration determination, HPLC analysis, mi- concentration was determined on DS-11 FX+ Spectro- croscopy, flow cytometry, and RNA isolation. Samples photometer (DeNovix). Quality and integrity of the for RNA isolation were taken at 3.5, 6, 8.5, 13, 18, and samples were assessed using the Agilent RNA 6000 23 h of cultivation. Nano Kit (Agilent) with the Agilent 2100 Bioanalyzer (Agilent). RNA integrity number was measured using Culture growth and HPLC analysis 2100 Bioanalyzer Expert software. Cell concentration was determined by the optical density Frozen total RNA samples were thawed on ice and an (OD) measurement at 600 nm with Spectrophotometer aliquot of each sample containing 10 μg of RNA was (Varian Cary 50 UV-VIS spectrophotometer, Varian) taken for 16S and 23S ribosomal RNAs removal using against TYA broth. For calculations of a specific glucose The MICROBExpress™ Bacterial mRNA Enrichment Kit consumption rate, dry weight of biomass (CDW) was (Ambion). Efficiency of ribosomal RNA depletion and used. CDW was determined after drying biomass until concentration of RNA samples were checked on the constant weight at 105 °C. The equation was following: Agilent 2100 Bioanalyzer (Agilent) with the Agilent RNA 6000 Nano Kit (Agilent). Library construction and c −c iþ1 i q ¼ p sequencing of samples from the first replicate on CDW X ðÞ t −t i;iþ1 i;iþ1 iþ1 i Illumina HiSeq 4000, single-end, 50 bp, was performed by BGI Europe A/S (Copenhagen, Denmark). Library where q is a specific substrate consumption rate related construction and sequencing of samples from two − 1 − 1 to a number of viable cells (g.g .h ), c is concentration remaining replicates were performed by CEITEC Gen- of glucose (g/L), CDW is cell dry weight (g/L), x is a pro- omics core facility (Brno, Czechia) on Illumina NextSeq, portion of viable cells in population and t is time (h). Sym- single-end, 75 bp. bols i and i+ 1 indicate two adjacent sampling time points. Concentrations of glucose and fermentation products (lactic acid, acetic acid, butyric acid, ethanol, acetone, and Bioinformatics analysis butanol) were measured by HPLC with refractive index The quality assessment after steps of the RNA-Seq reads detection (Agilent Series 1200 HPLC; Agilent) in microfil- processing was done using FastQC in combination with tered samples of culture broths. An IEX H+ polymer col- MultiQC to summarize the reports across all samples umn (Watrex) was used for the separation. Conditions of . Reads representing 16S and 23S rRNA regions were analysis were as follows: isocratic elution, 5 mM H SO as filtered out using SortMeRNA  with SILVA database 2 4 − 1 a mobile phase with flow rate of 0.5 ml min , column of known bacterial 16S and 23S rRNA genes to temperature 60 °C, injection sample volume 20 μl. The simplify the following mapping of reads. Clean reads chromatograms were processed by ChemStation for LC were mapped to the reference genome of C. beijerinckii systems software using a set of standard samples with NRRL B-598 (NZ_CP011966.2) using STAR . Result- known concentrations to elaborate calibration curves. ing SAM (Sequence Read Alignment/Map) files were Sedlar et al. BMC Genomics (2018) 19:415 Page 11 of 13 indexed and transformed into more compact BAM (Bin- Floating window of 10,000 bp with step of 200 bp was used to render ary Read Alignment/Map) format using SAMtools . the shading area. (PDF 701 kb) Transcripts were assembled de novo from a whole data- Additional file 5: Differential analysis of adjacent time points using MA plots. MA plots showing statistically differentially expressed genes in set of 18 samples using Trinity v2.4.0 . Transcripts color. Color coding respect the color coding used in Venn diagrams in were mapped to C. beijerinckii NRRL B-598 reference Fig. 4. (PDF 315 kb) genome (NZ_CP011966.2) with BLAST+ v2.7.1 . Additional file 6: Differential expression analysis. Complete results of Mapped reads and transcripts were visualized as a graph differential expression analysis using DESeq2. (XLSX 2287 kb) of sequence read coverage across the genome and further Additional file 7: Dotplot of C. beijerinckii NRRL B-598 and C. beijerinckii NCIMB 8052 genome. Dotplots showing that no major rearrangement explored in Integrative Genomics Viewer (IGV) v 2.4.3 between the two strains are present. (PDF 315 kb)  to capture variable regions, including identification of Additional file 8: COG functional categories of differential expressed putative missing nucleotides in pseudogene region in the genes. Barplots showing the number of COG categories associated with current genome assembly. On the other hand, differentially expressed genes between adjacent time points. (PDF 226 kb) genome-wide coverage plots were reconstructed with SAMtools using sorted reads and visualized as circular Acknowledgements Computational resources were partially provided by the CESNET LM2015042 representations of genome with DNAplotter  inte- and the CERIT Scientific Cloud LM2015085, under the programme “Projects grated in Artemis . Dotplot for visual comparison of of Large Research, Development, and Innovations Infrastructures”.We C. beijerinckii NRRL B-598 and C. beijerinckii NCIMB acknowledge the CF Genomics of CEITEC supported by the NCMG research infrastructure (LM2015091 funded by MEYS CR) for their support with 8052 genomes was produced in YASS genomic similarity obtaining scientific data presented in this paper. search tool . Phage regions in the C. beijerinckii NRRL B-598 genome were predicted with PHASTER and Funding This work has been supported by grant project GACR 17-00551S. PhiSpy . In PhiSpy both available clostridial references (C. perfringens and C. tetani)were used. Availability of data and materials A count table was reconstructed using the R/Bio- The genome assembly referred in this paper is version CP011966.2 using NCBI RefSeq annotation NZ_CP011966.2. The RNA-Seq sequencing data have conductor featureCounts function included in the been deposited in the NCBI Sequence Read Archive (SRA) under the Rsubread package  and RPKM were computed accession number SRP033480. using the R/Bioconductor edgeR package . Differ- ential analysis was performed on a raw count table with Authors’ contributions KS, JK, PP, and IP designed the study. MV and BB performed the experiments. R/Bioconductor DESeq2 package . Data was normal- KS, PK, and KK analyzed the data. KS, PP, and BB wrote the manuscript with the ized using a built-in DESeq2 function. This normalization input from all authors. All authors discussed the results, read and approved the used negative binomial distribution and handles both dif- final manuscript. ferences in library sizes and differences in library compos- Ethics approval and consent to participate ition. DESeq2 identified genes that were differentially Not applicable. expressed in a time-dependent manner. Dimensionality Competing interests reduction and visualization of normalized samples was The authors declare that they have no competing interests. produced with R Rtsne package using Barnes-Hut t-SNE implementation  in combination with ggplot2 R pack- Publisher’sNote age . Venn diagrams and heatmaps representing tran- Springer Nature remains neutral with regard to jurisdictional claims in scription of selected genes using Z score were generated published maps and institutional affiliations. with R packages VennDiagram  and gplots, respect- Author details ively. Time series and bar plots were generated with Department of Biomedical Engineering, Brno University of Technology, Matlab 2017b. Technicka 12, 616 00 Brno, Czechia. Department of Biotechnology, University of Chemistry and Technology Prague, Technicka 5, 166 28 Prague, Czechia. Institute of Aquaculture and Protection of Waters, University of Additional files South Bohemia in České Budějovice, Na Sádkách 1780, 370 05 České Budějovice, Czechia. Department of Biochemistry and Molecular Genetics, University of Virginia Health System, Charlottesville, VA 22908, USA. Additional file 1: Snapshots from microscopic observation during cultivation. (PDF 628 kb) Received: 20 February 2018 Accepted: 18 May 2018 Additional file 2: Silent pseudogenes. (PDF 195 kb) Additional file 3: Putative active genes misidentified as pseudogenes References due to assembly errors. (PDF 210 kb) 1. Kujawska A, Kujawski J, Bryjak M, Kujawski W. ABE fermentation products Additional file 4: Circular plots showing average coverage of the recovery methods - A review. Renew. Sustain. Energy Rev. [Internet]. genome by RNA-Seq reads in all six time points. The outermost and the Pergamon; 2015 [cited 2017 Nov 23];48:648–661. Available from: http:// second outermost circles represent positions of genes on the forward www.sciencedirect.com/science/article/pii/S1364032115002981. (red) and reverse (blue) strands respectively. The third circle (green) 2. Patakova P, Linhova M, Rychtera M, Paulova L, Melzoch K. Novel and stands for pseudogenes. The yellow peak and shading area represents neglected issues of acetone-butanol-ethanol (ABE) fermentation by transcription greater than the average and violet lower than average. clostridia: Clostridium metabolic diversity, tools for process mapping and Sedlar et al. BMC Genomics (2018) 19:415 Page 12 of 13 continuous fermentation systems. Biotechnol Adv. [Internet]. Elsevier; 2013 clostridia. Appl. Environ. Microbiol. [Internet]. 2008 [cited 2018 Feb 9];74: [cited 2017 Nov 23];31:58–67. Available from: http://www.sciencedirect.com/ 7497–7506. Available from: http://www.ncbi.nlm.nih.gov/pubmed/18931289. science/article/pii/S0734975012000122. 18. Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, et al. IVT-seq reveals extreme bias in RNA sequencing. Genome Biol. [Internet]. BioMed 3. Nölling J, Breton G, Omelchenko M V, Makarova KS, Zeng Q, Gibson G, et al. Central; 2014 [cited 2018 Jan 31];15:R86. Available from: http:// Genome sequence and comparative analysis of the solvent-producing genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-6-r86. bacterium Clostridium acetobutylicum. J Bacteriol [Internet]. American Society for Microbiology; 2001 [cited 2017 Nov 23];183:4823–4838. Available 19. Zytnicki M. Mmquant: how to count multi-mapping reads? BMC from: http://www.ncbi.nlm.nih.gov/pubmed/11466286. Bioinformatics [Internet]. BioMed Central; 2017 [cited 2018 Jan 31];18:411. 4. Poehlein A, Grosse-Honebrink A, Zhang Y, Minton NP, Daniel R. Complete Available from: http://www.ncbi.nlm.nih.gov/pubmed/28915787. genome sequence of the nitrogen-fixing and solvent-producing Clostridium 20. Goodhead I, Darby AC. Taking the pseudo out of pseudogenes. Curr. Opin. pasteurianum DSM 525. Genome Announc. [Internet]. American Society for Microbiol. [Internet]. Elsevier Current Trends; 2015 [cited 2018 Jan 10];23: Microbiology; 2015 [cited 2017 Nov 23];3:e01591-e01514. Available from: 102–109. Available from: http://www.sciencedirect.com/science/article/pii/ http://www.ncbi.nlm.nih.gov/pubmed/25700415. S1369527414001799. 5. Wang Y, Li X, Mao Y, Blaschek HP. Single-nucleotide resolution analysis of 21. Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Siew WC, et al. the transcriptome structure of Clostridium beijerinckii NCIMB 8052 using Pseudogenes in the ENCODE regions: Consensus annotation, analysis of RNA-Seq. BMC Genomics [Internet]. BioMed Central; 2011 [cited 2017 Nov transcription, and evolution. Genome Res. [Internet]. Cold Spring Harbor 22];12:479. Available from: http://bmcgenomics.biomedcentral.com/articles/ Laboratory Press; 2007 [cited 2018 Jan 10];17:839–851. Available from: 10.1186/1471-2164-12-479. http://www.ncbi.nlm.nih.gov/pubmed/17568002. 22. Balzer S, Malde K, Lanzen A, Sharma A, Jonassen I. Characteristics of 6. Poehlein A, Solano JDM, Flitsch SK, Krabben P, Winzer K, Reid SJ, et al. 454 pyrosequencing data-enabling realistic simulation with flowsim. Microbial solvent formation revisited by comparative genome analysis. Bioinformatics [Internet]. Oxford University Press; 2011 [cited 2018 Jan Biotechnol Biofuels [Internet]. BioMed Central; 2017 [cited 2017 Nov 23];10: 31]. p. i420–i425. Available from: http://www.ncbi.nlm.nih.gov/pubmed/ 58. Available from: http://biotechnologyforbiofuels.biomedcentral.com/ articles/10.1186/s13068-017-0742-z. 7. Ezeji T, Blaschek HP. Fermentation of dried distillers’ grains and solubles 23. Al-Hinai MA, Jones SW, Papoutsakis ET. The Clostridium sporulation (DDGS) hydrolysates to solvents and value-added products by programs: diversity and preservation of endospore differentiation. Microbiol solventogenic clostridia. Bioresour. Technol. [Internet]. Elsevier; 2008 [cited Mol Biol Rev [Internet]. American Society for Microbiology (ASM); 2015 2017 Dec 8];99:5232–5242. Available from: http://www.sciencedirect.com/ [cited 2018 Feb 9];79:19–37. Available from: http://www.ncbi.nlm.nih.gov/ science/article/pii/S0960852407007778. pubmed/25631287. 8. Wang Z, Gerstein M, Snyder M. RNA-Seq: A revolutionary tool for 24. Alsaker K V, Papoutsakis ET. Transcriptional program of early sporulation and transcriptomics. Nat. Rev. Genet. [Internet]. Nature Publishing Group; 2009 stationary-phase events in Clostridium acetobutylicum. J Bacteriol [Internet]. [cited 2017 Nov 23];10:57–63. Available from: http://www.nature.com/ American Society for Microbiology; 2005 [cited 2018 Feb 14];187:7103–7118. doifinder/10.1038/nrg2484. Available from: http://www.ncbi.nlm.nih.gov/pubmed/16199581. 25. Jones DT, Woods DR. Acetone-butanol fermentation revisited. Microbiol Rev 9. Sedlar K, Kolek J, Skutkova H, Branska B, Provaznik I, Patakova P. Complete [Internet]. American Society for Microbiology (ASM); 1986 [cited 2018 Feb genome sequence of Clostridium pasteurianum NRRL B-598, a non-type 20];50:484–524. Available from: http://www.ncbi.nlm.nih.gov/pubmed/ strain producing butanol. J Biotechnol. 2015;214:113–4. Available from: http://www.sciencedirect.com/science/article/pii/S0168165615301279. 10. Sedlar K, Kolek J, Provaznik I, Patakova P. Reclassification of non-type strain 26. Patakova P, Kolek J, Sedlar K, Koscova P, Branska B, Kupkova K, et al. Clostridium pasteurianum NRRL B-598 as Clostridium beijerinckii NRRL B- Comparative analysis of high butanol tolerance and production in clostridia. 598. J Biotechnol [Internet]. 2017 [cited 2017 Mar 3];244:1–3. Available from: Biotechnol. Adv. [Internet]. Elsevier; 2017 [cited 2018 Feb 14]; Available from: http://linkinghub.elsevier.com/retrieve/pii/S0168165617300135. https://www.sciencedirect.com/science/article/pii/S0734975017301568. 11. Kolek J, Diallo M, Vasylkivska M, Branska B, Sedlar K, López-Contreras AM, et 27. Dürre P, Fischer RJ, Kuhn A, Lorenz K, Schreiber W, Stürzenhofecker B, et al. al. Comparison of expression of key sporulation, solventogenic and Solventogenic enzymes of Clostridium acetobutylicum: catalytic properties, acetogenic genes in C. Beijerinckii NRRL B-598 and its mutant strain genetic organization, and transcriptional regulation. FEMS Microbiol Rev overexpressing spo0A. Appl. Microbiol. Biotechnol. [Internet]. Springer Berlin [Internet]. 1995 [cited 2018 Feb 9];17:251–262. Available from: http://www. Heidelberg; 2017 [cited 2017 Dec 8];101:8279–8291. Available from: http:// ncbi.nlm.nih.gov/pubmed/7576767. link.springer.com/10.1007/s00253-017-8555-3. 28. Beaumont HJE, Gallie J, Kost C, Ferguson GC, Rainey PB. Experimental evolution of bet hedging. Nature [Internet]. 2009 [cited 2018 Feb 9];462:90–93. 12. Kolek J, Branska B, Drahokoupil M, Patakova P, Melzoch K. Evaluation of Available from: http://pubman.mpdl.mpg.de/pubman/item/escidoc:2259285/ viability, metabolic activity and spore quantity in clostridial cultures component/escidoc:2259283/Beaumont_et_al_2009.pdf. during ABE fermentation. Sauer M, editor. FEMS Microbiol Lett [Internet]. Oxford University Press; 2016 [cited 2018 Jan 3];363:fnw031. 29. Brussow H, Canchaya C, Hardt W-D. Phages and the evolution of bacterial Available from: https://academic.oup.com/femsle/article-lookup/doi/10. pathogens: from genomic rearrangements to lysogenic conversion. 1093/femsle/fnw031. Microbiol Mol Biol Rev [Internet] American Society for Microbiology (ASM); 13. Wang Y, Li X, Mao Y, Blaschek HP. Genome-wide dynamic transcriptional 2004 [cited 2018 Feb 1];68:560–602. Available from: http://www.ncbi.nlm. profiling in Clostridium beijerinckii NCIMB 8052 using single-nucleotide nih.gov/pubmed/15353570. resolution RNA-Seq. BMC Genomics [Internet]. BioMed Central; 2012 [cited 30. Ochman H, Lawrence JG, Grolsman EA. Lateral gene transfer and the nature 2017 Nov 22];13:102. Available from: http://bmcgenomics.biomedcentral. of bacterial innovation. Nature [Internet]. Nature Publishing Group; 2000 com/articles/10.1186/1471-2164-13-102. [cited 2018 Feb 1];405:299–304. Available from: http://www.nature.com/ 14. Angiuoli S V., Gussman A, Klimke W, Cochrane G, Field D, Garrity GM, et al. articles/35012500. Toward an Online Repository of Standard Operating Procedures (SOPs) for 31. Jones DT, Shirley M, Wu X, Keis S. Bacteriophage infections in the industrial (Meta)genomic Annotation. Omi. A J. Integr. Biol. [Internet]. 2008 [cited 2018 acetone butanol (AB) fermentation process. J Mol Microbiol Biotechnol Jan 10];12:137–41. Available from: http://www.ncbi.nlm.nih.gov/pubmed/ [Internet]. 2000 [cited 2018 Feb 9];2:21–26. Available from: https://www. 18416670. caister.com/jmmb/v/v2/v2n1/03.pdf. 15. Maaten L van der, Hinton G. Visualizing Data using t-SNE. J Mach Learn Res 32. Ewels P, Magnusson M, Lundin S, Kaller M. MultiQC: Summarize analysis [Internet]. 2008 [cited 2018 Jan 12];620:267–284. Available from: http://www. results for multiple tools and samples in a single report. Bioinformatics jmlr.org/papers/v9/vandermaaten08a.html. [Internet]. Oxford University Press; 2016 [cited 2017 Aug 22];32:3047–3048. 16. Tracy BP, Gaida SM, Papoutsakis ET. Flow cytometry for bacteria: Enabling Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/ metabolic engineering, synthetic biology and the elucidation of complex 10.1093/bioinformatics/btw354. phenotypes. Curr. Opin. Biotechnol. [Internet]. 2010 [cited 2018 Feb 9];21: 33. Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of 85–99. Available from: http://www.ncbi.nlm.nih.gov/pubmed/20206495. ribosomal RNAs in metatranscriptomic data. Bioinformatics [Internet]. 2012 17. Tracy BP, Gaida SM, Papoutsakis ET. Development and application of flow- [cited 2017 Sep 14];28:3211–3217. Available from: http://www.ncbi.nlm.nih. cytometric techniques for analyzing and sorting endospore-forming gov/pubmed/23071270. Sedlar et al. BMC Genomics (2018) 19:415 Page 13 of 13 34. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: Improved data processing and web- based tools. Nucleic Acids Res. [Internet]. Oxford University Press; 2013 [cited 2018 Jan 4];41:D590–D596. Available from: http://academic.oup.com/ nar/article/41/D1/D590/1069277/The-SILVA-ribosomal-RNA-gene-database- project. 35. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics [Internet]. Oxford University Press; 2013 [cited 2017 Jul 26];29:15–21. Available from: https://academic.oup. com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/bts635. 36. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics [Internet]. Oxford University Press; 2009 [cited 2017 Jul 26];25:2078–9. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/ bioinformatics/btp352. 37. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full- length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. [Internet]. NIH Public Access; 2011 [cited 2018 Jan 5];29:644–652. Available from: http://www.ncbi.nlm.nih.gov/pubmed/ 38. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics [Internet]. 2009 [cited 2018 Jan 17];10:421. Available from: http://www.ncbi.nlm.nih.gov/ pubmed/20003500. 39. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat. Biotechnol. [Internet]. 2011 [cited 2018 Jan 17];29:24–26. Available from: http://www.nature.com/doifinder/10. 1038/nbt.1754. 40. Carver T, Thomson N, Bleasby A, Berriman M, Parkhill J. DNAPlotter: Circular and linear interactive genome visualization. Bioinformatics [Internet]. 2009 [cited 2018 Jan 17];25:119–120. Available from: https://academic.oup.com/ bioinformatics/article-lookup/doi/10.1093/bioinformatics/btn578. 41. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, et al. Artemis: sequence visualization and annotation. Bioinformatics [Internet]. 2000 [cited 2018 Jan 29];16:944–945. Available from: http://www.ncbi.nlm. nih.gov/pubmed/11120685. 42. Noé L, Kucherov G. YASS: Enhancing the sensitivity of DNA similarity Search Nucleic Acids Res [Internet]. Oxford University Press; 2005 [cited 2018 Jan 17];33:W540–W543. Available from: https://academic.oup.com/nar/article- lookup/doi/10.1093/nar/gki478. 43. Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. [Internet]. Oxford University Press; 2016 [cited 2018 Jan 17];44:W16–W21. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27141966. 44. Akhter S, Aziz RK, Edwards RA. PhiSpy: A novel algorithm for finding prophages in bacterial genomes that combines similarity-and composition- based strategies. Nucleic Acids Res. [Internet]. Oxford University Press; 2012 [cited 2018 Jan 17];40:e126. Available from: http://www.ncbi.nlm.nih.gov/ pubmed/22584627. 45. Liao Y, Smyth GK, Shi W. FeatureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics [Internet]. 2014 [cited 2018 Jan 29];30:923–930. Available from: http://www. ncbi.nlm.nih.gov/pubmed/24227677. 46. Robinson MD, McCarthy DJ, Smyth GK. edgeR: A Bioconductor package for differential expression analysis of digital Gene Expr data. Bioinformatics [Internet]. Oxford University Press; 2010 [cited 2018 Jan 5];26:139–140. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/ 10.1093/bioinformatics/btp616. 47. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. [Internet]. BioMed Central; 2014 [cited 2018 Jan 5];15:550. Available from: http:// genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8. 48. van der Maaten L. Accelerating t-SNE using Tree-Based Algorithms J Mach Learn Res [Internet]. 2014 [cited 2018 Jan 17];15:1–21. Available from: http:// jmlr.org/papers/volume15/vandermaaten14a/vandermaaten14a.pdf. 49. Wickham H. ggplot2 Elegant Graphics for Data Analysis [Internet]. Media. Springer; 2009 [cited 2018 Jan 29]. Available from: http://had.co.nz/ggplot2/book. 50. Chen H, Boutros PC. VennDiagram: a package for the generation of highly- customizable Venn and Euler diagrams in R. BMC Bioinformatics [Internet]. BioMed Central; 2011 [cited 2018 Jan 29];12:35. Available from: http:// bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-35.
BMC Genomics – Springer Journals
Published: May 30, 2018