Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Single-nucleotide resolution analysis of the transcriptome structure of Clostridium beijerinckii NCIMB 8052 using RNA-Seq

Single-nucleotide resolution analysis of the transcriptome structure of Clostridium beijerinckii... Background: Clostridium beijerinckii is an important solvent producing microorganism. The genome of C. beijerinckii NCIMB 8052 has recently been sequenced. Although transcriptome structure is important in order to reveal the functional and regulatory architecture of the genome, the physical structure of transcriptome for this strain, such as the operon linkages and transcript boundaries are not well understood. Results: In this study, we conducted a single-nucleotide resolution analysis of the C. beijerinckii NCIMB 8052 transcriptome using high-throughput RNA-Seq technology. We identified the transcription start sites and operon structure throughout the genome. We confirmed the structure of important gene operons involved in metabolic pathways for acid and solvent production in C. beijerinckii 8052, including pta-ack, ptb-buk, hbd-etfA-etfB-crt (bcs) and ald-ctfA-ctfB-adc (sol) operons; we also defined important operons related to chemotaxis/motility, transcriptional regulation, stress response and fatty acids biosynthesis along with others. We discovered 20 previously non-annotated regions with significant transcriptional activities and 15 genes whose translation start codons were likely mis-annotated. As a consequence, the accuracy of existing genome annotation was significantly enhanced. Furthermore, we identified 78 putative silent genes and 177 putative housekeeping genes based on normalized transcription measurement with the sequence data. We also observed that more than 30% of pseudogenes had significant transcriptional activities during the fermentation process. Strong correlations exist between the expression values derived from RNA-Seq analysis and microarray data or qRT-PCR results. Conclusions: Transcriptome structural profiling in this research provided important supplemental information on the accuracy of genome annotation, and revealed additional gene functions and regulation in C. beijerinckii. Background greater potential for biosolvent production than Solvents such as acetone, butanol and ethanol (ABE) pro- C. acetobutylicum. duced through microbial fermentation represent important The genome of C. beijerinckii NCIMB 8052 was potential renewable fuels and chemicals [1]. Clostridium sequenced by the DOE Joint Genome Institute in 2007 acetobutylicum and C. beijerinckii are among the promi- (JGI project ID 3634512). The genome size is 6.0 Mb, nent solvent-producing species. Although C. beijerinckii is which is 50% larger than that of C. acetobutylicum ATCC phenotypically similar to C. acetobutylicum, the saccharoly- 824. The C. beijerinckii 8052 solvent-producing genes are all located on the chromosome, as opposed to the location tic strains are phylogenetically distant from the amylolytic C. acetobutylicum ATCC 824 type strain [2]. C. beijerinckii of these genes on a mega-plasmid in C. acetobutylicum exhibits a broader substrate range and optimum pH for 824. Although transcriptome structural organization is growth and solvent production [3]; thus it may have important in order to reveal the functional and regulatory architecture of the genome, such annotation for the C. bei- jerinckii 8052 genome is far from complete. Current gen- ome annotation was made by computational analysis * Correspondence: [email protected] Institute for Genomic Biology, University of Illinois at Urbana-Champaign, based on gene prediction algorithms. Although this allows Urbana, IL 61801, USA for the determination of the complete set of gene loci and Full list of author information is available at the end of the article © 2011 Wang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Wang et al. BMC Genomics 2011, 12:479 Page 2 of 10 http://www.biomedcentral.com/1471-2164/12/479 intergenic regions of the genome, it does not provide suffi- cient information concerning the transcriptional organiza- tion on a genome-wide level. For example, transcriptome structures such as operon linkages and transcript bound- aries, etc. are not well understood and lack confirmation with experimental approaches. Next-generation high- throughput sequencing technology enabled us to obtain millions of cDNA reads simultaneously. In RNA-Seq ana- lysis, these reads can be assembled against the genome sequence, and expression values calculated based on the reads mapped to genes. Deep sequencing of cDNA pool allowed us to study the bacterial transcriptome structure and gene expression at an unprecedented resolution and depth [4-8]. In this study, we used RNA-Seq technology to investigate C. beijerinckii 8052 transcriptome structure at a single-nucleotide resolution. We identified the transcrip- tion start sites and operon structure throughout the genome. We confirmed the structure of important gene operons involved in metabolic pathways for acid and solvent production in C. beijerinckii 8052. We defined important operons related to chemotaxis/motility, tran- scriptional regulation, stress response and fatty acids bio- synthesis. We discovered 20 previously non-annotated Figure 1 Fermentation kinetics of C. beijerinckii 8052 batch regions with significant transcriptional activities and 15 culture. (A) Cell growth curve with sampling points for RNA-Seq genes whose translation start codons were likely mis- indicated by red arrows. (B) Solvents and acids production over annotated. The results from this study significantly time with the onset of solvent production indicated by a green enhanced the accuracy of current genome annotation, and arrow. provided an essential reference point for other researchers working in related fields. The sequence coverage per base was plotted and visua- Results and discussion lized using the genome browser Artemis and DNAPlotter Growth kinetics and ABE fermentation [5,9,10] (Additional file 1 Figure S1). The sequence cover- C. beijerinckii 8052 grew rapidly with a very short lag age of the genes (based on number of mapped reads) for phase in the batch fermentation with P2 medium supple- different samples was observed to be different in the over- mented with yeast extract and glucose (Figure 1A). The all sequence coverage profiles, which might be a reflection fermentation experienced a shift from acidogenesis to of the physiological states as the cell transited from one solventogenesis at approximately 4.5-8 h. Formation of growth phase to another. For example, sample 2 repre- solvents was detected at between 4.5-6.5 h after the start sented a time period that was at the beginning of the tran- of fermentation, which corresponded to the late expo- sition from acidogenesis to solventogenesis. In this sample, nential growth phase. Butanol continued to increase genes encoding butyrate kinase (buk, 233080-234147 nt; throughout the stationary phase (Figure 1B). Samples for Gene ID is listed in Additional file 2 Table S1, similarly RNA isolation were collected at time points during acido- hereinafter), acetyl-CoA acetyltransferase (thl,499121- genesis (2 and 4.5 h) and solventogenesis (after 6.5 h) 500302 nt), glyceraldehyde-3-phosphate dehydrogenase (Figure 1A). The combined information from these sam- (gap, 710763-711764 nt), fructose-bisphosphate aldolase ples collected at different growth phases is representative (fba, 2199060-2199926 nt), aldehyde dehydrogenase (ald, for transcriptome structural analysis. 4399026-4400432 nt), acetoacetate decarboxylase (adc, 4401916-4402656 nt) were all actively expressed (high Transcriptome definition and structure peaks in Additional file 1 Figure S1). These genes are The 75-nt cDNA reads were mapped to the C. beijerinckii involved in acid and butyryl-CoA formation, solventogen- 8052 genome. Only those reads that mapped unambigu- esis and glycolysis. The time frame of sample 4 was consis- ously to the genome were used for further analyses tent with the transition to non-active growth and (Table 1). Collectively, around 75.5% of the C. beijerinckii clostridial spores formation. In this sample, genes encod- 8052 genome was transcribed in at least one sample, even ing stage V sporulation protein T (spoVT, 115288-115836 though the fraction in each single sample was much less. nt), spore coat protein cotJC (cotJC, 2411617-2412183 nt), Wang et al. BMC Genomics 2011, 12:479 Page 3 of 10 http://www.biomedcentral.com/1471-2164/12/479 Table 1 Summary of RNA-Seq sequencing and data analysis results Sample 1 2 3 4 5 6* Total Time collected (h) 2 4.5 10 14 17 26.5 Total No. of reads 8988633 9457480 8011531 8448929 10363535 38574501 83844609 No. of reads mapped 8473125 9037616 7514804 7730815 9842491 37676913 80275764 No. of reads unambiguously mapped 6776544 7274568 6096405 6189652 8096169 35027722 69461060 No. of bases unambiguously mapped 508240800 545592600 457230375 464223900 607212675 2627079150 5209579500 Percentage of genome represented 38.71 36.59 43.84 40.60 41.34 50.67 75.46 No. of genes with detectable expression** 4219 4082 4496 4453 4487 4750 5024 -1 -2 -2 -2 -1 -2 -2 Range in expression levels (RPKM) 3.2 × 10 5.8 × 10 4.5 × 10 7.0 × 10 1.0 × 10 3.6 × 10 3.6 × 10 4 4 4 4 4 4 4 ~ 2.5 × 10 ~ 6.0 × 10 ~ 2.5 × 10 ~ 9.0 × 10 ~ 8.6 × 10 ~ 9.8 × 10 ~ 9.8 × 10 *Sample 6 was sequenced with 1 sample/lane, while samples 1-5 with 2 samples/lane. See Methods section for more details; **Pseudogenes included. spore coat peptide assembly protein cotJB (cotJB, 2412190- Cbei_0465, Cbei_1518 and Cbei_3542, respectively. In 2412453 nt), small acid-soluble spore protein, sspA (sspA, addition, a Known Riboswitch-like Element (flavin mono- 3596975-3597184 nt), small acid-soluble spore protein, nucleotide (FMN) riboswitch) was found in the 5’-UTR sspC2 (sspC2, 3814296-3814505 nt) were actively of Cbei_1224. More details about the identified putative expressed among others (Additional file 1 Figure S1). riboswitches were described in Additional file 6 Table S4. These genes are involved in the regulation of spore coat Previously, an operon containing genes encoding gly- assembly and other sporulation-related processes [11,12]. ceraldehyde-3-phosphate dehydrogenase (gap), phospho- These results were in good agreement with our previous glycerate kinase (pgk) and triosephosphate isomerase observation using a 500-gene set DNA microarray [13]. A (tpi) was revealed by transcriptional analyses for C. acet- more detailed genome-wide transcriptional analysis of obutylicum, while the gene encoding phosphoglycerate C. beijerinckii 8052 during the shift from acidogenesis to mutase (pgm) was identified as not a member of this operon [17]. In this study, employing the single-nucleo- solventogenesis is currently underway in our lab. tide resolution sequence data, a similar gap-pgk-tpi The distinct transcript boundaries and multigene operon structure in a genome are essential for elucidat- operon was observed, upstream and apart from pgm for ing a genome’s regulatory mechanisms. Through tran- C. beijerinckii 8052. In addition, it is very interesting scriptome profiling analysis, the transcript boundaries that the upstream gene Cbei_0596 was also found to be and operon structures were identified across the entire linked with this operon (Figure 2A), which was validated C. beijerinckii 8052 genome (Additional file 3 Table S2). by end-point RT-PCR as discussed below (Additional Of all the > 5000 genes, 2151 genes (42.2%) were deter- file 7 Figure S2 and Additional file 8 Table S5). The mined as being parts of multi-gene operon structures. upstream gene (CA_C0708) of gap in C. acetobutylicum The largest number of genes in a single operon was 32 824 genome is conjectured to be the counterpart of (ribosomal proteins operon, Cbei_0150-0181, 192128- Cbei_0596 in C. beijerinckii 8052, since they both 207934 nt). A refined genome annotation file (in Gen- encode putative transcriptional regulator proteins and Bank format) was generated (Additional file 4) based on share 64% sequence identity. However, the intergenic the findings from this work and the current C. beijer- distance between CA_C0708 and CA_C0709 in C. aceto- inckii 8052 genome annotation in NCBI. The GenBank butylicum 824 genome is 146 nt, while that between file can also be downloaded from https://netfiles.uiuc. Cbei_0596 and Cbei_0597 in C. beijerinckii 8052 gen- edu/blaschek/www/Wang-BMC2011. ome is only 71 nt. Although Cbei_0596-0599 are orga- In addition, 5’-untranslated regions (5’-UTRs) were nized in the same operon, the sequence coverage depth determined at the same time. Similar to other bacteria downstream of gap was unexpectedly lower than that of [4,14], most of the identified 5’-UTRswereveryshort. the anterior (Figure 2A). An intrinsic terminator was Among all the 1605 5’-UTRs that were estimated with predicted adjacent and downstream of gap (711818- high confidence (that is, valid 5’-UTRs identified in the 6 711852 nt) using the bacterial genome transcription samples with a ≤ 50 nt window), 1262 were ≤ 50 nt, and terminator prediction software TransTerm [18]. The ter- only 65 were ≥ 100 nt (Additional file 5 Table S3). RibEx minator may have played a key role in attenuating the [15] was used to test for the putative regulatory elements transcription of the downstream genes in this operon. among the 5’-UTRs over 100 bp. Three Predicted Ribos- Further experiments need to be carried out to confirm witch-like Elements (do not belong to any Rfam [16]; yet the activity and function of this element. present very significant associations with either a COG The butyryl-CoA formation related genes in C. aceto- or a KEGG pathway) were found in the 5’-UTRs of butylicum 824 encoding 3-hydroxybutyryl-CoA Wang et al. BMC Genomics 2011, 12:479 Page 4 of 10 http://www.biomedcentral.com/1471-2164/12/479 [24]. In C. acetobutylicum 824, the genes adhE (acetal- dehyde-CoA/alcohol dehydrogenase), ctfA (acetoacetyl- CoA: acetate/butyrate-CoA transferase subunit A) and ctfB (acetoacetyl-CoA: acetate/butyrate-CoA transferase subunit B) are located in the sol (solvent formation) operon on the mega-plasmid pSOL1 whose loss leads to the degeneration of the strain [25], while adc (acetoace- tate decarboxylase) is organized in a monocistronic operon in the opposite direction [26,27]. In this study, a sol operon organized in the order of ald-ctfA-ctfB-adc was observed (Figure 2B and Additional file 3 Table S2). Previously, Chen and Blaschek (1999) speculated that the ald, ctfA, ctfB and adc genes were located in an operon following a Northern hybridization analysis of C. beijerinckii 8052 total RNA [28]. With direct sequen- cing data, this study successfully confirmed the above hypothesis, and identified the transcriptional start sites (TSS) upstream of ald gene. This solventogenic gene arrangement in C. beijerinckii 8052 is consistent with that observed in C. beijerinckii NRRL B593 and C. sac- charoperbutylacetonicum N1-4 [29,30]. A flagellar/chemotaxis gene operon (CA_C2225-C2215) was previously defined in C. acetobutylicum [31]; a coun- terpart organized in exactly the same order (Cbei_4312- Figure 2 Single-nucleotide resolution-sequencing data revealed 4302) in C. beijerinckii 8052 was observed in this study important operon structures in C. beijerinckii 8052. (A) The (Additional file 3 Table S2). Similarly to C. acetobutylicum glycolytic genes (gap-pgk-tpi) operon structure (from sample 1). The upstream gene Cbei_0596 encoding a putative transcriptional 824, the transcriptional regulator sigF operon was also regulator protein was also linked with this operon. The intrinsic confirmed in C. beijerinckii 8052 with the sequencing data terminator predicted with TransTerm [18] is represented with a small that includes the forespore-specific sigma factor gene sigF, yellow bar. (B) The solventogenic genes are organized in the sol the anti-sigF factor gene spoIIAB, and the anti-anti-sigF operon in the order of ald-ctfA-ctfB-adc. Gene positions and directions factor gene spoIIAA (Additional file 3 Table S2). In Bacil- are shown beneath by arrows. The color of arrows indicates the COG lus subtilis and C. acetobutylicum, the class I heat shock functional class of genes as defined in Additional file 1 Figure S1. genes are organized in dnaKJ (organized in the order of hrcA-grpE-dnaK-dnaJ in C. acetobutylicum 824) and groESL (groES-groEL) operons [32,33]. The similar organi- dehydrogenase (hbd), crotonase (crt), and butyryl-CoA zation of dnaKJ and groESL operons for C. beijerinckii dehydrogenase (bcd) were identified as a bcs (butyryl- 8052 was also observed in this study (Additional file 3 CoA synthesis) operon in the order of crt-bcd-etfB Table S2). The fatty acid biosynthesis genes are organized (encoding electron transfer flavoprotein subunit beta)- in a single fab operon in C. acetobutylicum (CA_C3568- etfA (encoding electron transfer flavoprotein subunit C3580) [34], while in this study, the fab genes (Cbei_1067- alpha)-hbd [19,20]. A homologous operon structure 1077) in C. beijerinckii 8052 were observed to be was observed for C. beijerinckii 8052 (Cbei_0321-0325) organized in four operons, which is similar to those of transcribed in the same order but the opposite direc- tion (Additional file 3 Table S2). In addition, based on B. subtilis [35] (Additional file 3 Table S2). the sequencing data, similar pfk (phosphofructokinase)- While the current C. beijerinckii 8052 genome was pyk (pyruvate kinase), pta (phosphotransacetylase)-ack annotated based on bioinformatical predictions, the RNA- (acetate kinase), ptb (phosphate butyryltransferase)-buk Seq sequencing approach provides additional experimental gene operons were identified in C. beijerinckii 8052 as evidence for genome annotation. By comparing the those described in C. acetobutylicum 824 [21-23] sequence coverage data to the genome annotation, 20 (Additional file 3 Table S2). Among them, ptb-buk non-annotated regions were found to have significant operon was also verified by end-point RT-PCR experi- transcriptional activities (Figure 3A, see also the supple- ment as discussed below (Additional file 7 Figure S2 mental texts in Additional file 9). These regions may and Additional file 8 Table S5). represent potential new genes or regulatory RNAs. Twelve Nearly all the genes encoding solventogenic enzymes potential new genes were predicted in these regions using GeneMark [36]. Additional details about this test and the have been cloned and characterized in C. acetobutylicum Wang et al. BMC Genomics 2011, 12:479 Page 5 of 10 http://www.biomedcentral.com/1471-2164/12/479 more during the course of a batch fermentation. Besides, several genes encoding the subunits (such as soborse- specific subunits, lactose/cellobiose-specific subunits) of phosphotransferase system (PTS) related to the trans- port of sugars other than glucose were also among the list of silent genes. Since glucose was the only carbohy- drate used in this study, these enzymes were not induced during the fermentation process. Putative housekeeping genes (HKGs) Some genes have little variation in expression level through the entire fermentation process, and they are regarded as putative housekeeping genes (HKGs). For accurate gene expression quantification, normalization of gene expression against HKGs (endogenous control or reference gene) is generally required. In this study, 177 protein-coding genes were identified as putative HKGs with the lowest coefficient of variation(CV)inRPKM (see Methods section) values among all the sampling points (CV = standard deviation/mean; < 30% for listed genes in Additional file 12 Table S8) [37]. A COG func- tional group analysis by Fisher’s exact test found that COG functional category D (Cell cycle control, mitosis and meiosis, 10.3%), L (Replication, recombination and Figure 3 Mapping the high resolution RNA-Seq data back to repair, 9.2%) and E (Amino acid transport and metabo- the genome allows the test of current C. beijerinckii 8052 lism, 6.6%) were overrepresented in this list [37,38]. This genome annotation. (A) Sequence coverage in the Cbei_4206- 4207 region, showing transcription activity across a region not list of putative HKGs was generated based on the tran- included in the current genome annotation (from sample 2). (B) scription data obtained from the six samples under the Sequence coverage near Cbei_1787, with the sequence coverage certain batch fermentation conditions employed in this starts downstream the annotated translation start codon (from study. This list can be considered as a starting point for sample 5). Gene positions and directions are shown beneath by identifying HKGs for C. beijerinckii.However,whether arrows. The color of arrows indicates the COG functional class of genes as defined in Additional file 1 Figure S1. these genes are stably expressed under different experi- mental conditions requires further study. A prediction of the promoters for primary sigma fac- tors for all the putative HKGs was carried out using predicted genes were summarized in Additional file 10 BPROM (http://linux1.softberry.com/berry.phtml). In Table S6. total, 88 promoters along with their corresponding -10 In addition, 15 transcripts were identified with TSS box and -35 box sequences were predicted (Additional that are downstream of the current annotated transla- file 12 Table S8). tion start sites (Figure 3B, see also the supplemental texts in Additional file 9). For these regions, re-annota- Pseudogenes tion is needed since the current start codons may have Pseudogenes in bacteria genomes are non-functional been mis-annotated. copies of gene fragments usually created by random Putative silent genes mutations and chance events [39,40]. Although pseudo- Based on RNA-Seq sequence data, 78 protein-encoding genes are usually non-functional, it has been reported genes demonstrated no transcripts over all six sampling that quite a few of them can still go through the tran- time points, and these genes are likely silent (Additional scription process [41-43]. In this study, 26 out of the 82 file 11 Table S7). Thirty-one out of them were genes pseudogenes in C. beijerinckii 8052 genome were found encoding hypothetical proteins. In addition, half of the to have significant transcriptional activities over the genes encoding transposases (14 out of the total 29 in course of fermentation (Additional file 13 Table S9). the genome) were silent. Although transposases may However, although globally the 82 pseudogenes comprise have played important roles during the evolution of about 0.6% of the predicted CDS of the genome, only < C. beijerinckii,mostofthemare notasfunctionalany 0.1% of all the RNA-Seq reads mapped to these genes, Wang et al. BMC Genomics 2011, 12:479 Page 6 of 10 http://www.biomedcentral.com/1471-2164/12/479 indicating a significantly lower transcriptional activity when compared to the regular protein coding genes. In addition, six pseudogenes were found to be completely silent throughout the fermentation process (Additional file 13 Table S9). This is overrepresented among the silent genes when compared to the protein-coding genes based on a Fisher’s exact test (p = 0.0015). When pseudogenes share high sequence identity with other functional genes in the genome, it is usually on one hand very difficult to design probes to detect the transcription of pseudogenes with traditional methods, and on the other pseudogenes can lead to amplification bias during genetic studies of the functional genes with high sequence identity to pseudogenes [42]. Apparently such problems can be avoided with RNA-Seq method, which is one of the unprecedented advantages of RNA- Seq technique. For summarization and easy reference to the readers, a table (Additional file 14 Table S10) summarizing the var- ious supplementary information provided by RNA-Seq study to the current genome annotation was listed in the supplemental materials. End-point RT-PCR End-point RT-PCR was used to validate the operon struc- ture results obtained from RNA-Seq analysis [4]. One Figure 4 Correlations of RNA-Seq data with microarray and qRT- gene pair that is highly likely to be co-operonic with an PCR data. (A) Comparison of normalized sequence coverage depth intergenic distance of 7 bp was chosen as a positive con- from RNA-Seq and absolute microarray fluorescence intensity. Shown trol (Cbei_0341-0342). Ten other gene pairs with long is a plot of log -transformation of RPKM for sample 2 (at 4.5 h) and raw intergenic distances (71-440 bp) where there were high microarray intensities for an equivalent sample (at 5 h) from Shi and Blaschek [13]. (B) Real time qRT-PCR verification of RNA-Seq results for transcription levels were chosen. All the operon structures selected genes (from sample 4, see also Additional file 15 Table S11). determined by sequence data in the chosen gene pairs were confirmed by end-point RT-PCR results, indicating that RNA-Seq is a valid approach for operon structure sequencing data were compared. Similar differential identification (Additional file 7 Figure S2 and Additional expression patterns by microarray data from equivalent file 8 Table S5). Among them, the ptb-buk operon and the samples (5 h vs. 13 h) were observed (Figure 5). Although linkage between Cbei_0596 and Cbei_0597 (gap)asdis- only a limited number of genes were compared, the good cussed above were also confirmed. correlation between two methods further demonstrated the effectiveness of RNA-Seq approach. An additional Correlations of RNA-Seq data with microarray and qRT- advantage of RNA-Seq is that the sequence data can mea- PCR data sure expression levels for every gene without bias caused The RNA-Seq data obtained in this study were compared by sequence-specific differences in hybridization efficiency to the results obtained previously using a 500-gene set in microarray-based methods [4,44], and sequence data DNA microarray [13]. A small number of genes that did allow one to measure gene expression more accurately not allow for unambiguous mapping in RNA-Seq analysis with a much higher dynamic range as indicated in this were not included in the comparison. A good correlation study (Table 1) and other references [4,8,44,45]. was observed between the normalized coverage depth for The expression measurement with RNA-Seq data was sample 2 (collected at 4.5 h) and the raw microarray fluor- further validated using real time quantitative reverse escence intensities for an equivalent sample (collected at transcription PCR (qRT-PCR). Twenty-five genes were 5 h under the same growth condition) (Figure 4A). Specifi- selected for the test (Additional file 13 Table S9). A cally, the gene expression patterns of glycolytic genes high degree of correlation (R = 0.959) between the Cbei_0597-0600 and sporulation genes Cbei_2069-2071 threshold value (Ct) and the log -tranformation of from samples 2 (4.5 h) and 4 (14 h) measured by RPKM was observed (Figure 4B). Wang et al. BMC Genomics 2011, 12:479 Page 7 of 10 http://www.biomedcentral.com/1471-2164/12/479 Methods Bacterial culture and fermentation experiment Laboratory stocks of C. beijerinckii 8052 spores were stored in sterile H O at 4°C [46]. Spores were heat- shocked at 80°C for 10 min, followed by cooling on ice for 5 min. The heat-shocked spores were inoculated into tryptone-glucose-yeast extract (TGY) medium con- -1 -1 -1 taining 30 g L tryptone, 20 g L glucose, 10 g L -1 yeast extract and 1 g L L-cysteine at a 1% inoculum level. The TGY culture was incubated at 35 ± 1°C for 12-14 h in an anaerobic chamber under N :CO :H 2 2 2 (volume ratio of 85:10:5) atmosphere. Subsequently, actively growing culture was inoculated into a model -1 -1 Figure 5 Comparison of transcriptional profiles for sample 2 (4.5 solution containing 60 g L glucose, 1 g L yeast h, black line) and sample 4 (14 h, red line), normalized to the extract, and filter-sterilized P2 medium [47,48] in a Six- gene length and genome-wide total number of unambiguously fors bioreactor system (Infors AG, Bottmingen, Switzer- mapped reads for that sample.Shown arean ~5.4 kb segment land). Throughout the experiment, oxygen-free nitrogen surrounding the gap-pgk-tpi glycolytic gene operon (Cbei_0597-0600) and an ~1.2 kb segment surrounding the cotJC-cotJB (Cbei_2069-2071) was flushed through the broth to maintain anaerobiosis. sporulation gene operon. Specific genes are indicated with arrows, Temperature was controlled at 35 ± 1°C. A stirring at whose colors indicate the COG functional class of genes as defined in 50 rpm was employed for mixing. During the course of Additional file 1 Figure S1. Below each gene is the expression level fold fermentation, samples were collected for cell density change that was measured by microarray data for equivalent samples and product concentration measurements. For sequen- (5 h vs. 13 h in Shi and Blaschek [13]; microarray data for Cbei_2071 are not available). cing purpose, RNA samples were taken over the early exponential, late exponential and stationary phases (sample 1-6 at 2, 4.5, 10, 14, 17 and 26.5 h respectively as shown in Figure 1A). Conclusions A single-nucleotide resolution analysis of the C. beijer- Culture growth and fermentation products analysis inckii NCIMB 8052 transcriptome structure was con- Culture growth was measured by following optical density ducted using high-throughput RNA-Seq technology. (OD) in the fermentation broth at A using a BioMate 5 The transcription start sites and operon structures were UV-Vis Spectrophotometer (Thermo Fisher Scientific Inc., identified throughout the genome. The structure of Waltham, MA). ABE, acetic acid, and butyric acid concen- operons involved in metabolic pathways for acid and trations were quantified using gas chromatography (GC) solvent production in C. beijerinckii 8052 were con- system as previously described [49]. firmed. Important operons related to chemotaxis/moti- lity, transcriptional regulation, stress response and fatty RNA isolation, library construction and sequencing acids biosynthesis along with others were defined. In preparation for RNA isolation, 10 ml cultures were har- Twenty previously non-annotated regions were discov- vested at six time points, and centrifuged at 4,000 × g for ered with significant transcriptional activities and 15 10 min at 4°C. Total RNA was extracted from the cell pel- genes were identified whose translation start codons let using Trizol reagent based on manufacture’s protocol were likely mis-annotated. As a consequence, the accu- (Invitrogen, Carlsbad, CA) and further purified using racy of existing genome annotation was significantly RNeasy minikit (Qiagen, Valencia, CA). DNA was enhanced. Moreover, 78 silent genes and 177 putative removed using a DNA-free™ kit (Ambion Inc., Austin, housekeeping genes were identified based on normalized TX). RNA quality was assessed using a nanochip on a transcription measurement with the sequence data. model 2100 bioanalyzer (Agilent Technologies, Santa More than 30% of the pseudogenes were observed to Clara, CA). RNA concentration was determined with a have significant transcriptional activities during the fer- nanodrop (Biotek Instruments, Winooski, VT). Bacterial mentation process. Strong correlations exist between the 16S and 23S ribosomal RNAs were removed with a expression values derived from RNA-Seq analysis and MICROBExpress™ kit (Ambion Inc., Austin, TX). The microarray data or qRT-PCR results. Transcriptome enriched mRNA was converted to a RNA-Seq library structural profiling in this study provided important using the mRNA-Seq library construction kit (Illumina supplemental information on the accuracy of annotation Inc., San Diego, CA) following manufacturer’s protocols. of the C. beijerinckii genome. For samples 1 to 6, two samples were pooled and Wang et al. BMC Genomics 2011, 12:479 Page 8 of 10 http://www.biomedcentral.com/1471-2164/12/479 sequenced on one single lane of an eight-lane flow cell genes and primer sequences are listed in Additional file with the Genome Analyzer IIx system (Illumina Inc., San 15 Table S11. Diego, CA). However, sample 6 yielded a poor read quality following the first sequencing. In order to obtain enough RNA-Seq data accession number sequencing depth, sample 6 was sequenced again using The RNA-Seq sequencing data have been deposited in one single lane under otherwise identical conditions. The theNCBISequenceRead Archive(SRA) underthe derived sequence reads were 75 nt long. The overall error accession number SRA045799. rate of the control DNA was < 0.6%. The total number of reads generated from each library is summarized in Additional material Table 1. Additional file 1: Circular plots of the reads from all six samples mapping to the C. beijerinckii 8052 genome. The outermost and Sequence mapping and visualization second outermost circles represent CDS on the forward and reverse The generated 75-nt reads were mapped to the C. beijer- strands respectively, both of which are colored according to Clusters of inckii 8052 genome using MAQ, and those that did not Orthologous Groups (COG) functional classification assigned to C. beijerinckii 8052 annotation. The gold peak and shading area represents align uniquely to the genome were discarded [5,50]. The greater than the average and lower (in purple). The COG functional quality parameter (-q) used in MAQ pileup was set to 30. classes and corresponding color-coding are as follows (with RGB color Each base was assigned a value based on the number of model values in the parentheses): Class J, black (0 0 0); Class K, blue (0 0 255); Class L, brown (165 42 42); Class B, dark blue (0 0 139); Class D, mapped sequence coverage. The coverage plot files were chocolate (210 105 30); Class V, cyan (0 255 255); Class T, red (255 0 0); read into Artemis and visualized as sequence coverage Class M, yellow (255 255 0); Class N, dark green (0 100 0); Class U, grey profiles over the entire genome [5,9]. Transcription start (128 128 128); Class O, gold (255 215 0); Class C, orange (255 165 0); Class G, light grey (170 170 170); Class E, mid red (255 63 63); Class F, sites (TSS) were manually identified as described in Pas- pink (255 192 203); Class H, purple (128 0 128); Class I, violet (238 130 salacqua et al. [4]. The region between determined TSS 138); Class P, skyblue (135 206 235); Class Q, tan (210 180 140); Class R, andthe annotatedtranslation startsitewas definedas darkgrey (100 100 100); Class S, darkred (139 0 0); Not in COGs, green (0 255 0). If one gene belongs to more than one COG classes, the color for the 5’-untranslated region (5’-UTR). that gene was defined by the first class it belongs to as the above order. Additional file 2: The corresponding Gene ID and ortholog gene in Measurement of gene expression C. acetobutylicum ATCC 824 for the genes in C. beijerinckii 8052 The quantitative gene expression value, RPKM (reads/Kb/ mentioned in the paper (three-letter short names were used in the main text). Million), was calculated using custom Perl scripts by nor- Additional file 3: TSS and operon structure in C. beijerinckii 8052 malizing the sequence coverage over the gene length and chromosome. total unambiguously mapped reads in each library [7,37]. Additional file 4: Refined genome annotation (in GenBank format) based on the findings from this work and the current C. beijerinckii End-point RT-PCR for operon structure assessment 8052 genome annotation in NCBI. The GenBank file can also be downloaded from https://netfiles.uiuc.edu/blaschek/www/Wang- End-point RT-PCR was performed as described in Passa- BMC2011. lacqua et al. [4]. For each selected co-operonic gene pair, Additional file 5: Genes with 5’-UTR length ≥ 100 nt. four primers 1F, 1R, 2F and 2R were designed. Primers 1F Additional file 6: Putative riboswitches identified using RibEx and 1R amplify a region within gene 1, while 2F and 2R among the 5’-UTRs over 100 nt. amplify a region within gene 2. When a continuous tran- Additional file 7: Images of end-point RT-PCR products for 11 script exists, 1F and 2R amplify across the intergenic selected co-operonic gene pairs on a 1.5% agarose gel. For each gene pair (1-11, see details in Additional file 8, Table S5) as labeled on region between genes 1 and 2. Therefore, the size of the the top, the band on the left lane is PCR product using cDNA reverse- end-point RT-PCR product will be the same using either transcribed from RNA samples as template, on the right is PCR product cDNA or genomic DNA as templates. The reaction pro- using genomic DNA as template. ducts were visualized on 1.5% agarose gels stained with Additional file 8: Operon structure validation with end-point RT- PCR. ethidium bromide. Additional file 9: Supplemental texts. Additional file 10: Potential new genes predicted in non-annotated Real time qRT-PCR regions with significant transcriptional activities using GeneMark. Quantitative reverse transcription PCR (qRT-PCR) was Additional file 11: Putative silent genes over all the six sampling performed in order to validate the quantification of gene time points. expression level by RNA-Seq. Twenty-five genes were Additional file 12: Putative housekeeping genes (HKGs). chosen to represent a large range of RPKM values (from Additional file 13: Transcription of the pseudogenes. ~ 40 to > 18000). Triplicate reactions were performed Additional file 14: Summary of the supplementary information using Power SYBR green PCR master mix (Applied Bio- provided by RNA-Seq to the current C. beijerinckii 8052 genome annotation. systems, Carlsbad, CA) on an ABI Prism 7900 HT fast Additional file 15: Genes and primer sequences for qRT-PCR test. real-time PCR machine (Applied Biosystesms). Detected Wang et al. BMC Genomics 2011, 12:479 Page 9 of 10 http://www.biomedcentral.com/1471-2164/12/479 14. McGrath PT, Lee H, Zhang L, Iniesta AA, Hottes AK, Tan MH, Hillson NJ, Acknowledgements Hu P, Shapiro L, McAdams HH: High-throughput identification of This work was supported by USDA Research Initiative Award AG 2006-35504- transcription start sites, conserved promoter motifs and predicted 17419, and Illinois Council on Food and Agricultural Research (C-FAR) Grant regulons. Nat Biotechnol 2007, 25(5):584-592. 698 IDA CF06DS-01-03 to HPB. We thank Dr. Heng Li and Dr. Timothy 15. Abreu-Goodger C, Merino E: RibEx: a web server for locating riboswitches Perkins for their helpful inputs on the data analysis. and other conserved bacterial regulatory elements. Nucleic Acids Res 2005, 33:W690-W692. Author details 16. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Department of Agricultural and Biological Engineering, University of Illinois Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids at Urbana-Champaign, Urbana, IL 61801, USA. Institute for Genomic Biology, Res 2005, 33:D121-D124. University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. 17. Schreiber W, Dürre P: Differential expression of genes within the gap Department of Animal Sciences, University of Illinois at Urbana-Champaign, operon of Clostridium acetobutylicum. Anaerobe 2000, 6(5):291-297. Urbana, IL 61801, USA. Department of Food Science and Human Nutrition, 18. Ermolaeva MD, Khalak HG, White O, Smith HO, Salzberg SL: Prediction of University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Center transcription terminators in bacterial genomes. J Mol Biol 2000, 301(1):27-33. for Advanced Bioenergy Research (CABER), University of Illinois at Urbana- 19. Bennett GN, Rudolph FB: The central metabolic pathway from acetyl-CoA Champaign, Urbana, IL 61801, USA. to butyryl-CoA in Clostridium acetobutylicum. Fems Microbiol Rev 1995, 17(3):241-249. Authors’ contributions 20. Boynton ZL, Bennett GN, Rudolph FB: Cloning, sequencing, and YW, XL, HPB conceived and designed the study. YW and XL performed the expression of clustered genes encoding β-hydroxybutyryl-coenzyme A experiments. YW, XL and YM analyzed the RNA-Seq data. YW, XL and HPB (CoA) dehydrogenase, crotonase, and butyryl-CoA dehydrogenase from wrote the manuscript, with input from all authors. All authors discussed the Clostridium acetobutylicum ATCC 824. J Bacteriol 1996, 178(11):3015-3024. results, read and approved the final manuscript. 21. Belouski E, Watson DE, Bennett GN: Cloning, sequence, and expression of the phosphofructokinase gene of Clostridium acetobutylicum ATCC 824 Competing interests in Escherichia coli. Curr Microbiol 1998, 37(1):17-22. The authors declare that they have no competing interests. 22. Boynton ZL, Bennett GN, Rudolph FB: Cloning, sequencing, and expression of genes encoding phosphotransacetylase and acetate Received: 2 June 2011 Accepted: 30 September 2011 kinase from Clostridium acetobutylicum ATCC 824. Appl Environ Microb Published: 30 September 2011 1996, 62(8):2758-2766. 23. Walter KA, Nair RV, Cary JW, Bennett GN, Papoutsakis ET: Sequence and References arrangement of two genes of the butyrate-synthesis pathway of 1. Wu M, Wang M, Liu JH, Huo H: Assessment of potential life-cycle energy Clostridium acetobutylicum ATCC 824. Gene 1993, 134(1):107-111. and greenhouse gas emission effects from using corn-based butanol as 24. Düerre P: Formation of solvents in clostridia. In Handbook on clostridia. a transportation fuel. Biotechnol Progr 2008, 24(6):1204-1214. Edited by: Düerre P. London: CRC press; 2005:671-693. 2. Keis S, Bennett CF, Ward VK, Jones DT: Taxonomy and phylogeny of 25. Cornillot E, Nair RV, Papoutsakis ET, Soucaille P: The genes for butanol and industrial solvent-producing clostridia. Int J Syst Bacteriol 1995, acetone formation in Clostridium acetobutylicum ATCC 824 reside on a 45(4):693-705. large plasmid whose loss leads to degeneration of the strain. J Bacteriol 3. Ezeji T, Blaschek HP: Fermentation of dried distillers’ grains and solubles 1997, 179(17):5442-5447. (DDGS) hydrolysates to solvents and value-added products by 26. Gerischer U, Dürre P: Cloning, sequencing, and molecular analysis of the solventogenic clostridia. Bioresource Technol 2008, 99(12):5232-5242. acetoacetate decarboxylase gene region from Clostridium 4. Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, acetobutylicum. J Bacteriol 1990, 172(12):6907-6918. Bergman NH: Structure and complexity of a bacterial transcriptome. J 27. Petersen DJ, Cary JW, Vanderleyden J, Bennett GN: Sequence and Bacteriol 2009, 191(10):3203-3211. arrangement of genes encoding enzymes of the acetone-production 5. Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, pathway of Clostridium acetobutylicum ATCC 824. Gene 1993, He M, Croucher NJ, Pickard DJ, et al: A strand-specific RNA-Seq analysis of 123(1):93-97. the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 28. Chen CK, Blaschek HP: Effect of acetate on molecular and physiological 2009, 5(7). aspects of Clostridium beijerinckii NCIMB 8052 solvent production and 6. Wurtzel O, Sapra R, Chen F, Zhu YW, Simmons BA, Sorek R: A single-base strain degeneration. Appl Environ Microb 1999, 65(2):499-505. resolution map of an archaeal transcriptome. Genome Res 2010, 29. Rui H: The cloning of a putative regulatory gene and the sol region from 20(1):133-141. Clostridium beijerinckii. Master thesis Virginia Polytechnic Institute and State 7. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and University, Department of Biochemistry; 1999. quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 30. Kosaka T, Nakayama S, Nakaya K, Yoshino S, Furukawa K: Characterization 5(7):621-628. of the sol operon in butanol-hyperproducing Clostridium 8. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: saccharoperbutylacetonicum strain N1-4 and its degeneration The transcriptional landscape of the yeast genome defined by RNA mechanism. Biosci Biotechnol Biochem 2007, 71(1):58-68. sequencing. Science 2008, 320:1344-1349. 31. Paredes CJ, Rigoutsos I, Papoutsakis ET: Transcriptional organization of the 9. Carver T, Berriman M, Tivey A, Patel C, Bohme U, Barrell BG, Parkhill J, Clostridium acetobutylicum genome. Nucleic Acids Res 2004, Rajandream M-A: Artemis and ACT: viewing, annotating and comparing 32(6):1973-1981. sequences stored in a relational database. Bioinformatics 2008, 32. Bahl H, Muller H, Behrens S, Joseph H, Narberhaus F: Expression of heat 24(23):2672-2676. shock genes in Clostridium acetobutylicum. Fems Microbiol Rev 1995, 10. Carver T, Thomson N, Bleasby A, Berriman M, Parkhill J: DNAPlotter: circular 17(3):341-348. and linear interactive genome visualization. Bioinformatics 2009, 33. Tomas CA, Welker NE, Papoutsakis ET: Overexpression of groESL in 25(1):119-120. Clostridium acetobutylicum results in increased solvent production and 11. Steil L, Serrano M, Henriques AO, Volker U: Genome-wide analysis of tolerance, prolonged metabolism, and changes in the cell’s temporally regulated and compartment-specific gene expression in transcriptional program. Appl Environ Microb 2003, 69(8):4951-4965. sporulating cells of Bacillus subtilis. Microbiology-(UK) 2005, 151:399-420. 34. Tomas CA, Beamish J, Papoutsakis ET: Transcriptional analysis of butanol 12. Wang ST, Setlow B, Conlon EM, Lyon JL, Imamura D, Sato T, Setlow P, stress and tolerance in Clostridium acetobutylicum. J Bacteriol 2004, Losick R, Eichenberger P: The forespore line of gene expression in Bacillus 186(7):2006-2018. subtilis. J Mol Biol 2006, 358(1):16-37. 35. de Mendoza D, Schujman GE, Aguilar PS: Biosynthesis and function of 13. Shi Z, Blaschek HP: Transcriptional analysis of Clostridium beijerinckii membrane lipids. In Bacillus subtilis and its closest relatives: from genes to NCIMB 8052 and the hyper-butanol-producing mutant BA101 during the cells. Edited by: Sonenshein A, Hoch J, Losick R. Washington, D.C.: ASM shift from acidogenesis to solventogenesis. Appl Environ Microb 2008, Press; 2002:43-55. 74(24):7709-7714. Wang et al. BMC Genomics 2011, 12:479 Page 10 of 10 http://www.biomedcentral.com/1471-2164/12/479 36. Besemer J, Borodovsky M: GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 2005, 33(Suppl 2): W451-W454. 37. Severin A, Woody J, Bolon Y-T, Joseph B, Diers B, Farmer A, Muehlbauer G, Nelson R, Grant D, Specht J, et al: RNA-Seq Atlas of Glycine max: A guide to the soybean transcriptome. BMC Plant Biology 2010, 10(1):160. 38. Fisher RA: A preliminary linkage test with Agouti and Undulated mice; The fifth linkage-group. Heredity London 1949, 3:229-241. 39. Kuo CH, Ochman H: The extinction dynamics of bacterial pseudogenes. PLoS Genet 2010, 6(8). 40. Lerat E, Ochman H: Recognizing the pseudogenes in bacterial genomes. Nucleic Acids Res 2005, 33(10):3125-3132. 41. Hirotsune S, Yoshida N, Chen A, Garrett L, Sugiyama F, Takahashi S, Yagami K-i, Wynshaw-Boris A, Yoshiki A: An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene. Nature 2003, 423(6935):91-96. 42. Zheng DY, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu YT, Denoeud F, Antonarakis SE, Snyder M, et al: Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Res 2007, 17(6):839-851. 43. Svensson Ö, Arvestad L, Lagergren J: Genome-wide survey for biologically functional pseudogenes. PLoS Comput Biol 2006, 2(5):e46. 44. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008, 453(7199):1239-1243. 45. t Hoen PAC, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen R, de Menezes RX, Boer JM, van Ommen GJB, den Dunnen JT: Deep sequencing- based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res 2008, 36(21):e141. 46. Annous BA, Blaschek HP: Isolation and characterization of Clostridium acetobutylicum mutants with enhanced amylolytic activity. Appl Environ Microb 1991, 57(9):2544-2548. 47. Qureshi N, Lolas A, Biaschek HP: Soy molasses as fermentation substrate for production of butanol using Clostridium beijerinckii BA101. J Ind Microbiol Biot 2001, 26(5):290-295. 48. Jesse TW, Ezeji TC, Qureshi N, Blaschek HP: Production of butanol from starch-based waste packing peanuts and agricultural waste. J Ind Microbiol Biot 2002, 29(3):117-123. 49. Ezeji TC, Qureshi N, Blaschek HP: Production of acetone, butanol and ethanol by Clostridium beijerinckii BA101 and in situ recovery by gas stripping. World J Microb Biot 2003, 19(6):595-603. 50. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18(11):1851-1858. doi:10.1186/1471-2164-12-479 Cite this article as: Wang et al.: Single-nucleotide resolution analysis of the transcriptome structure of Clostridium beijerinckii NCIMB 8052 using RNA-Seq. BMC Genomics 2011 12:479. Submit your next manuscript to BioMed Central and take full advantage of: • Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Genomics Springer Journals

Single-nucleotide resolution analysis of the transcriptome structure of Clostridium beijerinckii NCIMB 8052 using RNA-Seq

BMC Genomics , Volume 12 (1) – Sep 30, 2011

Loading next page...
 
/lp/springer-journals/single-nucleotide-resolution-analysis-of-the-transcriptome-structure-H20opLutfB

References (107)

Publisher
Springer Journals
Copyright
Copyright © 2011 by Wang et al; licensee BioMed Central Ltd.
Subject
Life Sciences; Life Sciences, general; Microarrays; Proteomics; Animal Genetics and Genomics; Microbial Genetics and Genomics; Plant Genetics & Genomics
eISSN
1471-2164
DOI
10.1186/1471-2164-12-479
pmid
21962126
Publisher site
See Article on Publisher Site

Abstract

Background: Clostridium beijerinckii is an important solvent producing microorganism. The genome of C. beijerinckii NCIMB 8052 has recently been sequenced. Although transcriptome structure is important in order to reveal the functional and regulatory architecture of the genome, the physical structure of transcriptome for this strain, such as the operon linkages and transcript boundaries are not well understood. Results: In this study, we conducted a single-nucleotide resolution analysis of the C. beijerinckii NCIMB 8052 transcriptome using high-throughput RNA-Seq technology. We identified the transcription start sites and operon structure throughout the genome. We confirmed the structure of important gene operons involved in metabolic pathways for acid and solvent production in C. beijerinckii 8052, including pta-ack, ptb-buk, hbd-etfA-etfB-crt (bcs) and ald-ctfA-ctfB-adc (sol) operons; we also defined important operons related to chemotaxis/motility, transcriptional regulation, stress response and fatty acids biosynthesis along with others. We discovered 20 previously non-annotated regions with significant transcriptional activities and 15 genes whose translation start codons were likely mis-annotated. As a consequence, the accuracy of existing genome annotation was significantly enhanced. Furthermore, we identified 78 putative silent genes and 177 putative housekeeping genes based on normalized transcription measurement with the sequence data. We also observed that more than 30% of pseudogenes had significant transcriptional activities during the fermentation process. Strong correlations exist between the expression values derived from RNA-Seq analysis and microarray data or qRT-PCR results. Conclusions: Transcriptome structural profiling in this research provided important supplemental information on the accuracy of genome annotation, and revealed additional gene functions and regulation in C. beijerinckii. Background greater potential for biosolvent production than Solvents such as acetone, butanol and ethanol (ABE) pro- C. acetobutylicum. duced through microbial fermentation represent important The genome of C. beijerinckii NCIMB 8052 was potential renewable fuels and chemicals [1]. Clostridium sequenced by the DOE Joint Genome Institute in 2007 acetobutylicum and C. beijerinckii are among the promi- (JGI project ID 3634512). The genome size is 6.0 Mb, nent solvent-producing species. Although C. beijerinckii is which is 50% larger than that of C. acetobutylicum ATCC phenotypically similar to C. acetobutylicum, the saccharoly- 824. The C. beijerinckii 8052 solvent-producing genes are all located on the chromosome, as opposed to the location tic strains are phylogenetically distant from the amylolytic C. acetobutylicum ATCC 824 type strain [2]. C. beijerinckii of these genes on a mega-plasmid in C. acetobutylicum exhibits a broader substrate range and optimum pH for 824. Although transcriptome structural organization is growth and solvent production [3]; thus it may have important in order to reveal the functional and regulatory architecture of the genome, such annotation for the C. bei- jerinckii 8052 genome is far from complete. Current gen- ome annotation was made by computational analysis * Correspondence: [email protected] Institute for Genomic Biology, University of Illinois at Urbana-Champaign, based on gene prediction algorithms. Although this allows Urbana, IL 61801, USA for the determination of the complete set of gene loci and Full list of author information is available at the end of the article © 2011 Wang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Wang et al. BMC Genomics 2011, 12:479 Page 2 of 10 http://www.biomedcentral.com/1471-2164/12/479 intergenic regions of the genome, it does not provide suffi- cient information concerning the transcriptional organiza- tion on a genome-wide level. For example, transcriptome structures such as operon linkages and transcript bound- aries, etc. are not well understood and lack confirmation with experimental approaches. Next-generation high- throughput sequencing technology enabled us to obtain millions of cDNA reads simultaneously. In RNA-Seq ana- lysis, these reads can be assembled against the genome sequence, and expression values calculated based on the reads mapped to genes. Deep sequencing of cDNA pool allowed us to study the bacterial transcriptome structure and gene expression at an unprecedented resolution and depth [4-8]. In this study, we used RNA-Seq technology to investigate C. beijerinckii 8052 transcriptome structure at a single-nucleotide resolution. We identified the transcrip- tion start sites and operon structure throughout the genome. We confirmed the structure of important gene operons involved in metabolic pathways for acid and solvent production in C. beijerinckii 8052. We defined important operons related to chemotaxis/motility, tran- scriptional regulation, stress response and fatty acids bio- synthesis. We discovered 20 previously non-annotated Figure 1 Fermentation kinetics of C. beijerinckii 8052 batch regions with significant transcriptional activities and 15 culture. (A) Cell growth curve with sampling points for RNA-Seq genes whose translation start codons were likely mis- indicated by red arrows. (B) Solvents and acids production over annotated. The results from this study significantly time with the onset of solvent production indicated by a green enhanced the accuracy of current genome annotation, and arrow. provided an essential reference point for other researchers working in related fields. The sequence coverage per base was plotted and visua- Results and discussion lized using the genome browser Artemis and DNAPlotter Growth kinetics and ABE fermentation [5,9,10] (Additional file 1 Figure S1). The sequence cover- C. beijerinckii 8052 grew rapidly with a very short lag age of the genes (based on number of mapped reads) for phase in the batch fermentation with P2 medium supple- different samples was observed to be different in the over- mented with yeast extract and glucose (Figure 1A). The all sequence coverage profiles, which might be a reflection fermentation experienced a shift from acidogenesis to of the physiological states as the cell transited from one solventogenesis at approximately 4.5-8 h. Formation of growth phase to another. For example, sample 2 repre- solvents was detected at between 4.5-6.5 h after the start sented a time period that was at the beginning of the tran- of fermentation, which corresponded to the late expo- sition from acidogenesis to solventogenesis. In this sample, nential growth phase. Butanol continued to increase genes encoding butyrate kinase (buk, 233080-234147 nt; throughout the stationary phase (Figure 1B). Samples for Gene ID is listed in Additional file 2 Table S1, similarly RNA isolation were collected at time points during acido- hereinafter), acetyl-CoA acetyltransferase (thl,499121- genesis (2 and 4.5 h) and solventogenesis (after 6.5 h) 500302 nt), glyceraldehyde-3-phosphate dehydrogenase (Figure 1A). The combined information from these sam- (gap, 710763-711764 nt), fructose-bisphosphate aldolase ples collected at different growth phases is representative (fba, 2199060-2199926 nt), aldehyde dehydrogenase (ald, for transcriptome structural analysis. 4399026-4400432 nt), acetoacetate decarboxylase (adc, 4401916-4402656 nt) were all actively expressed (high Transcriptome definition and structure peaks in Additional file 1 Figure S1). These genes are The 75-nt cDNA reads were mapped to the C. beijerinckii involved in acid and butyryl-CoA formation, solventogen- 8052 genome. Only those reads that mapped unambigu- esis and glycolysis. The time frame of sample 4 was consis- ously to the genome were used for further analyses tent with the transition to non-active growth and (Table 1). Collectively, around 75.5% of the C. beijerinckii clostridial spores formation. In this sample, genes encod- 8052 genome was transcribed in at least one sample, even ing stage V sporulation protein T (spoVT, 115288-115836 though the fraction in each single sample was much less. nt), spore coat protein cotJC (cotJC, 2411617-2412183 nt), Wang et al. BMC Genomics 2011, 12:479 Page 3 of 10 http://www.biomedcentral.com/1471-2164/12/479 Table 1 Summary of RNA-Seq sequencing and data analysis results Sample 1 2 3 4 5 6* Total Time collected (h) 2 4.5 10 14 17 26.5 Total No. of reads 8988633 9457480 8011531 8448929 10363535 38574501 83844609 No. of reads mapped 8473125 9037616 7514804 7730815 9842491 37676913 80275764 No. of reads unambiguously mapped 6776544 7274568 6096405 6189652 8096169 35027722 69461060 No. of bases unambiguously mapped 508240800 545592600 457230375 464223900 607212675 2627079150 5209579500 Percentage of genome represented 38.71 36.59 43.84 40.60 41.34 50.67 75.46 No. of genes with detectable expression** 4219 4082 4496 4453 4487 4750 5024 -1 -2 -2 -2 -1 -2 -2 Range in expression levels (RPKM) 3.2 × 10 5.8 × 10 4.5 × 10 7.0 × 10 1.0 × 10 3.6 × 10 3.6 × 10 4 4 4 4 4 4 4 ~ 2.5 × 10 ~ 6.0 × 10 ~ 2.5 × 10 ~ 9.0 × 10 ~ 8.6 × 10 ~ 9.8 × 10 ~ 9.8 × 10 *Sample 6 was sequenced with 1 sample/lane, while samples 1-5 with 2 samples/lane. See Methods section for more details; **Pseudogenes included. spore coat peptide assembly protein cotJB (cotJB, 2412190- Cbei_0465, Cbei_1518 and Cbei_3542, respectively. In 2412453 nt), small acid-soluble spore protein, sspA (sspA, addition, a Known Riboswitch-like Element (flavin mono- 3596975-3597184 nt), small acid-soluble spore protein, nucleotide (FMN) riboswitch) was found in the 5’-UTR sspC2 (sspC2, 3814296-3814505 nt) were actively of Cbei_1224. More details about the identified putative expressed among others (Additional file 1 Figure S1). riboswitches were described in Additional file 6 Table S4. These genes are involved in the regulation of spore coat Previously, an operon containing genes encoding gly- assembly and other sporulation-related processes [11,12]. ceraldehyde-3-phosphate dehydrogenase (gap), phospho- These results were in good agreement with our previous glycerate kinase (pgk) and triosephosphate isomerase observation using a 500-gene set DNA microarray [13]. A (tpi) was revealed by transcriptional analyses for C. acet- more detailed genome-wide transcriptional analysis of obutylicum, while the gene encoding phosphoglycerate C. beijerinckii 8052 during the shift from acidogenesis to mutase (pgm) was identified as not a member of this operon [17]. In this study, employing the single-nucleo- solventogenesis is currently underway in our lab. tide resolution sequence data, a similar gap-pgk-tpi The distinct transcript boundaries and multigene operon structure in a genome are essential for elucidat- operon was observed, upstream and apart from pgm for ing a genome’s regulatory mechanisms. Through tran- C. beijerinckii 8052. In addition, it is very interesting scriptome profiling analysis, the transcript boundaries that the upstream gene Cbei_0596 was also found to be and operon structures were identified across the entire linked with this operon (Figure 2A), which was validated C. beijerinckii 8052 genome (Additional file 3 Table S2). by end-point RT-PCR as discussed below (Additional Of all the > 5000 genes, 2151 genes (42.2%) were deter- file 7 Figure S2 and Additional file 8 Table S5). The mined as being parts of multi-gene operon structures. upstream gene (CA_C0708) of gap in C. acetobutylicum The largest number of genes in a single operon was 32 824 genome is conjectured to be the counterpart of (ribosomal proteins operon, Cbei_0150-0181, 192128- Cbei_0596 in C. beijerinckii 8052, since they both 207934 nt). A refined genome annotation file (in Gen- encode putative transcriptional regulator proteins and Bank format) was generated (Additional file 4) based on share 64% sequence identity. However, the intergenic the findings from this work and the current C. beijer- distance between CA_C0708 and CA_C0709 in C. aceto- inckii 8052 genome annotation in NCBI. The GenBank butylicum 824 genome is 146 nt, while that between file can also be downloaded from https://netfiles.uiuc. Cbei_0596 and Cbei_0597 in C. beijerinckii 8052 gen- edu/blaschek/www/Wang-BMC2011. ome is only 71 nt. Although Cbei_0596-0599 are orga- In addition, 5’-untranslated regions (5’-UTRs) were nized in the same operon, the sequence coverage depth determined at the same time. Similar to other bacteria downstream of gap was unexpectedly lower than that of [4,14], most of the identified 5’-UTRswereveryshort. the anterior (Figure 2A). An intrinsic terminator was Among all the 1605 5’-UTRs that were estimated with predicted adjacent and downstream of gap (711818- high confidence (that is, valid 5’-UTRs identified in the 6 711852 nt) using the bacterial genome transcription samples with a ≤ 50 nt window), 1262 were ≤ 50 nt, and terminator prediction software TransTerm [18]. The ter- only 65 were ≥ 100 nt (Additional file 5 Table S3). RibEx minator may have played a key role in attenuating the [15] was used to test for the putative regulatory elements transcription of the downstream genes in this operon. among the 5’-UTRs over 100 bp. Three Predicted Ribos- Further experiments need to be carried out to confirm witch-like Elements (do not belong to any Rfam [16]; yet the activity and function of this element. present very significant associations with either a COG The butyryl-CoA formation related genes in C. aceto- or a KEGG pathway) were found in the 5’-UTRs of butylicum 824 encoding 3-hydroxybutyryl-CoA Wang et al. BMC Genomics 2011, 12:479 Page 4 of 10 http://www.biomedcentral.com/1471-2164/12/479 [24]. In C. acetobutylicum 824, the genes adhE (acetal- dehyde-CoA/alcohol dehydrogenase), ctfA (acetoacetyl- CoA: acetate/butyrate-CoA transferase subunit A) and ctfB (acetoacetyl-CoA: acetate/butyrate-CoA transferase subunit B) are located in the sol (solvent formation) operon on the mega-plasmid pSOL1 whose loss leads to the degeneration of the strain [25], while adc (acetoace- tate decarboxylase) is organized in a monocistronic operon in the opposite direction [26,27]. In this study, a sol operon organized in the order of ald-ctfA-ctfB-adc was observed (Figure 2B and Additional file 3 Table S2). Previously, Chen and Blaschek (1999) speculated that the ald, ctfA, ctfB and adc genes were located in an operon following a Northern hybridization analysis of C. beijerinckii 8052 total RNA [28]. With direct sequen- cing data, this study successfully confirmed the above hypothesis, and identified the transcriptional start sites (TSS) upstream of ald gene. This solventogenic gene arrangement in C. beijerinckii 8052 is consistent with that observed in C. beijerinckii NRRL B593 and C. sac- charoperbutylacetonicum N1-4 [29,30]. A flagellar/chemotaxis gene operon (CA_C2225-C2215) was previously defined in C. acetobutylicum [31]; a coun- terpart organized in exactly the same order (Cbei_4312- Figure 2 Single-nucleotide resolution-sequencing data revealed 4302) in C. beijerinckii 8052 was observed in this study important operon structures in C. beijerinckii 8052. (A) The (Additional file 3 Table S2). Similarly to C. acetobutylicum glycolytic genes (gap-pgk-tpi) operon structure (from sample 1). The upstream gene Cbei_0596 encoding a putative transcriptional 824, the transcriptional regulator sigF operon was also regulator protein was also linked with this operon. The intrinsic confirmed in C. beijerinckii 8052 with the sequencing data terminator predicted with TransTerm [18] is represented with a small that includes the forespore-specific sigma factor gene sigF, yellow bar. (B) The solventogenic genes are organized in the sol the anti-sigF factor gene spoIIAB, and the anti-anti-sigF operon in the order of ald-ctfA-ctfB-adc. Gene positions and directions factor gene spoIIAA (Additional file 3 Table S2). In Bacil- are shown beneath by arrows. The color of arrows indicates the COG lus subtilis and C. acetobutylicum, the class I heat shock functional class of genes as defined in Additional file 1 Figure S1. genes are organized in dnaKJ (organized in the order of hrcA-grpE-dnaK-dnaJ in C. acetobutylicum 824) and groESL (groES-groEL) operons [32,33]. The similar organi- dehydrogenase (hbd), crotonase (crt), and butyryl-CoA zation of dnaKJ and groESL operons for C. beijerinckii dehydrogenase (bcd) were identified as a bcs (butyryl- 8052 was also observed in this study (Additional file 3 CoA synthesis) operon in the order of crt-bcd-etfB Table S2). The fatty acid biosynthesis genes are organized (encoding electron transfer flavoprotein subunit beta)- in a single fab operon in C. acetobutylicum (CA_C3568- etfA (encoding electron transfer flavoprotein subunit C3580) [34], while in this study, the fab genes (Cbei_1067- alpha)-hbd [19,20]. A homologous operon structure 1077) in C. beijerinckii 8052 were observed to be was observed for C. beijerinckii 8052 (Cbei_0321-0325) organized in four operons, which is similar to those of transcribed in the same order but the opposite direc- tion (Additional file 3 Table S2). In addition, based on B. subtilis [35] (Additional file 3 Table S2). the sequencing data, similar pfk (phosphofructokinase)- While the current C. beijerinckii 8052 genome was pyk (pyruvate kinase), pta (phosphotransacetylase)-ack annotated based on bioinformatical predictions, the RNA- (acetate kinase), ptb (phosphate butyryltransferase)-buk Seq sequencing approach provides additional experimental gene operons were identified in C. beijerinckii 8052 as evidence for genome annotation. By comparing the those described in C. acetobutylicum 824 [21-23] sequence coverage data to the genome annotation, 20 (Additional file 3 Table S2). Among them, ptb-buk non-annotated regions were found to have significant operon was also verified by end-point RT-PCR experi- transcriptional activities (Figure 3A, see also the supple- ment as discussed below (Additional file 7 Figure S2 mental texts in Additional file 9). These regions may and Additional file 8 Table S5). represent potential new genes or regulatory RNAs. Twelve Nearly all the genes encoding solventogenic enzymes potential new genes were predicted in these regions using GeneMark [36]. Additional details about this test and the have been cloned and characterized in C. acetobutylicum Wang et al. BMC Genomics 2011, 12:479 Page 5 of 10 http://www.biomedcentral.com/1471-2164/12/479 more during the course of a batch fermentation. Besides, several genes encoding the subunits (such as soborse- specific subunits, lactose/cellobiose-specific subunits) of phosphotransferase system (PTS) related to the trans- port of sugars other than glucose were also among the list of silent genes. Since glucose was the only carbohy- drate used in this study, these enzymes were not induced during the fermentation process. Putative housekeeping genes (HKGs) Some genes have little variation in expression level through the entire fermentation process, and they are regarded as putative housekeeping genes (HKGs). For accurate gene expression quantification, normalization of gene expression against HKGs (endogenous control or reference gene) is generally required. In this study, 177 protein-coding genes were identified as putative HKGs with the lowest coefficient of variation(CV)inRPKM (see Methods section) values among all the sampling points (CV = standard deviation/mean; < 30% for listed genes in Additional file 12 Table S8) [37]. A COG func- tional group analysis by Fisher’s exact test found that COG functional category D (Cell cycle control, mitosis and meiosis, 10.3%), L (Replication, recombination and Figure 3 Mapping the high resolution RNA-Seq data back to repair, 9.2%) and E (Amino acid transport and metabo- the genome allows the test of current C. beijerinckii 8052 lism, 6.6%) were overrepresented in this list [37,38]. This genome annotation. (A) Sequence coverage in the Cbei_4206- 4207 region, showing transcription activity across a region not list of putative HKGs was generated based on the tran- included in the current genome annotation (from sample 2). (B) scription data obtained from the six samples under the Sequence coverage near Cbei_1787, with the sequence coverage certain batch fermentation conditions employed in this starts downstream the annotated translation start codon (from study. This list can be considered as a starting point for sample 5). Gene positions and directions are shown beneath by identifying HKGs for C. beijerinckii.However,whether arrows. The color of arrows indicates the COG functional class of genes as defined in Additional file 1 Figure S1. these genes are stably expressed under different experi- mental conditions requires further study. A prediction of the promoters for primary sigma fac- tors for all the putative HKGs was carried out using predicted genes were summarized in Additional file 10 BPROM (http://linux1.softberry.com/berry.phtml). In Table S6. total, 88 promoters along with their corresponding -10 In addition, 15 transcripts were identified with TSS box and -35 box sequences were predicted (Additional that are downstream of the current annotated transla- file 12 Table S8). tion start sites (Figure 3B, see also the supplemental texts in Additional file 9). For these regions, re-annota- Pseudogenes tion is needed since the current start codons may have Pseudogenes in bacteria genomes are non-functional been mis-annotated. copies of gene fragments usually created by random Putative silent genes mutations and chance events [39,40]. Although pseudo- Based on RNA-Seq sequence data, 78 protein-encoding genes are usually non-functional, it has been reported genes demonstrated no transcripts over all six sampling that quite a few of them can still go through the tran- time points, and these genes are likely silent (Additional scription process [41-43]. In this study, 26 out of the 82 file 11 Table S7). Thirty-one out of them were genes pseudogenes in C. beijerinckii 8052 genome were found encoding hypothetical proteins. In addition, half of the to have significant transcriptional activities over the genes encoding transposases (14 out of the total 29 in course of fermentation (Additional file 13 Table S9). the genome) were silent. Although transposases may However, although globally the 82 pseudogenes comprise have played important roles during the evolution of about 0.6% of the predicted CDS of the genome, only < C. beijerinckii,mostofthemare notasfunctionalany 0.1% of all the RNA-Seq reads mapped to these genes, Wang et al. BMC Genomics 2011, 12:479 Page 6 of 10 http://www.biomedcentral.com/1471-2164/12/479 indicating a significantly lower transcriptional activity when compared to the regular protein coding genes. In addition, six pseudogenes were found to be completely silent throughout the fermentation process (Additional file 13 Table S9). This is overrepresented among the silent genes when compared to the protein-coding genes based on a Fisher’s exact test (p = 0.0015). When pseudogenes share high sequence identity with other functional genes in the genome, it is usually on one hand very difficult to design probes to detect the transcription of pseudogenes with traditional methods, and on the other pseudogenes can lead to amplification bias during genetic studies of the functional genes with high sequence identity to pseudogenes [42]. Apparently such problems can be avoided with RNA-Seq method, which is one of the unprecedented advantages of RNA- Seq technique. For summarization and easy reference to the readers, a table (Additional file 14 Table S10) summarizing the var- ious supplementary information provided by RNA-Seq study to the current genome annotation was listed in the supplemental materials. End-point RT-PCR End-point RT-PCR was used to validate the operon struc- ture results obtained from RNA-Seq analysis [4]. One Figure 4 Correlations of RNA-Seq data with microarray and qRT- gene pair that is highly likely to be co-operonic with an PCR data. (A) Comparison of normalized sequence coverage depth intergenic distance of 7 bp was chosen as a positive con- from RNA-Seq and absolute microarray fluorescence intensity. Shown trol (Cbei_0341-0342). Ten other gene pairs with long is a plot of log -transformation of RPKM for sample 2 (at 4.5 h) and raw intergenic distances (71-440 bp) where there were high microarray intensities for an equivalent sample (at 5 h) from Shi and Blaschek [13]. (B) Real time qRT-PCR verification of RNA-Seq results for transcription levels were chosen. All the operon structures selected genes (from sample 4, see also Additional file 15 Table S11). determined by sequence data in the chosen gene pairs were confirmed by end-point RT-PCR results, indicating that RNA-Seq is a valid approach for operon structure sequencing data were compared. Similar differential identification (Additional file 7 Figure S2 and Additional expression patterns by microarray data from equivalent file 8 Table S5). Among them, the ptb-buk operon and the samples (5 h vs. 13 h) were observed (Figure 5). Although linkage between Cbei_0596 and Cbei_0597 (gap)asdis- only a limited number of genes were compared, the good cussed above were also confirmed. correlation between two methods further demonstrated the effectiveness of RNA-Seq approach. An additional Correlations of RNA-Seq data with microarray and qRT- advantage of RNA-Seq is that the sequence data can mea- PCR data sure expression levels for every gene without bias caused The RNA-Seq data obtained in this study were compared by sequence-specific differences in hybridization efficiency to the results obtained previously using a 500-gene set in microarray-based methods [4,44], and sequence data DNA microarray [13]. A small number of genes that did allow one to measure gene expression more accurately not allow for unambiguous mapping in RNA-Seq analysis with a much higher dynamic range as indicated in this were not included in the comparison. A good correlation study (Table 1) and other references [4,8,44,45]. was observed between the normalized coverage depth for The expression measurement with RNA-Seq data was sample 2 (collected at 4.5 h) and the raw microarray fluor- further validated using real time quantitative reverse escence intensities for an equivalent sample (collected at transcription PCR (qRT-PCR). Twenty-five genes were 5 h under the same growth condition) (Figure 4A). Specifi- selected for the test (Additional file 13 Table S9). A cally, the gene expression patterns of glycolytic genes high degree of correlation (R = 0.959) between the Cbei_0597-0600 and sporulation genes Cbei_2069-2071 threshold value (Ct) and the log -tranformation of from samples 2 (4.5 h) and 4 (14 h) measured by RPKM was observed (Figure 4B). Wang et al. BMC Genomics 2011, 12:479 Page 7 of 10 http://www.biomedcentral.com/1471-2164/12/479 Methods Bacterial culture and fermentation experiment Laboratory stocks of C. beijerinckii 8052 spores were stored in sterile H O at 4°C [46]. Spores were heat- shocked at 80°C for 10 min, followed by cooling on ice for 5 min. The heat-shocked spores were inoculated into tryptone-glucose-yeast extract (TGY) medium con- -1 -1 -1 taining 30 g L tryptone, 20 g L glucose, 10 g L -1 yeast extract and 1 g L L-cysteine at a 1% inoculum level. The TGY culture was incubated at 35 ± 1°C for 12-14 h in an anaerobic chamber under N :CO :H 2 2 2 (volume ratio of 85:10:5) atmosphere. Subsequently, actively growing culture was inoculated into a model -1 -1 Figure 5 Comparison of transcriptional profiles for sample 2 (4.5 solution containing 60 g L glucose, 1 g L yeast h, black line) and sample 4 (14 h, red line), normalized to the extract, and filter-sterilized P2 medium [47,48] in a Six- gene length and genome-wide total number of unambiguously fors bioreactor system (Infors AG, Bottmingen, Switzer- mapped reads for that sample.Shown arean ~5.4 kb segment land). Throughout the experiment, oxygen-free nitrogen surrounding the gap-pgk-tpi glycolytic gene operon (Cbei_0597-0600) and an ~1.2 kb segment surrounding the cotJC-cotJB (Cbei_2069-2071) was flushed through the broth to maintain anaerobiosis. sporulation gene operon. Specific genes are indicated with arrows, Temperature was controlled at 35 ± 1°C. A stirring at whose colors indicate the COG functional class of genes as defined in 50 rpm was employed for mixing. During the course of Additional file 1 Figure S1. Below each gene is the expression level fold fermentation, samples were collected for cell density change that was measured by microarray data for equivalent samples and product concentration measurements. For sequen- (5 h vs. 13 h in Shi and Blaschek [13]; microarray data for Cbei_2071 are not available). cing purpose, RNA samples were taken over the early exponential, late exponential and stationary phases (sample 1-6 at 2, 4.5, 10, 14, 17 and 26.5 h respectively as shown in Figure 1A). Conclusions A single-nucleotide resolution analysis of the C. beijer- Culture growth and fermentation products analysis inckii NCIMB 8052 transcriptome structure was con- Culture growth was measured by following optical density ducted using high-throughput RNA-Seq technology. (OD) in the fermentation broth at A using a BioMate 5 The transcription start sites and operon structures were UV-Vis Spectrophotometer (Thermo Fisher Scientific Inc., identified throughout the genome. The structure of Waltham, MA). ABE, acetic acid, and butyric acid concen- operons involved in metabolic pathways for acid and trations were quantified using gas chromatography (GC) solvent production in C. beijerinckii 8052 were con- system as previously described [49]. firmed. Important operons related to chemotaxis/moti- lity, transcriptional regulation, stress response and fatty RNA isolation, library construction and sequencing acids biosynthesis along with others were defined. In preparation for RNA isolation, 10 ml cultures were har- Twenty previously non-annotated regions were discov- vested at six time points, and centrifuged at 4,000 × g for ered with significant transcriptional activities and 15 10 min at 4°C. Total RNA was extracted from the cell pel- genes were identified whose translation start codons let using Trizol reagent based on manufacture’s protocol were likely mis-annotated. As a consequence, the accu- (Invitrogen, Carlsbad, CA) and further purified using racy of existing genome annotation was significantly RNeasy minikit (Qiagen, Valencia, CA). DNA was enhanced. Moreover, 78 silent genes and 177 putative removed using a DNA-free™ kit (Ambion Inc., Austin, housekeeping genes were identified based on normalized TX). RNA quality was assessed using a nanochip on a transcription measurement with the sequence data. model 2100 bioanalyzer (Agilent Technologies, Santa More than 30% of the pseudogenes were observed to Clara, CA). RNA concentration was determined with a have significant transcriptional activities during the fer- nanodrop (Biotek Instruments, Winooski, VT). Bacterial mentation process. Strong correlations exist between the 16S and 23S ribosomal RNAs were removed with a expression values derived from RNA-Seq analysis and MICROBExpress™ kit (Ambion Inc., Austin, TX). The microarray data or qRT-PCR results. Transcriptome enriched mRNA was converted to a RNA-Seq library structural profiling in this study provided important using the mRNA-Seq library construction kit (Illumina supplemental information on the accuracy of annotation Inc., San Diego, CA) following manufacturer’s protocols. of the C. beijerinckii genome. For samples 1 to 6, two samples were pooled and Wang et al. BMC Genomics 2011, 12:479 Page 8 of 10 http://www.biomedcentral.com/1471-2164/12/479 sequenced on one single lane of an eight-lane flow cell genes and primer sequences are listed in Additional file with the Genome Analyzer IIx system (Illumina Inc., San 15 Table S11. Diego, CA). However, sample 6 yielded a poor read quality following the first sequencing. In order to obtain enough RNA-Seq data accession number sequencing depth, sample 6 was sequenced again using The RNA-Seq sequencing data have been deposited in one single lane under otherwise identical conditions. The theNCBISequenceRead Archive(SRA) underthe derived sequence reads were 75 nt long. The overall error accession number SRA045799. rate of the control DNA was < 0.6%. The total number of reads generated from each library is summarized in Additional material Table 1. Additional file 1: Circular plots of the reads from all six samples mapping to the C. beijerinckii 8052 genome. The outermost and Sequence mapping and visualization second outermost circles represent CDS on the forward and reverse The generated 75-nt reads were mapped to the C. beijer- strands respectively, both of which are colored according to Clusters of inckii 8052 genome using MAQ, and those that did not Orthologous Groups (COG) functional classification assigned to C. beijerinckii 8052 annotation. The gold peak and shading area represents align uniquely to the genome were discarded [5,50]. The greater than the average and lower (in purple). The COG functional quality parameter (-q) used in MAQ pileup was set to 30. classes and corresponding color-coding are as follows (with RGB color Each base was assigned a value based on the number of model values in the parentheses): Class J, black (0 0 0); Class K, blue (0 0 255); Class L, brown (165 42 42); Class B, dark blue (0 0 139); Class D, mapped sequence coverage. The coverage plot files were chocolate (210 105 30); Class V, cyan (0 255 255); Class T, red (255 0 0); read into Artemis and visualized as sequence coverage Class M, yellow (255 255 0); Class N, dark green (0 100 0); Class U, grey profiles over the entire genome [5,9]. Transcription start (128 128 128); Class O, gold (255 215 0); Class C, orange (255 165 0); Class G, light grey (170 170 170); Class E, mid red (255 63 63); Class F, sites (TSS) were manually identified as described in Pas- pink (255 192 203); Class H, purple (128 0 128); Class I, violet (238 130 salacqua et al. [4]. The region between determined TSS 138); Class P, skyblue (135 206 235); Class Q, tan (210 180 140); Class R, andthe annotatedtranslation startsitewas definedas darkgrey (100 100 100); Class S, darkred (139 0 0); Not in COGs, green (0 255 0). If one gene belongs to more than one COG classes, the color for the 5’-untranslated region (5’-UTR). that gene was defined by the first class it belongs to as the above order. Additional file 2: The corresponding Gene ID and ortholog gene in Measurement of gene expression C. acetobutylicum ATCC 824 for the genes in C. beijerinckii 8052 The quantitative gene expression value, RPKM (reads/Kb/ mentioned in the paper (three-letter short names were used in the main text). Million), was calculated using custom Perl scripts by nor- Additional file 3: TSS and operon structure in C. beijerinckii 8052 malizing the sequence coverage over the gene length and chromosome. total unambiguously mapped reads in each library [7,37]. Additional file 4: Refined genome annotation (in GenBank format) based on the findings from this work and the current C. beijerinckii End-point RT-PCR for operon structure assessment 8052 genome annotation in NCBI. The GenBank file can also be downloaded from https://netfiles.uiuc.edu/blaschek/www/Wang- End-point RT-PCR was performed as described in Passa- BMC2011. lacqua et al. [4]. For each selected co-operonic gene pair, Additional file 5: Genes with 5’-UTR length ≥ 100 nt. four primers 1F, 1R, 2F and 2R were designed. Primers 1F Additional file 6: Putative riboswitches identified using RibEx and 1R amplify a region within gene 1, while 2F and 2R among the 5’-UTRs over 100 nt. amplify a region within gene 2. When a continuous tran- Additional file 7: Images of end-point RT-PCR products for 11 script exists, 1F and 2R amplify across the intergenic selected co-operonic gene pairs on a 1.5% agarose gel. For each gene pair (1-11, see details in Additional file 8, Table S5) as labeled on region between genes 1 and 2. Therefore, the size of the the top, the band on the left lane is PCR product using cDNA reverse- end-point RT-PCR product will be the same using either transcribed from RNA samples as template, on the right is PCR product cDNA or genomic DNA as templates. The reaction pro- using genomic DNA as template. ducts were visualized on 1.5% agarose gels stained with Additional file 8: Operon structure validation with end-point RT- PCR. ethidium bromide. Additional file 9: Supplemental texts. Additional file 10: Potential new genes predicted in non-annotated Real time qRT-PCR regions with significant transcriptional activities using GeneMark. Quantitative reverse transcription PCR (qRT-PCR) was Additional file 11: Putative silent genes over all the six sampling performed in order to validate the quantification of gene time points. expression level by RNA-Seq. Twenty-five genes were Additional file 12: Putative housekeeping genes (HKGs). chosen to represent a large range of RPKM values (from Additional file 13: Transcription of the pseudogenes. ~ 40 to > 18000). Triplicate reactions were performed Additional file 14: Summary of the supplementary information using Power SYBR green PCR master mix (Applied Bio- provided by RNA-Seq to the current C. beijerinckii 8052 genome annotation. systems, Carlsbad, CA) on an ABI Prism 7900 HT fast Additional file 15: Genes and primer sequences for qRT-PCR test. real-time PCR machine (Applied Biosystesms). Detected Wang et al. BMC Genomics 2011, 12:479 Page 9 of 10 http://www.biomedcentral.com/1471-2164/12/479 14. McGrath PT, Lee H, Zhang L, Iniesta AA, Hottes AK, Tan MH, Hillson NJ, Acknowledgements Hu P, Shapiro L, McAdams HH: High-throughput identification of This work was supported by USDA Research Initiative Award AG 2006-35504- transcription start sites, conserved promoter motifs and predicted 17419, and Illinois Council on Food and Agricultural Research (C-FAR) Grant regulons. Nat Biotechnol 2007, 25(5):584-592. 698 IDA CF06DS-01-03 to HPB. We thank Dr. Heng Li and Dr. Timothy 15. Abreu-Goodger C, Merino E: RibEx: a web server for locating riboswitches Perkins for their helpful inputs on the data analysis. and other conserved bacterial regulatory elements. Nucleic Acids Res 2005, 33:W690-W692. Author details 16. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Department of Agricultural and Biological Engineering, University of Illinois Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids at Urbana-Champaign, Urbana, IL 61801, USA. Institute for Genomic Biology, Res 2005, 33:D121-D124. University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. 17. Schreiber W, Dürre P: Differential expression of genes within the gap Department of Animal Sciences, University of Illinois at Urbana-Champaign, operon of Clostridium acetobutylicum. Anaerobe 2000, 6(5):291-297. Urbana, IL 61801, USA. Department of Food Science and Human Nutrition, 18. Ermolaeva MD, Khalak HG, White O, Smith HO, Salzberg SL: Prediction of University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Center transcription terminators in bacterial genomes. J Mol Biol 2000, 301(1):27-33. for Advanced Bioenergy Research (CABER), University of Illinois at Urbana- 19. Bennett GN, Rudolph FB: The central metabolic pathway from acetyl-CoA Champaign, Urbana, IL 61801, USA. to butyryl-CoA in Clostridium acetobutylicum. Fems Microbiol Rev 1995, 17(3):241-249. Authors’ contributions 20. Boynton ZL, Bennett GN, Rudolph FB: Cloning, sequencing, and YW, XL, HPB conceived and designed the study. YW and XL performed the expression of clustered genes encoding β-hydroxybutyryl-coenzyme A experiments. YW, XL and YM analyzed the RNA-Seq data. YW, XL and HPB (CoA) dehydrogenase, crotonase, and butyryl-CoA dehydrogenase from wrote the manuscript, with input from all authors. All authors discussed the Clostridium acetobutylicum ATCC 824. J Bacteriol 1996, 178(11):3015-3024. results, read and approved the final manuscript. 21. Belouski E, Watson DE, Bennett GN: Cloning, sequence, and expression of the phosphofructokinase gene of Clostridium acetobutylicum ATCC 824 Competing interests in Escherichia coli. Curr Microbiol 1998, 37(1):17-22. The authors declare that they have no competing interests. 22. Boynton ZL, Bennett GN, Rudolph FB: Cloning, sequencing, and expression of genes encoding phosphotransacetylase and acetate Received: 2 June 2011 Accepted: 30 September 2011 kinase from Clostridium acetobutylicum ATCC 824. Appl Environ Microb Published: 30 September 2011 1996, 62(8):2758-2766. 23. Walter KA, Nair RV, Cary JW, Bennett GN, Papoutsakis ET: Sequence and References arrangement of two genes of the butyrate-synthesis pathway of 1. Wu M, Wang M, Liu JH, Huo H: Assessment of potential life-cycle energy Clostridium acetobutylicum ATCC 824. Gene 1993, 134(1):107-111. and greenhouse gas emission effects from using corn-based butanol as 24. Düerre P: Formation of solvents in clostridia. In Handbook on clostridia. a transportation fuel. Biotechnol Progr 2008, 24(6):1204-1214. Edited by: Düerre P. London: CRC press; 2005:671-693. 2. Keis S, Bennett CF, Ward VK, Jones DT: Taxonomy and phylogeny of 25. Cornillot E, Nair RV, Papoutsakis ET, Soucaille P: The genes for butanol and industrial solvent-producing clostridia. Int J Syst Bacteriol 1995, acetone formation in Clostridium acetobutylicum ATCC 824 reside on a 45(4):693-705. large plasmid whose loss leads to degeneration of the strain. J Bacteriol 3. Ezeji T, Blaschek HP: Fermentation of dried distillers’ grains and solubles 1997, 179(17):5442-5447. (DDGS) hydrolysates to solvents and value-added products by 26. Gerischer U, Dürre P: Cloning, sequencing, and molecular analysis of the solventogenic clostridia. Bioresource Technol 2008, 99(12):5232-5242. acetoacetate decarboxylase gene region from Clostridium 4. Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, acetobutylicum. J Bacteriol 1990, 172(12):6907-6918. Bergman NH: Structure and complexity of a bacterial transcriptome. J 27. Petersen DJ, Cary JW, Vanderleyden J, Bennett GN: Sequence and Bacteriol 2009, 191(10):3203-3211. arrangement of genes encoding enzymes of the acetone-production 5. Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, pathway of Clostridium acetobutylicum ATCC 824. Gene 1993, He M, Croucher NJ, Pickard DJ, et al: A strand-specific RNA-Seq analysis of 123(1):93-97. the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 28. Chen CK, Blaschek HP: Effect of acetate on molecular and physiological 2009, 5(7). aspects of Clostridium beijerinckii NCIMB 8052 solvent production and 6. Wurtzel O, Sapra R, Chen F, Zhu YW, Simmons BA, Sorek R: A single-base strain degeneration. Appl Environ Microb 1999, 65(2):499-505. resolution map of an archaeal transcriptome. Genome Res 2010, 29. Rui H: The cloning of a putative regulatory gene and the sol region from 20(1):133-141. Clostridium beijerinckii. Master thesis Virginia Polytechnic Institute and State 7. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and University, Department of Biochemistry; 1999. quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 30. Kosaka T, Nakayama S, Nakaya K, Yoshino S, Furukawa K: Characterization 5(7):621-628. of the sol operon in butanol-hyperproducing Clostridium 8. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: saccharoperbutylacetonicum strain N1-4 and its degeneration The transcriptional landscape of the yeast genome defined by RNA mechanism. Biosci Biotechnol Biochem 2007, 71(1):58-68. sequencing. Science 2008, 320:1344-1349. 31. Paredes CJ, Rigoutsos I, Papoutsakis ET: Transcriptional organization of the 9. Carver T, Berriman M, Tivey A, Patel C, Bohme U, Barrell BG, Parkhill J, Clostridium acetobutylicum genome. Nucleic Acids Res 2004, Rajandream M-A: Artemis and ACT: viewing, annotating and comparing 32(6):1973-1981. sequences stored in a relational database. Bioinformatics 2008, 32. Bahl H, Muller H, Behrens S, Joseph H, Narberhaus F: Expression of heat 24(23):2672-2676. shock genes in Clostridium acetobutylicum. Fems Microbiol Rev 1995, 10. Carver T, Thomson N, Bleasby A, Berriman M, Parkhill J: DNAPlotter: circular 17(3):341-348. and linear interactive genome visualization. Bioinformatics 2009, 33. Tomas CA, Welker NE, Papoutsakis ET: Overexpression of groESL in 25(1):119-120. Clostridium acetobutylicum results in increased solvent production and 11. Steil L, Serrano M, Henriques AO, Volker U: Genome-wide analysis of tolerance, prolonged metabolism, and changes in the cell’s temporally regulated and compartment-specific gene expression in transcriptional program. Appl Environ Microb 2003, 69(8):4951-4965. sporulating cells of Bacillus subtilis. Microbiology-(UK) 2005, 151:399-420. 34. Tomas CA, Beamish J, Papoutsakis ET: Transcriptional analysis of butanol 12. Wang ST, Setlow B, Conlon EM, Lyon JL, Imamura D, Sato T, Setlow P, stress and tolerance in Clostridium acetobutylicum. J Bacteriol 2004, Losick R, Eichenberger P: The forespore line of gene expression in Bacillus 186(7):2006-2018. subtilis. J Mol Biol 2006, 358(1):16-37. 35. de Mendoza D, Schujman GE, Aguilar PS: Biosynthesis and function of 13. Shi Z, Blaschek HP: Transcriptional analysis of Clostridium beijerinckii membrane lipids. In Bacillus subtilis and its closest relatives: from genes to NCIMB 8052 and the hyper-butanol-producing mutant BA101 during the cells. Edited by: Sonenshein A, Hoch J, Losick R. Washington, D.C.: ASM shift from acidogenesis to solventogenesis. Appl Environ Microb 2008, Press; 2002:43-55. 74(24):7709-7714. Wang et al. BMC Genomics 2011, 12:479 Page 10 of 10 http://www.biomedcentral.com/1471-2164/12/479 36. Besemer J, Borodovsky M: GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 2005, 33(Suppl 2): W451-W454. 37. Severin A, Woody J, Bolon Y-T, Joseph B, Diers B, Farmer A, Muehlbauer G, Nelson R, Grant D, Specht J, et al: RNA-Seq Atlas of Glycine max: A guide to the soybean transcriptome. BMC Plant Biology 2010, 10(1):160. 38. Fisher RA: A preliminary linkage test with Agouti and Undulated mice; The fifth linkage-group. Heredity London 1949, 3:229-241. 39. Kuo CH, Ochman H: The extinction dynamics of bacterial pseudogenes. PLoS Genet 2010, 6(8). 40. Lerat E, Ochman H: Recognizing the pseudogenes in bacterial genomes. Nucleic Acids Res 2005, 33(10):3125-3132. 41. Hirotsune S, Yoshida N, Chen A, Garrett L, Sugiyama F, Takahashi S, Yagami K-i, Wynshaw-Boris A, Yoshiki A: An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene. Nature 2003, 423(6935):91-96. 42. Zheng DY, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu YT, Denoeud F, Antonarakis SE, Snyder M, et al: Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Res 2007, 17(6):839-851. 43. Svensson Ö, Arvestad L, Lagergren J: Genome-wide survey for biologically functional pseudogenes. PLoS Comput Biol 2006, 2(5):e46. 44. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008, 453(7199):1239-1243. 45. t Hoen PAC, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen R, de Menezes RX, Boer JM, van Ommen GJB, den Dunnen JT: Deep sequencing- based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res 2008, 36(21):e141. 46. Annous BA, Blaschek HP: Isolation and characterization of Clostridium acetobutylicum mutants with enhanced amylolytic activity. Appl Environ Microb 1991, 57(9):2544-2548. 47. Qureshi N, Lolas A, Biaschek HP: Soy molasses as fermentation substrate for production of butanol using Clostridium beijerinckii BA101. J Ind Microbiol Biot 2001, 26(5):290-295. 48. Jesse TW, Ezeji TC, Qureshi N, Blaschek HP: Production of butanol from starch-based waste packing peanuts and agricultural waste. J Ind Microbiol Biot 2002, 29(3):117-123. 49. Ezeji TC, Qureshi N, Blaschek HP: Production of acetone, butanol and ethanol by Clostridium beijerinckii BA101 and in situ recovery by gas stripping. World J Microb Biot 2003, 19(6):595-603. 50. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18(11):1851-1858. doi:10.1186/1471-2164-12-479 Cite this article as: Wang et al.: Single-nucleotide resolution analysis of the transcriptome structure of Clostridium beijerinckii NCIMB 8052 using RNA-Seq. BMC Genomics 2011 12:479. Submit your next manuscript to BioMed Central and take full advantage of: • Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit

Journal

BMC GenomicsSpringer Journals

Published: Sep 30, 2011

There are no references for this article.