Background: The halophyte Suaeda aralocaspica performs complete C4 photosynthesis within individual cells (SCC4). This distinguishes it from typical C4 plants, which require 2 types of photosynthetic cells working together. SCC4 plants have features that could help engineer higher photosynthetic efficiency into agriculturally important C3 species such as rice. However, no sequenced SCC4 plant genome has been reported, which limits our understanding of the mechanisms involved in SCC4 photosynthesis and of its evolution.
Findings: Using Illumina and Pacific Biosciences sequencing platforms, we generated ∼202 Gb of clean genomic DNA sequence. This represents 433-fold coverage of the 467-Mb estimated genome size of S. aralocaspica. The final genome assembly was 452 Mb and consisted of 4,033 scaffolds, with a scaffold N50 length of 1.83 Mb. We annotated 29,604 protein-coding genes using Evidence Modeler, drawing on ab initio predictions, homology with known genes, and RNA sequencing–based transcriptome evidence. We also annotated noncoding genes, including 1,651 long noncoding RNAs, 21 microRNAs, 382 transfer RNAs, 88 small nuclear RNAs, and 325 ribosomal RNAs. In addition, we assembled a complete (circular with no gaps) chloroplast genome of S. aralocaspica, 146,654 bp in length.
Conclusions: We have presented the genome sequence of the SCC4 plant S. aralocaspica. Knowledge of this genome should increase our understanding of the evolution of SCC4 photosynthesis and contribute to engineering C4 photosynthesis into economically important C3 crops.
Received: 14 February 2019; Revised: 11 July 2019; Accepted: 27 August 2019
© The Author(s) 2019. Published by Oxford University Press.
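The genome size (467 Mb) and coverage (433-fold) quoted above come from k-mer counting. In the Estimation of genome size section the estimate is G = N × (L − K + 1) / D with K = 17. The Python sketch below illustrates that calculation on a toy, error-free circular genome. It is not GCE, the tool the authors used, and all helper names (`kmer_histogram`, `peak_depth`, `estimate_genome_size`, `base_coverage`, `simulate_reads`) are hypothetical. It makes two simplifying assumptions: it counts forward-strand k-mers only, and it finds the peak by discarding depths below a fixed cutoff.

```python
import random
from collections import Counter
from typing import Iterable, List


def kmer_histogram(reads: Iterable[str], k: int = 17) -> Counter:
    """Map multiplicity -> number of distinct k-mers observed that many times.

    Simplification (assumed): forward-strand k-mers only; production k-mer
    counters usually merge each k-mer with its reverse complement.
    """
    kmer_counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                kmer_counts[kmer] += 1
    return Counter(kmer_counts.values())


def peak_depth(hist: Counter, min_depth: int = 2) -> int:
    """k-mer depth D at the main peak; depths below min_depth are treated as
    the sequencing-error tail and ignored (a crude cutoff, assumed here)."""
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    return max(usable, key=usable.get)


def estimate_genome_size(hist: Counter, min_depth: int = 2) -> float:
    """G = total k-mers / D, which equals N * (L - K + 1) / D when all N
    reads share the same length L."""
    total_kmers = sum(depth * n for depth, n in hist.items())
    return total_kmers / peak_depth(hist, min_depth)


def base_coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage: total clean bases divided by genome size."""
    return total_bases / genome_size


def simulate_reads(genome: str, read_len: int, step: int) -> List[str]:
    """Error-free reads starting every `step` bp around a circular toy genome."""
    wrapped = genome + genome[:read_len]
    return [wrapped[s:s + read_len] for s in range(0, len(genome), step)]


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    hist = kmer_histogram(simulate_reads(genome, read_len=100, step=5), k=17)
    # 400 reads x 84 k-mers / peak depth 17 -> ~1,976 bp (true size 2,000)
    print(peak_depth(hist), round(estimate_genome_size(hist)))
    # Abstract figures: ~202 Gb of clean reads over a 467-Mb genome
    print(round(base_coverage(202e9, 467e6)))  # 433
```

In the toy run the k-mer depth D (about 16.8 on average) is lower than the base coverage (20×) by the factor (L − K + 1)/L. The formula therefore has to pair the total k-mer count with the k-mer depth, never with the base coverage.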
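The scaffold N50 of 1.83 Mb and the "% of N" row of Table 1 depend only on scaffold lengths and N content. The minimal Python sketch below computes them. It assumes scaffolds are plain strings and that scaffolds shorter than 500 bp are dropped first, following Table 1's "≥500 bp" row. That cutoff is an assumption, because the paper does not say which set of scaffolds the N50 was computed on. The function names `nx_stat` and `assembly_summary` are hypothetical.

```python
from typing import Dict, List, Tuple


def nx_stat(lengths: List[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx length, number of scaffolds needed) for 0 < fraction <= 1.

    Scaffolds are sorted longest-first and summed until the running total
    reaches `fraction` of the assembly size; the scaffold that crosses the
    threshold supplies the Nx length and its rank supplies the count.
    """
    if not lengths:
        raise ValueError("no scaffolds to summarize")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, rank
    return ordered[-1], len(ordered)  # only reachable through float rounding


def assembly_summary(scaffolds: List[str], min_len: int = 500) -> Dict[str, float]:
    """Table 1-style statistics for scaffolds of at least `min_len` bp."""
    kept = [s for s in scaffolds if len(s) >= min_len]
    lengths = [len(s) for s in kept]
    total = sum(lengths)
    n50_len, n50_num = nx_stat(lengths, 0.5)
    n90_len, n90_num = nx_stat(lengths, 0.9)
    n_bases = sum(s.upper().count("N") for s in kept)
    return {
        "scaffolds": len(kept),
        "total_bp": total,
        "longest_bp": max(lengths),
        "N50_bp": n50_len,
        "N50_number": n50_num,
        "N90_bp": n90_len,
        "N90_number": n90_num,
        "percent_N": 100.0 * n_bases / total,  # total Ns / assembly length
    }


if __name__ == "__main__":
    toy = ["A" * 1000, "C" * 600 + "N" * 200, "G" * 700, "T" * 400]
    print(assembly_summary(toy))
```

In the toy input, the 400-bp scaffold falls below the cutoff, leaving 2,500 bp in 3 scaffolds. N50 is then 800 bp (2 scaffolds), N90 is 700 bp (3 scaffolds), and 8.0% of the bases are N. The "size/number" pairs in Table 1 report the same two quantities.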
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/gigascience/article-abstract/8/9/giz116/5568370 by Ed 'DeepDyve' Gillespie user on 16 October 2019

A draft genome assembly of halophyte Suaeda aralocaspica
Wang et al.

Keywords: Suaeda aralocaspica; genome; single-cell C4; photosynthesis; long noncoding RNAs; halophyte

Background

Carbon loss through photorespiration and water loss through transpiration are common in C3 plants, especially in warm or dry environments. Both cause significant decreases in growth, water use efficiency, and harvestable yields [1]. C4 and crassulacean acid metabolism (CAM) plant families overcome these problems by separating an evolved CO2-concentrating mechanism (the C4 cycle) from the Calvin cycle (the C3 cycle). C4 plants separate the two cycles spatially (Kranz anatomy) and CAM plants temporally (a day-to-night switch). Both C4 and CAM plants can outperform C3 plants, especially under photorespiratory conditions, and achieve higher water use efficiency [2]. This has created considerable interest in implementing the C4 cycle in C3 crops such as rice to improve yields and stress tolerance [3–6].

Among eudicots, C4 photosynthesis occurs most frequently in the Amaranthaceae of the Caryophyllales [7–9]. Four Amaranthaceae species (3 Bienertia and 1 Suaeda) can perform both the C4 and C3 cycles within individual photosynthetic cells (single-cell C4 [SCC4]) [10–13]. Suaeda contains species that use C4, C3, and SCC4 mechanisms for CO2 fixation, and thus represents a unique genus in which to study the evolution of C4 photosynthesis. Mechanistically, the spatially separated chloroplasts in SCC4 cells contain different sets of nuclear-encoded proteins related to specific functions in the C4 and C3 cycles. Biochemically and functionally, these chloroplasts resemble the mesophyll and bundle sheath chloroplasts of Kranz C4 species [10, 11, 15–18]. These findings indicate that the key photosynthetic enzymes are conserved. They also show that in SCC4 plants both C3 and C4 enzymes work in the same cell during the daytime, unlike in Kranz C4 and CAM plants.

At present, most knowledge of SCC4 photosynthesis comes from studies of Bienertia sinuspersici, which has 2 types of chloroplasts distributed in the central and peripheral parts of the cell [16, 18–29]. Studies on Suaeda aralocaspica (NCBI:txid224144) have focused on the germination of its dimorphic seeds [30–34]. S. aralocaspica has elongated photosynthetic cells with 2 types of chloroplasts located at opposite ends of the cell. This arrangement is analogous to Kranz anatomy but lacks the intervening cell wall. It indicates that S. aralocaspica conducts C4 and C3 photosynthesis within a single cell. The species may therefore retain photosynthetic characteristics of both the C4 and C3 cycles and represent an intermediate stage in the evolution from C3 to C4 [35, 36]. S. aralocaspica is a hygro-halophyte that grows in temperate salt deserts with low night temperatures, ranging from the northeast of the Caspian lowlands eastwards to Mongolia and western China. Sequencing its genome should therefore aid the study of C4 evolution under stressful growth conditions. It should also accelerate the engineering of C4 photosynthesis into C3 crops for adaptation to high-saline growth conditions.

In the present study, we sequenced the genome of S. aralocaspica collected from a cold desert in the Junggar Basin, Xinjiang, China. We combined shotgun Illumina sequencing with single-molecule real-time sequencing from Pacific Biosciences (PacBio) in an integrated assembly strategy. Following protocols established in other plant species [37–40], this produced a reference genome assembly of S. aralocaspica. To our knowledge, this is the first sequenced SCC4 genome. These genomic resources provide a platform for basic biological research and gene discovery in SCC4 plants. They should also support engineering C4 functional modules into C3 crops to increase yields and to adapt them to high-salt conditions.

Data Description

Plant material

Seeds were first collected from a healthy specimen of S. aralocaspica (Fig. 1). The selected plant measured ∼40 cm in height and grew in a natural stand close to Fu-kang County, Xinjiang Uygur Autonomous Region, China (44°14′N latitude, 87°40′E longitude, 445 m elevation). The seeds were placed in 0.1% potassium permanganate, washed for 5 min with ultrapure water, and then spread in sterilized petri dishes. After 1 week of shaded culture at 30°C, the seeds germinated. After germination, leaves were collected as tissue sources for whole-genome sequencing. In addition, 6 other healthy S. aralocaspica plants were chosen as tissue sources (mature leaf, stem, root, and fruit) for RNA sequencing (RNA-seq). They were collected from the same location as the plant used for seed collection. The samples were frozen in liquid nitrogen immediately after collection and stored at −80°C until DNA/RNA extraction. All samples were collected with permission from, and under the supervision of, the local forestry bureau.

Figure 1: Example of S. aralocaspica.

DNA extraction and genome sequencing

Genomic DNA was extracted from leaves using a General AllGen Kit (Tiangen Biotech, Beijing, China) according to the manufacturer's instructions. Genomic DNA isolated from S. aralocaspica was used to construct several types of libraries:
- short-insert libraries (350, 500, and 800 bp);
- mate-paired libraries (2, 5, 10, and 20 kb);
- PacBio single-molecule real-time cell libraries.
The purified libraries were quantified and stored at −80°C before sequencing. The S. aralocaspica genome was then sequenced from these 8 libraries with different insert sizes. The platforms were the Illumina HiSeq 2000 (Illumina Inc., San Diego, CA, USA) and the PacBio RS II (Pacific Biosciences of California, Menlo Park, CA, USA). This generated 370 Gb of raw Illumina HiSeq data and 10 Gb (∼21× genome coverage) of PacBio reads (Supplemental Table 1).

To reduce the effects of sequencing errors on the assembly, we applied stringent filtering during read generation. Illumina reads were cleaned as follows:
(1) Cut off adaptors. For the mate-paired library data, reads without Nextera adaptors longer than 10 bp on both end1 and end2 were removed.
(2) Remove tail bases with quality scores <20.
(3) Remove reads in which >20% of bases have quality scores <20.
(4) Remove reads shorter than 30 nucleotides (nt) for DNA-seq.
(5) Remove duplicated paired-end reads from DNA-seq that represent potential PCR artifacts.
In total, PacBio produced 1,053,309 raw subreads. After reads shorter than 1 kb were filtered out, 935,509 reads remained. Next, 46 Gb of clean Illumina reads (100-bp read length) were used to correct the raw PacBio reads with Proovread (Proovread, RRID:SCR_017331) (v2), yielding 632,805 corrected PacBio reads. After quality control and filtering, 195 Gb of clean Illumina reads and 6.9 Gb of clean PacBio reads were retained. Together they provide 433-fold coverage of the genome (Supplemental Table 1).

Estimation of genome size

GCE (GCE, RRID:SCR_017332) (v1.0.0) was used to estimate the genome size and heterozygosity. A k-mer is a sequence of length k bp. Counting how often each distinct k-mer occurs in the reads gives the frequency distribution of k-mers (the k-mer spectrum). Genome size can be calculated as the total length of the sequencing reads divided by the sequencing depth. To estimate the sequencing depth of the S. aralocaspica genome, we counted the copy number of each 17-mer in the sequence reads and plotted the distribution of copy numbers. The peak of this frequency curve represents the overall sequencing depth. We then applied the formula G = N × (L − K + 1) / D. Here N is the total number of reads, L is the average read length, K is the k-mer length (17 bp here), D is the overall depth estimated from the k-mer distribution, and G is the genome size. By this method, the estimated genome size of S. aralocaspica was 467 Mb (Supplemental Fig. 1) and the heterozygosity was 0.16%.

Genome assembly

The primary assembly was generated with SOAPdenovo (SOAPdenovo2, RRID:SCR_014986) (version 2.04-r240). It contained 17,302 initial contigs (N50, ∼49.2 kb) and 4,184 scaffolds (N50, ∼1.44 Mb) spanning 445.6 Mb. Intra-scaffold gaps made up 96.1 Mb (21.56%) of the total size (Supplemental Table 2). We then used all reads from the short-insert libraries to fill gaps with GapCloser (GapCloser, RRID:SCR_015026) (v1.12), which filled 74.7% of the gaps. The result was a 424.5-Mb assembly with 5.92% gaps, calculated as the total length of Ns divided by the total length of the assembly. Next, PBJelly (PBJelly, RRID:SCR_012091) (v15.8.24) was used for a second round of gap filling with the polished PacBio data. This yielded a final ∼452-Mb genome assembly with 4,033 scaffolds (N50, 1.83 Mb) (Table 1, Supplemental Table 2). The assembly spanned 96.8% of the 467-Mb S. aralocaspica genome size estimated from the k-mer spectrum (Supplemental Fig. 1).

Table 1. Summary of S. aralocaspica genome assembly

Assembly                          Illumina          Illumina + PacBio
Total assembly size               424 Mb            452 Mb
Number of scaffolds (≥500 bp)     4,184             4,033
Longest scaffold                  9.29 Mb           9.98 Mb
N50 contig (size/number)          49.21 kb/2,464    -
N50 scaffold (size/number)        1.44 Mb/80        1.83 Mb/67
N90 scaffold (size/number)        306.62 kb/332     363.87 kb/282
% of N                            5.78%             2.98%

Annotation
Number of protein-coding genes    29,604
Number of small RNAs              816
Number of long noncoding genes    1,982

RNA preparation and sequencing

RNA-seq was performed for genome annotation. RNA was extracted from different tissues (mature leaf, stem, root, and fruit) of 6 S. aralocaspica specimens. Tissues were ground in liquid nitrogen and homogenized in a guanidine thiocyanate extraction buffer, and then sodium acetate and chloroform/isoamyl alcohol (24:1) were added. The solution was shaken vigorously, placed on ice for 15 min, and centrifuged (13,200 rpm) at 4°C to separate a clear upper aqueous layer. RNA was precipitated from this layer with isopropanol. The precipitated RNA was washed with 75% ethanol to remove impurities and resuspended in diethyl pyrocarbonate–treated water. Total RNA was treated with RQ1 DNase (Promega) to remove DNA. RNA quality and quantity were determined from the absorbance ratio at 260 nm/280 nm (A260/A280) using a SmartSpec Plus (Bio-Rad). RNA integrity was further verified by 1.5% agarose gel electrophoresis. The RNA samples were then mixed in equal amounts for RNA-seq library preparation. Polyadenylated messenger RNAs (mRNAs) were purified and concentrated with oligo(dT)-conjugated magnetic beads (Invitrogen) before directional RNA-seq library preparation. Purified mRNAs were fragmented at 95°C, followed by end repair and 5′ adaptor ligation. Reverse transcription was performed using an RT primer harboring a 3′ adaptor sequence and a randomized hexamer. The complementary DNAs (cDNAs) were purified and amplified. PCR products of 200–500 bp were then purified, quantified, and stored at −80°C before sequencing. Transcriptomic libraries were sequenced on an Illumina HiSeq X Ten (Illumina Inc., San Diego, CA, USA) for paired-end 150-nt reads, generating 30 Gb of RNA-seq data (Supplemental Table 3).

To further annotate transcriptional start and termination sites, we also generated two additional datasets: cap analysis of gene expression and deep sequencing (CAGE) and polyadenylation site sequencing (PAS). For CAGE-seq library preparation, we used 20 μg of total RNA from mature leaves. Polyadenylated mRNAs were purified and concentrated with oligo(dT)-conjugated magnetic beads (Invitrogen). The mRNA was treated with FastAP (Invitrogen) for 1 h at 37°C and then with tobacco acid pyrophosphatase (Ambion) for 1 h at 37°C. The decapped full-length mRNA was ligated to the TruSeq 5′ RNA adaptor (Illumina) for 1 h at 37°C and purified with oligo(dT)-conjugated magnetic beads (Invitrogen). After fragmentation at 95°C, first-strand cDNA was synthesized using a randomized hexamer and an RT primer harboring the TruSeq 3′ adaptor sequence (Illumina). The cDNAs were purified and amplified using TruSeq PCR primers (Illumina). Products of 200–500 bp were then purified, quantified, and stored at −80°C until sequencing. CAGE-seq libraries were sequenced on an Illumina NextSeq 500 (Illumina Inc., San Diego, CA, USA) for paired-end 150-nt reads, generating 16 Gb of CAGE-seq data (Supplemental Table 3).

For PAS-seq library preparation, we used 10 μg of total RNA from mature leaves. Polyadenylated mRNAs were purified using oligo(dT)-conjugated magnetic beads (Invitrogen). The purified RNA was fragmented and reverse transcribed with a PAS-RT primer, a modified TruSeq 3′ adaptor harboring dT18 and 2 additional anchor nucleotides at the 3′ terminus. DNA was then synthesized with the Terminal-Tagging oligo using a ScriptSeq v2 RNA-Seq Library Preparation Kit (Epicentre). The cDNAs were purified and amplified. PCR products of 300–500 bp were then purified, quantified, and stored at −80°C before sequencing. PAS-seq libraries were sequenced on an Illumina NextSeq 500 (Illumina Inc., San Diego, CA, USA) for single-end 300-nt reads, generating 28.5 Gb of PAS-seq data (Supplemental Table 3).

To annotate microRNAs, we prepared a small RNA cDNA library from 3 μg of mixed total RNA. We used the Balancer NGS Library Preparation Kit for small/microRNA (GnomeGen) according to the manufacturer's instructions. Briefly, RNAs were ligated sequentially to 3′ and 5′ adaptors, reverse transcribed to cDNA, and PCR amplified. The whole library was separated by 10% native polyacrylamide gel electrophoresis, and bands corresponding to microRNA inserts were cut out and eluted. After ethanol precipitation and washing, the purified small RNA libraries were quantified using a Qubit Fluorometer (Invitrogen) and stored at −80°C until sequencing. The small RNA library was sequenced on an Illumina GA IIx (Illumina Inc., San Diego, CA, USA) for 33-nt reads, generating 4.5 Gb of small RNA data (Supplemental Table 3).

Genome quality evaluation

We checked the completeness of the assembly with several methods and datasets. Using BWA (BWA, RRID:SCR_010910), 87.08–90.63% of the DNA paired-end reads from the 350-, 500-, and 800-bp libraries mapped properly to the final assembled genome (Supplemental Table 4, Supplemental Fig. 2). We evaluated the completeness of the gene regions in our assembly using BUSCO (BUSCO, RRID:SCR_015008) (v3.0.2). Of the 1,440 single-copy orthologs in the plant lineage dataset, 89.5% were completely identified in the genome (Supplemental Fig. 3). Furthermore, Trinity (Trinity, RRID:SCR_013048) (r20140413p1) assembled the RNA-seq reads from the mixed S. aralocaspica RNA library into 157,521 unigenes. These unigenes were aligned to the genome assembly by BLASTN with default parameters. Overall, 94.5% of the unigenes aligned to the assembly, and 76.3% had 90% of their length covered by a single scaffold. Among unigenes longer than 1 kb, 99.5% aligned to the assembly and 92.8% had 90% of their length covered by a single scaffold (Supplemental Table 5).

Gene and functional annotations

The genome of S. aralocaspica was annotated for protein-coding genes (PCGs), repeat elements, noncoding genes, and other genomic elements. MAKER (MAKER, RRID:SCR_005309) (v2.31.9) generated a consensus gene set from 3 types of evidence: ab initio predictions, protein homologues, and transcripts. De novo predictions were produced by AUGUSTUS (AUGUSTUS, RRID:SCR_008417) (v3.2.1). Nonredundant protein sequences of 7 sequenced plants provided homology evidence: Arabidopsis thaliana, Oryza sativa, Beta vulgaris, Chenopodium quinoa, Glycine max, Spinacia oleracea, and Vitis vinifera. For transcript evidence, Trinity assembled two datasets into unigenes: the S. aralocaspica RNA-seq data generated in this study and a published seed transcriptome. We predicted 29,064 PCGs, with an average transcript length of 4,462 bp, an average coding sequence length of 1,112 bp, and a mean of 4.76 exons per transcript (Supplemental Tables 6 and 7). Of the annotated PCGs, 97.2% were functionally annotated by the InterPro, GO, KEGG, SwissProt, or NR databases (Supplemental Figs 4 and 5, Supplemental Table 8). About 91% had protein or transcript support (Supplemental Table 9). For most annotated genes, CAGE-seq and PAS-seq reads supported the transcriptional start and termination sites (Supplemental Figs 6 and 7).

In addition, 1,651 long noncoding RNAs were predicted following a previously published method. We predicted 382 transfer RNAs (tRNAs) using tRNAscan-SE (tRNAscan-SE, RRID:SCR_010835) (v1.3.1). We also identified 21 miRNAs, 88 small nuclear RNAs, and 325 ribosomal RNAs with the cmscan tool from INFERNAL (Infernal, RRID:SCR_011809) (v1.1.2), searching the Rfam database with the option --cut_ga (Supplemental Table 10, Supplemental Fig. 8).

Repeat annotation

To annotate the repeat sequences of the S. aralocaspica genome, we combined de novo and homology-based approaches [55, 56]. For homology-based identification, we used RepeatMasker (RepeatMasker, RRID:SCR_012954) (open-4.0.5) to search the S. aralocaspica genome against the protein database in Repbase and identify transposable elements (TEs). RepeatMasker parameters were set to "-species Viridiplantae -pa 30 -e rmblast". In the de novo approach, PILER (PILER, RRID:SCR_017333) (v1.0) built the consensus repeat database with default parameters. The PILER software requires PALS, FAMS, and PILER to construct the consensus library. The predicted consensus TEs were then classified using RepeatClassifier from the RepeatModeler package (RepeatModeler, RRID:SCR_015027) (version 1.0.11). We then used RepeatMasker to search for TEs in the genome using the PILER-built database. Finally, we merged the de novo and homology-based repeat predictions according to their genomic coordinates. This detected 173.5 Mb of repeat elements, 38.41% of the genome (Supplemental Table 11). As in other sequenced genomes, long terminal repeat elements were the largest repeat class in S. aralocaspica, accounting for 48.5% of the repeated sequence (Supplemental Table 12).

Phylogenetic placement of S. aralocaspica

We performed orthologous group analyses with the OrthoFinder (OrthoFinder, RRID:SCR_017118) (v2.3.3) clustering method. The input was the complete annotated protein sequences of 18 sequenced plant genomes:
- 8 C3 species: Solanum tuberosum, S. oleracea, B. vulgaris, C. quinoa, A. thaliana, O. sativa, Musa acuminata, and Physcomitrella patens;
- 8 C4 species: S. aralocaspica, Amaranthus hypochondriacus, Sorghum bicolor, Setaria italica, Zea mays, Saccharum spp., Panicum hallii, and Pennisetum glaucum;
- 2 CAM species: Ananas comosus and Phalaenopsis equestris.
For each gene in all species, the longest encoded protein was used as OrthoFinder input, with default parameters. In total, 19,324 orthogroups containing ≥2 genes were circumscribed, 11,768 of which contained ≥1 gene from S. aralocaspica (Supplemental Table 13). Of the 29,604 annotated S. aralocaspica genes, 23,112 (89%) were classified into orthogroups. In total, 3,895 orthogroups (172,107 genes) were shared among all the genomes analyzed. Compared with the other 17 genomes, 70 orthogroups (351 genes) were specific to the S. aralocaspica assembly.

OrthoFinder identified 15 single-copy orthologous genes shared across the 18 species. These were aligned with MUSCLE (MUSCLE, RRID:SCR_011812) (v3.8.31) using default settings (see Supplementary File 1 for commands and settings). The concatenated amino acid sequences were trimmed using trimAl (trimAl, RRID:SCR_017334) (v1.2rev59) with "trimal -gt 0.8 -st 0.001 -cons 60". ModelFinder then selected the best substitution model (JTTDCMut+F+I+G4). Phylogenetic trees were constructed using IQ-TREE (IQ-TREE, RRID:SCR_017254) (v1.6.10). Branch robustness was tested with the aLRT method using 1,000 bootstrap replicates. A timetree was then inferred using the RelTime method [66, 67] with ordinary least-squares estimates of branch lengths. This analysis involved 18 amino acid sequences, and the final dataset contained 4,489 positions. The timetree was constructed using MEGA X (MEGA Software, RRID:SCR_000667).

The resulting phylogenetic tree placed all 5 Amaranthaceae species in the same clade. Within it, A. hypochondriacus (C4) was sister to a subclade of the other 3 C3 species (Fig. 2). S. aralocaspica (SCC4) was sister to the other 4 Amaranthaceae species, including A. hypochondriacus (C4) (Fig. 2). These phylogenetic results were consistent with a previous study on the evolution of C. quinoa. Within the Amaranthaceae, S. aralocaspica (SCC4) and A. hypochondriacus (C4) each lie apart from the C3 relatives. This suggests that these SCC4 and C4 photosynthesizers might have evolved independently. Outside the Amaranthaceae, S. aralocaspica (SCC4) is more closely related to the C3 than to the C4 plants. These findings do not fully support the existing model in which S. aralocaspica is a C3–C4 intermediate on the path toward C4 plants [35, 36].

Figure 2: Phylogenetic tree of S. aralocaspica with other C3/C4/CAM plants. Bootstrap values were obtained from 1,000 bootstrap replicates and are reported as percentages.

Assembly of the S. aralocaspica chloroplast genome

We assembled the chloroplast genome from the short-insert (350 bp) data using NOVOPlasty (NOVOPlasty, RRID:SCR_017335) (v2.7.2). The result was a complete (circular with no gaps) chloroplast genome of S. aralocaspica, 146,654 bp in length. The ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) subunit of C. quinoa (GenBank:KY419706.1) served as the seed sequence. Initial gene annotation of the genome was performed using GeSeq (GeSeq, RRID:SCR_017336). The circular chloroplast genome map was drawn using the OrganellarGenomeDRAW tool (OGDraw, RRID:SCR_017337) and then edited manually (Fig. 3).

Figure 3: Gene map of the S. aralocaspica chloroplast genome. Genes shown outside the outer circle are transcribed clockwise, and those inside are transcribed counterclockwise. Genes belonging to different functional groups are color coded. The dashed area in the inner circle indicates the guanine-cytosine (GC) content of the chloroplast genome.

Conclusion

Using the Illumina and PacBio platforms, we assembled the genome of S. aralocaspica, the first sequenced genome of an SCC4 plant. The final genome assembly was 452 Mb in size and consisted of 4,033 scaffolds, with a scaffold N50 length of 1.83 Mb. We annotated 29,604 protein-coding genes. We also annotated noncoding genes: 1,651 long noncoding RNAs, 21 miRNAs, 382 tRNAs, 88 small nuclear RNAs, and 325 ribosomal RNAs. The phylogenetic tree placed the SCC4 species in a clade more closely related to the C3 than to the C4 plants. This does not fully support the hypothesis that SCC4 is a C3–C4 intermediate that evolved independently from C3 ancestors. We also assembled a complete (circular with no gaps) chloroplast genome of S. aralocaspica, 146,654 bp in size. Together with the transcriptomic data, the genome assembly of S. aralocaspica provides a valuable resource for investigating C4 evolution and mechanisms. We anticipate that future studies of S. aralocaspica will facilitate efforts to engineer crops with higher photosynthetic efficiency and salt tolerance, especially C3 species such as rice.

Availability of supporting data and materials

Raw sequencing data are deposited in the NCBI SRA under accession number SRP128359. The NCBI BioProject accession is PRJNA428881. Further supporting data and materials are available in the GigaScience GigaDB database.

Additional files

Supplemental Figure 1: k-mer distribution of sequencing reads.
Supplemental Figure 2: Size distribution of inserts in sequenced paired-end DNA reads.
Supplemental Figure 3: Integrity comparison of genome assemblies of S. aralocaspica with BUSCO. The assemblies from each step were analyzed separately.
Supplemental Figure 4: Annotated genes supported by different evidence.
Supplemental Figure 5: Gene ontology distribution of S. aralocaspica protein-coding genes.
Supplemental Figure 6: Transcription start site (TSS) annotation with CAGE-seq.
Supplemental Figure 7: Transcription terminal site (TTS) annotation with PAS-seq.
Supplemental Figure 8: Noncoding RNA classification in S. aralocaspica.
Supplemental Table 1: Summary of sequencing data obtained for genome assembly.
Supplemental Table 2: Assembly statistics of the S. aralocaspica genome.
Supplemental Table 3: Information on the different types of RNA libraries.
Supplemental Table 4: Mapping efficiency of short-insert library reads.
Supplemental Table 5: Assessment of sequence coverage of the S. aralocaspica genome assembly using unigenes.
Supplemental Table 6: Gene prediction in the S. aralocaspica genome.
Supplemental Table 7: Comparison of gene structure between S. aralocaspica and other species.
Supplemental Table 8: Summary of S. aralocaspica gene annotation based on homology or functional classification.
Supplemental Table 9: Number of S. aralocaspica genes with protein or unigene support.
Supplemental Table 10: Noncoding RNA genes in the S. aralocaspica genome.
Supplemental Table 11: Repeat elements in the S. aralocaspica genome. Repeat elements were identified by different methods and then combined into a final repeat set.
Supplemental Table 12: Repeat elements in the S. aralocaspica genome.
Supplemental Table 13: Orthogroups clustered by OrthoFinder in 18 species.

Abbreviations

bp: base pairs; BUSCO: Benchmarking Universal Single-Copy Orthologs; BWA: Burrows-Wheeler Aligner; CAGE: cap analysis of gene expression and deep sequencing; CAM: crassulacean acid metabolism; cDNA: complementary DNA; Gb: gigabase pairs; GCE: Genomic Character Estimator; GO: Gene Ontology; kb: kilobase pairs; KEGG: Kyoto Encyclopedia of Genes and Genomes; Mb: megabase pairs; miRNA: microRNA; mRNA: messenger RNA; NCBI: National Center for Biotechnology Information; nt: nucleotide; PacBio: Pacific Biosciences; PAS: polyadenylation site sequencing; PCG: protein-coding gene; RNA-seq: RNA sequencing; SCC4: single-cell C4 photosynthesis; SRA: Sequence Read Archive; TE: transposable element; tRNA: transfer RNA; TTS: transcription terminal site.

Competing interests

The authors declare that they have no competing interests.

Funding

This research was supported by the Key Research and Development Program of Xinjiang province (2018B01006–4), the National Natural Science Foundation of China (31770451), the National Key Research and Development Program (2016YFC0501400), and ABLife (ABL2014–02028).

Authors' contributions

C.T., L.W., Yi Z., and S.M. initiated the project and designed the study. L.W., H.W., L.J., Z.Z., and K.Z. prepared experimental materials and performed experiments for data collection. G.M., C.C., Yu Z., H.W., L.J., and K.Z. assembled the genome, analyzed the data, and generated the graphs. Yi Z., W.Q., C.T., L.W., C.C., and X.W. wrote the manuscript.

References

1. Walker BJ, VanLoocke A, Bernacchi CJ, et al. The costs of photorespiration to food production now and in the future. Annu Rev Plant Biol 2016;67(1):107–29.
2. Yamori W, Hikosaka K, Way DA. Temperature response of photosynthesis in C3, C4, and CAM plants: temperature acclimation and temperature adaptation. Photosynth Res 2014;119(1–2):101–17.
3. Hibberd JM, Sheehy JE, Langdale JA. Using C4 photosynthesis to increase the yield of rice-rationale and feasibility. Curr Opin Plant Biol 2008;11(2):228–31.
4. von Caemmerer S, Quick WP, Furbank RT. The development of C4 rice: current progress and future challenges. Science 2012;336(6089):1671–2.
5. Gu J-F, Qiu M, Yang J-C. Enhanced tolerance to drought in transgenic rice plants overexpressing C4 photosynthesis enzymes. Crop J 2013;1(2):105–14.
6. Betti M, Bauwe H, Busch FA, et al. Manipulating photorespiration to increase plant productivity: recent advances and perspectives for crop improvement. J Exp Bot 2016;67(10):2977–88.
7. Akhani H, Trimborn P, Ziegler H. Photosynthetic pathways in Chenopodiaceae from Africa, Asia and Europe with their ecological, phytogeographical and taxonomical importance. Plant Syst Evol 1997;206(1):187–221.
8. Sage RF, Li M, Monson RK. The taxonomic distribution of C4 photosynthesis. In: Sage RF, Monson RK, eds. C4 Plant Biology. San Diego, CA, USA: Academic Press; 1999:551–84.
9. Jacobs SWL. Review of leaf anatomy and ultrastructure in the Chenopodiaceae (Caryophyllales). J Torrey Bot Soc 2001;128(3):236–53.
10. Voznesenskaya EV, Franceschi VR, Kiirats O, et al. Kranz anatomy is not essential for terrestrial C4 plant photosynthesis. Nature 2001;414(6863):543–6.
11. Voznesenskaya EV, Franceschi VR, Kiirats O, et al. Proof of C4 photosynthesis without Kranz anatomy in Bienertia cycloptera (Chenopodiaceae). Plant J 2002;31(5):649–62.
12. Akhani H, Barroca J, Koteeva N, et al. Bienertia sinuspersici (Chenopodiaceae): a new species from southwest Asia and discovery of a third terrestrial C4 plant without Kranz anatomy. Syst Bot 2005;30(2):290–301.
13. Akhani H, Chatrenoor T, Dehghani M, et al. A new species of Bienertia (Chenopodiaceae) from Iranian salt deserts: a third species of the genus and discovery of a fourth terrestrial C4 plant without Kranz anatomy. Plant Biosyst 2012;146(3):550–
14. Schütze P, Freitag H, Weising K. An integrated molecular and morphological study of the subfamily Suaedoideae Ulbr. (Chenopodiaceae). Plant Syst Evol 2003;239(3):257–86.
15. Voznesenskaya EV, Edwards GE, Kiirats O, et al. Development of biochemical specialization and organelle partitioning in the single-cell C4 system in leaves of Borszczowia aralocaspica (Chenopodiaceae). Am J Bot 2003;90(12):1669–80.
16. Voznesenskaya EV, Koteyeva NK, Chuong SD, et al. Differentiation of cellular and biochemical features of the single-cell C4 syndrome during leaf development in Bienertia cycloptera (Chenopodiaceae). Am J Bot 2005;92(11):1784–95.
17. Offermann S, Okita TW, Edwards GE. Resolving the compartmentation and function of C4 photosynthesis in the single-cell C4 species Bienertia sinuspersici. Plant Physiol 2011;155(4):1612–28.
18. Offermann S, Friso G, Doroshenk KA, et al. Developmental and subcellular organization of single-cell C4 photosynthesis in Bienertia sinuspersici determined by large-scale proteomics and cDNA assembly from 454 DNA sequencing. J Proteome Res 2015;14(5):2090–108.
19. Wimmer D, Bohnhorst P, Shekhar V, et al. Transit peptide elements mediate selective protein targeting to two different types of chloroplasts in the single-cell C4 species Bienertia sinuspersici. Sci Rep 2017;7:41187.
20. Jurić I, González-Pérez V, Hibberd JM, et al. Size matters for single-cell C4 photosynthesis in Bienertia. J Exp Bot 2017;68(2):255–67.
21. Stutz SS, Edwards GE, Cousins AB. Single-cell C4 photosynthesis: efficiency and acclimation of Bienertia sinuspersici to growth under low light. New Phytol 2014;202(1):220–32.
22. Lung SC, Yanagisawa M, Chuong SD. Protoplast isolation and transient gene expression in the single-cell C4 species, Bienertia sinuspersici. Plant Cell Rep 2011;30(4):473–84.
33. Wang HL, Tian CY, Wang L. Germination of dimorphic seeds of Suaeda aralocaspica in response to light and salinity conditions during and after cold stratification. PeerJ 2017;5:e3671.
34. Wang L, Wang HL, Yin L, et al. Transcriptome assembly in Suaeda aralocaspica to reveal the distinct temporal gene/miRNA alterations between the dimorphic seeds during germination. BMC Genomics 2017;18:806.
35. Edwards GE, Voznesenskaya EV. C4 photosynthesis: Kranz forms and single-cell C4 in terrestrial plants. In: Raghavendra AS, Sage RF, eds. C4 Photosynthesis and Related CO2 Concentrating Mechanisms. Dordrecht, Netherlands: Springer; 2011:29–61.
36. Sharpe RM, Offermann S. One decade after the discovery of single-cell C4 species in terrestrial plants: what did we learn about the minimal requirements of C4 photosynthesis? Photosynth Res 2014;119(1–2):169–80.
37. Badouin H, Gouzy J, Grassa CJ, et al. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 2017;546(7656):148–52.
38. Jarvis DE, Ho YS, Lightfoot DJ, et al. The genome of Chenopodium quinoa. Nature 2017;542:307.
39.
Zhang GQ, Liu KW, Li Z, et al. The Apostasia genome and the 23. Leisner CP, Cousins AB, Offermann S, et al. The effects of evolution of orchids. Nature 2017;549(7672):379–83. salinity on photosynthesis and growth of the single-cell C 40. Zhao G, Zou C, Li K, et al. The Aegilops tauschii genome reveals species Bienertia sinuspersici (Chenopodiaceae). Photosynth multiple impacts of transposons. Nat Plants 2017;3:946–55. Res 2010;106(3):201–14. 41. Hackl T, Hedrich R, Schultz J, et al. proovread: large-scale 24. Uzilday B, Ozgur R, Yalcinkaya T, et al. Changes in redox reg- high-accuracy PacBio correction through iterative short read ulation during transition from C to single cell C photosyn- consensus. Bioinformatics 2014;30(21):3004–11. 3 4 thesis in Bienertia sinuspersici. J Plant Physiol 2017;220:1–10. 42. Liu B, Shi Y, Yuan J, et al. Estimation of genomic charac- 25. Koteyeva NK, Voznesenskaya EV, Berry JO, et al. The unique teristics by analyzing k-mer frequency in de novo genome structural and biochemical development of single cell C projects. arXiv 2013:1308.2012 . photosynthesis along longitudinal leaf gradients in Bienertia 43. Li R, Zhu H, Ruan J, et al. De novo assembly of human sinuspersici and Suaeda aralocaspica (Chenopodiaceae). J Exp genomes with massively parallel short read sequencing. Bot 2016;67(9):2587–601. Genome Res 2010;20(2):265–72. 26. Rosnow J, Yerramsetty P, Berry JO, et al. Exploring mecha- 44. The-Tomato Genome Consortium. The tomato genome se- nisms linked to differentiation and function of dimorphic quence provides insights into fleshy fruit evolution. Nature chloroplasts in the single cell C4 species Bienertia sinuspersici. 2012;485:635. BMC Plant Biol 2014;14:34. 45. English AC, Richards S, Han Y, et al. Mind the gap: upgrading 27. Park J, Knoblauch M, Okita TW, et al. Structural changes genomes with Pacific Biosciences RS long-read sequencing in the vacuole and cytoskeleton are key to development technology. 
PLoS One 2012;7(11):e47768. of the two cytoplasmic domains supporting single-cell C(4) 46. Li H, Durbin R. Fast and accurate short read align- photosynthesis in Bienertia sinuspersici. Planta 2009;229(2): ment with Burrows-Wheeler transform. Bioinformatics 369–82. 2009;25(14):1754–60. 28. Lara MV, Offermann S, Smith M, et al. Leaf development in 47. Simao FA, Waterhouse RM, Ioannidis P, et al. BUSCO: assess- the single-cell C system in Bienertia sinuspersici: expression ing genome assembly and annotation completeness with of genes and peptide levels for C metabolism in relation single-copy orthologs. Bioinformatics 2015;31(19):3210–2. to chlorenchyma structure under different light conditions. 48. Grabherr MG, Haas BJ, Yassour M, et al. Full-length tran- Plant Physiol 2008;148(1):593–610. scriptome assembly from RNA-Seq data without a reference 29. Chuong SD, Franceschi VR, Edwards GE. The cytoskele- genome. Nat Biotechnol 2011;29(7):644–52. ton maintains organelle partitioning required for single-cell 49. Cantarel BL, Korf I, Robb SM, et al. MAKER: an easy-to-use C4 photosynthesis in Chenopodiaceae species. Plant Cell annotation pipeline designed for emerging model organism 2006;18(9):2207–23. genomes. Genome Res 2008;18(1):188–96. 30. Wang L, Huang Z, Baskin CC, et al. Germination of dimor- 50. Stanke M, Morgenstern B. AUGUSTUS: a web server for phic seeds of the desert annual halophyte Suaeda aralocaspica gene prediction in eukaryotes that allows user-defined con- (Chenopodiaceae), a C plant without Kranz anatomy. Ann straints. Nucleic Acids Res 2005;33(Web Server issue):W465– Bot 2008;102(5):757–69. 7. 31. Wang L, Baskin JM, Baskin CC, et al. Seed dimorphism, nutri- 51. Wang L, Wang HL, Yin L, et al. Transcriptome assem- ents and salinity differentially affect seed traits of the desert bly in Suaeda aralocaspica to reveal the distinct temporal halophyte Suaeda aralocaspica via multiple maternal effects. 
gene/miRNA alterations between the dimorphic seeds dur- BMC Plant Biol 2012;12:170. ing germination. BMC Genomics 2017;18(1):806. 32. Cao J, Lv XY, Chen L, et al. Effects of salinity on the 52. Cabili MN, Trapnell C, Goff L, et al. Integrative annota- growth, physiology and relevant gene expression of an an- tion of human large intergenic noncoding RNAs reveals nual halophyte grown from heteromorphic seeds. AoB Plants global properties and specific subclasses. Genes Dev 2011; 2015;7:plv112. 25(18):1915–27. Downloaded from https://academic.oup.com/gigascience/article-abstract/8/9/giz116/5568370 by Ed 'DeepDyve' Gillespie user on 16 October 2019 Wang et al. 9 53. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved de- 64. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tection of transfer RNA genes in genomic sequence. Nucleic tool for automated alignment trimming in large-scale phy- Acids Res 1997;25(5):955–64. logenetic analyses. Bioinformatics 2009;25(15):1972–3. 54. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of 65. Nguyen LT, Schmidt HA, von Haeseler A, et al. IQ-TREE: a fast RNA alignments. Bioinformatics 2009;25(10):1335–7. and effective stochastic algorithm for estimating maximum- 55. Iorizzo M, Senalik DA, Grzebelus D, et al. De novo assem- likelihood phylogenies. Mol Biol Evol 2015;32(1):268–74. bly and characterization of the carrot transcriptome reveals 66. Tamura K, Tao Q, Kumar S. Theoretical foundation of the Rel- novel genes, new markers, and genetic diversity. BMC Ge- Time method for estimating divergence times from variable nomics 2011;12:389. evolutionary rates. Mol Biol Evol 2018;35(7):1770–82. 56. Wang L, Yu S, Tong C, et al. Genome sequencing of the 67. Tamura K, Battistuzzi FU, Billing-Ross P, et al. Estimating high oil crop sesame provides insight into oil biosynthesis. divergence times in large molecular phylogenies. Proc Natl Genome Biol 2014;15(2):R39. Acad Sci U S A 2012;109(47):19333–8. 57. 
Tarailo-Graovac M, Chen N. Using RepeatMasker to iden- 68. Kumar S, Stecher G, Li M, et al. MEGA X: Molecular Evolution- tify repetitive elements in genomic sequences. Curr Protoc ary Genetics Analysis across computing platforms. Mol Biol Bioinform 2009:Chap 4:Unit 4.10. Evol 2018;35(6):1547–9. 58. Repbase. 2001. http://www.girinst.org/server/RepBase/index 69. Zou C, Chen A, Xiao L, et al. A high-quality genome assembly .php. Accessed 1st April 2019 of quinoa provides insights into the molecular basis of salt 59. Edgar RC, Myers EW. PILER: identification and classification bladder-based salinity tolerance and the exceptional nutri- of genomic repeats. Bioinformatics 2005;21(Suppl 1):i152–i8. tional value. Cell Res 2017;27:1327. 60. Rao SK, Fukayama H, Reiskind JB, et al. Identification of C4 70. Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo responsive genes in the facultative C4 plant Hydrilla verticil- assembly of organelle genomes from whole genome data. lata. Photosynth Res 2006;88(2):173–83. Nucleic Acids Res 2017;45(4):e18. 61. Vlasova A, Capella-Gutierrez S, Rendon-Anaya M, et al. 71. Tillich M, Lehwark P, Pellizzer T, et al. GeSeq - versatile and Genome and transcriptome analysis of the Mesoamerican accurate annotation of organelle genomes. Nucleic Acids Res common bean and the role of gene duplications in establish- 2017;45(W1):W6–W11. ing tissue and temporal specialization of genes. Genome Biol 72. Lohse M, Drechsel O, Kahlau S, et al. 2016;17:32. OrganellarGenomeDRAW–a suite of tools for generating 62. Wicker T, Sabot F, Hua-Van A, et al. A unified classification physical maps of plastid and mitochondrial genomes system for eukaryotic transposable elements. Nat Rev Genet and visualizing expression data sets. Nucleic Acids Res 2007;8(12):973–82. 2013;41(Web Server issue):W575–81. 63. Emms DM, Kelly S. OrthoFinder: solving fundamental 73. Wang L, Ma G, Wang H, et al. 
Supporting data for “A draft biases in whole genome comparisons dramatically im- genome assembly of halophyte Suaeda aralocaspica, a plant proves orthogroup inference accuracy. Genome Biol 2015; that performs C4 photosynthesis within individual cells.” Gi- 16:157. gaScience Database 2019. http://dx.doi.org/10.5524/100646.
GigaScience – Oxford University Press
Published: Sep 1, 2019