Background: Parasitoid wasps are well-known natural enemies of major agricultural pests and arthropod borne diseases. The parasitoid wasp Macrocentrus cingulum (Hymenoptera: Braconidae) has been widely used to control the notorious insect pests Ostrinia furnacalis (Asian Corn Borer) and O. nubilalis (European corn borer). One striking phenomenon exhibited by M. cingulum is polyembryony, the formation of multiple genetically identical offspring from a single zygote. Moreover, M. cingulum employs a passive parasitic strategy by preventing the host’s immune system from recognizing the embryo as a foreign body. Thus, the embryos evade the host’s immune system and are not encapsulated by host hemocytes. Unfortunately, the mechanism of both polyembryony and immune evasion remains largely unknown. Results: We report the genome of the parasitoid wasp M. cingulum. Comparative genomics analysis of M. cingulum and other 11 insects were conducted, finding some gene families with apparent expansion or contraction which might be linked to the parasitic behaviors or polyembryony of M. cingulum. Moreover, we present the evidence that the microRNA miR-14b regulates the polyembryonic development of M. cingulum by targeting the c-Myc Promoter-binding Protein 1 (MBP-1), histone-lysine N-methyltransferase 2E (KMT2E) and segmentation protein Runt. In addition, Hemomucin, an O-glycosylated transmembrane protein, protects the endoparasitoid wasp larvae from being encapsulated by host hemocytes. Motif and domain analysis showed that only the hemomucin in two endoparasitoids, M. cingulum and Venturia canescens, possessing the ability of passive immune evasion has intact mucin domain and similar O-glycosylation patterns, indicating that the hemomucin is a key factor modulating the immune evasion. Conclusions: The microRNA miR-14b participates in the regulation of polyembryonic development, and the O-glycosylation of the mucin domain in the hemomucin confers the passive immune evasion in this wasp. These key findings provide new insights into the polyembryony and immune evasion. Keywords: Macrocentrus cingulum, Genome, Polyembryony, Immune evasion, Comparative genomics * Correspondence: email@example.com; firstname.lastname@example.org; email@example.com Chuanlin Yin, Meizhen Li and Jian Hu contributed equally to this work. State Key Laboratory of Biocontrol, Sun Yat-sen University, 135 Xingang Road West, Guangzhou 510275, China Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Science, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Yin et al. BMC Genomics (2018) 19:420 Page 2 of 18 Background infected by other organisms . However, in seeming Parasitoid wasps are a group of hymenopteran insects contrast with these distinctly antagonistic parasitic tactics, that parasitize the eggs, larvae or pupae of other arthro- M. cingulum employs a more passive parasitic strategy pods . These wasps differ from other parasitic organ- (Fig. 1). Parasitism substantially retards development isms because they kill their host, and the adult wasps are of the Ostrinia host, ultimately allowing the adult wasps free-living. Because of their host specificity and lethality, to emerge and kill the host before it pupates (Fig. 1a). parasitoid wasps provide an important and effective While growing in the host hemolymph, M. cingulum em- strategy for the biological control of agricultural pests, bryos express surface features that prevent the host’sim- thus reducing the need for chemical pesticides [2, 3]. mune system from recognizing the embryo as a foreign Additionally, short generation times, ease of rearing, and body, thus the embryos evade the host’s immune system interfertile species are characteristics that make at least and are not encapsulated by host hemocytes (Fig. 1b). some parasitoid wasps highly tractable genetic model Available evidence suggests Hemomucin, an O-glycosylated systems . This is exemplified by Nasonia vitripennis, transmembrane protein on the egg and embryo’ssurface of which was the first parasitoid wasp genome to be se- M. cingulum, may protect the endoparasitoid wasp from quenced, laying the foundation for genomic research in being encapsulated by host hemocytes [13, 14]. When the this economically and ecologically significant group of expression of hemomucin was decreased by RNAi, more insects . embryos were encapsulated relative to the control . One striking phenomenon exhibited by numerous Similar results were observed after the embryos were parasitoid wasp species is polyembryony, the formation digested by O-glycosidase, which may specifically digest of multiple genetically identical offspring from a single β-gal (1→ 3) linkages between GalNAc and Ser/Thr of the zygote. Polyembryony appears in only some parasitic mucin domain in hemomucin, indicating the important species within four families of Hymenoptera and one role of the mucin domain in hemomucin . Despite species of Strepsiptera in insects, and is believed to have these initial insights, the mechanism of immune evasion re- evolved independently at least four times among parasit- mains largely unknown. oid wasps . This includes the endoparasitic wasp With the aim of further investigations of polyembry- Macrocentrus cingulum, for which some details of ony and immune evasion, along with other studies of polyembryonic development have been described [6, 7]. wasp biology, we present the draft genome of M. M. cingulum usually deposits egg(s) into the larval cingulum. Comparative genomic analyses indicated that hemocoel of corn borer, Ostrinia furnacalis or O. nubi- parasitic behavior has shaped the endoparasitoid wasp lalis. Subsequently, the egg cleaves into several dozens genome and induced significant gene gain and loss asso- of primary embryonic cells, one of which may further ciated with parasitism. RNA-Seq analysis of the primary develop into a morula containing dozens of secondary and secondary embryonic cells revealed that microRNA embryonic embryos (SEE). SEE may develop into an em- (miRNA) miR-14b was abundant in the primary embry- bryo, which developed into a larva, or a pseudogerm, onic cell. Target prediction and dual-luciferase assay val- which was finally consumed by hatched larvae . Pro- idation suggested that miR-14b plays critical roles in liferation of embryos are mainly related to the egg cleav- polyembryonic development. Comparative analyses of age and the formation of morula. Notably, these wasp hemomucin genes found that the presence of a meticulous physical observations of polyembryonic de- mucin domain and its glycosylation was positively asso- velopment have not yet been complemented with mo- ciated with immune evasion in parasitoid wasps. lecular analyses. Thus, there remains tremendous opportunities for investigating the molecular mecha- Results nisms underlying the developmental complexity of Genome sequencing, assembly and annotation polyembryony. We sequenced the genome of M. cingulum from ~ 1000 Beyond polyembryony, parasitoid wasps exhibit a range male wasps of an inbred strain which was maintained by of other distinct and noteworthy traits that evolved as sibling mating for five generations. Three paired-end li- strategies to manipulate their host. For instance, many braries with different insert sizes (170 bp, 500 bp, species introduce various parasitoid-associated factors 800 bp) and one mate-pair libraries with insert size 8 Kb such as venom and polydnaviruses (PDV) into the host (Additional file 1: Table S1), were constructed and se- during oviposition to block the host immune re- quenced using Illumina HiSeq 2000 platform, yielding sponses [9, 10]. Another example is the production of 103.67 Gb of raw data. After removing the low-quality teratocytes from cellular membranes after the eggs reads, 93.24 Gb of clean data were obtained, covering ~ hatched. The teratocytes are released into the hemolymph 690 X of M. cingulum genome which was estimated to be of the host to inhibit melanization and to produce 135 Mb by 17 K-mer analysis (Additional file 1:Figure S1, anti-microbial peptides to protect the host from being Additional file 1: Table S2) and flow cytometry Yin et al. BMC Genomics (2018) 19:420 Page 3 of 18 Fig. 1 The life cycle and living strategies of M. cingulum. a The wasp M. cingulum parasitizes the larvae of Asian corn borer Ostrinia furnacalis at late stage of the 4th instar. The wasp develops in host hemocoel and grows into adult 12 days later. After parasitism, the host expands their larval stagefrom 5 to 8daysto12–13 days, and never pupates. Eventually, the wasp larvae creep out of the corpse of host and cocoon on it, which has been consumed by wasp larvae. b The wasp M. cingulum evades the encapsulation of host hemocytes by Hemomucin. Theimmuneevasion of M. cingulum is related to the O-glycosylation of the transmembrane protein Hemomucin located on the surface of egg and embryos. c The polyembryonic development of M. cingulum.One egg of M. cingulum can proliferate into multiple genetically identical offspring. The wasp egg firstly cleaves into dozens of primary embryonic cells, which split within the extraembryonic membrane and grows into morulae containing dozens of secondary embryonic cells, which differentiated into two castes: embryo and pseudogerm. Embryo is composed of embryonic primordium and syncytial extraembryonic membrane, and eventually develops into wasp larva. Pseudogerm is syncytium and is eaten by the wasp larva at the late stage as compensatory nutrition (Additional file 1: Figure S2). The heterozygosity rate genomes (Additional file 1:FigureS3).Weevaluated of thegenomewas estimatedtobe0.2%, whichwas the genome assembly using Benchmarking Universal consistent with previous reports from other wasps Single-Copy Orthologs (BUSCO v3) [17, 18], which and showed that Hymenoptera species generally have identified 99.5% of 1658 conserved arthropod genes, low heterozygosity . of which 98. 9% (1639) are full length, indicating that We used ABySS Ver2.0.2  to de novo assemble the the genome assembly contained almost all gene infor- genome, achieving a draft genome of 132 Mb with GC mation and was suitable for subsequent analyses. content of 35.66% (Table 1, Additional file 1: Table S3). We annotated 145,038 repeat sequences (29 Mb), The Contig N50 of the genome assembly was 64.9 Kb, constituting 21.9% of the M. cingulum genome. Only nine which is among the largest of all published insect repeat sequence families were identified in M. cingulum Yin et al. BMC Genomics (2018) 19:420 Page 4 of 18 Table 1 Feature of assembled genome and Gene Sets Apis mellifera. The short intron length in M. cingulum was consistent with its small genome size. However, the Features Macrocentrus Nasonia Ceratosolen cingulum vitripennis solmsi average CDS length is similar among three hymenop- Genome size 132.36 295 278 terans, 1536 bp in M. cingulum, 1260 bp in A. mellifera (Mb) and 1289 bp in N. vitripennis, suggesting that the CDS Number of 13,289 26,605 15,018 lengths were conserved whereas the intron lengths contigs might be variable in closely related species (Additional Number of 5696 6181 7397 file 1: Table S8). Scaffolds We identified noncoding RNA by homologous search Quality Control (Covered by assembly) against the Rfam database  (E-value <=1e-5), yielding Contig N50 64,884 18,500 74,395 16 small nucleolar RNA (snoRNA), 39 small nuclear RNA (bp) (snRNA), 144 transfer RNA (tRNA) and 148 ribosome Scaffold N50 192.445 709.00 9558 RNA (rRNA). We predicted 111 miRNA using the soft- (Kb) ware MapMi  with the Hexapoda miRNAs in the miR- BUSCO genes 99.45 95.96 60.19 base as the reference (Additional file 1: Table S7) . (%) Genomic Features Repeat (%) 24.9 42.1 – Genome-based phylogeny of M. cingulum G + C (%) 35.66 40.6 – We carried out orthologous and homologous group analysis of proteins from 13 species, including six hyme- Gene Annotation nopterans, two dipterans, two lepidopterans, one coleop- Number of 11,993 18,882 11,412 Genes teran, one hemipteran, and one mite species, finding 2957 single-copy orthologs and 1174 multi-copies ortho- by RepeatModeler . To date, M. cingulum ranks the logs (Fig. 2a). The genome of the mite Tetranychus third smallest known insect genome assemblies, only lar- urticae was used as the outgroup. We also found 2028 ger than 108 Mb of the human body louse Prediculus hymenopteran-specific genes and 5206 insect-specific humanus humanus  and 89.6 Mbp of the midge genes. We performed a phylogenomic analysis of 589 Belgica antarctica . Corresponding to its small size, single copy orthologous genes with 128,070 conserved the M. cingulum genome only contains 15.6% known sites using maximum likelihood method as implemented transposable elements and 1.8% tandem repeats. In con- in RAXML . The phylogenetic tree of these 13 spe- trast, another parasitic wasp N. vitripennis had a greater cies showed that M. cingulum and the other 5 wasps/ abundance of transposable elements and repetitive DNA ants formed a hymenoptera cluster. Unexpectedly, M. (> 30%) . N. vitripennis also has a larger genome cingulum had a close relation with bees/ants rather than (295 Mb) than M. cingulum, suggesting that the percent- with the ectoparasitic wasp N. vitripennis in the family ages of repeat elements is the main factor affecting the of Pteromalidae. A. mellifera diverged from Solenopsis genome size (Additional file 1: Table S6). invicta approximately 177 million years ago, whereas M. We used the Optimized Maker-based Insect Genome cingulum separated from them approximately 253 mil- Annotation (OMIGA) pipeline  to annotate protein lion years ago (Fig. 2a). Homology analysis also indicated coding genes in the M. cingulum genome. The OMIGA that M. cingulum shared 5122 homologous genes with pipeline included training the de novo prediction soft- A. mellifera, higher than 5045 with N. vitripennis and ware AUGUSTUS and SNAP with 5036 full-length M. 4969 with Copidosoma floridanum (Fig. 2b). Moreover, cingulum genes that were obtained by analyzing the M. cingulum shared higher amino acid identity with A. transcriptomes of embryos, larvae, adult male and fe- mellifera compared With N. vitripennis or C. floridanum male wasps. After integrating evidence from RNA-Seq, (Fig. 2c). Only 26.8% of the orthologous pairs between de novo prediction and homolog protein alignment, an M. cingulum and A. mellifera shared a consensus official gene set (OGS) of 11,993 genes for M. cingulum gene order, suggesting that frequent occurrence of was obtained. 98.6% of OGSs had the expression evi- chromosomal rearrangement after divergence (Fig. dence in at least one sample. In total, 96.86% of the in- 2d). Taken all these evidences together, genome-based ferred proteins found the homology sequences in the phylogenetic analysis showed that the M. cingulum databases of NCBI NR, SWISS-PROT and InterPro clustered together with bees/ants, rather than other (Table 1 and Additional file 1: Table S7). parasitoid wasps such as N. vitripennis and C. florida- The average intron length in the M. cingulum genome num, which is consistent with a previous Bayesian was only 385 bp, much less than other hymenopteran in- phylogenetic analysis with a conserved gene family sects such as 1285 bp in N. vitripennis and 1291 bp in . We noticed that this result is not consistent Yin et al. BMC Genomics (2018) 19:420 Page 5 of 18 Fig. 2 Phylogenetic and comparative genomics analysis of M. cingulum genome. a The phylogenetic tree was constructed from 532 single-copy genes with 47,606 reliable sites by RAxML maximum likelihood methods. Bars are subdivided to represent different types of orthologs’ relationships. 1:1:1 indicates universal single-copy genes, absence and/or duplication in a single genome was allowed. N: N: N indicates other universal genes, absence in a single genome or two genomes within the different orders was allowed. Diptera indicates dipteran-specific genes. Lepidoptera indicates lepidopteran-specific genes. Hymenoptera indicates hymenopteran-specific. Others indicates all other orthologs. SD, species-specific duplicated genes. ND, species-specific genes. b The Venn diagram indicates the numbers of orthologous genes shared among the four hymenopterans, M. cingulum, C. floridanum, A. mellifera and N. vitripennis. All four hymenoptera insects share 4082 common homologous genes. The data shows the completeness of the gene repertoire encoded in the wasp M. cingulum genome. c The distribution of pairwise amino acid identity between six hymenoptera insects. Histogram shows M. cingulum shares higher amino acid identity with A. melifera than with other hymenopterans. There were 6796 orthologs between A. melifera and S. invicta, 4452 orthologs between C. floridanum and C. solmsi, 5137 orthologs between C. floridanum and N. vitripennis, 6403 orthologs between M. cingulum and A. melifera, 4196 orthologs between M. cingulum and C. floridanum, 4872 orthologs between M. cingulum and C. solmsi, 5808 orthologs between M. cingulum and N. vitripennis, 5616 orthologs between M. cingulum and S. invicta, 5790 orthologs between N. vitripennis and C. solmsi. d Microsynteny between three wasps by tracking gene positions through multiple species. The M. cingulum and A. mellifera shared 7284 orthologous, but only 1950 genes constituted 351 synteny blocks. The M. cingulum and N. vitripennis shared 6809 orthologous genes, but only 1877 genes constituted 346 synteny blocks, suggesting the frequent chromosome rearrangement among the wasp species with arecentreportonthe evolutionofthe Hymen- other parasitoid wasps (p < 0.05, student t-test), such as optera . It remains possible that the sample bias odorant binding proteins (OBP), P450 enzymes, Gluta- in our work resulted in this inconsistence since we thione S-transferases (GST) (Additional file 1: Table S13 did not choose much more hymenopteran species. and S14, Additional file 1: Figure S8-S16). The contrac- However, both genome-based phylogeny and hom- tion of OBP gene family could be related to the parasit- ology analysis indicated that M. cingulum is closer to oid lifestyle of M. cingulum because the parasitoid wasps bees/ants rather than parasitoid wasps. This inconsist- obtain nutrients from the host. As such, it would be un- ency is worthy of further clarification. necessary for them to detect as many odorants as non-parasitic insects since they have a safer environment The genomic features associated with parasitism and abundant food sources (Additional file 1:Figure S12). We identified gene families with apparent expansions or It has been reported that OBP genes in another poly- contractions in M. cingulum by a comparison with other embryonic wasp C. floridanum showed caste-specific five hymenopterans (including two parasitoid wasps), functions. The majority of OBP genes might have the two dipterans, two lepidopterans, one coleopteran, and functions in contacting with host hemolymph . The one hemipteran (Fig. 3). We observed some gene fam- decrease of P450 and GSTs gene families suggests the de- ilies were significantly contracted in M. cingulum and toxification ability of the parasitoid wasp has degenerated, Yin et al. BMC Genomics (2018) 19:420 Page 6 of 18 Fig. 3 Gene gain and loss associated with parasitism. a Number of gene families with apparent expansion/contraction among M. cingulum and 12 other species. M. cingulum has a significant higher number of gene contractions than expansions. In contrast, C. floridanum has more events of gene expansions. b Gene families with significant contraction or expansion. Noticeably, the gene families of OBP, P450, CCN, CDC, CDK, CASP, CBFA2T, CIT, CLASP, COX, DNAJ were contracted, whereas the gene families of CHEK, DNCH1, DLG1, NDUF and VDP were expanded in M. cingulum. c Numbers of venom proteins in different parasitic wasps. M. cingulum has few venom proteins since it has the feature of immune evasion and does not rely on the venom proteins to condition the host again perhaps because the host protects the parasitoid lar- generate dozens of secondary embryonic cells without vae from the external environment. The centromere pro- differentiation . Though the polyembryony is a strik- tein and cyclin-dependent kinase have essential roles in ing reproductive strategy, its molecular mechanisms cell cycle and cell division. These gene families were also remain largely unknown. The gene family analysis also indi- significantly contracted in M. cingulum (p < 0.05, student cated M. cingulum’s genomic features associated with poly- t-test). The driving forces leading to the contraction of embryony. Euchromatic histone-lysine N-methyltransferase these gene families remain unclear. In addition, the venom (EHMT) and Golgi apparatus protein (GLG) gene families proteins were significantly reduced in the M. cingulum, are significantly expanded. EHMT have been implicated in which is consistent with the fact that this endoparasitoid both maintenance of pluripotency and stabilization of a dif- wasp does not rely on the venom proteins to control the ferentiated cell identity [30, 31]. GLG interacts selectively host (Fig. 3c). Some insect immune genes are also and non-covalently with fibroblast growth factors that are contracted in M. cingulum (Additional file 1: Table S15). key players in the processes of proliferation and differenti- We found that the expansion of some gene families is as- ation of wide variety of cells and tissues . Some ortholo- sociated with the parasitoid life. Several gene families that gous groups specifically expanded in the two polyembryony are critical in vesicular trafficking pathway were expanded, wasps, such as vascular endothelial growth factor receptor including ADP ribosylation factors (ARF) and cytoplasmic which has been reported to control intestinal stem cell pro- dynein. These expansions could facilitate the parasitoids liferation . to better absorb the nutrients from the host. Immediately upon oviposition, the Macrocentrus egg initiates cleavage and then enters the proliferation phase. The gene family expansions associated with polyembryony During this stage, one single egg produces dozens of M. cingulum has a striking characteristic of polyembry- normal embryos and thousands of pseudogerms. The ony. During the proliferation phase of the embryo devel- normal embryos produce embryonic primordia that de- opment in M. cingulum, the blastomeres divide and velop into larvae while the pseudogerms do not. To Yin et al. BMC Genomics (2018) 19:420 Page 7 of 18 Fig. 4 The differential gene expression analysis between embryos and pseudogerms of M. cingulum. a Heatmap of differentially expressed genes in embryos and pseudogerms. The signal transduction pathways were enriched in the embryos (b) whereas the pathways of carbohydrate metabolism were enriched in the pseudogerms (d). The enriched pathways were consistent with the fate and function of embryos (cell cycle, FOXO, Wnt, Hedgehog, etc.) (c) and pseudogerms (hormone biosynthesis, inositol phosphate metabolism, amino sugar and nucleotide sugar metabolism, etc.) (e) identify the genes modulating polyembryonic develop- highly-expressed genes between these two types of em- ment in M. cingulum, we performed RNA-Seq analysis bryonic cells clearly reflects their distinct functions in of different types of embryos and compared the tran- embryonic development (Additional file 1: Table S11). script abundances of two types of embryos (normal em- bryos and pseudogerms) (Fig. 4a). In normal embryos, The microRNA miR-14b modulates the polyembryony in the most abundant transcripts were from genes associ- M. cingulum ated with cell growth and development, nucleotide and We also compared miRNA abundance between morulae, amino acid metabolism, DNA replication, purine and embryos, and pseudogerm. Pseudogerm is a syncytium, pyrimidine metabolism (Fig. 4b and c). In contrast, for and it does not contain an embryonic primordium while the pseudogerms, a kind of abnormal embryo cells origi- theembryodoes. ThemiR-14b stoodout as having nated from the secondary embryonic cells and mainly high-expression levels specifically in the primary morula, provides protection and nutrients to the embryo, abun- the early embryos at the proliferation phase, compared to dant transcripts were from genes associated with amino normal embryos (p < 0.05, student t-test) (Additional file 1: sugar and nucleotide sugar metabolism, galactose metab- TableS12). miR-14 is aconserved insect-specificmiRNA.It olism, inositol phosphate metabolism and insect hor- was found to be an important regulator in insulin produc- mone biosynthesis (Fig. 4d and e). The differences in tion and metabolism , hedgehog pathway , Yin et al. BMC Genomics (2018) 19:420 Page 8 of 18 apoptosis , insect development and metamorphosis the constructs were transfected into HEK293T cells. . miR-14b shares identical seed region with miR-14 and Compared to the negative controls, the luciferase re- has 3 different nucleotides near 3′ end compared with porter activity of three positive constructs was signifi- miR-14. Quantitative real time PCR (qPCR) further con- cantly reduced in the presence of miR-14b mimics and firmed the high expression of miR-14b specifically in mor- the mutation of binding sites abolished the repressions ulae but not in normal embryos (Additional file 1:Figure (Fig. 5a, b, and c). These results confirmed that S17). We predicted the targets of miR-14b using five differ- miR-14b targets on the MBP-1, KMT2E and Runt. ent software packages: miRanda, RNAhybrid, RNA22, PITA The binding sites of the miRNA in the target genes and TargetScan. Taking the intersection of results from all are shown in Fig. 5d. five packages, three genes were consistently predicted to be The first target gene MBP-1 acts as a negative regula- the target of miR-14b: c-Myc Promoter-binding Protein 1 tory factor for the transcription factor c-Myc, which (MBP-1), histone-lysine N-methyltransferase 2E (KMT2E) drives cell proliferation and regulates stem cell and segmentation protein Runt. self-renewal . miR-14b might control the MBP-1 and We performed a dual-luciferase assay to confirm the thus stimulate the c-Myc to maintain the embryo cells to interactions between the miR-14b and three predicted be pluripotent and to produce multiple normal embryos target genes. The miR-14b binding regions on three tar- . The second target gene KMT2E specifically mono- get genes (3’UTRs of KMT2E and Runt, the CDS region and di-methylates Lys-4 of histone H3 and thus regu- of MBP-1) were introduced into the pMIR-REPORT vec- lates the expression of the downstream genes. KMT2E tor at the downstream of a firefly luciferase gene. For acts as an important cell cycle regulator, participating in the negative control, mutations were introduced into the cell cycle regulatory network . The third target gene target genes to abolish the miRNA target sites. The con- Runt is a vital transcriptional regulator which regulates structs that did not contain the miR-14b binding sites of the segmentation in the development. Runt plays an the target genes were used as the negative control. All essential role in modulating the formation of classic Fig. 5 miR-14b modulate the polyembryony by targeting the RUNT, MBP-1 and KMT2E in M. cingulum. a-c Dual-luciferase reporter assays were performed to examine the interaction of mci-miR-14b and its target sequences and corresponding mutated targeting sequences (RUNT 3’UTR-Mt, MBP-1 CDS-Mt, KMT2E 3’UTR-Mt) cloned into the 3’UTR of the reporter gene. Signals from the reporters with the 3’UTR of RUNT, MBP-1, KMT2E were significantly knocked down in the presence of the mci-miR-14b mimics (p < 0.0001). Introducing the mutations at the target sites abolish the silence effects. ** indicates 0.001 < p < 0.05, *** indicates p < 0.001. d The target sites of mci-miR-14b appears in the 3’UTR of RUNT and KMT2E whereas in the CDS of MBP-1 Yin et al. BMC Genomics (2018) 19:420 Page 9 of 18 developmental patterns . Downregulation of Runt by expanded, which may facilitate the transfer of nucleic miR-14b might prevent the transition from cell differen- acids, amino acids and other nutrients from the host tiation to cell proliferation phases of development. hemolymph to the wasp larvae. We did not find any sig- nificant evidence for contraction or expansion of genes O-glycosylation of the mucin domain in hemomucin associated with circadian rhythm, immune system and confers the immune evasion digestion. Changes in these gene families might be ex- Most endoparasitoids actively inhibit the immune re- pected because the endoparasitoid wasp larvae that res- sponses of the host by multiple parasitic factors such ide in the dark interior of the host hemocoel, are as PDV, teratocytes and venom. However, some endo- carnivore, and are typically free of bacterial infections. parasitoid wasps, such as M. cingulum, successfully However, the conserved repertoire for these gene fam- evade the encapsulation of host hemocytes by deploy- ilies may simply be due to the fact that the endoparasi- ing hemomucin, an O-glycosylated protein. In M. cin- toid wasp adults are free living and routinely face the gulum, the hemomucin protein (McHEM) was shown fully complexity of an external environment. to be highly expressed on the extraembryonic mem- The genome assembly of M. cingulum created an un- brane of eggs and embryos. Either blocking with precedented opportunity to investigate the molecular anti-McHEM serum or digestion with O-glycosidase mechanisms underlying polyembryony, the most unusual led to the encapsulation of M. cingulum embryos by and interesting characteristics of this endoparasitoid the host . To better understand the genetic fea- wasp. The adaptive significance of this peculiar develop- tures of immune evasion, we searched the genomes mental mode is not yet very well understood, but one or transcriptomes of eight endoparasitoid wasp spe- leading hypothesis suggests it is a means to increase re- cies, and results showed that hemomucin exists in all productive output when females are constrained by egg wasps (Additional file 1: Table S10). Only M. cingulum production (due to very small body size) or oviposition and Venturia canescens hemomucin proteins contain a opportunities (due to low host abundances) . This in- mucin domain with the typical stretches of poly-threonine cludes the production of two different offspring castes, separated by proline residues (Fig. 6a). Both M. which may improve the survival rate of M. cingulum lar- cingulum and V. canescens passively evade host cellular vae. In polyembryonic development, one egg divides into immune reactions [14, 43, 44]. In V. canescens, the multiple pluripotent embryos, each of which develops hemomucin-lipophorin complex is thought to protect the into an individual. The gene regulating the pluripotency parasitoid egg by camouflaging the egg surface with host of embryos at cell proliferation stage remains unclear. proteins during the initial contact with hemolymph. In To this end, our analysis of miRNAs associates miR-14b contrast, the other six larval parasitoid wasps we exam- with polyembryonic development, specifically by target- ined employ multiple parasitic factors such as PDV and ing regulatory genes MBP-1, KMT2E and Runt. Previous teratocytes to actively suppress the immune reaction of reports have shown that downregulation of these three host (Fig. 6b). Prediction of O-glycosylation sites in target genes are essential to maintain the pluripotency of the mucin domain showed that hemomucin of these two stem cell [39, 40, 47]. miR-14b is abundant in the prolif- endoparasitoid wasps had similar O-glycosylation pat- eration phase during which multiple embryo cells are terns (Fig. 6c). These pieces of evidence suggest that generated from single egg. Once the embryos and pseu- O-glycosylation in the mucin domain should be a key dogerms begin development and differentiation, expres- factor modulating the immune evasion in M. cingulum. sion level of miR-14b is much lower, suggesting it is sensitive and efficient during this regulatory process. Discussion The totality of available evidence collected here strongly The comparative genomics analyses of the parasitoid implicates miR-14b as a factor in modulating polyembry- wasp genomes revealed that the parasitoid lifestyle ony in M. cingulum. It has been reported that some long seems to have shaped the M. cingulum genome, includ- non-coding RNA were specifically expressed in either ing significant gene family contractions of OBP and the cleavage or the subsequent primary morula stages, metabolic detoxification enzymes (P450 and GST). implying noncoding RNA might play important roles in Because the larvae residing in the host hemolymph are polyembryogenesis . protected from the external complex environments, the Suppression or avoidance of host immune response is contractions of OBP, P450 and GST are common fea- a key element of wasp biology. Active suppression strat- tures of parasitoid wasps [5, 46]. We also observed a de- egies, such as teratocytes or injecting PDVs or VLPs, crease of the number of venom proteins in M. cingulum, have generally received the bulk of attentions in recent which probably occurred because this endoparasitoid research. Despite receiving less attention from researchers, does not rely on its venom to condition the host. In con- passive immune-evasion strategies, as employed by M. trast, the ARF and cytoplasmic dynein were significantly cingulum, are also common among parasitoid wasps. Yin et al. BMC Genomics (2018) 19:420 Page 10 of 18 Fig. 6 (See legend on next page.) Yin et al. BMC Genomics (2018) 19:420 Page 11 of 18 (See figure on previous page.) Fig. 6 The phylogenetic relationships of insect hemomucin genes and the associations of mucin domain with immune evasion in M. cingulum. a the hemomucin genes of 69 insects were collected from the NCBI GenBank. Among which, 15 insects have two hemomucin genes, one with intact mucin domain and the other without mucin domain. Thirty-two insects have only one hemomucin gene with mucin domain whereas 22 insects have only one hemomucin gene without mucin domain. Phylogenetic analysis indicated that some wasps lost the mucin domain such as Cotesia vestalis and Microplitis demolitor whereas other wasps have mucin domain. b For the larval parasitoids, we found that only two wasps (M. cingulum and V. canescens) with immune evasion have the mucin domain whereas other wasps with PDV or teratocytes (or both) do not have. c The mucin domains have abundant O-glycosylation sites. The O-glycosylation status of the mucin domain involved in conferring the immune evasion in M. cingulum (unpublished data) Hemomucin is currently the only protein reported to play Sun Yat-sen University. We used a whole genome shotgun an important role in evading host cellular immune reac- strategy and the next-generation sequencing technologies tions. Our previous study proved that after the connection on the Illumina HiSeq 2000 platform to sequence the gen- of the sugar chain (Gal-GalNAc) and peptide (S/T) in ome of M. cingulum. DNA was extracted from pooled hemomucin were digested by O-glycosidase, more M. cin- adult males. To decrease the risk of non-randomness, we gulum’s embryos were encapsulated compared to control built different insert sizes libraries. Three paired-end se- . We further proved that the recombinant unglycosy- quencing libraries of M. cingulum (170 bp, 500 bp and lated hemomucin could not protect the sephadex A-25 800 bp) and 1 mate-pair large insert size sequencing beads mimicked parasitoid egg from being encapsulated library (8 Kb) were constructed respectively. In total, we by hemocytes of O. furnacalis, however, the recombinant got 103.67 G raw data for M. cingulum. After filtering out glycosylated hemomucin could (unpublished data). Here, low quality and duplicated reads, 93.24 G data were main- we found that only hemomucin in parasitoid taken passive tained for assembly for M. cingulum (Additional file 1: immune evasion strategies possessed the typical mucin Table S1). domain (Fig. 6). So, we speculated that when parasitoids gain more active factors such as PDVs, they do not rely on Transcriptome sequencing the protection of hemomucin, so the mucin domain were The transcriptomes of the morula, embryo, pseudogerm, lost during evolution. The immune evasion function larva, female and male adult of M. cingulum were se- of mucin has been proved in other parasites such as quenced using Illumina 2000 platform with paired-end Trypanosoma cruzi and virus[50, 51], indicating library. Total RNA was isolated from the sample of mucin domain in glycoprotein, which located on the mixed embryos including morula and embryo, embryo, parasite’s surface, should play vital role during the pseudogerm, larva, female adult and male adult of M. co-evolution of parasite and host. However, further cingulum using TRIzol kit following the manufacturer’s studies are needed to verify this hypothesis. protocol (Life Technologies, USA). RNA sequencing li- braries were constructed using Illumina mRNA-Seq Prep Conclusions Kit. Oligo (dT) magnetic beads were used to purify poly We obtained the draft genome of parasitoid wasp M. (A) containing mRNA molecules. The mRNA was frag- cingulum. Comparative genomics analysis of 12 insects mented and the first strand cDNA was synthesized by including six wasps indicated that parasitoid life has reverse transcription using a random primer. The shaped the wasp genomes, resulted in the expansion or second-strand cDNA was synthesized with DNA poly- contraction of some gene families associated with chem- merase I to produce double-stranded cDNA fragments. ical communication, detoxification, development, cell The double stranded cDNA was end-repaired using cycle, etc. Based on a series of analyses of M. cingulum’s Klenow and T4 DNA polymerases. After ligation to genome and transcriptome, we found that the miR-14b paired-end sequencing adapters, gel electrophoresis was might play a key role in regulating polyembryonic devel- used for size selection. Finally, the library preparation opment of this parasitoid wasp. Only hemomucin in endo- was completed by PCR amplification and the libraries parasitoid wasps taking passive immune evasion strategies were sequenced using Illumina HiSeq 2000 platform possessed the typical mucin domain. Further analysis sug- (101 bp at each end). gested that the glycosylation of the mucin domain in these wasps might confer the immune evasion. Estimation of genome size Genome size determinations were performed with flow Methods cytometry following the procedures described in Hare & Genome sequencing Johnston  with some modifications. A whole body of Animal experiments were approved by the Molecular M. cingulum standard (1C = 175 Mbp) were placed into Ecology and Pest Control lab, School of Life Sciences, 1 ml of Galbraith buffer in a 2-ml Kontes Dounce Yin et al. BMC Genomics (2018) 19:420 Page 12 of 18 homogenizer tube and stroked 15 times with the A pes- The final assembly comprised 5696 scaffolds spanning tle to release nuclei from both the samples. The result- 132 Mb (129 Mb excluding gaps) with scaffold N50 of ant solution was filtered through 40 U nylon mesh, 192 Kb. The largest scaffold spans 1.37 Mb, and the stained a minimum of 20 min in the dark with 25 μlof genome-wide GC content was 35.66% (Additional file 1: propidium iodide, and then run on a Partec Cyflow cyt- Table S3). We used 500 bp non-overlapped window to ometer to score relative red fluorescence (> 590 nm) of analysis the distribution of GC content and CpG ratio nuclei from the sample and standard. The amount of by an in-house script (Additional file 1: Figure S3). CpG DNA in the sample was determined as the mean channel ratio is defined as P[CpG]/(P[C]*P[G]). P[C] is the fre- number of the 2C peak of the sample divided by the quency of C nucleotide and P[G] is the frequency of G mean channel number of the 2C peak of the standard nucleotides, while P[CpG] is the frequency of CpG times the amount of DNA in the standard. All DNA esti- dinucleotides. mates were determined from a co-preparation of sample and (internal) standard. The position of the sample peak Genome quality assessment relative to that of the other peaks was established by a sin- Several statistics are used to describe the completeness gle run with the sample or (external) standard prepared and contiguity of a genome assembly, and by far the and stained individually. M. cingulum males (161 Mb, 1C most important and widely used software is BUSCO peak is channel 82.00). M. cingulum females (314 Mb, 2C [17, 18](version3.0). To runBUSCO software,we peak is channel 160.00) (Additional file 1: Figure S1). selected the inecta db 9 data sets which contains Genome size was also estimated by K-mer analysis. 1658 benchmarking universal single-copy orthologous The distribution of K-mer depends on the characteristics genesasthe library.The BUSCOsoftwarewas per- of the genome and follow a Poisson’s distribution . A formed with default parameters. K-mer refers to an artificial sequence division of K nu- To compare the quality of the M. cingulum genome cleotides iteratively from sequencing reads. To obtain in- with other species, we collected all the published insect dependent estimates of genome size and repeat content genomes (Additional file 1: Table S4), and used the same we used the software jellyfish (version 1.1.4) to generate parameters and procedures to assess them. The results k-mer spectra of original raw sequencing data . The proved that the genome assembly of M. cingulum had a size of K-mer was set to 17 in this study. Short insert high quality (Additional file 1: Table S5). size libraries (170 bp and 500 bp) were sequenced and used to estimate genome size. Based on this method- Genome annotation ology, the genome size of M. cingulum, was estimated to The M. cingulum genome was annotated by the OMIGA be 135 Mb (Additional file 1: Figure S2).  genome annotation pipeline, which is an optimized Maker-based insect genome annotation workflow. Genome assembly Identifying repeat sequences In the pipeline, the first step is to identifying repeat se- 1) Genomic paired and mate-pair reads were quality quences, because repeats complicate genome annotation trimmed and filtered as described in Rödelsperger . Tandem Repeats Finder (TRF) was used to search et al. 2014 . tandem repeats in the genomes , and Novel repeat 2) Genomic paired-end libraries and mate-pair library sequences were predicted by RepeatModeler (version were used to generate a first draft assembly with 1.0.7), which includes two De novo programs, RECON the ABySS  assembler (Version 1.3.7) with K-mer  (version 1.08) and RepeatScount  (version 1.0.5). 64, yielding an assembly with Scaffold N50 and Transposable elements (TEs) were predicted in the as- Contig N50 79 Kb, 30 Kb respectively. semblies by homology searching against RepBase using 3) Next, we used SSPACE  (version 2.0) with the RepeatMasker  (version 4.0.5). Both programs were settings (−k 5 -a 0.7 -× 1 -m 90 -o 20) for scaffolding used with default Parameters, yielded 24.34 Mb of repeat using the mate-pair data and make the Scaffold N50 sequences (Additional file 1: Table S6). increase to 170 Kb. 4) Intra scaffold gaps were closed using the Gapclosing Mapping RNA-Seq raw data with the genome scaffolds module (version 1.12) of the SOAP package , The transcriptome assembly followed the protocol de- increasing the Contig N50 comes to 64 Kb. scribed by Trapnell et al. . Six transcriptomes were 5) Last, we used the software Rabbit  (version used to provide gene expression evidence. The RNA-Seq 2.6.18) with the settings (K-mer_size 16 max_occ raw data were quality checked by Trimmomatic  438) to remove the redundancy of that could not (version 0.36) and then were mapped to the genome by be assembled. the Bowtie  (version 2.2.5). Next, TopHat  Yin et al. BMC Genomics (2018) 19:420 Page 13 of 18 (version 2.1.0) was used to determine the exon/intron best 20 hits were used for annotation. Protein domains junctions with the genome. Finally, Cufflinks (version were annotated by InterProScan (version 5.21–60.0) with 2.2.1) was used to obtain putative transcripts. We named the panther data version 10.0. Gene Ontology (GO) ana- these transcripts as the Cufflink Gene Sets. All programs lysis was carried out using the software Blast2GO, yielding were used with default parameters. 59 enriched subcategories at level 2 (Additional file 1: Figure S6). The KEGG pathway annotation were got Re-training de novo gene prediction software by the BlastKOALA web server, and the level 2 cat- To obtain high accuracy, de novo gene prediction soft- egories had 42 subcategories (Additional file 1:FigureS5). ware must be re-trained before it can be used for gen- Clusters of Orthologous Groups of proteins (COGs) were ome annotation. The best training strategy is to use annotated by in-house Perl scripts (Additional file 1: sufficient genes of the same species as the training data- Figure S4). A total of 11,485, 11,617, 9094, 7254, and 4857 set . To collect enough genes for training, we se- genes were annotated from the reference gene set using lected transcripts with intact open reading frame (ORF) the Swissprot, nr, InterPro, GO, KEGG databases, respect- from the Cufflink genes. To further maximize sensitivity ively. Six hundred eighty-eight genes were not annotated for capturing ORFs that may have functional signifi- by any known databases (Additional file 1: Table S7). cance, a BLAST search against a database (E-value =1e-5) of UniProtKB/Swiss-Prot proteins, and searching Noncoding RNA gene annotation PFAM (E-value =1e-5) to identify protein domains. After We use INFERNAL  software searching against filtered by TransDecoder software, only the genes which Rfam database of release 11.0 with e-value cutoff of has a complete ORFs was included. If genes had multiple 1e-5 to predict noncoding RNA (ncRNAs). Four types transcripts, only the longest transcript was kept for fur- of ncRNAs were annotated in our analysis: transfer ther use. Then those genes were used to re-train the pre- RNA (tRNA), ribosomal RNA (rRNA), small nuclear diction software Augustus  (version 3.1) and SNAP RNA (snRNA) and small nucleolar RNA (snoRNA).  (version 2006–07-28). For GeneMark-ET  (Suite In M. cingulum genome, in total, 148 rRNAs, 144 4.21), more than 10 Mb of genome sequence was used tRNAs, 39 snRNAs and 16 snoRNAs were annotated to re-train the software. The default parameters were (Additional file 1:Table S7). used for training. The MapMi  program (version 1.5.0) was used to identify the M. cingulum miRNA homologs by Producing an official gene sets with Maker mapping all insect miRNAs in the miRBase against Homolog-based predictions, de novo predictions and the M. cingulum genome. The sequences of protein transcriptome-based predictions were integrated to an- coding genes, repetitive elements and other classes of notate the protein coding genes in the M. cingulum gen- non-coding RNAs were removed from the genome ome. We annotated protein-coding genes using the scaffolds before mapping. All algorithms were per- MAKER  pipeline (Version 2.31). In the MAKER formed with default parameters. Finally, 111 miRNAs pipeline, sequences of homologous proteins were from were obtained (Additional file 1:Table S7). the NCBI invertebrate RefSeq. Three ab initio gene pre- diction programs including Augustus, SNAP and Orthologs GeneMark-ET which were re-trained with M. cingulum Orthologous groups were constructed with OrthoMCL transcript were used to predict coding genes. Addition-  using the protein sequences of M. cingulum, other ally, the RNA-Seq data were mapped to the genome five Hymenoptera insects (N. vitripennis, Ceratosolen using TopHat, and cufflinks was used to assemble solmsi, C. floridanum, A. mellifera and S. invicta), one transcripts to the gene models. All gene evidence Coleoptera species (Tribolium castaneum), two Diptera identified from above three approaches were com- species (Drosophila melanogaster and Anopheles gam- bined by MAKER into a weighted and non-redundant biae), two Lepidoptera species (Danaus plexippus and consensus of gene structures. All the MAKER param- Bombyx mori), one Hemiptera species (Cimex lectular- eters are default settings. Finally, 11,993 genes were ius) as well as one non-insect arthropod species (T. urti- annotated (Additional file 1:Table S7). cae) (Additional file 1: Table S4). The default parameters were used, yielding 590 absolutely 1:1 orthologs from Gene function annotation 19,318 OrthoMCL clusters using a custom Perl script Functional annotation of 11,993 M. cingulum protein (Fig. 2a). We compared the predicted genes of M. cingu- coding genes was carried out by BLASTP against two in- lum with other two published Hymenopteran genomes tegrated protein sequence databases- UniProtKB/Swis- (C. solmsi and N. vitripennis) and D. melanogaster using s-Prot proteins and NCBI Non-redundant protein OrthoMCL-DB  (version 9.0) with the default set- sequences (nr). The E-value cutoff was set at 1E-5. The tings, yielding 3850 orthologs groups in four species Yin et al. BMC Genomics (2018) 19:420 Page 14 of 18 (Fig. 2b). The distribution of pairwise amino acid iden- invicta, T. urticae, T. castaneum) were downloaded from tity was counted for each pair of the orthologs genes by the Ensembl database or NCBI or other databases (Data the needle module in the EMBOSS  packages (ver- source showed at Additional file 1: Table S4). For genes sion 6.6.0)(Fig. 2c). with alternative splicing variants, the longest transcripts were selected. We used Treefam  to define a gene Synteny family as a group of genes that descended from a single To understand the potential chromosome rearrange- gene in the last common ancestor of considered species ment between N. vitripennis, M. cingulum and C. flori- . We used CAFÉ  to identify gene family expan- danum, best reciprocal hit of protein sequences using sions and contractions. This revealed 824 gene family BLASTP with an e-value < 0.01 between any two pairs expansions and 3980 gene family contractions in M. of species were defined as orthologous counterparts. cingulum (Fig. 3). The similarity of genes was indicated as a density plot of the product of aligning ration and identity. The aligning ratio was inferred by the size of aligning loci divide the Hemomucin gene analysis size of shorter protein sequence in the alignment. The We downloaded all insects OGS from InsectBase . identify information was derived directly from the Blastp The hemomucin protein sequences of M. cingulum alignment. Synteny blocks were identified based on the and D. melanogaster were retrieved from GenBank of orthologous gene order detected as above. Synteny the National Center for Biotechnology Information blocks were defined when at least 3 orthologous coun- (NCBI)  as reference sequences. The candidate terparts were both clustered (not interrupted by more hemomucin genes of each species were obtained by than 5 genes) and located in continuous loci in a single the BLASTP against the reference sequences with scaffold for each species in each pair of species (Fig. 2d). E-value 1E-30. Then the candidate hemomucin genes were analyzed using the HMMER (version3.1b2). Phylogenetic tree and divergence time To ensure reliability, the sequences short than 300 bp We constructed a phylogenetic tree of M. cingulum and were removed. other selected insects (A. gambiae, A. mellifera, B. mori, The phylogenetic relationships of hemomucin in dif- C. floridanum, C. solmsi, C. lectularius, D. plexippus, D. ferent insects was inferred using the Neighbor-Joining melanogaster, N. vitripennis, S. invicta, T. urticae, T. cas- method . The optimal tree with the sum of branch taneum) using 590 single-copy orthologous genes. We length of 10.49620813 was shown. The bootstrap value generated multiple sequence alignments for each 1:1 was set as 1000 replicates and the support values were orthologs cluster using MUSCLE. The resulting align- given. The evolutionary distances were computed using ments were trimmed using trimAl to remove positions the p-distance method and were in the units of the num- with gaps in more than 20% of the sequences, and ber of amino acid differences per site. The analysis concatenated to one super-sequence for each species, re- involved 79 amino acid sequences. All positions con- spectively. Then we used the maximum likelihood taining gaps and missing data were eliminated. There method implemented in RAxML  to reconstruct the wasatotalof 215 positionsinthe finaldataset. phylogenetic tree. Modeltest  was used to select the Evolutionary analyses were conducted in MEGA (ver- best substitution model. RAxML was used to recon- sion 7.0.18) (Fig. 6a). struct the phylogenetic tree with the aLRT method for To investigate the roles of hemomucin conferring the branch support and T. castaneum was used as an out- passive evasion in wasp, we selected eight wasp species group species. The values of statistical support were ob- (Microplitis bicoloratus, M. demolitor, M. cingulum, tained from 1000 replicates of bootstrap analyses. The Cotesia vestalis, C. rubecula, C. glomerata, V. canescens, Reltime ML  approach was used to estimate the spe- Diadromus collaris) which have strong evidence of either cies divergence time using the program MEGA  passive evasion or parasitic factors. For M. bicoloratus, (version 7.0.18) and the 250 million years’ divergence C. vestalis, C. rubecula, C. glomerata, V. canescens and time between D. melanogaster and A. gambiae [79, 80] D. collaris which did not have genome sequences, we was selected as divergence time calibration constraints downloaded the RNA-Seq raw data from the NCBI SRA that used to convert relative divergence times to abso- database and assembled them by Trinity  (version lute divergence times (Fig. 2a). 2.4.0) with default parameters. Multiple amino acid sequence alignments were ana- Gene family clusters, expansion and contraction lyzed using the ClustalX multiple-alignment program. A Protein data for 13 species (A. gambiae, A. mellifera, B. phylogenetic tree was constructed using MEGA (version mori, C. floridanum, C. solmsi, C. lectularius, D. plexip- 7.0.8) based on the known amino acid sequences of pus, D. melanogaster, M. cingulum, N. vitripennis, S. hemomucin of insects. The protein pattern and profile Yin et al. BMC Genomics (2018) 19:420 Page 15 of 18 of hemomucin genes were obtained from the PROSITE GSPs and universal primer mix. The PCR conditions database using InterProScan. The transmembrane were: incubation at 94 °C for 3 mins; five cycles at 94 °C helix and signal peptide were analyzed using SignalP for 30 s, 72 °C for 3 mins; five cycles at 94 °C for 30 s, (version 4.1) and TMHMM (version 2.0). The poten- 70 °C for 30 s, 72 °C for 3 mins; and 25 cycles at 94 °C tial O-glycosylation sites and phosphorylation sites for 30 s, 68 °C for 30 s, 72 °C for 3 mins, with a final ex- were predicted using NetOGlyc (version4.0) and Net- tension of 72 °C for 10 mins. The PCR products were PHos online server (http://www.cbs.dtu.dk/services/NetO separated on an agarose gel and purified using the Glyc and http://www.cbs.dtu.dk/services/NetPhos/)(Fig. 6b, MiniBEST Agarose Gel DNA Extraction Kit (Takara, Fig. 6c, Additional file 1:Table S9). Otsu, Japan). Purified DNA fragment was ligated into the pGEM-T Easy Vector (Promega) for sequencing (BGI, Transcriptome analyses Shenzhen, China). We used cuffdiff to analyze the differential gene expression. First, raw RNA-Seq reads were processed to Cell culture and luciferase assay remove adapter and low-quality sequences using Trim- 3’UTR fragments to test were cloned at the downstream momatic. Next, the clean reads were aligned to the as- of the firefly luciferase gene. pMIR-REPORT vector sembled M. cingulum genome Scaffolds using bowtie. (Obio, Shanghai, China) was used as a firefly luciferase Raw counts for each predicted gene were derived from reporter vector, and pRL-CMV vector (Promega, the read alignments and normalized to fragments per Madison, WI, USA) was used as the Rellina luciferase kilobase of exon model per million mapped fragments control reporter vector. We used the HEK293T cell line (FPKM) and differential expression analyses were per- (Shanghai Institutes for Biological Sciences, Shanghai, formed using cuffdiff. The raw P-values were adjusted China) for the assay and the cells were cultured at 37 °C, for multiple testing using the false discovery rate 5% CO with DMEM (Gibco, Grand Island, NY, USA) + (FDR). For each comparison, genes with FDR < 0.05 10% FBS (Hyclone, logan, UT, USA) and plated in and fold change > 2 were considered as differentially 96-well culture plates at a density of 2 × 10 cells per expressed genes. well for 24 h’ incubation. For DNA transfection mixture, the proportion was 0.2 μg of reporter vector, 0.01 μgof miRNA gene expression analysis and target prediction control reporter and 0.25 μl lipofectamine 2000 Reagent The raw RNA-Seq reads were mapped to the extended (Invitrogen) per well. miRNA mimics were synthesized miRNA precursors. We used an in-house Perl script to by RiboBio (Guangzhou, China) and diluted to a concen- count the raw reads of each miRNA. Then, the software tration of 100 nM. After incubated at room temperate DEseq2  was used to analyze the differential expressions for 5 mins, DNA and miRNA mixed with lipofectamine of miRNA genes. For each comparison, miRNA genes with 2000 reagent transfection were incubated for 20 mins, p <0.05 were considered as differentially-expressed miRNA. respectively. After removing 50 μl culture medium per To predict the miRNA targets, we employed five soft- well, 25 μl DNA transfection mixture and 25 μl miRNA wares miRanda  (version 3.3a) with maximum free mixture were co-transfected for almost 6 h. We applied energy parameter − 25 kcal/mol, Targetscan  (version six replicates for each sample. At 48 h after transfection, 7.0) with default parameters, RNAhybird  (version cell lysates were prepared, and we conducted firefly lu- 2.1.2) with maximum free energy parameter − 20 kcal/ ciferase activity assay using Dual-Luciferase Reporter mol, PITA  (version) with maximum free energy par- Assay System (Promega) according to the manufacturer’s ameter − 10 kcal/mol and RNA22 (version) with default protocols with Infinite M1000 (Tecan, Switzerland). The parameters. The miRNA-mRNA relations which were experiment was performed in three replicates. The mean predicted by all five software packages were kept for fur- of the relative luciferase expression ratio (firefly lucifer- ther analysis. ase/renilla luciferase, Luc/R-luc) of the control was set to 1. Statistics was analyzed by two-tail t-test. miRNA target validation 3’UTR cloning Resistance, chemical ecology and immunity related gene To obtain the 3’UTR of target genes in M. cingulum, family analysis 3’RACE reactions were performed using a SMARTer™ Five chemical ecology related gene families of odorant re- RACE cDNA Amplification kit (Clontech, Mountain ceptors (ORs), gustatory receptors (GRs), ionotropic recep- View, CA, USA) according to the user’s manual. tors (IRs), chemosensory proteins (CSPs) and odorant Gene-specific primers (GSP) used for 3’-RACE reactions binding proteins (OBPs), and the insecticide target genes of were designed based on the coding region sequences known insecticides, detoxification genes of P450s, GSTs and (CDS) using PRIMER PREMIER 5.0 (Additional file 1: SNMPs were identified in the M. cingulum genome. First, Table S16). The first-step PCRs were performed with the we obtained the reference protein sequences of each gene Yin et al. BMC Genomics (2018) 19:420 Page 16 of 18 family from the GenBank of NCBI, and manually confirmed McHEM: M. cingulum hemomucin protein; miRNA: microRNA; OBP: Odorant binding proteins; OGS: Official gene set; OMIGA: Optimized Maker-based Insect each reference sequence. Then BLASTP was used to obtain Genome Annotation; PDV: Polydnaviruses; qPCR: Quantitative PCR; rRNA: Ribosome the homolog candidate sequences with E-value 1E-5. RNA; snoRNA: Small nucleolar RNA; snRNA: Small nuclear RNA; tRNA: Transfer RNA Immune-related reference genes were retrieved from the Acknowledgements immunodb database  and aligned against to different We thank Professor Michael Strand in University of Georgia for his valuable wasp species using BLASTP (E-value 1E-5). All candidate suggestions. This work was partially supported by the National Key Research sequences were filtered by HMMER (E-value 1E-5) against and Development Program [2016YFC1200600, 2017YFD0200900, 2017YFC1200602], and the National Natural Science Foundation of China [31672033, 31760514, with the Pfam database. Multiple sequence alignments were 31772238, 31701785]. aligned by the MUSCLE, and conservation blocks were trimmed by the trimal software. The phylogenetic trees were Funding This work was in partial supported by the National Key Research and Development constructed by the RAxML with suited Model se- Program [2016YFC1200600, 2017YFD0200900, 2017YFC1200602], and the National lected by the Modeltest with bootstrap value of 1000. Natural Science Foundation of China [31672033, 31760514]. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. Additional file Availability of data and materials Additional file 1: Figure S1. Flow cytometry estimation of the genome Data for the M. cingulum genome has been deposited in the GenBank/ size for the M. cingulum. Figure S2. The distribution of 17-mer frequency EMBL/DDBJ Bioproject database under the accession code PRJNA361069. in M. cingulum genome sequencing reads. Figure S3. Distribution of GC This Whole Genome Shotgun project has been deposited at DDBJ/ENA/ content, CpG Obs/ExpRatios of M. cingulum(Mcin), N. vitripennis(Nvit) and GenBank under the accession MVJL00000000. The version described in this A. mellifera(Amel). Figure S4. COG function classification of the OGS in M. paper is version MVJL01000000. And all the data in this article had been cingulum. Figure S5. KEGG pathway analysis of the OGS in M. cingulum. deposited in the database InsectBase. Figure S6. GO classification of the OGS in M. cingulum. Figure S7. Venn diagram of the homologous protein-coding genes among three wasps Authors’ contributions (M. cingulum, C. solmsi, N. vitripennis) and fruit fly (D. melanogaster). Fig- Project design: FL, JH, WQZ, XXC; Manuscript design: FL, JH; Manuscript ure S8. Phylogenetic relationship of CSP proteins from A. mellifera, C. flori- writing: FL, JRW, CLY, HJ, MZL; M. cingulum population:QMC, YPD; Collection danum, C. solmsi, M. cingulum, N.vitripennis, S.invicta. Figure S9. of DNA/RNA samples for genome and RNA-Seq sequencing:QMC; Sequencing, Phylogenetic relationship of GR proteins from A. mellifera, C. floridanum, assembling and annotation of genome and transcriptome: CLY, MZL, JDL; Gene C. solmsi, M. cingulum, N.vitripennis, S.invicta. Figure S10. Phylogenetic re- families: KL; Experiments and data analyses related to miRNA: CLY, MZL; MiRNA lationship of IR proteins from A. mellifera, C. floridanum, C. solmsi, M. cin- detection: YPD, ZKS; Hemomucin analysis: DHG, CLY; Synteny: JPL; Tables, gulum, N.vitripennis, S.invicta. Figure S11. Phylogenetic relationship of OR figures and supplementary information: CLY, MZL, KH All authors of this proteins from C. floridanum, D. melanogaster and M. cingulum. Figure paper have read and approved the final version of the manuscript. S12. Phylogenetic relationship of OBP proteins from A. mellifera, C. florida- num, C. solmsi, M. cingulum, N.vitripennis, S.invicta. Figure S13. Phylogen- Ethics approval and consent to participate etic relationship of SNMP proteins from A.mellifera, C. floridanum, C. Not applicable. This study has not directly involved humans, animals or plants. solmsi, M. cingulum, N.vitripennis, S.invicta. Figure S14. Phylogenetic rela- The insects collected for sequencing were derived from laboratory. tionship of GST proteins from A. mellifera, C. floridanum, C. solmsi, M. cin- gulum, N.vitripennis, S.invicta. Figure S15. Phylogenetic relationship of Competing interests P450 proteins from N. vitripennis, D. melanogaster and M. cingulum. Fig- The authors declare that they have no competing interests. ure S16. Phylogenetic relationship of ABC proteins from M. cingulum and D. melanogaster. Figure S17. Different expression levels of miR-14b in dif- ferent developmental stages of M. cingulum.Table S1. Genome sequen- Publisher’sNote cing data of M. cingulum. Table S2. Estimation of M. cingulum genome Springer Nature remains neutral with regard to jurisdictional claims in size using K-mer analysis. Table S3. Summary of the M. cingulum gen- published maps and institutional affiliations. ome assembly. Table S4. The published insect genomes. Table S5. The genome assembly assessment on different insects. Table S6. Classifica- Author details tion of repeat sequences identified in the M. cingulum genome. Table Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and S7. Genome features of the M. cingulum, N. vitripennis and A. mellifera. Insects, Institute of Insect Science, Zhejiang University, 866 Yuhangtang Road, Table S8. Gene features of M. cingulum, N. vitripennis and A. mellifera. Hangzhou 310058, China. State Key Laboratory of Biocontrol, Sun Yat-sen Table S9. The insects with OGSs in InsectBase. Table S10. Hemomucin University, 135 Xingang Road West, Guangzhou 510275, China. College of genes in eight wasps. Table S11. The different gene expression of em- Information Science and Technology, Nanjing Agricultural University, Nanjing bryo and pseudogerm transcriptomes in KEGG pathway. Table S12. The 210095, China. College of Plant Protection, Nanjing Agricultural University, differently expressed miRNAs in embryo and mixed embryo transcrip- Nanjing 210095, China. Ecology and Evolutionary Biology, University of tomes. Table S13. Comparison of gene numbers for chemoreception in Kansas, Lawrence, KS 66046, USA. A.mellifera, C. floridanum, C. solmsi, M. cingulum, N. vitripennis and S. invicta. Table S14. Comparison of gene numbers for Gene families asso- Received: 15 November 2017 Accepted: 11 May 2018 ciated with insecticide resistance and detoxification in D. melanogaster, A. mellifera, C. floridanum, C. solmsi, M. cingulum, N. vitripennis and S. invicta. Table S15. Comparison of gene numbers of insect immune in A. melli- References fera, C. floridanum, C. solmsi, M. cingulum, N. vitripennis and S. invicta. Table 1. Mayhew PJ. Comparing parasitoid life histories. Entomol Exp Appl. 2016; S16. The PCR primer for target genes of mci-miR-14b. (PDF 6076 kb) 159(2):147–62. 2. Beckage NE, Gelman DB. Wasp parasitoid disruption of host development: Abbreviations implications for new biologically based strategies for insect control. Annu ARF: ADP ribosylation factors; BUSCO: Benchmarking Universal Single-Copy Rev Entomol. 2004;49:299–330. Orthologs; EHMT: Euchromatic histone-lysine N-methyltransferase; GLG: Golgi 3. van Lenteren JC. The state of commercial augmentative biological control: apparatus protein; GST: Glutathione S-transferases; KMT2E: Histone-lysine plenty of natural enemies, but a frustrating lack of uptake. BioControl. N-methyltransferase 2E; MBP-1: c-myc promoter-binding protein 1; 2012;57(1):1–20. Yin et al. BMC Genomics (2018) 19:420 Page 17 of 18 4. Werren JH, Loehlin DW: The parasitoid wasp Nasonia: an emerging model 30. Shimaji K, Konishi T, Tanaka S, Yoshida H, Kato Y, Ohkawa Y, Sato T, Suyama system with haploid male genetics. Cold Spring Harb Protoc 2009, 2009(10): M, Kimura H, Yamaguchi M. Genomewide identification of target genes of pdb emo134. histone methyltransferase dG9a during Drosophila embryogenesis. Genes 5. Werren JH, Richards S, Desjardins CA, Niehuis O, Gadau J, Colbourne JK, Cells. 2015;20(11):902–14. Werren JH, Richards S, Desjardins CA, Niehuis O, et al. Functional and 31. Ohno H, Shinoda K, Ohyama K, Sharp LZ, Kajimura S. EHMT1 controls brown evolutionary insights from the genomes of three parasitoid Nasonia adipose cell fate and thermogenesis through the PRDM16 complex. Nature. species. Science. 2010;327(5963):343–8. 2013;504(7478):163–7. 6. Segoli M, Harari AR, Rosenheim JA, Bouskila A, Keasar T. The evolution of 32. Wen L, Fukuda M, Sunada M, Ishino S, Ishino Y, Okita TW, Ogawa M, Ueda T, polyembryony in parasitoid wasps. J Evol Biol. 2010;23(9):1807–19. Kumamaru T. Guanine nucleotide exchange factor 2 for Rab5 proteins 7. Strand MR, Grbic M. The development and evolution of polyembryonic insects. coordinated with GLUP6/GEF regulates the intracellular transport of the proglutelin from the Golgi apparatus to the protein storage vacuole in Curr Top Dev Biol. 1997;35:121–59. rice endosperm. J Exp Bot. 2015;66(20):6137–47. 8. Hu J, Wang P, Zhang W. Two types of embryos with different functions are generated in the polyembryonic wasp Macrocentrus cingulum (Hymenoptera: 33. Bond D, Foley E. Autocrine platelet-derived growth factor-vascular endothelial Braconidae). Arthropod structure & development. 2015;44(6 Pt B):677–87. growth factor receptor-related (Pvr) pathway activity controls intestinal stem cell 9. Strand MR, Burke GR. Polydnaviruses as symbionts and gene delivery systems. proliferation in the adult Drosophila midgut. J Biol Chem. 2012;287(33):27359–70. PLoS Pathog. 2012;8(7):e1002757. 34. Varghese J, Lim SF, Cohen SM. Drosophila miR-14 regulates insulin 10. Glatz RV, Asgari S, Schmidt O. Evolution of polydnaviruses as insect immune production and metabolism through its target, sugarbabe. Genes Dev. 2010; suppressors. Trends Microbiol. 2004;12(12):545–54. 24(24):2748–53. 35. Barakat MT, Humke EW, Scott MP. Learning from Jekyll to control Hyde: 11. Gao F, Gu QJ, Pan J, Wang ZH, Yin CL, Li F, Song QS, Strand MR, Chen XX, hedgehog signaling in development and cancer. Trends Mol Med. 2010; Shi M. Cotesia vestalis teratocytes express a diversity of genes and exhibit 16(8):337–48. novel immune functions in parasitism. Sci Rep. 2016;6:26967. 12. Havard S, Pelissier C, Ponsard S, Campan ED. Suitability of three Ostrinia 36. Kumarswamy R, Chandna S. Inhibition of microRNA-14 contributes to species as hosts for Macrocentrus cingulum: a comparison of their encapsulation actinomycin-D-induced apoptosis in the Sf9 insect cell line. Cell Biol Int. abilities. Insect science. 2014;21(1):93–102. 2010;34(8):851–7. 13. Hu J, Xu Q, Hu S, Yu X, Liang Z, Zhang W. Hemomucin, an O-glycosylated 37. Liu Z, Ling L, Xu J, Zeng B, Huang Y, Shang P, Tan A. MicroRNA-14 regulates larval protein on embryos of the wasp Macrocentrus cingulum that protects it development time in Bombyx mori. Insect Biochem Mol Biol. 2018;93:57–65. against encapsulation by hemocytes of the host Ostrinia furnacalis. J Innate 38. Ghosh AK, Steele R, Ray RB. C-myc promoter-binding protein 1 (MBP-1) Immun. 2014;6(5):663–75. regulates prostate cancer cell growth by inhibiting MAPK pathway. J Biol 14. Kinuthia W, Li D, Schmidt O, Theopold U. Is the surface of endoparasitic Chem. 2005;280(14):14325–30. wasp eggs and larvae covered by a limited coagulation reaction? J Insect 39. Lo Presti M, Ferro A, Contino F, Mazzarella C, Sbacchi S, Roz E, Lupo C, Physiol. 1999;45(5):501–6. Perconti G, Giallongo A, Migliorini P, et al. Myc promoter-binding protein-1 (MBP-1) is a novel potential prognostic marker in invasive ductal breast 15. Xiao JH, Yue Z, Jia LY, Yang XH, Niu LH, Wang Z, Zhang P, Sun BF, He SM, Li carcinoma. PLoS One. 2010;5(9):e12961. Z, et al. Obligate mutualism within a host drives the extreme specialization of a fig wasp genome. Genome Biol. 2013;14(12):R141. 40. Zhang X, Novera W, Zhang Y, Deng LW. MLL5 (KMT2E): structure, function, 16. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel and clinical relevance. Cell Mol Life Sci. 2017;74(13):2333–44. assembler for short read sequence data. Genome Res. 2009;19(6):1117–23. 41. Walrad PB, Hang SY, Gergen JP. Hairless is a cofactor for Runt-dependent 17. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: transcriptional regulation. Mol Biol Cell. 2011;22(8):1364–74. assessing genome assembly and annotation completeness with single-copy 42. Theopold U, Samakovlis C, Erdjument-Bromage H, Dillon N, Axelsson B, orthologs. Bioinformatics. 2015;31(19):3210–2. Schmidt O, Tempst P, Hultmark D. Helix pomatia lectin, an inducer of 18. Waterhouse RM, Seppey M, Simao FA, Manni M, Ioannidis P, Klioutchnikov Drosophila immune response, binds to hemomucin, a novel surface G, Kriventseva EV, Zdobnov EM. BUSCO applications from quality mucin. J Biol Chem. 1996;271(22):12708–15. assessments to gene prediction and phylogenomics. Mol Biol Evol. 2018; 43. Hu J, Yu XQ, Fu WJ, Zhang WQ. A Helix pomatia lectin binding protein on 35(3): 543–8. the extraembryonic membrane of the polyembryonic wasp Macrocentrus cingulum protects embryos from being encapsulated by hemocytes of host 19. Tempel S. Using and understanding RepeatMasker. Methods Mol Biol. 2012; Ostrinia furnaclis. Dev Comp Immunol. 2008;32(4):356–64. 859:29–51. 20. Kirkness EF, Haas BJ, Sun W, Braig HR, Perotti MA, Clark JM, Lee SH, Robertson 44. Hu J, Zhu XX, Fu WJ. Passive evasion of encapsulation in Macrocentrus cingulum HM, Kennedy RC, Elhaik E, et al. Genome sequences of the human body louse Brischke (Hymenoptera: Braconidae), a polyembryonic parasitoid of Ostrinia and its primary endosymbiont provide insights into the permanent parasitic furnacalis Guenee (Lepidoptera: Pyralidae). J Insect Physiol. 2003;49(4):367–75. lifestyle. Proc Natl Acad Sci U S A. 2010;107(27):12168–73. 45. Strand MR, Burke GR. Polydnavirus-wasp associations: evolution, genome 21. Kelley JL, Peyton JT, Fiston-Lavier AS, Teets NM, Yee MC, Johnston JS, organization, and function. Curr Opin Virol. 2013;3(5):587–94. Bustamante CD, Lee RE, Denlinger DL. Compact genome of the Antarctic 46. Burke GR, Walden KK, Whitfield JB, Robertson HM, Strand MR. Widespread midge is likely an adaptation to an extreme environment. Nat Commun. genome reorganization of an obligate virus mutualist. PLoS Genet. 2014; 2014;5:4611. 10(9):e1004660. 22. Liu J, Xiao H, Huang S, Li F. OMIGA: Optimized Maker-based insect genome 47. Kramer JM. Regulation of cell differentiation and function by the euchromatin annotation. Mol Genet Genomics. 2014;289(4):567–73. histone methyltranserfases G9a and GLP. Biochem Cell Biol. 2016;94(1):26–32. 23. Daub J, Eberhardt RY, Tate JG, Burge SW. Rfam: annotating families of non- 48. Inoue H, Yoshimura J, Iwabuchi K. Gene expression of protein-coding and coding RNA sequences. Methods Mol Biol. 2015;1269:349–63. non-coding RNAs related to polyembryogenesis in the parasitic wasp, 24. Guerra-Assuncao JA, Enright AJ. MapMi: automated mapping of microRNA Copidosoma floridanum. PLoS One. 2014;9(12):e114372. loci. BMC bioinformatics. 2010;11:133. 49. Flavia Nardy A, Freire-de-Lima CG, Morrot A. Immune evasion strategies of 25. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs Trypanosoma cruzi. J Immunol Res. 2015;2015:178947. usingdeepsequencingdata. NucleicAcids Res. 2014;42(Database issue):D68–73. 50. Francica JR, Varela-Rohena A, Medvec A, Plesa G, Riley JL, Bates P. Steric 26. Stamatakis A: Using RAxML to infer phylogenies. Current protocols in shielding of surface epitopes and impaired immune recognition induced by bioinformatics / editoral board, Andreas D Baxevanis [et al] 2015, 51:6 14 11–14. the ebola virus glycoprotein. PLoS Pathog. 2010;6(9):e1001098. 27. Pennacchio F, Strand MR. Evolution of developmental strategies in parasitic 51. Reynard O, Borowiak M, Volchkova VA, Delpeut S, Mateo M, Volchkov VE. hymenoptera. Annu Rev Entomol. 2006;51:233–58. Ebolavirus glycoprotein GP masks both its own epitopes and the presence of cellular surface proteins. J Virol. 2009;83(18):9596–601. 28. Peters RS, Krogmann L, Mayer C, Donath A, Gunkel S, Meusemann K, Kozlov A, Podsiadlowski L, Petersen M, Lanfear R, et al. Evolutionary history of the 52. Hare EE, Johnston JS. Genome size determination using flow cytometry of Hymenoptera. Curr Biol. 2017;27(7):1013–8. propidium iodide-stained nuclei. Methods Mol Biol. 2011;772:3–12. 29. Donnell DM. Analysis of odorant-binding protein gene family members in 53. Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y, et al. The the polyembryonic wasp, Copidosoma floridanum: evidence for caste bias sequence and de novo assembly of the giant panda genome. Nature. 2010; and host interaction. J Insect Physiol. 2014;60:127–35. 463(7279):311–7. Yin et al. BMC Genomics (2018) 19:420 Page 18 of 18 54. Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting 82. De Bie T, Cristianini N, Demuth JP, Hahn MW. CAFE: a computational tool of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70. for the study of gene family evolution. Bioinformatics. 2006;22(10):1269–71. 55. Rodelsperger C, Neher RA, Weller AM, Eberhardt G, Witte H, Mayer WE, 83. Yin C, Shen G, Guo D, Wang S, Ma X, Xiao H, Liu J, Zhang Z, Liu Y, Zhang Y, Dieterich C, Sommer RJ. Characterization of genetic diversity in the nematode et al. InsectBase: a resource for insect genomes and transcriptomes. Nucleic Pristionchus pacificus from population-scale resequencing data. Genetics. 2014; Acids Res. 2016;44(D1):D801–7. 196(4):1153–65. 84. Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic 56. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre- Acids Res. 2016;44(D1):D67–72. assembled contigs using SSPACE. Bioinformatics. 2011;27(4):578–9. 85. Meng X, Ji Y. Modern computational techniques for the HMMER sequence analysis. ISRN Bioinform. 2013;2013:252183. 57. Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, et al. 86. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing SOAPdenovo2: an empirically improved memory-efficient short-read de phylogenetic trees. Mol Biol Evol. 1987;4(4):406–25. novo assembler. GigaScience. 2012;1(1):18. 87. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis 58. You M, Yue Z, He W, Yang X, Yang G, Xie M, Zhan D, Baxter SW, Vasseur L, X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly Gurr GM. A heterozygous moth genome provides insights into herbivory from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7): and detoxification. Nat Genet. 2013;45(2):220–5. 644–52. 59. Yandell M, Ence D. A beginner's guide to eukaryotic genome annotation. 88. Love MI, Huber W, Anders S. Moderated estimation of fold change and Nat Rev Genet. 2012;13(5):329–42. dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. 60. Benson G. Tandem repeats finder: a program to analyze DNA sequences. 89. John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS. Human Nucleic Acids Res. 1999;27(2):573–80. MicroRNA targets. PLoS Biol. 2004;2(11):e363. 61. Bao Z, Eddy SR. Automated de novo identification of repeat sequence families 90. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by in sequenced genomes. Genome Res. 2002;12(8):1269–76. adenosines, indicates that thousands of human genes are microRNA targets. 62. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in Cell. 2005;120(1):15–20. large genomes. Bioinformatics. 2005;21(Suppl 1):i351–8. 91. Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R. Fast and effective 63. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, prediction of microRNA/target duplexes. RNA. 2004;10(10):1507–17. Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq 92. Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E. The role of site accessibility experiments with TopHat and cufflinks. Nat Protoc. 2012;7(3):562–78. in microRNA target recognition. Nat Genet. 2007;39(10):1278–84. 64. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina 93. Lavieri R, Filaci G, Fenoglio D, Giacomini M. ImmunoDB: a web based tool sequence data. Bioinformatics. 2014;30(15):2114–20. to analyze preclinical data. Stud Health Technol Inform. 2014;205:438–42. 65. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. 66. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105–11. 67. Makarov V. Computer programs for eukaryotic gene prediction. Brief Bioinform. 2002;3(2):195–9. 68. Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32(Web Server): W309–12. 69. Korf I. Gene finding in novel genomes. BMC bioinformatics. 2004;5:59. 70. Lomsadze A, Burns PD, Borodovsky M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 2014;42(15):e119. 71. Campbell MS, Holt C, Moore B, Yandell M: Genome annotation and curation using MAKER and MAKER-P. Current protocols in bioinformatics / editoral board, Andreas D Baxevanis [et al] 2014, 48:4 11 11–39. 72. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25(10):1335–7. 73. Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–89. 74. Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, Shanmugam D, Roos DS, Stoeckert CJ, Jr.: Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Current protocols in bioinformatics / editoral board, Andreas D Baxevanis [et al] 2011, Chapter 6: Unit 6 12 11–19. 75. Olson SA. EMBOSS opens up sequence analysis. European molecular biology open software suite. Brief Bioinform. 2002;3(1):87–91. 76. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14(9):817–8. 77. Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, Kumar S. Estimating divergence times in large molecular phylogenies. Proc Natl Acad Sci U S A. 2012;109(47):19333–8. 78. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4. 79. Bolshakov VN, Topalis P, Blass C, Kokoza E, della Torre A, Kafatos FC, Louis C. A comparative genomic analysis of two distant diptera, the fruit fly, Drosophila melanogaster, and the malaria mosquito, Anopheles gambiae. Genome Res. 2002;12(1):57–66. 80. Yeates DK, Wiegmann BM. Congruence and controversy: toward a higher-level phylogeny of Diptera. Annu Rev Entomol. 1999;44:397–428. 81. Li H, Coghlan A, Ruan J, Coin LJ, Heriche JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 2006;34(Database issue):D572–80.
BMC Genomics – Springer Journals
Published: May 30, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
All the latest content is available, no embargo periods.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud