Notwithstanding the rapid developments in sequencing techniques, Y and W sex chromosomes have still been mostly excluded from whole genome sequencing projects due to their high re- petitive DNA content. Therefore, Y and W chromosomes are poorly described in most species despite their biological importance. Several methods were developed for identifying Y or W- linked sequences among unmapped scaffolds. However, it is not enough to discover functional regions from short unmapped scaffolds. Here, we provide a new and simple strategy based on k-mer comparison for comprehensive analysis of the W chromosome in Bombyx mori. Using this novel method, we effectively assembled de novo 1281 W-derived genome contigs (totaling 1.9 Mbp), and identified 156 W-linked transcript RNAs and 345 W-linked small RNAs. This method will help in the elucidation of mechanisms of sexual development and exploration of W chromosome biological functions, and provide insights into the evolution of sex chromo- somes. Moreover, we showed this method can be employed in identifying heterogametic sex chromosomes (W and Y chromosomes) in many other species where genomic information is still scarce. Key words: heterogametic sex chromosome, silkworm, non-coding RNA 1. Introduction Bombyx mori, Gallus gallus. Previously people believed a sex deter- In most animal species, sex-determination is governed by chromo- mining gene would be localized in a unique sequence domain of a somal differences involving two common sex determination systems: heterogametic sex chromosome; thus many studies were focused on XX/XY sex chromosomes determine female/male, e.g. in Homo sapi- ﬁshing or positioning unique sequence domains on W- or Y-chromo- ens, Drosophila melanogaster and Anopheles gambiae and other spe- somes. In this way, DMY of the medaka ﬁsh was identiﬁed as a sex- cies and WZ/ZZ sex chromosomes determine female/male, e.g. in determining gene. However, Y and W chromosomes present special V C The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact email@example.com 375 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018 376 A new method describing sex chromosome 3–5 challenges for whole genome shotgun assembly, because they are SRA accession number PRJNA393916. B. mori female and male enriched with interspersed repeat elements and segmental duplica- HiSeq reads were produced on an Illumina HiSeq 2000 sequencer 6,7 tions. Even though some unique domains exist on Y or W, many (101-bp pair-end using pools of larval posterior silk gland which transposable elements (TEs) interrupt these unique sequences into yielded 73-fold coverage of genome for each sex). B. mori female short pieces, making it difﬁcult to ﬁnd such unique domains. Thus, MiSeq reads (only used for assisting de novo assembly of W contigs) whole genome shotgun methods, even the latest third generation se- were produced on an Illumina MiSeq sequencer (250-bp pair-end; pools of female larvae; 20-fold genome coverage). All reads were quencing techniques (e.g. PacBIO) which can produce very long from Dazao, the inbred strain used in the genome project. The next reads, cannot effectively assemble high quality contigs or scaffolds of 8–10 generation sequencing data for male and female D. melanogaster Y and W chromosomes at present stage, which obstructs studies pooled 5 day old mated adults was downloaded from the NCBI of sex-determination and sexual development and reproduction. Sequence Read Archive [SRA: SRP007888]; the next generation se- In most genome projects, Y and W chromosome sequences are frag- quence data for male and female An. gambiae pooled wild type mented into a large number of small, unmapped scaffolds. Recently strain was downloaded from NCBI Sequence Read Archive [SRA: a few researchers have applied the CQ (Chromosome Quotient) 12,15 12 11 SRP014756]. Preferably, to ensure the accuracy of the KQ method or YGS (Y chromosome Genome Scan) method to identify method, male and female sequence data should be from highly in- Y-linked sequences in these unmapped scaffolds. The CQ method uses bred populations and data for both sexes should be of the same the number of alignments from male and female sequence data to de- amount. Since the coverage of the next-generation sequence data can termine whether a sequence is Y-linked. The YGS method, on the affect the calculation of the KQ value, from our experience, we sug- other hand, is based on a comparison of the assembled genome with 11,13 gest that more than 50X coverage is an appropriate choice, which female short-reads. We tried to apply both methods in B. mori to can produce a more precise high-frequency W-15-mer. B. mori ﬁnd W-derived contigs. However, both methods can only identify lim- RNA-Seq data of mixed sex embryos from 0–24 hpo were produced ited number of contigs. In CQ method all repeat sequences must be using an Illumina HiSeq 2000 sequencer, yielding 101-bp pair-end masked and the female to male ratio is not always precise, especially in and 24 G clean data for each of the different periods (0, 2, 4, 8, 16 short sequences (length<¼ 2 kb) with few reads aligned. As YGS and 24 hpo). Small RNA sequences have been deposited in method is based on genome sequences, implementing it in species with- GenBank/EMBL/DDBJ under accession numbers from AB386191– out reference genome is difﬁcult. Consequently, both methods cannot AB424683. work very effectively in B. mori, especially for short sequences. Hence, neither of them provides a reliable strategy for de novo assembly of Y or W chromosomes, which restricted the application of the two meth- 2.2. Criteria for filtrating valid W-specific k-mer ods in many other species. The genome sequence of males and females differ only by the W Just like the previously mentioned two methods, many studies chromosome, which produces W-speciﬁc k-mers. These k-mers are concentrated on how to eliminate repetitive sequences from the se- present only in the female data, but absent from the male data. quence data. However, the majority of Y or W chromosomes are Before doing a female vs. male k-mer comparison, removal of se- composed of repetitive sequences and many functional domains may quencing errors in the reads is important in order to achieve high res- be inserted into TEs, which means very limited unique contigs can be olution. This was done using two criteria: (i) masking low quality found if all repetitive sequences are removed. Nevertheless, many bases and (ii) ﬁltering k-mers with rare frequency. Speciﬁcally, for mutations accumulate on Y or W repetitive sequences and Y/W chro- the B. mori reads, the sequencing errors were ﬁltered by masking ba- mosome have no recombination with X/Z chromosome, which may ses with phred score<20 and removing 15-mers that were present generate some speciﬁc SNPs that can be used as molecular markers fewer than 5X in reads. Counting k-mers for separate female and to ﬁnd Y or W linked sequences. So we can utilize these repetitive se- male data and removal of sequencing errors can be done using 16,17 quences, instead of masking them, for identifying Y or W linked se- Jellyﬁsh. We set up the KQ parameter, for a given k-mer Ki, quences. Based on this concept, we applied a novel K-mer KQ(ki) ¼ M(ki)/F(ki), where M(ki) is the frequency of Ki in male Quotient (KQ) method as a more easy, effective and accurate way reads, F(ki) is the frequency of Ki in female reads. To obtain reliable for comprehensive analysis of the B. mori W chromosome. It can W-derived k-mers, we used the strict criterion of KQ¼ 0. To reduce provide a strategy for de novo assembling of contigs of Y or W chro- the interference of individual polymorphism in sequencing reads, we mosomes, unlike the CQ or YGS method. Moreover, in addition to ﬁltered out these k-mers (KQ¼ 0) with low frequency (<15, B. mori, we successfully applied this method to two diptera, D. mela- Supplementary Fig. S1) in female reads. nogaster and An. gambiae, as XX/XY sex determination species. As a complete genome sequence is not necessary and removal of repeat 2.3. W-derived reads and de novo assembly strategy sequences is not needed for the successful implementation of our Reads containing W-15-mers could be W-derived and therefore method, it can also be applied to species where genomic information called W-reads. We screened the W-reads from B. mori Illumina fe- is scarce. Additionally, the KQ method can be extended for the iden- male reads with Bowtie. Since the reads were paired ends, if a sin- tiﬁcation of Y or W-linked transcript RNAs and small RNAs, thus gle read sequence was W-derived (contained W-15-mer), its providing an innovative approach to study the functions of W or Y corresponding pair end read could be also W-derived even without a chromosomes without complete sequence information. W-15-mer. Based on this assumption, we screened W-derived paired- end reads and assembled them using the MaSuRCA assembler. To prevent incorporation of invalid contigs originating from misassem- 2. Materials and methods bly, a criterion was followed to ﬁlter out invalid contigs. We aligned 2.1. Short reads all the W-15mers to all contigs with 0 mismatch using Bowtie, and All sequencing data produced in the present study have been depos- calculated the amount of W-15-mers per Kb sequence (AWK) ited in the NCBI Short Read Archive and can be accessed under the for each contig. From the scatter distribution map (Supplementary Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018 S. Li et al. 377 Fig. S2), we found that when the AWK value was less than 10, the First-Strand Synthesis Kit (TaKaRa, Kusatsu, Japan). qPCR was per- scatters deviated from the main trend line, indicating that such con- formed with SYBR Premix Ex (TaKaRa, Kusatsu, Japan). tigs with fewer W-15-mers were possibly not W-derived. Thus, we set a threshold of AWK¼ 10 for ﬁltering invalid contigs. Whereas in 3. Results and discussion the case of very short contigs (less than 1 kb), we suggested that at least 10 W-15-mers were required to be aligned to such contigs. 3.1. The KQ method Although most of the B. mori W chromosome is composed of TEs 2.4. AWK threshold in transcripts and small RNAs whose copies are widely distributed on autosomes or on the Z chro- For identiﬁcation of W derived contigs from the already assembled mosome, still some subtle variations such as single nucleotide poly- genome sequences, it is also appropriate to set the same threshold morphisms (SNPs) can be observed only on W chromosome. Such with AWK¼ 10 as followed previously. However, for identiﬁcation W-speciﬁc variations may produce W-speciﬁc short sequences called of transcripts and small RNA sequences we must follow different cri- k-mers. teria. Considering that de novo assembled RNA transcripts were For obtaining these possible W-speciﬁc k-mers, we tried to ﬁnd an composed of different exons and free from introns, we could increase optimum k value (sequence length). K¼ 15, which is a typical value 11,19 the threshold to AWK¼ 50 for W-derived transcripts. Thus, we sug- used for ﬁltering errors in short reads, is suitable for female- gested that it is better to choose a number from 10 to 50, since the speciﬁc (W-speciﬁc) k-mers in B. mori. We counted the frequencies higher AWK value gave more reliable results but fewer W-derived of each 15-mer separately in female and male sequence data and cre- transcript RNA candidates and the lower AWK values gave more ated a parameter called KQ. For a given k-mer Ki, KQ(ki) ¼ M(ki)/ candidate sequences but less reliable results. As for the small RNAs, F(ki), where M(ki) is the frequency of Ki in male reads and F(ki) de- we counted the number of aligned W-15-mers for each small RNA notes the frequency of Ki in female reads. As females and males share and set a threshold with no less than three aligned W-15-mers. the same complement of autosomes, autosomal 15-mers are present Similarly, this threshold is also ﬂexible and can be increased for in both female and male sequencing data in roughly the same quanti- higher conﬁdence results, or can be decreased for a higher number of ties. Therefore, autosomal 15-mers have KQ values distributed W-linked small RNA candidates. around one (Fig. 1). As males have two Z chromosomes while fe- males have only one, the Z-speciﬁc 15-mers have KQ values distrib- uted around two. Since unique W chromosomal 15-mers are present 2.5. Application and limitation of the KQ method only in female sequencing data, these KQ values localize to zero. We Different species have different genome sizes and therefore the k also applied KQ to D. melanogaster and An. gambiae [Fig. 1, ex- value of k-mer needs to be changed according to the genome size. The k-mer size should be large enough so that identical k-mers sel- change M(ki) and F(ki) position in XX/XY species]. The results were 19,20 dom occur from sequencing errors. On the other hand, run time similar to B. mori, which conﬁrms that KQ may be applied to most sexually heterogametic species. and memory usage would be sharply increased with larger k values. In our assumption, these W or Y-speciﬁc 15-mers (KQ¼ 0) could Thus, k¼ 15 for Drosophila, and k¼ 18 for humans are typical val- 19,20 ues used for ﬁltering errors in short reads from these genomes. be used as labels to identify sequences from W or Y chromosome. To Taking the above information and results from the YGS method into verify such an assumption, we ﬁrst applied it to D. melanogaster Y, consideration, we recommend initial values of k¼ 15 for insect-like which has been relatively well sequenced. We aligned all Y-speciﬁc genomes (approximately size 100–500 Mb) and k¼ 18 for 15-mers (Y-15-mers) of D. melanogaster to its genome scaffolds vertebrate-like genomes (approximately size 2 Gb). (Genebank assembly accession: GCA_000001215.4), and a parame- ter named AYK (Amount of Y-15-mers per Kb sequence) was created for each scaffold. Based on the analysis of each scaffold’s AYK value 2.6. Molecular biology methods and length (Fig. 2A), we could show clearly that the majority of scaf- Bombyx mori (Dazao strain) larvae were reared on fresh mulberry folds are Y-derived (red points) when AYK>¼10, while scaffolds of leaves in a dedicated silkworm rearing room at ambient temperature X or autosome (blue and green points) are with the AYK< 10 and (23–27 C). Genomic DNA samples were separately isolated from some unmapped scaffolds (grey points) with AYK value>¼10 are virgin female and male individuals with tissue DNA kits (OMEGA, also possible Y-derived. The result conﬁrmed that our KQ method Norcross, GA). In the case of W contigs that had autosomal or Z could effectively identify Y-linked sequences. Except KQ method, we paralogs, primers for genomic DNA PCR were designed considering also applied previous CQ and YGS method in D. melanogaster, and the differences between the autosomal or Z paralogs and the W se- compared the results of three methods to evaluate which method quence at the 3’ end of the primer. PCR was performed with GO could be most effective (We only selected contigs with AYK>¼50 in Taq DNA Polymerase (Promega, Madison, WI; Supplementary KQ method to unify the standard of three methods). For 200 already Table S1–S3), RNA was isolated from individual embryos following known Y-scaffolds, CQ method could identify only 63 of 200 as Y- the traditional Trizol method (Ambion, Carlsbad, CA) using the total linked, whereas YGS identiﬁed 106 and our KQ provided 154 Y- RNA isolation protocol according to the manufacturer’s instructions. scaffolds. Figure 2B showed that for more than 5 kb scaffolds, three Residual DNA in samples was used to identify the gender of each methods could identify most Y-linked scaffolds, whereas for less embryo with the W_seq1 marker (Supplementary Table S1). Total than 5 kb, KQ method worked obviously better than the other two RNA (following DNase treatment) was subjected to reverse tran- TM methods. Thus, compared with other two methods, KQ method per- scription using a PrimeScript RT Master Mix (Perfect Real Time; TaKaRa, Kusatsu, Japan) in 50 ll reaction volume (2500 ng total formed more effectively for short scaffolds (less than 5 kb). As Y and RNA) and then diluted 5-fold. One lg of cDNA was used in 10 ll W chromosome sequences are usually fragmented into a large num- ber of short and mostly unmapped scaffolds due to highly repetitive PCR reaction volume. Small RNAs (following DNase treatment) TM were subjected to reverse transcription using a Mir-X miRNA sequences, KQ method would have wider application in Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018 378 A new method describing sex chromosome Figure 1. Distribution of calculated KQs. We counted the KQ value for each 15-mer and constructed a distribution histogram for B. mori, D. melanogaster and An. gambiae. It is apparent that most autosome-derived, Z specific or W unique 15-mers have distinctive KQs. Autosome derived 15-mers have a KQ value around 1 (mainly from 0.3 to 1.8), Z or X derived 15-mers have a KQ value around 2 (mainly from 1.8 to 2.2) and W or Y derived 15-mers have a KQ value around 0. Thus, we can easily identify W specific 15-mers by the KQ value. heterogametic organisms, particularly for the species in which geno- chromosome (20 M) depending on the relative cytogenetic length mic information is still scarce. and if we obtain a sufﬁcient number of W-BACs, it would be possible to construct BAC contigs covering the whole W chromosome. 3.2. De novo assembled W specific genome contigs and identification of W-BACs 3.3. Identification of W-linked transcript RNAs We obtained 93,942 high-frequency 15-mers (frequency>¼15 in Use of RNA-Seq has improved the identiﬁcation of a large number of female reads), where the KQ values were zero in B. mori. These cellularly expressed RNA species. However, little information is re- 15-mers (W-15-mers) could not be aligned onto a male genomic as- ported on how many of these are actually transcribed on the W chro- sembly, which strengthened the likelihood of W-speciﬁcity. As reads mosome. Here, we applied the KQ method to RNA-Seq data derived containing W-15-mers could be W-derived, we aligned all the W-15- from silkworm embryos to ﬁnd W-linked transcripts. We performed mers to female genome sequencing reads and screened the reads con- deep RNA-Seq for 0-24 h post-oviposition (hpo) mixed embryo sam- taining W-15-mers. We obtained about 1.1 million HiSeq paired-end ples (each sample contained 5–6 unsexed embryos), which we assem- reads (101 bp, 216 Mbp) and 0.6 million MiSeq paired-end reads bled de novo using the Trinity assembler, and obtained 175,492 (250 bp, 300 Mbp) as W-derived reads, which were assembled into transcript RNAs including different alternative splicing isoforms. We 6718 W-speciﬁc contigs using the MaSuRCA assembler. We set a then screened for valid W-derived RNAs with a strict threshold of threshold AWK (Amount of W-15-mers per Kb sequence) value¼ 10 AWK value¼ 50 and thus selected a total of 156 candidate W-linked and considered contigs with an AWK value greater than 10 as valid RNAs. Using an expression heatmap to analyse the data (Fig. 4A), W-speciﬁc contigs (Supplementary Fig. S2). Eligible contigs of 1281 we observed two main expression clusters: in Cluster A RNAs initi- were selected with a total length of around 1.9 Mb, which covered al- ated from 16 hpo; in Cluster B RNAs were continuously expressed most 1/10 of the W chromosome. To conﬁrm these contigs were in- from 0–24 hpo. deed W-derived, we randomly chose 10 of them (named W_Seq1 to To conﬁrm that these RNAs were indeed W-derived, ﬁve candi- W_Seq10) for a separate PCR test with female and male genomic dates were selected for examining their expression in individual em- DNA. The primers were designed to have W-15-mers at the 3’ end and bryos. We designed primers using their W-k-mers (Supplementary used a relatively high annealing temperature to reduce the likelihood 22–24 Table S3) in the same way as for W-BAC ends. We isolated both of primer mismatch. PCR bands appeared only in females for all DNA and RNA from single embryos aged 16 hpo, and used DNA to 10 primer pairs, (Fig. 3A), indicating that all of them were W-speciﬁc. detect the presence of the W chromosome by PCR (Supplementary To obtain the ﬁne structure of the W chromosome, we used this Fig. S3), while we followed RNA by RT-PCR to examine expression method to ﬁnd W-derived bacterial artiﬁcial chromosomes (BACs) in (Fig. 4B). The RT-PCR results showed cluster A RNAs were ex- silkworm BAC libraries. The silkworm BAC reference library (de- pressed only in female embryos, whereas cluster B RNAs were ex- rived from both sexes) contains 74,717 BACs, for which about 200– pressed in both female and male embryos. For further veriﬁcation, 500 bp ends have already been sequenced with the Sanger ABI3730 we performed a separate PCR test of both cluster A and B RNAs sequencer. Using the threshold AWK value¼ 10,122 BAC end se- with female and male genomic DNA. For these ﬁve RNAs, PCR quences were obtained to be putatively W derived, and we chose 10 bands appeared only in female genomic DNA (Fig. 4B), indicating of them at random to conﬁrm their W speciﬁcity by PCR test against both cluster RNAs were transcribed on the W chromosome. These female vs. male genomic DNA (Supplementary Table S2). The pres- results suggested that cluster B RNAs are very likely to be maternal ence of PCR bands only in female genomic DNA conﬁrmed (Fig. 3B) RNAs deposited in the embryos as they are present throughout early that all 10 BAC end sequences were W-speciﬁc, i.e. the correspond- embryos of both females and males. An interesting phenomenon in ing BACs were also W-derived. RT-PCR of cluster B RNAs is that bands were brighter in female The results indicated the KQ method could effectively assemble than in male, which indicated the expressions of cluster B RNAs are and identify contigs originating from the W chromosome. These con- tigs will be invaluable in the search for W-speciﬁc PCR markers. The different in female and male embryos. Thus, we did the qPCR of a length of B. mori W chromosome may have the same size with Z cluster B RNAs (DN42314_c0) in female and male embryos from 0 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018 S. Li et al. 379 Figure 3. Female-specific PCR amplification of de novo assembled contigs (A) and BAC end sequences (B). (A) PCR performed with three female and three male genomic DNA samples shows female-specific amplification of 10 W-contigs. Chr 2 sequence was amplified as a control in both female and male genomic DNA confirming the integrity of the genomic DNA samples. (B) PCR carried out with three female and three male genomic DNA samples Figure 2. Identification of Y-linked sequences in D. melanogaster. (A) We ap- shows female-specific amplification of 10 BAC end sequences. plied KQ method in genome sequences of D. melanogaster and counted length and AYK value for each scaffold. For the vast majority of Autosome With GO annotation and an NCBI BLAST search of 22 candidate (green point) and X-derived (blue point) scaffolds, their AYK values are less than 10 (usually close to 0); Y-derived scaffolds (red point) were with AYK coding RNA sequences, we found that many of them had high simi- value more than 10. Unmapped scaffolds (grey point) have dispersive AYK larity with some B. mori uncharacterized genes located on an auto- values, we speculated that scaffolds with AYK value more than 10 are possi- some or Z chromosome (Supplementary Table S4), suggesting these ble Y-linked. (B) Compared with the CQ and YGS methods, KQ method could genes may have been translocated onto the W chromosome through identify more precise Y-derived scaffolds in D. melanogaster. Based on the transposons during evolution. The GO analysis showed that these analysis of scaffold length and number, the results of three methods were coding transcripts were diverse, possibly involved in DNA integra- similar when the length of scaffold is more than 5 kb, whereas KQ method (red pillar) could identify more precisely Y-linked scaffolds when the length tion and metabolic processes, nucleic acid binding, nucleotidyltrans- of scaffold is less than 5 kb. ferase activity, etc. (Supplementary Fig. S4). to 36 hpo, and found sex-biased expression starting from 8 hpo (Fig. 4C). The expression levels were almost same in both sex em- 3.5. Identification of W-linked small RNAs bryos before 8 hpo, but after that, it seemed to fade away in male The W chromosome being a source of female-enriched piRNAs in- embryos and de novo expressed in female embryos. dicates that W chromosome function mainly depends on them. For example, Fem piRNA was reported to be the primary sex-determiner in silkworm. As small RNAs are usually around 18–30 nt, it is dif- 3.4. Annotation of W-linked transcript RNAs ﬁcult to determine whether or not they are W-derived. Using shorter We found 156 W-linked RNAs, among which we predicted 134 were length W-15-mers, the KQ method can also be used to identify W-de- 28,29 non-coding RNAs by their Coding-Non-Coding Index (CNCI). rived small RNAs from the pool of small RNA sequences in B. mori. Non-coding RNAs, most of which are long ncRNAs (lncRNAs), are Kawaoka et al. (2008) identiﬁed 38,493 small RNA species from the main repertoire of the transcripts expressed from the W chromo- ovarian RNA (pupa day 4) in the silkworm, while we identiﬁed some. We found 93 ovarian piRNAs among the lncRNAs through 72,439 small RNA species from embryo RNA of day 8 pupa. We NCBI BLAST, which implied many lncRNAs seemed to be precur- merged the two small RNA datasets, which amounted to 106,717 sors of piRNAs. small RNA species and then aligned W-15-mers to all these small Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018 380 A new method describing sex chromosome Figure 4. W-derived transcripts, expression profiles and genomic PCR amplification. (A) Hierarchical cluster analysis of W-derived transcripts. Expression pro- files of transcripts in mixed embryos (0–24 hpo) were clustered and visualized in a heatmap. (B) PCR amplification of W-derived transcripts in cDNA (Left panel) and gDNA (Right panel). RT-PCR (cDNA) showed expression as cluster A (female embryos specific) or cluster B (in both female and male embryos). However,a PCR test of genomic DNA showed both transcript RNAs were female-specific, indicating that they are W-derived, but cluster A RNAs are zygotically expressed in female embryos whereas cluster B RNAs are probably maternal RNAs deposited into eggs. (C) qPCR of a cluster B RNA (DN42314_c0) from 0 to 36 hpo ex- hibited its sex-biased expression in embyos. RNA sequences to screen small RNAs with no fewer than three 4. Discussion aligned W-15-mers, resulting in 345 candidates. Their sizes were mostly around 27–29 nt (Fig. 5A) with a strong bias for U(T) at the In this study, we ﬁrst explored the existence of W or Y-speciﬁc 15- 5’ end, indicating that most of the W-linked small RNAs were mers in B. mori, D. melanogaster and An. gambiae, then developed piRNAs. We randomly chose 5 small RNAs out of the 345 candi- the KQ method and obtained a successful result in D. melanogaster. dates to check their expression levels using RT-PCR in separate fe- After that, we applied KQ method in B. mori to assemble W-linked male and male pupal cDNA (day 4, Supplementary Table S5). The genome contigs and to identify W-linked transcript RNAs and small bands appeared only in females for all the ﬁve small RNAs (Fig. 5B), RNAs effectively. In contrast to the previously reported CQ and which demonstrated that they were indeed W-derived. YGS methods, our KQ method is more effective and accurate Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018 S. Li et al. 381 RNA, silencing of Masc by Fem piRNA is required for the produc- tion of female-speciﬁc isoforms of Bmdsx in female embryos and that a Masc-associated process may control both dosage compensa- 32,34,35 tion and masculinization in male embryos. This suggests that W-linked non-coding RNAs, especially piRNAs, play important roles in sex-determination and dosage compensation, and may be the main functional form encoded on the W. Also many W-derived piRNAs are expressed at the pupal period, which may function to 36,37 maintain germline stem cells, suggesting that the effects of W- linked piRNAs are continuous throughout female life. Acknowledgements The authors thank Youbing Guo, Duolian Liu and Li Peng for assistance with the information analysis. They also appreciate Zha XF for providing the em- bryo RNA-Seq data. Conflict of interest The authors declare no competing ﬁnancial interests. Funding This work was supported by the grant from the One Thousand Foreign Figure 5. Length distribution graph of W-linked small RNAs and female-spe- cific RT-PCR amplification. (A) The lengths of most W-derived small RNAs Experts Recruitment Program of the Chinese Government (No. WQ were concentrated on around 27–29 nt and their first bases from the 5’ end 20125500074). were preferentially T (U), which indicated most W-derived small RNAs are piRNAs. (B) Female-specific RT-PCR amplifications indicated that all five small RNAs were indeed W-derived. Supplementary data Supplementary data are available at DNARES online. especially in short sequences. In addition, the KQ method opens up a novel approach for de novo assembly of W contigs, and is now the only method available to identify W or Y-speciﬁc small RNAs. Also References KQ method can be used to identify sequences linked to X or Z chro- 1. Matsuda, M., Nagahama, Y. and Shinomiya, A. 2002, DMY is a mosome if we used X or Z-speciﬁc 15-mers (KQ value around 2). As Y-speciﬁc DM-domain gene required for male development in the medaka a genome reference sequence is not necessary, the KQ method can be ﬁsh, Nature, 417, 559–63. applied to most species with heterogametic sex chromosomes where 2. Ayers, K.L., Davidson, N.M., Demiyah, D., et al. 2013, RNA sequencing genome information is scarce, and it could become an useful tool for reveals sexually dimorphic gene expression before gonadal differentiation obtaining whole genome information of heterogametic sex chromo- in chicken and allows comprehensive annotation of the W-chromosome, somes (Y or W). Genome Biol., 14,R26. The complete biological function of the B. mori W chromosome is 3. Chen, N., Bellott, D.W., Page, D.C. and Clark, A.G. 2012, Identiﬁcation of avian W-linked contigs by short-read sequencing, BMC Genomics, 13,183. still unclear. Although many scientists have tried to uncover the bio- 2,26,32 4. Schartl, M., Schmid, M. and Nanda, I. 2016, Dynamics of vertebrate sex logical process of sex-determination in this species, the lack of chromosome evolution: from equal size to giants and dwarfs, W chromosome speciﬁc sequence information hinders the annotation Chromosoma, 125, 553–71. and molecular studies of the W. Here, we showed that the KQ 5. Consortium, I.C.G.S. 2004, Sequence and comparative analysis of the method could help in drastically changing this situation. We ob- chicken genome provide unique perspectives on vertebrate evolution, tained W-linked transcripts and small RNAs without the use of com- Nature, 432, 695–716. plete genome sequences. These RNA sequences will be helpful in the 6. Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P.J., et al. 2003, The functional study of the W chromosome. Based on preliminary tran- male-speciﬁc region of the human Y chromosome is a mosaic of discrete script RNA analysis, we found many lncRNAs and a few possible sequence classes, Nature, 423, 825–37. coding RNAs expressed from 16 hpo when the sex-determination 7. Foote, S., Vollrath, D., Hilton, A. and Page, D. 1992, The human Y chro- mosome: overlapping DNA clones spanning the euchromatic region, cascade initiates, and which may be involved in the sex determina- Science, 258, 60–6. tion process. Many lncRNAs seem to be the precursors of piRNAs, 8. Hall, A.B., Papathanos, P.A., Sharma, A., et al. 2016, Radical remodeling and our small RNA analysis showed that most W-linked small of the Y chromosome in a recent radiation of malaria mosquitoes, Proc. RNAs are likely piRNAs, which indicates that piRNA based gene Natl. Acad. Sci. USA, 113, E2114–23. regulation may be the main function of the B. mori W chromosome. 9. Krsticevic, F.J., Schrago, C.G. and Carvalho, A.B. 2015, Long-read single piRNA associates with PIWI family proteins whose complexes molecule sequencing to resolve tandem gene copies: the Mst77Y region on cleave target sequences to silence genes or transposons (Aravin et al., the drosophila melanogaster Y chromosome, G3: Genes|Genomes|Genetics, 2006; Saito et al., 2006; Brennecke et al., 2007). For example, 5, 1145–50. Kiuchi et al. (2014) found that a speciﬁc W-linked piRNA (Fem 10. Traut, W., Vogel, H., Glockner, G., Hartmann, E. and Heckel, D.G. piRNA) interacts with the product of a Z-linked Masculinizer (Masc) 2013, High-throughput sequencing of a single chromosome: a moth W Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018 382 A new method describing sex chromosome chromosome, Chromosome Res.: Int. J. Mol. Supramol. Evol. Aspects 24. Abe, H., Seki, M. and Ohbayashi, F. 2005, Partial deletions of the W Chromosome Biol., 21, 491–505. chromosome due to reciprocal translocation in the silkworm Bombyx 11. Carvalho, A.B. and Clark, A.G. 2013, Efﬁcient identiﬁcation of Y chro- mori, Insect Mol. Biol., 14, 339–52. mosome sequences in the human and Drosophila genomes, Genome Res., 25. Suetsugu, Y., Minami, H., Shimomura, M., et al. 2007, End-sequencing 23, 1894–907. and characterization of silkworm (Bombyx mori) bacterial artiﬁcial chro- 12. Hall, A.B., Qi, Y., Timoshevskiy, V., Sharakhova, M.V., Sharakhov, I.V. mosome libraries, BMC Genomics, 8, 314. and Tu, Z. 2013, Six novel Y chromosome genes in Anopheles mosquitoes 26. Traut, W., Sahara, K. and Marec, F. 2007, Sex chromosomes and sex de- discovered by independently sequencing males and females, BMC termination in Lepidoptera, Sexual Dev.: Genetics, Mol. Biol. Evol. Genomics, 14, 273. Endocrinol. Embryol. Pathol. Sex Determination Differentiation, 1, 13. Carvalho, A.B., Dobo, B.A., Vibranovski, M.D. and Clark, A.G. 2001, 332–46. Identiﬁcation of ﬁve new genes on the Y chromosome of Drosophila mela- 27. Grabherr, M.G., Haas, B.J., Yassour, M., et al. 2011, Trinity: reconstruct- nogaster, Proc. Natl. Acad. Sci. USA, 98, 13225–30. ing a full-length transcriptome without a genome from RNA-Seq data, 14. International Silkworm Genome, C. 2008, The genome of a lepidopteran Nat. Biotechnol., 29, 644–52. model insect, the silkworm Bombyx mori, Insect Biochem. Mol. Biol., 38, 28. Wu, Y., Cheng, T., Liu, C., et al. 2016, Systematic identiﬁcation and char- 1036–45. acterization of long non-coding RNAs in the silkworm, Bombyx mori, 15. Malone, J.H., Cho, D.Y., Mattiuzzo, N.R., et al. 2012, Mediation of PloS One, 11, e0147147. Drosophila autosomal dosage effects and compensation by network inter- 29. Sun, L., Luo, H., Bu, D., et al. 2013, Utilizing sequence intrinsic composi- actions, Genome Biol., 13, r28. tion to classify protein-coding and long non-coding transcripts, Nucleic 16. Tomaszkiewicz, M., Medvedev, P. and Makova, K.D. 2017, Y and W chro- Acids Res., 41, e166. mosome assemblies: approaches and discoveries, Trends Genet., 33, 266–82. 30. Wang, J., Long, M. and Vibranovski, M.D. 2012, Retrogenes moved out 17. Marcais, G. and Kingsford, C. 2011, A fast, lock-free approach for efﬁ- of the z chromosome in the silkworm, J. Mol. Evol., 74, 113–26. cient parallel counting of occurrences of k-mers, Bioinformatics, 27, 31. Kawaoka, S., Kadota, K., Arai, Y., et al. 2011, The silkworm W chromo- 764–70. some is a source of female-enriched piRNAs, RNA, 17, 2144–51. 18. Langmead, B. 2010, Aligning short sequencing reads with Bowtie. Curr. 32. Kiuchi, T., Koga, H., Kawamoto, M., et al. 2014, A single female-speciﬁc Protoc. Bioinformatics, 32, 11.17.11–11.17.14. piRNA is the primary determiner of sex in the silkworm, Nature, 509, 19. Kelley, D.R., Schatz, M.C. and Salzberg, S.L. 2010, Quake: 633–6. quality-aware detection and correction of sequencing errors, Genome 33. Sakai, H., Aoki, F. and Suzuki, M. G. 2014, Identiﬁcation of the key Biol., 11,R116. stages for sex determination in the silkworm, Bombyx mori, Dev. Genes 20. Li, R., Zhu, H., Ruan, J., et al. 2010, De novo assembly of human ge- Evol., 224, 119–23. nomes with massively parallel short read sequencing, Genome Res., 20, 34. Brennecke, J., Aravin, A.A., Stark, A., et al. 2007, Discrete small 265–72. RNA-generating loci as master regulators of transposon activity in 21. Zimin, A.V., Marcais, G., Puiu, D., Roberts, M., Salzberg, S.L. and Drosophila, Cell, 128, 1089–103. Yorke, J.A. 2013, The MaSuRCA genome assembler, Bioinformatics, 29, 35. Saito, K., Nishida, K.M., Mori, T., et al. 2006, Speciﬁc association of Piwi 2669–77. with rasiRNAs derived from retrotransposon and heterochromatic regions 22. Fan, X.Y., Hu, Z.Y., Xu, F.H., Yan, Z.Q., Guo, S.Q. and Li, Z.M. 2003, in the Drosophila genome, Genes Dev., 20, 2214–22. Rapid detection of rpoB gene mutations in rifampin-resistant mycobacte- 36. Lin, H. and Spradling, A.C. 1997, A novel group of pumilio mutations af- rium tuberculosis isolates in shanghai by using the ampliﬁcation refrac- fects the asymmetric division of germline stem cells in the Drosophila tory mutation system, J. Clin. Microbiol., 41, 993–7. ovary, Development (Cambridge, England), 124, 2463–76. 23. Promboon, A., Shimada, T., Fujiwara, H. and Kobayashi, M. 1995, 37. Pal-Bhadra, M., Leibovitch, B.A., Gandhi, S.G., et al. 2004, Linkage map of random ampliﬁed polymorphic DNAs (RAPDs) in the Heterochromatic silencing and HP1 localization in Drosophila are depen- silkworm, Genet. Res., 66,1. dent on the RNAi machinery, Science, 303, 669–72. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/4/375/4958748 by Ed 'DeepDyve' Gillespie user on 22 August 2018
DNA Research – Oxford University Press
Published: Aug 1, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
All the latest content is available, no embargo periods.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud