Notwithstanding the rapid developments in sequencing techniques, Y and W sex chromosomes have still been mostly excluded from whole genome sequencing projects due to their high re- petitive DNA content. Therefore, Y and W chromosomes are poorly described in most species despite their biological importance. Several methods were developed for identifying Y or W- linked sequences among unmapped scaffolds. However, it is not enough to discover functional regions from short unmapped scaffolds. Here, we provide a new and simple strategy based on k-mer comparison for comprehensive analysis of the W chromosome in Bombyx mori. Using this novel method, we effectively assembled de novo 1281 W-derived genome contigs (totaling 1.9 Mbp), and identified 156 W-linked transcript RNAs and 345 W-linked small RNAs. This method will help in the elucidation of mechanisms of sexual development and exploration of W chromosome biological functions, and provide insights into the evolution of sex chromo- somes. Moreover, we showed this method can be employed in identifying heterogametic sex chromosomes (W and Y chromosomes) in many other species where genomic information is still scarce. Key words: heterogametic sex chromosome, silkworm, non-coding RNA 1. Introduction Bombyx mori, Gallus gallus. Previously people believed a sex deter- In most animal species, sex-determination is governed by chromo- mining gene would be localized in a unique sequence domain of a somal differences involving two common sex determination systems: heterogametic sex chromosome; thus many studies were focused on XX/XY sex chromosomes determine female/male, e.g. in Homo sapi- ﬁshing or positioning unique sequence domains on W- or Y-chromo- ens, Drosophila melanogaster and Anopheles gambiae and other spe- somes. In this way, DMY of the medaka ﬁsh was identiﬁed as a sex- cies and WZ/ZZ sex chromosomes determine female/male, e.g. in determining gene. However, Y and W chromosomes present special V C The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact email@example.com 1 Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018 2 A new method describing sex chromosome 3–5 challenges for whole genome shotgun assembly, because they are SRA accession number PRJNA393916. B. mori female and male enriched with interspersed repeat elements and segmental duplica- HiSeq reads were produced on an Illumina HiSeq 2000 sequencer 6,7 tions. Even though some unique domains exist on Y or W, many (101-bp pair-end using pools of larval posterior silk gland which transposable elements (TEs) interrupt these unique sequences into yielded 73-fold coverage of genome for each sex). B. mori female short pieces, making it difﬁcult to ﬁnd such unique domains. Thus, MiSeq reads (only used for assisting de novo assembly of W contigs) whole genome shotgun methods, even the latest third generation se- were produced on an Illumina MiSeq sequencer (250-bp pair-end; pools of female larvae; 20-fold genome coverage). All reads were quencing techniques (e.g. PacBIO) which can produce very long from Dazao, the inbred strain used in the genome project. The next reads, cannot effectively assemble high quality contigs or scaffolds of 8–10 generation sequencing data for male and female D. melanogaster Y and W chromosomes at present stage, which obstructs studies pooled 5 day old mated adults was downloaded from the NCBI of sex-determination and sexual development and reproduction. Sequence Read Archive [SRA: SRP007888]; the next generation se- In most genome projects, Y and W chromosome sequences are frag- quence data for male and female An. gambiae pooled wild type mented into a large number of small, unmapped scaffolds. Recently strain was downloaded from NCBI Sequence Read Archive [SRA: a few researchers have applied the CQ (Chromosome Quotient) 12,15 12 11 SRP014756]. Preferably, to ensure the accuracy of the KQ method or YGS (Y chromosome Genome Scan) method to identify method, male and female sequence data should be from highly in- Y-linked sequences in these unmapped scaffolds. The CQ method uses bred populations and data for both sexes should be of the same the number of alignments from male and female sequence data to de- amount. Since the coverage of the next-generation sequence data can termine whether a sequence is Y-linked. The YGS method, on the affect the calculation of the KQ value, from our experience, we sug- other hand, is based on a comparison of the assembled genome with 11,13 gest that more than 50X coverage is an appropriate choice, which female short-reads. We tried to apply both methods in B. mori to can produce a more precise high-frequency W-15-mer. B. mori ﬁnd W-derived contigs. However, both methods can only identify lim- RNA-Seq data of mixed sex embryos from 0–24 hpo were produced ited number of contigs. In CQ method all repeat sequences must be using an Illumina HiSeq 2000 sequencer, yielding 101-bp pair-end masked and the female to male ratio is not always precise, especially in and 24 G clean data for each of the different periods (0, 2, 4, 8, 16 short sequences (length<¼ 2 kb) with few reads aligned. As YGS and 24 hpo). Small RNA sequences have been deposited in method is based on genome sequences, implementing it in species with- GenBank/EMBL/DDBJ under accession numbers from AB386191– out reference genome is difﬁcult. Consequently, both methods cannot AB424683. work very effectively in B. mori, especially for short sequences. Hence, neither of them provides a reliable strategy for de novo assembly of Y or W chromosomes, which restricted the application of the two meth- 2.2. Criteria for filtrating valid W-specific k-mer ods in many other species. The genome sequence of males and females differ only by the W Just like the previously mentioned two methods, many studies chromosome, which produces W-speciﬁc k-mers. These k-mers are concentrated on how to eliminate repetitive sequences from the se- present only in the female data, but absent from the male data. quence data. However, the majority of Y or W chromosomes are Before doing a female vs. male k-mer comparison, removal of se- composed of repetitive sequences and many functional domains may quencing errors in the reads is important in order to achieve high res- be inserted into TEs, which means very limited unique contigs can be olution. This was done using two criteria: (i) masking low quality found if all repetitive sequences are removed. Nevertheless, many bases and (ii) ﬁltering k-mers with rare frequency. Speciﬁcally, for mutations accumulate on Y or W repetitive sequences and Y/W chro- the B. mori reads, the sequencing errors were ﬁltered by masking ba- mosome have no recombination with X/Z chromosome, which may ses with phred score<20 and removing 15-mers that were present generate some speciﬁc SNPs that can be used as molecular markers fewer than 5X in reads. Counting k-mers for separate female and to ﬁnd Y or W linked sequences. So we can utilize these repetitive se- male data and removal of sequencing errors can be done using 16,17 quences, instead of masking them, for identifying Y or W linked se- Jellyﬁsh. We set up the KQ parameter, for a given k-mer Ki, quences. Based on this concept, we applied a novel K-mer KQ(ki) ¼ M(ki)/F(ki), where M(ki) is the frequency of Ki in male Quotient (KQ) method as a more easy, effective and accurate way reads, F(ki) is the frequency of Ki in female reads. To obtain reliable for comprehensive analysis of the B. mori W chromosome. It can W-derived k-mers, we used the strict criterion of KQ¼ 0. To reduce provide a strategy for de novo assembling of contigs of Y or W chro- the interference of individual polymorphism in sequencing reads, we mosomes, unlike the CQ or YGS method. Moreover, in addition to ﬁltered out these k-mers (KQ¼ 0) with low frequency (<15, B. mori, we successfully applied this method to two diptera, D. mela- Supplementary Fig. S1) in female reads. nogaster and An. gambiae, as XX/XY sex determination species. As a complete genome sequence is not necessary and removal of repeat 2.3. W-derived reads and de novo assembly strategy sequences is not needed for the successful implementation of our Reads containing W-15-mers could be W-derived and therefore method, it can also be applied to species where genomic information called W-reads. We screened the W-reads from B. mori Illumina fe- is scarce. Additionally, the KQ method can be extended for the iden- male reads with Bowtie. Since the reads were paired ends, if a sin- tiﬁcation of Y or W-linked transcript RNAs and small RNAs, thus gle read sequence was W-derived (contained W-15-mer), its providing an innovative approach to study the functions of W or Y corresponding pair end read could be also W-derived even without a chromosomes without complete sequence information. W-15-mer. Based on this assumption, we screened W-derived paired- end reads and assembled them using the MaSuRCA assembler. To prevent incorporation of invalid contigs originating from misassem- 2. Materials and methods bly, a criterion was followed to ﬁlter out invalid contigs. We aligned 2.1. Short reads all the W-15mers to all contigs with 0 mismatch using Bowtie, and All sequencing data produced in the present study have been depos- calculated the amount of W-15-mers per Kb sequence (AWK) ited in the NCBI Short Read Archive and can be accessed under the for each contig. From the scatter distribution map (Supplementary Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018 S. Li et al. 3 Fig. S2), we found that when the AWK value was less than 10, the First-Strand Synthesis Kit (TaKaRa, Kusatsu, Japan). qPCR was per- scatters deviated from the main trend line, indicating that such con- formed with SYBR Premix Ex (TaKaRa, Kusatsu, Japan). tigs with fewer W-15-mers were possibly not W-derived. Thus, we set a threshold of AWK¼ 10 for ﬁltering invalid contigs. Whereas in 3. Results and discussion the case of very short contigs (less than 1 kb), we suggested that at least 10 W-15-mers were required to be aligned to such contigs. 3.1. The KQ method Although most of the B. mori W chromosome is composed of TEs 2.4. AWK threshold in transcripts and small RNAs whose copies are widely distributed on autosomes or on the Z chro- For identiﬁcation of W derived contigs from the already assembled mosome, still some subtle variations such as single nucleotide poly- genome sequences, it is also appropriate to set the same threshold morphisms (SNPs) can be observed only on W chromosome. Such with AWK¼ 10 as followed previously. However, for identiﬁcation W-speciﬁc variations may produce W-speciﬁc short sequences called of transcripts and small RNA sequences we must follow different cri- k-mers. teria. Considering that de novo assembled RNA transcripts were For obtaining these possible W-speciﬁc k-mers, we tried to ﬁnd an composed of different exons and free from introns, we could increase optimum k value (sequence length). K¼ 15, which is a typical value 11,19 the threshold to AWK¼ 50 for W-derived transcripts. Thus, we sug- used for ﬁltering errors in short reads, is suitable for female- gested that it is better to choose a number from 10 to 50, since the speciﬁc (W-speciﬁc) k-mers in B. mori. We counted the frequencies higher AWK value gave more reliable results but fewer W-derived of each 15-mer separately in female and male sequence data and cre- transcript RNA candidates and the lower AWK values gave more ated a parameter called KQ. For a given k-mer Ki, KQ(ki) ¼ M(ki)/ candidate sequences but less reliable results. As for the small RNAs, F(ki), where M(ki) is the frequency of Ki in male reads and F(ki) de- we counted the number of aligned W-15-mers for each small RNA notes the frequency of Ki in female reads. As females and males share and set a threshold with no less than three aligned W-15-mers. the same complement of autosomes, autosomal 15-mers are present Similarly, this threshold is also ﬂexible and can be increased for in both female and male sequencing data in roughly the same quanti- higher conﬁdence results, or can be decreased for a higher number of ties. Therefore, autosomal 15-mers have KQ values distributed W-linked small RNA candidates. around one (Fig. 1). As males have two Z chromosomes while fe- males have only one, the Z-speciﬁc 15-mers have KQ values distrib- uted around two. Since unique W chromosomal 15-mers are present 2.5. Application and limitation of the KQ method only in female sequencing data, these KQ values localize to zero. We Different species have different genome sizes and therefore the k also applied KQ to D. melanogaster and An. gambiae [Fig. 1, ex- value of k-mer needs to be changed according to the genome size. The k-mer size should be large enough so that identical k-mers sel- change M(ki) and F(ki) position in XX/XY species]. The results were 19,20 dom occur from sequencing errors. On the other hand, run time similar to B. mori, which conﬁrms that KQ may be applied to most sexually heterogametic species. and memory usage would be sharply increased with larger k values. In our assumption, these W or Y-speciﬁc 15-mers (KQ¼ 0) could Thus, k¼ 15 for Drosophila, and k¼ 18 for humans are typical val- 19,20 ues used for ﬁltering errors in short reads from these genomes. be used as labels to identify sequences from W or Y chromosome. To Taking the above information and results from the YGS method into verify such an assumption, we ﬁrst applied it to D. melanogaster Y, consideration, we recommend initial values of k¼ 15 for insect-like which has been relatively well sequenced. We aligned all Y-speciﬁc genomes (approximately size 100–500 Mb) and k¼ 18 for 15-mers (Y-15-mers) of D. melanogaster to its genome scaffolds vertebrate-like genomes (approximately size 2 Gb). (Genebank assembly accession: GCA_000001215.4), and a parame- ter named AYK (Amount of Y-15-mers per Kb sequence) was created for each scaffold. Based on the analysis of each scaffold’s AYK value 2.6. Molecular biology methods and length (Fig. 2A), we could show clearly that the majority of scaf- Bombyx mori (Dazao strain) larvae were reared on fresh mulberry folds are Y-derived (red points) when AYK>¼10, while scaffolds of leaves in a dedicated silkworm rearing room at ambient temperature X or autosome (blue and green points) are with the AYK< 10 and (23–27 C). Genomic DNA samples were separately isolated from some unmapped scaffolds (grey points) with AYK value>¼10 are virgin female and male individuals with tissue DNA kits (OMEGA, also possible Y-derived. The result conﬁrmed that our KQ method Norcross, GA). In the case of W contigs that had autosomal or Z could effectively identify Y-linked sequences. Except KQ method, we paralogs, primers for genomic DNA PCR were designed considering also applied previous CQ and YGS method in D. melanogaster, and the differences between the autosomal or Z paralogs and the W se- compared the results of three methods to evaluate which method quence at the 3’ end of the primer. PCR was performed with GO could be most effective (We only selected contigs with AYK>¼50 in Taq DNA Polymerase (Promega, Madison, WI; Supplementary KQ method to unify the standard of three methods). For 200 already Table S1–S3), RNA was isolated from individual embryos following known Y-scaffolds, CQ method could identify only 63 of 200 as Y- the traditional Trizol method (Ambion, Carlsbad, CA) using the total linked, whereas YGS identiﬁed 106 and our KQ provided 154 Y- RNA isolation protocol according to the manufacturer’s instructions. scaffolds. Figure 2B showed that for more than 5 kb scaffolds, three Residual DNA in samples was used to identify the gender of each methods could identify most Y-linked scaffolds, whereas for less embryo with the W_seq1 marker (Supplementary Table S1). Total than 5 kb, KQ method worked obviously better than the other two RNA (following DNase treatment) was subjected to reverse tran- TM methods. Thus, compared with other two methods, KQ method per- scription using a PrimeScript RT Master Mix (Perfect Real Time; TaKaRa, Kusatsu, Japan) in 50 ll reaction volume (2500 ng total formed more effectively for short scaffolds (less than 5 kb). As Y and RNA) and then diluted 5-fold. One lg of cDNA was used in 10 ll W chromosome sequences are usually fragmented into a large num- ber of short and mostly unmapped scaffolds due to highly repetitive PCR reaction volume. Small RNAs (following DNase treatment) TM were subjected to reverse transcription using a Mir-X miRNA sequences, KQ method would have wider application in Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018 4 A new method describing sex chromosome Figure 1. Distribution of calculated KQs. We counted the KQ value for each 15-mer and constructed a distribution histogram for B. mori, D. melanogaster and An. gambiae. It is apparent that most autosome-derived, Z specific or W unique 15-mers have distinctive KQs. Autosome derived 15-mers have a KQ value around 1 (mainly from 0.3 to 1.8), Z or X derived 15-mers have a KQ value around 2 (mainly from 1.8 to 2.2) and W or Y derived 15-mers have a KQ value around 0. Thus, we can easily identify W specific 15-mers by the KQ value. heterogametic organisms, particularly for the species in which geno- contigs will be invaluable in the search for W-speciﬁc PCR markers. mic information is still scarce. The length of B. mori W chromosome may have the same size with Z chromosome (20 M) depending on the relative cytogenetic length and if we obtain a sufﬁcient number of W-BACs, it would 3.2. De novo assembled W specific genome contigs be possible to construct BAC contigs covering the whole W and identification of W-BACs chromosome. We obtained 93,942 high-frequency 15-mers (frequency>¼15 in female reads), where the KQ values were zero in B. mori. These 3.3. Identification of W-linked transcript RNAs 15-mers (W-15-mers) could not be aligned onto a male genomic as- Use of RNA-Seq has improved the identiﬁcation of a large number of sembly, which strengthened the likelihood of W-speciﬁcity. As reads containing W-15-mers could be W-derived, we aligned all the W-15- cellularly expressed RNA species. However, little information is re- ported on how many of these are actually transcribed on the W chro- mers to female genome sequencing reads and screened the reads con- mosome. Here, we applied the KQ method to RNA-Seq data derived taining W-15-mers. We obtained about 1.1 million HiSeq paired-end from silkworm embryos to ﬁnd W-linked transcripts. We performed reads (101 bp, 216 Mbp) and 0.6 million MiSeq paired-end reads deep RNA-Seq for 0-24 h post-oviposition (hpo) mixed embryo sam- (250 bp, 300 Mbp) as W-derived reads, which were assembled into ples (each sample contained 5–6 unsexed embryos), which we assem- 6718 W-speciﬁc contigs using the MaSuRCA assembler. We set a bled de novo using the Trinity assembler, and obtained 175,492 threshold AWK (Amount of W-15-mers per Kb sequence) value¼ 10 transcript RNAs including different alternative splicing isoforms. We and considered contigs with an AWK value greater than 10 as valid then screened for valid W-derived RNAs with a strict threshold of W-speciﬁc contigs (Supplementary Fig. S2). Eligible contigs of 1281 were selected with a total length of around 1.9 Mb, which covered al- AWK value¼ 50 and thus selected a total of 156 candidate W-linked most 1/10 of the W chromosome. To conﬁrm these contigs were in- RNAs. Using an expression heatmap to analyse the data (Fig. 4A), deed W-derived, we randomly chose 10 of them (named W_Seq1 to we observed two main expression clusters: in Cluster A RNAs initi- W_Seq10) for a separate PCR test with female and male genomic ated from 16 hpo; in Cluster B RNAs were continuously expressed from 0–24 hpo. DNA. The primers were designed to have W-15-mers at the 3’ end To conﬁrm that these RNAs were indeed W-derived, ﬁve candi- and used a relatively high annealing temperature to reduce the likeli- 22–24 dates were selected for examining their expression in individual em- hood of primer mismatch. PCR bands appeared only in females bryos. We designed primers using their W-k-mers (Supplementary for all 10 primer pairs, (Fig. 3A), indicating that all of them were W- Table S3) in the same way as for W-BAC ends. We isolated both speciﬁc. DNA and RNA from single embryos aged 16 hpo, and used DNA to To obtain the ﬁne structure of the W chromosome, we used this method to ﬁnd W-derived bacterial artiﬁcial chromosomes (BACs) in detect the presence of the W chromosome by PCR (Supplementary silkworm BAC libraries. The silkworm BAC reference library (de- Fig. S3), while we followed RNA by RT-PCR to examine expression rived from both sexes) contains 74,717 BACs, for which about 200– (Fig. 4B). The RT-PCR results showed cluster A RNAs were ex- 500 bp ends have already been sequenced with the Sanger ABI3730 pressed only in female embryos, whereas cluster B RNAs were ex- sequencer. Using the threshold AWK value¼ 10,122 BAC end se- pressed in both female and male embryos. For further veriﬁcation, quences were obtained to be putatively W derived, and we chose 10 we performed a separate PCR test of both cluster A and B RNAs of them at random to conﬁrm their W speciﬁcity by PCR test against with female and male genomic DNA. For these ﬁve RNAs, PCR bands appeared only in female genomic DNA (Fig. 4B), indicating female vs. male genomic DNA (Supplementary Table S2). The pres- both cluster RNAs were transcribed on the W chromosome. These ence of PCR bands only in female genomic DNA conﬁrmed (Fig. 3B) that all 10 BAC end sequences were W-speciﬁc, i.e. the correspond- results suggested that cluster B RNAs are very likely to be maternal ing BACs were also W-derived. RNAs deposited in the embryos as they are present throughout early The results indicated the KQ method could effectively assemble embryos of both females and males. An interesting phenomenon in and identify contigs originating from the W chromosome. These RT-PCR of cluster B RNAs is that bands were brighter in female Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018 S. Li et al. 5 Figure 3. Female-specific PCR amplification of de novo assembled contigs (A) and BAC end sequences (B). (A) PCR performed with three female and three male genomic DNA samples shows female-specific amplification of 10 W-contigs. Chr 2 sequence was amplified as a control in both female and male genomic DNA confirming the integrity of the genomic DNA samples. (B) PCR carried out with three female and three male genomic DNA samples Figure 2. Identification of Y-linked sequences in D. melanogaster. (A) We ap- shows female-specific amplification of 10 BAC end sequences. plied KQ method in genome sequences of D. melanogaster and counted length and AYK value for each scaffold. For the vast majority of Autosome chromosome. We found 93 ovarian piRNAs among the lncRNAs (green point) and X-derived (blue point) scaffolds, their AYK values are less than 10 (usually close to 0); Y-derived scaffolds (red point) were with AYK through NCBI BLAST, which implied many lncRNAs seemed to be value more than 10. Unmapped scaffolds (grey point) have dispersive AYK precursors of piRNAs. values, we speculated that scaffolds with AYK value more than 10 are possi- With GO annotation and an NCBI BLAST search of 22 candidate ble Y-linked. (B) Compared with the CQ and YGS methods, KQ method could coding RNA sequences, we found that many of them had high simi- identify more precise Y-derived scaffolds in D. melanogaster. Based on the larity with some B. mori uncharacterized genes located on an auto- analysis of scaffold length and number, the results of three methods were some or Z chromosome (Supplementary Table S4), suggesting these similar when the length of scaffold is more than 5 kb, whereas KQ method (red pillar) could identify more precisely Y-linked scaffolds when the length genes may have been translocated onto the W chromosome through of scaffold is less than 5 kb. transposons during evolution. The GO analysis showed that these coding transcripts were diverse, possibly involved in DNA integra- than in male, which indicated the expressions of cluster B RNAs are tion and metabolic processes, nucleic acid binding, nucleotidyltrans- different in female and male embryos. Thus, we did the qPCR of a ferase activity, etc. (Supplementary Fig. S4). cluster B RNAs (DN42314_c0) in female and male embryos from 0 to 36 hpo, and found sex-biased expression starting from 8 hpo 3.5. Identification of W-linked small RNAs (Fig. 4C). The expression levels were almost same in both sex em- The W chromosome being a source of female-enriched piRNAs in- bryos before 8 hpo, but after that, it seemed to fade away in male dicates that W chromosome function mainly depends on them. For embryos and de novo expressed in female embryos. example, Fem piRNA was reported to be the primary sex-determiner in silkworm. As small RNAs are usually around 18–30 nt, it is dif- 3.4. Annotation of W-linked transcript RNAs ﬁcult to determine whether or not they are W-derived. Using shorter We found 156 W-linked RNAs, among which we predicted 134 were length W-15-mers, the KQ method can also be used to identify W-de- 28,29 non-coding RNAs by their Coding-Non-Coding Index (CNCI). rived small RNAs from the pool of small RNA sequences in B. mori. Non-coding RNAs, most of which are long ncRNAs (lncRNAs), are Kawaoka et al. (2008) identiﬁed 38,493 small RNA species from the main repertoire of the transcripts expressed from the W ovarian RNA (pupa day 4) in the silkworm, while we identiﬁed Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018 6 A new method describing sex chromosome Figure 4. W-derived transcripts, expression profiles and genomic PCR amplification. (A) Hierarchical cluster analysis of W-derived transcripts. Expression pro- files of transcripts in mixed embryos (0–24 hpo) were clustered and visualized in a heatmap. (B) PCR amplification of W-derived transcripts in cDNA (Left panel) and gDNA (Right panel). RT-PCR (cDNA) showed expression as cluster A (female embryos specific) or cluster B (in both female and male embryos). However,a PCR test of genomic DNA showed both transcript RNAs were female-specific, indicating that they are W-derived, but cluster A RNAs are zygotically expressed in female embryos whereas cluster B RNAs are probably maternal RNAs deposited into eggs. (C) qPCR of a cluster B RNA (DN42314_c0) from 0 to 36 hpo ex- hibited its sex-biased expression in embyos. 72,439 small RNA species from embryo RNA of day 8 pupa. We female and male pupal cDNA (day 4, Supplementary Table S5). The merged the two small RNA datasets, which amounted to 106,717 bands appeared only in females for all the ﬁve small RNAs (Fig. 5B), which demonstrated that they were indeed W-derived. small RNA species and then aligned W-15-mers to all these small RNA sequences to screen small RNAs with no fewer than three aligned W-15-mers, resulting in 345 candidates. Their sizes were 4. Discussion mostly around 27–29 nt (Fig. 5A) with a strong bias for U(T) at the 5’ end, indicating that most of the W-linked small RNAs were In this study, we ﬁrst explored the existence of W or Y-speciﬁc 15- piRNAs. We randomly chose 5 small RNAs out of the 345 candi- mers in B. mori, D. melanogaster and An. gambiae, then developed dates to check their expression levels using RT-PCR in separate the KQ method and obtained a successful result in D. melanogaster. Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018 S. Li et al. 7 piRNA associates with PIWI family proteins whose complexes cleave target sequences to silence genes or transposons (Aravin et al., 2006; Saito et al., 2006; Brennecke et al., 2007). For example, Kiuchi et al. (2014) found that a speciﬁc W-linked piRNA (Fem piRNA) interacts with the product of a Z-linked Masculinizer (Masc) RNA, silencing of Masc by Fem piRNA is required for the produc- tion of female-speciﬁc isoforms of Bmdsx in female embryos and that a Masc-associated process may control both dosage compensa- 32,34,35 tion and masculinization in male embryos. This suggests that W-linked non-coding RNAs, especially piRNAs, play important roles in sex-determination and dosage compensation, and may be the main functional form encoded on the W. Also many W-derived piRNAs are expressed at the pupal period, which may function to 36,37 maintain germline stem cells, suggesting that the effects of W- linked piRNAs are continuous throughout female life. Acknowledgements The authors thank Youbing Guo, Duolian Liu and Li Peng for assistance with the information analysis. They also appreciate Zha XF for providing the em- bryo RNA-Seq data. Conflict of interest Figure 5. Length distribution graph of W-linked small RNAs and female-spe- cific RT-PCR amplification. (A) The lengths of most W-derived small RNAs The authors declare no competing ﬁnancial interests. were concentrated on around 27–29 nt and their first bases from the 5’ end were preferentially T (U), which indicated most W-derived small RNAs are piRNAs. (B) Female-specific RT-PCR amplifications indicated that all five Funding small RNAs were indeed W-derived. This work was supported by the grant from the One Thousand Foreign Experts Recruitment Program of the Chinese Government (No. WQ After that, we applied KQ method in B. mori to assemble W-linked 20125500074). genome contigs and to identify W-linked transcript RNAs and small RNAs effectively. In contrast to the previously reported CQ and YGS methods, our KQ method is more effective and accurate espe- Supplementary data cially in short sequences. In addition, the KQ method opens up a Supplementary data are available at DNARES online. novel approach for de novo assembly of W contigs, and is now the only method available to identify W or Y-speciﬁc small RNAs. Also KQ method can be used to identify sequences linked to X or Z chro- References mosome if we used X or Z-speciﬁc 15-mers (KQ value around 2). As 1. Matsuda, M., Nagahama, Y. and Shinomiya, A. 2002, DMY is a a genome reference sequence is not necessary, the KQ method can be Y-speciﬁc DM-domain gene required for male development in the medaka applied to most species with heterogametic sex chromosomes where ﬁsh, Nature, 417, 559–63. genome information is scarce, and it could become an useful tool for 2. Ayers, K.L., Davidson, N.M., Demiyah, D., et al. 2013, RNA sequencing obtaining whole genome information of heterogametic sex chromo- reveals sexually dimorphic gene expression before gonadal differentiation somes (Y or W). in chicken and allows comprehensive annotation of the W-chromosome, Genome Biol., 14,R26. The complete biological function of the B. mori W chromosome is 3. Chen, N., Bellott, D.W., Page, D.C. and Clark, A.G. 2012, Identiﬁcation of still unclear. Although many scientists have tried to uncover the bio- 2,26,32 avian W-linked contigs by short-read sequencing, BMC Genomics, 13,183. logical process of sex-determination in this species, the lack of 4. Schartl, M., Schmid, M. and Nanda, I. 2016, Dynamics of vertebrate sex W chromosome speciﬁc sequence information hinders the annotation chromosome evolution: from equal size to giants and dwarfs, and molecular studies of the W. Here, we showed that the KQ Chromosoma, 125, 553–71. method could help in drastically changing this situation. We ob- 5. Consortium, I.C.G.S. 2004, Sequence and comparative analysis of the tained W-linked transcripts and small RNAs without the use of com- chicken genome provide unique perspectives on vertebrate evolution, plete genome sequences. These RNA sequences will be helpful in the Nature, 432, 695–716. 6. Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P.J., et al. 2003, The functional study of the W chromosome. Based on preliminary tran- male-speciﬁc region of the human Y chromosome is a mosaic of discrete script RNA analysis, we found many lncRNAs and a few possible sequence classes, Nature, 423, 825–37. coding RNAs expressed from 16 hpo when the sex-determination 7. Foote, S., Vollrath, D., Hilton, A. and Page, D. 1992, The human Y chro- cascade initiates, and which may be involved in the sex determina- mosome: overlapping DNA clones spanning the euchromatic region, tion process. Many lncRNAs seem to be the precursors of piRNAs, Science, 258, 60–6. and our small RNA analysis showed that most W-linked small 8. Hall, A.B., Papathanos, P.A., Sharma, A., et al. 2016, Radical remodeling RNAs are likely piRNAs, which indicates that piRNA based gene of the Y chromosome in a recent radiation of malaria mosquitoes, Proc. regulation may be the main function of the B. mori W chromosome. Natl. Acad. Sci. USA, 113, E2114–23. Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018 8 A new method describing sex chromosome 9. Krsticevic, F.J., Schrago, C.G. and Carvalho, A.B. 2015, Long-read single 23. Promboon, A., Shimada, T., Fujiwara, H. and Kobayashi, M. 1995, molecule sequencing to resolve tandem gene copies: the Mst77Y region on Linkage map of random ampliﬁed polymorphic DNAs (RAPDs) in the the drosophila melanogaster Y chromosome, G3: Genes|Genomes|Genetics, silkworm, Genet. Res., 66,1. 5, 1145–50. 24. Abe, H., Seki, M. and Ohbayashi, F. 2005, Partial deletions of the W 10. Traut, W., Vogel, H., Glockner, G., Hartmann, E. and Heckel, D.G. chromosome due to reciprocal translocation in the silkworm Bombyx 2013, High-throughput sequencing of a single chromosome: a moth W mori, Insect Mol. Biol., 14, 339–52. chromosome, Chromosome Res.: Int. J. Mol. Supramol. Evol. Aspects 25. Suetsugu, Y., Minami, H., Shimomura, M., et al. 2007, End-sequencing Chromosome Biol., 21, 491–505. and characterization of silkworm (Bombyx mori) bacterial artiﬁcial chro- 11. Carvalho, A.B. and Clark, A.G. 2013, Efﬁcient identiﬁcation of Y chro- mosome libraries, BMC Genomics, 8, 314. mosome sequences in the human and Drosophila genomes, Genome Res., 26. Traut, W., Sahara, K. and Marec, F. 2007, Sex chromosomes and sex de- 23, 1894–907. termination in Lepidoptera, Sexual Dev.: Genetics, Mol. Biol. Evol. 12. Hall, A.B., Qi, Y., Timoshevskiy, V., Sharakhova, M.V., Sharakhov, I.V. Endocrinol. Embryol. Pathol. Sex Determination Differentiation, 1, and Tu, Z. 2013, Six novel Y chromosome genes in Anopheles mosquitoes 332–46. discovered by independently sequencing males and females, BMC 27. Grabherr, M.G., Haas, B.J., Yassour, M., et al. 2011, Trinity: reconstruct- Genomics, 14, 273. ing a full-length transcriptome without a genome from RNA-Seq data, 13. Carvalho, A.B., Dobo, B.A., Vibranovski, M.D. and Clark, A.G. 2001, Nat. Biotechnol., 29, 644–52. Identiﬁcation of ﬁve new genes on the Y chromosome of Drosophila mela- 28. Wu, Y., Cheng, T., Liu, C., et al. 2016, Systematic identiﬁcation and char- nogaster, Proc. Natl. Acad. Sci. USA, 98, 13225–30. acterization of long non-coding RNAs in the silkworm, Bombyx mori, 14. International Silkworm Genome, C. 2008, The genome of a lepidopteran PloS One, 11, e0147147. model insect, the silkworm Bombyx mori, Insect Biochem. Mol. Biol., 38, 29. Sun, L., Luo, H., Bu, D., et al. 2013, Utilizing sequence intrinsic composi- 1036–45. tion to classify protein-coding and long non-coding transcripts, Nucleic 15. Malone, J.H., Cho, D.Y., Mattiuzzo, N.R., et al. 2012, Mediation of Acids Res., 41, e166. Drosophila autosomal dosage effects and compensation by network inter- 30. Wang, J., Long, M. and Vibranovski, M.D. 2012, Retrogenes moved out actions, Genome Biol., 13, r28. of the z chromosome in the silkworm, J. Mol. Evol., 74, 113–26. 16. Tomaszkiewicz, M., Medvedev, P. and Makova, K.D. 2017, Y and W chro- 31. Kawaoka, S., Kadota, K., Arai, Y., et al. 2011, The silkworm W chromo- mosome assemblies: approaches and discoveries, Trends Genet., 33, 266–82. some is a source of female-enriched piRNAs, RNA, 17, 2144–51. 17. Marcais, G. and Kingsford, C. 2011, A fast, lock-free approach for efﬁ- 32. Kiuchi, T., Koga, H., Kawamoto, M., et al. 2014, A single female-speciﬁc cient parallel counting of occurrences of k-mers, Bioinformatics, 27, piRNA is the primary determiner of sex in the silkworm, Nature, 509, 764–70. 633–6. 18. Langmead, B. 2010, Aligning short sequencing reads with Bowtie. Curr. 33. Sakai, H., Aoki, F. and Suzuki, M. G. 2014, Identiﬁcation of the key Protoc. Bioinformatics, 32, 11.17.11–11.17.14. stages for sex determination in the silkworm, Bombyx mori, Dev. Genes 19. Kelley, D.R., Schatz, M.C. and Salzberg, S.L. 2010, Quake: Evol., 224, 119–23. quality-aware detection and correction of sequencing errors, Genome 34. Brennecke, J., Aravin, A.A., Stark, A., et al. 2007, Discrete small Biol., 11,R116. RNA-generating loci as master regulators of transposon activity in 20. Li, R., Zhu, H., Ruan, J., et al. 2010, De novo assembly of human ge- Drosophila, Cell, 128, 1089–103. nomes with massively parallel short read sequencing, Genome Res., 20, 35. Saito, K., Nishida, K.M., Mori, T., et al. 2006, Speciﬁc association of Piwi 265–72. with rasiRNAs derived from retrotransposon and heterochromatic regions 21. Zimin, A.V., Marcais, G., Puiu, D., Roberts, M., Salzberg, S.L. and in the Drosophila genome, Genes Dev., 20, 2214–22. Yorke, J.A. 2013, The MaSuRCA genome assembler, Bioinformatics, 29, 36. Lin, H. and Spradling, A.C. 1997, A novel group of pumilio mutations af- 2669–77. fects the asymmetric division of germline stem cells in the Drosophila 22. Fan, X.Y., Hu, Z.Y., Xu, F.H., Yan, Z.Q., Guo, S.Q. and Li, Z.M. 2003, ovary, Development (Cambridge, England), 124, 2463–76. Rapid detection of rpoB gene mutations in rifampin-resistant mycobacte- 37. Pal-Bhadra, M., Leibovitch, B.A., Gandhi, S.G., et al. 2004, rium tuberculosis isolates in shanghai by using the ampliﬁcation refrac- Heterochromatic silencing and HP1 localization in Drosophila are depen- tory mutation system, J. Clin. Microbiol., 41, 993–7. dent on the RNAi machinery, Science, 303, 669–72. Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy010/4958748 by Ed 'DeepDyve' Gillespie user on 12 July 2018
DNA Research – Oxford University Press
Published: Apr 2, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.
Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.
All the latest content is available, no embargo periods.
“Hi guys, I cannot tell you how much I love this resource. Incredible. I really believe you've hit the nail on the head with this site in regards to solving the research-purchase issue.”Daniel C.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud
“I must say, @deepdyve is a fabulous solution to the independent researcher's problem of #access to #information.”@deepthiw
“My last article couldn't be possible without the platform @deepdyve that makes journal papers cheaper.”@JoseServera