Structural features of the rice chromosome 4 centromere

Published online April 2, 2004
Nucleic Acids Research, 2004, Vol. 32, No. 6 2023–2030
DOI: 10.1093/nar/gkh521

Yu Zhang, Yuchen Huang, Lei Zhang, Ying Li, Tingting Lu, Yiqi Lu, Qi Feng, Qiang Zhao, Zhukuan Cheng¹, Yongbiao Xue¹, Rod A. Wing² and Bin Han*

National Center for Gene Research, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 500 Caobao Road, Shanghai 200233, China, ¹Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Datun Road, Andingmenwai, Beijing 100101, China and ²Department of Plant Sciences, Arizona Genomics Institute, The University of Arizona, Tucson, AZ 85721, USA

Received January 4, 2004; Revised February 12, 2004; Accepted March 10, 2004

*To whom correspondence should be addressed. Tel: +86 21 64845260; Fax: +86 21 64825775; Email: [email protected]

Nucleic Acids Research, Vol. 32 No. 6 © Oxford University Press 2004; all rights reserved

ABSTRACT

A complete sequence of a chromosome centromere is necessary for fully understanding centromere function. We report the sequence structure of the first complete rice chromosome centromere, obtained by sequencing a large insert bacterial artificial chromosome clone-based contig that covered the rice chromosome 4 centromere. Complete sequencing of the 124-kb rice chromosome 4 centromere revealed that it consisted of 18 tracts of 379 tandemly arrayed repeats known as CentO and a total of 19 centromeric retroelements (CRs), but no unique sequences were detected. Four tracts, composed of 65 CentO repeats, were located in the opposite orientation, and the 18 CentO tracts were flanked by 19 retroelements. The CRs were classified into four types, and the type I retroelements appeared to be more specific to rice centromeres. The preferential insertion of the CRs among CentO repeats indicated that the centromere-specific retroelements may contribute to centromere expansion during evolution. The presence of three intact retrotransposons in the centromere suggests that they may be responsible for functional centromere initiation through a transcription-mediated mechanism.

INTRODUCTION

The centromere is essential for correct segregation of chromosomes in both mitotic and meiotic cells. Although centromere function is conserved in eukaryotes, centromeric sequences appear to be variable (1,2). It is believed that a complete sequence of a chromosome centromere is necessary for fully understanding centromere function. Centromere functions can be recapitulated by artificial chromosome constructs (3,4) or chromosome fragments (5,6), revealing the important role of specific centromere sequences. Except for the centromere of the budding yeast Saccharomyces cerevisiae, which consists of only ~125 bp of unique sequence (7,8), the centromeres of most eukaryotic species are composed of long, highly repetitive DNA sequences. The eukaryotic genome sequencing projects of the last few years have provided virtually complete physical maps and sequences of many species, including Caenorhabditis elegans (9), Arabidopsis thaliana (10), Homo sapiens (11,12) and Oryza sativa (13,14). However, as a highly heterochromatic part of the chromosome, the centromere still remains a big 'gap' to be sequenced. Highly repetitive DNA is more difficult to map, clone, sequence and assemble than low copy number DNA. Though some detailed sequences of centromere regions have been analyzed, for example the satellite arrays of human centromeres (15) and the pericentromeric regions of Arabidopsis chromosomes (16), the sequences are not yet completely available, and the genome sequencing projects have left the big challenge of determining the primary sequence of a functional higher eukaryotic centromere. The full length of natural centromeres in most eukaryotes is generally at the megabase level. Such a long region of repeat DNA is the most difficult barrier to cloning and sequencing. The ordinary approach is to use different methods that isolate specific centromere regions from the rest of the genome. A successful example is the use of pulse field gel electrophoresis (PFGE) for isolation of the γ1230 minichromosome derivative as a template for cloning and sequencing a significant proportion of the functional Drosophila centromere (17). Bacterial artificial chromosome (BAC) clones and similar clones with large insert size are also a good choice for isolating centromere sequences from other genomic regions. However, this approach still has limitations for manipulation of a full centromere region such as those in human and Arabidopsis.

Rice is an exception among the important model species because of its different centromere size: the size of the satellite repeat array varies quantitatively among the 12 rice centromeres, as detected by the intensities of fluorescence in situ hybridization signals (18). Though some chromosomes have a centromere size similar to those in other species (>1 Mb), the centromeres of several chromosomes are surprisingly small and can be fully covered by BAC contigs constructed using the normally available technical approach. Thus, rice provides an opportunity to obtain the full centromere sequence and a truly complete understanding of centromere sequence composition and organization.

Some DNA components of rice centromeres have been reported. The centers of rice centromeres are occupied by two kinds of repetitive elements: the 155-bp satellite repeat CentO and the centromere-specific retrotransposons (18). CentO satellites were found to be located exclusively in rice centromeres and were regarded as a key component of functional rice centromeres. The retroelements found in the rice centromere, such as RCS1 (19), RCB11 (20,21) and RIRE7 (22,23), are mostly derived from the gypsy-like retrotransposon family. With the rapid progress of rice genome sequencing, the complete sequence composition and structure of a rice centromere have now become available.

Here, the physical structure of the rice chromosome 4 centromere was determined directly by sequencing a large insert clone-based contig that covered the centromere. The unique structures of the rice chromosome 4 centromere were identified. Some of the structures appear to be specific to rice centromeres.

MATERIALS AND METHODS

Materials and physical map construction

Construction of a clone-based physical map of rice chromosome 4 was described previously (24). The rice Oryza sativa ssp. japonica cv. Nipponbare clones used for constructing the physical map of the centromere contig were from four genomic libraries: two BAC libraries, OSJNBa and OSJNBb, provided by Clemson University Genomics Institute (CUGI); one PAC library provided by the Rice Genome Research Program (RGP); and some BACs provided by Monsanto.

DNA sequencing and assembly

OSJNBb0062N22 and other clones were purified by caesium chloride gradient. For a shotgun approach, sheared BAC DNA (2–3 kb) was ligated into a pBluescript vector and transformed into Escherichia coli DH5α. The shotgun subclones were sequenced from both ends by the dideoxy chain termination method using either BigDye Terminator Cycle sequencing V2.0 Ready Reaction (Applied Biosystems) or DYEnamic ET Dye Terminator Kit (MegaBACE; Amersham Pharmacia Biotech, Inc.). Most of the reactions were analyzed on ABI3730 sequencers and MegaBACE 1000 capillary sequencing machines. The shotgun sequences were first assembled using the PHRED and PHRAP programs (25), and primary assembly results were refined by careful manual checking to overcome the misalignments caused by repeats. Manual editing corrected many mis-assemblies caused by automatic assembly software, such as excessive coverage in some regions and the separation of the two end-sequence pairs from one subclone. The empirical results of PFGE and of profiling with several restriction enzymes were used to validate the length and the accuracy of the assembly. Sequence gaps were closed by using various dye-labeled terminator chemistries or by a combined approach of primer walking and PCR with oligonucleotides. Sequence regions of poor quality were re-sequenced from cloned plasmids. The nucleotide sequence of BAC OSJNBb0062N22 has been deposited in GenBank under the accession number BX890594.

Sequence analysis

DNA sequences similar to the BAC sequences were searched in the GenBank database and the TIGR Repeat Database (ftp://ftp.tigr.org/pub/data/TIGR_Plant_Repeats/) using the BLASTN homology search software. Sequence alignments between different CentO monomers were performed and refined manually using GeneDoc (http://www.psc.edu/biomed/genedoc). The ages of full-length retroelements were measured by comparing their 5′ and 3′ long terminal repeat (LTR) sequences (26). Kimura 2-parameter distances between the two LTRs of individual elements were calculated using the MEGA program (http://www.megasoftware.net/). The reported substitution rate of 6.5 × 10⁻⁹ per synonymous site per year for grasses (27) was used to estimate the ages of the elements.

RESULTS

Physical mapping of the chromosome 4 centromeric region

As part of an international effort to completely sequence the rice genome, we constructed a comprehensive clone-based physical map of chromosome 4 of O.sativa ssp. japonica Nipponbare, consisting of four contiguous BAC clones (contigs), through an integrated approach. Two large insert BAC libraries (OSJNBa and OSJNBb) and high-quality DNA fingerprinting data allowed us to construct big contigs covering nearly all regions of the rice O.sativa ssp. japonica Nipponbare genome, including regions of highly repetitive DNA sequences (28). Contig3 (Fig. 1), the second largest contig at 12.9 Mb, fully covered the genetic region from 19.6 to 19.9 cM, where the centromere had been located by genetic mapping (29,30). Using the key centromere component CentO satellite as a probe, we screened the BACs in the Contig3 tiling path for sequencing, and found that only OSJNBb0062N22 and the overlap region between OSJNBb0062N22 and OSJNBa0032B23 contained the CentO satellites. The BAC clone OSJNBb0062N22 was located at position ~0.4 Mb in Contig3, and the gap between Contig2 and Contig3 was identified as a chloroplast genome insertion (our unpublished data). Thus, the chromosome 4 centromere core region was fully covered by the clone OSJNBb0062N22 in Contig3 (Fig. 1).

Sequencing and assembly of the centromere BAC clone

The centromeric BAC clone, OSJNBb0062N22, was sequenced by a random shotgun approach on both strands of subclones of ~2–3 kb and achieved a 10-fold coverage. The sequences were first assembled by PHRED and PHRAP, and the primary assembly result was refined with careful manual checking to overcome the misalignments caused by repeats. The assembled size of the BAC clone agreed with the size determined by PFGE. The validity of the sequence assembly was further verified by in silico and empirical profiling with several restriction enzymes. The adjacent clones OSJNBb0026I12 and OSJNBa0032B23 were assembled separately and the large overlap regions showed identical sequence composition and organization. All of this evidence suggested that the quality and assembly of the
sequence were of high accuracy and were reliable. The total length of BAC OSJNBb0062N22 was 181 586 bp and fully covered the CentO repeats ranging from 49 781 to 174 142 bp, thus defining the core region of the centromere.

Figure 1. Map of the centromere region of rice chromosome 4. Four contigs, which covered the entire chromosome 4, are indicated in orange and as described. A part of the tiling path of BAC clones of Contig3, which covered the whole centromeric region (yellow), is shown. It is also indicated that the genetic distance between markers S11182 (19.9 cM) and E21001S (19.9 cM) corresponds to the physical distance of 1200–1300 kb (24,29). The restriction enzyme sites of the 181 kb BAC OSJNBb0062N22, which contains all CentO repeats, are indicated by the short black vertical lines and labeled as follows: B = BbvCI, F = FseI, N = NotI, H = HpaI. The CentO region was centralized on the BAC clone OSJNBb0062N22 from 49 781 to 174 142 bp, indicated by two red vertical arrows. The detailed distribution of CentO satellite repeats is shown in a square box. Each dot represents one satellite monomer.

Sequence contents of the centromere core region and CentO satellite repeats

Overall, the 124 kb centromere core region consisted of two kinds of repetitive elements: the 155/165 bp CentO satellite repeats and retroelements (Fig. 2). The centromeric retroelements (CRs), positioned among the tandem CentO monomer arrays, divided the CentO arrays into 18 tracts. CentO tracts were separated by the retroelements on both sides. Eighteen CentO tracts (designated CentO-4A to CentO-4R) were found to be dispersed in the centromere core region (Fig. 2). The length of CentO tracts ranged widely from 477 to 8571 bp (Table 1). The total length of all CentO repeats was 58 865 bp, representing 47% of the centromere core region. Unexpectedly, the directions of the 18 tracts were not the same: four tracts located in the internal part of the core region (CentO-4H, I, J and N) were found in the opposite orientation, and tracts CentO-4H, I and J were highly homologous to CentO-4E and F but in the opposite orientation. The orientations of CentO monomers within each tract were the same.

The chromosome 4 centromere had 379 copies of the CentO satellite repeat in the 18 tracts. According to their lengths and structures, these CentO monomers could be classified into three subgroups: 155 bp CentO (154 copies), 165 bp CentO (161 copies) and incomplete CentO (64 copies). The 165-bp CentO had a 10-nt duplication (TATTGGCATA, see Fig. 3) compared with the 155-bp monomer. These two types of CentO repeats were found in all 18 tracts, and it seemed that 165 bp CentO monomers were inclined to appear towards the short arm telomere while 155 bp ones were biased towards the other end. Many incomplete CentO monomers were also found in the array. They existed as internal deletions, fragments lacking the 5′ or 3′ ends and short internal fragments. The deletion usually occurred at a different position in different monomers, but some incomplete monomers shared the same deletion patterns. Figure 3 shows a monomer named CentO-4A23, with an internal 47 bp deletion, that was repeated nine times. About half of the incomplete monomers existed at the edge of the tracts. Some of these truncated monomers at the ends of adjacent tracts could be merged into one complete monomer by removing the inserted retroelement, indicating that the CentO arrays were usually disrupted by CRs. Interestingly, monitoring of nine insertion events suggested that the retroelements were preferentially inserted into two target sites, at 85 bp and 125–128 bp of the 155 bp CentO repeat (Fig. 3).

Although all CentO monomers were well conserved in length, identical repeats among the CentO arrays appeared to be rare. Except for some conserved nucleotides, the polymorphisms were dispersed along the whole 155 bp consensus satellite sequence. Most polymorphic sites varied in only one monomer, while some varied in many monomers. The identities between different monomers were mostly from 90 to 98%, but the divergence of the monomers in the CentO tracts near the edge of the centromere core region was more apparent than in others. Some of them were <85% identical to the consensus sequence. In maize and Arabidopsis, chromatin immunoprecipitation studies have shown that only a portion of the centromeric satellites are involved in the centromeric function (31,32). It is still unclear whether this is related to the variable divergence found between different satellites.

Figure 2. The complete organization of the tandemly repeated arrays of CentO sequences and CRs in the core region of the rice chromosome 4 centromere. The horizontal red arrows represent the length and orientation of the 18 CentO tracts. The identity and orientation of individual monomers are shown by the black and red dots. The y axis shows the identity between these monomers. The CentO tracts were separated by 19 retroelements of four types. The four types of retroelements are marked by four different colors.

Table 1. Detailed information of the 18 CentO tracts of the rice chromosome 4 centromere

Tract      Position (bp)      Length (bp)  Orientation  Copy number  155 bp CentO  165 bp CentO  Incomplete CentO  Identity
CentO-4A   1–8571             8571         +            55           14            35            6                 84–97%
CentO-4B   9366–10 486        1121         +            8            3             3             2                 93–95%
CentO-4C   14 778–19 078      4301         +            28           9             16            3                 92–98%
CentO-4D   20 753–24 877      4125         +            26           11            13            2                 92–100%
CentO-4E   32 620–36 620      4001         +            26           8             15            3                 92–98%
CentO-4F   37 411–38 948      1538         +            11           4             5             2                 93–98%
CentO-4G   45 200–49 506      4307         +            28           7             17            4                 86–96%
CentO-4H   54 200–57 022      2823         -            19           7             9             3                 90–98%
CentO-4I   57 815–59 712      1898         -            13           4             6             3                 93–97%
CentO-4J   60 507–63 570      3064         -            20           4             12            4                 92–98%
CentO-4K   70 771–71 247      477          +            4            1             1             2                 98%
CentO-4L   72 077–73 739      1663         +            11           3             6             2                 92–96%
CentO-4M   79 992–81 005      1014         +            7            3             3             1                 93–94%
CentO-4N   81 749–83 514      1766         -            13           6             2             5                 90–95%
CentO-4O   83 534–87 515      3982         +            26           21            3             2                 88–96%
CentO-4P   100 834–106 044    5211         +            31           22            4             5                 84–94%
CentO-4Q   111 043–113 628    2586         +            18           9             5             4                 82–96%
CentO-4R   117 925–124 341    6417         +            35           18            6             11                78–95%
Total      1–124 341          58 865                    379          154           161           64

Figure 3. Sequence comparison between different rice chromosome 4 CentO monomers. From top to bottom: four typical 155-bp monomers, four typical 165-bp monomers, four monomers with several nucleotide indels, two monomers at the edge of the centromere core region with the most variations. Sequence conservation between different monomers is indicated by background shading. Dark shading represents 100% conservation and light shading >80%. The preferential insertion sites of CRs are indicated by the vertical arrows.

Rice CRs

Nineteen retroelements, within the 18 CentO tracts, were divided into four types depending on their structure and sequence homology (Fig. 4).

Type I retroelements belonged to a novel rice retrotransposon family. Eight retroelements of this type, including five solo LTRs, two intact retrotransposons and one incomplete copy, were found in the centromere core region. In this 124 kb core region, we detected only solo LTRs that belong to this type. Solo LTRs are thought to arise from an intra-element recombination between paired LTRs, and there would be few solo LTRs in regions where recombination rates are low. It is interesting to note that there were still some solo LTRs within the centromere, since recombination in centromeric regions is known to be highly suppressed. The solo LTRs in the centromere suggested that there was no significant correlation between the frequency of solo LTR formation and regional recombination rates, and that different types of solo LTRs may be formed by several different processes. Apart from the solo LTRs, CR4-2 and CR4-16 were two intact elements with the coding capacity for a gag protein. Despite having structural features of LTR retrotransposons, CR4-19 near the edge of the core region diverged extensively and lacked the full ORF that codes for the polyprotein, suggesting that this element was likely non-autonomous.

Type II and type III retroelements were typical Ty3-gypsy class retrotransposons with a relatively large 5′ UTR and a polyprotein reading frame that overlapped the downstream LTR; furthermore, type III represented the gypsy-type retrotransposon RIRE7, identified previously with the character of preferential insertion into the tandem repeat sequence (CentO) in rice centromere regions (22). These two types were closely related because their coding sequences shared ~80% similarity despite different LTRs and 5′ UTR regions. They were mostly fragmented and truncated retroelements except for a single intact type II retrotransposon, CR4-4.

The type IV element was actually a kind of repetitive sequence with no homology to any known rice repeats collected in the TIGR Repeat Database, but it was found to be dispersed along the whole of chromosome 4. The only copy of the type IV repeat in the chromosome 4 centromere region was divided into two parts (CR4-15 and CR4-17) by an intact type I retrotransposon (CR4-16).

The types I, II and III retroelements were all typical LTR-retrotransposons and had similar primer binding sites (PBS) complementary to the 3′ end of initiator methionyl tRNA and polypurine tracts (PPT). Through sequence homology searches with the genetically anchored publicly available rice genome sequence, we found that these retroelements were primarily located in the centromeric or pericentromeric heterochromatin regions. However, additional copies of these elements could be found elsewhere across all rice chromosomes. The retroelements were not as unique to the centromere as CentO, the other centromere component.

Based on sequence divergence between 5′/3′ LTRs (Kimura parameter distances) and an estimate of the nucleotide substitution rate for grasses (6.5 × 10⁻⁹ substitutions per synonymous site per year) (27), we measured the insertion times of the three retrotransposons (CR4-2, CR4-16 and CR4-4) to be 0.88, 0.19 and 1.63 million years, respectively. The ages of other retroelements were difficult to measure because of the significant sequence degeneracy and the loss of the pair of complete LTRs. These data suggested that type I retroelements transposed more recently than type II retroelements. Homology searches also showed that the type I retroelements only exist in rice, whereas types II and III retroelements have homologs in other cereals. This suggested that type I retroelements have transposed into the centromere more recently than other elements and may be more specific to rice centromeres.

We identified three intact centromeric retrotransposons (CR4-2, CR4-4 and CR4-16) in the chromosome 4 centromere (Fig. 4), while no intact retrotransposon had been found in a rice centromere in previous research (21). These intact retrotransposons, which included the LTRs, the PBSs and PPTs immediately internal to the LTRs and the retrotransposon reading frames, belonged to types I and II, indicating that they had the possibility of being actively transcribed. The presence of the fully intact retrotransposons provided evidence that the CR elements flanking the satellite DNA were capable of initiating transcription, which is thought to be related to functional centromere initiation by a transcription-mediated mechanism (33).

Figure 4. Detailed structural features of three types of CRs. CR4-2, CR4-16 and CR4-11 were intact retroelements. Others were fragments of retroelements. Types I, II and III retroelements are indicated in green, light blue and dark blue, respectively. It is also indicated that the previously identified rice centromeric repetitive DNA fragments (RCE1, RCS1, RCH2, RCH1) were found to correspond to type I or II retroelements.

Comparison with the rice chromosome 8 centromere-related sequences

Rice chromosome 8 has also been identified as having a limited amount of CentO repeats (18). We searched a rice chromosome 8 centromere-related sequence, B1052H09 (AP006480), that was completely sequenced and assembled, from GenBank. Comparative analysis revealed that the CentO-spanned region of the chromosome 8 centromere was 78 kb and the length of the CentO repeats was 68 kb. Chromosome 4 has the most heterochromatic region in the rice genome. About one-third of the chromosome, including the entire short arm and the pericentric region of the long arm, is highly heterochromatinized (34). Though chromosomes 4 and 8 have similar sizes of CentO repeats (59 and 68 kb), more retroelements were inserted into the CentO repeats of the chromosome 4 centromere core region. In addition, though the CR elements of the rice chromosome 8 centromere were far fewer than those of chromosome 4, an intact RIRE7 retrotransposon (type III retroelement) was found in the chromosome 8 centromeric region.

DISCUSSION

The centromere is the most essential element of chromosomes for the faithful segregation and inheritance of genetic information in higher eukaryotic species. However, centromeric sequences in different species appear to be variable, although their function is highly conserved (1,2). There seems no doubt that achieving a complete sequence of a chromosome centromere will contribute to establishing the sequences required for the function and to a better understanding of the function of the centromere. Unfortunately, the sequences of natural centromeres are not yet completely available, although some detailed sequences of centromere regions have been studied (15,16). The sequence structure of the centromere of rice chromosome 4 reported here is, to our knowledge, the first complete centromere sequence, at least in rice.

DNA sequences associated with centromeric regions have been reported in numerous species including some plant species. The centromeres of higher eukaryotic species are mainly composed of satellite repeats and other repetitive elements. In rice chromosome 4, tandemly arrayed repeats CentO and 19 CRs constituted the centromere core region.
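The composition of the core region summarized here can be cross-checked against Table 1. The short Python sketch below re-derives the totals quoted in the text (379 monomers, 58 865 bp of CentO, about 47% of the core, 65 monomers in reverse orientation) from the per-tract rows. The tuple layout and variable names are illustrative choices and not part of the original analysis; the start coordinate of CentO-4O is taken as 83 534, the reading consistent with its listed length of 3982 bp.

```python
# Rows transcribed from Table 1:
# (tract, start, end, orientation, n_155bp, n_165bp, n_incomplete)
TRACTS = [
    ("CentO-4A", 1, 8571, "+", 14, 35, 6),
    ("CentO-4B", 9366, 10486, "+", 3, 3, 2),
    ("CentO-4C", 14778, 19078, "+", 9, 16, 3),
    ("CentO-4D", 20753, 24877, "+", 11, 13, 2),
    ("CentO-4E", 32620, 36620, "+", 8, 15, 3),
    ("CentO-4F", 37411, 38948, "+", 4, 5, 2),
    ("CentO-4G", 45200, 49506, "+", 7, 17, 4),
    ("CentO-4H", 54200, 57022, "-", 7, 9, 3),
    ("CentO-4I", 57815, 59712, "-", 4, 6, 3),
    ("CentO-4J", 60507, 63570, "-", 4, 12, 4),
    ("CentO-4K", 70771, 71247, "+", 1, 1, 2),
    ("CentO-4L", 72077, 73739, "+", 3, 6, 2),
    ("CentO-4M", 79992, 81005, "+", 3, 3, 1),
    ("CentO-4N", 81749, 83514, "-", 6, 2, 5),
    ("CentO-4O", 83534, 87515, "+", 21, 3, 2),  # start read as 83 534 (assumption)
    ("CentO-4P", 100834, 106044, "+", 22, 4, 5),
    ("CentO-4Q", 111043, 113628, "+", 9, 5, 4),
    ("CentO-4R", 117925, 124341, "+", 18, 6, 11),
]

CORE_LENGTH = 124_341  # bp; end coordinate of the last tract in Table 1


def tract_length(row):
    """Length of a tract in bp from its inclusive start and end coordinates."""
    return row[2] - row[1] + 1


def copy_number(row):
    """Monomers in a tract: 155 bp + 165 bp + incomplete copies."""
    return row[4] + row[5] + row[6]


total_length = sum(tract_length(r) for r in TRACTS)
total_copies = sum(copy_number(r) for r in TRACTS)
reverse_copies = sum(copy_number(r) for r in TRACTS if r[3] == "-")
cento_fraction = total_length / CORE_LENGTH

print(f"CentO total: {total_length} bp in {total_copies} monomers")
print(f"Share of core region: {cento_fraction:.1%}")
print(f"Monomers in reverse-orientation tracts: {reverse_copies}")
```

Keeping the rows as plain tuples makes it easy to compare each line against the printed table; any disagreement between a computed length and the tabulated one would point to a transcription problem rather than a biological result.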
The 155 bp CentO repeats had a monomer length similar to that of the centromeric satellite repeats in other species, such as alpha satellite in human (15), pAL1 in Arabidopsis (16) and CentC in maize (35). The conserved repeat lengths found for most centromeric satellites are thought to correspond to the nucleosomal unit lengths (1). Retroelements have been reported to be conserved components of cereal centromeres, whereas the retrotransposon family shows no such clear localization in the Arabidopsis genome (21). Previous research has identified a highly conserved Ty3/gypsy-like retrotransposon family in the centromeres of maize (35), sorghum (36), barley (37), rice (18,22,23) and many other cereals (21). Several centromeric repetitive DNA elements have been identified to play important roles in interactions with the centromere-specific histone H3 variant (31,32). The complete composition of centromeric repeats will provide a platform for identification of centromere function and of the minimal sequences that provide centromere function. The preferential insertion of the CR elements among CentO repeats indicates that the centromere-specific retroelements have contributed to centromere expansion during evolution. Among the CentO tracts, four tracts of 65 CentO repeats were located in an opposite orientation. Our study also reveals that type I retroelements have transposed into the centromere more recently than other elements and may be more specific to rice centromeres.

The structure of the complete rice chromosome 4 centromere suggests that a certain number of tandemly arrayed repeats and at least one intact retrotransposon element might be necessary for maintaining the full centromere function in higher plants through a transcription-mediated mechanism. Recent studies in fission yeast suggest that either, or both, tandem repeats and LTR retrotransposons may play a role in the heterochromatinization of centromeric DNA through an RNA interference (RNAi) related mechanism (38–40). The heterochromatin that coats centromeric repeats is required for the assembly of an active centromere. As in fission yeast, the centromeric heterochromatin of most eukaryotes is frequently made up of tandem arrays of simple satellites interspersed with other repetitive elements (e.g. retrotransposons). Those tandem repeats and repetitive elements would be subject to triggering chromatin modification by the same mechanism as long as one strand is transcribed and the components of the RNAi machinery are recruited.

Rice provides an almost perfect model for obtaining the full centromere sequence, because several rice chromosomes contain a limited amount of the satellite repeats. Besides the centromere of chromosome 4, the centromere of chromosome 8 is also small enough to be covered by constructed BAC contigs and sequenced. We compared the only two centromere sequences available in rice so far, the complete sequence in chromosome 4 and the centromere-related sequence in chromosome 8 obtained from GenBank, to detect the differentiation of centromere sequence between chromosomes in the same species. The two centromeres comprised almost the same amount of CentO repeats, whereas the number of retroelements varied, which indicated that the similarity between the centromere sequences might be associated with their function. Since no centromeric regions have been completely sequenced in higher plants, the full structural information of the centromere of rice chromosome 4 may shed some light on the functional dissection of a centromere and the identification of minimal sequences that provide the centromere function.

ACKNOWLEDGEMENTS

This work was supported by grants from the Ministry of Sciences and Technology (2002AA2Z1003 and 2003AA222091), the Chinese Academy of Sciences, the Shanghai Municipal Commission of Sciences and Technology (038019315) and the National Natural Science Foundation of China (30221002 and 30325014).

REFERENCES

1. Henikoff,S., Ahmad,K. and Malik,H.S. (2001) The centromere paradox: stable inheritance with rapidly evolving DNA. Science, 293, 1098–1102.
2. Sullivan,B.A., Blower,M.D. and Karpen,G.H. (2001) Determining centromere identity: cyclical stories and forking paths. Nature Rev. Genet., 2, 584–596.
3. Harrington,J.J., Van Bokkelen,G., Mays,R.M., Gustashaw,K. and Willard,H.F. (1997) Formation of de novo centromeres and construction of first-generation human artificial microchromosomes. Nature Genet., 15, 345–355.
4. Ikeno,M., Grimes,B., Okazaki,T., Nakano,M., Saitoh,K., Hoshino,H., McGill,N.I., Cooke,H. and Masumoto,H. (1998) Construction of YAC-based mammalian artificial chromosomes. Nat. Biotechnol., 16, 431–439.
5. Murphy,T.D. and Karpen,G.H. (1995) Localization of centromere function in a Drosophila minichromosome. Cell, 82, 599–609.
6. Kaszas,E. and Birchler,J.A. (1996) Misdivision analysis of centromere structure in maize. EMBO J., 15, 5246–5255.
7. Clarke,L. (1990) Centromeres of budding and fission yeast. Trends Genet., 6, 150–154.
8. Clarke,L. (1998) Centromeres: proteins, protein complexes, and repeated domains at centromeres of simple eukaryotes. Curr. Opin. Genet. Dev., 8, 212–218.
9. The C. elegans Sequencing Consortium. (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 11, 2012–2018.
10. The Arabidopsis Genome Initiative. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–
11. Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J., Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
12. International Human Genome Sequencing Consortium. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
13. Sasaki,T., Matsumoto,T., Yamamoto,K., Sakata,K., Baba,T., Katayose,Y., Wu,J., Niimura,Y., Cheng,Z., Nagamura,Y. et al. (2002) The genome sequence and structure of rice chromosome 1. Nature, 420, 312–316.
14. Feng,Q., Zhang,Y., Hao,P., Wang,S., Fu,G., Huang,Y., Li,Y., Zhu,J., Liu,Y., Hu,X. et al. (2002) Sequence and analysis of rice chromosome 4. Nature, 420, 316–320.
15. Schueler,M.G., Higgins,A.W., Rudd,M.K., Gustashaw,K. and Willard,H.F. (2001) Genomic and genetic definition of a functional human centromere. Science, 294, 109–115.
16. Copenhaver,G.P., Nickel,K., Kuromori,T., Benito,M.I., Kaul,S., Lin,X., Bevan,M., Murphy,G., Harris,B., Parnell,L.D. et al. (1999) Genetic definition and sequence analysis of Arabidopsis centromeres. Science, 286, 2468–2474.
17. Sun,X., Le,H.D., Wahlstrom,J.M. and Karpen,G.H. (2003) Sequence analysis of a functional Drosophila centromere. Genome Res., 13, 182–
18. Cheng,Z., Dong,F., Langdon,T., Ouyang,S., Buell,C.R., Gu,M., Blattner,F.R. and Jiang,J. (2002) Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell, 14, 1691–1704.
19. Dong,F., Miller,J.T., Jackson,S.A., Wang,G.L., Ronald,P.C. and Jiang,J. (1998) Rice (Oryza sativa) centromeric regions consist of complex DNA. Proc. Natl Acad. Sci. USA, 95, 8135–8140.
20. Nonomura,K.I. and Kurata,N. (1999) Organization of the 1.9-kb repeat unit RCE1 in the centromeric region of rice chromosomes. Mol. Gen. Genet., 261, 1–10.
21. Langdon,T., Seago,C., Mende,M., Leggett,M., Thomas,H., Forster,J.W., Thomas,H., Jones,R.N. and Jenkins,G. (2000) Retrotransposon evolution in diverse plant genomes. Genetics, 156, 313–325.
22. Kumekawa,N., Ohmido,N., Fukui,K., Ohtsubo,E. and Ohtsubo,H. (2001) A new gypsy-type retrotransposon, RIRE7: preferential insertion into the tandem repeat sequence TrsD in pericentromeric heterochromatin regions of rice chromosomes. Mol. Genet. Genomics, 265, 480–488.
23. Nonomura,K.I. and Kurata,N. (2001) The centromere composition of multiple repetitive sequences on rice chromosome 5. Chromosoma, 110, 284–291.
24. Zhao,Q., Zhang,Y., Cheng,Z., Chen,M., Wang,S., Feng,Q., Huang,Y., Li,Y., Tang,Y., Zhou,B. et al. (2002) A fine physical map of the rice chromosome 4. Genome Res., 12, 817–823.
25. Ewing,B. and Green,P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.
26. SanMiguel,P., Gaut,B.S., Tikhonov,A., Nakajima,Y. and Bennetzen,J.L. (1998) The paleontology of intergene retrotransposons of maize. Nature Genet., 20, 43–45.
27. Gaut,B.S., Morton,B.R., McCaig,B.C. and Clegg,M.T. (1996) Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl Acad. Sci. USA, 93, 10274–10279.
28. Chen,M., Presting,G., Barbazuk,W.B., Goicoechea,J.L., Blackmon,B., Fang,G., Kim,H., Frisch,D., Yu,Y., Sun,S. et al. (2002) An integrated physical and genetic map of the rice genome. Plant Cell, 14, 537–545.
29. Harushima,Y., Yano,M., Shomura,A., Sato,M., Shimano,T., Kuboki,Y., Yamamoto,T., Lin,S.Y., Antonio,B.A., Parco,A. et al. (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics, 148, 479–494.
30. Wu,J., Maehara,T., Shimokawa,T., Yamamoto,S., Harada,C., Takazaki,Y., Ono,N., Mukai,Y., Koike,K., Yazaki,J. et al. (2002) A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell, 14, 525–535.
31. Nagaki,K., Talbert,P.B., Zhong,C.X., Dawe,R.K., Henikoff,S. and Jiang,J. (2003) Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics, 163, 1221–1225.
32. Zhong,C.X., Marshall,J.B., Topp,C., Mroczek,R., Kato,A., Nagaki,K., Birchler,J.A., Jiang,J. and Dawe,R.K. (2002) Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell, 14, 2825–2836.
33. Jiang,J., Birchler,J.A., Parrott,W.A. and Dawe,R.K. (2003) A molecular view of plant centromeres. Trends Plant Sci., 8, 570–575.
34. Cheng,Z., Buell,C.R., Wing,R.A., Gu,M. and Jiang,J. (2001) Toward a cytological characterization of the rice genome. Genome Res., 11, 2133–
35. Ananiev,E.V., Phillips,R.L. and Rines,H.W. (1998) Chromosome-specific molecular organization of maize (Zea mays L.) centromeric regions. Proc. Natl Acad. Sci. USA, 95, 13073–13078.
36. Miller,J.T., Dong,F., Jackson,S.A., Song,J. and Jiang,J. (1998) Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics, 150, 1615–1623.
37. Hudakova,S., Michalek,W., Presting,G.G., ten Hoopen,R., dos Santos,K., Jasencakova,Z. and Schubert,I. (2001) Sequence organization of barley centromeres. Nucleic Acids Res., 29, 5029–5035.
38. Volpe,T., Schramke,V., Hamilton,G.L., White,S.A., Teng,G., Martienssen,R.A. and Allshire,R.C. (2003) RNA interference is required for normal centromere function in fission yeast. Chromosome Res., 11, 137–146.
39. Martienssen,R.A. (2003) Maintenance of heterochromatin by RNA interference of tandem repeats. Nature Genet., 35, 213–214.
40. Schramke,V. and Allshire,R. (2003) Hairpin RNAs and retrotransposon LTRs effect RNAi and chromatin-based gene silencing. Science, 301, 1069–1074.


References (42)

Publisher
Oxford University Press
Copyright
Oxford University Press
ISSN
0305-1048
eISSN
1362-4962
DOI
10.1093/nar/gkh521
pmid
15064362

Abstract

A complete sequence of a chromosome centromere is necessary for fully understanding centromere function. We reported the sequence structures of the first complete rice chromosome centromere through sequencing a large insert bacterial artificial chromosome clone-based contig, which covered the rice chromosome 4 centromere. Complete sequencing of the 124-kb rice chromosome 4 centromere revealed that it consisted of 18 tracts of 379 tandemly arrayed repeats known as CentO and a total of 19 centromeric retroelements (CRs), but no unique sequences were detected. Four tracts, composed of 65 CentO repeats, were located in the opposite orientation, and 18 CentO tracts were flanked by 19 retroelements. The CRs were classified into four types, and the type I retroelements appeared to be more specific to rice centromeres. The preferential insertion of the CRs among CentO repeats indicated that the centromere-specific retroelements may contribute to centromere expansion during evolution. The presence of three intact retrotransposons in the centromere suggests that they may be responsible for functional centromere initiation through a transcription-mediated mechanism.
Zhong,C.X., Marshall,J.B., Topp,C., Mroczek,R., Kato,A., Nagaki,K., Thomas,H., Jones,R.N. and Jenkins,G. (2000) Retrotransposon evolution Birchler,J.A., Jiang,J. and Dawe,R.K. (2002) Centromeric retroelements in diverse plant genomes. Genetics, 156, 313±325. and satellites interact with maize kinetochore protein CENH3. Plant Cell, 22. Kumekawa,N., Ohmido,N., Fukui,K., Ohtsubo,E. and Ohtsubo,H. (2001) 14, 2825±2836. A new gypsy-type retrotransposon, RIRE7: preferential insertion into the 33. Jiang,J., Birchler,J.A., Parrott,W.A. and Dawe,R.K. (2003) A molecular tandom repeat sequence TrsD in pericentromeric heterochromatin view of plant centromeres. Trends Plant Sci., 8, 570±575. regions of rice chromosomes. Mol. Genet. Gemomics, 265, 480±488. 34. Cheng,Z., Buell,C.R., Wing,R.A., Gu,M. and Jiang,J. (2001) Toward a 23. Nonomura,K.I. and Kurata,N. (2001) The centromere composition of cytological characterization of the rice genome. Genome Res., 11, 2133± multiple repetitive sequences on rice chromosome 5. Chromosoma, 110, 284±291. 35. Ananiev,E.V., Phillips,R.L. and Rines,H.W. (1998) Chromosome- 24. Zhao,Q., Zhang,Y., Cheng,Z., Chen,M., Wang,S., Feng,Q., Huang,Y., speci®c molecular organization of maize (Zea mays L.) centromeric Li,Y., Tang,Y., Zhou,B. et al. (2002) A ®ne physical map of the rice chromosome 4. Genome Res., 12, 817±823. regions. Proc. Natl Acad. Sci. USA, 95, 13073±13078. 25. Ewing,B. and Green,P. (1998) Base-calling of automated sequencer 36. Miller,J.T., Dong,F., Jackson,S.A., Song,J. and Jiang,J. (1998) traces using phred. II. Error probabilities. Genome Res., 8, 186±194. Retrotransposon-related DNA sequences in the centromeres of grass 26. SanMiguel,P., Gaut,B.S., Tikhonov,A., Nakajima,Y. and Bennetzen,J.L. chromosomes. Genetics, 150, 1615±1623. (1998) The paleontology of intergene retrotransposons of maize. Nature 37. Hudakova,S., Michalek,W., Presting,G.G., ten Hoopen,R., dos Santos,K., Genet., 20, 43±45. Jasencakova,Z. and Schubert,I. 
(2001) Sequence organization of barley 27. Gaut,B.S., Morton,B.R., McCaig,B.C. and Clegg,M.T. (1996) centromeres. Nucleic Acids Res., 29, 5029±5035. Substitution rate comparisons between grasses and palms: synonymous 38. Volpe,T., Schramke,V., Hamilton,G.L., White,S.A., Teng,G., rate differences at the nuclear gene Adh parallel rate differences at the Martienssen,R.A. and Allshire,R.C. (2003) RNA interference is required plastid gene rbcL. Proc. Natl Acad. Sci. USA, 93, 10274±10279. for normal centromere function in ®ssion yeast. Chromosome Res., 11, 28. Chen,M., Presting,G., Barbazuk,W.B., Goicoechea,J.L., Blackmon,B., 137±146. Fang,G., Kim,H., Frisch,D., Yu,Y., Sun,S. et al. (2002) An integrated 39. Martienssen,R.A. (2003) Maintenance of heterochromatin by RNA physical and genetic map of the rice genome. Plant Cell, 14, 537±545. interference of tandem repeats. Nature Genet., 35, 213±214. 29. Harushima,Y., Yano,M., Shomura,A., Sato,M., Shimano,T., Kuboki,Y., 40. Schramke,V. and Allshire,R. (2003) Hairpin RNAs and retrotransposon Yamamoto,T., Lin,S.Y., Antonio,B.A., Parco,A. et al. (1998). A high- LTRs effect RNAi and chromatin-based gene silencing. Science, 301, density rice genetic linkage map with 2275 markers using a single F2 1069±1074. population. Genetics, 148, 479±494.

Journal

Nucleic Acids Research, Oxford University Press

Published: Apr 7, 2004
