Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

10-11 bp periodicities in complete genomes reflect protein structure and DNA folding.

10-11 bp periodicities in complete genomes reflect protein structure and DNA folding. +(  *+ BIOINFORMATICS %#. 10–11 bp periodicities in complete genomes reflect protein structure and DNA folding #-3#(   #'.. *"   -'$+*+1 *./'/0/# $+- &#+-#/'!( '+(+%2 0) +("/ *'1#-.'/2 *1('"#*./- 4 #-('* #-)*2 *" #,-/)#*/ +$ /-0!/0-( '+(+%2 #'3)** *./'/0/# +$ !'#*!# #&+1+/  .-#( Abstract in protein coding DNA is widely used to find exons in newly sequenced DNA (Fickett and Tung, 1992) and correlations Motivation: Completely sequenced genomes allow for over thousands of base pairs are discussed in connection with detection and analysis of the relatively weak periodicities of chromosomal organization (Elton, 1974; Bernardi et al., 10–11 basepairs (bp). Two sources contribute to such 1985). signals: correlations in the corresponding protein sequences In this paper we analyze 10–11 base pair (bp) oscillations due to the amphipatic character of α-helices and the folding in complete genomes of yeast, bacteria, and archaea. In addi- of DNA (nucleosomal patterns, DNA supercoiling). Since the tion to the period 3 oscillations, the correlation functions in topological state of genomic DNA is of importance for its Figure 1 exhibit pronounced oscillations with a period of replication, recombination and transcription, there is an 10–11 bp. Such periodicities have been described already in immediate interest to obtain information about the super- Widom (1996) and Herzel et al. (1998a). However, the inter- coiled state from sequence periodicities. pretation of these oscillations remained vague. Our paper is Results: We show that correlations within proteins affect devoted to the interpretation of these periodicities in relation mainly the oscillations at distances below 35 bp. The to protein structure and DNA folding. long-ranging correlations up to 100 bp reflect primarily We discuss two possible origins of 10–11 bp oscillations. DNA folding. For the yeast genome these oscillations are On the one hand, the alternation of hydrophobic and hydro- consistent in detail with the chromatin structure. For philic amino acids (aa) in α-helices leads to a periodicity of eubacteria and archaea the periods deviate significantly about 3.5 aa in protein sequences (Garnier et al., 1978; Kane- from the 10.55 bp value for free DNA. These deviations hisa and Tsong, 1980), which corresponds to 10–11 bases in suggest that while a period of 11 bp in bacteria reflects DNA sequences (Zhurkin, 1981). On the other hand, the heli- negative supercoiling, the significantly different period of cal periodicity of DNA is about 10.5 bp. Consequently, sig- thermophilic archaea close to 10 bp corresponds to positive nals for DNA folding show periodicities of 10–11 bp as well supercoiling of thermophilic archaeal genomes. (Trifonov and Sussman, 1980). Availability: Protein sets and C programs for the calculation It is the aim of this paper to separate these two different of correlation functions are available on request from the sources as a prerequisite of a detailed discussion of DNA authors (see http://itb.biologie.hu-berlin.de). supercoiling. We show in Section 3 that protein-induced cor- Contact: h.herzel@biologie.hu-berlin.de, o.weiss@biolo- relations affect mainly the first 3 peaks around 10, 21, and 31 gie.hu-berlin.de, BPTRIFO1@WEIZMANN.weizmann.ac.il bp. The peaks beyond 35 bp reflect primarily the curvature of DNA. This will be discussed in detail in Sections 4 and 5. Introduction The analysis of correlations in DNA sequences is an import- ant complement to experimental studies. Since several com- Methods plete genomes are available now, a careful analysis of se- Correlation functions as in Figure 1 measure the excess of quence periodicities can help to understand structure and certain nucleotide pairs at a distance of k base pairs. For the function of genomic DNA. For example, the 3 bp periodicity calculation of A–A autocorrelations we count, for example, the number N (k) of A–A pairs in a distance k. Altogether AA there are N–k pairs in a sequence of length N. Consequently, the pair probability to find A and A in a distance k can be estimated as *To whom correspondence should be addressed Oxford University Press 187 H.Herzel et al. Fig. 1. Correlation functions of the complete yeast genome. Upper graph: C + C . Middle graph: C + C . Lower graph: C . The thick AA TT AT TA WW lines correspond to running averages over 3 bp (see Methods section for details). for correlation functions that involve A or T. We find C ≈ N (k) AA AA p (k)  (1) AA C and C ≈ C . Hence, we plot in Figure 1 the sums C N k TT AT TA AA + C and C + C to improve the signal-to-noise ratio. TT AT TA The probability of the single nucleotide A is denoted as p . Moreover, we present correlation functions C of weakly WW If the nucleotides at a distance k are statistically independent binding nucleotides A or T. In many cases this choice gives we find p (k) = p · p . Thus the difference AA A A the strongest oscillations with a period of 10–11 bp. C (k) = p (k) – p · p (2) We consider also correlations between pairs of dinucleo- AA AA A A tides in connection with DNA bending. Furthermore, we measures correlations at a distance of k base pairs. A positive present in the next section correlation functions of amino peak of C implies that there are more A–A pairs at that AA acid sequences. For this purpose we count, for example, pairs distance k than expected by chance. of hydrophobic amino acids as a function of the distance k. Correlation functions of protein-coding DNA show a The systematic error (bias) due to the finite lengths of the strong period 3 due to the genetic code. Widom (1996) separ- sequences has been removed according to the formulae ates this periodicity from periods of 10–11 bp by spectral given in (Weiss and Herzel, 1998). analysis. We apply simply a 3 bp running average to remove In order to extract the periods T of observed oscillations we the period 3: use nonlinear curve fitting. Trends are described by a parab- ola and the oscillations are fitted by a cos-function with expo- C(k 1) C(k) C(k 1) C(k)  (3) nential decay: Figure 1 illustrates that such a procedure highlights the C(k)  a  a  k a  k  a 1 2 3 4 periodicities of interest. cos( k a ) exp( a  k) (4) There are 9 independent correlation functions for DNA se- 5 6 quences (Herzel and Große, 1995). Consequently, we ana- lyzed various pair correlations such as C , C , C , etc. The free parameters a and the period T are then obtained AA GG AC i It turns out that 10–11 bp oscillations are most pronounced by least-square fits. 188 Sequence periodicities reflect DNA supercoiling Protein-induced oscillations It is well known that amino acid sequences of native proteins exhibit periodicities (Garnier et al., 1978; Kanehisa and Tsong, 1980). In particular, α-helices are characterized by an alternation of hydrophilic and hydrophobic amino acids in a distance of 3–4 amino acids (Hobohm and Sander, 1995). Leucine zippers and coiled-coils show, for instance, such periodicities. In Figure 2 we present correlation functions averaged over all yeast proteins. Details regarding the calculation of cor- relation functions from sets of protein sequences have been described elsewhere (Weiss and Herzel, 1998). The hydro- phobic amino acids L, I, V, F, and M have preferred distances of k = 3,4,7,11. The dashed line indicates that the hydro- phobic residues oscillate out-of-phase with the hydrophilic Fig. 2. Correlation functions of 6173 yeast protein sequences residues D, E, K, and Q. We have shown that these periodici- downloaded from ftp://genome-ftp.stanford.edu/pub/yeast/ ties are indeed strong in α-helix rich proteins and not signifi- yeast_ORFs/. Thick line: autocorrelations of hydrophobic amino cant in β-sheet rich proteins (Weiss and Herzel, 1998). Since acids L, I, V, F, and M. Dashed line: cross-correlations L, I, V, F, M the typical length of an α-helix ranges from 5 to 15 residues, versus D, E, K, Q. The standard deviations have been calculated according to Weiss and Herzel (1998) in order to demonstrate the there is a fast decay of the oscillations in Figure 2. As the significance of the first peaks. genetic code is degenerate, it is not obvious that these period- icities in protein sequences induce periodicities in the corre- sponding DNA sequences. There is, however, a peculiarity of the genetic code that implies a direct correspondence of On the contrary, periodicities in the third position extend to protein and DNA correlations. The middle letter of the co- as far as 100 bp. dons is closely related to the physical properties of the amino acid (Woese et al., 1966). More specifically, all five hydro- Nucleosomal patterns in the yeast genome phobic amino acids L, I, V, F, and M have a T at that position and the hydrophilic amino acids D, E, K, and Q possess an The main results of the preceding sections can be summar- A in the middle of all their codons. That is why we have ized as follows. There are 10–11 bp periodicities due to pro- chosen just these amino acids in Figure 2. Due to this peculiar teins but these oscillations are restricted to the first 3 peaks feature of the genetic code the oscillations shown in Figure of the correlation functions. The oscillations beyond 35 bp 2 should be visible in DNA sequences as well. In order to (cf. Figure 1) can be attributed to DNA structure. In eukary- verify this claim we perform the following numerical experi- otes DNA is wrapped in a left-handed toroidal superhelix ment as in Zhurkin (1981). We translate the protein se- around the nucleosome core (van Holde, 1988). This is asso- quences back to DNA-like sequences by using a uniform ciated with a slight decrease of the helical period. Estima- codon usage, i.e., codons are selected randomly with equal tions of the helical period are in the range of 10.0 to 10.5 bp. probability for each codon. The resulting oscillations in Fig- These numbers have been derived from digestion experi- ure 3 resemble clearly some of the periodicities in the yeast ments (Drew and Travers, 1985) or from an alignment of genome shown in Figure 1. In particular, the out-of-phase nucleosomal sequences (Ioshikhes et al., 1996). Here we oscillations of C + C and C + C are reproduced. AA TT AT TA present estimations of nucleosomal periodicities from the The similarity of the actual oscillations of yeast DNA in whole yeast genome which turn out to be consistent with Figure 1 and the artificial sequences in Figure 3 is, however, earlier estimates. restricted to the first 3 periods. The long-ranging correlations As described in the Methods section we analyze the se- in the lower graph of Figure 1 cannot be explained solely by quence periodicities with nonlinear curve fits. Figure 4 protein structure. shows the result of such a fit leading to a period of 10.2 bp. Recently, we confirmed this claim by studying separately Quite similar estimates of the period result from dinucleotide correlations in the three frames of coding DNA (Herzel et al., correlation functions or roll–roll correlation functions (not 1998b). As expected from the discussed peculiarity of the shown). genetic code, protein correlations affect mainly the second Closer examination of Figure 4 reveals a peculiar feature position of codons and are extended over a few periods only. of the pattern — there seems to be a phase shift in the middle 189 H.Herzel et al. Fig. 3. 3 bp running averages of correlation functions of randomly back-translated yeast proteins (cf. Fig. 2). From top to bottom: C + C , AA TT C + C , C . AT TA WW the nucleosome DNA periodicity 10.39 bp which is derived from 8 independent measurements (Trifonov, 1995). Now we discuss possible explanations of the phase shift in Figure 4. First, the DNA path around nucleosomes is not necessarily a continuous superhelix. Zhurkin (1985) dis- cussed, for example, nonuniform bending around the nu- cleosomal core. In addition to such INTRAnucleosomal deviations from a regular superhelix, also INTERnucleosomal correlations can cause the phase shift. Indeed, linker lengths of 8 bp, 10.5+8 bp, 21+8 bp and so on (rather than, say, mere multiples of 10.5 bases) have been observed as typical values both exper- imentally (Noll et al., 1980) and by sequence-directed map- ping of the nucleosomes (Mengeritsky and Trifonov, 1983). They correspond to the midpoints of sterically allowed ranges of rotations of the closely neighboring nucleosomes around their connecting linkers (Ulanovsky and Trifonov, Fig. 4. WW-correlations in the complete yeast genome. Nonlinear 1986). Such displacement between neighboring nucleo- curve fitting (dashed line) was applied in the range from 38 to 105 bp to avoid dominance of protein correlations. somes (2.5 bases off the 10.5 base ladder) would result in the apparant phase shift as in Figure 4. The observed oscillation, therefore, can be related even in detail to chromatin structure, and is consistent with earlier of the k-range. The maxima around 55 bp are only 8 bp apart. results on nucleosome sequence periodicities. Since this phenomenon was found in various correlation functions, it does not seem to be just a statistical fluctuation. Supercoiling in eubacteria and archaea Separate fittings of the first peaks of the curve in the Figure 4 and of the remaining peaks result in somewhat higher va- In prokaryotes DNA is not packed as tightly as in eukaryotic lues for the nucleosome DNA periodicity (data not shown) chromatin. It is, therefore, particularly interesting that pro- which is consistent with the latest weight average estimate of nounced 10–11 bp oscillations appear in prokaryotes as well 190 Sequence periodicities reflect DNA supercoiling Fig. 5. Comparison of correlation functions from bacteria and archaea listed in Table 1. Upper graph: E. coli (solid line), B. subtilis (grey), Synechocystis (dashed), H. influenzae (long dashed). Middle graph: H. pylori (solid), M. pneumoniae (dashed), M. genitalium (long dashed). Lower graph: A. fulgidis (solid), M. thermo. (dashed), M. jannaschii (long dashed). In order to visualize the significantly different periodicities, arrows are drawn in distances of 11 bp (upper and middle graph) and 10 bp (lower graph). Table 1. Estimated periodicities of genomic DNA (Herzel et al., 1998a; Tomita et al., 1998). In this section we discuss these periodicities in connection with the supercoiled Length Nucleotides Dinucleotides state of genomic DNA which is important for its replication, recombination and transcription (Wang, 1996). E. coli 4.6 M 11.0 11.0 It turns out that for eubacterial DNA the bases A and T B. subtilis 4.2 M 11.2 11.2 Synechocystis 3.6 M 11.5 11.6 nearly oscillate in phase and, therefore, we study primarily H. influenzae 1.8 M 11.2 11.0 autocorrelations of weakly binding nucleotides termed WW- H. pylori 1.7 M 11.2 11.2 correlations. The resulting oscillations in Figure 5 are highly M. pneumoniae 0.8 M 11.3 11.4 significant since autocorrelation functions of random se- M. genitalium 0.6 M 11.5 11.5 quences of the same length have standard deviations below A. fulgidus 2.2 M 10.0 10.0 0.0003 (Weiss and Herzel, 1998) compared to observed am- M. thermo. 1.8 M 10.1 – plitudes up to 0.003. The high number of peaks allows, more- M. jannaschii 1.7 M 10.0 10.0 over, a fairly accurate estimation of the corresponding periods. The estimated periods are summarized in Table 1. In We estimated the periods from the correlation functions in the range from 38 the case of B. subtilis the oscillations are less regular (grey to 105 bp via nonlinear curve fitting (see Methods section). Distances below 38 bp are excluded to avoid dominance of protein correlations. The left col- line in Figure 5), presumably due to inhomogeneities of the umn refers to correlations of weakly binding nucleotides (A or T) whereas the genomes due to phages, varying codon usage or repeats. middle column is obtained from correlations of AA or TT dinucleotides. In the Nevertheless, the fitting procedure leads to a reasonable esti- case of Methanobacterium thermoautotrophicum the dinucleotide correlation mate of the period T. functions exhibit no clear periodicities. 191 H.Herzel et al. The helical period of free DNA is about 10.55 bp (Trifo- in genomic DNA of archaea has not been determined yet (Pereira et al., 1997). nov, 1998). Inspection of Figure 5 and Table 1 reveals, how- Our statistical analysis of complete genomes provides evi- ever, that the periods of bacteria and archaea deviate signifi- dence that genomic DNA of thermophilic archaea exhibits cantly from the value for the helical repeat for free DNA, positive supercoiling. The significantly different periodici- 10.55 bp. For example, the ninth peak around 100 bp in the ties of bacteria and archaea (compare Figure 5) give further upper graph implies a period of about 11 bp. For archaea justification of the three-domain concept — archaea, bacteria (lower graph) the peak at 100 bp is already the tenth peak and eucarya (Woese et al., 1990; Ouzonis et al., 1995). Since which corresponds to a period of about 10 bp. Least square reverse gyrase is present also in thermophilic eubacteria fits of the averaged correlation functions gave periods of (Bouthier de la Tour et al., 1991), the positive supercoiling 11.36 bp for bacteria and 10.01 bp for archaea (Herzel et al., might be a thermophilic feature. In order to decide whether 1998b). These periodicities have been confirmed also by or not positive supercoiling is an archaeal or a thermophilic spectral analysis. Moreover, the estimated periods in Table 1 feature, forthcoming genomes of mesophilic methanogens provide clear evidence that bacteria typically have periods and thermophilic eubacteria have to be analyzed along the above 11 bp whereas the analyzed thermophilic archaea have lines of this paper. periods of about 10 bp. Sequence periodicities (such as periodically placed AA dinucleotides) having the helical period are associated with Summary curved DNA — a local flat arc. When the sequence period We have shown that correlation functions of complete ge- is not exactly equal to the DNA helical period, the sequence- nomes reveal pronounced oscillations with periods in the dependent DNA curvature turns into superhelical writhe. range of 10–11 bp. These periodicities appear in nucleotide– According to Crick’s formula for helical DNA trajectories nucleotide as well as in dinucleotide–dinucleotide correla- (Crick, 1976), periods above 10.55 bp generate negatively tion functions. Partly, the ocsillations could be traced back to supercoiled DNA whereas lower periods induce positive correlations in protein sequences. However, we attribute the supercoiling. In this way DNA periodicities may stabilize periodicities beyond the first three peaks to signals which and synchronize the supercoiled state introduced by proteins reflect the folding of DNA. For yeast the earlier predicted or topoisomerases. Therefore, sequence periodicities reflect nucleosomal sequence patterns are confirmed. For bacteria the characteristic superhelical density of genomic DNA. and archaea, a significant deviation from the equilibrium For bacterial DNA the superhelical density is negative period for free DNA of 10.55 bp is found. We emphasize, that (Vologodsky, 1992), i.e. the DNA is underwound. Indeed, these deviations reflect the supercoiled state of genomic the observed sequence periodicity of about 11 bp is clearly DNA. Indeed, the negative supercoiling of eubacteria is con- above the equilibrium helical period as expected. sistent with a sequence periodicity of 11 bp. For archaea, Similarly, the 10 bp sequence periodicity of archaeal DNA however, the supercoiled state of genomic DNA is disputed can be interpreted as a reflection of supercoiling of opposite (Pereira et al., 1997). The observed period of 10 bp suggests sign, i.e. positive supercoiling. Since high temperatures in- positive supercoiling for thermophilic archaea. crease the helical repeat slightly (Depew and Wang, 1975) a sequence periodicity of 10 bp indicates a substantial over- Acknowledgements winding of DNA that would contribute to DNA stabilization at high temperatures (Kikuchi and Asai, 1984). This work has been supported by the Deutsche For- So far, there is no direct experimental information avail- schungsgemeinschaft. We thank Ivo Große, Johannes able about the superhelical state of genomic DNA in archaea. Schuchhardt and anonymous referees for many valuable There are, however, strong indications for positive supercoil- comments. ing: Archaeal plasmids and a virus-like particle from Sulfolo- bus are positively supercoiled (Lopez-Garcia and Forterre, References 1997; Nadal et al., 1986). Moreover, hyperthermophilic ar- Bernardi,G., Olofsson,B., Filipski,J., Zerial,M., Salinas,J., Cuny,G., chaea possess reverse gyrase activity which generates posi- Meunier–Rotival,M. and Rodier,F. (1985) The mosaic genome of tive supercoiling of DNA (Kikuchi and Asai, 1984). Finally, warm-blooded vertebtates. Science, 228, 953–957. histone-like proteins have been found in various archaea Bouthier de la Tour,C., Portemer,C., Huber,R., Forterre,P. and (Pereira et al., 1997). These histones when bound to DNA Duguet,M. (1991) Reverse gyrase in thermophilic eubacteria. J. form nucleosome-like structures (NLS) and introduce toroi- Bacteriol., 173, 3921–3923. dal supercoils in DNA (Musgrave et al., 1991). Despite Crick,F.H.C. (1976) Linking numbers and nucleosomes. Proc. Natl many experimental studies the handedness of the supercoils Acad. Sci. USA, 73, 2639–2643. 192 Sequence periodicities reflect DNA supercoiling Depew,R.E. and Wang,J.C. (1975) Conformal fluctuations of DNA Noll,M., Zimmer,S., Engel,S. and Dubochet,J. (1980) Self-assembly helix. Proc. Natl Acad. Sci. USA, 72, 4275–4279. of single and closely spaced nucleosome core particles. Nucleic Drew,H.R. and Travers,A.A. (1985) DNA bending and its relation to Acids Res., 8, 21–42. nucleosome positioning. J. Mol. Biol., 186, 773–790. Ouzonis,C., Kyrpides,N. and Sander,C. (1995) Novel protein families Elton,R.A. (1974) Theoretical models for heterogeneity of base in archaean genomes. Nucleic Acids Res., 23, 565–570. composition in DNA. J. Theor. Biol., 45, 533–553. Pereira,S.L., Grayling,R.A., Lurz,R. and Reeve,J.N. (1997) Archaeal Fickett,J.W. and Tung,C.S. (1992) Assessment of protein coding nucleosomes. Proc. Natl Acad. Sci. USA, 94, 12633–12637. measures. Nucleic Acids Res., 20, 6441–6450. Tomita,M., Wada,M. and Kawashima,Y. (1998) Periodic patterns in Garnier,J., Osguthorpe,D.J. and Robson,B. (1978) Analysis of the bacterial genomes. Submitted. accuracy and implications of simple methods for predicting the Trifonov,E.N. (1995) Hidden segmentation of protein sequences: secondary structure of globular proteins. J. Mol. Biol., 120, 95–120. structural connection with DNA. In Pullman,A., Jortner,J. and Herzel,H. and Große,I. (1995) Measuring correlations in symbol Pullman,B. (eds), Modelling of Biomolecular Structures and sequences. Physica A, 216, 518–542. Mechanisms, pp. 473–479. Kluwer Academic, Dordrecht. Herzel,H., Trifonov,E.N., Weiss,O. and Große,I. (1998a) Interpreting Trifonov,E.N. (1998) 3-, 10.5-, 200- and 400-base periodicities in correlations in biosequences. Physica A, 249, 449–459. genome sequences. Physica A, 249, 511–516. Herzel,H., Weiss,O. and Trifonov,E.N. (1998b) Sequence periodicity Trifonov,E.N. and Sussman,J.L. (1980) The pitch of chromatin DNA is in complete genomes of Archaea suggests positive supercoiling. J. reflected in its nucleotide sequence. Proc. Natl Acad. Sci. USA, 77, Biomol. Struct. Dyn., 16, 341–345. 3816–3820. Hobohm,U. and Sander,C. (1995) A sequence property approach to Ulanovsky,L.E. and Trifonov,E.N. (1986) A different view point on the searching protein databases. J. Mol. Biol., 251, 390–399. chromatin higher order structure: steric exclusion effects. In Ioshikhes,I., Bolshoy,A., Derenshteyn,K., Borodovsky,M. and Trifo- Sarma,R.H. and Sarma,M.H. (eds), Biomolecular Stereodynamics nov,E.N. (1996) Nucleosome DNA sequence pattern revealed by III, pp. 35–44. Adenine Press, Guilderland. multiple alignment of experimentally mapped sequences. J. Mol. van Holde,K.E. (1988) Chromatin. Springer, Berlin. Biol., 262, 129–139. Vologodsky,A. (1992) Topology and Physics of Circular DNA. CRC Kanehisa,M.I. and Tsong,T.Y. (1980) Hydrophobicity and protein Press, Boca Raton. structure. Biopolymers, 19, 1617–1628. Wang,J.C. (1996) DNA topoisomerases. Annu. Rev. Biochem., 65, Kikuchi,A. and Asai,K. (1984) Reverse gyrase — a topoisomerase 635–692. which introduces positive superhelical turns into DNA. Nature, 309, Weiss,O. and Herzel,H. (1998) Correlations in protein sequences and 677–681. property codes. J. Theor. Biol., 190, 341–353. Lopez-Garcia,P. and Forterre,P. (1997) DNA topology in hyperther- Widom,J. (1996) Short–range order in two eukaryotic genomes: mophilic Archaea: Reference states and their variation with growth relation to chromosome structure. J. Mol. Biol., 259, 579–588. phase, growth temperature, and temperature stresses. Mol. Micro- Woese,D.R., Dugre,D.H., Saxinger,W.C. and Dugre,S.A. (1966) The biol., 23, 1267–1279. molecular basis of the genetic code. Proc. Natl Acad. Sci. USA, 55, Mengeritsky,G. and Trifonov,E.N. (1983) Nucleotide sequence-di- 966–974. rected mapping of the nucleosomes. Nucleic Acids Res., 11, Woese,C.R., Kandler,O. and Wheelis,M.L. (1990) Towards a natural 3833–3851. system of organisms: proposal for the domains Archaea, Bacteria, Musgrave,D.R., Sandman,K.M. and Reeve,J.N. (1991) DNA binding and Eucarya. Proc. Natl Acad. Sci. USA, 87, 4576–4579. by the archaeal histone HMf results in positive supercoiling. Proc. Zhurkin,V.B. (1981) Periodicity in DNA primary structure is defined Natl Acad. Sci. USA, 88, 10397–10401. by secondary structure of the coded protein. Nucleic Acids Res., 9, Nadal,M., Mirambeau,G., Forterre,P., Reiter,W.-D. and Duguet,M. 1963–1971. (1986) Positively supercoiled DNA in a virus-like particle of an Zhurkin,V.B. (1985) Sequence-dependent bending of DNA and archaebacterium. Nature, 321, 256–258. phasing of nucleosomes. J. Biomol. Struct. Dyn., 4, 785–804. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Bioinformatics Oxford University Press

10-11 bp periodicities in complete genomes reflect protein structure and DNA folding.

Bioinformatics , Volume 15 (3): 7 – Mar 1, 1999

Loading next page...
 
/lp/oxford-university-press/10-11-bp-periodicities-in-complete-genomes-reflect-protein-structure-kYJDW0TBWj

References (34)

Publisher
Oxford University Press
Copyright
© Published by Oxford University Press.
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/15.3.187
Publisher site
See Article on Publisher Site

Abstract

+(  *+ BIOINFORMATICS %#. 10–11 bp periodicities in complete genomes reflect protein structure and DNA folding #-3#(   #'.. *"   -'$+*+1 *./'/0/# $+- &#+-#/'!( '+(+%2 0) +("/ *'1#-.'/2 *1('"#*./- 4 #-('* #-)*2 *" #,-/)#*/ +$ /-0!/0-( '+(+%2 #'3)** *./'/0/# +$ !'#*!# #&+1+/  .-#( Abstract in protein coding DNA is widely used to find exons in newly sequenced DNA (Fickett and Tung, 1992) and correlations Motivation: Completely sequenced genomes allow for over thousands of base pairs are discussed in connection with detection and analysis of the relatively weak periodicities of chromosomal organization (Elton, 1974; Bernardi et al., 10–11 basepairs (bp). Two sources contribute to such 1985). signals: correlations in the corresponding protein sequences In this paper we analyze 10–11 base pair (bp) oscillations due to the amphipatic character of α-helices and the folding in complete genomes of yeast, bacteria, and archaea. In addi- of DNA (nucleosomal patterns, DNA supercoiling). Since the tion to the period 3 oscillations, the correlation functions in topological state of genomic DNA is of importance for its Figure 1 exhibit pronounced oscillations with a period of replication, recombination and transcription, there is an 10–11 bp. Such periodicities have been described already in immediate interest to obtain information about the super- Widom (1996) and Herzel et al. (1998a). However, the inter- coiled state from sequence periodicities. pretation of these oscillations remained vague. Our paper is Results: We show that correlations within proteins affect devoted to the interpretation of these periodicities in relation mainly the oscillations at distances below 35 bp. The to protein structure and DNA folding. long-ranging correlations up to 100 bp reflect primarily We discuss two possible origins of 10–11 bp oscillations. DNA folding. For the yeast genome these oscillations are On the one hand, the alternation of hydrophobic and hydro- consistent in detail with the chromatin structure. For philic amino acids (aa) in α-helices leads to a periodicity of eubacteria and archaea the periods deviate significantly about 3.5 aa in protein sequences (Garnier et al., 1978; Kane- from the 10.55 bp value for free DNA. These deviations hisa and Tsong, 1980), which corresponds to 10–11 bases in suggest that while a period of 11 bp in bacteria reflects DNA sequences (Zhurkin, 1981). On the other hand, the heli- negative supercoiling, the significantly different period of cal periodicity of DNA is about 10.5 bp. Consequently, sig- thermophilic archaea close to 10 bp corresponds to positive nals for DNA folding show periodicities of 10–11 bp as well supercoiling of thermophilic archaeal genomes. (Trifonov and Sussman, 1980). Availability: Protein sets and C programs for the calculation It is the aim of this paper to separate these two different of correlation functions are available on request from the sources as a prerequisite of a detailed discussion of DNA authors (see http://itb.biologie.hu-berlin.de). supercoiling. We show in Section 3 that protein-induced cor- Contact: h.herzel@biologie.hu-berlin.de, o.weiss@biolo- relations affect mainly the first 3 peaks around 10, 21, and 31 gie.hu-berlin.de, BPTRIFO1@WEIZMANN.weizmann.ac.il bp. The peaks beyond 35 bp reflect primarily the curvature of DNA. This will be discussed in detail in Sections 4 and 5. Introduction The analysis of correlations in DNA sequences is an import- ant complement to experimental studies. Since several com- Methods plete genomes are available now, a careful analysis of se- Correlation functions as in Figure 1 measure the excess of quence periodicities can help to understand structure and certain nucleotide pairs at a distance of k base pairs. For the function of genomic DNA. For example, the 3 bp periodicity calculation of A–A autocorrelations we count, for example, the number N (k) of A–A pairs in a distance k. Altogether AA there are N–k pairs in a sequence of length N. Consequently, the pair probability to find A and A in a distance k can be estimated as *To whom correspondence should be addressed Oxford University Press 187 H.Herzel et al. Fig. 1. Correlation functions of the complete yeast genome. Upper graph: C + C . Middle graph: C + C . Lower graph: C . The thick AA TT AT TA WW lines correspond to running averages over 3 bp (see Methods section for details). for correlation functions that involve A or T. We find C ≈ N (k) AA AA p (k)  (1) AA C and C ≈ C . Hence, we plot in Figure 1 the sums C N k TT AT TA AA + C and C + C to improve the signal-to-noise ratio. TT AT TA The probability of the single nucleotide A is denoted as p . Moreover, we present correlation functions C of weakly WW If the nucleotides at a distance k are statistically independent binding nucleotides A or T. In many cases this choice gives we find p (k) = p · p . Thus the difference AA A A the strongest oscillations with a period of 10–11 bp. C (k) = p (k) – p · p (2) We consider also correlations between pairs of dinucleo- AA AA A A tides in connection with DNA bending. Furthermore, we measures correlations at a distance of k base pairs. A positive present in the next section correlation functions of amino peak of C implies that there are more A–A pairs at that AA acid sequences. For this purpose we count, for example, pairs distance k than expected by chance. of hydrophobic amino acids as a function of the distance k. Correlation functions of protein-coding DNA show a The systematic error (bias) due to the finite lengths of the strong period 3 due to the genetic code. Widom (1996) separ- sequences has been removed according to the formulae ates this periodicity from periods of 10–11 bp by spectral given in (Weiss and Herzel, 1998). analysis. We apply simply a 3 bp running average to remove In order to extract the periods T of observed oscillations we the period 3: use nonlinear curve fitting. Trends are described by a parab- ola and the oscillations are fitted by a cos-function with expo- C(k 1) C(k) C(k 1) C(k)  (3) nential decay: Figure 1 illustrates that such a procedure highlights the C(k)  a  a  k a  k  a 1 2 3 4 periodicities of interest. cos( k a ) exp( a  k) (4) There are 9 independent correlation functions for DNA se- 5 6 quences (Herzel and Große, 1995). Consequently, we ana- lyzed various pair correlations such as C , C , C , etc. The free parameters a and the period T are then obtained AA GG AC i It turns out that 10–11 bp oscillations are most pronounced by least-square fits. 188 Sequence periodicities reflect DNA supercoiling Protein-induced oscillations It is well known that amino acid sequences of native proteins exhibit periodicities (Garnier et al., 1978; Kanehisa and Tsong, 1980). In particular, α-helices are characterized by an alternation of hydrophilic and hydrophobic amino acids in a distance of 3–4 amino acids (Hobohm and Sander, 1995). Leucine zippers and coiled-coils show, for instance, such periodicities. In Figure 2 we present correlation functions averaged over all yeast proteins. Details regarding the calculation of cor- relation functions from sets of protein sequences have been described elsewhere (Weiss and Herzel, 1998). The hydro- phobic amino acids L, I, V, F, and M have preferred distances of k = 3,4,7,11. The dashed line indicates that the hydro- phobic residues oscillate out-of-phase with the hydrophilic Fig. 2. Correlation functions of 6173 yeast protein sequences residues D, E, K, and Q. We have shown that these periodici- downloaded from ftp://genome-ftp.stanford.edu/pub/yeast/ ties are indeed strong in α-helix rich proteins and not signifi- yeast_ORFs/. Thick line: autocorrelations of hydrophobic amino cant in β-sheet rich proteins (Weiss and Herzel, 1998). Since acids L, I, V, F, and M. Dashed line: cross-correlations L, I, V, F, M the typical length of an α-helix ranges from 5 to 15 residues, versus D, E, K, Q. The standard deviations have been calculated according to Weiss and Herzel (1998) in order to demonstrate the there is a fast decay of the oscillations in Figure 2. As the significance of the first peaks. genetic code is degenerate, it is not obvious that these period- icities in protein sequences induce periodicities in the corre- sponding DNA sequences. There is, however, a peculiarity of the genetic code that implies a direct correspondence of On the contrary, periodicities in the third position extend to protein and DNA correlations. The middle letter of the co- as far as 100 bp. dons is closely related to the physical properties of the amino acid (Woese et al., 1966). More specifically, all five hydro- Nucleosomal patterns in the yeast genome phobic amino acids L, I, V, F, and M have a T at that position and the hydrophilic amino acids D, E, K, and Q possess an The main results of the preceding sections can be summar- A in the middle of all their codons. That is why we have ized as follows. There are 10–11 bp periodicities due to pro- chosen just these amino acids in Figure 2. Due to this peculiar teins but these oscillations are restricted to the first 3 peaks feature of the genetic code the oscillations shown in Figure of the correlation functions. The oscillations beyond 35 bp 2 should be visible in DNA sequences as well. In order to (cf. Figure 1) can be attributed to DNA structure. In eukary- verify this claim we perform the following numerical experi- otes DNA is wrapped in a left-handed toroidal superhelix ment as in Zhurkin (1981). We translate the protein se- around the nucleosome core (van Holde, 1988). This is asso- quences back to DNA-like sequences by using a uniform ciated with a slight decrease of the helical period. Estima- codon usage, i.e., codons are selected randomly with equal tions of the helical period are in the range of 10.0 to 10.5 bp. probability for each codon. The resulting oscillations in Fig- These numbers have been derived from digestion experi- ure 3 resemble clearly some of the periodicities in the yeast ments (Drew and Travers, 1985) or from an alignment of genome shown in Figure 1. In particular, the out-of-phase nucleosomal sequences (Ioshikhes et al., 1996). Here we oscillations of C + C and C + C are reproduced. AA TT AT TA present estimations of nucleosomal periodicities from the The similarity of the actual oscillations of yeast DNA in whole yeast genome which turn out to be consistent with Figure 1 and the artificial sequences in Figure 3 is, however, earlier estimates. restricted to the first 3 periods. The long-ranging correlations As described in the Methods section we analyze the se- in the lower graph of Figure 1 cannot be explained solely by quence periodicities with nonlinear curve fits. Figure 4 protein structure. shows the result of such a fit leading to a period of 10.2 bp. Recently, we confirmed this claim by studying separately Quite similar estimates of the period result from dinucleotide correlations in the three frames of coding DNA (Herzel et al., correlation functions or roll–roll correlation functions (not 1998b). As expected from the discussed peculiarity of the shown). genetic code, protein correlations affect mainly the second Closer examination of Figure 4 reveals a peculiar feature position of codons and are extended over a few periods only. of the pattern — there seems to be a phase shift in the middle 189 H.Herzel et al. Fig. 3. 3 bp running averages of correlation functions of randomly back-translated yeast proteins (cf. Fig. 2). From top to bottom: C + C , AA TT C + C , C . AT TA WW the nucleosome DNA periodicity 10.39 bp which is derived from 8 independent measurements (Trifonov, 1995). Now we discuss possible explanations of the phase shift in Figure 4. First, the DNA path around nucleosomes is not necessarily a continuous superhelix. Zhurkin (1985) dis- cussed, for example, nonuniform bending around the nu- cleosomal core. In addition to such INTRAnucleosomal deviations from a regular superhelix, also INTERnucleosomal correlations can cause the phase shift. Indeed, linker lengths of 8 bp, 10.5+8 bp, 21+8 bp and so on (rather than, say, mere multiples of 10.5 bases) have been observed as typical values both exper- imentally (Noll et al., 1980) and by sequence-directed map- ping of the nucleosomes (Mengeritsky and Trifonov, 1983). They correspond to the midpoints of sterically allowed ranges of rotations of the closely neighboring nucleosomes around their connecting linkers (Ulanovsky and Trifonov, Fig. 4. WW-correlations in the complete yeast genome. Nonlinear 1986). Such displacement between neighboring nucleo- curve fitting (dashed line) was applied in the range from 38 to 105 bp to avoid dominance of protein correlations. somes (2.5 bases off the 10.5 base ladder) would result in the apparant phase shift as in Figure 4. The observed oscillation, therefore, can be related even in detail to chromatin structure, and is consistent with earlier of the k-range. The maxima around 55 bp are only 8 bp apart. results on nucleosome sequence periodicities. Since this phenomenon was found in various correlation functions, it does not seem to be just a statistical fluctuation. Supercoiling in eubacteria and archaea Separate fittings of the first peaks of the curve in the Figure 4 and of the remaining peaks result in somewhat higher va- In prokaryotes DNA is not packed as tightly as in eukaryotic lues for the nucleosome DNA periodicity (data not shown) chromatin. It is, therefore, particularly interesting that pro- which is consistent with the latest weight average estimate of nounced 10–11 bp oscillations appear in prokaryotes as well 190 Sequence periodicities reflect DNA supercoiling Fig. 5. Comparison of correlation functions from bacteria and archaea listed in Table 1. Upper graph: E. coli (solid line), B. subtilis (grey), Synechocystis (dashed), H. influenzae (long dashed). Middle graph: H. pylori (solid), M. pneumoniae (dashed), M. genitalium (long dashed). Lower graph: A. fulgidis (solid), M. thermo. (dashed), M. jannaschii (long dashed). In order to visualize the significantly different periodicities, arrows are drawn in distances of 11 bp (upper and middle graph) and 10 bp (lower graph). Table 1. Estimated periodicities of genomic DNA (Herzel et al., 1998a; Tomita et al., 1998). In this section we discuss these periodicities in connection with the supercoiled Length Nucleotides Dinucleotides state of genomic DNA which is important for its replication, recombination and transcription (Wang, 1996). E. coli 4.6 M 11.0 11.0 It turns out that for eubacterial DNA the bases A and T B. subtilis 4.2 M 11.2 11.2 Synechocystis 3.6 M 11.5 11.6 nearly oscillate in phase and, therefore, we study primarily H. influenzae 1.8 M 11.2 11.0 autocorrelations of weakly binding nucleotides termed WW- H. pylori 1.7 M 11.2 11.2 correlations. The resulting oscillations in Figure 5 are highly M. pneumoniae 0.8 M 11.3 11.4 significant since autocorrelation functions of random se- M. genitalium 0.6 M 11.5 11.5 quences of the same length have standard deviations below A. fulgidus 2.2 M 10.0 10.0 0.0003 (Weiss and Herzel, 1998) compared to observed am- M. thermo. 1.8 M 10.1 – plitudes up to 0.003. The high number of peaks allows, more- M. jannaschii 1.7 M 10.0 10.0 over, a fairly accurate estimation of the corresponding periods. The estimated periods are summarized in Table 1. In We estimated the periods from the correlation functions in the range from 38 the case of B. subtilis the oscillations are less regular (grey to 105 bp via nonlinear curve fitting (see Methods section). Distances below 38 bp are excluded to avoid dominance of protein correlations. The left col- line in Figure 5), presumably due to inhomogeneities of the umn refers to correlations of weakly binding nucleotides (A or T) whereas the genomes due to phages, varying codon usage or repeats. middle column is obtained from correlations of AA or TT dinucleotides. In the Nevertheless, the fitting procedure leads to a reasonable esti- case of Methanobacterium thermoautotrophicum the dinucleotide correlation mate of the period T. functions exhibit no clear periodicities. 191 H.Herzel et al. The helical period of free DNA is about 10.55 bp (Trifo- in genomic DNA of archaea has not been determined yet (Pereira et al., 1997). nov, 1998). Inspection of Figure 5 and Table 1 reveals, how- Our statistical analysis of complete genomes provides evi- ever, that the periods of bacteria and archaea deviate signifi- dence that genomic DNA of thermophilic archaea exhibits cantly from the value for the helical repeat for free DNA, positive supercoiling. The significantly different periodici- 10.55 bp. For example, the ninth peak around 100 bp in the ties of bacteria and archaea (compare Figure 5) give further upper graph implies a period of about 11 bp. For archaea justification of the three-domain concept — archaea, bacteria (lower graph) the peak at 100 bp is already the tenth peak and eucarya (Woese et al., 1990; Ouzonis et al., 1995). Since which corresponds to a period of about 10 bp. Least square reverse gyrase is present also in thermophilic eubacteria fits of the averaged correlation functions gave periods of (Bouthier de la Tour et al., 1991), the positive supercoiling 11.36 bp for bacteria and 10.01 bp for archaea (Herzel et al., might be a thermophilic feature. In order to decide whether 1998b). These periodicities have been confirmed also by or not positive supercoiling is an archaeal or a thermophilic spectral analysis. Moreover, the estimated periods in Table 1 feature, forthcoming genomes of mesophilic methanogens provide clear evidence that bacteria typically have periods and thermophilic eubacteria have to be analyzed along the above 11 bp whereas the analyzed thermophilic archaea have lines of this paper. periods of about 10 bp. Sequence periodicities (such as periodically placed AA dinucleotides) having the helical period are associated with Summary curved DNA — a local flat arc. When the sequence period We have shown that correlation functions of complete ge- is not exactly equal to the DNA helical period, the sequence- nomes reveal pronounced oscillations with periods in the dependent DNA curvature turns into superhelical writhe. range of 10–11 bp. These periodicities appear in nucleotide– According to Crick’s formula for helical DNA trajectories nucleotide as well as in dinucleotide–dinucleotide correla- (Crick, 1976), periods above 10.55 bp generate negatively tion functions. Partly, the ocsillations could be traced back to supercoiled DNA whereas lower periods induce positive correlations in protein sequences. However, we attribute the supercoiling. In this way DNA periodicities may stabilize periodicities beyond the first three peaks to signals which and synchronize the supercoiled state introduced by proteins reflect the folding of DNA. For yeast the earlier predicted or topoisomerases. Therefore, sequence periodicities reflect nucleosomal sequence patterns are confirmed. For bacteria the characteristic superhelical density of genomic DNA. and archaea, a significant deviation from the equilibrium For bacterial DNA the superhelical density is negative period for free DNA of 10.55 bp is found. We emphasize, that (Vologodsky, 1992), i.e. the DNA is underwound. Indeed, these deviations reflect the supercoiled state of genomic the observed sequence periodicity of about 11 bp is clearly DNA. Indeed, the negative supercoiling of eubacteria is con- above the equilibrium helical period as expected. sistent with a sequence periodicity of 11 bp. For archaea, Similarly, the 10 bp sequence periodicity of archaeal DNA however, the supercoiled state of genomic DNA is disputed can be interpreted as a reflection of supercoiling of opposite (Pereira et al., 1997). The observed period of 10 bp suggests sign, i.e. positive supercoiling. Since high temperatures in- positive supercoiling for thermophilic archaea. crease the helical repeat slightly (Depew and Wang, 1975) a sequence periodicity of 10 bp indicates a substantial over- Acknowledgements winding of DNA that would contribute to DNA stabilization at high temperatures (Kikuchi and Asai, 1984). This work has been supported by the Deutsche For- So far, there is no direct experimental information avail- schungsgemeinschaft. We thank Ivo Große, Johannes able about the superhelical state of genomic DNA in archaea. Schuchhardt and anonymous referees for many valuable There are, however, strong indications for positive supercoil- comments. ing: Archaeal plasmids and a virus-like particle from Sulfolo- bus are positively supercoiled (Lopez-Garcia and Forterre, References 1997; Nadal et al., 1986). Moreover, hyperthermophilic ar- Bernardi,G., Olofsson,B., Filipski,J., Zerial,M., Salinas,J., Cuny,G., chaea possess reverse gyrase activity which generates posi- Meunier–Rotival,M. and Rodier,F. (1985) The mosaic genome of tive supercoiling of DNA (Kikuchi and Asai, 1984). Finally, warm-blooded vertebtates. Science, 228, 953–957. histone-like proteins have been found in various archaea Bouthier de la Tour,C., Portemer,C., Huber,R., Forterre,P. and (Pereira et al., 1997). These histones when bound to DNA Duguet,M. (1991) Reverse gyrase in thermophilic eubacteria. J. form nucleosome-like structures (NLS) and introduce toroi- Bacteriol., 173, 3921–3923. dal supercoils in DNA (Musgrave et al., 1991). Despite Crick,F.H.C. (1976) Linking numbers and nucleosomes. Proc. Natl many experimental studies the handedness of the supercoils Acad. Sci. USA, 73, 2639–2643. 192 Sequence periodicities reflect DNA supercoiling Depew,R.E. and Wang,J.C. (1975) Conformal fluctuations of DNA Noll,M., Zimmer,S., Engel,S. and Dubochet,J. (1980) Self-assembly helix. Proc. Natl Acad. Sci. USA, 72, 4275–4279. of single and closely spaced nucleosome core particles. Nucleic Drew,H.R. and Travers,A.A. (1985) DNA bending and its relation to Acids Res., 8, 21–42. nucleosome positioning. J. Mol. Biol., 186, 773–790. Ouzonis,C., Kyrpides,N. and Sander,C. (1995) Novel protein families Elton,R.A. (1974) Theoretical models for heterogeneity of base in archaean genomes. Nucleic Acids Res., 23, 565–570. composition in DNA. J. Theor. Biol., 45, 533–553. Pereira,S.L., Grayling,R.A., Lurz,R. and Reeve,J.N. (1997) Archaeal Fickett,J.W. and Tung,C.S. (1992) Assessment of protein coding nucleosomes. Proc. Natl Acad. Sci. USA, 94, 12633–12637. measures. Nucleic Acids Res., 20, 6441–6450. Tomita,M., Wada,M. and Kawashima,Y. (1998) Periodic patterns in Garnier,J., Osguthorpe,D.J. and Robson,B. (1978) Analysis of the bacterial genomes. Submitted. accuracy and implications of simple methods for predicting the Trifonov,E.N. (1995) Hidden segmentation of protein sequences: secondary structure of globular proteins. J. Mol. Biol., 120, 95–120. structural connection with DNA. In Pullman,A., Jortner,J. and Herzel,H. and Große,I. (1995) Measuring correlations in symbol Pullman,B. (eds), Modelling of Biomolecular Structures and sequences. Physica A, 216, 518–542. Mechanisms, pp. 473–479. Kluwer Academic, Dordrecht. Herzel,H., Trifonov,E.N., Weiss,O. and Große,I. (1998a) Interpreting Trifonov,E.N. (1998) 3-, 10.5-, 200- and 400-base periodicities in correlations in biosequences. Physica A, 249, 449–459. genome sequences. Physica A, 249, 511–516. Herzel,H., Weiss,O. and Trifonov,E.N. (1998b) Sequence periodicity Trifonov,E.N. and Sussman,J.L. (1980) The pitch of chromatin DNA is in complete genomes of Archaea suggests positive supercoiling. J. reflected in its nucleotide sequence. Proc. Natl Acad. Sci. USA, 77, Biomol. Struct. Dyn., 16, 341–345. 3816–3820. Hobohm,U. and Sander,C. (1995) A sequence property approach to Ulanovsky,L.E. and Trifonov,E.N. (1986) A different view point on the searching protein databases. J. Mol. Biol., 251, 390–399. chromatin higher order structure: steric exclusion effects. In Ioshikhes,I., Bolshoy,A., Derenshteyn,K., Borodovsky,M. and Trifo- Sarma,R.H. and Sarma,M.H. (eds), Biomolecular Stereodynamics nov,E.N. (1996) Nucleosome DNA sequence pattern revealed by III, pp. 35–44. Adenine Press, Guilderland. multiple alignment of experimentally mapped sequences. J. Mol. van Holde,K.E. (1988) Chromatin. Springer, Berlin. Biol., 262, 129–139. Vologodsky,A. (1992) Topology and Physics of Circular DNA. CRC Kanehisa,M.I. and Tsong,T.Y. (1980) Hydrophobicity and protein Press, Boca Raton. structure. Biopolymers, 19, 1617–1628. Wang,J.C. (1996) DNA topoisomerases. Annu. Rev. Biochem., 65, Kikuchi,A. and Asai,K. (1984) Reverse gyrase — a topoisomerase 635–692. which introduces positive superhelical turns into DNA. Nature, 309, Weiss,O. and Herzel,H. (1998) Correlations in protein sequences and 677–681. property codes. J. Theor. Biol., 190, 341–353. Lopez-Garcia,P. and Forterre,P. (1997) DNA topology in hyperther- Widom,J. (1996) Short–range order in two eukaryotic genomes: mophilic Archaea: Reference states and their variation with growth relation to chromosome structure. J. Mol. Biol., 259, 579–588. phase, growth temperature, and temperature stresses. Mol. Micro- Woese,D.R., Dugre,D.H., Saxinger,W.C. and Dugre,S.A. (1966) The biol., 23, 1267–1279. molecular basis of the genetic code. Proc. Natl Acad. Sci. USA, 55, Mengeritsky,G. and Trifonov,E.N. (1983) Nucleotide sequence-di- 966–974. rected mapping of the nucleosomes. Nucleic Acids Res., 11, Woese,C.R., Kandler,O. and Wheelis,M.L. (1990) Towards a natural 3833–3851. system of organisms: proposal for the domains Archaea, Bacteria, Musgrave,D.R., Sandman,K.M. and Reeve,J.N. (1991) DNA binding and Eucarya. Proc. Natl Acad. Sci. USA, 87, 4576–4579. by the archaeal histone HMf results in positive supercoiling. Proc. Zhurkin,V.B. (1981) Periodicity in DNA primary structure is defined Natl Acad. Sci. USA, 88, 10397–10401. by secondary structure of the coded protein. Nucleic Acids Res., 9, Nadal,M., Mirambeau,G., Forterre,P., Reiter,W.-D. and Duguet,M. 1963–1971. (1986) Positively supercoiled DNA in a virus-like particle of an Zhurkin,V.B. (1985) Sequence-dependent bending of DNA and archaebacterium. Nature, 321, 256–258. phasing of nucleosomes. J. Biomol. Struct. Dyn., 4, 785–804.

Journal

BioinformaticsOxford University Press

Published: Mar 1, 1999

There are no references for this article.