Molecular Evolution of Minisatellites in Hemiascomycetous Yeasts

Abstract

Minisatellites are DNA tandem repeats that exhibit size polymorphism among individuals of a population. In both human and yeast cells, this polymorphism is generated by two different mechanisms: “replication slippage” during S-phase DNA synthesis and “repair slippage” associated with meiotic gene conversion. The Saccharomyces cerevisiae genome contains numerous natural minisatellites. They are located on all chromosomes without any obvious distribution bias. Minisatellites found in protein-coding genes have longer repeat units and, on average, more repeat units than minisatellites in noncoding regions. They show an excess of cytosines over guanines on the coding strand (negative GC skew). Their repeat units are always multiples of three nucleotides and encode serine- and threonine-rich amino acid repeats, and they are preferentially found within genes encoding cell wall proteins, suggesting that they are positively selected in this particular class of genes. Genome-wide, there is no statistically significant association between minisatellites and meiotic recombination hot spots. In addition, minisatellites located in the vicinity of a meiotic hot spot are not more polymorphic than minisatellites located far from any hot spot. This suggests that, in S. cerevisiae, minisatellites probably evolve by strand slippage during replication or mitotic recombination. Finally, the evolution of minisatellites among hemiascomycetous yeasts shows that, even though many minisatellite-containing genes are conserved, the minisatellite itself usually is not. The diversity of minisatellite sequences found in orthologous genes of different species suggests that minisatellites are differentially acquired and lost during the evolution of hemiascomycetous yeasts, at a faster pace than the genes that contain them.
Keywords: minisatellite, replication slippage, meiotic hot spot, GC skew, yeast

Introduction

Repetitive elements are a common feature of all prokaryotic and eukaryotic genomes. They can be classified into two categories: dispersed repeat elements (transposons, tRNAs, paralogous protein-encoding genes, etc.) and tandem repeat elements. Micro- and minisatellites are tandem repeat arrays whose unit sizes range from a few nucleotides for the former to more than 10 bp for the latter (Charlesworth, Sniegowski, and Stephan 1994). Their size polymorphism in populations has been widely used for the physical mapping of genomes (Dib et al. 1996; Röder et al. 1998; Sakamoto et al. 2000; Waldbieser et al. 2001), forensic medicine (Gill, Jeffreys, and Werrett 1985; Hagelberg, Gray, and Jeffreys 1991), and paternity testing (Helminen et al. 1988; Foster et al. 1998). The molecular mechanisms underlying micro- and minisatellite size changes have been studied in humans and in model organisms (reviewed in Debrauwère et al. 1997). One of the earliest models proposes that microsatellites gain and lose repeat units by replication slippage during S-phase DNA synthesis (reviewed in Ellegren 2004), but other models involving slippage during gene conversion associated with homologous recombination have also been proposed (reviewed in Richard and Pâques 2000). Minisatellites were initially proposed to undergo size changes in humans mainly during meiosis (Jeffreys et al. 1994). It was subsequently shown that the instability of the MS32 minisatellite is associated with a meiotic recombination hot spot that triggers size changes during meiosis (Jeffreys, Murray, and Neumann 1998).
Experiments in the yeast Saccharomyces cerevisiae have confirmed that human minisatellites inserted into the yeast genome near a meiotic hot spot frequently change size (Appelgren, Cederberg, and Rannug 1997), and that these size changes depend on Spo11p, the topoisomerase responsible for making meiotic double-strand breaks (DSBs) (Debrauwère et al. 1999). However, minisatellites are also unstable during somatic cell growth, undergoing rare size changes by replication slippage or unequal sister chromatid recombination (Jeffreys and Neumann 1997). In yeast, these somatic size changes depend on the Rad27 protein, which is involved in Okazaki fragment processing (Lopes et al. 2002; Maleki, Cederberg, and Rannug 2002). One of the most intriguing questions concerning tandem repeat sequences in general, and minisatellites in particular, is their origin. Are minisatellites initially created by S-phase replication slippage, or by homologous recombination when they happen to be located near a meiotic hot spot? To address this question, we performed an in silico search for minisatellites in the completely sequenced S. cerevisiae genome. To our surprise, we found a large number of such elements, most of which had never been described in the literature. Most of them are located within genes exhibiting a negative GC skew (more cytosines than guanines on the coding strand) and are themselves more skewed than the genes that contain them. Short flanking repeats are very often found upstream and downstream of minisatellites. No positive correlation was found between the location of minisatellites and the distribution of meiotic hot spots in the yeast genome. Altogether, these data suggest that, in S. cerevisiae, natural minisatellites are acquired and lost by a molecular mechanism that is independent of meiotic recombination and probably involves replication slippage between short flanking sequences, in genes exhibiting a strong bias for cytosines on the coding strand.

Materials and Methods

Analysis of the S. cerevisiae Genome

We ran the program MREPS (Kolpakov, Bana, and Kucherov 2003) with the following parameters: minimal repeat unit size (-minp) of 10 and minimal repeat length (-minsize) of 30. With these parameters, only minisatellites with repeat units of at least 10 nt and a total length of at least 30 nt (for example, three 10-nt units) were detected. Because the resolution parameter (which allows some degree of “fuzziness” within the repeat) was set to its minimal value, variant repeats could not be detected. Therefore, each repeat was examined individually, and minisatellites were manually extended 5′ and 3′ of the initial repeat detected by MREPS. To determine the degree of degeneracy beyond which a minisatellite becomes undetectable by the program, we calculated, for each minisatellite, the percentage of base substitution between all of its repeats. It ranges from 0% (all repeat units identical; three occurrences: the minisatellites in PAN1, BBC1, and YLR114c) to 88.9% (only 2 nt out of 18 conserved in all repeats in DAN4), with an average of 35% ± 6% (median: 33.3%). Generally, a minisatellite contains a few conserved repeats, the others being more divergent. Given that the program was able to detect very degenerate minisatellites, such as the one in DAN4, it is unlikely that many minisatellites in the S. cerevisiae genome were missed with this approach. Some minisatellites that actually correspond to imperfect microsatellites (Richard and Dujon 1996) were detected by the program but excluded from further analysis. Using this approach, MREPS detected 257 repeats fulfilling the required criteria.
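The percentage of base substitution can be read as the fraction of alignment columns that are not conserved in every repeat unit; the DAN4 value (2 of 18 columns conserved, 88.9%) follows that reading. The sketch below illustrates it under assumptions the text does not state: the repeat units are taken as already aligned to equal length without gaps, and the function name `percent_substitution` is hypothetical.

```python
def percent_substitution(repeats: list[str]) -> float:
    """Percentage of alignment columns that are not identical across all repeat units.

    Assumes the repeat units are already aligned and have the same length (no gaps).
    """
    if not repeats:
        raise ValueError("at least one repeat unit is required")
    length = len(repeats[0])
    if length == 0 or any(len(unit) != length for unit in repeats):
        raise ValueError("repeat units must be non-empty and of equal length")
    # A column is conserved when every repeat unit carries the same base there.
    conserved = sum(1 for column in zip(*repeats) if len(set(column)) == 1)
    return 100.0 * (length - conserved) / length


# Toy 18-nt units (not the real DAN4 sequence) sharing only 2 conserved columns.
units = ["AC" + "A" * 16, "AC" + "G" * 16]
print(round(percent_substitution(units), 1))  # 88.9
```

Identical units give 0%, matching the PAN1, BBC1, and YLR114c cases.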
After careful examination, some of the repeats found by the program turned out to overlap partially or to belong to the same minisatellite, leaving a final set of 84 minisatellites for the present analysis. GC skews were calculated as (G − C)/(G + C) using DNA Strider 1.4f6 (Marck 1988), with 100-bp windows. Both the GC content and the GC skew of minisatellite-containing genes were calculated on the gene DNA sequence with the minisatellite removed. Functional annotations are based on Gene Ontology annotations retrieved from the Saccharomyces Genome Database.

Search for Orthologues in Hemiascomycetous Yeasts

The Saccharomyces paradoxus orthologues of S. cerevisiae genes were retrieved from the Saccharomyces Genome Database (ftp://genome-ftp.stanford.edu/yeast/data_download/sequence/fungal_genomes/S_paradoxus/). For Candida glabrata, Kluyveromyces lactis, Debaryomyces hansenii, and Yarrowia lipolytica, we started from protein families built from sequence similarities during the Génolevures 2 project (Dujon et al. 2004). For families containing only one gene per sequenced species (1:1:1:1:1 relationship), this gene was considered the direct orthologue of the S. cerevisiae gene. For other gene families, when orthologues could not be chosen among paralogues on the basis of sequence similarity, synteny conservation was used, whenever possible, to identify the correct orthologue. Most of the time, synteny did not help to identify the correct orthologue, and these genes were therefore tagged as “family” (“fam.” in table 5). Finally, for S. cerevisiae genes without an orthologue identified by the former approach, we performed BlastP searches of the Génolevures database (http://cbi.labri.fr/Genolevures) using the S. cerevisiae protein as the query. The best match was in turn used as a query in a BlastP search against the S. cerevisiae genome. Reciprocal (bidirectional) best hits were accepted as true S. cerevisiae orthologues (22 orthologues were found using this approach).
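The bidirectional best-hit criterion can be sketched as follows. This is an illustration, not the authors' pipeline: it assumes the BlastP results were exported in BLAST+ tabular format (`-outfmt 6`, in which the 12th column is the bit score), which the text does not specify, and the helper names and the `sp1_gene*` identifiers are hypothetical.

```python
def best_hits(tabular_lines):
    """Map each query to its highest-scoring subject from BLAST tabular (-outfmt 6) lines."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Keep query -> subject pairs in which each sequence is the other's best hit."""
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Hypothetical hits; apart from the IDs and the bit score, column values are placeholders.
forward_lines = [
    "YAL063c\tsp1_geneA\t80\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150.0",
    "YAL063c\tsp1_geneB\t60\t100\t40\t0\t1\t100\t1\t100\t1e-20\t90.0",
    "YBR016w\tsp1_geneC\t50\t80\t40\t0\t1\t80\t1\t80\t1e-10\t60.0",
]
reverse_lines = [
    "sp1_geneA\tYAL063c\t80\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150.0",
    "sp1_geneC\tYDR134c\t55\t80\t36\t0\t1\t80\t1\t80\t1e-12\t70.0",
]
pairs = reciprocal_best_hits(best_hits(forward_lines), best_hits(reverse_lines))
print(pairs)  # {'YAL063c': 'sp1_geneA'}
```

YBR016w is rejected because its best hit points back to a different S. cerevisiae gene.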
Polymerase Chain Reaction Analysis of Minisatellite Polymorphism

Specific primers were designed to amplify polymerase chain reaction (PCR) fragments of 212 bp (SNF11), 254 bp (PRY2), 315 bp (BUD27), 209 bp (DSN1), 192 bp (SCW11), 292 bp (YKL105c), 321 bp (YOL155c), or 148 bp (NIS1). Primer sequences are available on request. The PCR program consisted of 30 cycles of 95°C for 15 s, 60°C for 1 min, and 72°C for 30 s, followed by a final extension step at 72°C for 10 min. A sample of each reaction was loaded on a 3% Metaphor agarose gel (TEBU, Le Perray en Yvelines Cedex, France) with a 100-bp ladder as a size marker (Eurogentec, Seraing, Belgium). The gel was run overnight at 1 V/cm in 1× TBE. The origins of the yeast strains used are given in Richard and Dujon (1996).

Results

Distribution of Minisatellites in the S. cerevisiae Genome

We performed a systematic search for minisatellites in the S. cerevisiae genome using the MREPS software (Kolpakov, Bana, and Kucherov 2003) (see Materials and Methods). Using as criteria a minimum repeat unit size of 10 bp and a minimum of three repeat units, we found 84 minisatellites in the S. cerevisiae genome: 55 in 49 different protein-coding genes (table 1), 11 in noncoding regions (table 2), and 18 in Y' subtelomeric elements (Louis et al. 1994). Their distribution does not show any obvious bias toward centromeric or telomeric regions (except for Y' minisatellites, which are always subtelomeric) (fig. 1). As expected, the repeat units of minisatellites in genes are always multiples of 3 nt, allowing changes in the number of repeat units without disrupting the reading frame. This is not the case for minisatellites in noncoding regions (table 2). Among the 49 minisatellite-containing genes, four are essential (table 1). One of them, DSN1, exhibits size polymorphism among different yeast strains (see below), showing that the size change does not disrupt gene function.
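As a rough illustration of the search criteria above (repeat unit of at least 10 nt, at least three units), the sketch below scans a sequence for exact tandem repeats. It is not the MREPS algorithm: MREPS finds maximal repetitions and can tolerate mismatches through its resolution parameter, whereas this naive scan only reports perfect repeats whose unit is primitive, so microsatellites such as (CAG)n, whose 12-nt "units" are not primitive, are skipped. The `preserves_frame` property reflects the observation that coding minisatellite units are multiples of 3 nt. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TandemRepeat:
    start: int   # 0-based start of the array
    end: int     # exclusive end of the array
    period: int  # repeat unit length (nt)

    @property
    def copies(self) -> float:
        return (self.end - self.start) / self.period

    @property
    def preserves_frame(self) -> bool:
        # Adding or removing one unit keeps the reading frame only for multiples of 3 nt.
        return self.period % 3 == 0


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a tandem repeat of a shorter word."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_repeats(seq: str, min_period: int = 10, max_period: int = 200,
                              min_copies: int = 3) -> list[TandemRepeat]:
    """Report maximal runs where seq[i] == seq[i + p] for a primitive unit of length p."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(min_period, min(max_period, n // 2) + 1):
        run_start, run_len = 0, 0
        for i in range(n - p + 1):
            if i < n - p and seq[i] == seq[i + p]:
                if run_len == 0:
                    run_start = i
                run_len += 1
                continue
            # The run ended: r consecutive matches span r + p nucleotides.
            if run_len + p >= min_copies * p:
                unit = seq[run_start:run_start + p]
                if is_primitive(unit):
                    hits.append(TandemRepeat(run_start, run_start + run_len + p, p))
            run_len = 0
    return hits


seq = "GGGG" + "ACGTTGCAAT" * 4 + "CCCC"
for hit in find_exact_tandem_repeats(seq):
    print(hit, hit.copies, hit.preserves_frame)
# TandemRepeat(start=4, end=44, period=10) 4.0 False
```

The scan is quadratic in the worst case, which is acceptable for single genes but not a substitute for a dedicated tool on whole genomes.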
On average, minisatellites in genes are longer, in terms of both unit size and unit number, than those found in noncoding regions (fig. 2). This suggests that amino acid repeats are positively selected or, alternatively, that long minisatellites are counterselected in noncoding regions because of transcription initiation or termination constraints.

FIG. 1. Distribution of minisatellites in the Saccharomyces cerevisiae genome. Each chromosome is represented as a black line. Minisatellites in genes are indicated in black, those in noncoding regions in red. Number of units and unit sizes are shown below the vertical line. Short horizontal lines near chromosome telomeres symbolize Y' minisatellites (with their respective unit numbers below).

FIG. 2. Minisatellite unit size as a function of unit number. Names of minisatellite-containing genes with unit numbers or unit sizes statistically larger than the average are indicated. Note that DAN4 contains two minisatellites, one with a unit size larger than the average and the other with a unit number larger than the average.

Table 1. Minisatellites in Genes

Chr.  Gene        Systematic name  Nbr.  Size  Gene GC%[a]  Gene GC skew[a]  MS GC%  MS GC skew
I     FLO9        YAL063c[b]       13    135   45           −0.26            47      −0.37[c]
I     FLO1        YAR050w[b]       10    135   45           −0.25            47      −0.39[c]
II    -           YBR016w          5     15    51           −0.16            47      +0.03
II    TIP1        YBR067c[b]       9     18    48           −0.21            57      −0.43[c]
II    CYC8        YBR112c          3     18    44           −0.16            46      −0.36[c]
IV    RPO21       YDL140c          6     21    41           +0.01            49      −0.77[c]
                                   13    21                                  50      −0.59[c]
IV    BSC1        YDL037c          16    15    40           −0.39            39      −1.00[c]
IV    SNF11       YDR073w          6     12    41           −0.16            36      −0.08
IV    -           YDR134c[b]       3     12    47           −0.30            44      −0.25
IV    NUM1        YDR150w          10    192   38           +0.01            43      −0.01
IV    HKR1        YDR420w          21    42    43           −0.22            53      −0.41[c]
IV    FIT1        YDR534c[b]       10    18    47           −0.15            52      −0.12
V     TIR1        YER011w[b]       7     36    47           −0.16            51      −0.38[c]
VI    BUD27       YFL023w          6     30    40           +0.21            46      +0.54
VII   NAB2        YGL122c          9     12    44           −0.04            51      −0.60[c]
VII   SCW11       YGL028c          8     12    41           −0.21            40      −0.86[c]
VII   CRH1        YGR189c[b]       5     24    43           −0.14            49      −0.42[c]
VIII  FLO5        YHR211w[b]       7     135   44           −0.19            46      −0.24[c]
                                   3     21                                  48      −0.07
IX    -           YIL169c          12    42    45           −0.26            45      −0.33[c]
IX    TIR3        YIL011w[b]       4     12    47           −0.23            50      −0.42[c]
IX    PAN1        YIR006c          9     18    43           −0.11            48      0.00
                                   3     21                                  50      −0.56[c]
IX    DSN1        YIR010w          6     12    42           −0.003           50      +0.13
IX    FLO11       YIR019c[b]       5     30    47           −0.41            46      −0.45[c]
                                   5     36                                  49      −0.46[c]
X     PIR2        YJL159w          6     78    49           −0.21            50      −0.18
X     BBC1        YJL020c          3     21    44           −0.03            62      −0.23[c]
X     DAN1        YJR150c[b]       5     12    46           −0.13            47      0.00
X     DAN4        YJR151c[b]       30    18    44           −0.33            44      −0.85[c]
                                   7     72                                  46      −0.32
XI    PIR1        YKL164c          8     57    46           −0.22            47      −0.30[c]
XI    PIR3        YKL163w          6     54    46           −0.08            50      −0.19[c]
XI    -           YKL105c          8     18    41           +0.003           38      +0.53
XI    -           YKL023w          4     12    41           +0.2             31      +0.60
XI    PRY2        YKR013w          6     18    48           −0.19            53      −0.72[c]
XI    FLO10       YKR102w          3     81    44           −0.17            51      −0.12
XII   -           YLR114c          3     27    39           +0.14            56      +0.60
XII   CTS1        YLR286c          3     15    41           −0.12            49      0.00
XII   CHS5        YLR330w          9     21    42           +0.09            48      +0.08
XII   CCW14       YLR390w-a[b]     3     33    48           −0.28            55      −0.44[c]
XIII  DDR48       YMR173w          6     24    39           −0.10            37      −0.28[c]
                                   4     24                                  43      −0.37[c]
XIII  -           YMR317w[b]       16    36    43           −0.23            47      −0.07
XIV   UBP10       YNL186w          7     12    39           +0.02            48      +0.40
XIV   NIS1        YNL078w          5     12    45           −0.11            38      −0.48[c]
XIV   AGA1        YNR044w[b]       17    21    42           −0.38            41      −0.66[c]
XV    -           YOL155c[b]       5     39    47           −0.26            53      −0.33[c]
XV    WSC3        YOL105c          17    12    42           −0.15            48      −0.96[c]
XV    TIR4        YOR009w[b]       12    36    47           −0.25            50      −0.20
XV    TIR2        YOR010c[b]       5     33    47           −0.17            51      −0.31[c]
XV    PET127      YOR017w          4     18    37           +0.04            47      0.00[c]
XV    FIT3        YOR383c[b]       3     15    51           −0.21            60      −0.33[c]
XVI   MF(ALPHA)1  YPL187w          3     63    44           +0.02            49      +0.08
Mean                               8     36    44.0         −0.13            47.6    −0.25
SE                                 0.7   5.3   0.5          0.02             0.8     0.05

NOTE. RPO21, essential gene; Chr., chromosome number; Nbr., number of repeats; Size, unit size (bp); SE, standard error; MS, minisatellite. Rows without a chromosome or gene name list a second minisatellite in the gene above; a dash in the Gene column indicates a gene without a standard name.
[a] Gene GC skews and GC% were calculated excluding the minisatellite.
[b] GPI-containing gene (see text).
[c] Minisatellite GC skew statistically lower than the gene GC skew.

Table 2. Minisatellites in Noncoding Regions

Chr.  Location          Nbr.  Size  MS GC%  MS GC skew[a]
II    YBR246w-YBR247c   5     12    27      +0.38
III   YCL074w-YCL073c   4     17    52      −0.14
IV    YDR534c-YDR535c   3     25    33      −0.52
IX    RPL34B intron     3     11    18      0.00
X     ARS1006           3     14    2       −1.00
XI    YKL072w-YKL071w   6     12    26      +0.89
XIII  YMR243c-YMR244w   5     12    17      −1.00
XV    YOL143c-YOL142w   4     10    45      −0.11
XV    YOL005c-YOL004w   3     13    62      0.00
XVI   YPL179w-YPL178w   3     12    8       −1.00
XVI   YPL155c-YPL156c   3     10    30      +0.56
Mean                    4     13    29      −0.18
SE                      0.3   1.3   5.5     0.20

NOTE. Chr., chromosome number; Nbr., number of repeats; Size, unit size (bp); SE, standard error; MS, minisatellite.
[a] The Watson strand was arbitrarily chosen to calculate minisatellite GC skews in noncoding regions.

Interestingly, minisatellite-containing genes tend to show, on average, a higher GC content (44 ± 0.5%) than other yeast genes (ca. 39%; Dujon 1996; Goffeau et al. 1996), and minisatellites in genes exhibit an even higher GC content (47.6 ± 0.8%; table 1). This is not true of minisatellites in noncoding regions, which are, on average, GC poor (table 2). More surprisingly, minisatellite-containing genes are biased toward cytosines relative to guanines, exhibiting, on average, a negative GC skew (−0.13 ± 0.02). This bias is even stronger in the minisatellites themselves (−0.25 ± 0.05).
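The GC content and GC skew values above rest on the calculation described in Materials and Methods, which can be sketched as below. This is a minimal illustration, not DNA Strider's implementation: the window step (non-overlapping 100-bp windows by default) is an assumption, since only the 100-bp window size is stated, and the function names and the toy sequence are hypothetical. Coordinates are 0-based and half-open.

```python
def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); returns 0.0 when the sequence contains no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def gc_percent(seq: str) -> float:
    """Percentage of G + C in the sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s) if s else 0.0


def windowed_gc_skew(seq: str, window: int = 100, step: int = 100) -> list[float]:
    """GC skew of successive windows; step == window gives non-overlapping windows."""
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def remove_region(seq: str, start: int, end: int) -> str:
    """Return the gene sequence with the minisatellite [start, end) cut out."""
    return seq[:start] + seq[end:]


# Toy gene: a C-rich "minisatellite" (positions 100-199) in a G-rich background.
gene = "G" * 100 + "C" * 100 + "G" * 50 + "A" * 50
print(windowed_gc_skew(gene))                  # [1.0, -1.0, 1.0]
print(gc_skew(remove_region(gene, 100, 200)))  # 1.0
```

Removing the minisatellite before computing the gene values, as in table 1, keeps a strongly skewed repeat from pulling the gene average toward its own composition.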
Out of 55 minisatellites in genes, 33 (60%) show a GC skew statistically lower than that of the gene containing them, indicative of a strong bias toward cytosines on the coding strand (table 1). The base composition of minisatellites in noncoding regions is, on average, less biased (−0.18 ± 0.20) and ranges from G-rich to G-poor sequences (table 2). A few examples of GC skews in minisatellite-containing genes are shown in figure 3.

FIG. 3. Three examples of GC skews in minisatellites. For each of the three genes (BSC1, DAN4, and WSC3), the GC skew is shown on the y axis (see Materials and Methods). Minisatellite locations are represented by gray shading. DAN4 contains two minisatellites, but only the first one shows a GC skew significantly lower than the gene GC skew (see table 1).

Human minisatellites are not perfect tandem repeats but a succession of variant repeats that differ from each other by one or more nucleotides. This polymorphism has been used to rapidly determine their exact sequence by minisatellite variant repeat mapping (Jeffreys, Neumann, and Wilson 1990). Yeast minisatellites share this property and contain variant repeats, as exemplified in figure 4.

FIG. 4. Two examples of minisatellites in the HKR1 and TIR1 genes. Minisatellite repeats have been aligned using ClustalW. Variable nucleotides in repeats are shaded (repeat number 1 was considered as the reference). Numbers to the left indicate the repeat type. HKR1 contains eight different types of variant repeats; TIR1 contains six. Underlined sequences are the flanking repeats (see table 6). The distance (in nucleotides) between the last minisatellite nucleotide and the downstream flanking repeat is shown in both cases. Note that the flanking repeats are also found at the 3′ end of each individual repeat, suggesting an ancestral origin of the flanking repeats (see Discussion).

Minisatellites located in the subtelomeric Y' elements are made of a 36-bp repeat unit, as previously described (Horowitz and Haber 1984; Haber and Louis 1998). In the sequenced strain, their unit number ranges from 7 (on chromosome IX) to 26 (on chromosome II). Some chromosomes contain two Y' elements, one on each arm, others contain only one, and chromosomes I, III, and XI contain none. Y' elements are always in the same orientation relative to the centromere.

Strikingly, half of the 49 minisatellite-containing genes (25 out of 49) are involved in cell wall organization. Many of them encode proteins that are covalently associated with cell wall polysaccharides (FLO9, FLO1, TIP1, TIR1, FLO5, FLO11, PIR2, DAN1, PIR1, and PIR3). A few others are involved in processes such as cell division, budding, transcription, or RNA processing (table 3).
Some of these proteins were known to contain internal amino acid repeats (Klis et al. 2002). However, at the DNA level, they do not necessarily contain a recognizable minisatellite. For example, the PIR family (for proteins with internal repeats) contains four members (PIR1–4), but only PIR1, PIR2, and PIR3 contain a minisatellite; the fourth member, PIR4, contains a degenerate repeat that does not fulfill our criteria (see Materials and Methods). Among the 25 cell wall genes, 19 are known or predicted to encode a glycosylphosphatidylinositol (GPI) anchor attachment site, involved in anchoring the protein to the plasma membrane (Caro et al. 1997; Hagen et al. 2004). The GPI site is always located very close to the C-terminus of the protein (except in the case of FLO9). The minisatellite location is apparently less constrained and falls within the first two-thirds of the protein. Excluding FLO9, the average distance of the GPI site from the 3′ end of the gene is 69 ± 1 bp, whereas the average distance of the minisatellite from the 3′ end of the gene is 1417 ± 291 bp.

Table 3. Functions Encoded by Minisatellite-Containing Genes

Function                        Nbr.  Genes
Cell wall organization          25    FLO9, FLO1, TIP1, HKR1, FIT1, TIR1, SCW11, CRH1, FLO5, TIR3, FLO11, PIR2, DAN1, DAN4, PIR1, PIR3, FLO10, CTS1, CHS5, CCW14, AGA1, WSC3, TIR4, TIR2, FIT3
Cell division and budding       6     NUM1, BUD27, PAN1, DSN1, BBC1, NIS1
Transcription, RNA processing   5     CYC8, RPO21, SNF11, NAB2, PET127
Other                           3     DDR48, UBP10, MF(ALPHA)1
Unknown                         10    YBR016w, BSC1, YDR134c, YIL169c, YKL105c, YKL023w, PRY2, YLR114c, YMR317w, YOL155c
Total                           49

NOTE. Nbr., number of genes.

Among the amino acids encoded by minisatellites, serine and threonine are the most abundant, together representing 42% of the total (table 4). Minisatellite-containing genes encoding cell wall proteins contain more Ser and Thr residues than the others: on average, cell wall protein repeats contain 59% Ser + Thr residues, whereas the repeats of other classes of minisatellite-containing proteins contain 13% to 26% Ser + Thr. The next most frequent amino acids encoded by minisatellites are alanine (9%), glutamic acid (7%), and valine (7%). Each of the other amino acids accounts for only 1% to 5% of the total. This is very different from what was observed for genes containing trinucleotide repeats (a particular class of microsatellites), in which glutamine, asparagine, glutamic acid, and aspartic acid are the four most common amino acids encoded by the repeats, and which mostly encode transcription factors (Richard and Dujon 1996; Alba, Santibañez-Koref, and Hancock 1999; Young, Sloan, and Van Riper 2000; Malpertuy, Dujon, and Richard 2003). In S. cerevisiae, serine- and threonine-rich repeats are thought to be sites of O-mannosylation by the Pmt4 protein; these glycosylations take place in the endoplasmic reticulum and are important for maintaining the protein at the cell wall surface (Ecker et al. 2003; Latgé and Calderone 2005).
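The Ser and Thr percentages discussed above, and listed in table 4, are per-residue proportions of a minisatellite's first repeat unit. A minimal sketch, assuming one-letter amino acid codes; the function name `ser_thr_percent` is hypothetical, and the two motifs in the example are the BSC1 and HKR1 units from table 4.

```python
def ser_thr_percent(motif: str) -> tuple[float, float, float]:
    """Return (Ser %, Thr %, Ser + Thr %) for a one-letter amino acid motif."""
    m = motif.upper()
    if not m:
        raise ValueError("empty motif")
    ser = 100.0 * m.count("S") / len(m)
    thr = 100.0 * m.count("T") / len(m)
    return ser, thr, ser + thr


for gene, motif in [("BSC1", "STTSS"), ("HKR1", "APAAISSTYTSSPS")]:
    ser, thr, total = ser_thr_percent(motif)
    print(gene, round(ser), round(thr), round(total))
# BSC1 60 40 100
# HKR1 36 14 50
```

Python's `round()` rounds halves to even, so a value such as the 62.5% Ser of the CRH1 unit prints as 62 rather than the 63 shown in table 4.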
Table 4 Amino Acids Encoded by Minisatellites in Saccharomyces cerevisiae

| Gene | Systematic Name | Motif Size (aa) | Amino Acid Motif Sequence^a | Ser %^b | Thr %^b | (Ser + Thr) % |
|---|---|---|---|---|---|---|
| FLO9^c | YAL063c | 45 | DTFTSSTELTTVTGTNGLPTDETIIVIRTPTTATTAMTTTQPWNT | 4 | 40 | 44 |
| FLO1^c | YAR050w | 45 | TFTSTSTELTTVTGTNGLPTDETIIVIRTPTTATTAMTTTQPWNS | 7 | 40 | 47 |
|  | YBR016w | 5 | YNQQG | — | — | 0 |
| TIP1^c | YBR067c | 6 | EAASSS | 50 | — | 50 |
| CYC8 | YBR112c | 6 | QAQAQA | — | — | 0 |
| RPO21 | YDL140c | 7 | PSYSPTS | 43 | 14 | 57 |
|  |  | 7 | PTSPSYS | 43 | 14 | 57 |
| BSC1 | YDL037c | 5 | STTSS | 60 | 40 | 100 |
| SNF11 | YDR073w | 4 | TANA | — | 25 | 25 |
|  | YDR134c | 4 | TEKP | — | 25 | 25 |
| NUM1 | YDR150w | 64 | AYSELEKKLEQPSLEYLVEHAKATNHHLLSDSAYEDLVKCKENPDMEFLKEKSAKLGHTVVSNE | 9 | 3 | 12 |
| HKR1^c | YDR420w | 14 | APAAISSTYTSSPS | 36 | 14 | 50 |
| FIT1^c | YDR534c | 6 | ASSAVE | 33 | — | 33 |
| TIR1^c | YER011w | 12 | SSSSEAKSSSAA | 58 | — | 58 |
| BUD27 | YFL023w | 10 | VVGDIIEKEP | — | — | 0 |
| NAB2 | YGL122c | 4 | PQQQ | — | — | 0 |
| SCW11^c | YGL028c | 4 | TSSS | 75 | 25 | 100 |
| CRH1^c | YGR189c | 8 | SSTVSSSA | 63 | 13 | 76 |
| FLO5^c | YHR211w | 45 | TFTSTSTEMTTITDTNGQLTDETVIVIRTPTTASTITTTTEPWTG | 7 | 42 | 49 |
|  |  | 7 | QTKGTTE | — | 43 | 43 |
|  | YIL169c | 14 | VVSSSVSQSSSSAS | 64 | — | 64 |
| TIR3^c | YIL011w | 4 | SAAS | 50 | — | 50 |
| PAN1 | YIR006c | 6 | PTQPVQ | — | 17 | 17 |
|  |  | 7 | PQTTGMM | — | 29 | 29 |
| DSN1 | YIR010w | 4 | ATAN | — | 25 | 25 |
| FLO11^c | YIR019c | 10 | SSTTTSSTSE | 50 | 40 | 90 |
|  |  | 12 | PVTSSTTESSSA | 42 | 42 | 84 |
| PIR2^c | YJL159w | 26 | GDGQVQAATTTASVSTKSTAAAVSQI | 15 | 19 | 34 |
| BBC1 | YJL020c | 7 | VPVPAAT | — | 14 | 14 |
| DAN1^c | YJR150c | 4 | VASS | 50 | — | 50 |
| DAN4^c | YJR151c | 6 | TTPTTS | 17 | 67 | 84 |
|  |  | 24 | SAEPTTVSEVTSSVEPTRSSQVTS | 29 | 21 | 50 |
| PIR1^c | YKL164c | 19 | QIGDGQIQATTKTTAAAVS | 5 | 21 | 26 |
| PIR3^c | YKL163w | 18 | VSQITDGQVQAAKSTAAA | 11 | 11 | 22 |
|  | YKL105c | 6 | ENVDDD | — | — | 0 |
|  | YKL023w | 4 | KQEK | — | — | 0 |
| PRY2 | YKR013w | 6 | SPTTTT | 17 | 67 | 84 |
| FLO10^c | YKR102w | 27 | SSWSSSEVCTECTETESTSYVTPYVTS | 30 | 22 | 52 |
|  | YLR114c | 9 | GEGDENGDD | — | — | 0 |
| CTS1^c | YLR286c | 5 | STSSG | 60 | 20 | 80 |
| CHS5^c | YLR330w | 7 | EDSNEPV | 14 | — | 14 |
| CCW14^c | YLR390w-a | 11 | ASSSTKASSSS | 64 | 9 | 73 |
| DDR48 | YMR173w | 8 | SNNNDSYG | 25 | — | 25 |
|  |  | 8 | SNNNDSYG | 25 | — | 25 |
|  | YMR317w | 12 | SSPVSSEAPSAT | 42 | 8 | 50 |
| UBP10 | YNL186w | 4 | DIGE | — | — | 0 |
| NIS1 | YNL078w | 4 | SNTN | 25 | 25 | 50 |
| AGA1^c | YNR044w | 7 | SLSSTST | 57 | 29 | 86 |
|  | YOL155c | 13 | GSSVSGSTSATES | 46 | 15 | 61 |
| WSC3^c | YOL105c | 4 | TTSS | 50 | 50 | 100 |
| TIR4^c | YOR009w | 12 | SSSVAPSSSEVV | 50 | — | 50 |
| TIR2^c | YOR010c | 11 | SSSETTSSAVA | 45 | 18 | 63 |
| PET127 | YOR017w | 6 | YPGRRT | — | 17 | 17 |
| FIT3^c | YOR383c | 5 | SAAET | 20 | 20 | 40 |
| MF(α)1 | YPL187w | 21 | KREAEAEAWHWLQLKPGQPMY | — | — | 0 |

a The first repeat unit of the minisatellite is shown.
b Serine and threonine percentages are calculated from the first repeat unit. Because repeat units differ slightly from one another, these values may vary slightly when the whole minisatellite is considered.
c Cell wall gene.

Note that some repeat sequences are similar (FLO1, FLO5, and FLO9, or PIR1 and PIR2) and may have arisen by gene conversion.

Meiotic Hot Spots and Minisatellites

It was previously shown in humans and yeast that minisatellites located near a meiotic hot spot expand and contract at high frequency during meiosis (Appelgren, Cederberg, and Rannug 1997; Jeffreys, Murray, and Neumann 1998; Debrauwère et al. 1999). We asked whether minisatellites were close to meiotic hot spots, as defined by whole-genome analyses of meiotic double-strand break (DSB) sites (Gerton et al. 2000; Borde et al. 2004). In S. cerevisiae, meiotic gene conversion tracts are rather limited in size (1–2 kb) (reviewed in Pâques and Haber 1999).
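The proximity analysis in this section comes down to two steps. For each minisatellite, measure the distance to the nearest meiotic hot spot on the same chromosome. Then compare the number of minisatellites within a cutoff (2 or 5 kb) with the number expected at random. The section does not say how the random expectation was computed, so the sketch below shows only one possible approach, under stated assumptions:

- Each minisatellite and each hot spot is reduced to a single coordinate, for example its midpoint.
- The null model moves each minisatellite to a uniformly random position on its own chromosome.

The function names and the toy coordinates are hypothetical. They are not the authors' code or data.

```python
import random
from bisect import bisect_left


def nearest_distance(position, sorted_spots):
    """Distance (bp) from `position` to the closest hot spot in `sorted_spots`, or None if empty."""
    i = bisect_left(sorted_spots, position)
    candidates = []
    if i < len(sorted_spots):
        candidates.append(sorted_spots[i] - position)  # closest hot spot at or to the right
    if i > 0:
        candidates.append(position - sorted_spots[i - 1])  # closest hot spot to the left
    return min(candidates) if candidates else None


def count_within(minisatellites, hotspots_by_chrom, max_dist):
    """Count (chromosome, position) minisatellites lying within `max_dist` bp of a hot spot."""
    sorted_spots = {chrom: sorted(spots) for chrom, spots in hotspots_by_chrom.items()}
    count = 0
    for chrom, pos in minisatellites:
        d = nearest_distance(pos, sorted_spots.get(chrom, []))
        if d is not None and d <= max_dist:
            count += 1
    return count


def expected_within(minisatellites, hotspots_by_chrom, chrom_lengths, max_dist,
                    trials=1000, seed=0):
    """Mean count under an assumed null model that places each minisatellite at a
    uniformly random position on its own chromosome (not necessarily the authors' model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        shuffled = [(chrom, rng.randrange(chrom_lengths[chrom])) for chrom, _ in minisatellites]
        total += count_within(shuffled, hotspots_by_chrom, max_dist)
    return total / trials


if __name__ == "__main__":
    # Hypothetical toy coordinates, for illustration only.
    hotspots = {"I": [1_000, 10_000], "II": [5_000]}
    minis = [("I", 2_500), ("I", 9_000), ("II", 50_000)]
    lengths = {"I": 230_000, "II": 810_000}
    print(count_within(minis, hotspots, 2_000))  # 2
    print(expected_within(minis, hotspots, lengths, 2_000))
```

A permutation-style null model like this keeps the number of minisatellites per chromosome fixed. An analytical expectation based on the fraction of each chromosome covered by hot-spot windows would be an equally reasonable choice.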
We found two minisatellites within 2 kb of a meiotic hot spot (SNF11 and PRY2; fig. 1) and four others within 5 kb of a hot spot (FLO9, RPL34B, DSN1, BBC1; fig. 1). Given the numbers of hot spots and minisatellites in the yeast genome, a random distribution would place two minisatellites within 2 kb of a hot spot and five within 5 kb, which does not differ from what we observed. We therefore rejected the hypothesis that minisatellites are associated with meiotic hot spots more often than expected by chance.

Minisatellite Size Polymorphism Among Different Yeast Strains

We previously demonstrated microsatellite size polymorphism among laboratory and industrial yeast strains and among strains isolated from infected patients (Richard and Dujon 1996; Hennequin et al. 2001). This size polymorphism was used to classify the strains studied and could serve as a typing method to trace their origin. To determine to what extent natural yeast minisatellites are also polymorphic, we selected eight independent haploid laboratory yeast strains, each with a unique microsatellite haplotype (Richard and Dujon 1996), and studied eight minisatellite loci. Four were chosen within 2 or 5 kb of a meiotic hot spot (SNF11, BUD27, PRY2, and DSN1; fig. 1). The other four were chosen to match the first four as closely as possible in unit size and unit number, and to lie far from any hot spot. Unique primers were designed to PCR-amplify the eight minisatellites in each strain, with strain FYBL1-8B (a derivative of the sequenced strain S288C) as the reference. Six of the eight minisatellites exhibited size polymorphism; only the SCW11 and PRY2 minisatellites did not (fig. 5). Each strain could be assigned a unique haplotype, because no two strains shared the same haplotype. Interestingly, strain HC9-7 exhibited three different bands at the NIS1 locus, on chromosome XIV.
In the other strains, amplification of this minisatellite was specific and yielded a single band. In a previous study, strain HC9-7 also showed two different alleles of a microsatellite located on chromosome XI (Richard and Dujon 1996). This strain must therefore carry some aneuploidy (or segmental duplications).

FIG. 5.— Minisatellite size polymorphism in different yeast strains. PCR products of each locus are run in parallel on the same gel to estimate size variations. Strain FYBL1-8B is used as the size control. Size variations were estimated using the 100-bp ladder. (A) An example of a stable locus in the strains studied (SCW11). (B) Two examples of unstable loci in the strains studied (SNF11 and BUD27). (C) Summary of PCR amplification of the eight loci studied. The number of different alleles is shown to the right. Only two minisatellites (marked by an asterisk) exhibit no size polymorphism (SCW11 and PRY2).

We found no difference in the degree of polymorphism between minisatellites located near meiotic hot spots and those located far from them. In both groups, three minisatellites out of four showed some level of polymorphism (fig. 5). The number of different alleles for a given minisatellite is not correlated with the presence of a hot spot either.
We therefore concluded that minisatellite stability in these laboratory strains does not depend on the presence of a nearby meiotic hot spot.

Conservation of Minisatellites in Hemiascomycetous Yeasts

To estimate minisatellite conservation during evolution, we investigated other hemiascomycetous yeast genomes (fig. 6). Saccharomyces paradoxus belongs to the Saccharomyces sensu stricto group and is very closely related to S. cerevisiae. Candida glabrata is a pathogenic yeast and a causative agent of human candidiasis (Bennett, Izumikawa, and Marr 2004). Kluyveromyces lactis is also related to S. cerevisiae and has been used for genetic studies and industrial applications (Bolotin-Fukuhara et al. 2000). Debaryomyces hansenii is a halotolerant yeast, phylogenetically close to the pathogen Candida albicans (Lépingle et al. 2000). Yarrowia lipolytica is a more distantly related yeast that can grow as individual cells or as a mycelium (Casarégola et al. 2000). The evolutionary distance between S. cerevisiae and Y. lipolytica, measured as the amino acid divergence between orthologous proteins, is larger than the distance spanned by the entire phylum of chordates (Dujon et al. 2004).

FIG. 6.— (A) Phylogenetic tree of the hemiascomycetous yeast species used in this study, based on Dujon et al. (2004), showing the evolution of the PRY2 minisatellite in hemiascomycetes. Self-dot matrices of the 250 bp surrounding the minisatellite are shown (stringency: 7, window: 9). In Candida glabrata, no orthologue could be unambiguously assigned because PRY2 is part of a two-member gene family in this species, but neither of the two homologues contains a minisatellite. (B) Alignment of the region containing the minisatellite. Upstream and downstream protein sequences are perfectly aligned but are not shown here. Boxed sequences represent the three minisatellites, and the dotted box represents the minisatellite relic in Debaryomyces hansenii.
There is no detectable tandem repeat sequence in this region in Kluyveromyces lactis.

In the closest species, S. paradoxus, we found one orthologue for each of the 49 S. cerevisiae minisatellite-containing genes. In 73% of the cases (36 of 55 minisatellites), a minisatellite was also found in the S. paradoxus gene (table 5). In all but 8 of these 36 cases, the motif unit has the same size as in S. cerevisiae. In one case (MF(α)1), the minisatellite is no longer detectable at the DNA level, although the protein repeat is still present. We called it a "minisatellite relic", by analogy with the term "gene relic" used to describe highly degenerate genes found in the genomes of hemiascomycetous yeasts (Lafontaine et al. 2004; Lafontaine and Dujon, in preparation). Altogether, these observations show that although genes are very well conserved between S. cerevisiae and S. paradoxus, minisatellites are much less conserved. This suggests that these tandem repeat sequences evolve rapidly, as was observed for microsatellites in a previous study (Malpertuy, Dujon, and Richard 2003).
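The self-dot matrices in fig. 6 use a stringency of 7 and a window of 9. This section does not name the program used or define these parameters precisely. The sketch below therefore assumes the common reading: a dot is placed at (i, j) when the 9-nt windows starting at positions i and j share at least 7 identical nucleotides. In such a plot, a tandem repeat with unit length u shows up as extra diagonals offset from the main diagonal by multiples of u. This is how a minisatellite, or a degenerate relic of one, stands out from the flanking sequence. The function names and the test sequence are hypothetical.

```python
def self_dot_matrix(seq, window=9, stringency=7):
    """Set of (i, j) such that seq[i:i+window] and seq[j:j+window] share at least
    `stringency` identical positions (assumed meaning of the fig. 6 parameters)."""
    n = len(seq) - window + 1
    dots = set()
    for i in range(n):
        wi = seq[i:i + window]
        for j in range(n):
            matches = sum(a == b for a, b in zip(wi, seq[j:j + window]))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq, window=9, stringency=7):
    """Text rendering of the self-dot matrix: '*' for a dot, '.' otherwise."""
    dots = self_dot_matrix(seq, window, stringency)
    n = len(seq) - window + 1
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(n)) for i in range(n)
    )


def repeat_offsets(seq, window=9, stringency=7, min_dots=3):
    """Offsets d > 0 with at least `min_dots` dots on the diagonal j = i + d.
    For a tandem repeat, these include the unit length and its multiples."""
    counts = {}
    for i, j in self_dot_matrix(seq, window, stringency):
        if j > i:
            counts[j - i] = counts.get(j - i, 0) + 1
    return sorted(d for d, c in counts.items() if c >= min_dots)


if __name__ == "__main__":
    # Hypothetical 12-nt unit repeated four times between arbitrary flanks.
    seq = "GATTACAGG" + "TCTTCTACTACC" * 4 + "CCGGAATTC"
    print(render(seq))
    print(repeat_offsets(seq))
```

For a 250-bp region this brute-force version compares about 242 × 242 window pairs, which is fast enough for a single locus. Spurious dots in unrelated sequence occur at a low rate under the 7-of-9 threshold, which is why `repeat_offsets` only reports diagonals that carry several dots.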
Among minisatellites that are conserved between S. cerevisiae and S. paradoxus, a clear bias for serine and threonine was also observed in S. paradoxus.

Table 5 Conservation of Minisatellites in Hemiascomycetous Yeast Species

| Gene | Systematic Name | Saccharomyces cerevisiae Minisatellite | Saccharomyces paradoxus | Candida glabrata | Kluyveromyces lactis | Debaryomyces hansenii | Yarrowia lipolytica |
|---|---|---|---|---|---|---|---|
| FLO9 | YAL063c | 13 × 135 | 10 × 135 | Fam. (22) | Fam. (21) | Fam. (9) | Fam. (18) |
| FLO1 | YAR050w | 10 × 135 | 3 × 135 | Fam. (22) | Fam. (20) | Fam. (10) | Fam. (17) |
| — | YBR016w | 5 × 15 | No | No | — | — | 3 × 15 |
| TIP1 | YBR067c | 9 × 18 | 8 × 18 | Fam. (6) | — | — | — |
| CYC8 | YBR112c | 3 × 18 | 10 × 18 | No | 4 × 9 | No | No |
| RPO21 | YDL140c | 6 × 21 | No | 18 × 21 | 14 × 21 | 16 × 21 | 8 × 21 |
|  |  | 13 × 21 | No | (1) | (1) | (1) | (1) |
| BSC1 | YDL037c | 16 × 15 | 6 × 36 | — | Fam. (4) | — | — |
| SNF11 | YDR073w | 6 × 12 | 3 × 12 | — | No | — | — |
| Pseudo | YDR134c | 3 × 12 | 4 × 12 | No | No | — | — |
| NUM1 | YDR150w | 10 × 192 | 2 × 213 | No | No | 6 × 96 | — |
| HKR1 | YDR420w | 21 × 42 | No | Fam. (11) | Relic | — | — |
| FIT1 | YDR534c | 10 × 18 | 11 × 18 | — | Relic | — | — |
| TIR1 | YER011w | 7 × 36 | 6 × 36 | Relic | — | — | — |
| BUD27 | YFL023w | 6 × 30 | 3 × 30 | 8 × 27 | No | 5 × 30 | — |
| NAB2 | YGL122c | 9 × 12 | No | No | No | 4 × 12 | No |
| SCW11 | YGL028c | 8 × 12 | 5 × 12 | No | No | No | No |
| CRH1 | YGR189c | 5 × 24 | 4 × 24 | 5 × 24 | No | Fam. (6) | 6 × 15 |
| FLO5 | YHR211w | 7 × 135 | 11 × 135 | Fam. (25) | Fam. (21) | Fam. (8) | Fam. (20) |
|  |  | 3 × 21 | No |  |  |  |  |
| — | YIL169c | 12 × 42 | 5 × 42 | Fam. (14) | — | — | — |
| TIR3 | YIL011w | 4 × 12 | 4 × 33 | Fam. (10) | — | — | — |
| PAN1 | YIR006c | 9 × 18 | 15 × 9 | No | No | No | 4 × 30 |
|  |  | 3 × 21 | No | No | No | No | No |
| DSN1 | YIR010w | 6 × 12 | No | No | No | — | — |
| FLO11 | YIR019c | 5 × 30 | 8 × 81 | Fam. (42) | Fam. (20) | Fam. (17) | Fam. (65) |
|  |  | 5 × 36 | 4 × 36 |  |  |  |  |
| PIR2 | YJL159w | 6 × 78 | 9 × 57 | Fam. (5) | Fam. (3) | Fam. (2)^a | Fam. (2) |
| BBC1 | YJL020c | 3 × 21 | 21 × 9 | Relic | No | Relic | — |
| DAN1 | YJR150c | 5 × 12 | No | Fam. (3) | — | — | — |
| DAN4 | YJR151c | 29 × 18 | 25 × 18 | Fam. (27) | Fam. (20) | Fam. (13) | Fam. (39) |
|  |  | 7 × 72 | No |  |  |  |  |
| PIR1 | YKL164c | 8 × 57 | 10 × 57 | Fam. (5) | 3 × 57 (2) | 5 × 66 (2) | Fam. (2) |
| PIR3 | YKL163w | 6 × 54 | 9 × 54 | Fam. (5) | Relic (2) | (2) | Fam. (2) |
| — | YKL105c | 8 × 18 | No | No | — | — | — |
| — | YKL023w | 4 × 12 | No | — | — | — | — |
| PRY2 | YKR013w | 6 × 18 | 9 × 15 | Fam. (2)^b | No | Relic | 6 × 15 |
| FLO10 | YKR102w | 3 × 81 | No | Fam. (19) | Fam. (16) | Fam. (10) | Fam. (17) |
| — | YLR114c | 3 × 27 | No | No | No | No | No |
| CTS1 | YLR286c | 3 × 15 | No | No | No | Fam. (2) | Fam. (2)^b |
| CHS5 | YLR330w | 9 × 21 | 7 × 21 | 4 × 21 | 5 × 24 | No | 5 × 15 |
| CCW14 | YLR390w-a | 3 × 33 | 5 × 33 | Relic | — | — | 9 × 15 |
| DDR48 | YMR173w | 6 × 24 | 4 × 24 | — | — | — | — |
|  |  | 4 × 24 | No | — | — | — | — |
| — | YMR317w | 16 × 36 | 6 × 36 | Fam. (22) | Fam. (13) | Fam. (12) | Fam. (34) |
| UBP10 | YNL186w | 7 × 12 | 4 × 12 | 8 × 12 | No | No | No |
| NIS1 | YNL078w | 5 × 12 | No | No | No | — | — |
| AGA1 | YNR044w | 17 × 21 | 20 × 21 | — | — | — | — |
| — | YOL155c | 5 × 39 | 4 × 39 | — | — | — | — |
| WSC3 | YOL105c | 17 × 12 | 8 × 12 | — | 7 × 12 | — | — |
| TIR4 | YOR009w | 12 × 36 | 4 × 36 | Fam. (15) | — | — | — |
| TIR2 | YOR010c | 5 × 33 | No | Fam. (6) | — | — | — |
| PET127 | YOR017w | 4 × 18 | 4 × 18 | No | No | No | No |
| FIT3 | YOR383c | 3 × 15 | 5 × 15 | — | Fam. (3)^c | — | — |
| MF(α)1 | YPL187w | 3 × 63 | Relic | 3 × 75 | 3 × 81 | — | 3 × 102 |
| Conserved genes^d |  |  | 49 | 23 | 27 | 15 | 16 |
| Minisatellites |  |  | 36/55 (73%) | 6/24 (25%) | 7/28 (25%) | 5/16 (31%) | 8/17 (47%) |
| Conserved minisatellites |  |  | 23 | 2 | 2 | 2 | 1 |
| Minisatellite relics |  |  | 1 | 3 | 3 | 3 | 0 |

NOTE.—Fam. (n): gene is part of a family in this species, n is the number of members in the family; "—," no orthologue could be unambiguously assigned; No: at least one orthologue could be assigned but no minisatellite could be detected; Relic: a very degenerate minisatellite could be detected; (1): one minisatellite replaces the two minisatellites found in S. cerevisiae; (2): PIR1 or PIR3; the underlined minisatellites have the same motif sequence as in S. cerevisiae (see text); and the minisatellites that are not underlined have a different motif sequence as compared to S. cerevisiae (see text).
a All members of the family contain a minisatellite relic.
b None of the members of the family contains a minisatellite.
c All members of the family contain a minisatellite.
d Excluding gene families.

We subsequently looked for minisatellite conservation in the four other species. Finding orthologues of S. cerevisiae genes was more difficult, even in the second closest species, C. glabrata, because many of the minisatellite-containing genes belong to gene families containing 2 to 65 paralogous members (table 5). In most cases, we could not use synteny data to choose among several homologues, because synteny breakpoints are frequent in regions containing dispersed repeated elements, such as retrotransposons or gene families (Fischer et al. 2000, 2001). As expected, minisatellite-containing genes were easier to identify in C. glabrata and K. lactis than in the more distant D. hansenii and Y. lipolytica. However, we found a minisatellite more often in Y. lipolytica (8 of the 17 minisatellites carried by conserved genes; chi-square test: P = 0.05) than in C. glabrata, K. lactis, or D. hansenii (table 5). In addition, three minisatellite relics are found in each of C. glabrata, K. lactis, and D. hansenii, and none in Y. lipolytica. When minisatellite sequences were compared, they usually differed between S. cerevisiae and the other hemiascomycetous yeast species. Sequence alignments showed three patterns of loss:

- In 25% of the cases, accumulation of point mutations "erased" the minisatellite.
- In 25%, the minisatellite was completely deleted, although the protein sequence upstream and downstream of it is conserved.
- In the remaining 50%, a mix of point mutations and small deletions led to the loss of the minisatellite.

These kinds of mutational events are reminiscent of those observed for microsatellites in different yeast species (Malpertuy, Dujon, and Richard 2003). In S. paradoxus, 23 minisatellite sequences are conserved, whereas only 2 are conserved in each of C. glabrata, K. lactis, and D. hansenii, and only 1 in Y. lipolytica. Note that the only minisatellite whose repeat motif is conserved in all species in which it is found is the RPO21 minisatellite.
This minisatellite is split into two minisatellites in baker's yeast, separated by only 9 nt, whereas in the other species a single minisatellite covers the same region of the gene. In conclusion, when a minisatellite is found in a yeast species, its sequence usually differs from the S. cerevisiae sequence, although it is located at the same position within the gene. A striking example is the PRY2 minisatellite, which is present in three species and found only as a relic in D. hansenii (fig. 6A). Protein sequence alignment shows that the repeat unit is different in each of the four species in which it is found (fig. 6B). This raises the intriguing question of the origin of this minisatellite. Either a minisatellite was present in the ancestral PRY2 gene and diverged rapidly in all species, or each species independently acquired a different minisatellite in the same gene. The second possibility would suggest that some genes are preferential targets for minisatellite formation.

Discussion

In the present work, we report the first comprehensive analysis of minisatellites in the genome of a completely sequenced organism. Numerous papers describing microsatellites in eukaryotic and prokaryotic genomes have been published (Richard and Dujon 1997; Alba, Santibañez-Koref, and Hancock 1999; Richard et al. 1999; Toth, Gaspari, and Jurka 2000; Young, Sloan, and Van Riper 2000; International Human Genome Sequencing Consortium 2001; Morgante, Hanafey, and Powell 2002; Malpertuy, Dujon, and Richard 2003; Subramanian, Mishra, and Singh 2003). However, to the best of our knowledge, only one work in the literature reports the presence of minisatellites in a sequenced genome, that of Tetraodon nigroviridis (Roest Crollius et al. 2000). The authors found minisatellites at two main locations: the subtelocentric regions (a 10-mer minisatellite) and the centromeric regions (a 118-mer minisatellite).
Other minisatellites were also found, but they were negligible compared with those at the two main locations and were not described further by the authors. Altogether, we found 84 minisatellites in the S. cerevisiae genome. They show no obvious distribution bias toward telomeric or centromeric regions, if one excludes the Y' minisatellite. Of the 66 non-Y' minisatellites, 55 are found in genes. Given the respective genome coverage of coding and noncoding regions, 47 would be expected, so this proportion is slightly higher than expected. Half of the 49 minisatellite-containing genes encode cell wall proteins, particularly proteins covalently associated with cell wall polysaccharides. These minisatellites encode serine- and threonine-rich amino acid repeats, which are involved in the O-glycosylation of the protein. Note that the functions of some minisatellite-containing genes are unknown. The presence of a Ser/Thr minisatellite might therefore hint that a gene of as yet unknown function is involved in cell wall organization. This is the case for YIL169c, YMR317w, BSC1, PRY2, and YOL155c, each of which contains a Ser/Thr-rich repeat and is conserved at least in S. paradoxus (table 5). The other genes of unknown function (YBR016w, YKL105c, YKL023w, and YLR114c) do not encode Ser/Thr-rich repeats and might be involved in functions other than cell wall metabolism. It was surprising to find a strong negative GC skew in minisatellites and in their associated genes. We investigated whether this skew was linked to amino acid composition and/or codon usage bias, because biased base composition in CAG repeat-containing genes in human and mouse had been shown to result from their unusual amino acid content (Hancock, Worthey, and Santibañez-Koref 2001). In minisatellites, among the six possible serine codons, TCT is overrepresented and AGT underrepresented. Among the four possible threonine codons, ACT and ACC are overrepresented and ACA and ACG are underrepresented.
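GC skew is conventionally defined as (G − C)/(G + C) and computed on one strand, here the coding strand. A negative value therefore means that cytosines outnumber guanines. The sketch below computes this ratio for a sequence and for each serine and threonine codon. It shows that the overrepresented codons TCT, ACT, and ACC each have a skew of −1, whereas the underrepresented AGT has a skew of +1. ACA and ACG, also underrepresented, score −1 and 0. This is only an illustration of the definition, not the authors' analysis pipeline. The function and constant names are hypothetical, and the example repeat unit is made up.

```python
SER_CODONS = ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC")
THR_CODONS = ("ACT", "ACC", "ACA", "ACG")


def gc_skew(seq):
    """GC skew (G - C) / (G + C) of `seq` read 5'->3' on one strand; 0.0 if it has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def codon_skews():
    """Map each serine and threonine codon to its own GC skew."""
    return {codon: gc_skew(codon) for codon in SER_CODONS + THR_CODONS}


if __name__ == "__main__":
    for codon, skew in codon_skews().items():
        print(codon, skew)
    # Made-up Ser/Thr-rich unit (TCT ACT ACC TCT = Ser-Thr-Thr-Ser), for illustration only.
    print(gc_skew("TCTACTACCTCT"))  # -1.0
```

Because the skew is a ratio of counts, a repeat built only from the favoured codons inherits their negative value regardless of how many times the unit is repeated.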
This codon usage bias could explain the GC skew in some cases, but not in all:

- 5 minisatellites exhibit a (Ser + Thr) composition of 50% or higher and no skew (DAN1, FLO10, CTS1, YMR317w, and TIR4; tables 1 and 4).
- 10 minisatellites exhibit a (Ser + Thr) composition lower than 50% and a negative GC skew (FLO9, FLO1, FLO5, PAN1, BBC1, PIR1, PIR3, DDR48 [both repeats], and FIT3; tables 1 and 4).

Many circular bacterial chromosomes exhibit a strong GC skew, with guanines more abundant on the leading strand of DNA replication on each side of the replication origin (Lobry 1996). In addition, an opposite GC skew (cytosines > guanines) was recently described around the transcription start sites of Arabidopsis thaliana and Oryza sativa (rice) genes. Some fungal genomes (but not S. cerevisiae) show the same bias around their transcription initiation regions (Fujimori, Washio, and Tomita 2005). So far, however, no such compositional bias had been described in S. cerevisiae. No other significant skew was found in the minisatellites described here, so the bias is specifically an overrepresentation of cytosines on the coding strand of the gene. No obvious correlation was found between minisatellite locations and the replication profiles of yeast chromosomes (Raghuraman et al. 2001). Of the 33 minisatellites exhibiting a negative GC skew (table 1), 18 are predicted to be replicated on the leading strand and 15 on the lagging strand during S-phase replication. Finally, no significant association of minisatellites with replication origins was found.

Possible Molecular Mechanisms Propagating Minisatellites

Earlier work observed minisatellite instability caused by a nearby meiotic hot spot. Despite this, we found no preferential association of minisatellites with hot spots and no greater polymorphism for those near a hot spot. This suggests that in S. cerevisiae, minisatellites mainly evolve independently of such hot spots.
However, it was recently shown that meiotic hot spots are, for the most part, not conserved between humans and chimpanzees, despite 99% conservation of the DNA sequence between these two species (Winckler et al. 2005). Therefore, we cannot rule out that the minisatellites we found originally arose near ancient meiotic hot spots that have since disappeared. A possible mechanism to explain minisatellite origin in yeast was proposed by Haber and Louis (1998). They observed that the Y' minisatellite, a Saccharomyces carlsbergensis minisatellite, and several human minisatellites are flanked by two short identical sequences. They speculated that an initial duplication event, resulting from replication slippage between these two short sequences, was responsible for the birth of the minisatellite. In subsequent generations, unequal crossing-over between sister chromatids or, again, replication slippage would lead to minisatellite expansion. Examination of the sequences flanking the 55 S. cerevisiae minisatellites found in genes confirms and extends this finding. We found such short identical sequences for 49 minisatellites. They could not be detected only for NUM1, DSN1, CTS1, CHS5, and one of the two minisatellites in each of RPO21 and FLO11 (table 6). The average distance between the end of the minisatellite and the downstream short sequence is 27 ± 6 nt (fig. 4). These flanking repeats are trimers (3 cases out of 49), tetramers (15 cases), pentamers (20 cases), hexamers (4 cases), or heptamers or longer (7 cases). Pentamers were also found flanking the 18 occurrences of the Y' minisatellite. Why pentamers are more frequent than other repeat sizes is unknown.

Table 6 Minisatellite-Flanking Identical Sequences

| Gene | Systematic Name | Flanking Sequence | Size | Dist.^a |
|---|---|---|---|---|
| FLO9 | YAL063c | TTCTA | 5 | 26 |
| FLO1 | YAR050w | ACCGG | 5 | 37 |
|  | YBR016w | GGATA | 5 | 12 |
| TIP1 | YBR067c | AATC | 4 | 1 |
| CYC8 | YBR112c | CTCAA | 5 | 1 |
| RPO21 | YDL140c | TTC | 3 | 18 |
|  |  | No | — | — |
| BSC1 | YDL037c | CGTC | 4 | 29 |
| SNF11 | YDR073w | GTA | 3 | 41 |
|  | YDR134c | CCAA | 4 | 1 |
| NUM1 | YDR150w | No | — | — |
| HKR1 | YDR420w | CCATCA | 6 | 57 |
| FIT1 | YDR534c | GAG | 3 | 12 |
| TIR1 | YER011w | CTGCC | 5 | 25 |
| BUD27 | YFL023w | AACGA | 5 | 25 |
| NAB2 | YGL122c | CAACC | 5 | 16 |
| SCW11 | YGL028c | CTAC | 4 | 8 |
| CRH1 | YGR189c | CATCC | 5 | 16 |
| FLO5 | YHR211w | CAACT | 5 | 76 |
|  |  | AACAA | 5 | 0 |
|  | YIL169c | TTCTG | 5 | 10 |
| TIR3 | YIL011w | CCAAG | 5 | 0 |
| PAN1 | YIR006c | TCAACCAACT | 10 | 8 |
|  |  | ACCTCAG | 7 | 23 |
| DSN1 | YIR010w | No | — | — |
| FLO11 | YIR019c | No | — | — |
|  |  | TCCATCCAG | 9 | 0 |
| PIR2 | YJL159w | TTCCCAAATT | 10 | 23 |
| BBC1 | YJL020c | CCAG | 4 | 16 |
| DAN1 | YJR150c | ATCT | 4 | 32 |
| DAN4 | YJR151c | TCAAGT | 6 | 8 |
|  |  | ACTAC | 5 | 43 |
| PIR1 | YKL164c | ATCTC | 5 | 15 |
| PIR3 | YKL163w | TACAG | 5 | 5 |
|  | YKL105c | AACAA | 5 | 53 |
|  | YKL023w | AGAA | 4 | 61 |
| PRY2 | YKR013w | AACACAAC | 8 | 66 |
| FLO10 | YKR102w | GCAA | 4 | 14 |
|  | YLR114c | ACGA | 4 | 43 |
| CTS1 | YLR286c | No | — | — |
| CHS5 | YLR330w | No | — | — |
| CCW14 | YLR390w-a | TCTTCT | 6 | 27 |
| DDR48 | YMR173w | ACGA | 4 | 41 |
|  |  | AACAATGACGATTC | 14 | 31 |
|  | YMR317w | CATCA | 5 | 42 |
| UBP10 | YNL186w | GATG | 4 | 20 |
| NIS1 | YNL078w | GATT | 4 | 28 |
| AGA1 | YNR044w | ATCC | 4 | 36 |
|  | YOL155c | GGCTCATC | 8 | 40 |
| WSC3 | YOL105c | TACCAC | 6 | 54 |
| TIR4 | YOR009w | AGTT | 4 | 60 |
| TIR2 | YOR010c | TCTAC | 5 | 22 |
| PET127 | YOR017w | TAGG | 4 | 51 |
| FIT3 | YOR383c | CACTT | 5 | 13 |
| MF(α)1 | YPL187w | TAAAA | 5 | 30 |

a Dist., distance in nucleotides from the end of the minisatellite to the downstream repeat; No, no flanking repeat was found.

Rapid Evolution of Minisatellites

A previous study showed that microsatellites evolved rapidly among several hemiascomycetous yeast genomes (Malpertuy, Dujon, and Richard 2003). We reach the same conclusion for minisatellites.
Even when a minisatellite-containing gene is conserved, its minisatellite is not necessarily conserved. When a minisatellite is present, its sequence usually diverges from the S. cerevisiae sequence (table 5 and fig. 6). We analyzed the completely sequenced genomes of the hemiascomycetes studied here, using criteria similar to those of the present study. All of them contain numerous minisatellites, in proportions comparable to those found in S. cerevisiae (data not shown). This observation implies that each species contains minisatellites that are absent from the S. cerevisiae genome, suggesting that each species has a specific subset of minisatellites not shared by the others. Hence, there must be molecular mechanisms responsible for the de novo creation of minisatellites, as suggested before for microsatellites (Malpertuy, Dujon, and Richard 2003).

Birth, Life, and Death of Minisatellites: A Model

We propose that initial formation of a minisatellite requires a negatively GC-skewed DNA region. It is therefore more likely to occur in genes that naturally exhibit this negative skew. Birth requires slippage (probably occurring during DNA replication) between two short repeats flanking the region that will be duplicated, as originally proposed by Haber and Louis (1998). After the initial duplication event, the minisatellite can be amplified by different mechanisms, including slippage during replication, mitotic recombination, or meiotic gene conversion. Replication errors can introduce point mutations into a given unit, and gene conversion may then either correct the mutation or propagate it to other units. If too many mutations accumulate in a minisatellite, repeat size changes can no longer occur, because the repeat units are too divergent to promote slippage during replication or recombination (Pâques, Richard, and Haber 2001). From then on, the minisatellite accumulates further point mutations, which eventually erase the repeats.
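In the model, birth starts with slippage between two short identical sequences. This is why table 6 lists, for each minisatellite, a short flanking sequence and its distance downstream. The exact search criteria behind table 6 are not given in this section, so the sketch below uses an assumed, simplified rule:

- Take the longest word of 3 to 14 nt that begins the minisatellite. This length range matches the range seen in table 6.
- Look for its first occurrence within a fixed window downstream of the minisatellite end.
- Report the distance in nucleotides, as in the "Dist." column.

Coordinates are 0-based and end-exclusive. The function name, the window size, and the test sequences are hypothetical.

```python
def find_flanking_repeat(seq, ms_start, ms_end, min_k=3, max_k=14, window=100):
    """Return (word, length, distance) for the longest word of min_k..max_k nt that starts
    the minisatellite seq[ms_start:ms_end] and occurs again within `window` nt downstream
    of ms_end. `distance` counts nucleotides from ms_end to the downstream copy.
    Returns None if no such word is found. Assumed criteria, not the authors' procedure."""
    downstream = seq[ms_end:ms_end + window]
    for k in range(max_k, min_k - 1, -1):  # try the longest words first
        word = seq[ms_start:ms_start + k]
        if len(word) < k or ms_start + k > ms_end:
            continue  # word would run past the sequence or past the minisatellite itself
        hit = downstream.find(word)
        if hit != -1:
            return word, k, hit
    return None


if __name__ == "__main__":
    # Hypothetical 12-nt unit repeated three times, followed by a spacer and a 5-nt copy.
    unit = "ACGTTGCAAGCT"
    seq = "TTTT" + unit * 3 + "GGGGG" + "ACGTT" + "CCCC"
    print(find_flanking_repeat(seq, 4, 4 + len(unit) * 3))  # ('ACGTT', 5, 5)
```

A given trinucleotide is likely to occur by chance somewhere in a 100-nt window. A real analysis would therefore need a stricter rule, for example a minimum length or a comparison with shuffled flanks. This is one reason the sketch should not be read as reproducing table 6.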
Toward a Biological Definition of Tandem Repeat Sequences
Finally, we point out that the boundary between micro- and minisatellites varies considerably from one author to another in the literature. The present work allows us to propose a biological definition of these two genetic objects. In S. cerevisiae, mono- to hexanucleotide repeats are found (Richard et al. 1999), and trinucleotide repeats (a particular class of microsatellites) are mainly found in nuclear genes, often encoding transcription factors (Richard and Dujon 1996; Alba, Santibañez-Koref, and Hancock 1999; Young, Sloan, and Van Riper 2000; Malpertuy, Dujon, and Richard 2003), whereas minisatellites are mainly found in cell wall genes, as shown by the present study. The shortest repeat unit of a minisatellite found in an S. cerevisiae gene was 12 nt long (table 1), but 9-nt repeat units were found in S. paradoxus (PAN1 and BBC1). We therefore propose setting the boundary between micro- and minisatellites at 9 nt, which defines two classes of tandem repeat sequences: short tandem repeats found in transcription factor genes and longer ones found in cell wall genes.

Note Added in Proof
Recent work by G. Fink and colleagues also shows that minisatellites are frequently found in cell wall genes (Verstrepen et al. 2005).

Edward Holmes, Associate Editor

We thank our colleagues, especially C. Fairhead, B. Llorente, and J.-P. Latgé, for fruitful discussions and advice; H. Muller, C. Hennequin, and G. Fischer for sharing unpublished results; and two anonymous reviewers for helpful comments. B.D. is a member of the Institut Universitaire de France.

References
Alba, M. M., M. F. Santibañez-Koref, and J. M. Hancock. 1999. Amino acid reiterations in yeast are overexpressed in particular classes of proteins and show evidence of a slippage-like mutational process. J. Mol. Evol. 49: 789–797.
Appelgren, H., H. Cederberg, and U. Rannug. 1997. Mutations at the human minisatellite MS32 integrated in yeast occur with high frequency in meiosis and involve complex recombination events. Mol. Gen. Genet. 256: 7–17.
Bennett, J. E., K. Izumikawa, and K. A. Marr. 2004. Mechanism of increased fluconazole resistance in Candida glabrata during prophylaxis. Antimicrob. Agents Chemother. 48: 1773–1777.
Bolotin-Fukuhara, M., C. Toffano-Nioche, F. Artiguenave et al. (11 co-authors). 2000. Genomic exploration of the hemiascomycetous yeasts: 11. Kluyveromyces lactis. FEBS Lett. 487: 66–70.
Borde, V., W. Lin, E. Novikov, J. H. Petrini, M. Lichten, and A. Nicolas. 2004. Association of Mre11p with double-strand break sites during yeast meiosis. Mol. Cell 13: 389–401.
Caro, L. H., H. Tettelin, J. H. Vossen, A. F. Ram, H. van den Ende, and F. M. Klis. 1997. In silicio identification of glycosyl-phosphatidylinositol-anchored plasma-membrane and cell wall proteins of Saccharomyces cerevisiae. Yeast 13: 1477–1489.
Casarégola, S., C. Neuveglise, A. Lepingle, E. Bon, C. Feynerol, F. Artiguenave, P. Wincker, and C. Gaillardin. 2000. Genomic exploration of the hemiascomycetous yeasts: 17. Yarrowia lipolytica. FEBS Lett. 487: 95–100.
Charlesworth, B., P. Sniegowski, and W. Stephan. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215–220.
Debrauwère, H., J. Buard, J. Tessier, D. Aubert, G. Vergnaud, and A. Nicolas. 1999. Meiotic instability of human minisatellite CEB1 in yeast requires double-strand breaks. Nat. Genet. 23: 367–371.
Debrauwère, H., C. G. Gendrel, S. Lechat, and M. Dutreix. 1997. Differences and similarities between various tandem repeat sequences: minisatellites and microsatellites. Biochimie 79: 577–586.
Dib, C., S. Faure, C. Fizames et al. (14 co-authors). 1996. A comprehensive genetic map of the human genome based on 5,264 sequences. Nature 380: 152–154.
Dujon, B. 1996. The yeast genome project: what did we learn? Trends Genet. 12: 263–270.
Dujon, B., D. Sherman, G. Fischer et al. (67 co-authors). 2004. Genome evolution in yeasts. Nature 430: 35–44.
Ecker, M., V. Mrsa, I. Hagen, R. Deutzmann, S. Strahl, and W. Tanner. 2003. O-mannosylation precedes and potentially controls the N-glycosylation of a yeast cell wall glycoprotein. EMBO Rep. 4: 628–632.
Ellegren, H. 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5: 435–445.
Fischer, G., S. A. James, I. N. Roberts, S. G. Oliver, and E. J. Louis. 2000. Chromosomal evolution in Saccharomyces. Nature 405: 451–454.
Fischer, G., C. Neuveglise, P. Durrens, C. Gaillardin, and B. Dujon. 2001. Evolution of gene order in the genomes of two related yeast species. Genome Res. 11: 2009–2019.
Foster, E. A., M. A. Jobling, P. G. Taylor, P. Donnelly, P. de Knijff, R. Mieremet, T. Zerjal, and C. Tyler-Smith. 1998. Jefferson fathered slave's last child. Nature 396: 27–28.
Fujimori, S., T. Washio, and M. Tomita. 2005. GC-compositional strand bias around transcription start sites in plants and fungi. BMC Genomics 6: 26.
Gerton, J. L., J. DeRisi, R. Shroff, M. Lichten, P. O. Brown, and T. D. Petes. 2000. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 97: 11383–11390.
Gill, P., A. J. Jeffreys, and D. J. Werrett. 1985. Forensic application of DNA ‘fingerprints’. Nature 318: 577–579.
Goffeau, A., B. G. Barrell, H. Bussey et al. (16 co-authors). 1996. Life with 6000 genes. Science 274: 546–567.
Haber, J. E., and E. J. Louis. 1998. Minisatellite origins in yeast and humans. Genomics 48: 132–135.
Hagelberg, E., I. C. Gray, and A. J. Jeffreys. 1991. Identification of the skeletal remains of a murder victim by DNA analysis. Nature 352: 427–429.
Hagen, I., M. Ecker, A. Lagorce et al. (11 co-authors). 2004. Sed1p and Srl1p are required to compensate for cell wall instability in Saccharomyces cerevisiae mutants defective in multiple GPI-anchored mannoproteins. Mol. Microbiol. 52: 1413–1425.
Hancock, J. M., E. A. Worthey, and M. F. Santibañez-Koref. 2001. A role for selection in regulating the evolutionary emergence of disease-causing and other coding CAG repeats in humans and mice. Mol. Biol. Evol. 18: 1014–1023.
Helminen, P., C. Ehnholm, M. L. Lokki, A. Jeffreys, and L. Peltonen. 1988. Application of DNA “fingerprints” to paternity determinations. Lancet 1: 574–576.
Hennequin, C., A. Thierry, G.-F. Richard, G. Lecointre, H. V. Nguyen, C. Gaillardin, and B. Dujon. 2001. Microsatellite typing as a new tool for identification of Saccharomyces cerevisiae strains. J. Clin. Microbiol. 39: 551–559.
Horowitz, H., and J. E. Haber. 1984. Subtelomeric regions of yeast chromosomes contain a 36 base-pair tandemly repeated sequence. Nucleic Acids Res. 12: 7105–7121.
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Jeffreys, A. J., J. Murray, and R. Neumann. 1998. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hot spot. Mol. Cell 2: 267–273.
Jeffreys, A. J., and R. Neumann. 1997. Somatic mutation processes at a human minisatellite. Hum. Mol. Genet. 6: 129–136.
Jeffreys, A. J., R. Neumann, and V. Wilson. 1990. Repeat unit sequence variation in minisatellites: a novel source of DNA polymorphism for studying variation and mutation by single molecule analysis. Cell: 473–485.
Jeffreys, A. J., K. Tamaki, A. McLeod, D. G. Monckton, D. L. Neil, and J. A. L. Armour. 1994. Complex gene conversion events in germline mutation at human minisatellites. Nat. Genet. 6: 136–145.
Klis, F. M., P. Mol, K. Hellingwerf, and S. Brul. 2002. Dynamics of cell wall structure in Saccharomyces cerevisiae. FEMS Microbiol. Rev. 26: 239–256.
Kolpakov, R., G. Bana, and G. Kucherov. 2003. mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31: 3672–3678.
Lafontaine, I., G. Fischer, E. Talla, and B. Dujon. 2004. Gene relics in the genome of the yeast Saccharomyces cerevisiae. Gene 335: 1–17.
Latgé, J.-P., and R. Calderone. 2005. The fungal cell wall. In K. Esser and R. Fischer, ed. The Mycota XIII. Springer, Berlin, Germany.
Lépingle, A., S. Casaregola, C. Neuveglise, E. Bon, H. Nguyen, F. Artiguenave, P. Wincker, and C. Gaillardin. 2000. Genomic exploration of the hemiascomycetous yeasts: 14. Debaryomyces hansenii var. hansenii. FEBS Lett. 487: 82–86.
Lobry, J. R. 1996. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13: 660–665.
Lopes, J., H. Debrauwère, J. Buard, and A. Nicolas. 2002. Instability of the human minisatellite CEB1 in rad27Δ and dna2-1 replication-deficient yeast cells. EMBO J. 21: 3201–3211.
Louis, E. J., E. S. Naumova, A. Lee, G. Naumov, and J. E. Haber. 1994. The chromosome end in yeast: its mosaic nature and influence on recombinational dynamics. Genetics 136: 789–802.
Maleki, S., H. Cederberg, and U. Rannug. 2002. The human minisatellites MS1, MS32, MS205 and CEB1 integrated into the yeast genome exhibit different degrees of mitotic instability but are all stabilised by RAD27. Curr. Genet. 41: 333–341.
Malpertuy, A., B. Dujon, and G.-F. Richard. 2003. Analysis of microsatellites in 13 hemiascomycetous yeast species: mechanisms involved in genome dynamics. J. Mol. Evol. 56: 730–741.
Marck, C. 1988. ‘DNA Strider’: a ‘C’ program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucleic Acids Res. 16: 1829–1836.
Morgante, M., M. Hanafey, and W. Powell. 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat. Genet. 30: 194–200.
Pâques, F., and J. E. Haber. 1999. Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiol. Mol. Biol. Rev. 63: 349–404.
Pâques, F., G.-F. Richard, and J. E. Haber. 2001. Expansions and contractions in 36-bp minisatellites by gene conversion in yeast. Genetics 158: 155–166.
Raghuraman, M. K., E. A. Winzeler, D. Collingwood, S. Hunt, L. Wodicka, A. Conway, D. J. Lockhart, R. W. Davis, B. J. Brewer, and W. L. Fangman. 2001. Replication dynamics of the yeast genome. Science 294: 115–121.
Richard, G.-F., and B. Dujon. 1996. Distribution and variability of trinucleotide repeats in the genome of the yeast Saccharomyces cerevisiae. Gene 174: 165–174.
———. 1997. Trinucleotide repeats in yeast. Res. Microbiol. 148: 731–744.
Richard, G.-F., C. Hennequin, A. Thierry, and B. Dujon. 1999. Trinucleotide repeats and other microsatellites in yeasts. Res. Microbiol. 150: 589–602.
Richard, G.-F., and F. Pâques. 2000. Mini- and microsatellite expansions: the recombination connection. EMBO Rep. 1: 122–126.
Röder, M. S., V. Korzun, K. Wendehake, J. Plaschke, M.-H. Tixier, P. Leroy, and M. W. Ganal. 1998. A microsatellite map of wheat. Genetics 149: 2007–2023.
Roest Crollius, H., O. Jaillon, C. Dasilva et al. (12 co-authors). 2000. Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res. 10: 939–949.
Sakamoto, T., R. G. Danzmann, K. Gharbi et al. (12 co-authors). 2000. A microsatellite linkage map of rainbow trout (Oncorhynchus mykiss) characterized by large sex-specific differences in recombination rates. Genetics 155: 1331–1345.
Subramanian, S., R. K. Mishra, and L. Singh. 2003. Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biol. 4: R13.11–R13.19.
Toth, G., Z. Gaspari, and J. Jurka. 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10: 967–981.
Verstrepen, K. J., A. Jansen, F. Lewitter, and G. Fink. 2005. Intragenic tandem repeats generate functional variability. Nat. Genet. 37: 986–990.
Waldbieser, G. C., B. G. Bosworth, D. J. Nonneman, and W. R. Wolters. 2001. A microsatellite-based genetic linkage map for channel catfish, Ictalurus punctatus. Genetics 158: 727–734.
Winckler, W., S. R. Myers, D. J. Richter et al. (11 co-authors). 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308: 107–111.
Young, E. T., J. S. Sloan, and K. Van Riper. 2000. Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics 154: 1053–1068.

© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
Molecular Biology and Evolution, Oxford University Press



Publisher
Oxford University Press
ISSN
0737-4038
eISSN
1537-1719
DOI
10.1093/molbev/msj022
pmid
16177231

Keywords: minisatellite, replication slippage, meiotic hot spot, GC skew, yeast

Introduction
Repetitive elements are a common feature of all prokaryotic and eukaryotic genomes.
They can be classified into two categories: dispersed repeat elements (transposons, tRNAs, paralogous protein-encoding genes, etc.) and tandem repeat elements. Micro- and minisatellites are tandem repeat arrays whose unit sizes range from a few nucleotides for the former to more than 10 bp for the latter (Charlesworth, Sniegowski, and Stephan 1994). Their size polymorphism in populations has been widely used for genetic mapping of genomes (Dib et al. 1996; Röder et al. 1998; Sakamoto et al. 2000; Waldbieser et al. 2001), forensic medicine (Gill, Jeffreys, and Werrett 1985; Hagelberg, Gray, and Jeffreys 1991), and paternity testing (Helminen et al. 1988; Foster et al. 1998). The molecular mechanisms underlying micro- and minisatellite size changes have been studied in humans and in model organisms (reviewed in Debrauwère et al. 1997). One of the earliest models proposes that microsatellites gain and lose repeat units by replication slippage during S-phase DNA synthesis (reviewed in Ellegren 2004), but other models involving slippage during gene conversion associated with homologous recombination have also been suggested (reviewed in Richard and Pâques 2000). Human minisatellites were initially proposed to change size mainly during meiosis (Jeffreys et al. 1994). It was subsequently shown that instability of the MS32 minisatellite is associated with a meiotic recombination hot spot that triggers size changes during meiosis (Jeffreys, Murray, and Neumann 1998). Experiments in the yeast Saccharomyces cerevisiae have confirmed that human minisatellites frequently change size when inserted into the yeast genome near a meiotic hot spot (Appelgren, Cederberg, and Rannug 1997), and that these size changes depend on Spo11p, the topoisomerase responsible for making meiotic double-strand breaks (DSBs) (Debrauwère et al. 1999).
However, minisatellites are also unstable during somatic cell growth, undergoing rare size changes by replication slippage or unequal sister chromatid recombination (Jeffreys and Neumann 1997). In yeast, these somatic size changes depend on the Rad27 protein, which is involved in Okazaki fragment processing (Lopes et al. 2002; Maleki, Cederberg, and Rannug 2002). One of the most intriguing questions concerning tandem repeat sequences in general, and minisatellites in particular, is their origin. Are minisatellites initially created by S-phase replication slippage, or by homologous recombination when they happen to lie near a meiotic hot spot? To address this question, we performed an in silico search for minisatellites in the completely sequenced S. cerevisiae genome. To our surprise, we found a large number of such elements, most of them never described before in the literature. Most of them are located within genes exhibiting a negative GC skew (more cytosines than guanines on the coding strand) and are themselves more skewed than the genes containing them. Short flanking repeats are very often found upstream and downstream of minisatellites. No positive correlation was found between the location of minisatellites and the distribution of meiotic hot spots in the yeast genome. Altogether, these data suggest that, in S. cerevisiae, natural minisatellites are acquired and lost by a molecular mechanism independent of meiotic recombination, probably involving replication slippage between short flanking sequences, in genes exhibiting a strong bias for cytosines on the coding strand.

Materials and Methods
Analysis of the S. cerevisiae Genome
We ran the program MREPS (Kolpakov, Bana, and Kucherov 2003) with the following parameters: minimal repeat unit size (-minp) of 10 and minimal repeat length (-minsize) of 30. With these parameters, only minisatellites at least as long as three 10-nt repeat units were detected.
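To make these detection criteria concrete, the following Python sketch scans a sequence for exact tandem repeats whose primitive unit is at least 10 nt and that span at least 30 nt. It is a naive illustration, not the MREPS algorithm (which is far more efficient and supports a resolution parameter); the function names are hypothetical, and the min_copies=3 filter is an assumption mirroring the "minimum of three repeat units" criterion stated in the Results.

```python
def has_shorter_period(s: str, p: int) -> bool:
    """True if s is also periodic with a proper divisor of p (e.g. a poly-A run)."""
    return any(
        p % q == 0 and all(s[i] == s[i + q] for i in range(len(s) - q))
        for q in range(1, p)
    )


def find_exact_tandem_repeats(seq, min_period=10, min_length=30, min_copies=3):
    """Naive scan for maximal exact tandem repeats (hypothetical sketch).

    Returns (start, period, length) for stretches of `seq` that are periodic
    with primitive period >= min_period and span at least min_length nt and
    min_copies full units. Like a zero-fuzziness search, it ignores variant
    repeat units; it is O(n^2) and is not the MREPS algorithm.
    """
    n = len(seq)
    hits = []
    for p in range(min_period, n // 2 + 1):
        k = 0
        while k < n - p:
            if seq[k] != seq[k + p]:
                k += 1
                continue
            start = k
            while k < n - p and seq[k] == seq[k + p]:
                k += 1
            length = (k - start) + p  # seq[start:start + length] has period p
            if (length >= max(min_length, min_copies * p)
                    and not has_shorter_period(seq[start:start + length], p)):
                hits.append((start, p, length))
    return hits


example = "GGGGG" + "ACGTTGCAAT" * 3 + "CCCCC"  # made-up 10-nt unit, 3 copies
print(find_exact_tandem_repeats(example))   # [(5, 10, 30)]
print(find_exact_tandem_repeats("A" * 40))  # [] (period 1, a microsatellite)
```

Discarding stretches with a shorter period keeps homopolymers and other microsatellites out of the minisatellite set, in the spirit of the exclusion of imperfect microsatellites described below.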
Because the resolution parameter (which allows some degree of “fuzziness” within the repeat) was set at its minimal value, variant repeats could not be detected. Therefore, repeats were individually examined, and minisatellites were manually extended 5′ and 3′ of the initial repeat detected by MREPS. To determine the threshold below which a minisatellite was too degenerate to be detected by the program, we calculated, for each minisatellite, the percentage of base substitution among all of its repeats. It ranges from 0% (all repeat units identical; three occurrences: minisatellites in PAN1, BBC1, and YLR114c) to 88.9% (only 2 nt out of 18 conserved in all repeats, in DAN4), with an average of 35% ± 6% (median: 33.3%). Generally, a minisatellite contains a few conserved repeats, while the others are more diverged. Given that this approach detected very degenerate minisatellites, such as the one in DAN4, it is unlikely that many minisatellites in the S. cerevisiae genome were missed. Some repeats detected by the program were in fact imperfect microsatellites (Richard and Dujon 1996) and were not considered further. In total, MREPS detected 257 repeats fulfilling the required criteria. On careful examination, some of these repeats partially overlapped or were part of the same minisatellite, resulting in a final set of 84 minisatellites used for the present analysis. GC skews were calculated as (G − C)/(G + C) in 100-bp windows, using DNA Strider 1.4f6 (Marck 1988). Both GC content and GC skew of minisatellite-containing genes were calculated on the gene DNA sequence with the minisatellite excluded. Functional annotations are based on Gene Ontology annotations retrieved from the Saccharomyces Genome Database.
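The two quantities used in this section can be written in a few lines of Python. This is a sketch under stated assumptions, not the DNA Strider implementation: repeat units are assumed to be pre-aligned to equal length, the "percentage of base substitution" is read as the share of positions not conserved in all repeats (the reading implied by the DAN4 example, where 2 of 18 conserved positions give 88.9%), windows are non-overlapping by default, and a window without G or C is assigned a skew of 0.0. All function names are hypothetical.

```python
def percent_variable_positions(units):
    """Percentage of aligned positions not conserved in every repeat unit.

    Assumes equal-length, gap-free units; 16 variable positions out of 18
    (the DAN4 case in the text) would give 88.9%.
    """
    if not units:
        raise ValueError("at least one repeat unit is required")
    width = len(units[0])
    if any(len(u) != width for u in units):
        raise ValueError("repeat units must be aligned to the same length")
    variable = sum(1 for column in zip(*units) if len(set(column)) > 1)
    return 100.0 * variable / width


def gc_skew(seq):
    """(G - C) / (G + C); assumed 0.0 when the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def gc_skew_windows(seq, window=100, step=100):
    """GC skew in consecutive windows (non-overlapping when step == window)."""
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(round(percent_variable_positions(["ACGTAC", "ACGTAC", "ACCTAA"]), 1))  # 33.3
print(gc_skew("GGGC"))                         # 0.5
print(gc_skew_windows("G" * 100 + "C" * 100))  # [1.0, -1.0]
```

A negative value from gc_skew corresponds to the cytosine excess on the coding strand discussed in the Results; to mimic the gene-level values in table 1, the minisatellite would first be removed from the gene sequence.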
Search for Orthologues in Hemiascomycetous Yeasts
The Saccharomyces paradoxus orthologues of S. cerevisiae genes were retrieved from the Saccharomyces Genome Database (ftp://genome-ftp.stanford.edu/yeast/data_download/sequence/fungal_genomes/S_paradoxus/). For Candida glabrata, Kluyveromyces lactis, Debaryomyces hansenii, and Yarrowia lipolytica, we started from protein families built from sequence similarities during Génolevures 2 (Dujon et al. 2004). For families containing only one gene per sequenced species (1:1:1:1:1 relationship), that gene was considered the direct orthologue of the S. cerevisiae gene. For other gene families, when orthologues could not be distinguished from paralogues based on sequence similarity, synteny conservation was used, whenever possible, to determine the correct orthologue. Most of the time, synteny did not help to identify the correct orthologue, and these genes were therefore tagged as “family” (“fam.” in table 5). Finally, for S. cerevisiae genes for which no orthologue was found by the former approach, we performed BlastP searches in the Génolevures database (http://cbi.labri.fr/Genolevures), using the S. cerevisiae protein as the query. The best match was in turn used as a query in a BlastP search against the S. cerevisiae genome. Reciprocal (bidirectional) best hits were accepted as true orthologues of the S. cerevisiae genes (22 orthologues were found using this approach).

Polymerase Chain Reaction Analysis of Minisatellite Polymorphism
Specific primers were designed to amplify polymerase chain reaction (PCR) fragments of 212 bp (SNF11), 254 bp (PRY2), 315 bp (BUD27), 209 bp (DSN1), 192 bp (SCW11), 292 bp (YKL105c), 321 bp (YOL155c), or 148 bp (NIS1). Primer sequences are available on request. The PCR program was 95°C for 15 s, 60°C for 1 min, and 72°C for 30 s (30 cycles), followed by a final extension step at 72°C for 10 min.
A sample was loaded on a 3% Metaphor agarose gel (TEBU, Le Perray en Yvelines Cedex, France) with a 100-bp ladder as a size marker (Eurogentec, Seraing, Belgium). The gel was run overnight at 1 V/cm in 1× TBE. The origins of the yeast strains used are given in Richard and Dujon (1996).

Results
Distribution of Minisatellites in the S. cerevisiae Genome
We performed a systematic search for minisatellites in the S. cerevisiae genome using the MREPS software (Kolpakov, Bana, and Kucherov 2003) (see Materials and Methods). Using as criteria a minimum repeat unit size of 10 bp and a minimum of three repeat units, we found 84 minisatellites in the S. cerevisiae genome: 55 in 49 different protein-coding genes (table 1), 11 in noncoding regions (table 2), and 18 in Y' subtelomeric elements (Louis et al. 1994). Their distribution does not show any obvious bias toward centromeric or telomeric regions (except for Y' minisatellites, which are always subtelomeric) (fig. 1). As expected, repeat units of minisatellites in genes are always multiples of 3 nt, so that changes in the number of repeat units do not disrupt the reading frame. This is not the case for minisatellites in noncoding regions (table 2). Among the 49 minisatellite-containing genes, four are essential (table 1). One of them, DSN1, exhibits size polymorphism among yeast strains (see later), indicating that the size change does not disrupt gene function. On average, minisatellites within genes are longer, both in unit size and in unit number, than those found in noncoding regions (fig. 2). This suggests that amino acid repeats are positively selected or, alternatively, that long minisatellites are counterselected in noncoding regions because of transcription initiation or termination constraints.

FIG. 1.— Distribution of minisatellites in the Saccharomyces cerevisiae genome. Each chromosome is represented as a black line.
Minisatellites in genes are indicated in black, those in noncoding regions in red. Number of units and unit sizes are shown below the vertical line. Short horizontal lines near chromosome telomeres symbolize Y' minisatellites (with their respective unit numbers below). FIG. 1.— View largeDownload slide Distribution of minisatellites in the Saccharomyces cerevisiae genome. Each chromosome is represented as a black line. Minisatellites in genes are indicated in black, those in noncoding regions in red. Number of units and unit sizes are shown below the vertical line. Short horizontal lines near chromosome telomeres symbolize Y' minisatellites (with their respective unit numbers below). FIG. 2.— View largeDownload slide Minisatellite unit size as a function of unit number. Gene names of minisatellite-containing genes with unit numbers or unit sizes statistically larger than the average are indicated. Note that DAN4 contains two minisatellites, one with unit size larger than the average and the other with unit number larger than the average. FIG. 2.— View largeDownload slide Minisatellite unit size as a function of unit number. Gene names of minisatellite-containing genes with unit numbers or unit sizes statistically larger than the average are indicated. Note that DAN4 contains two minisatellites, one with unit size larger than the average and the other with unit number larger than the average. Table 1 Minisatellites in Genes Chr.   Gene   Systematic Name   Nbr.   
Size   Gene GC%a   Gene GC Skewa   MS GC%   MS GC Skew   I  FLO9  YAL063cb  13  135  45  −0.26  47  −0.37c  I  FLO1  YAR050wb  10  135  45  −0.25  47  −0.39c  II  —  YBR016w  5  15  51  −0.16  47  +0.03  II  TIP1  YBR067cb  9  18  48  −0.21  57  −0.43c  II  CYC8  YBR112c  3  18  44  −0.16  46  −0.36c  IV  RPO21  YDL140c  6  21  41  +0.01  49  −0.77c        13  21      50  −0.59c  IV  BSC1  YDL037c  16  15  40  −0.39  39  −1.00c  IV  SNF11  YDR073w  6  12  41  −0.16  36  −0.08  IV  —  YDR134cb  3  12  47  −0.30  44  −0.25  IV  NUM1  YDR150w  10  192  38  +0.01  43  −0.01  IV  HKR1  YDR420w  21  42  43  −0.22  53  −0.41c  IV  FIT1  YDR534cb  10  18  47  −0.15  52  −0.12  V  TIR1  YER011wb  7  36  47  −0.16  51  −0.38c  VI  BUD27  YFL023w  6  30  40  +0.21  46  +0.54  VII  NAB2  YGL122c  9  12  44  −0.04  51  −0.60c  VII  SCW11  YGL028c  8  12  41  −0.21  40  −0.86c  VII  CRH1  YGR189cb  5  24  43  −0.14  49  −0.42c  VIII  FLO5  YHR211wb  7  135  44  −0.19  46  −0.24c        3  21      48  −0.07  IX  —  YIL169c  12  42  45  −0.26  45  −0.33c  IX  TIR3  YIL011wb  4  12  47  −0.23  50  −0.42c  IX  PAN1  YIR006c  9  18  43  −0.11  48  0.00        3  21      50  −0.56c  IX  DSN1  YIR010w  6  12  42  −0.003  50  +0.13  IX  FLO11  YIR019cb  5  30  47  −0.41  46  −0.45c        5  36      49  −0.46c  X  PIR2  YJL159w  6  78  49  −0.21  50  −0.18  X  BBC1  YJL020c  3  21  44  −0.03  62  −0.23c  X  DAN1  YJR150cb  5  12  46  −0.13  47  0.00  X  DAN4  YJR151cb  30  18  44  −0.33  44  −0.85c        7  72      46  −0.32  XI  PIR1  YKL164c  8  57  46  −0.22  47  −0.30c  XI  PIR3  YKL163w  6  54  46  −0.08  50  −0.19c  XI  —  YKL105c  8  18  41  +0.003  38  +0.53  XI  —  YKL023w  4  12  41  +0.2  31  +0.60  XI  PRY2  YKR013w  6  18  48  −0.19  53  −0.72c  XI  FLO10  YKR102w  3  81  44  −0.17  51  −0.12  XII  —  YLR114c  3  27  39  +0.14  56  +0.60  XII  CTS1  YLR286c  3  15  41  −0.12  49  0.00  XII  CHS5  YLR330w  9  21  42  +0.09  48  +0.08  XII  CCW14  YLR390w-ab  3  33  48  
−0.28  55  −0.44c  XIII  DDR48  YMR173w  6  24  39  −0.10  37  −0.28c        4  24      43  −0.37c  XIII  —  YMR317wb  16  36  43  −0.23  47  −0.07  XIV  UBP10  YNL186w  7  12  39  +0.02  48  +0.40  XIV  NIS1  YNL078w  5  12  45  −0.11  38  −0.48c  XIV  AGA1  YNR044wb  17  21  42  −0.38  41  −0.66c  XV  —  YOL155cb  5  39  47  −0.26  53  −0.33c  XV  WSC3  YOL105c  17  12  42  −0.15  48  −0.96c  XV  TIR4  YOR009wb  12  36  47  −0.25  50  −0.20  XV  TIR2  YOR010cb  5  33  47  −0.17  51  −0.31c  XV  PET127  YOR017w  4  18  37  +0.04  47  0.00c  XV  FIT3  YOR383cb  3  15  51  −0.21  60  −0.33c  XVI  MF(ALPHA)1  YPL187w  3  63  44  +0.02  49  +0.08  Mean      8  36  44.0  −0.13  47.6  −0.25  SE       0.7   5.3   0.5   0.02   0.8   0.05
NOTE.—RPO21, essential gene; Chr., chromosome number; Nbr., number of repeats; Size, unit size (bp); SE, standard error; and MS, minisatellite.
a Gene GC skews and GC% were calculated excluding the minisatellite.
b GPI-containing gene (see text).
c Minisatellite GC skew significantly lower than the gene GC skew.

Table 2 Minisatellites in Noncoding Regions
Chr. | Location | Nbr. | Size | MS GC% | MS GC skew [a]
II | YBR246w-YBR247c | 5 | 12 | 27 | +0.38
III | YCL074w-YCL073c | 4 | 17 | 52 | −0.14
IV | YDR534c-YDR535c | 3 | 25 | 33 | −0.52
IX | RPL34B intron | 3 | 11 | 18 | 0.00
X | ARS1006 | 3 | 14 | 2 | −1.00
XI | YKL072w-YKL071w | 6 | 12 | 26 | +0.89
XIII | YMR243c-YMR244w | 5 | 12 | 17 | −1.00
XV | YOL143c-YOL142w | 4 | 10 | 45 | −0.11
XV | YOL005c-YOL004w | 3 | 13 | 62 | 0.00
XVI | YPL179w-YPL178w | 3 | 12 | 8 | −1.00
XVI | YPL155c-YPL156c | 3 | 10 | 30 | +0.56
Mean |  | 4 | 13 | 29 | −0.18
SE |  | 0.3 | 1.3 | 5.5 | 0.20
NOTE.—Chr., chromosome number; Nbr., number of repeats; Size, unit size (bp); SE, standard error; and MS, minisatellite.
[a] The Watson strand was arbitrarily chosen to calculate minisatellite GC skews in noncoding regions.

Interestingly, minisatellite-containing genes tend to have, on average, a higher GC content (44 ± 0.5%) than other yeast genes (ca. 39%; Dujon 1996; Goffeau et al. 1996), and minisatellites in genes have an even higher GC content (47.6 ± 0.8%; table 1). This is not the case for minisatellites in noncoding regions, which are, on average, GC poor (29 ± 5.5%; table 2). More surprisingly, minisatellite-containing genes show a bias toward cytosines relative to guanines on the coding strand, with an average negative GC skew of −0.13 ± 0.02. This bias is even stronger in the minisatellites themselves (−0.25 ± 0.05).
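As a minimal illustration of the base-composition measures used in tables 1 and 2 and in figure 3, the Python sketch below assumes the usual definition of GC skew, (G − C)/(G + C), computed on one strand (the coding strand for genes, the Watson strand for noncoding minisatellites), so negative values indicate an excess of cytosines. The exact procedure is given in Materials and Methods; the `window` and `step` defaults of the illustrative `sliding_gc_skew` helper are placeholders, not the values used for figure 3. The `mean_se` helper recomputes the Mean and SE rows of Table 2 from the listed values, assuming SE is the sample standard deviation divided by √n.

```python
import math
from statistics import mean, stdev


def gc_content(seq: str) -> float:
    """Percentage of G + C bases in seq."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); negative values mean an excess of C on this strand."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def sliding_gc_skew(seq: str, window: int = 100, step: int = 10) -> list[tuple[int, float]]:
    """GC skew in successive windows along seq, as (window start, skew) pairs.

    window and step are illustrative defaults, not the settings used for figure 3.
    """
    return [(i, gc_skew(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def mean_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# MS GC% and MS GC skew columns of Table 2 (minisatellites in noncoding regions).
TABLE2_GC = [27, 52, 33, 18, 2, 26, 17, 45, 62, 8, 30]
TABLE2_SKEW = [0.38, -0.14, -0.52, 0.00, -1.00, 0.89, -1.00, -0.11, 0.00, -1.00, 0.56]

if __name__ == "__main__":
    m, se = mean_se(TABLE2_SKEW)
    print(f"GC skew: {m:.2f} ± {se:.2f}")  # GC skew: -0.18 ± 0.20
    m, se = mean_se(TABLE2_GC)
    print(f"GC%: {m:.0f} ± {se:.1f}")      # GC%: 29 ± 5.5
```

Under that SE assumption, the recomputed summaries agree with the published Mean and SE rows of Table 2.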
Out of 55 minisatellites in genes, 33 (60%) have a GC skew significantly lower than that of the gene containing them (table 1, footnote c), indicative of a strong bias toward cytosines on the coding strand. The base composition of minisatellites in noncoding regions is, on average, less biased (−0.18 ± 0.20) and ranges from G-rich to G-poor sequences (table 2). A few examples of GC skews in minisatellite-containing genes are shown in figure 3.

FIG. 3.— Three examples of GC skews in minisatellites. For each of the three genes (BSC1, DAN4, and WSC3), the GC skew is shown on the y axis (see Materials and Methods). Minisatellite locations are represented by gray shading. DAN4 contains two minisatellites, but only the first one shows a GC skew significantly lower than the gene GC skew (see table 1).

Human minisatellites are not perfect tandem repeats but a succession of variant repeats that differ from each other by one or more nucleotides. This polymorphism was used to rapidly determine their exact sequence by minisatellite variant repeat mapping (Jeffreys, Neumann, and Wilson 1990). Yeast minisatellites share this property and also contain variant repeats, as exemplified in figure 4.

FIG. 4.— Two examples of minisatellites in the HKR1 and TIR1 genes. Minisatellite repeats have been aligned using ClustalW. Variable nucleotides in repeats are shaded (repeat number 1 was considered as the reference). Numbers to the left indicate the repeat type. HKR1 contains eight different types of variant repeats; TIR1 contains six. Underlined sequences are the flanking repeats (see table 6). The distance (in nucleotides) between the last minisatellite nucleotide and the downstream flanking repeat is shown in both cases. Note that the flanking repeats are also found at the 3′ end of each individual repeat, suggesting an ancestral origin of the flanking repeats (see Discussion).

Minisatellites located in the subtelomeric Y' elements are made of a 36-bp repeat unit, as previously described (Horowitz and Haber 1984; Haber and Louis 1998). In the sequenced strain, their unit number ranges from 7 (on chromosome IX) to 26 (on chromosome II). Some chromosomes contain two Y' elements, one on each arm, others contain only one, and chromosomes I, III, and XI contain none. Y' elements are always in the same orientation relative to the centromere.

Among the 49 minisatellite-containing genes, it is striking that half of them (25 out of 49) are involved in cell wall organization. Many of these encode proteins that are covalently associated with cell wall polysaccharides (FLO9, FLO1, TIP1, TIR1, FLO5, FLO11, PIR2, DAN1, PIR1, and PIR3). A few others are involved in processes such as cell division, budding, transcription, or RNA processing (table 3).
Some of these proteins were known to contain internal amino acid repeats (Klis et al. 2002). However, at the DNA level, they do not necessarily contain a recognizable minisatellite. For example, the PIR family (for protein with internal repeats) contains four members (PIR1–4), but only PIR1, PIR2, and PIR3 contain a minisatellite; the fourth member, PIR4, contains a degenerate repeat that does not fulfill our criteria (see Materials and Methods). Among the 25 cell wall genes, 19 are known or predicted to encode a glycosyl-phosphatidylinositol (GPI) domain, involved in anchoring the protein to the plasma membrane (Caro et al. 1997; Hagen et al. 2004). GPI domains are always located very close to the C-terminus of the protein (except in FLO9). Minisatellite location is apparently less constrained and falls within the first two-thirds of the protein. Excluding FLO9, the average distance of the GPI domain from the 3′ end of the gene is 69 ± 1 bp, whereas the average distance of the minisatellite from the 3′ end of the gene is 1417 ± 291 bp.

Table 3 Functions Encoded by Minisatellite-Containing Genes
Function | Genes | Nbr.
Cell wall organization | FLO9, FLO1, TIP1, HKR1, FIT1, TIR1, SCW11, CRH1, FLO5, TIR3, FLO11, PIR2, DAN1, DAN4, PIR1, PIR3, FLO10, CTS1, CHS5, CCW14, AGA1, WSC3, TIR4, TIR2, FIT3 | 25
Cell division and budding | NUM1, BUD27, PAN1, DSN1, BBC1, NIS1 | 6
Transcription, RNA processing | CYC8, RPO21, SNF11, NAB2, PET127 | 5
Other | DDR48, UBP10, MF(ALPHA)1 | 3
Unknown | YBR016w, BSC1, YDR134c, YIL169c, YKL105c, YKL023w, PRY2, YLR114c, YMR317w, YOL155c | 10
Total |  | 49
NOTE.—Nbr., number of genes.

Among amino acids encoded by minisatellites, serine and threonine are the most abundant, together representing 42% of the total (table 4). Minisatellites in genes encoding cell wall proteins encode more Ser and Thr residues than those in other genes: on average, cell wall protein repeats contain 59% Ser + Thr, whereas repeats in the other classes of minisatellite-containing proteins contain 13% to 26% Ser + Thr. The next most frequent amino acids encoded by minisatellites are alanine (9%), glutamic acid (7%), and valine (7%); each of the other amino acids accounts for only 1% to 5% of the total. This is very different from what was observed for genes containing trinucleotide repeats (a particular class of microsatellites), in which glutamine, asparagine, glutamic acid, and aspartic acid are the four amino acids most commonly encoded by the repeats, and which are mostly transcription factors (Richard and Dujon 1996; Alba, Santibañez-Koref, and Hancock 1999; Young, Sloan, and Van Riper 2000; Malpertuy, Dujon, and Richard 2003). In S. cerevisiae, the serine- and threonine-rich repeats are thought to be sites of O-mannosylation by the Pmt4 protein; this glycosylation takes place in the endoplasmic reticulum and is important for maintaining the protein at the cell wall surface (Ecker et al. 2003; Latgé and Calderone 2005).
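The Ser and Thr percentages in Table 4 are computed from the first repeat unit of each minisatellite. A minimal Python sketch of that calculation follows; the function names are illustrative, and rounding halves up is an assumption about how the table was rounded (Python's built-in `round` rounds halves to even, which would give 62 rather than 63 for CRH1).

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 rounded up (built-in round() rounds half to even)."""
    return math.floor(x + 0.5)


def ser_thr_percent(motif: str) -> tuple[int, int, int]:
    """Return (Ser %, Thr %, Ser + Thr %) for one amino acid repeat unit.

    The combined value is computed from residue counts, so it can differ by one
    point from the sum of the two rounded percentages.
    """
    m = motif.upper()
    n = len(m)
    ser = 100 * m.count("S") / n
    thr = 100 * m.count("T") / n
    return round_half_up(ser), round_half_up(thr), round_half_up(ser + thr)


# First repeat units taken from Table 4.
MOTIFS = {
    "BSC1": "STTSS",
    "HKR1": "APAAISSTYTSSPS",
    "CRH1": "SSTVSSSA",
    "NAB2": "PQQQ",
}

if __name__ == "__main__":
    for gene, motif in MOTIFS.items():
        print(gene, ser_thr_percent(motif))
```

For CRH1 (SSTVSSSA), the count-based total is 75, whereas Table 4 lists 76, the sum of the rounded Ser (63) and Thr (13) values; PRY2 and DAN4 (SPTTTT and TTPTTS) show the same one-point difference (83 versus 84).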
Table 4 Amino Acids Encoded by Minisatellites in Saccharomyces cerevisiae
Gene | Systematic Name | Motif Size (aa) | Amino Acid Motif Sequence [a] | Ser % [b] | Thr % [b] | (Ser + Thr) %
FLO9 [c] | YAL063c | 45 | DTFTSSTELTTVTGTNGLPTDETIIVIRTPTTATTAMTTTQPWNT | 4 | 40 | 44
FLO1 [c] | YAR050w | 45 | TFTSTSTELTTVTGTNGLPTDETIIVIRTPTTATTAMTTTQPWNS | 7 | 40 | 47
— | YBR016w | 5 | YNQQG | — | — | 0
TIP1 [c] | YBR067c | 6 | EAASSS | 50 | — | 50
CYC8 | YBR112c | 6 | QAQAQA | — | — | 0
RPO21 | YDL140c | 7 | PSYSPTS | 43 | 14 | 57
  |  | 7 | PTSPSYS | 43 | 14 | 57
BSC1 | YDL037c | 5 | STTSS | 60 | 40 | 100
SNF11 | YDR073w | 4 | TANA | — | 25 | 25
— | YDR134c | 4 | TEKP | — | 25 | 25
NUM1 | YDR150w | 64 | AYSELEKKLEQPSLEYLVEHAKATNHHLLSDSAYEDLVKCKENPDMEFLKEKSAKLGHTVVSNE | 9 | 3 | 12
HKR1 [c] | YDR420w | 14 | APAAISSTYTSSPS | 36 | 14 | 50
FIT1 [c] | YDR534c | 6 | ASSAVE | 33 | — | 33
TIR1 [c] | YER011w | 12 | SSSSEAKSSSAA | 58 | — | 58
BUD27 | YFL023w | 10 | VVGDIIEKEP | — | — | 0
NAB2 | YGL122c | 4 | PQQQ | — | — | 0
SCW11 [c] | YGL028c | 4 | TSSS | 75 | 25 | 100
CRH1 [c] | YGR189c | 8 | SSTVSSSA | 63 | 13 | 76
FLO5 [c] | YHR211w | 45 | TFTSTSTEMTTITDTNGQLTDETVIVIRTPTTASTITTTTEPWTG | 7 | 42 | 49
  |  | 7 | QTKGTTE | — | 43 | 43
— | YIL169c | 14 | VVSSSVSQSSSSAS | 64 | — | 64
TIR3 [c] | YIL011w | 4 | SAAS | 50 | — | 50
PAN1 | YIR006c | 6 | PTQPVQ | — | 17 | 17
  |  | 7 | PQTTGMM | — | 29 | 29
DSN1 | YIR010w | 4 | ATAN | — | 25 | 25
FLO11 [c] | YIR019c | 10 | SSTTTSSTSE | 50 | 40 | 90
  |  | 12 | PVTSSTTESSSA | 42 | 42 | 84
PIR2 [c] | YJL159w | 26 | GDGQVQAATTTASVSTKSTAAAVSQI | 15 | 19 | 34
BBC1 | YJL020c | 7 | VPVPAAT | — | 14 | 14
DAN1 [c] | YJR150c | 4 | VASS | 50 | — | 50
DAN4 [c] | YJR151c | 6 | TTPTTS | 17 | 67 | 84
  |  | 24 | SAEPTTVSEVTSSVEPTRSSQVTS | 29 | 21 | 50
PIR1 [c] | YKL164c | 19 | QIGDGQIQATTKTTAAAVS | 5 | 21 | 26
PIR3 [c] | YKL163w | 18 | VSQITDGQVQAAKSTAAA | 11 | 11 | 22
— | YKL105c | 6 | ENVDDD | — | — | 0
— | YKL023w | 4 | KQEK | — | — | 0
PRY2 | YKR013w | 6 | SPTTTT | 17 | 67 | 84
FLO10 [c] | YKR102w | 27 | SSWSSSEVCTECTETESTSYVTPYVTS | 30 | 22 | 52
— | YLR114c | 9 | GEGDENGDD | — | — | 0
CTS1 [c] | YLR286c | 5 | STSSG | 60 | 20 | 80
CHS5 [c] | YLR330w | 7 | EDSNEPV | 14 | — | 14
CCW14 [c] | YLR390w-a | 11 | ASSSTKASSSS | 64 | 9 | 73
DDR48 | YMR173w | 8 | SNNNDSYG | 25 | — | 25
  |  | 8 | SNNNDSYG | 25 | — | 25
— | YMR317w | 12 | SSPVSSEAPSAT | 42 | 8 | 50
UBP10 | YNL186w | 4 | DIGE | — | — | 0
NIS1 | YNL078w | 4 | SNTN | 25 | 25 | 50
AGA1 [c] | YNR044w | 7 | SLSSTST | 57 | 29 | 86
— | YOL155c | 13 | GSSVSGSTSATES | 46 | 15 | 61
WSC3 [c] | YOL105c | 4 | TTSS | 50 | 50 | 100
TIR4 [c] | YOR009w | 12 | SSSVAPSSSEVV | 50 | — | 50
TIR2 [c] | YOR010c | 11 | SSSETTSSAVA | 45 | 18 | 63
PET127 | YOR017w | 6 | YPGRRT | — | 17 | 17
FIT3 [c] | YOR383c | 5 | SAAET | 20 | 20 | 40
MF(α)1 | YPL187w | 21 | KREAEAEAWHWLQLKPGQPMY | — | — | 0
[a] The first repeat unit of the minisatellite is shown.
[b] Serine and threonine percentages are given for the first repeat unit; because units differ slightly from each other, these numbers may vary slightly when the whole minisatellite is considered.
[c] Cell wall gene.

Note that some repeat sequences are similar (FLO1, FLO5, and FLO9, or PIR1 and PIR2) and may have arisen by gene conversion.

Meiotic Hot Spots and Minisatellites
It was previously shown in humans and yeast that minisatellites located near a meiotic hot spot expand and contract at a high frequency during meiosis (Appelgren, Cederberg, and Rannug 1997; Jeffreys, Murray, and Neumann 1998; Debrauwère et al. 1999). We asked whether minisatellites were close to meiotic hot spots, as defined by whole-genome analyses of meiotic double-strand break (DSB) sites (Gerton et al. 2000; Borde et al. 2004). In S. cerevisiae, meiotic gene conversion tracts are rather limited in size (1–2 kb) (reviewed in Pâques and Haber 1999).
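The comparison between observed and randomly expected numbers of minisatellites near hot spots can be sketched in Python as below. This is an illustrative sketch, not necessarily how the expected values in this study were computed: it assumes each minisatellite and each hot spot is reduced to a single coordinate (for example, its midpoint), and that the null model places minisatellites uniformly along the genome, so the expected count within d bp is the number of minisatellites multiplied by the fraction of the genome lying within d bp of a hot spot. The function names and the toy coordinates are hypothetical.

```python
import bisect


def nearest_distance(pos: int, sorted_hotspots: list[int]) -> float:
    """Distance (bp) from pos to the closest hot spot; inf if the list is empty."""
    i = bisect.bisect_left(sorted_hotspots, pos)
    best = float("inf")
    if i < len(sorted_hotspots):
        best = sorted_hotspots[i] - pos
    if i > 0:
        best = min(best, pos - sorted_hotspots[i - 1])
    return best


def count_near(minisats: dict[str, list[int]], hotspots: dict[str, list[int]], d: int) -> int:
    """Observed number of minisatellites within d bp of a hot spot on the same chromosome."""
    total = 0
    for chrom, positions in minisats.items():
        hs = sorted(hotspots.get(chrom, []))
        total += sum(1 for p in positions if nearest_distance(p, hs) <= d)
    return total


def covered_fraction(hotspots: dict[str, list[int]], chrom_lengths: dict[str, int], d: int) -> float:
    """Fraction of the genome within d bp of at least one hot spot (overlapping windows merged)."""
    covered = 0
    for chrom, length in chrom_lengths.items():
        windows = sorted((max(0, h - d), min(length, h + d)) for h in hotspots.get(chrom, []))
        start = end = None
        for s, e in windows:
            if end is None or s > end:
                # Disjoint from the current merged window: bank it and start a new one.
                if end is not None:
                    covered += end - start
                start, end = s, e
            else:
                # Overlapping window: extend the current merged window.
                end = max(end, e)
        if end is not None:
            covered += end - start
    return covered / sum(chrom_lengths.values())


def expected_near(n_minisats: int, hotspots: dict[str, list[int]],
                  chrom_lengths: dict[str, int], d: int) -> float:
    """Expected count if n_minisats were placed uniformly at random along the genome."""
    return n_minisats * covered_fraction(hotspots, chrom_lengths, d)


if __name__ == "__main__":
    # Hypothetical toy data, not the yeast coordinates used in the study.
    lengths = {"I": 10_000}
    hot = {"I": [2_000, 3_000]}
    ms = {"I": [1_500, 3_900, 5_000]}
    print(count_near(ms, hot, 1_000))                       # 2
    print(round(expected_near(3, hot, lengths, 1_000), 2))  # 0.9
```

Merging overlapping windows matters when hot spots are closer together than 2d; otherwise the covered fraction, and hence the random expectation, would be overestimated.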
We found two minisatellites within 2 kb of a meiotic hot spot (SNF11 and PRY2; fig. 1) and four others within 5 kb of a hot spot (FLO9, RPL34B, DSN1, and BBC1; fig. 1), that is, six within 5 kb in total. Given the numbers of hot spots and minisatellites in the yeast genome, a random distribution would place two minisatellites within 2 kb of a hot spot and five within 5 kb, which is not different from what we observed. We therefore rejected the hypothesis that minisatellites are associated with meiotic hot spots more often than expected by chance.

Minisatellite Size Polymorphism Among Different Yeast Strains
We previously demonstrated microsatellite size polymorphism among laboratory and industrial yeast strains and strains isolated from infected patients (Richard and Dujon 1996; Hennequin et al. 2001). This size polymorphism was used to classify the strains studied and could be used as a typing method to trace their origin. To determine to what extent natural yeast minisatellites are also polymorphic, we selected eight independent haploid laboratory strains, based on the uniqueness of their microsatellite haplotype (Richard and Dujon 1996), and studied eight minisatellite loci. Four were chosen within 2 or 5 kb of a meiotic hot spot (SNF11, BUD27, PRY2, and DSN1; fig. 1). The other four were selected to match the first four as closely as possible in unit size and unit number, while not being close to a hot spot. Unique primers were designed to PCR-amplify the eight minisatellites in each strain, with strain FYBL1-8B (a derivative of the sequenced strain S288C) used as the reference. Six of the eight minisatellites exhibited size polymorphism (only the SCW11 and PRY2 minisatellites did not; fig. 5). No two strains shared the same haplotype, so each strain could be assigned a unique haplotype. Interestingly, the HC9-7 strain exhibited three different bands at the NIS1 locus, on chromosome XIV.
In the other strains, amplification of this minisatellite was very specific, yielding a single band. In a previous study, the HC9-7 strain also showed two different alleles of a microsatellite located on chromosome XI (Richard and Dujon 1996). This strain must therefore carry some aneuploidy (or segmental duplications).

FIG. 5.— Minisatellite size polymorphism in different yeast strains. PCR products of each locus are run in parallel on the same gel to estimate size variations. Strain FYBL1-8B is used as the size control. Size variations were estimated using the 100-bp ladder. (A) An example of a stable locus in the strains studied (SCW11). (B) Two examples of unstable loci in the strains studied (SNF11 and BUD27). (C) Summary of PCR amplification of the eight loci studied. The number of different alleles is shown to the right. Only two minisatellites (marked by an asterisk) exhibit no size polymorphism (SCW11 and PRY2).

We did not find any difference in the degree of polymorphism between minisatellites located near meiotic hot spots and those located far from them. In both cases, three minisatellites out of four showed some level of polymorphism (fig. 5). Nor is the number of different alleles of a given minisatellite correlated with the presence of a hot spot.
We therefore concluded that minisatellite stability in these laboratory strains did not depend on the presence of a nearby meiotic hot spot.

Conservation of Minisatellites in Hemiascomycetous Yeasts
To estimate minisatellite conservation during evolution, we investigated other hemiascomycetous yeast genomes (fig. 6). Saccharomyces paradoxus belongs to the Saccharomyces sensu stricto group and is very closely related to S. cerevisiae. Candida glabrata is a pathogenic yeast, a causative agent of human candidiasis (Bennett, Izumikawa, and Marr 2004). Kluyveromyces lactis is also related to S. cerevisiae and has been used for genetic studies and industrial applications (Bolotin-Fukuhara et al. 2000). Debaryomyces hansenii is a halotolerant yeast, phylogenetically close to the pathogen Candida albicans (Lépingle et al. 2000). Yarrowia lipolytica is a more distantly related yeast, able to grow as individual cells or as a mycelium (Casarégola et al. 2000). The evolutionary distance between S. cerevisiae and Y. lipolytica, measured as the amino acid divergence between orthologous proteins, is larger than the divergence within the entire phylum of chordates (Dujon et al. 2004).

FIG. 6.— (A) Phylogenetic tree of hemiascomycetous yeast species used in this study, based on Dujon et al. (2004), showing evolution of the PRY2 minisatellite in hemiascomycetes. Self-dot matrices of 250 bp surrounding the minisatellite are shown (stringency: 7, window: 9). In Candida glabrata, no orthologue could be unambiguously assigned because PRY2 is part of a two-member gene family in this species; neither of the two homologues contains a minisatellite. (B) Alignment of the region containing the minisatellite. Upstream and downstream protein sequences are perfectly aligned but are not shown here. Boxed sequences represent the three minisatellites, and the dotted box represents the minisatellite relic in Debaryomyces hansenii. There is no detectable tandem repeat sequence in this region in Kluyveromyces lactis.

In the closest species, S. paradoxus, we found one orthologue for each of the 49 S. cerevisiae minisatellite-containing genes. In 65% of the cases (36 out of 55 minisatellites), a minisatellite was also found in the S. paradoxus gene (table 5). In all but 8 of these 36 cases, the repeat unit has the same size as in S. cerevisiae. In one case (MF(alpha)1), the minisatellite is no longer detectable at the DNA level, although the protein repeat is still present. We called it a “minisatellite relic”, by analogy with the term “gene relic” used to describe very degenerate genes found in the genomes of hemiascomycetous yeasts (Lafontaine et al. 2004; Lafontaine and Dujon, in preparation). Altogether, these observations show that although genes are very well conserved between S. cerevisiae and S. paradoxus, minisatellites are much less so, suggesting a fast evolution rate of these tandem repeat sequences, reminiscent of what was observed for microsatellites in a previous study (Malpertuy, Dujon, and Richard 2003).
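Figure 6A detects minisatellites with self-dot matrices (stringency 7, window 9). Below is a minimal Python sketch of such a matrix, assuming the common sliding-window criterion: a dot is drawn at (i, j) when the 9-nt windows starting at positions i and j share at least 7 identical nucleotides. The scoring used by the software behind the figure may differ in details, for example in how the reverse strand is handled. In such a plot a tandem repeat appears as diagonals parallel to the main one, offset by multiples of the unit size, so the most frequent offset gives a rough estimate of the unit length. The function names and the 12-bp unit in the example are hypothetical.

```python
from collections import Counter
from typing import Optional


def self_dot_matrix(seq: str, window: int = 9, stringency: int = 7) -> list[tuple[int, int]]:
    """Dots (i, j) with i < j where the windows starting at i and j share at least
    `stringency` identical nucleotides out of `window`. Only the upper triangle is
    returned because the matrix is symmetric and its main diagonal is trivial."""
    s = seq.upper()
    n = len(s) - window + 1
    dots = []
    for i in range(n):
        for j in range(i + 1, n):
            matches = sum(a == b for a, b in zip(s[i:i + window], s[j:j + window]))
            if matches >= stringency:
                dots.append((i, j))
    return dots


def dominant_offset(dots: list[tuple[int, int]]) -> Optional[int]:
    """Most frequent diagonal offset j - i; for a tandem repeat this estimates the unit size."""
    if not dots:
        return None
    return Counter(j - i for i, j in dots).most_common(1)[0][0]


if __name__ == "__main__":
    unit = "ACGTTGCAAGTC"             # hypothetical 12-bp repeat unit
    dots = self_dot_matrix(unit * 5)  # five perfect tandem copies
    print(dominant_offset(dots))      # 12
```

The window/stringency pair tolerates up to two mismatches per window, which is what lets variant repeats (fig. 4) and partially degenerate repeats still produce visible diagonals.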
Among minisatellites conserved between S. cerevisiae and S. paradoxus, the S. paradoxus repeats also show a clear bias toward serine and threonine.

Table 5 Conservation of Minisatellites in Hemiascomycetous Yeast Species
Gene | Systematic Name | S. cerevisiae Minisatellite | Saccharomyces paradoxus | Candida glabrata | Kluyveromyces lactis | Debaryomyces hansenii | Yarrowia lipolytica
FLO9 | YAL063c | 13 × 135 | 10 × 135 | Fam. (22) | Fam. (21) | Fam. (9) | Fam. (18)
FLO1 | YAR050w | 10 × 135 | 3 × 135 | Fam. (22) | Fam. (20) | Fam. (10) | Fam. (17)
— | YBR016w | 5 × 15 | No | No | — | — | 3 × 15
TIP1 | YBR067c | 9 × 18 | 8 × 18 | Fam. (6) | — | — | —
CYC8 | YBR112c | 3 × 18 | 10 × 18 | No | 4 × 9 | No | No
RPO21 | YDL140c | 6 × 21 | No | 18 × 21 | 14 × 21 | 16 × 21 | 8 × 21
  |  | 13 × 21 | No | (1) | (1) | (1) | (1)
BSC1 | YDL037c | 16 × 15 | 6 × 36 | — | Fam. (4) | — | —
SNF11 | YDR073w | 6 × 12 | 3 × 12 | — | No | — | —
Pseudo | YDR134c | 3 × 12 | 4 × 12 | No | No | — | —
NUM1 | YDR150w | 10 × 192 | 2 × 213 | No | No | 6 × 96 | —
HKR1 | YDR420w | 21 × 42 | No | Fam. (11) | Relic | — | —
FIT1 | YDR534c | 10 × 18 | 11 × 18 | — | Relic | — | —
TIR1 | YER011w | 7 × 36 | 6 × 36 | Relic | — | — | —
BUD27 | YFL023w | 6 × 30 | 3 × 30 | 8 × 27 | No | 5 × 30 | —
NAB2 | YGL122c | 9 × 12 | No | No | No | 4 × 12 | No
SCW11 | YGL028c | 8 × 12 | 5 × 12 | No | No | No | No
CRH1 | YGR189c | 5 × 24 | 4 × 24 | 5 × 24 | No | Fam. (6) | 6 × 15
FLO5 | YHR211w | 7 × 135 | 11 × 135 | Fam. (25) | Fam. (21) | Fam. (8) | Fam. (20)
  |  | 3 × 21 | No |  |  |  | 
— | YIL169c | 12 × 42 | 5 × 42 | Fam. (14) | — | — | —
TIR3 | YIL011w | 4 × 12 | 4 × 33 | Fam. (10) | — | — | —
PAN1 | YIR006c | 9 × 18 | 15 × 9 | No | No | No | 4 × 30
  |  | 3 × 21 | No | No | No | No | No
DSN1 | YIR010w | 6 × 12 | No | No | No | — | —
FLO11 | YIR019c | 5 × 30 | 8 × 81 | Fam. (42) | Fam. (20) | Fam. (17) | Fam. (65)
  |  | 5 × 36 | 4 × 36 |  |  |  | 
PIR2 | YJL159w | 6 × 78 | 9 × 57 | Fam. (5) | Fam. (3) | Fam. (2) [a] | Fam. (2)
BBC1 | YJL020c | 3 × 21 | 21 × 9 | Relic | No | Relic | —
DAN1 | YJR150c | 5 × 12 | No | Fam. (3) | — | — | —
DAN4 | YJR151c | 29 × 18 | 25 × 18 | Fam. (27) | Fam. (20) | Fam. (13) | Fam. (39)
  |  | 7 × 72 | No |  |  |  | 
PIR1 | YKL164c | 8 × 57 | 10 × 57 | Fam. (5) | 3 × 57 (2) | 5 × 66 (2) | Fam. (2)
PIR3 | YKL163w | 6 × 54 | 9 × 54 | Fam. (5) | Relic (2) | (2) | Fam. (2)
— | YKL105c | 8 × 18 | No | No | — | — | —
— | YKL023w | 4 × 12 | No | — | — | — | —
PRY2 | YKR013w | 6 × 18 | 9 × 15 | Fam. (2) [b] | No | Relic | 6 × 15
FLO10 | YKR102w | 3 × 81 | No | Fam. (19) | Fam. (16) | Fam. (10) | Fam. (17)
— | YLR114c | 3 × 27 | No | No | No | No | No
CTS1 | YLR286c | 3 × 15 | No | No | No | Fam. (2) | Fam. (2) [b]
CHS5 | YLR330w | 9 × 21 | 7 × 21 | 4 × 21 | 5 × 24 | No | 5 × 15
CCW14 | YLR390w-a | 3 × 33 | 5 × 33 | Relic | — | — | 9 × 15
DDR48 | YMR173w | 6 × 24 | 4 × 24 | — | — | — | —
  |  | 4 × 24 | No | — | — | — | —
— | YMR317w | 16 × 36 | 6 × 36 | Fam. (22) | Fam. (13) | Fam. (12) | Fam. (34)
UBP10 | YNL186w | 7 × 12 | 4 × 12 | 8 × 12 | No | No | No
NIS1 | YNL078w | 5 × 12 | No | No | No | — | —
AGA1 | YNR044w | 17 × 21 | 20 × 21 | — | — | — | —
— | YOL155c | 5 × 39 | 4 × 39 | — | — | — | —
WSC3 | YOL105c | 17 × 12 | 8 × 12 | — | 7 × 12 | — | —
TIR4 | YOR009w | 12 × 36 | 4 × 36 | Fam. (15) | — | — | —
TIR2 | YOR010c | 5 × 33 | No | Fam. (6) | — | — | —
PET127 | YOR017w | 4 × 18 | 4 × 18 | No | No | No | No
FIT3 | YOR383c | 3 × 15 | 5 × 15 | — | Fam. (3) [c] | — | —
MF(ALPHA)1 | YPL187w | 3 × 63 | Relic | 3 × 75 | 3 × 81 | — | 3 × 102
Conserved genes [d] |  |  | 49 | 23 | 27 | 15 | 16
Minisatellites |  |  | 36/55 (65%) | 6/24 (25%) | 7/28 (25%) | 5/16 (31%) | 8/17 (47%)
Conserved minisatellites |  |  | 23 | 2 | 2 | 2 | 1
Minisatellite relics |  |  | 1 | 3 | 3 | 3 | 0
NOTE.—Fam. (n): gene is part of a family in this species, n is the number of members in the family; —: no orthologue could be unambiguously assigned; No: at least one orthologue could be assigned but no minisatellite could be detected; Relic: a very degenerate minisatellite could be detected; (1): one minisatellite replaces the two minisatellites found in S. cerevisiae; (2): PIR1 or PIR3; the underlined minisatellites have the same motif sequence as in S. cerevisiae (see text); and the minisatellites that are not underlined have a different motif sequence as compared to S. cerevisiae (see text).
[a] All members of the family contain a minisatellite relic.
[b] None of the members of the family contains a minisatellite.
[c] All members of the family contain a minisatellite.
[d] Excluding gene families.

We subsequently looked for minisatellite conservation in the four other species. Finding orthologues of S. cerevisiae genes was more challenging, even in the second closest species, C. glabrata, because many of the minisatellite-containing genes belong to gene families containing from 2 to 65 paralogous members (table 5). Most of the time, we could not use synteny data to choose among several homologues, because synteny breakpoints are frequent in regions containing dispersed repeated elements, such as retrotransposons or gene families (Fischer et al. 2000, 2001). As expected, orthologues of minisatellite-containing genes were easier to identify in C. glabrata and K. lactis than in the more distant D. hansenii and Y. lipolytica. However, a minisatellite was found more often in Y. lipolytica (8/17 minisatellites; table 5; chi-square test: P = 0.05) than in C. glabrata, K. lactis, or D. hansenii. In addition, three minisatellite relics were found in each of C. glabrata, K. lactis, and D. hansenii, but none in Y. lipolytica. When minisatellite sequences were compared, they usually differed between S. cerevisiae and the other hemiascomycetous species. Sequence alignments showed that in 25% of the cases, accumulation of point mutations “erased” the minisatellite; in 25%, the minisatellite was completely deleted although the protein sequence upstream and downstream of it is conserved; and in the remaining 50%, a mix of point mutations and small deletions led to the loss of the minisatellite. These kinds of mutational events are reminiscent of those observed for microsatellites in different yeast species (Malpertuy, Dujon, and Richard 2003). In S. paradoxus, 23 minisatellite sequences are conserved, whereas only 2 are conserved in each of C. glabrata, K. lactis, and D. hansenii, and only 1 in Y. lipolytica. Note that the only minisatellite whose repeat motif is conserved in all species in which it is found is the RPO21 minisatellite.
This minisatellite is split into two minisatellites in baker's yeast, separated by only 9 nt, whereas the other species have a single minisatellite covering the same region of the gene. In conclusion, when a minisatellite is found in another yeast species, its sequence usually differs from the S. cerevisiae sequence, although it is located at the same position within the gene. A striking example is the PRY2 minisatellite, present in three species and found only as a relic in D. hansenii (fig. 6A). Protein sequence alignment shows that the repeat unit differs among the four species in which it is found (fig. 6B). This raises the intriguing question of the origin of this minisatellite. Either a minisatellite was present in the ancestral PRY2 gene and diverged rapidly in all lineages, or each species independently acquired a different minisatellite in the same gene, which would suggest that some genes are preferential targets for minisatellite formation.
Discussion
In the present work, we report the first comprehensive analysis of minisatellites in the genome of a completely sequenced organism. To the best of our knowledge, although numerous papers describing microsatellites in eukaryotic and prokaryotic genomes have been published (Richard and Dujon 1997; Alba, Santibañez-Koref, and Hancock 1999; Richard et al. 1999; Toth, Gaspari, and Jurka 2000; Young, Sloan, and Van Riper 2000; International Human Genome Sequencing Consortium 2001; Morgante, Hanafey, and Powell 2002; Malpertuy, Dujon, and Richard 2003; Subramanian, Mishra, and Singh 2003), only one study reports minisatellites in a sequenced genome, that of Tetraodon nigroviridis (Roest Crollius et al. 2000). The authors found minisatellites at two main locations: the subtelocentric regions (a 10-mer minisatellite) and the centromeric regions (a 118-mer minisatellite).
Other minisatellites were also found, but they were negligible compared with those at the two main locations and were not described further by the authors. We found a total of 84 minisatellites in the S. cerevisiae genome, with no obvious distribution bias toward telomeric or centromeric regions (if one excludes the Y' minisatellite). Of the 66 non-Y' minisatellites, 55 are found in genes, a slightly higher proportion than expected given the respective genome coverage of coding and noncoding regions (47 expected). Of the 49 minisatellite-containing genes, half encode cell wall proteins, particularly proteins covalently associated with cell wall polysaccharides. These minisatellites encode serine- and threonine-rich amino acid repeats, which are involved in the O-glycosylation of the protein. The functions of some minisatellite-containing genes are still unknown, so the presence of a Ser/Thr minisatellite might hint that such a gene is involved in cell wall organization. This is the case for YIL169c, YMR317w, BSC1, PRY2, and YOL155c, which contain a Ser/Thr-rich repeat and are conserved at least in S. paradoxus (table 5). The other genes of unknown function (YBR016w, YKL105c, YKL023w, and YLR114c) do not encode Ser/Thr-rich repeats and might be involved in functions other than cell wall metabolism. The strong negative GC skew found in minisatellites and in their associated genes was unexpected. We investigated whether it was linked to amino acid composition and/or to codon usage bias, because the biased base composition of CAG repeat-containing genes in human and mouse was shown to be due to their unusual amino acid content (Hancock, Worthey, and Santibañez-Koref 2001). In minisatellites, among the six possible serine codons, TCT is overrepresented and AGT underrepresented, and among the four possible threonine codons, ACT and ACC are overrepresented and ACA and ACG are underrepresented.
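The skew statistic and the codon tallies discussed above can be made concrete with a short script. The Python sketch below is an illustration, not the authors' pipeline: it assumes the usual definition of GC skew on one strand, (G - C)/(G + C), so that an excess of cytosines on the coding strand gives a negative value, and it counts each synonymous serine and threonine codon of the standard genetic code in an in-frame sequence. The function names and the toy repeat unit are hypothetical.

```python
from collections import Counter

# Standard genetic code: serine has six codons, threonine has four.
SER_CODONS = ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC")
THR_CODONS = ("ACT", "ACC", "ACA", "ACG")


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one strand; negative means C > G."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        raise ValueError("sequence contains no G or C")
    return (g - c) / (g + c)


def ser_thr_codon_usage(cds: str) -> dict:
    """Count every Ser and Thr codon (including zeros) in frame 0 of `cds`."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    return {
        "Ser": Counter({codon: counts[codon] for codon in SER_CODONS}),
        "Thr": Counter({codon: counts[codon] for codon in THR_CODONS}),
    }


if __name__ == "__main__":
    # Hypothetical in-frame unit Ser-Thr-Thr-Ser-Ala (TCT ACT ACC TCT GCT), x4.
    minisat = "TCTACTACCTCTGCT" * 4
    print(f"GC skew: {gc_skew(minisat):+.3f}")
    usage = ser_thr_codon_usage(minisat)
    print("Ser:", dict(usage["Ser"]))
    print("Thr:", dict(usage["Thr"]))
```

Each copy of this toy unit carries six C and one G, so the skew is -5/7 (about -0.714). Deciding whether a codon is over- or underrepresented would additionally require comparing these tallies with a genome-wide reference frequency, which the sketch does not attempt.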
This codon bias could explain the GC skew in some cases but not all: 5 minisatellites have a (Ser + Thr) content higher than 50% and no skew (DAN1, FLO10, CTS1, YMR317w, and TIR4; tables 1 and 4), whereas 10 minisatellites have a (Ser + Thr) content lower than 50% and a negative GC skew (FLO9, FLO1, FLO5, PAN1, BBC1, PIR1, PIR3, DDR48 [both repeats], and FIT3; tables 1 and 4). Many circular bacterial chromosomes exhibit a strong GC skew, with guanines more abundant on the leading strand of DNA replication on each side of the replication origin (Lobry 1996). In addition, an opposite GC skew (cytosines > guanines) was recently described around the transcription start sites of Arabidopsis thaliana and Oryza sativa (rice) genes, and some fungal genomes, though not that of S. cerevisiae, show the same bias around their transcription initiation regions (Fujimori, Washio, and Tomita 2005). No other significant skew was found in the minisatellites described here, so the bias is specifically an overrepresentation of cytosines on the coding strand. No obvious correlation was found between minisatellite locations and the replication profiles of yeast chromosomes (Raghuraman et al. 2001): of the 33 minisatellites exhibiting a negative GC skew (table 1), 18 are predicted to be replicated on the leading strand and 15 on the lagging strand during S phase. Finally, no significant association of minisatellites with replication origins was found.
Possible Molecular Mechanisms Propagating Minisatellites
Despite earlier observations of minisatellite instability due to the presence of a nearby meiotic hot spot, we found neither a preferential association of minisatellites with hot spots nor greater polymorphism for minisatellites near a hot spot, suggesting that in S. cerevisiae, minisatellites mainly evolve independently of such hot spots.
However, it was recently shown that meiotic hot spots are, for the most part, not conserved between humans and chimpanzees, despite 99% conservation of the DNA sequence between these two species (Winckler et al. 2005). Therefore, we cannot rule out that the minisatellites we found originally arose near ancient meiotic hot spots that have since disappeared. A possible mechanism for minisatellite origin in yeast was proposed by Haber and Louis (1998). They observed that the Y' minisatellite, a Saccharomyces carlsbergensis minisatellite, and several human minisatellites are flanked by two short identical sequences. They speculated that an initial duplication event, resulting from replication slippage between these two short sequences, gave birth to the minisatellite, which was then expanded in subsequent generations by unequal crossing-over between sister chromatids or by further replication slippage. Examination of the sequences flanking the 55 S. cerevisiae minisatellites found in genes confirms and extends this finding. We found such short identical sequences for 49 minisatellites; they could not be detected only for NUM1, DSN1, CTS1, CHS5, and one of the two minisatellites in each of RPO21 and FLO11 (table 6). The average distance between the end of the minisatellite and the downstream short sequence is 27 ± 6 nt (fig. 4). These flanking repeats are trimers (3 cases out of 49), tetramers (15 cases), pentamers (20 cases), hexamers (4 cases), or heptamers or longer (7 cases). Pentamers were also found flanking the 18 occurrences of the Y' minisatellite. Why pentamers are more frequent than other repeat sizes is unknown.
Table 6 Minisatellite-Flanking Identical Sequences
Gene    Systematic name  Flanking sequence  Size  Dist.a
FLO9    YAL063c          TTCTA              5     26
FLO1    YAR050w          ACCGG              5     37
        YBR016w          GGATA              5     12
TIP1    YBR067c          AATC               4     1
CYC8    YBR112C          CTCAA              5     1
RPO21   YDL140c          TTC                3     18
                         No                 —     —
BSC1    YDL037c          CGTC               4     29
SNF11   YDR073w          GTA                3     41
        YDR134c          CCAA               4     1
NUM1    YDR150w          No                 —     —
HKR1    YDR420w          CCATCA             6     57
FIT1    YDR534c          GAG                3     12
TIR1    YER011w          CTGCC              5     25
BUD27   YFL023w          AACGA              5     25
NAB2    YGL122c          CAACC              5     16
SCW11   YGL028c          CTAC               4     8
CRH1    YGR189c          CATCC              5     16
FLO5    YHR211w          CAACT              5     76
                         AACAA              5     0
        YIL169c          TTCTG              5     10
TIR3    YIL011w          CCAAG              5     0
PAN1    YIR006c          TCAACCAACT         10    8
                         ACCTCAG            7     23
DSN1    YIR010w          No                 —     —
FLO11   YIR019c          No                 —     —
                         TCCATCCAG          9     0
PIR2    YJL159w          TTCCCAAATT         10    23
BBC1    YJL020c          CCAG               4     16
DAN1    YJR150c          ATCT               4     32
DAN4    YJR151c          TCAAGT             6     8
                         ACTAC              5     43
PIR1    YKL164c          ATCTC              5     15
PIR3    YKL163w          TACAG              5     5
        YKL105c          AACAA              5     53
        YKL023w          AGAA               4     61
PRY2    YKR013w          AACACAAC           8     66
FLO10   YKR102w          GCAA               4     14
        YLR114c          ACGA               4     43
CTS1    YLR286c          No                 —     —
CHS5    YLR330w          No                 —     —
CCW14   YLR390w-a        TCTTCT             6     27
DDR48   YMR173w          ACGA               4     41
                         AACAATGACGATTC     14    31
        YMR317w          CATCA              5     42
UBP10   YNL186w          GATG               4     20
NIS1    YNL078w          GATT               4     28
AGA1    YNR044w          ATCC               4     36
        YOL155c          GGCTCATC           8     40
WSC3    YOL105c          TACCAC             6     54
TIR4    YOR009w          AGTT               4     60
TIR2    YOR010c          TCTAC              5     22
PET127  YOR017w          TAGG               4     51
FIT3    YOR383c          CACTT              5     13
MF(α)1  YPL187w          TAAAA              5     30
a Dist., distance in nucleotides from the end of the minisatellite to the downstream repeat; No, no flanking repeat was found.
Rapid Evolution of Minisatellites
In a previous study, microsatellites were shown to evolve rapidly among several hemiascomycetous yeast genomes (Malpertuy, Dujon, and Richard 2003). We reach the same conclusion for minisatellites.
Even when a minisatellite-containing gene is conserved, its minisatellite is not necessarily conserved, and when a minisatellite is present, its sequence usually diverges from the S. cerevisiae sequence (table 5 and fig. 6). Analysis of the complete genome sequences of the hemiascomycetes studied here, using criteria similar to those of the present study, shows that they all contain numerous minisatellites, in proportions comparable to those found in S. cerevisiae (data not shown). Because few of these minisatellites are shared with S. cerevisiae, each species must contain minisatellites that are absent from the S. cerevisiae genome, that is, a species-specific subset of minisatellites not shared by the others. Hence, there must be molecular mechanisms responsible for the de novo creation of minisatellites, as previously suggested for microsatellites (Malpertuy, Dujon, and Richard 2003).
Birth, Life, and Death of Minisatellites: A Model
We propose that the initial formation of a minisatellite requires a DNA region with a negative GC skew; it is therefore more likely to occur in genes that naturally exhibit this skew. Birth requires slippage (probably occurring during DNA replication) between two short repeats flanking the region that will be duplicated, as originally proposed by Haber and Louis (1998). After the initial duplication event, the minisatellite can be amplified by different mechanisms, including slippage during replication, mitotic recombination, or meiotic gene conversion. Replication errors can introduce point mutations into a given repeat unit, and gene conversion may subsequently either correct these mutations or propagate them to other units. If too many mutations accumulate in a minisatellite, its size can no longer change, because the repeat units are too divergent to promote slippage during replication or recombination (Pâques, Richard, and Haber 2001). From then on, the minisatellite accumulates further point mutations that eventually erase the repeats.
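The birth step of this model, followed by expansion through repeated slippage, can be illustrated with a toy string operation. The Python sketch below is a deliberately simplified, hypothetical illustration under stated assumptions: each slippage event copies, exactly once, the stretch running from one short flanking repeat to the next identical copy downstream; mutation, recombination, and the actual polymerase mechanism are ignored, and the helper names and example sequence are invented.

```python
def slippage_duplicate(seq: str, flank: str) -> str:
    """Toy model of one slippage event between two identical short repeats.

    The stretch from the first occurrence of `flank` up to (not including)
    its next non-overlapping occurrence is duplicated in place, giving two
    tandem copies of a unit that starts with `flank`.
    """
    first = seq.find(flank)
    if first == -1:
        raise ValueError("flanking repeat not found")
    second = seq.find(flank, first + len(flank))
    if second == -1:
        raise ValueError("flanking repeat occurs only once")
    unit = seq[first:second]
    return seq[:second] + unit + seq[second:]


def tandem_copies(seq: str, unit: str) -> int:
    """Largest number of consecutive exact copies of `unit` found in `seq`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        n = 0
        while seq.startswith(unit, start + n * len(unit)):
            n += 1
        best = max(best, n)
    return best


if __name__ == "__main__":
    # Hypothetical sequence: two copies of the pentamer CAACT, 12 nt apart.
    seq = "GGATCAACTGTTGCATCAACTTTAG"
    unit = "CAACTGTTGCAT"
    for _ in range(3):
        print(tandem_copies(seq, unit), seq)
        seq = slippage_duplicate(seq, "CAACT")
```

Each call adds one copy of the 12-nt unit beginning with the CAACT pentamer, so the printed copy count goes from 1 to 2 to 3, while an intact CAACT remains at the downstream end of the array. Point mutations that would eventually stop this process, as described in the model, are outside the scope of this toy.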
Toward a Biological Definition of Tandem Repeat Sequences
Finally, we point out that the boundary between micro- and minisatellites varies considerably in the literature, depending on the authors. The present work allows us to propose a biological definition of these two genetic objects. In S. cerevisiae, mono- to hexanucleotide repeats are found (Richard et al. 1999), and trinucleotide repeats (a particular class of microsatellites) are found mainly in nuclear genes, often those encoding transcription factors (Richard and Dujon 1996; Alba, Santibañez-Koref, and Hancock 1999; Young, Sloan, and Van Riper 2000; Malpertuy, Dujon, and Richard 2003), whereas minisatellites are found mainly in cell wall genes, as shown by the present study. The shortest minisatellite repeat unit found in S. cerevisiae was 12 nt long, but 9-nt repeat units were found in S. paradoxus (PAN1 and BBC1). We therefore propose that the boundary between micro- and minisatellites be set at 9 nt, defining two classes of tandem repeat sequences: short tandem repeats found in transcription factor genes and longer ones found in cell wall genes.
Note Added in Proof
Recent work by G. Fink and colleagues also shows that minisatellites are frequently found in cell wall genes (Verstrepen et al. 2005).
Edward Holmes, Associate Editor
We thank our colleagues, especially C. Fairhead, B. Llorente, and J.-P. Latgé, for fruitful discussions and advice; H. Muller, C. Hennequin, and G. Fischer for sharing unpublished results; and two anonymous reviewers for helpful comments. B.D. is a member of the Institut Universitaire de France.
References
Alba, M. M., M. F. Santibañez-Koref, and J. M. Hancock. 1999. Amino acid reiterations in yeast are overexpressed in particular classes of proteins and show evidence of a slippage-like mutational process. J. Mol. Evol. 49: 789–797.
Appelgren, H., H. Cederberg, and U. Rannug. 1997. Mutations at the human minisatellite MS32 integrated in yeast occur with high frequency in meiosis and involve complex recombination events. Mol. Gen. Genet. 256: 7–17.
Bennett, J. E., K. Izumikawa, and K. A. Marr. 2004. Mechanism of increased fluconazole resistance in Candida glabrata during prophylaxis. Antimicrob. Agents Chemother. 48: 1773–1777.
Bolotin-Fukuhara, M., C. Toffano-Nioche, F. Artiguenave et al. (11 co-authors). 2000. Genomic exploration of the hemiascomycetous yeasts: 11. Kluyveromyces lactis. FEBS Lett. 487: 66–70.
Borde, V., W. Lin, E. Novikov, J. H. Petrini, M. Lichten, and A. Nicolas. 2004. Association of Mre11p with double-strand break sites during yeast meiosis. Mol. Cell 13: 389–401.
Caro, L. H., H. Tettelin, J. H. Vossen, A. F. Ram, H. van den Ende, and F. M. Klis. 1997. In silicio identification of glycosyl-phosphatidylinositol-anchored plasma-membrane and cell wall proteins of Saccharomyces cerevisiae. Yeast 13: 1477–1489.
Casarégola, S., C. Neuveglise, A. Lepingle, E. Bon, C. Feynerol, F. Artiguenave, P. Wincker, and C. Gaillardin. 2000. Genomic exploration of the hemiascomycetous yeasts: 17. Yarrowia lipolytica. FEBS Lett. 487: 95–100.
Charlesworth, B., P. Sniegowski, and W. Stephan. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215–220.
Debrauwère, H., J. Buard, J. Tessier, D. Aubert, G. Vergnaud, and A. Nicolas. 1999. Meiotic instability of human minisatellite CEB1 in yeast requires double-strand breaks. Nat. Genet. 23: 367–371.
Debrauwère, H., C. G. Gendrel, S. Lechat, and M. Dutreix. 1997. Differences and similarities between various tandem repeat sequences: minisatellites and microsatellites. Biochimie 79: 577–586.
Dib, C., S. Faure, C. Fizames et al. (14 co-authors). 1996. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380: 152–154.
Dujon, B. 1996. The yeast genome project: what did we learn? Trends Genet. 12: 263–270.
Dujon, B., D. Sherman, G. Fischer et al. (67 co-authors). 2004. Genome evolution in yeasts. Nature 430: 35–44.
Ecker, M., V. Mrsa, I. Hagen, R. Deutzmann, S. Strahl, and W. Tanner. 2003. O-mannosylation precedes and potentially controls the N-glycosylation of a yeast cell wall glycoprotein. EMBO Rep. 4: 628–632.
Ellegren, H. 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5: 435–445.
Fischer, G., S. A. James, I. N. Roberts, S. G. Oliver, and E. J. Louis. 2000. Chromosomal evolution in Saccharomyces. Nature 405: 451–454.
Fischer, G., C. Neuveglise, P. Durrens, C. Gaillardin, and B. Dujon. 2001. Evolution of gene order in the genomes of two related yeast species. Genome Res. 11: 2009–2019.
Foster, E. A., M. A. Jobling, P. G. Taylor, P. Donnelly, P. de Knijff, R. Mieremet, T. Zerjal, and C. Tyler-Smith. 1998. Jefferson fathered slave's last child. Nature 396: 27–28.
Fujimori, S., T. Washio, and M. Tomita. 2005. GC-compositional strand bias around transcription start sites in plants and fungi. BMC Genomics 6: 26.
Gerton, J. L., J. DeRisi, R. Shroff, M. Lichten, P. O. Brown, and T. D. Petes. 2000. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 97: 11383–11390.
Gill, P., A. J. Jeffreys, and D. J. Werrett. 1985. Forensic application of DNA ‘fingerprints’. Nature 318: 577–579.
Goffeau, A., B. G. Barrell, H. Bussey et al. (16 co-authors). 1996. Life with 6000 genes. Science 274: 546–567.
Haber, J. E., and E. J. Louis. 1998. Minisatellite origins in yeast and humans. Genomics 48: 132–135.
Hagelberg, E., I. C. Gray, and A. J. Jeffreys. 1991. Identification of the skeletal remains of a murder victim by DNA analysis. Nature 352: 427–429.
Hagen, I., M. Ecker, A. Lagorce et al. (11 co-authors). 2004. Sed1p and Srl1p are required to compensate for cell wall instability in Saccharomyces cerevisiae mutants defective in multiple GPI-anchored mannoproteins. Mol. Microbiol. 52: 1413–1425.
Hancock, J. M., E. A. Worthey, and M. F. Santibañez-Koref. 2001. A role for selection in regulating the evolutionary emergence of disease-causing and other coding CAG repeats in humans and mice. Mol. Biol. Evol. 18: 1014–1023.
Helminen, P., C. Ehnholm, M. L. Lokki, A. Jeffreys, and L. Peltonen. 1988. Application of DNA “fingerprints” to paternity determinations. Lancet 1: 574–576.
Hennequin, C., A. Thierry, G.-F. Richard, G. Lecointre, H. V. Nguyen, C. Gaillardin, and B. Dujon. 2001. Microsatellite typing as a new tool for identification of Saccharomyces cerevisiae strains. J. Clin. Microbiol. 39: 551–559.
Horowitz, H., and J. E. Haber. 1984. Subtelomeric regions of yeast chromosomes contain a 36 base-pair tandemly repeated sequence. Nucleic Acids Res. 12: 7105–7121.
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Jeffreys, A. J., J. Murray, and R. Neumann. 1998. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hot spot. Mol. Cell 2: 267–273.
Jeffreys, A. J., and R. Neumann. 1997. Somatic mutation processes at a human minisatellite. Hum. Mol. Genet. 6: 129–136.
Jeffreys, A. J., R. Neumann, and V. Wilson. 1990. Repeat unit sequence variation in minisatellites: a novel source of DNA polymorphism for studying variation and mutation by single molecule analysis. Cell: 473–485.
Jeffreys, A. J., K. Tamaki, A. McLeod, D. G. Monckton, D. L. Neil, and J. A. L. Armour. 1994. Complex gene conversion events in germline mutation at human minisatellites. Nat. Genet. 6: 136–145.
Klis, F. M., P. Mol, K. Hellingwerf, and S. Brul. 2002. Dynamics of cell wall structure in Saccharomyces cerevisiae. FEMS Microbiol. Rev. 26: 239–256.
Kolpakov, R., G. Bana, and G. Kucherov. 2003. mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31: 3672–3678.
Lafontaine, I., G. Fischer, E. Talla, and B. Dujon. 2004. Gene relics in the genome of the yeast Saccharomyces cerevisiae. Gene 335: 1–17.
Latgé, J.-P., and R. Calderone. 2005. The fungal cell wall. In K. Esser and R. Fischer, ed. The Mycota XIII. Springer, Berlin, Germany.
Lépingle, A., S. Casaregola, C. Neuveglise, E. Bon, H. Nguyen, F. Artiguenave, P. Wincker, and C. Gaillardin. 2000. Genomic exploration of the hemiascomycetous yeasts: 14. Debaryomyces hansenii var. hansenii. FEBS Lett. 487: 82–86.
Lobry, J. R. 1996. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13: 660–665.
Lopes, J., H. Debrauwère, J. Buard, and A. Nicolas. 2002. Instability of the human minisatellite CEB1 in rad27Δ and dna2-1 replication-deficient yeast cells. EMBO J. 21: 3201–3211.
Louis, E. J., E. S. Naumova, A. Lee, G. Naumov, and J. E. Haber. 1994. The chromosome end in yeast: its mosaic nature and influence on recombinational dynamics. Genetics 136: 789–802.
Maleki, S., H. Cederberg, and U. Rannug. 2002. The human minisatellites MS1, MS32, MS205 and CEB1 integrated into the yeast genome exhibit different degrees of mitotic instability but are all stabilised by RAD27. Curr. Genet. 41: 333–341.
Malpertuy, A., B. Dujon, and G.-F. Richard. 2003. Analysis of microsatellites in 13 hemiascomycetous yeast species: mechanisms involved in genome dynamics. J. Mol. Evol. 56: 730–741.
Marck, C. 1988. ‘DNA Strider’: a ‘C’ program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucleic Acids Res. 16: 1829–1836.
Morgante, M., M. Hanafey, and W. Powell. 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat. Genet. 30: 194–200.
Pâques, F., and J. E. Haber. 1999. Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiol. Mol. Biol. Rev. 63: 349–404.
Pâques, F., G.-F. Richard, and J. E. Haber. 2001. Expansions and contractions in 36-bp minisatellites by gene conversion in yeast. Genetics 158: 155–166.
Raghuraman, M. K., E. A. Winzeler, D. Collingwood, S. Hunt, L. Wodicka, A. Conway, D. J. Lockhart, R. W. Davis, B. J. Brewer, and W. L. Fangman. 2001. Replication dynamics of the yeast genome. Science 294: 115–121.
Richard, G.-F., and B. Dujon. 1996. Distribution and variability of trinucleotide repeats in the genome of the yeast Saccharomyces cerevisiae. Gene 174: 165–174.
———. 1997. Trinucleotide repeats in yeast. Res. Microbiol. 148: 731–744.
Richard, G.-F., C. Hennequin, A. Thierry, and B. Dujon. 1999. Trinucleotide repeats and other microsatellites in yeasts. Res. Microbiol. 150: 589–602.
Richard, G.-F., and F. Pâques. 2000. Mini- and microsatellite expansions: the recombination connection. EMBO Rep. 1: 122–126.
Röder, M. S., V. Korzun, K. Wendehake, J. Plaschke, M.-H. Tixier, P. Leroy, and M. W. Ganal. 1998. A microsatellite map of wheat. Genetics 149: 2007–2023.
Roest Crollius, H., O. Jaillon, C. Dasilva et al. (12 co-authors). 2000. Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res. 10: 939–949.
Sakamoto, T., R. G. Danzmann, K. Gharbi et al. (12 co-authors). 2000. A microsatellite linkage map of rainbow trout (Oncorhynchus mykiss) characterized by large sex-specific differences in recombination rates. Genetics 155: 1331–1345.
Subramanian, S., R. K. Mishra, and L. Singh. 2003. Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biol. 4: R13.11–R13.19.
Toth, G., Z. Gaspari, and J. Jurka. 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10: 967–981.
Verstrepen, K. J., A. Jansen, F. Lewitter, and G. Fink. 2005. Intragenic tandem repeats generate functional variability. Nat. Genet. 37: 986–990.
Waldbieser, G. C., B. G. Bosworth, D. J. Nonneman, and W. R. Wolters. 2001. A microsatellite-based genetic linkage map for channel catfish, Ictalurus punctatus. Genetics 158: 727–734.
Winckler, W., S. R. Myers, D. J. Richter et al. (11 co-authors). 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308: 107–111.
Young, E. T., J. S. Sloan, and K. Van Riper. 2000. Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics 154: 1053–1068.
© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Journal

Molecular Biology and Evolution, Oxford University Press

Published: Sep 21, 2005
