Mammalian Genome 13, 481–482 (2002). © Springer-Verlag New York Inc. 2002
Whither the mouse genome?
Steve D.M. Brown
MRC Mammalian Genetics Unit and UK Mouse Genome Centre, Harwell, OX11 0RD, UK
In the late 70's and early 80's there was an explosion of interest
in genomes and their evolution. This was the first golden era
for genomics, fueled by new techniques for DNA analysis and
the emergence of novel insights into the structure and composition
of diverse genomes. The first repeat sequence families
in the mouse were characterised and their organisation and
evolution compared between different species and sub-species.
The talk was of selfish DNA, junk DNA, concerted evolution
and molecular drive. It was realised that the genome was an
extraordinarily dynamic structure, where change may be the
driver for evolution and speciation. It is curious to contrast
that period with genomics 21st century style: the age of the
genome sequence, where we now have the opportunity to view
a genome in toto and to consider the functional dynamics and
evolution of genomes on a global scale.
The public mouse genome sequencing effort (a joint project
of Washington University, St. Louis, the Sanger Institute
and the Whitehead Institute) have recently announced version
3 of the draft assembly of the mouse genome. This represents
the most complete draft sequence of the mouse genome so far.
Sufficiently complete indeed that it deserved a closer look. At
the 2002 Cold Spring Harbor Meeting on Genome Sequencing
and Biology there were a number of presentations on these first
glimpses of the mouse genome; there were a number of surprises
and a few tantalising hints that some aspects of mouse
genetics may need some rethinking.
The draft sequence has been assembled into 224,000 contigs.
Sequence reads from clones that connect gaps between
adjacent contigs have allowed the construction of a smaller
number of supercontigs (44,500) across the genome. The
alignment and positioning of supercontigs on the framework
genetic and RH maps have also enabled the construction of an
even smaller number of ultracontigs (89 in total). Analysis of
the ultracontigs indicates that some 96% of the mouse genome
is covered by draft sequence (Bob Waterston, Wash U).
Part of the raison d'être of the mouse sequencing effort was
to provide a better annotation of the gene content of mammalian
genomes. Comparison of human and mouse genome
sequence already has a proven track record for annotating
shorter regions of the mouse and human genomes. The provision
of a mouse draft sequence now allows the comparison to
be carried out on a genome-wide scale, and reports from a
number of centres underline the value of comparative sequence
in identifying coding sequences that were previously missed.
Employing human–mouse comparisons on a genome-wide
scale to enhance classical predictive algorithms such as
GENSCAN (Michael Brent, Mouse Genome Sequencing
Consortium) gives an estimate of total gene number in the
mouse in the region of 30,000–35,000.
It has also been possible to re-evaluate the human/mouse
synteny relationships. Not surprisingly, there are many
more conserved segments (over 300) detectable between mouse
and human on the basis of global alignments (Michael
Kamal, Whitehead Institute). The increased number of conserved
segments is borne out not only by the comparative
analysis of draft sequence but also by comparison of mouse
and human finished sequence where available (Anne-Marie
Mallon, MRC UK Mouse Sequencing Consortium). It was
always to be expected that a comprehensive and fine-scale
comparison of the mouse and human genomes would reveal
additional conserved segments. This comparative analysis also
confirmed previous findings that there appear to be fewer CpG
islands in the mouse genome than in the human (Michael
Kamal, Whitehead Institute, and see Antequera and Bird, Proc.
Natl. Acad. Sci. USA 90:11995–11999, 1993).
The completion of the mouse draft has also enabled investigators
for the first time to examine the patterns of variation
between inbred strains across large genome regions. Two
centres (Richard Mural, Celera, and Claire Wade, Whitehead)
reported the analysis of SNP variation across the mouse genome.
Celera have assayed for SNPs genome-wide by comparison
of sequence reads from four inbred strains (C57BL/6J,
A/J, DBA/2J, and 129X1/SvJ) that constitute their current
mouse genome assembly. Although on average there is one
SNP every 1 kb, the distribution is clearly non-random and
there are significantly sized regions that are SNP deserts.
Similar conclusions were drawn from the work of the Whitehead,
who compared 15 Mb of publicly available finished 129
BAC sequence with the current C57BL/6J draft. One SNP desert
of 26 Mb was observed, along with 42 deserts over 5 Mb; the average
SNP desert encompassed 2 Mb. The deserts appear to represent
regions of shared ancestry between the inbred strains. Do
these regions represent a problem for the fine mapping of inbred
strain traits? Probably not, as it is likely that the loci
underlying distinguishable traits in inbred strains will lie in
SNP-rich regions and can be localised using the surrounding
sequence variation. Indeed, the definition of the
SNP landscape across multiple inbred strains may assist in the
localisation of loci determining inbred strain traits. However,
high-resolution mapping of ENU-induced mutations may be
hampered if they lie in a SNP desert that is common to many
inbred strains, in which case mappers may have to resort to
wild species or sub-species for the appropriate variant markers.
These findings also suggest that a reappraisal of the nature of
recombinant inbred (RI) strains is merited. It appears that the
variation between the parental strains for RI lines is probably
more limited than originally supposed and that certain regions
of the genome cannot be contributing to phenotype variation.
As the regions of common ancestry between inbred strains are
characterised in more detail, it will be possible to incorporate
this information into the standard strain distribution patterns
of RI lines or, for that matter, recombinant congenic and
consomic lines as well.
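
As an illustration of how SNP deserts of the kind described above might be located, the following minimal Python sketch scans the SNP coordinates on a single chromosome and reports every SNP-free interval that reaches a chosen length. The function name find_snp_deserts, the min_gap threshold and the strict zero-SNP criterion are assumptions made here for illustration only; they are not the methods used by Celera or the Whitehead, which are not described in this report.

```python
from typing import List, Tuple


def find_snp_deserts(
    snp_positions: List[int], chrom_length: int, min_gap: int
) -> List[Tuple[int, int]]:
    """Return (start, end) intervals of at least min_gap bp with no SNPs.

    Hypothetical helper: a "desert" is simplified here to mean a stretch
    between consecutive SNPs (or a chromosome end) that is min_gap or longer.
    """
    deserts = []
    previous = 0  # treat the start of the chromosome as the first boundary
    for pos in sorted(snp_positions):
        if pos - previous >= min_gap:
            deserts.append((previous, pos))
        previous = pos
    # check the stretch between the last SNP and the chromosome end
    if chrom_length - previous >= min_gap:
        deserts.append((previous, chrom_length))
    return deserts


# Toy coordinates (invented for illustration, not real mouse data)
toy_snps = [1_000, 2_000, 3_000, 5_003_000, 5_004_000]
toy_deserts = find_snp_deserts(toy_snps, chrom_length=6_000_000, min_gap=1_000_000)
```

With these toy coordinates, the only interval reported is the 5 Mb stretch between positions 3,000 and 5,003,000. A density-based definition (few SNPs per window rather than none) would tolerate occasional sequencing errors and may be closer to what is needed in practice.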
Two further interesting features of the mouse draft sequence
have a bearing upon evolution. The first is the presence
of so-called "gene deserts", regions of the genome that appear
Correspondence to: S.D.M. Brown; e-mail: email@example.com