Characterization of a cluster comprising ∼100 odorant receptor genes
Shirley Y. Xie, Paul Feinstein, Peter Mombaerts
The Rockefeller University, 1230 York Avenue, New York, New York 10021, USA
Received: 27 June 2000 / Accepted: 17 August 2000
Abstract. With ∼1000 genes, the odorant receptor (OR) gene
repertoire is the largest gene family in the mouse genome. Here we
have established a 129/Sv BAC contig for mouse OR gene cluster
7 (Olfr7) on Chromosome (Chr) 9. The assembled ∼2-Mb contig
consists of 75 BACs and may contain as many as 100 OR genes,
or ∼10% of the mouse repertoire. Facilitated by the lack of introns
in the coding region, we have determined the nucleotide sequence
of 37 full-length, 2 partial, and 3 pseudo coding regions. These 42
OR genes and 3 additional OR genes previously mapped to the
mouse Olfr7 cluster can be organized into 13 classes based on OR
probe cross-hybridizations with 129/Sv mouse genomic DNA. OR
genes belonging to the same class tend to be located next to each
other within the cluster. Comparison of published full-length
mouse and rat OR coding sequences with those identified here
shows that the Olfr7 OR genes are highly related to each other,
clustering on two major branches of an unrooted phylogenetic tree.
Eight ORs contain an unusual NXC sequon at the amino-terminal
extracellular domain that may represent a novel N-linked glycosylation
site. The BAC contig presented here provides the substrate
for sequencing of the cluster.
The initial step in olfaction involves the interaction of odorous
ligands with ORs located on the cilia of olfactory sensory neurons.
A molecular basis for odorant recognition was provided by the
discovery of a large family of genes that code for seven-
transmembrane (7TM) domain proteins (Buck and Axel 1991).
Functional expression assays have provided evidence that these
receptors can recognize odorants (Krautwurst et al. 1998; Zhao et
al. 1998; Murrell and Hunter 1999; Touhara et al. 1999). It is
estimated that in mouse, the OR gene family may contain as many
as 1000 distinct genes, comprising 2% of all genes; it would thus
be the largest known gene family in any mammalian genome (for
reviews, see Mombaerts 1999a, 1999b). Identified OR genes reside
within clusters located in at least 16 different loci spread over at
least 10 different chromosomes in the mouse genome. The number
of OR genes contained in a cluster ranges from one to approxi-
While many full-length coding sequences (cds) are found in
human sequence databases, comparatively few sequences are
available for the mouse. Furthermore, although detailed genomic
characterization of OR gene clusters has been reported in human
(Ben-Arie et al. 1994; Buettner et al. 1998; Trask et al. 1998a,
1998b; Brand-Arpon et al. 1999; Sharon et al. 1999; Glusman et al.
2000), much less is known about the genomic organization of OR
gene clusters in mouse (Sullivan et al. 1996; Strotmann et al. 1999;
Tsuboi et al. 1999). In human, the high percentage of pseudogenes
and the lack of experimental tractability complicate the functional
exploration of this wealth of information. Here, we characterize
the largest known cluster of OR genes in any mammalian genome.
Materials and methods
BAC libraries and screening.
The MGS1 BAC libraries, consisting of
129/Sv mouse embryonic stem cell BAC libraries I and II, were purchased
from Genome Systems (St. Louis, Mo.).
The screening was performed as follows: BAC membranes were pre-wet
and hybridized in hybridization buffer (5× SSPE, 5× Denhardt's, 0.1%
SDS, and 10 μg/ml salmon sperm DNA) overnight at 65°C. Membranes
were then washed twice in 2× SSC, 0.1% SDS for 20 min at the same
temperature. Positive clones were identified after overnight or longer
exposure at −80°C. The probes were generated from the TM III–V region of
individual OR genes by PCR with gene-specific primers. The PCR reaction
consisted of 50 ng of DNA template, 100 ng of each oligonucleotide
primer, 25 mM […], 2.5 mM of each dNTP, 2.5 μl of 10× PCR buffer,
and 1 unit Taq DNA polymerase (Perkin-Elmer, Gaithersburg, MD) in a
total volume of 25 μl. The PCR reaction was performed under the following
conditions: denaturation at 95°C for 5 min; 35 cycles of 30 s at 95°C,
30 s at 55°C, and 1 min at 72°C; followed by 5 min at 72°C. PCR products
were then purified by low-melting-point agarose, labeled with […]
by using random labeling (Prime-It RmT, Stratagene, Cedar Creek, Tex.),
and combined to make a pool of probes as desired.
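As a rough check on the cycling program described above, the nominal hold time can be tallied in a few lines of Python. This is only a minimal sketch: the helper name and data layout are illustrative assumptions, and ramp times between temperatures, which depend on the thermocycler, are ignored.

```python
# Nominal hold times for the PCR program described in the text.
# Ramp times between steps are instrument-dependent and are NOT included.

INITIAL_DENATURATION_S = 5 * 60   # 95°C for 5 min
CYCLE_STEPS_S = [30, 30, 60]      # 95°C for 30 s, 55°C for 30 s, 72°C for 1 min
N_CYCLES = 35
FINAL_EXTENSION_S = 5 * 60        # 72°C for 5 min


def nominal_runtime_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum the programmed hold times (hypothetical helper, not from the paper)."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


total_s = nominal_runtime_seconds(
    INITIAL_DENATURATION_S, CYCLE_STEPS_S, N_CYCLES, FINAL_EXTENSION_S
)
print(f"Nominal hold time: {total_s} s ({total_s / 60:.0f} min)")
```

Under these assumptions the program holds for 4800 s, or 80 min, before any ramping overhead.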
Dot blot hybridization and BAC Southern hybridization.
BAC clones were grown in 40 μl microcultures for 4 h in a 96-well dish at 37°C.
An aliquot of 20 μl was spotted onto nitrocellulose membranes (NEN Life
Science Products, Boston, Mass.) and processed by alkaline lysis and
neutralization treatment. The membranes were auto-cross-linked (UV
Stratalinker 2400, Stratagene, La Jolla, Calif.) and hybridized to individual OR
probes prepared as described above in 10 ml QuikHyb hybridization
solution (Stratagene, La Jolla, Calif.) for 2 h at 65°C. Then, membranes were
washed twice, once in 2× SSC, 0.1% SDS and once in 0.5× SSC, 0.1%
SDS, for 15 min at the same temperature. All hybridizing clones could be
identified after 6 h exposure at −80°C.
BAC DNA was prepared from 2 ml of overnight culture by a standard
alkaline lysis method and resuspended in 35 μl of TE buffer. An aliquot of
10 μl was digested in a total volume of 20 μl with restriction enzyme
HindIII for 2 h at 37°C. DNA samples were separated in a 0.7% agarose gel
overnight in 1× TAE (0.04 M Tris-acetate, 0.001 M EDTA) and then
transferred to a piece of nitrocellulose membrane overnight in 0.4 […].
Membranes were hybridized to a genomic pool of OR probes generated by
PCR on 129/Sv mouse genomic DNA with a pair of degenerate
oligonucleotide primers for ORs, P24 and P28 (Ressler et al. 1993). The
hybridization procedure is the same as for the dot blot hybridizations. BAC
clone overlaps were analyzed manually.
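The overlap analysis in this study was done manually. Purely to illustrate the underlying logic, the sketch below flags two BACs as candidate overlaps when they share probe-positive HindIII fragments of similar size. The clone names, fragment sizes, size tolerance, and minimum number of shared bands are all hypothetical assumptions, not values from this work.

```python
from itertools import combinations

# Hypothetical data: sizes (kb) of OR-probe-positive HindIII fragments per BAC.
fingerprints = {
    "BAC_A": [2.1, 4.5, 7.8],
    "BAC_B": [4.5, 7.9, 11.0],
    "BAC_C": [11.1, 15.2],
    "BAC_D": [3.3],
}


def shared_bands(bands_x, bands_y, tolerance_kb=0.2):
    """Count bands in bands_x that match a not-yet-used band in bands_y."""
    remaining = list(bands_y)
    count = 0
    for size in bands_x:
        match = next((b for b in remaining if abs(b - size) <= tolerance_kb), None)
        if match is not None:
            remaining.remove(match)  # each band may be matched only once
            count += 1
    return count


def candidate_overlaps(fingerprints, min_shared=1, tolerance_kb=0.2):
    """Return (clone, clone, shared band count) for every pair above threshold."""
    pairs = []
    for a, b in combinations(sorted(fingerprints), 2):
        n = shared_bands(fingerprints[a], fingerprints[b], tolerance_kb)
        if n >= min_shared:
            pairs.append((a, b, n))
    return pairs


overlaps = candidate_overlaps(fingerprints)
for a, b, n in overlaps:
    print(f"{a} - {b}: {n} shared band(s)")
```

With this toy input, BAC_A and BAC_B share two bands and BAC_B and BAC_C share one, which would suggest the order A, B, C, while BAC_D remains unplaced. Real data would call for a tolerance matched to the gel resolution and a check by eye, as was done here.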
Insert size estimation of BAC clones.
BAC clone inserts were released
from pBeloBAC11 vectors by NotI restriction enzyme digestion. The insert
sizes were determined by field-inversion gel electrophoresis (FIGE Mapper
Electrophoresis System, Bio-Rad, Hercules, Calif.) under the following
conditions: 1% agarose gel (SeaKem LE agarose, FMC BioProducts,
The nucleotide sequence data reported in this paper have been submitted to
GenBank. Their accession numbers are displayed in Table 2.
Correspondence to: P. Mombaerts; E-mail: firstname.lastname@example.org
Mammalian Genome 11, 1070–1078 (2000).
© Springer-Verlag New York Inc. 2000