Mouse H-Y encoding Smcy gene and its X Chromosomal
Alexander I. Agulnik,
Maria T. Ty,
Colin E. Bishop
Department of Obstetrics and Gynecology, 6550 Fannin St., Baylor College of Medicine, Houston, Texas 77030, USA
INSERM Unite 406, Marseille, France
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
Received: 3 March 1999 / Accepted: 13 May 1999
SMCY is present on the Y in mouse, human, and most mammals.
Importantly, it is present on the marsupial Y, showing that SMCX/
SMCY diverged at least 120 Myr ago (Agulnik et al. 1994a).
Unlike many other Y-Chromosomal genes, SMCY is present in a
single copy in mouse and man. Both Y and X Chr homologs are
widely transcribed in all male tissues, and the X gene is expressed
in all female tissues tested including preimplantation mouse em-
bryos (Agulnik et al. 1994a). SMCX escapes X-inactivation in
human and mouse (Agulnik et al. 1994b; Carrel et al. 1996; Je-
galian and Page 1998; Sheardown et al. 1996; Wu et al. 1994a,
1994b), indicating that the X and Y copies are functionally inter-
changeable and that transcript dose is important for the expression
of this gene. Although the biological role of the genes remains
unknown, it has been established that human and mouse SMCY
contain epitopes of the minor histocompatibility antigen, H-Y
(Ehrmann et al. 1997; Markiewicz et al. 1998; Meadows et al.
1997; Scott et al. 1995; Wang et al. 1995). In this paper we report
the complete cDNA sequence of the mouse Smcy and Smcx genes.
We have established the genomic structure of the Smcy gene and
show that the gene is composed of 26 exons spread over 47 kb of
genomic DNA. The longest open reading frame encodes 1548
amino acids for Smcy and 1551 amino acids for Smcx. The geno-
mic structure of SMCY is entirely conserved between mouse and
human. As yet, its biological role is not understood, but the ho-
mology to RBP2 (retinoblastoma-binding protein 2) and the pres-
ence of a zinc-finger domain indicate a possible involvement of the
SMCX/Y (SMCX and SMCY) proteins in DNA binding and tran-
A 1.8-kb cDNA of the mouse Smcy gene and 3 kb of the mouse
Smcx gene identified previously (Agulnik et al. 1994a,b) have
been used in the present study to clone and sequence full-length
transcripts of the genes. RT and RACE PCR, as well as conven-
tional screening of testis cDNA libraries, have been employed to
obtain a series of overlapping cDNA fragments representing
the missing 3′ and 5′ ends of both genes. The Smcy sequence is
5316 bp, and Smcx is 5673 bp, with open reading frames (ORFs) of
4647 bp and 4656 bp, respectively. The transcripts are predicted to
encode polypeptides of 1548 aa, MW 177 kDa (Smcy), and 1551
aa, MW 175 kDa (Smcx).
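The relation between the ORF lengths and the predicted polypeptide lengths quoted above can be checked with a short calculation. The sketch below is illustrative only; it assumes that the reported ORF lengths include the terminating stop codon, which the text does not state explicitly, and the function name orf_to_residues is hypothetical.

```python
def orf_to_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumption (not stated in the paper): the ORF length counts the stop codon.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # The stop codon is part of the ORF but does not encode a residue.
    return codons - 1 if includes_stop else codons


print(orf_to_residues(4647))  # Smcy: 1548
print(orf_to_residues(4656))  # Smcx: 1551
```

Under that assumption, both reported ORF lengths agree with the reported polypeptide lengths.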
The translation start codon in Smcy is situated in the context
A, which is close to the optimal ACCATGG present in
the Smcx sequence (Kozak 1986). In the Smcy transcript, two
consensus polyadenylation signals (AATAAA) lie 266 bp and 18
bp upstream of the poly(A) tail. In Smcx there is only one polyade-
nylation signal, 55 bp upstream of the poly(A) tail.
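Positions of this kind can be located by a simple scan for the AATAAA hexamer. The following is a minimal sketch on a made-up toy sequence, not the Smcy or Smcx transcript. It assumes that distance is measured from the first base of the hexamer to the first base of the poly(A) tail (the paper does not specify the convention), and the function name and parameters are hypothetical.

```python
def polya_signal_distances(seq: str, tail_start: int, signal: str = "AATAAA") -> list[int]:
    """Distances (bp) from each signal start to the start of the poly(A) tail.

    seq        -- cDNA sequence including the poly(A) tail
    tail_start -- 0-based index of the first base of the poly(A) tail
    Only signals lying entirely upstream of the tail are reported.
    """
    seq = seq.upper()
    distances = []
    i = seq.find(signal)
    while i != -1 and i + len(signal) <= tail_start:
        distances.append(tail_start - i)
        i = seq.find(signal, i + 1)  # step by one so overlapping hits are not missed
    return distances


# Toy example (invented sequence): one signal placed 18 bp upstream of the tail.
upstream = "GGC" + "AATAAA" + "C" * 12
toy = upstream + "A" * 20
print(polya_signal_distances(toy, tail_start=len(upstream)))  # [18]
```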
During cDNA isolation we found several splice variants of the
Smcy and Smcx transcripts. Several splice variants of the SMCY
gene were also recovered from human testis and lymphocyte
RNA (data not shown). Different transcripts apparently
utilizing different splice sites have also been reported for the hu-
man SMCX gene (Wu et al. 1994a) and for the SMCY gene
(cDNA KIAA0234, GenBank accession # D87072).
Specific differences between the amino acid sequences encoded by the
SMCY and SMCX genes, coupled with their widespread expres-
sion pattern (Agulnik et al. 1994a), form the basis of the H-Y
male-specific minor antigen system. Thus, Smcy has been shown
to encode several H-Y antigen epitopes, the positions of which are
shown in Fig. 1 (Ehrmann et al. 1997; Markiewicz et al. 1998;
Meadows et al. 1997; Scott et al. 1995; Wang et al. 1995).
As shown in Fig. 1, proteins derived from the translation of the
open reading frames of the human and mouse SMCX/SMCY
genes are highly homologous to each other. Overall amino acid
sequence similarities are as follows: Smcx/SMCX 96%; Smcy/
SMCY 84%; Smcy/Smcx 84%; Smcy/SMCX 83%; and Smcx/
SMCY 86%. In all proteins the C-terminal end encoded by the last
exon is the least conserved. The similarity between the amino acid
sequences of Smcx and Smcy in this region is only 32%, compared
with an overall value of 84%.
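Pairwise percentages of this kind are computed from an alignment of the two protein sequences. The sketch below shows the simpler case of percent identity over aligned, gap-free columns. It is an illustration under stated assumptions, not the method used in the paper: the alignment program and scoring scheme are not given in the text, similarity (as opposed to identity) additionally requires a substitution matrix, and the toy sequences are invented.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Columns containing a gap in either row are excluded from the comparison.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (invented): 7 identical residues in 9 gap-free columns.
print(round(percent_identity("MEPGSDDF-L", "MEPGCDEFLL"), 1))  # 77.8
```

Other conventions for the denominator exist (for example, the full alignment length or the shorter sequence length), and the choice changes the reported percentage.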
The closest identified homolog of the SMCX/SMCY gene pair
is the human retinoblastoma binding protein 2 (RBP2; Fataey et al.
1993). The overall identity of the latter protein to the products of
the mouse X and Y genes is 58% (67% similarity) and 57% (65%
similarity), respectively. The homology is higher at the N-terminal
end of the sequence. At the C-terminal end, the middle part of the
twenty-third exon of Smcy encodes 57 amino acids, which are
highly conserved among mouse and human SMCX/Y proteins and
RBP2 (89% identity). The RBP2 protein contains two significant
domains—a zinc-finger at the N-terminal end and a homeodomain
similar to the engrailed family of homeotic genes in the middle
part of the sequence (Fataey et al. 1993). The zinc-finger domain
is well conserved between SMCX/Y products and RBP2 protein
(70% identity), but the homeodomain is less conserved (53–57%
identity). The retinoblastoma product (pRB) binding domain (a
stretch of amino acids shared by all retinoblastoma-binding pro-
teins) is not conserved in any SMCX/Y protein.
At the nucleotide and amino acid level, database searching
revealed that both genes have several homologs: yeast hypothetical
85.0 kDa protein (Accession # P47156); mouse jumonji protein
(Q62315); yeast putative 90.2 kDa zinc-finger protein (P39956);
and others. SMCX/Y proteins contain a zinc-finger domain as part
of a highly conserved PHD finger, a cysteine-rich region encoded
by the eighth exon of the gene. Such a domain is present in a
number of putative proteins derived from yeast, mammals, and
plants (Aasland et al. 1995).
To obtain Smcy genomic clones, we have screened a mouse
Correspondence to: A.I. Agulnik
The nucleotide sequence data reported in this paper have been submitted to
GenBank and have been assigned the accession numbers AF127244 and
Mammalian Genome 10, 926–929 (1999).
© Springer-Verlag New York Inc. 1999