Seixas, Susana; Suriano, Gianpaolo; Carvalho, Filipa; Seruca, Raquel; Rocha, Jorge; Di Rienzo, Anna

Abstract

The superfamily of serine protease inhibitors (SERPINs) plays a key role in controlling the activity of proteinases in diverse biological processes. α1-antitrypsin (SERPINA1), the most studied member of this family, is encoded by a gene located within the proximal 14q32.1 SERPIN subcluster, together with the highly homologous α1-antitrypsin–like sequence (SERPINA2), which was previously proposed to be a pseudogene. Here, we performed a resequencing study encompassing both SERPINA1 and SERPINA2 as well as the adjacent gene coding for corticosteroid-binding globulin (SERPINA6) in samples from Europe and West Africa. In the African sample, we found that a common haplotype carrying a 2-kb deletion in the SERPINA2 gene is associated with remarkable long-range homozygosity, as if it had been quickly driven to high frequency by natural selection acting on an advantageous variant. An analysis of the HapMap Phase I data for the Yoruba sample confirmed that variation in this subcluster carries a strong signal of positive selection. We also show that the SERPINA2 gene is expressed and probably encodes a functional SERPIN. Finally, comparisons with orthologous sequences in nonhuman primates showed that SERPINA2 is present in some great apes, but in chimpanzees it was lost by a deletion event independent from that observed in humans. In agreement with the “less is more” hypothesis, we propose that loss of SERPINA2 is an ongoing process associated with a selective advantage during recent primate evolution, possibly because of a role in fertility or in host–pathogen interactions.

Keywords: natural selection, pseudogenization, SERPIN

Introduction

Serine protease inhibitors (SERPINs) are a superfamily of highly conserved proteins that are widely distributed among animals, plants, viruses, and bacteria.
In vertebrates, these proteins act mainly as protease inhibitors in a number of biological processes such as blood coagulation, complement activation, fibrinolysis, tissue repair, inflammation, and tumor suppression. However, a small fraction of SERPINs have lost their inhibitory ability and developed other specialized roles, such as hormone carriers, chaperones, or storage proteins (Stein and Carrell 1995; Irving et al. 2000, 2002). SERPIN functional diversity and substrate specificity are essentially determined by variation at a reactive center defined by a short stretch of amino acids. This reactive center accumulated substitutions at a greater rate than the rest of the molecule, probably as a result of positive natural selection (Hill and Hastie 1987; Creighton and Darby 1989; Goodwin et al. 1996). The inhibitory function of SERPINs involves a high level of molecular plasticity, which renders the molecules particularly vulnerable to mutations. Single amino acid changes affecting their mobile reactive loop can cause abnormal protein folding and may lead to the pathogenic polymerization processes that underlie the recently recognized category of conformational diseases (Stein and Carrell 1995; Carrell and Lomas 2002; Lomas and Carrell 2002). This concept is behind the most common serpinopathy, α1-antitrypsin (SERPINA1) deficiency, which is mainly caused by homozygosity for an E342K mutation (the Z allele), affecting 1 in 2,000 to 1 in 7,000 individuals of European ancestry (WHO 1997). The major clinical manifestations of SERPINA1 deficiency are early pulmonary emphysema, due to the unopposed action of neutrophil elastase in the lower respiratory tract, and hepatic disease caused by the cytotoxic effect of protein accumulation in the rough endoplasmic reticulum of hepatocytes (Cox 1995; WHO 1997; Needham and Stockley 2004).
Furthermore, SERPINA1 has been extensively characterized in human populations as a classical protein polymorphism with 5 additional common alleles: M1Ala213, M1Val213, M2, and M3, all with normal circulating protein levels, and a mildly deficient S variant (Cox 1995; Nukiwa et al. 1996). In humans, SERPINA1 is part of a gene cluster that spans ∼370 kb on chromosome 14q32.1 and includes 10 additional members of the SERPIN superfamily. Within this cluster, the SERPIN genes are organized into 3 distinct subclusters. The proximal subcluster contains the α1-antitrypsin (SERPINA1), α1-antitrypsin–like (SERPINA2), corticosteroid-binding globulin (SERPINA6), and protein Z inhibitor (SERPINA10) genes. The central subcluster harbors the recently characterized vaspin (SERPINA12), centerin (SERPINA9), and antiproteinase-like 2 (SERPINA11) genes. Finally, the distal subcluster contains the kallistatin-like (KAL-like), α1-antichymotrypsin (SERPINA3), protein C inhibitor (SERPINA5), and kallistatin (SERPINA4) genes (Namciu et al. 2004; Marsden and Fournier 2005). All these genes have significant sequence similarity, and most share a common gene structure with 1 untranslated exon and 4 coding exons. Accordingly, it has been proposed that they evolved from a common ancestral gene through a series of duplication events (Atchley et al. 2001; van Gent et al. 2003). Except for SERPINA2 (Bao et al. 1988; Kelsey et al. 1988), all the members of the chromosome 14 cluster were previously shown to be expressed (Namciu et al. 2004; Marsden and Fournier 2005). Although the gene structure and sequence of SERPINA2 are very similar to those of SERPINA1, no promoter could be identified for SERPINA2 by sequence homology, leading to the proposal that this is a pseudogene (Bao et al. 1988; Hofker et al. 1988; Marsden and Fournier 2005). However, different studies have yielded contrasting results regarding the extent of SERPINA2 sequence degeneration. Bao et al.
(1988) reported a sequence with preserved RNA splice sites and no premature stop codons, suggesting that the SERPINA2 gene, if expressed, could encode a new secretory SERPIN with different substrate specificity. In contrast, Hofker et al. (1988) reported a cloned sequence, which bears a critical mutation in the start codon (ATG → ATA) and an ∼2-kb deletion encompassing exon IV and part of exon V. This deletion was shown to occur at a 30% frequency in a sample from the Dutch population (Hofker et al. 1988). The characterization of copy number polymorphism in the human genome, including insertions and deletions from a few to hundreds of kilobases, has been the focus of a number of recent surveys (Sebat et al. 2004; Sharp et al. 2005; Tuzun et al. 2005; Conrad et al. 2006; Feuk et al. 2006; Hinds et al. 2006; McCarroll et al. 2006). Although these genome-wide studies have provided a great deal of information on structural variation, little is known about the role of these variants in common disease phenotypes and in human adaptations. In particular, the “less is more” hypothesis posits that loss of function mutations, such as deletions, are an important substrate for natural selection and may be the basis for many evolutionary adaptations (Olson 1999). Hence, evolutionary studies of genomic regions harboring polymorphic deletions, such as the proximal SERPIN 14q32.1 subcluster, will contribute to understanding the significance of this class of common genetic variation. Here, we performed a resequencing study of the proximal SERPIN 14q32.1 subcluster encompassing the 2 most closely related genes, SERPINA1 and SERPINA2, and the adjacent SERPINA6 gene. By surveying both coding and noncoding regions in 2 ethnically distinct samples from Europe and Africa, we aimed to provide a deeper understanding of the evolution of this subcluster. 
We found an unusual pattern in the African sample, resulting from the high frequency of a haplotype carrying a 2-kb deletion in the SERPINA2 gene. This haplotype shows considerable long-range homozygosity across the surveyed region, suggesting that it quickly reached high frequency due to the action of positive natural selection. Furthermore, we show that the nondeleted form of SERPINA2 is expressed in different human tissues and that the gene is deleted in chimpanzee, but intact in other great apes. These data taken together suggest that recent positive selection favored the loss of SERPINA2 function and that the pseudogenization process is still ongoing in humans.

Materials and Methods

DNA Samples

Sequence variation was surveyed in DNA samples from unrelated individuals, with known SERPINA1 protein phenotypes, belonging to 2 populations from different backgrounds: Portugal and São Tomé (Gulf of Guinea, West Africa). The island of São Tomé, located 240 km off the coast of Gabon, was peopled at the end of the 15th century by slaves from the nearby coasts of Africa and hence retained the high levels of genetic diversity that are generally observed in the African mainland (Tomas et al. 2002). Population samples of 40 chromosomes of Portuguese origin and 40 chromosomes from the island of São Tomé were selected from larger samples in which SERPINA1 protein polymorphism had been previously studied (Seixas et al. 2001). All samples were collected with informed consent. Orthologous regions of the proximal SERPIN subcluster were also sequenced in 1 chimpanzee (Pan troglodytes), 1 gorilla (Gorilla gorilla), and 1 orangutan (Pongo pygmaeus).

Polymerase Chain Reaction and Sequence Determination

Primers for amplification and sequencing were designed on the basis of GenBank (http://www.gdb.org/) sequence entries AL132708 for SERPINA1 and AL117259 for SERPINA2 and SERPINA6; all nucleotide positions in this article are numbered according to these sequences.
To distinguish between chromosomes bearing SERPINA2 deletion and nondeletion alleles, we performed allele-specific polymerase chain reaction (PCR) using the following reverse primers: 5′-AGT TGG TGC CAT ACA CTA AT-3′ for the deletion and 5′-AGT TGG TGA TGT CAT CCT TG-3′ for the nondeletion. Sequencing was performed using the ABI BigDye Terminator version 3 cycle sequencing chemistry (Applied Biosystems, Foster City, CA), and electrophoresis analysis was done on an ABI 3100 automated sequencer. All human sequences were assembled and analyzed using the Phred-Phrap-Consed package (Nickerson et al. 1997). All putative polymorphisms and software-derived genotype calls were visually inspected and were individually confirmed using Consed. Details about PCR and sequencing conditions are available from the authors upon request.

Expression Studies by Real-Time PCR

To study the expression of SERPINA2, real-time (RT) PCR experiments were performed using cDNA synthesized by reverse transcription assays in total mRNA from liver, leukocytes, and testis samples. Additionally, the expression of the known functional gene SERPINA1 was studied for comparison. Reverse transcription was performed using the Superscript II RT PCR system (Invitrogen Life Technology, Carlsbad, CA), according to the manufacturer's protocol. Quantitative RT PCR reactions were performed on an ABI Prism 7000 Sequence Detection System with the TaqMan Universal Master Mix (Applied Biosystems), according to the manufacturer's instructions. The expression level of the rRNA gene was used as control. All primers and probes were designed by using Primer Express version 2.0 software. In order to prevent cross hybridization, primers were anchored on a region of low sequence similarity at the exon II–III junction. Primers for SERPINA1 were 5′-GTC AAA CAC CTG AAA AAA GAC ACA A-3′ and 5′-CAC TTG CCG TGA AAG GAA ATG-3′, and the probe was 5′-TCT TGC CCT GGT GGA T-3′.
For SERPINA2, we used the primers 5′-GGA GCT TGA CAG AGA CAC TTT T-3′ and 5′-GGT CTC TCC CAT TTG CCT TTA A-3′ and a 5′-CTC TGG TGA ATT ACA TCT T-3′ probe. For each gene, primers and probe concentrations were 900 nM and 250 nM, respectively. The cycling conditions were as follows: 50 °C for 2 min, 95 °C for 10 min, and 40 cycles of 95 °C for 15 s and 60 °C for 1 min. The expression levels of SERPINA1 and SERPINA2 were normalized relative to the expression level of rRNA. All reactions were run in triplicate.

Statistical Analysis

Summary statistics of population genetic variation were calculated using the online applications SLIDER (http://genapps.uchicago.edu/slider/index.html) and MAXDIP (http://genapps.uchicago.edu/labweb/index.html). Haplotypes for SERPINA2 in deletion-bearing heterozygotes were phased unambiguously by allele-specific PCR and sequencing. The remaining haplotypes were inferred in the total sample by using the program PHASE 2.02 (Stephens et al. 2001; Stephens and Donnelly 2003). |D′| and r2 were calculated from the inferred haplotypes using the DNAsp program, version 4.0 (Rozas J and Rozas R 1997). The haplotype test described by Hudson et al. (1994) was performed by simulating 10,000 replicates under neutrality, using estimates of the population recombination and mutation rate parameters calculated for our data using MAXDIP and SLIDER, respectively. We initially performed the test using the best reconstruction of haplotypes provided by the program PHASE. To assess the robustness of results to misspecification of haplotype phase, the tests were subsequently rerun on 100 samples of haplotypes drawn from the posterior distribution provided by PHASE. The extended haplotype homozygosity statistic (EHH) (Sabeti et al. 2002) was computed using the online tool EHH calculator (http://ihg.gsf.de/cgi-bin/mueller/webehh.pl). To assess the statistical significance of EHH, we performed the long-range haplotype (LRH) test (Sabeti et al.
2002) by comparing the observed values of the relative EHH with theoretical null distributions generated by coalescent simulations with recombination, assuming no selection (Hudson 2002). We simulated 500-kb regions under 4 different demographic models: constant population size, expansion, and bottleneck models described in Voight et al. (2005) and the structured population model used by Sabeti et al. (2002). Comparison of haplotype frequencies versus relative EHH and significance estimation were carried out using the program Sweep 1.0 (http://www.broad.mit.edu/mpg/sweep). Values of the integrated haplotype score (iHS) statistic (Voight et al. 2006) for the HapMap Phase I data (http://www.hapmap.org/) and their P values were obtained using the online tool Haplotter (http://hg-wen.uchicago.edu/selection/haplotter.htm).

Results

To characterize the patterns of variation across the proximal SERPIN subcluster at 14q32.1, we surveyed 7 DNA fragments, spanning a total of 18.7 kb and covering the coding and adjacent noncoding segments depicted in figure 1. All segments were sequenced in each individual in the samples from the sub-Saharan African population of São Tomé and from the European population of Portugal, in which the common polymorphisms of SERPINA1 had been previously typed (Seixas et al. 2001).

FIG. 1. Surveyed segments of the 14q32.1 SERPIN proximal subcluster. Upper line shows the relative position of the 3 genes on the cluster and lower lines the exon/intron structure of each gene (exons are represented by boxes). Large arrows indicate the regions surveyed for sequence variation (F1–F7). The dotted line represents the 2-kb deletion of the SERPINA2 gene.

We also sequenced the orthologous regions in 1 chimpanzee, 1 gorilla, and 1 orangutan, to infer the ancestral states for each human polymorphism. Alignment of the human (AL117259) and chimpanzee (NW_115886) genomic reference sequences showed that an ∼7.5-kb region orthologous to positions 20714–28182 in the human sequence and spanning the entire SERPINA2 gene was absent in the chimpanzee. To evaluate whether the missing sequence represented a genuine deletion, we performed PCR sequencing in our chimpanzee sample using primers specific to regions flanking the putative deletion (AL117259 19321–19342 and 29646–29668). The results indicated that both chromosomes analyzed harbor the ∼7.5-kb deletion encompassing the whole SERPINA2 gene, suggesting that this gene deletion is not rare and is, perhaps, fixed in chimpanzee. In contrast, gorilla and orangutan sequences showed an intact SERPINA2 gene.

Sequence Variation and Polymorphism Levels

We found a total of 130 polymorphic sites, including 14 nonsynonymous, 13 synonymous, and 103 noncoding mutations. In SERPINA1, we observed 5 amino acid replacements resulting in the common protein variants previously described (R101H, A213V, E264V, E342K, and E376D). In SERPINA6, we found a previously identified A224S polymorphism (Smith et al. 1992; Torpy et al. 2004). In SERPINA2, we confirmed that a 2,024-bp deletion spanning from intron III to exon V (fig. 1) is a common polymorphism, occurring in 23 and 7 chromosomes in the São Tomé and Portuguese samples, respectively. Within the deletion-carrying chromosomes, we additionally identified 1 mutation in the start codon (ATG → ATA) and 1 amino acid replacement variant (Q102L).
In the chromosomes not carrying the deletion, we identified 2 frameshift mutations leading to premature stop codons (L108fs and L277fs) and 4 amino acid replacement variants (I280T, L308P, E330K, and P387L). Except for I280T and E330K, these amino acid replacements are likely to alter the protein structure based on the computational predictions of Polyphen (Ramensky et al. 2002). In the sample from São Tomé, the overall frequency of chromosomes bearing mutations likely to affect SERPINA2 function is 85% (34/40) (i.e., 57.5% deletion, 10% frameshift mutations, and 17.5% mutations predicted to affect protein structure). In the Portuguese sample, the frequency is 67.5% (27/40) (i.e., 17.5% deletion, 7.5% frameshift mutations, and 42.5% mutations predicted to affect protein structure). Summary statistics of the polymorphism and sequence divergence data are shown in table 1. Variation within the deleted fragment of SERPINA2 was omitted from this analysis. Polymorphism levels as summarized by nucleotide diversity (π), which is based on average number of differences between sequences, and by the estimator of the population mutation rate parameter θw (Watterson 1975), which is based on the number of polymorphic sites and sample size, are slightly higher than the genome-wide average in humans (Crawford et al. 2005; Stajich and Hahn 2005). The São Tomé sample shows the highest number of polymorphic sites in all surveyed regions. Estimates of the population recombination rate parameter (4Ner) based on the composite likelihood estimator ρH01 (Frisse et al. 2001; Hudson 2001) are higher than genome-wide average values (4 × 10−4 and 2 × 10−4 for African and European–Americans, respectively) (Serre et al. 2005) and fall within the range estimated for subtelomeric regions (Serre et al. 2005). These results are consistent with the high recombination rates (2 cM/Mb) reported for the interval spanning the SERPIN cluster (Kong et al. 2002). 
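The two diversity estimators described above, together with the Tajima's D statistic that table 1 reports alongside them, can be recomputed from the number of segregating sites S, the number of chromosomes n, and the number of sites surveyed L. The Python sketch below is illustrative only and is not the SLIDER implementation; the function names are hypothetical, and π is taken as the rounded per-site value printed in table 1 rather than recomputed from the sequences.

```python
import math


def harmonic_sums(n):
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def watterson_theta(S, n, L):
    """Watterson's (1975) estimator of theta per site: S / (a1 * L)."""
    a1, _ = harmonic_sums(n)
    return S / (a1 * L)


def tajimas_d(S, pi_per_site, n, L):
    """Tajima's (1989) D from S, per-site nucleotide diversity, n, and L."""
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    k = pi_per_site * L  # mean number of pairwise differences
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    # SERPINA1 in the Sao Tome sample: S = 44, n = 40, L = 7,587 (table 1).
    print(watterson_theta(44, 40, 7587) * 1e4)
    print(tajimas_d(44, 11.35e-4, 40, 7587))
```

For the SERPINA1 row of the São Tomé sample, these functions should come out close to the θW of 13.63 (×10−4) and the D of −0.59 given in table 1, which is a useful sanity check on how the table columns relate to each other.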
The lowest and highest levels of linkage disequilibrium (LD), as summarized by ρH01, were found in the SERPINA1 and SERPINA2 regions, respectively. SERPINA2 was the only region in which LD levels in São Tomé were not lower than in the Portuguese sample (table 1). The Tajima's D statistic, which summarizes information about the spectrum of allele frequencies, is expected to be approximately 0 under the neutral equilibrium model (Tajima 1989). A negative value indicates an excess of rare variants, which may result from a selective sweep, whereas a positive value indicates an excess of intermediate frequency variants, which may reflect the action of balancing selection. Studies of sequence variation in humans have shown that populations of African ancestry tend to have a slight excess of rare variants, whereas non-African populations show an excess of intermediate frequency variants (Wall and Przeworski 2000; Frisse et al. 2001; Akey et al. 2004; Stajich and Hahn 2005; Voight et al. 2005). All Tajima's D values obtained for the SERPIN subcluster do not depart significantly from the expectations of the neutral equilibrium model and show the same trends observed in population samples from similar ethnic groups (Wall and Przeworski 2000; Frisse et al. 2001; Akey et al. 2004; Stajich and Hahn 2005; Voight et al. 2005). Table 1. 
Summary Statistics of Population Variation

Gene        Population  N[b]  L[c]   S[d]  θW[e]  π[f]   D[g]   ρH01[h]
SERPINA1    São Tomé    40    7,587  44    13.63  11.35  −0.59  15.01
SERPINA1    Portugal    40    7,587  29    8.99   12.22  1.24   8.91
SERPINA2[a] São Tomé    40    3,098  20    15.18  14.26  −0.20  3.81
SERPINA2[a] Portugal    40    3,098  15    11.38  13.81  0.68   4.51
SERPINA6    São Tomé    40    6,273  42    15.74  17.58  0.41   7.61
SERPINA6    Portugal    40    6,273  35    13.12  18.73  1.49   3.41

[a] Variation in segments spanning the 2-kb deletion was omitted from analysis.
[b] N = number of chromosomes.
[c] L = total number of sites surveyed.
[d] S = number of segregating sites.
[e] Watterson's estimator of θ (4Neμ) (Watterson 1975) per basepair (×10−4).
[f] Nucleotide diversity per basepair (×10−4).
[g] Tajima's D statistic (Tajima 1989).
[h] Hudson's estimator of ρ (4Ner) per basepair (×10−4), based on a conversion-to-crossover ratio of 2 and a mean conversion tract length of 500 bp (Frisse et al. 2001; Hudson 2001).

Haplotype Diversity and Tests for the Signature of Natural Selection

A visual representation of the sequence data in the form of inferred haplotypes is shown in figure 2. In the São Tomé sample, the SERPINA2 region harbors a common haplotype defined by the 2-kb deletion and by derived alleles at 3 additional sites (25095, 26806, and 26911). Sites 23876 (corresponding to the start codon ATG → ATA mutation) and 25546 were also found to bear derived alleles that are strongly associated with the 2-kb deletion (|D′| = 1; r2 = 0.82 and |D′| = 0.89; r2 = 0.71, respectively).

FIG. 2. Haplotypes inferred by PHASE for SERPINA1, SERPINA2, and SERPINA6. The nonhuman primate sequences were used to infer the ancestral state at each site. Numbers below the gene names indicate the position of each polymorphic site relative to the reference sequence for each gene (GenBank accession numbers AL132708 for SERPINA1 and AL117259 for SERPINA2 and SERPINA6). The segment spanned by the 2-kb deletion was omitted. Nonsynonymous sites are marked by an asterisk. D and ND indicate the chromosomes bearing the SERPINA2 deletion and nondeletion alleles, respectively.

To evaluate whether this haplotype structure could result from the action of positive natural selection, we calculated the EHH statistic proposed by Sabeti et al. (2002). Specifically, we measured the decay of LD around core haplotypes defined by sites 26806 and 26911, which are surrogate markers for the 2-kb deletion polymorphism at SERPINA2. When EHH was plotted against distance in the São Tomé sample, we found that, in spite of their higher frequency (0.575 vs. 0.425), chromosomes with the 26806–26911 G-A haplotypes, bearing the 2-kb deletion, have greater EHH than T-G chromosomes bearing the nondeleted allele (fig. 3A).
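As a rough illustration of the statistic used here, EHH at a given distance can be read as the probability that two randomly chosen chromosomes carrying the same core haplotype are identical at every site from the core out to that distance. The Python sketch below follows that definition on toy data; it is not the EHH calculator or Sweep implementation, and the function name, the string encoding of haplotypes, and the example alleles are all assumptions made for illustration.

```python
from collections import Counter


def ehh(haplotypes, core_sites, core_alleles, extent_sites):
    """EHH for one core haplotype.

    haplotypes   : list of equal-length strings, one per chromosome
    core_sites   : indices of the sites defining the core
    core_alleles : alleles at those sites for the core haplotype of interest
    extent_sites : indices of the flanking sites out to the chosen distance
    """
    carriers = [
        h for h in haplotypes
        if tuple(h[i] for i in core_sites) == tuple(core_alleles)
    ]
    n = len(carriers)
    if n < 2:
        return 0.0
    # Carriers are identical at the core, so grouping by flanking alleles
    # is enough to identify distinct extended haplotypes.
    groups = Counter(tuple(h[i] for i in extent_sites) for h in carriers)
    identical_pairs = sum(c * (c - 1) // 2 for c in groups.values())
    all_pairs = n * (n - 1) // 2
    return identical_pairs / all_pairs


if __name__ == "__main__":
    # Toy chromosomes: sites 0-1 form the core, sites 2-3 flank it.
    toy = ["GAAC", "GAAC", "GAAC", "GAAT", "TGCC", "TGAT", "TGCT"]
    for core in ("GA", "TG"):
        print(core, ehh(toy, (0, 1), core, [2]), ehh(toy, (0, 1), core, [2, 3]))
```

In this toy set the more frequent GA core keeps higher EHH than the TG core as the flanking interval grows, which is the qualitative pattern described for the deletion-bearing haplotype.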
In the Portuguese sample, we used 3 additional polymorphic sites (25881, 26354, and 26461) to define a set of core haplotypes that occur at frequencies closer to the deletion haplotype (fig. 3B). In this case, EHH for the 2-kb deletion haplotype (CGTGA) still appears to decay relatively slowly compared with the EHH for other intermediate frequency haplotypes (CATTG and CGATG).

FIG. 3. Plots of EHH breakdown over distance from core haplotypes defined by SNPs in SERPINA2. (A) Comparison between São-Tomean chromosomes bearing the 2-kb deletion (GA) and nondeletion (TG) alleles at SERPINA2. (B) Comparison between Portuguese chromosomes bearing the 2-kb deletion (CGTGA) and nondeletion (CATTG, GGATG, and CGATG) alleles. The unsurveyed segment between SERPINA2 and SERPINA6 was omitted. Frequencies of core haplotypes are shown in parentheses.

To test the null hypothesis of evolutionary neutrality for the SERPINA2 haplotypes, we used the haplotype test described by Hudson et al. (1994) and the LRH test proposed by Sabeti et al. (2002). To evaluate whether the haplotype class defined by the 2-kb deletion and by derived alleles at sites 25095, 26806, and 26911 contained fewer segregating sites than expected under neutrality, given its frequency, the haplotype test of Hudson et al. (1994) was applied to windows of different sizes defined by the concatenation of the different surveyed fragments (fig. 1).
Using the haplotypes inferred by PHASE as the best reconstruction, we obtained significant tests in the São Tomé sample, for the 8-kb region resulting from concatenation of fragments F2 and F3 (P = 0.0036; fig. 1) and the 10.5-kb region including fragments F1–F3 (P = 0.0275). In the Portuguese sample, we tested the same concatenated fragments, but no test was found to be significant (P values of 0.74 and 0.98, respectively). Haplotype tests of fragments centered on SERPINA6 were not significant in either population sample (P values ranging from 0.14 to 0.99). All tests were based on simulations of the standard neutral model. This model was shown to provide a good fit to sequence variation data for populations of African ancestry, including the admixed populations of African-Americans (Adams and Hudson 2004; Voight et al. 2005). Conversely, non-African data do not fit the standard neutral model and were shown to be compatible with bottleneck models (Adams and Hudson 2004; Voight et al. 2005). Because the evolutionary variance is greater under bottleneck models, it is highly likely that the tests of the Portuguese data would remain not significant even if the simulations were based on such models. To determine if test results in the São Tomé sample were robust to misspecification of haplotype assignment, we generated 100 haplotype samples drawn at random from the posterior distribution estimated by PHASE. After rerunning the test on each sample, we found that for the 8-kb concatenated fragment including F2–F3, all samples had a P value lower than 0.01, and for the 10.5-kb region including fragments F1–F3, 96% of samples had P values lower than 0.05. Taken together, these results suggest that the São Tomé sample harbors a signature of natural selection on a haplotype class that encompasses a region up to 28 kb in length, including SERPINA1 and SERPINA2. 
To determine whether the EHH for the 26806–26911 G-A core haplotype in the African sample was unusual, we compared the haplotype frequency with the relative EHH at the 2 largest distances where non–G-A haplotypes had nonzero values of EHH (−25 kb distal and 60 kb proximal; fig. 3A). The deviations from simulated null distributions were significant for all 4 tested demographic models (P values for −25 kb distal are constant-sized population, P < 0.000007; expansion, P < 0.000008; bottleneck, P < 0.0000005; and population structure, P < 0.00001; and those for 60 kb proximal are constant-sized population, P < 0.02; expansion P < 0.004; bottleneck P < 0.01; and population structure P < 0.003). Simulation-based tests allow testing the hypothesis of evolutionary neutrality based on specified demographic scenarios. An alternative, empirical approach consists in comparing the haplotype structure observed at a candidate locus with the pattern observed over the entire genome. This approach was recently implemented for the HapMap Phase I data (http://www.hapmap.org/) by calculating an iHS for each single nucleotide polymorphism (SNP) in each of the 3 HapMap population samples; the absolute value of iHS measures the strength of the evidence for selection acting on a SNP or one tightly linked to it (Voight et al. 2006). In the Yoruba, the |iHS| for SNP rs6647 (site 135395), which is the best surrogate marker for the SERPINA2 deletion in HapMap Phase I data, is 2.611; this indicates that, consistent with the resequencing data, SNP rs6647 is associated with a selection signal in the top 5% of the entire genome for this population. However, because this SNP is only moderately correlated with the deletion allele (r2 = 0.22), it may not be ideal for assessing the evidence for selection acting on the deletion. Therefore, we used the iHS statistic to ask whether the SERPIN genes evolved by positive selection. 
In this analysis, the signal is proportional to the number of SNPs with |iHS| > 2 in a window of 50 SNPs centered on the gene (Voight et al. 2006). Based on this approach, the empirical P values for the SERPIN genes in the Yoruba sample also appear to be extreme in the genome-wide distribution (SERPINA1 P = 0.053, SERPINA2 P = 0.053, and SERPINA6 P = 0.037), further suggesting a recent selective process. It should be noted that the iHS results point to a stronger signal in SERPINA6 compared with SERPINA1 and SERPINA2, in contrast with what we observed in our resequencing study. However, the HapMap Phase I data only included a few SNPs within the proximal SERPIN subcluster (8 SNPs within SERPINA1; 2 SNPs within SERPINA2; and 9 SNPs within SERPINA6) and, as a consequence, the SNP windows analyzed for each gene largely overlap. Hence, it is not possible to determine which gene contributes to the selection signal based on the HapMap data. In the European (Centre d'Etude du Polymorphisme Humain: Utah residents with ancestry from northern and western Europe) and Asian (Han Chinese from Beijing and Japanese from Tokyo) population samples, no signal of selection was detected based on the |iHS| approach, corroborating the idea that the signature of selection is specific to African populations.

Expression Studies of SERPINA2 and Evaluation of Protein Structure

Although the haplotype structure suggests a role for positive selection in shaping variation at the SERPIN subcluster, the target of this selection was not immediately obvious. Site 135395, corresponding to the A213V amino acid replacement at SERPINA1, could be regarded as a potential target of selection. However, this seems unlikely because: 1) the allele associated with the selected haplotype is the ancestral one (A213) (see fig. 2) and 2) the A213 allele is only loosely linked to the selected haplotype (r2 = 0.22).
Moreover, clinical studies did not report any phenotypic association with the A213 allele that would suggest a selective advantage (Nukiwa et al. 1987; Gaillard et al. 1994). With regard to SERPINA6, a single nonsynonymous variant was observed (A224S) and this variant is not strongly associated with the selected haplotype (r2 = 0.11). Hence, the remaining possible target of selection is the deletion of the SERPINA2 gene. However, if the nondeleted form of SERPINA2 is not a functional gene, as previously proposed, the deletion would be unlikely to have phenotypic and fitness effects. We tested for SERPINA2 expression in liver, testis, and leukocytes and used RT PCR to assess the expression of SERPINA2 relative to SERPINA1. The SERPINA2 gene is highly expressed in the testis at 3-fold and 60-fold higher levels relative to leukocytes and liver, respectively (fig. 4). On the contrary, and as previously reported, the highest expression levels of SERPINA1 were observed in the liver, the major SERPINA1 producer (Cox 1995; Nukiwa et al. 1996), at 160-fold higher levels compared with leukocytes. In testis, multiple attempts did not show evidence for SERPINA1 expression. Interestingly, leukocytes appeared to have higher expression levels of SERPINA2 (28 ct) than SERPINA1 (34 ct). These results demonstrate that the nondeleted form of SERPINA2 is expressed and its expression is different across the tissues tested. In addition, SERPINA1 and SERPINA2 are differentially expressed relative to each other.

FIG. 4. Relative mRNA levels of SERPINA2 and SERPINA1 in liver, leukocytes, and testis, as estimated by quantitative RT PCR.
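To give a sense of scale for the threshold-cycle values quoted for leukocytes (28 ct for SERPINA2 versus 34 ct for SERPINA1), the short Python sketch below converts a difference in threshold cycles into an approximate fold difference. It rests on assumptions that are not stated in the text: perfect doubling of product per cycle (efficiency of 2), equal amplification efficiency for both assays, and equal template input. It also does not reproduce the normalization to rRNA, because the rRNA threshold cycles are not reported; the function name is hypothetical.

```python
def fold_difference(ct_reference, ct_target, efficiency=2.0):
    """Approximate fold difference of target over reference transcript.

    A lower threshold cycle (Ct) means more starting template, so the
    target is efficiency ** (ct_reference - ct_target) times as abundant,
    assuming both assays amplify with the same efficiency.
    """
    return efficiency ** (ct_reference - ct_target)


if __name__ == "__main__":
    # Leukocyte values quoted in the text: SERPINA1 at 34 ct, SERPINA2 at 28 ct.
    print(fold_difference(ct_reference=34, ct_target=28))
```

Under these idealized assumptions, a 6-cycle gap corresponds to roughly a 64-fold difference in starting template; the real ratio depends on assay efficiencies and on the rRNA normalization used in the study.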
To further explore the possibility of a functional role for SERPINA2, a 3-dimensional protein model (http://swissmodel.expasy.org/workspace/) for the nondeleted form was built on the scaffold of the available crystal structure of the highly homologous SERPINA1 protein (75–81% homology). According to this model (fig. 5), SERPINA2 preserves the typical structure of the SERPIN reactive center loop, which is compatible with a protease inhibitory activity (fig. 5B). However, the sequences flanking the reactive site appear to have diverged considerably (fig. 5C), and the putative reactive site (P1–P1′) of SERPINA2 harbors a tryptophan–serine sequence instead of a methionine–serine motif, as previously noted (Bao et al. 1988). The tryptophan–serine motif was also found in the orthologous sequences of our gorilla and orangutan samples as well as in the rhesus monkey (UCSC Genome Browser, http://genome.ucsc.edu/, chr7: 157590203–157595117) and in a rat SERPINA2 putative protein (XP_2345123). Moreover, other members of the SERPIN family share a reactive site similar to that of SERPINA2 (P1-hydrophobic and P1′-polar amino acids); they include SERPINA10 (Q5RDA8), SERPINA4 (P29622), mouse α-1-antichymotrypsin (P01011), and chicken ovalbumin (P01012). Hence, it appears that the nondeleted SERPINA2 protein has retained important features of a functional active site. FIG. 5. Three-dimensional theoretical model of the SERPIN reactive center loop (RCL), as obtained from the SWISS-MODEL Automated Protein Modelling Server. (A) SERPINA1 RCL (D341–P361). (B) SERPINA2 RCL (D346–Y366). (C) RCL sequence homology between SERPINA1 and SERPINA2 (conserved amino acids are shown in red). Images were obtained using the Swiss PDB viewer 3.7 (http://www.expasy.ch/spdbv/).
Discussion We have performed an extensive survey of sequence variation of the proximal 14q32.1 SERPIN subcluster, including the 2 functional genes, SERPINA1 and SERPINA6, and the previously proposed pseudogene, SERPINA2. In the São Tomé sample, we show that a haplotype defined by a partial deletion of SERPINA2 is associated with too little variation, given its frequency, relative to neutral expectations. We further show that, in the absence of the deletion allele, the SERPINA2 gene may be active and differentially expressed in liver, testis, and leukocytes. Moreover, its expression pattern is distinct from that of the highly homologous SERPINA1 gene. Finally, we show that a 7.5-kb deletion removed the SERPINA2 ortholog in the chimpanzee genome, but not in the other great apes. Hence, we propose that the loss of SERPINA2 was advantageous in recent primate evolution. We assessed the signature of positive natural selection in 2 ways. The first one was based on the application of the haplotype test of Hudson et al. (1994) and the LRH test (Sabeti et al. 2002) to our full resequencing data and relied on the assumption of a specified set of demographic models. By this approach, we found a homogeneous haplotype class, particularly frequent in the African sample, that is defined by the 2-kb deletion, an ATG → ATA mutation in the start codon, and 3 additional mutations in near-perfect LD with each other. This finding suggests that a haplotype class was quickly driven to high frequency by natural selection acting on an advantageous variant. Although the São Tomé sample is geographically defined, the population is likely to be a mixture of people from different parts of sub-Saharan Africa (Tomas et al. 2002).
Hence, we cannot definitively rule out the possibility that the observed departure in the haplotype test is due to a violation of the demographic assumptions of the standard neutral model rather than positive natural selection. However, previous multilocus surveys of sequence variation in samples of sub-Saharan African ancestry, including the highly admixed African-Americans, did not detect significant departures from the standard neutral model (Adams and Hudson 2004; Akey et al. 2004; Stajich and Hahn 2005; Voight et al. 2005). Moreover, the significance of the results remained unchanged under alternative demographic models, including population subdivision. This suggests that simulation-based tests for the São Tomé data may be relatively robust. The second approach we used to detect the signature of natural selection, however, does not suffer from this potential problem and offers additional advantages. In this approach, we used the iHS statistic to investigate the signature of natural selection in the HapMap Phase I data; the iHS has high power to detect a sweep of a variant at intermediate frequency such as the SERPINA2 deletion. Because the HapMap samples include a well-defined African population, that is, the Yoruba from Ibadan (Nigeria), admixture is a less serious problem. In addition, the analysis based on the iHS does not rely on the assumptions of a null neutral model in that the data at the test locus, in this case SERPINA2, are compared with the empirical genome-wide distribution in the same population sample. This approach effectively circumvents the need to specify a model of population history that is unknown. On the other hand, a possible drawback of genome scans for selection, such as those performed by using the iHS statistics, is that they may have limited power because only the strongest selection signals can be detected and the false negative rate may be high (Teshima et al. 2006). 
Despite this potential limitation, the genes in the proximal SERPIN subcluster exhibit signals of selection that are in the top 3.7–5.3% of the distribution for the HapMap Phase I data in the Yoruba. We also performed an SNP-based analysis of the iHS statistic by using a surrogate SNP for the deletion allele (rs6647); the iHS for this SNP in the Yoruba is 2.611, corresponding to an empirical P value between 0.01 and 0.05. Thus, both approaches and both data sets converge in supporting the notion that the haplotype structure at the proximal subcluster is unusual and likely to be due to natural selection acting on an advantageous variant. This conclusion is consistent with our previous analysis of a microsatellite located between SERPINA1 and SERPINA2, which exhibited lower levels of diversity in a sample from São Tomé compared with 2 European samples (from Portugal and Basque Country) as well as a unimodal allele frequency distribution within the A213 (rs6647) chromosomes from São Tomé (Seixas et al. 2001). Consistent with the findings reported here, the patterns of microsatellite variation were proposed to be the result of positive natural selection on an unknown advantageous variant tightly linked to SERPINA1 (Seixas et al. 2001). Full resequencing of confined genomic regions and genome-wide SNP typing offer complementary opportunities for detecting the signature of natural selection. Although the latter allows the assessment of haplotype homozygosity over large distances and, hence, may have greater power to detect a signature of selection for less common variants, it provides limited information on the underlying patterns of variation and the full array of variants. On the other hand, thanks to the availability of large-scale data sets, SNP-typing data allow the implementation of empirical approaches that do not rely on assumptions about the unknown underlying demographic model.
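The empirical, gene-based logic described above can be sketched in a few lines. The example below is a minimal illustration on simulated values, not the pipeline of Voight et al. (2006): for a window of SNPs around a gene it computes the fraction with |iHS| > 2, then ranks that fraction against the same quantity from all other windows to obtain an empirical P value. The function names, the independent-window layout, and the simulated scores are assumptions introduced here for illustration.

```python
import random


def extreme_fraction(ihs_values, threshold=2.0):
    """Fraction of SNPs in a window whose |iHS| exceeds the threshold."""
    if not ihs_values:
        raise ValueError("window contains no SNPs")
    return sum(abs(v) > threshold for v in ihs_values) / len(ihs_values)


def empirical_p(test_stat, genome_wide_stats):
    """Share of genome-wide windows with a statistic at least as large."""
    if not genome_wide_stats:
        raise ValueError("no genome-wide windows supplied")
    n_as_extreme = sum(s >= test_stat for s in genome_wide_stats)
    return n_as_extreme / len(genome_wide_stats)


if __name__ == "__main__":
    rng = random.Random(0)
    window_size = 50  # window of 50 SNPs, as in the gene-based analysis
    # Simulated background: 1,000 windows of standardized iHS scores.
    background = [
        extreme_fraction([rng.gauss(0, 1) for _ in range(window_size)])
        for _ in range(1000)
    ]
    # Hypothetical test window with an excess of extreme scores.
    test_window = [rng.gauss(0, 1) for _ in range(40)] + [2.5] * 10
    stat = extreme_fraction(test_window)
    print(f"fraction with |iHS| > 2: {stat:.2f}")
    print(f"empirical P: {empirical_p(stat, background):.3f}")
```

Because the test window is ranked against windows from the same population sample, no explicit demographic model has to be specified, which is the property the text emphasizes.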
In the case of the proximal SERPIN subcluster, there is generally good agreement between these 2 types of data for the population samples of African ancestry. In the HapMap data, the SERPINA6 gene has a slightly stronger signal of selection compared with the other 2 genes in the subcluster (P = 0.037 vs. P = 0.053 at SERPINA6 vs. SERPINA2, respectively), corresponding to a larger fraction of SNPs with |iHS| > 2. These gene-based analyses, as described and implemented in Voight et al. (2006), use information from a window of at least 50 SNPs centered on each gene; because of the close proximity of SERPINA6 to SERPINA2 and the sparseness of the HapMap Phase I data, many of the SNPs in the windows for these 2 genes overlap, making it difficult to distinguish the location of the selection signal. In our resequencing data, the signature of selection seems to be stronger for a region centered on SERPINA2. Indeed, haplotype tests centered on SERPINA6 did not yield significant results. Because of the higher density of the resequencing data and the presence of a common variant with a clear effect on gene function, that is, the 2-kb deletion in SERPINA2, we hypothesize that SERPINA2 was the target of selection. The other damaging mutations observed in both samples (i.e., 2 frameshift mutations and 2 mutations predicted to affect function) may also be advantageous. The 2 amino acid replacements are expected to have less drastic effects on function compared with the deletion; therefore, they may not be as strongly advantageous as the 2-kb deletion. The 2 frameshift mutations are relatively rare (7.5% and a singleton) and may be more recent than the 2-kb deletion.
The finding of 2 independent rearrangements, namely the 2-kb deletion in humans and the 7.5-kb deletion in chimpanzees, in the 6 Myr elapsed since the divergence of these 2 species, corroborates the idea that loss of SERPINA2 is an ongoing process that was associated with a selective advantage during recent primate evolution. This hypothesis is consistent with the high mRNA levels and the tissue-specific expression profiles observed for the nondeleted SERPINA2 gene. In addition, the analysis of structural models shows that the nondeleted SERPINA2 gene not only is expressed, but may also code for a functional SERPIN. Interestingly, although both SERPINA1 and SERPINA2 have the prototypical SERPIN configuration, they diverged at the reactive center, which may have led to different substrate affinities and SERPIN activities. It was previously proposed that, in some cases, pseudogenization could confer a selective advantage. For example, a variant with a premature stop codon in the Caspase12 gene, which codes for a cysteine protease, reached near-fixation frequency, probably because it confers increased resistance to severe sepsis (Wang et al. 2006; Xue et al. 2006). Likewise, the G protein–coupled receptor 33 (GPR33) and CMP-N-acetylneuraminic acid hydroxylase (CMAH) genes also carry inactivating alleles at high frequencies (a premature stop codon in GPR33 and a 92-bp deletion removing CMAH exon 6) (Rompler et al. 2005; Hayakawa et al. 2006). These inactivating mutations were estimated to have occurred 1 MYA and 3 MYA, respectively. Interestingly, the loss of GPR33 function was proposed to have occurred multiple times during primate and rodent evolution, probably as a result of a shared selective pressure (Rompler et al. 2005). Hence, our findings on the loss of SERPINA2, resulting from deletions and other inactivating mutations, add to a growing body of evidence supporting the idea that pseudogenization was advantageous in recent human evolution.
Several recent genome-wide surveys have shown that polymorphic deletions are a pervasive feature of human variation (Sebat et al. 2004; Sharp et al. 2005; Tuzun et al. 2005; Conrad et al. 2006; Feuk et al. 2006; Hinds et al. 2006; McCarroll et al. 2006). Despite their abundance, only a few examples of such variants were shown to affect risk for common disease phenotypes or play an important role in human adaptations. In this sense, the polymorphic deletion of SERPINA2 may provide another example of the evolutionary potential of such variation. More generally, if loss of function is an important mechanism contributing to species differences, as posited by the less is more hypothesis (Olson 1999), deletions may be an important source of advantageous variation. Interestingly, in the case of SERPINA2, there seems to be an adaptive convergence between humans and chimpanzees toward loss of function. Although additional functional studies are needed to determine whether the SERPINA2 deletion has phenotypic effects, the high expression levels found in the testis raise the possibility that SERPINA2 plays a role in reproduction. Proteins implicated in reproduction have already been documented in several taxa as preferred targets for adaptive evolution, driven by sperm competition, sexual conflict, or pathogen interactions (Clark and Swanson 2005). It is well known that the orchestrated interaction between proteases and protease inhibitors plays a crucial role in sperm modifications and in tissue integrity, from spermatogenesis to fertilization. In rodents and primates, SERPINs have the ability to inhibit proteases found in the reproductive tract, including semen proteases, such as urokinase (uPA) and/or kallikrein 3 (KLK3 or prostate-specific antigen), which liquefy sperm coagulum leading to the release of sperm. 
Consistent with our observation at SERPINA2, comparative genomic analyses of proteases and protease inhibitors in rodents (mouse and rat) and primates (human and chimpanzee) have shown a marked change in gene content, namely in kallikreins and SERPIN cluster genes, with lineage-specific patterns of gene inactivation that were proposed to reflect differences in reproductive biology (Puente and Lopez-Otin 2004; Puente, Gutierrez-Fernandez, et al. 2005; Puente, Sanchez, et al. 2005). Alternatively, the signature of selection on the haplotype carrying the SERPINA2 deletion may result from a selective pressure mediated through host–pathogen interactions. In fact, proteases and their inhibitors are known to have important roles in host defense against pathogens, within and outside the reproductive tract. For example, prolactin-induced protein is a protease from the seminal fluid that is thought to protect sperm from infection by binding bacteria and suppressing T-cell apoptosis (Schenkels et al. 1997; Gaubin et al. 1999). On the other hand, SERPINs are proposed to play an important role in antagonizing the pathogen invasion process (Hill and Hastie 1987; Goodwin et al. 1996). For example, SERPINA1 was shown to interfere with Schistosoma, Cryptosporidium, and HIV infections (Asch and Dresden 1977; Forney et al. 1996, 1997; Shapiro et al. 2001; Freudenstein-Dan et al. 2003; Hayes and Gardiner-Garden 2003). The expression of SERPINA2 in leukocytes is consistent with a role in the response to pathogens. Infectious diseases have constituted a major selective pressure in human populations, especially in concomitance with the environmental changes linked to the onset of agriculture. For example, the onset of malaria in Africa has resulted in strong selective pressures acting on variation at the Duffy blood group, β-globin, and glucose-6-phosphate dehydrogenase loci (Hamblin and Di Rienzo 2000; Tishkoff et al. 2001; Currat et al. 2002; Seixas et al. 2002).
Interestingly, a crude estimate of the time to the most recent common ancestor of the deletion allele based on the decay of haplotype homozygosity by historical recombination (Voight et al. 2006) suggests a recent selective event, that is, 10,000–24,000 years ago, consistent with the history of selective pressures on host–pathogen interactions. We used a surrogate marker for the deletion, that is, rs1956172 (r2 = 1 and r2 = 0.82 in Portugal and São Tomé, respectively), to infer the deletion frequencies in the HapMap phase II (http://www.hapmap.org/) and Perlegen data (http://genome.perlegen.com/) (Hinds et al. 2005). The inferred frequencies are highest in samples of African ancestry (0.52 in Yoruba and 0.41 in African-Americans), intermediate in Europeans (0.23 in Central Europeans from Utah and 0.23 in European-Americans), and lowest in those of Asian ancestry (0.01 in Chinese, 0 in Japanese, and 0.06 in individuals of Han Chinese ancestry). Based on this geographic distribution of allele frequencies and the restriction of the selection signature to the African samples, it may be speculated that an adaptive pressure driven by host–pathogen interactions is more likely than a selective advantage due to an effect on fertility. Additional data on the phenotypic consequences of the SERPINA2 deletion are necessary to further test the hypothesis of a selective advantage for SERPINA2 loss in human populations. The authors wish to thank the people and the Ministry of Health of the Democratic Republic of São Tomé e Príncipe, Maria de Jesus Trovoada, and Licínio Manco for making São Tomé blood samples available for DNA extraction, Alberto Barros and Mário Sousa for having provided biopsies from human testis, and Molly Przeworski for comments on the manuscript. S.S. is the recipient of a postdoctoral fellowship from the Fundação para a Ciência e a Tecnologia, Portugal (FCTBPD/12532/2003). 
This research was supported by the Fundação para a Ciência e a Tecnologia (POCTI/42510/ANT/2001 and POCI/BIA-BDE/56654/2004 grants to J.R.) and by National Institutes of Health grant DK56670 to A.D. References Adams AM,  Hudson RR.  Maximum-likelihood estimation of demographic parameters using the frequency spectrum of unlinked single-nucleotide polymorphisms,  Genetics ,  2004, vol.  168 (pg.  1699- 1712) Google Scholar CrossRef Search ADS PubMed  Akey JM,  Eberle MA,  Rieder MJ,  Carlson CS,  Shriver MD,  Nickerson DA,  Kruglyak L.  Population history and natural selection shape patterns of genetic variation in 132 genes,  PLoS Biol ,  2004, vol.  2 pg.  e286  Google Scholar CrossRef Search ADS PubMed  Asch HL,  Dresden MH.  Schistosoma mansoni: inhibition of cercarial “penetration” proteases by components of mammalian blood,  Comp Biochem Physiol B ,  1977, vol.  58 (pg.  89- 95) Google Scholar PubMed  Atchley WR,  Lokot T,  Wollenberg K,  Dress A,  Ragg H.  Phylogenetic analyses of amino acid variation in the serpin proteins,  Mol Biol Evol ,  2001, vol.  18 (pg.  1502- 1511) Google Scholar CrossRef Search ADS PubMed  Bao JJ,  Reed-Fourquet L,  Sifers RN,  Kidd VJ,  Woo SL.  Molecular structure and sequence homology of a gene related to alpha 1-antitrypsin in the human genome,  Genomics ,  1988, vol.  2 (pg.  165- 173) Google Scholar CrossRef Search ADS PubMed  Carrell RW,  Lomas DA.  Alpha1-antitrypsin deficiency—a model for conformational diseases,  N Engl J Med ,  2002, vol.  346 (pg.  45- 53) Google Scholar CrossRef Search ADS PubMed  Clark NL,  Swanson WJ.  Pervasive adaptive evolution in primate seminal proteins,  PLoS Genet ,  2005, vol.  1 pg.  e35  Google Scholar CrossRef Search ADS PubMed  Conrad DF,  Andrews TD,  Carter NP,  Hurles ME,  Pritchard JK.  A high-resolution survey of deletion polymorphism in the human genome,  Nat Genet ,  2006, vol.  38 (pg.  75- 81) Google Scholar CrossRef Search ADS PubMed  Cox DW.  
Sriver CR,  Beaudet AL,  Sly WL,  Valle D.  α1-antitrypsin deficiency,  The metabolic and molecular bases of inherited disease ,  1995 New York McGraw-Hill(pg.  4125- 4158) Crawford DC,  Akey DT,  Nickerson DA.  The patterns of natural variation in human genes,  Annu Rev Genomics Hum Genet ,  2005, vol.  6 (pg.  287- 312) Google Scholar CrossRef Search ADS PubMed  Creighton TE,  Darby NJ.  Functional evolutionary divergence of proteolytic enzymes and their inhibitors,  Trends Biochem Sci ,  1989, vol.  14 (pg.  319- 324) Google Scholar CrossRef Search ADS PubMed  Currat M,  Trabuchet G,  Rees D,  Perrin P,  Harding RM,  Clegg JB,  Langaney A,  Excoffier L.  Molecular analysis of the beta-globin gene cluster in the Niokholo Mandenka population reveals a recent origin of the beta(S) Senegal mutation,  Am J Hum Genet ,  2002, vol.  70 (pg.  207- 223) Google Scholar CrossRef Search ADS PubMed  Feuk L,  Carson AR,  Scherer SW.  Structural variation in the human genome,  Nat Rev Genet ,  2006, vol.  7 (pg.  85- 97) Google Scholar CrossRef Search ADS PubMed  Forney JR,  Yang S,  Healey MC.  Protease activity associated with excystation of Cryptosporidium parvum oocysts,  J Parasitol ,  1996, vol.  82 (pg.  889- 892) Google Scholar CrossRef Search ADS PubMed  Forney JR,  Yang S,  Healey MC.  Synergistic anticryptosporidial potential of the combination alpha-1-antitrypsin and paromomycin,  Antimicrob Agents Chemother ,  1997, vol.  41 (pg.  2006- 2008) Google Scholar PubMed  Freudenstein-Dan A,  Gold D,  Fishelson Z.  Killing of schistosomes by elastase and hydrogen peroxide: implications for leukocyte-mediated schistosome killing,  J Parasitol ,  2003, vol.  89 (pg.  1129- 1135) Google Scholar CrossRef Search ADS PubMed  Frisse L,  Hudson RR,  Bartoszewicz A,  Wall JD,  Donfack J,  Di Rienzo A.  Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels,  Am J Hum Genet ,  2001, vol.  69 (pg.  
831- 843) Google Scholar CrossRef Search ADS PubMed  Gaillard MC,  Zwi S,  Nogueira CM,  Ludewick H,  Feldman C,  Frankel A,  Tsilimigras C,  Kilroe-Smith TA.  Ethnic differences in the occurrence of the M1(ala213) haplotype of alpha-1-antitrypsin in asthmatic and non-asthmatic black and white South Africans,  Clin Genet ,  1994, vol.  45 (pg.  122- 127) Google Scholar CrossRef Search ADS PubMed  Gaubin M,  Autiero M,  Basmaciogullari S,  Metivier D,  Misëhal Z,  Culerrier R,  Oudin A,  Guardiola J,  Piatier-Tonneau D.  Potent inhibition of CD4/TCR-mediated T cell apoptosis by a CD4-binding glycoprotein secreted from breast tumor and seminal vesicle cells,  J Immunol ,  1999, vol.  162 (pg.  2631- 2638) Google Scholar PubMed  Goodwin RL,  Baumann H,  Berger FG.  Patterns of divergence during evolution of alpha 1-proteinase inhibitors in mammals,  Mol Biol Evol ,  1996, vol.  13 (pg.  346- 358) Google Scholar CrossRef Search ADS PubMed  Hamblin MT,  Di Rienzo A.  Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus,  Am J Hum Genet ,  2000, vol.  66 (pg.  1669- 1679) Google Scholar CrossRef Search ADS PubMed  Hayakawa T,  Aki I,  Varki A,  Satta Y,  Takahata N.  Fixation of the human-specific CMP-N-acetylneuraminic acid hydroxylase pseudogene and implications of haplotype diversity for human evolution,  Genetics ,  2006, vol.  172 (pg.  1139- 1146) Google Scholar CrossRef Search ADS PubMed  Hayes VM,  Gardiner-Garden M.  Are polymorphic markers within the alpha-1-antitrypsin gene associated with risk of human immunodeficiency virus disease?,  J Infect Dis ,  2003, vol.  188 (pg.  1205- 1208) Google Scholar CrossRef Search ADS PubMed  Hill RE,  Hastie ND.  Accelerated evolution in the reactive centre regions of serine protease inhibitors,  Nature ,  1987, vol.  326 (pg.  96- 99) Google Scholar CrossRef Search ADS PubMed  Hinds DA,  Kloek AP,  Jen M,  Chen X,  Frazer KA.  
Common deletions and SNPs are in linkage disequilibrium in the human genome,  Nat Genet ,  2006, vol.  38 (pg.  82- 85) Google Scholar CrossRef Search ADS PubMed  Hinds DA,  Stuve LL,  Nilsen GB,  Halperin E,  Eskin E,  Ballinger DG,  Frazer KA,  Cox DR.  Whole-genome patterns of common DNA variation in three human populations,  Science ,  2005, vol.  307 (pg.  1072- 1079) Google Scholar CrossRef Search ADS PubMed  Hofker MH,  Nelen M,  Klasen EC,  Nukiwa T,  Curiel D,  Crystal RG,  Frants RR.  Cloning and characterization of an alpha 1-antitrypsin like gene 12 KB downstream of the genuine alpha 1-antitrypsin gene,  Biochem Biophys Res Commun ,  1988, vol.  155 (pg.  634- 642) Google Scholar CrossRef Search ADS PubMed  Hudson RR.  Two-locus sampling distributions and their application,  Genetics ,  2001, vol.  159 (pg.  1805- 1817) Google Scholar PubMed  Hudson RR.  Generating samples under a Wright-Fisher neutral model of genetic variation,  Bioinformatics ,  2002, vol.  18 (pg.  337- 338) Google Scholar CrossRef Search ADS PubMed  Hudson RR,  Bailey K,  Skarecky D,  Kwiatowski J,  Ayala FJ.  Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster,  Genetics ,  1994, vol.  136 (pg.  1329- 1340) Google Scholar PubMed  Irving JA,  Pike RN,  Lesk AM,  Whisstock JC.  Phylogeny of the serpin superfamily: implications of patterns of amino acid conservation for structure and function,  Genome Res ,  2000, vol.  10 (pg.  1845- 1864) Google Scholar CrossRef Search ADS PubMed  Irving JA,  Steenbakkers PJ,  Lesk AM,  Op den Camp HJ,  Pike RN,  Whisstock JC.  Serpins in prokaryotes,  Mol Biol Evol ,  2002, vol.  19 (pg.  1881- 1890) Google Scholar CrossRef Search ADS PubMed  Kelsey GD,  Parkar M,  Povey S.  The human alpha-1-antitrypsin-related sequence gene: isolation and investigation of its expression,  Ann Hum Genet ,  1988, vol.  52 (pg.  
151- 160) Google Scholar CrossRef Search ADS PubMed  Kong A,  Gudbjartsson DF,  Sainz J, et al.  ,  (16 co-authors).  A high-resolution recombination map of the human genome,  Nat Genet ,  2002, vol.  31 (pg.  241- 247) Google Scholar PubMed  Lomas DA,  Carrell RW.  Serpinopathies and the conformational dementias,  Nat Rev Genet ,  2002, vol.  3 (pg.  759- 768) Google Scholar CrossRef Search ADS PubMed  Marsden MD,  Fournier RE.  Organization and expression of the human serpin gene cluster at 14q32.1,  Front Biosci ,  2005, vol.  10 (pg.  1768- 1778) Google Scholar CrossRef Search ADS PubMed  McCarroll SA,  Hadnott TN,  Perry GH, et al.  ,  (11 co-authors).  Common deletion polymorphisms in the human genome,  Nat Genet ,  2006, vol.  38 (pg.  86- 92) Google Scholar CrossRef Search ADS PubMed  Namciu SJ,  Friedman RD,  Marsden MD,  Sarausad LM,  Jasoni CL,  Fournier RE.  Sequence organization and matrix attachment regions of the human serine protease inhibitor gene cluster at 14q32.1,  Mamm Genome ,  2004, vol.  15 (pg.  162- 178) Google Scholar CrossRef Search ADS PubMed  Needham M,  Stockley RA.  Alpha 1-antitrypsin deficiency. 3: clinical manifestations and natural history,  Thorax ,  2004, vol.  59 (pg.  441- 445) Google Scholar CrossRef Search ADS PubMed  Nickerson DA,  Tobe VO,  Taylor SL.  PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing,  Nucleic Acids Res ,  1997, vol.  25 (pg.  2745- 2751) Google Scholar CrossRef Search ADS PubMed  Nukiwa T,  Brantly M,  Ogushi F,  Fells G,  Satoh K,  Stier L,  Courtney M,  Crystal RG.  Characterization of the M1(Ala213) type of alpha 1-antitrypsin, a newly recognized, common “normal” alpha 1-antitrypsin haplotype,  Biochemistry ,  1987, vol.  26 (pg.  5259- 5267) Google Scholar CrossRef Search ADS PubMed  Nukiwa T,  Seyama K,  Kira S.  Crystal RG.  
The prevalence of a1AT deficiency outside the United States and Europe,  Alpha-1-antitrypsin deficiency. Biology, pathogenesis, clinical manifestations, therapy ,  1996 New York Marcel Dekker Inc(pg.  293- 300) Olson MV.  When less is more: gene loss as an engine of evolutionary change,  Am J Hum Genet ,  1999, vol.  64 (pg.  18- 23) Google Scholar CrossRef Search ADS PubMed  Puente XS,  Gutierrez-Fernandez A,  Ordonez GR,  Hillier LW,  Lopez-Otin C.  Comparative genomic analysis of human and chimpanzee proteases,  Genomics ,  2005, vol.  86 (pg.  638- 647) Google Scholar CrossRef Search ADS PubMed  Puente XS,  Lopez-Otin C.  A genomic analysis of rat proteases and protease inhibitors,  Genome Res ,  2004, vol.  14 (pg.  609- 622) Google Scholar CrossRef Search ADS PubMed  Puente XS,  Sanchez LM,  Gutierrez-Fernandez A,  Velasco G,  Lopez-Otin C.  A genomic view of the complexity of mammalian proteolytic systems,  Biochem Soc Trans ,  2005, vol.  33 (pg.  331- 334) Google Scholar CrossRef Search ADS PubMed  Ramensky V,  Bork P,  Sunyaev S.  Human non-synonymous SNPs: server and survey,  Nucleic Acids Res ,  2002, vol.  30 (pg.  3894- 3900) Google Scholar CrossRef Search ADS PubMed  Rompler H,  Schulz A,  Pitra C,  Coop G,  Przeworski M,  Paabo S,  Schoneberg T.  The rise and fall of the chemoattractant receptor GPR33,  J Biol Chem ,  2005, vol.  280 (pg.  31068- 31075) Google Scholar CrossRef Search ADS PubMed  Rozas J,  Rozas R.  DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis,  Comput Appl Biosci ,  1997, vol.  13 (pg.  307- 311) Google Scholar PubMed  Sabeti PC,  Reich DE,  Higgins JM, et al.  ,  (17 co-authors).  Detecting recent positive selection in the human genome from haplotype structure,  Nature ,  2002, vol.  419 (pg.  832- 837) Google Scholar CrossRef Search ADS PubMed  Schenkels LC,  Walgreen-Weterings E,  Oomen LC,  Bolscher JG,  Veerman EC,  Nieuw Amerongen AV.  
In vivo binding of the salivary glycoprotein EP-GP (identical to GCDFP-15) to oral and non-oral bacteria detection and identification of EP-GP binding species,  Biol Chem ,  1997, vol.  378 (pg.  83- 88) Google Scholar CrossRef Search ADS PubMed  Sebat J,  Lakshmi B,  Troge J, et al.  ,  (21 co-authors).  Large-scale copy number polymorphism in the human genome,  Science ,  2004, vol.  305 (pg.  525- 528) Google Scholar CrossRef Search ADS PubMed  Seixas S,  Ferrand N,  Rocha J.  Microsatellite variation and evolution of the human Duffy blood group polymorphism,  Mol Biol Evol ,  2002, vol.  19 (pg.  1802- 1806) Google Scholar CrossRef Search ADS PubMed  Seixas S,  Garcia O,  Trovoada MJ,  Santos MT,  Amorim A,  Rocha J.  Patterns of haplotype diversity within the serpin gene cluster at 14q32.1: insights into the natural history of the alpha1-antitrypsin polymorphism,  Hum Genet ,  2001, vol.  108 (pg.  20- 30) Google Scholar CrossRef Search ADS PubMed  Serre D,  Nadon R,  Hudson TJ.  Large-scale recombination rate patterns are conserved among human populations,  Genome Res ,  2005, vol.  15 (pg.  1547- 1552) Google Scholar CrossRef Search ADS PubMed  Shapiro L,  Pott GB,  Ralston AH.  Alpha-1-antitrypsin inhibits human immunodeficiency virus type 1,  FASEB J ,  2001, vol.  15 (pg.  115- 122) Google Scholar CrossRef Search ADS PubMed  Sharp AJ,  Locke DP,  McGrath SD, et al.  ,  (14 co-authors).  Segmental duplications and copy-number variation in the human genome,  Am J Hum Genet ,  2005, vol.  77 (pg.  78- 88) Google Scholar CrossRef Search ADS PubMed  Smith CL,  Power SG,  Hammond GL.  A Leu–His substitution at residue 93 in human corticosteroid binding globulin results in reduced affinity for cortisol,  J Steroid Biochem Mol Biol ,  1992, vol.  42 (pg.  671- 676) Google Scholar CrossRef Search ADS PubMed  Stajich JE,  Hahn MW.  Disentangling the effects of demography and selection in human history,  Mol Biol Evol ,  2005, vol.  22 (pg.  
63- 73) Google Scholar CrossRef Search ADS PubMed  Stein PE,  Carrell RW.  What do dysfunctional serpins tell us about molecular mobility and disease?,  Nat Struct Biol ,  1995, vol.  2 (pg.  96- 113) Google Scholar CrossRef Search ADS PubMed  Stephens M,  Donnelly P.  A comparison of bayesian methods for haplotype reconstruction from population genotype data,  Am J Hum Genet ,  2003, vol.  73 (pg.  1162- 1169) Google Scholar CrossRef Search ADS PubMed  Stephens M,  Smith NJ,  Donnelly P.  A new statistical method for haplotype reconstruction from population data,  Am J Hum Genet ,  2001, vol.  68 (pg.  978- 989) Google Scholar CrossRef Search ADS PubMed  Tajima F.  Statistical method for testing the neutral mutation hypothesis by DNA polymorphism,  Genetics ,  1989, vol.  123 (pg.  585- 595) Google Scholar PubMed  Teshima KM,  Coop G,  Przeworski M.  How reliable are empirical genomic scans for selective sweeps?,  Genome Res ,  2006, vol.  16 (pg.  702- 712) Google Scholar CrossRef Search ADS PubMed  Tishkoff SA,  Varkonyi R,  Cahinhinan N, et al.  ,  (17 co-authors).  Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance,  Science ,  2001, vol.  293 (pg.  455- 462) Google Scholar CrossRef Search ADS PubMed  Tomas G,  Seco L,  Seixas S,  Faustino P,  Lavinha J,  Rocha J.  The peopling of Sao Tome (Gulf of Guinea): origins of slave settlers and admixture with the Portuguese,  Hum Biol ,  2002, vol.  74 (pg.  397- 411) Google Scholar CrossRef Search ADS PubMed  Torpy DJ,  Bachmann AW,  Gartside M,  Grice JE,  Harris JM,  Clifton P,  Easteal S,  Jackson RV,  Whitworth JA.  Association between chronic fatigue syndrome and the corticosteroid-binding globulin gene ALA SER224 polymorphism,  Endocr Res ,  2004, vol.  30 (pg.  417- 429) Google Scholar CrossRef Search ADS PubMed  Tuzun E,  Sharp AJ,  Bailey JA, et al.  ,  (12 co-authors).  
Fine-scale structural variation of the human genome, Nat Genet, 2005, vol. 37 (pg. 727-732). van Gent D, Sharp P, Morgan K, Kalsheker N. Serpins: structure, function and molecular evolution, Int J Biochem Cell Biol, 2003, vol. 35 (pg. 1536-1547). Voight BF, Adams AM, Frisse LA, Qian Y, Hudson RR, Di Rienzo A. Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes, Proc Natl Acad Sci, 2005, vol. 102 (pg. 18508-18513). Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome, PLoS Biol, 2006, vol. 4 pg. e72. Wall JD, Przeworski M. When did the human population size start increasing? Genetics, 2000, vol. 155 (pg. 1865-1874). Wang X, Grus WE, Zhang J. Gene losses during human origins, PLoS Biol, 2006, vol. 4 pg. e52. Watterson GA. On the number of segregating sites in genetical models without recombination, Theor Popul Biol, 1975, vol. 7 (pg. 256-276). [WHO] World Health Organization. Alpha 1-antitrypsin deficiency: memorandum from a WHO meeting, Bull W H O, 1997, vol. 75 (pg. 397-415). Xue Y, Daly A, Yngvadottir B, et al. (14 co-authors). Spread of an inactive form of caspase-12 in humans is due to recent positive selection, Am J Hum Genet, 2006, vol. 78 (pg. 659-670). © The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
TI - Sequence Diversity at the Proximal 14q32.1 SERPIN Subcluster: Evidence for Natural Selection Favoring the Pseudogenization of SERPINA2 JF - Molecular Biology and Evolution DO - 10.1093/molbev/msl187 DA - 2006-11-29 UR - https://www.deepdyve.com/lp/oxford-university-press/sequence-diversity-at-the-proximal-14q32-1-serpin-subcluster-evidence-GYFwLU4cz0 SP - 587 EP - 598 VL - 24 IS - 2 DP - DeepDyve ER -