Phylogenomic Investigation of CR1 LINE Diversity in Reptiles

Abstract

It is unlikely that taxonomically diverse phylogenetic studies will be completed in the near future for nonmodel organisms on a whole-genome basis. However, one approach to advancing the field of “phylogenomics” is to estimate the structure of poorly known genomes by mining libraries of clones from suites of taxa, rather than from single species. The present analysis adopts this approach by taking advantage of megabase-scale end-sequence scanning of reptilian genomic clones to characterize diversity of CR1-like LINEs, the dominant family of transposable elements (TEs) in the sister group of mammals. As such, it helps close an important gap in the literature on the molecular systematics and evolution of retroelements in nonavian reptiles. Results from aligning more than 14 Mb of sequence from the American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta), Bahamian green anole (Anolis smaragdinus), tuatara (Sphenodon punctatus), emu (Dromaius novaehollandiae), and zebra finch (Taeniopygia guttata) against a comprehensive library of ∼3000 TE-encoding peptides reflect an increasing abundance of LINE and non-long-terminal-repeat (non-LTR) retrotransposon repeat types with the age of common ancestry among exemplar reptilian clades. The hypothesis that repeat diversity is correlated with basal metabolic rate was tested using comparative methods and a significant nonlinear relationship was indicated. This analysis suggests that the age of divergence between an exemplary clade and its sister group as well as metabolic correlates should be considered in addition to genome size in explaining patterns of retroelement diversity.
The first phylogenetic analysis of the largely unexplored chicken repeat 1 (CR1) 3′ reverse transcriptase (RT) conserved domains 8 and 9 in nonavian reptiles reveals a pattern of multiple lineages with variable branch lengths, suggesting the presence of both old and young elements and the existence of several distinct well-supported clades not apparent from previous characterization of CR1 subfamily structure in birds and the turtle. This mode of CR1 evolution contrasts with historical patterns of LINE 1 diversification in mammals and hints toward the existence of a rich but still largely unexplored diversity of nonavian retroelements of importance to advancing both comparative vertebrate genomics and amniote systematics.

Keywords: Amniote, CR1 LINE, phylogenomics, Reptilia, retroelement, RT domain

The Advent of Reptilian Phylogenomics

The repetitive landscape of eukaryotic genomes has emerged as an important area of recent “phylogenomic” research in light of both the role transposable elements (TEs) play in generating genomic diversity (Batzer and Deininger, 2001; Kazazian, 2004) and their exceptional value as systematic characters (Okada et al., 2004; Shedlock and Okada, 2000; Shedlock et al., 2004). Comparative analysis of vertebrate whole-genome sequence assemblies has demonstrated that while the number of genes has apparently remained relatively stable over ∼450 Myr of vertebrate evolution, repetitive DNA elements have undergone a much more dynamic evolutionary history (ICGSC, 2004; Jaillon et al., 2004; Lander et al., 2001; RGSPC, 2004; Waterston et al., 2002). The importance of repetitive elements in modulating the nearly fivefold range of genome size among living amniotes is becoming increasingly apparent from comparative investigations (Kazazian, 2004; Primmer et al., 1997; Shedlock et al., 2006a; Wicker et al., 2004).
For example, analysis of the chicken genome has suggested that a massive loss of mobile interspersed repeats is likely responsible for the relatively small size of avian genomes (ICGSC, 2004), whereas the proliferation of both transposable elements and nonmobile simple sequence repeats (SSRs) over the last ∼75 million years has significantly accelerated divergence of rodent genomes relative to our own (ICGSC, 2004; Waterston et al., 2002). However, interpretation of such chicken-mammal comparisons remains tenuous in the absence of detailed information for nonavian reptile genome structure. Despite the completion of the chicken genome assembly, the near absence of genome-scale information on nonavian reptiles precludes our ability to infer how mammals diverged from the common amniote ancestor. The approximately fourfold difference in divergence time between fish (e.g., Fugu) and humans versus mouse and humans often makes genome comparisons within each of these species pairs either too conserved or too distant to effectively analyze many functional sequences of interest (Elgar et al., 1999). Likewise, the low-resolution taxon sampling in vertebrates thus far limits our ability to make biologically meaningful inferences about fundamental questions in vertebrate evolution (Garland and Adolph, 1994) and makes pairwise comparisons of species the only viable approach to studying genome evolution (Hughes and Hughes, 1995).

Genomic Diversity and the Reptilian Repetitive Landscape

The majority of vertebrate genomes are comprised of mobile, repetitive elements and their inactive, “fossilized” remains that not only create distinctive profiles of genome structure but can provide landmarks of species phylogeny (Shedlock and Okada, 2000; Smit, 1999).
Class I transposable elements (TEs) comprise nearly half of human chromosomal DNA (Lander et al., 2001) and among vertebrates are dominated by non-long-terminal-repeat (non-LTR) short and long interspersed elements (SINEs and LINEs) (Malik et al., 1999; Weiner et al., 1986) that rely on a copy-and-paste mechanism termed retrotransposition, which utilizes an RNA intermediate for amplification and movement about the genome (Kajikawa and Okada, 2002; Luan et al., 1993). The chicken genome appears to possess exclusively chicken repeat 1 (CR1) non-LTR LINEs (Burch et al., 1993; Vandergon and Reitman, 1994; Wicker et al., 2004) and a depauperate number of CR1-related SINEs that share conserved core sequence blocks with mammalian-interspersed repeats (MIRs) (Gilbert and Labuda, 1999; ICGSC, 2004; Smit and Riggs, 1995; Watanabe et al., 2006). CR1-like and MIR-like repeats are relatively ancient TEs that have been detected across all vertebrate classes (Kajikawa et al., 1997; Vandergon and Reitman, 1994). The preliminary picture of interspersed repeats in nonavian reptile genomes seems to be considerably more complex, with both ancient CR1s present as well as lineage-specific elements (Kajikawa et al., 1997; Lovsin et al., 2001; Piskurek et al., 2006; Sasaki et al., 2004). Reptilia, which of course includes all birds, are far more taxonomically diverse than their sister group, mammals (∼17,000 versus ∼4500 species) and also exhibit a remarkable range of variation in morphological, developmental, reproductive, and chromosomal traits. We know from the chicken genome assembly (ICGSC, 2004) that similar numbers of genes exist in birds as in mammals, but there are presently insufficient data to confirm this situation in nonavian reptiles. Fewer retroelements (e.g., CR1), microsatellites, and intergenic and intronic DNA likely account in part for the reduced genome sizes apparent in some reptilian lineages (Shedlock et al., 2006a; Waltari and Edwards, 2002).
It has also been suggested that intron and genome size may be correlated with basal metabolic rate and associated selection for cell size (Hughes and Piontkivska, 2005; Hughes and Hughes, 1995), but this trend is upheld only in specific clades across vertebrates (Waltari and Edwards, 2002). A complete picture of the diversity and evolutionary dynamics of reptilian TEs has thus remained impossible to infer accurately from available bird, mammal, and fish DNA sequences and will be central to establishing an accurate model for genomic diversification since our amniote common ancestor lived some 310 million years ago (Mya). In the present investigation of reptilian lineages, results on the abundance, distribution, and phylogenetic structure of retroelements are placed into a comparative evolutionary perspective that could only be achieved through examination of large-scale sequence data previously unavailable for the sister taxa of both birds and mammals. Although it is unlikely that taxonomically diverse phylogenetic studies will be completed in the near future for non-model organisms on a whole-genome basis, one approach to advancing the field of phylogenomics is to estimate the structure of poorly known genomes by mining libraries of clones from suites of taxa, rather than from single species (Edwards et al., 2005; Shedlock et al., 2006b). The present analysis adopts this approach by taking advantage of megabase-scale end-sequence scanning of reptilian genomic clones to characterize diversity of CR1-like LINEs, the dominant family of TEs in the sister group of mammals, and thereby helps close an important gap in the literature on the molecular systematics and evolution of retroelements in eukaryotes. 
Target Species and Phylogenomic Resources

The American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta), Bahamian green anole (Anolis smaragdinus), and tuatara (Sphenodon punctatus) are four vertebrates that each represent a major nonavian reptilian clade and thus make exemplary species for phylogenomic research on the repetitive landscape in the sister group of mammals. Moreover, bacterial artificial chromosome (BAC) libraries for these and other reptile species have been produced and are accessible through the Joint Genome Institute (http://evogen.jgi.doe.gov/second_levels/BACs/Our_libraries.html). These resources have helped generate support for an initiative to sequence the first reptile genome from Anolis carolinensis (see NHGRI Genome Sequencing Proposals, http://www.genome.gov; also http://www.reptilegenome.com). Genomic clone analysis for these four exemplary reptile species can bring a considerable amount of new molecular data to bear on a host of important research problems in vertebrate biology and medicine. Although the complete genome assembly and details of repeat structure have now been published for the chicken (ICGSC, 2004; Wicker et al., 2004), it is valuable to compare the chicken model with multimegabase BAC-clone sequences recently made available for a phylogenetically basal ratite bird, the emu (Dromaius novaehollandiae), and a derived passerine bird, the zebra finch (Taeniopygia guttata).

Materials and Methods

Genbank accession numbers for all primary sequences analyzed in this study are listed in Supplementary Materials (http://www.systematicbiology.org). More than 14 megabases (Mb) of reptile BAC sequence were derived from a total of 8638 nonoverlapping paired BAC- and plasmid-end reads produced as part of a separate investigation of amniote genomics (Shedlock et al., 2006a) and compiled from publicly available databases of BAC-clone sequences (http://www.ncbi.nlm.nih.gov).
Reference sequences of CR1 reverse transcriptase coding domains were obtained from Repbase (Jurka et al., 2005; http://www.girinst.org/repbase/index.html) and the literature (ICGSC, 2004; Kajikawa et al., 1997; Malik et al., 1999; Sasaki et al., 2004; Silva and Burch, 1989; Vandergon and Reitman, 1994; Wicker et al., 2004). The reptile sequence was surveyed for local alignments to a comprehensive database of transposable element encoded proteins containing ∼3000 TE-derived peptides (∼2.2 million amino acids) using the program RepeatProteinMasker v. 3.1.5 (Smit, 2006). Output tables from RepeatProteinMasker queries are listed in Supplementary Materials online at http://www.systematicbiology.org. Masked sequences for protein hits with values of ≤ e−10 probability of random alignment were sorted by repeat type and subjected to pairwise BLASTn and tBLASTx alignments (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi?0) to the published 3' reverse transcriptase (RT) ORF-2 domain sequence of the CR1 LINE element for chicken, Gallus gallus (Burch et al., 1993), and the PsCR1 LINE element sequence for side-necked turtle, Platemys spixii (Kajikawa et al., 1997). The 3' RT domain of individual LINE elements was chosen for building a data matrix of orthologous nucleotide sequences because CR1s are commonly truncated upon insertion, with the truncation extending a variable 5' distance from a common 3' end. Overlapping pairwise alignments of newly isolated reptile CR1 LINE sequences with published 3' RT gene reference sequences were compiled and aligned simultaneously in ClustalW (Thompson et al., 1994), then manually edited for length variation and gaps using MacClade v. 4.06 (Maddison and Maddison, 2000) prior to phylogenetic analysis. The matrix has been submitted to TreeBASE (http://www.treebase.org).
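The filtering and sorting step described above can be illustrated with a minimal Python sketch. It assumes the protein hits have already been parsed into simple records; the field names (`clone`, `repeat_type`, `evalue`), the example records, and the function name are hypothetical and do not reflect the actual RepeatProteinMasker output format. Only the cutoff of e−10 is taken from the text, and it is read here as 1e-10, which is itself an assumption about the notation.

```python
from collections import defaultdict

# Cutoff stated in the text (<= e-10), interpreted here as 1e-10 (assumption).
E_VALUE_CUTOFF = 1e-10


def sort_hits_by_repeat_type(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose e-value is at or below the cutoff, grouped by repeat type."""
    grouped = defaultdict(list)
    for hit in hits:
        if hit["evalue"] <= cutoff:
            grouped[hit["repeat_type"]].append(hit)
    return dict(grouped)


# Hypothetical records for illustration only; not data from the study.
example_hits = [
    {"clone": "clone_A", "repeat_type": "LINE/CR1", "evalue": 3e-25},
    {"clone": "clone_B", "repeat_type": "LINE/L2", "evalue": 2e-12},
    {"clone": "clone_C", "repeat_type": "LINE/CR1", "evalue": 5e-4},
    {"clone": "clone_D", "repeat_type": "LTR/Gypsy", "evalue": 1e-10},
]

kept = sort_hits_by_repeat_type(example_hits)
for repeat_type in sorted(kept):
    print(repeat_type, len(kept[repeat_type]))
```

Grouping by repeat type before any downstream alignment keeps the CR1 hits separate from other LINE and LTR classes, which mirrors the order of operations described in the text.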
Amino acid translations from tBLASTx and MacClade were considered in all three reading frames for forward and reverse DNA strands and compiled relative to the published alignment for conserved RT domains among eukaryotes (Malik et al., 1999; Xiong and Eickbush, 1990). Phylogeny of reptilian CR1 RT sequences was inferred by Bayesian analysis of the nucleotide data matrix using MrBayes (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003) with 10 million generations (10% burn-in component) run under a general time-reversible evolutionary model, including an estimated proportion of invariable sites and gamma-shaped distribution of rate variation across sites. Genetic distances of aligned RT domain amino acid sequences and phylogenetic trees generated by MrBayes were evaluated using PAUP* v. 4.0 (Swofford, 1999). Genome size statistics were obtained from Gregory (2005). Phylogenetically independent contrasts were employed using the program PDAP (Garland et al., 1993; Garland and Ives, 2000) as implemented in the Mesquite v. 1.06 software platform (Maddison and Maddison, 2005) to test for correlations between lineage-specific repeat diversity and mass-corrected basal metabolic rate (BMR). A generalized least squares (GLS) model was also employed as implemented in the program Continuous (Pagel, 1999) to test for correlated evolution between this pair of characters. Natural log transformations of data were used to evaluate the linearity of the relationship. A likelihood-ratio test was used to evaluate the strength of the correlation based on the regression generated by Continuous. Scaled BMR values for representative reptilian species in units of kilocalories/gram-day were obtained from data published in Waltari and Edwards (2002) and Nagy et al. (1999) using a scaling factor of 0.75 to accommodate rates of change in metabolism with change in body mass.
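For readers unfamiliar with phylogenetically independent contrasts, the following minimal Python sketch shows the general calculation for a fully bifurcating tree with branch lengths. It is not the PDAP or Mesquite implementation used in the study. The function names, the three-tip tree, and the trait values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def independent_contrasts(node, trait):
    """Compute standardized contrasts for a binary tree.

    A tip is ("name", branch_length); an internal node is
    (left_subtree, right_subtree, branch_length).
    Returns (node_value, adjusted_branch_length, contrasts).
    """
    if len(node) == 2:  # tip
        name, length = node
        return trait[name], length, []
    left, right, length = node
    x1, v1, c1 = independent_contrasts(left, trait)
    x2, v2, c2 = independent_contrasts(right, trait)
    # Standardized contrast between the two daughter lineages.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Branch-length-weighted estimate of the trait at this node.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the stem to account for uncertainty in the node estimate.
    adjusted = length + (v1 * v2) / (v1 + v2)
    return value, adjusted, c1 + c2 + [contrast]


def contrast_correlation(cx, cy):
    """Correlation of two sets of contrasts, computed through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    sxx = sum(a * a for a in cx)
    syy = sum(b * b for b in cy)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical three-tip tree and trait values (not data from the study).
toy_tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
x = {"A": 1.0, "B": 3.0, "C": 5.0}
y = {"A": 2.0, "B": 6.0, "C": 10.0}

_, _, cx = independent_contrasts(toy_tree, x)
_, _, cy = independent_contrasts(toy_tree, y)
r = contrast_correlation(cx, cy)
print(len(cx), r)
```

A tree with n tips yields n − 1 contrasts, so a six-species comparison of the kind described here rests on only five independent data points per trait.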
The index of repeat diversity was based on the total number of different types of LINE and LTR repeats detected with RepeatProteinMasker (Smit, 2006) per megabase of sequence for each exemplar species examined. The phylogenetic tree of exemplar reptilian clades used to calculate regressions is summarized in Figure 1. Branch lengths were calibrated based on divergence time estimates of Kumar and Hedges (1998). All input values and output results from the above comparative methods analyses are presented in Supplementary Materials (http://www.systematicbiology.org).

Figure 1. Chart of the frequency of repeat types detected per megabase of reptile sequence examined per species. Repeat classifications were determined with RepeatProteinMasker v. 3.1.5 (Smit, 2006) and are based only on alignments with random probability ≤ e−10. The phylogenetic relationships of major reptilian clades are indicated by the tree insert. Relationships follow Lee (2001) but place turtles near the archosaurs based on recent molecular studies cited in the text. Divergence time estimates scaled in Myr that were used to calibrate the tree for comparative analyses are indicated at nodes. Estimates are taken from Kumar and Hedges (1998).
Results and Discussion

Patterns of Retrotransposon Diversity Based on Comparison of TE-Derived Peptides

A well-known ascertainment bias exists for screening reference DNA sequence databases for retroelements between relatively divergent species, such that only the most conserved subset of target elements is recovered. Furthermore, numerous false positives appear from spurious alignments (e.g., BLASTn, DNA RepeatMasker) to short noncoding TEs such as SINEs and retroviral-like elements and to low-complexity and tandem repeats. This problem is greatly simplified when restricting alignments to amino acids of TE-derived peptides such as those in the endonuclease (EN) and reverse transcriptase (RT) domains encoded by LINEs. Such targeted queries are also valuable for reducing spurious matches in BLASTx-like searches against the comprehensive annotated protein database. TE-derived amino acid alignments are particularly useful for classifying autonomous DNA- and RNA-based retroelements in the absence of a reference sequence library for species under investigation, as is the case for the nonavian reptiles investigated here. Moreover, RT and EN domain sequences have been used as a standard character set for inferring phylogeny and defining subfamily structure of LINEs in eukaryotes (Malik et al., 1999; Xiong and Eickbush, 1990). False positives from TE-derived functional genes are expected to be negligible in the present analysis of BAC- and plasmid-clone sequence data based on random sampling of the majority of clones and the small fraction of each genome surveyed. It is recognized, however, that the detection of repeats in the present survey may be biased slightly upward or downward from true genome-wide densities due to both experimental and informatics reasons. The results of a survey of the different types of LINE and LTR retrotransposons detected in more than 14 Mb of genomic-clone sequence from six reptile target species using RepeatProteinMasker are summarized in Table 1.
Here we refer to retroelement diversity based on the numbers of different repeat types detected per megabase of sequence with less than e−10 probability of random alignment to the reference library of peptide sequences. A number of class II DNA–based transposons were also detected but have been excluded from this summary focused on class I CR1–like LINE retroelements. A clear decreasing trend in LINE/LTR family diversity is apparent in archosaurs relative to squamate lizards and the turtle, with only CR1 LINEs present in birds. Accounting for uneven genome size and variation in the amount of sequence surveyed per species examined reflects an increasing trend in repeat diversity with the relative age of divergence times indicated for major reptilian clades in the phylogram insert in Figure 1. Results of both phylogenetically independent contrasts (Garland et al., 1993; Garland and Ives, 2000) and GLS regression (Pagel, 1999) analyses confirmed significant correlated evolution between the natural log of total repeat types detected per species and BMR (P < 0.0061; R² = 0.941; likelihood ratio = 14.18). The linear correlation was not significant (P < 0.297; R² = 0.345), indicating a nonlinear relationship between these two traits. The strength of the correlations obtained may be slightly elevated by the limited taxonomic sampling of the present analysis, which in the absence of data for extinct clades creates an internode distance between avian and nonavian taxa almost 10 times larger than distances between nonavian sister lineages. Significant correlations based on more inclusive sampling have been previously reported between BMR and genome size among amniotes and archosaurs (Olmo, 2003; Waltari and Edwards, 2002), although the significance of this relationship was not upheld when examining only avian species (Waltari and Edwards, 2002).
Overall, results from the use of comparative methods here suggest that in addition to considerations of genome size, both the age of divergence of an exemplary clade from its sister group and physiological correlates linked to metabolism may be influencing the diversity of retroelements in major reptilian clades. The degree to which each of these forces is shaping the complexity of the repetitive landscape along particular branches of the amniote tree remains difficult to quantify in the absence of more extensive and detailed comparative information on the structure and composition of reptilian genomes.

Table 1. Summary of repeat types within LINE/LTR families detected by alignment with transposable element encoded protein sequences using RepeatProteinMasker. A hyphen indicates that no hits were detected.

LINE/LTR                   Tuatara  Anole  Turtle  Alligator  Emu   Zebra Finch
Genome size (picograms)    5.0      2.3    2.6     2.5        1.6   1.2
Mb of sequence surveyed    1.5      1.6    2.5     2.5        2.3   3.7
Raw total protein hits     1048     421    987     1076       128   451
Total hits ≤ e−10 per Mb   87.3     42.5   32.8    20.4       8.7   4.3
Total hits ≤ e−10          131      68     82      51         20    16
CR1                        25       18     24      23         18    16
Dong-R4                    -        2      -       -          -     -
Jockey                     4        1      -       1          -     -
L1                         29       7      -       -          -     -
L1(Tx1)                    2        -      1       3          -     -
L2                         22       10     6       -          -     -
LOA                        1        -      -       -          -     -
Penelope                   2        1      3       1          -     -
R2                         -        -      1       -          -     -
RTE-BovB                   11       5      5       3          -     -
RTE-RTE                    2        -      2       -          -     -
LTR/Copia                  3        8      1       -          -     -
LTR/DIRS1                  8        2      5       -          -     -
LTR/ERV1                   1        -      18      14         -     -
LTR/ERVL                   -        -      -       1          2     -
LTR/Gypsy                  15       14     15      4          -     -
LTR/Gypsy (Gmr1)           -        -      1       1          -     -
LTR/Ngaro                  6        -      -       -          -     -

The exclusive presence of CR1 non-LTR LINEs in emu and zebra finch is predicted by the flat, largely inactive retroposon landscape of the chicken genome assembly (ICGSC, 2004; Wicker et al., 2004) and initial investigation of CR1s in other avian species (Watanabe et al., 2006). Given the near extinction of CR1 activity observed in the chicken, it is noteworthy that an elevated diversity of CR1 types was found in the basal ratite, emu, as well as in the avian sister group represented by the alligator. The large diversity of repeat types in squamates and turtles relative to archosaurs is consistent with previous detection of Bov-B–derived SINEs in squamates (Piskurek et al., 2006), PsCR1-derived polIII SINEs in turtles (Sasaki et al., 2004), and a diverse assemblage of DeuSINEs of apparent mixed ancestry detected in both birds and mammals (Nishihara et al., 2006). Many of these diverse SINE groups exhibit 3' tail or central domain sequence homology with ancient LINE-2 or CR1-like lineages that likely predate the existence of the common amniote ancestor around 310 Mya (Kajikawa et al., 1997; Nishihara et al., 2006), whereas some SINEs detected in lizards, snakes, and turtles, such as those apparently derived from Bov-B and non-PsCR1 LINEs, exhibit sequence divergence profiles that suggest they are probably ≤200 Myr of age (Piskurek et al., 2006; Sasaki et al., 2004).
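The per-megabase totals in Table 1 follow directly from dividing the number of hits at ≤ e−10 by the amount of sequence surveyed. The short Python sketch below reproduces that arithmetic from the values printed in the table; the variable names are illustrative only.

```python
# Values copied from Table 1 (Mb surveyed and total hits at <= e-10).
mb_surveyed = {
    "Tuatara": 1.5, "Anole": 1.6, "Turtle": 2.5,
    "Alligator": 2.5, "Emu": 2.3, "Zebra Finch": 3.7,
}
hits_below_cutoff = {
    "Tuatara": 131, "Anole": 68, "Turtle": 82,
    "Alligator": 51, "Emu": 20, "Zebra Finch": 16,
}

# Hits per megabase of surveyed sequence, rounded to one decimal as in the table.
hits_per_mb = {
    species: round(hits_below_cutoff[species] / mb_surveyed[species], 1)
    for species in mb_surveyed
}

for species, value in hits_per_mb.items():
    print(f"{species}: {value}")

# Fold difference between the two extremes of the table.
fold_tuatara_vs_finch = hits_per_mb["Tuatara"] / hits_per_mb["Zebra Finch"]
print(round(fold_tuatara_vs_finch, 1))
```

Normalizing by surveyed sequence matters here because the zebra finch was sampled most deeply (3.7 Mb) yet yields the fewest hits, so raw counts alone would understate the contrast with tuatara.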
Alligator and turtle genome sizes are roughly one-third smaller than human and slightly more than twice the size of the passerine zebra finch, which is similar to chicken but smaller than the more phylogenetically basal ratite emu (Table 1). Anolis has a genome size on the order of 12% smaller than the alligator and painted turtle and is close to the average size across nonavian reptiles. It is thus interesting that Anolis has a higher diversity of detected repeats per megabase than either alligator or turtle. Sphenodon, on the other hand, is an ancient species and has roughly twice the haploid DNA content of the other nonavian species examined. It exhibits more than twice the number of repeat types per megabase than Anolis and over twenty times more than the zebra finch. It must be emphasized that the phylogenetic position of turtles remains uncertain. Establishing the phylogenetic position of turtles with better certainty will have substantial impact on the interpretation of character change through the course of vertebrate genome evolution, including retroposon dynamics. By collecting a large sampling of gene and noncoding sequences from these lineages, phylogenomic research promises to shed considerable light on this issue. Several recent molecular analyses (Hedges and Poling, 1999; Janke et al., 2001; Zardoya and Meyer, 1998) and one from morphology (Rieppel and deBraga, 1996) suggested that turtles are not the sister group of, but are nested within, the diapsids, in some cases as the sister group to birds and crocodilians, in others as sister to crocodilians. Subsequent analyses have revealed weaknesses in both molecular and morphological characters linking turtles to various groups (Cao et al., 2000; Lee, 2001; Mindell et al., 1999; Wilkinson et al., 1997), and in light of this, turtles are shown as sister to archosaurs on the tree of relationships in Figure 1.
The intermediate abundance of detected repeat types in the turtle (Table 1) maps consistently with the relationships in Figure 1 if we assume a gradual loss of repeat diversity took place from a relatively large, repeat-rich common amniote ancestral genomic condition some 310 Mya (Shedlock et al., 2006a).

Profiles of Conserved CR1 RT Amino Acid Domains

CR1 elements were originally described by Stumph et al. (1981) based on chicken-mammal homologous sequence comparisons, and the family was precisely delineated by Silva and Burch (1989) based on characteristic 3' target site duplications of the octamer NATTCTRT. Subsequently, 11 major subfamilies of CR1s have been described in the chicken alone (ICGSC, 2004) and CR1-like elements have been detected in all classes of vertebrates and exhibit a remarkable phylum-wide taxonomic distribution (Kajikawa et al., 1997; Vandergon and Reitman, 1994; Watanabe et al., 2006). A cartoon of the generic full-length CR1 LINE structure is shown in Figure 2a. Intact, active, full-length elements are far less common in the genome than the vast majority of CR1 copies that experience variable length 5' truncation of ORF-1 products upon insertion via target-primed reverse transcription (TPRT; Luan et al., 1993) and are thus inactivated. The conserved 3' RT domains are more frequently detected due to the prevalence of 5' truncation and are the target region for comparative investigation in the present study. Although RT domains 1 to 7 are commonly used as reference sequence (Malik et al., 1999; Xiong and Eickbush, 1990), RT domains 8 and 9 have been given much less attention in the literature despite the detailed characterization of 3' untranslated region (UTR) homology between SINEs and partner LINEs that facilitates SINE nonautonomous amplification (Kajikawa and Okada, 2002; Ohshima et al., 1996).
Consequently, the short CR1 composite sequences of two squamates referenced in previous studies of CR1s (e.g., Anolis carolinensis, Genbank accession number L31503; Okinawan pit viper Trimeresurus flavoviridis, D13384; Kajikawa et al., 1997; Vandergon and Reitman, 1994) do not BLAST significantly to the conserved 3' RT domains of newly isolated reptile CR1s examined here.

Figure 2. (a) Cartoon of full-length CR1 LINE structure indicating typical length in kilobases, 5' promoter-like region, annotated domains for nucleic acid binding, endonuclease, and reverse transcriptase in the two open reading frames (ORFs) and 3' untranslated tail region (UTR). (b) Grayscale multiple protein sequence alignment of amino acid residues spanning conserved CR1 3' reverse transcriptase (RT) domains among eight reptile species. The matrix has 123 sites corresponding to the 3' ORF-2 product region boxed on the cartoon in (a) and includes a subset of 25 representative sequences including published data for chicken, the side-necked turtle, and human (ICGSC, 2004; Jurka et al., 2005; Kajikawa et al., 1997; Vandergon and Reitman, 1994; Wicker et al., 2004). The scale of sequence conservation at each site is indicated below the alignment. The most conserved sites are contained within domains 8 and 9 and annotated by asterisks and dots on top of the alignment. Numbers of residues in each sequence are listed on the right. Sequence names cross-reference with Genbank accessions listed in Supplementary Materials (http://www.systematicbiology.org); CH and HS sequences listed are registered in Repbase (Jurka et al., 2005) under the names shown. Two-letter taxon designations are listed below (b).
Figure 2b presents an alignment of amino acid residues spanning a portion of the 3' end of the CR1 ORF-2 region for eight species representing all major reptile clades, including published chicken CR1 subfamily sequences (ICGSC, 2004; Jurka et al., 2005; Vandergon and Reitman, 1994), the PsCR1 sequence (Kajikawa et al., 1997), and a CR1-like element found in the human genome (Jurka et al., 2005). The extent of variation at each of the sites in the matrix is indicated by the histogram below the alignment and clearly indicates peaks of sequence conservation within the two distinct RT domains 8 and 9.
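A site-wise conservation profile of the kind plotted beneath the alignment in Figure 2b can be approximated with a simple per-column score. The Python sketch below is one possible scoring scheme, assumed for illustration: the fraction of non-gap residues in each column that match the most common residue. It is not necessarily the measure used to draw the published histogram, and the toy alignment and function name are hypothetical.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per column, the fraction of non-gap residues equal to the most common residue."""
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


# Hypothetical aligned fragments of equal length (not sequences from the study).
toy_alignment = ["FADDT", "FADDL", "YAD-L", "FSDDL"]

scores = column_conservation(toy_alignment)
print(scores)
```

Columns scoring at or near 1.0 across all host species would correspond to the conservation peaks that mark RT domains 8 and 9 in the alignment.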
A table of pairwise distances for the 25 sequences in this alignment is included in Supplementary Materials. The mean character difference between host species is smallest for tuatara versus Anolis (0.14) and greatest for emu versus alligator (0.58), with moderate values observed between chicken subfamilies and nonavian species, including human (0.26–0.46). This range is similar to that observed among birds (0.29–0.41), but as much as 2.5 times greater than the difference among only chicken subfamilies (0.17). Although this protein sequence represents only a limited subsample of data, the overall pattern of variation in RT domain amino acid residues supports the idea that CR1 diversity in reptiles includes both young, lineage-specific elements and a significant number of ancient elements shared among host species that predate the amniote common ancestor.

Phylogenetic Structure among CR1 3' ORF-2 Sequences

The evolutionary pattern of reptilian CR1 inheritance was further evaluated by a phylogenetic analysis of 69 LINE element DNA sequences extending outward from the 3' RT core domains 8 and 9. The character matrix included 1046 sites and was subjected to 10 million generations of Bayesian inference as described in Materials and Methods. The distribution of log likelihoods for all trees is plotted in Figure S1 of Supplementary Materials and indicates that the 10% “burn-in” component completely captures all iterations prior to convergence. A phylogram illustrating branching patterns observed among CR1 element sequences is presented in Figure 3a, and the consensus tree of all 9000 trees with branch nodes supported by ≥ 50% posterior probability is shown in Figure 3b. The tree structure of relationships in Figure 3a shows multiple bifurcations of relatively short and long branch lengths within several distinct clades, such as those of zebra finch, emu, alligator, and turtle, suggesting the presence of multiple active elements of different age across the tree.
The topology largely parallels the expected phylogeny of amniotes in that the majority of avian species, including published chicken CR1 subfamilies, are derived in the tree to the exclusion of nonavian taxa. Several basal clades with high probabilities are apparent for turtle and alligator elements with Anolis and tuatara sequences forming a monophyletic group. The published PsCR1 sequence also clusters with a subgroup of painted turtle elements with 100% probability. The most obvious exception to this structure is a subset of 10 elements of mixed host species association that form two derived assemblages at the top of the tree with long branch lengths in Figure 3a and high posterior probability in Figure 3b. The repeat type based on RepeatProteinMasker classification of each of these sequences is indicated next to their taxon labels in Figure 3a and shows a pattern of mixed association with chicken subfamilies C, E, and Y and the PsCR1 turtle LINE. One possibility is that these elements have experienced an increased rate of sequence evolution that now distinguishes them as an evolutionarily distinct assemblage. However, the long branch lengths and mixed species and subfamily associations of this group caution that they may require additional data to properly classify and place accurately within a phylogenetic context.

Figure 3. (a) Phylogram illustrating branch length patterns generated by Bayesian analysis of 1046 aligned nucleotide positions among 68 reptilian CR1 3' RT sequences, rooted with data for Homo sapiens (LnL = −23,050.319). Parameter settings are listed in Materials and Methods. Two-letter taxon designations are listed below (b). Repeat classifications for taxa near the top of the tree are in parentheses. (b) Consensus of 9000 MrBayes trees generated for the same data set analyzed in (a) subsampled from 10 million iterations minus a 10% burn-in component. The distribution of likelihood values for all trees is plotted in Figure S1 in Supplementary Materials (http://www.systematicbiology.org). Posterior probabilities of relationships > 50% are indicated at nodes. After the suggestion of Malik et al. (1999), a set of novel clades with >70% support that do not group with known CR1 subfamily sequences are annotated with bold font and brackets. Am, Alligator mississippiensis; Sp, Sphenodon punctatus; As, Anolis smaragdinus; Cp, Chrysemys picta.

Two contrasting phylogenetic tree shapes are diagrammed in Figure 4, to illustrate alternative modes of retroelement evolution based on single versus multiple active source element models.
The branching pattern for CR1 element sequences analyzed here clearly reflects the pattern in Figure 4b, whereas the tree shape in Figure 4a has been observed for LINE 1 elements in mammals, where successive subfamilies have gone extinct shortly after diverging from a common lineage (Adey et al., 1994; Boissinot et al., 2000; Khan et al., 2006). Results of the present phylogenetic analysis reveal a bifurcating process of diversification from multiple active progenitors where young elements of recent common ancestry can be found together with more widely divergent lineages of much older age. This process has resulted in a diverse CR1 landscape in reptilian host genomes indicated by earlier studies of subfamily structure of mostly avian elements (Vandergon and Reitman, 1994) and elaborated here for major nonavian reptile clades. This is reflected in Figure 3 by highly supported distinct clades of nonavian CR1s that do not cluster with either chicken subfamilies or the PsCR1 sequence. Each of these new clades with greater than 70% statistical support represents a putative subfamily and has been named according to the suggestion of Malik et al. (1999) and annotated in Figure 3b by bold font and brackets as described in the figure legend.

Figure 4. Differences in tree shapes illustrated by contrasting modes of retroelement evolution. Tree in (a) illustrates pattern produced by persistence of a single, relatively old active lineage with rapid extinction of each young diverged lineage. Tree in (b) illustrates a pattern of multiple active source elements of various age, as reflected by both short and long branch lengths among different clades.

Relevance of CR1 Gene Trees to the Use of CR1s for Inferring Species Phylogeny

It is important to bear in mind the distinction between the genealogies of CR1 elements themselves and the use of insertion patterns of CR1 elements at independent orthologous loci present in different host genomes to infer species phylogeny. The phylogenetic analysis presented here includes CR1 sequences from many different loci and in effect produces a tree of multigene families that have been incompletely sampled in the host genomes examined, compounded by the limited taxon sampling of reptile species. Thus, the discordance in CR1 gene trees in Figure 3 should not be confused with an indication of species phylogeny. Given the long evolutionary history of CR1s and their diverse subfamily structure, it is expected that CR1 gene trees will show significant discordance with their host species trees, even though insertion patterns at orthologous CR1 loci among reptile species should still make excellent phylogenetic markers for advancing reptilian systematics (Shedlock et al., 2004). So although the topologies of gene trees of CR1 elements may not reliably indicate species relationships, they can provide valuable information about the age structure and life span of particular active CR1 subfamilies pertinent to the effective sampling of phylogenetically informative CR1s for orthologous LINE insertion studies. Unlike the persistence of only a single active lineage in mammalian LINE 1, multiple active subfamilies of reptilian CR1s should provide numerous sources of both older and younger elements that have proliferated in copy number during different periods of reptilian diversification and therefore can be matched to a variety of different systematic problems of interest at different taxonomic levels of divergence.
The high copy numbers of CR1s in reptilian genomes imply that there should also be a substantial density and diversity of partner CR1-derived SINE elements in reptile genomes that can add to the volume of potentially informative phylogenetic markers. Taken together, the variable ages of reptilian retroelements and their high copy numbers should bolster efforts to gather sufficient numbers of informative loci to tackle large, difficult systematic projects, such as the higher-order relationships of modern birds. In such cases, lineage sorting effects and short internode distances among major clades can make it impractical to find enough useful LINE and SINE markers to fully resolve branches of a cladogram (Shedlock et al., 2004). Given the phylogenetic structure of CR1 elements apparent from this and other surveys, the prospects for CR1 retroelements providing a wealth of valuable genetic markers to advance reptilian systematics seem promising. It is anticipated that the forthcoming genome sequence assemblies of Anolis and the zebra finch will greatly advance our understanding of reptilian CR1 evolution by providing two comprehensive annotated libraries of reference sequences from near opposite ends of the reptilian tree of life. Moreover, additional BAC-clone sequences of non-model but phylogenetically unique species such as Sphenodon, as well as highly derived forms such as the garter snake, Thamnophis sirtalis, will surely enrich our comparative understanding of yet unknown reptilian genomic diversity.

Acknowledgements

I would like to thank Miyako Fujiwara for careful technical assistance with sequence editing and alignment, Charles Chapus for critical computational and informatics support, Chris Organ for providing expertise with comparative methods analysis, and Corrie Saux Moreau for help with running MrBayes on the Harvard OEB Cluster. Two referees provided numerous helpful comments that significantly improved the manuscript.
This paper would not have been possible without the enthusiastic support of Scott Edwards, Roderic Page, Deborah Ciszek, members of the SBB Council, and the participants of the 2005 symposium on Genome Analysis and the Molecular Systematics of Retroelements held in Fairbanks, Alaska. Funding was provided in part by the Society of Systematic Biologists and by Harvard University.

References

Adey N. B., Schichman S. A., Graham D. K., Peterson S. N., Edgell M. H., Hutchison C. A. 3rd. Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences, Mol. Biol. Evol., 1994, vol. 11 (pg. 778-789)
Batzer M., Deininger P. L. Alu repeats and human genomic diversity, Nat. Rev. Genet., 2001, vol. 3 (pg. 370-379)
Boissinot S., Chevret P., Furano A. V. L1 (LINE-1) Retrotransposon evolution and amplification in recent human history, Mol. Biol. Evol., 2000, vol. 17 (pg. 915-928)
Burch J. B. E., Davis D. L., Haas N. B. Chicken repeat 1 elements contain a pol-like open reading frame and belong to the non-long terminal repeat class of retrotransposons, Proc. Natl. Acad. Sci. USA, 1993, vol. 90 (pg. 8199-8203)
Cao Y., Sorenson M. D., Kumazawa Y., Mindell D. P., Hasegawa M. Phylogenetic position of turtles among amniotes: Evidence from mitochondrial and nuclear genes, Gene, 2000, vol. 259 (pg. 139-148)
Edwards S. V., Jennings W. B., Shedlock A. M. Phylogenetics of modern birds in the era of genomics, Proc. R. Soc. Lond. B, 2005, vol. 272 (pg. 979-992)
Elgar G., Clark M. S., Meek S., Smith S., W. S. Generation and analysis of 25 Mb of genomic DNA from the pufferfish Fugu rubripes by sequence scanning, Genome Res., 1999, vol. 9 (pg. 960-971)
Garland T. Jr., Dickerman A. W., Janis C. M., Jones J. A. Phylogenetic analysis of covariance by computer simulation, Syst. Biol., 1993, vol. 42 (pg. 265-292)
Garland T. Jr., Adolph S. C. Why not to do two species comparative studies: Limitations on inferring adaptation, Physiol. Zool., 1994, vol. 67 (pg. 797-828)
Garland T. Jr., Ives R. A. Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods, Am. Nat., 2000, vol. 155 (pg. 346-364)
Gilbert N., Labuda D. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs, Proc. Natl. Acad. Sci. USA, 1999, vol. 96 (pg. 2869-2874)
Gregory T. R. Animal Genome Size Database, 2005. http://www.genomesize.com
Hedges S. B., Poling L. L. A molecular phylogeny of reptiles, Science, 1999, vol. 283 (pg. 998-1001)
Huelsenbeck J. P., Ronquist F. MrBayes: Bayesian inference of phylogeny, Bioinformatics, 2001, vol. 17 (pg. 754-755)
Hughes A., Piontkivska H. DNA repeat arrays in chicken and human genomes and the adaptive evolution of avian genome size, BMC Evol. Biol., 2005, vol. 5 (pg. 12)
Hughes A. L., Hughes M. K. Small genomes for better flyers, Nature, 1995, vol. 377 (pg. 391)
ICGSC, International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution, Nature, 2004, vol. 432 (pg. 695-716)
Jaillon O., Aury J. M., Brunet F., Petit J. L., Stange-Thomann N., Mauceli E., Bouneau L., Fischer C., Ozouf-Costaz C., Bernot A., et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype, Nature, 2004, vol. 431 (pg. 946-957)
Janke A., Erpenbeck D., Nilsson M., Arnason U. The mitochondrial genomes of the iguana (Iguana iguana) and the caiman (Caiman crocodylus): Implications for amniote phylogeny, Proc. R. Soc. Lond. B, 2001, vol. 268 (pg. 623-631)
Jurka J., Kapitonov V. V., Pavlicek A., Klownowski P., Kohany O., Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements, Cytogenet. Genome Res., 2005, vol. 110 (pg. 462-467). http://www.girinst.org/repbase/index.html
Kajikawa M., Ohshima K., Okada N. Determination of the entire sequence of turtle CR1: The first open reading frame of the turtle CR1 element encodes a protein with a novel zinc finger motif, Mol. Biol. Evol., 1997, vol. 14 (pg. 1206-1217)
Kajikawa M., Okada N. LINEs mobilize SINEs in the Eel through a shared 3' sequence, Cell, 2002, vol. 111 (pg. 433-444)
Kazazian H. H. J. Mobile elements: Drivers of genome evolution, Science, 2004, vol. 303 (pg. 1626-1632)
Khan H., Smit A., Boissinot S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates, Genome Res., 2006, vol. 16 (pg. 78-87)
Kumar S., Hedges B. A molecular timescale for vertebrate evolution, Nature, 1998, vol. 392 (pg. 917-920)
Lander E. S., Linton L. M., Birren B., Nusbaum C., Zody M. C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome, Nature, 2001, vol. 409 (pg. 860-921)
Lee M. S. Y. Molecules, morphology and the monophyly of diapsid reptiles, Contr. Zool., 2001, vol. 70 (pg. 1-22)
Lovsin N., Gubensek F., Kordi D. Evolutionary dynamics in a novel L2 clade of non-LTR retrotransposons in Deuterostomia, Mol. Biol. Evol., 2001, vol. 18 (pg. 2213-2224)
Luan D. D., Korman H., Jakubczak J. L., Eickbush T. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: A mechanism for non-LTR retrotransposition, Cell, 1993, vol. 72 (pg. 595-605)
Maddison W. P., Maddison D. R. MacClade 4: Interactive analysis of phylogeny and character evolution, 2000, Sunderland, Massachusetts, Sinauer Associates
Maddison W. P., Maddison D. R. Mesquite: A modular system for evolutionary analysis. Version 1.06, 2005. http://mesquiteproject.org
Malik H. S., Burke W. D., Eickbush T. The age and evolution of non-LTR retrotransposable elements, Mol. Biol. Evol., 1999, vol. 16 (pg. 793-805)
Mindell D. P., Sorenson M. D., Dimcheff D. E., Hasegawa M., Ast J. C., Yuri T. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes, Syst. Biol., 1999, vol. 48 (pg. 138-152)
Nagy K. A., Girard I. A., Brown T. K. Energetics of free-ranging mammals, reptiles, and birds, Annu. Rev. Nutr., 1999, vol. 19 (pg. 247-277)
Nishihara H., Smit A. F. A., Norihiro O. Functional noncoding sequences derived from SINEs in the mammalian genome, Genome Res., 2006, vol. 16 (pg. 864-874)
Ohshima K., Hamada M., T. Y., Okada N. The 3' ends of short interspersed repetitive elements are derived from the 3' ends of long interspersed repetitive elements, Mol. Cell. Biol., 1996, vol. 16 (pg. 3756-3764)
Okada N., Shedlock A. M., Nikaido M. Retroposon mapping in molecular systematics. In: Miller W. J., Capy P., editors. Mobile genetic elements, 2004, Totowa, New Jersey, Humana Press (pg. 189-226)
Olmo E. Reptiles: A group in transition in the evolution of genome size and the nucleotypic effect, Cytogenet. Genome Res., 2003, vol. 101 (pg. 166-171)
Pagel M. Inferring the historical patterns of biological evolution, Nature, 1999, vol. 401 (pg. 877-884)
Piskurek O., Austin C. C., Okada N. Sauria SINEs: Novel short interspersed transposable elements that are widespread in reptilian genomes, J. Mol. Evol., 2006, vol. 62 (pg. 630-644)
Primmer C. R., Raudsepp T., Chowdhary B. P., Moller A. P., Ellegren H. Low frequency of microsatellites in the avian genome, Genome Res., 1997, vol. 7 (pg. 471-482)
Reipell O., deBraga M. Turtles as diapsid reptiles, Nature, 1996 (pg. 384)
RGSPC, Rat Genome Sequencing Project Consortium. Genome sequence of the brown Norway rat yields insights into mammalian evolution, Nature, 2004, vol. 428 (pg. 493-521)
Ronquist F., Huelsenbeck J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models, Bioinformatics, 2003, vol. 19 (pg. 1572-1574)
Sasaki T., Takahashi K., Nikaido M., Miura S., Yasukawa Y., Okada N. First application of the SINE (short interspersed repetitive element) method to infer phylogenetic relationships in reptiles: An example from the turtle superfamily Testudinoidea, Mol. Biol. Evol., 2004, vol. 21 (pg. 705-715)
Shedlock A. M., Botka C. W., Zhao S., Shetty J., Zhang T., Liu J. S., Deschavanne P. J., Edwards S. V. Phylogenomics of non-avian reptiles and the structure of the ancestral amniote genome, Proc. Natl. Acad. Sci. USA, 2006, submitted
Shedlock A. M., Janes D., Edwards S. V. Amniote phylogenomics: Testing evolutionary hypotheses with BAC library scanning and targeted clone analysis of large-scale DNA sequences from reptiles. In: Murphy W. J., editor. Phylogenomics, 2006, Totowa, New Jersey, Humana Press, in press
Shedlock A. M., Okada N. SINE insertions: Powerful tools for molecular systematics, Bioessays, 2000, vol. 22 (pg. 148-160)
Shedlock A. M., Takahashi K., Okada N. SINEs of speciation: Tracking lineages with retroposons, Trends Ecol. Evol., 2004, vol. 19 (pg. 545-553)
Silva R., Burch J. B. Evidence that chicken CR1 elements represent a novel family of retroposons, Mol. Cell. Biol., 1989, vol. 9 (pg. 3563-3566)
Smit A. F. A. Interspersed repeats and other mementos of transposable elements in mammalian genomes, Curr. Opin. Genet. Dev., 1999, vol. 9 (pg. 657-663)
Smit A. F. A. RepeatMasker version 3.1.5, 2006. http://www.repeatmasker.org
Smit A. F. A., Riggs A. D. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation, Nucleic Acids Res., 1995, vol. 23 (pg. 98-102)
Stumph W. E., Kristo P., Tsai M. J., O'Malley B. W. A chicken middle-repetitive DNA sequence which shares homology with mammalian ubiquitous repeats, Nucleic Acids Res., 1981, vol. 9 (pg. 5383-5397)
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (*and other methods) v.4.0b, 1999, Sunderland, Massachusetts, Sinauer Associates
Thompson J., Higgins D., Gibson T. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res., 1994, vol. 22 (pg. 4673-4680)
Vandergon T. L., Reitman M. Evolution of chicken repeat 1 (CR1) elements: evidence for ancient subfamilies and multiple progenitors, Mol. Biol. Evol., 1994, vol. 11 (pg. 886-898)
Waltari E., Edwards S. V. Evolutionary dynamics of intron size, genome size, and physiological correlates in archosaurs, Am. Nat., 2002, vol. 160 (pg. 539-552)
Watanabe M., Nikaido M., Tsuda T., Inoko H., Mindell D. P., Murata K., Okada N. The rise and fall of the CR1 subfamily in the lineage leading to penguins, Gene, 2006, vol. 365 (pg. 57-66)
Waterston R. H., Lindblad-Toh K., Birney E., Rogers J., Abril J. F., Agarwal P., Agarwala R., Ainscough R., Alexandersson M., An P., et al. Initial sequencing and comparative analysis of the mouse genome, Nature, 2002, vol. 420 (pg. 520-562)
Weiner A. M., Deininger P. L., Efstratiadis A. Nonviral retroposons: Genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information, Ann. Rev. Biochem., 1986, vol. 55 (pg. 631-661)
Wicker T., Robertson J. S., Schulze S. R., Feltus F. A., Magrini V., Morrison J. A., Mardis E. R., Wilson R. K., Peterson D. G., Paterson A. H., Ivarie R. The repetitive landscape of the chicken genome, Genome Res., 2004, vol. 15 (pg. 126-136)
Wilkinson M., Thorley J., Benton M. J. Uncertain turtle relationships, Nature, 1997, vol. 387 (pg. 466)
Xiong Y., Eickbush T. H. Origin and evolution of retroelements based upon their reverse transcriptase sequences, EMBO J., 1990, vol. 9 (pg. 3353-3362)
Zardoya R., Meyer A. Complete mitochondrial genome suggests diapsid affinities of turtles, Proc. Natl. Acad. Sci. USA, 1998, vol. 95 (pg. 14226-14231)

© 2006 Society of Systematic Biologists. Systematic Biology, Oxford University Press

Phylogenomic Investigation of CR1 LINE Diversity in Reptiles

Systematic Biology , Volume 55 (6) – Dec 1, 2006


Publisher
Oxford University Press
Copyright
© 2006 Society of Systematic Biologists
ISSN
1063-5157
eISSN
1076-836X
DOI
10.1080/10635150601091924

Abstract

It is unlikely that taxonomically diverse phylogenetic studies will be completed in the near future for nonmodel organisms on a whole-genome basis. However, one approach to advancing the field of “phylogenomics” is to estimate the structure of poorly known genomes by mining libraries of clones from suites of taxa, rather than from single species. The present analysis adopts this approach by taking advantage of megabase-scale end-sequence scanning of reptilian genomic clones to characterize diversity of CR1-like LINEs, the dominant family of transposable elements (TEs) in the sister group of mammals. As such, it helps close an important gap in the literature on the molecular systematics and evolution of retroelements in nonavian reptiles. Results from aligning more than 14 Mb of sequence from the American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta), Bahamian green anole (Anolis smaragdinus), Tuatara (Sphenodon punctatus), Emu (Dromaius novaehollandiae), and Zebra Finch (Taeniopygia guttata) against a comprehensive library of ∼3000 TE-encoding peptides reflect an increasing abundance of LINE and non-long-terminal-repeat (non-LTR) retrotransposon repeat types with the age of common ancestry among exemplar reptilian clades. The hypothesis that repeat diversity is correlated with basal metabolic rate was tested using comparative methods and a significant nonlinear relationship was indicated. This analysis suggests that the age of divergence between an exemplary clade and its sister group as well as metabolic correlates should be considered in addition to genome size in explaining patterns of retroelement diversity.
The first phylogenetic analysis of the largely unexplored chicken repeat 1 (CR1) 3′ reverse transcriptase (RT) conserved domains 8 and 9 in nonavian reptiles reveals a pattern of multiple lineages with variable branch lengths, suggesting presence of both old and young elements and the existence of several distinct well-supported clades not apparent from previous characterization of CR1 subfamily structure in birds and the turtle. This mode of CR1 evolution contrasts with historical patterns of LINE 1 diversification in mammals and hints toward the existence of a rich but still largely unexplored diversity of nonavian retroelements of importance to advancing both comparative vertebrate genomics and amniote systematics.

Keywords: Amniote, CR1 LINE, phylogenomics, Reptilia, retroelement, RT domain

The Advent of Reptilian Phylogenomics

The repetitive landscape of eukaryotic genomes has emerged as an important area of recent “phylogenomic” research in light of both the role transposable elements (TEs) play in generating genomic diversity (Batzer and Deininger, 2001; Kazazian, 2004) and their exceptional value as systematic characters (Okada et al., 2004; Shedlock and Okada, 2000; Shedlock et al., 2004). Comparative analysis of vertebrate whole-genome sequence assemblies has demonstrated that while the number of genes has apparently remained relatively stable over ∼450 Myr of vertebrate evolution, repetitive DNA elements have undergone a much more dynamic evolutionary history (ICGSC, 2004; Jaillon et al., 2004; Lander et al., 2001; RGSPC, 2004; Waterston et al., 2002). The importance of repetitive elements in modulating the nearly fivefold range of genome size among living amniotes is becoming increasingly apparent from comparative investigations (Kazazian, 2004; Primmer et al., 1997; Shedlock et al., 2006a; Wicker et al., 2004).
For example, analysis of the chicken genome has suggested that a massive loss of mobile interspersed repeats is likely responsible for the relatively small size of avian genomes (ICGSC, 2004), whereas the proliferation of both transposable elements and nonmobile simple sequence repeats (SSRs) over the last ∼75 million years has significantly accelerated divergence of rodent genomes relative to our own (ICGSC, 2004; Waterston et al., 2002). However, interpretation of such chicken-mammal comparisons remains tenuous in the absence of detailed information for nonavian reptile genome structure. Despite the completion of the chicken genome assembly, the near absence of genome-scale information on nonavian reptiles precludes our ability to infer how mammals diverged from the common amniote ancestor. The approximately fourfold difference in divergence time between fish (e.g., Fugu) and humans versus mouse and humans often makes genome comparisons within each of these species pairs either too conserved or too distant to effectively analyze many functional sequences of interest (Elgar et al., 1999). Likewise, the low-resolution taxon sampling in vertebrates thus far limits our ability to make biologically meaningful inferences about fundamental questions in vertebrate evolution (Garland and Adolph, 1994) and makes pairwise comparisons of species the only viable approach to studying genome evolution (Hughes and Hughes, 1995).

Genomic Diversity and the Reptilian Repetitive Landscape

The majority of vertebrate genomes are composed of mobile, repetitive elements and their inactive, “fossilized” remains that not only create distinctive profiles of genome structure but can provide landmarks of species phylogeny (Shedlock and Okada, 2000; Smit, 1999).
Class I transposable elements (TEs) comprise nearly half of human chromosomal DNA (Lander et al., 2001) and among vertebrates are dominated by non-long-terminal-repeat (non-LTR) short and long interspersed elements (SINEs and LINEs) (Malik et al., 1999; Weiner et al., 1986) that rely on a copy-and-paste mechanism termed retrotransposition, which utilizes an RNA intermediate for amplification and movement about the genome (Kajikawa and Okada, 2002; Luan et al., 1993). The chicken genome appears to possess exclusively chicken repeat 1 (CR1) non-LTR LINEs (Burch et al., 1993; Vandergon and Reitman, 1994; Wicker et al., 2004) and a depauperate number of CR1-related SINEs that share conserved core sequence blocks with mammalian-interspersed repeats (MIRs) (Gilbert and Labuda, 1999; ICGSC, 2004; Smit and Riggs, 1995; Watanabe et al., 2006). CR1-like and MIR-like repeats are relatively ancient TEs that have been detected across all vertebrate classes (Kajikawa et al., 1997; Vandergon and Reitman, 1994). The preliminary picture of interspersed repeats in nonavian reptile genomes seems to be considerably more complex, with both ancient CR1s present as well as lineage-specific elements (Kajikawa et al., 1997; Lovsin et al., 2001; Piskurek et al., 2006; Sasaki et al., 2004). Reptilia, which of course includes all birds, are far more taxonomically diverse than their sister group, mammals (∼17,000 versus ∼4500 species) and also exhibit a remarkable range of variation in morphological, developmental, reproductive, and chromosomal traits. We know from the chicken genome assembly (ICGSC, 2004) that similar numbers of genes exist in birds as in mammals, but there are presently insufficient data to confirm this situation in non-avian reptiles. Fewer retroelements (e.g., CR1), microsatellites, and intergenic and intronic DNA likely account in part for the reduced genome sizes apparent in some reptilian lineages (Shedlock et al., 2006a; Waltari and Edwards, 2002).
It has also been suggested that intron and genome size may be correlated with basal metabolic rate and associated selection for cell size (Hughes and Piontkivska, 2005; Hughes and Hughes, 1995), but this trend is upheld only in specific clades across vertebrates (Waltari and Edwards, 2002). A complete picture of the diversity and evolutionary dynamics of reptilian TEs has thus remained impossible to infer accurately from available bird, mammal, and fish DNA sequences and will be central to establishing an accurate model for genomic diversification since our amniote common ancestor lived some 310 million years ago (Mya). In the present investigation of reptilian lineages, results on the abundance, distribution, and phylogenetic structure of retroelements are placed into a comparative evolutionary perspective that could only be achieved through examination of large-scale sequence data previously unavailable for the sister taxa of both birds and mammals. Although it is unlikely that taxonomically diverse phylogenetic studies will be completed in the near future for non-model organisms on a whole-genome basis, one approach to advancing the field of phylogenomics is to estimate the structure of poorly known genomes by mining libraries of clones from suites of taxa, rather than from single species (Edwards et al., 2005; Shedlock et al., 2006b). The present analysis adopts this approach by taking advantage of megabase-scale end-sequence scanning of reptilian genomic clones to characterize diversity of CR1-like LINEs, the dominant family of TEs in the sister group of mammals, and thereby helps close an important gap in the literature on the molecular systematics and evolution of retroelements in eukaryotes. 
Target Species and Phylogenomic Resources The American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta), Bahamian green anole (Anolis smaragdinus), and Tuatara (Sphenodon punctatus) are four vertebrates that each represent a major nonavian reptilian clade and thus make exemplary species for phylogenomic research on the repetitive landscape in the sister group of mammals. Moreover, Bacterial Artificial Chromosome (BAC) libraries for these and other reptile species have been produced and are accessible through the Joint Genome Institute (http://evogen.jgi.doe.gov/second_levels/BACs/Our_libraries.html). These resources have helped generate support for an initiative to sequence the first reptile genome from Anolis carolinensis (see NHGRI Genome Sequencing Proposals, http://www.genome.gov; also http://www.reptilegenome.com). Genomic clone analysis for these four exemplary reptile species can bring a considerable amount of new molecular data to bear on a host of important research problems in vertebrate biology and medicine. Although the complete genome assembly and details of repeat structure have now been published for the chicken (ICGSC, 2004; Wicker et al., 2004), it is valuable to compare the chicken model with multimegabase BAC-clone sequences recently made available for a phylogenetically basal ratite bird, the Emu (Dromaius novaehollandiae), and a derived passerine bird, the Zebra Finch (Taeniopygia guttata). Material and Methods Genbank accession numbers for all primary sequences analyzed in this study are listed in Supplementary Materials (http://www.systematicbiology.org). More than 14 megabases (Mb) of reptile BAC sequence were derived from a total of 8638 nonoverlapping paired BAC-and plasmid-end reads produced as part of a separate investigation of amniote genomics (Shedlock et al., 2006a) and compiled from publicly available databases of BAC-clone sequences (http://www.ncbi.nlm.nih.gov). 
Reference sequences of CR1 reverse transcriptase coding domains were obtained from Repbase (Jurka et al., 2005; http://www.girinst.org/repbase/index.html) and the literature (ICGSC, 2004; Kajikawa et al., 1997; Malik et al., 1999; Sasaki et al., 2004; Silva and Burch, 1989; Vandergon and Reitman, 1994; Wicker et al., 2004). The reptile sequence was surveyed for local alignments to a comprehensive database of transposable element encoded proteins containing ∼3000 TE-derived peptides (∼2.2 million amino acids) using the program RepeatProteinMasker v. 3.1.5 (Smit, 2006). Output tables from RepeatProteinMasker queries are listed in Supplementary Materials online at http://www.systematicbiology.org. Masked sequences for protein hits with ≤ e−10 probability of random alignment were sorted by repeat type and subjected to pairwise BLASTn and tBLASTx alignments (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi?0) to the published 3' reverse transcriptase (RT) ORF-2 domain sequence of the CR1 LINE element for chicken, Gallus gallus (Burch et al., 1993), and the PsCR1 LINE element sequence for side-necked turtle, Platemys spixii (Kajikawa et al., 1997). The 3' RT domain of individual LINE elements was chosen for building a data matrix of orthologous nucleotide sequences because of the common truncation of CR1s upon insertion that extends a variable 5' distance from a common 3' end. Overlapping pairwise alignments of newly isolated reptile CR1 LINE sequences with published 3' RT gene reference sequences were compiled and aligned simultaneously in ClustalW (Thompson et al., 1994), then manually edited for length variation and gaps using MacClade v. 4.06 (Maddison and Maddison, 2000) prior to phylogenetic analysis. The matrix has been submitted to TreeBASE (http://www.treebase.org).
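The hit-filtering step described above (retaining protein hits at or below the e−10 threshold and sorting them by repeat type) can be illustrated with a minimal Python sketch. This is not the RepeatProteinMasker output format or the author's pipeline: the `ProteinHit` record, its field names, and the example clone names are hypothetical, and the threshold is read here as an E-value of 1e-10.

```python
from collections import defaultdict
from typing import NamedTuple


class ProteinHit(NamedTuple):
    """Hypothetical record for one TE-peptide alignment (not a real tool format)."""
    clone_id: str
    repeat_type: str
    e_value: float


# Assumption: "<= e-10" in the text is interpreted as an E-value of 1e-10.
E_VALUE_CUTOFF = 1e-10


def group_significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits at or below the cutoff and group them by repeat type."""
    grouped = defaultdict(list)
    for hit in hits:
        if hit.e_value <= cutoff:
            grouped[hit.repeat_type].append(hit)
    # Repeat types in alphabetical order, best (smallest) E-value first within each.
    return {
        repeat_type: sorted(group, key=lambda h: h.e_value)
        for repeat_type, group in sorted(grouped.items())
    }


# Made-up hits for illustration only.
example_hits = [
    ProteinHit("clone_A", "CR1", 3e-42),
    ProteinHit("clone_B", "L2", 1e-10),
    ProteinHit("clone_C", "CR1", 5e-8),   # fails the cutoff and is dropped
    ProteinHit("clone_D", "CR1", 2e-15),
]
significant = group_significant_hits(example_hits)
```

In this toy input, the hit with E-value 5e-8 is discarded, and the remaining hits are grouped under `CR1` and `L2`, ready to be passed on to the pairwise alignment step.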
Amino acid translations from tBLASTx and MacClade were considered in all three reading frames for forward and reverse DNA strands and compiled relative to the published alignment for conserved RT domains among eukaryotes (Malik et al., 1999; Xiong and Eickbush, 1990). Phylogeny of reptilian CR1 RT sequences was inferred by Bayesian analysis of the nucleotide data matrix using MrBayes (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003) with 10 million generations (10% burn-in component) run under a general time-reversible evolutionary model, including an estimated proportion of invariable sites and gamma-shaped distribution of rate variation across sites. Genetic distances of aligned RT domain amino acid sequences and phylogenetic trees generated by MrBayes were evaluated using PAUP* v. 4.0 (Swofford, 1999). Genome size statistics were obtained from Gregory (2005). Phylogenetically independent contrasts were employed using the program PDAP (Garland et al., 1993; Garland and Ives, 2000) as implemented in the Mesquite v. 1.06 software platform (Maddison and Maddison, 2005) to test for correlations between lineage-specific repeat diversity and mass-corrected basal metabolic rate (BMR). A generalized least squares (GLS) model was also employed as implemented in the program Continuous (Pagel, 1999) to test for correlated evolution between this pair of characters. Natural log transformations of data were used to evaluate the linearity of the relationship. A likelihood-ratio test was used to evaluate the strength of the correlation based on the regression generated by Continuous. Scaled BMR values for representative reptilian species in units of kilocalories/gram-day were obtained from data published in Waltari and Edwards (2002) and Nagy et al. (1999) using a scaling factor of 0.75 to accommodate rates of change in metabolism with change in body mass.
The index of repeat diversity was based on the total number of different types of LINE and LTR repeats detected with RepeatProteinMasker (Smit, 2006) per megabase of sequence for each exemplar species examined. The phylogenetic tree of exemplar reptilian clades used to calculate regressions is summarized in Figure 1. Branch lengths were calibrated based on divergence time estimates of Kumar and Hedges (1998). All input values and output results from the above comparative methods analyses are presented in Supplementary Materials (http://www.systematicbiology.org).

Figure 1. Chart of the frequency of repeat types detected per megabase of reptile sequence examined per species. Repeat classifications were determined with RepeatProteinMasker v. 3.1.5 (Smit, 2006) and are based on only alignments with random probability ≤ e−10. The phylogenetic relationships of major reptilian clades are indicated by the tree insert. Relationships follow Lee (2001) but place turtles near the archosaurs based on recent molecular studies cited in the text. Divergence time estimates scaled in Myr that were used to calibrate the tree for comparative analyses are indicated at nodes. Estimates are taken from Kumar and Hedges (1998).
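The mass correction of BMR mentioned in the comparative methods can be sketched as follows. The text gives only the scaling factor of 0.75, so the formula below (dividing a whole-animal rate by body mass raised to 0.75) is an assumed, commonly used allometric reading and not necessarily the exact calculation behind the published values; the function names are hypothetical.

```python
import math


def mass_scaled_bmr(bmr, body_mass_g, exponent=0.75):
    """Divide a whole-animal metabolic rate by body mass raised to `exponent`.

    Assumption: this is the usual allometric correction; the source states
    only that a scaling factor of 0.75 was used.
    """
    if bmr <= 0 or body_mass_g <= 0:
        raise ValueError("BMR and body mass must be positive")
    return bmr / body_mass_g ** exponent


def ln_transform(value):
    """Natural log transform, as applied before evaluating linearity."""
    return math.log(value)


# Arbitrary illustrative numbers, not data from the study.
scaled = mass_scaled_bmr(100.0, 16.0)   # 16 ** 0.75 == 8, so this is 12.5
ln_scaled = ln_transform(scaled)
```

Values transformed this way would then be supplied to an independent-contrasts or GLS program; the regression itself is not reproduced here.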
Results and Discussion

Patterns of Retrotransposon Diversity Based on Comparison of TE-Derived Peptides

A well-known ascertainment bias exists for screening reference DNA sequence databases for retroelements between relatively divergent species, such that only the most conserved subset of target elements is recovered. Furthermore, numerous false positives appear from spurious alignments (e.g., BLASTn, DNA RepeatMasker) to short noncoding TEs such as SINEs and retroviral-like elements and to low complexity and tandem repeats. This problem is greatly simplified when restricting alignments to amino acids of TE-derived peptides such as those in the endonuclease (EN) and reverse transcriptase (RT) domains encoded by LINEs. Such targeted queries are also valuable for reducing spurious matches in BLASTx-like searches against the comprehensive annotated protein database. TE-derived amino acid alignments are particularly useful for classifying autonomous DNA- and RNA-based retroelements in the absence of a reference sequence library for species under investigation, such as the case of nonavian reptiles investigated here. Moreover, RT and EN domain sequences have been used as a standard character set for inferring phylogeny and defining subfamily structure of LINEs in eukaryotes (Malik et al., 1999; Xiong and Eickbush, 1990). False positives from TE-derived functional genes are expected to be negligible in the present analysis of BAC- and plasmid-clone sequence data based on random sampling of the majority of clones and the small fraction of each genome surveyed. It is recognized, however, that the detection of repeats in the present survey may be biased slightly upward or downward from true genome-wide densities due to both experimental and informatics reasons. The results of a survey of the different types of LINE and LTR retrotransposons detected in more than 14 Mb of genomic-clone sequence from six reptile target species using RepeatProteinMasker are summarized in Table 1.
Here we refer to retroelement diversity based on the numbers of different repeat types detected per megabase of sequence with less than e− 10 probability of random alignment to the reference library of peptide sequences. A number of class II DNA–based transposons were also detected but have been excluded from this summary focused on class I CR1–like LINE retroelements. A clear decreasing trend in LINE/LTR family diversity is apparent in archosaurs relative to squamate lizards and the turtle, with only CR1 LINEs present in birds. Accounting for uneven genome size and variation in the amount of sequence surveyed per species examined reflects an increasing trend in repeat diversity with the relative age of divergence times indicated for major reptilian clades in the phylogram insert in Figure 1. Results of both phylogenetically independent contrasts (Garland et al., 1993; Garland and Ives, 2000) and GLS regression (Pagel, 1999) analyses confirmed significant correlated evolution between the natural log of total repeat types detected per species and BMR (P < 0.0061; R2 = 0.941; likelihood ratio = 14.18). The linear correlation was not significant (P < 0.297; R2 = 0.345), indicating a nonlinear relationship between these two traits. The strength of the correlations obtained may be slightly elevated by the limited taxonomic sampling of the present analysis, which in the absence of data for extinct clades creates an internode distance between avian and nonavian taxa almost 10 times larger than distances between nonavian sister lineages. Significant correlations based on more inclusive sampling have been previously reported between BMR and genome size among amniotes and archosaurs (Olmo, 2003; Waltari and Edwards, 2002), although the significance of this relationship was not upheld when examining only avian species (Waltari and Edwards, 2002). 
Overall, results from the use of comparative methods here suggest that in addition to considerations of genome size, both the age of divergence of an exemplary clade from its sister group and physiological correlates linked to metabolism may be influencing the diversity of retroelements in major reptilian clades. The degree to which each of these forces is shaping the complexity of the repetitive landscape along particular branches of the amniote tree remains difficult to quantify in the absence of more extensive and detailed comparative information on the structure and composition of reptilian genomes.

Table 1. Summary of repeat types within LINE/LTR families detected by alignment with transposable element encoded protein sequences using RepeatProteinMasker.

LINE/LTR                    Tuatara  Anole  Turtle  Alligator  Emu   Zebra Finch
Genome size (picograms)     5.0      2.3    2.6     2.5        1.6   1.2
Mb of sequence surveyed     1.5      1.6    2.5     2.5        2.3   3.7
Raw total protein hits      1048     421    987     1076       128   451
Total hits ≤ e−10 per Mb    87.3     42.5   32.8    20.4       8.7   4.3
Total hits ≤ e−10           131      68     82      51         20    16
CR1                         25       18     24      23         18    16
Dong-R4                     —        2      —       —          —     —
Jockey                      4        1      —       1          —     —
L1                          29       7      —       —          —     —
L1(Tx1)                     2        —      1       3          —     —
L2                          22       10     6       —          —     —
LOA                         1        —      —       —          —     —
Penelope                    2        1      3       1          —     —
R2                          —        —      1       —          —     —
RTE-BovB                    11       5      5       3          —     —
RTE-RTE                     2        —      2       —          —     —
LTR/Copia                   3        8      1       —          —     —
LTR/DIRS1                   8        2      5       —          —     —
LTR/ERV1                    1        —      18      14         —     —
LTR/ERVL                    —        —      —       1          2     —
LTR/Gypsy                   15       14     15      4          —     —
LTR/Gypsy (Gmr1)            —        —      1       1          —     —
LTR/Ngaro                   6        —      —       —          —     —

The exclusive presence of CR1 non-LTR LINEs in emu and zebra finch is predicted by the flat, largely inactive retroposon landscape of the chicken genome assembly (ICGSC, 2004; Wicker et al., 2004) and initial investigation of CR1s in other avian species (Watanabe et al., 2006). Given the near extinction of CR1 activity observed in the chicken, it is noteworthy that an elevated diversity of CR1 types were found in the basal ratite, emu, as well as in the avian sister group represented by the alligator. The large diversity of repeat types in squamates and turtles relative to archosaurs is consistent with previous detection of Bov-B–derived SINEs in squamates (Piskurek et al., 2006), PsCR1-derived polIII SINEs in turtles (Sasaki et al., 2004), and a diverse assemblage of DeuSINEs of apparent mixed ancestry detected in both birds and mammals (Nishihara et al., 2006). Many of these diverse SINE groups exhibit 3' tail or central domain sequence homology with ancient LINE-2 or CR1-like lineages that likely predate the existence of the common amniote ancestor around 310 Mya (Kajikawa et al., 1997; Nishihara et al., 2006), whereas some SINEs detected in lizards, snakes, and turtles, such as those apparently derived from Bov-B and non-psCR1 LINEs, exhibit sequence divergence profiles that suggest they are probably ≤200 Myr of age (Piskurek et al., 2006; Sasaki et al., 2004).
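The per-megabase row of Table 1 follows directly from two other rows: total hits at ≤ e−10 divided by the megabases of sequence surveyed. The short Python sketch below recomputes that row from the published counts as a consistency check. The dictionary and function names are illustrative only and are not part of the original analysis.

```python
# Values copied from Table 1.
MB_SURVEYED = {
    "Tuatara": 1.5, "Anole": 1.6, "Turtle": 2.5,
    "Alligator": 2.5, "Emu": 2.3, "Zebra Finch": 3.7,
}
SIGNIFICANT_HITS = {
    "Tuatara": 131, "Anole": 68, "Turtle": 82,
    "Alligator": 51, "Emu": 20, "Zebra Finch": 16,
}


def hits_per_mb(hits, mb_surveyed):
    """Return unrounded significant hits per megabase for each species."""
    return {species: hits[species] / mb_surveyed[species] for species in hits}


density = hits_per_mb(SIGNIFICANT_HITS, MB_SURVEYED)
rounded = {species: round(value, 1) for species, value in density.items()}

# Fold differences between species, from the unrounded densities.
tuatara_vs_anole = density["Tuatara"] / density["Anole"]
tuatara_vs_finch = density["Tuatara"] / density["Zebra Finch"]
```

The rounded values reproduce the table row (87.3, 42.5, 32.8, 20.4, 8.7, 4.3). The two ratios come out slightly above 2 and slightly above 20, matching the fold differences between tuatara, Anolis, and zebra finch described in the text.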
Alligator and turtle genome sizes are roughly one-third smaller than human and slightly more than twice the size of the passerine zebra finch, which is similar to chicken but smaller than the more phylogenetically basal ratite emu (Table 1). Anolis has a genome size on the order of 12% smaller than the alligator and painted turtle and is close to the average size across nonavian reptiles. It is thus interesting that Anolis has a higher diversity of detected repeats per megabase than either alligator or turtle. Sphenodon, on the other hand, is both an ancient species and has roughly twice the haploid DNA content of the other nonavian species examined. It exhibits more than twice the number of repeat types per megabase than Anolis and over twenty times more than the zebra finch. It must be emphasized that the phylogenetic position of turtles remains uncertain. Establishing the phylogenetic position of turtles with better certainty will have substantial impact on the interpretation of character change through the course of vertebrate genome evolution—including retroposon dynamics. By collecting a large sampling of gene and noncoding sequences from these lineages, phylogenomic research promises to shed considerable light on this issue. Several recent molecular analyses (Hedges and Poling, 1999; Janke et al., 2001; Zardoya and Meyer, 1998) and one from morphology (Reipell and deBraga, 1996) suggested that turtles are not the sister group of, but are nested within, the diapsids, in some cases as the sister group to birds and crocodilians, in others as sister to crocodilians. Subsequent analyses have revealed weaknesses in both molecular and morphological characters linking turtles to various groups (Cao et al., 2000; Lee, 2001; Mindell et al., 1999; Wilkinson et al., 1997), and in light of this, turtles are shown as sister to archosaurs on the tree of relationships in Figure 1. 
The intermediate abundance of detected repeat types in the turtle (Table 1) maps consistently with the relationships in Figure 1 if we assume a gradual loss of repeat diversity took place from a relatively large, repeat-rich common amniote ancestral genomic condition some 310 Mya (Shedlock et al., 2006a). Profiles of Conserved CR1 RT Amino Acid Domains CR1 elements were originally described by Stumph et al. (1981) based on chicken-mammal homologous sequence comparisons, and the family was precisely delineated by Silva and Burch (1989) based on characteristic 3' target site duplications of the octamer NATTCTRT. Subsequently, 11 major subfamilies of CR1s have been described in the chicken alone (ICGSC, 2004) and CR1-like elements have been detected in all classes of vertebrates and exhibit a remarkable phylum-wide taxonomic distribution (Kajikawa et al., 1997; Vandergon and Reitman, 1994; Watanabe et al., 2006). A cartoon of the generic full-length CR1 LINE structure is shown in Figure 2a. Intact, active, full-length elements are far less common in the genome than the vast majority of CR1 copies that experience variable length 5' truncation of ORF-1 products upon insertion via target-primed reverse transcription (TPRT; Luan et al., 1993) and are thus inactivated. The conserved 3' RT domains are more frequently detected due to the prevalence of 5' truncation and are the target region for comparative investigation in the present study. Although RT domains 1 to 7 are commonly used as reference sequence (Malik et al., 1999; Xiong and Eickbush, 1990), RT domains 8 and 9 have been given much less attention in the literature despite the detailed characterization of 3' untranslated region (UTR) homology between SINEs and partner LINEs that facilitates SINE nonautonomous amplification (Kajikawa and Okada, 2002; Ohshima et al., 1996). 
Consequently, the short CR1 composite sequences of two squamates referenced in previous studies of CR1s (e.g., Anolis carolinensis, Genbank accession number L31503; Okinawan pit viper Trimeresurus flavoviridis, D13384; Kajikawa et al., 1997; Vandergon and Reitman, 1994) do not BLAST significantly to the conserved 3' RT domains of newly isolated reptile CR1s examined here.

Figure 2. (a) Cartoon of full-length CR1 LINE structure indicating typical length in kilobases, 5' promoter-like region, annotated domains for nucleic acid binding, endonuclease, and reverse transcriptase in the two open reading frames (ORFs) and 3' untranslated tail region (UTR). (b) Grayscale multiple protein sequence alignment of amino acid residues spanning conserved CR1 3' reverse transcriptase (RT) domains among eight reptile species. The matrix has 123 sites corresponding to the 3' ORF-2 product region boxed on the cartoon in (a) and includes a subset of 25 representative sequences including published data for chicken, the side-necked turtle, and human (ICGSC, 2004; Jurka et al., 2005; Kajikawa et al., 1997; Vandergon and Reitman, 1994; Wicker et al., 2004). The scale of sequence conservation at each site is indicated below the alignment. The most conserved sites are contained within domains 8 and 9 and annotated by asterisks and dots on top of the alignment. Numbers of residues in each sequence are listed on the right. Sequence names cross-reference with Genbank accessions listed in Supplementary Materials (http://www.systematicbiology.org); CH and HS sequences listed are registered in Repbase (Jurka et al., 2005) under the names shown. Two-letter taxon designations are listed below (b).

Figure 2b presents an alignment of amino acid residues spanning a portion of the 3' end of the CR1 ORF2 region for eight species representing all major reptile clades, including published chicken CR1 subfamily sequences (ICGSC, 2004; Jurka et al., 2005; Vandergon and Reitman, 1994), the PsCR1 sequence (Kajikawa et al., 1997), and a CR1-like element found in the human genome (Jurka et al., 2005). The extent of variation at each of the sites in the matrix is indicated by the histogram below the alignment and clearly indicates peaks of sequence conservation within the two distinct RT domains 8 and 9.
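A per-site conservation profile like the histogram described for Figure 2b can be approximated in a few lines of Python. The sketch below scores each alignment column by the frequency of its most common residue, ignoring gaps. This is one simple scoring choice and is an assumption: the source does not state how the conservation scale in the figure was computed, and the toy alignment is invented for illustration.

```python
from collections import Counter


def column_conservation(aligned_seqs, gap="-"):
    """Score each alignment column as the frequency of its most common residue.

    Gaps are ignored; a column made up entirely of gaps scores 0.0.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("sequences must be non-empty in number and equal in length")
    scores = []
    for column in zip(*aligned_seqs):
        residues = [residue for residue in column if residue != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


# Invented four-sequence alignment, not data from the study.
toy_alignment = ["MKVLA", "MKILA", "MRVL-", "MKVLG"]
scores = column_conservation(toy_alignment)
```

Columns in which every sequence shares the same residue score 1.0, so runs of high scores mark conserved blocks in the same way the peaks over domains 8 and 9 do in the figure.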
A table of pairwise distances for the 25 sequences in this alignment is included in Supplementary Materials. Mean character differences between host species is smallest for tuatara vs. Anolis (0.14), and greatest for emu versus alligator (0.58), with moderate values observed between chicken subfamilies and non-avian species, including human (0.26–0.46). This range is similar to that observed among birds (0.29–0.41), but as much as 2.5 times greater than the difference among only chicken subfamilies (0.17). Although this protein sequence represents only a limited subsample of data, the overall pattern of variation in RT domain amino acid residues supports the idea that CR1 diversity in reptiles includes both young, lineage-specific elements as well as a significant number of ancient elements shared among host species that predate the amniote common ancestor. Phylogenetic Structure among CR1 3' ORF-2 Sequences The evolutionary pattern of reptilian CR1 inheritance was further evaluated by a phylogenetic analysis of 69 LINE element DNA sequences extending outward from the 3' RT core domains 8 and 9. The character matrix included 1046 sites and was subjected to 10 million generations of Bayesian inference as described in Materials and Methods. The distribution of log likelihoods for all trees is plotted in Figure S1 of Supplementary Materials and indicates that the 10% “burn-in” component completely captures all iterations prior to convergence. A phylogram illustrating branching patterns observed among CR1 element sequences is presented in Figure 3a, and the consensus tree of all 9000 trees with branch nodes supported by ≥ 50% posterior probability is shown in Figure 3b. The tree structure of relationships in Figure 3a shows multiple bifurcations of relatively short and long branch lengths within several distinct clades, such as those of zebra finch, emu, alligator, and turtle, suggesting the presence of multiple active elements of different age across the tree. 
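The posterior probabilities attached to nodes of the consensus tree are, in essence, the fraction of post-burn-in sampled trees that contain a given clade. The Python sketch below shows that counting step on a toy sample. It is not the MrBayes implementation: each tree is represented simply as a set of clades, the four sampled trees are invented, and only the two-letter taxon codes (Am, Sp, As, Cp) are borrowed from the figure legend.

```python
from collections import Counter


def clade_support(sampled_trees, min_support=0.5):
    """Return clades whose frequency among sampled trees is >= min_support.

    Each tree is given as a set of clades; each clade is a frozenset of taxa.
    """
    if not sampled_trees:
        raise ValueError("at least one sampled tree is required")
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))  # count each clade at most once per tree
    total = len(sampled_trees)
    return {
        clade: count / total
        for clade, count in counts.items()
        if count / total >= min_support
    }


# Toy post-burn-in sample of four rooted trees on the taxa Am, Cp, As, Sp.
toy_sample = [
    {frozenset({"As", "Sp"}), frozenset({"Am", "Cp"})},
    {frozenset({"As", "Sp"}), frozenset({"Am", "Cp"})},
    {frozenset({"As", "Sp"}), frozenset({"As", "Sp", "Cp"})},
    {frozenset({"Am", "Cp"}), frozenset({"Am", "Cp", "As"})},
]
support = clade_support(toy_sample)
```

Here the clades {As, Sp} and {Am, Cp} each appear in three of four trees (support 0.75) and are retained, while the two clades seen only once fall below the 50% threshold and are dropped.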
The topology largely parallels the expected phylogeny of amniotes in that the majority of avian species, including published chicken CR1 subfamilies, are derived in the tree to the exclusion of nonavian taxa. Several basal clades with high probabilities are apparent for turtle and alligator elements with Anolis and tuatara sequences forming a monophyletic group. The published PsCR1 sequence also clusters with a subgroup of painted turtle elements with 100% probability. The most obvious exception to this structure is a subset of 10 elements of mixed host species association that form two derived assemblages at the top of the tree with long branch lengths in Figure 3a and high posterior probability in Figure 3b. The repeat type based on RepeatProteinMasker classification of each of these sequences is indicated next to their taxon labels in Figure 3a and shows a pattern of mixed association with chicken subfamilies C, E, and Y and the PsCR1 turtle LINE. One possibility is that these elements have experienced an increased rate of sequence evolution that now distinguishes them as an evolutionarily distinct assemblage. However, the long branch lengths and mixed species and subfamily associations of this group caution that they may require additional data to properly classify and place accurately within a phylogenetic context.

Figure 3. (a) Phylogram illustrating branch length patterns generated by Bayesian analysis of 1046 aligned nucleotide positions among 68 reptilian CR1 3' RT sequences, rooted with data for Homo sapiens (LnL = −23,050.319). Parameter settings are listed in Materials and Methods. Two-letter taxon designations are listed below (b). Repeat classifications for taxa near the top of the tree are in parentheses. (b) Consensus of 9000 MrBayes trees generated for the same data set analyzed in (a) subsampled from 10 million iterations minus a 10% burn-in component. The distribution of likelihood values for all trees is plotted in Figure S1 in Supplementary Materials (http://www.systematicbiology.org). Posterior probabilities of relationships > 50% are indicated at nodes. After the suggestion of Malik et al. (1999), a set of novel clades with >70% support that do not group with known CR1 subfamily sequences are annotated with bold font and brackets. Am, Alligator mississippiensis; Sp, Sphenodon punctatus; As, Anolis smaragdinus; Cp, Chrysemys picta.

Two contrasting phylogenetic tree shapes are diagrammed in Figure 4, to illustrate alternative modes of retroelement evolution based on single versus multiple active source element models.
The branching pattern for CR1 element sequences analyzed here clearly reflects the pattern in Figure 4b, whereas the tree shape in Figure 4a has been observed for LINE 1 elements in mammals, where successive subfamilies have gone extinct shortly after diverging from a common lineage (Adey et al., 1994; Boissinot et al., 2000; Khan et al., 2006). Results of the present phylogenetic analysis reveal a bifurcating process of diversification from multiple active progenitors where young elements of recent common ancestry can be found together with more widely divergent lineages of much older age. This process has resulted in a diverse CR1 landscape in reptilian host genomes indicated by earlier studies of subfamily structure of mostly avian elements (Vandergon and Reitman, 1994) and elaborated here for major nonavian reptile clades. This is reflected in Figure 3 by highly supported distinct clades of nonavian CR1s that do not cluster with either chicken subfamilies or the PsCR1 sequence. Each of these new clades with greater than 70% statistical support represents a putative subfamily and has been named according to the suggestion of Malik et al. (1999) and annotated in Figure 3b by bold font and brackets as described in the figure legend.

Figure 4. Differences in tree shapes illustrated by contrasting modes of retroelement evolution. Tree in (a) illustrates pattern produced by persistence of a single, relatively old active lineage with rapid extinction of each young diverged lineage. Tree in (b) illustrates a pattern of multiple active source elements of various age, as reflected by both short and long branch lengths among different clades.

Relevance of CR1 Gene Trees to the Use of CR1s for Inferring Species Phylogeny

It is important to bear in mind the distinction between the genealogies of CR1 elements themselves and the use of insertion patterns of CR1 elements at independent orthologous loci present in different host genomes to infer species phylogeny. The phylogenetic analysis presented here includes CR1 sequences from many different loci and in effect produces a tree of multi-gene families that have not been completely sampled in the host genomes examined, in addition to the limited taxon sampling of reptile species. Thus, the discordance in CR1 gene trees in Figure 3 should not be confused with an indication of species phylogeny. Given the long evolutionary history of CR1s and their diverse subfamily structure, it is expected that CR1 gene trees will likely show significant discordance with their host species trees, even though insertion patterns at orthologous CR1 loci among reptile species should still make excellent phylogenetic markers for advancing reptilian systematics (Shedlock et al., 2004). So although the topologies of gene trees of CR1 elements may not reliably indicate species relationships, they can provide valuable information about the age structure and life span of particular active CR1 subfamilies pertinent to the effective sampling of phylogenetically informative CR1s for orthologous LINE insertion studies. Unlike the persistence of only a single active lineage in mammalian LINE 1, multiple active subfamilies of reptilian CR1s should provide numerous sources of both older and younger elements that have proliferated in copy number during different periods of reptilian diversification and therefore can be matched to a variety of different systematic problems of interest at different taxonomic levels of divergence.
The high copy numbers of CR1s in reptilian genomes imply that there should also be a substantial density and diversity of partner CR1-derived SINE elements in reptile genomes that can add to the volume of potentially informative phylogenetic markers. Taken together, the variable ages of reptilian retroelements and their high copy numbers should bolster efforts to gather sufficient numbers of informative loci to tackle large, difficult systematic projects, such as the higher-order relationships of modern birds. In such cases, lineage sorting effects and short internode distances among major clades can make it impractical to find enough useful LINE and SINE markers to fully resolve branches of a cladogram (Shedlock et al., 2004). Given the phylogenetic structure of CR1 elements apparent from this and other surveys, the prospects for CR1 retroelements providing a wealth of valuable genetic markers to advance reptilian systematics seem promising. It is anticipated that the forthcoming genome sequence assemblies of Anolis and the zebra finch will greatly advance our understanding of reptilian CR1 evolution by providing two comprehensive annotated libraries of reference sequences from nearly opposite ends of the reptilian tree of life. Moreover, additional BAC-clone sequences of non-model but phylogenetically unique species such as Sphenodon, as well as highly derived forms such as the garter snake, Thamnophis sirtalis, will surely enrich our comparative understanding of as yet unknown reptilian genomic diversity.

Acknowledgements

I would like to thank Miyako Fujiwara for careful technical assistance with sequence editing and alignment, Charles Chapus for critical computational and informatics support, Chris Organ for providing expertise with comparative methods analysis, and Corrie Saux Moreau for help with running MrBayes on the Harvard OEB Cluster. Two referees provided numerous helpful comments that significantly improved the manuscript.
This paper would not have been possible without the enthusiastic support of Scott Edwards, Roderic Page, Deborah Ciszek, members of the SBB Council, and the participants of the 2005 symposium on Genome Analysis and the Molecular Systematics of Retroelements held in Fairbanks, Alaska. Funding was provided in part by the Society of Systematic Biologists and by Harvard University.

References

Adey N. B., Schichman S. A., Graham D. K., Peterson S. N., Edgell M. H., Hutchison C. A. 3rd. Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences, Mol. Biol. Evol., 1994, vol. 11 (pg. 778-789)
Batzer M., Deininger P. L. Alu repeats and human genomic diversity, Nat. Rev. Genet., 2001, vol. 3 (pg. 370-379)
Boissinot S., Chevret P., Furano A. V. L1 (LINE-1) Retrotransposon evolution and amplification in recent human history, Mol. Biol. Evol., 2000, vol. 17 (pg. 915-928)
Burch J. B. E., Davis D. L., Haas N. B. Chicken repeat 1 elements contain a pol-like open reading frame and belong to the non-long terminal repeat class of retrotransposons, Proc. Natl. Acad. Sci. USA, 1993, vol. 90 (pg. 8199-8203)
Cao Y., Sorenson M. D., Kumazawa Y., Mindell D. P., Hasegawa M. Phylogenetic position of turtles among amniotes: Evidence from mitochondrial and nuclear genes, Gene, 2000, vol. 259 (pg. 139-148)
Edwards S. V., Jennings W. B., Shedlock A. M. Phylogenetics of modern birds in the era of genomics, Proc. R. Soc. Lond. B, 2005, vol. 272 (pg. 979-992)
Elgar G., Clark M. S., Meek S., Smith S., W. S. Generation and analysis of 25 Mb of genomic DNA from the pufferfish Fugu rubripes by sequence scanning, Genome Res., 1999, vol. 9 (pg. 960-971)
Garland T. Jr., Dickerman A. W., Janis C. M., Jones J. A. Phylogenetic analysis of covariance by computer simulation, Syst. Biol., 1993, vol. 42 (pg. 265-292)
Garland T. Jr., Adolph S. C. Why not to do two species comparative studies: Limitations on inferring adaptation, Physiol. Zool., 1994, vol. 67 (pg. 797-828)
Garland T. Jr., Ives R. A. Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods, Am. Nat., 2000, vol. 155 (pg. 346-364)
Gilbert N., Labuda D. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs, Proc. Natl. Acad. Sci. USA, 1999, vol. 96 (pg. 2869-2874)
Gregory T. R. Animal Genome Size Database, 2005, http://www.genomesize.com
Hedges S. B., Poling L. L. A molecular phylogeny of reptiles, Science, 1999, vol. 283 (pg. 998-1001)
Huelsenbeck J. P., Ronquist F. MrBayes: Bayesian inference of phylogeny, Bioinformatics, 2001, vol. 17 (pg. 754-755)
Hughes A., Piontkivska H. DNA repeat arrays in chicken and human genomes and the adaptive evolution of avian genome size, BMC Evol. Biol., 2005, vol. 5 pg. 12
Hughes A. L., Hughes M. K. Small genomes for better flyers, Nature, 1995, vol. 377 pg. 391
ICGSC, International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution, Nature, 2004, vol. 432 (pg. 695-716)
Jaillon O., Aury J. M., Brunet F., Petit J. L., Stange-Thomann N., Mauceli E., Bouneau L., Fischer C., Ozouf-Costaz C., Bernot A., et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype, Nature, 2004, vol. 431 (pg. 946-957)
Janke A., Erpenbeck D., Nilsson M., Arnason U. The mitochondrial genomes of the iguana (Iguana iguana) and the caiman (Caiman crocodylus): Implications for amniote phylogeny, Proc. R. Soc. Lond. B, 2001, vol. 268 (pg. 623-631)
Jurka J., Kapitonov V. V., Pavlicek A., Klownowski P., Kohany O., Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements, Cytogenet. Genome Res., 2005, vol. 110 (pg. 462-467), http://www.girinst.org/repbase/index.html
Kajikawa M., Ohshima K., Okada N. Determination of the entire sequence of turtle CR1: The first open reading frame of the turtle CR1 element encodes a protein with a novel zinc finger motif, Mol. Biol. Evol., 1997, vol. 14 (pg. 1206-1217)
Kajikawa M., Okada N. LINEs mobilize SINEs in the Eel through a shared 3' sequence, Cell, 2002, vol. 111 (pg. 433-444)
Kazazian H. H. J. Mobile elements: Drivers of genome evolution, Science, 2004, vol. 303 (pg. 1626-1632)
Khan H., Smit A., Boissinot S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates, Genome Res., 2006, vol. 16 (pg. 78-87)
Kumar S., Hedges B. A molecular timescale for vertebrate evolution, Nature, 1998, vol. 392 (pg. 917-920)
Lander E. S., Linton L. M., Birren B., Nusbaum C., Zody M. C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome, Nature, 2001, vol. 409 (pg. 860-921)
Lee M. S. Y. Molecules, morphology and the monophyly of diapsid reptiles, Contr. Zool., 2001, vol. 70 (pg. 1-22)
Lovsin N., Gubensek F., Kordi D. Evolutionary dynamics in a novel L2 clade of non-LTR retrotransposons in Deuterostomia, Mol. Biol. Evol., 2001, vol. 18 (pg. 2213-2224)
Luan D. D., Korman H., Jakubczak J. L., Eickbush T. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: A mechanism for non-LTR retrotransposition, Cell, 1993, vol. 72 (pg. 595-605)
Maddison W. P., Maddison D. R. MacClade 4: Interactive analysis of phylogeny and character evolution, 2000, Sunderland, Massachusetts, Sinauer Associates
Maddison W. P., Maddison D. R. Mesquite: A modular system for evolutionary analysis. Version 1.06, 2005, http://mesquiteproject.org
Malik H. S., Burke W. D., Eickbush T. The age and evolution of non-LTR retrotransposable elements, Mol. Biol. Evol., 1999, vol. 16 (pg. 793-805)
Mindell D. P., Sorenson M. D., Dimcheff D. E., Hasegawa M., Ast J. C., Yuri T. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes, Syst. Biol., 1999, vol. 48 (pg. 138-152)
Nagy K. A., Girard I. A., Brown T. K. Energetics of free-ranging mammals, reptiles, and birds, Annu. Rev. Nutr., 1999, vol. 19 (pg. 247-277)
Nishihara H., Smit A. F. A., Norihiro O. Functional noncoding sequences derived from SINEs in the mammalian genome, Genome Res., 2006, vol. 16 (pg. 864-874)
Ohshima K., Hamada M., T. Y., Okada N. The 3' ends of short interspersed repetitive elements are derived from the 3' ends of long interspersed repetitive elements, Mol. Cell. Biol., 1996, vol. 16 (pg. 3756-3764)
Okada N., Shedlock A. M., Nikaido M. Miller W. J., Capy P. (eds.). Retroposon mapping in molecular systematics, Mobile genetic elements, 2004, Totowa, New Jersey, Humana Press (pg. 189-226)
Olmo E. Reptiles: A group in transition in the evolution of genome size and the nucleotypic effect, Cytogenet. Genome Res., 2003, vol. 101 (pg. 166-171)
Pagel M. Inferring the historical patterns of biological evolution, Nature, 1999, vol. 401 (pg. 877-884)
Piskurek O., Austin C. C., Okada N. Sauria SINEs: Novel short interspersed transposable elements that are widespread in reptilian genomes, J. Mol. Evol., 2006, vol. 62 (pg. 630-644)
Primmer C. R., Raudsepp T., Chowdhary B. P., Moller A. P., Ellegren H. Low frequency of microsatellites in the avian genome, Genome Res., 1997, vol. 7 (pg. 471-482)
Reipell O., deBraga M. Turtles as diapsid reptiles, Nature, 1996, pg. 384
RGSPC, Rat Genome Sequencing Project Consortium. Genome sequence of the brown Norway rat yields insights into mammalian evolution, Nature, 2004, vol. 428 (pg. 493-521)
Ronquist F., Huelsenbeck J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models, Bioinformatics, 2003, vol. 19 (pg. 1572-1574)
Sasaki T., Takahashi K., Nikaido M., Miura S., Yasukawa Y., Okada N. First application of the SINE (short interspersed repetitive element) method to infer phylogenetic relationships in reptiles: An example from the turtle superfamily Testudinoidea, Mol. Biol. Evol., 2004, vol. 21 (pg. 705-715)
Shedlock A. M., Botka C. W., Zhao S., Shetty J., Zhang T., Liu J. S., Deschavanne P. J., Edwards S. V. Phylogenomics of non-avian reptiles and the structure of the ancestral amniote genome, Proc. Natl. Acad. Sci. USA, 2006, submitted
Shedlock A. M., Janes D., Edwards S. V. Murphy W. J. (ed.). Amniote phylogenomics: Testing evolutionary hypotheses with BAC library scanning and targeted clone analysis of large-scale DNA sequences from reptiles, Phylogenomics, 2006, Totowa, New Jersey, Humana Press, in press
Shedlock A. M., Okada N. SINE insertions: Powerful tools for molecular systematics, Bioessays, 2000, vol. 22 (pg. 148-160)
Shedlock A. M., Takahashi K., Okada N. SINEs of speciation: Tracking lineages with retroposons, Trends Ecol. Evol., 2004, vol. 19 (pg. 545-553)
Silva R., Burch J. B. Evidence that chicken CR1 elements represent a novel family of retroposons, Mol. Cell. Biol., 1989, vol. 9 (pg. 3563-3566)
Smit A. F. A. Interspersed repeats and other mementos of transposable elements in mammalian genomes, Curr. Opin. Genet. Dev., 1999, vol. 9 (pg. 657-663)
Smit A. F. A. RepeatMasker version 3.1.5, 2006, http://www.repeatmasker.org
Smit A. F. A., Riggs A. D. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation, Nucleic Acids Res., 1995, vol. 23 (pg. 98-102)
Stumph W. E., Kristo P., Tsai M. J., O'Malley B. W. A chicken middle-repetitive DNA sequence which shares homology with mammalian ubiquitous repeats, Nucleic Acids Res., 1981, vol. 9 (pg. 5383-5397)
Swofford D. L. PAUP*: Phylogenetic analysis using parsimony (*and other methods) v.4.0b, 1999, Sunderland, Massachusetts, Sinauer Associates
Thompson J., Higgins D., Gibson T. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res., 1994, vol. 22 (pg. 4673-4680)
Vandergon T. L., Reitman M. Evolution of chicken repeat 1 (CR1) elements: evidence for ancient subfamilies and multiple progenitors, Mol. Biol. Evol., 1994, vol. 11 (pg. 886-898)
Waltari E., Edwards S. V. Evolutionary dynamics of intron size, genome size, and physiological correlates in archosaurs, Am. Nat., 2002, vol. 160 (pg. 539-552)
Watanabe M., Nikaido M., Tsuda T., Inoko H., Mindell D. P., Murata K., Okada N. The rise and fall of the CR1 subfamily in the lineage leading to penguins, Gene, 2006, vol. 365 (pg. 57-66)
Waterston R. H., Lindblad-Toh K., Birney E., Rogers J., Abril J. F., Agarwal P., Agarwala R., Ainscough R., Alexandersson M., An P., et al. Initial sequencing and comparative analysis of the mouse genome, Nature, 2002, vol. 420 (pg. 520-562)
Weiner A. M., Deininger P. L., Efstratiadis A. Nonviral retroposons: Genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information, Ann. Rev. Biochem., 1986, vol. 55 (pg. 631-661)
Wicker T., Robertson J. S., Schulze S. R., Feltus F. A., Magrini V., Morrison J. A., Mardis E. R., Wilson R. K., Peterson D. G., Paterson A. H., Ivarie R. The repetitive landscape of the chicken genome, Genome Res., 2004, vol. 15 (pg. 126-136)
Wilkinson M., Thorley J., Benton M. J. Uncertain turtle relationships, Nature, 1997, vol. 387 pg. 466
Xiong Y., Eickbush T. H. Origin and evolution of retroelements based upon their reverse transcriptase sequences, EMBO J., 1990, vol. 9 (pg. 3353-3362)
Zardoya R., Meyer A. Complete mitochondrial genome suggests diapsid affinities of turtles, Proc. Natl. Acad. Sci. USA, 1998, vol. 95 (pg. 14226-14231)

© 2006 Society of Systematic Biologists

Journal

Systematic Biology, Oxford University Press

Published: Dec 1, 2006

Keywords: Amniote, CR1 LINE, phylogenomics, Reptilia, retroelement, RT domain
