BIOINFORMATICS

Algorithms and software for support of gene identification experiments

S.-H. Sze et al.

Abstract

Motivation: Gene annotation is the final goal of gene prediction algorithms. However, these algorithms frequently make mistakes and therefore the use of gene predictions for sequence annotation is hardly possible. As a result, biologists are forced to conduct time-consuming gene identification experiments by designing appropriate PCR primers to test cDNA libraries or applying RT-PCR, exon trapping/amplification, or other techniques. This process frequently amounts to ‘guessing’ PCR primers on top of unreliable gene predictions and frequently leads to wasted experimental effort.

Results: The present paper proposes a simple and reliable algorithm for experimental gene identification which bypasses the unreliable gene prediction step. Studies of the performance of the algorithm on a sample of human genes indicate that an experimental protocol based on the algorithm’s predictions achieves accurate gene identification with relatively few PCR primers. Predictions of PCR primers may be used for exon amplification in preliminary mutation analysis during an attempt to identify a gene responsible for a disease. We propose a simple approach to find a short region from a genomic sequence that with high probability overlaps with some exon of the gene. The algorithm is enhanced to find one or more segments that are probably contained in the translated region of the gene and can be used as PCR primers to select appropriate clones in cDNA libraries by selective amplification. The algorithm is further extended to locate a set of PCR primers that uniformly cover all translated regions and can be used for RT-PCR and further sequencing of (unknown) mRNA.

Availability: The programs are implemented as Web servers (GenePrimer and CASSANDRA) and can be reached at http://www-hto.usc.edu/software/procrustes/

Contact: ssze@hto.usc.edu

Introduction

In the absence of accurate gene prediction programs, gene identification and exon annotation in genomic DNA usually amount to sequencing the corresponding mRNA. This mRNA can be found by direct screening of cDNA libraries, Northern blot analysis or hybrid selection of cDNA. The limiting factor in these techniques is non-specific hybridization (Hattier et al., 1995; Timmermans et al., 1996). The problem of non-specific hybridization is usually addressed by pre-hybridization to repeats and restricting the analysis to genomic DNA fragments already suspected to contain potential exons found by zoo blotting, CpG island selection, exon trapping, exon amplification, etc. (for a review, see Parrish and Nelson, 1993; Parimoo et al., 1995). Comparisons of different techniques for the identification of transcribed sequences in the particular case of the chromosome 17 BRCA1 region, which contains more than 26 genes, were carried out in Hattier et al. (1995) and Brody et al. (1995).

A serious limitation of these techniques is the low signal-to-noise ratio in hybridization experiments and the high false-positive rate of splicing-based exon amplification methods (Church et al., 1993; North et al., 1993). Furthermore, some of the techniques cannot be applied to intronless or single-intron genes (Parimoo et al., 1995). For example, these techniques failed during analysis of the FMR2 gene in the FRAXE fragile site on the X chromosome associated with mild mental retardation (Gu et al., 1996) and the RPGR gene mutated in X-linked retinitis pigmentosa (Meindl et al., 1996). In addition, many of these methods are very labor intensive (Parrish and Nelson, 1993; Selleri et al., 1995).

An alternative strategy is in silico gene prediction. However, no existing gene recognition algorithm provides accuracy sufficient for gene identification and exon annotation. The accuracy of gene predictions depends drastically on the availability of a related protein. If a related mammalian protein of an analyzed human gene is known, the accuracy of gene predictions in this fragment is as high as 97–99%, and it is 95, 93 and 91% for related plant, fungal and prokaryotic proteins, respectively (Gelfand et al., 1996a; Mironov et al., 1998; Sze and Pevzner, 1997). In contrast, recognition of genes having no relatives in sequence databases is difficult, and the accuracy falls significantly for genes with many exons or with unusual codon usage (Burset and Guigó, 1996).

Although insufficient for final annotation, such predictions are used in practice to decrease the noise and to limit the experimental analysis to promising regions. In addition to the above examples, this approach was used, in particular, to select cDNAs of the gene for X-linked Kallmann syndrome (Legouis et al., 1991), the gene for DiGeorge syndrome (Budarf et al., 1995; Goldmuntz et al., 1996), the Caenorhabditis elegans muscle-specific gene unc-89 (Benian et al., 1996), as well as in the analysis of alternative splicing in the Drosophila gene zipper encoding non-muscle myosin II heavy chain (Mansfield et al., 1996). Computer predictions were also used to perform single-strand conformation polymorphism mutation analysis in X-linked myotubular myopathy (Laporte et al., 1996). Thus, the use of computer predictions in experimental practice is limited to the construction of oligonucleotide probes for Southern and Northern analyses, and the construction of PCR primers for clone detection in cDNA libraries or RT-PCR. Since the reliability of a predicted gene is not known, PCR primers for experimental gene identification are either selected at random, or guessed on top of unreliable exon predictions. This procedure often leads to wasted experimental effort.

This paper describes a different approach to this problem. Instead of trying to develop a universal gene prediction procedure, we use simple combinatorial techniques to make predictions needed in particular experimental schemes. We analyze open reading frames to find regions including long potential exons, thus reducing the noise level in hybridization experiments (zoo blotting, Southern and Northern analyses). The results of this step can be used for hybridization-based analyses and preliminary mutation analysis if a disease gene is studied. They also serve as the base for further processing.

If the cDNA libraries contain a clone corresponding to the analyzed gene, it can be identified more effectively not by hybridization, but by PCR selection. To do that, a biologist needs a set of candidate PCR primers guaranteed to contain a given number of primers to translated regions. Such a set is found by analyzing the coding potential of open reading frames or chains of candidate exons.

The most complicated experiments are necessary to sequence tissue-specific or low-copy mRNAs not represented in cDNA libraries. Such mRNAs are identified and amplified simultaneously by RT-PCR. We introduce the primer cover problem, which models the primer selection in this case and attempts to find a small set of primers uniformly covering the (unknown) mRNA corresponding to the (known) genomic sequence. In contrast to conventional gene recognition algorithms, the primer cover algorithm helps a biologist to identify a gene experimentally without attempting to predict all exons explicitly.

Data

The algorithms were tested on a set of 257 human genomic fragments containing non-homologous complete genes (Mironov et al., 1998), and on a set of 133 single-gene Arabidopsis thaliana DNA fragments from Korning et al. (1996).

Maximal open reading frames

A genomic sequence can be read in three frames (in each direction). An open reading frame is a region of a genomic sequence with no stop codon in frame. For each frame, stop codons partition a genomic sequence into non-overlapping regions called maximal open reading frames (MORFs). Each translated exon in a genomic sequence is contained inside a MORF. Although not every MORF contains an exon, the longest MORF contains an exon in 72% of cases in our human sample.

Since a random sequence in a given frame contains a stop codon approximately every 60 nucleotides, ‘random’ MORFs are relatively short. A typical genomic sequence contains a large number of short MORFs which are unlikely to contain exons. Our algorithm discards short MORFs (<150 nucleotides) and tries to find reliable PCR primers inside long MORFs. Although some short exons are lost after this procedure, this does not create serious problems since such exons can be recovered by PCR experiments with primers from long MORFs.

Coding windows

Given a long MORF, we would like to find a short region within this MORF that is likely to overlap with an exon. This region may be used later for PCR primer selection. A number of measures (coding potentials; reviewed in Fickett and Tung, 1992; Gelfand, 1995) correlate with the likelihood of the region being coding. We use the simplest definition of coding potential, requiring only information about codon usage. Let $f(abc)$ be the frequency of the codon $abc$ in the learning sample. The coding potential of a fragment consisting of $n$ codons is

$S(a_1 b_1 c_1 \ldots a_n b_n c_n) = \sum_{i=1}^{n} \log f(a_i b_i c_i)$.

We select a window with the highest coding potential within a MORF and assume that this window overlaps an exon. Alternatively, a window with the largest difference between the coding potential in one frame and the maximum coding potential in the other two frames (the difference criterion) can be selected. Computational experiments indicated that the difference criterion gives slightly better results.

An algorithm for finding windows that are likely to overlap an exon is as follows. Given a genomic sequence, generate all long MORFs (≥150 nucleotides). For each such MORF, find the best window (of length 150 nucleotides) with respect to the difference criterion. Return the windows in decreasing order of their difference values. Figure 1 shows the percentage of human DNA fragments for which the top n windows are sufficient to get a window that overlaps an exon. One window was sufficient in 90% of cases, whereas four windows were sufficient in all cases with one exception. The exception is the retinoblastoma gene (of 180 kb), which required 21 windows.

Fig. 1. Percentage of human genes when the top n windows (150 nucleotides) are sufficient to get a window that overlaps an exon.

PCR primers

Since coding windows are likely to overlap with exons, we select the central 20-nucleotide region of a coding window as a candidate PCR primer likely to be inside an exon. Figure 2 shows the percentages of human genes for which the top n primers are sufficient to have primers inside k exons. There was only one case when all exons were missed by this procedure. In this case, all exons are shorter than 100 nucleotides.

Fig. 2. Percentages of human genes (with at least k exons) when the top n primers (20 nucleotides) are sufficient to have primers inside k exons.

In particular, for k = 1, the first primer was contained in an exon in 86% of cases and one of the top five primers was contained in an exon in 96% of cases. With k increasing, many more primers are needed to detect more exons and the above approach does not always find primers inside all exons, leading to exon misses. No exons were missed in 37% of cases, one exon was missed in 28% of cases, while two exons were missed in 16% of cases. In 91% of cases, no more than three exons were missed. In 99% of cases, fewer than eight exons were missed. There are two exceptional cases with 14 and 38 exons missing (collagen and retinoblastoma, respectively). A large number of missing exons is an indication of the complexity of the gene recognition problem (gene recognition programs frequently miss exons). This problem is overcome by designing a primer cover of genomic sequences which can be used for experimental gene identification.

Primer cover

Let $P$ be a set of primers in a genomic sequence. Some of these primers may be contained in the corresponding mRNA (valid primers), while others may not. For a valid primer p, define left(p) as the valid primer preceding p in the cDNA or (if p is the leftmost valid primer) as a fictitious primer corresponding to the beginning of the cDNA. Similarly, define right(p) as the valid primer following p in the cDNA or (if p is the rightmost valid primer) as a fictitious primer corresponding to the end of the cDNA. Given a threshold r indicating the maximum length of potential PCR products, a set of primers $P$ is a cover of a genomic sequence if for every valid primer p from $P$, the distances from left(p) to p and from p to right(p) are less than r. Intuitively, in this formulation, primers are undirected fragments that can be used to construct a set of primers for PCR amplification, each undirected fragment corresponding to two PCR primers, one in each direction. Given a genomic sequence, the goal is to find a primer cover of minimal size. In reality, some adjustments in the positions of primers are necessary to avoid PCR artifacts.

We implemented a simple algorithm to find a primer cover. For each MORF, a set of primers is constructed as follows. Find a primer p in the middle of the best window as before. To the left of p and to the right of p, add primers every r/2 nucleotides as long as the primers are inside the corresponding MORF and there are no primers already placed within r nucleotides. We position primers every r/2 nucleotides to ensure that when primers are found inside all exons, the resulting set of primers is guaranteed to form a primer cover. MORFs are considered in sorted order as before, and the primers in a MORF are returned in increasing order of their distance from p.

Figure 3 shows the percentages of human genes for which n primers are sufficient to have a primer cover for r = 500 (in addition to the two fictitious primers at the start and at the end of the cDNA). There are only two exceptional cases when the algorithm fails to construct a primer cover. When the algorithm constructs a primer cover successfully, only one primer was needed in 37% of cases. At most eight primers were needed to cover 80% of cases, while at most 14 primers were needed to cover 90% of cases.

Fig. 3. Percentages of human genes when n primers (20 nucleotides) are sufficient to have a primer cover or a primer pair separated by at most r = 500 nucleotides (in addition to the two fictitious primers at the start and at the end of the cDNA).

To find a clone corresponding to the analyzed gene in a cDNA library, biologists often use experimental protocols based on PCR amplification. In this situation, they need a relatively small set of PCR primers containing at least two primers to the cDNA in question. In other words, we are interested in the number of primers needed to get the first adjacent pair of primers separated by at most r nucleotides (the primer pair problem). If a set of primers contains such a pair, then the PCR product corresponding to the primers from the pair leads to the identification of the corresponding part of the gene. There is only one case when the approach fails to find such an adjacent pair (Figure 3). Otherwise, only one primer (in addition to the two fictitious primers at the start and at the end of the cDNA) was needed to have a primer pair in 75% of cases, at most three primers were needed in 90% of cases and at most 16 primers were needed in 99% of cases.

Alternative approach for finding single probes and primer pairs

Below we describe a different approach specifically for finding single probes and primer pairs based on exon chains. Let $S$ be a set of suboptimal exons or exon chains, and let each chain $p \in S$ be ascribed a statistical weight $R(p)$ [here we used 3-exon chains weighted by the function from Gelfand et al. (1996b) measuring coding potential and splice site strengths]. For each position $b$, let $S(b)$ be the subset of chains coming through $b$. The score of a candidate primer corresponding to positions $b_1 \ldots b_k$ is defined as:

$W(b_1 \ldots b_k) = \sum_{i=1}^{k} \sum_{p \in S(b_i)} C^{R(p)}$   (1)

where $C$ is a constant. The candidate primers are sorted in decreasing order of their scores and a fixed number of the highest scoring primers are retained, with the additional requirement that the distance between the primers exceeds some given threshold.

The algorithm was tested on samples of human and Arabidopsis genes (Tables 1 and 2, respectively). We predicted single probes of length 150 nucleotides (a probe was accepted if at least 100 nucleotides could hybridize with the coding region of cDNA), and pairs of primers of length 30 nucleotides.

On the first sample, the highest scoring candidate probe was coding in 92% of cases and a set of five candidate probes almost always contained a coding one. It was possible to construct a primer pair in all but two cases (99%). The two exceptions are genes with one or two short exons. Two candidates were sufficient in 81% of cases and three candidates were sufficient in 89% of cases.

On the second sample, the highest scoring candidate probe was coding in 97% of cases, and three probes were sufficient in all cases. A primer pair was constructed in all but one case and it always contained no more than four candidates (two candidates in 93% of cases, three candidates in 98% of cases).

Table 1. Number of genes in each category from predictions of 257 human genes

                      Candidates needed
Type of prediction      1     2     3     4     5   6–10   >10   No prediction
Single probe          237    11     4     1     1      3     0               0
Primer pair             –   209    19     9     2     12     4               2

Table 2. Number of genes in each category from predictions of 133 Arabidopsis genes

                      Candidates needed
Type of prediction      1     2     3     4   No prediction
Single probe          129     3     1     0               0
Primer pair             –   124     6     2               1

Discussion

The above algorithms provide computational support for gene identification experiments. We have tested the algorithms on two samples of human and plant genes, and demonstrated that the reliability of predictions is extremely high. The programs are implemented as Web servers (GenePrimer for the primer cover problem; CASSANDRA for the alternative approach for finding single probes and primer pairs) and can be reached at http://www-hto.usc.edu/software/procrustes/. Figures 4 and 5 show sample graphical outputs of the two programs.

Fig. 4. Output of GenePrimer software for human gene l33842. Rectangles denote MORFs in different frames. Primers within MORFs are shown in ‘successive’ colors in the order of their priorities with each color showing three primers (the first color is red). The real gene structure is also shown with each exon in its respective frame.

Fig. 5. Output of CASSANDRA software for human gene l33842. The predicted segments are shown as arrows pointing to their positions on the sequence line. The number above each arrow is the exon position in the candidate list. The height of an arrow’s solid part is proportional to the candidate score, so that the arrow for the most probable segment is the longest one. The color of an arrow indicates self-hybridization (red) or cross-hybridization to some other segment (yellow). Green lines correspond to segments for which the program does not expect any hybridization artifacts.

Of course, biological experiments are rather diverse, and the proposed algorithms do not cover all experimental approaches to gene identification. An appealing feature of the approach is its simplicity and combinatorial flexibility, making it easy to modify or optimize the program for different experimental schemes.

The algorithms described are based on two types of analyses: (i) information about long open reading frames and (ii) information about the intersection of predicted exons. Further developments can be based on merging these data in the following way. First, anchor primers are generated either at the middle of the window with the highest coding potential, or as the points where most of the highest scoring exons (or exon chains) intersect. Then the set of anchor primers is modified to generate a primer cover (by taking all primers within MORFs conforming to PCR requirements), a primer pair (by considering only the highest scoring anchor primers), or whatever set of primers is needed.

Acknowledgements

We are grateful to Paul Hardy for many helpful comments. This work is supported by Department of Energy grant DE-FG02-94ER61919. The work of M.S.G. is also partially supported by Russian Fund of Basic Research grant 94-04-12330 and grant MTW300 from ISF and the Russian Government. M.S.G. and A.A.M. are partially supported by the Russian State Program ‘Human Genome’.

References

Benian,G.M., Tinley,T.L., Tang,X. and Borodovsky,M. (1996) The Caenorhabditis elegans gene unc-89, required for muscle M-line assembly, encodes a giant modular protein composed of Ig and signal transduction domains. J. Cell Biol., 132, 835–848.
Brody,L.C. et al. (1995) Construction of a transcription map surrounding the BRCA1 locus of human chromosome 17. Genomics, 25, 238–247.
Budarf,M.L. et al. (1995) Cloning a balanced translocation associated with DiGeorge syndrome and identification of a disrupted candidate gene. Nature Genet., 10, 269–278.
Burset,M. and Guigó,R. (1996) Evaluation of gene structure prediction algorithms. Genomics, 34, 353–375.
Church,D.M., Banks,L.T., Rogers,A.C., Graw,S.L., Housman,D.E., Gusella,J.F. and Buckler,A.J. (1993) Identification of human chromosome 9 specific genes using exon amplification. Hum. Mol. Genet., 2, 1915–1920.
Fickett,J.W. and Tung,C.-S. (1992) Assessment of protein coding measures. Nucleic Acids Res., 20, 6441–6450.
Gelfand,M.S. (1995) Prediction of function in DNA sequence analysis. J. Comp. Biol., 2, 87–115.
Gelfand,M.S., Mironov,A.A. and Pevzner,P.A. (1996a) Gene recognition via spliced sequence alignment. Proc. Natl Acad. Sci. USA, 93, 9061–9066.
Gelfand,M.S., Astakhova,T.V. and Roytberg,M.A. (1996b) An algorithm for highly specific recognition of protein-coding regions. In Akutsu,T., Asai,K., Hagiya,M., Kuhara,S., Miyano,S. and Nakai,K. (eds), Genome Informatics 1996 (Proceedings of the 7th Workshop on Genome Informatics, December 1996, Tokyo, Japan). Universal Academy Press, Tokyo, pp. 82–87.
Goldmuntz,E., Wang,Z., Roe,B.A. and Budarf,M.L. (1996) Cloning, genomic organization, and chromosomal localization of human citrate transport protein to the DiGeorge/velocardiofacial syndrome minimal critical region. Genomics, 33, 271–276.
Gu,Y., Shen,Y., Gibbs,R.A. and Nelson,D.L. (1996) Identification of FMR2, a novel gene associated with the FRAXE CCG repeat and CpG island. Nature Genet., 13, 109–113.
Hattier,T., Bell,R., Shaffer,D., Stone,S., Phelps,R.S., Tavtigian,S.V., Skolnik,M.H., Shattuck-Eidens,D. and Kamb,A. (1995) Monitoring the efficacy of hybrid selection during positional cloning. Mamm. Genome, 6, 873–879.
Korning,P.G., Hebsgaard,S.M., Rouze,P. and Brunak,S. (1996) Cleaning the GenBank Arabidopsis thaliana data set. Nucleic Acids Res., 24, 316–320.
Laporte,J., Hu,L.J., Kretz,C., Mandel,J.-L., Kioschis,P., Coy,J.F., Klauck,S.M., Poustka,A. and Dahl,N. (1996) A gene mutated in X-linked myotubular myopathy defines a new putative tyrosine phosphatase family conserved in yeast. Nature Genet., 13, 175–182.
Legouis,R. et al. (1991) The candidate gene for the X-linked Kallmann syndrome encodes a protein related to adhesion molecules. Cell, 67, 423–435.
Mansfield,S.G., Al-Shirawi,D.Y., Ketchum,A.S., Newbern,E.C. and Kiehart,D.P. (1996) Molecular organization and alternative splicing in zipper, the gene that encodes the Drosophila non-muscle myosin II heavy chain. J. Mol. Biol., 255, 98–109.
Meindl,A. et al. (1996) A gene (RPGR) with homology to the RCC1 guanine nucleotide exchange factor is mutated in X-linked retinitis pigmentosa (RP3). Nature Genet., 13, 35–42.
Mironov,A.A., Roytberg,M.A., Pevzner,P.A. and Gelfand,M.S. (1998) Performance guarantee gene predictions via spliced alignment. Genomics, in press.
North,M.A., Sanseau,P., Buckler,A.J., Church,D., Jackson,A., Patel,K., Trowsdale,J. and Lehrach,H. (1993) Efficiency and specificity of gene isolation by exon amplification. Mamm. Genome, 4, 466–474.
Parimoo,S., Patanjali,S.R., Kolluri,R., Xu,H., Wei,H. and Weissman,S.M. (1995) cDNA selection and other approaches in positional cloning. Anal. Biochem., 228, 1–17.
Parrish,J.E. and Nelson,D.L. (1993) Methods for finding genes: A major rate-limiting step in positional cloning. Gene Anal. Tech. Appl., 10, 29–41.
Selleri,L., Smith,M.W., Holmsen,A.L., Romo,A.J., Thomas,S.D., Paternotte,C., Romberg,L.C.R., Wei,Y.H. and Evans,G.A. (1995) High-resolution physical mapping of a 250-kb region of human chromosome 11q24 by genomic sequence sampling (GSS). Genomics, 26, 489–501.
Sze,S.-H. and Pevzner,P.A. (1997) Las Vegas algorithms for gene recognition: suboptimal and error-tolerant spliced alignment. J. Comp. Biol., 4, 297–309.
Timmermans,M.C.P., Das,O.P. and Messing,J. (1996) Characterization of a meiotic crossover in maize identified by a restriction fragment length polymorphism-based method. Genetics, 143, 1771–1783.
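Illustrative sketch: MORFs, coding windows and candidate primers

The procedure described in the sections on maximal open reading frames, coding windows and PCR primers can be sketched roughly as follows. This is a minimal illustration and not the GenePrimer implementation. The function names, the forward-strand-only scan, the floor value used for codons absent from the learning sample, and the choice to read the two shifted frames over the same number of codons starting 1 and 2 nucleotides later are all assumptions made here for the sake of a short, runnable example.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def log_codon_frequencies(counts):
    """Turn codon counts from a learning sample into log f(abc)."""
    total = sum(counts.values())
    return {codon: math.log(c / total) for codon, c in counts.items() if c > 0}


def find_morfs(seq, min_len=150):
    """Forward-strand MORFs as (start, end) pairs, keeping those >= min_len nt."""
    found = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if i - start >= min_len:
                    found.append((start, i))
                start = i + 3
        # Region after the last stop codon, trimmed to whole codons.
        end = start + (len(seq) - start) // 3 * 3
        if end - start >= min_len:
            found.append((start, end))
    return found


def coding_potential(seq, start, n_codons, log_f, floor=-10.0):
    """S = sum of log f(codon) over n_codons codons read from `start`.
    `floor` (an assumption) scores codons missing from the table."""
    return sum(log_f.get(seq[i:i + 3], floor)
               for i in range(start, start + 3 * n_codons, 3))


def best_window(seq, morf, log_f, width=150):
    """Best (score, start) in a MORF under the difference criterion:
    in-frame potential minus the larger potential of the two shifted frames."""
    start, end = morf
    n = width // 3
    best = None
    for s in range(start, end - width + 1, 3):
        in_frame = coding_potential(seq, s, n, log_f)
        shifted = max(coding_potential(seq, s + k, n, log_f) for k in (1, 2))
        score = in_frame - shifted
        if best is None or score > best[0]:
            best = (score, s)
    return best


def candidate_primers(seq, log_f, width=150, primer_len=20):
    """Central primer_len-nt region of each best window, best windows first."""
    windows = [best_window(seq, m, log_f, width) for m in find_morfs(seq, width)]
    windows.sort(reverse=True)
    offset = (width - primer_len) // 2
    return [(s + offset, s + offset + primer_len) for _, s in windows]
```

A real run would also scan the reverse complement and use codon frequencies estimated from known coding sequences of the organism in question; the toy table in any quick check is only a stand-in for such a learning sample.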
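Illustrative sketch: primer placement and the cover condition

The primer cover definition and the r/2 placement rule from the section on primer covers can be illustrated in the same hedged spirit. The sketch below handles a single MORF only and omits the check for primers already placed within r nucleotides by earlier MORFs; both function names are hypothetical. The cover test works on cDNA coordinates of the valid primers, which in practice are unknown in advance and are used here only to make the definition concrete.

```python
def place_primers(morf, anchor, r):
    """Primer positions for one MORF: the anchor (middle of the best window),
    then positions every r/2 nt to its left and right while inside the MORF.
    Returned in increasing order of distance from the anchor."""
    start, end = morf
    step = r // 2
    positions = [anchor]
    d = step
    while anchor - d >= start or anchor + d < end:
        if anchor - d >= start:
            positions.append(anchor - d)
        if anchor + d < end:
            positions.append(anchor + d)
        d += step
    return positions


def is_primer_cover(valid_positions, cdna_len, r):
    """Cover condition: every gap between neighbouring valid primers, including
    the fictitious primers at the start (0) and end (cdna_len) of the cDNA,
    is less than r. With no valid primers this reduces to cdna_len < r,
    which is a choice made in this sketch."""
    points = [0] + sorted(valid_positions) + [cdna_len]
    return all(b - a < r for a, b in zip(points, points[1:]))
```

For example, with r = 500 a single valid primer at position 100 of a 1000-nucleotide cDNA leaves a 900-nucleotide gap to the end and so does not form a cover, whereas valid primers at 100, 400 and 700 do.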
Bioinformatics – Oxford University Press
Published: Jan 1, 1998