GABI-Kat SimpleSearch: a flanking sequence tag (FST) database for the identification of T-DNA insertion mutants in Arabidopsis thaliana

Vol. 19 no. 11 2003, pages 1441–1442
BIOINFORMATICS APPLICATIONS NOTE
DOI: 10.1093/bioinformatics/btg170

Yong Li, Mario G. Rosso, Nicolai Strizhov†, Prisca Viehoever and Bernd Weisshaar∗,‡

Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, D-50829 Koeln, Germany

Received on December 19, 2002; revised on February 17, 2003; accepted on February 20, 2003

∗ To whom correspondence should be addressed.
† Present address: Max Planck Unit for Structural Molecular Biology, Notkestrasse 85, D-22607 Hamburg, Germany.
‡ Present address: Bielefeld University, Faculty of Biology, PO Box, D-33501 Bielefeld, Germany.

Bioinformatics 19(11) © Oxford University Press 2003; all rights reserved.

ABSTRACT
Summary: GABI-Kat SimpleSearch is a database of flanking sequence tags (FSTs) from T-DNA mutagenized Arabidopsis thaliana lines generated by the GABI-Kat project. The sequences flanking the T-DNA insertion sites were aligned to the A.thaliana genome sequence. They were then annotated with information about the FST, the insertion site and the line from which the FST was derived. A web interface supports both text-based and sequence-based searches for relevant insertions. GABI-Kat SimpleSearch aims to help biologists quickly find T-DNA insertion mutants for their research.
Availability: http://www.mpiz-koeln.mpg.de/GABI-Kat/
Contact: weisshaa@mpiz-koeln.mpg.de

In plants, insertional mutagenesis using transposons or T-DNA is an effective way of disrupting genes. Collections of mutant lines containing transposon or T-DNA insertions are valuable resources for plant functional genomics. Several mutagenized populations have been created for Arabidopsis thaliana, the model system in which many aspects of plant biology are studied. Transposon populations include Wisman et al., 1998; Parinov et al., 1999; and Genetrap DB, http://genetrap.cshl.org/. T-DNA populations include Krysan et al., 1999; SIGnAL, http://signal.salk.edu/tabout.html; and FLAGdb/FST, http://genoplante-info.infobiogen.fr; Samson et al., 2002. A transposon or T-DNA integrated into the plant genome does two things: it may disrupt a gene located at the insertion site, and it provides a tag that facilitates the identification of that gene. The Arabidopsis Genome Initiative (2000) has generated an almost complete A.thaliana genome sequence. PCR techniques can recover the plant genomic DNA sequence flanking an insertion site (flanking sequence tag, FST). Together, these make it possible to locate the exact position of insertions in the A.thaliana genome.

The GABI-Kat project is building a large T-DNA mutagenized A.thaliana population in accession Columbia, the sequenced genotype. The population will finally comprise 70 000 plants with sequence-characterized insertion sites.

The sequence trace files were derived from PCR-generated DNA fragments spanning the insertion sites and processed using phred (Ewing et al., 1998). T-DNA vector sequences in the candidate FSTs were masked using cross_match, and sequences shorter than 30 nt were discarded. Candidate FSTs passing this filter were aligned to the A.thaliana genome sequence by BLASTN (Altschul et al., 1997). Up-to-date nuclear genome sequence data (BAC sequences of the minimal tiling path) and gene annotation data were obtained from MAtDB (MIPS Arabidopsis thaliana Database; Schoof et al., 2002). Only sequences with BLAST expect values lower than 5e-4 were considered good FSTs. The BLAST reports were parsed and the expected insertion sites were calculated.

MAtDB gene annotation data were used to determine whether a given insertion site lies within a gene. An FST qualifies as a 'gene hit' when the insertion site lies within a window around an annotated gene. The window runs from 300 bp upstream of the ATG to 300 bp downstream of the stop codon. We use the term 'CDSi hit' when an insertion site is located between the ATG and the stop codon, covering insertions in the CDS and in the introns it contains.

Annotated FSTs and gene annotation data were stored in a relational database (MySQL). The data are accessible through a web interface called GABI-Kat SimpleSearch (http://www.mpiz-koeln.mpg.de/GABI-Kat/). The web site also provides several static HTML pages with general information about the project, the procedure for obtaining seeds and other relevant information. The database queries were implemented as a set of PHP scripts running in an Apache/PHP environment. The PHP module of the GD library was used to draw the graphics. In addition to GABI-Kat SimpleSearch, the FST data are also available from the GSS division of EMBL/GenBank/DDBJ.

The goal of SimpleSearch is to give plant biologists quick access to the T-DNA insertion mutants generated by GABI-Kat. There are two ways to start a search: (i) a text-based search to find 'gene hits'; and (ii) a sequence-based search using BLASTN or TBLASTN against all FSTs.

The text-based search accepts either AGI gene codes (e.g. At1g23450) or a keyword as input, and up to 20 AGI gene codes can be entered at once. The keyword search performs a substring search in the gene annotation text field. This helps to find FSTs related to a given gene family, provided that the gene annotation is accurate.

The sequence-based search gives access to FSTs that map to intergenic regions. Such regions may interest users who, for example, assume that an unannotated gene exists there.

Both search methods present their results as a table of matching entries with four columns:
- the GABI-Kat line ID, linked to the FST page;
- the AGI gene code, linked to MAtDB;
- the gene annotation text;
- a link to the graphic display of the expected insertion site.

The FST page displays the FST in FASTA format, along with data from the BLAST output. This includes whether T-DNA sequence was present in the original sequence read. To help users evaluate insertion sites relative to genes of interest, the graphic view shows the genome fragment at the expected insertion site as an image. The fragment surrounds either the gene or the tagged sequence. The image shows annotated genes with their exon–intron structure and other FSTs located in the region, and colour-codes the BLAST expect values of the FSTs. The image can be zoomed out and back in to display two differently sized genome fragments.

Users can request seeds of an insertion line, and seeds are delivered if the T-DNA insertion can be confirmed. The SimpleSearch web site also offers seed request tracking and personalized access to confirmation sequences.

The February 2003 release of SimpleSearch includes 41 389 FSTs from 26 375 T-DNA transformed A.thaliana lines; its annotation is based on MAtDB release 11012003. A total of 9862 different genes have been hit (37% coverage), of which 6458 qualify as CDSi hits (24% coverage). The mean FST length is 239 bp, and T-DNA-derived sequence was detected in 70% of the FSTs.

GABI-Kat SimpleSearch will be updated regularly with new FSTs. Until the final goal of 70 000 analyzed lines is reached, FSTs from an average of about 6000 transformed lines will be added every 3 months. New MAtDB releases will also be incorporated as they appear, bringing improved gene annotation and sequence for the few remaining gaps in the genome. Future developments will make the annotation in SimpleSearch more complete by adding data from sources such as TIGR (http://www.tigr.org/), TAIR (http://www.arabidopsis.org/) and CATMA (http://www.catma.org/).

ACKNOWLEDGEMENTS
The authors thank all members of the GABI-Kat and ADIS teams for greenhouse and laboratory work. Thanks also to Thomas Rosleff Soerensen and Martin Werber for helpful discussions. This work is supported by the BMBF in the context of the German plant genomics program GABI (Förderkennzeichen 0312273).

REFERENCES
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175–185.
Krysan,P.J., Young,J.C. and Sussman,M.R. (1999) T-DNA as an insertional mutagen in Arabidopsis. Plant Cell, 11, 2283–2290.
Parinov,S., Sevugan,M., De,Y., Yang,W.C., Kumaran,M. and Sundaresan,V. (1999) Analysis of flanking sequences from dissociation insertion lines: a database for reverse genetics in Arabidopsis. Plant Cell, 11, 2263–2270.
Samson,F., Brunaud,V., Balzergue,S., Dubreucq,B., Lepiniec,L., Pelletier,G., Caboche,M. and Lecharny,A. (2002) FLAGdb/FST: a database of mapped flanking insertion sites (FSTs) of Arabidopsis thaliana T-DNA transformants. Nucleic Acids Res., 30, 94–97.
Schoof,H., Zaccaria,P., Gundlach,H., Lemcke,K., Rudd,S., Kolesov,G., Arnold,R., Mewes,H.W. and Mayer,K.F. (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res., 30, 91–93.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
Wisman,E., Hartman,U., Sagasser,M., Baumann,E., Palme,K., Hahlbrock,K., Saedler,H. and Weisshaar,B. (1998) Knock-out mutants from an En-1 mutagenized Arabidopsis thaliana population generate phenylpropanoid biosynthesis phenotypes. Proc. Natl Acad. Sci. USA, 95, 12432–12437.
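The note states two numeric cut-offs for accepting an FST: at least 30 nt must remain after vector masking, and the BLAST expect value must be below 5e-4. It also says expected insertion sites were "calculated" from parsed BLAST reports, but does not describe how. The Python sketch below illustrates one plausible approach and is not the GABI-Kat code.

It rests on three assumptions:
- masked bases are written as 'X';
- hits come as 12-column tabular BLAST lines (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore);
- the genomic base aligned to the first aligned query base (sstart) marks the insertion junction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Cut-offs stated in the note.
MIN_FST_LENGTH = 30   # nt remaining after vector masking
MAX_EXPECT = 5e-4     # a "good" FST needs a BLAST expect value below this


def unmasked_length(seq: str) -> int:
    """Count bases left after vector masking (assumes masked bases are 'X')."""
    return sum(1 for base in seq.upper() if base != "X")


def passes_length_filter(seq: str) -> bool:
    """Keep only candidate FSTs with at least MIN_FST_LENGTH unmasked bases."""
    return unmasked_length(seq) >= MIN_FST_LENGTH


@dataclass
class BlastHit:
    """Fields taken from one line of 12-column tabular BLAST output."""
    query: str
    subject: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float


def parse_tabular_line(line: str) -> BlastHit:
    # Columns: qseqid sseqid pident length mismatch gapopen
    #          qstart qend sstart send evalue bitscore
    f = line.rstrip("\n").split("\t")
    return BlastHit(f[0], f[1], int(f[6]), int(f[7]),
                    int(f[8]), int(f[9]), float(f[10]))


def expected_insertion_site(hit: BlastHit) -> Optional[Tuple[str, int, str]]:
    """Return (sequence ID, position, strand) for a good FST, else None.

    Assumption: the genomic part of the read starts right after the masked
    T-DNA border, so the subject base aligned to q_start marks the insertion.
    """
    if hit.evalue >= MAX_EXPECT:
        return None  # not a "good" FST under the 5e-4 cut-off
    # In tabular output s_start > s_end signals a minus-strand alignment.
    strand = "+" if hit.s_end >= hit.s_start else "-"
    return hit.subject, hit.s_start, strand
```

Take a read whose first 40 bases were masked, giving the hit line "fst1\tchr1\t99.0\t40\t0\t0\t41\t80\t12345\t12306\t1e-15\t75.0". For this minus-strand hit, expected_insertion_site returns ("chr1", 12345, "-"). A hit with an expect value of 1e-3 would return None.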
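The 'gene hit' and 'CDSi hit' definitions above reduce to two interval tests on genomic coordinates. The sketch below is a minimal illustration, not the SimpleSearch implementation. It assumes one gene at a time, with the ATG and stop-codon coordinates already known, and treats window boundaries as inclusive. Because the definition uses the same 300 bp margin on both sides, the window does not depend on the gene's strand.

```python
from dataclasses import dataclass

FLANK = 300  # bp beyond ATG and stop codon that still count as a "gene hit"


@dataclass
class Gene:
    name: str
    atg: int    # coordinate of the first base of the start codon
    stop: int   # coordinate of the last base of the stop codon


def classify_insertion(pos: int, gene: Gene) -> str:
    """Classify an insertion position relative to one annotated gene.

    Returns "CDSi hit" (ATG..stop, introns included), "gene hit" (within
    300 bp upstream of the ATG or 300 bp downstream of the stop), or "no hit".
    Every CDSi hit is also a gene hit; the more specific label is returned.
    Boundaries are treated as inclusive (an assumption).
    """
    # Sorting makes the test strand-independent: on the minus strand the ATG
    # has the higher coordinate. Because both flanks are 300 bp, the gene-hit
    # window is the same on either strand.
    lo, hi = sorted((gene.atg, gene.stop))
    if lo <= pos <= hi:
        return "CDSi hit"
    if lo - FLANK <= pos <= hi + FLANK:
        return "gene hit"
    return "no hit"
```

Returning the more specific label mirrors how the note reports its statistics, where CDSi hits are counted as a subset of gene hits.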

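The text-based search combines three rules: a list of at most 20 AGI gene codes, a lookup of those codes, and a substring search in the annotation text. SimpleSearch implemented this with PHP scripts over MySQL. The sketch below instead uses Python's standard sqlite3 module as a self-contained stand-in.

Several details are hypothetical:
- the table name gene_hits, its columns and the demo rows;
- the separators accepted between codes;
- case-insensitive matching.

The AGI pattern assumes the usual form "At", chromosome (1 to 5, C or M), "g", then five digits.

```python
import re
import sqlite3
from typing import List, Tuple

MAX_AGI_CODES = 20  # limit stated in the note
# AGI codes: "At", chromosome (1-5, C or M), "g", five digits, e.g. At1g23450.
AGI_PATTERN = re.compile(r"^At[1-5CM]g\d{5}$", re.IGNORECASE)

Row = Tuple[str, str, str]  # (line ID, AGI code, annotation text)


def parse_agi_codes(text: str) -> List[str]:
    """Split user input on whitespace, commas or semicolons and validate it."""
    codes = [c for c in re.split(r"[\s,;]+", text.strip()) if c]
    if len(codes) > MAX_AGI_CODES:
        raise ValueError(f"at most {MAX_AGI_CODES} AGI gene codes per query")
    bad = [c for c in codes if not AGI_PATTERN.match(c)]
    if bad:
        raise ValueError(f"not AGI gene codes: {bad}")
    return codes


def search_by_agi(conn: sqlite3.Connection, codes: List[str]) -> List[Row]:
    if not codes:
        return []
    marks = ",".join("?" * len(codes))  # one placeholder per code
    sql = ("SELECT line_id, agi, annotation FROM gene_hits "
           f"WHERE lower(agi) IN ({marks}) ORDER BY agi, line_id")
    return conn.execute(sql, [c.lower() for c in codes]).fetchall()


def search_by_keyword(conn: sqlite3.Connection, keyword: str) -> List[Row]:
    # Substring match on the annotation text; escape LIKE wildcards so that
    # "%" or "_" in the keyword are matched literally. SQLite's LIKE is
    # case-insensitive for ASCII letters by default.
    escaped = (keyword.replace("\\", "\\\\")
                      .replace("%", "\\%").replace("_", "\\_"))
    sql = ("SELECT line_id, agi, annotation FROM gene_hits "
           "WHERE annotation LIKE ? ESCAPE '\\' ORDER BY agi, line_id")
    return conn.execute(sql, (f"%{escaped}%",)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory table with made-up rows (hypothetical IDs and annotations)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene_hits (line_id TEXT, agi TEXT, annotation TEXT)")
    conn.executemany("INSERT INTO gene_hits VALUES (?, ?, ?)", [
        ("line-2", "At1g99990", "MYB-like transcription factor"),
        ("line-1", "At1g99990", "MYB-like transcription factor"),
        ("line-3", "At1g99980", "hypothetical protein"),
    ])
    return conn
```

With the demo table, search_by_agi(conn, parse_agi_codes("AT1G99990")) and search_by_keyword(conn, "myb") both return the rows for line-1 and line-2. Binding the codes as query parameters rather than pasting them into the SQL keeps user input out of the statement text.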

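The note gives coverage percentages for the February 2003 release but not the number of genes they are relative to. The cross-check below assumes that coverage means distinct genes hit divided by the number of genes annotated in MAtDB release 11012003. Under that assumption, the reported figures are mutually consistent and imply roughly 26 000 to 27 000 annotated genes.

```latex
\begin{aligned}
N_{\text{genes}} &\approx \frac{9862}{0.37} \approx 26\,650
  \quad \text{(about } 26\,300 \text{ to } 27\,000 \text{ allowing for rounding to } 37\%\text{)} \\[4pt]
\frac{6458}{26\,650} &\approx 0.242 \approx 24\% \quad \text{(CDSi-hit coverage, as reported)} \\[4pt]
\frac{6458}{9862} &\approx 0.65 \quad \text{(share of hit genes that are CDSi hits)} \\[4pt]
\frac{41\,389}{26\,375} &\approx 1.57 \quad \text{(FSTs per transformed line)}
\end{aligned}
```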
 

Publisher: Oxford University Press
Copyright: © Oxford University Press 2003
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/btg170

Journal: Bioinformatics, Oxford University Press
Published: Jul 22, 2003
