Vol. 19 no. 11 2003, pages 1441–1442
BIOINFORMATICS APPLICATIONS NOTE
DOI: 10.1093/bioinformatics/btg170

GABI-Kat SimpleSearch: a flanking sequence tag (FST) database for the identification of T-DNA insertion mutants in Arabidopsis thaliana

Yong Li, Mario G. Rosso, Nicolai Strizhov, Prisca Viehoever and Bernd Weisshaar∗,‡

Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, D-50829 Koeln, Germany

Received on December 19, 2002; revised on February 17, 2003; accepted on February 20, 2003

∗ To whom correspondence should be addressed.
Present address: Max Planck Unit for Structural Molecular Biology, Notkestrasse 85, D-22607 Hamburg, Germany.
‡ Present address: Bielefeld University, Faculty of Biology, PO Box, D-33501 Bielefeld, Germany.

Bioinformatics 19(11) © Oxford University Press 2003; all rights reserved.

ABSTRACT

Summary: GABI-Kat SimpleSearch is a database of flanking sequence tags (FSTs) of T-DNA mutagenized Arabidopsis thaliana lines that were generated by the GABI-Kat project. Sequences flanking the T-DNA insertion sites were aligned to the A.thaliana genome sequence and annotated with information about the FST, the insertion site and the line from which the FST was derived. A web interface permits text-based as well as sequence-based searches for relevant insertions. GABI-Kat SimpleSearch aims to help biologists to quickly find T-DNA insertion mutants for their research.

Availability: http://www.mpiz-koeln.mpg.de/GABI-Kat/

Contact: weisshaa@mpiz-koeln.mpg.de

In plants, an effective way of disrupting genes is insertional mutagenesis using transposons or T-DNA. Collections of mutant lines containing transposon or T-DNA insertions are valuable resources for plant functional genomics. Several transposon (Wisman et al., 1998; Parinov et al., 1999; Genetrap DB, http://genetrap.cshl.org/) and T-DNA (Krysan et al., 1999; SIGnAL, http://signal.salk.edu/tabout.html; FLAGdb/FST, http://genoplante-info.infobiogen.fr; Samson et al., 2002) mutagenized populations have been created for Arabidopsis thaliana, the model system in which many aspects of plant biology are studied. The transposon or T-DNA integrated into the plant genome not only disrupts a gene that might be located at the insertion site, but also provides a tag to facilitate the identification of that gene. The availability of the almost complete A.thaliana genome sequence generated by The Arabidopsis Genome Initiative (2000), together with PCR techniques that enable us to recover plant genomic DNA sequences flanking the insertion site (flanking sequence tag, FST), makes it possible to locate the exact position of insertions in the A.thaliana genome.

The GABI-Kat project is building a large T-DNA mutagenized A.thaliana population (accession Columbia, the sequenced genotype) that will ultimately comprise 70 000 plants with sequence-characterized insertion sites. The sequence trace files derived from PCR-generated DNA fragments spanning insertion sites were processed using phred (Ewing et al., 1998). T-DNA vector sequences in the candidate FSTs were masked using cross_match, and sequences shorter than 30 nt were discarded. Candidate FSTs passing this filter were aligned to the A.thaliana genome sequence by BLASTN (Altschul et al., 1997). Up-to-date nuclear genome sequence data (BAC sequences of the minimal tiling path) and gene annotation data were obtained from MAtDB (MIPS Arabidopsis thaliana Database, Schoof et al., 2002). Only sequences with BLAST expect values lower than 5e-4 were considered good FSTs. BLAST reports were parsed and the expected insertion sites were calculated. MAtDB gene annotation data were used to determine whether a given insertion site was within a gene. An FST qualifies as a 'gene hit' when the insertion site is located between 300 bp upstream of the ATG and 300 bp downstream of the stop codon of an annotated gene. We use the term 'CDSi hit' when an insertion site is located between the ATG and the stop codon (insertions in the CDS and the included introns).

Annotated FSTs and gene annotation data were stored in a relational database (MySQL). The data are accessible through a web interface termed GABI-Kat SimpleSearch (http://www.mpiz-koeln.mpg.de/GABI-Kat/). The web site offers several static HTML pages providing general information about the project, the procedure to obtain seeds and other relevant information. The database queries were implemented by a set of PHP scripts running in an Apache/PHP environment. The PHP module of the GD library was used for drawing the graphics. In addition to GABI-Kat SimpleSearch, FST data are also available from the GSS division of EMBL/GenBank/DDBJ.

The goal of SimpleSearch is to give plant biologists quick access to the T-DNA insertion mutants generated by GABI-Kat. Two ways to start a search are available: (i) a text-based search to find 'gene hits'; and (ii) a sequence-based search using BLASTN or TBLASTN against all FSTs. The text-based search accepts either AGI gene codes (e.g. At1g23450) or a keyword as input. Up to 20 AGI gene codes can be entered at once. The keyword search performs a substring search in the gene annotation text field. This feature helps to find FSTs related to a given gene family, provided that the gene annotation is accurate. The sequence-based search gives access to FSTs mapping to intergenic regions that might be of interest to users, for example because an unannotated gene is assumed to exist. The results from both search methods are ultimately presented in a table of entries matching the search criteria. The columns in the table are the GABI-Kat line ID linked to the FST page, the AGI gene code linked to MAtDB, the gene annotation text and a link to the graphic display of the expected insertion site. The FST page displays the FST in FASTA format, along with data from the BLAST output including the presence/absence of T-DNA sequences in the original sequence read. To assist users in evaluating insertion sites relative to genes of interest, the graphic view displays the genome fragment around the gene or tagged sequence at the expected insertion site as an image. Annotated genes with exon–intron structure and other FSTs located in this region are shown, and BLAST expect values of the FSTs are colour-coded. The image can be zoomed out and back in again to display two differently sized genome fragments. Users can request seeds of an insertion line, and seeds will be delivered if the T-DNA insertion can be confirmed. The SimpleSearch web site contains additional features such as seed request tracking and personalized access to confirmation sequences.

The February 2003 release of SimpleSearch includes 41 389 FSTs from 26 375 T-DNA transformed A.thaliana lines; annotation is based on MAtDB release 11012003. A total of 9862 different genes (37% coverage) have been hit, of which 6458 qualify as CDSi hits (24% coverage). The mean length of FSTs is 239 bp, and in 70% of the FSTs T-DNA-derived sequence was detected. GABI-Kat SimpleSearch will be updated regularly with new FSTs. Until the final goal of 70 000 analyzed lines is reached, the FSTs from an average of about 6000 transformed lines will be entered into SimpleSearch every 3 months. Also, new MAtDB releases containing improved gene annotation data and additional genomic sequences from the few remaining gaps in the genomic sequence will be incorporated. Future developments will address the completeness of the annotation information incorporated into SimpleSearch by including additional data from sources like TIGR (http://www.tigr.org/), TAIR (http://www.arabidopsis.org/) and CATMA (http://www.catma.org/).

ACKNOWLEDGEMENTS

The authors thank all members of the GABI-Kat and the ADIS teams for greenhouse and laboratory work. Thanks also to Thomas Rosleff Soerensen and Martin Werber for helpful discussions. This work is supported by the BMBF in the context of the German plant genomics program GABI (Förderkennzeichen 0312273).

REFERENCES

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175–185.

Krysan,P.J., Young,J.C. and Sussman,M.R. (1999) T-DNA as an insertional mutagen in Arabidopsis. Plant Cell, 11, 2283–2290.

Parinov,S., Sevugan,M., Ye,D., Yang,W.C., Kumaran,M. and Sundaresan,V. (1999) Analysis of flanking sequences from dissociation insertion lines: a database for reverse genetics in Arabidopsis. Plant Cell, 11, 2263–2270.

Samson,F., Brunaud,V., Balzergue,S., Dubreucq,B., Lepiniec,L., Pelletier,G., Caboche,M. and Lecharny,A. (2002) FLAGdb/FST: a database of mapped flanking insertion sites (FSTs) of Arabidopsis thaliana T-DNA transformants. Nucleic Acids Res., 30, 94–97.

Schoof,H., Zaccaria,P., Gundlach,H., Lemcke,K., Rudd,S., Kolesov,G., Arnold,R., Mewes,H.W. and Mayer,K.F. (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res., 30, 91–93.

The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.

Wisman,E., Hartmann,U., Sagasser,M., Baumann,E., Palme,K., Hahlbrock,K., Saedler,H. and Weisshaar,B. (1998) Knock-out mutants from an En-1 mutagenized Arabidopsis thaliana population generate phenylpropanoid biosynthesis phenotypes. Proc. Natl Acad. Sci. USA, 95, 12432–12437.
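
The FST quality filter and the 'gene hit' / 'CDSi hit' definitions given in the text can be illustrated with a minimal sketch. It is not the authors' implementation (the note describes PHP scripts over MySQL); Python is used here only for illustration. The three thresholds (30 nt, 5e-4, 300 bp) come from the text. Everything else is an assumption made for the example: the names `Gene`, `passes_fst_filter`, `is_gene_hit` and `is_cdsi_hit`, the use of single genomic coordinates for the ATG and the stop codon, and the choice to treat the interval boundaries as inclusive.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MIN_FST_LENGTH_NT = 30   # sequences shorter than 30 nt were discarded
MAX_EXPECT = 5e-4        # only expect values lower than 5e-4 count as good FSTs
FLANK_BP = 300           # 300 bp upstream of the ATG / downstream of the stop codon


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (assumption, not the MAtDB schema)."""
    agi_code: str
    atg: int    # genomic coordinate of the start codon
    stop: int   # genomic coordinate of the stop codon


def passes_fst_filter(length_nt: int, expect: float) -> bool:
    """Apply the length and BLAST expect-value cutoffs described in the text."""
    return length_nt >= MIN_FST_LENGTH_NT and expect < MAX_EXPECT


def _span(gene: Gene) -> tuple[int, int]:
    # Sorting makes the test strand-independent: on the reverse strand the
    # ATG has the larger coordinate, and the flank is the same on both sides.
    low, high = sorted((gene.atg, gene.stop))
    return low, high


def is_cdsi_hit(site: int, gene: Gene) -> bool:
    """Insertion between ATG and stop codon (CDS plus included introns).

    Inclusive boundaries are an assumption of this sketch.
    """
    low, high = _span(gene)
    return low <= site <= high


def is_gene_hit(site: int, gene: Gene) -> bool:
    """Insertion within the gene extended by 300 bp on either side.

    Every CDSi hit is therefore also a gene hit.
    """
    low, high = _span(gene)
    return low - FLANK_BP <= site <= high + FLANK_BP


if __name__ == "__main__":
    # The AGI code is the example from the text; the coordinates are invented.
    gene = Gene("At1g23450", atg=1000, stop=2000)
    for site in (650, 800, 1500, 2250, 2400):
        print(site, is_gene_hit(site, gene), is_cdsi_hit(site, gene))
```

Because the flank is symmetric, sorting the two coordinates is enough to handle genes on either strand. A real pipeline would also have to decide how to treat an insertion site that falls within the flanks of two neighbouring genes, which the note does not specify.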
Bioinformatics – Oxford University Press
Published: Jul 22, 2003