Nucleic Acids Research, 2006, Vol. 34, Database issue D354–D357
doi:10.1093/nar/gkj102

From genomics to chemical genomics: new developments in KEGG

Minoru Kanehisa (1,2,*), Susumu Goto (1), Masahiro Hattori (1), Kiyoko F. Aoki-Kinoshita (1), Masumi Itoh (1), Shuichi Kawashima (2), Toshiaki Katayama (2), Michihiro Araki (2) and Mika Hirakawa (1,3)

(1) Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
(2) Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
(3) Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, Chiyoda-ku, Tokyo 102-8666, Japan

Received September 14, 2005; Revised and Accepted October 17, 2005

ABSTRACT

The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource (http://www.genome.jp/kegg/) provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps.

INTRODUCTION

While traditional genomics and other types of omics approaches have contributed to our knowledge on the genomic space of possible genes and proteins that make up the biological system, the new chemical genomics initiatives will give us a glimpse of the chemical space of possible chemical substances that exist as an interface between the biological world and the natural world. The KEGG database project was initiated in 1995, the last year of the first 5-year phase of the Japanese Human Genome Programme (1). After 10 years of development in parallel with the growing number of completely sequenced genomes and increased activities in post-genomic research, the KEGG project has entered a new phase in accordance with the chemical genomics initiatives.

KEGG is a database resource for understanding higher-order functions and utilities of the biological system, such as the cell or the organism, from genomic and molecular information. In fact, we consider KEGG as a computer representation of the biological system, consisting of building blocks and wiring diagrams, which can be used for modeling and simulation as well as for browsing and retrieval (2). Originally, the wiring diagrams involved endogenous molecules, both those that are directly encoded in the genome (proteins and RNAs) and those that are indirectly encoded through biosynthetic/biodegradation pathways (metabolites, glycans and so on). Now we are extending these wiring diagrams to include exogenous molecules. This will help understand interactions between the biological system and the natural environment, and would eventually lead to representation and reconstruction of another higher-level biological system, the biological world. Here we report new developments in KEGG towards this direction.

*To whom correspondence should be addressed. Tel: +81 774 38 3270; Fax: +81 774 38 3269; Email: [email protected]

© The Author 2006. Published by Oxford University Press. All rights reserved.

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]

Downloaded from https://academic.oup.com/nar/article-abstract/34/suppl_1/D354/1133379 by Ed 'DeepDyve' Gillespie user on 31 January 2018

THE KEGG RESOURCE

Overview

KEGG consists of four main databases. As illustrated in Figure 1 they are categorized as building blocks in the genomic space (GENES databases) and the chemical space (LIGAND database), wiring diagrams in the network space (PATHWAY database) and ontologies for pathway reconstruction (BRITE database). BRITE had been a separate database for many years, but it was formally included in KEGG in release 34.0 (April 2005) to establish a logical foundation for the KEGG Project. The URLs for accessing KEGG are summarized in Table 1.

Biological systems are represented in KEGG by two types of graphs, called nested graphs and line graphs in theoretical computer science. The nested graph is a graph whose nodes can themselves be graphs. It is used for representing KEGG network hierarchy and for pathway reconstruction and functional inference. The line graph is a graph derived by interchanging nodes and edges of another graph. It represents the inherent complementarity of the metabolic pathway, which can be viewed either as a network of genes (enzymes) or as a network of compounds, meaning that one can be generated from the other by the line graph transformation. Thus, the line graph is the basis for integrated analysis of genomic and chemical information.

Figure 1. The overall architecture of KEGG now consisting of four main components. KEGG BRITE has been formally added to establish a logical foundation for inference of higher-order functions.

Table 1. URLs for the KEGG resource

Database/content          URL
KEGG home page            www.genome.jp/kegg/
KEGG table of contents    www.genome.jp/kegg/kegg2.html
KEGG PATHWAY              www.genome.jp/kegg/pathway.html
KEGG GENES                www.genome.jp/kegg/genes.html
KEGG LIGAND               www.genome.jp/kegg/ligand.html
KEGG BRITE                www.genome.jp/kegg/brite.html
KGML                      www.genome.jp/kegg/xml/
KEGG API                  www.genome.jp/kegg/soap/
KEGG DRUG                 www.genome.jp/kegg/drug/
KEGG GLYCAN               www.genome.jp/kegg/glycan/
KEGG REACTION             www.genome.jp/kegg/reaction/
KEGG EXPRESSION           www.genome.jp/kegg/expression/
KEGG ANNOTATION           www.genome.jp/kegg/kaas/
KegArray/KegDraw          www.genome.jp/download/
DBGET                     www.genome.jp/dbget/
BLAST/FASTA               blast.genome.jp/
GenomeNet FTP             www.genome.jp/anonftp/
GenomeNet home page       www.genome.jp/

The current GenomeNet address 'www.genome.jp' is recommended, but the previous address 'www.genome.ad.jp' will still be made available.

BRITE database

KEGG BRITE is a collection of hierarchies and binary relations with two inter-related objectives corresponding to the two types of graphs: to automate functional interpretations associated with the KEGG pathway reconstruction and to assist discovery of empirical rules involving genome-environment interactions. Currently, we focus on hierarchical structuring of our knowledge on functional aspects of the genomic and chemical spaces (Table 2), including the KEGG orthology (KO) system for ortholog/paralog gene groups, the reaction classification (RC) system for biochemical reactions, and other classifications for compounds and drugs tentatively called chemical ontology as shown in Figure 1. We plan to extend the KO system to include the definition of functional modules in the KEGG pathways and to develop ontologies for computational inference of higher-order functions.

Table 2. Functional hierarchies in KEGG BRITE

Network hierarchy
    KO
Protein families
    Enzymes
    Transcription factors
    Ribosome
    Translation factors
    ABC transporters
    G-protein-coupled receptors
    Ion channels
    Cytokines
    Cytokine receptors
    Cell adhesion molecules (CAMs)
    CAM ligands
    CD molecules
    Bacterial motility proteins
Compounds
    Compounds with biological roles
    Lipids
    Phytochemical compounds
Compound interactions
    Ion channel agonists/antagonists
    Cytochrome P450 substrates
Drugs
    Therapeutic category of drugs
    Drug classification
Diseases
    Disease genes, genomes and pathways
Organisms
    KEGG organisms

As of September 12, 2005.

PATHWAY database

The KEGG PATHWAY database is a collection of manually drawn pathway maps for metabolism, genetic information processing, environmental information processing such as signal transduction, various other cellular processes and human diseases. During the past 2 years we have significantly increased the number of pathway maps for regulatory pathways including signal transduction, ligand–receptor interaction and cell communication, all based on extensive survey of published literature. For metabolic pathways we created two new sections, 'Glycan Biosynthesis and Metabolism' and 'Biosynthesis of Polyketides and Nonribosomal Peptides'. The XML version of the pathway maps is available for both metabolic and regulatory pathways. These KEGG Markup Language (KGML) files provide graph information that can be used to computationally reproduce and manipulate KEGG pathway maps.

GENES database

The KEGG GENES database is a collection of gene catalogs for all complete genomes and some partial genomes (31 eukaryotes, 235 bacteria and 23 archaea as of September 12, 2005), generated from publicly available resources, mostly NCBI RefSeq (3). All genomes in KEGG GENES are subject to SSDB computation and given manual KO assignments as described below. There are auxiliary collections of gene catalogs: DGENES for draft genomes (21 eukaryotes) and EGENES for expressed sequence tag consensus contigs (25 plants). These are meant to supplement the repertoire of KEGG organisms, and all are given automatic KO assignments using GENES as a reference dataset. Each GENES entry contains cross-reference information to outside databases, including NCBI gi numbers, Entrez Gene IDs and UniProt accession numbers. Starting with KEGG release 37.0 (January 2006) automatic ID conversion is implemented enabling use of such outside identifiers to access KEGG GENES and then the other KEGG databases.

KEGG orthology

There is a total of over one million genes in KEGG GENES, representing a tiny, but well-characterized part of the genomic space that makes up the biological world. From this part we organize knowledge about orthologous genes and paralogous genes, which, we hope, can be generalized for understanding the entire genomic space. This knowledge is stored in the KO system, a pathway-based classification of orthologous genes, including orthologous relationships of paralogous gene groups. The KO identifier, or the K number, is a common identifier for linking genomic information in the GENES database with network information in the PATHWAY database. The pathway nodes represented by rectangles in the KEGG reference pathway maps are given KO identifiers, so that organism-specific pathways can be computationally generated once each genome is annotated with KO's. This annotation or the KO assignment is done manually for KEGG GENES with the help of the GFIT tool using best-hit relations in pairwise genome comparisons stored in the SSDB database (4).

Because the number of ortholog groups that can be linked to pathways is limited, we have introduced two additional ways to define KO's. One is to use COG (5) to cover a broad range of possible ortholog groups. The other is to rely on experts' classifications of protein families, which tend to be more functionally oriented resulting in narrowly defined KO's. A growing number of protein families are being added to the KO system, and they are shown in separate hierarchies different from the KEGG network hierarchy. The KO system can be best viewed from the KEGG BRITE database (Table 2).

LIGAND database

Originally, the LIGAND database consisted of just two components: ENZYME for enzyme nomenclature and COMPOUND for chemical compound structures (6). It later successively included additional components: REACTION for chemical reaction formulas, GLYCAN for glycan structures, RPAIR for reactant pair transformation patterns and DRUG for drug information. This expansion of the LIGAND collection represents our expanded efforts for understanding the chemical space that is part of the biological world.

The KEGG DRUG database is a new addition from KEGG release 36.1 (December 2005). It contains chemical structures and additional information such as therapeutic categories and target molecules. A most unique feature of KEGG DRUG is a collection of drug structure maps, which graphically illustrate, in a manner similar to KEGG pathway maps, our knowledge on groups of chemical structural patterns, therapeutic categories, their relationships and the chronology of drug development if known.

Reaction classification

The RC system in the chemical space is a counterpart of the KO system in the genomic space (Figure 1). It represents our attempt to organize knowledge on chemical reactions by categorizing chemical structure transformation patterns. The REACTION database contains individual reaction formulas taken from the ENZYME database. Each reaction formula is split into a set of substrate-product pairs, and the chemical structure comparison program SIMCOMP is applied to obtain an optimal alignment. This comparison is based on atom typing, which is the conversion of regular atomic (C, N, O, S, P and so on) representation to what we call KCF representation that consists of 68 atom types distinguishing functional groups and atomic environments (7). The chemical structure alignment generated by SIMCOMP is used to define the R atom for the reaction center, the D atom(s) for adjacent atom(s) in the mismatched region and the M atom(s) for adjacent atom(s) in the matched region (8). This is first done computationally and is followed by extensive manual curation.

The RPAIR database is still under development, but it is the basis for the RC system categorizing curated RDM patterns. Since an enzymatic reaction usually involves multiple substrates and products, one EC number corresponds to a combination of RDM patterns. The RC system has enabled automatic assignment of EC numbers from a set of substrate and product structures (8) and will further enable exploration of unknown reactions by generating plausible combinations of RDM patterns, which may then be related to possible paralogs of enzyme genes.

Glycosyltransferase reactions

Functional glycomics has been a most successful area for integrated analysis of genomic and chemical information (9). The carbohydrate sequence of glycans is determined by a specific set of biosynthetic reactions catalyzed by different types of glycosyltransferases. Thus, once we know the repertoire of glycosyltransferases in the genome or in the transcriptome, it should in principle be possible to predict the repertoire of glycan structures. Conversely, the knowledge about glycan structures can be used to search and annotate new glycosyltransferases. Composite Structure Map in KEGG GLYCAN is a tool for converting genomic or transcriptomic data to glycan structure variations based on a curated set of known glycosyltransferase reactions.

ACCESSING KEGG

Web and FTP

KEGG is the major component of the Japanese GenomeNet, which is served by the Kyoto University Bioinformatics Center. The other GenomeNet services including DBGET and BLAST/FASTA searches are now primarily developed and used to support KEGG. The official URL for GenomeNet has been modified to http://www.genome.jp/, but the former URL http://www.genome.ad.jp/ will still be made available (Table 1). To download the KEGG data, academic users may use the GenomeNet FTP site.

KEGG API

The KEGG API service has become an increasingly popular mode of access. It is the SOAP/WSDL interface to KEGG, enabling users to write their own programs to access, customize and utilize KEGG.

KegArray and KegDraw

KegArray and KegDraw are standalone Java applications that make use of the KEGG resources. KegArray is for microarray data analysis in conjunction with KEGG pathways and genomes. KegDraw is for drawing glycan structures and chemical compound structures, which can then be used to query against KEGG and PubChem databases. Both are freely available to academic and non-academic users.

ACKNOWLEDGEMENTS

The KEGG project is supported by the Institute for Bioinformatics Research and Development of the Japan Science and Technology Agency, the 21st Century COE program 'Genome Science', and a grant-in-aid for scientific research on the priority area from the Ministry of Education, Culture, Sports, Science and Technology of Japan. The computational resources were provided by the Bioinformatics Center, Institute for Chemical Research, Kyoto University. Funding to pay the Open Access publication charges for this article was provided by the grant-in-aid for scientific research.

Conflict of interest statement. None declared.

REFERENCES

1. Kanehisa,M. (1997) A database for post-genome analysis. Trends Genet., 13, 375–376.
2. Kanehisa,M., Goto,S., Kawashima,S., Okuno,Y. and Hattori,M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277–D280.
3. Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 33, D501–D504.
4. Kanehisa,M., Goto,S., Kawashima,S. and Nakaya,A. (2002) The KEGG databases at GenomeNet. Nucleic Acids Res., 30, 42–46.
5. Tatusov,R.L., Natale,D.A., Garkavtsev,I.V., Tatusova,T.A., Shankavaram,U.T., Rao,B.S., Kiryutin,B., Galperin,M.Y., Fedorova,N.D. and Koonin,E.V. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res., 29, 22–28.
6. Goto,S., Nishioka,T. and Kanehisa,M. (1998) LIGAND: chemical database for enzyme reactions. Bioinformatics, 14, 591–599.
7. Hattori,M., Okuno,Y., Goto,S. and Kanehisa,M. (2003) Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J. Am. Chem. Soc., 125, 11853–11865.
8. Kotera,M., Okuno,Y., Hattori,M., Goto,S. and Kanehisa,M. (2004) Computational assignment of the EC numbers for genomic-scale analysis of enzymatic reactions. J. Am. Chem. Soc., 126, 16487–16498.
9. Hashimoto,K., Goto,S., Kawano,S., Aoki-Kinoshita,K.F., Ueda,N., Hamajima,M., Kawasaki,T. and Kanehisa,M. (2005) KEGG as a glycome informatics resource. Glycobiology, in press.
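Illustrative note on the line graph transformation. The Overview describes the line graph as a graph derived by interchanging the nodes and edges of another graph, so that a network of compounds can be turned into a network of enzymes. The sketch below is a minimal illustration of that idea and is not taken from KEGG software. It assumes a simplified, undirected network in which each enzyme links exactly one substrate to one product; the function name `line_graph` and the labels A to D and E1 to E3 are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def line_graph(edges):
    """Return the line graph of an undirected graph.

    `edges` maps an edge label (here: a hypothetical enzyme) to its two
    end nodes (here: hypothetical compounds). The result is a sorted list
    of (edge_a, edge_b, shared_node) triples: two original edges become
    adjacent nodes whenever they share an end node.
    """
    # Index the original edges by the nodes they touch.
    incident = defaultdict(list)
    for label, (u, v) in edges.items():
        incident[u].append(label)
        if v != u:
            incident[v].append(label)

    # Every pair of edges meeting at a node yields one line-graph edge.
    triples = set()
    for node, labels in incident.items():
        for a, b in combinations(sorted(labels), 2):
            triples.add((a, b, node))
    return sorted(triples)


# Toy compound network: compounds are nodes, enzymes are edges.
compound_network = {"E1": ("A", "B"), "E2": ("B", "C"), "E3": ("C", "D")}

# Enzyme network: enzymes are nodes, joined through the shared compound.
enzyme_network = line_graph(compound_network)
print(enzyme_network)  # [('E1', 'E2', 'B'), ('E2', 'E3', 'C')]
```

In the output each enzyme pair is labeled with the compound that connects it, which is the complementarity the Overview refers to. Real reactions usually involve several substrates and products, so a faithful model would need hyperedges or one entry per substrate-product pair; that extension is left out of this sketch.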
Nucleic Acids Research – Oxford University Press
Published: Jan 1, 2006