Ehrentreich, F.;Schomburg, D. - Dynamic generation and qualitative analysis of metabolic pathways by a joint database / graph theoretical approach

ECCB 2002 Poster

Title: Dynamic generation and qualitative analysis of metabolic pathways by a joint database / graph theoretical approach (P30)
Ehrentreich, F.; Schomburg, D. f.ehrentreich@uni-koeln.de, D.Schomburg@uni-koeln.de Universität zu Köln, Institut für Biochemie, Zülpicher Straße 47, D-50674 Köln

Applications of graph theory to the investigation of large biochemical reaction networks have been described by several authors, e.g. [1, 2]. An important feature of our approach is the dynamic generation of metabolic networks from the continuously growing, qualified metabolic data in BRENDA [3] and KEGG [4].

Enzyme-catalyzed biochemical reactions are the most important primary information for generating metabolic networks. Each reaction record comprises the substrates and products, represented by their compound IDs, the catalyzing enzyme, and a qualitative evaluation of the thermodynamic and kinetic behavior that classifies the reaction as reversible or irreversible. Further properties used as attributes are pathway IDs, chemical compound names, reaction IDs, stoichiometric coefficients, and references to raw and other data. The compound IDs also give access to the structure of the low molecular weight compounds at the abstraction level of connection tables (undirected graphs), e.g. for performing substructure searches. The compounds are stored as mol-file connection tables, a quasi-standard in this area.

The data are stored in a MySQL [5] database that will be integrated into other BRENDA activities [6] in the future. The LEDA library [7] is used for the graph theoretical computations. Tasks are divided according to the following strategy: do as much as possible with SQL statements and use LEDA, despite its rich algorithmic possibilities, only for the graph theoretical computations. Data transfer in both directions between MySQL and the C++-based LEDA library relies on the MySQL C API included in the MySQL distribution.

As a preprocessing step before network generation, the reactions have to be normalized. Reversible reactions are split into forward and back reactions. If necessary, reactions are reversed to conform to the conventional chemical notation.
Appropriate SQL selections can exclude reactions to restrict the network size, e.g. by excluding the most common compounds such as water, oxygen and ATP, by excluding xenobiotics, or by constraining the network to specific pathways or specific organisms.
Using SQL commands, enzyme-substrate-enzyme chains can be built from selected reaction data. Relying on the KEGG database and excluding the most common compounds, 35000 EC-substrate-EC relationships have been built.
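
The preprocessing and chain-building steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for MySQL, and the tables `reaction` and `participant`, their columns, the `_back` suffix, the list of excluded compounds and the three toy reactions are hypothetical, since the poster does not give the real schema. The sketch also drops chains in which an enzyme merely reverses its own reaction, which is a simplification chosen for the example.

```python
import sqlite3

# Hypothetical list of "most common" compounds to exclude.
COMMON = ("H2O", "O2", "ATP", "ADP")

def build_db():
    """Create a toy in-memory database (hypothetical schema and data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE reaction (id TEXT PRIMARY KEY, ec TEXT, reversible INTEGER);
        CREATE TABLE participant (reaction_id TEXT, compound TEXT, side TEXT);
    """)
    con.executemany("INSERT INTO reaction VALUES (?, ?, ?)", [
        ("R1", "2.7.1.1", 0),    # hexokinase, treated as irreversible
        ("R2", "5.3.1.9", 1),    # glucose-6-phosphate isomerase, reversible
        ("R3", "2.7.1.11", 0),   # 6-phosphofructokinase, treated as irreversible
    ])
    con.executemany("INSERT INTO participant VALUES (?, ?, ?)", [
        ("R1", "glucose", "substrate"), ("R1", "ATP", "substrate"),
        ("R1", "G6P", "product"), ("R1", "ADP", "product"),
        ("R2", "G6P", "substrate"), ("R2", "F6P", "product"),
        ("R3", "F6P", "substrate"), ("R3", "ATP", "substrate"),
        ("R3", "FBP", "product"), ("R3", "ADP", "product"),
    ])
    return con

def normalize(con):
    """Add an explicit back reaction for every reversible reaction."""
    con.executescript("""
        INSERT INTO participant
        SELECT p.reaction_id || '_back', p.compound,
               CASE p.side WHEN 'substrate' THEN 'product' ELSE 'substrate' END
        FROM participant p JOIN reaction r ON r.id = p.reaction_id
        WHERE r.reversible = 1;
        INSERT INTO reaction
        SELECT id || '_back', ec, 0 FROM reaction WHERE reversible = 1;
    """)

def ec_substrate_ec(con, excluded=COMMON):
    """Return (EC, compound, EC) triples: the first enzyme produces the
    compound, the second consumes it. Common compounds are filtered out."""
    marks = ",".join("?" for _ in excluded)
    sql = f"""
        SELECT DISTINCT r1.ec, p1.compound, r2.ec
        FROM participant p1
        JOIN participant p2 ON p1.compound = p2.compound
        JOIN reaction r1 ON r1.id = p1.reaction_id
        JOIN reaction r2 ON r2.id = p2.reaction_id
        WHERE p1.side = 'product' AND p2.side = 'substrate'
          AND r1.ec <> r2.ec            -- simplification: skip self-chains
          AND p1.compound NOT IN ({marks})
        ORDER BY r1.ec, p1.compound, r2.ec
    """
    return con.execute(sql, excluded).fetchall()

con = build_db()
normalize(con)
chains = ec_substrate_ec(con)
for chain in chains:
    print(" -> ".join(chain))
```

Keeping the exclusion list as a query parameter mirrors the idea of restricting the network through the selection itself instead of filtering the graph afterwards.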

After normalization, the compounds on the left side of a reaction are taken as the tails and the enzyme as the head of the edges. For the right side, the assignment is inverted. This separation leads to the model of a bipartite graph. Hence, the edges do not have the chemical meaning of reactions from substrates to products, but lead to and from transition complexes formed by the enzyme and the low molecular weight compounds. The enzyme nodes help to avoid some of the problems caused by bimolecular and higher-order reactions.
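
A minimal sketch of this bipartite construction is given below, assuming each normalized reaction is available as a tuple of EC number, substrate list and product list. The tuple format, the function name `build_bipartite` and the toy reactions are assumptions made for illustration; the original system uses LEDA graphs in C++, not the plain dictionaries used here.

```python
from collections import defaultdict

def build_bipartite(reactions):
    """Build a directed bipartite graph from normalized reactions.

    reactions: iterable of (ec, substrates, products) tuples (hypothetical format).
    Returns a dict mapping each node to the set of its successor nodes.
    Nodes are tagged tuples so compound and enzyme names cannot collide.
    """
    succ = defaultdict(set)
    for ec, substrates, products in reactions:
        enzyme = ("enzyme", ec)
        succ[enzyme]  # register the enzyme node even if it has no products
        for name in substrates:
            succ[("compound", name)].add(enzyme)   # left side: compound -> enzyme
        for name in products:
            succ[enzyme].add(("compound", name))   # right side: enzyme -> compound
            succ[("compound", name)]               # register sink compounds too
    return dict(succ)

# Toy input: the reversible isomerase step appears as forward and back reaction.
toy_reactions = [
    ("2.7.1.1", ["glucose", "ATP"], ["G6P", "ADP"]),
    ("5.3.1.9", ["G6P"], ["F6P"]),
    ("5.3.1.9", ["F6P"], ["G6P"]),
]
graph = build_bipartite(toy_reactions)

# Every edge joins a compound node and an enzyme node, never two of a kind.
is_bipartite = all(v[0] != w[0] for v, ws in graph.items() for w in ws)
print(is_bipartite)
```

In this sketch a bimolecular reaction such as the first toy entry needs no special treatment: both substrates simply point to the same enzyme node.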

The described information system was applied in a joint project studying diabetes/MODY.
The consequences of an enzyme dysfunction at the metabolic level could be tolerated by the organism or tissue if the affected enzyme could be bypassed via other reaction pathways.
As an appropriate tool for qualitative analysis, the concept of strongly connected components has been applied to study MODY enzymes in metabolic pathways. As an example, it has been shown for the glycolysis pathway that elimination of some of the MODY enzymes preserves the strong connection, so that alternative reaction chains exist between substrates and products, while loss of others breaks the strong connection. Further refinement of the analysis by shortest-path algorithms revealed the differences in distance and sequence in more detail.
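
The knockout analysis described above can be illustrated with a small sketch. It uses Kosaraju's algorithm for strongly connected components and a breadth-first search for shortest paths; these are standard textbook choices and not necessarily the LEDA routines used in the project. The toy graph with compounds A, B, C and enzymes E1 to E4 is entirely hypothetical and does not model glycolysis or any real MODY enzyme.

```python
from collections import deque

def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps node -> set of successor nodes."""
    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    order, seen = [], set()
    for root in nodes:                      # first pass: record finishing order
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:                           # all successors handled
                order.append(node)
                stack.pop()
    reverse = {n: set() for n in nodes}
    for v, ws in graph.items():
        for w in ws:
            reverse[w].add(v)
    components, assigned = [], set()
    for root in reversed(order):            # second pass on the reversed graph
        if root in assigned:
            continue
        assigned.add(root)
        component, todo = set(), [root]
        while todo:
            v = todo.pop()
            component.add(v)
            for w in reverse[v]:
                if w not in assigned:
                    assigned.add(w)
                    todo.append(w)
        components.append(component)
    return components

def knock_out(graph, removed):
    """Return a copy of the graph without the given node and its edges."""
    return {v: {w for w in ws if w != removed}
            for v, ws in graph.items() if v != removed}

def same_component(graph, a, b):
    """True if a and b lie in the same strongly connected component."""
    return any(a in c and b in c for c in strongly_connected_components(graph))

def shortest_path_length(graph, source, target):
    """Number of edges on a shortest directed path, or None if unreachable."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

# Hypothetical toy network: E1 converts A to B directly, E2 and E4 form a
# detour via C, and E3 is the only route from B back to A.
toy = {"A": {"E1", "E2"}, "E1": {"B"}, "E2": {"C"}, "C": {"E4"},
       "E4": {"B"}, "B": {"E3"}, "E3": {"A"}}

without_e1 = knock_out(toy, "E1")   # bypass exists, but the path gets longer
without_e3 = knock_out(toy, "E3")   # no bypass, strong connection is lost
print(same_component(without_e1, "A", "B"), same_component(without_e3, "A", "B"))
print(shortest_path_length(toy, "A", "B"), shortest_path_length(without_e1, "A", "B"))
```

In the toy network, removing E1 keeps A and B strongly connected at the cost of a longer route, whereas removing E3 separates them, which is the qualitative distinction the analysis aims at.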

Other applications of connected components and strongly connected components concern data evaluation and refinement. An analysis of the large KEGG-derived network mentioned above has shown that most of the enzymes and low molecular weight compounds (more than 3000 nodes) are strongly connected, and almost all of the other components consist of a single node.
Similarly, organism-specific analysis can show how many enzymes have actually been investigated in the context of metabolic subnetworks. This point alone shows the necessity of applying the proposed procedures with biochemical insight and not in a purely formal way.

[1] M. C. Kohn and W. J. Letzkus, J. Theoret. Biol., 100 (1983) 293.
[2] J. v. Helden, A. Naim, R. Mancuso, M. Eldridge, L. Wernisch, D. Gilbert and S. J. Wodak, Biol. Chem., 381 (2000) 921.
[3] I. Schomburg, A. Chang, O. Hofmann, Ch. Ebeling, F. Ehrentreich and D. Schomburg, Trends in Biochemical Sciences, 27 (2002).
[4] M. Kanehisa, S. Goto, S. Kawashima and A. Nakaya, Nucleic Acids Res., 30 (2002) 42.
[5] http://www.mysql.com/
[6] http://www.brenda.uni-koeln.de/
[7] K. Mehlhorn, S. Näher, M. Seel, C. Uhrig: LEDA, Version 4.1 (cf. also http://www.algorithmic-solutions.de/leda.htm)