ECCB 2002 Poster

Title: Computational analysis of metabolic pathways with the program MING
P4
Aguilar, Daniel; Oliva, Baldomero; Aviles, Francesc-Xavier; Querol, Enrique

daguilar@ibb.uab.es
Institut de Biotecnologia i Biomedicina at Universitat Autonoma de Barcelona, Spain

INTRODUCTION: Genome sequencing projects and new experimental methods are producing enormous amounts of data. In an effort to handle the growing knowledge on biochemical pathways, some pathway databases have been made publicly available (Wittig & De Beuckelaer, 2001). Moreover, the availability of computerized simulations of biochemical pathways opens new opportunities for developing tools to characterize and understand the interactome and the metabolome. Since the interpretation and explanation of these metabolic data constitute a major challenge, tools for querying databases and for modeling and visualizing metabolic pathways and regulatory networks have been developed. These tools use different modeling approaches, including structural pathway synthesis, stoichiometric pathway analysis, metabolic flux analysis and metabolic control analysis, among others (Wiechert, 2002). Our aim was to develop a tool to automatically integrate, generate and visualize interaction pathways from the currently available data. The program, called MING (Metabolic Interacting Nets Generator), will soon be publicly available through a web-based interface.
METHODS: We previously built a database by cross-checking the information from databases related to metabolism (i.e. LIGAND, KEGG, BRENDA) (Goto et al, 2002; Kanehisa & Goto, 2000; Schomburg et al, 2002) and to protein interaction (i.e. DIP) (Xenarios et al, 2002) for 15 organisms (10 eukaryotes, 4 bacteria and 1 archaeon). However, these databases often suffer from organization and nomenclature problems and from a lack of experimental data. Therefore, an initial round of data mining was needed to extract the information, and several subsequent rounds were necessary to ensure the coherence of the data. Our pathway prediction approach is based on the concept of graphs; for example, the interactome is a graph with proteins as nodes, and their relationships (i.e. a metabolite or a physical protein interaction) are the directed edges (arrows) linking pairs of nodes. Given a pathway length and other constraints added by the user (e.g. protein names, specific enzymatic functions or metabolites, among others), a recursive algorithm on the connectivity matrix representation of the graph finds the pathways fulfilling the user's constraints. As a result, the user is provided with multiple fully interactome-oriented pathways, which the program can cluster according to different user-defined parameters. Moreover, the program can phylogenetically compare the predicted pathways with those of different organisms using different comparison criteria. Therefore, the program can also be seen as a useful tool for phylogenetic analysis.
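As an illustration of the graph search described in METHODS, the following is a minimal sketch of a recursive enumeration of fixed-length pathways over a connectivity (adjacency) matrix. It is not the MING implementation: the function name `find_pathways`, its parameters, the rule that a protein may appear only once per pathway, and the four-protein toy matrix are all assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List, Optional, Sequence


def find_pathways(
    adjacency: Sequence[Sequence[int]],
    length: int,
    required: Iterable[int] = (),
    start: Optional[int] = None,
) -> List[List[int]]:
    """Enumerate directed pathways with exactly `length` edges.

    adjacency[i][j] is nonzero when there is an arrow from node i to node j.
    `required` lists node indices that every reported pathway must contain
    (a hypothetical stand-in for a user constraint such as a protein name).
    Assumption: each node may be visited at most once per pathway.
    """
    n = len(adjacency)
    must_have: FrozenSet[int] = frozenset(required)
    results: List[List[int]] = []

    def extend(path: List[int], visited: set) -> None:
        # A path with k nodes has k - 1 edges; stop at the requested length.
        if len(path) - 1 == length:
            if must_have <= visited:
                results.append(list(path))
            return
        current = path[-1]
        for nxt in range(n):
            if adjacency[current][nxt] and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                extend(path, visited)  # recurse one step deeper
                visited.remove(nxt)    # backtrack
                path.pop()

    starts = range(n) if start is None else [start]
    for s in starts:
        extend([s], {s})
    return results


# Toy interactome (hypothetical): A -> B, A -> C, B -> C, C -> D.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

# All pathways of two steps that contain protein D (index 3).
for pathway in find_pathways(matrix, length=2, required=[3]):
    print(" -> ".join(names[i] for i in pathway))
```

On this toy matrix the search reports the two-step pathways A -> C -> D and B -> C -> D. Forbidding repeated nodes keeps the recursion finite on cyclic graphs. A real tool would also need to handle edge labels (metabolite versus physical interaction) and the other constraint types mentioned above.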
RESULTS: In general, model validation turns out to be particularly difficult for metabolic systems, no matter which modeling system has been used (Wiechert, 2002). The power of our program strongly depends on the simplifications made, the data sources used and the user's constraints. The algorithm is designed to err on the side of more false positives in order to bring more potential pathways to the user's attention, meaning that all possible pathways within the oriented graph that fulfill the user's constraints are clustered and visualized. To determine the accuracy of our predicted interaction pathways, we tried to predict some well-known pathways. Out of ten pathways tested, seven of the computationally predicted pathways were consistent with the literature and three were not found due to the lack of information for one (or more) of the steps in the source databases. Some additional pathways were found that had not been described in the literature in spite of being biochemically correct. This may be because they are still unknown or because they do not take place in the query organism. All pathways could subsequently be aligned to similar pathways in different organisms. Although agreement between predicted and real pathways was good, two points should be taken into account. The first is that the lack of complete information on a given organism will always be a cause of false negatives. The second is that a false positive (meaning a predicted pathway not described in the literature) may still be a biologically valid pathway under certain conditions or in certain organisms, since there is often no single biologically optimal solution.
[1] Goto S, Okuno Y, Hattori M, Nishioka T, Kanehisa M. LIGAND: database of chemical compounds and reactions in biological pathways. Nucleic Acids Res. 2002 Jan 1;30(1):402-4.
[2] Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000 Jan 1;28(1):27-30.
[3] Schomburg I, Chang A, Hofmann O, Ebeling C, Ehrentreich F, Schomburg D. BRENDA: a resource for enzyme data and metabolic information. Trends Biochem Sci 2002 Jan;27(1):54-6.
[4] Wiechert W. Modeling and simulation: tools for metabolic engineering. J Biotechnol 2002 Mar 14;94(1):37-63.
[5] Wittig U, De Beuckelaer A. Analysis and comparison of metabolic pathway databases. Brief Bioinform 2001 May;2(2):126-42.
[6] Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002 Jan 1;30(1):303-5.