Karopka, Thomas; Scheel, Thomas; Bansemer, Sven; Glass, Änne - Automatic construction of genetic networks using artificial neural networks and natural language processing

ECCB 2002 Poster

Title: Automatic construction of genetic networks using artificial neural networks and natural language processing (P76)
Authors: Karopka, Thomas; Scheel, Thomas; Bansemer, Sven; Glass, Änne
E-mail: thomas.karopka@medizin.uni-rostock.de, thomas.scheel2@medizin.uni-rostock.de, sven.bansemer@medizin.uni-rostock.de, aenne.glass@medizin.uni-rostock.de
Affiliation: University of Rostock, Faculty of Medicine, Institute for Medical Informatics and Biometry

Large-scale genomic analysis such as microarray expression profiling has produced huge amounts of data, yet there is no standard way of analyzing and interpreting these data. Several methods have been proposed in the literature, all intended to explore the relationships in the data [1],[2],[3],[4]. However, their results are only useful if they can be related to existing knowledge. Recently, there has been growing interest in applying information extraction to molecular biology in order to integrate existing knowledge into the analysis process [5],[6],[7].
One promising approach to microarray expression data analysis is to combine computational analysis methods such as artificial intelligence (AI) with methods of information extraction (IE).

One way to characterize diseases at the molecular level is to use causal genetic networks as models [8]. In our project we generate these causal genetic networks automatically from microarray gene expression data using artificial neural networks (ANNs) and natural language processing (NLP) techniques. The data to be analyzed come from microarray gene expression experiments performed with Affymetrix GeneChips(TM). These data are fed through an ART1 ANN. The purpose of this processing step is to identify subsets of genes that represent typical patterns of gene expression in a set of experiments carried out under the same conditions, in particular the type and/or state of disease, tissue, organ and organism. In a second phase we need to know the causal relationships among the genes of this classified subset in order to generate causal genetic networks.
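The clustering step can be illustrated with a minimal sketch of a fast-learning ART1 network. This is not the authors' implementation: ART1 operates on binary vectors, and the abstract does not say how the expression values were binarized, so the `binarize` helper, its threshold, the vigilance value and the single-pass presentation are all assumptions made for illustration. Each input pattern is assumed to be one gene's binarized expression profile across the experiments.

```python
from typing import List, Sequence


def binarize(profile: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical preprocessing: 1 if expression exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in profile]


def art1_cluster(patterns: Sequence[Sequence[int]],
                 vigilance: float = 0.6,
                 beta: float = 1.0) -> List[int]:
    """Assign each binary pattern to a category with a simplified ART1.

    Single pass, fast learning: a category prototype is replaced by the
    logical AND of itself and every pattern it accepts.
    """
    prototypes: List[List[int]] = []
    labels: List[int] = []

    for pattern in patterns:
        norm = sum(pattern)

        def overlap(j: int) -> int:
            # Number of positions where pattern and prototype j are both 1.
            return sum(p & w for p, w in zip(pattern, prototypes[j]))

        # Rank existing categories by the ART1 choice function.
        order = sorted(range(len(prototypes)),
                       key=lambda j: overlap(j) / (beta + sum(prototypes[j])),
                       reverse=True)

        winner = None
        for j in order:
            # Vigilance test: fraction of the input covered by the prototype.
            if norm > 0 and overlap(j) / norm >= vigilance:
                prototypes[j] = [p & w for p, w in zip(pattern, prototypes[j])]
                winner = j
                break

        if winner is None:
            # No category passed the vigilance test: open a new one.
            prototypes.append(list(pattern))
            winner = len(prototypes) - 1

        labels.append(winner)

    return labels
```

A higher vigilance value makes the network stricter and yields more, smaller gene subsets; a lower value merges more genes into the same category.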
One of the richest sources of biomedical knowledge is the PubMed database, which currently contains more than 12 million articles on biomedical subjects. However, this knowledge is expressed in unstructured text, so it can only be mined with sophisticated text mining techniques. Our system contains a central database that stores the microarray expression data (the experimental results) together with a list of gene names and gene synonyms. The synonym list was gathered automatically from online databases such as GeneCards [9], HGNC [10] and OMIM [11]. A second list of domain-specific words that express a relationship between genes is combined with the gene list to generate queries to the PubMed database.
The relevant abstracts are downloaded and analyzed using information extraction methods. The text analysis is performed with the GATE 2 IE system developed at the University of Sheffield [12]. We have developed special grammars to identify the relationship between two genes. In our context, high precision matters more than high recall as a performance measure for the IE algorithm: we assume that an important relationship is described in several abstracts, and we need to find it only once. The grammar rules were therefore developed with a focus on high precision. The identified information is filled into templates and then converted to database entries.
The result of our hybrid intelligent system, which combines methods of AI and IE, is a subset of genes typical for a disease together with the causal relations between them. This is the information we need to model causal genetic networks. A visualization tool presents the networks to the user. In addition, the experimental results can be visualized in several views, including a 3-D view of the microarray data.
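The query-generation step described above, in which gene names and synonyms are combined with relationship words, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the actual query format, so the Boolean structure, the restriction to PubMed's `[tiab]` (title/abstract) field tag, and the function names `or_group` and `pair_queries` are hypothetical. The gene names in the usage note are placeholders, not data from the project.

```python
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Tuple


def or_group(terms: Iterable[str]) -> str:
    """Join terms into a parenthesised OR group of quoted title/abstract phrases."""
    quoted = [f'"{term}"[tiab]' for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def pair_queries(synonyms: Dict[str, List[str]],
                 relation_words: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (gene_a, gene_b, query) for every unordered pair of genes.

    Each query asks for abstracts that mention any name of gene_a, any name
    of gene_b, and at least one relationship word.
    """
    relations = or_group(relation_words)
    for gene_a, gene_b in combinations(sorted(synonyms), 2):
        names_a = or_group([gene_a, *synonyms[gene_a]])
        names_b = or_group([gene_b, *synonyms[gene_b]])
        yield gene_a, gene_b, f"{names_a} AND {names_b} AND {relations}"
```

For example, `list(pair_queries({"GENE_A": ["GA1"], "GENE_B": []}, ["activates", "inhibits"]))` produces one query for the pair GENE_A and GENE_B. Generating one query per gene pair keeps each downloaded abstract tied to the pair it is supposed to describe, which suits the precision-oriented extraction step.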
At the current stage of the project we have implemented the database, the visualization tool, the AI module and the NLP module as components of our hybrid system. To evaluate these components, we have semi-automatically constructed a causal genetic network for NF-kappaB interactions as an immune response in multiple sclerosis (MS) and rheumatoid arthritis (RA), as well as networks for Drosophila and sea urchin. First results for the AI module and the NLP module are available for microarray experimental data. To generate causal genetic networks from experimental data, we still have to combine these results and compare them with the evaluation data.

In future work we would like to apply AI methods such as case-based reasoning (CBR) to evaluate the generated causal networks for characterizing diseases, and thus provide a bioinformatics approach to analyzing huge amounts of microarray gene expression data.
Our work is part of the project "Genomorientierte Klinische Forschung" and is funded by the BMBF (FKZ:01ZZ0108,01GG9831).

[1] Quackenbush J (2001) Computational Analysis of Microarray Data. Nature Reviews Genetics 2:418-427.
[2] Ramaswamy S and Golub T R (2002) DNA Microarrays in Clinical Oncology. Journal of Clinical Oncology 20:1932-1941.
[3] Golub T R et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531-537.
[4] Tamayo P et al. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. PNAS USA 96:2907-2912.
[5] Blaschke C, Hirschman L and Valencia A (2002) Information extraction in molecular biology. Briefings in Bioinformatics 3:154-165.
[6] Jenssen T K, Laegreid A, Komorowski J and Hovig E (2001) A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics 28:21-28.
[7] Friedman C, Kra P, Yu H, Krauthammer M and Rzhetsky A (2001) GENIES: A natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 17 Suppl. 1:S74-S82.
[8] Glass Ä, Karopka T (2002) Genomic Data Explosion - the Challenge for Bioinformatics? In: Advances in Data Mining; Applications in E-Commerce, Medicine and Knowledge Management. Perner P (ed.). Springer-Verlag, Berlin. ISBN 3-540-44116-6. Vol 2394 (in press).
[9] Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D (1997) GeneCards: encyclopedia for genes, proteins and diseases. Weizmann Institute of Science, Bioinformatics Unit and Genome Center (Rehovot, Israel). World Wide Web URL: http://bioinformatics.weizmann.ac.il/cards
[10] HUGO Human Gene Nomenclature Committee. World Wide Web URL: http://www.gene.ucl.ac.uk/nomenclature/
[11] Online Mendelian Inheritance in Man - OMIM (TM). McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD), 2000. World Wide Web URL: http://www.ncbi.nlm.nih.gov/omim/
[12] GATE. URL: http://gate.ac.uk