ECCB 2002 Poster

Title: TRANSFAC® Database on Eukaryotic Gene Transcription Regulation: Innovations and Improvements in Content, Structure and Bioinformatic Tools
P143
Scheer, Maurice P.; Matys, Volker; Fricke, Ellen; Land, Sigrid; Thiele, Susanne; Michael, Holger; Gößling, Ellen; Hornischer, Klaus; Reuter, Ingmar; Kel, Alexander E.; Kel-Margoulis, Olga V.; Wingender, Edgar

mas@biobase.de
BIOBASE GmbH, Halchtersche Str.33, D-38304 Wolfenbüttel, Germany

Transcription regulation is a key process in the realization of genetic information in multicellular organisms. Because of the large volume of results and data emerging in this field, there is a strong need to store, organize and interrelate them. TRANSFAC® is a relational database of eukaryotic transcription factors and the genes they regulate (Wingender et al., 2001). It is distributed in the form of seven text flat files (FACTOR, GENE, SITE, MATRIX, CELL, CLASS, REFERENCE).
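
The flat files are line-oriented. The Python sketch below assumes an EMBL-like record layout: each line starts with a two-letter field code followed by two spaces and the value, "XX" lines are spacers and "//" ends an entry. The function name, the field codes used in the sample and the accession numbers are hypothetical placeholders, and the actual TRANSFAC® file specification should be checked before relying on this.

```python
from collections import defaultdict


def parse_flatfile(text):
    """Split a TRANSFAC-style flat file into a list of records.

    Assumed layout: a two-letter field code, two spaces, then the value;
    'XX' lines are spacers and '//' terminates a record. Values of
    repeated field codes are collected in order.
    """
    records = []
    current = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        code, value = line[:2], line[4:]
        if code == "//":
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        elif code != "XX":
            current[code].append(value)
    if current:  # tolerate a file whose last record lacks '//'
        records.append(dict(current))
    return records


# Hypothetical two-record sample, not real TRANSFAC content.
sample = """AC  X00001
XX
FA  factor A
XX
//
AC  X00002
XX
FA  factor B
//
"""
records = parse_flatfile(sample)
print(len(records), records[0]["AC"], records[1]["FA"])  # 2 ['X00001'] ['factor B']
```

Returning one dictionary per entry keeps repeated fields (such as multiple references or binding sites) as ordered lists.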

The FACTOR table provides structural (e.g. amino acid sequence, DNA-binding domains, differences between isoforms), functional (e.g. regulatory effects) and physico-chemical (e.g. mass) properties of transcription factors, data about their expression, and links to physically interacting factors. In total, it currently holds more than 4800 transcription factors from vertebrates (>3000), invertebrates (>300), plants (>800) and fungi (>450). A new feature of the FACTOR table is the inclusion of direct links from each transcription factor both to its target genes and to the gene encoding it, which makes it possible, for example, to identify autoregulatory loops. Determining the transcriptional activity of a transcription factor requires detailed information about its expression pattern. The FACTOR expression field was therefore expanded using CYTOMER®, a hierarchically organized relational database of gene expression sources at the level of organs, tissues, cell types and developmental stages (Wingender et al., 2001). TRANSFAC® now provides preformatted experimental expression data such as the detection method, the type of molecule detected (e.g. mRNA, protein), the organ and/or developmental stage, and the relative expression level. A controlled vocabulary is used, which makes expression data easily comparable between factors and enables searches for factors expressed in the same organ and/or at the same developmental stage.
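
With direct factor-to-gene links, loop detection reduces to simple lookups. In this minimal sketch the Factor record, its field names and the accessions are hypothetical and do not reflect the TRANSFAC® schema: a factor is treated as autoregulatory if its encoding gene is among its targets, and two factors form a mutual loop if each regulates the gene encoding the other.

```python
from dataclasses import dataclass, field


@dataclass
class Factor:
    accession: str                                  # hypothetical factor ID
    encoding_gene: str                              # gene encoding the factor
    target_genes: set = field(default_factory=set)  # genes it regulates


def autoregulatory(factors):
    """Factors that regulate the gene encoding themselves."""
    return sorted(f.accession for f in factors if f.encoding_gene in f.target_genes)


def mutual_loops(factors):
    """Pairs of distinct factors that each regulate the other's encoding gene."""
    # Assumes one factor per encoding gene; isoforms would need a list here.
    by_gene = {f.encoding_gene: f for f in factors}
    pairs = set()
    for a in factors:
        for gene in a.target_genes:
            b = by_gene.get(gene)
            if b is not None and b is not a and a.encoding_gene in b.target_genes:
                pairs.add(tuple(sorted((a.accession, b.accession))))
    return sorted(pairs)


factors = [
    Factor("F1", "G1", {"G1", "G5"}),  # F1 regulates its own gene
    Factor("F2", "G2", {"G3"}),
    Factor("F3", "G3", {"G2"}),        # F2 and F3 regulate each other
]
print(autoregulatory(factors))  # ['F1']
print(mutual_loops(factors))    # [('F2', 'F3')]
```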
The GENE table gives information about genes, including name, synonyms and a list of the transcription factor binding sites in their different regulatory regions (e.g. promoter, enhancer). The chromosomal location (especially of human genes) is also included. In addition, the GENE table now contains links to the following databases: EMBL, LocusLink, RefSeq, OMIM and TRANSPATH®, a database on signal transduction elements and their cascades (Schacherer et al., 2001). In total, more than 3500 genes are now present in TRANSFAC®.
The SITE table contains DNA-binding sites, each described by its sequence, position, the experimental methods used and the transcription factors that bind it. Links to TRANSPATH® have been introduced here as well. The number of binding sites exceeds 12000. Along with genomic binding sites, TRANSFAC® contains information about oligonucleotides taken, for example, from binding site selection experiments.
The MATRIX table stores nucleotide weight matrices for a number of transcription factors. The matrices were constructed from both in vitro selection studies and compiled genomic binding sites. Their number now reaches almost 600, an increase of about 20% over the last year. Libraries of these matrices are used to predict potential binding sites with the bioinformatic tools Match™ and Patch™, which belong to the TRANSFAC® system (Goessling et al., 2001).
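
To illustrate how compiled binding sites become a matrix, the sketch below counts nucleotides per position in a set of aligned, equal-length sites and converts the counts to relative frequencies. The function names, the toy sites and the optional pseudocount are assumptions for the example; the curation steps actually used for TRANSFAC® matrices are not described here.

```python
BASES = "ACGT"


def count_matrix(sites):
    """Per-position nucleotide counts for aligned, equal-length sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must be aligned to the same length")
    counts = [{b: 0 for b in BASES} for _ in range(width)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1  # only A, C, G, T are handled here
    return counts


def frequency_matrix(counts, pseudocount=0.0):
    """Convert counts to relative frequencies, optionally smoothed."""
    freqs = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        freqs.append({b: (column[b] + pseudocount) / total for b in BASES})
    return freqs


# Toy aligned sites (hypothetical, for illustration only).
sites = ["TGACTCA", "TGAGTCA", "TGACTAA"]
counts = count_matrix(sites)
freqs = frequency_matrix(counts)
print(counts[3])                # {'A': 0, 'C': 2, 'G': 1, 'T': 0}
print(round(freqs[3]["C"], 3))  # 0.667
```

A small pseudocount avoids zero frequencies when only a few sites are available.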

Match™ is a weight matrix-based tool for searching DNA sequences for putative transcription factor binding sites. Users can construct and save their own profiles, i.e. selected subsets of matrices with default or user-defined cut-off values. A public version of Match™ is available at http://www.gene-regulation.com/pub/programs.html.
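
A weight matrix search slides the matrix along the sequence and keeps windows whose score reaches a cut-off. This sketch normalizes scores to [0, 1] by weighting each position with its information content, loosely modeled on the Match™ matrix similarity score; the exact formula, the function names, the toy matrix and the cut-off are assumptions, only the forward strand is scanned, and this is not the Match™ implementation.

```python
import math

BASES = "ACGT"


def information_vector(freqs):
    """I(i) = sum_b f(i,b) * ln(4 * f(i,b)); zero frequencies contribute 0."""
    return [sum(p * math.log(4 * p) for p in col.values() if p > 0) for col in freqs]


def scan(seq, freqs, cutoff):
    """Report (start, window, score) for forward-strand windows with score >= cutoff."""
    info = information_vector(freqs)
    best = sum(w * max(col.values()) for w, col in zip(info, freqs))
    worst = sum(w * min(col.values()) for w, col in zip(info, freqs))
    if best == worst:
        raise ValueError("matrix carries no positional information")
    width = len(freqs)
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        current = sum(info[i] * freqs[i][b] for i, b in enumerate(window))
        score = (current - worst) / (best - worst)  # 1.0 = best possible match
        if score >= cutoff:
            hits.append((start, window, score))
    return hits


# Toy 4-position matrix with consensus GATC (hypothetical values).
freqs = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
for start, window, score in scan("TTGATCAAGTTC", freqs, cutoff=0.9):
    print(start, window, round(score, 3))  # 2 GATC 1.0
```

Lowering the cut-off to 0.7 would also report the single-mismatch window GTTC at position 8 (score 0.75), which is the kind of trade-off a profile's cut-off values control.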
TRANSPLORER™ (TRANScription exPLORER) is a software package with a graphical user interface for analyzing transcription regulatory sequences. It includes Match™ and, for promoter prediction, DPF (Dragon Promoter Finder; http://sdmc.krdl.org.sg/promoter). TRANSPLORER™ offers many filtering options that let users specify which kinds of sites appear in the program output. For example, the output can be restricted to potential binding sites for human factors, for factors of a certain class, or for tissue-specific factors. TRANSPLORER™ can also visualize the feature information of EMBL or GenBank® entries.
Match™ and TRANSPLORER™ come with a number of profiles optimized for particular search tasks. Some profiles minimize the false negative error rate (minFN), others the false positive error rate (minFP). MinFN cut-offs were estimated on the sets of known genomic binding sites from TRANSFAC® and keep the number of missed sites minimal; they are suited to a thorough analysis of short regulatory sequences. MinFP cut-offs are designed for scanning long genomic sequences and report only the best-scoring sites. In addition, several tissue- or regulatory process-specific profiles (muscle, liver, immune cells, cell cycle control) are provided to support the search for potential regulatory elements in genes whose function is partially known.
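
The contrast between minFN and minFP cut-offs can be sketched with two simple rules. Both are assumptions, not the procedure used to build the shipped profiles: a minFN cut-off is taken as the highest score that still retains all but a tolerated fraction of known sites, and a minFP cut-off is placed just above the background score that would otherwise let more than a tolerated fraction of site-free windows pass. The function names, tolerances and score lists are hypothetical; math.nextafter requires Python 3.9 or later.

```python
import math


def min_fn_cutoff(site_scores, max_missed_fraction=0.1):
    """Highest cut-off that still keeps all but the tolerated fraction of known sites.

    A site counts as found when its score is >= the cut-off.
    """
    scores = sorted(site_scores)
    allowed_misses = math.floor(max_missed_fraction * len(scores))
    return scores[allowed_misses]


def min_fp_cutoff(background_scores, max_fp_rate=0.01):
    """Lowest cut-off letting at most max_fp_rate of background windows pass.

    Background windows are assumed to contain no true sites, so every
    window scoring >= the cut-off is counted as a false positive.
    """
    scores = sorted(background_scores, reverse=True)
    allowed_hits = math.floor(max_fp_rate * len(scores))
    if allowed_hits >= len(scores):
        return scores[-1]  # the tolerated rate admits every window
    # Place the cut-off just above the first score that must be rejected.
    return math.nextafter(scores[allowed_hits], math.inf)


# Hypothetical scores of known sites and of site-free background windows.
known = [0.82, 0.85, 0.88, 0.90, 0.91, 0.93, 0.95, 0.96, 0.98, 1.00]
background = [i / 1000 for i in range(1000)]
fn = min_fn_cutoff(known, 0.1)
fp = min_fp_cutoff(background, 0.01)
print(fn)                                  # 0.85
print(sum(s >= fp for s in background))    # 10
```

Under these rules a minFN cut-off is anchored to the weakest known sites, so it tends to admit more predictions, whereas a minFP cut-off is anchored to the background and admits only the top-scoring windows.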
[1] Wingender E., Chen X., Fricke E., Geffers R., Hehl R., Liebich I., Krull M., Matys V., Michael H., Ohnhäuser R., Prüß M., Schacherer F., Thiele S., Urbach S.: The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29: 281-283 (2001).
[2] Schacherer F., Choi C., Götze U., Krull M., Pistor S., Wingender E.: The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics 17: 1053-1057 (2001).
[3] Goessling E., Kel-Margoulis O.V., Kel A.E., Wingender E.: Match™ - a tool for searching transcription factor binding sites in DNA sequences. Application for the analysis of human chromosomes. Proceedings of the German Conference on Bioinformatics (GCB 2001). E. Wingender, R. Hofestädt, I. Liebich (eds.). GBF-Braunschweig and University of Bielefeld, pp. 158-161 (2001).