ECCB 2002 Poster

Title: Analysis of topological representations of transcriptional regulatory regions
P140
Sand, Olivier; Vu, Tien Dung; Gilbert, David; Viksna, Juris

osand@brc.dcs.gla.ac.uk
Bioinformatics Research Centre, Department of Computer Science, University of Glasgow

Transcriptional regulatory regions (TRR) of genes play an essential role in genomes, since they mediate the selective expression of proteins in response to the availability of metabolites in the external medium, the developmental stage, the presence of a stress, etc. The ability of a sequence to regulate the level of transcription of a neighbouring gene is due to the action of very short segments that are specifically recognised by transcription factors. In higher organisms, regulatory sites tend to aggregate in so-called composite elements (CE), i.e. DNA regions where several transcription factors bind simultaneously and interact either synergistically or competitively, contributing to a highly specific pattern of gene transcriptional regulation (Kel et al., 1995).
We have developed a topological representation of transcriptional regulatory regions (TREGS), which provides a high-level abstraction that we use for pattern matching and pattern discovery. This representation takes the form of a context-sensitive grammar describing binding sites and their associated factors, as well as interactions between bound factors. We have generalized this representation to permit the definition of patterns over TREGS, in a manner similar to the one we developed for protein topology (Gilbert et al., 1999). These patterns are defined in terms of regulatory element aggregations rather than nucleotide sequences.
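As a rough illustration of the kind of information such a representation carries, the following Python sketch models a regulatory region as a list of binding sites plus a set of pairwise interactions between bound factors. It is not the grammar used in this work: the class names, field names, coordinate convention and the example gene "geneX" with factors "A", "B" and "C" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BindingSite:
    start: int   # position of the site (coordinate convention is an assumption)
    end: int
    factor: str  # name of the transcription factor bound at this site


@dataclass
class Tregs:
    gene: str
    sites: list = field(default_factory=list)
    # Each interaction is (site_index_a, site_index_b, kind), where kind is
    # a free-text label such as "synergistic" or "competitive" (hypothetical).
    interactions: set = field(default_factory=set)

    def factor_sequence(self):
        """Factor names ordered by site start position."""
        return [s.factor for s in sorted(self.sites, key=lambda s: s.start)]

    def overlapping_pairs(self):
        """Index pairs (i, j), i < j, of sites whose intervals overlap."""
        pairs = []
        for i, a in enumerate(self.sites):
            for j in range(i + 1, len(self.sites)):
                b = self.sites[j]
                if a.start <= b.end and b.start <= a.end:
                    pairs.append((i, j))
        return pairs


# Hypothetical region with three sites; the first two overlap and interact.
example = Tregs(
    gene="geneX",
    sites=[BindingSite(10, 20, "A"), BindingSite(15, 25, "B"), BindingSite(40, 50, "C")],
    interactions={(0, 1, "synergistic")},
)
```

Working at this level, two regions can be compared by the factors they bind and how those factors interact, without reference to the underlying nucleotide sequence.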
We have collected information on regulatory regions from publicly available databases: COMPEL (http://compel.bionet.nsc.ru/), TRRD (http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/trrdintro.html) and TRANSFAC (http://transfac.gbf.de/TRANSFAC/index.html), and stored it in a relational database implemented in MySQL. The data have been cleaned to remove synonyms, alternative spellings and typographical errors, using a program that we developed on top of our FURY programming system (Gilbert and Schroeder, 2000). In addition, we have developed a method to automatically generate a graphical representation ("cartoon") of a TREGS.
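The cleaning step relies on approximate matching of names. The sketch below is not FURY itself. It only illustrates the underlying idea of using edit distance to map spelling variants of a name onto one canonical form. The function names, the threshold `max_dist` and the example names are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def canonicalise(names, max_dist=2):
    """Map each name to the first previously seen name within max_dist edits.

    Comparison is case-insensitive. max_dist is a hypothetical threshold.
    """
    canonical = []
    mapping = {}
    for name in names:
        for c in canonical:
            if edit_distance(name.lower(), c.lower()) <= max_dist:
                mapping[name] = c
                break
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping


# Illustrative spelling variants; the first occurrence becomes the canonical form.
variants = ["NF-kappaB", "NF-kappa B", "NFkappaB", "Sp1"]
mapping = canonicalise(variants)
```

A fixed threshold is crude: short names that differ by one character (for example two distinct factors whose names differ only in a trailing digit) would be merged, so a real cleaning step would need length-aware thresholds or manual review.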
We are now developing a TREGS pattern database. To start, we have compiled a set of patterns by hand, and stored them in that database, together with an automatically generated cartoon for each pattern, annotated with the TREGS it describes.
We are now designing methods to automatically discover TREGS patterns from TRR data. Two approaches are considered. The first is sequence based (Brazma et al., 1998). It focuses on a subset of our pattern language, comprising regular expressions over TRR binding element sequences, and uses a dynamic programming algorithm to find the longest common subsequence (LCS) of each pair of sequences (Pevzner, 2000). The second is graph based. It uses the Bron-Kerbosch algorithm (Bron and Kerbosch, 1973) to find the maximal clique of the edge product graph of each pair of instance graphs.
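The two building blocks named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: sequences are assumed to be lists of binding element labels, the graph is assumed to be given as an adjacency dictionary, the Bron-Kerbosch variant shown is the basic one without pivoting, and the construction of the edge product graph from two instance graphs is omitted.

```python
def lcs(a, b):
    """Longest common subsequence of two label sequences (dynamic programming)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Trace back from the bottom-right corner to recover one LCS.
    out = []
    i, j = n, m
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def bron_kerbosch(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of an undirected graph.

    adj maps each vertex to the set of its neighbours.
    r is the clique built so far, p the candidates, x the excluded vertices.
    """
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}


# Hypothetical inputs: two sequences of binding element labels and a small graph.
common = lcs(["A", "B", "C", "D"], ["B", "X", "D"])
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = set(bron_kerbosch(graph))
largest = max(cliques, key=len)
```

When the graph passed in is the product graph of two instance graphs, the largest clique found corresponds to the largest substructure the two instances share, which is what makes clique enumeration usable for pattern discovery.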
We have grouped the 607 genes in our database by performing an all-against-all pairwise pattern discovery over the TREGS and hierarchically clustering them using the OC program (Barton, 1993). The clustering is done according to a similarity measure and means linkage analysis. We then generated unique non-null TREGS patterns for the clusters and computed their compression. We have evaluated the goodness of those patterns with reference to the entire TREGS database. Our initial results have shown that:
(i) Many of our groups of genes are associated with 'good' (characteristic) patterns.
(ii) The overlapping nature of transcription factor binding sites has very little effect on the goodness of the patterns discovered.
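The grouping step described above (pairwise similarities followed by hierarchical clustering with means linkage) can be sketched in plain Python. This is not the OC program and makes no claim about its options or output. The function name, the stopping threshold and the toy similarity scores for genes "g1" to "g4" are hypothetical.

```python
def average_linkage(items, sim, threshold):
    """Agglomerative clustering on a similarity function with mean linkage.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity, stopping when that value falls below threshold.
    """
    clusters = [[x] for x in items]

    def link(c1, c2):
        # Mean similarity over all cross-cluster pairs.
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, bi, bj = max(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[bi] = clusters[bi] + clusters[bj]
        del clusters[bj]
    return clusters


# Toy pairwise scores (hypothetical); unlisted pairs default to 0.1.
scores = {("g1", "g2"): 0.9, ("g3", "g4"): 0.8}


def toy_sim(a, b):
    return scores.get((a, b), scores.get((b, a), 0.1))


groups = average_linkage(["g1", "g2", "g3", "g4"], toy_sim, threshold=0.5)
```

In practice the similarity between two genes would come from the pattern discovered for that pair, for example the size of the common pattern relative to the size of the two TREGS, which is an assumption here rather than the measure used in this work.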
We are now working on evaluating the relationship between the discovered TREGS patterns and the expression profiles of the genes sharing the patterns.
[1] Barton, G. J. (1993) OC - A cluster analysis program, [http://www.compbio.dundee.ac.uk/Software/OC/oc.html].
[2] Brazma, A., Jonassen, I., Eidhammer, I. and Gilbert, D. R. (1998) Approaches to the automatic discovery of patterns in biosequences. Journal of Computational Biology. 5:2, 277-303.
[3] Bron, C., and Kerbosch, J. (1973) Algorithm 457 - finding all cliques of an undirected graph. Commun. ACM. 16, 575-577.
[4] Gilbert, D. R. and Schroeder, M. (2000) FURY: Fuzzy unification and resolution based on edit distance. BIBE 2000: IEEE International Symposium on Bio-Informatics and Biomedical Engineering, November 8-10, 2000.
[5] Gilbert, D. R., Westhead, D. R., Nagano, N. and Thornton, J. M. (1999) Motif-based searching in TOPS protein topology databases. Bioinformatics. 15:4, 317-326.
[6] Kel, O. V., Romaschenko, A. G., Kel, A. E., Wingender, E., and Kolchanov, N. A. (1995) A compilation of composite regulatory elements affecting gene transcription in vertebrates. Nucleic Acids Res. 23, 4097-4103.
[7] Pevzner, P. A. (2000) Computational molecular biology: an algorithmic approach. In Istrail, S., Pevzner, P., and Waterman, M. (Eds). MIT Press, Cambridge, Massachusetts, 96-97.