ECCB 2002 Poster

Title: MotifScanner: a novel probabilistic approach to screen DNA sequences for predefined regulatory elements
P165
Thijs, Gert; Aerts, Stein; Moreau, Yves; Marchal, Kathleen; De Moor, Bart

gert.thijs@esat.kuleuven.ac.be
ESAT-SCD, K.U.Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium

Deciphering the mechanisms of transcriptional regulation is still a largely unsolved problem. Recently, the application of high-throughput expression profiling techniques has allowed researchers to study the transcriptome in detail. Many algorithms have been developed to identify and study sets of co-expressed genes. One important aspect of this analysis is the localization of potential transcription factor binding sites in the promoter regions of the genes at hand.
To this end, we have designed and implemented a method based on a probabilistic sequence model to screen upstream sequences with precompiled motif models. The proposed method is based on the sequence model that we used in our Gibbs sampling method for motif detection (1). This model states that the binding sites are hidden in a noisy background sequence. We use this probabilistic model to estimate the number of instances of a motif in a sequence, given the background model and the motif model. Apart from the motif model and the background model, the algorithm has one important parameter: the prior probability of finding one copy of a motif instance. If this prior is set to a value close to zero, only instances that match the matrix model very closely are retained. When the prior is increased, more degenerate instances are also retained as potential binding sites. The prior can thus be seen as a threshold on the quality of the instances.
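The role of the prior can be illustrated with a minimal sketch. It is not the MotifScanner implementation: it assumes a zero-order (single-nucleotide) background, a made-up four-column motif matrix, and a simple per-position posterior with a hypothetical cutoff of 0.5. The names `MOTIF`, `BACKGROUND`, `site_posterior` and `scan` are illustrative only.

```python
# Hypothetical motif model: one dict of base probabilities per motif position.
MOTIF = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

# Assumed zero-order background: independent, uniform base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def site_posterior(segment, motif, background, prior):
    """Posterior probability that `segment` is a motif instance, not background."""
    p_motif = 1.0
    p_background = 1.0
    for base, column in zip(segment, motif):
        p_motif *= column[base]
        p_background *= background[base]
    weighted_motif = prior * p_motif
    return weighted_motif / (weighted_motif + (1.0 - prior) * p_background)


def scan(sequence, motif, background, prior, cutoff=0.5):
    """Return (start, segment, posterior) for every window above the cutoff."""
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        posterior = site_posterior(segment, motif, background, prior)
        if posterior > cutoff:
            hits.append((start, segment, posterior))
    return hits


sequence = "TTACGTTTACGATT"
strict_hits = scan(sequence, MOTIF, BACKGROUND, prior=0.05)
loose_hits = scan(sequence, MOTIF, BACKGROUND, prior=0.2)
```

In this toy setting the low prior keeps only the exact match `ACGT`, while the higher prior also keeps the one-mismatch instance `ACGA`, which mirrors the threshold-like behaviour described above.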
Motif models can be constructed from known motif instances, which are stored in specialized databases like TransFac (2), PlantCARE (3) and SCPD (4). The motif models can also be the result of a motif finding algorithm like our MotifSampler (1). Our results show that the quality of the motif model has a profound impact on the detection process.
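As a rough illustration of how a motif model can be derived from known instances, the sketch below builds a position-specific probability matrix from aligned sites of equal length. The pseudocount value, the function name `build_motif_model` and the example sites are assumptions made for this example; they are not taken from the databases or tools cited above.

```python
def build_motif_model(instances, pseudocount=1.0, alphabet="ACGT"):
    """Estimate per-position base probabilities from aligned binding sites."""
    width = len(instances[0])
    if any(len(site) != width for site in instances):
        raise ValueError("all motif instances must have the same length")

    model = []
    for position in range(width):
        # Start every base at the pseudocount so no probability is exactly zero.
        counts = {base: pseudocount for base in alphabet}
        for site in instances:
            counts[site[position]] += 1
        total = sum(counts.values())
        model.append({base: counts[base] / total for base in alphabet})
    return model


# Made-up aligned instances, for illustration only.
known_sites = ["ACGT", "ACGA", "ACGT"]
motif_model = build_motif_model(known_sites)
```

With few instances the pseudocount dominates the estimate, which is one simple way to see why the quality of the motif model matters for the detection step.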
We have extensively tested our program by screening promoter sequences of several organisms with motif models extracted from different sources (2,3,4). This allowed us to estimate the expected number of potential binding sites throughout a genome. This expected frequency was used to test the significance of finding a specific binding factor in a set of coregulated genes. From these screenings, histograms of the positions of the motif instances relative to the transcription or translation start were computed. These histograms clearly show that some motifs have a preferential distance to the start site, while others tend to bind everywhere.
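The abstract does not state which statistical test was used. As one possible reading, the sketch below assumes a binomial model: if a fraction `p` of promoters genome-wide contains the motif, it computes the probability of seeing at least `k` such promoters in a set of `n` coregulated genes. A second helper bins hit positions relative to the start site. Function names, bin width and all numbers are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more hits in n genes."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def position_histogram(positions, bin_width=50):
    """Count hits per bin; positions are relative to the start site (negative = upstream)."""
    return Counter((pos // bin_width) * bin_width for pos in positions)


# Illustrative numbers: 8 of 20 coregulated promoters contain the motif,
# while the assumed genome-wide frequency is 10% of promoters.
p_value = binomial_tail(8, 20, 0.10)

# Illustrative hit positions relative to the start site.
histogram = position_histogram([-120, -110, -480, -30])
```

A small tail probability suggests the motif is over-represented in the gene set relative to the genome-wide expectation, and a histogram concentrated in a few bins would correspond to a motif with a preferential distance to the start site.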
The program is currently accessible through a web interface (see URL), and a standalone Linux executable can be downloaded from our site. MotifScanner can also be used from within TOUCAN, our tool for the analysis of coregulated genes (5).
[1] Thijs et al. 2002, J.Comp.Biol., 9(2), 447-464.
[2] Wingender et al. 2000, Nucleic Acids Res, 28, 316-319.
[3] Lescot et al. 2002, Nucleic Acids Res, 30(1), 325-327.
[4] Zhu and Zhang 1999, Bioinformatics, 15(7/8), 607-611.
[5] Toucan, http://www.esat.kuleuven.ac.be/~saerts/software/toucan.html