ECCB 2002 Poster

Title: Biclustering microarray data by Gibbs sampling
P108
Moreau, Yves; Sheng, Qizheng; De Moor, Bart

moreau@esat.kuleuven.ac.be
Department of Electrical Engineering, Katholieke Universiteit Leuven

Biclustering is an important challenge in the analysis of microarray data [1,2,3]. Standard clustering techniques such as hierarchical clustering, K-means, or self-organizing maps group objects (here, genes) on the basis of all the variables (here, experiments); biclustering instead groups them on the basis of only a subset of the variables. A key application of biclustering is the mining of large heterogeneous compendia [4] of microarray experiments. Because such compendia are collections of many smaller experiments that are not necessarily strongly related, a pattern of correlation between genes may involve only a limited subset of the experiments.
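
To make the distinction concrete, the following minimal Python sketch represents a bicluster as a pair of index sets (genes, experiments) over a toy discretized matrix. The matrix, the three-level coding (-1, 0, 1) and the helper names `submatrix` and `columns_agree` are hypothetical illustrations. The criterion "every selected gene has the same level in each selected experiment" is only one simple notion of coherence, not necessarily the one used in this work.

```python
# Toy discretized expression matrix (hypothetical): rows are genes,
# columns are experiments; -1 = down, 0 = unchanged, 1 = up.
DATA = [
    [1, -1, 0, 1, 0],    # gene 0
    [1, -1, 1, -1, 0],   # gene 1
    [1, -1, -1, 0, 1],   # gene 2
    [0, 1, 0, 1, -1],    # gene 3
]


def submatrix(data, rows, cols):
    """Restrict the matrix to a subset of genes (rows) and experiments (cols)."""
    return [[data[r][c] for c in cols] for r in rows]


def columns_agree(block):
    """True if, within each column of the block, all genes share one level."""
    return all(len({row[j] for row in block}) == 1 for j in range(len(block[0])))


genes = [0, 1, 2]
print(columns_agree(submatrix(DATA, genes, [0, 1])))    # True: coherent on experiments 0-1
print(columns_agree(submatrix(DATA, genes, range(5))))  # False: not coherent on all experiments
```

Compared over their full profiles, genes 0 to 2 look quite different; restricted to experiments 0 and 1, they behave identically. This is the kind of local pattern that clustering on all experiments can miss.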

We present a biclustering strategy based on Gibbs sampling. Gibbs sampling has become a method of choice for discovering noisy patterns in DNA and protein sequence data [5] thanks to its high sensitivity. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data. The algorithm samples subsets of genes and experiments so as to identify co-expression patterns that have a high probability under the model. A key advantage of our approach is that the Gibbs sampling setup offers a transparent interpretation of the biclusters in terms of a simple probabilistic model. Gibbs sampling is also less prone than Expectation-Maximization to becoming trapped in local optima.
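
The sampling idea can be illustrated with a simplified collapsed Gibbs sampler. The Python below is a sketch under stated assumptions, not the authors' implementation. It assumes that the data are already discretized into integer levels 0..K-1 and that a single bicluster is sought. Cells outside the bicluster are assumed to follow a fixed per-experiment background distribution estimated from the whole column. Each selected experiment is assumed to have its own level distribution with a symmetric Dirichlet prior that is integrated out. The function names and the parameters `alpha`, `p_gene` and `p_exp` (prior inclusion probabilities) are hypothetical. Each gene, and then each experiment, is resampled in or out of the bicluster from its conditional probability given all the other memberships.

```python
import math
import random
from collections import Counter


def background(data, n_levels):
    """Per-experiment level frequencies over all genes, with add-one smoothing."""
    n_genes, n_exps = len(data), len(data[0])
    bg = []
    for e in range(n_exps):
        counts = Counter(data[g][e] for g in range(n_genes))
        bg.append([(counts[k] + 1) / (n_genes + n_levels) for k in range(n_levels)])
    return bg


def log_dirmult(counts, alpha):
    """Log marginal likelihood of level counts under a symmetric Dirichlet(alpha)."""
    n, k = sum(counts), len(counts)
    out = math.lgamma(k * alpha) - math.lgamma(n + k * alpha)
    return out + sum(math.lgamma(c + alpha) - math.lgamma(alpha) for c in counts)


def sample_bernoulli(log_on, log_off, rng):
    """Return True with probability exp(log_on) / (exp(log_on) + exp(log_off))."""
    m = max(log_on, log_off)
    w_on, w_off = math.exp(log_on - m), math.exp(log_off - m)
    return rng.random() < w_on / (w_on + w_off)


def gibbs_bicluster(data, n_levels, n_iter=200, alpha=1.0,
                    p_gene=0.1, p_exp=0.5, seed=0):
    """Sample one bicluster (gene set, experiment set); returns the last sample."""
    rng = random.Random(seed)
    n_genes, n_exps = len(data), len(data[0])
    bg = background(data, n_levels)
    rows = {g for g in range(n_genes) if rng.random() < p_gene}
    cols = {e for e in range(n_exps) if rng.random() < p_exp}
    for _ in range(n_iter):
        # Gene step: predictive probability of gene g's levels on the selected
        # experiments given the other member genes, versus the background.
        for g in range(n_genes):
            others = rows - {g}
            log_on, log_off = math.log(p_gene), math.log(1 - p_gene)
            for e in cols:
                x = data[g][e]
                n_x = sum(1 for h in others if data[h][e] == x)
                log_on += math.log((n_x + alpha) / (len(others) + n_levels * alpha))
                log_off += math.log(bg[e][x])
            if sample_bernoulli(log_on, log_off, rng):
                rows.add(g)
            else:
                rows.discard(g)
        # Experiment step: Dirichlet-multinomial evidence of the member genes'
        # levels in experiment e, versus the background.
        for e in range(n_exps):
            counts = [0] * n_levels
            for g in rows:
                counts[data[g][e]] += 1
            log_on = math.log(p_exp) + log_dirmult(counts, alpha)
            log_off = math.log(1 - p_exp) + sum(math.log(bg[e][data[g][e]]) for g in rows)
            if sample_bernoulli(log_on, log_off, rng):
                cols.add(e)
            else:
                cols.discard(e)
    return sorted(rows), sorted(cols)


if __name__ == "__main__":
    # Hypothetical test data: random levels 0-2, with genes 0-7 planted
    # at level 2 in experiments 0-4.
    gen = random.Random(1)
    data = [[gen.randrange(3) for _ in range(12)] for _ in range(30)]
    for g in range(8):
        for e in range(5):
            data[g][e] = 2
    print(gibbs_bicluster(data, n_levels=3))
```

Returning only the last sample is the crudest summary. A more robust variant would run several chains and keep the genes and experiments that are included in a large fraction of the later iterations.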

We demonstrate the effectiveness of our approach on two data sets: a yeast data set [4] and a data set from leukemia patients [6]. We show that Gibbs biclustering detects biologically relevant patterns.
[1] Cheng Y, Church GM. Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol. 2000;8:93-103.
[2] Segal E, Taskar B, Gasch A, Friedman N, Koller D. Rich probabilistic models for gene expression. Bioinformatics. 2001;17 Suppl 1:S243-52.
[3] Tanay A, Sharan R, Shamir R. Discovering statistically significant biclusters in gene expression data. Bioinformatics. 2002; to appear.
[4] Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000 Dec;11(12):4241-57.
[5] Thijs G, Marchal K, Lescot M, Rombauts S, De Moor B, Rouze P, Moreau Y. A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J Comput Biol. 2002;9(2):447-64.
[6] Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet. 2002 Jan;30(1):41-7.