ECCB 2002 Poster

Title: Hidden Markov Model based clustering of gene expression data
P147
Schoenhuth, Alexander; Schliep, Alexander; Mueller, Tobias; Steinhoff, Christine

aschoen@zpr.uni-koeln.de, schliep@molgen.mpg.de, steinhof@molgen.mpg.de, Tobias.Mueller@molgen.mpg.de
Center for Applied Computer Science Cologne (ZAIK), University of Cologne; Max Planck Institute for Molecular Genetics Berlin

In microarray experiments, the expression levels of thousands of genes are measured simultaneously. When microarray experiments are performed consecutively in time, we call the resulting experimental setting a time course of gene expression profiles. Goals of such a setting are to detect the underlying cellular processes, to set up regulatory networks, and to assign function to time courses.

There have been a number of attempts to analyse such time courses. They can be divided into those which assume the different experiments to be independent and those that do not. Methods of the first class do not consider any dependencies between profiles belonging to subsequent time points (horizontal dependencies). Methods such as hierarchical clustering, k-means clustering or singular value decomposition [4,5,6] belong to this class. Methods which use splines [1] or autoregressive curves [2] belong to the second class and model horizontal dependencies. Bar-Joseph et al. [1] present a model in which groups of expression profiles along a time axis are modeled by cubic splines. Ramoni et al. [2] use autoregressive equations to fit gene profiles agglomeratively to representative curves, which yields clusters within the time series dataset.

Methods which take the horizontal dependencies into account are better suited to analysing time course data. Under the assumption that expression profiles are independent in time, the time points can even be permuted arbitrarily without changing the result of the clustering, which does not reflect the nature of a time course setting.
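
The permutation argument can be checked directly. The following is a minimal sketch in Python, using two made-up profiles and a fixed, arbitrarily chosen permutation (neither comes from the poster's datasets or software). It compares a distance that ignores time order, as used by k-means or hierarchical clustering, with a simple order-sensitive statistic, the lag-1 autocorrelation.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two profiles; ignores the order of time points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def lag1_autocorr(x):
    """Lag-1 autocorrelation; depends on which time points are adjacent."""
    mean = sum(x) / len(x)
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Two hypothetical expression profiles over 8 time points.
g1 = [0.0, 0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5]
g2 = [0.1, 0.4, 1.2, 1.4, 1.9, 1.6, 0.9, 0.6]

# The same (arbitrary) reordering of time points applied to both profiles.
perm = [0, 4, 1, 5, 2, 6, 3, 7]
g1_p = [g1[i] for i in perm]
g2_p = [g2[i] for i in perm]

d_before = euclidean(g1, g2)
d_after = euclidean(g1_p, g2_p)      # same value up to rounding
ac_before = lag1_autocorr(g1)        # positive: smooth rise and fall
ac_after = lag1_autocorr(g1_p)       # negative: temporal structure destroyed

print(d_before, d_after)
print(ac_before, ac_after)
```

Any clustering built only on such pairwise distances returns the same groups for the shuffled and the original time course, whereas a statistic that looks at neighbouring time points changes.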

Note further that all of the methods of the second class are model-based approaches. Instead of defining a distance measure and grouping data points in a way that minimizes a distance-based scoring function, statistical models are used to represent clusters and cluster membership is decided based on maximization of a data point's likelihood given a model/cluster.
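
As an illustration of likelihood-based cluster membership, the sketch below assigns a profile to whichever cluster model gives it the highest log-likelihood. It assumes, purely for illustration, that each cluster is a first-order autoregressive model with a fixed coefficient and noise level; the model names and parameter values are hypothetical and are not those of [1] or [2].

```python
import math

def ar1_loglik(x, phi, sigma):
    """Log-likelihood of x[1:] given x[0] under x[t] = phi * x[t-1] + N(0, sigma^2)."""
    ll = 0.0
    for t in range(1, len(x)):
        resid = x[t] - phi * x[t - 1]
        ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return ll

# Hypothetical cluster models, each given as (phi, sigma).
models = {
    "decaying": (0.5, 0.2),    # each value is about half of the previous one
    "persistent": (1.0, 0.2),  # each value stays close to the previous one
}

def assign(profile):
    """Return the name of the model under which the profile is most likely."""
    return max(models, key=lambda name: ar1_loglik(profile, *models[name]))

decay = [2.0, 1.0, 0.5, 0.25, 0.125]
flat = [1.0, 1.1, 0.9, 1.0, 1.05]

print(assign(decay))  # decaying
print(assign(flat))   # persistent
```

No distance between profiles is defined anywhere; membership follows entirely from how well each statistical model explains the data point.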

In contrast to the methods described above, we apply Hidden Markov Models (HMMs) to account for the horizontal dependencies. Besides their prevalent use in biological sequence analysis, HMMs have been successfully applied to time series data in a wide range of problem domains. They are particularly suitable if essential types of qualitative behavior can be proposed, as "grammatical" or "structural" constraints in the data can be modeled effectively and explicitly.
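
To make the idea of a structural constraint concrete, the sketch below encodes one qualitative behavior ("low, then high, then low again") as a left-to-right HMM whose transition matrix forbids going backwards, and decodes a profile with the Viterbi algorithm. The state names, Gaussian emission parameters and transition probabilities are assumptions chosen for illustration; the code is plain Python and does not use the GHMM library.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def safe_log(p):
    """log(p), with log(0) mapped to -inf so forbidden transitions stay forbidden."""
    return math.log(p) if p > 0 else NEG_INF

# Hypothetical 3-state left-to-right model. Zeros below the diagonal are the
# structural constraint: a profile can never return to an earlier state.
states = ["low", "high", "back_to_low"]
start = [1.0, 0.0, 0.0]
trans = [
    [0.7, 0.3, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
emit = [(0.0, 0.3), (2.0, 0.3), (0.0, 0.3)]  # (mean, std) per state

def viterbi(obs):
    """Most likely state sequence for the observations under the model above."""
    n = len(states)
    score = [safe_log(start[s]) + log_gauss(obs[0], *emit[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + safe_log(trans[i][j]))
            new_score.append(score[best] + safe_log(trans[best][j]) + log_gauss(x, *emit[j]))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [states[s] for s in path]

profile = [0.1, -0.2, 1.9, 2.2, 2.0, 0.1, -0.1]
path = viterbi(profile)
print(path)
```

The last two values are close to zero, yet they are decoded as "back_to_low" rather than "low", because the transition structure only allows the profile to move forward through the phases.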

We present a method to partition a set of expression time courses into clusters by use of HMMs. Given a number of clusters, each of which is represented by one Hidden Markov Model from a finite collection encompassing typical qualitative behavior, an iterative procedure finds cluster models and an assignment of data points to these models that maximizes the joint likelihood of the clustering. We apply the method to simulated data and to various published datasets. Furthermore, we discuss the results in comparison to the methods described in [1] and [2].
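
One possible reading of such an iterative procedure is sketched below. It is a simplified illustration, not the authors' implementation: the candidate HMMs have fixed, made-up parameters (no re-estimation such as Baum-Welch is performed), the likelihood is computed with a plain-Python forward algorithm rather than with GHMM, and the profiles and initial assignment are invented. The loop alternates between choosing, for each cluster, the model from the collection that best explains its current members, and reassigning every profile to the cluster whose model gives it the highest likelihood.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF

def logsumexp(vals):
    m = max(vals)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in vals))

def hmm_loglik(obs, hmm):
    """Forward algorithm in log space; hmm = (start, trans, emit)."""
    start, trans, emit = hmm
    n = len(start)
    alpha = [safe_log(start[s]) + log_gauss(obs[0], *emit[s]) for s in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + log_gauss(x, *emit[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

def two_phase(mu_first, mu_second):
    """Hypothetical left-to-right HMM: one expression level, then another."""
    return ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [(mu_first, 0.4), (mu_second, 0.4)])

# Finite collection of candidate models for typical qualitative behavior.
collection = {"up": two_phase(0.0, 2.0), "down": two_phase(2.0, 0.0), "flat": two_phase(1.0, 1.0)}

def cluster(profiles, init_assignment, n_clusters, max_iter=20):
    assignment = list(init_assignment)
    chosen = [None] * n_clusters
    for _ in range(max_iter):
        # Step 1: for each non-empty cluster, pick the best model from the collection.
        for k in range(n_clusters):
            members = [p for p, a in zip(profiles, assignment) if a == k]
            if members:
                chosen[k] = max(
                    collection,
                    key=lambda name: sum(hmm_loglik(p, collection[name]) for p in members),
                )
        # Step 2: move each profile to the cluster whose model explains it best.
        usable = [k for k in range(n_clusters) if chosen[k] is not None]
        new_assignment = [
            max(usable, key=lambda k: hmm_loglik(p, collection[chosen[k]]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, chosen

profiles = [
    [0.1, -0.1, 0.2, 1.9, 2.1, 2.0],   # rises
    [2.0, 2.1, 1.9, 0.1, 0.0, -0.1],   # falls
    [0.0, 0.2, 2.1, 1.8, 2.0, 2.2],    # rises
    [1.9, 2.2, 0.2, 0.1, -0.2, 0.0],   # falls
]
assignment, chosen = cluster(profiles, init_assignment=[0, 0, 0, 1], n_clusters=2)
print(assignment, chosen)
```

Starting from a deliberately poor initial assignment, the two rising profiles end up in one cluster and the two falling profiles in the other. Because the profiles are scored by sequence likelihood, the switch may occur at different time points within a cluster.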

The software is based on the GHMM (GNU Hidden Markov Model Library), freely available under the LGPL.

[1] Bar-Joseph Z, Gerber G, Gifford DK, Jaakkola TS. RECOMB 2002.
[2] Ramoni MF, Sebastiani P, Kohane IS. PNAS 99(14):9121-9126, 2002.
[3] MacDonald IL, Zucchini W. Hidden Markov and Other Models for Discrete-valued Time Series. Chapman & Hall, London, 1997. (Research monograph, no. 70 in the series Monographs on Statistics and Applied Probability.)
[4] Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B. Mol Biol Cell 9(12):3273-3297, 1998.
[5] Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Nat Genet 22:281-285, 1999.
[6] Alter O, Brown PO, Botstein D. Proc Natl Acad Sci U S A 97(18):10101-10106, 2000.