Schmidt-Heck, Wolfgang;Guthke, Reinhard;Reischer, Helga;Dürrschmid, Karin;Bayer, Karl - A Novel Binary Clustering Algorithm to Analyze Gene Expression Time Series

ECCB 2002 Poster

Title: A Novel Binary Clustering Algorithm to Analyze Gene Expression Time Series (P145)
Schmidt-Heck, Wolfgang; Guthke, Reinhard; Reischer, Helga; Dürrschmid, Karin; Bayer, Karl
wsheck@pmail.hki-jena.de, rguthke@pmail.hki-jena.de, bayer@mail.boku.ac.at
Hans Knöll Institute for Natural Product Research, Beutenbergstr. 11, D-07745 Jena, Germany; Institute of Applied Microbiology, University of Agricultural Sciences, Muthgasse 18, A-1190 Vienna, Austria

Clustering is a useful exploratory technique for the analysis of gene expression data, and many different heuristic clustering algorithms have been proposed in this context. We present in this poster a heuristic algorithm for gene expression time series analysis that is based on the following conceptual ideas:
- The input data are a set of estimates of relative gene expression from a microarray experiment.
- It is only important to know whether the change of the gene expression is significant with respect to the preceding time point.
- The noise threshold of the relative gene expression depends on the signal intensity.

The algorithm was validated using Escherichia coli expression data derived from the MWG E. coli K12 Array system. The aim of this experiment was to monitor changes of the transcription profile in relation to stress induced by recombinant gene expression. To this end, eight samples were taken at significant states of a chemostat fermentation process.

The noise threshold was determined using four gene arrays hybridised with the same RNA sample in both channels (Cy3, Cy5). For each gene, the expression ratio r = I_Cy5/I_Cy3 was calculated. Statistical evaluation of these experiments yielded the threshold function e = f(I) + g. The term f(I) depends on the average signal intensity I and decreases as I increases, whereas g is independent of I.

The clustering algorithm uses the following equation to transform the time series of expression ratios r_ij, with sample number i = 1, ..., m and gene number j = 1, ..., n, into time series of qualitative descriptors S_ij with i = 1, ..., m-1:

(equation 1...)
(For the equation see the web representation of the poster abstract)

Of the 6561 (3^8) possible expression profiles S_ik, 110 were found in the data investigated (m = 9, n = 4289). The similarity matrix d of the profiles S_ik with k = 1, ..., 110 was calculated using the Simple Matching Coefficient:

d_{k,k'} = (number of samples i with S_{i,k} = S_{i,k'}) / (m-1)
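
The Simple Matching Coefficient above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the gene names and profiles are made-up toy data, the three-symbol alphabet (+, 0, -) is inferred from the count of 6561 = 3^8 possible profiles, and the helper names are hypothetical.

```python
# Assumed three-valued descriptor alphabet (inferred, not stated in the abstract).
SYMBOLS = ("+", "0", "-")

def distinct_profiles(gene_profiles):
    """Return the distinct profiles among all genes, in first-seen order."""
    seen = {}
    for profile in gene_profiles.values():
        seen.setdefault(tuple(profile), None)
    return list(seen)

def simple_matching(a, b):
    """Fraction of positions at which two equal-length profiles agree."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def similarity_matrix(profiles):
    """Pairwise Simple Matching Coefficient between all profiles."""
    return [[simple_matching(p, q) for q in profiles] for p in profiles]

# Toy data: hypothetical genes with m - 1 = 8 descriptors each.
genes = {
    "geneA": "+--00000",
    "geneB": "+0-00000",
    "geneC": "+--00000",  # same profile as geneA
    "geneD": "00000000",
}

profiles = distinct_profiles(genes)   # 3 distinct profiles from 4 genes
d = similarity_matrix(profiles)
n_possible = len(SYMBOLS) ** 8        # size of the profile space for m = 9

print(len(profiles), n_possible)
print(d[0][1], d[0][2], d[1][2])
```

Clustering the distinct profiles rather than the individual genes keeps the similarity matrix small: 110 x 110 in the study, instead of 4289 x 4289.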

The average linkage method was used to identify clusters in the similarity matrix; 30 distinct clusters were recognized.
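
As a rough illustration of average linkage on a similarity matrix, the following naive Python sketch repeatedly merges the two clusters with the highest mean pairwise similarity. The abstract does not say how the cut yielding 30 clusters was chosen, so stopping at a fixed number of clusters (`n_clusters`) is an assumption here, as are the function name and the toy matrix.

```python
def average_linkage(sim, n_clusters):
    """Naive agglomerative clustering with average linkage.

    sim is a symmetric similarity matrix (list of lists). The two clusters
    with the highest mean pairwise similarity are merged until n_clusters
    remain (assumed stopping rule). Returns lists of item indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def mean_similarity(a, b):
        # Average similarity over all cross-cluster pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = mean_similarity(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Toy similarity matrix for four hypothetical profiles.
sim = [
    [1.0,   0.875, 0.25, 0.125],
    [0.875, 1.0,   0.25, 0.25],
    [0.25,  0.25,  1.0,  0.75],
    [0.125, 0.25,  0.75, 1.0],
]
clusters = average_linkage(sim, 2)
print(clusters)
```

The cubic-time loop is adequate for about a hundred profiles; a library routine would be the usual choice for larger inputs.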

We demonstrate that our algorithm generates stable clustering results with biological relevance. The genes of the lac operon (lacAYZ), together with other genes (agaD, yheT, yicG and b1598), were found within a cluster described by S = [+, -|0, -, -|0, 0, 0, 0, 0]. The expression profile of this cluster decreases after an initial peak.
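
A cluster description of this form can be checked against individual profiles with a small matcher. The sketch below assumes that an entry such as "-|0" means "either - or 0 at this position"; the abstract does not define the notation, and the function names and example profiles are hypothetical.

```python
def parse_pattern(text):
    """Parse '+, -|0, -' into a list of sets of allowed symbols per position."""
    return [set(part.strip().split("|")) for part in text.split(",")]

def matches(profile, pattern):
    """True if every symbol of the profile is allowed at its position."""
    return len(profile) == len(pattern) and all(
        symbol in allowed for symbol, allowed in zip(profile, pattern)
    )

# Cluster description quoted in the abstract (assumed reading of '-|0').
pattern = parse_pattern("+, -|0, -, -|0, 0, 0, 0, 0")

print(matches("+--00000", pattern))  # initial peak, then decrease
print(matches("+0--0000", pattern))
print(matches("00000000", pattern))  # no initial peak
```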