ECCB 2002 Poster

Title: Discovery of Differentially Expressed Genes using Fuzzy Technology
Poster P53
Guthke, Reinhard; Scherf, Uwe

rguthke@pmail.hki-jena.de
Hans Knöll Institute for Natural Products Research, Jena, GERMANY; Gene Logic Inc., Gaithersburg, MD, USA

Various statistical methods have been established to discover genes that are differentially expressed between normal and diseased tissues, or between cell systems exposed to different conditions. Most of these methods use t-type statistics, such as the t-test [1], the Significance Analysis of Microarrays (SAM) [2] or Golub's ranking criterion [3].
The authors have applied these statistical methods as well as a newly developed fuzzy logic approach [4] to identify genes that predict malignant neoplasm of breast tissue. The analysis was performed on normalized, log-transformed expression data xij (i = 1, ..., n+ + n- samples; j = 1, ..., m genes) for m = 62,840 human gene fragments detected by the Affymetrix GeneChip® Human Genome U95 Set in n+ = 25 infiltrating ductal carcinomas (DCIS) and n- = 25 "normal" breast tissues (data set "A"). The two conditions are labeled with a positive sign (+) for malignant and a negative sign (-) for normal tissue.
In addition to the proprietary data set "A", the publicly available data set published by Golub et al. [3] was used for a methodological comparison of the statistical and fuzzy logic approaches. It is available at http://www-genome.wi.mit.edu/mpr/data_set_ALL_AML.html (data set "B") and contains expression profiles of m = 7,129 genes in 38 samples, i.e. n+ = 27 acute lymphoblastic leukemia (ALL) and n- = 11 acute myeloid leukemia (AML) samples.
The present paper compares the results of the statistical and fuzzy logic approaches to gene discovery applied to data sets "A" and "B". The genes are ranked by three criteria: (i) the p-value of the t-test, (ii) Golub's criterion and (iii) the fuzzy logic criterion. Here µj±, sj± and nj± denote the sample means, standard deviations and numbers of valid expression values (nj± ≤ n±) of gene j under the two conditions labeled with a positive or a negative sign, respectively. The criteria are defined as follows:

a) p-value of the t-test: the null hypothesis "the means µj+ and µj- are equal" is rejected at significance level pj

b) Golub's criterion [3]

c) the novel fuzzy logic criterion [4]

$Z_j = M_j \cdot \frac{n_j^+ \, n_j^-}{n^+ \, n^-}$

where Mj is a trapezoidal fuzzy membership function quantifying whether the summed and normalized overlap distance Dj is "small": Dj = 0 is considered "small", whereas Dj ≥ D* is "large" (i.e. NOT small). Dj is built from the distances between each value xij obtained under one condition and the minimum or maximum of the values obtained under the opposite condition; only positive distances contribute, so the function P(x) = (|x| + x)/2 = max(x, 0) is applied to cut off negative values.
(For the full equations, see the web version of the poster abstract.)
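The three ranking criteria can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Welch (unequal-variance) form of the t statistic is assumed because the abstract does not name the t-test variant, and turning it into a p-value needs the t distribution (e.g. scipy.stats.ttest_ind(..., equal_var=False)), which this standard-library sketch omits. Golub's criterion is written in its published signal-to-noise form (µj+ - µj-)/(sj+ + sj-) [3]. Because the exact equation for Dj is only given in the web version, the function overlap_distance is a hypothetical stand-in: it sums the cut-off distances P(·) of each value to the extreme of the opposite condition and normalizes by the number of values and the data range. The linear falling edge of the trapezoid and the parameter d_star (D*) are likewise assumptions. Missing values (None or NaN) are dropped, so the valid counts nj± enter Zj as in its definition.

```python
import math
from statistics import mean, stdev


def valid(values):
    """Drop missing measurements (None or NaN); len() of the result is n_j+ or n_j-."""
    return [v for v in values if v is not None and not math.isnan(v)]


def t_statistic(pos, neg):
    """Welch two-sample t statistic (assumed variant; a p-value additionally
    needs the t distribution, which is not computed here)."""
    p, q = valid(pos), valid(neg)
    se = math.sqrt(stdev(p) ** 2 / len(p) + stdev(q) ** 2 / len(q))
    return (mean(p) - mean(q)) / se


def golub_score(pos, neg):
    """Golub's signal-to-noise criterion (mu+ - mu-) / (s+ + s-)."""
    p, q = valid(pos), valid(neg)
    return (mean(p) - mean(q)) / (stdev(p) + stdev(q))


def cut(x):
    """P(x) = (|x| + x) / 2, i.e. max(x, 0): negative distances are cut off."""
    return (abs(x) + x) / 2


def overlap_distance(pos, neg):
    """HYPOTHETICAL D_j: summed positive distances of each value to the
    extreme of the opposite condition, for the better of the two directions
    (gene up or down in '+'), normalised by value count and data range."""
    p, q = valid(pos), valid(neg)
    span = max(p + q) - min(p + q)
    if span == 0:
        return math.inf  # constant gene: treated as maximal overlap
    up = sum(cut(max(q) - x) for x in p) + sum(cut(x - min(p)) for x in q)
    down = sum(cut(x - min(q)) for x in p) + sum(cut(max(p) - x) for x in q)
    return min(up, down) / ((len(p) + len(q)) * span)


def membership_small(d, d_star):
    """Assumed falling edge of the trapezoidal M_j: 1 at D_j = 0, linear down to 0 at D_j >= D*."""
    return max(0.0, 1.0 - d / d_star)


def fuzzy_score(pos, neg, d_star):
    """Z_j = M_j * n_j+ * n_j- / (n+ * n-); missing values lower the score."""
    m_j = membership_small(overlap_distance(pos, neg), d_star)
    return m_j * len(valid(pos)) * len(valid(neg)) / (len(pos) * len(neg))
```

For one gene of data set "A", pos and neg would hold its 25 carcinoma and 25 normal values; genes would then be ranked by ascending p-value, descending |golub_score| or descending fuzzy_score. A perfectly separated gene gets Dj = 0 and hence Zj = 1 when no values are missing.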

The concordance between the top-100 gene lists selected by each pair of criteria is shown in the following table.


Criteria compared              Data set A   Data set B
t-test vs. Golub's criterion   82%          84%
t-test vs. fuzzy criterion     43%          23%
Golub's vs. fuzzy criterion    35%          26%
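Such concordance figures could be computed with a small helper like the one below. The sketch assumes that concordance means the percentage of genes shared by two top-100 lists, which the abstract does not state explicitly; top_k and concordance are hypothetical names. Each score dictionary maps gene identifiers to a value oriented so that larger is better (e.g. the negative p-value, the absolute Golub score, or Zj); ties are broken by the dictionary's insertion order because Python's sort is stable.

```python
def top_k(scores, k=100):
    """Set of the k gene identifiers with the largest scores (larger = better)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def concordance(scores_a, scores_b, k=100):
    """Percentage of genes shared by the two top-k lists (assumed definition)."""
    return 100.0 * len(top_k(scores_a, k) & top_k(scores_b, k)) / k
```

For example, with k = 2, the lists {g1, g2} and {g2, g3} share one gene, giving a concordance of 50%.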


The concordance between the fuzzy method and the statistical methods is below 50%. This is because the hypothesis of normally distributed expression data, which is fundamental to the t-test and Golub's criterion, had to be rejected for 36% of all m genes in data set "A" and 15% of all m genes in data set "B". This result was obtained with the Lilliefors modification of the Kolmogorov-Smirnov test at the 5% significance level. As a consequence, applying only statistical or only fuzzy methods can lead to false negatives, i.e. some differentially expressed genes would not be identified. Statistical and fuzzy methods should therefore be applied complementarily.
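The normality check can be sketched as follows. The Lilliefors statistic is the Kolmogorov-Smirnov distance between the empirical distribution function and a normal distribution whose mean and standard deviation are estimated from the same sample. This sketch compares it with Lilliefors' large-sample 5% critical value 0.886/√n, which is only approximate for small samples such as the 25 tissues per condition in data set "A"; whether the authors tested each condition separately is not stated. The function names are hypothetical; statsmodels provides a tabulated implementation as statsmodels.stats.diagnostic.lilliefors.

```python
import math
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(values):
    """KS distance between the empirical CDF and a normal CDF fitted to the sample."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    d = 0.0
    for i, v in enumerate(x):
        f = fitted.cdf(v)
        # The empirical CDF jumps from i/n to (i+1)/n at the (i+1)-th order statistic.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rejects_normality(values):
    """Approximate 5% test using the large-sample critical value 0.886 / sqrt(n)."""
    return lilliefors_statistic(values) > 0.886 / math.sqrt(len(values))


def fraction_non_normal(expression):
    """Share of genes (one list of values per gene) for which normality is rejected."""
    return sum(rejects_normality(v) for v in expression) / len(expression)
```

A clearly bimodal gene such as twenty values of 0 and twenty values of 10 is rejected, whereas a sample of evenly spaced normal quantiles is not.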
[1] Pan, W.: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18 (2002), 546-554.
[2] Tusher, V.G., Tibshirani, R., Chu, G.: Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98 (2001), 5116-5121.
[3] Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., and Lander, E.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286 (1999), 531-537.
[4] Guthke, R., Scherf, U.: Fuzzy Clustering of the Human Transcriptome with Respect to Breast Cancer. Statusseminar "Chiptechnologien - Transkriptom - Proteom - Metabolom, Mikroarrays als universelles Werkzeug" [status seminar "Chip technologies - transcriptome - proteome - metabolome: microarrays as a universal tool"], DECHEMA-Haus, Frankfurt/Main, 21-22 January 2002.