ECCB 2002 Poster

Title: ProToGO - evaluating biological features for a set of proteins using GO annotations
P167
Ulanovsky, Hagit (1); Ron, Shany (1); Kifer, Ilona (2)

hagitmor@cs.huji.ac.il, shany@pob.huji.ac.il
(1) Dept. of Biological Chemistry, Life Sciences Institute, Hebrew University, Jerusalem, Israel. (2) School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel.

The accelerated rate of sequence accumulation and the in-depth research into expression and regulation raise a critical demand for assigning biological significance to large sets of sequences. Computational approaches for clustering the high-throughput results of modern biological research methods, such as 2D gels, library scanning and cDNA microarray experiments, are well established, while tools for quantifying the likelihood that a set of proteins is associated with a certain biological characteristic are only starting to emerge.
We attempt to fill this gap by applying a new tool - ProToGO - that offers an online analysis of the biological features of a set of proteins using the Gene Ontology database [1].
The ProToGO server receives a set of protein entries from the user (SwissProt, TrEMBL or GenBank accessions) and uses the GO Annotation@EBI [2] and Compugen Inc. GO annotations [3] to assess the biological connectivity and features of these proteins. Most entries are associated with multiple terms across the three GO partitions: Molecular function, Biological process and Cellular component. ProToGO uses these associations to produce a sub-graph of the complete GO graph whose nodes are the GO terms associated with the proteins in the query set. Each node is assigned the percentage of query proteins associated with it and a p-value, derived by testing against a background binomial probability of a protein being associated with that node or its progeny. ProToGO results are presented in a graphical or a textual format, according to the user's preference. The generic nature of ProToGO makes it useful for biologically oriented evaluation and validation of protein clustering tools.
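The node scoring described above can be illustrated with a minimal sketch. It is not the ProToGO implementation: the toy ontology, the function names, the supplied background frequencies, and the choice of a one-sided upper-tail binomial test are all assumptions made for illustration, since the abstract does not specify these details.

```python
from math import comb

# Hypothetical toy ontology (not real GO identifiers): child term -> parent terms.
PARENTS = {
    "GO:root": [],
    "GO:binding": ["GO:root"],
    "GO:catalysis": ["GO:root"],
    "GO:kinase": ["GO:catalysis"],
}


def with_ancestors(term, parents):
    """Return the term together with all of its ancestors in the graph."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def node_counts(query, annotations, parents):
    """Count, per node, the query proteins annotated to it or to its progeny."""
    counts = {}
    for protein in query:
        nodes = set()
        for term in annotations.get(protein, ()):
            nodes |= with_ancestors(term, parents)
        # Each protein is counted at most once per node.
        for node in nodes:
            counts[node] = counts.get(node, 0) + 1
    return counts


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumed here to be the test statistic."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def score_query(query, annotations, parents, background):
    """Map each node of the query sub-graph to (percentage, p-value).

    `background[node]` is the assumed fraction of all proteins in the
    reference database associated with the node or its progeny.
    """
    n = len(query)
    scores = {}
    for node, k in node_counts(query, annotations, parents).items():
        percentage = 100.0 * k / n
        scores[node] = (percentage, binomial_upper_tail(k, n, background[node]))
    return scores
```

Calling `score_query` with a list of accessions, a protein-to-terms mapping, and per-node background frequencies returns one `(percentage, p-value)` pair for every node in the induced sub-graph. A small p-value suggests that the node is over-represented in the query set relative to the background.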
Many internet tools cluster proteins by sequence, structure, or other properties. Each reports its own statistical measures of clustering quality, but they generally lack biological evidence for the uniformity of the clusters they produce.
We used ProToGO to evaluate the biological quality of clusters created automatically by ProtoNet [4]. ProtoNet provides a classification of all proteins in SwissProt at different levels of granularity. As a measure of quality, the ProToGO analyses of many clusters produced by ProtoNet were compared with the results of statistical tests of biological uniformity that use SwissProt keywords and InterPro domains. The ProToGO analysis agreed with the test results for very uniform clusters and for fragmented clusters. However, for some clusters the ProToGO analysis indicated high biological uniformity, while the statistical tests were not significant and thus inconclusive. The higher quality of information that the ProToGO algorithm produces stems from its use of the hierarchical structure of the GO graph and from the comprehensive protein-to-GO mappings by EBI and Compugen Inc.
Our next trial of ProToGO was with gene expression data produced from cDNA microarray experiments. We found several benefits in using ProToGO to analyze such data. First, ProToGO divides the query set of proteins into the different pathways they participate in, and presents the results in a user-friendly interface. Second, this presentation format helps focus research on pathways containing multiple proteins from the array. Finally, the quality of the annotations provides up to 10 nested hierarchies of the proteins, adding new insight into the input data.
In summary, we would like to point out that ProToGO is a generic tool that can be used for assessing the biological purity and uniformity of any set of proteins, from any biological source.
[1] Ashburner, M. et al., The Gene Ontology Consortium, Gene Ontology: tool for the unification of biology (2000). Nature Genet., 25, 25-29.
[2] Camon, E. & Magrane, M. et al., The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL and InterPro. Genome Research (submitted). (http://www.ebi.ac.uk/GOA/).
[3] Xie, H. et al., Large-Scale Protein Annotation through Gene Ontology (2002). Genome Research, 12(5), 785-794. Compugen, Inc. (http://www.cgen.com, http://www.labonweb.com).
[4] Sasson, O. et al., The metric space of proteins - comparative study of clustering algorithms (2002). Bioinformatics, 18, S14-S21. Proceedings of the Tenth International Conference on Intelligent Systems for Molecular Biology (ISMB).