ECCB 2002 Poster

Title: Similarity based approach to protein domain architecture prediction
P170
Vlahovicek, Kristian; Kajan, Laszlo; Pongor, Sandor

kristian@icgeb.org, pongor@icgeb.org
Protein structure and Bioinformatics, ICGEB, Area Science Park, Padriciano 99, 34012 Trieste, ITALY

The increasing amount of primary biological information originating from genome sequencing projects calls for new approaches to large-scale classification and annotation.
We present a method based on sequence similarity that can be applied both to the functional characterization of whole proteins and to the prediction of domain architecture. The method consists of building an exemplar-based database and preprocessing it with a database vs. database comparison, which yields threshold values for biologically significant similarities [1-3]. Domains are then annotated by comparing an unknown query sequence against the database and processing the search output using the predetermined thresholds. Performance evaluation shows an overall prediction success rate of 90% on a set of 140 000 protein domains divided into 2000 domain groups, each containing between 3 and 7000 members, with median per-group specificity and sensitivity of 98% and 93%, respectively. Ease of implementation, prediction speed and robustness make the method an interesting candidate for large-scale annotation projects, as it requires minimal manual intervention in both training and prediction.
The database of annotated protein domains, SBASE, and the domain architecture prediction system are available via a web interface (figure 1) at http://www.icgeb.org/sbase.
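
The two-step procedure described above (learning per-group thresholds from a database vs. database comparison, then filtering a query's search output with them) can be illustrated with a minimal Python sketch. The abstract does not state how the thresholds are derived, so the rule used here (the lowest score observed between two members of the same group) is only an assumption for illustration, as are the function names, the score values and the group labels; see [1-3] for the actual scoring.

```python
def learn_thresholds(pairwise_scores, group_of):
    """Derive one similarity threshold per domain group.

    pairwise_scores: iterable of (seq_a, seq_b, score) from an
        all-vs-all comparison of the database (hypothetical input format).
    group_of: dict mapping a sequence id to its domain group.
    Assumed rule: the threshold is the lowest score seen between two
    distinct members of the same group.
    """
    thresholds = {}
    for seq_a, seq_b, score in pairwise_scores:
        if seq_a == seq_b:
            continue  # ignore self-hits
        group_a, group_b = group_of[seq_a], group_of[seq_b]
        if group_a == group_b:
            thresholds[group_a] = min(score, thresholds.get(group_a, score))
    return thresholds


def predict_groups(query_hits, group_of, thresholds):
    """Assign domain groups to one query from its database search output.

    query_hits: iterable of (db_seq, score) for a single query.
    Returns the groups whose best hit reaches the group's threshold,
    ordered by best score, highest first.
    """
    best = {}
    for db_seq, score in query_hits:
        group = group_of[db_seq]
        best[group] = max(score, best.get(group, score))
    accepted = [g for g, s in best.items()
                if g in thresholds and s >= thresholds[g]]
    return sorted(accepted, key=lambda g: best[g], reverse=True)


# Toy data: ids, groups and scores are invented for illustration only.
group_of = {"d1": "SH2", "d2": "SH2", "d3": "SH2",
            "d4": "kinase", "d5": "kinase"}
pairwise = [("d1", "d2", 120.0), ("d1", "d3", 95.0), ("d2", "d3", 110.0),
            ("d4", "d5", 200.0), ("d1", "d4", 30.0)]

thresholds = learn_thresholds(pairwise, group_of)
hits = [("d2", 101.0), ("d4", 150.0), ("d3", 80.0)]
predicted = predict_groups(hits, group_of, thresholds)
print(thresholds)
print(predicted)
```

In this toy run the query's best hit to the "SH2" group clears that group's threshold, while its best "kinase" hit does not, so only "SH2" is reported. Once the thresholds are computed, prediction needs only a single pass over the search output.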

(For the figures see web representation of the poster abstracts)
[1] Murvai, J., Vlahovicek, K. and Pongor, S. (2000) A simple probabilistic scoring method for protein domain identification. Bioinformatics, 16, 1155-6.
[2] Murvai, J., Vlahovicek, K. and Pongor, S. (2001) A memory-based approach to protein sequence similarity searching. In Pifat, G. (ed.) Supramolecular Structure and Function. Kluwer Scientific, Dordrecht/Plenum Press, New York, USA, pp. 167-184.
[3] Murvai, J., Vlahovicek, K., Szepesvari, C. and Pongor, S. (2001) Prediction of protein functional domains from sequences using artificial neural networks. Genome Res, 11, 1410-7.