ECCB 2002 Poster

Title: Towards a Pattern-Based "Composite Homology Analysis and SEarch" (CHASE) Tool
P6
Alam, Intikhab; Dress, Andreas; Fuellen, Georg

intikhab@techfak.uni-bielefeld.de, dress@mathematik.uni-bielefeld.de, fuellen@uni-muenster.de
International Graduate School in Bioinformatics and Genome Research, University of Bielefeld, Bielefeld, Germany

Relationships among protein sequences can be revealed by the occurrence of particular clusters of residues, variously known as patterns, motifs, signatures or fingerprints [1]. Such clusters are simple yet useful tools for identifying new members of protein families and for understanding the relationship between sequence, structure and function [2]. Efforts have been made to organize this information into libraries of motifs (e.g. PROSITE [3], BLOCKS [4] and ProDom [5]) and, more recently, into more sophisticated mathematical models of protein families based on hidden Markov models [6, 7]. Several motif-based homology search methods, such as PHI-BLAST and HMMsearch, enable the search for members of a protein family. However, it is difficult to decide which method to use when the goal is to find as many true members of a protein family as possible: every homology search method gives different results, and manual analysis of those results is difficult. To automate database searching, we are developing a tool called CHASE (Composite Homology Analysis and SEarch). The tool currently searches Swiss-Prot for homologs, using information from Pfam [8], PROSITE and CDD (Conserved Domain Database) [9]. It performs database searches with hidden Markov models, PHI-BLAST and Vmatch (a pattern matching method like Agrep [10], with E-values [11]). It analyses the results of these search methods on the basis of their E-values and then compares them with expert knowledge (derived from PROSITE or Swiss-Prot keywords). Finally, CHASE produces a ranking of hits that depends on the E-values of a qualified majority of methods. CHASE will soon be available on the web for general protein-family-based database searching and easy sequence retrieval.
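
The ranking step described in the abstract could be sketched roughly as follows. This is only an illustration and not the CHASE implementation: the function name `rank_hits`, the E-value cutoff, the `min_methods` threshold standing in for a "qualified majority", the use of the mean log10 E-value as the score, and the sequence identifiers are all assumptions made for the example.

```python
import math

def rank_hits(results, evalue_cutoff=1e-3, min_methods=2):
    """Rank hits supported by at least `min_methods` search methods.

    results: {method_name: {sequence_id: evalue}}
    Returns a list of (sequence_id, n_supporting_methods, mean_log10_evalue),
    best hit first. All parameters are hypothetical choices for this sketch.
    """
    # Collect, per sequence, the E-values of the methods that report it
    # at or below the cutoff.
    support = {}
    for hits in results.values():
        for seq_id, evalue in hits.items():
            if evalue <= evalue_cutoff:
                support.setdefault(seq_id, []).append(evalue)

    ranked = []
    for seq_id, evalues in support.items():
        if len(evalues) >= min_methods:
            # Clamp at 1e-300 so that an E-value reported as 0 has a finite log.
            mean_log = sum(math.log10(max(e, 1e-300)) for e in evalues) / len(evalues)
            ranked.append((seq_id, len(evalues), mean_log))

    # More supporting methods first, then lower mean log10 E-value, then ID.
    ranked.sort(key=lambda r: (-r[1], r[2], r[0]))
    return ranked

# Hypothetical search output; identifiers and E-values are made up.
example_results = {
    "hmmsearch": {"P1": 1e-40, "P2": 1e-5, "P3": 0.5},
    "phi_blast": {"P1": 1e-30, "P2": 1e-2, "P4": 1e-8},
    "vmatch":    {"P1": 1e-20, "P3": 1e-4, "P4": 1e-6},
}

for seq_id, n_methods, score in rank_hits(example_results):
    print(seq_id, n_methods, round(score, 2))
```

In this toy input, a hit reported confidently by only one method is dropped, while hits confirmed by two or more methods are kept and ordered by their combined E-value evidence.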
[1] Bairoch, A., P. Bucher, and K. Hofmann. 1997. The PROSITE database, its status in 1997. Nucleic Acids Res. 25:217-221.
[2] Jonassen I, Collins J F, Higgins D G (1995) Finding flexible patterns in unaligned protein sequences. Protein Science 4, 1587-1595.
[3] A. Bairoch and P. Bucher (1994). PROSITE: recent developments. Nucleic Acids Res. 22, 3583-3589.
[4] S. Henikoff and J.G. Henikoff (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA, 89, 10915-10919.
[5] E.L. Sonnhammer and D. Kahn (1994). The modular arrangement of proteins as inferred from Analysis of Homology. Protein Science, 3, 482-492.
[6] A. Krogh, M. Brown, I.S. Mian, K. Sjolander and D. Haussler (1994). Hidden Markov models in computational biology, applications to protein modeling. J. Molec. Biol., 235, 1501-1531.
[7] P. Baldi, Y. Chauvin, T. Hunkapiller and M.A. McClure (1994). Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. USA, 91, 1059-1063.
[8] A. Bateman, E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S.R. Eddy, S. Griffiths-Jones, K.L. Howe, M. Marshall and E.L.L. Sonnhammer (2002). The Pfam protein families database. Nucleic Acids Res. 30(1):276-280.
[9] Marchler-Bauer, A., Panchenko, A.R., Shoemaker, B.A., Thiessen, P.A., Geer, L.Y. and Bryant, S.H. (2002). CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30:281-283.
[10] Wu S. and U. Manber, "Agrep – A Fast Approximate Pattern-Matching Tool," Usenix Winter 1992 Technical Conference, San Francisco (January 1992), pp. 153-162.
[11] Kurtz S (2002). Personal Communication.