ECCB 2002 Poster
Title: Towards a Pattern-Based "Composite Homology Analysis and SEarch" (CHASE) Tool
Poster number: P6
Alam, Intikhab; Dress, Andreas; Fuellen, Georg
intikhab@techfak.uni-bielefeld.de, dress@mathematik.uni-bielefeld.de, fuellen@uni-muenster.de
International Graduate School in Bioinformatics and Genome Research, University of Bielefeld, Bielefeld, Germany
Relationships among protein sequences can be revealed by the occurrence of particular clusters of residues, variously known as patterns, motifs, signatures or fingerprints [1]. Such clusters can be simple yet useful tools for identifying new members of protein families and for understanding the relationship between sequence, structure and function [2]. Efforts have been made to organize this information into motif libraries (e.g. PROSITE [3], BLOCKS [4] and ProDom [5]) and, more recently, into more sophisticated mathematical models of protein families based on hidden Markov models [6, 7].

Several motif-based homology search methods, such as PHI-BLAST and hmmsearch, can be used to search for members of a protein family. However, it is difficult to decide which method to use if one wants to find as many true members of a protein family as possible: each method returns a different set of hits, and comparing them manually is laborious.

To automate such database searches, we are developing a tool called CHASE (Composite Homology Analysis and SEarch). CHASE currently performs homology searches in SWISS-PROT, using information from Pfam [8], PROSITE and CDD (Conserved Domain Database) [9]. It searches the database with hidden Markov models, PHI-BLAST and Vmatch (a pattern-matching method similar to Agrep [10] that also reports E-values [11]), and analyses the results of these methods on the basis of their E-values. It then compares those results with expert knowledge derived from PROSITE or SWISS-PROT keywords. Finally, CHASE ranks the hits according to the E-values assigned by a qualified majority of the methods. CHASE will soon be available on the web for protein-family-based database searching and easy sequence retrieval.
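The ranking step above is described only in outline. The Python sketch below shows one possible reading of "a qualified majority of the methods", under assumptions that the abstract does not state: each method's output is reduced to a mapping from sequence ID to E-value; a hit must be reported by at least k methods; its consensus score is the k-th smallest E-value across methods (so at least k methods report it at that E-value or better); and accepted hits are then checked against a set of expert-annotated family members, such as PROSITE true positives. The function names, the cutoff, the choice of k and the example data are all hypothetical. This is not the CHASE implementation.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical input format (not the CHASE format): method -> {sequence ID -> E-value}.
Hits = Dict[str, Dict[str, float]]


def consensus_evalue(evalues: List[float], k: int) -> float:
    """Return the k-th smallest E-value, i.e. the smallest E at which at least
    k methods report the hit; infinity if fewer than k methods report it."""
    ordered = sorted(evalues)
    return ordered[k - 1] if len(ordered) >= k else math.inf


def rank_hits(hits: Hits, k: int, cutoff: float) -> List[Tuple[str, float]]:
    """Keep hits reported by at least k methods at E-value <= cutoff,
    ranked by consensus E-value (best first)."""
    all_ids: Set[str] = set()
    for per_method in hits.values():
        all_ids.update(per_method)
    ranked = []
    for seq_id in all_ids:
        # A method that did not report the sequence casts no vote for it.
        evalues = [
            per_method[seq_id] for per_method in hits.values() if seq_id in per_method
        ]
        score = consensus_evalue(evalues, k)
        if score <= cutoff:
            ranked.append((seq_id, score))
    # Smallest consensus E-value first; ties broken by ID so the order is deterministic.
    ranked.sort(key=lambda item: (item[1], item[0]))
    return ranked


def compare_with_expert(
    ranked: List[Tuple[str, float]], known: Set[str]
) -> Dict[str, Set[str]]:
    """Compare accepted hits with a set of expert-annotated family members."""
    predicted = {seq_id for seq_id, _ in ranked}
    return {
        "true_positives": predicted & known,
        "false_positives": predicted - known,
        "missed": known - predicted,
    }


if __name__ == "__main__":
    # Made-up E-values for three methods and four fictional sequences.
    hits = {
        "hmmsearch": {"SEQ_A": 1e-30, "SEQ_B": 1e-5, "SEQ_C": 0.5},
        "PHI-BLAST": {"SEQ_A": 1e-25, "SEQ_B": 0.02, "SEQ_D": 1e-4},
        "Vmatch": {"SEQ_A": 1e-10, "SEQ_B": 1e-3, "SEQ_D": 0.3},
    }
    ranked = rank_hits(hits, k=2, cutoff=0.01)  # 2 of 3 methods must agree
    print(ranked)  # [('SEQ_A', 1e-25), ('SEQ_B', 0.001)]
    print(compare_with_expert(ranked, known={"SEQ_A", "SEQ_D"}))
```

In this made-up example, SEQ_C is dropped because only one method reports it. SEQ_D is dropped because its second-best E-value (0.3) exceeds the cutoff. Using the k-th smallest E-value rather than the best one keeps a single very confident method from deciding the ranking alone. Requiring 2 of 3 methods is only one possible choice of qualified majority.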
[1] Bairoch, A., Bucher, P. and Hofmann, K. (1997). The PROSITE database, its status in 1997. Nucleic Acids Res. 25:217-221.
[2] Jonassen, I., Collins, J.F. and Higgins, D.G. (1995). Finding flexible patterns in unaligned protein sequences. Protein Sci. 4:1587-1595.
[3] Bairoch, A. and Bucher, P. (1994). PROSITE: recent developments. Nucleic Acids Res. 22:3583-3589.
[4] Henikoff, S. and Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89:10915-10919.
[5] Sonnhammer, E.L. and Kahn, D. (1994). Modular arrangement of proteins as inferred from analysis of homology. Protein Sci. 3:482-492.
[6] Krogh, A., Brown, M., Mian, I.S., Sjölander, K. and Haussler, D. (1994). Hidden Markov models in computational biology: applications to protein modeling. J. Mol. Biol. 235:1501-1531.
[7] Baldi, P., Chauvin, Y., Hunkapiller, T. and McClure, M.A. (1994). Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. USA 91:1059-1063.
[8] Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M. and Sonnhammer, E.L.L. (2002). The Pfam protein families database. Nucleic Acids Res. 30(1):276-280.
[9] Marchler-Bauer, A., Panchenko, A.R., Shoemaker, B.A., Thiessen, P.A., Geer, L.Y. and Bryant, S.H. (2002). CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30:281-283.
[10] Wu, S. and Manber, U. (1992). Agrep: a fast approximate pattern-matching tool. Proceedings of the USENIX Winter 1992 Technical Conference, San Francisco, January 1992, pp. 153-162.
[11] Kurtz, S. (2002). Personal communication.