ECCB 2002 Poster

Title: Data mining for protein interaction domains
P142
Scheel, Hartmut; Hofmann, Kay

hartmut.scheel@memorec.com
Bioinformatics Group, MEMOREC Stoffel GmbH, Stöckheimer Weg 1, D-50829 Köln, Germany

A protein domain, in the structural sense, is a part of a protein that folds independently of the rest of the structure and has a hydrophobic core of its own. Localized regions of high sequence conservation, which are typically found in several proteins per organism, are called 'homology domains' and usually correspond roughly to structural domains. Many protein classes - including enzymes, signal transducers and structural proteins - have a highly modular architecture consisting of several domains. Frequently, each of these domains has a function that is distinct from the function of the remainder of the protein. Thus, domains, and by extension also homology domains, often serve as minimal functional units. In these cases, the function of the entire protein can be deduced from the functions of its domains, taking into account potential synergistic effects. Functionalities frequently associated with particular domain types include catalytic activities and propensities for binding to particular DNA or protein sequences.
Typically, domain functionalities are conserved to a certain extent within a domain class. To name only a few examples, most 'kinase' domains are able to phosphorylate proteins, most 'SH2' domains bind to phosphorylated Tyr residues, and most 'bHLH' domains are able to bind DNA. Thus, the assignment of functionalities to domain classes is an important tool for the functional annotation of unknown proteins harbouring these domains. For some domain classes, like the examples mentioned above, the function is well known. However, there are many other homology domains, some of them very abundant, for which no clue to their putative function exists. Several databases of characterized and uncharacterized homology domains exist, including PROSITE, PFAM, SMART, and a proprietary collection assembled at MEMOREC.
Here, we present an approach to identify putative protein interaction domains in collections of as yet uncharacterized homology domains. To this end, we made use of the large body of protein interaction data available for the model organism Saccharomyces cerevisiae (budding yeast). Specifically, we tried to identify domain classes with a high propensity i) to form homo-dimers, ii) to form hetero-dimers, i.e. to interact with different proteins belonging to the same domain class, iii) to interact with a particular protein, and iv) to interact with members of a different domain class. To assess the significance of the interaction propensities, we applied methods of inferential statistics, including the χ² test and Fisher's exact test. Most of the highly significant interaction propensities were found for well characterized domains, confirming interaction modes that were already known. Nevertheless, several novel candidates for protein interaction domains were also identified with high statistical confidence. One particularly interesting example, involving proteins of endocytosis and Golgi transport, will be described in more detail.
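
As an illustration of how an interaction propensity of type ii) might be tested, the following minimal Python sketch builds a 2x2 contingency table over all pairs of distinct proteins (both carry the domain or not, versus interacting or not) and computes a one-sided Fisher's exact p-value from the hypergeometric distribution. The abstract does not specify how the tables were constructed, so the table layout, the function names, and the toy protein, domain and interaction data are all assumptions made for illustration only; they are not the authors' pipeline or data.

```python
from itertools import combinations
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null hypothesis that
    row and column membership are independent.
    """
    row1 = a + b          # pairs in which both proteins carry the domain
    col1 = a + c          # pairs that interact
    n = a + b + c + d     # all protein pairs
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def domain_pair_table(domains, interactions, domain):
    """Count unordered pairs of distinct proteins into a 2x2 table.

    a: both carry `domain` and interact     b: both carry it, no interaction
    c: other pairs that interact            d: other pairs, no interaction
    """
    interacting = {frozenset(pair) for pair in interactions}
    a = b = c = d = 0
    for p, q in combinations(sorted(domains), 2):
        both = domain in domains[p] and domain in domains[q]
        hit = frozenset((p, q)) in interacting
        if both and hit:
            a += 1
        elif both:
            b += 1
        elif hit:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical toy data: six proteins, three of which carry "DomX".
toy_domains = {
    "P1": {"DomX"}, "P2": {"DomX"}, "P3": {"DomX", "DomY"},
    "P4": {"DomY"}, "P5": {"DomZ"}, "P6": set(),
}
toy_interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P5")]

table = domain_pair_table(toy_domains, toy_interactions, "DomX")
p_value = fisher_right_tail(*table)
print(table, p_value)
```

In this toy example all three DomX pairs interact while only one of the remaining twelve pairs does, which gives a p-value of 12/1365 (about 0.0088). In a real screen over many domain classes, a correction for multiple testing would also be needed; the abstract does not say which, if any, was used.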