Campus Event Calendar

Event Entry

New for: D2, D3

What and Who

PhD Application Talk: Identifying functional discriminative motifs in protein families

Phuc Loi Luu
Saarland University
Talk
AG 1, AG 3, AG 5, SWS, AG 2, AG 4, RG1, MMCI  
MPI Audience
English

Date, Time and Location

Monday, 12 July 2010
08:45
90 Minutes
E1 4
024
Saarbrücken

Abstract

Predicting sites that are important for protein function but not conserved, in the absence of the corresponding protein structure, plays an important role in many fields of molecular biology, particularly protein engineering. We call such sites functional discriminative motifs. In this work, we developed a three-step method (processing data, learning from data, and validating motifs) to identify functional discriminative motifs by mining the large public database Pfam. In the first step, we process protein sequences and their associated Gene Ontology (GO) annotations with profile HMM alignment to generate the data frame for learning. Each sample in the data frame consists of a set of features (positions in the profile HMM) and a class label (the associated GO term). This step also eliminates GO term outliers and inappropriate sequences. In the second step, we compare two ensemble methods for classification and prediction of variable importance, boosting and random forest, on the processed data. Because the two methods perform approximately equally, we focus on random forest in further development: it is considerably faster than gradient boosting and handles multiclass problems easily. Next, five Pfam protein families are classified by random forest. Finally, functional discriminative motifs are predicted from the variable importance derived from the classification results. In the validation step, the motifs are mapped onto the corresponding structures where available. We observed that almost all motif sites are located around the substrate binding pocket in their corresponding structures. Fisher's exact test is employed to test significance. The test results lead to the conclusion that important sites are distributed between the inside and the outside of the catalytic sphere with significantly different frequencies than unimportant sites.
This is evidence that the motifs are associated with protein function. In this work, we have developed a pipeline (Pfam and internal database -> derived database -> processed data set -> classification of protein function -> prediction of protein function -> prediction of functional discriminative motifs -> mapping onto structure -> significance test). Furthermore, we propose two applications of the developed pipeline. The first is the classification and prediction of protein functions. The second is a method, called paralogous substitutions, for protein engineering with a directed molecular evolution approach. The proposal is illustrated with experiments, examples and discussion.
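
To illustrate the significance test mentioned in the abstract, here is a minimal sketch of a one-sided Fisher exact test on a 2x2 table of site counts (important vs. unimportant sites, inside vs. outside the catalytic sphere). The function name, the choice of a one-sided test, and all counts are assumptions made for illustration; the abstract does not give the actual counts or state which variant of the test was used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows (assumed layout): important sites, unimportant sites.
    Columns (assumed layout): inside the catalytic sphere, outside it.
    Returns P(X >= a) under the hypergeometric null hypothesis that
    importance and location are independent.
    """
    row1 = a + b          # number of important sites
    col1 = a + c          # number of sites inside the sphere
    n = a + b + c + d     # total number of sites
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of all tables at least as
    # extreme as the observed one (same margins, top-left cell >= a).
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical counts, NOT taken from the talk:
# 12 important sites inside the sphere, 3 outside;
# 20 unimportant sites inside the sphere, 65 outside.
p_value = fisher_exact_greater(12, 3, 20, 65)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value would indicate that important sites are enriched inside the catalytic sphere relative to unimportant sites, which is the kind of conclusion the abstract reports.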

Contact

IMPRS-CS

Tags, Category, Keywords and additional notes

Please note: The talks will take place in random order!

Heike Przybyl, 07/01/2010 15:17 -- Created document.