ECCB 2002 Poster

Title: Optimized recognition of errors in protein structures using knowledge based potentials: general findings and application to ProSAII
P78
Kienberger, Ferry; Wiederstein, Markus; Lackner, Peter; Sippl, Manfred J.

ferry.kienberger@came.sbg.ac.at
Center of Applied Molecular Engineering, University of Salzburg, Jakob-Haringer-Strasse 3, A-5020 Salzburg, Austria

Introduction: Knowledge based potentials are distance dependent potentials derived from a set of known 3-D protein structures. They are used in several areas of protein structure prediction and analysis. In fold recognition, knowledge based potentials are used to scan a library of known folds for structures similar to the unknown native structure of a query sequence [1]. Another frequent application is the detection of errors in three dimensional structures of proteins. The software package ProSAII uses a set of knowledge based potentials for backbone and C-beta atoms to detect errors in protein structures by calculating z-scores and energy profiles [2]. The z-score measures the quality of a fold relative to statistical parameters (the mean and standard deviation of the energy) derived from a large number of incorrect folds.
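The z-score idea can be illustrated with a minimal sketch. It assumes the common definition in which the energy of the structure under test is compared with the mean and standard deviation of the energies of a set of incorrect (decoy) folds. The function name z_score and the energy values are hypothetical and chosen only for illustration; they are not ProSAII output.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Return (E_native - mean(E_decoys)) / stdev(E_decoys).

    More negative values mean the tested fold lies further below
    the background of incorrect folds.
    """
    mean = statistics.fmean(decoy_energies)
    # Population standard deviation of the decoy background.
    sd = statistics.pstdev(decoy_energies)
    if sd == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / sd


# Made-up energies in arbitrary units, for illustration only.
decoys = [-10.0, -5.0, 0.0, 5.0, 10.0]
print(z_score(-20.0, decoys))
```

In this picture, a structure containing errors would be expected to have an energy closer to the decoy background and therefore a z-score closer to zero.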

Results: In the work reported here we optimized knowledge based potentials and their use in detecting errors in protein structures. We first investigated how the performance of knowledge based potentials depends on the number of protein folds in the knowledge base used to compile them. From the current PDB entries we extracted a knowledge base of 7000 protein folds in which the sequence similarity between any two protein chains is less than 95%. Fig.1 shows the performance as a function of the number of protein folds in the knowledge base. Performance is measured on a fixed reference set of 1517 protein domains, chosen to cover the range of protein sequence families currently represented in the PDB.

After varying the size of the knowledge base, we investigated its composition. We studied how the performance of knowledge based potentials is affected by the use of globular versus nonglobular proteins, X-ray versus NMR structures, and protein chains versus domains in the knowledge base. In addition, the reference state used in compiling the potentials was found to affect their performance.
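The role of the reference state can be made concrete with a simplified sketch of how a distance dependent potential might be compiled from counts. It assumes the inverse Boltzmann relation E(r) = -kT ln(f_pair(r) / f_ref(r)), with the reference state taken as the distance distribution over all atom pairs. The pseudocount smoothing is a simplification introduced here and is not the sparse-data treatment of [1]; the function name, parameters and counts are hypothetical.

```python
import math


def pair_potential(pair_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Energy per distance bin: -kT * ln(f_pair / f_ref).

    pair_counts:      observed counts per distance bin for one pair type
    reference_counts: counts per bin for the chosen reference state
    pseudocount:      added to every bin to avoid log(0) (an assumption)
    """
    if len(pair_counts) != len(reference_counts):
        raise ValueError("both histograms must use the same distance bins")
    n_bins = len(pair_counts)
    pair_total = sum(pair_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for n_pair, n_ref in zip(pair_counts, reference_counts):
        f_pair = (n_pair + pseudocount) / pair_total
        f_ref = (n_ref + pseudocount) / ref_total
        energies.append(-kT * math.log(f_pair / f_ref))
    return energies


# Made-up histograms over three distance bins, for illustration only.
energies = pair_potential([30, 10, 20], [200, 200, 200])
print(energies)
```

Bins in which the pair is seen more often than in the reference state receive negative (favourable) energies. Swapping in a different reference_counts histogram changes every energy value, which is one way to see why the choice of reference state matters.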

Optimized knowledge based potentials were then incorporated into the ProSAII package, together with optimal parameters for z-score calculations. Optimizing parameters such as the relative weighting of pair and surface potentials, the use of virtual C-beta atoms in glycines, and the large polyprotein chain used to generate the statistical background considerably improved the performance in error detection. With all parameters optimized, the average z-score obtained from ProSAII improves by about two standard deviation units, from -6.91 to -8.92, compared to the current version of ProSAII. In addition, the optimized version of ProSAII has been shown to be successful in ranking the accuracy of X-ray and NMR models of several proteins.
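How a relative weighting of pair and surface terms could be tuned is sketched below. The combination w * E_pair + (1 - w) * E_surface, the grid of weights and all energy values are assumptions made for illustration; the abstract does not specify how ProSAII parameterizes the weighting.

```python
import statistics


def combined_z_score(native, decoys, pair_weight):
    """z-score of the weighted energy w*E_pair + (1 - w)*E_surface.

    native: (pair_energy, surface_energy) of the structure under test
    decoys: list of (pair_energy, surface_energy) for the background
    """
    def total(energy):
        pair, surface = energy
        return pair_weight * pair + (1.0 - pair_weight) * surface

    background = [total(e) for e in decoys]
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("background energies have zero spread")
    return (total(native) - statistics.fmean(background)) / sd


# Made-up energies in arbitrary units, for illustration only.
native = (-12.0, -6.0)
decoys = [(-4.0, -3.0), (0.0, 1.0), (4.0, -1.0), (-2.0, 3.0), (2.0, 0.0)]

# Scan a small grid of weights and keep the one with the lowest z-score.
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = {w: combined_z_score(native, decoys, w) for w in weights}
best = min(scores, key=scores.get)
print(best, scores[best])
```

In practice such a scan would be averaged over a reference set of structures rather than a single one, so that the chosen weight is not fitted to one protein.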

Summary: We investigated the dependence of knowledge based potentials on the number of proteins available for their compilation and optimized several parameters to increase the performance of the potentials in recognizing errors in protein structures. The results presented here were obtained with ProSAII, a program package designed for the analysis of errors in three dimensional structures of proteins. The larger databases available today, together with parameter optimization, improve the average z-score by two standard deviation units. This is a significant improvement in performance, since the z-score is the main criterion for distinguishing native from misfolded structures. The improvement is corroborated by the ability to correctly rank the quality of NMR models as compared to X-ray structures.

Fig.1 Performance of knowledge based potentials as a function of the size of the knowledge base. The size of the knowledge base ranges from 50 to 7000 proteins. The performance of the respective knowledge based potential is represented by the average z-score derived from a reference set of 1517 protein domains.
[1] Manfred J. Sippl (1990) Calculation of Conformational Ensembles from Potentials of Mean Force. J. Mol. Biol. 213: 859-883.
[2] Manfred J. Sippl (1993) Recognition of Errors in Three-Dimensional Structures of Proteins. Proteins 17: 355-362.