ECCB 2002 Poster

Title: Calculating Residue Flexibility Information from Statistics and Energy based Prediction
P190
Zoellner, Frank; Koch, Kerstin; Neumann, Steffen; Kummert, Franz; Sagerer, Gerhard

fzoellne@techfak.uni-bielefeld.de, kerstin@techfak.uni-bielefeld.de, sneumann@techfak.uni-bielefeld.de, franz@techfak.uni-bielefeld.de, sagerer@techfak.uni-bielefeld.de
Technische Fakultaet, AG Angewandte Informatik, Universitaet Bielefeld, Postfach 100131, 33501 Bielefeld

Introduction
Fast docking algorithms usually employ the rigid-body assumption [1, 3]. They score geometric complementarity as well as physico-chemical features. In unbound protein docking, however, steric clashes may be penalized wrongly when side chains change their conformation during the docking process.

Using elasticity information obtained from energy calculations and statistics, we introduce a flexibility measure that combines the two. In the search for docking constellations, we reduce the influence of steric clashes at side chains that are likely to change conformation.



Methods
For unbound protein docking, no information about the complex is known, so flexibility information has to be extracted from the structure of the unbound protein. Applying the paradigm that every molecule changes towards its energetically best conformation, we assume that the energy difference between the base energy (Ebase), calculated on the conformation given in the original PDB file, and the energy of a minimal-energy conformation (Emin) is a measure of flexibility. We predict a rotamer change as follows:

$$
\mathrm{PRC}(aa) =
\begin{cases}
1 & \text{if } \mathrm{norm}(E_{\mathrm{base}} - E_{\mathrm{min}}) \ge S \\
0 & \text{otherwise}
\end{cases}
$$

If the normalized energy difference reaches or exceeds a threshold (S), we consider the residue flexible (1), otherwise not flexible (0). The energy difference is normalized by the function norm, which reduces the impact of outliers.
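The decision rule can be sketched in a few lines of Python. This is only an illustration: the poster does not specify the form of norm, so the tanh squashing, the `scale` parameter and the function names below are assumptions, and the energies in the usage lines are made-up values.

```python
import math


def norm(delta_e, scale=10.0):
    """Hypothetical normalisation: squash an energy difference into [0, 1].

    The poster only states that norm limits the impact of outliers; the
    tanh form and the default `scale` are assumptions for this sketch.
    Negative differences are clamped to zero.
    """
    return math.tanh(max(delta_e, 0.0) / scale)


def predict_rotamer_change(e_base, e_min, threshold):
    """Return 1 if the residue is predicted to be flexible, else 0."""
    return 1 if norm(e_base - e_min) >= threshold else 0


# Made-up energies, for illustration only.
print(predict_rotamer_change(e_base=-12.0, e_min=-30.0, threshold=0.5))  # 1
print(predict_rotamer_change(e_base=-12.0, e_min=-13.0, threshold=0.5))  # 0
```

Any bounded, monotonically increasing function would serve the same purpose here; a large outlier in the energy difference then cannot push the normalized value beyond the upper bound.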


We use a training set based on our comparison study of complexed and unbound proteins [5]. In this study we statistically analysed the probability of rotamer changes and identified rotamer changes between complexed and unbound proteins [4].

To evaluate our prediction we apply Receiver Operating Characteristic (ROC) analysis [2]. We calculate the area under the ROC curve to score our predictions.
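As an illustration of the scoring step, the ROC area can be computed directly from ranked predictions using its pairwise (Mann-Whitney) form. The sketch below assumes that the normalized energy difference serves as the score, which corresponds to sweeping the threshold S; the function name and the example values are hypothetical and are not taken from the poster.

```python
def roc_area(scores, labels):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    scores: higher values mean "more likely to change rotamer".
    labels: 1 if a rotamer change was observed, 0 otherwise.
    Returns None if either class is empty, because the area is then undefined.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up scores and labels, for illustration only.
print(roc_area([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
print(roc_area([0.2, 0.4], [0, 0]))                  # None
```

An area of 0.5 corresponds to random guessing and 1 to a perfect separation. The `None` case arises whenever a residue type has no observed rotamer changes in the data set.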

The docking system ElMaR performs a fast Fourier correlation of surface geometry, electrostatics and hydrophobicity. The penalty for a steric clash at a given residue is reduced if a conformational change is predicted by the scheme above.
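The idea of the reduced clash penalty can be sketched as follows. How ElMaR encodes penalties within its Fourier correlation is not described here, so this example leaves the correlation out entirely; the function name, the residue identifiers, and the `base_penalty` and `reduction` values are all assumptions.

```python
def clash_penalty(clashes, predicted_flexible, base_penalty=1.0, reduction=0.5):
    """Sum steric-clash penalties, down-weighting predicted-flexible residues.

    clashes: residue identifiers, one entry per detected clash.
    predicted_flexible: set of residue identifiers predicted to change rotamer.
    base_penalty, reduction: hypothetical values chosen for this sketch.
    """
    total = 0.0
    for residue in clashes:
        weight = reduction if residue in predicted_flexible else 1.0
        total += weight * base_penalty
    return total


# Made-up clash list: two clashes at a flexible LYS, one at a rigid SER.
print(clash_penalty(["LYS45", "SER12", "LYS45"], {"LYS45"}))  # 2.0
```

With rigid-body scoring the same three clashes would cost 3.0, so a docking constellation that clashes only at flexible side chains is penalized less.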


Results
We have applied our prediction method to a training set of 8471 residues for which both the bound and the unbound conformation exist in the PDB for verification.

At first we consider only the first torsion angle, Chi 1. For some residues, e.g. LYS, we achieve good results in predicting rotamer changes, whereas others, e.g. SER, do not perform well. Table 1 shows the ROC area for each residue.

residue  | ARG  | ASN  | ASP  | CYS | GLN  | GLU  | HIS | ILE  | LEU
ROC area | 0.70 | 0.68 | 0.36 | -   | 0.65 | 0.95 | -   | 0.52 | -

residue  | LYS  | MET | PHE | SER  | THR  | TRP | TYR | VAL
ROC area | 0.88 | 1   | -   | 0.48 | 0.72 | -   | -   | 0.69
Table 1. ROC areas for different residues, Chi 1

For CYS, HIS, LEU, PHE, TRP and TYR there were no examples in the data set from which to calculate the ROC area. This is due to the fact that these residues show a low percentage of changing residues. CYS, for example, forms disulfide bridges within a protein, and therefore its flexibility is negligible. HIS, PHE and TYR have ring systems in their side chains, which cause potential steric hindrance. First docking results show that this method improves our docking system.

In both cases the best docking hypothesis is still worse on an RMSD basis, but the hypothesis with the lowest RMSD (probably the most correct one) is ranked higher with our prediction method than with inelastic docking.
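The comparison described above amounts to locating the lowest-RMSD hypothesis in the score-ordered result list. A minimal sketch, assuming each hypothesis is a (score, RMSD) pair with higher scores being better; the numbers are invented for illustration and are not results from this work.

```python
def rank_of_lowest_rmsd(hypotheses):
    """Return the 1-based score rank of the hypothesis with the lowest RMSD.

    hypotheses: list of (score, rmsd) tuples; a higher score is assumed
    to mean a better-ranked docking hypothesis.
    """
    ranked = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    best_index = min(range(len(ranked)), key=lambda i: ranked[i][1])
    return best_index + 1


# Invented example lists, not measured data.
inelastic = [(9.1, 14.0), (8.7, 11.5), (8.2, 2.3)]
with_prediction = [(9.0, 12.0), (8.8, 2.3), (8.1, 11.5)]
print(rank_of_lowest_rmsd(inelastic))        # 3
print(rank_of_lowest_rmsd(with_prediction))  # 2
```

A smaller return value means that the near-native hypothesis appears earlier in the ranked list, even when the top-ranked hypothesis itself still has a high RMSD.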


Summary
We have developed an approach for predicting rotamer changes from the unbound protein alone. Flexibility is classified and quantified by a binary decision (0/1). For most residues with a defined ROC area the prediction is good. Residues that cannot be predicted either have a low ROC area or have no true positive examples (for the Chi 1 torsion angle) in the training set; the latter can be improved by extending the training set.

In the future this approach will be extended to the higher torsion angles (Chi 2-4), covering more degrees of freedom of a side chain. This additional information may help to increase the ROC area values and supply additional decision criteria. We also plan to implement a new cost function for flexibility to obtain a more specific quantification of the elasticity of a given residue.

References
[1] F. Ackermann, G. Hermann, S. Posch, and G. Sagerer. Estimation and filtering of potential protein-protein docking positions. Bioinformatics, 14(2):196-205, August 1998.
[2] J. P. Egan. Signal detection theory and ROC analysis. Academic Press, New York, 1975.
[3] T. J. A. Ewing, S. Makino, A. G. Skillman, and I. D. Kuntz. DOCK 4.0: Search strategies for automated molecular docking of flexible molecular databases. Journal of Computer-Aided Molecular Design, 15:411-428, 2001.
[4] K. Koch, S. Neumann, G. Sagerer, and F. Zoellner. Side chain flexibility for 1:n protein-protein docking. Poster 92A, ISMB 2002.
[5] K. Koch, F. Zoellner, S. Neumann, F. Kummert, and G. Sagerer. Comparing bound and unbound protein structures using energy calculation and rotamer statistics. In Silico Biology, 2:32, 2002.