Neumann, Steffen;Zoellner, Frank;Koch, Kerstin;Kummert, Franz;Sagerer, Gerhard - ElMaR: A Protein Docking System using Flexibility Information

ECCB 2002 Poster

Title: ElMaR: A Protein Docking System using Flexibility Information	P113
Neumann, Steffen; Zoellner, Frank; Koch, Kerstin; Kummert, Franz; Sagerer, Gerhard
sneumann@techfak.uni-bielefeld.de, fzoellne@techfak.uni-bielefeld.de, kerstin@techfak.uni-bielefeld.de, franz@techfak.uni-bielefeld.de, sagerer@techfak.uni-bielefeld.de
Technische Fakultaet, AG Angewandte Informatik, Universitaet Bielefeld, Postfach 100131, 33501 Bielefeld

We give an overview of the ElMaR docking system. Thanks to its distributed, modular, and optionally parallel architecture, results can be obtained within a few minutes. ElMaR incorporates protein flexibility information obtained through statistics and force field calculation. Using a fast correlation technique, steric clash penalties are weighted according to the possibility of amino acid rotamer changes.

Developing protein docking systems requires large datasets for training, testing, and comparing docking algorithms. Because of the exponential growth of the PDB, automated test case generation is needed. We present a method for automated test case generation based on combined searches and filters in the PDB.

Finally, we present results for the individual modules and the complete system.

Methods
Pipelined architecture: The whole system is set up as a pipeline, with dependency tracking between the individual modules. Other systems very often batch-process the whole content of the PDB, updating on a fixed schedule.
Whenever a new PDB entry is deposited, it can enter the pipeline; if a module is updated, the dependent data can be regenerated.
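
The dependency tracking described above can be sketched as follows. This is only an illustration under our own assumptions, not ElMaR's actual implementation: the Pipeline class, the module names (parse, flexibility, docking) and the entry ID are hypothetical.

```python
from collections import defaultdict


class Pipeline:
    """Toy pipeline that tracks which module outputs are stale.

    Class, module names and structure are hypothetical, not ElMaR's own.
    """

    def __init__(self):
        self.depends_on = {}                # module -> list of upstream modules
        self.dependents = defaultdict(set)  # module -> direct downstream modules
        self.done = defaultdict(set)        # entry id -> modules with fresh output

    def add_module(self, name, depends_on=()):
        self.depends_on[name] = list(depends_on)
        for upstream in depends_on:
            self.dependents[upstream].add(name)

    def downstream(self, name):
        """Return the module itself plus everything that depends on it."""
        seen, stack = set(), [name]
        while stack:
            module = stack.pop()
            if module not in seen:
                seen.add(module)
                stack.extend(self.dependents[module])
        return seen

    def add_entry(self, entry_id):
        """A newly deposited entry starts with no module output."""
        self.done[entry_id] = set()

    def module_updated(self, name):
        """Invalidate the module's output and all data depending on it."""
        stale = self.downstream(name)
        for fresh in self.done.values():
            fresh -= stale  # in-place set difference

    def run(self, entry_id):
        """Run only the stale modules, upstream first; return what was run."""
        executed = []

        def visit(module):
            if module in self.done[entry_id]:
                return
            for upstream in self.depends_on[module]:
                visit(upstream)
            self.done[entry_id].add(module)  # the real work would happen here
            executed.append(module)

        for module in self.depends_on:
            visit(module)
        return executed
```

In this sketch a new entry passes through every module once, whereas updating a single module triggers recomputation only for that module and the modules downstream of it.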

Docking Algorithm
The docking algorithm is based on earlier work [1] and performs a fast Fourier correlation of geometric and electrostatic features as well as hydrophobicity.
Steric clash is penalized, weighted by the elasticity parameters obtained through rotamer statistics and energy calculation [3]. The statistics present a global view of the PDB, whereas the energy calculation is time-consuming but accurate with respect to the PDB entry in question. For residues where no information is available, no elasticity is assumed.
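
The idea of an elasticity-weighted clash penalty inside a Fourier correlation can be illustrated with a one-dimensional toy example. It is a sketch under our own assumptions and not the ElMaR scoring function: real docking grids are three-dimensional and combine several features, the function names are hypothetical, the penalty value of -10 and the linear scaling by (1 - elasticity) are illustrative choices, and the small recursive FFT merely stands in for an optimized FFT library.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def receptor_grid(surface, core, elasticity, clash_penalty=-10.0):
    """Surface cells score +1; core cells carry a clash penalty that is
    scaled down by the elasticity (0 = rigid, 1 = fully flexible)."""
    return [s + c * clash_penalty * (1.0 - e)
            for s, c, e in zip(surface, core, elasticity)]


def correlate_fft(receptor, ligand):
    """Score of every cyclic ligand translation t:
    score[t] = sum_i receptor[i] * ligand[(i - t) % n]."""
    n = len(receptor)
    spectrum = [r * l.conjugate()
                for r, l in zip(fft(receptor), fft(ligand))]
    return [value.real / n for value in fft(spectrum, inverse=True)]


def correlate_direct(receptor, ligand):
    """Same scores by brute force, for checking the FFT version."""
    n = len(receptor)
    return [sum(receptor[i] * ligand[(i - t) % n] for i in range(n))
            for t in range(n)]
```

With elasticity 0 everywhere the grid reduces to plain rigid-body scoring, which matches the rule that residues without information are treated as having no elasticity.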

Test case generation: Test cases for unbound protein docking are triples consisting of the crystal structure of the complex and the two docking partners in their unbound form. We use three different heuristics:

- Sequence and number of chains: A PDB entry consisting of two chains, where for each chain there is a sequence-identical PDB entry with only one chain, can be considered a test case.
- Chain names: A specialisation of the above also considers chain names. PDB entries with three chains named "A", "B" and "I" are usually a two-chain enzyme with an inhibitor. Several of these naming conventions are included.
- PDB-at-a-Glance: This database (see [4]) classifies PDB entries using keyword search.
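
The first two heuristics can be sketched in a few lines. This is a simplified illustration, not the ElMaR code: the function names and the input layout (a mapping from PDB ID to chain sequences) are our assumptions, and reading the chain-name rule as chains A, B and I is our interpretation of the convention described above.

```python
def find_test_cases(entries):
    """Return (complex_id, unbound_id_1, unbound_id_2) triples.

    `entries` maps a PDB ID to a dict {chain name: sequence}
    (a hypothetical, simplified input layout).
    """
    # Index all single-chain entries by their sequence.
    single_chain = {}
    for pdb_id, chains in entries.items():
        if len(chains) == 1:
            (sequence,) = chains.values()
            single_chain.setdefault(sequence, []).append(pdb_id)

    # A two-chain entry is a test case if both chains have a
    # sequence-identical single-chain entry.
    triples = []
    for pdb_id, chains in entries.items():
        if len(chains) != 2:
            continue
        first, second = chains.values()
        for unbound_1 in single_chain.get(first, []):
            for unbound_2 in single_chain.get(second, []):
                triples.append((pdb_id, unbound_1, unbound_2))
    return triples


def is_enzyme_inhibitor_by_name(chain_names):
    """Chain-name heuristic: chains A and B form the enzyme, I the inhibitor."""
    return sorted(chain_names) == ["A", "B", "I"]
```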

A graphical user interface visualizes the state of each entry: which tests it has passed and which stages it still needs to go through.

Results
Test set created: The resulting test set contains between 8 and 316 test cases, depending on the stringency of the filters (such as minimum chain length, maximum crystallographic resolution, or whether modelled entries are allowed).
Of the candidates in the cross product, those test cases with the best overall crystallographic resolution are taken.
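
One possible reading of this selection step is sketched below. It rests on assumptions that the text does not state: that the cross product is taken over the candidate unbound entries of the two partners, that "best overall" means the lowest summed resolution in Angstrom, and that 3.0 is a reasonable default cut-off. The function name is hypothetical.

```python
from itertools import product


def best_combination(complex_res, candidates_1, candidates_2, max_resolution=3.0):
    """Pick the unbound pair with the lowest summed resolution (in Angstrom).

    candidates_* are lists of (pdb_id, resolution); lower values are better.
    Returns None if the complex or all candidates fail the resolution filter.
    """
    if complex_res > max_resolution:
        return None
    allowed_1 = [c for c in candidates_1 if c[1] <= max_resolution]
    allowed_2 = [c for c in candidates_2 if c[1] <= max_resolution]
    pairs = list(product(allowed_1, allowed_2))
    if not pairs:
        return None
    first, second = min(pairs, key=lambda pair: pair[0][1] + pair[1][1])
    return first[0], second[0]
```

A stricter max_resolution rejects more complexes and candidates, which is one way the size of the test set can shrink as the filters are tightened.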

Docking results
For evaluation, the unbound structures are docked and the RMSD of the C-alpha atoms is computed. The RMSD is plotted against the cost estimate. Good hypotheses have both low cost and low RMSD values.
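
For reference, the C-alpha RMSD used in such an evaluation can be computed as in the following sketch. It assumes that both coordinate lists are in the same residue order and the same frame of reference; no superposition is performed, and the function name is ours.

```python
import math


def ca_rmsd(predicted, reference):
    """RMSD between two equally long lists of C-alpha (x, y, z) coordinates.

    Assumes matching residue order and a common frame of reference;
    no superposition is performed here.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((p - r) ** 2
                  for atom_p, atom_r in zip(predicted, reference)
                  for p, r in zip(atom_p, atom_r))
    return math.sqrt(squared / len(predicted))
```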
We show the results for rigid docking of the complex and of the unbound partners side by side. Examples of docking results that improve when flexibility is used are given. The reasons for the performance decline in other cases, and how to correct it, are subject of further research. We will also present the results in the "Docking Results Unified Format" [2].
The average runtime is below 20 minutes per run, with 90% of the time spent in the scoring module. Since scoring can run in parallel, runtimes below 10 minutes can be expected for parallel runs. Both methods for flexibility prediction are applied during the preprocessing stage and do not increase the actual docking time.
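
The runtime expectation can be checked with an Amdahl-style estimate. The sketch assumes that the scoring share parallelizes perfectly and that the remaining 10% stays serial; the worker counts are illustrative and the function name is ours.

```python
def parallel_runtime(serial_minutes, parallel_fraction, workers):
    """Amdahl-style estimate: only the parallel fraction speeds up."""
    serial_part = serial_minutes * (1.0 - parallel_fraction)
    parallel_part = serial_minutes * parallel_fraction / workers
    return serial_part + parallel_part
```

Starting from the 20-minute bound with a 90% scoring share, two workers give about 11 minutes and three workers about 8 minutes, so under these assumptions at least three parallel scoring processes are needed to get below 10 minutes.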

Summary
We developed a modular docking system that can be extended to perform flexible docking using residue-specific elasticity parameters. The parameters are calculated from rotamer statistics and energy calculation. The approach is targeted at database searches; therefore, as many calculations as possible are moved into the preprocessing stage.

Further research will add a module for navigation within the result sets, to allow for adaptive docking parameters based on relevance feedback as known from content-based image retrieval.

[1] F. Ackermann, G. Hermann, S. Posch, and G. Sagerer. Estimation and filtering of potential protein-protein docking positions. Bioinformatics, 14(2):196-205, August 1998.
[2] I. Halperin, B. Ma, H. Wolfson, and R. Nussinov. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins, 47(4):409-443, 2002.
[3] K. Koch, F. Zoellner, S. Neumann, F. Kummert, and G. Sagerer. Comparing bound and unbound protein structures using energy calculation and rotamer statistics. In Silico Biology, 2:32, 2002.
[4] The CENTER FOR MOLECULAR MODELING (CMM). PDB-at-a-glance. http://cmm.info.nih.gov/modeling/PDB_at_a_glance.html, 1996. Link 11.12.2001.