The principles underlying protein structure description must be studied carefully to understand the relationships between sequence, structure and function. Mean force potentials derived from atom interactions and main-chain torsion angles have been used by different investigators to evaluate protein structure, stability and protein-protein interactions. In recent studies, these potentials were also used in the prediction of protein function and enzyme catalysis. Five different atom classification models, with interactions in different distance ranges, were selected for this study to check their ability to describe the protein environment. Furthermore, torsion angle potentials were derived in addition to atom potentials so that orientational information of amino acids can be included in the model.
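As an illustration of how a mean force potential can be obtained from observed frequencies, the following minimal sketch applies the commonly used inverse-Boltzmann relation, E = -kT ln(p_observed / p_expected). The relation, the function name `mean_force_potential`, the value of `KT`, the pseudocount and the example bins are assumptions made for illustration; the exact reference state and binning used in this study are not specified here.

```python
import math

# Assumed thermal energy in kcal/mol at roughly 298 K (illustrative value).
KT = 0.592


def mean_force_potential(observed, expected, kT=KT, pseudocount=1e-6):
    """Return an inverse-Boltzmann energy for every bin.

    observed: dict mapping a bin (e.g. an atom-pair type in a distance range)
              to the number of times it was seen in the structure dataset.
    expected: dict with the same keys holding reference-state counts.
    The pseudocount avoids log(0) for bins that were never observed.
    """
    total_obs = sum(observed.values())
    total_exp = sum(expected.values())
    energies = {}
    for key in observed:
        p_obs = observed[key] / total_obs + pseudocount
        p_exp = expected[key] / total_exp + pseudocount
        # Bins seen more often than expected get a negative (favorable) energy.
        energies[key] = -kT * math.log(p_obs / p_exp)
    return energies


# Hypothetical counts for two bins, for demonstration only.
example = mean_force_potential({"a": 80, "b": 20}, {"a": 50, "b": 50})
```

In this toy example, bin "a" is over-represented relative to the reference and receives a negative energy, while bin "b" is under-represented and receives a positive one.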
The five atom classification models used for atom potentials are the following: a basic five (basic5) atom model (C aliphatic, C aromatic, N, O, S), amino acid Cα atoms (C20), the Li-Nussinov atom model (LN24), the SATIS model (SA28) and the Melo and Feytmans atom model (MF40). Carbon atoms of aromatic and aliphatic nature exhibit significantly different chemical and functional behavior, so they were considered separately in the basic5 atom model, alongside N, O and S. Li and Nussinov defined 24 different amino acid atom types using the polarity and hydrophobicity of atoms, though some of the atoms may have a substantially partial polar or apolar nature. SATIS (Simple Atom Type Information System) is a protocol for the definition and automatic assignment of atom types and the classification of atoms according to their covalent connectivity. The free energy values of unfolding from point mutation experiments (ΔΔG and ΔΔGH2O values from thermal and chemical denaturation, respectively) were used as an experimental measure of protein stability. In future, this method will also be extended to evaluate other structure descriptors. It has already been reported that the measured free energy changes between wild type and mutant proteins can be predicted using statistical potentials. However, these models lack the prediction efficiency and reliability needed to predict protein mutant stability for a wide range of protein structures.
A dataset of 4024 non-redundant structures was used to assess the optimal distribution of atom interactions and torsion angles (φ and ψ). DSSP was used to calculate the torsion angles. For torsion angle potentials, the distribution was normalized with a standard procedure using a circular Gaussian function, with φ and ψ following a bivariate normal distribution. Since mutants may exhibit torsion angle perturbation at an amino acid position, the Gaussian function should increase the efficiency of predicting slightly altered amino acid conformations.
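The idea of Gaussian smoothing over torsion angles can be sketched as follows. This is only an illustrative sketch: it assumes an uncorrelated bivariate Gaussian with the same width for both angles, angles given in degrees, and a hypothetical default width of 10 degrees. The function names and parameters are not taken from the study.

```python
import math


def circular_diff(a, b):
    """Smallest signed difference a - b between two angles, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def gaussian_weight(phi, psi, phi0, psi0, sigma=10.0):
    """Weight of an observed (phi, psi) pair relative to a centre (phi0, psi0).

    Differences are taken on the circle, so -179 and +179 degrees count as
    2 degrees apart rather than 358. sigma is an assumed width in degrees.
    """
    dphi = circular_diff(phi, phi0)
    dpsi = circular_diff(psi, psi0)
    return math.exp(-(dphi ** 2 + dpsi ** 2) / (2.0 * sigma ** 2))


def smoothed_density(observations, phi0, psi0, sigma=10.0):
    """Gaussian-weighted count of observed (phi, psi) pairs around a centre."""
    return sum(gaussian_weight(phi, psi, phi0, psi0, sigma)
               for phi, psi in observations)
```

With this kind of weighting, a residue whose torsion angles shift by a few degrees upon mutation still contributes to the neighbouring region of the distribution instead of falling into an empty bin.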
Results were validated based on the correlation observed between the experimental and predicted ΔΔG values. The accuracy with which mutations were correctly predicted as stabilizing or destabilizing was also assessed. Results show that the Melo and Feytmans atom model predicts protein stability with the highest accuracy: it showed a correlation coefficient of 0.85, with 85.31% of 1536 mutations correctly predicted to be either stabilizing or destabilizing. The SA28, LN24, C20 and basic5 atom models showed correlation coefficients of 0.82, 0.78, 0.76 and 0.55 respectively. In order to maintain consistent prediction efficiency, stepwise regression methods were used to optimize the number of atoms used for the model. The effect of torsion angle potentials with and without the Gaussian apodisation was compared. The comparison shows that amino acids adopt perturbed torsion angle conformations more often in partially buried beta sheets than in the other structural elements.
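The two validation measures described above, a correlation coefficient and the percentage of mutations assigned to the correct class, can be computed as in the following sketch. It assumes the Pearson correlation coefficient and a sign convention in which positive values mean stabilizing; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sign_accuracy(experimental, predicted):
    """Percentage of mutations whose predicted class matches the experiment.

    Assumed convention: value > 0 means stabilizing, otherwise destabilizing.
    """
    hits = sum(1 for e, p in zip(experimental, predicted)
               if (e > 0) == (p > 0))
    return 100.0 * hits / len(experimental)
```

For example, `sign_accuracy([1.0, -1.0, 2.0, -2.0], [0.5, -0.2, -1.0, -3.0])` counts three of four mutations as correctly classified.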
For the final prediction model, two datasets of point mutations were used to compare theoretically predicted stabilizing energy values with experimental ΔΔG and ΔΔGH2O values from thermal and chemical denaturation experiments respectively. These datasets include 1538 and 1581 mutations respectively and cover 101 proteins that share a wide range of sequence identity. Results were evaluated with a variety of statistical tests. They show a maximum correlation of 0.87 between predicted and experimental ΔΔG values and a prediction accuracy of 85.3% (stabilizing or destabilizing) for all mutations together. A correlation of 0.77 was obtained for the test dataset in both split-sample validation and k-fold cross validation, and a correlation of 0.70 (thermal) was obtained in the jack-knife test. A similar model was implemented and the results were analyzed for mutations with ΔΔGH2O. A correlation of 0.79 was observed with a prediction efficiency of 85.03%. This model can be used for the future prediction of protein structural stability upon point mutations together with various experimental techniques. A web tool (CUPSAT, hosted at http://cupsat.uni-koeln.de) has been developed for this algorithm. It is available as part of the CUBIC bioinformatics toolbox.