Determining the function of proteins is a problem with immense practical
impact on the identification of drug targets and the reduction of potential
side effects. Unfortunately, experimental determination of protein function
is an expensive and time-consuming process. For this reason, algorithms
have been designed to maximize experimental impact by identifying additional
proteins that may have biological function similar to proteins with
experimentally determined function. The algorithms we discuss in this talk
implement this approach by searching for "matches" of geometric and chemical
similarity between "motifs", representing known functional sites, and
substructures of functionally uncharacterized proteins ("targets"). The
identification of a match could imply the existence of similar active sites,
and thus a function similar to the experimentally determined one.
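
As a rough illustration of what a "match" between a motif and a target substructure could look like, the sketch below compares labeled 3D points. It is not the Match Augmentation algorithm: the point representation, the function names `distance_rmsd` and `find_matches`, the brute-force enumeration, and the use of pairwise-distance RMSD as the geometric measure are all assumptions made for this example.

```python
import itertools
import math

# Hypothetical representation: a point is (label, (x, y, z)), where the
# label stands in for chemical identity (for example, an atom or residue type).

def distance_rmsd(a, b):
    """Root-mean-square difference between corresponding pairwise distances.

    Invariant to rotation and translation, so no superposition is needed.
    It cannot tell mirror images apart. Requires at least two points.
    """
    pairs = list(itertools.combinations(range(len(a)), 2))
    total = 0.0
    for i, j in pairs:
        da = math.dist(a[i][1], a[j][1])
        db = math.dist(b[i][1], b[j][1])
        total += (da - db) ** 2
    return math.sqrt(total / len(pairs))

def find_matches(motif, target, threshold):
    """Brute-force search over ordered subsets of target points.

    A candidate qualifies when every label agrees with the motif (chemical
    similarity) and the distance RMSD is within threshold (geometric
    similarity). Exponential in motif size; for illustration only.
    """
    matches = []
    for candidate in itertools.permutations(target, len(motif)):
        if all(m[0] == c[0] for m, c in zip(motif, candidate)):
            score = distance_rmsd(motif, candidate)
            if score <= threshold:
                matches.append((score, candidate))
    return sorted(matches, key=lambda item: item[0])

motif = [("N", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)), ("C", (0.0, 1.0, 0.0))]
# The target holds a translated copy of the motif plus one decoy oxygen.
target = [("C", (5.0, 6.0, 5.0)), ("N", (5.0, 5.0, 5.0)),
          ("O", (6.0, 5.0, 5.0)), ("O", (9.0, 9.0, 9.0))]
matches = find_matches(motif, target, threshold=0.1)
```

Here the only candidate that survives is the translated copy of the motif; the candidate using the decoy oxygen is rejected on geometry alone.
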
Successfully identifying functional homologs is a multifaceted problem that
requires effective motifs, efficient algorithms, and methods for filtering
out matches that are unlikely to indicate similar function. We begin by
describing Match Augmentation, our algorithm for efficiently identifying
matches, which we have shown to identify cognate active sites on a test set
of known homologs. However, when searching for matches within the entire
PDB, we encounter matches to many functionally unrelated proteins. For this
reason, we developed a method for computing the statistical significance of
a match, and showed that matches to cognate active sites are statistically
significant. This permitted us to filter out matches with geometric
similarity insufficient to identify functionally related proteins. Finally,
we designed a novel distributed algorithm, Geometric Sieving, which refines
motif definitions based on geometric properties, producing more effective
motifs with greater geometric and chemical similarity to cognate active
sites and diminished similarity to functionally unrelated proteins. In the
near future, we plan to combine these algorithms for motif design, match
search, and match filtering into a single pipeline for automated, accurate
function prediction.
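
The idea of filtering matches by statistical significance can be sketched with an empirical p-value. The abstract does not say how significance is actually computed, so everything here is an assumption: the background set of scores (imagined as match scores of the same motif against functionally unrelated proteins), the convention that lower scores mean closer geometric similarity, the cutoff `alpha`, and the names `empirical_p_value` and `filter_matches`.

```python
def empirical_p_value(observed, background):
    """Estimate how often a background match scores at least as well.

    Lower scores are assumed to mean closer geometric similarity. The +1
    terms keep the estimate above zero for a finite background sample.
    """
    as_good = sum(1 for score in background if score <= observed)
    return (as_good + 1) / (len(background) + 1)

def filter_matches(scored_matches, background, alpha=0.05):
    """Keep (name, score) pairs whose empirical p-value is at most alpha."""
    return [
        (name, score)
        for name, score in scored_matches
        if empirical_p_value(score, background) <= alpha
    ]

# Hypothetical background: 99 scores from matches to unrelated proteins.
background = [float(i) for i in range(1, 100)]
kept = filter_matches([("candidate_a", 0.5), ("candidate_b", 50.0)], background)
```

With this made-up background, `candidate_a` scores better than every background match and is kept, while `candidate_b` falls in the middle of the background distribution and is filtered out.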