K+-channel gene identification with conventional methods such as pattern recognition, and even eMOTIF (Nevill-Manning et al., Enumerating and ranking discrete motifs. Proc. Int. Conf. Intell. Syst. Mol. Biol., 5: 202-209, 1997), produces a large number of false positives because all ion-channel pores are closely related. These methods are therefore of limited use for K+-channel gene screening.
Here we introduce a new method based on a distinctive signature and a new algorithm that analyses not the amino acid sequences themselves but the physico-chemical properties of their residues. Because K+-channels are poorly conserved, the signature comprises only 25 amino acids along the pore region and the selectivity filter. In contrast to other signatures, it represents the broad range of amino acids that can appear at a given position rather than only a few highly conserved residues. In a second step, the algorithm uses this signature to create a highly specific string describing the physico-chemical properties of the pore region and the selectivity filter. A potential hit therefore represents a match to a stringent order of properties rather than to a biased amino acid sequence.
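The idea of matching on properties rather than on residues can be illustrated with a minimal sketch. It is not the published algorithm: the property classes, their one-letter codes, the helper names and the five-position toy signature are all assumptions made for illustration, whereas the actual signature spans 25 positions and its property scheme is not specified here.

```python
# Hypothetical property classes; the scheme used by the actual method is not given here.
PROPERTY_CLASSES = {
    "h": "AVLIMC",  # hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # small / conformationally special
}
RESIDUE_TO_CLASS = {
    aa: code for code, residues in PROPERTY_CLASSES.items() for aa in residues
}


def encode(seq):
    """Translate an amino acid sequence into a string of property codes."""
    return "".join(RESIDUE_TO_CLASS.get(aa, "x") for aa in seq.upper())


def signature_to_property_sets(signature):
    """Turn per-position allowed residues into per-position allowed property codes."""
    return [set(encode(allowed)) for allowed in signature]


def scan(seq, signature):
    """Return start indices of windows whose property string fits the signature."""
    allowed = signature_to_property_sets(signature)
    n = len(allowed)
    hits = []
    for i in range(len(seq) - n + 1):
        codes = encode(seq[i:i + n])
        if all(c in ok for c, ok in zip(codes, allowed)):
            hits.append(i)
    return hits


# Toy signature of 5 positions around a selectivity-filter-like motif (assumption).
TOY_SIGNATURE = ["TS", "VIL", "G", "YF", "G"]

print(encode("TVGYG"))                   # property string of the motif
print(scan("AATVGYGDM", TOY_SIGNATURE))  # window matching residue by residue
print(scan("AASLGFGDM", TOY_SIGNATURE))  # different residues, same property order
```

In this sketch both example sequences are reported as hits at the same position although their residues differ, because the comparison is made on the order of properties.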
The method was developed with a set of 1418 sequences, consisting of 461 K+-channel pore sequences, 178 pore-domain related sequences, 188 K+-channel β-subunits and 591 random sequences. The method was validated using 10-fold cross-validation and statistical analysis of the results.
Using conventional pattern recognition to recover 90% of all K+-channels leads to a false positive rate of 30%. The false positive rate of our method is lower by a factor of 10, even when recovering 99% of the K+-channels.
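A comparison of this kind, the false positive rate at a fixed recovery level, can be computed as in the following sketch. It assumes that each sequence receives a numeric score and that higher scores indicate a K+-channel. The function name and the toy scores are hypothetical and are not data from this study.

```python
import math


def false_positive_rate_at_recall(pos_scores, neg_scores, recall):
    """Return (threshold, fpr) for the highest threshold that still
    recovers at least `recall` of the positive sequences."""
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must score at or above the threshold;
    # the small epsilon guards against floating-point round-up.
    k = max(1, math.ceil(recall * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    false_positives = sum(1 for s in neg_scores if s >= threshold)
    return threshold, false_positives / len(neg_scores)


# Toy scores for illustration only (not results from the study).
toy_pos = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_neg = [0, 1, 2, 3, 0, 0, 0, 0, 0, 0]

threshold, fpr = false_positive_rate_at_recall(toy_pos, toy_neg, recall=0.9)
print(threshold, fpr)
```

Fixing the recovery level first and then reading off the false positive rate allows two methods to be compared at the same sensitivity.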