ECCB 2002 Poster

Title: Non-Gumbel behavior of the statistics of local sequence alignment
P56
Hartmann, Alexander K.

hartmann@theorie.physik.uni-goettingen.de
Universität Göttingen, Institut für Theoretische Physik

The statistics of local alignments of protein sequences are studied, i.e. the distribution p(S) of optimum alignment scores S for pairs of random sequences. Amino acid sequences distributed according to the background frequencies obtained by Robinson and Robinson [1] are used together with the BLOSUM62 scoring matrix [2] and (12,1) affine gap costs. The focus here is on the rare-event tail of p(S). In the underlying random model, scores in this region are very unlikely (e.g. p(S) ~ 10^-40), yet this is precisely the region where biologically relevant scores are found.
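To make "optimum score" concrete: a standard way to compute the best local alignment score of one sequence pair is the Smith-Waterman algorithm with Gotoh's recursion for affine gaps. The sketch below is illustrative only. It assumes that "(12,1)" means a gap of length l costs 12 + (l - 1), and it uses a hypothetical toy substitution score (+5 for identical residues, -4 otherwise) in place of BLOSUM62, which is not reproduced here.

```python
def toy_score(a, b):
    """Hypothetical stand-in for BLOSUM62: +5 for identical residues, -4 otherwise."""
    return 5 if a == b else -4


def local_alignment_score(x, y, score, alpha=12, beta=1):
    """Optimum local alignment score (Smith-Waterman with Gotoh's affine gaps).

    Assumed gap convention: a gap of length l costs alpha + beta * (l - 1).
    """
    n, m = len(x), len(y)
    NEG = float("-inf")
    # H[i][j]: best score of a local alignment ending at x[i-1], y[j-1] (0 = restart)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # E: alignments ending with y[j-1] against a gap; F: ending with x[i-1] against a gap
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - alpha, E[i][j - 1] - beta)  # open or extend
            F[i][j] = max(H[i - 1][j] - alpha, F[i - 1][j] - beta)  # open or extend
            H[i][j] = max(0, H[i - 1][j - 1] + score(x[i - 1], y[j - 1]), E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

For example, aligning `"A"*10 + "CCC" + "D"*10` against `"A"*10 + "D"*10` gives 20 matches (100) minus one gap of length 3 (12 + 2), i.e. 86.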

In previous studies [3], the distribution of scores was obtained by direct sampling of random sequence pairs. In the frequent-event region accessible in this way, the results agreed well with the extreme-value (or Gumbel) distribution. However, this approach has the drawback that only the high-probability, low-scoring region can be reached.
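For reference, the Gumbel form against which such data are usually compared can be written as below (standard Karlin-Altschul parameterization; for gapped alignments the parameters λ and K are not known analytically and have to be estimated numerically). The last lines state, as simple arithmetic, why direct sampling cannot reach the rare-event tail.

```latex
% Gumbel (extreme-value) form; K and \lambda are fitted parameters,
% m and n are the lengths of the two sequences.
P(S < x) \approx \exp\!\left(-K\,m\,n\,e^{-\lambda x}\right),
\qquad
p(S) = \lambda\, e^{-\lambda (S-u)} \exp\!\left(-e^{-\lambda (S-u)}\right),
\qquad
u = \frac{\ln(K m n)}{\lambda}.

% For large scores the tail is a pure exponential, i.e. \ln p(S) is linear in S:
p(S) \simeq \lambda\, e^{-\lambda (S-u)} \qquad (S \gg u).

% Direct sampling with N independent pairs cannot resolve probabilities
% much below 1/N, so reaching p(S) \sim 10^{-40} would need N \gtrsim 10^{40}.
```

Any deviation from the Gumbel form in the tail therefore shows up as curvature in a plot of ln p(S) versus S.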

Here, a new method [8] for calculating probability distributions in regions where events are very unlikely is applied. The basic idea is to map the underlying model onto a physical system kept at a temperature T, with the alignment score playing the role of a (negative) energy. The system is simulated at low temperatures, so that configurations that are improbable in the original model (i.e. sequence pairs with high scores) are generated preferentially. Since the equilibrium distribution of such a physical system is known from statistical mechanics, the original, unbiased distribution can be recovered from the simulated one by reweighting.
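The reweighting step can be made explicit. If sequence pairs are sampled with weight proportional to their background probability times e^{S/T}, the sampled score histogram is P_T(S) = p(S) e^{S/T} / Z(T), so p(S) = Z(T) P_T(S) e^{-S/T}. The sketch below illustrates this with Metropolis Monte Carlo; it is a minimal toy version under stated assumptions, not the implementation of [8]. The 4-letter alphabet, uniform frequencies, +5/-4 scoring, gap convention and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy model (assumptions): 4 letters, uniform background frequencies,
# +5/-4 substitution scores, gap of length l costs alpha + beta * (l - 1).
ALPHABET = "ACDE"
FREQS = [0.25, 0.25, 0.25, 0.25]


def score(a, b):
    return 5 if a == b else -4


def sw_score(x, y, alpha=12, beta=1):
    """Optimum local alignment score (Smith-Waterman/Gotoh), row by row."""
    m = len(y)
    NEG = float("-inf")
    H_prev, F_prev = [0] * (m + 1), [NEG] * (m + 1)
    best = 0
    for i in range(1, len(x) + 1):
        H, F = [0] * (m + 1), [NEG] * (m + 1)
        e = NEG  # best score ending with y[j-1] against a gap, current row
        for j in range(1, m + 1):
            e = max(H[j - 1] - alpha, e - beta)
            F[j] = max(H_prev[j] - alpha, F_prev[j] - beta)
            H[j] = max(0, H_prev[j - 1] + score(x[i - 1], y[j - 1]), e, F[j])
            best = max(best, H[j])
        H_prev, F_prev = H, F
    return best


def sample_scores(n, T, steps, seed=0):
    """Metropolis sampling of sequence pairs with weight Q(x, y) * exp(S / T).

    Each move redraws one residue from the background frequencies, so the
    Q factors cancel and the acceptance probability is min(1, exp(dS / T)).
    """
    rng = random.Random(seed)
    x = rng.choices(ALPHABET, FREQS, k=n)
    y = rng.choices(ALPHABET, FREQS, k=n)
    s = sw_score(x, y)
    hist = Counter()
    for _ in range(steps):
        seq = x if rng.random() < 0.5 else y
        pos = rng.randrange(n)
        old = seq[pos]
        seq[pos] = rng.choices(ALPHABET, FREQS)[0]
        s_new = sw_score(x, y)
        if s_new >= s or rng.random() < math.exp((s_new - s) / T):
            s = s_new  # accept the move
        else:
            seq[pos] = old  # reject: restore the residue
        hist[s] += 1
    return hist


def reweight(hist, T):
    """ln p(S) = ln P_T(S) - S / T, up to an unknown additive constant ln Z(T)."""
    total = sum(hist.values())
    return {S: math.log(c / total) - S / T for S, c in hist.items()}
```

For example, `reweight(sample_scores(20, 2.0, 5000), 2.0)` yields ln p(S) up to a constant for the high-score window favored at T = 2. One way to fix the constants is to match histograms from several temperatures (and from unbiased direct sampling) in their overlapping score ranges; a real study would also discard the initial equilibration steps.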

Here, significant deviations of the rare-event tail of p(S) from the extreme-value distribution are found for medium-length, i.e. biologically relevant, sequences (see Figure; n = m denotes the sequence length). Nevertheless, this deviation decreases with growing sequence length, so that the Gumbel distribution is recovered for sequences of infinite length.
    [1] A.B. Robinson, L.R. Robinson, Proc. Natl. Acad. Sci. USA 88, 8880 (1991).
    [2] S. Henikoff, J.G. Henikoff, Proc. Natl. Acad. Sci. USA 89, 10915 (1992).
    [3] T.F. Smith, M.S. Waterman, C. Burks, Nucleic Acids Res. 13, 645 (1985).
    [4] J.F. Collins, A.F.W. Coulson, A. Lyall, CABIOS 4, 67 (1988).
    [5] R. Mott, Bull. Math. Biol. 54, 59 (1992).
    [6] M.S. Waterman, M. Vingron, Proc. Natl. Acad. Sci. USA 91, 4625 (1994); Stat. Sci. 9, 367 (1994).
    [7] S.F. Altschul, W. Gish, Methods in Enzymology 266, 460 (1996).
    [8] A.K. Hartmann, Phys. Rev. E 65, 056102 (2002).