Campus Event Calendar

Event Entry

What and Who

Fourier Spectral Analysis of DNA Sequences: Coding regions, Intron-Exon Boundaries and Protein Domains

Anguraj Sadanandam
Presentation
AG 1, AG 2, AG 3, AG 4  
Expert Audience

Date, Time and Location

Thursday, 17 January 2002
11:00
30 Minutes
46.1 - MPII
24
Saarbrücken

Abstract

fast and versatile computer algorithms and statistical methods for gene finding and
protein domain identification. Some statistical methods discern correlations within DNA
sequences. A universal property of coding regions is the presence of short-range
correlations related to the codons. Because of this property, the discrete Fourier
spectrum of a coding region shows a distinct peak at f = 1/3, which is absent in
non-coding regions. GeneScan (http://202.41.10.146) is a gene-finding technique that
calculates the peak-to-noise ratio for a sequence. The application of GeneScan to some
prokaryotes and eukaryotes is complicated by the presence of short exons and introns. In
such cases, the window length has to be reduced, which leads to high noise.
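
To make the peak-to-noise idea concrete, here is a minimal sketch. It is not GeneScan's actual implementation. It assumes each base (A, C, G, T) is encoded as a binary indicator sequence, that the total power is the sum of the four squared DFT magnitudes, and that "noise" means the mean power over frequencies k = 1 .. N/2. The function names `power_at`, `peak_to_noise` and `window_scan` are hypothetical.

```python
import cmath

BASES = "ACGT"

def power_at(seq, k):
    """Total spectral power at frequency k/N, summed over the four
    binary indicator sequences (one per base). Assumed encoding."""
    n = len(seq)
    total = 0.0
    for base in BASES:
        coeff = sum(cmath.exp(-2j * cmath.pi * k * j / n)
                    for j, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total

def peak_to_noise(seq):
    """Power at f = 1/3 divided by the mean power over k = 1 .. N//2."""
    seq = seq.upper()
    n = len(seq)
    if n < 6 or n % 3 != 0:
        raise ValueError("length must be a multiple of 3 and at least 6")
    spectrum = [power_at(seq, k) for k in range(1, n // 2 + 1)]
    mean_power = sum(spectrum) / len(spectrum)
    if mean_power < 1e-12:
        # No power away from k = 0 (e.g. a homopolymer): no periodicity signal.
        return 0.0
    return spectrum[n // 3 - 1] / mean_power  # index n//3 - 1 holds k = N/3

def window_scan(seq, window, step):
    """Peak-to-noise ratio for each window of fixed length along seq."""
    return [(start, peak_to_noise(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

For example, `window_scan(dna, 300, 30)` would score overlapping 300-base windows of a string `dna`. Shorter windows contain fewer frequency samples, so the ratio becomes noisier, which is the difficulty with short exons and introns described above.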

A modification of the GeneScan window analysis, called Extended Length Window
Analysis (ELWA), has been tried. In ELWA, a variable window length is used, and an
increase in slope denotes the occurrence of a coding region. For rigorous
discrimination, however, GeneScan and ELWA are combined to locate intron-exon boundaries
to within 5 to 10 bases. This work is preliminary and is being checked against many
genomes.

A few pathogenic organisms are found to have Low Complexity Regions (LCRs) in protein
domains. It is well known that these regions are difficult to find using integrated
methods of protein identification. Since the Fourier spectrum and the peak-to-noise ratio
can pick up periodicities, they can be used to identify proteins with LCRs. This has been
shown for the S-antigen of Plasmodium falciparum and a few other proteins. Hence, the
Fourier technique may assist in the integrated mode of functional genomics. However, it
is likely to work well only if a periodicity database for known protein domains is
available.
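
The same idea can be sketched for protein sequences. The example below is only an illustration under stated assumptions, not the method used in the talk. It assumes one binary indicator sequence per residue type that occurs in the protein, and it scores a single candidate repeat length by comparing the power at the matching frequency with the mean power. The function name `periodicity_score` is hypothetical.

```python
import cmath

def periodicity_score(protein, period):
    """Power at frequency 1/period divided by the mean power over
    k = 1 .. N//2, summed over per-residue indicator sequences."""
    protein = protein.upper()
    # Trim so the length is a whole multiple of the candidate period;
    # then 1/period falls exactly on a DFT frequency.
    n = len(protein) - len(protein) % period
    if period < 2 or n < 2 * period:
        raise ValueError("need at least two full repeats of the period")
    protein = protein[:n]
    residues = sorted(set(protein))

    def power_at(k):
        total = 0.0
        for res in residues:
            coeff = sum(cmath.exp(-2j * cmath.pi * k * j / n)
                        for j, ch in enumerate(protein) if ch == res)
            total += abs(coeff) ** 2
        return total

    spectrum = [power_at(k) for k in range(1, n // 2 + 1)]
    mean_power = sum(spectrum) / len(spectrum)
    if mean_power < 1e-12:
        # A single-residue run has no power away from k = 0.
        return 0.0
    return spectrum[n // period - 1] / mean_power  # k = N / period
```

A tandem repeat of length 11, for instance, should score high for `period=11`, while a sequence without that repeat length should not. A periodicity database of the kind mentioned above could, in principle, supply the candidate periods to test.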

Contact

Roxane Wetzel
-900