Siebert, Sven - Multiple Local Sequence Structure Alignment of RNA Sequences

ECCB 2002 Poster sorted by: Author | Number

Next | Previous poster (in order of the view you have selected)

Title: Multiple Local Sequence Structure Alignment of RNA Sequences	P152
Siebert, Sven siebert@inf.uni-jena.de Chair of Bioinformatics, Institute of Computer Science, Friedrich-Schiller University Jena, D-07443 Jena

We present a new method that finds and aligns local sequence structure segments in a set of RNA sequences. Recognising structural features of RNA sequences allows us to find highly probable sequence structure segments with high similarities among them. Multiple alignment of those segments is given.
In order to overcome difficulties of NP-hardness we restrict the search space of finding these segments by giving structural information such as a stem loop. We build a set of structures around the given structure with distance at most a constant d. We use the following edit operations with their corresponding costs given shortly in brackets : insertion of a base [1], deletion of a base [1] and breaking a base pair into two bases [2]. These generated structures are used for finding corresponding RNA sequence segments,i.e. there exists a base pair (i,j) in the structure iff there exists a k such that the pair (k+i,k+j) is a base pair in the RNA sequence. The aim is to assign to each RNA a local structure out of the set of already constructed structures fulfilling
-the local structure occurs with high probability,
-the similarity among all other local sequence structures is maximised.
Scanning through the set of generated structures and finding all occurrence positions in RNA sequences allows us to compute the probabilities of structure occurrences based on a modified computational scheme by McCaskill [1]. A list for each RNA keeps the m best structures, i.e. with the highest probabilities. Following Zhang [3], we implemented a dynamic programming method for scoring and aligning these RNA subsequences pairwise whereas nm(nm-1)/2 comparisons are needed. A branch and bound technique starts searching for assigning the sequences to their substructures. It chooses the structures on the basis of minimising the sum of pairwise edit distances of local structures multiplied by their probabilities. This yields the substructures for every RNA sequence. Local multiple alignment is done by considering pairwise alignments with their appropriate scores from Zhang. Extended libraries are constructed after an essential idea from T-Coffee[4]. Weights between pairwise bases reflect some information in the whole alignment.We give an example of RNA sequences from SELEX experiments searching for a stem loop.

[1] J.S. McCaskill. The equilibrium partition function and base pair binding proba-bilities, Biopolymers, vol.29, no.6-7, pp.1105-19, 1990
[2] Gorodkin, J and Heyer, LJ and Stormo, GD. Finding the Most Significant Com-mon Sequence and Structure Motifs in a Set of RNA Sequences, Nucleic Acids Res, vol.25, no.18, pp.3724-32, 1997
[3] Jiang, Tao and Lin, Guohui and Ma, Bin and Zhang, Kaizhong. A General Edit Distance between RNA Structures, JCB, vol.9, no.2, pp.371-88, 2002
[4] C. Notredame and D. G. Higgins and J. Heringa. T-Coffee: A novel method for fast and accurate multiple sequence alignment, JMB, vol.302, no.1, pp.205-17, 2000