ECCB 2002 Poster

Title: An Approach to Support the Comparison of Microbial Genomic DNA Sequences with Spatial Knowledge of Genomic Structures
P174
Wetjen, Tom

twetjen@tzi.de
University of Bremen, Center for Computing Technologies (TZI)

Recently, there has been growing interest in bioinformatics in the interspecies comparison of complete microbial genomic DNA sequences. Aligning such sequences allows coding and regulatory regions to be identified, helps to shed light on the evolution of microbial organisms (archaea and bacteria), and supports the understanding of their metabolic pathways (Schwartz et al. 2000). Existing alignment tools for this task, e.g. MUMmer (Delcher et al. 1999) and PipMaker (Schwartz et al. 2000), search for local similarities and align the identified subsequences. This procedure is necessary because a global (end-to-end) alignment strategy would align unrelated regions in the frequent case of genome rearrangements, including gene duplications and changes of orientation (Miller 2001). To avoid inconsistencies caused by repeats, by paralogous vs. orthologous genes, or by any other unrelated region (i.e. any false positive hit), the tools usually keep only subsequences whose local similarity value exceeds a minimum threshold. When comparisons are made between species, however, such a threshold limits the possibility of identifying less conserved structures such as regulatory regions. Beyond the similarity value, identified subsequences can be assessed more accurately by evaluating each local similarity in the biological context in which it is located (e.g. gene, regulatory region, operon). Knowledge of the genomic structures is used to test the biological consistency of the order of the local similarities found between two genomes, which in turn allows the threshold on the local similarity value to be lowered.
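The threshold trade-off can be made concrete with a minimal Python sketch. It assumes a hypothetical hit format (`Hit` with start positions in both genomes and a score) and hypothetical thresholds `high` and `low`, and it replaces the poster's structure-based test with a much simpler order check: a hit scoring below the strict threshold is kept only if it lies on the same side of every high-scoring anchor in both genomes. This is an illustrative stand-in, not the method described on the poster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One local similarity between genome A and genome B (hypothetical format)."""
    a_pos: int    # start position in genome A
    b_pos: int    # start position in genome B
    score: float  # local similarity value


def select_hits(hits, high, low):
    """Keep hits scoring >= high; rescue hits in [low, high) whose order fits.

    Simplified stand-in for a context-based test: a rescued hit must lie on
    the same side of every high-scoring anchor in both genomes. Hits below
    `low` are discarded outright.
    """
    anchors = [h for h in hits if h.score >= high]
    kept = list(anchors)
    for h in hits:
        if low <= h.score < high:
            # Indices of the anchors that precede this hit in each genome.
            before_in_a = {i for i, x in enumerate(anchors) if x.a_pos < h.a_pos}
            before_in_b = {i for i, x in enumerate(anchors) if x.b_pos < h.b_pos}
            if before_in_a == before_in_b:
                kept.append(h)
    return sorted(kept, key=lambda h: h.a_pos)


hits = [
    Hit(100, 150, 95), Hit(500, 560, 90),  # strong anchors
    Hit(300, 340, 40),                      # weak, but between the anchors in both genomes
    Hit(320, 900, 45),                      # weak and out of order in genome B
    Hit(50, 60, 10),                        # below the lower threshold
]
for h in select_hits(hits, high=80, low=30):
    print(h.a_pos, h.b_pos, h.score)
```

With a single strict threshold of 80, only the two anchors would survive; the order check additionally admits the weak hit at (300, 340) while still rejecting the weak hit that contradicts the anchor order in genome B.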
The approach treats bacterial genome sequences as linear, as they appear in database entries, rather than circular, as most bacterial genomes are in vivo. This makes it possible to describe the order of genomic structures in a given genome qualitatively, using relations such as before, after, or overlaps. Such a system of relations between intervals is known from research on temporal and spatial reasoning (Beek and Manchak 1996). Thirteen basic relations can hold between two genomic structures: before, meets, overlaps, starts, during, finishes, their six inverses, and equals. To represent indefinite information about two genomic structures, a relation may also be a disjunction of these basic relations (e.g. before or meets). Genomic structures can thus be formalized as variables, with binary constraints defined between pairs of them, so that the reasoning tasks can be formulated as a constraint satisfaction problem (CSP). CSPs in this interval-based framework can be solved with path consistency and backtracking algorithms (Beek and Manchak 1996).
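A minimal Python sketch can make the interval formalism concrete. It classifies the basic relation between two intervals, derives the composition table by enumerating small integer intervals (a shortcut chosen here for brevity instead of a hand-written table), and runs a simple path-consistency loop over a network of disjunctive constraints. The relation codes, the function names, and the promoter/gene/operon example are hypothetical illustrations. Backtracking search is omitted, and path consistency alone does not decide consistency for the full interval algebra.

```python
from itertools import product


def relation(x, y):
    """Return the basic Allen relation holding between intervals x and y.

    Intervals are (start, end) pairs with start < end, e.g. coordinates of
    genomic structures on a linearized sequence. Codes: b=before, m=meets,
    o=overlaps, s=starts, d=during, f=finishes, eq=equals; a trailing "i"
    marks the inverse relation (bi=after, mi=met-by, and so on).
    """
    (xs, xe), (ys, ye) = x, y
    if xe < ys:
        return "b"
    if xe == ys:
        return "m"
    if ye < xs:
        return "bi"
    if ye == xs:
        return "mi"
    if xs == ys:
        return "eq" if xe == ye else ("s" if xe < ye else "si")
    if xe == ye:
        return "f" if xs > ys else "fi"
    if ys < xs and xe < ye:
        return "d"
    if xs < ys and ye < xe:
        return "di"
    return "o" if xs < ys else "oi"


# Three intervals have at most six distinct endpoints, so enumerating all
# intervals over the points 0..5 realizes every qualitative arrangement. This
# yields the inverse of each relation and the full 13 x 13 composition table.
_INTERVALS = [(a, b) for a in range(6) for b in range(6) if a < b]
INVERSE = {relation(x, y): relation(y, x) for x, y in product(_INTERVALS, repeat=2)}
BASIC = frozenset(INVERSE)  # the 13 basic relations
COMPOSE = {}
for x, y, z in product(_INTERVALS, repeat=3):
    COMPOSE.setdefault((relation(x, y), relation(y, z)), set()).add(relation(x, z))


def compose(r1, r2):
    """Compose two disjunctive relations, given as sets of basic relations."""
    result = set()
    for a, b in product(r1, r2):
        result |= COMPOSE[(a, b)]
    return result


def path_consistency(n, constraints):
    """Refine a network of n interval variables until it is path consistent.

    `constraints` maps (i, j) to a set of basic relations; absent pairs are
    unconstrained. Returns the refined constraint matrix, or None as soon as
    some constraint becomes empty (the network is then inconsistent).
    """
    C = [[set(BASIC) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        C[i][i] = {"eq"}
    for (i, j), rel in constraints.items():
        C[i][j] &= set(rel)
        C[j][i] = {INVERSE[r] for r in C[i][j]}
        if not C[i][j]:
            return None
    changed = True
    while changed:  # repeat full sweeps until a fixed point is reached
        changed = False
        for k, i, j in product(range(n), repeat=3):
            refined = C[i][j] & compose(C[i][k], C[k][j])
            if refined != C[i][j]:
                if not refined:
                    return None
                C[i][j] = refined
                C[j][i] = {INVERSE[r] for r in refined}
                changed = True
    return C


# Hypothetical example: a promoter P, a gene G and an operon O.
P, G, O = 0, 1, 2
net = path_consistency(3, {
    (P, G): {"b", "m"},             # promoter before or meeting the gene
    (G, O): {"s", "d", "f", "eq"},  # gene contained in the operon
})
print(sorted(net[P][O]))  # ['b', 'd', 'm', 'o', 's']
```

Although no constraint between promoter and operon is given, path consistency infers that the promoter lies before, meets, overlaps, starts, or lies during the operon. A cyclic network such as "P before G, G before O, O before P" is detected as inconsistent and returns None.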
The poster describes the approach in detail, including the modeling of a reference genome. Building such a model requires effort, but once it is available it can be reused in further analyses. The approach increases the computational cost of comparing microbial genome sequences. On the other hand, it can increase the biological plausibility of the results of existing methods and may help to refine our knowledge of bacterial genome organization.
[1] Beek, P.v. and D.W. Manchak. 1996. The Design and Experimental Analysis of Algorithms for Temporal Reasoning. Journal of Artificial Intelligence Research 4: 1-18.
[2] Delcher, A.L., S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg. 1999. Alignment of Whole Genomes. Nucleic Acids Research 27: 2369-2376.
[3] Miller, W. 2001. Comparison of Genomic DNA Sequences: Solved and Unsolved Problems. Bioinformatics 17: 391-397.
[4] Schwartz, S., Z. Zhang, K.A. Frazer, A. Smit, C. Riemer, J. Bouck, R. Gibbs, R. Hardison, and W. Miller. 2000. PipMaker - A Web Server for Aligning Two Genomic DNA Sequences. Genome Research 10: 577-586.