ECCB 2002 Poster sorted by: Author | Number

Next | Previous poster (in order of the view you have selected)

Title: A new model for Sequencing by Hybridization
P49
Guinand, Frédéric; Mouchard, Laurent; Rabia, Aurélia

frederic.guinand@univ-lehavre.fr, laurent.mouchard@univ-rouen.fr., aurelia.rabia@univ-lehavre.fr
University of le Havre

One of the most important problems in bioinformatic is the DNA-sequencing, that consists in determining the sequence of nucleotides of a given target piece of DNA. Several methods exist, mainly gel-based technologies, but a new method has been developped at the end of the Eighties, called sequencing by hybridization (SBH). This method produces a spectrum that is a set of oligonucleotides of the same length that are contained into the original sequence. But this set may contain different kinds of errors : the errors proceeding from the experiment that are positive and negative errors and the errors proceeding from the process, repetitions. Positive errors are oligonucleotides that are present in the spectrum and not in the original sequence. Conversely, oligonucleotides which are negative errors are not present in the spectrum and present in the original sequence. If the sequence contains repetitions, any oligonucleotides present more than once in the original sequence will appear only once in the spectrum.
The computational problem is to rebuild the original sequence from a possible erroneous spectrum. For more than ten years, many works has been dedicated to this problem. All models use different graph formulations according to kinds of errors that have been taken into account, either no error or only repetitions or exclusively both positive and negative errors, but never positive error, negative error and repetitions together. In [6] and [5], a perfect spectrum is used, in [4] only repetitions are taken into account, in [3] both negative and positive errors are taken into consideration but no repetition was allowed. Finally the first attempt to take all kinds of errors into account was presented in [2], but the model was not suitable to represent simultaneously positive and negative errors and repetitions.
The previous models did always not contain the original sequence and no method was not be able to determine the solution. Therefore a better model must be found. This better model should contain additional vertices in order to represent potential negative errors. With the new model, if oligonucleotides that appear in repetitions are lost, the original sequence may be rebuilt. This model by allowing for the manipulation of all types of errors is more consistent than the existing ones. Moreover, we have theoretically proved that given a limited set of assumptions this model always contains the solution (the original sequence). It was hypothesized that the number of consecutive negative errors is less than the length of a probe and that the last oligonucleotide of the sequence is not a negative error. We have called this new model SBH-graphs. A series of tests was realized. On the average, for oligonucleotides of length 7, the spectrum cardinal is equal to 63 and that of, SBH-graph's 1275, for oligonucleotides of length 8, the spectrum has 63 oligonucleotides and the SBH-graph 1853 and for oligonucleotides of length 10 the cardinal is 61 for the spectrum and 3109 for the SBH-graph. To resume, the SBH-graph is enable to refind all repetitions and all possible negative and positive errors. The future perspectives are to optimize the graph building and to apply resolution methods, such as [1].
[1] C. Bertelle, A. Dutot, F. Guinand and D. Olivier. DNA-Sequencing by Hybridization based on a Multi-Castes Ant System. In Proceedings of NETTAB'02 ("Agents in Bioinformatics"), pages 1-7, Bologna (Italy), July, 12th-14th 2002 [2] J. Blazewicz, P. Formanowicz, F. Guinand and M. Kasprzak. Heuristic managing errors for dna-sequencing. Bioinformatics, 18:652-660, 2002.
[3] J. Blazewicz, P. Formanowicz, M. Kasprzak, W. T. Markiewicz and J. Weglarz. DNA sequencing with positive and negative errors. Journal of Computational Biology, 6:113-123, 1999.
[4] A. Guénoche. Can we recover a sequence, just knowing all its subsequences of given length?. Computer Applications in the Biosciences (CABIOS), 8:569-574, 1992.
[5] P.A. Pevzner. l-tuple DNA sequencing: computer analysis. Journal of Biomolecular Structure and Dynamics, 7:63-73, 1989.
[6] Lysov, P., Yu, V. L., Florentiev, A. A., Khorlin, K. R. ,Khrapko, V. V., Shik and A. D. Mirzabekov. bDetermination of the nucleotide sequence of DNA using hybridization with oligonucleotides. Doklady Akademii Nauk SSSR, 303:1508-1511, 1988.