Campus Event Calendar

Event Entry

What and Who

Fast methylation calling on mammalian bisulfite sequencing data

Jonas Fischer
MMCI
PhD Application Talk

Graduate Student Informatics, UdS
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Tuesday, 10 October 2017
11:00
60 Minutes
E1 4
024
Saarbrücken

Abstract

Major advances in Next Generation Sequencing (NGS) have improved accuracy while drastically reducing the time required to sequence large genomic libraries. One such library type is whole genome bisulfite sequencing (WGBS), which measures DNA methylation, a covalent modification of the DNA. DNA methylation is one of the most studied epigenetic marks and is known to drive developmental changes in the genomic landscape such as X-chromosome inactivation, genomic imprinting, and silencing of pluripotency-associated genes. Furthermore, DNA methylation is associated with neurodegenerative diseases and cancer, where affected cells show aberrant methylation patterns.
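
The measurement principle behind WGBS can be illustrated with a small simulation. Sodium bisulfite converts unmethylated cytosines to uracil, which is read as thymine after PCR, while methylated cytosines are protected and are still read as C. The methylation level at a reference C can then be estimated as the fraction of aligned reads that show a C there. This is a simplified sketch that assumes a single forward strand and a known set of methylated positions; the function names are hypothetical and not taken from the presented software.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate a forward-strand read after bisulfite treatment and PCR.

    Assumption: `methylated` holds 0-based positions of methylated C's;
    every other C is treated as unmethylated and is therefore read as T.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylation_level(c_reads: int, t_reads: int) -> float | None:
    """Fraction of aligned reads showing C at a reference C (None if uncovered)."""
    total = c_reads + t_reads
    return c_reads / total if total else None


ref = "ACGTCGACCG"
print(bisulfite_convert(ref, methylated={1, 8}))  # ACGTTGATCG
print(methylation_level(c_reads=3, t_reads=1))     # 0.75
```

The C-to-T change is what makes alignment harder than ordinary read mapping: a T in a read may stem from either a T or an unmethylated C in the genome.
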
However, while advances in NGS have reduced the time needed to sequence WGBS libraries, the analysis algorithms remain slow and can take days for a large data set, a serious bottleneck in current applications. In this talk, I will explain the main computational problems that arise with the WGBS protocol, abstract them into a string matching problem, and show how state-of-the-art approaches tackle this problem. I will then outline the main ideas of our solution to overcome the limitations of current software. Our approach revolves around a succinct index of the reference genome built on a fast cyclic rolling hash function tailored to k-mers of genomic sequences. To align bisulfite reads to the reference genome, we use this index to find candidate regions in the genome, which are then filtered by several heuristics to drastically reduce the search space. The remaining candidates are validated by a modified Shift-And automaton that allows for asymmetric C/T mapping (a T in a read may align to a C in the reference, but a C in a read may not align to a T). The talk concludes with a benchmark on sampled and real data showing that our approach is an order of magnitude faster than competing algorithms while achieving similar or even better mapping accuracy.
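
A minimal sketch of the indexing idea, assuming a Buzhash-style cyclic polynomial hash (a generic textbook construction, not necessarily the hash used in the presented tool): each nucleotide gets a fixed random 64-bit seed, a k-mer hashes to the XOR of its seeds rotated by their distance from the window end, and sliding the window by one base costs one rotation of the old hash plus two XORs. The index and candidate search additionally assume a three-letter scheme in which both reference and read are converted C to T before hashing, and a simple voting heuristic for candidate start positions. SEED, build_index, candidate_starts and min_votes are hypothetical names.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from collections.abc import Iterator

WORD = 64
MASK = (1 << WORD) - 1
# Hypothetical fixed 64-bit seeds, one per nucleotide (any distinct random values work).
SEED = {"A": 0x3C8BFBB395C60474, "C": 0x3193C18562A02B4C,
        "G": 0x20323ED082572324, "T": 0x295549F54BE24456}


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit word left by r bits."""
    r %= WORD
    return ((x << r) | (x >> (WORD - r))) & MASK


def kmer_hash(kmer: str) -> int:
    """Direct hash: XOR of seeds, base i rotated by (k - 1 - i)."""
    k = len(kmer)
    h = 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[base], k - 1 - i)
    return h


def rolling_hashes(seq: str, k: int) -> Iterator[tuple[int, int]]:
    """Yield (position, hash) for every k-mer, updating in O(1) per step."""
    if len(seq) < k:
        return
    h = kmer_hash(seq[:k])
    yield 0, h
    for i in range(1, len(seq) - k + 1):
        # Shift all bases one rotation, drop the outgoing base, add the incoming one.
        h = rol(h, 1) ^ rol(SEED[seq[i - 1]], k) ^ SEED[seq[i + k - 1]]
        yield i, h


def build_index(ref: str, k: int) -> dict[int, list[int]]:
    """Map k-mer hash -> reference positions (reference converted C to T first)."""
    index: defaultdict[int, list[int]] = defaultdict(list)
    for pos, h in rolling_hashes(ref.replace("C", "T"), k):
        index[h].append(pos)
    return dict(index)


def candidate_starts(read: str, index: dict[int, list[int]], k: int,
                     min_votes: int = 2) -> list[int]:
    """Vote for read start positions; keep those supported by >= min_votes k-mers."""
    votes: Counter[int] = Counter()
    for off, h in rolling_hashes(read.replace("C", "T"), k):
        for pos in index.get(h, ()):
            votes[pos - off] += 1
    return sorted(s for s, v in votes.items() if v >= min_votes)


ref = "TTGACCGATCGGATCCATGCA"
index = build_index(ref, k=4)
# Read drawn from ref[5:15] with one unmethylated C converted to T; 5 is among the candidates.
print(candidate_starts("CGATTGGATC", index, k=4))
```

Hash collisions and repeats can only add false candidates, never remove the true one, so the filter does not need a collision-free hash: the verification step rejects the extras.
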

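The verification step can be sketched with the classic bit-parallel Shift-And exact matcher, modified so that the match relation is asymmetric: a T in the read may match a C or a T in the reference (an unmethylated, converted C), while a C in the read matches only a C. This sketch assumes exact matching on the forward (C-to-T) strand only; the reverse-strand G/A case and any error tolerance of the presented automaton are not modeled, and the function names are hypothetical.

```python
from __future__ import annotations


def build_masks(read: str) -> dict[str, int]:
    """Bit i of masks[ref_base] is set if read[i] may align to ref_base.

    Asymmetric bisulfite rule (forward strand): read T matches ref C or T,
    read C matches only ref C; A and G match only themselves.
    """
    masks = {base: 0 for base in "ACGT"}
    for i, read_base in enumerate(read):
        for ref_base in "ACGT":
            if read_base == ref_base or (read_base == "T" and ref_base == "C"):
                masks[ref_base] |= 1 << i
    return masks


def shift_and(read: str, ref: str) -> list[int]:
    """Return all 0-based start positions where `read` matches `ref` exactly
    under the asymmetric C/T rule (bit-parallel Shift-And)."""
    if not read:
        return []
    masks = build_masks(read)
    accept = 1 << (len(read) - 1)  # state bit meaning "whole read matched"
    state = 0
    hits = []
    for j, ref_base in enumerate(ref):
        # Extend every partial match by one base; non-ACGT bases (e.g. N) match nothing.
        state = ((state << 1) | 1) & masks.get(ref_base, 0)
        if state & accept:
            hits.append(j - len(read) + 1)
    return hits


print(shift_and("ATG", "ACGTCGA"))  # [0]: read T aligns to reference C
print(shift_and("ACG", "ATG"))      # []: read C never aligns to reference T
```

Because only the character masks change, the asymmetry costs nothing at search time: each reference base still takes one shift, one OR and one AND.
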
Contact

IMPRS Office Team
0681 93251800

Tags, Category, Keywords and additional notes

Aaron Alsancak, 10/09/2017 13:46 -- Created document.