Max-Planck-Institut für Informatik

What and Who
Title:Optimising de novo transcriptome assembly
Speaker:Dilip Ariyur Durai
coming from:International Max Planck Research School for Computer Science - IMPRS
Speakers Bio:
Event Type:PhD Application Talk
Visibility:D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level:Public Audience
Date, Time and Location
Date:Monday, 26 October 2015
Duration:90 Minutes
Building:E1 4
Motivation: De novo transcriptome assembly is a widely used approach to transcriptome analysis. Most assemblers use the de Bruijn graph as their underlying data structure. The graph uses k-mers (substrings of length k) as nodes, and two nodes are connected if they overlap by k-1 characters. A fundamental parameter that strongly influences the de Bruijn graph, and hence the assembly, is the value of k. It has been shown that no single value of k leads to an optimal result. Researchers therefore use multi-k-mer assembly, which builds de Bruijn graphs over multiple values of k and merges the resulting assemblies. One of the main constraints of this approach is the time and memory it requires for large datasets, and little research has addressed this issue. To tackle it, we introduce two algorithms, KREATION and RE-READ, which significantly reduce the computational time and resources required.
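To make the effect of k concrete, the following minimal Python sketch builds a node-centric de Bruijn graph as described above: the distinct k-mers are the nodes, and a directed edge joins two k-mers when the last k-1 characters of one equal the first k-1 characters of the other. It is an illustration only, not code from KREATION, RE-READ or any particular assembler; the function names `kmers` and `build_de_bruijn` and the toy reads are hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(reads, k):
    """Build a node-centric de Bruijn graph from a list of reads.

    Nodes are the distinct k-mers found in the reads. An edge u -> v is
    added whenever u's last k-1 characters equal v's first k-1 characters.
    Returns a dict mapping each node to a sorted list of its successors.
    """
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    # Index nodes by their (k-1)-prefix so overlapping successors are found directly.
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)
    return {u: sorted(by_prefix[u[1:]]) for u in sorted(nodes)}


if __name__ == "__main__":
    # Hypothetical toy reads that together spell ACGTACGA.
    reads = ["ACGTAC", "GTACGA"]
    for k in (3, 4, 5):
        graph = build_de_bruijn(reads, k)
        n_edges = sum(len(succ) for succ in graph.values())
        print(k, len(graph), n_edges)
```

With these two toy reads, k = 3 and k = 4 each produce a node with two successors, because a (k-1)-mer occurs twice in the underlying sequence ACGTACGA, whereas k = 5 yields a single unbranched path. The same input therefore gives structurally different graphs depending on k.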

KREATION: Most current multi-k-mer assemblers run the assembly for a set of k values spread over the entire read length. This produces suboptimal assemblies and increases run time. We propose KREATION, a method that can be incorporated into an assembler to automatically learn at which k-mer value to stop the assembly by analysing the transcripts generated by each single-k iteration. It clusters related assemblies to estimate whether an additional k-mer assembly is necessary. We found that an approach based on fitting a linear model works well for predicting the k-mer value beyond which no further assembly is required. The approach was tested on datasets with different sequence coverage and read lengths. Compared with the assembly generated using the full range of k-mer values, KREATION produced lossless results with a significant reduction in run time.
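The abstract does not specify which transcript features KREATION clusters or how its linear model is parameterised. Purely as a hypothetical sketch, suppose each single-k iteration reports how many transcript clusters it contributed that no earlier k had produced (called `novel_clusters` here). Fitting an ordinary least-squares line to these counts and solving for the point where the line reaches zero then gives a predicted stopping k. The names `fit_line`, `predict_stop_k` and `should_assemble`, and the choice of feature, are assumptions, not KREATION's actual design.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_stop_k(ks, novel_clusters):
    """Predict the k at which the fitted line reaches zero novel clusters.

    Hypothetical input: novel_clusters[i] is the number of transcript
    clusters first produced by the iteration at ks[i]. Returns None when
    there are fewer than two distinct k values or the trend is not
    decreasing, since no stopping point can then be extrapolated.
    """
    if len(set(ks)) < 2:
        return None
    a, b = fit_line(ks, novel_clusters)
    if b >= 0:
        return None
    return -a / b


def should_assemble(next_k, ks, novel_clusters):
    """Decide whether to run another single-k iteration at next_k."""
    stop_k = predict_stop_k(ks, novel_clusters)
    return stop_k is None or next_k <= stop_k


if __name__ == "__main__":
    # Toy counts for illustration only, not measured results.
    ks = [21, 25, 29, 33]
    novel = [400, 300, 200, 100]
    print(predict_stop_k(ks, novel))
    for k in (37, 41):
        print(k, should_assemble(k, ks, novel))
```

For the toy counts of 400, 300, 200 and 100 novel clusters at k = 21, 25, 29 and 33, the fitted line reaches zero at k = 37, so `should_assemble(37, ...)` returns True and `should_assemble(41, ...)` returns False. Real counts would be noisier, and a linear extrapolation is only one possible stopping rule.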
RE-READ: Assembling transcriptome reads with high sequence coverage requires a large amount of memory. Such datasets generally contain redundant data whose removal can reduce the memory requirement. However, current algorithms risk losing k-mers that form connections between nodes in the de Bruijn graph, which results in suboptimal assemblies. We propose an algorithm called RE-READ, which analyses the reads and predicts the connections in the de Bruijn graph. Reads are removed if they do not contribute to connectivity or might contribute to a misassembly. The algorithm will reduce the data significantly without a significant loss of assembly quality.
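RE-READ's connectivity prediction is not described in enough detail to reproduce here. As a simpler stand-in that shares its stated goal of discarding redundant reads without losing connections, the sketch below greedily keeps a read only if it contains at least one (k+1)-mer, i.e. one edge between two consecutive k-mers, that no previously kept read already covered. This greedy filter is a substitute technique, not RE-READ itself: it does not try to detect reads that might cause misassemblies, and the names `edge_kmers` and `filter_redundant_reads` are hypothetical.

```python
def edge_kmers(read, k):
    """Return the set of (k+1)-mers in a read.

    Each (k+1)-mer spells an edge between two consecutive k-mers in the
    de Bruijn graph. Reads shorter than k+1 contain no edge.
    """
    return {read[i:i + k + 1] for i in range(len(read) - k)}


def filter_redundant_reads(reads, k):
    """Greedily keep only reads that add at least one unseen edge.

    The edges spelled by the kept reads equal the edges spelled by all
    reads, so no read-supported connection is lost. The outcome depends
    on the input order of the reads.
    """
    seen = set()
    kept = []
    for read in reads:
        edges = edge_kmers(read, k)
        if edges - seen:  # the read contributes at least one new edge
            kept.append(read)
            seen |= edges
    return kept


if __name__ == "__main__":
    # Hypothetical toy reads: "CGTA" and the repeated "ACGTAC" add no new edges.
    reads = ["ACGTAC", "CGTA", "GTACGA", "ACGTAC"]
    print(filter_redundant_reads(reads, 3))
```

On the toy input, the contained read "CGTA" and the duplicate "ACGTAC" are discarded while every observed edge is preserved. A real filter would also need to handle sequencing errors and reverse complements, which this sketch ignores.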
CONCLUSION: We put forward two fully automated methods that significantly reduce run time and memory requirements. This would make assemblers more efficient and easier to use.

Name(s):Andrea Ruffing
Video Broadcast
Video Broadcast:No

Created:Andrea Ruffing/MPI-INF, 10/23/2015 06:58 PM
Last modified:Uwe Brahm/MPII/DE, 11/24/2016 04:13 PM
  • Andrea Ruffing, 10/23/2015 07:00 PM -- Created document.