Campus Event Calendar: Dilip Ariyur Durai (10/26/2015 in E1 4/024)

Event Entry

What and Who

Optimising de novo transcriptome assembly

Dilip Ariyur Durai

International Max Planck Research School for Computer Science - IMPRS

PhD Application Talk

AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI

Public Audience

English

Date, Time and Location

Monday, 26 October 2015

10:20

90 Minutes

E1 4

024

Saarbrücken

Abstract

Motivation: De novo transcriptome assembly is a widely used process for transcriptome analysis. Most assemblers use a de Bruijn graph as their base data structure. The graph uses kmers (substrings of length k) as nodes, and two nodes are connected if they overlap by k-1 characters. A fundamental parameter that strongly influences the de Bruijn graph, and hence the assembly, is the value of k. It has been shown that no single k value leads to an optimal result. As a result, researchers use multi-kmer assembly, which builds de Bruijn graphs over multiple k values and merges the resulting assemblies. One of the main constraints of this method is the amount of time and memory it requires for large datasets. Limited research has been done to tackle this issue. With this in view, we introduce two algorithms, KREATION and RE-READ, which significantly reduce the computational time and resources required.
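
As an illustration of the graph definition above, the following is a minimal Python sketch, not the implementation used by any particular assembler. The function name build_de_bruijn and the toy reads are hypothetical, chosen only for this example. It collects every kmer found in the reads as a node and connects two nodes when the last k-1 characters of one equal the first k-1 characters of the other.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each kmer to the set of kmers it points to.

    Nodes are all substrings of length k found in the reads. An edge
    u -> v exists when the (k-1)-suffix of u equals the (k-1)-prefix of v.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Collect every kmer that occurs in any read.
    nodes = {
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    }

    # Index the nodes by their (k-1)-prefix so successors are quick to find.
    by_prefix = defaultdict(set)
    for kmer in nodes:
        by_prefix[kmer[:-1]].add(kmer)

    # Successors of a kmer are the nodes whose prefix matches its suffix.
    return {kmer: set(by_prefix.get(kmer[1:], ())) for kmer in nodes}


# Toy input (hypothetical reads, for illustration only).
graph = build_de_bruijn(["ACGTAC", "GTACG"], k=3)
for kmer in sorted(graph):
    print(kmer, "->", sorted(graph[kmer]))
```

Rebuilding the graph with a different k changes both the node set and the edges, which is why the choice of k affects the resulting assembly.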

KREATION: Most current multi-kmer assemblers run the assembly for a set of kmer values scattered over the entire read length. This results in a suboptimal assembly and an increase in runtime. We propose KREATION, a method that can be incorporated into an assembler to automatically learn at which kmer value to stop the assembly by analysing the transcripts generated by each single-kmer iteration. It clusters the related assemblies to estimate the necessity of an additional kmer assembly. We found that a fit based on a linear model works well for predicting the kmer value beyond which no further assembly is required. This approach was tested on datasets of different sequence coverage and read length. When compared to the assembly generated by using the full range of kmer values, KREATION was found to produce lossless results with a significant reduction in runtime.
RE-READ: Assembling transcriptome reads with high sequence coverage requires a large amount of memory. These datasets generally contain redundant data which, when removed, can reduce the memory requirement. Current algorithms risk losing kmers that form connections between nodes in the de Bruijn graph, which results in a suboptimal assembly. We propose an algorithm called RE-READ which analyses the reads and predicts the connections in the de Bruijn graph. Reads are removed if they do not contribute to the connectivity or might contribute to a misassembly. The algorithm will reduce the data significantly without significant loss of assembly quality.
CONCLUSION: We put forward two completely automated methods which significantly reduce the runtime and memory requirements. This would make assemblers more efficient and easier to use.

Contact

Andrea Ruffing

Andrea Ruffing, 10/23/2015 19:00 -- Created document.
