KREATION: Most current multi-kmer assemblers run the assembly for a set of kmer values scattered over the entire read length. This produces suboptimal assemblies and increases runtime. We propose KREATION, a method that can be incorporated into an assembler to automatically learn at which kmer value to stop the assembly by analysing the transcripts generated by each single-kmer iteration. It clusters related assemblies to estimate whether an additional kmer assembly is necessary. We found that a linear-model fit works well for predicting the kmer value beyond which no further assembly is required. The approach was tested on datasets with different sequence coverage and read lengths. Compared with the assembly generated using the full range of kmer values, KREATION produced lossless results with a significant reduction in runtime.
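The stopping rule can be illustrated with a minimal sketch. It assumes that each single-kmer iteration has already been summarised as one number, here a hypothetical "fraction of new transcript clusters" that the iteration adds relative to earlier kmers, and that this number declines roughly linearly with kmer size. The names `fit_line` and `predict_stop_kmer`, the zero threshold and the example measurements are illustrative assumptions, not KREATION's actual interface or data.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("kmer values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_stop_kmer(kmers: Sequence[float],
                      new_cluster_fraction: Sequence[float],
                      threshold: float = 0.0) -> Optional[float]:
    """Extrapolate the kmer at which the fitted novelty line reaches `threshold`.

    Returns None when the trend is not decreasing, i.e. no stopping point is predicted.
    """
    slope, intercept = fit_line(kmers, new_cluster_fraction)
    if slope >= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical measurements from four single-kmer iterations.
kmers = [21, 25, 29, 33]
fractions = [0.40, 0.30, 0.20, 0.10]
print(round(predict_stop_kmer(kmers, fractions), 6))  # 37.0
```

In an assembler, the predicted value would presumably be rounded up to the next kmer in the planned schedule and all larger kmers skipped. A plain linear fit is used here only because the abstract reports that a linear model works well; the quantity being fitted is an assumption.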
RE-READ: Assembling transcriptome reads with high sequence coverage requires a large amount of computational memory. These datasets generally contain redundant data which, when removed, can reduce the memory requirement. However, current algorithms risk losing kmers that form connections between nodes in the de Bruijn graph, which results in suboptimal assembly. We propose an algorithm called RE-READ, which analyses the reads and predicts the connections in the de Bruijn graph. Reads are removed if they do not contribute to the connectivity or might contribute to a misassembly. The algorithm will substantially reduce the data without significant loss of assembly quality.
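One simple way to drop reads without losing any graph connection is sketched below. It is not RE-READ's prediction step: it is a greedy redundancy filter, assuming an order-k de Bruijn graph whose edges are the (k+1)-mers of the reads, that keeps a read only if it adds at least one edge not already contributed by an earlier read. Because every edge is retained by the first read that carries it, the kept reads span exactly the same edge set as the full input. The function names are hypothetical, reverse complements are ignored, and the misassembly criterion is not modelled.

```python
from typing import Iterable, List, Set


def edges(read: str, k: int) -> Set[str]:
    """Return the (k+1)-mers of a read; each is one edge of an order-k de Bruijn graph."""
    return {read[i:i + k + 1] for i in range(len(read) - k)}


def filter_redundant_reads(reads: Iterable[str], k: int) -> List[str]:
    """Keep a read only if it contributes at least one edge not seen so far.

    Assumption: single-stranded reads, no reverse-complement canonicalisation.
    """
    seen: Set[str] = set()
    kept: List[str] = []
    for read in reads:
        read_edges = edges(read, k)
        if read_edges - seen:  # the read adds a new connection to the graph
            kept.append(read)
            seen |= read_edges
    return kept


reads = ["ACGTAC", "CGTACG", "ACGTAC", "GTAC"]
print(filter_redundant_reads(reads, k=3))  # ['ACGTAC', 'CGTACG']
```

This filter discards the coverage information that assemblers use to correct errors and estimate expression, which is one reason a practical method needs the more careful analysis of connectivity that RE-READ proposes.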
CONCLUSION: We put forward two fully automated methods that significantly reduce runtime and memory requirements, respectively. These would make assemblers more efficient and easier to use.