Metagenome studies have retrieved vast amounts of sequence data from a variety of environments, leading to novel discoveries and insights into the uncultured microbial world.
Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem.
A taxonomic characterization of the sequences is required for a deeper understanding of these microbial communities.
In this presentation, I will discuss our work on composition-based methods for the phylogenetic classification of metagenome sequence samples.
Our composition-based classifier /PhyloPythia/ combines higher-level generic clades learned from publicly available genome sequences with sample-derived population models.
Extensive analyses of synthetic and real metagenome data sets showed that /PhyloPythia/ accurately classifies most sequence fragments across all considered taxonomic ranks,
even for unknown organisms, and can assign fragments ≥ 1 kb with high specificity.
We are currently developing a novel model that requires substantially less training time
while maintaining high accuracy, and that also allows extraction and analysis of the compositional signals characterizing individual clades.
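
To illustrate what "composition-based" can mean in practice, the sketch below turns a sequence fragment into a vector of normalized oligonucleotide (k-mer) frequencies, a common kind of compositional signal. This is only a generic illustration: the function name kmer_frequencies, the choice k = 4, and the handling of ambiguous bases are assumptions made for the example, and the abstract does not state which features or model /PhyloPythia/ actually uses.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies for a DNA fragment.

    Hypothetical helper for illustration only; not taken from PhyloPythia.
    The vector has 4**k entries, ordered lexicographically over A, C, G, T.
    k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]

    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Normalize only over valid A/C/G/T k-mers.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


if __name__ == "__main__":
    fragment = "ACGTACGTTTGACCA"
    vector = kmer_frequencies(fragment, k=2)
    print(len(vector))  # number of features: 4**2
```

A vector of this kind could then be passed to any standard classifier trained on fragments of known taxonomic origin; the choice of classifier is left open here.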