To address some of these challenges, we have developed the EpiGRAPH web service. It implements best-practice workflows to identify hidden correlations among integrated genome, epigenome and transcriptome data and to detect functional elements that are invisible to any single experimental approach. The web service combines flexible genome attribute calculation with powerful machine learning methods and a large (epi-)genome database. In EpiGRAPH’s most basic application scenario, the user submits a set of genomic regions that share an interesting feature (e.g. epigenetically altered in cancer, frequent retroviral integration sites, or tissue-specific enhancer elements), with the aim of identifying the genomic features these regions have in common. EpiGRAPH then automatically calculates appropriate control sets as well as a wide range of potentially predictive genomic and epigenetic features. Based on these data, it performs statistical tests to identify features that differ significantly between cases and controls, and it runs more sophisticated machine learning analyses to assess the joint impact of biologically related feature groups. In addition, classification algorithms such as support vector machines, logistic regression and ensemble learning methods are implemented to predict class membership for new regions.
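The case-versus-control comparison described above can be illustrated with a minimal Python sketch. It is not EpiGRAPH's implementation or API: the function names, the two toy sequence features (GC content and CpG observed/expected ratio), the permutation test and the Bonferroni correction are all assumptions chosen for illustration, and the control set is supplied by hand rather than calculated automatically.

```python
import random
from statistics import mean


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (hypothetical toy feature)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio (hypothetical toy feature)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


# Assumed feature set; a real analysis would use many more attributes.
FEATURES = {"gc_content": gc_content, "cpg_obs_exp": cpg_obs_exp}


def permutation_p_value(cases, controls, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def compare_features(case_seqs, control_seqs, alpha=0.05):
    """Test each feature for a case/control difference (Bonferroni-adjusted)."""
    results = {}
    n_tests = len(FEATURES)
    for name, feature in FEATURES.items():
        case_vals = [feature(s) for s in case_seqs]
        ctrl_vals = [feature(s) for s in control_seqs]
        p_adj = min(1.0, permutation_p_value(case_vals, ctrl_vals) * n_tests)
        results[name] = {
            "case_mean": mean(case_vals),
            "control_mean": mean(ctrl_vals),
            "p_adj": p_adj,
            "significant": p_adj < alpha,
        }
    return results


# Toy data: GC-rich "case" regions and AT-rich "control" regions.
case_seqs = ["CGCGCGCGAT", "GCGCGCGCTA", "CCGGCGCGAA",
             "CGCGTACGCG", "GGCGCGCCAT", "CGCCGCGGTA"]
control_seqs = ["ATATTAGCAT", "TTAACATGAA", "AATTCTAGAT",
                "TATACAGTTA", "ATGCATTAAT", "TTCATAAGTA"]

results = compare_features(case_seqs, control_seqs)
for name, row in results.items():
    print(name, row)
```

A per-feature test of this kind corresponds only to the first analysis step in the text. Assessing the joint impact of feature groups and predicting class membership for new regions would additionally require a classifier trained on the full feature matrix, which this sketch does not attempt.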
We regard EpiGRAPH as a significant step towards a new type of genome data analysis tool that is substantially more powerful than current genome browsers. Such “statistical genome browsers” will not only visualize genome and epigenome data, but will also provide data mining features and best-practice workflows for a wide range of analytical tasks.