MicroRNAs (miRNAs) are small non-coding RNAs that play a critical role in a wide range of biological processes through post-transcriptional gene regulation. Identifying miRNA targets is a key step toward elucidating their functions in different diseases. In recent years, several computational methods based on miRNA-mRNA sequence complementarity have been developed. However, the expected false positive rate of sequence-based predictions remains high. In addition, many target relationships are context-specific. Therefore, most approaches incorporate miRNA and gene expression levels to improve prediction accuracy. Because miRNAs most often do not target all transcripts of a gene, using the expression level of the gene as a whole may be suboptimal. RNA-seq extends transcriptome profiling to the quantitative analysis of the expression levels of genes and of their individual transcripts. We challenged traditional miRNA target inference methods by using estimated transcript expression levels, instead of gene expression levels, as input to our models. We formulated miRNA target interaction prediction using different linear regression models (LASSO, Elastic Net) that can handle the large number of features involved. Furthermore, to incorporate prior knowledge of miRNA-target interactions into the models, we solved these regression models with negativity constraints on their feature coefficients.
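The abstract does not describe how the constrained regressions are solved. As a minimal sketch under stated assumptions, the pure-Python coordinate descent below fits an Elastic Net for a single transcript, using miRNA expression values as features, and lets prior knowledge force selected coefficients to be non-positive (repression only). The function names, the per-miRNA `nonpositive` mask, the penalty parametrisation (borrowed from scikit-learn's ElasticNet convention) and the simulated data are all hypothetical. Setting `l1_ratio=1.0` gives the LASSO case, and the inputs are assumed to be centred, so no intercept is fitted.

```python
import random


def coordinate_update(rho, z, l1_pen, l2_pen, nonpositive):
    """Minimise (z + l2_pen)/2 * b**2 - rho*b + l1_pen*|b| over b,
    optionally subject to b <= 0 (hypothetical helper)."""
    if rho > l1_pen:
        b = (rho - l1_pen) / (z + l2_pen)
    elif rho < -l1_pen:
        b = (rho + l1_pen) / (z + l2_pen)
    else:
        b = 0.0
    # Clipping the soft-thresholded value at 0 is the exact minimiser
    # of the one-dimensional problem under the constraint b <= 0.
    return min(0.0, b) if nonpositive else b


def fit_constrained_elastic_net(X, y, nonpositive, alpha=0.05, l1_ratio=0.9, n_iter=300):
    """Coordinate descent for
    1/(2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + alpha*(1-l1_ratio)/2*||b||^2,
    with b[j] <= 0 wherever nonpositive[j] is True.
    X: n samples x p miRNAs (centred); y: n expression values of one transcript (centred)."""
    n, p = len(X), len(X[0])
    l1_pen = alpha * l1_ratio
    l2_pen = alpha * (1.0 - l1_ratio)
    beta = [0.0] * p
    resid = list(y)  # r = y - X @ beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # rho = (1/n) * x_j . (residual with feature j's contribution added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = coordinate_update(rho, col_sq[j], l1_pen, l2_pen, nonpositive[j])
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


random.seed(0)
n_samples = 100
# Simulated (hypothetical) data: m0 represses the transcript,
# m1 and m2 are positively associated with it.
cols = [center([random.gauss(0.0, 1.0) for _ in range(n_samples)]) for _ in range(3)]
X = [[cols[j][i] for j in range(3)] for i in range(n_samples)]
y = center([-2.0 * cols[0][i] + 1.0 * cols[1][i] + 1.5 * cols[2][i] + random.gauss(0.0, 0.3)
            for i in range(n_samples)])
# Hypothetical prior knowledge: m0 and m1 have predicted sites in this
# transcript, so their coefficients must be <= 0; m2 is unconstrained.
beta = fit_constrained_elastic_net(X, y, nonpositive=[True, True, False])
print([round(b, 2) for b in beta])
```

Because `m1` is constrained but positively associated with the transcript in the simulated data, its coefficient is clipped to zero rather than becoming negative, while the unconstrained `m2` keeps a positive coefficient. With scikit-learn, a similar constraint can be obtained by fitting `Lasso(positive=True)` or `ElasticNet(positive=True)` on the negated feature matrix and negating the fitted coefficients; since `positive=True` applies to every feature, a per-miRNA mask like the one above would need a different solver.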
We show that models based on transcript expression levels achieve better prediction performance than gene-based models, independent of the regression method used. In general, recall increases without a loss in precision, supporting the idea that transcript annotation is indeed helpful for predicting miRNA-gene interactions. Additionally, transcript-based models can, for the first time, pinpoint which transcript of a gene is regulated by which miRNA on a genome-wide scale. Finally, analysis of the regression coefficients shows that miRNAs with negative coefficients are highly enriched in transcripts with putative 3'UTR binding sites, as expected. More surprisingly, we observe enrichment of miRNAs with positive coefficients for genes with putative binding sites in the promoter region, supporting the idea of a possibly widespread mechanism of transcriptional regulation by miRNAs in the nucleus.
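The abstract does not name the enrichment test. One common choice, assumed here purely for illustration, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): among all tested miRNA-transcript pairs, it asks whether pairs with a negative coefficient carry a putative 3'UTR site more often than expected by chance. The function name and the counts in the usage lines are hypothetical and are not results from this work.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: all tested (miRNA, transcript) pairs
    K: pairs with a putative binding site (e.g. in the 3'UTR)
    n: pairs with a negative regression coefficient
    k: pairs that have both a negative coefficient and a putative site
    """
    total = comb(N, n)
    # Sum the probabilities of observing k or more site-bearing pairs
    # among the n negative-coefficient pairs; comb() returns 0 for
    # impossible configurations, so no extra bounds checks are needed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: 1000 pairs, 200 with a 3'UTR site,
# 100 with a negative coefficient, 40 of which have a site
# (20 would be expected by chance).
p_value = hypergeom_enrichment_p(N=1000, K=200, n=100, k=40)
print(f"enrichment p-value: {p_value:.2e}")
```

The same function covers the promoter analysis by counting pairs with positive coefficients and putative promoter sites instead. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same tail probability.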
Overall, we conclude that the transcript-based prediction models introduced in this work are more powerful than established approaches at predicting miRNA-gene interactions from miRNA and mRNA expression data.