Prediction tasks in machine learning usually require deducing a latent variable, or structure, from observed traces of activity — sometimes, these tasks can be carried out with a significant precision, while sometimes getting any significance out of the prediction requires an unrealistically large number of traces.
In this talk, we will study the trace complexity of (that is: the number of traces needed for) a number of prediction tasks in social networks: the network inference problem, the number of signers problem, and the star rating problem.
The first problem was defined by [Gomez-Rodriguez et al, 2010] and consists of reconstructing the edge set of a network given traces representing the chronology of infection times as epidemics spread through the network.
The second problem’s goal is to predict the unknown number of signers of email-based petitions, given only a small subset of the emails that circulated.
The last problem aims to predict the unknown absolute "quality" of a movie using the ratings given by different users (each with their own unknown precision)
These examples will allow us to highlight some interesting general points of prediction tasks.
Joint work with subsets of Bruno Abrahao, Anirban Dasgupta, Bobby Kleinberg, Jon Kleinberg, Ravi Kumar, Silvio Lattanzi and David Liben-Nowell.