At each time step i from 1 to T, the player must choose a strategy x_i in some convex set K. After the choice of x_i, the (convex) cost function f_i is revealed to the player; the cost of x_i is then f_i(x_i).
Since f_i is revealed only after x_i has been chosen, the player obviously cannot hope to maximize f_i(x_i). Instead, the player tries to minimize a quantity known as regret.
The online algorithm proposed by Zinkevich is inspired by gradient
descent, a very simple (and slow) algorithm for numerical optimization. We will present Zinkevich's algorithm, as well as subsequent improvements in the analysis of its performance (only section 3.1 of Agarwal et al.'s paper).
Reading group homepage (links to papers will appear):
http://www.mpi-inf.mpg.de/~elbassio/approx-group.html