Campus Event Calendar

Event Entry

What and Who

Online Checkpointing with Improved Worst-Case Guarantees

Adrian Neumann
Max-Planck-Institut für Informatik - D1
AG1 Mittagsseminar (own work)
AG 1, AG 2, AG 3, AG 4, AG 5, RG1, SWS, MMCI  
AG Audience
English

Date, Time and Location

Tuesday, 26 February 2013
13:00
30 Minutes
E1 4
024
Saarbrücken

Abstract

In the online checkpointing problem, the task is to continuously maintain a set of k checkpoints that allow an ongoing computation to be rewound faster than by a full restart.

The only operation allowed is to remove an old checkpoint and store the current state in its place. Our aim is to find checkpoint placement strategies that minimize the rewinding cost, i.e., such that at all times T, when requested to rewind to some time t <= T, the number of computation steps that need to be redone to get to t from a checkpoint before t is as small as possible.
In particular, we want the closest checkpoint earlier than t to be no further away from t than p_k times the ideal distance T / (k+1), where p_k is a small constant.
Improving over earlier work showing 1 + 1/k <= p_k <= 2, we show that p_k can be chosen less than 2 uniformly for all k. More precisely, we show the uniform bound p_k <= 1.7 for all k, and present algorithms with asymptotic performance p_k <= 1.59 + o(1) for all k and p_k <= ln(4) + o(1) <= 1.39 + o(1) when k is a power of two. For small values of k, we show how to use a linear programming approach to compute good checkpointing algorithms. This gives performances of less than 1.53 for k <= 10.
On the more theoretical side, we show the first lower bound that is asymptotically more than one, namely p_k >= 1.30 - o(1). We also show that optimal algorithms (yielding the infimum performance) exist for all k.
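
To make the quality measure from the abstract concrete, the following minimal Python sketch computes, for one fixed time T, how far the worst-case rewind distance is from the ideal distance T / (k+1). The function name gap_ratio is hypothetical, and the sketch assumes that time 0 (a full restart) is always available as a restart point and that the current time T closes the last interval. It only evaluates a given set of checkpoints; it is not one of the placement strategies from the talk.

```python
def gap_ratio(checkpoints, T):
    """Longest gap between consecutive restart points, relative to T / (k + 1).

    checkpoints: times in (0, T] at which a state is currently stored.
    T: current time of the computation (T > 0).
    """
    k = len(checkpoints)
    # Assumption: time 0 (full restart) is always available, and the
    # current time T closes the last interval.
    points = [0] + sorted(checkpoints) + [T]
    # The worst rewind target lies just before the end of the longest gap.
    longest = max(b - a for a, b in zip(points, points[1:]))
    ideal = T / (k + 1)
    return longest / ideal


# Evenly spaced checkpoints reach the ideal distance at this time T.
print(gap_ratio([25, 50, 75], 100))  # 1.0
# Checkpoints clustered early leave a long final gap.
print(gap_ratio([10, 20, 40], 80))   # 2.0
```

In these terms, a strategy achieves performance p_k if this ratio stays at most p_k at every time T while checkpoints are replaced online.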

http://arxiv.org/abs/1302.4216

Contact

Adrian Neumann

Adrian Neumann, 02/19/2013 10:12
Adrian Neumann, 02/19/2013 10:11 -- Created document.