Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title: Sustainable Reliability for Distributed Systems
Speaker: Manos Kapritsos
Coming from: Microsoft Research, Redmond
Speaker's Bio: Manos Kapritsos is a Postdoctoral Researcher at Microsoft
Research in Redmond, WA. He received his Ph.D. from the University of Texas
at Austin in 2014. His research focuses on designing reliable distributed
systems, by applying fault-tolerant replication to combat machine failures
and using formal verification to ensure software correctness.
Event Type: SWS Colloquium
Visibility: D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level: Expert Audience
Date, Time and Location
Date: Monday, 11 April 2016
Duration: 60 Minutes
Abstract
Reliability is a first-order concern in modern distributed
systems. Even large, well-provisioned systems such as Gmail and Amazon Web
Services can be brought down by failures, incurring millions of dollars in
costs and hurting company reputation. Such service outages are typically
caused by either hardware failures or software bugs. The systems community
has developed various techniques for dealing with both kinds of failures
(e.g. replication, software testing), but those techniques come at a
significant cost. For example, our replication techniques for handling
hardware failures are incompatible with multithreaded execution, forcing a
stark choice between reliability and performance. As for guarding against
software failures, our only real option today is to test our system as best
we can and hope we have not missed any subtle bugs. In principle there
exists another option, formal verification, that fully addresses this
problem, but its overhead in both raw performance and programming effort is
considered too high for it to be adopted in real-world development.

In this talk, I make the case for Sustainable Reliability, i.e. reliability
techniques that provide strong guarantees without imposing unnecessary
overhead that limits their practicality. My talk covers the challenges posed
by both hardware and software failures and proposes novel techniques in each
area. In particular, I will describe how we can reconcile replication and
multithreaded execution by rethinking the architecture of replicated
systems. The resulting system, Eve, offers an unprecedented combination of
strong guarantees and high performance. I will also describe IronFleet, a
new methodology that brings formal verification of distributed systems
within the realm of practicality. Despite its strong guarantees, IronFleet
incurs a very reasonable overhead in both performance and programming effort.

Name(s): Brigitta Hansen
Phone: 0681 93039102
EMail: (email address not disclosed on the web)
Video Broadcast
Video Broadcast: Yes
To Location: Saarbrücken
To Building: E1 5
To Room: 029

Brigitta Hansen/MPI-SWS, 03/30/2016 01:49 PM
Last modified:
Uwe Brahm/MPII/DE, 11/24/2016 04:13 PM
  • Brigitta Hansen, 03/30/2016 01:52 PM -- Created document.