Campus Event Calendar

Event Entry

What and Who

"CrystalBall: Predicting and Preventing Inconsistencies in Deployed Distributed Systems"

Dejan Kostic
EPFL
SWS Colloquium


Dejan Kostic obtained his Ph.D. in Computer Science at Duke University, under Amin Vahdat. He spent the last two
years of his studies, followed by a brief stay as a postdoctoral scholar, at the University of California, San Diego. He received
his Master of Science degree in Computer Science from the University of Texas at Dallas, and his Bachelor of Science
degree in Computer Engineering and Information Technology from the University of Belgrade (ETF), Serbia. In January
2006, he started as a tenure-track assistant professor at the School of Computer and Communications Sciences at EPFL
(Ecole Polytechnique Fédérale de Lausanne), Switzerland. His interests include Distributed Systems (Peer to Peer
Computing, Overlay Networks), Computer Networks, Operating Systems, and Mobile Computing.
SWS, RG1  
AG Audience
English

Date, Time and Location

Friday, 21 November 2008
14:00
60 Minutes
E1 5
rotunda 6th floor
Saarbrücken

Abstract


Distributed systems form the foundation of our society's infrastructure. Complex distributed protocols and algorithms
are used in enterprise storage systems, distributed databases, large-scale planetary systems, and sensor networks.
Errors in these protocols translate to denial of service to some clients, potential loss of data, and even monetary
losses. Unfortunately, it is notoriously difficult to develop reliable high-performance distributed systems that run
over asynchronous networks, such as the Internet.  Even if a distributed system is based on a well-understood
distributed algorithm, its implementation can contain coding bugs and errors arising from complexities of realistic
distributed environments.

This talk describes CrystalBall, a new approach for developing and deploying distributed systems. In CrystalBall, nodes
predict distributed consequences of their actions, and use this information to detect and avoid errors.  Each node
continuously runs a state exploration algorithm on a recent consistent snapshot of its neighborhood and predicts
possible future violations of specified safety properties.  We describe a new state exploration algorithm, consequence
prediction, which explores causally related chains of events that lead to property violations. Using CrystalBall, we
identified new bugs in mature Mace implementations of a random overlay tree, the BulletPrime content distribution system,
and the Chord distributed hash table.  Furthermore, we show that if a bug is not corrected during system development,
CrystalBall is effective in steering the execution away from inconsistent states at run-time, with low false negative
rates.
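
To make the idea of exploring states from a snapshot concrete, the following is a minimal sketch in Python. It is not CrystalBall's consequence prediction algorithm and does not use Mace; it is a plain bounded breadth-first search over a toy two-node overlay. All names (`predict_violations`, `enabled_events`, `apply_event`, `is_safe`) and the toy model itself are assumptions made for illustration only.

```python
from collections import deque


def predict_violations(snapshot, enabled_events, apply_event, is_safe, max_depth):
    """Bounded breadth-first search starting from `snapshot`.

    Returns the shortest chain of events that reaches a state violating
    `is_safe` within `max_depth` steps, or None if no such chain is found.
    """
    frontier = deque([(snapshot, ())])
    seen = {snapshot}
    while frontier:
        state, path = frontier.popleft()
        if not is_safe(state):
            return list(path)
        if len(path) == max_depth:
            continue  # depth bound reached; do not expand further
        for event in enabled_events(state):
            successor = apply_event(state, event)
            if successor not in seen:
                seen.add(successor)
                frontier.append((successor, path + (event,)))
    return None


# Hypothetical toy model: state = (parent_of_a, parent_of_b).
# None means the node has not picked a parent yet.
def enabled_events(state):
    events = []
    if state[0] is None:
        events.append("a_joins_b")
    if state[1] is None:
        events.append("b_joins_a")
    return events


def apply_event(state, event):
    a_parent, b_parent = state
    if event == "a_joins_b":
        return ("b", b_parent)
    return (a_parent, "a")


def is_safe(state):
    # Safety property (assumed): the overlay must not contain a parent cycle.
    return state != ("b", "a")


trace = predict_violations((None, None), enabled_events, apply_event, is_safe, max_depth=3)
print(trace)
```

In this toy model the search reports the chain `a_joins_b`, `b_joins_a` as leading to a cycle. A node that obtained such a prediction could, for example, decline to process the second event, which is one simple way to picture steering execution away from an inconsistent state.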

Contact

Brigitta Hansen
0681 - 9325200

Brigitta Hansen, 11/05/2008 15:17 -- Created document.