New for: D2, D3, D4, D5
The research presented in this talk attempts to extend Byzantine fault
tolerance protocols to provide better scalability. I will cover two
axes of scalability in such systems.
First, I will focus on systems with a large number of nodes, where
existing solutions are not well-suited, since they assume a static set
of replicas, or provide limited forms of reconfiguration. In the
context of a storage system we implemented, I will present a
membership service that is part of the overall design, but can be
reused by any large-scale Byzantine-fault-tolerant system. The
membership service provides applications with a globally consistent
view of the set of currently available servers. Its operation is
mostly automatic, to avoid human configuration errors; it can be
implemented by a subset of the storage servers; it tolerates arbitrary
failure of the servers that implement it; and it can itself be
reconfigured.
The second aspect of scalability of Byzantine-fault-tolerant systems
that this talk discusses is scalability with the size of the replica
group (and, consequently, the number of faults that the system
tolerates). I show a new replication protocol called HQ, that
combines quorum-based techniques that are more scalable when there is
no contention with more traditional state-machine replication
protocols with quadratic inter-replica communication like
Castro-Liskov's BFT that are useful for resolving concurrency issues.