Campus Event Calendar

Event Entry

What and Who

Failures-In-Time (FIT) Analysis for Fault-Tolerant Distributed Real-Time Systems

Arpan Gujarati
MMCI
SWS Student Defense Talks - Thesis Proposal
SWS  
Public Audience
English

Date, Time and Location

Wednesday, 28 March 2018
16:00
G26
111
Kaiserslautern

Abstract

Distributed real-time (DiRT) systems are widely deployed in contemporary cyber-physical systems (CPS). Many of these systems are safety-critical, since their failure or malfunction can result in death or serious injury to people and/or severe damage to the environment. Examples include human spaceflight vehicles, surgical robots, air traffic and nuclear reactor control systems, drive-by-wire and fly-by-wire systems, and railway signaling systems.


Safety-certification standards mandate that the failure rate of safety-critical systems in the presence of unpreventable and intolerable errors caused by environmentally induced transient faults (e.g., from electromagnetic, thermal, and radiation sources) must stay below a certain threshold.

In this regard, prior work on the reliability analysis of DiRT systems in the presence of environmentally induced transient faults does not cover all possible error scenarios (such as Byzantine errors). This is mainly because the likelihood of a complex error scenario is extremely low and/or because the workloads for safety-critical systems have traditionally been simple, with sufficient slack to tolerate fault-induced failures and with mechanical backups to tolerate complete software failures.

However, a majority of CPS devices are expected to be fully autonomous in the future, thus requiring stronger reliability guarantees with fail-operational semantics. In addition, since the workloads used for safety-critical systems are becoming increasingly complex (e.g., deep neural networks are being used in self-driving cars) and since there is a push towards the use of cheaper commodity hardware, the likelihood of complex Byzantine errors is going to increase. Therefore, it is imperative that we revisit the existing techniques for analyzing and building safety-critical DiRT systems.

To address this issue, we propose analyses to derive a safe upper bound on the failure rates of safety-critical DiRT systems in the presence of Byzantine errors due to environmentally induced transient faults. We focus on DiRT systems based on the Controller Area Network (CAN), which are commonly used in today's cyber-physical systems, and on Ethernet-based DiRT systems, which are expected to be at the core of next-generation cyber-physical systems.

Video Broadcast

Yes
Saarbrücken
E1 5
029

Maria-Louise Albrecht, 03/21/2018 10:45
Maria-Louise Albrecht, 03/02/2018 15:34 -- Created document.