Campus Event Calendar

Event Entry

What and Who

Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

Long Tran-Thanh
University of Warwick
SWS Colloquium

Long Tran-Thanh is an Associate Professor in Artificial Intelligence at
the University of Warwick. Long has been doing active research in a number
of key areas of Artificial Intelligence and multi-agent systems, mainly
focusing on multi-armed bandits, game theory, and incentive engineering,
and on their applications to crowdsourcing, human-agent learning, and AI
for Social Good. He has published more than 80 papers at top AI/ML
conferences (including AAAI, AAMAS, CVPR, ECAI, IJCAI, NeurIPS, UAI) and
journals (JAAMAS, AIJ), and has received a number of prestigious national
and international awards, including two best paper honourable mention
awards at top-tier AI conferences (AAAI, ECAI), two Best PhD Thesis Awards
(in the UK and in Europe), and the AIJ Prominent Paper Award for authoring
one of the most influential papers published in the flagship journal of
AI. Long currently serves as a member (2018-2024) of the Board of
Directors of IFAAMAS, the International Federation for Autonomous Agents
and Multiagent Systems, the main international governing body of the
multi-agent systems community, a major sub-field of AI. He is also the
local chair of the AAMAS 2021 and AAMAS 2023 conferences, the flagship
international conference of the multi-agent systems community.
SWS  
AG Audience
English

Date, Time and Location

Thursday, 7 April 2022
11:00
60 Minutes
Virtual talk
Saarbrücken

Abstract

In this paper, we study how stochastic bandit algorithms in a bounded-reward setting can be contaminated and fooled into learning suboptimal actions. In particular, we consider a strong attacker model in which the attacker aims to fool the learning algorithm into committing to a suboptimal bandit policy. This attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret O(\log T) can be forced to suffer linear regret with an expected amount of contamination O(\log T). This amount of contamination is also necessary: we prove that there exists a no-regret bandit algorithm, specifically the classical UCB, that requires \Omega(\log T) contamination to be forced into linear regret.

To combat such poisoning attacks, our second main contribution is to propose verification-based mechanisms, which use a verification scheme to access a limited number of uncontaminated rewards. For the case of unlimited verifications, we show that with an expected O(\log T) number of verifications, a simple modified version of an Explore-then-Commit type bandit algorithm restores the order-optimal O(\log T) regret, irrespective of the amount of contamination used by the attacker. We also provide a UCB-like verification scheme, called Secure-UCB, which likewise fully recovers from any attack, again with an expected O(\log T) number of verifications. We complement these results with a matching lower bound: for any order-optimal bandit algorithm, O(\log T) verifications are necessary to recover the order-optimal regret.

On the other hand, when the number of verifications is bounded above by a budget B, we propose a novel algorithm, Secure-BARBAR, which provably achieves O(\min\{C, T/\sqrt{B}\}) regret with high probability against weak attackers (i.e., attackers who must place the contamination before seeing the actual pulls of the bandit algorithm), where C is the total amount of contamination used by the attacker. This result breaks the known \Omega(C) lower bound of the non-verified setting when C is large.
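To make the phenomenon concrete, here is a minimal Python sketch of the dynamic the abstract describes: a learner that trusts contaminated observations is driven to linear regret, while verifying only the exploration pulls of an Explore-then-Commit strategy restores near-optimal behaviour. The arm means, horizon, attacker, and fixed exploration budget are all illustrative assumptions; this is not the authors' Secure-UCB or Secure-BARBAR.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-armed Bernoulli bandit. All names, means, horizons, and the
# fixed exploration budget below are illustrative assumptions.
TRUE_MEANS = np.array([0.9, 0.5])   # arm 0 is optimal
K, T = len(TRUE_MEANS), 20_000

def env_reward(arm):
    """Clean (uncontaminated) Bernoulli reward."""
    return float(rng.random() < TRUE_MEANS[arm])

def contaminate(arm, reward):
    """Strong attacker: add noise -reward to the optimal arm, so its
    observed reward is always 0 and it looks worse than arm 1."""
    return 0.0 if arm == 0 else reward

def etc_regret(n_explore=300, verified=False):
    """Explore-then-Commit. If `verified`, every exploration pull is
    checked against the clean reward (costing K * n_explore
    verifications); otherwise the learner trusts what it observes.
    In theory n_explore would scale like log(T) / gap^2."""
    pulls, est = [], np.zeros(K)
    for arm in range(K):
        obs = 0.0
        for _ in range(n_explore):
            r = env_reward(arm)
            obs += r if verified else contaminate(arm, r)
            pulls.append(arm)
        est[arm] = obs / n_explore
    best = int(np.argmax(est))        # commit for the rest of the horizon
    pulls += [best] * (T - K * n_explore)
    # Regret is measured against the true means, regardless of what
    # the attacker showed the learner.
    return T * TRUE_MEANS.max() - sum(TRUE_MEANS[a] for a in pulls)

print(f"naive ETC regret:    {etc_regret(verified=False):8.1f}")  # grows linearly in T
print(f"verified ETC regret: {etc_regret(verified=True):8.1f}")   # only the exploration cost
```

In this sketch the naive learner commits to the suboptimal arm because the attacker zeroes out the optimal arm's observed rewards, while the verified learner pays only the small exploration cost, mirroring the O(\log T)-verifications recovery result at an intuitive level.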

This is joint work with Anshuka Rangi and Haifeng Xu.

Please contact crichter@mpi-sws.org for link information

Contact

Claudia Richter
+49 681 9303 9103

Claudia Richter, 03/24/2022 11:42 -- Created document.