Campus Event Calendar

Event Entry

What and Who

Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

Long Tran-Thanh
University of Warwick
SWS Colloquium

Long Tran-Thanh is an Associate Professor in Artificial Intelligence at
the University of Warwick. Long has been doing active research in a number
of key areas of Artificial Intelligence and multi-agent systems, mainly
focusing on multi-armed bandits, game theory, and incentive engineering,
and on their applications to crowdsourcing, human-agent learning, and AI
for Social Good. He has published more than 80 papers at top AI/ML
conferences (including AAAI, AAMAS, CVPR, ECAI, IJCAI, NeurIPS, UAI) and
journals (JAAMAS, AIJ), and has received a number of prestigious national
and international awards, including two best paper honourable mention
awards at top-tier AI conferences (AAAI, ECAI), two Best PhD Thesis Awards
(in the UK and in Europe), and the AIJ Prominent Paper Award for authoring
one of the most influential papers published in the flagship journal of
AI. Long currently serves as a member (2018-2024) of the Board of
Directors of IFAAMAS, the International Federation for Autonomous Agents
and Multiagent Systems, the main international governing body of the
multi-agent systems community, a major sub-field of AI. He is also the
local chair of the AAMAS 2021 and AAMAS 2023 conferences, the flagship
international conference of the multi-agent systems community.
SWS  
AG Audience
English

Date, Time and Location

Thursday, 7 April 2022
11:00
60 Minutes
Virtual talk
Saarbrücken

Abstract

In this paper, we study how stochastic bandit algorithms in a bounded-reward setting can be contaminated and fooled into learning suboptimal actions. In particular, we consider a strong attacker model in which the attacker aims to fool the learning algorithm into committing to a suboptimal bandit policy. This attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret O(\log T) can be forced to suffer linear regret with an expected amount of contamination O(\log T). This amount of contamination is also necessary: we prove that there exists a no-regret bandit algorithm, specifically the classical UCB, that requires \Omega(\log T) contamination to be forced into linear regret.

To combat such poisoning attacks, our second main contribution is to propose verification-based mechanisms, which use a verification scheme to access a limited number of uncontaminated rewards. For the case of unlimited verifications, we show that with an expected O(\log T) number of verifications, a simple modified version of an Explore-then-Commit type bandit algorithm restores the order-optimal O(\log T) regret, irrespective of the amount of contamination used by the attacker. We also provide a UCB-like verification scheme, called Secure-UCB, which likewise fully recovers from any attack, again with an expected O(\log T) number of verifications. We complement these results with a matching lower bound: for any order-optimal bandit algorithm, O(\log T) verifications are necessary to recover the order-optimal regret.

On the other hand, when the number of verifications is bounded above by a budget B, we propose a novel algorithm, Secure-BARBAR, which provably achieves O(\min\{C, T/\sqrt{B}\}) regret with high probability against weak attackers (i.e., attackers who must place the contamination before seeing the actual pulls of the bandit algorithm), where C is the total amount of contamination used by the attacker. This result breaks the known \Omega(C) lower bound of the non-verified setting when C is large.
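To make the phenomenon concrete, here is a minimal Python sketch of the dynamic the abstract describes: a learner that trusts contaminated observations is driven to linear regret, while verifying only the exploration pulls of an Explore-then-Commit strategy restores near-optimal behaviour. The arm means, horizon, attacker, and fixed exploration budget are all illustrative assumptions; this is not the authors' Secure-UCB or Secure-BARBAR.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-armed Bernoulli bandit. All names, means, horizons, and the
# fixed exploration budget below are illustrative assumptions.
TRUE_MEANS = np.array([0.9, 0.5])   # arm 0 is optimal
K, T = len(TRUE_MEANS), 20_000

def env_reward(arm):
    """Clean (uncontaminated) Bernoulli reward."""
    return float(rng.random() < TRUE_MEANS[arm])

def contaminate(arm, reward):
    """Strong attacker: add noise -reward to the optimal arm, so its
    observed reward is always 0 and it looks worse than arm 1."""
    return 0.0 if arm == 0 else reward

def etc_regret(n_explore=300, verified=False):
    """Explore-then-Commit. If `verified`, every exploration pull is
    checked against the clean reward (costing K * n_explore
    verifications); otherwise the learner trusts what it observes.
    In theory n_explore would scale like log(T) / gap^2."""
    pulls, est = [], np.zeros(K)
    for arm in range(K):
        obs = 0.0
        for _ in range(n_explore):
            r = env_reward(arm)
            obs += r if verified else contaminate(arm, r)
            pulls.append(arm)
        est[arm] = obs / n_explore
    best = int(np.argmax(est))        # commit for the rest of the horizon
    pulls += [best] * (T - K * n_explore)
    # Regret is measured against the true means, regardless of what
    # the attacker showed the learner.
    return T * TRUE_MEANS.max() - sum(TRUE_MEANS[a] for a in pulls)

print(f"naive ETC regret:    {etc_regret(verified=False):8.1f}")  # grows linearly in T
print(f"verified ETC regret: {etc_regret(verified=True):8.1f}")   # only the exploration cost
```

In this sketch the naive learner commits to the suboptimal arm because the attacker zeroes out the optimal arm's observed rewards, while the verified learner pays only the small exploration cost, mirroring the O(\log T)-verifications recovery result at an intuitive level.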

This is joint work with Anshuka Rangi and Haifeng Xu.

Please contact crichter@mpi-sws.org for link information

Contact

Claudia Richter
+49 681 9303 9103

Claudia Richter, 03/24/2022 11:42 -- Created document.