Campus Event Calendar

Event Entry

What and Who

Extending Logits-Based Watermarking Schemes to Mitigate Stealing Attacks

Mikko Tripakis
Northeastern University
PhD Application Talk
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI  
AG Audience
English

Date, Time and Location

Wednesday, 29 January 2025
12:00
30 Minutes
Virtual talk
Zoom

Abstract

Logits-based watermarking schemes for large language models are robust to moderate attacks
such as editing, paraphrasing, and token replacement. However, watermark stealing attacks pose
a serious threat to these schemes, which use a fixed-width token context to seed pseudorandom
generator functions. We propose a variable-width context mechanism to increase robustness against
stealing attacks while maintaining quality and detectability of watermarked text. We implement our
mechanism on the KGW-SelfHash variant of the watermark proposed by Kirchenbauer et al.
and evaluate it against the watermark stealing attack developed by Jovanovic et al. We find that
our mitigation successfully degrades the attack’s effectiveness while maintaining high quality and
detectability of watermarked text.
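
For readers unfamiliar with context-seeded watermarks, the following is a minimal sketch of the general idea, not the speaker's implementation. It shows a KGW-style "green list" drawn from a pseudorandom generator that is seeded by a keyed hash of the preceding tokens, first with a fixed-width context and then with a variable-width one. The abstract does not say how the context width is chosen, so the rule in `variable_context` (width derived from a keyed hash of the most recent token) is purely an assumption for illustration. The names `SECRET_KEY`, `VOCAB_SIZE`, `GAMMA`, `green_list`, `fixed_context`, and `variable_context` are likewise hypothetical, and the sketch hashes only the preceding tokens rather than including the candidate token as the SelfHash variant does.

```python
import hashlib
import random

VOCAB_SIZE = 50          # toy vocabulary size (hypothetical)
GAMMA = 0.5              # fraction of the vocabulary marked "green"
SECRET_KEY = b"demo-key" # hypothetical watermark key


def _seed(key, context):
    """Derive a 64-bit seed from the key and a tuple of token ids."""
    data = key + b"|" + ",".join(map(str, context)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def green_list(context):
    """Return the set of green token ids selected for this context."""
    rng = random.Random(_seed(SECRET_KEY, context))
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * VOCAB_SIZE)])


def fixed_context(tokens, width=3):
    """Fixed-width context: always the last `width` tokens."""
    return tuple(tokens[-width:])


def variable_context(tokens, min_width=1, max_width=4):
    """Variable-width context (assumed rule, not from the talk):
    the width depends on a keyed hash of the most recent token.
    Assumes `tokens` is non-empty."""
    span = max_width - min_width + 1
    width = min_width + _seed(SECRET_KEY, (tokens[-1],)) % span
    return tuple(tokens[-width:])


if __name__ == "__main__":
    history = [5, 17, 3, 42, 8]
    print("fixed context:   ", fixed_context(history))
    print("variable context:", variable_context(history))
    print("green list size: ", len(green_list(variable_context(history))))
```

In a scheme of this kind, generation would add a bias to the logits of green tokens, and detection would recompute the same contexts with the key and count green tokens. The intuition behind varying the width is that an attacker who assumes a single fixed context length has a harder time learning which tokens are favored after a given context.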

Contact

Ina Geisler
+49 681 9325 1802

Virtual Meeting Details

Zoom

Ina Geisler, 01/24/2025 12:19 -- Created document.