MPI-INF Logo
Campus Event Calendar

Event Entry

What and Who

Faster Approximate Pattern Matching: A Unified Approach

Philip Wellnitz
Max-Planck-Institut für Informatik - D1
AG1 Mittagsseminar (own work)
AG 1  
AG Audience
English

Date, Time and Location

Friday, 16 October 2020
13:00
30 Minutes
n/a
n/a
(virtual)

Abstract

Approximate pattern matching is a natural and well-studied problem on strings: Given a text T, a pattern P, and a threshold k, find (the starting positions of) all substrings of T that are at distance at most k from P. We consider the two most fundamental string metrics: the Hamming distance and the edit distance. Under the Hamming distance, we search for substrings of T that have at most k mismatches with P, while under the edit distance, we search for substrings of T that can be transformed to P with at most k edits.

Exact occurrences of P in T have a very simple structure: If we assume for simplicity that |T| ≤ 3 / 2 |P| and that P occurs both as a prefix and as a suffix of T, then both P and T are periodic with a common period. However, an analogous characterization for the structure of occurrences with up to k mismatches was proved only recently by Bringmann et al. [SODA’19]: Either there are O(k²) k-mismatch occurrences of P in T, or both P and T are at Hamming distance O(k) from strings with a common period O(m/k). We tighten this characterization by showing that there are O(k) k-mismatch occurrences in the case when the pattern is not (approximately) periodic, and we lift it to the edit distance setting, where we tightly bound the number of k-edit occurrences by O(k²) in the non-periodic case. Our proofs are constructive and let us obtain a unified framework for approximate pattern matching for both considered distances. In particular, we provide meta-algorithms that only rely on a small set of crucial primitive operations (such as identifying the longest common prefix of two substrings). We showcase the generality of our meta-algorithms with results for the fully-compressed setting, the dynamic setting, and the standard setting.

(Joint work with Panagiotis Charalampopoulos and Tomasz Kociumaka.)

---------------
Join Zoom Meeting
Meeting ID: 527 278 8807

Note: for people outside D1 interested in listening to this talk, please contact Sándor Kisfaludi-Bak at skisfalu@mpi-inf.mpg.de for the password.

Contact

Philip Wellnitz
+49 681 9325 0
--email hidden
passcode not visible
logged in users only

Philip Wellnitz, 10/13/2020 16:31 -- Created document.