Proceedings Article, Paper
@InProceedings
Beitrag in Tagungsband, Workshop


Show entries of:

this year (2019) | last year (2018) | two years ago (2017) | Notes URL

Action:

login to update

Options:




Library Locked Library locked




Author, Editor

Author(s):

Jez, Artur

dblp



Editor(s):

Kulikov, Alexander
Kuznetsov, Sergei
Pevzner, Pavel

dblp
dblp
dblp

Not MPII Editor(s):

Kulikov, Alexander
Kuznetsov, Sergei
Pevzner, Pavel

BibTeX cite key*:

Jez2014CPM

Title, Booktitle

Title*:

A really Simple Approximation of Smallest Grammar

Booktitle*:

25th Annual Symposium on Combinatorial Pattern Matching (CPM 2014)

Event, URLs

URL of the conference:

http://cpm2014.hse.ru/

URL for downloading the paper:


Event Address*:

Moscow, Russia

Language:

English

Event Date*
(no longer used):


Organization:


Event Start Date:

16 June 2014

Event End Date:

18 June 2014

Publisher

Name*:

Springer

URL:

www.springer.com

Address*:

Berlin

Type:

Extended abstract

Vol, No, Year, pp.

Series:

Lecture Notes in Computer Science

Volume:

8486

Number:


Month:

June

Pages:

182-191

Year*:

2014

VG Wort Pages:


ISBN/ISSN:


Sequence Number:


DOI:

10.1007/978-3-319-07566-2_19



Note, Abstract, ©


(LaTeX) Abstract:

In this paper we present a \emph{really} simple linear-time algorithm
constructing a context-free grammar of size $O(g \log (N/g))$
for the input string, where $N$ is the size of the input string
and $g$ the size of the optimal grammar generating this string.
The algorithm works for arbitrary size alphabets,
but the running time is linear assuming that the alphabet $\Sigma$ of the input string
can be identified with numbers from $\{1,\ldots , N^c \}$ for some constant $c$.
Algorithms with such an approximation guarantee and running time are known,
however all of them were non-trivial and their analyses were involved.
The here presented algorithm computes the LZ77 factorisation and transforms it in phases to a grammar.
In each phase it maintains an LZ77-like factorisation of the word with at most $\ell$ factors
as well as additional $O(\ell)$ letters, where $\ell$ was the size of the original LZ77 factorisation.
In one phase in a greedy way (by a left-to-right sweep and a help of the factorisation)
we choose a set of pairs of consecutive letters to be replaced with new symbols,
i.e. nonterminals of the constructed grammar.
We choose at least 2/3 of the letters in the word and there are $O(\ell)$ many different pairs among them.
Hence there are $O(\log N)$ phases, each of them introduces $O(\ell)$ nonterminals to a grammar.
A more precise analysis yields a bound $O(\ell \log(N/\ell))$.
As $\ell \leq g$, this yields the desired bound $O(g \log(N/g))$.

Keywords:

Grammar-based compression, Construction of the smallest grammar, SLP, compression



Download
Access Level:

Internal

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Algorithms and Complexity Group

Audience:

experts only

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort



BibTeX Entry:

@INPROCEEDINGS{Jez2014CPM,
AUTHOR = {Jez, Artur},
EDITOR = {Kulikov, Alexander and Kuznetsov, Sergei and Pevzner, Pavel},
TITLE = {A really Simple Approximation of Smallest Grammar},
BOOKTITLE = {25th Annual Symposium on Combinatorial Pattern Matching (CPM 2014)},
PUBLISHER = {Springer},
YEAR = {2014},
TYPE = {Extended abstract},
VOLUME = {8486},
PAGES = {182--191},
SERIES = {Lecture Notes in Computer Science},
ADDRESS = {Moscow, Russia},
MONTH = {June},
DOI = {10.1007/978-3-319-07566-2_19},
}


Entry last modified by Artur Jez, 12/15/2014
Show details for Edit History (please click the blue arrow to see the details)Edit History (please click the blue arrow to see the details)
Hide details for Edit History (please click the blue arrow to see the details)Edit History (please click the blue arrow to see the details)

Editor(s)
[Library]
Created
11/24/2014 03:06:02 PM
Revision
1.
0.


Editor
Artur Jez
Artur Jez


Edit Date
11/24/2014 04:34:52 PM
11/24/2014 03:06:02 PM