Proceedings Article, Paper
@InProceedings
Contribution to Proceedings, Workshop



Author, Editor

Author(s):

Miliaraki, Iris
Berberich, Klaus
Gemulla, Rainer
Zoupanos, Spyros




Editor(s):





BibTeX cite key*:

Miliaraki2013

Title, Booktitle

Title*:

Mind the Gap: Large-Scale Frequent Sequence Mining

Booktitle*:

ACM SIGMOD International Conference on Management of Data (SIGMOD 2013)

Event, URLs

URL of the conference:

http://www.sigmod.org/2013/

URL for downloading the paper:


Event Address*:

New York, USA

Language:

English

Event Date*
(no longer used):


Organization:

Association for Computing Machinery (ACM)

Event Start Date:

22 June 2013

Event End Date:

27 June 2013

Publisher

Name*:

ACM

URL:

http://www.acm.org

Address*:

New York, USA

Type:


Vol, No, Year, pp.

Series:


Volume:


Number:


Month:

June

Pages:


Year*:

2013

VG Wort Pages:


ISBN/ISSN:


Sequence Number:


DOI:




Note, Abstract, ©

Note:

To appear

(LaTeX) Abstract:

Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this paper, we propose PFSM, a scalable algorithm for frequent sequence mining on MapReduce. PFSM can handle so-called ``gap constraints'', which can be used to limit the output to a controlled set of frequent sequences. At its heart, PFSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of $w$-equivalency, which is a generalization of the notion of a ``projected database'' used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our extensive experimental study in the context of text mining suggests that PFSM is significantly more efficient and scalable than alternative approaches.
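
Illustrative note (not part of the published abstract): the sketch below shows what a gap constraint means for frequent sequence mining on a tiny in-memory database. It is a naive brute-force enumeration, not the PFSM algorithm and not a MapReduce implementation. The function names and the parameters `max_gap`, `max_len`, and `min_support` are hypothetical, chosen only for this example. A subsequence is counted at most once per input sequence, and `max_gap` is assumed to bound the number of items skipped between two consecutive matched items.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Set, Tuple


def gapped_subsequences(seq: Sequence[str], max_gap: int, max_len: int) -> Set[Tuple[str, ...]]:
    """Distinct subsequences of seq with at most max_len items, where at most
    max_gap items are skipped between consecutive chosen items."""
    found: Set[Tuple[str, ...]] = set()

    def extend(prefix: Tuple[str, ...], last_idx: int) -> None:
        found.add(prefix)
        if len(prefix) == max_len:
            return
        # The next item may sit at most max_gap + 1 positions after the last one.
        stop = min(len(seq), last_idx + max_gap + 2)
        for nxt in range(last_idx + 1, stop):
            extend(prefix + (seq[nxt],), nxt)

    for start in range(len(seq)):
        extend((seq[start],), start)
    return found


def frequent_sequences(
    database: Iterable[Sequence[str]], min_support: int, max_gap: int, max_len: int
) -> Dict[Tuple[str, ...], int]:
    """Count each gapped subsequence once per input sequence and keep those
    that occur in at least min_support sequences."""
    counts: Counter = Counter()
    for seq in database:
        counts.update(gapped_subsequences(seq, max_gap, max_len))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


# Toy database (assumed data, for illustration only).
database = [("a", "b", "c"), ("a", "c"), ("a", "x", "x", "c")]
with_gap_1 = frequent_sequences(database, min_support=2, max_gap=1, max_len=2)
with_gap_0 = frequent_sequences(database, min_support=2, max_gap=0, max_len=2)
```

With `max_gap=1` the pattern `("a", "c")` is supported by the first two sequences; with `max_gap=0` it is supported only by the second and is no longer frequent, which illustrates how a tighter gap constraint shrinks the output.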



Download
Access Level:

Internal

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Databases and Information Systems Group

Audience:

experts only

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort



BibTeX Entry:

@INPROCEEDINGS{Miliaraki2013,
AUTHOR = {Miliaraki, Iris and Berberich, Klaus and Gemulla, Rainer and Zoupanos, Spyros},
TITLE = {Mind the Gap: Large-Scale Frequent Sequence Mining},
BOOKTITLE = {ACM SIGMOD International Conference on Management of Data (SIGMOD 2013)},
PUBLISHER = {ACM},
YEAR = {2013},
ORGANIZATION = {Association for Computing Machinery (ACM)},
ADDRESS = {New York, USA},
MONTH = {June},
NOTE = {To appear},
}


Entry last modified by Rainer Gemulla, 01/30/2014
Edit History:

Editor(s): [Library]
Created: 02/11/2013 12:12:16 PM

Revisions (revision, editor, edit date):
3. Rainer Gemulla, 02/12/2013 11:33:56 AM
2. Rainer Gemulla, 02/12/2013 11:30:57 AM
1. Iris Miliaraki, 02/11/2013 01:26:37 PM
0. Iris Miliaraki, 02/11/2013 12:12:16 PM