MPI-INF Logo
Campus Event Calendar

Event Entry

What and Who

Estimating the Number of Frequent Itemsets in a Large Database

Ruoming Jin
Kent State University
Talk
AG 1, AG 3, AG 5, RG2, AG 2, AG 4, RG1, SWS  
AG Audience
English

Date, Time and Location

Thursday, 30 August 2007
10:00
90 Minutes
E1 4
4th Floor Rotunda
Saarbrücken

Abstract

Estimating the number of frequent data item sets for minimal support
$\alpha$ in a large dataset is of great interest from both theoretical and practical
prospectives. However, finding not only the number of frequent item sets, but
even the number of maximally frequent itemsets, is \#P-complete.
In the meantime, a procedure for estimating the number of frequent itemsets
before running any algorithm to enumerate them can be an important tool to facilitate
the mining process and estimate its complexity.

However, all algorithms known to date that estimate the number of frequent data item sets were using sampling techniques.
These techniques, while quite powerful, may lead to a significant increase
in the size of the sample with the increase of the estimation confidence level and, thus, lead
to a significant increase of the estimation process. To the best of our knowledge there is no
available algorithm other than a sampling process for such a purpose.

In this talk, I will introduce the first algorithm to estimate the number of frequent item sets
with respect to minimal support on a given dataset without using a sampling technique.
Specifically, our approach can estimate the following problems related to frequent itemset mining:
1) Estimating the total number of frequent itemsets;
2) Estimating the number of frequent $k$-itemsets;
3) Estimating the largest size of a frequent itemset.
Our detailed experimental results have shown the accuracy and efficiency of our proposed approach.

Contact

Srikanta Bedathur
0681-9325-520
--email hidden
passcode not visible
logged in users only

Srikanta Bedathur, 08/20/2007 13:01 -- Created document.