Estimating the number of frequent data item sets for minimal support
$\alpha$ in a large dataset is of great interest from both theoretical and practical
prospectives. However, finding not only the number of frequent item sets, but
even the number of maximally frequent itemsets, is \#P-complete.
In the meantime, a procedure for estimating the number of frequent itemsets
before running any algorithm to enumerate them can be an important tool to facilitate
the mining process and estimate its complexity.
However, all algorithms known to date that estimate the number of frequent data item sets were using sampling techniques.
These techniques, while quite powerful, may lead to a significant increase
in the size of the sample with the increase of the estimation confidence level and, thus, lead
to a significant increase of the estimation process. To the best of our knowledge there is no
available algorithm other than a sampling process for such a purpose.
In this talk, I will introduce the first algorithm to estimate the number of frequent item sets
with respect to minimal support on a given dataset without using a sampling technique.
Specifically, our approach can estimate the following problems related to frequent itemset mining:
1) Estimating the total number of frequent itemsets;
2) Estimating the number of frequent $k$-itemsets;
3) Estimating the largest size of a frequent itemset.
Our detailed experimental results have shown the accuracy and efficiency of our proposed approach.