Max-Planck-Institut für Informatik

MPI-INF D5 Publications :: Thesis :: Yadava, Prashant

Thesis - Master's thesis | @MastersThesis

Author(s)*:Yadava, Prashant
BibTeX citekey*:Yadava2012

Title, School
Title*:Boolean matrix factorization with missing values
School:Universität des Saarlandes
Type of Thesis*:Master's thesis

Publisher's Name:Universität des Saarlandes
Publisher's Address:Saarbrücken

Note, Abstract, Copyright
LaTeX Abstract:Is it possible to meaningfully analyze the structure of a Boolean matrix in which 99% of the data is missing?

Real-life data sets usually contain a high percentage of missing values, which hampers estimating structure from the data, and the difficulty only increases when the missing values outnumber the known elements. Good real-valued factorization methods exist for such scenarios, but Boolean data form another class of data, one that demands a different handling strategy than real-valued data.
Many applications admit a logical representation only via Boolean matrices, and for these, real-valued factorization methods do not provide correct or intuitive solutions.
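The difference between Boolean and real-valued factorization lies in the matrix product: in Boolean algebra 1 + 1 = 1, so a cell covered by two basis vectors is still 1, whereas the ordinary product would give 2. The Python sketch below illustrates this, together with a reconstruction error that skips missing cells, here encoded as None. The function names and the None encoding are illustrative assumptions, not taken from the thesis.

```python
from typing import List, Optional

Cell = Optional[int]  # 1, 0, or None for a missing value


def boolean_product(b: List[List[int]], c: List[List[int]]) -> List[List[int]]:
    """(B o C)[i][j] = OR over l of (B[i][l] AND C[l][j])."""
    k, m = len(c), len(c[0])
    return [[int(any(row[l] and c[l][j] for l in range(k))) for j in range(m)]
            for row in b]


def real_product(b: List[List[int]], c: List[List[int]]) -> List[List[int]]:
    """Ordinary matrix product, for contrast: overlapping 1s add up."""
    k, m = len(c), len(c[0])
    return [[sum(row[l] * c[l][j] for l in range(k)) for j in range(m)]
            for row in b]


def masked_error(a: List[List[Cell]], approx: List[List[int]]) -> int:
    """Number of known cells of a that approx gets wrong; None cells are skipped."""
    return sum(1 for row_a, row_x in zip(a, approx)
               for v, x in zip(row_a, row_x)
               if v is not None and v != x)


if __name__ == "__main__":
    b = [[1, 0], [1, 1], [0, 1]]
    c = [[1, 1, 0], [0, 1, 1]]
    print(boolean_product(b, c))  # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    print(real_product(b, c))     # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
    a = [[1, None, 0], [1, 1, None], [None, 1, 0]]
    print(masked_error(a, boolean_product(b, c)))  # 1
```

The middle entry of the second row is 1 under the Boolean product but 2 under the real-valued product, and only the single known cell that disagrees (third row, third column) counts as an error.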
Currently, no method exists that can factorize a Boolean matrix with the percentage of missing values typical of non-trivial real-world data sets. In this thesis, we introduce a method to fill this gap. Our method is based on the correlation among the data records and is not restricted by the percentage of unknowns in the matrix. It greedily selects the basis vectors that represent the underlying structure in the data.
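The abstract does not spell out the algorithm, so the following is only a hypothetical sketch of how correlation-based, greedy basis selection can work when many entries are unknown. It borrows the association-confidence idea of the Asso algorithm for Boolean matrix factorization: confidences between columns are estimated only from rows where both values are known, thresholded at an assumed parameter `tau` to produce candidate basis vectors, and candidates are then picked greedily by how much they improve the cover of known cells. The names `greedy_bmf`, `confidence`, `tau`, `w_pos`, and `w_neg` are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Cell = Optional[int]  # 1, 0, or None for a missing value


def confidence(a: List[List[Cell]], i: int, j: int) -> float:
    """Hypothetical conf(i -> j), estimated only from rows where both columns are known."""
    both = [(r[i], r[j]) for r in a if r[i] is not None and r[j] is not None]
    support_i = sum(1 for x, _ in both if x == 1)
    if support_i == 0:
        return 0.0
    return sum(1 for x, y in both if x == 1 and y == 1) / support_i


def greedy_bmf(a: List[List[Cell]], k: int, tau: float = 0.6,
               w_pos: float = 1.0, w_neg: float = 1.0
               ) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (B, C) with A approx B o C (Boolean product); unknown cells are ignored."""
    n, m = len(a), len(a[0])
    # Candidate basis vectors: threshold each row of the association matrix.
    candidates = [[int(confidence(a, i, j) >= tau) for j in range(m)] for i in range(m)]
    covered = [[False] * m for _ in range(n)]
    b_cols: List[List[int]] = []  # usage vectors, one per chosen basis vector
    c_rows: List[List[int]] = []  # chosen basis vectors
    for _ in range(k):
        best_gain, best = 0.0, None
        for cand in candidates:
            usage, total = [], 0.0
            for r in range(n):
                # Gain of switching this basis vector on for row r: newly covered
                # known 1s count positively, newly covered known 0s negatively.
                g = sum((w_pos if a[r][j] == 1 else -w_neg)
                        for j in range(m)
                        if cand[j] and a[r][j] is not None and not covered[r][j])
                use = g > 0
                usage.append(int(use))
                total += g if use else 0.0
            if total > best_gain:
                best_gain, best = total, (cand, usage)
        if best is None:
            break  # no candidate improves the cover any further
        cand, usage = best
        c_rows.append(cand)
        b_cols.append(usage)
        for r in range(n):
            if usage[r]:
                for j in range(m):
                    if cand[j]:
                        covered[r][j] = True
    # Transpose the usage vectors into the n x k factor B.
    b = [[col[r] for col in b_cols] for r in range(n)]
    return b, c_rows


if __name__ == "__main__":
    a = [[1, 1, None, 0],
         [1, None, 0, 0],
         [None, 1, 0, 0],
         [0, 0, 1, 1],
         [0, None, 1, None]]
    b, c = greedy_bmf(a, k=3)
    print(c)  # [[1, 1, 0, 0], [0, 0, 1, 1]]
    print(b)  # [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
```

On this small example the sketch stops after two basis vectors, recovers the blocks [1, 1, 0, 0] and [0, 0, 1, 1], and through the Boolean product predicts the missing cells (for instance, the second row, second column becomes 1). Estimating confidences only from co-observed cells keeps the heuristic well defined at any fraction of unknowns, although with very sparse data each estimate rests on few rows.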
This thesis also presents several experiments on a variety of synthetic and real-world data, and discusses the performance of the algorithm for a range of data properties.
However, comparison statistics against existing methods could not be obtained directly, because no such methods exist. Hence, we present indirect comparisons with existing matrix completion methods that work with real-valued data sets.
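The abstract does not describe the comparison protocol. One plausible setup, stated here purely as an assumption, is to hide a random fraction of the known cells, let each method (Boolean or real-valued) fill in the matrix, round real-valued predictions at a threshold of 0.5, and count mistakes on the hidden cells. The helper names `hide_known_entries` and `heldout_error_rate` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

Cell = Optional[int]  # 1, 0, or None for a missing value
Index = Tuple[int, int]


def hide_known_entries(a: List[List[Cell]], fraction: float, seed: int = 0
                       ) -> Tuple[List[List[Cell]], Set[Index]]:
    """Return a copy of a with a random fraction of its known cells set to None."""
    known = [(i, j) for i, row in enumerate(a)
             for j, v in enumerate(row) if v is not None]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    held = set(rng.sample(known, int(fraction * len(known))))
    masked = [[None if (i, j) in held else v for j, v in enumerate(row)]
              for i, row in enumerate(a)]
    return masked, held


def heldout_error_rate(a: List[List[Cell]], predictions: Sequence[Sequence[float]],
                       held: Set[Index], threshold: float = 0.5) -> float:
    """Share of held-out cells whose thresholded prediction differs from the truth."""
    if not held:
        return 0.0
    wrong = sum(1 for i, j in held
                if int(predictions[i][j] >= threshold) != a[i][j])
    return wrong / len(held)


if __name__ == "__main__":
    truth = [[1, 0, None], [0, 1, 1], [1, None, 0]]
    masked, held = hide_known_entries(truth, fraction=0.5, seed=1)
    # Stand-in for the output of some real-valued completion method.
    real_valued = [[0.8, 0.1, 0.6], [0.3, 0.9, 0.4], [0.7, 0.5, 0.2]]
    print(len(held), heldout_error_rate(truth, real_valued, held))
```

Thresholding puts real-valued predictions on the same Boolean scale as a Boolean factorization, so both can be scored by the same held-out error; the 0.5 threshold is itself an assumption.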

Keywords:Boolean Matrix Factorization, missing values, big data, matrix factorization, association rule mining
Download Access Level:Public

Referees, Status, Dates
1. Referee:Pauli Miettinen
2. Referee:Gerhard Weikum
Colloquium Date:22 November 2012

MPG Unit:Max-Planck-Institut für Informatik
MPG Subunit:Databases and Information Systems Group
Appearance:MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat (Scientific Advisory Board)

BibTeX Entry:
@MastersThesis{Yadava2012,
AUTHOR = {Yadava, Prashant},
TITLE = {Boolean matrix factorization with missing values},
PUBLISHER = {Universit{\"a}t des Saarlandes},
SCHOOL = {Universit{\"a}t des Saarlandes},
YEAR = {2012},
TYPE = {Master's thesis},
ADDRESS = {Saarbr{\"u}cken},
MONTH = {November},
}

Entry last modified by Prashant Yadava, 04/03/2013
Edit History:
Andrea Ruffing, 11/05/2012 10:20:17 AM

Editors and Edit Dates:
Prashant Yadava, 04/03/2013 03:13:15 PM
Prashant Yadava, 11/30/2012 12:35:08 AM
Prashant Yadava, 11/29/2012 11:10:41 PM
Petra Schaaf, 22.11.2012 11:37:49
Petra Schaaf, 22.11.2012 11:34:17