Technical, Research Report
@TechReport


Author, Editor

Author(s):

Gemulla, Rainer
Haas, Peter J.
Nijkamp, Erik
Sismanis, Yannis

Not MPG Author(s):

Haas, Peter J.
Nijkamp, Erik
Sismanis, Yannis

Editor(s):





BibTeX Citekey*:

gemulla11

Language:

English

Title, Institution

Title*:

Large-scale matrix factorization with distributed stochastic gradient descent

Institution*:

IBM Almaden Research Center

Publishers or Institutions Address*:

San Jose, CA

Type:

Technical Report

No., Year, pp.

Number*:

RJ10481

Pages*:

47

Month:

March

VG Wort Pages*:


Year*:

2011

ISBN/ISSN:






DOI:




Note, Abstract, ©

Note:


(LaTeX) Abstract:

As Web 2.0 and enterprise-cloud applications have proliferated, data mining
algorithms increasingly need to be (re)designed to handle web-scale
datasets. For this reason, low-rank matrix factorization has received a lot
of attention in recent years, since it is fundamental to a variety of mining
tasks, such as topic detection and collaborative filtering, that are
increasingly being applied to massive datasets. We provide a novel algorithm
to approximately factor large matrices with millions of rows, millions of
columns, and billions of nonzero elements. Our approach rests on stochastic
gradient descent (SGD), an iterative stochastic optimization algorithm; the
idea is to exploit the special structure of the matrix factorization problem
to develop a new ``stratified'' SGD variant that can be fully distributed
and run on web-scale datasets using, e.g., MapReduce. The resulting
distributed SGD factorization algorithm, called DSGD, provides good speed-up
and handles a wide variety of matrix factorizations. We establish
convergence properties of DSGD using results from stochastic approximation
theory and regenerative process theory, and also describe the practical
techniques used to optimize performance in our DSGD
implementation. Experiments suggest that DSGD converges significantly faster
and has better scalability properties than alternative algorithms.
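
Illustration (not from the report): to make the stratification idea in the abstract concrete, the following is a minimal single-process Python sketch. It partitions the matrix into d x d blocks and, in each sub-epoch, performs SGD passes over d blocks that share no rows or columns, so in an actual distributed run each block of a stratum could be processed by a separate worker. The function name, the plain L2-regularized squared loss, and all parameter defaults are illustrative assumptions.

import numpy as np

def dsgd_factorize(V, rank=10, d=4, epochs=10, lr=0.01, reg=0.05, seed=0):
    # Approximate V (m x n, zeros = unobserved) as W @ H via stratified SGD.
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.normal(scale=0.1, size=(m, rank))    # row factors
    H = rng.normal(scale=0.1, size=(rank, n))    # column factors
    row_edges = np.linspace(0, m, d + 1, dtype=int)
    col_edges = np.linspace(0, n, d + 1, dtype=int)
    nz = np.argwhere(V != 0)                     # observed (i, j) entries
    for _ in range(epochs):
        shift = rng.integers(d)
        for s in range(d):                       # one sub-epoch per stratum
            for b in range(d):
                # Stratum s pairs row block b with column block
                # (b + shift + s) % d, so no two blocks in the stratum
                # share rows or columns: they are interchangeable and
                # could run in parallel (e.g., one MapReduce task each).
                c = (b + shift + s) % d
                r0, r1 = row_edges[b], row_edges[b + 1]
                c0, c1 = col_edges[c], col_edges[c + 1]
                block = nz[(nz[:, 0] >= r0) & (nz[:, 0] < r1) &
                           (nz[:, 1] >= c0) & (nz[:, 1] < c1)]
                for i, j in block[rng.permutation(len(block))]:
                    err = V[i, j] - W[i] @ H[:, j]
                    w_old = W[i].copy()          # keep old value for H step
                    W[i]    += lr * (err * H[:, j] - reg * w_old)
                    H[:, j] += lr * (err * w_old  - reg * H[:, j])
    return W, H

On a small test matrix, W @ H should track the observed entries of V; the real DSGD algorithm ships each block of a stratum to a different worker instead of looping over them sequentially.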

Categories / Keywords:


Copyright Message:


HyperLinks / References / URLs:

http://www.almaden.ibm.com/cs/people/peterh/dsgdTechRep.pdf

Personal Comments:


File Upload:


Download Access Level:

Public

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Databases and Information Systems Group

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat, VG Wort


BibTeX Entry:
@TECHREPORT{gemulla11,
AUTHOR = {Gemulla, Rainer and Haas, Peter J. and Nijkamp, Erik and Sismanis, Yannis},
TITLE = {Large-scale matrix factorization with distributed stochastic gradient descent},
YEAR = {2011},
TYPE = {Technical Report},
INSTITUTION = {IBM Almaden Research Center},
NUMBER = {RJ10481},
PAGES = {47},
ADDRESS = {San Jose, CA},
MONTH = {March},
}


Entry last modified by Anja Becker, 03/22/2012
Edit History:

Created 03/08/2011 01:30:17 PM by [Library]

Revision   Editor           Edit Date
6.         Anja Becker      22.03.2012 14:19:15
5.         Anja Becker      15.03.2012 16:28:44
4.         Anja Becker      12.03.2012 11:12:35
3.         Rainer Gemulla   01/08/2012 01:16:40 PM
2.         Rainer Gemulla   03/08/2011 01:46:05 PM
Attachment Section:

dsgdTechRep.pdf