Max-Planck-Institut für Informatik

MPI-I-2009-5-003

The RDF-3X engine for scalable management of RDF data

Neumann, Thomas and Weikum, Gerhard

MPI-I-2009-5-003. March 2009, 53 pages. | Status: submitted to printer - 1st edition

Abstract in LaTeX format:
RDF is a data model for schema-free structured information that is gaining
momentum in the context of Semantic-Web data, life sciences, and also Web 2.0
platforms. The ``pay-as-you-go'' nature of RDF and the flexible pattern-
matching capabilities of its query language SPARQL entail efficiency and
scalability challenges for complex queries including long join paths. This
paper presents the RDF-3X engine, an implementation of SPARQL that achieves
excellent performance by pursuing a RISC-style architecture with streamlined
indexing and query processing.

The physical design is identical for all RDF-3X databases regardless of their
workloads, and completely eliminates the need for index tuning by maintaining
exhaustive indexes for all permutations of subject-property-object triples and
their binary and unary projections. These indexes are highly compressed, and
the query processor can aggressively leverage fast merge joins that make
excellent use of processor caches. The query optimizer is able to choose optimal
join orders even for complex queries, with a cost model that includes
statistical synopses for entire join paths. Although RDF-3X is optimized for
queries, it also provides good support for efficient online updates by means of
a staging architecture: direct updates to the main database indexes are
deferred, and instead applied to compact differential indexes which are later
merged into the main indexes in a batched manner.
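
The indexing, merge-join, and staging ideas described above can be illustrated
with a small sketch. This is not the RDF-3X implementation: it omits
compression, the projection indexes, the optimizer, and deletions, and every
name in it (build_indexes, scan, merge_join, StagedIndex) is hypothetical and
chosen only for this illustration.

```python
from itertools import permutations


def build_indexes(triples):
    """Build one sorted list per ordering of (subject, predicate, object)."""
    unique = set(triples)
    indexes = {}
    for order in permutations("spo"):
        pos = ["spo".index(c) for c in order]
        indexes["".join(order)] = sorted(tuple(t[p] for p in pos) for t in unique)
    return indexes


def scan(index, first):
    """Return (second, third) pairs of entries whose first component matches.

    A linear filter keeps the sketch short; a real index would use a range
    lookup. The result stays sorted because the index itself is sorted.
    """
    return [(b, c) for (a, b, c) in index if a == first]


def merge_join(left, right):
    """Join two lists of (key, value) pairs that are both sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    out.append((lk, lval, rval))
            i, j = i_end, j_end
    return out


class StagedIndex:
    """Sorted main index plus a small differential index for new triples."""

    def __init__(self, triples=()):
        self.main = sorted(set(triples))
        self.delta = set()

    def insert(self, triple):
        # Updates touch only the differential index, never the main one.
        self.delta.add(triple)

    def scan(self):
        # Queries see the union of main and differential entries.
        return sorted(set(self.main) | self.delta)

    def merge(self):
        # Batched merge of the differential index into the main index.
        self.main = self.scan()
        self.delta = set()


triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("dave", "knows", "bob"),
    ("bob", "livesIn", "Berlin"),
    ("carol", "livesIn", "Paris"),
]
idx = build_indexes(triples)
known_by = scan(idx["pos"], "knows")    # (object, subject), sorted by object
lives_in = scan(idx["pso"], "livesIn")  # (subject, object), sorted by subject
print(merge_join(known_by, lives_in))
```

Because every ordering is indexed, both inputs of the join are already sorted
on the join variable, so no sort step is needed before merging. That is the
property the sketch is meant to show.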

Experimental studies with several large-scale datasets with more than 50
million RDF triples and benchmark queries that include pattern matching,
many-way star joins, and long path joins demonstrate that RDF-3X can outperform
the previously best alternatives by one or two orders of magnitude.

Attachment: MPI-I-2009-5-003_neu.pdf (369 KBytes)
URL to this document: http://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/2009-5-003
BibTeX:
@TECHREPORT{NeumannWeikum,
  AUTHOR = {Neumann, Thomas and Weikum, Gerhard},
  TITLE = {The RDF-3X engine for scalable management of RDF data},
  TYPE = {Research Report},
  INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
  ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
  NUMBER = {MPI-I-2009-5-003},
  MONTH = {March},
  YEAR = {2009},
  ISSN = {0946-011X},
}