MPI-INF Logo
Campus Event Calendar

Event Entry

What and Who

Efficient Main Memory-based XML Stream Processing

Stefanie Scherzinger
Fachrichtung Informatik - Saarbrücken
Vortrag
AG 5  
AG Audience
English

Date, Time and Location

Thursday, 23 August 2007
14:00
45 Minutes
E1 4
433
Saarbrücken

Abstract

XML data management is by no means restricted to XML databases, as
there are manifold applications that process XML documents as
files or streams. These applications are naturally main-memory based,
which makes main memory the bottleneck for  scalability.
The streaming evaluation of XPath queries  has been worked
on extensively in the past, and state-of-the-art techniques use
very little memory.

However, for data transformations with real-world XQueries, as opposed
to just XPath filtering, the need for substantial main memory buffers
cannot be avoided in general. An important task is thus to devise a
well-principled machinery for XQuery processing that is parsimonious
with resources and treats buffering as an optimization target.

We present a toolkit for effective buffer management in XQuery processors.
Our toolkit comprises three compatible buffer management techniques.

First, in the spirit of XML document projection, we only
load data that is relevant to query evaluation into main memory
buffers. We present a novel implementation that
takes string matching algorithms designed for efficient
keyword search in  flat strings into the second dimension,
to navigate in tree-structured data.
Different from existing prefiltering schemes, we usually process
only fractions of the input and get by with very economical
consumption of both main memory and processing time.

Second, we introduce an extension of the XQuery  language, called FluX,
that supports event-based query processing and the conscious handling
of main memory buffers. Purely event-based queries of this language can be
executed on streaming XML data in a very direct way. We then develop
an algorithm to efficiently rewrite XQueries into the event-based FluX
language. This algorithm uses order constraints from a DTD to schedule
event handlers and to thus  minimize the amount of buffering required
for evaluating a query.

Third, we purge buffers preemptively from data that is no longer relevant
to query evaluation. This is accomplished by extending garbage collection
techniques originally developed for programming languages to operate on
tree-structured data. By the combined effort of static and dynamic analysis,
we timely purge buffers and effectively reduce their size.

Our prototype implementations and extensive experiments demonstrate the
efficiency of our techniques.


To compare our contributions to related work in a systematic manner, we
introduce an abstract framework for XML stream processing. Our model
captures XML stream tokens as well as the current buffer contents, and lets
us model different architectures for XQuery evaluation. Using our framework,
we show the interoperability of our tools with existing XQuery optimization
techniques.

Contact

Petra Schaaf
500
--email hidden
passcode not visible
logged in users only

Petra Schaaf, 07/23/2007 13:16 -- Created document.