Democratic Community-based Search with XML Full-Text Queries

Emiran Curtmola
UC San Diego
AG 5  
Tuesday, 9 June 2009
60 Minutes
E1 4


As the web evolves, it is becoming easier to form communities based on
shared interests, and to create and publish data on a wide variety of
topics. With this democratization of information creation comes the
natural desire to query the global collection that is the union of all
local data collections within the community. To fully deliver on the
promise of free data exchange, any community-supporting infrastructure
must be resistant to censorship by third parties.
Censorship resistance precludes some obvious approaches that reuse and
build on existing centralized technologies, e.g., search engines, hosted
online communities, etc. The talk introduces a distributed framework for
disseminating queries in online communities. The framework supports both
democratized publishing and efficient search with powerful full-text
queries.

We address two challenging issues. Given the virtual nature of the
global data collection, we first study the problem of efficiently
locating the publishers in the community that hold data items matching
a user query. We propose a novel distributed infrastructure in which
data resides only with the publishers owning it. The infrastructure
disseminates user queries to publishers, who answer them at their own
discretion, under data-location anonymity constraints: the
query-forwarding infrastructure must not leak information about which
publishers are capable of answering a given query.

Second, we study how publishers can efficiently process incoming queries
over their local repositories. Given that the commonly used data model
for information exchange on the Web is semi-structured, we propose
algorithms for evaluation and optimization of expressive XML queries
(e.g., W3C XQuery Full-Text) that integrate structure and full-text search.
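
As a rough illustration of what combining structure and full-text search
means, the sketch below filters elements by a structural path and then by
keyword containment. It is not the algorithm presented in the talk and is
not XQuery Full-Text: the function name matches, the sample document, and
the whitespace-token matching are all assumptions chosen for brevity, using
only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def matches(doc_xml, path, keywords):
    """Return the text of elements on `path` that contain every keyword.

    Hypothetical helper: `path` is the structural condition, `keywords`
    the full-text condition (case-insensitive, whole-token match).
    """
    root = ET.fromstring(doc_xml)
    hits = []
    for elem in root.findall(path):
        # itertext() yields the text of the element and all its descendants
        text = " ".join(elem.itertext())
        tokens = set(text.lower().split())
        if all(k.lower() in tokens for k in keywords):
            hits.append(" ".join(text.split()))
    return hits


# Made-up sample repository of a single publisher
doc = """
<library>
  <book><title>Distributed Search</title>
        <abstract>Censorship resistant query routing</abstract></book>
  <book><title>Cooking Basics</title>
        <abstract>Simple recipes for beginners</abstract></book>
</library>
"""

# Structural part: abstracts of books; full-text part: both keywords present
print(matches(doc, "book/abstract", ["censorship", "query"]))
```

A real engine would push the keyword condition into an index instead of
scanning every candidate element, which is where evaluation and
optimization algorithms become relevant.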


