on the web. XPATH and XQUERY are standard query languages proposed for
querying XML data. These query languages have tree pattern queries as
basic building blocks. Finding a match for a given tree pattern on a set of
XML documents efficiently is an important operation. There have been
efforts to process the twig queries holistically(without breaking)
by converting both XML data tree and query twig into sequences and
performing a non-contiguous subsequence match of the query sequence on
data sequence to get the results. The existing sequence based approaches
suffer from the problem of generating duplicate results and false positives.
All of them require post-processing to get the valid results after
subsequence matching is done. The number of disk I/O's incurred during
subsequence match is high as it produces all the subsequence matches, of
which most are invalidated in post-processing phase.
To address the above issues, we propose a new way of converting XML
data into a sequence called "Modified prufer Sequence", which is based
on a variation of the Prufer sequences of graph theory. We also
propose an improved pattern matching algorithm that processes twig
queries holistically and produces results without requiring any post-
processing. In our approach validation is performed while performing
subsequence match and our results are free of false positives and
duplicates. We also present several optimization mechanisms including
parent-child optimization to speedup the query processing. Experimental
results show that our system performs better than the existing system.