The envisioned Semantic Web aims to provide richly annotated
and explicitly structured Web pages in XML, RDF, or
description logics, based upon underlying ontologies and thesauri.
Ideally, this should enable a wealth of query processing
and semantic reasoning capabilities using XQuery and logical inference
engines.
However, I believe that the diversity and uncertainty of
terminologies and schema-like annotations will make precise
querying on a Web scale extremely elusive, if not hopeless, and the
same argument holds for large-scale dynamic federations of
Deep Web sources.
Therefore, ontology-based reasoning and querying need to be
enhanced by statistical means, leading to relevance-ranked lists as
query results.
This talk presents steps towards such a "statistically semantic" Web
and outlines technical challenges. I discuss how statistically
quantified ontological relations can be exploited in XML retrieval,
how statistics can help in making Web-scale search efficient, and
how statistical information extracted from users' query logs and click streams
can be leveraged for better search result ranking.
I believe these are decisive issues for improving the quality of
next-generation search engines for intranets, digital libraries, and the Web,
and they are also crucial for peer-to-peer collaborative Web search.