on the topic:
Information Retrieval on the World Wide Web
Abstract:
Information retrieval is the process of helping users find, use, and understand information in a given document collection in order to satisfy an information need. On the web, conventional information retrieval techniques work less well than on conventional document collections, because of the large size of the web and the wide variation in document quality.
We discuss two information retrieval problems on the web and present novel solutions. First, we discuss the ranking problem, namely, the problem of ordering the web pages retrieved by a search engine in decreasing order of relevance to the user query. Second, we discuss the similarity problem, namely, the problem of finding web pages that are similar to a given page. For both problems we present new algorithms, which for the first time combine connectivity analysis with content analysis. Connectivity analysis, which uses information about the hyperlink structure of the web, is based on graph algorithms; content analysis, which uses information about the contents of web pages, is based on conventional information retrieval techniques. According to a user study, our ranking algorithm increases the precision at 10 (i.e., the number of relevant pages within the first 10 pages) by 45% over the algorithms currently in use.
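As a rough illustration of the "precision at 10" measure mentioned in the abstract, the following minimal Python sketch counts the relevant pages among the first 10 results and also reports that count as a fraction. The function names, the page identifiers, and the relevance judgments are hypothetical and chosen only for illustration; they are not taken from the talk or from the user study.

```python
def relevant_in_top_k(ranked_pages, relevant_pages, k=10):
    """Count how many of the first k ranked pages are judged relevant."""
    return sum(1 for page in ranked_pages[:k] if page in relevant_pages)


def precision_at_k(ranked_pages, relevant_pages, k=10):
    """Express the count of relevant pages in the first k positions as a fraction of k."""
    return relevant_in_top_k(ranked_pages, relevant_pages, k) / k


# Hypothetical example: 20 ranked results, 5 pages judged relevant by users.
ranked = ["p%d" % i for i in range(1, 21)]
relevant = {"p1", "p3", "p4", "p8", "p15"}

count = relevant_in_top_k(ranked, relevant)
fraction = precision_at_k(ranked, relevant)
print(count, fraction)
```

Because the fraction is just the count divided by 10, a relative improvement such as the 45% reported above is the same whichever of the two forms is used.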
All interested parties are cordially invited to attend the talk.