Campus Event Calendar

Event Entry

New for: D3

What and Who

Distributed Text Analytics over Web Archives

Yagiz Kargin
International Max Planck Research School for Computer Science - IMPRS
IMPRS Research Seminar
AG 1, AG 3, AG 4, AG 5, SWS, RG1, MMCI
Public Audience
English

Date, Time and Location

Thursday, 29 April 2010
13:00
60 Minutes
E1 4
024
Saarbrücken

Abstract

Web archives are similar to history books, in which one can see what happened in the past. They accumulate and store valuable information together with the corresponding timestamps. Hence, temporal mining and analytics of this data are required to understand how things were done and recorded at a specific time in the past. However, as history grows with the passage of time, web archives also become increasingly large. Distributed approaches can help in this case. Our solution is to perform temporal analytics on query results over web archives using MapReduce, a programming model in which programs are automatically parallelized. To run distributed analytics over query results, the indexed data must be distributed across machines beforehand, which in turn requires distributed indexing.
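
As a rough illustration of the MapReduce idea mentioned in the abstract, the following minimal sketch counts term occurrences per year over a toy set of timestamped query results. It is not the speaker's system: the sample data, the function names, and the in-process "shuffle" step are all assumptions made for illustration, and a real deployment would run the map and reduce steps on a cluster framework.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy "query results": (year of capture, document text) pairs.
DOCS = [
    (2008, "election campaign election"),
    (2009, "financial crisis"),
    (2009, "crisis election"),
]

def map_doc(year, text):
    """Map step: emit ((year, term), 1) for every term in one document."""
    for term in text.split():
        yield (year, term), 1

def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one (year, term) key."""
    return key, sum(values)

def term_counts_by_year(docs):
    """Run map, shuffle, and reduce sequentially in a single process."""
    mapped = chain.from_iterable(map_doc(year, text) for year, text in docs)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

counts = term_counts_by_year(DOCS)
```

Because each map call looks at one document and each reduce call at one key, both steps can in principle be spread across machines without changing the functions themselves, which is the property the abstract relies on.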

Contact

Jennifer Gerling
225

Jennifer Gerling, 04/28/2010 09:53
Jennifer Gerling, 04/28/2010 09:50 -- Created document.