Campus Event Calendar

Event Entry

What and Who

Text + Time Search & Analytics

Klaus Berberich
Max-Planck-Institut für Informatik - D5
Joint MPI-INF/MPI-SWS Lecture Series
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
MPI Audience
English

Date, Time and Location

Wednesday, 5 December 2012
12:15
60 Minutes
E1 5
002
Saarbrücken

Abstract

Today, archives exist of both born-digital documents (e.g., snapshots
of web pages from a decade ago) and now-digital documents (e.g.,
scanned books published centuries ago). Along with their text content,
documents in these archives carry temporal information in the form of
publication timestamps and the temporal expressions they contain.

Search on this data, on the one hand, has many facets, and my talk
will cover two of them: (i) Time-travel text search lets users search
a collection of versioned documents "as of" a given time in the past.
I will show how this can be done efficiently by using the right index
structures. (ii) Temporal expressions are key to satisfying users'
temporal information needs (e.g., to find out about 16th-century
artists). I will explain how statistical language models, the state of
the art in information retrieval, can be made aware of temporal
expressions and their inherent semantics.
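
To make the idea of searching "as of" a past time concrete, here is a
minimal sketch in Python. It is not the index structure presented in
the talk: it assumes a plain in-memory inverted index whose postings
carry a validity interval [begin, end), and the names TimeTravelIndex,
add_version and search, as well as the sample pages and years, are
hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

INF = float("inf")


class Posting(NamedTuple):
    doc_id: str
    begin: float  # time at which this version became valid (inclusive)
    end: float    # time at which it was superseded (exclusive)


class TimeTravelIndex:
    """Toy inverted index over versioned documents (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> list of Posting

    def add_version(self, doc_id, text, begin, end=INF):
        # One posting per distinct term, tagged with the version's lifetime.
        for term in set(text.lower().split()):
            self.postings[term].append(Posting(doc_id, begin, end))

    def search(self, query, as_of):
        # Conjunctive query: keep documents whose version valid at `as_of`
        # contains every query term. Assumes versions of one document
        # have non-overlapping validity intervals.
        result = None
        for term in query.lower().split():
            docs = {p.doc_id for p in self.postings.get(term, [])
                    if p.begin <= as_of < p.end}
            result = docs if result is None else result & docs
        return result or set()


# Hypothetical toy collection: two versions of page1, one version of page2.
index = TimeTravelIndex()
index.add_version("page1", "max planck institute", 2000, 2005)
index.add_version("page1", "max planck institute for informatics", 2005)
index.add_version("page2", "informatics campus", 2003)
```

In this sketch, every lookup scans the full posting list of a term and
filters by time; an index built for efficiency would presumably
organize postings so that only those valid at the query time are read.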

Analyses of this data, on the other hand, can lead to interesting
insights. One can find out, for instance, whether "the lord of the
rings" or "lord of the flies" is mentioned more often and how the
usage of these two n-grams has evolved over time. I will talk about
how n-gram (time series) statistics can be computed at scale using
MapReduce.
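
As a rough illustration of n-gram time series counting in the
MapReduce style, the following Python sketch simulates the map,
shuffle and reduce phases in a single process. It is an assumption
about how such a job could look, not the method from the talk; the
function names and the three toy documents are made up for the
example.

```python
from collections import defaultdict


def map_doc(year, text, n):
    # Map phase: emit ((ngram, year), 1) for every n-gram in one document.
    tokens = text.lower().split()
    for i in range(len(tokens) - n + 1):
        yield (" ".join(tokens[i:i + n]), year), 1


def reduce_counts(key, values):
    # Reduce phase: sum all counts emitted for one (ngram, year) key.
    return key, sum(values)


def ngram_time_series(docs, n):
    # Shuffle phase, simulated in memory: group emitted values by key.
    groups = defaultdict(list)
    for year, text in docs:
        for key, value in map_doc(year, text, n):
            groups[key].append(value)
    # Collect reducer output as ngram -> {year: count}.
    series = defaultdict(dict)
    for key, values in groups.items():
        (ngram, year), count = reduce_counts(key, values)
        series[ngram][year] = count
    return series


# Made-up toy input: (publication year, text) pairs.
docs = [
    (1954, "lord of the flies"),
    (1954, "the lord of the rings"),
    (2001, "the lord of the rings and the lord of the rings"),
]
series = ngram_time_series(docs, n=4)
```

Here series["lord of the rings"] maps each year to the number of
occurrences in that year. A real job would distribute the map and
reduce functions across machines, and the hard part at scale is the
volume of intermediate (n-gram, year) pairs, which this sketch ignores.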

The talk will wind down with a collection of challenges that I
consider interesting for future research.

Contact

Jennifer Müller
2900

Jennifer Müller, 11/16/2012 11:37
Jennifer Müller, 09/27/2012 10:07 -- Created document.