Campus Event Calendar

Event Entry

What and Who

From Text to Geographic Space: Toponym Resolution in Text

Jochen Leidner
School of Informatics, University of Edinburgh
Talk
AG 1, AG 2, AG 3, AG 4, AG 5  
AG Audience

Date, Time and Location

Wednesday, 9 November 2005
10:00
60 Minutes
46.1 - MPII
024
Saarbrücken

Abstract

Traditionally, named entity recognition and classification (NERC)
comprises the sub-tasks of identifying a text span and classifying it,
but this view ignores the relationship between the entities and the
world. Spatial and temporal entities ground events in space-time, and
this relationship is vital for applications such as question
answering, event tracking, and automatic map visualisation. There is
much recent work regarding the temporal dimension (Setzer and
Gaizauskas, 2002), but no detailed study of the spatial dimension. I
propose to investigate how spatial named entities (which are often
referentially ambiguous) can be automatically resolved with respect to
an extensional coordinate model ('toponym resolution'; Leidner, 2004).
    To this end, various information sources including linguistic cue
patterns, co-occurrence information, discourse/positional information,
world knowledge (such as size and population) as well as minimality
heuristics (Leidner et al., 2003) can be combined.
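
    As a rough illustration of how two such information sources could
disagree, the following Python sketch resolves toponyms against a toy
gazetteer, once by preferring the most populous candidate and once by
a simple spatial minimality rule. Everything in it is an assumption
made for illustration: the gazetteer entries (coordinates and
population figures are rough), the names Candidate, GAZETTEER,
resolve_by_population and resolve_by_minimality, and the reading of
"minimality" as the smallest sum of pairwise distances. It is not
necessarily the formulation used in the cited work.

```python
from dataclasses import dataclass
from itertools import combinations, product
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Candidate:
    """One possible referent of a toponym (hypothetical record layout)."""
    name: str
    country: str
    lat: float
    lon: float
    population: int  # rough, illustrative figure only


# Toy gazetteer: approximate coordinates, illustrative populations.
GAZETTEER = {
    "London": [
        Candidate("London", "GB", 51.51, -0.13, 8_000_000),
        Candidate("London", "CA", 42.98, -81.25, 400_000),
    ],
    "Cambridge": [
        Candidate("Cambridge", "GB", 52.21, 0.12, 130_000),
        Candidate("Cambridge", "US", 42.37, -71.11, 110_000),
    ],
    "Boston": [
        Candidate("Boston", "US", 42.36, -71.06, 650_000),
        Candidate("Boston", "GB", 52.98, -0.03, 40_000),
    ],
}


def haversine_km(a: Candidate, b: Candidate) -> float:
    """Great-circle distance in kilometres between two candidates."""
    phi1, phi2 = radians(a.lat), radians(b.lat)
    dphi = phi2 - phi1
    dlam = radians(b.lon - a.lon)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def resolve_by_population(toponyms):
    """World-knowledge heuristic: pick the most populous candidate."""
    return [max(GAZETTEER[t], key=lambda c: c.population) for t in toponyms]


def resolve_by_minimality(toponyms):
    """Pick the joint assignment with the smallest total pairwise distance.

    This is a simplified stand-in for a spatial minimality heuristic; it
    enumerates all combinations, so it only suits short toponym lists.
    """
    def spread(assignment):
        return sum(haversine_km(a, b) for a, b in combinations(assignment, 2))

    options = product(*(GAZETTEER[t] for t in toponyms))
    return list(min(options, key=spread))


if __name__ == "__main__":
    mentions = ["Cambridge", "Boston"]
    for resolver in (resolve_by_population, resolve_by_minimality):
        picked = resolver(mentions)
        print(resolver.__name__, [(c.name, c.country) for c in picked])
```

For the two mentions in the usage example, the population rule treats
each toponym independently, whereas the minimality rule considers them
jointly and so favours referents that lie close together. A combined
system would need some way of weighting such sources against each
other, which this sketch does not attempt.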
    However, to embark on a comparative study of algorithms
proposed in the past, we argue it is necessary to curate a reference
resource for evaluating these methods. In partial analogy to the Word
Sense Disambiguation (WSD) task, such a resource comprises a static
gazetteer (geographic thesaurus) snapshot and a textual corpus in which
LOCATIONs are marked up as such, and enriched with latitude/longitude
information.
    I report on the curation of the first reference corpus for the
toponym resolution task (Leidner, 2004; Leidner, in press). In this
synchronic corpus of present-day written English news, toponyms are
marked up with geographic latitude/longitude coordinates. I briefly
describe the construction of the reference gazetteer, the XML-based
markup scheme TRML, the new Web-based annotation tool TAME, and the
resulting dataset.
    Then I give a sketch of the work ahead (including proposing a
new evaluation metric) and the big picture, which this research is
but a small part of: curators of Digital Libraries want to enable
their collections for geographic browsing, analysts (e.g. in
competitive marketing and intelligence) need maps to visualise events
in news, and we all want Web search engines to be location-aware so
as to be able to find the pizza takeaway that is closest to
Stuhlsatzenhausweg.

http://www.iccs.informatics.ed.ac.uk/~s0239229/documents/publications.html

Contact

Thomas Neumann
518

Petra Schaaf, 11/03/2005 08:32 -- Created document.