From Text to Geographic Space: Toponym Resolution in Text
=========================================================
Jochen Leidner
School of Informatics
University of Edinburgh

Traditionally, named entity recognition and classification (NERC)
comprises the sub-tasks of identifying a text span and classifying it,
but this view ignores the relationship between the entities and the
world. Spatial and temporal entities ground events in space-time, and
this relationship is vital for applications such as question
answering, event tracking, and automatic map visualisation. There is
much recent work regarding the temporal dimension (Setzer and
Gaizauskas, 2002), but no detailed study of the spatial dimension.
I propose to investigate how spatial named entities (which are often
referentially ambiguous) can be automatically resolved with respect to
an extensional coordinate model ('toponym resolution'; Leidner, 2004).
To this end, various information sources including linguistic cue
patterns, co-occurrence information, discourse/positional information,
world knowledge (such as size and population) as well as minimality
heuristics (Leidner et al., 2003) can be combined.
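
To make the idea of combining evidence concrete, here is a minimal sketch of how two of the information sources named above might interact. It is an illustration under stated assumptions, not the method of Leidner et al. (2003): "minimality" is read here as "prefer the joint interpretation whose referents lie closest together", with population used only as a tie-breaker. The `Candidate` type, the `resolve` function, and the toy gazetteer entries (with approximate coordinates and made-up round populations) are all hypothetical.

```python
import itertools
import math
from typing import NamedTuple


class Candidate(NamedTuple):
    """One possible referent of a toponym (hypothetical record layout)."""
    label: str
    lat: float
    lon: float
    population: int  # illustrative world-knowledge feature


def haversine_km(a: Candidate, b: Candidate) -> float:
    """Great-circle distance between two candidates in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def spread_km(combo) -> float:
    """Sum of pairwise distances: smaller means a more compact reading."""
    return sum(haversine_km(a, b) for a, b in itertools.combinations(combo, 2))


def resolve(candidates_by_toponym: dict) -> dict:
    """Pick one candidate per toponym, minimising spread, then maximising population.

    Exhaustive search over all combinations; fine for a toy example only.
    """
    names = list(candidates_by_toponym)
    best = min(
        itertools.product(*(candidates_by_toponym[n] for n in names)),
        key=lambda combo: (spread_km(combo), -sum(c.population for c in combo)),
    )
    return dict(zip(names, best))


# Toy gazetteer: coordinates approximate, populations invented for illustration.
toy = {
    "Cambridge": [
        Candidate("Cambridge, UK", 52.2053, 0.1218, 120_000),
        Candidate("Cambridge, MA", 42.3736, -71.1097, 100_000),
    ],
    "Boston": [
        Candidate("Boston, UK", 52.9780, -0.0266, 40_000),
        Candidate("Boston, MA", 42.3601, -71.0589, 600_000),
    ],
}
resolved = resolve(toy)
print({name: c.label for name, c in resolved.items()})
```

In this toy document the two Massachusetts readings are only a few kilometres apart, so they are preferred over the English pair. A real system would weigh such a heuristic against linguistic cues and discourse information rather than apply it alone.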

However, to embark on a comparative study of algorithms
proposed in the past, I argue it is necessary to curate a reference
resource for evaluating these methods. In partial analogy to the Word
Sense Disambiguation (WSD) task, such a resource comprises a static
gazetteer (geographic thesaurus) snapshot and a textual corpus in which
LOCATIONs are marked up as such, and enriched with latitude/longitude
information.
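
As a rough sketch of what such a resource makes possible, the following shows a fixed gazetteer snapshot, gold-standard toponym annotations that point into it, and a simple accuracy score for a system's output. All names here (`GazetteerEntry`, `ToponymAnnotation`, `accuracy`, the identifiers such as `g1`) are hypothetical and chosen for illustration; they do not reflect the actual gazetteer or the TRML scheme, and plain accuracy is only a stand-in for whatever metric an evaluation would finally adopt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    """One referent in a frozen gazetteer snapshot (hypothetical layout)."""
    entry_id: str
    name: str
    lat: float
    lon: float


@dataclass(frozen=True)
class ToponymAnnotation:
    """A LOCATION span in the corpus, linked to its gold gazetteer entry."""
    doc_id: str
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    surface: str
    gold_entry_id: str


def accuracy(gold: list, predicted: dict) -> float:
    """Fraction of gold spans for which the system chose the gold entry.

    `predicted` maps (doc_id, start, end) to a gazetteer entry_id;
    spans the system left unresolved count as errors.
    """
    if not gold:
        return 0.0
    correct = sum(
        1 for a in gold
        if predicted.get((a.doc_id, a.start, a.end)) == a.gold_entry_id
    )
    return correct / len(gold)


# Toy data: coordinates approximate, identifiers invented.
gazetteer = {
    "g1": GazetteerEntry("g1", "Cambridge", 52.2053, 0.1218),
    "g2": GazetteerEntry("g2", "Cambridge", 42.3736, -71.1097),
}
gold = [
    ToponymAnnotation("doc1", 10, 19, "Cambridge", "g1"),
    ToponymAnnotation("doc2", 42, 51, "Cambridge", "g2"),
]
system_output = {("doc1", 10, 19): "g1", ("doc2", 42, 51): "g1"}
score = accuracy(gold, system_output)
print(score)
```

Because both the gazetteer and the annotations are static, two algorithms scored this way are compared on exactly the same ambiguity, which is the point of curating a shared snapshot.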

I report on the curation of the first reference corpus for the
toponym resolution task (Leidner, 2004; Leidner, in press). In this
synchronic corpus of present-day written English news, toponyms are
marked up with geographic latitude/longitude coordinates. I briefly
describe the construction of the reference gazetteer, the XML-based
markup scheme TRML, the new Web-based annotation tool TAME, and the
resulting dataset.

I then sketch the work ahead (including a proposal for a
new evaluation metric) and the big picture, which this research is
but a small part of: curators of Digital Libraries want to enable
their collections for geographic browsing, analysts (e.g. in
competitive marketing and intelligence) need maps to visualise events
in news, and we all want Web search engines to be location-aware so
as to be able to find the pizza takeaway that is closest to
Stuhlsatzenhausweg.