MPI-INF/SWS Research Reports 1991-2021


MPI-I-2009-5-005

Towards a Universal Wordnet by learning from combined evidence

de Melo, Gerard and Weikum, Gerhard

December 2009, 32 pages.

Status: available - back from printing

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
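The iterative integration of evidence mentioned in the abstract can be illustrated with a minimal sketch. This is not the report's actual method: the report uses learned models and richer scoring functions, whereas the sketch below substitutes a simple vote-counting score and a fixed acceptance threshold. The synset identifiers, the translation pairs, and the names `score_links` and `build_graph` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy seed lexicon: English words -> synset IDs.
# These identifiers are hypothetical, not real WordNet IDs.
SEED = {
    "bank": {"bank.n.01", "bank.n.02"},
    "shore": {"bank.n.02"},
}

# Toy translation evidence as (word, neighbour) pairs: `word` may inherit
# senses from `neighbour`. The data is invented for illustration.
TRANSLATIONS = [
    ("de:Ufer", "bank"),
    ("de:Ufer", "shore"),
    ("fr:rive", "de:Ufer"),
]


def score_links(translations, linked):
    """Score candidate (word, synset) links using already linked neighbours."""
    scores = defaultdict(float)
    support = defaultdict(int)
    for word, neighbour in translations:
        synsets = linked.get(neighbour)
        if word in linked or not synsets:
            continue  # word already linked, or neighbour has no senses yet
        support[word] += 1
        for synset in synsets:
            # An ambiguous neighbour spreads its single vote over its senses.
            scores[(word, synset)] += 1.0 / len(synsets)
    # Normalise by the number of neighbours that contributed evidence.
    return {(w, s): v / support[w] for (w, s), v in scores.items()}


def build_graph(seed, translations, threshold=0.6, max_rounds=5):
    """Repeatedly accept high-scoring links; new links feed later rounds."""
    linked = {word: set(synsets) for word, synsets in seed.items()}
    for _ in range(max_rounds):
        candidates = score_links(translations, linked)
        accepted = [key for key, score in candidates.items() if score >= threshold]
        if not accepted:
            break
        for word, synset in accepted:
            linked.setdefault(word, set()).add(synset)
    return linked


if __name__ == "__main__":
    for word, synsets in sorted(build_graph(SEED, TRANSLATIONS).items()):
        print(word, sorted(synsets))
```

In this toy run the German word is linked in the first round from its two English translations, and the French word can only be linked in the second round, because its sole evidence is the German word. That is the sense in which the integration is iterative. The threshold of 0.6 is an arbitrary value picked for the example.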

  • Attachment: mpi-i-2009-5-005.pdf (717 KBytes)

URL to this document: https://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/2009-5-005

BibTeX
@TECHREPORT{deMeloWeikum2009,
  AUTHOR = {de Melo, Gerard and Weikum, Gerhard},
  TITLE = {Towards a Universal Wordnet by learning from combined evidence},
  TYPE = {Research Report},
  INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
  ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
  NUMBER = {MPI-I-2009-5-005},
  MONTH = {December},
  YEAR = {2009},
  ISSN = {0946-011X},
}