MPI-I-2009-5-005
Towards a Universal Wordnet by learning from combined evidence
de Melo, Gerard and Weikum, Gerhard
December 2009, 32 pages.
Status: available - back from printing
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora.
Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
Attachment: mpi-i-2009-5-005.pdf (717 KBytes)
URL to this document: https://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/2009-5-005
BibTeX
@TECHREPORT{deMeloWeikum2009,
AUTHOR = {de Melo, Gerard and Weikum, Gerhard},
TITLE = {Towards a Universal Wordnet by learning from combined evidence},
TYPE = {Research Report},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
NUMBER = {MPI-I-2009-5-005},
MONTH = {December},
YEAR = {2009},
ISSN = {0946-011X},
}