MPI-INF/SWS Research Reports 1991-2021

MPI-I-2008-5-004

SOFIE: a self-organizing framework for information extraction

Suchanek, Fabian and Sozio, Mauro and Weikum, Gerhard

November 2008, 49 pages.

Status: available - back from printing

This paper presents SOFIE, a system for automated ontology extension. SOFIE can parse natural language documents, extract ontological facts from them and link the facts into an ontology. SOFIE uses logical reasoning on the existing knowledge and on the new knowledge in order to disambiguate words to their most probable meaning, to reason on the meaning of text patterns and to take into account world knowledge axioms. This allows SOFIE to check the plausibility of hypotheses and to avoid inconsistencies with the ontology. The framework of SOFIE unites the paradigms of pattern matching, word sense disambiguation and ontological reasoning in one unified model. Our experiments show that SOFIE delivers near-perfect output, even from unstructured Internet documents.
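The abstract does not spell out the reasoning procedure, so the following is only a toy sketch of the general idea of checking extracted hypotheses for plausibility against an existing ontology. It assumes a single hypothetical world-knowledge axiom (a relation such as `bornIn` allows one object per subject) and uses a brute-force search; the names `ONTOLOGY`, `FUNCTIONAL_RELATIONS`, `is_consistent`, and `select_hypotheses`, as well as the weights, are illustrative assumptions and not SOFIE's actual interface or algorithm.

```python
from itertools import combinations

# Toy ontology: known facts as (subject, relation, object) triples.
ONTOLOGY = {("Albert_Einstein", "bornIn", "Ulm")}

# Hypothetical world-knowledge axiom: these relations allow one object per subject.
FUNCTIONAL_RELATIONS = {"bornIn"}


def is_consistent(facts):
    """Return True if no functional relation maps a subject to two objects."""
    seen = {}
    for subj, rel, obj in facts:
        if rel in FUNCTIONAL_RELATIONS:
            # setdefault stores the first object seen and returns it afterwards.
            if seen.setdefault((subj, rel), obj) != obj:
                return False
    return True


def select_hypotheses(ontology, hypotheses):
    """Pick the highest-weight subset of hypotheses consistent with the ontology.

    hypotheses: dict mapping candidate triple -> confidence weight (assumed values).
    Brute force over all subsets; suitable for a toy example only.
    """
    candidates = list(hypotheses)
    best, best_weight = set(), 0.0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if is_consistent(ontology | set(subset)):
                weight = sum(hypotheses[h] for h in subset)
                if weight > best_weight:
                    best, best_weight = set(subset), weight
    return best


HYPOTHESES = {
    ("Albert_Einstein", "bornIn", "Munich"): 0.4,  # contradicts the ontology
    ("Max_Planck", "bornIn", "Kiel"): 0.7,
    ("Max_Planck", "bornIn", "Berlin"): 0.3,       # conflicts with the Kiel hypothesis
}

accepted = select_hypotheses(ONTOLOGY, HYPOTHESES)
```

In this sketch the Munich hypothesis is rejected because it clashes with a fact already in the ontology, and of the two mutually exclusive Planck hypotheses only the higher-weighted one is kept.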

  • Attachment: MPI-I-2008-5-004.pdf (410 KBytes)

URL to this document: https://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/2008-5-004

BibTeX
@TECHREPORT{SuchanekMauroWeikum2008,
  AUTHOR = {Suchanek, Fabian and Sozio, Mauro and Weikum, Gerhard},
  TITLE = {{SOFIE}: a self-organizing framework for information extraction},
  TYPE = {Research Report},
  INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
  ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
  NUMBER = {MPI-I-2008-5-004},
  MONTH = {November},
  YEAR = {2008},
  ISSN = {0946-011X},
}