Campus Event Calendar: String and Tree Pattern Generalization for n-Ary Information

Event Entry

What and Who

Benjamin Habegger

String and Tree Pattern Generalization for n-Ary Information Extraction from the Web.

Max-Planck-Institut für Informatik - AG 5

Talk

AG 1, AG 2, AG 3, AG 4, AG 5

AG Audience

Date, Time and Location

Wednesday, 7 September 2005

14:00

90 Minutes

46.1 - MPII

024

Saarbrücken

Abstract

Currently, data from online sources is delivered in a presentational format (HTML), which makes it difficult to use in an automated process. The problem of information extraction from the Web consists of building patterns, based on presentational clues, that extract information for a specific task from a specific source of information. Our approach to information extraction is to use machine learning techniques to build extraction patterns. While the problem of unary extraction (i.e., learning patterns that extract lists of single items) has been studied extensively, few works consider the problem of n-ary extraction (i.e., extracting tuples of items). In this talk we will present pattern generalization for n-ary information extraction from the Web. HTML documents can be considered either as strings or as trees. In the first part, a string-based approach to pattern generalization will be presented. It is based on extracting the contexts of the desired information and generalizing them into patterns. With few examples, and without decomposing the examples, this method makes it possible to build n-ary patterns directly. A thorough evaluation of this technique on different Web sources has been conducted, showing its efficiency on real-world sources. In the second part, our ongoing work on tree-pattern generalization will be presented.
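
To make the idea of generalizing string contexts into an n-ary pattern more concrete, here is a minimal Python sketch. It is not the method presented in the talk: it is a much simpler delimiter-based illustration that keeps only the text shared by the contexts of all example tuples and turns it into a regular expression. The function names (`common_suffix`, `contexts`, `generalize`), the sample page, and the requirement that separators between items be identical across examples are all assumptions made for this illustration.

```python
import re
from os.path import commonprefix

# Hypothetical sample page: a table of (name, email) pairs.
PAGE = (
    "<table>"
    "<tr><td>Ada</td><td>ada@example.org</td></tr>"
    "<tr><td>Bob</td><td>bob@example.org</td></tr>"
    "<tr><td>Cy</td><td>cy@example.org</td></tr>"
    "</table>"
)


def common_suffix(strings):
    """Longest string that every input ends with."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def contexts(page, example):
    """Return (left context, separators, right context) of one example tuple."""
    spans = []
    cursor = 0
    for item in example:
        start = page.index(item, cursor)  # ValueError if the item is absent
        spans.append((start, start + len(item)))
        cursor = start + len(item)
    left = page[:spans[0][0]]
    seps = [page[spans[i][1]:spans[i + 1][0]] for i in range(len(spans) - 1)]
    right = page[spans[-1][1]:]
    return left, seps, right


def generalize(page, examples):
    """Build one regex from the context text shared by all example tuples."""
    ctxs = [contexts(page, ex) for ex in examples]
    left = common_suffix([c[0] for c in ctxs])
    right = commonprefix([c[2] for c in ctxs])
    arity = len(examples[0])
    parts = []
    if left:
        # Lookbehind: the left context must precede a match but is not consumed.
        parts.append("(?<=" + re.escape(left) + ")")
    for i in range(arity):
        parts.append("(.+?)")
        if i < arity - 1:
            seps = {c[1][i] for c in ctxs}
            if len(seps) != 1:
                # Simplifying assumption of this sketch, not of the talk.
                raise ValueError("separators differ between examples")
            parts.append(re.escape(seps.pop()))
    if right:
        # Lookahead: the right context must follow a match but is not consumed.
        parts.append("(?=" + re.escape(right) + ")")
    return re.compile("".join(parts), re.DOTALL)


if __name__ == "__main__":
    examples = [("Ada", "ada@example.org"), ("Cy", "cy@example.org")]
    pattern = generalize(PAGE, examples)
    print(pattern.findall(PAGE))
```

Given the first and last rows as examples, the learned pattern also extracts the unseen row for Bob as a complete tuple, without splitting the examples into separate unary extraction problems. The choice of examples matters in this sketch: contexts are only ever shortened to what all examples share, so examples drawn from similar positions on the page can yield a pattern that is too specific.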

Contact

Jens Graupmann

Adriana Davidescu, 09/01/2005 09:35
Adriana Davidescu, 08/23/2005 12:14 -- Created document.
