Electronic Proceedings Article
@InProceedings
Online article in proceedings, workshop


Author, Editor

Author(s):

Theobald, Martin
Schenkel, Ralf
Weikum, Gerhard


Editor(s):

Christophides, Vassilis
Freire, Juliana


Not MPII Editor(s):

Christophides, Vassilis
Freire, Juliana

BibTeX cite key*:

TheobaldSW03a

Title, Conference

Title*:

Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML Data

Booktitle*:

6th International Workshop on the Web and Databases (WebDB-03)

Event Address*:

San Diego, USA

URL of the conference:

http://www.cse.ogi.edu/webdb03/

Event Date*:
(no longer used):

June 12-13, 2003

URL for downloading the paper:

http://www.cse.ogi.edu/webdb03/papers/01.pdf

Event Start Date:

12 June 2003

Event End Date:

13 June 2003

Language:

English

Organization:


Publisher

Publisher's Name:

OGI School of Science and Engineering / CSE

Publisher's URL:


Address*:

Beaverton, USA

Type:


Vol, No, pp., Year

Series:


Volume:


Number:


Month:


Pages:

1-6



Sequence Number:


Year*:

2003

ISBN/ISSN:






Abstract, Links, ©

URL for Reference:


Note:

Acceptance ratio 1:4

(LaTeX) Abstract:

This paper investigates how to automatically classify non-schematic XML data into a user-defined topic directory. The main focus is on constructing appropriate feature spaces on which a classifier operates. In addition to the usual text-based term frequency vectors, we study XML twigs and tag paths as extended features that can be combined with text term occurrences in XML elements. Moreover, we show how to leverage ontological background information, more specifically, the WordNet thesaurus, for the construction of more expressive feature spaces. For efficiency, our implementation computes features incrementally and caches ontology entries. Our experiments demonstrate the improved accuracy of automatic classification based on the enhanced feature spaces.
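
The idea of combining tag paths with term occurrences can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tag_path_term_features`, the `max_path_len` parameter, the `path$term` feature naming, and the sample document are all assumptions made for illustration. Twig features and WordNet lookups are left out.

```python
from collections import Counter
import re
import xml.etree.ElementTree as ET


def tag_path_term_features(xml_text, max_path_len=2):
    """Count plain terms and (tag path, term) pairs for one XML document.

    Hypothetical helper: the feature naming scheme is an assumption,
    not taken from the paper.
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def visit(elem, path):
        path = path + [elem.tag]
        # Keep only the last max_path_len tags of the root-to-element path.
        suffix = "/".join(path[-max_path_len:])
        # Only the text directly before the first child is used; tail text
        # after child elements is ignored to keep the sketch short.
        for term in re.findall(r"[a-z0-9]+", (elem.text or "").lower()):
            features[term] += 1                # usual term-frequency feature
            features[f"{suffix}${term}"] += 1  # term qualified by its tag path
        for child in elem:
            visit(child, path)

    visit(root, [])
    return features


# Made-up sample document, used only to exercise the function.
doc = "<article><title>XML mining</title><body><p>mining XML data</p></body></article>"
features = tag_path_term_features(doc)
```

The resulting `Counter` can be handed to any vector-based classifier after mapping feature names to indices. Qualifying a term by its tag path lets the classifier distinguish, for example, a word in a title from the same word in body text.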

URL for the Abstract:




Tags, Categories, Keywords:

Classification, XML, Semistructured Data, Focused Crawling, SVM

HyperLinks / References / URLs:


Copyright Message:


Personal Comments:


Download
Access Level:

Public

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

AG5

Audience:

popular

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat

BibTeX Entry:
@INPROCEEDINGS{TheobaldSW03a,
AUTHOR = {Theobald, Martin and Schenkel, Ralf and Weikum, Gerhard},
EDITOR = {Christophides, Vassilis and Freire, Juliana},
TITLE = {Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of {XML} Data},
BOOKTITLE = {6th International Workshop on the Web and Databases (WebDB-03)},
PUBLISHER = {OGI School of Science and Engineering / CSE},
YEAR = {2003},
PAGES = {1--6},
ADDRESS = {Beaverton, USA},
NOTE = {Acceptance ratio 1:4},
}


Entry last modified by Ralf Schenkel, 03/29/2005

Edit History

Created: 11/20/2003 03:02:32 PM by Ralf Schenkel

Revision   Editor          Edit Date
11.        Ralf Schenkel   29.03.2005 16:11:04
10.        Ralf Schenkel   01.10.2004 16:07:12
9.         Uwe Brahm       08/18/2004 04:18:51 PM
8.         Anja Becker     12.07.2004 14:22:58
7.         Anja Becker     23.06.2004 12:34:15