Journal Article
@Article



Author, Editor(s)

Author(s):

Weikum, Gerhard




BibTeX cite key*:

Weikum00b

Title

Title*:

Review - XTRACT: A System for Extracting Document Type Descriptors from XML Documents

Journal

Journal Title*:

ACM SIGMOD Digital Review

Journal's URL:

http://www.eecs.umich.edu/digital-review/

Download URL
for the article:


Language:

English

Publisher

Publisher's
Name:

ACM

Publisher's URL:


Publisher's
Address:

New York, USA

ISSN:

-

Vol, No, pp, Date

Volume*:

2

Number:


Publishing Date:

2000

Pages*:

-

Number of
VG Pages:


Page Start:


Page End:


Sequence Number:


DOI:


Note, Abstract, ©

Note:


(LaTeX) Abstract:

The paper describes the architecture of XTRACT, a system for inferring an accurate, meaningful, near-optimal DTD schema for a repository of XML documents. It presents some very interesting ideas on an important and challenging subject.

The XTRACT system executes three steps:
1. Generalization (finding patterns in the input sequences and replacing them with regular expressions to generate more general candidate DTDs)
2. Factoring (factoring the candidate DTDs using adaptations of algorithms for the optimization of Boolean functions)
3. Applying the MDL principle (using the Minimum Description Length principle to find the near-optimal DTD among the candidates)
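
As a rough illustration of the first step, the sketch below collapses a sequence that is a pure repetition of a shorter unit into a starred regular expression. It is a toy under stated assumptions, not XTRACT's actual heuristic: the function name `generalize_repeats` is hypothetical, and each element name is assumed to be a single character.

```python
def generalize_repeats(seq: str) -> str:
    """Return a starred pattern if seq is 2+ repetitions of a shorter unit.

    Assumption: every element name is one character, so a sequence of
    child elements can be written as a plain string such as "ababab".
    """
    n = len(seq)
    # Try the shortest candidate unit first, so "aaaa" becomes "a*" not "(aa)*".
    for size in range(1, n // 2 + 1):
        unit = seq[:size]
        if n % size == 0 and unit * (n // size) == seq:
            return f"({unit})*" if size > 1 else f"{unit}*"
    # No repetition found: keep the sequence as its own (most specific) pattern.
    return seq


generalized = [generalize_repeats(s) for s in ["ab", "abab", "ababab", "aaa", "abc"]]
```

Each generalized pattern would then join the pool of candidate DTDs alongside the original sequences.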

The authors provide experimental results comparing XTRACT with DDbE (Data Descriptors by Example, a tool from IBM alphaWorks).

The paper's key contribution lies in applying the MDL principle to define an information-theoretic measure that quantifies and resolves the tradeoff between the conciseness and the precision of DTDs. This is indeed a reasonable and intriguing first cut at this difficult problem, but I am not fully convinced that it should be the bottom line. It could well be that conciseness achieved through general regular expressions reduces the readability and intuitiveness of a DTD. Nevertheless, this paper should be an excellent starting point for more intensive work along these lines.
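
The conciseness/precision tradeoff can be made concrete with a small sketch. The code below is a simplified toy, not the encoding scheme used in the paper: the `Candidate` class, the six-symbol alphabet, the `MAX_COUNT` bound, and the per-sequence bit costs are all assumptions chosen for illustration. The MDL cost of a candidate is the number of bits needed to write down the pattern plus the bits needed to encode each input sequence given that pattern.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, List

ALPHABET = "ab|()*"  # assumed symbols: element names a, b plus regex metacharacters
BITS_PER_SYMBOL = math.log2(len(ALPHABET))
MAX_COUNT = 8  # assumed upper bound on repetition counts and sequence lengths
COUNT_BITS = math.log2(MAX_COUNT + 1)


@dataclass
class Candidate:
    pattern: str                       # regular expression over element names
    data_bits: Callable[[str], float]  # bits to encode one sequence given the pattern


def mdl_cost(cand: Candidate, sequences: List[str]) -> float:
    """Model bits plus data bits; infinite if the pattern rejects any sequence."""
    if any(re.fullmatch(cand.pattern, s) is None for s in sequences):
        return math.inf
    model_bits = len(cand.pattern) * BITS_PER_SYMBOL
    return model_bits + sum(cand.data_bits(s) for s in sequences)


def pick_best(candidates: List[Candidate], sequences: List[str]) -> Candidate:
    """Return the candidate with the smallest total description length."""
    return min(candidates, key=lambda c: mdl_cost(c, sequences))


sequences = ["ab", "abab", "ababab"]
candidates = [
    # Precise but verbose: each sequence is an index into three alternatives.
    Candidate("ab|abab|ababab", lambda s: math.log2(3)),
    # Concise and still precise: each sequence is just a repetition count.
    Candidate("(ab)*", lambda s: COUNT_BITS),
    # Concise but imprecise: a length plus one bit per symbol is needed.
    Candidate("(a|b)*", lambda s: COUNT_BITS + len(s)),
]
best = pick_best(candidates, sequences)
```

Under these assumed costs, `(ab)*` has the lowest total: the enumeration pays for a long pattern, and `(a|b)*` pays for every symbol of every sequence. Nothing in the cost accounts for how readable a pattern is to a human, which is the reservation raised above.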


URL for the Abstract:


Categories,
Keywords:


HyperLinks / References / URLs:


Copyright Message:


Personal Comments:


Download
Access Level:

Public

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Databases and Information Systems Group

Audience:

Expert

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Scientific Advisory Board (Fachbeirat), VG Wort


BibTeX Entry:

@ARTICLE{Weikum00b,
AUTHOR = {Weikum, Gerhard},
TITLE = {Review - {XTRACT}: A System for Extracting Document Type Descriptors from XML Documents},
JOURNAL = {ACM SIGMOD Digital Review},
PUBLISHER = {ACM},
YEAR = {2000},
VOLUME = {2},
PAGES = {--},
ADDRESS = {New York, USA},
ISBN = {-},
}


Entry last modified by Adriana Davidescu, 03/28/2006
Edit History

Editor(s)
Adriana Davidescu
Created
03/28/2006 02:09:47 PM
Revision
0.



Editor
Adriana Davidescu



Edit Date
28.03.2006 14:15:02


