Publication Details

HTML Document Analysis for Information Extraction

BURGET Radek. HTML Document Analysis for Information Extraction. In: Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002, pp. 426-430. ISBN 80-214-2116-9.
Czech title
Analýza HTML dokumentů pro extrakci informace
Type
conference paper
Language
English
Authors
Radek Burget
Keywords

HTML Analysis, Information Extraction

Abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, HTML primarily describes how documents look and provides no facilities for describing the structure of the data they contain. In this paper, we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the appearance and structure of HTML documents.

Published
2002
Pages
426-430
Proceedings
Proceedings of 8th EEICT conference
Conference
ELECTRICAL ENGINEERING, INFORMATION AND COMMUNICATION TECHNOLOGIES 2002, Brno, CZ
ISBN
80-214-2116-9
Publisher
Faculty of Information Technology BUT
Place
Brno, CZ
BibTeX
@INPROCEEDINGS{FITPUB6921,
   author = "Radek Burget",
   title = "HTML Document Analysis for Information Extraction",
   pages = "426--430",
   booktitle = "Proceedings of 8th EEICT conference",
   year = 2002,
   location = "Brno, CZ",
   publisher = "Faculty of Information Technology BUT",
   ISBN = "80-214-2116-9",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/6921"
}